DPDK patches and discussions
* [PATCH v1 00/13] graph enhancement for multi-core dispatch
@ 2022-11-17  5:09 Zhirun Yan
  2022-11-17  5:09 ` [PATCH v1 01/13] graph: split graph worker into common and default model Zhirun Yan
                   ` (14 more replies)
  0 siblings, 15 replies; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Currently, rte_graph supports only the RTC (Run-To-Completion) model,
where an entire graph runs within a single core.
RTC is one of the typical packet processing models. Others, such as
Pipeline and Hybrid, are not supported.

This patch set introduces a 'generic' model selection, a self-adapting
scheme based on core affinity.
The new model enables a cross-core dispatching mechanism which employs a
scheduling work-queue to dispatch streams to the worker cores associated
with the destination nodes. When the core affinity of the destination
node is the default 'current', the stream continues to be executed as
normal.

Example:
3-node graph targets 3-core budget

Generic Model
RTC:
Config Graph-A: node-0->current; node-1->current; node-2->current;
Graph-A':node-0/1/2 @0, Graph-A':node-0/1/2 @1, Graph-A':node-0/1/2 @2

+ - - - - - - - - - - - - - - - - - - - - - +
'                Core #0/1/2                '
'                                           '
' +--------+     +---------+     +--------+ '
' | Node-0 | --> | Node-1  | --> | Node-2 | '
' +--------+     +---------+     +--------+ '
'                                           '
+ - - - - - - - - - - - - - - - - - - - - - +

Pipeline:
Config Graph-A: node-0->0; node-1->1; node-2->2;
Graph-A':node-0 @0, Graph-A':node-1 @1, Graph-A':node-2 @2

+ - - - - - -+     +- - - - - - +     + - - - - - -+
'  Core #0   '     '  Core #1   '     '  Core #2   '
'            '     '            '     '            '
' +--------+ '     ' +--------+ '     ' +--------+ '
' | Node-0 | ' --> ' | Node-1 | ' --> ' | Node-2 | '
' +--------+ '     ' +--------+ '     ' +--------+ '
'            '     '            '     '            '
+ - - - - - -+     +- - - - - - +     + - - - - - -+

Hybrid:
Config Graph-A: node-0->current; node-1->current; node-2->2;
Graph-A':node-0/1 @0, Graph-A':node-0/1 @1, Graph-A':node-2 @2

+ - - - - - - - - - - - - - - - +     + - - - - - -+
'            Core #0            '     '  Core #2   '
'                               '     '            '
' +--------+         +--------+ '     ' +--------+ '
' | Node-0 | ------> | Node-1 | ' --> ' | Node-2 | '
' +--------+         +--------+ '     ' +--------+ '
'                               '     '            '
+ - - - - - - - - - - - - - - - +     + - - - - - -+
                                          ^
                                          |
                                          |
+ - - - - - - - - - - - - - - - +         |
'            Core #1            '         |
'                               '         |
' +--------+         +--------+ '         |
' | Node-0 | ------> | Node-1 | ' --------+
' +--------+         +--------+ '
'                               '
+ - - - - - - - - - - - - - - - +


The patch set is broken down as follows:

1. Split graph worker into common and default model parts.
2. Inline graph node processing and graph circular buffer walking to make
  them reusable.
3. Add set/get APIs to choose the worker model.
4. Introduce core affinity API to make a node run on a specific worker
  core. (Only used in the new model.)
5. Introduce graph affinity API to bind a graph to a specific worker
  core.
6. Introduce graph clone API.
7. Introduce stream moving with a scheduling work-queue in patches 8, 9
  and 10.
8. Add stats for the new model.
9. Abstract the default graph configuration process and integrate the new
  model into examples/l3fwd-graph. Add new parameters for model selection.

The new worker model can be enabled as follows:
./dpdk-l3fwd-graph -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="generic"

References:
https://static.sched.com/hosted_files/dpdkuserspace22/a6/graph%20introduce%20remote%20dispatch%20for%20mult-core%20scaling.pdf

Zhirun Yan (13):
  graph: split graph worker into common and default model
  graph: move node process into inline function
  graph: add macro to walk on graph circular buffer
  graph: add get/set graph worker model APIs
  graph: introduce core affinity API
  graph: introduce graph affinity API
  graph: introduce graph clone API for other worker core
  graph: introduce stream moving cross cores
  graph: enable create and destroy graph scheduling workqueue
  graph: introduce graph walk by cross-core dispatch
  graph: enable graph generic scheduler model
  graph: add stats for cross-core dispatching
  examples/l3fwd-graph: introduce generic worker model

 examples/l3fwd-graph/main.c         | 218 +++++++++--
 lib/graph/graph.c                   | 179 +++++++++
 lib/graph/graph_debug.c             |   6 +
 lib/graph/graph_populate.c          |   1 +
 lib/graph/graph_private.h           |  44 +++
 lib/graph/graph_stats.c             |  74 +++-
 lib/graph/meson.build               |   3 +-
 lib/graph/node.c                    |   1 +
 lib/graph/rte_graph.h               |  44 +++
 lib/graph/rte_graph_model_generic.c | 179 +++++++++
 lib/graph/rte_graph_model_generic.h | 114 ++++++
 lib/graph/rte_graph_model_rtc.h     |  22 ++
 lib/graph/rte_graph_worker.h        | 516 ++------------------------
 lib/graph/rte_graph_worker_common.h | 545 ++++++++++++++++++++++++++++
 lib/graph/version.map               |   8 +
 15 files changed, 1430 insertions(+), 524 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_generic.c
 create mode 100644 lib/graph/rte_graph_model_generic.h
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker_common.h

-- 
2.25.1



* [PATCH v1 01/13] graph: split graph worker into common and default model
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2023-02-20 13:38   ` Jerin Jacob
  2022-11-17  5:09 ` [PATCH v1 02/13] graph: move node process into inline function Zhirun Yan
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

To support multiple graph worker models, split the graph worker into
common and default parts. The current walk function is named
rte_graph_model_rtc because the default model is RTC (Run-To-Completion).

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_rtc.h     |  57 ++++
 lib/graph/rte_graph_worker.h        | 498 +---------------------------
 lib/graph/rte_graph_worker_common.h | 456 +++++++++++++++++++++++++
 3 files changed, 515 insertions(+), 496 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker_common.h

diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
new file mode 100644
index 0000000000..fb58730bde
--- /dev/null
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -0,0 +1,57 @@
+#include "rte_graph_worker_common.h"
+
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+static inline void
+rte_graph_walk_rtc(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	/*
+	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
+	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
+	 * in a circular buffer fashion.
+	 *
+	 *	+-----+ <= cir_start - head [number of source nodes]
+	 *	|     |
+	 *	| ... | <= source nodes
+	 *	|     |
+	 *	+-----+ <= cir_start [head = 0] [tail = 0]
+	 *	|     |
+	 *	| ... | <= pending streams
+	 *	|     |
+	 *	+-----+ <= cir_start + mask
+	 */
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		objs = node->objs;
+		rte_prefetch0(objs);
+
+		if (rte_graph_has_stats_feature()) {
+			start = rte_rdtsc();
+			rc = node->process(graph, node, objs, node->idx);
+			node->total_cycles += rte_rdtsc() - start;
+			node->total_calls++;
+			node->total_objs += rc;
+		} else {
+			node->process(graph, node, objs, node->idx);
+		}
+		node->idx = 0;
+		head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+	graph->tail = 0;
+}
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index 6dc7461659..54d1390786 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -1,122 +1,4 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(C) 2020 Marvell International Ltd.
- */
-
-#ifndef _RTE_GRAPH_WORKER_H_
-#define _RTE_GRAPH_WORKER_H_
-
-/**
- * @file rte_graph_worker.h
- *
- * @warning
- * @b EXPERIMENTAL:
- * All functions in this file may be changed or removed without prior notice.
- *
- * This API allows a worker thread to walk over a graph and nodes to create,
- * process, enqueue and move streams of objects to the next nodes.
- */
-
-#include <rte_common.h>
-#include <rte_cycles.h>
-#include <rte_prefetch.h>
-#include <rte_memcpy.h>
-#include <rte_memory.h>
-
-#include "rte_graph.h"
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-/**
- * @internal
- *
- * Data structure to hold graph data.
- */
-struct rte_graph {
-	uint32_t tail;		     /**< Tail of circular buffer. */
-	uint32_t head;		     /**< Head of circular buffer. */
-	uint32_t cir_mask;	     /**< Circular buffer wrap around mask. */
-	rte_node_t nb_nodes;	     /**< Number of nodes in the graph. */
-	rte_graph_off_t *cir_start;  /**< Pointer to circular buffer. */
-	rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */
-	rte_graph_t id;	/**< Graph identifier. */
-	int socket;	/**< Socket ID where memory is allocated. */
-	char name[RTE_GRAPH_NAMESIZE];	/**< Name of the graph. */
-	uint64_t fence;			/**< Fence. */
-} __rte_cache_aligned;
-
-/**
- * @internal
- *
- * Data structure to hold node data.
- */
-struct rte_node {
-	/* Slow path area  */
-	uint64_t fence;		/**< Fence. */
-	rte_graph_off_t next;	/**< Index to next node. */
-	rte_node_t id;		/**< Node identifier. */
-	rte_node_t parent_id;	/**< Parent Node identifier. */
-	rte_edge_t nb_edges;	/**< Number of edges from this node. */
-	uint32_t realloc_count;	/**< Number of times realloced. */
-
-	char parent[RTE_NODE_NAMESIZE];	/**< Parent node name. */
-	char name[RTE_NODE_NAMESIZE];	/**< Name of the node. */
-
-	/* Fast path area  */
-#define RTE_NODE_CTX_SZ 16
-	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-	uint16_t size;		/**< Total number of objects available. */
-	uint16_t idx;		/**< Number of objects used. */
-	rte_graph_off_t off;	/**< Offset of node in the graph reel. */
-	uint64_t total_cycles;	/**< Cycles spent in this node. */
-	uint64_t total_calls;	/**< Calls done to this node. */
-	uint64_t total_objs;	/**< Objects processed by this node. */
-	RTE_STD_C11
-		union {
-			void **objs;	   /**< Array of object pointers. */
-			uint64_t objs_u64;
-		};
-	RTE_STD_C11
-		union {
-			rte_node_process_t process; /**< Process function. */
-			uint64_t process_u64;
-		};
-	struct rte_node *nodes[] __rte_cache_min_aligned; /**< Next nodes. */
-} __rte_cache_aligned;
-
-/**
- * @internal
- *
- * Allocate a stream of objects.
- *
- * If stream already exists then re-allocate it to a larger size.
- *
- * @param graph
- *   Pointer to the graph object.
- * @param node
- *   Pointer to the node object.
- */
-__rte_experimental
-void __rte_node_stream_alloc(struct rte_graph *graph, struct rte_node *node);
-
-/**
- * @internal
- *
- * Allocate a stream with requested number of objects.
- *
- * If stream already exists then re-allocate it to a larger size.
- *
- * @param graph
- *   Pointer to the graph object.
- * @param node
- *   Pointer to the node object.
- * @param req_size
- *   Number of objects to be allocated.
- */
-__rte_experimental
-void __rte_node_stream_alloc_size(struct rte_graph *graph,
-				  struct rte_node *node, uint16_t req_size);
+#include "rte_graph_model_rtc.h"
 
 /**
  * Perform graph walk on the circular buffer and invoke the process function
@@ -131,381 +13,5 @@ __rte_experimental
 static inline void
 rte_graph_walk(struct rte_graph *graph)
 {
-	const rte_graph_off_t *cir_start = graph->cir_start;
-	const rte_node_t mask = graph->cir_mask;
-	uint32_t head = graph->head;
-	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
-
-	/*
-	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
-	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
-	 * in a circular buffer fashion.
-	 *
-	 *	+-----+ <= cir_start - head [number of source nodes]
-	 *	|     |
-	 *	| ... | <= source nodes
-	 *	|     |
-	 *	+-----+ <= cir_start [head = 0] [tail = 0]
-	 *	|     |
-	 *	| ... | <= pending streams
-	 *	|     |
-	 *	+-----+ <= cir_start + mask
-	 */
-	while (likely(head != graph->tail)) {
-		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-		node->idx = 0;
-		head = likely((int32_t)head > 0) ? head & mask : head;
-	}
-	graph->tail = 0;
-}
-
-/* Fast path helper functions */
-
-/**
- * @internal
- *
- * Enqueue a given node to the tail of the graph reel.
- *
- * @param graph
- *   Pointer Graph object.
- * @param node
- *   Pointer to node object to be enqueued.
- */
-static __rte_always_inline void
-__rte_node_enqueue_tail_update(struct rte_graph *graph, struct rte_node *node)
-{
-	uint32_t tail;
-
-	tail = graph->tail;
-	graph->cir_start[tail++] = node->off;
-	graph->tail = tail & graph->cir_mask;
-}
-
-/**
- * @internal
- *
- * Enqueue sequence prologue function.
- *
- * Updates the node to tail of graph reel and resizes the number of objects
- * available in the stream as needed.
- *
- * @param graph
- *   Pointer to the graph object.
- * @param node
- *   Pointer to the node object.
- * @param idx
- *   Index at which the object enqueue starts from.
- * @param space
- *   Space required for the object enqueue.
- */
-static __rte_always_inline void
-__rte_node_enqueue_prologue(struct rte_graph *graph, struct rte_node *node,
-			    const uint16_t idx, const uint16_t space)
-{
-
-	/* Add to the pending stream list if the node is new */
-	if (idx == 0)
-		__rte_node_enqueue_tail_update(graph, node);
-
-	if (unlikely(node->size < (idx + space)))
-		__rte_node_stream_alloc_size(graph, node, node->size + space);
-}
-
-/**
- * @internal
- *
- * Get the node pointer from current node edge id.
- *
- * @param node
- *   Current node pointer.
- * @param next
- *   Edge id of the required node.
- *
- * @return
- *   Pointer to the node denoted by the edge id.
- */
-static __rte_always_inline struct rte_node *
-__rte_node_next_node_get(struct rte_node *node, rte_edge_t next)
-{
-	RTE_ASSERT(next < node->nb_edges);
-	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-	node = node->nodes[next];
-	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-
-	return node;
-}
-
-/**
- * Enqueue the objs to next node for further processing and set
- * the next node to pending state in the circular buffer.
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup().
- * @param node
- *   Current node pointer.
- * @param next
- *   Relative next node index to enqueue objs.
- * @param objs
- *   Objs to enqueue.
- * @param nb_objs
- *   Number of objs to enqueue.
- */
-__rte_experimental
-static inline void
-rte_node_enqueue(struct rte_graph *graph, struct rte_node *node,
-		 rte_edge_t next, void **objs, uint16_t nb_objs)
-{
-	node = __rte_node_next_node_get(node, next);
-	const uint16_t idx = node->idx;
-
-	__rte_node_enqueue_prologue(graph, node, idx, nb_objs);
-
-	rte_memcpy(&node->objs[idx], objs, nb_objs * sizeof(void *));
-	node->idx = idx + nb_objs;
+	rte_graph_walk_rtc(graph);
 }
-
-/**
- * Enqueue only one obj to next node for further processing and
- * set the next node to pending state in the circular buffer.
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup().
- * @param node
- *   Current node pointer.
- * @param next
- *   Relative next node index to enqueue objs.
- * @param obj
- *   Obj to enqueue.
- */
-__rte_experimental
-static inline void
-rte_node_enqueue_x1(struct rte_graph *graph, struct rte_node *node,
-		    rte_edge_t next, void *obj)
-{
-	node = __rte_node_next_node_get(node, next);
-	uint16_t idx = node->idx;
-
-	__rte_node_enqueue_prologue(graph, node, idx, 1);
-
-	node->objs[idx++] = obj;
-	node->idx = idx;
-}
-
-/**
- * Enqueue only two objs to next node for further processing and
- * set the next node to pending state in the circular buffer.
- * Same as rte_node_enqueue_x1 but enqueue two objs.
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup().
- * @param node
- *   Current node pointer.
- * @param next
- *   Relative next node index to enqueue objs.
- * @param obj0
- *   Obj to enqueue.
- * @param obj1
- *   Obj to enqueue.
- */
-__rte_experimental
-static inline void
-rte_node_enqueue_x2(struct rte_graph *graph, struct rte_node *node,
-		    rte_edge_t next, void *obj0, void *obj1)
-{
-	node = __rte_node_next_node_get(node, next);
-	uint16_t idx = node->idx;
-
-	__rte_node_enqueue_prologue(graph, node, idx, 2);
-
-	node->objs[idx++] = obj0;
-	node->objs[idx++] = obj1;
-	node->idx = idx;
-}
-
-/**
- * Enqueue only four objs to next node for further processing and
- * set the next node to pending state in the circular buffer.
- * Same as rte_node_enqueue_x1 but enqueue four objs.
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup().
- * @param node
- *   Current node pointer.
- * @param next
- *   Relative next node index to enqueue objs.
- * @param obj0
- *   1st obj to enqueue.
- * @param obj1
- *   2nd obj to enqueue.
- * @param obj2
- *   3rd obj to enqueue.
- * @param obj3
- *   4th obj to enqueue.
- */
-__rte_experimental
-static inline void
-rte_node_enqueue_x4(struct rte_graph *graph, struct rte_node *node,
-		    rte_edge_t next, void *obj0, void *obj1, void *obj2,
-		    void *obj3)
-{
-	node = __rte_node_next_node_get(node, next);
-	uint16_t idx = node->idx;
-
-	__rte_node_enqueue_prologue(graph, node, idx, 4);
-
-	node->objs[idx++] = obj0;
-	node->objs[idx++] = obj1;
-	node->objs[idx++] = obj2;
-	node->objs[idx++] = obj3;
-	node->idx = idx;
-}
-
-/**
- * Enqueue objs to multiple next nodes for further processing and
- * set the next nodes to pending state in the circular buffer.
- * objs[i] will be enqueued to nexts[i].
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup().
- * @param node
- *   Current node pointer.
- * @param nexts
- *   List of relative next node indices to enqueue objs.
- * @param objs
- *   List of objs to enqueue.
- * @param nb_objs
- *   Number of objs to enqueue.
- */
-__rte_experimental
-static inline void
-rte_node_enqueue_next(struct rte_graph *graph, struct rte_node *node,
-		      rte_edge_t *nexts, void **objs, uint16_t nb_objs)
-{
-	uint16_t i;
-
-	for (i = 0; i < nb_objs; i++)
-		rte_node_enqueue_x1(graph, node, nexts[i], objs[i]);
-}
-
-/**
- * Get the stream of next node to enqueue the objs.
- * Once done with the updating the objs, needs to call
- * rte_node_next_stream_put to put the next node to pending state.
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup().
- * @param node
- *   Current node pointer.
- * @param next
- *   Relative next node index to get stream.
- * @param nb_objs
- *   Requested free size of the next stream.
- *
- * @return
- *   Valid next stream on success.
- *
- * @see rte_node_next_stream_put().
- */
-__rte_experimental
-static inline void **
-rte_node_next_stream_get(struct rte_graph *graph, struct rte_node *node,
-			 rte_edge_t next, uint16_t nb_objs)
-{
-	node = __rte_node_next_node_get(node, next);
-	const uint16_t idx = node->idx;
-	uint16_t free_space = node->size - idx;
-
-	if (unlikely(free_space < nb_objs))
-		__rte_node_stream_alloc_size(graph, node, node->size + nb_objs);
-
-	return &node->objs[idx];
-}
-
-/**
- * Put the next stream to pending state in the circular buffer
- * for further processing. Should be invoked after rte_node_next_stream_get().
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup().
- * @param node
- *   Current node pointer.
- * @param next
- *   Relative next node index..
- * @param idx
- *   Number of objs updated in the stream after getting the stream using
- *   rte_node_next_stream_get.
- *
- * @see rte_node_next_stream_get().
- */
-__rte_experimental
-static inline void
-rte_node_next_stream_put(struct rte_graph *graph, struct rte_node *node,
-			 rte_edge_t next, uint16_t idx)
-{
-	if (unlikely(!idx))
-		return;
-
-	node = __rte_node_next_node_get(node, next);
-	if (node->idx == 0)
-		__rte_node_enqueue_tail_update(graph, node);
-
-	node->idx += idx;
-}
-
-/**
- * Home run scenario, Enqueue all the objs of current node to next
- * node in optimized way by swapping the streams of both nodes.
- * Performs good when next node is already not in pending state.
- * If next node is already in pending state then normal enqueue
- * will be used.
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup().
- * @param src
- *   Current node pointer.
- * @param next
- *   Relative next node index.
- */
-__rte_experimental
-static inline void
-rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
-			  rte_edge_t next)
-{
-	struct rte_node *dst = __rte_node_next_node_get(src, next);
-
-	/* Let swap the pointers if dst don't have valid objs */
-	if (likely(dst->idx == 0)) {
-		void **dobjs = dst->objs;
-		uint16_t dsz = dst->size;
-		dst->objs = src->objs;
-		dst->size = src->size;
-		src->objs = dobjs;
-		src->size = dsz;
-		dst->idx = src->idx;
-		__rte_node_enqueue_tail_update(graph, dst);
-	} else { /* Move the objects from src node to dst node */
-		rte_node_enqueue(graph, src, next, src->objs, src->idx);
-	}
-}
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif /* _RTE_GRAPH_WORKER_H_ */
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
new file mode 100644
index 0000000000..91a5de7fa4
--- /dev/null
+++ b/lib/graph/rte_graph_worker_common.h
@@ -0,0 +1,456 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2020 Marvell International Ltd.
+ */
+
+#ifndef _RTE_GRAPH_WORKER_COMMON_H_
+#define _RTE_GRAPH_WORKER_COMMON_H_
+
+/**
+ * @file rte_graph_worker.h
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * This API allows a worker thread to walk over a graph and nodes to create,
+ * process, enqueue and move streams of objects to the next nodes.
+ */
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_prefetch.h>
+#include <rte_memcpy.h>
+#include <rte_memory.h>
+
+#include "rte_graph.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @internal
+ *
+ * Data structure to hold graph data.
+ */
+struct rte_graph {
+	uint32_t tail;		     /**< Tail of circular buffer. */
+	uint32_t head;		     /**< Head of circular buffer. */
+	uint32_t cir_mask;	     /**< Circular buffer wrap around mask. */
+	rte_node_t nb_nodes;	     /**< Number of nodes in the graph. */
+	rte_graph_off_t *cir_start;  /**< Pointer to circular buffer. */
+	rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */
+	rte_graph_t id;	/**< Graph identifier. */
+	int socket;	/**< Socket ID where memory is allocated. */
+	char name[RTE_GRAPH_NAMESIZE];	/**< Name of the graph. */
+	uint64_t fence;			/**< Fence. */
+} __rte_cache_aligned;
+
+/**
+ * @internal
+ *
+ * Data structure to hold node data.
+ */
+struct rte_node {
+	/* Slow path area  */
+	uint64_t fence;		/**< Fence. */
+	rte_graph_off_t next;	/**< Index to next node. */
+	rte_node_t id;		/**< Node identifier. */
+	rte_node_t parent_id;	/**< Parent Node identifier. */
+	rte_edge_t nb_edges;	/**< Number of edges from this node. */
+	uint32_t realloc_count;	/**< Number of times realloced. */
+
+	char parent[RTE_NODE_NAMESIZE];	/**< Parent node name. */
+	char name[RTE_NODE_NAMESIZE];	/**< Name of the node. */
+
+	/* Fast path area  */
+#define RTE_NODE_CTX_SZ 16
+	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
+	uint16_t size;		/**< Total number of objects available. */
+	uint16_t idx;		/**< Number of objects used. */
+	rte_graph_off_t off;	/**< Offset of node in the graph reel. */
+	uint64_t total_cycles;	/**< Cycles spent in this node. */
+	uint64_t total_calls;	/**< Calls done to this node. */
+	uint64_t total_objs;	/**< Objects processed by this node. */
+
+	RTE_STD_C11
+		union {
+			void **objs;	   /**< Array of object pointers. */
+			uint64_t objs_u64;
+		};
+	RTE_STD_C11
+		union {
+			rte_node_process_t process; /**< Process function. */
+			uint64_t process_u64;
+		};
+	struct rte_node *nodes[] __rte_cache_min_aligned; /**< Next nodes. */
+} __rte_cache_aligned;
+
+/**
+ * @internal
+ *
+ * Allocate a stream of objects.
+ *
+ * If stream already exists then re-allocate it to a larger size.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ * @param node
+ *   Pointer to the node object.
+ */
+__rte_experimental
+void __rte_node_stream_alloc(struct rte_graph *graph, struct rte_node *node);
+
+/**
+ * @internal
+ *
+ * Allocate a stream with requested number of objects.
+ *
+ * If stream already exists then re-allocate it to a larger size.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ * @param node
+ *   Pointer to the node object.
+ * @param req_size
+ *   Number of objects to be allocated.
+ */
+__rte_experimental
+void __rte_node_stream_alloc_size(struct rte_graph *graph,
+				  struct rte_node *node, uint16_t req_size);
+
+/* Fast path helper functions */
+
+/**
+ * @internal
+ *
+ * Enqueue a given node to the tail of the graph reel.
+ *
+ * @param graph
+ *   Pointer Graph object.
+ * @param node
+ *   Pointer to node object to be enqueued.
+ */
+static __rte_always_inline void
+__rte_node_enqueue_tail_update(struct rte_graph *graph, struct rte_node *node)
+{
+	uint32_t tail;
+
+	tail = graph->tail;
+	graph->cir_start[tail++] = node->off;
+	graph->tail = tail & graph->cir_mask;
+}
+
+/**
+ * @internal
+ *
+ * Enqueue sequence prologue function.
+ *
+ * Updates the node to tail of graph reel and resizes the number of objects
+ * available in the stream as needed.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ * @param node
+ *   Pointer to the node object.
+ * @param idx
+ *   Index at which the object enqueue starts from.
+ * @param space
+ *   Space required for the object enqueue.
+ */
+static __rte_always_inline void
+__rte_node_enqueue_prologue(struct rte_graph *graph, struct rte_node *node,
+			    const uint16_t idx, const uint16_t space)
+{
+
+	/* Add to the pending stream list if the node is new */
+	if (idx == 0)
+		__rte_node_enqueue_tail_update(graph, node);
+
+	if (unlikely(node->size < (idx + space)))
+		__rte_node_stream_alloc_size(graph, node, node->size + space);
+}
+
+/**
+ * @internal
+ *
+ * Get the node pointer from current node edge id.
+ *
+ * @param node
+ *   Current node pointer.
+ * @param next
+ *   Edge id of the required node.
+ *
+ * @return
+ *   Pointer to the node denoted by the edge id.
+ */
+static __rte_always_inline struct rte_node *
+__rte_node_next_node_get(struct rte_node *node, rte_edge_t next)
+{
+	RTE_ASSERT(next < node->nb_edges);
+	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+	node = node->nodes[next];
+	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+
+	return node;
+}
+
+/**
+ * Enqueue the objs to next node for further processing and set
+ * the next node to pending state in the circular buffer.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param node
+ *   Current node pointer.
+ * @param next
+ *   Relative next node index to enqueue objs.
+ * @param objs
+ *   Objs to enqueue.
+ * @param nb_objs
+ *   Number of objs to enqueue.
+ */
+__rte_experimental
+static inline void
+rte_node_enqueue(struct rte_graph *graph, struct rte_node *node,
+		 rte_edge_t next, void **objs, uint16_t nb_objs)
+{
+	node = __rte_node_next_node_get(node, next);
+	const uint16_t idx = node->idx;
+
+	__rte_node_enqueue_prologue(graph, node, idx, nb_objs);
+
+	rte_memcpy(&node->objs[idx], objs, nb_objs * sizeof(void *));
+	node->idx = idx + nb_objs;
+}
+
+/**
+ * Enqueue only one obj to next node for further processing and
+ * set the next node to pending state in the circular buffer.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param node
+ *   Current node pointer.
+ * @param next
+ *   Relative next node index to enqueue objs.
+ * @param obj
+ *   Obj to enqueue.
+ */
+__rte_experimental
+static inline void
+rte_node_enqueue_x1(struct rte_graph *graph, struct rte_node *node,
+		    rte_edge_t next, void *obj)
+{
+	node = __rte_node_next_node_get(node, next);
+	uint16_t idx = node->idx;
+
+	__rte_node_enqueue_prologue(graph, node, idx, 1);
+
+	node->objs[idx++] = obj;
+	node->idx = idx;
+}
+
+/**
+ * Enqueue only two objs to next node for further processing and
+ * set the next node to pending state in the circular buffer.
+ * Same as rte_node_enqueue_x1 but enqueue two objs.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param node
+ *   Current node pointer.
+ * @param next
+ *   Relative next node index to enqueue objs.
+ * @param obj0
+ *   Obj to enqueue.
+ * @param obj1
+ *   Obj to enqueue.
+ */
+__rte_experimental
+static inline void
+rte_node_enqueue_x2(struct rte_graph *graph, struct rte_node *node,
+		    rte_edge_t next, void *obj0, void *obj1)
+{
+	node = __rte_node_next_node_get(node, next);
+	uint16_t idx = node->idx;
+
+	__rte_node_enqueue_prologue(graph, node, idx, 2);
+
+	node->objs[idx++] = obj0;
+	node->objs[idx++] = obj1;
+	node->idx = idx;
+}
+
+/**
+ * Enqueue only four objs to next node for further processing and
+ * set the next node to pending state in the circular buffer.
+ * Same as rte_node_enqueue_x1 but enqueue four objs.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param node
+ *   Current node pointer.
+ * @param next
+ *   Relative next node index to enqueue objs.
+ * @param obj0
+ *   1st obj to enqueue.
+ * @param obj1
+ *   2nd obj to enqueue.
+ * @param obj2
+ *   3rd obj to enqueue.
+ * @param obj3
+ *   4th obj to enqueue.
+ */
+__rte_experimental
+static inline void
+rte_node_enqueue_x4(struct rte_graph *graph, struct rte_node *node,
+		    rte_edge_t next, void *obj0, void *obj1, void *obj2,
+		    void *obj3)
+{
+	node = __rte_node_next_node_get(node, next);
+	uint16_t idx = node->idx;
+
+	__rte_node_enqueue_prologue(graph, node, idx, 4);
+
+	node->objs[idx++] = obj0;
+	node->objs[idx++] = obj1;
+	node->objs[idx++] = obj2;
+	node->objs[idx++] = obj3;
+	node->idx = idx;
+}
+
+/**
+ * Enqueue objs to multiple next nodes for further processing and
+ * set the next nodes to pending state in the circular buffer.
+ * objs[i] will be enqueued to nexts[i].
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param node
+ *   Current node pointer.
+ * @param nexts
+ *   List of relative next node indices to enqueue objs.
+ * @param objs
+ *   List of objs to enqueue.
+ * @param nb_objs
+ *   Number of objs to enqueue.
+ */
+__rte_experimental
+static inline void
+rte_node_enqueue_next(struct rte_graph *graph, struct rte_node *node,
+		      rte_edge_t *nexts, void **objs, uint16_t nb_objs)
+{
+	uint16_t i;
+
+	for (i = 0; i < nb_objs; i++)
+		rte_node_enqueue_x1(graph, node, nexts[i], objs[i]);
+}
+
+/**
+ * Get the stream of the next node to enqueue objs into.
+ * Once done updating the objs, rte_node_next_stream_put() must be called
+ * to move the next node to pending state.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param node
+ *   Current node pointer.
+ * @param next
+ *   Relative next node index to get stream.
+ * @param nb_objs
+ *   Requested free size of the next stream.
+ *
+ * @return
+ *   Valid next stream on success.
+ *
+ * @see rte_node_next_stream_put().
+ */
+__rte_experimental
+static inline void **
+rte_node_next_stream_get(struct rte_graph *graph, struct rte_node *node,
+			 rte_edge_t next, uint16_t nb_objs)
+{
+	node = __rte_node_next_node_get(node, next);
+	const uint16_t idx = node->idx;
+	uint16_t free_space = node->size - idx;
+
+	if (unlikely(free_space < nb_objs))
+		__rte_node_stream_alloc_size(graph, node, node->size + nb_objs);
+
+	return &node->objs[idx];
+}
+
+/**
+ * Put the next stream to pending state in the circular buffer
+ * for further processing. Should be invoked after rte_node_next_stream_get().
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param node
+ *   Current node pointer.
+ * @param next
+ *   Relative next node index.
+ * @param idx
+ *   Number of objs updated in the stream after getting the stream using
+ *   rte_node_next_stream_get.
+ *
+ * @see rte_node_next_stream_get().
+ */
+__rte_experimental
+static inline void
+rte_node_next_stream_put(struct rte_graph *graph, struct rte_node *node,
+			 rte_edge_t next, uint16_t idx)
+{
+	if (unlikely(!idx))
+		return;
+
+	node = __rte_node_next_node_get(node, next);
+	if (node->idx == 0)
+		__rte_node_enqueue_tail_update(graph, node);
+
+	node->idx += idx;
+}
+
+/**
+ * Home run scenario: enqueue all the objs of the current node to the next
+ * node in an optimized way by swapping the streams of both nodes.
+ * Performs well when the next node is not already in pending state.
+ * If the next node is already in pending state then the normal enqueue
+ * path is used.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param src
+ *   Current node pointer.
+ * @param next
+ *   Relative next node index.
+ */
+__rte_experimental
+static inline void
+rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
+			  rte_edge_t next)
+{
+	struct rte_node *dst = __rte_node_next_node_get(src, next);
+
+	/* Swap the pointers if dst doesn't hold valid objs */
+	if (likely(dst->idx == 0)) {
+		void **dobjs = dst->objs;
+		uint16_t dsz = dst->size;
+
+		dst->objs = src->objs;
+		dst->size = src->size;
+		src->objs = dobjs;
+		src->size = dsz;
+		dst->idx = src->idx;
+		__rte_node_enqueue_tail_update(graph, dst);
+	} else { /* Move the objects from src node to dst node */
+		rte_node_enqueue(graph, src, next, src->objs, src->idx);
+	}
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_WORKER_COMMON_H_ */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 02/13] graph: move node process into inline function
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
  2022-11-17  5:09 ` [PATCH v1 01/13] graph: split graph worker into common and default model Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2023-02-20 13:39   ` Jerin Jacob
  2022-11-17  5:09 ` [PATCH v1 03/13] graph: add macro to walk on graph circular buffer Zhirun Yan
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

The node process logic is a single, reusable block; move the code into an
inline function so that other worker models can share it.
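
The factored-out helper can be pictured with a standalone sketch (plain C, no DPDK headers; `demo_node` and `demo_node_process` are illustrative stand-ins for `struct rte_node` and `__rte_node_process`, and cycle counting is omitted): the node's process callback runs over its pending objects, stats are accounted when enabled, and the pending index resets to zero.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for struct rte_node: a process callback plus stats. */
struct demo_node {
	uint16_t (*process)(void **objs, uint16_t nb);
	void **objs;
	uint16_t idx;          /* number of pending objects */
	uint64_t total_calls;
	uint64_t total_objs;
};

/* Mirrors what __rte_node_process() centralizes: invoke the node's process
 * function on its pending objects, account stats when requested, then reset
 * the pending count so the stream can be refilled. */
static void demo_node_process(struct demo_node *node, int stats_on)
{
	uint16_t rc;

	if (stats_on) {
		rc = node->process(node->objs, node->idx);
		node->total_calls++;
		node->total_objs += rc;
	} else {
		node->process(node->objs, node->idx);
	}
	node->idx = 0;
}

/* Trivial process function: report every pending object as handled. */
static uint16_t demo_process_fn(void **objs, uint16_t nb)
{
	(void)objs;
	return nb;
}
```

Both the RTC walk and any future model can now call the same helper instead of duplicating the stats branch.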

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_rtc.h     | 18 +---------------
 lib/graph/rte_graph_worker_common.h | 33 +++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
index fb58730bde..c80b0ce962 100644
--- a/lib/graph/rte_graph_model_rtc.h
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -16,9 +16,6 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	const rte_node_t mask = graph->cir_mask;
 	uint32_t head = graph->head;
 	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
 
 	/*
 	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
@@ -37,20 +34,7 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	 */
 	while (likely(head != graph->tail)) {
 		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-		node->idx = 0;
+		__rte_node_process(graph, node);
 		head = likely((int32_t)head > 0) ? head & mask : head;
 	}
 	graph->tail = 0;
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 91a5de7fa4..b7b2bb958c 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -121,6 +121,39 @@ void __rte_node_stream_alloc_size(struct rte_graph *graph,
 
 /* Fast path helper functions */
 
+/**
+ * @internal
+ *
+ * Call the process function of the given node and update its stats,
+ * if the stats feature is enabled.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ * @param node
+ *   Pointer to the node object to be processed.
+ */
+static __rte_always_inline void
+__rte_node_process(struct rte_graph *graph, struct rte_node *node)
+{
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+	objs = node->objs;
+	rte_prefetch0(objs);
+
+	if (rte_graph_has_stats_feature()) {
+		start = rte_rdtsc();
+		rc = node->process(graph, node, objs, node->idx);
+		node->total_cycles += rte_rdtsc() - start;
+		node->total_calls++;
+		node->total_objs += rc;
+	} else {
+		node->process(graph, node, objs, node->idx);
+	}
+	node->idx = 0;
+}
+
 /**
  * @internal
  *
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 03/13] graph: add macro to walk on graph circular buffer
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
  2022-11-17  5:09 ` [PATCH v1 01/13] graph: split graph worker into common and default model Zhirun Yan
  2022-11-17  5:09 ` [PATCH v1 02/13] graph: move node process into inline function Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2023-02-20 13:45   ` Jerin Jacob
  2022-11-17  5:09 ` [PATCH v1 04/13] graph: add get/set graph worker model APIs Zhirun Yan
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Walking the graph circular buffer is a common operation; wrap it in a macro
so that it can be reused by other worker models.
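
The walk order the macro encodes can be reproduced in a standalone sketch (plain C, no DPDK headers; `DEMO_WALK_NODE` and `demo_walk` are illustrative names): source nodes sit at negative offsets before `cir_start`, so the head starts negative, walks up through the pending streams, and only begins wrapping with the power-of-two mask once it goes non-negative.

```c
#include <stdint.h>

/* Standalone model of the graph circular-buffer walk. buf points at
 * cir_start; entries before it are the source nodes. */
#define DEMO_WALK_NODE(buf, head, tail, mask, node)                     \
	for ((node) = (buf)[(int32_t)(head)];                           \
	     (head) != (tail);                                          \
	     (head)++,                                                  \
	     (node) = (buf)[(int32_t)(head)],                           \
	     (head) = ((int32_t)(head) > 0) ? (head) & (mask) : (head))

/* Collect the visit order into out[]; returns the number of visits. */
static int demo_walk(int *buf, uint32_t head, uint32_t tail, uint32_t mask,
		     int *out)
{
	int node, n = 0;

	DEMO_WALK_NODE(buf, head, tail, mask, node)
		out[n++] = node;
	return n;
}
```

With two source nodes (head = -2) and a tail of 3, the walk visits the two source entries first and then the three pending streams from `cir_start` onward.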

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_rtc.h     | 23 ++---------------------
 lib/graph/rte_graph_worker_common.h | 23 +++++++++++++++++++++++
 2 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
index c80b0ce962..5474b06063 100644
--- a/lib/graph/rte_graph_model_rtc.h
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -12,30 +12,11 @@
 static inline void
 rte_graph_walk_rtc(struct rte_graph *graph)
 {
-	const rte_graph_off_t *cir_start = graph->cir_start;
-	const rte_node_t mask = graph->cir_mask;
 	uint32_t head = graph->head;
 	struct rte_node *node;
 
-	/*
-	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
-	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
-	 * in a circular buffer fashion.
-	 *
-	 *	+-----+ <= cir_start - head [number of source nodes]
-	 *	|     |
-	 *	| ... | <= source nodes
-	 *	|     |
-	 *	+-----+ <= cir_start [head = 0] [tail = 0]
-	 *	|     |
-	 *	| ... | <= pending streams
-	 *	|     |
-	 *	+-----+ <= cir_start + mask
-	 */
-	while (likely(head != graph->tail)) {
-		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+	rte_graph_walk_node(graph, head, node)
 		__rte_node_process(graph, node);
-		head = likely((int32_t)head > 0) ? head & mask : head;
-	}
+
 	graph->tail = 0;
 }
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index b7b2bb958c..df33204336 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -121,6 +121,29 @@ void __rte_node_stream_alloc_size(struct rte_graph *graph,
 
 /* Fast path helper functions */
 
+/**
+ * Macro to walk on the source node(s) ((cir_start - head) -> cir_start)
+ * and then on the pending streams
+ * (cir_start -> (cir_start + mask) -> cir_start)
+ * in a circular buffer fashion.
+ *
+ *	+-----+ <= cir_start - head [number of source nodes]
+ *	|     |
+ *	| ... | <= source nodes
+ *	|     |
+ *	+-----+ <= cir_start [head = 0] [tail = 0]
+ *	|     |
+ *	| ... | <= pending streams
+ *	|     |
+ *	+-----+ <= cir_start + mask
+ */
+#define rte_graph_walk_node(graph, head, node)                                         \
+	for ((node) = RTE_PTR_ADD((graph), (graph)->cir_start[(int32_t)(head)]);        \
+	     likely((head) != (graph)->tail);                                           \
+	     (head)++,                                                                  \
+	     (node) = RTE_PTR_ADD((graph), (graph)->cir_start[(int32_t)(head)]),        \
+	     (head) = likely((int32_t)(head) > 0) ? (head) & (graph)->cir_mask : (head))
+
 /**
  * @internal
  *
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (2 preceding siblings ...)
  2022-11-17  5:09 ` [PATCH v1 03/13] graph: add macro to walk on graph circular buffer Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2022-12-06  3:35   ` [EXT] " Kiran Kumar Kokkilagadda
  2023-02-20 13:50   ` Jerin Jacob
  2022-11-17  5:09 ` [PATCH v1 05/13] graph: introduce core affinity API Zhirun Yan
                   ` (10 subsequent siblings)
  14 siblings, 2 replies; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add new get/set APIs to configure the graph worker model, which determines
the model used when the graph runs.
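
The set/get semantics in this patch can be sketched standalone (plain C, no DPDK headers; `demo_model_set`/`demo_model_get` are illustrative stand-ins for `rte_graph_worker_model_set`/`_get`): an out-of-range value is rejected and the model falls back to the default, matching the `fail:` path in the patch.

```c
enum demo_worker_model {
	DEMO_MODEL_DEFAULT = 0,
	DEMO_MODEL_RTC,
	DEMO_MODEL_GENERIC,
	DEMO_MODEL_MAX,
};

static enum demo_worker_model demo_model = DEMO_MODEL_DEFAULT;

/* Reject out-of-range values and fall back to the default model on
 * failure. Like the patch, this performs no locking and is only safe
 * to call before the graphs start running. */
static int demo_model_set(int model)
{
	if (model < 0 || model >= DEMO_MODEL_MAX) {
		demo_model = DEMO_MODEL_DEFAULT;
		return -1;
	}
	demo_model = model;
	return 0;
}

static enum demo_worker_model demo_model_get(void)
{
	return demo_model;
}
```

A failed set is not a no-op: it resets the model to the default, so callers should check the return value before starting graph walks.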

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_worker.h        | 51 +++++++++++++++++++++++++++++
 lib/graph/rte_graph_worker_common.h | 13 ++++++++
 lib/graph/version.map               |  3 ++
 3 files changed, 67 insertions(+)

diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index 54d1390786..a0ea0df153 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -1,5 +1,56 @@
 #include "rte_graph_model_rtc.h"
 
+static enum rte_graph_worker_model worker_model = RTE_GRAPH_MODEL_DEFAULT;
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Set the graph worker model.
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ *    before the graph starts running.
+ *
+ * @param model
+ *   The graph worker model to set.
+ *
+ * @return
+ *   0 on success, -1 otherwise.
+ */
+__rte_experimental
+static inline int
+rte_graph_worker_model_set(enum rte_graph_worker_model model)
+{
+	if (model >= RTE_GRAPH_MODEL_MAX)
+		goto fail;
+
+	worker_model = model;
+	return 0;
+
+fail:
+	worker_model = RTE_GRAPH_MODEL_DEFAULT;
+	return -1;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Get the graph worker model.
+ *
+ * @return
+ *   Graph worker model on success.
+ */
+__rte_experimental
+static inline
+enum rte_graph_worker_model
+rte_graph_worker_model_get(void)
+{
+	return worker_model;
+}
+
 /**
  * Perform graph walk on the circular buffer and invoke the process function
  * of the nodes and collect the stats.
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index df33204336..507a344afd 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -86,6 +86,19 @@ struct rte_node {
 	struct rte_node *nodes[] __rte_cache_min_aligned; /**< Next nodes. */
 } __rte_cache_aligned;
 
+
+/** Graph worker models */
+enum rte_graph_worker_model {
+#define WORKER_MODEL_DEFAULT "default"
+	RTE_GRAPH_MODEL_DEFAULT = 0,
+#define WORKER_MODEL_RTC "rtc"
+	RTE_GRAPH_MODEL_RTC,
+#define WORKER_MODEL_GENERIC "generic"
+	RTE_GRAPH_MODEL_GENERIC,
+	RTE_GRAPH_MODEL_MAX,
+};
+
 /**
  * @internal
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 13b838752d..eea73ec9ca 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -43,5 +43,8 @@ EXPERIMENTAL {
 	rte_node_next_stream_put;
 	rte_node_next_stream_move;
 
+	rte_graph_worker_model_set;
+	rte_graph_worker_model_get;
+
 	local: *;
 };
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 05/13] graph: introduce core affinity API
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (3 preceding siblings ...)
  2022-11-17  5:09 ` [PATCH v1 04/13] graph: add get/set graph worker model APIs Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2023-02-20 14:05   ` Jerin Jacob
  2022-11-17  5:09 ` [PATCH v1 06/13] graph: introduce graph " Zhirun Yan
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

1. Add lcore_id to struct node to hold the affinity core id.
2. Implement rte_node_model_generic_set_lcore_affinity() to bind a node
   to one lcore.
3. Update the version map with the new public graph API.
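
The name-based lookup this patch performs can be sketched standalone (plain C, no DPDK headers or STAILQ; `demo_node_reg` and `demo_set_lcore_affinity` are illustrative stand-ins, an array replaces the node list, and the graph spinlock is omitted): find the node by name, validate the lcore id, and record the affinity, with `DEMO_MAX_LCORE` playing the role of RTE_MAX_LCORE as the "no affinity" sentinel.

```c
#include <string.h>

#define DEMO_NAMESIZE 64
#define DEMO_MAX_LCORE 128

struct demo_node_reg {
	char name[DEMO_NAMESIZE];
	unsigned int lcore_id;   /* DEMO_MAX_LCORE == no affinity set */
};

/* Look the node up by name and record the preferred lcore; reject
 * out-of-range lcore ids and unknown node names with -1 (the real API
 * returns -EINVAL). */
static int demo_set_lcore_affinity(struct demo_node_reg *nodes, int n,
				   const char *name, unsigned int lcore_id)
{
	int i;

	if (lcore_id >= DEMO_MAX_LCORE)
		return -1;
	for (i = 0; i < n; i++) {
		if (strncmp(nodes[i].name, name, DEMO_NAMESIZE) == 0) {
			nodes[i].lcore_id = lcore_id;
			return 0;
		}
	}
	return -1;
}
```

The real implementation additionally takes the graph spinlock around the list walk, since node registration may race with affinity updates.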

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_private.h           |  1 +
 lib/graph/meson.build               |  1 +
 lib/graph/node.c                    |  1 +
 lib/graph/rte_graph_model_generic.c | 31 +++++++++++++++++++++
 lib/graph/rte_graph_model_generic.h | 43 +++++++++++++++++++++++++++++
 lib/graph/version.map               |  2 ++
 6 files changed, 79 insertions(+)
 create mode 100644 lib/graph/rte_graph_model_generic.c
 create mode 100644 lib/graph/rte_graph_model_generic.h

diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index f9a85c8926..627090f802 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -49,6 +49,7 @@ struct node {
 	STAILQ_ENTRY(node) next;      /**< Next node in the list. */
 	char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
 	uint64_t flags;		      /**< Node configuration flag. */
+	unsigned int lcore_id;        /**< Lcore ID that the node runs on. */
 	rte_node_process_t process;   /**< Node process function. */
 	rte_node_init_t init;         /**< Node init function. */
 	rte_node_fini_t fini;	      /**< Node fini function. */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index c7327549e8..8c8b11ed27 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -14,6 +14,7 @@ sources = files(
         'graph_debug.c',
         'graph_stats.c',
         'graph_populate.c',
+        'rte_graph_model_generic.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
diff --git a/lib/graph/node.c b/lib/graph/node.c
index fc6345de07..8ad4b3cbeb 100644
--- a/lib/graph/node.c
+++ b/lib/graph/node.c
@@ -100,6 +100,7 @@ __rte_node_register(const struct rte_node_register *reg)
 			goto free;
 	}
 
+	node->lcore_id = RTE_MAX_LCORE;
 	node->id = node_id++;
 
 	/* Add the node at tail */
diff --git a/lib/graph/rte_graph_model_generic.c b/lib/graph/rte_graph_model_generic.c
new file mode 100644
index 0000000000..54ff659c7b
--- /dev/null
+++ b/lib/graph/rte_graph_model_generic.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "graph_private.h"
+#include "rte_graph_model_generic.h"
+
+int
+rte_node_model_generic_set_lcore_affinity(const char *name, unsigned int lcore_id)
+{
+	struct node *node;
+	int ret = -EINVAL;
+
+	if (lcore_id >= RTE_MAX_LCORE)
+		return ret;
+
+	graph_spinlock_lock();
+
+	STAILQ_FOREACH(node, node_list_head_get(), next) {
+		if (strncmp(node->name, name, RTE_NODE_NAMESIZE) == 0) {
+			node->lcore_id = lcore_id;
+			ret = 0;
+			break;
+		}
+	}
+
+	graph_spinlock_unlock();
+
+	return ret;
+}
+
diff --git a/lib/graph/rte_graph_model_generic.h b/lib/graph/rte_graph_model_generic.h
new file mode 100644
index 0000000000..20ca48a9e3
--- /dev/null
+++ b/lib/graph/rte_graph_model_generic.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _RTE_GRAPH_MODEL_GENERIC_H_
+#define _RTE_GRAPH_MODEL_GENERIC_H_
+
+/**
+ * @file rte_graph_model_generic.h
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * This API allows a worker thread to walk over a graph and nodes to create,
+ * process, enqueue and move streams of objects to the next nodes.
+ */
+#include "rte_graph_worker_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Set lcore affinity to the node.
+ *
+ * @param name
+ *   Valid node name. In the case of a cloned node, the name will be
+ *   "parent node name" + "-" + name.
+ * @param lcore_id
+ *   The lcore ID value.
+ *
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_node_model_generic_set_lcore_affinity(const char *name, unsigned int lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_MODEL_GENERIC_H_ */
diff --git a/lib/graph/version.map b/lib/graph/version.map
index eea73ec9ca..33ff055be6 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -46,5 +46,7 @@ EXPERIMENTAL {
 	rte_graph_worker_model_set;
 	rte_graph_worker_model_get;
 
+	rte_node_model_generic_set_lcore_affinity;
+
 	local: *;
 };
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 06/13] graph: introduce graph affinity API
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (4 preceding siblings ...)
  2022-11-17  5:09 ` [PATCH v1 05/13] graph: introduce core affinity API Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2023-02-20 14:07   ` Jerin Jacob
  2022-11-17  5:09 ` [PATCH v1 07/13] graph: introduce graph clone API for other worker core Zhirun Yan
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add lcore_id to struct graph to hold the affinity core id where the graph
will run. Add bind/unbind APIs to set/unset the graph affinity attribute.
lcore_id defaults to RTE_MAX_LCORE, which means the attribute is disabled.

Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 59 +++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |  2 ++
 lib/graph/rte_graph.h     | 22 +++++++++++++++
 lib/graph/version.map     |  2 ++
 4 files changed, 85 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 3a617cc369..a8d8eb633e 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -245,6 +245,64 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
 	return graph_mem_fixup_node_ctx(graph);
 }
 
+static __rte_always_inline bool
+graph_src_node_avail(struct graph *graph)
+{
+	struct graph_node *graph_node;
+
+	STAILQ_FOREACH(graph_node, &graph->node_list, next)
+		if ((graph_node->node->flags & RTE_NODE_SOURCE_F) &&
+		    (graph_node->node->lcore_id == RTE_MAX_LCORE ||
+		     graph->lcore_id == graph_node->node->lcore_id))
+			return true;
+
+	return false;
+}
+
+int
+rte_graph_bind_core(rte_graph_t id, int lcore)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	if (!rte_lcore_is_enabled(lcore))
+		SET_ERR_JMP(ENOLINK, fail,
+			    "lcore %d not enabled\n",
+			    lcore);
+
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = lcore;
+	graph->socket = rte_lcore_to_socket_id(lcore);
+
+	/* check the availability of source node */
+	if (!graph_src_node_avail(graph))
+		graph->graph->head = 0;
+
+	return 0;
+
+fail:
+	return -rte_errno;
+}
+
+void
+rte_graph_unbind_core(rte_graph_t id)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = RTE_MAX_LCORE;
+
+fail:
+	return;
+}
+
 struct rte_graph *
 rte_graph_lookup(const char *name)
 {
@@ -328,6 +386,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->lcore_id = RTE_MAX_LCORE;
 
 	/* Allocate the Graph fast path memory and populate the data */
 	if (graph_fp_mem_create(graph))
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 627090f802..7326975a86 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -97,6 +97,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	unsigned int lcore_id;
+	/**< Lcore identifier where the graph prefers to run. */
 	size_t mem_sz;
 	/**< Memory size of the graph. */
 	int socket;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index b32c4bc217..1d938f6979 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -280,6 +280,28 @@ char *rte_graph_id_to_name(rte_graph_t id);
 __rte_experimental
 int rte_graph_export(const char *name, FILE *f);
 
+/**
+ * Set the graph lcore affinity attribute.
+ *
+ * @param id
+ *   Graph id to get the pointer of the graph object.
+ * @param lcore
+ *   The lcore where the graph will run.
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_graph_bind_core(rte_graph_t id, int lcore);
+
+/**
+ * Unset the graph lcore affinity attribute.
+ *
+ * @param id
+ *   Graph id to get the pointer of the graph object.
+ */
+__rte_experimental
+void rte_graph_unbind_core(rte_graph_t id);
+
 /**
  * Get graph object from its name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 33ff055be6..1c599b5b47 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -18,6 +18,8 @@ EXPERIMENTAL {
 	rte_graph_node_get_by_name;
 	rte_graph_obj_dump;
 	rte_graph_walk;
+	rte_graph_bind_core;
+	rte_graph_unbind_core;
 
 	rte_graph_cluster_stats_create;
 	rte_graph_cluster_stats_destroy;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 07/13] graph: introduce graph clone API for other worker core
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (5 preceding siblings ...)
  2022-11-17  5:09 ` [PATCH v1 06/13] graph: introduce graph " Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2022-11-17  5:09 ` [PATCH v1 08/13] graph: introduce stream moving cross cores Zhirun Yan
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch adds a graph API to clone a graph object for a specified worker
core. The cloned graph also clones all of the parent's nodes.
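
The naming scheme used by the clone ("parent graph name" + "-" + name) can be sketched standalone (plain C; `demo_clone_name` is an illustrative stand-in for the patch's `clone_name`, and `snprintf` replaces the chained `rte_strscpy` calls): the clone fails with an error when the combined name does not fit the fixed name buffer.

```c
#include <stdio.h>

#define DEMO_GRAPH_NAMESIZE 64

/* Build "parent-suffix" into dst, failing if the result would be
 * truncated (the real implementation sets rte_errno to E2BIG). */
static int demo_clone_name(char *dst, const char *parent, const char *suffix)
{
	int n = snprintf(dst, DEMO_GRAPH_NAMESIZE, "%s-%s", parent, suffix);

	if (n < 0 || n >= DEMO_GRAPH_NAMESIZE)
		return -1;
	return 0;
}
```

Because the clone name always embeds the parent name, duplicate detection via `rte_graph_from_name()` naturally scopes clone names under their parent.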

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 110 ++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |   2 +
 lib/graph/rte_graph.h     |  20 +++++++
 lib/graph/version.map     |   1 +
 4 files changed, 133 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index a8d8eb633e..17a9c87032 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -386,6 +386,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->parent_id = RTE_GRAPH_ID_INVALID;
 	graph->lcore_id = RTE_MAX_LCORE;
 
 	/* Allocate the Graph fast path memory and populate the data */
@@ -447,6 +448,115 @@ rte_graph_destroy(rte_graph_t id)
 	return rc;
 }
 
+static int
+clone_name(struct graph *graph, struct graph *parent_graph, const char *name)
+{
+	ssize_t sz, rc;
+
+#define SZ RTE_GRAPH_NAMESIZE
+	rc = rte_strscpy(graph->name, parent_graph->name, SZ);
+	if (rc < 0)
+		goto fail;
+	sz = rc;
+	rc = rte_strscpy(graph->name + sz, "-", RTE_MAX((int16_t)(SZ - sz), 0));
+	if (rc < 0)
+		goto fail;
+	sz += rc;
+	sz = rte_strscpy(graph->name + sz, name, RTE_MAX((int16_t)(SZ - sz), 0));
+	if (sz < 0)
+		goto fail;
+
+	return 0;
+fail:
+	rte_errno = E2BIG;
+	return -rte_errno;
+}
+
+static rte_graph_t
+graph_clone(struct graph *parent_graph, const char *name)
+{
+	struct graph_node *graph_node;
+	struct graph *graph;
+
+	graph_spinlock_lock();
+
+	/* Don't allow cloning a graph from an already cloned graph */
+	if (parent_graph->parent_id != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, fail, "A cloned graph is not allowed to be cloned");
+
+	/* Create graph object */
+	graph = calloc(1, sizeof(*graph));
+	if (graph == NULL)
+		SET_ERR_JMP(ENOMEM, fail, "Failed to calloc cloned graph object");
+
+	/* Naming ceremony of the new graph. name is parent_graph->name + "-" + name */
+	if (clone_name(graph, parent_graph, name))
+		goto free;
+
+	/* Check for existence of duplicate graph */
+	if (rte_graph_from_name(graph->name) != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, free, "Found duplicate graph %s",
+			    graph->name);
+
+	/* Clone nodes from parent graph firstly */
+	STAILQ_INIT(&graph->node_list);
+	STAILQ_FOREACH(graph_node, &parent_graph->node_list, next) {
+		if (graph_node_add(graph, graph_node->node))
+			goto graph_cleanup;
+	}
+
+	/* Just update adjacency list of all nodes in the graph */
+	if (graph_adjacency_list_update(graph))
+		goto graph_cleanup;
+
+	/* Initialize the graph object */
+	graph->src_node_count = parent_graph->src_node_count;
+	graph->node_count = parent_graph->node_count;
+	graph->parent_id = parent_graph->id;
+	graph->lcore_id = parent_graph->lcore_id;
+	graph->socket = parent_graph->socket;
+	graph->id = graph_id;
+
+	/* Allocate the Graph fast path memory and populate the data */
+	if (graph_fp_mem_create(graph))
+		goto graph_cleanup;
+
+	/* Call init() of the all the nodes in the graph */
+	if (graph_node_init(graph))
+		goto graph_mem_destroy;
+
+	/* All good, Lets add the graph to the list */
+	graph_id++;
+	STAILQ_INSERT_TAIL(&graph_list, graph, next);
+
+	graph_spinlock_unlock();
+	return graph->id;
+
+graph_mem_destroy:
+	graph_fp_mem_destroy(graph);
+graph_cleanup:
+	graph_cleanup(graph);
+free:
+	free(graph);
+fail:
+	graph_spinlock_unlock();
+	return RTE_GRAPH_ID_INVALID;
+}
+
+rte_graph_t
+rte_graph_clone(rte_graph_t id, const char *name)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			return graph_clone(graph, name);
+
+fail:
+	return RTE_GRAPH_ID_INVALID;
+}
+
 rte_graph_t
 rte_graph_from_name(const char *name)
 {
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 7326975a86..c1f2aadd42 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -97,6 +97,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	rte_graph_t parent_id;
+	/**< Parent graph identifier. */
 	unsigned int lcore_id;
+	/**< Lcore identifier where the graph prefers to run. */
 	size_t mem_sz;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index 1d938f6979..210e125661 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -242,6 +242,26 @@ rte_graph_t rte_graph_create(const char *name, struct rte_graph_param *prm);
 __rte_experimental
 int rte_graph_destroy(rte_graph_t id);
 
+/**
+ * Clone Graph.
+ *
+ * Clone a graph from static graph (graph created from rte_graph_create). And
+ * all cloned graphs attached to the parent graph MUST be destroyed together
+ * for fast schedule design limitation (stop ALL graph walk firstly).
+ *
+ * @param id
+ *   Static graph id to clone from.
+ * @param name
+ *   Name of the new graph. The library prepends the parent graph name to
+ *   the user-specified name; the final graph name will be
+ *   "parent graph name" + "-" + name.
+ *
+ * @return
+ *   Valid graph id on success, RTE_GRAPH_ID_INVALID otherwise.
+ */
+__rte_experimental
+rte_graph_t rte_graph_clone(rte_graph_t id, const char *name);
+
 /**
  * Get graph id from graph name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 1c599b5b47..c4d8c2c271 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -7,6 +7,7 @@ EXPERIMENTAL {
 
 	rte_graph_create;
 	rte_graph_destroy;
+	rte_graph_clone;
 	rte_graph_dump;
 	rte_graph_export;
 	rte_graph_from_name;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 08/13] graph: introduce stream moving cross cores
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (6 preceding siblings ...)
  2022-11-17  5:09 ` [PATCH v1 07/13] graph: introduce graph clone API for other worker core Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2023-02-20 14:17   ` Jerin Jacob
  2022-11-17  5:09 ` [PATCH v1 09/13] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the key functions that allow a worker thread to
enqueue and move streams of objects to next nodes running on different
cores.

1. Add graph_sched_wq_node to hold the graph scheduling workqueue node
stream.
2. Add workqueue helper functions to create/destroy/enqueue/dequeue.
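
The dispatch unit introduced here can be sketched standalone (plain C, no DPDK headers; `demo_wq_node` and `demo_wq_fill` are illustrative stand-ins for `struct graph_sched_wq_node` and the enqueue path, and the workqueue/ring transport between cores is omitted): each unit carries the destination node offset plus a burst of object pointers.

```c
#include <stdint.h>

#define DEMO_BURST_SIZE 8

/* Mirrors struct graph_sched_wq_node: the node offset identifies the
 * destination node inside the peer graph, and objs[] carries one burst
 * of object pointers across cores. */
struct demo_wq_node {
	uint32_t node_off;
	uint16_t nb_objs;
	void *objs[DEMO_BURST_SIZE];
};

/* Pack up to DEMO_BURST_SIZE objects into one dispatch unit; returns how
 * many were consumed so callers can loop over larger streams. */
static uint16_t demo_wq_fill(struct demo_wq_node *wq, uint32_t node_off,
			     void **objs, uint16_t nb)
{
	uint16_t i;
	uint16_t n = nb > DEMO_BURST_SIZE ? DEMO_BURST_SIZE : nb;

	wq->node_off = node_off;
	wq->nb_objs = n;
	for (i = 0; i < n; i++)
		wq->objs[i] = objs[i];
	return n;
}
```

Keeping the unit a fixed burst size lets the real implementation cache-align it and pass it through a lockless ring without variable-length allocations.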

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c                   |   1 +
 lib/graph/graph_populate.c          |   1 +
 lib/graph/graph_private.h           |  39 ++++++++
 lib/graph/meson.build               |   2 +-
 lib/graph/rte_graph_model_generic.c | 145 ++++++++++++++++++++++++++++
 lib/graph/rte_graph_model_generic.h |  35 +++++++
 lib/graph/rte_graph_worker_common.h |  18 ++++
 7 files changed, 240 insertions(+), 1 deletion(-)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 17a9c87032..8ea0daaa35 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -275,6 +275,7 @@ rte_graph_bind_core(rte_graph_t id, int lcore)
 			break;
 
 	graph->lcore_id = lcore;
+	graph->graph->lcore_id = graph->lcore_id;
 	graph->socket = rte_lcore_to_socket_id(lcore);
 
 	/* check the availability of source node */
diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
index 102fd6c29b..26f9670406 100644
--- a/lib/graph/graph_populate.c
+++ b/lib/graph/graph_populate.c
@@ -84,6 +84,7 @@ graph_nodes_populate(struct graph *_graph)
 		}
 		node->id = graph_node->node->id;
 		node->parent_id = pid;
+		node->lcore_id = graph_node->node->lcore_id;
 		nb_edges = graph_node->node->nb_edges;
 		node->nb_edges = nb_edges;
 		off += sizeof(struct rte_node);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index c1f2aadd42..f58d0d1d63 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -59,6 +59,18 @@ struct node {
 	char next_nodes[][RTE_NODE_NAMESIZE]; /**< Names of next nodes. */
 };
 
+/**
+ * @internal
+ *
+ * Structure that holds the graph scheduling workqueue node stream.
+ * Used for generic worker model.
+ */
+struct graph_sched_wq_node {
+	rte_graph_off_t node_off;
+	uint16_t nb_objs;
+	void *objs[RTE_GRAPH_BURST_SIZE];
+} __rte_cache_aligned;
+
 /**
  * @internal
  *
@@ -349,4 +361,31 @@ void graph_dump(FILE *f, struct graph *g);
  */
 void node_dump(FILE *f, struct node *n);
 
+/**
+ * @internal
+ *
+ * Create the graph schedule work queue. Due to a fast-schedule design limitation,
+ * all cloned graphs attached to the parent graph MUST be destroyed together.
+ *
+ * @param _graph
+ *   The graph object
+ * @param _parent_graph
+ *   The parent graph object which holds the run-queue head.
+ *
+ * @return
+ *   - 0: Success.
+ *   - <0: Graph schedule work queue related error.
+ */
+int graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph);
+
+/**
+ * @internal
+ *
+ * Destroy the graph schedule work queue.
+ *
+ * @param _graph
+ *   The graph object
+ */
+void graph_sched_wq_destroy(struct graph *_graph);
+
 #endif /* _RTE_GRAPH_PRIVATE_H_ */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 8c8b11ed27..f93ab6fdcb 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -18,4 +18,4 @@ sources = files(
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
-deps += ['eal']
+deps += ['eal', 'mempool', 'ring']
diff --git a/lib/graph/rte_graph_model_generic.c b/lib/graph/rte_graph_model_generic.c
index 54ff659c7b..c862237432 100644
--- a/lib/graph/rte_graph_model_generic.c
+++ b/lib/graph/rte_graph_model_generic.c
@@ -5,6 +5,151 @@
 #include "graph_private.h"
 #include "rte_graph_model_generic.h"
 
+int
+graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph)
+{
+	struct rte_graph *parent_graph = _parent_graph->graph;
+	struct rte_graph *graph = _graph->graph;
+	unsigned int wq_size;
+
+	wq_size = GRAPH_SCHED_WQ_SIZE(graph->nb_nodes);
+	wq_size = rte_align32pow2(wq_size + 1);
+
+	graph->wq = rte_ring_create(graph->name, wq_size, graph->socket,
+				    RING_F_SC_DEQ);
+	if (graph->wq == NULL)
+		SET_ERR_JMP(EIO, fail, "Failed to allocate graph WQ");
+
+	graph->mp = rte_mempool_create(graph->name, wq_size,
+				       sizeof(struct graph_sched_wq_node),
+				       0, 0, NULL, NULL, NULL, NULL,
+				       graph->socket, MEMPOOL_F_SP_PUT);
+	if (graph->mp == NULL)
+		SET_ERR_JMP(EIO, fail_mp,
+			    "Failed to allocate graph WQ schedule entry");
+
+	graph->lcore_id = _graph->lcore_id;
+
+	if (parent_graph->rq == NULL) {
+		parent_graph->rq = &parent_graph->rq_head;
+		SLIST_INIT(parent_graph->rq);
+	}
+
+	graph->rq = parent_graph->rq;
+	SLIST_INSERT_HEAD(graph->rq, graph, rq_next);
+
+	return 0;
+
+fail_mp:
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+fail:
+	return -rte_errno;
+}
+
+void
+graph_sched_wq_destroy(struct graph *_graph)
+{
+	struct rte_graph *graph = _graph->graph;
+
+	if (graph == NULL)
+		return;
+
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+
+	rte_mempool_free(graph->mp);
+	graph->mp = NULL;
+}
+
+static __rte_always_inline bool
+__graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	uint16_t off = 0;
+	uint16_t size;
+
+submit_again:
+	if (rte_mempool_get(graph->mp, (void **)&wq_node) < 0)
+		goto fallback;
+
+	size = RTE_MIN(node->idx, RTE_DIM(wq_node->objs));
+	wq_node->node_off = node->off;
+	wq_node->nb_objs = size;
+	rte_memcpy(wq_node->objs, &node->objs[off], size * sizeof(void *));
+
+	while (rte_ring_mp_enqueue_bulk_elem(graph->wq, (void *)&wq_node,
+					  sizeof(wq_node), 1, NULL) == 0)
+		rte_pause();
+
+	off += size;
+	node->idx -= size;
+	if (node->idx > 0)
+		goto submit_again;
+
+	return true;
+
+fallback:
+	if (off != 0)
+		memmove(&node->objs[0], &node->objs[off],
+			node->idx * sizeof(void *));
+
+	return false;
+}
+
+bool __rte_noinline
+__rte_graph_sched_node_enqueue(struct rte_node *node,
+			       struct rte_graph_rq_head *rq)
+{
+	const unsigned int lcore_id = node->lcore_id;
+	struct rte_graph *graph;
+
+	SLIST_FOREACH(graph, rq, rq_next)
+		if (graph->lcore_id == lcore_id)
+			break;
+
+	return graph != NULL ? __graph_sched_node_enqueue(node, graph) : false;
+}
+
+void __rte_noinline
+__rte_graph_sched_wq_process(struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	struct rte_mempool *mp = graph->mp;
+	struct rte_ring *wq = graph->wq;
+	uint16_t idx, free_space;
+	struct rte_node *node;
+	unsigned int i, n;
+	struct graph_sched_wq_node *wq_nodes[32];
+
+	n = rte_ring_sc_dequeue_burst_elem(wq, wq_nodes, sizeof(wq_nodes[0]),
+					   RTE_DIM(wq_nodes), NULL);
+	if (n == 0)
+		return;
+
+	for (i = 0; i < n; i++) {
+		wq_node = wq_nodes[i];
+		node = RTE_PTR_ADD(graph, wq_node->node_off);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		idx = node->idx;
+		free_space = node->size - idx;
+
+		if (unlikely(free_space < wq_node->nb_objs))
+			__rte_node_stream_alloc_size(graph, node, node->size + wq_node->nb_objs);
+
+		memmove(&node->objs[idx], wq_node->objs, wq_node->nb_objs * sizeof(void *));
+		memset(wq_node->objs, 0, wq_node->nb_objs * sizeof(void *));
+		node->idx = idx + wq_node->nb_objs;
+
+		__rte_node_process(graph, node);
+
+		wq_node->nb_objs = 0;
+		node->idx = 0;
+	}
+
+	rte_mempool_put_bulk(mp, (void **)wq_nodes, n);
+}
+
 int
 rte_node_model_generic_set_lcore_affinity(const char *name, unsigned int lcore_id)
 {
diff --git a/lib/graph/rte_graph_model_generic.h b/lib/graph/rte_graph_model_generic.h
index 20ca48a9e3..5715fc8ffb 100644
--- a/lib/graph/rte_graph_model_generic.h
+++ b/lib/graph/rte_graph_model_generic.h
@@ -15,12 +15,47 @@
  * This API allows a worker thread to walk over a graph and nodes to create,
  * process, enqueue and move streams of objects to the next nodes.
  */
+#include <rte_errno.h>
+#include <rte_mempool.h>
+#include <rte_memzone.h>
+#include <rte_ring.h>
+
 #include "rte_graph_worker_common.h"
 
 #ifdef __cplusplus
 extern "C" {
 #endif
 
+#define GRAPH_SCHED_WQ_SIZE_MULTIPLIER  8
+#define GRAPH_SCHED_WQ_SIZE(nb_nodes)   \
+	((typeof(nb_nodes))((nb_nodes) * GRAPH_SCHED_WQ_SIZE_MULTIPLIER))
+
+/**
+ * @internal
+ *
+ * Schedule the node to the right graph's work queue.
+ *
+ * @param node
+ *   Pointer to the scheduled node object.
+ * @param rq
+ *   Pointer to the scheduled run-queue for all graphs.
+ *
+ * @return
+ *   True on success, false otherwise.
+ */
+bool __rte_graph_sched_node_enqueue(struct rte_node *node,
+				    struct rte_graph_rq_head *rq);
+
+/**
+ * @internal
+ *
+ * Process all nodes (streams) in the graph's work queue.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ */
+void __rte_noinline __rte_graph_sched_wq_process(struct rte_graph *graph);
+
 /**
  * Set lcore affinity to the node.
  *
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 507a344afd..cf38a03f44 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -28,6 +28,13 @@
 extern "C" {
 #endif
 
+/**
+ * @internal
+ *
+ * Singly-linked list head for graph schedule run-queue.
+ */
+SLIST_HEAD(rte_graph_rq_head, rte_graph);
+
 /**
  * @internal
  *
@@ -39,6 +46,15 @@ struct rte_graph {
 	uint32_t cir_mask;	     /**< Circular buffer wrap around mask. */
 	rte_node_t nb_nodes;	     /**< Number of nodes in the graph. */
 	rte_graph_off_t *cir_start;  /**< Pointer to circular buffer. */
+	/* Graph schedule */
+	struct rte_graph_rq_head *rq __rte_cache_aligned; /* The run-queue */
+	struct rte_graph_rq_head rq_head; /* The head for run-queue list */
+
+	SLIST_ENTRY(rte_graph) rq_next;   /* The next for run-queue list */
+	unsigned int lcore_id;  /**< The graph running Lcore. */
+	struct rte_ring *wq;    /**< The work-queue for pending streams. */
+	struct rte_mempool *mp; /**< The mempool for scheduling streams. */
+	/* Graph schedule area */
 	rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */
 	rte_graph_t id;	/**< Graph identifier. */
 	int socket;	/**< Socket ID where memory is allocated. */
@@ -63,6 +79,8 @@ struct rte_node {
 	char parent[RTE_NODE_NAMESIZE];	/**< Parent node name. */
 	char name[RTE_NODE_NAMESIZE];	/**< Name of the node. */
 
+	/* Fast schedule area */
+	unsigned int lcore_id __rte_cache_aligned;  /**< Node running Lcore. */
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 09/13] graph: enable create and destroy graph scheduling workqueue
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (7 preceding siblings ...)
  2022-11-17  5:09 ` [PATCH v1 08/13] graph: introduce stream moving cross cores Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2022-11-17  5:09 ` [PATCH v1 10/13] graph: introduce graph walk by cross-core dispatch Zhirun Yan
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch adds scheduling workqueue creation and destruction to the
common graph operations.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 8ea0daaa35..63d9bcffd2 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -428,6 +428,10 @@ rte_graph_destroy(rte_graph_t id)
 	while (graph != NULL) {
 		tmp = STAILQ_NEXT(graph, next);
 		if (graph->id == id) {
+			/* Destroy the schedule work queue if present */
+			if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_GENERIC)
+				graph_sched_wq_destroy(graph);
+
 			/* Call fini() of the all the nodes in the graph */
 			graph_node_fini(graph);
 			/* Destroy graph fast path memory */
@@ -522,6 +526,11 @@ graph_clone(struct graph *parent_graph, const char *name)
 	if (graph_fp_mem_create(graph))
 		goto graph_cleanup;
 
+	/* Create the graph schedule work queue */
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_GENERIC &&
+	    graph_sched_wq_create(graph, parent_graph))
+		goto graph_mem_destroy;
+
 	/* Call init() of the all the nodes in the graph */
 	if (graph_node_init(graph))
 		goto graph_mem_destroy;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 10/13] graph: introduce graph walk by cross-core dispatch
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (8 preceding siblings ...)
  2022-11-17  5:09 ` [PATCH v1 09/13] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2022-11-17  5:09 ` [PATCH v1 11/13] graph: enable graph generic scheduler model Zhirun Yan
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the task scheduler mechanism to enable dispatching
tasks to other worker cores. Currently, there is only a local work
queue for one graph to walk. We introduce a scheduler work queue on
each worker core for dispatching tasks. The walk processes the
scheduler work queue first, then handles the local work queue.
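The per-node decision made inside the generic walk can be sketched as a
small predicate (hypothetical names, not the DPDK API): an unbound node or
a node bound to the walking core runs locally; a node bound to another
core is dispatched, provided a run-queue exists to dispatch into.

```c
#include <assert.h>

#define MAX_LCORE 128 /* stands in for RTE_MAX_LCORE (the "unbound" marker) */

/* Model of the dispatch decision in rte_graph_walk_generic():
 * returns 1 to run the node on this core, 0 to enqueue it for the
 * core it is bound to. */
static int run_locally(unsigned int graph_lcore, unsigned int node_lcore,
		       int have_runqueue)
{
	/* Unbound nodes (lcore_id == MAX_LCORE) always run locally. */
	if (node_lcore == MAX_LCORE)
		return 1;
	/* Bound to this very core: run locally. */
	if (node_lcore == graph_lcore)
		return 1;
	/* Bound elsewhere: dispatch only if a run-queue is available. */
	return have_runqueue ? 0 : 1;
}
```

This also shows the fallback behavior: when dispatch is impossible (no
run-queue, or enqueue fails), the node is still processed locally rather
than dropped.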

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_generic.h | 36 +++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/lib/graph/rte_graph_model_generic.h b/lib/graph/rte_graph_model_generic.h
index 5715fc8ffb..c29fc31309 100644
--- a/lib/graph/rte_graph_model_generic.h
+++ b/lib/graph/rte_graph_model_generic.h
@@ -71,6 +71,42 @@ void __rte_noinline __rte_graph_sched_wq_process(struct rte_graph *graph);
 __rte_experimental
 int rte_node_model_generic_set_lcore_affinity(const char *name, unsigned int lcore_id);
 
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk_generic(struct rte_graph *graph)
+{
+	uint32_t head = graph->head;
+	struct rte_node *node;
+
+	if (graph->wq != NULL)
+		__rte_graph_sched_wq_process(graph);
+
+	rte_graph_walk_node(graph, head, node) {
+		/* Skip the source nodes which are not bound to the current worker */
+		if ((int32_t)head < 0 && node->lcore_id != graph->lcore_id)
+			continue;
+
+		/* Schedule the node until all task/objs are done */
+		if (node->lcore_id != RTE_MAX_LCORE &&
+		    graph->lcore_id != node->lcore_id && graph->rq != NULL &&
+		    __rte_graph_sched_node_enqueue(node, graph->rq))
+			continue;
+
+		__rte_node_process(graph, node);
+	}
+
+	graph->tail = 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 11/13] graph: enable graph generic scheduler model
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (9 preceding siblings ...)
  2022-11-17  5:09 ` [PATCH v1 10/13] graph: introduce graph walk by cross-core dispatch Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2022-11-17  5:09 ` [PATCH v1 12/13] graph: add stats for cross-core dispatching Zhirun Yan
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch enables choosing the new scheduler model.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_worker.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index a0ea0df153..dea207ca46 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -1,4 +1,5 @@
 #include "rte_graph_model_rtc.h"
+#include "rte_graph_model_generic.h"
 
 static enum rte_graph_worker_model worker_model = RTE_GRAPH_MODEL_DEFAULT;
 
@@ -64,5 +65,11 @@ __rte_experimental
 static inline void
 rte_graph_walk(struct rte_graph *graph)
 {
-	rte_graph_walk_rtc(graph);
+	int model = rte_graph_worker_model_get();
+
+	if (model == RTE_GRAPH_MODEL_DEFAULT ||
+	    model == RTE_GRAPH_MODEL_RTC)
+		rte_graph_walk_rtc(graph);
+	else if (model == RTE_GRAPH_MODEL_GENERIC)
+		rte_graph_walk_generic(graph);
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 12/13] graph: add stats for cross-core dispatching
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (10 preceding siblings ...)
  2022-11-17  5:09 ` [PATCH v1 11/13] graph: enable graph generic scheduler model Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2022-11-17  5:09 ` [PATCH v1 13/13] examples/l3fwd-graph: introduce generic worker model Zhirun Yan
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add stats for the cross-core dispatching scheduler if stats collection
is enabled.
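Since the same node exists once per cloned graph, the cluster stat is the
sum of the per-clone counters, as cluster_node_arregate_stats() does for
total_sched_objs/total_sched_fail. A minimal model (hypothetical names,
not the DPDK API):

```c
#include <assert.h>
#include <stdint.h>

/* Per-clone counters for one node, mirroring the new rte_node fields. */
struct node_stat {
	uint64_t total_sched_objs; /* objects successfully dispatched */
	uint64_t total_sched_fail; /* objects that failed to dispatch */
};

/* Sum the scheduling counters across all clones of a node. */
static void aggregate(const struct node_stat *nodes, unsigned int nb,
		      uint64_t *objs, uint64_t *fail)
{
	*objs = 0;
	*fail = 0;
	for (unsigned int i = 0; i < nb; i++) {
		*objs += nodes[i].total_sched_objs;
		*fail += nodes[i].total_sched_fail;
	}
}
```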

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_debug.c             |  6 +++
 lib/graph/graph_stats.c             | 74 +++++++++++++++++++++++++----
 lib/graph/rte_graph.h               |  2 +
 lib/graph/rte_graph_model_generic.c |  3 ++
 lib/graph/rte_graph_worker_common.h |  2 +
 5 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/lib/graph/graph_debug.c b/lib/graph/graph_debug.c
index b84412f5dd..080ba16ad9 100644
--- a/lib/graph/graph_debug.c
+++ b/lib/graph/graph_debug.c
@@ -74,6 +74,12 @@ rte_graph_obj_dump(FILE *f, struct rte_graph *g, bool all)
 		fprintf(f, "       size=%d\n", n->size);
 		fprintf(f, "       idx=%d\n", n->idx);
 		fprintf(f, "       total_objs=%" PRId64 "\n", n->total_objs);
+		if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_GENERIC) {
+			fprintf(f, "       total_sched_objs=%" PRId64 "\n",
+				n->total_sched_objs);
+			fprintf(f, "       total_sched_fail=%" PRId64 "\n",
+				n->total_sched_fail);
+		}
 		fprintf(f, "       total_calls=%" PRId64 "\n", n->total_calls);
 		for (i = 0; i < n->nb_edges; i++)
 			fprintf(f, "          edge[%d] <%s>\n", i,
diff --git a/lib/graph/graph_stats.c b/lib/graph/graph_stats.c
index c0140ba922..801fcb832d 100644
--- a/lib/graph/graph_stats.c
+++ b/lib/graph/graph_stats.c
@@ -40,13 +40,19 @@ struct rte_graph_cluster_stats {
 	struct cluster_node clusters[];
 } __rte_cache_aligned;
 
+#define boarder_model_generic()                                                              \
+	fprintf(f, "+-------------------------------+---------------+--------" \
+		   "-------+---------------+---------------+---------------+" \
+		   "---------------+---------------+-" \
+		   "----------+\n")
+
 #define boarder()                                                              \
 	fprintf(f, "+-------------------------------+---------------+--------" \
 		   "-------+---------------+---------------+---------------+-" \
 		   "----------+\n")
 
 static inline void
-print_banner(FILE *f)
+print_banner_default(FILE *f)
 {
 	boarder();
 	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s\n", "|Node", "|calls",
@@ -55,6 +61,27 @@ print_banner(FILE *f)
 	boarder();
 }
 
+static inline void
+print_banner_generic(FILE *f)
+{
+	boarder_model_generic();
+	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s%-16s%-16s\n",
+		"|Node", "|calls",
+		"|objs", "|sched objs", "|sched fail",
+		"|realloc_count", "|objs/call", "|objs/sec(10E6)",
+		"|cycles/call|");
+	boarder_model_generic();
+}
+
+static inline void
+print_banner(FILE *f)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_GENERIC)
+		print_banner_generic(f);
+	else
+		print_banner_default(f);
+}
+
 static inline void
 print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 {
@@ -76,11 +103,21 @@ print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 	objs_per_sec = ts_per_hz ? (objs - prev_objs) / ts_per_hz : 0;
 	objs_per_sec /= 1000000;
 
-	fprintf(f,
-		"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
-		"|%-15.3f|%-15.6f|%-11.4f|\n",
-		stat->name, calls, objs, stat->realloc_count, objs_per_call,
-		objs_per_sec, cycles_per_call);
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_GENERIC) {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->sched_objs,
+			stat->sched_fail, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	} else {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	}
 }
 
 static int
@@ -88,13 +125,20 @@ graph_cluster_stats_cb(bool is_first, bool is_last, void *cookie,
 		       const struct rte_graph_cluster_node_stats *stat)
 {
 	FILE *f = cookie;
+	int model;
+
+	model = rte_graph_worker_model_get();
 
 	if (unlikely(is_first))
 		print_banner(f);
 	if (stat->objs)
 		print_node(f, stat);
-	if (unlikely(is_last))
-		boarder();
+	if (unlikely(is_last)) {
+		if (model == RTE_GRAPH_MODEL_GENERIC)
+			boarder_model_generic();
+		else
+			boarder();
+	}
 
 	return 0;
 };
@@ -332,13 +376,21 @@ static inline void
 cluster_node_arregate_stats(struct cluster_node *cluster)
 {
 	uint64_t calls = 0, cycles = 0, objs = 0, realloc_count = 0;
+	uint64_t sched_objs = 0, sched_fail = 0;
 	struct rte_graph_cluster_node_stats *stat = &cluster->stat;
 	struct rte_node *node;
 	rte_node_t count;
+	int model;
 
+	model = rte_graph_worker_model_get();
 	for (count = 0; count < cluster->nb_nodes; count++) {
 		node = cluster->nodes[count];
 
+		if (model == RTE_GRAPH_MODEL_GENERIC) {
+			sched_objs += node->total_sched_objs;
+			sched_fail += node->total_sched_fail;
+		}
+
 		calls += node->total_calls;
 		objs += node->total_objs;
 		cycles += node->total_cycles;
@@ -348,6 +400,12 @@ cluster_node_arregate_stats(struct cluster_node *cluster)
 	stat->calls = calls;
 	stat->objs = objs;
 	stat->cycles = cycles;
+
+	if (model == RTE_GRAPH_MODEL_GENERIC) {
+		stat->sched_objs = sched_objs;
+		stat->sched_fail = sched_fail;
+	}
+
 	stat->ts = rte_get_timer_cycles();
 	stat->realloc_count = realloc_count;
 }
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index 210e125661..2d22ee0255 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -203,6 +203,8 @@ struct rte_graph_cluster_node_stats {
 	uint64_t prev_calls;	/**< Previous number of calls. */
 	uint64_t prev_objs;	/**< Previous number of processed objs. */
 	uint64_t prev_cycles;	/**< Previous number of cycles. */
+	uint64_t sched_objs;	/**< Number of objects scheduled. */
+	uint64_t sched_fail;	/**< Number of objects failed to schedule. */
 
 	uint64_t realloc_count; /**< Realloc count. */
 
diff --git a/lib/graph/rte_graph_model_generic.c b/lib/graph/rte_graph_model_generic.c
index c862237432..5504a65a39 100644
--- a/lib/graph/rte_graph_model_generic.c
+++ b/lib/graph/rte_graph_model_generic.c
@@ -83,6 +83,7 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		rte_pause();
 
 	off += size;
+	node->total_sched_objs += size;
 	node->idx -= size;
 	if (node->idx > 0)
 		goto submit_again;
@@ -94,6 +95,8 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		memmove(&node->objs[0], &node->objs[off],
 			node->idx * sizeof(void *));
 
+	node->total_sched_fail += node->idx;
+
 	return false;
 }
 
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index cf38a03f44..346f8337d4 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -81,6 +81,8 @@ struct rte_node {
 
 	/* Fast schedule area */
 	unsigned int lcore_id __rte_cache_aligned;  /**< Node running Lcore. */
+	uint64_t total_sched_objs; /**< Number of objects scheduled. */
+	uint64_t total_sched_fail; /**< Number of schedule failures. */
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 13/13] examples/l3fwd-graph: introduce generic worker model
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (11 preceding siblings ...)
  2022-11-17  5:09 ` [PATCH v1 12/13] graph: add stats for cross-core dispatching Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2023-02-20 14:20   ` Jerin Jacob
  2023-02-20  0:22 ` [PATCH v1 00/13] graph enhancement for multi-core dispatch Thomas Monjalon
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
  14 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add a new parameter "model" to choose the generic or RTC worker model.
In the generic model, nodes are affinitized to worker cores successively.

Note:
the current implementation supports only one RX node for the generic model.

./dpdk-l3fwd-graph  -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="generic"

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 examples/l3fwd-graph/main.c | 218 +++++++++++++++++++++++++++++-------
 1 file changed, 179 insertions(+), 39 deletions(-)

diff --git a/examples/l3fwd-graph/main.c b/examples/l3fwd-graph/main.c
index 6dcb6ee92b..c145a3e3e8 100644
--- a/examples/l3fwd-graph/main.c
+++ b/examples/l3fwd-graph/main.c
@@ -147,6 +147,19 @@ static struct ipv4_l3fwd_lpm_route ipv4_l3fwd_lpm_route_array[] = {
 	{RTE_IPV4(198, 18, 6, 0), 24, 6}, {RTE_IPV4(198, 18, 7, 0), 24, 7},
 };
 
+static int
+check_worker_model_params(void)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_GENERIC &&
+	    nb_lcore_params > 1) {
+		printf("Exceeded max number of lcore params for generic model: %hu\n",
+		       nb_lcore_params);
+		return -1;
+	}
+
+	return 0;
+}
+
 static int
 check_lcore_params(void)
 {
@@ -291,6 +304,20 @@ parse_max_pkt_len(const char *pktlen)
 	return len;
 }
 
+static int
+parse_worker_model(const char *model)
+{
+	if (strcmp(model, WORKER_MODEL_DEFAULT) == 0)
+		return RTE_GRAPH_MODEL_DEFAULT;
+	else if (strcmp(model, WORKER_MODEL_GENERIC) == 0) {
+		rte_graph_worker_model_set(RTE_GRAPH_MODEL_GENERIC);
+		return RTE_GRAPH_MODEL_GENERIC;
+	}
+	rte_exit(EXIT_FAILURE, "Invalid worker model: %s", model);
+
+	return RTE_GRAPH_MODEL_MAX;
+}
+
 static int
 parse_portmask(const char *portmask)
 {
@@ -404,6 +431,7 @@ static const char short_options[] = "p:" /* portmask */
 #define CMD_LINE_OPT_NO_NUMA	   "no-numa"
 #define CMD_LINE_OPT_MAX_PKT_LEN   "max-pkt-len"
 #define CMD_LINE_OPT_PER_PORT_POOL "per-port-pool"
+#define CMD_LINE_OPT_WORKER_MODEL  "model"
 enum {
 	/* Long options mapped to a short option */
 
@@ -416,6 +444,7 @@ enum {
 	CMD_LINE_OPT_NO_NUMA_NUM,
 	CMD_LINE_OPT_MAX_PKT_LEN_NUM,
 	CMD_LINE_OPT_PARSE_PER_PORT_POOL,
+	CMD_LINE_OPT_WORKER_MODEL_TYPE,
 };
 
 static const struct option lgopts[] = {
@@ -424,6 +453,7 @@ static const struct option lgopts[] = {
 	{CMD_LINE_OPT_NO_NUMA, 0, 0, CMD_LINE_OPT_NO_NUMA_NUM},
 	{CMD_LINE_OPT_MAX_PKT_LEN, 1, 0, CMD_LINE_OPT_MAX_PKT_LEN_NUM},
 	{CMD_LINE_OPT_PER_PORT_POOL, 0, 0, CMD_LINE_OPT_PARSE_PER_PORT_POOL},
+	{CMD_LINE_OPT_WORKER_MODEL, 1, 0, CMD_LINE_OPT_WORKER_MODEL_TYPE},
 	{NULL, 0, 0, 0},
 };
 
@@ -498,6 +528,11 @@ parse_args(int argc, char **argv)
 			per_port_pool = 1;
 			break;
 
+		case CMD_LINE_OPT_WORKER_MODEL_TYPE:
+			printf("Use new worker model: %s\n", optarg);
+			parse_worker_model(optarg);
+			break;
+
 		default:
 			print_usage(prgname);
 			return -1;
@@ -735,6 +770,140 @@ config_port_max_pkt_len(struct rte_eth_conf *conf,
 	return 0;
 }
 
+static void
+graph_config_generic(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	int worker_count = rte_lcore_count() - 1;
+	int main_lcore_id = rte_get_main_lcore();
+	int worker_lcore = main_lcore_id;
+	rte_graph_t main_graph_id = 0;
+	struct rte_node *node_tmp;
+	struct lcore_conf *qconf;
+	struct rte_graph *graph;
+	rte_graph_t graph_id;
+	rte_graph_off_t off;
+	int n_rx_node = 0;
+	rte_node_t count;
+	rte_edge_t i;
+	int ret;
+
+	for (int j = 0; j < nb_lcore_params; j++) {
+		qconf = &lcore_conf[lcore_params[j].lcore_id];
+		/* Add rx node patterns of all lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			char *node_name = qconf->rx_queue_list[i].node_name;
+
+			graph_conf.node_patterns[nb_patterns + n_rx_node + i] = node_name;
+			n_rx_node++;
+			ret = rte_node_model_generic_set_lcore_affinity(node_name,
+									lcore_params[j].lcore_id);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n", node_name,
+				       lcore_params[j].lcore_id);
+		}
+	}
+
+	graph_conf.nb_node_patterns = nb_patterns + n_rx_node;
+	graph_conf.socket_id = rte_lcore_to_socket_id(main_lcore_id);
+
+	snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+		 main_lcore_id);
+
+	/* create main graph */
+	main_graph_id = rte_graph_create(qconf->name, &graph_conf);
+	if (main_graph_id == RTE_GRAPH_ID_INVALID)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_create(): main_graph_id invalid for lcore %u\n",
+			 main_lcore_id);
+
+	qconf->graph_id = main_graph_id;
+	qconf->graph = rte_graph_lookup(qconf->name);
+	/* >8 End of graph initialization. */
+	if (!qconf->graph)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_lookup(): graph %s not found\n",
+			 qconf->name);
+
+	graph = qconf->graph;
+	rte_graph_foreach_node(count, off, graph, node_tmp) {
+		worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+
+		/* Need to set the node Lcore affinity before clone graph for each lcore */
+		if (node_tmp->lcore_id == RTE_MAX_LCORE) {
+			ret = rte_node_model_generic_set_lcore_affinity(node_tmp->name,
+									worker_lcore);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n",
+				       node_tmp->name, worker_lcore);
+		}
+	}
+
+	worker_lcore = main_lcore_id;
+	for (int i = 0; i < worker_count; i++) {
+		worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+
+		qconf = &lcore_conf[worker_lcore];
+		snprintf(qconf->name, sizeof(qconf->name), "cloned-%u", worker_lcore);
+		graph_id = rte_graph_clone(main_graph_id, qconf->name);
+		ret = rte_graph_bind_core(graph_id, worker_lcore);
+		if (ret == 0)
+			printf("bind graph %d to lcore %u\n", graph_id, worker_lcore);
+
+		/* full cloned graph name */
+		snprintf(qconf->name, sizeof(qconf->name), "%s",
+			 rte_graph_id_to_name(graph_id));
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "Failed to lookup graph %s\n",
+				 qconf->name);
+		continue;
+	}
+}
+
+static void
+graph_config_rtc(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	struct lcore_conf *qconf;
+	rte_graph_t graph_id;
+	uint32_t lcore_id;
+	rte_edge_t i;
+
+	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+		if (rte_lcore_is_enabled(lcore_id) == 0)
+			continue;
+
+		qconf = &lcore_conf[lcore_id];
+		/* Skip graph creation if no source exists */
+		if (!qconf->n_rx_queue)
+			continue;
+		/* Add rx node patterns of this lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			graph_conf.node_patterns[nb_patterns + i] =
+				qconf->rx_queue_list[i].node_name;
+		}
+		graph_conf.nb_node_patterns = nb_patterns + i;
+		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
+		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+			 lcore_id);
+		graph_id = rte_graph_create(qconf->name, &graph_conf);
+		if (graph_id == RTE_GRAPH_ID_INVALID)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_create(): graph_id invalid for lcore %u\n",
+				 lcore_id);
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		/* >8 End of graph initialization. */
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_lookup(): graph %s not found\n",
+				 qconf->name);
+	}
+}
+
 int
 main(int argc, char **argv)
 {
@@ -759,6 +928,7 @@ main(int argc, char **argv)
 	uint16_t nb_patterns;
 	uint8_t rewrite_len;
 	uint32_t lcore_id;
+	uint16_t model;
 	int ret;
 
 	/* Init EAL */
@@ -787,6 +957,9 @@ main(int argc, char **argv)
 	if (check_lcore_params() < 0)
 		rte_exit(EXIT_FAILURE, "check_lcore_params() failed\n");
 
+	if (check_worker_model_params() < 0)
+		rte_exit(EXIT_FAILURE, "check_worker_model_params() failed\n");
+
 	ret = init_lcore_rx_queues();
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "init_lcore_rx_queues() failed\n");
@@ -1026,46 +1199,13 @@ main(int argc, char **argv)
 
 	memset(&graph_conf, 0, sizeof(graph_conf));
 	graph_conf.node_patterns = node_patterns;
+	graph_conf.nb_node_patterns = nb_patterns;
 
-	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
-		rte_graph_t graph_id;
-		rte_edge_t i;
-
-		if (rte_lcore_is_enabled(lcore_id) == 0)
-			continue;
-
-		qconf = &lcore_conf[lcore_id];
-
-		/* Skip graph creation if no source exists */
-		if (!qconf->n_rx_queue)
-			continue;
-
-		/* Add rx node patterns of this lcore */
-		for (i = 0; i < qconf->n_rx_queue; i++) {
-			graph_conf.node_patterns[nb_patterns + i] =
-				qconf->rx_queue_list[i].node_name;
-		}
-
-		graph_conf.nb_node_patterns = nb_patterns + i;
-		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
-
-		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
-			 lcore_id);
-
-		graph_id = rte_graph_create(qconf->name, &graph_conf);
-		if (graph_id == RTE_GRAPH_ID_INVALID)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_create(): graph_id invalid"
-				 " for lcore %u\n", lcore_id);
-
-		qconf->graph_id = graph_id;
-		qconf->graph = rte_graph_lookup(qconf->name);
-		/* >8 End of graph initialization. */
-		if (!qconf->graph)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_lookup(): graph %s not found\n",
-				 qconf->name);
-	}
+	model = rte_graph_worker_model_get();
+	if (model == RTE_GRAPH_MODEL_DEFAULT)
+		graph_config_rtc(graph_conf);
+	else if (model == RTE_GRAPH_MODEL_GENERIC)
+		graph_config_generic(graph_conf);
 
 	memset(&rewrite_data, 0, sizeof(rewrite_data));
 	rewrite_len = sizeof(rewrite_data);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [EXT] [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2022-11-17  5:09 ` [PATCH v1 04/13] graph: add get/set graph worker model APIs Zhirun Yan
@ 2022-12-06  3:35   ` Kiran Kumar Kokkilagadda
  2022-12-08  7:26     ` Yan, Zhirun
  2023-02-20 13:50   ` Jerin Jacob
  1 sibling, 1 reply; 369+ messages in thread
From: Kiran Kumar Kokkilagadda @ 2022-12-06  3:35 UTC (permalink / raw)
  To: Zhirun Yan, dev, Jerin Jacob Kollanukkaran, Nithin Kumar Dabilpuram
  Cc: cunming.liang, haiyue.wang



> -----Original Message-----
> From: Zhirun Yan <zhirun.yan@intel.com>
> Sent: 17 November 2022 10:39 AM
> To: dev@dpdk.org; Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Kiran
> Kumar Kokkilagadda <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>
> Cc: cunming.liang@intel.com; haiyue.wang@intel.com; Zhirun Yan
> <zhirun.yan@intel.com>
> Subject: [EXT] [PATCH v1 04/13] graph: add get/set graph worker model APIs
> 
> External Email
> 
> ----------------------------------------------------------------------
> Add new get/set APIs to configure graph worker model which is used to
> determine which model will be chosen.
> 
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  lib/graph/rte_graph_worker.h        | 51 +++++++++++++++++++++++++++++
>  lib/graph/rte_graph_worker_common.h | 13 ++++++++
>  lib/graph/version.map               |  3 ++
>  3 files changed, 67 insertions(+)
> 
> diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h index
> 54d1390786..a0ea0df153 100644
> --- a/lib/graph/rte_graph_worker.h
> +++ b/lib/graph/rte_graph_worker.h
> @@ -1,5 +1,56 @@
>  #include "rte_graph_model_rtc.h"
> 
> +static enum rte_graph_worker_model worker_model =
> +RTE_GRAPH_MODEL_DEFAULT;
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + * Set the graph worker model
> + *
> + * @note This function does not perform any locking, and is only safe to call
> + *    before the graph starts running.
> + *
> + * @param model
> + *   The graph worker model to set.
> + *
> + * @return
> + *   0 on success, -1 otherwise.
> + */
> +__rte_experimental
> +static inline int
> +rte_graph_worker_model_set(enum rte_graph_worker_model model) {
> +	if (model >= RTE_GRAPH_MODEL_MAX)
> +		goto fail;
> +
> +	worker_model = model;
> +	return 0;
> +
> +fail:
> +	worker_model = RTE_GRAPH_MODEL_DEFAULT;
> +	return -1;
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * Get the graph worker model
> + *
> + * @return
> + *   Graph worker model on success.
> + */
> +__rte_experimental
> +static inline
> +enum rte_graph_worker_model
> +rte_graph_worker_model_get(void)
> +{
> +	return worker_model;
> +}
> +
>  /**
>   * Perform graph walk on the circular buffer and invoke the process function
>   * of the nodes and collect the stats.
> diff --git a/lib/graph/rte_graph_worker_common.h
> b/lib/graph/rte_graph_worker_common.h
> index df33204336..507a344afd 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -86,6 +86,19 @@ struct rte_node {
>  	struct rte_node *nodes[] __rte_cache_min_aligned; /**< Next nodes.
> */  } __rte_cache_aligned;
> 
> +
> +
> +/** Graph worker models */
> +enum rte_graph_worker_model {
> +#define WORKER_MODEL_DEFAULT "default"
> +	RTE_GRAPH_MODEL_DEFAULT = 0,
> +#define WORKER_MODEL_RTC "rtc"
> +	RTE_GRAPH_MODEL_RTC,

Since the default is RTC, do we need one more enum for RTC? Can we just have default and generic and remove rtc?

> +#define WORKER_MODEL_GENERIC "generic"
> +	RTE_GRAPH_MODEL_GENERIC,
> +	RTE_GRAPH_MODEL_MAX,
> +};
> +
>  /**
>   * @internal
>   *
> diff --git a/lib/graph/version.map b/lib/graph/version.map index
> 13b838752d..eea73ec9ca 100644
> --- a/lib/graph/version.map
> +++ b/lib/graph/version.map
> @@ -43,5 +43,8 @@ EXPERIMENTAL {
>  	rte_node_next_stream_put;
>  	rte_node_next_stream_move;
> 
> +	rte_graph_worker_model_set;
> +	rte_graph_worker_model_get;
> +
>  	local: *;
>  };
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [EXT] [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2022-12-06  3:35   ` [EXT] " Kiran Kumar Kokkilagadda
@ 2022-12-08  7:26     ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2022-12-08  7:26 UTC (permalink / raw)
  To: Kiran Kumar Kokkilagadda, dev, Jerin Jacob Kollanukkaran,
	Nithin Kumar Dabilpuram
  Cc: Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Kiran Kumar Kokkilagadda <kirankumark@marvell.com>
> Sent: Tuesday, December 6, 2022 11:35 AM
> To: Yan, Zhirun <zhirun.yan@intel.com>; dev@dpdk.org; Jerin Jacob
> Kollanukkaran <jerinj@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>
> Cc: Liang, Cunming <cunming.liang@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: RE: [EXT] [PATCH v1 04/13] graph: add get/set graph worker model
> APIs
> 
> 
> 
> > -----Original Message-----
> > From: Zhirun Yan <zhirun.yan@intel.com>
> > Sent: 17 November 2022 10:39 AM
> > To: dev@dpdk.org; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> > Kiran Kumar Kokkilagadda <kirankumark@marvell.com>; Nithin Kumar
> > Dabilpuram <ndabilpuram@marvell.com>
> > Cc: cunming.liang@intel.com; haiyue.wang@intel.com; Zhirun Yan
> > <zhirun.yan@intel.com>
> > Subject: [EXT] [PATCH v1 04/13] graph: add get/set graph worker model
> > APIs
> >
> > External Email
> >
> > ----------------------------------------------------------------------
> > Add new get/set APIs to configure graph worker model which is used to
> > determine which model will be chosen.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  lib/graph/rte_graph_worker.h        | 51 +++++++++++++++++++++++++++++
> >  lib/graph/rte_graph_worker_common.h | 13 ++++++++
> >  lib/graph/version.map               |  3 ++
> >  3 files changed, 67 insertions(+)
> >
> > diff --git a/lib/graph/rte_graph_worker.h
> > b/lib/graph/rte_graph_worker.h index
> > 54d1390786..a0ea0df153 100644
> > --- a/lib/graph/rte_graph_worker.h
> > +++ b/lib/graph/rte_graph_worker.h
> > @@ -1,5 +1,56 @@
> >  #include "rte_graph_model_rtc.h"
> >
> > +static enum rte_graph_worker_model worker_model =
> > +RTE_GRAPH_MODEL_DEFAULT;
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> > +notice
> > + * Set the graph worker model
> > + *
> > + * @note This function does not perform any locking, and is only safe to call
> > + *    before the graph starts running.
> > + *
> > + * @param model
> > + *   The graph worker model to set.
> > + *
> > + * @return
> > + *   0 on success, -1 otherwise.
> > + */
> > +__rte_experimental
> > +static inline int
> > +rte_graph_worker_model_set(enum rte_graph_worker_model model) {
> > +	if (model >= RTE_GRAPH_MODEL_MAX)
> > +		goto fail;
> > +
> > +	worker_model = model;
> > +	return 0;
> > +
> > +fail:
> > +	worker_model = RTE_GRAPH_MODEL_DEFAULT;
> > +	return -1;
> > +}
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> > +notice
> > + *
> > + * Get the graph worker model
> > + *
> > + * @return
> > + *   Graph worker model on success.
> > + */
> > +__rte_experimental
> > +static inline
> > +enum rte_graph_worker_model
> > +rte_graph_worker_model_get(void)
> > +{
> > +	return worker_model;
> > +}
> > +
> >  /**
> >   * Perform graph walk on the circular buffer and invoke the process
> function
> >   * of the nodes and collect the stats.
> > diff --git a/lib/graph/rte_graph_worker_common.h
> > b/lib/graph/rte_graph_worker_common.h
> > index df33204336..507a344afd 100644
> > --- a/lib/graph/rte_graph_worker_common.h
> > +++ b/lib/graph/rte_graph_worker_common.h
> > @@ -86,6 +86,19 @@ struct rte_node {
> >  	struct rte_node *nodes[] __rte_cache_min_aligned; /**< Next nodes.
> > */  } __rte_cache_aligned;
> >
> > +
> > +
> > +/** Graph worker models */
> > +enum rte_graph_worker_model {
> > +#define WORKER_MODEL_DEFAULT "default"
> > +	RTE_GRAPH_MODEL_DEFAULT = 0,
> > +#define WORKER_MODEL_RTC "rtc"
> > +	RTE_GRAPH_MODEL_RTC,
> 
> Since the default is RTC, do we need one more enum for RTC? Can we just have
> default and generic and remove rtc?
> 

Thanks for your comments.

Actually, there are two kinds of users: professional and normal.
For professional users, if the app chooses RTC or GENERIC, it means there is
a specific requirement for the worker model.
The default is for normal users who do not care about the model.

Also, if more worker models are added in the future, RTC will describe this
model more clearly than 'default' does.


> > +#define WORKER_MODEL_GENERIC "generic"
> > +	RTE_GRAPH_MODEL_GENERIC,
> > +	RTE_GRAPH_MODEL_MAX,
> > +};
> > +
> >  /**
> >   * @internal
> >   *
> > diff --git a/lib/graph/version.map b/lib/graph/version.map index
> > 13b838752d..eea73ec9ca 100644
> > --- a/lib/graph/version.map
> > +++ b/lib/graph/version.map
> > @@ -43,5 +43,8 @@ EXPERIMENTAL {
> >  	rte_node_next_stream_put;
> >  	rte_node_next_stream_move;
> >
> > +	rte_graph_worker_model_set;
> > +	rte_graph_worker_model_get;
> > +
> >  	local: *;
> >  };
> > --
> > 2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 00/13] graph enhancement for multi-core dispatch
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (12 preceding siblings ...)
  2022-11-17  5:09 ` [PATCH v1 13/13] examples/l3fwd-graph: introduce generic worker model Zhirun Yan
@ 2023-02-20  0:22 ` Thomas Monjalon
  2023-02-20  8:28   ` Yan, Zhirun
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
  14 siblings, 1 reply; 369+ messages in thread
From: Thomas Monjalon @ 2023-02-20  0:22 UTC (permalink / raw)
  To: jerinj, kirankumark, ndabilpuram
  Cc: dev, cunming.liang, haiyue.wang, Zhirun Yan, Zhirun Yan

This series doesn't look reviewed.
What is the status?

17/11/2022 06:09, Zhirun Yan:
> Currently, rte_graph supports the RTC (Run-To-Completion) model, where a
> whole graph runs within a single core.
> RTC is one of the typical models of packet processing. Others, like
> Pipeline or Hybrid, lack support.
> 
> The patch set introduces a 'generic' model selection, which is a
> self-reacting scheme according to the core affinity.
> The new model enables a cross-core dispatching mechanism which employs a
> scheduling work-queue to dispatch streams to other worker cores that are
> associated with the destination node. When the core flavor of the
> destination node is the default 'current', the stream continues to be
> executed as normal.
> 
> Example:
> 3-node graph targets 3-core budget
> 
> Generic Model
> RTC:
> Config Graph-A: node-0->current; node-1->current; node-2->current;
> Graph-A':node-0/1/2 @0, Graph-A':node-0/1/2 @1, Graph-A':node-0/1/2 @2
> 
> + - - - - - - - - - - - - - - - - - - - - - +
> '                Core #0/1/2                '
> '                                           '
> ' +--------+     +---------+     +--------+ '
> ' | Node-0 | --> | Node-1  | --> | Node-2 | '
> ' +--------+     +---------+     +--------+ '
> '                                           '
> + - - - - - - - - - - - - - - - - - - - - - +
> 
> Pipeline:
> Config Graph-A: node-0->0; node-1->1; node-2->2;
> Graph-A':node-0 @0, Graph-A':node-1 @1, Graph-A':node-2 @2
> 
> + - - - - - -+     +- - - - - - +     + - - - - - -+
> '  Core #0   '     '  Core #1   '     '  Core #2   '
> '            '     '            '     '            '
> ' +--------+ '     ' +--------+ '     ' +--------+ '
> ' | Node-0 | ' --> ' | Node-1 | ' --> ' | Node-2 | '
> ' +--------+ '     ' +--------+ '     ' +--------+ '
> '            '     '            '     '            '
> + - - - - - -+     +- - - - - - +     + - - - - - -+
> 
> Hybrid:
> Config Graph-A: node-0->current; node-1->current; node-2->2;
> Graph-A':node-0/1 @0, Graph-A':node-0/1 @1, Graph-A':node-2 @2
> 
> + - - - - - - - - - - - - - - - +     + - - - - - -+
> '            Core #0            '     '  Core #2   '
> '                               '     '            '
> ' +--------+         +--------+ '     ' +--------+ '
> ' | Node-0 | ------> | Node-1 | ' --> ' | Node-2 | '
> ' +--------+         +--------+ '     ' +--------+ '
> '                               '     '            '
> + - - - - - - - - - - - - - - - +     + - - - - - -+
>                                           ^
>                                           |
>                                           |
> + - - - - - - - - - - - - - - - +         |
> '            Core #1            '         |
> '                               '         |
> ' +--------+         +--------+ '         |
> ' | Node-0 | ------> | Node-1 | ' --------+
> ' +--------+         +--------+ '
> '                               '
> + - - - - - - - - - - - - - - - +
> 
> 
> The patch set has been broken down as below:
> 
> 1. Split graph worker into common and default model part.
> 2. Inline graph node processing and graph circular buffer walking to make
>   it reusable.
> 3. Add set/get APIs to choose worker model.
> 4. Introduce core affinity API to set the node run on specific worker core.
>   (only use in new model)
> 5. Introduce graph affinity API to bind one graph with specific worker
>   core.
> 6. Introduce graph clone API.
> 7. Introduce stream moving with scheduler work-queue in patch 8,9,10.
> 8. Add stats for new models.
> 9. Abstract default graph config process and integrate new model into
>   example/l3fwd-graph. Add new parameters for model choosing.
> 
> We could run with new worker model by this:
> ./dpdk-l3fwd-graph -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
> --model="generic"
> 
> References:
> https://static.sched.com/hosted_files/dpdkuserspace22/a6/graph%20introduce%20remote%20dispatch%20for%20mult-core%20scaling.pdf
> 
> Zhirun Yan (13):
>   graph: split graph worker into common and default model
>   graph: move node process into inline function
>   graph: add macro to walk on graph circular buffer
>   graph: add get/set graph worker model APIs
>   graph: introduce core affinity API
>   graph: introduce graph affinity API
>   graph: introduce graph clone API for other worker core
>   graph: introduce stream moving cross cores
>   graph: enable create and destroy graph scheduling workqueue
>   graph: introduce graph walk by cross-core dispatch
>   graph: enable graph generic scheduler model
>   graph: add stats for cross-core dispatching
>   examples/l3fwd-graph: introduce generic worker model
> 
>  examples/l3fwd-graph/main.c         | 218 +++++++++--
>  lib/graph/graph.c                   | 179 +++++++++
>  lib/graph/graph_debug.c             |   6 +
>  lib/graph/graph_populate.c          |   1 +
>  lib/graph/graph_private.h           |  44 +++
>  lib/graph/graph_stats.c             |  74 +++-
>  lib/graph/meson.build               |   3 +-
>  lib/graph/node.c                    |   1 +
>  lib/graph/rte_graph.h               |  44 +++
>  lib/graph/rte_graph_model_generic.c | 179 +++++++++
>  lib/graph/rte_graph_model_generic.h | 114 ++++++
>  lib/graph/rte_graph_model_rtc.h     |  22 ++
>  lib/graph/rte_graph_worker.h        | 516 ++------------------------
>  lib/graph/rte_graph_worker_common.h | 545 ++++++++++++++++++++++++++++
>  lib/graph/version.map               |   8 +
>  15 files changed, 1430 insertions(+), 524 deletions(-)
>  create mode 100644 lib/graph/rte_graph_model_generic.c
>  create mode 100644 lib/graph/rte_graph_model_generic.h
>  create mode 100644 lib/graph/rte_graph_model_rtc.h
>  create mode 100644 lib/graph/rte_graph_worker_common.h
> 
> 






^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [PATCH v1 00/13] graph enhancement for multi-core dispatch
  2023-02-20  0:22 ` [PATCH v1 00/13] graph enhancement for multi-core dispatch Thomas Monjalon
@ 2023-02-20  8:28   ` Yan, Zhirun
  2023-02-20  9:33     ` Jerin Jacob
  0 siblings, 1 reply; 369+ messages in thread
From: Yan, Zhirun @ 2023-02-20  8:28 UTC (permalink / raw)
  To: Thomas Monjalon, jerinj, kirankumark, ndabilpuram
  Cc: dev, Liang, Cunming, Wang, Haiyue

Hi Thomas,

Jerin and Kiran gave some comments earlier.
@jerinj@marvell.com, @kirankumark@marvell.com,
could you help to review it?
Thanks.

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, February 20, 2023 8:22 AM
> To: jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com
> Cc: dev@dpdk.org; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>; Yan, Zhirun <zhirun.yan@intel.com>; Yan,
> Zhirun <zhirun.yan@intel.com>
> Subject: Re: [PATCH v1 00/13] graph enhancement for multi-core dispatch
> 
> This series doesn't look reviewed.
> What is the status?
> 


^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 00/13] graph enhancement for multi-core dispatch
  2023-02-20  8:28   ` Yan, Zhirun
@ 2023-02-20  9:33     ` Jerin Jacob
  0 siblings, 0 replies; 369+ messages in thread
From: Jerin Jacob @ 2023-02-20  9:33 UTC (permalink / raw)
  To: Yan, Zhirun
  Cc: Thomas Monjalon, jerinj, kirankumark, ndabilpuram, dev, Liang,
	Cunming, Wang, Haiyue

On Mon, Feb 20, 2023 at 1:58 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
>
> Hi Thomas,
>
> Jerin and Kiran gave some comments before.
> And @jerinj@marvell.com @kirankumark@marvell.com
> could you help to review it?


Sure. I will do the next level of review.


> Thanks.
>
> > -----Original Message-----
> > From: Thomas Monjalon <thomas@monjalon.net>
> > Sent: Monday, February 20, 2023 8:22 AM
> > To: jerinj@marvell.com; kirankumark@marvell.com;
> > ndabilpuram@marvell.com
> > Cc: dev@dpdk.org; Liang, Cunming <cunming.liang@intel.com>; Wang,
> > Haiyue <haiyue.wang@intel.com>; Yan, Zhirun <zhirun.yan@intel.com>; Yan,
> > Zhirun <zhirun.yan@intel.com>
> > Subject: Re: [PATCH v1 00/13] graph enhancement for multi-core dispatch
> >
> > This series doesn't look reviewed.
> > What is the status?
> >
> > 17/11/2022 06:09, Zhirun Yan:
> > > Currently, rte_graph supports RTC (Run-To-Completion) model within
> > > each of a single core.
> > > RTC is one of the typical model of packet processing. Others like
> > > Pipeline or Hybrid are lack of support.
> > >
> > > The patch set introduces a 'generic' model selection which is a
> > > self-reacting scheme according to the core affinity.
> > > The new model enables a cross-core dispatching mechanism which
> > employs
> > > a scheduling work-queue to dispatch streams to other worker cores
> > > which being associated with the destination node. When core flavor of
> > > the destination node is a default 'current', the stream can be
> > > continue executed as normal.
> > >
> > > Example:
> > > 3-node graph targets 3-core budget
> > >
> > > Generic Model
> > > RTC:
> > > Config Graph-A: node-0->current; node-1->current; node-2->current;
> > > Graph-A':node-0/1/2 @0, Graph-A':node-0/1/2 @1, Graph-A':node-0/1/2
> > @2
> > >
> > > + - - - - - - - - - - - - - - - - - - - - - +
> > > '                Core #0/1/2                '
> > > '                                           '
> > > ' +--------+     +---------+     +--------+ '
> > > ' | Node-0 | --> | Node-1  | --> | Node-2 | '
> > > ' +--------+     +---------+     +--------+ '
> > > '                                           '
> > > + - - - - - - - - - - - - - - - - - - - - - +
> > >
> > > Pipeline:
> > > Config Graph-A: node-0->0; node-1->1; node-2->2;
> > > Graph-A':node-0 @0, Graph-A':node-1 @1, Graph-A':node-2 @2
> > >
> > > + - - - - - -+     +- - - - - - +     + - - - - - -+
> > > '  Core #0   '     '  Core #1   '     '  Core #2   '
> > > '            '     '            '     '            '
> > > ' +--------+ '     ' +--------+ '     ' +--------+ '
> > > ' | Node-0 | ' --> ' | Node-1 | ' --> ' | Node-2 | '
> > > ' +--------+ '     ' +--------+ '     ' +--------+ '
> > > '            '     '            '     '            '
> > > + - - - - - -+     +- - - - - - +     + - - - - - -+
> > >
> > > Hybrid:
> > > Config Graph-A: node-0->current; node-1->current; node-2->2;
> > > Graph-A':node-0/1 @0, Graph-A':node-0/1 @1, Graph-A':node-2 @2
> > >
> > > + - - - - - - - - - - - - - - - +     + - - - - - -+
> > > '            Core #0            '     '  Core #2   '
> > > '                               '     '            '
> > > ' +--------+         +--------+ '     ' +--------+ '
> > > ' | Node-0 | ------> | Node-1 | ' --> ' | Node-2 | '
> > > ' +--------+         +--------+ '     ' +--------+ '
> > > '                               '     '            '
> > > + - - - - - - - - - - - - - - - +     + - - - - - -+
> > >                                           ^
> > >                                           |
> > >                                           |
> > > + - - - - - - - - - - - - - - - +         |
> > > '            Core #1            '         |
> > > '                               '         |
> > > ' +--------+         +--------+ '         |
> > > ' | Node-0 | ------> | Node-1 | ' --------+
> > > ' +--------+         +--------+ '
> > > '                               '
> > > + - - - - - - - - - - - - - - - +
> > >
> > >
> > > The patch set has been break down as below:
> > >
> > > 1. Split graph worker into common and default model part.
> > > 2. Inline graph node processing and graph circular buffer walking to make
> > >   it reusable.
> > > 3. Add set/get APIs to choose worker model.
> > > 4. Introduce core affinity API to set the node run on specific worker core.
> > >   (only use in new model)
> > > 5. Introduce graph affinity API to bind one graph with specific worker
> > >   core.
> > > 6. Introduce graph clone API.
> > > 7. Introduce stream moving with scheduler work-queue in patch 8,9,10.
> > > 8. Add stats for new models.
> > > 9. Abstract default graph config process and integrate new model into
> > >   example/l3fwd-graph. Add new parameters for model choosing.
> > >
> > > We could run with new worker model by this:
> > > ./dpdk-l3fwd-graph -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
> > > --model="generic"
> > >
> > > References:
> > > https://static.sched.com/hosted_files/dpdkuserspace22/a6/graph%20intro
> > > duce%20remote%20dispatch%20for%20mult-core%20scaling.pdf
> > >
> > > Zhirun Yan (13):
> > >   graph: split graph worker into common and default model
> > >   graph: move node process into inline function
> > >   graph: add macro to walk on graph circular buffer
> > >   graph: add get/set graph worker model APIs
> > >   graph: introduce core affinity API
> > >   graph: introduce graph affinity API
> > >   graph: introduce graph clone API for other worker core
> > >   graph: introduce stream moving cross cores
> > >   graph: enable create and destroy graph scheduling workqueue
> > >   graph: introduce graph walk by cross-core dispatch
> > >   graph: enable graph generic scheduler model
> > >   graph: add stats for cross-core dispatching
> > >   examples/l3fwd-graph: introduce generic worker model
> > >
> > >  examples/l3fwd-graph/main.c         | 218 +++++++++--
> > >  lib/graph/graph.c                   | 179 +++++++++
> > >  lib/graph/graph_debug.c             |   6 +
> > >  lib/graph/graph_populate.c          |   1 +
> > >  lib/graph/graph_private.h           |  44 +++
> > >  lib/graph/graph_stats.c             |  74 +++-
> > >  lib/graph/meson.build               |   3 +-
> > >  lib/graph/node.c                    |   1 +
> > >  lib/graph/rte_graph.h               |  44 +++
> > >  lib/graph/rte_graph_model_generic.c | 179 +++++++++
> > >  lib/graph/rte_graph_model_generic.h | 114 ++++++
> > >  lib/graph/rte_graph_model_rtc.h     |  22 ++
> > >  lib/graph/rte_graph_worker.h        | 516 ++------------------------
> > >  lib/graph/rte_graph_worker_common.h | 545 ++++++++++++++++++++++++++++
> > >  lib/graph/version.map               |   8 +
> > >  15 files changed, 1430 insertions(+), 524 deletions(-)
> > >  create mode 100644 lib/graph/rte_graph_model_generic.c
> > >  create mode 100644 lib/graph/rte_graph_model_generic.h
> > >  create mode 100644 lib/graph/rte_graph_model_rtc.h
> > >  create mode 100644 lib/graph/rte_graph_worker_common.h
> > >
> > >
> >
> >
> >
> >
>

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 01/13] graph: split graph worker into common and default model
  2022-11-17  5:09 ` [PATCH v1 01/13] graph: split graph worker into common and default model Zhirun Yan
@ 2023-02-20 13:38   ` Jerin Jacob
  2023-02-24  6:29     ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Jerin Jacob @ 2023-02-20 13:38 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Thu, Nov 17, 2022 at 10:39 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> To support multiple graph worker models, split the graph worker into
> common and default parts. Name the current walk function
> rte_graph_model_rtc, since the default model is RTC (Run-To-Completion).

There are CI issues with this series. Please check
https://patches.dpdk.org/project/dpdk/patch/20221117050926.136974-2-zhirun.yan@intel.com/
# Please make sure each patch builds with devtools/test-meson-builds.sh
# Please make sure each patch doesn't have any issue with the app/test/test_graph.c test
# Please make sure this series doesn't have perf issues with app/test/test_graph_perf.c
# Please make sure both RTC and the new model run with l3fwd_graph without any performance regression
# Please introduce the model concept in the documentation at
doc/guides/prog_guide/graph_lib.rst, with details for this generic model.

Also, update the MAINTAINERS file for the new model files.

>
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  lib/graph/rte_graph_model_rtc.h     |  57 ++++
>  lib/graph/rte_graph_worker.h        | 498 +---------------------------
>  lib/graph/rte_graph_worker_common.h | 456 +++++++++++++++++++++++++


Use git mv to avoid losing history and to reduce the diff.
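A sketch of that rename flow in a throwaway repository (paths mirror the series, file contents are placeholders; only the `git mv` step is the point). After the split, `git log --follow` still reports both commits for the renamed file:

```shell
# Throwaway repo just to show the effect of `git mv` vs delete+add.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email dev@example.com
git config user.name dev
mkdir -p lib/graph
echo '/* worker inlines */' > lib/graph/rte_graph_worker.h
git add -A && git commit -qm "graph: initial worker header"
# record a rename instead of a delete+add so history follows the file
git mv lib/graph/rte_graph_worker.h lib/graph/rte_graph_worker_common.h
git commit -qm "graph: split graph worker into common and default model"
git log --follow --oneline -- lib/graph/rte_graph_worker_common.h
```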

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 02/13] graph: move node process into inline function
  2022-11-17  5:09 ` [PATCH v1 02/13] graph: move node process into inline function Zhirun Yan
@ 2023-02-20 13:39   ` Jerin Jacob
  0 siblings, 0 replies; 369+ messages in thread
From: Jerin Jacob @ 2023-02-20 13:39 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> Node processing is a single, reusable block; move the code into an
> inline function.
>
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>

Acked-by: Jerin Jacob <jerinj@marvell.com>


> ---
>  lib/graph/rte_graph_model_rtc.h     | 18 +---------------
>  lib/graph/rte_graph_worker_common.h | 33 +++++++++++++++++++++++++++++
>  2 files changed, 34 insertions(+), 17 deletions(-)
>
> diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
> index fb58730bde..c80b0ce962 100644
> --- a/lib/graph/rte_graph_model_rtc.h
> +++ b/lib/graph/rte_graph_model_rtc.h
> @@ -16,9 +16,6 @@ rte_graph_walk_rtc(struct rte_graph *graph)
>         const rte_node_t mask = graph->cir_mask;
>         uint32_t head = graph->head;
>         struct rte_node *node;
> -       uint64_t start;
> -       uint16_t rc;
> -       void **objs;
>
>         /*
>          * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
> @@ -37,20 +34,7 @@ rte_graph_walk_rtc(struct rte_graph *graph)
>          */
>         while (likely(head != graph->tail)) {
>                 node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
> -               RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> -               objs = node->objs;
> -               rte_prefetch0(objs);
> -
> -               if (rte_graph_has_stats_feature()) {
> -                       start = rte_rdtsc();
> -                       rc = node->process(graph, node, objs, node->idx);
> -                       node->total_cycles += rte_rdtsc() - start;
> -                       node->total_calls++;
> -                       node->total_objs += rc;
> -               } else {
> -                       node->process(graph, node, objs, node->idx);
> -               }
> -               node->idx = 0;
> +               __rte_node_process(graph, node);
>                 head = likely((int32_t)head > 0) ? head & mask : head;
>         }
>         graph->tail = 0;
> diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
> index 91a5de7fa4..b7b2bb958c 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -121,6 +121,39 @@ void __rte_node_stream_alloc_size(struct rte_graph *graph,
>
>  /* Fast path helper functions */
>
> +/**
> + * @internal
> + *
> + * Enqueue a given node to the tail of the graph reel.
> + *
> + * @param graph
> + *   Pointer Graph object.
> + * @param node
> + *   Pointer to node object to be enqueued.
> + */
> +static __rte_always_inline void
> +__rte_node_process(struct rte_graph *graph, struct rte_node *node)
> +{
> +       uint64_t start;
> +       uint16_t rc;
> +       void **objs;
> +
> +       RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> +       objs = node->objs;
> +       rte_prefetch0(objs);
> +
> +       if (rte_graph_has_stats_feature()) {
> +               start = rte_rdtsc();
> +               rc = node->process(graph, node, objs, node->idx);
> +               node->total_cycles += rte_rdtsc() - start;
> +               node->total_calls++;
> +               node->total_objs += rc;
> +       } else {
> +               node->process(graph, node, objs, node->idx);
> +       }
> +       node->idx = 0;
> +}
> +
>  /**
>   * @internal
>   *
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 03/13] graph: add macro to walk on graph circular buffer
  2022-11-17  5:09 ` [PATCH v1 03/13] graph: add macro to walk on graph circular buffer Zhirun Yan
@ 2023-02-20 13:45   ` Jerin Jacob
  2023-02-24  6:30     ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Jerin Jacob @ 2023-02-20 13:45 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> It is common to walk on the graph circular buffer, so add a macro to
> make the walk reusable for other worker models.
>
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  lib/graph/rte_graph_model_rtc.h     | 23 ++---------------------
>  lib/graph/rte_graph_worker_common.h | 23 +++++++++++++++++++++++
>  2 files changed, 25 insertions(+), 21 deletions(-)

> +/**
> + * Macro to walk on the source node(s) ((cir_start - head) -> cir_start)
> + * and then on the pending streams
> + * (cir_start -> (cir_start + mask) -> cir_start)
> + * in a circular buffer fashion.
> + *
> + *     +-----+ <= cir_start - head [number of source nodes]
> + *     |     |
> + *     | ... | <= source nodes
> + *     |     |
> + *     +-----+ <= cir_start [head = 0] [tail = 0]
> + *     |     |
> + *     | ... | <= pending streams
> + *     |     |
> + *     +-----+ <= cir_start + mask
> + */
> +#define rte_graph_walk_node(graph, head, node)                                         \
> +       for ((node) = RTE_PTR_ADD((graph), (graph)->cir_start[(int32_t)(head)]);        \
> +            likely((head) != (graph)->tail);                                           \
> +            (head)++,                                                                  \
> +            (node) = RTE_PTR_ADD((graph), (graph)->cir_start[(int32_t)(head)]),        \

This is an additional assignment compared to the original while()-based
version. Right?
No need to generalize with performance impact.


> +            (head) = likely((int32_t)(head) > 0) ? (head) & (graph)->cir_mask : (head))
> +
>  /**
>   * @internal
>   *
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2022-11-17  5:09 ` [PATCH v1 04/13] graph: add get/set graph worker model APIs Zhirun Yan
  2022-12-06  3:35   ` [EXT] " Kiran Kumar Kokkilagadda
@ 2023-02-20 13:50   ` Jerin Jacob
  2023-02-24  6:31     ` Yan, Zhirun
  1 sibling, 1 reply; 369+ messages in thread
From: Jerin Jacob @ 2023-02-20 13:50 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> Add new get/set APIs to configure the graph worker model, which
> determines the model to be used.
>
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  lib/graph/rte_graph_worker.h        | 51 +++++++++++++++++++++++++++++
>  lib/graph/rte_graph_worker_common.h | 13 ++++++++
>  lib/graph/version.map               |  3 ++
>  3 files changed, 67 insertions(+)
>
> diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
> index 54d1390786..a0ea0df153 100644
> --- a/lib/graph/rte_graph_worker.h
> +++ b/lib/graph/rte_graph_worker.h
> @@ -1,5 +1,56 @@
>  #include "rte_graph_model_rtc.h"
>
> +static enum rte_graph_worker_model worker_model = RTE_GRAPH_MODEL_DEFAULT;

This will break the multiprocess.

> +
> +/** Graph worker models */
> +enum rte_graph_worker_model {
> +#define WORKER_MODEL_DEFAULT "default"

Why are the strings needed?
Also, every symbol in a public header file should start with RTE_ to
avoid namespace conflicts.

> +       RTE_GRAPH_MODEL_DEFAULT = 0,
> +#define WORKER_MODEL_RTC "rtc"
> +       RTE_GRAPH_MODEL_RTC,

Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in the enum itself?

> +#define WORKER_MODEL_GENERIC "generic"

Generic is a very overloaded term. Use pipeline here, i.e.
RTE_GRAPH_MODEL_PIPELINE.


> +       RTE_GRAPH_MODEL_GENERIC,
> +       RTE_GRAPH_MODEL_MAX,

No need for MAX; it will break the ABI in the future. See other
subsystems such as cryptodev.

> +};

>

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 05/13] graph: introduce core affinity API
  2022-11-17  5:09 ` [PATCH v1 05/13] graph: introduce core affinity API Zhirun Yan
@ 2023-02-20 14:05   ` Jerin Jacob
  2023-02-24  6:32     ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Jerin Jacob @ 2023-02-20 14:05 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> 1. add lcore_id for node to hold affinity core id.
> 2. impl rte_node_model_generic_set_lcore_affinity to affinity node
>    with one lcore.
> 3. update version map for graph public API.

No need to explicitly state item 3. Rewrite items 1 and 2 as one or two
sentences, without the numbering.

>
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  lib/graph/graph_private.h           |  1 +
>  lib/graph/meson.build               |  1 +
>  lib/graph/node.c                    |  1 +
>  lib/graph/rte_graph_model_generic.c | 31 +++++++++++++++++++++
>  lib/graph/rte_graph_model_generic.h | 43 +++++++++++++++++++++++++++++
>  lib/graph/version.map               |  2 ++
>  6 files changed, 79 insertions(+)
>  create mode 100644 lib/graph/rte_graph_model_generic.c
>  create mode 100644 lib/graph/rte_graph_model_generic.h
>
> diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
> index f9a85c8926..627090f802 100644
> --- a/lib/graph/graph_private.h
> +++ b/lib/graph/graph_private.h
> @@ -49,6 +49,7 @@ struct node {
>         STAILQ_ENTRY(node) next;      /**< Next node in the list. */
>         char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
>         uint64_t flags;               /**< Node configuration flag. */
> +       unsigned int lcore_id;        /**< Node runs on the Lcore ID */
>         rte_node_process_t process;   /**< Node process function. */
>         rte_node_init_t init;         /**< Node init function. */
>         rte_node_fini_t fini;         /**< Node fini function. */
> diff --git a/lib/graph/meson.build b/lib/graph/meson.build
> index c7327549e8..8c8b11ed27 100644
> --- a/lib/graph/meson.build
> +++ b/lib/graph/meson.build
> @@ -14,6 +14,7 @@ sources = files(
>          'graph_debug.c',
>          'graph_stats.c',
>          'graph_populate.c',
> +        'rte_graph_model_generic.c',
>  )
>  headers = files('rte_graph.h', 'rte_graph_worker.h')
>
> diff --git a/lib/graph/node.c b/lib/graph/node.c
> index fc6345de07..8ad4b3cbeb 100644
> --- a/lib/graph/node.c
> +++ b/lib/graph/node.c
> @@ -100,6 +100,7 @@ __rte_node_register(const struct rte_node_register *reg)
>                         goto free;
>         }
>
> +       node->lcore_id = RTE_MAX_LCORE;
>         node->id = node_id++;
>
>         /* Add the node at tail */
> diff --git a/lib/graph/rte_graph_model_generic.c b/lib/graph/rte_graph_model_generic.c
> new file mode 100644
> index 0000000000..54ff659c7b
> --- /dev/null
> +++ b/lib/graph/rte_graph_model_generic.c
> @@ -0,0 +1,31 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2022 Intel Corporation
> + */
> +
> +#include "graph_private.h"
> +#include "rte_graph_model_generic.h"
> +
> +int
> +rte_node_model_generic_set_lcore_affinity(const char *name, unsigned int lcore_id)

Please put the action/verb last. Also, this is a graph-specific API, right?
I would suggest rte_graph_model_pipeline_lcore_affinity_set().

> diff --git a/lib/graph/rte_graph_model_generic.h b/lib/graph/rte_graph_model_generic.h
> new file mode 100644
> index 0000000000..20ca48a9e3
> --- /dev/null
> +++ b/lib/graph/rte_graph_model_generic.h
> @@ -0,0 +1,43 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2022 Intel Corporation
> + */
> +
> +#ifndef _RTE_GRAPH_MODEL_GENERIC_H_
> +#define _RTE_GRAPH_MODEL_GENERIC_H_
> +
> +/**
> + * @file rte_graph_model_generic.h
> + *
> + * @warning
> + * @b EXPERIMENTAL:
> + * All functions in this file may be changed or removed without prior notice.
> + *
> + * This API allows a worker thread to walk over a graph and nodes to create,
> + * process, enqueue and move streams of objects to the next nodes.
> + */
> +#include "rte_graph_worker_common.h"
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * Set lcore affinity to the node.
> + *
> + * @param name
> + *   Valid node name. In the case of the cloned node, the name will be
> + * "parent node name" + "-" + name.
> + * @param lcore_id
> + *   The lcore ID value.
> + *
> + * @return
> + *   0 on success, error otherwise.
> + */
> +__rte_experimental
> +int rte_node_model_generic_set_lcore_affinity(const char *name, unsigned int lcore_id);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_GRAPH_MODEL_GENERIC_H_ */
> diff --git a/lib/graph/version.map b/lib/graph/version.map
> index eea73ec9ca..33ff055be6 100644
> --- a/lib/graph/version.map
> +++ b/lib/graph/version.map
> @@ -46,5 +46,7 @@ EXPERIMENTAL {
>         rte_graph_worker_model_set;
>         rte_graph_worker_model_get;
>
> +       rte_node_model_generic_set_lcore_affinity;
> +
>         local: *;
>  };
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 06/13] graph: introduce graph affinity API
  2022-11-17  5:09 ` [PATCH v1 06/13] graph: introduce graph " Zhirun Yan
@ 2023-02-20 14:07   ` Jerin Jacob
  2023-02-24  6:39     ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Jerin Jacob @ 2023-02-20 14:07 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> Add lcore_id to the graph to hold the affinity core id the graph runs on.
> Add bind/unbind APIs to set/unset the graph affinity attribute. lcore_id
> is set to MAX by default, which means the attribute is disabled.
>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>

> diff --git a/lib/graph/version.map b/lib/graph/version.map
> index 33ff055be6..1c599b5b47 100644
> --- a/lib/graph/version.map
> +++ b/lib/graph/version.map
> @@ -18,6 +18,8 @@ EXPERIMENTAL {
>         rte_graph_node_get_by_name;
>         rte_graph_obj_dump;
>         rte_graph_walk;
> +       rte_graph_bind_core;

If it is not applicable to RTC, please change it to
rte_graph_model_pipeline_core_bind().

> +       rte_graph_unbind_core;
>
>         rte_graph_cluster_stats_create;
>         rte_graph_cluster_stats_destroy;
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 08/13] graph: introduce stream moving cross cores
  2022-11-17  5:09 ` [PATCH v1 08/13] graph: introduce stream moving cross cores Zhirun Yan
@ 2023-02-20 14:17   ` Jerin Jacob
  2023-02-24  6:48     ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Jerin Jacob @ 2023-02-20 14:17 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> This patch introduces key functions to allow a worker thread to
> enqueue and move streams of objects to the next nodes over
> different cores.
>
> 1. add graph_sched_wq_node to hold graph scheduling workqueue node
> stream
> 2. add workqueue help functions to create/destroy/enqueue/dequeue

These are two things; please make them two separate patches.


> @@ -39,6 +46,15 @@ struct rte_graph {
>         uint32_t cir_mask;           /**< Circular buffer wrap around mask. */
>         rte_node_t nb_nodes;         /**< Number of nodes in the graph. */
>         rte_graph_off_t *cir_start;  /**< Pointer to circular buffer. */
> +       /* Graph schedule */
> +       struct rte_graph_rq_head *rq __rte_cache_aligned; /* The run-queue */
> +       struct rte_graph_rq_head rq_head; /* The head for run-queue list */
> +
> +       SLIST_ENTRY(rte_graph) rq_next;   /* The next for run-queue list */
> +       unsigned int lcore_id;  /**< The graph running Lcore. */
> +       struct rte_ring *wq;    /**< The work-queue for pending streams. */
> +       struct rte_mempool *mp; /**< The mempool for scheduling streams. */
> +       /* Graph schedule area */
>         rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */
>         rte_graph_t id; /**< Graph identifier. */
>         int socket;     /**< Socket ID where memory is allocated. */
> @@ -63,6 +79,8 @@ struct rte_node {
>         char parent[RTE_NODE_NAMESIZE]; /**< Parent node name. */
>         char name[RTE_NODE_NAMESIZE];   /**< Name of the node. */
>
> +       /* Fast schedule area */
> +       unsigned int lcore_id __rte_cache_aligned;  /**< Node running Lcore. */

Do we need __rte_cache_aligned here? I am wondering whether we can add a
union for the different model-specific areas, ONLY for the fast path, so
that we save memory and the fast-path data stays warmer.

>         /* Fast path area  */
>  #define RTE_NODE_CTX_SZ 16
>         uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 13/13] examples/l3fwd-graph: introduce generic worker model
  2022-11-17  5:09 ` [PATCH v1 13/13] examples/l3fwd-graph: introduce generic worker model Zhirun Yan
@ 2023-02-20 14:20   ` Jerin Jacob
  2023-02-24  6:49     ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Jerin Jacob @ 2023-02-20 14:20 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Thu, Nov 17, 2022 at 10:41 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> Add a new parameter "model" to choose the generic or rtc worker model.
> In the generic model, nodes are affinitized to worker cores successively.
>
> Note:
> only one RX node is supported for the remote model in the current
> implementation.
>
> ./dpdk-l3fwd-graph  -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
> --model="generic"

Patch apply issue, please rebase with main.
See https://patches.dpdk.org/project/dpdk/patch/20221117050926.136974-14-zhirun.yan@intel.com/

>
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  examples/l3fwd-graph/main.c | 218 +++++++++++++++++++++++++++++-------
>  1 file changed, 179 insertions(+), 39 deletions(-)
>
> diff --git a/examples/l3fwd-graph/main.c b/examples/l3fwd-graph/main.c
> index 6dcb6ee92b..c145a3e3e8 100644
> --- a/examples/l3fwd-graph/main.c
> +++ b/examples/l3fwd-graph/main.c
> @@ -147,6 +147,19 @@ static struct ipv4_l3fwd_lpm_route ipv4_l3fwd_lpm_route_array[] = {
>         {RTE_IPV4(198, 18, 6, 0), 24, 6}, {RTE_IPV4(198, 18, 7, 0), 24, 7},
>  };
>
> +static int
> +check_worker_model_params(void)
> +{
> +       if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_GENERIC &&
> +           nb_lcore_params > 1) {
> +               printf("Exceeded max number of lcore params for remote model: %hu\n",
> +                      nb_lcore_params);
> +               return -1;
> +       }
> +
> +       return 0;
> +}
> +
>  static int
>  check_lcore_params(void)
>  {
> @@ -291,6 +304,20 @@ parse_max_pkt_len(const char *pktlen)
>         return len;
>  }
>
> +static int
> +parse_worker_model(const char *model)
> +{
> +       if (strcmp(model, WORKER_MODEL_DEFAULT) == 0)
> +               return RTE_GRAPH_MODEL_DEFAULT;
> +       else if (strcmp(model, WORKER_MODEL_GENERIC) == 0) {
> +               rte_graph_worker_model_set(RTE_GRAPH_MODEL_GENERIC);
> +               return RTE_GRAPH_MODEL_GENERIC;
> +       }
> +       rte_exit(EXIT_FAILURE, "Invalid worker model: %s", model);
> +
> +       return RTE_GRAPH_MODEL_MAX;
> +}
> +
>  static int
>  parse_portmask(const char *portmask)
>  {
> @@ -404,6 +431,7 @@ static const char short_options[] = "p:" /* portmask */
>  #define CMD_LINE_OPT_NO_NUMA      "no-numa"
>  #define CMD_LINE_OPT_MAX_PKT_LEN   "max-pkt-len"
>  #define CMD_LINE_OPT_PER_PORT_POOL "per-port-pool"
> +#define CMD_LINE_OPT_WORKER_MODEL  "model"
>  enum {
>         /* Long options mapped to a short option */
>
> @@ -416,6 +444,7 @@ enum {
>         CMD_LINE_OPT_NO_NUMA_NUM,
>         CMD_LINE_OPT_MAX_PKT_LEN_NUM,
>         CMD_LINE_OPT_PARSE_PER_PORT_POOL,
> +       CMD_LINE_OPT_WORKER_MODEL_TYPE,
>  };
>
>  static const struct option lgopts[] = {
> @@ -424,6 +453,7 @@ static const struct option lgopts[] = {
>         {CMD_LINE_OPT_NO_NUMA, 0, 0, CMD_LINE_OPT_NO_NUMA_NUM},
>         {CMD_LINE_OPT_MAX_PKT_LEN, 1, 0, CMD_LINE_OPT_MAX_PKT_LEN_NUM},
>         {CMD_LINE_OPT_PER_PORT_POOL, 0, 0, CMD_LINE_OPT_PARSE_PER_PORT_POOL},
> +       {CMD_LINE_OPT_WORKER_MODEL, 1, 0, CMD_LINE_OPT_WORKER_MODEL_TYPE},
>         {NULL, 0, 0, 0},
>  };
>
> @@ -498,6 +528,11 @@ parse_args(int argc, char **argv)
>                         per_port_pool = 1;
>                         break;
>
> +               case CMD_LINE_OPT_WORKER_MODEL_TYPE:
> +                       printf("Use new worker model: %s\n", optarg);
> +                       parse_worker_model(optarg);
> +                       break;
> +
>                 default:
>                         print_usage(prgname);
>                         return -1;
> @@ -735,6 +770,140 @@ config_port_max_pkt_len(struct rte_eth_conf *conf,
>         return 0;
>  }
>
> +static void
> +graph_config_generic(struct rte_graph_param graph_conf)
> +{
> +       uint16_t nb_patterns = graph_conf.nb_node_patterns;
> +       int worker_count = rte_lcore_count() - 1;
> +       int main_lcore_id = rte_get_main_lcore();
> +       int worker_lcore = main_lcore_id;
> +       rte_graph_t main_graph_id = 0;
> +       struct rte_node *node_tmp;
> +       struct lcore_conf *qconf;
> +       struct rte_graph *graph;
> +       rte_graph_t graph_id;
> +       rte_graph_off_t off;
> +       int n_rx_node = 0;
> +       rte_node_t count;
> +       rte_edge_t i;
> +       int ret;
> +
> +       for (int j = 0; j < nb_lcore_params; j++) {
> +               qconf = &lcore_conf[lcore_params[j].lcore_id];
> +               /* Add rx node patterns of all lcore */
> +               for (i = 0; i < qconf->n_rx_queue; i++) {
> +                       char *node_name = qconf->rx_queue_list[i].node_name;
> +
> +                       graph_conf.node_patterns[nb_patterns + n_rx_node + i] = node_name;
> +                       n_rx_node++;
> +                       ret = rte_node_model_generic_set_lcore_affinity(node_name,
> +                                                                       lcore_params[j].lcore_id);
> +                       if (ret == 0)
> +                               printf("Set node %s affinity to lcore %u\n", node_name,
> +                                      lcore_params[j].lcore_id);
> +               }
> +       }
> +
> +       graph_conf.nb_node_patterns = nb_patterns + n_rx_node;
> +       graph_conf.socket_id = rte_lcore_to_socket_id(main_lcore_id);
> +
> +       snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
> +                main_lcore_id);
> +
> +       /* create main graph */
> +       main_graph_id = rte_graph_create(qconf->name, &graph_conf);
> +       if (main_graph_id == RTE_GRAPH_ID_INVALID)
> +               rte_exit(EXIT_FAILURE,
> +                        "rte_graph_create(): main_graph_id invalid for lcore %u\n",
> +                        main_lcore_id);
> +
> +       qconf->graph_id = main_graph_id;
> +       qconf->graph = rte_graph_lookup(qconf->name);
> +       /* >8 End of graph initialization. */
> +       if (!qconf->graph)
> +               rte_exit(EXIT_FAILURE,
> +                        "rte_graph_lookup(): graph %s not found\n",
> +                        qconf->name);
> +
> +       graph = qconf->graph;
> +       rte_graph_foreach_node(count, off, graph, node_tmp) {
> +               worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
> +
> +               /* Need to set the node Lcore affinity before clone graph for each lcore */
> +               if (node_tmp->lcore_id == RTE_MAX_LCORE) {
> +                       ret = rte_node_model_generic_set_lcore_affinity(node_tmp->name,
> +                                                                       worker_lcore);
> +                       if (ret == 0)
> +                               printf("Set node %s affinity to lcore %u\n",
> +                                      node_tmp->name, worker_lcore);
> +               }
> +       }
> +
> +       worker_lcore = main_lcore_id;
> +       for (int i = 0; i < worker_count; i++) {
> +               worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
> +
> +               qconf = &lcore_conf[worker_lcore];
> +               snprintf(qconf->name, sizeof(qconf->name), "cloned-%u", worker_lcore);
> +               graph_id = rte_graph_clone(main_graph_id, qconf->name);
> +               ret = rte_graph_bind_core(graph_id, worker_lcore);
> +               if (ret == 0)
> +                       printf("bind graph %d to lcore %u\n", graph_id, worker_lcore);
> +
> +               /* full cloned graph name */
> +               snprintf(qconf->name, sizeof(qconf->name), "%s",
> +                        rte_graph_id_to_name(graph_id));
> +               qconf->graph_id = graph_id;
> +               qconf->graph = rte_graph_lookup(qconf->name);
> +               if (!qconf->graph)
> +                       rte_exit(EXIT_FAILURE,
> +                                "Failed to lookup graph %s\n",
> +                                qconf->name);
> +               continue;
> +       }
> +}
> +
> +static void
> +graph_config_rtc(struct rte_graph_param graph_conf)
> +{
> +       uint16_t nb_patterns = graph_conf.nb_node_patterns;
> +       struct lcore_conf *qconf;
> +       rte_graph_t graph_id;
> +       uint32_t lcore_id;
> +       rte_edge_t i;
> +
> +       for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> +               if (rte_lcore_is_enabled(lcore_id) == 0)
> +                       continue;
> +
> +               qconf = &lcore_conf[lcore_id];
> +               /* Skip graph creation if no source exists */
> +               if (!qconf->n_rx_queue)
> +                       continue;
> +               /* Add rx node patterns of this lcore */
> +               for (i = 0; i < qconf->n_rx_queue; i++) {
> +                       graph_conf.node_patterns[nb_patterns + i] =
> +                               qconf->rx_queue_list[i].node_name;
> +               }
> +               graph_conf.nb_node_patterns = nb_patterns + i;
> +               graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
> +               snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
> +                        lcore_id);
> +               graph_id = rte_graph_create(qconf->name, &graph_conf);
> +               if (graph_id == RTE_GRAPH_ID_INVALID)
> +                       rte_exit(EXIT_FAILURE,
> +                                "rte_graph_create(): graph_id invalid for lcore %u\n",
> +                                lcore_id);
> +               qconf->graph_id = graph_id;
> +               qconf->graph = rte_graph_lookup(qconf->name);
> +               /* >8 End of graph initialization. */
> +               if (!qconf->graph)
> +                       rte_exit(EXIT_FAILURE,
> +                                "rte_graph_lookup(): graph %s not found\n",
> +                                qconf->name);
> +       }
> +}
> +
>  int
>  main(int argc, char **argv)
>  {
> @@ -759,6 +928,7 @@ main(int argc, char **argv)
>         uint16_t nb_patterns;
>         uint8_t rewrite_len;
>         uint32_t lcore_id;
> +       uint16_t model;
>         int ret;
>
>         /* Init EAL */
> @@ -787,6 +957,9 @@ main(int argc, char **argv)
>         if (check_lcore_params() < 0)
>                 rte_exit(EXIT_FAILURE, "check_lcore_params() failed\n");
>
> +       if (check_worker_model_params() < 0)
> +               rte_exit(EXIT_FAILURE, "check_worker_model_params() failed\n");
> +
>         ret = init_lcore_rx_queues();
>         if (ret < 0)
>                 rte_exit(EXIT_FAILURE, "init_lcore_rx_queues() failed\n");
> @@ -1026,46 +1199,13 @@ main(int argc, char **argv)
>
>         memset(&graph_conf, 0, sizeof(graph_conf));
>         graph_conf.node_patterns = node_patterns;
> +       graph_conf.nb_node_patterns = nb_patterns;
>
> -       for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> -               rte_graph_t graph_id;
> -               rte_edge_t i;
> -
> -               if (rte_lcore_is_enabled(lcore_id) == 0)
> -                       continue;
> -
> -               qconf = &lcore_conf[lcore_id];
> -
> -               /* Skip graph creation if no source exists */
> -               if (!qconf->n_rx_queue)
> -                       continue;
> -
> -               /* Add rx node patterns of this lcore */
> -               for (i = 0; i < qconf->n_rx_queue; i++) {
> -                       graph_conf.node_patterns[nb_patterns + i] =
> -                               qconf->rx_queue_list[i].node_name;
> -               }
> -
> -               graph_conf.nb_node_patterns = nb_patterns + i;
> -               graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
> -
> -               snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
> -                        lcore_id);
> -
> -               graph_id = rte_graph_create(qconf->name, &graph_conf);
> -               if (graph_id == RTE_GRAPH_ID_INVALID)
> -                       rte_exit(EXIT_FAILURE,
> -                                "rte_graph_create(): graph_id invalid"
> -                                " for lcore %u\n", lcore_id);
> -
> -               qconf->graph_id = graph_id;
> -               qconf->graph = rte_graph_lookup(qconf->name);
> -               /* >8 End of graph initialization. */
> -               if (!qconf->graph)
> -                       rte_exit(EXIT_FAILURE,
> -                                "rte_graph_lookup(): graph %s not found\n",
> -                                qconf->name);
> -       }
> +       model = rte_graph_worker_model_get();
> +       if (model == RTE_GRAPH_MODEL_DEFAULT)
> +               graph_config_rtc(graph_conf);
> +       else if (model == RTE_GRAPH_MODEL_GENERIC)
> +               graph_config_generic(graph_conf);
>
>         memset(&rewrite_data, 0, sizeof(rewrite_data));
>         rewrite_len = sizeof(rewrite_data);
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [PATCH v1 01/13] graph: split graph worker into common and default model
  2023-02-20 13:38   ` Jerin Jacob
@ 2023-02-24  6:29     ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-02-24  6:29 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, February 20, 2023 9:38 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 01/13] graph: split graph worker into common and
> default model
> 
> On Thu, Nov 17, 2022 at 10:39 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
> >
> > To support multiple graph worker models, split the graph worker into common
> > and default parts. Name the current walk function rte_graph_model_rtc, since
> > the default model is RTC (run-to-completion).
> 
> There CI issues with this series. Please check
> https://patches.dpdk.org/project/dpdk/patch/20221117050926.136974-2-
> zhirun.yan@intel.com/
> # Please make sure each patch builds with devtools/test-meson-builds.sh
> # Please make sure each patch doesn't cause any issue with the app/test/test_graph.c test
> # Please make sure this series doesn't have perf issues with app/test/test_graph_perf.c
> # Both RTC and the new mode should run with l3fwd_graph without any performance regression
> # Please introduce the model concept in the documentation at
> doc/guides/prog_guide/graph_lib.rst, with details for this generic mode.
> 
> Also update the maintainers files for new model files.
> 
Yes, I will fix the CI issues and update the doc and MAINTAINERS files in the next version.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  lib/graph/rte_graph_model_rtc.h     |  57 ++++
> >  lib/graph/rte_graph_worker.h        | 498 +---------------------------
> >  lib/graph/rte_graph_worker_common.h | 456 +++++++++++++++++++++++++
> 
> 
> Use git mv to avoid losing history and reduce the diff.

Actually, it is file A -> file B and file C; I will break it into two patches to keep the log history.
Got it, thanks.

^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [PATCH v1 03/13] graph: add macro to walk on graph circular buffer
  2023-02-20 13:45   ` Jerin Jacob
@ 2023-02-24  6:30     ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-02-24  6:30 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, February 20, 2023 9:45 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 03/13] graph: add macro to walk on graph circular buffer
> 
> On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
> >
> > Walking the graph circular buffer is a common operation, so wrap it in a
> > macro to make it reusable for other worker models.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  lib/graph/rte_graph_model_rtc.h     | 23 ++---------------------
> >  lib/graph/rte_graph_worker_common.h | 23 +++++++++++++++++++++++
> >  2 files changed, 25 insertions(+), 21 deletions(-)
> 
> > +/**
> > + * Macro to walk on the source node(s) ((cir_start - head) ->
> > +cir_start)
> > + * and then on the pending streams
> > + * (cir_start -> (cir_start + mask) -> cir_start)
> > + * in a circular buffer fashion.
> > + *
> > + *     +-----+ <= cir_start - head [number of source nodes]
> > + *     |     |
> > + *     | ... | <= source nodes
> > + *     |     |
> > + *     +-----+ <= cir_start [head = 0] [tail = 0]
> > + *     |     |
> > + *     | ... | <= pending streams
> > + *     |     |
> > + *     +-----+ <= cir_start + mask
> > + */
> > +#define rte_graph_walk_node(graph, head, node)                                         \
> > +       for ((node) = RTE_PTR_ADD((graph), (graph)->cir_start[(int32_t)(head)]);
> \
> > +            likely((head) != (graph)->tail);                                           \
> > +            (head)++,                                                                  \
> > +            (node) = RTE_PTR_ADD((graph), (graph)->cir_start[(int32_t)(head)]),
> \
> 
> This is an additional assignment compared to the original while()-based version. Right?
> No need to generalize at the cost of a performance impact.
Yes, you are right. I will change the macro to use the original while loop.

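The while-loop walk being restored here can be sketched outside DPDK as follows. This is a minimal illustrative stand-in, not the DPDK ABI: plain `int` node IDs replace the `rte_graph_off_t` byte offsets, and the function/field names are hypothetical, but the head/tail/mask mechanics mirror the original `rte_graph_walk()` shape — source nodes sit at negative indices below `cir_start`, and the mask-based wrap only applies once `head` goes past the source-node region.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for the rte_graph circular buffer walk:
 * source nodes live just below cir_start (indices head..-1) and
 * pending streams occupy cir_start[0..cir_mask]. */
static int
mini_walk_sum(const int *cir_start, int32_t head, int32_t tail,
	      uint32_t cir_mask)
{
	int sum = 0;

	/* Original while-loop shape: one node load per iteration, no
	 * extra assignment, wrap with the mask only once head is
	 * non-negative (i.e. past the source-node region). */
	while (head != tail) {
		int node = cir_start[head]; /* stand-in for node->process() */

		sum += node;
		head++;
		head = (head > 0) ? (int32_t)(head & (int32_t)cir_mask) : head;
	}
	return sum;
}

static int
mini_walk_demo(void)
{
	/* Two source nodes (10, 20) below cir_start, two pending
	 * streams (30, 40) in a ring of size 4 (mask = 3), tail = 2. */
	static const int buf[6] = {10, 20, 30, 40, 0, 0};

	return mini_walk_sum(buf + 2, -2, 2, 3);
}
```

Walking visits 10, 20 (sources) then 30, 40 (pending streams) and stops when head wraps to tail.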
> 
> 
> > +            (head) = likely((int32_t)(head) > 0) ? (head) &
> > + (graph)->cir_mask : (head))
> > +
> >  /**
> >   * @internal
> >   *
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2023-02-20 13:50   ` Jerin Jacob
@ 2023-02-24  6:31     ` Yan, Zhirun
  2023-02-26 22:23       ` Jerin Jacob
  0 siblings, 1 reply; 369+ messages in thread
From: Yan, Zhirun @ 2023-02-24  6:31 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, February 20, 2023 9:51 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
> 
> On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
> >
> > Add new get/set APIs to configure the graph worker model, which
> > determines which model will be used.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  lib/graph/rte_graph_worker.h        | 51 +++++++++++++++++++++++++++++
> >  lib/graph/rte_graph_worker_common.h | 13 ++++++++
> >  lib/graph/version.map               |  3 ++
> >  3 files changed, 67 insertions(+)
> >
> > diff --git a/lib/graph/rte_graph_worker.h
> > b/lib/graph/rte_graph_worker.h index 54d1390786..a0ea0df153 100644
> > --- a/lib/graph/rte_graph_worker.h
> > +++ b/lib/graph/rte_graph_worker.h
> > @@ -1,5 +1,56 @@
> >  #include "rte_graph_model_rtc.h"
> >
> > +static enum rte_graph_worker_model worker_model =
> > +RTE_GRAPH_MODEL_DEFAULT;
> 
> This will break multi-process support.

Thanks. I will use TLS (thread-local storage) so each thread keeps its own copy.

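The fix agreed above can be sketched as follows. This is a self-contained illustration, assuming nothing about the DPDK internals: the enum values, function names, and the plain `__thread` qualifier are stand-ins (the DPDK idiom would be `RTE_DEFINE_PER_LCORE`). The point is that each thread's write to the model variable stays invisible to other threads, unlike the shared `static` in the original patch.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model IDs, mirroring the shape of the patch's enum. */
enum mini_model { MINI_MODEL_RTC = 0, MINI_MODEL_GENERIC = 1 };

/* Thread-local instead of a shared static: every thread gets its own
 * copy, initialized to the RTC default. */
static __thread enum mini_model worker_model = MINI_MODEL_RTC;

static void mini_model_set(enum mini_model m) { worker_model = m; }
static enum mini_model mini_model_get(void) { return worker_model; }

static void *
mini_worker(void *arg)
{
	(void)arg;
	/* This thread's choice must not leak into the main thread. */
	mini_model_set(MINI_MODEL_GENERIC);
	assert(mini_model_get() == MINI_MODEL_GENERIC);
	return NULL;
}

static enum mini_model
tls_demo(void)
{
	pthread_t t;

	pthread_create(&t, NULL, mini_worker, NULL);
	pthread_join(t, NULL);
	/* Still the RTC default here, despite the worker's set. */
	return mini_model_get();
}
```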
> 
> > +
> > +/** Graph worker models */
> > +enum rte_graph_worker_model {
> > +#define WORKER_MODEL_DEFAULT "default"
> 
> Why are the strings needed?
> Also, every symbol in a public header file should start with RTE_ to avoid
> namespace conflicts.

It was used to configure the model in the app. I can move the strings into the example.

> 
> > +       RTE_GRAPH_MODEL_DEFAULT = 0,
> > +#define WORKER_MODEL_RTC "rtc"
> > +       RTE_GRAPH_MODEL_RTC,
> 
> Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in the enum
> itself?
Yes, will do in next version.

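The aliasing agreed here can be shown in a short sketch. Enumerator names below are illustrative (the real ones carry the `RTE_GRAPH_` prefix): rather than a separate DEFAULT entry taking value 0 on its own, DEFAULT is defined as an alias of RTC inside the enum, and the next model takes the following value.

```c
#include <assert.h>

/* Sketch of the enum with the default aliased to RTC. */
enum mini_graph_worker_model {
	MINI_GRAPH_MODEL_RTC = 0,
	/* The default model is RTC, expressed as an alias rather
	 * than a distinct value. */
	MINI_GRAPH_MODEL_DEFAULT = MINI_GRAPH_MODEL_RTC,
	MINI_GRAPH_MODEL_GENERIC, /* takes the next free value, 1 */
};
```

Comparisons against either name are then interchangeable, so code checking for the default model also matches RTC.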
> 
> > +#define WORKER_MODEL_GENERIC "generic"
> 
> Generic is a very overloaded term. Use pipeline here i.e
> RTE_GRAPH_MODEL_PIPELINE

Actually, it's not a pure pipeline mode. I prefer to change it to hybrid.
> 
> 
> > +       RTE_GRAPH_MODEL_GENERIC,
> > +       RTE_GRAPH_MODEL_MAX,
> 
> No need for MAX; it will break the ABI in the future. See other subsystems such as
> cryptodev.

Thanks, I will change it.
> 
> > +};
> 
> >

^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [PATCH v1 05/13] graph: introduce core affinity API
  2023-02-20 14:05   ` Jerin Jacob
@ 2023-02-24  6:32     ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-02-24  6:32 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, February 20, 2023 10:05 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 05/13] graph: introduce core affinity API
> 
> On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
> >
> > 1. add lcore_id for node to hold affinity core id.
> > 2. impl rte_node_model_generic_set_lcore_affinity to affinity node
> >    with one lcore.
> > 3. update version map for graph public API.
> 
> No need to explicitly state item 3. Rewrite items 1 and 2 as one or two
> sentences without the numbering.
> 
Got it. I will change it in the next version.

> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  lib/graph/graph_private.h           |  1 +
> >  lib/graph/meson.build               |  1 +
> >  lib/graph/node.c                    |  1 +
> >  lib/graph/rte_graph_model_generic.c | 31 +++++++++++++++++++++
> > lib/graph/rte_graph_model_generic.h | 43
> +++++++++++++++++++++++++++++
> >  lib/graph/version.map               |  2 ++
> >  6 files changed, 79 insertions(+)
> >  create mode 100644 lib/graph/rte_graph_model_generic.c
> >  create mode 100644 lib/graph/rte_graph_model_generic.h
> >
> > diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
> > index f9a85c8926..627090f802 100644
> > --- a/lib/graph/graph_private.h
> > +++ b/lib/graph/graph_private.h
> > @@ -49,6 +49,7 @@ struct node {
> >         STAILQ_ENTRY(node) next;      /**< Next node in the list. */
> >         char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
> >         uint64_t flags;               /**< Node configuration flag. */
> > +       unsigned int lcore_id;        /**< Node runs on the Lcore ID */
> >         rte_node_process_t process;   /**< Node process function. */
> >         rte_node_init_t init;         /**< Node init function. */
> >         rte_node_fini_t fini;         /**< Node fini function. */
> > diff --git a/lib/graph/meson.build b/lib/graph/meson.build index
> > c7327549e8..8c8b11ed27 100644
> > --- a/lib/graph/meson.build
> > +++ b/lib/graph/meson.build
> > @@ -14,6 +14,7 @@ sources = files(
> >          'graph_debug.c',
> >          'graph_stats.c',
> >          'graph_populate.c',
> > +        'rte_graph_model_generic.c',
> >  )
> >  headers = files('rte_graph.h', 'rte_graph_worker.h')
> >
> > diff --git a/lib/graph/node.c b/lib/graph/node.c index
> > fc6345de07..8ad4b3cbeb 100644
> > --- a/lib/graph/node.c
> > +++ b/lib/graph/node.c
> > @@ -100,6 +100,7 @@ __rte_node_register(const struct rte_node_register
> *reg)
> >                         goto free;
> >         }
> >
> > +       node->lcore_id = RTE_MAX_LCORE;
> >         node->id = node_id++;
> >
> >         /* Add the node at tail */
> > diff --git a/lib/graph/rte_graph_model_generic.c
> > b/lib/graph/rte_graph_model_generic.c
> > new file mode 100644
> > index 0000000000..54ff659c7b
> > --- /dev/null
> > +++ b/lib/graph/rte_graph_model_generic.c
> > @@ -0,0 +1,31 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(C) 2022 Intel Corporation  */
> > +
> > +#include "graph_private.h"
> > +#include "rte_graph_model_generic.h"
> > +
> > +int
> > +rte_node_model_generic_set_lcore_affinity(const char *name, unsigned
> > +int lcore_id)
> 
> Please put the action/verb last. Also, it is a graph-specific API, right?
> I would suggest rte_graph_model_pipeline_lcore_affinity_set().
> 
Yes, it is a graph-specific API. I will change it in the next version. Thanks.

> > diff --git a/lib/graph/rte_graph_model_generic.h
> > b/lib/graph/rte_graph_model_generic.h
> > new file mode 100644
> > index 0000000000..20ca48a9e3
> > --- /dev/null
> > +++ b/lib/graph/rte_graph_model_generic.h
> > @@ -0,0 +1,43 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(C) 2022 Intel Corporation  */
> > +
> > +#ifndef _RTE_GRAPH_MODEL_GENERIC_H_
> > +#define _RTE_GRAPH_MODEL_GENERIC_H_
> > +
> > +/**
> > + * @file rte_graph_model_generic.h
> > + *
> > + * @warning
> > + * @b EXPERIMENTAL:
> > + * All functions in this file may be changed or removed without prior notice.
> > + *
> > + * This API allows a worker thread to walk over a graph and nodes to
> > +create,
> > + * process, enqueue and move streams of objects to the next nodes.
> > + */
> > +#include "rte_graph_worker_common.h"
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +/**
> > + * Set lcore affinity to the node.
> > + *
> > + * @param name
> > + *   Valid node name. In the case of the cloned node, the name will be
> > + * "parent node name" + "-" + name.
> > + * @param lcore_id
> > + *   The lcore ID value.
> > + *
> > + * @return
> > + *   0 on success, error otherwise.
> > + */
> > +__rte_experimental
> > +int rte_node_model_generic_set_lcore_affinity(const char *name,
> > +unsigned int lcore_id);
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_GRAPH_MODEL_GENERIC_H_ */
> > diff --git a/lib/graph/version.map b/lib/graph/version.map index
> > eea73ec9ca..33ff055be6 100644
> > --- a/lib/graph/version.map
> > +++ b/lib/graph/version.map
> > @@ -46,5 +46,7 @@ EXPERIMENTAL {
> >         rte_graph_worker_model_set;
> >         rte_graph_worker_model_get;
> >
> > +       rte_node_model_generic_set_lcore_affinity;
> > +
> >         local: *;
> >  };
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [PATCH v1 06/13] graph: introduce graph affinity API
  2023-02-20 14:07   ` Jerin Jacob
@ 2023-02-24  6:39     ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-02-24  6:39 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, February 20, 2023 10:07 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 06/13] graph: introduce graph affinity API
> 
> On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
> >
> > Add lcore_id to the graph to hold the affinity core ID where the graph will run.
> > Add bind/unbind APIs to set/unset the graph affinity attribute. lcore_id
> > is set to MAX by default, which means the attribute is disabled.
> >
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> 
> > diff --git a/lib/graph/version.map b/lib/graph/version.map index
> > 33ff055be6..1c599b5b47 100644
> > --- a/lib/graph/version.map
> > +++ b/lib/graph/version.map
> > @@ -18,6 +18,8 @@ EXPERIMENTAL {
> >         rte_graph_node_get_by_name;
> >         rte_graph_obj_dump;
> >         rte_graph_walk;
> > +       rte_graph_bind_core;
> 
> if it is not applicable to RTC, please change to
> rte_graph_model_pipeline_core_bind()
> 

It could be used by RTC, where it would mean binding all nodes to the same core,
but that's not necessary.
I will rename it with the specific model name.

> > +       rte_graph_unbind_core;
> >
> >         rte_graph_cluster_stats_create;
> >         rte_graph_cluster_stats_destroy;
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [PATCH v1 08/13] graph: introduce stream moving cross cores
  2023-02-20 14:17   ` Jerin Jacob
@ 2023-02-24  6:48     ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-02-24  6:48 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, February 20, 2023 10:17 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 08/13] graph: introduce stream moving cross cores
> 
> On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
> >
> > This patch introduces key functions that allow a worker thread to
> > enqueue and move streams of objects to the next nodes across different
> > cores.
> >
> > 1. add graph_sched_wq_node to hold the graph scheduling workqueue node stream
> > 2. add workqueue helper functions to create/destroy/enqueue/dequeue
> 
> These are two things; make them two patches.
> 
I will do that in the next version.

> 
> > @@ -39,6 +46,15 @@ struct rte_graph {
> >         uint32_t cir_mask;           /**< Circular buffer wrap around mask. */
> >         rte_node_t nb_nodes;         /**< Number of nodes in the graph. */
> >         rte_graph_off_t *cir_start;  /**< Pointer to circular buffer.
> > */
> > +       /* Graph schedule */
> > +       struct rte_graph_rq_head *rq __rte_cache_aligned; /* The run-queue */
> > +       struct rte_graph_rq_head rq_head; /* The head for run-queue
> > + list */
> > +
> > +       SLIST_ENTRY(rte_graph) rq_next;   /* The next for run-queue list */
> > +       unsigned int lcore_id;  /**< The graph running Lcore. */
> > +       struct rte_ring *wq;    /**< The work-queue for pending streams. */
> > +       struct rte_mempool *mp; /**< The mempool for scheduling streams. */
> > +       /* Graph schedule area */
> >         rte_graph_off_t nodes_start; /**< Offset at which node memory starts.
> */
> >         rte_graph_t id; /**< Graph identifier. */
> >         int socket;     /**< Socket ID where memory is allocated. */
> > @@ -63,6 +79,8 @@ struct rte_node {
> >         char parent[RTE_NODE_NAMESIZE]; /**< Parent node name. */
> >         char name[RTE_NODE_NAMESIZE];   /**< Name of the node. */
> >
> > +       /* Fast schedule area */
> > +       unsigned int lcore_id __rte_cache_aligned;  /**< Node running
> > + Lcore. */
> 
> Do we need __rte_cache_aligned here? I am wondering whether we can add a union
> for the different model-specific areas ONLY for the fast path, so that we save
> memory and the fast-path data stays warmer.

Maybe it is not necessary. I agree with you, and I can use a union to cover the model-specific fields.

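The union overlay discussed above can be sketched like this. Field names and sizes are illustrative, not the `rte_node` ABI: the idea is simply that the per-model fast-path fields share storage, so the RTC model pays no extra per-node size for the generic model's scheduling data, and all models' hot data stays in the same cache lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of a node with model-specific data
 * overlaid in an anonymous union (C11). */
struct mini_node {
	uint64_t flags;
	union {
		struct {
			unsigned int lcore_id; /* generic model: bound lcore */
		} sched;
		uint64_t rtc_scratch; /* room for other models' data */
	};
	uint8_t ctx[16]; /* fast-path context shared by all models */
};
```

Because the members overlay, the struct stays as small as its largest per-model area rather than growing by the sum of them.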
> 
> >         /* Fast path area  */
> >  #define RTE_NODE_CTX_SZ 16
> >         uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node
> > Context. */
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [PATCH v1 13/13] examples/l3fwd-graph: introduce generic worker model
  2023-02-20 14:20   ` Jerin Jacob
@ 2023-02-24  6:49     ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-02-24  6:49 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, February 20, 2023 10:20 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 13/13] examples/l3fwd-graph: introduce generic worker
> model
> 
> On Thu, Nov 17, 2022 at 10:41 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
> >
> > Add a new parameter "model" to choose the generic or RTC worker model.
> > In the generic model, nodes are affinitized to worker cores successively.
> >
> > Note:
> > the current implementation supports only one RX node for the remote model.
> >
> > ./dpdk-l3fwd-graph  -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
> > --model="generic"
> 
> Patch apply issue, please rebase with main.
> See https://patches.dpdk.org/project/dpdk/patch/20221117050926.136974-14-
> zhirun.yan@intel.com/
> 
Will fix it in the next version. Thanks for your comments.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  examples/l3fwd-graph/main.c | 218
> > +++++++++++++++++++++++++++++-------
> >  1 file changed, 179 insertions(+), 39 deletions(-)
> >
> > diff --git a/examples/l3fwd-graph/main.c b/examples/l3fwd-graph/main.c
> > index 6dcb6ee92b..c145a3e3e8 100644
> > --- a/examples/l3fwd-graph/main.c
> > +++ b/examples/l3fwd-graph/main.c
> > @@ -147,6 +147,19 @@ static struct ipv4_l3fwd_lpm_route
> ipv4_l3fwd_lpm_route_array[] = {
> >         {RTE_IPV4(198, 18, 6, 0), 24, 6}, {RTE_IPV4(198, 18, 7, 0),
> > 24, 7},  };
> >
> > +static int
> > +check_worker_model_params(void)
> > +{
> > +       if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_GENERIC &&
> > +           nb_lcore_params > 1) {
> > +               printf("Exceeded max number of lcore params for remote
> model: %hu\n",
> > +                      nb_lcore_params);
> > +               return -1;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> >  static int
> >  check_lcore_params(void)
> >  {
> > @@ -291,6 +304,20 @@ parse_max_pkt_len(const char *pktlen)
> >         return len;
> >  }
> >
> > +static int
> > +parse_worker_model(const char *model) {
> > +       if (strcmp(model, WORKER_MODEL_DEFAULT) == 0)
> > +               return RTE_GRAPH_MODEL_DEFAULT;
> > +       else if (strcmp(model, WORKER_MODEL_GENERIC) == 0) {
> > +               rte_graph_worker_model_set(RTE_GRAPH_MODEL_GENERIC);
> > +               return RTE_GRAPH_MODEL_GENERIC;
> > +       }
> > +       rte_exit(EXIT_FAILURE, "Invalid worker model: %s", model);
> > +
> > +       return RTE_GRAPH_MODEL_MAX;
> > +}
> > +
> >  static int
> >  parse_portmask(const char *portmask)
> >  {
> > @@ -404,6 +431,7 @@ static const char short_options[] = "p:" /* portmask */
> >  #define CMD_LINE_OPT_NO_NUMA      "no-numa"
> >  #define CMD_LINE_OPT_MAX_PKT_LEN   "max-pkt-len"
> >  #define CMD_LINE_OPT_PER_PORT_POOL "per-port-pool"
> > +#define CMD_LINE_OPT_WORKER_MODEL  "model"
> >  enum {
> >         /* Long options mapped to a short option */
> >
> > @@ -416,6 +444,7 @@ enum {
> >         CMD_LINE_OPT_NO_NUMA_NUM,
> >         CMD_LINE_OPT_MAX_PKT_LEN_NUM,
> >         CMD_LINE_OPT_PARSE_PER_PORT_POOL,
> > +       CMD_LINE_OPT_WORKER_MODEL_TYPE,
> >  };
> >
> >  static const struct option lgopts[] = { @@ -424,6 +453,7 @@ static
> > const struct option lgopts[] = {
> >         {CMD_LINE_OPT_NO_NUMA, 0, 0, CMD_LINE_OPT_NO_NUMA_NUM},
> >         {CMD_LINE_OPT_MAX_PKT_LEN, 1, 0,
> CMD_LINE_OPT_MAX_PKT_LEN_NUM},
> >         {CMD_LINE_OPT_PER_PORT_POOL, 0, 0,
> > CMD_LINE_OPT_PARSE_PER_PORT_POOL},
> > +       {CMD_LINE_OPT_WORKER_MODEL, 1, 0,
> > + CMD_LINE_OPT_WORKER_MODEL_TYPE},
> >         {NULL, 0, 0, 0},
> >  };
> >
> > @@ -498,6 +528,11 @@ parse_args(int argc, char **argv)
> >                         per_port_pool = 1;
> >                         break;
> >
> > +               case CMD_LINE_OPT_WORKER_MODEL_TYPE:
> > +                       printf("Use new worker model: %s\n", optarg);
> > +                       parse_worker_model(optarg);
> > +                       break;
> > +
> >                 default:
> >                         print_usage(prgname);
> >                         return -1;
> > @@ -735,6 +770,140 @@ config_port_max_pkt_len(struct rte_eth_conf
> *conf,
> >         return 0;
> >  }
> >
> > +static void
> > +graph_config_generic(struct rte_graph_param graph_conf) {
> > +       uint16_t nb_patterns = graph_conf.nb_node_patterns;
> > +       int worker_count = rte_lcore_count() - 1;
> > +       int main_lcore_id = rte_get_main_lcore();
> > +       int worker_lcore = main_lcore_id;
> > +       rte_graph_t main_graph_id = 0;
> > +       struct rte_node *node_tmp;
> > +       struct lcore_conf *qconf;
> > +       struct rte_graph *graph;
> > +       rte_graph_t graph_id;
> > +       rte_graph_off_t off;
> > +       int n_rx_node = 0;
> > +       rte_node_t count;
> > +       rte_edge_t i;
> > +       int ret;
> > +
> > +       for (int j = 0; j < nb_lcore_params; j++) {
> > +               qconf = &lcore_conf[lcore_params[j].lcore_id];
> > +               /* Add rx node patterns of all lcore */
> > +               for (i = 0; i < qconf->n_rx_queue; i++) {
> > +                       char *node_name =
> > + qconf->rx_queue_list[i].node_name;
> > +
> > +                       graph_conf.node_patterns[nb_patterns + n_rx_node + i] =
> node_name;
> > +                       n_rx_node++;
> > +                       ret = rte_node_model_generic_set_lcore_affinity(node_name,
> > +                                                                       lcore_params[j].lcore_id);
> > +                       if (ret == 0)
> > +                               printf("Set node %s affinity to lcore %u\n", node_name,
> > +                                      lcore_params[j].lcore_id);
> > +               }
> > +       }
> > +
> > +       graph_conf.nb_node_patterns = nb_patterns + n_rx_node;
> > +       graph_conf.socket_id = rte_lcore_to_socket_id(main_lcore_id);
> > +
> > +       snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
> > +                main_lcore_id);
> > +
> > +       /* create main graph */
> > +       main_graph_id = rte_graph_create(qconf->name, &graph_conf);
> > +       if (main_graph_id == RTE_GRAPH_ID_INVALID)
> > +               rte_exit(EXIT_FAILURE,
> > +                        "rte_graph_create(): main_graph_id invalid for lcore %u\n",
> > +                        main_lcore_id);
> > +
> > +       qconf->graph_id = main_graph_id;
> > +       qconf->graph = rte_graph_lookup(qconf->name);
> > +       /* >8 End of graph initialization. */
> > +       if (!qconf->graph)
> > +               rte_exit(EXIT_FAILURE,
> > +                        "rte_graph_lookup(): graph %s not found\n",
> > +                        qconf->name);
> > +
> > +       graph = qconf->graph;
> > +       rte_graph_foreach_node(count, off, graph, node_tmp) {
> > +               worker_lcore = rte_get_next_lcore(worker_lcore, true,
> > + 1);
> > +
> > +               /* Need to set the node Lcore affinity before clone graph for each
> lcore */
> > +               if (node_tmp->lcore_id == RTE_MAX_LCORE) {
> > +                       ret = rte_node_model_generic_set_lcore_affinity(node_tmp-
> >name,
> > +                                                                       worker_lcore);
> > +                       if (ret == 0)
> > +                               printf("Set node %s affinity to lcore %u\n",
> > +                                      node_tmp->name, worker_lcore);
> > +               }
> > +       }
> > +
> > +       worker_lcore = main_lcore_id;
> > +       for (int i = 0; i < worker_count; i++) {
> > +               worker_lcore = rte_get_next_lcore(worker_lcore, true,
> > + 1);
> > +
> > +               qconf = &lcore_conf[worker_lcore];
> > +               snprintf(qconf->name, sizeof(qconf->name), "cloned-%u",
> worker_lcore);
> > +               graph_id = rte_graph_clone(main_graph_id, qconf->name);
> > +               ret = rte_graph_bind_core(graph_id, worker_lcore);
> > +               if (ret == 0)
> > +                       printf("bind graph %d to lcore %u\n",
> > + graph_id, worker_lcore);
> > +
> > +               /* full cloned graph name */
> > +               snprintf(qconf->name, sizeof(qconf->name), "%s",
> > +                        rte_graph_id_to_name(graph_id));
> > +               qconf->graph_id = graph_id;
> > +               qconf->graph = rte_graph_lookup(qconf->name);
> > +               if (!qconf->graph)
> > +                       rte_exit(EXIT_FAILURE,
> > +                                "Failed to lookup graph %s\n",
> > +                                qconf->name);
> > +               continue;
> > +       }
> > +}
> > +
> > +static void
> > +graph_config_rtc(struct rte_graph_param graph_conf) {
> > +       uint16_t nb_patterns = graph_conf.nb_node_patterns;
> > +       struct lcore_conf *qconf;
> > +       rte_graph_t graph_id;
> > +       uint32_t lcore_id;
> > +       rte_edge_t i;
> > +
> > +       for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> > +               if (rte_lcore_is_enabled(lcore_id) == 0)
> > +                       continue;
> > +
> > +               qconf = &lcore_conf[lcore_id];
> > +               /* Skip graph creation if no source exists */
> > +               if (!qconf->n_rx_queue)
> > +                       continue;
> > +               /* Add rx node patterns of this lcore */
> > +               for (i = 0; i < qconf->n_rx_queue; i++) {
> > +                       graph_conf.node_patterns[nb_patterns + i] =
> > +                               qconf->rx_queue_list[i].node_name;
> > +               }
> > +               graph_conf.nb_node_patterns = nb_patterns + i;
> > +               graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
> > +               snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
> > +                        lcore_id);
> > +               graph_id = rte_graph_create(qconf->name, &graph_conf);
> > +               if (graph_id == RTE_GRAPH_ID_INVALID)
> > +                       rte_exit(EXIT_FAILURE,
> > +                                "rte_graph_create(): graph_id invalid for lcore %u\n",
> > +                                lcore_id);
> > +               qconf->graph_id = graph_id;
> > +               qconf->graph = rte_graph_lookup(qconf->name);
> > +               /* >8 End of graph initialization. */
> > +               if (!qconf->graph)
> > +                       rte_exit(EXIT_FAILURE,
> > +                                "rte_graph_lookup(): graph %s not found\n",
> > +                                qconf->name);
> > +       }
> > +}
> > +
> >  int
> >  main(int argc, char **argv)
> >  {
> > @@ -759,6 +928,7 @@ main(int argc, char **argv)
> >         uint16_t nb_patterns;
> >         uint8_t rewrite_len;
> >         uint32_t lcore_id;
> > +       uint16_t model;
> >         int ret;
> >
> >         /* Init EAL */
> > @@ -787,6 +957,9 @@ main(int argc, char **argv)
> >         if (check_lcore_params() < 0)
> >                 rte_exit(EXIT_FAILURE, "check_lcore_params() failed\n");
> >
> > +       if (check_worker_model_params() < 0)
> > +               rte_exit(EXIT_FAILURE, "check_worker_model_params() failed\n");
> > +
> >         ret = init_lcore_rx_queues();
> >         if (ret < 0)
> >                 rte_exit(EXIT_FAILURE, "init_lcore_rx_queues() failed\n");
> >
> > @@ -1026,46 +1199,13 @@ main(int argc, char **argv)
> >
> >         memset(&graph_conf, 0, sizeof(graph_conf));
> >         graph_conf.node_patterns = node_patterns;
> > +       graph_conf.nb_node_patterns = nb_patterns;
> >
> > -       for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> > -               rte_graph_t graph_id;
> > -               rte_edge_t i;
> > -
> > -               if (rte_lcore_is_enabled(lcore_id) == 0)
> > -                       continue;
> > -
> > -               qconf = &lcore_conf[lcore_id];
> > -
> > -               /* Skip graph creation if no source exists */
> > -               if (!qconf->n_rx_queue)
> > -                       continue;
> > -
> > -               /* Add rx node patterns of this lcore */
> > -               for (i = 0; i < qconf->n_rx_queue; i++) {
> > -                       graph_conf.node_patterns[nb_patterns + i] =
> > -                               qconf->rx_queue_list[i].node_name;
> > -               }
> > -
> > -               graph_conf.nb_node_patterns = nb_patterns + i;
> > -               graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
> > -
> > -               snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
> > -                        lcore_id);
> > -
> > -               graph_id = rte_graph_create(qconf->name, &graph_conf);
> > -               if (graph_id == RTE_GRAPH_ID_INVALID)
> > -                       rte_exit(EXIT_FAILURE,
> > -                                "rte_graph_create(): graph_id invalid"
> > -                                " for lcore %u\n", lcore_id);
> > -
> > -               qconf->graph_id = graph_id;
> > -               qconf->graph = rte_graph_lookup(qconf->name);
> > -               /* >8 End of graph initialization. */
> > -               if (!qconf->graph)
> > -                       rte_exit(EXIT_FAILURE,
> > -                                "rte_graph_lookup(): graph %s not found\n",
> > -                                qconf->name);
> > -       }
> > +       model = rte_graph_worker_model_get();
> > +       if (model == RTE_GRAPH_MODEL_DEFAULT)
> > +               graph_config_rtc(graph_conf);
> > +       else if (model == RTE_GRAPH_MODEL_GENERIC)
> > +               graph_config_generic(graph_conf);
> >
> >         memset(&rewrite_data, 0, sizeof(rewrite_data));
> >         rewrite_len = sizeof(rewrite_data);
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2023-02-24  6:31     ` Yan, Zhirun
@ 2023-02-26 22:23       ` Jerin Jacob
  2023-03-02  8:38         ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Jerin Jacob @ 2023-02-26 22:23 UTC (permalink / raw)
  To: Yan, Zhirun
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue

On Fri, Feb 24, 2023 at 12:01 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Monday, February 20, 2023 9:51 PM
> > To: Yan, Zhirun <zhirun.yan@intel.com>
> > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> > Haiyue <haiyue.wang@intel.com>
> > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
> >
> > On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
> > >
> > > Add new get/set APIs to configure graph worker model which is used to
> > > determine which model will be chosen.
> > >
> > > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > > ---
> > >  lib/graph/rte_graph_worker.h        | 51 +++++++++++++++++++++++++++++
> > >  lib/graph/rte_graph_worker_common.h | 13 ++++++++
> > >  lib/graph/version.map               |  3 ++
> > >  3 files changed, 67 insertions(+)
> > >
> > > diff --git a/lib/graph/rte_graph_worker.h
> > > b/lib/graph/rte_graph_worker.h index 54d1390786..a0ea0df153 100644
> > > --- a/lib/graph/rte_graph_worker.h
> > > +++ b/lib/graph/rte_graph_worker.h
> > > @@ -1,5 +1,56 @@
> > >  #include "rte_graph_model_rtc.h"
> > >
> > > +static enum rte_graph_worker_model worker_model =
> > > +RTE_GRAPH_MODEL_DEFAULT;
> >
> > This will break the multiprocess.
>
> Thanks. I will use TLS for per-thread local storage.

If it needs to be used from secondary process, then it needs to be from memzone.



>
> >
> > > +
> > > +/** Graph worker models */
> > > +enum rte_graph_worker_model {
> > > +#define WORKER_MODEL_DEFAULT "default"
> >
> > Why need strings?
> > Also, every symbol in a public header file should start with RTE_ to avoid
> > namespace conflict.
>
> It was used to config the model in app. I can put the string into example.

OK

>
> >
> > > +       RTE_GRAPH_MODEL_DEFAULT = 0,
> > > +#define WORKER_MODEL_RTC "rtc"
> > > +       RTE_GRAPH_MODEL_RTC,
> >
> > Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in enum
> > itself.
> Yes, will do in next version.
>
> >
> > > +#define WORKER_MODEL_GENERIC "generic"
> >
> > Generic is a very overloaded term. Use pipeline here i.e
> > RTE_GRAPH_MODEL_PIPELINE
>
> Actually, it's not a purely pipeline mode. I prefer to change to hybrid.

Hybrid is very overloaded term, and it will be confusing (considering
there will be new models in future).
Please pick a word that really express the model working.

> >
> >
> > > +       RTE_GRAPH_MODEL_GENERIC,
> > > +       RTE_GRAPH_MODEL_MAX,
> >
> > No need for MAX, it will break the ABI for future. See other subsystem such as
> > cryptodev.
>
> Thanks, I will change it.
> >
> > > +};
> >
> > >

^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2023-02-26 22:23       ` Jerin Jacob
@ 2023-03-02  8:38         ` Yan, Zhirun
  2023-03-02 13:58           ` Jerin Jacob
  0 siblings, 1 reply; 369+ messages in thread
From: Yan, Zhirun @ 2023-03-02  8:38 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, February 27, 2023 6:23 AM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>;
> Wang, Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
> 
> On Fri, Feb 24, 2023 at 12:01 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Monday, February 20, 2023 9:51 PM
> > > To: Yan, Zhirun <zhirun.yan@intel.com>
> > > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > > ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>;
> > > Wang, Haiyue <haiyue.wang@intel.com>
> > > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model
> > > APIs
> > >
> > > On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com>
> wrote:
> > > >
> > > > Add new get/set APIs to configure graph worker model which is used
> > > > to determine which model will be chosen.
> > > >
> > > > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > > > ---
> > > >  lib/graph/rte_graph_worker.h        | 51
> +++++++++++++++++++++++++++++
> > > >  lib/graph/rte_graph_worker_common.h | 13 ++++++++
> > > >  lib/graph/version.map               |  3 ++
> > > >  3 files changed, 67 insertions(+)
> > > >
> > > > diff --git a/lib/graph/rte_graph_worker.h
> > > > b/lib/graph/rte_graph_worker.h index 54d1390786..a0ea0df153
> 100644
> > > > --- a/lib/graph/rte_graph_worker.h
> > > > +++ b/lib/graph/rte_graph_worker.h
> > > > @@ -1,5 +1,56 @@
> > > >  #include "rte_graph_model_rtc.h"
> > > >
> > > > +static enum rte_graph_worker_model worker_model =
> > > > +RTE_GRAPH_MODEL_DEFAULT;
> > >
> > > This will break the multiprocess.
> >
> > Thanks. I will use TLS for per-thread local storage.
> 
> If it needs to be used from secondary process, then it needs to be from
> memzone.
> 


This field will be set by the primary process at the initialization stage, and then lcores will only read it.
I want to use RTE_DEFINE_PER_LCORE to define the worker model here. It seems
unnecessary to allocate it from a memzone.

> 
> 
> >
> > >
> > > > +
> > > > +/** Graph worker models */
> > > > +enum rte_graph_worker_model {
> > > > +#define WORKER_MODEL_DEFAULT "default"
> > >
> > > Why need strings?
> > > Also, every symbol in a public header file should start with RTE_ to
> > > avoid namespace conflict.
> >
> > It was used to config the model in app. I can put the string into example.
> 
> OK
> 
> >
> > >
> > > > +       RTE_GRAPH_MODEL_DEFAULT = 0, #define WORKER_MODEL_RTC
> > > > +"rtc"
> > > > +       RTE_GRAPH_MODEL_RTC,
> > >
> > > Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in
> enum
> > > itself.
> > Yes, will do in next version.
> >
> > >
> > > > +#define WORKER_MODEL_GENERIC "generic"
> > >
> > > Generic is a very overloaded term. Use pipeline here i.e
> > > RTE_GRAPH_MODEL_PIPELINE
> >
> > Actually, it's not a purely pipeline mode. I prefer to change to hybrid.
> 
> Hybrid is very overloaded term, and it will be confusing (considering there
> will be new models in future).
> Please pick a word that really express the model working.
> 

In this case, the path is Node0 -> Node1 -> Node2 -> Node3,
and Node1 and Node3 are bound to one core.

Our model offers the ability to dispatch between cores.

Do you think RTE_GRAPH_MODEL_DISPATCH is a good name?

+ - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
'  Core #0   '     '  Core #1       Core #1   '     '  Core #2   '
'            '     '                          '     '            '
' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
'            '     '     |                    '     '      ^     '
+ - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
                         |                                 |
                         + - - - - - - - - - - - - - - - - +


> > >
> > >
> > > > +       RTE_GRAPH_MODEL_GENERIC,
> > > > +       RTE_GRAPH_MODEL_MAX,
> > >
> > > No need for MAX, it will break the ABI for future. See other
> > > subsystem such as cryptodev.
> >
> > Thanks, I will change it.
> > >
> > > > +};
> > >
> > > >

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2023-03-02  8:38         ` Yan, Zhirun
@ 2023-03-02 13:58           ` Jerin Jacob
  2023-03-07  8:26             ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Jerin Jacob @ 2023-03-02 13:58 UTC (permalink / raw)
  To: Yan, Zhirun
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue

On Thu, Mar 2, 2023 at 2:09 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Monday, February 27, 2023 6:23 AM
> > To: Yan, Zhirun <zhirun.yan@intel.com>
> > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>;
> > Wang, Haiyue <haiyue.wang@intel.com>
> > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
> >
> > On Fri, Feb 24, 2023 at 12:01 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > Sent: Monday, February 20, 2023 9:51 PM
> > > > To: Yan, Zhirun <zhirun.yan@intel.com>
> > > > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > > > ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>;
> > > > Wang, Haiyue <haiyue.wang@intel.com>
> > > > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model
> > > > APIs
> > > >
> > > > On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com>
> > wrote:
> > > > >
> > > > > Add new get/set APIs to configure graph worker model which is used
> > > > > to determine which model will be chosen.
> > > > >
> > > > > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > > > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > > > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > > > > ---
> > > > >  lib/graph/rte_graph_worker.h        | 51
> > +++++++++++++++++++++++++++++
> > > > >  lib/graph/rte_graph_worker_common.h | 13 ++++++++
> > > > >  lib/graph/version.map               |  3 ++
> > > > >  3 files changed, 67 insertions(+)
> > > > >
> > > > > diff --git a/lib/graph/rte_graph_worker.h
> > > > > b/lib/graph/rte_graph_worker.h index 54d1390786..a0ea0df153
> > 100644
> > > > > --- a/lib/graph/rte_graph_worker.h
> > > > > +++ b/lib/graph/rte_graph_worker.h
> > > > > @@ -1,5 +1,56 @@
> > > > >  #include "rte_graph_model_rtc.h"
> > > > >
> > > > > +static enum rte_graph_worker_model worker_model =
> > > > > +RTE_GRAPH_MODEL_DEFAULT;
> > > >
> > > > This will break the multiprocess.
> > >
> > > Thanks. I will use TLS for per-thread local storage.
> >
> > If it needs to be used from secondary process, then it needs to be from
> > memzone.
> >
>
>
> This field will be set by primary process in initial stage, and then lcore will only read it.
> I want to use RTE_DEFINE_PER_LCORE to define the worker model here. It seems
> not necessary to allocate from memzone.
>
> >
> >
> > >
> > > >
> > > > > +
> > > > > +/** Graph worker models */
> > > > > +enum rte_graph_worker_model {
> > > > > +#define WORKER_MODEL_DEFAULT "default"
> > > >
> > > > Why need strings?
> > > > Also, every symbol in a public header file should start with RTE_ to
> > > > avoid namespace conflict.
> > >
> > > It was used to config the model in app. I can put the string into example.
> >
> > OK
> >
> > >
> > > >
> > > > > +       RTE_GRAPH_MODEL_DEFAULT = 0, #define WORKER_MODEL_RTC
> > > > > +"rtc"
> > > > > +       RTE_GRAPH_MODEL_RTC,
> > > >
> > > > Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in
> > enum
> > > > itself.
> > > Yes, will do in next version.
> > >
> > > >
> > > > > +#define WORKER_MODEL_GENERIC "generic"
> > > >
> > > > Generic is a very overloaded term. Use pipeline here i.e
> > > > RTE_GRAPH_MODEL_PIPELINE
> > >
> > > Actually, it's not a purely pipeline mode. I prefer to change to hybrid.
> >
> > Hybrid is very overloaded term, and it will be confusing (considering there
> > will be new models in future).
> > Please pick a word that really express the model working.
> >
>
> In this case, the path is Node0 -> Node1 -> Node2 -> Node3,
> and Node1 and Node3 are bound to one core.
>
> Our model offers the ability to dispatch between cores.
>
> Do you think RTE_GRAPH_MODEL_DISPATCH is a good name?

Some names that I can think of:

// MCORE->MULTI CORE

RTE_GRAPH_MODEL_MCORE_PIPELINE
or
RTE_GRAPH_MODEL_MCORE_DISPATCH
or
RTE_GRAPH_MODEL_MCORE_RING
or
RTE_GRAPH_MODEL_MULTI_CORE

>
> + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
> '  Core #0   '     '  Core #1       Core #1   '     '  Core #2   '
> '            '     '                          '     '            '
> ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
> ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
> ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
> '            '     '     |                    '     '      ^     '
> + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
>                          |                                 |
>                          + - - - - - - - - - - - - - - - - +
>
>
> > > >
> > > >
> > > > > +       RTE_GRAPH_MODEL_GENERIC,
> > > > > +       RTE_GRAPH_MODEL_MAX,
> > > >
> > > > No need for MAX, it will break the ABI for future. See other
> > > > subsystem such as cryptodev.
> > >
> > > Thanks, I will change it.
> > > >
> > > > > +};
> > > >
> > > > >

^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2023-03-02 13:58           ` Jerin Jacob
@ 2023-03-07  8:26             ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-03-07  8:26 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Thursday, March 2, 2023 9:58 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
> 
> On Thu, Mar 2, 2023 at 2:09 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Monday, February 27, 2023 6:23 AM
> > > To: Yan, Zhirun <zhirun.yan@intel.com>
> > > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > > ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>;
> > > Wang, Haiyue <haiyue.wang@intel.com>
> > > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model
> > > APIs
> > >
> > > On Fri, Feb 24, 2023 at 12:01 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > Sent: Monday, February 20, 2023 9:51 PM
> > > > > To: Yan, Zhirun <zhirun.yan@intel.com>
> > > > > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > > > > ndabilpuram@marvell.com; Liang, Cunming
> > > > > <cunming.liang@intel.com>; Wang, Haiyue <haiyue.wang@intel.com>
> > > > > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker
> > > > > model APIs
> > > > >
> > > > > On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan
> > > > > <zhirun.yan@intel.com>
> > > wrote:
> > > > > >
> > > > > > Add new get/set APIs to configure graph worker model which is
> > > > > > used to determine which model will be chosen.
> > > > > >
> > > > > > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > > > > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > > > > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > > > > > ---
> > > > > >  lib/graph/rte_graph_worker.h        | 51
> > > +++++++++++++++++++++++++++++
> > > > > >  lib/graph/rte_graph_worker_common.h | 13 ++++++++
> > > > > >  lib/graph/version.map               |  3 ++
> > > > > >  3 files changed, 67 insertions(+)
> > > > > >
> > > > > > diff --git a/lib/graph/rte_graph_worker.h
> > > > > > b/lib/graph/rte_graph_worker.h index 54d1390786..a0ea0df153
> > > 100644
> > > > > > --- a/lib/graph/rte_graph_worker.h
> > > > > > +++ b/lib/graph/rte_graph_worker.h
> > > > > > @@ -1,5 +1,56 @@
> > > > > >  #include "rte_graph_model_rtc.h"
> > > > > >
> > > > > > +static enum rte_graph_worker_model worker_model =
> > > > > > +RTE_GRAPH_MODEL_DEFAULT;
> > > > >
> > > > > This will break the multiprocess.
> > > >
> > > > Thanks. I will use TLS for per-thread local storage.
> > >
> > > If it needs to be used from secondary process, then it needs to be
> > > from memzone.
> > >
> >
> >
> > This field will be set by primary process in initial stage, and then lcore will only
> read it.
> > I want to use RTE_DEFINE_PER_LCORE to define the worker model here. It
> > seems not necessary to allocate from memzone.
> >
> > >
> > >
> > > >
> > > > >
> > > > > > +
> > > > > > +/** Graph worker models */
> > > > > > +enum rte_graph_worker_model { #define WORKER_MODEL_DEFAULT
> > > > > > +"default"
> > > > >
> > > > > Why need strings?
> > > > > Also, every symbol in a public header file should start with
> > > > > RTE_ to avoid namespace conflict.
> > > >
> > > > It was used to config the model in app. I can put the string into example.
> > >
> > > OK
> > >
> > > >
> > > > >
> > > > > > +       RTE_GRAPH_MODEL_DEFAULT = 0, #define
> WORKER_MODEL_RTC
> > > > > > +"rtc"
> > > > > > +       RTE_GRAPH_MODEL_RTC,
> > > > >
> > > > > Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in
> > > enum
> > > > > itself.
> > > > Yes, will do in next version.
> > > >
> > > > >
> > > > > > +#define WORKER_MODEL_GENERIC "generic"
> > > > >
> > > > > Generic is a very overloaded term. Use pipeline here i.e
> > > > > RTE_GRAPH_MODEL_PIPELINE
> > > >
> > > > Actually, it's not a purely pipeline mode. I prefer to change to hybrid.
> > >
> > > Hybrid is very overloaded term, and it will be confusing
> > > (considering there will be new models in future).
> > > Please pick a word that really express the model working.
> > >
> >
> > In this case, the path is Node0 -> Node1 -> Node2 -> Node3,
> > and Node1 and Node3 are bound to one core.
> >
> > Our model offers the ability to dispatch between cores.
> >
> > Do you think RTE_GRAPH_MODEL_DISPATCH is a good name?
> 
> Some names that I can think of:
> 
> // MCORE->MULTI CORE
> 
> RTE_GRAPH_MODEL_MCORE_PIPELINE
> or
> RTE_GRAPH_MODEL_MCORE_DISPATCH
> or
> RTE_GRAPH_MODEL_MCORE_RING
> or
> RTE_GRAPH_MODEL_MULTI_CORE
> 

Thanks, I will use RTE_GRAPH_MODEL_MCORE_DISPATCH as the name.

> >
> > + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
> > '  Core #0   '     '  Core #1       Core #1   '     '  Core #2   '
> > '            '     '                          '     '            '
> > ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
> > ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
> > ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
> > '            '     '     |                    '     '      ^     '
> > + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
> >                          |                                 |
> >                          + - - - - - - - - - - - - - - - - +
> >
> >
> > > > >
> > > > >
> > > > > > +       RTE_GRAPH_MODEL_GENERIC,
> > > > > > +       RTE_GRAPH_MODEL_MAX,
> > > > >
> > > > > No need for MAX, it will break the ABI for future. See other
> > > > > subsystem such as cryptodev.
> > > >
> > > > Thanks, I will change it.
> > > > >
> > > > > > +};
> > > > >
> > > > > >

^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 00/15] graph enhancement for multi-core dispatch
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (13 preceding siblings ...)
  2023-02-20  0:22 ` [PATCH v1 00/13] graph enhancement for multi-core dispatch Thomas Monjalon
@ 2023-03-24  2:16 ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 01/15] graph: rename rte_graph_work as common Zhirun Yan
                     ` (15 more replies)
  14 siblings, 16 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

V2:
Use git mv to keep git history for patch 1,2.
Use TLS for per-thread local storage about model setting in patch 4.
Change model name to mcore dispatch.
Change API with specific mode name.
Split big patch for patch 8,9.
Fix CI build issues.
Rebase l3fwd-graph example.
Update doc and maintainers files.


Currently, rte_graph supports only the RTC (Run-To-Completion) model,
where each graph runs entirely within a single core.
RTC is one of the typical packet-processing models; others, such as
Pipeline or Hybrid, are not yet supported.

The patch set introduces a 'multicore dispatch' model, a
self-reacting scheme based on core affinity.
The new model enables a cross-core dispatching mechanism which employs a
scheduling work-queue to dispatch streams to the worker cores
associated with the destination node. When the core affinity of the
destination node is the default 'current', the stream continues to be
executed as normal.

Example:
A 3-node graph targeting a 3-core budget.

RTC:
Graph: node-0 -> node-1 -> node-2 @Core0.

+ - - - - - - - - - - - - - - - - - - - - - +
'                Core #0/1/2                '
'                                           '
' +--------+     +---------+     +--------+ '
' | Node-0 | --> | Node-1  | --> | Node-2 | '
' +--------+     +---------+     +--------+ '
'                                           '
+ - - - - - - - - - - - - - - - - - - - - - +

Dispatch:

Graph topo: node-0 -> node-1; node-1 -> node-2; node-2 -> node-3.
Config graph: node-0 @Core0; node-1/3 @Core1; node-2 @Core2.

.. code-block:: diff

    + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
    '  Core #0   '     '          Core #1         '     '  Core #2   '
    '            '     '                          '     '            '
    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
    ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
    '            '     '     |                    '     '      ^     '
    + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
                             |                                 |
                             + - - - - - - - - - - - - - - - - +


The patch set is broken down as below:

1. Split graph worker into common and default model parts.
2. Inline graph node processing to make it reusable.
3. Add set/get APIs to choose the worker model.
4. Introduce a core affinity API to make a node run on a specific worker core.
  (only used in the new model)
5. Introduce a graph affinity API to bind one graph with a specific worker
  core.
6. Introduce a graph clone API.
7. Introduce stream moving with the scheduler work-queue in patches 8~12.
8. Add stats for the new model.
9. Abstract the default graph config process and integrate the new model into
  example/l3fwd-graph. Add new parameters for model choosing.

We can run the new worker model like this:
./dpdk-l3fwd-graph -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="dispatch"

References:
https://static.sched.com/hosted_files/dpdkuserspace22/a6/graph%20introduce%20remote%20dispatch%20for%20mult-core%20scaling.pdf


Zhirun Yan (15):
  graph: rename rte_graph_work as common
  graph: split graph worker into common and default model
  graph: move node process into inline function
  graph: add get/set graph worker model APIs
  graph: introduce graph node core affinity API
  graph: introduce graph bind unbind API
  graph: introduce graph clone API for other worker core
  graph: add struct for stream moving between cores
  graph: introduce stream moving cross cores
  graph: enable create and destroy graph scheduling workqueue
  graph: introduce graph walk by cross-core dispatch
  graph: enable graph multicore dispatch scheduler model
  graph: add stats for cross-core dispatching
  examples/l3fwd-graph: introduce multicore dispatch worker model
  doc: update multicore dispatch model in graph guides

 MAINTAINERS                          |   1 +
 doc/guides/prog_guide/graph_lib.rst  |  59 ++-
 examples/l3fwd-graph/main.c          | 237 +++++++++---
 lib/graph/graph.c                    | 179 +++++++++
 lib/graph/graph_debug.c              |   6 +
 lib/graph/graph_populate.c           |   1 +
 lib/graph/graph_private.h            |  44 +++
 lib/graph/graph_stats.c              |  74 +++-
 lib/graph/meson.build                |   4 +-
 lib/graph/node.c                     |   1 +
 lib/graph/rte_graph.h                |  44 +++
 lib/graph/rte_graph_model_dispatch.c | 179 +++++++++
 lib/graph/rte_graph_model_dispatch.h | 120 ++++++
 lib/graph/rte_graph_model_rtc.h      |  45 +++
 lib/graph/rte_graph_worker.c         |  54 +++
 lib/graph/rte_graph_worker.h         | 498 +------------------------
 lib/graph/rte_graph_worker_common.h  | 536 +++++++++++++++++++++++++++
 lib/graph/version.map                |   8 +
 18 files changed, 1546 insertions(+), 544 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_dispatch.c
 create mode 100644 lib/graph/rte_graph_model_dispatch.h
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker.c
 create mode 100644 lib/graph/rte_graph_worker_common.h

-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 01/15] graph: rename rte_graph_work as common
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 02/15] graph: split graph worker into common and default model Zhirun Yan
                     ` (14 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Rename rte_graph_worker.h to rte_graph_worker_common.h to support
multiple graph worker models.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 MAINTAINERS                                                 | 1 +
 lib/graph/graph_pcap.c                                      | 2 +-
 lib/graph/graph_private.h                                   | 2 +-
 lib/graph/meson.build                                       | 2 +-
 lib/graph/{rte_graph_worker.h => rte_graph_worker_common.h} | 6 +++---
 5 files changed, 7 insertions(+), 6 deletions(-)
 rename lib/graph/{rte_graph_worker.h => rte_graph_worker_common.h} (99%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1a33ad8592..2608afba7b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1715,6 +1715,7 @@ F: doc/guides/prog_guide/bpf_lib.rst
 Graph - EXPERIMENTAL
 M: Jerin Jacob <jerinj@marvell.com>
 M: Kiran Kumar K <kirankumark@marvell.com>
+M: Zhirun Yan <zhirun.yan@intel.com>
 F: lib/graph/
 F: doc/guides/prog_guide/graph_lib.rst
 F: app/test/test_graph*
diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
index 6c43330029..8a220370fa 100644
--- a/lib/graph/graph_pcap.c
+++ b/lib/graph/graph_pcap.c
@@ -10,7 +10,7 @@
 #include <rte_mbuf.h>
 #include <rte_pcapng.h>
 
-#include "rte_graph_worker.h"
+#include "rte_graph_worker_common.h"
 
 #include "graph_pcap_private.h"
 
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 7d1b30b8ac..f08dbc7e9d 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -12,7 +12,7 @@
 #include <rte_eal.h>
 
 #include "rte_graph.h"
-#include "rte_graph_worker.h"
+#include "rte_graph_worker_common.h"
 
 extern int rte_graph_logtype;
 
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 3526d1b5d4..4e2b612ad3 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,6 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
 )
-headers = files('rte_graph.h', 'rte_graph_worker.h')
+headers = files('rte_graph.h', 'rte_graph_worker_common.h')
 
 deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker_common.h
similarity index 99%
rename from lib/graph/rte_graph_worker.h
rename to lib/graph/rte_graph_worker_common.h
index 438595b15c..0bad2938f3 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -2,8 +2,8 @@
  * Copyright(C) 2020 Marvell International Ltd.
  */
 
-#ifndef _RTE_GRAPH_WORKER_H_
-#define _RTE_GRAPH_WORKER_H_
+#ifndef _RTE_GRAPH_WORKER_COMMON_H_
+#define _RTE_GRAPH_WORKER_COMMON_H_
 
 /**
  * @file rte_graph_worker.h
@@ -518,4 +518,4 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
 }
 #endif
 
-#endif /* _RTE_GRAPH_WORKER_H_ */
+#endif /* _RTE_GRAPH_WORKER_COMMON_H_ */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 02/15] graph: split graph worker into common and default model
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 01/15] graph: rename rte_graph_work as common Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 03/15] graph: move node process into inline function Zhirun Yan
                     ` (13 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

To support multiple graph worker models, split the graph worker into
common and default parts. Name the current walk function
rte_graph_walk_rtc, since the default model is RTC (Run-To-Completion).

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_pcap.c              |  2 +-
 lib/graph/graph_private.h           |  2 +-
 lib/graph/meson.build               |  2 +-
 lib/graph/rte_graph_model_rtc.h     | 61 +++++++++++++++++++++++++++++
 lib/graph/rte_graph_worker.h        | 34 ++++++++++++++++
 lib/graph/rte_graph_worker_common.h | 57 ---------------------------
 6 files changed, 98 insertions(+), 60 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker.h

diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
index 8a220370fa..6c43330029 100644
--- a/lib/graph/graph_pcap.c
+++ b/lib/graph/graph_pcap.c
@@ -10,7 +10,7 @@
 #include <rte_mbuf.h>
 #include <rte_pcapng.h>
 
-#include "rte_graph_worker_common.h"
+#include "rte_graph_worker.h"
 
 #include "graph_pcap_private.h"
 
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index f08dbc7e9d..7d1b30b8ac 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -12,7 +12,7 @@
 #include <rte_eal.h>
 
 #include "rte_graph.h"
-#include "rte_graph_worker_common.h"
+#include "rte_graph_worker.h"
 
 extern int rte_graph_logtype;
 
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 4e2b612ad3..3526d1b5d4 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,6 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
 )
-headers = files('rte_graph.h', 'rte_graph_worker_common.h')
+headers = files('rte_graph.h', 'rte_graph_worker.h')
 
 deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
new file mode 100644
index 0000000000..665560f831
--- /dev/null
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "rte_graph_worker_common.h"
+
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+static inline void
+rte_graph_walk_rtc(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	/*
+	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
+	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
+	 * in a circular buffer fashion.
+	 *
+	 *	+-----+ <= cir_start - head [number of source nodes]
+	 *	|     |
+	 *	| ... | <= source nodes
+	 *	|     |
+	 *	+-----+ <= cir_start [head = 0] [tail = 0]
+	 *	|     |
+	 *	| ... | <= pending streams
+	 *	|     |
+	 *	+-----+ <= cir_start + mask
+	 */
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		objs = node->objs;
+		rte_prefetch0(objs);
+
+		if (rte_graph_has_stats_feature()) {
+			start = rte_rdtsc();
+			rc = node->process(graph, node, objs, node->idx);
+			node->total_cycles += rte_rdtsc() - start;
+			node->total_calls++;
+			node->total_objs += rc;
+		} else {
+			node->process(graph, node, objs, node->idx);
+		}
+			node->idx = 0;
+			head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+	graph->tail = 0;
+}
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
new file mode 100644
index 0000000000..7ea18ba80a
--- /dev/null
+++ b/lib/graph/rte_graph_worker.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#ifndef _RTE_GRAPH_WORKER_H_
+#define _RTE_GRAPH_WORKER_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "rte_graph_model_rtc.h"
+
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk(struct rte_graph *graph)
+{
+	rte_graph_walk_rtc(graph);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_WORKER_H_ */
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 0bad2938f3..b58f8f6947 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -128,63 +128,6 @@ __rte_experimental
 void __rte_node_stream_alloc_size(struct rte_graph *graph,
 				  struct rte_node *node, uint16_t req_size);
 
-/**
- * Perform graph walk on the circular buffer and invoke the process function
- * of the nodes and collect the stats.
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup function.
- *
- * @see rte_graph_lookup()
- */
-__rte_experimental
-static inline void
-rte_graph_walk(struct rte_graph *graph)
-{
-	const rte_graph_off_t *cir_start = graph->cir_start;
-	const rte_node_t mask = graph->cir_mask;
-	uint32_t head = graph->head;
-	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
-
-	/*
-	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
-	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
-	 * in a circular buffer fashion.
-	 *
-	 *	+-----+ <= cir_start - head [number of source nodes]
-	 *	|     |
-	 *	| ... | <= source nodes
-	 *	|     |
-	 *	+-----+ <= cir_start [head = 0] [tail = 0]
-	 *	|     |
-	 *	| ... | <= pending streams
-	 *	|     |
-	 *	+-----+ <= cir_start + mask
-	 */
-	while (likely(head != graph->tail)) {
-		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-		node->idx = 0;
-		head = likely((int32_t)head > 0) ? head & mask : head;
-	}
-	graph->tail = 0;
-}
-
 /* Fast path helper functions */
 
 /**
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 03/15] graph: move node process into inline function
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 01/15] graph: rename rte_graph_work as common Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 02/15] graph: split graph worker into common and default model Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 04/15] graph: add get/set graph worker model APIs Zhirun Yan
                     ` (12 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

The node process logic is a single, reusable block; move it into an
inline function so it can be shared by multiple worker models.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_rtc.h     | 20 ++---------------
 lib/graph/rte_graph_worker_common.h | 33 +++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
index 665560f831..0dcb7151e9 100644
--- a/lib/graph/rte_graph_model_rtc.h
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -20,9 +20,6 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	const rte_node_t mask = graph->cir_mask;
 	uint32_t head = graph->head;
 	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
 
 	/*
 	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
@@ -41,21 +38,8 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	 */
 	while (likely(head != graph->tail)) {
 		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-			node->idx = 0;
-			head = likely((int32_t)head > 0) ? head & mask : head;
+		__rte_node_process(graph, node);
+		head = likely((int32_t)head > 0) ? head & mask : head;
 	}
 	graph->tail = 0;
 }
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index b58f8f6947..41428974db 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -130,6 +130,39 @@ void __rte_node_stream_alloc_size(struct rte_graph *graph,
 
 /* Fast path helper functions */
 
+/**
+ * @internal
+ *
+ * Call the process function of the given node and update its stats.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ * @param node
+ *   Pointer to the node object to be processed.
+ */
+static __rte_always_inline void
+__rte_node_process(struct rte_graph *graph, struct rte_node *node)
+{
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+	objs = node->objs;
+	rte_prefetch0(objs);
+
+	if (rte_graph_has_stats_feature()) {
+		start = rte_rdtsc();
+		rc = node->process(graph, node, objs, node->idx);
+		node->total_cycles += rte_rdtsc() - start;
+		node->total_calls++;
+		node->total_objs += rc;
+	} else {
+		node->process(graph, node, objs, node->idx);
+	}
+	node->idx = 0;
+}
+
 /**
  * @internal
  *
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 04/15] graph: add get/set graph worker model APIs
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (2 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 03/15] graph: move node process into inline function Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 05/15] graph: introduce graph node core affinity API Zhirun Yan
                     ` (11 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add new get/set APIs to configure the graph worker model, which
determines the model used during graph walk.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/meson.build               |  1 +
 lib/graph/rte_graph_worker.c        | 54 +++++++++++++++++++++++++++++
 lib/graph/rte_graph_worker_common.h | 16 +++++++++
 lib/graph/version.map               |  3 ++
 4 files changed, 74 insertions(+)
 create mode 100644 lib/graph/rte_graph_worker.c

diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 3526d1b5d4..9fab8243da 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -15,6 +15,7 @@ sources = files(
         'graph_stats.c',
         'graph_populate.c',
         'graph_pcap.c',
+        'rte_graph_worker.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
diff --git a/lib/graph/rte_graph_worker.c b/lib/graph/rte_graph_worker.c
new file mode 100644
index 0000000000..692ee1b0d2
--- /dev/null
+++ b/lib/graph/rte_graph_worker.c
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "rte_graph_worker_common.h"
+
+RTE_DEFINE_PER_LCORE(enum rte_graph_worker_model, worker_model) = RTE_GRAPH_MODEL_DEFAULT;
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ * Set the graph worker model
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ *    before graph running.
+ *
+ * @param model
+ *   The graph worker model to set.
+ *
+ * @return
+ *   0 on success, -1 otherwise.
+ */
+inline int
+rte_graph_worker_model_set(enum rte_graph_worker_model model)
+{
+	if (model >= RTE_GRAPH_MODEL_LIST_END)
+		goto fail;
+
+	RTE_PER_LCORE(worker_model) = model;
+	return 0;
+
+fail:
+	RTE_PER_LCORE(worker_model) = RTE_GRAPH_MODEL_DEFAULT;
+	return -1;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Get the graph worker model
+ *
+ * @note The function takes no parameters; it returns the model
+ *    configured for the calling lcore.
+ *
+ * @return
+ *   Graph worker model on success.
+ */
+inline
+enum rte_graph_worker_model
+rte_graph_worker_model_get(void)
+{
+	return RTE_PER_LCORE(worker_model);
+}
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 41428974db..64d777bd5f 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -95,6 +95,14 @@ struct rte_node {
 	struct rte_node *nodes[] __rte_cache_min_aligned; /**< Next nodes. */
 } __rte_cache_aligned;
 
+/** Graph worker models */
+enum rte_graph_worker_model {
+	RTE_GRAPH_MODEL_DEFAULT,
+	RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT,
+	RTE_GRAPH_MODEL_MCORE_DISPATCH,
+	RTE_GRAPH_MODEL_LIST_END
+};
+
 /**
  * @internal
  *
@@ -490,6 +498,14 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
 	}
 }
 
+__rte_experimental
+enum rte_graph_worker_model
+rte_graph_worker_model_get(void);
+
+__rte_experimental
+int
+rte_graph_worker_model_set(enum rte_graph_worker_model model);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 13b838752d..eea73ec9ca 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -43,5 +43,8 @@ EXPERIMENTAL {
 	rte_node_next_stream_put;
 	rte_node_next_stream_move;
 
+	rte_graph_worker_model_set;
+	rte_graph_worker_model_get;
+
 	local: *;
 };
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 05/15] graph: introduce graph node core affinity API
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (3 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 04/15] graph: add get/set graph worker model APIs Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 06/15] graph: introduce graph bind unbind API Zhirun Yan
                     ` (10 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add an lcore_id field to struct node to hold the affinity core id, and
implement rte_graph_model_dispatch_lcore_affinity_set() to bind a node
to a specific lcore.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_private.h            |  1 +
 lib/graph/meson.build                |  1 +
 lib/graph/node.c                     |  1 +
 lib/graph/rte_graph_model_dispatch.c | 31 ++++++++++++++++++++
 lib/graph/rte_graph_model_dispatch.h | 43 ++++++++++++++++++++++++++++
 lib/graph/version.map                |  2 ++
 6 files changed, 79 insertions(+)
 create mode 100644 lib/graph/rte_graph_model_dispatch.c
 create mode 100644 lib/graph/rte_graph_model_dispatch.h

diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 7d1b30b8ac..409eed3284 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -50,6 +50,7 @@ struct node {
 	STAILQ_ENTRY(node) next;      /**< Next node in the list. */
 	char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
 	uint64_t flags;		      /**< Node configuration flag. */
+	unsigned int lcore_id;        /**< Node runs on the Lcore ID */
 	rte_node_process_t process;   /**< Node process function. */
 	rte_node_init_t init;         /**< Node init function. */
 	rte_node_fini_t fini;	      /**< Node fini function. */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 9fab8243da..c729d984b6 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,7 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
         'rte_graph_worker.c',
+        'rte_graph_model_dispatch.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
diff --git a/lib/graph/node.c b/lib/graph/node.c
index 149414dcd9..339b4a0da5 100644
--- a/lib/graph/node.c
+++ b/lib/graph/node.c
@@ -100,6 +100,7 @@ __rte_node_register(const struct rte_node_register *reg)
 			goto free;
 	}
 
+	node->lcore_id = RTE_MAX_LCORE;
 	node->id = node_id++;
 
 	/* Add the node at tail */
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
new file mode 100644
index 0000000000..4a2f99496d
--- /dev/null
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "graph_private.h"
+#include "rte_graph_model_dispatch.h"
+
+int
+rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned int lcore_id)
+{
+	struct node *node;
+	int ret = -EINVAL;
+
+	if (lcore_id >= RTE_MAX_LCORE)
+		return ret;
+
+	graph_spinlock_lock();
+
+	STAILQ_FOREACH(node, node_list_head_get(), next) {
+		if (strncmp(node->name, name, RTE_NODE_NAMESIZE) == 0) {
+			node->lcore_id = lcore_id;
+			ret = 0;
+			break;
+		}
+	}
+
+	graph_spinlock_unlock();
+
+	return ret;
+}
+
diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
new file mode 100644
index 0000000000..179624e972
--- /dev/null
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#ifndef _RTE_GRAPH_MODEL_DISPATCH_H_
+#define _RTE_GRAPH_MODEL_DISPATCH_H_
+
+/**
+ * @file rte_graph_model_dispatch.h
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * This API allows setting core affinity for a node.
+ */
+#include "rte_graph_worker_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Set lcore affinity with the node.
+ *
+ * @param name
+ *   Valid node name. In the case of the cloned node, the name will be
+ * "parent node name" + "-" + name.
+ * @param lcore_id
+ *   The lcore ID value.
+ *
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
+						unsigned int lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_MODEL_DISPATCH_H_ */
diff --git a/lib/graph/version.map b/lib/graph/version.map
index eea73ec9ca..1f090be74e 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -46,5 +46,7 @@ EXPERIMENTAL {
 	rte_graph_worker_model_set;
 	rte_graph_worker_model_get;
 
+	rte_graph_model_dispatch_lcore_affinity_set;
+
 	local: *;
 };
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 06/15] graph: introduce graph bind unbind API
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (4 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 05/15] graph: introduce graph node core affinity API Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 07/15] graph: introduce graph clone API for other worker core Zhirun Yan
                     ` (9 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add an lcore_id field to struct graph to hold the affinity core id the
graph should run on. Add bind/unbind APIs to set/unset the graph
affinity attribute. lcore_id defaults to RTE_MAX_LCORE, which means the
attribute is disabled.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 59 +++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |  2 ++
 lib/graph/rte_graph.h     | 22 +++++++++++++++
 lib/graph/version.map     |  2 ++
 4 files changed, 85 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index a839a2803b..b39a99aac6 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -254,6 +254,64 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
 	return graph_mem_fixup_node_ctx(graph);
 }
 
+static __rte_always_inline bool
+graph_src_node_avail(struct graph *graph)
+{
+	struct graph_node *graph_node;
+
+	STAILQ_FOREACH(graph_node, &graph->node_list, next)
+		if ((graph_node->node->flags & RTE_NODE_SOURCE_F) &&
+		    (graph_node->node->lcore_id == RTE_MAX_LCORE ||
+		     graph->lcore_id == graph_node->node->lcore_id))
+			return true;
+
+	return false;
+}
+
+int
+rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	if (!rte_lcore_is_enabled(lcore))
+		SET_ERR_JMP(ENOLINK, fail,
+			    "lcore %d not enabled\n",
+			    lcore);
+
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = lcore;
+	graph->socket = rte_lcore_to_socket_id(lcore);
+
+	/* check the availability of source node */
+	if (!graph_src_node_avail(graph))
+		graph->graph->head = 0;
+
+	return 0;
+
+fail:
+	return -rte_errno;
+}
+
+void
+rte_graph_model_dispatch_core_unbind(rte_graph_t id)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = RTE_MAX_LCORE;
+
+fail:
+	return;
+}
+
 struct rte_graph *
 rte_graph_lookup(const char *name)
 {
@@ -340,6 +398,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->lcore_id = RTE_MAX_LCORE;
 	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
 	if (prm->pcap_filename)
 		rte_strscpy(graph->pcap_filename, prm->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 409eed3284..ad1d058945 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -98,6 +98,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	unsigned int lcore_id;
+	/**< Lcore identifier where the graph prefer to run on. */
 	size_t mem_sz;
 	/**< Memory size of the graph. */
 	int socket;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index c9a77297fc..c523809d1f 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -285,6 +285,28 @@ char *rte_graph_id_to_name(rte_graph_t id);
 __rte_experimental
 int rte_graph_export(const char *name, FILE *f);
 
+/**
+ * Bind a graph to a specific lcore.
+ *
+ * @param id
+ *   Graph id to get the pointer of the graph object.
+ * @param lcore
+ *   The lcore the graph will run on.
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore);
+
+/**
+ * Unbind a graph from its lcore.
+ *
+ * @param id
+ *   Graph id to get the pointer of the graph object.
+ */
+__rte_experimental
+void rte_graph_model_dispatch_core_unbind(rte_graph_t id);
+
 /**
  * Get graph object from its name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 1f090be74e..7de6f08f59 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -18,6 +18,8 @@ EXPERIMENTAL {
 	rte_graph_node_get_by_name;
 	rte_graph_obj_dump;
 	rte_graph_walk;
+	rte_graph_model_dispatch_core_bind;
+	rte_graph_model_dispatch_core_unbind;
 
 	rte_graph_cluster_stats_create;
 	rte_graph_cluster_stats_destroy;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 07/15] graph: introduce graph clone API for other worker core
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (5 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 06/15] graph: introduce graph bind unbind API Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 08/15] graph: add struct for stream moving between cores Zhirun Yan
                     ` (8 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch adds a graph API to clone a graph object for a specified
worker core. The new graph also clones all nodes from the parent.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 110 ++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |   2 +
 lib/graph/rte_graph.h     |  20 +++++++
 lib/graph/version.map     |   1 +
 4 files changed, 133 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index b39a99aac6..90eaad0378 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -398,6 +398,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->parent_id = RTE_GRAPH_ID_INVALID;
 	graph->lcore_id = RTE_MAX_LCORE;
 	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
 	if (prm->pcap_filename)
@@ -462,6 +463,115 @@ rte_graph_destroy(rte_graph_t id)
 	return rc;
 }
 
+static int
+clone_name(struct graph *graph, struct graph *parent_graph, const char *name)
+{
+	ssize_t sz, rc;
+
+#define SZ RTE_GRAPH_NAMESIZE
+	rc = rte_strscpy(graph->name, parent_graph->name, SZ);
+	if (rc < 0)
+		goto fail;
+	sz = rc;
+	rc = rte_strscpy(graph->name + sz, "-", RTE_MAX((int16_t)(SZ - sz), 0));
+	if (rc < 0)
+		goto fail;
+	sz += rc;
+	sz = rte_strscpy(graph->name + sz, name, RTE_MAX((int16_t)(SZ - sz), 0));
+	if (sz < 0)
+		goto fail;
+
+	return 0;
+fail:
+	rte_errno = E2BIG;
+	return -rte_errno;
+}
+
+static rte_graph_t
+graph_clone(struct graph *parent_graph, const char *name)
+{
+	struct graph_node *graph_node;
+	struct graph *graph;
+
+	graph_spinlock_lock();
+
+	/* Don't allow to clone a node from a cloned graph */
+	if (parent_graph->parent_id != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, fail, "A cloned graph is not allowed to be cloned");
+
+	/* Create graph object */
+	graph = calloc(1, sizeof(*graph));
+	if (graph == NULL)
+		SET_ERR_JMP(ENOMEM, fail, "Failed to calloc cloned graph object");
+
+	/* Naming ceremony of the new graph. name is node->name + "-" + name */
+	if (clone_name(graph, parent_graph, name))
+		goto free;
+
+	/* Check for existence of duplicate graph */
+	if (rte_graph_from_name(graph->name) != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, free, "Found duplicate graph %s",
+			    graph->name);
+
+	/* Clone nodes from parent graph firstly */
+	STAILQ_INIT(&graph->node_list);
+	STAILQ_FOREACH(graph_node, &parent_graph->node_list, next) {
+		if (graph_node_add(graph, graph_node->node))
+			goto graph_cleanup;
+	}
+
+	/* Just update adjacency list of all nodes in the graph */
+	if (graph_adjacency_list_update(graph))
+		goto graph_cleanup;
+
+	/* Initialize the graph object */
+	graph->src_node_count = parent_graph->src_node_count;
+	graph->node_count = parent_graph->node_count;
+	graph->parent_id = parent_graph->id;
+	graph->lcore_id = parent_graph->lcore_id;
+	graph->socket = parent_graph->socket;
+	graph->id = graph_id;
+
+	/* Allocate the Graph fast path memory and populate the data */
+	if (graph_fp_mem_create(graph))
+		goto graph_cleanup;
+
+	/* Call init() of the all the nodes in the graph */
+	if (graph_node_init(graph))
+		goto graph_mem_destroy;
+
+	/* All good, Lets add the graph to the list */
+	graph_id++;
+	STAILQ_INSERT_TAIL(&graph_list, graph, next);
+
+	graph_spinlock_unlock();
+	return graph->id;
+
+graph_mem_destroy:
+	graph_fp_mem_destroy(graph);
+graph_cleanup:
+	graph_cleanup(graph);
+free:
+	free(graph);
+fail:
+	graph_spinlock_unlock();
+	return RTE_GRAPH_ID_INVALID;
+}
+
+rte_graph_t
+rte_graph_clone(rte_graph_t id, const char *name)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			return graph_clone(graph, name);
+
+fail:
+	return RTE_GRAPH_ID_INVALID;
+}
+
 rte_graph_t
 rte_graph_from_name(const char *name)
 {
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index ad1d058945..d28a5af93e 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -98,6 +98,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	rte_graph_t parent_id;
+	/**< Parent graph identifier. */
 	unsigned int lcore_id;
 	/**< Lcore identifier where the graph prefer to run on. */
 	size_t mem_sz;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index c523809d1f..2f86c17de7 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -247,6 +247,26 @@ rte_graph_t rte_graph_create(const char *name, struct rte_graph_param *prm);
 __rte_experimental
 int rte_graph_destroy(rte_graph_t id);
 
+/**
+ * Clone Graph.
+ *
+ * Clone a graph from a static graph (created with rte_graph_create()). All
+ * cloned graphs attached to the parent graph MUST be destroyed together due
+ * to a fast-path scheduling design limitation (stop ALL graph walks first).
+ *
+ * @param id
+ *   Static graph id to clone from.
+ * @param name
+ *   Name of the new graph. The library prepends the parent graph name to the
+ * user-specified name. The final graph name will be,
+ * "parent graph name" + "-" + name.
+ *
+ * @return
+ *   Valid graph id on success, RTE_GRAPH_ID_INVALID otherwise.
+ */
+__rte_experimental
+rte_graph_t rte_graph_clone(rte_graph_t id, const char *name);
+
 /**
  * Get graph id from graph name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 7de6f08f59..aaa86f66ed 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -7,6 +7,7 @@ EXPERIMENTAL {
 
 	rte_graph_create;
 	rte_graph_destroy;
+	rte_graph_clone;
 	rte_graph_dump;
 	rte_graph_export;
 	rte_graph_from_name;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 08/15] graph: add struct for stream moving between cores
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (6 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 07/15] graph: introduce graph clone API for other worker core Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 09/15] graph: introduce stream moving cross cores Zhirun Yan
                     ` (7 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add struct graph_sched_wq_node to hold the stream carried by a graph
scheduling workqueue node.
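The structure packs a node offset, a count, and up to one burst of object pointers per workqueue entry. A minimal stand-alone sketch of that packing (plain C, no DPDK dependencies; `wq_entry` and `wq_entry_fill` are illustrative names, and 256 stands in for RTE_GRAPH_BURST_SIZE):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BURST_SIZE 256 /* stand-in for RTE_GRAPH_BURST_SIZE */

/* Simplified stand-in for struct graph_sched_wq_node: one entry carries
 * at most one burst of object pointers plus the byte offset of the
 * destination node inside the receiving graph's fast-path memory. */
struct wq_entry {
	uint32_t node_off;      /* offset of the target node */
	uint16_t nb_objs;       /* number of valid pointers in objs[] */
	void *objs[BURST_SIZE]; /* the stream payload */
};

/* Pack at most BURST_SIZE pointers from a node's pending stream into
 * one workqueue entry; returns how many pointers were consumed. */
static uint16_t
wq_entry_fill(struct wq_entry *e, uint32_t node_off,
	      void *const *stream, uint16_t n)
{
	uint16_t size = n < BURST_SIZE ? n : BURST_SIZE;

	e->node_off = node_off;
	e->nb_objs = size;
	memcpy(e->objs, stream, (size_t)size * sizeof(void *));
	return size;
}
```

A stream larger than one burst therefore spans several entries, which is exactly why the enqueue path in a later patch loops until the node's pending index drains.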

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c                   |  1 +
 lib/graph/graph_populate.c          |  1 +
 lib/graph/graph_private.h           | 12 ++++++++++++
 lib/graph/rte_graph_worker_common.h | 21 +++++++++++++++++++++
 4 files changed, 35 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 90eaad0378..dd3d69dbf7 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -284,6 +284,7 @@ rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore)
 			break;
 
 	graph->lcore_id = lcore;
+	graph->graph->lcore_id = graph->lcore_id;
 	graph->socket = rte_lcore_to_socket_id(lcore);
 
 	/* check the availability of source node */
diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
index 2c0844ce92..7dcf1420c1 100644
--- a/lib/graph/graph_populate.c
+++ b/lib/graph/graph_populate.c
@@ -89,6 +89,7 @@ graph_nodes_populate(struct graph *_graph)
 		}
 		node->id = graph_node->node->id;
 		node->parent_id = pid;
+		node->lcore_id = graph_node->node->lcore_id;
 		nb_edges = graph_node->node->nb_edges;
 		node->nb_edges = nb_edges;
 		off += sizeof(struct rte_node);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index d28a5af93e..b66b18ebbc 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -60,6 +60,18 @@ struct node {
 	char next_nodes[][RTE_NODE_NAMESIZE]; /**< Names of next nodes. */
 };
 
+/**
+ * @internal
+ *
+ * Structure that holds the graph scheduling workqueue node stream.
+ * Used for mcore dispatch model.
+ */
+struct graph_sched_wq_node {
+	rte_graph_off_t node_off;
+	uint16_t nb_objs;
+	void *objs[RTE_GRAPH_BURST_SIZE];
+} __rte_cache_aligned;
+
 /**
  * @internal
  *
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 64d777bd5f..70cfde7015 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -29,6 +29,13 @@
 extern "C" {
 #endif
 
+/**
+ * @internal
+ *
+ * Singly-linked list head for graph schedule run-queue.
+ */
+SLIST_HEAD(rte_graph_rq_head, rte_graph);
+
 /**
  * @internal
  *
@@ -40,6 +47,15 @@ struct rte_graph {
 	uint32_t cir_mask;	     /**< Circular buffer wrap around mask. */
 	rte_node_t nb_nodes;	     /**< Number of nodes in the graph. */
 	rte_graph_off_t *cir_start;  /**< Pointer to circular buffer. */
+	/* Graph schedule */
+	struct rte_graph_rq_head *rq __rte_cache_aligned; /* The run-queue */
+	struct rte_graph_rq_head rq_head; /* The head for run-queue list */
+
+	SLIST_ENTRY(rte_graph) rq_next;   /* The next for run-queue list */
+	unsigned int lcore_id;  /**< The graph running Lcore. */
+	struct rte_ring *wq;    /**< The work-queue for pending streams. */
+	struct rte_mempool *mp; /**< The mempool for scheduling streams. */
+	/* Graph schedule area */
 	rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */
 	rte_graph_t id;	/**< Graph identifier. */
 	int socket;	/**< Socket ID where memory is allocated. */
@@ -73,6 +89,11 @@ struct rte_node {
 	/** Original process function when pcap is enabled. */
 	rte_node_process_t original_process;
 
+	RTE_STD_C11
+		union {
+		/* Fast schedule area for mcore dispatch model */
+		unsigned int lcore_id;  /**< Node running lcore. */
+		};
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 09/15] graph: introduce stream moving cross cores
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (7 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 08/15] graph: add struct for stream moving between cores Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 10/15] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
                     ` (6 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the key functions that allow a worker thread to
enqueue and move streams of objects to next nodes running on
different cores.
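The enqueue path (__graph_sched_node_enqueue() below) peels at most one burst per mempool entry and loops until the node's pending stream drains, falling back and keeping the remainder when the mempool runs dry. A minimal stand-alone simulation of that loop (plain C, no DPDK dependencies; `sched_enqueue_sim` is an illustrative name and 256 stands in for RTE_GRAPH_BURST_SIZE):

```c
#include <assert.h>

#define BURST_SIZE 256 /* stand-in for RTE_GRAPH_BURST_SIZE */

/* Simulate the submit_again loop: 'credits' workqueue entries are
 * available from the mempool, each carrying at most one burst.
 * Returns the number of objects left pending in the node, i.e. what
 * the fallback path memmove()s back to the front of node->objs[]. */
static unsigned int
sched_enqueue_sim(unsigned int n_objs, unsigned int credits)
{
	while (n_objs > 0 && credits > 0) {
		unsigned int size =
			n_objs < BURST_SIZE ? n_objs : BURST_SIZE;
		n_objs -= size; /* one wq entry consumed per chunk */
		credits--;
	}
	return n_objs; /* 0 means the enqueue fully succeeded */
}
```

A return of 0 corresponds to __graph_sched_node_enqueue() returning true; a non-zero return models the fallback case counted later as total_sched_fail.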

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_private.h            |  27 +++++
 lib/graph/meson.build                |   2 +-
 lib/graph/rte_graph_model_dispatch.c | 145 +++++++++++++++++++++++++++
 lib/graph/rte_graph_model_dispatch.h |  35 +++++++
 4 files changed, 208 insertions(+), 1 deletion(-)

diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index b66b18ebbc..e1a2a4bfd8 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -366,4 +366,31 @@ void graph_dump(FILE *f, struct graph *g);
  */
 void node_dump(FILE *f, struct node *n);
 
+/**
+ * @internal
+ *
+ * Create the graph schedule work queue. All cloned graphs attached to the
+ * parent graph MUST be destroyed together (fast-schedule design limitation).
+ *
+ * @param _graph
+ *   The graph object
+ * @param _parent_graph
+ *   The parent graph object which holds the run-queue head.
+ *
+ * @return
+ *   - 0: Success.
+ *   - <0: Graph schedule work queue related error.
+ */
+int graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph);
+
+/**
+ * @internal
+ *
+ * Destroy the graph schedule work queue.
+ *
+ * @param _graph
+ *   The graph object
+ */
+void graph_sched_wq_destroy(struct graph *_graph);
+
 #endif /* _RTE_GRAPH_PRIVATE_H_ */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index c729d984b6..e21affa280 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -20,4 +20,4 @@ sources = files(
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
-deps += ['eal', 'pcapng']
+deps += ['eal', 'pcapng', 'mempool', 'ring']
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
index 4a2f99496d..b46dd156ac 100644
--- a/lib/graph/rte_graph_model_dispatch.c
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -5,6 +5,151 @@
 #include "graph_private.h"
 #include "rte_graph_model_dispatch.h"
 
+int
+graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph)
+{
+	struct rte_graph *parent_graph = _parent_graph->graph;
+	struct rte_graph *graph = _graph->graph;
+	unsigned int wq_size;
+
+	wq_size = GRAPH_SCHED_WQ_SIZE(graph->nb_nodes);
+	wq_size = rte_align32pow2(wq_size + 1);
+
+	graph->wq = rte_ring_create(graph->name, wq_size, graph->socket,
+				    RING_F_SC_DEQ);
+	if (graph->wq == NULL)
+		SET_ERR_JMP(EIO, fail, "Failed to allocate graph WQ");
+
+	graph->mp = rte_mempool_create(graph->name, wq_size,
+				       sizeof(struct graph_sched_wq_node),
+				       0, 0, NULL, NULL, NULL, NULL,
+				       graph->socket, MEMPOOL_F_SP_PUT);
+	if (graph->mp == NULL)
+		SET_ERR_JMP(EIO, fail_mp,
+			    "Failed to allocate graph WQ schedule entry");
+
+	graph->lcore_id = _graph->lcore_id;
+
+	if (parent_graph->rq == NULL) {
+		parent_graph->rq = &parent_graph->rq_head;
+		SLIST_INIT(parent_graph->rq);
+	}
+
+	graph->rq = parent_graph->rq;
+	SLIST_INSERT_HEAD(graph->rq, graph, rq_next);
+
+	return 0;
+
+fail_mp:
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+fail:
+	return -rte_errno;
+}
+
+void
+graph_sched_wq_destroy(struct graph *_graph)
+{
+	struct rte_graph *graph = _graph->graph;
+
+	if (graph == NULL)
+		return;
+
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+
+	rte_mempool_free(graph->mp);
+	graph->mp = NULL;
+}
+
+static __rte_always_inline bool
+__graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	uint16_t off = 0;
+	uint16_t size;
+
+submit_again:
+	if (rte_mempool_get(graph->mp, (void **)&wq_node) < 0)
+		goto fallback;
+
+	size = RTE_MIN(node->idx, RTE_DIM(wq_node->objs));
+	wq_node->node_off = node->off;
+	wq_node->nb_objs = size;
+	rte_memcpy(wq_node->objs, &node->objs[off], size * sizeof(void *));
+
+	while (rte_ring_mp_enqueue_bulk_elem(graph->wq, (void *)&wq_node,
+					  sizeof(wq_node), 1, NULL) == 0)
+		rte_pause();
+
+	off += size;
+	node->idx -= size;
+	if (node->idx > 0)
+		goto submit_again;
+
+	return true;
+
+fallback:
+	if (off != 0)
+		memmove(&node->objs[0], &node->objs[off],
+			node->idx * sizeof(void *));
+
+	return false;
+}
+
+bool __rte_noinline
+__rte_graph_sched_node_enqueue(struct rte_node *node,
+			       struct rte_graph_rq_head *rq)
+{
+	const unsigned int lcore_id = node->lcore_id;
+	struct rte_graph *graph;
+
+	SLIST_FOREACH(graph, rq, rq_next)
+		if (graph->lcore_id == lcore_id)
+			break;
+
+	return graph != NULL ? __graph_sched_node_enqueue(node, graph) : false;
+}
+
+void __rte_noinline
+__rte_graph_sched_wq_process(struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	struct rte_mempool *mp = graph->mp;
+	struct rte_ring *wq = graph->wq;
+	uint16_t idx, free_space;
+	struct rte_node *node;
+	unsigned int i, n;
+	struct graph_sched_wq_node *wq_nodes[32];
+
+	n = rte_ring_sc_dequeue_burst_elem(wq, wq_nodes, sizeof(wq_nodes[0]),
+					   RTE_DIM(wq_nodes), NULL);
+	if (n == 0)
+		return;
+
+	for (i = 0; i < n; i++) {
+		wq_node = wq_nodes[i];
+		node = RTE_PTR_ADD(graph, wq_node->node_off);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		idx = node->idx;
+		free_space = node->size - idx;
+
+		if (unlikely(free_space < wq_node->nb_objs))
+			__rte_node_stream_alloc_size(graph, node, node->size + wq_node->nb_objs);
+
+		memmove(&node->objs[idx], wq_node->objs, wq_node->nb_objs * sizeof(void *));
+		memset(wq_node->objs, 0, wq_node->nb_objs * sizeof(void *));
+		node->idx = idx + wq_node->nb_objs;
+
+		__rte_node_process(graph, node);
+
+		wq_node->nb_objs = 0;
+		node->idx = 0;
+	}
+
+	rte_mempool_put_bulk(mp, (void **)wq_nodes, n);
+}
+
 int
 rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned int lcore_id)
 {
diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
index 179624e972..7cbdf2fdcf 100644
--- a/lib/graph/rte_graph_model_dispatch.h
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -14,12 +14,47 @@
  *
  * This API allows to set core affinity with the node.
  */
+#include <rte_errno.h>
+#include <rte_mempool.h>
+#include <rte_memzone.h>
+#include <rte_ring.h>
+
 #include "rte_graph_worker_common.h"
 
 #ifdef __cplusplus
 extern "C" {
 #endif
 
+#define GRAPH_SCHED_WQ_SIZE_MULTIPLIER  8
+#define GRAPH_SCHED_WQ_SIZE(nb_nodes)   \
+	((typeof(nb_nodes))((nb_nodes) * GRAPH_SCHED_WQ_SIZE_MULTIPLIER))
+
+/**
+ * @internal
+ *
+ * Schedule the node to the right graph's work queue.
+ *
+ * @param node
+ *   Pointer to the scheduled node object.
+ * @param rq
+ *   Pointer to the scheduled run-queue for all graphs.
+ *
+ * @return
+ *   True on success, false otherwise.
+ */
+bool __rte_graph_sched_node_enqueue(struct rte_node *node,
+				    struct rte_graph_rq_head *rq);
+
+/**
+ * @internal
+ *
+ * Process all nodes (streams) in the graph's work queue.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ */
+void __rte_noinline __rte_graph_sched_wq_process(struct rte_graph *graph);
+
 /**
  * Set lcore affinity with the node.
  *
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 10/15] graph: enable create and destroy graph scheduling workqueue
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (8 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 09/15] graph: introduce stream moving cross cores Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 11/15] graph: introduce graph walk by cross-core dispatch Zhirun Yan
                     ` (5 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch hooks the creation and destruction of the scheduling
workqueue into the common graph operations.
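The work queue created here via graph_sched_wq_create() (added in the previous patch) is sized as GRAPH_SCHED_WQ_SIZE(nb_nodes), i.e. nb_nodes * 8, then bumped by one and rounded up to a power of two because an rte_ring of size N holds at most N - 1 entries. A stand-alone sketch of that sizing (plain C, no DPDK dependencies; `align32pow2` mirrors rte_align32pow2):

```c
#include <assert.h>
#include <stdint.h>

#define WQ_SIZE_MULTIPLIER 8 /* GRAPH_SCHED_WQ_SIZE_MULTIPLIER */

/* Round up to the next power of two, like rte_align32pow2(). */
static uint32_t
align32pow2(uint32_t x)
{
	x--;
	x |= x >> 1;
	x |= x >> 2;
	x |= x >> 4;
	x |= x >> 8;
	x |= x >> 16;
	return x + 1;
}

/* Ring size for a graph's scheduling work queue: one usable slot is
 * lost to the ring's full/empty distinction, hence the +1 first. */
static uint32_t
wq_ring_size(uint32_t nb_nodes)
{
	return align32pow2(nb_nodes * WQ_SIZE_MULTIPLIER + 1);
}
```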

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index dd3d69dbf7..1f1ee9b622 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -443,6 +443,10 @@ rte_graph_destroy(rte_graph_t id)
 	while (graph != NULL) {
 		tmp = STAILQ_NEXT(graph, next);
 		if (graph->id == id) {
+			/* Destroy the schedule work queue if present */
+			if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+				graph_sched_wq_destroy(graph);
+
 			/* Call fini() of the all the nodes in the graph */
 			graph_node_fini(graph);
 			/* Destroy graph fast path memory */
@@ -537,6 +541,11 @@ graph_clone(struct graph *parent_graph, const char *name)
 	if (graph_fp_mem_create(graph))
 		goto graph_cleanup;
 
+	/* Create the graph schedule work queue */
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH &&
+	    graph_sched_wq_create(graph, parent_graph))
+		goto graph_mem_destroy;
+
 	/* Call init() of the all the nodes in the graph */
 	if (graph_node_init(graph))
 		goto graph_mem_destroy;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 11/15] graph: introduce graph walk by cross-core dispatch
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (9 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 10/15] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 12/15] graph: enable graph multicore dispatch scheduler model Zhirun Yan
                     ` (4 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the task scheduler mechanism that enables
dispatching tasks to other worker cores. Currently, there is only a
local work queue for one graph to walk. We introduce a scheduler work
queue on each worker core for dispatching tasks. The walk processes
the scheduler work queue first, then handles the local work queue.
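The per-node decision inside the walk boils down to: dispatch a node to another core's work queue only when the node carries an explicit lcore affinity that differs from the walking graph's lcore. A minimal stand-alone sketch of that predicate (plain C, no DPDK dependencies; 128 stands in for RTE_MAX_LCORE, the "no affinity" sentinel):

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LCORE 128 /* stand-in for RTE_MAX_LCORE (unbound sentinel) */

/* Mirrors the dispatch condition in rte_graph_walk_mcore_dispatch():
 * a node with no affinity, or bound to the current core, is processed
 * locally; otherwise it is enqueued to the owning core's work queue. */
static bool
should_dispatch(unsigned int node_lcore, unsigned int graph_lcore)
{
	return node_lcore != MAX_LCORE && node_lcore != graph_lcore;
}
```

Note that even when this predicate is true, the real walk still processes the node locally if the run-queue lookup or the enqueue itself fails, so forward progress is preserved.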

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_dispatch.h | 42 ++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
index 7cbdf2fdcf..764c4ecfd0 100644
--- a/lib/graph/rte_graph_model_dispatch.h
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -71,6 +71,48 @@ __rte_experimental
 int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
 						unsigned int lcore_id);
 
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk_mcore_dispatch(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+
+	if (graph->wq != NULL)
+		__rte_graph_sched_wq_process(graph);
+
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+
+		/* Skip the source nodes which are not bound to the current worker */
+		if ((int32_t)head < 0 && node->lcore_id != graph->lcore_id)
+			continue;
+
+		/* Schedule the node until all task/objs are done */
+		if (node->lcore_id != RTE_MAX_LCORE &&
+		    graph->lcore_id != node->lcore_id && graph->rq != NULL &&
+		    __rte_graph_sched_node_enqueue(node, graph->rq))
+			continue;
+
+		__rte_node_process(graph, node);
+
+		head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+
+	graph->tail = 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 12/15] graph: enable graph multicore dispatch scheduler model
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (10 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 11/15] graph: introduce graph walk by cross-core dispatch Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 13/15] graph: add stats for cross-core dispatching Zhirun Yan
                     ` (3 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch enables choosing the new scheduler model.
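The selection in rte_graph_walk() is a plain branch on the worker model; a minimal stand-alone sketch of the routing (plain C, no DPDK dependencies; the enum values and `walk_variant` are illustrative):

```c
#include <assert.h>
#include <string.h>

enum worker_model {
	MODEL_DEFAULT = 0, /* stand-in for RTE_GRAPH_MODEL_DEFAULT */
	MODEL_RTC,         /* stand-in for RTE_GRAPH_MODEL_RTC */
	MODEL_MCORE_DISPATCH
};

/* Which walk routine rte_graph_walk() would take for a given model:
 * the default model falls through to the RTC walk. */
static const char *
walk_variant(enum worker_model m)
{
	if (m == MODEL_DEFAULT || m == MODEL_RTC)
		return "rtc";
	if (m == MODEL_MCORE_DISPATCH)
		return "dispatch";
	return "none";
}
```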

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_worker.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index 7ea18ba80a..d608c7513e 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -10,6 +10,7 @@ extern "C" {
 #endif
 
 #include "rte_graph_model_rtc.h"
+#include "rte_graph_model_dispatch.h"
 
 /**
  * Perform graph walk on the circular buffer and invoke the process function
@@ -24,7 +25,13 @@ __rte_experimental
 static inline void
 rte_graph_walk(struct rte_graph *graph)
 {
-	rte_graph_walk_rtc(graph);
+	int model = rte_graph_worker_model_get();
+
+	if (model == RTE_GRAPH_MODEL_DEFAULT ||
+	    model == RTE_GRAPH_MODEL_RTC)
+		rte_graph_walk_rtc(graph);
+	else if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		rte_graph_walk_mcore_dispatch(graph);
 }
 
 #ifdef __cplusplus
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 13/15] graph: add stats for cross-core dispatching
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (11 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 12/15] graph: enable graph multicore dispatch scheduler model Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model Zhirun Yan
                     ` (2 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add stats for cross-core dispatching scheduler if stats collection is
enabled.
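The cluster stats path simply sums each node's dispatch counters, as cluster_node_arregate_stats() does below for the mcore dispatch model. A minimal stand-alone sketch (plain C, no DPDK dependencies; `node_sched_stat` and `aggregate_sched` are illustrative names):

```c
#include <assert.h>
#include <stdint.h>

/* Per-node dispatch counters, mirroring total_sched_objs and
 * total_sched_fail added to struct rte_node in this patch. */
struct node_sched_stat {
	uint64_t sched_objs; /* objects successfully dispatched */
	uint64_t sched_fail; /* objects that could not be dispatched */
};

/* Sum the per-node counters across one stats cluster. */
static struct node_sched_stat
aggregate_sched(const struct node_sched_stat *nodes, int n)
{
	struct node_sched_stat s = {0, 0};
	int i;

	for (i = 0; i < n; i++) {
		s.sched_objs += nodes[i].sched_objs;
		s.sched_fail += nodes[i].sched_fail;
	}
	return s;
}
```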

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_debug.c              |  6 +++
 lib/graph/graph_stats.c              | 74 +++++++++++++++++++++++++---
 lib/graph/rte_graph.h                |  2 +
 lib/graph/rte_graph_model_dispatch.c |  3 ++
 lib/graph/rte_graph_worker_common.h  |  2 +
 5 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/lib/graph/graph_debug.c b/lib/graph/graph_debug.c
index b84412f5dd..7dcf07b080 100644
--- a/lib/graph/graph_debug.c
+++ b/lib/graph/graph_debug.c
@@ -74,6 +74,12 @@ rte_graph_obj_dump(FILE *f, struct rte_graph *g, bool all)
 		fprintf(f, "       size=%d\n", n->size);
 		fprintf(f, "       idx=%d\n", n->idx);
 		fprintf(f, "       total_objs=%" PRId64 "\n", n->total_objs);
+		if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+			fprintf(f, "       total_sched_objs=%" PRId64 "\n",
+				n->total_sched_objs);
+			fprintf(f, "       total_sched_fail=%" PRId64 "\n",
+				n->total_sched_fail);
+		}
 		fprintf(f, "       total_calls=%" PRId64 "\n", n->total_calls);
 		for (i = 0; i < n->nb_edges; i++)
 			fprintf(f, "          edge[%d] <%s>\n", i,
diff --git a/lib/graph/graph_stats.c b/lib/graph/graph_stats.c
index c0140ba922..aa22cc403c 100644
--- a/lib/graph/graph_stats.c
+++ b/lib/graph/graph_stats.c
@@ -40,13 +40,19 @@ struct rte_graph_cluster_stats {
 	struct cluster_node clusters[];
 } __rte_cache_aligned;
 
+#define boarder_model_dispatch()                                                              \
+	fprintf(f, "+-------------------------------+---------------+--------" \
+		   "-------+---------------+---------------+---------------+" \
+		   "---------------+---------------+-" \
+		   "----------+\n")
+
 #define boarder()                                                              \
 	fprintf(f, "+-------------------------------+---------------+--------" \
 		   "-------+---------------+---------------+---------------+-" \
 		   "----------+\n")
 
 static inline void
-print_banner(FILE *f)
+print_banner_default(FILE *f)
 {
 	boarder();
 	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s\n", "|Node", "|calls",
@@ -55,6 +61,27 @@ print_banner(FILE *f)
 	boarder();
 }
 
+static inline void
+print_banner_dispatch(FILE *f)
+{
+	boarder_model_dispatch();
+	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s%-16s%-16s\n",
+		"|Node", "|calls",
+		"|objs", "|sched objs", "|sched fail",
+		"|realloc_count", "|objs/call", "|objs/sec(10E6)",
+		"|cycles/call|");
+	boarder_model_dispatch();
+}
+
+static inline void
+print_banner(FILE *f)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		print_banner_dispatch(f);
+	else
+		print_banner_default(f);
+}
+
 static inline void
 print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 {
@@ -76,11 +103,21 @@ print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 	objs_per_sec = ts_per_hz ? (objs - prev_objs) / ts_per_hz : 0;
 	objs_per_sec /= 1000000;
 
-	fprintf(f,
-		"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
-		"|%-15.3f|%-15.6f|%-11.4f|\n",
-		stat->name, calls, objs, stat->realloc_count, objs_per_call,
-		objs_per_sec, cycles_per_call);
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->sched_objs,
+			stat->sched_fail, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	} else {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	}
 }
 
 static int
@@ -88,13 +125,20 @@ graph_cluster_stats_cb(bool is_first, bool is_last, void *cookie,
 		       const struct rte_graph_cluster_node_stats *stat)
 {
 	FILE *f = cookie;
+	int model;
+
+	model = rte_graph_worker_model_get();
 
 	if (unlikely(is_first))
 		print_banner(f);
 	if (stat->objs)
 		print_node(f, stat);
-	if (unlikely(is_last))
-		boarder();
+	if (unlikely(is_last)) {
+		if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+			boarder_model_dispatch();
+		else
+			boarder();
+	}
 
 	return 0;
 };
@@ -332,13 +376,21 @@ static inline void
 cluster_node_arregate_stats(struct cluster_node *cluster)
 {
 	uint64_t calls = 0, cycles = 0, objs = 0, realloc_count = 0;
+	uint64_t sched_objs = 0, sched_fail = 0;
 	struct rte_graph_cluster_node_stats *stat = &cluster->stat;
 	struct rte_node *node;
 	rte_node_t count;
+	int model;
 
+	model = rte_graph_worker_model_get();
 	for (count = 0; count < cluster->nb_nodes; count++) {
 		node = cluster->nodes[count];
 
+		if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+			sched_objs += node->total_sched_objs;
+			sched_fail += node->total_sched_fail;
+		}
+
 		calls += node->total_calls;
 		objs += node->total_objs;
 		cycles += node->total_cycles;
@@ -348,6 +400,12 @@ cluster_node_arregate_stats(struct cluster_node *cluster)
 	stat->calls = calls;
 	stat->objs = objs;
 	stat->cycles = cycles;
+
+	if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+		stat->sched_objs = sched_objs;
+		stat->sched_fail = sched_fail;
+	}
+
 	stat->ts = rte_get_timer_cycles();
 	stat->realloc_count = realloc_count;
 }
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index 2f86c17de7..7d77a790ac 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -208,6 +208,8 @@ struct rte_graph_cluster_node_stats {
 	uint64_t prev_calls;	/**< Previous number of calls. */
 	uint64_t prev_objs;	/**< Previous number of processed objs. */
 	uint64_t prev_cycles;	/**< Previous number of cycles. */
+	uint64_t sched_objs;	/**< Number of objs scheduled to other cores. */
+	uint64_t sched_fail;	/**< Number of objs that failed to be scheduled. */
 
 	uint64_t realloc_count; /**< Realloc count. */
 
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
index b46dd156ac..4cf00160ea 100644
--- a/lib/graph/rte_graph_model_dispatch.c
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -83,6 +83,7 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		rte_pause();
 
 	off += size;
+	node->total_sched_objs += size;
 	node->idx -= size;
 	if (node->idx > 0)
 		goto submit_again;
@@ -94,6 +95,8 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		memmove(&node->objs[0], &node->objs[off],
 			node->idx * sizeof(void *));
 
+	node->total_sched_fail += node->idx;
+
 	return false;
 }
 
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 70cfde7015..be8508cd83 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -94,6 +94,8 @@ struct rte_node {
 		/* Fast schedule area for mcore dispatch model */
 		unsigned int lcore_id;  /**< Node running lcore. */
 		};
+	uint64_t total_sched_objs; /**< Number of objects scheduled. */
+	uint64_t total_sched_fail; /**< Number of schedule failures. */
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (12 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 13/15] graph: add stats for cross-core dispatching Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 15/15] doc: update multicore dispatch model in graph guides Zhirun Yan
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add a new parameter "model" to choose the dispatch or rtc worker
model. In the dispatch model, the nodes are affinitized to worker
cores successively.

Note:
the current implementation supports only one RX node in the dispatch
model.

./dpdk-l3fwd-graph  -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="dispatch"
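The --model option is parsed by a simple string comparison against the two supported names. A stand-alone sketch of that parsing (plain C, no DPDK dependencies; the real parse_worker_model() below calls rte_exit() on an unknown name instead of returning a sentinel):

```c
#include <assert.h>
#include <string.h>

#define WORKER_MODEL_RTC            "rtc"
#define WORKER_MODEL_MCORE_DISPATCH "dispatch"

enum model {
	MODEL_INVALID = -1,
	MODEL_RTC = 0,
	MODEL_DISPATCH = 1
};

/* Map the --model argument to a worker model. */
static enum model
parse_model_arg(const char *arg)
{
	if (strcmp(arg, WORKER_MODEL_MCORE_DISPATCH) == 0)
		return MODEL_DISPATCH;
	if (strcmp(arg, WORKER_MODEL_RTC) == 0)
		return MODEL_RTC;
	return MODEL_INVALID; /* the real app rte_exit()s here */
}
```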

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 examples/l3fwd-graph/main.c | 237 +++++++++++++++++++++++++++++-------
 1 file changed, 195 insertions(+), 42 deletions(-)

diff --git a/examples/l3fwd-graph/main.c b/examples/l3fwd-graph/main.c
index 5feeab4f0f..cfa78003f4 100644
--- a/examples/l3fwd-graph/main.c
+++ b/examples/l3fwd-graph/main.c
@@ -55,6 +55,9 @@
 
 #define NB_SOCKETS 8
 
+/* Graph module */
+#define WORKER_MODEL_RTC "rtc"
+#define WORKER_MODEL_MCORE_DISPATCH "dispatch"
 /* Static global variables used within this file. */
 static uint16_t nb_rxd = RX_DESC_DEFAULT;
 static uint16_t nb_txd = TX_DESC_DEFAULT;
@@ -88,6 +91,10 @@ struct lcore_rx_queue {
 	char node_name[RTE_NODE_NAMESIZE];
 };
 
+struct model_conf {
+	enum rte_graph_worker_model model;
+};
+
 /* Lcore conf */
 struct lcore_conf {
 	uint16_t n_rx_queue;
@@ -153,6 +160,19 @@ static struct ipv4_l3fwd_lpm_route ipv4_l3fwd_lpm_route_array[] = {
 	{RTE_IPV4(198, 18, 6, 0), 24, 6}, {RTE_IPV4(198, 18, 7, 0), 24, 7},
 };
 
+static int
+check_worker_model_params(void)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH &&
+	    nb_lcore_params > 1) {
+		printf("Exceeded max number of lcore params for remote model: %hu\n",
+		       nb_lcore_params);
+		return -1;
+	}
+
+	return 0;
+}
+
 static int
 check_lcore_params(void)
 {
@@ -276,6 +296,7 @@ print_usage(const char *prgname)
 		"  --eth-dest=X,MM:MM:MM:MM:MM:MM: Ethernet destination for "
 		"port X\n"
 		"  --max-pkt-len PKTLEN: maximum packet length in decimal (64-9600)\n"
+		"  --model NAME: walking model name, dispatch or rtc(by default)\n"
 		"  --no-numa: Disable numa awareness\n"
 		"  --per-port-pool: Use separate buffer pool per port\n"
 		"  --pcap-enable: Enables pcap capture\n"
@@ -318,6 +339,20 @@ parse_max_pkt_len(const char *pktlen)
 	return len;
 }
 
+static int
+parse_worker_model(const char *model)
+{
+	if (strcmp(model, WORKER_MODEL_MCORE_DISPATCH) == 0) {
+		rte_graph_worker_model_set(RTE_GRAPH_MODEL_MCORE_DISPATCH);
+		return RTE_GRAPH_MODEL_MCORE_DISPATCH;
+	} else if (strcmp(model, WORKER_MODEL_RTC) == 0)
+		return RTE_GRAPH_MODEL_RTC;
+
+	rte_exit(EXIT_FAILURE, "Invalid worker model: %s", model);
+
+	return RTE_GRAPH_MODEL_LIST_END;
+}
+
 static int
 parse_portmask(const char *portmask)
 {
@@ -434,6 +469,8 @@ static const char short_options[] = "p:" /* portmask */
 #define CMD_LINE_OPT_PCAP_ENABLE   "pcap-enable"
 #define CMD_LINE_OPT_NUM_PKT_CAP   "pcap-num-cap"
 #define CMD_LINE_OPT_PCAP_FILENAME "pcap-file-name"
+#define CMD_LINE_OPT_WORKER_MODEL  "model"
+
 enum {
 	/* Long options mapped to a short option */
 
@@ -449,6 +486,7 @@ enum {
 	CMD_LINE_OPT_PARSE_PCAP_ENABLE,
 	CMD_LINE_OPT_PARSE_NUM_PKT_CAP,
 	CMD_LINE_OPT_PCAP_FILENAME_CAP,
+	CMD_LINE_OPT_WORKER_MODEL_TYPE,
 };
 
 static const struct option lgopts[] = {
@@ -460,6 +498,7 @@ static const struct option lgopts[] = {
 	{CMD_LINE_OPT_PCAP_ENABLE, 0, 0, CMD_LINE_OPT_PARSE_PCAP_ENABLE},
 	{CMD_LINE_OPT_NUM_PKT_CAP, 1, 0, CMD_LINE_OPT_PARSE_NUM_PKT_CAP},
 	{CMD_LINE_OPT_PCAP_FILENAME, 1, 0, CMD_LINE_OPT_PCAP_FILENAME_CAP},
+	{CMD_LINE_OPT_WORKER_MODEL, 1, 0, CMD_LINE_OPT_WORKER_MODEL_TYPE},
 	{NULL, 0, 0, 0},
 };
 
@@ -551,6 +590,11 @@ parse_args(int argc, char **argv)
 			printf("Pcap file name: %s\n", pcap_filename);
 			break;
 
+		case CMD_LINE_OPT_WORKER_MODEL_TYPE:
+			printf("Use new worker model: %s\n", optarg);
+			parse_worker_model(optarg);
+			break;
+
 		default:
 			print_usage(prgname);
 			return -1;
@@ -726,15 +770,15 @@ print_stats(void)
 static int
 graph_main_loop(void *conf)
 {
+	struct model_conf *mconf = conf;
 	struct lcore_conf *qconf;
 	struct rte_graph *graph;
 	uint32_t lcore_id;
 
-	RTE_SET_USED(conf);
-
 	lcore_id = rte_lcore_id();
 	qconf = &lcore_conf[lcore_id];
 	graph = qconf->graph;
+	rte_graph_worker_model_set(mconf->model);
 
 	if (!graph) {
 		RTE_LOG(INFO, L3FWD_GRAPH, "Lcore %u has nothing to do\n",
@@ -788,6 +832,141 @@ config_port_max_pkt_len(struct rte_eth_conf *conf,
 	return 0;
 }
 
+static void
+graph_config_mcore_dispatch(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	int worker_count = rte_lcore_count() - 1;
+	int main_lcore_id = rte_get_main_lcore();
+	int worker_lcore = main_lcore_id;
+	rte_graph_t main_graph_id = 0;
+	struct rte_node *node_tmp;
+	struct lcore_conf *qconf;
+	struct rte_graph *graph;
+	rte_graph_t graph_id;
+	rte_graph_off_t off;
+	int n_rx_node = 0;
+	rte_node_t count;
+	int i, j;
+	int ret;
+
+	for (j = 0; j < nb_lcore_params; j++) {
+		qconf = &lcore_conf[lcore_params[j].lcore_id];
+		/* Add rx node patterns of all lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			char *node_name = qconf->rx_queue_list[i].node_name;
+
+			graph_conf.node_patterns[nb_patterns + n_rx_node + i] = node_name;
+			n_rx_node++;
+			ret = rte_graph_model_dispatch_lcore_affinity_set(node_name,
+									lcore_params[j].lcore_id);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n", node_name,
+				       lcore_params[j].lcore_id);
+		}
+	}
+
+	graph_conf.nb_node_patterns = nb_patterns + n_rx_node;
+	graph_conf.socket_id = rte_lcore_to_socket_id(main_lcore_id);
+
+	qconf = &lcore_conf[main_lcore_id];
+	snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+		 main_lcore_id);
+
+	/* create main graph */
+	main_graph_id = rte_graph_create(qconf->name, &graph_conf);
+	if (main_graph_id == RTE_GRAPH_ID_INVALID)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_create(): main_graph_id invalid for lcore %u\n",
+			 main_lcore_id);
+
+	qconf->graph_id = main_graph_id;
+	qconf->graph = rte_graph_lookup(qconf->name);
+	/* >8 End of graph initialization. */
+	if (!qconf->graph)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_lookup(): graph %s not found\n",
+			 qconf->name);
+
+	graph = qconf->graph;
+	rte_graph_foreach_node(count, off, graph, node_tmp) {
+		worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+
+		/* Need to set the node Lcore affinity before clone graph for each lcore */
+		if (node_tmp->lcore_id == RTE_MAX_LCORE) {
+			ret = rte_graph_model_dispatch_lcore_affinity_set(node_tmp->name,
+									worker_lcore);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n",
+				       node_tmp->name, worker_lcore);
+		}
+	}
+
+	worker_lcore = main_lcore_id;
+	for (i = 0; i < worker_count; i++) {
+		worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+
+		qconf = &lcore_conf[worker_lcore];
+		snprintf(qconf->name, sizeof(qconf->name), "cloned-%u", worker_lcore);
+		graph_id = rte_graph_clone(main_graph_id, qconf->name);
+		ret = rte_graph_model_dispatch_core_bind(graph_id, worker_lcore);
+		if (ret == 0)
+			printf("bind graph %d to lcore %u\n", graph_id, worker_lcore);
+
+		/* full cloned graph name */
+		snprintf(qconf->name, sizeof(qconf->name), "%s",
+			 rte_graph_id_to_name(graph_id));
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "Failed to lookup graph %s\n",
+				 qconf->name);
+		continue;
+	}
+}
+
+static void
+graph_config_rtc(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	struct lcore_conf *qconf;
+	rte_graph_t graph_id;
+	uint32_t lcore_id;
+	rte_edge_t i;
+
+	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+		if (rte_lcore_is_enabled(lcore_id) == 0)
+			continue;
+
+		qconf = &lcore_conf[lcore_id];
+		/* Skip graph creation if no source exists */
+		if (!qconf->n_rx_queue)
+			continue;
+		/* Add rx node patterns of this lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			graph_conf.node_patterns[nb_patterns + i] =
+				qconf->rx_queue_list[i].node_name;
+		}
+		graph_conf.nb_node_patterns = nb_patterns + i;
+		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
+		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+			 lcore_id);
+		graph_id = rte_graph_create(qconf->name, &graph_conf);
+		if (graph_id == RTE_GRAPH_ID_INVALID)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_create(): graph_id invalid for lcore %u\n",
+				 lcore_id);
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		/* >8 End of graph initialization. */
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_lookup(): graph %s not found\n",
+				 qconf->name);
+	}
+}
+
 int
 main(int argc, char **argv)
 {
@@ -808,10 +987,12 @@ main(int argc, char **argv)
 	uint16_t queueid, portid, i;
 	const char **node_patterns;
 	struct lcore_conf *qconf;
+	struct model_conf mconf;
 	uint16_t nb_graphs = 0;
 	uint16_t nb_patterns;
 	uint8_t rewrite_len;
 	uint32_t lcore_id;
+	uint16_t model;
 	int ret;
 
 	/* Init EAL */
@@ -840,6 +1021,9 @@ main(int argc, char **argv)
 	if (check_lcore_params() < 0)
 		rte_exit(EXIT_FAILURE, "check_lcore_params() failed\n");
 
+	if (check_worker_model_params() < 0)
+		rte_exit(EXIT_FAILURE, "check_worker_model_params() failed\n");
+
 	ret = init_lcore_rx_queues();
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "init_lcore_rx_queues() failed\n");
@@ -1079,51 +1263,18 @@ main(int argc, char **argv)
 
 	memset(&graph_conf, 0, sizeof(graph_conf));
 	graph_conf.node_patterns = node_patterns;
+	graph_conf.nb_node_patterns = nb_patterns;
 
 	/* Pcap config */
 	graph_conf.pcap_enable = pcap_trace_enable;
 	graph_conf.num_pkt_to_capture = packet_to_capture;
 	graph_conf.pcap_filename = pcap_filename;
 
-	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
-		rte_graph_t graph_id;
-		rte_edge_t i;
-
-		if (rte_lcore_is_enabled(lcore_id) == 0)
-			continue;
-
-		qconf = &lcore_conf[lcore_id];
-
-		/* Skip graph creation if no source exists */
-		if (!qconf->n_rx_queue)
-			continue;
-
-		/* Add rx node patterns of this lcore */
-		for (i = 0; i < qconf->n_rx_queue; i++) {
-			graph_conf.node_patterns[nb_patterns + i] =
-				qconf->rx_queue_list[i].node_name;
-		}
-
-		graph_conf.nb_node_patterns = nb_patterns + i;
-		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
-
-		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
-			 lcore_id);
-
-		graph_id = rte_graph_create(qconf->name, &graph_conf);
-		if (graph_id == RTE_GRAPH_ID_INVALID)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_create(): graph_id invalid"
-				 " for lcore %u\n", lcore_id);
-
-		qconf->graph_id = graph_id;
-		qconf->graph = rte_graph_lookup(qconf->name);
-		/* >8 End of graph initialization. */
-		if (!qconf->graph)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_lookup(): graph %s not found\n",
-				 qconf->name);
-	}
+	model = rte_graph_worker_model_get();
+	if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		graph_config_mcore_dispatch(graph_conf);
+	else
+		graph_config_rtc(graph_conf);
 
 	memset(&rewrite_data, 0, sizeof(rewrite_data));
 	rewrite_len = sizeof(rewrite_data);
@@ -1174,8 +1325,10 @@ main(int argc, char **argv)
 	}
 	/* >8 End of adding route to ip4 graph infa. */
 
+	mconf.model = model;
 	/* Launch per-lcore init on every worker lcore */
-	rte_eal_mp_remote_launch(graph_main_loop, NULL, SKIP_MAIN);
+	rte_eal_mp_remote_launch(graph_main_loop, &mconf,
+				 SKIP_MAIN);
 
 	/* Accumulate and print stats on main until exit */
 	if (rte_graph_has_stats_feature())
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 15/15] doc: update multicore dispatch model in graph guides
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (13 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Update the graph documentation to introduce the new multicore dispatch model.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 doc/guides/prog_guide/graph_lib.rst | 59 +++++++++++++++++++++++++++--
 1 file changed, 55 insertions(+), 4 deletions(-)

diff --git a/doc/guides/prog_guide/graph_lib.rst b/doc/guides/prog_guide/graph_lib.rst
index 1cfdc86433..e3c0d652e4 100644
--- a/doc/guides/prog_guide/graph_lib.rst
+++ b/doc/guides/prog_guide/graph_lib.rst
@@ -189,14 +189,65 @@ In the above example, A graph object will be created with ethdev Rx
 node of port 0 and queue 0, all ipv4* nodes in the system,
 and ethdev tx node of all ports.
 
-Multicore graph processing
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-In the current graph library implementation, specifically,
-``rte_graph_walk()`` and ``rte_node_enqueue*`` fast path API functions
+Graph model choosing
+~~~~~~~~~~~~~~~~~~~~
+Currently, there are two different walking models. Use
+``rte_graph_worker_model_set()`` to set the walking model.
+
+RTC (Run-To-Completion)
+^^^^^^^^^^^^^^^^^^^^^^^
+This is the default graph walking model. Specifically,
+``rte_graph_walk_rtc()`` and ``rte_node_enqueue*`` fast path API functions
 are designed to work on single-core to have better performance.
 The fast path API works on graph object, So the multi-core graph
 processing strategy would be to create graph object PER WORKER.
 
+Example:
+
+Graph: node-0 -> node-1 -> node-2 @Core0.
+
+.. code-block:: diff
+
+    + - - - - - - - - - - - - - - - - - - - - - +
+    '                  Core #0                  '
+    '                                           '
+    ' +--------+     +---------+     +--------+ '
+    ' | Node-0 | --> | Node-1  | --> | Node-2 | '
+    ' +--------+     +---------+     +--------+ '
+    '                                           '
+    + - - - - - - - - - - - - - - - - - - - - - +
+
+Dispatch model
+^^^^^^^^^^^^^^
+The dispatch model enables a cross-core dispatching mechanism which employs
+a scheduling work-queue to dispatch streams to the worker cores
+associated with the destination node.
+
+Use ``rte_graph_model_dispatch_lcore_affinity_set()`` to set the lcore
+affinity of a node.
+Each worker core will have a copy of the graph. Use ``rte_graph_clone()`` to
+clone the graph for each worker and use ``rte_graph_model_dispatch_core_bind()``
+to bind a graph to a worker core.
+
+Example:
+
+Graph topo: node-0 -> node-1; node-1 -> node-2; node-2 -> node-3.
+Config graph: node-0 @Core0; node-1/3 @Core1; node-2 @Core2.
+
+.. code-block:: diff
+
+    + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
+    '  Core #0   '     '          Core #1         '     '  Core #2   '
+    '            '     '                          '     '            '
+    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
+    ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
+    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
+    '            '     '     |                    '     '      ^     '
+    + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
+                             |                                 |
+                             + - - - - - - - - - - - - - - - - +
+
+
 In fast path
 ~~~~~~~~~~~~
 Typical fast-path code looks like below, where the application
-- 
2.37.2



* [PATCH v3 00/15] graph enhancement for multi-core dispatch
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (14 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 15/15] doc: update multicore dispatch model in graph guides Zhirun Yan
@ 2023-03-29  6:43   ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 01/15] graph: rename rte_graph_work as common Zhirun Yan
                       ` (15 more replies)
  15 siblings, 16 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

V3:
Fix CI build issues with TLS and a typo.

V2:
Use git mv to keep git history.
Use TLS for per-thread local storage.
Change model name to mcore dispatch.
Change API with specific mode name.
Split big patch.
Fix CI issues.
Rebase l3fwd-graph example.
Update doc and maintainers files.


Currently, rte_graph supports the RTC (Run-To-Completion) model, which is
confined to a single core.
RTC is one of the typical packet processing models. Others, such as
Pipeline or Hybrid, lack support.

The patch set introduces a 'multicore dispatch' model selection which
is a self-reacting scheme based on core affinity.
The new model enables a cross-core dispatching mechanism which employs a
scheduling work-queue to dispatch streams to the worker cores
associated with the destination node. When the core affinity of the
destination node is the default 'current', the stream continues to be
executed as normal.

Example:
3-node graph targets 3-core budget

RTC:
Graph: node-0 -> node-1 -> node-2 @Core0.

+ - - - - - - - - - - - - - - - - - - - - - +
'                Core #0/1/2                '
'                                           '
' +--------+     +---------+     +--------+ '
' | Node-0 | --> | Node-1  | --> | Node-2 | '
' +--------+     +---------+     +--------+ '
'                                           '
+ - - - - - - - - - - - - - - - - - - - - - +

Dispatch:

Graph topo: node-0 -> node-1; node-1 -> node-2; node-2 -> node-3.
Config graph: node-0 @Core0; node-1/3 @Core1; node-2 @Core2.


    + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
    '  Core #0   '     '          Core #1         '     '  Core #2   '
    '            '     '                          '     '            '
    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
    ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
    '            '     '     |                    '     '      ^     '
    + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
                             |                                 |
                             + - - - - - - - - - - - - - - - - +


The patch set is broken down as below:

1. Split graph worker into common and default model part.
2. Inline graph node processing to make it reusable.
3. Add set/get APIs to choose worker model.
4. Introduce core affinity API to make a node run on a specific worker core.
  (only used in the new model)
5. Introduce graph affinity API to bind one graph with a specific worker
  core.
6. Introduce graph clone API.
7. Introduce stream moving with scheduler work-queue in patch 8~12.
8. Add stats for new models.
9. Abstract default graph config process and integrate new model into
  example/l3fwd-graph. Add new parameters for model choosing.

The new worker model can be run like this:
./dpdk-l3fwd-graph -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="dispatch"

References:
https://static.sched.com/hosted_files/dpdkuserspace22/a6/graph%20introduce%20remote%20dispatch%20for%20mult-core%20scaling.pdf


Zhirun Yan (15):
  graph: rename rte_graph_work as common
  graph: split graph worker into common and default model
  graph: move node process into inline function
  graph: add get/set graph worker model APIs
  graph: introduce graph node core affinity API
  graph: introduce graph bind unbind API
  graph: introduce graph clone API for other worker core
  graph: add struct for stream moving between cores
  graph: introduce stream moving cross cores
  graph: enable create and destroy graph scheduling workqueue
  graph: introduce graph walk by cross-core dispatch
  graph: enable graph multicore dispatch scheduler model
  graph: add stats for cross-core dispatching
  examples/l3fwd-graph: introduce multicore dispatch worker model
  doc: update multicore dispatch model in graph guides

 MAINTAINERS                          |   1 +
 doc/guides/prog_guide/graph_lib.rst  |  59 ++-
 examples/l3fwd-graph/main.c          | 237 +++++++++---
 lib/graph/graph.c                    | 179 +++++++++
 lib/graph/graph_debug.c              |   6 +
 lib/graph/graph_populate.c           |   1 +
 lib/graph/graph_private.h            |  44 +++
 lib/graph/graph_stats.c              |  74 +++-
 lib/graph/meson.build                |   4 +-
 lib/graph/node.c                     |   1 +
 lib/graph/rte_graph.h                |  44 +++
 lib/graph/rte_graph_model_dispatch.c | 179 +++++++++
 lib/graph/rte_graph_model_dispatch.h | 120 ++++++
 lib/graph/rte_graph_model_rtc.h      |  45 +++
 lib/graph/rte_graph_worker.c         |  54 +++
 lib/graph/rte_graph_worker.h         | 498 +------------------------
 lib/graph/rte_graph_worker_common.h  | 536 +++++++++++++++++++++++++++
 lib/graph/version.map                |   8 +
 18 files changed, 1546 insertions(+), 544 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_dispatch.c
 create mode 100644 lib/graph/rte_graph_model_dispatch.h
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker.c
 create mode 100644 lib/graph/rte_graph_worker_common.h

-- 
2.37.2



* [PATCH v3 01/15] graph: rename rte_graph_work as common
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 02/15] graph: split graph worker into common and default model Zhirun Yan
                       ` (14 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Rename rte_graph_worker.h to rte_graph_worker_common.h to support
multiple graph worker models.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 MAINTAINERS                                                 | 1 +
 lib/graph/graph_pcap.c                                      | 2 +-
 lib/graph/graph_private.h                                   | 2 +-
 lib/graph/meson.build                                       | 2 +-
 lib/graph/{rte_graph_worker.h => rte_graph_worker_common.h} | 6 +++---
 5 files changed, 7 insertions(+), 6 deletions(-)
 rename lib/graph/{rte_graph_worker.h => rte_graph_worker_common.h} (99%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 280058adfc..9d9467dd00 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1714,6 +1714,7 @@ F: doc/guides/prog_guide/bpf_lib.rst
 Graph - EXPERIMENTAL
 M: Jerin Jacob <jerinj@marvell.com>
 M: Kiran Kumar K <kirankumark@marvell.com>
+M: Zhirun Yan <zhirun.yan@intel.com>
 F: lib/graph/
 F: doc/guides/prog_guide/graph_lib.rst
 F: app/test/test_graph*
diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
index 6c43330029..8a220370fa 100644
--- a/lib/graph/graph_pcap.c
+++ b/lib/graph/graph_pcap.c
@@ -10,7 +10,7 @@
 #include <rte_mbuf.h>
 #include <rte_pcapng.h>
 
-#include "rte_graph_worker.h"
+#include "rte_graph_worker_common.h"
 
 #include "graph_pcap_private.h"
 
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 7d1b30b8ac..f08dbc7e9d 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -12,7 +12,7 @@
 #include <rte_eal.h>
 
 #include "rte_graph.h"
-#include "rte_graph_worker.h"
+#include "rte_graph_worker_common.h"
 
 extern int rte_graph_logtype;
 
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 3526d1b5d4..4e2b612ad3 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,6 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
 )
-headers = files('rte_graph.h', 'rte_graph_worker.h')
+headers = files('rte_graph.h', 'rte_graph_worker_common.h')
 
 deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker_common.h
similarity index 99%
rename from lib/graph/rte_graph_worker.h
rename to lib/graph/rte_graph_worker_common.h
index 438595b15c..0bad2938f3 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -2,8 +2,8 @@
  * Copyright(C) 2020 Marvell International Ltd.
  */
 
-#ifndef _RTE_GRAPH_WORKER_H_
-#define _RTE_GRAPH_WORKER_H_
+#ifndef _RTE_GRAPH_WORKER_COMMON_H_
+#define _RTE_GRAPH_WORKER_COMMON_H_
 
 /**
  * @file rte_graph_worker.h
@@ -518,4 +518,4 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
 }
 #endif
 
-#endif /* _RTE_GRAPH_WORKER_H_ */
+#endif /* _RTE_GRAPH_WORKER_COMMON_H_ */
-- 
2.37.2



* [PATCH v3 02/15] graph: split graph worker into common and default model
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 01/15] graph: rename rte_graph_work as common Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 03/15] graph: move node process into inline function Zhirun Yan
                       ` (13 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

To support multiple graph worker models, split the graph worker into
common and default parts. Name the current walk function
rte_graph_model_rtc, since the default model is RTC (Run-To-Completion).

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_pcap.c              |  2 +-
 lib/graph/graph_private.h           |  2 +-
 lib/graph/meson.build               |  2 +-
 lib/graph/rte_graph_model_rtc.h     | 61 +++++++++++++++++++++++++++++
 lib/graph/rte_graph_worker.h        | 34 ++++++++++++++++
 lib/graph/rte_graph_worker_common.h | 57 ---------------------------
 6 files changed, 98 insertions(+), 60 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker.h

diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
index 8a220370fa..6c43330029 100644
--- a/lib/graph/graph_pcap.c
+++ b/lib/graph/graph_pcap.c
@@ -10,7 +10,7 @@
 #include <rte_mbuf.h>
 #include <rte_pcapng.h>
 
-#include "rte_graph_worker_common.h"
+#include "rte_graph_worker.h"
 
 #include "graph_pcap_private.h"
 
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index f08dbc7e9d..7d1b30b8ac 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -12,7 +12,7 @@
 #include <rte_eal.h>
 
 #include "rte_graph.h"
-#include "rte_graph_worker_common.h"
+#include "rte_graph_worker.h"
 
 extern int rte_graph_logtype;
 
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 4e2b612ad3..3526d1b5d4 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,6 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
 )
-headers = files('rte_graph.h', 'rte_graph_worker_common.h')
+headers = files('rte_graph.h', 'rte_graph_worker.h')
 
 deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
new file mode 100644
index 0000000000..665560f831
--- /dev/null
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "rte_graph_worker_common.h"
+
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+static inline void
+rte_graph_walk_rtc(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	/*
+	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
+	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
+	 * in a circular buffer fashion.
+	 *
+	 *	+-----+ <= cir_start - head [number of source nodes]
+	 *	|     |
+	 *	| ... | <= source nodes
+	 *	|     |
+	 *	+-----+ <= cir_start [head = 0] [tail = 0]
+	 *	|     |
+	 *	| ... | <= pending streams
+	 *	|     |
+	 *	+-----+ <= cir_start + mask
+	 */
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		objs = node->objs;
+		rte_prefetch0(objs);
+
+		if (rte_graph_has_stats_feature()) {
+			start = rte_rdtsc();
+			rc = node->process(graph, node, objs, node->idx);
+			node->total_cycles += rte_rdtsc() - start;
+			node->total_calls++;
+			node->total_objs += rc;
+		} else {
+			node->process(graph, node, objs, node->idx);
+		}
+			node->idx = 0;
+			head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+	graph->tail = 0;
+}
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
new file mode 100644
index 0000000000..7ea18ba80a
--- /dev/null
+++ b/lib/graph/rte_graph_worker.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#ifndef _RTE_GRAPH_WORKER_H_
+#define _RTE_GRAPH_WORKER_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "rte_graph_model_rtc.h"
+
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk(struct rte_graph *graph)
+{
+	rte_graph_walk_rtc(graph);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_WORKER_H_ */
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 0bad2938f3..b58f8f6947 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -128,63 +128,6 @@ __rte_experimental
 void __rte_node_stream_alloc_size(struct rte_graph *graph,
 				  struct rte_node *node, uint16_t req_size);
 
-/**
- * Perform graph walk on the circular buffer and invoke the process function
- * of the nodes and collect the stats.
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup function.
- *
- * @see rte_graph_lookup()
- */
-__rte_experimental
-static inline void
-rte_graph_walk(struct rte_graph *graph)
-{
-	const rte_graph_off_t *cir_start = graph->cir_start;
-	const rte_node_t mask = graph->cir_mask;
-	uint32_t head = graph->head;
-	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
-
-	/*
-	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
-	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
-	 * in a circular buffer fashion.
-	 *
-	 *	+-----+ <= cir_start - head [number of source nodes]
-	 *	|     |
-	 *	| ... | <= source nodes
-	 *	|     |
-	 *	+-----+ <= cir_start [head = 0] [tail = 0]
-	 *	|     |
-	 *	| ... | <= pending streams
-	 *	|     |
-	 *	+-----+ <= cir_start + mask
-	 */
-	while (likely(head != graph->tail)) {
-		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-		node->idx = 0;
-		head = likely((int32_t)head > 0) ? head & mask : head;
-	}
-	graph->tail = 0;
-}
-
 /* Fast path helper functions */
 
 /**
-- 
2.37.2



* [PATCH v3 03/15] graph: move node process into inline function
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 01/15] graph: rename rte_graph_work as common Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 02/15] graph: split graph worker into common and default model Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29 15:34       ` Stephen Hemminger
  2023-03-29  6:43     ` [PATCH v3 04/15] graph: add get/set graph worker model APIs Zhirun Yan
                       ` (12 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Node processing is a single, reusable block; move the code into an
inline function.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_rtc.h     | 20 ++---------------
 lib/graph/rte_graph_worker_common.h | 33 +++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
index 665560f831..0dcb7151e9 100644
--- a/lib/graph/rte_graph_model_rtc.h
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -20,9 +20,6 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	const rte_node_t mask = graph->cir_mask;
 	uint32_t head = graph->head;
 	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
 
 	/*
 	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
@@ -41,21 +38,8 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	 */
 	while (likely(head != graph->tail)) {
 		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-			node->idx = 0;
-			head = likely((int32_t)head > 0) ? head & mask : head;
+		__rte_node_process(graph, node);
+		head = likely((int32_t)head > 0) ? head & mask : head;
 	}
 	graph->tail = 0;
 }
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index b58f8f6947..41428974db 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -130,6 +130,39 @@ void __rte_node_stream_alloc_size(struct rte_graph *graph,
 
 /* Fast path helper functions */
 
+/**
+ * @internal
+ *
+ * Process the given node: call its process function and update node stats.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ * @param node
+ *   Pointer to node object to be enqueued.
+ */
+static __rte_always_inline void
+__rte_node_process(struct rte_graph *graph, struct rte_node *node)
+{
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+	objs = node->objs;
+	rte_prefetch0(objs);
+
+	if (rte_graph_has_stats_feature()) {
+		start = rte_rdtsc();
+		rc = node->process(graph, node, objs, node->idx);
+		node->total_cycles += rte_rdtsc() - start;
+		node->total_calls++;
+		node->total_objs += rc;
+	} else {
+		node->process(graph, node, objs, node->idx);
+	}
+	node->idx = 0;
+}
+
 /**
  * @internal
  *
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread
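For illustration, a minimal self-contained sketch (standalone C, no DPDK headers; the struct and the stats toggle are simplified stand-ins, not the library's ABI) of what the consolidated __rte_node_process() helper does for one node:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct rte_node (assumption, not the DPDK layout). */
struct fake_node {
	uint16_t (*process)(void **objs, uint16_t nb_objs); /* node callback */
	void *objs[8];        /* pending object stream */
	uint16_t idx;         /* number of pending objects */
	uint64_t total_calls; /* stats, updated only when enabled */
	uint64_t total_objs;
};

static int stats_enabled = 1; /* stand-in for rte_graph_has_stats_feature() */

/* Mirrors __rte_node_process(): invoke the node callback over its pending
 * objects, account stats when enabled, then reset the pending count. */
static void node_process(struct fake_node *node)
{
	uint16_t rc;

	if (stats_enabled) {
		rc = node->process(node->objs, node->idx);
		node->total_calls++;
		node->total_objs += rc;
	} else {
		node->process(node->objs, node->idx);
	}
	node->idx = 0; /* stream fully consumed */
}

/* Trivial node callback: "processes" (counts) every object it is given. */
static uint16_t echo_process(void **objs, uint16_t nb_objs)
{
	(void)objs;
	return nb_objs;
}
```

With this helper, the RTC walk loop above reduces to a pointer fetch plus one call per node, and the dispatch model can reuse the exact same per-node processing.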

* [PATCH v3 04/15] graph: add get/set graph worker model APIs
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (2 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 03/15] graph: move node process into inline function Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29 15:35       ` Stephen Hemminger
  2023-03-29  6:43     ` [PATCH v3 05/15] graph: introduce graph node core affinity API Zhirun Yan
                       ` (11 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add new get/set APIs to configure the graph worker model, which
determines the processing model to be used.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/meson.build               |  1 +
 lib/graph/rte_graph_worker.c        | 54 +++++++++++++++++++++++++++++
 lib/graph/rte_graph_worker_common.h | 19 ++++++++++
 lib/graph/version.map               |  3 ++
 4 files changed, 77 insertions(+)
 create mode 100644 lib/graph/rte_graph_worker.c

diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 3526d1b5d4..9fab8243da 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -15,6 +15,7 @@ sources = files(
         'graph_stats.c',
         'graph_populate.c',
         'graph_pcap.c',
+        'rte_graph_worker.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
diff --git a/lib/graph/rte_graph_worker.c b/lib/graph/rte_graph_worker.c
new file mode 100644
index 0000000000..692ee1b0d2
--- /dev/null
+++ b/lib/graph/rte_graph_worker.c
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "rte_graph_worker_common.h"
+
+RTE_DEFINE_PER_LCORE(enum rte_graph_worker_model, worker_model) = RTE_GRAPH_MODEL_DEFAULT;
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ * Set the graph worker model
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ *    before the graph starts running.
+ *
+ * @param model
+ *   The graph worker model to set.
+ *
+ * @return
+ *   0 on success, -1 otherwise.
+ */
+inline int
+rte_graph_worker_model_set(enum rte_graph_worker_model model)
+{
+	if (model >= RTE_GRAPH_MODEL_LIST_END)
+		goto fail;
+
+	RTE_PER_LCORE(worker_model) = model;
+	return 0;
+
+fail:
+	RTE_PER_LCORE(worker_model) = RTE_GRAPH_MODEL_DEFAULT;
+	return -1;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Get the graph worker model
+ *
+ * @note The model is a per-lcore setting; each worker thread reads its own
+ *   copy.
+ *
+ * @return
+ *   Graph worker model on success.
+ */
+inline
+enum rte_graph_worker_model
+rte_graph_worker_model_get(void)
+{
+	return RTE_PER_LCORE(worker_model);
+}
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 41428974db..1526da6e2c 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -19,6 +19,7 @@
 #include <rte_compat.h>
 #include <rte_common.h>
 #include <rte_cycles.h>
+#include <rte_per_lcore.h>
 #include <rte_prefetch.h>
 #include <rte_memcpy.h>
 #include <rte_memory.h>
@@ -95,6 +96,16 @@ struct rte_node {
 	struct rte_node *nodes[] __rte_cache_min_aligned; /**< Next nodes. */
 } __rte_cache_aligned;
 
+/** Graph worker models */
+enum rte_graph_worker_model {
+	RTE_GRAPH_MODEL_DEFAULT,
+	RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT,
+	RTE_GRAPH_MODEL_MCORE_DISPATCH,
+	RTE_GRAPH_MODEL_LIST_END
+};
+
+RTE_DECLARE_PER_LCORE(enum rte_graph_worker_model, worker_model);
+
 /**
  * @internal
  *
@@ -490,6 +501,14 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
 	}
 }
 
+__rte_experimental
+enum rte_graph_worker_model
+rte_graph_worker_model_get(void);
+
+__rte_experimental
+int
+rte_graph_worker_model_set(enum rte_graph_worker_model model);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 13b838752d..eea73ec9ca 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -43,5 +43,8 @@ EXPERIMENTAL {
 	rte_node_next_stream_put;
 	rte_node_next_stream_move;
 
+	rte_graph_worker_model_set;
+	rte_graph_worker_model_get;
+
 	local: *;
 };
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread
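A minimal usage sketch of the per-thread model selection (standalone C; `__thread` stands in for RTE_DEFINE_PER_LCORE, and the enum mirrors the one added above but is not the DPDK symbol):

```c
#include <assert.h>

/* Mirrors the patch's enum rte_graph_worker_model. */
enum graph_worker_model {
	GRAPH_MODEL_DEFAULT,                    /* RTC is the default model */
	GRAPH_MODEL_RTC = GRAPH_MODEL_DEFAULT,
	GRAPH_MODEL_MCORE_DISPATCH,
	GRAPH_MODEL_LIST_END
};

/* Stand-in for RTE_DEFINE_PER_LCORE(): one copy per worker thread. */
static __thread enum graph_worker_model worker_model = GRAPH_MODEL_DEFAULT;

/* Mirrors rte_graph_worker_model_set(): reject out-of-range values and
 * fall back to the default model, as the patch does. */
static int model_set(enum graph_worker_model model)
{
	if (model >= GRAPH_MODEL_LIST_END) {
		worker_model = GRAPH_MODEL_DEFAULT;
		return -1;
	}
	worker_model = model;
	return 0;
}

/* Mirrors rte_graph_worker_model_get(): read the calling thread's model. */
static enum graph_worker_model model_get(void)
{
	return worker_model;
}
```

The intended call pattern is: each worker lcore calls the set API once before entering its graph walk loop, then the walk dispatches on the get API.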

* [PATCH v3 05/15] graph: introduce graph node core affinity API
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (3 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 04/15] graph: add get/set graph worker model APIs Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 06/15] graph: introduce graph bind unbind API Zhirun Yan
                       ` (10 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add an lcore_id field to the node to hold the affinity core id, and
implement rte_graph_model_dispatch_lcore_affinity_set() to set a node's
affinity to a specific lcore.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_private.h            |  1 +
 lib/graph/meson.build                |  1 +
 lib/graph/node.c                     |  1 +
 lib/graph/rte_graph_model_dispatch.c | 31 ++++++++++++++++++++
 lib/graph/rte_graph_model_dispatch.h | 43 ++++++++++++++++++++++++++++
 lib/graph/version.map                |  2 ++
 6 files changed, 79 insertions(+)
 create mode 100644 lib/graph/rte_graph_model_dispatch.c
 create mode 100644 lib/graph/rte_graph_model_dispatch.h

diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 7d1b30b8ac..409eed3284 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -50,6 +50,7 @@ struct node {
 	STAILQ_ENTRY(node) next;      /**< Next node in the list. */
 	char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
 	uint64_t flags;		      /**< Node configuration flag. */
+	unsigned int lcore_id;        /**< Node runs on the Lcore ID */
 	rte_node_process_t process;   /**< Node process function. */
 	rte_node_init_t init;         /**< Node init function. */
 	rte_node_fini_t fini;	      /**< Node fini function. */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 9fab8243da..c729d984b6 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,7 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
         'rte_graph_worker.c',
+        'rte_graph_model_dispatch.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
diff --git a/lib/graph/node.c b/lib/graph/node.c
index 149414dcd9..339b4a0da5 100644
--- a/lib/graph/node.c
+++ b/lib/graph/node.c
@@ -100,6 +100,7 @@ __rte_node_register(const struct rte_node_register *reg)
 			goto free;
 	}
 
+	node->lcore_id = RTE_MAX_LCORE;
 	node->id = node_id++;
 
 	/* Add the node at tail */
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
new file mode 100644
index 0000000000..4a2f99496d
--- /dev/null
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "graph_private.h"
+#include "rte_graph_model_dispatch.h"
+
+int
+rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned int lcore_id)
+{
+	struct node *node;
+	int ret = -EINVAL;
+
+	if (lcore_id >= RTE_MAX_LCORE)
+		return ret;
+
+	graph_spinlock_lock();
+
+	STAILQ_FOREACH(node, node_list_head_get(), next) {
+		if (strncmp(node->name, name, RTE_NODE_NAMESIZE) == 0) {
+			node->lcore_id = lcore_id;
+			ret = 0;
+			break;
+		}
+	}
+
+	graph_spinlock_unlock();
+
+	return ret;
+}
+
diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
new file mode 100644
index 0000000000..179624e972
--- /dev/null
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#ifndef _RTE_GRAPH_MODEL_DISPATCH_H_
+#define _RTE_GRAPH_MODEL_DISPATCH_H_
+
+/**
+ * @file rte_graph_model_dispatch.h
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * This API allows setting the core affinity of a node.
+ */
+#include "rte_graph_worker_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Set lcore affinity with the node.
+ *
+ * @param name
+ *   Valid node name. In the case of the cloned node, the name will be
+ *   "parent node name" + "-" + name.
+ * @param lcore_id
+ *   The lcore ID value.
+ *
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
+						unsigned int lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_MODEL_DISPATCH_H_ */
diff --git a/lib/graph/version.map b/lib/graph/version.map
index eea73ec9ca..1f090be74e 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -46,5 +46,7 @@ EXPERIMENTAL {
 	rte_graph_worker_model_set;
 	rte_graph_worker_model_get;
 
+	rte_graph_model_dispatch_lcore_affinity_set;
+
 	local: *;
 };
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread
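The affinity-set path above is a name lookup over the node list under the graph spinlock. A standalone sketch of that logic (simplified structs, no locking; the node names and sizes here are illustrative only):

```c
#include <assert.h>
#include <string.h>

#define FAKE_MAX_LCORE 128 /* stand-in for RTE_MAX_LCORE */
#define FAKE_NAMESIZE  64  /* stand-in for RTE_NODE_NAMESIZE */

struct fake_node { char name[FAKE_NAMESIZE]; unsigned int lcore_id; };

/* A tiny stand-in node list; FAKE_MAX_LCORE marks "no affinity". */
static struct fake_node node_list[] = {
	{ "ethdev_rx", FAKE_MAX_LCORE },
	{ "pkt_cls",   FAKE_MAX_LCORE },
};

/* Mirrors rte_graph_model_dispatch_lcore_affinity_set(): validate the
 * lcore, then look the node up by name and record its affinity. */
static int affinity_set(const char *name, unsigned int lcore_id)
{
	size_t i;

	if (lcore_id >= FAKE_MAX_LCORE)
		return -1; /* -EINVAL in the real code */

	for (i = 0; i < sizeof(node_list) / sizeof(node_list[0]); i++) {
		if (strncmp(node_list[i].name, name, FAKE_NAMESIZE) == 0) {
			node_list[i].lcore_id = lcore_id;
			return 0;
		}
	}
	return -1; /* node not found */
}
```

Note the sentinel convention: a node whose lcore_id equals the maximum lcore value has no affinity and may be processed on any core.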

* [PATCH v3 06/15] graph: introduce graph bind unbind API
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (4 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 05/15] graph: introduce graph node core affinity API Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 07/15] graph: introduce graph clone API for other worker core Zhirun Yan
                       ` (9 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add an lcore_id field to the graph to hold the affinity core id the graph
will run on. Add bind/unbind APIs to set/unset the graph affinity
attribute. lcore_id is set to RTE_MAX_LCORE by default, which means the
attribute is disabled.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 59 +++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |  2 ++
 lib/graph/rte_graph.h     | 22 +++++++++++++++
 lib/graph/version.map     |  2 ++
 4 files changed, 85 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index a839a2803b..b39a99aac6 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -254,6 +254,64 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
 	return graph_mem_fixup_node_ctx(graph);
 }
 
+static __rte_always_inline bool
+graph_src_node_avail(struct graph *graph)
+{
+	struct graph_node *graph_node;
+
+	STAILQ_FOREACH(graph_node, &graph->node_list, next)
+		if ((graph_node->node->flags & RTE_NODE_SOURCE_F) &&
+		    (graph_node->node->lcore_id == RTE_MAX_LCORE ||
+		     graph->lcore_id == graph_node->node->lcore_id))
+			return true;
+
+	return false;
+}
+
+int
+rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	if (!rte_lcore_is_enabled(lcore))
+		SET_ERR_JMP(ENOLINK, fail,
+			    "lcore %d not enabled\n",
+			    lcore);
+
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = lcore;
+	graph->socket = rte_lcore_to_socket_id(lcore);
+
+	/* check the availability of source node */
+	if (!graph_src_node_avail(graph))
+		graph->graph->head = 0;
+
+	return 0;
+
+fail:
+	return -rte_errno;
+}
+
+void
+rte_graph_model_dispatch_core_unbind(rte_graph_t id)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = RTE_MAX_LCORE;
+
+fail:
+	return;
+}
+
 struct rte_graph *
 rte_graph_lookup(const char *name)
 {
@@ -340,6 +398,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->lcore_id = RTE_MAX_LCORE;
 	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
 	if (prm->pcap_filename)
 		rte_strscpy(graph->pcap_filename, prm->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 409eed3284..ad1d058945 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -98,6 +98,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	unsigned int lcore_id;
+	/**< Lcore identifier where the graph prefer to run on. */
 	size_t mem_sz;
 	/**< Memory size of the graph. */
 	int socket;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index c9a77297fc..c523809d1f 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -285,6 +285,28 @@ char *rte_graph_id_to_name(rte_graph_t id);
 __rte_experimental
 int rte_graph_export(const char *name, FILE *f);
 
+/**
+ * Bind graph with specific lcore
+ *
+ * @param id
+ *   Graph id to get the pointer of the graph object.
+ * @param lcore
+ *   The lcore the graph will run on.
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore);
+
+/**
+ * Unbind graph with lcore
+ *
+ * @param id
+ *   Graph id to get the pointer of the graph object.
+ */
+__rte_experimental
+void rte_graph_model_dispatch_core_unbind(rte_graph_t id);
+
 /**
  * Get graph object from its name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 1f090be74e..7de6f08f59 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -18,6 +18,8 @@ EXPERIMENTAL {
 	rte_graph_node_get_by_name;
 	rte_graph_obj_dump;
 	rte_graph_walk;
+	rte_graph_model_dispatch_core_bind;
+	rte_graph_model_dispatch_core_unbind;
 
 	rte_graph_cluster_stats_create;
 	rte_graph_cluster_stats_destroy;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread
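The bind path above clears the graph's source-node walk when no source node is allowed to run on the bound lcore. That availability check can be sketched standalone (simplified structs; the flag value is illustrative, not RTE_NODE_SOURCE_F's real value):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_MAX_LCORE 128 /* stand-in for RTE_MAX_LCORE */
#define NODE_SOURCE_F  0x1 /* stand-in for RTE_NODE_SOURCE_F */

struct fake_node { unsigned long flags; unsigned int lcore_id; };

/* Mirrors graph_src_node_avail(): a bound graph keeps its source-node walk
 * only if at least one source node may run on the graph's lcore, i.e. the
 * node has no affinity (FAKE_MAX_LCORE) or its affinity matches. */
static bool src_node_avail(const struct fake_node *nodes, size_t n,
			   unsigned int graph_lcore)
{
	size_t i;

	for (i = 0; i < n; i++)
		if ((nodes[i].flags & NODE_SOURCE_F) &&
		    (nodes[i].lcore_id == FAKE_MAX_LCORE ||
		     nodes[i].lcore_id == graph_lcore))
			return true;

	return false;
}
```

When this returns false, the bind code sets graph->graph->head = 0 so the walk starts with no source nodes; such a graph only processes streams dispatched to it from other cores.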

* [PATCH v3 07/15] graph: introduce graph clone API for other worker core
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (5 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 06/15] graph: introduce graph bind unbind API Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 08/15] graph: add struct for stream moving between cores Zhirun Yan
                       ` (8 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch adds a graph API to clone the graph object for a specified
worker core. The new graph will also clone all nodes.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 110 ++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |   2 +
 lib/graph/rte_graph.h     |  20 +++++++
 lib/graph/version.map     |   1 +
 4 files changed, 133 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index b39a99aac6..90eaad0378 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -398,6 +398,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->parent_id = RTE_GRAPH_ID_INVALID;
 	graph->lcore_id = RTE_MAX_LCORE;
 	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
 	if (prm->pcap_filename)
@@ -462,6 +463,115 @@ rte_graph_destroy(rte_graph_t id)
 	return rc;
 }
 
+static int
+clone_name(struct graph *graph, struct graph *parent_graph, const char *name)
+{
+	ssize_t sz, rc;
+
+#define SZ RTE_GRAPH_NAMESIZE
+	rc = rte_strscpy(graph->name, parent_graph->name, SZ);
+	if (rc < 0)
+		goto fail;
+	sz = rc;
+	rc = rte_strscpy(graph->name + sz, "-", RTE_MAX((int16_t)(SZ - sz), 0));
+	if (rc < 0)
+		goto fail;
+	sz += rc;
+	sz = rte_strscpy(graph->name + sz, name, RTE_MAX((int16_t)(SZ - sz), 0));
+	if (sz < 0)
+		goto fail;
+
+	return 0;
+fail:
+	rte_errno = E2BIG;
+	return -rte_errno;
+}
+
+static rte_graph_t
+graph_clone(struct graph *parent_graph, const char *name)
+{
+	struct graph_node *graph_node;
+	struct graph *graph;
+
+	graph_spinlock_lock();
+
+	/* Don't allow to clone a node from a cloned graph */
+	if (parent_graph->parent_id != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, fail, "A cloned graph is not allowed to be cloned");
+
+	/* Create graph object */
+	graph = calloc(1, sizeof(*graph));
+	if (graph == NULL)
+		SET_ERR_JMP(ENOMEM, fail, "Failed to calloc cloned graph object");
+
+	/* Naming ceremony of the new graph. name is node->name + "-" + name */
+	if (clone_name(graph, parent_graph, name))
+		goto free;
+
+	/* Check for existence of duplicate graph */
+	if (rte_graph_from_name(graph->name) != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, free, "Found duplicate graph %s",
+			    graph->name);
+
+	/* Clone nodes from parent graph firstly */
+	STAILQ_INIT(&graph->node_list);
+	STAILQ_FOREACH(graph_node, &parent_graph->node_list, next) {
+		if (graph_node_add(graph, graph_node->node))
+			goto graph_cleanup;
+	}
+
+	/* Just update adjacency list of all nodes in the graph */
+	if (graph_adjacency_list_update(graph))
+		goto graph_cleanup;
+
+	/* Initialize the graph object */
+	graph->src_node_count = parent_graph->src_node_count;
+	graph->node_count = parent_graph->node_count;
+	graph->parent_id = parent_graph->id;
+	graph->lcore_id = parent_graph->lcore_id;
+	graph->socket = parent_graph->socket;
+	graph->id = graph_id;
+
+	/* Allocate the Graph fast path memory and populate the data */
+	if (graph_fp_mem_create(graph))
+		goto graph_cleanup;
+
+	/* Call init() of the all the nodes in the graph */
+	if (graph_node_init(graph))
+		goto graph_mem_destroy;
+
+	/* All good, Lets add the graph to the list */
+	graph_id++;
+	STAILQ_INSERT_TAIL(&graph_list, graph, next);
+
+	graph_spinlock_unlock();
+	return graph->id;
+
+graph_mem_destroy:
+	graph_fp_mem_destroy(graph);
+graph_cleanup:
+	graph_cleanup(graph);
+free:
+	free(graph);
+fail:
+	graph_spinlock_unlock();
+	return RTE_GRAPH_ID_INVALID;
+}
+
+rte_graph_t
+rte_graph_clone(rte_graph_t id, const char *name)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			return graph_clone(graph, name);
+
+fail:
+	return RTE_GRAPH_ID_INVALID;
+}
+
 rte_graph_t
 rte_graph_from_name(const char *name)
 {
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index ad1d058945..d28a5af93e 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -98,6 +98,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	rte_graph_t parent_id;
+	/**< Parent graph identifier. */
 	unsigned int lcore_id;
 	/**< Lcore identifier where the graph prefer to run on. */
 	size_t mem_sz;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index c523809d1f..2f86c17de7 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -247,6 +247,26 @@ rte_graph_t rte_graph_create(const char *name, struct rte_graph_param *prm);
 __rte_experimental
 int rte_graph_destroy(rte_graph_t id);
 
+/**
+ * Clone Graph.
+ *
+ * Clone a graph from a static graph (a graph created with rte_graph_create()).
+ * All cloned graphs attached to the parent graph MUST be destroyed together,
+ * due to a fast-schedule design limitation (stop ALL graph walks first).
+ *
+ * @param id
+ *   Static graph id to clone from.
+ * @param name
+ *   Name of the new graph. The library prepends the parent graph name to the
+ *   user-specified name. The final graph name will be
+ *   "parent graph name" + "-" + name.
+ *
+ * @return
+ *   Valid graph id on success, RTE_GRAPH_ID_INVALID otherwise.
+ */
+__rte_experimental
+rte_graph_t rte_graph_clone(rte_graph_t id, const char *name);
+
 /**
  * Get graph id from graph name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 7de6f08f59..aaa86f66ed 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -7,6 +7,7 @@ EXPERIMENTAL {
 
 	rte_graph_create;
 	rte_graph_destroy;
+	rte_graph_clone;
 	rte_graph_dump;
 	rte_graph_export;
 	rte_graph_from_name;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread
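The clone naming scheme ("parent" + "-" + name) with its overflow check can be sketched standalone (a deliberately small buffer size is used here to exercise the failure path; the real size is RTE_GRAPH_NAMESIZE and the real code builds the string with rte_strscpy):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FAKE_NAMESIZE 16 /* small stand-in for RTE_GRAPH_NAMESIZE */

/* Mirrors clone_name(): the cloned graph is named "<parent>-<name>", and
 * the call fails (E2BIG in the real code) when the result does not fit. */
static int clone_name(char *dst, const char *parent, const char *name)
{
	int n = snprintf(dst, FAKE_NAMESIZE, "%s-%s", parent, name);

	return (n < 0 || n >= FAKE_NAMESIZE) ? -1 : 0;
}
```

This naming rule is also why the duplicate-graph check runs right after the name is built: two clones of the same parent must pass distinct suffixes.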

* [PATCH v3 08/15] graph: add struct for stream moving between cores
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (6 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 07/15] graph: introduce graph clone API for other worker core Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 09/15] graph: introduce stream moving cross cores Zhirun Yan
                       ` (7 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add struct graph_sched_wq_node to hold the graph scheduling workqueue
node stream, used by the mcore dispatch model.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c                   |  1 +
 lib/graph/graph_populate.c          |  1 +
 lib/graph/graph_private.h           | 12 ++++++++++++
 lib/graph/rte_graph_worker_common.h | 21 +++++++++++++++++++++
 4 files changed, 35 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 90eaad0378..dd3d69dbf7 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -284,6 +284,7 @@ rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore)
 			break;
 
 	graph->lcore_id = lcore;
+	graph->graph->lcore_id = graph->lcore_id;
 	graph->socket = rte_lcore_to_socket_id(lcore);
 
 	/* check the availability of source node */
diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
index 2c0844ce92..7dcf1420c1 100644
--- a/lib/graph/graph_populate.c
+++ b/lib/graph/graph_populate.c
@@ -89,6 +89,7 @@ graph_nodes_populate(struct graph *_graph)
 		}
 		node->id = graph_node->node->id;
 		node->parent_id = pid;
+		node->lcore_id = graph_node->node->lcore_id;
 		nb_edges = graph_node->node->nb_edges;
 		node->nb_edges = nb_edges;
 		off += sizeof(struct rte_node);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index d28a5af93e..b66b18ebbc 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -60,6 +60,18 @@ struct node {
 	char next_nodes[][RTE_NODE_NAMESIZE]; /**< Names of next nodes. */
 };
 
+/**
+ * @internal
+ *
+ * Structure that holds the graph scheduling workqueue node stream.
+ * Used for mcore dispatch model.
+ */
+struct graph_sched_wq_node {
+	rte_graph_off_t node_off;
+	uint16_t nb_objs;
+	void *objs[RTE_GRAPH_BURST_SIZE];
+} __rte_cache_aligned;
+
 /**
  * @internal
  *
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 1526da6e2c..dc0a0b5554 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -30,6 +30,13 @@
 extern "C" {
 #endif
 
+/**
+ * @internal
+ *
+ * Singly-linked list head for graph schedule run-queue.
+ */
+SLIST_HEAD(rte_graph_rq_head, rte_graph);
+
 /**
  * @internal
  *
@@ -41,6 +48,15 @@ struct rte_graph {
 	uint32_t cir_mask;	     /**< Circular buffer wrap around mask. */
 	rte_node_t nb_nodes;	     /**< Number of nodes in the graph. */
 	rte_graph_off_t *cir_start;  /**< Pointer to circular buffer. */
+	/* Graph schedule */
+	struct rte_graph_rq_head *rq __rte_cache_aligned; /* The run-queue */
+	struct rte_graph_rq_head rq_head; /* The head for run-queue list */
+
+	SLIST_ENTRY(rte_graph) rq_next;   /* The next for run-queue list */
+	unsigned int lcore_id;  /**< The graph running Lcore. */
+	struct rte_ring *wq;    /**< The work-queue for pending streams. */
+	struct rte_mempool *mp; /**< The mempool for scheduling streams. */
+	/* Graph schedule area */
 	rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */
 	rte_graph_t id;	/**< Graph identifier. */
 	int socket;	/**< Socket ID where memory is allocated. */
@@ -74,6 +90,11 @@ struct rte_node {
 	/** Original process function when pcap is enabled. */
 	rte_node_process_t original_process;
 
+	RTE_STD_C11
+	union {
+		/* Fast schedule area for mcore dispatch model */
+		unsigned int lcore_id;  /**< Node running lcore. */
+	};
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread
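The workqueue node added here carries at most RTE_GRAPH_BURST_SIZE objects, so a larger pending stream is split into several entries when it is dispatched (as __graph_sched_node_enqueue() does in a later patch of this series). A standalone sketch of that chunking (the burst size here is illustrative, not the real value):

```c
#include <assert.h>

#define BURST 8 /* illustrative; the real bound is RTE_GRAPH_BURST_SIZE */

/* Count how many workqueue entries a stream of nb_objs objects needs when
 * each entry holds at most BURST objects; this is a ceiling division,
 * written as the same loop shape the enqueue path uses. */
static unsigned int wq_entries(unsigned int nb_objs)
{
	unsigned int entries = 0;

	while (nb_objs > 0) {
		unsigned int sz = nb_objs < BURST ? nb_objs : BURST;

		nb_objs -= sz;
		entries++;
	}
	return entries;
}
```

Sizing the per-graph mempool and ring as a multiple of the node count (as the later workqueue-creation patch does) budgets for this one-stream-to-many-entries expansion.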

* [PATCH v3 09/15] graph: introduce stream moving cross cores
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (7 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 08/15] graph: add struct for stream moving between cores Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 10/15] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
                       ` (6 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the key functions that allow a worker thread to
enqueue and move streams of objects to next nodes running on different
cores.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_private.h            |  27 +++++
 lib/graph/meson.build                |   2 +-
 lib/graph/rte_graph_model_dispatch.c | 145 +++++++++++++++++++++++++++
 lib/graph/rte_graph_model_dispatch.h |  35 +++++++
 4 files changed, 208 insertions(+), 1 deletion(-)

diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index b66b18ebbc..e1a2a4bfd8 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -366,4 +366,31 @@ void graph_dump(FILE *f, struct graph *g);
  */
 void node_dump(FILE *f, struct node *n);
 
+/**
+ * @internal
+ *
+ * Create the graph schedule work queue. All cloned graphs attached to the
+ * parent graph MUST be destroyed together due to a fast-schedule design limitation.
+ *
+ * @param _graph
+ *   The graph object
+ * @param _parent_graph
+ *   The parent graph object which holds the run-queue head.
+ *
+ * @return
+ *   - 0: Success.
+ *   - <0: Graph schedule work queue related error.
+ */
+int graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph);
+
+/**
+ * @internal
+ *
+ * Destroy the graph schedule work queue.
+ *
+ * @param _graph
+ *   The graph object
+ */
+void graph_sched_wq_destroy(struct graph *_graph);
+
 #endif /* _RTE_GRAPH_PRIVATE_H_ */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index c729d984b6..e21affa280 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -20,4 +20,4 @@ sources = files(
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
-deps += ['eal', 'pcapng']
+deps += ['eal', 'pcapng', 'mempool', 'ring']
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
index 4a2f99496d..b46dd156ac 100644
--- a/lib/graph/rte_graph_model_dispatch.c
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -5,6 +5,151 @@
 #include "graph_private.h"
 #include "rte_graph_model_dispatch.h"
 
+int
+graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph)
+{
+	struct rte_graph *parent_graph = _parent_graph->graph;
+	struct rte_graph *graph = _graph->graph;
+	unsigned int wq_size;
+
+	wq_size = GRAPH_SCHED_WQ_SIZE(graph->nb_nodes);
+	wq_size = rte_align32pow2(wq_size + 1);
+
+	graph->wq = rte_ring_create(graph->name, wq_size, graph->socket,
+				    RING_F_SC_DEQ);
+	if (graph->wq == NULL)
+		SET_ERR_JMP(EIO, fail, "Failed to allocate graph WQ");
+
+	graph->mp = rte_mempool_create(graph->name, wq_size,
+				       sizeof(struct graph_sched_wq_node),
+				       0, 0, NULL, NULL, NULL, NULL,
+				       graph->socket, MEMPOOL_F_SP_PUT);
+	if (graph->mp == NULL)
+		SET_ERR_JMP(EIO, fail_mp,
+			    "Failed to allocate graph WQ schedule entry");
+
+	graph->lcore_id = _graph->lcore_id;
+
+	if (parent_graph->rq == NULL) {
+		parent_graph->rq = &parent_graph->rq_head;
+		SLIST_INIT(parent_graph->rq);
+	}
+
+	graph->rq = parent_graph->rq;
+	SLIST_INSERT_HEAD(graph->rq, graph, rq_next);
+
+	return 0;
+
+fail_mp:
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+fail:
+	return -rte_errno;
+}
+
+void
+graph_sched_wq_destroy(struct graph *_graph)
+{
+	struct rte_graph *graph = _graph->graph;
+
+	if (graph == NULL)
+		return;
+
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+
+	rte_mempool_free(graph->mp);
+	graph->mp = NULL;
+}
+
+static __rte_always_inline bool
+__graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	uint16_t off = 0;
+	uint16_t size;
+
+submit_again:
+	if (rte_mempool_get(graph->mp, (void **)&wq_node) < 0)
+		goto fallback;
+
+	size = RTE_MIN(node->idx, RTE_DIM(wq_node->objs));
+	wq_node->node_off = node->off;
+	wq_node->nb_objs = size;
+	rte_memcpy(wq_node->objs, &node->objs[off], size * sizeof(void *));
+
+	while (rte_ring_mp_enqueue_bulk_elem(graph->wq, (void *)&wq_node,
+					  sizeof(wq_node), 1, NULL) == 0)
+		rte_pause();
+
+	off += size;
+	node->idx -= size;
+	if (node->idx > 0)
+		goto submit_again;
+
+	return true;
+
+fallback:
+	if (off != 0)
+		memmove(&node->objs[0], &node->objs[off],
+			node->idx * sizeof(void *));
+
+	return false;
+}
+
+bool __rte_noinline
+__rte_graph_sched_node_enqueue(struct rte_node *node,
+			       struct rte_graph_rq_head *rq)
+{
+	const unsigned int lcore_id = node->lcore_id;
+	struct rte_graph *graph;
+
+	SLIST_FOREACH(graph, rq, rq_next)
+		if (graph->lcore_id == lcore_id)
+			break;
+
+	return graph != NULL ? __graph_sched_node_enqueue(node, graph) : false;
+}
+
+void __rte_noinline
+__rte_graph_sched_wq_process(struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	struct rte_mempool *mp = graph->mp;
+	struct rte_ring *wq = graph->wq;
+	uint16_t idx, free_space;
+	struct rte_node *node;
+	unsigned int i, n;
+	struct graph_sched_wq_node *wq_nodes[32];
+
+	n = rte_ring_sc_dequeue_burst_elem(wq, wq_nodes, sizeof(wq_nodes[0]),
+					   RTE_DIM(wq_nodes), NULL);
+	if (n == 0)
+		return;
+
+	for (i = 0; i < n; i++) {
+		wq_node = wq_nodes[i];
+		node = RTE_PTR_ADD(graph, wq_node->node_off);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		idx = node->idx;
+		free_space = node->size - idx;
+
+		if (unlikely(free_space < wq_node->nb_objs))
+			__rte_node_stream_alloc_size(graph, node, node->size + wq_node->nb_objs);
+
+		memmove(&node->objs[idx], wq_node->objs, wq_node->nb_objs * sizeof(void *));
+		memset(wq_node->objs, 0, wq_node->nb_objs * sizeof(void *));
+		node->idx = idx + wq_node->nb_objs;
+
+		__rte_node_process(graph, node);
+
+		wq_node->nb_objs = 0;
+		node->idx = 0;
+	}
+
+	rte_mempool_put_bulk(mp, (void **)wq_nodes, n);
+}
+
 int
 rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned int lcore_id)
 {
diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
index 179624e972..7cbdf2fdcf 100644
--- a/lib/graph/rte_graph_model_dispatch.h
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -14,12 +14,47 @@
  *
  * This API allows to set core affinity with the node.
  */
+#include <rte_errno.h>
+#include <rte_mempool.h>
+#include <rte_memzone.h>
+#include <rte_ring.h>
+
 #include "rte_graph_worker_common.h"
 
 #ifdef __cplusplus
 extern "C" {
 #endif
 
+#define GRAPH_SCHED_WQ_SIZE_MULTIPLIER  8
+#define GRAPH_SCHED_WQ_SIZE(nb_nodes)   \
+	((typeof(nb_nodes))((nb_nodes) * GRAPH_SCHED_WQ_SIZE_MULTIPLIER))
+
+/**
+ * @internal
+ *
+ * Schedule the node to the right graph's work queue.
+ *
+ * @param node
+ *   Pointer to the scheduled node object.
+ * @param rq
+ *   Pointer to the scheduled run-queue for all graphs.
+ *
+ * @return
+ *   True on success, false otherwise.
+ */
+bool __rte_graph_sched_node_enqueue(struct rte_node *node,
+				    struct rte_graph_rq_head *rq);
+
+/**
+ * @internal
+ *
+ * Process all nodes (streams) in the graph's work queue.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ */
+void __rte_noinline __rte_graph_sched_wq_process(struct rte_graph *graph);
+
 /**
  * Set lcore affinity with the node.
  *
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v3 10/15] graph: enable create and destroy graph scheduling workqueue
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (8 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 09/15] graph: introduce stream moving cross cores Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 11/15] graph: introduce graph walk by cross-core dispatch Zhirun Yan
                       ` (5 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch integrates the creation and destruction of the scheduling
workqueue into the common graph operations.
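
The create/destroy pairing wired in below can be sketched, outside of DPDK, as a minimal resource lifecycle with reverse-order cleanup on failure, mirroring the `fail_mp`/`fail` unwinding in `graph_sched_wq_create()`. The `mini_wq_*` names and the `fail_pool` flag are hypothetical, for illustration only:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Illustrative sketch (not DPDK code): each resource acquired during
 * create is released in reverse order on failure, and destroy frees
 * both and resets the pointers so a double destroy is harmless. */
struct mini_wq {
	void *ring; /* stands in for the rte_ring work queue */
	void *pool; /* stands in for the rte_mempool of WQ entries */
};

static int
mini_wq_create(struct mini_wq *wq, int fail_pool)
{
	wq->ring = malloc(64);
	if (wq->ring == NULL)
		goto fail;
	/* fail_pool simulates a mempool allocation failure */
	wq->pool = fail_pool ? NULL : malloc(64);
	if (wq->pool == NULL)
		goto fail_ring;
	return 0;
fail_ring:
	free(wq->ring);
	wq->ring = NULL;
fail:
	return -ENOMEM;
}

static void
mini_wq_destroy(struct mini_wq *wq)
{
	free(wq->ring);
	wq->ring = NULL;
	free(wq->pool);
	wq->pool = NULL;
}
```
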

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index dd3d69dbf7..1f1ee9b622 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -443,6 +443,10 @@ rte_graph_destroy(rte_graph_t id)
 	while (graph != NULL) {
 		tmp = STAILQ_NEXT(graph, next);
 		if (graph->id == id) {
+			/* Destroy the schedule work queue if present */
+			if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+				graph_sched_wq_destroy(graph);
+
 			/* Call fini() of the all the nodes in the graph */
 			graph_node_fini(graph);
 			/* Destroy graph fast path memory */
@@ -537,6 +541,11 @@ graph_clone(struct graph *parent_graph, const char *name)
 	if (graph_fp_mem_create(graph))
 		goto graph_cleanup;
 
+	/* Create the graph schedule work queue */
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH &&
+	    graph_sched_wq_create(graph, parent_graph))
+		goto graph_mem_destroy;
+
 	/* Call init() of the all the nodes in the graph */
 	if (graph_node_init(graph))
 		goto graph_mem_destroy;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v3 11/15] graph: introduce graph walk by cross-core dispatch
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (9 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 10/15] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 12/15] graph: enable graph multicore dispatch scheduler model Zhirun Yan
                       ` (4 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the task scheduler mechanism to enable dispatching
tasks to other worker cores. Currently, there is only a local work
queue for each graph to walk. We introduce a scheduler work queue on
each worker core for dispatched tasks. The walk will process the
scheduler work queue first, then handle the local work queue.
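
The walk ordering described above (drain the scheduler work queue first, then the local queue) can be sketched as a minimal model. `mini_graph` and `mini_walk_mcore_dispatch` are illustrative names, not DPDK APIs:

```c
#include <assert.h>

/* Minimal model of the dispatch walk ordering: entries dispatched
 * here by other cores are processed before the local pending nodes. */
#define WQ_CAP 8

struct mini_graph {
	int wq[WQ_CAP];        /* node ids dispatched here by other cores */
	int wq_head, wq_tail;
	int local[WQ_CAP];     /* locally pending node ids */
	int nb_local;
	int order[2 * WQ_CAP]; /* records the processing order */
	int nb_done;
};

static void
mini_process_node(struct mini_graph *g, int node_id)
{
	g->order[g->nb_done++] = node_id;
}

static void
mini_walk_mcore_dispatch(struct mini_graph *g)
{
	/* 1. Handle streams dispatched from other cores first */
	while (g->wq_head != g->wq_tail)
		mini_process_node(g, g->wq[g->wq_head++ % WQ_CAP]);
	/* 2. Then walk the local work queue */
	for (int i = 0; i < g->nb_local; i++)
		mini_process_node(g, g->local[i]);
	g->nb_local = 0;
}
```
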

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_dispatch.h | 42 ++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
index 7cbdf2fdcf..764c4ecfd0 100644
--- a/lib/graph/rte_graph_model_dispatch.h
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -71,6 +71,48 @@ __rte_experimental
 int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
 						unsigned int lcore_id);
 
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk_mcore_dispatch(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+
+	if (graph->wq != NULL)
+		__rte_graph_sched_wq_process(graph);
+
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+
+		/* Skip the src nodes which are not bound to the current worker */
+		if ((int32_t)head < 0 && node->lcore_id != graph->lcore_id)
+			continue;
+
+		/* Schedule the node until all task/objs are done */
+		if (node->lcore_id != RTE_MAX_LCORE &&
+		    graph->lcore_id != node->lcore_id && graph->rq != NULL &&
+		    __rte_graph_sched_node_enqueue(node, graph->rq))
+			continue;
+
+		__rte_node_process(graph, node);
+
+		head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+
+	graph->tail = 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v3 12/15] graph: enable graph multicore dispatch scheduler model
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (10 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 11/15] graph: introduce graph walk by cross-core dispatch Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 13/15] graph: add stats for cross-core dispatching Zhirun Yan
                       ` (3 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch enables choosing the new scheduler model.
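
The selection logic added below can be sketched as a simple switch: the default and RTC models share the RTC walker, while the dispatch model picks the multi-core walker. The enum values and `mini_pick_walker` are illustrative stand-ins, not the DPDK definitions:

```c
#include <assert.h>
#include <string.h>

/* Illustrative model selection, mirroring the rte_graph_walk() branch. */
enum mini_model {
	MINI_MODEL_DEFAULT,
	MINI_MODEL_RTC,
	MINI_MODEL_MCORE_DISPATCH,
};

static const char *
mini_pick_walker(enum mini_model model)
{
	/* Default and RTC both fall through to the RTC walker */
	if (model == MINI_MODEL_DEFAULT || model == MINI_MODEL_RTC)
		return "rtc";
	if (model == MINI_MODEL_MCORE_DISPATCH)
		return "mcore_dispatch";
	return "unknown";
}
```
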

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_worker.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index 7ea18ba80a..d608c7513e 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -10,6 +10,7 @@ extern "C" {
 #endif
 
 #include "rte_graph_model_rtc.h"
+#include "rte_graph_model_dispatch.h"
 
 /**
  * Perform graph walk on the circular buffer and invoke the process function
@@ -24,7 +25,13 @@ __rte_experimental
 static inline void
 rte_graph_walk(struct rte_graph *graph)
 {
-	rte_graph_walk_rtc(graph);
+	int model = rte_graph_worker_model_get();
+
+	if (model == RTE_GRAPH_MODEL_DEFAULT ||
+	    model == RTE_GRAPH_MODEL_RTC)
+		rte_graph_walk_rtc(graph);
+	else if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		rte_graph_walk_mcore_dispatch(graph);
 }
 
 #ifdef __cplusplus
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v3 13/15] graph: add stats for cross-core dispatching
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (11 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 12/15] graph: enable graph multicore dispatch scheduler model Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model Zhirun Yan
                       ` (2 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add stats for the cross-core dispatching scheduler when stats collection
is enabled.
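
The aggregation added below sums the new per-node counters across all per-core node clones into one cluster stat. A minimal sketch of that summation follows; the `mini_*` types are hypothetical, not DPDK structures:

```c
#include <assert.h>

/* Per-clone counters, in the spirit of the new rte_node fields. */
struct mini_node_stat {
	unsigned long total_sched_objs; /* objs handed to other cores */
	unsigned long total_sched_fail; /* objs that could not be dispatched */
};

struct mini_cluster_stat {
	unsigned long sched_objs;
	unsigned long sched_fail;
};

/* Sum the dispatch counters of every clone into one cluster stat. */
static void
mini_aggregate(const struct mini_node_stat *nodes, int nb_nodes,
	       struct mini_cluster_stat *out)
{
	out->sched_objs = 0;
	out->sched_fail = 0;
	for (int i = 0; i < nb_nodes; i++) {
		out->sched_objs += nodes[i].total_sched_objs;
		out->sched_fail += nodes[i].total_sched_fail;
	}
}
```
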

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_debug.c              |  6 +++
 lib/graph/graph_stats.c              | 74 +++++++++++++++++++++++++---
 lib/graph/rte_graph.h                |  2 +
 lib/graph/rte_graph_model_dispatch.c |  3 ++
 lib/graph/rte_graph_worker_common.h  |  2 +
 5 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/lib/graph/graph_debug.c b/lib/graph/graph_debug.c
index b84412f5dd..7dcf07b080 100644
--- a/lib/graph/graph_debug.c
+++ b/lib/graph/graph_debug.c
@@ -74,6 +74,12 @@ rte_graph_obj_dump(FILE *f, struct rte_graph *g, bool all)
 		fprintf(f, "       size=%d\n", n->size);
 		fprintf(f, "       idx=%d\n", n->idx);
 		fprintf(f, "       total_objs=%" PRId64 "\n", n->total_objs);
+		if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+			fprintf(f, "       total_sched_objs=%" PRId64 "\n",
+				n->total_sched_objs);
+			fprintf(f, "       total_sched_fail=%" PRId64 "\n",
+				n->total_sched_fail);
+		}
 		fprintf(f, "       total_calls=%" PRId64 "\n", n->total_calls);
 		for (i = 0; i < n->nb_edges; i++)
 			fprintf(f, "          edge[%d] <%s>\n", i,
diff --git a/lib/graph/graph_stats.c b/lib/graph/graph_stats.c
index c0140ba922..aa22cc403c 100644
--- a/lib/graph/graph_stats.c
+++ b/lib/graph/graph_stats.c
@@ -40,13 +40,19 @@ struct rte_graph_cluster_stats {
 	struct cluster_node clusters[];
 } __rte_cache_aligned;
 
+#define boarder_model_dispatch()                                                              \
+	fprintf(f, "+-------------------------------+---------------+--------" \
+		   "-------+---------------+---------------+---------------+" \
+		   "---------------+---------------+-" \
+		   "----------+\n")
+
 #define boarder()                                                              \
 	fprintf(f, "+-------------------------------+---------------+--------" \
 		   "-------+---------------+---------------+---------------+-" \
 		   "----------+\n")
 
 static inline void
-print_banner(FILE *f)
+print_banner_default(FILE *f)
 {
 	boarder();
 	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s\n", "|Node", "|calls",
@@ -55,6 +61,27 @@ print_banner(FILE *f)
 	boarder();
 }
 
+static inline void
+print_banner_dispatch(FILE *f)
+{
+	boarder_model_dispatch();
+	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s%-16s%-16s\n",
+		"|Node", "|calls",
+		"|objs", "|sched objs", "|sched fail",
+		"|realloc_count", "|objs/call", "|objs/sec(10E6)",
+		"|cycles/call|");
+	boarder_model_dispatch();
+}
+
+static inline void
+print_banner(FILE *f)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		print_banner_dispatch(f);
+	else
+		print_banner_default(f);
+}
+
 static inline void
 print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 {
@@ -76,11 +103,21 @@ print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 	objs_per_sec = ts_per_hz ? (objs - prev_objs) / ts_per_hz : 0;
 	objs_per_sec /= 1000000;
 
-	fprintf(f,
-		"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
-		"|%-15.3f|%-15.6f|%-11.4f|\n",
-		stat->name, calls, objs, stat->realloc_count, objs_per_call,
-		objs_per_sec, cycles_per_call);
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->sched_objs,
+			stat->sched_fail, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	} else {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	}
 }
 
 static int
@@ -88,13 +125,20 @@ graph_cluster_stats_cb(bool is_first, bool is_last, void *cookie,
 		       const struct rte_graph_cluster_node_stats *stat)
 {
 	FILE *f = cookie;
+	int model;
+
+	model = rte_graph_worker_model_get();
 
 	if (unlikely(is_first))
 		print_banner(f);
 	if (stat->objs)
 		print_node(f, stat);
-	if (unlikely(is_last))
-		boarder();
+	if (unlikely(is_last)) {
+		if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+			boarder_model_dispatch();
+		else
+			boarder();
+	}
 
 	return 0;
 };
@@ -332,13 +376,21 @@ static inline void
 cluster_node_arregate_stats(struct cluster_node *cluster)
 {
 	uint64_t calls = 0, cycles = 0, objs = 0, realloc_count = 0;
+	uint64_t sched_objs = 0, sched_fail = 0;
 	struct rte_graph_cluster_node_stats *stat = &cluster->stat;
 	struct rte_node *node;
 	rte_node_t count;
+	int model;
 
+	model = rte_graph_worker_model_get();
 	for (count = 0; count < cluster->nb_nodes; count++) {
 		node = cluster->nodes[count];
 
+		if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+			sched_objs += node->total_sched_objs;
+			sched_fail += node->total_sched_fail;
+		}
+
 		calls += node->total_calls;
 		objs += node->total_objs;
 		cycles += node->total_cycles;
@@ -348,6 +400,12 @@ cluster_node_arregate_stats(struct cluster_node *cluster)
 	stat->calls = calls;
 	stat->objs = objs;
 	stat->cycles = cycles;
+
+	if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+		stat->sched_objs = sched_objs;
+		stat->sched_fail = sched_fail;
+	}
+
 	stat->ts = rte_get_timer_cycles();
 	stat->realloc_count = realloc_count;
 }
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index 2f86c17de7..7d77a790ac 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -208,6 +208,8 @@ struct rte_graph_cluster_node_stats {
 	uint64_t prev_calls;	/**< Previous number of calls. */
 	uint64_t prev_objs;	/**< Previous number of processed objs. */
 	uint64_t prev_cycles;	/**< Previous number of cycles. */
+	uint64_t sched_objs;	/**< Number of objs scheduled to other cores. */
+	uint64_t sched_fail;	/**< Number of objs that failed to be scheduled. */
 
 	uint64_t realloc_count; /**< Realloc count. */
 
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
index b46dd156ac..4cf00160ea 100644
--- a/lib/graph/rte_graph_model_dispatch.c
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -83,6 +83,7 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		rte_pause();
 
 	off += size;
+	node->total_sched_objs += size;
 	node->idx -= size;
 	if (node->idx > 0)
 		goto submit_again;
@@ -94,6 +95,8 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		memmove(&node->objs[0], &node->objs[off],
 			node->idx * sizeof(void *));
 
+	node->total_sched_fail += node->idx;
+
 	return false;
 }
 
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index dc0a0b5554..d94983589c 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -95,6 +95,8 @@ struct rte_node {
 		/* Fast schedule area for mcore dispatch model */
 		unsigned int lcore_id;  /**< Node running lcore. */
 		};
+	uint64_t total_sched_objs; /**< Number of objects scheduled. */
+	uint64_t total_sched_fail; /**< Number of objects that failed scheduling. */
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v3 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (12 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 13/15] graph: add stats for cross-core dispatching Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 15/15] doc: update multicore dispatch model in graph guides Zhirun Yan
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add a new parameter "model" to choose the dispatch or rtc worker model.
In the dispatch model, nodes are affinitized to worker cores successively.

Note:
only one RX node is supported for the remote model in the current
implementation.

./dpdk-l3fwd-graph  -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="dispatch"
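
The successive affinity assignment can be sketched as a round-robin pick of the next enabled worker core, similar in spirit to the `rte_get_next_lcore()` loop the example uses when binding cloned graphs. `mini_next_worker` is a hypothetical helper, not the DPDK API:

```c
#include <assert.h>

/* Return the next enabled core after 'cur', wrapping around;
 * -1 if no core is enabled. Illustrative only. */
static int
mini_next_worker(int cur, const int *enabled, int nb_cores)
{
	for (int step = 1; step <= nb_cores; step++) {
		int cand = (cur + step) % nb_cores;
		if (enabled[cand])
			return cand;
	}
	return -1;
}
```
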

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 examples/l3fwd-graph/main.c | 237 +++++++++++++++++++++++++++++-------
 1 file changed, 195 insertions(+), 42 deletions(-)

diff --git a/examples/l3fwd-graph/main.c b/examples/l3fwd-graph/main.c
index 5feeab4f0f..cfa78003f4 100644
--- a/examples/l3fwd-graph/main.c
+++ b/examples/l3fwd-graph/main.c
@@ -55,6 +55,9 @@
 
 #define NB_SOCKETS 8
 
+/* Graph module */
+#define WORKER_MODEL_RTC "rtc"
+#define WORKER_MODEL_MCORE_DISPATCH "dispatch"
 /* Static global variables used within this file. */
 static uint16_t nb_rxd = RX_DESC_DEFAULT;
 static uint16_t nb_txd = TX_DESC_DEFAULT;
@@ -88,6 +91,10 @@ struct lcore_rx_queue {
 	char node_name[RTE_NODE_NAMESIZE];
 };
 
+struct model_conf {
+	enum rte_graph_worker_model model;
+};
+
 /* Lcore conf */
 struct lcore_conf {
 	uint16_t n_rx_queue;
@@ -153,6 +160,19 @@ static struct ipv4_l3fwd_lpm_route ipv4_l3fwd_lpm_route_array[] = {
 	{RTE_IPV4(198, 18, 6, 0), 24, 6}, {RTE_IPV4(198, 18, 7, 0), 24, 7},
 };
 
+static int
+check_worker_model_params(void)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH &&
+	    nb_lcore_params > 1) {
+		printf("Exceeded max number of lcore params for remote model: %hu\n",
+		       nb_lcore_params);
+		return -1;
+	}
+
+	return 0;
+}
+
 static int
 check_lcore_params(void)
 {
@@ -276,6 +296,7 @@ print_usage(const char *prgname)
 		"  --eth-dest=X,MM:MM:MM:MM:MM:MM: Ethernet destination for "
 		"port X\n"
 		"  --max-pkt-len PKTLEN: maximum packet length in decimal (64-9600)\n"
+		"  --model NAME: walking model name, dispatch or rtc (default)\n"
 		"  --no-numa: Disable numa awareness\n"
 		"  --per-port-pool: Use separate buffer pool per port\n"
 		"  --pcap-enable: Enables pcap capture\n"
@@ -318,6 +339,20 @@ parse_max_pkt_len(const char *pktlen)
 	return len;
 }
 
+static int
+parse_worker_model(const char *model)
+{
+	if (strcmp(model, WORKER_MODEL_MCORE_DISPATCH) == 0) {
+		rte_graph_worker_model_set(RTE_GRAPH_MODEL_MCORE_DISPATCH);
+		return RTE_GRAPH_MODEL_MCORE_DISPATCH;
+	} else if (strcmp(model, WORKER_MODEL_RTC) == 0)
+		return RTE_GRAPH_MODEL_RTC;
+
+	rte_exit(EXIT_FAILURE, "Invalid worker model: %s", model);
+
+	return RTE_GRAPH_MODEL_LIST_END;
+}
+
 static int
 parse_portmask(const char *portmask)
 {
@@ -434,6 +469,8 @@ static const char short_options[] = "p:" /* portmask */
 #define CMD_LINE_OPT_PCAP_ENABLE   "pcap-enable"
 #define CMD_LINE_OPT_NUM_PKT_CAP   "pcap-num-cap"
 #define CMD_LINE_OPT_PCAP_FILENAME "pcap-file-name"
+#define CMD_LINE_OPT_WORKER_MODEL  "model"
+
 enum {
 	/* Long options mapped to a short option */
 
@@ -449,6 +486,7 @@ enum {
 	CMD_LINE_OPT_PARSE_PCAP_ENABLE,
 	CMD_LINE_OPT_PARSE_NUM_PKT_CAP,
 	CMD_LINE_OPT_PCAP_FILENAME_CAP,
+	CMD_LINE_OPT_WORKER_MODEL_TYPE,
 };
 
 static const struct option lgopts[] = {
@@ -460,6 +498,7 @@ static const struct option lgopts[] = {
 	{CMD_LINE_OPT_PCAP_ENABLE, 0, 0, CMD_LINE_OPT_PARSE_PCAP_ENABLE},
 	{CMD_LINE_OPT_NUM_PKT_CAP, 1, 0, CMD_LINE_OPT_PARSE_NUM_PKT_CAP},
 	{CMD_LINE_OPT_PCAP_FILENAME, 1, 0, CMD_LINE_OPT_PCAP_FILENAME_CAP},
+	{CMD_LINE_OPT_WORKER_MODEL, 1, 0, CMD_LINE_OPT_WORKER_MODEL_TYPE},
 	{NULL, 0, 0, 0},
 };
 
@@ -551,6 +590,11 @@ parse_args(int argc, char **argv)
 			printf("Pcap file name: %s\n", pcap_filename);
 			break;
 
+		case CMD_LINE_OPT_WORKER_MODEL_TYPE:
+			printf("Use new worker model: %s\n", optarg);
+			parse_worker_model(optarg);
+			break;
+
 		default:
 			print_usage(prgname);
 			return -1;
@@ -726,15 +770,15 @@ print_stats(void)
 static int
 graph_main_loop(void *conf)
 {
+	struct model_conf *mconf = conf;
 	struct lcore_conf *qconf;
 	struct rte_graph *graph;
 	uint32_t lcore_id;
 
-	RTE_SET_USED(conf);
-
 	lcore_id = rte_lcore_id();
 	qconf = &lcore_conf[lcore_id];
 	graph = qconf->graph;
+	rte_graph_worker_model_set(mconf->model);
 
 	if (!graph) {
 		RTE_LOG(INFO, L3FWD_GRAPH, "Lcore %u has nothing to do\n",
@@ -788,6 +832,141 @@ config_port_max_pkt_len(struct rte_eth_conf *conf,
 	return 0;
 }
 
+static void
+graph_config_mcore_dispatch(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	int worker_count = rte_lcore_count() - 1;
+	int main_lcore_id = rte_get_main_lcore();
+	int worker_lcore = main_lcore_id;
+	rte_graph_t main_graph_id = 0;
+	struct rte_node *node_tmp;
+	struct lcore_conf *qconf;
+	struct rte_graph *graph;
+	rte_graph_t graph_id;
+	rte_graph_off_t off;
+	int n_rx_node = 0;
+	rte_node_t count;
+	int i, j;
+	int ret;
+
+	for (j = 0; j < nb_lcore_params; j++) {
+		qconf = &lcore_conf[lcore_params[j].lcore_id];
+		/* Add rx node patterns of all lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			char *node_name = qconf->rx_queue_list[i].node_name;
+
+			graph_conf.node_patterns[nb_patterns + n_rx_node + i] = node_name;
+			n_rx_node++;
+			ret = rte_graph_model_dispatch_lcore_affinity_set(node_name,
+									lcore_params[j].lcore_id);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n", node_name,
+				       lcore_params[j].lcore_id);
+		}
+	}
+
+	graph_conf.nb_node_patterns = nb_patterns + n_rx_node;
+	graph_conf.socket_id = rte_lcore_to_socket_id(main_lcore_id);
+
+	qconf = &lcore_conf[main_lcore_id];
+	snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+		 main_lcore_id);
+
+	/* create main graph */
+	main_graph_id = rte_graph_create(qconf->name, &graph_conf);
+	if (main_graph_id == RTE_GRAPH_ID_INVALID)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_create(): main_graph_id invalid for lcore %u\n",
+			 main_lcore_id);
+
+	qconf->graph_id = main_graph_id;
+	qconf->graph = rte_graph_lookup(qconf->name);
+	/* >8 End of graph initialization. */
+	if (!qconf->graph)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_lookup(): graph %s not found\n",
+			 qconf->name);
+
+	graph = qconf->graph;
+	rte_graph_foreach_node(count, off, graph, node_tmp) {
+		worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+
+		/* Need to set the node Lcore affinity before clone graph for each lcore */
+		if (node_tmp->lcore_id == RTE_MAX_LCORE) {
+			ret = rte_graph_model_dispatch_lcore_affinity_set(node_tmp->name,
+									worker_lcore);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n",
+				       node_tmp->name, worker_lcore);
+		}
+	}
+
+	worker_lcore = main_lcore_id;
+	for (i = 0; i < worker_count; i++) {
+		worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+
+		qconf = &lcore_conf[worker_lcore];
+		snprintf(qconf->name, sizeof(qconf->name), "cloned-%u", worker_lcore);
+		graph_id = rte_graph_clone(main_graph_id, qconf->name);
+		ret = rte_graph_model_dispatch_core_bind(graph_id, worker_lcore);
+		if (ret == 0)
+			printf("bind graph %d to lcore %u\n", graph_id, worker_lcore);
+
+		/* full cloned graph name */
+		snprintf(qconf->name, sizeof(qconf->name), "%s",
+			 rte_graph_id_to_name(graph_id));
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "Failed to lookup graph %s\n",
+				 qconf->name);
+		continue;
+	}
+}
+
+static void
+graph_config_rtc(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	struct lcore_conf *qconf;
+	rte_graph_t graph_id;
+	uint32_t lcore_id;
+	rte_edge_t i;
+
+	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+		if (rte_lcore_is_enabled(lcore_id) == 0)
+			continue;
+
+		qconf = &lcore_conf[lcore_id];
+		/* Skip graph creation if no source exists */
+		if (!qconf->n_rx_queue)
+			continue;
+		/* Add rx node patterns of this lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			graph_conf.node_patterns[nb_patterns + i] =
+				qconf->rx_queue_list[i].node_name;
+		}
+		graph_conf.nb_node_patterns = nb_patterns + i;
+		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
+		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+			 lcore_id);
+		graph_id = rte_graph_create(qconf->name, &graph_conf);
+		if (graph_id == RTE_GRAPH_ID_INVALID)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_create(): graph_id invalid for lcore %u\n",
+				 lcore_id);
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		/* >8 End of graph initialization. */
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_lookup(): graph %s not found\n",
+				 qconf->name);
+	}
+}
+
 int
 main(int argc, char **argv)
 {
@@ -808,10 +987,12 @@ main(int argc, char **argv)
 	uint16_t queueid, portid, i;
 	const char **node_patterns;
 	struct lcore_conf *qconf;
+	struct model_conf mconf;
 	uint16_t nb_graphs = 0;
 	uint16_t nb_patterns;
 	uint8_t rewrite_len;
 	uint32_t lcore_id;
+	uint16_t model;
 	int ret;
 
 	/* Init EAL */
@@ -840,6 +1021,9 @@ main(int argc, char **argv)
 	if (check_lcore_params() < 0)
 		rte_exit(EXIT_FAILURE, "check_lcore_params() failed\n");
 
+	if (check_worker_model_params() < 0)
+		rte_exit(EXIT_FAILURE, "check_worker_model_params() failed\n");
+
 	ret = init_lcore_rx_queues();
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "init_lcore_rx_queues() failed\n");
@@ -1079,51 +1263,18 @@ main(int argc, char **argv)
 
 	memset(&graph_conf, 0, sizeof(graph_conf));
 	graph_conf.node_patterns = node_patterns;
+	graph_conf.nb_node_patterns = nb_patterns;
 
 	/* Pcap config */
 	graph_conf.pcap_enable = pcap_trace_enable;
 	graph_conf.num_pkt_to_capture = packet_to_capture;
 	graph_conf.pcap_filename = pcap_filename;
 
-	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
-		rte_graph_t graph_id;
-		rte_edge_t i;
-
-		if (rte_lcore_is_enabled(lcore_id) == 0)
-			continue;
-
-		qconf = &lcore_conf[lcore_id];
-
-		/* Skip graph creation if no source exists */
-		if (!qconf->n_rx_queue)
-			continue;
-
-		/* Add rx node patterns of this lcore */
-		for (i = 0; i < qconf->n_rx_queue; i++) {
-			graph_conf.node_patterns[nb_patterns + i] =
-				qconf->rx_queue_list[i].node_name;
-		}
-
-		graph_conf.nb_node_patterns = nb_patterns + i;
-		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
-
-		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
-			 lcore_id);
-
-		graph_id = rte_graph_create(qconf->name, &graph_conf);
-		if (graph_id == RTE_GRAPH_ID_INVALID)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_create(): graph_id invalid"
-				 " for lcore %u\n", lcore_id);
-
-		qconf->graph_id = graph_id;
-		qconf->graph = rte_graph_lookup(qconf->name);
-		/* >8 End of graph initialization. */
-		if (!qconf->graph)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_lookup(): graph %s not found\n",
-				 qconf->name);
-	}
+	model = rte_graph_worker_model_get();
+	if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		graph_config_mcore_dispatch(graph_conf);
+	else
+		graph_config_rtc(graph_conf);
 
 	memset(&rewrite_data, 0, sizeof(rewrite_data));
 	rewrite_len = sizeof(rewrite_data);
@@ -1174,8 +1325,10 @@ main(int argc, char **argv)
 	}
 	/* >8 End of adding route to ip4 graph infa. */
 
+	mconf.model = model;
 	/* Launch per-lcore init on every worker lcore */
-	rte_eal_mp_remote_launch(graph_main_loop, NULL, SKIP_MAIN);
+	rte_eal_mp_remote_launch(graph_main_loop, &mconf,
+				 SKIP_MAIN);
 
 	/* Accumulate and print stats on main until exit */
 	if (rte_graph_has_stats_feature())
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v3 15/15] doc: update multicore dispatch model in graph guides
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (13 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Update the graph documentation to introduce the new multicore dispatch
model.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 doc/guides/prog_guide/graph_lib.rst | 59 +++++++++++++++++++++++++++--
 1 file changed, 55 insertions(+), 4 deletions(-)

diff --git a/doc/guides/prog_guide/graph_lib.rst b/doc/guides/prog_guide/graph_lib.rst
index 1cfdc86433..72e26f3a5a 100644
--- a/doc/guides/prog_guide/graph_lib.rst
+++ b/doc/guides/prog_guide/graph_lib.rst
@@ -189,14 +189,65 @@ In the above example, A graph object will be created with ethdev Rx
 node of port 0 and queue 0, all ipv4* nodes in the system,
 and ethdev tx node of all ports.
 
-Multicore graph processing
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-In the current graph library implementation, specifically,
-``rte_graph_walk()`` and ``rte_node_enqueue*`` fast path API functions
+Graph model choosing
+~~~~~~~~~~~~~~~~~~~~
+Currently, there are two different walking models. Use
+``rte_graph_worker_model_set()`` to set the walking model.
+
+RTC (Run-To-Completion)
+^^^^^^^^^^^^^^^^^^^^^^^
+This is the default graph walking model. Specifically,
+``rte_graph_walk_rtc()`` and ``rte_node_enqueue*`` fast path API functions
 are designed to work on single-core to have better performance.
 The fast path API works on graph object, So the multi-core graph
 processing strategy would be to create graph object PER WORKER.
 
+Example:
+
+Graph: node-0 -> node-1 -> node-2 @Core0.
+
+.. code-block:: diff
+
+    + - - - - - - - - - - - - - - - - - - - - - +
+    '                  Core #0                  '
+    '                                           '
+    ' +--------+     +---------+     +--------+ '
+    ' | Node-0 | --> | Node-1  | --> | Node-2 | '
+    ' +--------+     +---------+     +--------+ '
+    '                                           '
+    + - - - - - - - - - - - - - - - - - - - - - +
+
+Dispatch model
+^^^^^^^^^^^^^^
+The dispatch model enables a cross-core dispatching mechanism which employs
+a scheduling work-queue to dispatch streams to other worker cores that are
+associated with the destination node.
+
+Use ``rte_graph_model_dispatch_lcore_affinity_set()`` to set the lcore
+affinity of a node.
+Each worker core will have its own clone of the graph. Use ``rte_graph_clone()``
+to clone the graph for each worker and ``rte_graph_model_dispatch_core_bind()``
+to bind the graph to the worker core.
+
+Example:
+
+Graph topo: node-0 -> Core1; node-1 -> node-2; node-2 -> node-3.
+Config graph: node-0 @Core0; node-1/3 @Core1; node-2 @Core2.
+
+.. code-block:: diff
+
+    + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
+    '  Core #0   '     '          Core #1         '     '  Core #2   '
+    '            '     '                          '     '            '
+    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
+    ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
+    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
+    '            '     '     |                    '     '      ^     '
+    + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
+                             |                                 |
+                             + - - - - - - - - - - - - - - - - +
+
+
 In fast path
 ~~~~~~~~~~~~
 Typical fast-path code looks like below, where the application
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v3 03/15] graph: move node process into inline function
  2023-03-29  6:43     ` [PATCH v3 03/15] graph: move node process into inline function Zhirun Yan
@ 2023-03-29 15:34       ` Stephen Hemminger
  2023-03-29 15:41         ` Jerin Jacob
  0 siblings, 1 reply; 369+ messages in thread
From: Stephen Hemminger @ 2023-03-29 15:34 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Wed, 29 Mar 2023 15:43:28 +0900
Zhirun Yan <zhirun.yan@intel.com> wrote:

> +/**
> + * @internal
> + *
> + * Enqueue a given node to the tail of the graph reel.
> + *
> + * @param graph
> + *   Pointer Graph object.
> + * @param node
> + *   Pointer to node object to be enqueued.
> + */
> +static __rte_always_inline void
> +__rte_node_process(struct rte_graph *graph, struct rte_node *node)
> +{
> +	uint64_t start;
> +	uint16_t rc;
> +	void **objs;
> +
> +	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> +	objs = node->objs;
> +	rte_prefetch0(objs);
> +
> +	if (rte_graph_has_stats_feature()) {
> +		start = rte_rdtsc();
> +		rc = node->process(graph, node, objs, node->idx);
> +		node->total_cycles += rte_rdtsc() - start;
> +		node->total_calls++;
> +		node->total_objs += rc;
> +	} else {
> +		node->process(graph, node, objs, node->idx);
> +	}
> +	node->idx = 0;
> +}
> +

Why inline? Doing everything as inlines has long term ABI
impacts. And this is not a super critical performance path.

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v3 04/15] graph: add get/set graph worker model APIs
  2023-03-29  6:43     ` [PATCH v3 04/15] graph: add get/set graph worker model APIs Zhirun Yan
@ 2023-03-29 15:35       ` Stephen Hemminger
  2023-03-30  3:37         ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Stephen Hemminger @ 2023-03-29 15:35 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Wed, 29 Mar 2023 15:43:29 +0900
Zhirun Yan <zhirun.yan@intel.com> wrote:

> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + * Set the graph worker model
> + *
> + * @note This function does not perform any locking, and is only safe to call
> + *    before graph running.
> + *
> + * @param name
> + *   Name of the graph worker model.
> + *
> + * @return
> + *   0 on success, -1 otherwise.
> + */
> +inline int
> +rte_graph_worker_model_set(enum rte_graph_worker_model model)
> +{
> +	if (model >= RTE_GRAPH_MODEL_LIST_END)
> +		goto fail;
> +
> +	RTE_PER_LCORE(worker_model) = model;
> +	return 0;
> +
> +fail:
> +	RTE_PER_LCORE(worker_model) = RTE_GRAPH_MODEL_DEFAULT;
> +	return -1;
> +}
> +

Once again, this doesn't have to be inline, could be a real API.

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v3 03/15] graph: move node process into inline function
  2023-03-29 15:34       ` Stephen Hemminger
@ 2023-03-29 15:41         ` Jerin Jacob
  0 siblings, 0 replies; 369+ messages in thread
From: Jerin Jacob @ 2023-03-29 15:41 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Zhirun Yan, dev, jerinj, kirankumark, ndabilpuram, cunming.liang,
	haiyue.wang

On Wed, Mar 29, 2023 at 9:04 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Wed, 29 Mar 2023 15:43:28 +0900
> Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> > +/**
> > + * @internal
> > + *
> > + * Enqueue a given node to the tail of the graph reel.
> > + *
> > + * @param graph
> > + *   Pointer Graph object.
> > + * @param node
> > + *   Pointer to node object to be enqueued.
> > + */
> > +static __rte_always_inline void
> > +__rte_node_process(struct rte_graph *graph, struct rte_node *node)
> > +{
> > +     uint64_t start;
> > +     uint16_t rc;
> > +     void **objs;
> > +
> > +     RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> > +     objs = node->objs;
> > +     rte_prefetch0(objs);
> > +
> > +     if (rte_graph_has_stats_feature()) {
> > +             start = rte_rdtsc();
> > +             rc = node->process(graph, node, objs, node->idx);
> > +             node->total_cycles += rte_rdtsc() - start;
> > +             node->total_calls++;
> > +             node->total_objs += rc;
> > +     } else {
> > +             node->process(graph, node, objs, node->idx);
> > +     }
> > +     node->idx = 0;
> > +}
> > +
>
> Why inline? Doing everything as inlines has long term ABI
> impacts. And this is not a super critical performance path.

This is one of the real fast-path routines.

^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [PATCH v3 04/15] graph: add get/set graph worker model APIs
  2023-03-29 15:35       ` Stephen Hemminger
@ 2023-03-30  3:37         ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-03-30  3:37 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Wednesday, March 29, 2023 11:35 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v3 04/15] graph: add get/set graph worker model APIs
> 
> On Wed, 29 Mar 2023 15:43:29 +0900
> Zhirun Yan <zhirun.yan@intel.com> wrote:
> 
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> > +notice
> > + * Set the graph worker model
> > + *
> > + * @note This function does not perform any locking, and is only safe to call
> > + *    before graph running.
> > + *
> > + * @param name
> > + *   Name of the graph worker model.
> > + *
> > + * @return
> > + *   0 on success, -1 otherwise.
> > + */
> > +inline int
> > +rte_graph_worker_model_set(enum rte_graph_worker_model model) {
> > +	if (model >= RTE_GRAPH_MODEL_LIST_END)
> > +		goto fail;
> > +
> > +	RTE_PER_LCORE(worker_model) = model;
> > +	return 0;
> > +
> > +fail:
> > +	RTE_PER_LCORE(worker_model) = RTE_GRAPH_MODEL_DEFAULT;
> > +	return -1;
> > +}
> > +
> 
> Once again, this doesn't have to be inline, could be a real API.

Thanks, I will remove inline in next version.

^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 00/15] graph enhancement for multi-core dispatch
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (14 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 15/15] doc: update multicore dispatch model in graph guides Zhirun Yan
@ 2023-03-30  6:18     ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 01/15] graph: rename rte_graph_work as common Zhirun Yan
                         ` (15 more replies)
  15 siblings, 16 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

V4:
Fix CI build issues about undefined reference of sched apis.
Remove inline for model setting.

V3:
Fix CI build issues about TLS and typo.

V2:
Use git mv to keep git history.
Use TLS for per-thread local storage.
Change model name to mcore dispatch.
Change API with specific mode name.
Split big patch.
Fix CI issues.
Rebase l3fwd-graph example.
Update doc and maintainers files.


Currently, rte_graph supports the RTC (Run-To-Completion) model within
a single core.
RTC is one of the typical models of packet processing. Others, like
Pipeline or Hybrid, lack support.

The patch set introduces a 'multicore dispatch' model selection, a
self-reacting scheme based on core affinity.
The new model enables a cross-core dispatching mechanism which employs a
scheduling work-queue to dispatch streams to other worker cores that are
associated with the destination node. When the core affinity of the
destination node is the default 'current', the stream continues to be
executed as normal.

Example:
3-node graph targets 3-core budget

RTC:
Graph: node-0 -> node-1 -> node-2 @Core0.

+ - - - - - - - - - - - - - - - - - - - - - +
'                Core #0/1/2                '
'                                           '
' +--------+     +---------+     +--------+ '
' | Node-0 | --> | Node-1  | --> | Node-2 | '
' +--------+     +---------+     +--------+ '
'                                           '
+ - - - - - - - - - - - - - - - - - - - - - +

Dispatch:

Graph topo: node-0 -> Core1; node-1 -> node-2; node-2 -> node-3.
Config graph: node-0 @Core0; node-1/3 @Core1; node-2 @Core2.


    + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
    '  Core #0   '     '          Core #1         '     '  Core #2   '
    '            '     '                          '     '            '
    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
    ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
    '            '     '     |                    '     '      ^     '
    + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
                             |                                 |
                             + - - - - - - - - - - - - - - - - +


The patch set is broken down as below:

1. Split graph worker into common and default model part.
2. Inline graph node processing to make it reusable.
3. Add set/get APIs to choose worker model.
4. Introduce core affinity API to set the node run on specific worker core.
  (only use in new model)
5. Introduce graph affinity API to bind one graph with specific worker
  core.
6. Introduce graph clone API.
7. Introduce stream moving with scheduler work-queue in patch 8~12.
8. Add stats for new models.
9. Abstract default graph config process and integrate new model into
  example/l3fwd-graph. Add new parameters for model choosing.

We can run with the new worker model like this:
./dpdk-l3fwd-graph -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="dispatch"

References:
https://static.sched.com/hosted_files/dpdkuserspace22/a6/graph%20introduce%20remote%20dispatch%20for%20mult-core%20scaling.pdf


Zhirun Yan (15):
  graph: rename rte_graph_work as common
  graph: split graph worker into common and default model
  graph: move node process into inline function
  graph: add get/set graph worker model APIs
  graph: introduce graph node core affinity API
  graph: introduce graph bind unbind API
  graph: introduce graph clone API for other worker core
  graph: add struct for stream moving between cores
  graph: introduce stream moving cross cores
  graph: enable create and destroy graph scheduling workqueue
  graph: introduce graph walk by cross-core dispatch
  graph: enable graph multicore dispatch scheduler model
  graph: add stats for cross-core dispatching
  examples/l3fwd-graph: introduce multicore dispatch worker model
  doc: update multicore dispatch model in graph guides

 MAINTAINERS                          |   1 +
 doc/guides/prog_guide/graph_lib.rst  |  59 ++-
 examples/l3fwd-graph/main.c          | 237 +++++++++---
 lib/graph/graph.c                    | 179 +++++++++
 lib/graph/graph_debug.c              |   6 +
 lib/graph/graph_populate.c           |   1 +
 lib/graph/graph_private.h            |  44 +++
 lib/graph/graph_stats.c              |  74 +++-
 lib/graph/meson.build                |   4 +-
 lib/graph/node.c                     |   1 +
 lib/graph/rte_graph.h                |  44 +++
 lib/graph/rte_graph_model_dispatch.c | 179 +++++++++
 lib/graph/rte_graph_model_dispatch.h | 122 ++++++
 lib/graph/rte_graph_model_rtc.h      |  45 +++
 lib/graph/rte_graph_worker.c         |  54 +++
 lib/graph/rte_graph_worker.h         | 498 +------------------------
 lib/graph/rte_graph_worker_common.h  | 539 +++++++++++++++++++++++++++
 lib/graph/version.map                |  10 +
 18 files changed, 1553 insertions(+), 544 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_dispatch.c
 create mode 100644 lib/graph/rte_graph_model_dispatch.h
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker.c
 create mode 100644 lib/graph/rte_graph_worker_common.h

-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 01/15] graph: rename rte_graph_work as common
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 02/15] graph: split graph worker into common and default model Zhirun Yan
                         ` (14 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Rename rte_graph_worker.h to rte_graph_worker_common.h to support
multiple graph worker models.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 MAINTAINERS                                                 | 1 +
 lib/graph/graph_pcap.c                                      | 2 +-
 lib/graph/graph_private.h                                   | 2 +-
 lib/graph/meson.build                                       | 2 +-
 lib/graph/{rte_graph_worker.h => rte_graph_worker_common.h} | 6 +++---
 5 files changed, 7 insertions(+), 6 deletions(-)
 rename lib/graph/{rte_graph_worker.h => rte_graph_worker_common.h} (99%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 280058adfc..9d9467dd00 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1714,6 +1714,7 @@ F: doc/guides/prog_guide/bpf_lib.rst
 Graph - EXPERIMENTAL
 M: Jerin Jacob <jerinj@marvell.com>
 M: Kiran Kumar K <kirankumark@marvell.com>
+M: Zhirun Yan <zhirun.yan@intel.com>
 F: lib/graph/
 F: doc/guides/prog_guide/graph_lib.rst
 F: app/test/test_graph*
diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
index 6c43330029..8a220370fa 100644
--- a/lib/graph/graph_pcap.c
+++ b/lib/graph/graph_pcap.c
@@ -10,7 +10,7 @@
 #include <rte_mbuf.h>
 #include <rte_pcapng.h>
 
-#include "rte_graph_worker.h"
+#include "rte_graph_worker_common.h"
 
 #include "graph_pcap_private.h"
 
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 7d1b30b8ac..f08dbc7e9d 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -12,7 +12,7 @@
 #include <rte_eal.h>
 
 #include "rte_graph.h"
-#include "rte_graph_worker.h"
+#include "rte_graph_worker_common.h"
 
 extern int rte_graph_logtype;
 
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 3526d1b5d4..4e2b612ad3 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,6 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
 )
-headers = files('rte_graph.h', 'rte_graph_worker.h')
+headers = files('rte_graph.h', 'rte_graph_worker_common.h')
 
 deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker_common.h
similarity index 99%
rename from lib/graph/rte_graph_worker.h
rename to lib/graph/rte_graph_worker_common.h
index 438595b15c..0bad2938f3 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -2,8 +2,8 @@
  * Copyright(C) 2020 Marvell International Ltd.
  */
 
-#ifndef _RTE_GRAPH_WORKER_H_
-#define _RTE_GRAPH_WORKER_H_
+#ifndef _RTE_GRAPH_WORKER_COMMON_H_
+#define _RTE_GRAPH_WORKER_COMMON_H_
 
 /**
  * @file rte_graph_worker.h
@@ -518,4 +518,4 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
 }
 #endif
 
-#endif /* _RTE_GRAPH_WORKER_H_ */
+#endif /* _RTE_GRAPH_WORKER_COMMON_H_ */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 02/15] graph: split graph worker into common and default model
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 01/15] graph: rename rte_graph_work as common Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 03/15] graph: move node process into inline function Zhirun Yan
                         ` (13 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

To support multiple graph worker models, split the graph worker into
common and default parts. Name the current walk function
rte_graph_model_rtc because the default model is RTC (Run-To-Completion).

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_pcap.c              |  2 +-
 lib/graph/graph_private.h           |  2 +-
 lib/graph/meson.build               |  2 +-
 lib/graph/rte_graph_model_rtc.h     | 61 +++++++++++++++++++++++++++++
 lib/graph/rte_graph_worker.h        | 34 ++++++++++++++++
 lib/graph/rte_graph_worker_common.h | 57 ---------------------------
 6 files changed, 98 insertions(+), 60 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker.h

diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
index 8a220370fa..6c43330029 100644
--- a/lib/graph/graph_pcap.c
+++ b/lib/graph/graph_pcap.c
@@ -10,7 +10,7 @@
 #include <rte_mbuf.h>
 #include <rte_pcapng.h>
 
-#include "rte_graph_worker_common.h"
+#include "rte_graph_worker.h"
 
 #include "graph_pcap_private.h"
 
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index f08dbc7e9d..7d1b30b8ac 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -12,7 +12,7 @@
 #include <rte_eal.h>
 
 #include "rte_graph.h"
-#include "rte_graph_worker_common.h"
+#include "rte_graph_worker.h"
 
 extern int rte_graph_logtype;
 
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 4e2b612ad3..3526d1b5d4 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,6 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
 )
-headers = files('rte_graph.h', 'rte_graph_worker_common.h')
+headers = files('rte_graph.h', 'rte_graph_worker.h')
 
 deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
new file mode 100644
index 0000000000..665560f831
--- /dev/null
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "rte_graph_worker_common.h"
+
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+static inline void
+rte_graph_walk_rtc(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	/*
+	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
+	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
+	 * in a circular buffer fashion.
+	 *
+	 *	+-----+ <= cir_start - head [number of source nodes]
+	 *	|     |
+	 *	| ... | <= source nodes
+	 *	|     |
+	 *	+-----+ <= cir_start [head = 0] [tail = 0]
+	 *	|     |
+	 *	| ... | <= pending streams
+	 *	|     |
+	 *	+-----+ <= cir_start + mask
+	 */
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		objs = node->objs;
+		rte_prefetch0(objs);
+
+		if (rte_graph_has_stats_feature()) {
+			start = rte_rdtsc();
+			rc = node->process(graph, node, objs, node->idx);
+			node->total_cycles += rte_rdtsc() - start;
+			node->total_calls++;
+			node->total_objs += rc;
+		} else {
+			node->process(graph, node, objs, node->idx);
+		}
+			node->idx = 0;
+			head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+	graph->tail = 0;
+}
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
new file mode 100644
index 0000000000..7ea18ba80a
--- /dev/null
+++ b/lib/graph/rte_graph_worker.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#ifndef _RTE_GRAPH_WORKER_H_
+#define _RTE_GRAPH_WORKER_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "rte_graph_model_rtc.h"
+
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk(struct rte_graph *graph)
+{
+	rte_graph_walk_rtc(graph);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_WORKER_H_ */
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 0bad2938f3..b58f8f6947 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -128,63 +128,6 @@ __rte_experimental
 void __rte_node_stream_alloc_size(struct rte_graph *graph,
 				  struct rte_node *node, uint16_t req_size);
 
-/**
- * Perform graph walk on the circular buffer and invoke the process function
- * of the nodes and collect the stats.
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup function.
- *
- * @see rte_graph_lookup()
- */
-__rte_experimental
-static inline void
-rte_graph_walk(struct rte_graph *graph)
-{
-	const rte_graph_off_t *cir_start = graph->cir_start;
-	const rte_node_t mask = graph->cir_mask;
-	uint32_t head = graph->head;
-	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
-
-	/*
-	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
-	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
-	 * in a circular buffer fashion.
-	 *
-	 *	+-----+ <= cir_start - head [number of source nodes]
-	 *	|     |
-	 *	| ... | <= source nodes
-	 *	|     |
-	 *	+-----+ <= cir_start [head = 0] [tail = 0]
-	 *	|     |
-	 *	| ... | <= pending streams
-	 *	|     |
-	 *	+-----+ <= cir_start + mask
-	 */
-	while (likely(head != graph->tail)) {
-		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-		node->idx = 0;
-		head = likely((int32_t)head > 0) ? head & mask : head;
-	}
-	graph->tail = 0;
-}
-
 /* Fast path helper functions */
 
 /**
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 03/15] graph: move node process into inline function
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 01/15] graph: rename rte_graph_work as common Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 02/15] graph: split graph worker into common and default model Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 04/15] graph: add get/set graph worker model APIs Zhirun Yan
                         ` (12 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Node processing is a single, reusable block; move the code into an inline
function.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_rtc.h     | 20 ++---------------
 lib/graph/rte_graph_worker_common.h | 33 +++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
index 665560f831..0dcb7151e9 100644
--- a/lib/graph/rte_graph_model_rtc.h
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -20,9 +20,6 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	const rte_node_t mask = graph->cir_mask;
 	uint32_t head = graph->head;
 	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
 
 	/*
 	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
@@ -41,21 +38,8 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	 */
 	while (likely(head != graph->tail)) {
 		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-			node->idx = 0;
-			head = likely((int32_t)head > 0) ? head & mask : head;
+		__rte_node_process(graph, node);
+		head = likely((int32_t)head > 0) ? head & mask : head;
 	}
 	graph->tail = 0;
 }
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index b58f8f6947..41428974db 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -130,6 +130,39 @@ void __rte_node_stream_alloc_size(struct rte_graph *graph,
 
 /* Fast path helper functions */
 
+/**
+ * @internal
+ *
+ * Enqueue a given node to the tail of the graph reel.
+ *
+ * @param graph
+ *   Pointer Graph object.
+ * @param node
+ *   Pointer to node object to be enqueued.
+ */
+static __rte_always_inline void
+__rte_node_process(struct rte_graph *graph, struct rte_node *node)
+{
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+	objs = node->objs;
+	rte_prefetch0(objs);
+
+	if (rte_graph_has_stats_feature()) {
+		start = rte_rdtsc();
+		rc = node->process(graph, node, objs, node->idx);
+		node->total_cycles += rte_rdtsc() - start;
+		node->total_calls++;
+		node->total_objs += rc;
+	} else {
+		node->process(graph, node, objs, node->idx);
+	}
+	node->idx = 0;
+}
+
 /**
  * @internal
  *
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 04/15] graph: add get/set graph worker model APIs
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (2 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 03/15] graph: move node process into inline function Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 05/15] graph: introduce graph node core affinity API Zhirun Yan
                         ` (11 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add new get/set APIs to configure the graph worker model, which
determines which walking model is used.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/meson.build               |  1 +
 lib/graph/rte_graph_worker.c        | 54 +++++++++++++++++++++++++++++
 lib/graph/rte_graph_worker_common.h | 19 ++++++++++
 lib/graph/version.map               |  3 ++
 4 files changed, 77 insertions(+)
 create mode 100644 lib/graph/rte_graph_worker.c

diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 3526d1b5d4..9fab8243da 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -15,6 +15,7 @@ sources = files(
         'graph_stats.c',
         'graph_populate.c',
         'graph_pcap.c',
+        'rte_graph_worker.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
diff --git a/lib/graph/rte_graph_worker.c b/lib/graph/rte_graph_worker.c
new file mode 100644
index 0000000000..cabc101262
--- /dev/null
+++ b/lib/graph/rte_graph_worker.c
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "rte_graph_worker_common.h"
+
+RTE_DEFINE_PER_LCORE(enum rte_graph_worker_model, worker_model) = RTE_GRAPH_MODEL_DEFAULT;
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ * Set the graph worker model
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ *    before graph running.
+ *
+ * @param model
+ *   The graph worker model to set.
+ *
+ * @return
+ *   0 on success, -1 otherwise.
+ */
+int
+rte_graph_worker_model_set(enum rte_graph_worker_model model)
+{
+	if (model >= RTE_GRAPH_MODEL_LIST_END)
+		goto fail;
+
+	RTE_PER_LCORE(worker_model) = model;
+	return 0;
+
+fail:
+	RTE_PER_LCORE(worker_model) = RTE_GRAPH_MODEL_DEFAULT;
+	return -1;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Get the graph worker model
+ *
+ * @note The returned model is the calling thread's per-lcore setting.
+ *   It defaults to RTE_GRAPH_MODEL_DEFAULT.
+ *
+ * @return
+ *   Graph worker model on success.
+ */
+inline
+enum rte_graph_worker_model
+rte_graph_worker_model_get(void)
+{
+	return RTE_PER_LCORE(worker_model);
+}
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 41428974db..1526da6e2c 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -19,6 +19,7 @@
 #include <rte_compat.h>
 #include <rte_common.h>
 #include <rte_cycles.h>
+#include <rte_per_lcore.h>
 #include <rte_prefetch.h>
 #include <rte_memcpy.h>
 #include <rte_memory.h>
@@ -95,6 +96,16 @@ struct rte_node {
 	struct rte_node *nodes[] __rte_cache_min_aligned; /**< Next nodes. */
 } __rte_cache_aligned;
 
+/** Graph worker models */
+enum rte_graph_worker_model {
+	RTE_GRAPH_MODEL_DEFAULT,
+	RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT,
+	RTE_GRAPH_MODEL_MCORE_DISPATCH,
+	RTE_GRAPH_MODEL_LIST_END
+};
+
+RTE_DECLARE_PER_LCORE(enum rte_graph_worker_model, worker_model);
+
 /**
  * @internal
  *
@@ -490,6 +501,14 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
 	}
 }
 
+__rte_experimental
+enum rte_graph_worker_model
+rte_graph_worker_model_get(void);
+
+__rte_experimental
+int
+rte_graph_worker_model_set(enum rte_graph_worker_model model);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 13b838752d..eea73ec9ca 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -43,5 +43,8 @@ EXPERIMENTAL {
 	rte_node_next_stream_put;
 	rte_node_next_stream_move;
 
+	rte_graph_worker_model_set;
+	rte_graph_worker_model_get;
+
 	local: *;
 };
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 05/15] graph: introduce graph node core affinity API
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (3 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 04/15] graph: add get/set graph worker model APIs Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 06/15] graph: introduce graph bind unbind API Zhirun Yan
                         ` (10 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add lcore_id to the node structure to hold the affinity core ID, and
implement rte_graph_model_dispatch_lcore_affinity_set() to bind a node
to a specific lcore.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_private.h            |  1 +
 lib/graph/meson.build                |  1 +
 lib/graph/node.c                     |  1 +
 lib/graph/rte_graph_model_dispatch.c | 31 ++++++++++++++++++++
 lib/graph/rte_graph_model_dispatch.h | 43 ++++++++++++++++++++++++++++
 lib/graph/version.map                |  2 ++
 6 files changed, 79 insertions(+)
 create mode 100644 lib/graph/rte_graph_model_dispatch.c
 create mode 100644 lib/graph/rte_graph_model_dispatch.h

diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 7d1b30b8ac..409eed3284 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -50,6 +50,7 @@ struct node {
 	STAILQ_ENTRY(node) next;      /**< Next node in the list. */
 	char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
 	uint64_t flags;		      /**< Node configuration flag. */
+	unsigned int lcore_id;        /**< Node runs on the Lcore ID */
 	rte_node_process_t process;   /**< Node process function. */
 	rte_node_init_t init;         /**< Node init function. */
 	rte_node_fini_t fini;	      /**< Node fini function. */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 9fab8243da..c729d984b6 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,7 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
         'rte_graph_worker.c',
+        'rte_graph_model_dispatch.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
diff --git a/lib/graph/node.c b/lib/graph/node.c
index 149414dcd9..339b4a0da5 100644
--- a/lib/graph/node.c
+++ b/lib/graph/node.c
@@ -100,6 +100,7 @@ __rte_node_register(const struct rte_node_register *reg)
 			goto free;
 	}
 
+	node->lcore_id = RTE_MAX_LCORE;
 	node->id = node_id++;
 
 	/* Add the node at tail */
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
new file mode 100644
index 0000000000..4a2f99496d
--- /dev/null
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "graph_private.h"
+#include "rte_graph_model_dispatch.h"
+
+int
+rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned int lcore_id)
+{
+	struct node *node;
+	int ret = -EINVAL;
+
+	if (lcore_id >= RTE_MAX_LCORE)
+		return ret;
+
+	graph_spinlock_lock();
+
+	STAILQ_FOREACH(node, node_list_head_get(), next) {
+		if (strncmp(node->name, name, RTE_NODE_NAMESIZE) == 0) {
+			node->lcore_id = lcore_id;
+			ret = 0;
+			break;
+		}
+	}
+
+	graph_spinlock_unlock();
+
+	return ret;
+}
+
diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
new file mode 100644
index 0000000000..179624e972
--- /dev/null
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#ifndef _RTE_GRAPH_MODEL_DISPATCH_H_
+#define _RTE_GRAPH_MODEL_DISPATCH_H_
+
+/**
+ * @file rte_graph_model_dispatch.h
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * This API allows setting the core affinity of a node.
+ */
+#include "rte_graph_worker_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Set lcore affinity with the node.
+ *
+ * @param name
+ *   Valid node name. In the case of the cloned node, the name will be
+ * "parent node name" + "-" + name.
+ * @param lcore_id
+ *   The lcore ID value.
+ *
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
+						unsigned int lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_MODEL_DISPATCH_H_ */
diff --git a/lib/graph/version.map b/lib/graph/version.map
index eea73ec9ca..1f090be74e 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -46,5 +46,7 @@ EXPERIMENTAL {
 	rte_graph_worker_model_set;
 	rte_graph_worker_model_get;
 
+	rte_graph_model_dispatch_lcore_affinity_set;
+
 	local: *;
 };
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 06/15] graph: introduce graph bind unbind API
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (4 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 05/15] graph: introduce graph node core affinity API Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 07/15] graph: introduce graph clone API for other worker core Zhirun Yan
                         ` (9 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add lcore_id to the graph structure to hold the affinity core ID the
graph should run on. Add bind/unbind APIs to set/unset this affinity
attribute. lcore_id defaults to RTE_MAX_LCORE, which means the
attribute is disabled.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 59 +++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |  2 ++
 lib/graph/rte_graph.h     | 22 +++++++++++++++
 lib/graph/version.map     |  2 ++
 4 files changed, 85 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index a839a2803b..b39a99aac6 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -254,6 +254,64 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
 	return graph_mem_fixup_node_ctx(graph);
 }
 
+static __rte_always_inline bool
+graph_src_node_avail(struct graph *graph)
+{
+	struct graph_node *graph_node;
+
+	STAILQ_FOREACH(graph_node, &graph->node_list, next)
+		if ((graph_node->node->flags & RTE_NODE_SOURCE_F) &&
+		    (graph_node->node->lcore_id == RTE_MAX_LCORE ||
+		     graph->lcore_id == graph_node->node->lcore_id))
+			return true;
+
+	return false;
+}
+
+int
+rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	if (!rte_lcore_is_enabled(lcore))
+		SET_ERR_JMP(ENOLINK, fail,
+			    "lcore %d not enabled\n",
+			    lcore);
+
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = lcore;
+	graph->socket = rte_lcore_to_socket_id(lcore);
+
+	/* check the availability of source node */
+	if (!graph_src_node_avail(graph))
+		graph->graph->head = 0;
+
+	return 0;
+
+fail:
+	return -rte_errno;
+}
+
+void
+rte_graph_model_dispatch_core_unbind(rte_graph_t id)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = RTE_MAX_LCORE;
+
+fail:
+	return;
+}
+
 struct rte_graph *
 rte_graph_lookup(const char *name)
 {
@@ -340,6 +398,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->lcore_id = RTE_MAX_LCORE;
 	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
 	if (prm->pcap_filename)
 		rte_strscpy(graph->pcap_filename, prm->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 409eed3284..ad1d058945 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -98,6 +98,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	unsigned int lcore_id;
+	/**< Lcore identifier where the graph prefer to run on. */
 	size_t mem_sz;
 	/**< Memory size of the graph. */
 	int socket;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index c9a77297fc..c523809d1f 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -285,6 +285,28 @@ char *rte_graph_id_to_name(rte_graph_t id);
 __rte_experimental
 int rte_graph_export(const char *name, FILE *f);
 
+/**
+ * Bind graph with specific lcore
+ *
+ * @param id
+ *   Graph id to get the pointer of graph object
+ * @param lcore
+ *   The lcore on which the graph will run
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore);
+
+/**
+ * Unbind graph with lcore
+ *
+ * @param id
+ *   Graph id to get the pointer of graph object
+ */
+__rte_experimental
+void rte_graph_model_dispatch_core_unbind(rte_graph_t id);
+
 /**
  * Get graph object from its name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 1f090be74e..7de6f08f59 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -18,6 +18,8 @@ EXPERIMENTAL {
 	rte_graph_node_get_by_name;
 	rte_graph_obj_dump;
 	rte_graph_walk;
+	rte_graph_model_dispatch_core_bind;
+	rte_graph_model_dispatch_core_unbind;
 
 	rte_graph_cluster_stats_create;
 	rte_graph_cluster_stats_destroy;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 07/15] graph: introduce graph clone API for other worker core
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (5 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 06/15] graph: introduce graph bind unbind API Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 08/15] graph: add struct for stream moving between cores Zhirun Yan
                         ` (8 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch adds a graph API to clone the graph object for a specified
worker core. The new graph also clones all nodes of the parent graph.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 110 ++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |   2 +
 lib/graph/rte_graph.h     |  20 +++++++
 lib/graph/version.map     |   1 +
 4 files changed, 133 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index b39a99aac6..90eaad0378 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -398,6 +398,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->parent_id = RTE_GRAPH_ID_INVALID;
 	graph->lcore_id = RTE_MAX_LCORE;
 	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
 	if (prm->pcap_filename)
@@ -462,6 +463,115 @@ rte_graph_destroy(rte_graph_t id)
 	return rc;
 }
 
+static int
+clone_name(struct graph *graph, struct graph *parent_graph, const char *name)
+{
+	ssize_t sz, rc;
+
+#define SZ RTE_GRAPH_NAMESIZE
+	rc = rte_strscpy(graph->name, parent_graph->name, SZ);
+	if (rc < 0)
+		goto fail;
+	sz = rc;
+	rc = rte_strscpy(graph->name + sz, "-", RTE_MAX((int16_t)(SZ - sz), 0));
+	if (rc < 0)
+		goto fail;
+	sz += rc;
+	sz = rte_strscpy(graph->name + sz, name, RTE_MAX((int16_t)(SZ - sz), 0));
+	if (sz < 0)
+		goto fail;
+
+	return 0;
+fail:
+	rte_errno = E2BIG;
+	return -rte_errno;
+}
+
+static rte_graph_t
+graph_clone(struct graph *parent_graph, const char *name)
+{
+	struct graph_node *graph_node;
+	struct graph *graph;
+
+	graph_spinlock_lock();
+
+	/* Don't allow to clone a node from a cloned graph */
+	if (parent_graph->parent_id != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, fail, "A cloned graph is not allowed to be cloned");
+
+	/* Create graph object */
+	graph = calloc(1, sizeof(*graph));
+	if (graph == NULL)
+		SET_ERR_JMP(ENOMEM, fail, "Failed to calloc cloned graph object");
+
+	/* Naming ceremony of the new graph. name is node->name + "-" + name */
+	if (clone_name(graph, parent_graph, name))
+		goto free;
+
+	/* Check for existence of duplicate graph */
+	if (rte_graph_from_name(graph->name) != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, free, "Found duplicate graph %s",
+			    graph->name);
+
+	/* Clone nodes from parent graph firstly */
+	STAILQ_INIT(&graph->node_list);
+	STAILQ_FOREACH(graph_node, &parent_graph->node_list, next) {
+		if (graph_node_add(graph, graph_node->node))
+			goto graph_cleanup;
+	}
+
+	/* Just update adjacency list of all nodes in the graph */
+	if (graph_adjacency_list_update(graph))
+		goto graph_cleanup;
+
+	/* Initialize the graph object */
+	graph->src_node_count = parent_graph->src_node_count;
+	graph->node_count = parent_graph->node_count;
+	graph->parent_id = parent_graph->id;
+	graph->lcore_id = parent_graph->lcore_id;
+	graph->socket = parent_graph->socket;
+	graph->id = graph_id;
+
+	/* Allocate the Graph fast path memory and populate the data */
+	if (graph_fp_mem_create(graph))
+		goto graph_cleanup;
+
+	/* Call init() of the all the nodes in the graph */
+	if (graph_node_init(graph))
+		goto graph_mem_destroy;
+
+	/* All good, Lets add the graph to the list */
+	graph_id++;
+	STAILQ_INSERT_TAIL(&graph_list, graph, next);
+
+	graph_spinlock_unlock();
+	return graph->id;
+
+graph_mem_destroy:
+	graph_fp_mem_destroy(graph);
+graph_cleanup:
+	graph_cleanup(graph);
+free:
+	free(graph);
+fail:
+	graph_spinlock_unlock();
+	return RTE_GRAPH_ID_INVALID;
+}
+
+rte_graph_t
+rte_graph_clone(rte_graph_t id, const char *name)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			return graph_clone(graph, name);
+
+fail:
+	return RTE_GRAPH_ID_INVALID;
+}
+
 rte_graph_t
 rte_graph_from_name(const char *name)
 {
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index ad1d058945..d28a5af93e 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -98,6 +98,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	rte_graph_t parent_id;
+	/**< Parent graph identifier. */
 	unsigned int lcore_id;
 	/**< Lcore identifier where the graph prefer to run on. */
 	size_t mem_sz;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index c523809d1f..2f86c17de7 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -247,6 +247,26 @@ rte_graph_t rte_graph_create(const char *name, struct rte_graph_param *prm);
 __rte_experimental
 int rte_graph_destroy(rte_graph_t id);
 
+/**
+ * Clone Graph.
+ *
+ * Clone a graph from static graph (graph created from rte_graph_create). And
+ * all cloned graphs attached to the parent graph MUST be destroyed together
+ * for fast schedule design limitation (stop ALL graph walk firstly).
+ *
+ * @param id
+ *   Static graph id to clone from.
+ * @param name
+ *   Name of the new graph. The library prepends the parent graph name to the
+ * user-specified name. The final graph name will be,
+ * "parent graph name" + "-" + name.
+ *
+ * @return
+ *   Valid graph id on success, RTE_GRAPH_ID_INVALID otherwise.
+ */
+__rte_experimental
+rte_graph_t rte_graph_clone(rte_graph_t id, const char *name);
+
 /**
  * Get graph id from graph name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 7de6f08f59..aaa86f66ed 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -7,6 +7,7 @@ EXPERIMENTAL {
 
 	rte_graph_create;
 	rte_graph_destroy;
+	rte_graph_clone;
 	rte_graph_dump;
 	rte_graph_export;
 	rte_graph_from_name;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 08/15] graph: add struct for stream moving between cores
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (6 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 07/15] graph: introduce graph clone API for other worker core Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 09/15] graph: introduce stream moving cross cores Zhirun Yan
                         ` (7 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add the graph_sched_wq_node structure to hold a stream of objects queued
on the graph scheduling workqueue.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c                   |  1 +
 lib/graph/graph_populate.c          |  1 +
 lib/graph/graph_private.h           | 12 ++++++++++++
 lib/graph/rte_graph_worker_common.h | 21 +++++++++++++++++++++
 4 files changed, 35 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 90eaad0378..dd3d69dbf7 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -284,6 +284,7 @@ rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore)
 			break;
 
 	graph->lcore_id = lcore;
+	graph->graph->lcore_id = graph->lcore_id;
 	graph->socket = rte_lcore_to_socket_id(lcore);
 
 	/* check the availability of source node */
diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
index 2c0844ce92..7dcf1420c1 100644
--- a/lib/graph/graph_populate.c
+++ b/lib/graph/graph_populate.c
@@ -89,6 +89,7 @@ graph_nodes_populate(struct graph *_graph)
 		}
 		node->id = graph_node->node->id;
 		node->parent_id = pid;
+		node->lcore_id = graph_node->node->lcore_id;
 		nb_edges = graph_node->node->nb_edges;
 		node->nb_edges = nb_edges;
 		off += sizeof(struct rte_node);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index d28a5af93e..b66b18ebbc 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -60,6 +60,18 @@ struct node {
 	char next_nodes[][RTE_NODE_NAMESIZE]; /**< Names of next nodes. */
 };
 
+/**
+ * @internal
+ *
+ * Structure that holds the graph scheduling workqueue node stream.
+ * Used for mcore dispatch model.
+ */
+struct graph_sched_wq_node {
+	rte_graph_off_t node_off;
+	uint16_t nb_objs;
+	void *objs[RTE_GRAPH_BURST_SIZE];
+} __rte_cache_aligned;
+
 /**
  * @internal
  *
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 1526da6e2c..dc0a0b5554 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -30,6 +30,13 @@
 extern "C" {
 #endif
 
+/**
+ * @internal
+ *
+ * Singly-linked list head for graph schedule run-queue.
+ */
+SLIST_HEAD(rte_graph_rq_head, rte_graph);
+
 /**
  * @internal
  *
@@ -41,6 +48,15 @@ struct rte_graph {
 	uint32_t cir_mask;	     /**< Circular buffer wrap around mask. */
 	rte_node_t nb_nodes;	     /**< Number of nodes in the graph. */
 	rte_graph_off_t *cir_start;  /**< Pointer to circular buffer. */
+	/* Graph schedule */
+	struct rte_graph_rq_head *rq __rte_cache_aligned; /* The run-queue */
+	struct rte_graph_rq_head rq_head; /* The head for run-queue list */
+
+	SLIST_ENTRY(rte_graph) rq_next;   /* The next for run-queue list */
+	unsigned int lcore_id;  /**< The graph running Lcore. */
+	struct rte_ring *wq;    /**< The work-queue for pending streams. */
+	struct rte_mempool *mp; /**< The mempool for scheduling streams. */
+	/* Graph schedule area */
 	rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */
 	rte_graph_t id;	/**< Graph identifier. */
 	int socket;	/**< Socket ID where memory is allocated. */
@@ -74,6 +90,11 @@ struct rte_node {
 	/** Original process function when pcap is enabled. */
 	rte_node_process_t original_process;
 
+	RTE_STD_C11
+		union {
+		/* Fast schedule area for mcore dispatch model */
+		unsigned int lcore_id;  /**< Node running lcore. */
+		};
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 09/15] graph: introduce stream moving cross cores
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (7 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 08/15] graph: add struct for stream moving between cores Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 10/15] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
                         ` (6 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the key functions that allow a worker thread to
enqueue and move streams of objects to next nodes running on different
cores.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_private.h            |  27 +++++
 lib/graph/meson.build                |   2 +-
 lib/graph/rte_graph_model_dispatch.c | 145 +++++++++++++++++++++++++++
 lib/graph/rte_graph_model_dispatch.h |  37 +++++++
 lib/graph/version.map                |   2 +
 5 files changed, 212 insertions(+), 1 deletion(-)

diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index b66b18ebbc..e1a2a4bfd8 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -366,4 +366,31 @@ void graph_dump(FILE *f, struct graph *g);
  */
 void node_dump(FILE *f, struct node *n);
 
+/**
+ * @internal
+ *
+ * Create the graph schedule work queue. And all cloned graphs attached to the
+ * parent graph MUST be destroyed together for fast schedule design limitation.
+ *
+ * @param _graph
+ *   The graph object
+ * @param _parent_graph
+ *   The parent graph object which holds the run-queue head.
+ *
+ * @return
+ *   - 0: Success.
+ *   - <0: Graph schedule work queue related error.
+ */
+int graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph);
+
+/**
+ * @internal
+ *
+ * Destroy the graph schedule work queue.
+ *
+ * @param _graph
+ *   The graph object
+ */
+void graph_sched_wq_destroy(struct graph *_graph);
+
 #endif /* _RTE_GRAPH_PRIVATE_H_ */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index c729d984b6..e21affa280 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -20,4 +20,4 @@ sources = files(
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
-deps += ['eal', 'pcapng']
+deps += ['eal', 'pcapng', 'mempool', 'ring']
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
index 4a2f99496d..a300fefb85 100644
--- a/lib/graph/rte_graph_model_dispatch.c
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -5,6 +5,151 @@
 #include "graph_private.h"
 #include "rte_graph_model_dispatch.h"
 
+int
+graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph)
+{
+	struct rte_graph *parent_graph = _parent_graph->graph;
+	struct rte_graph *graph = _graph->graph;
+	unsigned int wq_size;
+
+	wq_size = GRAPH_SCHED_WQ_SIZE(graph->nb_nodes);
+	wq_size = rte_align32pow2(wq_size + 1);
+
+	graph->wq = rte_ring_create(graph->name, wq_size, graph->socket,
+				    RING_F_SC_DEQ);
+	if (graph->wq == NULL)
+		SET_ERR_JMP(EIO, fail, "Failed to allocate graph WQ");
+
+	graph->mp = rte_mempool_create(graph->name, wq_size,
+				       sizeof(struct graph_sched_wq_node),
+				       0, 0, NULL, NULL, NULL, NULL,
+				       graph->socket, MEMPOOL_F_SP_PUT);
+	if (graph->mp == NULL)
+		SET_ERR_JMP(EIO, fail_mp,
+			    "Failed to allocate graph WQ schedule entry");
+
+	graph->lcore_id = _graph->lcore_id;
+
+	if (parent_graph->rq == NULL) {
+		parent_graph->rq = &parent_graph->rq_head;
+		SLIST_INIT(parent_graph->rq);
+	}
+
+	graph->rq = parent_graph->rq;
+	SLIST_INSERT_HEAD(graph->rq, graph, rq_next);
+
+	return 0;
+
+fail_mp:
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+fail:
+	return -rte_errno;
+}
+
+void
+graph_sched_wq_destroy(struct graph *_graph)
+{
+	struct rte_graph *graph = _graph->graph;
+
+	if (graph == NULL)
+		return;
+
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+
+	rte_mempool_free(graph->mp);
+	graph->mp = NULL;
+}
+
+static __rte_always_inline bool
+__graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	uint16_t off = 0;
+	uint16_t size;
+
+submit_again:
+	if (rte_mempool_get(graph->mp, (void **)&wq_node) < 0)
+		goto fallback;
+
+	size = RTE_MIN(node->idx, RTE_DIM(wq_node->objs));
+	wq_node->node_off = node->off;
+	wq_node->nb_objs = size;
+	rte_memcpy(wq_node->objs, &node->objs[off], size * sizeof(void *));
+
+	while (rte_ring_mp_enqueue_bulk_elem(graph->wq, (void *)&wq_node,
+					  sizeof(wq_node), 1, NULL) == 0)
+		rte_pause();
+
+	off += size;
+	node->idx -= size;
+	if (node->idx > 0)
+		goto submit_again;
+
+	return true;
+
+fallback:
+	if (off != 0)
+		memmove(&node->objs[0], &node->objs[off],
+			node->idx * sizeof(void *));
+
+	return false;
+}
+
+bool __rte_noinline
+__rte_graph_sched_node_enqueue(struct rte_node *node,
+			       struct rte_graph_rq_head *rq)
+{
+	const unsigned int lcore_id = node->lcore_id;
+	struct rte_graph *graph;
+
+	SLIST_FOREACH(graph, rq, rq_next)
+		if (graph->lcore_id == lcore_id)
+			break;
+
+	return graph != NULL ? __graph_sched_node_enqueue(node, graph) : false;
+}
+
+void
+__rte_graph_sched_wq_process(struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	struct rte_mempool *mp = graph->mp;
+	struct rte_ring *wq = graph->wq;
+	uint16_t idx, free_space;
+	struct rte_node *node;
+	unsigned int i, n;
+	struct graph_sched_wq_node *wq_nodes[32];
+
+	n = rte_ring_sc_dequeue_burst_elem(wq, wq_nodes, sizeof(wq_nodes[0]),
+					   RTE_DIM(wq_nodes), NULL);
+	if (n == 0)
+		return;
+
+	for (i = 0; i < n; i++) {
+		wq_node = wq_nodes[i];
+		node = RTE_PTR_ADD(graph, wq_node->node_off);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		idx = node->idx;
+		free_space = node->size - idx;
+
+		if (unlikely(free_space < wq_node->nb_objs))
+			__rte_node_stream_alloc_size(graph, node, node->size + wq_node->nb_objs);
+
+		memmove(&node->objs[idx], wq_node->objs, wq_node->nb_objs * sizeof(void *));
+		memset(wq_node->objs, 0, wq_node->nb_objs * sizeof(void *));
+		node->idx = idx + wq_node->nb_objs;
+
+		__rte_node_process(graph, node);
+
+		wq_node->nb_objs = 0;
+		node->idx = 0;
+	}
+
+	rte_mempool_put_bulk(mp, (void **)wq_nodes, n);
+}
+
 int
 rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned int lcore_id)
 {
diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
index 179624e972..18fa7ce0ab 100644
--- a/lib/graph/rte_graph_model_dispatch.h
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -14,12 +14,49 @@
  *
  * This API allows setting the core affinity of a node.
  */
+#include <rte_errno.h>
+#include <rte_mempool.h>
+#include <rte_memzone.h>
+#include <rte_ring.h>
+
 #include "rte_graph_worker_common.h"
 
 #ifdef __cplusplus
 extern "C" {
 #endif
 
+#define GRAPH_SCHED_WQ_SIZE_MULTIPLIER  8
+#define GRAPH_SCHED_WQ_SIZE(nb_nodes)   \
+	((typeof(nb_nodes))((nb_nodes) * GRAPH_SCHED_WQ_SIZE_MULTIPLIER))
+
+/**
+ * @internal
+ *
+ * Schedule the node to the right graph's work queue.
+ *
+ * @param node
+ *   Pointer to the scheduled node object.
+ * @param rq
+ *   Pointer to the scheduled run-queue for all graphs.
+ *
+ * @return
+ *   True on success, false otherwise.
+ */
+__rte_experimental
+bool __rte_noinline __rte_graph_sched_node_enqueue(struct rte_node *node,
+				    struct rte_graph_rq_head *rq);
+
+/**
+ * @internal
+ *
+ * Process all nodes (streams) in the graph's work queue.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ */
+__rte_experimental
+void __rte_graph_sched_wq_process(struct rte_graph *graph);
+
 /**
  * Set lcore affinity with the node.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index aaa86f66ed..d511133f39 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -48,6 +48,8 @@ EXPERIMENTAL {
 
 	rte_graph_worker_model_set;
 	rte_graph_worker_model_get;
+	__rte_graph_sched_wq_process;
+	__rte_graph_sched_node_enqueue;
 
 	rte_graph_model_dispatch_lcore_affinity_set;
 
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 10/15] graph: enable create and destroy graph scheduling workqueue
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (8 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 09/15] graph: introduce stream moving cross cores Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 11/15] graph: introduce graph walk by cross-core dispatch Zhirun Yan
                         ` (5 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch hooks creation and destruction of the scheduling workqueue
into the common graph operations.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index dd3d69dbf7..1f1ee9b622 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -443,6 +443,10 @@ rte_graph_destroy(rte_graph_t id)
 	while (graph != NULL) {
 		tmp = STAILQ_NEXT(graph, next);
 		if (graph->id == id) {
+			/* Destroy the scheduling work queue if it exists */
+			if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+				graph_sched_wq_destroy(graph);
+
 			/* Call fini() of the all the nodes in the graph */
 			graph_node_fini(graph);
 			/* Destroy graph fast path memory */
@@ -537,6 +541,11 @@ graph_clone(struct graph *parent_graph, const char *name)
 	if (graph_fp_mem_create(graph))
 		goto graph_cleanup;
 
+	/* Create the graph schedule work queue */
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH &&
+	    graph_sched_wq_create(graph, parent_graph))
+		goto graph_mem_destroy;
+
 	/* Call init() of the all the nodes in the graph */
 	if (graph_node_init(graph))
 		goto graph_mem_destroy;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 11/15] graph: introduce graph walk by cross-core dispatch
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (9 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 10/15] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 12/15] graph: enable graph multicore dispatch scheduler model Zhirun Yan
                         ` (4 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the task scheduler mechanism to enable dispatching
tasks to other worker cores. Currently, there is only a local work
queue for each graph to walk. We introduce a scheduling work queue on
each worker core for dispatching tasks. The walk will process the
scheduling work queue first, then handle the local work queue.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_dispatch.h | 42 ++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
index 18fa7ce0ab..65b2cc6d87 100644
--- a/lib/graph/rte_graph_model_dispatch.h
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -73,6 +73,48 @@ __rte_experimental
 int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
 						unsigned int lcore_id);
 
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk_mcore_dispatch(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+
+	if (graph->wq != NULL)
+		__rte_graph_sched_wq_process(graph);
+
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+
+	/* Skip the src nodes which are not bound to the current worker */
+		if ((int32_t)head < 0 && node->lcore_id != graph->lcore_id)
+			continue;
+
+		/* Schedule the node until all task/objs are done */
+		if (node->lcore_id != RTE_MAX_LCORE &&
+		    graph->lcore_id != node->lcore_id && graph->rq != NULL &&
+		    __rte_graph_sched_node_enqueue(node, graph->rq))
+			continue;
+
+		__rte_node_process(graph, node);
+
+		head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+
+	graph->tail = 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 12/15] graph: enable graph multicore dispatch scheduler model
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (10 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 11/15] graph: introduce graph walk by cross-core dispatch Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 13/15] graph: add stats for cross-core dispatching Zhirun Yan
                         ` (3 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch enables choosing the new scheduler model.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_worker.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index 7ea18ba80a..d608c7513e 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -10,6 +10,7 @@ extern "C" {
 #endif
 
 #include "rte_graph_model_rtc.h"
+#include "rte_graph_model_dispatch.h"
 
 /**
  * Perform graph walk on the circular buffer and invoke the process function
@@ -24,7 +25,13 @@ __rte_experimental
 static inline void
 rte_graph_walk(struct rte_graph *graph)
 {
-	rte_graph_walk_rtc(graph);
+	int model = rte_graph_worker_model_get();
+
+	if (model == RTE_GRAPH_MODEL_DEFAULT ||
+	    model == RTE_GRAPH_MODEL_RTC)
+		rte_graph_walk_rtc(graph);
+	else if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		rte_graph_walk_mcore_dispatch(graph);
 }
 
 #ifdef __cplusplus
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 13/15] graph: add stats for cross-core dispatching
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (11 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 12/15] graph: enable graph multicore dispatch scheduler model Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model Zhirun Yan
                         ` (2 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add stats for cross-core dispatching scheduler if stats collection is
enabled.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_debug.c              |  6 +++
 lib/graph/graph_stats.c              | 74 +++++++++++++++++++++++++---
 lib/graph/rte_graph.h                |  2 +
 lib/graph/rte_graph_model_dispatch.c |  3 ++
 lib/graph/rte_graph_worker_common.h  |  2 +
 5 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/lib/graph/graph_debug.c b/lib/graph/graph_debug.c
index b84412f5dd..7dcf07b080 100644
--- a/lib/graph/graph_debug.c
+++ b/lib/graph/graph_debug.c
@@ -74,6 +74,12 @@ rte_graph_obj_dump(FILE *f, struct rte_graph *g, bool all)
 		fprintf(f, "       size=%d\n", n->size);
 		fprintf(f, "       idx=%d\n", n->idx);
 		fprintf(f, "       total_objs=%" PRId64 "\n", n->total_objs);
+		if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+			fprintf(f, "       total_sched_objs=%" PRId64 "\n",
+				n->total_sched_objs);
+			fprintf(f, "       total_sched_fail=%" PRId64 "\n",
+				n->total_sched_fail);
+		}
 		fprintf(f, "       total_calls=%" PRId64 "\n", n->total_calls);
 		for (i = 0; i < n->nb_edges; i++)
 			fprintf(f, "          edge[%d] <%s>\n", i,
diff --git a/lib/graph/graph_stats.c b/lib/graph/graph_stats.c
index c0140ba922..aa22cc403c 100644
--- a/lib/graph/graph_stats.c
+++ b/lib/graph/graph_stats.c
@@ -40,13 +40,19 @@ struct rte_graph_cluster_stats {
 	struct cluster_node clusters[];
 } __rte_cache_aligned;
 
+#define boarder_model_dispatch()                                                              \
+	fprintf(f, "+-------------------------------+---------------+--------" \
+		   "-------+---------------+---------------+---------------+" \
+		   "---------------+---------------+-" \
+		   "----------+\n")
+
 #define boarder()                                                              \
 	fprintf(f, "+-------------------------------+---------------+--------" \
 		   "-------+---------------+---------------+---------------+-" \
 		   "----------+\n")
 
 static inline void
-print_banner(FILE *f)
+print_banner_default(FILE *f)
 {
 	boarder();
 	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s\n", "|Node", "|calls",
@@ -55,6 +61,27 @@ print_banner(FILE *f)
 	boarder();
 }
 
+static inline void
+print_banner_dispatch(FILE *f)
+{
+	boarder_model_dispatch();
+	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s%-16s%-16s\n",
+		"|Node", "|calls",
+		"|objs", "|sched objs", "|sched fail",
+		"|realloc_count", "|objs/call", "|objs/sec(10E6)",
+		"|cycles/call|");
+	boarder_model_dispatch();
+}
+
+static inline void
+print_banner(FILE *f)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		print_banner_dispatch(f);
+	else
+		print_banner_default(f);
+}
+
 static inline void
 print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 {
@@ -76,11 +103,21 @@ print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 	objs_per_sec = ts_per_hz ? (objs - prev_objs) / ts_per_hz : 0;
 	objs_per_sec /= 1000000;
 
-	fprintf(f,
-		"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
-		"|%-15.3f|%-15.6f|%-11.4f|\n",
-		stat->name, calls, objs, stat->realloc_count, objs_per_call,
-		objs_per_sec, cycles_per_call);
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->sched_objs,
+			stat->sched_fail, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	} else {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	}
 }
 
 static int
@@ -88,13 +125,20 @@ graph_cluster_stats_cb(bool is_first, bool is_last, void *cookie,
 		       const struct rte_graph_cluster_node_stats *stat)
 {
 	FILE *f = cookie;
+	int model;
+
+	model = rte_graph_worker_model_get();
 
 	if (unlikely(is_first))
 		print_banner(f);
 	if (stat->objs)
 		print_node(f, stat);
-	if (unlikely(is_last))
-		boarder();
+	if (unlikely(is_last)) {
+		if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+			boarder_model_dispatch();
+		else
+			boarder();
+	}
 
 	return 0;
 };
@@ -332,13 +376,21 @@ static inline void
 cluster_node_arregate_stats(struct cluster_node *cluster)
 {
 	uint64_t calls = 0, cycles = 0, objs = 0, realloc_count = 0;
+	uint64_t sched_objs = 0, sched_fail = 0;
 	struct rte_graph_cluster_node_stats *stat = &cluster->stat;
 	struct rte_node *node;
 	rte_node_t count;
+	int model;
 
+	model = rte_graph_worker_model_get();
 	for (count = 0; count < cluster->nb_nodes; count++) {
 		node = cluster->nodes[count];
 
+		if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+			sched_objs += node->total_sched_objs;
+			sched_fail += node->total_sched_fail;
+		}
+
 		calls += node->total_calls;
 		objs += node->total_objs;
 		cycles += node->total_cycles;
@@ -348,6 +400,12 @@ cluster_node_arregate_stats(struct cluster_node *cluster)
 	stat->calls = calls;
 	stat->objs = objs;
 	stat->cycles = cycles;
+
+	if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+		stat->sched_objs = sched_objs;
+		stat->sched_fail = sched_fail;
+	}
+
 	stat->ts = rte_get_timer_cycles();
 	stat->realloc_count = realloc_count;
 }
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index 2f86c17de7..7d77a790ac 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -208,6 +208,8 @@ struct rte_graph_cluster_node_stats {
 	uint64_t prev_calls;	/**< Previous number of calls. */
 	uint64_t prev_objs;	/**< Previous number of processed objs. */
 	uint64_t prev_cycles;	/**< Previous number of cycles. */
+	uint64_t sched_objs;	/**< Number of scheduled objs. */
+	uint64_t sched_fail;	/**< Number of objs that failed to be scheduled. */
 
 	uint64_t realloc_count; /**< Realloc count. */
 
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
index a300fefb85..9db60eb463 100644
--- a/lib/graph/rte_graph_model_dispatch.c
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -83,6 +83,7 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		rte_pause();
 
 	off += size;
+	node->total_sched_objs += size;
 	node->idx -= size;
 	if (node->idx > 0)
 		goto submit_again;
@@ -94,6 +95,8 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		memmove(&node->objs[0], &node->objs[off],
 			node->idx * sizeof(void *));
 
+	node->total_sched_fail += node->idx;
+
 	return false;
 }
 
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index dc0a0b5554..d94983589c 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -95,6 +95,8 @@ struct rte_node {
 		/* Fast schedule area for mcore dispatch model */
 		unsigned int lcore_id;  /**< Node running lcore. */
 		};
+	uint64_t total_sched_objs; /**< Number of objects scheduled. */
+	uint64_t total_sched_fail; /**< Number of scheduled failure. */
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (12 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 13/15] graph: add stats for cross-core dispatching Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 15/15] doc: update multicore dispatch model in graph guides Zhirun Yan
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add a new parameter "model" to choose the dispatch or rtc worker model.
In the dispatch model, nodes are affinitized to worker cores successively.

Note:
only one Rx node is supported for the dispatch model in the current
implementation.

./dpdk-l3fwd-graph  -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="dispatch"

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 examples/l3fwd-graph/main.c | 237 +++++++++++++++++++++++++++++-------
 1 file changed, 195 insertions(+), 42 deletions(-)

diff --git a/examples/l3fwd-graph/main.c b/examples/l3fwd-graph/main.c
index 5feeab4f0f..cfa78003f4 100644
--- a/examples/l3fwd-graph/main.c
+++ b/examples/l3fwd-graph/main.c
@@ -55,6 +55,9 @@
 
 #define NB_SOCKETS 8
 
+/* Graph module */
+#define WORKER_MODEL_RTC "rtc"
+#define WORKER_MODEL_MCORE_DISPATCH "dispatch"
 /* Static global variables used within this file. */
 static uint16_t nb_rxd = RX_DESC_DEFAULT;
 static uint16_t nb_txd = TX_DESC_DEFAULT;
@@ -88,6 +91,10 @@ struct lcore_rx_queue {
 	char node_name[RTE_NODE_NAMESIZE];
 };
 
+struct model_conf {
+	enum rte_graph_worker_model model;
+};
+
 /* Lcore conf */
 struct lcore_conf {
 	uint16_t n_rx_queue;
@@ -153,6 +160,19 @@ static struct ipv4_l3fwd_lpm_route ipv4_l3fwd_lpm_route_array[] = {
 	{RTE_IPV4(198, 18, 6, 0), 24, 6}, {RTE_IPV4(198, 18, 7, 0), 24, 7},
 };
 
+static int
+check_worker_model_params(void)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH &&
+	    nb_lcore_params > 1) {
		printf("Exceeded max number of lcore params for dispatch model: %hu\n",
+		       nb_lcore_params);
+		return -1;
+	}
+
+	return 0;
+}
+
 static int
 check_lcore_params(void)
 {
@@ -276,6 +296,7 @@ print_usage(const char *prgname)
 		"  --eth-dest=X,MM:MM:MM:MM:MM:MM: Ethernet destination for "
 		"port X\n"
 		"  --max-pkt-len PKTLEN: maximum packet length in decimal (64-9600)\n"
		"  --model NAME: walking model name, dispatch or rtc (default)\n"
 		"  --no-numa: Disable numa awareness\n"
 		"  --per-port-pool: Use separate buffer pool per port\n"
 		"  --pcap-enable: Enables pcap capture\n"
@@ -318,6 +339,20 @@ parse_max_pkt_len(const char *pktlen)
 	return len;
 }
 
+static int
+parse_worker_model(const char *model)
+{
+	if (strcmp(model, WORKER_MODEL_MCORE_DISPATCH) == 0) {
+		rte_graph_worker_model_set(RTE_GRAPH_MODEL_MCORE_DISPATCH);
+		return RTE_GRAPH_MODEL_MCORE_DISPATCH;
+	} else if (strcmp(model, WORKER_MODEL_RTC) == 0)
+		return RTE_GRAPH_MODEL_RTC;
+
+	rte_exit(EXIT_FAILURE, "Invalid worker model: %s", model);
+
+	return RTE_GRAPH_MODEL_LIST_END;
+}
+
 static int
 parse_portmask(const char *portmask)
 {
@@ -434,6 +469,8 @@ static const char short_options[] = "p:" /* portmask */
 #define CMD_LINE_OPT_PCAP_ENABLE   "pcap-enable"
 #define CMD_LINE_OPT_NUM_PKT_CAP   "pcap-num-cap"
 #define CMD_LINE_OPT_PCAP_FILENAME "pcap-file-name"
+#define CMD_LINE_OPT_WORKER_MODEL  "model"
+
 enum {
 	/* Long options mapped to a short option */
 
@@ -449,6 +486,7 @@ enum {
 	CMD_LINE_OPT_PARSE_PCAP_ENABLE,
 	CMD_LINE_OPT_PARSE_NUM_PKT_CAP,
 	CMD_LINE_OPT_PCAP_FILENAME_CAP,
+	CMD_LINE_OPT_WORKER_MODEL_TYPE,
 };
 
 static const struct option lgopts[] = {
@@ -460,6 +498,7 @@ static const struct option lgopts[] = {
 	{CMD_LINE_OPT_PCAP_ENABLE, 0, 0, CMD_LINE_OPT_PARSE_PCAP_ENABLE},
 	{CMD_LINE_OPT_NUM_PKT_CAP, 1, 0, CMD_LINE_OPT_PARSE_NUM_PKT_CAP},
 	{CMD_LINE_OPT_PCAP_FILENAME, 1, 0, CMD_LINE_OPT_PCAP_FILENAME_CAP},
+	{CMD_LINE_OPT_WORKER_MODEL, 1, 0, CMD_LINE_OPT_WORKER_MODEL_TYPE},
 	{NULL, 0, 0, 0},
 };
 
@@ -551,6 +590,11 @@ parse_args(int argc, char **argv)
 			printf("Pcap file name: %s\n", pcap_filename);
 			break;
 
+		case CMD_LINE_OPT_WORKER_MODEL_TYPE:
+			printf("Use new worker model: %s\n", optarg);
+			parse_worker_model(optarg);
+			break;
+
 		default:
 			print_usage(prgname);
 			return -1;
@@ -726,15 +770,15 @@ print_stats(void)
 static int
 graph_main_loop(void *conf)
 {
+	struct model_conf *mconf = conf;
 	struct lcore_conf *qconf;
 	struct rte_graph *graph;
 	uint32_t lcore_id;
 
-	RTE_SET_USED(conf);
-
 	lcore_id = rte_lcore_id();
 	qconf = &lcore_conf[lcore_id];
 	graph = qconf->graph;
+	rte_graph_worker_model_set(mconf->model);
 
 	if (!graph) {
 		RTE_LOG(INFO, L3FWD_GRAPH, "Lcore %u has nothing to do\n",
@@ -788,6 +832,141 @@ config_port_max_pkt_len(struct rte_eth_conf *conf,
 	return 0;
 }
 
+static void
+graph_config_mcore_dispatch(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	int worker_count = rte_lcore_count() - 1;
+	int main_lcore_id = rte_get_main_lcore();
+	int worker_lcore = main_lcore_id;
+	rte_graph_t main_graph_id = 0;
+	struct rte_node *node_tmp;
+	struct lcore_conf *qconf;
+	struct rte_graph *graph;
+	rte_graph_t graph_id;
+	rte_graph_off_t off;
+	int n_rx_node = 0;
+	rte_node_t count;
+	int i, j;
+	int ret;
+
+	for (j = 0; j < nb_lcore_params; j++) {
+		qconf = &lcore_conf[lcore_params[j].lcore_id];
+		/* Add rx node patterns of all lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			char *node_name = qconf->rx_queue_list[i].node_name;
+
+			graph_conf.node_patterns[nb_patterns + n_rx_node + i] = node_name;
+			n_rx_node++;
+			ret = rte_graph_model_dispatch_lcore_affinity_set(node_name,
+									lcore_params[j].lcore_id);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n", node_name,
+				       lcore_params[j].lcore_id);
+		}
+	}
+
+	graph_conf.nb_node_patterns = nb_patterns + n_rx_node;
+	graph_conf.socket_id = rte_lcore_to_socket_id(main_lcore_id);
+
+	qconf = &lcore_conf[main_lcore_id];
+	snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+		 main_lcore_id);
+
+	/* create main graph */
+	main_graph_id = rte_graph_create(qconf->name, &graph_conf);
+	if (main_graph_id == RTE_GRAPH_ID_INVALID)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_create(): main_graph_id invalid for lcore %u\n",
+			 main_lcore_id);
+
+	qconf->graph_id = main_graph_id;
+	qconf->graph = rte_graph_lookup(qconf->name);
+	/* >8 End of graph initialization. */
+	if (!qconf->graph)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_lookup(): graph %s not found\n",
+			 qconf->name);
+
+	graph = qconf->graph;
+	rte_graph_foreach_node(count, off, graph, node_tmp) {
+		worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+
+		/* Need to set the node Lcore affinity before clone graph for each lcore */
+		if (node_tmp->lcore_id == RTE_MAX_LCORE) {
+			ret = rte_graph_model_dispatch_lcore_affinity_set(node_tmp->name,
+									worker_lcore);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n",
+				       node_tmp->name, worker_lcore);
+		}
+	}
+
+	worker_lcore = main_lcore_id;
+	for (i = 0; i < worker_count; i++) {
+		worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+
+		qconf = &lcore_conf[worker_lcore];
+		snprintf(qconf->name, sizeof(qconf->name), "cloned-%u", worker_lcore);
+		graph_id = rte_graph_clone(main_graph_id, qconf->name);
+		ret = rte_graph_model_dispatch_core_bind(graph_id, worker_lcore);
+		if (ret == 0)
+			printf("bind graph %d to lcore %u\n", graph_id, worker_lcore);
+
+		/* full cloned graph name */
+		snprintf(qconf->name, sizeof(qconf->name), "%s",
+			 rte_graph_id_to_name(graph_id));
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "Failed to lookup graph %s\n",
+				 qconf->name);
+		continue;
+	}
+}
+
+static void
+graph_config_rtc(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	struct lcore_conf *qconf;
+	rte_graph_t graph_id;
+	uint32_t lcore_id;
+	rte_edge_t i;
+
+	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+		if (rte_lcore_is_enabled(lcore_id) == 0)
+			continue;
+
+		qconf = &lcore_conf[lcore_id];
+		/* Skip graph creation if no source exists */
+		if (!qconf->n_rx_queue)
+			continue;
+		/* Add rx node patterns of this lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			graph_conf.node_patterns[nb_patterns + i] =
+				qconf->rx_queue_list[i].node_name;
+		}
+		graph_conf.nb_node_patterns = nb_patterns + i;
+		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
+		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+			 lcore_id);
+		graph_id = rte_graph_create(qconf->name, &graph_conf);
+		if (graph_id == RTE_GRAPH_ID_INVALID)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_create(): graph_id invalid for lcore %u\n",
+				 lcore_id);
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		/* >8 End of graph initialization. */
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_lookup(): graph %s not found\n",
+				 qconf->name);
+	}
+}
+
 int
 main(int argc, char **argv)
 {
@@ -808,10 +987,12 @@ main(int argc, char **argv)
 	uint16_t queueid, portid, i;
 	const char **node_patterns;
 	struct lcore_conf *qconf;
+	struct model_conf mconf;
 	uint16_t nb_graphs = 0;
 	uint16_t nb_patterns;
 	uint8_t rewrite_len;
 	uint32_t lcore_id;
+	uint16_t model;
 	int ret;
 
 	/* Init EAL */
@@ -840,6 +1021,9 @@ main(int argc, char **argv)
 	if (check_lcore_params() < 0)
 		rte_exit(EXIT_FAILURE, "check_lcore_params() failed\n");
 
+	if (check_worker_model_params() < 0)
+		rte_exit(EXIT_FAILURE, "check_worker_model_params() failed\n");
+
 	ret = init_lcore_rx_queues();
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "init_lcore_rx_queues() failed\n");
@@ -1079,51 +1263,18 @@ main(int argc, char **argv)
 
 	memset(&graph_conf, 0, sizeof(graph_conf));
 	graph_conf.node_patterns = node_patterns;
+	graph_conf.nb_node_patterns = nb_patterns;
 
 	/* Pcap config */
 	graph_conf.pcap_enable = pcap_trace_enable;
 	graph_conf.num_pkt_to_capture = packet_to_capture;
 	graph_conf.pcap_filename = pcap_filename;
 
-	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
-		rte_graph_t graph_id;
-		rte_edge_t i;
-
-		if (rte_lcore_is_enabled(lcore_id) == 0)
-			continue;
-
-		qconf = &lcore_conf[lcore_id];
-
-		/* Skip graph creation if no source exists */
-		if (!qconf->n_rx_queue)
-			continue;
-
-		/* Add rx node patterns of this lcore */
-		for (i = 0; i < qconf->n_rx_queue; i++) {
-			graph_conf.node_patterns[nb_patterns + i] =
-				qconf->rx_queue_list[i].node_name;
-		}
-
-		graph_conf.nb_node_patterns = nb_patterns + i;
-		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
-
-		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
-			 lcore_id);
-
-		graph_id = rte_graph_create(qconf->name, &graph_conf);
-		if (graph_id == RTE_GRAPH_ID_INVALID)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_create(): graph_id invalid"
-				 " for lcore %u\n", lcore_id);
-
-		qconf->graph_id = graph_id;
-		qconf->graph = rte_graph_lookup(qconf->name);
-		/* >8 End of graph initialization. */
-		if (!qconf->graph)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_lookup(): graph %s not found\n",
-				 qconf->name);
-	}
+	model = rte_graph_worker_model_get();
+	if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		graph_config_mcore_dispatch(graph_conf);
+	else
+		graph_config_rtc(graph_conf);
 
 	memset(&rewrite_data, 0, sizeof(rewrite_data));
 	rewrite_len = sizeof(rewrite_data);
@@ -1174,8 +1325,10 @@ main(int argc, char **argv)
 	}
 	/* >8 End of adding route to ip4 graph infa. */
 
+	mconf.model = model;
 	/* Launch per-lcore init on every worker lcore */
-	rte_eal_mp_remote_launch(graph_main_loop, NULL, SKIP_MAIN);
+	rte_eal_mp_remote_launch(graph_main_loop, &mconf,
+				 SKIP_MAIN);
 
 	/* Accumulate and print stats on main until exit */
 	if (rte_graph_has_stats_feature())
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 15/15] doc: update multicore dispatch model in graph guides
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (13 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Update graph documentation to introduce the new multicore dispatch model.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 doc/guides/prog_guide/graph_lib.rst | 59 +++++++++++++++++++++++++++--
 1 file changed, 55 insertions(+), 4 deletions(-)

diff --git a/doc/guides/prog_guide/graph_lib.rst b/doc/guides/prog_guide/graph_lib.rst
index 1cfdc86433..72e26f3a5a 100644
--- a/doc/guides/prog_guide/graph_lib.rst
+++ b/doc/guides/prog_guide/graph_lib.rst
@@ -189,14 +189,65 @@ In the above example, A graph object will be created with ethdev Rx
 node of port 0 and queue 0, all ipv4* nodes in the system,
 and ethdev tx node of all ports.
 
-Multicore graph processing
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-In the current graph library implementation, specifically,
-``rte_graph_walk()`` and ``rte_node_enqueue*`` fast path API functions
+Graph model choosing
+~~~~~~~~~~~~~~~~~~~~
+Currently, there are two different walking models. Use
+``rte_graph_worker_model_set()`` to set the walking model.
+
+RTC (Run-To-Completion)
+^^^^^^^^^^^^^^^^^^^^^^^
+This is the default graph walking model. Specifically,
+``rte_graph_walk_rtc()`` and ``rte_node_enqueue*`` fast path API functions
 are designed to work on single-core to have better performance.
 The fast path API works on graph object, So the multi-core graph
 processing strategy would be to create graph object PER WORKER.
 
+Example:
+
+Graph: node-0 -> node-1 -> node-2 @Core0.
+
+.. code-block:: diff
+
+    + - - - - - - - - - - - - - - - - - - - - - +
+    '                  Core #0                  '
+    '                                           '
+    ' +--------+     +---------+     +--------+ '
+    ' | Node-0 | --> | Node-1  | --> | Node-2 | '
+    ' +--------+     +---------+     +--------+ '
+    '                                           '
+    + - - - - - - - - - - - - - - - - - - - - - +
+
+Dispatch model
+^^^^^^^^^^^^^^
+The dispatch model enables a cross-core dispatching mechanism which employs
+a scheduling work-queue to dispatch streams to other worker cores that
+are associated with the destination node.
+
+Use ``rte_graph_model_dispatch_lcore_affinity_set()`` to set the lcore
+affinity of a node.
+Each worker core will have a replica of the graph. Use ``rte_graph_clone()``
+to clone the graph for each worker and ``rte_graph_model_dispatch_core_bind()``
+to bind the graph to the worker core.
+
+Example:
+
+Graph topo: node-0 -> Core1; node-1 -> node-2; node-2 -> node-3.
+Config graph: node-0 @Core0; node-1/3 @Core1; node-2 @Core2.
+
+.. code-block:: diff
+
+    + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
+    '  Core #0   '     '          Core #1         '     '  Core #2   '
+    '            '     '                          '     '            '
+    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
+    ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
+    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
+    '            '     '     |                    '     '      ^     '
+    + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
+                             |                                 |
+                             + - - - - - - - - - - - - - - - - +
+
+
 In fast path
 ~~~~~~~~~~~~
 Typical fast-path code looks like below, where the application
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 00/15] graph enhancement for multi-core dispatch
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (14 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 15/15] doc: update multicore dispatch model in graph guides Zhirun Yan
@ 2023-03-31  4:02       ` Zhirun Yan
  2023-03-31  4:02         ` [PATCH v5 01/15] graph: rename rte_graph_work as common Zhirun Yan
                           ` (15 more replies)
  15 siblings, 16 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:02 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

V5:
Fix CI build issues with dynamically updated documentation.

V4:
Fix CI build issues with undefined references to the sched APIs.
Remove inline for model setting.

V3:
Fix CI build issues with TLS usage and a typo.

V2:
Use git mv to keep git history.
Use TLS for per-thread local storage.
Change model name to mcore dispatch.
Change API with specific mode name.
Split big patch.
Fix CI issues.
Rebase l3fwd-graph example.
Update doc and maintainers files.


Currently, rte_graph supports the RTC (Run-To-Completion) model, where a
whole graph runs within a single core.
RTC is one of the typical packet processing models. Others, like
Pipeline or Hybrid, lack support.

The patch set introduces a 'multicore dispatch' model selection, which
is a self-reacting scheme based on core affinity.
The new model enables a cross-core dispatching mechanism which employs a
scheduling work-queue to dispatch streams to the worker cores associated
with the destination node. When the core flavor of the destination node
is the default 'current', the stream continues to be executed as normal.

Example:
3-node graph targets 3-core budget

RTC:
Graph: node-0 -> node-1 -> node-2 @Core0.

+ - - - - - - - - - - - - - - - - - - - - - +
'                Core #0/1/2                '
'                                           '
' +--------+     +---------+     +--------+ '
' | Node-0 | --> | Node-1  | --> | Node-2 | '
' +--------+     +---------+     +--------+ '
'                                           '
+ - - - - - - - - - - - - - - - - - - - - - +

Dispatch:

Graph topo: node-0 -> Core1; node-1 -> node-2; node-2 -> node-3.
Config graph: node-0 @Core0; node-1/3 @Core1; node-2 @Core2.

.. code-block:: diff

    + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
    '  Core #0   '     '          Core #1         '     '  Core #2   '
    '            '     '                          '     '            '
    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
    ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
    '            '     '     |                    '     '      ^     '
    + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
                             |                                 |
                             + - - - - - - - - - - - - - - - - +


The patch set is broken down as below:

1. Split graph worker into common and default model part.
2. Inline graph node processing to make it reusable.
3. Add set/get APIs to choose worker model.
4. Introduce core affinity API to set the node run on specific worker core.
  (only use in new model)
5. Introduce graph affinity API to bind one graph with specific worker
  core.
6. Introduce graph clone API.
7. Introduce stream moving with scheduler work-queue in patch 8~12.
8. Add stats for new models.
9. Abstract default graph config process and integrate new model into
  example/l3fwd-graph. Add new parameters for model choosing.

We can run the new worker model like this:
./dpdk-l3fwd-graph -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="dispatch"

References:
https://static.sched.com/hosted_files/dpdkuserspace22/a6/graph%20introduce%20remote%20dispatch%20for%20mult-core%20scaling.pdf



Zhirun Yan (15):
  graph: rename rte_graph_work as common
  graph: split graph worker into common and default model
  graph: move node process into inline function
  graph: add get/set graph worker model APIs
  graph: introduce graph node core affinity API
  graph: introduce graph bind unbind API
  graph: introduce graph clone API for other worker core
  graph: add struct for stream moving between cores
  graph: introduce stream moving cross cores
  graph: enable create and destroy graph scheduling workqueue
  graph: introduce graph walk by cross-core dispatch
  graph: enable graph multicore dispatch scheduler model
  graph: add stats for cross-core dispatching
  examples/l3fwd-graph: introduce multicore dispatch worker model
  doc: update multicore dispatch model in graph guides

 MAINTAINERS                          |   1 +
 doc/guides/prog_guide/graph_lib.rst  |  59 ++-
 examples/l3fwd-graph/main.c          | 236 +++++++++---
 lib/graph/graph.c                    | 179 +++++++++
 lib/graph/graph_debug.c              |   6 +
 lib/graph/graph_populate.c           |   1 +
 lib/graph/graph_private.h            |  44 +++
 lib/graph/graph_stats.c              |  74 +++-
 lib/graph/meson.build                |   4 +-
 lib/graph/node.c                     |   1 +
 lib/graph/rte_graph.h                |  44 +++
 lib/graph/rte_graph_model_dispatch.c | 179 +++++++++
 lib/graph/rte_graph_model_dispatch.h | 122 ++++++
 lib/graph/rte_graph_model_rtc.h      |  45 +++
 lib/graph/rte_graph_worker.c         |  54 +++
 lib/graph/rte_graph_worker.h         | 498 +------------------------
 lib/graph/rte_graph_worker_common.h  | 539 +++++++++++++++++++++++++++
 lib/graph/version.map                |  10 +
 18 files changed, 1552 insertions(+), 544 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_dispatch.c
 create mode 100644 lib/graph/rte_graph_model_dispatch.h
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker.c
 create mode 100644 lib/graph/rte_graph_worker_common.h

-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 01/15] graph: rename rte_graph_work as common
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
@ 2023-03-31  4:02         ` Zhirun Yan
  2023-03-31  4:02         ` [PATCH v5 02/15] graph: split graph worker into common and default model Zhirun Yan
                           ` (14 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:02 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Rename rte_graph_work.h to rte_graph_work_common.h for supporting
multiple graph worker model.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 MAINTAINERS                                                 | 1 +
 lib/graph/graph_pcap.c                                      | 2 +-
 lib/graph/graph_private.h                                   | 2 +-
 lib/graph/meson.build                                       | 2 +-
 lib/graph/{rte_graph_worker.h => rte_graph_worker_common.h} | 6 +++---
 5 files changed, 7 insertions(+), 6 deletions(-)
 rename lib/graph/{rte_graph_worker.h => rte_graph_worker_common.h} (99%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 280058adfc..9d9467dd00 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1714,6 +1714,7 @@ F: doc/guides/prog_guide/bpf_lib.rst
 Graph - EXPERIMENTAL
 M: Jerin Jacob <jerinj@marvell.com>
 M: Kiran Kumar K <kirankumark@marvell.com>
+M: Zhirun Yan <zhirun.yan@intel.com>
 F: lib/graph/
 F: doc/guides/prog_guide/graph_lib.rst
 F: app/test/test_graph*
diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
index 6c43330029..8a220370fa 100644
--- a/lib/graph/graph_pcap.c
+++ b/lib/graph/graph_pcap.c
@@ -10,7 +10,7 @@
 #include <rte_mbuf.h>
 #include <rte_pcapng.h>
 
-#include "rte_graph_worker.h"
+#include "rte_graph_worker_common.h"
 
 #include "graph_pcap_private.h"
 
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 7d1b30b8ac..f08dbc7e9d 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -12,7 +12,7 @@
 #include <rte_eal.h>
 
 #include "rte_graph.h"
-#include "rte_graph_worker.h"
+#include "rte_graph_worker_common.h"
 
 extern int rte_graph_logtype;
 
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 3526d1b5d4..4e2b612ad3 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,6 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
 )
-headers = files('rte_graph.h', 'rte_graph_worker.h')
+headers = files('rte_graph.h', 'rte_graph_worker_common.h')
 
 deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker_common.h
similarity index 99%
rename from lib/graph/rte_graph_worker.h
rename to lib/graph/rte_graph_worker_common.h
index 438595b15c..0bad2938f3 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -2,8 +2,8 @@
  * Copyright(C) 2020 Marvell International Ltd.
  */
 
-#ifndef _RTE_GRAPH_WORKER_H_
-#define _RTE_GRAPH_WORKER_H_
+#ifndef _RTE_GRAPH_WORKER_COMMON_H_
+#define _RTE_GRAPH_WORKER_COMMON_H_
 
 /**
  * @file rte_graph_worker.h
@@ -518,4 +518,4 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
 }
 #endif
 
-#endif /* _RTE_GRAPH_WORKER_H_ */
+#endif /* _RTE_GRAPH_WORKER_COMMON_H_ */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 02/15] graph: split graph worker into common and default model
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  2023-03-31  4:02         ` [PATCH v5 01/15] graph: rename rte_graph_work as common Zhirun Yan
@ 2023-03-31  4:02         ` Zhirun Yan
  2023-04-27 14:11           ` [EXT] " Pavan Nikhilesh Bhagavatula
  2023-03-31  4:02         ` [PATCH v5 03/15] graph: move node process into inline function Zhirun Yan
                           ` (13 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:02 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

To support multiple graph worker models, split the graph worker into
common and default parts. Name the current walk function
rte_graph_model_rtc, since the default model is RTC (Run-To-Completion).

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_pcap.c              |  2 +-
 lib/graph/graph_private.h           |  2 +-
 lib/graph/meson.build               |  2 +-
 lib/graph/rte_graph_model_rtc.h     | 61 +++++++++++++++++++++++++++++
 lib/graph/rte_graph_worker.h        | 34 ++++++++++++++++
 lib/graph/rte_graph_worker_common.h | 57 ---------------------------
 6 files changed, 98 insertions(+), 60 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker.h

diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
index 8a220370fa..6c43330029 100644
--- a/lib/graph/graph_pcap.c
+++ b/lib/graph/graph_pcap.c
@@ -10,7 +10,7 @@
 #include <rte_mbuf.h>
 #include <rte_pcapng.h>
 
-#include "rte_graph_worker_common.h"
+#include "rte_graph_worker.h"
 
 #include "graph_pcap_private.h"
 
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index f08dbc7e9d..7d1b30b8ac 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -12,7 +12,7 @@
 #include <rte_eal.h>
 
 #include "rte_graph.h"
-#include "rte_graph_worker_common.h"
+#include "rte_graph_worker.h"
 
 extern int rte_graph_logtype;
 
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 4e2b612ad3..3526d1b5d4 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,6 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
 )
-headers = files('rte_graph.h', 'rte_graph_worker_common.h')
+headers = files('rte_graph.h', 'rte_graph_worker.h')
 
 deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
new file mode 100644
index 0000000000..665560f831
--- /dev/null
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "rte_graph_worker_common.h"
+
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+static inline void
+rte_graph_walk_rtc(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	/*
+	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
+	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
+	 * in a circular buffer fashion.
+	 *
+	 *	+-----+ <= cir_start - head [number of source nodes]
+	 *	|     |
+	 *	| ... | <= source nodes
+	 *	|     |
+	 *	+-----+ <= cir_start [head = 0] [tail = 0]
+	 *	|     |
+	 *	| ... | <= pending streams
+	 *	|     |
+	 *	+-----+ <= cir_start + mask
+	 */
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		objs = node->objs;
+		rte_prefetch0(objs);
+
+		if (rte_graph_has_stats_feature()) {
+			start = rte_rdtsc();
+			rc = node->process(graph, node, objs, node->idx);
+			node->total_cycles += rte_rdtsc() - start;
+			node->total_calls++;
+			node->total_objs += rc;
+		} else {
+			node->process(graph, node, objs, node->idx);
+		}
+			node->idx = 0;
+			head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+	graph->tail = 0;
+}
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
new file mode 100644
index 0000000000..7ea18ba80a
--- /dev/null
+++ b/lib/graph/rte_graph_worker.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#ifndef _RTE_GRAPH_WORKER_H_
+#define _RTE_GRAPH_WORKER_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "rte_graph_model_rtc.h"
+
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk(struct rte_graph *graph)
+{
+	rte_graph_walk_rtc(graph);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_WORKER_H_ */
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 0bad2938f3..b58f8f6947 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -128,63 +128,6 @@ __rte_experimental
 void __rte_node_stream_alloc_size(struct rte_graph *graph,
 				  struct rte_node *node, uint16_t req_size);
 
-/**
- * Perform graph walk on the circular buffer and invoke the process function
- * of the nodes and collect the stats.
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup function.
- *
- * @see rte_graph_lookup()
- */
-__rte_experimental
-static inline void
-rte_graph_walk(struct rte_graph *graph)
-{
-	const rte_graph_off_t *cir_start = graph->cir_start;
-	const rte_node_t mask = graph->cir_mask;
-	uint32_t head = graph->head;
-	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
-
-	/*
-	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
-	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
-	 * in a circular buffer fashion.
-	 *
-	 *	+-----+ <= cir_start - head [number of source nodes]
-	 *	|     |
-	 *	| ... | <= source nodes
-	 *	|     |
-	 *	+-----+ <= cir_start [head = 0] [tail = 0]
-	 *	|     |
-	 *	| ... | <= pending streams
-	 *	|     |
-	 *	+-----+ <= cir_start + mask
-	 */
-	while (likely(head != graph->tail)) {
-		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-		node->idx = 0;
-		head = likely((int32_t)head > 0) ? head & mask : head;
-	}
-	graph->tail = 0;
-}
-
 /* Fast path helper functions */
 
 /**
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 03/15] graph: move node process into inline function
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  2023-03-31  4:02         ` [PATCH v5 01/15] graph: rename rte_graph_work as common Zhirun Yan
  2023-03-31  4:02         ` [PATCH v5 02/15] graph: split graph worker into common and default model Zhirun Yan
@ 2023-03-31  4:02         ` Zhirun Yan
  2023-04-27 15:03           ` [EXT] " Pavan Nikhilesh Bhagavatula
  2023-03-31  4:02         ` [PATCH v5 04/15] graph: add get/set graph worker model APIs Zhirun Yan
                           ` (12 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:02 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Node processing is a single, reusable block; move the code into an inline
function.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_rtc.h     | 20 ++---------------
 lib/graph/rte_graph_worker_common.h | 33 +++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
index 665560f831..0dcb7151e9 100644
--- a/lib/graph/rte_graph_model_rtc.h
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -20,9 +20,6 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	const rte_node_t mask = graph->cir_mask;
 	uint32_t head = graph->head;
 	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
 
 	/*
 	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
@@ -41,21 +38,8 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	 */
 	while (likely(head != graph->tail)) {
 		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-			node->idx = 0;
-			head = likely((int32_t)head > 0) ? head & mask : head;
+		__rte_node_process(graph, node);
+		head = likely((int32_t)head > 0) ? head & mask : head;
 	}
 	graph->tail = 0;
 }
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index b58f8f6947..41428974db 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -130,6 +130,39 @@ void __rte_node_stream_alloc_size(struct rte_graph *graph,
 
 /* Fast path helper functions */
 
+/**
+ * @internal
+ *
+ * Enqueue a given node to the tail of the graph reel.
+ *
+ * @param graph
+ *   Pointer Graph object.
+ * @param node
+ *   Pointer to node object to be enqueued.
+ */
+static __rte_always_inline void
+__rte_node_process(struct rte_graph *graph, struct rte_node *node)
+{
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+	objs = node->objs;
+	rte_prefetch0(objs);
+
+	if (rte_graph_has_stats_feature()) {
+		start = rte_rdtsc();
+		rc = node->process(graph, node, objs, node->idx);
+		node->total_cycles += rte_rdtsc() - start;
+		node->total_calls++;
+		node->total_objs += rc;
+	} else {
+		node->process(graph, node, objs, node->idx);
+	}
+	node->idx = 0;
+}
+
 /**
  * @internal
  *
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 04/15] graph: add get/set graph worker model APIs
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (2 preceding siblings ...)
  2023-03-31  4:02         ` [PATCH v5 03/15] graph: move node process into inline function Zhirun Yan
@ 2023-03-31  4:02         ` Zhirun Yan
  2023-03-31  4:02         ` [PATCH v5 05/15] graph: introduce graph node core affinity API Zhirun Yan
                           ` (11 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:02 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add new get/set APIs to configure the graph worker model, which
determines which model is used at runtime.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/meson.build               |  1 +
 lib/graph/rte_graph_worker.c        | 54 +++++++++++++++++++++++++++++
 lib/graph/rte_graph_worker_common.h | 19 ++++++++++
 lib/graph/version.map               |  3 ++
 4 files changed, 77 insertions(+)
 create mode 100644 lib/graph/rte_graph_worker.c

diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 3526d1b5d4..9fab8243da 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -15,6 +15,7 @@ sources = files(
         'graph_stats.c',
         'graph_populate.c',
         'graph_pcap.c',
+        'rte_graph_worker.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
diff --git a/lib/graph/rte_graph_worker.c b/lib/graph/rte_graph_worker.c
new file mode 100644
index 0000000000..cabc101262
--- /dev/null
+++ b/lib/graph/rte_graph_worker.c
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "rte_graph_worker_common.h"
+
+RTE_DEFINE_PER_LCORE(enum rte_graph_worker_model, worker_model) = RTE_GRAPH_MODEL_DEFAULT;
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ * Set the graph worker model
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ *    before the graph starts running.
+ *
+ * @param model
+ *   The graph worker model to set.
+ *
+ * @return
+ *   0 on success, -1 otherwise.
+ */
+int
+rte_graph_worker_model_set(enum rte_graph_worker_model model)
+{
+	if (model >= RTE_GRAPH_MODEL_LIST_END)
+		goto fail;
+
+	RTE_PER_LCORE(worker_model) = model;
+	return 0;
+
+fail:
+	RTE_PER_LCORE(worker_model) = RTE_GRAPH_MODEL_DEFAULT;
+	return -1;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Get the graph worker model
+ *
+ * @return
+ *   Graph worker model on success.
+ */
+inline
+enum rte_graph_worker_model
+rte_graph_worker_model_get(void)
+{
+	return RTE_PER_LCORE(worker_model);
+}
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 41428974db..1526da6e2c 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -19,6 +19,7 @@
 #include <rte_compat.h>
 #include <rte_common.h>
 #include <rte_cycles.h>
+#include <rte_per_lcore.h>
 #include <rte_prefetch.h>
 #include <rte_memcpy.h>
 #include <rte_memory.h>
@@ -95,6 +96,16 @@ struct rte_node {
 	struct rte_node *nodes[] __rte_cache_min_aligned; /**< Next nodes. */
 } __rte_cache_aligned;
 
+/** Graph worker models */
+enum rte_graph_worker_model {
+	RTE_GRAPH_MODEL_DEFAULT,
+	RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT,
+	RTE_GRAPH_MODEL_MCORE_DISPATCH,
+	RTE_GRAPH_MODEL_LIST_END
+};
+
+RTE_DECLARE_PER_LCORE(enum rte_graph_worker_model, worker_model);
+
 /**
  * @internal
  *
@@ -490,6 +501,14 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
 	}
 }
 
+__rte_experimental
+enum rte_graph_worker_model
+rte_graph_worker_model_get(void);
+
+__rte_experimental
+int
+rte_graph_worker_model_set(enum rte_graph_worker_model model);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 13b838752d..eea73ec9ca 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -43,5 +43,8 @@ EXPERIMENTAL {
 	rte_node_next_stream_put;
 	rte_node_next_stream_move;
 
+	rte_graph_worker_model_set;
+	rte_graph_worker_model_get;
+
 	local: *;
 };
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 05/15] graph: introduce graph node core affinity API
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (3 preceding siblings ...)
  2023-03-31  4:02         ` [PATCH v5 04/15] graph: add get/set graph worker model APIs Zhirun Yan
@ 2023-03-31  4:02         ` Zhirun Yan
  2023-03-31  4:02         ` [PATCH v5 06/15] graph: introduce graph bind unbind API Zhirun Yan
                           ` (10 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:02 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add an lcore_id field to the node to hold the affinity core ID, and
implement rte_graph_model_dispatch_lcore_affinity_set() to set node
affinity to a specific lcore.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_private.h            |  1 +
 lib/graph/meson.build                |  1 +
 lib/graph/node.c                     |  1 +
 lib/graph/rte_graph_model_dispatch.c | 31 ++++++++++++++++++++
 lib/graph/rte_graph_model_dispatch.h | 43 ++++++++++++++++++++++++++++
 lib/graph/version.map                |  2 ++
 6 files changed, 79 insertions(+)
 create mode 100644 lib/graph/rte_graph_model_dispatch.c
 create mode 100644 lib/graph/rte_graph_model_dispatch.h

diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 7d1b30b8ac..409eed3284 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -50,6 +50,7 @@ struct node {
 	STAILQ_ENTRY(node) next;      /**< Next node in the list. */
 	char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
 	uint64_t flags;		      /**< Node configuration flag. */
+	unsigned int lcore_id;        /**< Node runs on the Lcore ID */
 	rte_node_process_t process;   /**< Node process function. */
 	rte_node_init_t init;         /**< Node init function. */
 	rte_node_fini_t fini;	      /**< Node fini function. */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 9fab8243da..c729d984b6 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,7 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
         'rte_graph_worker.c',
+        'rte_graph_model_dispatch.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
diff --git a/lib/graph/node.c b/lib/graph/node.c
index 149414dcd9..339b4a0da5 100644
--- a/lib/graph/node.c
+++ b/lib/graph/node.c
@@ -100,6 +100,7 @@ __rte_node_register(const struct rte_node_register *reg)
 			goto free;
 	}
 
+	node->lcore_id = RTE_MAX_LCORE;
 	node->id = node_id++;
 
 	/* Add the node at tail */
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
new file mode 100644
index 0000000000..4a2f99496d
--- /dev/null
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "graph_private.h"
+#include "rte_graph_model_dispatch.h"
+
+int
+rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned int lcore_id)
+{
+	struct node *node;
+	int ret = -EINVAL;
+
+	if (lcore_id >= RTE_MAX_LCORE)
+		return ret;
+
+	graph_spinlock_lock();
+
+	STAILQ_FOREACH(node, node_list_head_get(), next) {
+		if (strncmp(node->name, name, RTE_NODE_NAMESIZE) == 0) {
+			node->lcore_id = lcore_id;
+			ret = 0;
+			break;
+		}
+	}
+
+	graph_spinlock_unlock();
+
+	return ret;
+}
+
diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
new file mode 100644
index 0000000000..179624e972
--- /dev/null
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#ifndef _RTE_GRAPH_MODEL_DISPATCH_H_
+#define _RTE_GRAPH_MODEL_DISPATCH_H_
+
+/**
+ * @file rte_graph_model_dispatch.h
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * This API allows to set core affinity with the node.
+ */
+#include "rte_graph_worker_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Set lcore affinity with the node.
+ *
+ * @param name
+ *   Valid node name. In the case of the cloned node, the name will be
+ * "parent node name" + "-" + name.
+ * @param lcore_id
+ *   The lcore ID value.
+ *
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
+						unsigned int lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_MODEL_DISPATCH_H_ */
diff --git a/lib/graph/version.map b/lib/graph/version.map
index eea73ec9ca..1f090be74e 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -46,5 +46,7 @@ EXPERIMENTAL {
 	rte_graph_worker_model_set;
 	rte_graph_worker_model_get;
 
+	rte_graph_model_dispatch_lcore_affinity_set;
+
 	local: *;
 };
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 06/15] graph: introduce graph bind unbind API
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (4 preceding siblings ...)
  2023-03-31  4:02         ` [PATCH v5 05/15] graph: introduce graph node core affinity API Zhirun Yan
@ 2023-03-31  4:02         ` Zhirun Yan
  2023-03-31  4:02         ` [PATCH v5 07/15] graph: introduce graph clone API for other worker core Zhirun Yan
                           ` (9 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:02 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add an lcore_id field to the graph to hold the affinity core ID the
graph will run on. Add bind/unbind APIs to set/unset the graph affinity
attribute. lcore_id is set to RTE_MAX_LCORE by default, which means the
attribute is disabled.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 59 +++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |  2 ++
 lib/graph/rte_graph.h     | 22 +++++++++++++++
 lib/graph/version.map     |  2 ++
 4 files changed, 85 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index a839a2803b..b39a99aac6 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -254,6 +254,64 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
 	return graph_mem_fixup_node_ctx(graph);
 }
 
+static __rte_always_inline bool
+graph_src_node_avail(struct graph *graph)
+{
+	struct graph_node *graph_node;
+
+	STAILQ_FOREACH(graph_node, &graph->node_list, next)
+		if ((graph_node->node->flags & RTE_NODE_SOURCE_F) &&
+		    (graph_node->node->lcore_id == RTE_MAX_LCORE ||
+		     graph->lcore_id == graph_node->node->lcore_id))
+			return true;
+
+	return false;
+}
+
+int
+rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	if (!rte_lcore_is_enabled(lcore))
+		SET_ERR_JMP(ENOLINK, fail,
+			    "lcore %d not enabled\n",
+			    lcore);
+
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = lcore;
+	graph->socket = rte_lcore_to_socket_id(lcore);
+
+	/* check the availability of source node */
+	if (!graph_src_node_avail(graph))
+		graph->graph->head = 0;
+
+	return 0;
+
+fail:
+	return -rte_errno;
+}
+
+void
+rte_graph_model_dispatch_core_unbind(rte_graph_t id)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = RTE_MAX_LCORE;
+
+fail:
+	return;
+}
+
 struct rte_graph *
 rte_graph_lookup(const char *name)
 {
@@ -340,6 +398,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->lcore_id = RTE_MAX_LCORE;
 	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
 	if (prm->pcap_filename)
 		rte_strscpy(graph->pcap_filename, prm->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 409eed3284..ad1d058945 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -98,6 +98,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	unsigned int lcore_id;
+	/**< Lcore identifier on which the graph prefers to run. */
 	size_t mem_sz;
 	/**< Memory size of the graph. */
 	int socket;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index c9a77297fc..c523809d1f 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -285,6 +285,28 @@ char *rte_graph_id_to_name(rte_graph_t id);
 __rte_experimental
 int rte_graph_export(const char *name, FILE *f);
 
+/**
+ * Bind a graph to a specific lcore
+ *
+ * @param id
+ *   Graph id to get the pointer of graph object
+ * @param lcore
+ *   The lcore on which the graph will run
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore);
+
+/**
+ * Unbind a graph from its lcore
+ *
+ * @param id
+ *   Graph id to get the pointer of graph object
+ */
+__rte_experimental
+void rte_graph_model_dispatch_core_unbind(rte_graph_t id);
+
 /**
  * Get graph object from its name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 1f090be74e..7de6f08f59 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -18,6 +18,8 @@ EXPERIMENTAL {
 	rte_graph_node_get_by_name;
 	rte_graph_obj_dump;
 	rte_graph_walk;
+	rte_graph_model_dispatch_core_bind;
+	rte_graph_model_dispatch_core_unbind;
 
 	rte_graph_cluster_stats_create;
 	rte_graph_cluster_stats_destroy;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 07/15] graph: introduce graph clone API for other worker core
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (5 preceding siblings ...)
  2023-03-31  4:02         ` [PATCH v5 06/15] graph: introduce graph bind unbind API Zhirun Yan
@ 2023-03-31  4:02         ` Zhirun Yan
  2023-03-31  4:02         ` [PATCH v5 08/15] graph: add struct for stream moving between cores Zhirun Yan
                           ` (8 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:02 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch adds a graph API to support cloning the graph object for a
specified worker core. The new graph also clones all nodes from the parent.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 110 ++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |   2 +
 lib/graph/rte_graph.h     |  20 +++++++
 lib/graph/version.map     |   1 +
 4 files changed, 133 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index b39a99aac6..90eaad0378 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -398,6 +398,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->parent_id = RTE_GRAPH_ID_INVALID;
 	graph->lcore_id = RTE_MAX_LCORE;
 	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
 	if (prm->pcap_filename)
@@ -462,6 +463,115 @@ rte_graph_destroy(rte_graph_t id)
 	return rc;
 }
 
+static int
+clone_name(struct graph *graph, struct graph *parent_graph, const char *name)
+{
+	ssize_t sz, rc;
+
+#define SZ RTE_GRAPH_NAMESIZE
+	rc = rte_strscpy(graph->name, parent_graph->name, SZ);
+	if (rc < 0)
+		goto fail;
+	sz = rc;
+	rc = rte_strscpy(graph->name + sz, "-", RTE_MAX((int16_t)(SZ - sz), 0));
+	if (rc < 0)
+		goto fail;
+	sz += rc;
+	sz = rte_strscpy(graph->name + sz, name, RTE_MAX((int16_t)(SZ - sz), 0));
+	if (sz < 0)
+		goto fail;
+
+	return 0;
+fail:
+	rte_errno = E2BIG;
+	return -rte_errno;
+}
+
+static rte_graph_t
+graph_clone(struct graph *parent_graph, const char *name)
+{
+	struct graph_node *graph_node;
+	struct graph *graph;
+
+	graph_spinlock_lock();
+
+	/* Don't allow cloning from an already cloned graph */
+	if (parent_graph->parent_id != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, fail, "A cloned graph is not allowed to be cloned");
+
+	/* Create graph object */
+	graph = calloc(1, sizeof(*graph));
+	if (graph == NULL)
+		SET_ERR_JMP(ENOMEM, fail, "Failed to calloc cloned graph object");
+
+	/* Naming ceremony of the new graph: name is parent->name + "-" + name */
+	if (clone_name(graph, parent_graph, name))
+		goto free;
+
+	/* Check for existence of duplicate graph */
+	if (rte_graph_from_name(graph->name) != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, free, "Found duplicate graph %s",
+			    graph->name);
+
+	/* Clone nodes from parent graph firstly */
+	STAILQ_INIT(&graph->node_list);
+	STAILQ_FOREACH(graph_node, &parent_graph->node_list, next) {
+		if (graph_node_add(graph, graph_node->node))
+			goto graph_cleanup;
+	}
+
+	/* Just update adjacency list of all nodes in the graph */
+	if (graph_adjacency_list_update(graph))
+		goto graph_cleanup;
+
+	/* Initialize the graph object */
+	graph->src_node_count = parent_graph->src_node_count;
+	graph->node_count = parent_graph->node_count;
+	graph->parent_id = parent_graph->id;
+	graph->lcore_id = parent_graph->lcore_id;
+	graph->socket = parent_graph->socket;
+	graph->id = graph_id;
+
+	/* Allocate the Graph fast path memory and populate the data */
+	if (graph_fp_mem_create(graph))
+		goto graph_cleanup;
+
+	/* Call init() of the all the nodes in the graph */
+	if (graph_node_init(graph))
+		goto graph_mem_destroy;
+
+	/* All good, Lets add the graph to the list */
+	graph_id++;
+	STAILQ_INSERT_TAIL(&graph_list, graph, next);
+
+	graph_spinlock_unlock();
+	return graph->id;
+
+graph_mem_destroy:
+	graph_fp_mem_destroy(graph);
+graph_cleanup:
+	graph_cleanup(graph);
+free:
+	free(graph);
+fail:
+	graph_spinlock_unlock();
+	return RTE_GRAPH_ID_INVALID;
+}
+
+rte_graph_t
+rte_graph_clone(rte_graph_t id, const char *name)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			return graph_clone(graph, name);
+
+fail:
+	return RTE_GRAPH_ID_INVALID;
+}
+
 rte_graph_t
 rte_graph_from_name(const char *name)
 {
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index ad1d058945..d28a5af93e 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -98,6 +98,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	rte_graph_t parent_id;
+	/**< Parent graph identifier. */
 	unsigned int lcore_id;
 	/**< Lcore identifier on which the graph prefers to run. */
 	size_t mem_sz;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index c523809d1f..2f86c17de7 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -247,6 +247,26 @@ rte_graph_t rte_graph_create(const char *name, struct rte_graph_param *prm);
 __rte_experimental
 int rte_graph_destroy(rte_graph_t id);
 
+/**
+ * Clone Graph.
+ *
+ * Clone a graph from a static graph (a graph created with rte_graph_create()).
+ * All cloned graphs attached to the parent graph MUST be destroyed together,
+ * due to a fast-schedule design limitation (stop ALL graph walks first).
+ *
+ * @param id
+ *   Static graph id to clone from.
+ * @param name
+ *   Name of the new graph. The library prepends the parent graph name to the
+ *   user-specified name. The final graph name will be
+ *   "parent graph name" + "-" + name.
+ *
+ * @return
+ *   Valid graph id on success, RTE_GRAPH_ID_INVALID otherwise.
+ */
+__rte_experimental
+rte_graph_t rte_graph_clone(rte_graph_t id, const char *name);
+
 /**
  * Get graph id from graph name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 7de6f08f59..aaa86f66ed 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -7,6 +7,7 @@ EXPERIMENTAL {
 
 	rte_graph_create;
 	rte_graph_destroy;
+	rte_graph_clone;
 	rte_graph_dump;
 	rte_graph_export;
 	rte_graph_from_name;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 08/15] graph: add struct for stream moving between cores
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (6 preceding siblings ...)
  2023-03-31  4:02         ` [PATCH v5 07/15] graph: introduce graph clone API for other worker core Zhirun Yan
@ 2023-03-31  4:02         ` Zhirun Yan
  2023-03-31  4:03         ` [PATCH v5 09/15] graph: introduce stream moving cross cores Zhirun Yan
                           ` (7 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:02 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add struct graph_sched_wq_node to hold a stream on the graph scheduling
workqueue.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c                   |  1 +
 lib/graph/graph_populate.c          |  1 +
 lib/graph/graph_private.h           | 12 ++++++++++++
 lib/graph/rte_graph_worker_common.h | 21 +++++++++++++++++++++
 4 files changed, 35 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 90eaad0378..dd3d69dbf7 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -284,6 +284,7 @@ rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore)
 			break;
 
 	graph->lcore_id = lcore;
+	graph->graph->lcore_id = graph->lcore_id;
 	graph->socket = rte_lcore_to_socket_id(lcore);
 
 	/* check the availability of source node */
diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
index 2c0844ce92..7dcf1420c1 100644
--- a/lib/graph/graph_populate.c
+++ b/lib/graph/graph_populate.c
@@ -89,6 +89,7 @@ graph_nodes_populate(struct graph *_graph)
 		}
 		node->id = graph_node->node->id;
 		node->parent_id = pid;
+		node->lcore_id = graph_node->node->lcore_id;
 		nb_edges = graph_node->node->nb_edges;
 		node->nb_edges = nb_edges;
 		off += sizeof(struct rte_node);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index d28a5af93e..b66b18ebbc 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -60,6 +60,18 @@ struct node {
 	char next_nodes[][RTE_NODE_NAMESIZE]; /**< Names of next nodes. */
 };
 
+/**
+ * @internal
+ *
+ * Structure that holds the graph scheduling workqueue node stream.
+ * Used for mcore dispatch model.
+ */
+struct graph_sched_wq_node {
+	rte_graph_off_t node_off;
+	uint16_t nb_objs;
+	void *objs[RTE_GRAPH_BURST_SIZE];
+} __rte_cache_aligned;
+
 /**
  * @internal
  *
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 1526da6e2c..dc0a0b5554 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -30,6 +30,13 @@
 extern "C" {
 #endif
 
+/**
+ * @internal
+ *
+ * Singly-linked list head for graph schedule run-queue.
+ */
+SLIST_HEAD(rte_graph_rq_head, rte_graph);
+
 /**
  * @internal
  *
@@ -41,6 +48,15 @@ struct rte_graph {
 	uint32_t cir_mask;	     /**< Circular buffer wrap around mask. */
 	rte_node_t nb_nodes;	     /**< Number of nodes in the graph. */
 	rte_graph_off_t *cir_start;  /**< Pointer to circular buffer. */
+	/* Graph schedule */
+	struct rte_graph_rq_head *rq __rte_cache_aligned; /* The run-queue */
+	struct rte_graph_rq_head rq_head; /* The head for run-queue list */
+
+	SLIST_ENTRY(rte_graph) rq_next;   /* The next for run-queue list */
+	unsigned int lcore_id;  /**< The graph running Lcore. */
+	struct rte_ring *wq;    /**< The work-queue for pending streams. */
+	struct rte_mempool *mp; /**< The mempool for scheduling streams. */
+	/* Graph schedule area */
 	rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */
 	rte_graph_t id;	/**< Graph identifier. */
 	int socket;	/**< Socket ID where memory is allocated. */
@@ -74,6 +90,11 @@ struct rte_node {
 	/** Original process function when pcap is enabled. */
 	rte_node_process_t original_process;
 
+	RTE_STD_C11
+		union {
+		/* Fast schedule area for mcore dispatch model */
+		unsigned int lcore_id;  /**< Node running lcore. */
+		};
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 09/15] graph: introduce stream moving cross cores
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (7 preceding siblings ...)
  2023-03-31  4:02         ` [PATCH v5 08/15] graph: add struct for stream moving between cores Zhirun Yan
@ 2023-03-31  4:03         ` Zhirun Yan
  2023-04-27 14:52           ` [EXT] " Pavan Nikhilesh Bhagavatula
  2023-03-31  4:03         ` [PATCH v5 10/15] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
                           ` (6 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the key functions that allow a worker thread to
enqueue and move streams of objects to next nodes running on different
cores.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_private.h            |  27 +++++
 lib/graph/meson.build                |   2 +-
 lib/graph/rte_graph_model_dispatch.c | 145 +++++++++++++++++++++++++++
 lib/graph/rte_graph_model_dispatch.h |  37 +++++++
 lib/graph/version.map                |   2 +
 5 files changed, 212 insertions(+), 1 deletion(-)

diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index b66b18ebbc..e1a2a4bfd8 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -366,4 +366,31 @@ void graph_dump(FILE *f, struct graph *g);
  */
 void node_dump(FILE *f, struct node *n);
 
+/**
+ * @internal
+ *
+ * Create the graph schedule work queue. All cloned graphs attached to the
+ * parent graph MUST be destroyed together due to a fast-schedule design limitation.
+ *
+ * @param _graph
+ *   The graph object
+ * @param _parent_graph
+ *   The parent graph object which holds the run-queue head.
+ *
+ * @return
+ *   - 0: Success.
+ *   - <0: Graph schedule work queue related error.
+ */
+int graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph);
+
+/**
+ * @internal
+ *
+ * Destroy the graph schedule work queue.
+ *
+ * @param _graph
+ *   The graph object
+ */
+void graph_sched_wq_destroy(struct graph *_graph);
+
 #endif /* _RTE_GRAPH_PRIVATE_H_ */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index c729d984b6..e21affa280 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -20,4 +20,4 @@ sources = files(
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
-deps += ['eal', 'pcapng']
+deps += ['eal', 'pcapng', 'mempool', 'ring']
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
index 4a2f99496d..a300fefb85 100644
--- a/lib/graph/rte_graph_model_dispatch.c
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -5,6 +5,151 @@
 #include "graph_private.h"
 #include "rte_graph_model_dispatch.h"
 
+int
+graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph)
+{
+	struct rte_graph *parent_graph = _parent_graph->graph;
+	struct rte_graph *graph = _graph->graph;
+	unsigned int wq_size;
+
+	wq_size = GRAPH_SCHED_WQ_SIZE(graph->nb_nodes);
+	wq_size = rte_align32pow2(wq_size + 1);
+
+	graph->wq = rte_ring_create(graph->name, wq_size, graph->socket,
+				    RING_F_SC_DEQ);
+	if (graph->wq == NULL)
+		SET_ERR_JMP(EIO, fail, "Failed to allocate graph WQ");
+
+	graph->mp = rte_mempool_create(graph->name, wq_size,
+				       sizeof(struct graph_sched_wq_node),
+				       0, 0, NULL, NULL, NULL, NULL,
+				       graph->socket, MEMPOOL_F_SP_PUT);
+	if (graph->mp == NULL)
+		SET_ERR_JMP(EIO, fail_mp,
+			    "Failed to allocate graph WQ schedule entry");
+
+	graph->lcore_id = _graph->lcore_id;
+
+	if (parent_graph->rq == NULL) {
+		parent_graph->rq = &parent_graph->rq_head;
+		SLIST_INIT(parent_graph->rq);
+	}
+
+	graph->rq = parent_graph->rq;
+	SLIST_INSERT_HEAD(graph->rq, graph, rq_next);
+
+	return 0;
+
+fail_mp:
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+fail:
+	return -rte_errno;
+}
+
+void
+graph_sched_wq_destroy(struct graph *_graph)
+{
+	struct rte_graph *graph = _graph->graph;
+
+	if (graph == NULL)
+		return;
+
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+
+	rte_mempool_free(graph->mp);
+	graph->mp = NULL;
+}
+
+static __rte_always_inline bool
+__graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	uint16_t off = 0;
+	uint16_t size;
+
+submit_again:
+	if (rte_mempool_get(graph->mp, (void **)&wq_node) < 0)
+		goto fallback;
+
+	size = RTE_MIN(node->idx, RTE_DIM(wq_node->objs));
+	wq_node->node_off = node->off;
+	wq_node->nb_objs = size;
+	rte_memcpy(wq_node->objs, &node->objs[off], size * sizeof(void *));
+
+	while (rte_ring_mp_enqueue_bulk_elem(graph->wq, (void *)&wq_node,
+					  sizeof(wq_node), 1, NULL) == 0)
+		rte_pause();
+
+	off += size;
+	node->idx -= size;
+	if (node->idx > 0)
+		goto submit_again;
+
+	return true;
+
+fallback:
+	if (off != 0)
+		memmove(&node->objs[0], &node->objs[off],
+			node->idx * sizeof(void *));
+
+	return false;
+}
+
+bool __rte_noinline
+__rte_graph_sched_node_enqueue(struct rte_node *node,
+			       struct rte_graph_rq_head *rq)
+{
+	const unsigned int lcore_id = node->lcore_id;
+	struct rte_graph *graph;
+
+	SLIST_FOREACH(graph, rq, rq_next)
+		if (graph->lcore_id == lcore_id)
+			break;
+
+	return graph != NULL ? __graph_sched_node_enqueue(node, graph) : false;
+}
+
+void
+__rte_graph_sched_wq_process(struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	struct rte_mempool *mp = graph->mp;
+	struct rte_ring *wq = graph->wq;
+	uint16_t idx, free_space;
+	struct rte_node *node;
+	unsigned int i, n;
+	struct graph_sched_wq_node *wq_nodes[32];
+
+	n = rte_ring_sc_dequeue_burst_elem(wq, wq_nodes, sizeof(wq_nodes[0]),
+					   RTE_DIM(wq_nodes), NULL);
+	if (n == 0)
+		return;
+
+	for (i = 0; i < n; i++) {
+		wq_node = wq_nodes[i];
+		node = RTE_PTR_ADD(graph, wq_node->node_off);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		idx = node->idx;
+		free_space = node->size - idx;
+
+		if (unlikely(free_space < wq_node->nb_objs))
+			__rte_node_stream_alloc_size(graph, node, node->size + wq_node->nb_objs);
+
+		memmove(&node->objs[idx], wq_node->objs, wq_node->nb_objs * sizeof(void *));
+		memset(wq_node->objs, 0, wq_node->nb_objs * sizeof(void *));
+		node->idx = idx + wq_node->nb_objs;
+
+		__rte_node_process(graph, node);
+
+		wq_node->nb_objs = 0;
+		node->idx = 0;
+	}
+
+	rte_mempool_put_bulk(mp, (void **)wq_nodes, n);
+}
+
 int
 rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned int lcore_id)
 {
diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
index 179624e972..18fa7ce0ab 100644
--- a/lib/graph/rte_graph_model_dispatch.h
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -14,12 +14,49 @@
  *
  * This API allows to set core affinity with the node.
  */
+#include <rte_errno.h>
+#include <rte_mempool.h>
+#include <rte_memzone.h>
+#include <rte_ring.h>
+
 #include "rte_graph_worker_common.h"
 
 #ifdef __cplusplus
 extern "C" {
 #endif
 
+#define GRAPH_SCHED_WQ_SIZE_MULTIPLIER  8
+#define GRAPH_SCHED_WQ_SIZE(nb_nodes)   \
+	((typeof(nb_nodes))((nb_nodes) * GRAPH_SCHED_WQ_SIZE_MULTIPLIER))
+
+/**
+ * @internal
+ *
+ * Schedule the node to the right graph's work queue.
+ *
+ * @param node
+ *   Pointer to the scheduled node object.
+ * @param rq
+ *   Pointer to the scheduled run-queue for all graphs.
+ *
+ * @return
+ *   True on success, false otherwise.
+ */
+__rte_experimental
+bool __rte_noinline __rte_graph_sched_node_enqueue(struct rte_node *node,
+				    struct rte_graph_rq_head *rq);
+
+/**
+ * @internal
+ *
+ * Process all nodes (streams) in the graph's work queue.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ */
+__rte_experimental
+void __rte_graph_sched_wq_process(struct rte_graph *graph);
+
 /**
  * Set lcore affinity with the node.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index aaa86f66ed..d511133f39 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -48,6 +48,8 @@ EXPERIMENTAL {
 
 	rte_graph_worker_model_set;
 	rte_graph_worker_model_get;
+	__rte_graph_sched_wq_process;
+	__rte_graph_sched_node_enqueue;
 
 	rte_graph_model_dispatch_lcore_affinity_set;
 
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 10/15] graph: enable create and destroy graph scheduling workqueue
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (8 preceding siblings ...)
  2023-03-31  4:03         ` [PATCH v5 09/15] graph: introduce stream moving cross cores Zhirun Yan
@ 2023-03-31  4:03         ` Zhirun Yan
  2023-03-31  4:03         ` [PATCH v5 11/15] graph: introduce graph walk by cross-core dispatch Zhirun Yan
                           ` (5 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch hooks the creation and destruction of the scheduling workqueue
into the common graph operations.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index dd3d69dbf7..1f1ee9b622 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -443,6 +443,10 @@ rte_graph_destroy(rte_graph_t id)
 	while (graph != NULL) {
 		tmp = STAILQ_NEXT(graph, next);
 		if (graph->id == id) {
+			/* Destroy the schedule work queue if there is one */
+			if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+				graph_sched_wq_destroy(graph);
+
 			/* Call fini() of the all the nodes in the graph */
 			graph_node_fini(graph);
 			/* Destroy graph fast path memory */
@@ -537,6 +541,11 @@ graph_clone(struct graph *parent_graph, const char *name)
 	if (graph_fp_mem_create(graph))
 		goto graph_cleanup;
 
+	/* Create the graph schedule work queue */
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH &&
+	    graph_sched_wq_create(graph, parent_graph))
+		goto graph_mem_destroy;
+
 	/* Call init() of the all the nodes in the graph */
 	if (graph_node_init(graph))
 		goto graph_mem_destroy;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 11/15] graph: introduce graph walk by cross-core dispatch
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (9 preceding siblings ...)
  2023-03-31  4:03         ` [PATCH v5 10/15] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
@ 2023-03-31  4:03         ` Zhirun Yan
  2023-04-27 14:58           ` [EXT] " Pavan Nikhilesh Bhagavatula
  2023-03-31  4:03         ` [PATCH v5 12/15] graph: enable graph multicore dispatch scheduler model Zhirun Yan
                           ` (4 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the task scheduler mechanism to enable dispatching
tasks to other worker cores. Currently, there is only a local work
queue for one graph to walk. We introduce a scheduler work queue on
each worker core for dispatching tasks. The walk processes the
scheduler work queue first, then handles the local work queue.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_dispatch.h | 42 ++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
index 18fa7ce0ab..65b2cc6d87 100644
--- a/lib/graph/rte_graph_model_dispatch.h
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -73,6 +73,48 @@ __rte_experimental
 int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
 						unsigned int lcore_id);
 
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk_mcore_dispatch(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+
+	if (graph->wq != NULL)
+		__rte_graph_sched_wq_process(graph);
+
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+
+		/* Skip source nodes that are not bound to the current worker */
+		if ((int32_t)head < 0 && node->lcore_id != graph->lcore_id)
+			continue;
+
+		/* Schedule the node until all task/objs are done */
+		if (node->lcore_id != RTE_MAX_LCORE &&
+		    graph->lcore_id != node->lcore_id && graph->rq != NULL &&
+		    __rte_graph_sched_node_enqueue(node, graph->rq))
+			continue;
+
+		__rte_node_process(graph, node);
+
+		head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+
+	graph->tail = 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 12/15] graph: enable graph multicore dispatch scheduler model
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (10 preceding siblings ...)
  2023-03-31  4:03         ` [PATCH v5 11/15] graph: introduce graph walk by cross-core dispatch Zhirun Yan
@ 2023-03-31  4:03         ` Zhirun Yan
  2023-03-31  4:03         ` [PATCH v5 13/15] graph: add stats for cross-core dispatching Zhirun Yan
                           ` (3 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch enables choosing the new scheduler model.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_worker.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index 7ea18ba80a..d608c7513e 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -10,6 +10,7 @@ extern "C" {
 #endif
 
 #include "rte_graph_model_rtc.h"
+#include "rte_graph_model_dispatch.h"
 
 /**
  * Perform graph walk on the circular buffer and invoke the process function
@@ -24,7 +25,13 @@ __rte_experimental
 static inline void
 rte_graph_walk(struct rte_graph *graph)
 {
-	rte_graph_walk_rtc(graph);
+	int model = rte_graph_worker_model_get();
+
+	if (model == RTE_GRAPH_MODEL_DEFAULT ||
+	    model == RTE_GRAPH_MODEL_RTC)
+		rte_graph_walk_rtc(graph);
+	else if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		rte_graph_walk_mcore_dispatch(graph);
 }
 
 #ifdef __cplusplus
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 13/15] graph: add stats for cross-core dispatching
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (11 preceding siblings ...)
  2023-03-31  4:03         ` [PATCH v5 12/15] graph: enable graph multicore dispatch scheduler model Zhirun Yan
@ 2023-03-31  4:03         ` Zhirun Yan
  2023-03-31  4:03         ` [PATCH v5 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model Zhirun Yan
                           ` (2 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add stats for cross-core dispatching scheduler if stats collection is
enabled.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_debug.c              |  6 +++
 lib/graph/graph_stats.c              | 74 +++++++++++++++++++++++++---
 lib/graph/rte_graph.h                |  2 +
 lib/graph/rte_graph_model_dispatch.c |  3 ++
 lib/graph/rte_graph_worker_common.h  |  2 +
 5 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/lib/graph/graph_debug.c b/lib/graph/graph_debug.c
index b84412f5dd..7dcf07b080 100644
--- a/lib/graph/graph_debug.c
+++ b/lib/graph/graph_debug.c
@@ -74,6 +74,12 @@ rte_graph_obj_dump(FILE *f, struct rte_graph *g, bool all)
 		fprintf(f, "       size=%d\n", n->size);
 		fprintf(f, "       idx=%d\n", n->idx);
 		fprintf(f, "       total_objs=%" PRId64 "\n", n->total_objs);
+		if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+			fprintf(f, "       total_sched_objs=%" PRId64 "\n",
+				n->total_sched_objs);
+			fprintf(f, "       total_sched_fail=%" PRId64 "\n",
+				n->total_sched_fail);
+		}
 		fprintf(f, "       total_calls=%" PRId64 "\n", n->total_calls);
 		for (i = 0; i < n->nb_edges; i++)
 			fprintf(f, "          edge[%d] <%s>\n", i,
diff --git a/lib/graph/graph_stats.c b/lib/graph/graph_stats.c
index c0140ba922..aa22cc403c 100644
--- a/lib/graph/graph_stats.c
+++ b/lib/graph/graph_stats.c
@@ -40,13 +40,19 @@ struct rte_graph_cluster_stats {
 	struct cluster_node clusters[];
 } __rte_cache_aligned;
 
+#define boarder_model_dispatch()                                                              \
+	fprintf(f, "+-------------------------------+---------------+--------" \
+		   "-------+---------------+---------------+---------------+" \
+		   "---------------+---------------+-" \
+		   "----------+\n")
+
 #define boarder()                                                              \
 	fprintf(f, "+-------------------------------+---------------+--------" \
 		   "-------+---------------+---------------+---------------+-" \
 		   "----------+\n")
 
 static inline void
-print_banner(FILE *f)
+print_banner_default(FILE *f)
 {
 	boarder();
 	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s\n", "|Node", "|calls",
@@ -55,6 +61,27 @@ print_banner(FILE *f)
 	boarder();
 }
 
+static inline void
+print_banner_dispatch(FILE *f)
+{
+	boarder_model_dispatch();
+	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s%-16s%-16s\n",
+		"|Node", "|calls",
+		"|objs", "|sched objs", "|sched fail",
+		"|realloc_count", "|objs/call", "|objs/sec(10E6)",
+		"|cycles/call|");
+	boarder_model_dispatch();
+}
+
+static inline void
+print_banner(FILE *f)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		print_banner_dispatch(f);
+	else
+		print_banner_default(f);
+}
+
 static inline void
 print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 {
@@ -76,11 +103,21 @@ print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 	objs_per_sec = ts_per_hz ? (objs - prev_objs) / ts_per_hz : 0;
 	objs_per_sec /= 1000000;
 
-	fprintf(f,
-		"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
-		"|%-15.3f|%-15.6f|%-11.4f|\n",
-		stat->name, calls, objs, stat->realloc_count, objs_per_call,
-		objs_per_sec, cycles_per_call);
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->sched_objs,
+			stat->sched_fail, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	} else {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	}
 }
 
 static int
@@ -88,13 +125,20 @@ graph_cluster_stats_cb(bool is_first, bool is_last, void *cookie,
 		       const struct rte_graph_cluster_node_stats *stat)
 {
 	FILE *f = cookie;
+	int model;
+
+	model = rte_graph_worker_model_get();
 
 	if (unlikely(is_first))
 		print_banner(f);
 	if (stat->objs)
 		print_node(f, stat);
-	if (unlikely(is_last))
-		boarder();
+	if (unlikely(is_last)) {
+		if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+			boarder_model_dispatch();
+		else
+			boarder();
+	}
 
 	return 0;
 };
@@ -332,13 +376,21 @@ static inline void
 cluster_node_arregate_stats(struct cluster_node *cluster)
 {
 	uint64_t calls = 0, cycles = 0, objs = 0, realloc_count = 0;
+	uint64_t sched_objs = 0, sched_fail = 0;
 	struct rte_graph_cluster_node_stats *stat = &cluster->stat;
 	struct rte_node *node;
 	rte_node_t count;
+	int model;
 
+	model = rte_graph_worker_model_get();
 	for (count = 0; count < cluster->nb_nodes; count++) {
 		node = cluster->nodes[count];
 
+		if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+			sched_objs += node->total_sched_objs;
+			sched_fail += node->total_sched_fail;
+		}
+
 		calls += node->total_calls;
 		objs += node->total_objs;
 		cycles += node->total_cycles;
@@ -348,6 +400,12 @@ cluster_node_arregate_stats(struct cluster_node *cluster)
 	stat->calls = calls;
 	stat->objs = objs;
 	stat->cycles = cycles;
+
+	if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+		stat->sched_objs = sched_objs;
+		stat->sched_fail = sched_fail;
+	}
+
 	stat->ts = rte_get_timer_cycles();
 	stat->realloc_count = realloc_count;
 }
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index 2f86c17de7..7d77a790ac 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -208,6 +208,8 @@ struct rte_graph_cluster_node_stats {
 	uint64_t prev_calls;	/**< Previous number of calls. */
 	uint64_t prev_objs;	/**< Previous number of processed objs. */
 	uint64_t prev_cycles;	/**< Previous number of cycles. */
+	uint64_t sched_objs;	/**< Number of objects scheduled. */
+	uint64_t sched_fail;	/**< Number of objects failed to schedule. */
 
 	uint64_t realloc_count; /**< Realloc count. */
 
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
index a300fefb85..9db60eb463 100644
--- a/lib/graph/rte_graph_model_dispatch.c
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -83,6 +83,7 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		rte_pause();
 
 	off += size;
+	node->total_sched_objs += size;
 	node->idx -= size;
 	if (node->idx > 0)
 		goto submit_again;
@@ -94,6 +95,8 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		memmove(&node->objs[0], &node->objs[off],
 			node->idx * sizeof(void *));
 
+	node->total_sched_fail += node->idx;
+
 	return false;
 }
 
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index dc0a0b5554..d94983589c 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -95,6 +95,8 @@ struct rte_node {
 		/* Fast schedule area for mcore dispatch model */
 		unsigned int lcore_id;  /**< Node running lcore. */
 		};
+	uint64_t total_sched_objs; /**< Number of objects scheduled. */
+	uint64_t total_sched_fail; /**< Number of schedule failures. */
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (12 preceding siblings ...)
  2023-03-31  4:03         ` [PATCH v5 13/15] graph: add stats for cross-core dispatching Zhirun Yan
@ 2023-03-31  4:03         ` Zhirun Yan
  2023-03-31  4:03         ` [PATCH v5 15/15] doc: update multicore dispatch model in graph guides Zhirun Yan
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add a new parameter "model" to choose the dispatch or rtc worker model.
In the dispatch model, nodes are affinitized to the worker cores successively.

Note:
the current implementation supports only one Rx node in the dispatch model.

./dpdk-l3fwd-graph  -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="dispatch"

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 examples/l3fwd-graph/main.c | 236 +++++++++++++++++++++++++++++-------
 1 file changed, 194 insertions(+), 42 deletions(-)

diff --git a/examples/l3fwd-graph/main.c b/examples/l3fwd-graph/main.c
index 5feeab4f0f..7078ed4c77 100644
--- a/examples/l3fwd-graph/main.c
+++ b/examples/l3fwd-graph/main.c
@@ -55,6 +55,9 @@
 
 #define NB_SOCKETS 8
 
+/* Graph module */
+#define WORKER_MODEL_RTC "rtc"
+#define WORKER_MODEL_MCORE_DISPATCH "dispatch"
 /* Static global variables used within this file. */
 static uint16_t nb_rxd = RX_DESC_DEFAULT;
 static uint16_t nb_txd = TX_DESC_DEFAULT;
@@ -88,6 +91,10 @@ struct lcore_rx_queue {
 	char node_name[RTE_NODE_NAMESIZE];
 };
 
+struct model_conf {
+	enum rte_graph_worker_model model;
+};
+
 /* Lcore conf */
 struct lcore_conf {
 	uint16_t n_rx_queue;
@@ -153,6 +160,19 @@ static struct ipv4_l3fwd_lpm_route ipv4_l3fwd_lpm_route_array[] = {
 	{RTE_IPV4(198, 18, 6, 0), 24, 6}, {RTE_IPV4(198, 18, 7, 0), 24, 7},
 };
 
+static int
+check_worker_model_params(void)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH &&
+	    nb_lcore_params > 1) {
+		printf("Exceeded max number of lcore params for dispatch model: %hu\n",
+		       nb_lcore_params);
+		return -1;
+	}
+
+	return 0;
+}
+
 static int
 check_lcore_params(void)
 {
@@ -276,6 +296,7 @@ print_usage(const char *prgname)
 		"  --eth-dest=X,MM:MM:MM:MM:MM:MM: Ethernet destination for "
 		"port X\n"
 		"  --max-pkt-len PKTLEN: maximum packet length in decimal (64-9600)\n"
+		"  --model NAME: walking model name, dispatch or rtc (default)\n"
 		"  --no-numa: Disable numa awareness\n"
 		"  --per-port-pool: Use separate buffer pool per port\n"
 		"  --pcap-enable: Enables pcap capture\n"
@@ -318,6 +339,20 @@ parse_max_pkt_len(const char *pktlen)
 	return len;
 }
 
+static int
+parse_worker_model(const char *model)
+{
+	if (strcmp(model, WORKER_MODEL_MCORE_DISPATCH) == 0) {
+		rte_graph_worker_model_set(RTE_GRAPH_MODEL_MCORE_DISPATCH);
+		return RTE_GRAPH_MODEL_MCORE_DISPATCH;
+	} else if (strcmp(model, WORKER_MODEL_RTC) == 0)
+		return RTE_GRAPH_MODEL_RTC;
+
+	rte_exit(EXIT_FAILURE, "Invalid worker model: %s", model);
+
+	return RTE_GRAPH_MODEL_LIST_END;
+}
+
 static int
 parse_portmask(const char *portmask)
 {
@@ -434,6 +469,8 @@ static const char short_options[] = "p:" /* portmask */
 #define CMD_LINE_OPT_PCAP_ENABLE   "pcap-enable"
 #define CMD_LINE_OPT_NUM_PKT_CAP   "pcap-num-cap"
 #define CMD_LINE_OPT_PCAP_FILENAME "pcap-file-name"
+#define CMD_LINE_OPT_WORKER_MODEL  "model"
+
 enum {
 	/* Long options mapped to a short option */
 
@@ -449,6 +486,7 @@ enum {
 	CMD_LINE_OPT_PARSE_PCAP_ENABLE,
 	CMD_LINE_OPT_PARSE_NUM_PKT_CAP,
 	CMD_LINE_OPT_PCAP_FILENAME_CAP,
+	CMD_LINE_OPT_WORKER_MODEL_TYPE,
 };
 
 static const struct option lgopts[] = {
@@ -460,6 +498,7 @@ static const struct option lgopts[] = {
 	{CMD_LINE_OPT_PCAP_ENABLE, 0, 0, CMD_LINE_OPT_PARSE_PCAP_ENABLE},
 	{CMD_LINE_OPT_NUM_PKT_CAP, 1, 0, CMD_LINE_OPT_PARSE_NUM_PKT_CAP},
 	{CMD_LINE_OPT_PCAP_FILENAME, 1, 0, CMD_LINE_OPT_PCAP_FILENAME_CAP},
+	{CMD_LINE_OPT_WORKER_MODEL, 1, 0, CMD_LINE_OPT_WORKER_MODEL_TYPE},
 	{NULL, 0, 0, 0},
 };
 
@@ -551,6 +590,11 @@ parse_args(int argc, char **argv)
 			printf("Pcap file name: %s\n", pcap_filename);
 			break;
 
+		case CMD_LINE_OPT_WORKER_MODEL_TYPE:
+			printf("Use new worker model: %s\n", optarg);
+			parse_worker_model(optarg);
+			break;
+
 		default:
 			print_usage(prgname);
 			return -1;
@@ -726,15 +770,15 @@ print_stats(void)
 static int
 graph_main_loop(void *conf)
 {
+	struct model_conf *mconf = conf;
 	struct lcore_conf *qconf;
 	struct rte_graph *graph;
 	uint32_t lcore_id;
 
-	RTE_SET_USED(conf);
-
 	lcore_id = rte_lcore_id();
 	qconf = &lcore_conf[lcore_id];
 	graph = qconf->graph;
+	rte_graph_worker_model_set(mconf->model);
 
 	if (!graph) {
 		RTE_LOG(INFO, L3FWD_GRAPH, "Lcore %u has nothing to do\n",
@@ -788,6 +832,139 @@ config_port_max_pkt_len(struct rte_eth_conf *conf,
 	return 0;
 }
 
+static void
+graph_config_mcore_dispatch(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	int worker_count = rte_lcore_count() - 1;
+	int main_lcore_id = rte_get_main_lcore();
+	int worker_lcore = main_lcore_id;
+	rte_graph_t main_graph_id = 0;
+	struct rte_node *node_tmp;
+	struct lcore_conf *qconf;
+	struct rte_graph *graph;
+	rte_graph_t graph_id;
+	rte_graph_off_t off;
+	int n_rx_node = 0;
+	rte_node_t count;
+	int i, j;
+	int ret;
+
+	for (j = 0; j < nb_lcore_params; j++) {
+		qconf = &lcore_conf[lcore_params[j].lcore_id];
+		/* Add rx node patterns of all lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			char *node_name = qconf->rx_queue_list[i].node_name;
+
+			graph_conf.node_patterns[nb_patterns + n_rx_node] = node_name;
+			n_rx_node++;
+			ret = rte_graph_model_dispatch_lcore_affinity_set(node_name,
+									lcore_params[j].lcore_id);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n", node_name,
+				       lcore_params[j].lcore_id);
+		}
+	}
+
+	graph_conf.nb_node_patterns = nb_patterns + n_rx_node;
+	graph_conf.socket_id = rte_lcore_to_socket_id(main_lcore_id);
+
+	qconf = &lcore_conf[main_lcore_id];
+	snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+		 main_lcore_id);
+
+	/* create main graph */
+	main_graph_id = rte_graph_create(qconf->name, &graph_conf);
+	if (main_graph_id == RTE_GRAPH_ID_INVALID)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_create(): main_graph_id invalid for lcore %u\n",
+			 main_lcore_id);
+
+	qconf->graph_id = main_graph_id;
+	qconf->graph = rte_graph_lookup(qconf->name);
+	if (!qconf->graph)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_lookup(): graph %s not found\n",
+			 qconf->name);
+
+	graph = qconf->graph;
+	rte_graph_foreach_node(count, off, graph, node_tmp) {
+		worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+
+		/* Need to set the node lcore affinity before cloning the graph for each lcore */
+		if (node_tmp->lcore_id == RTE_MAX_LCORE) {
+			ret = rte_graph_model_dispatch_lcore_affinity_set(node_tmp->name,
+									worker_lcore);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n",
+				       node_tmp->name, worker_lcore);
+		}
+	}
+
+	worker_lcore = main_lcore_id;
+	for (i = 0; i < worker_count; i++) {
+		worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+
+		qconf = &lcore_conf[worker_lcore];
+		snprintf(qconf->name, sizeof(qconf->name), "cloned-%u", worker_lcore);
+		graph_id = rte_graph_clone(main_graph_id, qconf->name);
+		ret = rte_graph_model_dispatch_core_bind(graph_id, worker_lcore);
+		if (ret == 0)
+			printf("bind graph %d to lcore %u\n", graph_id, worker_lcore);
+
+		/* full cloned graph name */
+		snprintf(qconf->name, sizeof(qconf->name), "%s",
+			 rte_graph_id_to_name(graph_id));
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "Failed to lookup graph %s\n",
+				 qconf->name);
+		continue;
+	}
+}
+
+static void
+graph_config_rtc(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	struct lcore_conf *qconf;
+	rte_graph_t graph_id;
+	uint32_t lcore_id;
+	rte_edge_t i;
+
+	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+		if (rte_lcore_is_enabled(lcore_id) == 0)
+			continue;
+
+		qconf = &lcore_conf[lcore_id];
+		/* Skip graph creation if no source exists */
+		if (!qconf->n_rx_queue)
+			continue;
+		/* Add rx node patterns of this lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			graph_conf.node_patterns[nb_patterns + i] =
+				qconf->rx_queue_list[i].node_name;
+		}
+		graph_conf.nb_node_patterns = nb_patterns + i;
+		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
+		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+			 lcore_id);
+		graph_id = rte_graph_create(qconf->name, &graph_conf);
+		if (graph_id == RTE_GRAPH_ID_INVALID)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_create(): graph_id invalid for lcore %u\n",
+				 lcore_id);
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_lookup(): graph %s not found\n",
+				 qconf->name);
+	}
+}
+
 int
 main(int argc, char **argv)
 {
@@ -808,10 +985,12 @@ main(int argc, char **argv)
 	uint16_t queueid, portid, i;
 	const char **node_patterns;
 	struct lcore_conf *qconf;
+	struct model_conf mconf;
 	uint16_t nb_graphs = 0;
 	uint16_t nb_patterns;
 	uint8_t rewrite_len;
 	uint32_t lcore_id;
+	uint16_t model;
 	int ret;
 
 	/* Init EAL */
@@ -840,6 +1019,9 @@ main(int argc, char **argv)
 	if (check_lcore_params() < 0)
 		rte_exit(EXIT_FAILURE, "check_lcore_params() failed\n");
 
+	if (check_worker_model_params() < 0)
+		rte_exit(EXIT_FAILURE, "check_worker_model_params() failed\n");
+
 	ret = init_lcore_rx_queues();
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "init_lcore_rx_queues() failed\n");
@@ -1079,51 +1261,19 @@ main(int argc, char **argv)
 
 	memset(&graph_conf, 0, sizeof(graph_conf));
 	graph_conf.node_patterns = node_patterns;
+	graph_conf.nb_node_patterns = nb_patterns;
 
 	/* Pcap config */
 	graph_conf.pcap_enable = pcap_trace_enable;
 	graph_conf.num_pkt_to_capture = packet_to_capture;
 	graph_conf.pcap_filename = pcap_filename;
 
-	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
-		rte_graph_t graph_id;
-		rte_edge_t i;
-
-		if (rte_lcore_is_enabled(lcore_id) == 0)
-			continue;
-
-		qconf = &lcore_conf[lcore_id];
-
-		/* Skip graph creation if no source exists */
-		if (!qconf->n_rx_queue)
-			continue;
-
-		/* Add rx node patterns of this lcore */
-		for (i = 0; i < qconf->n_rx_queue; i++) {
-			graph_conf.node_patterns[nb_patterns + i] =
-				qconf->rx_queue_list[i].node_name;
-		}
-
-		graph_conf.nb_node_patterns = nb_patterns + i;
-		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
-
-		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
-			 lcore_id);
-
-		graph_id = rte_graph_create(qconf->name, &graph_conf);
-		if (graph_id == RTE_GRAPH_ID_INVALID)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_create(): graph_id invalid"
-				 " for lcore %u\n", lcore_id);
-
-		qconf->graph_id = graph_id;
-		qconf->graph = rte_graph_lookup(qconf->name);
-		/* >8 End of graph initialization. */
-		if (!qconf->graph)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_lookup(): graph %s not found\n",
-				 qconf->name);
-	}
+	model = rte_graph_worker_model_get();
+	if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		graph_config_mcore_dispatch(graph_conf);
+	else
+		graph_config_rtc(graph_conf);
+	/* >8 End of graph initialization. */
 
 	memset(&rewrite_data, 0, sizeof(rewrite_data));
 	rewrite_len = sizeof(rewrite_data);
@@ -1174,8 +1324,10 @@ main(int argc, char **argv)
 	}
 	/* >8 End of adding route to ip4 graph infa. */
 
+	mconf.model = model;
 	/* Launch per-lcore init on every worker lcore */
-	rte_eal_mp_remote_launch(graph_main_loop, NULL, SKIP_MAIN);
+	rte_eal_mp_remote_launch(graph_main_loop, &mconf,
+				 SKIP_MAIN);
 
 	/* Accumulate and print stats on main until exit */
 	if (rte_graph_has_stats_feature())
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 15/15] doc: update multicore dispatch model in graph guides
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (13 preceding siblings ...)
  2023-03-31  4:03         ` [PATCH v5 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model Zhirun Yan
@ 2023-03-31  4:03         ` Zhirun Yan
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Update graph documentation to introduce the new multicore dispatch model.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 doc/guides/prog_guide/graph_lib.rst | 59 +++++++++++++++++++++++++++--
 1 file changed, 55 insertions(+), 4 deletions(-)

diff --git a/doc/guides/prog_guide/graph_lib.rst b/doc/guides/prog_guide/graph_lib.rst
index 1cfdc86433..72e26f3a5a 100644
--- a/doc/guides/prog_guide/graph_lib.rst
+++ b/doc/guides/prog_guide/graph_lib.rst
@@ -189,14 +189,65 @@ In the above example, A graph object will be created with ethdev Rx
 node of port 0 and queue 0, all ipv4* nodes in the system,
 and ethdev tx node of all ports.
 
-Multicore graph processing
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-In the current graph library implementation, specifically,
-``rte_graph_walk()`` and ``rte_node_enqueue*`` fast path API functions
+Graph model choosing
+~~~~~~~~~~~~~~~~~~~~
+Currently, there are two different walking models. Use
+``rte_graph_worker_model_set()`` to set the walking model.
+
+RTC (Run-To-Completion)
+^^^^^^^^^^^^^^^^^^^^^^^
+This is the default graph walking model. Specifically,
+``rte_graph_walk_rtc()`` and ``rte_node_enqueue*`` fast path API functions
 are designed to work on single-core to have better performance.
 The fast path API works on graph object, So the multi-core graph
 processing strategy would be to create graph object PER WORKER.
 
+Example:
+
+Graph: node-0 -> node-1 -> node-2 @Core0.
+
+.. code-block:: diff
+
+    + - - - - - - - - - - - - - - - - - - - - - +
+    '                  Core #0                  '
+    '                                           '
+    ' +--------+     +---------+     +--------+ '
+    ' | Node-0 | --> | Node-1  | --> | Node-2 | '
+    ' +--------+     +---------+     +--------+ '
+    '                                           '
+    + - - - - - - - - - - - - - - - - - - - - - +
+
+Dispatch model
+^^^^^^^^^^^^^^
+The dispatch model enables a cross-core dispatching mechanism which employs
+a scheduling work-queue to dispatch streams to the worker cores that are
+associated with the destination nodes.
+
+Use ``rte_graph_model_dispatch_lcore_affinity_set()`` to set lcore affinity
+with the node.
+Each worker core will have a replica of the graph. Use ``rte_graph_clone()``
+to clone the graph for each worker and ``rte_graph_model_dispatch_core_bind()``
+to bind the graph to the worker core.
+
+Example:
+
+Graph topo: node-0 -> Core1; node-1 -> node-2; node-2 -> node-3.
+Config graph: node-0 @Core0; node-1/3 @Core1; node-2 @Core2.
+
+.. code-block:: diff
+
+    + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
+    '  Core #0   '     '          Core #1         '     '  Core #2   '
+    '            '     '                          '     '            '
+    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
+    ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
+    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
+    '            '     '     |                    '     '      ^     '
+    + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
+                             |                                 |
+                             + - - - - - - - - - - - - - - - - +
+
+
 In fast path
 ~~~~~~~~~~~~
 Typical fast-path code looks like below, where the application
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [EXT] [PATCH v5 02/15] graph: split graph worker into common and default model
  2023-03-31  4:02         ` [PATCH v5 02/15] graph: split graph worker into common and default model Zhirun Yan
@ 2023-04-27 14:11           ` Pavan Nikhilesh Bhagavatula
  2023-05-05  2:09             ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2023-04-27 14:11 UTC (permalink / raw)
  To: Zhirun Yan, dev, Jerin Jacob Kollanukkaran,
	Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram, stephen
  Cc: cunming.liang, haiyue.wang



> -----Original Message-----
> From: Zhirun Yan <zhirun.yan@intel.com>
> Sent: Friday, March 31, 2023 9:33 AM
> To: dev@dpdk.org; Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Kiran
> Kumar Kokkilagadda <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; stephen@networkplumber.org
> Cc: cunming.liang@intel.com; haiyue.wang@intel.com; Zhirun Yan
> <zhirun.yan@intel.com>
> Subject: [EXT] [PATCH v5 02/15] graph: split graph worker into common and
> default model
> 
> External Email
> 
> ----------------------------------------------------------------------
> To support multiple graph worker models, split the graph worker into common
> and default parts. Name the current walk function rte_graph_model_rtc,
> since the default model is RTC (Run-To-Completion).
> 
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  lib/graph/graph_pcap.c              |  2 +-
>  lib/graph/graph_private.h           |  2 +-
>  lib/graph/meson.build               |  2 +-
>  lib/graph/rte_graph_model_rtc.h     | 61 +++++++++++++++++++++++++++++
>  lib/graph/rte_graph_worker.h        | 34 ++++++++++++++++
>  lib/graph/rte_graph_worker_common.h | 57 ---------------------------
>  6 files changed, 98 insertions(+), 60 deletions(-)
>  create mode 100644 lib/graph/rte_graph_model_rtc.h
>  create mode 100644 lib/graph/rte_graph_worker.h
> 
> diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
> index 8a220370fa..6c43330029 100644
> --- a/lib/graph/graph_pcap.c
> +++ b/lib/graph/graph_pcap.c
> @@ -10,7 +10,7 @@
>  #include <rte_mbuf.h>
>  #include <rte_pcapng.h>
> 
> -#include "rte_graph_worker_common.h"
> +#include "rte_graph_worker.h"
> 
>  #include "graph_pcap_private.h"
> 
> diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
> index f08dbc7e9d..7d1b30b8ac 100644
> --- a/lib/graph/graph_private.h
> +++ b/lib/graph/graph_private.h
> @@ -12,7 +12,7 @@
>  #include <rte_eal.h>
> 
>  #include "rte_graph.h"
> -#include "rte_graph_worker_common.h"
> +#include "rte_graph_worker.h"
> 
>  extern int rte_graph_logtype;
> 
> diff --git a/lib/graph/meson.build b/lib/graph/meson.build
> index 4e2b612ad3..3526d1b5d4 100644
> --- a/lib/graph/meson.build
> +++ b/lib/graph/meson.build
> @@ -16,6 +16,6 @@ sources = files(
>          'graph_populate.c',
>          'graph_pcap.c',
>  )
> -headers = files('rte_graph.h', 'rte_graph_worker_common.h')
> +headers = files('rte_graph.h', 'rte_graph_worker.h')
> 
>  deps += ['eal', 'pcapng']
> diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
> new file mode 100644
> index 0000000000..665560f831
> --- /dev/null
> +++ b/lib/graph/rte_graph_model_rtc.h
> @@ -0,0 +1,61 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2023 Intel Corporation
> + */
> +

Please retain Marvell copyright too.

> +#include "rte_graph_worker_common.h"
> +
> +/**
> + * Perform graph walk on the circular buffer and invoke the process function
> + * of the nodes and collect the stats.
> + *
> + * @param graph
> + *   Graph pointer returned from rte_graph_lookup function.
> + *
> + * @see rte_graph_lookup()
> + */
> +static inline void
> +rte_graph_walk_rtc(struct rte_graph *graph)
> +{
> +	const rte_graph_off_t *cir_start = graph->cir_start;
> +	const rte_node_t mask = graph->cir_mask;
> +	uint32_t head = graph->head;
> +	struct rte_node *node;
> +	uint64_t start;
> +	uint16_t rc;
> +	void **objs;
> +
> +	/*
> +	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
> +	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
> +	 * in a circular buffer fashion.
> +	 *
> +	 *	+-----+ <= cir_start - head [number of source nodes]
> +	 *	|     |
> +	 *	| ... | <= source nodes
> +	 *	|     |
> +	 *	+-----+ <= cir_start [head = 0] [tail = 0]
> +	 *	|     |
> +	 *	| ... | <= pending streams
> +	 *	|     |
> +	 *	+-----+ <= cir_start + mask
> +	 */
> +	while (likely(head != graph->tail)) {
> +		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
> +		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> +		objs = node->objs;
> +		rte_prefetch0(objs);
> +
> +		if (rte_graph_has_stats_feature()) {
> +			start = rte_rdtsc();
> +			rc = node->process(graph, node, objs, node->idx);
> +			node->total_cycles += rte_rdtsc() - start;
> +			node->total_calls++;
> +			node->total_objs += rc;
> +		} else {
> +			node->process(graph, node, objs, node->idx);
> +		}
> +		node->idx = 0;
> +		head = likely((int32_t)head > 0) ? head & mask : head;
> +	}
> +	graph->tail = 0;
> +}
> diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
> new file mode 100644
> index 0000000000..7ea18ba80a
> --- /dev/null
> +++ b/lib/graph/rte_graph_worker.h
> @@ -0,0 +1,34 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2023 Intel Corporation
> + */
> +
> +#ifndef _RTE_GRAPH_WORKER_H_
> +#define _RTE_GRAPH_WORKER_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include "rte_graph_model_rtc.h"
> +
> +/**
> + * Perform graph walk on the circular buffer and invoke the process function
> + * of the nodes and collect the stats.
> + *
> + * @param graph
> + *   Graph pointer returned from rte_graph_lookup function.
> + *
> + * @see rte_graph_lookup()
> + */
> +__rte_experimental
> +static inline void
> +rte_graph_walk(struct rte_graph *graph)
> +{
> +	rte_graph_walk_rtc(graph);
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_GRAPH_WORKER_H_ */
> diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
> index 0bad2938f3..b58f8f6947 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -128,63 +128,6 @@ __rte_experimental
>  void __rte_node_stream_alloc_size(struct rte_graph *graph,
>  				  struct rte_node *node, uint16_t req_size);
> 
> -/**
> - * Perform graph walk on the circular buffer and invoke the process function
> - * of the nodes and collect the stats.
> - *
> - * @param graph
> - *   Graph pointer returned from rte_graph_lookup function.
> - *
> - * @see rte_graph_lookup()
> - */
> -__rte_experimental
> -static inline void
> -rte_graph_walk(struct rte_graph *graph)
> -{
> -	const rte_graph_off_t *cir_start = graph->cir_start;
> -	const rte_node_t mask = graph->cir_mask;
> -	uint32_t head = graph->head;
> -	struct rte_node *node;
> -	uint64_t start;
> -	uint16_t rc;
> -	void **objs;
> -
> -	/*
> -	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
> -	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
> -	 * in a circular buffer fashion.
> -	 *
> -	 *	+-----+ <= cir_start - head [number of source nodes]
> -	 *	|     |
> -	 *	| ... | <= source nodes
> -	 *	|     |
> -	 *	+-----+ <= cir_start [head = 0] [tail = 0]
> -	 *	|     |
> -	 *	| ... | <= pending streams
> -	 *	|     |
> -	 *	+-----+ <= cir_start + mask
> -	 */
> -	while (likely(head != graph->tail)) {
> -		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
> -		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> -		objs = node->objs;
> -		rte_prefetch0(objs);
> -
> -		if (rte_graph_has_stats_feature()) {
> -			start = rte_rdtsc();
> -			rc = node->process(graph, node, objs, node->idx);
> -			node->total_cycles += rte_rdtsc() - start;
> -			node->total_calls++;
> -			node->total_objs += rc;
> -		} else {
> -			node->process(graph, node, objs, node->idx);
> -		}
> -		node->idx = 0;
> -		head = likely((int32_t)head > 0) ? head & mask : head;
> -	}
> -	graph->tail = 0;
> -}
> -
>  /* Fast path helper functions */
> 
>  /**
> --
> 2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [EXT] [PATCH v5 09/15] graph: introduce stream moving cross cores
  2023-03-31  4:03         ` [PATCH v5 09/15] graph: introduce stream moving cross cores Zhirun Yan
@ 2023-04-27 14:52           ` Pavan Nikhilesh Bhagavatula
  2023-05-05  2:10             ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2023-04-27 14:52 UTC (permalink / raw)
  To: Zhirun Yan, dev, Jerin Jacob Kollanukkaran,
	Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram, stephen
  Cc: cunming.liang, haiyue.wang

> This patch introduces key functions to allow a worker thread to
> enqueue and move streams of objects to the next nodes across
> different cores.
> 
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  lib/graph/graph_private.h            |  27 +++++
>  lib/graph/meson.build                |   2 +-
>  lib/graph/rte_graph_model_dispatch.c | 145 +++++++++++++++++++++++++++
>  lib/graph/rte_graph_model_dispatch.h |  37 +++++++
>  lib/graph/version.map                |   2 +
>  5 files changed, 212 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
> index b66b18ebbc..e1a2a4bfd8 100644
> --- a/lib/graph/graph_private.h
> +++ b/lib/graph/graph_private.h
> @@ -366,4 +366,31 @@ void graph_dump(FILE *f, struct graph *g);
>   */
>  void node_dump(FILE *f, struct node *n);
> 
> +/**
> + * @internal
> + *
> + * Create the graph schedule work queue. All cloned graphs attached to the
> + * parent graph MUST be destroyed together due to a fast schedule design limitation.
> + *
> + * @param _graph
> + *   The graph object
> + * @param _parent_graph
> + *   The parent graph object which holds the run-queue head.
> + *
> + * @return
> + *   - 0: Success.
> + *   - <0: Graph schedule work queue related error.
> + */
> +int graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph);
> +
> +/**
> + * @internal
> + *
> + * Destroy the graph schedule work queue.
> + *
> + * @param _graph
> + *   The graph object
> + */
> +void graph_sched_wq_destroy(struct graph *_graph);
> +
>  #endif /* _RTE_GRAPH_PRIVATE_H_ */
> diff --git a/lib/graph/meson.build b/lib/graph/meson.build
> index c729d984b6..e21affa280 100644
> --- a/lib/graph/meson.build
> +++ b/lib/graph/meson.build
> @@ -20,4 +20,4 @@ sources = files(
>  )
>  headers = files('rte_graph.h', 'rte_graph_worker.h')
> 
> -deps += ['eal', 'pcapng']
> +deps += ['eal', 'pcapng', 'mempool', 'ring']
> diff --git a/lib/graph/rte_graph_model_dispatch.c
> b/lib/graph/rte_graph_model_dispatch.c
> index 4a2f99496d..a300fefb85 100644
> --- a/lib/graph/rte_graph_model_dispatch.c
> +++ b/lib/graph/rte_graph_model_dispatch.c
> @@ -5,6 +5,151 @@
>  #include "graph_private.h"
>  #include "rte_graph_model_dispatch.h"
> 
> +int
> +graph_sched_wq_create(struct graph *_graph, struct graph
> *_parent_graph)
> +{
> +	struct rte_graph *parent_graph = _parent_graph->graph;
> +	struct rte_graph *graph = _graph->graph;
> +	unsigned int wq_size;
> +
> +	wq_size = GRAPH_SCHED_WQ_SIZE(graph->nb_nodes);
> +	wq_size = rte_align32pow2(wq_size + 1);

Hi Zhirun,

Should we introduce a new function `rte_graph_configure` that lets the
application control the ring size and mempool size of the work queue?
We could fall back to default values if nothing is configured.

rte_graph_configure should take a 
struct rte_graph_config {
	struct {
		u64 rsvd[8];
	} rtc;
	struct {
		u16 wq_size;
		...
	} dispatch;
};

This will help future graph models to have their own configuration.

We can have a rte_graph_config_init() function to initialize the rte_graph_config structure.
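To make the suggestion concrete, here is a minimal standalone sketch of what such a configuration structure and its init helper could look like. All names (rte_graph_config, rte_graph_config_init) follow the proposal above and are hypothetical, not existing DPDK API; the default sizes are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical default; a real implementation would derive it from
 * the graph's node count, as graph_sched_wq_create() does today. */
#define GRAPH_DISPATCH_WQ_SIZE_DEFAULT 1024

struct rte_graph_config {
	struct {
		uint64_t rsvd[8];  /* reserved for the RTC model */
	} rtc;
	struct {
		uint16_t wq_size;  /* dispatch work-queue (ring) size */
		uint32_t mp_size;  /* mempool size for WQ entries */
	} dispatch;
};

/* Fill in defaults so applications only override what they need. */
static void
rte_graph_config_init(struct rte_graph_config *cfg)
{
	memset(cfg, 0, sizeof(*cfg));
	cfg->dispatch.wq_size = GRAPH_DISPATCH_WQ_SIZE_DEFAULT;
	cfg->dispatch.mp_size = GRAPH_DISPATCH_WQ_SIZE_DEFAULT;
}
```

An application would call rte_graph_config_init(), tweak the dispatch fields, and pass the struct to the (proposed) rte_graph_configure(); per-model sub-structs keep the ABI extensible for future models.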


> +
> +	graph->wq = rte_ring_create(graph->name, wq_size, graph->socket,
> +				    RING_F_SC_DEQ);
> +	if (graph->wq == NULL)
> +		SET_ERR_JMP(EIO, fail, "Failed to allocate graph WQ");
> +
> +	graph->mp = rte_mempool_create(graph->name, wq_size,
> +				       sizeof(struct graph_sched_wq_node),
> +				       0, 0, NULL, NULL, NULL, NULL,
> +				       graph->socket, MEMPOOL_F_SP_PUT);
> +	if (graph->mp == NULL)
> +		SET_ERR_JMP(EIO, fail_mp,
> +			    "Failed to allocate graph WQ schedule entry");
> +
> +	graph->lcore_id = _graph->lcore_id;
> +
> +	if (parent_graph->rq == NULL) {
> +		parent_graph->rq = &parent_graph->rq_head;
> +		SLIST_INIT(parent_graph->rq);
> +	}
> +
> +	graph->rq = parent_graph->rq;
> +	SLIST_INSERT_HEAD(graph->rq, graph, rq_next);
> +
> +	return 0;
> +
> +fail_mp:
> +	rte_ring_free(graph->wq);
> +	graph->wq = NULL;
> +fail:
> +	return -rte_errno;
> +}
> +
> +void
> +graph_sched_wq_destroy(struct graph *_graph)
> +{
> +	struct rte_graph *graph = _graph->graph;
> +
> +	if (graph == NULL)
> +		return;
> +
> +	rte_ring_free(graph->wq);
> +	graph->wq = NULL;
> +
> +	rte_mempool_free(graph->mp);
> +	graph->mp = NULL;
> +}
> +
> +static __rte_always_inline bool
> +__graph_sched_node_enqueue(struct rte_node *node, struct rte_graph
> *graph)
> +{
> +	struct graph_sched_wq_node *wq_node;
> +	uint16_t off = 0;
> +	uint16_t size;
> +
> +submit_again:
> +	if (rte_mempool_get(graph->mp, (void **)&wq_node) < 0)
> +		goto fallback;
> +
> +	size = RTE_MIN(node->idx, RTE_DIM(wq_node->objs));
> +	wq_node->node_off = node->off;
> +	wq_node->nb_objs = size;
> +	rte_memcpy(wq_node->objs, &node->objs[off], size * sizeof(void *));
> +
> +	while (rte_ring_mp_enqueue_bulk_elem(graph->wq, (void *)&wq_node,
> +					  sizeof(wq_node), 1, NULL) == 0)
> +		rte_pause();
> +
> +	off += size;
> +	node->idx -= size;
> +	if (node->idx > 0)
> +		goto submit_again;
> +
> +	return true;
> +
> +fallback:
> +	if (off != 0)
> +		memmove(&node->objs[0], &node->objs[off],
> +			node->idx * sizeof(void *));
> +
> +	return false;
> +}
> +
> +bool __rte_noinline
> +__rte_graph_sched_node_enqueue(struct rte_node *node,
> +			       struct rte_graph_rq_head *rq)
> +{
> +	const unsigned int lcore_id = node->lcore_id;
> +	struct rte_graph *graph;
> +
> +	SLIST_FOREACH(graph, rq, rq_next)
> +		if (graph->lcore_id == lcore_id)
> +			break;
> +
> +	return graph != NULL ? __graph_sched_node_enqueue(node,
> graph) : false;
> +}
> +
> +void
> +__rte_graph_sched_wq_process(struct rte_graph *graph)
> +{
> +	struct graph_sched_wq_node *wq_node;
> +	struct rte_mempool *mp = graph->mp;
> +	struct rte_ring *wq = graph->wq;
> +	uint16_t idx, free_space;
> +	struct rte_node *node;
> +	unsigned int i, n;
> +	struct graph_sched_wq_node *wq_nodes[32];
> +
> +	n = rte_ring_sc_dequeue_burst_elem(wq, wq_nodes,
> sizeof(wq_nodes[0]),
> +					   RTE_DIM(wq_nodes), NULL);
> +	if (n == 0)
> +		return;
> +
> +	for (i = 0; i < n; i++) {
> +		wq_node = wq_nodes[i];
> +		node = RTE_PTR_ADD(graph, wq_node->node_off);
> +		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> +		idx = node->idx;
> +		free_space = node->size - idx;
> +
> +		if (unlikely(free_space < wq_node->nb_objs))
> +			__rte_node_stream_alloc_size(graph, node, node->size + wq_node->nb_objs);
> +
> +		memmove(&node->objs[idx], wq_node->objs, wq_node->nb_objs * sizeof(void *));
> +		memset(wq_node->objs, 0, wq_node->nb_objs * sizeof(void *));

The memset should be avoided in the fast path for better performance, as we set wq_node->nb_objs to 0 anyway.
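To illustrate the point, here is a minimal standalone sketch (hypothetical types, not the actual lib/graph structures): a consumer only ever reads objs[0..nb_objs), so resetting nb_objs alone already "empties" the entry, and clearing the payload array costs extra cycles in the fast path for no functional gain.

```c
#include <assert.h>
#include <stddef.h>

#define WQ_OBJS 4

/* Simplified stand-in for graph_sched_wq_node. */
struct wq_node {
	unsigned int nb_objs;
	void *objs[WQ_OBJS];
};

static size_t
consume(struct wq_node *n)
{
	size_t seen = 0;
	unsigned int i;

	/* Only the first nb_objs slots are ever dereferenced. */
	for (i = 0; i < n->nb_objs; i++)
		if (n->objs[i] != NULL)
			seen++;

	n->nb_objs = 0;  /* no memset of objs[] needed */
	return seen;
}
```

Stale pointers remain in objs[] after consumption, but they are unreachable through nb_objs, so correctness is unaffected.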

> +		node->idx = idx + wq_node->nb_objs;
> +
> +		__rte_node_process(graph, node);
> +
> +		wq_node->nb_objs = 0;
> +		node->idx = 0;
> +	}
> +
> +	rte_mempool_put_bulk(mp, (void **)wq_nodes, n);
> +}
> +
>  int
>  rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned
> int lcore_id)
>  {
> diff --git a/lib/graph/rte_graph_model_dispatch.h
> b/lib/graph/rte_graph_model_dispatch.h
> index 179624e972..18fa7ce0ab 100644
> --- a/lib/graph/rte_graph_model_dispatch.h
> +++ b/lib/graph/rte_graph_model_dispatch.h
> @@ -14,12 +14,49 @@
>   *
>   * This API allows to set core affinity with the node.
>   */
> +#include <rte_errno.h>
> +#include <rte_mempool.h>
> +#include <rte_memzone.h>
> +#include <rte_ring.h>
> +
>  #include "rte_graph_worker_common.h"
> 
>  #ifdef __cplusplus
>  extern "C" {
>  #endif
> 
> +#define GRAPH_SCHED_WQ_SIZE_MULTIPLIER  8
> +#define GRAPH_SCHED_WQ_SIZE(nb_nodes)   \
> +	((typeof(nb_nodes))((nb_nodes) * GRAPH_SCHED_WQ_SIZE_MULTIPLIER))
> +
> +/**
> + * @internal
> + *
> + * Schedule the node to the right graph's work queue.
> + *
> + * @param node
> + *   Pointer to the scheduled node object.
> + * @param rq
> + *   Pointer to the scheduled run-queue for all graphs.
> + *
> + * @return
> + *   True on success, false otherwise.
> + */
> +__rte_experimental
> +bool __rte_noinline __rte_graph_sched_node_enqueue(struct rte_node
> *node,
> +				    struct rte_graph_rq_head *rq);
> +
> +/**
> + * @internal
> + *
> + * Process all nodes (streams) in the graph's work queue.
> + *
> + * @param graph
> + *   Pointer to the graph object.
> + */
> +__rte_experimental
> +void __rte_graph_sched_wq_process(struct rte_graph *graph);
> +
>  /**
>   * Set lcore affinity with the node.
>   *
> diff --git a/lib/graph/version.map b/lib/graph/version.map
> index aaa86f66ed..d511133f39 100644
> --- a/lib/graph/version.map
> +++ b/lib/graph/version.map
> @@ -48,6 +48,8 @@ EXPERIMENTAL {
> 
>  	rte_graph_worker_model_set;
>  	rte_graph_worker_model_get;
> +	__rte_graph_sched_wq_process;
> +	__rte_graph_sched_node_enqueue;
> 
>  	rte_graph_model_dispatch_lcore_affinity_set;
> 
> --
> 2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [EXT] [PATCH v5 11/15] graph: introduce graph walk by cross-core dispatch
  2023-03-31  4:03         ` [PATCH v5 11/15] graph: introduce graph walk by cross-core dispatch Zhirun Yan
@ 2023-04-27 14:58           ` Pavan Nikhilesh Bhagavatula
  2023-05-05  2:09             ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2023-04-27 14:58 UTC (permalink / raw)
  To: Zhirun Yan, dev, Jerin Jacob Kollanukkaran,
	Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram, stephen
  Cc: cunming.liang, haiyue.wang

> This patch introduces the task scheduler mechanism to enable dispatching
> tasks to other worker cores. Currently, there is only a local work
> queue for one graph to walk. We introduce a scheduler work queue on
> each worker core for dispatching tasks. The walk is performed on the
> scheduler work queue first, then on the local work queue.
> 
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  lib/graph/rte_graph_model_dispatch.h | 42
> ++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/lib/graph/rte_graph_model_dispatch.h
> b/lib/graph/rte_graph_model_dispatch.h
> index 18fa7ce0ab..65b2cc6d87 100644
> --- a/lib/graph/rte_graph_model_dispatch.h
> +++ b/lib/graph/rte_graph_model_dispatch.h
> @@ -73,6 +73,48 @@ __rte_experimental
>  int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
>  						unsigned int lcore_id);
> 
> +/**
> + * Perform graph walk on the circular buffer and invoke the process
> function
> + * of the nodes and collect the stats.
> + *
> + * @param graph
> + *   Graph pointer returned from rte_graph_lookup function.
> + *
> + * @see rte_graph_lookup()
> + */
> +__rte_experimental
> +static inline void
> +rte_graph_walk_mcore_dispatch(struct rte_graph *graph)
> +{
> +	const rte_graph_off_t *cir_start = graph->cir_start;
> +	const rte_node_t mask = graph->cir_mask;
> +	uint32_t head = graph->head;
> +	struct rte_node *node;

I think we should add a RTE_ASSERT here to make sure that the graph object is a cloned graph.

> +
> +	if (graph->wq != NULL)
> +		__rte_graph_sched_wq_process(graph);
> +
> +	while (likely(head != graph->tail)) {
> +		node = (struct rte_node *)RTE_PTR_ADD(graph,
> cir_start[(int32_t)head++]);
> +
> +		/* skip the src nodes which not bind with current worker */
> +		if ((int32_t)head < 0 && node->lcore_id != graph->lcore_id)
> +			continue;
> +
> +		/* Schedule the node until all task/objs are done */
> +		if (node->lcore_id != RTE_MAX_LCORE &&
> +		    graph->lcore_id != node->lcore_id && graph->rq != NULL
> &&
> +		    __rte_graph_sched_node_enqueue(node, graph->rq))
> +			continue;
> +
> +		__rte_node_process(graph, node);
> +
> +		head = likely((int32_t)head > 0) ? head & mask : head;
> +	}
> +
> +	graph->tail = 0;
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> --
> 2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [EXT] [PATCH v5 03/15] graph: move node process into inline function
  2023-03-31  4:02         ` [PATCH v5 03/15] graph: move node process into inline function Zhirun Yan
@ 2023-04-27 15:03           ` Pavan Nikhilesh Bhagavatula
  2023-05-05  2:10             ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2023-04-27 15:03 UTC (permalink / raw)
  To: Zhirun Yan, dev, Jerin Jacob Kollanukkaran,
	Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram, stephen
  Cc: cunming.liang, haiyue.wang

> Node process is a single and reusable block, move the code into an inline
> function.
> 
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  lib/graph/rte_graph_model_rtc.h     | 20 ++---------------
>  lib/graph/rte_graph_worker_common.h | 33
> +++++++++++++++++++++++++++++
>  2 files changed, 35 insertions(+), 18 deletions(-)
> 
> diff --git a/lib/graph/rte_graph_model_rtc.h
> b/lib/graph/rte_graph_model_rtc.h
> index 665560f831..0dcb7151e9 100644
> --- a/lib/graph/rte_graph_model_rtc.h
> +++ b/lib/graph/rte_graph_model_rtc.h
> @@ -20,9 +20,6 @@ rte_graph_walk_rtc(struct rte_graph *graph)
>  	const rte_node_t mask = graph->cir_mask;
>  	uint32_t head = graph->head;
>  	struct rte_node *node;
> -	uint64_t start;
> -	uint16_t rc;
> -	void **objs;
> 
>  	/*
>  	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and
> then
> @@ -41,21 +38,8 @@ rte_graph_walk_rtc(struct rte_graph *graph)
>  	 */
>  	while (likely(head != graph->tail)) {
>  		node = (struct rte_node *)RTE_PTR_ADD(graph,
> cir_start[(int32_t)head++]);
> -		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> -		objs = node->objs;
> -		rte_prefetch0(objs);
> -
> -		if (rte_graph_has_stats_feature()) {
> -			start = rte_rdtsc();

Since we are refactoring this function, could you change rte_rdtsc() to rte_rdtsc_precise()?

> -			rc = node->process(graph, node, objs, node->idx);
> -			node->total_cycles += rte_rdtsc() - start;
> -			node->total_calls++;
> -			node->total_objs += rc;
> -		} else {
> -			node->process(graph, node, objs, node->idx);
> -		}
> -			node->idx = 0;
> -			head = likely((int32_t)head > 0) ? head & mask :
> head;
> +		__rte_node_process(graph, node);
> +		head = likely((int32_t)head > 0) ? head & mask : head;
>  	}
>  	graph->tail = 0;
>  }
> diff --git a/lib/graph/rte_graph_worker_common.h
> b/lib/graph/rte_graph_worker_common.h
> index b58f8f6947..41428974db 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -130,6 +130,39 @@ void __rte_node_stream_alloc_size(struct
> rte_graph *graph,
> 
>  /* Fast path helper functions */
> 
> +/**
> + * @internal
> + *
> + * Enqueue a given node to the tail of the graph reel.
> + *
> + * @param graph
> + *   Pointer Graph object.
> + * @param node
> + *   Pointer to node object to be enqueued.
> + */
> +static __rte_always_inline void
> +__rte_node_process(struct rte_graph *graph, struct rte_node *node)
> +{
> +	uint64_t start;
> +	uint16_t rc;
> +	void **objs;
> +
> +	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> +	objs = node->objs;
> +	rte_prefetch0(objs);
> +
> +	if (rte_graph_has_stats_feature()) {
> +		start = rte_rdtsc();
> +		rc = node->process(graph, node, objs, node->idx);
> +		node->total_cycles += rte_rdtsc() - start;
> +		node->total_calls++;
> +		node->total_objs += rc;
> +	} else {
> +		node->process(graph, node, objs, node->idx);
> +	}
> +	node->idx = 0;
> +}
> +
>  /**
>   * @internal
>   *
> --
> 2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [EXT] [PATCH v5 02/15] graph: split graph worker into common and default model
  2023-04-27 14:11           ` [EXT] " Pavan Nikhilesh Bhagavatula
@ 2023-05-05  2:09             ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-05-05  2:09 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, dev, Jerin Jacob Kollanukkaran,
	Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram, stephen
  Cc: Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> Sent: Thursday, April 27, 2023 10:11 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>; dev@dpdk.org; Jerin Jacob
> Kollanukkaran <jerinj@marvell.com>; Kiran Kumar Kokkilagadda
> <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; stephen@networkplumber.org
> Cc: Liang, Cunming <cunming.liang@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: RE: [EXT] [PATCH v5 02/15] graph: split graph worker into common and
> default model
> 
> 
> 
> > -----Original Message-----
> > From: Zhirun Yan <zhirun.yan@intel.com>
> > Sent: Friday, March 31, 2023 9:33 AM
> > To: dev@dpdk.org; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> > Kiran Kumar Kokkilagadda <kirankumark@marvell.com>; Nithin Kumar
> > Dabilpuram <ndabilpuram@marvell.com>; stephen@networkplumber.org
> > Cc: cunming.liang@intel.com; haiyue.wang@intel.com; Zhirun Yan
> > <zhirun.yan@intel.com>
> > Subject: [EXT] [PATCH v5 02/15] graph: split graph worker into common
> > and default model
> >
> > External Email
> >
> > ----------------------------------------------------------------------
> > To support multiple graph worker models, split the graph worker into
> > common and default parts. Name the current walk function
> > rte_graph_model_rtc, since the default model is RTC (Run-To-Completion).
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  lib/graph/graph_pcap.c              |  2 +-
> >  lib/graph/graph_private.h           |  2 +-
> >  lib/graph/meson.build               |  2 +-
> >  lib/graph/rte_graph_model_rtc.h     | 61
> > +++++++++++++++++++++++++++++
> >  lib/graph/rte_graph_worker.h        | 34 ++++++++++++++++
> >  lib/graph/rte_graph_worker_common.h | 57 ---------------------------
> >  6 files changed, 98 insertions(+), 60 deletions(-)  create mode
> > 100644 lib/graph/rte_graph_model_rtc.h  create mode 100644
> > lib/graph/rte_graph_worker.h
> >
> > diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c index
> > 8a220370fa..6c43330029 100644
> > --- a/lib/graph/graph_pcap.c
> > +++ b/lib/graph/graph_pcap.c
> > @@ -10,7 +10,7 @@
> >  #include <rte_mbuf.h>
> >  #include <rte_pcapng.h>
> >
> > -#include "rte_graph_worker_common.h"
> > +#include "rte_graph_worker.h"
> >
> >  #include "graph_pcap_private.h"
> >
> > diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
> > index f08dbc7e9d..7d1b30b8ac 100644
> > --- a/lib/graph/graph_private.h
> > +++ b/lib/graph/graph_private.h
> > @@ -12,7 +12,7 @@
> >  #include <rte_eal.h>
> >
> >  #include "rte_graph.h"
> > -#include "rte_graph_worker_common.h"
> > +#include "rte_graph_worker.h"
> >
> >  extern int rte_graph_logtype;
> >
> > diff --git a/lib/graph/meson.build b/lib/graph/meson.build index
> > 4e2b612ad3..3526d1b5d4 100644
> > --- a/lib/graph/meson.build
> > +++ b/lib/graph/meson.build
> > @@ -16,6 +16,6 @@ sources = files(
> >          'graph_populate.c',
> >          'graph_pcap.c',
> >  )
> > -headers = files('rte_graph.h', 'rte_graph_worker_common.h')
> > +headers = files('rte_graph.h', 'rte_graph_worker.h')
> >
> >  deps += ['eal', 'pcapng']
> > diff --git a/lib/graph/rte_graph_model_rtc.h
> > b/lib/graph/rte_graph_model_rtc.h new file mode 100644 index
> > 0000000000..665560f831
> > --- /dev/null
> > +++ b/lib/graph/rte_graph_model_rtc.h
> > @@ -0,0 +1,61 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(C) 2023 Intel Corporation  */
> > +
> 
> Please retain Marvell copyright too.
> 
Yes, I will do that in the next version. Thanks for reminding me.

> > +#include "rte_graph_worker_common.h"
> > +
> > +/**
> > + * Perform graph walk on the circular buffer and invoke the process
> > function
> > + * of the nodes and collect the stats.
> > + *
> > + * @param graph
> > + *   Graph pointer returned from rte_graph_lookup function.
> > + *
> > + * @see rte_graph_lookup()
> > + */
> > +static inline void
> > +rte_graph_walk_rtc(struct rte_graph *graph) {
> > +	const rte_graph_off_t *cir_start = graph->cir_start;
> > +	const rte_node_t mask = graph->cir_mask;
> > +	uint32_t head = graph->head;
> > +	struct rte_node *node;
> > +	uint64_t start;
> > +	uint16_t rc;
> > +	void **objs;
> > +
> > +	/*
> > +	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and
> > then
> > +	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
> > +	 * in a circular buffer fashion.
> > +	 *
> > +	 *	+-----+ <= cir_start - head [number of source nodes]
> > +	 *	|     |
> > +	 *	| ... | <= source nodes
> > +	 *	|     |
> > +	 *	+-----+ <= cir_start [head = 0] [tail = 0]
> > +	 *	|     |
> > +	 *	| ... | <= pending streams
> > +	 *	|     |
> > +	 *	+-----+ <= cir_start + mask
> > +	 */
> > +	while (likely(head != graph->tail)) {
> > +		node = (struct rte_node *)RTE_PTR_ADD(graph,
> > cir_start[(int32_t)head++]);
> > +		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> > +		objs = node->objs;
> > +		rte_prefetch0(objs);
> > +
> > +		if (rte_graph_has_stats_feature()) {
> > +			start = rte_rdtsc();
> > +			rc = node->process(graph, node, objs, node->idx);
> > +			node->total_cycles += rte_rdtsc() - start;
> > +			node->total_calls++;
> > +			node->total_objs += rc;
> > +		} else {
> > +			node->process(graph, node, objs, node->idx);
> > +		}
> > +			node->idx = 0;
> > +			head = likely((int32_t)head > 0) ? head & mask :
> > head;
> > +	}
> > +	graph->tail = 0;
> > +}
> > diff --git a/lib/graph/rte_graph_worker.h
> > b/lib/graph/rte_graph_worker.h new file mode 100644 index
> > 0000000000..7ea18ba80a
> > --- /dev/null
> > +++ b/lib/graph/rte_graph_worker.h
> > @@ -0,0 +1,34 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(C) 2023 Intel Corporation  */
> > +
> > +#ifndef _RTE_GRAPH_WORKER_H_
> > +#define _RTE_GRAPH_WORKER_H_
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include "rte_graph_model_rtc.h"
> > +
> > +/**
> > + * Perform graph walk on the circular buffer and invoke the process
> > function
> > + * of the nodes and collect the stats.
> > + *
> > + * @param graph
> > + *   Graph pointer returned from rte_graph_lookup function.
> > + *
> > + * @see rte_graph_lookup()
> > + */
> > +__rte_experimental
> > +static inline void
> > +rte_graph_walk(struct rte_graph *graph) {
> > +	rte_graph_walk_rtc(graph);
> > +}
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_GRAPH_WORKER_H_ */
> > diff --git a/lib/graph/rte_graph_worker_common.h
> > b/lib/graph/rte_graph_worker_common.h
> > index 0bad2938f3..b58f8f6947 100644
> > --- a/lib/graph/rte_graph_worker_common.h
> > +++ b/lib/graph/rte_graph_worker_common.h
> > @@ -128,63 +128,6 @@ __rte_experimental  void
> > __rte_node_stream_alloc_size(struct rte_graph *graph,
> >  				  struct rte_node *node, uint16_t req_size);
> >
> > -/**
> > - * Perform graph walk on the circular buffer and invoke the process
> > function
> > - * of the nodes and collect the stats.
> > - *
> > - * @param graph
> > - *   Graph pointer returned from rte_graph_lookup function.
> > - *
> > - * @see rte_graph_lookup()
> > - */
> > -__rte_experimental
> > -static inline void
> > -rte_graph_walk(struct rte_graph *graph) -{
> > -	const rte_graph_off_t *cir_start = graph->cir_start;
> > -	const rte_node_t mask = graph->cir_mask;
> > -	uint32_t head = graph->head;
> > -	struct rte_node *node;
> > -	uint64_t start;
> > -	uint16_t rc;
> > -	void **objs;
> > -
> > -	/*
> > -	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and
> > then
> > -	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
> > -	 * in a circular buffer fashion.
> > -	 *
> > -	 *	+-----+ <= cir_start - head [number of source nodes]
> > -	 *	|     |
> > -	 *	| ... | <= source nodes
> > -	 *	|     |
> > -	 *	+-----+ <= cir_start [head = 0] [tail = 0]
> > -	 *	|     |
> > -	 *	| ... | <= pending streams
> > -	 *	|     |
> > -	 *	+-----+ <= cir_start + mask
> > -	 */
> > -	while (likely(head != graph->tail)) {
> > -		node = (struct rte_node *)RTE_PTR_ADD(graph,
> > cir_start[(int32_t)head++]);
> > -		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> > -		objs = node->objs;
> > -		rte_prefetch0(objs);
> > -
> > -		if (rte_graph_has_stats_feature()) {
> > -			start = rte_rdtsc();
> > -			rc = node->process(graph, node, objs, node->idx);
> > -			node->total_cycles += rte_rdtsc() - start;
> > -			node->total_calls++;
> > -			node->total_objs += rc;
> > -		} else {
> > -			node->process(graph, node, objs, node->idx);
> > -		}
> > -		node->idx = 0;
> > -		head = likely((int32_t)head > 0) ? head & mask : head;
> > -	}
> > -	graph->tail = 0;
> > -}
> > -
> >  /* Fast path helper functions */
> >
> >  /**
> > --
> > 2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [EXT] [PATCH v5 11/15] graph: introduce graph walk by cross-core dispatch
  2023-04-27 14:58           ` [EXT] " Pavan Nikhilesh Bhagavatula
@ 2023-05-05  2:09             ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-05-05  2:09 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, dev, Jerin Jacob Kollanukkaran,
	Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram, stephen
  Cc: Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> Sent: Thursday, April 27, 2023 10:59 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>; dev@dpdk.org; Jerin Jacob
> Kollanukkaran <jerinj@marvell.com>; Kiran Kumar Kokkilagadda
> <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; stephen@networkplumber.org
> Cc: Liang, Cunming <cunming.liang@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: RE: [EXT] [PATCH v5 11/15] graph: introduce graph walk by cross-core
> dispatch
> 
> > This patch introduces the task scheduler mechanism to enable
> > dispatching tasks to other worker cores. Currently, there is only a
> > local work queue for one graph to walk. We introduce a scheduler
> > work queue on each worker core for dispatching tasks. The walk is
> > performed on the scheduler work queue first, then on the local work
> > queue.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  lib/graph/rte_graph_model_dispatch.h | 42
> > ++++++++++++++++++++++++++++
> >  1 file changed, 42 insertions(+)
> >
> > diff --git a/lib/graph/rte_graph_model_dispatch.h
> > b/lib/graph/rte_graph_model_dispatch.h
> > index 18fa7ce0ab..65b2cc6d87 100644
> > --- a/lib/graph/rte_graph_model_dispatch.h
> > +++ b/lib/graph/rte_graph_model_dispatch.h
> > @@ -73,6 +73,48 @@ __rte_experimental
> >  int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
> >  						unsigned int lcore_id);
> >
> > +/**
> > + * Perform graph walk on the circular buffer and invoke the process
> > function
> > + * of the nodes and collect the stats.
> > + *
> > + * @param graph
> > + *   Graph pointer returned from rte_graph_lookup function.
> > + *
> > + * @see rte_graph_lookup()
> > + */
> > +__rte_experimental
> > +static inline void
> > +rte_graph_walk_mcore_dispatch(struct rte_graph *graph) {
> > +	const rte_graph_off_t *cir_start = graph->cir_start;
> > +	const rte_node_t mask = graph->cir_mask;
> > +	uint32_t head = graph->head;
> > +	struct rte_node *node;
> 
> I think we should add a RTE_ASSERT here to make sure that the graph object is a
> cloned graph.
> 
Ok, I will add the RTE_ASSERT in the next version.

> > +
> > +	if (graph->wq != NULL)
> > +		__rte_graph_sched_wq_process(graph);
> > +
> > +	while (likely(head != graph->tail)) {
> > +		node = (struct rte_node *)RTE_PTR_ADD(graph,
> > cir_start[(int32_t)head++]);
> > +
> > +		/* skip the src nodes which not bind with current worker */
> > +		if ((int32_t)head < 0 && node->lcore_id != graph->lcore_id)
> > +			continue;
> > +
> > +		/* Schedule the node until all task/objs are done */
> > +		if (node->lcore_id != RTE_MAX_LCORE &&
> > +		    graph->lcore_id != node->lcore_id && graph->rq != NULL
> > &&
> > +		    __rte_graph_sched_node_enqueue(node, graph->rq))
> > +			continue;
> > +
> > +		__rte_node_process(graph, node);
> > +
> > +		head = likely((int32_t)head > 0) ? head & mask : head;
> > +	}
> > +
> > +	graph->tail = 0;
> > +}
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > --
> > 2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [EXT] [PATCH v5 09/15] graph: introduce stream moving cross cores
  2023-04-27 14:52           ` [EXT] " Pavan Nikhilesh Bhagavatula
@ 2023-05-05  2:10             ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-05-05  2:10 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, dev, Jerin Jacob Kollanukkaran,
	Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram, stephen
  Cc: Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> Sent: Thursday, April 27, 2023 10:53 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>; dev@dpdk.org; Jerin Jacob
> Kollanukkaran <jerinj@marvell.com>; Kiran Kumar Kokkilagadda
> <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; stephen@networkplumber.org
> Cc: Liang, Cunming <cunming.liang@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: RE: [EXT] [PATCH v5 09/15] graph: introduce stream moving cross cores
> 
> > This patch introduces the key functions that allow a worker thread to
> > enqueue and move streams of objects to the next nodes over different
> > cores.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  lib/graph/graph_private.h            |  27 +++++
> >  lib/graph/meson.build                |   2 +-
> >  lib/graph/rte_graph_model_dispatch.c | 145
> > +++++++++++++++++++++++++++
> >  lib/graph/rte_graph_model_dispatch.h |  37 +++++++
> >  lib/graph/version.map                |   2 +
> >  5 files changed, 212 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
> > index b66b18ebbc..e1a2a4bfd8 100644
> > --- a/lib/graph/graph_private.h
> > +++ b/lib/graph/graph_private.h
> > @@ -366,4 +366,31 @@ void graph_dump(FILE *f, struct graph *g);
> >   */
> >  void node_dump(FILE *f, struct node *n);
> >
> > +/**
> > + * @internal
> > + *
> > + * Create the graph schedule work queue. And all cloned graphs
> > +attached to
> > the
> > + * parent graph MUST be destroyed together for fast schedule design
> > limitation.
> > + *
> > + * @param _graph
> > + *   The graph object
> > + * @param _parent_graph
> > + *   The parent graph object which holds the run-queue head.
> > + *
> > + * @return
> > + *   - 0: Success.
> > + *   - <0: Graph schedule work queue related error.
> > + */
> > +int graph_sched_wq_create(struct graph *_graph, struct graph
> > *_parent_graph);
> > +
> > +/**
> > + * @internal
> > + *
> > + * Destroy the graph schedule work queue.
> > + *
> > + * @param _graph
> > + *   The graph object
> > + */
> > +void graph_sched_wq_destroy(struct graph *_graph);
> > +
> >  #endif /* _RTE_GRAPH_PRIVATE_H_ */
> > diff --git a/lib/graph/meson.build b/lib/graph/meson.build index
> > c729d984b6..e21affa280 100644
> > --- a/lib/graph/meson.build
> > +++ b/lib/graph/meson.build
> > @@ -20,4 +20,4 @@ sources = files(
> >  )
> >  headers = files('rte_graph.h', 'rte_graph_worker.h')
> >
> > -deps += ['eal', 'pcapng']
> > +deps += ['eal', 'pcapng', 'mempool', 'ring']
> > diff --git a/lib/graph/rte_graph_model_dispatch.c
> > b/lib/graph/rte_graph_model_dispatch.c
> > index 4a2f99496d..a300fefb85 100644
> > --- a/lib/graph/rte_graph_model_dispatch.c
> > +++ b/lib/graph/rte_graph_model_dispatch.c
> > @@ -5,6 +5,151 @@
> >  #include "graph_private.h"
> >  #include "rte_graph_model_dispatch.h"
> >
> > +int
> > +graph_sched_wq_create(struct graph *_graph, struct graph
> > *_parent_graph)
> > +{
> > +	struct rte_graph *parent_graph = _parent_graph->graph;
> > +	struct rte_graph *graph = _graph->graph;
> > +	unsigned int wq_size;
> > +
> > +	wq_size = GRAPH_SCHED_WQ_SIZE(graph->nb_nodes);
> > +	wq_size = rte_align32pow2(wq_size + 1);
> 
> Hi Zhirun,
> 
> We should introduce a new function `rte_graph_configure` which can help
> application to control the ring size and mempool size of the work queue?
> We could fallback to default values if nothing is configured.
> 
> rte_graph_configure should take a
> struct rte_graph_config {
> 	struct {
> 		u64 rsvd[8];
> 	} rtc;
> 	struct {
> 		u16 wq_size;
> 		...
> 	} dispatch;
> };
> 
> This will help future graph models to have their own configuration.
> 
> We can have a rte_graph_config_init() function to initialize the rte_graph_config
> structure.
> 

Hi Pavan,

Thanks for your comments. I agree with you. It would be more friendly for users/developers.
The ring and mempool sizes have some limitations (they must be a power of 2), so
I prefer to use u16 wq_size_max and u32 mp_size_max for users who have limited resources.
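For reference, the power-of-two rounding that drives this sizing can be sketched as follows (a standalone re-implementation of the bit-smearing trick used by rte_align32pow2(), shown here for illustration only). Note also that a default rte_ring stores one fewer entry than its size unless RING_F_EXACT_SZ is set, which may be why graph_sched_wq_create() aligns wq_size + 1 rather than wq_size.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next power of two (returns v unchanged if v is
 * already a power of two). Smearing the high bit into all lower
 * positions turns v-1 into a mask of ones; adding 1 yields the
 * next power of two. */
static inline uint32_t
align32pow2(uint32_t v)
{
	v--;
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	return v + 1;
}
```

With the WQ sizing above (GRAPH_SCHED_WQ_SIZE_MULTIPLIER of 8), a graph with 10 nodes gives wq_size = 80, and aligning 80 + 1 yields a 128-slot ring.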

> 
> > +
> > +	graph->wq = rte_ring_create(graph->name, wq_size, graph->socket,
> > +				    RING_F_SC_DEQ);
> > +	if (graph->wq == NULL)
> > +		SET_ERR_JMP(EIO, fail, "Failed to allocate graph WQ");
> > +
> > +	graph->mp = rte_mempool_create(graph->name, wq_size,
> > +				       sizeof(struct graph_sched_wq_node),
> > +				       0, 0, NULL, NULL, NULL, NULL,
> > +				       graph->socket, MEMPOOL_F_SP_PUT);
> > +	if (graph->mp == NULL)
> > +		SET_ERR_JMP(EIO, fail_mp,
> > +			    "Failed to allocate graph WQ schedule entry");
> > +
> > +	graph->lcore_id = _graph->lcore_id;
> > +
> > +	if (parent_graph->rq == NULL) {
> > +		parent_graph->rq = &parent_graph->rq_head;
> > +		SLIST_INIT(parent_graph->rq);
> > +	}
> > +
> > +	graph->rq = parent_graph->rq;
> > +	SLIST_INSERT_HEAD(graph->rq, graph, rq_next);
> > +
> > +	return 0;
> > +
> > +fail_mp:
> > +	rte_ring_free(graph->wq);
> > +	graph->wq = NULL;
> > +fail:
> > +	return -rte_errno;
> > +}
> > +
> > +void
> > +graph_sched_wq_destroy(struct graph *_graph) {
> > +	struct rte_graph *graph = _graph->graph;
> > +
> > +	if (graph == NULL)
> > +		return;
> > +
> > +	rte_ring_free(graph->wq);
> > +	graph->wq = NULL;
> > +
> > +	rte_mempool_free(graph->mp);
> > +	graph->mp = NULL;
> > +}
> > +
> > +static __rte_always_inline bool
> > +__graph_sched_node_enqueue(struct rte_node *node, struct rte_graph
> > *graph)
> > +{
> > +	struct graph_sched_wq_node *wq_node;
> > +	uint16_t off = 0;
> > +	uint16_t size;
> > +
> > +submit_again:
> > +	if (rte_mempool_get(graph->mp, (void **)&wq_node) < 0)
> > +		goto fallback;
> > +
> > +	size = RTE_MIN(node->idx, RTE_DIM(wq_node->objs));
> > +	wq_node->node_off = node->off;
> > +	wq_node->nb_objs = size;
> > +	rte_memcpy(wq_node->objs, &node->objs[off], size * sizeof(void
> > *));
> > +
> > +	while (rte_ring_mp_enqueue_bulk_elem(graph->wq, (void
> > *)&wq_node,
> > +					  sizeof(wq_node), 1, NULL) == 0)
> > +		rte_pause();
> > +
> > +	off += size;
> > +	node->idx -= size;
> > +	if (node->idx > 0)
> > +		goto submit_again;
> > +
> > +	return true;
> > +
> > +fallback:
> > +	if (off != 0)
> > +		memmove(&node->objs[0], &node->objs[off],
> > +			node->idx * sizeof(void *));
> > +
> > +	return false;
> > +}
> > +
> > +bool __rte_noinline
> > +__rte_graph_sched_node_enqueue(struct rte_node *node,
> > +			       struct rte_graph_rq_head *rq) {
> > +	const unsigned int lcore_id = node->lcore_id;
> > +	struct rte_graph *graph;
> > +
> > +	SLIST_FOREACH(graph, rq, rq_next)
> > +		if (graph->lcore_id == lcore_id)
> > +			break;
> > +
> > +	return graph != NULL ? __graph_sched_node_enqueue(node,
> > graph) : false;
> > +}
> > +
> > +void
> > +__rte_graph_sched_wq_process(struct rte_graph *graph) {
> > +	struct graph_sched_wq_node *wq_node;
> > +	struct rte_mempool *mp = graph->mp;
> > +	struct rte_ring *wq = graph->wq;
> > +	uint16_t idx, free_space;
> > +	struct rte_node *node;
> > +	unsigned int i, n;
> > +	struct graph_sched_wq_node *wq_nodes[32];
> > +
> > +	n = rte_ring_sc_dequeue_burst_elem(wq, wq_nodes,
> > sizeof(wq_nodes[0]),
> > +					   RTE_DIM(wq_nodes), NULL);
> > +	if (n == 0)
> > +		return;
> > +
> > +	for (i = 0; i < n; i++) {
> > +		wq_node = wq_nodes[i];
> > +		node = RTE_PTR_ADD(graph, wq_node->node_off);
> > +		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> > +		idx = node->idx;
> > +		free_space = node->size - idx;
> > +
> > +		if (unlikely(free_space < wq_node->nb_objs))
> > +			__rte_node_stream_alloc_size(graph, node, node-
> > >size + wq_node->nb_objs);
> > +
> > +		memmove(&node->objs[idx], wq_node->objs, wq_node-
> > >nb_objs * sizeof(void *));
> > +		memset(wq_node->objs, 0, wq_node->nb_objs *
> > sizeof(void *));
> 
> Memset should be avoided in fastpath for better performance as we anyway set
> wq_node->nb_objs as 0.
> 
> > +		node->idx = idx + wq_node->nb_objs;
> > +
> > +		__rte_node_process(graph, node);
> > +
> > +		wq_node->nb_objs = 0;
> > +		node->idx = 0;
> > +	}
> > +
> > +	rte_mempool_put_bulk(mp, (void **)wq_nodes, n); }
> > +
> >  int
> >  rte_graph_model_dispatch_lcore_affinity_set(const char *name,
> > unsigned int lcore_id)  { diff --git
> > a/lib/graph/rte_graph_model_dispatch.h
> > b/lib/graph/rte_graph_model_dispatch.h
> > index 179624e972..18fa7ce0ab 100644
> > --- a/lib/graph/rte_graph_model_dispatch.h
> > +++ b/lib/graph/rte_graph_model_dispatch.h
> > @@ -14,12 +14,49 @@
> >   *
> >   * This API allows to set core affinity with the node.
> >   */
> > +#include <rte_errno.h>
> > +#include <rte_mempool.h>
> > +#include <rte_memzone.h>
> > +#include <rte_ring.h>
> > +
> >  #include "rte_graph_worker_common.h"
> >
> >  #ifdef __cplusplus
> >  extern "C" {
> >  #endif
> >
> > +#define GRAPH_SCHED_WQ_SIZE_MULTIPLIER  8
> > +#define GRAPH_SCHED_WQ_SIZE(nb_nodes)   \
> > +	((typeof(nb_nodes))((nb_nodes) *
> > GRAPH_SCHED_WQ_SIZE_MULTIPLIER))
> > +
> > +/**
> > + * @internal
> > + *
> > + * Schedule the node to the right graph's work queue.
> > + *
> > + * @param node
> > + *   Pointer to the scheduled node object.
> > + * @param rq
> > + *   Pointer to the scheduled run-queue for all graphs.
> > + *
> > + * @return
> > + *   True on success, false otherwise.
> > + */
> > +__rte_experimental
> > +bool __rte_noinline __rte_graph_sched_node_enqueue(struct rte_node
> > *node,
> > +				    struct rte_graph_rq_head *rq);
> > +
> > +/**
> > + * @internal
> > + *
> > + * Process all nodes (streams) in the graph's work queue.
> > + *
> > + * @param graph
> > + *   Pointer to the graph object.
> > + */
> > +__rte_experimental
> > +void __rte_graph_sched_wq_process(struct rte_graph *graph);
> > +
> >  /**
> >   * Set lcore affinity with the node.
> >   *
> > diff --git a/lib/graph/version.map b/lib/graph/version.map index
> > aaa86f66ed..d511133f39 100644
> > --- a/lib/graph/version.map
> > +++ b/lib/graph/version.map
> > @@ -48,6 +48,8 @@ EXPERIMENTAL {
> >
> >  	rte_graph_worker_model_set;
> >  	rte_graph_worker_model_get;
> > +	__rte_graph_sched_wq_process;
> > +	__rte_graph_sched_node_enqueue;
> >
> >  	rte_graph_model_dispatch_lcore_affinity_set;
> >
> > --
> > 2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [EXT] [PATCH v5 03/15] graph: move node process into inline function
  2023-04-27 15:03           ` [EXT] " Pavan Nikhilesh Bhagavatula
@ 2023-05-05  2:10             ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-05-05  2:10 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, dev, Jerin Jacob Kollanukkaran,
	Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram, stephen
  Cc: Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> Sent: Thursday, April 27, 2023 11:03 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>; dev@dpdk.org; Jerin Jacob
> Kollanukkaran <jerinj@marvell.com>; Kiran Kumar Kokkilagadda
> <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; stephen@networkplumber.org
> Cc: Liang, Cunming <cunming.liang@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: RE: [EXT] [PATCH v5 03/15] graph: move node process into inline
> function
> 
> > Node process is a single and reusable block, move the code into an
> > inline function.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  lib/graph/rte_graph_model_rtc.h     | 20 ++---------------
> >  lib/graph/rte_graph_worker_common.h | 33
> > +++++++++++++++++++++++++++++
> >  2 files changed, 35 insertions(+), 18 deletions(-)
> >
> > diff --git a/lib/graph/rte_graph_model_rtc.h
> > b/lib/graph/rte_graph_model_rtc.h index 665560f831..0dcb7151e9 100644
> > --- a/lib/graph/rte_graph_model_rtc.h
> > +++ b/lib/graph/rte_graph_model_rtc.h
> > @@ -20,9 +20,6 @@ rte_graph_walk_rtc(struct rte_graph *graph)
> >  	const rte_node_t mask = graph->cir_mask;
> >  	uint32_t head = graph->head;
> >  	struct rte_node *node;
> > -	uint64_t start;
> > -	uint16_t rc;
> > -	void **objs;
> >
> >  	/*
> >  	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and
> > then @@ -41,21 +38,8 @@ rte_graph_walk_rtc(struct rte_graph *graph)
> >  	 */
> >  	while (likely(head != graph->tail)) {
> >  		node = (struct rte_node *)RTE_PTR_ADD(graph,
> > cir_start[(int32_t)head++]);
> > -		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> > -		objs = node->objs;
> > -		rte_prefetch0(objs);
> > -
> > -		if (rte_graph_has_stats_feature()) {
> > -			start = rte_rdtsc();
> 
> Since we are refactoring this function could you change rte_rdtsc() to
> rte_rdtsc_precise().

Sure, I will do in next version.
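For context, a rough sketch of the stats wrapper this thread refers to;
`read_cycles()` stands in for `rte_rdtsc_precise()` (which adds a serializing
barrier so the timestamps cannot be reordered around the `process()` call),
and all names here are illustrative, not the DPDK API:

```c
#include <assert.h>
#include <stdint.h>

/* Fake cycle counter so the sketch is self-contained. */
static uint64_t fake_clock;

static uint64_t
read_cycles(void)
{
	return fake_clock;
}

struct node_stats {
	uint64_t total_cycles;
	uint64_t total_calls;
	uint64_t total_objs;
};

/* Shape of __rte_node_process(): time the node callback and
 * accumulate per-node stats only when stats are enabled. */
static void
process_with_stats(struct node_stats *s, uint16_t (*process)(void),
		   int stats_on)
{
	if (stats_on) {
		uint64_t start = read_cycles();
		uint16_t rc = process();
		s->total_cycles += read_cycles() - start;
		s->total_calls++;
		s->total_objs += rc;
	} else {
		process();
	}
}

/* Demo callback: pretend the node burned 10 cycles handling 4 objects. */
static uint16_t
demo_process(void)
{
	fake_clock += 10;
	return 4;
}
```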

> 
> > -			rc = node->process(graph, node, objs, node->idx);
> > -			node->total_cycles += rte_rdtsc() - start;
> > -			node->total_calls++;
> > -			node->total_objs += rc;
> > -		} else {
> > -			node->process(graph, node, objs, node->idx);
> > -		}
> > -			node->idx = 0;
> > -			head = likely((int32_t)head > 0) ? head & mask :
> > head;
> > +		__rte_node_process(graph, node);
> > +		head = likely((int32_t)head > 0) ? head & mask : head;
> >  	}
> >  	graph->tail = 0;
> >  }
> > diff --git a/lib/graph/rte_graph_worker_common.h
> > b/lib/graph/rte_graph_worker_common.h
> > index b58f8f6947..41428974db 100644
> > --- a/lib/graph/rte_graph_worker_common.h
> > +++ b/lib/graph/rte_graph_worker_common.h
> > @@ -130,6 +130,39 @@ void __rte_node_stream_alloc_size(struct
> > rte_graph *graph,
> >
> >  /* Fast path helper functions */
> >
> > +/**
> > + * @internal
> > + *
> > + * Enqueue a given node to the tail of the graph reel.
> > + *
> > + * @param graph
> > + *   Pointer Graph object.
> > + * @param node
> > + *   Pointer to node object to be enqueued.
> > + */
> > +static __rte_always_inline void
> > +__rte_node_process(struct rte_graph *graph, struct rte_node *node) {
> > +	uint64_t start;
> > +	uint16_t rc;
> > +	void **objs;
> > +
> > +	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> > +	objs = node->objs;
> > +	rte_prefetch0(objs);
> > +
> > +	if (rte_graph_has_stats_feature()) {
> > +		start = rte_rdtsc();
> > +		rc = node->process(graph, node, objs, node->idx);
> > +		node->total_cycles += rte_rdtsc() - start;
> > +		node->total_calls++;
> > +		node->total_objs += rc;
> > +	} else {
> > +		node->process(graph, node, objs, node->idx);
> > +	}
> > +	node->idx = 0;
> > +}
> > +
> >  /**
> >   * @internal
> >   *
> > --
> > 2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 00/15] graph enhancement for multi-core dispatch
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (14 preceding siblings ...)
  2023-03-31  4:03         ` [PATCH v5 15/15] doc: update multicore dispatch model in graph guides Zhirun Yan
@ 2023-05-09  6:03         ` Zhirun Yan
  2023-05-09  6:03           ` [PATCH v6 01/15] graph: rename rte_graph_work as common Zhirun Yan
                             ` (15 more replies)
  15 siblings, 16 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

V6:
Change rte_rdtsc() to rte_rdtsc_precise().
Add union in rte_graph_param to configure models.
Remove memset in fastpath, add RTE_ASSERT for cloned graph.
Update copyright in patch 02.
Update l3fwd-graph node affinity, start from rx core successively.

V5:
Fix CI build issues about dynamically update doc.

V4:
Fix CI build issues about undefined reference of sched apis.
Remove inline for model setting.

V3:
Fix CI build issues about TLS and typo.

V2:
Use git mv to keep git history.
Use TLS for per-thread local storage.
Change model name to mcore dispatch.
Change API with specific mode name.
Split big patch.
Fix CI issues.
Rebase l3fwd-graph example.
Update doc and maintainers files.


Currently, rte_graph supports the RTC (Run-To-Completion) model, where
each graph runs entirely within a single core.
RTC is one of the typical packet processing models. Others, such as
Pipeline or Hybrid, lack support.

The patch set introduces a 'multicore dispatch' model selection, which
is a self-reacting scheme according to the core affinity.
The new model enables a cross-core dispatching mechanism that employs a
scheduling work queue to dispatch streams to the worker cores associated
with the destination node. When the core flavor of the destination node
is the default 'current', the stream continues to be executed as normal.

Example:
3-node graph targets 3-core budget

RTC:
Graph: node-0 -> node-1 -> node-2 @Core0.

+ - - - - - - - - - - - - - - - - - - - - - +
'                Core #0/1/2                '
'                                           '
' +--------+     +---------+     +--------+ '
' | Node-0 | --> | Node-1  | --> | Node-2 | '
' +--------+     +---------+     +--------+ '
'                                           '
+ - - - - - - - - - - - - - - - - - - - - - +

Dispatch:

Graph topo: node-0 -> Core1; node-1 -> node-2; node-2 -> node-3.
Config graph: node-0 @Core0; node-1/3 @Core1; node-2 @Core2.

.. code-block:: diff

    + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
    '  Core #0   '     '          Core #1         '     '  Core #2   '
    '            '     '                          '     '            '
    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
    ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
    '            '     '     |                    '     '      ^     '
    + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
                             |                                 |
                             + - - - - - - - - - - - - - - - - +
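
The dispatch decision sketched in the diagram can be boiled down to the
following toy model; the struct and function names are illustrative only, not
the DPDK API (the real walk enqueues to the work-queue ring of the graph clone
bound to the node's lcore):

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CORES 4

/* Toy node: a core affinity plus a pending stream size. */
struct node {
	unsigned int lcore_id;
	int objs;
};

static int wq_depth[MAX_CORES];  /* stand-in for per-clone WQ rings */
static int processed[MAX_CORES]; /* objects run to completion per core */

/* A node bound to the current lcore runs inline (RTC-style);
 * otherwise its stream is dispatched to the owning core's work queue. */
static void
walk_node(struct node *n, unsigned int cur_lcore)
{
	if (n->lcore_id == cur_lcore)
		processed[cur_lcore] += n->objs;
	else
		wq_depth[n->lcore_id] += n->objs;
	n->objs = 0;
}
```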


The patch set is broken down as below:

1. Split graph worker into common and default model part.
2. Inline graph node processing to make it reusable.
3. Add set/get APIs to choose worker model.
4. Introduce core affinity API to make a node run on a specific worker core.
  (only used in the new model)
5. Introduce graph affinity API to bind one graph with specific worker
  core.
6. Introduce graph clone API.
7. Introduce stream moving with scheduler work-queue in patch 8~12.
8. Add stats for new models.
9. Abstract default graph config process and integrate new model into
  example/l3fwd-graph. Add new parameters for model choosing.

We can run with the new worker model like this:
./dpdk-l3fwd-graph -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="dispatch"

References:
https://static.sched.com/hosted_files/dpdkuserspace22/a6/graph%20introduce%20remote%20dispatch%20for%20mult-core%20scaling.pdf



Zhirun Yan (15):
  graph: rename rte_graph_work as common
  graph: split graph worker into common and default model
  graph: move node process into inline function
  graph: add get/set graph worker model APIs
  graph: introduce graph node core affinity API
  graph: introduce graph bind unbind API
  graph: introduce graph clone API for other worker core
  graph: add struct for stream moving between cores
  graph: introduce stream moving cross cores
  graph: enable create and destroy graph scheduling workqueue
  graph: introduce graph walk by cross-core dispatch
  graph: enable graph multicore dispatch scheduler model
  graph: add stats for cross-core dispatching
  examples/l3fwd-graph: introduce multicore dispatch worker model
  doc: update multicore dispatch model in graph guides

 MAINTAINERS                          |   1 +
 doc/guides/prog_guide/graph_lib.rst  |  59 ++-
 examples/l3fwd-graph/main.c          | 236 +++++++++---
 lib/graph/graph.c                    | 179 +++++++++
 lib/graph/graph_debug.c              |   6 +
 lib/graph/graph_populate.c           |   1 +
 lib/graph/graph_private.h            |  47 +++
 lib/graph/graph_stats.c              |  74 +++-
 lib/graph/meson.build                |   4 +-
 lib/graph/node.c                     |   1 +
 lib/graph/rte_graph.h                |  57 +++
 lib/graph/rte_graph_model_dispatch.c | 190 ++++++++++
 lib/graph/rte_graph_model_dispatch.h | 123 ++++++
 lib/graph/rte_graph_model_rtc.h      |  46 +++
 lib/graph/rte_graph_worker.c         |  54 +++
 lib/graph/rte_graph_worker.h         | 497 +-----------------------
 lib/graph/rte_graph_worker_common.h  | 539 +++++++++++++++++++++++++++
 lib/graph/version.map                |  10 +
 18 files changed, 1581 insertions(+), 543 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_dispatch.c
 create mode 100644 lib/graph/rte_graph_model_dispatch.h
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker.c
 create mode 100644 lib/graph/rte_graph_worker_common.h

-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 01/15] graph: rename rte_graph_work as common
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-22  8:25             ` Jerin Jacob
  2023-05-09  6:03           ` [PATCH v6 02/15] graph: split graph worker into common and default model Zhirun Yan
                             ` (14 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Rename rte_graph_worker.h to rte_graph_worker_common.h to support
multiple graph worker models.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 MAINTAINERS                                                 | 1 +
 lib/graph/graph_pcap.c                                      | 2 +-
 lib/graph/graph_private.h                                   | 2 +-
 lib/graph/meson.build                                       | 2 +-
 lib/graph/{rte_graph_worker.h => rte_graph_worker_common.h} | 6 +++---
 5 files changed, 7 insertions(+), 6 deletions(-)
 rename lib/graph/{rte_graph_worker.h => rte_graph_worker_common.h} (99%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8df23e5099..cc11328242 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1714,6 +1714,7 @@ F: doc/guides/prog_guide/bpf_lib.rst
 Graph - EXPERIMENTAL
 M: Jerin Jacob <jerinj@marvell.com>
 M: Kiran Kumar K <kirankumark@marvell.com>
+M: Zhirun Yan <zhirun.yan@intel.com>
 F: lib/graph/
 F: doc/guides/prog_guide/graph_lib.rst
 F: app/test/test_graph*
diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
index 6c43330029..8a220370fa 100644
--- a/lib/graph/graph_pcap.c
+++ b/lib/graph/graph_pcap.c
@@ -10,7 +10,7 @@
 #include <rte_mbuf.h>
 #include <rte_pcapng.h>
 
-#include "rte_graph_worker.h"
+#include "rte_graph_worker_common.h"
 
 #include "graph_pcap_private.h"
 
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index eacdef45f0..307e5f70bc 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -13,7 +13,7 @@
 #include <rte_spinlock.h>
 
 #include "rte_graph.h"
-#include "rte_graph_worker.h"
+#include "rte_graph_worker_common.h"
 
 extern int rte_graph_logtype;
 
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 3526d1b5d4..4e2b612ad3 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,6 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
 )
-headers = files('rte_graph.h', 'rte_graph_worker.h')
+headers = files('rte_graph.h', 'rte_graph_worker_common.h')
 
 deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker_common.h
similarity index 99%
rename from lib/graph/rte_graph_worker.h
rename to lib/graph/rte_graph_worker_common.h
index 438595b15c..0bad2938f3 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -2,8 +2,8 @@
  * Copyright(C) 2020 Marvell International Ltd.
  */
 
-#ifndef _RTE_GRAPH_WORKER_H_
-#define _RTE_GRAPH_WORKER_H_
+#ifndef _RTE_GRAPH_WORKER_COMMON_H_
+#define _RTE_GRAPH_WORKER_COMMON_H_
 
 /**
  * @file rte_graph_worker.h
@@ -518,4 +518,4 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
 }
 #endif
 
-#endif /* _RTE_GRAPH_WORKER_H_ */
+#endif /* _RTE_GRAPH_WORKER_COMMON_H_ */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 02/15] graph: split graph worker into common and default model
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  2023-05-09  6:03           ` [PATCH v6 01/15] graph: rename rte_graph_work as common Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-09  6:03           ` [PATCH v6 03/15] graph: move node process into inline function Zhirun Yan
                             ` (13 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

To support multiple graph worker models, split the graph worker into a
common part and a default model. The current walk function becomes
rte_graph_walk_rtc, since the default model is RTC (Run-To-Completion).

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_pcap.c              |  2 +-
 lib/graph/graph_private.h           |  2 +-
 lib/graph/meson.build               |  2 +-
 lib/graph/rte_graph_model_rtc.h     | 62 +++++++++++++++++++++++++++++
 lib/graph/rte_graph_worker.h        | 35 ++++++++++++++++
 lib/graph/rte_graph_worker_common.h | 57 --------------------------
 6 files changed, 100 insertions(+), 60 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker.h

diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
index 8a220370fa..6c43330029 100644
--- a/lib/graph/graph_pcap.c
+++ b/lib/graph/graph_pcap.c
@@ -10,7 +10,7 @@
 #include <rte_mbuf.h>
 #include <rte_pcapng.h>
 
-#include "rte_graph_worker_common.h"
+#include "rte_graph_worker.h"
 
 #include "graph_pcap_private.h"
 
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 307e5f70bc..eacdef45f0 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -13,7 +13,7 @@
 #include <rte_spinlock.h>
 
 #include "rte_graph.h"
-#include "rte_graph_worker_common.h"
+#include "rte_graph_worker.h"
 
 extern int rte_graph_logtype;
 
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 4e2b612ad3..3526d1b5d4 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,6 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
 )
-headers = files('rte_graph.h', 'rte_graph_worker_common.h')
+headers = files('rte_graph.h', 'rte_graph_worker.h')
 
 deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
new file mode 100644
index 0000000000..10b359772f
--- /dev/null
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2020 Marvell International Ltd.
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "rte_graph_worker_common.h"
+
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+static inline void
+rte_graph_walk_rtc(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	/*
+	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
+	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
+	 * in a circular buffer fashion.
+	 *
+	 *	+-----+ <= cir_start - head [number of source nodes]
+	 *	|     |
+	 *	| ... | <= source nodes
+	 *	|     |
+	 *	+-----+ <= cir_start [head = 0] [tail = 0]
+	 *	|     |
+	 *	| ... | <= pending streams
+	 *	|     |
+	 *	+-----+ <= cir_start + mask
+	 */
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		objs = node->objs;
+		rte_prefetch0(objs);
+
+		if (rte_graph_has_stats_feature()) {
+			start = rte_rdtsc();
+			rc = node->process(graph, node, objs, node->idx);
+			node->total_cycles += rte_rdtsc() - start;
+			node->total_calls++;
+			node->total_objs += rc;
+		} else {
+			node->process(graph, node, objs, node->idx);
+		}
+			node->idx = 0;
+			head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+	graph->tail = 0;
+}
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
new file mode 100644
index 0000000000..5b58f7bda9
--- /dev/null
+++ b/lib/graph/rte_graph_worker.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2020 Marvell International Ltd.
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#ifndef _RTE_GRAPH_WORKER_H_
+#define _RTE_GRAPH_WORKER_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "rte_graph_model_rtc.h"
+
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk(struct rte_graph *graph)
+{
+	rte_graph_walk_rtc(graph);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_WORKER_H_ */
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 0bad2938f3..b58f8f6947 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -128,63 +128,6 @@ __rte_experimental
 void __rte_node_stream_alloc_size(struct rte_graph *graph,
 				  struct rte_node *node, uint16_t req_size);
 
-/**
- * Perform graph walk on the circular buffer and invoke the process function
- * of the nodes and collect the stats.
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup function.
- *
- * @see rte_graph_lookup()
- */
-__rte_experimental
-static inline void
-rte_graph_walk(struct rte_graph *graph)
-{
-	const rte_graph_off_t *cir_start = graph->cir_start;
-	const rte_node_t mask = graph->cir_mask;
-	uint32_t head = graph->head;
-	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
-
-	/*
-	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
-	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
-	 * in a circular buffer fashion.
-	 *
-	 *	+-----+ <= cir_start - head [number of source nodes]
-	 *	|     |
-	 *	| ... | <= source nodes
-	 *	|     |
-	 *	+-----+ <= cir_start [head = 0] [tail = 0]
-	 *	|     |
-	 *	| ... | <= pending streams
-	 *	|     |
-	 *	+-----+ <= cir_start + mask
-	 */
-	while (likely(head != graph->tail)) {
-		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-		node->idx = 0;
-		head = likely((int32_t)head > 0) ? head & mask : head;
-	}
-	graph->tail = 0;
-}
-
 /* Fast path helper functions */
 
 /**
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 03/15] graph: move node process into inline function
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  2023-05-09  6:03           ` [PATCH v6 01/15] graph: rename rte_graph_work as common Zhirun Yan
  2023-05-09  6:03           ` [PATCH v6 02/15] graph: split graph worker into common and default model Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-09  6:03           ` [PATCH v6 04/15] graph: add get/set graph worker model APIs Zhirun Yan
                             ` (12 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Node processing is a single, reusable block; move the code into an inline
function.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_rtc.h     | 20 ++---------------
 lib/graph/rte_graph_worker_common.h | 33 +++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
index 10b359772f..4b6236e301 100644
--- a/lib/graph/rte_graph_model_rtc.h
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -21,9 +21,6 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	const rte_node_t mask = graph->cir_mask;
 	uint32_t head = graph->head;
 	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
 
 	/*
 	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
@@ -42,21 +39,8 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	 */
 	while (likely(head != graph->tail)) {
 		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-			node->idx = 0;
-			head = likely((int32_t)head > 0) ? head & mask : head;
+		__rte_node_process(graph, node);
+		head = likely((int32_t)head > 0) ? head & mask : head;
 	}
 	graph->tail = 0;
 }
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index b58f8f6947..e25eabc81f 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -130,6 +130,39 @@ void __rte_node_stream_alloc_size(struct rte_graph *graph,
 
 /* Fast path helper functions */
 
+/**
+ * @internal
+ *
+ * Enqueue a given node to the tail of the graph reel.
+ *
+ * @param graph
+ *   Pointer Graph object.
+ * @param node
+ *   Pointer to node object to be enqueued.
+ */
+static __rte_always_inline void
+__rte_node_process(struct rte_graph *graph, struct rte_node *node)
+{
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+	objs = node->objs;
+	rte_prefetch0(objs);
+
+	if (rte_graph_has_stats_feature()) {
+		start = rte_rdtsc_precise();
+		rc = node->process(graph, node, objs, node->idx);
+		node->total_cycles += rte_rdtsc_precise() - start;
+		node->total_calls++;
+		node->total_objs += rc;
+	} else {
+		node->process(graph, node, objs, node->idx);
+	}
+	node->idx = 0;
+}
+
 /**
  * @internal
  *
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 04/15] graph: add get/set graph worker model APIs
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (2 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 03/15] graph: move node process into inline function Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-24  6:08             ` Jerin Jacob
  2023-05-09  6:03           ` [PATCH v6 05/15] graph: introduce graph node core affinity API Zhirun Yan
                             ` (11 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add new get/set APIs to configure the graph worker model, which
determines how graphs are scheduled across worker cores.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/meson.build               |  1 +
 lib/graph/rte_graph_worker.c        | 54 +++++++++++++++++++++++++++++
 lib/graph/rte_graph_worker_common.h | 19 ++++++++++
 lib/graph/version.map               |  3 ++
 4 files changed, 77 insertions(+)
 create mode 100644 lib/graph/rte_graph_worker.c

diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 3526d1b5d4..9fab8243da 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -15,6 +15,7 @@ sources = files(
         'graph_stats.c',
         'graph_populate.c',
         'graph_pcap.c',
+        'rte_graph_worker.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
diff --git a/lib/graph/rte_graph_worker.c b/lib/graph/rte_graph_worker.c
new file mode 100644
index 0000000000..cabc101262
--- /dev/null
+++ b/lib/graph/rte_graph_worker.c
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "rte_graph_worker_common.h"
+
+RTE_DEFINE_PER_LCORE(enum rte_graph_worker_model, worker_model) = RTE_GRAPH_MODEL_DEFAULT;
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ * Set the graph worker model
+ *
+ * @note This function does not perform any locking and is only safe to call
+ *    before the graph starts running.
+ *
+ * @param model
+ *   The graph worker model to set.
+ *
+ * @return
+ *   0 on success, -1 otherwise.
+ */
+int
+rte_graph_worker_model_set(enum rte_graph_worker_model model)
+{
+	if (model >= RTE_GRAPH_MODEL_LIST_END)
+		goto fail;
+
+	RTE_PER_LCORE(worker_model) = model;
+	return 0;
+
+fail:
+	RTE_PER_LCORE(worker_model) = RTE_GRAPH_MODEL_DEFAULT;
+	return -1;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Get the graph worker model
+ *
+ * @note The model is stored per-lcore, so each worker
+ *   thread reads its own value.
+ *
+ * @return
+ *   Graph worker model on success.
+ */
+inline
+enum rte_graph_worker_model
+rte_graph_worker_model_get(void)
+{
+	return RTE_PER_LCORE(worker_model);
+}
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index e25eabc81f..9bde8856ae 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -19,6 +19,7 @@
 #include <rte_compat.h>
 #include <rte_common.h>
 #include <rte_cycles.h>
+#include <rte_per_lcore.h>
 #include <rte_prefetch.h>
 #include <rte_memcpy.h>
 #include <rte_memory.h>
@@ -95,6 +96,16 @@ struct rte_node {
 	struct rte_node *nodes[] __rte_cache_min_aligned; /**< Next nodes. */
 } __rte_cache_aligned;
 
+/** Graph worker models */
+enum rte_graph_worker_model {
+	RTE_GRAPH_MODEL_DEFAULT,
+	RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT,
+	RTE_GRAPH_MODEL_MCORE_DISPATCH,
+	RTE_GRAPH_MODEL_LIST_END
+};
+
+RTE_DECLARE_PER_LCORE(enum rte_graph_worker_model, worker_model);
+
 /**
  * @internal
  *
@@ -490,6 +501,14 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
 	}
 }
 
+__rte_experimental
+enum rte_graph_worker_model
+rte_graph_worker_model_get(void);
+
+__rte_experimental
+int
+rte_graph_worker_model_set(enum rte_graph_worker_model model);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 13b838752d..eea73ec9ca 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -43,5 +43,8 @@ EXPERIMENTAL {
 	rte_node_next_stream_put;
 	rte_node_next_stream_move;
 
+	rte_graph_worker_model_set;
+	rte_graph_worker_model_get;
+
 	local: *;
 };
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 05/15] graph: introduce graph node core affinity API
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (3 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 04/15] graph: add get/set graph worker model APIs Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-24  6:36             ` Jerin Jacob
  2023-05-09  6:03           ` [PATCH v6 06/15] graph: introduce graph bind unbind API Zhirun Yan
                             ` (10 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add an lcore_id field to struct node to hold the affinity core id, and
implement rte_graph_model_dispatch_lcore_affinity_set() to set a node's
affinity to a specific lcore.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_private.h            |  1 +
 lib/graph/meson.build                |  1 +
 lib/graph/node.c                     |  1 +
 lib/graph/rte_graph_model_dispatch.c | 30 +++++++++++++++++++
 lib/graph/rte_graph_model_dispatch.h | 43 ++++++++++++++++++++++++++++
 lib/graph/version.map                |  2 ++
 6 files changed, 78 insertions(+)
 create mode 100644 lib/graph/rte_graph_model_dispatch.c
 create mode 100644 lib/graph/rte_graph_model_dispatch.h

diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index eacdef45f0..bd4c576324 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -51,6 +51,7 @@ struct node {
 	STAILQ_ENTRY(node) next;      /**< Next node in the list. */
 	char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
 	uint64_t flags;		      /**< Node configuration flag. */
+	unsigned int lcore_id;        /**< Node runs on the Lcore ID */
 	rte_node_process_t process;   /**< Node process function. */
 	rte_node_init_t init;         /**< Node init function. */
 	rte_node_fini_t fini;	      /**< Node fini function. */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 9fab8243da..c729d984b6 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,7 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
         'rte_graph_worker.c',
+        'rte_graph_model_dispatch.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
diff --git a/lib/graph/node.c b/lib/graph/node.c
index 149414dcd9..339b4a0da5 100644
--- a/lib/graph/node.c
+++ b/lib/graph/node.c
@@ -100,6 +100,7 @@ __rte_node_register(const struct rte_node_register *reg)
 			goto free;
 	}
 
+	node->lcore_id = RTE_MAX_LCORE;
 	node->id = node_id++;
 
 	/* Add the node at tail */
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
new file mode 100644
index 0000000000..3364a76ed4
--- /dev/null
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "graph_private.h"
+#include "rte_graph_model_dispatch.h"
+
+int
+rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned int lcore_id)
+{
+	struct node *node;
+	int ret = -EINVAL;
+
+	if (lcore_id >= RTE_MAX_LCORE)
+		return ret;
+
+	graph_spinlock_lock();
+
+	STAILQ_FOREACH(node, node_list_head_get(), next) {
+		if (strncmp(node->name, name, RTE_NODE_NAMESIZE) == 0) {
+			node->lcore_id = lcore_id;
+			ret = 0;
+			break;
+		}
+	}
+
+	graph_spinlock_unlock();
+
+	return ret;
+}
diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
new file mode 100644
index 0000000000..179624e972
--- /dev/null
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#ifndef _RTE_GRAPH_MODEL_DISPATCH_H_
+#define _RTE_GRAPH_MODEL_DISPATCH_H_
+
+/**
+ * @file rte_graph_model_dispatch.h
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * This API allows to set core affinity with the node.
+ */
+#include "rte_graph_worker_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Set lcore affinity with the node.
+ *
+ * @param name
+ *   Valid node name. In the case of the cloned node, the name will be
+ * "parent node name" + "-" + name.
+ * @param lcore_id
+ *   The lcore ID value.
+ *
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
+						unsigned int lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_MODEL_DISPATCH_H_ */
diff --git a/lib/graph/version.map b/lib/graph/version.map
index eea73ec9ca..1f090be74e 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -46,5 +46,7 @@ EXPERIMENTAL {
 	rte_graph_worker_model_set;
 	rte_graph_worker_model_get;
 
+	rte_graph_model_dispatch_lcore_affinity_set;
+
 	local: *;
 };
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 06/15] graph: introduce graph bind unbind API
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (4 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 05/15] graph: introduce graph node core affinity API Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-24  6:23             ` Jerin Jacob
  2023-05-09  6:03           ` [PATCH v6 07/15] graph: introduce graph clone API for other worker core Zhirun Yan
                             ` (9 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add an lcore_id field to the graph to hold the core id the graph would
run on, and add bind/unbind APIs to set/unset the graph affinity
attribute. lcore_id is set to RTE_MAX_LCORE by default, meaning the
attribute is disabled.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 59 +++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |  2 ++
 lib/graph/rte_graph.h     | 22 +++++++++++++++
 lib/graph/version.map     |  2 ++
 4 files changed, 85 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 5582631b53..b8ef86da45 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -260,6 +260,64 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
 	return graph_mem_fixup_node_ctx(graph);
 }
 
+static __rte_always_inline bool
+graph_src_node_avail(struct graph *graph)
+{
+	struct graph_node *graph_node;
+
+	STAILQ_FOREACH(graph_node, &graph->node_list, next)
+		if ((graph_node->node->flags & RTE_NODE_SOURCE_F) &&
+		    (graph_node->node->lcore_id == RTE_MAX_LCORE ||
+		     graph->lcore_id == graph_node->node->lcore_id))
+			return true;
+
+	return false;
+}
+
+int
+rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	if (!rte_lcore_is_enabled(lcore))
+		SET_ERR_JMP(ENOLINK, fail,
+			    "lcore %d not enabled\n",
+			    lcore);
+
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = lcore;
+	graph->socket = rte_lcore_to_socket_id(lcore);
+
+	/* check the availability of source node */
+	if (!graph_src_node_avail(graph))
+		graph->graph->head = 0;
+
+	return 0;
+
+fail:
+	return -rte_errno;
+}
+
+void
+rte_graph_model_dispatch_core_unbind(rte_graph_t id)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = RTE_MAX_LCORE;
+
+fail:
+	return;
+}
+
 struct rte_graph *
 rte_graph_lookup(const char *name)
 {
@@ -346,6 +404,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->lcore_id = RTE_MAX_LCORE;
 	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
 	if (prm->pcap_filename)
 		rte_strscpy(graph->pcap_filename, prm->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index bd4c576324..f63b339d81 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -99,6 +99,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	unsigned int lcore_id;
+	/**< Lcore identifier where the graph prefer to run on. */
 	size_t mem_sz;
 	/**< Memory size of the graph. */
 	int socket;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index c9a77297fc..c523809d1f 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -285,6 +285,28 @@ char *rte_graph_id_to_name(rte_graph_t id);
 __rte_experimental
 int rte_graph_export(const char *name, FILE *f);
 
+/**
+ * Bind graph with specific lcore
+ *
+ * @param id
+ *   Graph id to get the pointer of graph object
+ * @param lcore
+ *   The lcore the graph will run on
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore);
+
+/**
+ * Unbind graph with lcore
+ *
+ * @param id
+ *   Graph id to get the pointer of graph object
+ */
+__rte_experimental
+void rte_graph_model_dispatch_core_unbind(rte_graph_t id);
+
 /**
  * Get graph object from its name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 1f090be74e..7de6f08f59 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -18,6 +18,8 @@ EXPERIMENTAL {
 	rte_graph_node_get_by_name;
 	rte_graph_obj_dump;
 	rte_graph_walk;
+	rte_graph_model_dispatch_core_bind;
+	rte_graph_model_dispatch_core_unbind;
 
 	rte_graph_cluster_stats_create;
 	rte_graph_cluster_stats_destroy;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 07/15] graph: introduce graph clone API for other worker core
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (5 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 06/15] graph: introduce graph bind unbind API Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-24  7:14             ` Jerin Jacob
  2023-05-09  6:03           ` [PATCH v6 08/15] graph: add struct for stream moving between cores Zhirun Yan
                             ` (8 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch adds a graph API to clone a graph object for a specified
worker core. The new graph also clones all nodes of its parent.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 110 ++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |   2 +
 lib/graph/rte_graph.h     |  20 +++++++
 lib/graph/version.map     |   1 +
 4 files changed, 133 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index b8ef86da45..2629c79103 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -404,6 +404,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->parent_id = RTE_GRAPH_ID_INVALID;
 	graph->lcore_id = RTE_MAX_LCORE;
 	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
 	if (prm->pcap_filename)
@@ -468,6 +469,115 @@ rte_graph_destroy(rte_graph_t id)
 	return rc;
 }
 
+static int
+clone_name(struct graph *graph, struct graph *parent_graph, const char *name)
+{
+	ssize_t sz, rc;
+
+#define SZ RTE_GRAPH_NAMESIZE
+	rc = rte_strscpy(graph->name, parent_graph->name, SZ);
+	if (rc < 0)
+		goto fail;
+	sz = rc;
+	rc = rte_strscpy(graph->name + sz, "-", RTE_MAX((int16_t)(SZ - sz), 0));
+	if (rc < 0)
+		goto fail;
+	sz += rc;
+	sz = rte_strscpy(graph->name + sz, name, RTE_MAX((int16_t)(SZ - sz), 0));
+	if (sz < 0)
+		goto fail;
+
+	return 0;
+fail:
+	rte_errno = E2BIG;
+	return -rte_errno;
+}
+
+static rte_graph_t
+graph_clone(struct graph *parent_graph, const char *name)
+{
+	struct graph_node *graph_node;
+	struct graph *graph;
+
+	graph_spinlock_lock();
+
+	/* Don't allow to clone a node from a cloned graph */
+	if (parent_graph->parent_id != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, fail, "A cloned graph is not allowed to be cloned");
+
+	/* Create graph object */
+	graph = calloc(1, sizeof(*graph));
+	if (graph == NULL)
+		SET_ERR_JMP(ENOMEM, fail, "Failed to calloc cloned graph object");
+
+	/* Naming ceremony of the new graph. name is node->name + "-" + name */
+	if (clone_name(graph, parent_graph, name))
+		goto free;
+
+	/* Check for existence of duplicate graph */
+	if (rte_graph_from_name(graph->name) != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, free, "Found duplicate graph %s",
+			    graph->name);
+
+	/* Clone nodes from parent graph firstly */
+	STAILQ_INIT(&graph->node_list);
+	STAILQ_FOREACH(graph_node, &parent_graph->node_list, next) {
+		if (graph_node_add(graph, graph_node->node))
+			goto graph_cleanup;
+	}
+
+	/* Just update adjacency list of all nodes in the graph */
+	if (graph_adjacency_list_update(graph))
+		goto graph_cleanup;
+
+	/* Initialize the graph object */
+	graph->src_node_count = parent_graph->src_node_count;
+	graph->node_count = parent_graph->node_count;
+	graph->parent_id = parent_graph->id;
+	graph->lcore_id = parent_graph->lcore_id;
+	graph->socket = parent_graph->socket;
+	graph->id = graph_id;
+
+	/* Allocate the Graph fast path memory and populate the data */
+	if (graph_fp_mem_create(graph))
+		goto graph_cleanup;
+
+	/* Call init() of the all the nodes in the graph */
+	if (graph_node_init(graph))
+		goto graph_mem_destroy;
+
+	/* All good, Lets add the graph to the list */
+	graph_id++;
+	STAILQ_INSERT_TAIL(&graph_list, graph, next);
+
+	graph_spinlock_unlock();
+	return graph->id;
+
+graph_mem_destroy:
+	graph_fp_mem_destroy(graph);
+graph_cleanup:
+	graph_cleanup(graph);
+free:
+	free(graph);
+fail:
+	graph_spinlock_unlock();
+	return RTE_GRAPH_ID_INVALID;
+}
+
+rte_graph_t
+rte_graph_clone(rte_graph_t id, const char *name)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			return graph_clone(graph, name);
+
+fail:
+	return RTE_GRAPH_ID_INVALID;
+}
+
 rte_graph_t
 rte_graph_from_name(const char *name)
 {
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index f63b339d81..52ca30ed56 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -99,6 +99,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	rte_graph_t parent_id;
+	/**< Parent graph identifier. */
 	unsigned int lcore_id;
 	/**< Lcore identifier where the graph prefer to run on. */
 	size_t mem_sz;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index c523809d1f..2f86c17de7 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -247,6 +247,26 @@ rte_graph_t rte_graph_create(const char *name, struct rte_graph_param *prm);
 __rte_experimental
 int rte_graph_destroy(rte_graph_t id);
 
+/**
+ * Clone Graph.
+ *
+ * Clone a graph from a static graph (one created with rte_graph_create()).
+ * All cloned graphs attached to a parent graph MUST be destroyed together,
+ * due to a fast-schedule design limitation (stop ALL graph walks first).
+ *
+ * @param id
+ *   Static graph id to clone from.
+ * @param name
+ *   Name of the new graph. The library prepends the parent graph name to the
+ *   user-specified name; the final graph name will be
+ *   "parent graph name" + "-" + name.
+ *
+ * @return
+ *   Valid graph id on success, RTE_GRAPH_ID_INVALID otherwise.
+ */
+__rte_experimental
+rte_graph_t rte_graph_clone(rte_graph_t id, const char *name);
+
 /**
  * Get graph id from graph name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 7de6f08f59..aaa86f66ed 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -7,6 +7,7 @@ EXPERIMENTAL {
 
 	rte_graph_create;
 	rte_graph_destroy;
+	rte_graph_clone;
 	rte_graph_dump;
 	rte_graph_export;
 	rte_graph_from_name;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 08/15] graph: add struct for stream moving between cores
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (6 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 07/15] graph: introduce graph clone API for other worker core Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-24  7:24             ` Jerin Jacob
  2023-05-09  6:03           ` [PATCH v6 09/15] graph: introduce stream moving cross cores Zhirun Yan
                             ` (7 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add the graph_sched_wq_node structure to hold a stream carried by the
graph scheduling workqueue.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c                   |  1 +
 lib/graph/graph_populate.c          |  1 +
 lib/graph/graph_private.h           | 12 ++++++++++++
 lib/graph/rte_graph_worker_common.h | 21 +++++++++++++++++++++
 4 files changed, 35 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 2629c79103..e809aa55b0 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -290,6 +290,7 @@ rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore)
 			break;
 
 	graph->lcore_id = lcore;
+	graph->graph->lcore_id = graph->lcore_id;
 	graph->socket = rte_lcore_to_socket_id(lcore);
 
 	/* check the availability of source node */
diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
index 2c0844ce92..7dcf1420c1 100644
--- a/lib/graph/graph_populate.c
+++ b/lib/graph/graph_populate.c
@@ -89,6 +89,7 @@ graph_nodes_populate(struct graph *_graph)
 		}
 		node->id = graph_node->node->id;
 		node->parent_id = pid;
+		node->lcore_id = graph_node->node->lcore_id;
 		nb_edges = graph_node->node->nb_edges;
 		node->nb_edges = nb_edges;
 		off += sizeof(struct rte_node);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 52ca30ed56..02b10ea2b6 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -61,6 +61,18 @@ struct node {
 	char next_nodes[][RTE_NODE_NAMESIZE]; /**< Names of next nodes. */
 };
 
+/**
+ * @internal
+ *
+ * Structure that holds the graph scheduling workqueue node stream.
+ * Used for mcore dispatch model.
+ */
+struct graph_sched_wq_node {
+	rte_graph_off_t node_off;
+	uint16_t nb_objs;
+	void *objs[RTE_GRAPH_BURST_SIZE];
+} __rte_cache_aligned;
+
 /**
  * @internal
  *
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 9bde8856ae..8e968e2022 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -30,6 +30,13 @@
 extern "C" {
 #endif
 
+/**
+ * @internal
+ *
+ * Singly-linked list head for graph schedule run-queue.
+ */
+SLIST_HEAD(rte_graph_rq_head, rte_graph);
+
 /**
  * @internal
  *
@@ -41,6 +48,15 @@ struct rte_graph {
 	uint32_t cir_mask;	     /**< Circular buffer wrap around mask. */
 	rte_node_t nb_nodes;	     /**< Number of nodes in the graph. */
 	rte_graph_off_t *cir_start;  /**< Pointer to circular buffer. */
+	/* Graph schedule */
+	struct rte_graph_rq_head *rq __rte_cache_aligned; /* The run-queue */
+	struct rte_graph_rq_head rq_head; /* The head for run-queue list */
+
+	SLIST_ENTRY(rte_graph) rq_next;   /* The next for run-queue list */
+	unsigned int lcore_id;  /**< The graph running Lcore. */
+	struct rte_ring *wq;    /**< The work-queue for pending streams. */
+	struct rte_mempool *mp; /**< The mempool for scheduling streams. */
+	/* Graph schedule area */
 	rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */
 	rte_graph_t id;	/**< Graph identifier. */
 	int socket;	/**< Socket ID where memory is allocated. */
@@ -74,6 +90,11 @@ struct rte_node {
 	/** Original process function when pcap is enabled. */
 	rte_node_process_t original_process;
 
+	RTE_STD_C11
+		union {
+		/* Fast schedule area for mcore dispatch model */
+		unsigned int lcore_id;  /**< Node running lcore. */
+		};
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 09/15] graph: introduce stream moving cross cores
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (7 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 08/15] graph: add struct for stream moving between cores Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-24  8:00             ` Jerin Jacob
  2023-05-09  6:03           ` [PATCH v6 10/15] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
                             ` (6 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the key functions that allow a worker thread to
enqueue and move streams of objects to next nodes that run on different
cores.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c                    |   6 +-
 lib/graph/graph_private.h            |  30 +++++
 lib/graph/meson.build                |   2 +-
 lib/graph/rte_graph.h                |  15 ++-
 lib/graph/rte_graph_model_dispatch.c | 157 +++++++++++++++++++++++++++
 lib/graph/rte_graph_model_dispatch.h |  37 +++++++
 lib/graph/version.map                |   2 +
 7 files changed, 244 insertions(+), 5 deletions(-)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index e809aa55b0..f555844d8f 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -495,7 +495,7 @@ clone_name(struct graph *graph, struct graph *parent_graph, const char *name)
 }
 
 static rte_graph_t
-graph_clone(struct graph *parent_graph, const char *name)
+graph_clone(struct graph *parent_graph, const char *name, struct rte_graph_param *prm)
 {
 	struct graph_node *graph_node;
 	struct graph *graph;
@@ -566,14 +566,14 @@ graph_clone(struct graph *parent_graph, const char *name)
 }
 
 rte_graph_t
-rte_graph_clone(rte_graph_t id, const char *name)
+rte_graph_clone(rte_graph_t id, const char *name, struct rte_graph_param *prm)
 {
 	struct graph *graph;
 
 	GRAPH_ID_CHECK(id);
 	STAILQ_FOREACH(graph, &graph_list, next)
 		if (graph->id == id)
-			return graph_clone(graph, name);
+			return graph_clone(graph, name, prm);
 
 fail:
 	return RTE_GRAPH_ID_INVALID;
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 02b10ea2b6..70347116ba 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -372,4 +372,34 @@ void graph_dump(FILE *f, struct graph *g);
  */
 void node_dump(FILE *f, struct node *n);
 
+/**
+ * @internal
+ *
+ * Create the graph schedule work queue. All cloned graphs attached to the
+ * parent graph MUST be destroyed together, due to a fast-schedule design limitation.
+ *
+ * @param _graph
+ *   The graph object
+ * @param _parent_graph
+ *   The parent graph object which holds the run-queue head.
+ * @param prm
+ *   Graph parameter, includes model-specific parameters in this graph.
+ *
+ * @return
+ *   - 0: Success.
+ *   - <0: Graph schedule work queue related error.
+ */
+int graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph,
+			   struct rte_graph_param *prm);
+
+/**
+ * @internal
+ *
+ * Destroy the graph schedule work queue.
+ *
+ * @param _graph
+ *   The graph object
+ */
+void graph_sched_wq_destroy(struct graph *_graph);
+
 #endif /* _RTE_GRAPH_PRIVATE_H_ */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index c729d984b6..e21affa280 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -20,4 +20,4 @@ sources = files(
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
-deps += ['eal', 'pcapng']
+deps += ['eal', 'pcapng', 'mempool', 'ring']
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index 2f86c17de7..0ac764daf8 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -169,6 +169,17 @@ struct rte_graph_param {
 	bool pcap_enable; /**< Pcap enable. */
 	uint64_t num_pkt_to_capture; /**< Number of packets to capture. */
 	char *pcap_filename; /**< Filename in which packets to be captured.*/
+
+	RTE_STD_C11
+	union {
+		struct {
+			uint64_t rsvd[8];
+		} rtc;
+		struct {
+			uint32_t wq_size_max;
+			uint32_t mp_capacity;
+		} dispatch;
+	};
 };
 
 /**
@@ -260,12 +271,14 @@ int rte_graph_destroy(rte_graph_t id);
  *   Name of the new graph. The library prepends the parent graph name to the
 *   user-specified name; the final graph name will be
 *   "parent graph name" + "-" + name.
+ * @param prm
+ *   Graph parameter, includes model-specific parameters in this graph.
  *
  * @return
  *   Valid graph id on success, RTE_GRAPH_ID_INVALID otherwise.
  */
 __rte_experimental
-rte_graph_t rte_graph_clone(rte_graph_t id, const char *name);
+rte_graph_t rte_graph_clone(rte_graph_t id, const char *name, struct rte_graph_param *prm);
 
 /**
  * Get graph id from graph name.
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
index 3364a76ed4..4264723485 100644
--- a/lib/graph/rte_graph_model_dispatch.c
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -5,6 +5,163 @@
 #include "graph_private.h"
 #include "rte_graph_model_dispatch.h"
 
+int
+graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph,
+		       struct rte_graph_param *prm)
+{
+	struct rte_graph *parent_graph = _parent_graph->graph;
+	struct rte_graph *graph = _graph->graph;
+	unsigned int wq_size;
+	unsigned int flags = RING_F_SC_DEQ;
+
+	wq_size = GRAPH_SCHED_WQ_SIZE(graph->nb_nodes);
+	wq_size = rte_align32pow2(wq_size + 1);
+
+	if (prm->dispatch.wq_size_max > 0)
+		wq_size = wq_size <= (prm->dispatch.wq_size_max) ? wq_size :
+			prm->dispatch.wq_size_max;
+
+	if (!rte_is_power_of_2(wq_size))
+		flags |= RING_F_EXACT_SZ;
+
+	graph->wq = rte_ring_create(graph->name, wq_size, graph->socket,
+				    flags);
+	if (graph->wq == NULL)
+		SET_ERR_JMP(EIO, fail, "Failed to allocate graph WQ");
+
+	if (prm->dispatch.mp_capacity > 0)
+		wq_size = (wq_size <= prm->dispatch.mp_capacity) ? wq_size :
+			prm->dispatch.mp_capacity;
+
+	graph->mp = rte_mempool_create(graph->name, wq_size,
+				       sizeof(struct graph_sched_wq_node),
+				       0, 0, NULL, NULL, NULL, NULL,
+				       graph->socket, MEMPOOL_F_SP_PUT);
+	if (graph->mp == NULL)
+		SET_ERR_JMP(EIO, fail_mp,
+			    "Failed to allocate graph WQ schedule entry");
+
+	graph->lcore_id = _graph->lcore_id;
+
+	if (parent_graph->rq == NULL) {
+		parent_graph->rq = &parent_graph->rq_head;
+		SLIST_INIT(parent_graph->rq);
+	}
+
+	graph->rq = parent_graph->rq;
+	SLIST_INSERT_HEAD(graph->rq, graph, rq_next);
+
+	return 0;
+
+fail_mp:
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+fail:
+	return -rte_errno;
+}
+
+void
+graph_sched_wq_destroy(struct graph *_graph)
+{
+	struct rte_graph *graph = _graph->graph;
+
+	if (graph == NULL)
+		return;
+
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+
+	rte_mempool_free(graph->mp);
+	graph->mp = NULL;
+}
+
+static __rte_always_inline bool
+__graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	uint16_t off = 0;
+	uint16_t size;
+
+submit_again:
+	if (rte_mempool_get(graph->mp, (void **)&wq_node) < 0)
+		goto fallback;
+
+	size = RTE_MIN(node->idx, RTE_DIM(wq_node->objs));
+	wq_node->node_off = node->off;
+	wq_node->nb_objs = size;
+	rte_memcpy(wq_node->objs, &node->objs[off], size * sizeof(void *));
+
+	while (rte_ring_mp_enqueue_bulk_elem(graph->wq, (void *)&wq_node,
+					  sizeof(wq_node), 1, NULL) == 0)
+		rte_pause();
+
+	off += size;
+	node->idx -= size;
+	if (node->idx > 0)
+		goto submit_again;
+
+	return true;
+
+fallback:
+	if (off != 0)
+		memmove(&node->objs[0], &node->objs[off],
+			node->idx * sizeof(void *));
+
+	return false;
+}
+
+bool __rte_noinline
+__rte_graph_sched_node_enqueue(struct rte_node *node,
+			       struct rte_graph_rq_head *rq)
+{
+	const unsigned int lcore_id = node->lcore_id;
+	struct rte_graph *graph;
+
+	SLIST_FOREACH(graph, rq, rq_next)
+		if (graph->lcore_id == lcore_id)
+			break;
+
+	return graph != NULL ? __graph_sched_node_enqueue(node, graph) : false;
+}
+
+void
+__rte_graph_sched_wq_process(struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	struct rte_mempool *mp = graph->mp;
+	struct rte_ring *wq = graph->wq;
+	uint16_t idx, free_space;
+	struct rte_node *node;
+	unsigned int i, n;
+	struct graph_sched_wq_node *wq_nodes[32];
+
+	n = rte_ring_sc_dequeue_burst_elem(wq, wq_nodes, sizeof(wq_nodes[0]),
+					   RTE_DIM(wq_nodes), NULL);
+	if (n == 0)
+		return;
+
+	for (i = 0; i < n; i++) {
+		wq_node = wq_nodes[i];
+		node = RTE_PTR_ADD(graph, wq_node->node_off);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		idx = node->idx;
+		free_space = node->size - idx;
+
+		if (unlikely(free_space < wq_node->nb_objs))
+			__rte_node_stream_alloc_size(graph, node, node->size + wq_node->nb_objs);
+
+		memmove(&node->objs[idx], wq_node->objs, wq_node->nb_objs * sizeof(void *));
+		node->idx = idx + wq_node->nb_objs;
+
+		__rte_node_process(graph, node);
+
+		wq_node->nb_objs = 0;
+		node->idx = 0;
+	}
+
+	rte_mempool_put_bulk(mp, (void **)wq_nodes, n);
+}
+
 int
 rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned int lcore_id)
 {
diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
index 179624e972..18fa7ce0ab 100644
--- a/lib/graph/rte_graph_model_dispatch.h
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -14,12 +14,49 @@
  *
  * This API allows to set core affinity with the node.
  */
+#include <rte_errno.h>
+#include <rte_mempool.h>
+#include <rte_memzone.h>
+#include <rte_ring.h>
+
 #include "rte_graph_worker_common.h"
 
 #ifdef __cplusplus
 extern "C" {
 #endif
 
+#define GRAPH_SCHED_WQ_SIZE_MULTIPLIER  8
+#define GRAPH_SCHED_WQ_SIZE(nb_nodes)   \
+	((typeof(nb_nodes))((nb_nodes) * GRAPH_SCHED_WQ_SIZE_MULTIPLIER))
+
+/**
+ * @internal
+ *
+ * Schedule the node to the right graph's work queue.
+ *
+ * @param node
+ *   Pointer to the scheduled node object.
+ * @param rq
+ *   Pointer to the scheduled run-queue for all graphs.
+ *
+ * @return
+ *   True on success, false otherwise.
+ */
+__rte_experimental
+bool __rte_noinline __rte_graph_sched_node_enqueue(struct rte_node *node,
+				    struct rte_graph_rq_head *rq);
+
+/**
+ * @internal
+ *
+ * Process all nodes (streams) in the graph's work queue.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ */
+__rte_experimental
+void __rte_graph_sched_wq_process(struct rte_graph *graph);
+
 /**
  * Set lcore affinity with the node.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index aaa86f66ed..d511133f39 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -48,6 +48,8 @@ EXPERIMENTAL {
 
 	rte_graph_worker_model_set;
 	rte_graph_worker_model_get;
+	__rte_graph_sched_wq_process;
+	__rte_graph_sched_node_enqueue;
 
 	rte_graph_model_dispatch_lcore_affinity_set;
 
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 10/15] graph: enable create and destroy graph scheduling workqueue
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (8 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 09/15] graph: introduce stream moving cross cores Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-09  6:03           ` [PATCH v6 11/15] graph: introduce graph walk by cross-core dispatch Zhirun Yan
                             ` (5 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch wires the creation and destruction of the scheduling
workqueue into the common graph operations.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index f555844d8f..8b42d43193 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -449,6 +449,10 @@ rte_graph_destroy(rte_graph_t id)
 	while (graph != NULL) {
 		tmp = STAILQ_NEXT(graph, next);
 		if (graph->id == id) {
+			/* Destroy the schedule work queue if present */
+			if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+				graph_sched_wq_destroy(graph);
+
 			/* Call fini() of the all the nodes in the graph */
 			graph_node_fini(graph);
 			/* Destroy graph fast path memory */
@@ -543,6 +547,11 @@ graph_clone(struct graph *parent_graph, const char *name, struct rte_graph_param
 	if (graph_fp_mem_create(graph))
 		goto graph_cleanup;
 
+	/* Create the graph schedule work queue */
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH &&
+	    graph_sched_wq_create(graph, parent_graph, prm))
+		goto graph_mem_destroy;
+
 	/* Call init() of the all the nodes in the graph */
 	if (graph_node_init(graph))
 		goto graph_mem_destroy;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 11/15] graph: introduce graph walk by cross-core dispatch
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (9 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 10/15] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-09  6:03           ` [PATCH v6 12/15] graph: enable graph multicore dispatch scheduler model Zhirun Yan
                             ` (4 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the task scheduler mechanism that enables
dispatching tasks to other worker cores. Currently, each graph only
walks its local work queue. We introduce a scheduler work queue per
worker core for dispatched tasks; the walk processes the scheduler work
queue first, then handles the local work queue.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_dispatch.h | 43 ++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
index 18fa7ce0ab..f35cddba31 100644
--- a/lib/graph/rte_graph_model_dispatch.h
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -73,6 +73,49 @@ __rte_experimental
 int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
 						unsigned int lcore_id);
 
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk_mcore_dispatch(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+
+	RTE_ASSERT(graph->parent_id != RTE_GRAPH_ID_INVALID);
+	if (graph->wq != NULL)
+		__rte_graph_sched_wq_process(graph);
+
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+
+		/* Skip the source nodes that are not bound to the current worker */
+		if ((int32_t)head < 0 && node->lcore_id != graph->lcore_id)
+			continue;
+
+		/* Schedule the node until all task/objs are done */
+		if (node->lcore_id != RTE_MAX_LCORE &&
+		    graph->lcore_id != node->lcore_id && graph->rq != NULL &&
+		    __rte_graph_sched_node_enqueue(node, graph->rq))
+			continue;
+
+		__rte_node_process(graph, node);
+
+		head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+
+	graph->tail = 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 12/15] graph: enable graph multicore dispatch scheduler model
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (10 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 11/15] graph: introduce graph walk by cross-core dispatch Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-24  8:45             ` Jerin Jacob
  2023-05-09  6:03           ` [PATCH v6 13/15] graph: add stats for cross-core dispatching Zhirun Yan
                             ` (3 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch enables choosing the new scheduler model in rte_graph_walk().

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_worker.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index 5b58f7bda9..2dd27b3949 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -11,6 +11,7 @@ extern "C" {
 #endif
 
 #include "rte_graph_model_rtc.h"
+#include "rte_graph_model_dispatch.h"
 
 /**
  * Perform graph walk on the circular buffer and invoke the process function
@@ -25,7 +26,13 @@ __rte_experimental
 static inline void
 rte_graph_walk(struct rte_graph *graph)
 {
-	rte_graph_walk_rtc(graph);
+	int model = rte_graph_worker_model_get();
+
+	if (model == RTE_GRAPH_MODEL_DEFAULT ||
+	    model == RTE_GRAPH_MODEL_RTC)
+		rte_graph_walk_rtc(graph);
+	else if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		rte_graph_walk_mcore_dispatch(graph);
 }
 
 #ifdef __cplusplus
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 13/15] graph: add stats for cross-core dispatching
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (11 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 12/15] graph: enable graph multicore dispatch scheduler model Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-24  8:08             ` Jerin Jacob
  2023-05-09  6:03           ` [PATCH v6 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model Zhirun Yan
                             ` (2 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add stats for cross-core dispatching scheduler if stats collection is
enabled.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_debug.c              |  6 +++
 lib/graph/graph_stats.c              | 74 +++++++++++++++++++++++++---
 lib/graph/rte_graph.h                |  2 +
 lib/graph/rte_graph_model_dispatch.c |  3 ++
 lib/graph/rte_graph_worker_common.h  |  2 +
 5 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/lib/graph/graph_debug.c b/lib/graph/graph_debug.c
index b84412f5dd..7dcf07b080 100644
--- a/lib/graph/graph_debug.c
+++ b/lib/graph/graph_debug.c
@@ -74,6 +74,12 @@ rte_graph_obj_dump(FILE *f, struct rte_graph *g, bool all)
 		fprintf(f, "       size=%d\n", n->size);
 		fprintf(f, "       idx=%d\n", n->idx);
 		fprintf(f, "       total_objs=%" PRId64 "\n", n->total_objs);
+		if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+			fprintf(f, "       total_sched_objs=%" PRId64 "\n",
+				n->total_sched_objs);
+			fprintf(f, "       total_sched_fail=%" PRId64 "\n",
+				n->total_sched_fail);
+		}
 		fprintf(f, "       total_calls=%" PRId64 "\n", n->total_calls);
 		for (i = 0; i < n->nb_edges; i++)
 			fprintf(f, "          edge[%d] <%s>\n", i,
diff --git a/lib/graph/graph_stats.c b/lib/graph/graph_stats.c
index c0140ba922..9ccb358aa2 100644
--- a/lib/graph/graph_stats.c
+++ b/lib/graph/graph_stats.c
@@ -40,13 +40,19 @@ struct rte_graph_cluster_stats {
 	struct cluster_node clusters[];
 } __rte_cache_aligned;
 
+#define boarder_model_dispatch()                                                              \
+	fprintf(f, "+-------------------------------+---------------+--------" \
+		   "-------+---------------+---------------+---------------+" \
+		   "---------------+---------------+-" \
+		   "----------+\n")
+
 #define boarder()                                                              \
 	fprintf(f, "+-------------------------------+---------------+--------" \
 		   "-------+---------------+---------------+---------------+-" \
 		   "----------+\n")
 
 static inline void
-print_banner(FILE *f)
+print_banner_default(FILE *f)
 {
 	boarder();
 	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s\n", "|Node", "|calls",
@@ -55,6 +61,27 @@ print_banner(FILE *f)
 	boarder();
 }
 
+static inline void
+print_banner_dispatch(FILE *f)
+{
+	boarder_model_dispatch();
+	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s%-16s%-16s\n",
+		"|Node", "|calls",
+		"|objs", "|sched objs", "|sched fail",
+		"|realloc_count", "|objs/call", "|objs/sec(10E6)",
+		"|cycles/call|");
+	boarder_model_dispatch();
+}
+
+static inline void
+print_banner(FILE *f)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		print_banner_dispatch(f);
+	else
+		print_banner_default(f);
+}
+
 static inline void
 print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 {
@@ -76,11 +103,21 @@ print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 	objs_per_sec = ts_per_hz ? (objs - prev_objs) / ts_per_hz : 0;
 	objs_per_sec /= 1000000;
 
-	fprintf(f,
-		"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
-		"|%-15.3f|%-15.6f|%-11.4f|\n",
-		stat->name, calls, objs, stat->realloc_count, objs_per_call,
-		objs_per_sec, cycles_per_call);
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->sched_objs,
+			stat->sched_fail, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	} else {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	}
 }
 
 static int
@@ -88,13 +125,20 @@ graph_cluster_stats_cb(bool is_first, bool is_last, void *cookie,
 		       const struct rte_graph_cluster_node_stats *stat)
 {
 	FILE *f = cookie;
+	int model;
+
+	model = rte_graph_worker_model_get();
 
 	if (unlikely(is_first))
 		print_banner(f);
 	if (stat->objs)
 		print_node(f, stat);
-	if (unlikely(is_last))
-		boarder();
+	if (unlikely(is_last)) {
+		if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+			boarder_model_dispatch();
+		else
+			boarder();
+	}
 
 	return 0;
 };
@@ -333,12 +377,20 @@ cluster_node_arregate_stats(struct cluster_node *cluster)
 {
 	uint64_t calls = 0, cycles = 0, objs = 0, realloc_count = 0;
 	struct rte_graph_cluster_node_stats *stat = &cluster->stat;
+	uint64_t sched_objs = 0, sched_fail = 0;
 	struct rte_node *node;
 	rte_node_t count;
+	int model;
 
+	model = rte_graph_worker_model_get();
 	for (count = 0; count < cluster->nb_nodes; count++) {
 		node = cluster->nodes[count];
 
+		if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+			sched_objs += node->total_sched_objs;
+			sched_fail += node->total_sched_fail;
+		}
+
 		calls += node->total_calls;
 		objs += node->total_objs;
 		cycles += node->total_cycles;
@@ -348,6 +400,12 @@ cluster_node_arregate_stats(struct cluster_node *cluster)
 	stat->calls = calls;
 	stat->objs = objs;
 	stat->cycles = cycles;
+
+	if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+		stat->sched_objs = sched_objs;
+		stat->sched_fail = sched_fail;
+	}
+
 	stat->ts = rte_get_timer_cycles();
 	stat->realloc_count = realloc_count;
 }
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index 0ac764daf8..ee6c970ca4 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -219,6 +219,8 @@ struct rte_graph_cluster_node_stats {
 	uint64_t prev_calls;	/**< Previous number of calls. */
 	uint64_t prev_objs;	/**< Previous number of processed objs. */
 	uint64_t prev_cycles;	/**< Previous number of cycles. */
+	uint64_t sched_objs;	/**< Number of scheduled objs. */
+	uint64_t sched_fail;	/**< Number of objs that failed to be scheduled. */
 
 	uint64_t realloc_count; /**< Realloc count. */
 
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
index 4264723485..cb7b6b9b7a 100644
--- a/lib/graph/rte_graph_model_dispatch.c
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -96,6 +96,7 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		rte_pause();
 
 	off += size;
+	node->total_sched_objs += size;
 	node->idx -= size;
 	if (node->idx > 0)
 		goto submit_again;
@@ -107,6 +108,8 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		memmove(&node->objs[0], &node->objs[off],
 			node->idx * sizeof(void *));
 
+	node->total_sched_fail += node->idx;
+
 	return false;
 }
 
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 8e968e2022..7095cb4699 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -95,6 +95,8 @@ struct rte_node {
 		/* Fast schedule area for mcore dispatch model */
 		unsigned int lcore_id;  /**< Node running lcore. */
 		};
+	uint64_t total_sched_objs; /**< Number of objects scheduled. */
+	uint64_t total_sched_fail; /**< Number of scheduling failures. */
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (12 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 13/15] graph: add stats for cross-core dispatching Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-09  6:03           ` [PATCH v6 15/15] doc: update multicore dispatch model in graph guides Zhirun Yan
  2023-06-05 11:19           ` [PATCH v7 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add a new parameter "model" to choose the dispatch or rtc worker model.
In the dispatch model, nodes are affinitized to the worker cores
successively.

Note:
only one RX node is supported by the dispatch model in the current
implementation.

./dpdk-l3fwd-graph  -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="dispatch"

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 examples/l3fwd-graph/main.c | 236 +++++++++++++++++++++++++++++-------
 1 file changed, 194 insertions(+), 42 deletions(-)

diff --git a/examples/l3fwd-graph/main.c b/examples/l3fwd-graph/main.c
index 5feeab4f0f..a91947e940 100644
--- a/examples/l3fwd-graph/main.c
+++ b/examples/l3fwd-graph/main.c
@@ -55,6 +55,9 @@
 
 #define NB_SOCKETS 8
 
+/* Graph module */
+#define WORKER_MODEL_RTC "rtc"
+#define WORKER_MODEL_MCORE_DISPATCH "dispatch"
 /* Static global variables used within this file. */
 static uint16_t nb_rxd = RX_DESC_DEFAULT;
 static uint16_t nb_txd = TX_DESC_DEFAULT;
@@ -88,6 +91,10 @@ struct lcore_rx_queue {
 	char node_name[RTE_NODE_NAMESIZE];
 };
 
+struct model_conf {
+	enum rte_graph_worker_model model;
+};
+
 /* Lcore conf */
 struct lcore_conf {
 	uint16_t n_rx_queue;
@@ -153,6 +160,19 @@ static struct ipv4_l3fwd_lpm_route ipv4_l3fwd_lpm_route_array[] = {
 	{RTE_IPV4(198, 18, 6, 0), 24, 6}, {RTE_IPV4(198, 18, 7, 0), 24, 7},
 };
 
+static int
+check_worker_model_params(void)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH &&
+	    nb_lcore_params > 1) {
+		printf("Exceeded max number of lcore params for remote model: %hu\n",
+		       nb_lcore_params);
+		return -1;
+	}
+
+	return 0;
+}
+
 static int
 check_lcore_params(void)
 {
@@ -276,6 +296,7 @@ print_usage(const char *prgname)
 		"  --eth-dest=X,MM:MM:MM:MM:MM:MM: Ethernet destination for "
 		"port X\n"
 		"  --max-pkt-len PKTLEN: maximum packet length in decimal (64-9600)\n"
+		"  --model NAME: walking model name, dispatch or rtc(by default)\n"
 		"  --no-numa: Disable numa awareness\n"
 		"  --per-port-pool: Use separate buffer pool per port\n"
 		"  --pcap-enable: Enables pcap capture\n"
@@ -318,6 +339,20 @@ parse_max_pkt_len(const char *pktlen)
 	return len;
 }
 
+static int
+parse_worker_model(const char *model)
+{
+	if (strcmp(model, WORKER_MODEL_MCORE_DISPATCH) == 0) {
+		rte_graph_worker_model_set(RTE_GRAPH_MODEL_MCORE_DISPATCH);
+		return RTE_GRAPH_MODEL_MCORE_DISPATCH;
+	} else if (strcmp(model, WORKER_MODEL_RTC) == 0)
+		return RTE_GRAPH_MODEL_RTC;
+
+	rte_exit(EXIT_FAILURE, "Invalid worker model: %s", model);
+
+	return RTE_GRAPH_MODEL_LIST_END;
+}
+
 static int
 parse_portmask(const char *portmask)
 {
@@ -434,6 +469,8 @@ static const char short_options[] = "p:" /* portmask */
 #define CMD_LINE_OPT_PCAP_ENABLE   "pcap-enable"
 #define CMD_LINE_OPT_NUM_PKT_CAP   "pcap-num-cap"
 #define CMD_LINE_OPT_PCAP_FILENAME "pcap-file-name"
+#define CMD_LINE_OPT_WORKER_MODEL  "model"
+
 enum {
 	/* Long options mapped to a short option */
 
@@ -449,6 +486,7 @@ enum {
 	CMD_LINE_OPT_PARSE_PCAP_ENABLE,
 	CMD_LINE_OPT_PARSE_NUM_PKT_CAP,
 	CMD_LINE_OPT_PCAP_FILENAME_CAP,
+	CMD_LINE_OPT_WORKER_MODEL_TYPE,
 };
 
 static const struct option lgopts[] = {
@@ -460,6 +498,7 @@ static const struct option lgopts[] = {
 	{CMD_LINE_OPT_PCAP_ENABLE, 0, 0, CMD_LINE_OPT_PARSE_PCAP_ENABLE},
 	{CMD_LINE_OPT_NUM_PKT_CAP, 1, 0, CMD_LINE_OPT_PARSE_NUM_PKT_CAP},
 	{CMD_LINE_OPT_PCAP_FILENAME, 1, 0, CMD_LINE_OPT_PCAP_FILENAME_CAP},
+	{CMD_LINE_OPT_WORKER_MODEL, 1, 0, CMD_LINE_OPT_WORKER_MODEL_TYPE},
 	{NULL, 0, 0, 0},
 };
 
@@ -551,6 +590,11 @@ parse_args(int argc, char **argv)
 			printf("Pcap file name: %s\n", pcap_filename);
 			break;
 
+		case CMD_LINE_OPT_WORKER_MODEL_TYPE:
+			printf("Use new worker model: %s\n", optarg);
+			parse_worker_model(optarg);
+			break;
+
 		default:
 			print_usage(prgname);
 			return -1;
@@ -726,15 +770,15 @@ print_stats(void)
 static int
 graph_main_loop(void *conf)
 {
+	struct model_conf *mconf = conf;
 	struct lcore_conf *qconf;
 	struct rte_graph *graph;
 	uint32_t lcore_id;
 
-	RTE_SET_USED(conf);
-
 	lcore_id = rte_lcore_id();
 	qconf = &lcore_conf[lcore_id];
 	graph = qconf->graph;
+	rte_graph_worker_model_set(mconf->model);
 
 	if (!graph) {
 		RTE_LOG(INFO, L3FWD_GRAPH, "Lcore %u has nothing to do\n",
@@ -788,6 +832,139 @@ config_port_max_pkt_len(struct rte_eth_conf *conf,
 	return 0;
 }
 
+static void
+graph_config_mcore_dispatch(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	int worker_count = rte_lcore_count() - 1;
+	int main_lcore_id = rte_get_main_lcore();
+	rte_graph_t main_graph_id = 0;
+	struct rte_node *node_tmp;
+	struct lcore_conf *qconf;
+	struct rte_graph *graph;
+	rte_graph_t graph_id;
+	rte_graph_off_t off;
+	int n_rx_node = 0;
+	int worker_lcore;
+	rte_node_t count;
+	int i, j;
+	int ret;
+
+	for (j = 0; j < nb_lcore_params; j++) {
+		qconf = &lcore_conf[lcore_params[j].lcore_id];
+		/* Add rx node patterns of all lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			char *node_name = qconf->rx_queue_list[i].node_name;
+
+			graph_conf.node_patterns[nb_patterns + n_rx_node + i] = node_name;
+			n_rx_node++;
+			ret = rte_graph_model_dispatch_lcore_affinity_set(node_name,
+									lcore_params[j].lcore_id);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n", node_name,
+				       lcore_params[j].lcore_id);
+		}
+	}
+
+	graph_conf.nb_node_patterns = nb_patterns + n_rx_node;
+	graph_conf.socket_id = rte_lcore_to_socket_id(main_lcore_id);
+
+	qconf = &lcore_conf[main_lcore_id];
+	snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+		 main_lcore_id);
+
+	/* create main graph */
+	main_graph_id = rte_graph_create(qconf->name, &graph_conf);
+	if (main_graph_id == RTE_GRAPH_ID_INVALID)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_create(): main_graph_id invalid for lcore %u\n",
+			 main_lcore_id);
+
+	qconf->graph_id = main_graph_id;
+	qconf->graph = rte_graph_lookup(qconf->name);
+	if (!qconf->graph)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_lookup(): graph %s not found\n",
+			 qconf->name);
+
+	graph = qconf->graph;
+	worker_lcore = lcore_params[nb_lcore_params - 1].lcore_id;
+	rte_graph_foreach_node(count, off, graph, node_tmp) {
+		/* Set the node lcore affinity before cloning the graph for each lcore */
+		if (node_tmp->lcore_id == RTE_MAX_LCORE) {
+			worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+			ret = rte_graph_model_dispatch_lcore_affinity_set(node_tmp->name,
+									worker_lcore);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n",
+				       node_tmp->name, worker_lcore);
+		}
+	}
+
+	worker_lcore = main_lcore_id;
+	for (i = 0; i < worker_count; i++) {
+		worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+
+		qconf = &lcore_conf[worker_lcore];
+		snprintf(qconf->name, sizeof(qconf->name), "cloned-%u", worker_lcore);
+		graph_id = rte_graph_clone(main_graph_id, qconf->name, &graph_conf);
+		ret = rte_graph_model_dispatch_core_bind(graph_id, worker_lcore);
+		if (ret == 0)
+			printf("bind graph %d to lcore %u\n", graph_id, worker_lcore);
+
+		/* full cloned graph name */
+		snprintf(qconf->name, sizeof(qconf->name), "%s",
+			 rte_graph_id_to_name(graph_id));
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "Failed to lookup graph %s\n",
+				 qconf->name);
+		continue;
+	}
+}
+
+static void
+graph_config_rtc(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	struct lcore_conf *qconf;
+	rte_graph_t graph_id;
+	uint32_t lcore_id;
+	rte_edge_t i;
+
+	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+		if (rte_lcore_is_enabled(lcore_id) == 0)
+			continue;
+
+		qconf = &lcore_conf[lcore_id];
+		/* Skip graph creation if no source exists */
+		if (!qconf->n_rx_queue)
+			continue;
+		/* Add rx node patterns of this lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			graph_conf.node_patterns[nb_patterns + i] =
+				qconf->rx_queue_list[i].node_name;
+		}
+		graph_conf.nb_node_patterns = nb_patterns + i;
+		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
+		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+			 lcore_id);
+		graph_id = rte_graph_create(qconf->name, &graph_conf);
+		if (graph_id == RTE_GRAPH_ID_INVALID)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_create(): graph_id invalid for lcore %u\n",
+				 lcore_id);
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_lookup(): graph %s not found\n",
+				 qconf->name);
+	}
+}
+
 int
 main(int argc, char **argv)
 {
@@ -808,10 +985,12 @@ main(int argc, char **argv)
 	uint16_t queueid, portid, i;
 	const char **node_patterns;
 	struct lcore_conf *qconf;
+	struct model_conf mconf;
 	uint16_t nb_graphs = 0;
 	uint16_t nb_patterns;
 	uint8_t rewrite_len;
 	uint32_t lcore_id;
+	uint16_t model;
 	int ret;
 
 	/* Init EAL */
@@ -840,6 +1019,9 @@ main(int argc, char **argv)
 	if (check_lcore_params() < 0)
 		rte_exit(EXIT_FAILURE, "check_lcore_params() failed\n");
 
+	if (check_worker_model_params() < 0)
+		rte_exit(EXIT_FAILURE, "check_worker_model_params() failed\n");
+
 	ret = init_lcore_rx_queues();
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "init_lcore_rx_queues() failed\n");
@@ -1079,51 +1261,19 @@ main(int argc, char **argv)
 
 	memset(&graph_conf, 0, sizeof(graph_conf));
 	graph_conf.node_patterns = node_patterns;
+	graph_conf.nb_node_patterns = nb_patterns;
 
 	/* Pcap config */
 	graph_conf.pcap_enable = pcap_trace_enable;
 	graph_conf.num_pkt_to_capture = packet_to_capture;
 	graph_conf.pcap_filename = pcap_filename;
 
-	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
-		rte_graph_t graph_id;
-		rte_edge_t i;
-
-		if (rte_lcore_is_enabled(lcore_id) == 0)
-			continue;
-
-		qconf = &lcore_conf[lcore_id];
-
-		/* Skip graph creation if no source exists */
-		if (!qconf->n_rx_queue)
-			continue;
-
-		/* Add rx node patterns of this lcore */
-		for (i = 0; i < qconf->n_rx_queue; i++) {
-			graph_conf.node_patterns[nb_patterns + i] =
-				qconf->rx_queue_list[i].node_name;
-		}
-
-		graph_conf.nb_node_patterns = nb_patterns + i;
-		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
-
-		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
-			 lcore_id);
-
-		graph_id = rte_graph_create(qconf->name, &graph_conf);
-		if (graph_id == RTE_GRAPH_ID_INVALID)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_create(): graph_id invalid"
-				 " for lcore %u\n", lcore_id);
-
-		qconf->graph_id = graph_id;
-		qconf->graph = rte_graph_lookup(qconf->name);
-		/* >8 End of graph initialization. */
-		if (!qconf->graph)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_lookup(): graph %s not found\n",
-				 qconf->name);
-	}
+	model = rte_graph_worker_model_get();
+	if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		graph_config_mcore_dispatch(graph_conf);
+	else
+		graph_config_rtc(graph_conf);
+	/* >8 End of graph initialization. */
 
 	memset(&rewrite_data, 0, sizeof(rewrite_data));
 	rewrite_len = sizeof(rewrite_data);
@@ -1174,8 +1324,10 @@ main(int argc, char **argv)
 	}
 	/* >8 End of adding route to ip4 graph infa. */
 
+	mconf.model = model;
 	/* Launch per-lcore init on every worker lcore */
-	rte_eal_mp_remote_launch(graph_main_loop, NULL, SKIP_MAIN);
+	rte_eal_mp_remote_launch(graph_main_loop, &mconf,
+				 SKIP_MAIN);
 
 	/* Accumulate and print stats on main until exit */
 	if (rte_graph_has_stats_feature())
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 15/15] doc: update multicore dispatch model in graph guides
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (13 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-24  8:12             ` Jerin Jacob
  2023-06-05 11:19           ` [PATCH v7 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Update graph documentation to introduce new multicore dispatch model.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 doc/guides/prog_guide/graph_lib.rst | 59 +++++++++++++++++++++++++++--
 1 file changed, 55 insertions(+), 4 deletions(-)

diff --git a/doc/guides/prog_guide/graph_lib.rst b/doc/guides/prog_guide/graph_lib.rst
index 1cfdc86433..72e26f3a5a 100644
--- a/doc/guides/prog_guide/graph_lib.rst
+++ b/doc/guides/prog_guide/graph_lib.rst
@@ -189,14 +189,65 @@ In the above example, A graph object will be created with ethdev Rx
 node of port 0 and queue 0, all ipv4* nodes in the system,
 and ethdev tx node of all ports.
 
-Multicore graph processing
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-In the current graph library implementation, specifically,
-``rte_graph_walk()`` and ``rte_node_enqueue*`` fast path API functions
+Graph model choosing
+~~~~~~~~~~~~~~~~~~~~
+Currently, there are two different walking models. Use
+``rte_graph_worker_model_set()`` to set the walking model.
+
+RTC (Run-To-Completion)
+^^^^^^^^^^^^^^^^^^^^^^^
+This is the default graph walking model.