DPDK patches and discussions
* [PATCH v1 00/13] graph enhancement for multi-core dispatch
@ 2022-11-17  5:09 Zhirun Yan
  2022-11-17  5:09 ` [PATCH v1 01/13] graph: split graph worker into common and default model Zhirun Yan
                   ` (14 more replies)
  0 siblings, 15 replies; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Currently, rte_graph supports only the RTC (Run-To-Completion) model,
where an entire graph runs within a single core.
RTC is one of the typical packet processing models. Others, such as
Pipeline and Hybrid, are not supported.

This patch set introduces a 'generic' model selection, a self-adapting
scheme based on core affinity.
The new model enables a cross-core dispatching mechanism which employs a
scheduling work-queue to dispatch streams to the worker cores associated
with the destination nodes. When the core affinity of the destination
node is the default 'current', the stream continues to be executed as
normal.

Example:
3-node graph targets 3-core budget

Generic Model
RTC:
Config Graph-A: node-0->current; node-1->current; node-2->current;
Graph-A':node-0/1/2 @0, Graph-A':node-0/1/2 @1, Graph-A':node-0/1/2 @2

+ - - - - - - - - - - - - - - - - - - - - - +
'                Core #0/1/2                '
'                                           '
' +--------+     +---------+     +--------+ '
' | Node-0 | --> | Node-1  | --> | Node-2 | '
' +--------+     +---------+     +--------+ '
'                                           '
+ - - - - - - - - - - - - - - - - - - - - - +

Pipeline:
Config Graph-A: node-0->0; node-1->1; node-2->2;
Graph-A':node-0 @0, Graph-A':node-1 @1, Graph-A':node-2 @2

+ - - - - - -+     +- - - - - - +     + - - - - - -+
'  Core #0   '     '  Core #1   '     '  Core #2   '
'            '     '            '     '            '
' +--------+ '     ' +--------+ '     ' +--------+ '
' | Node-0 | ' --> ' | Node-1 | ' --> ' | Node-2 | '
' +--------+ '     ' +--------+ '     ' +--------+ '
'            '     '            '     '            '
+ - - - - - -+     +- - - - - - +     + - - - - - -+

Hybrid:
Config Graph-A: node-0->current; node-1->current; node-2->2;
Graph-A':node-0/1 @0, Graph-A':node-0/1 @1, Graph-A':node-2 @2

+ - - - - - - - - - - - - - - - +     + - - - - - -+
'            Core #0            '     '  Core #2   '
'                               '     '            '
' +--------+         +--------+ '     ' +--------+ '
' | Node-0 | ------> | Node-1 | ' --> ' | Node-2 | '
' +--------+         +--------+ '     ' +--------+ '
'                               '     '            '
+ - - - - - - - - - - - - - - - +     + - - - - - -+
                                          ^
                                          |
                                          |
+ - - - - - - - - - - - - - - - +         |
'            Core #1            '         |
'                               '         |
' +--------+         +--------+ '         |
' | Node-0 | ------> | Node-1 | ' --------+
' +--------+         +--------+ '
'                               '
+ - - - - - - - - - - - - - - - +


The patch set is broken down as follows:

1. Split graph worker into common and default model parts.
2. Inline graph node processing and graph circular buffer walking to make
  them reusable.
3. Add set/get APIs to choose the worker model.
4. Introduce core affinity API to make a node run on a specific worker
  core. (Only used in the new model.)
5. Introduce graph affinity API to bind a graph to a specific worker
  core.
6. Introduce graph clone API.
7. Introduce stream moving with a scheduling work-queue in patches 8, 9
  and 10.
8. Add stats for the new model.
9. Abstract the default graph configuration process and integrate the new
  model into examples/l3fwd-graph. Add new parameters for model selection.

The new worker model can be enabled as follows:
./dpdk-l3fwd-graph -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="generic"

References:
https://static.sched.com/hosted_files/dpdkuserspace22/a6/graph%20introduce%20remote%20dispatch%20for%20mult-core%20scaling.pdf

Zhirun Yan (13):
  graph: split graph worker into common and default model
  graph: move node process into inline function
  graph: add macro to walk on graph circular buffer
  graph: add get/set graph worker model APIs
  graph: introduce core affinity API
  graph: introduce graph affinity API
  graph: introduce graph clone API for other worker core
  graph: introduce stream moving cross cores
  graph: enable create and destroy graph scheduling workqueue
  graph: introduce graph walk by cross-core dispatch
  graph: enable graph generic scheduler model
  graph: add stats for cross-core dispatching
  examples/l3fwd-graph: introduce generic worker model

 examples/l3fwd-graph/main.c         | 218 +++++++++--
 lib/graph/graph.c                   | 179 +++++++++
 lib/graph/graph_debug.c             |   6 +
 lib/graph/graph_populate.c          |   1 +
 lib/graph/graph_private.h           |  44 +++
 lib/graph/graph_stats.c             |  74 +++-
 lib/graph/meson.build               |   3 +-
 lib/graph/node.c                    |   1 +
 lib/graph/rte_graph.h               |  44 +++
 lib/graph/rte_graph_model_generic.c | 179 +++++++++
 lib/graph/rte_graph_model_generic.h | 114 ++++++
 lib/graph/rte_graph_model_rtc.h     |  22 ++
 lib/graph/rte_graph_worker.h        | 516 ++------------------------
 lib/graph/rte_graph_worker_common.h | 545 ++++++++++++++++++++++++++++
 lib/graph/version.map               |   8 +
 15 files changed, 1430 insertions(+), 524 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_generic.c
 create mode 100644 lib/graph/rte_graph_model_generic.h
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker_common.h

-- 
2.25.1



* [PATCH v1 01/13] graph: split graph worker into common and default model
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2023-02-20 13:38   ` Jerin Jacob
  2022-11-17  5:09 ` [PATCH v1 02/13] graph: move node process into inline function Zhirun Yan
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

To support multiple graph worker models, split the graph worker into
common and default parts. The current walk function is named
rte_graph_model_rtc because the default model is RTC (Run-To-Completion).

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_rtc.h     |  57 ++++
 lib/graph/rte_graph_worker.h        | 498 +---------------------------
 lib/graph/rte_graph_worker_common.h | 456 +++++++++++++++++++++++++
 3 files changed, 515 insertions(+), 496 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker_common.h

diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
new file mode 100644
index 0000000000..fb58730bde
--- /dev/null
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -0,0 +1,57 @@
+#include "rte_graph_worker_common.h"
+
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+static inline void
+rte_graph_walk_rtc(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	/*
+	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
+	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
+	 * in a circular buffer fashion.
+	 *
+	 *	+-----+ <= cir_start - head [number of source nodes]
+	 *	|     |
+	 *	| ... | <= source nodes
+	 *	|     |
+	 *	+-----+ <= cir_start [head = 0] [tail = 0]
+	 *	|     |
+	 *	| ... | <= pending streams
+	 *	|     |
+	 *	+-----+ <= cir_start + mask
+	 */
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		objs = node->objs;
+		rte_prefetch0(objs);
+
+		if (rte_graph_has_stats_feature()) {
+			start = rte_rdtsc();
+			rc = node->process(graph, node, objs, node->idx);
+			node->total_cycles += rte_rdtsc() - start;
+			node->total_calls++;
+			node->total_objs += rc;
+		} else {
+			node->process(graph, node, objs, node->idx);
+		}
+		node->idx = 0;
+		head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+	graph->tail = 0;
+}
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index 6dc7461659..54d1390786 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -1,122 +1,4 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(C) 2020 Marvell International Ltd.
- */
-
-#ifndef _RTE_GRAPH_WORKER_H_
-#define _RTE_GRAPH_WORKER_H_
-
-/**
- * @file rte_graph_worker.h
- *
- * @warning
- * @b EXPERIMENTAL:
- * All functions in this file may be changed or removed without prior notice.
- *
- * This API allows a worker thread to walk over a graph and nodes to create,
- * process, enqueue and move streams of objects to the next nodes.
- */
-
-#include <rte_common.h>
-#include <rte_cycles.h>
-#include <rte_prefetch.h>
-#include <rte_memcpy.h>
-#include <rte_memory.h>
-
-#include "rte_graph.h"
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-/**
- * @internal
- *
- * Data structure to hold graph data.
- */
-struct rte_graph {
-	uint32_t tail;		     /**< Tail of circular buffer. */
-	uint32_t head;		     /**< Head of circular buffer. */
-	uint32_t cir_mask;	     /**< Circular buffer wrap around mask. */
-	rte_node_t nb_nodes;	     /**< Number of nodes in the graph. */
-	rte_graph_off_t *cir_start;  /**< Pointer to circular buffer. */
-	rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */
-	rte_graph_t id;	/**< Graph identifier. */
-	int socket;	/**< Socket ID where memory is allocated. */
-	char name[RTE_GRAPH_NAMESIZE];	/**< Name of the graph. */
-	uint64_t fence;			/**< Fence. */
-} __rte_cache_aligned;
-
-/**
- * @internal
- *
- * Data structure to hold node data.
- */
-struct rte_node {
-	/* Slow path area  */
-	uint64_t fence;		/**< Fence. */
-	rte_graph_off_t next;	/**< Index to next node. */
-	rte_node_t id;		/**< Node identifier. */
-	rte_node_t parent_id;	/**< Parent Node identifier. */
-	rte_edge_t nb_edges;	/**< Number of edges from this node. */
-	uint32_t realloc_count;	/**< Number of times realloced. */
-
-	char parent[RTE_NODE_NAMESIZE];	/**< Parent node name. */
-	char name[RTE_NODE_NAMESIZE];	/**< Name of the node. */
-
-	/* Fast path area  */
-#define RTE_NODE_CTX_SZ 16
-	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-	uint16_t size;		/**< Total number of objects available. */
-	uint16_t idx;		/**< Number of objects used. */
-	rte_graph_off_t off;	/**< Offset of node in the graph reel. */
-	uint64_t total_cycles;	/**< Cycles spent in this node. */
-	uint64_t total_calls;	/**< Calls done to this node. */
-	uint64_t total_objs;	/**< Objects processed by this node. */
-	RTE_STD_C11
-		union {
-			void **objs;	   /**< Array of object pointers. */
-			uint64_t objs_u64;
-		};
-	RTE_STD_C11
-		union {
-			rte_node_process_t process; /**< Process function. */
-			uint64_t process_u64;
-		};
-	struct rte_node *nodes[] __rte_cache_min_aligned; /**< Next nodes. */
-} __rte_cache_aligned;
-
-/**
- * @internal
- *
- * Allocate a stream of objects.
- *
- * If stream already exists then re-allocate it to a larger size.
- *
- * @param graph
- *   Pointer to the graph object.
- * @param node
- *   Pointer to the node object.
- */
-__rte_experimental
-void __rte_node_stream_alloc(struct rte_graph *graph, struct rte_node *node);
-
-/**
- * @internal
- *
- * Allocate a stream with requested number of objects.
- *
- * If stream already exists then re-allocate it to a larger size.
- *
- * @param graph
- *   Pointer to the graph object.
- * @param node
- *   Pointer to the node object.
- * @param req_size
- *   Number of objects to be allocated.
- */
-__rte_experimental
-void __rte_node_stream_alloc_size(struct rte_graph *graph,
-				  struct rte_node *node, uint16_t req_size);
+#include "rte_graph_model_rtc.h"
 
 /**
  * Perform graph walk on the circular buffer and invoke the process function
@@ -131,381 +13,5 @@ __rte_experimental
 static inline void
 rte_graph_walk(struct rte_graph *graph)
 {
-	const rte_graph_off_t *cir_start = graph->cir_start;
-	const rte_node_t mask = graph->cir_mask;
-	uint32_t head = graph->head;
-	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
-
-	/*
-	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
-	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
-	 * in a circular buffer fashion.
-	 *
-	 *	+-----+ <= cir_start - head [number of source nodes]
-	 *	|     |
-	 *	| ... | <= source nodes
-	 *	|     |
-	 *	+-----+ <= cir_start [head = 0] [tail = 0]
-	 *	|     |
-	 *	| ... | <= pending streams
-	 *	|     |
-	 *	+-----+ <= cir_start + mask
-	 */
-	while (likely(head != graph->tail)) {
-		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-		node->idx = 0;
-		head = likely((int32_t)head > 0) ? head & mask : head;
-	}
-	graph->tail = 0;
-}
-
-/* Fast path helper functions */
-
-/**
- * @internal
- *
- * Enqueue a given node to the tail of the graph reel.
- *
- * @param graph
- *   Pointer Graph object.
- * @param node
- *   Pointer to node object to be enqueued.
- */
-static __rte_always_inline void
-__rte_node_enqueue_tail_update(struct rte_graph *graph, struct rte_node *node)
-{
-	uint32_t tail;
-
-	tail = graph->tail;
-	graph->cir_start[tail++] = node->off;
-	graph->tail = tail & graph->cir_mask;
-}
-
-/**
- * @internal
- *
- * Enqueue sequence prologue function.
- *
- * Updates the node to tail of graph reel and resizes the number of objects
- * available in the stream as needed.
- *
- * @param graph
- *   Pointer to the graph object.
- * @param node
- *   Pointer to the node object.
- * @param idx
- *   Index at which the object enqueue starts from.
- * @param space
- *   Space required for the object enqueue.
- */
-static __rte_always_inline void
-__rte_node_enqueue_prologue(struct rte_graph *graph, struct rte_node *node,
-			    const uint16_t idx, const uint16_t space)
-{
-
-	/* Add to the pending stream list if the node is new */
-	if (idx == 0)
-		__rte_node_enqueue_tail_update(graph, node);
-
-	if (unlikely(node->size < (idx + space)))
-		__rte_node_stream_alloc_size(graph, node, node->size + space);
-}
-
-/**
- * @internal
- *
- * Get the node pointer from current node edge id.
- *
- * @param node
- *   Current node pointer.
- * @param next
- *   Edge id of the required node.
- *
- * @return
- *   Pointer to the node denoted by the edge id.
- */
-static __rte_always_inline struct rte_node *
-__rte_node_next_node_get(struct rte_node *node, rte_edge_t next)
-{
-	RTE_ASSERT(next < node->nb_edges);
-	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-	node = node->nodes[next];
-	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-
-	return node;
-}
-
-/**
- * Enqueue the objs to next node for further processing and set
- * the next node to pending state in the circular buffer.
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup().
- * @param node
- *   Current node pointer.
- * @param next
- *   Relative next node index to enqueue objs.
- * @param objs
- *   Objs to enqueue.
- * @param nb_objs
- *   Number of objs to enqueue.
- */
-__rte_experimental
-static inline void
-rte_node_enqueue(struct rte_graph *graph, struct rte_node *node,
-		 rte_edge_t next, void **objs, uint16_t nb_objs)
-{
-	node = __rte_node_next_node_get(node, next);
-	const uint16_t idx = node->idx;
-
-	__rte_node_enqueue_prologue(graph, node, idx, nb_objs);
-
-	rte_memcpy(&node->objs[idx], objs, nb_objs * sizeof(void *));
-	node->idx = idx + nb_objs;
+	rte_graph_walk_rtc(graph);
 }
-
-/**
- * Enqueue only one obj to next node for further processing and
- * set the next node to pending state in the circular buffer.
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup().
- * @param node
- *   Current node pointer.
- * @param next
- *   Relative next node index to enqueue objs.
- * @param obj
- *   Obj to enqueue.
- */
-__rte_experimental
-static inline void
-rte_node_enqueue_x1(struct rte_graph *graph, struct rte_node *node,
-		    rte_edge_t next, void *obj)
-{
-	node = __rte_node_next_node_get(node, next);
-	uint16_t idx = node->idx;
-
-	__rte_node_enqueue_prologue(graph, node, idx, 1);
-
-	node->objs[idx++] = obj;
-	node->idx = idx;
-}
-
-/**
- * Enqueue only two objs to next node for further processing and
- * set the next node to pending state in the circular buffer.
- * Same as rte_node_enqueue_x1 but enqueue two objs.
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup().
- * @param node
- *   Current node pointer.
- * @param next
- *   Relative next node index to enqueue objs.
- * @param obj0
- *   Obj to enqueue.
- * @param obj1
- *   Obj to enqueue.
- */
-__rte_experimental
-static inline void
-rte_node_enqueue_x2(struct rte_graph *graph, struct rte_node *node,
-		    rte_edge_t next, void *obj0, void *obj1)
-{
-	node = __rte_node_next_node_get(node, next);
-	uint16_t idx = node->idx;
-
-	__rte_node_enqueue_prologue(graph, node, idx, 2);
-
-	node->objs[idx++] = obj0;
-	node->objs[idx++] = obj1;
-	node->idx = idx;
-}
-
-/**
- * Enqueue only four objs to next node for further processing and
- * set the next node to pending state in the circular buffer.
- * Same as rte_node_enqueue_x1 but enqueue four objs.
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup().
- * @param node
- *   Current node pointer.
- * @param next
- *   Relative next node index to enqueue objs.
- * @param obj0
- *   1st obj to enqueue.
- * @param obj1
- *   2nd obj to enqueue.
- * @param obj2
- *   3rd obj to enqueue.
- * @param obj3
- *   4th obj to enqueue.
- */
-__rte_experimental
-static inline void
-rte_node_enqueue_x4(struct rte_graph *graph, struct rte_node *node,
-		    rte_edge_t next, void *obj0, void *obj1, void *obj2,
-		    void *obj3)
-{
-	node = __rte_node_next_node_get(node, next);
-	uint16_t idx = node->idx;
-
-	__rte_node_enqueue_prologue(graph, node, idx, 4);
-
-	node->objs[idx++] = obj0;
-	node->objs[idx++] = obj1;
-	node->objs[idx++] = obj2;
-	node->objs[idx++] = obj3;
-	node->idx = idx;
-}
-
-/**
- * Enqueue objs to multiple next nodes for further processing and
- * set the next nodes to pending state in the circular buffer.
- * objs[i] will be enqueued to nexts[i].
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup().
- * @param node
- *   Current node pointer.
- * @param nexts
- *   List of relative next node indices to enqueue objs.
- * @param objs
- *   List of objs to enqueue.
- * @param nb_objs
- *   Number of objs to enqueue.
- */
-__rte_experimental
-static inline void
-rte_node_enqueue_next(struct rte_graph *graph, struct rte_node *node,
-		      rte_edge_t *nexts, void **objs, uint16_t nb_objs)
-{
-	uint16_t i;
-
-	for (i = 0; i < nb_objs; i++)
-		rte_node_enqueue_x1(graph, node, nexts[i], objs[i]);
-}
-
-/**
- * Get the stream of next node to enqueue the objs.
- * Once done with the updating the objs, needs to call
- * rte_node_next_stream_put to put the next node to pending state.
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup().
- * @param node
- *   Current node pointer.
- * @param next
- *   Relative next node index to get stream.
- * @param nb_objs
- *   Requested free size of the next stream.
- *
- * @return
- *   Valid next stream on success.
- *
- * @see rte_node_next_stream_put().
- */
-__rte_experimental
-static inline void **
-rte_node_next_stream_get(struct rte_graph *graph, struct rte_node *node,
-			 rte_edge_t next, uint16_t nb_objs)
-{
-	node = __rte_node_next_node_get(node, next);
-	const uint16_t idx = node->idx;
-	uint16_t free_space = node->size - idx;
-
-	if (unlikely(free_space < nb_objs))
-		__rte_node_stream_alloc_size(graph, node, node->size + nb_objs);
-
-	return &node->objs[idx];
-}
-
-/**
- * Put the next stream to pending state in the circular buffer
- * for further processing. Should be invoked after rte_node_next_stream_get().
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup().
- * @param node
- *   Current node pointer.
- * @param next
- *   Relative next node index..
- * @param idx
- *   Number of objs updated in the stream after getting the stream using
- *   rte_node_next_stream_get.
- *
- * @see rte_node_next_stream_get().
- */
-__rte_experimental
-static inline void
-rte_node_next_stream_put(struct rte_graph *graph, struct rte_node *node,
-			 rte_edge_t next, uint16_t idx)
-{
-	if (unlikely(!idx))
-		return;
-
-	node = __rte_node_next_node_get(node, next);
-	if (node->idx == 0)
-		__rte_node_enqueue_tail_update(graph, node);
-
-	node->idx += idx;
-}
-
-/**
- * Home run scenario, Enqueue all the objs of current node to next
- * node in optimized way by swapping the streams of both nodes.
- * Performs good when next node is already not in pending state.
- * If next node is already in pending state then normal enqueue
- * will be used.
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup().
- * @param src
- *   Current node pointer.
- * @param next
- *   Relative next node index.
- */
-__rte_experimental
-static inline void
-rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
-			  rte_edge_t next)
-{
-	struct rte_node *dst = __rte_node_next_node_get(src, next);
-
-	/* Let swap the pointers if dst don't have valid objs */
-	if (likely(dst->idx == 0)) {
-		void **dobjs = dst->objs;
-		uint16_t dsz = dst->size;
-		dst->objs = src->objs;
-		dst->size = src->size;
-		src->objs = dobjs;
-		src->size = dsz;
-		dst->idx = src->idx;
-		__rte_node_enqueue_tail_update(graph, dst);
-	} else { /* Move the objects from src node to dst node */
-		rte_node_enqueue(graph, src, next, src->objs, src->idx);
-	}
-}
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif /* _RTE_GRAPH_WORKER_H_ */
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
new file mode 100644
index 0000000000..91a5de7fa4
--- /dev/null
+++ b/lib/graph/rte_graph_worker_common.h
@@ -0,0 +1,456 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2020 Marvell International Ltd.
+ */
+
+#ifndef _RTE_GRAPH_WORKER_COMMON_H_
+#define _RTE_GRAPH_WORKER_COMMON_H_
+
+/**
+ * @file rte_graph_worker.h
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * This API allows a worker thread to walk over a graph and nodes to create,
+ * process, enqueue and move streams of objects to the next nodes.
+ */
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_prefetch.h>
+#include <rte_memcpy.h>
+#include <rte_memory.h>
+
+#include "rte_graph.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @internal
+ *
+ * Data structure to hold graph data.
+ */
+struct rte_graph {
+	uint32_t tail;		     /**< Tail of circular buffer. */
+	uint32_t head;		     /**< Head of circular buffer. */
+	uint32_t cir_mask;	     /**< Circular buffer wrap around mask. */
+	rte_node_t nb_nodes;	     /**< Number of nodes in the graph. */
+	rte_graph_off_t *cir_start;  /**< Pointer to circular buffer. */
+	rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */
+	rte_graph_t id;	/**< Graph identifier. */
+	int socket;	/**< Socket ID where memory is allocated. */
+	char name[RTE_GRAPH_NAMESIZE];	/**< Name of the graph. */
+	uint64_t fence;			/**< Fence. */
+} __rte_cache_aligned;
+
+/**
+ * @internal
+ *
+ * Data structure to hold node data.
+ */
+struct rte_node {
+	/* Slow path area  */
+	uint64_t fence;		/**< Fence. */
+	rte_graph_off_t next;	/**< Index to next node. */
+	rte_node_t id;		/**< Node identifier. */
+	rte_node_t parent_id;	/**< Parent Node identifier. */
+	rte_edge_t nb_edges;	/**< Number of edges from this node. */
+	uint32_t realloc_count;	/**< Number of times realloced. */
+
+	char parent[RTE_NODE_NAMESIZE];	/**< Parent node name. */
+	char name[RTE_NODE_NAMESIZE];	/**< Name of the node. */
+
+	/* Fast path area  */
+#define RTE_NODE_CTX_SZ 16
+	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
+	uint16_t size;		/**< Total number of objects available. */
+	uint16_t idx;		/**< Number of objects used. */
+	rte_graph_off_t off;	/**< Offset of node in the graph reel. */
+	uint64_t total_cycles;	/**< Cycles spent in this node. */
+	uint64_t total_calls;	/**< Calls done to this node. */
+	uint64_t total_objs;	/**< Objects processed by this node. */
+
+	RTE_STD_C11
+		union {
+			void **objs;	   /**< Array of object pointers. */
+			uint64_t objs_u64;
+		};
+	RTE_STD_C11
+		union {
+			rte_node_process_t process; /**< Process function. */
+			uint64_t process_u64;
+		};
+	struct rte_node *nodes[] __rte_cache_min_aligned; /**< Next nodes. */
+} __rte_cache_aligned;
+
+/**
+ * @internal
+ *
+ * Allocate a stream of objects.
+ *
+ * If stream already exists then re-allocate it to a larger size.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ * @param node
+ *   Pointer to the node object.
+ */
+__rte_experimental
+void __rte_node_stream_alloc(struct rte_graph *graph, struct rte_node *node);
+
+/**
+ * @internal
+ *
+ * Allocate a stream with requested number of objects.
+ *
+ * If stream already exists then re-allocate it to a larger size.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ * @param node
+ *   Pointer to the node object.
+ * @param req_size
+ *   Number of objects to be allocated.
+ */
+__rte_experimental
+void __rte_node_stream_alloc_size(struct rte_graph *graph,
+				  struct rte_node *node, uint16_t req_size);
+
+/* Fast path helper functions */
+
+/**
+ * @internal
+ *
+ * Enqueue a given node to the tail of the graph reel.
+ *
+ * @param graph
+ *   Pointer Graph object.
+ * @param node
+ *   Pointer to node object to be enqueued.
+ */
+static __rte_always_inline void
+__rte_node_enqueue_tail_update(struct rte_graph *graph, struct rte_node *node)
+{
+	uint32_t tail;
+
+	tail = graph->tail;
+	graph->cir_start[tail++] = node->off;
+	graph->tail = tail & graph->cir_mask;
+}
+
+/**
+ * @internal
+ *
+ * Enqueue sequence prologue function.
+ *
+ * Updates the node to tail of graph reel and resizes the number of objects
+ * available in the stream as needed.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ * @param node
+ *   Pointer to the node object.
+ * @param idx
+ *   Index at which the object enqueue starts from.
+ * @param space
+ *   Space required for the object enqueue.
+ */
+static __rte_always_inline void
+__rte_node_enqueue_prologue(struct rte_graph *graph, struct rte_node *node,
+			    const uint16_t idx, const uint16_t space)
+{
+
+	/* Add to the pending stream list if the node is new */
+	if (idx == 0)
+		__rte_node_enqueue_tail_update(graph, node);
+
+	if (unlikely(node->size < (idx + space)))
+		__rte_node_stream_alloc_size(graph, node, node->size + space);
+}
+
+/**
+ * @internal
+ *
+ * Get the node pointer from current node edge id.
+ *
+ * @param node
+ *   Current node pointer.
+ * @param next
+ *   Edge id of the required node.
+ *
+ * @return
+ *   Pointer to the node denoted by the edge id.
+ */
+static __rte_always_inline struct rte_node *
+__rte_node_next_node_get(struct rte_node *node, rte_edge_t next)
+{
+	RTE_ASSERT(next < node->nb_edges);
+	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+	node = node->nodes[next];
+	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+
+	return node;
+}
+
+/**
+ * Enqueue the objs to next node for further processing and set
+ * the next node to pending state in the circular buffer.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param node
+ *   Current node pointer.
+ * @param next
+ *   Relative next node index to enqueue objs.
+ * @param objs
+ *   Objs to enqueue.
+ * @param nb_objs
+ *   Number of objs to enqueue.
+ */
+__rte_experimental
+static inline void
+rte_node_enqueue(struct rte_graph *graph, struct rte_node *node,
+		 rte_edge_t next, void **objs, uint16_t nb_objs)
+{
+	node = __rte_node_next_node_get(node, next);
+	const uint16_t idx = node->idx;
+
+	__rte_node_enqueue_prologue(graph, node, idx, nb_objs);
+
+	rte_memcpy(&node->objs[idx], objs, nb_objs * sizeof(void *));
+	node->idx = idx + nb_objs;
+}
+
+/**
+ * Enqueue only one obj to next node for further processing and
+ * set the next node to pending state in the circular buffer.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param node
+ *   Current node pointer.
+ * @param next
+ *   Relative next node index to enqueue objs.
+ * @param obj
+ *   Obj to enqueue.
+ */
+__rte_experimental
+static inline void
+rte_node_enqueue_x1(struct rte_graph *graph, struct rte_node *node,
+		    rte_edge_t next, void *obj)
+{
+	node = __rte_node_next_node_get(node, next);
+	uint16_t idx = node->idx;
+
+	__rte_node_enqueue_prologue(graph, node, idx, 1);
+
+	node->objs[idx++] = obj;
+	node->idx = idx;
+}
+
+/**
+ * Enqueue only two objs to next node for further processing and
+ * set the next node to pending state in the circular buffer.
+ * Same as rte_node_enqueue_x1 but enqueue two objs.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param node
+ *   Current node pointer.
+ * @param next
+ *   Relative next node index to enqueue objs.
+ * @param obj0
+ *   Obj to enqueue.
+ * @param obj1
+ *   Obj to enqueue.
+ */
+__rte_experimental
+static inline void
+rte_node_enqueue_x2(struct rte_graph *graph, struct rte_node *node,
+		    rte_edge_t next, void *obj0, void *obj1)
+{
+	node = __rte_node_next_node_get(node, next);
+	uint16_t idx = node->idx;
+
+	__rte_node_enqueue_prologue(graph, node, idx, 2);
+
+	node->objs[idx++] = obj0;
+	node->objs[idx++] = obj1;
+	node->idx = idx;
+}
+
+/**
+ * Enqueue only four objs to next node for further processing and
+ * set the next node to pending state in the circular buffer.
+ * Same as rte_node_enqueue_x1 but enqueue four objs.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param node
+ *   Current node pointer.
+ * @param next
+ *   Relative next node index to enqueue objs.
+ * @param obj0
+ *   1st obj to enqueue.
+ * @param obj1
+ *   2nd obj to enqueue.
+ * @param obj2
+ *   3rd obj to enqueue.
+ * @param obj3
+ *   4th obj to enqueue.
+ */
+__rte_experimental
+static inline void
+rte_node_enqueue_x4(struct rte_graph *graph, struct rte_node *node,
+		    rte_edge_t next, void *obj0, void *obj1, void *obj2,
+		    void *obj3)
+{
+	node = __rte_node_next_node_get(node, next);
+	uint16_t idx = node->idx;
+
+	__rte_node_enqueue_prologue(graph, node, idx, 4);
+
+	node->objs[idx++] = obj0;
+	node->objs[idx++] = obj1;
+	node->objs[idx++] = obj2;
+	node->objs[idx++] = obj3;
+	node->idx = idx;
+}
+
+/**
+ * Enqueue objs to multiple next nodes for further processing and
+ * set the next nodes to pending state in the circular buffer.
+ * objs[i] will be enqueued to nexts[i].
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param node
+ *   Current node pointer.
+ * @param nexts
+ *   List of relative next node indices to enqueue objs.
+ * @param objs
+ *   List of objs to enqueue.
+ * @param nb_objs
+ *   Number of objs to enqueue.
+ */
+__rte_experimental
+static inline void
+rte_node_enqueue_next(struct rte_graph *graph, struct rte_node *node,
+		      rte_edge_t *nexts, void **objs, uint16_t nb_objs)
+{
+	uint16_t i;
+
+	for (i = 0; i < nb_objs; i++)
+		rte_node_enqueue_x1(graph, node, nexts[i], objs[i]);
+}
+
+/**
+ * Get the stream of the next node to enqueue objs into.
+ * Once done updating the objs, rte_node_next_stream_put() must be called
+ * to move the next node to pending state.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param node
+ *   Current node pointer.
+ * @param next
+ *   Relative next node index to get stream.
+ * @param nb_objs
+ *   Requested free size of the next stream.
+ *
+ * @return
+ *   Valid next stream on success.
+ *
+ * @see rte_node_next_stream_put().
+ */
+__rte_experimental
+static inline void **
+rte_node_next_stream_get(struct rte_graph *graph, struct rte_node *node,
+			 rte_edge_t next, uint16_t nb_objs)
+{
+	node = __rte_node_next_node_get(node, next);
+	const uint16_t idx = node->idx;
+	uint16_t free_space = node->size - idx;
+
+	if (unlikely(free_space < nb_objs))
+		__rte_node_stream_alloc_size(graph, node, node->size + nb_objs);
+
+	return &node->objs[idx];
+}
+
+/**
+ * Put the next stream to pending state in the circular buffer
+ * for further processing. Should be invoked after rte_node_next_stream_get().
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param node
+ *   Current node pointer.
+ * @param next
+ *   Relative next node index.
+ * @param idx
+ *   Number of objs updated in the stream after getting the stream using
+ *   rte_node_next_stream_get.
+ *
+ * @see rte_node_next_stream_get().
+ */
+__rte_experimental
+static inline void
+rte_node_next_stream_put(struct rte_graph *graph, struct rte_node *node,
+			 rte_edge_t next, uint16_t idx)
+{
+	if (unlikely(!idx))
+		return;
+
+	node = __rte_node_next_node_get(node, next);
+	if (node->idx == 0)
+		__rte_node_enqueue_tail_update(graph, node);
+
+	node->idx += idx;
+}
+
+/**
+ * Home run scenario: enqueue all the objs of the current node to the next
+ * node in an optimized way by swapping the streams of both nodes.
+ * Performs well when the next node is not already in pending state.
+ * If the next node is already in pending state then the normal enqueue
+ * path is used.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param src
+ *   Current node pointer.
+ * @param next
+ *   Relative next node index.
+ */
+__rte_experimental
+static inline void
+rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
+			  rte_edge_t next)
+{
+	struct rte_node *dst = __rte_node_next_node_get(src, next);
+
+	/* Swap the pointers if dst doesn't hold valid objs */
+	if (likely(dst->idx == 0)) {
+		void **dobjs = dst->objs;
+		uint16_t dsz = dst->size;
+
+		dst->objs = src->objs;
+		dst->size = src->size;
+		src->objs = dobjs;
+		src->size = dsz;
+		dst->idx = src->idx;
+		__rte_node_enqueue_tail_update(graph, dst);
+	} else { /* Move the objects from src node to dst node */
+		rte_node_enqueue(graph, src, next, src->objs, src->idx);
+	}
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_WORKER_COMMON_H_ */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 02/13] graph: move node process into inline function
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
  2022-11-17  5:09 ` [PATCH v1 01/13] graph: split graph worker into common and default model Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2023-02-20 13:39   ` Jerin Jacob
  2022-11-17  5:09 ` [PATCH v1 03/13] graph: add macro to walk on graph circular buffer Zhirun Yan
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

The node process logic is a single, reusable block; move the code into an
inline function so that other worker models can share it.
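
The factored-out helper can be pictured with a standalone sketch (plain C, no DPDK headers; `demo_node` and `demo_node_process` are illustrative stand-ins for `struct rte_node` and `__rte_node_process`, and cycle counting is omitted): the node's process callback runs over its pending objects, stats are accounted when enabled, and the pending index resets to zero.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for struct rte_node: a process callback plus stats. */
struct demo_node {
	uint16_t (*process)(void **objs, uint16_t nb);
	void **objs;
	uint16_t idx;          /* number of pending objects */
	uint64_t total_calls;
	uint64_t total_objs;
};

/* Mirrors what __rte_node_process() centralizes: invoke the node's process
 * function on its pending objects, account stats when requested, then reset
 * the pending count so the stream can be refilled. */
static void demo_node_process(struct demo_node *node, int stats_on)
{
	uint16_t rc;

	if (stats_on) {
		rc = node->process(node->objs, node->idx);
		node->total_calls++;
		node->total_objs += rc;
	} else {
		node->process(node->objs, node->idx);
	}
	node->idx = 0;
}

/* Trivial process function: report every pending object as handled. */
static uint16_t demo_process_fn(void **objs, uint16_t nb)
{
	(void)objs;
	return nb;
}
```

Both the RTC walk and any future model can now call the same helper instead of duplicating the stats branch.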

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_rtc.h     | 18 +---------------
 lib/graph/rte_graph_worker_common.h | 33 +++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
index fb58730bde..c80b0ce962 100644
--- a/lib/graph/rte_graph_model_rtc.h
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -16,9 +16,6 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	const rte_node_t mask = graph->cir_mask;
 	uint32_t head = graph->head;
 	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
 
 	/*
 	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
@@ -37,20 +34,7 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	 */
 	while (likely(head != graph->tail)) {
 		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-		node->idx = 0;
+		__rte_node_process(graph, node);
 		head = likely((int32_t)head > 0) ? head & mask : head;
 	}
 	graph->tail = 0;
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 91a5de7fa4..b7b2bb958c 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -121,6 +121,39 @@ void __rte_node_stream_alloc_size(struct rte_graph *graph,
 
 /* Fast path helper functions */
 
+/**
+ * @internal
+ *
+ * Call the process function of the given node and update its stats,
+ * if the stats feature is enabled.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ * @param node
+ *   Pointer to the node object to be processed.
+ */
+static __rte_always_inline void
+__rte_node_process(struct rte_graph *graph, struct rte_node *node)
+{
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+	objs = node->objs;
+	rte_prefetch0(objs);
+
+	if (rte_graph_has_stats_feature()) {
+		start = rte_rdtsc();
+		rc = node->process(graph, node, objs, node->idx);
+		node->total_cycles += rte_rdtsc() - start;
+		node->total_calls++;
+		node->total_objs += rc;
+	} else {
+		node->process(graph, node, objs, node->idx);
+	}
+	node->idx = 0;
+}
+
 /**
  * @internal
  *
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 03/13] graph: add macro to walk on graph circular buffer
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
  2022-11-17  5:09 ` [PATCH v1 01/13] graph: split graph worker into common and default model Zhirun Yan
  2022-11-17  5:09 ` [PATCH v1 02/13] graph: move node process into inline function Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2023-02-20 13:45   ` Jerin Jacob
  2022-11-17  5:09 ` [PATCH v1 04/13] graph: add get/set graph worker model APIs Zhirun Yan
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Walking the graph circular buffer is a common operation; wrap it in a macro
so that it can be reused by other worker models.
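
The walk order the macro encodes can be reproduced in a standalone sketch (plain C, no DPDK headers; `DEMO_WALK_NODE` and `demo_walk` are illustrative names): source nodes sit at negative offsets before `cir_start`, so the head starts negative, walks up through the pending streams, and only begins wrapping with the power-of-two mask once it goes non-negative.

```c
#include <stdint.h>

/* Standalone model of the graph circular-buffer walk. buf points at
 * cir_start; entries before it are the source nodes. */
#define DEMO_WALK_NODE(buf, head, tail, mask, node)                     \
	for ((node) = (buf)[(int32_t)(head)];                           \
	     (head) != (tail);                                          \
	     (head)++,                                                  \
	     (node) = (buf)[(int32_t)(head)],                           \
	     (head) = ((int32_t)(head) > 0) ? (head) & (mask) : (head))

/* Collect the visit order into out[]; returns the number of visits. */
static int demo_walk(int *buf, uint32_t head, uint32_t tail, uint32_t mask,
		     int *out)
{
	int node, n = 0;

	DEMO_WALK_NODE(buf, head, tail, mask, node)
		out[n++] = node;
	return n;
}
```

With two source nodes (head = -2) and a tail of 3, the walk visits the two source entries first and then the three pending streams from `cir_start` onward.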

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_rtc.h     | 23 ++---------------------
 lib/graph/rte_graph_worker_common.h | 23 +++++++++++++++++++++++
 2 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
index c80b0ce962..5474b06063 100644
--- a/lib/graph/rte_graph_model_rtc.h
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -12,30 +12,11 @@
 static inline void
 rte_graph_walk_rtc(struct rte_graph *graph)
 {
-	const rte_graph_off_t *cir_start = graph->cir_start;
-	const rte_node_t mask = graph->cir_mask;
 	uint32_t head = graph->head;
 	struct rte_node *node;
 
-	/*
-	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
-	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
-	 * in a circular buffer fashion.
-	 *
-	 *	+-----+ <= cir_start - head [number of source nodes]
-	 *	|     |
-	 *	| ... | <= source nodes
-	 *	|     |
-	 *	+-----+ <= cir_start [head = 0] [tail = 0]
-	 *	|     |
-	 *	| ... | <= pending streams
-	 *	|     |
-	 *	+-----+ <= cir_start + mask
-	 */
-	while (likely(head != graph->tail)) {
-		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+	rte_graph_walk_node(graph, head, node)
 		__rte_node_process(graph, node);
-		head = likely((int32_t)head > 0) ? head & mask : head;
-	}
+
 	graph->tail = 0;
 }
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index b7b2bb958c..df33204336 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -121,6 +121,29 @@ void __rte_node_stream_alloc_size(struct rte_graph *graph,
 
 /* Fast path helper functions */
 
+/**
+ * Macro to walk on the source node(s) ((cir_start - head) -> cir_start)
+ * and then on the pending streams
+ * (cir_start -> (cir_start + mask) -> cir_start)
+ * in a circular buffer fashion.
+ *
+ *	+-----+ <= cir_start - head [number of source nodes]
+ *	|     |
+ *	| ... | <= source nodes
+ *	|     |
+ *	+-----+ <= cir_start [head = 0] [tail = 0]
+ *	|     |
+ *	| ... | <= pending streams
+ *	|     |
+ *	+-----+ <= cir_start + mask
+ */
+#define rte_graph_walk_node(graph, head, node)                                         \
+	for ((node) = RTE_PTR_ADD((graph), (graph)->cir_start[(int32_t)(head)]);        \
+	     likely((head) != (graph)->tail);                                           \
+	     (head)++,                                                                  \
+	     (node) = RTE_PTR_ADD((graph), (graph)->cir_start[(int32_t)(head)]),        \
+	     (head) = likely((int32_t)(head) > 0) ? (head) & (graph)->cir_mask : (head))
+
 /**
  * @internal
  *
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (2 preceding siblings ...)
  2022-11-17  5:09 ` [PATCH v1 03/13] graph: add macro to walk on graph circular buffer Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2022-12-06  3:35   ` [EXT] " Kiran Kumar Kokkilagadda
  2023-02-20 13:50   ` Jerin Jacob
  2022-11-17  5:09 ` [PATCH v1 05/13] graph: introduce core affinity API Zhirun Yan
                   ` (10 subsequent siblings)
  14 siblings, 2 replies; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add new get/set APIs to configure the graph worker model, which determines
the model used when the graph runs.
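
The set/get semantics in this patch can be sketched standalone (plain C, no DPDK headers; `demo_model_set`/`demo_model_get` are illustrative stand-ins for `rte_graph_worker_model_set`/`_get`): an out-of-range value is rejected and the model falls back to the default, matching the `fail:` path in the patch.

```c
enum demo_worker_model {
	DEMO_MODEL_DEFAULT = 0,
	DEMO_MODEL_RTC,
	DEMO_MODEL_GENERIC,
	DEMO_MODEL_MAX,
};

static enum demo_worker_model demo_model = DEMO_MODEL_DEFAULT;

/* Reject out-of-range values and fall back to the default model on
 * failure. Like the patch, this performs no locking and is only safe
 * to call before the graphs start running. */
static int demo_model_set(int model)
{
	if (model < 0 || model >= DEMO_MODEL_MAX) {
		demo_model = DEMO_MODEL_DEFAULT;
		return -1;
	}
	demo_model = model;
	return 0;
}

static enum demo_worker_model demo_model_get(void)
{
	return demo_model;
}
```

A failed set is not a no-op: it resets the model to the default, so callers should check the return value before starting graph walks.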

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_worker.h        | 51 +++++++++++++++++++++++++++++
 lib/graph/rte_graph_worker_common.h | 13 ++++++++
 lib/graph/version.map               |  3 ++
 3 files changed, 67 insertions(+)

diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index 54d1390786..a0ea0df153 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -1,5 +1,56 @@
 #include "rte_graph_model_rtc.h"
 
+static enum rte_graph_worker_model worker_model = RTE_GRAPH_MODEL_DEFAULT;
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Set the graph worker model.
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ *    before the graph starts running.
+ *
+ * @param model
+ *   The graph worker model to set.
+ *
+ * @return
+ *   0 on success, -1 otherwise.
+ */
+__rte_experimental
+static inline int
+rte_graph_worker_model_set(enum rte_graph_worker_model model)
+{
+	if (model >= RTE_GRAPH_MODEL_MAX)
+		goto fail;
+
+	worker_model = model;
+	return 0;
+
+fail:
+	worker_model = RTE_GRAPH_MODEL_DEFAULT;
+	return -1;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Get the graph worker model.
+ *
+ * @return
+ *   Graph worker model on success.
+ */
+__rte_experimental
+static inline
+enum rte_graph_worker_model
+rte_graph_worker_model_get(void)
+{
+	return worker_model;
+}
+
 /**
  * Perform graph walk on the circular buffer and invoke the process function
  * of the nodes and collect the stats.
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index df33204336..507a344afd 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -86,6 +86,19 @@ struct rte_node {
 	struct rte_node *nodes[] __rte_cache_min_aligned; /**< Next nodes. */
 } __rte_cache_aligned;
 
+
+/** Graph worker models */
+enum rte_graph_worker_model {
+#define WORKER_MODEL_DEFAULT "default"
+	RTE_GRAPH_MODEL_DEFAULT = 0,
+#define WORKER_MODEL_RTC "rtc"
+	RTE_GRAPH_MODEL_RTC,
+#define WORKER_MODEL_GENERIC "generic"
+	RTE_GRAPH_MODEL_GENERIC,
+	RTE_GRAPH_MODEL_MAX,
+};
+
 /**
  * @internal
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 13b838752d..eea73ec9ca 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -43,5 +43,8 @@ EXPERIMENTAL {
 	rte_node_next_stream_put;
 	rte_node_next_stream_move;
 
+	rte_graph_worker_model_set;
+	rte_graph_worker_model_get;
+
 	local: *;
 };
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 05/13] graph: introduce core affinity API
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (3 preceding siblings ...)
  2022-11-17  5:09 ` [PATCH v1 04/13] graph: add get/set graph worker model APIs Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2023-02-20 14:05   ` Jerin Jacob
  2022-11-17  5:09 ` [PATCH v1 06/13] graph: introduce graph " Zhirun Yan
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

1. Add lcore_id to struct node to hold the affinity core id.
2. Implement rte_node_model_generic_set_lcore_affinity() to bind a node
   to one lcore.
3. Update the version map with the new public graph API.
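
The name-based lookup this patch performs can be sketched standalone (plain C, no DPDK headers or STAILQ; `demo_node_reg` and `demo_set_lcore_affinity` are illustrative stand-ins, an array replaces the node list, and the graph spinlock is omitted): find the node by name, validate the lcore id, and record the affinity, with `DEMO_MAX_LCORE` playing the role of RTE_MAX_LCORE as the "no affinity" sentinel.

```c
#include <string.h>

#define DEMO_NAMESIZE 64
#define DEMO_MAX_LCORE 128

struct demo_node_reg {
	char name[DEMO_NAMESIZE];
	unsigned int lcore_id;   /* DEMO_MAX_LCORE == no affinity set */
};

/* Look the node up by name and record the preferred lcore; reject
 * out-of-range lcore ids and unknown node names with -1 (the real API
 * returns -EINVAL). */
static int demo_set_lcore_affinity(struct demo_node_reg *nodes, int n,
				   const char *name, unsigned int lcore_id)
{
	int i;

	if (lcore_id >= DEMO_MAX_LCORE)
		return -1;
	for (i = 0; i < n; i++) {
		if (strncmp(nodes[i].name, name, DEMO_NAMESIZE) == 0) {
			nodes[i].lcore_id = lcore_id;
			return 0;
		}
	}
	return -1;
}
```

The real implementation additionally takes the graph spinlock around the list walk, since node registration may race with affinity updates.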

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_private.h           |  1 +
 lib/graph/meson.build               |  1 +
 lib/graph/node.c                    |  1 +
 lib/graph/rte_graph_model_generic.c | 31 +++++++++++++++++++++
 lib/graph/rte_graph_model_generic.h | 43 +++++++++++++++++++++++++++++
 lib/graph/version.map               |  2 ++
 6 files changed, 79 insertions(+)
 create mode 100644 lib/graph/rte_graph_model_generic.c
 create mode 100644 lib/graph/rte_graph_model_generic.h

diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index f9a85c8926..627090f802 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -49,6 +49,7 @@ struct node {
 	STAILQ_ENTRY(node) next;      /**< Next node in the list. */
 	char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
 	uint64_t flags;		      /**< Node configuration flag. */
+	unsigned int lcore_id;        /**< Lcore ID that the node runs on. */
 	rte_node_process_t process;   /**< Node process function. */
 	rte_node_init_t init;         /**< Node init function. */
 	rte_node_fini_t fini;	      /**< Node fini function. */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index c7327549e8..8c8b11ed27 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -14,6 +14,7 @@ sources = files(
         'graph_debug.c',
         'graph_stats.c',
         'graph_populate.c',
+        'rte_graph_model_generic.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
diff --git a/lib/graph/node.c b/lib/graph/node.c
index fc6345de07..8ad4b3cbeb 100644
--- a/lib/graph/node.c
+++ b/lib/graph/node.c
@@ -100,6 +100,7 @@ __rte_node_register(const struct rte_node_register *reg)
 			goto free;
 	}
 
+	node->lcore_id = RTE_MAX_LCORE;
 	node->id = node_id++;
 
 	/* Add the node at tail */
diff --git a/lib/graph/rte_graph_model_generic.c b/lib/graph/rte_graph_model_generic.c
new file mode 100644
index 0000000000..54ff659c7b
--- /dev/null
+++ b/lib/graph/rte_graph_model_generic.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "graph_private.h"
+#include "rte_graph_model_generic.h"
+
+int
+rte_node_model_generic_set_lcore_affinity(const char *name, unsigned int lcore_id)
+{
+	struct node *node;
+	int ret = -EINVAL;
+
+	if (lcore_id >= RTE_MAX_LCORE)
+		return ret;
+
+	graph_spinlock_lock();
+
+	STAILQ_FOREACH(node, node_list_head_get(), next) {
+		if (strncmp(node->name, name, RTE_NODE_NAMESIZE) == 0) {
+			node->lcore_id = lcore_id;
+			ret = 0;
+			break;
+		}
+	}
+
+	graph_spinlock_unlock();
+
+	return ret;
+}
+
diff --git a/lib/graph/rte_graph_model_generic.h b/lib/graph/rte_graph_model_generic.h
new file mode 100644
index 0000000000..20ca48a9e3
--- /dev/null
+++ b/lib/graph/rte_graph_model_generic.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _RTE_GRAPH_MODEL_GENERIC_H_
+#define _RTE_GRAPH_MODEL_GENERIC_H_
+
+/**
+ * @file rte_graph_model_generic.h
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * This API allows a worker thread to walk over a graph and nodes to create,
+ * process, enqueue and move streams of objects to the next nodes.
+ */
+#include "rte_graph_worker_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Set lcore affinity to the node.
+ *
+ * @param name
+ *   Valid node name. In the case of a cloned node, the name will be
+ *   "parent node name" + "-" + name.
+ * @param lcore_id
+ *   The lcore ID value.
+ *
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_node_model_generic_set_lcore_affinity(const char *name, unsigned int lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_MODEL_GENERIC_H_ */
diff --git a/lib/graph/version.map b/lib/graph/version.map
index eea73ec9ca..33ff055be6 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -46,5 +46,7 @@ EXPERIMENTAL {
 	rte_graph_worker_model_set;
 	rte_graph_worker_model_get;
 
+	rte_node_model_generic_set_lcore_affinity;
+
 	local: *;
 };
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 06/13] graph: introduce graph affinity API
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (4 preceding siblings ...)
  2022-11-17  5:09 ` [PATCH v1 05/13] graph: introduce core affinity API Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2023-02-20 14:07   ` Jerin Jacob
  2022-11-17  5:09 ` [PATCH v1 07/13] graph: introduce graph clone API for other worker core Zhirun Yan
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add lcore_id to struct graph to hold the affinity core id where the graph
will run. Add bind/unbind APIs to set/unset the graph affinity attribute.
lcore_id defaults to RTE_MAX_LCORE, which means the attribute is disabled.

Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 59 +++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |  2 ++
 lib/graph/rte_graph.h     | 22 +++++++++++++++
 lib/graph/version.map     |  2 ++
 4 files changed, 85 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 3a617cc369..a8d8eb633e 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -245,6 +245,64 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
 	return graph_mem_fixup_node_ctx(graph);
 }
 
+static __rte_always_inline bool
+graph_src_node_avail(struct graph *graph)
+{
+	struct graph_node *graph_node;
+
+	STAILQ_FOREACH(graph_node, &graph->node_list, next)
+		if ((graph_node->node->flags & RTE_NODE_SOURCE_F) &&
+		    (graph_node->node->lcore_id == RTE_MAX_LCORE ||
+		     graph->lcore_id == graph_node->node->lcore_id))
+			return true;
+
+	return false;
+}
+
+int
+rte_graph_bind_core(rte_graph_t id, int lcore)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	if (!rte_lcore_is_enabled(lcore))
+		SET_ERR_JMP(ENOLINK, fail,
+			    "lcore %d not enabled\n",
+			    lcore);
+
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = lcore;
+	graph->socket = rte_lcore_to_socket_id(lcore);
+
+	/* check the availability of source node */
+	if (!graph_src_node_avail(graph))
+		graph->graph->head = 0;
+
+	return 0;
+
+fail:
+	return -rte_errno;
+}
+
+void
+rte_graph_unbind_core(rte_graph_t id)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = RTE_MAX_LCORE;
+
+fail:
+	return;
+}
+
 struct rte_graph *
 rte_graph_lookup(const char *name)
 {
@@ -328,6 +386,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->lcore_id = RTE_MAX_LCORE;
 
 	/* Allocate the Graph fast path memory and populate the data */
 	if (graph_fp_mem_create(graph))
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 627090f802..7326975a86 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -97,6 +97,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	unsigned int lcore_id;
+	/**< Lcore identifier where the graph prefers to run. */
 	size_t mem_sz;
 	/**< Memory size of the graph. */
 	int socket;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index b32c4bc217..1d938f6979 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -280,6 +280,28 @@ char *rte_graph_id_to_name(rte_graph_t id);
 __rte_experimental
 int rte_graph_export(const char *name, FILE *f);
 
+/**
+ * Set the graph lcore affinity attribute.
+ *
+ * @param id
+ *   Graph id to get the pointer of the graph object.
+ * @param lcore
+ *   The lcore where the graph will run.
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_graph_bind_core(rte_graph_t id, int lcore);
+
+/**
+ * Unset the graph lcore affinity attribute.
+ *
+ * @param id
+ *   Graph id to get the pointer of the graph object.
+ */
+__rte_experimental
+void rte_graph_unbind_core(rte_graph_t id);
+
 /**
  * Get graph object from its name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 33ff055be6..1c599b5b47 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -18,6 +18,8 @@ EXPERIMENTAL {
 	rte_graph_node_get_by_name;
 	rte_graph_obj_dump;
 	rte_graph_walk;
+	rte_graph_bind_core;
+	rte_graph_unbind_core;
 
 	rte_graph_cluster_stats_create;
 	rte_graph_cluster_stats_destroy;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 07/13] graph: introduce graph clone API for other worker core
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (5 preceding siblings ...)
  2022-11-17  5:09 ` [PATCH v1 06/13] graph: introduce graph " Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2022-11-17  5:09 ` [PATCH v1 08/13] graph: introduce stream moving cross cores Zhirun Yan
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch adds a graph API to clone a graph object for a specified worker
core. The cloned graph also clones all of the parent's nodes.
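
The naming scheme used by the clone ("parent graph name" + "-" + name) can be sketched standalone (plain C; `demo_clone_name` is an illustrative stand-in for the patch's `clone_name`, and `snprintf` replaces the chained `rte_strscpy` calls): the clone fails with an error when the combined name does not fit the fixed name buffer.

```c
#include <stdio.h>

#define DEMO_GRAPH_NAMESIZE 64

/* Build "parent-suffix" into dst, failing if the result would be
 * truncated (the real implementation sets rte_errno to E2BIG). */
static int demo_clone_name(char *dst, const char *parent, const char *suffix)
{
	int n = snprintf(dst, DEMO_GRAPH_NAMESIZE, "%s-%s", parent, suffix);

	if (n < 0 || n >= DEMO_GRAPH_NAMESIZE)
		return -1;
	return 0;
}
```

Because the clone name always embeds the parent name, duplicate detection via `rte_graph_from_name()` naturally scopes clone names under their parent.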

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 110 ++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |   2 +
 lib/graph/rte_graph.h     |  20 +++++++
 lib/graph/version.map     |   1 +
 4 files changed, 133 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index a8d8eb633e..17a9c87032 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -386,6 +386,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->parent_id = RTE_GRAPH_ID_INVALID;
 	graph->lcore_id = RTE_MAX_LCORE;
 
 	/* Allocate the Graph fast path memory and populate the data */
@@ -447,6 +448,115 @@ rte_graph_destroy(rte_graph_t id)
 	return rc;
 }
 
+static int
+clone_name(struct graph *graph, struct graph *parent_graph, const char *name)
+{
+	ssize_t sz, rc;
+
+#define SZ RTE_GRAPH_NAMESIZE
+	rc = rte_strscpy(graph->name, parent_graph->name, SZ);
+	if (rc < 0)
+		goto fail;
+	sz = rc;
+	rc = rte_strscpy(graph->name + sz, "-", RTE_MAX((int16_t)(SZ - sz), 0));
+	if (rc < 0)
+		goto fail;
+	sz += rc;
+	sz = rte_strscpy(graph->name + sz, name, RTE_MAX((int16_t)(SZ - sz), 0));
+	if (sz < 0)
+		goto fail;
+
+	return 0;
+fail:
+	rte_errno = E2BIG;
+	return -rte_errno;
+}
+
+static rte_graph_t
+graph_clone(struct graph *parent_graph, const char *name)
+{
+	struct graph_node *graph_node;
+	struct graph *graph;
+
+	graph_spinlock_lock();
+
+	/* Don't allow cloning a graph from an already cloned graph */
+	if (parent_graph->parent_id != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, fail, "A cloned graph is not allowed to be cloned");
+
+	/* Create graph object */
+	graph = calloc(1, sizeof(*graph));
+	if (graph == NULL)
+		SET_ERR_JMP(ENOMEM, fail, "Failed to calloc cloned graph object");
+
+	/* Naming ceremony of the new graph. name is parent_graph->name + "-" + name */
+	if (clone_name(graph, parent_graph, name))
+		goto free;
+
+	/* Check for existence of duplicate graph */
+	if (rte_graph_from_name(graph->name) != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, free, "Found duplicate graph %s",
+			    graph->name);
+
+	/* Clone nodes from parent graph firstly */
+	STAILQ_INIT(&graph->node_list);
+	STAILQ_FOREACH(graph_node, &parent_graph->node_list, next) {
+		if (graph_node_add(graph, graph_node->node))
+			goto graph_cleanup;
+	}
+
+	/* Just update adjacency list of all nodes in the graph */
+	if (graph_adjacency_list_update(graph))
+		goto graph_cleanup;
+
+	/* Initialize the graph object */
+	graph->src_node_count = parent_graph->src_node_count;
+	graph->node_count = parent_graph->node_count;
+	graph->parent_id = parent_graph->id;
+	graph->lcore_id = parent_graph->lcore_id;
+	graph->socket = parent_graph->socket;
+	graph->id = graph_id;
+
+	/* Allocate the Graph fast path memory and populate the data */
+	if (graph_fp_mem_create(graph))
+		goto graph_cleanup;
+
+	/* Call init() of the all the nodes in the graph */
+	if (graph_node_init(graph))
+		goto graph_mem_destroy;
+
+	/* All good, Lets add the graph to the list */
+	graph_id++;
+	STAILQ_INSERT_TAIL(&graph_list, graph, next);
+
+	graph_spinlock_unlock();
+	return graph->id;
+
+graph_mem_destroy:
+	graph_fp_mem_destroy(graph);
+graph_cleanup:
+	graph_cleanup(graph);
+free:
+	free(graph);
+fail:
+	graph_spinlock_unlock();
+	return RTE_GRAPH_ID_INVALID;
+}
+
+rte_graph_t
+rte_graph_clone(rte_graph_t id, const char *name)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			return graph_clone(graph, name);
+
+fail:
+	return RTE_GRAPH_ID_INVALID;
+}
+
 rte_graph_t
 rte_graph_from_name(const char *name)
 {
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 7326975a86..c1f2aadd42 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -97,6 +97,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	rte_graph_t parent_id;
+	/**< Parent graph identifier. */
 	unsigned int lcore_id;
+	/**< Lcore identifier where the graph prefers to run. */
 	size_t mem_sz;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index 1d938f6979..210e125661 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -242,6 +242,26 @@ rte_graph_t rte_graph_create(const char *name, struct rte_graph_param *prm);
 __rte_experimental
 int rte_graph_destroy(rte_graph_t id);
 
+/**
+ * Clone Graph.
+ *
+ * Clone a graph from static graph (graph created from rte_graph_create). And
+ * all cloned graphs attached to the parent graph MUST be destroyed together
+ * for fast schedule design limitation (stop ALL graph walk firstly).
+ *
+ * @param id
+ *   Static graph id to clone from.
+ * @param name
+ *   Name of the new graph. The library prepends the parent graph name to
+ *   the user-specified name; the final graph name will be
+ *   "parent graph name" + "-" + name.
+ *
+ * @return
+ *   Valid graph id on success, RTE_GRAPH_ID_INVALID otherwise.
+ */
+__rte_experimental
+rte_graph_t rte_graph_clone(rte_graph_t id, const char *name);
+
 /**
  * Get graph id from graph name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 1c599b5b47..c4d8c2c271 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -7,6 +7,7 @@ EXPERIMENTAL {
 
 	rte_graph_create;
 	rte_graph_destroy;
+	rte_graph_clone;
 	rte_graph_dump;
 	rte_graph_export;
 	rte_graph_from_name;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 08/13] graph: introduce stream moving cross cores
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (6 preceding siblings ...)
  2022-11-17  5:09 ` [PATCH v1 07/13] graph: introduce graph clone API for other worker core Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2023-02-20 14:17   ` Jerin Jacob
  2022-11-17  5:09 ` [PATCH v1 09/13] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the key functions that allow a worker thread to
enqueue and move streams of objects to next nodes running on different
cores.

1. Add graph_sched_wq_node to hold the graph scheduling workqueue node
stream.
2. Add workqueue helper functions to create/destroy/enqueue/dequeue.
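
The dispatch unit introduced here can be sketched standalone (plain C, no DPDK headers; `demo_wq_node` and `demo_wq_fill` are illustrative stand-ins for `struct graph_sched_wq_node` and the enqueue path, and the workqueue/ring transport between cores is omitted): each unit carries the destination node offset plus a burst of object pointers.

```c
#include <stdint.h>

#define DEMO_BURST_SIZE 8

/* Mirrors struct graph_sched_wq_node: the node offset identifies the
 * destination node inside the peer graph, and objs[] carries one burst
 * of object pointers across cores. */
struct demo_wq_node {
	uint32_t node_off;
	uint16_t nb_objs;
	void *objs[DEMO_BURST_SIZE];
};

/* Pack up to DEMO_BURST_SIZE objects into one dispatch unit; returns how
 * many were consumed so callers can loop over larger streams. */
static uint16_t demo_wq_fill(struct demo_wq_node *wq, uint32_t node_off,
			     void **objs, uint16_t nb)
{
	uint16_t i;
	uint16_t n = nb > DEMO_BURST_SIZE ? DEMO_BURST_SIZE : nb;

	wq->node_off = node_off;
	wq->nb_objs = n;
	for (i = 0; i < n; i++)
		wq->objs[i] = objs[i];
	return n;
}
```

Keeping the unit a fixed burst size lets the real implementation cache-align it and pass it through a lockless ring without variable-length allocations.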

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c                   |   1 +
 lib/graph/graph_populate.c          |   1 +
 lib/graph/graph_private.h           |  39 ++++++++
 lib/graph/meson.build               |   2 +-
 lib/graph/rte_graph_model_generic.c | 145 ++++++++++++++++++++++++++++
 lib/graph/rte_graph_model_generic.h |  35 +++++++
 lib/graph/rte_graph_worker_common.h |  18 ++++
 7 files changed, 240 insertions(+), 1 deletion(-)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 17a9c87032..8ea0daaa35 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -275,6 +275,7 @@ rte_graph_bind_core(rte_graph_t id, int lcore)
 			break;
 
 	graph->lcore_id = lcore;
+	graph->graph->lcore_id = graph->lcore_id;
 	graph->socket = rte_lcore_to_socket_id(lcore);
 
 	/* check the availability of source node */
diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
index 102fd6c29b..26f9670406 100644
--- a/lib/graph/graph_populate.c
+++ b/lib/graph/graph_populate.c
@@ -84,6 +84,7 @@ graph_nodes_populate(struct graph *_graph)
 		}
 		node->id = graph_node->node->id;
 		node->parent_id = pid;
+		node->lcore_id = graph_node->node->lcore_id;
 		nb_edges = graph_node->node->nb_edges;
 		node->nb_edges = nb_edges;
 		off += sizeof(struct rte_node);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index c1f2aadd42..f58d0d1d63 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -59,6 +59,18 @@ struct node {
 	char next_nodes[][RTE_NODE_NAMESIZE]; /**< Names of next nodes. */
 };
 
+/**
+ * @internal
+ *
+ * Structure that holds the graph scheduling workqueue node stream.
+ * Used for generic worker model.
+ */
+struct graph_sched_wq_node {
+	rte_graph_off_t node_off;
+	uint16_t nb_objs;
+	void *objs[RTE_GRAPH_BURST_SIZE];
+} __rte_cache_aligned;
+
 /**
  * @internal
  *
@@ -349,4 +361,31 @@ void graph_dump(FILE *f, struct graph *g);
  */
 void node_dump(FILE *f, struct node *n);
 
+/**
+ * @internal
+ *
+ * Create the graph schedule work queue. Due to a fast-schedule design limitation,
+ * all cloned graphs attached to the parent graph MUST be destroyed together.
+ *
+ * @param _graph
+ *   The graph object
+ * @param _parent_graph
+ *   The parent graph object which holds the run-queue head.
+ *
+ * @return
+ *   - 0: Success.
+ *   - <0: Graph schedule work queue related error.
+ */
+int graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph);
+
+/**
+ * @internal
+ *
+ * Destroy the graph schedule work queue.
+ *
+ * @param _graph
+ *   The graph object
+ */
+void graph_sched_wq_destroy(struct graph *_graph);
+
 #endif /* _RTE_GRAPH_PRIVATE_H_ */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 8c8b11ed27..f93ab6fdcb 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -18,4 +18,4 @@ sources = files(
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
-deps += ['eal']
+deps += ['eal', 'mempool', 'ring']
diff --git a/lib/graph/rte_graph_model_generic.c b/lib/graph/rte_graph_model_generic.c
index 54ff659c7b..c862237432 100644
--- a/lib/graph/rte_graph_model_generic.c
+++ b/lib/graph/rte_graph_model_generic.c
@@ -5,6 +5,151 @@
 #include "graph_private.h"
 #include "rte_graph_model_generic.h"
 
+int
+graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph)
+{
+	struct rte_graph *parent_graph = _parent_graph->graph;
+	struct rte_graph *graph = _graph->graph;
+	unsigned int wq_size;
+
+	wq_size = GRAPH_SCHED_WQ_SIZE(graph->nb_nodes);
+	wq_size = rte_align32pow2(wq_size + 1);
+
+	graph->wq = rte_ring_create(graph->name, wq_size, graph->socket,
+				    RING_F_SC_DEQ);
+	if (graph->wq == NULL)
+		SET_ERR_JMP(EIO, fail, "Failed to allocate graph WQ");
+
+	graph->mp = rte_mempool_create(graph->name, wq_size,
+				       sizeof(struct graph_sched_wq_node),
+				       0, 0, NULL, NULL, NULL, NULL,
+				       graph->socket, MEMPOOL_F_SP_PUT);
+	if (graph->mp == NULL)
+		SET_ERR_JMP(EIO, fail_mp,
+			    "Failed to allocate graph WQ schedule entry");
+
+	graph->lcore_id = _graph->lcore_id;
+
+	if (parent_graph->rq == NULL) {
+		parent_graph->rq = &parent_graph->rq_head;
+		SLIST_INIT(parent_graph->rq);
+	}
+
+	graph->rq = parent_graph->rq;
+	SLIST_INSERT_HEAD(graph->rq, graph, rq_next);
+
+	return 0;
+
+fail_mp:
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+fail:
+	return -rte_errno;
+}
+
+void
+graph_sched_wq_destroy(struct graph *_graph)
+{
+	struct rte_graph *graph = _graph->graph;
+
+	if (graph == NULL)
+		return;
+
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+
+	rte_mempool_free(graph->mp);
+	graph->mp = NULL;
+}
+
+static __rte_always_inline bool
+__graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	uint16_t off = 0;
+	uint16_t size;
+
+submit_again:
+	if (rte_mempool_get(graph->mp, (void **)&wq_node) < 0)
+		goto fallback;
+
+	size = RTE_MIN(node->idx, RTE_DIM(wq_node->objs));
+	wq_node->node_off = node->off;
+	wq_node->nb_objs = size;
+	rte_memcpy(wq_node->objs, &node->objs[off], size * sizeof(void *));
+
+	while (rte_ring_mp_enqueue_bulk_elem(graph->wq, (void *)&wq_node,
+					  sizeof(wq_node), 1, NULL) == 0)
+		rte_pause();
+
+	off += size;
+	node->idx -= size;
+	if (node->idx > 0)
+		goto submit_again;
+
+	return true;
+
+fallback:
+	if (off != 0)
+		memmove(&node->objs[0], &node->objs[off],
+			node->idx * sizeof(void *));
+
+	return false;
+}
+
+bool __rte_noinline
+__rte_graph_sched_node_enqueue(struct rte_node *node,
+			       struct rte_graph_rq_head *rq)
+{
+	const unsigned int lcore_id = node->lcore_id;
+	struct rte_graph *graph;
+
+	SLIST_FOREACH(graph, rq, rq_next)
+		if (graph->lcore_id == lcore_id)
+			break;
+
+	return graph != NULL ? __graph_sched_node_enqueue(node, graph) : false;
+}
+
+void __rte_noinline
+__rte_graph_sched_wq_process(struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	struct rte_mempool *mp = graph->mp;
+	struct rte_ring *wq = graph->wq;
+	uint16_t idx, free_space;
+	struct rte_node *node;
+	unsigned int i, n;
+	struct graph_sched_wq_node *wq_nodes[32];
+
+	n = rte_ring_sc_dequeue_burst_elem(wq, wq_nodes, sizeof(wq_nodes[0]),
+					   RTE_DIM(wq_nodes), NULL);
+	if (n == 0)
+		return;
+
+	for (i = 0; i < n; i++) {
+		wq_node = wq_nodes[i];
+		node = RTE_PTR_ADD(graph, wq_node->node_off);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		idx = node->idx;
+		free_space = node->size - idx;
+
+		if (unlikely(free_space < wq_node->nb_objs))
+			__rte_node_stream_alloc_size(graph, node, node->size + wq_node->nb_objs);
+
+		memmove(&node->objs[idx], wq_node->objs, wq_node->nb_objs * sizeof(void *));
+		memset(wq_node->objs, 0, wq_node->nb_objs * sizeof(void *));
+		node->idx = idx + wq_node->nb_objs;
+
+		__rte_node_process(graph, node);
+
+		wq_node->nb_objs = 0;
+		node->idx = 0;
+	}
+
+	rte_mempool_put_bulk(mp, (void **)wq_nodes, n);
+}
+
 int
 rte_node_model_generic_set_lcore_affinity(const char *name, unsigned int lcore_id)
 {
diff --git a/lib/graph/rte_graph_model_generic.h b/lib/graph/rte_graph_model_generic.h
index 20ca48a9e3..5715fc8ffb 100644
--- a/lib/graph/rte_graph_model_generic.h
+++ b/lib/graph/rte_graph_model_generic.h
@@ -15,12 +15,47 @@
  * This API allows a worker thread to walk over a graph and nodes to create,
  * process, enqueue and move streams of objects to the next nodes.
  */
+#include <rte_errno.h>
+#include <rte_mempool.h>
+#include <rte_memzone.h>
+#include <rte_ring.h>
+
 #include "rte_graph_worker_common.h"
 
 #ifdef __cplusplus
 extern "C" {
 #endif
 
+#define GRAPH_SCHED_WQ_SIZE_MULTIPLIER  8
+#define GRAPH_SCHED_WQ_SIZE(nb_nodes)   \
+	((typeof(nb_nodes))((nb_nodes) * GRAPH_SCHED_WQ_SIZE_MULTIPLIER))
+
+/**
+ * @internal
+ *
+ * Schedule the node to the right graph's work queue.
+ *
+ * @param node
+ *   Pointer to the scheduled node object.
+ * @param rq
+ *   Pointer to the scheduled run-queue for all graphs.
+ *
+ * @return
+ *   True on success, false otherwise.
+ */
+bool __rte_graph_sched_node_enqueue(struct rte_node *node,
+				    struct rte_graph_rq_head *rq);
+
+/**
+ * @internal
+ *
+ * Process all nodes (streams) in the graph's work queue.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ */
+void __rte_noinline __rte_graph_sched_wq_process(struct rte_graph *graph);
+
 /**
  * Set lcore affinity to the node.
  *
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 507a344afd..cf38a03f44 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -28,6 +28,13 @@
 extern "C" {
 #endif
 
+/**
+ * @internal
+ *
+ * Singly-linked list head for graph schedule run-queue.
+ */
+SLIST_HEAD(rte_graph_rq_head, rte_graph);
+
 /**
  * @internal
  *
@@ -39,6 +46,15 @@ struct rte_graph {
 	uint32_t cir_mask;	     /**< Circular buffer wrap around mask. */
 	rte_node_t nb_nodes;	     /**< Number of nodes in the graph. */
 	rte_graph_off_t *cir_start;  /**< Pointer to circular buffer. */
+	/* Graph schedule */
+	struct rte_graph_rq_head *rq __rte_cache_aligned; /* The run-queue */
+	struct rte_graph_rq_head rq_head; /* The head for run-queue list */
+
+	SLIST_ENTRY(rte_graph) rq_next;   /* The next for run-queue list */
+	unsigned int lcore_id;  /**< The graph running Lcore. */
+	struct rte_ring *wq;    /**< The work-queue for pending streams. */
+	struct rte_mempool *mp; /**< The mempool for scheduling streams. */
+	/* Graph schedule area */
 	rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */
 	rte_graph_t id;	/**< Graph identifier. */
 	int socket;	/**< Socket ID where memory is allocated. */
@@ -63,6 +79,8 @@ struct rte_node {
 	char parent[RTE_NODE_NAMESIZE];	/**< Parent node name. */
 	char name[RTE_NODE_NAMESIZE];	/**< Name of the node. */
 
+	/* Fast schedule area */
+	unsigned int lcore_id __rte_cache_aligned;  /**< Node running Lcore. */
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 09/13] graph: enable create and destroy graph scheduling workqueue
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (7 preceding siblings ...)
  2022-11-17  5:09 ` [PATCH v1 08/13] graph: introduce stream moving cross cores Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2022-11-17  5:09 ` [PATCH v1 10/13] graph: introduce graph walk by cross-core dispatch Zhirun Yan
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch adds scheduling workqueue creation and destruction to the
common graph operations.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 8ea0daaa35..63d9bcffd2 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -428,6 +428,10 @@ rte_graph_destroy(rte_graph_t id)
 	while (graph != NULL) {
 		tmp = STAILQ_NEXT(graph, next);
 		if (graph->id == id) {
+			/* Destroy the schedule work queue if present */
+			if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_GENERIC)
+				graph_sched_wq_destroy(graph);
+
 			/* Call fini() of the all the nodes in the graph */
 			graph_node_fini(graph);
 			/* Destroy graph fast path memory */
@@ -522,6 +526,11 @@ graph_clone(struct graph *parent_graph, const char *name)
 	if (graph_fp_mem_create(graph))
 		goto graph_cleanup;
 
+	/* Create the graph schedule work queue */
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_GENERIC &&
+	    graph_sched_wq_create(graph, parent_graph))
+		goto graph_mem_destroy;
+
 	/* Call init() of the all the nodes in the graph */
 	if (graph_node_init(graph))
 		goto graph_mem_destroy;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 10/13] graph: introduce graph walk by cross-core dispatch
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (8 preceding siblings ...)
  2022-11-17  5:09 ` [PATCH v1 09/13] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2022-11-17  5:09 ` [PATCH v1 11/13] graph: enable graph generic scheduler model Zhirun Yan
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the task scheduler mechanism to enable dispatching
tasks to other worker cores. Currently, there is only a local work
queue for one graph to walk. We introduce a scheduler work queue on
each worker core for dispatching tasks. The walk processes the
scheduler work queue first, then handles the local work queue.
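The per-node decision made inside the generic walk can be sketched as a
small predicate (hypothetical names, not the DPDK API): an unbound node or
a node bound to the walking core runs locally; a node bound to another
core is dispatched, provided a run-queue exists to dispatch into.

```c
#include <assert.h>

#define MAX_LCORE 128 /* stands in for RTE_MAX_LCORE (the "unbound" marker) */

/* Model of the dispatch decision in rte_graph_walk_generic():
 * returns 1 to run the node on this core, 0 to enqueue it for the
 * core it is bound to. */
static int run_locally(unsigned int graph_lcore, unsigned int node_lcore,
		       int have_runqueue)
{
	/* Unbound nodes (lcore_id == MAX_LCORE) always run locally. */
	if (node_lcore == MAX_LCORE)
		return 1;
	/* Bound to this very core: run locally. */
	if (node_lcore == graph_lcore)
		return 1;
	/* Bound elsewhere: dispatch only if a run-queue is available. */
	return have_runqueue ? 0 : 1;
}
```

This also shows the fallback behavior: when dispatch is impossible (no
run-queue, or enqueue fails), the node is still processed locally rather
than dropped.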

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_generic.h | 36 +++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/lib/graph/rte_graph_model_generic.h b/lib/graph/rte_graph_model_generic.h
index 5715fc8ffb..c29fc31309 100644
--- a/lib/graph/rte_graph_model_generic.h
+++ b/lib/graph/rte_graph_model_generic.h
@@ -71,6 +71,42 @@ void __rte_noinline __rte_graph_sched_wq_process(struct rte_graph *graph);
 __rte_experimental
 int rte_node_model_generic_set_lcore_affinity(const char *name, unsigned int lcore_id);
 
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk_generic(struct rte_graph *graph)
+{
+	uint32_t head = graph->head;
+	struct rte_node *node;
+
+	if (graph->wq != NULL)
+		__rte_graph_sched_wq_process(graph);
+
+	rte_graph_walk_node(graph, head, node) {
+		/* Skip the source nodes which are not bound to the current worker */
+		if ((int32_t)head < 0 && node->lcore_id != graph->lcore_id)
+			continue;
+
+		/* Schedule the node until all task/objs are done */
+		if (node->lcore_id != RTE_MAX_LCORE &&
+		    graph->lcore_id != node->lcore_id && graph->rq != NULL &&
+		    __rte_graph_sched_node_enqueue(node, graph->rq))
+			continue;
+
+		__rte_node_process(graph, node);
+	}
+
+	graph->tail = 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 11/13] graph: enable graph generic scheduler model
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (9 preceding siblings ...)
  2022-11-17  5:09 ` [PATCH v1 10/13] graph: introduce graph walk by cross-core dispatch Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2022-11-17  5:09 ` [PATCH v1 12/13] graph: add stats for cross-core dispatching Zhirun Yan
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch enables choosing the new scheduler model.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_worker.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index a0ea0df153..dea207ca46 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -1,4 +1,5 @@
 #include "rte_graph_model_rtc.h"
+#include "rte_graph_model_generic.h"
 
 static enum rte_graph_worker_model worker_model = RTE_GRAPH_MODEL_DEFAULT;
 
@@ -64,5 +65,11 @@ __rte_experimental
 static inline void
 rte_graph_walk(struct rte_graph *graph)
 {
-	rte_graph_walk_rtc(graph);
+	int model = rte_graph_worker_model_get();
+
+	if (model == RTE_GRAPH_MODEL_DEFAULT ||
+	    model == RTE_GRAPH_MODEL_RTC)
+		rte_graph_walk_rtc(graph);
+	else if (model == RTE_GRAPH_MODEL_GENERIC)
+		rte_graph_walk_generic(graph);
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 12/13] graph: add stats for cross-core dispatching
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (10 preceding siblings ...)
  2022-11-17  5:09 ` [PATCH v1 11/13] graph: enable graph generic scheduler model Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2022-11-17  5:09 ` [PATCH v1 13/13] examples/l3fwd-graph: introduce generic worker model Zhirun Yan
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add stats for the cross-core dispatching scheduler if stats collection
is enabled.
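Since the same node exists once per cloned graph, the cluster stat is the
sum of the per-clone counters, as cluster_node_arregate_stats() does for
total_sched_objs/total_sched_fail. A minimal model (hypothetical names,
not the DPDK API):

```c
#include <assert.h>
#include <stdint.h>

/* Per-clone counters for one node, mirroring the new rte_node fields. */
struct node_stat {
	uint64_t total_sched_objs; /* objects successfully dispatched */
	uint64_t total_sched_fail; /* objects that failed to dispatch */
};

/* Sum the scheduling counters across all clones of a node. */
static void aggregate(const struct node_stat *nodes, unsigned int nb,
		      uint64_t *objs, uint64_t *fail)
{
	*objs = 0;
	*fail = 0;
	for (unsigned int i = 0; i < nb; i++) {
		*objs += nodes[i].total_sched_objs;
		*fail += nodes[i].total_sched_fail;
	}
}
```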

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_debug.c             |  6 +++
 lib/graph/graph_stats.c             | 74 +++++++++++++++++++++++++----
 lib/graph/rte_graph.h               |  2 +
 lib/graph/rte_graph_model_generic.c |  3 ++
 lib/graph/rte_graph_worker_common.h |  2 +
 5 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/lib/graph/graph_debug.c b/lib/graph/graph_debug.c
index b84412f5dd..080ba16ad9 100644
--- a/lib/graph/graph_debug.c
+++ b/lib/graph/graph_debug.c
@@ -74,6 +74,12 @@ rte_graph_obj_dump(FILE *f, struct rte_graph *g, bool all)
 		fprintf(f, "       size=%d\n", n->size);
 		fprintf(f, "       idx=%d\n", n->idx);
 		fprintf(f, "       total_objs=%" PRId64 "\n", n->total_objs);
+		if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_GENERIC) {
+			fprintf(f, "       total_sched_objs=%" PRId64 "\n",
+				n->total_sched_objs);
+			fprintf(f, "       total_sched_fail=%" PRId64 "\n",
+				n->total_sched_fail);
+		}
 		fprintf(f, "       total_calls=%" PRId64 "\n", n->total_calls);
 		for (i = 0; i < n->nb_edges; i++)
 			fprintf(f, "          edge[%d] <%s>\n", i,
diff --git a/lib/graph/graph_stats.c b/lib/graph/graph_stats.c
index c0140ba922..801fcb832d 100644
--- a/lib/graph/graph_stats.c
+++ b/lib/graph/graph_stats.c
@@ -40,13 +40,19 @@ struct rte_graph_cluster_stats {
 	struct cluster_node clusters[];
 } __rte_cache_aligned;
 
+#define boarder_model_generic()                                                              \
+	fprintf(f, "+-------------------------------+---------------+--------" \
+		   "-------+---------------+---------------+---------------+" \
+		   "---------------+---------------+-" \
+		   "----------+\n")
+
 #define boarder()                                                              \
 	fprintf(f, "+-------------------------------+---------------+--------" \
 		   "-------+---------------+---------------+---------------+-" \
 		   "----------+\n")
 
 static inline void
-print_banner(FILE *f)
+print_banner_default(FILE *f)
 {
 	boarder();
 	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s\n", "|Node", "|calls",
@@ -55,6 +61,27 @@ print_banner(FILE *f)
 	boarder();
 }
 
+static inline void
+print_banner_generic(FILE *f)
+{
+	boarder_model_generic();
+	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s%-16s%-16s\n",
+		"|Node", "|calls",
+		"|objs", "|sched objs", "|sched fail",
+		"|realloc_count", "|objs/call", "|objs/sec(10E6)",
+		"|cycles/call|");
+	boarder_model_generic();
+}
+
+static inline void
+print_banner(FILE *f)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_GENERIC)
+		print_banner_generic(f);
+	else
+		print_banner_default(f);
+}
+
 static inline void
 print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 {
@@ -76,11 +103,21 @@ print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 	objs_per_sec = ts_per_hz ? (objs - prev_objs) / ts_per_hz : 0;
 	objs_per_sec /= 1000000;
 
-	fprintf(f,
-		"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
-		"|%-15.3f|%-15.6f|%-11.4f|\n",
-		stat->name, calls, objs, stat->realloc_count, objs_per_call,
-		objs_per_sec, cycles_per_call);
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_GENERIC) {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->sched_objs,
+			stat->sched_fail, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	} else {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	}
 }
 
 static int
@@ -88,13 +125,20 @@ graph_cluster_stats_cb(bool is_first, bool is_last, void *cookie,
 		       const struct rte_graph_cluster_node_stats *stat)
 {
 	FILE *f = cookie;
+	int model;
+
+	model = rte_graph_worker_model_get();
 
 	if (unlikely(is_first))
 		print_banner(f);
 	if (stat->objs)
 		print_node(f, stat);
-	if (unlikely(is_last))
-		boarder();
+	if (unlikely(is_last)) {
+		if (model == RTE_GRAPH_MODEL_GENERIC)
+			boarder_model_generic();
+		else
+			boarder();
+	}
 
 	return 0;
 };
@@ -332,13 +376,21 @@ static inline void
 cluster_node_arregate_stats(struct cluster_node *cluster)
 {
 	uint64_t calls = 0, cycles = 0, objs = 0, realloc_count = 0;
+	uint64_t sched_objs = 0, sched_fail = 0;
 	struct rte_graph_cluster_node_stats *stat = &cluster->stat;
 	struct rte_node *node;
 	rte_node_t count;
+	int model;
 
+	model = rte_graph_worker_model_get();
 	for (count = 0; count < cluster->nb_nodes; count++) {
 		node = cluster->nodes[count];
 
+		if (model == RTE_GRAPH_MODEL_GENERIC) {
+			sched_objs += node->total_sched_objs;
+			sched_fail += node->total_sched_fail;
+		}
+
 		calls += node->total_calls;
 		objs += node->total_objs;
 		cycles += node->total_cycles;
@@ -348,6 +400,12 @@ cluster_node_arregate_stats(struct cluster_node *cluster)
 	stat->calls = calls;
 	stat->objs = objs;
 	stat->cycles = cycles;
+
+	if (model == RTE_GRAPH_MODEL_GENERIC) {
+		stat->sched_objs = sched_objs;
+		stat->sched_fail = sched_fail;
+	}
+
 	stat->ts = rte_get_timer_cycles();
 	stat->realloc_count = realloc_count;
 }
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index 210e125661..2d22ee0255 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -203,6 +203,8 @@ struct rte_graph_cluster_node_stats {
 	uint64_t prev_calls;	/**< Previous number of calls. */
 	uint64_t prev_objs;	/**< Previous number of processed objs. */
 	uint64_t prev_cycles;	/**< Previous number of cycles. */
+	uint64_t sched_objs;	/**< Number of objects scheduled. */
+	uint64_t sched_fail;	/**< Number of objects failed to schedule. */
 
 	uint64_t realloc_count; /**< Realloc count. */
 
diff --git a/lib/graph/rte_graph_model_generic.c b/lib/graph/rte_graph_model_generic.c
index c862237432..5504a65a39 100644
--- a/lib/graph/rte_graph_model_generic.c
+++ b/lib/graph/rte_graph_model_generic.c
@@ -83,6 +83,7 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		rte_pause();
 
 	off += size;
+	node->total_sched_objs += size;
 	node->idx -= size;
 	if (node->idx > 0)
 		goto submit_again;
@@ -94,6 +95,8 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		memmove(&node->objs[0], &node->objs[off],
 			node->idx * sizeof(void *));
 
+	node->total_sched_fail += node->idx;
+
 	return false;
 }
 
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index cf38a03f44..346f8337d4 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -81,6 +81,8 @@ struct rte_node {
 
 	/* Fast schedule area */
 	unsigned int lcore_id __rte_cache_aligned;  /**< Node running Lcore. */
+	uint64_t total_sched_objs; /**< Number of objects scheduled. */
+	uint64_t total_sched_fail; /**< Number of schedule failures. */
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v1 13/13] examples/l3fwd-graph: introduce generic worker model
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (11 preceding siblings ...)
  2022-11-17  5:09 ` [PATCH v1 12/13] graph: add stats for cross-core dispatching Zhirun Yan
@ 2022-11-17  5:09 ` Zhirun Yan
  2023-02-20 14:20   ` Jerin Jacob
  2023-02-20  0:22 ` [PATCH v1 00/13] graph enhancement for multi-core dispatch Thomas Monjalon
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
  14 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2022-11-17  5:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add a new parameter "model" to choose the generic or RTC worker model.
In the generic model, nodes are affinitized to worker cores successively.

Note:
the current implementation supports only one RX node for the generic model.

./dpdk-l3fwd-graph  -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="generic"

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 examples/l3fwd-graph/main.c | 218 +++++++++++++++++++++++++++++-------
 1 file changed, 179 insertions(+), 39 deletions(-)

diff --git a/examples/l3fwd-graph/main.c b/examples/l3fwd-graph/main.c
index 6dcb6ee92b..c145a3e3e8 100644
--- a/examples/l3fwd-graph/main.c
+++ b/examples/l3fwd-graph/main.c
@@ -147,6 +147,19 @@ static struct ipv4_l3fwd_lpm_route ipv4_l3fwd_lpm_route_array[] = {
 	{RTE_IPV4(198, 18, 6, 0), 24, 6}, {RTE_IPV4(198, 18, 7, 0), 24, 7},
 };
 
+static int
+check_worker_model_params(void)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_GENERIC &&
+	    nb_lcore_params > 1) {
+		printf("Exceeded max number of lcore params for generic model: %hu\n",
+		       nb_lcore_params);
+		return -1;
+	}
+
+	return 0;
+}
+
 static int
 check_lcore_params(void)
 {
@@ -291,6 +304,20 @@ parse_max_pkt_len(const char *pktlen)
 	return len;
 }
 
+static int
+parse_worker_model(const char *model)
+{
+	if (strcmp(model, WORKER_MODEL_DEFAULT) == 0)
+		return RTE_GRAPH_MODEL_DEFAULT;
+	else if (strcmp(model, WORKER_MODEL_GENERIC) == 0) {
+		rte_graph_worker_model_set(RTE_GRAPH_MODEL_GENERIC);
+		return RTE_GRAPH_MODEL_GENERIC;
+	}
+	rte_exit(EXIT_FAILURE, "Invalid worker model: %s", model);
+
+	return RTE_GRAPH_MODEL_MAX;
+}
+
 static int
 parse_portmask(const char *portmask)
 {
@@ -404,6 +431,7 @@ static const char short_options[] = "p:" /* portmask */
 #define CMD_LINE_OPT_NO_NUMA	   "no-numa"
 #define CMD_LINE_OPT_MAX_PKT_LEN   "max-pkt-len"
 #define CMD_LINE_OPT_PER_PORT_POOL "per-port-pool"
+#define CMD_LINE_OPT_WORKER_MODEL  "model"
 enum {
 	/* Long options mapped to a short option */
 
@@ -416,6 +444,7 @@ enum {
 	CMD_LINE_OPT_NO_NUMA_NUM,
 	CMD_LINE_OPT_MAX_PKT_LEN_NUM,
 	CMD_LINE_OPT_PARSE_PER_PORT_POOL,
+	CMD_LINE_OPT_WORKER_MODEL_TYPE,
 };
 
 static const struct option lgopts[] = {
@@ -424,6 +453,7 @@ static const struct option lgopts[] = {
 	{CMD_LINE_OPT_NO_NUMA, 0, 0, CMD_LINE_OPT_NO_NUMA_NUM},
 	{CMD_LINE_OPT_MAX_PKT_LEN, 1, 0, CMD_LINE_OPT_MAX_PKT_LEN_NUM},
 	{CMD_LINE_OPT_PER_PORT_POOL, 0, 0, CMD_LINE_OPT_PARSE_PER_PORT_POOL},
+	{CMD_LINE_OPT_WORKER_MODEL, 1, 0, CMD_LINE_OPT_WORKER_MODEL_TYPE},
 	{NULL, 0, 0, 0},
 };
 
@@ -498,6 +528,11 @@ parse_args(int argc, char **argv)
 			per_port_pool = 1;
 			break;
 
+		case CMD_LINE_OPT_WORKER_MODEL_TYPE:
+			printf("Use new worker model: %s\n", optarg);
+			parse_worker_model(optarg);
+			break;
+
 		default:
 			print_usage(prgname);
 			return -1;
@@ -735,6 +770,140 @@ config_port_max_pkt_len(struct rte_eth_conf *conf,
 	return 0;
 }
 
+static void
+graph_config_generic(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	int worker_count = rte_lcore_count() - 1;
+	int main_lcore_id = rte_get_main_lcore();
+	int worker_lcore = main_lcore_id;
+	rte_graph_t main_graph_id = 0;
+	struct rte_node *node_tmp;
+	struct lcore_conf *qconf;
+	struct rte_graph *graph;
+	rte_graph_t graph_id;
+	rte_graph_off_t off;
+	int n_rx_node = 0;
+	rte_node_t count;
+	rte_edge_t i;
+	int ret;
+
+	for (int j = 0; j < nb_lcore_params; j++) {
+		qconf = &lcore_conf[lcore_params[j].lcore_id];
+		/* Add rx node patterns of all lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			char *node_name = qconf->rx_queue_list[i].node_name;
+
+			graph_conf.node_patterns[nb_patterns + n_rx_node + i] = node_name;
+			n_rx_node++;
+			ret = rte_node_model_generic_set_lcore_affinity(node_name,
+									lcore_params[j].lcore_id);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n", node_name,
+				       lcore_params[j].lcore_id);
+		}
+	}
+
+	graph_conf.nb_node_patterns = nb_patterns + n_rx_node;
+	graph_conf.socket_id = rte_lcore_to_socket_id(main_lcore_id);
+
+	snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+		 main_lcore_id);
+
+	/* create main graph */
+	main_graph_id = rte_graph_create(qconf->name, &graph_conf);
+	if (main_graph_id == RTE_GRAPH_ID_INVALID)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_create(): main_graph_id invalid for lcore %u\n",
+			 main_lcore_id);
+
+	qconf->graph_id = main_graph_id;
+	qconf->graph = rte_graph_lookup(qconf->name);
+	/* >8 End of graph initialization. */
+	if (!qconf->graph)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_lookup(): graph %s not found\n",
+			 qconf->name);
+
+	graph = qconf->graph;
+	rte_graph_foreach_node(count, off, graph, node_tmp) {
+		worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+
+		/* Need to set the node Lcore affinity before clone graph for each lcore */
+		if (node_tmp->lcore_id == RTE_MAX_LCORE) {
+			ret = rte_node_model_generic_set_lcore_affinity(node_tmp->name,
+									worker_lcore);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n",
+				       node_tmp->name, worker_lcore);
+		}
+	}
+
+	worker_lcore = main_lcore_id;
+	for (int i = 0; i < worker_count; i++) {
+		worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+
+		qconf = &lcore_conf[worker_lcore];
+		snprintf(qconf->name, sizeof(qconf->name), "cloned-%u", worker_lcore);
+		graph_id = rte_graph_clone(main_graph_id, qconf->name);
+		ret = rte_graph_bind_core(graph_id, worker_lcore);
+		if (ret == 0)
+			printf("bind graph %d to lcore %u\n", graph_id, worker_lcore);
+
+		/* full cloned graph name */
+		snprintf(qconf->name, sizeof(qconf->name), "%s",
+			 rte_graph_id_to_name(graph_id));
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "Failed to lookup graph %s\n",
+				 qconf->name);
+		continue;
+	}
+}
+
+static void
+graph_config_rtc(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	struct lcore_conf *qconf;
+	rte_graph_t graph_id;
+	uint32_t lcore_id;
+	rte_edge_t i;
+
+	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+		if (rte_lcore_is_enabled(lcore_id) == 0)
+			continue;
+
+		qconf = &lcore_conf[lcore_id];
+		/* Skip graph creation if no source exists */
+		if (!qconf->n_rx_queue)
+			continue;
+		/* Add rx node patterns of this lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			graph_conf.node_patterns[nb_patterns + i] =
+				qconf->rx_queue_list[i].node_name;
+		}
+		graph_conf.nb_node_patterns = nb_patterns + i;
+		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
+		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+			 lcore_id);
+		graph_id = rte_graph_create(qconf->name, &graph_conf);
+		if (graph_id == RTE_GRAPH_ID_INVALID)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_create(): graph_id invalid for lcore %u\n",
+				 lcore_id);
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		/* >8 End of graph initialization. */
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_lookup(): graph %s not found\n",
+				 qconf->name);
+	}
+}
+
 int
 main(int argc, char **argv)
 {
@@ -759,6 +928,7 @@ main(int argc, char **argv)
 	uint16_t nb_patterns;
 	uint8_t rewrite_len;
 	uint32_t lcore_id;
+	uint16_t model;
 	int ret;
 
 	/* Init EAL */
@@ -787,6 +957,9 @@ main(int argc, char **argv)
 	if (check_lcore_params() < 0)
 		rte_exit(EXIT_FAILURE, "check_lcore_params() failed\n");
 
+	if (check_worker_model_params() < 0)
+		rte_exit(EXIT_FAILURE, "check_worker_model_params() failed\n");
+
 	ret = init_lcore_rx_queues();
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "init_lcore_rx_queues() failed\n");
@@ -1026,46 +1199,13 @@ main(int argc, char **argv)
 
 	memset(&graph_conf, 0, sizeof(graph_conf));
 	graph_conf.node_patterns = node_patterns;
+	graph_conf.nb_node_patterns = nb_patterns;
 
-	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
-		rte_graph_t graph_id;
-		rte_edge_t i;
-
-		if (rte_lcore_is_enabled(lcore_id) == 0)
-			continue;
-
-		qconf = &lcore_conf[lcore_id];
-
-		/* Skip graph creation if no source exists */
-		if (!qconf->n_rx_queue)
-			continue;
-
-		/* Add rx node patterns of this lcore */
-		for (i = 0; i < qconf->n_rx_queue; i++) {
-			graph_conf.node_patterns[nb_patterns + i] =
-				qconf->rx_queue_list[i].node_name;
-		}
-
-		graph_conf.nb_node_patterns = nb_patterns + i;
-		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
-
-		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
-			 lcore_id);
-
-		graph_id = rte_graph_create(qconf->name, &graph_conf);
-		if (graph_id == RTE_GRAPH_ID_INVALID)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_create(): graph_id invalid"
-				 " for lcore %u\n", lcore_id);
-
-		qconf->graph_id = graph_id;
-		qconf->graph = rte_graph_lookup(qconf->name);
-		/* >8 End of graph initialization. */
-		if (!qconf->graph)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_lookup(): graph %s not found\n",
-				 qconf->name);
-	}
+	model = rte_graph_worker_model_get();
+	if (model == RTE_GRAPH_MODEL_DEFAULT)
+		graph_config_rtc(graph_conf);
+	else if (model == RTE_GRAPH_MODEL_GENERIC)
+		graph_config_generic(graph_conf);
 
 	memset(&rewrite_data, 0, sizeof(rewrite_data));
 	rewrite_len = sizeof(rewrite_data);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [EXT] [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2022-11-17  5:09 ` [PATCH v1 04/13] graph: add get/set graph worker model APIs Zhirun Yan
@ 2022-12-06  3:35   ` Kiran Kumar Kokkilagadda
  2022-12-08  7:26     ` Yan, Zhirun
  2023-02-20 13:50   ` Jerin Jacob
  1 sibling, 1 reply; 369+ messages in thread
From: Kiran Kumar Kokkilagadda @ 2022-12-06  3:35 UTC (permalink / raw)
  To: Zhirun Yan, dev, Jerin Jacob Kollanukkaran, Nithin Kumar Dabilpuram
  Cc: cunming.liang, haiyue.wang



> -----Original Message-----
> From: Zhirun Yan <zhirun.yan@intel.com>
> Sent: 17 November 2022 10:39 AM
> To: dev@dpdk.org; Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Kiran
> Kumar Kokkilagadda <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>
> Cc: cunming.liang@intel.com; haiyue.wang@intel.com; Zhirun Yan
> <zhirun.yan@intel.com>
> Subject: [EXT] [PATCH v1 04/13] graph: add get/set graph worker model APIs
> 
> External Email
> 
> ----------------------------------------------------------------------
> Add new get/set APIs to configure graph worker model which is used to
> determine which model will be chosen.
> 
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  lib/graph/rte_graph_worker.h        | 51 +++++++++++++++++++++++++++++
>  lib/graph/rte_graph_worker_common.h | 13 ++++++++
>  lib/graph/version.map               |  3 ++
>  3 files changed, 67 insertions(+)
> 
> diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h index
> 54d1390786..a0ea0df153 100644
> --- a/lib/graph/rte_graph_worker.h
> +++ b/lib/graph/rte_graph_worker.h
> @@ -1,5 +1,56 @@
>  #include "rte_graph_model_rtc.h"
> 
> +static enum rte_graph_worker_model worker_model =
> +RTE_GRAPH_MODEL_DEFAULT;
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + * Set the graph worker model
> + *
> + * @note This function does not perform any locking, and is only safe to call
> + *    before the graph starts running.
> + *
> + * @param model
> + *   The graph worker model to set.
> + *
> + * @return
> + *   0 on success, -1 otherwise.
> + */
> +__rte_experimental
> +static inline int
> +rte_graph_worker_model_set(enum rte_graph_worker_model model) {
> +	if (model >= RTE_GRAPH_MODEL_MAX)
> +		goto fail;
> +
> +	worker_model = model;
> +	return 0;
> +
> +fail:
> +	worker_model = RTE_GRAPH_MODEL_DEFAULT;
> +	return -1;
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * Get the graph worker model
> + *
> + * @return
> + *   Graph worker model on success.
> + */
> +__rte_experimental
> +static inline
> +enum rte_graph_worker_model
> +rte_graph_worker_model_get(void)
> +{
> +	return worker_model;
> +}
> +
>  /**
>   * Perform graph walk on the circular buffer and invoke the process function
>   * of the nodes and collect the stats.
> diff --git a/lib/graph/rte_graph_worker_common.h
> b/lib/graph/rte_graph_worker_common.h
> index df33204336..507a344afd 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -86,6 +86,19 @@ struct rte_node {
>  	struct rte_node *nodes[] __rte_cache_min_aligned; /**< Next nodes.
> */  } __rte_cache_aligned;
> 
> +
> +
> +/** Graph worker models */
> +enum rte_graph_worker_model {
> +#define WORKER_MODEL_DEFAULT "default"
> +	RTE_GRAPH_MODEL_DEFAULT = 0,
> +#define WORKER_MODEL_RTC "rtc"
> +	RTE_GRAPH_MODEL_RTC,

Since the default is RTC, do we need one more enum for RTC? Can we just have default and generic and remove rtc?

> +#define WORKER_MODEL_GENERIC "generic"
> +	RTE_GRAPH_MODEL_GENERIC,
> +	RTE_GRAPH_MODEL_MAX,
> +};
> +
>  /**
>   * @internal
>   *
> diff --git a/lib/graph/version.map b/lib/graph/version.map index
> 13b838752d..eea73ec9ca 100644
> --- a/lib/graph/version.map
> +++ b/lib/graph/version.map
> @@ -43,5 +43,8 @@ EXPERIMENTAL {
>  	rte_node_next_stream_put;
>  	rte_node_next_stream_move;
> 
> +	rte_graph_worker_model_set;
> +	rte_graph_worker_model_get;
> +
>  	local: *;
>  };
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [EXT] [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2022-12-06  3:35   ` [EXT] " Kiran Kumar Kokkilagadda
@ 2022-12-08  7:26     ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2022-12-08  7:26 UTC (permalink / raw)
  To: Kiran Kumar Kokkilagadda, dev, Jerin Jacob Kollanukkaran,
	Nithin Kumar Dabilpuram
  Cc: Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Kiran Kumar Kokkilagadda <kirankumark@marvell.com>
> Sent: Tuesday, December 6, 2022 11:35 AM
> To: Yan, Zhirun <zhirun.yan@intel.com>; dev@dpdk.org; Jerin Jacob
> Kollanukkaran <jerinj@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>
> Cc: Liang, Cunming <cunming.liang@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: RE: [EXT] [PATCH v1 04/13] graph: add get/set graph worker model
> APIs
> 
> 
> 
> > -----Original Message-----
> > From: Zhirun Yan <zhirun.yan@intel.com>
> > Sent: 17 November 2022 10:39 AM
> > To: dev@dpdk.org; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> > Kiran Kumar Kokkilagadda <kirankumark@marvell.com>; Nithin Kumar
> > Dabilpuram <ndabilpuram@marvell.com>
> > Cc: cunming.liang@intel.com; haiyue.wang@intel.com; Zhirun Yan
> > <zhirun.yan@intel.com>
> > Subject: [EXT] [PATCH v1 04/13] graph: add get/set graph worker model
> > APIs
> >
> > External Email
> >
> > ----------------------------------------------------------------------
> > Add new get/set APIs to configure graph worker model which is used to
> > determine which model will be chosen.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  lib/graph/rte_graph_worker.h        | 51 +++++++++++++++++++++++++++++
> >  lib/graph/rte_graph_worker_common.h | 13 ++++++++
> >  lib/graph/version.map               |  3 ++
> >  3 files changed, 67 insertions(+)
> >
> > diff --git a/lib/graph/rte_graph_worker.h
> > b/lib/graph/rte_graph_worker.h index
> > 54d1390786..a0ea0df153 100644
> > --- a/lib/graph/rte_graph_worker.h
> > +++ b/lib/graph/rte_graph_worker.h
> > @@ -1,5 +1,56 @@
> >  #include "rte_graph_model_rtc.h"
> >
> > +static enum rte_graph_worker_model worker_model =
> > +RTE_GRAPH_MODEL_DEFAULT;
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> > +notice
> > + * Set the graph worker model
> > + *
> > + * @note This function does not perform any locking, and is only safe to call
> > + *    before the graph starts running.
> > + *
> > + * @param model
> > + *   The graph worker model to set.
> > + *
> > + * @return
> > + *   0 on success, -1 otherwise.
> > + */
> > +__rte_experimental
> > +static inline int
> > +rte_graph_worker_model_set(enum rte_graph_worker_model model) {
> > +	if (model >= RTE_GRAPH_MODEL_MAX)
> > +		goto fail;
> > +
> > +	worker_model = model;
> > +	return 0;
> > +
> > +fail:
> > +	worker_model = RTE_GRAPH_MODEL_DEFAULT;
> > +	return -1;
> > +}
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> > +notice
> > + *
> > + * Get the graph worker model
> > + *
> > + * @return
> > + *   Graph worker model on success.
> > + */
> > +__rte_experimental
> > +static inline
> > +enum rte_graph_worker_model
> > +rte_graph_worker_model_get(void)
> > +{
> > +	return worker_model;
> > +}
> > +
> >  /**
> >   * Perform graph walk on the circular buffer and invoke the process
> function
> >   * of the nodes and collect the stats.
> > diff --git a/lib/graph/rte_graph_worker_common.h
> > b/lib/graph/rte_graph_worker_common.h
> > index df33204336..507a344afd 100644
> > --- a/lib/graph/rte_graph_worker_common.h
> > +++ b/lib/graph/rte_graph_worker_common.h
> > @@ -86,6 +86,19 @@ struct rte_node {
> >  	struct rte_node *nodes[] __rte_cache_min_aligned; /**< Next nodes.
> > */  } __rte_cache_aligned;
> >
> > +
> > +
> > +/** Graph worker models */
> > +enum rte_graph_worker_model {
> > +#define WORKER_MODEL_DEFAULT "default"
> > +	RTE_GRAPH_MODEL_DEFAULT = 0,
> > +#define WORKER_MODEL_RTC "rtc"
> > +	RTE_GRAPH_MODEL_RTC,
> 
> Since the default is RTC, do we need one more enum for RTC? Can we just have
> default and generic and remove rtc?
> 

Thanks for your comments.

Actually, there are two kinds of users: professional and normal.
For professional users, if the app chooses RTC or GENERIC, it means there is
a specific requirement for the worker model.
The default is for normal users who do not care about the model.

Also, if more worker models are added in the future, RTC will describe this
model more clearly than 'default' does.


> > +#define WORKER_MODEL_GENERIC "generic"
> > +	RTE_GRAPH_MODEL_GENERIC,
> > +	RTE_GRAPH_MODEL_MAX,
> > +};
> > +
> >  /**
> >   * @internal
> >   *
> > diff --git a/lib/graph/version.map b/lib/graph/version.map index
> > 13b838752d..eea73ec9ca 100644
> > --- a/lib/graph/version.map
> > +++ b/lib/graph/version.map
> > @@ -43,5 +43,8 @@ EXPERIMENTAL {
> >  	rte_node_next_stream_put;
> >  	rte_node_next_stream_move;
> >
> > +	rte_graph_worker_model_set;
> > +	rte_graph_worker_model_get;
> > +
> >  	local: *;
> >  };
> > --
> > 2.25.1


^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 00/13] graph enhancement for multi-core dispatch
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (12 preceding siblings ...)
  2022-11-17  5:09 ` [PATCH v1 13/13] examples/l3fwd-graph: introduce generic worker model Zhirun Yan
@ 2023-02-20  0:22 ` Thomas Monjalon
  2023-02-20  8:28   ` Yan, Zhirun
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
  14 siblings, 1 reply; 369+ messages in thread
From: Thomas Monjalon @ 2023-02-20  0:22 UTC (permalink / raw)
  To: jerinj, kirankumark, ndabilpuram
  Cc: dev, cunming.liang, haiyue.wang, Zhirun Yan, Zhirun Yan

This series doesn't look reviewed.
What is the status?

17/11/2022 06:09, Zhirun Yan:
> Currently, rte_graph supports the RTC (Run-To-Completion) model, where a
> whole graph runs within a single core.
> RTC is one of the typical models of packet processing. Others, like
> Pipeline or Hybrid, lack support.
> 
> The patch set introduces a 'generic' model selection, which is a
> self-reacting scheme according to the core affinity.
> The new model enables a cross-core dispatching mechanism which employs a
> scheduling work-queue to dispatch streams to other worker cores that are
> associated with the destination node. When the core flavor of the
> destination node is the default 'current', the stream continues to be
> executed as normal.
> 
> Example:
> 3-node graph targets 3-core budget
> 
> Generic Model
> RTC:
> Config Graph-A: node-0->current; node-1->current; node-2->current;
> Graph-A':node-0/1/2 @0, Graph-A':node-0/1/2 @1, Graph-A':node-0/1/2 @2
> 
> + - - - - - - - - - - - - - - - - - - - - - +
> '                Core #0/1/2                '
> '                                           '
> ' +--------+     +---------+     +--------+ '
> ' | Node-0 | --> | Node-1  | --> | Node-2 | '
> ' +--------+     +---------+     +--------+ '
> '                                           '
> + - - - - - - - - - - - - - - - - - - - - - +
> 
> Pipeline:
> Config Graph-A: node-0->0; node-1->1; node-2->2;
> Graph-A':node-0 @0, Graph-A':node-1 @1, Graph-A':node-2 @2
> 
> + - - - - - -+     +- - - - - - +     + - - - - - -+
> '  Core #0   '     '  Core #1   '     '  Core #2   '
> '            '     '            '     '            '
> ' +--------+ '     ' +--------+ '     ' +--------+ '
> ' | Node-0 | ' --> ' | Node-1 | ' --> ' | Node-2 | '
> ' +--------+ '     ' +--------+ '     ' +--------+ '
> '            '     '            '     '            '
> + - - - - - -+     +- - - - - - +     + - - - - - -+
> 
> Hybrid:
> Config Graph-A: node-0->current; node-1->current; node-2->2;
> Graph-A':node-0/1 @0, Graph-A':node-0/1 @1, Graph-A':node-2 @2
> 
> + - - - - - - - - - - - - - - - +     + - - - - - -+
> '            Core #0            '     '  Core #2   '
> '                               '     '            '
> ' +--------+         +--------+ '     ' +--------+ '
> ' | Node-0 | ------> | Node-1 | ' --> ' | Node-2 | '
> ' +--------+         +--------+ '     ' +--------+ '
> '                               '     '            '
> + - - - - - - - - - - - - - - - +     + - - - - - -+
>                                           ^
>                                           |
>                                           |
> + - - - - - - - - - - - - - - - +         |
> '            Core #1            '         |
> '                               '         |
> ' +--------+         +--------+ '         |
> ' | Node-0 | ------> | Node-1 | ' --------+
> ' +--------+         +--------+ '
> '                               '
> + - - - - - - - - - - - - - - - +
> 
> 
> The patch set has been broken down as below:
> 
> 1. Split graph worker into common and default model part.
> 2. Inline graph node processing and graph circular buffer walking to make
>   it reusable.
> 3. Add set/get APIs to choose worker model.
> 4. Introduce core affinity API to set the node run on specific worker core.
>   (only use in new model)
> 5. Introduce graph affinity API to bind one graph with specific worker
>   core.
> 6. Introduce graph clone API.
> 7. Introduce stream moving with scheduler work-queue in patch 8,9,10.
> 8. Add stats for new models.
> 9. Abstract default graph config process and integrate new model into
>   example/l3fwd-graph. Add new parameters for model choosing.
> 
> We could run with new worker model by this:
> ./dpdk-l3fwd-graph -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
> --model="generic"
> 
> References:
> https://static.sched.com/hosted_files/dpdkuserspace22/a6/graph%20introduce%20remote%20dispatch%20for%20mult-core%20scaling.pdf
> 
> Zhirun Yan (13):
>   graph: split graph worker into common and default model
>   graph: move node process into inline function
>   graph: add macro to walk on graph circular buffer
>   graph: add get/set graph worker model APIs
>   graph: introduce core affinity API
>   graph: introduce graph affinity API
>   graph: introduce graph clone API for other worker core
>   graph: introduce stream moving cross cores
>   graph: enable create and destroy graph scheduling workqueue
>   graph: introduce graph walk by cross-core dispatch
>   graph: enable graph generic scheduler model
>   graph: add stats for cross-core dispatching
>   examples/l3fwd-graph: introduce generic worker model
> 
>  examples/l3fwd-graph/main.c         | 218 +++++++++--
>  lib/graph/graph.c                   | 179 +++++++++
>  lib/graph/graph_debug.c             |   6 +
>  lib/graph/graph_populate.c          |   1 +
>  lib/graph/graph_private.h           |  44 +++
>  lib/graph/graph_stats.c             |  74 +++-
>  lib/graph/meson.build               |   3 +-
>  lib/graph/node.c                    |   1 +
>  lib/graph/rte_graph.h               |  44 +++
>  lib/graph/rte_graph_model_generic.c | 179 +++++++++
>  lib/graph/rte_graph_model_generic.h | 114 ++++++
>  lib/graph/rte_graph_model_rtc.h     |  22 ++
>  lib/graph/rte_graph_worker.h        | 516 ++------------------------
>  lib/graph/rte_graph_worker_common.h | 545 ++++++++++++++++++++++++++++
>  lib/graph/version.map               |   8 +
>  15 files changed, 1430 insertions(+), 524 deletions(-)
>  create mode 100644 lib/graph/rte_graph_model_generic.c
>  create mode 100644 lib/graph/rte_graph_model_generic.h
>  create mode 100644 lib/graph/rte_graph_model_rtc.h
>  create mode 100644 lib/graph/rte_graph_worker_common.h
> 
> 






^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [PATCH v1 00/13] graph enhancement for multi-core dispatch
  2023-02-20  0:22 ` [PATCH v1 00/13] graph enhancement for multi-core dispatch Thomas Monjalon
@ 2023-02-20  8:28   ` Yan, Zhirun
  2023-02-20  9:33     ` Jerin Jacob
  0 siblings, 1 reply; 369+ messages in thread
From: Yan, Zhirun @ 2023-02-20  8:28 UTC (permalink / raw)
  To: Thomas Monjalon, jerinj, kirankumark, ndabilpuram
  Cc: dev, Liang, Cunming, Wang, Haiyue

Hi Thomas,

Jerin and Kiran gave some comments earlier.
@jerinj@marvell.com, @kirankumark@marvell.com,
could you help to review it?
Thanks.

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, February 20, 2023 8:22 AM
> To: jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com
> Cc: dev@dpdk.org; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>; Yan, Zhirun <zhirun.yan@intel.com>; Yan,
> Zhirun <zhirun.yan@intel.com>
> Subject: Re: [PATCH v1 00/13] graph enhancement for multi-core dispatch
> 
> This series doesn't look reviewed.
> What is the status?
> 


^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 00/13] graph enhancement for multi-core dispatch
  2023-02-20  8:28   ` Yan, Zhirun
@ 2023-02-20  9:33     ` Jerin Jacob
  0 siblings, 0 replies; 369+ messages in thread
From: Jerin Jacob @ 2023-02-20  9:33 UTC (permalink / raw)
  To: Yan, Zhirun
  Cc: Thomas Monjalon, jerinj, kirankumark, ndabilpuram, dev, Liang,
	Cunming, Wang, Haiyue

On Mon, Feb 20, 2023 at 1:58 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
>
> Hi Thomas,
>
> Jerin and Kiran gave some comments before.
> And @jerinj@marvell.com @kirankumark@marvell.com
> could you help to review it?


Sure. I will do the next level of review.


> Thanks.
>
> > -----Original Message-----
> > From: Thomas Monjalon <thomas@monjalon.net>
> > Sent: Monday, February 20, 2023 8:22 AM
> > To: jerinj@marvell.com; kirankumark@marvell.com;
> > ndabilpuram@marvell.com
> > Cc: dev@dpdk.org; Liang, Cunming <cunming.liang@intel.com>; Wang,
> > Haiyue <haiyue.wang@intel.com>; Yan, Zhirun <zhirun.yan@intel.com>; Yan,
> > Zhirun <zhirun.yan@intel.com>
> > Subject: Re: [PATCH v1 00/13] graph enhancement for multi-core dispatch
> >
> > This series doesn't look reviewed.
> > What is the status?
> >
> > 17/11/2022 06:09, Zhirun Yan:
> > > Currently, rte_graph supports RTC (Run-To-Completion) model within
> > > each of a single core.
> > > RTC is one of the typical model of packet processing. Others like
> > > Pipeline or Hybrid are lack of support.
> > >
> > > The patch set introduces a 'generic' model selection which is a
> > > self-reacting scheme according to the core affinity.
> > > The new model enables a cross-core dispatching mechanism which
> > employs
> > > a scheduling work-queue to dispatch streams to other worker cores
> > > which being associated with the destination node. When core flavor of
> > > the destination node is a default 'current', the stream can be
> > > continue executed as normal.
> > >
> > > Example:
> > > 3-node graph targets 3-core budget
> > >
> > > Generic Model
> > > RTC:
> > > Config Graph-A: node-0->current; node-1->current; node-2->current;
> > > Graph-A':node-0/1/2 @0, Graph-A':node-0/1/2 @1, Graph-A':node-0/1/2
> > @2
> > >
> > > + - - - - - - - - - - - - - - - - - - - - - +
> > > '                Core #0/1/2                '
> > > '                                           '
> > > ' +--------+     +---------+     +--------+ '
> > > ' | Node-0 | --> | Node-1  | --> | Node-2 | '
> > > ' +--------+     +---------+     +--------+ '
> > > '                                           '
> > > + - - - - - - - - - - - - - - - - - - - - - +
> > >
> > > Pipeline:
> > > Config Graph-A: node-0->0; node-1->1; node-2->2;
> > > Graph-A':node-0 @0, Graph-A':node-1 @1, Graph-A':node-2 @2
> > >
> > > + - - - - - -+     +- - - - - - +     + - - - - - -+
> > > '  Core #0   '     '  Core #1   '     '  Core #2   '
> > > '            '     '            '     '            '
> > > ' +--------+ '     ' +--------+ '     ' +--------+ '
> > > ' | Node-0 | ' --> ' | Node-1 | ' --> ' | Node-2 | '
> > > ' +--------+ '     ' +--------+ '     ' +--------+ '
> > > '            '     '            '     '            '
> > > + - - - - - -+     +- - - - - - +     + - - - - - -+
> > >
> > > Hybrid:
> > > Config Graph-A: node-0->current; node-1->current; node-2->2;
> > > Graph-A':node-0/1 @0, Graph-A':node-0/1 @1, Graph-A':node-2 @2
> > >
> > > + - - - - - - - - - - - - - - - +     + - - - - - -+
> > > '            Core #0            '     '  Core #2   '
> > > '                               '     '            '
> > > ' +--------+         +--------+ '     ' +--------+ '
> > > ' | Node-0 | ------> | Node-1 | ' --> ' | Node-2 | '
> > > ' +--------+         +--------+ '     ' +--------+ '
> > > '                               '     '            '
> > > + - - - - - - - - - - - - - - - +     + - - - - - -+
> > >                                           ^
> > >                                           |
> > >                                           |
> > > + - - - - - - - - - - - - - - - +         |
> > > '            Core #1            '         |
> > > '                               '         |
> > > ' +--------+         +--------+ '         |
> > > ' | Node-0 | ------> | Node-1 | ' --------+
> > > ' +--------+         +--------+ '
> > > '                               '
> > > + - - - - - - - - - - - - - - - +
> > >
> > >
> > > The patch set has been break down as below:
> > >
> > > 1. Split graph worker into common and default model part.
> > > 2. Inline graph node processing and graph circular buffer walking to make
> > >   it reusable.
> > > 3. Add set/get APIs to choose worker model.
> > > 4. Introduce core affinity API to set the node run on specific worker core.
> > >   (only use in new model)
> > > 5. Introduce graph affinity API to bind one graph with specific worker
> > >   core.
> > > 6. Introduce graph clone API.
> > > 7. Introduce stream moving with scheduler work-queue in patch 8,9,10.
> > > 8. Add stats for new models.
> > > 9. Abstract default graph config process and integrate new model into
> > >   example/l3fwd-graph. Add new parameters for model choosing.
> > >
> > > We could run with new worker model by this:
> > > ./dpdk-l3fwd-graph -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
> > > --model="generic"
> > >
> > > References:
> > > https://static.sched.com/hosted_files/dpdkuserspace22/a6/graph%20intro
> > > duce%20remote%20dispatch%20for%20mult-core%20scaling.pdf
> > >
> > > Zhirun Yan (13):
> > >   graph: split graph worker into common and default model
> > >   graph: move node process into inline function
> > >   graph: add macro to walk on graph circular buffer
> > >   graph: add get/set graph worker model APIs
> > >   graph: introduce core affinity API
> > >   graph: introduce graph affinity API
> > >   graph: introduce graph clone API for other worker core
> > >   graph: introduce stream moving cross cores
> > >   graph: enable create and destroy graph scheduling workqueue
> > >   graph: introduce graph walk by cross-core dispatch
> > >   graph: enable graph generic scheduler model
> > >   graph: add stats for cross-core dispatching
> > >   examples/l3fwd-graph: introduce generic worker model
> > >
> > >  examples/l3fwd-graph/main.c         | 218 +++++++++--
> > >  lib/graph/graph.c                   | 179 +++++++++
> > >  lib/graph/graph_debug.c             |   6 +
> > >  lib/graph/graph_populate.c          |   1 +
> > >  lib/graph/graph_private.h           |  44 +++
> > >  lib/graph/graph_stats.c             |  74 +++-
> > >  lib/graph/meson.build               |   3 +-
> > >  lib/graph/node.c                    |   1 +
> > >  lib/graph/rte_graph.h               |  44 +++
> > >  lib/graph/rte_graph_model_generic.c | 179 +++++++++
> > >  lib/graph/rte_graph_model_generic.h | 114 ++++++
> > >  lib/graph/rte_graph_model_rtc.h     |  22 ++
> > >  lib/graph/rte_graph_worker.h        | 516 ++------------------------
> > >  lib/graph/rte_graph_worker_common.h | 545 ++++++++++++++++++++++++++++
> > >  lib/graph/version.map               |   8 +
> > >  15 files changed, 1430 insertions(+), 524 deletions(-)
> > >  create mode 100644 lib/graph/rte_graph_model_generic.c
> > >  create mode 100644 lib/graph/rte_graph_model_generic.h
> > >  create mode 100644 lib/graph/rte_graph_model_rtc.h
> > >  create mode 100644 lib/graph/rte_graph_worker_common.h
> > >
> > >
> >
> >
> >
> >
>

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 01/13] graph: split graph worker into common and default model
  2022-11-17  5:09 ` [PATCH v1 01/13] graph: split graph worker into common and default model Zhirun Yan
@ 2023-02-20 13:38   ` Jerin Jacob
  2023-02-24  6:29     ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Jerin Jacob @ 2023-02-20 13:38 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Thu, Nov 17, 2022 at 10:39 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> To support multiple graph worker models, split the graph worker into
> common and default parts. Name the current walk function
> rte_graph_model_rtc, since the default model is RTC (Run-To-Completion).

There are CI issues with this series. Please check
https://patches.dpdk.org/project/dpdk/patch/20221117050926.136974-2-zhirun.yan@intel.com/
# Please make sure each patch builds with devtools/test-meson-builds.sh
# Please make sure each patch doesn't have any issue with the app/test/test_graph.c test
# Please make sure this series doesn't have perf issues with app/test/test_graph_perf.c
# Please make sure both RTC and the new model run with l3fwd_graph without any performance regression
# Please introduce the model concept in the documentation at
doc/guides/prog_guide/graph_lib.rst, with details for this generic model.

Also, update the MAINTAINERS file for the new model files.

>
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  lib/graph/rte_graph_model_rtc.h     |  57 ++++
>  lib/graph/rte_graph_worker.h        | 498 +---------------------------
>  lib/graph/rte_graph_worker_common.h | 456 +++++++++++++++++++++++++


Use git mv to avoid losing history and to reduce the diff.
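A sketch of that rename flow in a throwaway repository (paths mirror the series, file contents are placeholders; only the `git mv` step is the point). After the split, `git log --follow` still reports both commits for the renamed file:

```shell
# Throwaway repo just to show the effect of `git mv` vs delete+add.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email dev@example.com
git config user.name dev
mkdir -p lib/graph
echo '/* worker inlines */' > lib/graph/rte_graph_worker.h
git add -A && git commit -qm "graph: initial worker header"
# record a rename instead of a delete+add so history follows the file
git mv lib/graph/rte_graph_worker.h lib/graph/rte_graph_worker_common.h
git commit -qm "graph: split graph worker into common and default model"
git log --follow --oneline -- lib/graph/rte_graph_worker_common.h
```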

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 02/13] graph: move node process into inline function
  2022-11-17  5:09 ` [PATCH v1 02/13] graph: move node process into inline function Zhirun Yan
@ 2023-02-20 13:39   ` Jerin Jacob
  0 siblings, 0 replies; 369+ messages in thread
From: Jerin Jacob @ 2023-02-20 13:39 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> Node processing is a single, reusable block; move the code into an
> inline function.
>
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>

Acked-by: Jerin Jacob <jerinj@marvell.com>


> ---
>  lib/graph/rte_graph_model_rtc.h     | 18 +---------------
>  lib/graph/rte_graph_worker_common.h | 33 +++++++++++++++++++++++++++++
>  2 files changed, 34 insertions(+), 17 deletions(-)
>
> diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
> index fb58730bde..c80b0ce962 100644
> --- a/lib/graph/rte_graph_model_rtc.h
> +++ b/lib/graph/rte_graph_model_rtc.h
> @@ -16,9 +16,6 @@ rte_graph_walk_rtc(struct rte_graph *graph)
>         const rte_node_t mask = graph->cir_mask;
>         uint32_t head = graph->head;
>         struct rte_node *node;
> -       uint64_t start;
> -       uint16_t rc;
> -       void **objs;
>
>         /*
>          * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
> @@ -37,20 +34,7 @@ rte_graph_walk_rtc(struct rte_graph *graph)
>          */
>         while (likely(head != graph->tail)) {
>                 node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
> -               RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> -               objs = node->objs;
> -               rte_prefetch0(objs);
> -
> -               if (rte_graph_has_stats_feature()) {
> -                       start = rte_rdtsc();
> -                       rc = node->process(graph, node, objs, node->idx);
> -                       node->total_cycles += rte_rdtsc() - start;
> -                       node->total_calls++;
> -                       node->total_objs += rc;
> -               } else {
> -                       node->process(graph, node, objs, node->idx);
> -               }
> -               node->idx = 0;
> +               __rte_node_process(graph, node);
>                 head = likely((int32_t)head > 0) ? head & mask : head;
>         }
>         graph->tail = 0;
> diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
> index 91a5de7fa4..b7b2bb958c 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -121,6 +121,39 @@ void __rte_node_stream_alloc_size(struct rte_graph *graph,
>
>  /* Fast path helper functions */
>
> +/**
> + * @internal
> + *
> + * Enqueue a given node to the tail of the graph reel.
> + *
> + * @param graph
> + *   Pointer Graph object.
> + * @param node
> + *   Pointer to node object to be enqueued.
> + */
> +static __rte_always_inline void
> +__rte_node_process(struct rte_graph *graph, struct rte_node *node)
> +{
> +       uint64_t start;
> +       uint16_t rc;
> +       void **objs;
> +
> +       RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> +       objs = node->objs;
> +       rte_prefetch0(objs);
> +
> +       if (rte_graph_has_stats_feature()) {
> +               start = rte_rdtsc();
> +               rc = node->process(graph, node, objs, node->idx);
> +               node->total_cycles += rte_rdtsc() - start;
> +               node->total_calls++;
> +               node->total_objs += rc;
> +       } else {
> +               node->process(graph, node, objs, node->idx);
> +       }
> +       node->idx = 0;
> +}
> +
>  /**
>   * @internal
>   *
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 03/13] graph: add macro to walk on graph circular buffer
  2022-11-17  5:09 ` [PATCH v1 03/13] graph: add macro to walk on graph circular buffer Zhirun Yan
@ 2023-02-20 13:45   ` Jerin Jacob
  2023-02-24  6:30     ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Jerin Jacob @ 2023-02-20 13:45 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> It is common to walk on the graph circular buffer, so add a macro to
> make the walk reusable for other worker models.
>
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  lib/graph/rte_graph_model_rtc.h     | 23 ++---------------------
>  lib/graph/rte_graph_worker_common.h | 23 +++++++++++++++++++++++
>  2 files changed, 25 insertions(+), 21 deletions(-)

> +/**
> + * Macro to walk on the source node(s) ((cir_start - head) -> cir_start)
> + * and then on the pending streams
> + * (cir_start -> (cir_start + mask) -> cir_start)
> + * in a circular buffer fashion.
> + *
> + *     +-----+ <= cir_start - head [number of source nodes]
> + *     |     |
> + *     | ... | <= source nodes
> + *     |     |
> + *     +-----+ <= cir_start [head = 0] [tail = 0]
> + *     |     |
> + *     | ... | <= pending streams
> + *     |     |
> + *     +-----+ <= cir_start + mask
> + */
> +#define rte_graph_walk_node(graph, head, node)                                         \
> +       for ((node) = RTE_PTR_ADD((graph), (graph)->cir_start[(int32_t)(head)]);        \
> +            likely((head) != (graph)->tail);                                           \
> +            (head)++,                                                                  \
> +            (node) = RTE_PTR_ADD((graph), (graph)->cir_start[(int32_t)(head)]),        \

This is an additional assignment compared to the original while()-based
version. Right?
No need to generalize with performance impact.


> +            (head) = likely((int32_t)(head) > 0) ? (head) & (graph)->cir_mask : (head))
> +
>  /**
>   * @internal
>   *
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2022-11-17  5:09 ` [PATCH v1 04/13] graph: add get/set graph worker model APIs Zhirun Yan
  2022-12-06  3:35   ` [EXT] " Kiran Kumar Kokkilagadda
@ 2023-02-20 13:50   ` Jerin Jacob
  2023-02-24  6:31     ` Yan, Zhirun
  1 sibling, 1 reply; 369+ messages in thread
From: Jerin Jacob @ 2023-02-20 13:50 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> Add new get/set APIs to configure the graph worker model, which
> determines the model to be used.
>
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  lib/graph/rte_graph_worker.h        | 51 +++++++++++++++++++++++++++++
>  lib/graph/rte_graph_worker_common.h | 13 ++++++++
>  lib/graph/version.map               |  3 ++
>  3 files changed, 67 insertions(+)
>
> diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
> index 54d1390786..a0ea0df153 100644
> --- a/lib/graph/rte_graph_worker.h
> +++ b/lib/graph/rte_graph_worker.h
> @@ -1,5 +1,56 @@
>  #include "rte_graph_model_rtc.h"
>
> +static enum rte_graph_worker_model worker_model = RTE_GRAPH_MODEL_DEFAULT;

This will break the multiprocess.

> +
> +/** Graph worker models */
> +enum rte_graph_worker_model {
> +#define WORKER_MODEL_DEFAULT "default"

Why are the strings needed?
Also, every symbol in a public header file should start with RTE_ to
avoid namespace conflicts.

> +       RTE_GRAPH_MODEL_DEFAULT = 0,
> +#define WORKER_MODEL_RTC "rtc"
> +       RTE_GRAPH_MODEL_RTC,

Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in the enum itself?

> +#define WORKER_MODEL_GENERIC "generic"

Generic is a very overloaded term. Use pipeline here, i.e.
RTE_GRAPH_MODEL_PIPELINE.


> +       RTE_GRAPH_MODEL_GENERIC,
> +       RTE_GRAPH_MODEL_MAX,

No need for MAX; it will break the ABI in the future. See other
subsystems such as cryptodev.

> +};

>

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 05/13] graph: introduce core affinity API
  2022-11-17  5:09 ` [PATCH v1 05/13] graph: introduce core affinity API Zhirun Yan
@ 2023-02-20 14:05   ` Jerin Jacob
  2023-02-24  6:32     ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Jerin Jacob @ 2023-02-20 14:05 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> 1. add lcore_id for node to hold affinity core id.
> 2. impl rte_node_model_generic_set_lcore_affinity to affinity node
>    with one lcore.
> 3. update version map for graph public API.

No need to explicitly state item 3. Rewrite items 1 and 2 as one or two
sentences, without the numbering.

>
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  lib/graph/graph_private.h           |  1 +
>  lib/graph/meson.build               |  1 +
>  lib/graph/node.c                    |  1 +
>  lib/graph/rte_graph_model_generic.c | 31 +++++++++++++++++++++
>  lib/graph/rte_graph_model_generic.h | 43 +++++++++++++++++++++++++++++
>  lib/graph/version.map               |  2 ++
>  6 files changed, 79 insertions(+)
>  create mode 100644 lib/graph/rte_graph_model_generic.c
>  create mode 100644 lib/graph/rte_graph_model_generic.h
>
> diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
> index f9a85c8926..627090f802 100644
> --- a/lib/graph/graph_private.h
> +++ b/lib/graph/graph_private.h
> @@ -49,6 +49,7 @@ struct node {
>         STAILQ_ENTRY(node) next;      /**< Next node in the list. */
>         char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
>         uint64_t flags;               /**< Node configuration flag. */
> +       unsigned int lcore_id;        /**< Node runs on the Lcore ID */
>         rte_node_process_t process;   /**< Node process function. */
>         rte_node_init_t init;         /**< Node init function. */
>         rte_node_fini_t fini;         /**< Node fini function. */
> diff --git a/lib/graph/meson.build b/lib/graph/meson.build
> index c7327549e8..8c8b11ed27 100644
> --- a/lib/graph/meson.build
> +++ b/lib/graph/meson.build
> @@ -14,6 +14,7 @@ sources = files(
>          'graph_debug.c',
>          'graph_stats.c',
>          'graph_populate.c',
> +        'rte_graph_model_generic.c',
>  )
>  headers = files('rte_graph.h', 'rte_graph_worker.h')
>
> diff --git a/lib/graph/node.c b/lib/graph/node.c
> index fc6345de07..8ad4b3cbeb 100644
> --- a/lib/graph/node.c
> +++ b/lib/graph/node.c
> @@ -100,6 +100,7 @@ __rte_node_register(const struct rte_node_register *reg)
>                         goto free;
>         }
>
> +       node->lcore_id = RTE_MAX_LCORE;
>         node->id = node_id++;
>
>         /* Add the node at tail */
> diff --git a/lib/graph/rte_graph_model_generic.c b/lib/graph/rte_graph_model_generic.c
> new file mode 100644
> index 0000000000..54ff659c7b
> --- /dev/null
> +++ b/lib/graph/rte_graph_model_generic.c
> @@ -0,0 +1,31 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2022 Intel Corporation
> + */
> +
> +#include "graph_private.h"
> +#include "rte_graph_model_generic.h"
> +
> +int
> +rte_node_model_generic_set_lcore_affinity(const char *name, unsigned int lcore_id)

Please put the action/verb last. Also, this is a graph-specific API, right?
I would suggest rte_graph_model_pipeline_lcore_affinity_set().

> diff --git a/lib/graph/rte_graph_model_generic.h b/lib/graph/rte_graph_model_generic.h
> new file mode 100644
> index 0000000000..20ca48a9e3
> --- /dev/null
> +++ b/lib/graph/rte_graph_model_generic.h
> @@ -0,0 +1,43 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2022 Intel Corporation
> + */
> +
> +#ifndef _RTE_GRAPH_MODEL_GENERIC_H_
> +#define _RTE_GRAPH_MODEL_GENERIC_H_
> +
> +/**
> + * @file rte_graph_model_generic.h
> + *
> + * @warning
> + * @b EXPERIMENTAL:
> + * All functions in this file may be changed or removed without prior notice.
> + *
> + * This API allows a worker thread to walk over a graph and nodes to create,
> + * process, enqueue and move streams of objects to the next nodes.
> + */
> +#include "rte_graph_worker_common.h"
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * Set lcore affinity to the node.
> + *
> + * @param name
> + *   Valid node name. In the case of the cloned node, the name will be
> + * "parent node name" + "-" + name.
> + * @param lcore_id
> + *   The lcore ID value.
> + *
> + * @return
> + *   0 on success, error otherwise.
> + */
> +__rte_experimental
> +int rte_node_model_generic_set_lcore_affinity(const char *name, unsigned int lcore_id);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_GRAPH_MODEL_GENERIC_H_ */
> diff --git a/lib/graph/version.map b/lib/graph/version.map
> index eea73ec9ca..33ff055be6 100644
> --- a/lib/graph/version.map
> +++ b/lib/graph/version.map
> @@ -46,5 +46,7 @@ EXPERIMENTAL {
>         rte_graph_worker_model_set;
>         rte_graph_worker_model_get;
>
> +       rte_node_model_generic_set_lcore_affinity;
> +
>         local: *;
>  };
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 06/13] graph: introduce graph affinity API
  2022-11-17  5:09 ` [PATCH v1 06/13] graph: introduce graph " Zhirun Yan
@ 2023-02-20 14:07   ` Jerin Jacob
  2023-02-24  6:39     ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Jerin Jacob @ 2023-02-20 14:07 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> Add lcore_id to the graph to hold the affinity core id the graph runs on.
> Add bind/unbind APIs to set/unset the graph affinity attribute. lcore_id
> is set to MAX by default, which means the attribute is disabled.
>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>

> diff --git a/lib/graph/version.map b/lib/graph/version.map
> index 33ff055be6..1c599b5b47 100644
> --- a/lib/graph/version.map
> +++ b/lib/graph/version.map
> @@ -18,6 +18,8 @@ EXPERIMENTAL {
>         rte_graph_node_get_by_name;
>         rte_graph_obj_dump;
>         rte_graph_walk;
> +       rte_graph_bind_core;

If it is not applicable to RTC, please change it to
rte_graph_model_pipeline_core_bind().

> +       rte_graph_unbind_core;
>
>         rte_graph_cluster_stats_create;
>         rte_graph_cluster_stats_destroy;
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 08/13] graph: introduce stream moving cross cores
  2022-11-17  5:09 ` [PATCH v1 08/13] graph: introduce stream moving cross cores Zhirun Yan
@ 2023-02-20 14:17   ` Jerin Jacob
  2023-02-24  6:48     ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Jerin Jacob @ 2023-02-20 14:17 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> This patch introduces key functions to allow a worker thread to
> enqueue and move streams of objects to the next nodes over
> different cores.
>
> 1. add graph_sched_wq_node to hold graph scheduling workqueue node
> stream
> 2. add workqueue help functions to create/destroy/enqueue/dequeue

These are two things; please make them two separate patches.


> @@ -39,6 +46,15 @@ struct rte_graph {
>         uint32_t cir_mask;           /**< Circular buffer wrap around mask. */
>         rte_node_t nb_nodes;         /**< Number of nodes in the graph. */
>         rte_graph_off_t *cir_start;  /**< Pointer to circular buffer. */
> +       /* Graph schedule */
> +       struct rte_graph_rq_head *rq __rte_cache_aligned; /* The run-queue */
> +       struct rte_graph_rq_head rq_head; /* The head for run-queue list */
> +
> +       SLIST_ENTRY(rte_graph) rq_next;   /* The next for run-queue list */
> +       unsigned int lcore_id;  /**< The graph running Lcore. */
> +       struct rte_ring *wq;    /**< The work-queue for pending streams. */
> +       struct rte_mempool *mp; /**< The mempool for scheduling streams. */
> +       /* Graph schedule area */
>         rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */
>         rte_graph_t id; /**< Graph identifier. */
>         int socket;     /**< Socket ID where memory is allocated. */
> @@ -63,6 +79,8 @@ struct rte_node {
>         char parent[RTE_NODE_NAMESIZE]; /**< Parent node name. */
>         char name[RTE_NODE_NAMESIZE];   /**< Name of the node. */
>
> +       /* Fast schedule area */
> +       unsigned int lcore_id __rte_cache_aligned;  /**< Node running Lcore. */

Do we need __rte_cache_aligned here? I am wondering whether we can add a
union for the different model-specific areas, ONLY for the fast path, so
that we save memory and the fast-path data stays warmer.

>         /* Fast path area  */
>  #define RTE_NODE_CTX_SZ 16
>         uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 13/13] examples/l3fwd-graph: introduce generic worker model
  2022-11-17  5:09 ` [PATCH v1 13/13] examples/l3fwd-graph: introduce generic worker model Zhirun Yan
@ 2023-02-20 14:20   ` Jerin Jacob
  2023-02-24  6:49     ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Jerin Jacob @ 2023-02-20 14:20 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Thu, Nov 17, 2022 at 10:41 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> Add a new parameter "model" to choose the generic or rtc worker model.
> In the generic model, nodes are affinitized to worker cores successively.
>
> Note:
> only one RX node is supported for the remote model in the current
> implementation.
>
> ./dpdk-l3fwd-graph  -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
> --model="generic"

Patch apply issue, please rebase with main.
See https://patches.dpdk.org/project/dpdk/patch/20221117050926.136974-14-zhirun.yan@intel.com/

>
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  examples/l3fwd-graph/main.c | 218 +++++++++++++++++++++++++++++-------
>  1 file changed, 179 insertions(+), 39 deletions(-)
>
> diff --git a/examples/l3fwd-graph/main.c b/examples/l3fwd-graph/main.c
> index 6dcb6ee92b..c145a3e3e8 100644
> --- a/examples/l3fwd-graph/main.c
> +++ b/examples/l3fwd-graph/main.c
> @@ -147,6 +147,19 @@ static struct ipv4_l3fwd_lpm_route ipv4_l3fwd_lpm_route_array[] = {
>         {RTE_IPV4(198, 18, 6, 0), 24, 6}, {RTE_IPV4(198, 18, 7, 0), 24, 7},
>  };
>
> +static int
> +check_worker_model_params(void)
> +{
> +       if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_GENERIC &&
> +           nb_lcore_params > 1) {
> +               printf("Exceeded max number of lcore params for remote model: %hu\n",
> +                      nb_lcore_params);
> +               return -1;
> +       }
> +
> +       return 0;
> +}
> +
>  static int
>  check_lcore_params(void)
>  {
> @@ -291,6 +304,20 @@ parse_max_pkt_len(const char *pktlen)
>         return len;
>  }
>
> +static int
> +parse_worker_model(const char *model)
> +{
> +       if (strcmp(model, WORKER_MODEL_DEFAULT) == 0)
> +               return RTE_GRAPH_MODEL_DEFAULT;
> +       else if (strcmp(model, WORKER_MODEL_GENERIC) == 0) {
> +               rte_graph_worker_model_set(RTE_GRAPH_MODEL_GENERIC);
> +               return RTE_GRAPH_MODEL_GENERIC;
> +       }
> +       rte_exit(EXIT_FAILURE, "Invalid worker model: %s", model);
> +
> +       return RTE_GRAPH_MODEL_MAX;
> +}
> +
>  static int
>  parse_portmask(const char *portmask)
>  {
> @@ -404,6 +431,7 @@ static const char short_options[] = "p:" /* portmask */
>  #define CMD_LINE_OPT_NO_NUMA      "no-numa"
>  #define CMD_LINE_OPT_MAX_PKT_LEN   "max-pkt-len"
>  #define CMD_LINE_OPT_PER_PORT_POOL "per-port-pool"
> +#define CMD_LINE_OPT_WORKER_MODEL  "model"
>  enum {
>         /* Long options mapped to a short option */
>
> @@ -416,6 +444,7 @@ enum {
>         CMD_LINE_OPT_NO_NUMA_NUM,
>         CMD_LINE_OPT_MAX_PKT_LEN_NUM,
>         CMD_LINE_OPT_PARSE_PER_PORT_POOL,
> +       CMD_LINE_OPT_WORKER_MODEL_TYPE,
>  };
>
>  static const struct option lgopts[] = {
> @@ -424,6 +453,7 @@ static const struct option lgopts[] = {
>         {CMD_LINE_OPT_NO_NUMA, 0, 0, CMD_LINE_OPT_NO_NUMA_NUM},
>         {CMD_LINE_OPT_MAX_PKT_LEN, 1, 0, CMD_LINE_OPT_MAX_PKT_LEN_NUM},
>         {CMD_LINE_OPT_PER_PORT_POOL, 0, 0, CMD_LINE_OPT_PARSE_PER_PORT_POOL},
> +       {CMD_LINE_OPT_WORKER_MODEL, 1, 0, CMD_LINE_OPT_WORKER_MODEL_TYPE},
>         {NULL, 0, 0, 0},
>  };
>
> @@ -498,6 +528,11 @@ parse_args(int argc, char **argv)
>                         per_port_pool = 1;
>                         break;
>
> +               case CMD_LINE_OPT_WORKER_MODEL_TYPE:
> +                       printf("Use new worker model: %s\n", optarg);
> +                       parse_worker_model(optarg);
> +                       break;
> +
>                 default:
>                         print_usage(prgname);
>                         return -1;
> @@ -735,6 +770,140 @@ config_port_max_pkt_len(struct rte_eth_conf *conf,
>         return 0;
>  }
>
> +static void
> +graph_config_generic(struct rte_graph_param graph_conf)
> +{
> +       uint16_t nb_patterns = graph_conf.nb_node_patterns;
> +       int worker_count = rte_lcore_count() - 1;
> +       int main_lcore_id = rte_get_main_lcore();
> +       int worker_lcore = main_lcore_id;
> +       rte_graph_t main_graph_id = 0;
> +       struct rte_node *node_tmp;
> +       struct lcore_conf *qconf;
> +       struct rte_graph *graph;
> +       rte_graph_t graph_id;
> +       rte_graph_off_t off;
> +       int n_rx_node = 0;
> +       rte_node_t count;
> +       rte_edge_t i;
> +       int ret;
> +
> +       for (int j = 0; j < nb_lcore_params; j++) {
> +               qconf = &lcore_conf[lcore_params[j].lcore_id];
> +               /* Add rx node patterns of all lcore */
> +               for (i = 0; i < qconf->n_rx_queue; i++) {
> +                       char *node_name = qconf->rx_queue_list[i].node_name;
> +
> +                       graph_conf.node_patterns[nb_patterns + n_rx_node + i] = node_name;
> +                       n_rx_node++;
> +                       ret = rte_node_model_generic_set_lcore_affinity(node_name,
> +                                                                       lcore_params[j].lcore_id);
> +                       if (ret == 0)
> +                               printf("Set node %s affinity to lcore %u\n", node_name,
> +                                      lcore_params[j].lcore_id);
> +               }
> +       }
> +
> +       graph_conf.nb_node_patterns = nb_patterns + n_rx_node;
> +       graph_conf.socket_id = rte_lcore_to_socket_id(main_lcore_id);
> +
> +       snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
> +                main_lcore_id);
> +
> +       /* create main graph */
> +       main_graph_id = rte_graph_create(qconf->name, &graph_conf);
> +       if (main_graph_id == RTE_GRAPH_ID_INVALID)
> +               rte_exit(EXIT_FAILURE,
> +                        "rte_graph_create(): main_graph_id invalid for lcore %u\n",
> +                        main_lcore_id);
> +
> +       qconf->graph_id = main_graph_id;
> +       qconf->graph = rte_graph_lookup(qconf->name);
> +       /* >8 End of graph initialization. */
> +       if (!qconf->graph)
> +               rte_exit(EXIT_FAILURE,
> +                        "rte_graph_lookup(): graph %s not found\n",
> +                        qconf->name);
> +
> +       graph = qconf->graph;
> +       rte_graph_foreach_node(count, off, graph, node_tmp) {
> +               worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
> +
> +               /* Need to set the node Lcore affinity before clone graph for each lcore */
> +               if (node_tmp->lcore_id == RTE_MAX_LCORE) {
> +                       ret = rte_node_model_generic_set_lcore_affinity(node_tmp->name,
> +                                                                       worker_lcore);
> +                       if (ret == 0)
> +                               printf("Set node %s affinity to lcore %u\n",
> +                                      node_tmp->name, worker_lcore);
> +               }
> +       }
> +
> +       worker_lcore = main_lcore_id;
> +       for (int i = 0; i < worker_count; i++) {
> +               worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
> +
> +               qconf = &lcore_conf[worker_lcore];
> +               snprintf(qconf->name, sizeof(qconf->name), "cloned-%u", worker_lcore);
> +               graph_id = rte_graph_clone(main_graph_id, qconf->name);
> +               ret = rte_graph_bind_core(graph_id, worker_lcore);
> +               if (ret == 0)
> +                       printf("bind graph %d to lcore %u\n", graph_id, worker_lcore);
> +
> +               /* full cloned graph name */
> +               snprintf(qconf->name, sizeof(qconf->name), "%s",
> +                        rte_graph_id_to_name(graph_id));
> +               qconf->graph_id = graph_id;
> +               qconf->graph = rte_graph_lookup(qconf->name);
> +               if (!qconf->graph)
> +                       rte_exit(EXIT_FAILURE,
> +                                "Failed to lookup graph %s\n",
> +                                qconf->name);
> +               continue;
> +       }
> +}
> +
> +static void
> +graph_config_rtc(struct rte_graph_param graph_conf)
> +{
> +       uint16_t nb_patterns = graph_conf.nb_node_patterns;
> +       struct lcore_conf *qconf;
> +       rte_graph_t graph_id;
> +       uint32_t lcore_id;
> +       rte_edge_t i;
> +
> +       for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> +               if (rte_lcore_is_enabled(lcore_id) == 0)
> +                       continue;
> +
> +               qconf = &lcore_conf[lcore_id];
> +               /* Skip graph creation if no source exists */
> +               if (!qconf->n_rx_queue)
> +                       continue;
> +               /* Add rx node patterns of this lcore */
> +               for (i = 0; i < qconf->n_rx_queue; i++) {
> +                       graph_conf.node_patterns[nb_patterns + i] =
> +                               qconf->rx_queue_list[i].node_name;
> +               }
> +               graph_conf.nb_node_patterns = nb_patterns + i;
> +               graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
> +               snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
> +                        lcore_id);
> +               graph_id = rte_graph_create(qconf->name, &graph_conf);
> +               if (graph_id == RTE_GRAPH_ID_INVALID)
> +                       rte_exit(EXIT_FAILURE,
> +                                "rte_graph_create(): graph_id invalid for lcore %u\n",
> +                                lcore_id);
> +               qconf->graph_id = graph_id;
> +               qconf->graph = rte_graph_lookup(qconf->name);
> +               /* >8 End of graph initialization. */
> +               if (!qconf->graph)
> +                       rte_exit(EXIT_FAILURE,
> +                                "rte_graph_lookup(): graph %s not found\n",
> +                                qconf->name);
> +       }
> +}
> +
>  int
>  main(int argc, char **argv)
>  {
> @@ -759,6 +928,7 @@ main(int argc, char **argv)
>         uint16_t nb_patterns;
>         uint8_t rewrite_len;
>         uint32_t lcore_id;
> +       uint16_t model;
>         int ret;
>
>         /* Init EAL */
> @@ -787,6 +957,9 @@ main(int argc, char **argv)
>         if (check_lcore_params() < 0)
>                 rte_exit(EXIT_FAILURE, "check_lcore_params() failed\n");
>
> +       if (check_worker_model_params() < 0)
> +               rte_exit(EXIT_FAILURE, "check_worker_model_params() failed\n");
> +
>         ret = init_lcore_rx_queues();
>         if (ret < 0)
>                 rte_exit(EXIT_FAILURE, "init_lcore_rx_queues() failed\n");
> @@ -1026,46 +1199,13 @@ main(int argc, char **argv)
>
>         memset(&graph_conf, 0, sizeof(graph_conf));
>         graph_conf.node_patterns = node_patterns;
> +       graph_conf.nb_node_patterns = nb_patterns;
>
> -       for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> -               rte_graph_t graph_id;
> -               rte_edge_t i;
> -
> -               if (rte_lcore_is_enabled(lcore_id) == 0)
> -                       continue;
> -
> -               qconf = &lcore_conf[lcore_id];
> -
> -               /* Skip graph creation if no source exists */
> -               if (!qconf->n_rx_queue)
> -                       continue;
> -
> -               /* Add rx node patterns of this lcore */
> -               for (i = 0; i < qconf->n_rx_queue; i++) {
> -                       graph_conf.node_patterns[nb_patterns + i] =
> -                               qconf->rx_queue_list[i].node_name;
> -               }
> -
> -               graph_conf.nb_node_patterns = nb_patterns + i;
> -               graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
> -
> -               snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
> -                        lcore_id);
> -
> -               graph_id = rte_graph_create(qconf->name, &graph_conf);
> -               if (graph_id == RTE_GRAPH_ID_INVALID)
> -                       rte_exit(EXIT_FAILURE,
> -                                "rte_graph_create(): graph_id invalid"
> -                                " for lcore %u\n", lcore_id);
> -
> -               qconf->graph_id = graph_id;
> -               qconf->graph = rte_graph_lookup(qconf->name);
> -               /* >8 End of graph initialization. */
> -               if (!qconf->graph)
> -                       rte_exit(EXIT_FAILURE,
> -                                "rte_graph_lookup(): graph %s not found\n",
> -                                qconf->name);
> -       }
> +       model = rte_graph_worker_model_get();
> +       if (model == RTE_GRAPH_MODEL_DEFAULT)
> +               graph_config_rtc(graph_conf);
> +       else if (model == RTE_GRAPH_MODEL_GENERIC)
> +               graph_config_generic(graph_conf);
>
>         memset(&rewrite_data, 0, sizeof(rewrite_data));
>         rewrite_len = sizeof(rewrite_data);
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [PATCH v1 01/13] graph: split graph worker into common and default model
  2023-02-20 13:38   ` Jerin Jacob
@ 2023-02-24  6:29     ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-02-24  6:29 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, February 20, 2023 9:38 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 01/13] graph: split graph worker into common and
> default model
> 
> On Thu, Nov 17, 2022 at 10:39 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
> >
> > To support multiple graph worker models, split the graph worker into common
> > and default parts. Name the current walk function rte_graph_model_rtc, since
> > the default model is RTC (run-to-completion).
> 
> There CI issues with this series. Please check
> https://patches.dpdk.org/project/dpdk/patch/20221117050926.136974-2-
> zhirun.yan@intel.com/
> # Please make sure each patch builds with devtools/test-meson-builds.sh
> # Please make sure each patch doesn't cause any issue with the app/test/test_graph.c test
> # Please make sure this series doesn't have perf issues with app/test/test_graph_perf.c
> # Both RTC and the new mode should run with l3fwd_graph without any performance regression
> # Please introduce the model concept in the documentation at
> doc/guides/prog_guide/graph_lib.rst, with details for this generic mode.
> 
> Also update the maintainers files for new model files.
> 
Yes, I will fix the CI issues and update the doc and MAINTAINERS files in the next version.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  lib/graph/rte_graph_model_rtc.h     |  57 ++++
> >  lib/graph/rte_graph_worker.h        | 498 +---------------------------
> >  lib/graph/rte_graph_worker_common.h | 456 +++++++++++++++++++++++++
> 
> 
> Use git mv to avoid losing history and reduce the diff.

Actually, it is file A -> file B and file C; I will break it into two patches to keep the log history.
Got it, thanks.

^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [PATCH v1 03/13] graph: add macro to walk on graph circular buffer
  2023-02-20 13:45   ` Jerin Jacob
@ 2023-02-24  6:30     ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-02-24  6:30 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, February 20, 2023 9:45 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 03/13] graph: add macro to walk on graph circular buffer
> 
> On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
> >
> > Walking the graph circular buffer is a common operation, so wrap it in a
> > macro to make it reusable for other worker models.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  lib/graph/rte_graph_model_rtc.h     | 23 ++---------------------
> >  lib/graph/rte_graph_worker_common.h | 23 +++++++++++++++++++++++
> >  2 files changed, 25 insertions(+), 21 deletions(-)
> 
> > +/**
> > + * Macro to walk on the source node(s) ((cir_start - head) ->
> > +cir_start)
> > + * and then on the pending streams
> > + * (cir_start -> (cir_start + mask) -> cir_start)
> > + * in a circular buffer fashion.
> > + *
> > + *     +-----+ <= cir_start - head [number of source nodes]
> > + *     |     |
> > + *     | ... | <= source nodes
> > + *     |     |
> > + *     +-----+ <= cir_start [head = 0] [tail = 0]
> > + *     |     |
> > + *     | ... | <= pending streams
> > + *     |     |
> > + *     +-----+ <= cir_start + mask
> > + */
> > +#define rte_graph_walk_node(graph, head, node)                                         \
> > +       for ((node) = RTE_PTR_ADD((graph), (graph)->cir_start[(int32_t)(head)]);
> \
> > +            likely((head) != (graph)->tail);                                           \
> > +            (head)++,                                                                  \
> > +            (node) = RTE_PTR_ADD((graph), (graph)->cir_start[(int32_t)(head)]),
> \
> 
> This is an additional assignment compared to the original while()-based version. Right?
> No need to generalize at the cost of a performance impact.
Yes, you are right. I will change the macro to use the original while loop.

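The while-loop walk being restored here can be sketched outside DPDK as follows. This is a minimal illustrative stand-in, not the DPDK ABI: plain `int` node IDs replace the `rte_graph_off_t` byte offsets, and the function/field names are hypothetical, but the head/tail/mask mechanics mirror the original `rte_graph_walk()` shape — source nodes sit at negative indices below `cir_start`, and the mask-based wrap only applies once `head` goes past the source-node region.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for the rte_graph circular buffer walk:
 * source nodes live just below cir_start (indices head..-1) and
 * pending streams occupy cir_start[0..cir_mask]. */
static int
mini_walk_sum(const int *cir_start, int32_t head, int32_t tail,
	      uint32_t cir_mask)
{
	int sum = 0;

	/* Original while-loop shape: one node load per iteration, no
	 * extra assignment, wrap with the mask only once head is
	 * non-negative (i.e. past the source-node region). */
	while (head != tail) {
		int node = cir_start[head]; /* stand-in for node->process() */

		sum += node;
		head++;
		head = (head > 0) ? (int32_t)(head & (int32_t)cir_mask) : head;
	}
	return sum;
}

static int
mini_walk_demo(void)
{
	/* Two source nodes (10, 20) below cir_start, two pending
	 * streams (30, 40) in a ring of size 4 (mask = 3), tail = 2. */
	static const int buf[6] = {10, 20, 30, 40, 0, 0};

	return mini_walk_sum(buf + 2, -2, 2, 3);
}
```

Walking visits 10, 20 (sources) then 30, 40 (pending streams) and stops when head wraps to tail.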
> 
> 
> > +            (head) = likely((int32_t)(head) > 0) ? (head) &
> > + (graph)->cir_mask : (head))
> > +
> >  /**
> >   * @internal
> >   *
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2023-02-20 13:50   ` Jerin Jacob
@ 2023-02-24  6:31     ` Yan, Zhirun
  2023-02-26 22:23       ` Jerin Jacob
  0 siblings, 1 reply; 369+ messages in thread
From: Yan, Zhirun @ 2023-02-24  6:31 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, February 20, 2023 9:51 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
> 
> On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
> >
> > Add new get/set APIs to configure the graph worker model, which
> > determines which model will be used.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  lib/graph/rte_graph_worker.h        | 51 +++++++++++++++++++++++++++++
> >  lib/graph/rte_graph_worker_common.h | 13 ++++++++
> >  lib/graph/version.map               |  3 ++
> >  3 files changed, 67 insertions(+)
> >
> > diff --git a/lib/graph/rte_graph_worker.h
> > b/lib/graph/rte_graph_worker.h index 54d1390786..a0ea0df153 100644
> > --- a/lib/graph/rte_graph_worker.h
> > +++ b/lib/graph/rte_graph_worker.h
> > @@ -1,5 +1,56 @@
> >  #include "rte_graph_model_rtc.h"
> >
> > +static enum rte_graph_worker_model worker_model =
> > +RTE_GRAPH_MODEL_DEFAULT;
> 
> This will break multi-process support.

Thanks. I will use TLS (thread-local storage) so each thread keeps its own copy.

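The fix agreed above can be sketched as follows. This is a self-contained illustration, assuming nothing about the DPDK internals: the enum values, function names, and the plain `__thread` qualifier are stand-ins (the DPDK idiom would be `RTE_DEFINE_PER_LCORE`). The point is that each thread's write to the model variable stays invisible to other threads, unlike the shared `static` in the original patch.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model IDs, mirroring the shape of the patch's enum. */
enum mini_model { MINI_MODEL_RTC = 0, MINI_MODEL_GENERIC = 1 };

/* Thread-local instead of a shared static: every thread gets its own
 * copy, initialized to the RTC default. */
static __thread enum mini_model worker_model = MINI_MODEL_RTC;

static void mini_model_set(enum mini_model m) { worker_model = m; }
static enum mini_model mini_model_get(void) { return worker_model; }

static void *
mini_worker(void *arg)
{
	(void)arg;
	/* This thread's choice must not leak into the main thread. */
	mini_model_set(MINI_MODEL_GENERIC);
	assert(mini_model_get() == MINI_MODEL_GENERIC);
	return NULL;
}

static enum mini_model
tls_demo(void)
{
	pthread_t t;

	pthread_create(&t, NULL, mini_worker, NULL);
	pthread_join(t, NULL);
	/* Still the RTC default here, despite the worker's set. */
	return mini_model_get();
}
```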
> 
> > +
> > +/** Graph worker models */
> > +enum rte_graph_worker_model {
> > +#define WORKER_MODEL_DEFAULT "default"
> 
> Why are the strings needed?
> Also, every symbol in a public header file should start with RTE_ to avoid
> namespace conflicts.

It was used to configure the model in the app. I can move the strings into the example.

> 
> > +       RTE_GRAPH_MODEL_DEFAULT = 0,
> > +#define WORKER_MODEL_RTC "rtc"
> > +       RTE_GRAPH_MODEL_RTC,
> 
> Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in the enum
> itself?
Yes, will do in next version.

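The aliasing agreed here can be shown in a short sketch. Enumerator names below are illustrative (the real ones carry the `RTE_GRAPH_` prefix): rather than a separate DEFAULT entry taking value 0 on its own, DEFAULT is defined as an alias of RTC inside the enum, and the next model takes the following value.

```c
#include <assert.h>

/* Sketch of the enum with the default aliased to RTC. */
enum mini_graph_worker_model {
	MINI_GRAPH_MODEL_RTC = 0,
	/* The default model is RTC, expressed as an alias rather
	 * than a distinct value. */
	MINI_GRAPH_MODEL_DEFAULT = MINI_GRAPH_MODEL_RTC,
	MINI_GRAPH_MODEL_GENERIC, /* takes the next free value, 1 */
};
```

Comparisons against either name are then interchangeable, so code checking for the default model also matches RTC.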
> 
> > +#define WORKER_MODEL_GENERIC "generic"
> 
> Generic is a very overloaded term. Use pipeline here i.e
> RTE_GRAPH_MODEL_PIPELINE

Actually, it's not a pure pipeline mode. I prefer to change it to hybrid.
> 
> 
> > +       RTE_GRAPH_MODEL_GENERIC,
> > +       RTE_GRAPH_MODEL_MAX,
> 
> No need for MAX; it will break the ABI in the future. See other subsystems such as
> cryptodev.

Thanks, I will change it.
> 
> > +};
> 
> >

^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [PATCH v1 05/13] graph: introduce core affinity API
  2023-02-20 14:05   ` Jerin Jacob
@ 2023-02-24  6:32     ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-02-24  6:32 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, February 20, 2023 10:05 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 05/13] graph: introduce core affinity API
> 
> On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
> >
> > 1. add lcore_id for node to hold affinity core id.
> > 2. impl rte_node_model_generic_set_lcore_affinity to affinity node
> >    with one lcore.
> > 3. update version map for graph public API.
> 
> No need to explicitly state item 3. Rewrite items 1 and 2 as one or two
> sentences without the numbering.
> 
Got it. I will change it in the next version.

> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  lib/graph/graph_private.h           |  1 +
> >  lib/graph/meson.build               |  1 +
> >  lib/graph/node.c                    |  1 +
> >  lib/graph/rte_graph_model_generic.c | 31 +++++++++++++++++++++
> > lib/graph/rte_graph_model_generic.h | 43
> +++++++++++++++++++++++++++++
> >  lib/graph/version.map               |  2 ++
> >  6 files changed, 79 insertions(+)
> >  create mode 100644 lib/graph/rte_graph_model_generic.c
> >  create mode 100644 lib/graph/rte_graph_model_generic.h
> >
> > diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
> > index f9a85c8926..627090f802 100644
> > --- a/lib/graph/graph_private.h
> > +++ b/lib/graph/graph_private.h
> > @@ -49,6 +49,7 @@ struct node {
> >         STAILQ_ENTRY(node) next;      /**< Next node in the list. */
> >         char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
> >         uint64_t flags;               /**< Node configuration flag. */
> > +       unsigned int lcore_id;        /**< Node runs on the Lcore ID */
> >         rte_node_process_t process;   /**< Node process function. */
> >         rte_node_init_t init;         /**< Node init function. */
> >         rte_node_fini_t fini;         /**< Node fini function. */
> > diff --git a/lib/graph/meson.build b/lib/graph/meson.build index
> > c7327549e8..8c8b11ed27 100644
> > --- a/lib/graph/meson.build
> > +++ b/lib/graph/meson.build
> > @@ -14,6 +14,7 @@ sources = files(
> >          'graph_debug.c',
> >          'graph_stats.c',
> >          'graph_populate.c',
> > +        'rte_graph_model_generic.c',
> >  )
> >  headers = files('rte_graph.h', 'rte_graph_worker.h')
> >
> > diff --git a/lib/graph/node.c b/lib/graph/node.c index
> > fc6345de07..8ad4b3cbeb 100644
> > --- a/lib/graph/node.c
> > +++ b/lib/graph/node.c
> > @@ -100,6 +100,7 @@ __rte_node_register(const struct rte_node_register
> *reg)
> >                         goto free;
> >         }
> >
> > +       node->lcore_id = RTE_MAX_LCORE;
> >         node->id = node_id++;
> >
> >         /* Add the node at tail */
> > diff --git a/lib/graph/rte_graph_model_generic.c
> > b/lib/graph/rte_graph_model_generic.c
> > new file mode 100644
> > index 0000000000..54ff659c7b
> > --- /dev/null
> > +++ b/lib/graph/rte_graph_model_generic.c
> > @@ -0,0 +1,31 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(C) 2022 Intel Corporation  */
> > +
> > +#include "graph_private.h"
> > +#include "rte_graph_model_generic.h"
> > +
> > +int
> > +rte_node_model_generic_set_lcore_affinity(const char *name, unsigned
> > +int lcore_id)
> 
> Please put the action/verb last. Also, it is a graph-specific API, right?
> I would suggest rte_graph_model_pipeline_lcore_affinity_set().
> 
Yes, it is a graph-specific API. I will change it in the next version. Thanks.

> > diff --git a/lib/graph/rte_graph_model_generic.h
> > b/lib/graph/rte_graph_model_generic.h
> > new file mode 100644
> > index 0000000000..20ca48a9e3
> > --- /dev/null
> > +++ b/lib/graph/rte_graph_model_generic.h
> > @@ -0,0 +1,43 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(C) 2022 Intel Corporation  */
> > +
> > +#ifndef _RTE_GRAPH_MODEL_GENERIC_H_
> > +#define _RTE_GRAPH_MODEL_GENERIC_H_
> > +
> > +/**
> > + * @file rte_graph_model_generic.h
> > + *
> > + * @warning
> > + * @b EXPERIMENTAL:
> > + * All functions in this file may be changed or removed without prior notice.
> > + *
> > + * This API allows a worker thread to walk over a graph and nodes to
> > +create,
> > + * process, enqueue and move streams of objects to the next nodes.
> > + */
> > +#include "rte_graph_worker_common.h"
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +/**
> > + * Set lcore affinity to the node.
> > + *
> > + * @param name
> > + *   Valid node name. In the case of the cloned node, the name will be
> > + * "parent node name" + "-" + name.
> > + * @param lcore_id
> > + *   The lcore ID value.
> > + *
> > + * @return
> > + *   0 on success, error otherwise.
> > + */
> > +__rte_experimental
> > +int rte_node_model_generic_set_lcore_affinity(const char *name,
> > +unsigned int lcore_id);
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_GRAPH_MODEL_GENERIC_H_ */
> > diff --git a/lib/graph/version.map b/lib/graph/version.map index
> > eea73ec9ca..33ff055be6 100644
> > --- a/lib/graph/version.map
> > +++ b/lib/graph/version.map
> > @@ -46,5 +46,7 @@ EXPERIMENTAL {
> >         rte_graph_worker_model_set;
> >         rte_graph_worker_model_get;
> >
> > +       rte_node_model_generic_set_lcore_affinity;
> > +
> >         local: *;
> >  };
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [PATCH v1 06/13] graph: introduce graph affinity API
  2023-02-20 14:07   ` Jerin Jacob
@ 2023-02-24  6:39     ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-02-24  6:39 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, February 20, 2023 10:07 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 06/13] graph: introduce graph affinity API
> 
> On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
> >
> > Add lcore_id to the graph to hold the affinity core ID where the graph will run.
> > Add bind/unbind APIs to set/unset the graph affinity attribute. lcore_id
> > is set to MAX by default, which means the attribute is disabled.
> >
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> 
> > diff --git a/lib/graph/version.map b/lib/graph/version.map index
> > 33ff055be6..1c599b5b47 100644
> > --- a/lib/graph/version.map
> > +++ b/lib/graph/version.map
> > @@ -18,6 +18,8 @@ EXPERIMENTAL {
> >         rte_graph_node_get_by_name;
> >         rte_graph_obj_dump;
> >         rte_graph_walk;
> > +       rte_graph_bind_core;
> 
> if it is not applicable to RTC, please change to
> rte_graph_model_pipeline_core_bind()
> 

It could be used by RTC, where it would mean binding all nodes to the same core,
but that's not necessary.
I will rename it with the specific model name.

> > +       rte_graph_unbind_core;
> >
> >         rte_graph_cluster_stats_create;
> >         rte_graph_cluster_stats_destroy;
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [PATCH v1 08/13] graph: introduce stream moving cross cores
  2023-02-20 14:17   ` Jerin Jacob
@ 2023-02-24  6:48     ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-02-24  6:48 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, February 20, 2023 10:17 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 08/13] graph: introduce stream moving cross cores
> 
> On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
> >
> > This patch introduces key functions that allow a worker thread to
> > enqueue and move streams of objects to the next nodes across different
> > cores.
> >
> > 1. add graph_sched_wq_node to hold the graph scheduling workqueue node stream
> > 2. add workqueue helper functions to create/destroy/enqueue/dequeue
> 
> These are two things; make them two patches.
> 
I will do that in the next version.

> 
> > @@ -39,6 +46,15 @@ struct rte_graph {
> >         uint32_t cir_mask;           /**< Circular buffer wrap around mask. */
> >         rte_node_t nb_nodes;         /**< Number of nodes in the graph. */
> >         rte_graph_off_t *cir_start;  /**< Pointer to circular buffer.
> > */
> > +       /* Graph schedule */
> > +       struct rte_graph_rq_head *rq __rte_cache_aligned; /* The run-queue */
> > +       struct rte_graph_rq_head rq_head; /* The head for run-queue
> > + list */
> > +
> > +       SLIST_ENTRY(rte_graph) rq_next;   /* The next for run-queue list */
> > +       unsigned int lcore_id;  /**< The graph running Lcore. */
> > +       struct rte_ring *wq;    /**< The work-queue for pending streams. */
> > +       struct rte_mempool *mp; /**< The mempool for scheduling streams. */
> > +       /* Graph schedule area */
> >         rte_graph_off_t nodes_start; /**< Offset at which node memory starts.
> */
> >         rte_graph_t id; /**< Graph identifier. */
> >         int socket;     /**< Socket ID where memory is allocated. */
> > @@ -63,6 +79,8 @@ struct rte_node {
> >         char parent[RTE_NODE_NAMESIZE]; /**< Parent node name. */
> >         char name[RTE_NODE_NAMESIZE];   /**< Name of the node. */
> >
> > +       /* Fast schedule area */
> > +       unsigned int lcore_id __rte_cache_aligned;  /**< Node running
> > + Lcore. */
> 
> Do we need __rte_cache_aligned here? I am wondering whether we can add a union
> for the different model-specific areas ONLY for the fast path, so that we save
> memory and the fast-path data stays warmer.

Maybe it is not necessary. I agree with you, and I can use a union to cover the model-specific fields.

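The union overlay discussed above can be sketched like this. Field names and sizes are illustrative, not the `rte_node` ABI: the idea is simply that the per-model fast-path fields share storage, so the RTC model pays no extra per-node size for the generic model's scheduling data, and all models' hot data stays in the same cache lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of a node with model-specific data
 * overlaid in an anonymous union (C11). */
struct mini_node {
	uint64_t flags;
	union {
		struct {
			unsigned int lcore_id; /* generic model: bound lcore */
		} sched;
		uint64_t rtc_scratch; /* room for other models' data */
	};
	uint8_t ctx[16]; /* fast-path context shared by all models */
};
```

Because the members overlay, the struct stays as small as its largest per-model area rather than growing by the sum of them.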
> 
> >         /* Fast path area  */
> >  #define RTE_NODE_CTX_SZ 16
> >         uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node
> > Context. */
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [PATCH v1 13/13] examples/l3fwd-graph: introduce generic worker model
  2023-02-20 14:20   ` Jerin Jacob
@ 2023-02-24  6:49     ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-02-24  6:49 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, February 20, 2023 10:20 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 13/13] examples/l3fwd-graph: introduce generic worker
> model
> 
> On Thu, Nov 17, 2022 at 10:41 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
> >
> > Add a new parameter "model" to choose the generic or RTC worker model.
> > In the generic model, nodes are affinitized to worker cores successively.
> >
> > Note:
> > the current implementation supports only one RX node for the remote model.
> >
> > ./dpdk-l3fwd-graph  -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
> > --model="generic"
> 
> Patch apply issue, please rebase with main.
> See https://patches.dpdk.org/project/dpdk/patch/20221117050926.136974-14-
> zhirun.yan@intel.com/
> 
Will fix it in the next version. Thanks for your comments.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  examples/l3fwd-graph/main.c | 218
> > +++++++++++++++++++++++++++++-------
> >  1 file changed, 179 insertions(+), 39 deletions(-)
> >
> > diff --git a/examples/l3fwd-graph/main.c b/examples/l3fwd-graph/main.c
> > index 6dcb6ee92b..c145a3e3e8 100644
> > --- a/examples/l3fwd-graph/main.c
> > +++ b/examples/l3fwd-graph/main.c
> > @@ -147,6 +147,19 @@ static struct ipv4_l3fwd_lpm_route
> ipv4_l3fwd_lpm_route_array[] = {
> >         {RTE_IPV4(198, 18, 6, 0), 24, 6}, {RTE_IPV4(198, 18, 7, 0),
> > 24, 7},  };
> >
> > +static int
> > +check_worker_model_params(void)
> > +{
> > +       if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_GENERIC &&
> > +           nb_lcore_params > 1) {
> > +               printf("Exceeded max number of lcore params for remote
> model: %hu\n",
> > +                      nb_lcore_params);
> > +               return -1;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> >  static int
> >  check_lcore_params(void)
> >  {
> > @@ -291,6 +304,20 @@ parse_max_pkt_len(const char *pktlen)
> >         return len;
> >  }
> >
> > +static int
> > +parse_worker_model(const char *model) {
> > +       if (strcmp(model, WORKER_MODEL_DEFAULT) == 0)
> > +               return RTE_GRAPH_MODEL_DEFAULT;
> > +       else if (strcmp(model, WORKER_MODEL_GENERIC) == 0) {
> > +               rte_graph_worker_model_set(RTE_GRAPH_MODEL_GENERIC);
> > +               return RTE_GRAPH_MODEL_GENERIC;
> > +       }
> > +       rte_exit(EXIT_FAILURE, "Invalid worker model: %s", model);
> > +
> > +       return RTE_GRAPH_MODEL_MAX;
> > +}
> > +
> >  static int
> >  parse_portmask(const char *portmask)
> >  {
> > @@ -404,6 +431,7 @@ static const char short_options[] = "p:" /* portmask */
> >  #define CMD_LINE_OPT_NO_NUMA      "no-numa"
> >  #define CMD_LINE_OPT_MAX_PKT_LEN   "max-pkt-len"
> >  #define CMD_LINE_OPT_PER_PORT_POOL "per-port-pool"
> > +#define CMD_LINE_OPT_WORKER_MODEL  "model"
> >  enum {
> >         /* Long options mapped to a short option */
> >
> > @@ -416,6 +444,7 @@ enum {
> >         CMD_LINE_OPT_NO_NUMA_NUM,
> >         CMD_LINE_OPT_MAX_PKT_LEN_NUM,
> >         CMD_LINE_OPT_PARSE_PER_PORT_POOL,
> > +       CMD_LINE_OPT_WORKER_MODEL_TYPE,
> >  };
> >
> >  static const struct option lgopts[] = { @@ -424,6 +453,7 @@ static
> > const struct option lgopts[] = {
> >         {CMD_LINE_OPT_NO_NUMA, 0, 0, CMD_LINE_OPT_NO_NUMA_NUM},
> >         {CMD_LINE_OPT_MAX_PKT_LEN, 1, 0,
> CMD_LINE_OPT_MAX_PKT_LEN_NUM},
> >         {CMD_LINE_OPT_PER_PORT_POOL, 0, 0,
> > CMD_LINE_OPT_PARSE_PER_PORT_POOL},
> > +       {CMD_LINE_OPT_WORKER_MODEL, 1, 0,
> > + CMD_LINE_OPT_WORKER_MODEL_TYPE},
> >         {NULL, 0, 0, 0},
> >  };
> >
> > @@ -498,6 +528,11 @@ parse_args(int argc, char **argv)
> >                         per_port_pool = 1;
> >                         break;
> >
> > +               case CMD_LINE_OPT_WORKER_MODEL_TYPE:
> > +                       printf("Use new worker model: %s\n", optarg);
> > +                       parse_worker_model(optarg);
> > +                       break;
> > +
> >                 default:
> >                         print_usage(prgname);
> >                         return -1;
> > @@ -735,6 +770,140 @@ config_port_max_pkt_len(struct rte_eth_conf
> *conf,
> >         return 0;
> >  }
> >
> > +static void
> > +graph_config_generic(struct rte_graph_param graph_conf) {
> > +       uint16_t nb_patterns = graph_conf.nb_node_patterns;
> > +       int worker_count = rte_lcore_count() - 1;
> > +       int main_lcore_id = rte_get_main_lcore();
> > +       int worker_lcore = main_lcore_id;
> > +       rte_graph_t main_graph_id = 0;
> > +       struct rte_node *node_tmp;
> > +       struct lcore_conf *qconf;
> > +       struct rte_graph *graph;
> > +       rte_graph_t graph_id;
> > +       rte_graph_off_t off;
> > +       int n_rx_node = 0;
> > +       rte_node_t count;
> > +       rte_edge_t i;
> > +       int ret;
> > +
> > +       for (int j = 0; j < nb_lcore_params; j++) {
> > +               qconf = &lcore_conf[lcore_params[j].lcore_id];
> > +               /* Add rx node patterns of all lcore */
> > +               for (i = 0; i < qconf->n_rx_queue; i++) {
> > +                       char *node_name =
> > + qconf->rx_queue_list[i].node_name;
> > +
> > +                       graph_conf.node_patterns[nb_patterns + n_rx_node + i] =
> node_name;
> > +                       n_rx_node++;
> > +                       ret = rte_node_model_generic_set_lcore_affinity(node_name,
> > +                                                                       lcore_params[j].lcore_id);
> > +                       if (ret == 0)
> > +                               printf("Set node %s affinity to lcore %u\n", node_name,
> > +                                      lcore_params[j].lcore_id);
> > +               }
> > +       }
> > +
> > +       graph_conf.nb_node_patterns = nb_patterns + n_rx_node;
> > +       graph_conf.socket_id = rte_lcore_to_socket_id(main_lcore_id);
> > +
> > +       snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
> > +                main_lcore_id);
> > +
> > +       /* create main graph */
> > +       main_graph_id = rte_graph_create(qconf->name, &graph_conf);
> > +       if (main_graph_id == RTE_GRAPH_ID_INVALID)
> > +               rte_exit(EXIT_FAILURE,
> > +                        "rte_graph_create(): main_graph_id invalid for lcore %u\n",
> > +                        main_lcore_id);
> > +
> > +       qconf->graph_id = main_graph_id;
> > +       qconf->graph = rte_graph_lookup(qconf->name);
> > +       /* >8 End of graph initialization. */
> > +       if (!qconf->graph)
> > +               rte_exit(EXIT_FAILURE,
> > +                        "rte_graph_lookup(): graph %s not found\n",
> > +                        qconf->name);
> > +
> > +       graph = qconf->graph;
> > +       rte_graph_foreach_node(count, off, graph, node_tmp) {
> > +               worker_lcore = rte_get_next_lcore(worker_lcore, true,
> > + 1);
> > +
> > +               /* Need to set the node Lcore affinity before clone graph for each
> lcore */
> > +               if (node_tmp->lcore_id == RTE_MAX_LCORE) {
> > +                       ret = rte_node_model_generic_set_lcore_affinity(node_tmp-
> >name,
> > +                                                                       worker_lcore);
> > +                       if (ret == 0)
> > +                               printf("Set node %s affinity to lcore %u\n",
> > +                                      node_tmp->name, worker_lcore);
> > +               }
> > +       }
> > +
> > +       worker_lcore = main_lcore_id;
> > +       for (int i = 0; i < worker_count; i++) {
> > +               worker_lcore = rte_get_next_lcore(worker_lcore, true,
> > + 1);
> > +
> > +               qconf = &lcore_conf[worker_lcore];
> > +               snprintf(qconf->name, sizeof(qconf->name), "cloned-%u",
> worker_lcore);
> > +               graph_id = rte_graph_clone(main_graph_id, qconf->name);
> > +               ret = rte_graph_bind_core(graph_id, worker_lcore);
> > +               if (ret == 0)
> > +                       printf("bind graph %d to lcore %u\n",
> > + graph_id, worker_lcore);
> > +
> > +               /* full cloned graph name */
> > +               snprintf(qconf->name, sizeof(qconf->name), "%s",
> > +                        rte_graph_id_to_name(graph_id));
> > +               qconf->graph_id = graph_id;
> > +               qconf->graph = rte_graph_lookup(qconf->name);
> > +               if (!qconf->graph)
> > +                       rte_exit(EXIT_FAILURE,
> > +                                "Failed to lookup graph %s\n",
> > +                                qconf->name);
> > +               continue;
> > +       }
> > +}
> > +
> > +static void
> > +graph_config_rtc(struct rte_graph_param graph_conf) {
> > +       uint16_t nb_patterns = graph_conf.nb_node_patterns;
> > +       struct lcore_conf *qconf;
> > +       rte_graph_t graph_id;
> > +       uint32_t lcore_id;
> > +       rte_edge_t i;
> > +
> > +       for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> > +               if (rte_lcore_is_enabled(lcore_id) == 0)
> > +                       continue;
> > +
> > +               qconf = &lcore_conf[lcore_id];
> > +               /* Skip graph creation if no source exists */
> > +               if (!qconf->n_rx_queue)
> > +                       continue;
> > +               /* Add rx node patterns of this lcore */
> > +               for (i = 0; i < qconf->n_rx_queue; i++) {
> > +                       graph_conf.node_patterns[nb_patterns + i] =
> > +                               qconf->rx_queue_list[i].node_name;
> > +               }
> > +               graph_conf.nb_node_patterns = nb_patterns + i;
> > +               graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
> > +               snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
> > +                        lcore_id);
> > +               graph_id = rte_graph_create(qconf->name, &graph_conf);
> > +               if (graph_id == RTE_GRAPH_ID_INVALID)
> > +                       rte_exit(EXIT_FAILURE,
> > +                                "rte_graph_create(): graph_id invalid for lcore %u\n",
> > +                                lcore_id);
> > +               qconf->graph_id = graph_id;
> > +               qconf->graph = rte_graph_lookup(qconf->name);
> > +               /* >8 End of graph initialization. */
> > +               if (!qconf->graph)
> > +                       rte_exit(EXIT_FAILURE,
> > +                                "rte_graph_lookup(): graph %s not found\n",
> > +                                qconf->name);
> > +       }
> > +}
> > +
> >  int
> >  main(int argc, char **argv)
> >  {
> > @@ -759,6 +928,7 @@ main(int argc, char **argv)
> >         uint16_t nb_patterns;
> >         uint8_t rewrite_len;
> >         uint32_t lcore_id;
> > +       uint16_t model;
> >         int ret;
> >
> >         /* Init EAL */
> > @@ -787,6 +957,9 @@ main(int argc, char **argv)
> >         if (check_lcore_params() < 0)
> >                 rte_exit(EXIT_FAILURE, "check_lcore_params() failed\n");
> >
> > +       if (check_worker_model_params() < 0)
> > +               rte_exit(EXIT_FAILURE, "check_worker_model_params() failed\n");
> > +
> >         ret = init_lcore_rx_queues();
> >         if (ret < 0)
> >                 rte_exit(EXIT_FAILURE, "init_lcore_rx_queues() failed\n");
> >
> > @@ -1026,46 +1199,13 @@ main(int argc, char **argv)
> >
> >         memset(&graph_conf, 0, sizeof(graph_conf));
> >         graph_conf.node_patterns = node_patterns;
> > +       graph_conf.nb_node_patterns = nb_patterns;
> >
> > -       for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> > -               rte_graph_t graph_id;
> > -               rte_edge_t i;
> > -
> > -               if (rte_lcore_is_enabled(lcore_id) == 0)
> > -                       continue;
> > -
> > -               qconf = &lcore_conf[lcore_id];
> > -
> > -               /* Skip graph creation if no source exists */
> > -               if (!qconf->n_rx_queue)
> > -                       continue;
> > -
> > -               /* Add rx node patterns of this lcore */
> > -               for (i = 0; i < qconf->n_rx_queue; i++) {
> > -                       graph_conf.node_patterns[nb_patterns + i] =
> > -                               qconf->rx_queue_list[i].node_name;
> > -               }
> > -
> > -               graph_conf.nb_node_patterns = nb_patterns + i;
> > -               graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
> > -
> > -               snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
> > -                        lcore_id);
> > -
> > -               graph_id = rte_graph_create(qconf->name, &graph_conf);
> > -               if (graph_id == RTE_GRAPH_ID_INVALID)
> > -                       rte_exit(EXIT_FAILURE,
> > -                                "rte_graph_create(): graph_id invalid"
> > -                                " for lcore %u\n", lcore_id);
> > -
> > -               qconf->graph_id = graph_id;
> > -               qconf->graph = rte_graph_lookup(qconf->name);
> > -               /* >8 End of graph initialization. */
> > -               if (!qconf->graph)
> > -                       rte_exit(EXIT_FAILURE,
> > -                                "rte_graph_lookup(): graph %s not found\n",
> > -                                qconf->name);
> > -       }
> > +       model = rte_graph_worker_model_get();
> > +       if (model == RTE_GRAPH_MODEL_DEFAULT)
> > +               graph_config_rtc(graph_conf);
> > +       else if (model == RTE_GRAPH_MODEL_GENERIC)
> > +               graph_config_generic(graph_conf);
> >
> >         memset(&rewrite_data, 0, sizeof(rewrite_data));
> >         rewrite_len = sizeof(rewrite_data);
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2023-02-24  6:31     ` Yan, Zhirun
@ 2023-02-26 22:23       ` Jerin Jacob
  2023-03-02  8:38         ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Jerin Jacob @ 2023-02-26 22:23 UTC (permalink / raw)
  To: Yan, Zhirun
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue

On Fri, Feb 24, 2023 at 12:01 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Monday, February 20, 2023 9:51 PM
> > To: Yan, Zhirun <zhirun.yan@intel.com>
> > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> > Haiyue <haiyue.wang@intel.com>
> > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
> >
> > On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
> > >
> > > Add new get/set APIs to configure graph worker model which is used to
> > > determine which model will be chosen.
> > >
> > > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > > ---
> > >  lib/graph/rte_graph_worker.h        | 51 +++++++++++++++++++++++++++++
> > >  lib/graph/rte_graph_worker_common.h | 13 ++++++++
> > >  lib/graph/version.map               |  3 ++
> > >  3 files changed, 67 insertions(+)
> > >
> > > diff --git a/lib/graph/rte_graph_worker.h
> > > b/lib/graph/rte_graph_worker.h index 54d1390786..a0ea0df153 100644
> > > --- a/lib/graph/rte_graph_worker.h
> > > +++ b/lib/graph/rte_graph_worker.h
> > > @@ -1,5 +1,56 @@
> > >  #include "rte_graph_model_rtc.h"
> > >
> > > +static enum rte_graph_worker_model worker_model =
> > > +RTE_GRAPH_MODEL_DEFAULT;
> >
> > This will break the multiprocess.
>
> Thanks. I will use TLS for per-thread local storage.

If it needs to be used from secondary process, then it needs to be from memzone.



>
> >
> > > +
> > > +/** Graph worker models */
> > > +enum rte_graph_worker_model {
> > > +#define WORKER_MODEL_DEFAULT "default"
> >
> > Why need strings?
> > Also, every symbol in a public header file should start with RTE_ to avoid
> > namespace conflict.
>
> It was used to config the model in app. I can put the string into example.

OK

>
> >
> > > +       RTE_GRAPH_MODEL_DEFAULT = 0,
> > > +#define WORKER_MODEL_RTC "rtc"
> > > +       RTE_GRAPH_MODEL_RTC,
> >
> > Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in enum
> > itself.
> Yes, will do in next version.
>
> >
> > > +#define WORKER_MODEL_GENERIC "generic"
> >
> > Generic is a very overloaded term. Use pipeline here i.e
> > RTE_GRAPH_MODEL_PIPELINE
>
> Actually, it's not a purely pipeline mode. I prefer to change to hybrid.

Hybrid is very overloaded term, and it will be confusing (considering
there will be new models in future).
Please pick a word that really express the model working.

> >
> >
> > > +       RTE_GRAPH_MODEL_GENERIC,
> > > +       RTE_GRAPH_MODEL_MAX,
> >
> > No need for MAX, it will break the ABI for future. See other subsystem such as
> > cryptodev.
>
> Thanks, I will change it.
> >
> > > +};
> >
> > >

^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2023-02-26 22:23       ` Jerin Jacob
@ 2023-03-02  8:38         ` Yan, Zhirun
  2023-03-02 13:58           ` Jerin Jacob
  0 siblings, 1 reply; 369+ messages in thread
From: Yan, Zhirun @ 2023-03-02  8:38 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, February 27, 2023 6:23 AM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>;
> Wang, Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
> 
> On Fri, Feb 24, 2023 at 12:01 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Monday, February 20, 2023 9:51 PM
> > > To: Yan, Zhirun <zhirun.yan@intel.com>
> > > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > > ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>;
> > > Wang, Haiyue <haiyue.wang@intel.com>
> > > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model
> > > APIs
> > >
> > > On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com>
> wrote:
> > > >
> > > > Add new get/set APIs to configure graph worker model which is used
> > > > to determine which model will be chosen.
> > > >
> > > > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > > > ---
> > > >  lib/graph/rte_graph_worker.h        | 51
> +++++++++++++++++++++++++++++
> > > >  lib/graph/rte_graph_worker_common.h | 13 ++++++++
> > > >  lib/graph/version.map               |  3 ++
> > > >  3 files changed, 67 insertions(+)
> > > >
> > > > diff --git a/lib/graph/rte_graph_worker.h
> > > > b/lib/graph/rte_graph_worker.h index 54d1390786..a0ea0df153
> 100644
> > > > --- a/lib/graph/rte_graph_worker.h
> > > > +++ b/lib/graph/rte_graph_worker.h
> > > > @@ -1,5 +1,56 @@
> > > >  #include "rte_graph_model_rtc.h"
> > > >
> > > > +static enum rte_graph_worker_model worker_model =
> > > > +RTE_GRAPH_MODEL_DEFAULT;
> > >
> > > This will break the multiprocess.
> >
> > Thanks. I will use TLS for per-thread local storage.
> 
> If it needs to be used from secondary process, then it needs to be from
> memzone.
> 


This field will be set by the primary process at the initialization stage, and then lcores will only read it.
I want to use RTE_DEFINE_PER_LCORE to define the worker model here. It seems
unnecessary to allocate it from a memzone.

> 
> 
> >
> > >
> > > > +
> > > > +/** Graph worker models */
> > > > +enum rte_graph_worker_model {
> > > > +#define WORKER_MODEL_DEFAULT "default"
> > >
> > > Why need strings?
> > > Also, every symbol in a public header file should start with RTE_ to
> > > avoid namespace conflict.
> >
> > It was used to config the model in app. I can put the string into example.
> 
> OK
> 
> >
> > >
> > > > +       RTE_GRAPH_MODEL_DEFAULT = 0, #define WORKER_MODEL_RTC
> > > > +"rtc"
> > > > +       RTE_GRAPH_MODEL_RTC,
> > >
> > > Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in
> enum
> > > itself.
> > Yes, will do in next version.
> >
> > >
> > > > +#define WORKER_MODEL_GENERIC "generic"
> > >
> > > Generic is a very overloaded term. Use pipeline here i.e
> > > RTE_GRAPH_MODEL_PIPELINE
> >
> > Actually, it's not a purely pipeline mode. I prefer to change to hybrid.
> 
> Hybrid is very overloaded term, and it will be confusing (considering there
> will be new models in future).
> Please pick a word that really express the model working.
> 

In this case, the path is Node0 -> Node1 -> Node2 -> Node3,
and Node1 and Node3 are bound to one core.

Our model offers the ability to dispatch between cores.

Do you think RTE_GRAPH_MODEL_DISPATCH is a good name?

+ - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
'  Core #0   '     '  Core #1       Core #1   '     '  Core #2   '
'            '     '                          '     '            '
' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
'            '     '     |                    '     '      ^     '
+ - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
                         |                                 |
                         + - - - - - - - - - - - - - - - - +


> > >
> > >
> > > > +       RTE_GRAPH_MODEL_GENERIC,
> > > > +       RTE_GRAPH_MODEL_MAX,
> > >
> > > No need for MAX, it will break the ABI for future. See other
> > > subsystem such as cryptodev.
> >
> > Thanks, I will change it.
> > >
> > > > +};
> > >
> > > >

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2023-03-02  8:38         ` Yan, Zhirun
@ 2023-03-02 13:58           ` Jerin Jacob
  2023-03-07  8:26             ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Jerin Jacob @ 2023-03-02 13:58 UTC (permalink / raw)
  To: Yan, Zhirun
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue

On Thu, Mar 2, 2023 at 2:09 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Monday, February 27, 2023 6:23 AM
> > To: Yan, Zhirun <zhirun.yan@intel.com>
> > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>;
> > Wang, Haiyue <haiyue.wang@intel.com>
> > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
> >
> > On Fri, Feb 24, 2023 at 12:01 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > Sent: Monday, February 20, 2023 9:51 PM
> > > > To: Yan, Zhirun <zhirun.yan@intel.com>
> > > > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > > > ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>;
> > > > Wang, Haiyue <haiyue.wang@intel.com>
> > > > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model
> > > > APIs
> > > >
> > > > On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com>
> > wrote:
> > > > >
> > > > > Add new get/set APIs to configure graph worker model which is used
> > > > > to determine which model will be chosen.
> > > > >
> > > > > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > > > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > > > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > > > > ---
> > > > >  lib/graph/rte_graph_worker.h        | 51
> > +++++++++++++++++++++++++++++
> > > > >  lib/graph/rte_graph_worker_common.h | 13 ++++++++
> > > > >  lib/graph/version.map               |  3 ++
> > > > >  3 files changed, 67 insertions(+)
> > > > >
> > > > > diff --git a/lib/graph/rte_graph_worker.h
> > > > > b/lib/graph/rte_graph_worker.h index 54d1390786..a0ea0df153
> > 100644
> > > > > --- a/lib/graph/rte_graph_worker.h
> > > > > +++ b/lib/graph/rte_graph_worker.h
> > > > > @@ -1,5 +1,56 @@
> > > > >  #include "rte_graph_model_rtc.h"
> > > > >
> > > > > +static enum rte_graph_worker_model worker_model =
> > > > > +RTE_GRAPH_MODEL_DEFAULT;
> > > >
> > > > This will break the multiprocess.
> > >
> > > Thanks. I will use TLS for per-thread local storage.
> >
> > If it needs to be used from secondary process, then it needs to be from
> > memzone.
> >
>
>
> This field will be set by primary process in initial stage, and then lcore will only read it.
> I want to use RTE_DEFINE_PER_LCORE to define the worker model here. It seems
> not necessary to allocate from memzone.
>
> >
> >
> > >
> > > >
> > > > > +
> > > > > +/** Graph worker models */
> > > > > +enum rte_graph_worker_model {
> > > > > +#define WORKER_MODEL_DEFAULT "default"
> > > >
> > > > Why need strings?
> > > > Also, every symbol in a public header file should start with RTE_ to
> > > > avoid namespace conflict.
> > >
> > > It was used to config the model in app. I can put the string into example.
> >
> > OK
> >
> > >
> > > >
> > > > > +       RTE_GRAPH_MODEL_DEFAULT = 0, #define WORKER_MODEL_RTC
> > > > > +"rtc"
> > > > > +       RTE_GRAPH_MODEL_RTC,
> > > >
> > > > Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in
> > enum
> > > > itself.
> > > Yes, will do in next version.
> > >
> > > >
> > > > > +#define WORKER_MODEL_GENERIC "generic"
> > > >
> > > > Generic is a very overloaded term. Use pipeline here i.e
> > > > RTE_GRAPH_MODEL_PIPELINE
> > >
> > > Actually, it's not a purely pipeline mode. I prefer to change to hybrid.
> >
> > Hybrid is very overloaded term, and it will be confusing (considering there
> > will be new models in future).
> > Please pick a word that really express the model working.
> >
>
> In this case, the path is Node0 -> Node1 -> Node2 -> Node3,
> and Node1 and Node3 are bound to one core.
>
> Our model offers the ability to dispatch between cores.
>
> Do you think RTE_GRAPH_MODEL_DISPATCH is a good name?

Some names that I can think of:

// MCORE->MULTI CORE

RTE_GRAPH_MODEL_MCORE_PIPELINE
or
RTE_GRAPH_MODEL_MCORE_DISPATCH
or
RTE_GRAPH_MODEL_MCORE_RING
or
RTE_GRAPH_MODEL_MULTI_CORE

>
> + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
> '  Core #0   '     '  Core #1       Core #1   '     '  Core #2   '
> '            '     '                          '     '            '
> ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
> ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
> ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
> '            '     '     |                    '     '      ^     '
> + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
>                          |                                 |
>                          + - - - - - - - - - - - - - - - - +
>
>
> > > >
> > > >
> > > > > +       RTE_GRAPH_MODEL_GENERIC,
> > > > > +       RTE_GRAPH_MODEL_MAX,
> > > >
> > > > No need for MAX, it will break the ABI for future. See other
> > > > subsystem such as cryptodev.
> > >
> > > Thanks, I will change it.
> > > >
> > > > > +};
> > > >
> > > > >

^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2023-03-02 13:58           ` Jerin Jacob
@ 2023-03-07  8:26             ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-03-07  8:26 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Thursday, March 2, 2023 9:58 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
> 
> On Thu, Mar 2, 2023 at 2:09 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Monday, February 27, 2023 6:23 AM
> > > To: Yan, Zhirun <zhirun.yan@intel.com>
> > > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > > ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>;
> > > Wang, Haiyue <haiyue.wang@intel.com>
> > > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model
> > > APIs
> > >
> > > On Fri, Feb 24, 2023 at 12:01 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > Sent: Monday, February 20, 2023 9:51 PM
> > > > > To: Yan, Zhirun <zhirun.yan@intel.com>
> > > > > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > > > > ndabilpuram@marvell.com; Liang, Cunming
> > > > > <cunming.liang@intel.com>; Wang, Haiyue <haiyue.wang@intel.com>
> > > > > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker
> > > > > model APIs
> > > > >
> > > > > On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan
> > > > > <zhirun.yan@intel.com>
> > > wrote:
> > > > > >
> > > > > > Add new get/set APIs to configure graph worker model which is
> > > > > > used to determine which model will be chosen.
> > > > > >
> > > > > > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > > > > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > > > > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > > > > > ---
> > > > > >  lib/graph/rte_graph_worker.h        | 51
> > > +++++++++++++++++++++++++++++
> > > > > >  lib/graph/rte_graph_worker_common.h | 13 ++++++++
> > > > > >  lib/graph/version.map               |  3 ++
> > > > > >  3 files changed, 67 insertions(+)
> > > > > >
> > > > > > diff --git a/lib/graph/rte_graph_worker.h
> > > > > > b/lib/graph/rte_graph_worker.h index 54d1390786..a0ea0df153
> > > 100644
> > > > > > --- a/lib/graph/rte_graph_worker.h
> > > > > > +++ b/lib/graph/rte_graph_worker.h
> > > > > > @@ -1,5 +1,56 @@
> > > > > >  #include "rte_graph_model_rtc.h"
> > > > > >
> > > > > > +static enum rte_graph_worker_model worker_model =
> > > > > > +RTE_GRAPH_MODEL_DEFAULT;
> > > > >
> > > > > This will break the multiprocess.
> > > >
> > > > Thanks. I will use TLS for per-thread local storage.
> > >
> > > If it needs to be used from secondary process, then it needs to be
> > > from memzone.
> > >
> >
> >
> > This field will be set by primary process in initial stage, and then lcore will only
> read it.
> > I want to use RTE_DEFINE_PER_LCORE to define the worker model here. It
> > seems not necessary to allocate from memzone.
> >
> > >
> > >
> > > >
> > > > >
> > > > > > +
> > > > > > +/** Graph worker models */
> > > > > > +enum rte_graph_worker_model { #define WORKER_MODEL_DEFAULT
> > > > > > +"default"
> > > > >
> > > > > Why need strings?
> > > > > Also, every symbol in a public header file should start with
> > > > > RTE_ to avoid namespace conflict.
> > > >
> > > > It was used to config the model in app. I can put the string into example.
> > >
> > > OK
> > >
> > > >
> > > > >
> > > > > > +       RTE_GRAPH_MODEL_DEFAULT = 0, #define
> WORKER_MODEL_RTC
> > > > > > +"rtc"
> > > > > > +       RTE_GRAPH_MODEL_RTC,
> > > > >
> > > > > Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in
> > > enum
> > > > > itself.
> > > > Yes, will do in next version.
> > > >
> > > > >
> > > > > > +#define WORKER_MODEL_GENERIC "generic"
> > > > >
> > > > > Generic is a very overloaded term. Use pipeline here i.e
> > > > > RTE_GRAPH_MODEL_PIPELINE
> > > >
> > > > Actually, it's not a purely pipeline mode. I prefer to change to hybrid.
> > >
> > > Hybrid is very overloaded term, and it will be confusing
> > > (considering there will be new models in future).
> > > Please pick a word that really express the model working.
> > >
> >
> > In this case, the path is Node0 -> Node1 -> Node2 -> Node3,
> > and Node1 and Node3 are bound to one core.
> >
> > Our model offers the ability to dispatch between cores.
> >
> > Do you think RTE_GRAPH_MODEL_DISPATCH is a good name?
> 
> Some names that I can think of:
> 
> // MCORE->MULTI CORE
> 
> RTE_GRAPH_MODEL_MCORE_PIPELINE
> or
> RTE_GRAPH_MODEL_MCORE_DISPATCH
> or
> RTE_GRAPH_MODEL_MCORE_RING
> or
> RTE_GRAPH_MODEL_MULTI_CORE
> 

Thanks, I will use RTE_GRAPH_MODEL_MCORE_DISPATCH as the name.

> >
> > + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
> > '  Core #0   '     '  Core #1       Core #1   '     '  Core #2   '
> > '            '     '                          '     '            '
> > ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
> > ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
> > ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
> > '            '     '     |                    '     '      ^     '
> > + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
> >                          |                                 |
> >                          + - - - - - - - - - - - - - - - - +
> >
> >
> > > > >
> > > > >
> > > > > > +       RTE_GRAPH_MODEL_GENERIC,
> > > > > > +       RTE_GRAPH_MODEL_MAX,
> > > > >
> > > > > No need for MAX, it will break the ABI for future. See other
> > > > > subsystem such as cryptodev.
> > > >
> > > > Thanks, I will change it.
> > > > >
> > > > > > +};
> > > > >
> > > > > >

^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 00/15] graph enhancement for multi-core dispatch
  2022-11-17  5:09 [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (13 preceding siblings ...)
  2023-02-20  0:22 ` [PATCH v1 00/13] graph enhancement for multi-core dispatch Thomas Monjalon
@ 2023-03-24  2:16 ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 01/15] graph: rename rte_graph_work as common Zhirun Yan
                     ` (15 more replies)
  14 siblings, 16 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

V2:
Use git mv to keep git history for patch 1,2.
Use TLS for per-thread local storage about model setting in patch 4.
Change model name to mcore dispatch.
Change API with specific mode name.
Split big patch for patch 8,9.
Fix CI build issues.
Rebase l3fwd-graph example.
Update doc and maintainers files.


Currently, rte_graph supports only the RTC (Run-To-Completion) model,
where each graph runs entirely within a single core.
RTC is one of the typical packet-processing models; others, such as
Pipeline or Hybrid, are not yet supported.

The patch set introduces a 'multicore dispatch' model, a
self-reacting scheme based on core affinity.
The new model enables a cross-core dispatching mechanism which employs a
scheduling work-queue to dispatch streams to the worker cores
associated with the destination node. When the core affinity of the
destination node is the default 'current', the stream continues to be
executed as normal.

Example:
A 3-node graph targeting a 3-core budget.

RTC:
Graph: node-0 -> node-1 -> node-2 @Core0.

+ - - - - - - - - - - - - - - - - - - - - - +
'                Core #0/1/2                '
'                                           '
' +--------+     +---------+     +--------+ '
' | Node-0 | --> | Node-1  | --> | Node-2 | '
' +--------+     +---------+     +--------+ '
'                                           '
+ - - - - - - - - - - - - - - - - - - - - - +

Dispatch:

Graph topo: node-0 -> node-1; node-1 -> node-2; node-2 -> node-3.
Config graph: node-0 @Core0; node-1/3 @Core1; node-2 @Core2.

.. code-block:: diff

    + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
    '  Core #0   '     '          Core #1         '     '  Core #2   '
    '            '     '                          '     '            '
    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
    ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
    '            '     '     |                    '     '      ^     '
    + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
                             |                                 |
                             + - - - - - - - - - - - - - - - - +


The patch set is broken down as below:

1. Split graph worker into common and default model parts.
2. Inline graph node processing to make it reusable.
3. Add set/get APIs to choose the worker model.
4. Introduce a core affinity API to make a node run on a specific worker core.
  (only used in the new model)
5. Introduce a graph affinity API to bind one graph with a specific worker
  core.
6. Introduce a graph clone API.
7. Introduce stream moving with the scheduler work-queue in patches 8~12.
8. Add stats for the new model.
9. Abstract the default graph config process and integrate the new model into
  example/l3fwd-graph. Add new parameters for model choosing.

We can run the new worker model like this:
./dpdk-l3fwd-graph -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="dispatch"

References:
https://static.sched.com/hosted_files/dpdkuserspace22/a6/graph%20introduce%20remote%20dispatch%20for%20mult-core%20scaling.pdf


Zhirun Yan (15):
  graph: rename rte_graph_work as common
  graph: split graph worker into common and default model
  graph: move node process into inline function
  graph: add get/set graph worker model APIs
  graph: introduce graph node core affinity API
  graph: introduce graph bind unbind API
  graph: introduce graph clone API for other worker core
  graph: add struct for stream moving between cores
  graph: introduce stream moving cross cores
  graph: enable create and destroy graph scheduling workqueue
  graph: introduce graph walk by cross-core dispatch
  graph: enable graph multicore dispatch scheduler model
  graph: add stats for cross-core dispatching
  examples/l3fwd-graph: introduce multicore dispatch worker model
  doc: update multicore dispatch model in graph guides

 MAINTAINERS                          |   1 +
 doc/guides/prog_guide/graph_lib.rst  |  59 ++-
 examples/l3fwd-graph/main.c          | 237 +++++++++---
 lib/graph/graph.c                    | 179 +++++++++
 lib/graph/graph_debug.c              |   6 +
 lib/graph/graph_populate.c           |   1 +
 lib/graph/graph_private.h            |  44 +++
 lib/graph/graph_stats.c              |  74 +++-
 lib/graph/meson.build                |   4 +-
 lib/graph/node.c                     |   1 +
 lib/graph/rte_graph.h                |  44 +++
 lib/graph/rte_graph_model_dispatch.c | 179 +++++++++
 lib/graph/rte_graph_model_dispatch.h | 120 ++++++
 lib/graph/rte_graph_model_rtc.h      |  45 +++
 lib/graph/rte_graph_worker.c         |  54 +++
 lib/graph/rte_graph_worker.h         | 498 +------------------------
 lib/graph/rte_graph_worker_common.h  | 536 +++++++++++++++++++++++++++
 lib/graph/version.map                |   8 +
 18 files changed, 1546 insertions(+), 544 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_dispatch.c
 create mode 100644 lib/graph/rte_graph_model_dispatch.h
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker.c
 create mode 100644 lib/graph/rte_graph_worker_common.h

-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 01/15] graph: rename rte_graph_work as common
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 02/15] graph: split graph worker into common and default model Zhirun Yan
                     ` (14 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Rename rte_graph_worker.h to rte_graph_worker_common.h to support
multiple graph worker models.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 MAINTAINERS                                                 | 1 +
 lib/graph/graph_pcap.c                                      | 2 +-
 lib/graph/graph_private.h                                   | 2 +-
 lib/graph/meson.build                                       | 2 +-
 lib/graph/{rte_graph_worker.h => rte_graph_worker_common.h} | 6 +++---
 5 files changed, 7 insertions(+), 6 deletions(-)
 rename lib/graph/{rte_graph_worker.h => rte_graph_worker_common.h} (99%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1a33ad8592..2608afba7b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1715,6 +1715,7 @@ F: doc/guides/prog_guide/bpf_lib.rst
 Graph - EXPERIMENTAL
 M: Jerin Jacob <jerinj@marvell.com>
 M: Kiran Kumar K <kirankumark@marvell.com>
+M: Zhirun Yan <zhirun.yan@intel.com>
 F: lib/graph/
 F: doc/guides/prog_guide/graph_lib.rst
 F: app/test/test_graph*
diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
index 6c43330029..8a220370fa 100644
--- a/lib/graph/graph_pcap.c
+++ b/lib/graph/graph_pcap.c
@@ -10,7 +10,7 @@
 #include <rte_mbuf.h>
 #include <rte_pcapng.h>
 
-#include "rte_graph_worker.h"
+#include "rte_graph_worker_common.h"
 
 #include "graph_pcap_private.h"
 
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 7d1b30b8ac..f08dbc7e9d 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -12,7 +12,7 @@
 #include <rte_eal.h>
 
 #include "rte_graph.h"
-#include "rte_graph_worker.h"
+#include "rte_graph_worker_common.h"
 
 extern int rte_graph_logtype;
 
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 3526d1b5d4..4e2b612ad3 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,6 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
 )
-headers = files('rte_graph.h', 'rte_graph_worker.h')
+headers = files('rte_graph.h', 'rte_graph_worker_common.h')
 
 deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker_common.h
similarity index 99%
rename from lib/graph/rte_graph_worker.h
rename to lib/graph/rte_graph_worker_common.h
index 438595b15c..0bad2938f3 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -2,8 +2,8 @@
  * Copyright(C) 2020 Marvell International Ltd.
  */
 
-#ifndef _RTE_GRAPH_WORKER_H_
-#define _RTE_GRAPH_WORKER_H_
+#ifndef _RTE_GRAPH_WORKER_COMMON_H_
+#define _RTE_GRAPH_WORKER_COMMON_H_
 
 /**
  * @file rte_graph_worker.h
@@ -518,4 +518,4 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
 }
 #endif
 
-#endif /* _RTE_GRAPH_WORKER_H_ */
+#endif /* _RTE_GRAPH_WORKER_COMMON_H_ */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 02/15] graph: split graph worker into common and default model
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 01/15] graph: rename rte_graph_work as common Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 03/15] graph: move node process into inline function Zhirun Yan
                     ` (13 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

To support multiple graph worker models, split the graph worker into
common and default parts. Name the current walk function
rte_graph_walk_rtc, since the default model is RTC (Run-To-Completion).

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_pcap.c              |  2 +-
 lib/graph/graph_private.h           |  2 +-
 lib/graph/meson.build               |  2 +-
 lib/graph/rte_graph_model_rtc.h     | 61 +++++++++++++++++++++++++++++
 lib/graph/rte_graph_worker.h        | 34 ++++++++++++++++
 lib/graph/rte_graph_worker_common.h | 57 ---------------------------
 6 files changed, 98 insertions(+), 60 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker.h

diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
index 8a220370fa..6c43330029 100644
--- a/lib/graph/graph_pcap.c
+++ b/lib/graph/graph_pcap.c
@@ -10,7 +10,7 @@
 #include <rte_mbuf.h>
 #include <rte_pcapng.h>
 
-#include "rte_graph_worker_common.h"
+#include "rte_graph_worker.h"
 
 #include "graph_pcap_private.h"
 
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index f08dbc7e9d..7d1b30b8ac 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -12,7 +12,7 @@
 #include <rte_eal.h>
 
 #include "rte_graph.h"
-#include "rte_graph_worker_common.h"
+#include "rte_graph_worker.h"
 
 extern int rte_graph_logtype;
 
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 4e2b612ad3..3526d1b5d4 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,6 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
 )
-headers = files('rte_graph.h', 'rte_graph_worker_common.h')
+headers = files('rte_graph.h', 'rte_graph_worker.h')
 
 deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
new file mode 100644
index 0000000000..665560f831
--- /dev/null
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "rte_graph_worker_common.h"
+
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+static inline void
+rte_graph_walk_rtc(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	/*
+	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
+	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
+	 * in a circular buffer fashion.
+	 *
+	 *	+-----+ <= cir_start - head [number of source nodes]
+	 *	|     |
+	 *	| ... | <= source nodes
+	 *	|     |
+	 *	+-----+ <= cir_start [head = 0] [tail = 0]
+	 *	|     |
+	 *	| ... | <= pending streams
+	 *	|     |
+	 *	+-----+ <= cir_start + mask
+	 */
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		objs = node->objs;
+		rte_prefetch0(objs);
+
+		if (rte_graph_has_stats_feature()) {
+			start = rte_rdtsc();
+			rc = node->process(graph, node, objs, node->idx);
+			node->total_cycles += rte_rdtsc() - start;
+			node->total_calls++;
+			node->total_objs += rc;
+		} else {
+			node->process(graph, node, objs, node->idx);
+		}
+			node->idx = 0;
+			head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+	graph->tail = 0;
+}
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
new file mode 100644
index 0000000000..7ea18ba80a
--- /dev/null
+++ b/lib/graph/rte_graph_worker.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#ifndef _RTE_GRAPH_WORKER_H_
+#define _RTE_GRAPH_WORKER_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "rte_graph_model_rtc.h"
+
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk(struct rte_graph *graph)
+{
+	rte_graph_walk_rtc(graph);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_WORKER_H_ */
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 0bad2938f3..b58f8f6947 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -128,63 +128,6 @@ __rte_experimental
 void __rte_node_stream_alloc_size(struct rte_graph *graph,
 				  struct rte_node *node, uint16_t req_size);
 
-/**
- * Perform graph walk on the circular buffer and invoke the process function
- * of the nodes and collect the stats.
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup function.
- *
- * @see rte_graph_lookup()
- */
-__rte_experimental
-static inline void
-rte_graph_walk(struct rte_graph *graph)
-{
-	const rte_graph_off_t *cir_start = graph->cir_start;
-	const rte_node_t mask = graph->cir_mask;
-	uint32_t head = graph->head;
-	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
-
-	/*
-	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
-	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
-	 * in a circular buffer fashion.
-	 *
-	 *	+-----+ <= cir_start - head [number of source nodes]
-	 *	|     |
-	 *	| ... | <= source nodes
-	 *	|     |
-	 *	+-----+ <= cir_start [head = 0] [tail = 0]
-	 *	|     |
-	 *	| ... | <= pending streams
-	 *	|     |
-	 *	+-----+ <= cir_start + mask
-	 */
-	while (likely(head != graph->tail)) {
-		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-		node->idx = 0;
-		head = likely((int32_t)head > 0) ? head & mask : head;
-	}
-	graph->tail = 0;
-}
-
 /* Fast path helper functions */
 
 /**
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 03/15] graph: move node process into inline function
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 01/15] graph: rename rte_graph_work as common Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 02/15] graph: split graph worker into common and default model Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 04/15] graph: add get/set graph worker model APIs Zhirun Yan
                     ` (12 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

The node process logic is a single, reusable block; move it into an
inline function so it can be shared by multiple worker models.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_rtc.h     | 20 ++---------------
 lib/graph/rte_graph_worker_common.h | 33 +++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
index 665560f831..0dcb7151e9 100644
--- a/lib/graph/rte_graph_model_rtc.h
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -20,9 +20,6 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	const rte_node_t mask = graph->cir_mask;
 	uint32_t head = graph->head;
 	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
 
 	/*
 	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
@@ -41,21 +38,8 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	 */
 	while (likely(head != graph->tail)) {
 		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-			node->idx = 0;
-			head = likely((int32_t)head > 0) ? head & mask : head;
+		__rte_node_process(graph, node);
+		head = likely((int32_t)head > 0) ? head & mask : head;
 	}
 	graph->tail = 0;
 }
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index b58f8f6947..41428974db 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -130,6 +130,39 @@ void __rte_node_stream_alloc_size(struct rte_graph *graph,
 
 /* Fast path helper functions */
 
+/**
+ * @internal
+ *
+ * Call the process function of the given node and update its stats.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ * @param node
+ *   Pointer to the node object to be processed.
+ */
+static __rte_always_inline void
+__rte_node_process(struct rte_graph *graph, struct rte_node *node)
+{
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+	objs = node->objs;
+	rte_prefetch0(objs);
+
+	if (rte_graph_has_stats_feature()) {
+		start = rte_rdtsc();
+		rc = node->process(graph, node, objs, node->idx);
+		node->total_cycles += rte_rdtsc() - start;
+		node->total_calls++;
+		node->total_objs += rc;
+	} else {
+		node->process(graph, node, objs, node->idx);
+	}
+	node->idx = 0;
+}
+
 /**
  * @internal
  *
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 04/15] graph: add get/set graph worker model APIs
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (2 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 03/15] graph: move node process into inline function Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 05/15] graph: introduce graph node core affinity API Zhirun Yan
                     ` (11 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add new get/set APIs to configure the graph worker model, which
determines the model used during graph walk.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/meson.build               |  1 +
 lib/graph/rte_graph_worker.c        | 54 +++++++++++++++++++++++++++++
 lib/graph/rte_graph_worker_common.h | 16 +++++++++
 lib/graph/version.map               |  3 ++
 4 files changed, 74 insertions(+)
 create mode 100644 lib/graph/rte_graph_worker.c

diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 3526d1b5d4..9fab8243da 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -15,6 +15,7 @@ sources = files(
         'graph_stats.c',
         'graph_populate.c',
         'graph_pcap.c',
+        'rte_graph_worker.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
diff --git a/lib/graph/rte_graph_worker.c b/lib/graph/rte_graph_worker.c
new file mode 100644
index 0000000000..692ee1b0d2
--- /dev/null
+++ b/lib/graph/rte_graph_worker.c
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "rte_graph_worker_common.h"
+
+RTE_DEFINE_PER_LCORE(enum rte_graph_worker_model, worker_model) = RTE_GRAPH_MODEL_DEFAULT;
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ * Set the graph worker model
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ *    before graph running.
+ *
+ * @param model
+ *   The graph worker model to set.
+ *
+ * @return
+ *   0 on success, -1 otherwise.
+ */
+inline int
+rte_graph_worker_model_set(enum rte_graph_worker_model model)
+{
+	if (model >= RTE_GRAPH_MODEL_LIST_END)
+		goto fail;
+
+	RTE_PER_LCORE(worker_model) = model;
+	return 0;
+
+fail:
+	RTE_PER_LCORE(worker_model) = RTE_GRAPH_MODEL_DEFAULT;
+	return -1;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Get the graph worker model
+ *
+ * @note The function takes no parameters; it returns the model
+ *    configured for the calling lcore.
+ *
+ * @return
+ *   Graph worker model on success.
+ */
+inline
+enum rte_graph_worker_model
+rte_graph_worker_model_get(void)
+{
+	return RTE_PER_LCORE(worker_model);
+}
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 41428974db..64d777bd5f 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -95,6 +95,14 @@ struct rte_node {
 	struct rte_node *nodes[] __rte_cache_min_aligned; /**< Next nodes. */
 } __rte_cache_aligned;
 
+/** Graph worker models */
+enum rte_graph_worker_model {
+	RTE_GRAPH_MODEL_DEFAULT,
+	RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT,
+	RTE_GRAPH_MODEL_MCORE_DISPATCH,
+	RTE_GRAPH_MODEL_LIST_END
+};
+
 /**
  * @internal
  *
@@ -490,6 +498,14 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
 	}
 }
 
+__rte_experimental
+enum rte_graph_worker_model
+rte_graph_worker_model_get(void);
+
+__rte_experimental
+int
+rte_graph_worker_model_set(enum rte_graph_worker_model model);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 13b838752d..eea73ec9ca 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -43,5 +43,8 @@ EXPERIMENTAL {
 	rte_node_next_stream_put;
 	rte_node_next_stream_move;
 
+	rte_graph_worker_model_set;
+	rte_graph_worker_model_get;
+
 	local: *;
 };
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 05/15] graph: introduce graph node core affinity API
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (3 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 04/15] graph: add get/set graph worker model APIs Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 06/15] graph: introduce graph bind unbind API Zhirun Yan
                     ` (10 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add an lcore_id field to struct node to hold the affinity core id, and
implement rte_graph_model_dispatch_lcore_affinity_set() to bind a node
to a specific lcore.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_private.h            |  1 +
 lib/graph/meson.build                |  1 +
 lib/graph/node.c                     |  1 +
 lib/graph/rte_graph_model_dispatch.c | 31 ++++++++++++++++++++
 lib/graph/rte_graph_model_dispatch.h | 43 ++++++++++++++++++++++++++++
 lib/graph/version.map                |  2 ++
 6 files changed, 79 insertions(+)
 create mode 100644 lib/graph/rte_graph_model_dispatch.c
 create mode 100644 lib/graph/rte_graph_model_dispatch.h

diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 7d1b30b8ac..409eed3284 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -50,6 +50,7 @@ struct node {
 	STAILQ_ENTRY(node) next;      /**< Next node in the list. */
 	char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
 	uint64_t flags;		      /**< Node configuration flag. */
+	unsigned int lcore_id;        /**< Node runs on the Lcore ID */
 	rte_node_process_t process;   /**< Node process function. */
 	rte_node_init_t init;         /**< Node init function. */
 	rte_node_fini_t fini;	      /**< Node fini function. */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 9fab8243da..c729d984b6 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,7 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
         'rte_graph_worker.c',
+        'rte_graph_model_dispatch.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
diff --git a/lib/graph/node.c b/lib/graph/node.c
index 149414dcd9..339b4a0da5 100644
--- a/lib/graph/node.c
+++ b/lib/graph/node.c
@@ -100,6 +100,7 @@ __rte_node_register(const struct rte_node_register *reg)
 			goto free;
 	}
 
+	node->lcore_id = RTE_MAX_LCORE;
 	node->id = node_id++;
 
 	/* Add the node at tail */
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
new file mode 100644
index 0000000000..4a2f99496d
--- /dev/null
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "graph_private.h"
+#include "rte_graph_model_dispatch.h"
+
+int
+rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned int lcore_id)
+{
+	struct node *node;
+	int ret = -EINVAL;
+
+	if (lcore_id >= RTE_MAX_LCORE)
+		return ret;
+
+	graph_spinlock_lock();
+
+	STAILQ_FOREACH(node, node_list_head_get(), next) {
+		if (strncmp(node->name, name, RTE_NODE_NAMESIZE) == 0) {
+			node->lcore_id = lcore_id;
+			ret = 0;
+			break;
+		}
+	}
+
+	graph_spinlock_unlock();
+
+	return ret;
+}
+
diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
new file mode 100644
index 0000000000..179624e972
--- /dev/null
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#ifndef _RTE_GRAPH_MODEL_DISPATCH_H_
+#define _RTE_GRAPH_MODEL_DISPATCH_H_
+
+/**
+ * @file rte_graph_model_dispatch.h
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * This API allows setting core affinity for a node.
+ */
+#include "rte_graph_worker_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Set lcore affinity with the node.
+ *
+ * @param name
+ *   Valid node name. In the case of the cloned node, the name will be
+ * "parent node name" + "-" + name.
+ * @param lcore_id
+ *   The lcore ID value.
+ *
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
+						unsigned int lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_MODEL_DISPATCH_H_ */
diff --git a/lib/graph/version.map b/lib/graph/version.map
index eea73ec9ca..1f090be74e 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -46,5 +46,7 @@ EXPERIMENTAL {
 	rte_graph_worker_model_set;
 	rte_graph_worker_model_get;
 
+	rte_graph_model_dispatch_lcore_affinity_set;
+
 	local: *;
 };
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 06/15] graph: introduce graph bind unbind API
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (4 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 05/15] graph: introduce graph node core affinity API Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 07/15] graph: introduce graph clone API for other worker core Zhirun Yan
                     ` (9 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add an lcore_id field to struct graph to hold the affinity core id the
graph should run on. Add bind/unbind APIs to set/unset the graph
affinity attribute. lcore_id defaults to RTE_MAX_LCORE, which means the
attribute is disabled.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 59 +++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |  2 ++
 lib/graph/rte_graph.h     | 22 +++++++++++++++
 lib/graph/version.map     |  2 ++
 4 files changed, 85 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index a839a2803b..b39a99aac6 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -254,6 +254,64 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
 	return graph_mem_fixup_node_ctx(graph);
 }
 
+static __rte_always_inline bool
+graph_src_node_avail(struct graph *graph)
+{
+	struct graph_node *graph_node;
+
+	STAILQ_FOREACH(graph_node, &graph->node_list, next)
+		if ((graph_node->node->flags & RTE_NODE_SOURCE_F) &&
+		    (graph_node->node->lcore_id == RTE_MAX_LCORE ||
+		     graph->lcore_id == graph_node->node->lcore_id))
+			return true;
+
+	return false;
+}
+
+int
+rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	if (!rte_lcore_is_enabled(lcore))
+		SET_ERR_JMP(ENOLINK, fail,
+			    "lcore %d not enabled\n",
+			    lcore);
+
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = lcore;
+	graph->socket = rte_lcore_to_socket_id(lcore);
+
+	/* check the availability of source node */
+	if (!graph_src_node_avail(graph))
+		graph->graph->head = 0;
+
+	return 0;
+
+fail:
+	return -rte_errno;
+}
+
+void
+rte_graph_model_dispatch_core_unbind(rte_graph_t id)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = RTE_MAX_LCORE;
+
+fail:
+	return;
+}
+
 struct rte_graph *
 rte_graph_lookup(const char *name)
 {
@@ -340,6 +398,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->lcore_id = RTE_MAX_LCORE;
 	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
 	if (prm->pcap_filename)
 		rte_strscpy(graph->pcap_filename, prm->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 409eed3284..ad1d058945 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -98,6 +98,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	unsigned int lcore_id;
+	/**< Lcore identifier where the graph prefer to run on. */
 	size_t mem_sz;
 	/**< Memory size of the graph. */
 	int socket;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index c9a77297fc..c523809d1f 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -285,6 +285,28 @@ char *rte_graph_id_to_name(rte_graph_t id);
 __rte_experimental
 int rte_graph_export(const char *name, FILE *f);
 
+/**
+ * Bind a graph to a specific lcore.
+ *
+ * @param id
+ *   Graph id to get the pointer of the graph object.
+ * @param lcore
+ *   The lcore the graph will run on.
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore);
+
+/**
+ * Unbind a graph from its lcore.
+ *
+ * @param id
+ *   Graph id to get the pointer of the graph object.
+ */
+__rte_experimental
+void rte_graph_model_dispatch_core_unbind(rte_graph_t id);
+
 /**
  * Get graph object from its name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 1f090be74e..7de6f08f59 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -18,6 +18,8 @@ EXPERIMENTAL {
 	rte_graph_node_get_by_name;
 	rte_graph_obj_dump;
 	rte_graph_walk;
+	rte_graph_model_dispatch_core_bind;
+	rte_graph_model_dispatch_core_unbind;
 
 	rte_graph_cluster_stats_create;
 	rte_graph_cluster_stats_destroy;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 07/15] graph: introduce graph clone API for other worker core
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (5 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 06/15] graph: introduce graph bind unbind API Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 08/15] graph: add struct for stream moving between cores Zhirun Yan
                     ` (8 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch adds a graph API to clone a graph object for a specified
worker core. The new graph also clones all nodes from the parent.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 110 ++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |   2 +
 lib/graph/rte_graph.h     |  20 +++++++
 lib/graph/version.map     |   1 +
 4 files changed, 133 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index b39a99aac6..90eaad0378 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -398,6 +398,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->parent_id = RTE_GRAPH_ID_INVALID;
 	graph->lcore_id = RTE_MAX_LCORE;
 	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
 	if (prm->pcap_filename)
@@ -462,6 +463,115 @@ rte_graph_destroy(rte_graph_t id)
 	return rc;
 }
 
+static int
+clone_name(struct graph *graph, struct graph *parent_graph, const char *name)
+{
+	ssize_t sz, rc;
+
+#define SZ RTE_GRAPH_NAMESIZE
+	rc = rte_strscpy(graph->name, parent_graph->name, SZ);
+	if (rc < 0)
+		goto fail;
+	sz = rc;
+	rc = rte_strscpy(graph->name + sz, "-", RTE_MAX((int16_t)(SZ - sz), 0));
+	if (rc < 0)
+		goto fail;
+	sz += rc;
+	sz = rte_strscpy(graph->name + sz, name, RTE_MAX((int16_t)(SZ - sz), 0));
+	if (sz < 0)
+		goto fail;
+
+	return 0;
+fail:
+	rte_errno = E2BIG;
+	return -rte_errno;
+}
+
+static rte_graph_t
+graph_clone(struct graph *parent_graph, const char *name)
+{
+	struct graph_node *graph_node;
+	struct graph *graph;
+
+	graph_spinlock_lock();
+
+	/* Don't allow to clone a node from a cloned graph */
+	if (parent_graph->parent_id != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, fail, "A cloned graph is not allowed to be cloned");
+
+	/* Create graph object */
+	graph = calloc(1, sizeof(*graph));
+	if (graph == NULL)
+		SET_ERR_JMP(ENOMEM, fail, "Failed to calloc cloned graph object");
+
+	/* Naming ceremony of the new graph. name is node->name + "-" + name */
+	if (clone_name(graph, parent_graph, name))
+		goto free;
+
+	/* Check for existence of duplicate graph */
+	if (rte_graph_from_name(graph->name) != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, free, "Found duplicate graph %s",
+			    graph->name);
+
+	/* Clone nodes from parent graph firstly */
+	STAILQ_INIT(&graph->node_list);
+	STAILQ_FOREACH(graph_node, &parent_graph->node_list, next) {
+		if (graph_node_add(graph, graph_node->node))
+			goto graph_cleanup;
+	}
+
+	/* Just update adjacency list of all nodes in the graph */
+	if (graph_adjacency_list_update(graph))
+		goto graph_cleanup;
+
+	/* Initialize the graph object */
+	graph->src_node_count = parent_graph->src_node_count;
+	graph->node_count = parent_graph->node_count;
+	graph->parent_id = parent_graph->id;
+	graph->lcore_id = parent_graph->lcore_id;
+	graph->socket = parent_graph->socket;
+	graph->id = graph_id;
+
+	/* Allocate the Graph fast path memory and populate the data */
+	if (graph_fp_mem_create(graph))
+		goto graph_cleanup;
+
+	/* Call init() of the all the nodes in the graph */
+	if (graph_node_init(graph))
+		goto graph_mem_destroy;
+
+	/* All good, Lets add the graph to the list */
+	graph_id++;
+	STAILQ_INSERT_TAIL(&graph_list, graph, next);
+
+	graph_spinlock_unlock();
+	return graph->id;
+
+graph_mem_destroy:
+	graph_fp_mem_destroy(graph);
+graph_cleanup:
+	graph_cleanup(graph);
+free:
+	free(graph);
+fail:
+	graph_spinlock_unlock();
+	return RTE_GRAPH_ID_INVALID;
+}
+
+rte_graph_t
+rte_graph_clone(rte_graph_t id, const char *name)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			return graph_clone(graph, name);
+
+fail:
+	return RTE_GRAPH_ID_INVALID;
+}
+
 rte_graph_t
 rte_graph_from_name(const char *name)
 {
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index ad1d058945..d28a5af93e 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -98,6 +98,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	rte_graph_t parent_id;
+	/**< Parent graph identifier. */
 	unsigned int lcore_id;
 	/**< Lcore identifier where the graph prefer to run on. */
 	size_t mem_sz;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index c523809d1f..2f86c17de7 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -247,6 +247,26 @@ rte_graph_t rte_graph_create(const char *name, struct rte_graph_param *prm);
 __rte_experimental
 int rte_graph_destroy(rte_graph_t id);
 
+/**
+ * Clone Graph.
+ *
+ * Clone a graph from a static graph (created with rte_graph_create()). All
+ * cloned graphs attached to the parent graph MUST be destroyed together due
+ * to a fast-path scheduling design limitation (stop ALL graph walks first).
+ *
+ * @param id
+ *   Static graph id to clone from.
+ * @param name
+ *   Name of the new graph. The library prepends the parent graph name to the
+ * user-specified name. The final graph name will be,
+ * "parent graph name" + "-" + name.
+ *
+ * @return
+ *   Valid graph id on success, RTE_GRAPH_ID_INVALID otherwise.
+ */
+__rte_experimental
+rte_graph_t rte_graph_clone(rte_graph_t id, const char *name);
+
 /**
  * Get graph id from graph name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 7de6f08f59..aaa86f66ed 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -7,6 +7,7 @@ EXPERIMENTAL {
 
 	rte_graph_create;
 	rte_graph_destroy;
+	rte_graph_clone;
 	rte_graph_dump;
 	rte_graph_export;
 	rte_graph_from_name;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 08/15] graph: add struct for stream moving between cores
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (6 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 07/15] graph: introduce graph clone API for other worker core Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 09/15] graph: introduce stream moving cross cores Zhirun Yan
                     ` (7 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add struct graph_sched_wq_node to hold the stream carried by a graph
scheduling workqueue node.
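The structure packs a node offset, a count, and up to one burst of object pointers per workqueue entry. A minimal stand-alone sketch of that packing (plain C, no DPDK dependencies; `wq_entry` and `wq_entry_fill` are illustrative names, and 256 stands in for RTE_GRAPH_BURST_SIZE):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BURST_SIZE 256 /* stand-in for RTE_GRAPH_BURST_SIZE */

/* Simplified stand-in for struct graph_sched_wq_node: one entry carries
 * at most one burst of object pointers plus the byte offset of the
 * destination node inside the receiving graph's fast-path memory. */
struct wq_entry {
	uint32_t node_off;      /* offset of the target node */
	uint16_t nb_objs;       /* number of valid pointers in objs[] */
	void *objs[BURST_SIZE]; /* the stream payload */
};

/* Pack at most BURST_SIZE pointers from a node's pending stream into
 * one workqueue entry; returns how many pointers were consumed. */
static uint16_t
wq_entry_fill(struct wq_entry *e, uint32_t node_off,
	      void *const *stream, uint16_t n)
{
	uint16_t size = n < BURST_SIZE ? n : BURST_SIZE;

	e->node_off = node_off;
	e->nb_objs = size;
	memcpy(e->objs, stream, (size_t)size * sizeof(void *));
	return size;
}
```

A stream larger than one burst therefore spans several entries, which is exactly why the enqueue path in a later patch loops until the node's pending index drains.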

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c                   |  1 +
 lib/graph/graph_populate.c          |  1 +
 lib/graph/graph_private.h           | 12 ++++++++++++
 lib/graph/rte_graph_worker_common.h | 21 +++++++++++++++++++++
 4 files changed, 35 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 90eaad0378..dd3d69dbf7 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -284,6 +284,7 @@ rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore)
 			break;
 
 	graph->lcore_id = lcore;
+	graph->graph->lcore_id = graph->lcore_id;
 	graph->socket = rte_lcore_to_socket_id(lcore);
 
 	/* check the availability of source node */
diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
index 2c0844ce92..7dcf1420c1 100644
--- a/lib/graph/graph_populate.c
+++ b/lib/graph/graph_populate.c
@@ -89,6 +89,7 @@ graph_nodes_populate(struct graph *_graph)
 		}
 		node->id = graph_node->node->id;
 		node->parent_id = pid;
+		node->lcore_id = graph_node->node->lcore_id;
 		nb_edges = graph_node->node->nb_edges;
 		node->nb_edges = nb_edges;
 		off += sizeof(struct rte_node);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index d28a5af93e..b66b18ebbc 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -60,6 +60,18 @@ struct node {
 	char next_nodes[][RTE_NODE_NAMESIZE]; /**< Names of next nodes. */
 };
 
+/**
+ * @internal
+ *
+ * Structure that holds the graph scheduling workqueue node stream.
+ * Used for mcore dispatch model.
+ */
+struct graph_sched_wq_node {
+	rte_graph_off_t node_off;
+	uint16_t nb_objs;
+	void *objs[RTE_GRAPH_BURST_SIZE];
+} __rte_cache_aligned;
+
 /**
  * @internal
  *
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 64d777bd5f..70cfde7015 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -29,6 +29,13 @@
 extern "C" {
 #endif
 
+/**
+ * @internal
+ *
+ * Singly-linked list head for graph schedule run-queue.
+ */
+SLIST_HEAD(rte_graph_rq_head, rte_graph);
+
 /**
  * @internal
  *
@@ -40,6 +47,15 @@ struct rte_graph {
 	uint32_t cir_mask;	     /**< Circular buffer wrap around mask. */
 	rte_node_t nb_nodes;	     /**< Number of nodes in the graph. */
 	rte_graph_off_t *cir_start;  /**< Pointer to circular buffer. */
+	/* Graph schedule */
+	struct rte_graph_rq_head *rq __rte_cache_aligned; /* The run-queue */
+	struct rte_graph_rq_head rq_head; /* The head for run-queue list */
+
+	SLIST_ENTRY(rte_graph) rq_next;   /* The next for run-queue list */
+	unsigned int lcore_id;  /**< The graph running Lcore. */
+	struct rte_ring *wq;    /**< The work-queue for pending streams. */
+	struct rte_mempool *mp; /**< The mempool for scheduling streams. */
+	/* Graph schedule area */
 	rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */
 	rte_graph_t id;	/**< Graph identifier. */
 	int socket;	/**< Socket ID where memory is allocated. */
@@ -73,6 +89,11 @@ struct rte_node {
 	/** Original process function when pcap is enabled. */
 	rte_node_process_t original_process;
 
+	RTE_STD_C11
+		union {
+		/* Fast schedule area for mcore dispatch model */
+		unsigned int lcore_id;  /**< Node running lcore. */
+		};
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 09/15] graph: introduce stream moving cross cores
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (7 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 08/15] graph: add struct for stream moving between cores Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 10/15] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
                     ` (6 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the key functions that allow a worker thread to
enqueue and move streams of objects to next nodes running on
different cores.
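The enqueue path (__graph_sched_node_enqueue() below) peels at most one burst per mempool entry and loops until the node's pending stream drains, falling back and keeping the remainder when the mempool runs dry. A minimal stand-alone simulation of that loop (plain C, no DPDK dependencies; `sched_enqueue_sim` is an illustrative name and 256 stands in for RTE_GRAPH_BURST_SIZE):

```c
#include <assert.h>

#define BURST_SIZE 256 /* stand-in for RTE_GRAPH_BURST_SIZE */

/* Simulate the submit_again loop: 'credits' workqueue entries are
 * available from the mempool, each carrying at most one burst.
 * Returns the number of objects left pending in the node, i.e. what
 * the fallback path memmove()s back to the front of node->objs[]. */
static unsigned int
sched_enqueue_sim(unsigned int n_objs, unsigned int credits)
{
	while (n_objs > 0 && credits > 0) {
		unsigned int size =
			n_objs < BURST_SIZE ? n_objs : BURST_SIZE;
		n_objs -= size; /* one wq entry consumed per chunk */
		credits--;
	}
	return n_objs; /* 0 means the enqueue fully succeeded */
}
```

A return of 0 corresponds to __graph_sched_node_enqueue() returning true; a non-zero return models the fallback case counted later as total_sched_fail.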

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_private.h            |  27 +++++
 lib/graph/meson.build                |   2 +-
 lib/graph/rte_graph_model_dispatch.c | 145 +++++++++++++++++++++++++++
 lib/graph/rte_graph_model_dispatch.h |  35 +++++++
 4 files changed, 208 insertions(+), 1 deletion(-)

diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index b66b18ebbc..e1a2a4bfd8 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -366,4 +366,31 @@ void graph_dump(FILE *f, struct graph *g);
  */
 void node_dump(FILE *f, struct node *n);
 
+/**
+ * @internal
+ *
+ * Create the graph schedule work queue. All cloned graphs attached to the
+ * parent graph MUST be destroyed together (fast-schedule design limitation).
+ *
+ * @param _graph
+ *   The graph object
+ * @param _parent_graph
+ *   The parent graph object which holds the run-queue head.
+ *
+ * @return
+ *   - 0: Success.
+ *   - <0: Graph schedule work queue related error.
+ */
+int graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph);
+
+/**
+ * @internal
+ *
+ * Destroy the graph schedule work queue.
+ *
+ * @param _graph
+ *   The graph object
+ */
+void graph_sched_wq_destroy(struct graph *_graph);
+
 #endif /* _RTE_GRAPH_PRIVATE_H_ */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index c729d984b6..e21affa280 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -20,4 +20,4 @@ sources = files(
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
-deps += ['eal', 'pcapng']
+deps += ['eal', 'pcapng', 'mempool', 'ring']
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
index 4a2f99496d..b46dd156ac 100644
--- a/lib/graph/rte_graph_model_dispatch.c
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -5,6 +5,151 @@
 #include "graph_private.h"
 #include "rte_graph_model_dispatch.h"
 
+int
+graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph)
+{
+	struct rte_graph *parent_graph = _parent_graph->graph;
+	struct rte_graph *graph = _graph->graph;
+	unsigned int wq_size;
+
+	wq_size = GRAPH_SCHED_WQ_SIZE(graph->nb_nodes);
+	wq_size = rte_align32pow2(wq_size + 1);
+
+	graph->wq = rte_ring_create(graph->name, wq_size, graph->socket,
+				    RING_F_SC_DEQ);
+	if (graph->wq == NULL)
+		SET_ERR_JMP(EIO, fail, "Failed to allocate graph WQ");
+
+	graph->mp = rte_mempool_create(graph->name, wq_size,
+				       sizeof(struct graph_sched_wq_node),
+				       0, 0, NULL, NULL, NULL, NULL,
+				       graph->socket, MEMPOOL_F_SP_PUT);
+	if (graph->mp == NULL)
+		SET_ERR_JMP(EIO, fail_mp,
+			    "Failed to allocate graph WQ schedule entry");
+
+	graph->lcore_id = _graph->lcore_id;
+
+	if (parent_graph->rq == NULL) {
+		parent_graph->rq = &parent_graph->rq_head;
+		SLIST_INIT(parent_graph->rq);
+	}
+
+	graph->rq = parent_graph->rq;
+	SLIST_INSERT_HEAD(graph->rq, graph, rq_next);
+
+	return 0;
+
+fail_mp:
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+fail:
+	return -rte_errno;
+}
+
+void
+graph_sched_wq_destroy(struct graph *_graph)
+{
+	struct rte_graph *graph = _graph->graph;
+
+	if (graph == NULL)
+		return;
+
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+
+	rte_mempool_free(graph->mp);
+	graph->mp = NULL;
+}
+
+static __rte_always_inline bool
+__graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	uint16_t off = 0;
+	uint16_t size;
+
+submit_again:
+	if (rte_mempool_get(graph->mp, (void **)&wq_node) < 0)
+		goto fallback;
+
+	size = RTE_MIN(node->idx, RTE_DIM(wq_node->objs));
+	wq_node->node_off = node->off;
+	wq_node->nb_objs = size;
+	rte_memcpy(wq_node->objs, &node->objs[off], size * sizeof(void *));
+
+	while (rte_ring_mp_enqueue_bulk_elem(graph->wq, (void *)&wq_node,
+					  sizeof(wq_node), 1, NULL) == 0)
+		rte_pause();
+
+	off += size;
+	node->idx -= size;
+	if (node->idx > 0)
+		goto submit_again;
+
+	return true;
+
+fallback:
+	if (off != 0)
+		memmove(&node->objs[0], &node->objs[off],
+			node->idx * sizeof(void *));
+
+	return false;
+}
+
+bool __rte_noinline
+__rte_graph_sched_node_enqueue(struct rte_node *node,
+			       struct rte_graph_rq_head *rq)
+{
+	const unsigned int lcore_id = node->lcore_id;
+	struct rte_graph *graph;
+
+	SLIST_FOREACH(graph, rq, rq_next)
+		if (graph->lcore_id == lcore_id)
+			break;
+
+	return graph != NULL ? __graph_sched_node_enqueue(node, graph) : false;
+}
+
+void __rte_noinline
+__rte_graph_sched_wq_process(struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	struct rte_mempool *mp = graph->mp;
+	struct rte_ring *wq = graph->wq;
+	uint16_t idx, free_space;
+	struct rte_node *node;
+	unsigned int i, n;
+	struct graph_sched_wq_node *wq_nodes[32];
+
+	n = rte_ring_sc_dequeue_burst_elem(wq, wq_nodes, sizeof(wq_nodes[0]),
+					   RTE_DIM(wq_nodes), NULL);
+	if (n == 0)
+		return;
+
+	for (i = 0; i < n; i++) {
+		wq_node = wq_nodes[i];
+		node = RTE_PTR_ADD(graph, wq_node->node_off);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		idx = node->idx;
+		free_space = node->size - idx;
+
+		if (unlikely(free_space < wq_node->nb_objs))
+			__rte_node_stream_alloc_size(graph, node, node->size + wq_node->nb_objs);
+
+		memmove(&node->objs[idx], wq_node->objs, wq_node->nb_objs * sizeof(void *));
+		memset(wq_node->objs, 0, wq_node->nb_objs * sizeof(void *));
+		node->idx = idx + wq_node->nb_objs;
+
+		__rte_node_process(graph, node);
+
+		wq_node->nb_objs = 0;
+		node->idx = 0;
+	}
+
+	rte_mempool_put_bulk(mp, (void **)wq_nodes, n);
+}
+
 int
 rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned int lcore_id)
 {
diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
index 179624e972..7cbdf2fdcf 100644
--- a/lib/graph/rte_graph_model_dispatch.h
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -14,12 +14,47 @@
  *
  * This API allows to set core affinity with the node.
  */
+#include <rte_errno.h>
+#include <rte_mempool.h>
+#include <rte_memzone.h>
+#include <rte_ring.h>
+
 #include "rte_graph_worker_common.h"
 
 #ifdef __cplusplus
 extern "C" {
 #endif
 
+#define GRAPH_SCHED_WQ_SIZE_MULTIPLIER  8
+#define GRAPH_SCHED_WQ_SIZE(nb_nodes)   \
+	((typeof(nb_nodes))((nb_nodes) * GRAPH_SCHED_WQ_SIZE_MULTIPLIER))
+
+/**
+ * @internal
+ *
+ * Schedule the node to the right graph's work queue.
+ *
+ * @param node
+ *   Pointer to the scheduled node object.
+ * @param rq
+ *   Pointer to the scheduled run-queue for all graphs.
+ *
+ * @return
+ *   True on success, false otherwise.
+ */
+bool __rte_graph_sched_node_enqueue(struct rte_node *node,
+				    struct rte_graph_rq_head *rq);
+
+/**
+ * @internal
+ *
+ * Process all nodes (streams) in the graph's work queue.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ */
+void __rte_noinline __rte_graph_sched_wq_process(struct rte_graph *graph);
+
 /**
  * Set lcore affinity with the node.
  *
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 10/15] graph: enable create and destroy graph scheduling workqueue
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (8 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 09/15] graph: introduce stream moving cross cores Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 11/15] graph: introduce graph walk by cross-core dispatch Zhirun Yan
                     ` (5 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch hooks the creation and destruction of the scheduling
workqueue into the common graph operations.
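The work queue created here via graph_sched_wq_create() (added in the previous patch) is sized as GRAPH_SCHED_WQ_SIZE(nb_nodes), i.e. nb_nodes * 8, then bumped by one and rounded up to a power of two because an rte_ring of size N holds at most N - 1 entries. A stand-alone sketch of that sizing (plain C, no DPDK dependencies; `align32pow2` mirrors rte_align32pow2):

```c
#include <assert.h>
#include <stdint.h>

#define WQ_SIZE_MULTIPLIER 8 /* GRAPH_SCHED_WQ_SIZE_MULTIPLIER */

/* Round up to the next power of two, like rte_align32pow2(). */
static uint32_t
align32pow2(uint32_t x)
{
	x--;
	x |= x >> 1;
	x |= x >> 2;
	x |= x >> 4;
	x |= x >> 8;
	x |= x >> 16;
	return x + 1;
}

/* Ring size for a graph's scheduling work queue: one usable slot is
 * lost to the ring's full/empty distinction, hence the +1 first. */
static uint32_t
wq_ring_size(uint32_t nb_nodes)
{
	return align32pow2(nb_nodes * WQ_SIZE_MULTIPLIER + 1);
}
```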

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index dd3d69dbf7..1f1ee9b622 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -443,6 +443,10 @@ rte_graph_destroy(rte_graph_t id)
 	while (graph != NULL) {
 		tmp = STAILQ_NEXT(graph, next);
 		if (graph->id == id) {
+			/* Destroy the schedule work queue if present */
+			if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+				graph_sched_wq_destroy(graph);
+
 			/* Call fini() of the all the nodes in the graph */
 			graph_node_fini(graph);
 			/* Destroy graph fast path memory */
@@ -537,6 +541,11 @@ graph_clone(struct graph *parent_graph, const char *name)
 	if (graph_fp_mem_create(graph))
 		goto graph_cleanup;
 
+	/* Create the graph schedule work queue */
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH &&
+	    graph_sched_wq_create(graph, parent_graph))
+		goto graph_mem_destroy;
+
 	/* Call init() of the all the nodes in the graph */
 	if (graph_node_init(graph))
 		goto graph_mem_destroy;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 11/15] graph: introduce graph walk by cross-core dispatch
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (9 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 10/15] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 12/15] graph: enable graph multicore dispatch scheduler model Zhirun Yan
                     ` (4 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the task scheduler mechanism that enables
dispatching tasks to other worker cores. Currently, there is only a
local work queue for one graph to walk. We introduce a scheduler work
queue on each worker core for dispatching tasks. The walk processes
the scheduler work queue first, then handles the local work queue.
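The per-node decision inside the walk boils down to: dispatch a node to another core's work queue only when the node carries an explicit lcore affinity that differs from the walking graph's lcore. A minimal stand-alone sketch of that predicate (plain C, no DPDK dependencies; 128 stands in for RTE_MAX_LCORE, the "no affinity" sentinel):

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LCORE 128 /* stand-in for RTE_MAX_LCORE (unbound sentinel) */

/* Mirrors the dispatch condition in rte_graph_walk_mcore_dispatch():
 * a node with no affinity, or bound to the current core, is processed
 * locally; otherwise it is enqueued to the owning core's work queue. */
static bool
should_dispatch(unsigned int node_lcore, unsigned int graph_lcore)
{
	return node_lcore != MAX_LCORE && node_lcore != graph_lcore;
}
```

Note that even when this predicate is true, the real walk still processes the node locally if the run-queue lookup or the enqueue itself fails, so forward progress is preserved.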

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_dispatch.h | 42 ++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
index 7cbdf2fdcf..764c4ecfd0 100644
--- a/lib/graph/rte_graph_model_dispatch.h
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -71,6 +71,48 @@ __rte_experimental
 int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
 						unsigned int lcore_id);
 
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk_mcore_dispatch(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+
+	if (graph->wq != NULL)
+		__rte_graph_sched_wq_process(graph);
+
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+
+		/* Skip the source nodes which are not bound to the current worker */
+		if ((int32_t)head < 0 && node->lcore_id != graph->lcore_id)
+			continue;
+
+		/* Schedule the node until all task/objs are done */
+		if (node->lcore_id != RTE_MAX_LCORE &&
+		    graph->lcore_id != node->lcore_id && graph->rq != NULL &&
+		    __rte_graph_sched_node_enqueue(node, graph->rq))
+			continue;
+
+		__rte_node_process(graph, node);
+
+		head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+
+	graph->tail = 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 12/15] graph: enable graph multicore dispatch scheduler model
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (10 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 11/15] graph: introduce graph walk by cross-core dispatch Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 13/15] graph: add stats for cross-core dispatching Zhirun Yan
                     ` (3 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch enables choosing the new scheduler model.
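The selection in rte_graph_walk() is a plain branch on the worker model; a minimal stand-alone sketch of the routing (plain C, no DPDK dependencies; the enum values and `walk_variant` are illustrative):

```c
#include <assert.h>
#include <string.h>

enum worker_model {
	MODEL_DEFAULT = 0, /* stand-in for RTE_GRAPH_MODEL_DEFAULT */
	MODEL_RTC,         /* stand-in for RTE_GRAPH_MODEL_RTC */
	MODEL_MCORE_DISPATCH
};

/* Which walk routine rte_graph_walk() would take for a given model:
 * the default model falls through to the RTC walk. */
static const char *
walk_variant(enum worker_model m)
{
	if (m == MODEL_DEFAULT || m == MODEL_RTC)
		return "rtc";
	if (m == MODEL_MCORE_DISPATCH)
		return "dispatch";
	return "none";
}
```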

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_worker.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index 7ea18ba80a..d608c7513e 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -10,6 +10,7 @@ extern "C" {
 #endif
 
 #include "rte_graph_model_rtc.h"
+#include "rte_graph_model_dispatch.h"
 
 /**
  * Perform graph walk on the circular buffer and invoke the process function
@@ -24,7 +25,13 @@ __rte_experimental
 static inline void
 rte_graph_walk(struct rte_graph *graph)
 {
-	rte_graph_walk_rtc(graph);
+	int model = rte_graph_worker_model_get();
+
+	if (model == RTE_GRAPH_MODEL_DEFAULT ||
+	    model == RTE_GRAPH_MODEL_RTC)
+		rte_graph_walk_rtc(graph);
+	else if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		rte_graph_walk_mcore_dispatch(graph);
 }
 
 #ifdef __cplusplus
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 13/15] graph: add stats for cross-core dispatching
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (11 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 12/15] graph: enable graph multicore dispatch scheduler model Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model Zhirun Yan
                     ` (2 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add stats for cross-core dispatching scheduler if stats collection is
enabled.
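The cluster stats path simply sums each node's dispatch counters, as cluster_node_arregate_stats() does below for the mcore dispatch model. A minimal stand-alone sketch (plain C, no DPDK dependencies; `node_sched_stat` and `aggregate_sched` are illustrative names):

```c
#include <assert.h>
#include <stdint.h>

/* Per-node dispatch counters, mirroring total_sched_objs and
 * total_sched_fail added to struct rte_node in this patch. */
struct node_sched_stat {
	uint64_t sched_objs; /* objects successfully dispatched */
	uint64_t sched_fail; /* objects that could not be dispatched */
};

/* Sum the per-node counters across one stats cluster. */
static struct node_sched_stat
aggregate_sched(const struct node_sched_stat *nodes, int n)
{
	struct node_sched_stat s = {0, 0};
	int i;

	for (i = 0; i < n; i++) {
		s.sched_objs += nodes[i].sched_objs;
		s.sched_fail += nodes[i].sched_fail;
	}
	return s;
}
```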

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_debug.c              |  6 +++
 lib/graph/graph_stats.c              | 74 +++++++++++++++++++++++++---
 lib/graph/rte_graph.h                |  2 +
 lib/graph/rte_graph_model_dispatch.c |  3 ++
 lib/graph/rte_graph_worker_common.h  |  2 +
 5 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/lib/graph/graph_debug.c b/lib/graph/graph_debug.c
index b84412f5dd..7dcf07b080 100644
--- a/lib/graph/graph_debug.c
+++ b/lib/graph/graph_debug.c
@@ -74,6 +74,12 @@ rte_graph_obj_dump(FILE *f, struct rte_graph *g, bool all)
 		fprintf(f, "       size=%d\n", n->size);
 		fprintf(f, "       idx=%d\n", n->idx);
 		fprintf(f, "       total_objs=%" PRId64 "\n", n->total_objs);
+		if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+			fprintf(f, "       total_sched_objs=%" PRId64 "\n",
+				n->total_sched_objs);
+			fprintf(f, "       total_sched_fail=%" PRId64 "\n",
+				n->total_sched_fail);
+		}
 		fprintf(f, "       total_calls=%" PRId64 "\n", n->total_calls);
 		for (i = 0; i < n->nb_edges; i++)
 			fprintf(f, "          edge[%d] <%s>\n", i,
diff --git a/lib/graph/graph_stats.c b/lib/graph/graph_stats.c
index c0140ba922..aa22cc403c 100644
--- a/lib/graph/graph_stats.c
+++ b/lib/graph/graph_stats.c
@@ -40,13 +40,19 @@ struct rte_graph_cluster_stats {
 	struct cluster_node clusters[];
 } __rte_cache_aligned;
 
+#define boarder_model_dispatch()                                                              \
+	fprintf(f, "+-------------------------------+---------------+--------" \
+		   "-------+---------------+---------------+---------------+" \
+		   "---------------+---------------+-" \
+		   "----------+\n")
+
 #define boarder()                                                              \
 	fprintf(f, "+-------------------------------+---------------+--------" \
 		   "-------+---------------+---------------+---------------+-" \
 		   "----------+\n")
 
 static inline void
-print_banner(FILE *f)
+print_banner_default(FILE *f)
 {
 	boarder();
 	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s\n", "|Node", "|calls",
@@ -55,6 +61,27 @@ print_banner(FILE *f)
 	boarder();
 }
 
+static inline void
+print_banner_dispatch(FILE *f)
+{
+	boarder_model_dispatch();
+	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s%-16s%-16s\n",
+		"|Node", "|calls",
+		"|objs", "|sched objs", "|sched fail",
+		"|realloc_count", "|objs/call", "|objs/sec(10E6)",
+		"|cycles/call|");
+	boarder_model_dispatch();
+}
+
+static inline void
+print_banner(FILE *f)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		print_banner_dispatch(f);
+	else
+		print_banner_default(f);
+}
+
 static inline void
 print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 {
@@ -76,11 +103,21 @@ print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 	objs_per_sec = ts_per_hz ? (objs - prev_objs) / ts_per_hz : 0;
 	objs_per_sec /= 1000000;
 
-	fprintf(f,
-		"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
-		"|%-15.3f|%-15.6f|%-11.4f|\n",
-		stat->name, calls, objs, stat->realloc_count, objs_per_call,
-		objs_per_sec, cycles_per_call);
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->sched_objs,
+			stat->sched_fail, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	} else {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	}
 }
 
 static int
@@ -88,13 +125,20 @@ graph_cluster_stats_cb(bool is_first, bool is_last, void *cookie,
 		       const struct rte_graph_cluster_node_stats *stat)
 {
 	FILE *f = cookie;
+	int model;
+
+	model = rte_graph_worker_model_get();
 
 	if (unlikely(is_first))
 		print_banner(f);
 	if (stat->objs)
 		print_node(f, stat);
-	if (unlikely(is_last))
-		boarder();
+	if (unlikely(is_last)) {
+		if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+			boarder_model_dispatch();
+		else
+			boarder();
+	}
 
 	return 0;
 };
@@ -332,13 +376,21 @@ static inline void
 cluster_node_arregate_stats(struct cluster_node *cluster)
 {
 	uint64_t calls = 0, cycles = 0, objs = 0, realloc_count = 0;
+	uint64_t sched_objs = 0, sched_fail = 0;
 	struct rte_graph_cluster_node_stats *stat = &cluster->stat;
 	struct rte_node *node;
 	rte_node_t count;
+	int model;
 
+	model = rte_graph_worker_model_get();
 	for (count = 0; count < cluster->nb_nodes; count++) {
 		node = cluster->nodes[count];
 
+		if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+			sched_objs += node->total_sched_objs;
+			sched_fail += node->total_sched_fail;
+		}
+
 		calls += node->total_calls;
 		objs += node->total_objs;
 		cycles += node->total_cycles;
@@ -348,6 +400,12 @@ cluster_node_arregate_stats(struct cluster_node *cluster)
 	stat->calls = calls;
 	stat->objs = objs;
 	stat->cycles = cycles;
+
+	if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+		stat->sched_objs = sched_objs;
+		stat->sched_fail = sched_fail;
+	}
+
 	stat->ts = rte_get_timer_cycles();
 	stat->realloc_count = realloc_count;
 }
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index 2f86c17de7..7d77a790ac 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -208,6 +208,8 @@ struct rte_graph_cluster_node_stats {
 	uint64_t prev_calls;	/**< Previous number of calls. */
 	uint64_t prev_objs;	/**< Previous number of processed objs. */
 	uint64_t prev_cycles;	/**< Previous number of cycles. */
+	uint64_t sched_objs;	/**< Number of objs scheduled to other cores. */
+	uint64_t sched_fail;	/**< Number of objs that failed to be scheduled. */
 
 	uint64_t realloc_count; /**< Realloc count. */
 
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
index b46dd156ac..4cf00160ea 100644
--- a/lib/graph/rte_graph_model_dispatch.c
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -83,6 +83,7 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		rte_pause();
 
 	off += size;
+	node->total_sched_objs += size;
 	node->idx -= size;
 	if (node->idx > 0)
 		goto submit_again;
@@ -94,6 +95,8 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		memmove(&node->objs[0], &node->objs[off],
 			node->idx * sizeof(void *));
 
+	node->total_sched_fail += node->idx;
+
 	return false;
 }
 
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 70cfde7015..be8508cd83 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -94,6 +94,8 @@ struct rte_node {
 		/* Fast schedule area for mcore dispatch model */
 		unsigned int lcore_id;  /**< Node running lcore. */
 		};
+	uint64_t total_sched_objs; /**< Number of objects scheduled. */
+	uint64_t total_sched_fail; /**< Number of schedule failures. */
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (12 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 13/15] graph: add stats for cross-core dispatching Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-24  2:16   ` [PATCH v2 15/15] doc: update multicore dispatch model in graph guides Zhirun Yan
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add a new parameter "model" to choose the dispatch or rtc worker
model. In the dispatch model, the nodes are affinitized to worker
cores successively.

Note:
the current implementation supports only one RX node in the dispatch
model.

./dpdk-l3fwd-graph  -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="dispatch"
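The --model option is parsed by a simple string comparison against the two supported names. A stand-alone sketch of that parsing (plain C, no DPDK dependencies; the real parse_worker_model() below calls rte_exit() on an unknown name instead of returning a sentinel):

```c
#include <assert.h>
#include <string.h>

#define WORKER_MODEL_RTC            "rtc"
#define WORKER_MODEL_MCORE_DISPATCH "dispatch"

enum model {
	MODEL_INVALID = -1,
	MODEL_RTC = 0,
	MODEL_DISPATCH = 1
};

/* Map the --model argument to a worker model. */
static enum model
parse_model_arg(const char *arg)
{
	if (strcmp(arg, WORKER_MODEL_MCORE_DISPATCH) == 0)
		return MODEL_DISPATCH;
	if (strcmp(arg, WORKER_MODEL_RTC) == 0)
		return MODEL_RTC;
	return MODEL_INVALID; /* the real app rte_exit()s here */
}
```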

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 examples/l3fwd-graph/main.c | 237 +++++++++++++++++++++++++++++-------
 1 file changed, 195 insertions(+), 42 deletions(-)

diff --git a/examples/l3fwd-graph/main.c b/examples/l3fwd-graph/main.c
index 5feeab4f0f..cfa78003f4 100644
--- a/examples/l3fwd-graph/main.c
+++ b/examples/l3fwd-graph/main.c
@@ -55,6 +55,9 @@
 
 #define NB_SOCKETS 8
 
+/* Graph module */
+#define WORKER_MODEL_RTC "rtc"
+#define WORKER_MODEL_MCORE_DISPATCH "dispatch"
 /* Static global variables used within this file. */
 static uint16_t nb_rxd = RX_DESC_DEFAULT;
 static uint16_t nb_txd = TX_DESC_DEFAULT;
@@ -88,6 +91,10 @@ struct lcore_rx_queue {
 	char node_name[RTE_NODE_NAMESIZE];
 };
 
+struct model_conf {
+	enum rte_graph_worker_model model;
+};
+
 /* Lcore conf */
 struct lcore_conf {
 	uint16_t n_rx_queue;
@@ -153,6 +160,19 @@ static struct ipv4_l3fwd_lpm_route ipv4_l3fwd_lpm_route_array[] = {
 	{RTE_IPV4(198, 18, 6, 0), 24, 6}, {RTE_IPV4(198, 18, 7, 0), 24, 7},
 };
 
+static int
+check_worker_model_params(void)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH &&
+	    nb_lcore_params > 1) {
+		printf("Exceeded max number of lcore params for remote model: %hu\n",
+		       nb_lcore_params);
+		return -1;
+	}
+
+	return 0;
+}
+
 static int
 check_lcore_params(void)
 {
@@ -276,6 +296,7 @@ print_usage(const char *prgname)
 		"  --eth-dest=X,MM:MM:MM:MM:MM:MM: Ethernet destination for "
 		"port X\n"
 		"  --max-pkt-len PKTLEN: maximum packet length in decimal (64-9600)\n"
+		"  --model NAME: walking model name, dispatch or rtc(by default)\n"
 		"  --no-numa: Disable numa awareness\n"
 		"  --per-port-pool: Use separate buffer pool per port\n"
 		"  --pcap-enable: Enables pcap capture\n"
@@ -318,6 +339,20 @@ parse_max_pkt_len(const char *pktlen)
 	return len;
 }
 
+static int
+parse_worker_model(const char *model)
+{
+	if (strcmp(model, WORKER_MODEL_MCORE_DISPATCH) == 0) {
+		rte_graph_worker_model_set(RTE_GRAPH_MODEL_MCORE_DISPATCH);
+		return RTE_GRAPH_MODEL_MCORE_DISPATCH;
+	} else if (strcmp(model, WORKER_MODEL_RTC) == 0)
+		return RTE_GRAPH_MODEL_RTC;
+
+	rte_exit(EXIT_FAILURE, "Invalid worker model: %s", model);
+
+	return RTE_GRAPH_MODEL_LIST_END;
+}
+
 static int
 parse_portmask(const char *portmask)
 {
@@ -434,6 +469,8 @@ static const char short_options[] = "p:" /* portmask */
 #define CMD_LINE_OPT_PCAP_ENABLE   "pcap-enable"
 #define CMD_LINE_OPT_NUM_PKT_CAP   "pcap-num-cap"
 #define CMD_LINE_OPT_PCAP_FILENAME "pcap-file-name"
+#define CMD_LINE_OPT_WORKER_MODEL  "model"
+
 enum {
 	/* Long options mapped to a short option */
 
@@ -449,6 +486,7 @@ enum {
 	CMD_LINE_OPT_PARSE_PCAP_ENABLE,
 	CMD_LINE_OPT_PARSE_NUM_PKT_CAP,
 	CMD_LINE_OPT_PCAP_FILENAME_CAP,
+	CMD_LINE_OPT_WORKER_MODEL_TYPE,
 };
 
 static const struct option lgopts[] = {
@@ -460,6 +498,7 @@ static const struct option lgopts[] = {
 	{CMD_LINE_OPT_PCAP_ENABLE, 0, 0, CMD_LINE_OPT_PARSE_PCAP_ENABLE},
 	{CMD_LINE_OPT_NUM_PKT_CAP, 1, 0, CMD_LINE_OPT_PARSE_NUM_PKT_CAP},
 	{CMD_LINE_OPT_PCAP_FILENAME, 1, 0, CMD_LINE_OPT_PCAP_FILENAME_CAP},
+	{CMD_LINE_OPT_WORKER_MODEL, 1, 0, CMD_LINE_OPT_WORKER_MODEL_TYPE},
 	{NULL, 0, 0, 0},
 };
 
@@ -551,6 +590,11 @@ parse_args(int argc, char **argv)
 			printf("Pcap file name: %s\n", pcap_filename);
 			break;
 
+		case CMD_LINE_OPT_WORKER_MODEL_TYPE:
+			printf("Use new worker model: %s\n", optarg);
+			parse_worker_model(optarg);
+			break;
+
 		default:
 			print_usage(prgname);
 			return -1;
@@ -726,15 +770,15 @@ print_stats(void)
 static int
 graph_main_loop(void *conf)
 {
+	struct model_conf *mconf = conf;
 	struct lcore_conf *qconf;
 	struct rte_graph *graph;
 	uint32_t lcore_id;
 
-	RTE_SET_USED(conf);
-
 	lcore_id = rte_lcore_id();
 	qconf = &lcore_conf[lcore_id];
 	graph = qconf->graph;
+	rte_graph_worker_model_set(mconf->model);
 
 	if (!graph) {
 		RTE_LOG(INFO, L3FWD_GRAPH, "Lcore %u has nothing to do\n",
@@ -788,6 +832,141 @@ config_port_max_pkt_len(struct rte_eth_conf *conf,
 	return 0;
 }
 
+static void
+graph_config_mcore_dispatch(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	int worker_count = rte_lcore_count() - 1;
+	int main_lcore_id = rte_get_main_lcore();
+	int worker_lcore = main_lcore_id;
+	rte_graph_t main_graph_id = 0;
+	struct rte_node *node_tmp;
+	struct lcore_conf *qconf;
+	struct rte_graph *graph;
+	rte_graph_t graph_id;
+	rte_graph_off_t off;
+	int n_rx_node = 0;
+	rte_node_t count;
+	int i, j;
+	int ret;
+
+	for (j = 0; j < nb_lcore_params; j++) {
+		qconf = &lcore_conf[lcore_params[j].lcore_id];
+		/* Add rx node patterns of all lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			char *node_name = qconf->rx_queue_list[i].node_name;
+
+			graph_conf.node_patterns[nb_patterns + n_rx_node + i] = node_name;
+			n_rx_node++;
+			ret = rte_graph_model_dispatch_lcore_affinity_set(node_name,
+									lcore_params[j].lcore_id);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n", node_name,
+				       lcore_params[j].lcore_id);
+		}
+	}
+
+	graph_conf.nb_node_patterns = nb_patterns + n_rx_node;
+	graph_conf.socket_id = rte_lcore_to_socket_id(main_lcore_id);
+
+	qconf = &lcore_conf[main_lcore_id];
+	snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+		 main_lcore_id);
+
+	/* create main graph */
+	main_graph_id = rte_graph_create(qconf->name, &graph_conf);
+	if (main_graph_id == RTE_GRAPH_ID_INVALID)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_create(): main_graph_id invalid for lcore %u\n",
+			 main_lcore_id);
+
+	qconf->graph_id = main_graph_id;
+	qconf->graph = rte_graph_lookup(qconf->name);
+	/* >8 End of graph initialization. */
+	if (!qconf->graph)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_lookup(): graph %s not found\n",
+			 qconf->name);
+
+	graph = qconf->graph;
+	rte_graph_foreach_node(count, off, graph, node_tmp) {
+		worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+
+		/* Need to set the node Lcore affinity before clone graph for each lcore */
+		if (node_tmp->lcore_id == RTE_MAX_LCORE) {
+			ret = rte_graph_model_dispatch_lcore_affinity_set(node_tmp->name,
+									worker_lcore);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n",
+				       node_tmp->name, worker_lcore);
+		}
+	}
+
+	worker_lcore = main_lcore_id;
+	for (i = 0; i < worker_count; i++) {
+		worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+
+		qconf = &lcore_conf[worker_lcore];
+		snprintf(qconf->name, sizeof(qconf->name), "cloned-%u", worker_lcore);
+		graph_id = rte_graph_clone(main_graph_id, qconf->name);
+		ret = rte_graph_model_dispatch_core_bind(graph_id, worker_lcore);
+		if (ret == 0)
+			printf("bind graph %d to lcore %u\n", graph_id, worker_lcore);
+
+		/* full cloned graph name */
+		snprintf(qconf->name, sizeof(qconf->name), "%s",
+			 rte_graph_id_to_name(graph_id));
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "Failed to lookup graph %s\n",
+				 qconf->name);
+		continue;
+	}
+}
+
+static void
+graph_config_rtc(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	struct lcore_conf *qconf;
+	rte_graph_t graph_id;
+	uint32_t lcore_id;
+	rte_edge_t i;
+
+	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+		if (rte_lcore_is_enabled(lcore_id) == 0)
+			continue;
+
+		qconf = &lcore_conf[lcore_id];
+		/* Skip graph creation if no source exists */
+		if (!qconf->n_rx_queue)
+			continue;
+		/* Add rx node patterns of this lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			graph_conf.node_patterns[nb_patterns + i] =
+				qconf->rx_queue_list[i].node_name;
+		}
+		graph_conf.nb_node_patterns = nb_patterns + i;
+		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
+		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+			 lcore_id);
+		graph_id = rte_graph_create(qconf->name, &graph_conf);
+		if (graph_id == RTE_GRAPH_ID_INVALID)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_create(): graph_id invalid for lcore %u\n",
+				 lcore_id);
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		/* >8 End of graph initialization. */
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_lookup(): graph %s not found\n",
+				 qconf->name);
+	}
+}
+
 int
 main(int argc, char **argv)
 {
@@ -808,10 +987,12 @@ main(int argc, char **argv)
 	uint16_t queueid, portid, i;
 	const char **node_patterns;
 	struct lcore_conf *qconf;
+	struct model_conf mconf;
 	uint16_t nb_graphs = 0;
 	uint16_t nb_patterns;
 	uint8_t rewrite_len;
 	uint32_t lcore_id;
+	uint16_t model;
 	int ret;
 
 	/* Init EAL */
@@ -840,6 +1021,9 @@ main(int argc, char **argv)
 	if (check_lcore_params() < 0)
 		rte_exit(EXIT_FAILURE, "check_lcore_params() failed\n");
 
+	if (check_worker_model_params() < 0)
+		rte_exit(EXIT_FAILURE, "check_worker_model_params() failed\n");
+
 	ret = init_lcore_rx_queues();
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "init_lcore_rx_queues() failed\n");
@@ -1079,51 +1263,18 @@ main(int argc, char **argv)
 
 	memset(&graph_conf, 0, sizeof(graph_conf));
 	graph_conf.node_patterns = node_patterns;
+	graph_conf.nb_node_patterns = nb_patterns;
 
 	/* Pcap config */
 	graph_conf.pcap_enable = pcap_trace_enable;
 	graph_conf.num_pkt_to_capture = packet_to_capture;
 	graph_conf.pcap_filename = pcap_filename;
 
-	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
-		rte_graph_t graph_id;
-		rte_edge_t i;
-
-		if (rte_lcore_is_enabled(lcore_id) == 0)
-			continue;
-
-		qconf = &lcore_conf[lcore_id];
-
-		/* Skip graph creation if no source exists */
-		if (!qconf->n_rx_queue)
-			continue;
-
-		/* Add rx node patterns of this lcore */
-		for (i = 0; i < qconf->n_rx_queue; i++) {
-			graph_conf.node_patterns[nb_patterns + i] =
-				qconf->rx_queue_list[i].node_name;
-		}
-
-		graph_conf.nb_node_patterns = nb_patterns + i;
-		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
-
-		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
-			 lcore_id);
-
-		graph_id = rte_graph_create(qconf->name, &graph_conf);
-		if (graph_id == RTE_GRAPH_ID_INVALID)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_create(): graph_id invalid"
-				 " for lcore %u\n", lcore_id);
-
-		qconf->graph_id = graph_id;
-		qconf->graph = rte_graph_lookup(qconf->name);
-		/* >8 End of graph initialization. */
-		if (!qconf->graph)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_lookup(): graph %s not found\n",
-				 qconf->name);
-	}
+	model = rte_graph_worker_model_get();
+	if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		graph_config_mcore_dispatch(graph_conf);
+	else
+		graph_config_rtc(graph_conf);
 
 	memset(&rewrite_data, 0, sizeof(rewrite_data));
 	rewrite_len = sizeof(rewrite_data);
@@ -1174,8 +1325,10 @@ main(int argc, char **argv)
 	}
 	/* >8 End of adding route to ip4 graph infa. */
 
+	mconf.model = model;
 	/* Launch per-lcore init on every worker lcore */
-	rte_eal_mp_remote_launch(graph_main_loop, NULL, SKIP_MAIN);
+	rte_eal_mp_remote_launch(graph_main_loop, &mconf,
+				 SKIP_MAIN);
 
 	/* Accumulate and print stats on main until exit */
 	if (rte_graph_has_stats_feature())
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v2 15/15] doc: update multicore dispatch model in graph guides
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (13 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model Zhirun Yan
@ 2023-03-24  2:16   ` Zhirun Yan
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-24  2:16 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Update the graph documentation to introduce the new multicore dispatch model.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 doc/guides/prog_guide/graph_lib.rst | 59 +++++++++++++++++++++++++++--
 1 file changed, 55 insertions(+), 4 deletions(-)

diff --git a/doc/guides/prog_guide/graph_lib.rst b/doc/guides/prog_guide/graph_lib.rst
index 1cfdc86433..e3c0d652e4 100644
--- a/doc/guides/prog_guide/graph_lib.rst
+++ b/doc/guides/prog_guide/graph_lib.rst
@@ -189,14 +189,65 @@ In the above example, A graph object will be created with ethdev Rx
 node of port 0 and queue 0, all ipv4* nodes in the system,
 and ethdev tx node of all ports.
 
-Multicore graph processing
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-In the current graph library implementation, specifically,
-``rte_graph_walk()`` and ``rte_node_enqueue*`` fast path API functions
+Graph model choosing
+~~~~~~~~~~~~~~~~~~~~
+Currently, there are two different walking models. Use
+``rte_graph_worker_model_set()`` to set the walking model.
+
+RTC (Run-To-Completion)
+^^^^^^^^^^^^^^^^^^^^^^^
+This is the default graph walking model. Specifically,
+``rte_graph_walk_rtc()`` and ``rte_node_enqueue*`` fast path API functions
 are designed to work on single-core to have better performance.
 The fast path API works on graph object, So the multi-core graph
 processing strategy would be to create graph object PER WORKER.
 
+Example:
+
+Graph: node-0 -> node-1 -> node-2 @Core0.
+
+.. code-block:: diff
+
+    + - - - - - - - - - - - - - - - - - - - - - +
+    '                  Core #0                  '
+    '                                           '
+    ' +--------+     +---------+     +--------+ '
+    ' | Node-0 | --> | Node-1  | --> | Node-2 | '
+    ' +--------+     +---------+     +--------+ '
+    '                                           '
+    + - - - - - - - - - - - - - - - - - - - - - +
+
+Dispatch model
+^^^^^^^^^^^^^^
+The dispatch model enables a cross-core dispatching mechanism which employs
+a scheduling work-queue to dispatch streams to the worker cores
+associated with the destination node.
+
+Use ``rte_graph_model_dispatch_lcore_affinity_set()`` to set the lcore
+affinity of a node.
+Each worker core will have a copy of the graph. Use ``rte_graph_clone()`` to
+clone the graph for each worker and use ``rte_graph_model_dispatch_core_bind()``
+to bind a graph to a worker core.
+
+Example:
+
+Graph topo: node-0 -> node-1; node-1 -> node-2; node-2 -> node-3.
+Config graph: node-0 @Core0; node-1/3 @Core1; node-2 @Core2.
+
+.. code-block:: diff
+
+    + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
+    '  Core #0   '     '          Core #1         '     '  Core #2   '
+    '            '     '                          '     '            '
+    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
+    ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
+    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
+    '            '     '     |                    '     '      ^     '
+    + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
+                             |                                 |
+                             + - - - - - - - - - - - - - - - - +
+
+
 In fast path
 ~~~~~~~~~~~~
 Typical fast-path code looks like below, where the application
-- 
2.37.2



* [PATCH v3 00/15] graph enhancement for multi-core dispatch
  2023-03-24  2:16 ` [PATCH v2 00/15] " Zhirun Yan
                     ` (14 preceding siblings ...)
  2023-03-24  2:16   ` [PATCH v2 15/15] doc: update multicore dispatch model in graph guides Zhirun Yan
@ 2023-03-29  6:43   ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 01/15] graph: rename rte_graph_work as common Zhirun Yan
                       ` (15 more replies)
  15 siblings, 16 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

V3:
Fix CI build issues with TLS and a typo.

V2:
Use git mv to keep git history.
Use TLS for per-thread local storage.
Change model name to mcore dispatch.
Change API with specific mode name.
Split big patch.
Fix CI issues.
Rebase l3fwd-graph example.
Update doc and maintainers files.


Currently, rte_graph supports the RTC (Run-To-Completion) model, which is
confined to a single core.
RTC is one of the typical packet processing models. Others, such as
Pipeline or Hybrid, lack support.

The patch set introduces a 'multicore dispatch' model selection which
is a self-reacting scheme based on core affinity.
The new model enables a cross-core dispatching mechanism which employs a
scheduling work-queue to dispatch streams to the worker cores
associated with the destination node. When the core affinity of the
destination node is the default 'current', the stream continues to be
executed as normal.

Example:
3-node graph targets 3-core budget

RTC:
Graph: node-0 -> node-1 -> node-2 @Core0.

+ - - - - - - - - - - - - - - - - - - - - - +
'                Core #0/1/2                '
'                                           '
' +--------+     +---------+     +--------+ '
' | Node-0 | --> | Node-1  | --> | Node-2 | '
' +--------+     +---------+     +--------+ '
'                                           '
+ - - - - - - - - - - - - - - - - - - - - - +

Dispatch:

Graph topo: node-0 -> node-1; node-1 -> node-2; node-2 -> node-3.
Config graph: node-0 @Core0; node-1/3 @Core1; node-2 @Core2.


    + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
    '  Core #0   '     '          Core #1         '     '  Core #2   '
    '            '     '                          '     '            '
    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
    ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
    '            '     '     |                    '     '      ^     '
    + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
                             |                                 |
                             + - - - - - - - - - - - - - - - - +


The patch set is broken down as below:

1. Split graph worker into common and default model part.
2. Inline graph node processing to make it reusable.
3. Add set/get APIs to choose worker model.
4. Introduce core affinity API to make a node run on a specific worker core.
  (only used in the new model)
5. Introduce graph affinity API to bind one graph with a specific worker
  core.
6. Introduce graph clone API.
7. Introduce stream moving with scheduler work-queue in patch 8~12.
8. Add stats for new models.
9. Abstract default graph config process and integrate new model into
  example/l3fwd-graph. Add new parameters for model choosing.

The new worker model can be run like this:
./dpdk-l3fwd-graph -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="dispatch"

References:
https://static.sched.com/hosted_files/dpdkuserspace22/a6/graph%20introduce%20remote%20dispatch%20for%20mult-core%20scaling.pdf


Zhirun Yan (15):
  graph: rename rte_graph_work as common
  graph: split graph worker into common and default model
  graph: move node process into inline function
  graph: add get/set graph worker model APIs
  graph: introduce graph node core affinity API
  graph: introduce graph bind unbind API
  graph: introduce graph clone API for other worker core
  graph: add struct for stream moving between cores
  graph: introduce stream moving cross cores
  graph: enable create and destroy graph scheduling workqueue
  graph: introduce graph walk by cross-core dispatch
  graph: enable graph multicore dispatch scheduler model
  graph: add stats for cross-core dispatching
  examples/l3fwd-graph: introduce multicore dispatch worker model
  doc: update multicore dispatch model in graph guides

 MAINTAINERS                          |   1 +
 doc/guides/prog_guide/graph_lib.rst  |  59 ++-
 examples/l3fwd-graph/main.c          | 237 +++++++++---
 lib/graph/graph.c                    | 179 +++++++++
 lib/graph/graph_debug.c              |   6 +
 lib/graph/graph_populate.c           |   1 +
 lib/graph/graph_private.h            |  44 +++
 lib/graph/graph_stats.c              |  74 +++-
 lib/graph/meson.build                |   4 +-
 lib/graph/node.c                     |   1 +
 lib/graph/rte_graph.h                |  44 +++
 lib/graph/rte_graph_model_dispatch.c | 179 +++++++++
 lib/graph/rte_graph_model_dispatch.h | 120 ++++++
 lib/graph/rte_graph_model_rtc.h      |  45 +++
 lib/graph/rte_graph_worker.c         |  54 +++
 lib/graph/rte_graph_worker.h         | 498 +------------------------
 lib/graph/rte_graph_worker_common.h  | 536 +++++++++++++++++++++++++++
 lib/graph/version.map                |   8 +
 18 files changed, 1546 insertions(+), 544 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_dispatch.c
 create mode 100644 lib/graph/rte_graph_model_dispatch.h
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker.c
 create mode 100644 lib/graph/rte_graph_worker_common.h

-- 
2.37.2



* [PATCH v3 01/15] graph: rename rte_graph_work as common
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 02/15] graph: split graph worker into common and default model Zhirun Yan
                       ` (14 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Rename rte_graph_worker.h to rte_graph_worker_common.h to support
multiple graph worker models.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 MAINTAINERS                                                 | 1 +
 lib/graph/graph_pcap.c                                      | 2 +-
 lib/graph/graph_private.h                                   | 2 +-
 lib/graph/meson.build                                       | 2 +-
 lib/graph/{rte_graph_worker.h => rte_graph_worker_common.h} | 6 +++---
 5 files changed, 7 insertions(+), 6 deletions(-)
 rename lib/graph/{rte_graph_worker.h => rte_graph_worker_common.h} (99%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 280058adfc..9d9467dd00 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1714,6 +1714,7 @@ F: doc/guides/prog_guide/bpf_lib.rst
 Graph - EXPERIMENTAL
 M: Jerin Jacob <jerinj@marvell.com>
 M: Kiran Kumar K <kirankumark@marvell.com>
+M: Zhirun Yan <zhirun.yan@intel.com>
 F: lib/graph/
 F: doc/guides/prog_guide/graph_lib.rst
 F: app/test/test_graph*
diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
index 6c43330029..8a220370fa 100644
--- a/lib/graph/graph_pcap.c
+++ b/lib/graph/graph_pcap.c
@@ -10,7 +10,7 @@
 #include <rte_mbuf.h>
 #include <rte_pcapng.h>
 
-#include "rte_graph_worker.h"
+#include "rte_graph_worker_common.h"
 
 #include "graph_pcap_private.h"
 
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 7d1b30b8ac..f08dbc7e9d 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -12,7 +12,7 @@
 #include <rte_eal.h>
 
 #include "rte_graph.h"
-#include "rte_graph_worker.h"
+#include "rte_graph_worker_common.h"
 
 extern int rte_graph_logtype;
 
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 3526d1b5d4..4e2b612ad3 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,6 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
 )
-headers = files('rte_graph.h', 'rte_graph_worker.h')
+headers = files('rte_graph.h', 'rte_graph_worker_common.h')
 
 deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker_common.h
similarity index 99%
rename from lib/graph/rte_graph_worker.h
rename to lib/graph/rte_graph_worker_common.h
index 438595b15c..0bad2938f3 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -2,8 +2,8 @@
  * Copyright(C) 2020 Marvell International Ltd.
  */
 
-#ifndef _RTE_GRAPH_WORKER_H_
-#define _RTE_GRAPH_WORKER_H_
+#ifndef _RTE_GRAPH_WORKER_COMMON_H_
+#define _RTE_GRAPH_WORKER_COMMON_H_
 
 /**
  * @file rte_graph_worker.h
@@ -518,4 +518,4 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
 }
 #endif
 
-#endif /* _RTE_GRAPH_WORKER_H_ */
+#endif /* _RTE_GRAPH_WORKER_COMMON_H_ */
-- 
2.37.2



* [PATCH v3 02/15] graph: split graph worker into common and default model
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 01/15] graph: rename rte_graph_work as common Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 03/15] graph: move node process into inline function Zhirun Yan
                       ` (13 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

To support multiple graph worker models, split the graph worker into
common and default parts. Name the current walk function
rte_graph_model_rtc, since the default model is RTC (Run-To-Completion).

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_pcap.c              |  2 +-
 lib/graph/graph_private.h           |  2 +-
 lib/graph/meson.build               |  2 +-
 lib/graph/rte_graph_model_rtc.h     | 61 +++++++++++++++++++++++++++++
 lib/graph/rte_graph_worker.h        | 34 ++++++++++++++++
 lib/graph/rte_graph_worker_common.h | 57 ---------------------------
 6 files changed, 98 insertions(+), 60 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker.h

diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
index 8a220370fa..6c43330029 100644
--- a/lib/graph/graph_pcap.c
+++ b/lib/graph/graph_pcap.c
@@ -10,7 +10,7 @@
 #include <rte_mbuf.h>
 #include <rte_pcapng.h>
 
-#include "rte_graph_worker_common.h"
+#include "rte_graph_worker.h"
 
 #include "graph_pcap_private.h"
 
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index f08dbc7e9d..7d1b30b8ac 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -12,7 +12,7 @@
 #include <rte_eal.h>
 
 #include "rte_graph.h"
-#include "rte_graph_worker_common.h"
+#include "rte_graph_worker.h"
 
 extern int rte_graph_logtype;
 
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 4e2b612ad3..3526d1b5d4 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,6 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
 )
-headers = files('rte_graph.h', 'rte_graph_worker_common.h')
+headers = files('rte_graph.h', 'rte_graph_worker.h')
 
 deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
new file mode 100644
index 0000000000..665560f831
--- /dev/null
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "rte_graph_worker_common.h"
+
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+static inline void
+rte_graph_walk_rtc(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	/*
+	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
+	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
+	 * in a circular buffer fashion.
+	 *
+	 *	+-----+ <= cir_start - head [number of source nodes]
+	 *	|     |
+	 *	| ... | <= source nodes
+	 *	|     |
+	 *	+-----+ <= cir_start [head = 0] [tail = 0]
+	 *	|     |
+	 *	| ... | <= pending streams
+	 *	|     |
+	 *	+-----+ <= cir_start + mask
+	 */
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		objs = node->objs;
+		rte_prefetch0(objs);
+
+		if (rte_graph_has_stats_feature()) {
+			start = rte_rdtsc();
+			rc = node->process(graph, node, objs, node->idx);
+			node->total_cycles += rte_rdtsc() - start;
+			node->total_calls++;
+			node->total_objs += rc;
+		} else {
+			node->process(graph, node, objs, node->idx);
+		}
+			node->idx = 0;
+			head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+	graph->tail = 0;
+}
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
new file mode 100644
index 0000000000..7ea18ba80a
--- /dev/null
+++ b/lib/graph/rte_graph_worker.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#ifndef _RTE_GRAPH_WORKER_H_
+#define _RTE_GRAPH_WORKER_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "rte_graph_model_rtc.h"
+
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk(struct rte_graph *graph)
+{
+	rte_graph_walk_rtc(graph);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_WORKER_H_ */
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 0bad2938f3..b58f8f6947 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -128,63 +128,6 @@ __rte_experimental
 void __rte_node_stream_alloc_size(struct rte_graph *graph,
 				  struct rte_node *node, uint16_t req_size);
 
-/**
- * Perform graph walk on the circular buffer and invoke the process function
- * of the nodes and collect the stats.
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup function.
- *
- * @see rte_graph_lookup()
- */
-__rte_experimental
-static inline void
-rte_graph_walk(struct rte_graph *graph)
-{
-	const rte_graph_off_t *cir_start = graph->cir_start;
-	const rte_node_t mask = graph->cir_mask;
-	uint32_t head = graph->head;
-	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
-
-	/*
-	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
-	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
-	 * in a circular buffer fashion.
-	 *
-	 *	+-----+ <= cir_start - head [number of source nodes]
-	 *	|     |
-	 *	| ... | <= source nodes
-	 *	|     |
-	 *	+-----+ <= cir_start [head = 0] [tail = 0]
-	 *	|     |
-	 *	| ... | <= pending streams
-	 *	|     |
-	 *	+-----+ <= cir_start + mask
-	 */
-	while (likely(head != graph->tail)) {
-		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-		node->idx = 0;
-		head = likely((int32_t)head > 0) ? head & mask : head;
-	}
-	graph->tail = 0;
-}
-
 /* Fast path helper functions */
 
 /**
-- 
2.37.2



* [PATCH v3 03/15] graph: move node process into inline function
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 01/15] graph: rename rte_graph_work as common Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 02/15] graph: split graph worker into common and default model Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29 15:34       ` Stephen Hemminger
  2023-03-29  6:43     ` [PATCH v3 04/15] graph: add get/set graph worker model APIs Zhirun Yan
                       ` (12 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Node processing is a single, reusable block; move the code into an
inline function.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_rtc.h     | 20 ++---------------
 lib/graph/rte_graph_worker_common.h | 33 +++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
index 665560f831..0dcb7151e9 100644
--- a/lib/graph/rte_graph_model_rtc.h
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -20,9 +20,6 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	const rte_node_t mask = graph->cir_mask;
 	uint32_t head = graph->head;
 	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
 
 	/*
 	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
@@ -41,21 +38,8 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	 */
 	while (likely(head != graph->tail)) {
 		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-			node->idx = 0;
-			head = likely((int32_t)head > 0) ? head & mask : head;
+		__rte_node_process(graph, node);
+		head = likely((int32_t)head > 0) ? head & mask : head;
 	}
 	graph->tail = 0;
 }
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index b58f8f6947..41428974db 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -130,6 +130,39 @@ void __rte_node_stream_alloc_size(struct rte_graph *graph,
 
 /* Fast path helper functions */
 
+/**
+ * @internal
+ *
+ * Process the given node: call its process function and update node stats.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ * @param node
+ *   Pointer to node object to be enqueued.
+ */
+static __rte_always_inline void
+__rte_node_process(struct rte_graph *graph, struct rte_node *node)
+{
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+	objs = node->objs;
+	rte_prefetch0(objs);
+
+	if (rte_graph_has_stats_feature()) {
+		start = rte_rdtsc();
+		rc = node->process(graph, node, objs, node->idx);
+		node->total_cycles += rte_rdtsc() - start;
+		node->total_calls++;
+		node->total_objs += rc;
+	} else {
+		node->process(graph, node, objs, node->idx);
+	}
+	node->idx = 0;
+}
+
 /**
  * @internal
  *
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread
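For illustration, a minimal self-contained sketch (standalone C, no DPDK headers; the struct and the stats toggle are simplified stand-ins, not the library's ABI) of what the consolidated __rte_node_process() helper does for one node:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct rte_node (assumption, not the DPDK layout). */
struct fake_node {
	uint16_t (*process)(void **objs, uint16_t nb_objs); /* node callback */
	void *objs[8];        /* pending object stream */
	uint16_t idx;         /* number of pending objects */
	uint64_t total_calls; /* stats, updated only when enabled */
	uint64_t total_objs;
};

static int stats_enabled = 1; /* stand-in for rte_graph_has_stats_feature() */

/* Mirrors __rte_node_process(): invoke the node callback over its pending
 * objects, account stats when enabled, then reset the pending count. */
static void node_process(struct fake_node *node)
{
	uint16_t rc;

	if (stats_enabled) {
		rc = node->process(node->objs, node->idx);
		node->total_calls++;
		node->total_objs += rc;
	} else {
		node->process(node->objs, node->idx);
	}
	node->idx = 0; /* stream fully consumed */
}

/* Trivial node callback: "processes" (counts) every object it is given. */
static uint16_t echo_process(void **objs, uint16_t nb_objs)
{
	(void)objs;
	return nb_objs;
}
```

With this helper, the RTC walk loop above reduces to a pointer fetch plus one call per node, and the dispatch model can reuse the exact same per-node processing.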

* [PATCH v3 04/15] graph: add get/set graph worker model APIs
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (2 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 03/15] graph: move node process into inline function Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29 15:35       ` Stephen Hemminger
  2023-03-29  6:43     ` [PATCH v3 05/15] graph: introduce graph node core affinity API Zhirun Yan
                       ` (11 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add new get/set APIs to configure the graph worker model, which
determines the processing model to be used.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/meson.build               |  1 +
 lib/graph/rte_graph_worker.c        | 54 +++++++++++++++++++++++++++++
 lib/graph/rte_graph_worker_common.h | 19 ++++++++++
 lib/graph/version.map               |  3 ++
 4 files changed, 77 insertions(+)
 create mode 100644 lib/graph/rte_graph_worker.c

diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 3526d1b5d4..9fab8243da 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -15,6 +15,7 @@ sources = files(
         'graph_stats.c',
         'graph_populate.c',
         'graph_pcap.c',
+        'rte_graph_worker.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
diff --git a/lib/graph/rte_graph_worker.c b/lib/graph/rte_graph_worker.c
new file mode 100644
index 0000000000..692ee1b0d2
--- /dev/null
+++ b/lib/graph/rte_graph_worker.c
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "rte_graph_worker_common.h"
+
+RTE_DEFINE_PER_LCORE(enum rte_graph_worker_model, worker_model) = RTE_GRAPH_MODEL_DEFAULT;
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ * Set the graph worker model
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ *    before the graph starts running.
+ *
+ * @param model
+ *   The graph worker model to set.
+ *
+ * @return
+ *   0 on success, -1 otherwise.
+ */
+inline int
+rte_graph_worker_model_set(enum rte_graph_worker_model model)
+{
+	if (model >= RTE_GRAPH_MODEL_LIST_END)
+		goto fail;
+
+	RTE_PER_LCORE(worker_model) = model;
+	return 0;
+
+fail:
+	RTE_PER_LCORE(worker_model) = RTE_GRAPH_MODEL_DEFAULT;
+	return -1;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Get the graph worker model
+ *
+ * @note The model is a per-lcore setting; each worker thread reads its own
+ *   copy.
+ *
+ * @return
+ *   Graph worker model on success.
+ */
+inline
+enum rte_graph_worker_model
+rte_graph_worker_model_get(void)
+{
+	return RTE_PER_LCORE(worker_model);
+}
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 41428974db..1526da6e2c 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -19,6 +19,7 @@
 #include <rte_compat.h>
 #include <rte_common.h>
 #include <rte_cycles.h>
+#include <rte_per_lcore.h>
 #include <rte_prefetch.h>
 #include <rte_memcpy.h>
 #include <rte_memory.h>
@@ -95,6 +96,16 @@ struct rte_node {
 	struct rte_node *nodes[] __rte_cache_min_aligned; /**< Next nodes. */
 } __rte_cache_aligned;
 
+/** Graph worker models */
+enum rte_graph_worker_model {
+	RTE_GRAPH_MODEL_DEFAULT,
+	RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT,
+	RTE_GRAPH_MODEL_MCORE_DISPATCH,
+	RTE_GRAPH_MODEL_LIST_END
+};
+
+RTE_DECLARE_PER_LCORE(enum rte_graph_worker_model, worker_model);
+
 /**
  * @internal
  *
@@ -490,6 +501,14 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
 	}
 }
 
+__rte_experimental
+enum rte_graph_worker_model
+rte_graph_worker_model_get(void);
+
+__rte_experimental
+int
+rte_graph_worker_model_set(enum rte_graph_worker_model model);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 13b838752d..eea73ec9ca 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -43,5 +43,8 @@ EXPERIMENTAL {
 	rte_node_next_stream_put;
 	rte_node_next_stream_move;
 
+	rte_graph_worker_model_set;
+	rte_graph_worker_model_get;
+
 	local: *;
 };
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread
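A minimal usage sketch of the per-thread model selection (standalone C; `__thread` stands in for RTE_DEFINE_PER_LCORE, and the enum mirrors the one added above but is not the DPDK symbol):

```c
#include <assert.h>

/* Mirrors the patch's enum rte_graph_worker_model. */
enum graph_worker_model {
	GRAPH_MODEL_DEFAULT,                    /* RTC is the default model */
	GRAPH_MODEL_RTC = GRAPH_MODEL_DEFAULT,
	GRAPH_MODEL_MCORE_DISPATCH,
	GRAPH_MODEL_LIST_END
};

/* Stand-in for RTE_DEFINE_PER_LCORE(): one copy per worker thread. */
static __thread enum graph_worker_model worker_model = GRAPH_MODEL_DEFAULT;

/* Mirrors rte_graph_worker_model_set(): reject out-of-range values and
 * fall back to the default model, as the patch does. */
static int model_set(enum graph_worker_model model)
{
	if (model >= GRAPH_MODEL_LIST_END) {
		worker_model = GRAPH_MODEL_DEFAULT;
		return -1;
	}
	worker_model = model;
	return 0;
}

/* Mirrors rte_graph_worker_model_get(): read the calling thread's model. */
static enum graph_worker_model model_get(void)
{
	return worker_model;
}
```

The intended call pattern is: each worker lcore calls the set API once before entering its graph walk loop, then the walk dispatches on the get API.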

* [PATCH v3 05/15] graph: introduce graph node core affinity API
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (3 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 04/15] graph: add get/set graph worker model APIs Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 06/15] graph: introduce graph bind unbind API Zhirun Yan
                       ` (10 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add an lcore_id field to the node to hold the affinity core id, and
implement rte_graph_model_dispatch_lcore_affinity_set() to set a node's
affinity to a specific lcore.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_private.h            |  1 +
 lib/graph/meson.build                |  1 +
 lib/graph/node.c                     |  1 +
 lib/graph/rte_graph_model_dispatch.c | 31 ++++++++++++++++++++
 lib/graph/rte_graph_model_dispatch.h | 43 ++++++++++++++++++++++++++++
 lib/graph/version.map                |  2 ++
 6 files changed, 79 insertions(+)
 create mode 100644 lib/graph/rte_graph_model_dispatch.c
 create mode 100644 lib/graph/rte_graph_model_dispatch.h

diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 7d1b30b8ac..409eed3284 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -50,6 +50,7 @@ struct node {
 	STAILQ_ENTRY(node) next;      /**< Next node in the list. */
 	char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
 	uint64_t flags;		      /**< Node configuration flag. */
+	unsigned int lcore_id;        /**< Node runs on the Lcore ID */
 	rte_node_process_t process;   /**< Node process function. */
 	rte_node_init_t init;         /**< Node init function. */
 	rte_node_fini_t fini;	      /**< Node fini function. */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 9fab8243da..c729d984b6 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,7 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
         'rte_graph_worker.c',
+        'rte_graph_model_dispatch.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
diff --git a/lib/graph/node.c b/lib/graph/node.c
index 149414dcd9..339b4a0da5 100644
--- a/lib/graph/node.c
+++ b/lib/graph/node.c
@@ -100,6 +100,7 @@ __rte_node_register(const struct rte_node_register *reg)
 			goto free;
 	}
 
+	node->lcore_id = RTE_MAX_LCORE;
 	node->id = node_id++;
 
 	/* Add the node at tail */
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
new file mode 100644
index 0000000000..4a2f99496d
--- /dev/null
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "graph_private.h"
+#include "rte_graph_model_dispatch.h"
+
+int
+rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned int lcore_id)
+{
+	struct node *node;
+	int ret = -EINVAL;
+
+	if (lcore_id >= RTE_MAX_LCORE)
+		return ret;
+
+	graph_spinlock_lock();
+
+	STAILQ_FOREACH(node, node_list_head_get(), next) {
+		if (strncmp(node->name, name, RTE_NODE_NAMESIZE) == 0) {
+			node->lcore_id = lcore_id;
+			ret = 0;
+			break;
+		}
+	}
+
+	graph_spinlock_unlock();
+
+	return ret;
+}
+
diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
new file mode 100644
index 0000000000..179624e972
--- /dev/null
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#ifndef _RTE_GRAPH_MODEL_DISPATCH_H_
+#define _RTE_GRAPH_MODEL_DISPATCH_H_
+
+/**
+ * @file rte_graph_model_dispatch.h
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * This API allows setting the core affinity of a node.
+ */
+#include "rte_graph_worker_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Set lcore affinity with the node.
+ *
+ * @param name
+ *   Valid node name. In the case of the cloned node, the name will be
+ *   "parent node name" + "-" + name.
+ * @param lcore_id
+ *   The lcore ID value.
+ *
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
+						unsigned int lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_MODEL_DISPATCH_H_ */
diff --git a/lib/graph/version.map b/lib/graph/version.map
index eea73ec9ca..1f090be74e 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -46,5 +46,7 @@ EXPERIMENTAL {
 	rte_graph_worker_model_set;
 	rte_graph_worker_model_get;
 
+	rte_graph_model_dispatch_lcore_affinity_set;
+
 	local: *;
 };
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread
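The affinity-set path above is a name lookup over the node list under the graph spinlock. A standalone sketch of that logic (simplified structs, no locking; the node names and sizes here are illustrative only):

```c
#include <assert.h>
#include <string.h>

#define FAKE_MAX_LCORE 128 /* stand-in for RTE_MAX_LCORE */
#define FAKE_NAMESIZE  64  /* stand-in for RTE_NODE_NAMESIZE */

struct fake_node { char name[FAKE_NAMESIZE]; unsigned int lcore_id; };

/* A tiny stand-in node list; FAKE_MAX_LCORE marks "no affinity". */
static struct fake_node node_list[] = {
	{ "ethdev_rx", FAKE_MAX_LCORE },
	{ "pkt_cls",   FAKE_MAX_LCORE },
};

/* Mirrors rte_graph_model_dispatch_lcore_affinity_set(): validate the
 * lcore, then look the node up by name and record its affinity. */
static int affinity_set(const char *name, unsigned int lcore_id)
{
	size_t i;

	if (lcore_id >= FAKE_MAX_LCORE)
		return -1; /* -EINVAL in the real code */

	for (i = 0; i < sizeof(node_list) / sizeof(node_list[0]); i++) {
		if (strncmp(node_list[i].name, name, FAKE_NAMESIZE) == 0) {
			node_list[i].lcore_id = lcore_id;
			return 0;
		}
	}
	return -1; /* node not found */
}
```

Note the sentinel convention: a node whose lcore_id equals the maximum lcore value has no affinity and may be processed on any core.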

* [PATCH v3 06/15] graph: introduce graph bind unbind API
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (4 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 05/15] graph: introduce graph node core affinity API Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 07/15] graph: introduce graph clone API for other worker core Zhirun Yan
                       ` (9 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add an lcore_id field to the graph to hold the affinity core id the graph
will run on. Add bind/unbind APIs to set/unset the graph affinity
attribute. lcore_id is set to RTE_MAX_LCORE by default, which means the
attribute is disabled.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 59 +++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |  2 ++
 lib/graph/rte_graph.h     | 22 +++++++++++++++
 lib/graph/version.map     |  2 ++
 4 files changed, 85 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index a839a2803b..b39a99aac6 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -254,6 +254,64 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
 	return graph_mem_fixup_node_ctx(graph);
 }
 
+static __rte_always_inline bool
+graph_src_node_avail(struct graph *graph)
+{
+	struct graph_node *graph_node;
+
+	STAILQ_FOREACH(graph_node, &graph->node_list, next)
+		if ((graph_node->node->flags & RTE_NODE_SOURCE_F) &&
+		    (graph_node->node->lcore_id == RTE_MAX_LCORE ||
+		     graph->lcore_id == graph_node->node->lcore_id))
+			return true;
+
+	return false;
+}
+
+int
+rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	if (!rte_lcore_is_enabled(lcore))
+		SET_ERR_JMP(ENOLINK, fail,
+			    "lcore %d not enabled\n",
+			    lcore);
+
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = lcore;
+	graph->socket = rte_lcore_to_socket_id(lcore);
+
+	/* check the availability of source node */
+	if (!graph_src_node_avail(graph))
+		graph->graph->head = 0;
+
+	return 0;
+
+fail:
+	return -rte_errno;
+}
+
+void
+rte_graph_model_dispatch_core_unbind(rte_graph_t id)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = RTE_MAX_LCORE;
+
+fail:
+	return;
+}
+
 struct rte_graph *
 rte_graph_lookup(const char *name)
 {
@@ -340,6 +398,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->lcore_id = RTE_MAX_LCORE;
 	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
 	if (prm->pcap_filename)
 		rte_strscpy(graph->pcap_filename, prm->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 409eed3284..ad1d058945 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -98,6 +98,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	unsigned int lcore_id;
+	/**< Lcore identifier where the graph prefer to run on. */
 	size_t mem_sz;
 	/**< Memory size of the graph. */
 	int socket;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index c9a77297fc..c523809d1f 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -285,6 +285,28 @@ char *rte_graph_id_to_name(rte_graph_t id);
 __rte_experimental
 int rte_graph_export(const char *name, FILE *f);
 
+/**
+ * Bind graph with specific lcore
+ *
+ * @param id
+ *   Graph id to get the pointer of the graph object.
+ * @param lcore
+ *   The lcore the graph will run on.
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore);
+
+/**
+ * Unbind graph with lcore
+ *
+ * @param id
+ *   Graph id to get the pointer of the graph object.
+ */
+__rte_experimental
+void rte_graph_model_dispatch_core_unbind(rte_graph_t id);
+
 /**
  * Get graph object from its name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 1f090be74e..7de6f08f59 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -18,6 +18,8 @@ EXPERIMENTAL {
 	rte_graph_node_get_by_name;
 	rte_graph_obj_dump;
 	rte_graph_walk;
+	rte_graph_model_dispatch_core_bind;
+	rte_graph_model_dispatch_core_unbind;
 
 	rte_graph_cluster_stats_create;
 	rte_graph_cluster_stats_destroy;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread
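The bind path above clears the graph's source-node walk when no source node is allowed to run on the bound lcore. That availability check can be sketched standalone (simplified structs; the flag value is illustrative, not RTE_NODE_SOURCE_F's real value):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_MAX_LCORE 128 /* stand-in for RTE_MAX_LCORE */
#define NODE_SOURCE_F  0x1 /* stand-in for RTE_NODE_SOURCE_F */

struct fake_node { unsigned long flags; unsigned int lcore_id; };

/* Mirrors graph_src_node_avail(): a bound graph keeps its source-node walk
 * only if at least one source node may run on the graph's lcore, i.e. the
 * node has no affinity (FAKE_MAX_LCORE) or its affinity matches. */
static bool src_node_avail(const struct fake_node *nodes, size_t n,
			   unsigned int graph_lcore)
{
	size_t i;

	for (i = 0; i < n; i++)
		if ((nodes[i].flags & NODE_SOURCE_F) &&
		    (nodes[i].lcore_id == FAKE_MAX_LCORE ||
		     nodes[i].lcore_id == graph_lcore))
			return true;

	return false;
}
```

When this returns false, the bind code sets graph->graph->head = 0 so the walk starts with no source nodes; such a graph only processes streams dispatched to it from other cores.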

* [PATCH v3 07/15] graph: introduce graph clone API for other worker core
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (5 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 06/15] graph: introduce graph bind unbind API Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 08/15] graph: add struct for stream moving between cores Zhirun Yan
                       ` (8 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch adds a graph API to clone the graph object for a specified
worker core. The new graph will also clone all nodes.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 110 ++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |   2 +
 lib/graph/rte_graph.h     |  20 +++++++
 lib/graph/version.map     |   1 +
 4 files changed, 133 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index b39a99aac6..90eaad0378 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -398,6 +398,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->parent_id = RTE_GRAPH_ID_INVALID;
 	graph->lcore_id = RTE_MAX_LCORE;
 	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
 	if (prm->pcap_filename)
@@ -462,6 +463,115 @@ rte_graph_destroy(rte_graph_t id)
 	return rc;
 }
 
+static int
+clone_name(struct graph *graph, struct graph *parent_graph, const char *name)
+{
+	ssize_t sz, rc;
+
+#define SZ RTE_GRAPH_NAMESIZE
+	rc = rte_strscpy(graph->name, parent_graph->name, SZ);
+	if (rc < 0)
+		goto fail;
+	sz = rc;
+	rc = rte_strscpy(graph->name + sz, "-", RTE_MAX((int16_t)(SZ - sz), 0));
+	if (rc < 0)
+		goto fail;
+	sz += rc;
+	sz = rte_strscpy(graph->name + sz, name, RTE_MAX((int16_t)(SZ - sz), 0));
+	if (sz < 0)
+		goto fail;
+
+	return 0;
+fail:
+	rte_errno = E2BIG;
+	return -rte_errno;
+}
+
+static rte_graph_t
+graph_clone(struct graph *parent_graph, const char *name)
+{
+	struct graph_node *graph_node;
+	struct graph *graph;
+
+	graph_spinlock_lock();
+
+	/* Don't allow to clone a node from a cloned graph */
+	if (parent_graph->parent_id != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, fail, "A cloned graph is not allowed to be cloned");
+
+	/* Create graph object */
+	graph = calloc(1, sizeof(*graph));
+	if (graph == NULL)
+		SET_ERR_JMP(ENOMEM, fail, "Failed to calloc cloned graph object");
+
+	/* Naming ceremony of the new graph. name is node->name + "-" + name */
+	if (clone_name(graph, parent_graph, name))
+		goto free;
+
+	/* Check for existence of duplicate graph */
+	if (rte_graph_from_name(graph->name) != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, free, "Found duplicate graph %s",
+			    graph->name);
+
+	/* Clone nodes from parent graph firstly */
+	STAILQ_INIT(&graph->node_list);
+	STAILQ_FOREACH(graph_node, &parent_graph->node_list, next) {
+		if (graph_node_add(graph, graph_node->node))
+			goto graph_cleanup;
+	}
+
+	/* Just update adjacency list of all nodes in the graph */
+	if (graph_adjacency_list_update(graph))
+		goto graph_cleanup;
+
+	/* Initialize the graph object */
+	graph->src_node_count = parent_graph->src_node_count;
+	graph->node_count = parent_graph->node_count;
+	graph->parent_id = parent_graph->id;
+	graph->lcore_id = parent_graph->lcore_id;
+	graph->socket = parent_graph->socket;
+	graph->id = graph_id;
+
+	/* Allocate the Graph fast path memory and populate the data */
+	if (graph_fp_mem_create(graph))
+		goto graph_cleanup;
+
+	/* Call init() of the all the nodes in the graph */
+	if (graph_node_init(graph))
+		goto graph_mem_destroy;
+
+	/* All good, Lets add the graph to the list */
+	graph_id++;
+	STAILQ_INSERT_TAIL(&graph_list, graph, next);
+
+	graph_spinlock_unlock();
+	return graph->id;
+
+graph_mem_destroy:
+	graph_fp_mem_destroy(graph);
+graph_cleanup:
+	graph_cleanup(graph);
+free:
+	free(graph);
+fail:
+	graph_spinlock_unlock();
+	return RTE_GRAPH_ID_INVALID;
+}
+
+rte_graph_t
+rte_graph_clone(rte_graph_t id, const char *name)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			return graph_clone(graph, name);
+
+fail:
+	return RTE_GRAPH_ID_INVALID;
+}
+
 rte_graph_t
 rte_graph_from_name(const char *name)
 {
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index ad1d058945..d28a5af93e 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -98,6 +98,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	rte_graph_t parent_id;
+	/**< Parent graph identifier. */
 	unsigned int lcore_id;
 	/**< Lcore identifier where the graph prefer to run on. */
 	size_t mem_sz;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index c523809d1f..2f86c17de7 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -247,6 +247,26 @@ rte_graph_t rte_graph_create(const char *name, struct rte_graph_param *prm);
 __rte_experimental
 int rte_graph_destroy(rte_graph_t id);
 
+/**
+ * Clone Graph.
+ *
+ * Clone a graph from a static graph (a graph created with rte_graph_create()).
+ * All cloned graphs attached to the parent graph MUST be destroyed together,
+ * due to a fast-schedule design limitation (stop ALL graph walks first).
+ *
+ * @param id
+ *   Static graph id to clone from.
+ * @param name
+ *   Name of the new graph. The library prepends the parent graph name to the
+ *   user-specified name. The final graph name will be
+ *   "parent graph name" + "-" + name.
+ *
+ * @return
+ *   Valid graph id on success, RTE_GRAPH_ID_INVALID otherwise.
+ */
+__rte_experimental
+rte_graph_t rte_graph_clone(rte_graph_t id, const char *name);
+
 /**
  * Get graph id from graph name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 7de6f08f59..aaa86f66ed 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -7,6 +7,7 @@ EXPERIMENTAL {
 
 	rte_graph_create;
 	rte_graph_destroy;
+	rte_graph_clone;
 	rte_graph_dump;
 	rte_graph_export;
 	rte_graph_from_name;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread
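The clone naming scheme ("parent" + "-" + name) with its overflow check can be sketched standalone (a deliberately small buffer size is used here to exercise the failure path; the real size is RTE_GRAPH_NAMESIZE and the real code builds the string with rte_strscpy):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FAKE_NAMESIZE 16 /* small stand-in for RTE_GRAPH_NAMESIZE */

/* Mirrors clone_name(): the cloned graph is named "<parent>-<name>", and
 * the call fails (E2BIG in the real code) when the result does not fit. */
static int clone_name(char *dst, const char *parent, const char *name)
{
	int n = snprintf(dst, FAKE_NAMESIZE, "%s-%s", parent, name);

	return (n < 0 || n >= FAKE_NAMESIZE) ? -1 : 0;
}
```

This naming rule is also why the duplicate-graph check runs right after the name is built: two clones of the same parent must pass distinct suffixes.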

* [PATCH v3 08/15] graph: add struct for stream moving between cores
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (6 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 07/15] graph: introduce graph clone API for other worker core Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 09/15] graph: introduce stream moving cross cores Zhirun Yan
                       ` (7 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add struct graph_sched_wq_node to hold the graph scheduling workqueue
node stream, used by the mcore dispatch model.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c                   |  1 +
 lib/graph/graph_populate.c          |  1 +
 lib/graph/graph_private.h           | 12 ++++++++++++
 lib/graph/rte_graph_worker_common.h | 21 +++++++++++++++++++++
 4 files changed, 35 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 90eaad0378..dd3d69dbf7 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -284,6 +284,7 @@ rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore)
 			break;
 
 	graph->lcore_id = lcore;
+	graph->graph->lcore_id = graph->lcore_id;
 	graph->socket = rte_lcore_to_socket_id(lcore);
 
 	/* check the availability of source node */
diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
index 2c0844ce92..7dcf1420c1 100644
--- a/lib/graph/graph_populate.c
+++ b/lib/graph/graph_populate.c
@@ -89,6 +89,7 @@ graph_nodes_populate(struct graph *_graph)
 		}
 		node->id = graph_node->node->id;
 		node->parent_id = pid;
+		node->lcore_id = graph_node->node->lcore_id;
 		nb_edges = graph_node->node->nb_edges;
 		node->nb_edges = nb_edges;
 		off += sizeof(struct rte_node);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index d28a5af93e..b66b18ebbc 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -60,6 +60,18 @@ struct node {
 	char next_nodes[][RTE_NODE_NAMESIZE]; /**< Names of next nodes. */
 };
 
+/**
+ * @internal
+ *
+ * Structure that holds the graph scheduling workqueue node stream.
+ * Used for mcore dispatch model.
+ */
+struct graph_sched_wq_node {
+	rte_graph_off_t node_off;
+	uint16_t nb_objs;
+	void *objs[RTE_GRAPH_BURST_SIZE];
+} __rte_cache_aligned;
+
 /**
  * @internal
  *
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 1526da6e2c..dc0a0b5554 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -30,6 +30,13 @@
 extern "C" {
 #endif
 
+/**
+ * @internal
+ *
+ * Singly-linked list head for graph schedule run-queue.
+ */
+SLIST_HEAD(rte_graph_rq_head, rte_graph);
+
 /**
  * @internal
  *
@@ -41,6 +48,15 @@ struct rte_graph {
 	uint32_t cir_mask;	     /**< Circular buffer wrap around mask. */
 	rte_node_t nb_nodes;	     /**< Number of nodes in the graph. */
 	rte_graph_off_t *cir_start;  /**< Pointer to circular buffer. */
+	/* Graph schedule */
+	struct rte_graph_rq_head *rq __rte_cache_aligned; /* The run-queue */
+	struct rte_graph_rq_head rq_head; /* The head for run-queue list */
+
+	SLIST_ENTRY(rte_graph) rq_next;   /* The next for run-queue list */
+	unsigned int lcore_id;  /**< The graph running Lcore. */
+	struct rte_ring *wq;    /**< The work-queue for pending streams. */
+	struct rte_mempool *mp; /**< The mempool for scheduling streams. */
+	/* Graph schedule area */
 	rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */
 	rte_graph_t id;	/**< Graph identifier. */
 	int socket;	/**< Socket ID where memory is allocated. */
@@ -74,6 +90,11 @@ struct rte_node {
 	/** Original process function when pcap is enabled. */
 	rte_node_process_t original_process;
 
+	RTE_STD_C11
+	union {
+		/* Fast schedule area for mcore dispatch model */
+		unsigned int lcore_id;  /**< Node running lcore. */
+	};
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread
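The workqueue node added here carries at most RTE_GRAPH_BURST_SIZE objects, so a larger pending stream is split into several entries when it is dispatched (as __graph_sched_node_enqueue() does in a later patch of this series). A standalone sketch of that chunking (the burst size here is illustrative, not the real value):

```c
#include <assert.h>

#define BURST 8 /* illustrative; the real bound is RTE_GRAPH_BURST_SIZE */

/* Count how many workqueue entries a stream of nb_objs objects needs when
 * each entry holds at most BURST objects; this is a ceiling division,
 * written as the same loop shape the enqueue path uses. */
static unsigned int wq_entries(unsigned int nb_objs)
{
	unsigned int entries = 0;

	while (nb_objs > 0) {
		unsigned int sz = nb_objs < BURST ? nb_objs : BURST;

		nb_objs -= sz;
		entries++;
	}
	return entries;
}
```

Sizing the per-graph mempool and ring as a multiple of the node count (as the later workqueue-creation patch does) budgets for this one-stream-to-many-entries expansion.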

* [PATCH v3 09/15] graph: introduce stream moving cross cores
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (7 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 08/15] graph: add struct for stream moving between cores Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 10/15] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
                       ` (6 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the key functions that allow a worker thread to
enqueue and move streams of objects to next nodes running on different
cores.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_private.h            |  27 +++++
 lib/graph/meson.build                |   2 +-
 lib/graph/rte_graph_model_dispatch.c | 145 +++++++++++++++++++++++++++
 lib/graph/rte_graph_model_dispatch.h |  35 +++++++
 4 files changed, 208 insertions(+), 1 deletion(-)

diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index b66b18ebbc..e1a2a4bfd8 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -366,4 +366,31 @@ void graph_dump(FILE *f, struct graph *g);
  */
 void node_dump(FILE *f, struct node *n);
 
+/**
+ * @internal
+ *
+ * Create the graph schedule work queue. All cloned graphs attached to the
+ * parent graph MUST be destroyed together due to a fast-schedule design limitation.
+ *
+ * @param _graph
+ *   The graph object
+ * @param _parent_graph
+ *   The parent graph object which holds the run-queue head.
+ *
+ * @return
+ *   - 0: Success.
+ *   - <0: Graph schedule work queue related error.
+ */
+int graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph);
+
+/**
+ * @internal
+ *
+ * Destroy the graph schedule work queue.
+ *
+ * @param _graph
+ *   The graph object
+ */
+void graph_sched_wq_destroy(struct graph *_graph);
+
 #endif /* _RTE_GRAPH_PRIVATE_H_ */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index c729d984b6..e21affa280 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -20,4 +20,4 @@ sources = files(
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
-deps += ['eal', 'pcapng']
+deps += ['eal', 'pcapng', 'mempool', 'ring']
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
index 4a2f99496d..b46dd156ac 100644
--- a/lib/graph/rte_graph_model_dispatch.c
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -5,6 +5,151 @@
 #include "graph_private.h"
 #include "rte_graph_model_dispatch.h"
 
+int
+graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph)
+{
+	struct rte_graph *parent_graph = _parent_graph->graph;
+	struct rte_graph *graph = _graph->graph;
+	unsigned int wq_size;
+
+	wq_size = GRAPH_SCHED_WQ_SIZE(graph->nb_nodes);
+	wq_size = rte_align32pow2(wq_size + 1);
+
+	graph->wq = rte_ring_create(graph->name, wq_size, graph->socket,
+				    RING_F_SC_DEQ);
+	if (graph->wq == NULL)
+		SET_ERR_JMP(EIO, fail, "Failed to allocate graph WQ");
+
+	graph->mp = rte_mempool_create(graph->name, wq_size,
+				       sizeof(struct graph_sched_wq_node),
+				       0, 0, NULL, NULL, NULL, NULL,
+				       graph->socket, MEMPOOL_F_SP_PUT);
+	if (graph->mp == NULL)
+		SET_ERR_JMP(EIO, fail_mp,
+			    "Failed to allocate graph WQ schedule entry");
+
+	graph->lcore_id = _graph->lcore_id;
+
+	if (parent_graph->rq == NULL) {
+		parent_graph->rq = &parent_graph->rq_head;
+		SLIST_INIT(parent_graph->rq);
+	}
+
+	graph->rq = parent_graph->rq;
+	SLIST_INSERT_HEAD(graph->rq, graph, rq_next);
+
+	return 0;
+
+fail_mp:
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+fail:
+	return -rte_errno;
+}
+
+void
+graph_sched_wq_destroy(struct graph *_graph)
+{
+	struct rte_graph *graph = _graph->graph;
+
+	if (graph == NULL)
+		return;
+
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+
+	rte_mempool_free(graph->mp);
+	graph->mp = NULL;
+}
+
+static __rte_always_inline bool
+__graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	uint16_t off = 0;
+	uint16_t size;
+
+submit_again:
+	if (rte_mempool_get(graph->mp, (void **)&wq_node) < 0)
+		goto fallback;
+
+	size = RTE_MIN(node->idx, RTE_DIM(wq_node->objs));
+	wq_node->node_off = node->off;
+	wq_node->nb_objs = size;
+	rte_memcpy(wq_node->objs, &node->objs[off], size * sizeof(void *));
+
+	while (rte_ring_mp_enqueue_bulk_elem(graph->wq, (void *)&wq_node,
+					  sizeof(wq_node), 1, NULL) == 0)
+		rte_pause();
+
+	off += size;
+	node->idx -= size;
+	if (node->idx > 0)
+		goto submit_again;
+
+	return true;
+
+fallback:
+	if (off != 0)
+		memmove(&node->objs[0], &node->objs[off],
+			node->idx * sizeof(void *));
+
+	return false;
+}
+
+bool __rte_noinline
+__rte_graph_sched_node_enqueue(struct rte_node *node,
+			       struct rte_graph_rq_head *rq)
+{
+	const unsigned int lcore_id = node->lcore_id;
+	struct rte_graph *graph;
+
+	SLIST_FOREACH(graph, rq, rq_next)
+		if (graph->lcore_id == lcore_id)
+			break;
+
+	return graph != NULL ? __graph_sched_node_enqueue(node, graph) : false;
+}
+
+void __rte_noinline
+__rte_graph_sched_wq_process(struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	struct rte_mempool *mp = graph->mp;
+	struct rte_ring *wq = graph->wq;
+	uint16_t idx, free_space;
+	struct rte_node *node;
+	unsigned int i, n;
+	struct graph_sched_wq_node *wq_nodes[32];
+
+	n = rte_ring_sc_dequeue_burst_elem(wq, wq_nodes, sizeof(wq_nodes[0]),
+					   RTE_DIM(wq_nodes), NULL);
+	if (n == 0)
+		return;
+
+	for (i = 0; i < n; i++) {
+		wq_node = wq_nodes[i];
+		node = RTE_PTR_ADD(graph, wq_node->node_off);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		idx = node->idx;
+		free_space = node->size - idx;
+
+		if (unlikely(free_space < wq_node->nb_objs))
+			__rte_node_stream_alloc_size(graph, node, node->size + wq_node->nb_objs);
+
+		memmove(&node->objs[idx], wq_node->objs, wq_node->nb_objs * sizeof(void *));
+		memset(wq_node->objs, 0, wq_node->nb_objs * sizeof(void *));
+		node->idx = idx + wq_node->nb_objs;
+
+		__rte_node_process(graph, node);
+
+		wq_node->nb_objs = 0;
+		node->idx = 0;
+	}
+
+	rte_mempool_put_bulk(mp, (void **)wq_nodes, n);
+}
+
 int
 rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned int lcore_id)
 {
diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
index 179624e972..7cbdf2fdcf 100644
--- a/lib/graph/rte_graph_model_dispatch.h
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -14,12 +14,47 @@
  *
  * This API allows to set core affinity with the node.
  */
+#include <rte_errno.h>
+#include <rte_mempool.h>
+#include <rte_memzone.h>
+#include <rte_ring.h>
+
 #include "rte_graph_worker_common.h"
 
 #ifdef __cplusplus
 extern "C" {
 #endif
 
+#define GRAPH_SCHED_WQ_SIZE_MULTIPLIER  8
+#define GRAPH_SCHED_WQ_SIZE(nb_nodes)   \
+	((typeof(nb_nodes))((nb_nodes) * GRAPH_SCHED_WQ_SIZE_MULTIPLIER))
+
+/**
+ * @internal
+ *
+ * Schedule the node to the right graph's work queue.
+ *
+ * @param node
+ *   Pointer to the scheduled node object.
+ * @param rq
+ *   Pointer to the scheduled run-queue for all graphs.
+ *
+ * @return
+ *   True on success, false otherwise.
+ */
+bool __rte_graph_sched_node_enqueue(struct rte_node *node,
+				    struct rte_graph_rq_head *rq);
+
+/**
+ * @internal
+ *
+ * Process all nodes (streams) in the graph's work queue.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ */
+void __rte_noinline __rte_graph_sched_wq_process(struct rte_graph *graph);
+
 /**
  * Set lcore affinity with the node.
  *
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v3 10/15] graph: enable create and destroy graph scheduling workqueue
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (8 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 09/15] graph: introduce stream moving cross cores Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 11/15] graph: introduce graph walk by cross-core dispatch Zhirun Yan
                       ` (5 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch integrates the creation and destruction of the scheduling
workqueue into the common graph operations.
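
The create/destroy pairing wired in below can be sketched, outside of DPDK, as a minimal resource lifecycle with reverse-order cleanup on failure, mirroring the `fail_mp`/`fail` unwinding in `graph_sched_wq_create()`. The `mini_wq_*` names and the `fail_pool` flag are hypothetical, for illustration only:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Illustrative sketch (not DPDK code): each resource acquired during
 * create is released in reverse order on failure, and destroy frees
 * both and resets the pointers so a double destroy is harmless. */
struct mini_wq {
	void *ring; /* stands in for the rte_ring work queue */
	void *pool; /* stands in for the rte_mempool of WQ entries */
};

static int
mini_wq_create(struct mini_wq *wq, int fail_pool)
{
	wq->ring = malloc(64);
	if (wq->ring == NULL)
		goto fail;
	/* fail_pool simulates a mempool allocation failure */
	wq->pool = fail_pool ? NULL : malloc(64);
	if (wq->pool == NULL)
		goto fail_ring;
	return 0;
fail_ring:
	free(wq->ring);
	wq->ring = NULL;
fail:
	return -ENOMEM;
}

static void
mini_wq_destroy(struct mini_wq *wq)
{
	free(wq->ring);
	wq->ring = NULL;
	free(wq->pool);
	wq->pool = NULL;
}
```
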

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index dd3d69dbf7..1f1ee9b622 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -443,6 +443,10 @@ rte_graph_destroy(rte_graph_t id)
 	while (graph != NULL) {
 		tmp = STAILQ_NEXT(graph, next);
 		if (graph->id == id) {
+			/* Destroy the schedule work queue if present */
+			if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+				graph_sched_wq_destroy(graph);
+
 			/* Call fini() of the all the nodes in the graph */
 			graph_node_fini(graph);
 			/* Destroy graph fast path memory */
@@ -537,6 +541,11 @@ graph_clone(struct graph *parent_graph, const char *name)
 	if (graph_fp_mem_create(graph))
 		goto graph_cleanup;
 
+	/* Create the graph schedule work queue */
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH &&
+	    graph_sched_wq_create(graph, parent_graph))
+		goto graph_mem_destroy;
+
 	/* Call init() of the all the nodes in the graph */
 	if (graph_node_init(graph))
 		goto graph_mem_destroy;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v3 11/15] graph: introduce graph walk by cross-core dispatch
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (9 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 10/15] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 12/15] graph: enable graph multicore dispatch scheduler model Zhirun Yan
                       ` (4 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the task scheduler mechanism to enable dispatching
tasks to other worker cores. Currently, there is only a local work
queue for each graph to walk. We introduce a scheduler work queue on
each worker core for dispatched tasks. The walk will process the
scheduler work queue first, then handle the local work queue.
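
The walk ordering described above (drain the scheduler work queue first, then the local queue) can be sketched as a minimal model. `mini_graph` and `mini_walk_mcore_dispatch` are illustrative names, not DPDK APIs:

```c
#include <assert.h>

/* Minimal model of the dispatch walk ordering: entries dispatched
 * here by other cores are processed before the local pending nodes. */
#define WQ_CAP 8

struct mini_graph {
	int wq[WQ_CAP];        /* node ids dispatched here by other cores */
	int wq_head, wq_tail;
	int local[WQ_CAP];     /* locally pending node ids */
	int nb_local;
	int order[2 * WQ_CAP]; /* records the processing order */
	int nb_done;
};

static void
mini_process_node(struct mini_graph *g, int node_id)
{
	g->order[g->nb_done++] = node_id;
}

static void
mini_walk_mcore_dispatch(struct mini_graph *g)
{
	/* 1. Handle streams dispatched from other cores first */
	while (g->wq_head != g->wq_tail)
		mini_process_node(g, g->wq[g->wq_head++ % WQ_CAP]);
	/* 2. Then walk the local work queue */
	for (int i = 0; i < g->nb_local; i++)
		mini_process_node(g, g->local[i]);
	g->nb_local = 0;
}
```
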

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_dispatch.h | 42 ++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
index 7cbdf2fdcf..764c4ecfd0 100644
--- a/lib/graph/rte_graph_model_dispatch.h
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -71,6 +71,48 @@ __rte_experimental
 int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
 						unsigned int lcore_id);
 
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk_mcore_dispatch(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+
+	if (graph->wq != NULL)
+		__rte_graph_sched_wq_process(graph);
+
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+
+		/* Skip the src nodes which are not bound to the current worker */
+		if ((int32_t)head < 0 && node->lcore_id != graph->lcore_id)
+			continue;
+
+		/* Schedule the node until all task/objs are done */
+		if (node->lcore_id != RTE_MAX_LCORE &&
+		    graph->lcore_id != node->lcore_id && graph->rq != NULL &&
+		    __rte_graph_sched_node_enqueue(node, graph->rq))
+			continue;
+
+		__rte_node_process(graph, node);
+
+		head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+
+	graph->tail = 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v3 12/15] graph: enable graph multicore dispatch scheduler model
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (10 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 11/15] graph: introduce graph walk by cross-core dispatch Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 13/15] graph: add stats for cross-core dispatching Zhirun Yan
                       ` (3 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch enables choosing the new scheduler model.
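
The selection logic added below can be sketched as a simple switch: the default and RTC models share the RTC walker, while the dispatch model picks the multi-core walker. The enum values and `mini_pick_walker` are illustrative stand-ins, not the DPDK definitions:

```c
#include <assert.h>
#include <string.h>

/* Illustrative model selection, mirroring the rte_graph_walk() branch. */
enum mini_model {
	MINI_MODEL_DEFAULT,
	MINI_MODEL_RTC,
	MINI_MODEL_MCORE_DISPATCH,
};

static const char *
mini_pick_walker(enum mini_model model)
{
	/* Default and RTC both fall through to the RTC walker */
	if (model == MINI_MODEL_DEFAULT || model == MINI_MODEL_RTC)
		return "rtc";
	if (model == MINI_MODEL_MCORE_DISPATCH)
		return "mcore_dispatch";
	return "unknown";
}
```
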

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_worker.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index 7ea18ba80a..d608c7513e 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -10,6 +10,7 @@ extern "C" {
 #endif
 
 #include "rte_graph_model_rtc.h"
+#include "rte_graph_model_dispatch.h"
 
 /**
  * Perform graph walk on the circular buffer and invoke the process function
@@ -24,7 +25,13 @@ __rte_experimental
 static inline void
 rte_graph_walk(struct rte_graph *graph)
 {
-	rte_graph_walk_rtc(graph);
+	int model = rte_graph_worker_model_get();
+
+	if (model == RTE_GRAPH_MODEL_DEFAULT ||
+	    model == RTE_GRAPH_MODEL_RTC)
+		rte_graph_walk_rtc(graph);
+	else if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		rte_graph_walk_mcore_dispatch(graph);
 }
 
 #ifdef __cplusplus
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v3 13/15] graph: add stats for cross-core dispatching
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (11 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 12/15] graph: enable graph multicore dispatch scheduler model Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model Zhirun Yan
                       ` (2 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add stats for the cross-core dispatching scheduler when stats collection
is enabled.
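
The aggregation added below sums the new per-node counters across all per-core node clones into one cluster stat. A minimal sketch of that summation follows; the `mini_*` types are hypothetical, not DPDK structures:

```c
#include <assert.h>

/* Per-clone counters, in the spirit of the new rte_node fields. */
struct mini_node_stat {
	unsigned long total_sched_objs; /* objs handed to other cores */
	unsigned long total_sched_fail; /* objs that could not be dispatched */
};

struct mini_cluster_stat {
	unsigned long sched_objs;
	unsigned long sched_fail;
};

/* Sum the dispatch counters of every clone into one cluster stat. */
static void
mini_aggregate(const struct mini_node_stat *nodes, int nb_nodes,
	       struct mini_cluster_stat *out)
{
	out->sched_objs = 0;
	out->sched_fail = 0;
	for (int i = 0; i < nb_nodes; i++) {
		out->sched_objs += nodes[i].total_sched_objs;
		out->sched_fail += nodes[i].total_sched_fail;
	}
}
```
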

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_debug.c              |  6 +++
 lib/graph/graph_stats.c              | 74 +++++++++++++++++++++++++---
 lib/graph/rte_graph.h                |  2 +
 lib/graph/rte_graph_model_dispatch.c |  3 ++
 lib/graph/rte_graph_worker_common.h  |  2 +
 5 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/lib/graph/graph_debug.c b/lib/graph/graph_debug.c
index b84412f5dd..7dcf07b080 100644
--- a/lib/graph/graph_debug.c
+++ b/lib/graph/graph_debug.c
@@ -74,6 +74,12 @@ rte_graph_obj_dump(FILE *f, struct rte_graph *g, bool all)
 		fprintf(f, "       size=%d\n", n->size);
 		fprintf(f, "       idx=%d\n", n->idx);
 		fprintf(f, "       total_objs=%" PRId64 "\n", n->total_objs);
+		if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+			fprintf(f, "       total_sched_objs=%" PRId64 "\n",
+				n->total_sched_objs);
+			fprintf(f, "       total_sched_fail=%" PRId64 "\n",
+				n->total_sched_fail);
+		}
 		fprintf(f, "       total_calls=%" PRId64 "\n", n->total_calls);
 		for (i = 0; i < n->nb_edges; i++)
 			fprintf(f, "          edge[%d] <%s>\n", i,
diff --git a/lib/graph/graph_stats.c b/lib/graph/graph_stats.c
index c0140ba922..aa22cc403c 100644
--- a/lib/graph/graph_stats.c
+++ b/lib/graph/graph_stats.c
@@ -40,13 +40,19 @@ struct rte_graph_cluster_stats {
 	struct cluster_node clusters[];
 } __rte_cache_aligned;
 
+#define boarder_model_dispatch()                                                              \
+	fprintf(f, "+-------------------------------+---------------+--------" \
+		   "-------+---------------+---------------+---------------+" \
+		   "---------------+---------------+-" \
+		   "----------+\n")
+
 #define boarder()                                                              \
 	fprintf(f, "+-------------------------------+---------------+--------" \
 		   "-------+---------------+---------------+---------------+-" \
 		   "----------+\n")
 
 static inline void
-print_banner(FILE *f)
+print_banner_default(FILE *f)
 {
 	boarder();
 	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s\n", "|Node", "|calls",
@@ -55,6 +61,27 @@ print_banner(FILE *f)
 	boarder();
 }
 
+static inline void
+print_banner_dispatch(FILE *f)
+{
+	boarder_model_dispatch();
+	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s%-16s%-16s\n",
+		"|Node", "|calls",
+		"|objs", "|sched objs", "|sched fail",
+		"|realloc_count", "|objs/call", "|objs/sec(10E6)",
+		"|cycles/call|");
+	boarder_model_dispatch();
+}
+
+static inline void
+print_banner(FILE *f)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		print_banner_dispatch(f);
+	else
+		print_banner_default(f);
+}
+
 static inline void
 print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 {
@@ -76,11 +103,21 @@ print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 	objs_per_sec = ts_per_hz ? (objs - prev_objs) / ts_per_hz : 0;
 	objs_per_sec /= 1000000;
 
-	fprintf(f,
-		"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
-		"|%-15.3f|%-15.6f|%-11.4f|\n",
-		stat->name, calls, objs, stat->realloc_count, objs_per_call,
-		objs_per_sec, cycles_per_call);
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->sched_objs,
+			stat->sched_fail, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	} else {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	}
 }
 
 static int
@@ -88,13 +125,20 @@ graph_cluster_stats_cb(bool is_first, bool is_last, void *cookie,
 		       const struct rte_graph_cluster_node_stats *stat)
 {
 	FILE *f = cookie;
+	int model;
+
+	model = rte_graph_worker_model_get();
 
 	if (unlikely(is_first))
 		print_banner(f);
 	if (stat->objs)
 		print_node(f, stat);
-	if (unlikely(is_last))
-		boarder();
+	if (unlikely(is_last)) {
+		if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+			boarder_model_dispatch();
+		else
+			boarder();
+	}
 
 	return 0;
 };
@@ -332,13 +376,21 @@ static inline void
 cluster_node_arregate_stats(struct cluster_node *cluster)
 {
 	uint64_t calls = 0, cycles = 0, objs = 0, realloc_count = 0;
+	uint64_t sched_objs = 0, sched_fail = 0;
 	struct rte_graph_cluster_node_stats *stat = &cluster->stat;
 	struct rte_node *node;
 	rte_node_t count;
+	int model;
 
+	model = rte_graph_worker_model_get();
 	for (count = 0; count < cluster->nb_nodes; count++) {
 		node = cluster->nodes[count];
 
+		if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+			sched_objs += node->total_sched_objs;
+			sched_fail += node->total_sched_fail;
+		}
+
 		calls += node->total_calls;
 		objs += node->total_objs;
 		cycles += node->total_cycles;
@@ -348,6 +400,12 @@ cluster_node_arregate_stats(struct cluster_node *cluster)
 	stat->calls = calls;
 	stat->objs = objs;
 	stat->cycles = cycles;
+
+	if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+		stat->sched_objs = sched_objs;
+		stat->sched_fail = sched_fail;
+	}
+
 	stat->ts = rte_get_timer_cycles();
 	stat->realloc_count = realloc_count;
 }
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index 2f86c17de7..7d77a790ac 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -208,6 +208,8 @@ struct rte_graph_cluster_node_stats {
 	uint64_t prev_calls;	/**< Previous number of calls. */
 	uint64_t prev_objs;	/**< Previous number of processed objs. */
 	uint64_t prev_cycles;	/**< Previous number of cycles. */
+	uint64_t sched_objs;	/**< Number of objs scheduled to other cores. */
+	uint64_t sched_fail;	/**< Number of objs that failed to be scheduled. */
 
 	uint64_t realloc_count; /**< Realloc count. */
 
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
index b46dd156ac..4cf00160ea 100644
--- a/lib/graph/rte_graph_model_dispatch.c
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -83,6 +83,7 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		rte_pause();
 
 	off += size;
+	node->total_sched_objs += size;
 	node->idx -= size;
 	if (node->idx > 0)
 		goto submit_again;
@@ -94,6 +95,8 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		memmove(&node->objs[0], &node->objs[off],
 			node->idx * sizeof(void *));
 
+	node->total_sched_fail += node->idx;
+
 	return false;
 }
 
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index dc0a0b5554..d94983589c 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -95,6 +95,8 @@ struct rte_node {
 		/* Fast schedule area for mcore dispatch model */
 		unsigned int lcore_id;  /**< Node running lcore. */
 		};
+	uint64_t total_sched_objs; /**< Number of objects scheduled. */
+	uint64_t total_sched_fail; /**< Number of objects that failed scheduling. */
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v3 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (12 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 13/15] graph: add stats for cross-core dispatching Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-29  6:43     ` [PATCH v3 15/15] doc: update multicore dispatch model in graph guides Zhirun Yan
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add a new parameter "model" to choose the dispatch or rtc worker model.
In the dispatch model, nodes are affinitized to worker cores successively.

Note:
only one RX node is supported for the remote model in the current
implementation.

./dpdk-l3fwd-graph  -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="dispatch"
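
The successive affinity assignment can be sketched as a round-robin pick of the next enabled worker core, similar in spirit to the `rte_get_next_lcore()` loop the example uses when binding cloned graphs. `mini_next_worker` is a hypothetical helper, not the DPDK API:

```c
#include <assert.h>

/* Return the next enabled core after 'cur', wrapping around;
 * -1 if no core is enabled. Illustrative only. */
static int
mini_next_worker(int cur, const int *enabled, int nb_cores)
{
	for (int step = 1; step <= nb_cores; step++) {
		int cand = (cur + step) % nb_cores;
		if (enabled[cand])
			return cand;
	}
	return -1;
}
```
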

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 examples/l3fwd-graph/main.c | 237 +++++++++++++++++++++++++++++-------
 1 file changed, 195 insertions(+), 42 deletions(-)

diff --git a/examples/l3fwd-graph/main.c b/examples/l3fwd-graph/main.c
index 5feeab4f0f..cfa78003f4 100644
--- a/examples/l3fwd-graph/main.c
+++ b/examples/l3fwd-graph/main.c
@@ -55,6 +55,9 @@
 
 #define NB_SOCKETS 8
 
+/* Graph module */
+#define WORKER_MODEL_RTC "rtc"
+#define WORKER_MODEL_MCORE_DISPATCH "dispatch"
 /* Static global variables used within this file. */
 static uint16_t nb_rxd = RX_DESC_DEFAULT;
 static uint16_t nb_txd = TX_DESC_DEFAULT;
@@ -88,6 +91,10 @@ struct lcore_rx_queue {
 	char node_name[RTE_NODE_NAMESIZE];
 };
 
+struct model_conf {
+	enum rte_graph_worker_model model;
+};
+
 /* Lcore conf */
 struct lcore_conf {
 	uint16_t n_rx_queue;
@@ -153,6 +160,19 @@ static struct ipv4_l3fwd_lpm_route ipv4_l3fwd_lpm_route_array[] = {
 	{RTE_IPV4(198, 18, 6, 0), 24, 6}, {RTE_IPV4(198, 18, 7, 0), 24, 7},
 };
 
+static int
+check_worker_model_params(void)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH &&
+	    nb_lcore_params > 1) {
+		printf("Exceeded max number of lcore params for remote model: %hu\n",
+		       nb_lcore_params);
+		return -1;
+	}
+
+	return 0;
+}
+
 static int
 check_lcore_params(void)
 {
@@ -276,6 +296,7 @@ print_usage(const char *prgname)
 		"  --eth-dest=X,MM:MM:MM:MM:MM:MM: Ethernet destination for "
 		"port X\n"
 		"  --max-pkt-len PKTLEN: maximum packet length in decimal (64-9600)\n"
+		"  --model NAME: walking model name, dispatch or rtc (default)\n"
 		"  --no-numa: Disable numa awareness\n"
 		"  --per-port-pool: Use separate buffer pool per port\n"
 		"  --pcap-enable: Enables pcap capture\n"
@@ -318,6 +339,20 @@ parse_max_pkt_len(const char *pktlen)
 	return len;
 }
 
+static int
+parse_worker_model(const char *model)
+{
+	if (strcmp(model, WORKER_MODEL_MCORE_DISPATCH) == 0) {
+		rte_graph_worker_model_set(RTE_GRAPH_MODEL_MCORE_DISPATCH);
+		return RTE_GRAPH_MODEL_MCORE_DISPATCH;
+	} else if (strcmp(model, WORKER_MODEL_RTC) == 0)
+		return RTE_GRAPH_MODEL_RTC;
+
+	rte_exit(EXIT_FAILURE, "Invalid worker model: %s", model);
+
+	return RTE_GRAPH_MODEL_LIST_END;
+}
+
 static int
 parse_portmask(const char *portmask)
 {
@@ -434,6 +469,8 @@ static const char short_options[] = "p:" /* portmask */
 #define CMD_LINE_OPT_PCAP_ENABLE   "pcap-enable"
 #define CMD_LINE_OPT_NUM_PKT_CAP   "pcap-num-cap"
 #define CMD_LINE_OPT_PCAP_FILENAME "pcap-file-name"
+#define CMD_LINE_OPT_WORKER_MODEL  "model"
+
 enum {
 	/* Long options mapped to a short option */
 
@@ -449,6 +486,7 @@ enum {
 	CMD_LINE_OPT_PARSE_PCAP_ENABLE,
 	CMD_LINE_OPT_PARSE_NUM_PKT_CAP,
 	CMD_LINE_OPT_PCAP_FILENAME_CAP,
+	CMD_LINE_OPT_WORKER_MODEL_TYPE,
 };
 
 static const struct option lgopts[] = {
@@ -460,6 +498,7 @@ static const struct option lgopts[] = {
 	{CMD_LINE_OPT_PCAP_ENABLE, 0, 0, CMD_LINE_OPT_PARSE_PCAP_ENABLE},
 	{CMD_LINE_OPT_NUM_PKT_CAP, 1, 0, CMD_LINE_OPT_PARSE_NUM_PKT_CAP},
 	{CMD_LINE_OPT_PCAP_FILENAME, 1, 0, CMD_LINE_OPT_PCAP_FILENAME_CAP},
+	{CMD_LINE_OPT_WORKER_MODEL, 1, 0, CMD_LINE_OPT_WORKER_MODEL_TYPE},
 	{NULL, 0, 0, 0},
 };
 
@@ -551,6 +590,11 @@ parse_args(int argc, char **argv)
 			printf("Pcap file name: %s\n", pcap_filename);
 			break;
 
+		case CMD_LINE_OPT_WORKER_MODEL_TYPE:
+			printf("Use new worker model: %s\n", optarg);
+			parse_worker_model(optarg);
+			break;
+
 		default:
 			print_usage(prgname);
 			return -1;
@@ -726,15 +770,15 @@ print_stats(void)
 static int
 graph_main_loop(void *conf)
 {
+	struct model_conf *mconf = conf;
 	struct lcore_conf *qconf;
 	struct rte_graph *graph;
 	uint32_t lcore_id;
 
-	RTE_SET_USED(conf);
-
 	lcore_id = rte_lcore_id();
 	qconf = &lcore_conf[lcore_id];
 	graph = qconf->graph;
+	rte_graph_worker_model_set(mconf->model);
 
 	if (!graph) {
 		RTE_LOG(INFO, L3FWD_GRAPH, "Lcore %u has nothing to do\n",
@@ -788,6 +832,141 @@ config_port_max_pkt_len(struct rte_eth_conf *conf,
 	return 0;
 }
 
+static void
+graph_config_mcore_dispatch(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	int worker_count = rte_lcore_count() - 1;
+	int main_lcore_id = rte_get_main_lcore();
+	int worker_lcore = main_lcore_id;
+	rte_graph_t main_graph_id = 0;
+	struct rte_node *node_tmp;
+	struct lcore_conf *qconf;
+	struct rte_graph *graph;
+	rte_graph_t graph_id;
+	rte_graph_off_t off;
+	int n_rx_node = 0;
+	rte_node_t count;
+	int i, j;
+	int ret;
+
+	for (j = 0; j < nb_lcore_params; j++) {
+		qconf = &lcore_conf[lcore_params[j].lcore_id];
+		/* Add rx node patterns of all lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			char *node_name = qconf->rx_queue_list[i].node_name;
+
+			graph_conf.node_patterns[nb_patterns + n_rx_node + i] = node_name;
+			n_rx_node++;
+			ret = rte_graph_model_dispatch_lcore_affinity_set(node_name,
+									lcore_params[j].lcore_id);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n", node_name,
+				       lcore_params[j].lcore_id);
+		}
+	}
+
+	graph_conf.nb_node_patterns = nb_patterns + n_rx_node;
+	graph_conf.socket_id = rte_lcore_to_socket_id(main_lcore_id);
+
+	qconf = &lcore_conf[main_lcore_id];
+	snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+		 main_lcore_id);
+
+	/* create main graph */
+	main_graph_id = rte_graph_create(qconf->name, &graph_conf);
+	if (main_graph_id == RTE_GRAPH_ID_INVALID)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_create(): main_graph_id invalid for lcore %u\n",
+			 main_lcore_id);
+
+	qconf->graph_id = main_graph_id;
+	qconf->graph = rte_graph_lookup(qconf->name);
+	/* >8 End of graph initialization. */
+	if (!qconf->graph)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_lookup(): graph %s not found\n",
+			 qconf->name);
+
+	graph = qconf->graph;
+	rte_graph_foreach_node(count, off, graph, node_tmp) {
+		worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+
+		/* Need to set the node Lcore affinity before clone graph for each lcore */
+		if (node_tmp->lcore_id == RTE_MAX_LCORE) {
+			ret = rte_graph_model_dispatch_lcore_affinity_set(node_tmp->name,
+									worker_lcore);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n",
+				       node_tmp->name, worker_lcore);
+		}
+	}
+
+	worker_lcore = main_lcore_id;
+	for (i = 0; i < worker_count; i++) {
+		worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+
+		qconf = &lcore_conf[worker_lcore];
+		snprintf(qconf->name, sizeof(qconf->name), "cloned-%u", worker_lcore);
+		graph_id = rte_graph_clone(main_graph_id, qconf->name);
+		ret = rte_graph_model_dispatch_core_bind(graph_id, worker_lcore);
+		if (ret == 0)
+			printf("bind graph %d to lcore %u\n", graph_id, worker_lcore);
+
+		/* full cloned graph name */
+		snprintf(qconf->name, sizeof(qconf->name), "%s",
+			 rte_graph_id_to_name(graph_id));
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "Failed to lookup graph %s\n",
+				 qconf->name);
+		continue;
+	}
+}
+
+static void
+graph_config_rtc(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	struct lcore_conf *qconf;
+	rte_graph_t graph_id;
+	uint32_t lcore_id;
+	rte_edge_t i;
+
+	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+		if (rte_lcore_is_enabled(lcore_id) == 0)
+			continue;
+
+		qconf = &lcore_conf[lcore_id];
+		/* Skip graph creation if no source exists */
+		if (!qconf->n_rx_queue)
+			continue;
+		/* Add rx node patterns of this lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			graph_conf.node_patterns[nb_patterns + i] =
+				qconf->rx_queue_list[i].node_name;
+		}
+		graph_conf.nb_node_patterns = nb_patterns + i;
+		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
+		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+			 lcore_id);
+		graph_id = rte_graph_create(qconf->name, &graph_conf);
+		if (graph_id == RTE_GRAPH_ID_INVALID)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_create(): graph_id invalid for lcore %u\n",
+				 lcore_id);
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		/* >8 End of graph initialization. */
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_lookup(): graph %s not found\n",
+				 qconf->name);
+	}
+}
+
 int
 main(int argc, char **argv)
 {
@@ -808,10 +987,12 @@ main(int argc, char **argv)
 	uint16_t queueid, portid, i;
 	const char **node_patterns;
 	struct lcore_conf *qconf;
+	struct model_conf mconf;
 	uint16_t nb_graphs = 0;
 	uint16_t nb_patterns;
 	uint8_t rewrite_len;
 	uint32_t lcore_id;
+	uint16_t model;
 	int ret;
 
 	/* Init EAL */
@@ -840,6 +1021,9 @@ main(int argc, char **argv)
 	if (check_lcore_params() < 0)
 		rte_exit(EXIT_FAILURE, "check_lcore_params() failed\n");
 
+	if (check_worker_model_params() < 0)
+		rte_exit(EXIT_FAILURE, "check_worker_model_params() failed\n");
+
 	ret = init_lcore_rx_queues();
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "init_lcore_rx_queues() failed\n");
@@ -1079,51 +1263,18 @@ main(int argc, char **argv)
 
 	memset(&graph_conf, 0, sizeof(graph_conf));
 	graph_conf.node_patterns = node_patterns;
+	graph_conf.nb_node_patterns = nb_patterns;
 
 	/* Pcap config */
 	graph_conf.pcap_enable = pcap_trace_enable;
 	graph_conf.num_pkt_to_capture = packet_to_capture;
 	graph_conf.pcap_filename = pcap_filename;
 
-	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
-		rte_graph_t graph_id;
-		rte_edge_t i;
-
-		if (rte_lcore_is_enabled(lcore_id) == 0)
-			continue;
-
-		qconf = &lcore_conf[lcore_id];
-
-		/* Skip graph creation if no source exists */
-		if (!qconf->n_rx_queue)
-			continue;
-
-		/* Add rx node patterns of this lcore */
-		for (i = 0; i < qconf->n_rx_queue; i++) {
-			graph_conf.node_patterns[nb_patterns + i] =
-				qconf->rx_queue_list[i].node_name;
-		}
-
-		graph_conf.nb_node_patterns = nb_patterns + i;
-		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
-
-		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
-			 lcore_id);
-
-		graph_id = rte_graph_create(qconf->name, &graph_conf);
-		if (graph_id == RTE_GRAPH_ID_INVALID)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_create(): graph_id invalid"
-				 " for lcore %u\n", lcore_id);
-
-		qconf->graph_id = graph_id;
-		qconf->graph = rte_graph_lookup(qconf->name);
-		/* >8 End of graph initialization. */
-		if (!qconf->graph)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_lookup(): graph %s not found\n",
-				 qconf->name);
-	}
+	model = rte_graph_worker_model_get();
+	if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		graph_config_mcore_dispatch(graph_conf);
+	else
+		graph_config_rtc(graph_conf);
 
 	memset(&rewrite_data, 0, sizeof(rewrite_data));
 	rewrite_len = sizeof(rewrite_data);
@@ -1174,8 +1325,10 @@ main(int argc, char **argv)
 	}
 	/* >8 End of adding route to ip4 graph infa. */
 
+	mconf.model = model;
 	/* Launch per-lcore init on every worker lcore */
-	rte_eal_mp_remote_launch(graph_main_loop, NULL, SKIP_MAIN);
+	rte_eal_mp_remote_launch(graph_main_loop, &mconf,
+				 SKIP_MAIN);
 
 	/* Accumulate and print stats on main until exit */
 	if (rte_graph_has_stats_feature())
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v3 15/15] doc: update multicore dispatch model in graph guides
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (13 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model Zhirun Yan
@ 2023-03-29  6:43     ` Zhirun Yan
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-29  6:43 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Update the graph documentation to introduce the new multicore dispatch
model.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 doc/guides/prog_guide/graph_lib.rst | 59 +++++++++++++++++++++++++++--
 1 file changed, 55 insertions(+), 4 deletions(-)

diff --git a/doc/guides/prog_guide/graph_lib.rst b/doc/guides/prog_guide/graph_lib.rst
index 1cfdc86433..72e26f3a5a 100644
--- a/doc/guides/prog_guide/graph_lib.rst
+++ b/doc/guides/prog_guide/graph_lib.rst
@@ -189,14 +189,65 @@ In the above example, A graph object will be created with ethdev Rx
 node of port 0 and queue 0, all ipv4* nodes in the system,
 and ethdev tx node of all ports.
 
-Multicore graph processing
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-In the current graph library implementation, specifically,
-``rte_graph_walk()`` and ``rte_node_enqueue*`` fast path API functions
+Graph model choosing
+~~~~~~~~~~~~~~~~~~~~
+Currently, there are two different walking models. Use
+``rte_graph_worker_model_set()`` to set the walking model.
+
+RTC (Run-To-Completion)
+^^^^^^^^^^^^^^^^^^^^^^^
+This is the default graph walking model. Specifically,
+``rte_graph_walk_rtc()`` and ``rte_node_enqueue*`` fast path API functions
 are designed to work on single-core to have better performance.
 The fast path API works on graph object, So the multi-core graph
 processing strategy would be to create graph object PER WORKER.
 
+Example:
+
+Graph: node-0 -> node-1 -> node-2 @Core0.
+
+.. code-block:: diff
+
+    + - - - - - - - - - - - - - - - - - - - - - +
+    '                  Core #0                  '
+    '                                           '
+    ' +--------+     +---------+     +--------+ '
+    ' | Node-0 | --> | Node-1  | --> | Node-2 | '
+    ' +--------+     +---------+     +--------+ '
+    '                                           '
+    + - - - - - - - - - - - - - - - - - - - - - +
+
+Dispatch model
+^^^^^^^^^^^^^^
+The dispatch model enables a cross-core dispatching mechanism which employs
+a scheduling work-queue to dispatch streams to other worker cores that are
+associated with the destination node.
+
+Use ``rte_graph_model_dispatch_lcore_affinity_set()`` to set the lcore
+affinity of a node.
+Each worker core will have its own clone of the graph. Use ``rte_graph_clone()``
+to clone the graph for each worker and ``rte_graph_model_dispatch_core_bind()``
+to bind the graph to the worker core.
+
+Example:
+
+Graph topo: node-0 -> Core1; node-1 -> node-2; node-2 -> node-3.
+Config graph: node-0 @Core0; node-1/3 @Core1; node-2 @Core2.
+
+.. code-block:: diff
+
+    + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
+    '  Core #0   '     '          Core #1         '     '  Core #2   '
+    '            '     '                          '     '            '
+    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
+    ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
+    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
+    '            '     '     |                    '     '      ^     '
+    + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
+                             |                                 |
+                             + - - - - - - - - - - - - - - - - +
+
+
 In fast path
 ~~~~~~~~~~~~
 Typical fast-path code looks like below, where the application
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v3 03/15] graph: move node process into inline function
  2023-03-29  6:43     ` [PATCH v3 03/15] graph: move node process into inline function Zhirun Yan
@ 2023-03-29 15:34       ` Stephen Hemminger
  2023-03-29 15:41         ` Jerin Jacob
  0 siblings, 1 reply; 369+ messages in thread
From: Stephen Hemminger @ 2023-03-29 15:34 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Wed, 29 Mar 2023 15:43:28 +0900
Zhirun Yan <zhirun.yan@intel.com> wrote:

> +/**
> + * @internal
> + *
> + * Enqueue a given node to the tail of the graph reel.
> + *
> + * @param graph
> + *   Pointer Graph object.
> + * @param node
> + *   Pointer to node object to be enqueued.
> + */
> +static __rte_always_inline void
> +__rte_node_process(struct rte_graph *graph, struct rte_node *node)
> +{
> +	uint64_t start;
> +	uint16_t rc;
> +	void **objs;
> +
> +	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> +	objs = node->objs;
> +	rte_prefetch0(objs);
> +
> +	if (rte_graph_has_stats_feature()) {
> +		start = rte_rdtsc();
> +		rc = node->process(graph, node, objs, node->idx);
> +		node->total_cycles += rte_rdtsc() - start;
> +		node->total_calls++;
> +		node->total_objs += rc;
> +	} else {
> +		node->process(graph, node, objs, node->idx);
> +	}
> +	node->idx = 0;
> +}
> +

Why inline? Doing everything as inlines has long term ABI
impacts. And this is not a super critical performance path.

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v3 04/15] graph: add get/set graph worker model APIs
  2023-03-29  6:43     ` [PATCH v3 04/15] graph: add get/set graph worker model APIs Zhirun Yan
@ 2023-03-29 15:35       ` Stephen Hemminger
  2023-03-30  3:37         ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Stephen Hemminger @ 2023-03-29 15:35 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Wed, 29 Mar 2023 15:43:29 +0900
Zhirun Yan <zhirun.yan@intel.com> wrote:

> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + * Set the graph worker model
> + *
> + * @note This function does not perform any locking, and is only safe to call
> + *    before graph running.
> + *
> + * @param name
> + *   Name of the graph worker model.
> + *
> + * @return
> + *   0 on success, -1 otherwise.
> + */
> +inline int
> +rte_graph_worker_model_set(enum rte_graph_worker_model model)
> +{
> +	if (model >= RTE_GRAPH_MODEL_LIST_END)
> +		goto fail;
> +
> +	RTE_PER_LCORE(worker_model) = model;
> +	return 0;
> +
> +fail:
> +	RTE_PER_LCORE(worker_model) = RTE_GRAPH_MODEL_DEFAULT;
> +	return -1;
> +}
> +

Once again, this doesn't have to be inline, could be a real API.

^ permalink raw reply	[flat|nested] 369+ messages in thread

* Re: [PATCH v3 03/15] graph: move node process into inline function
  2023-03-29 15:34       ` Stephen Hemminger
@ 2023-03-29 15:41         ` Jerin Jacob
  0 siblings, 0 replies; 369+ messages in thread
From: Jerin Jacob @ 2023-03-29 15:41 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Zhirun Yan, dev, jerinj, kirankumark, ndabilpuram, cunming.liang,
	haiyue.wang

On Wed, Mar 29, 2023 at 9:04 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Wed, 29 Mar 2023 15:43:28 +0900
> Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> > +/**
> > + * @internal
> > + *
> > + * Enqueue a given node to the tail of the graph reel.
> > + *
> > + * @param graph
> > + *   Pointer Graph object.
> > + * @param node
> > + *   Pointer to node object to be enqueued.
> > + */
> > +static __rte_always_inline void
> > +__rte_node_process(struct rte_graph *graph, struct rte_node *node)
> > +{
> > +     uint64_t start;
> > +     uint16_t rc;
> > +     void **objs;
> > +
> > +     RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> > +     objs = node->objs;
> > +     rte_prefetch0(objs);
> > +
> > +     if (rte_graph_has_stats_feature()) {
> > +             start = rte_rdtsc();
> > +             rc = node->process(graph, node, objs, node->idx);
> > +             node->total_cycles += rte_rdtsc() - start;
> > +             node->total_calls++;
> > +             node->total_objs += rc;
> > +     } else {
> > +             node->process(graph, node, objs, node->idx);
> > +     }
> > +     node->idx = 0;
> > +}
> > +
>
> Why inline? Doing everything as inlines has long term ABI
> impacts. And this is not a super critical performance path.

This is one of the real fast-path routines.

^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [PATCH v3 04/15] graph: add get/set graph worker model APIs
  2023-03-29 15:35       ` Stephen Hemminger
@ 2023-03-30  3:37         ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-03-30  3:37 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Wednesday, March 29, 2023 11:35 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v3 04/15] graph: add get/set graph worker model APIs
> 
> On Wed, 29 Mar 2023 15:43:29 +0900
> Zhirun Yan <zhirun.yan@intel.com> wrote:
> 
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> > +notice
> > + * Set the graph worker model
> > + *
> > + * @note This function does not perform any locking, and is only safe to call
> > + *    before graph running.
> > + *
> > + * @param name
> > + *   Name of the graph worker model.
> > + *
> > + * @return
> > + *   0 on success, -1 otherwise.
> > + */
> > +inline int
> > +rte_graph_worker_model_set(enum rte_graph_worker_model model) {
> > +	if (model >= RTE_GRAPH_MODEL_LIST_END)
> > +		goto fail;
> > +
> > +	RTE_PER_LCORE(worker_model) = model;
> > +	return 0;
> > +
> > +fail:
> > +	RTE_PER_LCORE(worker_model) = RTE_GRAPH_MODEL_DEFAULT;
> > +	return -1;
> > +}
> > +
> 
> Once again, this doesn't have to be inline, could be a real API.

Thanks, I will remove inline in next version.

^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 00/15] graph enhancement for multi-core dispatch
  2023-03-29  6:43   ` [PATCH v3 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                       ` (14 preceding siblings ...)
  2023-03-29  6:43     ` [PATCH v3 15/15] doc: update multicore dispatch model in graph guides Zhirun Yan
@ 2023-03-30  6:18     ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 01/15] graph: rename rte_graph_work as common Zhirun Yan
                         ` (15 more replies)
  15 siblings, 16 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

V4:
Fix CI build issues about undefined reference of sched apis.
Remove inline for model setting.

V3:
Fix CI build issues about TLS and typo.

V2:
Use git mv to keep git history.
Use TLS for per-thread local storage.
Change model name to mcore dispatch.
Change API with specific mode name.
Split big patch.
Fix CI issues.
Rebase l3fwd-graph example.
Update doc and maintainers files.


Currently, rte_graph supports the RTC (Run-To-Completion) model within
a single core.
RTC is one of the typical models of packet processing. Others, like
Pipeline or Hybrid, lack support.

The patch set introduces a 'multicore dispatch' model selection, a
self-reacting scheme based on core affinity.
The new model enables a cross-core dispatching mechanism which employs a
scheduling work-queue to dispatch streams to other worker cores that are
associated with the destination node. When the core affinity of the
destination node is the default 'current', the stream continues to be
executed as normal.

Example:
3-node graph targets 3-core budget

RTC:
Graph: node-0 -> node-1 -> node-2 @Core0.

+ - - - - - - - - - - - - - - - - - - - - - +
'                Core #0/1/2                '
'                                           '
' +--------+     +---------+     +--------+ '
' | Node-0 | --> | Node-1  | --> | Node-2 | '
' +--------+     +---------+     +--------+ '
'                                           '
+ - - - - - - - - - - - - - - - - - - - - - +

Dispatch:

Graph topo: node-0 -> Core1; node-1 -> node-2; node-2 -> node-3.
Config graph: node-0 @Core0; node-1/3 @Core1; node-2 @Core2.


    + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
    '  Core #0   '     '          Core #1         '     '  Core #2   '
    '            '     '                          '     '            '
    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
    ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
    '            '     '     |                    '     '      ^     '
    + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
                             |                                 |
                             + - - - - - - - - - - - - - - - - +


The patch set is broken down as below:

1. Split graph worker into common and default model part.
2. Inline graph node processing to make it reusable.
3. Add set/get APIs to choose worker model.
4. Introduce core affinity API to set the node run on specific worker core.
  (only use in new model)
5. Introduce graph affinity API to bind one graph with specific worker
  core.
6. Introduce graph clone API.
7. Introduce stream moving with scheduler work-queue in patch 8~12.
8. Add stats for new models.
9. Abstract default graph config process and integrate new model into
  example/l3fwd-graph. Add new parameters for model choosing.

We can run with the new worker model like this:
./dpdk-l3fwd-graph -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="dispatch"

References:
https://static.sched.com/hosted_files/dpdkuserspace22/a6/graph%20introduce%20remote%20dispatch%20for%20mult-core%20scaling.pdf


Zhirun Yan (15):
  graph: rename rte_graph_work as common
  graph: split graph worker into common and default model
  graph: move node process into inline function
  graph: add get/set graph worker model APIs
  graph: introduce graph node core affinity API
  graph: introduce graph bind unbind API
  graph: introduce graph clone API for other worker core
  graph: add struct for stream moving between cores
  graph: introduce stream moving cross cores
  graph: enable create and destroy graph scheduling workqueue
  graph: introduce graph walk by cross-core dispatch
  graph: enable graph multicore dispatch scheduler model
  graph: add stats for cross-core dispatching
  examples/l3fwd-graph: introduce multicore dispatch worker model
  doc: update multicore dispatch model in graph guides

 MAINTAINERS                          |   1 +
 doc/guides/prog_guide/graph_lib.rst  |  59 ++-
 examples/l3fwd-graph/main.c          | 237 +++++++++---
 lib/graph/graph.c                    | 179 +++++++++
 lib/graph/graph_debug.c              |   6 +
 lib/graph/graph_populate.c           |   1 +
 lib/graph/graph_private.h            |  44 +++
 lib/graph/graph_stats.c              |  74 +++-
 lib/graph/meson.build                |   4 +-
 lib/graph/node.c                     |   1 +
 lib/graph/rte_graph.h                |  44 +++
 lib/graph/rte_graph_model_dispatch.c | 179 +++++++++
 lib/graph/rte_graph_model_dispatch.h | 122 ++++++
 lib/graph/rte_graph_model_rtc.h      |  45 +++
 lib/graph/rte_graph_worker.c         |  54 +++
 lib/graph/rte_graph_worker.h         | 498 +------------------------
 lib/graph/rte_graph_worker_common.h  | 539 +++++++++++++++++++++++++++
 lib/graph/version.map                |  10 +
 18 files changed, 1553 insertions(+), 544 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_dispatch.c
 create mode 100644 lib/graph/rte_graph_model_dispatch.h
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker.c
 create mode 100644 lib/graph/rte_graph_worker_common.h

-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 01/15] graph: rename rte_graph_work as common
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 02/15] graph: split graph worker into common and default model Zhirun Yan
                         ` (14 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Rename rte_graph_worker.h to rte_graph_worker_common.h to support
multiple graph worker models.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 MAINTAINERS                                                 | 1 +
 lib/graph/graph_pcap.c                                      | 2 +-
 lib/graph/graph_private.h                                   | 2 +-
 lib/graph/meson.build                                       | 2 +-
 lib/graph/{rte_graph_worker.h => rte_graph_worker_common.h} | 6 +++---
 5 files changed, 7 insertions(+), 6 deletions(-)
 rename lib/graph/{rte_graph_worker.h => rte_graph_worker_common.h} (99%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 280058adfc..9d9467dd00 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1714,6 +1714,7 @@ F: doc/guides/prog_guide/bpf_lib.rst
 Graph - EXPERIMENTAL
 M: Jerin Jacob <jerinj@marvell.com>
 M: Kiran Kumar K <kirankumark@marvell.com>
+M: Zhirun Yan <zhirun.yan@intel.com>
 F: lib/graph/
 F: doc/guides/prog_guide/graph_lib.rst
 F: app/test/test_graph*
diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
index 6c43330029..8a220370fa 100644
--- a/lib/graph/graph_pcap.c
+++ b/lib/graph/graph_pcap.c
@@ -10,7 +10,7 @@
 #include <rte_mbuf.h>
 #include <rte_pcapng.h>
 
-#include "rte_graph_worker.h"
+#include "rte_graph_worker_common.h"
 
 #include "graph_pcap_private.h"
 
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 7d1b30b8ac..f08dbc7e9d 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -12,7 +12,7 @@
 #include <rte_eal.h>
 
 #include "rte_graph.h"
-#include "rte_graph_worker.h"
+#include "rte_graph_worker_common.h"
 
 extern int rte_graph_logtype;
 
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 3526d1b5d4..4e2b612ad3 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,6 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
 )
-headers = files('rte_graph.h', 'rte_graph_worker.h')
+headers = files('rte_graph.h', 'rte_graph_worker_common.h')
 
 deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker_common.h
similarity index 99%
rename from lib/graph/rte_graph_worker.h
rename to lib/graph/rte_graph_worker_common.h
index 438595b15c..0bad2938f3 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -2,8 +2,8 @@
  * Copyright(C) 2020 Marvell International Ltd.
  */
 
-#ifndef _RTE_GRAPH_WORKER_H_
-#define _RTE_GRAPH_WORKER_H_
+#ifndef _RTE_GRAPH_WORKER_COMMON_H_
+#define _RTE_GRAPH_WORKER_COMMON_H_
 
 /**
  * @file rte_graph_worker.h
@@ -518,4 +518,4 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
 }
 #endif
 
-#endif /* _RTE_GRAPH_WORKER_H_ */
+#endif /* _RTE_GRAPH_WORKER_COMMON_H_ */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 02/15] graph: split graph worker into common and default model
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 01/15] graph: rename rte_graph_work as common Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 03/15] graph: move node process into inline function Zhirun Yan
                         ` (13 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

To support multiple graph worker models, split the graph worker into
common and default parts. Name the current walk function
rte_graph_model_rtc because the default model is RTC (Run-To-Completion).

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_pcap.c              |  2 +-
 lib/graph/graph_private.h           |  2 +-
 lib/graph/meson.build               |  2 +-
 lib/graph/rte_graph_model_rtc.h     | 61 +++++++++++++++++++++++++++++
 lib/graph/rte_graph_worker.h        | 34 ++++++++++++++++
 lib/graph/rte_graph_worker_common.h | 57 ---------------------------
 6 files changed, 98 insertions(+), 60 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker.h

diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
index 8a220370fa..6c43330029 100644
--- a/lib/graph/graph_pcap.c
+++ b/lib/graph/graph_pcap.c
@@ -10,7 +10,7 @@
 #include <rte_mbuf.h>
 #include <rte_pcapng.h>
 
-#include "rte_graph_worker_common.h"
+#include "rte_graph_worker.h"
 
 #include "graph_pcap_private.h"
 
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index f08dbc7e9d..7d1b30b8ac 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -12,7 +12,7 @@
 #include <rte_eal.h>
 
 #include "rte_graph.h"
-#include "rte_graph_worker_common.h"
+#include "rte_graph_worker.h"
 
 extern int rte_graph_logtype;
 
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 4e2b612ad3..3526d1b5d4 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,6 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
 )
-headers = files('rte_graph.h', 'rte_graph_worker_common.h')
+headers = files('rte_graph.h', 'rte_graph_worker.h')
 
 deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
new file mode 100644
index 0000000000..665560f831
--- /dev/null
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "rte_graph_worker_common.h"
+
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+static inline void
+rte_graph_walk_rtc(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	/*
+	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
+	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
+	 * in a circular buffer fashion.
+	 *
+	 *	+-----+ <= cir_start - head [number of source nodes]
+	 *	|     |
+	 *	| ... | <= source nodes
+	 *	|     |
+	 *	+-----+ <= cir_start [head = 0] [tail = 0]
+	 *	|     |
+	 *	| ... | <= pending streams
+	 *	|     |
+	 *	+-----+ <= cir_start + mask
+	 */
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		objs = node->objs;
+		rte_prefetch0(objs);
+
+		if (rte_graph_has_stats_feature()) {
+			start = rte_rdtsc();
+			rc = node->process(graph, node, objs, node->idx);
+			node->total_cycles += rte_rdtsc() - start;
+			node->total_calls++;
+			node->total_objs += rc;
+		} else {
+			node->process(graph, node, objs, node->idx);
+		}
+			node->idx = 0;
+			head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+	graph->tail = 0;
+}
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
new file mode 100644
index 0000000000..7ea18ba80a
--- /dev/null
+++ b/lib/graph/rte_graph_worker.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#ifndef _RTE_GRAPH_WORKER_H_
+#define _RTE_GRAPH_WORKER_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "rte_graph_model_rtc.h"
+
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk(struct rte_graph *graph)
+{
+	rte_graph_walk_rtc(graph);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_WORKER_H_ */
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 0bad2938f3..b58f8f6947 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -128,63 +128,6 @@ __rte_experimental
 void __rte_node_stream_alloc_size(struct rte_graph *graph,
 				  struct rte_node *node, uint16_t req_size);
 
-/**
- * Perform graph walk on the circular buffer and invoke the process function
- * of the nodes and collect the stats.
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup function.
- *
- * @see rte_graph_lookup()
- */
-__rte_experimental
-static inline void
-rte_graph_walk(struct rte_graph *graph)
-{
-	const rte_graph_off_t *cir_start = graph->cir_start;
-	const rte_node_t mask = graph->cir_mask;
-	uint32_t head = graph->head;
-	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
-
-	/*
-	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
-	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
-	 * in a circular buffer fashion.
-	 *
-	 *	+-----+ <= cir_start - head [number of source nodes]
-	 *	|     |
-	 *	| ... | <= source nodes
-	 *	|     |
-	 *	+-----+ <= cir_start [head = 0] [tail = 0]
-	 *	|     |
-	 *	| ... | <= pending streams
-	 *	|     |
-	 *	+-----+ <= cir_start + mask
-	 */
-	while (likely(head != graph->tail)) {
-		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-		node->idx = 0;
-		head = likely((int32_t)head > 0) ? head & mask : head;
-	}
-	graph->tail = 0;
-}
-
 /* Fast path helper functions */
 
 /**
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 03/15] graph: move node process into inline function
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 01/15] graph: rename rte_graph_work as common Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 02/15] graph: split graph worker into common and default model Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 04/15] graph: add get/set graph worker model APIs Zhirun Yan
                         ` (12 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Node processing is a single, reusable block; move the code into an inline
function.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_rtc.h     | 20 ++---------------
 lib/graph/rte_graph_worker_common.h | 33 +++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
index 665560f831..0dcb7151e9 100644
--- a/lib/graph/rte_graph_model_rtc.h
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -20,9 +20,6 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	const rte_node_t mask = graph->cir_mask;
 	uint32_t head = graph->head;
 	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
 
 	/*
 	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
@@ -41,21 +38,8 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	 */
 	while (likely(head != graph->tail)) {
 		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-			node->idx = 0;
-			head = likely((int32_t)head > 0) ? head & mask : head;
+		__rte_node_process(graph, node);
+		head = likely((int32_t)head > 0) ? head & mask : head;
 	}
 	graph->tail = 0;
 }
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index b58f8f6947..41428974db 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -130,6 +130,39 @@ void __rte_node_stream_alloc_size(struct rte_graph *graph,
 
 /* Fast path helper functions */
 
+/**
+ * @internal
+ *
+ * Enqueue a given node to the tail of the graph reel.
+ *
+ * @param graph
+ *   Pointer Graph object.
+ * @param node
+ *   Pointer to node object to be enqueued.
+ */
+static __rte_always_inline void
+__rte_node_process(struct rte_graph *graph, struct rte_node *node)
+{
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+	objs = node->objs;
+	rte_prefetch0(objs);
+
+	if (rte_graph_has_stats_feature()) {
+		start = rte_rdtsc();
+		rc = node->process(graph, node, objs, node->idx);
+		node->total_cycles += rte_rdtsc() - start;
+		node->total_calls++;
+		node->total_objs += rc;
+	} else {
+		node->process(graph, node, objs, node->idx);
+	}
+	node->idx = 0;
+}
+
 /**
  * @internal
  *
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 04/15] graph: add get/set graph worker model APIs
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (2 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 03/15] graph: move node process into inline function Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 05/15] graph: introduce graph node core affinity API Zhirun Yan
                         ` (11 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add new get/set APIs to configure the graph worker model, which
determines which walking model is used.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/meson.build               |  1 +
 lib/graph/rte_graph_worker.c        | 54 +++++++++++++++++++++++++++++
 lib/graph/rte_graph_worker_common.h | 19 ++++++++++
 lib/graph/version.map               |  3 ++
 4 files changed, 77 insertions(+)
 create mode 100644 lib/graph/rte_graph_worker.c

diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 3526d1b5d4..9fab8243da 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -15,6 +15,7 @@ sources = files(
         'graph_stats.c',
         'graph_populate.c',
         'graph_pcap.c',
+        'rte_graph_worker.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
diff --git a/lib/graph/rte_graph_worker.c b/lib/graph/rte_graph_worker.c
new file mode 100644
index 0000000000..cabc101262
--- /dev/null
+++ b/lib/graph/rte_graph_worker.c
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "rte_graph_worker_common.h"
+
+RTE_DEFINE_PER_LCORE(enum rte_graph_worker_model, worker_model) = RTE_GRAPH_MODEL_DEFAULT;
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ * Set the graph worker model
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ *    before graph running.
+ *
+ * @param model
+ *   The graph worker model to set.
+ *
+ * @return
+ *   0 on success, -1 otherwise.
+ */
+int
+rte_graph_worker_model_set(enum rte_graph_worker_model model)
+{
+	if (model >= RTE_GRAPH_MODEL_LIST_END)
+		goto fail;
+
+	RTE_PER_LCORE(worker_model) = model;
+	return 0;
+
+fail:
+	RTE_PER_LCORE(worker_model) = RTE_GRAPH_MODEL_DEFAULT;
+	return -1;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Get the graph worker model
+ *
+ * @note The returned model is the calling thread's per-lcore setting.
+ *   It defaults to RTE_GRAPH_MODEL_DEFAULT.
+ *
+ * @return
+ *   Graph worker model on success.
+ */
+inline
+enum rte_graph_worker_model
+rte_graph_worker_model_get(void)
+{
+	return RTE_PER_LCORE(worker_model);
+}
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 41428974db..1526da6e2c 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -19,6 +19,7 @@
 #include <rte_compat.h>
 #include <rte_common.h>
 #include <rte_cycles.h>
+#include <rte_per_lcore.h>
 #include <rte_prefetch.h>
 #include <rte_memcpy.h>
 #include <rte_memory.h>
@@ -95,6 +96,16 @@ struct rte_node {
 	struct rte_node *nodes[] __rte_cache_min_aligned; /**< Next nodes. */
 } __rte_cache_aligned;
 
+/** Graph worker models */
+enum rte_graph_worker_model {
+	RTE_GRAPH_MODEL_DEFAULT,
+	RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT,
+	RTE_GRAPH_MODEL_MCORE_DISPATCH,
+	RTE_GRAPH_MODEL_LIST_END
+};
+
+RTE_DECLARE_PER_LCORE(enum rte_graph_worker_model, worker_model);
+
 /**
  * @internal
  *
@@ -490,6 +501,14 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
 	}
 }
 
+__rte_experimental
+enum rte_graph_worker_model
+rte_graph_worker_model_get(void);
+
+__rte_experimental
+int
+rte_graph_worker_model_set(enum rte_graph_worker_model model);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 13b838752d..eea73ec9ca 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -43,5 +43,8 @@ EXPERIMENTAL {
 	rte_node_next_stream_put;
 	rte_node_next_stream_move;
 
+	rte_graph_worker_model_set;
+	rte_graph_worker_model_get;
+
 	local: *;
 };
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 05/15] graph: introduce graph node core affinity API
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (3 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 04/15] graph: add get/set graph worker model APIs Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 06/15] graph: introduce graph bind unbind API Zhirun Yan
                         ` (10 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add lcore_id to the node structure to hold the affinity core ID, and
implement rte_graph_model_dispatch_lcore_affinity_set() to bind a node
to a specific lcore.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_private.h            |  1 +
 lib/graph/meson.build                |  1 +
 lib/graph/node.c                     |  1 +
 lib/graph/rte_graph_model_dispatch.c | 31 ++++++++++++++++++++
 lib/graph/rte_graph_model_dispatch.h | 43 ++++++++++++++++++++++++++++
 lib/graph/version.map                |  2 ++
 6 files changed, 79 insertions(+)
 create mode 100644 lib/graph/rte_graph_model_dispatch.c
 create mode 100644 lib/graph/rte_graph_model_dispatch.h

diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 7d1b30b8ac..409eed3284 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -50,6 +50,7 @@ struct node {
 	STAILQ_ENTRY(node) next;      /**< Next node in the list. */
 	char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
 	uint64_t flags;		      /**< Node configuration flag. */
+	unsigned int lcore_id;        /**< Node runs on the Lcore ID */
 	rte_node_process_t process;   /**< Node process function. */
 	rte_node_init_t init;         /**< Node init function. */
 	rte_node_fini_t fini;	      /**< Node fini function. */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 9fab8243da..c729d984b6 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,7 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
         'rte_graph_worker.c',
+        'rte_graph_model_dispatch.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
diff --git a/lib/graph/node.c b/lib/graph/node.c
index 149414dcd9..339b4a0da5 100644
--- a/lib/graph/node.c
+++ b/lib/graph/node.c
@@ -100,6 +100,7 @@ __rte_node_register(const struct rte_node_register *reg)
 			goto free;
 	}
 
+	node->lcore_id = RTE_MAX_LCORE;
 	node->id = node_id++;
 
 	/* Add the node at tail */
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
new file mode 100644
index 0000000000..4a2f99496d
--- /dev/null
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "graph_private.h"
+#include "rte_graph_model_dispatch.h"
+
+int
+rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned int lcore_id)
+{
+	struct node *node;
+	int ret = -EINVAL;
+
+	if (lcore_id >= RTE_MAX_LCORE)
+		return ret;
+
+	graph_spinlock_lock();
+
+	STAILQ_FOREACH(node, node_list_head_get(), next) {
+		if (strncmp(node->name, name, RTE_NODE_NAMESIZE) == 0) {
+			node->lcore_id = lcore_id;
+			ret = 0;
+			break;
+		}
+	}
+
+	graph_spinlock_unlock();
+
+	return ret;
+}
+
diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
new file mode 100644
index 0000000000..179624e972
--- /dev/null
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#ifndef _RTE_GRAPH_MODEL_DISPATCH_H_
+#define _RTE_GRAPH_MODEL_DISPATCH_H_
+
+/**
+ * @file rte_graph_model_dispatch.h
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * This API allows setting the core affinity of a node.
+ */
+#include "rte_graph_worker_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Set lcore affinity with the node.
+ *
+ * @param name
+ *   Valid node name. In the case of the cloned node, the name will be
+ * "parent node name" + "-" + name.
+ * @param lcore_id
+ *   The lcore ID value.
+ *
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
+						unsigned int lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_MODEL_DISPATCH_H_ */
diff --git a/lib/graph/version.map b/lib/graph/version.map
index eea73ec9ca..1f090be74e 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -46,5 +46,7 @@ EXPERIMENTAL {
 	rte_graph_worker_model_set;
 	rte_graph_worker_model_get;
 
+	rte_graph_model_dispatch_lcore_affinity_set;
+
 	local: *;
 };
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 06/15] graph: introduce graph bind unbind API
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (4 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 05/15] graph: introduce graph node core affinity API Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 07/15] graph: introduce graph clone API for other worker core Zhirun Yan
                         ` (9 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add lcore_id to the graph structure to hold the affinity core ID the
graph should run on. Add bind/unbind APIs to set/unset this affinity
attribute. lcore_id defaults to RTE_MAX_LCORE, which means the
attribute is disabled.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 59 +++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |  2 ++
 lib/graph/rte_graph.h     | 22 +++++++++++++++
 lib/graph/version.map     |  2 ++
 4 files changed, 85 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index a839a2803b..b39a99aac6 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -254,6 +254,64 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
 	return graph_mem_fixup_node_ctx(graph);
 }
 
+static __rte_always_inline bool
+graph_src_node_avail(struct graph *graph)
+{
+	struct graph_node *graph_node;
+
+	STAILQ_FOREACH(graph_node, &graph->node_list, next)
+		if ((graph_node->node->flags & RTE_NODE_SOURCE_F) &&
+		    (graph_node->node->lcore_id == RTE_MAX_LCORE ||
+		     graph->lcore_id == graph_node->node->lcore_id))
+			return true;
+
+	return false;
+}
+
+int
+rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	if (!rte_lcore_is_enabled(lcore))
+		SET_ERR_JMP(ENOLINK, fail,
+			    "lcore %d not enabled\n",
+			    lcore);
+
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = lcore;
+	graph->socket = rte_lcore_to_socket_id(lcore);
+
+	/* check the availability of source node */
+	if (!graph_src_node_avail(graph))
+		graph->graph->head = 0;
+
+	return 0;
+
+fail:
+	return -rte_errno;
+}
+
+void
+rte_graph_model_dispatch_core_unbind(rte_graph_t id)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = RTE_MAX_LCORE;
+
+fail:
+	return;
+}
+
 struct rte_graph *
 rte_graph_lookup(const char *name)
 {
@@ -340,6 +398,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->lcore_id = RTE_MAX_LCORE;
 	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
 	if (prm->pcap_filename)
 		rte_strscpy(graph->pcap_filename, prm->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 409eed3284..ad1d058945 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -98,6 +98,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	unsigned int lcore_id;
+	/**< Lcore identifier where the graph prefer to run on. */
 	size_t mem_sz;
 	/**< Memory size of the graph. */
 	int socket;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index c9a77297fc..c523809d1f 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -285,6 +285,28 @@ char *rte_graph_id_to_name(rte_graph_t id);
 __rte_experimental
 int rte_graph_export(const char *name, FILE *f);
 
+/**
+ * Bind graph with specific lcore
+ *
+ * @param id
+ *   Graph id to get the pointer of graph object
+ * @param lcore
+ *   The lcore on which the graph will run
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore);
+
+/**
+ * Unbind graph with lcore
+ *
+ * @param id
+ *   Graph id to get the pointer of graph object
+ */
+__rte_experimental
+void rte_graph_model_dispatch_core_unbind(rte_graph_t id);
+
 /**
  * Get graph object from its name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 1f090be74e..7de6f08f59 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -18,6 +18,8 @@ EXPERIMENTAL {
 	rte_graph_node_get_by_name;
 	rte_graph_obj_dump;
 	rte_graph_walk;
+	rte_graph_model_dispatch_core_bind;
+	rte_graph_model_dispatch_core_unbind;
 
 	rte_graph_cluster_stats_create;
 	rte_graph_cluster_stats_destroy;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 07/15] graph: introduce graph clone API for other worker core
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (5 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 06/15] graph: introduce graph bind unbind API Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 08/15] graph: add struct for stream moving between cores Zhirun Yan
                         ` (8 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch adds a graph API to clone the graph object for a specified
worker core. The new graph also clones all nodes of the parent graph.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 110 ++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |   2 +
 lib/graph/rte_graph.h     |  20 +++++++
 lib/graph/version.map     |   1 +
 4 files changed, 133 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index b39a99aac6..90eaad0378 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -398,6 +398,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->parent_id = RTE_GRAPH_ID_INVALID;
 	graph->lcore_id = RTE_MAX_LCORE;
 	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
 	if (prm->pcap_filename)
@@ -462,6 +463,115 @@ rte_graph_destroy(rte_graph_t id)
 	return rc;
 }
 
+static int
+clone_name(struct graph *graph, struct graph *parent_graph, const char *name)
+{
+	ssize_t sz, rc;
+
+#define SZ RTE_GRAPH_NAMESIZE
+	rc = rte_strscpy(graph->name, parent_graph->name, SZ);
+	if (rc < 0)
+		goto fail;
+	sz = rc;
+	rc = rte_strscpy(graph->name + sz, "-", RTE_MAX((int16_t)(SZ - sz), 0));
+	if (rc < 0)
+		goto fail;
+	sz += rc;
+	sz = rte_strscpy(graph->name + sz, name, RTE_MAX((int16_t)(SZ - sz), 0));
+	if (sz < 0)
+		goto fail;
+
+	return 0;
+fail:
+	rte_errno = E2BIG;
+	return -rte_errno;
+}
+
+static rte_graph_t
+graph_clone(struct graph *parent_graph, const char *name)
+{
+	struct graph_node *graph_node;
+	struct graph *graph;
+
+	graph_spinlock_lock();
+
+	/* Don't allow to clone a node from a cloned graph */
+	if (parent_graph->parent_id != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, fail, "A cloned graph is not allowed to be cloned");
+
+	/* Create graph object */
+	graph = calloc(1, sizeof(*graph));
+	if (graph == NULL)
+		SET_ERR_JMP(ENOMEM, fail, "Failed to calloc cloned graph object");
+
+	/* Naming ceremony of the new graph. name is node->name + "-" + name */
+	if (clone_name(graph, parent_graph, name))
+		goto free;
+
+	/* Check for existence of duplicate graph */
+	if (rte_graph_from_name(graph->name) != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, free, "Found duplicate graph %s",
+			    graph->name);
+
+	/* Clone nodes from parent graph firstly */
+	STAILQ_INIT(&graph->node_list);
+	STAILQ_FOREACH(graph_node, &parent_graph->node_list, next) {
+		if (graph_node_add(graph, graph_node->node))
+			goto graph_cleanup;
+	}
+
+	/* Just update adjacency list of all nodes in the graph */
+	if (graph_adjacency_list_update(graph))
+		goto graph_cleanup;
+
+	/* Initialize the graph object */
+	graph->src_node_count = parent_graph->src_node_count;
+	graph->node_count = parent_graph->node_count;
+	graph->parent_id = parent_graph->id;
+	graph->lcore_id = parent_graph->lcore_id;
+	graph->socket = parent_graph->socket;
+	graph->id = graph_id;
+
+	/* Allocate the Graph fast path memory and populate the data */
+	if (graph_fp_mem_create(graph))
+		goto graph_cleanup;
+
+	/* Call init() of the all the nodes in the graph */
+	if (graph_node_init(graph))
+		goto graph_mem_destroy;
+
+	/* All good, Lets add the graph to the list */
+	graph_id++;
+	STAILQ_INSERT_TAIL(&graph_list, graph, next);
+
+	graph_spinlock_unlock();
+	return graph->id;
+
+graph_mem_destroy:
+	graph_fp_mem_destroy(graph);
+graph_cleanup:
+	graph_cleanup(graph);
+free:
+	free(graph);
+fail:
+	graph_spinlock_unlock();
+	return RTE_GRAPH_ID_INVALID;
+}
+
+rte_graph_t
+rte_graph_clone(rte_graph_t id, const char *name)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			return graph_clone(graph, name);
+
+fail:
+	return RTE_GRAPH_ID_INVALID;
+}
+
 rte_graph_t
 rte_graph_from_name(const char *name)
 {
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index ad1d058945..d28a5af93e 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -98,6 +98,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	rte_graph_t parent_id;
+	/**< Parent graph identifier. */
 	unsigned int lcore_id;
 	/**< Lcore identifier where the graph prefer to run on. */
 	size_t mem_sz;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index c523809d1f..2f86c17de7 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -247,6 +247,26 @@ rte_graph_t rte_graph_create(const char *name, struct rte_graph_param *prm);
 __rte_experimental
 int rte_graph_destroy(rte_graph_t id);
 
+/**
+ * Clone Graph.
+ *
+ * Clone a graph from static graph (graph created from rte_graph_create). And
+ * all cloned graphs attached to the parent graph MUST be destroyed together
+ * for fast schedule design limitation (stop ALL graph walk firstly).
+ *
+ * @param id
+ *   Static graph id to clone from.
+ * @param name
+ *   Name of the new graph. The library prepends the parent graph name to the
+ * user-specified name. The final graph name will be,
+ * "parent graph name" + "-" + name.
+ *
+ * @return
+ *   Valid graph id on success, RTE_GRAPH_ID_INVALID otherwise.
+ */
+__rte_experimental
+rte_graph_t rte_graph_clone(rte_graph_t id, const char *name);
+
 /**
  * Get graph id from graph name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 7de6f08f59..aaa86f66ed 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -7,6 +7,7 @@ EXPERIMENTAL {
 
 	rte_graph_create;
 	rte_graph_destroy;
+	rte_graph_clone;
 	rte_graph_dump;
 	rte_graph_export;
 	rte_graph_from_name;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 08/15] graph: add struct for stream moving between cores
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (6 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 07/15] graph: introduce graph clone API for other worker core Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 09/15] graph: introduce stream moving cross cores Zhirun Yan
                         ` (7 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add the graph_sched_wq_node structure to hold a stream of objects queued
on the graph scheduling workqueue.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c                   |  1 +
 lib/graph/graph_populate.c          |  1 +
 lib/graph/graph_private.h           | 12 ++++++++++++
 lib/graph/rte_graph_worker_common.h | 21 +++++++++++++++++++++
 4 files changed, 35 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 90eaad0378..dd3d69dbf7 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -284,6 +284,7 @@ rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore)
 			break;
 
 	graph->lcore_id = lcore;
+	graph->graph->lcore_id = graph->lcore_id;
 	graph->socket = rte_lcore_to_socket_id(lcore);
 
 	/* check the availability of source node */
diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
index 2c0844ce92..7dcf1420c1 100644
--- a/lib/graph/graph_populate.c
+++ b/lib/graph/graph_populate.c
@@ -89,6 +89,7 @@ graph_nodes_populate(struct graph *_graph)
 		}
 		node->id = graph_node->node->id;
 		node->parent_id = pid;
+		node->lcore_id = graph_node->node->lcore_id;
 		nb_edges = graph_node->node->nb_edges;
 		node->nb_edges = nb_edges;
 		off += sizeof(struct rte_node);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index d28a5af93e..b66b18ebbc 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -60,6 +60,18 @@ struct node {
 	char next_nodes[][RTE_NODE_NAMESIZE]; /**< Names of next nodes. */
 };
 
+/**
+ * @internal
+ *
+ * Structure that holds the graph scheduling workqueue node stream.
+ * Used for mcore dispatch model.
+ */
+struct graph_sched_wq_node {
+	rte_graph_off_t node_off;
+	uint16_t nb_objs;
+	void *objs[RTE_GRAPH_BURST_SIZE];
+} __rte_cache_aligned;
+
 /**
  * @internal
  *
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 1526da6e2c..dc0a0b5554 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -30,6 +30,13 @@
 extern "C" {
 #endif
 
+/**
+ * @internal
+ *
+ * Singly-linked list head for graph schedule run-queue.
+ */
+SLIST_HEAD(rte_graph_rq_head, rte_graph);
+
 /**
  * @internal
  *
@@ -41,6 +48,15 @@ struct rte_graph {
 	uint32_t cir_mask;	     /**< Circular buffer wrap around mask. */
 	rte_node_t nb_nodes;	     /**< Number of nodes in the graph. */
 	rte_graph_off_t *cir_start;  /**< Pointer to circular buffer. */
+	/* Graph schedule */
+	struct rte_graph_rq_head *rq __rte_cache_aligned; /* The run-queue */
+	struct rte_graph_rq_head rq_head; /* The head for run-queue list */
+
+	SLIST_ENTRY(rte_graph) rq_next;   /* The next for run-queue list */
+	unsigned int lcore_id;  /**< The graph running Lcore. */
+	struct rte_ring *wq;    /**< The work-queue for pending streams. */
+	struct rte_mempool *mp; /**< The mempool for scheduling streams. */
+	/* Graph schedule area */
 	rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */
 	rte_graph_t id;	/**< Graph identifier. */
 	int socket;	/**< Socket ID where memory is allocated. */
@@ -74,6 +90,11 @@ struct rte_node {
 	/** Original process function when pcap is enabled. */
 	rte_node_process_t original_process;
 
+	RTE_STD_C11
+		union {
+		/* Fast schedule area for mcore dispatch model */
+		unsigned int lcore_id;  /**< Node running lcore. */
+		};
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 09/15] graph: introduce stream moving cross cores
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (7 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 08/15] graph: add struct for stream moving between cores Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 10/15] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
                         ` (6 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the key functions that allow a worker thread to
enqueue and move streams of objects to next nodes running on different
cores.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_private.h            |  27 +++++
 lib/graph/meson.build                |   2 +-
 lib/graph/rte_graph_model_dispatch.c | 145 +++++++++++++++++++++++++++
 lib/graph/rte_graph_model_dispatch.h |  37 +++++++
 lib/graph/version.map                |   2 +
 5 files changed, 212 insertions(+), 1 deletion(-)

diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index b66b18ebbc..e1a2a4bfd8 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -366,4 +366,31 @@ void graph_dump(FILE *f, struct graph *g);
  */
 void node_dump(FILE *f, struct node *n);
 
+/**
+ * @internal
+ *
+ * Create the graph schedule work queue. And all cloned graphs attached to the
+ * parent graph MUST be destroyed together for fast schedule design limitation.
+ *
+ * @param _graph
+ *   The graph object
+ * @param _parent_graph
+ *   The parent graph object which holds the run-queue head.
+ *
+ * @return
+ *   - 0: Success.
+ *   - <0: Graph schedule work queue related error.
+ */
+int graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph);
+
+/**
+ * @internal
+ *
+ * Destroy the graph schedule work queue.
+ *
+ * @param _graph
+ *   The graph object
+ */
+void graph_sched_wq_destroy(struct graph *_graph);
+
 #endif /* _RTE_GRAPH_PRIVATE_H_ */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index c729d984b6..e21affa280 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -20,4 +20,4 @@ sources = files(
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
-deps += ['eal', 'pcapng']
+deps += ['eal', 'pcapng', 'mempool', 'ring']
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
index 4a2f99496d..a300fefb85 100644
--- a/lib/graph/rte_graph_model_dispatch.c
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -5,6 +5,151 @@
 #include "graph_private.h"
 #include "rte_graph_model_dispatch.h"
 
+int
+graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph)
+{
+	struct rte_graph *parent_graph = _parent_graph->graph;
+	struct rte_graph *graph = _graph->graph;
+	unsigned int wq_size;
+
+	wq_size = GRAPH_SCHED_WQ_SIZE(graph->nb_nodes);
+	wq_size = rte_align32pow2(wq_size + 1);
+
+	graph->wq = rte_ring_create(graph->name, wq_size, graph->socket,
+				    RING_F_SC_DEQ);
+	if (graph->wq == NULL)
+		SET_ERR_JMP(EIO, fail, "Failed to allocate graph WQ");
+
+	graph->mp = rte_mempool_create(graph->name, wq_size,
+				       sizeof(struct graph_sched_wq_node),
+				       0, 0, NULL, NULL, NULL, NULL,
+				       graph->socket, MEMPOOL_F_SP_PUT);
+	if (graph->mp == NULL)
+		SET_ERR_JMP(EIO, fail_mp,
+			    "Failed to allocate graph WQ schedule entry");
+
+	graph->lcore_id = _graph->lcore_id;
+
+	if (parent_graph->rq == NULL) {
+		parent_graph->rq = &parent_graph->rq_head;
+		SLIST_INIT(parent_graph->rq);
+	}
+
+	graph->rq = parent_graph->rq;
+	SLIST_INSERT_HEAD(graph->rq, graph, rq_next);
+
+	return 0;
+
+fail_mp:
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+fail:
+	return -rte_errno;
+}
+
+void
+graph_sched_wq_destroy(struct graph *_graph)
+{
+	struct rte_graph *graph = _graph->graph;
+
+	if (graph == NULL)
+		return;
+
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+
+	rte_mempool_free(graph->mp);
+	graph->mp = NULL;
+}
+
+static __rte_always_inline bool
+__graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	uint16_t off = 0;
+	uint16_t size;
+
+submit_again:
+	if (rte_mempool_get(graph->mp, (void **)&wq_node) < 0)
+		goto fallback;
+
+	size = RTE_MIN(node->idx, RTE_DIM(wq_node->objs));
+	wq_node->node_off = node->off;
+	wq_node->nb_objs = size;
+	rte_memcpy(wq_node->objs, &node->objs[off], size * sizeof(void *));
+
+	while (rte_ring_mp_enqueue_bulk_elem(graph->wq, (void *)&wq_node,
+					  sizeof(wq_node), 1, NULL) == 0)
+		rte_pause();
+
+	off += size;
+	node->idx -= size;
+	if (node->idx > 0)
+		goto submit_again;
+
+	return true;
+
+fallback:
+	if (off != 0)
+		memmove(&node->objs[0], &node->objs[off],
+			node->idx * sizeof(void *));
+
+	return false;
+}
+
+bool __rte_noinline
+__rte_graph_sched_node_enqueue(struct rte_node *node,
+			       struct rte_graph_rq_head *rq)
+{
+	const unsigned int lcore_id = node->lcore_id;
+	struct rte_graph *graph;
+
+	SLIST_FOREACH(graph, rq, rq_next)
+		if (graph->lcore_id == lcore_id)
+			break;
+
+	return graph != NULL ? __graph_sched_node_enqueue(node, graph) : false;
+}
+
+void
+__rte_graph_sched_wq_process(struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	struct rte_mempool *mp = graph->mp;
+	struct rte_ring *wq = graph->wq;
+	uint16_t idx, free_space;
+	struct rte_node *node;
+	unsigned int i, n;
+	struct graph_sched_wq_node *wq_nodes[32];
+
+	n = rte_ring_sc_dequeue_burst_elem(wq, wq_nodes, sizeof(wq_nodes[0]),
+					   RTE_DIM(wq_nodes), NULL);
+	if (n == 0)
+		return;
+
+	for (i = 0; i < n; i++) {
+		wq_node = wq_nodes[i];
+		node = RTE_PTR_ADD(graph, wq_node->node_off);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		idx = node->idx;
+		free_space = node->size - idx;
+
+		if (unlikely(free_space < wq_node->nb_objs))
+			__rte_node_stream_alloc_size(graph, node, node->size + wq_node->nb_objs);
+
+		memmove(&node->objs[idx], wq_node->objs, wq_node->nb_objs * sizeof(void *));
+		memset(wq_node->objs, 0, wq_node->nb_objs * sizeof(void *));
+		node->idx = idx + wq_node->nb_objs;
+
+		__rte_node_process(graph, node);
+
+		wq_node->nb_objs = 0;
+		node->idx = 0;
+	}
+
+	rte_mempool_put_bulk(mp, (void **)wq_nodes, n);
+}
+
 int
 rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned int lcore_id)
 {
diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
index 179624e972..18fa7ce0ab 100644
--- a/lib/graph/rte_graph_model_dispatch.h
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -14,12 +14,49 @@
  *
  * This API allows setting the core affinity of a node.
  */
+#include <rte_errno.h>
+#include <rte_mempool.h>
+#include <rte_memzone.h>
+#include <rte_ring.h>
+
 #include "rte_graph_worker_common.h"
 
 #ifdef __cplusplus
 extern "C" {
 #endif
 
+#define GRAPH_SCHED_WQ_SIZE_MULTIPLIER  8
+#define GRAPH_SCHED_WQ_SIZE(nb_nodes)   \
+	((typeof(nb_nodes))((nb_nodes) * GRAPH_SCHED_WQ_SIZE_MULTIPLIER))
+
+/**
+ * @internal
+ *
+ * Schedule the node to the right graph's work queue.
+ *
+ * @param node
+ *   Pointer to the scheduled node object.
+ * @param rq
+ *   Pointer to the scheduled run-queue for all graphs.
+ *
+ * @return
+ *   True on success, false otherwise.
+ */
+__rte_experimental
+bool __rte_noinline __rte_graph_sched_node_enqueue(struct rte_node *node,
+				    struct rte_graph_rq_head *rq);
+
+/**
+ * @internal
+ *
+ * Process all nodes (streams) in the graph's work queue.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ */
+__rte_experimental
+void __rte_graph_sched_wq_process(struct rte_graph *graph);
+
 /**
  * Set lcore affinity with the node.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index aaa86f66ed..d511133f39 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -48,6 +48,8 @@ EXPERIMENTAL {
 
 	rte_graph_worker_model_set;
 	rte_graph_worker_model_get;
+	__rte_graph_sched_wq_process;
+	__rte_graph_sched_node_enqueue;
 
 	rte_graph_model_dispatch_lcore_affinity_set;
 
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 10/15] graph: enable create and destroy graph scheduling workqueue
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (8 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 09/15] graph: introduce stream moving cross cores Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 11/15] graph: introduce graph walk by cross-core dispatch Zhirun Yan
                         ` (5 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch hooks creation and destruction of the scheduling workqueue
into the common graph operations.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index dd3d69dbf7..1f1ee9b622 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -443,6 +443,10 @@ rte_graph_destroy(rte_graph_t id)
 	while (graph != NULL) {
 		tmp = STAILQ_NEXT(graph, next);
 		if (graph->id == id) {
+			/* Destroy the scheduling work queue if it exists */
+			if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+				graph_sched_wq_destroy(graph);
+
 			/* Call fini() of the all the nodes in the graph */
 			graph_node_fini(graph);
 			/* Destroy graph fast path memory */
@@ -537,6 +541,11 @@ graph_clone(struct graph *parent_graph, const char *name)
 	if (graph_fp_mem_create(graph))
 		goto graph_cleanup;
 
+	/* Create the graph schedule work queue */
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH &&
+	    graph_sched_wq_create(graph, parent_graph))
+		goto graph_mem_destroy;
+
 	/* Call init() of the all the nodes in the graph */
 	if (graph_node_init(graph))
 		goto graph_mem_destroy;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 11/15] graph: introduce graph walk by cross-core dispatch
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (9 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 10/15] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 12/15] graph: enable graph multicore dispatch scheduler model Zhirun Yan
                         ` (4 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the task scheduler mechanism to enable dispatching
tasks to other worker cores. Currently, there is only a local work
queue for each graph to walk. We introduce a scheduling work queue on
each worker core for dispatching tasks. The walk will process the
scheduling work queue first, then handle the local work queue.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_dispatch.h | 42 ++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
index 18fa7ce0ab..65b2cc6d87 100644
--- a/lib/graph/rte_graph_model_dispatch.h
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -73,6 +73,48 @@ __rte_experimental
 int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
 						unsigned int lcore_id);
 
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk_mcore_dispatch(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+
+	if (graph->wq != NULL)
+		__rte_graph_sched_wq_process(graph);
+
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+
+	/* Skip the src nodes which are not bound to the current worker */
+		if ((int32_t)head < 0 && node->lcore_id != graph->lcore_id)
+			continue;
+
+		/* Schedule the node until all task/objs are done */
+		if (node->lcore_id != RTE_MAX_LCORE &&
+		    graph->lcore_id != node->lcore_id && graph->rq != NULL &&
+		    __rte_graph_sched_node_enqueue(node, graph->rq))
+			continue;
+
+		__rte_node_process(graph, node);
+
+		head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+
+	graph->tail = 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 12/15] graph: enable graph multicore dispatch scheduler model
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (10 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 11/15] graph: introduce graph walk by cross-core dispatch Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 13/15] graph: add stats for cross-core dispatching Zhirun Yan
                         ` (3 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch enables choosing the new scheduler model.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_worker.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index 7ea18ba80a..d608c7513e 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -10,6 +10,7 @@ extern "C" {
 #endif
 
 #include "rte_graph_model_rtc.h"
+#include "rte_graph_model_dispatch.h"
 
 /**
  * Perform graph walk on the circular buffer and invoke the process function
@@ -24,7 +25,13 @@ __rte_experimental
 static inline void
 rte_graph_walk(struct rte_graph *graph)
 {
-	rte_graph_walk_rtc(graph);
+	int model = rte_graph_worker_model_get();
+
+	if (model == RTE_GRAPH_MODEL_DEFAULT ||
+	    model == RTE_GRAPH_MODEL_RTC)
+		rte_graph_walk_rtc(graph);
+	else if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		rte_graph_walk_mcore_dispatch(graph);
 }
 
 #ifdef __cplusplus
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 13/15] graph: add stats for cross-core dispatching
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (11 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 12/15] graph: enable graph multicore dispatch scheduler model Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model Zhirun Yan
                         ` (2 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add stats for cross-core dispatching scheduler if stats collection is
enabled.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_debug.c              |  6 +++
 lib/graph/graph_stats.c              | 74 +++++++++++++++++++++++++---
 lib/graph/rte_graph.h                |  2 +
 lib/graph/rte_graph_model_dispatch.c |  3 ++
 lib/graph/rte_graph_worker_common.h  |  2 +
 5 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/lib/graph/graph_debug.c b/lib/graph/graph_debug.c
index b84412f5dd..7dcf07b080 100644
--- a/lib/graph/graph_debug.c
+++ b/lib/graph/graph_debug.c
@@ -74,6 +74,12 @@ rte_graph_obj_dump(FILE *f, struct rte_graph *g, bool all)
 		fprintf(f, "       size=%d\n", n->size);
 		fprintf(f, "       idx=%d\n", n->idx);
 		fprintf(f, "       total_objs=%" PRId64 "\n", n->total_objs);
+		if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+			fprintf(f, "       total_sched_objs=%" PRId64 "\n",
+				n->total_sched_objs);
+			fprintf(f, "       total_sched_fail=%" PRId64 "\n",
+				n->total_sched_fail);
+		}
 		fprintf(f, "       total_calls=%" PRId64 "\n", n->total_calls);
 		for (i = 0; i < n->nb_edges; i++)
 			fprintf(f, "          edge[%d] <%s>\n", i,
diff --git a/lib/graph/graph_stats.c b/lib/graph/graph_stats.c
index c0140ba922..aa22cc403c 100644
--- a/lib/graph/graph_stats.c
+++ b/lib/graph/graph_stats.c
@@ -40,13 +40,19 @@ struct rte_graph_cluster_stats {
 	struct cluster_node clusters[];
 } __rte_cache_aligned;
 
+#define boarder_model_dispatch()                                                              \
+	fprintf(f, "+-------------------------------+---------------+--------" \
+		   "-------+---------------+---------------+---------------+" \
+		   "---------------+---------------+-" \
+		   "----------+\n")
+
 #define boarder()                                                              \
 	fprintf(f, "+-------------------------------+---------------+--------" \
 		   "-------+---------------+---------------+---------------+-" \
 		   "----------+\n")
 
 static inline void
-print_banner(FILE *f)
+print_banner_default(FILE *f)
 {
 	boarder();
 	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s\n", "|Node", "|calls",
@@ -55,6 +61,27 @@ print_banner(FILE *f)
 	boarder();
 }
 
+static inline void
+print_banner_dispatch(FILE *f)
+{
+	boarder_model_dispatch();
+	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s%-16s%-16s\n",
+		"|Node", "|calls",
+		"|objs", "|sched objs", "|sched fail",
+		"|realloc_count", "|objs/call", "|objs/sec(10E6)",
+		"|cycles/call|");
+	boarder_model_dispatch();
+}
+
+static inline void
+print_banner(FILE *f)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		print_banner_dispatch(f);
+	else
+		print_banner_default(f);
+}
+
 static inline void
 print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 {
@@ -76,11 +103,21 @@ print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 	objs_per_sec = ts_per_hz ? (objs - prev_objs) / ts_per_hz : 0;
 	objs_per_sec /= 1000000;
 
-	fprintf(f,
-		"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
-		"|%-15.3f|%-15.6f|%-11.4f|\n",
-		stat->name, calls, objs, stat->realloc_count, objs_per_call,
-		objs_per_sec, cycles_per_call);
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->sched_objs,
+			stat->sched_fail, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	} else {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	}
 }
 
 static int
@@ -88,13 +125,20 @@ graph_cluster_stats_cb(bool is_first, bool is_last, void *cookie,
 		       const struct rte_graph_cluster_node_stats *stat)
 {
 	FILE *f = cookie;
+	int model;
+
+	model = rte_graph_worker_model_get();
 
 	if (unlikely(is_first))
 		print_banner(f);
 	if (stat->objs)
 		print_node(f, stat);
-	if (unlikely(is_last))
-		boarder();
+	if (unlikely(is_last)) {
+		if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+			boarder_model_dispatch();
+		else
+			boarder();
+	}
 
 	return 0;
 };
@@ -332,13 +376,21 @@ static inline void
 cluster_node_arregate_stats(struct cluster_node *cluster)
 {
 	uint64_t calls = 0, cycles = 0, objs = 0, realloc_count = 0;
+	uint64_t sched_objs = 0, sched_fail = 0;
 	struct rte_graph_cluster_node_stats *stat = &cluster->stat;
 	struct rte_node *node;
 	rte_node_t count;
+	int model;
 
+	model = rte_graph_worker_model_get();
 	for (count = 0; count < cluster->nb_nodes; count++) {
 		node = cluster->nodes[count];
 
+		if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+			sched_objs += node->total_sched_objs;
+			sched_fail += node->total_sched_fail;
+		}
+
 		calls += node->total_calls;
 		objs += node->total_objs;
 		cycles += node->total_cycles;
@@ -348,6 +400,12 @@ cluster_node_arregate_stats(struct cluster_node *cluster)
 	stat->calls = calls;
 	stat->objs = objs;
 	stat->cycles = cycles;
+
+	if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+		stat->sched_objs = sched_objs;
+		stat->sched_fail = sched_fail;
+	}
+
 	stat->ts = rte_get_timer_cycles();
 	stat->realloc_count = realloc_count;
 }
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index 2f86c17de7..7d77a790ac 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -208,6 +208,8 @@ struct rte_graph_cluster_node_stats {
 	uint64_t prev_calls;	/**< Previous number of calls. */
 	uint64_t prev_objs;	/**< Previous number of processed objs. */
 	uint64_t prev_cycles;	/**< Previous number of cycles. */
+	uint64_t sched_objs;	/**< Number of scheduled objs. */
+	uint64_t sched_fail;	/**< Number of objs that failed to be scheduled. */
 
 	uint64_t realloc_count; /**< Realloc count. */
 
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
index a300fefb85..9db60eb463 100644
--- a/lib/graph/rte_graph_model_dispatch.c
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -83,6 +83,7 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		rte_pause();
 
 	off += size;
+	node->total_sched_objs += size;
 	node->idx -= size;
 	if (node->idx > 0)
 		goto submit_again;
@@ -94,6 +95,8 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		memmove(&node->objs[0], &node->objs[off],
 			node->idx * sizeof(void *));
 
+	node->total_sched_fail += node->idx;
+
 	return false;
 }
 
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index dc0a0b5554..d94983589c 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -95,6 +95,8 @@ struct rte_node {
 		/* Fast schedule area for mcore dispatch model */
 		unsigned int lcore_id;  /**< Node running lcore. */
 		};
+	uint64_t total_sched_objs; /**< Number of objects scheduled. */
+	uint64_t total_sched_fail; /**< Number of scheduled failure. */
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (12 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 13/15] graph: add stats for cross-core dispatching Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-30  6:18       ` [PATCH v4 15/15] doc: update multicore dispatch model in graph guides Zhirun Yan
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add a new parameter "model" to choose the dispatch or rtc worker model.
In the dispatch model, nodes are affinitized to worker cores successively.

Note:
only one Rx node is supported for the dispatch model in the current
implementation.

./dpdk-l3fwd-graph  -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="dispatch"

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 examples/l3fwd-graph/main.c | 237 +++++++++++++++++++++++++++++-------
 1 file changed, 195 insertions(+), 42 deletions(-)

diff --git a/examples/l3fwd-graph/main.c b/examples/l3fwd-graph/main.c
index 5feeab4f0f..cfa78003f4 100644
--- a/examples/l3fwd-graph/main.c
+++ b/examples/l3fwd-graph/main.c
@@ -55,6 +55,9 @@
 
 #define NB_SOCKETS 8
 
+/* Graph module */
+#define WORKER_MODEL_RTC "rtc"
+#define WORKER_MODEL_MCORE_DISPATCH "dispatch"
 /* Static global variables used within this file. */
 static uint16_t nb_rxd = RX_DESC_DEFAULT;
 static uint16_t nb_txd = TX_DESC_DEFAULT;
@@ -88,6 +91,10 @@ struct lcore_rx_queue {
 	char node_name[RTE_NODE_NAMESIZE];
 };
 
+struct model_conf {
+	enum rte_graph_worker_model model;
+};
+
 /* Lcore conf */
 struct lcore_conf {
 	uint16_t n_rx_queue;
@@ -153,6 +160,19 @@ static struct ipv4_l3fwd_lpm_route ipv4_l3fwd_lpm_route_array[] = {
 	{RTE_IPV4(198, 18, 6, 0), 24, 6}, {RTE_IPV4(198, 18, 7, 0), 24, 7},
 };
 
+static int
+check_worker_model_params(void)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH &&
+	    nb_lcore_params > 1) {
		printf("Exceeded max number of lcore params for dispatch model: %hu\n",
+		       nb_lcore_params);
+		return -1;
+	}
+
+	return 0;
+}
+
 static int
 check_lcore_params(void)
 {
@@ -276,6 +296,7 @@ print_usage(const char *prgname)
 		"  --eth-dest=X,MM:MM:MM:MM:MM:MM: Ethernet destination for "
 		"port X\n"
 		"  --max-pkt-len PKTLEN: maximum packet length in decimal (64-9600)\n"
		"  --model NAME: walking model name, dispatch or rtc (default)\n"
 		"  --no-numa: Disable numa awareness\n"
 		"  --per-port-pool: Use separate buffer pool per port\n"
 		"  --pcap-enable: Enables pcap capture\n"
@@ -318,6 +339,20 @@ parse_max_pkt_len(const char *pktlen)
 	return len;
 }
 
+static int
+parse_worker_model(const char *model)
+{
+	if (strcmp(model, WORKER_MODEL_MCORE_DISPATCH) == 0) {
+		rte_graph_worker_model_set(RTE_GRAPH_MODEL_MCORE_DISPATCH);
+		return RTE_GRAPH_MODEL_MCORE_DISPATCH;
+	} else if (strcmp(model, WORKER_MODEL_RTC) == 0)
+		return RTE_GRAPH_MODEL_RTC;
+
+	rte_exit(EXIT_FAILURE, "Invalid worker model: %s", model);
+
+	return RTE_GRAPH_MODEL_LIST_END;
+}
+
 static int
 parse_portmask(const char *portmask)
 {
@@ -434,6 +469,8 @@ static const char short_options[] = "p:" /* portmask */
 #define CMD_LINE_OPT_PCAP_ENABLE   "pcap-enable"
 #define CMD_LINE_OPT_NUM_PKT_CAP   "pcap-num-cap"
 #define CMD_LINE_OPT_PCAP_FILENAME "pcap-file-name"
+#define CMD_LINE_OPT_WORKER_MODEL  "model"
+
 enum {
 	/* Long options mapped to a short option */
 
@@ -449,6 +486,7 @@ enum {
 	CMD_LINE_OPT_PARSE_PCAP_ENABLE,
 	CMD_LINE_OPT_PARSE_NUM_PKT_CAP,
 	CMD_LINE_OPT_PCAP_FILENAME_CAP,
+	CMD_LINE_OPT_WORKER_MODEL_TYPE,
 };
 
 static const struct option lgopts[] = {
@@ -460,6 +498,7 @@ static const struct option lgopts[] = {
 	{CMD_LINE_OPT_PCAP_ENABLE, 0, 0, CMD_LINE_OPT_PARSE_PCAP_ENABLE},
 	{CMD_LINE_OPT_NUM_PKT_CAP, 1, 0, CMD_LINE_OPT_PARSE_NUM_PKT_CAP},
 	{CMD_LINE_OPT_PCAP_FILENAME, 1, 0, CMD_LINE_OPT_PCAP_FILENAME_CAP},
+	{CMD_LINE_OPT_WORKER_MODEL, 1, 0, CMD_LINE_OPT_WORKER_MODEL_TYPE},
 	{NULL, 0, 0, 0},
 };
 
@@ -551,6 +590,11 @@ parse_args(int argc, char **argv)
 			printf("Pcap file name: %s\n", pcap_filename);
 			break;
 
+		case CMD_LINE_OPT_WORKER_MODEL_TYPE:
+			printf("Use new worker model: %s\n", optarg);
+			parse_worker_model(optarg);
+			break;
+
 		default:
 			print_usage(prgname);
 			return -1;
@@ -726,15 +770,15 @@ print_stats(void)
 static int
 graph_main_loop(void *conf)
 {
+	struct model_conf *mconf = conf;
 	struct lcore_conf *qconf;
 	struct rte_graph *graph;
 	uint32_t lcore_id;
 
-	RTE_SET_USED(conf);
-
 	lcore_id = rte_lcore_id();
 	qconf = &lcore_conf[lcore_id];
 	graph = qconf->graph;
+	rte_graph_worker_model_set(mconf->model);
 
 	if (!graph) {
 		RTE_LOG(INFO, L3FWD_GRAPH, "Lcore %u has nothing to do\n",
@@ -788,6 +832,141 @@ config_port_max_pkt_len(struct rte_eth_conf *conf,
 	return 0;
 }
 
+static void
+graph_config_mcore_dispatch(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	int worker_count = rte_lcore_count() - 1;
+	int main_lcore_id = rte_get_main_lcore();
+	int worker_lcore = main_lcore_id;
+	rte_graph_t main_graph_id = 0;
+	struct rte_node *node_tmp;
+	struct lcore_conf *qconf;
+	struct rte_graph *graph;
+	rte_graph_t graph_id;
+	rte_graph_off_t off;
+	int n_rx_node = 0;
+	rte_node_t count;
+	int i, j;
+	int ret;
+
+	for (j = 0; j < nb_lcore_params; j++) {
+		qconf = &lcore_conf[lcore_params[j].lcore_id];
+		/* Add rx node patterns of all lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			char *node_name = qconf->rx_queue_list[i].node_name;
+
+			graph_conf.node_patterns[nb_patterns + n_rx_node + i] = node_name;
+			n_rx_node++;
+			ret = rte_graph_model_dispatch_lcore_affinity_set(node_name,
+									lcore_params[j].lcore_id);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n", node_name,
+				       lcore_params[j].lcore_id);
+		}
+	}
+
+	graph_conf.nb_node_patterns = nb_patterns + n_rx_node;
+	graph_conf.socket_id = rte_lcore_to_socket_id(main_lcore_id);
+
+	qconf = &lcore_conf[main_lcore_id];
+	snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+		 main_lcore_id);
+
+	/* create main graph */
+	main_graph_id = rte_graph_create(qconf->name, &graph_conf);
+	if (main_graph_id == RTE_GRAPH_ID_INVALID)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_create(): main_graph_id invalid for lcore %u\n",
+			 main_lcore_id);
+
+	qconf->graph_id = main_graph_id;
+	qconf->graph = rte_graph_lookup(qconf->name);
+	/* >8 End of graph initialization. */
+	if (!qconf->graph)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_lookup(): graph %s not found\n",
+			 qconf->name);
+
+	graph = qconf->graph;
+	rte_graph_foreach_node(count, off, graph, node_tmp) {
+		worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+
+		/* Need to set the node Lcore affinity before clone graph for each lcore */
+		if (node_tmp->lcore_id == RTE_MAX_LCORE) {
+			ret = rte_graph_model_dispatch_lcore_affinity_set(node_tmp->name,
+									worker_lcore);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n",
+				       node_tmp->name, worker_lcore);
+		}
+	}
+
+	worker_lcore = main_lcore_id;
+	for (i = 0; i < worker_count; i++) {
+		worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+
+		qconf = &lcore_conf[worker_lcore];
+		snprintf(qconf->name, sizeof(qconf->name), "cloned-%u", worker_lcore);
+		graph_id = rte_graph_clone(main_graph_id, qconf->name);
+		ret = rte_graph_model_dispatch_core_bind(graph_id, worker_lcore);
+		if (ret == 0)
+			printf("bind graph %d to lcore %u\n", graph_id, worker_lcore);
+
+		/* full cloned graph name */
+		snprintf(qconf->name, sizeof(qconf->name), "%s",
+			 rte_graph_id_to_name(graph_id));
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "Failed to lookup graph %s\n",
+				 qconf->name);
+		continue;
+	}
+}
+
+static void
+graph_config_rtc(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	struct lcore_conf *qconf;
+	rte_graph_t graph_id;
+	uint32_t lcore_id;
+	rte_edge_t i;
+
+	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+		if (rte_lcore_is_enabled(lcore_id) == 0)
+			continue;
+
+		qconf = &lcore_conf[lcore_id];
+		/* Skip graph creation if no source exists */
+		if (!qconf->n_rx_queue)
+			continue;
+		/* Add rx node patterns of this lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			graph_conf.node_patterns[nb_patterns + i] =
+				qconf->rx_queue_list[i].node_name;
+		}
+		graph_conf.nb_node_patterns = nb_patterns + i;
+		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
+		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+			 lcore_id);
+		graph_id = rte_graph_create(qconf->name, &graph_conf);
+		if (graph_id == RTE_GRAPH_ID_INVALID)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_create(): graph_id invalid for lcore %u\n",
+				 lcore_id);
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		/* >8 End of graph initialization. */
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_lookup(): graph %s not found\n",
+				 qconf->name);
+	}
+}
+
 int
 main(int argc, char **argv)
 {
@@ -808,10 +987,12 @@ main(int argc, char **argv)
 	uint16_t queueid, portid, i;
 	const char **node_patterns;
 	struct lcore_conf *qconf;
+	struct model_conf mconf;
 	uint16_t nb_graphs = 0;
 	uint16_t nb_patterns;
 	uint8_t rewrite_len;
 	uint32_t lcore_id;
+	uint16_t model;
 	int ret;
 
 	/* Init EAL */
@@ -840,6 +1021,9 @@ main(int argc, char **argv)
 	if (check_lcore_params() < 0)
 		rte_exit(EXIT_FAILURE, "check_lcore_params() failed\n");
 
+	if (check_worker_model_params() < 0)
+		rte_exit(EXIT_FAILURE, "check_worker_model_params() failed\n");
+
 	ret = init_lcore_rx_queues();
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "init_lcore_rx_queues() failed\n");
@@ -1079,51 +1263,18 @@ main(int argc, char **argv)
 
 	memset(&graph_conf, 0, sizeof(graph_conf));
 	graph_conf.node_patterns = node_patterns;
+	graph_conf.nb_node_patterns = nb_patterns;
 
 	/* Pcap config */
 	graph_conf.pcap_enable = pcap_trace_enable;
 	graph_conf.num_pkt_to_capture = packet_to_capture;
 	graph_conf.pcap_filename = pcap_filename;
 
-	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
-		rte_graph_t graph_id;
-		rte_edge_t i;
-
-		if (rte_lcore_is_enabled(lcore_id) == 0)
-			continue;
-
-		qconf = &lcore_conf[lcore_id];
-
-		/* Skip graph creation if no source exists */
-		if (!qconf->n_rx_queue)
-			continue;
-
-		/* Add rx node patterns of this lcore */
-		for (i = 0; i < qconf->n_rx_queue; i++) {
-			graph_conf.node_patterns[nb_patterns + i] =
-				qconf->rx_queue_list[i].node_name;
-		}
-
-		graph_conf.nb_node_patterns = nb_patterns + i;
-		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
-
-		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
-			 lcore_id);
-
-		graph_id = rte_graph_create(qconf->name, &graph_conf);
-		if (graph_id == RTE_GRAPH_ID_INVALID)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_create(): graph_id invalid"
-				 " for lcore %u\n", lcore_id);
-
-		qconf->graph_id = graph_id;
-		qconf->graph = rte_graph_lookup(qconf->name);
-		/* >8 End of graph initialization. */
-		if (!qconf->graph)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_lookup(): graph %s not found\n",
-				 qconf->name);
-	}
+	model = rte_graph_worker_model_get();
+	if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		graph_config_mcore_dispatch(graph_conf);
+	else
+		graph_config_rtc(graph_conf);
 
 	memset(&rewrite_data, 0, sizeof(rewrite_data));
 	rewrite_len = sizeof(rewrite_data);
@@ -1174,8 +1325,10 @@ main(int argc, char **argv)
 	}
 	/* >8 End of adding route to ip4 graph infa. */
 
+	mconf.model = model;
 	/* Launch per-lcore init on every worker lcore */
-	rte_eal_mp_remote_launch(graph_main_loop, NULL, SKIP_MAIN);
+	rte_eal_mp_remote_launch(graph_main_loop, &mconf,
+				 SKIP_MAIN);
 
 	/* Accumulate and print stats on main until exit */
 	if (rte_graph_has_stats_feature())
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v4 15/15] doc: update multicore dispatch model in graph guides
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (13 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model Zhirun Yan
@ 2023-03-30  6:18       ` Zhirun Yan
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-30  6:18 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Update graph documentation to introduce the new multicore dispatch model.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 doc/guides/prog_guide/graph_lib.rst | 59 +++++++++++++++++++++++++++--
 1 file changed, 55 insertions(+), 4 deletions(-)

diff --git a/doc/guides/prog_guide/graph_lib.rst b/doc/guides/prog_guide/graph_lib.rst
index 1cfdc86433..72e26f3a5a 100644
--- a/doc/guides/prog_guide/graph_lib.rst
+++ b/doc/guides/prog_guide/graph_lib.rst
@@ -189,14 +189,65 @@ In the above example, A graph object will be created with ethdev Rx
 node of port 0 and queue 0, all ipv4* nodes in the system,
 and ethdev tx node of all ports.
 
-Multicore graph processing
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-In the current graph library implementation, specifically,
-``rte_graph_walk()`` and ``rte_node_enqueue*`` fast path API functions
+Graph model choosing
+~~~~~~~~~~~~~~~~~~~~
+Currently, there are two different walking models. Use
+``rte_graph_worker_model_set()`` to set the walking model.
+
+RTC (Run-To-Completion)
+^^^^^^^^^^^^^^^^^^^^^^^
+This is the default graph walking model. Specifically,
+``rte_graph_walk_rtc()`` and ``rte_node_enqueue*`` fast path API functions
 are designed to work on single-core to have better performance.
 The fast path API works on graph object, So the multi-core graph
 processing strategy would be to create graph object PER WORKER.
 
+Example:
+
+Graph: node-0 -> node-1 -> node-2 @Core0.
+
+.. code-block:: diff
+
+    + - - - - - - - - - - - - - - - - - - - - - +
+    '                  Core #0                  '
+    '                                           '
+    ' +--------+     +---------+     +--------+ '
+    ' | Node-0 | --> | Node-1  | --> | Node-2 | '
+    ' +--------+     +---------+     +--------+ '
+    '                                           '
+    + - - - - - - - - - - - - - - - - - - - - - +
+
+Dispatch model
+^^^^^^^^^^^^^^
+The dispatch model enables a cross-core dispatching mechanism which employs
+a scheduling work-queue to dispatch streams to other worker cores that
+are associated with the destination node.
+
+Use ``rte_graph_model_dispatch_lcore_affinity_set()`` to set the lcore
+affinity of a node.
+Each worker core will have a replica of the graph. Use ``rte_graph_clone()``
+to clone the graph for each worker and ``rte_graph_model_dispatch_core_bind()``
+to bind the graph to the worker core.
+
+Example:
+
+Graph topo: node-0 -> Core1; node-1 -> node-2; node-2 -> node-3.
+Config graph: node-0 @Core0; node-1/3 @Core1; node-2 @Core2.
+
+.. code-block:: diff
+
+    + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
+    '  Core #0   '     '          Core #1         '     '  Core #2   '
+    '            '     '                          '     '            '
+    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
+    ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
+    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
+    '            '     '     |                    '     '      ^     '
+    + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
+                             |                                 |
+                             + - - - - - - - - - - - - - - - - +
+
+
 In fast path
 ~~~~~~~~~~~~
 Typical fast-path code looks like below, where the application
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 00/15] graph enhancement for multi-core dispatch
  2023-03-30  6:18     ` [PATCH v4 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                         ` (14 preceding siblings ...)
  2023-03-30  6:18       ` [PATCH v4 15/15] doc: update multicore dispatch model in graph guides Zhirun Yan
@ 2023-03-31  4:02       ` Zhirun Yan
  2023-03-31  4:02         ` [PATCH v5 01/15] graph: rename rte_graph_work as common Zhirun Yan
                           ` (15 more replies)
  15 siblings, 16 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:02 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

V5:
Fix CI build issues with dynamically updated documentation.

V4:
Fix CI build issues with undefined references to the sched APIs.
Remove inline for model setting.

V3:
Fix CI build issues with TLS usage and a typo.

V2:
Use git mv to keep git history.
Use TLS for per-thread local storage.
Change model name to mcore dispatch.
Change API with specific mode name.
Split big patch.
Fix CI issues.
Rebase l3fwd-graph example.
Update doc and maintainers files.


Currently, rte_graph supports the RTC (Run-To-Completion) model, where a
whole graph runs within a single core.
RTC is one of the typical packet processing models. Others, like
Pipeline or Hybrid, lack support.

The patch set introduces a 'multicore dispatch' model selection, which
is a self-reacting scheme based on core affinity.
The new model enables a cross-core dispatching mechanism which employs a
scheduling work-queue to dispatch streams to the worker cores associated
with the destination node. When the core flavor of the destination node
is the default 'current', the stream continues to be executed as normal.

Example:
3-node graph targets 3-core budget

RTC:
Graph: node-0 -> node-1 -> node-2 @Core0.

+ - - - - - - - - - - - - - - - - - - - - - +
'                Core #0/1/2                '
'                                           '
' +--------+     +---------+     +--------+ '
' | Node-0 | --> | Node-1  | --> | Node-2 | '
' +--------+     +---------+     +--------+ '
'                                           '
+ - - - - - - - - - - - - - - - - - - - - - +

Dispatch:

Graph topo: node-0 -> Core1; node-1 -> node-2; node-2 -> node-3.
Config graph: node-0 @Core0; node-1/3 @Core1; node-2 @Core2.

.. code-block:: diff

    + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
    '  Core #0   '     '          Core #1         '     '  Core #2   '
    '            '     '                          '     '            '
    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
    ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
    '            '     '     |                    '     '      ^     '
    + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
                             |                                 |
                             + - - - - - - - - - - - - - - - - +


The patch set is broken down as below:

1. Split graph worker into common and default model part.
2. Inline graph node processing to make it reusable.
3. Add set/get APIs to choose worker model.
4. Introduce core affinity API to set the node run on specific worker core.
  (only use in new model)
5. Introduce graph affinity API to bind one graph with specific worker
  core.
6. Introduce graph clone API.
7. Introduce stream moving with scheduler work-queue in patch 8~12.
8. Add stats for new models.
9. Abstract default graph config process and integrate new model into
  example/l3fwd-graph. Add new parameters for model choosing.

We can run the new worker model like this:
./dpdk-l3fwd-graph -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="dispatch"

References:
https://static.sched.com/hosted_files/dpdkuserspace22/a6/graph%20introduce%20remote%20dispatch%20for%20mult-core%20scaling.pdf



Zhirun Yan (15):
  graph: rename rte_graph_work as common
  graph: split graph worker into common and default model
  graph: move node process into inline function
  graph: add get/set graph worker model APIs
  graph: introduce graph node core affinity API
  graph: introduce graph bind unbind API
  graph: introduce graph clone API for other worker core
  graph: add struct for stream moving between cores
  graph: introduce stream moving cross cores
  graph: enable create and destroy graph scheduling workqueue
  graph: introduce graph walk by cross-core dispatch
  graph: enable graph multicore dispatch scheduler model
  graph: add stats for cross-core dispatching
  examples/l3fwd-graph: introduce multicore dispatch worker model
  doc: update multicore dispatch model in graph guides

 MAINTAINERS                          |   1 +
 doc/guides/prog_guide/graph_lib.rst  |  59 ++-
 examples/l3fwd-graph/main.c          | 236 +++++++++---
 lib/graph/graph.c                    | 179 +++++++++
 lib/graph/graph_debug.c              |   6 +
 lib/graph/graph_populate.c           |   1 +
 lib/graph/graph_private.h            |  44 +++
 lib/graph/graph_stats.c              |  74 +++-
 lib/graph/meson.build                |   4 +-
 lib/graph/node.c                     |   1 +
 lib/graph/rte_graph.h                |  44 +++
 lib/graph/rte_graph_model_dispatch.c | 179 +++++++++
 lib/graph/rte_graph_model_dispatch.h | 122 ++++++
 lib/graph/rte_graph_model_rtc.h      |  45 +++
 lib/graph/rte_graph_worker.c         |  54 +++
 lib/graph/rte_graph_worker.h         | 498 +------------------------
 lib/graph/rte_graph_worker_common.h  | 539 +++++++++++++++++++++++++++
 lib/graph/version.map                |  10 +
 18 files changed, 1552 insertions(+), 544 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_dispatch.c
 create mode 100644 lib/graph/rte_graph_model_dispatch.h
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker.c
 create mode 100644 lib/graph/rte_graph_worker_common.h

-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 01/15] graph: rename rte_graph_work as common
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
@ 2023-03-31  4:02         ` Zhirun Yan
  2023-03-31  4:02         ` [PATCH v5 02/15] graph: split graph worker into common and default model Zhirun Yan
                           ` (14 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:02 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Rename rte_graph_work.h to rte_graph_work_common.h for supporting
multiple graph worker model.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 MAINTAINERS                                                 | 1 +
 lib/graph/graph_pcap.c                                      | 2 +-
 lib/graph/graph_private.h                                   | 2 +-
 lib/graph/meson.build                                       | 2 +-
 lib/graph/{rte_graph_worker.h => rte_graph_worker_common.h} | 6 +++---
 5 files changed, 7 insertions(+), 6 deletions(-)
 rename lib/graph/{rte_graph_worker.h => rte_graph_worker_common.h} (99%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 280058adfc..9d9467dd00 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1714,6 +1714,7 @@ F: doc/guides/prog_guide/bpf_lib.rst
 Graph - EXPERIMENTAL
 M: Jerin Jacob <jerinj@marvell.com>
 M: Kiran Kumar K <kirankumark@marvell.com>
+M: Zhirun Yan <zhirun.yan@intel.com>
 F: lib/graph/
 F: doc/guides/prog_guide/graph_lib.rst
 F: app/test/test_graph*
diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
index 6c43330029..8a220370fa 100644
--- a/lib/graph/graph_pcap.c
+++ b/lib/graph/graph_pcap.c
@@ -10,7 +10,7 @@
 #include <rte_mbuf.h>
 #include <rte_pcapng.h>
 
-#include "rte_graph_worker.h"
+#include "rte_graph_worker_common.h"
 
 #include "graph_pcap_private.h"
 
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 7d1b30b8ac..f08dbc7e9d 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -12,7 +12,7 @@
 #include <rte_eal.h>
 
 #include "rte_graph.h"
-#include "rte_graph_worker.h"
+#include "rte_graph_worker_common.h"
 
 extern int rte_graph_logtype;
 
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 3526d1b5d4..4e2b612ad3 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,6 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
 )
-headers = files('rte_graph.h', 'rte_graph_worker.h')
+headers = files('rte_graph.h', 'rte_graph_worker_common.h')
 
 deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker_common.h
similarity index 99%
rename from lib/graph/rte_graph_worker.h
rename to lib/graph/rte_graph_worker_common.h
index 438595b15c..0bad2938f3 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -2,8 +2,8 @@
  * Copyright(C) 2020 Marvell International Ltd.
  */
 
-#ifndef _RTE_GRAPH_WORKER_H_
-#define _RTE_GRAPH_WORKER_H_
+#ifndef _RTE_GRAPH_WORKER_COMMON_H_
+#define _RTE_GRAPH_WORKER_COMMON_H_
 
 /**
  * @file rte_graph_worker.h
@@ -518,4 +518,4 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
 }
 #endif
 
-#endif /* _RTE_GRAPH_WORKER_H_ */
+#endif /* _RTE_GRAPH_WORKER_COMMON_H_ */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 02/15] graph: split graph worker into common and default model
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  2023-03-31  4:02         ` [PATCH v5 01/15] graph: rename rte_graph_work as common Zhirun Yan
@ 2023-03-31  4:02         ` Zhirun Yan
  2023-04-27 14:11           ` [EXT] " Pavan Nikhilesh Bhagavatula
  2023-03-31  4:02         ` [PATCH v5 03/15] graph: move node process into inline function Zhirun Yan
                           ` (13 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:02 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

To support multiple graph worker models, split the graph worker into
common and default parts. Name the current walk function
rte_graph_model_rtc, since the default model is RTC (Run-To-Completion).

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_pcap.c              |  2 +-
 lib/graph/graph_private.h           |  2 +-
 lib/graph/meson.build               |  2 +-
 lib/graph/rte_graph_model_rtc.h     | 61 +++++++++++++++++++++++++++++
 lib/graph/rte_graph_worker.h        | 34 ++++++++++++++++
 lib/graph/rte_graph_worker_common.h | 57 ---------------------------
 6 files changed, 98 insertions(+), 60 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker.h

diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
index 8a220370fa..6c43330029 100644
--- a/lib/graph/graph_pcap.c
+++ b/lib/graph/graph_pcap.c
@@ -10,7 +10,7 @@
 #include <rte_mbuf.h>
 #include <rte_pcapng.h>
 
-#include "rte_graph_worker_common.h"
+#include "rte_graph_worker.h"
 
 #include "graph_pcap_private.h"
 
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index f08dbc7e9d..7d1b30b8ac 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -12,7 +12,7 @@
 #include <rte_eal.h>
 
 #include "rte_graph.h"
-#include "rte_graph_worker_common.h"
+#include "rte_graph_worker.h"
 
 extern int rte_graph_logtype;
 
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 4e2b612ad3..3526d1b5d4 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,6 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
 )
-headers = files('rte_graph.h', 'rte_graph_worker_common.h')
+headers = files('rte_graph.h', 'rte_graph_worker.h')
 
 deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
new file mode 100644
index 0000000000..665560f831
--- /dev/null
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "rte_graph_worker_common.h"
+
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+static inline void
+rte_graph_walk_rtc(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	/*
+	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
+	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
+	 * in a circular buffer fashion.
+	 *
+	 *	+-----+ <= cir_start - head [number of source nodes]
+	 *	|     |
+	 *	| ... | <= source nodes
+	 *	|     |
+	 *	+-----+ <= cir_start [head = 0] [tail = 0]
+	 *	|     |
+	 *	| ... | <= pending streams
+	 *	|     |
+	 *	+-----+ <= cir_start + mask
+	 */
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		objs = node->objs;
+		rte_prefetch0(objs);
+
+		if (rte_graph_has_stats_feature()) {
+			start = rte_rdtsc();
+			rc = node->process(graph, node, objs, node->idx);
+			node->total_cycles += rte_rdtsc() - start;
+			node->total_calls++;
+			node->total_objs += rc;
+		} else {
+			node->process(graph, node, objs, node->idx);
+		}
+			node->idx = 0;
+			head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+	graph->tail = 0;
+}
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
new file mode 100644
index 0000000000..7ea18ba80a
--- /dev/null
+++ b/lib/graph/rte_graph_worker.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#ifndef _RTE_GRAPH_WORKER_H_
+#define _RTE_GRAPH_WORKER_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "rte_graph_model_rtc.h"
+
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk(struct rte_graph *graph)
+{
+	rte_graph_walk_rtc(graph);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_WORKER_H_ */
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 0bad2938f3..b58f8f6947 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -128,63 +128,6 @@ __rte_experimental
 void __rte_node_stream_alloc_size(struct rte_graph *graph,
 				  struct rte_node *node, uint16_t req_size);
 
-/**
- * Perform graph walk on the circular buffer and invoke the process function
- * of the nodes and collect the stats.
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup function.
- *
- * @see rte_graph_lookup()
- */
-__rte_experimental
-static inline void
-rte_graph_walk(struct rte_graph *graph)
-{
-	const rte_graph_off_t *cir_start = graph->cir_start;
-	const rte_node_t mask = graph->cir_mask;
-	uint32_t head = graph->head;
-	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
-
-	/*
-	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
-	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
-	 * in a circular buffer fashion.
-	 *
-	 *	+-----+ <= cir_start - head [number of source nodes]
-	 *	|     |
-	 *	| ... | <= source nodes
-	 *	|     |
-	 *	+-----+ <= cir_start [head = 0] [tail = 0]
-	 *	|     |
-	 *	| ... | <= pending streams
-	 *	|     |
-	 *	+-----+ <= cir_start + mask
-	 */
-	while (likely(head != graph->tail)) {
-		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-		node->idx = 0;
-		head = likely((int32_t)head > 0) ? head & mask : head;
-	}
-	graph->tail = 0;
-}
-
 /* Fast path helper functions */
 
 /**
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 03/15] graph: move node process into inline function
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  2023-03-31  4:02         ` [PATCH v5 01/15] graph: rename rte_graph_work as common Zhirun Yan
  2023-03-31  4:02         ` [PATCH v5 02/15] graph: split graph worker into common and default model Zhirun Yan
@ 2023-03-31  4:02         ` Zhirun Yan
  2023-04-27 15:03           ` [EXT] " Pavan Nikhilesh Bhagavatula
  2023-03-31  4:02         ` [PATCH v5 04/15] graph: add get/set graph worker model APIs Zhirun Yan
                           ` (12 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:02 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Node processing is a single, reusable block; move the code into an inline
function.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_rtc.h     | 20 ++---------------
 lib/graph/rte_graph_worker_common.h | 33 +++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
index 665560f831..0dcb7151e9 100644
--- a/lib/graph/rte_graph_model_rtc.h
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -20,9 +20,6 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	const rte_node_t mask = graph->cir_mask;
 	uint32_t head = graph->head;
 	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
 
 	/*
 	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
@@ -41,21 +38,8 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	 */
 	while (likely(head != graph->tail)) {
 		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-			node->idx = 0;
-			head = likely((int32_t)head > 0) ? head & mask : head;
+		__rte_node_process(graph, node);
+		head = likely((int32_t)head > 0) ? head & mask : head;
 	}
 	graph->tail = 0;
 }
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index b58f8f6947..41428974db 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -130,6 +130,39 @@ void __rte_node_stream_alloc_size(struct rte_graph *graph,
 
 /* Fast path helper functions */
 
+/**
+ * @internal
+ *
+ * Enqueue a given node to the tail of the graph reel.
+ *
+ * @param graph
+ *   Pointer Graph object.
+ * @param node
+ *   Pointer to node object to be enqueued.
+ */
+static __rte_always_inline void
+__rte_node_process(struct rte_graph *graph, struct rte_node *node)
+{
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+	objs = node->objs;
+	rte_prefetch0(objs);
+
+	if (rte_graph_has_stats_feature()) {
+		start = rte_rdtsc();
+		rc = node->process(graph, node, objs, node->idx);
+		node->total_cycles += rte_rdtsc() - start;
+		node->total_calls++;
+		node->total_objs += rc;
+	} else {
+		node->process(graph, node, objs, node->idx);
+	}
+	node->idx = 0;
+}
+
 /**
  * @internal
  *
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 04/15] graph: add get/set graph worker model APIs
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (2 preceding siblings ...)
  2023-03-31  4:02         ` [PATCH v5 03/15] graph: move node process into inline function Zhirun Yan
@ 2023-03-31  4:02         ` Zhirun Yan
  2023-03-31  4:02         ` [PATCH v5 05/15] graph: introduce graph node core affinity API Zhirun Yan
                           ` (11 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:02 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add new get/set APIs to configure the graph worker model, which
determines which model is used at runtime.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/meson.build               |  1 +
 lib/graph/rte_graph_worker.c        | 54 +++++++++++++++++++++++++++++
 lib/graph/rte_graph_worker_common.h | 19 ++++++++++
 lib/graph/version.map               |  3 ++
 4 files changed, 77 insertions(+)
 create mode 100644 lib/graph/rte_graph_worker.c

diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 3526d1b5d4..9fab8243da 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -15,6 +15,7 @@ sources = files(
         'graph_stats.c',
         'graph_populate.c',
         'graph_pcap.c',
+        'rte_graph_worker.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
diff --git a/lib/graph/rte_graph_worker.c b/lib/graph/rte_graph_worker.c
new file mode 100644
index 0000000000..cabc101262
--- /dev/null
+++ b/lib/graph/rte_graph_worker.c
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "rte_graph_worker_common.h"
+
+RTE_DEFINE_PER_LCORE(enum rte_graph_worker_model, worker_model) = RTE_GRAPH_MODEL_DEFAULT;
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ * Set the graph worker model
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ *    before the graph starts running.
+ *
+ * @param model
+ *   The graph worker model to set.
+ *
+ * @return
+ *   0 on success, -1 otherwise.
+ */
+int
+rte_graph_worker_model_set(enum rte_graph_worker_model model)
+{
+	if (model >= RTE_GRAPH_MODEL_LIST_END)
+		goto fail;
+
+	RTE_PER_LCORE(worker_model) = model;
+	return 0;
+
+fail:
+	RTE_PER_LCORE(worker_model) = RTE_GRAPH_MODEL_DEFAULT;
+	return -1;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Get the graph worker model
+ *
+ * @return
+ *   Graph worker model on success.
+ */
+inline
+enum rte_graph_worker_model
+rte_graph_worker_model_get(void)
+{
+	return RTE_PER_LCORE(worker_model);
+}
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 41428974db..1526da6e2c 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -19,6 +19,7 @@
 #include <rte_compat.h>
 #include <rte_common.h>
 #include <rte_cycles.h>
+#include <rte_per_lcore.h>
 #include <rte_prefetch.h>
 #include <rte_memcpy.h>
 #include <rte_memory.h>
@@ -95,6 +96,16 @@ struct rte_node {
 	struct rte_node *nodes[] __rte_cache_min_aligned; /**< Next nodes. */
 } __rte_cache_aligned;
 
+/** Graph worker models */
+enum rte_graph_worker_model {
+	RTE_GRAPH_MODEL_DEFAULT,
+	RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT,
+	RTE_GRAPH_MODEL_MCORE_DISPATCH,
+	RTE_GRAPH_MODEL_LIST_END
+};
+
+RTE_DECLARE_PER_LCORE(enum rte_graph_worker_model, worker_model);
+
 /**
  * @internal
  *
@@ -490,6 +501,14 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
 	}
 }
 
+__rte_experimental
+enum rte_graph_worker_model
+rte_graph_worker_model_get(void);
+
+__rte_experimental
+int
+rte_graph_worker_model_set(enum rte_graph_worker_model model);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 13b838752d..eea73ec9ca 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -43,5 +43,8 @@ EXPERIMENTAL {
 	rte_node_next_stream_put;
 	rte_node_next_stream_move;
 
+	rte_graph_worker_model_set;
+	rte_graph_worker_model_get;
+
 	local: *;
 };
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 05/15] graph: introduce graph node core affinity API
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (3 preceding siblings ...)
  2023-03-31  4:02         ` [PATCH v5 04/15] graph: add get/set graph worker model APIs Zhirun Yan
@ 2023-03-31  4:02         ` Zhirun Yan
  2023-03-31  4:02         ` [PATCH v5 06/15] graph: introduce graph bind unbind API Zhirun Yan
                           ` (10 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:02 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add an lcore_id field to the node to hold the affinity core ID, and
implement rte_graph_model_dispatch_lcore_affinity_set() to set node
affinity to a specific lcore.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_private.h            |  1 +
 lib/graph/meson.build                |  1 +
 lib/graph/node.c                     |  1 +
 lib/graph/rte_graph_model_dispatch.c | 31 ++++++++++++++++++++
 lib/graph/rte_graph_model_dispatch.h | 43 ++++++++++++++++++++++++++++
 lib/graph/version.map                |  2 ++
 6 files changed, 79 insertions(+)
 create mode 100644 lib/graph/rte_graph_model_dispatch.c
 create mode 100644 lib/graph/rte_graph_model_dispatch.h

diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 7d1b30b8ac..409eed3284 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -50,6 +50,7 @@ struct node {
 	STAILQ_ENTRY(node) next;      /**< Next node in the list. */
 	char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
 	uint64_t flags;		      /**< Node configuration flag. */
+	unsigned int lcore_id;        /**< Node runs on the Lcore ID */
 	rte_node_process_t process;   /**< Node process function. */
 	rte_node_init_t init;         /**< Node init function. */
 	rte_node_fini_t fini;	      /**< Node fini function. */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 9fab8243da..c729d984b6 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,7 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
         'rte_graph_worker.c',
+        'rte_graph_model_dispatch.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
diff --git a/lib/graph/node.c b/lib/graph/node.c
index 149414dcd9..339b4a0da5 100644
--- a/lib/graph/node.c
+++ b/lib/graph/node.c
@@ -100,6 +100,7 @@ __rte_node_register(const struct rte_node_register *reg)
 			goto free;
 	}
 
+	node->lcore_id = RTE_MAX_LCORE;
 	node->id = node_id++;
 
 	/* Add the node at tail */
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
new file mode 100644
index 0000000000..4a2f99496d
--- /dev/null
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "graph_private.h"
+#include "rte_graph_model_dispatch.h"
+
+int
+rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned int lcore_id)
+{
+	struct node *node;
+	int ret = -EINVAL;
+
+	if (lcore_id >= RTE_MAX_LCORE)
+		return ret;
+
+	graph_spinlock_lock();
+
+	STAILQ_FOREACH(node, node_list_head_get(), next) {
+		if (strncmp(node->name, name, RTE_NODE_NAMESIZE) == 0) {
+			node->lcore_id = lcore_id;
+			ret = 0;
+			break;
+		}
+	}
+
+	graph_spinlock_unlock();
+
+	return ret;
+}
+
diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
new file mode 100644
index 0000000000..179624e972
--- /dev/null
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#ifndef _RTE_GRAPH_MODEL_DISPATCH_H_
+#define _RTE_GRAPH_MODEL_DISPATCH_H_
+
+/**
+ * @file rte_graph_model_dispatch.h
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * This API allows to set core affinity with the node.
+ */
+#include "rte_graph_worker_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Set lcore affinity with the node.
+ *
+ * @param name
+ *   Valid node name. In the case of the cloned node, the name will be
+ * "parent node name" + "-" + name.
+ * @param lcore_id
+ *   The lcore ID value.
+ *
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
+						unsigned int lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_MODEL_DISPATCH_H_ */
diff --git a/lib/graph/version.map b/lib/graph/version.map
index eea73ec9ca..1f090be74e 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -46,5 +46,7 @@ EXPERIMENTAL {
 	rte_graph_worker_model_set;
 	rte_graph_worker_model_get;
 
+	rte_graph_model_dispatch_lcore_affinity_set;
+
 	local: *;
 };
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 06/15] graph: introduce graph bind unbind API
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (4 preceding siblings ...)
  2023-03-31  4:02         ` [PATCH v5 05/15] graph: introduce graph node core affinity API Zhirun Yan
@ 2023-03-31  4:02         ` Zhirun Yan
  2023-03-31  4:02         ` [PATCH v5 07/15] graph: introduce graph clone API for other worker core Zhirun Yan
                           ` (9 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:02 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add an lcore_id field to the graph to hold the affinity core ID the
graph will run on. Add bind/unbind APIs to set/unset the graph affinity
attribute. lcore_id is set to RTE_MAX_LCORE by default, which means the
attribute is disabled.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 59 +++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |  2 ++
 lib/graph/rte_graph.h     | 22 +++++++++++++++
 lib/graph/version.map     |  2 ++
 4 files changed, 85 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index a839a2803b..b39a99aac6 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -254,6 +254,64 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
 	return graph_mem_fixup_node_ctx(graph);
 }
 
+static __rte_always_inline bool
+graph_src_node_avail(struct graph *graph)
+{
+	struct graph_node *graph_node;
+
+	STAILQ_FOREACH(graph_node, &graph->node_list, next)
+		if ((graph_node->node->flags & RTE_NODE_SOURCE_F) &&
+		    (graph_node->node->lcore_id == RTE_MAX_LCORE ||
+		     graph->lcore_id == graph_node->node->lcore_id))
+			return true;
+
+	return false;
+}
+
+int
+rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	if (!rte_lcore_is_enabled(lcore))
+		SET_ERR_JMP(ENOLINK, fail,
+			    "lcore %d not enabled\n",
+			    lcore);
+
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = lcore;
+	graph->socket = rte_lcore_to_socket_id(lcore);
+
+	/* check the availability of source node */
+	if (!graph_src_node_avail(graph))
+		graph->graph->head = 0;
+
+	return 0;
+
+fail:
+	return -rte_errno;
+}
+
+void
+rte_graph_model_dispatch_core_unbind(rte_graph_t id)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = RTE_MAX_LCORE;
+
+fail:
+	return;
+}
+
 struct rte_graph *
 rte_graph_lookup(const char *name)
 {
@@ -340,6 +398,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->lcore_id = RTE_MAX_LCORE;
 	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
 	if (prm->pcap_filename)
 		rte_strscpy(graph->pcap_filename, prm->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 409eed3284..ad1d058945 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -98,6 +98,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	unsigned int lcore_id;
+	/**< Lcore identifier on which the graph prefers to run. */
 	size_t mem_sz;
 	/**< Memory size of the graph. */
 	int socket;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index c9a77297fc..c523809d1f 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -285,6 +285,28 @@ char *rte_graph_id_to_name(rte_graph_t id);
 __rte_experimental
 int rte_graph_export(const char *name, FILE *f);
 
+/**
+ * Bind a graph to a specific lcore
+ *
+ * @param id
+ *   Graph id to get the pointer of graph object
+ * @param lcore
+ *   The lcore on which the graph will run
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore);
+
+/**
+ * Unbind a graph from its lcore
+ *
+ * @param id
+ *   Graph id to get the pointer of graph object
+ */
+__rte_experimental
+void rte_graph_model_dispatch_core_unbind(rte_graph_t id);
+
 /**
  * Get graph object from its name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 1f090be74e..7de6f08f59 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -18,6 +18,8 @@ EXPERIMENTAL {
 	rte_graph_node_get_by_name;
 	rte_graph_obj_dump;
 	rte_graph_walk;
+	rte_graph_model_dispatch_core_bind;
+	rte_graph_model_dispatch_core_unbind;
 
 	rte_graph_cluster_stats_create;
 	rte_graph_cluster_stats_destroy;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 07/15] graph: introduce graph clone API for other worker core
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (5 preceding siblings ...)
  2023-03-31  4:02         ` [PATCH v5 06/15] graph: introduce graph bind unbind API Zhirun Yan
@ 2023-03-31  4:02         ` Zhirun Yan
  2023-03-31  4:02         ` [PATCH v5 08/15] graph: add struct for stream moving between cores Zhirun Yan
                           ` (8 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:02 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch adds a graph API to support cloning the graph object for a
specified worker core. The new graph also clones all nodes from the parent.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 110 ++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |   2 +
 lib/graph/rte_graph.h     |  20 +++++++
 lib/graph/version.map     |   1 +
 4 files changed, 133 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index b39a99aac6..90eaad0378 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -398,6 +398,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->parent_id = RTE_GRAPH_ID_INVALID;
 	graph->lcore_id = RTE_MAX_LCORE;
 	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
 	if (prm->pcap_filename)
@@ -462,6 +463,115 @@ rte_graph_destroy(rte_graph_t id)
 	return rc;
 }
 
+static int
+clone_name(struct graph *graph, struct graph *parent_graph, const char *name)
+{
+	ssize_t sz, rc;
+
+#define SZ RTE_GRAPH_NAMESIZE
+	rc = rte_strscpy(graph->name, parent_graph->name, SZ);
+	if (rc < 0)
+		goto fail;
+	sz = rc;
+	rc = rte_strscpy(graph->name + sz, "-", RTE_MAX((int16_t)(SZ - sz), 0));
+	if (rc < 0)
+		goto fail;
+	sz += rc;
+	sz = rte_strscpy(graph->name + sz, name, RTE_MAX((int16_t)(SZ - sz), 0));
+	if (sz < 0)
+		goto fail;
+
+	return 0;
+fail:
+	rte_errno = E2BIG;
+	return -rte_errno;
+}
+
+static rte_graph_t
+graph_clone(struct graph *parent_graph, const char *name)
+{
+	struct graph_node *graph_node;
+	struct graph *graph;
+
+	graph_spinlock_lock();
+
+	/* Don't allow cloning from an already cloned graph */
+	if (parent_graph->parent_id != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, fail, "A cloned graph is not allowed to be cloned");
+
+	/* Create graph object */
+	graph = calloc(1, sizeof(*graph));
+	if (graph == NULL)
+		SET_ERR_JMP(ENOMEM, fail, "Failed to calloc cloned graph object");
+
+	/* Naming ceremony of the new graph: name is parent->name + "-" + name */
+	if (clone_name(graph, parent_graph, name))
+		goto free;
+
+	/* Check for existence of duplicate graph */
+	if (rte_graph_from_name(graph->name) != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, free, "Found duplicate graph %s",
+			    graph->name);
+
+	/* Clone nodes from parent graph firstly */
+	STAILQ_INIT(&graph->node_list);
+	STAILQ_FOREACH(graph_node, &parent_graph->node_list, next) {
+		if (graph_node_add(graph, graph_node->node))
+			goto graph_cleanup;
+	}
+
+	/* Just update adjacency list of all nodes in the graph */
+	if (graph_adjacency_list_update(graph))
+		goto graph_cleanup;
+
+	/* Initialize the graph object */
+	graph->src_node_count = parent_graph->src_node_count;
+	graph->node_count = parent_graph->node_count;
+	graph->parent_id = parent_graph->id;
+	graph->lcore_id = parent_graph->lcore_id;
+	graph->socket = parent_graph->socket;
+	graph->id = graph_id;
+
+	/* Allocate the Graph fast path memory and populate the data */
+	if (graph_fp_mem_create(graph))
+		goto graph_cleanup;
+
+	/* Call init() of the all the nodes in the graph */
+	if (graph_node_init(graph))
+		goto graph_mem_destroy;
+
+	/* All good, Lets add the graph to the list */
+	graph_id++;
+	STAILQ_INSERT_TAIL(&graph_list, graph, next);
+
+	graph_spinlock_unlock();
+	return graph->id;
+
+graph_mem_destroy:
+	graph_fp_mem_destroy(graph);
+graph_cleanup:
+	graph_cleanup(graph);
+free:
+	free(graph);
+fail:
+	graph_spinlock_unlock();
+	return RTE_GRAPH_ID_INVALID;
+}
+
+rte_graph_t
+rte_graph_clone(rte_graph_t id, const char *name)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			return graph_clone(graph, name);
+
+fail:
+	return RTE_GRAPH_ID_INVALID;
+}
+
 rte_graph_t
 rte_graph_from_name(const char *name)
 {
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index ad1d058945..d28a5af93e 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -98,6 +98,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	rte_graph_t parent_id;
+	/**< Parent graph identifier. */
 	unsigned int lcore_id;
 	/**< Lcore identifier on which the graph prefers to run. */
 	size_t mem_sz;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index c523809d1f..2f86c17de7 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -247,6 +247,26 @@ rte_graph_t rte_graph_create(const char *name, struct rte_graph_param *prm);
 __rte_experimental
 int rte_graph_destroy(rte_graph_t id);
 
+/**
+ * Clone Graph.
+ *
+ * Clone a graph from a static graph (a graph created with rte_graph_create()).
+ * All cloned graphs attached to the parent graph MUST be destroyed together,
+ * due to a fast-schedule design limitation (stop ALL graph walks first).
+ *
+ * @param id
+ *   Static graph id to clone from.
+ * @param name
+ *   Name of the new graph. The library prepends the parent graph name to the
+ *   user-specified name. The final graph name will be
+ *   "parent graph name" + "-" + name.
+ *
+ * @return
+ *   Valid graph id on success, RTE_GRAPH_ID_INVALID otherwise.
+ */
+__rte_experimental
+rte_graph_t rte_graph_clone(rte_graph_t id, const char *name);
+
 /**
  * Get graph id from graph name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 7de6f08f59..aaa86f66ed 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -7,6 +7,7 @@ EXPERIMENTAL {
 
 	rte_graph_create;
 	rte_graph_destroy;
+	rte_graph_clone;
 	rte_graph_dump;
 	rte_graph_export;
 	rte_graph_from_name;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 08/15] graph: add struct for stream moving between cores
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (6 preceding siblings ...)
  2023-03-31  4:02         ` [PATCH v5 07/15] graph: introduce graph clone API for other worker core Zhirun Yan
@ 2023-03-31  4:02         ` Zhirun Yan
  2023-03-31  4:03         ` [PATCH v5 09/15] graph: introduce stream moving cross cores Zhirun Yan
                           ` (7 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:02 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add struct graph_sched_wq_node to hold a stream on the graph scheduling
workqueue.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c                   |  1 +
 lib/graph/graph_populate.c          |  1 +
 lib/graph/graph_private.h           | 12 ++++++++++++
 lib/graph/rte_graph_worker_common.h | 21 +++++++++++++++++++++
 4 files changed, 35 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 90eaad0378..dd3d69dbf7 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -284,6 +284,7 @@ rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore)
 			break;
 
 	graph->lcore_id = lcore;
+	graph->graph->lcore_id = graph->lcore_id;
 	graph->socket = rte_lcore_to_socket_id(lcore);
 
 	/* check the availability of source node */
diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
index 2c0844ce92..7dcf1420c1 100644
--- a/lib/graph/graph_populate.c
+++ b/lib/graph/graph_populate.c
@@ -89,6 +89,7 @@ graph_nodes_populate(struct graph *_graph)
 		}
 		node->id = graph_node->node->id;
 		node->parent_id = pid;
+		node->lcore_id = graph_node->node->lcore_id;
 		nb_edges = graph_node->node->nb_edges;
 		node->nb_edges = nb_edges;
 		off += sizeof(struct rte_node);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index d28a5af93e..b66b18ebbc 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -60,6 +60,18 @@ struct node {
 	char next_nodes[][RTE_NODE_NAMESIZE]; /**< Names of next nodes. */
 };
 
+/**
+ * @internal
+ *
+ * Structure that holds the graph scheduling workqueue node stream.
+ * Used for mcore dispatch model.
+ */
+struct graph_sched_wq_node {
+	rte_graph_off_t node_off;
+	uint16_t nb_objs;
+	void *objs[RTE_GRAPH_BURST_SIZE];
+} __rte_cache_aligned;
+
 /**
  * @internal
  *
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 1526da6e2c..dc0a0b5554 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -30,6 +30,13 @@
 extern "C" {
 #endif
 
+/**
+ * @internal
+ *
+ * Singly-linked list head for graph schedule run-queue.
+ */
+SLIST_HEAD(rte_graph_rq_head, rte_graph);
+
 /**
  * @internal
  *
@@ -41,6 +48,15 @@ struct rte_graph {
 	uint32_t cir_mask;	     /**< Circular buffer wrap around mask. */
 	rte_node_t nb_nodes;	     /**< Number of nodes in the graph. */
 	rte_graph_off_t *cir_start;  /**< Pointer to circular buffer. */
+	/* Graph schedule */
+	struct rte_graph_rq_head *rq __rte_cache_aligned; /* The run-queue */
+	struct rte_graph_rq_head rq_head; /* The head for run-queue list */
+
+	SLIST_ENTRY(rte_graph) rq_next;   /* The next for run-queue list */
+	unsigned int lcore_id;  /**< The graph running Lcore. */
+	struct rte_ring *wq;    /**< The work-queue for pending streams. */
+	struct rte_mempool *mp; /**< The mempool for scheduling streams. */
+	/* Graph schedule area */
 	rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */
 	rte_graph_t id;	/**< Graph identifier. */
 	int socket;	/**< Socket ID where memory is allocated. */
@@ -74,6 +90,11 @@ struct rte_node {
 	/** Original process function when pcap is enabled. */
 	rte_node_process_t original_process;
 
+	RTE_STD_C11
+		union {
+		/* Fast schedule area for mcore dispatch model */
+		unsigned int lcore_id;  /**< Node running lcore. */
+		};
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 09/15] graph: introduce stream moving cross cores
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (7 preceding siblings ...)
  2023-03-31  4:02         ` [PATCH v5 08/15] graph: add struct for stream moving between cores Zhirun Yan
@ 2023-03-31  4:03         ` Zhirun Yan
  2023-04-27 14:52           ` [EXT] " Pavan Nikhilesh Bhagavatula
  2023-03-31  4:03         ` [PATCH v5 10/15] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
                           ` (6 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the key functions that allow a worker thread to
enqueue and move streams of objects to next nodes running on different
cores.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_private.h            |  27 +++++
 lib/graph/meson.build                |   2 +-
 lib/graph/rte_graph_model_dispatch.c | 145 +++++++++++++++++++++++++++
 lib/graph/rte_graph_model_dispatch.h |  37 +++++++
 lib/graph/version.map                |   2 +
 5 files changed, 212 insertions(+), 1 deletion(-)

diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index b66b18ebbc..e1a2a4bfd8 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -366,4 +366,31 @@ void graph_dump(FILE *f, struct graph *g);
  */
 void node_dump(FILE *f, struct node *n);
 
+/**
+ * @internal
+ *
+ * Create the graph schedule work queue. All cloned graphs attached to the
+ * parent graph MUST be destroyed together due to a fast-schedule design limitation.
+ *
+ * @param _graph
+ *   The graph object
+ * @param _parent_graph
+ *   The parent graph object which holds the run-queue head.
+ *
+ * @return
+ *   - 0: Success.
+ *   - <0: Graph schedule work queue related error.
+ */
+int graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph);
+
+/**
+ * @internal
+ *
+ * Destroy the graph schedule work queue.
+ *
+ * @param _graph
+ *   The graph object
+ */
+void graph_sched_wq_destroy(struct graph *_graph);
+
 #endif /* _RTE_GRAPH_PRIVATE_H_ */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index c729d984b6..e21affa280 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -20,4 +20,4 @@ sources = files(
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
-deps += ['eal', 'pcapng']
+deps += ['eal', 'pcapng', 'mempool', 'ring']
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
index 4a2f99496d..a300fefb85 100644
--- a/lib/graph/rte_graph_model_dispatch.c
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -5,6 +5,151 @@
 #include "graph_private.h"
 #include "rte_graph_model_dispatch.h"
 
+int
+graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph)
+{
+	struct rte_graph *parent_graph = _parent_graph->graph;
+	struct rte_graph *graph = _graph->graph;
+	unsigned int wq_size;
+
+	wq_size = GRAPH_SCHED_WQ_SIZE(graph->nb_nodes);
+	wq_size = rte_align32pow2(wq_size + 1);
+
+	graph->wq = rte_ring_create(graph->name, wq_size, graph->socket,
+				    RING_F_SC_DEQ);
+	if (graph->wq == NULL)
+		SET_ERR_JMP(EIO, fail, "Failed to allocate graph WQ");
+
+	graph->mp = rte_mempool_create(graph->name, wq_size,
+				       sizeof(struct graph_sched_wq_node),
+				       0, 0, NULL, NULL, NULL, NULL,
+				       graph->socket, MEMPOOL_F_SP_PUT);
+	if (graph->mp == NULL)
+		SET_ERR_JMP(EIO, fail_mp,
+			    "Failed to allocate graph WQ schedule entry");
+
+	graph->lcore_id = _graph->lcore_id;
+
+	if (parent_graph->rq == NULL) {
+		parent_graph->rq = &parent_graph->rq_head;
+		SLIST_INIT(parent_graph->rq);
+	}
+
+	graph->rq = parent_graph->rq;
+	SLIST_INSERT_HEAD(graph->rq, graph, rq_next);
+
+	return 0;
+
+fail_mp:
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+fail:
+	return -rte_errno;
+}
+
+void
+graph_sched_wq_destroy(struct graph *_graph)
+{
+	struct rte_graph *graph = _graph->graph;
+
+	if (graph == NULL)
+		return;
+
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+
+	rte_mempool_free(graph->mp);
+	graph->mp = NULL;
+}
+
+static __rte_always_inline bool
+__graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	uint16_t off = 0;
+	uint16_t size;
+
+submit_again:
+	if (rte_mempool_get(graph->mp, (void **)&wq_node) < 0)
+		goto fallback;
+
+	size = RTE_MIN(node->idx, RTE_DIM(wq_node->objs));
+	wq_node->node_off = node->off;
+	wq_node->nb_objs = size;
+	rte_memcpy(wq_node->objs, &node->objs[off], size * sizeof(void *));
+
+	while (rte_ring_mp_enqueue_bulk_elem(graph->wq, (void *)&wq_node,
+					  sizeof(wq_node), 1, NULL) == 0)
+		rte_pause();
+
+	off += size;
+	node->idx -= size;
+	if (node->idx > 0)
+		goto submit_again;
+
+	return true;
+
+fallback:
+	if (off != 0)
+		memmove(&node->objs[0], &node->objs[off],
+			node->idx * sizeof(void *));
+
+	return false;
+}
+
+bool __rte_noinline
+__rte_graph_sched_node_enqueue(struct rte_node *node,
+			       struct rte_graph_rq_head *rq)
+{
+	const unsigned int lcore_id = node->lcore_id;
+	struct rte_graph *graph;
+
+	SLIST_FOREACH(graph, rq, rq_next)
+		if (graph->lcore_id == lcore_id)
+			break;
+
+	return graph != NULL ? __graph_sched_node_enqueue(node, graph) : false;
+}
+
+void
+__rte_graph_sched_wq_process(struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	struct rte_mempool *mp = graph->mp;
+	struct rte_ring *wq = graph->wq;
+	uint16_t idx, free_space;
+	struct rte_node *node;
+	unsigned int i, n;
+	struct graph_sched_wq_node *wq_nodes[32];
+
+	n = rte_ring_sc_dequeue_burst_elem(wq, wq_nodes, sizeof(wq_nodes[0]),
+					   RTE_DIM(wq_nodes), NULL);
+	if (n == 0)
+		return;
+
+	for (i = 0; i < n; i++) {
+		wq_node = wq_nodes[i];
+		node = RTE_PTR_ADD(graph, wq_node->node_off);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		idx = node->idx;
+		free_space = node->size - idx;
+
+		if (unlikely(free_space < wq_node->nb_objs))
+			__rte_node_stream_alloc_size(graph, node, node->size + wq_node->nb_objs);
+
+		memmove(&node->objs[idx], wq_node->objs, wq_node->nb_objs * sizeof(void *));
+		memset(wq_node->objs, 0, wq_node->nb_objs * sizeof(void *));
+		node->idx = idx + wq_node->nb_objs;
+
+		__rte_node_process(graph, node);
+
+		wq_node->nb_objs = 0;
+		node->idx = 0;
+	}
+
+	rte_mempool_put_bulk(mp, (void **)wq_nodes, n);
+}
+
 int
 rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned int lcore_id)
 {
diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
index 179624e972..18fa7ce0ab 100644
--- a/lib/graph/rte_graph_model_dispatch.h
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -14,12 +14,49 @@
  *
  * This API allows to set core affinity with the node.
  */
+#include <rte_errno.h>
+#include <rte_mempool.h>
+#include <rte_memzone.h>
+#include <rte_ring.h>
+
 #include "rte_graph_worker_common.h"
 
 #ifdef __cplusplus
 extern "C" {
 #endif
 
+#define GRAPH_SCHED_WQ_SIZE_MULTIPLIER  8
+#define GRAPH_SCHED_WQ_SIZE(nb_nodes)   \
+	((typeof(nb_nodes))((nb_nodes) * GRAPH_SCHED_WQ_SIZE_MULTIPLIER))
+
+/**
+ * @internal
+ *
+ * Schedule the node to the right graph's work queue.
+ *
+ * @param node
+ *   Pointer to the scheduled node object.
+ * @param rq
+ *   Pointer to the scheduled run-queue for all graphs.
+ *
+ * @return
+ *   True on success, false otherwise.
+ */
+__rte_experimental
+bool __rte_noinline __rte_graph_sched_node_enqueue(struct rte_node *node,
+				    struct rte_graph_rq_head *rq);
+
+/**
+ * @internal
+ *
+ * Process all nodes (streams) in the graph's work queue.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ */
+__rte_experimental
+void __rte_graph_sched_wq_process(struct rte_graph *graph);
+
 /**
  * Set lcore affinity with the node.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index aaa86f66ed..d511133f39 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -48,6 +48,8 @@ EXPERIMENTAL {
 
 	rte_graph_worker_model_set;
 	rte_graph_worker_model_get;
+	__rte_graph_sched_wq_process;
+	__rte_graph_sched_node_enqueue;
 
 	rte_graph_model_dispatch_lcore_affinity_set;
 
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 10/15] graph: enable create and destroy graph scheduling workqueue
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (8 preceding siblings ...)
  2023-03-31  4:03         ` [PATCH v5 09/15] graph: introduce stream moving cross cores Zhirun Yan
@ 2023-03-31  4:03         ` Zhirun Yan
  2023-03-31  4:03         ` [PATCH v5 11/15] graph: introduce graph walk by cross-core dispatch Zhirun Yan
                           ` (5 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch hooks the creation and destruction of the scheduling workqueue
into the common graph operations.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index dd3d69dbf7..1f1ee9b622 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -443,6 +443,10 @@ rte_graph_destroy(rte_graph_t id)
 	while (graph != NULL) {
 		tmp = STAILQ_NEXT(graph, next);
 		if (graph->id == id) {
+			/* Destroy the schedule work queue if there is one */
+			if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+				graph_sched_wq_destroy(graph);
+
 			/* Call fini() of the all the nodes in the graph */
 			graph_node_fini(graph);
 			/* Destroy graph fast path memory */
@@ -537,6 +541,11 @@ graph_clone(struct graph *parent_graph, const char *name)
 	if (graph_fp_mem_create(graph))
 		goto graph_cleanup;
 
+	/* Create the graph schedule work queue */
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH &&
+	    graph_sched_wq_create(graph, parent_graph))
+		goto graph_mem_destroy;
+
 	/* Call init() of the all the nodes in the graph */
 	if (graph_node_init(graph))
 		goto graph_mem_destroy;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 11/15] graph: introduce graph walk by cross-core dispatch
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (9 preceding siblings ...)
  2023-03-31  4:03         ` [PATCH v5 10/15] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
@ 2023-03-31  4:03         ` Zhirun Yan
  2023-04-27 14:58           ` [EXT] " Pavan Nikhilesh Bhagavatula
  2023-03-31  4:03         ` [PATCH v5 12/15] graph: enable graph multicore dispatch scheduler model Zhirun Yan
                           ` (4 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the task scheduler mechanism to enable dispatching
tasks to other worker cores. Currently, there is only a local work
queue for one graph to walk. We introduce a scheduler work queue on
each worker core for dispatching tasks. The walk processes the
scheduler work queue first, then handles the local work queue.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_dispatch.h | 42 ++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
index 18fa7ce0ab..65b2cc6d87 100644
--- a/lib/graph/rte_graph_model_dispatch.h
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -73,6 +73,48 @@ __rte_experimental
 int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
 						unsigned int lcore_id);
 
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk_mcore_dispatch(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+
+	if (graph->wq != NULL)
+		__rte_graph_sched_wq_process(graph);
+
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+
+		/* Skip source nodes that are not bound to the current worker */
+		if ((int32_t)head < 0 && node->lcore_id != graph->lcore_id)
+			continue;
+
+		/* Schedule the node until all task/objs are done */
+		if (node->lcore_id != RTE_MAX_LCORE &&
+		    graph->lcore_id != node->lcore_id && graph->rq != NULL &&
+		    __rte_graph_sched_node_enqueue(node, graph->rq))
+			continue;
+
+		__rte_node_process(graph, node);
+
+		head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+
+	graph->tail = 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 12/15] graph: enable graph multicore dispatch scheduler model
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (10 preceding siblings ...)
  2023-03-31  4:03         ` [PATCH v5 11/15] graph: introduce graph walk by cross-core dispatch Zhirun Yan
@ 2023-03-31  4:03         ` Zhirun Yan
  2023-03-31  4:03         ` [PATCH v5 13/15] graph: add stats for cross-core dispatching Zhirun Yan
                           ` (3 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch enables choosing the new scheduler model.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_worker.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index 7ea18ba80a..d608c7513e 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -10,6 +10,7 @@ extern "C" {
 #endif
 
 #include "rte_graph_model_rtc.h"
+#include "rte_graph_model_dispatch.h"
 
 /**
  * Perform graph walk on the circular buffer and invoke the process function
@@ -24,7 +25,13 @@ __rte_experimental
 static inline void
 rte_graph_walk(struct rte_graph *graph)
 {
-	rte_graph_walk_rtc(graph);
+	int model = rte_graph_worker_model_get();
+
+	if (model == RTE_GRAPH_MODEL_DEFAULT ||
+	    model == RTE_GRAPH_MODEL_RTC)
+		rte_graph_walk_rtc(graph);
+	else if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		rte_graph_walk_mcore_dispatch(graph);
 }
 
 #ifdef __cplusplus
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 13/15] graph: add stats for cross-core dispatching
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (11 preceding siblings ...)
  2023-03-31  4:03         ` [PATCH v5 12/15] graph: enable graph multicore dispatch scheduler model Zhirun Yan
@ 2023-03-31  4:03         ` Zhirun Yan
  2023-03-31  4:03         ` [PATCH v5 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model Zhirun Yan
                           ` (2 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add stats for cross-core dispatching scheduler if stats collection is
enabled.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_debug.c              |  6 +++
 lib/graph/graph_stats.c              | 74 +++++++++++++++++++++++++---
 lib/graph/rte_graph.h                |  2 +
 lib/graph/rte_graph_model_dispatch.c |  3 ++
 lib/graph/rte_graph_worker_common.h  |  2 +
 5 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/lib/graph/graph_debug.c b/lib/graph/graph_debug.c
index b84412f5dd..7dcf07b080 100644
--- a/lib/graph/graph_debug.c
+++ b/lib/graph/graph_debug.c
@@ -74,6 +74,12 @@ rte_graph_obj_dump(FILE *f, struct rte_graph *g, bool all)
 		fprintf(f, "       size=%d\n", n->size);
 		fprintf(f, "       idx=%d\n", n->idx);
 		fprintf(f, "       total_objs=%" PRId64 "\n", n->total_objs);
+		if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+			fprintf(f, "       total_sched_objs=%" PRId64 "\n",
+				n->total_sched_objs);
+			fprintf(f, "       total_sched_fail=%" PRId64 "\n",
+				n->total_sched_fail);
+		}
 		fprintf(f, "       total_calls=%" PRId64 "\n", n->total_calls);
 		for (i = 0; i < n->nb_edges; i++)
 			fprintf(f, "          edge[%d] <%s>\n", i,
diff --git a/lib/graph/graph_stats.c b/lib/graph/graph_stats.c
index c0140ba922..aa22cc403c 100644
--- a/lib/graph/graph_stats.c
+++ b/lib/graph/graph_stats.c
@@ -40,13 +40,19 @@ struct rte_graph_cluster_stats {
 	struct cluster_node clusters[];
 } __rte_cache_aligned;
 
+#define boarder_model_dispatch()                                                              \
+	fprintf(f, "+-------------------------------+---------------+--------" \
+		   "-------+---------------+---------------+---------------+" \
+		   "---------------+---------------+-" \
+		   "----------+\n")
+
 #define boarder()                                                              \
 	fprintf(f, "+-------------------------------+---------------+--------" \
 		   "-------+---------------+---------------+---------------+-" \
 		   "----------+\n")
 
 static inline void
-print_banner(FILE *f)
+print_banner_default(FILE *f)
 {
 	boarder();
 	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s\n", "|Node", "|calls",
@@ -55,6 +61,27 @@ print_banner(FILE *f)
 	boarder();
 }
 
+static inline void
+print_banner_dispatch(FILE *f)
+{
+	boarder_model_dispatch();
+	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s%-16s%-16s\n",
+		"|Node", "|calls",
+		"|objs", "|sched objs", "|sched fail",
+		"|realloc_count", "|objs/call", "|objs/sec(10E6)",
+		"|cycles/call|");
+	boarder_model_dispatch();
+}
+
+static inline void
+print_banner(FILE *f)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		print_banner_dispatch(f);
+	else
+		print_banner_default(f);
+}
+
 static inline void
 print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 {
@@ -76,11 +103,21 @@ print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 	objs_per_sec = ts_per_hz ? (objs - prev_objs) / ts_per_hz : 0;
 	objs_per_sec /= 1000000;
 
-	fprintf(f,
-		"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
-		"|%-15.3f|%-15.6f|%-11.4f|\n",
-		stat->name, calls, objs, stat->realloc_count, objs_per_call,
-		objs_per_sec, cycles_per_call);
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->sched_objs,
+			stat->sched_fail, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	} else {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	}
 }
 
 static int
@@ -88,13 +125,20 @@ graph_cluster_stats_cb(bool is_first, bool is_last, void *cookie,
 		       const struct rte_graph_cluster_node_stats *stat)
 {
 	FILE *f = cookie;
+	int model;
+
+	model = rte_graph_worker_model_get();
 
 	if (unlikely(is_first))
 		print_banner(f);
 	if (stat->objs)
 		print_node(f, stat);
-	if (unlikely(is_last))
-		boarder();
+	if (unlikely(is_last)) {
+		if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+			boarder_model_dispatch();
+		else
+			boarder();
+	}
 
 	return 0;
 };
@@ -332,13 +376,21 @@ static inline void
 cluster_node_arregate_stats(struct cluster_node *cluster)
 {
 	uint64_t calls = 0, cycles = 0, objs = 0, realloc_count = 0;
+	uint64_t sched_objs = 0, sched_fail = 0;
 	struct rte_graph_cluster_node_stats *stat = &cluster->stat;
 	struct rte_node *node;
 	rte_node_t count;
+	int model;
 
+	model = rte_graph_worker_model_get();
 	for (count = 0; count < cluster->nb_nodes; count++) {
 		node = cluster->nodes[count];
 
+		if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+			sched_objs += node->total_sched_objs;
+			sched_fail += node->total_sched_fail;
+		}
+
 		calls += node->total_calls;
 		objs += node->total_objs;
 		cycles += node->total_cycles;
@@ -348,6 +400,12 @@ cluster_node_arregate_stats(struct cluster_node *cluster)
 	stat->calls = calls;
 	stat->objs = objs;
 	stat->cycles = cycles;
+
+	if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+		stat->sched_objs = sched_objs;
+		stat->sched_fail = sched_fail;
+	}
+
 	stat->ts = rte_get_timer_cycles();
 	stat->realloc_count = realloc_count;
 }
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index 2f86c17de7..7d77a790ac 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -208,6 +208,8 @@ struct rte_graph_cluster_node_stats {
 	uint64_t prev_calls;	/**< Previous number of calls. */
 	uint64_t prev_objs;	/**< Previous number of processed objs. */
 	uint64_t prev_cycles;	/**< Previous number of cycles. */
+	uint64_t sched_objs;	/**< Number of objects scheduled. */
+	uint64_t sched_fail;	/**< Number of objects failed to schedule. */
 
 	uint64_t realloc_count; /**< Realloc count. */
 
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
index a300fefb85..9db60eb463 100644
--- a/lib/graph/rte_graph_model_dispatch.c
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -83,6 +83,7 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		rte_pause();
 
 	off += size;
+	node->total_sched_objs += size;
 	node->idx -= size;
 	if (node->idx > 0)
 		goto submit_again;
@@ -94,6 +95,8 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		memmove(&node->objs[0], &node->objs[off],
 			node->idx * sizeof(void *));
 
+	node->total_sched_fail += node->idx;
+
 	return false;
 }
 
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index dc0a0b5554..d94983589c 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -95,6 +95,8 @@ struct rte_node {
 		/* Fast schedule area for mcore dispatch model */
 		unsigned int lcore_id;  /**< Node running lcore. */
 		};
+	uint64_t total_sched_objs; /**< Number of objects scheduled. */
+	uint64_t total_sched_fail; /**< Number of schedule failures. */
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (12 preceding siblings ...)
  2023-03-31  4:03         ` [PATCH v5 13/15] graph: add stats for cross-core dispatching Zhirun Yan
@ 2023-03-31  4:03         ` Zhirun Yan
  2023-03-31  4:03         ` [PATCH v5 15/15] doc: update multicore dispatch model in graph guides Zhirun Yan
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add a new parameter "model" to choose the dispatch or rtc worker model.
In the dispatch model, nodes are affinitized to the worker cores successively.

Note:
the current implementation supports only one Rx node in the dispatch model.

./dpdk-l3fwd-graph  -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="dispatch"

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 examples/l3fwd-graph/main.c | 236 +++++++++++++++++++++++++++++-------
 1 file changed, 194 insertions(+), 42 deletions(-)

diff --git a/examples/l3fwd-graph/main.c b/examples/l3fwd-graph/main.c
index 5feeab4f0f..7078ed4c77 100644
--- a/examples/l3fwd-graph/main.c
+++ b/examples/l3fwd-graph/main.c
@@ -55,6 +55,9 @@
 
 #define NB_SOCKETS 8
 
+/* Graph module */
+#define WORKER_MODEL_RTC "rtc"
+#define WORKER_MODEL_MCORE_DISPATCH "dispatch"
 /* Static global variables used within this file. */
 static uint16_t nb_rxd = RX_DESC_DEFAULT;
 static uint16_t nb_txd = TX_DESC_DEFAULT;
@@ -88,6 +91,10 @@ struct lcore_rx_queue {
 	char node_name[RTE_NODE_NAMESIZE];
 };
 
+struct model_conf {
+	enum rte_graph_worker_model model;
+};
+
 /* Lcore conf */
 struct lcore_conf {
 	uint16_t n_rx_queue;
@@ -153,6 +160,19 @@ static struct ipv4_l3fwd_lpm_route ipv4_l3fwd_lpm_route_array[] = {
 	{RTE_IPV4(198, 18, 6, 0), 24, 6}, {RTE_IPV4(198, 18, 7, 0), 24, 7},
 };
 
+static int
+check_worker_model_params(void)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH &&
+	    nb_lcore_params > 1) {
+		printf("Exceeded max number of lcore params for dispatch model: %hu\n",
+		       nb_lcore_params);
+		return -1;
+	}
+
+	return 0;
+}
+
 static int
 check_lcore_params(void)
 {
@@ -276,6 +296,7 @@ print_usage(const char *prgname)
 		"  --eth-dest=X,MM:MM:MM:MM:MM:MM: Ethernet destination for "
 		"port X\n"
 		"  --max-pkt-len PKTLEN: maximum packet length in decimal (64-9600)\n"
+		"  --model NAME: walking model name, dispatch or rtc (default)\n"
 		"  --no-numa: Disable numa awareness\n"
 		"  --per-port-pool: Use separate buffer pool per port\n"
 		"  --pcap-enable: Enables pcap capture\n"
@@ -318,6 +339,20 @@ parse_max_pkt_len(const char *pktlen)
 	return len;
 }
 
+static int
+parse_worker_model(const char *model)
+{
+	if (strcmp(model, WORKER_MODEL_MCORE_DISPATCH) == 0) {
+		rte_graph_worker_model_set(RTE_GRAPH_MODEL_MCORE_DISPATCH);
+		return RTE_GRAPH_MODEL_MCORE_DISPATCH;
+	} else if (strcmp(model, WORKER_MODEL_RTC) == 0)
+		return RTE_GRAPH_MODEL_RTC;
+
+	rte_exit(EXIT_FAILURE, "Invalid worker model: %s", model);
+
+	return RTE_GRAPH_MODEL_LIST_END;
+}
+
 static int
 parse_portmask(const char *portmask)
 {
@@ -434,6 +469,8 @@ static const char short_options[] = "p:" /* portmask */
 #define CMD_LINE_OPT_PCAP_ENABLE   "pcap-enable"
 #define CMD_LINE_OPT_NUM_PKT_CAP   "pcap-num-cap"
 #define CMD_LINE_OPT_PCAP_FILENAME "pcap-file-name"
+#define CMD_LINE_OPT_WORKER_MODEL  "model"
+
 enum {
 	/* Long options mapped to a short option */
 
@@ -449,6 +486,7 @@ enum {
 	CMD_LINE_OPT_PARSE_PCAP_ENABLE,
 	CMD_LINE_OPT_PARSE_NUM_PKT_CAP,
 	CMD_LINE_OPT_PCAP_FILENAME_CAP,
+	CMD_LINE_OPT_WORKER_MODEL_TYPE,
 };
 
 static const struct option lgopts[] = {
@@ -460,6 +498,7 @@ static const struct option lgopts[] = {
 	{CMD_LINE_OPT_PCAP_ENABLE, 0, 0, CMD_LINE_OPT_PARSE_PCAP_ENABLE},
 	{CMD_LINE_OPT_NUM_PKT_CAP, 1, 0, CMD_LINE_OPT_PARSE_NUM_PKT_CAP},
 	{CMD_LINE_OPT_PCAP_FILENAME, 1, 0, CMD_LINE_OPT_PCAP_FILENAME_CAP},
+	{CMD_LINE_OPT_WORKER_MODEL, 1, 0, CMD_LINE_OPT_WORKER_MODEL_TYPE},
 	{NULL, 0, 0, 0},
 };
 
@@ -551,6 +590,11 @@ parse_args(int argc, char **argv)
 			printf("Pcap file name: %s\n", pcap_filename);
 			break;
 
+		case CMD_LINE_OPT_WORKER_MODEL_TYPE:
+			printf("Use new worker model: %s\n", optarg);
+			parse_worker_model(optarg);
+			break;
+
 		default:
 			print_usage(prgname);
 			return -1;
@@ -726,15 +770,15 @@ print_stats(void)
 static int
 graph_main_loop(void *conf)
 {
+	struct model_conf *mconf = conf;
 	struct lcore_conf *qconf;
 	struct rte_graph *graph;
 	uint32_t lcore_id;
 
-	RTE_SET_USED(conf);
-
 	lcore_id = rte_lcore_id();
 	qconf = &lcore_conf[lcore_id];
 	graph = qconf->graph;
+	rte_graph_worker_model_set(mconf->model);
 
 	if (!graph) {
 		RTE_LOG(INFO, L3FWD_GRAPH, "Lcore %u has nothing to do\n",
@@ -788,6 +832,139 @@ config_port_max_pkt_len(struct rte_eth_conf *conf,
 	return 0;
 }
 
+static void
+graph_config_mcore_dispatch(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	int worker_count = rte_lcore_count() - 1;
+	int main_lcore_id = rte_get_main_lcore();
+	int worker_lcore = main_lcore_id;
+	rte_graph_t main_graph_id = 0;
+	struct rte_node *node_tmp;
+	struct lcore_conf *qconf;
+	struct rte_graph *graph;
+	rte_graph_t graph_id;
+	rte_graph_off_t off;
+	int n_rx_node = 0;
+	rte_node_t count;
+	int i, j;
+	int ret;
+
+	for (j = 0; j < nb_lcore_params; j++) {
+		qconf = &lcore_conf[lcore_params[j].lcore_id];
+		/* Add rx node patterns of all lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			char *node_name = qconf->rx_queue_list[i].node_name;
+
+			graph_conf.node_patterns[nb_patterns + n_rx_node] = node_name;
+			n_rx_node++;
+			ret = rte_graph_model_dispatch_lcore_affinity_set(node_name,
+									lcore_params[j].lcore_id);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n", node_name,
+				       lcore_params[j].lcore_id);
+		}
+	}
+
+	graph_conf.nb_node_patterns = nb_patterns + n_rx_node;
+	graph_conf.socket_id = rte_lcore_to_socket_id(main_lcore_id);
+
+	qconf = &lcore_conf[main_lcore_id];
+	snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+		 main_lcore_id);
+
+	/* create main graph */
+	main_graph_id = rte_graph_create(qconf->name, &graph_conf);
+	if (main_graph_id == RTE_GRAPH_ID_INVALID)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_create(): main_graph_id invalid for lcore %u\n",
+			 main_lcore_id);
+
+	qconf->graph_id = main_graph_id;
+	qconf->graph = rte_graph_lookup(qconf->name);
+	if (!qconf->graph)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_lookup(): graph %s not found\n",
+			 qconf->name);
+
+	graph = qconf->graph;
+	rte_graph_foreach_node(count, off, graph, node_tmp) {
+		worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+
+		/* Need to set the node lcore affinity before cloning the graph for each lcore */
+		if (node_tmp->lcore_id == RTE_MAX_LCORE) {
+			ret = rte_graph_model_dispatch_lcore_affinity_set(node_tmp->name,
+									worker_lcore);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n",
+				       node_tmp->name, worker_lcore);
+		}
+	}
+
+	worker_lcore = main_lcore_id;
+	for (i = 0; i < worker_count; i++) {
+		worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+
+		qconf = &lcore_conf[worker_lcore];
+		snprintf(qconf->name, sizeof(qconf->name), "cloned-%u", worker_lcore);
+		graph_id = rte_graph_clone(main_graph_id, qconf->name);
+		ret = rte_graph_model_dispatch_core_bind(graph_id, worker_lcore);
+		if (ret == 0)
+			printf("bind graph %d to lcore %u\n", graph_id, worker_lcore);
+
+		/* full cloned graph name */
+		snprintf(qconf->name, sizeof(qconf->name), "%s",
+			 rte_graph_id_to_name(graph_id));
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "Failed to lookup graph %s\n",
+				 qconf->name);
+		continue;
+	}
+}
+
+static void
+graph_config_rtc(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	struct lcore_conf *qconf;
+	rte_graph_t graph_id;
+	uint32_t lcore_id;
+	rte_edge_t i;
+
+	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+		if (rte_lcore_is_enabled(lcore_id) == 0)
+			continue;
+
+		qconf = &lcore_conf[lcore_id];
+		/* Skip graph creation if no source exists */
+		if (!qconf->n_rx_queue)
+			continue;
+		/* Add rx node patterns of this lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			graph_conf.node_patterns[nb_patterns + i] =
+				qconf->rx_queue_list[i].node_name;
+		}
+		graph_conf.nb_node_patterns = nb_patterns + i;
+		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
+		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+			 lcore_id);
+		graph_id = rte_graph_create(qconf->name, &graph_conf);
+		if (graph_id == RTE_GRAPH_ID_INVALID)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_create(): graph_id invalid for lcore %u\n",
+				 lcore_id);
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_lookup(): graph %s not found\n",
+				 qconf->name);
+	}
+}
+
 int
 main(int argc, char **argv)
 {
@@ -808,10 +985,12 @@ main(int argc, char **argv)
 	uint16_t queueid, portid, i;
 	const char **node_patterns;
 	struct lcore_conf *qconf;
+	struct model_conf mconf;
 	uint16_t nb_graphs = 0;
 	uint16_t nb_patterns;
 	uint8_t rewrite_len;
 	uint32_t lcore_id;
+	uint16_t model;
 	int ret;
 
 	/* Init EAL */
@@ -840,6 +1019,9 @@ main(int argc, char **argv)
 	if (check_lcore_params() < 0)
 		rte_exit(EXIT_FAILURE, "check_lcore_params() failed\n");
 
+	if (check_worker_model_params() < 0)
+		rte_exit(EXIT_FAILURE, "check_worker_model_params() failed\n");
+
 	ret = init_lcore_rx_queues();
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "init_lcore_rx_queues() failed\n");
@@ -1079,51 +1261,19 @@ main(int argc, char **argv)
 
 	memset(&graph_conf, 0, sizeof(graph_conf));
 	graph_conf.node_patterns = node_patterns;
+	graph_conf.nb_node_patterns = nb_patterns;
 
 	/* Pcap config */
 	graph_conf.pcap_enable = pcap_trace_enable;
 	graph_conf.num_pkt_to_capture = packet_to_capture;
 	graph_conf.pcap_filename = pcap_filename;
 
-	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
-		rte_graph_t graph_id;
-		rte_edge_t i;
-
-		if (rte_lcore_is_enabled(lcore_id) == 0)
-			continue;
-
-		qconf = &lcore_conf[lcore_id];
-
-		/* Skip graph creation if no source exists */
-		if (!qconf->n_rx_queue)
-			continue;
-
-		/* Add rx node patterns of this lcore */
-		for (i = 0; i < qconf->n_rx_queue; i++) {
-			graph_conf.node_patterns[nb_patterns + i] =
-				qconf->rx_queue_list[i].node_name;
-		}
-
-		graph_conf.nb_node_patterns = nb_patterns + i;
-		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
-
-		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
-			 lcore_id);
-
-		graph_id = rte_graph_create(qconf->name, &graph_conf);
-		if (graph_id == RTE_GRAPH_ID_INVALID)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_create(): graph_id invalid"
-				 " for lcore %u\n", lcore_id);
-
-		qconf->graph_id = graph_id;
-		qconf->graph = rte_graph_lookup(qconf->name);
-		/* >8 End of graph initialization. */
-		if (!qconf->graph)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_lookup(): graph %s not found\n",
-				 qconf->name);
-	}
+	model = rte_graph_worker_model_get();
+	if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		graph_config_mcore_dispatch(graph_conf);
+	else
+		graph_config_rtc(graph_conf);
+	/* >8 End of graph initialization. */
 
 	memset(&rewrite_data, 0, sizeof(rewrite_data));
 	rewrite_len = sizeof(rewrite_data);
@@ -1174,8 +1324,10 @@ main(int argc, char **argv)
 	}
 	/* >8 End of adding route to ip4 graph infa. */
 
+	mconf.model = model;
 	/* Launch per-lcore init on every worker lcore */
-	rte_eal_mp_remote_launch(graph_main_loop, NULL, SKIP_MAIN);
+	rte_eal_mp_remote_launch(graph_main_loop, &mconf,
+				 SKIP_MAIN);
 
 	/* Accumulate and print stats on main until exit */
 	if (rte_graph_has_stats_feature())
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v5 15/15] doc: update multicore dispatch model in graph guides
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (13 preceding siblings ...)
  2023-03-31  4:03         ` [PATCH v5 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model Zhirun Yan
@ 2023-03-31  4:03         ` Zhirun Yan
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-03-31  4:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Update graph documentation to introduce the new multicore dispatch model.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 doc/guides/prog_guide/graph_lib.rst | 59 +++++++++++++++++++++++++++--
 1 file changed, 55 insertions(+), 4 deletions(-)

diff --git a/doc/guides/prog_guide/graph_lib.rst b/doc/guides/prog_guide/graph_lib.rst
index 1cfdc86433..72e26f3a5a 100644
--- a/doc/guides/prog_guide/graph_lib.rst
+++ b/doc/guides/prog_guide/graph_lib.rst
@@ -189,14 +189,65 @@ In the above example, A graph object will be created with ethdev Rx
 node of port 0 and queue 0, all ipv4* nodes in the system,
 and ethdev tx node of all ports.
 
-Multicore graph processing
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-In the current graph library implementation, specifically,
-``rte_graph_walk()`` and ``rte_node_enqueue*`` fast path API functions
+Graph model choosing
+~~~~~~~~~~~~~~~~~~~~
+Currently, there are two different walking models. Use
+``rte_graph_worker_model_set()`` to set the walking model.
+
+RTC (Run-To-Completion)
+^^^^^^^^^^^^^^^^^^^^^^^
+This is the default graph walking model. Specifically,
+``rte_graph_walk_rtc()`` and ``rte_node_enqueue*`` fast path API functions
 are designed to work on single-core to have better performance.
 The fast path API works on graph object, So the multi-core graph
 processing strategy would be to create graph object PER WORKER.
 
+Example:
+
+Graph: node-0 -> node-1 -> node-2 @Core0.
+
+.. code-block:: diff
+
+    + - - - - - - - - - - - - - - - - - - - - - +
+    '                  Core #0                  '
+    '                                           '
+    ' +--------+     +---------+     +--------+ '
+    ' | Node-0 | --> | Node-1  | --> | Node-2 | '
+    ' +--------+     +---------+     +--------+ '
+    '                                           '
+    + - - - - - - - - - - - - - - - - - - - - - +
+
+Dispatch model
+^^^^^^^^^^^^^^
+The dispatch model enables a cross-core dispatching mechanism which employs
+a scheduling work-queue to dispatch streams to the worker cores that are
+associated with the destination nodes.
+
+Use ``rte_graph_model_dispatch_lcore_affinity_set()`` to set lcore affinity
+with the node.
+Each worker core will have a replica of the graph. Use ``rte_graph_clone()``
+to clone the graph for each worker and ``rte_graph_model_dispatch_core_bind()``
+to bind the graph to the worker core.
+
+Example:
+
+Graph topo: node-0 -> Core1; node-1 -> node-2; node-2 -> node-3.
+Config graph: node-0 @Core0; node-1/3 @Core1; node-2 @Core2.
+
+.. code-block:: diff
+
+    + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
+    '  Core #0   '     '          Core #1         '     '  Core #2   '
+    '            '     '                          '     '            '
+    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
+    ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
+    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
+    '            '     '     |                    '     '      ^     '
+    + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
+                             |                                 |
+                             + - - - - - - - - - - - - - - - - +
+
+
 In fast path
 ~~~~~~~~~~~~
 Typical fast-path code looks like below, where the application
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [EXT] [PATCH v5 02/15] graph: split graph worker into common and default model
  2023-03-31  4:02         ` [PATCH v5 02/15] graph: split graph worker into common and default model Zhirun Yan
@ 2023-04-27 14:11           ` Pavan Nikhilesh Bhagavatula
  2023-05-05  2:09             ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2023-04-27 14:11 UTC (permalink / raw)
  To: Zhirun Yan, dev, Jerin Jacob Kollanukkaran,
	Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram, stephen
  Cc: cunming.liang, haiyue.wang



> -----Original Message-----
> From: Zhirun Yan <zhirun.yan@intel.com>
> Sent: Friday, March 31, 2023 9:33 AM
> To: dev@dpdk.org; Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Kiran
> Kumar Kokkilagadda <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; stephen@networkplumber.org
> Cc: cunming.liang@intel.com; haiyue.wang@intel.com; Zhirun Yan
> <zhirun.yan@intel.com>
> Subject: [EXT] [PATCH v5 02/15] graph: split graph worker into common and
> default model
> 
> External Email
> 
> ----------------------------------------------------------------------
> To support multiple graph worker models, split the graph worker into common
> and default parts. Name the current walk function rte_graph_model_rtc,
> since the default model is RTC (Run-To-Completion).
> 
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  lib/graph/graph_pcap.c              |  2 +-
>  lib/graph/graph_private.h           |  2 +-
>  lib/graph/meson.build               |  2 +-
>  lib/graph/rte_graph_model_rtc.h     | 61 +++++++++++++++++++++++++++++
>  lib/graph/rte_graph_worker.h        | 34 ++++++++++++++++
>  lib/graph/rte_graph_worker_common.h | 57 ---------------------------
>  6 files changed, 98 insertions(+), 60 deletions(-)
>  create mode 100644 lib/graph/rte_graph_model_rtc.h
>  create mode 100644 lib/graph/rte_graph_worker.h
> 
> diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
> index 8a220370fa..6c43330029 100644
> --- a/lib/graph/graph_pcap.c
> +++ b/lib/graph/graph_pcap.c
> @@ -10,7 +10,7 @@
>  #include <rte_mbuf.h>
>  #include <rte_pcapng.h>
> 
> -#include "rte_graph_worker_common.h"
> +#include "rte_graph_worker.h"
> 
>  #include "graph_pcap_private.h"
> 
> diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
> index f08dbc7e9d..7d1b30b8ac 100644
> --- a/lib/graph/graph_private.h
> +++ b/lib/graph/graph_private.h
> @@ -12,7 +12,7 @@
>  #include <rte_eal.h>
> 
>  #include "rte_graph.h"
> -#include "rte_graph_worker_common.h"
> +#include "rte_graph_worker.h"
> 
>  extern int rte_graph_logtype;
> 
> diff --git a/lib/graph/meson.build b/lib/graph/meson.build
> index 4e2b612ad3..3526d1b5d4 100644
> --- a/lib/graph/meson.build
> +++ b/lib/graph/meson.build
> @@ -16,6 +16,6 @@ sources = files(
>          'graph_populate.c',
>          'graph_pcap.c',
>  )
> -headers = files('rte_graph.h', 'rte_graph_worker_common.h')
> +headers = files('rte_graph.h', 'rte_graph_worker.h')
> 
>  deps += ['eal', 'pcapng']
> diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
> new file mode 100644
> index 0000000000..665560f831
> --- /dev/null
> +++ b/lib/graph/rte_graph_model_rtc.h
> @@ -0,0 +1,61 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2023 Intel Corporation
> + */
> +

Please retain Marvell copyright too.

> +#include "rte_graph_worker_common.h"
> +
> +/**
> + * Perform graph walk on the circular buffer and invoke the process function
> + * of the nodes and collect the stats.
> + *
> + * @param graph
> + *   Graph pointer returned from rte_graph_lookup function.
> + *
> + * @see rte_graph_lookup()
> + */
> +static inline void
> +rte_graph_walk_rtc(struct rte_graph *graph)
> +{
> +	const rte_graph_off_t *cir_start = graph->cir_start;
> +	const rte_node_t mask = graph->cir_mask;
> +	uint32_t head = graph->head;
> +	struct rte_node *node;
> +	uint64_t start;
> +	uint16_t rc;
> +	void **objs;
> +
> +	/*
> +	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
> +	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
> +	 * in a circular buffer fashion.
> +	 *
> +	 *	+-----+ <= cir_start - head [number of source nodes]
> +	 *	|     |
> +	 *	| ... | <= source nodes
> +	 *	|     |
> +	 *	+-----+ <= cir_start [head = 0] [tail = 0]
> +	 *	|     |
> +	 *	| ... | <= pending streams
> +	 *	|     |
> +	 *	+-----+ <= cir_start + mask
> +	 */
> +	while (likely(head != graph->tail)) {
> +		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
> +		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> +		objs = node->objs;
> +		rte_prefetch0(objs);
> +
> +		if (rte_graph_has_stats_feature()) {
> +			start = rte_rdtsc();
> +			rc = node->process(graph, node, objs, node->idx);
> +			node->total_cycles += rte_rdtsc() - start;
> +			node->total_calls++;
> +			node->total_objs += rc;
> +		} else {
> +			node->process(graph, node, objs, node->idx);
> +		}
> +		node->idx = 0;
> +		head = likely((int32_t)head > 0) ? head & mask : head;
> +	}
> +	graph->tail = 0;
> +}
> diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
> new file mode 100644
> index 0000000000..7ea18ba80a
> --- /dev/null
> +++ b/lib/graph/rte_graph_worker.h
> @@ -0,0 +1,34 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2023 Intel Corporation
> + */
> +
> +#ifndef _RTE_GRAPH_WORKER_H_
> +#define _RTE_GRAPH_WORKER_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include "rte_graph_model_rtc.h"
> +
> +/**
> + * Perform graph walk on the circular buffer and invoke the process function
> + * of the nodes and collect the stats.
> + *
> + * @param graph
> + *   Graph pointer returned from rte_graph_lookup function.
> + *
> + * @see rte_graph_lookup()
> + */
> +__rte_experimental
> +static inline void
> +rte_graph_walk(struct rte_graph *graph)
> +{
> +	rte_graph_walk_rtc(graph);
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_GRAPH_WORKER_H_ */
> diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
> index 0bad2938f3..b58f8f6947 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -128,63 +128,6 @@ __rte_experimental
>  void __rte_node_stream_alloc_size(struct rte_graph *graph,
>  				  struct rte_node *node, uint16_t req_size);
> 
> -/**
> - * Perform graph walk on the circular buffer and invoke the process function
> - * of the nodes and collect the stats.
> - *
> - * @param graph
> - *   Graph pointer returned from rte_graph_lookup function.
> - *
> - * @see rte_graph_lookup()
> - */
> -__rte_experimental
> -static inline void
> -rte_graph_walk(struct rte_graph *graph)
> -{
> -	const rte_graph_off_t *cir_start = graph->cir_start;
> -	const rte_node_t mask = graph->cir_mask;
> -	uint32_t head = graph->head;
> -	struct rte_node *node;
> -	uint64_t start;
> -	uint16_t rc;
> -	void **objs;
> -
> -	/*
> -	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
> -	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
> -	 * in a circular buffer fashion.
> -	 *
> -	 *	+-----+ <= cir_start - head [number of source nodes]
> -	 *	|     |
> -	 *	| ... | <= source nodes
> -	 *	|     |
> -	 *	+-----+ <= cir_start [head = 0] [tail = 0]
> -	 *	|     |
> -	 *	| ... | <= pending streams
> -	 *	|     |
> -	 *	+-----+ <= cir_start + mask
> -	 */
> -	while (likely(head != graph->tail)) {
> -		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
> -		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> -		objs = node->objs;
> -		rte_prefetch0(objs);
> -
> -		if (rte_graph_has_stats_feature()) {
> -			start = rte_rdtsc();
> -			rc = node->process(graph, node, objs, node->idx);
> -			node->total_cycles += rte_rdtsc() - start;
> -			node->total_calls++;
> -			node->total_objs += rc;
> -		} else {
> -			node->process(graph, node, objs, node->idx);
> -		}
> -		node->idx = 0;
> -		head = likely((int32_t)head > 0) ? head & mask : head;
> -	}
> -	graph->tail = 0;
> -}
> -
>  /* Fast path helper functions */
> 
>  /**
> --
> 2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [EXT] [PATCH v5 09/15] graph: introduce stream moving cross cores
  2023-03-31  4:03         ` [PATCH v5 09/15] graph: introduce stream moving cross cores Zhirun Yan
@ 2023-04-27 14:52           ` Pavan Nikhilesh Bhagavatula
  2023-05-05  2:10             ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2023-04-27 14:52 UTC (permalink / raw)
  To: Zhirun Yan, dev, Jerin Jacob Kollanukkaran,
	Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram, stephen
  Cc: cunming.liang, haiyue.wang

> This patch introduces key functions to allow a worker thread to
> enqueue and move streams of objects to the next nodes across
> different cores.
> 
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  lib/graph/graph_private.h            |  27 +++++
>  lib/graph/meson.build                |   2 +-
>  lib/graph/rte_graph_model_dispatch.c | 145 +++++++++++++++++++++++++++
>  lib/graph/rte_graph_model_dispatch.h |  37 +++++++
>  lib/graph/version.map                |   2 +
>  5 files changed, 212 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
> index b66b18ebbc..e1a2a4bfd8 100644
> --- a/lib/graph/graph_private.h
> +++ b/lib/graph/graph_private.h
> @@ -366,4 +366,31 @@ void graph_dump(FILE *f, struct graph *g);
>   */
>  void node_dump(FILE *f, struct node *n);
> 
> +/**
> + * @internal
> + *
> + * Create the graph schedule work queue. All cloned graphs attached to the
> + * parent graph MUST be destroyed together due to a fast schedule design limitation.
> + *
> + * @param _graph
> + *   The graph object
> + * @param _parent_graph
> + *   The parent graph object which holds the run-queue head.
> + *
> + * @return
> + *   - 0: Success.
> + *   - <0: Graph schedule work queue related error.
> + */
> +int graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph);
> +
> +/**
> + * @internal
> + *
> + * Destroy the graph schedule work queue.
> + *
> + * @param _graph
> + *   The graph object
> + */
> +void graph_sched_wq_destroy(struct graph *_graph);
> +
>  #endif /* _RTE_GRAPH_PRIVATE_H_ */
> diff --git a/lib/graph/meson.build b/lib/graph/meson.build
> index c729d984b6..e21affa280 100644
> --- a/lib/graph/meson.build
> +++ b/lib/graph/meson.build
> @@ -20,4 +20,4 @@ sources = files(
>  )
>  headers = files('rte_graph.h', 'rte_graph_worker.h')
> 
> -deps += ['eal', 'pcapng']
> +deps += ['eal', 'pcapng', 'mempool', 'ring']
> diff --git a/lib/graph/rte_graph_model_dispatch.c
> b/lib/graph/rte_graph_model_dispatch.c
> index 4a2f99496d..a300fefb85 100644
> --- a/lib/graph/rte_graph_model_dispatch.c
> +++ b/lib/graph/rte_graph_model_dispatch.c
> @@ -5,6 +5,151 @@
>  #include "graph_private.h"
>  #include "rte_graph_model_dispatch.h"
> 
> +int
> +graph_sched_wq_create(struct graph *_graph, struct graph
> *_parent_graph)
> +{
> +	struct rte_graph *parent_graph = _parent_graph->graph;
> +	struct rte_graph *graph = _graph->graph;
> +	unsigned int wq_size;
> +
> +	wq_size = GRAPH_SCHED_WQ_SIZE(graph->nb_nodes);
> +	wq_size = rte_align32pow2(wq_size + 1);

Hi Zhirun,

Should we introduce a new function `rte_graph_configure` that lets the
application control the ring size and mempool size of the work queue?
We could fall back to default values if nothing is configured.

rte_graph_configure should take a 
struct rte_graph_config {
	struct {
		u64 rsvd[8];
	} rtc;
	struct {
		u16 wq_size;
		...
	} dispatch;
};

This will help future graph models to have their own configuration.

We can have a rte_graph_config_init() function to initialize the rte_graph_config structure.
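To make the suggestion concrete, here is a minimal standalone sketch of what such a configuration structure and its init helper could look like. All names (rte_graph_config, rte_graph_config_init) follow the proposal above and are hypothetical, not existing DPDK API; the default sizes are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical default; a real implementation would derive it from
 * the graph's node count, as graph_sched_wq_create() does today. */
#define GRAPH_DISPATCH_WQ_SIZE_DEFAULT 1024

struct rte_graph_config {
	struct {
		uint64_t rsvd[8];  /* reserved for the RTC model */
	} rtc;
	struct {
		uint16_t wq_size;  /* dispatch work-queue (ring) size */
		uint32_t mp_size;  /* mempool size for WQ entries */
	} dispatch;
};

/* Fill in defaults so applications only override what they need. */
static void
rte_graph_config_init(struct rte_graph_config *cfg)
{
	memset(cfg, 0, sizeof(*cfg));
	cfg->dispatch.wq_size = GRAPH_DISPATCH_WQ_SIZE_DEFAULT;
	cfg->dispatch.mp_size = GRAPH_DISPATCH_WQ_SIZE_DEFAULT;
}
```

An application would call rte_graph_config_init(), tweak the dispatch fields, and pass the struct to the (proposed) rte_graph_configure(); per-model sub-structs keep the ABI extensible for future models.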


> +
> +	graph->wq = rte_ring_create(graph->name, wq_size, graph->socket,
> +				    RING_F_SC_DEQ);
> +	if (graph->wq == NULL)
> +		SET_ERR_JMP(EIO, fail, "Failed to allocate graph WQ");
> +
> +	graph->mp = rte_mempool_create(graph->name, wq_size,
> +				       sizeof(struct graph_sched_wq_node),
> +				       0, 0, NULL, NULL, NULL, NULL,
> +				       graph->socket, MEMPOOL_F_SP_PUT);
> +	if (graph->mp == NULL)
> +		SET_ERR_JMP(EIO, fail_mp,
> +			    "Failed to allocate graph WQ schedule entry");
> +
> +	graph->lcore_id = _graph->lcore_id;
> +
> +	if (parent_graph->rq == NULL) {
> +		parent_graph->rq = &parent_graph->rq_head;
> +		SLIST_INIT(parent_graph->rq);
> +	}
> +
> +	graph->rq = parent_graph->rq;
> +	SLIST_INSERT_HEAD(graph->rq, graph, rq_next);
> +
> +	return 0;
> +
> +fail_mp:
> +	rte_ring_free(graph->wq);
> +	graph->wq = NULL;
> +fail:
> +	return -rte_errno;
> +}
> +
> +void
> +graph_sched_wq_destroy(struct graph *_graph)
> +{
> +	struct rte_graph *graph = _graph->graph;
> +
> +	if (graph == NULL)
> +		return;
> +
> +	rte_ring_free(graph->wq);
> +	graph->wq = NULL;
> +
> +	rte_mempool_free(graph->mp);
> +	graph->mp = NULL;
> +}
> +
> +static __rte_always_inline bool
> +__graph_sched_node_enqueue(struct rte_node *node, struct rte_graph
> *graph)
> +{
> +	struct graph_sched_wq_node *wq_node;
> +	uint16_t off = 0;
> +	uint16_t size;
> +
> +submit_again:
> +	if (rte_mempool_get(graph->mp, (void **)&wq_node) < 0)
> +		goto fallback;
> +
> +	size = RTE_MIN(node->idx, RTE_DIM(wq_node->objs));
> +	wq_node->node_off = node->off;
> +	wq_node->nb_objs = size;
> +	rte_memcpy(wq_node->objs, &node->objs[off], size * sizeof(void *));
> +
> +	while (rte_ring_mp_enqueue_bulk_elem(graph->wq, (void *)&wq_node,
> +					  sizeof(wq_node), 1, NULL) == 0)
> +		rte_pause();
> +
> +	off += size;
> +	node->idx -= size;
> +	if (node->idx > 0)
> +		goto submit_again;
> +
> +	return true;
> +
> +fallback:
> +	if (off != 0)
> +		memmove(&node->objs[0], &node->objs[off],
> +			node->idx * sizeof(void *));
> +
> +	return false;
> +}
> +
> +bool __rte_noinline
> +__rte_graph_sched_node_enqueue(struct rte_node *node,
> +			       struct rte_graph_rq_head *rq)
> +{
> +	const unsigned int lcore_id = node->lcore_id;
> +	struct rte_graph *graph;
> +
> +	SLIST_FOREACH(graph, rq, rq_next)
> +		if (graph->lcore_id == lcore_id)
> +			break;
> +
> +	return graph != NULL ? __graph_sched_node_enqueue(node,
> graph) : false;
> +}
> +
> +void
> +__rte_graph_sched_wq_process(struct rte_graph *graph)
> +{
> +	struct graph_sched_wq_node *wq_node;
> +	struct rte_mempool *mp = graph->mp;
> +	struct rte_ring *wq = graph->wq;
> +	uint16_t idx, free_space;
> +	struct rte_node *node;
> +	unsigned int i, n;
> +	struct graph_sched_wq_node *wq_nodes[32];
> +
> +	n = rte_ring_sc_dequeue_burst_elem(wq, wq_nodes,
> sizeof(wq_nodes[0]),
> +					   RTE_DIM(wq_nodes), NULL);
> +	if (n == 0)
> +		return;
> +
> +	for (i = 0; i < n; i++) {
> +		wq_node = wq_nodes[i];
> +		node = RTE_PTR_ADD(graph, wq_node->node_off);
> +		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> +		idx = node->idx;
> +		free_space = node->size - idx;
> +
> +		if (unlikely(free_space < wq_node->nb_objs))
> +			__rte_node_stream_alloc_size(graph, node, node->size + wq_node->nb_objs);
> +
> +		memmove(&node->objs[idx], wq_node->objs, wq_node->nb_objs * sizeof(void *));
> +		memset(wq_node->objs, 0, wq_node->nb_objs * sizeof(void *));

The memset should be avoided in the fast path for better performance, as we set wq_node->nb_objs to 0 anyway.
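To illustrate the point, here is a minimal standalone sketch (hypothetical types, not the actual lib/graph structures): a consumer only ever reads objs[0..nb_objs), so resetting nb_objs alone already "empties" the entry, and clearing the payload array costs extra cycles in the fast path for no functional gain.

```c
#include <assert.h>
#include <stddef.h>

#define WQ_OBJS 4

/* Simplified stand-in for graph_sched_wq_node. */
struct wq_node {
	unsigned int nb_objs;
	void *objs[WQ_OBJS];
};

static size_t
consume(struct wq_node *n)
{
	size_t seen = 0;
	unsigned int i;

	/* Only the first nb_objs slots are ever dereferenced. */
	for (i = 0; i < n->nb_objs; i++)
		if (n->objs[i] != NULL)
			seen++;

	n->nb_objs = 0;  /* no memset of objs[] needed */
	return seen;
}
```

Stale pointers remain in objs[] after consumption, but they are unreachable through nb_objs, so correctness is unaffected.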

> +		node->idx = idx + wq_node->nb_objs;
> +
> +		__rte_node_process(graph, node);
> +
> +		wq_node->nb_objs = 0;
> +		node->idx = 0;
> +	}
> +
> +	rte_mempool_put_bulk(mp, (void **)wq_nodes, n);
> +}
> +
>  int
>  rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned
> int lcore_id)
>  {
> diff --git a/lib/graph/rte_graph_model_dispatch.h
> b/lib/graph/rte_graph_model_dispatch.h
> index 179624e972..18fa7ce0ab 100644
> --- a/lib/graph/rte_graph_model_dispatch.h
> +++ b/lib/graph/rte_graph_model_dispatch.h
> @@ -14,12 +14,49 @@
>   *
>   * This API allows to set core affinity with the node.
>   */
> +#include <rte_errno.h>
> +#include <rte_mempool.h>
> +#include <rte_memzone.h>
> +#include <rte_ring.h>
> +
>  #include "rte_graph_worker_common.h"
> 
>  #ifdef __cplusplus
>  extern "C" {
>  #endif
> 
> +#define GRAPH_SCHED_WQ_SIZE_MULTIPLIER  8
> +#define GRAPH_SCHED_WQ_SIZE(nb_nodes)   \
> +	((typeof(nb_nodes))((nb_nodes) * GRAPH_SCHED_WQ_SIZE_MULTIPLIER))
> +
> +/**
> + * @internal
> + *
> + * Schedule the node to the right graph's work queue.
> + *
> + * @param node
> + *   Pointer to the scheduled node object.
> + * @param rq
> + *   Pointer to the scheduled run-queue for all graphs.
> + *
> + * @return
> + *   True on success, false otherwise.
> + */
> +__rte_experimental
> +bool __rte_noinline __rte_graph_sched_node_enqueue(struct rte_node
> *node,
> +				    struct rte_graph_rq_head *rq);
> +
> +/**
> + * @internal
> + *
> + * Process all nodes (streams) in the graph's work queue.
> + *
> + * @param graph
> + *   Pointer to the graph object.
> + */
> +__rte_experimental
> +void __rte_graph_sched_wq_process(struct rte_graph *graph);
> +
>  /**
>   * Set lcore affinity with the node.
>   *
> diff --git a/lib/graph/version.map b/lib/graph/version.map
> index aaa86f66ed..d511133f39 100644
> --- a/lib/graph/version.map
> +++ b/lib/graph/version.map
> @@ -48,6 +48,8 @@ EXPERIMENTAL {
> 
>  	rte_graph_worker_model_set;
>  	rte_graph_worker_model_get;
> +	__rte_graph_sched_wq_process;
> +	__rte_graph_sched_node_enqueue;
> 
>  	rte_graph_model_dispatch_lcore_affinity_set;
> 
> --
> 2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [EXT] [PATCH v5 11/15] graph: introduce graph walk by cross-core dispatch
  2023-03-31  4:03         ` [PATCH v5 11/15] graph: introduce graph walk by cross-core dispatch Zhirun Yan
@ 2023-04-27 14:58           ` Pavan Nikhilesh Bhagavatula
  2023-05-05  2:09             ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2023-04-27 14:58 UTC (permalink / raw)
  To: Zhirun Yan, dev, Jerin Jacob Kollanukkaran,
	Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram, stephen
  Cc: cunming.liang, haiyue.wang

> This patch introduces the task scheduler mechanism to enable dispatching
> tasks to other worker cores. Currently, there is only a local work
> queue for one graph to walk. We introduce a scheduler work queue on
> each worker core for dispatching tasks. The walk is performed on the
> scheduler work queue first, then on the local work queue.
> 
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  lib/graph/rte_graph_model_dispatch.h | 42
> ++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/lib/graph/rte_graph_model_dispatch.h
> b/lib/graph/rte_graph_model_dispatch.h
> index 18fa7ce0ab..65b2cc6d87 100644
> --- a/lib/graph/rte_graph_model_dispatch.h
> +++ b/lib/graph/rte_graph_model_dispatch.h
> @@ -73,6 +73,48 @@ __rte_experimental
>  int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
>  						unsigned int lcore_id);
> 
> +/**
> + * Perform graph walk on the circular buffer and invoke the process
> function
> + * of the nodes and collect the stats.
> + *
> + * @param graph
> + *   Graph pointer returned from rte_graph_lookup function.
> + *
> + * @see rte_graph_lookup()
> + */
> +__rte_experimental
> +static inline void
> +rte_graph_walk_mcore_dispatch(struct rte_graph *graph)
> +{
> +	const rte_graph_off_t *cir_start = graph->cir_start;
> +	const rte_node_t mask = graph->cir_mask;
> +	uint32_t head = graph->head;
> +	struct rte_node *node;

I think we should add a RTE_ASSERT here to make sure that the graph object is a cloned graph.

> +
> +	if (graph->wq != NULL)
> +		__rte_graph_sched_wq_process(graph);
> +
> +	while (likely(head != graph->tail)) {
> +		node = (struct rte_node *)RTE_PTR_ADD(graph,
> cir_start[(int32_t)head++]);
> +
> +		/* skip the src nodes which not bind with current worker */
> +		if ((int32_t)head < 0 && node->lcore_id != graph->lcore_id)
> +			continue;
> +
> +		/* Schedule the node until all task/objs are done */
> +		if (node->lcore_id != RTE_MAX_LCORE &&
> +		    graph->lcore_id != node->lcore_id && graph->rq != NULL
> &&
> +		    __rte_graph_sched_node_enqueue(node, graph->rq))
> +			continue;
> +
> +		__rte_node_process(graph, node);
> +
> +		head = likely((int32_t)head > 0) ? head & mask : head;
> +	}
> +
> +	graph->tail = 0;
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> --
> 2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [EXT] [PATCH v5 03/15] graph: move node process into inline function
  2023-03-31  4:02         ` [PATCH v5 03/15] graph: move node process into inline function Zhirun Yan
@ 2023-04-27 15:03           ` Pavan Nikhilesh Bhagavatula
  2023-05-05  2:10             ` Yan, Zhirun
  0 siblings, 1 reply; 369+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2023-04-27 15:03 UTC (permalink / raw)
  To: Zhirun Yan, dev, Jerin Jacob Kollanukkaran,
	Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram, stephen
  Cc: cunming.liang, haiyue.wang

> Node process is a single and reusable block, move the code into an inline
> function.
> 
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  lib/graph/rte_graph_model_rtc.h     | 20 ++---------------
>  lib/graph/rte_graph_worker_common.h | 33
> +++++++++++++++++++++++++++++
>  2 files changed, 35 insertions(+), 18 deletions(-)
> 
> diff --git a/lib/graph/rte_graph_model_rtc.h
> b/lib/graph/rte_graph_model_rtc.h
> index 665560f831..0dcb7151e9 100644
> --- a/lib/graph/rte_graph_model_rtc.h
> +++ b/lib/graph/rte_graph_model_rtc.h
> @@ -20,9 +20,6 @@ rte_graph_walk_rtc(struct rte_graph *graph)
>  	const rte_node_t mask = graph->cir_mask;
>  	uint32_t head = graph->head;
>  	struct rte_node *node;
> -	uint64_t start;
> -	uint16_t rc;
> -	void **objs;
> 
>  	/*
>  	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and
> then
> @@ -41,21 +38,8 @@ rte_graph_walk_rtc(struct rte_graph *graph)
>  	 */
>  	while (likely(head != graph->tail)) {
>  		node = (struct rte_node *)RTE_PTR_ADD(graph,
> cir_start[(int32_t)head++]);
> -		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> -		objs = node->objs;
> -		rte_prefetch0(objs);
> -
> -		if (rte_graph_has_stats_feature()) {
> -			start = rte_rdtsc();

Since we are refactoring this function, could you change rte_rdtsc() to rte_rdtsc_precise()?

> -			rc = node->process(graph, node, objs, node->idx);
> -			node->total_cycles += rte_rdtsc() - start;
> -			node->total_calls++;
> -			node->total_objs += rc;
> -		} else {
> -			node->process(graph, node, objs, node->idx);
> -		}
> -			node->idx = 0;
> -			head = likely((int32_t)head > 0) ? head & mask :
> head;
> +		__rte_node_process(graph, node);
> +		head = likely((int32_t)head > 0) ? head & mask : head;
>  	}
>  	graph->tail = 0;
>  }
> diff --git a/lib/graph/rte_graph_worker_common.h
> b/lib/graph/rte_graph_worker_common.h
> index b58f8f6947..41428974db 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -130,6 +130,39 @@ void __rte_node_stream_alloc_size(struct
> rte_graph *graph,
> 
>  /* Fast path helper functions */
> 
> +/**
> + * @internal
> + *
> + * Enqueue a given node to the tail of the graph reel.
> + *
> + * @param graph
> + *   Pointer Graph object.
> + * @param node
> + *   Pointer to node object to be enqueued.
> + */
> +static __rte_always_inline void
> +__rte_node_process(struct rte_graph *graph, struct rte_node *node)
> +{
> +	uint64_t start;
> +	uint16_t rc;
> +	void **objs;
> +
> +	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> +	objs = node->objs;
> +	rte_prefetch0(objs);
> +
> +	if (rte_graph_has_stats_feature()) {
> +		start = rte_rdtsc();
> +		rc = node->process(graph, node, objs, node->idx);
> +		node->total_cycles += rte_rdtsc() - start;
> +		node->total_calls++;
> +		node->total_objs += rc;
> +	} else {
> +		node->process(graph, node, objs, node->idx);
> +	}
> +	node->idx = 0;
> +}
> +
>  /**
>   * @internal
>   *
> --
> 2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [EXT] [PATCH v5 02/15] graph: split graph worker into common and default model
  2023-04-27 14:11           ` [EXT] " Pavan Nikhilesh Bhagavatula
@ 2023-05-05  2:09             ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-05-05  2:09 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, dev, Jerin Jacob Kollanukkaran,
	Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram, stephen
  Cc: Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> Sent: Thursday, April 27, 2023 10:11 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>; dev@dpdk.org; Jerin Jacob
> Kollanukkaran <jerinj@marvell.com>; Kiran Kumar Kokkilagadda
> <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; stephen@networkplumber.org
> Cc: Liang, Cunming <cunming.liang@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: RE: [EXT] [PATCH v5 02/15] graph: split graph worker into common and
> default model
> 
> 
> 
> > -----Original Message-----
> > From: Zhirun Yan <zhirun.yan@intel.com>
> > Sent: Friday, March 31, 2023 9:33 AM
> > To: dev@dpdk.org; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> > Kiran Kumar Kokkilagadda <kirankumark@marvell.com>; Nithin Kumar
> > Dabilpuram <ndabilpuram@marvell.com>; stephen@networkplumber.org
> > Cc: cunming.liang@intel.com; haiyue.wang@intel.com; Zhirun Yan
> > <zhirun.yan@intel.com>
> > Subject: [EXT] [PATCH v5 02/15] graph: split graph worker into common
> > and default model
> >
> > External Email
> >
> > ----------------------------------------------------------------------
> > To support multiple graph worker models, split the graph worker into
> > common and default parts. Name the current walk function
> > rte_graph_model_rtc, since the default model is RTC (Run-To-Completion).
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  lib/graph/graph_pcap.c              |  2 +-
> >  lib/graph/graph_private.h           |  2 +-
> >  lib/graph/meson.build               |  2 +-
> >  lib/graph/rte_graph_model_rtc.h     | 61
> > +++++++++++++++++++++++++++++
> >  lib/graph/rte_graph_worker.h        | 34 ++++++++++++++++
> >  lib/graph/rte_graph_worker_common.h | 57 ---------------------------
> >  6 files changed, 98 insertions(+), 60 deletions(-)  create mode
> > 100644 lib/graph/rte_graph_model_rtc.h  create mode 100644
> > lib/graph/rte_graph_worker.h
> >
> > diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c index
> > 8a220370fa..6c43330029 100644
> > --- a/lib/graph/graph_pcap.c
> > +++ b/lib/graph/graph_pcap.c
> > @@ -10,7 +10,7 @@
> >  #include <rte_mbuf.h>
> >  #include <rte_pcapng.h>
> >
> > -#include "rte_graph_worker_common.h"
> > +#include "rte_graph_worker.h"
> >
> >  #include "graph_pcap_private.h"
> >
> > diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
> > index f08dbc7e9d..7d1b30b8ac 100644
> > --- a/lib/graph/graph_private.h
> > +++ b/lib/graph/graph_private.h
> > @@ -12,7 +12,7 @@
> >  #include <rte_eal.h>
> >
> >  #include "rte_graph.h"
> > -#include "rte_graph_worker_common.h"
> > +#include "rte_graph_worker.h"
> >
> >  extern int rte_graph_logtype;
> >
> > diff --git a/lib/graph/meson.build b/lib/graph/meson.build index
> > 4e2b612ad3..3526d1b5d4 100644
> > --- a/lib/graph/meson.build
> > +++ b/lib/graph/meson.build
> > @@ -16,6 +16,6 @@ sources = files(
> >          'graph_populate.c',
> >          'graph_pcap.c',
> >  )
> > -headers = files('rte_graph.h', 'rte_graph_worker_common.h')
> > +headers = files('rte_graph.h', 'rte_graph_worker.h')
> >
> >  deps += ['eal', 'pcapng']
> > diff --git a/lib/graph/rte_graph_model_rtc.h
> > b/lib/graph/rte_graph_model_rtc.h new file mode 100644 index
> > 0000000000..665560f831
> > --- /dev/null
> > +++ b/lib/graph/rte_graph_model_rtc.h
> > @@ -0,0 +1,61 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(C) 2023 Intel Corporation  */
> > +
> 
> Please retain Marvell copyright too.
> 
Yes, I will do that in the next version. Thanks for reminding me.

> > +#include "rte_graph_worker_common.h"
> > +
> > +/**
> > + * Perform graph walk on the circular buffer and invoke the process
> > function
> > + * of the nodes and collect the stats.
> > + *
> > + * @param graph
> > + *   Graph pointer returned from rte_graph_lookup function.
> > + *
> > + * @see rte_graph_lookup()
> > + */
> > +static inline void
> > +rte_graph_walk_rtc(struct rte_graph *graph) {
> > +	const rte_graph_off_t *cir_start = graph->cir_start;
> > +	const rte_node_t mask = graph->cir_mask;
> > +	uint32_t head = graph->head;
> > +	struct rte_node *node;
> > +	uint64_t start;
> > +	uint16_t rc;
> > +	void **objs;
> > +
> > +	/*
> > +	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and
> > then
> > +	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
> > +	 * in a circular buffer fashion.
> > +	 *
> > +	 *	+-----+ <= cir_start - head [number of source nodes]
> > +	 *	|     |
> > +	 *	| ... | <= source nodes
> > +	 *	|     |
> > +	 *	+-----+ <= cir_start [head = 0] [tail = 0]
> > +	 *	|     |
> > +	 *	| ... | <= pending streams
> > +	 *	|     |
> > +	 *	+-----+ <= cir_start + mask
> > +	 */
> > +	while (likely(head != graph->tail)) {
> > +		node = (struct rte_node *)RTE_PTR_ADD(graph,
> > cir_start[(int32_t)head++]);
> > +		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> > +		objs = node->objs;
> > +		rte_prefetch0(objs);
> > +
> > +		if (rte_graph_has_stats_feature()) {
> > +			start = rte_rdtsc();
> > +			rc = node->process(graph, node, objs, node->idx);
> > +			node->total_cycles += rte_rdtsc() - start;
> > +			node->total_calls++;
> > +			node->total_objs += rc;
> > +		} else {
> > +			node->process(graph, node, objs, node->idx);
> > +		}
> > +			node->idx = 0;
> > +			head = likely((int32_t)head > 0) ? head & mask :
> > head;
> > +	}
> > +	graph->tail = 0;
> > +}
> > diff --git a/lib/graph/rte_graph_worker.h
> > b/lib/graph/rte_graph_worker.h new file mode 100644 index
> > 0000000000..7ea18ba80a
> > --- /dev/null
> > +++ b/lib/graph/rte_graph_worker.h
> > @@ -0,0 +1,34 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(C) 2023 Intel Corporation  */
> > +
> > +#ifndef _RTE_GRAPH_WORKER_H_
> > +#define _RTE_GRAPH_WORKER_H_
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include "rte_graph_model_rtc.h"
> > +
> > +/**
> > + * Perform graph walk on the circular buffer and invoke the process
> > function
> > + * of the nodes and collect the stats.
> > + *
> > + * @param graph
> > + *   Graph pointer returned from rte_graph_lookup function.
> > + *
> > + * @see rte_graph_lookup()
> > + */
> > +__rte_experimental
> > +static inline void
> > +rte_graph_walk(struct rte_graph *graph) {
> > +	rte_graph_walk_rtc(graph);
> > +}
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_GRAPH_WORKER_H_ */
> > diff --git a/lib/graph/rte_graph_worker_common.h
> > b/lib/graph/rte_graph_worker_common.h
> > index 0bad2938f3..b58f8f6947 100644
> > --- a/lib/graph/rte_graph_worker_common.h
> > +++ b/lib/graph/rte_graph_worker_common.h
> > @@ -128,63 +128,6 @@ __rte_experimental  void
> > __rte_node_stream_alloc_size(struct rte_graph *graph,
> >  				  struct rte_node *node, uint16_t req_size);
> >
> > -/**
> > - * Perform graph walk on the circular buffer and invoke the process
> > function
> > - * of the nodes and collect the stats.
> > - *
> > - * @param graph
> > - *   Graph pointer returned from rte_graph_lookup function.
> > - *
> > - * @see rte_graph_lookup()
> > - */
> > -__rte_experimental
> > -static inline void
> > -rte_graph_walk(struct rte_graph *graph) -{
> > -	const rte_graph_off_t *cir_start = graph->cir_start;
> > -	const rte_node_t mask = graph->cir_mask;
> > -	uint32_t head = graph->head;
> > -	struct rte_node *node;
> > -	uint64_t start;
> > -	uint16_t rc;
> > -	void **objs;
> > -
> > -	/*
> > -	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and
> > then
> > -	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
> > -	 * in a circular buffer fashion.
> > -	 *
> > -	 *	+-----+ <= cir_start - head [number of source nodes]
> > -	 *	|     |
> > -	 *	| ... | <= source nodes
> > -	 *	|     |
> > -	 *	+-----+ <= cir_start [head = 0] [tail = 0]
> > -	 *	|     |
> > -	 *	| ... | <= pending streams
> > -	 *	|     |
> > -	 *	+-----+ <= cir_start + mask
> > -	 */
> > -	while (likely(head != graph->tail)) {
> > -		node = (struct rte_node *)RTE_PTR_ADD(graph,
> > cir_start[(int32_t)head++]);
> > -		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> > -		objs = node->objs;
> > -		rte_prefetch0(objs);
> > -
> > -		if (rte_graph_has_stats_feature()) {
> > -			start = rte_rdtsc();
> > -			rc = node->process(graph, node, objs, node->idx);
> > -			node->total_cycles += rte_rdtsc() - start;
> > -			node->total_calls++;
> > -			node->total_objs += rc;
> > -		} else {
> > -			node->process(graph, node, objs, node->idx);
> > -		}
> > -		node->idx = 0;
> > -		head = likely((int32_t)head > 0) ? head & mask : head;
> > -	}
> > -	graph->tail = 0;
> > -}
> > -
> >  /* Fast path helper functions */
> >
> >  /**
> > --
> > 2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [EXT] [PATCH v5 11/15] graph: introduce graph walk by cross-core dispatch
  2023-04-27 14:58           ` [EXT] " Pavan Nikhilesh Bhagavatula
@ 2023-05-05  2:09             ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-05-05  2:09 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, dev, Jerin Jacob Kollanukkaran,
	Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram, stephen
  Cc: Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> Sent: Thursday, April 27, 2023 10:59 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>; dev@dpdk.org; Jerin Jacob
> Kollanukkaran <jerinj@marvell.com>; Kiran Kumar Kokkilagadda
> <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; stephen@networkplumber.org
> Cc: Liang, Cunming <cunming.liang@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: RE: [EXT] [PATCH v5 11/15] graph: introduce graph walk by cross-core
> dispatch
> 
> > This patch introduces the task scheduler mechanism to enable
> > dispatching tasks to other worker cores. Currently, there is only a
> > local work queue for one graph to walk. We introduce a scheduler
> > work queue on each worker core for dispatching tasks. The walk is
> > performed on the scheduler work queue first, then on the local work
> > queue.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  lib/graph/rte_graph_model_dispatch.h | 42
> > ++++++++++++++++++++++++++++
> >  1 file changed, 42 insertions(+)
> >
> > diff --git a/lib/graph/rte_graph_model_dispatch.h
> > b/lib/graph/rte_graph_model_dispatch.h
> > index 18fa7ce0ab..65b2cc6d87 100644
> > --- a/lib/graph/rte_graph_model_dispatch.h
> > +++ b/lib/graph/rte_graph_model_dispatch.h
> > @@ -73,6 +73,48 @@ __rte_experimental
> >  int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
> >  						unsigned int lcore_id);
> >
> > +/**
> > + * Perform graph walk on the circular buffer and invoke the process
> > function
> > + * of the nodes and collect the stats.
> > + *
> > + * @param graph
> > + *   Graph pointer returned from rte_graph_lookup function.
> > + *
> > + * @see rte_graph_lookup()
> > + */
> > +__rte_experimental
> > +static inline void
> > +rte_graph_walk_mcore_dispatch(struct rte_graph *graph) {
> > +	const rte_graph_off_t *cir_start = graph->cir_start;
> > +	const rte_node_t mask = graph->cir_mask;
> > +	uint32_t head = graph->head;
> > +	struct rte_node *node;
> 
> I think we should add a RTE_ASSERT here to make sure that the graph object is a
> cloned graph.
> 
Ok, I will add the RTE_ASSERT in the next version.

> > +
> > +	if (graph->wq != NULL)
> > +		__rte_graph_sched_wq_process(graph);
> > +
> > +	while (likely(head != graph->tail)) {
> > +		node = (struct rte_node *)RTE_PTR_ADD(graph,
> > cir_start[(int32_t)head++]);
> > +
> > +		/* skip the src nodes which not bind with current worker */
> > +		if ((int32_t)head < 0 && node->lcore_id != graph->lcore_id)
> > +			continue;
> > +
> > +		/* Schedule the node until all task/objs are done */
> > +		if (node->lcore_id != RTE_MAX_LCORE &&
> > +		    graph->lcore_id != node->lcore_id && graph->rq != NULL
> > &&
> > +		    __rte_graph_sched_node_enqueue(node, graph->rq))
> > +			continue;
> > +
> > +		__rte_node_process(graph, node);
> > +
> > +		head = likely((int32_t)head > 0) ? head & mask : head;
> > +	}
> > +
> > +	graph->tail = 0;
> > +}
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > --
> > 2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [EXT] [PATCH v5 09/15] graph: introduce stream moving cross cores
  2023-04-27 14:52           ` [EXT] " Pavan Nikhilesh Bhagavatula
@ 2023-05-05  2:10             ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-05-05  2:10 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, dev, Jerin Jacob Kollanukkaran,
	Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram, stephen
  Cc: Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> Sent: Thursday, April 27, 2023 10:53 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>; dev@dpdk.org; Jerin Jacob
> Kollanukkaran <jerinj@marvell.com>; Kiran Kumar Kokkilagadda
> <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; stephen@networkplumber.org
> Cc: Liang, Cunming <cunming.liang@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: RE: [EXT] [PATCH v5 09/15] graph: introduce stream moving cross cores
> 
> > This patch introduces the key functions that allow a worker thread to
> > enqueue and move streams of objects to the next nodes over different
> > cores.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  lib/graph/graph_private.h            |  27 +++++
> >  lib/graph/meson.build                |   2 +-
> >  lib/graph/rte_graph_model_dispatch.c | 145
> > +++++++++++++++++++++++++++
> >  lib/graph/rte_graph_model_dispatch.h |  37 +++++++
> >  lib/graph/version.map                |   2 +
> >  5 files changed, 212 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
> > index b66b18ebbc..e1a2a4bfd8 100644
> > --- a/lib/graph/graph_private.h
> > +++ b/lib/graph/graph_private.h
> > @@ -366,4 +366,31 @@ void graph_dump(FILE *f, struct graph *g);
> >   */
> >  void node_dump(FILE *f, struct node *n);
> >
> > +/**
> > + * @internal
> > + *
> > + * Create the graph schedule work queue. And all cloned graphs
> > +attached to
> > the
> > + * parent graph MUST be destroyed together for fast schedule design
> > limitation.
> > + *
> > + * @param _graph
> > + *   The graph object
> > + * @param _parent_graph
> > + *   The parent graph object which holds the run-queue head.
> > + *
> > + * @return
> > + *   - 0: Success.
> > + *   - <0: Graph schedule work queue related error.
> > + */
> > +int graph_sched_wq_create(struct graph *_graph, struct graph
> > *_parent_graph);
> > +
> > +/**
> > + * @internal
> > + *
> > + * Destroy the graph schedule work queue.
> > + *
> > + * @param _graph
> > + *   The graph object
> > + */
> > +void graph_sched_wq_destroy(struct graph *_graph);
> > +
> >  #endif /* _RTE_GRAPH_PRIVATE_H_ */
> > diff --git a/lib/graph/meson.build b/lib/graph/meson.build index
> > c729d984b6..e21affa280 100644
> > --- a/lib/graph/meson.build
> > +++ b/lib/graph/meson.build
> > @@ -20,4 +20,4 @@ sources = files(
> >  )
> >  headers = files('rte_graph.h', 'rte_graph_worker.h')
> >
> > -deps += ['eal', 'pcapng']
> > +deps += ['eal', 'pcapng', 'mempool', 'ring']
> > diff --git a/lib/graph/rte_graph_model_dispatch.c
> > b/lib/graph/rte_graph_model_dispatch.c
> > index 4a2f99496d..a300fefb85 100644
> > --- a/lib/graph/rte_graph_model_dispatch.c
> > +++ b/lib/graph/rte_graph_model_dispatch.c
> > @@ -5,6 +5,151 @@
> >  #include "graph_private.h"
> >  #include "rte_graph_model_dispatch.h"
> >
> > +int
> > +graph_sched_wq_create(struct graph *_graph, struct graph
> > *_parent_graph)
> > +{
> > +	struct rte_graph *parent_graph = _parent_graph->graph;
> > +	struct rte_graph *graph = _graph->graph;
> > +	unsigned int wq_size;
> > +
> > +	wq_size = GRAPH_SCHED_WQ_SIZE(graph->nb_nodes);
> > +	wq_size = rte_align32pow2(wq_size + 1);
> 
> Hi Zhirun,
> 
> We should introduce a new function `rte_graph_configure` which can help
> application to control the ring size and mempool size of the work queue?
> We could fallback to default values if nothing is configured.
> 
> rte_graph_configure should take a
> struct rte_graph_config {
> 	struct {
> 		u64 rsvd[8];
> 	} rtc;
> 	struct {
> 		u16 wq_size;
> 		...
> 	} dispatch;
> };
> 
> This will help future graph models to have their own configuration.
> 
> We can have a rte_graph_config_init() function to initialize the rte_graph_config
> structure.
> 

Hi Pavan,

Thanks for your comments. I agree with you. It would be more friendly for users/developers.
The ring and mempool sizes have some limitations (they must be a power of 2), so
I prefer to use u16 wq_size_max and u32 mp_size_max for users who have limited resources.
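For reference, the power-of-two rounding that drives this sizing can be sketched as follows (a standalone re-implementation of the bit-smearing trick used by rte_align32pow2(), shown here for illustration only). Note also that a default rte_ring stores one fewer entry than its size unless RING_F_EXACT_SZ is set, which may be why graph_sched_wq_create() aligns wq_size + 1 rather than wq_size.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next power of two (returns v unchanged if v is
 * already a power of two). Smearing the high bit into all lower
 * positions turns v-1 into a mask of ones; adding 1 yields the
 * next power of two. */
static inline uint32_t
align32pow2(uint32_t v)
{
	v--;
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	return v + 1;
}
```

With the WQ sizing above (GRAPH_SCHED_WQ_SIZE_MULTIPLIER of 8), a graph with 10 nodes gives wq_size = 80, and aligning 80 + 1 yields a 128-slot ring.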

> 
> > +
> > +	graph->wq = rte_ring_create(graph->name, wq_size, graph->socket,
> > +				    RING_F_SC_DEQ);
> > +	if (graph->wq == NULL)
> > +		SET_ERR_JMP(EIO, fail, "Failed to allocate graph WQ");
> > +
> > +	graph->mp = rte_mempool_create(graph->name, wq_size,
> > +				       sizeof(struct graph_sched_wq_node),
> > +				       0, 0, NULL, NULL, NULL, NULL,
> > +				       graph->socket, MEMPOOL_F_SP_PUT);
> > +	if (graph->mp == NULL)
> > +		SET_ERR_JMP(EIO, fail_mp,
> > +			    "Failed to allocate graph WQ schedule entry");
> > +
> > +	graph->lcore_id = _graph->lcore_id;
> > +
> > +	if (parent_graph->rq == NULL) {
> > +		parent_graph->rq = &parent_graph->rq_head;
> > +		SLIST_INIT(parent_graph->rq);
> > +	}
> > +
> > +	graph->rq = parent_graph->rq;
> > +	SLIST_INSERT_HEAD(graph->rq, graph, rq_next);
> > +
> > +	return 0;
> > +
> > +fail_mp:
> > +	rte_ring_free(graph->wq);
> > +	graph->wq = NULL;
> > +fail:
> > +	return -rte_errno;
> > +}
> > +
> > +void
> > +graph_sched_wq_destroy(struct graph *_graph) {
> > +	struct rte_graph *graph = _graph->graph;
> > +
> > +	if (graph == NULL)
> > +		return;
> > +
> > +	rte_ring_free(graph->wq);
> > +	graph->wq = NULL;
> > +
> > +	rte_mempool_free(graph->mp);
> > +	graph->mp = NULL;
> > +}
> > +
> > +static __rte_always_inline bool
> > +__graph_sched_node_enqueue(struct rte_node *node, struct rte_graph
> > *graph)
> > +{
> > +	struct graph_sched_wq_node *wq_node;
> > +	uint16_t off = 0;
> > +	uint16_t size;
> > +
> > +submit_again:
> > +	if (rte_mempool_get(graph->mp, (void **)&wq_node) < 0)
> > +		goto fallback;
> > +
> > +	size = RTE_MIN(node->idx, RTE_DIM(wq_node->objs));
> > +	wq_node->node_off = node->off;
> > +	wq_node->nb_objs = size;
> > +	rte_memcpy(wq_node->objs, &node->objs[off], size * sizeof(void
> > *));
> > +
> > +	while (rte_ring_mp_enqueue_bulk_elem(graph->wq, (void
> > *)&wq_node,
> > +					  sizeof(wq_node), 1, NULL) == 0)
> > +		rte_pause();
> > +
> > +	off += size;
> > +	node->idx -= size;
> > +	if (node->idx > 0)
> > +		goto submit_again;
> > +
> > +	return true;
> > +
> > +fallback:
> > +	if (off != 0)
> > +		memmove(&node->objs[0], &node->objs[off],
> > +			node->idx * sizeof(void *));
> > +
> > +	return false;
> > +}
> > +
> > +bool __rte_noinline
> > +__rte_graph_sched_node_enqueue(struct rte_node *node,
> > +			       struct rte_graph_rq_head *rq) {
> > +	const unsigned int lcore_id = node->lcore_id;
> > +	struct rte_graph *graph;
> > +
> > +	SLIST_FOREACH(graph, rq, rq_next)
> > +		if (graph->lcore_id == lcore_id)
> > +			break;
> > +
> > +	return graph != NULL ? __graph_sched_node_enqueue(node,
> > graph) : false;
> > +}
> > +
> > +void
> > +__rte_graph_sched_wq_process(struct rte_graph *graph) {
> > +	struct graph_sched_wq_node *wq_node;
> > +	struct rte_mempool *mp = graph->mp;
> > +	struct rte_ring *wq = graph->wq;
> > +	uint16_t idx, free_space;
> > +	struct rte_node *node;
> > +	unsigned int i, n;
> > +	struct graph_sched_wq_node *wq_nodes[32];
> > +
> > +	n = rte_ring_sc_dequeue_burst_elem(wq, wq_nodes,
> > sizeof(wq_nodes[0]),
> > +					   RTE_DIM(wq_nodes), NULL);
> > +	if (n == 0)
> > +		return;
> > +
> > +	for (i = 0; i < n; i++) {
> > +		wq_node = wq_nodes[i];
> > +		node = RTE_PTR_ADD(graph, wq_node->node_off);
> > +		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> > +		idx = node->idx;
> > +		free_space = node->size - idx;
> > +
> > +		if (unlikely(free_space < wq_node->nb_objs))
> > +			__rte_node_stream_alloc_size(graph, node, node-
> > >size + wq_node->nb_objs);
> > +
> > +		memmove(&node->objs[idx], wq_node->objs, wq_node-
> > >nb_objs * sizeof(void *));
> > +		memset(wq_node->objs, 0, wq_node->nb_objs *
> > sizeof(void *));
> 
> Memset should be avoided in fastpath for better performance as we anyway set
> wq_node->nb_objs as 0.
> 
> > +		node->idx = idx + wq_node->nb_objs;
> > +
> > +		__rte_node_process(graph, node);
> > +
> > +		wq_node->nb_objs = 0;
> > +		node->idx = 0;
> > +	}
> > +
> > +	rte_mempool_put_bulk(mp, (void **)wq_nodes, n); }
> > +
> >  int
> >  rte_graph_model_dispatch_lcore_affinity_set(const char *name,
> > unsigned int lcore_id)  { diff --git
> > a/lib/graph/rte_graph_model_dispatch.h
> > b/lib/graph/rte_graph_model_dispatch.h
> > index 179624e972..18fa7ce0ab 100644
> > --- a/lib/graph/rte_graph_model_dispatch.h
> > +++ b/lib/graph/rte_graph_model_dispatch.h
> > @@ -14,12 +14,49 @@
> >   *
> >   * This API allows to set core affinity with the node.
> >   */
> > +#include <rte_errno.h>
> > +#include <rte_mempool.h>
> > +#include <rte_memzone.h>
> > +#include <rte_ring.h>
> > +
> >  #include "rte_graph_worker_common.h"
> >
> >  #ifdef __cplusplus
> >  extern "C" {
> >  #endif
> >
> > +#define GRAPH_SCHED_WQ_SIZE_MULTIPLIER  8
> > +#define GRAPH_SCHED_WQ_SIZE(nb_nodes)   \
> > +	((typeof(nb_nodes))((nb_nodes) *
> > GRAPH_SCHED_WQ_SIZE_MULTIPLIER))
> > +
> > +/**
> > + * @internal
> > + *
> > + * Schedule the node to the right graph's work queue.
> > + *
> > + * @param node
> > + *   Pointer to the scheduled node object.
> > + * @param rq
> > + *   Pointer to the scheduled run-queue for all graphs.
> > + *
> > + * @return
> > + *   True on success, false otherwise.
> > + */
> > +__rte_experimental
> > +bool __rte_noinline __rte_graph_sched_node_enqueue(struct rte_node
> > *node,
> > +				    struct rte_graph_rq_head *rq);
> > +
> > +/**
> > + * @internal
> > + *
> > + * Process all nodes (streams) in the graph's work queue.
> > + *
> > + * @param graph
> > + *   Pointer to the graph object.
> > + */
> > +__rte_experimental
> > +void __rte_graph_sched_wq_process(struct rte_graph *graph);
> > +
> >  /**
> >   * Set lcore affinity with the node.
> >   *
> > diff --git a/lib/graph/version.map b/lib/graph/version.map index
> > aaa86f66ed..d511133f39 100644
> > --- a/lib/graph/version.map
> > +++ b/lib/graph/version.map
> > @@ -48,6 +48,8 @@ EXPERIMENTAL {
> >
> >  	rte_graph_worker_model_set;
> >  	rte_graph_worker_model_get;
> > +	__rte_graph_sched_wq_process;
> > +	__rte_graph_sched_node_enqueue;
> >
> >  	rte_graph_model_dispatch_lcore_affinity_set;
> >
> > --
> > 2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* RE: [EXT] [PATCH v5 03/15] graph: move node process into inline function
  2023-04-27 15:03           ` [EXT] " Pavan Nikhilesh Bhagavatula
@ 2023-05-05  2:10             ` Yan, Zhirun
  0 siblings, 0 replies; 369+ messages in thread
From: Yan, Zhirun @ 2023-05-05  2:10 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, dev, Jerin Jacob Kollanukkaran,
	Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram, stephen
  Cc: Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> Sent: Thursday, April 27, 2023 11:03 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>; dev@dpdk.org; Jerin Jacob
> Kollanukkaran <jerinj@marvell.com>; Kiran Kumar Kokkilagadda
> <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; stephen@networkplumber.org
> Cc: Liang, Cunming <cunming.liang@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: RE: [EXT] [PATCH v5 03/15] graph: move node process into inline
> function
> 
> > Node process is a single and reusable block, move the code into an
> > inline function.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  lib/graph/rte_graph_model_rtc.h     | 20 ++---------------
> >  lib/graph/rte_graph_worker_common.h | 33
> > +++++++++++++++++++++++++++++
> >  2 files changed, 35 insertions(+), 18 deletions(-)
> >
> > diff --git a/lib/graph/rte_graph_model_rtc.h
> > b/lib/graph/rte_graph_model_rtc.h index 665560f831..0dcb7151e9 100644
> > --- a/lib/graph/rte_graph_model_rtc.h
> > +++ b/lib/graph/rte_graph_model_rtc.h
> > @@ -20,9 +20,6 @@ rte_graph_walk_rtc(struct rte_graph *graph)
> >  	const rte_node_t mask = graph->cir_mask;
> >  	uint32_t head = graph->head;
> >  	struct rte_node *node;
> > -	uint64_t start;
> > -	uint16_t rc;
> > -	void **objs;
> >
> >  	/*
> >  	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and
> > then @@ -41,21 +38,8 @@ rte_graph_walk_rtc(struct rte_graph *graph)
> >  	 */
> >  	while (likely(head != graph->tail)) {
> >  		node = (struct rte_node *)RTE_PTR_ADD(graph,
> > cir_start[(int32_t)head++]);
> > -		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> > -		objs = node->objs;
> > -		rte_prefetch0(objs);
> > -
> > -		if (rte_graph_has_stats_feature()) {
> > -			start = rte_rdtsc();
> 
> Since we are refactoring this function could you change rte_rdtsc() to
> rte_rdtsc_precise().

Sure, I will do in next version.
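For context, a rough sketch of the stats wrapper this thread refers to;
`read_cycles()` stands in for `rte_rdtsc_precise()` (which adds a serializing
barrier so the timestamps cannot be reordered around the `process()` call),
and all names here are illustrative, not the DPDK API:

```c
#include <assert.h>
#include <stdint.h>

/* Fake cycle counter so the sketch is self-contained. */
static uint64_t fake_clock;

static uint64_t
read_cycles(void)
{
	return fake_clock;
}

struct node_stats {
	uint64_t total_cycles;
	uint64_t total_calls;
	uint64_t total_objs;
};

/* Shape of __rte_node_process(): time the node callback and
 * accumulate per-node stats only when stats are enabled. */
static void
process_with_stats(struct node_stats *s, uint16_t (*process)(void),
		   int stats_on)
{
	if (stats_on) {
		uint64_t start = read_cycles();
		uint16_t rc = process();
		s->total_cycles += read_cycles() - start;
		s->total_calls++;
		s->total_objs += rc;
	} else {
		process();
	}
}

/* Demo callback: pretend the node burned 10 cycles handling 4 objects. */
static uint16_t
demo_process(void)
{
	fake_clock += 10;
	return 4;
}
```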

> 
> > -			rc = node->process(graph, node, objs, node->idx);
> > -			node->total_cycles += rte_rdtsc() - start;
> > -			node->total_calls++;
> > -			node->total_objs += rc;
> > -		} else {
> > -			node->process(graph, node, objs, node->idx);
> > -		}
> > -			node->idx = 0;
> > -			head = likely((int32_t)head > 0) ? head & mask :
> > head;
> > +		__rte_node_process(graph, node);
> > +		head = likely((int32_t)head > 0) ? head & mask : head;
> >  	}
> >  	graph->tail = 0;
> >  }
> > diff --git a/lib/graph/rte_graph_worker_common.h
> > b/lib/graph/rte_graph_worker_common.h
> > index b58f8f6947..41428974db 100644
> > --- a/lib/graph/rte_graph_worker_common.h
> > +++ b/lib/graph/rte_graph_worker_common.h
> > @@ -130,6 +130,39 @@ void __rte_node_stream_alloc_size(struct
> > rte_graph *graph,
> >
> >  /* Fast path helper functions */
> >
> > +/**
> > + * @internal
> > + *
> > + * Enqueue a given node to the tail of the graph reel.
> > + *
> > + * @param graph
> > + *   Pointer Graph object.
> > + * @param node
> > + *   Pointer to node object to be enqueued.
> > + */
> > +static __rte_always_inline void
> > +__rte_node_process(struct rte_graph *graph, struct rte_node *node) {
> > +	uint64_t start;
> > +	uint16_t rc;
> > +	void **objs;
> > +
> > +	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> > +	objs = node->objs;
> > +	rte_prefetch0(objs);
> > +
> > +	if (rte_graph_has_stats_feature()) {
> > +		start = rte_rdtsc();
> > +		rc = node->process(graph, node, objs, node->idx);
> > +		node->total_cycles += rte_rdtsc() - start;
> > +		node->total_calls++;
> > +		node->total_objs += rc;
> > +	} else {
> > +		node->process(graph, node, objs, node->idx);
> > +	}
> > +	node->idx = 0;
> > +}
> > +
> >  /**
> >   * @internal
> >   *
> > --
> > 2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 00/15] graph enhancement for multi-core dispatch
  2023-03-31  4:02       ` [PATCH v5 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                           ` (14 preceding siblings ...)
  2023-03-31  4:03         ` [PATCH v5 15/15] doc: update multicore dispatch model in graph guides Zhirun Yan
@ 2023-05-09  6:03         ` Zhirun Yan
  2023-05-09  6:03           ` [PATCH v6 01/15] graph: rename rte_graph_work as common Zhirun Yan
                             ` (15 more replies)
  15 siblings, 16 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

V6:
Change rte_rdtsc() to rte_rdtsc_precise().
Add union in rte_graph_param to configure models.
Remove memset in fastpath, add RTE_ASSERT for cloned graph.
Update copyright in patch 02.
Update l3fwd-graph node affinity, start from rx core successively.

V5:
Fix CI build issues about dynamically update doc.

V4:
Fix CI build issues about undefined reference of sched apis.
Remove inline for model setting.

V3:
Fix CI build issues about TLS and typo.

V2:
Use git mv to keep git history.
Use TLS for per-thread local storage.
Change model name to mcore dispatch.
Change API with specific mode name.
Split big patch.
Fix CI issues.
Rebase l3fwd-graph example.
Update doc and maintainers files.


Currently, rte_graph supports the RTC (Run-To-Completion) model, where
each graph runs entirely within a single core.
RTC is one of the typical packet processing models. Others, such as
Pipeline or Hybrid, lack support.

The patch set introduces a 'multicore dispatch' model selection, which
is a self-reacting scheme according to the core affinity.
The new model enables a cross-core dispatching mechanism that employs a
scheduling work queue to dispatch streams to the worker cores associated
with the destination node. When the core flavor of the destination node
is the default 'current', the stream continues to be executed as normal.

Example:
3-node graph targets 3-core budget

RTC:
Graph: node-0 -> node-1 -> node-2 @Core0.

+ - - - - - - - - - - - - - - - - - - - - - +
'                Core #0/1/2                '
'                                           '
' +--------+     +---------+     +--------+ '
' | Node-0 | --> | Node-1  | --> | Node-2 | '
' +--------+     +---------+     +--------+ '
'                                           '
+ - - - - - - - - - - - - - - - - - - - - - +

Dispatch:

Graph topo: node-0 -> Core1; node-1 -> node-2; node-2 -> node-3.
Config graph: node-0 @Core0; node-1/3 @Core1; node-2 @Core2.

.. code-block:: diff

    + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
    '  Core #0   '     '          Core #1         '     '  Core #2   '
    '            '     '                          '     '            '
    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
    ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
    '            '     '     |                    '     '      ^     '
    + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
                             |                                 |
                             + - - - - - - - - - - - - - - - - +
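
The dispatch decision sketched in the diagram can be boiled down to the
following toy model; the struct and function names are illustrative only, not
the DPDK API (the real walk enqueues to the work-queue ring of the graph clone
bound to the node's lcore):

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CORES 4

/* Toy node: a core affinity plus a pending stream size. */
struct node {
	unsigned int lcore_id;
	int objs;
};

static int wq_depth[MAX_CORES];  /* stand-in for per-clone WQ rings */
static int processed[MAX_CORES]; /* objects run to completion per core */

/* A node bound to the current lcore runs inline (RTC-style);
 * otherwise its stream is dispatched to the owning core's work queue. */
static void
walk_node(struct node *n, unsigned int cur_lcore)
{
	if (n->lcore_id == cur_lcore)
		processed[cur_lcore] += n->objs;
	else
		wq_depth[n->lcore_id] += n->objs;
	n->objs = 0;
}
```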


The patch set is broken down as below:

1. Split graph worker into common and default model part.
2. Inline graph node processing to make it reusable.
3. Add set/get APIs to choose worker model.
4. Introduce core affinity API to make a node run on a specific worker core.
  (only used in the new model)
5. Introduce graph affinity API to bind one graph with specific worker
  core.
6. Introduce graph clone API.
7. Introduce stream moving with scheduler work-queue in patch 8~12.
8. Add stats for new models.
9. Abstract default graph config process and integrate new model into
  example/l3fwd-graph. Add new parameters for model choosing.

We can run with the new worker model like this:
./dpdk-l3fwd-graph -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="dispatch"

References:
https://static.sched.com/hosted_files/dpdkuserspace22/a6/graph%20introduce%20remote%20dispatch%20for%20mult-core%20scaling.pdf



Zhirun Yan (15):
  graph: rename rte_graph_work as common
  graph: split graph worker into common and default model
  graph: move node process into inline function
  graph: add get/set graph worker model APIs
  graph: introduce graph node core affinity API
  graph: introduce graph bind unbind API
  graph: introduce graph clone API for other worker core
  graph: add struct for stream moving between cores
  graph: introduce stream moving cross cores
  graph: enable create and destroy graph scheduling workqueue
  graph: introduce graph walk by cross-core dispatch
  graph: enable graph multicore dispatch scheduler model
  graph: add stats for cross-core dispatching
  examples/l3fwd-graph: introduce multicore dispatch worker model
  doc: update multicore dispatch model in graph guides

 MAINTAINERS                          |   1 +
 doc/guides/prog_guide/graph_lib.rst  |  59 ++-
 examples/l3fwd-graph/main.c          | 236 +++++++++---
 lib/graph/graph.c                    | 179 +++++++++
 lib/graph/graph_debug.c              |   6 +
 lib/graph/graph_populate.c           |   1 +
 lib/graph/graph_private.h            |  47 +++
 lib/graph/graph_stats.c              |  74 +++-
 lib/graph/meson.build                |   4 +-
 lib/graph/node.c                     |   1 +
 lib/graph/rte_graph.h                |  57 +++
 lib/graph/rte_graph_model_dispatch.c | 190 ++++++++++
 lib/graph/rte_graph_model_dispatch.h | 123 ++++++
 lib/graph/rte_graph_model_rtc.h      |  46 +++
 lib/graph/rte_graph_worker.c         |  54 +++
 lib/graph/rte_graph_worker.h         | 497 +-----------------------
 lib/graph/rte_graph_worker_common.h  | 539 +++++++++++++++++++++++++++
 lib/graph/version.map                |  10 +
 18 files changed, 1581 insertions(+), 543 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_dispatch.c
 create mode 100644 lib/graph/rte_graph_model_dispatch.h
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker.c
 create mode 100644 lib/graph/rte_graph_worker_common.h

-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 01/15] graph: rename rte_graph_work as common
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-22  8:25             ` Jerin Jacob
  2023-05-09  6:03           ` [PATCH v6 02/15] graph: split graph worker into common and default model Zhirun Yan
                             ` (14 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Rename rte_graph_worker.h to rte_graph_worker_common.h to support
multiple graph worker models.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 MAINTAINERS                                                 | 1 +
 lib/graph/graph_pcap.c                                      | 2 +-
 lib/graph/graph_private.h                                   | 2 +-
 lib/graph/meson.build                                       | 2 +-
 lib/graph/{rte_graph_worker.h => rte_graph_worker_common.h} | 6 +++---
 5 files changed, 7 insertions(+), 6 deletions(-)
 rename lib/graph/{rte_graph_worker.h => rte_graph_worker_common.h} (99%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8df23e5099..cc11328242 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1714,6 +1714,7 @@ F: doc/guides/prog_guide/bpf_lib.rst
 Graph - EXPERIMENTAL
 M: Jerin Jacob <jerinj@marvell.com>
 M: Kiran Kumar K <kirankumark@marvell.com>
+M: Zhirun Yan <zhirun.yan@intel.com>
 F: lib/graph/
 F: doc/guides/prog_guide/graph_lib.rst
 F: app/test/test_graph*
diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
index 6c43330029..8a220370fa 100644
--- a/lib/graph/graph_pcap.c
+++ b/lib/graph/graph_pcap.c
@@ -10,7 +10,7 @@
 #include <rte_mbuf.h>
 #include <rte_pcapng.h>
 
-#include "rte_graph_worker.h"
+#include "rte_graph_worker_common.h"
 
 #include "graph_pcap_private.h"
 
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index eacdef45f0..307e5f70bc 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -13,7 +13,7 @@
 #include <rte_spinlock.h>
 
 #include "rte_graph.h"
-#include "rte_graph_worker.h"
+#include "rte_graph_worker_common.h"
 
 extern int rte_graph_logtype;
 
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 3526d1b5d4..4e2b612ad3 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,6 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
 )
-headers = files('rte_graph.h', 'rte_graph_worker.h')
+headers = files('rte_graph.h', 'rte_graph_worker_common.h')
 
 deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker_common.h
similarity index 99%
rename from lib/graph/rte_graph_worker.h
rename to lib/graph/rte_graph_worker_common.h
index 438595b15c..0bad2938f3 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -2,8 +2,8 @@
  * Copyright(C) 2020 Marvell International Ltd.
  */
 
-#ifndef _RTE_GRAPH_WORKER_H_
-#define _RTE_GRAPH_WORKER_H_
+#ifndef _RTE_GRAPH_WORKER_COMMON_H_
+#define _RTE_GRAPH_WORKER_COMMON_H_
 
 /**
  * @file rte_graph_worker.h
@@ -518,4 +518,4 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
 }
 #endif
 
-#endif /* _RTE_GRAPH_WORKER_H_ */
+#endif /* _RTE_GRAPH_WORKER_COMMON_H_ */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 02/15] graph: split graph worker into common and default model
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  2023-05-09  6:03           ` [PATCH v6 01/15] graph: rename rte_graph_work as common Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-09  6:03           ` [PATCH v6 03/15] graph: move node process into inline function Zhirun Yan
                             ` (13 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

To support multiple graph worker models, split the graph worker into a
common part and a default model. The current walk function becomes
rte_graph_walk_rtc, since the default model is RTC (Run-To-Completion).

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_pcap.c              |  2 +-
 lib/graph/graph_private.h           |  2 +-
 lib/graph/meson.build               |  2 +-
 lib/graph/rte_graph_model_rtc.h     | 62 +++++++++++++++++++++++++++++
 lib/graph/rte_graph_worker.h        | 35 ++++++++++++++++
 lib/graph/rte_graph_worker_common.h | 57 --------------------------
 6 files changed, 100 insertions(+), 60 deletions(-)
 create mode 100644 lib/graph/rte_graph_model_rtc.h
 create mode 100644 lib/graph/rte_graph_worker.h

diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
index 8a220370fa..6c43330029 100644
--- a/lib/graph/graph_pcap.c
+++ b/lib/graph/graph_pcap.c
@@ -10,7 +10,7 @@
 #include <rte_mbuf.h>
 #include <rte_pcapng.h>
 
-#include "rte_graph_worker_common.h"
+#include "rte_graph_worker.h"
 
 #include "graph_pcap_private.h"
 
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 307e5f70bc..eacdef45f0 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -13,7 +13,7 @@
 #include <rte_spinlock.h>
 
 #include "rte_graph.h"
-#include "rte_graph_worker_common.h"
+#include "rte_graph_worker.h"
 
 extern int rte_graph_logtype;
 
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 4e2b612ad3..3526d1b5d4 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,6 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
 )
-headers = files('rte_graph.h', 'rte_graph_worker_common.h')
+headers = files('rte_graph.h', 'rte_graph_worker.h')
 
 deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
new file mode 100644
index 0000000000..10b359772f
--- /dev/null
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2020 Marvell International Ltd.
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "rte_graph_worker_common.h"
+
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+static inline void
+rte_graph_walk_rtc(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	/*
+	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
+	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
+	 * in a circular buffer fashion.
+	 *
+	 *	+-----+ <= cir_start - head [number of source nodes]
+	 *	|     |
+	 *	| ... | <= source nodes
+	 *	|     |
+	 *	+-----+ <= cir_start [head = 0] [tail = 0]
+	 *	|     |
+	 *	| ... | <= pending streams
+	 *	|     |
+	 *	+-----+ <= cir_start + mask
+	 */
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		objs = node->objs;
+		rte_prefetch0(objs);
+
+		if (rte_graph_has_stats_feature()) {
+			start = rte_rdtsc();
+			rc = node->process(graph, node, objs, node->idx);
+			node->total_cycles += rte_rdtsc() - start;
+			node->total_calls++;
+			node->total_objs += rc;
+		} else {
+			node->process(graph, node, objs, node->idx);
+		}
+			node->idx = 0;
+			head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+	graph->tail = 0;
+}
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
new file mode 100644
index 0000000000..5b58f7bda9
--- /dev/null
+++ b/lib/graph/rte_graph_worker.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2020 Marvell International Ltd.
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#ifndef _RTE_GRAPH_WORKER_H_
+#define _RTE_GRAPH_WORKER_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "rte_graph_model_rtc.h"
+
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk(struct rte_graph *graph)
+{
+	rte_graph_walk_rtc(graph);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_WORKER_H_ */
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 0bad2938f3..b58f8f6947 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -128,63 +128,6 @@ __rte_experimental
 void __rte_node_stream_alloc_size(struct rte_graph *graph,
 				  struct rte_node *node, uint16_t req_size);
 
-/**
- * Perform graph walk on the circular buffer and invoke the process function
- * of the nodes and collect the stats.
- *
- * @param graph
- *   Graph pointer returned from rte_graph_lookup function.
- *
- * @see rte_graph_lookup()
- */
-__rte_experimental
-static inline void
-rte_graph_walk(struct rte_graph *graph)
-{
-	const rte_graph_off_t *cir_start = graph->cir_start;
-	const rte_node_t mask = graph->cir_mask;
-	uint32_t head = graph->head;
-	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
-
-	/*
-	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
-	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
-	 * in a circular buffer fashion.
-	 *
-	 *	+-----+ <= cir_start - head [number of source nodes]
-	 *	|     |
-	 *	| ... | <= source nodes
-	 *	|     |
-	 *	+-----+ <= cir_start [head = 0] [tail = 0]
-	 *	|     |
-	 *	| ... | <= pending streams
-	 *	|     |
-	 *	+-----+ <= cir_start + mask
-	 */
-	while (likely(head != graph->tail)) {
-		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-		node->idx = 0;
-		head = likely((int32_t)head > 0) ? head & mask : head;
-	}
-	graph->tail = 0;
-}
-
 /* Fast path helper functions */
 
 /**
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 03/15] graph: move node process into inline function
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  2023-05-09  6:03           ` [PATCH v6 01/15] graph: rename rte_graph_work as common Zhirun Yan
  2023-05-09  6:03           ` [PATCH v6 02/15] graph: split graph worker into common and default model Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-09  6:03           ` [PATCH v6 04/15] graph: add get/set graph worker model APIs Zhirun Yan
                             ` (12 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Node processing is a single, reusable block; move the code into an inline
function.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_rtc.h     | 20 ++---------------
 lib/graph/rte_graph_worker_common.h | 33 +++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
index 10b359772f..4b6236e301 100644
--- a/lib/graph/rte_graph_model_rtc.h
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -21,9 +21,6 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	const rte_node_t mask = graph->cir_mask;
 	uint32_t head = graph->head;
 	struct rte_node *node;
-	uint64_t start;
-	uint16_t rc;
-	void **objs;
 
 	/*
 	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
@@ -42,21 +39,8 @@ rte_graph_walk_rtc(struct rte_graph *graph)
 	 */
 	while (likely(head != graph->tail)) {
 		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
-		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
-		objs = node->objs;
-		rte_prefetch0(objs);
-
-		if (rte_graph_has_stats_feature()) {
-			start = rte_rdtsc();
-			rc = node->process(graph, node, objs, node->idx);
-			node->total_cycles += rte_rdtsc() - start;
-			node->total_calls++;
-			node->total_objs += rc;
-		} else {
-			node->process(graph, node, objs, node->idx);
-		}
-			node->idx = 0;
-			head = likely((int32_t)head > 0) ? head & mask : head;
+		__rte_node_process(graph, node);
+		head = likely((int32_t)head > 0) ? head & mask : head;
 	}
 	graph->tail = 0;
 }
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index b58f8f6947..e25eabc81f 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -130,6 +130,39 @@ void __rte_node_stream_alloc_size(struct rte_graph *graph,
 
 /* Fast path helper functions */
 
+/**
+ * @internal
+ *
+ * Enqueue a given node to the tail of the graph reel.
+ *
+ * @param graph
+ *   Pointer Graph object.
+ * @param node
+ *   Pointer to node object to be enqueued.
+ */
+static __rte_always_inline void
+__rte_node_process(struct rte_graph *graph, struct rte_node *node)
+{
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+	objs = node->objs;
+	rte_prefetch0(objs);
+
+	if (rte_graph_has_stats_feature()) {
+		start = rte_rdtsc_precise();
+		rc = node->process(graph, node, objs, node->idx);
+		node->total_cycles += rte_rdtsc_precise() - start;
+		node->total_calls++;
+		node->total_objs += rc;
+	} else {
+		node->process(graph, node, objs, node->idx);
+	}
+	node->idx = 0;
+}
+
 /**
  * @internal
  *
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 04/15] graph: add get/set graph worker model APIs
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (2 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 03/15] graph: move node process into inline function Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-24  6:08             ` Jerin Jacob
  2023-05-09  6:03           ` [PATCH v6 05/15] graph: introduce graph node core affinity API Zhirun Yan
                             ` (11 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add new get/set APIs to configure the graph worker model, which
determines how graphs are scheduled across worker cores.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/meson.build               |  1 +
 lib/graph/rte_graph_worker.c        | 54 +++++++++++++++++++++++++++++
 lib/graph/rte_graph_worker_common.h | 19 ++++++++++
 lib/graph/version.map               |  3 ++
 4 files changed, 77 insertions(+)
 create mode 100644 lib/graph/rte_graph_worker.c

diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 3526d1b5d4..9fab8243da 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -15,6 +15,7 @@ sources = files(
         'graph_stats.c',
         'graph_populate.c',
         'graph_pcap.c',
+        'rte_graph_worker.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
diff --git a/lib/graph/rte_graph_worker.c b/lib/graph/rte_graph_worker.c
new file mode 100644
index 0000000000..cabc101262
--- /dev/null
+++ b/lib/graph/rte_graph_worker.c
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "rte_graph_worker_common.h"
+
+RTE_DEFINE_PER_LCORE(enum rte_graph_worker_model, worker_model) = RTE_GRAPH_MODEL_DEFAULT;
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ * Set the graph worker model
+ *
+ * @note This function does not perform any locking and is only safe to call
+ *    before the graph starts running.
+ *
+ * @param model
+ *   The graph worker model to set.
+ *
+ * @return
+ *   0 on success, -1 otherwise.
+ */
+int
+rte_graph_worker_model_set(enum rte_graph_worker_model model)
+{
+	if (model >= RTE_GRAPH_MODEL_LIST_END)
+		goto fail;
+
+	RTE_PER_LCORE(worker_model) = model;
+	return 0;
+
+fail:
+	RTE_PER_LCORE(worker_model) = RTE_GRAPH_MODEL_DEFAULT;
+	return -1;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Get the graph worker model
+ *
+ * @note The model is stored per-lcore, so each worker
+ *   thread reads its own value.
+ *
+ * @return
+ *   Graph worker model on success.
+ */
+inline
+enum rte_graph_worker_model
+rte_graph_worker_model_get(void)
+{
+	return RTE_PER_LCORE(worker_model);
+}
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index e25eabc81f..9bde8856ae 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -19,6 +19,7 @@
 #include <rte_compat.h>
 #include <rte_common.h>
 #include <rte_cycles.h>
+#include <rte_per_lcore.h>
 #include <rte_prefetch.h>
 #include <rte_memcpy.h>
 #include <rte_memory.h>
@@ -95,6 +96,16 @@ struct rte_node {
 	struct rte_node *nodes[] __rte_cache_min_aligned; /**< Next nodes. */
 } __rte_cache_aligned;
 
+/** Graph worker models */
+enum rte_graph_worker_model {
+	RTE_GRAPH_MODEL_DEFAULT,
+	RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT,
+	RTE_GRAPH_MODEL_MCORE_DISPATCH,
+	RTE_GRAPH_MODEL_LIST_END
+};
+
+RTE_DECLARE_PER_LCORE(enum rte_graph_worker_model, worker_model);
+
 /**
  * @internal
  *
@@ -490,6 +501,14 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
 	}
 }
 
+__rte_experimental
+enum rte_graph_worker_model
+rte_graph_worker_model_get(void);
+
+__rte_experimental
+int
+rte_graph_worker_model_set(enum rte_graph_worker_model model);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 13b838752d..eea73ec9ca 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -43,5 +43,8 @@ EXPERIMENTAL {
 	rte_node_next_stream_put;
 	rte_node_next_stream_move;
 
+	rte_graph_worker_model_set;
+	rte_graph_worker_model_get;
+
 	local: *;
 };
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 05/15] graph: introduce graph node core affinity API
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (3 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 04/15] graph: add get/set graph worker model APIs Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-24  6:36             ` Jerin Jacob
  2023-05-09  6:03           ` [PATCH v6 06/15] graph: introduce graph bind unbind API Zhirun Yan
                             ` (10 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add an lcore_id field to struct node to hold the affinity core id, and
implement rte_graph_model_dispatch_lcore_affinity_set() to set a node's
affinity to a specific lcore.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_private.h            |  1 +
 lib/graph/meson.build                |  1 +
 lib/graph/node.c                     |  1 +
 lib/graph/rte_graph_model_dispatch.c | 30 +++++++++++++++++++
 lib/graph/rte_graph_model_dispatch.h | 43 ++++++++++++++++++++++++++++
 lib/graph/version.map                |  2 ++
 6 files changed, 78 insertions(+)
 create mode 100644 lib/graph/rte_graph_model_dispatch.c
 create mode 100644 lib/graph/rte_graph_model_dispatch.h

diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index eacdef45f0..bd4c576324 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -51,6 +51,7 @@ struct node {
 	STAILQ_ENTRY(node) next;      /**< Next node in the list. */
 	char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
 	uint64_t flags;		      /**< Node configuration flag. */
+	unsigned int lcore_id;        /**< Node runs on the Lcore ID */
 	rte_node_process_t process;   /**< Node process function. */
 	rte_node_init_t init;         /**< Node init function. */
 	rte_node_fini_t fini;	      /**< Node fini function. */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 9fab8243da..c729d984b6 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,7 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
         'rte_graph_worker.c',
+        'rte_graph_model_dispatch.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
diff --git a/lib/graph/node.c b/lib/graph/node.c
index 149414dcd9..339b4a0da5 100644
--- a/lib/graph/node.c
+++ b/lib/graph/node.c
@@ -100,6 +100,7 @@ __rte_node_register(const struct rte_node_register *reg)
 			goto free;
 	}
 
+	node->lcore_id = RTE_MAX_LCORE;
 	node->id = node_id++;
 
 	/* Add the node at tail */
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
new file mode 100644
index 0000000000..3364a76ed4
--- /dev/null
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#include "graph_private.h"
+#include "rte_graph_model_dispatch.h"
+
+int
+rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned int lcore_id)
+{
+	struct node *node;
+	int ret = -EINVAL;
+
+	if (lcore_id >= RTE_MAX_LCORE)
+		return ret;
+
+	graph_spinlock_lock();
+
+	STAILQ_FOREACH(node, node_list_head_get(), next) {
+		if (strncmp(node->name, name, RTE_NODE_NAMESIZE) == 0) {
+			node->lcore_id = lcore_id;
+			ret = 0;
+			break;
+		}
+	}
+
+	graph_spinlock_unlock();
+
+	return ret;
+}
diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
new file mode 100644
index 0000000000..179624e972
--- /dev/null
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Intel Corporation
+ */
+
+#ifndef _RTE_GRAPH_MODEL_DISPATCH_H_
+#define _RTE_GRAPH_MODEL_DISPATCH_H_
+
+/**
+ * @file rte_graph_model_dispatch.h
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * This API allows to set core affinity with the node.
+ */
+#include "rte_graph_worker_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Set lcore affinity with the node.
+ *
+ * @param name
+ *   Valid node name. In the case of the cloned node, the name will be
+ * "parent node name" + "-" + name.
+ * @param lcore_id
+ *   The lcore ID value.
+ *
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
+						unsigned int lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_MODEL_DISPATCH_H_ */
diff --git a/lib/graph/version.map b/lib/graph/version.map
index eea73ec9ca..1f090be74e 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -46,5 +46,7 @@ EXPERIMENTAL {
 	rte_graph_worker_model_set;
 	rte_graph_worker_model_get;
 
+	rte_graph_model_dispatch_lcore_affinity_set;
+
 	local: *;
 };
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 06/15] graph: introduce graph bind unbind API
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (4 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 05/15] graph: introduce graph node core affinity API Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-24  6:23             ` Jerin Jacob
  2023-05-09  6:03           ` [PATCH v6 07/15] graph: introduce graph clone API for other worker core Zhirun Yan
                             ` (9 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add an lcore_id field to the graph to hold the core id the graph would
run on, and add bind/unbind APIs to set/unset the graph affinity
attribute. lcore_id is set to RTE_MAX_LCORE by default, meaning the
attribute is disabled.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 59 +++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |  2 ++
 lib/graph/rte_graph.h     | 22 +++++++++++++++
 lib/graph/version.map     |  2 ++
 4 files changed, 85 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 5582631b53..b8ef86da45 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -260,6 +260,64 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
 	return graph_mem_fixup_node_ctx(graph);
 }
 
+static __rte_always_inline bool
+graph_src_node_avail(struct graph *graph)
+{
+	struct graph_node *graph_node;
+
+	STAILQ_FOREACH(graph_node, &graph->node_list, next)
+		if ((graph_node->node->flags & RTE_NODE_SOURCE_F) &&
+		    (graph_node->node->lcore_id == RTE_MAX_LCORE ||
+		     graph->lcore_id == graph_node->node->lcore_id))
+			return true;
+
+	return false;
+}
+
+int
+rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	if (!rte_lcore_is_enabled(lcore))
+		SET_ERR_JMP(ENOLINK, fail,
+			    "lcore %d not enabled\n",
+			    lcore);
+
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = lcore;
+	graph->socket = rte_lcore_to_socket_id(lcore);
+
+	/* check the availability of source node */
+	if (!graph_src_node_avail(graph))
+		graph->graph->head = 0;
+
+	return 0;
+
+fail:
+	return -rte_errno;
+}
+
+void
+rte_graph_model_dispatch_core_unbind(rte_graph_t id)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			break;
+
+	graph->lcore_id = RTE_MAX_LCORE;
+
+fail:
+	return;
+}
+
 struct rte_graph *
 rte_graph_lookup(const char *name)
 {
@@ -346,6 +404,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->lcore_id = RTE_MAX_LCORE;
 	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
 	if (prm->pcap_filename)
 		rte_strscpy(graph->pcap_filename, prm->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index bd4c576324..f63b339d81 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -99,6 +99,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	unsigned int lcore_id;
+	/**< Lcore identifier where the graph prefer to run on. */
 	size_t mem_sz;
 	/**< Memory size of the graph. */
 	int socket;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index c9a77297fc..c523809d1f 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -285,6 +285,28 @@ char *rte_graph_id_to_name(rte_graph_t id);
 __rte_experimental
 int rte_graph_export(const char *name, FILE *f);
 
+/**
+ * Bind graph with specific lcore
+ *
+ * @param id
+ *   Graph id to get the pointer of graph object
+ * @param lcore
+ *   The lcore the graph will run on
+ * @return
+ *   0 on success, error otherwise.
+ */
+__rte_experimental
+int rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore);
+
+/**
+ * Unbind graph with lcore
+ *
+ * @param id
+ *   Graph id to get the pointer of graph object
+ */
+__rte_experimental
+void rte_graph_model_dispatch_core_unbind(rte_graph_t id);
+
 /**
  * Get graph object from its name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 1f090be74e..7de6f08f59 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -18,6 +18,8 @@ EXPERIMENTAL {
 	rte_graph_node_get_by_name;
 	rte_graph_obj_dump;
 	rte_graph_walk;
+	rte_graph_model_dispatch_core_bind;
+	rte_graph_model_dispatch_core_unbind;
 
 	rte_graph_cluster_stats_create;
 	rte_graph_cluster_stats_destroy;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 07/15] graph: introduce graph clone API for other worker core
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (5 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 06/15] graph: introduce graph bind unbind API Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-24  7:14             ` Jerin Jacob
  2023-05-09  6:03           ` [PATCH v6 08/15] graph: add struct for stream moving between cores Zhirun Yan
                             ` (8 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch adds a graph API to clone a graph object for a specified
worker core. The new graph also clones all nodes of its parent.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 110 ++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |   2 +
 lib/graph/rte_graph.h     |  20 +++++++
 lib/graph/version.map     |   1 +
 4 files changed, 133 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index b8ef86da45..2629c79103 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -404,6 +404,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->parent_id = RTE_GRAPH_ID_INVALID;
 	graph->lcore_id = RTE_MAX_LCORE;
 	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
 	if (prm->pcap_filename)
@@ -468,6 +469,115 @@ rte_graph_destroy(rte_graph_t id)
 	return rc;
 }
 
+static int
+clone_name(struct graph *graph, struct graph *parent_graph, const char *name)
+{
+	ssize_t sz, rc;
+
+#define SZ RTE_GRAPH_NAMESIZE
+	rc = rte_strscpy(graph->name, parent_graph->name, SZ);
+	if (rc < 0)
+		goto fail;
+	sz = rc;
+	rc = rte_strscpy(graph->name + sz, "-", RTE_MAX((int16_t)(SZ - sz), 0));
+	if (rc < 0)
+		goto fail;
+	sz += rc;
+	sz = rte_strscpy(graph->name + sz, name, RTE_MAX((int16_t)(SZ - sz), 0));
+	if (sz < 0)
+		goto fail;
+
+	return 0;
+fail:
+	rte_errno = E2BIG;
+	return -rte_errno;
+}
+
+static rte_graph_t
+graph_clone(struct graph *parent_graph, const char *name)
+{
+	struct graph_node *graph_node;
+	struct graph *graph;
+
+	graph_spinlock_lock();
+
+	/* Don't allow to clone a node from a cloned graph */
+	if (parent_graph->parent_id != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, fail, "A cloned graph is not allowed to be cloned");
+
+	/* Create graph object */
+	graph = calloc(1, sizeof(*graph));
+	if (graph == NULL)
+		SET_ERR_JMP(ENOMEM, fail, "Failed to calloc cloned graph object");
+
+	/* Naming ceremony of the new graph. name is node->name + "-" + name */
+	if (clone_name(graph, parent_graph, name))
+		goto free;
+
+	/* Check for existence of duplicate graph */
+	if (rte_graph_from_name(graph->name) != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, free, "Found duplicate graph %s",
+			    graph->name);
+
+	/* Clone nodes from parent graph firstly */
+	STAILQ_INIT(&graph->node_list);
+	STAILQ_FOREACH(graph_node, &parent_graph->node_list, next) {
+		if (graph_node_add(graph, graph_node->node))
+			goto graph_cleanup;
+	}
+
+	/* Just update adjacency list of all nodes in the graph */
+	if (graph_adjacency_list_update(graph))
+		goto graph_cleanup;
+
+	/* Initialize the graph object */
+	graph->src_node_count = parent_graph->src_node_count;
+	graph->node_count = parent_graph->node_count;
+	graph->parent_id = parent_graph->id;
+	graph->lcore_id = parent_graph->lcore_id;
+	graph->socket = parent_graph->socket;
+	graph->id = graph_id;
+
+	/* Allocate the Graph fast path memory and populate the data */
+	if (graph_fp_mem_create(graph))
+		goto graph_cleanup;
+
+	/* Call init() of the all the nodes in the graph */
+	if (graph_node_init(graph))
+		goto graph_mem_destroy;
+
+	/* All good, Lets add the graph to the list */
+	graph_id++;
+	STAILQ_INSERT_TAIL(&graph_list, graph, next);
+
+	graph_spinlock_unlock();
+	return graph->id;
+
+graph_mem_destroy:
+	graph_fp_mem_destroy(graph);
+graph_cleanup:
+	graph_cleanup(graph);
+free:
+	free(graph);
+fail:
+	graph_spinlock_unlock();
+	return RTE_GRAPH_ID_INVALID;
+}
+
+rte_graph_t
+rte_graph_clone(rte_graph_t id, const char *name)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			return graph_clone(graph, name);
+
+fail:
+	return RTE_GRAPH_ID_INVALID;
+}
+
 rte_graph_t
 rte_graph_from_name(const char *name)
 {
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index f63b339d81..52ca30ed56 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -99,6 +99,8 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	rte_graph_t parent_id;
+	/**< Parent graph identifier. */
 	unsigned int lcore_id;
 	/**< Lcore identifier where the graph prefer to run on. */
 	size_t mem_sz;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index c523809d1f..2f86c17de7 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -247,6 +247,26 @@ rte_graph_t rte_graph_create(const char *name, struct rte_graph_param *prm);
 __rte_experimental
 int rte_graph_destroy(rte_graph_t id);
 
+/**
+ * Clone Graph.
+ *
+ * Clone a graph from a static graph (one created with rte_graph_create()).
+ * All cloned graphs attached to a parent graph MUST be destroyed together,
+ * due to a fast-schedule design limitation (stop ALL graph walks first).
+ *
+ * @param id
+ *   Static graph id to clone from.
+ * @param name
+ *   Name of the new graph. The library prepends the parent graph name to the
+ *   user-specified name; the final graph name will be
+ *   "parent graph name" + "-" + name.
+ *
+ * @return
+ *   Valid graph id on success, RTE_GRAPH_ID_INVALID otherwise.
+ */
+__rte_experimental
+rte_graph_t rte_graph_clone(rte_graph_t id, const char *name);
+
 /**
  * Get graph id from graph name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 7de6f08f59..aaa86f66ed 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -7,6 +7,7 @@ EXPERIMENTAL {
 
 	rte_graph_create;
 	rte_graph_destroy;
+	rte_graph_clone;
 	rte_graph_dump;
 	rte_graph_export;
 	rte_graph_from_name;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 08/15] graph: add struct for stream moving between cores
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (6 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 07/15] graph: introduce graph clone API for other worker core Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-24  7:24             ` Jerin Jacob
  2023-05-09  6:03           ` [PATCH v6 09/15] graph: introduce stream moving cross cores Zhirun Yan
                             ` (7 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add the graph_sched_wq_node structure to hold a stream carried by the
graph scheduling workqueue.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c                   |  1 +
 lib/graph/graph_populate.c          |  1 +
 lib/graph/graph_private.h           | 12 ++++++++++++
 lib/graph/rte_graph_worker_common.h | 21 +++++++++++++++++++++
 4 files changed, 35 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 2629c79103..e809aa55b0 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -290,6 +290,7 @@ rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore)
 			break;
 
 	graph->lcore_id = lcore;
+	graph->graph->lcore_id = graph->lcore_id;
 	graph->socket = rte_lcore_to_socket_id(lcore);
 
 	/* check the availability of source node */
diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
index 2c0844ce92..7dcf1420c1 100644
--- a/lib/graph/graph_populate.c
+++ b/lib/graph/graph_populate.c
@@ -89,6 +89,7 @@ graph_nodes_populate(struct graph *_graph)
 		}
 		node->id = graph_node->node->id;
 		node->parent_id = pid;
+		node->lcore_id = graph_node->node->lcore_id;
 		nb_edges = graph_node->node->nb_edges;
 		node->nb_edges = nb_edges;
 		off += sizeof(struct rte_node);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 52ca30ed56..02b10ea2b6 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -61,6 +61,18 @@ struct node {
 	char next_nodes[][RTE_NODE_NAMESIZE]; /**< Names of next nodes. */
 };
 
+/**
+ * @internal
+ *
+ * Structure that holds the graph scheduling workqueue node stream.
+ * Used for mcore dispatch model.
+ */
+struct graph_sched_wq_node {
+	rte_graph_off_t node_off;
+	uint16_t nb_objs;
+	void *objs[RTE_GRAPH_BURST_SIZE];
+} __rte_cache_aligned;
+
 /**
  * @internal
  *
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 9bde8856ae..8e968e2022 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -30,6 +30,13 @@
 extern "C" {
 #endif
 
+/**
+ * @internal
+ *
+ * Singly-linked list head for graph schedule run-queue.
+ */
+SLIST_HEAD(rte_graph_rq_head, rte_graph);
+
 /**
  * @internal
  *
@@ -41,6 +48,15 @@ struct rte_graph {
 	uint32_t cir_mask;	     /**< Circular buffer wrap around mask. */
 	rte_node_t nb_nodes;	     /**< Number of nodes in the graph. */
 	rte_graph_off_t *cir_start;  /**< Pointer to circular buffer. */
+	/* Graph schedule */
+	struct rte_graph_rq_head *rq __rte_cache_aligned; /* The run-queue */
+	struct rte_graph_rq_head rq_head; /* The head for run-queue list */
+
+	SLIST_ENTRY(rte_graph) rq_next;   /* The next for run-queue list */
+	unsigned int lcore_id;  /**< The graph running Lcore. */
+	struct rte_ring *wq;    /**< The work-queue for pending streams. */
+	struct rte_mempool *mp; /**< The mempool for scheduling streams. */
+	/* Graph schedule area */
 	rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */
 	rte_graph_t id;	/**< Graph identifier. */
 	int socket;	/**< Socket ID where memory is allocated. */
@@ -74,6 +90,11 @@ struct rte_node {
 	/** Original process function when pcap is enabled. */
 	rte_node_process_t original_process;
 
+	RTE_STD_C11
+		union {
+		/* Fast schedule area for mcore dispatch model */
+		unsigned int lcore_id;  /**< Node running lcore. */
+		};
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 09/15] graph: introduce stream moving cross cores
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (7 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 08/15] graph: add struct for stream moving between cores Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-24  8:00             ` Jerin Jacob
  2023-05-09  6:03           ` [PATCH v6 10/15] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
                             ` (6 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the key functions that allow a worker thread to
enqueue and move streams of objects to next nodes that run on different
cores.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c                    |   6 +-
 lib/graph/graph_private.h            |  30 +++++
 lib/graph/meson.build                |   2 +-
 lib/graph/rte_graph.h                |  15 ++-
 lib/graph/rte_graph_model_dispatch.c | 157 +++++++++++++++++++++++++++
 lib/graph/rte_graph_model_dispatch.h |  37 +++++++
 lib/graph/version.map                |   2 +
 7 files changed, 244 insertions(+), 5 deletions(-)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index e809aa55b0..f555844d8f 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -495,7 +495,7 @@ clone_name(struct graph *graph, struct graph *parent_graph, const char *name)
 }
 
 static rte_graph_t
-graph_clone(struct graph *parent_graph, const char *name)
+graph_clone(struct graph *parent_graph, const char *name, struct rte_graph_param *prm)
 {
 	struct graph_node *graph_node;
 	struct graph *graph;
@@ -566,14 +566,14 @@ graph_clone(struct graph *parent_graph, const char *name)
 }
 
 rte_graph_t
-rte_graph_clone(rte_graph_t id, const char *name)
+rte_graph_clone(rte_graph_t id, const char *name, struct rte_graph_param *prm)
 {
 	struct graph *graph;
 
 	GRAPH_ID_CHECK(id);
 	STAILQ_FOREACH(graph, &graph_list, next)
 		if (graph->id == id)
-			return graph_clone(graph, name);
+			return graph_clone(graph, name, prm);
 
 fail:
 	return RTE_GRAPH_ID_INVALID;
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 02b10ea2b6..70347116ba 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -372,4 +372,34 @@ void graph_dump(FILE *f, struct graph *g);
  */
 void node_dump(FILE *f, struct node *n);
 
+/**
+ * @internal
+ *
+ * Create the graph schedule work queue. All cloned graphs attached to the
+ * parent graph MUST be destroyed together, due to a fast-schedule design limitation.
+ *
+ * @param _graph
+ *   The graph object
+ * @param _parent_graph
+ *   The parent graph object which holds the run-queue head.
+ * @param prm
+ *   Graph parameter, includes model-specific parameters in this graph.
+ *
+ * @return
+ *   - 0: Success.
+ *   - <0: Graph schedule work queue related error.
+ */
+int graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph,
+			   struct rte_graph_param *prm);
+
+/**
+ * @internal
+ *
+ * Destroy the graph schedule work queue.
+ *
+ * @param _graph
+ *   The graph object
+ */
+void graph_sched_wq_destroy(struct graph *_graph);
+
 #endif /* _RTE_GRAPH_PRIVATE_H_ */
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index c729d984b6..e21affa280 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -20,4 +20,4 @@ sources = files(
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
-deps += ['eal', 'pcapng']
+deps += ['eal', 'pcapng', 'mempool', 'ring']
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index 2f86c17de7..0ac764daf8 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -169,6 +169,17 @@ struct rte_graph_param {
 	bool pcap_enable; /**< Pcap enable. */
 	uint64_t num_pkt_to_capture; /**< Number of packets to capture. */
 	char *pcap_filename; /**< Filename in which packets to be captured.*/
+
+	RTE_STD_C11
+	union {
+		struct {
+			uint64_t rsvd[8];
+		} rtc;
+		struct {
+			uint32_t wq_size_max;
+			uint32_t mp_capacity;
+		} dispatch;
+	};
 };
 
 /**
@@ -260,12 +271,14 @@ int rte_graph_destroy(rte_graph_t id);
  *   Name of the new graph. The library prepends the parent graph name to the
 *   user-specified name; the final graph name will be
 *   "parent graph name" + "-" + name.
+ * @param prm
+ *   Graph parameter, includes model-specific parameters in this graph.
  *
  * @return
  *   Valid graph id on success, RTE_GRAPH_ID_INVALID otherwise.
  */
 __rte_experimental
-rte_graph_t rte_graph_clone(rte_graph_t id, const char *name);
+rte_graph_t rte_graph_clone(rte_graph_t id, const char *name, struct rte_graph_param *prm);
 
 /**
  * Get graph id from graph name.
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
index 3364a76ed4..4264723485 100644
--- a/lib/graph/rte_graph_model_dispatch.c
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -5,6 +5,163 @@
 #include "graph_private.h"
 #include "rte_graph_model_dispatch.h"
 
+int
+graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph,
+		       struct rte_graph_param *prm)
+{
+	struct rte_graph *parent_graph = _parent_graph->graph;
+	struct rte_graph *graph = _graph->graph;
+	unsigned int wq_size;
+	unsigned int flags = RING_F_SC_DEQ;
+
+	wq_size = GRAPH_SCHED_WQ_SIZE(graph->nb_nodes);
+	wq_size = rte_align32pow2(wq_size + 1);
+
+	if (prm->dispatch.wq_size_max > 0)
+		wq_size = wq_size <= (prm->dispatch.wq_size_max) ? wq_size :
+			prm->dispatch.wq_size_max;
+
+	if (!rte_is_power_of_2(wq_size))
+		flags |= RING_F_EXACT_SZ;
+
+	graph->wq = rte_ring_create(graph->name, wq_size, graph->socket,
+				    flags);
+	if (graph->wq == NULL)
+		SET_ERR_JMP(EIO, fail, "Failed to allocate graph WQ");
+
+	if (prm->dispatch.mp_capacity > 0)
+		wq_size = (wq_size <= prm->dispatch.mp_capacity) ? wq_size :
+			prm->dispatch.mp_capacity;
+
+	graph->mp = rte_mempool_create(graph->name, wq_size,
+				       sizeof(struct graph_sched_wq_node),
+				       0, 0, NULL, NULL, NULL, NULL,
+				       graph->socket, MEMPOOL_F_SP_PUT);
+	if (graph->mp == NULL)
+		SET_ERR_JMP(EIO, fail_mp,
+			    "Failed to allocate graph WQ schedule entry");
+
+	graph->lcore_id = _graph->lcore_id;
+
+	if (parent_graph->rq == NULL) {
+		parent_graph->rq = &parent_graph->rq_head;
+		SLIST_INIT(parent_graph->rq);
+	}
+
+	graph->rq = parent_graph->rq;
+	SLIST_INSERT_HEAD(graph->rq, graph, rq_next);
+
+	return 0;
+
+fail_mp:
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+fail:
+	return -rte_errno;
+}
+
+void
+graph_sched_wq_destroy(struct graph *_graph)
+{
+	struct rte_graph *graph = _graph->graph;
+
+	if (graph == NULL)
+		return;
+
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+
+	rte_mempool_free(graph->mp);
+	graph->mp = NULL;
+}
+
+static __rte_always_inline bool
+__graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	uint16_t off = 0;
+	uint16_t size;
+
+submit_again:
+	if (rte_mempool_get(graph->mp, (void **)&wq_node) < 0)
+		goto fallback;
+
+	size = RTE_MIN(node->idx, RTE_DIM(wq_node->objs));
+	wq_node->node_off = node->off;
+	wq_node->nb_objs = size;
+	rte_memcpy(wq_node->objs, &node->objs[off], size * sizeof(void *));
+
+	while (rte_ring_mp_enqueue_bulk_elem(graph->wq, (void *)&wq_node,
+					  sizeof(wq_node), 1, NULL) == 0)
+		rte_pause();
+
+	off += size;
+	node->idx -= size;
+	if (node->idx > 0)
+		goto submit_again;
+
+	return true;
+
+fallback:
+	if (off != 0)
+		memmove(&node->objs[0], &node->objs[off],
+			node->idx * sizeof(void *));
+
+	return false;
+}
+
+bool __rte_noinline
+__rte_graph_sched_node_enqueue(struct rte_node *node,
+			       struct rte_graph_rq_head *rq)
+{
+	const unsigned int lcore_id = node->lcore_id;
+	struct rte_graph *graph;
+
+	SLIST_FOREACH(graph, rq, rq_next)
+		if (graph->lcore_id == lcore_id)
+			break;
+
+	return graph != NULL ? __graph_sched_node_enqueue(node, graph) : false;
+}
+
+void
+__rte_graph_sched_wq_process(struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	struct rte_mempool *mp = graph->mp;
+	struct rte_ring *wq = graph->wq;
+	uint16_t idx, free_space;
+	struct rte_node *node;
+	unsigned int i, n;
+	struct graph_sched_wq_node *wq_nodes[32];
+
+	n = rte_ring_sc_dequeue_burst_elem(wq, wq_nodes, sizeof(wq_nodes[0]),
+					   RTE_DIM(wq_nodes), NULL);
+	if (n == 0)
+		return;
+
+	for (i = 0; i < n; i++) {
+		wq_node = wq_nodes[i];
+		node = RTE_PTR_ADD(graph, wq_node->node_off);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		idx = node->idx;
+		free_space = node->size - idx;
+
+		if (unlikely(free_space < wq_node->nb_objs))
+			__rte_node_stream_alloc_size(graph, node, node->size + wq_node->nb_objs);
+
+		memmove(&node->objs[idx], wq_node->objs, wq_node->nb_objs * sizeof(void *));
+		node->idx = idx + wq_node->nb_objs;
+
+		__rte_node_process(graph, node);
+
+		wq_node->nb_objs = 0;
+		node->idx = 0;
+	}
+
+	rte_mempool_put_bulk(mp, (void **)wq_nodes, n);
+}
+
 int
 rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned int lcore_id)
 {
diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
index 179624e972..18fa7ce0ab 100644
--- a/lib/graph/rte_graph_model_dispatch.h
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -14,12 +14,49 @@
  *
  * This API allows to set core affinity with the node.
  */
+#include <rte_errno.h>
+#include <rte_mempool.h>
+#include <rte_memzone.h>
+#include <rte_ring.h>
+
 #include "rte_graph_worker_common.h"
 
 #ifdef __cplusplus
 extern "C" {
 #endif
 
+#define GRAPH_SCHED_WQ_SIZE_MULTIPLIER  8
+#define GRAPH_SCHED_WQ_SIZE(nb_nodes)   \
+	((typeof(nb_nodes))((nb_nodes) * GRAPH_SCHED_WQ_SIZE_MULTIPLIER))
+
+/**
+ * @internal
+ *
+ * Schedule the node to the right graph's work queue.
+ *
+ * @param node
+ *   Pointer to the scheduled node object.
+ * @param rq
+ *   Pointer to the scheduled run-queue for all graphs.
+ *
+ * @return
+ *   True on success, false otherwise.
+ */
+__rte_experimental
+bool __rte_noinline __rte_graph_sched_node_enqueue(struct rte_node *node,
+				    struct rte_graph_rq_head *rq);
+
+/**
+ * @internal
+ *
+ * Process all nodes (streams) in the graph's work queue.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ */
+__rte_experimental
+void __rte_graph_sched_wq_process(struct rte_graph *graph);
+
 /**
  * Set lcore affinity with the node.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index aaa86f66ed..d511133f39 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -48,6 +48,8 @@ EXPERIMENTAL {
 
 	rte_graph_worker_model_set;
 	rte_graph_worker_model_get;
+	__rte_graph_sched_wq_process;
+	__rte_graph_sched_node_enqueue;
 
 	rte_graph_model_dispatch_lcore_affinity_set;
 
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 10/15] graph: enable create and destroy graph scheduling workqueue
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (8 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 09/15] graph: introduce stream moving cross cores Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-09  6:03           ` [PATCH v6 11/15] graph: introduce graph walk by cross-core dispatch Zhirun Yan
                             ` (5 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch wires the creation and destruction of the scheduling
workqueue into the common graph operations.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index f555844d8f..8b42d43193 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -449,6 +449,10 @@ rte_graph_destroy(rte_graph_t id)
 	while (graph != NULL) {
 		tmp = STAILQ_NEXT(graph, next);
 		if (graph->id == id) {
+			/* Destroy the schedule work queue if present */
+			if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+				graph_sched_wq_destroy(graph);
+
 			/* Call fini() of the all the nodes in the graph */
 			graph_node_fini(graph);
 			/* Destroy graph fast path memory */
@@ -543,6 +547,11 @@ graph_clone(struct graph *parent_graph, const char *name, struct rte_graph_param
 	if (graph_fp_mem_create(graph))
 		goto graph_cleanup;
 
+	/* Create the graph schedule work queue */
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH &&
+	    graph_sched_wq_create(graph, parent_graph, prm))
+		goto graph_mem_destroy;
+
 	/* Call init() of the all the nodes in the graph */
 	if (graph_node_init(graph))
 		goto graph_mem_destroy;
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 11/15] graph: introduce graph walk by cross-core dispatch
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (9 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 10/15] graph: enable create and destroy graph scheduling workqueue Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-09  6:03           ` [PATCH v6 12/15] graph: enable graph multicore dispatch scheduler model Zhirun Yan
                             ` (4 subsequent siblings)
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch introduces the task scheduler mechanism that enables
dispatching tasks to other worker cores. Currently, each graph only
walks its local work queue. We introduce a scheduler work queue per
worker core for dispatched tasks; the walk processes the scheduler work
queue first, then handles the local work queue.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_model_dispatch.h | 43 ++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
index 18fa7ce0ab..f35cddba31 100644
--- a/lib/graph/rte_graph_model_dispatch.h
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -73,6 +73,49 @@ __rte_experimental
 int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
 						unsigned int lcore_id);
 
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk_mcore_dispatch(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+
+	RTE_ASSERT(graph->parent_id != RTE_GRAPH_ID_INVALID);
+	if (graph->wq != NULL)
+		__rte_graph_sched_wq_process(graph);
+
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+
+		/* Skip the source nodes that are not bound to the current worker */
+		if ((int32_t)head < 0 && node->lcore_id != graph->lcore_id)
+			continue;
+
+		/* Schedule the node until all task/objs are done */
+		if (node->lcore_id != RTE_MAX_LCORE &&
+		    graph->lcore_id != node->lcore_id && graph->rq != NULL &&
+		    __rte_graph_sched_node_enqueue(node, graph->rq))
+			continue;
+
+		__rte_node_process(graph, node);
+
+		head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+
+	graph->tail = 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 12/15] graph: enable graph multicore dispatch scheduler model
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (10 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 11/15] graph: introduce graph walk by cross-core dispatch Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-24  8:45             ` Jerin Jacob
  2023-05-09  6:03           ` [PATCH v6 13/15] graph: add stats for cross-core dispatching Zhirun Yan
                             ` (3 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch enables choosing the new scheduler model in rte_graph_walk().

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/rte_graph_worker.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index 5b58f7bda9..2dd27b3949 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -11,6 +11,7 @@ extern "C" {
 #endif
 
 #include "rte_graph_model_rtc.h"
+#include "rte_graph_model_dispatch.h"
 
 /**
  * Perform graph walk on the circular buffer and invoke the process function
@@ -25,7 +26,13 @@ __rte_experimental
 static inline void
 rte_graph_walk(struct rte_graph *graph)
 {
-	rte_graph_walk_rtc(graph);
+	int model = rte_graph_worker_model_get();
+
+	if (model == RTE_GRAPH_MODEL_DEFAULT ||
+	    model == RTE_GRAPH_MODEL_RTC)
+		rte_graph_walk_rtc(graph);
+	else if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		rte_graph_walk_mcore_dispatch(graph);
 }
 
 #ifdef __cplusplus
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 13/15] graph: add stats for cross-core dispatching
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (11 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 12/15] graph: enable graph multicore dispatch scheduler model Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-24  8:08             ` Jerin Jacob
  2023-05-09  6:03           ` [PATCH v6 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model Zhirun Yan
                             ` (2 subsequent siblings)
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add stats for cross-core dispatching scheduler if stats collection is
enabled.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_debug.c              |  6 +++
 lib/graph/graph_stats.c              | 74 +++++++++++++++++++++++++---
 lib/graph/rte_graph.h                |  2 +
 lib/graph/rte_graph_model_dispatch.c |  3 ++
 lib/graph/rte_graph_worker_common.h  |  2 +
 5 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/lib/graph/graph_debug.c b/lib/graph/graph_debug.c
index b84412f5dd..7dcf07b080 100644
--- a/lib/graph/graph_debug.c
+++ b/lib/graph/graph_debug.c
@@ -74,6 +74,12 @@ rte_graph_obj_dump(FILE *f, struct rte_graph *g, bool all)
 		fprintf(f, "       size=%d\n", n->size);
 		fprintf(f, "       idx=%d\n", n->idx);
 		fprintf(f, "       total_objs=%" PRId64 "\n", n->total_objs);
+		if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+			fprintf(f, "       total_sched_objs=%" PRId64 "\n",
+				n->total_sched_objs);
+			fprintf(f, "       total_sched_fail=%" PRId64 "\n",
+				n->total_sched_fail);
+		}
 		fprintf(f, "       total_calls=%" PRId64 "\n", n->total_calls);
 		for (i = 0; i < n->nb_edges; i++)
 			fprintf(f, "          edge[%d] <%s>\n", i,
diff --git a/lib/graph/graph_stats.c b/lib/graph/graph_stats.c
index c0140ba922..9ccb358aa2 100644
--- a/lib/graph/graph_stats.c
+++ b/lib/graph/graph_stats.c
@@ -40,13 +40,19 @@ struct rte_graph_cluster_stats {
 	struct cluster_node clusters[];
 } __rte_cache_aligned;
 
+#define boarder_model_dispatch()                                                              \
+	fprintf(f, "+-------------------------------+---------------+--------" \
+		   "-------+---------------+---------------+---------------+" \
+		   "---------------+---------------+-" \
+		   "----------+\n")
+
 #define boarder()                                                              \
 	fprintf(f, "+-------------------------------+---------------+--------" \
 		   "-------+---------------+---------------+---------------+-" \
 		   "----------+\n")
 
 static inline void
-print_banner(FILE *f)
+print_banner_default(FILE *f)
 {
 	boarder();
 	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s\n", "|Node", "|calls",
@@ -55,6 +61,27 @@ print_banner(FILE *f)
 	boarder();
 }
 
+static inline void
+print_banner_dispatch(FILE *f)
+{
+	boarder_model_dispatch();
+	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s%-16s%-16s\n",
+		"|Node", "|calls",
+		"|objs", "|sched objs", "|sched fail",
+		"|realloc_count", "|objs/call", "|objs/sec(10E6)",
+		"|cycles/call|");
+	boarder_model_dispatch();
+}
+
+static inline void
+print_banner(FILE *f)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		print_banner_dispatch(f);
+	else
+		print_banner_default(f);
+}
+
 static inline void
 print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 {
@@ -76,11 +103,21 @@ print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 	objs_per_sec = ts_per_hz ? (objs - prev_objs) / ts_per_hz : 0;
 	objs_per_sec /= 1000000;
 
-	fprintf(f,
-		"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
-		"|%-15.3f|%-15.6f|%-11.4f|\n",
-		stat->name, calls, objs, stat->realloc_count, objs_per_call,
-		objs_per_sec, cycles_per_call);
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->sched_objs,
+			stat->sched_fail, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	} else {
+		fprintf(f,
+			"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+			"|%-15.3f|%-15.6f|%-11.4f|\n",
+			stat->name, calls, objs, stat->realloc_count, objs_per_call,
+			objs_per_sec, cycles_per_call);
+	}
 }
 
 static int
@@ -88,13 +125,20 @@ graph_cluster_stats_cb(bool is_first, bool is_last, void *cookie,
 		       const struct rte_graph_cluster_node_stats *stat)
 {
 	FILE *f = cookie;
+	int model;
+
+	model = rte_graph_worker_model_get();
 
 	if (unlikely(is_first))
 		print_banner(f);
 	if (stat->objs)
 		print_node(f, stat);
-	if (unlikely(is_last))
-		boarder();
+	if (unlikely(is_last)) {
+		if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+			boarder_model_dispatch();
+		else
+			boarder();
+	}
 
 	return 0;
 };
@@ -333,12 +377,20 @@ cluster_node_arregate_stats(struct cluster_node *cluster)
 {
 	uint64_t calls = 0, cycles = 0, objs = 0, realloc_count = 0;
 	struct rte_graph_cluster_node_stats *stat = &cluster->stat;
+	uint64_t sched_objs = 0, sched_fail = 0;
 	struct rte_node *node;
 	rte_node_t count;
+	int model;
 
+	model = rte_graph_worker_model_get();
 	for (count = 0; count < cluster->nb_nodes; count++) {
 		node = cluster->nodes[count];
 
+		if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+			sched_objs += node->total_sched_objs;
+			sched_fail += node->total_sched_fail;
+		}
+
 		calls += node->total_calls;
 		objs += node->total_objs;
 		cycles += node->total_cycles;
@@ -348,6 +400,12 @@ cluster_node_arregate_stats(struct cluster_node *cluster)
 	stat->calls = calls;
 	stat->objs = objs;
 	stat->cycles = cycles;
+
+	if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+		stat->sched_objs = sched_objs;
+		stat->sched_fail = sched_fail;
+	}
+
 	stat->ts = rte_get_timer_cycles();
 	stat->realloc_count = realloc_count;
 }
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index 0ac764daf8..ee6c970ca4 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -219,6 +219,8 @@ struct rte_graph_cluster_node_stats {
 	uint64_t prev_calls;	/**< Previous number of calls. */
 	uint64_t prev_objs;	/**< Previous number of processed objs. */
 	uint64_t prev_cycles;	/**< Previous number of cycles. */
+	uint64_t sched_objs;	/**< Number of scheduled objs. */
+	uint64_t sched_fail;	/**< Number of objs that failed to be scheduled. */
 
 	uint64_t realloc_count; /**< Realloc count. */
 
diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c
index 4264723485..cb7b6b9b7a 100644
--- a/lib/graph/rte_graph_model_dispatch.c
+++ b/lib/graph/rte_graph_model_dispatch.c
@@ -96,6 +96,7 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		rte_pause();
 
 	off += size;
+	node->total_sched_objs += size;
 	node->idx -= size;
 	if (node->idx > 0)
 		goto submit_again;
@@ -107,6 +108,8 @@ __graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
 		memmove(&node->objs[0], &node->objs[off],
 			node->idx * sizeof(void *));
 
+	node->total_sched_fail += node->idx;
+
 	return false;
 }
 
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 8e968e2022..7095cb4699 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -95,6 +95,8 @@ struct rte_node {
 		/* Fast schedule area for mcore dispatch model */
 		unsigned int lcore_id;  /**< Node running lcore. */
 		};
+	uint64_t total_sched_objs; /**< Number of objects scheduled. */
+	uint64_t total_sched_fail; /**< Number of scheduling failures. */
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (12 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 13/15] graph: add stats for cross-core dispatching Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-09  6:03           ` [PATCH v6 15/15] doc: update multicore dispatch model in graph guides Zhirun Yan
  2023-06-05 11:19           ` [PATCH v7 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  15 siblings, 0 replies; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add a new parameter "model" to choose the dispatch or rtc worker model.
In the dispatch model, nodes are affinitized to the worker cores
successively.

Note:
only one RX node is supported by the dispatch model in the current
implementation.

./dpdk-l3fwd-graph  -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
--model="dispatch"

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 examples/l3fwd-graph/main.c | 236 +++++++++++++++++++++++++++++-------
 1 file changed, 194 insertions(+), 42 deletions(-)

diff --git a/examples/l3fwd-graph/main.c b/examples/l3fwd-graph/main.c
index 5feeab4f0f..a91947e940 100644
--- a/examples/l3fwd-graph/main.c
+++ b/examples/l3fwd-graph/main.c
@@ -55,6 +55,9 @@
 
 #define NB_SOCKETS 8
 
+/* Graph module */
+#define WORKER_MODEL_RTC "rtc"
+#define WORKER_MODEL_MCORE_DISPATCH "dispatch"
 /* Static global variables used within this file. */
 static uint16_t nb_rxd = RX_DESC_DEFAULT;
 static uint16_t nb_txd = TX_DESC_DEFAULT;
@@ -88,6 +91,10 @@ struct lcore_rx_queue {
 	char node_name[RTE_NODE_NAMESIZE];
 };
 
+struct model_conf {
+	enum rte_graph_worker_model model;
+};
+
 /* Lcore conf */
 struct lcore_conf {
 	uint16_t n_rx_queue;
@@ -153,6 +160,19 @@ static struct ipv4_l3fwd_lpm_route ipv4_l3fwd_lpm_route_array[] = {
 	{RTE_IPV4(198, 18, 6, 0), 24, 6}, {RTE_IPV4(198, 18, 7, 0), 24, 7},
 };
 
+static int
+check_worker_model_params(void)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH &&
+	    nb_lcore_params > 1) {
+		printf("Exceeded max number of lcore params for remote model: %hu\n",
+		       nb_lcore_params);
+		return -1;
+	}
+
+	return 0;
+}
+
 static int
 check_lcore_params(void)
 {
@@ -276,6 +296,7 @@ print_usage(const char *prgname)
 		"  --eth-dest=X,MM:MM:MM:MM:MM:MM: Ethernet destination for "
 		"port X\n"
 		"  --max-pkt-len PKTLEN: maximum packet length in decimal (64-9600)\n"
+		"  --model NAME: walking model name, dispatch or rtc(by default)\n"
 		"  --no-numa: Disable numa awareness\n"
 		"  --per-port-pool: Use separate buffer pool per port\n"
 		"  --pcap-enable: Enables pcap capture\n"
@@ -318,6 +339,20 @@ parse_max_pkt_len(const char *pktlen)
 	return len;
 }
 
+static int
+parse_worker_model(const char *model)
+{
+	if (strcmp(model, WORKER_MODEL_MCORE_DISPATCH) == 0) {
+		rte_graph_worker_model_set(RTE_GRAPH_MODEL_MCORE_DISPATCH);
+		return RTE_GRAPH_MODEL_MCORE_DISPATCH;
+	} else if (strcmp(model, WORKER_MODEL_RTC) == 0)
+		return RTE_GRAPH_MODEL_RTC;
+
+	rte_exit(EXIT_FAILURE, "Invalid worker model: %s", model);
+
+	return RTE_GRAPH_MODEL_LIST_END;
+}
+
 static int
 parse_portmask(const char *portmask)
 {
@@ -434,6 +469,8 @@ static const char short_options[] = "p:" /* portmask */
 #define CMD_LINE_OPT_PCAP_ENABLE   "pcap-enable"
 #define CMD_LINE_OPT_NUM_PKT_CAP   "pcap-num-cap"
 #define CMD_LINE_OPT_PCAP_FILENAME "pcap-file-name"
+#define CMD_LINE_OPT_WORKER_MODEL  "model"
+
 enum {
 	/* Long options mapped to a short option */
 
@@ -449,6 +486,7 @@ enum {
 	CMD_LINE_OPT_PARSE_PCAP_ENABLE,
 	CMD_LINE_OPT_PARSE_NUM_PKT_CAP,
 	CMD_LINE_OPT_PCAP_FILENAME_CAP,
+	CMD_LINE_OPT_WORKER_MODEL_TYPE,
 };
 
 static const struct option lgopts[] = {
@@ -460,6 +498,7 @@ static const struct option lgopts[] = {
 	{CMD_LINE_OPT_PCAP_ENABLE, 0, 0, CMD_LINE_OPT_PARSE_PCAP_ENABLE},
 	{CMD_LINE_OPT_NUM_PKT_CAP, 1, 0, CMD_LINE_OPT_PARSE_NUM_PKT_CAP},
 	{CMD_LINE_OPT_PCAP_FILENAME, 1, 0, CMD_LINE_OPT_PCAP_FILENAME_CAP},
+	{CMD_LINE_OPT_WORKER_MODEL, 1, 0, CMD_LINE_OPT_WORKER_MODEL_TYPE},
 	{NULL, 0, 0, 0},
 };
 
@@ -551,6 +590,11 @@ parse_args(int argc, char **argv)
 			printf("Pcap file name: %s\n", pcap_filename);
 			break;
 
+		case CMD_LINE_OPT_WORKER_MODEL_TYPE:
+			printf("Use new worker model: %s\n", optarg);
+			parse_worker_model(optarg);
+			break;
+
 		default:
 			print_usage(prgname);
 			return -1;
@@ -726,15 +770,15 @@ print_stats(void)
 static int
 graph_main_loop(void *conf)
 {
+	struct model_conf *mconf = conf;
 	struct lcore_conf *qconf;
 	struct rte_graph *graph;
 	uint32_t lcore_id;
 
-	RTE_SET_USED(conf);
-
 	lcore_id = rte_lcore_id();
 	qconf = &lcore_conf[lcore_id];
 	graph = qconf->graph;
+	rte_graph_worker_model_set(mconf->model);
 
 	if (!graph) {
 		RTE_LOG(INFO, L3FWD_GRAPH, "Lcore %u has nothing to do\n",
@@ -788,6 +832,139 @@ config_port_max_pkt_len(struct rte_eth_conf *conf,
 	return 0;
 }
 
+static void
+graph_config_mcore_dispatch(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	int worker_count = rte_lcore_count() - 1;
+	int main_lcore_id = rte_get_main_lcore();
+	rte_graph_t main_graph_id = 0;
+	struct rte_node *node_tmp;
+	struct lcore_conf *qconf;
+	struct rte_graph *graph;
+	rte_graph_t graph_id;
+	rte_graph_off_t off;
+	int n_rx_node = 0;
+	int worker_lcore;
+	rte_node_t count;
+	int i, j;
+	int ret;
+
+	for (j = 0; j < nb_lcore_params; j++) {
+		qconf = &lcore_conf[lcore_params[j].lcore_id];
+		/* Add rx node patterns of all lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			char *node_name = qconf->rx_queue_list[i].node_name;
+
+			graph_conf.node_patterns[nb_patterns + n_rx_node + i] = node_name;
+			n_rx_node++;
+			ret = rte_graph_model_dispatch_lcore_affinity_set(node_name,
+									lcore_params[j].lcore_id);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n", node_name,
+				       lcore_params[j].lcore_id);
+		}
+	}
+
+	graph_conf.nb_node_patterns = nb_patterns + n_rx_node;
+	graph_conf.socket_id = rte_lcore_to_socket_id(main_lcore_id);
+
+	qconf = &lcore_conf[main_lcore_id];
+	snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+		 main_lcore_id);
+
+	/* create main graph */
+	main_graph_id = rte_graph_create(qconf->name, &graph_conf);
+	if (main_graph_id == RTE_GRAPH_ID_INVALID)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_create(): main_graph_id invalid for lcore %u\n",
+			 main_lcore_id);
+
+	qconf->graph_id = main_graph_id;
+	qconf->graph = rte_graph_lookup(qconf->name);
+	if (!qconf->graph)
+		rte_exit(EXIT_FAILURE,
+			 "rte_graph_lookup(): graph %s not found\n",
+			 qconf->name);
+
+	graph = qconf->graph;
+	worker_lcore = lcore_params[nb_lcore_params - 1].lcore_id;
+	rte_graph_foreach_node(count, off, graph, node_tmp) {
+		/* Set the node lcore affinity before cloning the graph for each lcore */
+		if (node_tmp->lcore_id == RTE_MAX_LCORE) {
+			worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+			ret = rte_graph_model_dispatch_lcore_affinity_set(node_tmp->name,
+									worker_lcore);
+			if (ret == 0)
+				printf("Set node %s affinity to lcore %u\n",
+				       node_tmp->name, worker_lcore);
+		}
+	}
+
+	worker_lcore = main_lcore_id;
+	for (i = 0; i < worker_count; i++) {
+		worker_lcore = rte_get_next_lcore(worker_lcore, true, 1);
+
+		qconf = &lcore_conf[worker_lcore];
+		snprintf(qconf->name, sizeof(qconf->name), "cloned-%u", worker_lcore);
+		graph_id = rte_graph_clone(main_graph_id, qconf->name, &graph_conf);
+		ret = rte_graph_model_dispatch_core_bind(graph_id, worker_lcore);
+		if (ret == 0)
+			printf("bind graph %d to lcore %u\n", graph_id, worker_lcore);
+
+		/* full cloned graph name */
+		snprintf(qconf->name, sizeof(qconf->name), "%s",
+			 rte_graph_id_to_name(graph_id));
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "Failed to lookup graph %s\n",
+				 qconf->name);
+		continue;
+	}
+}
+
+static void
+graph_config_rtc(struct rte_graph_param graph_conf)
+{
+	uint16_t nb_patterns = graph_conf.nb_node_patterns;
+	struct lcore_conf *qconf;
+	rte_graph_t graph_id;
+	uint32_t lcore_id;
+	rte_edge_t i;
+
+	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+		if (rte_lcore_is_enabled(lcore_id) == 0)
+			continue;
+
+		qconf = &lcore_conf[lcore_id];
+		/* Skip graph creation if no source exists */
+		if (!qconf->n_rx_queue)
+			continue;
+		/* Add rx node patterns of this lcore */
+		for (i = 0; i < qconf->n_rx_queue; i++) {
+			graph_conf.node_patterns[nb_patterns + i] =
+				qconf->rx_queue_list[i].node_name;
+		}
+		graph_conf.nb_node_patterns = nb_patterns + i;
+		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
+		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
+			 lcore_id);
+		graph_id = rte_graph_create(qconf->name, &graph_conf);
+		if (graph_id == RTE_GRAPH_ID_INVALID)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_create(): graph_id invalid for lcore %u\n",
+				 lcore_id);
+		qconf->graph_id = graph_id;
+		qconf->graph = rte_graph_lookup(qconf->name);
+		if (!qconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "rte_graph_lookup(): graph %s not found\n",
+				 qconf->name);
+	}
+}
+
 int
 main(int argc, char **argv)
 {
@@ -808,10 +985,12 @@ main(int argc, char **argv)
 	uint16_t queueid, portid, i;
 	const char **node_patterns;
 	struct lcore_conf *qconf;
+	struct model_conf mconf;
 	uint16_t nb_graphs = 0;
 	uint16_t nb_patterns;
 	uint8_t rewrite_len;
 	uint32_t lcore_id;
+	uint16_t model;
 	int ret;
 
 	/* Init EAL */
@@ -840,6 +1019,9 @@ main(int argc, char **argv)
 	if (check_lcore_params() < 0)
 		rte_exit(EXIT_FAILURE, "check_lcore_params() failed\n");
 
+	if (check_worker_model_params() < 0)
+		rte_exit(EXIT_FAILURE, "check_worker_model_params() failed\n");
+
 	ret = init_lcore_rx_queues();
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "init_lcore_rx_queues() failed\n");
@@ -1079,51 +1261,19 @@ main(int argc, char **argv)
 
 	memset(&graph_conf, 0, sizeof(graph_conf));
 	graph_conf.node_patterns = node_patterns;
+	graph_conf.nb_node_patterns = nb_patterns;
 
 	/* Pcap config */
 	graph_conf.pcap_enable = pcap_trace_enable;
 	graph_conf.num_pkt_to_capture = packet_to_capture;
 	graph_conf.pcap_filename = pcap_filename;
 
-	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
-		rte_graph_t graph_id;
-		rte_edge_t i;
-
-		if (rte_lcore_is_enabled(lcore_id) == 0)
-			continue;
-
-		qconf = &lcore_conf[lcore_id];
-
-		/* Skip graph creation if no source exists */
-		if (!qconf->n_rx_queue)
-			continue;
-
-		/* Add rx node patterns of this lcore */
-		for (i = 0; i < qconf->n_rx_queue; i++) {
-			graph_conf.node_patterns[nb_patterns + i] =
-				qconf->rx_queue_list[i].node_name;
-		}
-
-		graph_conf.nb_node_patterns = nb_patterns + i;
-		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
-
-		snprintf(qconf->name, sizeof(qconf->name), "worker_%u",
-			 lcore_id);
-
-		graph_id = rte_graph_create(qconf->name, &graph_conf);
-		if (graph_id == RTE_GRAPH_ID_INVALID)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_create(): graph_id invalid"
-				 " for lcore %u\n", lcore_id);
-
-		qconf->graph_id = graph_id;
-		qconf->graph = rte_graph_lookup(qconf->name);
-		/* >8 End of graph initialization. */
-		if (!qconf->graph)
-			rte_exit(EXIT_FAILURE,
-				 "rte_graph_lookup(): graph %s not found\n",
-				 qconf->name);
-	}
+	model = rte_graph_worker_model_get();
+	if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		graph_config_mcore_dispatch(graph_conf);
+	else
+		graph_config_rtc(graph_conf);
+	/* >8 End of graph initialization. */
 
 	memset(&rewrite_data, 0, sizeof(rewrite_data));
 	rewrite_len = sizeof(rewrite_data);
@@ -1174,8 +1324,10 @@ main(int argc, char **argv)
 	}
 	/* >8 End of adding route to ip4 graph infa. */
 
+	mconf.model = model;
 	/* Launch per-lcore init on every worker lcore */
-	rte_eal_mp_remote_launch(graph_main_loop, NULL, SKIP_MAIN);
+	rte_eal_mp_remote_launch(graph_main_loop, &mconf,
+				 SKIP_MAIN);
 
 	/* Accumulate and print stats on main until exit */
 	if (rte_graph_has_stats_feature())
-- 
2.37.2


^ permalink raw reply	[flat|nested] 369+ messages in thread

* [PATCH v6 15/15] doc: update multicore dispatch model in graph guides
  2023-05-09  6:03         ` [PATCH v6 00/15] graph enhancement for multi-core dispatch Zhirun Yan
                             ` (13 preceding siblings ...)
  2023-05-09  6:03           ` [PATCH v6 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model Zhirun Yan
@ 2023-05-09  6:03           ` Zhirun Yan
  2023-05-24  8:12             ` Jerin Jacob
  2023-06-05 11:19           ` [PATCH v7 00/15] graph enhancement for multi-core dispatch Zhirun Yan
  15 siblings, 1 reply; 369+ messages in thread
From: Zhirun Yan @ 2023-05-09  6:03 UTC (permalink / raw)
  To: dev, jerinj, kirankumark, ndabilpuram, stephen, pbhagavatula
  Cc: cunming.liang, haiyue.wang, Zhirun Yan

Update graph documentation to introduce new multicore dispatch model.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 doc/guides/prog_guide/graph_lib.rst | 59 +++++++++++++++++++++++++++--
 1 file changed, 55 insertions(+), 4 deletions(-)

diff --git a/doc/guides/prog_guide/graph_lib.rst b/doc/guides/prog_guide/graph_lib.rst
index 1cfdc86433..72e26f3a5a 100644
--- a/doc/guides/prog_guide/graph_lib.rst
+++ b/doc/guides/prog_guide/graph_lib.rst
@@ -189,14 +189,65 @@ In the above example, A graph object will be created with ethdev Rx
 node of port 0 and queue 0, all ipv4* nodes in the system,
 and ethdev tx node of all ports.
 
-Multicore graph processing
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-In the current graph library implementation, specifically,
-``rte_graph_walk()`` and ``rte_node_enqueue*`` fast path API functions
+Graph model choosing
+~~~~~~~~~~~~~~~~~~~~
+Currently, there are two different walking models. Use
+``rte_graph_worker_model_set()`` to set the walking model.
+
+RTC (Run-To-Completion)
+^^^^^^^^^^^^^^^^^^^^^^^
+This is the default graph walking model.