DPDK patches and discussions
* [RFC, v1 0/6] graph enhancement for multi-core dispatch
@ 2022-09-08  2:09 Zhirun Yan
  2022-09-08  2:09 ` [RFC, v1 1/6] graph: introduce core affinity API into graph Zhirun Yan
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Zhirun Yan @ 2022-09-08  2:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark; +Cc: cunming.liang, haiyue.wang, Zhirun Yan

Currently, the rte_graph_walk() and rte_node_enqueue* fast path API
functions in the graph library are designed to work on a single core.

This RFC proposes a cross-core dispatching mechanism to enhance the
graph scaling strategy. We introduce a scheduler work queue so that
streams can be dispatched directly to the worker core that a specific
node has affinity with.

This RFC:
  1. Introduce core affinity API and graph clone API.
  2. Introduce key functions to enqueue/dequeue for dispatching streams.
  3. Enhance rte_graph_walk by cross-core dispatch.
  4. Add l2fwd-graph example and stats for cross-core dispatching.

With this patch set, streams can easily be planned and orchestrated
across cores on multi-core systems.
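
For illustration, a minimal usage sketch of the intended flow. This is
not part of the series; "pkt_fwd", the graph names, and the lcore IDs
are placeholders, and most error handling is trimmed:

#include <rte_graph.h>

static int
setup_graphs(struct rte_graph_param *prm)
{
	struct rte_graph_clone_param clone_prm = { .lcore_id = 2 };
	rte_graph_t parent, clone;

	/* Bind a node to lcore 2 before creating the graph. */
	if (rte_node_set_lcore_affinity("pkt_fwd", 2) != 0)
		return -1;

	/* Static parent graph, walked by the main worker. */
	parent = rte_graph_create("wk", prm);
	if (parent == RTE_GRAPH_ID_INVALID)
		return -1;

	/* Clone for lcore 2; the final graph name becomes "wk-2". */
	clone = rte_graph_clone(parent, "2", &clone_prm);

	return clone == RTE_GRAPH_ID_INVALID ? -1 : 0;
}

Each worker then calls rte_graph_walk() on its own graph; streams that
reach a node with a different lcore affinity are dispatched to that
core's work queue (patches 3 and 4).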

Future work:
  1. Support binding a set of lcores to one node.
  2. Use l3fwd-graph instead of l2fwd-graph as the example in patch 06.
  3. Add a new parameter, e.g. --node(nodeid, lcoreid), to configure core
  affinity for a node.

Comments and suggestions are welcome. Thanks!



Haiyue Wang (1):
  examples: add l2fwd-graph

Zhirun Yan (5):
  graph: introduce core affinity API into graph
  graph: introduce graph clone API for other worker core
  graph: enable stream moving cross cores
  graph: enhance graph walk by cross-core dispatch
  graph: add stats for cross-core dispatching

 examples/l2fwd-graph/main.c      | 455 +++++++++++++++++++++++++++++++
 examples/l2fwd-graph/meson.build |  25 ++
 examples/l2fwd-graph/node.c      | 263 ++++++++++++++++++
 examples/l2fwd-graph/node.h      |  64 +++++
 examples/meson.build             |   1 +
 lib/graph/graph.c                | 121 ++++++++
 lib/graph/graph_debug.c          |   4 +
 lib/graph/graph_populate.c       |   1 +
 lib/graph/graph_private.h        |  43 +++
 lib/graph/graph_sched.c          | 194 +++++++++++++
 lib/graph/graph_stats.c          |  19 +-
 lib/graph/meson.build            |   2 +
 lib/graph/node.c                 |  25 ++
 lib/graph/rte_graph.h            |  50 ++++
 lib/graph/rte_graph_worker.h     |  59 ++++
 lib/graph/version.map            |   5 +
 16 files changed, 1327 insertions(+), 4 deletions(-)
 create mode 100644 examples/l2fwd-graph/main.c
 create mode 100644 examples/l2fwd-graph/meson.build
 create mode 100644 examples/l2fwd-graph/node.c
 create mode 100644 examples/l2fwd-graph/node.h
 create mode 100644 lib/graph/graph_sched.c

-- 
2.25.1



* [RFC, v1 1/6] graph: introduce core affinity API into graph
  2022-09-08  2:09 [RFC, v1 0/6] graph enhancement for multi-core dispatch Zhirun Yan
@ 2022-09-08  2:09 ` Zhirun Yan
  2022-09-08  2:09 ` [RFC, v1 2/6] graph: introduce graph clone API for other worker core Zhirun Yan
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Zhirun Yan @ 2022-09-08  2:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark; +Cc: cunming.liang, haiyue.wang, Zhirun Yan

1. Add lcore_id to struct node to hold the affinity core ID.
2. Implement rte_node_set_lcore_affinity() to bind a node to one lcore
   (see the sketch below).
3. Update the version map for the graph public API.
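
A hedged example of the new call (the node name is taken from the
l2fwd-graph example in patch 6; the lcore ID is arbitrary):

	/* Bind the cloned Rx node to lcore 8; returns -EINVAL if the
	 * lcore ID is out of range or the node name is not found.
	 */
	int rc = rte_node_set_lcore_affinity("l2fwd_pkt_rx-0-0", 8);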

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_private.h |  1 +
 lib/graph/node.c          | 25 +++++++++++++++++++++++++
 lib/graph/rte_graph.h     | 15 +++++++++++++++
 lib/graph/version.map     |  1 +
 4 files changed, 42 insertions(+)

diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index f9a85c8926..627090f802 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -49,6 +49,7 @@ struct node {
 	STAILQ_ENTRY(node) next;      /**< Next node in the list. */
 	char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
 	uint64_t flags;		      /**< Node configuration flag. */
+	unsigned int lcore_id;        /**< Lcore ID the node runs on. */
 	rte_node_process_t process;   /**< Node process function. */
 	rte_node_init_t init;         /**< Node init function. */
 	rte_node_fini_t fini;	      /**< Node fini function. */
diff --git a/lib/graph/node.c b/lib/graph/node.c
index ae6eadb260..2ce031886b 100644
--- a/lib/graph/node.c
+++ b/lib/graph/node.c
@@ -99,6 +99,7 @@ __rte_node_register(const struct rte_node_register *reg)
 			goto free;
 	}
 
+	node->lcore_id = RTE_MAX_LCORE;
 	node->id = node_id++;
 
 	/* Add the node at tail */
@@ -192,6 +193,30 @@ rte_node_clone(rte_node_t id, const char *name)
 	return RTE_NODE_ID_INVALID;
 }
 
+int
+rte_node_set_lcore_affinity(const char *name, unsigned int lcore_id)
+{
+	struct node *node;
+	int ret = -EINVAL;
+
+	if (lcore_id >= RTE_MAX_LCORE)
+		return ret;
+
+	graph_spinlock_lock();
+
+	STAILQ_FOREACH(node, &node_list, next) {
+		if (strncmp(node->name, name, RTE_NODE_NAMESIZE) == 0) {
+			node->lcore_id = lcore_id;
+			ret = 0;
+			break;
+		}
+	}
+
+	graph_spinlock_unlock();
+
+	return ret;
+}
+
 rte_node_t
 rte_node_from_name(const char *name)
 {
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index b32c4bc217..9f84461dd8 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -510,6 +510,21 @@ rte_node_t rte_node_from_name(const char *name);
 __rte_experimental
 char *rte_node_id_to_name(rte_node_t id);
 
+/**
+ * Set lcore affinity for a node.
+ *
+ * @param name
+ *   Valid node name. In the case of the cloned node, the name will be
+ * "parent node name" + "-" + name.
+ * @param lcore_id
+ *   The lcore ID value.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise.
+ */
+__rte_experimental
+int rte_node_set_lcore_affinity(const char *name, unsigned int lcore_id);
+
 /**
  * Get the number of edges(next-nodes) for a node from node id.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 13b838752d..6d40a74731 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -42,6 +42,7 @@ EXPERIMENTAL {
 	rte_node_next_stream_get;
 	rte_node_next_stream_put;
 	rte_node_next_stream_move;
+	rte_node_set_lcore_affinity;
 
 	local: *;
 };
-- 
2.25.1



* [RFC, v1 2/6] graph: introduce graph clone API for other worker core
  2022-09-08  2:09 [RFC, v1 0/6] graph enhancement for multi-core dispatch Zhirun Yan
  2022-09-08  2:09 ` [RFC, v1 1/6] graph: introduce core affinity API into graph Zhirun Yan
@ 2022-09-08  2:09 ` Zhirun Yan
  2022-09-08  2:09 ` [RFC, v1 3/6] graph: enable stream moving cross cores Zhirun Yan
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Zhirun Yan @ 2022-09-08  2:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark; +Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch adds a graph API to clone a graph object for a specified
worker core. The new graph clones all nodes of the parent; a short
usage sketch follows.
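
A possible application-side helper built on the new API (sketch only;
the "w<lcore>" naming and the omitted error reporting are choices of
this example, not of the library):

#include <stdio.h>
#include <rte_graph.h>

static struct rte_graph *
clone_for_lcore(rte_graph_t parent_id, unsigned int lcore)
{
	struct rte_graph_clone_param prm = { .lcore_id = lcore };
	char name[RTE_GRAPH_NAMESIZE];
	rte_graph_t id;

	snprintf(name, sizeof(name), "w%u", lcore);
	id = rte_graph_clone(parent_id, name, &prm);
	if (id == RTE_GRAPH_ID_INVALID)
		return NULL;

	/* The cloned graph is registered as "<parent-name>-w<lcore>". */
	return rte_graph_lookup(rte_graph_id_to_name(id));
}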

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c         | 115 ++++++++++++++++++++++++++++++++++++++
 lib/graph/graph_private.h |   4 ++
 lib/graph/rte_graph.h     |  32 +++++++++++
 lib/graph/version.map     |   1 +
 4 files changed, 152 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index d61288647c..b4eb18175a 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -327,6 +327,8 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->parent_id = RTE_GRAPH_ID_INVALID;
+	graph->lcore_id = RTE_MAX_LCORE;
 
 	/* Allocate the Graph fast path memory and populate the data */
 	if (graph_fp_mem_create(graph))
@@ -387,6 +389,119 @@ rte_graph_destroy(rte_graph_t id)
 	return rc;
 }
 
+static int
+clone_name(struct graph *graph, struct graph *parent_graph, const char *name)
+{
+	ssize_t sz, rc;
+
+#define SZ RTE_GRAPH_NAMESIZE
+	rc = rte_strscpy(graph->name, parent_graph->name, SZ);
+	if (rc < 0)
+		goto fail;
+	sz = rc;
+	rc = rte_strscpy(graph->name + sz, "-", RTE_MAX((int16_t)(SZ - sz), 0));
+	if (rc < 0)
+		goto fail;
+	sz += rc;
+	sz = rte_strscpy(graph->name + sz, name, RTE_MAX((int16_t)(SZ - sz), 0));
+	if (sz < 0)
+		goto fail;
+
+	return 0;
+fail:
+	rte_errno = E2BIG;
+	return -rte_errno;
+}
+
+static rte_graph_t
+graph_clone(struct graph *parent_graph, const char *name,
+	    struct rte_graph_clone_param *prm)
+{
+	struct graph_node *graph_node;
+	struct graph *graph;
+
+	graph_spinlock_lock();
+
+	if (prm->lcore_id >= RTE_MAX_LCORE)
+		SET_ERR_JMP(EINVAL, fail, "Invalid lcore ID");
+
+	/* Don't allow to clone a node from a cloned graph */
+	if (parent_graph->parent_id != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, fail, "A cloned graph is not allowed to be cloned");
+
+	/* Create graph object */
+	graph = calloc(1, sizeof(*graph));
+	if (graph == NULL)
+		SET_ERR_JMP(ENOMEM, fail, "Failed to calloc cloned graph object");
+
+	/* Naming ceremony of the new graph. Name is parent graph name + "-" + name */
+	if (clone_name(graph, parent_graph, name))
+		goto free;
+
+	/* Check for existence of duplicate graph */
+	if (rte_graph_from_name(graph->name) != RTE_GRAPH_ID_INVALID)
+		SET_ERR_JMP(EEXIST, free, "Found duplicate graph %s",
+			    graph->name);
+
+	/* Clone nodes from parent graph firstly */
+	STAILQ_INIT(&graph->node_list);
+	STAILQ_FOREACH(graph_node, &parent_graph->node_list, next) {
+		if (graph_node_add(graph, graph_node->node))
+			goto graph_cleanup;
+	}
+
+	/* Just update adjacency list of all nodes in the graph */
+	if (graph_adjacency_list_update(graph))
+		goto graph_cleanup;
+
+	/* Initialize the graph object */
+	graph->src_node_count = parent_graph->src_node_count;
+	graph->node_count = parent_graph->node_count;
+	graph->parent_id = parent_graph->id;
+	graph->lcore_id = prm->lcore_id;
+	graph->socket = rte_lcore_to_socket_id(prm->lcore_id);
+	graph->id = graph_id;
+
+	/* Allocate the Graph fast path memory and populate the data */
+	if (graph_fp_mem_create(graph))
+		goto graph_cleanup;
+
+	/* Call init() of the all the nodes in the graph */
+	if (graph_node_init(graph))
+		goto graph_mem_destroy;
+
+	/* All good, Lets add the graph to the list */
+	graph_id++;
+	STAILQ_INSERT_TAIL(&graph_list, graph, next);
+
+	graph_spinlock_unlock();
+	return graph->id;
+
+graph_mem_destroy:
+	graph_fp_mem_destroy(graph);
+graph_cleanup:
+	graph_cleanup(graph);
+free:
+	free(graph);
+fail:
+	graph_spinlock_unlock();
+	return RTE_GRAPH_ID_INVALID;
+}
+
+rte_graph_t
+rte_graph_clone(rte_graph_t id, const char *name, struct rte_graph_clone_param *prm)
+{
+	struct graph *graph;
+
+	GRAPH_ID_CHECK(id);
+	STAILQ_FOREACH(graph, &graph_list, next)
+		if (graph->id == id)
+			return graph_clone(graph, name, prm);
+
+fail:
+	return RTE_GRAPH_ID_INVALID;
+}
+
 rte_graph_t
 rte_graph_from_name(const char *name)
 {
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 627090f802..d53ef289d4 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -97,8 +97,12 @@ struct graph {
 	/**< Circular buffer mask for wrap around. */
 	rte_graph_t id;
 	/**< Graph identifier. */
+	rte_graph_t parent_id;
+	/**< Parent graph identifier. */
 	size_t mem_sz;
 	/**< Memory size of the graph. */
+	unsigned int lcore_id;
+	/**< Lcore identifier that the graph runs on. */
 	int socket;
 	/**< Socket identifier where memory is allocated. */
 	STAILQ_HEAD(gnode_list, graph_node) node_list;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index 9f84461dd8..27fd1e6cd0 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -166,6 +166,15 @@ struct rte_graph_param {
 	/**< Array of node patterns based on shell pattern. */
 };
 
+/**
+ * Structure to hold configuration parameters for cloning the graph.
+ *
+ * @see rte_graph_clone()
+ */
+struct rte_graph_clone_param {
+	unsigned int lcore_id; /**< Lcore ID the cloned graph will run on. */
+};
+
 /**
  * Structure to hold configuration parameters for graph cluster stats create.
  *
@@ -242,6 +251,29 @@ rte_graph_t rte_graph_create(const char *name, struct rte_graph_param *prm);
 __rte_experimental
 int rte_graph_destroy(rte_graph_t id);
 
+/**
+ * Clone Graph.
+ *
+ * Clone a graph from a static graph (a graph created with rte_graph_create()).
+ * Due to a fast-schedule design limitation, all cloned graphs attached to a
+ * parent graph MUST be destroyed together (stop ALL graph walks first).
+ *
+ * @param id
+ *   Static graph id to clone from.
+ * @param name
+ *   Name of the new graph. The library prepends the parent graph name to the
+ * user-specified name. The final graph name will be
+ * "parent graph name" + "-" + name.
+ * @param prm
+ *   Graph clone parameter, includes lcore ID.
+ *
+ * @return
+ *   Valid graph id on success, RTE_GRAPH_ID_INVALID otherwise.
+ */
+__rte_experimental
+rte_graph_t rte_graph_clone(rte_graph_t id, const char *name,
+			    struct rte_graph_clone_param *prm);
+
 /**
  * Get graph id from graph name.
  *
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 6d40a74731..6fc43e4411 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -5,6 +5,7 @@ EXPERIMENTAL {
 	__rte_node_stream_alloc;
 	__rte_node_stream_alloc_size;
 
+	rte_graph_clone;
 	rte_graph_create;
 	rte_graph_destroy;
 	rte_graph_dump;
-- 
2.25.1



* [RFC, v1 3/6] graph: enable stream moving cross cores
  2022-09-08  2:09 [RFC, v1 0/6] graph enhancement for multi-core dispatch Zhirun Yan
  2022-09-08  2:09 ` [RFC, v1 1/6] graph: introduce core affinity API into graph Zhirun Yan
  2022-09-08  2:09 ` [RFC, v1 2/6] graph: introduce graph clone API for other worker core Zhirun Yan
@ 2022-09-08  2:09 ` Zhirun Yan
  2022-09-08  2:09 ` [RFC, v1 4/6] graph: enhance graph walk by cross-core dispatch Zhirun Yan
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Zhirun Yan @ 2022-09-08  2:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark; +Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch enables a worker thread to enqueue and move streams of
objects to next nodes that run on different cores.

1. Add graph_sched_wq_node to hold the graph scheduling work queue
node stream.
2. Add work queue helper functions to create/destroy/enqueue/dequeue
(sketched below).
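
A condensed sketch of the producer-side handoff implemented below. It
is simplified from __graph_sched_node_enqueue(): partial-burst and
retry bookkeeping are trimmed, and struct graph_sched_wq_node comes
from this patch.

#include <stdbool.h>
#include <rte_common.h>
#include <rte_memcpy.h>
#include <rte_mempool.h>
#include <rte_pause.h>
#include <rte_ring.h>

#include "graph_private.h"

static bool
dispatch_burst(struct rte_ring *wq, struct rte_mempool *mp,
	       void **objs, uint16_t nb_objs)
{
	struct graph_sched_wq_node *e;

	/* wq entries live in the target core's mempool. */
	if (rte_mempool_get(mp, (void **)&e) < 0)
		return false; /* caller keeps the objs on the local core */

	e->nb_objs = RTE_MIN(nb_objs, (uint16_t)RTE_DIM(e->objs));
	rte_memcpy(e->objs, objs, e->nb_objs * sizeof(void *));
	/* (The real code also records the node offset in e->node_off.) */

	/* MP enqueue: several producer cores may target this ring; the
	 * owning core dequeues single-consumer from its own walk.
	 */
	while (rte_ring_mp_enqueue_bulk_elem(wq, (void *)&e, sizeof(e),
					     1, NULL) == 0)
		rte_pause();

	return true;
}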

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_populate.c   |   1 +
 lib/graph/graph_private.h    |  38 +++++++
 lib/graph/graph_sched.c      | 194 +++++++++++++++++++++++++++++++++++
 lib/graph/meson.build        |   2 +
 lib/graph/rte_graph_worker.h |  45 ++++++++
 lib/graph/version.map        |   3 +
 6 files changed, 283 insertions(+)
 create mode 100644 lib/graph/graph_sched.c

diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
index 102fd6c29b..26f9670406 100644
--- a/lib/graph/graph_populate.c
+++ b/lib/graph/graph_populate.c
@@ -84,6 +84,7 @@ graph_nodes_populate(struct graph *_graph)
 		}
 		node->id = graph_node->node->id;
 		node->parent_id = pid;
+		node->lcore_id = graph_node->node->lcore_id;
 		nb_edges = graph_node->node->nb_edges;
 		node->nb_edges = nb_edges;
 		off += sizeof(struct rte_node);
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index d53ef289d4..e50a207764 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -59,6 +59,17 @@ struct node {
 	char next_nodes[][RTE_NODE_NAMESIZE]; /**< Names of next nodes. */
 };
 
+/**
+ * @internal
+ *
+ * Structure that holds the graph scheduling workqueue node stream.
+ */
+struct graph_sched_wq_node {
+	rte_graph_off_t node_off;
+	uint16_t nb_objs;
+	void *objs[RTE_GRAPH_BURST_SIZE];
+} __rte_cache_aligned;
+
 /**
  * @internal
  *
@@ -349,4 +360,31 @@ void graph_dump(FILE *f, struct graph *g);
  */
 void node_dump(FILE *f, struct node *n);
 
+/**
+ * @internal
+ *
+ * Create the graph schedule work queue. Due to a fast-schedule design
+ * limitation, all cloned graphs attached to the parent graph MUST be destroyed together.
+ *
+ * @param _graph
+ *   The graph object
+ * @param _parent_graph
+ *   The parent graph object which holds the run-queue head.
+ *
+ * @return
+ *   - 0: Success.
+ *   - <0: Graph schedule work queue related error.
+ */
+int graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph);
+
+/**
+ * @internal
+ *
+ * Destroy the graph schedule work queue.
+ *
+ * @param _graph
+ *   The graph object
+ */
+void graph_sched_wq_destroy(struct graph *_graph);
+
 #endif /* _RTE_GRAPH_PRIVATE_H_ */
diff --git a/lib/graph/graph_sched.c b/lib/graph/graph_sched.c
new file mode 100644
index 0000000000..465148796b
--- /dev/null
+++ b/lib/graph/graph_sched.c
@@ -0,0 +1,194 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include <rte_common.h>
+#include <rte_errno.h>
+#include <rte_malloc.h>
+#include <rte_mempool.h>
+#include <rte_memzone.h>
+#include <rte_ring.h>
+
+#include "graph_private.h"
+
+#define GRAPH_SCHED_WQ_SIZE_MULTIPLIER  8
+#define GRAPH_SCHED_WQ_SIZE(nb_nodes)   \
+	((typeof(nb_nodes))((nb_nodes) * GRAPH_SCHED_WQ_SIZE_MULTIPLIER))
+
+static __rte_always_inline bool
+graph_src_node_avail(struct graph *graph)
+{
+	struct graph_node *graph_node;
+
+	STAILQ_FOREACH(graph_node, &graph->node_list, next)
+		if ((graph_node->node->flags & RTE_NODE_SOURCE_F) &&
+			(graph_node->node->lcore_id == RTE_MAX_LCORE ||
+		    graph->lcore_id == graph_node->node->lcore_id))
+			return true;
+
+	return false;
+}
+
+int
+graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph)
+{
+	struct rte_graph *parent_graph = _parent_graph->graph;
+	struct rte_graph *graph = _graph->graph;
+	unsigned int wq_size;
+
+	/* A cloned graph does not process source nodes unless it has affinity to the Rx core */
+	if (!graph_src_node_avail(_graph))
+		graph->head = 0;
+
+	wq_size = GRAPH_SCHED_WQ_SIZE(graph->nb_nodes);
+	wq_size = rte_align32pow2(wq_size + 1);
+
+	graph->wq = rte_ring_create(graph->name, wq_size, graph->socket,
+				    RING_F_SC_DEQ);
+	if (graph->wq == NULL)
+		SET_ERR_JMP(EIO, fail, "Failed to allocate graph WQ");
+
+	graph->mp = rte_mempool_create(graph->name, wq_size,
+				       sizeof(struct graph_sched_wq_node),
+				       0, 0, NULL, NULL, NULL, NULL,
+				       graph->socket, MEMPOOL_F_SP_PUT);
+	if (graph->mp == NULL)
+		SET_ERR_JMP(EIO, fail_mp,
+			    "Failed to allocate graph WQ schedule entry");
+
+	graph->lcore_id = _graph->lcore_id;
+
+	if (parent_graph->rq == NULL) {
+		parent_graph->rq = &parent_graph->rq_head;
+		SLIST_INIT(parent_graph->rq);
+	}
+
+	graph->rq = parent_graph->rq;
+	SLIST_INSERT_HEAD(graph->rq, graph, rq_next);
+
+	return 0;
+
+fail_mp:
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+fail:
+	return -rte_errno;
+}
+
+void
+graph_sched_wq_destroy(struct graph *_graph)
+{
+	struct rte_graph *graph = _graph->graph;
+
+	if (graph == NULL)
+		return;
+
+	rte_ring_free(graph->wq);
+	graph->wq = NULL;
+
+	rte_mempool_free(graph->mp);
+	graph->mp = NULL;
+}
+
+static __rte_always_inline bool
+__graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	uint16_t off = 0;
+	uint16_t size;
+
+submit_again:
+	if (rte_mempool_get(graph->mp, (void **)&wq_node) < 0)
+		goto fallback;
+
+	size = RTE_MIN(node->idx, RTE_DIM(wq_node->objs));
+	wq_node->node_off = node->off;
+	wq_node->nb_objs = size;
+	rte_memcpy(wq_node->objs, &node->objs[off], size * sizeof(void *));
+
+	while (rte_ring_mp_enqueue_bulk_elem(graph->wq, (void *)&wq_node,
+					  sizeof(wq_node), 1, NULL) == 0)
+		rte_pause();
+
+	off += size;
+	node->total_sched_objs += size;
+	node->idx -= size;
+	if (node->idx > 0)
+		goto submit_again;
+
+	return true;
+
+fallback:
+	if (off != 0)
+		memmove(&node->objs[0], &node->objs[off],
+			node->idx * sizeof(void *));
+
+	node->total_sched_fail += node->idx;
+
+	return false;
+}
+
+bool __rte_noinline
+__rte_graph_sched_node_enqueue(struct rte_node *node,
+			       struct rte_graph_rq_head *rq)
+{
+	const unsigned int lcore_id = node->lcore_id;
+	struct rte_graph *graph;
+
+	SLIST_FOREACH(graph, rq, rq_next)
+		if (graph->lcore_id == lcore_id)
+			break;
+
+	return graph != NULL ? __graph_sched_node_enqueue(node, graph) : false;
+}
+
+void __rte_noinline
+__rte_graph_sched_wq_process(struct rte_graph *graph)
+{
+	struct graph_sched_wq_node *wq_node;
+	struct rte_mempool *mp = graph->mp;
+	struct rte_ring *wq = graph->wq;
+	uint16_t idx, free_space;
+	struct rte_node *node;
+	unsigned int i, n;
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+	struct graph_sched_wq_node *wq_nodes[32];
+
+	n = rte_ring_sc_dequeue_burst_elem(wq, wq_nodes, sizeof(wq_nodes[0]),
+					   RTE_DIM(wq_nodes), NULL);
+	if (n == 0)
+		return;
+
+	for (i = 0; i < n; i++) {
+		wq_node = wq_nodes[i];
+		node = RTE_PTR_ADD(graph, wq_node->node_off);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		idx = node->idx;
+		free_space = node->size - idx;
+
+		if (unlikely(free_space < wq_node->nb_objs))
+			__rte_node_stream_alloc_size(graph, node, node->size + wq_node->nb_objs);
+
+		memmove(&node->objs[idx], wq_node->objs, wq_node->nb_objs * sizeof(void *));
+		memset(wq_node->objs, 0, wq_node->nb_objs * sizeof(void *));
+		node->idx = idx + wq_node->nb_objs;
+		objs = node->objs;
+		rte_prefetch0(objs);
+
+		if (rte_graph_has_stats_feature()) {
+			start = rte_rdtsc();
+			rc = node->process(graph, node, objs, wq_node->nb_objs);
+			node->total_cycles += rte_rdtsc() - start;
+			node->total_calls++;
+			node->total_objs += rc;
+		} else {
+			node->process(graph, node, objs, wq_node->nb_objs);
+		}
+		wq_node->nb_objs = 0;
+		node->idx = 0;
+	}
+
+	rte_mempool_put_bulk(mp, (void **)wq_nodes, n);
+}
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index c7327549e8..440063b24e 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -13,8 +13,10 @@ sources = files(
         'graph_ops.c',
         'graph_debug.c',
         'graph_stats.c',
+        'graph_sched.c',
         'graph_populate.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
 deps += ['eal']
+deps += ['mempool', 'ring']
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index ca0185ec36..faf3f31ddc 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -28,6 +28,13 @@
 extern "C" {
 #endif
 
+/**
+ * @internal
+ *
+ * Singly-linked list head for graph schedule run-queue.
+ */
+SLIST_HEAD(rte_graph_rq_head, rte_graph);
+
 /**
  * @internal
  *
@@ -39,6 +46,14 @@ struct rte_graph {
 	uint32_t cir_mask;	     /**< Circular buffer wrap around mask. */
 	rte_node_t nb_nodes;	     /**< Number of nodes in the graph. */
 	rte_graph_off_t *cir_start;  /**< Pointer to circular buffer. */
+	/* Graph schedule area --BEGIN-- */
+	struct rte_graph_rq_head *rq __rte_cache_aligned; /* The run-queue */
+	struct rte_graph_rq_head rq_head; /* The head for run-queue list */
+	SLIST_ENTRY(rte_graph) rq_next;   /* The next for run-queue list */
+	unsigned int lcore_id;  /**< The graph running Lcore. */
+	struct rte_ring *wq;    /**< The work-queue for pending streams. */
+	struct rte_mempool *mp; /**< The mempool for scheduling streams. */
+	/* Graph schedule area --END-- */
 	rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */
 	rte_graph_t id;	/**< Graph identifier. */
 	int socket;	/**< Socket ID where memory is allocated. */
@@ -63,6 +78,8 @@ struct rte_node {
 	char parent[RTE_NODE_NAMESIZE];	/**< Parent node name. */
 	char name[RTE_NODE_NAMESIZE];	/**< Name of the node. */
 
+	/* Fast schedule area */
+	unsigned int lcore_id __rte_cache_aligned;  /**< Node running Lcore. */
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
@@ -118,6 +135,34 @@ __rte_experimental
 void __rte_node_stream_alloc_size(struct rte_graph *graph,
 				  struct rte_node *node, uint16_t req_size);
 
+/**
+ * @internal
+ *
+ * Schedule the node to the right graph's work queue.
+ *
+ * @param node
+ *   Pointer to the scheduled node object.
+ * @param rq
+ *   Pointer to the scheduled run-queue for all graphs.
+ *
+ * @return
+ *   True on success, false otherwise.
+ */
+__rte_experimental
+bool __rte_graph_sched_node_enqueue(struct rte_node *node,
+				    struct rte_graph_rq_head *rq);
+
+/**
+ * @internal
+ *
+ * Process all nodes (streams) in the graph's work queue.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ */
+__rte_experimental
+void __rte_graph_sched_wq_process(struct rte_graph *graph);
+
 /**
  * Perform graph walk on the circular buffer and invoke the process function
  * of the nodes and collect the stats.
diff --git a/lib/graph/version.map b/lib/graph/version.map
index 6fc43e4411..99f82eb21a 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -1,6 +1,9 @@
 EXPERIMENTAL {
 	global:
 
+	__rte_graph_sched_node_enqueue;
+	__rte_graph_sched_wq_process;
+
 	__rte_node_register;
 	__rte_node_stream_alloc;
 	__rte_node_stream_alloc_size;
-- 
2.25.1



* [RFC, v1 4/6] graph: enhance graph walk by cross-core dispatch
  2022-09-08  2:09 [RFC, v1 0/6] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (2 preceding siblings ...)
  2022-09-08  2:09 ` [RFC, v1 3/6] graph: enable stream moving cross cores Zhirun Yan
@ 2022-09-08  2:09 ` Zhirun Yan
  2022-09-08  5:27   ` [EXT] " Pavan Nikhilesh Bhagavatula
  2022-09-08  2:09 ` [RFC, v1 5/6] graph: add stats for cross-core dispatching Zhirun Yan
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Zhirun Yan @ 2022-09-08  2:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark; +Cc: cunming.liang, haiyue.wang, Zhirun Yan

This patch enhances the task scheduling mechanism to enable dispatching
tasks to other worker cores. Currently, there is only a local work
queue for one graph to walk. We introduce a scheduler work queue on
each worker core for dispatching tasks. The walk processes the
scheduler work queue first, then handles the local work queue.
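
The resulting worker loop, as a sketch (force_quit is an
application-side flag, as in the patch-6 example):

	/* rte_graph_walk() now:
	 *  1. drains this core's scheduler work queue first, i.e. the
	 *     streams other cores dispatched to us
	 *     (__rte_graph_sched_wq_process());
	 *  2. walks the local circular buffer, and for a pending node
	 *     whose lcore affinity differs from ours, tries
	 *     __rte_graph_sched_node_enqueue() instead of processing
	 *     the node locally.
	 */
	while (likely(!force_quit))
		rte_graph_walk(graph);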

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph.c            |  6 ++++++
 lib/graph/rte_graph_worker.h | 11 +++++++++++
 2 files changed, 17 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index b4eb18175a..49ea2b3fbb 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -368,6 +368,8 @@ rte_graph_destroy(rte_graph_t id)
 	while (graph != NULL) {
 		tmp = STAILQ_NEXT(graph, next);
 		if (graph->id == id) {
+			/* Destroy the schedule work queue if present */
+			graph_sched_wq_destroy(graph);
 			/* Call fini() of the all the nodes in the graph */
 			graph_node_fini(graph);
 			/* Destroy graph fast path memory */
@@ -470,6 +472,10 @@ graph_clone(struct graph *parent_graph, const char *name,
 	if (graph_node_init(graph))
 		goto graph_mem_destroy;
 
+	/* Create the graph schedule work queue */
+	if (graph_sched_wq_create(graph, parent_graph))
+		goto graph_mem_destroy;
+
 	/* All good, Lets add the graph to the list */
 	graph_id++;
 	STAILQ_INSERT_TAIL(&graph_list, graph, next);
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index faf3f31ddc..e98697d880 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -177,6 +177,7 @@ static inline void
 rte_graph_walk(struct rte_graph *graph)
 {
 	const rte_graph_off_t *cir_start = graph->cir_start;
+	const unsigned int lcore_id = graph->lcore_id;
 	const rte_node_t mask = graph->cir_mask;
 	uint32_t head = graph->head;
 	struct rte_node *node;
@@ -184,6 +185,9 @@ rte_graph_walk(struct rte_graph *graph)
 	uint16_t rc;
 	void **objs;
 
+	if (graph->wq != NULL)
+		__rte_graph_sched_wq_process(graph);
+
 	/*
 	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
 	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
@@ -205,6 +209,12 @@ rte_graph_walk(struct rte_graph *graph)
 		objs = node->objs;
 		rte_prefetch0(objs);
 
+		/* Schedule the node until all task/objs are done */
+		if (node->lcore_id != RTE_MAX_LCORE && (int32_t)head > 0 &&
+		    lcore_id != node->lcore_id && graph->rq != NULL &&
+		    __rte_graph_sched_node_enqueue(node, graph->rq))
+			goto next;
+
 		if (rte_graph_has_stats_feature()) {
 			start = rte_rdtsc();
 			rc = node->process(graph, node, objs, node->idx);
@@ -215,6 +225,7 @@ rte_graph_walk(struct rte_graph *graph)
 			node->process(graph, node, objs, node->idx);
 		}
 		node->idx = 0;
+	next:
 		head = likely((int32_t)head > 0) ? head & mask : head;
 	}
 	graph->tail = 0;
-- 
2.25.1



* [RFC, v1 5/6] graph: add stats for cross-core dispatching
  2022-09-08  2:09 [RFC, v1 0/6] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (3 preceding siblings ...)
  2022-09-08  2:09 ` [RFC, v1 4/6] graph: enhance graph walk by cross-core dispatch Zhirun Yan
@ 2022-09-08  2:09 ` Zhirun Yan
  2022-09-08  2:09 ` [RFC, v1 6/6] examples: add l2fwd-graph Zhirun Yan
  2022-09-20  9:33 ` [RFC, v1 0/6] graph enhancement for multi-core dispatch Jerin Jacob
  6 siblings, 0 replies; 11+ messages in thread
From: Zhirun Yan @ 2022-09-08  2:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark; +Cc: cunming.liang, haiyue.wang, Zhirun Yan

Add stats for the cross-core dispatching scheduler when stats
collection is enabled.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 lib/graph/graph_debug.c      |  4 ++++
 lib/graph/graph_stats.c      | 19 +++++++++++++++----
 lib/graph/rte_graph.h        |  3 +++
 lib/graph/rte_graph_worker.h |  3 +++
 4 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/lib/graph/graph_debug.c b/lib/graph/graph_debug.c
index b84412f5dd..168299259b 100644
--- a/lib/graph/graph_debug.c
+++ b/lib/graph/graph_debug.c
@@ -74,6 +74,10 @@ rte_graph_obj_dump(FILE *f, struct rte_graph *g, bool all)
 		fprintf(f, "       size=%d\n", n->size);
 		fprintf(f, "       idx=%d\n", n->idx);
 		fprintf(f, "       total_objs=%" PRId64 "\n", n->total_objs);
+		fprintf(f, "       total_sched_objs=%" PRId64 "\n",
+			n->total_sched_objs);
+		fprintf(f, "       total_sched_fail=%" PRId64 "\n",
+			n->total_sched_fail);
 		fprintf(f, "       total_calls=%" PRId64 "\n", n->total_calls);
 		for (i = 0; i < n->nb_edges; i++)
 			fprintf(f, "          edge[%d] <%s>\n", i,
diff --git a/lib/graph/graph_stats.c b/lib/graph/graph_stats.c
index 65e12d46a3..c123ac4087 100644
--- a/lib/graph/graph_stats.c
+++ b/lib/graph/graph_stats.c
@@ -41,15 +41,18 @@ struct rte_graph_cluster_stats {
 
 #define boarder()                                                              \
 	fprintf(f, "+-------------------------------+---------------+--------" \
-		   "-------+---------------+---------------+---------------+-" \
+		   "-------+---------------+---------------+---------------+"\
+		   "---------------+---------------+-" \
 		   "----------+\n")
 
 static inline void
 print_banner(FILE *f)
 {
 	boarder();
-	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s\n", "|Node", "|calls",
-		"|objs", "|realloc_count", "|objs/call", "|objs/sec(10E6)",
+	fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s%-16s%-16s\n",
+		"|Node", "|calls",
+		"|objs", "|sched objs", "|sched fail",
+		"|realloc_count", "|objs/call", "|objs/sec(10E6)",
 		"|cycles/call|");
 	boarder();
 }
@@ -77,8 +80,10 @@ print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
 
 	fprintf(f,
 		"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
+		"|%-15" PRIu64 "|%-15" PRIu64
 		"|%-15.3f|%-15.6f|%-11.4f|\n",
-		stat->name, calls, objs, stat->realloc_count, objs_per_call,
+		stat->name, calls, objs, stat->sched_objs,
+		stat->sched_fail, stat->realloc_count, objs_per_call,
 		objs_per_sec, cycles_per_call);
 }
 
@@ -331,6 +336,7 @@ static inline void
 cluster_node_arregate_stats(struct cluster_node *cluster)
 {
 	uint64_t calls = 0, cycles = 0, objs = 0, realloc_count = 0;
+	uint64_t sched_objs = 0, sched_fail = 0;
 	struct rte_graph_cluster_node_stats *stat = &cluster->stat;
 	struct rte_node *node;
 	rte_node_t count;
@@ -338,6 +344,9 @@ cluster_node_arregate_stats(struct cluster_node *cluster)
 	for (count = 0; count < cluster->nb_nodes; count++) {
 		node = cluster->nodes[count];
 
+		sched_objs += node->total_sched_objs;
+		sched_fail += node->total_sched_fail;
+
 		calls += node->total_calls;
 		objs += node->total_objs;
 		cycles += node->total_cycles;
@@ -347,6 +356,8 @@ cluster_node_arregate_stats(struct cluster_node *cluster)
 	stat->calls = calls;
 	stat->objs = objs;
 	stat->cycles = cycles;
+	stat->sched_objs = sched_objs;
+	stat->sched_fail = sched_fail;
 	stat->ts = rte_get_timer_cycles();
 	stat->realloc_count = realloc_count;
 }
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index 27fd1e6cd0..1c929d741a 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -208,6 +208,9 @@ struct rte_graph_cluster_node_stats {
 	uint64_t objs;      /**< Current number of objs processed. */
 	uint64_t cycles;    /**< Current number of cycles. */
 
+	uint64_t sched_objs; /**< Current number of objs dispatched to other cores. */
+	uint64_t sched_fail; /**< Current number of objs that failed dispatching. */
+
 	uint64_t prev_ts;	/**< Previous call timestamp. */
 	uint64_t prev_calls;	/**< Previous number of calls. */
 	uint64_t prev_objs;	/**< Previous number of processed objs. */
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index e98697d880..51f174c5c1 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -80,6 +80,9 @@ struct rte_node {
 
 	/* Fast schedule area */
 	unsigned int lcore_id __rte_cache_aligned;  /**< Node running Lcore. */
+	uint64_t total_sched_objs; /**< Number of objects scheduled to other cores. */
+	uint64_t total_sched_fail; /**< Number of scheduling failures. */
+
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.25.1



* [RFC, v1 6/6] examples: add l2fwd-graph
  2022-09-08  2:09 [RFC, v1 0/6] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (4 preceding siblings ...)
  2022-09-08  2:09 ` [RFC, v1 5/6] graph: add stats for cross-core dispatching Zhirun Yan
@ 2022-09-08  2:09 ` Zhirun Yan
  2022-09-20  9:33 ` [RFC, v1 0/6] graph enhancement for multi-core dispatch Jerin Jacob
  6 siblings, 0 replies; 11+ messages in thread
From: Zhirun Yan @ 2022-09-08  2:09 UTC (permalink / raw)
  To: dev, jerinj, kirankumark; +Cc: cunming.liang, haiyue.wang, Zhirun Yan

From: Haiyue Wang <haiyue.wang@intel.com>

l2fwd implemented with the graph library.
Three nodes are added for a quick test, and each node is bound to a
worker core successively.

./dpdk-l2fwd-graph -l 7,8,9,10 -n 4 -a 0000:4b:00.0 --  -P

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
---
 examples/l2fwd-graph/main.c      | 455 +++++++++++++++++++++++++++++++
 examples/l2fwd-graph/meson.build |  25 ++
 examples/l2fwd-graph/node.c      | 263 ++++++++++++++++++
 examples/l2fwd-graph/node.h      |  64 +++++
 examples/meson.build             |   1 +
 5 files changed, 808 insertions(+)
 create mode 100644 examples/l2fwd-graph/main.c
 create mode 100644 examples/l2fwd-graph/meson.build
 create mode 100644 examples/l2fwd-graph/node.c
 create mode 100644 examples/l2fwd-graph/node.h

diff --git a/examples/l2fwd-graph/main.c b/examples/l2fwd-graph/main.c
new file mode 100644
index 0000000000..88ffd84340
--- /dev/null
+++ b/examples/l2fwd-graph/main.c
@@ -0,0 +1,455 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <sys/types.h>
+#include <sys/queue.h>
+#include <netinet/in.h>
+#include <setjmp.h>
+#include <stdarg.h>
+#include <ctype.h>
+#include <errno.h>
+#include <getopt.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_memory.h>
+#include <rte_memcpy.h>
+#include <rte_eal.h>
+#include <rte_launch.h>
+#include <rte_atomic.h>
+#include <rte_cycles.h>
+#include <rte_prefetch.h>
+#include <rte_lcore.h>
+#include <rte_per_lcore.h>
+#include <rte_branch_prediction.h>
+#include <rte_interrupts.h>
+#include <rte_random.h>
+#include <rte_debug.h>
+#include <rte_ether.h>
+#include <rte_ethdev.h>
+#include <rte_mempool.h>
+#include <rte_mbuf.h>
+#include <rte_string_fns.h>
+#include <rte_graph_worker.h>
+
+#include "node.h"
+
+#define MAX_PKT_BURST 32
+#define MEMPOOL_CACHE_SIZE 256
+#define RTE_TEST_RX_DESC_DEFAULT 1024
+#define RTE_TEST_TX_DESC_DEFAULT 1024
+
+/* ethernet addresses of ports */
+struct rte_ether_addr l2fwd_ports_eth_addr[RTE_MAX_ETHPORTS];
+
+static struct rte_eth_conf default_port_conf = {
+	.rxmode = {
+		.mq_mode = RTE_ETH_MQ_RX_RSS,
+		.split_hdr_size = 0,
+	},
+	.rx_adv_conf = {
+		.rss_conf = {
+			.rss_key = NULL,
+			.rss_hf = RTE_ETH_RSS_IP,
+		},
+	},
+	.txmode = {
+		.mq_mode = RTE_ETH_MQ_TX_NONE,
+	},
+};
+
+static struct rte_mempool *l2fwd_pktmbuf_pool;
+
+static volatile bool force_quit;
+static uint16_t nb_rxd = RTE_TEST_RX_DESC_DEFAULT;
+static uint16_t nb_txd = RTE_TEST_TX_DESC_DEFAULT;
+static uint16_t nb_rxq = 1;
+
+/* Lcore conf */
+struct lcore_conf {
+	int enable;
+	uint16_t port_id;
+
+	struct rte_graph *graph;
+	char name[RTE_GRAPH_NAMESIZE];
+	rte_graph_t graph_id;
+} __rte_cache_aligned;
+
+static struct lcore_conf l2fwd_lcore_conf[RTE_MAX_LCORE];
+
+#define L2FWD_GRAPH_NAME_PREFIX "l2fwd_graph_"
+
+static int
+l2fwd_graph_loop(void *conf)
+{
+	struct lcore_conf *lconf;
+	struct rte_graph *graph;
+	uint32_t lcore_id;
+
+	RTE_SET_USED(conf);
+
+	lcore_id = rte_lcore_id();
+	lconf = &l2fwd_lcore_conf[lcore_id];
+	graph = lconf->graph;
+
+	if (!graph) {
+		RTE_LOG(INFO, L2FWD_GRAPH, "Lcore %u has nothing to do\n",
+			lcore_id);
+		return 0;
+	}
+
+	RTE_LOG(INFO, L2FWD_GRAPH,
+		"Entering graph loop on lcore %u, graph %s\n",
+		lcore_id, graph->name);
+
+	while (likely(!force_quit))
+		rte_graph_walk(graph);
+
+	RTE_LOG(INFO, L2FWD_GRAPH,
+		"Leaving graph loop on lcore %u, graph %s\n",
+		lcore_id, graph->name);
+
+	return 0;
+}
+
+static void
+print_stats(void)
+{
+	const char topLeft[] = {27, '[', '1', ';', '1', 'H', '\0'};
+	const char clr[] = {27, '[', '2', 'J', '\0'};
+	struct rte_graph_cluster_stats_param s_param;
+	struct rte_graph_cluster_stats *stats;
+	const char *pattern = L2FWD_GRAPH_NAME_PREFIX "*";
+
+	/* Prepare stats object */
+	memset(&s_param, 0, sizeof(s_param));
+	s_param.f = stdout;
+	s_param.socket_id = SOCKET_ID_ANY;
+	s_param.graph_patterns = &pattern;
+	s_param.nb_graph_patterns = 1;
+
+	stats = rte_graph_cluster_stats_create(&s_param);
+	if (stats == NULL)
+		rte_exit(EXIT_FAILURE, "Unable to create stats object\n");
+
+	while (!force_quit) {
+		/* Clear screen and move to top left */
+		printf("%s%s", clr, topLeft);
+		rte_graph_cluster_stats_get(stats, 0);
+		rte_delay_ms(1E3);
+	}
+
+	rte_graph_cluster_stats_destroy(stats);
+}
+
+static void
+signal_handler(int signum)
+{
+	if (signum == SIGINT || signum == SIGTERM) {
+		printf("\n\nSignal %d received, preparing to exit...\n",
+				signum);
+		force_quit = true;
+	}
+}
+
+enum {
+	/* Long options mapped to a short option */
+
+	/* First long only option value must be >= 256, so that we won't
+	 * conflict with short options
+	 */
+	CMD_LINE_OPT_MIN_NUM = 256,
+	CMD_LINE_OPT_RXQ_NUM,
+};
+
+int
+main(int argc, char **argv)
+{
+	static const char *default_patterns[] = {
+		"l2fwd_pkt_rx-0-0",
+		"l2fwd_pkt_fwd",
+		"l2fwd_pkt_tx",
+		"l2fwd_pkt_drop"
+	};
+	struct rte_graph_param graph_conf;
+	union l2fwd_node_ctx node_ctx;
+	struct lcore_conf *lconf;
+	uint16_t nb_ports_available = 0;
+	uint16_t nb_ports;
+	uint16_t portid;
+	uint16_t fwd_portid = 0;
+	unsigned int nb_lcores = 0;
+	unsigned int lcore_id;
+	unsigned int nb_mbufs;
+	rte_graph_t main_graph_id = RTE_GRAPH_ID_INVALID;
+	rte_graph_t graph_id;
+	uint16_t n;
+	int ret;
+
+	/* init EAL */
+	ret = rte_eal_init(argc, argv);
+	if (ret < 0)
+		rte_exit(EXIT_FAILURE, "Invalid EAL arguments\n");
+	argc -= ret;
+	argv += ret;
+
+	force_quit = false;
+	signal(SIGINT, signal_handler);
+	signal(SIGTERM, signal_handler);
+
+	nb_ports = rte_eth_dev_count_avail();
+	if (nb_ports == 0)
+		rte_exit(EXIT_FAILURE, "No Ethernet ports - bye\n");
+
+	nb_lcores = 2;
+	nb_mbufs = RTE_MAX(nb_ports * ((nb_rxd * nb_rxq) + nb_txd + MAX_PKT_BURST +
+			   nb_lcores * MEMPOOL_CACHE_SIZE), 8192U);
+
+	/* create the mbuf pool */
+	l2fwd_pktmbuf_pool = rte_pktmbuf_pool_create("mbuf_pool", nb_mbufs,
+						     MEMPOOL_CACHE_SIZE, 0,
+						     RTE_MBUF_DEFAULT_BUF_SIZE,
+						     rte_socket_id());
+	if (l2fwd_pktmbuf_pool == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot init mbuf pool\n");
+
+	/* Initialise each port */
+	RTE_ETH_FOREACH_DEV(portid) {
+		struct rte_eth_conf local_port_conf = default_port_conf;
+		struct rte_eth_dev_info dev_info;
+		struct rte_eth_rxconf rxq_conf;
+		struct rte_eth_txconf txq_conf;
+
+		nb_ports_available++;
+
+		/* init port */
+		printf("Initializing port %u... ", portid);
+		fflush(stdout);
+
+		ret = rte_eth_dev_info_get(portid, &dev_info);
+		if (ret != 0)
+			rte_exit(EXIT_FAILURE,
+				"Error during getting device (port %u) info: %s\n",
+				portid, strerror(-ret));
+
+		if (dev_info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE)
+			local_port_conf.txmode.offloads |=
+				RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
+
+		local_port_conf.rx_adv_conf.rss_conf.rss_hf &=
+			dev_info.flow_type_rss_offloads;
+		if (local_port_conf.rx_adv_conf.rss_conf.rss_hf !=
+		    default_port_conf.rx_adv_conf.rss_conf.rss_hf) {
+			printf("Port %u modified RSS hash function based on "
+			       "hardware support,"
+			       "requested:%#" PRIx64 " configured:%#" PRIx64
+			       "\n",
+			       portid, default_port_conf.rx_adv_conf.rss_conf.rss_hf,
+			       local_port_conf.rx_adv_conf.rss_conf.rss_hf);
+		}
+
+		ret = rte_eth_dev_configure(portid, nb_rxq, 1, &local_port_conf);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE, "Cannot configure device: err=%d, port=%u\n",
+				  ret, portid);
+
+		ret = rte_eth_dev_adjust_nb_rx_tx_desc(portid, &nb_rxd, &nb_txd);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				 "Cannot adjust number of descriptors: err=%d, port=%u\n",
+				 ret, portid);
+
+		ret = rte_eth_macaddr_get(portid,
+					  &l2fwd_ports_eth_addr[portid]);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				 "Cannot get MAC address: err=%d, port=%u\n",
+				 ret, portid);
+
+		/* init RX queues */
+		fflush(stdout);
+		for (n = 0;  n < nb_rxq; n++) {
+			rxq_conf = dev_info.default_rxconf;
+			rxq_conf.offloads = local_port_conf.rxmode.offloads;
+			ret = rte_eth_rx_queue_setup(portid, n, nb_rxd,
+						     rte_eth_dev_socket_id(portid),
+						     &rxq_conf,
+						     l2fwd_pktmbuf_pool);
+			if (ret < 0)
+				rte_exit(EXIT_FAILURE,
+					 "rte_eth_rx_queue_setup:err=%d, port=%u, queue=%u\n",
+					 ret, portid, n);
+		}
+
+		/* init one TX queue on each port */
+		fflush(stdout);
+		txq_conf = dev_info.default_txconf;
+		txq_conf.offloads = local_port_conf.txmode.offloads;
+		ret = rte_eth_tx_queue_setup(portid, 0, nb_txd,
+					     rte_eth_dev_socket_id(portid),
+					     &txq_conf);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE, "rte_eth_tx_queue_setup:err=%d, port=%u\n",
+				 ret, portid);
+
+		/* Start device */
+		ret = rte_eth_dev_start(portid);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE, "rte_eth_dev_start:err=%d, port=%u\n",
+				 ret, portid);
+
+		ret = rte_eth_promiscuous_enable(portid);
+		if (ret < 0)
+			printf("Failed to set port %u promiscuous mode!\n", portid);
+
+		printf("done:\n");
+
+		printf("Port %u, MAC address: %02X:%02X:%02X:%02X:%02X:%02X\n\n",
+		       portid,
+		       l2fwd_ports_eth_addr[portid].addr_bytes[0],
+		       l2fwd_ports_eth_addr[portid].addr_bytes[1],
+		       l2fwd_ports_eth_addr[portid].addr_bytes[2],
+		       l2fwd_ports_eth_addr[portid].addr_bytes[3],
+		       l2fwd_ports_eth_addr[portid].addr_bytes[4],
+		       l2fwd_ports_eth_addr[portid].addr_bytes[5]);
+
+		fwd_portid = portid;
+	}
+
+	if (!nb_ports_available) {
+		rte_exit(EXIT_FAILURE,
+			"All available ports are disabled. Please set portmask.\n");
+	}
+
+	l2fwd_rx_node_init(0, nb_rxq);
+
+	memset(&node_ctx, 0, sizeof(node_ctx));
+	node_ctx.fwd_ctx.dest_port = fwd_portid;
+	l2fwd_node_ctx_add(l2fwd_get_pkt_fwd_node_id(), &node_ctx);
+
+	memset(&node_ctx, 0, sizeof(node_ctx));
+	node_ctx.rxtx_ctx.port = fwd_portid;
+	node_ctx.rxtx_ctx.queue = 0;
+	l2fwd_node_ctx_add(l2fwd_get_pkt_tx_node_id(), &node_ctx);
+
+	/* Need to set the node Lcore affinity before creating graph */
+	for (n = 0, lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+		if (rte_lcore_is_enabled(lcore_id) == 0 ||
+		    rte_get_main_lcore() == lcore_id)
+			continue;
+
+		if (n < RTE_DIM(default_patterns)) {
+			ret = rte_node_set_lcore_affinity(default_patterns[n], lcore_id);
+			if (ret == 0)
+				printf("Set node %s affinity to Lcore %u\n",
+				       default_patterns[n], lcore_id);
+		}
+
+		n++;
+	}
+
+	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+		lconf = &l2fwd_lcore_conf[lcore_id];
+
+		if (rte_lcore_is_enabled(lcore_id) == 0 ||
+		    rte_get_main_lcore() == lcore_id) {
+			lconf->graph_id = RTE_GRAPH_ID_INVALID;
+			continue;
+		}
+
+		if (main_graph_id != RTE_GRAPH_ID_INVALID) {
+			struct rte_graph_clone_param clone_prm;
+
+			snprintf(lconf->name, sizeof(lconf->name), "cloned-%u", lcore_id);
+			memset(&clone_prm, 0, sizeof(clone_prm));
+			clone_prm.lcore_id = lcore_id;
+			graph_id = rte_graph_clone(main_graph_id, lconf->name, &clone_prm);
+			if (graph_id == RTE_GRAPH_ID_INVALID)
+				rte_exit(EXIT_FAILURE,
+					 "Failed to clone graph for lcore %u\n",
+					 lcore_id);
+
+			/* full cloned graph name */
+			snprintf(lconf->name, sizeof(lconf->name), "%s",
+				 rte_graph_id_to_name(graph_id));
+			lconf->graph_id = graph_id;
+			lconf->graph = rte_graph_lookup(lconf->name);
+			if (!lconf->graph)
+				rte_exit(EXIT_FAILURE,
+					 "Failed to lookup graph %s\n",
+					 lconf->name);
+			//rte_graph_obj_dump(stdout, lconf->graph, true);
+			continue;
+		}
+
+		memset(&graph_conf, 0, sizeof(graph_conf));
+		graph_conf.socket_id = rte_lcore_to_socket_id(lcore_id);
+		graph_conf.nb_node_patterns = RTE_DIM(default_patterns);
+		graph_conf.node_patterns = default_patterns;
+
+		snprintf(lconf->name, sizeof(lconf->name), L2FWD_GRAPH_NAME_PREFIX "%u",
+			 lcore_id);
+
+		graph_id = rte_graph_create(lconf->name, &graph_conf);
+		if (graph_id == RTE_GRAPH_ID_INVALID)
+			rte_exit(EXIT_FAILURE,
+				 "Failed to create graph for lcore %u\n",
+				 lcore_id);
+
+		lconf->graph_id = graph_id;
+		lconf->graph = rte_graph_lookup(lconf->name);
+		if (!lconf->graph)
+			rte_exit(EXIT_FAILURE,
+				 "Failed to lookup graph %s\n",
+				 lconf->name);
+		//rte_graph_obj_dump(stdout, lconf->graph, true);
+		main_graph_id = graph_id;
+	}
+
+	rte_eal_mp_remote_launch(l2fwd_graph_loop, NULL, SKIP_MAIN);
+
+	if (rte_graph_has_stats_feature())
+		print_stats();
+
+	ret = 0;
+
+	printf("Waiting all graphs to quit ...\n");
+	RTE_LCORE_FOREACH_WORKER(lcore_id) {
+		if (rte_eal_wait_lcore(lcore_id) < 0) {
+			ret = -1;
+			break;
+		}
+
+		lconf = &l2fwd_lcore_conf[lcore_id];
+		printf("Lcore %u with graph %s quit!\n", lcore_id,
+		       lconf->graph != NULL ? lconf->name : "Empty");
+	}
+
+	RTE_LCORE_FOREACH_WORKER(lcore_id) {
+		graph_id = l2fwd_lcore_conf[lcore_id].graph_id;
+		if (graph_id != RTE_GRAPH_ID_INVALID)
+			rte_graph_destroy(graph_id);
+	}
+
+	RTE_ETH_FOREACH_DEV(portid) {
+		printf("Closing port %d...", portid);
+		ret = rte_eth_dev_stop(portid);
+		if (ret != 0)
+			printf("rte_eth_dev_stop: err=%d, port=%d\n",
+			       ret, portid);
+		rte_eth_dev_close(portid);
+		printf(" Done\n");
+	}
+	printf("Bye...\n");
+
+	return ret;
+}
diff --git a/examples/l2fwd-graph/meson.build b/examples/l2fwd-graph/meson.build
new file mode 100644
index 0000000000..ce417c7ffe
--- /dev/null
+++ b/examples/l2fwd-graph/meson.build
@@ -0,0 +1,25 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(C) 2020 Marvell International Ltd.
+
+# meson file, for building this example as part of a main DPDK build.
+#
+# To build this example as a standalone application with an already-installed
+# DPDK instance, use 'make'
+
+allow_experimental_apis = true
+
+deps += ['graph', 'eal', 'lpm', 'ethdev']
+
+if dpdk_conf.has('RTE_MEMPOOL_RING')
+	deps += 'mempool_ring'
+endif
+
+if dpdk_conf.has('RTE_NET_ICE')
+	deps += 'net_ice'
+endif
+
+sources = files(
+	'node.c',
+	'main.c'
+)
+
diff --git a/examples/l2fwd-graph/node.c b/examples/l2fwd-graph/node.c
new file mode 100644
index 0000000000..f0354536ab
--- /dev/null
+++ b/examples/l2fwd-graph/node.c
@@ -0,0 +1,263 @@
+#include <rte_debug.h>
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_graph.h>
+#include <rte_graph_worker.h>
+#include <rte_mbuf.h>
+
+#include "node.h"
+
+static struct l2fwd_node_ctx_head l2fwd_node_ctx_list =
+			STAILQ_HEAD_INITIALIZER(l2fwd_node_ctx_list);
+static rte_spinlock_t l2fwd_node_ctx_lock = RTE_SPINLOCK_INITIALIZER;
+
+int
+l2fwd_node_ctx_add(rte_node_t node_id, union l2fwd_node_ctx *ctx)
+{
+	struct l2fwd_node_ctx_elem *ctx_elem, *temp;
+	int ret;
+
+	ctx_elem = calloc(1, sizeof(*ctx_elem));
+	if (ctx_elem == NULL) {
+		RTE_LOG(ERR, L2FWD_GRAPH,
+			"Failed to calloc node %u context object\n", node_id);
+		return -ENOMEM;
+	}
+
+	ctx_elem->id = node_id;
+	rte_memcpy(&ctx_elem->ctx, ctx, sizeof(*ctx));
+
+	rte_spinlock_lock(&l2fwd_node_ctx_lock);
+
+	STAILQ_FOREACH(temp, &l2fwd_node_ctx_list, next) {
+		if (temp->id == node_id) {
+			ret = -EEXIST;
+			RTE_LOG(ERR, L2FWD_GRAPH,
+				"The node %u context exists\n", node_id);
+			rte_spinlock_unlock(&l2fwd_node_ctx_lock);
+			goto fail;
+		}
+	}
+
+	STAILQ_INSERT_TAIL(&l2fwd_node_ctx_list, ctx_elem, next);
+
+	rte_spinlock_unlock(&l2fwd_node_ctx_lock);
+
+	return 0;
+
+fail:
+	free(ctx_elem);
+	return ret;
+}
+
+int
+l2fwd_node_ctx_del(rte_node_t node_id)
+{
+	struct l2fwd_node_ctx_elem *ctx_elem, *found = NULL;
+
+	rte_spinlock_lock(&l2fwd_node_ctx_lock);
+
+	STAILQ_FOREACH(ctx_elem, &l2fwd_node_ctx_list, next) {
+		if (ctx_elem->id == node_id) {
+			STAILQ_REMOVE(&l2fwd_node_ctx_list, ctx_elem, l2fwd_node_ctx_elem, next);
+			found = ctx_elem;
+			break;
+		}
+	}
+
+	rte_spinlock_unlock(&l2fwd_node_ctx_lock);
+
+	if (found) {
+		free(found);
+		return 0;
+	} else {
+		return -1;
+	}
+}
+
+static int
+l2fwd_node_init(const struct rte_graph *graph, struct rte_node *node)
+{
+	struct l2fwd_node_ctx_elem *ctx_elem;
+	int ret = -1;
+
+	RTE_SET_USED(graph);
+
+	rte_spinlock_lock(&l2fwd_node_ctx_lock);
+
+	STAILQ_FOREACH(ctx_elem, &l2fwd_node_ctx_list, next) {
+		if (ctx_elem->id == node->id) {
+			rte_memcpy(node->ctx, ctx_elem->ctx.ctx, RTE_NODE_CTX_SZ);
+			ret = 0;
+			break;
+		}
+	}
+
+	rte_spinlock_unlock(&l2fwd_node_ctx_lock);
+
+	return ret;
+}
+
+static uint16_t
+l2fwd_pkt_rx_node_process(struct rte_graph *graph, struct rte_node *node,
+			  void **objs, uint16_t cnt)
+{
+	struct l2fwd_node_rxtx_ctx *ctx = (struct l2fwd_node_rxtx_ctx *)node->ctx;
+	uint16_t n_pkts;
+
+	RTE_SET_USED(objs);
+	RTE_SET_USED(cnt);
+
+	n_pkts = rte_eth_rx_burst(ctx->port, ctx->queue, (struct rte_mbuf **)node->objs,
+				  RTE_GRAPH_BURST_SIZE);
+	if (!n_pkts)
+		return 0;
+
+	node->idx = n_pkts;
+
+	rte_node_next_stream_move(graph, node, L2FWD_RX_NEXT_PKT_FWD);
+
+	return n_pkts;
+}
+
+static struct rte_node_register l2fwd_pkt_rx_node_base = {
+	.name = "l2fwd_pkt_rx",
+	.flags = RTE_NODE_SOURCE_F,
+	.init = l2fwd_node_init,
+	.process = l2fwd_pkt_rx_node_process,
+
+	.nb_edges = L2FWD_RX_NEXT_MAX,
+	.next_nodes = {
+		[L2FWD_RX_NEXT_PKT_FWD] = "l2fwd_pkt_fwd",
+	},
+};
+
+int l2fwd_rx_node_init(uint16_t portid, uint16_t nb_rxq)
+{
+	char name[RTE_NODE_NAMESIZE];
+	union l2fwd_node_ctx ctx;
+	uint32_t id;
+	uint16_t n;
+
+	for (n = 0; n < nb_rxq; n++) {
+		snprintf(name, sizeof(name), "%u-%u", portid, n);
+
+		/* Clone a new rx node with same edges as parent */
+		id = rte_node_clone(l2fwd_pkt_rx_node_base.id, name);
+		if (id == RTE_NODE_ID_INVALID)
+			return -EIO;
+
+		memset(&ctx, 0, sizeof(ctx));
+		ctx.rxtx_ctx.port = portid;
+		ctx.rxtx_ctx.queue = n;
+		l2fwd_node_ctx_add(id, &ctx);
+	}
+
+	return 0;
+}
+
+static uint16_t
+l2fwd_pkt_fwd_node_process(struct rte_graph *graph, struct rte_node *node,
+			   void **objs, uint16_t cnt)
+{
+	struct l2fwd_node_fwd_ctx *ctx = (struct l2fwd_node_fwd_ctx *)node->ctx;
+	struct rte_mbuf *mbuf, **pkts;
+	struct rte_ether_addr *dest;
+	struct rte_ether_hdr *eth;
+	uint16_t dest_port;
+	uint16_t i;
+	void *tmp;
+
+	dest_port = ctx->dest_port;
+	dest = &l2fwd_ports_eth_addr[dest_port];
+	pkts = (struct rte_mbuf **)objs;
+
+	for (i = 0; i < cnt; i++) {
+		mbuf = pkts[i];
+
+		eth = rte_pktmbuf_mtod(mbuf, struct rte_ether_hdr *);
+
+		/* 02:00:00:00:00:xx */
+		tmp = &eth->dst_addr.addr_bytes[0];
+		*((uint64_t *)tmp) = 0x000000000002 + ((uint64_t)dest_port << 40);
+
+		/* src addr */
+		rte_ether_addr_copy(dest, &eth->src_addr);
+	}
+
+	rte_node_enqueue(graph, node, L2FWD_FWD_NEXT_PKT_TX, objs, cnt);
+
+	return cnt;
+}
+
+static struct rte_node_register l2fwd_pkt_fwd_node = {
+	.name = "l2fwd_pkt_fwd",
+	.init = l2fwd_node_init,
+	.process = l2fwd_pkt_fwd_node_process,
+
+	.nb_edges = L2FWD_FWD_NEXT_MAX,
+	.next_nodes = {
+		[L2FWD_FWD_NEXT_PKT_TX] = "l2fwd_pkt_tx",
+	},
+};
+
+rte_node_t
+l2fwd_get_pkt_fwd_node_id(void)
+{
+	return l2fwd_pkt_fwd_node.id;
+}
+
+static uint16_t
+l2fwd_pkt_tx_node_process(struct rte_graph *graph, struct rte_node *node,
+			  void **objs, uint16_t nb_objs)
+{
+	struct l2fwd_node_rxtx_ctx *ctx = (struct l2fwd_node_rxtx_ctx *)node->ctx;
+	uint16_t count;
+
+	count = rte_eth_tx_burst(ctx->port, ctx->queue, (struct rte_mbuf **)objs,
+				 nb_objs);
+	if (count != nb_objs)
+		rte_node_enqueue(graph, node, L2FWD_TX_NEXT_PKT_DROP,
+				 &objs[count], nb_objs - count);
+
+	return count;
+}
+
+static struct rte_node_register l2fwd_pkt_tx_node = {
+	.name = "l2fwd_pkt_tx",
+	.init = l2fwd_node_init,
+	.process = l2fwd_pkt_tx_node_process,
+
+	.nb_edges = L2FWD_TX_NEXT_MAX,
+	.next_nodes = {
+		[L2FWD_TX_NEXT_PKT_DROP] = "l2fwd_pkt_drop",
+	},
+};
+
+rte_node_t
+l2fwd_get_pkt_tx_node_id(void)
+{
+	return l2fwd_pkt_tx_node.id;
+}
+
+static uint16_t
+l2fwd_pkt_drop_node_process(struct rte_graph *graph, struct rte_node *node,
+			    void **objs, uint16_t nb_objs)
+{
+	RTE_SET_USED(node);
+	RTE_SET_USED(graph);
+
+	rte_pktmbuf_free_bulk((struct rte_mbuf **)objs, nb_objs);
+
+	return nb_objs;
+}
+
+static struct rte_node_register l2fwd_pkt_drop_node = {
+	.name = "l2fwd_pkt_drop",
+	.process = l2fwd_pkt_drop_node_process,
+};
+
+RTE_NODE_REGISTER(l2fwd_pkt_rx_node_base);
+RTE_NODE_REGISTER(l2fwd_pkt_fwd_node);
+RTE_NODE_REGISTER(l2fwd_pkt_tx_node);
+RTE_NODE_REGISTER(l2fwd_pkt_drop_node);
diff --git a/examples/l2fwd-graph/node.h b/examples/l2fwd-graph/node.h
new file mode 100644
index 0000000000..201136308c
--- /dev/null
+++ b/examples/l2fwd-graph/node.h
@@ -0,0 +1,64 @@
+#ifndef __NODE_H__
+#define __NODE_H__
+
+#include <inttypes.h>
+#include <sys/queue.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "rte_graph.h"
+#include "rte_graph_worker.h"
+
+#define RTE_LOGTYPE_L2FWD_GRAPH RTE_LOGTYPE_USER1
+
+enum l2fwd_rx_next_nodes {
+	L2FWD_RX_NEXT_PKT_FWD,
+	L2FWD_RX_NEXT_MAX,
+};
+
+enum l2fwd_fwd_next_nodes {
+	L2FWD_FWD_NEXT_PKT_TX,
+	L2FWD_FWD_NEXT_MAX,
+};
+
+enum l2fwd_tx_next_nodes {
+	L2FWD_TX_NEXT_PKT_DROP,
+	L2FWD_TX_NEXT_MAX,
+};
+
+struct l2fwd_node_rxtx_ctx {
+	uint16_t port;
+	uint16_t queue;
+};
+
+struct l2fwd_node_fwd_ctx {
+	uint16_t dest_port;
+};
+
+union l2fwd_node_ctx {
+	struct l2fwd_node_rxtx_ctx rxtx_ctx;
+	struct l2fwd_node_fwd_ctx fwd_ctx;
+	uint8_t ctx[RTE_NODE_CTX_SZ];
+};
+
+struct l2fwd_node_ctx_elem {
+	STAILQ_ENTRY(l2fwd_node_ctx_elem) next;
+
+	rte_node_t id;
+
+	union l2fwd_node_ctx ctx;
+};
+
+STAILQ_HEAD(l2fwd_node_ctx_head, l2fwd_node_ctx_elem);
+
+extern struct rte_ether_addr l2fwd_ports_eth_addr[];
+
+int l2fwd_node_ctx_add(rte_node_t node_id, union l2fwd_node_ctx *ctx);
+int l2fwd_node_ctx_del(rte_node_t node_id);
+
+int l2fwd_rx_node_init(uint16_t portid, uint16_t nb_rxq);
+rte_node_t l2fwd_get_pkt_fwd_node_id(void);
+rte_node_t l2fwd_get_pkt_tx_node_id(void);
+
+#endif
diff --git a/examples/meson.build b/examples/meson.build
index 81e93799f2..7c38e63ebf 100644
--- a/examples/meson.build
+++ b/examples/meson.build
@@ -29,6 +29,7 @@ all_examples = [
         'l2fwd-cat',
         'l2fwd-crypto',
         'l2fwd-event',
+        'l2fwd-graph',
         'l2fwd-jobstats',
         'l2fwd-keepalive',
         'l3fwd',
-- 
2.25.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [EXT] [RFC, v1 4/6] graph: enhance graph walk by cross-core dispatch
  2022-09-08  2:09 ` [RFC, v1 4/6] graph: enhance graph walk by cross-core dispatch Zhirun Yan
@ 2022-09-08  5:27   ` Pavan Nikhilesh Bhagavatula
  2022-09-15  1:52     ` Yan, Zhirun
  0 siblings, 1 reply; 11+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2022-09-08  5:27 UTC (permalink / raw)
  To: Zhirun Yan, dev, Jerin Jacob Kollanukkaran, Kiran Kumar Kokkilagadda
  Cc: cunming.liang, haiyue.wang

> This patch enhances the task scheduler mechanism to enable dispatching
> tasks to other worker cores. Currently, there is only a local work
> queue for one graph to walk. We introduce a scheduler work queue on
> each worker core for dispatching tasks. The walk will process the
> scheduler work queue first, then handle the local work queue.
> 
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  lib/graph/graph.c            |  6 ++++++
>  lib/graph/rte_graph_worker.h | 11 +++++++++++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/lib/graph/graph.c b/lib/graph/graph.c
> index b4eb18175a..49ea2b3fbb 100644
> --- a/lib/graph/graph.c
> +++ b/lib/graph/graph.c
> @@ -368,6 +368,8 @@ rte_graph_destroy(rte_graph_t id)
>  	while (graph != NULL) {
>  		tmp = STAILQ_NEXT(graph, next);
>  		if (graph->id == id) {
> +			/* Destroy the schedule work queue if has */
> +			graph_sched_wq_destroy(graph);
>  			/* Call fini() of the all the nodes in the graph */
>  			graph_node_fini(graph);
>  			/* Destroy graph fast path memory */
> @@ -470,6 +472,10 @@ graph_clone(struct graph *parent_graph, const char
> *name,
>  	if (graph_node_init(graph))
>  		goto graph_mem_destroy;
> 
> +	/* Create the graph schedule work queue */
> +	if (graph_sched_wq_create(graph, parent_graph))
> +		goto graph_mem_destroy;
> +
>  	/* All good, Lets add the graph to the list */
>  	graph_id++;
>  	STAILQ_INSERT_TAIL(&graph_list, graph, next);
> diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
> index faf3f31ddc..e98697d880 100644
> --- a/lib/graph/rte_graph_worker.h
> +++ b/lib/graph/rte_graph_worker.h
> @@ -177,6 +177,7 @@ static inline void
>  rte_graph_walk(struct rte_graph *graph)
>  {
>  	const rte_graph_off_t *cir_start = graph->cir_start;
> +	const unsigned int lcore_id = graph->lcore_id;
>  	const rte_node_t mask = graph->cir_mask;
>  	uint32_t head = graph->head;
>  	struct rte_node *node;
> @@ -184,6 +185,9 @@ rte_graph_walk(struct rte_graph *graph)
>  	uint16_t rc;
>  	void **objs;
> 
> +	if (graph->wq != NULL)
> +		__rte_graph_sched_wq_process(graph);
> +


We should introduce a flags field in rte_graph_param which can
be used by the application to define whether a graph should support
multi-core dispatch.

Then we can make `__rte_graph_sched_wq_process` node 0 during graph
creation so that it is always called at the start of graph processing,
followed by the rest of the nodes.
This will remove unnecessary branches in the fast path.
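
For illustration, the application-facing side could look like the
sketch below (the .flags field and the RTE_GRAPH_F_MCORE_DISPATCH name
are assumptions for the proposed opt-in, not existing DPDK
definitions):

	/* Sketch only: ".flags"/RTE_GRAPH_F_MCORE_DISPATCH are assumed
	 * names; node_patterns/nb_patterns come from the application
	 * as in existing rte_graph usage. */
	struct rte_graph_param prm = {
		.socket_id = SOCKET_ID_ANY,
		.nb_node_patterns = nb_patterns,
		.node_patterns = node_patterns,
		.flags = RTE_GRAPH_F_MCORE_DISPATCH, /* cross-core dispatch */
	};
	rte_graph_t id = rte_graph_create("worker_graph", &prm);

With the flag unset, rte_graph_create() would build a graph identical
to today's single-core one, so existing applications see no change.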

>  	/*
>  	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and
> then
>  	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
> @@ -205,6 +209,12 @@ rte_graph_walk(struct rte_graph *graph)
>  		objs = node->objs;
>  		rte_prefetch0(objs);
> 
> +		/* Schedule the node until all task/objs are done */
> +		if (node->lcore_id != RTE_MAX_LCORE && (int32_t)head > 0
> &&
> +		    lcore_id != node->lcore_id && graph->rq != NULL &&
> +		    __rte_graph_sched_node_enqueue(node, graph->rq))
> +			goto next;
> +
>  		if (rte_graph_has_stats_feature()) {
>  			start = rte_rdtsc();
>  			rc = node->process(graph, node, objs, node->idx);
> @@ -215,6 +225,7 @@ rte_graph_walk(struct rte_graph *graph)
>  			node->process(graph, node, objs, node->idx);
>  		}
>  		node->idx = 0;
> +	next:
>  		head = likely((int32_t)head > 0) ? head & mask : head;
>  	}
>  	graph->tail = 0;
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [EXT] [RFC, v1 4/6] graph: enhance graph walk by cross-core dispatch
  2022-09-08  5:27   ` [EXT] " Pavan Nikhilesh Bhagavatula
@ 2022-09-15  1:52     ` Yan, Zhirun
  0 siblings, 0 replies; 11+ messages in thread
From: Yan, Zhirun @ 2022-09-15  1:52 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, dev, Jerin Jacob Kollanukkaran,
	Kiran Kumar Kokkilagadda
  Cc: Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> Sent: Thursday, September 8, 2022 1:27 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>; dev@dpdk.org; Jerin Jacob
> Kollanukkaran <jerinj@marvell.com>; Kiran Kumar Kokkilagadda
> <kirankumark@marvell.com>
> Cc: Liang, Cunming <cunming.liang@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: RE: [EXT] [RFC, v1 4/6] graph: enhance graph walk by cross-core
> dispatch
> 
> > This patch enhances the task scheduler mechanism to enable dispatching
> > tasks to other worker cores. Currently, there is only a local work
> > queue for one graph to walk. We introduce a scheduler work queue on
> > each worker core for dispatching tasks. The walk will process the
> > scheduler work queue first, then handle the local work queue.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  lib/graph/graph.c            |  6 ++++++
> >  lib/graph/rte_graph_worker.h | 11 +++++++++++
> >  2 files changed, 17 insertions(+)
> >
> > diff --git a/lib/graph/graph.c b/lib/graph/graph.c index
> > b4eb18175a..49ea2b3fbb 100644
> > --- a/lib/graph/graph.c
> > +++ b/lib/graph/graph.c
> > @@ -368,6 +368,8 @@ rte_graph_destroy(rte_graph_t id)
> >  	while (graph != NULL) {
> >  		tmp = STAILQ_NEXT(graph, next);
> >  		if (graph->id == id) {
> > +			/* Destroy the schedule work queue if has */
> > +			graph_sched_wq_destroy(graph);
> >  			/* Call fini() of the all the nodes in the graph */
> >  			graph_node_fini(graph);
> >  			/* Destroy graph fast path memory */ @@ -470,6 +472,10
> @@
> > graph_clone(struct graph *parent_graph, const char *name,
> >  	if (graph_node_init(graph))
> >  		goto graph_mem_destroy;
> >
> > +	/* Create the graph schedule work queue */
> > +	if (graph_sched_wq_create(graph, parent_graph))
> > +		goto graph_mem_destroy;
> > +
> >  	/* All good, Lets add the graph to the list */
> >  	graph_id++;
> >  	STAILQ_INSERT_TAIL(&graph_list, graph, next); diff --git
> > a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h index
> > faf3f31ddc..e98697d880 100644
> > --- a/lib/graph/rte_graph_worker.h
> > +++ b/lib/graph/rte_graph_worker.h
> > @@ -177,6 +177,7 @@ static inline void  rte_graph_walk(struct
> > rte_graph *graph)  {
> >  	const rte_graph_off_t *cir_start = graph->cir_start;
> > +	const unsigned int lcore_id = graph->lcore_id;
> >  	const rte_node_t mask = graph->cir_mask;
> >  	uint32_t head = graph->head;
> >  	struct rte_node *node;
> > @@ -184,6 +185,9 @@ rte_graph_walk(struct rte_graph *graph)
> >  	uint16_t rc;
> >  	void **objs;
> >
> > +	if (graph->wq != NULL)
> > +		__rte_graph_sched_wq_process(graph);
> > +
> 
> 
> We should introduce a flags field in rte_graph_param which can be used by
> the application to define whether a graph should support multi-core
> dispatch.
> 
Yes, I will add a flags field in the next version.

> Then we can make `__rte_graph_sched_wq_process` node 0 during graph
> creation so that it is always called at the start of graph processing,
> followed by the rest of the nodes.
> This will remove unnecessary branches in the fast path.
> 

Thanks for your comments and sorry for my late reply.
Yes, we can make `__rte_graph_sched_wq_process` node 0 with dispatch flags.
But I am not sure whether we need to register a new node here. It means we
would have to change the graph topology, and it would be an isolated node
with no links.


> >  	/*
> >  	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and
> > then
> >  	 * on the pending streams (cir_start -> (cir_start + mask) ->
> > cir_start) @@ -205,6 +209,12 @@ rte_graph_walk(struct rte_graph
> *graph)
> >  		objs = node->objs;
> >  		rte_prefetch0(objs);
> >
> > +		/* Schedule the node until all task/objs are done */
> > +		if (node->lcore_id != RTE_MAX_LCORE && (int32_t)head > 0
> > &&
> > +		    lcore_id != node->lcore_id && graph->rq != NULL &&
> > +		    __rte_graph_sched_node_enqueue(node, graph->rq))
> > +			goto next;
> > +
> >  		if (rte_graph_has_stats_feature()) {
> >  			start = rte_rdtsc();
> >  			rc = node->process(graph, node, objs, node->idx); @@ -215,6
> +225,7
> > @@ rte_graph_walk(struct rte_graph *graph)
> >  			node->process(graph, node, objs, node->idx);
> >  		}
> >  		node->idx = 0;
> > +	next:
> >  		head = likely((int32_t)head > 0) ? head & mask : head;
> >  	}
> >  	graph->tail = 0;
> > --
> > 2.25.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC, v1 0/6] graph enhancement for multi-core dispatch
  2022-09-08  2:09 [RFC, v1 0/6] graph enhancement for multi-core dispatch Zhirun Yan
                   ` (5 preceding siblings ...)
  2022-09-08  2:09 ` [RFC, v1 6/6] examples: add l2fwd-graph Zhirun Yan
@ 2022-09-20  9:33 ` Jerin Jacob
  2022-09-30  6:41   ` Yan, Zhirun
  6 siblings, 1 reply; 11+ messages in thread
From: Jerin Jacob @ 2022-09-20  9:33 UTC (permalink / raw)
  To: Zhirun Yan; +Cc: dev, jerinj, kirankumark, cunming.liang, haiyue.wang

On Thu, Sep 8, 2022 at 7:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> Currently, the rte_graph_walk() and rte_node_enqueue* fast path API
> functions in graph lib implementation are designed to work on single-core.
>
> This solution(RFC) proposes usage of cross-core dispatching mechanism to
> enhance the graph scaling strategy. We introduce Scheduler Workqueue
> then we could directly dispatch streams to another worker core which is
> affinity with a specific node.
>
> This RFC:
>   1. Introduce core affinity API and graph clone API.
>   2. Introduce key functions to enqueue/dequeue for dispatching streams.
>   3. Enhance rte_graph_walk by cross-core dispatch.
>   4. Add l2fwd-graph example and stats for cross-core dispatching.
>
> With this patch set, it could easily plan and orchestrate stream on
> multi-core systems.
>
> Future work:
>   1. Support to affinity lcore set for one node.
>   2. Use l3fwd-graph instead of l2fwd-graph as example in patch 06.
>   3. Add new parameter, like --node(nodeid, lcoreid) to config node for core
>   affinity.
>
> Comments and suggestions are welcome. Thanks!

Some top level comments.

1) Yes, it makes sense not to create l2fwd-graph. Please enhance
l3fwd-graph and compare the performance in multi-core scenarios.

2) It is good to have multiple graph walk schemes like the one you
have introduced now.
Though I am not sure about the performance aspects, specifically when
it is used with multiple producers and multiple consumers per node.

If you have a use case for the new worker scheme, then we can add it.
I think it would call for:

a) We need to have a separate rte_graph_worker.h for each implementation
so the schemes do not impact each other's performance.
That may boil down to:
i) Create lib/graph/rte_graph_worker_common.h
ii) Treat the existing rte_graph_worker.h as the default scheme and
include rte_graph_worker_common.h
iii) Add a new rte_graph_worker_xxx.h for the new scheme (the diff from
the default worker), leveraging rte_graph_worker_common.h

The application can select the worker by

#define RTE_GRAPH_WORKER_MODEL_XXX
//#define RTE_GRAPH_WORKER_MODEL_YYY
#include <rte_graph_worker.h>
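
A minimal sketch of i)-iii) combined with that selection (only
rte_graph_worker_common.h is named above; the *_mcore_dispatch and
*_default file/macro names below are placeholders, not existing files):

	/* rte_graph_worker.h -- sketch of the proposed split. */
	#include "rte_graph_worker_common.h"

	#ifdef RTE_GRAPH_WORKER_MODEL_MCORE_DISPATCH
	#include "rte_graph_worker_mcore_dispatch.h" /* new dispatch walk */
	#else
	#include "rte_graph_worker_default.h"        /* existing walk */
	#endif

This keeps each model's fast-path walk in its own header, so the
default scheme pays no cost for the new one.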

b) Introduce a new enum rte_graph_model or so to express this new
model and other models in the future

c) Each core has its own node instance, so we don't need explicit
critical section management when dealing with node instances.
In this new scheme, can we leverage the existing node implementation?
If not, we need to have separate node
implementations for different graph models, which will be a maintenance
issue. If we really need to take this path,
then probably, as part of each node's capabilities, the node needs to
declare the models it supports (use enum rte_graph_model).
This can be used for sanity checking, e.g. to verify compatibility when
we clone or create a graph.
I think this is the biggest issue with adding a new model: nodes
need to be written based on the model. I think this could
be the reason VPP has not added other models.

d) For all new slow-path APIs like rte_node_set_lcore_affinity and
rte_graph_clone, we need to fix the namespace as
rte_graph_model_<model_name>_<operation> or so to make sure
application writers understand these APIs
are only for this model. (Also we can use "enum rte_graph_model" for
sanity checks etc.)



>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [RFC, v1 0/6] graph enhancement for multi-core dispatch
  2022-09-20  9:33 ` [RFC, v1 0/6] graph enhancement for multi-core dispatch Jerin Jacob
@ 2022-09-30  6:41   ` Yan, Zhirun
  0 siblings, 0 replies; 11+ messages in thread
From: Yan, Zhirun @ 2022-09-30  6:41 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev, jerinj, kirankumark, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Tuesday, September 20, 2022 5:33 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com; Liang,
> Cunming <cunming.liang@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: Re: [RFC, v1 0/6] graph enhancement for multi-core dispatch
> 
> On Thu, Sep 8, 2022 at 7:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
> >
> > Currently, the rte_graph_walk() and rte_node_enqueue* fast path API
> > functions in graph lib implementation are designed to work on single-core.
> >
> > This solution(RFC) proposes usage of cross-core dispatching mechanism
> > to enhance the graph scaling strategy. We introduce Scheduler
> > Workqueue then we could directly dispatch streams to another worker
> > core which is affinity with a specific node.
> >
> > This RFC:
> >   1. Introduce core affinity API and graph clone API.
> >   2. Introduce key functions to enqueue/dequeue for dispatching streams.
> >   3. Enhance rte_graph_walk by cross-core dispatch.
> >   4. Add l2fwd-graph example and stats for cross-core dispatching.
> >
> > With this patch set, it could easily plan and orchestrate stream on
> > multi-core systems.
> >
> > Future work:
> >   1. Support to affinity lcore set for one node.
> >   2. Use l3fwd-graph instead of l2fwd-graph as example in patch 06.
> >   3. Add new parameter, like --node(nodeid, lcoreid) to config node for
> core
> >   affinity.
> >
> > Comments and suggestions are welcome. Thanks!
> 
> Some top level comments.
> 
> 1) Yes, it makes sense not to create l2fwd-graph. Please enhance
> l3fwd-graph and compare the performance in multi-core scenarios.
> 
Thanks for your comments.
Yes, I will use l3fwd-graph and compare the performance in the next version.


> 2) It is good to have multiple graph walk schemes like the one you have
> introduced now.
> Though I am not sure about the performance aspects, specifically when it
> is used with multiple producers and multiple consumers per node.
> 
> If you have a use case for the new worker scheme, then we can add it.
> I think it would call for:
> 
> a) We need to have a separate rte_graph_worker.h for each implementation
> so the schemes do not impact each other's performance.
> That may boil down to:
> i) Create lib/graph/rte_graph_worker_common.h
> ii) Treat the existing rte_graph_worker.h as the default scheme and
> include rte_graph_worker_common.h
> iii) Add a new rte_graph_worker_xxx.h for the new scheme (the diff from
> the default worker), leveraging rte_graph_worker_common.h
> 
> The application can select the worker by
> 
> #define RTE_GRAPH_WORKER_MODEL_XXX
> //#define RTE_GRAPH_WORKER_MODEL_YYY
> #include <rte_graph_worker.h>
> 
Yes, I will break it down in the next version.

> b) Introduce a new enum rte_graph_model or so to express this new model
> and other models in the future
> 
> c) Each core has its own node instance, so we don't need explicit critical
> section management when dealing with node instances.
> In this new scheme, can we leverage the existing node implementation?
> If not, we need to have separate node
> implementations for different graph models, which will be a maintenance
> issue. If we really need to take this path, then probably, as part of
> each node's capabilities, the node needs to declare the models it
> supports (use enum rte_graph_model).
> This can be used for sanity checking, e.g. to verify compatibility when
> we clone or create a graph.
> I think this is the biggest issue with adding a new model: nodes need
> to be written based on the model. I think this could be the reason VPP
> has not added other models.
> 
I agree with you. We should leverage the existing node implementation.
Also, I think the node should be agnostic of the graph model.

There are different kinds of data (tables, state, etc.) being referenced in the node.
Some of them are private data of the node, which have no constraints under any graph model.
For other data shared within the graph, there is a thread-safety prerequisite on data access.
Instead of declaring the graph model, it actually makes more sense for the node to declare the visibility of its data.
The model can still be transparent to the node. When running on the 'single-core' model, this can simply fall back to zero-cost data access.
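
For example, a node registration could carry such a declaration. The
sketch below is purely illustrative: the .data_visibility field and the
RTE_NODE_DATA_SHARED name do not exist and only show the idea:

	#include <rte_common.h>
	#include <rte_graph.h>

	static uint16_t
	my_node_process(struct rte_graph *graph, struct rte_node *node,
			void **objs, uint16_t nb_objs)
	{
		RTE_SET_USED(graph);
		RTE_SET_USED(node);
		RTE_SET_USED(objs);
		return nb_objs;
	}

	static struct rte_node_register my_node = {
		.name = "my_node",
		.process = my_node_process,
		/* .data_visibility = RTE_NODE_DATA_SHARED, <- hypothetical
		 * field: the node declares it touches graph-shared data,
		 * so only a multi-core model needs thread-safe access. */
	};
	RTE_NODE_REGISTER(my_node);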


> d) For all new slow-path APIs like rte_node_set_lcore_affinity and
> rte_graph_clone, we need to fix the namespace as
> rte_graph_model_<model_name>_<operation> or so to make sure
> application writers understand these APIs are only for this model.
> (Also we can use "enum rte_graph_model" for sanity checks etc.)
> 
Yes, if the operation is bound to one model, we need to add the namespace.

But rte_node_set_lcore_affinity() could be treated as a common API for
all models.

For the current single-core model, one graph is bound to one lcore, and
it actually makes an implicit call to set all nodes to the current
lcore. So I think rte_node_set_lcore_affinity() could be a common API.
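
As a usage sketch (the exact signature of the RFC's
rte_node_set_lcore_affinity() is assumed here to take a node name and
an lcore id):

	/* Sketch: pin the forwarding node to lcore 2 before cloning and
	 * launching the per-core graphs; signature assumed from the RFC. */
	if (rte_node_set_lcore_affinity("l2fwd_pkt_fwd", 2) < 0)
		rte_exit(EXIT_FAILURE, "Cannot set node lcore affinity\n");

Under the single-core model this call would be a no-op beyond the
implicit "all nodes on the graph's lcore" behavior described above.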
> 
> 
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-09-30  6:42 UTC | newest]

Thread overview: 11+ messages
2022-09-08  2:09 [RFC, v1 0/6] graph enhancement for multi-core dispatch Zhirun Yan
2022-09-08  2:09 ` [RFC, v1 1/6] graph: introduce core affinity API into graph Zhirun Yan
2022-09-08  2:09 ` [RFC, v1 2/6] graph: introduce graph clone API for other worker core Zhirun Yan
2022-09-08  2:09 ` [RFC, v1 3/6] graph: enable stream moving cross cores Zhirun Yan
2022-09-08  2:09 ` [RFC, v1 4/6] graph: enhance graph walk by cross-core dispatch Zhirun Yan
2022-09-08  5:27   ` [EXT] " Pavan Nikhilesh Bhagavatula
2022-09-15  1:52     ` Yan, Zhirun
2022-09-08  2:09 ` [RFC, v1 5/6] graph: add stats for cross-core dispatching Zhirun Yan
2022-09-08  2:09 ` [RFC, v1 6/6] examples: add l2fwd-graph Zhirun Yan
2022-09-20  9:33 ` [RFC, v1 0/6] graph enhancement for multi-core dispatch Jerin Jacob
2022-09-30  6:41   ` Yan, Zhirun
