DPDK patches and discussions
 help / color / mirror / Atom feed
* [PATCH] graph: optimize graph search when scheduling nodes
@ 2024-11-07  8:04 Huichao cai
  2024-11-07  9:37 ` [EXTERNAL] " Jerin Jacob
  2024-11-11  4:03 ` [PATCH v2] graph: mcore: optimize graph search Huichao Cai
  0 siblings, 2 replies; 32+ messages in thread
From: Huichao cai @ 2024-11-07  8:04 UTC (permalink / raw)
  To: jerinj, kirankumark, ndabilpuram, yanzhirun_163; +Cc: dev

In the function __rte_graph_ccore_ispatch_stched_node_dequeue,
use a slower loop to search for the graph, modify the search logic
to record the result of the first search, and use this record for
subsequent searches to improve search speed.

Signed-off-by: Huichao cai <chcchc88@163.com>
---
 lib/graph/rte_graph_model_mcore_dispatch.c | 11 +++++++----
 lib/graph/rte_graph_worker_common.h        |  1 +
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/lib/graph/rte_graph_model_mcore_dispatch.c b/lib/graph/rte_graph_model_mcore_dispatch.c
index a590fc9..a81d338 100644
--- a/lib/graph/rte_graph_model_mcore_dispatch.c
+++ b/lib/graph/rte_graph_model_mcore_dispatch.c
@@ -118,11 +118,14 @@
 					      struct rte_graph_rq_head *rq)
 {
 	const unsigned int lcore_id = node->dispatch.lcore_id;
-	struct rte_graph *graph;
+	struct rte_graph *graph = node->dispatch.graph;
 
-	SLIST_FOREACH(graph, rq, next)
-		if (graph->dispatch.lcore_id == lcore_id)
-			break;
+	if (unlikely((!graph) || (graph->dispatch.lcore_id != lcore_id))) {
+		SLIST_FOREACH(graph, rq, next)
+			if (graph->dispatch.lcore_id == lcore_id)
+				break;
+		node->dispatch.graph = graph;
+	}
 
 	return graph != NULL ? __graph_sched_node_enqueue(node, graph) : false;
 }
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index a518af2..4c2432b 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -110,6 +110,7 @@ struct __rte_cache_aligned rte_node {
 			unsigned int lcore_id;  /**< Node running lcore. */
 			uint64_t total_sched_objs; /**< Number of objects scheduled. */
 			uint64_t total_sched_fail; /**< Number of scheduled failure. */
+			struct rte_graph *graph;  /**< Graph corresponding to lcore_id. */
 		} dispatch;
 	};
 	rte_graph_off_t xstat_off; /**< Offset to xstat counters. */
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [EXTERNAL] [PATCH] graph: optimize graph search when scheduling nodes
  2024-11-07  8:04 [PATCH] graph: optimize graph search when scheduling nodes Huichao cai
@ 2024-11-07  9:37 ` Jerin Jacob
  2024-11-08  1:39   ` Huichao Cai
  2024-11-11  4:03 ` [PATCH v2] graph: mcore: optimize graph search Huichao Cai
  1 sibling, 1 reply; 32+ messages in thread
From: Jerin Jacob @ 2024-11-07  9:37 UTC (permalink / raw)
  To: Huichao cai, Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram,
	yanzhirun_163
  Cc: dev


> -----Original Message-----
> From: Huichao cai <chcchc88@163.com>
> Sent: Thursday, November 7, 2024 1:35 PM
> To: Jerin Jacob <jerinj@marvell.com>; Kiran Kumar Kokkilagadda
> <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; yanzhirun_163@163.com
> Cc: dev@dpdk.org
> Subject: [EXTERNAL] [PATCH] graph: optimize graph search when scheduling
> nodes
> 
> In the function __rte_graph_ccore_ispatch_stched_node_dequeue, use a slower
> loop to search for the graph, modify the search logic to record the result of the
> first search, and use this record for subsequent searches to improve search
> speed
> In the function __rte_graph_ccore_ispatch_stched_node_dequeue,
> use a slower loop to search for the graph, modify the search logic to record the
> result of the first search, and use this record for subsequent searches to
> improve search speed.
> 
> Signed-off-by: Huichao cai <chcchc88@163.com>
> ---
>  	return graph != NULL ? __graph_sched_node_enqueue(node, graph) :
> false;  } diff --git a/lib/graph/rte_graph_worker_common.h
> b/lib/graph/rte_graph_worker_common.h
> index a518af2..4c2432b 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -110,6 +110,7 @@ struct __rte_cache_aligned rte_node {
>  			unsigned int lcore_id;  /**< Node running lcore. */
>  			uint64_t total_sched_objs; /**< Number of objects
> scheduled. */
>  			uint64_t total_sched_fail; /**< Number of scheduled
> failure. */
> +			struct rte_graph *graph;  /**< Graph corresponding to
> lcore_id. */

Is n't breaking the ABI?

Also, please change commit as following for mcore specific changes 

graph: mcore: ...

>  		} dispatch;
>  	};
>  	rte_graph_off_t xstat_off; /**< Offset to xstat counters. */
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re:RE: [EXTERNAL] [PATCH] graph: optimize graph search when scheduling nodes
  2024-11-07  9:37 ` [EXTERNAL] " Jerin Jacob
@ 2024-11-08  1:39   ` Huichao Cai
  2024-11-08 12:22     ` Jerin Jacob
  0 siblings, 1 reply; 32+ messages in thread
From: Huichao Cai @ 2024-11-08  1:39 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram, yanzhirun_163, dev

[-- Attachment #1: Type: text/plain, Size: 117 bytes --]

> Is n't breaking the ABI?

So can't we modify the ABI, or is there any special operation required to modify the ABI?

[-- Attachment #2: Type: text/html, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Re:RE: [EXTERNAL] [PATCH] graph: optimize graph search when scheduling nodes
  2024-11-08  1:39   ` Huichao Cai
@ 2024-11-08 12:22     ` Jerin Jacob
  2024-11-08 13:38       ` David Marchand
  0 siblings, 1 reply; 32+ messages in thread
From: Jerin Jacob @ 2024-11-08 12:22 UTC (permalink / raw)
  To: Huichao Cai
  Cc: Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram, yanzhirun_163,
	dev, Thomas Monjalon, david.marchand, Robin Jarry



> -----Original Message-----
> From: Huichao Cai <chcchc88@163.com>
> Sent: Friday, November 8, 2024 7:10 AM
> To: Jerin Jacob <jerinj@marvell.com>
> Cc: Kiran Kumar Kokkilagadda <kirankumark@marvell.com>; Nithin Kumar
> Dabilpuram <ndabilpuram@marvell.com>; yanzhirun_163@163.com;
> dev@dpdk.org
> Subject: Re:RE: [EXTERNAL] [PATCH] graph: optimize graph search when
> scheduling nodes
> 
> > Is n't breaking the ABI? So can't we modify the ABI, or is there any
> > special operation required to modify the ABI? ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍
> > ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍
> > ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍
> > ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍
> 
> ZjQcmQRYFpfptBannerEnd
> 
> > Is n't breaking the ABI?
> 
> So can't we modify the ABI, or is there any special operation required to modify
> the ABI?

Only LTS release (xx.11) can change the ABI after sending deprecation notice.
Looking at the pahole output, one option will be making dispatch and new semi fastpath
Additions like  xstat_off can be min cache aligned to make room for future expansion and 
to make sure have better performance.

For xstat_off addition, there was deprecation notice to update rte_node.
If there are no objection, may be we can try following in this release to not wait
Huichao for one more year.


[main] [dpdk.org] $ git diff
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index a518af2b2a..ec9a82186d 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -104,6 +104,7 @@ struct __rte_cache_aligned rte_node {
        /** Original process function when pcap is enabled. */
        rte_node_process_t original_process;

+       alignas(RTE_CACHE_LINE_MIN_SIZE)
        union {
                /* Fast schedule area for mcore dispatch model */
                struct {
@@ -112,6 +113,7 @@ struct __rte_cache_aligned rte_node {
                        uint64_t total_sched_fail; /**< Number of scheduled failure. */
                } dispatch;
        };
+       alignas(RTE_CACHE_LINE_MIN_SIZE)
        rte_graph_off_t xstat_off; /**< Offset to xstat counters. */
        /* Fast path area  */
        __extension__ struct __rte_cache_aligned {

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Re:RE: [EXTERNAL] [PATCH] graph: optimize graph search when scheduling nodes
  2024-11-08 12:22     ` Jerin Jacob
@ 2024-11-08 13:38       ` David Marchand
  2024-11-11  5:38         ` Jerin Jacob
  0 siblings, 1 reply; 32+ messages in thread
From: David Marchand @ 2024-11-08 13:38 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Huichao Cai, Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram,
	yanzhirun_163, dev, Thomas Monjalon, Robin Jarry

Hello Jerin,

On Fri, Nov 8, 2024 at 1:22 PM Jerin Jacob <jerinj@marvell.com> wrote:
> > > Is n't breaking the ABI?
> >
> > So can't we modify the ABI, or is there any special operation required to modify
> > the ABI?
>
> Only LTS release (xx.11) can change the ABI after sending deprecation notice.
> Looking at the pahole output, one option will be making dispatch and new semi fastpath
> Additions like  xstat_off can be min cache aligned to make room for future expansion and
> to make sure have better performance.

Adding holes may be a short term solution, but in my opinion, the slow
path part should be entirely hidden and we only expose the fp part.
Reminder, those holes must be in a "known state" as we release v24.11
so that the presence of future additions can be safely detected.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2] graph: mcore: optimize graph search
  2024-11-07  8:04 [PATCH] graph: optimize graph search when scheduling nodes Huichao cai
  2024-11-07  9:37 ` [EXTERNAL] " Jerin Jacob
@ 2024-11-11  4:03 ` Huichao Cai
  2024-11-11  5:46   ` [EXTERNAL] " Jerin Jacob
  2024-11-13  7:35   ` [PATCH v3 1/2] " Huichao Cai
  1 sibling, 2 replies; 32+ messages in thread
From: Huichao Cai @ 2024-11-11  4:03 UTC (permalink / raw)
  To: jerinj, kirankumark, ndabilpuram, yanzhirun_163; +Cc: dev, Huichao cai

From: Huichao cai <chcchc88@163.com>

In the function __rte_graph_mcore_dispatch_sched_node_enqueue,
use a slower loop to search for the graph, modify the search logic
to record the result of the first search, and use this record for
subsequent searches to improve search speed.

Signed-off-by: Huichao cai <chcchc88@163.com>
---
 lib/graph/rte_graph_model_mcore_dispatch.c | 11 +++++++----
 lib/graph/rte_graph_worker_common.h        |  1 +
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/lib/graph/rte_graph_model_mcore_dispatch.c b/lib/graph/rte_graph_model_mcore_dispatch.c
index a590fc9..a81d338 100644
--- a/lib/graph/rte_graph_model_mcore_dispatch.c
+++ b/lib/graph/rte_graph_model_mcore_dispatch.c
@@ -118,11 +118,14 @@
 					      struct rte_graph_rq_head *rq)
 {
 	const unsigned int lcore_id = node->dispatch.lcore_id;
-	struct rte_graph *graph;
+	struct rte_graph *graph = node->dispatch.graph;
 
-	SLIST_FOREACH(graph, rq, next)
-		if (graph->dispatch.lcore_id == lcore_id)
-			break;
+	if (unlikely((!graph) || (graph->dispatch.lcore_id != lcore_id))) {
+		SLIST_FOREACH(graph, rq, next)
+			if (graph->dispatch.lcore_id == lcore_id)
+				break;
+		node->dispatch.graph = graph;
+	}
 
 	return graph != NULL ? __graph_sched_node_enqueue(node, graph) : false;
 }
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index a518af2..4c2432b 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -110,6 +110,7 @@ struct __rte_cache_aligned rte_node {
 			unsigned int lcore_id;  /**< Node running lcore. */
 			uint64_t total_sched_objs; /**< Number of objects scheduled. */
 			uint64_t total_sched_fail; /**< Number of scheduled failure. */
+			struct rte_graph *graph;  /**< Graph corresponding to lcore_id. */
 		} dispatch;
 	};
 	rte_graph_off_t xstat_off; /**< Offset to xstat counters. */
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Re:RE: [EXTERNAL] [PATCH] graph: optimize graph search when scheduling nodes
  2024-11-08 13:38       ` David Marchand
@ 2024-11-11  5:38         ` Jerin Jacob
  2024-11-12  8:51           ` David Marchand
  0 siblings, 1 reply; 32+ messages in thread
From: Jerin Jacob @ 2024-11-11  5:38 UTC (permalink / raw)
  To: David Marchand
  Cc: Huichao Cai, Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram,
	yanzhirun_163, dev, Thomas Monjalon, Robin Jarry



> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Friday, November 8, 2024 7:08 PM
> To: Jerin Jacob <jerinj@marvell.com>
> Cc: Huichao Cai <chcchc88@163.com>; Kiran Kumar Kokkilagadda
> <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; yanzhirun_163@163.com; dev@dpdk.org;
> Thomas Monjalon <thomas@monjalon.net>; Robin Jarry <rjarry@redhat.com>
> Subject: Re: Re:RE: [EXTERNAL] [PATCH] graph: optimize graph search when
> scheduling nodes
> 
> Hello Jerin, On Fri, Nov 8, 2024 at 1: 22 PM Jerin Jacob <jerinj@ marvell. com>
> wrote: > > > Is n't breaking the ABI? > > > > So can't we modify the ABI, or is
> there any special operation required to modify > > 
> Hello Jerin,

Hello David,

> 
> On Fri, Nov 8, 2024 at 1:22 PM Jerin Jacob <jerinj@marvell.com> wrote:
> > > > Is n't breaking the ABI?
> > >
> > > So can't we modify the ABI, or is there any special operation
> > > required to modify the ABI?
> >
> > Only LTS release (xx.11) can change the ABI after sending deprecation notice.
> > Looking at the pahole output, one option will be making dispatch and
> > new semi fastpath Additions like  xstat_off can be min cache aligned
> > to make room for future expansion and to make sure have better
> performance.
> 
> Adding holes may be a short term solution, but in my opinion, the slow path
> part should be entirely hidden and we only expose the fp part.

The new cache line alignment items are proposed are fastpath items only.

> Reminder, those holes must be in a "known state" as we release v24.11 so that
> the presence of future additions can be safely detected.
> 
> 
> --
> David Marchand


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [EXTERNAL] [PATCH v2] graph: mcore: optimize graph search
  2024-11-11  4:03 ` [PATCH v2] graph: mcore: optimize graph search Huichao Cai
@ 2024-11-11  5:46   ` Jerin Jacob
  2024-11-13  9:19     ` Huichao Cai
  2024-11-13  7:35   ` [PATCH v3 1/2] " Huichao Cai
  1 sibling, 1 reply; 32+ messages in thread
From: Jerin Jacob @ 2024-11-11  5:46 UTC (permalink / raw)
  To: Huichao Cai, Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram,
	yanzhirun_163, david.marchand, Thomas Monjalon
  Cc: dev



> -----Original Message-----
> From: Huichao Cai <chcchc88@163.com>
> Sent: Monday, November 11, 2024 9:33 AM
> To: Jerin Jacob <jerinj@marvell.com>; Kiran Kumar Kokkilagadda
> <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; yanzhirun_163@163.com
> Cc: dev@dpdk.org; Huichao cai <chcchc88@163.com>
> Subject: [EXTERNAL] [PATCH v2] graph: mcore: optimize graph search
> 
> From: Huichao cai <chcchc88@ 163. com> In the function
> __rte_graph_mcore_dispatch_sched_node_enqueue, use a slower loop to
> search for the graph, modify the search logic to record the result of the first
> search, and use this record for subsequent 
> From: Huichao cai <chcchc88@163.com>
> 
> In the function __rte_graph_mcore_dispatch_sched_node_enqueue,
> use a slower loop to search for the graph, modify the search logic to record the
> result of the first search, and use this record for subsequent searches to
> improve search speed.
> 
> Signed-off-by: Huichao cai <chcchc88@163.com>
> ---
>  lib/graph/rte_graph_model_mcore_dispatch.c | 11 +++++++----
>  lib/graph/rte_graph_worker_common.h        |  1 +
>  2 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/graph/rte_graph_model_mcore_dispatch.c
> b/lib/graph/rte_graph_model_mcore_dispatch.c
> index a590fc9..a81d338 100644
> --- a/lib/graph/rte_graph_model_mcore_dispatch.c
> +++ b/lib/graph/rte_graph_model_mcore_dispatch.c
> @@ -118,11 +118,14 @@
>  					      struct rte_graph_rq_head *rq)  {
>  	const unsigned int lcore_id = node->dispatch.lcore_id;
> -	struct rte_graph *graph;
> +	struct rte_graph *graph = node->dispatch.graph;
> 
> -	SLIST_FOREACH(graph, rq, next)
> -		if (graph->dispatch.lcore_id == lcore_id)
> -			break;
> +	if (unlikely((!graph) || (graph->dispatch.lcore_id != lcore_id))) {
> +		SLIST_FOREACH(graph, rq, next)
> +			if (graph->dispatch.lcore_id == lcore_id)
> +				break;
> +		node->dispatch.graph = graph;
> +	}
> 
>  	return graph != NULL ? __graph_sched_node_enqueue(node, graph) :
> false;  } diff --git a/lib/graph/rte_graph_worker_common.h
> b/lib/graph/rte_graph_worker_common.h
> index a518af2..4c2432b 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -110,6 +110,7 @@ struct __rte_cache_aligned rte_node {
>  			unsigned int lcore_id;  /**< Node running lcore. */
>  			uint64_t total_sched_objs; /**< Number of objects
> scheduled. */
>  			uint64_t total_sched_fail; /**< Number of scheduled
> failure. */
> +			struct rte_graph *graph;  /**< Graph corresponding to
> lcore_id. */

Need to conclude the ABI related discussion here before making change
 https://patches.dpdk.org/project/dpdk/patch/1730966682-2632-1-git-send-email-chcchc88@163.com/

>  		} dispatch;
>  	};
>  	rte_graph_off_t xstat_off; /**< Offset to xstat counters. */
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Re:RE: [EXTERNAL] [PATCH] graph: optimize graph search when scheduling nodes
  2024-11-11  5:38         ` Jerin Jacob
@ 2024-11-12  8:51           ` David Marchand
  2024-11-12  9:35             ` Jerin Jacob
  0 siblings, 1 reply; 32+ messages in thread
From: David Marchand @ 2024-11-12  8:51 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Huichao Cai, Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram,
	yanzhirun_163, dev, Thomas Monjalon, Robin Jarry

On Mon, Nov 11, 2024 at 6:39 AM Jerin Jacob <jerinj@marvell.com> wrote:
>
>
>
> > -----Original Message-----
> > From: David Marchand <david.marchand@redhat.com>
> > Sent: Friday, November 8, 2024 7:08 PM
> > To: Jerin Jacob <jerinj@marvell.com>
> > Cc: Huichao Cai <chcchc88@163.com>; Kiran Kumar Kokkilagadda
> > <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> > <ndabilpuram@marvell.com>; yanzhirun_163@163.com; dev@dpdk.org;
> > Thomas Monjalon <thomas@monjalon.net>; Robin Jarry <rjarry@redhat.com>
> > Subject: Re: Re:RE: [EXTERNAL] [PATCH] graph: optimize graph search when
> > scheduling nodes
> >
> > Hello Jerin, On Fri, Nov 8, 2024 at 1: 22 PM Jerin Jacob <jerinj@ marvell. com>
> > wrote: > > > Is n't breaking the ABI? > > > > So can't we modify the ABI, or is
> > there any special operation required to modify > >
> > Hello Jerin,
>
> Hello David,
>
> >
> > On Fri, Nov 8, 2024 at 1:22 PM Jerin Jacob <jerinj@marvell.com> wrote:
> > > > > Is n't breaking the ABI?
> > > >
> > > > So can't we modify the ABI, or is there any special operation
> > > > required to modify the ABI?
> > >
> > > Only LTS release (xx.11) can change the ABI after sending deprecation notice.
> > > Looking at the pahole output, one option will be making dispatch and
> > > new semi fastpath Additions like  xstat_off can be min cache aligned
> > > to make room for future expansion and to make sure have better
> > performance.
> >
> > Adding holes may be a short term solution, but in my opinion, the slow path
> > part should be entirely hidden and we only expose the fp part.
>
> The new cache line alignment items are proposed are fastpath items only.

I had only noticed the second comment:

+       alignas(RTE_CACHE_LINE_MIN_SIZE)
        rte_graph_off_t xstat_off; /**< Offset to xstat counters. */
        /* Fast path area  */
        ^^^^^^^^^^^^

And I assumed the part in the struct before was slow path.
(it may be worth enhancing these comments, with a single limit of
slow/fast path areas)


>
> > Reminder, those holes must be in a "known state" as we release v24.11 so that
> > the presence of future additions can be safely detected.

If the rte_node objects are allocated by the graph library and zero'd,
then we are good.
It seems to be the case in graph_nodes_populate(), and the rte_node
objects are embedded in the rte_graph object.

Is there another location in the graph library where a rte_node object
is allocated?

If not, and an application can not create a rte_node object, your
proposal looks good to me.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Re:RE: [EXTERNAL] [PATCH] graph: optimize graph search when scheduling nodes
  2024-11-12  8:51           ` David Marchand
@ 2024-11-12  9:35             ` Jerin Jacob
  2024-11-12 12:57               ` Huichao Cai
  2024-11-13  9:22               ` Huichao Cai
  0 siblings, 2 replies; 32+ messages in thread
From: Jerin Jacob @ 2024-11-12  9:35 UTC (permalink / raw)
  To: David Marchand, Huichao Cai
  Cc: Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram, yanzhirun_163,
	dev, Thomas Monjalon, Robin Jarry



> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Tuesday, November 12, 2024 2:21 PM
> To: Jerin Jacob <jerinj@marvell.com>
> Cc: Huichao Cai <chcchc88@163.com>; Kiran Kumar Kokkilagadda
> <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; yanzhirun_163@163.com; dev@dpdk.org;
> Thomas Monjalon <thomas@monjalon.net>; Robin Jarry <rjarry@redhat.com>
> Subject: Re: Re:RE: [EXTERNAL] [PATCH] graph: optimize graph search when
> scheduling nodes
> 
> On Mon, Nov 11, 2024 at 6: 39 AM Jerin Jacob <jerinj@ marvell. com> wrote: >
> > > > > -----Original Message----- > > From: David Marchand
> <david. marchand@ redhat. com> > > Sent: Friday, November 8, 2024 7: 08
> 
> On Mon, Nov 11, 2024 at 6:39 AM Jerin Jacob <jerinj@marvell.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: David Marchand <david.marchand@redhat.com>
> > > Sent: Friday, November 8, 2024 7:08 PM
> > > To: Jerin Jacob <jerinj@marvell.com>
> > > Cc: Huichao Cai <chcchc88@163.com>; Kiran Kumar Kokkilagadda
> > > <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> > > <ndabilpuram@marvell.com>; yanzhirun_163@163.com; dev@dpdk.org;
> > > Thomas Monjalon <thomas@monjalon.net>; Robin Jarry
> > > <rjarry@redhat.com>
> > > Subject: Re: Re:RE: [EXTERNAL] [PATCH] graph: optimize graph search
> > > when scheduling nodes
> > >
> > > Hello Jerin, On Fri, Nov 8, 2024 at 1: 22 PM Jerin Jacob <jerinj@ 
> > > marvell. com>
> > > wrote: > > > Is n't breaking the ABI? > > > > So can't we modify the
> > > ABI, or is there any special operation required to modify > > Hello
> > > Jerin,
> >
> > Hello David,
> >
> > >
> > > On Fri, Nov 8, 2024 at 1:22 PM Jerin Jacob <jerinj@marvell.com> wrote:
> > > > > > Is n't breaking the ABI?
> > > > >
> > > > > So can't we modify the ABI, or is there any special operation
> > > > > required to modify the ABI?
> > > >
> > > > Only LTS release (xx.11) can change the ABI after sending deprecation
> notice.
> > > > Looking at the pahole output, one option will be making dispatch
> > > > and new semi fastpath Additions like  xstat_off can be min cache
> > > > aligned to make room for future expansion and to make sure have
> > > > better
> > > performance.
> > >
> > > Adding holes may be a short term solution, but in my opinion, the
> > > slow path part should be entirely hidden and we only expose the fp part.
> >
> > The new cache line alignment items are proposed are fastpath items only.
> 
> I had only noticed the second comment:
> 
> +       alignas(RTE_CACHE_LINE_MIN_SIZE)
>         rte_graph_off_t xstat_off; /**< Offset to xstat counters. */
>         /* Fast path area  */
>         ^^^^^^^^^^^^
> 
> And I assumed the part in the struct before was slow path.
> (it may be worth enhancing these comments, with a single limit of slow/fast
> path areas)

Yes. Xstat_off was new addition as a fastpath item in this release and there was no space
in original Fastpath area. And, Yes, the comment needs to be updated.


> 
> 
> >
> > > Reminder, those holes must be in a "known state" as we release
> > > v24.11 so that the presence of future additions can be safely detected.
> 
> If the rte_node objects are allocated by the graph library and zero'd, then we
> are good.
> It seems to be the case in graph_nodes_populate(), and the rte_node objects
> are embedded in the rte_graph object.
> 
> Is there another location in the graph library where a rte_node object is
> allocated?

No

> 
> If not, and an application can not create a rte_node object, your proposal looks
> good to me.

OK. @Huichao Cai Please send two patches (a) new proposal and (b) your improvement as series.
Update ABI Changes section in  doc/guides/rel_notes/release_24_11.rst  


> 
> 
> --
> David Marchand


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re:RE: Re:RE: [EXTERNAL] [PATCH] graph: optimize graph search when scheduling nodes
  2024-11-12  9:35             ` Jerin Jacob
@ 2024-11-12 12:57               ` Huichao Cai
  2024-11-13  9:22               ` Huichao Cai
  1 sibling, 0 replies; 32+ messages in thread
From: Huichao Cai @ 2024-11-12 12:57 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: David Marchand, Kiran Kumar Kokkilagadda,
	Nithin Kumar Dabilpuram, yanzhirun_163, dev, Thomas Monjalon,
	Robin Jarry

[-- Attachment #1: Type: text/plain, Size: 205 bytes --]

>OK. @Huichao Cai Please send two patches (a) new proposal and (b) your improvement as series.
>Update ABI Changes section in  doc/guides/rel_notes/release_24_11.rst  Ok.I will send these two patches soon.

[-- Attachment #2: Type: text/html, Size: 380 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 1/2] graph: mcore: optimize graph search
  2024-11-11  4:03 ` [PATCH v2] graph: mcore: optimize graph search Huichao Cai
  2024-11-11  5:46   ` [EXTERNAL] " Jerin Jacob
@ 2024-11-13  7:35   ` Huichao Cai
  2024-11-13  7:35     ` [PATCH v3 2/2] graph: add alignment to the member of rte_node Huichao Cai
  2024-11-14  8:45     ` [PATCH v4 1/2] graph: mcore: optimize graph search Huichao Cai
  1 sibling, 2 replies; 32+ messages in thread
From: Huichao Cai @ 2024-11-13  7:35 UTC (permalink / raw)
  To: jerinj, kirankumark, ndabilpuram, yanzhirun_163; +Cc: dev

In the function __rte_graph_mcore_dispatch_sched_node_enqueue,
use a slower loop to search for the graph, modify the search logic
to record the result of the first search, and use this record for
subsequent searches to improve search speed.

Due to the addition of a "graph" field in the "rte_node" structure,
update file release_24_11.rst.

Signed-off-by: Huichao Cai <chcchc88@163.com>
---
 doc/guides/rel_notes/release_24_11.rst     |  1 +
 lib/graph/rte_graph_model_mcore_dispatch.c | 11 +++++++----
 lib/graph/rte_graph_worker_common.h        |  1 +
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index 9dc739c4cb..592116b979 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -423,6 +423,7 @@ ABI Changes
   added new structure ``rte_node_xstats`` to ``rte_node_register`` and
   added ``xstat_off`` to ``rte_node``.
 
+* graph: added ``graph`` field to the ``dispatch`` structure in the ``rte_node`` structure.
 
 Known Issues
 ------------
diff --git a/lib/graph/rte_graph_model_mcore_dispatch.c b/lib/graph/rte_graph_model_mcore_dispatch.c
index a590fc9497..a81d338227 100644
--- a/lib/graph/rte_graph_model_mcore_dispatch.c
+++ b/lib/graph/rte_graph_model_mcore_dispatch.c
@@ -118,11 +118,14 @@ __rte_graph_mcore_dispatch_sched_node_enqueue(struct rte_node *node,
 					      struct rte_graph_rq_head *rq)
 {
 	const unsigned int lcore_id = node->dispatch.lcore_id;
-	struct rte_graph *graph;
+	struct rte_graph *graph = node->dispatch.graph;
 
-	SLIST_FOREACH(graph, rq, next)
-		if (graph->dispatch.lcore_id == lcore_id)
-			break;
+	if (unlikely((!graph) || (graph->dispatch.lcore_id != lcore_id))) {
+		SLIST_FOREACH(graph, rq, next)
+			if (graph->dispatch.lcore_id == lcore_id)
+				break;
+		node->dispatch.graph = graph;
+	}
 
 	return graph != NULL ? __graph_sched_node_enqueue(node, graph) : false;
 }
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index a518af2b2a..4c2432b47f 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -110,6 +110,7 @@ struct __rte_cache_aligned rte_node {
 			unsigned int lcore_id;  /**< Node running lcore. */
 			uint64_t total_sched_objs; /**< Number of objects scheduled. */
 			uint64_t total_sched_fail; /**< Number of scheduled failure. */
+			struct rte_graph *graph;  /**< Graph corresponding to lcore_id. */
 		} dispatch;
 	};
 	rte_graph_off_t xstat_off; /**< Offset to xstat counters. */
-- 
2.27.0


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 2/2] graph: add alignment to the member of rte_node
  2024-11-13  7:35   ` [PATCH v3 1/2] " Huichao Cai
@ 2024-11-13  7:35     ` Huichao Cai
  2024-11-14  7:14       ` [EXTERNAL] " Jerin Jacob
  2024-11-14  8:45     ` [PATCH v4 1/2] graph: mcore: optimize graph search Huichao Cai
  1 sibling, 1 reply; 32+ messages in thread
From: Huichao Cai @ 2024-11-13  7:35 UTC (permalink / raw)
  To: jerinj, kirankumark, ndabilpuram, yanzhirun_163; +Cc: dev

The members "dispatch" and "xstat_off" of the structure "rte_node"
can be min cache aligned to make room for future expansion and to
make sure have better performance.

Due to the modification of the alignment of some members of the
"rte_node" structure, update file release_24_11.rst.

Signed-off-by: Huichao Cai <chcchc88@163.com>
---
 doc/guides/rel_notes/release_24_11.rst | 3 +++
 lib/graph/rte_graph_worker_common.h    | 5 ++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index 592116b979..6903b1d0f0 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -425,6 +425,9 @@ ABI Changes
 
 * graph: added ``graph`` field to the ``dispatch`` structure in the ``rte_node`` structure.
 
+* graph: The members ``dispatch`` and ``xstat_off`` of the structure ``rte_node`` have been
+  marked as RTE_CACHE_LINE_MIN_SIZE bytes aligned.
+
 Known Issues
 ------------
 
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 4c2432b47f..9e99278a0a 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -104,6 +104,7 @@ struct __rte_cache_aligned rte_node {
 	/** Original process function when pcap is enabled. */
 	rte_node_process_t original_process;
 
+	alignas(RTE_CACHE_LINE_MIN_SIZE)
 	union {
 		/* Fast schedule area for mcore dispatch model */
 		struct {
@@ -113,8 +114,10 @@ struct __rte_cache_aligned rte_node {
 			struct rte_graph *graph;  /**< Graph corresponding to lcore_id. */
 		} dispatch;
 	};
-	rte_graph_off_t xstat_off; /**< Offset to xstat counters. */
+
 	/* Fast path area  */
+	alignas(RTE_CACHE_LINE_MIN_SIZE)
+	rte_graph_off_t xstat_off; /**< Offset to xstat counters. */
 	__extension__ struct __rte_cache_aligned {
 #define RTE_NODE_CTX_SZ 16
 		union {
-- 
2.27.0


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re:RE: [EXTERNAL] [PATCH v2] graph: mcore: optimize graph search
  2024-11-11  5:46   ` [EXTERNAL] " Jerin Jacob
@ 2024-11-13  9:19     ` Huichao Cai
  0 siblings, 0 replies; 32+ messages in thread
From: Huichao Cai @ 2024-11-13  9:19 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram, yanzhirun_163,
	david.marchand, Thomas Monjalon, dev

[-- Attachment #1: Type: text/plain, Size: 5102 bytes --]

> [main] [dpdk.org] $ git diff
> diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
> index a518af2b2a..ec9a82186d 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -104,6 +104,7 @@ struct __rte_cache_aligned rte_node {
>         /** Original process function when pcap is enabled. */
>         rte_node_process_t original_process;

> +       alignas(RTE_CACHE_LINE_MIN_SIZE)
>         union {

Hi, Jerin
The C++standard cannot align anonymous unions. Do we need to fill in reserved fields in order to maintain union alignment with RTE-CAHE_LINE_LIN_SIZE bytes?

>                 /* Fast schedule area for mcore dispatch model */
>                 struct {
> @@ -112,6 +113,7 @@ struct __rte_cache_aligned rte_node {
>                         uint64_t total_sched_fail; /**< Number of scheduled failure. */
>                 } dispatch;
>         };
> +       alignas(RTE_CACHE_LINE_MIN_SIZE)
>         rte_graph_off_t xstat_off; /**< Offset to xstat counters. */
>         /* Fast path area  */
>         __extension__ struct __rte_cache_aligned {


FAILED: buildtools/chkincs/chkincs-cpp.p/meson-generated_rte_graph_worker.cpp.o 
ccache c++ -Ibuildtools/chkincs/chkincs-cpp.p -Ibuildtools/chkincs -I../buildtools/chkincs -Iexamples/l3fwd -I../examples/l3fwd -I../examples/common -Idrivers/bus/vdev -I../drivers/bus/vdev -I. -I.. -Iconfig -I../config -Ilib/eal/include -I../lib/eal/include -Ilib/eal/linux/include -I../lib/eal/linux/include -Ilib/eal/x86/include -I../lib/eal/x86/include -I../kernel/linux -Ilib/eal/common -I../lib/eal/common -Ilib/eal -I../lib/eal -Ilib/kvargs -I../lib/kvargs -Ilib/log -I../lib/log -Ilib/metrics -I../lib/metrics -Ilib/telemetry -I../lib/telemetry -Idrivers/bus/pci -I../drivers/bus/pci -I../drivers/bus/pci/linux -Ilib/pci -I../lib/pci -Idrivers/bus/vmbus -I../drivers/bus/vmbus -I../drivers/bus/vmbus/linux -Ilib/argparse -I../lib/argparse -Ilib/ptr_compress -I../lib/ptr_compress -Ilib/ring -I../lib/ring -Ilib/rcu -I../lib/rcu -Ilib/mempool -I../lib/mempool -Ilib/mbuf -I../lib/mbuf -Ilib/net -I../lib/net -Ilib/meter -I../lib/meter -Ilib/ethdev -I../lib/ethdev -Ilib/cmdline -I../lib/cmdline -Ilib/hash -I../lib/hash -Ilib/timer -I../lib/timer -Ilib/acl -I../lib/acl -Ilib/bbdev -I../lib/bbdev -Ilib/bitratestats -I../lib/bitratestats -Ilib/bpf -I../lib/bpf -Ilib/cfgfile -I../lib/cfgfile -Ilib/compressdev -I../lib/compressdev -Ilib/cryptodev -I../lib/cryptodev -Ilib/distributor -I../lib/distributor -Ilib/dmadev -I../lib/dmadev -Ilib/efd -I../lib/efd -Ilib/eventdev -I../lib/eventdev -Ilib/dispatcher -I../lib/dispatcher -Ilib/gpudev -I../lib/gpudev -Ilib/gro -I../lib/gro -Ilib/gso -I../lib/gso -Ilib/ip_frag -I../lib/ip_frag -Ilib/jobstats -I../lib/jobstats -Ilib/latencystats -I../lib/latencystats -Ilib/lpm -I../lib/lpm -Ilib/member -I../lib/member -Ilib/pcapng -I../lib/pcapng -Ilib/power -I../lib/power -Ilib/rawdev -I../lib/rawdev -Ilib/regexdev -I../lib/regexdev -Ilib/mldev -I../lib/mldev -Ilib/rib -I../lib/rib -Ilib/reorder -I../lib/reorder -Ilib/sched -I../lib/sched -Ilib/security -I../lib/security -Ilib/stack -I../lib/stack -Ilib/vhost -I../lib/vhost -Ilib/ipsec -I../lib/ipsec -Ilib/pdcp -I../lib/pdcp -Ilib/fib -I../lib/fib -Ilib/port -I../lib/port -Ilib/pdump -I../lib/pdump -Ilib/table -I../lib/table -Ilib/pipeline -I../lib/pipeline -Ilib/graph -I../lib/graph -Ilib/node -I../lib/node -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wnon-virtual-dtor -Wextra -Werror -g -include rte_config.h -march=corei7 -mrtm -MD -MQ buildtools/chkincs/chkincs-cpp.p/meson-generated_rte_graph_worker.cpp.o -MF buildtools/chkincs/chkincs-cpp.p/meson-generated_rte_graph_worker.cpp.o.d -o buildtools/chkincs/chkincs-cpp.p/meson-generated_rte_graph_worker.cpp.o -c buildtools/chkincs/chkincs-cpp.p/rte_graph_worker.cpp
In file included from /home/runner/work/dpdk/dpdk/lib/graph/rte_graph_model_rtc.h:6,
                 from /home/runner/work/dpdk/dpdk/lib/graph/rte_graph_worker.h:9,
                 from buildtools/chkincs/chkincs-cpp.p/rte_graph_worker.cpp:1:
/home/runner/work/dpdk/dpdk/lib/graph/rte_graph_worker_common.h:108:15: error: attribute ignored in declaration of ‘union rte_node::<unnamed>’ [-Werror=attributes]
  108 |         union {
      |               ^
/home/runner/work/dpdk/dpdk/lib/graph/rte_graph_worker_common.h:108:15: note: attribute for ‘union rte_node::<unnamed>’ must follow the ‘union’ keyword
cc1plus: all warnings being treated as errors
[5410/6569] Compiling C++ object buildtools/chkincs/chkincs-cpp.p/meson-generated_rte_table_lpm.cpp.o
[5411/6569] Compiling C++ object buildtools/chkincs/chkincs-cpp.p/meson-generated_rte_port_in_action.cpp.o
[5412/6569] Compiling C++ object buildtools/chkincs/chkincs-cpp.p/meson-generated_rte_pipeline.cpp.o
[5413/6569] Compiling C++ object buildtools/chkincs/chkincs-cpp.p/meson-generated_rte_table_action.cpp.o
[5414/6569] Compiling C++ object buildtools/chkincs/chkincs-cpp.p/meson-generated_rte_swx_ipsec.cpp.o
ninja: build stopped: subcommand failed.

[-- Attachment #2: Type: text/html, Size: 7692 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re:RE: Re:RE: [EXTERNAL] [PATCH] graph: optimize graph search when scheduling nodes
  2024-11-12  9:35             ` Jerin Jacob
  2024-11-12 12:57               ` Huichao Cai
@ 2024-11-13  9:22               ` Huichao Cai
  2024-11-14  7:09                 ` Jerin Jacob
  1 sibling, 1 reply; 32+ messages in thread
From: Huichao Cai @ 2024-11-13  9:22 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: David Marchand, Kiran Kumar Kokkilagadda,
	Nithin Kumar Dabilpuram, yanzhirun_163, dev, Thomas Monjalon,
	Robin Jarry

[-- Attachment #1: Type: text/plain, Size: 5084 bytes --]

> [main] [dpdk.org] $ git diff> diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h> index a518af2b2a..ec9a82186d 100644> --- a/lib/graph/rte_graph_worker_common.h> +++ b/lib/graph/rte_graph_worker_common.h> @@ -104,6 +104,7 @@ struct __rte_cache_aligned rte_node {>         /** Original process function when pcap is enabled. */>         rte_node_process_t original_process;
> +       alignas(RTE_CACHE_LINE_MIN_SIZE)>         union {

Hi, Jerin
The C++standard cannot align anonymous unions. Do we need to fill in reserved fields in order to maintain union alignment with RTE-CAHE_LINE_LIN_SIZE bytes?

>                 /* Fast schedule area for mcore dispatch model */>                 struct {> @@ -112,6 +113,7 @@ struct __rte_cache_aligned rte_node {>                         uint64_t total_sched_fail; /**< Number of scheduled failure. */>                 } dispatch;>         };> +       alignas(RTE_CACHE_LINE_MIN_SIZE)>         rte_graph_off_t xstat_off; /**< Offset to xstat counters. */>         /* Fast path area  */>         __extension__ struct __rte_cache_aligned {


FAILED: buildtools/chkincs/chkincs-cpp.p/meson-generated_rte_graph_worker.cpp.o 
ccache c++ -Ibuildtools/chkincs/chkincs-cpp.p -Ibuildtools/chkincs -I../buildtools/chkincs -Iexamples/l3fwd -I../examples/l3fwd -I../examples/common -Idrivers/bus/vdev -I../drivers/bus/vdev -I. -I.. -Iconfig -I../config -Ilib/eal/include -I../lib/eal/include -Ilib/eal/linux/include -I../lib/eal/linux/include -Ilib/eal/x86/include -I../lib/eal/x86/include -I../kernel/linux -Ilib/eal/common -I../lib/eal/common -Ilib/eal -I../lib/eal -Ilib/kvargs -I../lib/kvargs -Ilib/log -I../lib/log -Ilib/metrics -I../lib/metrics -Ilib/telemetry -I../lib/telemetry -Idrivers/bus/pci -I../drivers/bus/pci -I../drivers/bus/pci/linux -Ilib/pci -I../lib/pci -Idrivers/bus/vmbus -I../drivers/bus/vmbus -I../drivers/bus/vmbus/linux -Ilib/argparse -I../lib/argparse -Ilib/ptr_compress -I../lib/ptr_compress -Ilib/ring -I../lib/ring -Ilib/rcu -I../lib/rcu -Ilib/mempool -I../lib/mempool -Ilib/mbuf -I../lib/mbuf -Ilib/net -I../lib/net -Ilib/meter -I../lib/meter -Ilib/ethdev -I../lib/ethdev -Ilib/cmdline -I../lib/cmdline -Ilib/hash -I../lib/hash -Ilib/timer -I../lib/timer -Ilib/acl -I../lib/acl -Ilib/bbdev -I../lib/bbdev -Ilib/bitratestats -I../lib/bitratestats -Ilib/bpf -I../lib/bpf -Ilib/cfgfile -I../lib/cfgfile -Ilib/compressdev -I../lib/compressdev -Ilib/cryptodev -I../lib/cryptodev -Ilib/distributor -I../lib/distributor -Ilib/dmadev -I../lib/dmadev -Ilib/efd -I../lib/efd -Ilib/eventdev -I../lib/eventdev -Ilib/dispatcher -I../lib/dispatcher -Ilib/gpudev -I../lib/gpudev -Ilib/gro -I../lib/gro -Ilib/gso -I../lib/gso -Ilib/ip_frag -I../lib/ip_frag -Ilib/jobstats -I../lib/jobstats -Ilib/latencystats -I../lib/latencystats -Ilib/lpm -I../lib/lpm -Ilib/member -I../lib/member -Ilib/pcapng -I../lib/pcapng -Ilib/power -I../lib/power -Ilib/rawdev -I../lib/rawdev -Ilib/regexdev -I../lib/regexdev -Ilib/mldev -I../lib/mldev -Ilib/rib -I../lib/rib -Ilib/reorder -I../lib/reorder -Ilib/sched -I../lib/sched -Ilib/security -I../lib/security -Ilib/stack -I../lib/stack -Ilib/vhost -I../lib/vhost -Ilib/ipsec -I../lib/ipsec -Ilib/pdcp -I../lib/pdcp -Ilib/fib -I../lib/fib -Ilib/port -I../lib/port -Ilib/pdump -I../lib/pdump -Ilib/table -I../lib/table -Ilib/pipeline -I../lib/pipeline -Ilib/graph -I../lib/graph -Ilib/node -I../lib/node -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wnon-virtual-dtor -Wextra -Werror -g -include rte_config.h -march=corei7 -mrtm -MD -MQ buildtools/chkincs/chkincs-cpp.p/meson-generated_rte_graph_worker.cpp.o -MF buildtools/chkincs/chkincs-cpp.p/meson-generated_rte_graph_worker.cpp.o.d -o buildtools/chkincs/chkincs-cpp.p/meson-generated_rte_graph_worker.cpp.o -c buildtools/chkincs/chkincs-cpp.p/rte_graph_worker.cpp
In file included from /home/runner/work/dpdk/dpdk/lib/graph/rte_graph_model_rtc.h:6,
                 from /home/runner/work/dpdk/dpdk/lib/graph/rte_graph_worker.h:9,
                 from buildtools/chkincs/chkincs-cpp.p/rte_graph_worker.cpp:1:
/home/runner/work/dpdk/dpdk/lib/graph/rte_graph_worker_common.h:108:15: error: attribute ignored in declaration of ‘union rte_node::<unnamed>’ [-Werror=attributes]
  108 |         union {
      |               ^
/home/runner/work/dpdk/dpdk/lib/graph/rte_graph_worker_common.h:108:15: note: attribute for ‘union rte_node::<unnamed>’ must follow the ‘union’ keyword
cc1plus: all warnings being treated as errors
[5410/6569] Compiling C++ object buildtools/chkincs/chkincs-cpp.p/meson-generated_rte_table_lpm.cpp.o
[5411/6569] Compiling C++ object buildtools/chkincs/chkincs-cpp.p/meson-generated_rte_port_in_action.cpp.o
[5412/6569] Compiling C++ object buildtools/chkincs/chkincs-cpp.p/meson-generated_rte_pipeline.cpp.o
[5413/6569] Compiling C++ object buildtools/chkincs/chkincs-cpp.p/meson-generated_rte_table_action.cpp.o
[5414/6569] Compiling C++ object buildtools/chkincs/chkincs-cpp.p/meson-generated_rte_swx_ipsec.cpp.o
ninja: build stopped: subcommand failed.

[-- Attachment #2: Type: text/html, Size: 8148 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Re:RE: Re:RE: [EXTERNAL] [PATCH] graph: optimize graph search when scheduling nodes
  2024-11-13  9:22               ` Huichao Cai
@ 2024-11-14  7:09                 ` Jerin Jacob
  0 siblings, 0 replies; 32+ messages in thread
From: Jerin Jacob @ 2024-11-14  7:09 UTC (permalink / raw)
  To: Huichao Cai
  Cc: David Marchand, Kiran Kumar Kokkilagadda,
	Nithin Kumar Dabilpuram, yanzhirun_163, dev, Thomas Monjalon,
	Robin Jarry



> -----Original Message-----
> From: Huichao Cai <chcchc88@163.com>
> Sent: Wednesday, November 13, 2024 2:52 PM
> To: Jerin Jacob <jerinj@marvell.com>
> Cc: David Marchand <david.marchand@redhat.com>; Kiran Kumar
> Kokkilagadda <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; yanzhirun_163@163.com; dev@dpdk.org;
> Thomas Monjalon <thomas@monjalon.net>; Robin Jarry <rjarry@redhat.com>
> Subject: Re:RE: Re:RE: [EXTERNAL] [PATCH] graph: optimize graph search when
> scheduling nodes
> 
> > [main] [dpdk. org] $ git diff > diff --git
> > a/lib/graph/rte_graph_worker_common. h
> > b/lib/graph/rte_graph_worker_common. h > index a518af2b2a. . 
> > ec9a82186d 100644 > --- a/lib/graph/rte_graph_worker_common. h > +++
> > b/lib/graph/rte_graph_worker_common. h
> 
> > [main] [dpdk.org] $ git diff
> > diff --git a/lib/graph/rte_graph_worker_common.h
> > b/lib/graph/rte_graph_worker_common.h
> > index a518af2b2a..ec9a82186d 100644
> > --- a/lib/graph/rte_graph_worker_common.h
> > +++ b/lib/graph/rte_graph_worker_common.h
> > @@ -104,6 +104,7 @@ struct __rte_cache_aligned rte_node {
> >         /** Original process function when pcap is enabled. */
> >         rte_node_process_t original_process;
> 
> > +       alignas(RTE_CACHE_LINE_MIN_SIZE)
> >         union {
> 
> Hi, Jerin
> The C++standard cannot align anonymous unions. Do we need to fill in reserved
> fields in order to maintain union alignment with RTE-CAHE_LINE_LIN_SIZE
> bytes?


You can bring it inside the structure.

> 
> >                 /* Fast schedule area for mcore dispatch model */
> >                 struct {
> > @@ -112,6 +113,7 @@ struct __rte_cache_aligned rte_node {
> >                         uint64_t total_sched_fail; /**< Number of scheduled failure. */
> >                 } dispatch;
> >         };
> > +       alignas(RTE_CACHE_LINE_MIN_SIZE)
> >         rte_graph_off_t xstat_off; /**< Offset to xstat counters. */
> >         /* Fast path area  */
> >         __extension__ struct __rte_cache_aligned {
> 
> 
> FAILED: buildtools/chkincs/chkincs-cpp.p/meson-
> generated_rte_graph_worker.cpp.o
> ccache c++ -Ibuildtools/chkincs/chkincs-cpp.p -Ibuildtools/chkincs -
> I../buildtools/chkincs -Iexamples/l3fwd -I../examples/l3fwd -
> I../examples/common -Idrivers/bus/vdev -I../drivers/bus/vdev -I. -I.. -Iconfig -
> I../config -Ilib/eal/include -I../lib/eal/include -Ilib/eal/linux/include -
> I../lib/eal/linux/include -Ilib/eal/x86/include -I../lib/eal/x86/include -
> I../kernel/linux -Ilib/eal/common -I../lib/eal/common -Ilib/eal -I../lib/eal -
> Ilib/kvargs -I../lib/kvargs -Ilib/log -I../lib/log -Ilib/metrics -I../lib/metrics -
> Ilib/telemetry -I../lib/telemetry -Idrivers/bus/pci -I../drivers/bus/pci -
> I../drivers/bus/pci/linux -Ilib/pci -I../lib/pci -Idrivers/bus/vmbus -
> I../drivers/bus/vmbus -I../drivers/bus/vmbus/linux -Ilib/argparse -
> I../lib/argparse -Ilib/ptr_compress -I../lib/ptr_compress -Ilib/ring -I../lib/ring -
> Ilib/rcu -I../lib/rcu -Ilib/mempool -I../lib/mempool -Ilib/mbuf -I../lib/mbuf -
> Ilib/net -I../lib/net -Ilib/meter -I../lib/meter -Ilib/ethdev -I../lib/ethdev -
> Ilib/cmdline -I../lib/cmdline -Ilib/hash -I../lib/hash -Ilib/timer -I../lib/timer -
> Ilib/acl -I../lib/acl -Ilib/bbdev -I../lib/bbdev -Ilib/bitratestats -I../lib/bitratestats -
> Ilib/bpf -I../lib/bpf -Ilib/cfgfile -I../lib/cfgfile -Ilib/compressdev -
> I../lib/compressdev -Ilib/cryptodev -I../lib/cryptodev -Ilib/distributor -
> I../lib/distributor -Ilib/dmadev -I../lib/dmadev -Ilib/efd -I../lib/efd -Ilib/eventdev
> -I../lib/eventdev -Ilib/dispatcher -I../lib/dispatcher -Ilib/gpudev -I../lib/gpudev -
> Ilib/gro -I../lib/gro -Ilib/gso -I../lib/gso -Ilib/ip_frag -I../lib/ip_frag -Ilib/jobstats -
> I../lib/jobstats -Ilib/latencystats -I../lib/latencystats -Ilib/lpm -I../lib/lpm -
> Ilib/member -I../lib/member -Ilib/pcapng -I../lib/pcapng -Ilib/power -
> I../lib/power -Ilib/rawdev -I../lib/rawdev -Ilib/regexdev -I../lib/regexdev -
> Ilib/mldev -I../lib/mldev -Ilib/rib -I../lib/rib -Ilib/reorder -I../lib/reorder -
> Ilib/sched -I../lib/sched -Ilib/security -I../lib/security -Ilib/stack -I../lib/stack -
> Ilib/vhost -I../lib/vhost -Ilib/ipsec -I../lib/ipsec -Ilib/pdcp -I../lib/pdcp -Ilib/fib -
> I../lib/fib -Ilib/port -I../lib/port -Ilib/pdump -I../lib/pdump -Ilib/table -I../lib/table
> -Ilib/pipeline -I../lib/pipeline -Ilib/graph -I../lib/graph -Ilib/node -I../lib/node -
> fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -
> Wnon-virtual-dtor -Wextra -Werror -g -include rte_config.h -march=corei7 -
> mrtm -MD -MQ buildtools/chkincs/chkincs-cpp.p/meson-
> generated_rte_graph_worker.cpp.o -MF buildtools/chkincs/chkincs-
> cpp.p/meson-generated_rte_graph_worker.cpp.o.d -o
> buildtools/chkincs/chkincs-cpp.p/meson-generated_rte_graph_worker.cpp.o -c
> buildtools/chkincs/chkincs-cpp.p/rte_graph_worker.cpp
> In file included from
> /home/runner/work/dpdk/dpdk/lib/graph/rte_graph_model_rtc.h:6,
>                  from
> /home/runner/work/dpdk/dpdk/lib/graph/rte_graph_worker.h:9,
>                  from buildtools/chkincs/chkincs-cpp.p/rte_graph_worker.cpp:1:
> /home/runner/work/dpdk/dpdk/lib/graph/rte_graph_worker_common.h:108:1
> 5: error: attribute ignored in declaration of ‘union rte_node::<unnamed>’ [-
> Werror=attributes]
>   108 |         union {
>       |               ^
> /home/runner/work/dpdk/dpdk/lib/graph/rte_graph_worker_common.h:108:1
> 5: note: attribute for ‘union rte_node::<unnamed>’ must follow the ‘union’
> keyword
> cc1plus: all warnings being treated as errors [5410/6569] Compiling C++ object
> buildtools/chkincs/chkincs-cpp.p/meson-generated_rte_table_lpm.cpp.o
> [5411/6569] Compiling C++ object buildtools/chkincs/chkincs-cpp.p/meson-
> generated_rte_port_in_action.cpp.o
> [5412/6569] Compiling C++ object buildtools/chkincs/chkincs-cpp.p/meson-
> generated_rte_pipeline.cpp.o
> [5413/6569] Compiling C++ object buildtools/chkincs/chkincs-cpp.p/meson-
> generated_rte_table_action.cpp.o
> [5414/6569] Compiling C++ object buildtools/chkincs/chkincs-cpp.p/meson-
> generated_rte_swx_ipsec.cpp.o
> ninja: build stopped: subcommand failed.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [EXTERNAL] [PATCH v3 2/2] graph: add alignment to the member of rte_node
  2024-11-13  7:35     ` [PATCH v3 2/2] graph: add alignment to the member of rte_node Huichao Cai
@ 2024-11-14  7:14       ` Jerin Jacob
  0 siblings, 0 replies; 32+ messages in thread
From: Jerin Jacob @ 2024-11-14  7:14 UTC (permalink / raw)
  To: Huichao Cai, Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram,
	yanzhirun_163
  Cc: dev



> -----Original Message-----
> From: Huichao Cai <chcchc88@163.com>
> Sent: Wednesday, November 13, 2024 1:06 PM
> To: Jerin Jacob <jerinj@marvell.com>; Kiran Kumar Kokkilagadda
> <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; yanzhirun_163@163.com
> Cc: dev@dpdk.org
> Subject: [EXTERNAL] [PATCH v3 2/2] graph: add alignment to the member of
> rte_node
> 
> The members "dispatch" and "xstat_off" of the structure "rte_node" can be min
> cache aligned to make room for future expansion and to make sure have better
> performance. Due to the modification of the alignment of some members of the
> "rte_node"
> 
> The members "dispatch" and "xstat_off" of the structure "rte_node"
> can be min cache aligned to make room for future expansion and to make sure
> have better performance.
> 
> Due to the modification of the alignment of some members of the "rte_node"
> structure, update file release_24_11.rst.
> 
> Signed-off-by: Huichao Cai <chcchc88@163.com>
> ---
>  doc/guides/rel_notes/release_24_11.rst | 3 +++
>  lib/graph/rte_graph_worker_common.h    | 5 ++++-
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/guides/rel_notes/release_24_11.rst
> b/doc/guides/rel_notes/release_24_11.rst
> index 592116b979..6903b1d0f0 100644
> --- a/doc/guides/rel_notes/release_24_11.rst
> +++ b/doc/guides/rel_notes/release_24_11.rst
> @@ -425,6 +425,9 @@ ABI Changes
> 
>  * graph: added ``graph`` field to the ``dispatch`` structure in the ``rte_node``
> structure.
> 
> +* graph: The members ``dispatch`` and ``xstat_off`` of the structure
> +``rte_node`` have been
> +  marked as RTE_CACHE_LINE_MIN_SIZE bytes aligned.
> +
>  Known Issues
>  ------------
> 
> diff --git a/lib/graph/rte_graph_worker_common.h
> b/lib/graph/rte_graph_worker_common.h
> index 4c2432b47f..9e99278a0a 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -104,6 +104,7 @@ struct __rte_cache_aligned rte_node {
>  	/** Original process function when pcap is enabled. */
>  	rte_node_process_t original_process;
> 
> +	alignas(RTE_CACHE_LINE_MIN_SIZE)
>  	union {
>  		/* Fast schedule area for mcore dispatch model */
>  		struct {
> @@ -113,8 +114,10 @@ struct __rte_cache_aligned rte_node {
>  			struct rte_graph *graph;  /**< Graph corresponding to
> lcore_id. */
>  		} dispatch;
>  	};
> -	rte_graph_off_t xstat_off; /**< Offset to xstat counters. */
> +
>  	/* Fast path area  */

Make it as two separate comment, Fast path area cache line 1 and Fastpath area cache line 2.

> +	alignas(RTE_CACHE_LINE_MIN_SIZE)
> +	rte_graph_off_t xstat_off; /**< Offset to xstat counters. */
>  	__extension__ struct __rte_cache_aligned {  #define RTE_NODE_CTX_SZ
> 16
>  		union {
> --
> 2.27.0


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v4 1/2] graph: mcore: optimize graph search
  2024-11-13  7:35   ` [PATCH v3 1/2] " Huichao Cai
  2024-11-13  7:35     ` [PATCH v3 2/2] graph: add alignment to the member of rte_node Huichao Cai
@ 2024-11-14  8:45     ` Huichao Cai
  2024-11-14  8:45       ` [PATCH v4 2/2] graph: add alignment to the member of rte_node Huichao Cai
  2024-12-13  2:21       ` [PATCH v5] graph: mcore: optimize graph search Huichao Cai
  1 sibling, 2 replies; 32+ messages in thread
From: Huichao Cai @ 2024-11-14  8:45 UTC (permalink / raw)
  To: jerinj, kirankumark, ndabilpuram, yanzhirun_163; +Cc: dev

In the function __rte_graph_mcore_dispatch_sched_node_enqueue,
use a slower loop to search for the graph, modify the search logic
to record the result of the first search, and use this record for
subsequent searches to improve search speed.

Due to the addition of a "graph" field in the "rte_node" structure,
update file release_24_11.rst.

Signed-off-by: Huichao Cai <chcchc88@163.com>
---
 doc/guides/rel_notes/release_24_11.rst     |  1 +
 lib/graph/rte_graph_model_mcore_dispatch.c | 11 +++++++----
 lib/graph/rte_graph_worker_common.h        |  1 +
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index 9dc739c4cb..592116b979 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -423,6 +423,7 @@ ABI Changes
   added new structure ``rte_node_xstats`` to ``rte_node_register`` and
   added ``xstat_off`` to ``rte_node``.
 
+* graph: added ``graph`` field to the ``dispatch`` structure in the ``rte_node`` structure.
 
 Known Issues
 ------------
diff --git a/lib/graph/rte_graph_model_mcore_dispatch.c b/lib/graph/rte_graph_model_mcore_dispatch.c
index a590fc9497..a81d338227 100644
--- a/lib/graph/rte_graph_model_mcore_dispatch.c
+++ b/lib/graph/rte_graph_model_mcore_dispatch.c
@@ -118,11 +118,14 @@ __rte_graph_mcore_dispatch_sched_node_enqueue(struct rte_node *node,
 					      struct rte_graph_rq_head *rq)
 {
 	const unsigned int lcore_id = node->dispatch.lcore_id;
-	struct rte_graph *graph;
+	struct rte_graph *graph = node->dispatch.graph;
 
-	SLIST_FOREACH(graph, rq, next)
-		if (graph->dispatch.lcore_id == lcore_id)
-			break;
+	if (unlikely((!graph) || (graph->dispatch.lcore_id != lcore_id))) {
+		SLIST_FOREACH(graph, rq, next)
+			if (graph->dispatch.lcore_id == lcore_id)
+				break;
+		node->dispatch.graph = graph;
+	}
 
 	return graph != NULL ? __graph_sched_node_enqueue(node, graph) : false;
 }
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index a518af2b2a..4c2432b47f 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -110,6 +110,7 @@ struct __rte_cache_aligned rte_node {
 			unsigned int lcore_id;  /**< Node running lcore. */
 			uint64_t total_sched_objs; /**< Number of objects scheduled. */
 			uint64_t total_sched_fail; /**< Number of scheduled failure. */
+			struct rte_graph *graph;  /**< Graph corresponding to lcore_id. */
 		} dispatch;
 	};
 	rte_graph_off_t xstat_off; /**< Offset to xstat counters. */
-- 
2.27.0


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v4 2/2] graph: add alignment to the member of rte_node
  2024-11-14  8:45     ` [PATCH v4 1/2] graph: mcore: optimize graph search Huichao Cai
@ 2024-11-14  8:45       ` Huichao Cai
  2024-11-14 10:05         ` [EXTERNAL] " Jerin Jacob
  2024-11-15  1:55         ` [PATCH v5 1/1] graph: improve node layout Huichao Cai
  2024-12-13  2:21       ` [PATCH v5] graph: mcore: optimize graph search Huichao Cai
  1 sibling, 2 replies; 32+ messages in thread
From: Huichao Cai @ 2024-11-14  8:45 UTC (permalink / raw)
  To: jerinj, kirankumark, ndabilpuram, yanzhirun_163; +Cc: dev

The members dispatch and xstat_off of the structure rte_node
can be min cache aligned to make room for future expansion and to
make sure have better performance. Add corresponding comments.

Due to the modification of the alignment of some members of the
rte_node structure, update file release_24_11.rst.

Signed-off-by: Huichao Cai <chcchc88@163.com>
---
 doc/guides/rel_notes/release_24_11.rst | 3 +++
 lib/graph/rte_graph_worker_common.h    | 7 ++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index 592116b979..6903b1d0f0 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -425,6 +425,9 @@ ABI Changes
 
 * graph: added ``graph`` field to the ``dispatch`` structure in the ``rte_node`` structure.
 
+* graph: The members ``dispatch`` and ``xstat_off`` of the structure ``rte_node`` have been
+  marked as RTE_CACHE_LINE_MIN_SIZE bytes aligned.
+
 Known Issues
 ------------
 
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 4c2432b47f..d36abec08b 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -104,16 +104,21 @@ struct __rte_cache_aligned rte_node {
 	/** Original process function when pcap is enabled. */
 	rte_node_process_t original_process;
 
+	/** Fast path area cache line 1. */
 	union {
 		/* Fast schedule area for mcore dispatch model */
-		struct {
+		alignas(RTE_CACHE_LINE_MIN_SIZE) struct {
 			unsigned int lcore_id;  /**< Node running lcore. */
 			uint64_t total_sched_objs; /**< Number of objects scheduled. */
 			uint64_t total_sched_fail; /**< Number of scheduled failure. */
 			struct rte_graph *graph;  /**< Graph corresponding to lcore_id. */
 		} dispatch;
 	};
+
+	/** Fast path area cache line 2. */
+	alignas(RTE_CACHE_LINE_MIN_SIZE)
 	rte_graph_off_t xstat_off; /**< Offset to xstat counters. */
+
 	/* Fast path area  */
 	__extension__ struct __rte_cache_aligned {
 #define RTE_NODE_CTX_SZ 16
-- 
2.27.0


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [EXTERNAL] [PATCH v4 2/2] graph: add alignment to the member of rte_node
  2024-11-14  8:45       ` [PATCH v4 2/2] graph: add alignment to the member of rte_node Huichao Cai
@ 2024-11-14 10:05         ` Jerin Jacob
  2024-11-14 12:06           ` Huichao Cai
  2024-11-15  1:55         ` [PATCH v5 1/1] graph: improve node layout Huichao Cai
  1 sibling, 1 reply; 32+ messages in thread
From: Jerin Jacob @ 2024-11-14 10:05 UTC (permalink / raw)
  To: Huichao Cai, Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram,
	yanzhirun_163, david.marchand
  Cc: dev



> -----Original Message-----
> From: Huichao Cai <chcchc88@163.com>
> Sent: Thursday, November 14, 2024 2:15 PM
> To: Jerin Jacob <jerinj@marvell.com>; Kiran Kumar Kokkilagadda
> <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; yanzhirun_163@163.com
> Cc: dev@dpdk.org
> Subject: [EXTERNAL] [PATCH v4 2/2] graph: add alignment to the member of
> rte_node
> 
> The members dispatch and xstat_off of the structure rte_node can be min cache
> aligned to make room for future expansion and to make sure have better
> performance. Add corresponding comments. Due to the modification of the
> alignment of some members 
> The members dispatch and xstat_off of the structure rte_node can be min cache
> aligned to make room for future expansion and to make sure have better
> performance. Add corresponding comments.
> 

Please change subject to graph: improve node layout


> Due to the modification of the alignment of some members of the rte_node
> structure, update file release_24_11.rst.

The above section is not needed.


> 
> Signed-off-by: Huichao Cai <chcchc88@163.com>
> ---
>  doc/guides/rel_notes/release_24_11.rst | 3 +++
>  lib/graph/rte_graph_worker_common.h    | 7 ++++++-
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/guides/rel_notes/release_24_11.rst
> b/doc/guides/rel_notes/release_24_11.rst
> index 592116b979..6903b1d0f0 100644
> --- a/doc/guides/rel_notes/release_24_11.rst
> +++ b/doc/guides/rel_notes/release_24_11.rst
> @@ -425,6 +425,9 @@ ABI Changes
> 
>  * graph: added ``graph`` field to the ``dispatch`` structure in the ``rte_node``
> structure.
> 
> +* graph: The members ``dispatch`` and ``xstat_off`` of the structure
> +``rte_node`` have been
> +  marked as RTE_CACHE_LINE_MIN_SIZE bytes aligned.
> +
>  Known Issues
>  ------------
> 
> diff --git a/lib/graph/rte_graph_worker_common.h
> b/lib/graph/rte_graph_worker_common.h
> index 4c2432b47f..d36abec08b 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -104,16 +104,21 @@ struct __rte_cache_aligned rte_node {
>  	/** Original process function when pcap is enabled. */
>  	rte_node_process_t original_process;
> 
> +	/** Fast path area cache line 1. */


Fast schedule area for mcore dispatch model

>  	union {
>  		/* Fast schedule area for mcore dispatch model */

Above comment you can remove it

> -		struct {
> +		alignas(RTE_CACHE_LINE_MIN_SIZE) struct {
>  			unsigned int lcore_id;  /**< Node running lcore. */
>  			uint64_t total_sched_objs; /**< Number of objects
> scheduled. */
>  			uint64_t total_sched_fail; /**< Number of scheduled
> failure. */
>  			struct rte_graph *graph;  /**< Graph corresponding to
> lcore_id. */
>  		} dispatch;
>  	};
> +
> +	/** Fast path area cache line 2. */
> +	alignas(RTE_CACHE_LINE_MIN_SIZE)
>  	rte_graph_off_t xstat_off; /**< Offset to xstat counters. */
> +
>  	/* Fast path area  */


Fast path area cache line 1

>  	__extension__ struct __rte_cache_aligned {  #define RTE_NODE_CTX_SZ
> 16

With above: Acked-by: Jerin Jacob <jerinj@marvell.com>

Looks loke we cannot merge new feature in rc3. I would suggest skip 1/2 and send only this patch so that 1/2 can merged in next release.

Please add @david.marchand@redhat.com in Cc.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re:RE: [EXTERNAL] [PATCH v4 2/2] graph: add alignment to the member of rte_node
  2024-11-14 10:05         ` [EXTERNAL] " Jerin Jacob
@ 2024-11-14 12:06           ` Huichao Cai
  2024-11-14 13:04             ` Jerin Jacob
  0 siblings, 1 reply; 32+ messages in thread
From: Huichao Cai @ 2024-11-14 12:06 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram, yanzhirun_163,
	david.marchand, dev

[-- Attachment #1: Type: text/plain, Size: 1213 bytes --]

Hi, Jerin. Like this?




diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h

index a518af2b2a..f9ff7dd8c9 100644

--- a/lib/graph/rte_graph_worker_common.h

+++ b/lib/graph/rte_graph_worker_common.h

@@ -104,15 +104,19 @@ struct __rte_cache_aligned rte_node {

        /** Original process function when pcap is enabled. */

        rte_node_process_t original_process;

 

+       /** Fast schedule area for mcore dispatch model. */

        union {

-               /* Fast schedule area for mcore dispatch model */

-               struct {

+               alignas(RTE_CACHE_LINE_MIN_SIZE) struct {

                        unsigned int lcore_id;  /**< Node running lcore. */

                        uint64_t total_sched_objs; /**< Number of objects scheduled. */

                        uint64_t total_sched_fail; /**< Number of scheduled failure. */

                } dispatch;

        };

+

+       /** Fast path area cache line 1. */

+       alignas(RTE_CACHE_LINE_MIN_SIZE)

        rte_graph_off_t xstat_off; /**< Offset to xstat counters. */

+

        /* Fast path area  */

        __extension__ struct __rte_cache_aligned {

 #define RTE_NODE_CTX_SZ 16



[-- Attachment #2: Type: text/html, Size: 2621 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Re:RE: [EXTERNAL] [PATCH v4 2/2] graph: add alignment to the member of rte_node
  2024-11-14 12:06           ` Huichao Cai
@ 2024-11-14 13:04             ` Jerin Jacob
  0 siblings, 0 replies; 32+ messages in thread
From: Jerin Jacob @ 2024-11-14 13:04 UTC (permalink / raw)
  To: Huichao Cai
  Cc: Kiran Kumar Kokkilagadda, Nithin Kumar Dabilpuram, yanzhirun_163,
	david.marchand, dev



> -----Original Message-----
> From: Huichao Cai <chcchc88@163.com>
> Sent: Thursday, November 14, 2024 5:37 PM
> To: Jerin Jacob <jerinj@marvell.com>
> Cc: Kiran Kumar Kokkilagadda <kirankumark@marvell.com>; Nithin Kumar
> Dabilpuram <ndabilpuram@marvell.com>; yanzhirun_163@163.com;
> david.marchand@redhat.com; dev@dpdk.org
> Subject: Re:RE: [EXTERNAL] [PATCH v4 2/2] graph: add alignment to the
> member of rte_node
> 
> Hi, Jerin. Like this? diff --git a/lib/graph/rte_graph_worker_common. h
> b/lib/graph/rte_graph_worker_common. h index a518af2b2a. . f9ff7dd8c9
> 100644 --- a/lib/graph/rte_graph_worker_common. h +++
> b/lib/graph/rte_graph_worker_common. h @@ -104,15 +104,19
> 
> 
> Hi, Jerin. Like this?
> 
> 
> 
> 
> diff --git a/lib/graph/rte_graph_worker_common.h
> b/lib/graph/rte_graph_worker_common.h
> 
> index a518af2b2a..f9ff7dd8c9 100644
> 
> --- a/lib/graph/rte_graph_worker_common.h
> 
> +++ b/lib/graph/rte_graph_worker_common.h
> 
> @@ -104,15 +104,19 @@ struct __rte_cache_aligned rte_node {
> 
>         /** Original process function when pcap is enabled. */
> 
>         rte_node_process_t original_process;
> 
> 
> 
> +       /** Fast schedule area for mcore dispatch model. */
> 
>         union {
> 
> -               /* Fast schedule area for mcore dispatch model */
> 
> -               struct {
> 
> +               alignas(RTE_CACHE_LINE_MIN_SIZE) struct {
> 
>                         unsigned int lcore_id;  /**< Node running lcore. */
> 
>                         uint64_t total_sched_objs; /**< Number of objects scheduled. */
> 
>                         uint64_t total_sched_fail; /**< Number of scheduled failure. */
> 
>                 } dispatch;
> 
>         };
> 
> +
> 
> +       /** Fast path area cache line 1. */
> 
> +       alignas(RTE_CACHE_LINE_MIN_SIZE)
> 
>         rte_graph_off_t xstat_off; /**< Offset to xstat counters. */
> 
> +
> 
>         /* Fast path area  */

Fast path area cache line 2

Rest looks good to me.

> 
>         __extension__ struct __rte_cache_aligned {
> 
>  #define RTE_NODE_CTX_SZ 16
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v5 1/1] graph: improve node layout
  2024-11-14  8:45       ` [PATCH v4 2/2] graph: add alignment to the member of rte_node Huichao Cai
  2024-11-14 10:05         ` [EXTERNAL] " Jerin Jacob
@ 2024-11-15  1:55         ` Huichao Cai
  2024-11-15 14:23           ` Thomas Monjalon
  1 sibling, 1 reply; 32+ messages in thread
From: Huichao Cai @ 2024-11-15  1:55 UTC (permalink / raw)
  To: jerinj, kirankumark, ndabilpuram, yanzhirun_163; +Cc: dev

The members "dispatch" and "xstat_off" of the structure "rte_node"
can be min cache aligned to make room for future expansion and to
make sure have better performance. Add corresponding comments.

Signed-off-by: Huichao Cai <chcchc88@163.com>
---
 doc/guides/rel_notes/release_24_11.rst |  2 ++
 lib/graph/rte_graph_worker_common.h    | 10 +++++++---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index 5063badf39..32800e8cb0 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -491,6 +491,8 @@ ABI Changes
   added new structure ``rte_node_xstats`` to ``rte_node_register`` and
   added ``xstat_off`` to ``rte_node``.
 
+* graph: The members ``dispatch`` and ``xstat_off`` of the structure ``rte_node`` have been
+  marked as RTE_CACHE_LINE_MIN_SIZE bytes aligned.
 
 Known Issues
 ------------
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index a518af2b2a..d3ec88519d 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -104,16 +104,20 @@ struct __rte_cache_aligned rte_node {
 	/** Original process function when pcap is enabled. */
 	rte_node_process_t original_process;
 
+	/** Fast schedule area for mcore dispatch model. */
 	union {
-		/* Fast schedule area for mcore dispatch model */
-		struct {
+		alignas(RTE_CACHE_LINE_MIN_SIZE) struct {
 			unsigned int lcore_id;  /**< Node running lcore. */
 			uint64_t total_sched_objs; /**< Number of objects scheduled. */
 			uint64_t total_sched_fail; /**< Number of scheduled failure. */
 		} dispatch;
 	};
+
+	/** Fast path area cache line 1. */
+	alignas(RTE_CACHE_LINE_MIN_SIZE)
 	rte_graph_off_t xstat_off; /**< Offset to xstat counters. */
-	/* Fast path area  */
+
+	/** Fast path area cache line 2. */
 	__extension__ struct __rte_cache_aligned {
 #define RTE_NODE_CTX_SZ 16
 		union {
-- 
2.27.0


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v5 1/1] graph: improve node layout
  2024-11-15  1:55         ` [PATCH v5 1/1] graph: improve node layout Huichao Cai
@ 2024-11-15 14:23           ` Thomas Monjalon
  2024-11-15 15:57             ` [EXTERNAL] " Jerin Jacob
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Monjalon @ 2024-11-15 14:23 UTC (permalink / raw)
  To: jerinj, ndabilpuram; +Cc: kirankumark, yanzhirun_163, dev, Huichao Cai

Is it good to go?


15/11/2024 02:55, Huichao Cai:
> The members "dispatch" and "xstat_off" of the structure "rte_node"
> can be min cache aligned to make room for future expansion and to
> make sure have better performance. Add corresponding comments.
> 
> Signed-off-by: Huichao Cai <chcchc88@163.com>
> ---
>  doc/guides/rel_notes/release_24_11.rst |  2 ++
>  lib/graph/rte_graph_worker_common.h    | 10 +++++++---
>  2 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
> index 5063badf39..32800e8cb0 100644
> --- a/doc/guides/rel_notes/release_24_11.rst
> +++ b/doc/guides/rel_notes/release_24_11.rst
> @@ -491,6 +491,8 @@ ABI Changes
>    added new structure ``rte_node_xstats`` to ``rte_node_register`` and
>    added ``xstat_off`` to ``rte_node``.
>  
> +* graph: The members ``dispatch`` and ``xstat_off`` of the structure ``rte_node`` have been
> +  marked as RTE_CACHE_LINE_MIN_SIZE bytes aligned.
>  
>  Known Issues
>  ------------
> diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
> index a518af2b2a..d3ec88519d 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -104,16 +104,20 @@ struct __rte_cache_aligned rte_node {
>  	/** Original process function when pcap is enabled. */
>  	rte_node_process_t original_process;
>  
> +	/** Fast schedule area for mcore dispatch model. */
>  	union {
> -		/* Fast schedule area for mcore dispatch model */
> -		struct {
> +		alignas(RTE_CACHE_LINE_MIN_SIZE) struct {
>  			unsigned int lcore_id;  /**< Node running lcore. */
>  			uint64_t total_sched_objs; /**< Number of objects scheduled. */
>  			uint64_t total_sched_fail; /**< Number of scheduled failure. */
>  		} dispatch;
>  	};
> +
> +	/** Fast path area cache line 1. */
> +	alignas(RTE_CACHE_LINE_MIN_SIZE)
>  	rte_graph_off_t xstat_off; /**< Offset to xstat counters. */
> -	/* Fast path area  */
> +
> +	/** Fast path area cache line 2. */
>  	__extension__ struct __rte_cache_aligned {
>  #define RTE_NODE_CTX_SZ 16
>  		union {
> 






^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [EXTERNAL] Re: [PATCH v5 1/1] graph: improve node layout
  2024-11-15 14:23           ` Thomas Monjalon
@ 2024-11-15 15:57             ` Jerin Jacob
  2024-11-19 10:31               ` Thomas Monjalon
  0 siblings, 1 reply; 32+ messages in thread
From: Jerin Jacob @ 2024-11-15 15:57 UTC (permalink / raw)
  To: Thomas Monjalon, Nithin Kumar Dabilpuram
  Cc: Kiran Kumar Kokkilagadda, yanzhirun_163, dev, Huichao Cai



> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Friday, November 15, 2024 7:54 PM
> To: Jerin Jacob <jerinj@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>
> Cc: Kiran Kumar Kokkilagadda <kirankumark@marvell.com>;
> yanzhirun_163@163.com; dev@dpdk.org; Huichao Cai <chcchc88@163.com>
> Subject: [EXTERNAL] Re: [PATCH v5 1/1] graph: improve node layout
> 
> Is it good to go? 15/11/2024 02: 55, Huichao Cai: > The members "dispatch"
> and "xstat_off" of the structure "rte_node" > can be min cache aligned to make
> room for future expansion and to > make sure have better performance. Add
> corresponding 
> Is it good to go?
> 
> 
> 15/11/2024 02:55, Huichao Cai:
> > The members "dispatch" and "xstat_off" of the structure "rte_node"
> > can be min cache aligned to make room for future expansion and to make
> > sure have better performance. Add corresponding comments.
> >
> > Signed-off-by: Huichao Cai <chcchc88@163.com>]


Acked-by: Jerin Jacob <jerinj@marvell.com>


> > ---
> >  doc/guides/rel_notes/release_24_11.rst |  2 ++
> >  lib/graph/rte_graph_worker_common.h    | 10 +++++++---
> >  2 files changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/doc/guides/rel_notes/release_24_11.rst
> > b/doc/guides/rel_notes/release_24_11.rst
> > index 5063badf39..32800e8cb0 100644
> > --- a/doc/guides/rel_notes/release_24_11.rst
> > +++ b/doc/guides/rel_notes/release_24_11.rst
> > @@ -491,6 +491,8 @@ ABI Changes
> >    added new structure ``rte_node_xstats`` to ``rte_node_register`` and
> >    added ``xstat_off`` to ``rte_node``.
> >
> > +* graph: The members ``dispatch`` and ``xstat_off`` of the structure
> > +``rte_node`` have been
> > +  marked as RTE_CACHE_LINE_MIN_SIZE bytes aligned.
> >
> >  Known Issues
> >  ------------
> > diff --git a/lib/graph/rte_graph_worker_common.h
> > b/lib/graph/rte_graph_worker_common.h
> > index a518af2b2a..d3ec88519d 100644
> > --- a/lib/graph/rte_graph_worker_common.h
> > +++ b/lib/graph/rte_graph_worker_common.h
> > @@ -104,16 +104,20 @@ struct __rte_cache_aligned rte_node {
> >  	/** Original process function when pcap is enabled. */
> >  	rte_node_process_t original_process;
> >
> > +	/** Fast schedule area for mcore dispatch model. */
> >  	union {
> > -		/* Fast schedule area for mcore dispatch model */
> > -		struct {
> > +		alignas(RTE_CACHE_LINE_MIN_SIZE) struct {
> >  			unsigned int lcore_id;  /**< Node running lcore. */
> >  			uint64_t total_sched_objs; /**< Number of objects
> scheduled. */
> >  			uint64_t total_sched_fail; /**< Number of scheduled
> failure. */
> >  		} dispatch;
> >  	};
> > +
> > +	/** Fast path area cache line 1. */
> > +	alignas(RTE_CACHE_LINE_MIN_SIZE)
> >  	rte_graph_off_t xstat_off; /**< Offset to xstat counters. */
> > -	/* Fast path area  */
> > +
> > +	/** Fast path area cache line 2. */
> >  	__extension__ struct __rte_cache_aligned {  #define RTE_NODE_CTX_SZ
> > 16
> >  		union {
> >
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [EXTERNAL] Re: [PATCH v5 1/1] graph: improve node layout
  2024-11-15 15:57             ` [EXTERNAL] " Jerin Jacob
@ 2024-11-19 10:31               ` Thomas Monjalon
  0 siblings, 0 replies; 32+ messages in thread
From: Thomas Monjalon @ 2024-11-19 10:31 UTC (permalink / raw)
  To: Huichao Cai
  Cc: Nithin Kumar Dabilpuram, dev, Kiran Kumar Kokkilagadda,
	yanzhirun_163, dev, Jerin Jacob

> > 15/11/2024 02:55, Huichao Cai:
> > > The members "dispatch" and "xstat_off" of the structure "rte_node"
> > > can be min cache aligned to make room for future expansion and to make
> > > sure have better performance. Add corresponding comments.
> > >
> > > Signed-off-by: Huichao Cai <chcchc88@163.com>]
> 
> Acked-by: Jerin Jacob <jerinj@marvell.com>

Applied, thanks.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v5] graph: mcore: optimize graph search
  2024-11-14  8:45     ` [PATCH v4 1/2] graph: mcore: optimize graph search Huichao Cai
  2024-11-14  8:45       ` [PATCH v4 2/2] graph: add alignment to the member of rte_node Huichao Cai
@ 2024-12-13  2:21       ` Huichao Cai
  2024-12-13 14:36         ` David Marchand
  2024-12-16  1:43         ` [PATCH v6] " Huichao Cai
  1 sibling, 2 replies; 32+ messages in thread
From: Huichao Cai @ 2024-12-13  2:21 UTC (permalink / raw)
  To: jerinj, kirankumark, ndabilpuram, yanzhirun_163; +Cc: dev

In the function __rte_graph_mcore_dispatch_sched_node_enqueue,
use a slower loop to search for the graph, modify the search logic
to record the result of the first search, and use this record for
subsequent searches to improve search speed.

Signed-off-by: Huichao Cai <chcchc88@163.com>
---
 doc/guides/rel_notes/release_25_03.rst     |  2 ++
 lib/graph/rte_graph_model_mcore_dispatch.c | 11 +++++++----
 lib/graph/rte_graph_worker_common.h        |  1 +
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/doc/guides/rel_notes/release_25_03.rst b/doc/guides/rel_notes/release_25_03.rst
index 426dfcd982..55ffe8170d 100644
--- a/doc/guides/rel_notes/release_25_03.rst
+++ b/doc/guides/rel_notes/release_25_03.rst
@@ -102,6 +102,8 @@ ABI Changes
 
 * No ABI change that would break compatibility with 24.11.
 
+* graph: Added ``graph`` field to the ``dispatch`` structure in the ``rte_node`` structure.
+
 
 Known Issues
 ------------
diff --git a/lib/graph/rte_graph_model_mcore_dispatch.c b/lib/graph/rte_graph_model_mcore_dispatch.c
index a590fc9497..a81d338227 100644
--- a/lib/graph/rte_graph_model_mcore_dispatch.c
+++ b/lib/graph/rte_graph_model_mcore_dispatch.c
@@ -118,11 +118,14 @@ __rte_graph_mcore_dispatch_sched_node_enqueue(struct rte_node *node,
 					      struct rte_graph_rq_head *rq)
 {
 	const unsigned int lcore_id = node->dispatch.lcore_id;
-	struct rte_graph *graph;
+	struct rte_graph *graph = node->dispatch.graph;
 
-	SLIST_FOREACH(graph, rq, next)
-		if (graph->dispatch.lcore_id == lcore_id)
-			break;
+	if (unlikely((!graph) || (graph->dispatch.lcore_id != lcore_id))) {
+		SLIST_FOREACH(graph, rq, next)
+			if (graph->dispatch.lcore_id == lcore_id)
+				break;
+		node->dispatch.graph = graph;
+	}
 
 	return graph != NULL ? __graph_sched_node_enqueue(node, graph) : false;
 }
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index d3ec88519d..aef0f65673 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -110,6 +110,7 @@ struct __rte_cache_aligned rte_node {
 			unsigned int lcore_id;  /**< Node running lcore. */
 			uint64_t total_sched_objs; /**< Number of objects scheduled. */
 			uint64_t total_sched_fail; /**< Number of scheduled failure. */
+			struct rte_graph *graph;  /**< Graph corresponding to lcore_id. */
 		} dispatch;
 	};
 
-- 
2.27.0


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v5] graph: mcore: optimize graph search
  2024-12-13  2:21       ` [PATCH v5] graph: mcore: optimize graph search Huichao Cai
@ 2024-12-13 14:36         ` David Marchand
  2024-12-16  1:43         ` [PATCH v6] " Huichao Cai
  1 sibling, 0 replies; 32+ messages in thread
From: David Marchand @ 2024-12-13 14:36 UTC (permalink / raw)
  To: Huichao Cai
  Cc: jerinj, kirankumark, ndabilpuram, yanzhirun_163, dev, Thomas Monjalon

On Fri, Dec 13, 2024 at 3:22 AM Huichao Cai <chcchc88@163.com> wrote:
> diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
> index d3ec88519d..aef0f65673 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -110,6 +110,7 @@ struct __rte_cache_aligned rte_node {
>                         unsigned int lcore_id;  /**< Node running lcore. */
>                         uint64_t total_sched_objs; /**< Number of objects scheduled. */
>                         uint64_t total_sched_fail; /**< Number of scheduled failure. */
> +                       struct rte_graph *graph;  /**< Graph corresponding to lcore_id. */
>                 } dispatch;
>         };

The rte_node struct size is not changed with this patch.
In v24.11, rte_node objects are populated/allocated in
graph_nodes_populate which zero's the whole rte_node.
So this change looks safe from an ABI compat with v24.11 pov.

However, we need to waive the warning from libabigail:
http://mails.dpdk.org/archives/test-report/2024-December/834167.html

Please add a temporary exception in devtools/libabigail.abignore.

It should be something like:
[suppress_type]
       name = rte_node
       has_size_change = no
       has_data_member_inserted_between =
{offset_of(total_sched_fail), offset_of(xstat_off)}


-- 
David Marchand


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v6] graph: mcore: optimize graph search
  2024-12-13  2:21       ` [PATCH v5] graph: mcore: optimize graph search Huichao Cai
  2024-12-13 14:36         ` David Marchand
@ 2024-12-16  1:43         ` Huichao Cai
  2024-12-16 14:49           ` David Marchand
  2024-12-20  2:05           ` Huichao Cai
  1 sibling, 2 replies; 32+ messages in thread
From: Huichao Cai @ 2024-12-16  1:43 UTC (permalink / raw)
  Cc: dev, jerinj, kirankumark, ndabilpuram, yanzhirun_163

In the function __rte_graph_mcore_dispatch_sched_node_enqueue,
use a slower loop to search for the graph, modify the search logic
to record the result of the first search, and use this record for
subsequent searches to improve search speed.

Signed-off-by: Huichao Cai <chcchc88@163.com>
---
 devtools/libabigail.abignore               |  5 +++++
 doc/guides/rel_notes/release_25_03.rst     |  2 ++
 lib/graph/rte_graph_model_mcore_dispatch.c | 11 +++++++----
 lib/graph/rte_graph_worker_common.h        |  1 +
 4 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 21b8cd6113..a92ee29512 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -33,3 +33,8 @@
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ; Temporary exceptions till next major ABI version ;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+[suppress_type]
+       name = rte_node
+       has_size_change = no
+       has_data_member_inserted_between =
+{offset_of(total_sched_fail), offset_of(xstat_off)}
diff --git a/doc/guides/rel_notes/release_25_03.rst b/doc/guides/rel_notes/release_25_03.rst
index 426dfcd982..55ffe8170d 100644
--- a/doc/guides/rel_notes/release_25_03.rst
+++ b/doc/guides/rel_notes/release_25_03.rst
@@ -102,6 +102,8 @@ ABI Changes
 
 * No ABI change that would break compatibility with 24.11.
 
+* graph: Added ``graph`` field to the ``dispatch`` structure in the ``rte_node`` structure.
+
 
 Known Issues
 ------------
diff --git a/lib/graph/rte_graph_model_mcore_dispatch.c b/lib/graph/rte_graph_model_mcore_dispatch.c
index a590fc9497..a81d338227 100644
--- a/lib/graph/rte_graph_model_mcore_dispatch.c
+++ b/lib/graph/rte_graph_model_mcore_dispatch.c
@@ -118,11 +118,14 @@ __rte_graph_mcore_dispatch_sched_node_enqueue(struct rte_node *node,
 					      struct rte_graph_rq_head *rq)
 {
 	const unsigned int lcore_id = node->dispatch.lcore_id;
-	struct rte_graph *graph;
+	struct rte_graph *graph = node->dispatch.graph;
 
-	SLIST_FOREACH(graph, rq, next)
-		if (graph->dispatch.lcore_id == lcore_id)
-			break;
+	if (unlikely((!graph) || (graph->dispatch.lcore_id != lcore_id))) {
+		SLIST_FOREACH(graph, rq, next)
+			if (graph->dispatch.lcore_id == lcore_id)
+				break;
+		node->dispatch.graph = graph;
+	}
 
 	return graph != NULL ? __graph_sched_node_enqueue(node, graph) : false;
 }
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index d3ec88519d..aef0f65673 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -110,6 +110,7 @@ struct __rte_cache_aligned rte_node {
 			unsigned int lcore_id;  /**< Node running lcore. */
 			uint64_t total_sched_objs; /**< Number of objects scheduled. */
 			uint64_t total_sched_fail; /**< Number of scheduled failure. */
+			struct rte_graph *graph;  /**< Graph corresponding to lcore_id. */
 		} dispatch;
 	};
 
-- 
2.27.0


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6] graph: mcore: optimize graph search
  2024-12-16  1:43         ` [PATCH v6] " Huichao Cai
@ 2024-12-16 14:49           ` David Marchand
  2024-12-17  9:04             ` David Marchand
  2024-12-20  2:05           ` Huichao Cai
  1 sibling, 1 reply; 32+ messages in thread
From: David Marchand @ 2024-12-16 14:49 UTC (permalink / raw)
  To: Dodji Seketeli
  Cc: dev, jerinj, kirankumark, ndabilpuram, yanzhirun_163, Huichao Cai

Salut Dodji,

On Mon, Dec 16, 2024 at 2:44 AM Huichao Cai <chcchc88@163.com> wrote:
> diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
> index 21b8cd6113..a92ee29512 100644
> --- a/devtools/libabigail.abignore
> +++ b/devtools/libabigail.abignore
> @@ -33,3 +33,8 @@
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>  ; Temporary exceptions till next major ABI version ;
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> +[suppress_type]
> +       name = rte_node
> +       has_size_change = no
> +       has_data_member_inserted_between =
> +{offset_of(total_sched_fail), offset_of(xstat_off)}

Here is a suppression rule I suggested but does not have the intended effect.

For the context:

Before the change (that you can find below with the next hunk), we
made sure to zero the whole rte_node object at runtime in the library
allocator.
And the offset of the field next to 'dispatch' is fixed with an
explicit alignas() statement.

        /** Fast schedule area for mcore dispatch model. */
        union {
                alignas(RTE_CACHE_LINE_MIN_SIZE) struct {
                        unsigned int lcore_id;  /**< Node running
lcore. */
                        uint64_t total_sched_objs; /**< Number of
objects scheduled. */
                        uint64_t total_sched_fail; /**< Number of
scheduled failure. */
                } dispatch;
        };

        /** Fast path area cache line 1. */
        alignas(RTE_CACHE_LINE_MIN_SIZE)
        rte_graph_off_t xstat_off; /**< Offset to xstat counters. */

If you want the whole definition, you can have a look at:
https://git.dpdk.org/dpdk/tree/lib/graph/rte_graph_worker_common.h#n87

[...]

> diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
> index d3ec88519d..aef0f65673 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -110,6 +110,7 @@ struct __rte_cache_aligned rte_node {
>                         unsigned int lcore_id;  /**< Node running lcore. */
>                         uint64_t total_sched_objs; /**< Number of objects scheduled. */
>                         uint64_t total_sched_fail; /**< Number of scheduled failure. */
> +                       struct rte_graph *graph;  /**< Graph corresponding to lcore_id. */
>                 } dispatch;
>         };

Now, the patch adds a new field in the struct {} dispatch field.

Here is what abidiff reports:

$ abidiff --version
abidiff: 2.6.0

$ abidiff --suppr
/home/dmarchan/git/pub/dpdk.org/main/devtools/libabigail.abignore
--no-added-syms --headers-dir1
/home/dmarchan/abi/v24.11/build-gcc-shared/usr/local/include
--headers-dir2 /home/dmarchan/builds/main/build-gcc-shared/install/usr/local/include
/home/dmarchan/abi/v24.11/build-gcc-shared/usr/local/lib64/librte_graph.so.25.0
/home/dmarchan/builds/main/build-gcc-shared/install/usr/local/lib64/librte_graph.so.25.1
Functions changes summary: 0 Removed, 1 Changed (9 filtered out), 0
Added functions
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable

1 function with some indirect sub-type change:

  [C] 'function bool
__rte_graph_mcore_dispatch_sched_node_enqueue(rte_node*,
rte_graph_rq_head*)' at rte_graph_model_mcore_dispatch.c:117:1 has
some indirect sub-type changes:
    parameter 1 of type 'rte_node*' has sub-type changes:
      in pointed to type 'struct rte_node' at rte_graph_worker_common.h:92:1:
        type size hasn't changed
        1 data member changes (2 filtered):
          anonymous data member at offset 1536 (in bits) changed from:
            union {struct {unsigned int lcore_id; uint64_t
total_sched_objs; uint64_t total_sched_fail;} dispatch;}
          to:
            union {struct {unsigned int lcore_id; uint64_t
total_sched_objs; uint64_t total_sched_fail; rte_graph* graph;}
dispatch;}


What would be the best way to suppress this warning?
I tried the following which seems to work, but I prefer to ask for your advice.

[suppress_type]
    name = rte_node
    has_data_member_at = offset_of(total_sched_fail)


Thanks.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6] graph: mcore: optimize graph search
  2024-12-16 14:49           ` David Marchand
@ 2024-12-17  9:04             ` David Marchand
  0 siblings, 0 replies; 32+ messages in thread
From: David Marchand @ 2024-12-17  9:04 UTC (permalink / raw)
  To: Dodji Seketeli
  Cc: dev, jerinj, kirankumark, ndabilpuram, yanzhirun_163, Huichao Cai

On Mon, Dec 16, 2024 at 3:49 PM David Marchand
<david.marchand@redhat.com> wrote:
> $ abidiff --suppr
> /home/dmarchan/git/pub/dpdk.org/main/devtools/libabigail.abignore
> --no-added-syms --headers-dir1
> /home/dmarchan/abi/v24.11/build-gcc-shared/usr/local/include
> --headers-dir2 /home/dmarchan/builds/main/build-gcc-shared/install/usr/local/include
> /home/dmarchan/abi/v24.11/build-gcc-shared/usr/local/lib64/librte_graph.so.25.0
> /home/dmarchan/builds/main/build-gcc-shared/install/usr/local/lib64/librte_graph.so.25.1
> Functions changes summary: 0 Removed, 1 Changed (9 filtered out), 0
> Added functions
> Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
>
> 1 function with some indirect sub-type change:
>
>   [C] 'function bool
> __rte_graph_mcore_dispatch_sched_node_enqueue(rte_node*,
> rte_graph_rq_head*)' at rte_graph_model_mcore_dispatch.c:117:1 has
> some indirect sub-type changes:
>     parameter 1 of type 'rte_node*' has sub-type changes:
>       in pointed to type 'struct rte_node' at rte_graph_worker_common.h:92:1:
>         type size hasn't changed
>         1 data member changes (2 filtered):
>           anonymous data member at offset 1536 (in bits) changed from:
>             union {struct {unsigned int lcore_id; uint64_t
> total_sched_objs; uint64_t total_sched_fail;} dispatch;}
>           to:
>             union {struct {unsigned int lcore_id; uint64_t
> total_sched_objs; uint64_t total_sched_fail; rte_graph* graph;}
> dispatch;}
>
>
> What would be the best way to suppress this warning?
> I tried the following which seems to work, but I prefer to ask for your advice.
>
> [suppress_type]
>     name = rte_node
>     has_data_member_at = offset_of(total_sched_fail)

Gah.. I meant has_data_member_inserted_at.
But then testing with has_data_member_inserted_at, the warning is not
suppressed either...

Any help appreciated.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v6] graph: mcore: optimize graph search
  2024-12-16  1:43         ` [PATCH v6] " Huichao Cai
  2024-12-16 14:49           ` David Marchand
@ 2024-12-20  2:05           ` Huichao Cai
  1 sibling, 0 replies; 32+ messages in thread
From: Huichao Cai @ 2024-12-20  2:05 UTC (permalink / raw)
  To: jerinj, kirankumark, ndabilpuram, yanzhirun_163, thomas; +Cc: dev

Hi David, does "has_data_member_inserted_between = {offset_of(total_sched_fail), offset_of(xstat_off)}" need to be on the same line?


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2024-12-20  2:06 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-11-07  8:04 [PATCH] graph: optimize graph search when scheduling nodes Huichao cai
2024-11-07  9:37 ` [EXTERNAL] " Jerin Jacob
2024-11-08  1:39   ` Huichao Cai
2024-11-08 12:22     ` Jerin Jacob
2024-11-08 13:38       ` David Marchand
2024-11-11  5:38         ` Jerin Jacob
2024-11-12  8:51           ` David Marchand
2024-11-12  9:35             ` Jerin Jacob
2024-11-12 12:57               ` Huichao Cai
2024-11-13  9:22               ` Huichao Cai
2024-11-14  7:09                 ` Jerin Jacob
2024-11-11  4:03 ` [PATCH v2] graph: mcore: optimize graph search Huichao Cai
2024-11-11  5:46   ` [EXTERNAL] " Jerin Jacob
2024-11-13  9:19     ` Huichao Cai
2024-11-13  7:35   ` [PATCH v3 1/2] " Huichao Cai
2024-11-13  7:35     ` [PATCH v3 2/2] graph: add alignment to the member of rte_node Huichao Cai
2024-11-14  7:14       ` [EXTERNAL] " Jerin Jacob
2024-11-14  8:45     ` [PATCH v4 1/2] graph: mcore: optimize graph search Huichao Cai
2024-11-14  8:45       ` [PATCH v4 2/2] graph: add alignment to the member of rte_node Huichao Cai
2024-11-14 10:05         ` [EXTERNAL] " Jerin Jacob
2024-11-14 12:06           ` Huichao Cai
2024-11-14 13:04             ` Jerin Jacob
2024-11-15  1:55         ` [PATCH v5 1/1] graph: improve node layout Huichao Cai
2024-11-15 14:23           ` Thomas Monjalon
2024-11-15 15:57             ` [EXTERNAL] " Jerin Jacob
2024-11-19 10:31               ` Thomas Monjalon
2024-12-13  2:21       ` [PATCH v5] graph: mcore: optimize graph search Huichao Cai
2024-12-13 14:36         ` David Marchand
2024-12-16  1:43         ` [PATCH v6] " Huichao Cai
2024-12-16 14:49           ` David Marchand
2024-12-17  9:04             ` David Marchand
2024-12-20  2:05           ` Huichao Cai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).