* [PATCH] eventdev: avoid non-burst shortcut for variable-size bursts
@ 2023-05-11 6:43 Mattias Rönnblom
2023-05-11 7:09 ` Morten Brørup
0 siblings, 1 reply; 18+ messages in thread
From: Mattias Rönnblom @ 2023-05-11 6:43 UTC (permalink / raw)
To: jerinj; +Cc: Jerin Jacob, hofors, dev, Mattias Rönnblom
Use non-burst event enqueue and dequeue calls from burst enqueue and
dequeue only when the burst size is compile-time constant (and equal
to one).
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
lib/eventdev/rte_eventdev.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
index a90e23ac8b..8af15816db 100644
--- a/lib/eventdev/rte_eventdev.h
+++ b/lib/eventdev/rte_eventdev.h
@@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id,
* Allow zero cost non burst mode routine invocation if application
* requests nb_events as const one
*/
- if (nb_events == 1)
+ if (__builtin_constant_p(nb_events) && nb_events == 1)
return (fp_ops->enqueue)(port, ev);
else
return fn(port, ev, nb_events);
@@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id, struct rte_event ev[],
* Allow zero cost non burst mode routine invocation if application
* requests nb_events as const one
*/
- if (nb_events == 1)
+ if (__builtin_constant_p(nb_events) && nb_events == 1)
return (fp_ops->dequeue)(port, ev, timeout_ticks);
else
return (fp_ops->dequeue_burst)(port, ev, nb_events,
--
2.34.1
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH] eventdev: avoid non-burst shortcut for variable-size bursts
2023-05-11 6:43 [PATCH] eventdev: avoid non-burst shortcut for variable-size bursts Mattias Rönnblom
@ 2023-05-11 7:09 ` Morten Brørup
2023-05-11 8:16 ` [PATCH v2] " Mattias Rönnblom
2023-05-11 8:29 ` [PATCH] " Mattias Rönnblom
0 siblings, 2 replies; 18+ messages in thread
From: Morten Brørup @ 2023-05-11 7:09 UTC (permalink / raw)
To: Mattias Rönnblom, jerinj; +Cc: Jerin Jacob, hofors, dev
> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Thursday, 11 May 2023 08.43
>
> Use non-burst event enqueue and dequeue calls from burst enqueue and
> dequeue only when the burst size is compile-time constant (and equal
> to one).
>
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> ---
> lib/eventdev/rte_eventdev.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> index a90e23ac8b..8af15816db 100644
> --- a/lib/eventdev/rte_eventdev.h
> +++ b/lib/eventdev/rte_eventdev.h
> @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t
> port_id,
> * Allow zero cost non burst mode routine invocation if application
> * requests nb_events as const one
> */
> - if (nb_events == 1)
> + if (__builtin_constant_p(nb_events) && nb_events == 1)
In my experience using __builtin_constant_p(), you need __extension__(__builtin_constant_p(nb_events)) to avoid warnings about using this non-standard feature.
> return (fp_ops->enqueue)(port, ev);
> else
> return fn(port, ev, nb_events);
> @@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id,
> struct rte_event ev[],
> * Allow zero cost non burst mode routine invocation if application
> * requests nb_events as const one
> */
> - if (nb_events == 1)
> + if (__builtin_constant_p(nb_events) && nb_events == 1)
> return (fp_ops->dequeue)(port, ev, timeout_ticks);
> else
> return (fp_ops->dequeue_burst)(port, ev, nb_events,
> --
> 2.34.1
With __extension__() added,
Acked-by: Morten Brørup <mb@smartsharesystems.com>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2] eventdev: avoid non-burst shortcut for variable-size bursts
2023-05-11 7:09 ` Morten Brørup
@ 2023-05-11 8:16 ` Mattias Rönnblom
2023-05-11 8:24 ` [PATCH v3] " Mattias Rönnblom
2023-05-11 8:24 ` [PATCH v2] " Morten Brørup
2023-05-11 8:29 ` [PATCH] " Mattias Rönnblom
1 sibling, 2 replies; 18+ messages in thread
From: Mattias Rönnblom @ 2023-05-11 8:16 UTC (permalink / raw)
To: jerinj
Cc: Jerin Jacob, hofors, dev, Morten Brørup, Mattias Rönnblom
Use non-burst event enqueue and dequeue calls from burst enqueue and
dequeue only when the burst size is compile-time constant (and equal
to one).
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
v2: Wrap builtin call in __extension__, to avoid compiler warnings if
application is compiled with -pedantic. (Morten Brørup)
---
lib/eventdev/rte_eventdev.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
index a90e23ac8b..8af15816db 100644
--- a/lib/eventdev/rte_eventdev.h
+++ b/lib/eventdev/rte_eventdev.h
@@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id,
* Allow zero cost non burst mode routine invocation if application
* requests nb_events as const one
*/
- if (nb_events == 1)
+ if (__builtin_constant_p(nb_events) && nb_events == 1)
return (fp_ops->enqueue)(port, ev);
else
return fn(port, ev, nb_events);
@@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id, struct rte_event ev[],
* Allow zero cost non burst mode routine invocation if application
* requests nb_events as const one
*/
- if (nb_events == 1)
+ if (__builtin_constant_p(nb_events) && nb_events == 1)
return (fp_ops->dequeue)(port, ev, timeout_ticks);
else
return (fp_ops->dequeue_burst)(port, ev, nb_events,
--
2.34.1
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3] eventdev: avoid non-burst shortcut for variable-size bursts
2023-05-11 8:16 ` [PATCH v2] " Mattias Rönnblom
@ 2023-05-11 8:24 ` Mattias Rönnblom
2023-05-11 9:01 ` Morten Brørup
2023-05-12 11:59 ` Jerin Jacob
2023-05-11 8:24 ` [PATCH v2] " Morten Brørup
1 sibling, 2 replies; 18+ messages in thread
From: Mattias Rönnblom @ 2023-05-11 8:24 UTC (permalink / raw)
To: jerinj
Cc: Jerin Jacob, hofors, dev, Morten Brørup, Mattias Rönnblom
Use non-burst event enqueue and dequeue calls from burst enqueue and
dequeue only when the burst size is compile-time constant (and equal
to one).
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
v3: Actually include the change v2 claimed to contain.
v2: Wrap builtin call in __extension__, to avoid compiler warnings if
application is compiled with -pedantic. (Morten Brørup)
---
lib/eventdev/rte_eventdev.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
index a90e23ac8b..a471caeb6d 100644
--- a/lib/eventdev/rte_eventdev.h
+++ b/lib/eventdev/rte_eventdev.h
@@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id,
* Allow zero cost non burst mode routine invocation if application
* requests nb_events as const one
*/
- if (nb_events == 1)
+ if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
return (fp_ops->enqueue)(port, ev);
else
return fn(port, ev, nb_events);
@@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id, struct rte_event ev[],
* Allow zero cost non burst mode routine invocation if application
* requests nb_events as const one
*/
- if (nb_events == 1)
+ if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
return (fp_ops->dequeue)(port, ev, timeout_ticks);
else
return (fp_ops->dequeue_burst)(port, ev, nb_events,
--
2.34.1
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH v2] eventdev: avoid non-burst shortcut for variable-size bursts
2023-05-11 8:16 ` [PATCH v2] " Mattias Rönnblom
2023-05-11 8:24 ` [PATCH v3] " Mattias Rönnblom
@ 2023-05-11 8:24 ` Morten Brørup
1 sibling, 0 replies; 18+ messages in thread
From: Morten Brørup @ 2023-05-11 8:24 UTC (permalink / raw)
To: Mattias Rönnblom, jerinj; +Cc: Jerin Jacob, hofors, dev
> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Thursday, 11 May 2023 10.17
>
> Use non-burst event enqueue and dequeue calls from burst enqueue and
> dequeue only when the burst size is compile-time constant (and equal
> to one).
>
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] eventdev: avoid non-burst shortcut for variable-size bursts
2023-05-11 7:09 ` Morten Brørup
2023-05-11 8:16 ` [PATCH v2] " Mattias Rönnblom
@ 2023-05-11 8:29 ` Mattias Rönnblom
2023-05-11 9:06 ` Morten Brørup
1 sibling, 1 reply; 18+ messages in thread
From: Mattias Rönnblom @ 2023-05-11 8:29 UTC (permalink / raw)
To: Morten Brørup, jerinj; +Cc: Jerin Jacob, hofors, dev
On 2023-05-11 09:09, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>> Sent: Thursday, 11 May 2023 08.43
>>
>> Use non-burst event enqueue and dequeue calls from burst enqueue and
>> dequeue only when the burst size is compile-time constant (and equal
>> to one).
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> ---
>> lib/eventdev/rte_eventdev.h | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
>> index a90e23ac8b..8af15816db 100644
>> --- a/lib/eventdev/rte_eventdev.h
>> +++ b/lib/eventdev/rte_eventdev.h
>> @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t
>> port_id,
>> * Allow zero cost non burst mode routine invocation if application
>> * requests nb_events as const one
>> */
>> - if (nb_events == 1)
>> + if (__builtin_constant_p(nb_events) && nb_events == 1)
>
> In my experience using __builtin_constant_p(), you need __extension__(__builtin_constant_p(nb_events)) to avoid warnings about using this non-standard feature.
>
Yes, since it's a public header. Thanks.
>> return (fp_ops->enqueue)(port, ev);
>> else
>> return fn(port, ev, nb_events);
>> @@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id,
>> struct rte_event ev[],
>> * Allow zero cost non burst mode routine invocation if application
>> * requests nb_events as const one
>> */
>> - if (nb_events == 1)
>> + if (__builtin_constant_p(nb_events) && nb_events == 1)
>> return (fp_ops->dequeue)(port, ev, timeout_ticks);
>> else
>> return (fp_ops->dequeue_burst)(port, ev, nb_events,
>> --
>> 2.34.1
>
> With __extension__() added,
>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH v3] eventdev: avoid non-burst shortcut for variable-size bursts
2023-05-11 8:24 ` [PATCH v3] " Mattias Rönnblom
@ 2023-05-11 9:01 ` Morten Brørup
2023-05-12 11:59 ` Jerin Jacob
1 sibling, 0 replies; 18+ messages in thread
From: Morten Brørup @ 2023-05-11 9:01 UTC (permalink / raw)
To: Mattias Rönnblom, jerinj; +Cc: Jerin Jacob, hofors, dev
> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Thursday, 11 May 2023 10.24
>
> Use non-burst event enqueue and dequeue calls from burst enqueue and
> dequeue only when the burst size is compile-time constant (and equal
> to one).
>
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH] eventdev: avoid non-burst shortcut for variable-size bursts
2023-05-11 8:29 ` [PATCH] " Mattias Rönnblom
@ 2023-05-11 9:06 ` Morten Brørup
0 siblings, 0 replies; 18+ messages in thread
From: Morten Brørup @ 2023-05-11 9:06 UTC (permalink / raw)
To: Mattias Rönnblom, jerinj; +Cc: Jerin Jacob, hofors, dev, bruce.richardson
> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Thursday, 11 May 2023 10.30
>
> On 2023-05-11 09:09, Morten Brørup wrote:
[...]
> > In my experience using __builtin_constant_p(), you need
> __extension__(__builtin_constant_p(nb_events)) to avoid warnings about using
> this non-standard feature.
> >
>
> Yes, since it's a public header. Thanks.
If you run meson with -Dcheck_includes=true, ninja will catch it.
Unfortunately, the build time increases significantly when using this option.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3] eventdev: avoid non-burst shortcut for variable-size bursts
2023-05-11 8:24 ` [PATCH v3] " Mattias Rönnblom
2023-05-11 9:01 ` Morten Brørup
@ 2023-05-12 11:59 ` Jerin Jacob
2023-05-12 13:15 ` Mattias Rönnblom
1 sibling, 1 reply; 18+ messages in thread
From: Jerin Jacob @ 2023-05-12 11:59 UTC (permalink / raw)
To: Mattias Rönnblom; +Cc: jerinj, hofors, dev, Morten Brørup
On Thu, May 11, 2023 at 2:00 PM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:
>
> Use non-burst event enqueue and dequeue calls from burst enqueue and
> dequeue only when the burst size is compile-time constant (and equal
> to one).
>
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>
> ---
>
> v3: Actually include the change v2 claimed to contain.
> v2: Wrap builtin call in __extension__, to avoid compiler warnings if
> application is compiled with -pedantic. (Morten Brørup)
> ---
> lib/eventdev/rte_eventdev.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> index a90e23ac8b..a471caeb6d 100644
> --- a/lib/eventdev/rte_eventdev.h
> +++ b/lib/eventdev/rte_eventdev.h
> @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id,
> * Allow zero cost non burst mode routine invocation if application
> * requests nb_events as const one
> */
> - if (nb_events == 1)
> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
"Why" part is not clear from the commit message. Is this to avoid
nb_events read if it is built-in const.
If so, check should be following. Right?
if (__extension__((__builtin_constant_p(nb_events)) && nb_events == 1)
|| nb_events == 1)
At least, It was my original intention in the code.
> return (fp_ops->enqueue)(port, ev);
> else
> return fn(port, ev, nb_events);
> @@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id, struct rte_event ev[],
> * Allow zero cost non burst mode routine invocation if application
> * requests nb_events as const one
> */
> - if (nb_events == 1)
> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
> return (fp_ops->dequeue)(port, ev, timeout_ticks);
> else
> return (fp_ops->dequeue_burst)(port, ev, nb_events,
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3] eventdev: avoid non-burst shortcut for variable-size bursts
2023-05-12 11:59 ` Jerin Jacob
@ 2023-05-12 13:15 ` Mattias Rönnblom
2023-05-12 13:56 ` Morten Brørup
2023-05-15 12:38 ` Jerin Jacob
0 siblings, 2 replies; 18+ messages in thread
From: Mattias Rönnblom @ 2023-05-12 13:15 UTC (permalink / raw)
To: Jerin Jacob; +Cc: jerinj, hofors, dev, Morten Brørup
On 2023-05-12 13:59, Jerin Jacob wrote:
> On Thu, May 11, 2023 at 2:00 PM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
>>
>> Use non-burst event enqueue and dequeue calls from burst enqueue and
>> dequeue only when the burst size is compile-time constant (and equal
>> to one).
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>
>> ---
>>
>> v3: Actually include the change v2 claimed to contain.
>> v2: Wrap builtin call in __extension__, to avoid compiler warnings if
>> application is compiled with -pedantic. (Morten Brørup)
>> ---
>> lib/eventdev/rte_eventdev.h | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
>> index a90e23ac8b..a471caeb6d 100644
>> --- a/lib/eventdev/rte_eventdev.h
>> +++ b/lib/eventdev/rte_eventdev.h
>> @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id,
>> * Allow zero cost non burst mode routine invocation if application
>> * requests nb_events as const one
>> */
>> - if (nb_events == 1)
>> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
>
> "Why" part is not clear from the commit message. Is this to avoid
> nb_events read if it is built-in const.
The __builtin_constant_p() is introduced to avoid having the compiler
generate a conditional branch and two different code paths in case
nb_elem is a run-time variable.
In particular, this matters if nb_elems is run-time variable and varies
between 1 and some larger value.
I should have mention this in the commit message.
A very slight performance improvement. It also makes the code better
match the comment, imo. Zero cost for const one enqueues, but no impact
non-compile-time-constant-length enqueues.
Feel free to ignore.
> If so, check should be following. Right?
>
> if (__extension__((__builtin_constant_p(nb_events)) && nb_events == 1)
> || nb_events == 1)
>
> At least, It was my original intention in the code.
>
>
>
>> return (fp_ops->enqueue)(port, ev);
>> else
>> return fn(port, ev, nb_events);
>> @@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id, struct rte_event ev[],
>> * Allow zero cost non burst mode routine invocation if application
>> * requests nb_events as const one
>> */
>> - if (nb_events == 1)
>> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
>> return (fp_ops->dequeue)(port, ev, timeout_ticks);
>> else
>> return (fp_ops->dequeue_burst)(port, ev, nb_events,
>> --
>> 2.34.1
>>
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH v3] eventdev: avoid non-burst shortcut for variable-size bursts
2023-05-12 13:15 ` Mattias Rönnblom
@ 2023-05-12 13:56 ` Morten Brørup
2023-05-12 14:52 ` Mattias Rönnblom
2023-05-15 12:33 ` Jerin Jacob
2023-05-15 12:38 ` Jerin Jacob
1 sibling, 2 replies; 18+ messages in thread
From: Morten Brørup @ 2023-05-12 13:56 UTC (permalink / raw)
To: Mattias Rönnblom, Jerin Jacob; +Cc: jerinj, hofors, dev
> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Friday, 12 May 2023 15.15
>
> On 2023-05-12 13:59, Jerin Jacob wrote:
> > On Thu, May 11, 2023 at 2:00 PM Mattias Rönnblom
> > <mattias.ronnblom@ericsson.com> wrote:
> >>
> >> Use non-burst event enqueue and dequeue calls from burst enqueue and
> >> dequeue only when the burst size is compile-time constant (and equal
> >> to one).
> >>
> >> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >>
> >> ---
> >>
> >> v3: Actually include the change v2 claimed to contain.
> >> v2: Wrap builtin call in __extension__, to avoid compiler warnings if
> >> application is compiled with -pedantic. (Morten Brørup)
> >> ---
> >> lib/eventdev/rte_eventdev.h | 4 ++--
> >> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> >> index a90e23ac8b..a471caeb6d 100644
> >> --- a/lib/eventdev/rte_eventdev.h
> >> +++ b/lib/eventdev/rte_eventdev.h
> >> @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t
> port_id,
> >> * Allow zero cost non burst mode routine invocation if
> application
> >> * requests nb_events as const one
> >> */
> >> - if (nb_events == 1)
> >> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events ==
> 1)
> >
> > "Why" part is not clear from the commit message. Is this to avoid
> > nb_events read if it is built-in const.
>
> The __builtin_constant_p() is introduced to avoid having the compiler
> generate a conditional branch and two different code paths in case
> nb_elem is a run-time variable.
>
> In particular, this matters if nb_elems is run-time variable and varies
> between 1 and some larger value.
>
> I should have mention this in the commit message.
>
> A very slight performance improvement. It also makes the code better
> match the comment, imo. Zero cost for const one enqueues, but no impact
> non-compile-time-constant-length enqueues.
>
> Feel free to ignore.
>
> > If so, check should be following. Right?
> >
> > if (__extension__((__builtin_constant_p(nb_events)) && nb_events == 1)
> > || nb_events == 1)
@Mattias: You missed the second part of this comparison, also catching nb_events == 1 with non-constant nb_events.
@Jerin: Such a change has no effect, compared to the original code.
> >
> > At least, It was my original intention in the code.
@Jerin: Mattias implemented exactly what the comment says.
Perhaps only the comment should be updated, not the code.
Is nb_events likely to be non-constant 1, and are there benefits to calling either of the non-burst functions in those cases, vs. the branch cost of this comparison (which Mattias' patch gets rid of)?
> >
> >
> >
> >> return (fp_ops->enqueue)(port, ev);
> >> else
> >> return fn(port, ev, nb_events);
> >> @@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t
> port_id, struct rte_event ev[],
> >> * Allow zero cost non burst mode routine invocation if
> application
> >> * requests nb_events as const one
> >> */
> >> - if (nb_events == 1)
> >> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events ==
> 1)
> >> return (fp_ops->dequeue)(port, ev, timeout_ticks);
> >> else
> >> return (fp_ops->dequeue_burst)(port, ev, nb_events,
> >> --
> >> 2.34.1
> >>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3] eventdev: avoid non-burst shortcut for variable-size bursts
2023-05-12 13:56 ` Morten Brørup
@ 2023-05-12 14:52 ` Mattias Rönnblom
2023-05-15 12:33 ` Jerin Jacob
1 sibling, 0 replies; 18+ messages in thread
From: Mattias Rönnblom @ 2023-05-12 14:52 UTC (permalink / raw)
To: Morten Brørup, Mattias Rönnblom, Jerin Jacob; +Cc: jerinj, dev
On 2023-05-12 15:56, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>> Sent: Friday, 12 May 2023 15.15
>>
>> On 2023-05-12 13:59, Jerin Jacob wrote:
>>> On Thu, May 11, 2023 at 2:00 PM Mattias Rönnblom
>>> <mattias.ronnblom@ericsson.com> wrote:
>>>>
>>>> Use non-burst event enqueue and dequeue calls from burst enqueue and
>>>> dequeue only when the burst size is compile-time constant (and equal
>>>> to one).
>>>>
>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>>>
>>>> ---
>>>>
>>>> v3: Actually include the change v2 claimed to contain.
>>>> v2: Wrap builtin call in __extension__, to avoid compiler warnings if
>>>> application is compiled with -pedantic. (Morten Brørup)
>>>> ---
>>>> lib/eventdev/rte_eventdev.h | 4 ++--
>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
>>>> index a90e23ac8b..a471caeb6d 100644
>>>> --- a/lib/eventdev/rte_eventdev.h
>>>> +++ b/lib/eventdev/rte_eventdev.h
>>>> @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t
>> port_id,
>>>> * Allow zero cost non burst mode routine invocation if
>> application
>>>> * requests nb_events as const one
>>>> */
>>>> - if (nb_events == 1)
>>>> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events ==
>> 1)
>>>
>>> "Why" part is not clear from the commit message. Is this to avoid
>>> nb_events read if it is built-in const.
>>
>> The __builtin_constant_p() is introduced to avoid having the compiler
>> generate a conditional branch and two different code paths in case
>> nb_elem is a run-time variable.
>>
>> In particular, this matters if nb_elems is run-time variable and varies
>> between 1 and some larger value.
>>
>> I should have mention this in the commit message.
>>
>> A very slight performance improvement. It also makes the code better
>> match the comment, imo. Zero cost for const one enqueues, but no impact
>> non-compile-time-constant-length enqueues.
>>
>> Feel free to ignore.
>>
>>> If so, check should be following. Right?
>>>
>>> if (__extension__((__builtin_constant_p(nb_events)) && nb_events == 1)
>>> || nb_events == 1)
>
> @Mattias: You missed the second part of this comparison, also catching nb_events == 1 with non-constant nb_events.
>
I didn't comment on that code snippet since it was based on a
misconception of the intention of my patch.
> @Jerin: Such a change has no effect, compared to the original code.
>
>>>
>>> At least, It was my original intention in the code.
>
> @Jerin: Mattias implemented exactly what the comment says.
>
> Perhaps only the comment should be updated, not the code.
>
> Is nb_events likely to be non-constant 1, and are there benefits to calling either of the non-burst functions in those cases, vs. the branch cost of this comparison (which Mattias' patch gets rid of)?
>
I think the main worry would be the cost of branch mispredictions in
case of alternating enqueue sizes (between 1 and some other size).
If there is a performance upside to calling single-event enqueue in a
scenario where all enqueues are *run-time variable* and 1 (which I find
unlikely, but well inside the realms of the possibility), the next
question would be: OK, but how about for two events? Three? Four. Etc.
>>>
>>>
>>>
>>>> return (fp_ops->enqueue)(port, ev);
>>>> else
>>>> return fn(port, ev, nb_events);
>>>> @@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t
>> port_id, struct rte_event ev[],
>>>> * Allow zero cost non burst mode routine invocation if
>> application
>>>> * requests nb_events as const one
>>>> */
>>>> - if (nb_events == 1)
>>>> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events ==
>> 1)
>>>> return (fp_ops->dequeue)(port, ev, timeout_ticks);
>>>> else
>>>> return (fp_ops->dequeue_burst)(port, ev, nb_events,
>>>> --
>>>> 2.34.1
>>>>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3] eventdev: avoid non-burst shortcut for variable-size bursts
2023-05-12 13:56 ` Morten Brørup
2023-05-12 14:52 ` Mattias Rönnblom
@ 2023-05-15 12:33 ` Jerin Jacob
1 sibling, 0 replies; 18+ messages in thread
From: Jerin Jacob @ 2023-05-15 12:33 UTC (permalink / raw)
To: Morten Brørup; +Cc: Mattias Rönnblom, jerinj, hofors, dev
On Fri, May 12, 2023 at 7:26 PM Morten Brørup <mb@smartsharesystems.com> wrote:
>
> > From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> > Sent: Friday, 12 May 2023 15.15
> >
> > On 2023-05-12 13:59, Jerin Jacob wrote:
> > > On Thu, May 11, 2023 at 2:00 PM Mattias Rönnblom
> > > <mattias.ronnblom@ericsson.com> wrote:
> > >>
> > >> Use non-burst event enqueue and dequeue calls from burst enqueue and
> > >> dequeue only when the burst size is compile-time constant (and equal
> > >> to one).
> > >>
> > >> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> > >>
> > >> ---
> > >>
> > >> v3: Actually include the change v2 claimed to contain.
> > >> v2: Wrap builtin call in __extension__, to avoid compiler warnings if
> > >> application is compiled with -pedantic. (Morten Brørup)
> > >> ---
> > >> lib/eventdev/rte_eventdev.h | 4 ++--
> > >> 1 file changed, 2 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> > >> index a90e23ac8b..a471caeb6d 100644
> > >> --- a/lib/eventdev/rte_eventdev.h
> > >> +++ b/lib/eventdev/rte_eventdev.h
> > >> @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t
> > port_id,
> > >> * Allow zero cost non burst mode routine invocation if
> > application
> > >> * requests nb_events as const one
> > >> */
> > >> - if (nb_events == 1)
> > >> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events ==
> > 1)
> > >
> > > "Why" part is not clear from the commit message. Is this to avoid
> > > nb_events read if it is built-in const.
> >
> > The __builtin_constant_p() is introduced to avoid having the compiler
> > generate a conditional branch and two different code paths in case
> > nb_elem is a run-time variable.
> >
> > In particular, this matters if nb_elems is run-time variable and varies
> > between 1 and some larger value.
> >
> > I should have mention this in the commit message.
> >
> > A very slight performance improvement. It also makes the code better
> > match the comment, imo. Zero cost for const one enqueues, but no impact
> > non-compile-time-constant-length enqueues.
> >
> > Feel free to ignore.
> >
> > > If so, check should be following. Right?
> > >
> > > if (__extension__((__builtin_constant_p(nb_events)) && nb_events == 1)
> > > || nb_events == 1)
>
> @Mattias: You missed the second part of this comparison, also catching nb_events == 1 with non-constant nb_events.
>
> @Jerin: Such a change has no effect, compared to the original code.
Yes. That was the reason for I was skipping __builtin_constant_p in
the initial version.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3] eventdev: avoid non-burst shortcut for variable-size bursts
2023-05-12 13:15 ` Mattias Rönnblom
2023-05-12 13:56 ` Morten Brørup
@ 2023-05-15 12:38 ` Jerin Jacob
2023-05-15 20:52 ` Mattias Rönnblom
1 sibling, 1 reply; 18+ messages in thread
From: Jerin Jacob @ 2023-05-15 12:38 UTC (permalink / raw)
To: Mattias Rönnblom; +Cc: jerinj, hofors, dev, Morten Brørup
On Fri, May 12, 2023 at 6:45 PM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:
>
> On 2023-05-12 13:59, Jerin Jacob wrote:
> > On Thu, May 11, 2023 at 2:00 PM Mattias Rönnblom
> > <mattias.ronnblom@ericsson.com> wrote:
> >>
> >> Use non-burst event enqueue and dequeue calls from burst enqueue and
> >> dequeue only when the burst size is compile-time constant (and equal
> >> to one).
> >>
> >> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >>
> >> ---
> >>
> >> v3: Actually include the change v2 claimed to contain.
> >> v2: Wrap builtin call in __extension__, to avoid compiler warnings if
> >> application is compiled with -pedantic. (Morten Brørup)
> >> ---
> >> lib/eventdev/rte_eventdev.h | 4 ++--
> >> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> >> index a90e23ac8b..a471caeb6d 100644
> >> --- a/lib/eventdev/rte_eventdev.h
> >> +++ b/lib/eventdev/rte_eventdev.h
> >> @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id,
> >> * Allow zero cost non burst mode routine invocation if application
> >> * requests nb_events as const one
> >> */
> >> - if (nb_events == 1)
> >> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
> >
> > "Why" part is not clear from the commit message. Is this to avoid
> > nb_events read if it is built-in const.
>
> The __builtin_constant_p() is introduced to avoid having the compiler
> generate a conditional branch and two different code paths in case
> nb_elem is a run-time variable.
>
> In particular, this matters if nb_elems is run-time variable and varies
> between 1 and some larger value.
>
> I should have mention this in the commit message.
>
> A very slight performance improvement. It also makes the code better
> match the comment, imo. Zero cost for const one enqueues, but no impact
> non-compile-time-constant-length enqueues.
>
> Feel free to ignore.
I did some performance comparison of the patch.
A low-end ARM machines shows 0.7% drop with single event case. No
regression see with high-end ARM cores with single event case.
IMO, optimizing the check for burst mode(the new patch) may not show
any real improvement as the cost is divided by number of event.
Whereas optimizing the check for single event case(The current code)
shows better performance with single event case and no regression
with burst mode as cost is divided by number of events.
If you agree, then we can skip this patch.
>
> > If so, check should be following. Right?
> >
> > if (__extension__((__builtin_constant_p(nb_events)) && nb_events == 1)
> > || nb_events == 1)
> >
> > At least, It was my original intention in the code.
> >
> >
> >
> >> return (fp_ops->enqueue)(port, ev);
> >> else
> >> return fn(port, ev, nb_events);
> >> @@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id, struct rte_event ev[],
> >> * Allow zero cost non burst mode routine invocation if application
> >> * requests nb_events as const one
> >> */
> >> - if (nb_events == 1)
> >> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
> >> return (fp_ops->dequeue)(port, ev, timeout_ticks);
> >> else
> >> return (fp_ops->dequeue_burst)(port, ev, nb_events,
> >> --
> >> 2.34.1
> >>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3] eventdev: avoid non-burst shortcut for variable-size bursts
2023-05-15 12:38 ` Jerin Jacob
@ 2023-05-15 20:52 ` Mattias Rönnblom
2023-05-16 13:08 ` Jerin Jacob
0 siblings, 1 reply; 18+ messages in thread
From: Mattias Rönnblom @ 2023-05-15 20:52 UTC (permalink / raw)
To: Jerin Jacob, Mattias Rönnblom; +Cc: jerinj, dev, Morten Brørup
On 2023-05-15 14:38, Jerin Jacob wrote:
> On Fri, May 12, 2023 at 6:45 PM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
>>
>> On 2023-05-12 13:59, Jerin Jacob wrote:
>>> On Thu, May 11, 2023 at 2:00 PM Mattias Rönnblom
>>> <mattias.ronnblom@ericsson.com> wrote:
>>>>
>>>> Use non-burst event enqueue and dequeue calls from burst enqueue and
>>>> dequeue only when the burst size is compile-time constant (and equal
>>>> to one).
>>>>
>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>>>
>>>> ---
>>>>
>>>> v3: Actually include the change v2 claimed to contain.
>>>> v2: Wrap builtin call in __extension__, to avoid compiler warnings if
>>>> application is compiled with -pedantic. (Morten Brørup)
>>>> ---
>>>> lib/eventdev/rte_eventdev.h | 4 ++--
>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
>>>> index a90e23ac8b..a471caeb6d 100644
>>>> --- a/lib/eventdev/rte_eventdev.h
>>>> +++ b/lib/eventdev/rte_eventdev.h
>>>> @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id,
>>>> * Allow zero cost non burst mode routine invocation if application
>>>> * requests nb_events as const one
>>>> */
>>>> - if (nb_events == 1)
>>>> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
>>>
>>> "Why" part is not clear from the commit message. Is this to avoid
>>> nb_events read if it is built-in const.
>>
>> The __builtin_constant_p() is introduced to avoid having the compiler
>> generate a conditional branch and two different code paths in case
>> nb_elem is a run-time variable.
>>
>> In particular, this matters if nb_elems is run-time variable and varies
>> between 1 and some larger value.
>>
>> I should have mention this in the commit message.
>>
>> A very slight performance improvement. It also makes the code better
>> match the comment, imo. Zero cost for const one enqueues, but no impact
>> non-compile-time-constant-length enqueues.
>>
>> Feel free to ignore.
>
>
> I did some performance comparison of the patch.
> A low-end ARM machines shows 0.7% drop with single event case. No
> regression see with high-end ARM cores with single event case.
>
> IMO, optimizing the check for burst mode(the new patch) may not show
> any real improvement as the cost is divided by number of event.
> Whereas optimizing the check for single event case(The current code)
> shows better performance with single event case and no regression
> with burst mode as cost is divided by number of events.
I ran some tests on an AMD Zen 3 with DSW.
In the below tests the enqueue burst size is not compile-time constant.
Enqueue burst size Performance improvement
Run-time constant 1 ~5%
Run-time constant 2 ~0%
Run-time variable 1-2 ~9%
Run-time variable 1-16 ~0%
The run-time variable enqueue sizes randomly (uniformly) distributed in
the specified range.
The first result may come as a surprise. The benchmark is using
RTE_EVENT_OP_FORWARD type events (which likely is the dominating op type
in most apps). The single-event enqueue function only exists in a
generic variant (i.e., no rte_event_enqueue_forward_burst() equivalent).
I suspect that is the reason for the performance improvement.
This effect is large-enough to make it somewhat beneficial (+~1%) to use
run-time variable single-event enqueue compared to keeping the burst
size compile-time constant.
The performance gain is counted toward both enqueue and dequeue costs
(+benchmark app overhead), so an under-estimation if see this as an
enqueue performance improvement.
> If you agree, then we can skip this patch.
>
I have no strong opinion if this should be included or not.
It was up to me, I would drop the single-enqueue special case handling
altogether in the next ABI update.
>
>>
>>> If so, check should be following. Right?
>>>
>>> if (__extension__((__builtin_constant_p(nb_events)) && nb_events == 1)
>>> || nb_events == 1)
>>>
>>> At least, It was my original intention in the code.
>>>
>>>
>>>
>>>> return (fp_ops->enqueue)(port, ev);
>>>> else
>>>> return fn(port, ev, nb_events);
>>>> @@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id, struct rte_event ev[],
>>>> * Allow zero cost non burst mode routine invocation if application
>>>> * requests nb_events as const one
>>>> */
>>>> - if (nb_events == 1)
>>>> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
>>>> return (fp_ops->dequeue)(port, ev, timeout_ticks);
>>>> else
>>>> return (fp_ops->dequeue_burst)(port, ev, nb_events,
>>>> --
>>>> 2.34.1
>>>>
>>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3] eventdev: avoid non-burst shortcut for variable-size bursts
2023-05-15 20:52 ` Mattias Rönnblom
@ 2023-05-16 13:08 ` Jerin Jacob
2023-05-17 7:16 ` Mattias Rönnblom
0 siblings, 1 reply; 18+ messages in thread
From: Jerin Jacob @ 2023-05-16 13:08 UTC (permalink / raw)
To: Mattias Rönnblom
Cc: Mattias Rönnblom, jerinj, dev, Morten Brørup
On Tue, May 16, 2023 at 2:22 AM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
>
> On 2023-05-15 14:38, Jerin Jacob wrote:
> > On Fri, May 12, 2023 at 6:45 PM Mattias Rönnblom
> > <mattias.ronnblom@ericsson.com> wrote:
> >>
> >> On 2023-05-12 13:59, Jerin Jacob wrote:
> >>> On Thu, May 11, 2023 at 2:00 PM Mattias Rönnblom
> >>> <mattias.ronnblom@ericsson.com> wrote:
> >>>>
> >>>> Use non-burst event enqueue and dequeue calls from burst enqueue and
> >>>> dequeue only when the burst size is compile-time constant (and equal
> >>>> to one).
> >>>>
> >>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >>>>
> >>>> ---
> >>>>
> >>>> v3: Actually include the change v2 claimed to contain.
> >>>> v2: Wrap builtin call in __extension__, to avoid compiler warnings if
> >>>> application is compiled with -pedantic. (Morten Brørup)
> >>>> ---
> >>>> lib/eventdev/rte_eventdev.h | 4 ++--
> >>>> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> >>>> index a90e23ac8b..a471caeb6d 100644
> >>>> --- a/lib/eventdev/rte_eventdev.h
> >>>> +++ b/lib/eventdev/rte_eventdev.h
> >>>> @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id,
> >>>> * Allow zero cost non burst mode routine invocation if application
> >>>> * requests nb_events as const one
> >>>> */
> >>>> - if (nb_events == 1)
> >>>> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
> >>>
> >>> "Why" part is not clear from the commit message. Is this to avoid
> >>> nb_events read if it is built-in const.
> >>
> >> The __builtin_constant_p() is introduced to avoid having the compiler
> >> generate a conditional branch and two different code paths in case
> >> nb_elem is a run-time variable.
> >>
> >> In particular, this matters if nb_elems is run-time variable and varies
> >> between 1 and some larger value.
> >>
> >> I should have mention this in the commit message.
> >>
> >> A very slight performance improvement. It also makes the code better
> >> match the comment, imo. Zero cost for const one enqueues, but no impact
> >> non-compile-time-constant-length enqueues.
> >>
> >> Feel free to ignore.
> >
> >
> > I did some performance comparison of the patch.
> > A low-end ARM machines shows 0.7% drop with single event case. No
> > regression see with high-end ARM cores with single event case.
> >
> > IMO, optimizing the check for burst mode(the new patch) may not show
> > any real improvement as the cost is divided by number of event.
> > Whereas optimizing the check for single event case(The current code)
> > shows better performance with single event case and no regression
> > with burst mode as cost is divided by number of events.
>
> I ran some tests on an AMD Zen 3 with DSW.
> In the below tests the enqueue burst size is not compile-time constant.
>
> Enqueue burst size Performance improvement
> Run-time constant 1 ~5%
> Run-time constant 2 ~0%
> Run-time variable 1-2 ~9%
> Run-time variable 1-16 ~0%
>
> The run-time variable enqueue sizes randomly (uniformly) distributed in
> the specified range.
>
> The first result may come as a surprise. The benchmark is using
> RTE_EVENT_OP_FORWARD type events (which likely is the dominating op type
> in most apps). The single-event enqueue function only exists in a
> generic variant (i.e., no rte_event_enqueue_forward_burst() equivalent).
> I suspect that is the reason for the performance improvement.
>
> This effect is large-enough to make it somewhat beneficial (+~1%) to use
> run-time variable single-event enqueue compared to keeping the burst
> size compile-time constant.
# Interesting, Could you share your testeventdev command to test it.
# By having quick glance on DSW code, following change can be added(or
similar cases).
Not sure such change in DSW driver is making a difference or nor?
diff --git a/drivers/event/dsw/dsw_event.c b/drivers/event/dsw/dsw_event.c
index e84b65d99f..455470997b 100644
--- a/drivers/event/dsw/dsw_event.c
+++ b/drivers/event/dsw/dsw_event.c
@@ -1251,7 +1251,7 @@ dsw_port_flush_out_buffers(struct dsw_evdev
*dsw, struct dsw_port *source_port)
uint16_t
dsw_event_enqueue(void *port, const struct rte_event *ev)
{
- return dsw_event_enqueue_burst(port, ev, unlikely(ev == NULL) ? 0 : 1);
+ return dsw_event_enqueue_burst(port, ev, 1);
}
static __rte_always_inline uint16_t
@@ -1340,7 +1340,7 @@ dsw_event_enqueue_burst_generic(struct dsw_port
*source_port,
return (num_non_release + num_release);
}
-uint16_t
+inline uint16_t
dsw_event_enqueue_burst(void *port, const struct rte_event events[],
uint16_t events_len)
{
# I am testing with command like this "/build/app/dpdk-test-eventdev
-l 0-23 -a 0002:0e:00.0 -- --test=perf_atq --plcores 1 --wlcores 8
--stlist p --nb_pkts=10000000000"
>
> The performance gain is counted toward both enqueue and dequeue costs
> (+benchmark app overhead), so an under-estimation if see this as an
> enqueue performance improvement.
>
> > If you agree, then we can skip this patch.
> >
>
> I have no strong opinion if this should be included or not.
>
> It was up to me, I would drop the single-enqueue special case handling
> altogether in the next ABI update.
That's a reasonable path. If we are willing to push a patch, we can
test it and give feedback.
Or in our spare time, We can do that as well.
>
> >
> >>
> >>> If so, check should be following. Right?
> >>>
> >>> if (__extension__((__builtin_constant_p(nb_events)) && nb_events == 1)
> >>> || nb_events == 1)
> >>>
> >>> At least, It was my original intention in the code.
> >>>
> >>>
> >>>
> >>>> return (fp_ops->enqueue)(port, ev);
> >>>> else
> >>>> return fn(port, ev, nb_events);
> >>>> @@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id, struct rte_event ev[],
> >>>> * Allow zero cost non burst mode routine invocation if application
> >>>> * requests nb_events as const one
> >>>> */
> >>>> - if (nb_events == 1)
> >>>> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
> >>>> return (fp_ops->dequeue)(port, ev, timeout_ticks);
> >>>> else
> >>>> return (fp_ops->dequeue_burst)(port, ev, nb_events,
> >>>> --
> >>>> 2.34.1
> >>>>
> >>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3] eventdev: avoid non-burst shortcut for variable-size bursts
2023-05-16 13:08 ` Jerin Jacob
@ 2023-05-17 7:16 ` Mattias Rönnblom
2023-05-17 12:28 ` Jerin Jacob
0 siblings, 1 reply; 18+ messages in thread
From: Mattias Rönnblom @ 2023-05-17 7:16 UTC (permalink / raw)
To: Jerin Jacob; +Cc: Mattias Rönnblom, jerinj, dev, Morten Brørup
On 2023-05-16 15:08, Jerin Jacob wrote:
> On Tue, May 16, 2023 at 2:22 AM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
>>
>> On 2023-05-15 14:38, Jerin Jacob wrote:
>>> On Fri, May 12, 2023 at 6:45 PM Mattias Rönnblom
>>> <mattias.ronnblom@ericsson.com> wrote:
>>>>
>>>> On 2023-05-12 13:59, Jerin Jacob wrote:
>>>>> On Thu, May 11, 2023 at 2:00 PM Mattias Rönnblom
>>>>> <mattias.ronnblom@ericsson.com> wrote:
>>>>>>
>>>>>> Use non-burst event enqueue and dequeue calls from burst enqueue and
>>>>>> dequeue only when the burst size is compile-time constant (and equal
>>>>>> to one).
>>>>>>
>>>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> v3: Actually include the change v2 claimed to contain.
>>>>>> v2: Wrap builtin call in __extension__, to avoid compiler warnings if
>>>>>> application is compiled with -pedantic. (Morten Brørup)
>>>>>> ---
>>>>>> lib/eventdev/rte_eventdev.h | 4 ++--
>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
>>>>>> index a90e23ac8b..a471caeb6d 100644
>>>>>> --- a/lib/eventdev/rte_eventdev.h
>>>>>> +++ b/lib/eventdev/rte_eventdev.h
>>>>>> @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id,
>>>>>> * Allow zero cost non burst mode routine invocation if application
>>>>>> * requests nb_events as const one
>>>>>> */
>>>>>> - if (nb_events == 1)
>>>>>> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
>>>>>
>>>>> "Why" part is not clear from the commit message. Is this to avoid
>>>>> nb_events read if it is built-in const.
>>>>
>>>> The __builtin_constant_p() is introduced to avoid having the compiler
>>>> generate a conditional branch and two different code paths in case
>>>> nb_elem is a run-time variable.
>>>>
>>>> In particular, this matters if nb_elems is run-time variable and varies
>>>> between 1 and some larger value.
>>>>
>>>> I should have mention this in the commit message.
>>>>
>>>> A very slight performance improvement. It also makes the code better
>>>> match the comment, imo. Zero cost for const one enqueues, but no impact
>>>> non-compile-time-constant-length enqueues.
>>>>
>>>> Feel free to ignore.
>>>
>>>
>>> I did some performance comparison of the patch.
>>> A low-end ARM machines shows 0.7% drop with single event case. No
>>> regression see with high-end ARM cores with single event case.
>>>
>>> IMO, optimizing the check for burst mode(the new patch) may not show
>>> any real improvement as the cost is divided by number of event.
>>> Whereas optimizing the check for single event case(The current code)
>>> shows better performance with single event case and no regression
>>> with burst mode as cost is divided by number of events.
>>
>> I ran some tests on an AMD Zen 3 with DSW.
>> In the below tests the enqueue burst size is not compile-time constant.
>>
>> Enqueue burst size Performance improvement
>> Run-time constant 1 ~5%
>> Run-time constant 2 ~0%
>> Run-time variable 1-2 ~9%
>> Run-time variable 1-16 ~0%
>>
>> The run-time variable enqueue sizes randomly (uniformly) distributed in
>> the specified range.
>>
>> The first result may come as a surprise. The benchmark is using
>> RTE_EVENT_OP_FORWARD type events (which likely is the dominating op type
>> in most apps). The single-event enqueue function only exists in a
>> generic variant (i.e., no rte_event_enqueue_forward_burst() equivalent).
>> I suspect that is the reason for the performance improvement.
>>
>> This effect is large-enough to make it somewhat beneficial (+~1%) to use
>> run-time variable single-event enqueue compared to keeping the burst
>> size compile-time constant.
>
> # Interesting, Could you share your testeventdev command to test it.
I'm using a proprietary benchmark to evaluate the effect of these
changes. There's certainly nothing secret about that program, and also
nothing very DSW-specific either. I hope to at some point both extend
DPDK eventdev tests to include DSW, and also to contribute
benchmarks/characteristics tests (perf unit tests or as a separate
program), if there seems to be a value in this.
> # By having quick glance on DSW code, following change can be added(or
> similar cases).
> Not sure such change in DSW driver is making a difference or nor?
>
>
> diff --git a/drivers/event/dsw/dsw_event.c b/drivers/event/dsw/dsw_event.c
> index e84b65d99f..455470997b 100644
> --- a/drivers/event/dsw/dsw_event.c
> +++ b/drivers/event/dsw/dsw_event.c
> @@ -1251,7 +1251,7 @@ dsw_port_flush_out_buffers(struct dsw_evdev
> *dsw, struct dsw_port *source_port)
> uint16_t
> dsw_event_enqueue(void *port, const struct rte_event *ev)
> {
> - return dsw_event_enqueue_burst(port, ev, unlikely(ev == NULL) ? 0 : 1);
> + return dsw_event_enqueue_burst(port, ev, 1);
Good point.
Historical note: I think that comparison is old cruft borne out of a
misconception, that the single-event enqueue could be called directly
from application code, combined with the fact that producer-only ports
needed some way to "maintain" a port, prior to the introduction of
rte_event_maintain().
> }
>
> static __rte_always_inline uint16_t
> @@ -1340,7 +1340,7 @@ dsw_event_enqueue_burst_generic(struct dsw_port
> *source_port,
> return (num_non_release + num_release);
> }
>
> -uint16_t
> +inline uint16_t
From what it seems, this does not have the desired effect, at least not
on GCC 11.3 (w/ the default DPDK compiler configuration).
I reached this conclusion when I noticed that if I reshuffle the code so
to force (not hint) the inlining of the burst (and generic burst)
enqueue function into dsw_event_enqueue(), your change performs better.
> dsw_event_enqueue_burst(void *port, const struct rte_event events[],
> uint16_t events_len)
> {
>
> # I am testing with command like this "/build/app/dpdk-test-eventdev
> -l 0-23 -a 0002:0e:00.0 -- --test=perf_atq --plcores 1 --wlcores 8
> --stlist p --nb_pkts=10000000000"
>
I re-ran the compile-time variable, run-time constant enqueue size of 1,
and I got the following:
Jerin's change: +4%
Jerin's change + ensure inlining: +6%
RFC v3: +7%
(Here I use a more different setup that produces more deterministic
results, hence the different numbers compared to the previous runs. They
were using a pipeline spread over two chiplets, and these runs are using
only a single chiplet.)
It seems like with your suggested changes you eliminate most of the
single-enqueue-special case performance degradation (for DSW), but not
all of it. The remaining degradation is very small (for the above case,
larger for small by run-time variable enqueue sizes), but it's a little
sad that a supposedly performance-enhancing special case (that drives
complexity in the code, although not much) actually degrades performance.
>>
>> The performance gain is counted toward both enqueue and dequeue costs
>> (+benchmark app overhead), so an under-estimation if see this as an
>> enqueue performance improvement.
>>
>>> If you agree, then we can skip this patch.
>>>
>>
>> I have no strong opinion if this should be included or not.
>>
>> It was up to me, I would drop the single-enqueue special case handling
>> altogether in the next ABI update.
>
> That's a reasonable path. If we are willing to push a patch, we can
> test it and give feedback.
> Or in our spare time, We can do that as well.
>
Sure, I'll give it a try.
The next release is an ABI-breaking one?
>>
>>>
>>>>
>>>>> If so, check should be following. Right?
>>>>>
>>>>> if (__extension__((__builtin_constant_p(nb_events)) && nb_events == 1)
>>>>> || nb_events == 1)
>>>>>
>>>>> At least, It was my original intention in the code.
>>>>>
>>>>>
>>>>>
>>>>>> return (fp_ops->enqueue)(port, ev);
>>>>>> else
>>>>>> return fn(port, ev, nb_events);
>>>>>> @@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id, struct rte_event ev[],
>>>>>> * Allow zero cost non burst mode routine invocation if application
>>>>>> * requests nb_events as const one
>>>>>> */
>>>>>> - if (nb_events == 1)
>>>>>> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
>>>>>> return (fp_ops->dequeue)(port, ev, timeout_ticks);
>>>>>> else
>>>>>> return (fp_ops->dequeue_burst)(port, ev, nb_events,
>>>>>> --
>>>>>> 2.34.1
>>>>>>
>>>>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3] eventdev: avoid non-burst shortcut for variable-size bursts
2023-05-17 7:16 ` Mattias Rönnblom
@ 2023-05-17 12:28 ` Jerin Jacob
0 siblings, 0 replies; 18+ messages in thread
From: Jerin Jacob @ 2023-05-17 12:28 UTC (permalink / raw)
To: Mattias Rönnblom
Cc: Mattias Rönnblom, jerinj, dev, Morten Brørup
On Wed, May 17, 2023 at 12:46 PM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
>
> On 2023-05-16 15:08, Jerin Jacob wrote:
> > On Tue, May 16, 2023 at 2:22 AM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
> >>
> >> On 2023-05-15 14:38, Jerin Jacob wrote:
> >>> On Fri, May 12, 2023 at 6:45 PM Mattias Rönnblom
> >>> <mattias.ronnblom@ericsson.com> wrote:
> >>>>
> >>>> On 2023-05-12 13:59, Jerin Jacob wrote:
> >>>>> On Thu, May 11, 2023 at 2:00 PM Mattias Rönnblom
> >>>>> <mattias.ronnblom@ericsson.com> wrote:
> >>>>>>
> >>>>>> Use non-burst event enqueue and dequeue calls from burst enqueue and
> >>>>>> dequeue only when the burst size is compile-time constant (and equal
> >>>>>> to one).
> >>>>>>
> >>>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >>>>>>
> >>>>>> ---
> >>>>>>
> >>>>>> v3: Actually include the change v2 claimed to contain.
> >>>>>> v2: Wrap builtin call in __extension__, to avoid compiler warnings if
> >>>>>> application is compiled with -pedantic. (Morten Brørup)
> >>>>>> ---
> >>>>>> lib/eventdev/rte_eventdev.h | 4 ++--
> >>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> >>>>>> index a90e23ac8b..a471caeb6d 100644
> >>>>>> --- a/lib/eventdev/rte_eventdev.h
> >>>>>> +++ b/lib/eventdev/rte_eventdev.h
> >>>>>> @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id,
> >>>>>> * Allow zero cost non burst mode routine invocation if application
> >>>>>> * requests nb_events as const one
> >>>>>> */
> >>>>>> - if (nb_events == 1)
> >>>>>> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
> >>>>>
> >>>>> "Why" part is not clear from the commit message. Is this to avoid
> >>>>> nb_events read if it is built-in const.
> >>>>
> >>>> The __builtin_constant_p() is introduced to avoid having the compiler
> >>>> generate a conditional branch and two different code paths in case
> >>>> nb_elem is a run-time variable.
> >>>>
> >>>> In particular, this matters if nb_elems is run-time variable and varies
> >>>> between 1 and some larger value.
> >>>>
> >>>> I should have mention this in the commit message.
> >>>>
> >>>> A very slight performance improvement. It also makes the code better
> >>>> match the comment, imo. Zero cost for const one enqueues, but no impact
> >>>> non-compile-time-constant-length enqueues.
> >>>>
> >>>> Feel free to ignore.
> >>>
> >>>
> >>> I did some performance comparison of the patch.
> >>> A low-end ARM machines shows 0.7% drop with single event case. No
> >>> regression see with high-end ARM cores with single event case.
> >>>
> >>> IMO, optimizing the check for burst mode(the new patch) may not show
> >>> any real improvement as the cost is divided by number of event.
> >>> Whereas optimizing the check for single event case(The current code)
> >>> shows better performance with single event case and no regression
> >>> with burst mode as cost is divided by number of events.
> >>
> >> I ran some tests on an AMD Zen 3 with DSW.
> >> In the below tests the enqueue burst size is not compile-time constant.
> >>
> >> Enqueue burst size Performance improvement
> >> Run-time constant 1 ~5%
> >> Run-time constant 2 ~0%
> >> Run-time variable 1-2 ~9%
> >> Run-time variable 1-16 ~0%
> >>
> >> The run-time variable enqueue sizes randomly (uniformly) distributed in
> >> the specified range.
> >>
> >> The first result may come as a surprise. The benchmark is using
> >> RTE_EVENT_OP_FORWARD type events (which likely is the dominating op type
> >> in most apps). The single-event enqueue function only exists in a
> >> generic variant (i.e., no rte_event_enqueue_forward_burst() equivalent).
> >> I suspect that is the reason for the performance improvement.
> >>
> >> This effect is large-enough to make it somewhat beneficial (+~1%) to use
> >> run-time variable single-event enqueue compared to keeping the burst
> >> size compile-time constant.
> >
> > # Interesting, Could you share your testeventdev command to test it.
>
> I'm using a proprietary benchmark to evaluate the effect of these
> changes. There's certainly nothing secret about that program, and also
> nothing very DSW-specific either. I hope to at some point both extend
> DPDK eventdev tests to include DSW, and also to contribute
> benchmarks/characteristics tests (perf unit tests or as a separate
> program), if there seems to be a value in this.
Yes. Please extend the testeventdev for your use case so that all
drivers can test
and help to optimize _real world_ cases. Testeventdev already has
plugin kind of interface,
it should pretty easy to add new MODES.
>
> > # By having quick glance on DSW code, following change can be added(or
> > similar cases).
> > Not sure such change in DSW driver is making a difference or nor?
> >
> >
> > diff --git a/drivers/event/dsw/dsw_event.c b/drivers/event/dsw/dsw_event.c
> > index e84b65d99f..455470997b 100644
> > --- a/drivers/event/dsw/dsw_event.c
> > +++ b/drivers/event/dsw/dsw_event.c
> > @@ -1251,7 +1251,7 @@ dsw_port_flush_out_buffers(struct dsw_evdev
> > *dsw, struct dsw_port *source_port)
> > uint16_t
> > dsw_event_enqueue(void *port, const struct rte_event *ev)
> > {
> > - return dsw_event_enqueue_burst(port, ev, unlikely(ev == NULL) ? 0 : 1);
> > + return dsw_event_enqueue_burst(port, ev, 1);
>
> Good point.
>
> Historical note: I think that comparison is old cruft borne out of a
> misconception, that the single-event enqueue could be called directly
> from application code, combined with the fact that producer-only ports
> needed some way to "maintain" a port, prior to the introduction of
> rte_event_maintain().
>
> > }
> >
> > static __rte_always_inline uint16_t
> > @@ -1340,7 +1340,7 @@ dsw_event_enqueue_burst_generic(struct dsw_port
> > *source_port,
> > return (num_non_release + num_release);
> > }
> >
> > -uint16_t
> > +inline uint16_t
>
> From what it seems, this does not have the desired effect, at least not
> on GCC 11.3 (w/ the default DPDK compiler configuration).
>
> I reached this conclusion when I noticed that if I reshuffle the code so
> to force (not hint) the inlining of the burst (and generic burst)
> enqueue function into dsw_event_enqueue(), your change performs better.
>
> > dsw_event_enqueue_burst(void *port, const struct rte_event events[],
> > uint16_t events_len)
> > {
> >
> > # I am testing with command like this "/build/app/dpdk-test-eventdev
> > -l 0-23 -a 0002:0e:00.0 -- --test=perf_atq --plcores 1 --wlcores 8
> > --stlist p --nb_pkts=10000000000"
> >
>
>
> I re-ran the compile-time variable, run-time constant enqueue size of 1,
> and I got the following:
>
> Jerin's change: +4%
> Jerin's change + ensure inlining: +6%
> RFC v3: +7%
>
> (Here I use a more different setup that produces more deterministic
> results, hence the different numbers compared to the previous runs. They
> were using a pipeline spread over two chiplets, and these runs are using
> only a single chiplet.)
>
> It seems like with your suggested changes you eliminate most of the
> single-enqueue-special case performance degradation (for DSW), but not
> all of it. The remaining degradation is very small (for the above case,
Cores like AMD Zen 3, I was not expecting 1% diff with such check.
e.s.p if it has proper branch predictors. Even pretty low-end arm
cores, had around
0.7% diff and new arm cores shows no difference.
> larger for small by run-time variable enqueue sizes), but it's a little
> sad that a supposedly performance-enhancing special case (that drives
> complexity in the code, although not much) actually degrades performance.
OK. Let's get rid of fp_ops->dequeue callback. Initial RFC of eventdev
has public non burst API,
that was the reason for that callback.
>
> >>
> >> The performance gain is counted toward both enqueue and dequeue costs
> >> (+benchmark app overhead), so an under-estimation if see this as an
> >> enqueue performance improvement.
> >>
> >>> If you agree, then we can skip this patch.
> >>>
> >>
> >> I have no strong opinion if this should be included or not.
> >>
> >> It was up to me, I would drop the single-enqueue special case handling
> >> altogether in the next ABI update.
> >
> > That's a reasonable path. If we are willing to push a patch, we can
> > test it and give feedback.
> > Or in our spare time, We can do that as well.
> >
>
> Sure, I'll give it a try.
>
> The next release is an ABI-breaking one?
Yes (23.11). Please plan to send deprecation notice before 23.07 release.
I will mark this patch as rejected in patchwork.
Thanks for your time.
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2023-05-17 12:29 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-11 6:43 [PATCH] eventdev: avoid non-burst shortcut for variable-size bursts Mattias Rönnblom
2023-05-11 7:09 ` Morten Brørup
2023-05-11 8:16 ` [PATCH v2] " Mattias Rönnblom
2023-05-11 8:24 ` [PATCH v3] " Mattias Rönnblom
2023-05-11 9:01 ` Morten Brørup
2023-05-12 11:59 ` Jerin Jacob
2023-05-12 13:15 ` Mattias Rönnblom
2023-05-12 13:56 ` Morten Brørup
2023-05-12 14:52 ` Mattias Rönnblom
2023-05-15 12:33 ` Jerin Jacob
2023-05-15 12:38 ` Jerin Jacob
2023-05-15 20:52 ` Mattias Rönnblom
2023-05-16 13:08 ` Jerin Jacob
2023-05-17 7:16 ` Mattias Rönnblom
2023-05-17 12:28 ` Jerin Jacob
2023-05-11 8:24 ` [PATCH v2] " Morten Brørup
2023-05-11 8:29 ` [PATCH] " Mattias Rönnblom
2023-05-11 9:06 ` Morten Brørup
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).