* [PATCH] event/eth_tx: prefetch mbuf headers
@ 2025-03-28 5:43 Mattias Rönnblom
2025-03-28 6:07 ` Mattias Rönnblom
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Mattias Rönnblom @ 2025-03-28 5:43 UTC (permalink / raw)
To: dev
Cc: Mattias Rönnblom, Naga Harish K S V, Jerin Jacob,
Mattias Rönnblom, Peter Nilsson
Prefetch mbuf headers, resulting in ~10% throughput improvement when
the Ethernet RX and TX Adapters are hosted on the same core (likely
~2x in case a dedicated TX core is used).
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Tested-by: Peter Nilsson <peter.j.nilsson@ericsson.com>
---
lib/eventdev/rte_event_eth_tx_adapter.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/lib/eventdev/rte_event_eth_tx_adapter.c b/lib/eventdev/rte_event_eth_tx_adapter.c
index 67fff8b7d6..d740ae00f9 100644
--- a/lib/eventdev/rte_event_eth_tx_adapter.c
+++ b/lib/eventdev/rte_event_eth_tx_adapter.c
@@ -598,6 +598,12 @@ txa_process_event_vector(struct txa_service_data *txa,
return nb_tx;
}
+static inline void
+txa_prefetch_mbuf(struct rte_mbuf *mbuf)
+{
+ rte_mbuf_prefetch_part1(mbuf);
+}
+
static void
txa_service_tx(struct txa_service_data *txa, struct rte_event *ev,
uint32_t n)
@@ -608,6 +614,20 @@ txa_service_tx(struct txa_service_data *txa, struct rte_event *ev,
stats = &txa->stats;
+ for (i = 0; i < n; i++) {
+ struct rte_event *event = &ev[i];
+
+ if (unlikely(event->event_type & RTE_EVENT_TYPE_VECTOR)) {
+ struct rte_event_vector *vec = event->vec;
+ struct rte_mbuf **mbufs = vec->mbufs;
+ uint32_t k;
+
+ for (k = 0; k < vec->nb_elem; k++)
+ txa_prefetch_mbuf(mbufs[k]);
+ } else
+ txa_prefetch_mbuf(event->mbuf);
+ }
+
nb_tx = 0;
for (i = 0; i < n; i++) {
uint16_t port;
--
2.43.0
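
For reference, rte_mbuf_prefetch_part1() brings the first cache line of the
mbuf header (where fields such as port, data_len and ol_flags live) into L1
ahead of use. A functional sketch, under the assumption that the header starts
at the first cache line (see rte_mbuf.h for the exact definition, which has
varied across releases):

#include <rte_mbuf.h>
#include <rte_prefetch.h>

/* Functional sketch of rte_mbuf_prefetch_part1(): prefetch the first
 * cache line of the mbuf header. */
static inline void
prefetch_mbuf_header(struct rte_mbuf *m)
{
	rte_prefetch0(m); /* the header starts at the first cache line */
}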
* Re: [PATCH] event/eth_tx: prefetch mbuf headers
2025-03-28 5:43 [PATCH] event/eth_tx: prefetch mbuf headers Mattias Rönnblom
@ 2025-03-28 6:07 ` Mattias Rönnblom
2025-05-20 12:56 ` Mattias Rönnblom
2025-05-27 5:01 ` [EXTERNAL] " Jerin Jacob
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: Mattias Rönnblom @ 2025-03-28 6:07 UTC (permalink / raw)
To: Mattias Rönnblom, dev; +Cc: Naga Harish K S V, Jerin Jacob, Peter Nilsson
On 2025-03-28 06:43, Mattias Rönnblom wrote:
> Prefetch mbuf headers, resulting in ~10% throughput improvement when
> the Ethernet RX and TX Adapters are hosted on the same core (likely
> ~2x in case a dedicated TX core is used).
>
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Tested-by: Peter Nilsson <peter.j.nilsson@ericsson.com>
What should be added is that the testing covered only the
non-RTE_EVENT_TYPE_VECTOR case.
> ---
> lib/eventdev/rte_event_eth_tx_adapter.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/lib/eventdev/rte_event_eth_tx_adapter.c b/lib/eventdev/rte_event_eth_tx_adapter.c
> index 67fff8b7d6..d740ae00f9 100644
> --- a/lib/eventdev/rte_event_eth_tx_adapter.c
> +++ b/lib/eventdev/rte_event_eth_tx_adapter.c
> @@ -598,6 +598,12 @@ txa_process_event_vector(struct txa_service_data *txa,
> return nb_tx;
> }
>
> +static inline void
> +txa_prefetch_mbuf(struct rte_mbuf *mbuf)
> +{
> + rte_mbuf_prefetch_part1(mbuf);
> +}
> +
> static void
> txa_service_tx(struct txa_service_data *txa, struct rte_event *ev,
> uint32_t n)
> @@ -608,6 +614,20 @@ txa_service_tx(struct txa_service_data *txa, struct rte_event *ev,
>
> stats = &txa->stats;
>
> + for (i = 0; i < n; i++) {
> + struct rte_event *event = &ev[i];
> +
> + if (unlikely(event->event_type & RTE_EVENT_TYPE_VECTOR)) {
> + struct rte_event_vector *vec = event->vec;
> + struct rte_mbuf **mbufs = vec->mbufs;
> + uint32_t k;
> +
> + for (k = 0; k < vec->nb_elem; k++)
> + txa_prefetch_mbuf(mbufs[k]);
> + } else
> + txa_prefetch_mbuf(event->mbuf);
> + }
> +
> nb_tx = 0;
> for (i = 0; i < n; i++) {
> uint16_t port;
* Re: [PATCH] event/eth_tx: prefetch mbuf headers
2025-03-28 6:07 ` Mattias Rönnblom
@ 2025-05-20 12:56 ` Mattias Rönnblom
0 siblings, 0 replies; 11+ messages in thread
From: Mattias Rönnblom @ 2025-05-20 12:56 UTC (permalink / raw)
To: Mattias Rönnblom, dev; +Cc: Naga Harish K S V, Jerin Jacob, Peter Nilsson
On 2025-03-28 07:07, Mattias Rönnblom wrote:
> On 2025-03-28 06:43, Mattias Rönnblom wrote:
>> Prefetch mbuf headers, resulting in ~10% throughput improvement when
>> the Ethernet RX and TX Adapters are hosted on the same core (likely
>> ~2x in case a dedicated TX core is used).
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> Tested-by: Peter Nilsson <peter.j.nilsson@ericsson.com>
>
<snip>
Naga, could you comment on this patch?
* RE: [EXTERNAL] [PATCH] event/eth_tx: prefetch mbuf headers
2025-03-28 5:43 [PATCH] event/eth_tx: prefetch mbuf headers Mattias Rönnblom
2025-03-28 6:07 ` Mattias Rönnblom
@ 2025-05-27 5:01 ` Jerin Jacob
2025-05-27 10:55 ` Naga Harish K, S V
2025-07-10 15:37 ` Stephen Hemminger
3 siblings, 0 replies; 11+ messages in thread
From: Jerin Jacob @ 2025-05-27 5:01 UTC (permalink / raw)
To: Mattias Rönnblom, dev, Naga Harish K S V
Cc: Mattias Rönnblom, Peter Nilsson
[-- Attachment #1: Type: text/html, Size: 8086 bytes --]
* RE: [PATCH] event/eth_tx: prefetch mbuf headers
2025-03-28 5:43 [PATCH] event/eth_tx: prefetch mbuf headers Mattias Rönnblom
2025-03-28 6:07 ` Mattias Rönnblom
2025-05-27 5:01 ` [EXTERNAL] " Jerin Jacob
@ 2025-05-27 10:55 ` Naga Harish K, S V
2025-07-02 20:19 ` Mattias Rönnblom
2025-07-10 15:37 ` Stephen Hemminger
3 siblings, 1 reply; 11+ messages in thread
From: Naga Harish K, S V @ 2025-05-27 10:55 UTC (permalink / raw)
To: Mattias Rönnblom, dev
Cc: Mattias Rönnblom, Jerin Jacob, Peter Nilsson
> -----Original Message-----
> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Sent: Friday, March 28, 2025 11:14 AM
> To: dev@dpdk.org
> Cc: Mattias Rönnblom <hofors@lysator.liu.se>; Naga Harish K, S V
> <s.v.naga.harish.k@intel.com>; Jerin Jacob <jerinj@marvell.com>; Mattias
> Rönnblom <mattias.ronnblom@ericsson.com>; Peter Nilsson
> <peter.j.nilsson@ericsson.com>
> Subject: [PATCH] event/eth_tx: prefetch mbuf headers
>
> Prefetch mbuf headers, resulting in ~10% throughput improvement when the
> Ethernet RX and TX Adapters are hosted on the same core (likely ~2x in case a
> dedicated TX core is used).
>
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Tested-by: Peter Nilsson <peter.j.nilsson@ericsson.com>
> ---
> lib/eventdev/rte_event_eth_tx_adapter.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/lib/eventdev/rte_event_eth_tx_adapter.c
> b/lib/eventdev/rte_event_eth_tx_adapter.c
> index 67fff8b7d6..d740ae00f9 100644
> --- a/lib/eventdev/rte_event_eth_tx_adapter.c
> +++ b/lib/eventdev/rte_event_eth_tx_adapter.c
> @@ -598,6 +598,12 @@ txa_process_event_vector(struct txa_service_data
> *txa,
> return nb_tx;
> }
>
> +static inline void
> +txa_prefetch_mbuf(struct rte_mbuf *mbuf) {
> + rte_mbuf_prefetch_part1(mbuf);
> +}
> +
> static void
> txa_service_tx(struct txa_service_data *txa, struct rte_event *ev,
> uint32_t n)
> @@ -608,6 +614,20 @@ txa_service_tx(struct txa_service_data *txa, struct
> rte_event *ev,
>
> stats = &txa->stats;
>
> + for (i = 0; i < n; i++) {
> + struct rte_event *event = &ev[i];
> +
> + if (unlikely(event->event_type & RTE_EVENT_TYPE_VECTOR))
This gives a branch prediction advantage to non-vector events. Is that the intention?
> {
> + struct rte_event_vector *vec = event->vec;
> + struct rte_mbuf **mbufs = vec->mbufs;
> + uint32_t k;
> +
> + for (k = 0; k < vec->nb_elem; k++)
> + txa_prefetch_mbuf(mbufs[k]);
> + } else
> + txa_prefetch_mbuf(event->mbuf);
> + }
> +
> nb_tx = 0;
> for (i = 0; i < n; i++) {
> uint16_t port;
> --
> 2.43.0
* Re: [PATCH] event/eth_tx: prefetch mbuf headers
2025-05-27 10:55 ` Naga Harish K, S V
@ 2025-07-02 20:19 ` Mattias Rönnblom
2025-07-07 9:00 ` Naga Harish K, S V
0 siblings, 1 reply; 11+ messages in thread
From: Mattias Rönnblom @ 2025-07-02 20:19 UTC (permalink / raw)
To: Naga Harish K, S V, Mattias Rönnblom, dev; +Cc: Jerin Jacob, Peter Nilsson
On 2025-05-27 12:55, Naga Harish K, S V wrote:
>
>
>> -----Original Message-----
>> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> Sent: Friday, March 28, 2025 11:14 AM
>> To: dev@dpdk.org
>> Cc: Mattias Rönnblom <hofors@lysator.liu.se>; Naga Harish K, S V
>> <s.v.naga.harish.k@intel.com>; Jerin Jacob <jerinj@marvell.com>; Mattias
>> Rönnblom <mattias.ronnblom@ericsson.com>; Peter Nilsson
>> <peter.j.nilsson@ericsson.com>
>> Subject: [PATCH] event/eth_tx: prefetch mbuf headers
>>
>> Prefetch mbuf headers, resulting in ~10% throughput improvement when the
>> Ethernet RX and TX Adapters are hosted on the same core (likely ~2x in case a
>> dedicated TX core is used).
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> Tested-by: Peter Nilsson <peter.j.nilsson@ericsson.com>
>> ---
>> lib/eventdev/rte_event_eth_tx_adapter.c | 20 ++++++++++++++++++++
>> 1 file changed, 20 insertions(+)
>>
>> diff --git a/lib/eventdev/rte_event_eth_tx_adapter.c
>> b/lib/eventdev/rte_event_eth_tx_adapter.c
>> index 67fff8b7d6..d740ae00f9 100644
>> --- a/lib/eventdev/rte_event_eth_tx_adapter.c
>> +++ b/lib/eventdev/rte_event_eth_tx_adapter.c
>> @@ -598,6 +598,12 @@ txa_process_event_vector(struct txa_service_data
>> *txa,
>> return nb_tx;
>> }
>>
>> +static inline void
>> +txa_prefetch_mbuf(struct rte_mbuf *mbuf) {
>> + rte_mbuf_prefetch_part1(mbuf);
>> +}
>> +
>> static void
>> txa_service_tx(struct txa_service_data *txa, struct rte_event *ev,
>> uint32_t n)
>> @@ -608,6 +614,20 @@ txa_service_tx(struct txa_service_data *txa, struct
>> rte_event *ev,
>>
>> stats = &txa->stats;
>>
>> + for (i = 0; i < n; i++) {
>> + struct rte_event *event = &ev[i];
>> +
>> + if (unlikely(event->event_type & RTE_EVENT_TYPE_VECTOR))
>
>
> This gives a branch prediction advantage to non-vector events. Is that the intention?
>
Yes.
>> {
>> + struct rte_event_vector *vec = event->vec;
>> + struct rte_mbuf **mbufs = vec->mbufs;
>> + uint32_t k;
>> +
>> + for (k = 0; k < vec->nb_elem; k++)
>> + txa_prefetch_mbuf(mbufs[k]);
>> + } else
>> + txa_prefetch_mbuf(event->mbuf);
>> + }
>> +
>> nb_tx = 0;
>> for (i = 0; i < n; i++) {
>> uint16_t port;
>> --
>> 2.43.0
>
* RE: [PATCH] event/eth_tx: prefetch mbuf headers
2025-07-02 20:19 ` Mattias Rönnblom
@ 2025-07-07 9:00 ` Naga Harish K, S V
2025-07-07 11:57 ` Mattias Rönnblom
0 siblings, 1 reply; 11+ messages in thread
From: Naga Harish K, S V @ 2025-07-07 9:00 UTC (permalink / raw)
To: Mattias Rönnblom, Mattias Rönnblom, dev
Cc: Jerin Jacob, Peter Nilsson
> -----Original Message-----
> From: Mattias Rönnblom <hofors@lysator.liu.se>
> Sent: Thursday, July 3, 2025 1:50 AM
> To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>; Mattias Rönnblom
> <mattias.ronnblom@ericsson.com>; dev@dpdk.org
> Cc: Jerin Jacob <jerinj@marvell.com>; Peter Nilsson
> <peter.j.nilsson@ericsson.com>
> Subject: Re: [PATCH] event/eth_tx: prefetch mbuf headers
>
> On 2025-05-27 12:55, Naga Harish K, S V wrote:
> >
> >
> >> -----Original Message-----
> >> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >> Sent: Friday, March 28, 2025 11:14 AM
> >> To: dev@dpdk.org
> >> Cc: Mattias Rönnblom <hofors@lysator.liu.se>; Naga Harish K, S V
> >> <s.v.naga.harish.k@intel.com>; Jerin Jacob <jerinj@marvell.com>;
> >> Mattias Rönnblom <mattias.ronnblom@ericsson.com>; Peter Nilsson
> >> <peter.j.nilsson@ericsson.com>
> >> Subject: [PATCH] event/eth_tx: prefetch mbuf headers
> >>
> >> Prefetch mbuf headers, resulting in ~10% throughput improvement when
> >> the Ethernet RX and TX Adapters are hosted on the same core (likely
> >> ~2x in case a dedicated TX core is used).
> >>
> >> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >> Tested-by: Peter Nilsson <peter.j.nilsson@ericsson.com>
> >> ---
> >> lib/eventdev/rte_event_eth_tx_adapter.c | 20 ++++++++++++++++++++
> >> 1 file changed, 20 insertions(+)
> >>
> >> diff --git a/lib/eventdev/rte_event_eth_tx_adapter.c
> >> b/lib/eventdev/rte_event_eth_tx_adapter.c
> >> index 67fff8b7d6..d740ae00f9 100644
> >> --- a/lib/eventdev/rte_event_eth_tx_adapter.c
> >> +++ b/lib/eventdev/rte_event_eth_tx_adapter.c
> >> @@ -598,6 +598,12 @@ txa_process_event_vector(struct
> txa_service_data
> >> *txa,
> >> return nb_tx;
> >> }
> >>
> >> +static inline void
> >> +txa_prefetch_mbuf(struct rte_mbuf *mbuf) {
> >> + rte_mbuf_prefetch_part1(mbuf);
> >> +}
> >> +
> >> static void
> >> txa_service_tx(struct txa_service_data *txa, struct rte_event *ev,
> >> uint32_t n)
> >> @@ -608,6 +614,20 @@ txa_service_tx(struct txa_service_data *txa,
> >> struct rte_event *ev,
> >>
> >> stats = &txa->stats;
> >>
> >> + for (i = 0; i < n; i++) {
> >> + struct rte_event *event = &ev[i];
> >> +
> >> + if (unlikely(event->event_type & RTE_EVENT_TYPE_VECTOR))
> >
> >
> > This gives a branch prediction advantage to non-vector events. Is that the
> intention?
> >
>
> Yes.
I think all event types should be weighted equally. My ask was to remove the "unlikely" for vector events.
>
> >> {
> >> + struct rte_event_vector *vec = event->vec;
> >> + struct rte_mbuf **mbufs = vec->mbufs;
> >> + uint32_t k;
> >> +
> >> + for (k = 0; k < vec->nb_elem; k++)
> >> + txa_prefetch_mbuf(mbufs[k]);
> >> + } else
> >> + txa_prefetch_mbuf(event->mbuf);
> >> + }
> >> +
> >> nb_tx = 0;
> >> for (i = 0; i < n; i++) {
> >> uint16_t port;
> >> --
> >> 2.43.0
> >
* Re: [PATCH] event/eth_tx: prefetch mbuf headers
2025-07-07 9:00 ` Naga Harish K, S V
@ 2025-07-07 11:57 ` Mattias Rönnblom
2025-07-10 4:34 ` Naga Harish K, S V
0 siblings, 1 reply; 11+ messages in thread
From: Mattias Rönnblom @ 2025-07-07 11:57 UTC (permalink / raw)
To: Naga Harish K, S V, Mattias Rönnblom, dev; +Cc: Jerin Jacob, Peter Nilsson
On 2025-07-07 11:00, Naga Harish K, S V wrote:
>
>
>> -----Original Message-----
>> From: Mattias Rönnblom <hofors@lysator.liu.se>
>> Sent: Thursday, July 3, 2025 1:50 AM
>> To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>; Mattias Rönnblom
>> <mattias.ronnblom@ericsson.com>; dev@dpdk.org
>> Cc: Jerin Jacob <jerinj@marvell.com>; Peter Nilsson
>> <peter.j.nilsson@ericsson.com>
>> Subject: Re: [PATCH] event/eth_tx: prefetch mbuf headers
>>
>> On 2025-05-27 12:55, Naga Harish K, S V wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>>> Sent: Friday, March 28, 2025 11:14 AM
>>>> To: dev@dpdk.org
>>>> Cc: Mattias Rönnblom <hofors@lysator.liu.se>; Naga Harish K, S V
>>>> <s.v.naga.harish.k@intel.com>; Jerin Jacob <jerinj@marvell.com>;
>>>> Mattias Rönnblom <mattias.ronnblom@ericsson.com>; Peter Nilsson
>>>> <peter.j.nilsson@ericsson.com>
>>>> Subject: [PATCH] event/eth_tx: prefetch mbuf headers
>>>>
>>>> Prefetch mbuf headers, resulting in ~10% throughput improvement when
>>>> the Ethernet RX and TX Adapters are hosted on the same core (likely
>>>> ~2x in case a dedicated TX core is used).
>>>>
>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>>> Tested-by: Peter Nilsson <peter.j.nilsson@ericsson.com>
>>>> ---
>>>> lib/eventdev/rte_event_eth_tx_adapter.c | 20 ++++++++++++++++++++
>>>> 1 file changed, 20 insertions(+)
>>>>
>>>> diff --git a/lib/eventdev/rte_event_eth_tx_adapter.c
>>>> b/lib/eventdev/rte_event_eth_tx_adapter.c
>>>> index 67fff8b7d6..d740ae00f9 100644
>>>> --- a/lib/eventdev/rte_event_eth_tx_adapter.c
>>>> +++ b/lib/eventdev/rte_event_eth_tx_adapter.c
>>>> @@ -598,6 +598,12 @@ txa_process_event_vector(struct
>> txa_service_data
>>>> *txa,
>>>> return nb_tx;
>>>> }
>>>>
>>>> +static inline void
>>>> +txa_prefetch_mbuf(struct rte_mbuf *mbuf) {
>>>> + rte_mbuf_prefetch_part1(mbuf);
>>>> +}
>>>> +
>>>> static void
>>>> txa_service_tx(struct txa_service_data *txa, struct rte_event *ev,
>>>> uint32_t n)
>>>> @@ -608,6 +614,20 @@ txa_service_tx(struct txa_service_data *txa,
>>>> struct rte_event *ev,
>>>>
>>>> stats = &txa->stats;
>>>>
>>>> + for (i = 0; i < n; i++) {
>>>> + struct rte_event *event = &ev[i];
>>>> +
>>>> + if (unlikely(event->event_type & RTE_EVENT_TYPE_VECTOR))
>>>
>>>
>>> This gives a branch prediction advantage to non-vector events. Is that the
>> intention?
>>>
>>
>> Yes.
>
> I think all event-types need to be equally weighted. My ask was to remove the "unlikely" for vector events.
>
This is not possible. One branch will always be cheaper. If you leave
out unlikely()/likely(), you leave all control to compiler heuristics.
In this case, I think the resulting object code will be identical (on GCC).
RTE_EVENT_TYPE_VECTOR will result in fewer events, and thus the
per-event overhead is less of an issue. So if you weigh the importance
of vector and non-vector use cases equally, you should optimize for the
non-vector case.
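
To illustrate the point about compiler heuristics: DPDK's likely()/unlikely()
are thin wrappers around GCC's __builtin_expect(), which steers which side of
the branch becomes the fall-through path. A minimal sketch (not the verbatim
rte_branch_prediction.h definitions; IS_VECTOR() and the 0x8 value are
stand-ins for the real RTE_EVENT_TYPE_VECTOR test):

#include <stdint.h>

/* Essentially how rte_branch_prediction.h defines the hints. */
#define my_likely(x)   __builtin_expect(!!(x), 1)
#define my_unlikely(x) __builtin_expect(!!(x), 0)

/* Stand-in for the RTE_EVENT_TYPE_VECTOR test. */
#define IS_VECTOR(t)   ((t) & 0x8)

/* With the unlikely() hint, GCC lays out the non-vector path as
 * straight-line code and moves the vector path out of line, so the
 * common case pays no taken-branch cost from code placement. */
int
count_vector_events(const uint8_t *event_types, int n)
{
	int i, vectors = 0;

	for (i = 0; i < n; i++)
		if (my_unlikely(IS_VECTOR(event_types[i])))
			vectors++;

	return vectors;
}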
>>
>>>> {
>>>> + struct rte_event_vector *vec = event->vec;
>>>> + struct rte_mbuf **mbufs = vec->mbufs;
>>>> + uint32_t k;
>>>> +
>>>> + for (k = 0; k < vec->nb_elem; k++)
>>>> + txa_prefetch_mbuf(mbufs[k]);
>>>> + } else
>>>> + txa_prefetch_mbuf(event->mbuf);
>>>> + }
>>>> +
>>>> nb_tx = 0;
>>>> for (i = 0; i < n; i++) {
>>>> uint16_t port;
>>>> --
>>>> 2.43.0
>>>
>
* RE: [PATCH] event/eth_tx: prefetch mbuf headers
2025-07-07 11:57 ` Mattias Rönnblom
@ 2025-07-10 4:34 ` Naga Harish K, S V
0 siblings, 0 replies; 11+ messages in thread
From: Naga Harish K, S V @ 2025-07-10 4:34 UTC (permalink / raw)
To: Mattias Rönnblom, Mattias Rönnblom, dev
Cc: Jerin Jacob, Peter Nilsson
> -----Original Message-----
> From: Mattias Rönnblom <hofors@lysator.liu.se>
> Sent: Monday, July 7, 2025 5:27 PM
> To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>; Mattias Rönnblom
> <mattias.ronnblom@ericsson.com>; dev@dpdk.org
> Cc: Jerin Jacob <jerinj@marvell.com>; Peter Nilsson
> <peter.j.nilsson@ericsson.com>
> Subject: Re: [PATCH] event/eth_tx: prefetch mbuf headers
>
> On 2025-07-07 11:00, Naga Harish K, S V wrote:
> >
> >
> >> -----Original Message-----
> >> From: Mattias Rönnblom <hofors@lysator.liu.se>
> >> Sent: Thursday, July 3, 2025 1:50 AM
> >> To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>; Mattias
> >> Rönnblom <mattias.ronnblom@ericsson.com>; dev@dpdk.org
> >> Cc: Jerin Jacob <jerinj@marvell.com>; Peter Nilsson
> >> <peter.j.nilsson@ericsson.com>
> >> Subject: Re: [PATCH] event/eth_tx: prefetch mbuf headers
> >>
> >> On 2025-05-27 12:55, Naga Harish K, S V wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >>>> Sent: Friday, March 28, 2025 11:14 AM
> >>>> To: dev@dpdk.org
> >>>> Cc: Mattias Rönnblom <hofors@lysator.liu.se>; Naga Harish K, S V
> >>>> <s.v.naga.harish.k@intel.com>; Jerin Jacob <jerinj@marvell.com>;
> >>>> Mattias Rönnblom <mattias.ronnblom@ericsson.com>; Peter Nilsson
> >>>> <peter.j.nilsson@ericsson.com>
> >>>> Subject: [PATCH] event/eth_tx: prefetch mbuf headers
> >>>>
> >>>> Prefetch mbuf headers, resulting in ~10% throughput improvement
> >>>> when the Ethernet RX and TX Adapters are hosted on the same core
> >>>> (likely ~2x in case a dedicated TX core is used).
> >>>>
> >>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >>>> Tested-by: Peter Nilsson <peter.j.nilsson@ericsson.com>
Acked-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
> >>>> ---
> >>>> lib/eventdev/rte_event_eth_tx_adapter.c | 20
> ++++++++++++++++++++
> >>>> 1 file changed, 20 insertions(+)
> >>>>
> >>>> diff --git a/lib/eventdev/rte_event_eth_tx_adapter.c
> >>>> b/lib/eventdev/rte_event_eth_tx_adapter.c
> >>>> index 67fff8b7d6..d740ae00f9 100644
> >>>> --- a/lib/eventdev/rte_event_eth_tx_adapter.c
> >>>> +++ b/lib/eventdev/rte_event_eth_tx_adapter.c
> >>>> @@ -598,6 +598,12 @@ txa_process_event_vector(struct
> >> txa_service_data
> >>>> *txa,
> >>>> return nb_tx;
> >>>> }
> >>>>
> >>>> +static inline void
> >>>> +txa_prefetch_mbuf(struct rte_mbuf *mbuf) {
> >>>> + rte_mbuf_prefetch_part1(mbuf);
> >>>> +}
> >>>> +
> >>>> static void
> >>>> txa_service_tx(struct txa_service_data *txa, struct rte_event *ev,
> >>>> uint32_t n)
> >>>> @@ -608,6 +614,20 @@ txa_service_tx(struct txa_service_data *txa,
> >>>> struct rte_event *ev,
> >>>>
> >>>> stats = &txa->stats;
> >>>>
> >>>> + for (i = 0; i < n; i++) {
> >>>> + struct rte_event *event = &ev[i];
> >>>> +
> >>>> + if (unlikely(event->event_type & RTE_EVENT_TYPE_VECTOR))
> >>>
> >>>
> >>> This gives a branch prediction advantage to non-vector events. Is
> >>> that the
> >> intention?
> >>>
> >>
> >> Yes.
> >
> > I think all event-types need to be equally weighted. My ask was to remove
> the "unlikely" for vector events.
> >
>
> This is not possible. One branch will always be cheaper. If you leave out
> unlikely()/likely(), you leave all control to compiler heuristics.
> In this case, I think the resulting object code will be identical (on GCC).
>
> RTE_EVENT_TYPE_VECTOR will result in fewer events, and thus the per-event
> overhead is less of an issue. So if you weigh the importance of vector and non-
> vector use cases equally, you should optimize for the non-vector case.
>
Fine, agreed.
> >>
> >>>> {
> >>>> + struct rte_event_vector *vec = event->vec;
> >>>> + struct rte_mbuf **mbufs = vec->mbufs;
> >>>> + uint32_t k;
> >>>> +
> >>>> + for (k = 0; k < vec->nb_elem; k++)
> >>>> + txa_prefetch_mbuf(mbufs[k]);
> >>>> + } else
> >>>> + txa_prefetch_mbuf(event->mbuf);
> >>>> + }
> >>>> +
> >>>> nb_tx = 0;
> >>>> for (i = 0; i < n; i++) {
> >>>> uint16_t port;
> >>>> --
> >>>> 2.43.0
> >>>
> >
* Re: [PATCH] event/eth_tx: prefetch mbuf headers
2025-03-28 5:43 [PATCH] event/eth_tx: prefetch mbuf headers Mattias Rönnblom
` (2 preceding siblings ...)
2025-05-27 10:55 ` Naga Harish K, S V
@ 2025-07-10 15:37 ` Stephen Hemminger
2025-07-11 12:44 ` Mattias Rönnblom
3 siblings, 1 reply; 11+ messages in thread
From: Stephen Hemminger @ 2025-07-10 15:37 UTC (permalink / raw)
To: Mattias Rönnblom
Cc: dev, Mattias Rönnblom, Naga Harish K S V, Jerin Jacob,
Peter Nilsson
On Fri, 28 Mar 2025 06:43:39 +0100
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> Prefetch mbuf headers, resulting in ~10% throughput improvement when
> the Ethernet RX and TX Adapters are hosted on the same core (likely
> ~2x in case a dedicated TX core is used).
>
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Tested-by: Peter Nilsson <peter.j.nilsson@ericsson.com>
Prefetching all the mbufs can be counterproductive on a big burst.
VPP does something similar but more unrolled.
See https://fd.io/docs/vpp/v2101/gettingstarted/developers/vnet.html#single-dual-loops
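
The referenced pattern interleaves work and prefetching: rather than
prefetching the whole burst up front, each iteration prefetches a header a
fixed distance ahead of the packet being processed. A simplified single-loop
sketch of the idea in DPDK terms (PREFETCH_AHEAD and do_something_to() are
placeholders, not the adapter's actual code):

#include <rte_mbuf.h>
#include <rte_prefetch.h>

#define PREFETCH_AHEAD 4 /* placeholder prefetch distance */

/* Placeholder for the per-packet work (in the TX adapter, queueing the
 * mbuf for transmission). */
extern void do_something_to(struct rte_mbuf *m);

static void
process_burst(struct rte_mbuf **mbufs, uint16_t n)
{
	uint16_t i;

	for (i = 0; i < n; i++) {
		/* Prefetch a header a few packets ahead of its use. */
		if (i + PREFETCH_AHEAD < n)
			rte_prefetch0(mbufs[i + PREFETCH_AHEAD]);

		do_something_to(mbufs[i]);
	}
}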
* Re: [PATCH] event/eth_tx: prefetch mbuf headers
2025-07-10 15:37 ` Stephen Hemminger
@ 2025-07-11 12:44 ` Mattias Rönnblom
0 siblings, 0 replies; 11+ messages in thread
From: Mattias Rönnblom @ 2025-07-11 12:44 UTC (permalink / raw)
To: Stephen Hemminger, Mattias Rönnblom
Cc: dev, Naga Harish K S V, Jerin Jacob, Peter Nilsson
On 2025-07-10 17:37, Stephen Hemminger wrote:
> On Fri, 28 Mar 2025 06:43:39 +0100
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
>
>> Prefetch mbuf headers, resulting in ~10% throughput improvement when
>> the Ethernet RX and TX Adapters are hosted on the same core (likely
>> ~2x in case a dedicated TX core is used).
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> Tested-by: Peter Nilsson <peter.j.nilsson@ericsson.com>
>
> Prefetching all the mbufs can be counter productive on a big burst.
>
For the non-vector case, the burst is no larger than 32. From what's
available in terms of public information, the number of load queue
entries is 72 on Skylake. What it is on newer microarchitecture
generations, I don't know. So 32 is a lot of prefetches, but at least
likely smaller than the load queue.
> VPP does something similar but more unrolled.
> See https://fd.io/docs/vpp/v2101/gettingstarted/developers/vnet.html#single-dual-loops
This pattern makes sense if the do_something_to() function has
non-trivial latency.
If it doesn't, which I suspect is the case for the TX adapter, you
will issue 4 prefetches, of which some or even all aren't resolved
before the core needs the data. Repeat.
Also - and I'm guessing now - the do_something_to() equivalent in the TX
adapter case is likely not allocating a lot of load buffer entries, so
there is little risk of the prefetches being discarded.
That said, I'm sure you can tweak non-vector TXA prefetching to further
improve performance. For example, there may be little point in prefetching
the first few mbuf headers, since you will need that data very soon indeed.
I no longer have the setup to further refine this patch. I suggest we
live with only ~20% performance gain at this point.
For the vector case, I agree this loop may result in too many prefetches.
I can remove prefetching from the vector case, to maintain legacy
performance. I could also cap the number of prefetches (e.g., to 32).
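
A cap on vector prefetching could look roughly like the sketch below;
TXA_MAX_PREFETCH and txa_prefetch_event_capped() are hypothetical names, not
part of the posted patch:

/* Hypothetical helper bounding the number of prefetches issued for a
 * single event vector. */
#define TXA_MAX_PREFETCH 32

static inline void
txa_prefetch_event_capped(struct rte_event *event)
{
	if (unlikely(event->event_type & RTE_EVENT_TYPE_VECTOR)) {
		struct rte_event_vector *vec = event->vec;
		uint32_t num = RTE_MIN((uint32_t)vec->nb_elem,
				       (uint32_t)TXA_MAX_PREFETCH);
		uint32_t k;

		for (k = 0; k < num; k++)
			txa_prefetch_mbuf(vec->mbufs[k]);
	} else
		txa_prefetch_mbuf(event->mbuf);
}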