From: "Mattias Rönnblom" <hofors@lysator.liu.se>
To: "Stephen Hemminger" <stephen@networkplumber.org>,
"Mattias Rönnblom" <mattias.ronnblom@ericsson.com>
Cc: dev@dpdk.org, Naga Harish K S V <s.v.naga.harish.k@intel.com>,
Jerin Jacob <jerinj@marvell.com>,
Peter Nilsson <peter.j.nilsson@ericsson.com>
Subject: Re: [PATCH] event/eth_tx: prefetch mbuf headers
Date: Fri, 11 Jul 2025 14:44:07 +0200
Message-ID: <1dd2aaf4-9bc7-43fb-8261-0c4c9387ad07@lysator.liu.se>
In-Reply-To: <20250710083747.6f613e7f@hermes.local>
On 2025-07-10 17:37, Stephen Hemminger wrote:
> On Fri, 28 Mar 2025 06:43:39 +0100
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
>
>> Prefetch mbuf headers, resulting in ~10% throughput improvement when
>> the Ethernet RX and TX Adapters are hosted on the same core (likely
>> ~2x in case a dedicated TX core is used).
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> Tested-by: Peter Nilsson <peter.j.nilsson@ericsson.com>
>
> Prefetching all the mbufs can be counter productive on a big burst.
>
For the non-vector case, the burst is no larger than 32. From what's
publicly available, the load queue has 72 entries on Skylake; what it
is on newer microarchitecture generations, I don't know. So 32 is a
lot of prefetches, but likely still fewer than the load queue can
hold.
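In other words, something along these lines (just a sketch;
prefetch_mbuf_burst() is a made-up name, not the actual TXA code):

#include <rte_mbuf.h>
#include <rte_prefetch.h>

/* Sketch: prefetch the first cache line of every mbuf header in the
 * burst before any of them is dereferenced. nb_mbufs is at most 32
 * in the non-vector case, which should stay below the ~72 load
 * queue entries of a Skylake core. */
static inline void
prefetch_mbuf_burst(struct rte_mbuf **mbufs, uint16_t nb_mbufs)
{
        uint16_t i;

        for (i = 0; i < nb_mbufs; i++)
                rte_mbuf_prefetch_part1(mbufs[i]);
}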
> VPP does something similar but more unrolled.
> See https://fd.io/docs/vpp/v2101/gettingstarted/developers/vnet.html#single-dual-loops
This pattern makes sense if the do_something_to() function has
non-trivial latency.
If it doesn't, which I suspect is the case for the TX adapter, you
will issue 4 prefetches, some or even all of which aren't resolved
before the core needs the data. Repeat.
Also - and I'm guessing now - the do_something_to() equivalent in the
TX adapter is likely not allocating a lot of load buffer entries, so
there is little risk of the prefetches being discarded.
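For reference, the VPP-style pattern boils down to something like
this (simplified to a single loop with a prefetch distance of 4;
do_something_to() is the placeholder name from the linked VPP
documentation, declared here only to make the sketch compile):

/* Placeholder for the per-packet work from the VPP example. */
static void do_something_to(struct rte_mbuf *m);

/* Sketch of the pipelined variant: prefetch a few iterations ahead
 * while processing the current mbuf. This only pays off if
 * do_something_to() has enough latency to hide the prefetch. */
static inline void
process_burst_pipelined(struct rte_mbuf **mbufs, uint16_t n)
{
        uint16_t i;

        for (i = 0; i < n; i++) {
                if (i + 4 < n)
                        rte_mbuf_prefetch_part1(mbufs[i + 4]);

                do_something_to(mbufs[i]);
        }
}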
That said, I'm sure you can tweak non-vector TXA prefetching to
further improve performance. For example, there may be little point
in prefetching the first few mbuf headers, since you will need that
data very soon anyway.
I no longer have the setup to further refine this patch. I suggest we
live with only a ~20% performance gain at this point.
For the vector case, I agree this loop may result in too many
prefetches. I can remove prefetching from the vector case to maintain
legacy performance. I could also cap the number of prefetches (e.g.,
at 32), as sketched below.
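A capped version could look like this (again just a sketch;
MAX_PREFETCH and the function name are made up, not actual TXA code):

#include <rte_common.h>
#include <rte_mbuf.h>

/* Made-up cap: only the first MAX_PREFETCH mbuf headers of a large
 * vector are prefetched, to avoid overrunning the load queue. */
#define MAX_PREFETCH 32

static inline void
prefetch_vector_capped(struct rte_mbuf **mbufs, uint16_t nb_mbufs)
{
        uint16_t i;
        uint16_t n = RTE_MIN(nb_mbufs, (uint16_t)MAX_PREFETCH);

        for (i = 0; i < n; i++)
                rte_mbuf_prefetch_part1(mbufs[i]);
}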
Thread overview: 11+ messages
2025-03-28 5:43 Mattias Rönnblom
2025-03-28 6:07 ` Mattias Rönnblom
2025-05-20 12:56 ` Mattias Rönnblom
2025-05-27 5:01 ` [EXTERNAL] " Jerin Jacob
2025-05-27 10:55 ` Naga Harish K, S V
2025-07-02 20:19 ` Mattias Rönnblom
2025-07-07 9:00 ` Naga Harish K, S V
2025-07-07 11:57 ` Mattias Rönnblom
2025-07-10 4:34 ` Naga Harish K, S V
2025-07-10 15:37 ` Stephen Hemminger
2025-07-11 12:44 ` Mattias Rönnblom [this message]