From: Bruce Richardson <bruce.richardson@intel.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path tx perf
Date: Thu, 18 Sep 2014 14:36:13 +0100
Message-ID: <20140918133613.GA7208@BRICHA3-MOBL>
In-Reply-To: <20140917175936.GA13492@hmsreliant.think-freely.org>
On Wed, Sep 17, 2014 at 01:59:36PM -0400, Neil Horman wrote:
> On Wed, Sep 17, 2014 at 03:35:19PM +0000, Richardson, Bruce wrote:
> >
> > > -----Original Message-----
> > > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > > Sent: Wednesday, September 17, 2014 4:21 PM
> > > To: Richardson, Bruce
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path tx perf
> > >
> > > On Wed, Sep 17, 2014 at 11:01:39AM +0100, Bruce Richardson wrote:
> > > > Make a small improvement to slow path TX performance by adding in a
> > > > prefetch for the second mbuf cache line.
> > > > Also, only assign the l2/l3 length values when needed.
> > > >
> > > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > > ---
> > > > lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 +++++++-----
> > > > 1 file changed, 7 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > > > index 6f702b3..c0bb49f 100644
> > > > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > > > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > > > @@ -565,25 +565,26 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> > > > ixgbe_xmit_cleanup(txq);
> > > > }
> > > >
> > > > + rte_prefetch0(&txe->mbuf->pool);
> > > > +
> > >
> > > Can you explain what all of these prefetches are doing? It looks to me like
> > > they're just fetching the first cache line of the mempool structure, which
> > > appears to amount to just the pool's name. I don't see that having any use here.
> > >
> > This does make a decent enough performance difference in my tests (the amount varies depending on the RX path being used by testpmd).
> >
> > What I've done with the prefetches is two-fold:
> > 1) changed it from prefetching the mbuf (first cache line) to prefetching the mbuf pool pointer (second cache line) so that when we go to access the pool pointer to free transmitted mbufs we don't get a cache miss. When clearing the ring and freeing mbufs, the pool pointer is the only mbuf field used, so we don't need that first cache line.
> ok, this makes some sense, but you're not guaranteed that the prefetch will
> be needed, nor that the data will still be in cache by the time you get to
> the free call. Seems like it might be preferable to prefetch the data
> pointed to by tx_pkt, as you're sure to use that every loop iteration.
In the vast majority of cases the prefetch is necessary, and doing things
this way does help performance. If the prefetch turns out to be unnecessary,
it costs just one extra instruction, while, if it is needed, issuing the
prefetch 20 cycles before the access (picking an arbitrary value) cuts 20
cycles off the time it takes to pull the data from cache when it is needed.
As for the value pointed to by tx_pkt, since this is a packet the app has
just been working on, it's almost certainly already in L1/L2 cache.
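
To make point 1 concrete, here's a minimal sketch of the pattern being
discussed (hypothetical helper names, and the free is shown as a raw
mempool put, ignoring refcounting, rather than the driver's real free
path):

#include <rte_mbuf.h>
#include <rte_mempool.h>
#include <rte_prefetch.h>

/* Pull in the cache line holding m->pool ahead of the free call; the
 * pool pointer sits on the mbuf's second cache line. */
static inline void
tx_entry_prefetch(struct rte_mbuf *m)
{
	rte_prefetch0(&m->pool);
}

/* Return the mbuf to its pool. No first-cache-line field is touched,
 * so the earlier prefetch covers everything this step needs. */
static inline void
tx_entry_free(struct rte_mbuf *m)
{
	rte_mempool_put(m->pool, m);
}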
>
> > 2) changed the code to prefetch earlier - in effect to prefetch one mbuf ahead. The original code prefetched the mbuf to be freed only as it started processing the mbuf that replaces it. Now, every time we calculate what the next mbuf position is going to be, we prefetch the mbuf in that position (i.e. the pool pointer of the mbuf we are going to free), even while we are still updating the previous mbuf slot on the ring. This gives the prefetch much more time to resolve and get the data we need into the cache before we need it.
> >
> Again, earlier isn't necessarily better, as it just means more time for the
> data in cache to get evicted. It seems like it would be better to prefetch
> the tx_pkts data a few cache lines ahead.
>
> Neil
Basically it all comes down to measured performance - working with
prefetches is not an exact science, sadly. I've just re-run a quick sanity
test on this patch in the sequence. Running testpmd on a single core with
40G of small-packet input, I see considerable performance increases. What
I've run is:
* testpmd with a single forwarding core, defaults - which means slow-path RX
+ slow-path TX (i.e. this code): performance with this patch increases by
almost 8%
* testpmd with a single forwarding core, defaults + rxfreet=32 - which means
vector RX path + slow-path TX (again, this code path): performance
increases by over 18%.
Given these numbers, the prefetching seems better this way. Perhaps you
could run some tests yourself and check whether you see a similar
performance delta (or perhaps there are other scenarios I'm missing here)?
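
For reference, the shape of the change described in point 2 above is
roughly the following - a simplified sketch with a made-up ring-entry
type, not the actual ixgbe_xmit_pkts() code:

#include <stdint.h>
#include <rte_mbuf.h>
#include <rte_prefetch.h>

/* Simplified stand-in for the driver's software TX ring entry. */
struct sw_tx_entry {
	struct rte_mbuf *mbuf;
	uint16_t next_id;
};

static inline void
tx_fill_loop(struct sw_tx_entry *sw_ring, uint16_t tx_id,
	     struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
{
	struct sw_tx_entry *txe = &sw_ring[tx_id];
	uint16_t i;

	for (i = 0; i < nb_pkts; i++) {
		struct sw_tx_entry *txn = &sw_ring[txe->next_id];

		/* Prefetch one mbuf ahead: fetch the cache line holding
		 * the *next* entry's pool pointer while the current
		 * descriptor is still being set up, giving the prefetch
		 * time to resolve before that mbuf is freed. */
		if (txn->mbuf != NULL)
			rte_prefetch0(&txn->mbuf->pool);

		/* ... write the HW descriptor for tx_pkts[i], freeing or
		 * replacing txe->mbuf as the real driver does ... */
		txe->mbuf = tx_pkts[i];
		txe = txn;
	}
}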
Regards,
/Bruce