From: Paul Emmerich <emmericp@net.in.tum.de>
To: dev@dpdk.org
Subject: [dpdk-dev] TX performance regression caused by the mbuf cachline split
Date: Mon, 11 May 2015 02:14:58 +0200
Message-ID: <554FF482.9080103@net.in.tum.de>
Hi,
this is a follow-up to my post from three weeks ago [1]. I'm starting a new
thread since I now have a completely new test setup for improved
reproducibility.
Background for anyone who didn't catch my last post:
I'm investigating a performance regression in my packet generator [2]
that appeared when I tried to upgrade from DPDK 1.7.1 to 1.8 or 2.0:
DPDK 1.7.1 is about 25% faster than 2.0 in my application.
I suspected that this was due to the new two-cacheline mbufs, which I
have now confirmed with a bisect.
My old test setup was based on the l2fwd example, required an external
packet generator, and was therefore hard to reproduce.
I built a simple tx benchmark application that just sends nonsensical
packets containing a sequence number as fast as possible on two ports
from a single core. You can download the benchmark app at [3].
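For reference, the core of the benchmark is nothing more than a tight
alloc/fill/send loop. A minimal sketch of the idea (not the actual code
from [3]; initialization, error handling and the second port are omitted,
names are made up):

    #include <stdint.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>
    #include <rte_mempool.h>

    #define BURST_SIZE 64
    #define PKT_SIZE   60

    /* send minimal packets carrying only a sequence number, forever */
    static void tx_loop(struct rte_mempool *pool, uint8_t port)
    {
        struct rte_mbuf *bufs[BURST_SIZE];
        uint32_t seq = 0;
        uint16_t sent;
        int i;

        for (;;) {
            for (i = 0; i < BURST_SIZE; i++) {
                struct rte_mbuf *m = rte_pktmbuf_alloc(pool);
                uint8_t *pkt;

                if (m == NULL)
                    break;
                pkt = rte_pktmbuf_mtod(m, uint8_t *);
                /* nonsensical payload: a sequence number placed right
                 * after the (uninitialized) ethernet header */
                *(uint32_t *)(pkt + 14) = seq++;
                m->data_len = PKT_SIZE;
                m->pkt_len  = PKT_SIZE;
                bufs[i] = m;
            }
            sent = rte_eth_tx_burst(port, 0, bufs, i);
            /* free whatever the NIC did not accept this time */
            while (sent < i)
                rte_pktmbuf_free(bufs[sent++]);
        }
    }

The simple vs. full-featured tx path is selected by the ixgbe driver
based on the txq_flags given at tx queue setup (ETH_TXQ_FLAGS_NOMULTSEGS
| ETH_TXQ_FLAGS_NOOFFLOADS for the simple path), if I remember correctly.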
Hardware setup:
CPU: E5-2620 v3 underclocked to 1.2 GHz
RAM: 4x 8 GB 1866 MHz DDR4 memory
NIC: X540-T2
Baseline test results:
DPDK     simple tx     full-featured tx
1.7.1    14.1 Mpps     10.7 Mpps
2.0.0    11.0 Mpps      9.3 Mpps
DPDK 1.7.1 is 28%/15% faster than 2.0 with simple/full-featured tx in
this benchmark.
I then did a few runs of git bisect to identify commits that caused a
significant drop in performance. You can find the script that I used to
quickly test the performance of a version at [4].
Commit                                      simple       full-featured
7869536f3f8edace05043be6f322b835702b201c    13.9 Mpps    10.4 Mpps
"mbuf: flatten struct vlan_macip"
The commit log explains that there is a performance regression and that
it cannot be avoided if the struct is to stay future-compatible. The log
claims < 5%, which is consistent with my test results (the old code is
4% faster). I guess that is acceptable and cannot be avoided.
Commit                                      simple       full-featured
08b563ffb19d8baf59dd84200f25bc85031d18a7    12.8 Mpps    10.4 Mpps
"mbuf: replace data pointer by an offset"
This affects the simple tx path significantly.
This regression is probably simply caused by the (temporarily) disabled
vector tx code mentioned in the commit log; I did not investigate
further.
Commit                                      simple       full-featured
f867492346bd271742dd34974e9cf8ac55ddb869    10.7 Mpps     9.1 Mpps
"mbuf: split mbuf across two cache lines"
This one is the real culprit.
The commit log does not mention any performance evaluation, and a quick
scan of the mailing list doesn't reveal any evaluation of this change's
impact either.
It looks like the main problem for tx is that the mempool pointer is in
the second cacheline.
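To illustrate, here is a simplified excerpt of the v2.0.0 layout (from
memory, unions collapsed; see lib/librte_mbuf/rte_mbuf.h for the real
definition). Everything needed to free an mbuf after transmission sits
behind the cacheline1 marker:

    struct rte_mbuf {
        MARKER cacheline0;
        void *buf_addr;
        phys_addr_t buf_physaddr;
        uint16_t buf_len;
        MARKER8 rearm_data;
        uint16_t data_off;
        uint16_t refcnt;          /* actually a union with rte_atomic16_t */
        uint8_t nb_segs;
        uint8_t port;
        uint64_t ol_flags;        /* 64 flag bits */
        MARKER rx_descriptor_fields1;
        uint16_t packet_type;
        uint16_t data_len;
        uint32_t pkt_len;
        uint16_t vlan_tci;
        uint16_t reserved;
        uint64_t hash;            /* actually a union: rss/fdir/sched/usr */
        uint32_t seqn;

        MARKER cacheline1 __rte_cache_aligned;  /* -- second cache line -- */
        void *userdata;           /* actually a union with udata64 */
        struct rte_mempool *pool; /* needed on every free, i.e. tx cleanup */
        struct rte_mbuf *next;
        uint64_t tx_offload;      /* actually a union: l2/l3/l4 len, tso */
        /* ... */
    };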
I think the new mbuf structure is too bloated. It forces you to pay for
features that you don't need or don't want. I understand that it needs
to support all possible filters and offload features, but it's hard to
justify a 25% performance difference in a framework that puts
performance above everything else (does it? I picked that up from the
discussion in the "Beyond DPDK 2.0" thread).
I've counted 56 bytes in use in the first cacheline in v2.0.0.
Would it be possible to move the pool pointer and tx offload fields to
the first cacheline?
We would just need to free up 8 bytes. One candidate is the seqn field:
does it really have to be in the first cache line? Another is the size
of the ol_flags field: do we really need 64 flags? Sharing bits between
rx and tx worked fine.
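To make that concrete, a purely hypothetical rearrangement (based on my
byte counting above, not a tested patch):

    /* Hypothetical, illustration only:
     *   - shrink ol_flags from 64 to 32 bits   -> frees 4 bytes
     *   - move seqn to the second cache line   -> frees 4 bytes
     *   => 48 bytes + pool (8) + tx_offload (8) = exactly 64 bytes
     */
    struct rte_mbuf {
        /* ... first cache line as today, except: ... */
        uint32_t ol_flags;          /* was uint64_t */
        struct rte_mempool *pool;   /* moved up from the second cache line */
        uint64_t tx_offload;        /* moved up from the second cache line */

        MARKER cacheline1 __rte_cache_aligned;
        uint32_t seqn;              /* demoted to the slow-path cache line */
        /* ... rest unchanged ... */
    };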
I naively tried to move the pool pointer into the first cache line on
the v2.0.0 tag, and the performance actually decreased; I'm not yet sure
why. There are probably assumptions about cacheline locations and
prefetching in the code that would need to be adjusted.
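If someone wants to dig into this: I suspect the tx paths would at least
have to prefetch both halves explicitly once hot fields span both cache
lines. A sketch using the existing rte_prefetch0/RTE_PTR_ADD helpers
(untested, the function name is made up):

    #include <rte_common.h>
    #include <rte_memory.h>
    #include <rte_prefetch.h>
    #include <rte_mbuf.h>

    /* touch both mbuf cache lines ahead of time */
    static inline void prefetch_whole_mbuf(struct rte_mbuf *m)
    {
        rte_prefetch0(m);                                   /* cache line 0 */
        rte_prefetch0(RTE_PTR_ADD(m, RTE_CACHE_LINE_SIZE)); /* cache line 1 */
    }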
Another possible solution would be a more dynamic approach to mbufs: the
mbuf struct could be made configurable to fit the requirements of the
application. This would probably require code generation or a lot of
ugly preprocessor hacks and add a lot of complexity to the code.
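For example (hypothetical macro names, just to show the kind of ugliness
I mean):

    /* Hypothetical build-time configurable mbuf; the RTE_MBUF_WANT_*
     * macros do not exist, they only illustrate the idea. */
    struct rte_mbuf {
        void *buf_addr;
        /* ... hot fields that every application needs ... */
    #ifdef RTE_MBUF_WANT_SEQN
        uint32_t seqn;            /* only if the app uses reordering */
    #endif
    #ifdef RTE_MBUF_WANT_TX_OFFLOAD
        uint64_t tx_offload;      /* only if the app uses tx offloads */
    #endif
        struct rte_mempool *pool;
        /* ... */
    };

Every driver and library would of course have to be built with the exact
same configuration, which is where the complexity comes from.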
The question is whether DPDK really values performance above everything
else.
Paul
P.S.: I'm kind of disappointed by the lack of performance regression
tests. I think such tests should be an integral part of a framework
whose explicit goal is to be fast. For example, the main page at
dpdk.org claims a performance of "usually less than 80 cycles" for an
rx or tx operation. This claim is no longer true :(
Touching the layout of a core data structure like the mbuf shouldn't be
done without carefully evaluating the performance impacts.
But this discussion probably belongs in the "Beyond DPDK 2.0" thread.
P.P.S.: Benchmarking an rx-only application (e.g. traffic analysis)
would also be interesting, but that's not really on my todo list right
now. Mixed rx/tx workloads like forwarding are also affected, as
discussed in my last thread [1].
[1] http://dpdk.org/ml/archives/dev/2015-April/016921.html
[2] https://github.com/emmericp/MoonGen
[3] https://github.com/emmericp/dpdk-tx-performance
[4] https://gist.github.com/emmericp/02c5885908c3cb5ac5b7