From: jigsaw <jigsaw@gmail.com>
To: Bruce Richardson <bruce.richardson@intel.com>,
"dev@dpdk.org" <dev@dpdk.org>
Subject: [dpdk-dev] LLC miss in librte_distributor
Date: Tue, 11 Nov 2014 17:37:52 +0200
Message-ID: <CAHVfvh4+96-St8O=C9q6PvjwpbGVDBGL06Lhc5vZL0QzXfobYQ@mail.gmail.com>
Hi Bruce,
I noticed that librte_distributor has quite a severe LLC miss problem when
running on 16 cores, while on 8 cores there is no such problem.
The test runs on an Intel(R) Xeon(R) CPU E5-2670, a Sandy Bridge machine with
2 sockets and 32 logical cores (8 physical cores per socket, Hyper-Threading
enabled).
The test case is distributor_perf_autotest, i.e. app/test/test_distributor_perf.c.
The results are collected with the command:
perf stat -e LLC-load-misses,LLC-loads,LLC-store-misses,LLC-stores ./test -cff -n2 --no-huge
Note that the results show the LLC miss rate is the same with or without
hugepages, so I will only show the --no-huge configuration.
With 8 cores, the LLC miss rate is OK:
LLC-load-misses 26750
LLC-loads 93979233
LLC-store-misses 432263
LLC-stores 69954746
That is a 0.028% load miss rate (26750 / 93979233) and a 0.62% store miss
rate (432263 / 69954746).
With 16 cores, the LLC miss rate is very high:
LLC-load-misses 70263520
LLC-loads 143807657
LLC-store-misses 23115990
LLC-stores 63692854
That is a 48.9% load miss rate and a 36.3% store miss rate.
Most of the load misses happen at the first line of rte_distributor_poll_pkt.
Where most of the store misses happen I don't know, because running perf
record on LLC-store-misses brings down my machine.
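To illustrate why that single load can be so expensive, here is a simplified
model of the handshake as I understand it (this is not the actual DPDK code;
the real field is, I believe, the 64-bit bufptr64 word in union
rte_distributor_buffer, and the flag and helper names below are made up for
the sketch). The distributor core writes a tagged pointer into a per-worker
slot, and the worker spins reading that same word, so the cache line keeps
bouncing between the writer's and the reader's caches; every bounce across
the socket boundary shows up as an LLC load miss:

    /* Simplified model of the single-word handshake; not the real source. */
    #include <stdint.h>

    #define MODEL_CACHE_LINE 64
    #define MODEL_GET_BUF    0x1           /* "packet ready" flag (sketch only) */

    union model_buffer {
        volatile int64_t bufptr64;         /* flag + encoded packet pointer */
        char pad[MODEL_CACHE_LINE * 3];    /* the padding the diff below shrinks */
    };

    /* Worker side: this load of buf->bufptr64 is the hot spot, because the
     * distributor core has just written the line and still owns it. */
    static inline int64_t
    model_poll(union model_buffer *buf)
    {
        while (!(buf->bufptr64 & MODEL_GET_BUF))
            ;                              /* rte_pause() in the real code */
        return buf->bufptr64 & ~(int64_t)MODEL_GET_BUF;
    }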
It's not obvious to me how this could happen: 8 cores are fine, but 16 cores
are very bad.
My guess is that 16 cores generate more QPI transactions between the sockets,
or that 16 cores produce a different LLC access pattern?
So I tried reducing the padding inside union rte_distributor_buffer from 3
cache lines to 1 cache line:
- char pad[CACHE_LINE_SIZE*3];
+ char pad[CACHE_LINE_SIZE];
And it does have an obvious effect:
LLC-load-misses 53159968
LLC-loads 167756282
LLC-store-misses 29012799
LLC-stores 63352541
Now it is a 31.69% load miss rate and a 45.79% store miss rate.
It lowers the load miss rate but raises the store miss rate, and both numbers
are still very high, sadly.
The bright side is that it decreases the time per burst and the time per
packet.
The original version has:
=== Performance test of distributor ===
Time per burst: 8013
Time per packet: 250
And the patched version has:
=== Performance test of distributor ===
Time per burst: 6834
Time per packet: 213
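That is roughly a 15% improvement in time per packet (1 - 213/250 ≈ 0.15).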
I tried a couple of other tricks, such as adding more idle loops in
rte_distributor_get_pkt and making the rte_distributor_buffer thread-local to
each worker core, but none of these tricks had any noticeable effect. These
failures make me tend to believe that the high LLC miss rate is related to
QPI or NUMA, but my machine is not able to perf the uncore QPI events, so
this cannot be confirmed.
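One indirect way to test the cross-socket theory without uncore counters
would be to rerun the 16-lcore case with all lcores confined to a single
socket by using hyperthread siblings, and compare the miss rates. For example
(the mask below assumes lcores 0-7 and 16-23 sit on socket 0, which should
first be verified against the EAL core/socket printout on this machine):

perf stat -e LLC-load-misses,LLC-loads,LLC-store-misses,LLC-stores \
    ./test -c 0x00ff00ff -n2 --no-huge

If the miss rate drops back toward the 8-core numbers, QPI/NUMA is the likely
culprit; if it stays high, the sharing pattern itself is to blame.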
I cannot draw a firm conclusion or identify the root cause yet, but I suggest
a further study of this performance bottleneck so that a good solution can be
found.
thx &
rgds,
-qinglai