From: Matt Laswell <laswell@infiniteio.com>
To: Matthew Hall <mhall@mhcomputing.net>
Cc: "<dev@dpdk.org>" <dev@dpdk.org>
Subject: Re: [dpdk-dev] Appropriate DPDK data structures for TCP sockets
Date: Mon, 23 Feb 2015 08:48:57 -0600
Message-ID: <CA+GnqApB+nEQXD1TssOotXX+sV8DZ5aoDwQnEv9CoUhqwSckFA@mail.gmail.com>
In-Reply-To: <F543F60F-083D-4018-8387-062EAF8319D1@mhcomputing.net>

Hey Matthew,

I've mostly worked on stackless systems over the last few years, but I have
done a fair bit of work on high performance, highly scalable connection
tracking data structures. In that spirit, here are a few counterintuitive
insights I've gained over the years. Perhaps they'll be useful to you.
Apologies in advance for likely being a bit long-winded.

First, you really need to take cache performance into account when you're
choosing a data structure. Something like a balanced tree can seem awfully
appealing at first blush, either on its own or as a chaining mechanism for
a hash table. But the problem with trees is that there really isn't much
locality of reference in your memory use - every single step in your
descent ends up being a cache miss. This hurts you twice: once when you
end up stalled waiting for the next node in the tree to load from main
memory, and again when you have to reload whatever you pushed out of cache
to get it.

It's often better if, instead of a tree, you do linear search across arrays
of hash values. It's easy to size the array so that it is exactly one
cache line long, and you can generally do linear search of the whole thing
in less time than it takes to do a single cache line fill. If you find a
match, you can do full verification against the full tuple as needed.
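
To make that concrete, here's a rough, untested sketch of the kind of bucket
I mean; the names and exact layout are just placeholders, but the point is
that all the signatures for a bucket fit in a single 64-byte cache line:

#include <stdint.h>

#define BUCKET_SIGS 15

/* One bucket of hash signatures, sized and aligned to a 64-byte cache
 * line: a 4-byte count plus fifteen 32-bit signatures. */
struct conn_bucket {
        uint32_t count;               /* number of valid signatures */
        uint32_t sig[BUCKET_SIGS];    /* hashes of the connection tuples */
} __attribute__((aligned(64)));

/* Linear scan of the signatures; returns a candidate index or -1.
 * A hit still has to be verified against the full tuple, since
 * different tuples can share a signature. */
static inline int
bucket_find(const struct conn_bucket *b, uint32_t sig)
{
        uint32_t i;

        for (i = 0; i < b->count; i++)
                if (b->sig[i] == sig)
                        return (int)i;
        return -1;
}

The whole scan touches exactly one cache line, so even the worst case costs
roughly one fill.
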
Second, rather than synchronizing (perhaps with locks, perhaps with
lockless data structures), it's often beneficial to create multiple
threads, each of which holds a fraction of your connection tracking data.
Every connection belongs to a single one of these threads, selected perhaps
by hash or RSS value, and all packets from the connection go through that
single thread. This approach has a couple of advantages. First,
obviously, no slowdowns for synchronization. But, second, I've found that
when you are spreading packets from a single connection across many compute
elements, you're inevitably going to start putting packets out of order.
In many applications, this ultimately leads to some additional processing
to put things back in order, which gives away the performance gains you
achieved. Of course, this approach brings its own set of complexities and
challenges for your application, and doesn't always spread the work as
efficiently across all of your cores. But it might be worth considering.
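
For what it's worth, here's the shape of what I'm describing, as a rough
sketch rather than working code; it assumes RSS is enabled so the NIC fills
in mbuf->hash.rss, and that you've created one rte_ring per worker at init
time (the names and worker count are made up):

#include <rte_mbuf.h>
#include <rte_ring.h>

#define NB_WORKERS 4

static struct rte_ring *worker_rings[NB_WORKERS];  /* created at init */

static inline void
dispatch_pkt(struct rte_mbuf *m)
{
        /* Every packet of a flow carries the same RSS hash, so the whole
         * connection is pinned to one worker, and that worker's slice of
         * the connection table needs no locks. */
        unsigned int worker = m->hash.rss % NB_WORKERS;

        if (rte_ring_enqueue(worker_rings[worker], m) < 0)
                rte_pktmbuf_free(m);    /* ring full; drop or count it */
}

One thing to watch: the default RSS key isn't symmetric, so the two
directions of a TCP connection can land on different workers unless you use
a symmetric key or compute your own hash over the sorted tuple.
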
Third, it's very worthwhile to have a cache for the most recently accessed
connection. First, network traffic is bursty, so you'll frequently see
multiple packets from the same connection in succession.
Second, because it can make life easier for your application code. If you
have multiple places that need to access connection data, you don't have to
worry so much about the cost of repeated searches. Again, this may or may
not matter for your particular application. But for ones I've worked on,
it's been a win.
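
The simplest version is just a per-worker one-entry cache in front of the
full lookup. Untested sketch; conn and conn_table_lookup() are stand-ins
for whatever connection record and table search you already have:

#include <stddef.h>
#include <stdint.h>

struct conn;                                   /* your connection state */
struct conn *conn_table_lookup(uint32_t sig);  /* full hash-table search */

static __thread struct conn *last_conn;        /* one cache per worker */
static __thread uint32_t last_sig;

static inline struct conn *
conn_lookup_cached(uint32_t sig)
{
        /* Bursty traffic means back-to-back packets usually hit here. */
        if (last_conn != NULL && last_sig == sig)
                return last_conn;

        last_conn = conn_table_lookup(sig);
        last_sig = sig;
        return last_conn;
}

As with the bucket scan, a real version would re-check the full tuple on a
hit rather than trusting the signature alone.
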
Anyway, as predicted, this post has gone far too long for a Monday
morning. Regardless, I hope you found it useful. Let me know if you have
questions or comments.

--
Matt Laswell
infinite io, inc.
laswell@infiniteio.com

On Sun, Feb 22, 2015 at 10:50 PM, Matthew Hall <mhall@mhcomputing.net>
wrote:
>
> On Feb 22, 2015, at 4:02 PM, Stephen Hemminger <stephen@networkplumber.org>
> wrote:
> > Use userspace RCU? or BSD RB_TREE
>
> Thanks Stephen,
>
> I think the RB_TREE stuff is single threaded mostly.
>
> But user-space RCU looks quite good indeed, I didn't know somebody ported
> it out of the kernel. I'll check it out.
>
> Matthew.