DPDK patches and discussions
From: Avi Kivity <avi@cloudius-systems.com>
To: Matthew Hall <mhall@mhcomputing.net>,
	 Matt Laswell <laswell@infiniteio.com>
Cc: "<dev@dpdk.org>" <dev@dpdk.org>
Subject: Re: [dpdk-dev] Appropriate DPDK data structures for TCP sockets
Date: Mon, 23 Feb 2015 23:51:46 +0200
Message-ID: <54EBA0F2.6040409@cloudius-systems.com>
In-Reply-To: <20150223211645.GB20766@mhcomputing.net>

On 02/23/2015 11:16 PM, Matthew Hall wrote:
> On Mon, Feb 23, 2015 at 08:48:57AM -0600, Matt Laswell wrote:
>> Apologies in advance for likely being a bit long-winded.
> Long winded is great, helps me get context.
>
>> First, you really need to take cache performance into account when you're
>> choosing a data structure.  Something like a balanced tree can seem awfully
>> appealing at first blush
> Agreed. I did some amount of DPDK stuff before but without TCP. This is why I
> was figuring a packet-hash is better than a tree.
>
>> Second, rather than synchronizing (perhaps with locks, perhaps with
>> lockless data structures), it's often beneficial to create multiple
>> threads, each of which holds a fraction of your connection tracking data.
> Yes, I REALLY REALLY REALLY wanted to do RSS. But virtio-net and other VM
> NICs don't support RSS, unlike classic PCIe NICs. In order to get the
> community to use my app I have to give them a "batteries included"
> environment, where the system can still work even with no RSS.

For an example of a TCP stack on top of DPDK, please see Seastar [1]. It
supports hardware RSS, software RSS, or a combination of the two (used when
the number of hardware queues is smaller than the number of cores).
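
To make the software RSS idea concrete, here is a minimal sketch in C. It is
not Seastar's code: the struct and function names are hypothetical, and it
uses an FNV-1a hash where hardware RSS typically uses a Toeplitz hash. The
point is only that every packet of a flow hashes to the same core, so that
core can own the connection state without locks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key: an IPv4 4-tuple (illustrative, not a DPDK type). */
struct flow_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
};

/* Fold one 32-bit word into an FNV-1a hash, one byte at a time. */
static uint32_t fnv1a_u32(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;              /* 32-bit FNV prime */
    }
    return h;
}

/* Hash the whole 4-tuple. FNV-1a stands in for Toeplitz here. */
static uint32_t flow_hash(const struct flow_key *k)
{
    uint32_t h = 2166136261u;        /* 32-bit FNV offset basis */
    h = fnv1a_u32(h, k->src_ip);
    h = fnv1a_u32(h, k->dst_ip);
    h = fnv1a_u32(h, ((uint32_t)k->src_port << 16) | k->dst_port);
    return h;
}

/* Map a flow to the core that owns it. */
static unsigned flow_to_core(const struct flow_key *k, unsigned n_cores)
{
    assert(n_cores > 0);
    return flow_hash(k) % n_cores;
}
```

A receiving core would call flow_to_core() on each packet and either process
it locally or hand it to the owning core over a per-core queue. How that
hand-off is done is left out of this sketch.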

>> Third, it's very worthwhile to have a cache for the most recently accessed
>> connection.  First, because network traffic is bursty, and you'll
>> frequently see multiple packets from the same connection in succession.
>> Second, because it can make life easier for your application code.  If you
>> have multiple places that need to access connection data, you don't have to
>> worry so much about the cost of repeated searches.  Again, this may or may
>> not matter for your particular application.  But for ones I've worked on,
>> it's been a win.
> Yes, this sounds like a really good idea. One advantage in my product, I am
> only doing TCP Syslog, so I don't have an arbitrary zillion connections like
> FW or IPS would want. I could cap it at something like 8192 or 16384 and be
> good enough for some time until a better solution is worked out.
>
> I could make some capped array or linked list of the X most recent ones for
> cheap access. It's just socket pointers, so it hardly costs anything to
> copy a couple of pointers into a cache and quickly invalidate them when the
> connection closes.

A simple per-core hash table is sufficient in our experience.  Yes, you 
will take a cache miss, but it's not the end of the world.
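
As a rough illustration of what "simple" means here, the sketch below is a
chained hash table keyed on the 4-tuple. Everything in it is an assumption
for the sake of the example: the names are hypothetical, the bucket count
borrows the 8192 figure mentioned above, and the hash is an ad hoc mix rather
than anything from DPDK. Each core would own one such table, so no locking is
needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CONN_BUCKETS 8192u           /* must be a power of two */

struct conn_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

struct conn {
    struct conn_key key;
    struct conn *next;               /* next connection in the same bucket */
    /* per-connection TCP state would go here */
};

/* One table per core; only the owning core ever touches it. */
struct conn_table {
    struct conn *buckets[CONN_BUCKETS];
};

/* Ad hoc multiplicative mix of the 4-tuple (an assumption, not DPDK's hash). */
static uint32_t conn_hash(const struct conn_key *k)
{
    uint32_t h = k->src_ip * 2654435761u;
    h ^= k->dst_ip * 2246822519u;
    h ^= (((uint32_t)k->src_port << 16) | k->dst_port) * 3266489917u;
    h ^= h >> 15;
    return h;
}

static int conn_key_eq(const struct conn_key *a, const struct conn_key *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port;
}

/* Return the matching connection, or NULL if the flow is unknown. */
static struct conn *conn_lookup(struct conn_table *t, const struct conn_key *k)
{
    struct conn *c = t->buckets[conn_hash(k) & (CONN_BUCKETS - 1)];
    while (c != NULL && !conn_key_eq(&c->key, k))
        c = c->next;
    return c;
}

/* Link a caller-allocated connection at the head of its bucket. */
static void conn_insert(struct conn_table *t, struct conn *c)
{
    assert(c != NULL);
    struct conn **head = &t->buckets[conn_hash(&c->key) & (CONN_BUCKETS - 1)];
    c->next = *head;
    *head = c;
}

/* Unlink a connection by key; returns 1 if it was found, 0 otherwise. */
static int conn_remove(struct conn_table *t, const struct conn_key *k)
{
    struct conn **pp = &t->buckets[conn_hash(k) & (CONN_BUCKETS - 1)];
    for (; *pp != NULL; pp = &(*pp)->next) {
        if (conn_key_eq(&(*pp)->key, k)) {
            *pp = (*pp)->next;
            return 1;
        }
    }
    return 0;
}
```

The lookup touches the bucket array and then the connection itself, which is
where the cache miss comes from. DPDK also ships its own hash library
(rte_hash) that could be used instead of a hand-rolled table like this one.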


[1] https://github.com/cloudius-systems/seastar
