From mboxrd@z Thu Jan 1 00:00:00 1970
From: Matt Laswell
To: Matthew Hall
Cc: dev@dpdk.org
Date: Mon, 23 Feb 2015 08:48:57 -0600
Subject: Re: [dpdk-dev] Appropriate DPDK data structures for TCP sockets
Content-Type: text/plain; charset=UTF-8
List-Id: patches and discussions about DPDK

Hey Matthew,

I've mostly worked on stackless systems over the last few years, but I have done a fair bit of work on high-performance, highly scalable connection tracking
data structures. In that spirit, here are a few counterintuitive insights I've gained over the years. Perhaps they'll be useful to you. Apologies in advance for being a bit long-winded.

First, you really need to take cache performance into account when you're choosing a data structure. Something like a balanced tree can seem awfully appealing at first blush, either on its own or as a chaining mechanism for a hash table. But the problem with trees is that there really isn't much locality of reference in your memory use: every single step in your descent ends up being a cache miss. This hurts you twice: once when you end up stalled waiting for the next node in the tree to load from main memory, and again when you have to reload whatever you pushed out of cache to get it.

It's often better if, instead of a tree, you do a linear search across an array of hash values. It's easy to size the array so that it is exactly one cache line long, and you can generally do a linear search of the whole thing in less time than it takes to do a single cache line fill. If you find a match, you can then do full verification against the full tuple as needed.

Second, rather than synchronizing (perhaps with locks, perhaps with lockless data structures), it's often beneficial to create multiple threads, each of which owns a fraction of your connection tracking data. Every connection belongs to exactly one of these threads, selected perhaps by hash or RSS value, and all packets from the connection go through that single thread. This approach has a couple of advantages. First, obviously, there are no slowdowns for synchronization. Second, I've found that when you spread packets from a single connection across many compute elements, you inevitably start putting packets out of order. In many applications, that ultimately leads to additional processing to put things back in order, which gives away the performance gains you achieved.
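As a rough sketch of the first point, here's what a cache-line-sized signature array with a linear scan might look like. All names and sizes here are mine for illustration, not DPDK APIs; I'm assuming a 64-byte cache line and 32-bit hash signatures:

```c
#include <stdint.h>

/* One bucket sized to exactly one 64-byte cache line:
 * 16 x 32-bit hash signatures. A signature of 0 marks an empty slot
 * (or keep a separate occupancy mask if 0 is a legal hash value). */
#define BUCKET_ENTRIES 16

struct bucket {
    uint32_t sig[BUCKET_ENTRIES];   /* 16 * 4 bytes = 64 bytes */
} __attribute__((aligned(64)));

/* Linear scan of one bucket; returns the matching slot index or -1.
 * The whole scan touches a single cache line. The caller then verifies
 * the full 5-tuple stored out-of-line, since distinct tuples can
 * collide on the same 32-bit signature. */
static int bucket_find(const struct bucket *b, uint32_t sig)
{
    for (int i = 0; i < BUCKET_ENTRIES; i++)
        if (b->sig[i] == sig)
            return i;
    return -1;
}
```

The compiler will typically unroll and vectorize a fixed-bound loop like this, which is part of why the scan beats pointer-chasing through a tree.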
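The sharding idea in the second point can be sketched as below. This is illustrative only (the struct and constants are my own, not DPDK APIs); I'm assuming the NIC delivers an RSS hash with each packet, or that you compute an equivalent hash of the 5-tuple in software:

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_WORKERS 4   /* assume a power of two so the mask trick works */

/* Illustrative per-worker state: each worker owns its own connection
 * table outright, so the lookup path needs no locks at all. */
struct worker {
    void *conn_table;   /* only this worker's thread ever touches it */
};

/* Map a flow to a worker using its RSS hash. Every packet of a given
 * connection carries the same hash, so the whole connection is pinned
 * to a single worker and packet order within the flow is preserved. */
static inline unsigned worker_for_flow(uint32_t rss_hash)
{
    return rss_hash & (NUM_WORKERS - 1);
}
```

One caveat: for this to pin both directions of a TCP connection to the same worker, the hash must be symmetric in source and destination, which usually means choosing the RSS key accordingly or hashing a canonicalized tuple.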
Of course, this approach brings its own set of complexities and challenges for your application, and it doesn't always spread the work evenly across all of your cores. But it might be worth considering.

Third, it's very worthwhile to keep a cache of the most recently accessed connection. First, because network traffic is bursty, you'll frequently see multiple packets from the same connection in succession. Second, it can make life easier for your application code: if you have multiple places that need to access connection data, you don't have to worry so much about the cost of repeated searches. Again, this may or may not matter for your particular application, but for the ones I've worked on, it's been a win.

Anyway, as predicted, this post has gone on far too long for a Monday morning. Regardless, I hope you found it useful. Let me know if you have questions or comments.

--
Matt Laswell
infinite io, inc.
laswell@infiniteio.com

On Sun, Feb 22, 2015 at 10:50 PM, Matthew Hall wrote:
>
> > On Feb 22, 2015, at 4:02 PM, Stephen Hemminger wrote:
> > Use userspace RCU? or BSD RB_TREE
>
> Thanks Stephen,
>
> I think the RB_TREE stuff is mostly single-threaded.
>
> But user-space RCU looks quite good indeed; I didn't know somebody had
> ported it out of the kernel. I'll check it out.
>
> Matthew.