From: Matthew Hall
To: Matt Laswell
Date: Mon, 23 Feb 2015 13:16:45 -0800
Subject: Re: [dpdk-dev] Appropriate DPDK data structures for TCP sockets
Message-ID: <20150223211645.GB20766@mhcomputing.net>

On Mon, Feb 23, 2015 at 08:48:57AM -0600, Matt Laswell wrote:
> Apologies in advance for likely being a bit long-winded.

Long-winded is great, it helps me get context.

> First, you really need to take cache performance into account when you're
> choosing a data structure. Something like a balanced tree can seem awfully
> appealing at first blush

Agreed. I did some amount of DPDK stuff before, but without TCP. This is why I
figured a packet hash would be better than a tree.

> Second, rather than synchronizing (perhaps with locks, perhaps with
> lockless data structures), it's often beneficial to create multiple
> threads, each of which holds a fraction of your connection tracking data.

Yes, I REALLY REALLY REALLY wanted to do RSS. But virtio-net and other VM NICs
don't support RSS, unlike the classic PCIe NICs. In order to get the community
to use my app I have to give them a "batteries included" environment, where
the system can still work even with no RSS.
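
One way to keep the "fraction of the connection data per thread" idea when the NIC has no RSS is to compute the flow hash in software and pick a worker from it. The following is only a minimal sketch under assumptions: struct flow_key, flow_hash() and flow_to_worker() are hypothetical names, and plain FNV-1a stands in for whatever hash the application would really use (DPDK ships its own helpers such as rte_jhash).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-tuple key; field names are illustrative, not a DPDK type. */
struct flow_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
};

/* Mix the four bytes of one 32-bit word into an FNV-1a hash state. */
static uint32_t fnv1a_word(uint32_t h, uint32_t w)
{
    for (int i = 0; i < 4; i++) {
        h ^= (w >> (8 * i)) & 0xffu;
        h *= 16777619u;              /* 32-bit FNV prime */
    }
    return h;
}

/* Hash the 4-tuple field by field, so struct padding never matters. */
static uint32_t flow_hash(const struct flow_key *k)
{
    uint32_t h = 2166136261u;        /* 32-bit FNV offset basis */
    h = fnv1a_word(h, k->src_ip);
    h = fnv1a_word(h, k->dst_ip);
    h = fnv1a_word(h, ((uint32_t)k->src_port << 16) | k->dst_port);
    return h;
}

/* Same 4-tuple always maps to the same worker index in [0, n_workers). */
static unsigned flow_to_worker(const struct flow_key *k, unsigned n_workers)
{
    assert(n_workers > 0);
    return flow_hash(k) % n_workers;
}
```

The hash is not symmetric (swapping source and destination gives a different value), which is acceptable here only because a receive-side syslog collector hashes inbound packets in one direction.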

> Third, it's very worthwhile to have a cache for the most recently accessed
> connection. First, because network traffic is bursty, and you'll
> frequently see multiple packets from the same connection in succession.
> Second, because it can make life easier for your application code. If you
> have multiple places that need to access connection data, you don't have to
> worry so much about the cost of repeated searches. Again, this may or may
> not matter for your particular application. But for ones I've worked on,
> it's been a win.

Yes, this sounds like a really good idea. One advantage in my product is that
I am only doing TCP Syslog, so I don't have an arbitrary zillion connections
like a FW or IPS would. I could cap it at something like 8192 or 16384 and be
good enough for some time until a better solution is worked out.

I could make some capped array or linked list of the X most recent ones for
cheap access. They're just socket pointers, so it costs hardly anything to
copy a couple of pointers into a cache and quickly invalidate them when the
connection closes.

> Anyway, as predicted, this post has gone far too long for a Monday
> morning. Regardless, I hope you found it useful.

This was great. Thank you!

Matthew.
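
A minimal sketch of the capped recent-connection cache described above, assuming a fixed array overwritten round-robin. Everything named here (struct conn, struct recent_cache, RECENT_SLOTS and the recent_* functions) is hypothetical and only illustrates the idea; matching on source IP and port alone is also an assumption that fits a single listening syslog port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RECENT_SLOTS 8               /* assumed size of the "X most recent" */

/* Hypothetical connection record; real socket state would live here too. */
struct conn {
    uint32_t src_ip;
    uint16_t src_port;
};

/* Small array of pointers into the main connection table. */
struct recent_cache {
    struct conn *slot[RECENT_SLOTS];
    unsigned next;                   /* next slot to overwrite (round-robin) */
};

/* Linear scan over a handful of pointers; returns NULL on a miss. */
static struct conn *recent_lookup(const struct recent_cache *c,
                                  uint32_t src_ip, uint16_t src_port)
{
    for (unsigned i = 0; i < RECENT_SLOTS; i++) {
        struct conn *p = c->slot[i];
        if (p != NULL && p->src_ip == src_ip && p->src_port == src_port)
            return p;
    }
    return NULL;
}

/* Remember a connection, evicting whatever occupied the slot before. */
static void recent_insert(struct recent_cache *c, struct conn *p)
{
    assert(p != NULL);
    c->slot[c->next] = p;
    c->next = (c->next + 1) % RECENT_SLOTS;
}

/* Call when a connection closes so no stale pointer is left behind. */
static void recent_invalidate(struct recent_cache *c, const struct conn *p)
{
    for (unsigned i = 0; i < RECENT_SLOTS; i++)
        if (c->slot[i] == p)
            c->slot[i] = NULL;
}
```

On a miss, the caller would fall back to the full hash-table lookup and then call recent_insert() with the result.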