From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <54EBA0F2.6040409@cloudius-systems.com>
Date: Mon, 23 Feb 2015 23:51:46 +0200
From: Avi Kivity
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0
MIME-Version: 1.0
To: Matthew Hall, Matt Laswell
References: <3ABAA9DB-3F71-44D4-9C46-22933F9F30F0@mhcomputing.net> <20150222160204.20816910@urahara> <20150223211645.GB20766@mhcomputing.net>
In-Reply-To: <20150223211645.GB20766@mhcomputing.net>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [dpdk-dev] Appropriate DPDK data
 structures for TCP sockets
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK
X-List-Received-Date: Mon, 23 Feb 2015 21:51:50 -0000

On 02/23/2015 11:16 PM, Matthew Hall wrote:
> On Mon, Feb 23, 2015 at 08:48:57AM -0600, Matt Laswell wrote:
>> Apologies in advance for likely being a bit long-winded.
> Long winded is great, helps me get context.
>
>> First, you really need to take cache performance into account when you're
>> choosing a data structure. Something like a balanced tree can seem awfully
>> appealing at first blush
> Agreed. I did some amount of DPDK stuff before but without TCP. This is why I
> was figuring a packet-hash is better than a tree.
>
>> Second, rather than synchronizing (perhaps with locks, perhaps with
>> lockless data structures), it's often beneficial to create multiple
>> threads, each of which holds a fraction of your connection tracking data.
> Yes, I REALLY REALLY REALLY wanted to do RSS. But the virtio-net and other
> VM's don't support RSS, unlike the classic PCIe NIC's. In order to get the
> community to use my app I have to give them a "batteries included"
> environment, where the system can still work even with no RSS.

For an example of a tcp stack on top of dpdk please see seastar [1]. It
supports hardware RSS, software RSS, or a combination (if the number of
hardware queues is smaller than the number of cores).

>> Third, it's very worthwhile to have a cache for the most recently accessed
>> connection. First, because network traffic is bursty, and you'll
>> frequently see multiple packets from the same connection in succession.
>> Second, because it can make life easier for your application code. If you
>> have multiple places that need to access connection data, you don't have to
>> worry so much about the cost of repeated searches.
>> Again, this may or may not matter for your particular application. But for
>> ones I've worked on, it's been a win.
> Yes, this sounds like a really good idea. One advantage in my product, I am
> only doing TCP Syslog, so I don't have an arbitrary zillion connections like
> FW or IPS would want. I could cap it at something like 8192 or 16384 and be
> good enough for some time until a better solution is worked out.
>
> I could make some capped array or linked list of the X most recent ones for
> cheap access. It's just socket pointers so it doesn't hardly cost anything to
> copy a couple pointers into a cache and quickly invalidate when the connection
> closes.

A simple per-core hash table is sufficient in our experience. Yes, you
will take a cache miss, but it's not the end of the world.

[1] https://github.com/cloudius-systems/seastar
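
For illustration only, here is a minimal sketch of the ideas in this thread: a per-core hash table keyed on the TCP 4-tuple, a one-entry cache of the most recently accessed connection, and a software stand-in for RSS that picks a core from the same hash. It is plain C with no DPDK calls, and it is not how seastar does it. Every name (conn_key, conn_table, conn_core and so on), the FNV-1a hash, the linear probing, and the 16384-slot cap are assumptions made for the example. Real RSS hardware typically uses a Toeplitz hash, and a real stack would also need both directions of a connection to land on the same core.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CONN_TABLE_SIZE 16384u               /* power of two; cap from the thread */
#define CONN_TABLE_MASK (CONN_TABLE_SIZE - 1u)

/* Hypothetical TCP 4-tuple key. */
struct conn_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

enum slot_state { SLOT_EMPTY = 0, SLOT_USED, SLOT_DEAD };

struct conn {
    struct conn_key key;
    enum slot_state state;
    void *sock;                              /* opaque per-connection state */
};

/* One table per core, never shared between cores, so no locking is needed. */
struct conn_table {
    struct conn slots[CONN_TABLE_SIZE];
    struct conn *last;                       /* most recently accessed, or NULL */
};

static void conn_table_init(struct conn_table *t)
{
    memset(t, 0, sizeof(*t));                /* every slot becomes SLOT_EMPTY */
}

static int conn_key_eq(const struct conn_key *a, const struct conn_key *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port;
}

/* 32-bit FNV-1a over the 12 key bytes (an assumption, not Toeplitz). */
static uint32_t conn_hash(const struct conn_key *k)
{
    uint32_t w[3] = { k->src_ip, k->dst_ip,
                      ((uint32_t)k->src_port << 16) | k->dst_port };
    uint32_t h = 2166136261u;
    for (int i = 0; i < 3; i++) {
        for (int b = 0; b < 4; b++) {
            h ^= (w[i] >> (8 * b)) & 0xffu;
            h *= 16777619u;
        }
    }
    return h;
}

/* Software "RSS": the low 16 hash bits choose the owning core. */
static unsigned conn_core(const struct conn_key *k, unsigned n_cores)
{
    assert(n_cores > 0);
    return (conn_hash(k) & 0xffffu) % n_cores;
}

/* The high 16 hash bits choose the home slot, so the slot index is not
 * derived from the same bits that were used to pick the core. */
static uint32_t conn_slot(const struct conn_key *k)
{
    return (conn_hash(k) >> 16) & CONN_TABLE_MASK;
}

static struct conn *conn_lookup(struct conn_table *t, const struct conn_key *k)
{
    if (t->last && conn_key_eq(&t->last->key, k))
        return t->last;                      /* hit in the one-entry cache */
    uint32_t i = conn_slot(k);
    for (uint32_t n = 0; n < CONN_TABLE_SIZE; n++, i = (i + 1) & CONN_TABLE_MASK) {
        struct conn *c = &t->slots[i];
        if (c->state == SLOT_EMPTY)
            return NULL;                     /* end of the probe chain */
        if (c->state == SLOT_USED && conn_key_eq(&c->key, k)) {
            t->last = c;
            return c;
        }
    }
    return NULL;
}

/* Returns the existing entry if the key is already tracked, NULL if full. */
static struct conn *conn_insert(struct conn_table *t, const struct conn_key *k,
                                void *sock)
{
    struct conn *free_slot = NULL;
    uint32_t i = conn_slot(k);
    for (uint32_t n = 0; n < CONN_TABLE_SIZE; n++, i = (i + 1) & CONN_TABLE_MASK) {
        struct conn *c = &t->slots[i];
        if (c->state == SLOT_USED) {
            if (conn_key_eq(&c->key, k))
                return c;
            continue;
        }
        if (!free_slot)
            free_slot = c;                   /* first reusable slot seen */
        if (c->state == SLOT_EMPTY)
            break;                           /* key cannot be further along */
    }
    if (!free_slot)
        return NULL;
    free_slot->key = *k;
    free_slot->sock = sock;
    free_slot->state = SLOT_USED;
    t->last = free_slot;
    return free_slot;
}

/* Called on connection close: invalidate the cache, leave a tombstone. */
static void conn_remove(struct conn_table *t, struct conn *c)
{
    if (t->last == c)
        t->last = NULL;
    c->state = SLOT_DEAD;
}
```

In this sketch the receive path would call conn_core() on each packet's 4-tuple to decide which core owns it when the NIC offers no RSS, and that core alone would then call conn_lookup() on its own table. A burst of packets from one connection is served from the last pointer without probing; anything else falls through to the table and may take the cache miss mentioned above.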