From: Jay Rolette
To: Luke Gorrie
Cc: "dev@dpdk.org"
Date: Thu, 22 Jan 2015 13:36:26 -0600
Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
On Thu, Jan 22, 2015 at 12:27 PM, Luke Gorrie wrote:

> On 22 January 2015 at 14:29, Jay Rolette wrote:
>
>> Microseconds matter. Scaling up to 100GbE, nanoseconds matter.
>
> True. Is there a cut-off point though?

There are always engineering trade-offs that have to be made. If I'm optimizing something today, I'm certainly not starting with something that takes 1ns for an app that is doing L4-7 processing. It's all about profiling and figuring out where the bottlenecks are.

For past networking products I've built, there was a lot of traffic that the software didn't have to do much to. Minimal L2/L3 checks, then forward the packet. It didn't even have to parse the headers because that was offloaded on an FPGA. The only way to make those packets faster was to turn them around in the FPGA and not send them to the CPU at all. That change improved small-packet performance by ~30%. That was on high-end network processors that are significantly faster than Intel processors for packet handling.

It's a strange realization that just getting the packets into the CPU is expensive, never mind what you do with them after that.

> Does one nanosecond matter?

You just have to be careful when talking about things like a nanosecond. It sounds really small, but the IPG for a 10G link is only 9.6ns. It's all relative.

> AVX512 will fit a 64-byte packet in one register and move that to or from
> memory with one instruction. L1/L2 cache bandwidth per server is growing on
> a double-exponential curve (both bandwidth per core and cores per CPU). I
> wonder if moving data around in cache will soon be too cheap for us to
> justify worrying about.

Adding cores helps with aggregate performance, but doesn't really help with latency on a single packet.
That said, I'll take advantage of anything I can get from the hardware, whether it lets me scale up how much traffic I can handle or add more features at the same performance level!

Jay