From: Luke Gorrie <lukego@gmail.com>
To: Paul Emmerich
Cc: "dev@dpdk.org"
Date: Mon, 11 May 2015 11:13:03 +0200
Subject: Re: [dpdk-dev] TX performance regression caused by the mbuf cacheline split

Hi Paul,

On 11 May 2015 at 02:14, Paul Emmerich wrote:

> Another possible solution would be a more dynamic approach to mbufs:

Let me suggest a slightly more extreme idea for your consideration. This method can easily do > 100 Mpps with one very lightly loaded core. I don't know whether it works for your application, but I share it just in case.

Background: load generators are specialist applications and can benefit from specialist transmit mechanisms.

You can instruct the NIC to send up to 32K packets with one operation: load the address of a descriptor list into the TDBA register (Transmit Descriptor Base Address). The descriptor list is a simple series of 64-bit values (addr0, flags0, addr1, flags1, ...) and is easy to construct by hand.

The NIC can also be made to play the packets in a loop. You just have to periodically reset the DMA cursor to make all the packets valid again, and that is a simple register poke: TDT = TDH - 1.

We do this routinely when we want to generate a large amount of traffic with few resources, typically when generating load using the spare capacity of a device under test. (I have sample code, but it is not based on DPDK.)

If you want all of your packets to be unique, then you have to be a bit more clever. For example, you could poll the DMA progress: let half the packets be sent, rewrite those while the other half are sent, and so on. Kind of like the way video games tracked the progress of the display scan beam to update the parts of the frame buffer that were not being DMA'd.

This method may of course impose other limitations that are not acceptable for your application. But if not, it can drastically reduce the number of instructions and the cache footprint required to generate load.
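Concretely, the setup and replay look something like this in C. This is a minimal sketch, not a driver: the register offsets are for TX queue 0 of an 82599-style NIC, the MMIO mapping and DMA-able packet buffers are assumed to exist already, queue-enable plumbing (TXDCTL and friends) is omitted, and the flag bits are schematic rather than exact.

    #include <stdint.h>

    /* Register offsets for TX queue 0, per the 82599 datasheet. */
    #define TDBAL0  0x6000   /* TX Descriptor Base Address Low    */
    #define TDBAH0  0x6004   /* TX Descriptor Base Address High   */
    #define TDLEN0  0x6008   /* TX Descriptor ring Length (bytes) */
    #define TDH0    0x6010   /* TX Descriptor Head                */
    #define TDT0    0x6018   /* TX Descriptor Tail                */

    /* One descriptor = two 64-bit words: address, then length+flags. */
    struct tx_desc {
        uint64_t addr;   /* DMA address of the packet buffer     */
        uint64_t flags;  /* length plus command bits (EOP, IFCS) */
    };

    static inline void wr32(volatile uint8_t *mmio, uint32_t reg, uint32_t v)
    {
        *(volatile uint32_t *)(mmio + reg) = v;
    }

    static inline uint32_t rd32(volatile uint8_t *mmio, uint32_t reg)
    {
        return *(volatile uint32_t *)(mmio + reg);
    }

    /* Build the ring by hand and hand every descriptor to the NIC. */
    static void setup_ring(volatile uint8_t *mmio, struct tx_desc *ring,
                           uint64_t ring_phys, const uint64_t *pkt_phys,
                           const uint16_t *pkt_len, unsigned n)
    {
        for (unsigned i = 0; i < n; i++) {
            ring[i].addr  = pkt_phys[i];
            /* Schematic: low 16 bits = length, bits 24/25 = EOP|IFCS. */
            ring[i].flags = (uint64_t)pkt_len[i] | (0x3ULL << 24);
        }
        wr32(mmio, TDBAL0, (uint32_t)ring_phys);
        wr32(mmio, TDBAH0, (uint32_t)(ring_phys >> 32));
        wr32(mmio, TDLEN0, (uint32_t)(n * sizeof(struct tx_desc)));
        wr32(mmio, TDH0, 0);
        wr32(mmio, TDT0, n - 1);    /* hand all but one slot to the NIC */
    }

    /* Replay: pull the tail back to just behind the head so every
     * descriptor becomes valid again and the NIC keeps looping. */
    static void replay(volatile uint8_t *mmio, unsigned n)
    {
        uint32_t head = rd32(mmio, TDH0);
        wr32(mmio, TDT0, (head + n - 1) % n);   /* TDT = TDH - 1 */
    }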
Once the ring is set up, you don't have to touch mbufs or descriptors at all: you just update the payloads and poke the DMA register every millisecond or so.

Cheers,
-Luke
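P.S. For the unique-packets variant, here is a rough sketch of the half-ring rewrite idea, reusing rd32() and TDH0 from the sketch above. update_payload() is a placeholder for whatever makes each packet unique, pkt[] holds the virtual addresses of the same buffers the descriptors point at, and the periodic replay() poke is assumed to be happening elsewhere.

    /* Placeholder: whatever makes each packet unique (sequence
     * numbers, randomized ports, ...). */
    void update_payload(uint8_t *payload, uint16_t len);

    /* Rewrite the half of the ring the NIC is not currently reading,
     * racing the DMA engine the way old games raced the scan beam. */
    static void unique_packet_loop(volatile uint8_t *mmio, uint8_t **pkt,
                                   const uint16_t *pkt_len, unsigned n)
    {
        for (;;) {
            /* Which half is the NIC transmitting right now? */
            unsigned busy = (rd32(mmio, TDH0) < n / 2) ? 0 : 1;
            unsigned start = (1 - busy) * (n / 2);
            for (unsigned i = start; i < start + n / 2; i++)
                update_payload(pkt[i], pkt_len[i]);
            /* Wait until the head crosses into the rewritten half. */
            while (((rd32(mmio, TDH0) < n / 2) ? 0U : 1U) == busy)
                ;   /* spin */
        }
    }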