From: Paul Emmerich
To: "Ananyev, Konstantin"
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] TX performance regression caused by the mbuf cacheline split
Date: Mon, 15 Feb 2016 20:15:23 +0100
Message-ID: <56C223CB.9080901@net.in.tum.de>
In-Reply-To: <2601191342CEEE43887BDE71AB9772582142EB46@irsmsx105.ger.corp.intel.com>
References: <554FF482.9080103@net.in.tum.de> <55512DE5.7010800@net.in.tum.de> <555138C7.5010002@net.in.tum.de> <2601191342CEEE43887BDE71AB9772582142EB46@irsmsx105.ger.corp.intel.com>

Hi,

here's a kind of late follow-up. I've only recently found the need
(mostly for better support of XL710 NICs, which I still dislike, but
people are using them...) to seriously address DPDK 2.x support in
MoonGen.

On 13.05.15 11:03, Ananyev, Konstantin wrote:
> Before starting to discuss your findings, there is one thing in your
> test app that looks strange to me: you use BATCH_SIZE == 64 for TX
> packets, but your mempool cache_size == 32. This is not really a good
> choice, as it means that on each iteration your mempool cache will be
> exhausted and you'll end up doing ring_dequeue().
> I'd suggest you use something like '2 * BATCH_SIZE' for the mempool
> cache size; that should improve your numbers (at least it did for me).

Thanks for pointing that out. However, my real app did not have this
bug, and I also saw the performance improvement there.

> Though, I suppose that scenario might be improved without a manual
> 'prefetch', by reordering the code a bit. Below are two small patches
> that introduce rte_pktmbuf_bulk_alloc() and modify your test app to
> use it. Could you give it a try and see whether it helps to close the
> gap between 1.7.1 and 2.0?
> I don't have a box with the same hardware at hand, but on my IVB box
> the results are quite promising: at 1.2 GHz there is practically no
> difference for simple_tx (-0.33%), and for full_tx the drop is
> reduced to 2%. That's comparing DPDK 1.7.1 + testapp with
> cache_size = 2 * batch_size vs. the latest DPDK + testapp with
> cache_size = 2 * batch_size + bulk_alloc.

The bulk_alloc patch is great and helps. I'd love to see such a
function in DPDK. I agree that this is a better solution than
prefetching; I also can't see a difference with/without prefetching
when using bulk_alloc. Rough sketches of both ideas follow below.

Paul
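
P.S.: for reference, this is roughly what the "bigger cache" suggestion
looks like when creating the mbuf pool with the DPDK 2.x helper. The
pool name, pool size and data room size here are placeholder values,
not the ones from my test app:

#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

#define BATCH_SIZE 64

static struct rte_mempool *
create_tx_pool(void)
{
	/* Size the per-lcore cache so that a full TX burst can be served
	 * from the cache instead of falling back to a ring dequeue. */
	return rte_pktmbuf_pool_create(
		"tx_pool",                  /* pool name (placeholder) */
		8191,                       /* number of mbufs (placeholder) */
		2 * BATCH_SIZE,             /* cache_size covers a full burst */
		0,                          /* no application private area */
		RTE_MBUF_DEFAULT_BUF_SIZE,  /* default data room per mbuf */
		rte_socket_id());           /* allocate on the local socket */
}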
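
And this is roughly how I picture such a bulk allocation helper working
internally; just a sketch from my side, the name and exact semantics are
of course whatever the actual patch defines:

/* Sketch only: grab 'count' raw mbufs with a single mempool operation
 * and initialize each one the way rte_pktmbuf_alloc() would. */
static inline int
bulk_alloc_sketch(struct rte_mempool *pool,
		  struct rte_mbuf **mbufs, unsigned count)
{
	unsigned i;
	int rc;

	rc = rte_mempool_get_bulk(pool, (void **)mbufs, count);
	if (rc != 0)
		return rc;  /* pool exhausted, nothing was allocated */

	for (i = 0; i < count; i++) {
		rte_mbuf_refcnt_set(mbufs[i], 1);
		rte_pktmbuf_reset(mbufs[i]);
	}
	return 0;
}

Called from the TX loop, this replaces BATCH_SIZE separate
rte_pktmbuf_alloc() calls with one access to the mempool cache, which
is presumably where the gain comes from.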