From: Bruce Richardson
To: "Polehn, Mike A"
Cc: "dev@dpdk.org"
Date: Wed, 28 Oct 2015 11:15:06 +0000
Subject: Re: [dpdk-dev] [Patch 1/2] i40e RX Bulk Alloc: Larger list size (33 to 128) throughput optimization
Organization: Intel Shannon Ltd.

On Tue, Oct 27, 2015 at 08:56:36PM +0000, Polehn, Mike A wrote:
> Combined 2 subroutines of code into one subroutine with one read operation followed by
> buffer allocate and load loop.
>
> Eliminated the staging queue and subroutine, which removed extra pointer list movements
> and reduced number of active variable cache pages during for call.
>
> Reduced queue position variables to just 2, the next read point and last NIC RX descriptor
> position, also changed to allowing NIC descriptor table to not always need to be filled.
>
> Removed NIC register update write from per loop to one per driver write call to minimize CPU
> stalls waiting of multiple SMB synchronization points and for earlier NIC register writes that
> often take large cycle counts to complete. For example with an input packet list of 33, with
> the default loops size of 32, the second NIC register write will occur just after RX processing
> for just 1 packet, resulting in large CPU stall time.
>
> Eliminated initial rx packet present test before rx processing loop that also checks, since less
> free time is generally available when packets are present than when not processing any input
> packets.
>
> Used some standard variables to help reduce overhead of non-standard variable sizes.
>
> Reduced number of variables, reordered variable structure to put most active variables in
> first cache line, better utilize memory bytes inside cache line, and reduced active cache line
> count to 1 cache line during processing call. Other RX subroutine sets might still use more
> than 1 variable cache line.
>
> Signed-off-by: Mike A. Polehn

Hi Mike,

Thanks for the contribution. However, this patch seems to contain a lot of
changes to the i40e code. Since you have multiple optimizations listed above
in the description it would be good if you could submit this patch as
multiple patches, one for each optimization. That would make it far easier
for us to review and test. The same would apply to patch 2 of this set,
which looks to have multiple changes in a single patch too.

Also, each patch should have a unique title stating very briefly what the
one change in that patch is.

Regards,
/Bruce