Date: Fri, 6 May 2022 16:28:41 +0100
From: Bruce Richardson
To: Honnappa Nagarahalli
Cc: Tyler Retzlaff, Stephen Hemminger, dev@dpdk.org, nd, "Ananyev, Konstantin"
Subject: Re: [RFC] rte_ring: don't use always inline
References: <20220505224547.394253-1-stephen@networkplumber.org>
 <20220506072434.GA19777@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
List-Id: DPDK patches and discussions

On Fri, May 06, 2022 at 03:12:32PM +0000, Honnappa Nagarahalli wrote:
> >
> > On Thu, May 05, 2022 at 10:59:32PM +0000, Honnappa Nagarahalli wrote:
> > > Thanks Stephen. Do you see any performance difference with this change?
> >
> > As a matter of due diligence, I think a comparison should be made, just
> > to be confident nothing is regressing.
> >
> > I support this change in principle, since it is generally accepted best
> > practice not to force inlining: it can block more valuable optimizations
> > that the compiler may make but the human can't see, and those
> > optimizations can vary between compiler implementations.
> >
> > Force inlining should be used as a targeted measure rather than applied
> > as a blanket on every function, and where it is used it probably needs
> > periodic review, and potentially removal, as the code and compilers
> > evolve.
> >
> > One other consideration: a particular compiler's force-inlining
> > intrinsic/builtin may permit inlining of functions that are not declared
> > in a header, i.e. a function from one library may be inlined into
> > another binary as a link-time optimization. Everything here is in a
> > header, though, so that point is a bit moot.
> >
> > I'd like to see this change go in if possible.
> Like Stephen mentions below, I am sure we will have a for-and-against
> discussion here. As a DPDK community we have put performance front and
> center; I would prefer to go down that route first.
I ran some initial numbers with this patch, and the very quick summary of
what I've seen so far:

* Unit tests show no major differences, and while it depends on what
  specific number you are interested in, most seem within the margin of
  error.
* Within the unit tests, the one number I mostly look at when considering
  inlining is the "empty poll" cost, since I believe we should look to
  keep that as close to zero as possible. In the past I've seen that
  number jump from 3 cycles to 12 cycles due to missed inlining. In this
  case, it seems fine.
* I ran a quick test with the eventdev_pipeline example app using the SW
  eventdev, as a test of an actual app which is fairly ring-heavy [used 8
  workers with 1000 cycles per packet hop]. (Thanks to Harry vH for this
  suggestion of a workload.)
  * GCC 8 build - no difference observed
  * GCC 11 build - approx 2% perf reduction observed

As I said, these are just some quick rough numbers, and I'll try and get
some more numbers on a couple of different platforms, to see if the small
reduction seen is consistent or not. I may also test a few different
combinations/options in the eventdev test. It would be good if others
also tested on a few platforms available to them.

/Bruce