From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id E252FA0503; Fri, 6 May 2022 18:38:17 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 8D6DA40395; Fri, 6 May 2022 18:38:17 +0200 (CEST) Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by mails.dpdk.org (Postfix) with ESMTP id 9F6034014F for ; Fri, 6 May 2022 18:38:16 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651855096; x=1683391096; h=date:from:to:cc:subject:message-id:references: mime-version:in-reply-to; bh=wyuxo6LcnhNT2nOIFRjCnh1hApsIhu3PHFsrHxk5qts=; b=d3HWhvHo6lqVLzey9yu1WB7gkLS1xEFoYkUJsnlMPaS4L32tVitHMdiv FvGtj/A6hWmm4cysB5EDh/4yEpSVL3lqlhxpVdQDKmn8afaVR/JmzCL0u 1vrRRIxyr1BDdHRGgR79/haORO9VBtjjaBBhtWBWXGBnpd6Zz4e6LrAiv 1zsxYLzqtSimiNmnQySg0mWEYoSL05vPCzWtC/9Mo6z4AjZvf/G1Ygymh RBSn3OiNCibkuNKmaKA6Uoa0ySU3+Bx4/2we3tDsN731EonzAtS/9N/SX dATdnXlhx+dj2bB3hQu5BCs9hHAzZNLo5sI0M5FOPfAb4VsnevEqnnPYR g==; X-IronPort-AV: E=McAfee;i="6400,9594,10339"; a="354948342" X-IronPort-AV: E=Sophos;i="5.91,203,1647327600"; d="scan'208";a="354948342" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 May 2022 09:38:15 -0700 X-IronPort-AV: E=Sophos;i="5.91,203,1647327600"; d="scan'208";a="600626100" Received: from bricha3-mobl.ger.corp.intel.com ([10.252.0.201]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-SHA; 06 May 2022 09:38:13 -0700 Date: Fri, 6 May 2022 17:38:10 +0100 From: Bruce Richardson To: Stephen Hemminger Cc: Tyler Retzlaff , Honnappa Nagarahalli , "dev@dpdk.org" , nd Subject: Re: [RFC] rte_ring: don't use always inline Message-ID: References: <20220505224547.394253-1-stephen@networkplumber.org> <20220506072434.GA19777@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net> <20220506084112.5bcc3000@hermes.local> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20220506084112.5bcc3000@hermes.local> X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On Fri, May 06, 2022 at 08:41:12AM -0700, Stephen Hemminger wrote: > On Fri, 6 May 2022 00:24:34 -0700 > Tyler Retzlaff wrote: > > > On Thu, May 05, 2022 at 10:59:32PM +0000, Honnappa Nagarahalli wrote: > > > Thanks Stephen. Do you see any performance difference with this change? > > > > as a matter of due diligence i think a comparison should be made just > > to be confident nothing is regressing. > > > > i support this change in principal since it is generally accepted best > > practice to not force inlining since it can remove more valuable > > optimizations that the compiler may make that the human can't see. > > the optimizations may vary depending on compiler implementation. > > > > force inlining should be used as a targeted measure rather than blanket > > on every function and when in use probably needs to be periodically > > reviewed and potentially removed as the code / compiler evolves. > > > > also one other consideration is the impact of a particular compiler's > > force inlining intrinsic/builtin is that it may permit inlining of > > functions when not declared in a header. i.e. a function from one > > library may be able to be inlined to another binary as a link time > > optimization. although everything here is in a header so it's a bit > > moot. > > > > i'd like to see this change go in if possible. > > > > thanks > > > > > Some quick numbers from Gcc 10.3 and 2.7G AMD and ring_perf_autotest > > Looks like always inline is faster on second run but just inline is > slightly faster on first run. Maybe the icache gets loaded for second run, > but on first pass the smaller code size helps. > Interesting, I haven't observed that effect. The main trouble with the unit tests is that there are so many possible numbers to compare. We probably need to focus on a few to make sense of it all. Running on an "Intel(R) Xeon(R) Gold 6330N CPU @ 2.20GHz" with Ubuntu 20.04 (GCC 9.4), I scanned through the numbers looking for signicant percentage differences. This one caught my eye due to the %age difference: Basline value: sudo ./build-baseline/app/test/dpdk-test -- ring_perf_autotest | grep "SP/SC: single:" ... legacy APIs: SP/SC: single: 9.08 elem APIs: element size 16B: SP/SC: single: 11.13 With patch: sudo ./build/app/test/dpdk-test -- ring_perf_autotest | grep "SP/SC: single:" ... legacy APIs: SP/SC: single: 15.81 elem APIs: element size 16B: SP/SC: single: 21.14 So the SP/SC element enqueue cost went from 9-11 cycles to 15-21 cycles. Percentage-wise, this seems a lot, though in absolute terms it may not be. Therefore, I think we'll need a decent amount of varied testing before taking this patch. /Bruce