Date: Thu, 19 Apr 2018 13:09:58 +0100
From: Bruce Richardson
To: Pavan Nikhilesh
Cc: Ferruh Yigit, thomas@monjalon.net, jerin.jacob@caviumnetworks.com, techboard@dpdk.org, dev@dpdk.org
Message-ID: <20180419120958.GC11352@bricha3-MOBL.ger.corp.intel.com>
In-Reply-To: <20180419092051.GA8072@ltp-pvn>
Organization: Intel Research and Development Ireland Ltd.
Subject: Re: [dpdk-dev] [PATCH 1/2] eal: add macro to mark variable mostly read only

On Thu, Apr 19, 2018 at 02:50:52PM +0530, Pavan Nikhilesh wrote:
> On Wed, Apr 18, 2018 at 07:03:06PM +0100, Ferruh Yigit wrote:
> > On 4/18/2018 6:55 PM, Pavan Nikhilesh wrote:
> > > On Wed, Apr 18, 2018 at 06:43:11PM +0100, Ferruh Yigit wrote:
> > >> On 4/18/2018 4:30 PM, Pavan Nikhilesh wrote:
> > >>> Add macro to mark a variable to be mostly read only and place it in a
> > >>> separate section.
> > >>>
> > >>> Signed-off-by: Pavan Nikhilesh
> > >>> ---
> > >>>
> > >>> Group together mostly read only data to avoid cacheline bouncing, also
> > >>> useful for auditing purposes.
> > >>>
> > >>>  lib/librte_eal/common/include/rte_common.h | 5 +++++
> > >>>  1 file changed, 5 insertions(+)
> > >>>
> > >>> diff --git a/lib/librte_eal/common/include/rte_common.h b/lib/librte_eal/common/include/rte_common.h
> > >>> index 6c5bc5a76..f2ff2e9e6 100644
> > >>> --- a/lib/librte_eal/common/include/rte_common.h
> > >>> +++ b/lib/librte_eal/common/include/rte_common.h
> > >>> @@ -114,6 +114,11 @@ static void __attribute__((constructor(prio), used)) func(void)
> > >>>   */
> > >>>  #define __rte_noinline __attribute__((noinline))
> > >>>
> > >>> +/**
> > >>> + * Mark a variable to be mostly read only and place it in a separate section.
> > >>> + */
> > >>> +#define __rte_read_mostly __attribute__((__section__(".read_mostly")))
> > >>
> > >
> > > Hi Ferruh,
> > >
> > >> Hi Pavan,
> > >>
> > >> Is the section ".read_mostly" treated specially [1] or is this just for grouping
> > >> symbols together (to reduce cacheline bouncing)?
> > >
> > > The section .read_mostly is not treated specially; it's just for grouping
> > > symbols.
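For illustration, a minimal sketch of what such a section attribute does, assuming GCC or Clang on an ELF target linked with GNU ld or lld. The section name `demo_read_mostly` and every `demo_*` identifier are hypothetical: unlike the patch's `.read_mostly`, the name is a valid C identifier, so the linker generates `__start_demo_read_mostly` and `__stop_demo_read_mostly` boundary symbols that let the code check where each variable landed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for __rte_read_mostly. The section name is a valid
 * C identifier, so GNU ld and lld define __start_/__stop_ symbols for it. */
#define demo_read_mostly __attribute__((__section__("demo_read_mostly")))

/* Mostly-read configuration, grouped into the dedicated section.
 * Fully initialized definitions, as GCC recommends for the section attribute. */
demo_read_mostly int demo_cfg_a = 1;
demo_read_mostly int demo_cfg_b = 2;

/* Ordinary, frequently written global: stays in the default .bss. */
int demo_hot_counter = 0;

/* Linker-generated bounds of the demo_read_mostly output section. */
extern char __start_demo_read_mostly[];
extern char __stop_demo_read_mostly[];

/* Returns 1 if p lies inside the demo_read_mostly section, else 0. */
int in_read_mostly(const void *p)
{
	uintptr_t addr = (uintptr_t)p;
	uintptr_t start = (uintptr_t)__start_demo_read_mostly;
	uintptr_t stop = (uintptr_t)__stop_demo_read_mostly;

	return addr >= start && addr < stop;
}

/* Size in bytes of the demo_read_mostly section. */
size_t demo_section_size(void)
{
	uintptr_t start = (uintptr_t)__start_demo_read_mostly;
	uintptr_t stop = (uintptr_t)__stop_demo_read_mostly;

	assert(stop >= start);
	return (size_t)(stop - start);
}
```

Here `in_read_mostly(&demo_cfg_a)` returns 1 and `in_read_mostly(&demo_hot_counter)` returns 0. The attribute only chooses the output section; where that section lands relative to `.data` and `.bss`, and the order of variables inside it, is still up to the linker, which is the layout effect the blog post quoted in this thread is concerned with.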
> >
> > I have encountered a blog post claiming this does not work:
> >
> > "
> > The problem with the above approach is that once all the __read_mostly variables
> > are grouped into one section, the remaining "non-read-mostly" variables end-up
> > together too. This increases the chances that two frequently used elements (in
> > the "non-read-mostly" region) will end-up competing for the same position (or
> > cache-line, the basic fixed-sized block for memory<-->cache transfers) in the
> > cache. Thus frequent accesses will cause excessive cache thrashing on that
> > particular cache-line thereby degrading the overall system performance.
> > "
> >
> > https://thecodeartist.blogspot.com/2011/12/why-readmostly-does-not-work-as-it.html
> >
>
> The author is concerned about processors with low cache set-associativity;
> almost all modern processors have >= 16-way set associativity. And the above
> issue can already happen today when two frequently written global variables
> are placed next to each other.
>
> Currently, we don't have much control over how the global variables are
> arranged, and a single addition or deletion of a global variable changes
> their alignment, in some cases causing a minor performance regression.
> By tagging them as __read_mostly, we can easily identify alignment changes
> across builds by comparing the global variable section of the map files.
>
> I have verified the patch-set on arm64 (16-way set-associative) and didn't
> notice any performance regression.
> Did you have a chance to verify if there is any performance regression?
>

Is there a performance improvement? It seems a relatively strange change to
me, so I'd like to know that it really improves performance in test cases.

/Bruce
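Pavan's reply frames the blog's concern in terms of cache set associativity. A minimal sketch of that arithmetic, assuming a hypothetical cache with 64-byte lines and 64 sets (for example 32 KiB, 8-way); the function names are illustrative only. A line's set is (address / line size) mod (number of sets), so addresses that differ by a multiple of line size × number of sets (4 KiB here) compete for the same set, and eviction from that set only starts once more such lines are live than the cache has ways.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Set index for a simply indexed cache: (addr / line_size) % num_sets.
 * Illustrative only; some caches hash the index bits instead. */
unsigned cache_set_index(uintptr_t addr, unsigned line_size, unsigned num_sets)
{
	assert(line_size != 0 && num_sets != 0);
	return (unsigned)((addr / line_size) % num_sets);
}

/* Number of addresses in addrs[0..n) that map to the same set as addrs[0].
 * The addresses are assumed to lie in distinct cache lines. */
size_t same_set_count(const uintptr_t *addrs, size_t n,
		      unsigned line_size, unsigned num_sets)
{
	size_t i, count = 0;
	unsigned target;

	if (n == 0)
		return 0;
	target = cache_set_index(addrs[0], line_size, num_sets);
	for (i = 0; i < n; i++)
		if (cache_set_index(addrs[i], line_size, num_sets) == target)
			count++;
	return count;
}

/* Nonzero if more distinct lines map to one set than it has ways,
 * i.e. they cannot all stay resident at the same time. */
int set_overcommitted(size_t lines_in_set, unsigned ways)
{
	return lines_in_set > ways;
}
```

With this geometry, 0x1000, 0x2000 and 0x3000 all map to set 0 while 0x1040 maps to set 1. Three lines fit in an 8-way set (`set_overcommitted(3, 8)` is 0), but a 2-way cache would already be overcommitted (`set_overcommitted(3, 2)` is 1), which is why higher associativity weakens the blog's argument.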