From: "Ananyev, Konstantin"
To: Jerin Jacob, Stephen Hemminger
CC: Yerden Zhumabekov, "Richardson, Bruce", "Verkamp, Daniel", "dev@dpdk.org"
Date: Sat, 10 Jun 2017 08:16:44 +0000
Message-ID: <2601191342CEEE43887BDE71AB9772583FB07AEC@IRSMSX109.ger.corp.intel.com>
In-Reply-To: <20170609172854.GA2828@jerin>
Subject: Re: [dpdk-dev] [PATCH v2] ring: use aligned memzone allocation

> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Friday, June 9, 2017 6:29 PM
> To: Stephen Hemminger
> Cc: Yerden Zhumabekov; Ananyev, Konstantin; Richardson, Bruce; Verkamp, Daniel; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] ring: use aligned memzone allocation
>
> -----Original Message-----
> > Date: Fri, 9 Jun 2017 10:16:25 -0700
> > From: Stephen Hemminger
> > To: Yerden Zhumabekov
> > Cc: "Ananyev, Konstantin", "Richardson, Bruce", "Verkamp, Daniel", "dev@dpdk.org"
> > Subject: Re: [dpdk-dev] [PATCH v2] ring: use aligned memzone allocation
> >
> > On Fri, 9 Jun 2017 18:47:43 +0600
> > Yerden Zhumabekov wrote:
> >
> > > On 06.06.2017 19:19, Ananyev, Konstantin wrote:
> > > >
> > > >>>> Maybe there is some deeper reason for the >= 128-byte alignment logic in rte_ring.h?
> > > >>> Might be, would be good to hear the opinion of the author of that change.
> > > >> It gives improved performance for core-2-core transfer.
> > > > You mean empty cache-line(s) after prod/cons, correct?
> > > > That's ok, but why can't we keep them and the whole rte_ring aligned on cache-line boundaries?
> > > > Something like that:
> > > > struct rte_ring {
> > > >     ...
> > > >     struct rte_ring_headtail prod __rte_cache_aligned;
> > > >     EMPTY_CACHE_LINE __rte_cache_aligned;
> > > >     struct rte_ring_headtail cons __rte_cache_aligned;
> > > >     EMPTY_CACHE_LINE __rte_cache_aligned;
> > > > };
> > > >
> > > > Konstantin
> > > >
> > >
> > > I'm curious, can anyone explain how it actually affects
> > > performance? Maybe we can utilize it in application code?
> >
> > I think it is because on Intel CPUs the CPU will speculatively fetch adjacent cache lines.
> > If these cache lines change, then it will create false sharing.
>
> I see. I think in such cases it is better to abstract it as conditional
> compilation. The above logic has the worst-case cache memory
> requirement if the CPU has 128B cache lines and no speculative prefetch.

I think this is already done for rte_ring.h:
http://dpdk.org/browse/dpdk/tree/lib/librte_ring/rte_ring.h#n119

Konstantin
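
For anyone who wants to check the offsets, here is a minimal standalone sketch of the layout discussed in this thread: prod and cons each on their own cache line, each followed by an optional empty line, with the padding selected by conditional compilation. This is not the rte_ring.h code. CACHE_LINE_SIZE, PAD_LINES, struct headtail, struct ring_sketch and the two helper functions are hypothetical names made up for illustration, and the 64-byte line size and one padding line are assumed defaults.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed value for illustration; real targets may use 128. */
#ifndef CACHE_LINE_SIZE
#define CACHE_LINE_SIZE 64
#endif

/* Hypothetical knob: number of empty cache lines kept after prod and
 * after cons. 1 suits CPUs that also fetch the adjacent line; 0 saves
 * memory where no such prefetch happens. */
#ifndef PAD_LINES
#define PAD_LINES 1
#endif

/* Stand-in for the head/tail pair that each side updates. */
struct headtail {
    volatile uint32_t head;
    volatile uint32_t tail;
};

struct ring_sketch {
    uint32_t size;
    uint32_t mask;

    /* Producer state starts on its own cache line. */
    alignas(CACHE_LINE_SIZE) struct headtail prod;
#if PAD_LINES > 0
    /* Empty line(s) so an adjacent-line fetch never pulls in cons. */
    alignas(CACHE_LINE_SIZE) char pad0[PAD_LINES * CACHE_LINE_SIZE];
#endif

    /* Consumer state starts on its own cache line. */
    alignas(CACHE_LINE_SIZE) struct headtail cons;
#if PAD_LINES > 0
    alignas(CACHE_LINE_SIZE) char pad1[PAD_LINES * CACHE_LINE_SIZE];
#endif
};

/* Compile-time checks: both members start on a cache-line boundary. */
static_assert(offsetof(struct ring_sketch, prod) % CACHE_LINE_SIZE == 0,
              "prod must be cache-line aligned");
static_assert(offsetof(struct ring_sketch, cons) % CACHE_LINE_SIZE == 0,
              "cons must be cache-line aligned");

/* Distance in bytes between the start of prod and the start of cons. */
static size_t prod_cons_distance(void)
{
    return offsetof(struct ring_sketch, cons) -
           offsetof(struct ring_sketch, prod);
}

/* Print the resulting layout so different settings can be compared. */
static void print_ring_layout(void)
{
    printf("prod offset: %zu\n", offsetof(struct ring_sketch, prod));
    printf("cons offset: %zu\n", offsetof(struct ring_sketch, cons));
    printf("sizeof:      %zu\n", sizeof(struct ring_sketch));
}
```

With the defaults above, prod lands at offset 64, cons at offset 192, and the structure takes 320 bytes. Building with -DPAD_LINES=0 drops the empty lines, which is the memory saving that matters on a CPU with 128B cache lines and no adjacent-line prefetch.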