From: "Ananyev, Konstantin"
To: Olivier MATZ, "dev@dpdk.org"
Date: Tue, 20 May 2014 16:35:10 +0000
Subject: Re: [dpdk-dev] [PATCH] atomic: clarify use of memory barriers

Hi Olivier,

> - optimize some code to avoid a real memory barrier when not required (timers, virtio, ...)

That seems like a good thing to me.
> - make the code more readable to distinguish between the 2 kinds of memory barrier.

That part seems a bit misleading to me.
rte_compiler_barrier() is a barrier just for the compiler, not for the real CPU.
It only guarantees that the compiler won't reorder instructions across it while emitting the code.

Looking at the Intel memory ordering rules (Intel SDM, System Programming Guide, section 8.2):

1) Reads may be reordered with older writes to different locations but not with older writes to the same location.

So with the following fragment of code:

    int a;
    extern int *x, *y;

    L0:
    *y = 0;
    rte_compiler_barrier();
    L1:
    a = *x;

there is no guarantee that the store at L0 will be globally visible before the load at L1 is performed.
Which means to me that rte_smp_mb() can't be identical to compiler_barrier(), but should be a real 'mfence' instruction instead.

2) Writes to memory are not reordered with other writes, with the following exceptions:
...
streaming stores (writes) executed with the non-temporal move instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD);
...

So with the following fragment of code:

    extern int *x;
    extern __m128i a, *p;

    L0:
    _mm_stream_si128(p, a);
    rte_compiler_barrier();
    L1:
    *x = 0;

there is no guarantee that the store at L0 will become globally visible before the store at L1.
Which means to me that rte_smp_wmb() can't be identical to compiler_barrier(), but should be a real 'sfence' instruction instead.

The only replacement that seems safe to me is:

    #define rte_smp_rmb() rte_compiler_barrier()

But now there is a source of confusion: everyone has to remember that smp_mb() and smp_wmb() are 'real' fences, while smp_rmb() is not.
That's why my suggestion was to simply keep using compiler_barrier() for all cases where we don't need a real fence.
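To see point 1) in action, here is a minimal store-buffer litmus test in C. It is a sketch, not DPDK code: it assumes C11 atomics and POSIX threads, uses atomic_thread_fence(memory_order_seq_cst) as a stand-in for a real 'mfence' (GCC and Clang typically emit mfence for it on x86), and uses atomic_signal_fence(memory_order_seq_cst) as a stand-in for rte_compiler_barrier(). The names thread_a(), thread_b() and count_store_load_reorders() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;      /* shared locations, reset to 0 each round */
static int r1, r2;           /* value each thread loaded */
static int use_full_fence;   /* 1: real fence, 0: compiler-only barrier */

/* Stand-in for a real fence; typically mfence on x86 with GCC/Clang. */
static void full_fence(void) { atomic_thread_fence(memory_order_seq_cst); }

/* Stand-in for rte_compiler_barrier(): constrains the compiler only,
 * emits no CPU instruction. */
static void compiler_only_barrier(void) { atomic_signal_fence(memory_order_seq_cst); }

static void barrier_under_test(void)
{
    if (use_full_fence)
        full_fence();
    else
        compiler_only_barrier();
}

/* Thread A: store to x, barrier, then load from y. */
static void *thread_a(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    barrier_under_test();
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

/* Thread B: mirror image, store to y, barrier, then load from x. */
static void *thread_b(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    barrier_under_test();
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Runs the test `iterations` times and returns how often both loads
 * read 0, i.e. how often a load overtook the older store before it. */
static int count_store_load_reorders(int iterations, int full)
{
    int hits = 0;
    use_full_fence = full;
    for (int i = 0; i < iterations; i++) {
        pthread_t a, b;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        int rc = pthread_create(&a, NULL, thread_a, NULL);
        assert(rc == 0);
        rc = pthread_create(&b, NULL, thread_b, NULL);
        assert(rc == 0);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r1 == 0 && r2 == 0)
            hits++;
    }
    return hits;
}
```

With the full fence in both threads, C11 forbids the r1 == 0 && r2 == 0 outcome, so count_store_load_reorders(2000, 1) should return 0. With only the compiler barrier, x86 is allowed to produce that outcome, but spawning a fresh thread pair per round leaves a narrow race window, so a zero count in that mode proves nothing.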
Thanks
Konstantin

-----Original Message-----
From: Olivier MATZ [mailto:olivier.matz@6wind.com]
Sent: Tuesday, May 20, 2014 1:13 PM
To: Ananyev, Konstantin; dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH] atomic: clarify use of memory barriers

Hi Konstantin,

Thank you for your review and feedback.

On 05/20/2014 12:05 PM, Ananyev, Konstantin wrote:
>> Note that on x86 CPUs, memory barriers between different cores can be guaranteed by a simple compiler barrier.
>
> I don't think this is totally correct.
> Yes, for Intel CPUs, in many cases a memory barrier could be avoided due to the nearly strict memory ordering.
> Though there are a few cases where reordering is possible and where fence instructions would be needed.

I tried to mimic the behavior of Linux, which differentiates *mb() from smp_*mb(), but I went too fast. In Linux on x86, we have [1]:

  smp_mb() = mb() = asm volatile("mfence":::"memory")
  smp_rmb() = compiler_barrier()
  smp_wmb() = compiler_barrier()

At least this should be fixed in the patch.

By the way, just for reference, the idea of the patch came from a discussion we had on the list [2].

> For me:
> +#define rte_smp_rmb() rte_compiler_barrier()
> Seems a bit misleading, as there is no real fence.
> So I suggest we keep rte_compiler_barrier() naming and usage.

The objectives of the patch (which were probably not explained very clearly in the commit log) were:
- make the code more readable by distinguishing between the 2 kinds of memory barrier.
- optimize some code to avoid a real memory barrier when not required (timers, virtio, ...)

Having a compiler barrier in place of a memory barrier in the code does not really help to understand what the developer wanted to do. In the current code we can see that the use of rte_compiler_barrier() is ambiguous, as it needs a comment to clarify the situation:

  rte_compiler_barrier(); /* rmb */

Don't you think we could fix the patch but keep its logic?
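The readability argument can be illustrated with a small message-passing sketch in C. The macro names sketch_smp_wmb()/sketch_smp_rmb() and the helper run_message_passing() are hypothetical, not DPDK's API, and the sketch assumes C11 atomics and POSIX threads. The macros are mapped to C11 release/acquire fences, which on x86 typically emit no instruction and act only as compiler barriers (like the Linux smp_wmb()/smp_rmb() mapping quoted above), while still emitting real barriers on weakly ordered CPUs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names for illustration only, not the DPDK API. */
#define sketch_smp_wmb() atomic_thread_fence(memory_order_release)
#define sketch_smp_rmb() atomic_thread_fence(memory_order_acquire)

static int payload;       /* plain data published by the producer */
static atomic_int ready;  /* flag: becomes 1 once payload is valid */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    sketch_smp_wmb();     /* order the payload store before the flag store */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
    return NULL;
}

static void *consumer(void *arg)
{
    int *out = arg;
    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        ;                 /* spin until the flag is observed */
    sketch_smp_rmb();     /* the name states the intent, no side comment needed */
    *out = payload;
    return NULL;
}

/* Runs one producer/consumer round and returns what the consumer read. */
static int run_message_passing(void)
{
    pthread_t p, c;
    int seen = -1;
    payload = 0;
    atomic_store(&ready, 0);
    int rc = pthread_create(&c, NULL, consumer, &seen);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

Every round should return 42. As point 2) earlier in the thread warns, a release fence does not order non-temporal (MOVNT*) stores on x86, so this mapping only holds for ordinary stores.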
Regards,
Olivier

[1] http://lxr.free-electrons.com/source/arch/x86/include/asm/barrier.h#L81
[2] http://dpdk.org/ml/archives/dev/2014-March/001741.html