From: "Ananyev, Konstantin"
To: Olivier MATZ, "dev@dpdk.org"
Date: Mon, 26 May 2014 13:57:25 +0000
Subject: Re: [dpdk-dev] [PATCH] atomic: clarify use of memory barriers

Hi Olivier,

>> So with the following fragment of code:
>>
>>   extern int *x;
>>   extern __m128i a, *p;
>>   L0:
>>   _mm_stream_si128(p, a);
>>   rte_compiler_barrier();
>>   L1:
>>   *x = 0;
>>
>> There is no guarantee that the store at L0 will always be finished
>> before the store at L1.

> This code fragment looks very similar to what is done in
> __rte_ring_sp_do_enqueue():
>
>   [...]
>   ENQUEUE_PTRS(); /* I expect it is converted to an SSE store */
>   rte_compiler_barrier();
>   [...]
>   r->prod.tail = prod_next;
>
> So, according to your previous explanation, I understand that
> this code would require a write memory barrier in place of the
> compiler barrier. Am I wrong?

No, right now a compiler barrier is enough here.
ENQUEUE_PTRS() doesn't use non-temporal stores (MOVNT*), so write order should be guaranteed.
Though, if in the future we change ENQUEUE_PTRS() to use non-temporal stores, we'll have to use sfence (or mfence).

> Moreover, if I understand well, a real wmb() is needed only if
> an SSE store is issued. But the programmer may not control that,
> it's the job of the compiler.

'Normal' SIMD writes are not reordered, so it is ok for the compiler to use them if appropriate.

>> But now there seems to be a confusion: everyone has to remember that
>> smp_mb() and smp_wmb() are 'real' fences, while smp_rmb() is not.
>> That's why my suggestion was to simply keep using compiler_barrier()
>> for all cases when we don't need a real fence.

> I'm not sure the programmer has to know which smp_*mb() is a real fence
> or not. He just expects that it generates the proper CPU instructions
> that guarantee the effectiveness of the memory barrier.

In most cases just a compiler barrier is enough, but there are a few exceptions.
Always using fence instructions means introducing an unnecessary slowdown for cases when order is already guaranteed.
Not using fences in cases when they are needed means introducing a race window and possible data corruption.
That's why right now people can use either rte_compiler_barrier() or mb/rmb/wmb, whichever is appropriate for the particular case.

Konstantin