From: Olivier MATZ
To: "Ananyev, Konstantin", "dev@dpdk.org"
Date: Fri, 23 May 2014 16:10:11 +0200
Subject: Re: [dpdk-dev] [PATCH] atomic: clarify use of memory barriers

Hi Konstantin,

Thanks for these code examples and explanations.

On 05/20/2014 06:35 PM, Ananyev, Konstantin wrote:
> So with the following fragment of code:
>
>   extern int *x;
>   extern __m128i a, *p;
>   L0:
>     _mm_stream_si128(p, a);
>     rte_compiler_barrier();
>   L1:
>     *x = 0;
>
> There is no guarantee that the store at L0 will always be finished
> before the store at L1.

This code fragment looks very similar to what is done in
__rte_ring_sp_do_enqueue():

  [...]
  ENQUEUE_PTRS(); /* I expect it is converted to an SSE store */
  rte_compiler_barrier();
  [...]
  r->prod.tail = prod_next;

So, according to your previous explanation, I understand that this code
would require a write memory barrier in place of the compiler barrier.
Am I wrong?

If that is correct, we are back to the initial question: in this kind of
code, the programmer wants all stores to be issued before the value of
r->prod.tail is set. That is the definition of a write memory barrier.
So wouldn't it be better for him to explicitly call rte_smp_wmb()
instead of adding a compiler barrier just because he knows the latter is
sufficient on all currently supported CPUs? Can we be sure that future
Intel CPU generations will still behave that way?

Moreover, if I understand correctly, a real wmb() is needed only if an
SSE store is issued. But the programmer may not control that: it is the
compiler's job.

> But now, there seems a confusion: everyone has to remember that
> smp_mb() and smp_wmb() are 'real' fences, while smp_rmb() is not.
> That's why my suggestion was to simply keep using compiler_barrier()
> for all cases, when we don't need real fence.

I'm not sure the programmer has to know which smp_*mb() is a real fence
and which is not. He just expects it to generate the proper CPU
instructions that guarantee the effectiveness of the memory barrier.

Regards,
Olivier