From: Olivier MATZ
To: "Ananyev, Konstantin", "dev@dpdk.org"
Date: Fri, 23 May 2014 16:10:11 +0200
Subject: Re: [dpdk-dev] [PATCH] atomic: clarify use of memory barriers

Hi Konstantin,

Thanks for these code examples and explanations.

On 05/20/2014 06:35 PM, Ananyev, Konstantin wrote:
> So with the following fragment of code:
>
>   extern int *x;
>   extern __m128i a, *p;
>   L0:
>     _mm_stream_si128(p, a);
>     rte_compiler_barrier();
>   L1:
>     *x = 0;
>
> There is no guarantee that the store at L0 will always be finished
> before the store at L1.

This code fragment looks very similar to what is done in
__rte_ring_sp_do_enqueue():

  [...]
  ENQUEUE_PTRS(); /* I expect it is converted to an SSE store */
  rte_compiler_barrier();
  [...]
  r->prod.tail = prod_next;

So, according to your previous explanation, I understand that this code
would require a write memory barrier in place of the compiler barrier.
Am I wrong?

If that is correct, we are back to the initial question: in this kind of
code, the programmer wants all stores to be issued before the value of
r->prod.tail is set. That is the definition of a write memory barrier.
So wouldn't it be better for him to explicitly call rte_smp_wmb()
instead of adding a compiler barrier just because he knows the latter is
sufficient on all currently supported CPUs? Can we be sure that future
Intel CPU generations will still behave that way?

Moreover, if I understand correctly, a real wmb() is needed only if an
SSE store is issued. But the programmer may not control that: it is the
compiler's job.

> But now, there seems a confusion: everyone has to remember that
> smp_mb() and smp_wmb() are 'real' fences, while smp_rmb() is not.
> That's why my suggestion was to simply keep using compiler_barrier()
> for all cases, when we don't need real fence.

I'm not sure the programmer has to know which smp_*mb() is a real fence
and which is not. He just expects it to generate the proper CPU
instructions that guarantee the effectiveness of the memory barrier.

Regards,
Olivier