From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from BLU004-OMC4S30.hotmail.com (blu004-omc4s30.hotmail.com [65.55.111.169])
 by dpdk.org (Postfix) with ESMTP id 66C675A35
 for ; Tue, 12 May 2015 17:23:27 +0200 (CEST)
Received: from BLU436-SMTP157 ([65.55.111.137]) by BLU004-OMC4S30.hotmail.com
 over TLS secured channel with Microsoft SMTPSVC(7.5.7601.22751);
 Tue, 12 May 2015 08:23:26 -0700
X-TMN: [VcQo6VbL6QJzu7E9HmuYg6tGQqUNKzIQDDuFeH4IriY=]
X-Originating-Email: [dong.wang.pro@hotmail.com]
Message-ID:
Date: Tue, 12 May 2015 23:23:19 +0800
From: Wang Dong
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.6.0
MIME-Version: 1.0
To: "Ananyev, Konstantin" , "dev@dpdk.org"
References: <2601191342CEEE43887BDE71AB97725821424E84@irsmsx105.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB977258214255E7@irsmsx105.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB9772582142E122@irsmsx105.ger.corp.intel.com>
In-Reply-To: <2601191342CEEE43887BDE71AB9772582142E122@irsmsx105.ger.corp.intel.com>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
X-OriginalArrivalTime: 12 May 2015 15:23:26.0102 (UTC) FILETIME=[9772FB60:01D08CC7]
Subject: Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 12 May 2015 15:23:27 -0000

> Hi Dong,
>
>> -----Original Message-----
>> From: Wang Dong [mailto:dong.wang.pro@hotmail.com]
>> Sent: Saturday, May 09, 2015 11:24 AM
>> To: Ananyev, Konstantin; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
>>
>> Hi Konstantin,
>>
>>>
>>> Hi Dong,
>>>
>>>> -----Original Message-----
>>>> From: Wang Dong [mailto:dong.wang.pro@hotmail.com]
>>>> Sent: Thursday, May 07, 2015 4:28 PM
>>>> To: Ananyev, Konstantin; dev@dpdk.org
>>>> Subject: Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
>>>>
>>>> Hi Konstantin,
>>>>
>>>>> Hi Dong,
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of WangDong
>>>>>> Sent: Tuesday, May 05, 2015 4:38 PM
>>>>>> To: dev@dpdk.org
>>>>>> Subject: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
>>>>>>
>>>>>> The current implementation of rte_wmb/rte_rmb for x86 is using processor memory barrier. It's unnecessary for IA processor, compiler
>>>>>> memory barrier is enough.
>>>>>
>>>>> I wouldn't say they are 'unnecessary'.
>>>>> There are situations, even on IA, when you need _fence_ instructions.
>>>>> So, please leave rte_*mb() macros unmodified.
>>>> OK, leave them unmodified, but I really can't find a situation to use
>>>> sfence and lfence instructions.
>>>
>>> For example:
>>> http://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/
>>> http://dpdk.org/ml/archives/dev/2014-May/002613.html
>>>
>>>>
>>>>
>>>>> I still think that we need to create a new set of architecture dependent macros, as what discussed before.
>>>>> Probably by analogy with linux kernel rte_smp_*mb() is a good name for them.
>>>>> Though if you have some better name in mind, I am open to suggestions here.
>>>> What about rte_dma_*mb()? I find dma_*mb() in linux-4.0.1, it looks good~~
>>>
>>> Hmm, but why _dma_?
>>> We need same thing for multi-core communication too.
>>> If rte_smp_ is not good enough, might be: rte_arch_?
>> I want these two macro only used in PMD, so I think _dma_ is better. The
>> memory barrier of processor-processor maybe more complex, and I'm not
>> familiar with it...
>> Someone can add rte_smp_*mb for multi-core.
>
> Sorry, what you are talking about?
> At the end, it will use same instructions, whatever we'll name it: _dma_, _smp_, _arch_.
> Konstantin

Hi Konstantin,

In my previous mail I wanted to say that both rte_smp_*mb() and rte_dma_*mb() can be added, but the context of rte_smp_*mb() is different from that of rte_dma_*mb(): maybe rte_smp_*mb() is for threads and rte_dma_*mb() is for PMDs. I'm not sure how to implement rte_smp_*mb(), so I hope another developer can implement it.

In Linux, I find that the smp_ read and write barriers are defined the same way as the dma_ ones; if so, they will use the same instructions.

linux-4.0.1/arch/x86/include/asm/barrier.h, line 27:

#ifdef CONFIG_X86_PPRO_FENCE
#define dma_rmb()	rmb()
#else
#define dma_rmb()	barrier()
#endif
#define dma_wmb()	barrier()

#ifdef CONFIG_SMP
#define smp_mb()	mb()
#define smp_rmb()	dma_rmb()
#define smp_wmb()	barrier()
#define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
#else /* !SMP */
#define smp_mb()	barrier()
#define smp_rmb()	barrier()
#define smp_wmb()	barrier()
#define set_mb(var, value) do { var = value; barrier(); } while (0)
#endif /* SMP */

Dong

>
>>
>> I think _arch_ means nothing here, because rte_*mb is already for
>> architectures that dpdk supported, they are redefined in these architecture.
>>
>>>
>>>>
>>>>>
>>>>>> But if dpdk running on a AMD processor, maybe we should use processor memory barrier.
>>>>>
>>>>> As far as I remember, amd has the same memory ordering model.
>>>> It's too hard to find a AMD's software developer manual.....
>>>
>>> There for example:
>>> http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf
>>> ?
>> Search such document on AMD official website for a long time, this manual
>> is what I want, thanks very much!!!
>>
>> Dong
>>
>>>
>>> Konstantin
>>>
>>>>
>>>> Dong
>>>>
>>>>> So, I don't think we need #ifdef RTE_ARCH_X86_IA here.
>>>>>
>>>>> Konstantin
>>>>>
>>>>>> I add a macro to distinguish them, if we compile DPDK for IA processor, add the macro (RTE_ARCH_X86_IA) can improve performance
>>>>>> with compiler memory barrier. Or we can add RTE_ARCH_X86_AMD for using processor memory barrier, in this case, if didn't add the
>>>>>> macro, the memory ordering will not be guaranteed. Which macro is better?
>>>>>> If this patch applied, the PMD's old implementation of compiler memory barrier (some volatile variable) can be fixed with rte_rmb()
>>>>>> and rte_wmb() for any architecture.
>>>>>>
>>>>>> ---
>>>>>>  lib/librte_eal/common/include/arch/x86/rte_atomic.h | 10 ++++++++++
>>>>>>  1 file changed, 10 insertions(+)
>>>>>>
>>>>>> diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
>>>>>> index e93e8ee..52b1e81 100644
>>>>>> --- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
>>>>>> +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
>>>>>> @@ -49,10 +49,20 @@ extern "C" {
>>>>>>
>>>>>>  #define rte_mb() _mm_mfence()
>>>>>>
>>>>>> +#ifdef RTE_ARCH_X86_IA
>>>>>> +
>>>>>> +#define rte_wmb() rte_compiler_barrier()
>>>>>> +
>>>>>> +#define rte_rmb() rte_compiler_barrier()
>>>>>> +
>>>>>> +#else
>>>>>> +
>>>>>>  #define rte_wmb() _mm_sfence()
>>>>>>
>>>>>>  #define rte_rmb() _mm_lfence()
>>>>>>
>>>>>> +#endif
>>>>>> +
>>>>>>  /*------------------------- 16 bit atomic operations -------------------------*/
>>>>>>
>>>>>>  #ifndef RTE_FORCE_INTRINSICS
>>>>>> --
>>>>>> 1.9.1
>>>>>
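
To make the idea discussed in this thread concrete, here is a minimal sketch of how a separate set of "smp"-style macros could sit beside the existing rte_*mb(), by analogy with the Linux definitions quoted above. Everything prefixed sketch_ is hypothetical and is not DPDK API; the descriptor layout is invented for illustration. It assumes GCC or Clang and ordinary write-back memory only. Non-temporal stores and write-combining memory, where the fence instructions are still needed, are outside its scope.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <emmintrin.h>  /* _mm_mfence() */
/* Stops the compiler from reordering memory accesses; emits no instruction. */
#define sketch_compiler_barrier() __asm__ __volatile__("" : : : "memory")
/* Assumption: write-back memory, where x86 keeps store-store and load-load
 * order, so a compiler barrier is enough for the write and read barriers. */
#define sketch_smp_wmb() sketch_compiler_barrier()
#define sketch_smp_rmb() sketch_compiler_barrier()
/* A store followed by a load can still be reordered, so keep a real fence. */
#define sketch_smp_mb()  _mm_mfence()
#else
/* Conservative fallback for other architectures: full barrier everywhere. */
#define sketch_smp_wmb() __sync_synchronize()
#define sketch_smp_rmb() __sync_synchronize()
#define sketch_smp_mb()  __sync_synchronize()
#endif

/* Hypothetical descriptor shared between a producer and a consumer. */
struct sketch_desc {
	uint8_t payload[64];
	uint16_t len;
	volatile uint8_t done;  /* written last by the producer */
};

/* Producer side: fill the descriptor, then publish it. */
static inline void
sketch_publish(struct sketch_desc *d, const void *buf, uint16_t len)
{
	assert(len <= sizeof(d->payload));
	memcpy(d->payload, buf, len);
	d->len = len;
	sketch_smp_wmb();  /* payload and len must be stored before done */
	d->done = 1;
}

/* Consumer side: returns the length copied, or -1 if the descriptor is
 * not ready or the caller's buffer is too small. */
static inline int
sketch_consume(struct sketch_desc *d, void *buf, uint16_t cap)
{
	uint16_t len;

	if (!d->done)
		return -1;
	sketch_smp_rmb();  /* done must be loaded before payload and len */
	len = d->len;
	if (len > cap)
		return -1;
	memcpy(buf, d->payload, len);
	d->done = 0;
	return len;
}
```

On x86_64 the write and read barriers in this sketch compile to no instruction at all, which is the performance point of the original patch, while rte_wmb()/rte_rmb() can stay on sfence/lfence for the cases that need them.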