* [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
@ 2015-05-05 15:38 WangDong
  2015-05-05 22:46 ` Ananyev, Konstantin
From: WangDong @ 2015-05-05 15:38 UTC
To: dev

The current implementation of rte_wmb/rte_rmb for x86 uses processor memory barriers. That is unnecessary on IA processors; a compiler memory barrier is enough. But if DPDK is running on an AMD processor, maybe we should use the processor memory barrier.

I added a macro to distinguish them: if we compile DPDK for an IA processor, defining the macro (RTE_ARCH_X86_IA) can improve performance by using the compiler memory barrier. Alternatively, we could add RTE_ARCH_X86_AMD to select the processor memory barrier; in that case, if the macro is not defined, memory ordering will not be guaranteed. Which macro is better?

If this patch is applied, the PMDs' old implementation of a compiler memory barrier (some volatile variables) can be replaced with rte_rmb() and rte_wmb() for any architecture.

---
 lib/librte_eal/common/include/arch/x86/rte_atomic.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
index e93e8ee..52b1e81 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
@@ -49,10 +49,20 @@ extern "C" {
 
 #define rte_mb() _mm_mfence()
 
+#ifdef RTE_ARCH_X86_IA
+
+#define rte_wmb() rte_compiler_barrier()
+
+#define rte_rmb() rte_compiler_barrier()
+
+#else
+
 #define rte_wmb() _mm_sfence()
 
 #define rte_rmb() _mm_lfence()
 
+#endif
+
 /*------------------------- 16 bit atomic operations -------------------------*/
 
 #ifndef RTE_FORCE_INTRINSICS
-- 
1.9.1
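To make the distinction in the patch description concrete, here is a minimal sketch (not part of the patch) of a compiler-only barrier used in a publish/consume pattern. The names `ex_compiler_barrier`, `ex_mailbox`, `ex_publish` and `ex_try_consume` are hypothetical. The inline-asm form assumes GCC or Clang, and the ordering argument assumes ordinary write-back memory on x86, where the hardware does not reorder stores with other stores or loads with other loads.

```c
#include <stdint.h>

/* Compiler-only barrier (GCC/Clang syntax): emits no instruction, but the
 * compiler may not move memory accesses across it. */
#define ex_compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical one-slot mailbox shared between a writer and a reader. */
struct ex_mailbox {
    uint32_t payload;
    uint32_t ready;
};

/* Writer: store the payload first, then raise the flag. */
static inline void ex_publish(struct ex_mailbox *m, uint32_t value)
{
    m->payload = value;
    ex_compiler_barrier();  /* keep the payload store ahead of the flag store */
    m->ready = 1;
}

/* Reader: returns 1 and fills *out if a payload is available, else 0. */
static inline int ex_try_consume(struct ex_mailbox *m, uint32_t *out)
{
    if (m->ready == 0)
        return 0;
    ex_compiler_barrier();  /* keep the flag load ahead of the payload load */
    *out = m->payload;
    m->ready = 0;
    return 1;
}
```

On a weakly ordered architecture the same code would need real barrier instructions in place of `ex_compiler_barrier()`, which is why the choice is architecture dependent.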
* Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
  2015-05-05 15:38 [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb WangDong
@ 2015-05-05 22:46 ` Ananyev, Konstantin
  2015-05-07 15:28   ` Wang Dong
From: Ananyev, Konstantin @ 2015-05-05 22:46 UTC
To: WangDong, dev

Hi Dong,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of WangDong
> Sent: Tuesday, May 05, 2015 4:38 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
>
> The current implementation of rte_wmb/rte_rmb for x86 is using processor memory barrier. It's unnecessary for IA processor,
> compiler memory barrier is enough.

I wouldn't say they are 'unnecessary'.
There are situations, even on IA, when you need _fence_ instructions.
So, please leave the rte_*mb() macros unmodified.
I still think that we need to create a new set of architecture-dependent macros, as discussed before.
By analogy with the Linux kernel, rte_smp_*mb() is probably a good name for them.
Though if you have a better name in mind, I am open to suggestions here.

> But if dpdk running on a AMD processor, maybe we should use processor memory barrier.

As far as I remember, AMD has the same memory ordering model.
So, I don't think we need #ifdef RTE_ARCH_X86_IA here.
Konstantin

> I add a macro to distinguish them, if we compile DPDK for IA processor, add the macro (RTE_ARCH_X86_IA) can improve performance
> with compiler memory barrier. Or we can add RTE_ARCH_X86_AMD for using processor memory barrier, in this case, if didn't add the
> macro, the memory ordering will not be guaranteed. Which macro is better?
> If this patch applied, the PMD's old implementation of compiler memory barrier (some volatile variable) can be fixed with rte_rmb()
> and rte_wmb() for any architecture.
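Konstantin's suggestion can be sketched as a second macro family that sits beside the existing rte_*mb() macros instead of redefining them. This is only an illustration under stated assumptions: the `ex_smp_*` names, the non-x86 fallback branch and the tiny ring are hypothetical and are not DPDK API. The x86 branch assumes ordinary write-back memory.

```c
#include <stdint.h>

#define ex_compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Full barrier via a GCC/Clang builtin; used where load->store order matters. */
#define ex_smp_mb() __sync_synchronize()

#if defined(__x86_64__) || defined(__i386__)
/* x86, write-back memory: hardware keeps store-store and load-load order,
 * so only the compiler has to be restrained. */
#define ex_smp_wmb() ex_compiler_barrier()
#define ex_smp_rmb() ex_compiler_barrier()
#else
/* Assumed conservative fallback for other architectures. */
#define ex_smp_wmb() __sync_synchronize()
#define ex_smp_rmb() __sync_synchronize()
#endif

#define EX_RING_SIZE 4u /* must be a power of two */

/* Hypothetical single-producer/single-consumer ring. */
struct ex_ring {
    uint32_t slot[EX_RING_SIZE];
    volatile uint32_t head; /* written by the producer only */
    volatile uint32_t tail; /* written by the consumer only */
};

static inline int ex_enqueue(struct ex_ring *r, uint32_t v)
{
    uint32_t head = r->head;

    if (head - r->tail == EX_RING_SIZE)
        return -1;                  /* ring is full */
    r->slot[head % EX_RING_SIZE] = v;
    ex_smp_wmb();                   /* slot store before head store */
    r->head = head + 1;
    return 0;
}

static inline int ex_dequeue(struct ex_ring *r, uint32_t *v)
{
    uint32_t tail = r->tail;

    if (r->head == tail)
        return -1;                  /* ring is empty */
    ex_smp_rmb();                   /* head load before slot load */
    *v = r->slot[tail % EX_RING_SIZE];
    ex_smp_mb();                    /* finish reading before freeing the slot */
    r->tail = tail + 1;
    return 0;
}
```

The full barrier in `ex_dequeue` is the simple, conservative choice; a tuned version might use something cheaper on x86.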
* Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
  2015-05-05 22:46 ` Ananyev, Konstantin
@ 2015-05-07 15:28   ` Wang Dong
  2015-05-07 16:34     ` Ananyev, Konstantin
From: Wang Dong @ 2015-05-07 15:28 UTC
To: Ananyev, Konstantin, dev

Hi Konstantin,

>> The current implementation of rte_wmb/rte_rmb for x86 is using processor memory barrier. It's unnecessary for IA processor,
>> compiler memory barrier is enough.
>
> I wouldn't say they are 'unnecessary'.
> There are situations, even on IA, when you need _fence_ instructions.
> So, please leave rte_*mb() macros unmodified.

OK, I will leave them unmodified, but I really can't find a situation that calls for the sfence and lfence instructions.

> I still think that we need to create a new set of architecture dependent macros, as discussed before.
> Probably by analogy with linux kernel rte_smp_*mb() is a good name for them.
> Though if you have some better name in mind, I am open to suggestions here.

What about rte_dma_*mb()? I found dma_*mb() in linux-4.0.1, and it looks good.

>> But if dpdk running on a AMD processor, maybe we should use processor memory barrier.
>
> As far as I remember, amd has the same memory ordering model.

It's too hard to find an AMD software developer manual...

Dong

> So, I don't think we need #ifdef RTE_ARCH_X86_IA here.
>
> Konstantin
* Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
  2015-05-07 15:28   ` Wang Dong
@ 2015-05-07 16:34     ` Ananyev, Konstantin
  2015-05-09 10:24       ` Wang Dong
From: Ananyev, Konstantin @ 2015-05-07 16:34 UTC
To: Wang Dong, dev

Hi Dong,

> -----Original Message-----
> From: Wang Dong [mailto:dong.wang.pro@hotmail.com]
> Sent: Thursday, May 07, 2015 4:28 PM
> To: Ananyev, Konstantin; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
>
> > I wouldn't say they are 'unnecessary'.
> > There are situations, even on IA, when you need _fence_ instructions.
> > So, please leave rte_*mb() macros unmodified.
> OK, I will leave them unmodified, but I really can't find a situation that calls for the
> sfence and lfence instructions.

For example:
http://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/
http://dpdk.org/ml/archives/dev/2014-May/002613.html

> > I still think that we need to create a new set of architecture dependent macros, as discussed before.
> > Probably by analogy with linux kernel rte_smp_*mb() is a good name for them.
> > Though if you have some better name in mind, I am open to suggestions here.
> What about rte_dma_*mb()? I found dma_*mb() in linux-4.0.1, and it looks good.

Hmm, but why _dma_?
We need the same thing for multi-core communication too.
If rte_smp_ is not good enough, maybe rte_arch_?

> > As far as I remember, amd has the same memory ordering model.
> It's too hard to find an AMD software developer manual...

There, for example:
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf
?
Konstantin
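One well-known case where a fence instruction still matters on x86 is non-temporal (streaming) stores, which are weakly ordered with respect to other stores. The sketch below illustrates that case only and is not a summary of the links cited in the thread. `ex_stream_fill` is a hypothetical helper, and the non-SSE2 branch is an assumed fallback so that the file also builds on other targets.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_stream_si32, _mm_sfence */
#endif

/* Fill buf[0..n) with value using streaming stores, then set *done. */
static inline void ex_stream_fill(int32_t *buf, size_t n, int32_t value,
                                  volatile uint32_t *done)
{
    size_t i;

#if defined(__SSE2__)
    for (i = 0; i < n; i++)
        _mm_stream_si32((int *)&buf[i], value); /* weakly ordered store */
    /* Without this fence the flag store below could become visible
     * before the streaming stores; a compiler barrier would not help. */
    _mm_sfence();
#else
    for (i = 0; i < n; i++)
        buf[i] = value;
    __sync_synchronize(); /* assumed conservative fallback */
#endif
    *done = 1;
}
```

This is the kind of situation in which a compiler barrier in place of `_mm_sfence()` would silently change behavior, which is one argument for leaving rte_wmb() as it is.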
* Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
  2015-05-07 16:34     ` Ananyev, Konstantin
@ 2015-05-09 10:24       ` Wang Dong
  2015-05-11  9:59         ` Ananyev, Konstantin
From: Wang Dong @ 2015-05-09 10:24 UTC
To: Ananyev, Konstantin, dev

Hi Konstantin,

>> OK, I will leave them unmodified, but I really can't find a situation that calls for the
>> sfence and lfence instructions.
>
> For example:
> http://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/
> http://dpdk.org/ml/archives/dev/2014-May/002613.html
>
>> What about rte_dma_*mb()? I found dma_*mb() in linux-4.0.1, and it looks good.
>
> Hmm, but why _dma_?
> We need the same thing for multi-core communication too.
> If rte_smp_ is not good enough, maybe rte_arch_?

I want these two macros to be used only in PMDs, so I think _dma_ is better. Memory barriers between processors may be more complex, and I'm not familiar with them... Someone can add rte_smp_*mb for multi-core.

I think _arch_ means nothing here, because rte_*mb already exists for every architecture that DPDK supports; the macros are redefined for each architecture.

>> It's too hard to find an AMD software developer manual...
>
> There, for example:
> http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf
> ?

I searched AMD's official website for such a document for a long time. This manual is what I wanted, thanks very much!

Dong
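The PMD use case Dong has in mind can be sketched as follows. Everything here is hypothetical: `ex_rx_desc`, its `dd` ("descriptor done") flag and the `ex_dma_*mb()` macros are illustrative names, not DPDK or NIC definitions, and the "device" is simulated by an ordinary function so that the example runs without hardware. The x86 branch assumes descriptors live in ordinary write-back memory.

```c
#include <stdint.h>

#define ex_compiler_barrier() __asm__ __volatile__("" ::: "memory")

#if defined(__x86_64__) || defined(__i386__)
#define ex_dma_rmb() ex_compiler_barrier()
#define ex_dma_wmb() ex_compiler_barrier()
#else
/* Assumed conservative fallback for other architectures. */
#define ex_dma_rmb() __sync_synchronize()
#define ex_dma_wmb() __sync_synchronize()
#endif

/* Hypothetical RX descriptor that a NIC writes back via DMA. */
struct ex_rx_desc {
    uint16_t length;      /* packet length, valid only once dd is set */
    volatile uint16_t dd; /* done flag; volatile so each poll reloads it */
};

/* Stand-in for the device: fill in the length, then set the done flag. */
static inline void ex_fake_device_writeback(struct ex_rx_desc *d, uint16_t len)
{
    d->length = len;
    ex_dma_wmb();
    d->dd = 1;
}

/* Driver side: returns the packet length, or -1 if nothing has arrived. */
static inline int ex_rx_poll(struct ex_rx_desc *d)
{
    int len;

    if (d->dd == 0)
        return -1;
    ex_dma_rmb();         /* read dd before reading the other fields */
    len = d->length;
    d->dd = 0;            /* hand the descriptor back */
    return len;
}
```

Note that the driver-side barrier is the same compiler barrier that a multi-core version would use on x86, which is the point Konstantin raises about the naming.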
* Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
  2015-05-09 10:24       ` Wang Dong
@ 2015-05-11  9:59         ` Ananyev, Konstantin
  2015-05-12 15:23           ` Wang Dong
From: Ananyev, Konstantin @ 2015-05-11 9:59 UTC
To: Wang Dong, dev

Hi Dong,

> -----Original Message-----
> From: Wang Dong [mailto:dong.wang.pro@hotmail.com]
> Sent: Saturday, May 09, 2015 11:24 AM
> To: Ananyev, Konstantin; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
>
> > Hmm, but why _dma_?
> > We need the same thing for multi-core communication too.
> > If rte_smp_ is not good enough, maybe rte_arch_?
> I want these two macros to be used only in PMDs, so I think _dma_ is better. Memory
> barriers between processors may be more complex, and I'm not
> familiar with them... Someone can add rte_smp_*mb for multi-core.

Sorry, what are you talking about?
In the end, it will use the same instructions, whatever we name it: _dma_, _smp_, _arch_.
Konstantin

> I think _arch_ means nothing here, because rte_*mb already exists for every
> architecture that DPDK supports; the macros are redefined for each architecture.
>
> > There, for example:
> > http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf
> > ?
> I searched AMD's official website for such a document for a long time. This manual
> is what I wanted, thanks very much!
>
> Dong
* Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
  2015-05-11  9:59         ` Ananyev, Konstantin
@ 2015-05-12 15:23           ` Wang Dong
From: Wang Dong @ 2015-05-12 15:23 UTC
To: Ananyev, Konstantin, dev

>> I want these two macros to be used only in PMDs, so I think _dma_ is better. Memory
>> barriers between processors may be more complex, and I'm not
>> familiar with them... Someone can add rte_smp_*mb for multi-core.
>
> Sorry, what are you talking about?
> In the end, it will use the same instructions, whatever we name it: _dma_, _smp_, _arch_.
> Konstantin

Hi Konstantin,

In my previous mail, I meant that both rte_smp_*mb() and rte_dma_*mb() could be added, but that the context of rte_smp_*mb() differs from that of rte_dma_*mb(): perhaps rte_smp_*mb() is for threads and rte_dma_*mb() is for PMDs. I'm not sure how to implement rte_smp_*mb(), and I hope another developer can implement it.

In Linux, I find that _smp_ is the same as _dma_; if so, they will use the same instructions.

linux-4.0.1/arch/x86/include/asm/barrier.h, line 27:

    #ifdef CONFIG_X86_PPRO_FENCE
    #define dma_rmb() rmb()
    #else
    #define dma_rmb() barrier()
    #endif
    #define dma_wmb() barrier()

    #ifdef CONFIG_SMP
    #define smp_mb() mb()
    #define smp_rmb() dma_rmb()
    #define smp_wmb() barrier()
    #define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
    #else /* !SMP */
    #define smp_mb() barrier()
    #define smp_rmb() barrier()
    #define smp_wmb() barrier()
    #define set_mb(var, value) do { var = value; barrier(); } while (0)
    #endif /* SMP */

Dong