DPDK patches and discussions
* [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
@ 2015-05-05 15:38 WangDong
  2015-05-05 22:46 ` Ananyev, Konstantin
  0 siblings, 1 reply; 7+ messages in thread
From: WangDong @ 2015-05-05 15:38 UTC (permalink / raw)
  To: dev

The current implementation of rte_wmb/rte_rmb for x86 uses processor memory barriers. That is unnecessary on IA processors, where a compiler memory barrier is enough. But if DPDK is running on an AMD processor, maybe we should use a processor memory barrier.
I added a macro to distinguish them: if we compile DPDK for an IA processor, defining the macro (RTE_ARCH_X86_IA) can improve performance by using a compiler memory barrier. Alternatively, we could add RTE_ARCH_X86_AMD to select the processor memory barrier; in that case, if the macro is not defined, memory ordering will not be guaranteed. Which macro is better?
If this patch is applied, the PMDs' old implementation of a compiler memory barrier (some volatile variables) can be replaced with rte_rmb() and rte_wmb() on any architecture.

---
 lib/librte_eal/common/include/arch/x86/rte_atomic.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
index e93e8ee..52b1e81 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
@@ -49,10 +49,20 @@ extern "C" {
 
 #define	rte_mb() _mm_mfence()
 
+#ifdef RTE_ARCH_X86_IA
+
+#define rte_wmb() rte_compiler_barrier()
+
+#define rte_rmb() rte_compiler_barrier()
+
+#else
+
 #define	rte_wmb() _mm_sfence()
 
 #define	rte_rmb() _mm_lfence()
 
+#endif
+
 /*------------------------- 16 bit atomic operations -------------------------*/
 
 #ifndef RTE_FORCE_INTRINSICS
-- 
1.9.1


* Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
  2015-05-05 15:38 [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb WangDong
@ 2015-05-05 22:46 ` Ananyev, Konstantin
  2015-05-07 15:28   ` Wang Dong
  0 siblings, 1 reply; 7+ messages in thread
From: Ananyev, Konstantin @ 2015-05-05 22:46 UTC (permalink / raw)
  To: WangDong, dev

Hi Dong,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of WangDong
> Sent: Tuesday, May 05, 2015 4:38 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
> 
> The current implementation of rte_wmb/rte_rmb for x86 uses processor memory barriers. That is unnecessary on IA processors, where a compiler
> memory barrier is enough.

I wouldn't say they are 'unnecessary'.
There are situations, even on IA, when you need _fence_ instructions.
So, please leave the rte_*mb() macros unmodified.
I still think that we need to create a new set of architecture-dependent macros, as discussed before.
By analogy with the Linux kernel, rte_smp_*mb() is probably a good name for them.
Though if you have a better name in mind, I am open to suggestions here.
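
A minimal sketch of the layering proposed above, for illustration only: the existing-style macros keep their fence instructions, and a second, lighter set is used for ordering between cores. All demo_* names are hypothetical and are not DPDK API; the sketch assumes GCC or Clang, and the non-x86 branch is only a conservative placeholder.

```c
#include <stdint.h>

/* Compiler-only barrier: emits no instruction, it only stops the compiler
 * from moving memory accesses across this point. */
#define demo_compiler_barrier() __asm__ volatile("" ::: "memory")

#if defined(__x86_64__)
#include <emmintrin.h>  /* _mm_mfence, _mm_lfence; also provides _mm_sfence */
/* Existing-style macros, left unmodified: real fence instructions. */
#define demo_mb()      _mm_mfence()
#define demo_wmb()     _mm_sfence()
#define demo_rmb()     _mm_lfence()
/* Hypothetical new set for core-to-core ordering on ordinary cacheable
 * memory: store-store and load-load need only a compiler barrier on x86. */
#define demo_smp_mb()  _mm_mfence()
#define demo_smp_wmb() demo_compiler_barrier()
#define demo_smp_rmb() demo_compiler_barrier()
#else
/* Placeholder for other architectures: a full barrier everywhere. */
#define demo_mb()      __sync_synchronize()
#define demo_wmb()     __sync_synchronize()
#define demo_rmb()     __sync_synchronize()
#define demo_smp_mb()  __sync_synchronize()
#define demo_smp_wmb() __sync_synchronize()
#define demo_smp_rmb() __sync_synchronize()
#endif

struct demo_msg {
    uint32_t payload;
    uint32_t ready;
};

/* Producer core: write the payload, then publish the flag. */
static void demo_send(volatile struct demo_msg *m, uint32_t value)
{
    m->payload = value;
    demo_smp_wmb();     /* payload must be visible before the flag */
    m->ready = 1;
}

/* Consumer core: returns 1 and fills *out once the flag is set, else 0. */
static int demo_recv(volatile struct demo_msg *m, uint32_t *out)
{
    if (!m->ready)
        return 0;
    demo_smp_rmb();     /* do not read the payload before the flag */
    *out = m->payload;
    return 1;
}
```

Keeping the two sets separate means code that really needs sfence or lfence (for example around non-temporal stores) is unaffected, while the common flag-and-payload pattern pays only for a compiler barrier on x86.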

> But if DPDK is running on an AMD processor, maybe we should use a processor memory barrier.

As far as I remember, AMD has the same memory ordering model.
So, I don't think we need #ifdef RTE_ARCH_X86_IA here.

Konstantin



* Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
  2015-05-05 22:46 ` Ananyev, Konstantin
@ 2015-05-07 15:28   ` Wang Dong
  2015-05-07 16:34     ` Ananyev, Konstantin
  0 siblings, 1 reply; 7+ messages in thread
From: Wang Dong @ 2015-05-07 15:28 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev

Hi Konstantin,

> Hi Dong,
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of WangDong
>> Sent: Tuesday, May 05, 2015 4:38 PM
>> To: dev@dpdk.org
>> Subject: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
>>
>> The current implementation of rte_wmb/rte_rmb for x86 uses processor memory barriers. That is unnecessary on IA processors, where a compiler
>> memory barrier is enough.
>
> I wouldn't say they are 'unnecessary'.
> There are situations, even on IA, when you need _fence_ instructions.
> So, please leave the rte_*mb() macros unmodified.
OK, I will leave them unmodified, but I really can't find a situation that requires the
sfence and lfence instructions.


> I still think that we need to create a new set of architecture-dependent macros, as discussed before.
> By analogy with the Linux kernel, rte_smp_*mb() is probably a good name for them.
> Though if you have a better name in mind, I am open to suggestions here.
What about rte_dma_*mb()? I found dma_*mb() in linux-4.0.1, and it looks good.

>
>> But if DPDK is running on an AMD processor, maybe we should use a processor memory barrier.
>
> As far as I remember, AMD has the same memory ordering model.
It's too hard to find AMD's software developer manual...

Dong



* Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
  2015-05-07 15:28   ` Wang Dong
@ 2015-05-07 16:34     ` Ananyev, Konstantin
  2015-05-09 10:24       ` Wang Dong
  0 siblings, 1 reply; 7+ messages in thread
From: Ananyev, Konstantin @ 2015-05-07 16:34 UTC (permalink / raw)
  To: Wang Dong, dev


Hi Dong,

> -----Original Message-----
> From: Wang Dong [mailto:dong.wang.pro@hotmail.com]
> Sent: Thursday, May 07, 2015 4:28 PM
> To: Ananyev, Konstantin; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
> 
> Hi Konstantin,
> 
> > Hi Dong,
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of WangDong
> >> Sent: Tuesday, May 05, 2015 4:38 PM
> >> To: dev@dpdk.org
> >> Subject: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
> >>
> >> The current implementation of rte_wmb/rte_rmb for x86 uses processor memory barriers. That is unnecessary on IA processors, where a compiler
> >> memory barrier is enough.
> >
> > I wouldn't say they are 'unnecessary'.
> > There are situations, even on IA, when you need _fence_ instructions.
> > So, please leave the rte_*mb() macros unmodified.
> OK, I will leave them unmodified, but I really can't find a situation that requires the
> sfence and lfence instructions.

For example:
http://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/
http://dpdk.org/ml/archives/dev/2014-May/002613.html
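
As a hedged illustration of the kind of case the first link is understood to describe: on x86 a store may still be reordered with a later load from a different location, so a Dekker-style handshake needs a full fence between the store and the load. A compiler barrier is not enough here, and neither sfence nor lfence is intended for this store-to-load case. The sketch uses C11 atomics; the demo_* names are made up for this example.

```c
#include <stdatomic.h>

/* One flag per thread; both start at 0. */
static atomic_int demo_flag_a;
static atomic_int demo_flag_b;

/* Thread A: announce intent, then look at B's flag.
 * Returns the value of B's flag as seen by A. */
static int demo_enter_a(void)
{
    atomic_store_explicit(&demo_flag_a, 1, memory_order_relaxed);
    /* Full barrier (typically mfence or a locked instruction on x86).
     * Without it, the load below may be satisfied while the store above
     * is still sitting in the store buffer. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&demo_flag_b, memory_order_relaxed);
}

/* Thread B: the mirror image of thread A. */
static int demo_enter_b(void)
{
    atomic_store_explicit(&demo_flag_b, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&demo_flag_a, memory_order_relaxed);
}
```

With the fences in place, when the two functions run concurrently on different cores at least one of them returns 1. If the fences are replaced by compiler barriers, both may return 0, which is the outcome a mutual-exclusion algorithm cannot tolerate.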

> 
> 
> > I still think that we need to create a new set of architecture-dependent macros, as discussed before.
> > By analogy with the Linux kernel, rte_smp_*mb() is probably a good name for them.
> > Though if you have a better name in mind, I am open to suggestions here.
> What about rte_dma_*mb()? I found dma_*mb() in linux-4.0.1, and it looks good.

Hmm, but why _dma_?
We need the same thing for multi-core communication too.
If rte_smp_ is not good enough, maybe rte_arch_?

> 
> >
> >> But if DPDK is running on an AMD processor, maybe we should use a processor memory barrier.
> >
> > As far as I remember, AMD has the same memory ordering model.
> It's too hard to find AMD's software developer manual...

For example, this one?
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf

Konstantin



* Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
  2015-05-07 16:34     ` Ananyev, Konstantin
@ 2015-05-09 10:24       ` Wang Dong
  2015-05-11  9:59         ` Ananyev, Konstantin
  0 siblings, 1 reply; 7+ messages in thread
From: Wang Dong @ 2015-05-09 10:24 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev

Hi Konstantin,

>
> Hi Dong,
>
>>> I still think that we need to create a new set of architecture-dependent macros, as discussed before.
>>> By analogy with the Linux kernel, rte_smp_*mb() is probably a good name for them.
>>> Though if you have a better name in mind, I am open to suggestions here.
>> What about rte_dma_*mb()? I found dma_*mb() in linux-4.0.1, and it looks good.
>
> Hmm, but why _dma_?
> We need the same thing for multi-core communication too.
> If rte_smp_ is not good enough, maybe rte_arch_?
I want these two macros to be used only in PMDs, so I think _dma_ is better. The
memory barriers for processor-to-processor communication may be more complex, and I'm not
familiar with them... Someone else can add rte_smp_*mb() for multi-core.

I think _arch_ means nothing here, because rte_*mb() is already defined for each
architecture that DPDK supports; the macros are redefined per architecture.

>
>>
>>>
>>>> But if DPDK is running on an AMD processor, maybe we should use a processor memory barrier.
>>>
>>> As far as I remember, AMD has the same memory ordering model.
>> It's too hard to find AMD's software developer manual...
>
> For example, this one?
> http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf
I searched the official AMD website for such a document for a long time. This manual
is what I wanted, thanks very much!

Dong



* Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
  2015-05-09 10:24       ` Wang Dong
@ 2015-05-11  9:59         ` Ananyev, Konstantin
  2015-05-12 15:23           ` Wang Dong
  0 siblings, 1 reply; 7+ messages in thread
From: Ananyev, Konstantin @ 2015-05-11  9:59 UTC (permalink / raw)
  To: Wang Dong, dev

Hi Dong,

> -----Original Message-----
> From: Wang Dong [mailto:dong.wang.pro@hotmail.com]
> Sent: Saturday, May 09, 2015 11:24 AM
> To: Ananyev, Konstantin; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
> 
> Hi Konstantin,
> 
> >>> I still think that we need to create a new set of architecture-dependent macros, as discussed before.
> >>> By analogy with the Linux kernel, rte_smp_*mb() is probably a good name for them.
> >>> Though if you have a better name in mind, I am open to suggestions here.
> >> What about rte_dma_*mb()? I found dma_*mb() in linux-4.0.1, and it looks good.
> >
> > Hmm, but why _dma_?
> > We need the same thing for multi-core communication too.
> > If rte_smp_ is not good enough, maybe rte_arch_?
> I want these two macros to be used only in PMDs, so I think _dma_ is better. The
> memory barriers for processor-to-processor communication may be more complex, and I'm not
> familiar with them... Someone else can add rte_smp_*mb() for multi-core.

Sorry, what are you talking about?
In the end, it will use the same instructions, whatever we name it: _dma_, _smp_, or _arch_.
Konstantin



* Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
  2015-05-11  9:59         ` Ananyev, Konstantin
@ 2015-05-12 15:23           ` Wang Dong
  0 siblings, 0 replies; 7+ messages in thread
From: Wang Dong @ 2015-05-12 15:23 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev



> Hi Dong,
>
>>>>> I still think that we need to create a new set of architecture-dependent macros, as discussed before.
>>>>> By analogy with the Linux kernel, rte_smp_*mb() is probably a good name for them.
>>>>> Though if you have a better name in mind, I am open to suggestions here.
>>>> What about rte_dma_*mb()? I found dma_*mb() in linux-4.0.1, and it looks good.
>>>
>>> Hmm, but why _dma_?
>>> We need the same thing for multi-core communication too.
>>> If rte_smp_ is not good enough, maybe rte_arch_?
>> I want these two macros to be used only in PMDs, so I think _dma_ is better. The
>> memory barriers for processor-to-processor communication may be more complex, and I'm not
>> familiar with them... Someone else can add rte_smp_*mb() for multi-core.
>
> Sorry, what are you talking about?
> In the end, it will use the same instructions, whatever we name it: _dma_, _smp_, or _arch_.
> Konstantin

Hi Konstantin,

In my previous mail, I meant that both rte_smp_*mb() and rte_dma_*mb()
can be added, but the context for rte_smp_*mb() is different from that of
rte_dma_*mb(): perhaps rte_smp_*mb() is for threads and rte_dma_*mb() is
for PMDs. I'm not sure how to implement rte_smp_*mb(), and I hope it can be
implemented by another developer. In Linux, I find that it is the same as _dma_; if
so, they will use the same instructions.

linux-4.0.1/arch/x86/include/asm/barrier.h, line 27:
#ifdef CONFIG_X86_PPRO_FENCE
#define dma_rmb()       rmb()
#else
#define dma_rmb()       barrier()
#endif
#define dma_wmb()       barrier()

#ifdef CONFIG_SMP
#define smp_mb()        mb()
#define smp_rmb()       dma_rmb()
#define smp_wmb()       barrier()
#define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
#else /* !SMP */
#define smp_mb()        barrier()
#define smp_rmb()       barrier()
#define smp_wmb()       barrier()
#define set_mb(var, value) do { var = value; barrier(); } while (0)
#endif /* SMP */
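
A rough sketch of how a dma_rmb()-style barrier from the excerpt above might be used on the PMD side, under stated assumptions: the descriptor layout, the DEMO_DESC_DONE bit and every demo_* name are invented for illustration and do not correspond to any real NIC or DPDK driver. On x86 the barrier is a compiler barrier, matching the non-PPRO_FENCE branch of the Linux definition; the other branch is only a placeholder and makes no claim about what other architectures require for device DMA.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
/* Same choice as Linux dma_rmb() without CONFIG_X86_PPRO_FENCE. */
#define demo_dma_rmb() __asm__ volatile("" ::: "memory")
#else
/* Placeholder only, not a statement about real DMA requirements. */
#define demo_dma_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)
#endif

#define DEMO_DESC_DONE 0x1u     /* hypothetical "descriptor done" bit */

/* Hypothetical RX descriptor written back by the device via DMA. */
struct demo_rx_desc {
    volatile uint32_t status;
    volatile uint32_t length;
};

/* Harvest completed descriptors from the start of the ring.
 * Stores each packet length in lens[] and returns how many were done. */
static unsigned demo_rx_burst(struct demo_rx_desc *ring, unsigned n,
                              uint32_t *lens)
{
    unsigned i;

    for (i = 0; i < n; i++) {
        if (!(ring[i].status & DEMO_DESC_DONE))
            break;              /* device has not finished this one yet */
        demo_dma_rmb();         /* read the rest only after seeing DONE */
        lens[i] = ring[i].length;
        ring[i].status = 0;     /* hand the descriptor back */
    }
    return i;
}
```

The consumer side of a core-to-core queue has the same shape (check a flag, barrier, read the data), which is consistent with the Linux excerpt defining smp_rmb() in terms of dma_rmb().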

Dong


Thread overview: 7 messages
2015-05-05 15:38 [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb WangDong
2015-05-05 22:46 ` Ananyev, Konstantin
2015-05-07 15:28   ` Wang Dong
2015-05-07 16:34     ` Ananyev, Konstantin
2015-05-09 10:24       ` Wang Dong
2015-05-11  9:59         ` Ananyev, Konstantin
2015-05-12 15:23           ` Wang Dong
