DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH] eal: force gcc to inline rte_movX function
@ 2018-04-12  5:16 Junjie Chen
  2018-04-17 13:22 ` Thomas Monjalon
  0 siblings, 1 reply; 5+ messages in thread
From: Junjie Chen @ 2018-04-12  5:16 UTC (permalink / raw)
  To: bruce.richardson, konstantin.ananyev; +Cc: dev, Chen, Junjie, Chen

From: "Chen, Junjie" <junjie.j.chen@intel.com>

Sometimes gcc does not inline the function despite keyword *inline*,
we obeserve rte_movX is not inline when doing performance profiling,
so use *always_inline* keyword to force gcc to inline the function.

Signed-off-by: Chen, Junjie <junjie.j.chen@intel.com>
---
 .../common/include/arch/x86/rte_memcpy.h           | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
index cc140ecca..5ead68ab2 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
@@ -52,7 +52,7 @@ rte_memcpy(void *dst, const void *src, size_t n);
  * Copy 16 bytes from one location to another,
  * locations should not overlap.
  */
-static inline void
+static __rte_always_inline void
 rte_mov16(uint8_t *dst, const uint8_t *src)
 {
 	__m128i xmm0;
@@ -65,7 +65,7 @@ rte_mov16(uint8_t *dst, const uint8_t *src)
  * Copy 32 bytes from one location to another,
  * locations should not overlap.
  */
-static inline void
+static __rte_always_inline void
 rte_mov32(uint8_t *dst, const uint8_t *src)
 {
 	__m256i ymm0;
@@ -78,7 +78,7 @@ rte_mov32(uint8_t *dst, const uint8_t *src)
  * Copy 64 bytes from one location to another,
  * locations should not overlap.
  */
-static inline void
+static __rte_always_inline void
 rte_mov64(uint8_t *dst, const uint8_t *src)
 {
 	__m512i zmm0;
@@ -91,7 +91,7 @@ rte_mov64(uint8_t *dst, const uint8_t *src)
  * Copy 128 bytes from one location to another,
  * locations should not overlap.
  */
-static inline void
+static __rte_always_inline void
 rte_mov128(uint8_t *dst, const uint8_t *src)
 {
 	rte_mov64(dst + 0 * 64, src + 0 * 64);
@@ -102,7 +102,7 @@ rte_mov128(uint8_t *dst, const uint8_t *src)
  * Copy 256 bytes from one location to another,
  * locations should not overlap.
  */
-static inline void
+static __rte_always_inline void
 rte_mov256(uint8_t *dst, const uint8_t *src)
 {
 	rte_mov64(dst + 0 * 64, src + 0 * 64);
@@ -293,7 +293,7 @@ rte_memcpy_generic(void *dst, const void *src, size_t n)
  * Copy 16 bytes from one location to another,
  * locations should not overlap.
  */
-static inline void
+static __rte_always_inline void
 rte_mov16(uint8_t *dst, const uint8_t *src)
 {
 	__m128i xmm0;
@@ -306,7 +306,7 @@ rte_mov16(uint8_t *dst, const uint8_t *src)
  * Copy 32 bytes from one location to another,
  * locations should not overlap.
  */
-static inline void
+static __rte_always_inline void
 rte_mov32(uint8_t *dst, const uint8_t *src)
 {
 	__m256i ymm0;
@@ -319,7 +319,7 @@ rte_mov32(uint8_t *dst, const uint8_t *src)
  * Copy 64 bytes from one location to another,
  * locations should not overlap.
  */
-static inline void
+static __rte_always_inline void
 rte_mov64(uint8_t *dst, const uint8_t *src)
 {
 	rte_mov32((uint8_t *)dst + 0 * 32, (const uint8_t *)src + 0 * 32);
@@ -486,7 +486,7 @@ rte_memcpy_generic(void *dst, const void *src, size_t n)
  * Copy 16 bytes from one location to another,
  * locations should not overlap.
  */
-static inline void
+static __rte_always_inline void
 rte_mov16(uint8_t *dst, const uint8_t *src)
 {
 	__m128i xmm0;
@@ -499,7 +499,7 @@ rte_mov16(uint8_t *dst, const uint8_t *src)
  * Copy 32 bytes from one location to another,
  * locations should not overlap.
  */
-static inline void
+static __rte_always_inline void
 rte_mov32(uint8_t *dst, const uint8_t *src)
 {
 	rte_mov16((uint8_t *)dst + 0 * 16, (const uint8_t *)src + 0 * 16);
@@ -510,7 +510,7 @@ rte_mov32(uint8_t *dst, const uint8_t *src)
  * Copy 64 bytes from one location to another,
  * locations should not overlap.
  */
-static inline void
+static __rte_always_inline void
 rte_mov64(uint8_t *dst, const uint8_t *src)
 {
 	rte_mov16((uint8_t *)dst + 0 * 16, (const uint8_t *)src + 0 * 16);
-- 
2.16.0

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [dpdk-dev] [PATCH] eal: force gcc to inline rte_movX function
  2018-04-12  5:16 [dpdk-dev] [PATCH] eal: force gcc to inline rte_movX function Junjie Chen
@ 2018-04-17 13:22 ` Thomas Monjalon
  2018-04-17 14:57   ` Bruce Richardson
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Monjalon @ 2018-04-17 13:22 UTC (permalink / raw)
  To: Junjie Chen, bruce.richardson, konstantin.ananyev; +Cc: dev

12/04/2018 07:16, Junjie Chen:
> From: "Chen, Junjie" <junjie.j.chen@intel.com>
> 
> Sometimes gcc does not inline the function despite keyword *inline*,
> we obeserve rte_movX is not inline when doing performance profiling,
> so use *always_inline* keyword to force gcc to inline the function.
> 
> Signed-off-by: Chen, Junjie <junjie.j.chen@intel.com>
> ---
>  .../common/include/arch/x86/rte_memcpy.h           | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)

The title should start with "eal/x86:"
Something like that:
	eal/x86: force inlining of memcpy sub-functions

Bruce, Konstantin, any review of the content/optimization?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [dpdk-dev] [PATCH] eal: force gcc to inline rte_movX function
  2018-04-17 13:22 ` Thomas Monjalon
@ 2018-04-17 14:57   ` Bruce Richardson
  2018-04-18  2:43     ` Chen, Junjie J
  0 siblings, 1 reply; 5+ messages in thread
From: Bruce Richardson @ 2018-04-17 14:57 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Junjie Chen, konstantin.ananyev, dev

On Tue, Apr 17, 2018 at 03:22:06PM +0200, Thomas Monjalon wrote:
> 12/04/2018 07:16, Junjie Chen:
> > From: "Chen, Junjie" <junjie.j.chen@intel.com>
> > 
> > Sometimes gcc does not inline the function despite keyword *inline*,
> > we obeserve rte_movX is not inline when doing performance profiling,
> > so use *always_inline* keyword to force gcc to inline the function.
> > 
> > Signed-off-by: Chen, Junjie <junjie.j.chen@intel.com>
> > ---
> >  .../common/include/arch/x86/rte_memcpy.h           | 22 +++++++++++-----------
> >  1 file changed, 11 insertions(+), 11 deletions(-)
> 
> The title should start with "eal/x86:"
> Something like that:
> 	eal/x86: force inlining of memcpy sub-functions
> 
> Bruce, Konstantin, any review of the content/optimization?
> 
No objection here.

Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [dpdk-dev] [PATCH] eal: force gcc to inline rte_movX function
  2018-04-17 14:57   ` Bruce Richardson
@ 2018-04-18  2:43     ` Chen, Junjie J
  2018-04-18  7:25       ` Thomas Monjalon
  0 siblings, 1 reply; 5+ messages in thread
From: Chen, Junjie J @ 2018-04-18  2:43 UTC (permalink / raw)
  To: Richardson, Bruce, Thomas Monjalon; +Cc: Ananyev, Konstantin, dev

Thanks to point this out. I agree for the title change.

Do you want me to send v2 patch? Or you can handle it when committing? 

> > > Sometimes gcc does not inline the function despite keyword *inline*,
> > > we obeserve rte_movX is not inline when doing performance profiling,
> > > so use *always_inline* keyword to force gcc to inline the function.
> > >
> > > Signed-off-by: Chen, Junjie <junjie.j.chen@intel.com>
> > > ---
> > >  .../common/include/arch/x86/rte_memcpy.h           | 22
> +++++++++++-----------
> > >  1 file changed, 11 insertions(+), 11 deletions(-)
> >
> > The title should start with "eal/x86:"
> > Something like that:
> > 	eal/x86: force inlining of memcpy sub-functions
> >
> > Bruce, Konstantin, any review of the content/optimization?
> >
> No objection here.
> 
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [dpdk-dev] [PATCH] eal: force gcc to inline rte_movX function
  2018-04-18  2:43     ` Chen, Junjie J
@ 2018-04-18  7:25       ` Thomas Monjalon
  0 siblings, 0 replies; 5+ messages in thread
From: Thomas Monjalon @ 2018-04-18  7:25 UTC (permalink / raw)
  To: Chen, Junjie J; +Cc: dev, Richardson, Bruce, Ananyev, Konstantin

18/04/2018 04:43, Chen, Junjie J:
> Thanks to point this out. I agree for the title change.
> 
> Do you want me to send v2 patch? Or you can handle it when committing? 
> 
> > > > Sometimes gcc does not inline the function despite keyword *inline*,
> > > > we obeserve rte_movX is not inline when doing performance profiling,
> > > > so use *always_inline* keyword to force gcc to inline the function.
> > > >
> > > > Signed-off-by: Chen, Junjie <junjie.j.chen@intel.com>
> > > > ---
> > > >  .../common/include/arch/x86/rte_memcpy.h           | 22
> > +++++++++++-----------
> > > >  1 file changed, 11 insertions(+), 11 deletions(-)
> > >
> > > The title should start with "eal/x86:"
> > > Something like that:
> > > 	eal/x86: force inlining of memcpy sub-functions
> > >
> > > Bruce, Konstantin, any review of the content/optimization?
> > >
> > No objection here.
> > 
> > Acked-by: Bruce Richardson <bruce.richardson@intel.com>

Applied, thanks

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-04-18  7:25 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-12  5:16 [dpdk-dev] [PATCH] eal: force gcc to inline rte_movX function Junjie Chen
2018-04-17 13:22 ` Thomas Monjalon
2018-04-17 14:57   ` Bruce Richardson
2018-04-18  2:43     ` Chen, Junjie J
2018-04-18  7:25       ` Thomas Monjalon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).