* [PATCH] x86: rte_mov256 was missing for AVX2
@ 2022-08-20 10:30 Morten Brørup
2022-08-29 10:55 ` Thomas Monjalon
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Morten Brørup @ 2022-08-20 10:30 UTC
To: bruce.richardson, konstantin.v.ananyev; +Cc: dev, Morten Brørup
The rte_mov256 function was missing for AVX2.
Does nobody build test for AVX2 and check the compiler output?
Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
lib/eal/x86/include/rte_memcpy.h | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
index b678b5c942..d4d7a5cfc8 100644
--- a/lib/eal/x86/include/rte_memcpy.h
+++ b/lib/eal/x86/include/rte_memcpy.h
@@ -371,6 +371,23 @@ rte_mov128(uint8_t *dst, const uint8_t *src)
rte_mov32((uint8_t *)dst + 3 * 32, (const uint8_t *)src + 3 * 32);
}
+/**
+ * Copy 256 bytes from one location to another,
+ * locations should not overlap.
+ */
+static __rte_always_inline void
+rte_mov256(uint8_t *dst, const uint8_t *src)
+{
+ rte_mov32((uint8_t *)dst + 0 * 32, (const uint8_t *)src + 0 * 32);
+ rte_mov32((uint8_t *)dst + 1 * 32, (const uint8_t *)src + 1 * 32);
+ rte_mov32((uint8_t *)dst + 2 * 32, (const uint8_t *)src + 2 * 32);
+ rte_mov32((uint8_t *)dst + 3 * 32, (const uint8_t *)src + 3 * 32);
+ rte_mov32((uint8_t *)dst + 4 * 32, (const uint8_t *)src + 4 * 32);
+ rte_mov32((uint8_t *)dst + 5 * 32, (const uint8_t *)src + 5 * 32);
+ rte_mov32((uint8_t *)dst + 6 * 32, (const uint8_t *)src + 6 * 32);
+ rte_mov32((uint8_t *)dst + 7 * 32, (const uint8_t *)src + 7 * 32);
+}
+
/**
* Copy 128-byte blocks from one location to another,
* locations should not overlap.
--
2.17.1
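The new function follows the pattern rte_mov128 already uses just above it: a fixed-size copy built from consecutive 32-byte rte_mov32 calls, with eight of them making up 256 bytes. A minimal portable sketch of that composition follows. The names `mov32_sketch`, `mov256_sketch` and `mov256_selftest` are hypothetical. `mov32_sketch` uses plain `memcpy` where, as far as we know, the AVX2 build of rte_mov32 performs an unaligned 256-bit load/store pair (`_mm256_loadu_si256`/`_mm256_storeu_si256`). This lets the sketch compile without `-mavx2`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for rte_mov32: copy exactly 32 bytes.
 * Assumption: the real AVX2 version uses one 256-bit unaligned load/store. */
static inline void
mov32_sketch(uint8_t *dst, const uint8_t *src)
{
	memcpy(dst, src, 32);
}

/* Same shape as the patched rte_mov256: eight consecutive 32-byte moves.
 * As with the original, dst and src must not overlap. */
static inline void
mov256_sketch(uint8_t *dst, const uint8_t *src)
{
	for (int i = 0; i < 8; i++)
		mov32_sketch(dst + i * 32, src + i * 32);
}

/* Small self-check: copy a known byte pattern and compare the buffers. */
static void
mov256_selftest(void)
{
	uint8_t src[256], dst[256];

	for (int i = 0; i < 256; i++) {
		src[i] = (uint8_t)i;
		dst[i] = 0;
	}
	mov256_sketch(dst, src);
	assert(memcmp(dst, src, sizeof(src)) == 0);
}
```

The loop is there only for brevity. The patch spells out the eight calls by hand, which keeps the code straight-line regardless of how the compiler decides to unroll loops.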
* Re: [PATCH] x86: rte_mov256 was missing for AVX2
From: Thomas Monjalon @ 2022-08-29 10:55 UTC
To: Morten Brørup; +Cc: bruce.richardson, konstantin.v.ananyev, dev
20/08/2022 12:30, Morten Brørup:
> The rte_mov256 function was missing for AVX2.
> Does nobody build test for AVX2 and check the compiler output?
Please could you specify the context/setup to reproduce the issue?
An error message would be nice to paste here as well.
Thanks
* RE: [PATCH] x86: rte_mov256 was missing for AVX2
From: Morten Brørup @ 2022-08-29 12:18 UTC
To: Thomas Monjalon; +Cc: bruce.richardson, konstantin.v.ananyev, dev
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Monday, 29 August 2022 12.56
>
> 20/08/2022 12:30, Morten Brørup:
> > The rte_mov256 function was missing for AVX2.
> > Does nobody build test for AVX2 and check the compiler output?
>
> Please could you specify the context/setup to reproduce the issue?
I stumbled upon it while working on the new non-temporal memcpy function.
Reproduction described below.
>
> An error message would be nice to paste here as well.
> Thanks
The rte_memcpy declarations are in the lib/eal/include/generic/rte_memcpy.h header file, so include this declaration header in the x86 implementation header. (I wonder why it is not already there.)
lib/eal/x86/include/rte_memcpy.h:
#include <rte_common.h>
#include <rte_config.h>
#include <rte_debug.h>
+ #include "generic/rte_memcpy.h"
#ifdef __cplusplus
extern "C" {
#endif
The error messages from ninja look like this:
[46/2597] Compiling C object lib/acl/libavx2_tmp.a.p/acl_run_avx2.c.o
In file included from ../lib/eal/x86/include/rte_memcpy.h:24,
from ../lib/acl/rte_acl_osdep.h:40,
from ../lib/acl/rte_acl.h:14,
from ../lib/acl/acl_run.h:8,
from ../lib/acl/acl_run_sse.h:5,
from ../lib/acl/acl_run_avx2.h:5,
from ../lib/acl/acl_run_avx2.c:6:
../lib/eal/include/generic/rte_memcpy.h:89:1: warning: 'rte_mov256' declared 'static' but never defined [-Wunused-function]
89 | rte_mov256(uint8_t *dst, const uint8_t *src);
| ^~~~~~~~~~
[52/2597] Compiling C object lib/acl/libavx512_tmp.a.p/acl_run_avx512.c.o
In file included from ../lib/eal/x86/include/rte_memcpy.h:24,
from ../lib/acl/rte_acl_osdep.h:40,
from ../lib/acl/rte_acl.h:14,
from ../lib/acl/acl_run.h:8,
from ../lib/acl/acl_run_sse.h:5,
from ../lib/acl/acl_run_avx512.c:5:
../lib/eal/include/generic/rte_memcpy.h:89:1: warning: 'rte_mov256' declared 'static' but never defined [-Wunused-function]
89 | rte_mov256(uint8_t *dst, const uint8_t *src);
| ^~~~~~~~~~
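The warning comes from the split between declarations and definitions. The generic header declares each rte_movN helper as a static inline function, and each architecture header is expected to define all of them. The reduced single-file sketch below mimics that split using hypothetical names (`demo_mov32`, `demo_mov64`, `decl_split_selftest`): the first section plays the role of the generic header and the second plays the x86 header. As written it compiles cleanly. Deleting the definition of `demo_mov64` (and its use in the self-check) should make GCC with `-Wall`, which enables `-Wunused-function`, emit the same kind of "declared 'static' but never defined" warning shown above. The check only works because both halves end up in the same translation unit, which is what including the generic header from the x86 header achieves.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* --- "generic header" part: declarations only (hypothetical names) --- */
static inline void demo_mov32(uint8_t *dst, const uint8_t *src);
static inline void demo_mov64(uint8_t *dst, const uint8_t *src);

/* --- "arch header" part: must define everything declared above ---
 * Removing demo_mov64 below (and its use in the self-check) is expected
 * to make gcc -Wall warn that it was declared 'static' but never defined. */
static inline void
demo_mov32(uint8_t *dst, const uint8_t *src)
{
	memcpy(dst, src, 32);
}

static inline void
demo_mov64(uint8_t *dst, const uint8_t *src)
{
	demo_mov32(dst, src);
	demo_mov32(dst + 32, src + 32);
}

/* Self-check that the 64-byte helper copies the whole buffer. */
static void
decl_split_selftest(void)
{
	uint8_t src[64], dst[64] = {0};

	for (int i = 0; i < 64; i++)
		src[i] = (uint8_t)(i + 1);
	demo_mov64(dst, src);
	assert(memcmp(dst, src, sizeof(src)) == 0);
}
```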
At SmartShare Systems, we follow a coding convention of including the declaration header file at the absolute top of the file that implements it. This reveals at build time whether anything is missing from the declaration header file. The DPDK Project could do the same and catch bugs like this one.
Here's an example:
foo.h:
------
// Declaration
static inline uint32_t bar(uint32_t x);
foo.c:
------
#include <foo.h> // <-- Note: At the absolute top!
#include <stdint.h>
// Implementation
static inline uint32_t bar(uint32_t x)
{
return x * 2;
}
Following our coding convention will reveal that <stdint.h> is required for using <foo.h>, and thus should be included in foo.h (not in foo.c), because someone else might include <foo.h> without also including <stdint.h>.
-Morten
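Applied to the foo.h/foo.c example, the convention leads to foo.h including <stdint.h> itself. The sketch below folds both files into one listing, with comments marking where each file begins. The include guard name `FOO_H` and the `bar_selftest` helper are assumptions added for illustration. Because the foo.h section brings its own dependency, it would still compile as the very first include of any source file.

```c
/* ---- foo.h: self-contained, includes what its declarations need ---- */
#ifndef FOO_H
#define FOO_H

#include <stdint.h> /* required for uint32_t in the declaration below */

/* Declaration */
static inline uint32_t bar(uint32_t x);

#endif /* FOO_H */

/* ---- foo.c: in a real build, #include "foo.h" would be its first line ---- */
#include <assert.h>

/* Implementation */
static inline uint32_t
bar(uint32_t x)
{
	return x * 2;
}

/* Minimal check that the definition behaves as expected. */
static void
bar_selftest(void)
{
	assert(bar(21) == 42);
}
```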
* Re: [PATCH] x86: rte_mov256 was missing for AVX2
From: Thomas Monjalon @ 2022-08-29 13:12 UTC
To: Morten Brørup; +Cc: bruce.richardson, konstantin.v.ananyev, dev
29/08/2022 14:18, Morten Brørup:
> At SmartShare Systems we follow a coding convention of including the declaration header file at the absolute top of the file implementing it. This reveals at build time if anything is missing in the declaration header file. The DPDK Project could do the same, and find bugs like this.
>
> Here's an example:
>
> foo.h:
> ------
> // Declaration
> static inline uint32_t bar(uint32_t x);
>
> foo.c:
> ------
> #include <foo.h> // <-- Note: At the absolute top!
> #include <stdint.h>
>
> // Implementation
> static inline uint32_t bar(uint32_t x)
> {
> return x * 2;
> }
>
> Following our coding convention will reveal that <stdint.h> is required for using <foo.h>, and thus should be included in foo.h (not in foo.c), because someone else might include <foo.h> without also including <stdint.h>.
Yes, we could follow this convention.
* RE: [PATCH] x86: rte_mov256 was missing for AVX2
From: Morten Brørup @ 2022-09-28 19:44 UTC
To: Bruce Richardson, david.marchand, Thomas Monjalon
Cc: dev, konstantin.v.ananyev
Bruce, David, Thomas,
PING. Please ack or review this simple patch, so it can be merged.
Details were already discussed on the list with Thomas.
NB: The test errors in Patchwork are bogus: "ERROR: Could not detect Ninja v1.5 or newer" is clearly not related to the patch.
-Morten
> From: Morten Brørup [mailto:mb@smartsharesystems.com]
> Sent: Saturday, 20 August 2022 12.31
>
> The rte_mov256 function was missing for AVX2.
> Does nobody build test for AVX2 and check the compiler output?
>
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
> lib/eal/x86/include/rte_memcpy.h | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
> index b678b5c942..d4d7a5cfc8 100644
> --- a/lib/eal/x86/include/rte_memcpy.h
> +++ b/lib/eal/x86/include/rte_memcpy.h
> @@ -371,6 +371,23 @@ rte_mov128(uint8_t *dst, const uint8_t *src)
> rte_mov32((uint8_t *)dst + 3 * 32, (const uint8_t *)src + 3 * 32);
> }
>
> +/**
> + * Copy 256 bytes from one location to another,
> + * locations should not overlap.
> + */
> +static __rte_always_inline void
> +rte_mov256(uint8_t *dst, const uint8_t *src)
> +{
> + rte_mov32((uint8_t *)dst + 0 * 32, (const uint8_t *)src + 0 * 32);
> + rte_mov32((uint8_t *)dst + 1 * 32, (const uint8_t *)src + 1 * 32);
> + rte_mov32((uint8_t *)dst + 2 * 32, (const uint8_t *)src + 2 * 32);
> + rte_mov32((uint8_t *)dst + 3 * 32, (const uint8_t *)src + 3 * 32);
> + rte_mov32((uint8_t *)dst + 4 * 32, (const uint8_t *)src + 4 * 32);
> + rte_mov32((uint8_t *)dst + 5 * 32, (const uint8_t *)src + 5 * 32);
> + rte_mov32((uint8_t *)dst + 6 * 32, (const uint8_t *)src + 6 * 32);
> + rte_mov32((uint8_t *)dst + 7 * 32, (const uint8_t *)src + 7 * 32);
> +}
> +
> /**
> * Copy 128-byte blocks from one location to another,
> * locations should not overlap.
> --
> 2.17.1
* Re: [PATCH] x86: rte_mov256 was missing for AVX2
From: Bruce Richardson @ 2022-09-29 8:25 UTC
To: Morten Brørup
Cc: david.marchand, Thomas Monjalon, dev, konstantin.v.ananyev
On Wed, Sep 28, 2022 at 09:44:35PM +0200, Morten Brørup wrote:
> Bruce, David, Thomas,
>
> PING. Please ack or review this simple patch, so it can be merged.
>
> Details were already discussed on the list with Thomas.
>
> NB: The test errors in Patchwork are bogus: "ERROR: Could not detect Ninja v1.5 or newer" is clearly not related to the patch.
>
> -Morten
>
> > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > Sent: Saturday, 20 August 2022 12.31
> >
> > The rte_mov256 function was missing for AVX2.
> > Does nobody build test for AVX2 and check the compiler output?
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
* Re: [PATCH] x86: rte_mov256 was missing for AVX2
From: David Marchand @ 2022-09-30 8:34 UTC
To: Morten Brørup; +Cc: bruce.richardson, konstantin.v.ananyev, dev
On Sat, Aug 20, 2022 at 12:30 PM Morten Brørup <mb@smartsharesystems.com> wrote:
>
> The rte_mov256 function was missing for AVX2.
Afaiu:
Fixes: 9144d6bcdefd ("eal/x86: optimize memcpy for SSE and AVX")
This has been missing for a long time, so I guess nobody actually uses it.
>
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Applied, thanks.
If you think it is worth always including the generic/ headers in all
arch specific headers, can you work on it?
Thanks.
--
David Marchand