* [dpdk-dev] [PATCH] EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h
@ 2020-09-10  1:16 Omkar Maslekar
  2020-09-10  1:16 ` Omkar Maslekar
                   ` (8 more replies)
  0 siblings, 9 replies; 38+ messages in thread
From: Omkar Maslekar @ 2020-09-10  1:16 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, ciara.loftus, omkar.maslekar
We are including this in rte_prefetch.h since it is the most closely
related code location. rte_cldemote is similar to a prefetch hint, in reverse.
Omkar Maslekar (1):
  EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h
 doc/guides/rel_notes/release_20_11.rst        | 26 ++++----------------------
 lib/librte_eal/arm/include/rte_prefetch_32.h  |  5 +++++
 lib/librte_eal/arm/include/rte_prefetch_64.h  |  5 +++++
 lib/librte_eal/include/generic/rte_prefetch.h |  7 +++++++
 lib/librte_eal/ppc/include/rte_prefetch.h     |  5 +++++
 lib/librte_eal/x86/include/rte_prefetch.h     |  9 +++++++++
 6 files changed, 35 insertions(+), 22 deletions(-)
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 38+ messages in thread
* [dpdk-dev] [PATCH] EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h
  2020-09-10  1:16 [dpdk-dev] [PATCH] EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h Omkar Maslekar
@ 2020-09-10  1:16 ` Omkar Maslekar
  2020-09-10  8:55   ` Bruce Richardson
  2020-09-10 22:04   ` David Christensen
  2020-09-11 16:51 ` [dpdk-dev] [PATCH v2] " Omkar Maslekar
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 38+ messages in thread
From: Omkar Maslekar @ 2020-09-10  1:16 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, ciara.loftus, omkar.maslekar
rte_cldemote is similar to a prefetch hint, in reverse. cldemote(addr)
lets software hint to hardware that a cache line is likely to be shared,
which is useful in core-to-core communication. The ARM and PPC
implementations are provided as NOPs; real implementations can be added
if equivalent instructions become available on those architectures.
Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
---
 doc/guides/rel_notes/release_20_11.rst        | 26 ++++----------------------
 lib/librte_eal/arm/include/rte_prefetch_32.h  |  5 +++++
 lib/librte_eal/arm/include/rte_prefetch_64.h  |  5 +++++
 lib/librte_eal/include/generic/rte_prefetch.h |  7 +++++++
 lib/librte_eal/ppc/include/rte_prefetch.h     |  5 +++++
 lib/librte_eal/x86/include/rte_prefetch.h     |  9 +++++++++
 6 files changed, 35 insertions(+), 22 deletions(-)
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index df227a1..c4a4362 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -27,29 +27,11 @@ New Features
 .. This section should contain new features added in this release.
    Sample format:
 
-   * **Add a title in the past tense with a full stop.**
+Added new instruction CLDEMOTE in rte_prefetch.h.
 
-     Add a short 1-2 sentence description in the past tense.
-     The description should be enough to allow someone scanning
-     the release notes to understand the new feature.
-
-     If the feature adds a lot of sub-features you can use a bullet list
-     like this:
-
-     * Added feature foo to do something.
-     * Enhanced feature bar to do something else.
-
-     Refer to the previous release notes for examples.
-
-     Suggested order in release notes items:
-     * Core libs (EAL, mempool, ring, mbuf, buses)
-     * Device abstraction libs and PMDs
-       - ethdev (lib, PMDs)
-       - cryptodev (lib, PMDs)
-       - eventdev (lib, PMDs)
-       - etc
-     * Other libs
-     * Apps, Examples, Tools (if significant)
+     Added a hardware hint CLDEMOTE which is similar to prefetch in reverse.
+     CLDEMOTES moves the cache line to the last shared cache, where it expects
+     sharing to be efficient.
 
      This section is a comment. Do not overwrite or remove it.
      Also, make sure to start the actual text at the margin.
diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h b/lib/librte_eal/arm/include/rte_prefetch_32.h
index e53420a..ad91edd 100644
--- a/lib/librte_eal/arm/include/rte_prefetch_32.h
+++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
@@ -33,6 +33,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	rte_prefetch0(p);
 }
 
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h b/lib/librte_eal/arm/include/rte_prefetch_64.h
index fc2b391..35d278a 100644
--- a/lib/librte_eal/arm/include/rte_prefetch_64.h
+++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
@@ -32,6 +32,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));
 }
 
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/include/generic/rte_prefetch.h b/lib/librte_eal/include/generic/rte_prefetch.h
index 6e47bdf..89ec69c 100644
--- a/lib/librte_eal/include/generic/rte_prefetch.h
+++ b/lib/librte_eal/include/generic/rte_prefetch.h
@@ -51,4 +51,11 @@
  */
 static inline void rte_prefetch_non_temporal(const volatile void *p);
 
+/**
+ * Demote a cache line into the last shared cache level.
+ * @param p
+ *   Address to demote
+ */
+static inline void rte_cldemote(const volatile void *p);
+
 #endif /* _RTE_PREFETCH_H_ */
diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h b/lib/librte_eal/ppc/include/rte_prefetch.h
index 9ba07c8..3fe9655 100644
--- a/lib/librte_eal/ppc/include/rte_prefetch.h
+++ b/lib/librte_eal/ppc/include/rte_prefetch.h
@@ -34,6 +34,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	rte_prefetch0(p);
 }
 
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/x86/include/rte_prefetch.h b/lib/librte_eal/x86/include/rte_prefetch.h
index 384c6b3..029d06e 100644
--- a/lib/librte_eal/x86/include/rte_prefetch.h
+++ b/lib/librte_eal/x86/include/rte_prefetch.h
@@ -32,6 +32,15 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	asm volatile ("prefetchnta %[p]" : : [p] "m" (*(const volatile char *)p));
 }
 
+/*
+ * we're using raw byte codes for now as only the newest compiler
+ * versions support this instruction natively.
+ */
+static inline void rte_cldemote(const volatile void *p)
+{
+	asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.8.3.1
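
As an illustration of the producer-consumer use this patch has in mind, here
is a minimal sketch. It does not use the DPDK headers: demo_cldemote() is a
hypothetical stand-in that mirrors the x86 byte encoding from the patch and
falls back to a no-op elsewhere, and struct demo_slot, demo_produce() and
demo_consume() are made-up names for illustration only. The 64-byte line size
is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE 64 /* assumed cache line size */

/* Hypothetical stand-in for rte_cldemote(), mirroring the patch. */
static inline void demo_cldemote(const volatile void *p)
{
#if defined(__x86_64__) || defined(__i386__)
	/*
	 * 0f 1c /0 with ModRM 0x06 addresses the line through (%rsi)/(%esi).
	 * The opcode sits in the hint-NOP space, so CPUs without CLDEMOTE
	 * are expected to execute it as a NOP.
	 */
	__asm__ volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
#else
	(void)p; /* no-op, like the ARM and PPC stubs in the patch */
#endif
}

/* One cache line shared between a producer core and a consumer core. */
struct demo_slot {
	uint64_t payload[7];
	uint64_t ready;
} __attribute__((aligned(DEMO_CACHE_LINE)));

/* Producer: fill the line, publish it, then hint that it will be shared. */
static inline void demo_produce(struct demo_slot *slot, uint64_t value)
{
	assert(slot != NULL);
	for (size_t i = 0; i < 7; i++)
		slot->payload[i] = value + i;
	__atomic_store_n(&slot->ready, 1, __ATOMIC_RELEASE);
	demo_cldemote(slot); /* hint only; correctness never depends on it */
}

/* Consumer: returns 1 and the first payload word once the slot is ready. */
static inline int demo_consume(struct demo_slot *slot, uint64_t *out)
{
	if (!__atomic_load_n(&slot->ready, __ATOMIC_ACQUIRE))
		return 0;
	*out = slot->payload[0];
	return 1;
}
```

The demote call comes after the release store, so the data is already
published whether or not the hardware acts on the hint.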
* Re: [dpdk-dev] [PATCH] EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h
  2020-09-10  1:16 ` Omkar Maslekar
@ 2020-09-10  8:55   ` Bruce Richardson
  2020-09-10 23:30     ` Maslekar, Omkar
  2020-09-10 22:04   ` David Christensen
  1 sibling, 1 reply; 38+ messages in thread
From: Bruce Richardson @ 2020-09-10  8:55 UTC (permalink / raw)
  To: Omkar Maslekar; +Cc: dev, ciara.loftus
On Wed, Sep 09, 2020 at 06:16:54PM -0700, Omkar Maslekar wrote:
> rte_cldemote is similar to a prefetch hint - in reverse. cldemote(addr)
> enables software to hint to hardware that line is likely to be shared.
> Useful in core-to-core communications where cache-line is likely to be
> shared. ARM and PPC implementation is provided with NOP and can be added
> if any equivalent instructions could be used for implementation on those
> architectures.
> 
> Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
> ---
Hi Omkar,
please see some review comments inline below.
Regards,
/Bruce
>  doc/guides/rel_notes/release_20_11.rst        | 26 ++++----------------------
>  lib/librte_eal/arm/include/rte_prefetch_32.h  |  5 +++++
>  lib/librte_eal/arm/include/rte_prefetch_64.h  |  5 +++++
>  lib/librte_eal/include/generic/rte_prefetch.h |  7 +++++++
>  lib/librte_eal/ppc/include/rte_prefetch.h     |  5 +++++
>  lib/librte_eal/x86/include/rte_prefetch.h     |  9 +++++++++
>  6 files changed, 35 insertions(+), 22 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index df227a1..c4a4362 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -27,29 +27,11 @@ New Features
>  .. This section should contain new features added in this release.
>     Sample format:
>  
> -   * **Add a title in the past tense with a full stop.**
> +Added new instruction CLDEMOTE in rte_prefetch.h.
You need to prefix this with the library it is in, in this case EAL. Also,
since this is C code, you are adding a function, not an instruction.
>  
> -     Add a short 1-2 sentence description in the past tense.
> -     The description should be enough to allow someone scanning
> -     the release notes to understand the new feature.
> -
> -     If the feature adds a lot of sub-features you can use a bullet list
> -     like this:
> -
> -     * Added feature foo to do something.
> -     * Enhanced feature bar to do something else.
> -
> -     Refer to the previous release notes for examples.
> -
> -     Suggested order in release notes items:
> -     * Core libs (EAL, mempool, ring, mbuf, buses)
> -     * Device abstraction libs and PMDs
> -       - ethdev (lib, PMDs)
> -       - cryptodev (lib, PMDs)
> -       - eventdev (lib, PMDs)
> -       - etc
> -     * Other libs
> -     * Apps, Examples, Tools (if significant)
Don't remove these lines, they are all also part of the same comment as
below where it says "Do not overwrite or remove it" :-)
> +     Added a hardware hint CLDEMOTE which is similar to prefetch in reverse.
> +     CLDEMOTES moves the cache line to the last shared cache, where it expects
> +     sharing to be efficient.
>  
Reading the instruction description in the Intel instruction set reference,
it talks about moving the cache line to a more remote cache, rather
than guaranteeing that it goes to the last level cache. Therefore, to
ensure compatibility with the current spec and make it more flexible to meet
any other hardware implementations, I suggest changing the "last shared
cache ..." to "more remote cache where sharing may be more efficient".
>       This section is a comment. Do not overwrite or remove it.
>       Also, make sure to start the actual text at the margin.
> diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h b/lib/librte_eal/arm/include/rte_prefetch_32.h
> index e53420a..ad91edd 100644
> --- a/lib/librte_eal/arm/include/rte_prefetch_32.h
> +++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
> @@ -33,6 +33,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>  	rte_prefetch0(p);
>  }
>  
> +static inline void rte_cldemote(const volatile void *p)
> +{
> +	RTE_SET_USED(p);
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h b/lib/librte_eal/arm/include/rte_prefetch_64.h
> index fc2b391..35d278a 100644
> --- a/lib/librte_eal/arm/include/rte_prefetch_64.h
> +++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
> @@ -32,6 +32,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>  	asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));
>  }
>  
> +static inline void rte_cldemote(const volatile void *p)
> +{
> +	RTE_SET_USED(p);
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/include/generic/rte_prefetch.h b/lib/librte_eal/include/generic/rte_prefetch.h
> index 6e47bdf..89ec69c 100644
> --- a/lib/librte_eal/include/generic/rte_prefetch.h
> +++ b/lib/librte_eal/include/generic/rte_prefetch.h
> @@ -51,4 +51,11 @@
>   */
>  static inline void rte_prefetch_non_temporal(const volatile void *p);
>  
> +/**
> + * Demote a cache line into the last shared cache level.
Same comment as above. Since this will make it into the official API
doxygen documentation, I think a somewhat fuller description would also
be good.
* Re: [dpdk-dev] [PATCH] EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h
  2020-09-10  1:16 ` Omkar Maslekar
  2020-09-10  8:55   ` Bruce Richardson
@ 2020-09-10 22:04   ` David Christensen
  1 sibling, 0 replies; 38+ messages in thread
From: David Christensen @ 2020-09-10 22:04 UTC (permalink / raw)
  To: Omkar Maslekar, dev; +Cc: bruce.richardson, ciara.loftus
On 9/9/20 6:16 PM, Omkar Maslekar wrote:
> rte_cldemote is similar to a prefetch hint - in reverse. cldemote(addr)
> enables software to hint to hardware that line is likely to be shared.
> Useful in core-to-core communications where cache-line is likely to be
> shared. ARM and PPC implementation is provided with NOP and can be added
> if any equivalent instructions could be used for implementation on those
> architectures.
> 
> Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
> ---
...
> diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h b/lib/librte_eal/ppc/include/rte_prefetch.h
> index 9ba07c8..3fe9655 100644
> --- a/lib/librte_eal/ppc/include/rte_prefetch.h
> +++ b/lib/librte_eal/ppc/include/rte_prefetch.h
> @@ -34,6 +34,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>   	rte_prefetch0(p);
>   }
> 
> +static inline void rte_cldemote(const volatile void *p)
> +{
> +	RTE_SET_USED(p);
> +}
> +
>   #ifdef __cplusplus
>   }
>   #endif
For POWER there's an instruction defined in the ISA which is the most 
similar, miso (i.e. Make It So), but the instruction is interpreted as a 
NOP in the POWER8/POWER9 CPUs, so NOP is the right choice for PPC at 
this time.
Dave
* Re: [dpdk-dev] [PATCH] EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h
  2020-09-10  8:55   ` Bruce Richardson
@ 2020-09-10 23:30     ` Maslekar, Omkar
  0 siblings, 0 replies; 38+ messages in thread
From: Maslekar, Omkar @ 2020-09-10 23:30 UTC (permalink / raw)
  To: Richardson, Bruce; +Cc: dev, Loftus, Ciara
Hi Bruce,
 >-----Original Message-----
 >From: Bruce Richardson <bruce.richardson@intel.com>
 >Sent: Thursday, September 10, 2020 1:55 AM
 >To: Maslekar, Omkar <omkar.maslekar@intel.com>
 >Cc: dev@dpdk.org; Loftus, Ciara <ciara.loftus@intel.com>
 >Subject: Re: [PATCH] EAL: An addition of cache line demote (CLDEMOTE) in
 >rte_prefetch.h
 >
 >On Wed, Sep 09, 2020 at 06:16:54PM -0700, Omkar Maslekar wrote:
 >> rte_cldemote is similar to a prefetch hint - in reverse.
 >> cldemote(addr) enables software to hint to hardware that line is likely to be
 >shared.
 >> Useful in core-to-core communications where cache-line is likely to be
 >> shared. ARM and PPC implementation is provided with NOP and can be
 >> added if any equivalent instructions could be used for implementation
 >> on those architectures.
 >>
 >> Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
 >> ---
 >
 >Hi Omkar,
 >
 >please see some review comments inline below.
 >
 >Regards,
 >/Bruce
 >
 >>  doc/guides/rel_notes/release_20_11.rst        | 26 ++++----------------------
 >>  lib/librte_eal/arm/include/rte_prefetch_32.h  |  5 +++++
 >> lib/librte_eal/arm/include/rte_prefetch_64.h  |  5 +++++
 >> lib/librte_eal/include/generic/rte_prefetch.h |  7 +++++++
 >>  lib/librte_eal/ppc/include/rte_prefetch.h     |  5 +++++
 >>  lib/librte_eal/x86/include/rte_prefetch.h     |  9 +++++++++
 >>  6 files changed, 35 insertions(+), 22 deletions(-)
 >>
 >> diff --git a/doc/guides/rel_notes/release_20_11.rst
 >> b/doc/guides/rel_notes/release_20_11.rst
 >> index df227a1..c4a4362 100644
 >> --- a/doc/guides/rel_notes/release_20_11.rst
 >> +++ b/doc/guides/rel_notes/release_20_11.rst
 >> @@ -27,29 +27,11 @@ New Features
 >>  .. This section should contain new features added in this release.
 >>     Sample format:
 >>
 >> -   * **Add a title in the past tense with a full stop.**
 >> +Added new instruction CLDEMOTE in rte_prefetch.h.
 >
 >You need to prefix this with the library it is in, in this case EAL. Also, since this
 >is C code, you are adding a function, not an instruction.
[I will fix these release notes] 
 >
 >>
 >> -     Add a short 1-2 sentence description in the past tense.
 >> -     The description should be enough to allow someone scanning
 >> -     the release notes to understand the new feature.
 >> -
 >> -     If the feature adds a lot of sub-features you can use a bullet list
 >> -     like this:
 >> -
 >> -     * Added feature foo to do something.
 >> -     * Enhanced feature bar to do something else.
 >> -
 >> -     Refer to the previous release notes for examples.
 >> -
 >> -     Suggested order in release notes items:
 >> -     * Core libs (EAL, mempool, ring, mbuf, buses)
 >> -     * Device abstraction libs and PMDs
 >> -       - ethdev (lib, PMDs)
 >> -       - cryptodev (lib, PMDs)
 >> -       - eventdev (lib, PMDs)
 >> -       - etc
 >> -     * Other libs
 >> -     * Apps, Examples, Tools (if significant)
 >
 >Don't remove these lines, they are all also part of the same comment as
 >below where it says "Do not overwrite or remove it" :-)
[I will restore the original comment and add the new entry in the appropriate place]
 >
 >> +     Added a hardware hint CLDEMOTE which is similar to prefetch in
 >reverse.
 >> +     CLDEMOTES moves the cache line to the last shared cache, where it
 >expects
 >> +     sharing to be efficient.
 >>
 >
 >Reading the instruction description in the Intel instruction set reference, it
 >says about moving the cache line to a more remote cache-line, rather than
 >guaranteeing that it goes to the last level cache. Therefore, to ensure
 >compatiblity with the current spec and make it more flexible to meet any
 >other hardware implementations, I suggest changing the "last shared cache
 >..." to "more remote cache where sharing may be more efficient".
[I will make these changes as suggested and make sure the wording is in sync with the Software Developer's Manual]
 >
 >>       This section is a comment. Do not overwrite or remove it.
 >>       Also, make sure to start the actual text at the margin.
 >> diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h
 >> b/lib/librte_eal/arm/include/rte_prefetch_32.h
 >> index e53420a..ad91edd 100644
 >> --- a/lib/librte_eal/arm/include/rte_prefetch_32.h
 >> +++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
 >> @@ -33,6 +33,11 @@ static inline void rte_prefetch_non_temporal(const
 >volatile void *p)
 >>  	rte_prefetch0(p);
 >>  }
 >>
 >> +static inline void rte_cldemote(const volatile void *p) {
 >> +	RTE_SET_USED(p);
 >> +}
 >> +
 >>  #ifdef __cplusplus
 >>  }
 >>  #endif
 >> diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h
 >> b/lib/librte_eal/arm/include/rte_prefetch_64.h
 >> index fc2b391..35d278a 100644
 >> --- a/lib/librte_eal/arm/include/rte_prefetch_64.h
 >> +++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
 >> @@ -32,6 +32,11 @@ static inline void rte_prefetch_non_temporal(const
 >volatile void *p)
 >>  	asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));  }
 >>
 >> +static inline void rte_cldemote(const volatile void *p) {
 >> +	RTE_SET_USED(p);
 >> +}
 >> +
 >>  #ifdef __cplusplus
 >>  }
 >>  #endif
 >> diff --git a/lib/librte_eal/include/generic/rte_prefetch.h
 >> b/lib/librte_eal/include/generic/rte_prefetch.h
 >> index 6e47bdf..89ec69c 100644
 >> --- a/lib/librte_eal/include/generic/rte_prefetch.h
 >> +++ b/lib/librte_eal/include/generic/rte_prefetch.h
 >> @@ -51,4 +51,11 @@
 >>   */
 >>  static inline void rte_prefetch_non_temporal(const volatile void *p);
 >>
 >> +/**
 >> + * Demote a cache line into the last shared cache level.
 >
 >Same comment as above. Since this will make it into the official API doxygen
 >documentation, I think a bit fuller of a description would be good also.
[I will add more documentation] 
* [dpdk-dev] [PATCH v2] EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h
  2020-09-10  1:16 [dpdk-dev] [PATCH] EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h Omkar Maslekar
  2020-09-10  1:16 ` Omkar Maslekar
@ 2020-09-11 16:51 ` Omkar Maslekar
  2020-09-11 16:51   ` Omkar Maslekar
  2020-09-11 21:22 ` [dpdk-dev] [PATCH v3] " Omkar Maslekar
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Omkar Maslekar @ 2020-09-11 16:51 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, ciara.loftus, omkar.maslekar
We are including this in rte_prefetch.h since it is the most closely
related code location. rte_cldemote is similar to a prefetch hint, in reverse.
Omkar Maslekar (1):
  EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h
 doc/guides/rel_notes/release_20_11.rst        |  8 +++++++-
 lib/librte_eal/arm/include/rte_prefetch_32.h  |  5 +++++
 lib/librte_eal/arm/include/rte_prefetch_64.h  |  5 +++++
 lib/librte_eal/include/generic/rte_prefetch.h | 13 +++++++++++++
 lib/librte_eal/ppc/include/rte_prefetch.h     |  5 +++++
 lib/librte_eal/x86/include/rte_prefetch.h     |  9 +++++++++
 6 files changed, 44 insertions(+), 1 deletion(-)
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2] EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h
  2020-09-11 16:51 ` [dpdk-dev] [PATCH v2] " Omkar Maslekar
@ 2020-09-11 16:51   ` Omkar Maslekar
  0 siblings, 0 replies; 38+ messages in thread
From: Omkar Maslekar @ 2020-09-11 16:51 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, ciara.loftus, omkar.maslekar
rte_cldemote is similar to a prefetch hint, in reverse. cldemote(addr)
lets software hint to hardware that a cache line is likely to be shared,
which is useful in core-to-core communication. The ARM and PPC
implementations are provided as NOPs; real implementations can be added
if equivalent instructions become available on those architectures.
Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
---
v2: documentation updated
---
---
 doc/guides/rel_notes/release_20_11.rst        |  8 +++++++-
 lib/librte_eal/arm/include/rte_prefetch_32.h  |  5 +++++
 lib/librte_eal/arm/include/rte_prefetch_64.h  |  5 +++++
 lib/librte_eal/include/generic/rte_prefetch.h | 13 +++++++++++++
 lib/librte_eal/ppc/include/rte_prefetch.h     |  5 +++++
 lib/librte_eal/x86/include/rte_prefetch.h     |  9 +++++++++
 6 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index df227a1..b86e9e0 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -26,7 +26,7 @@ New Features
 
 .. This section should contain new features added in this release.
    Sample format:
-
+   
    * **Add a title in the past tense with a full stop.**
 
      Add a short 1-2 sentence description in the past tense.
@@ -55,6 +55,12 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+     EAL: Added new function rte_cldemote in rte_prefetch.h.
+
+     Added a hardware hint CLDEMOTE, which is similar to prefetch in reverse.
+     CLDEMOTE moves the cache line to a more remote cache, where it expects
+     sharing to be efficient. Moving the cache line to a level more distant from
+     the processor helps to accelerate core-to-core communication.
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h b/lib/librte_eal/arm/include/rte_prefetch_32.h
index e53420a..ad91edd 100644
--- a/lib/librte_eal/arm/include/rte_prefetch_32.h
+++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
@@ -33,6 +33,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	rte_prefetch0(p);
 }
 
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h b/lib/librte_eal/arm/include/rte_prefetch_64.h
index fc2b391..35d278a 100644
--- a/lib/librte_eal/arm/include/rte_prefetch_64.h
+++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
@@ -32,6 +32,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));
 }
 
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/include/generic/rte_prefetch.h b/lib/librte_eal/include/generic/rte_prefetch.h
index 6e47bdf..8742412 100644
--- a/lib/librte_eal/include/generic/rte_prefetch.h
+++ b/lib/librte_eal/include/generic/rte_prefetch.h
@@ -51,4 +51,17 @@
  */
 static inline void rte_prefetch_non_temporal(const volatile void *p);
 
+/**
+ * Demote a cache line to a more distant level of cache from the processor.
+ *
+ * CLDEMOTE hints to hardware to move (demote) a cache line from the cache
+ * closest to the processor to a level more distant from the processor. It is
+ * a hint and not a guarantee. rte_cldemote is intended to speed up things at
+ * the producer, in the producer-consumer case.
+ *
+ * @param p
+ *   Address to demote
+ */
+static inline void rte_cldemote(const volatile void *p);
+
 #endif /* _RTE_PREFETCH_H_ */
diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h b/lib/librte_eal/ppc/include/rte_prefetch.h
index 9ba07c8..3fe9655 100644
--- a/lib/librte_eal/ppc/include/rte_prefetch.h
+++ b/lib/librte_eal/ppc/include/rte_prefetch.h
@@ -34,6 +34,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	rte_prefetch0(p);
 }
 
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/x86/include/rte_prefetch.h b/lib/librte_eal/x86/include/rte_prefetch.h
index 384c6b3..029d06e 100644
--- a/lib/librte_eal/x86/include/rte_prefetch.h
+++ b/lib/librte_eal/x86/include/rte_prefetch.h
@@ -32,6 +32,15 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	asm volatile ("prefetchnta %[p]" : : [p] "m" (*(const volatile char *)p));
 }
 
+/*
+ * we're using raw byte codes for now as only the newest compiler
+ * versions support this instruction natively.
+ */
+static inline void rte_cldemote(const volatile void *p)
+{
+	asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.8.3.1
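
The doxygen text in this version describes demoting the line that holds a
single address. For a buffer spanning several lines, a caller would need one
call per line. A rough sketch follows, under stated assumptions: a 64-byte
cache line, and hypothetical helpers demo_cldemote(), demo_lines_spanned()
and demo_cldemote_range() that are not part of this patch or of DPDK.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE 64 /* assumed cache line size */

/* Hypothetical stand-in for rte_cldemote(); no-op outside x86. */
static inline void demo_cldemote(const volatile void *p)
{
#if defined(__x86_64__) || defined(__i386__)
	/* Same raw encoding as the x86 header in the patch. */
	__asm__ volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
#else
	(void)p;
#endif
}

/* Number of cache lines touched by the byte range [addr, addr + len). */
static inline size_t demo_lines_spanned(uintptr_t addr, size_t len)
{
	const uintptr_t mask = ~(uintptr_t)(DEMO_CACHE_LINE - 1);
	uintptr_t first, last;

	if (len == 0)
		return 0;
	first = addr & mask;             /* line holding the first byte */
	last = (addr + len - 1) & mask;  /* line holding the last byte */
	return (size_t)((last - first) / DEMO_CACHE_LINE) + 1;
}

/* Issue one demote hint per line of the buffer; returns the hint count. */
static inline size_t demo_cldemote_range(const volatile void *buf, size_t len)
{
	uintptr_t line = (uintptr_t)buf & ~(uintptr_t)(DEMO_CACHE_LINE - 1);
	size_t n = demo_lines_spanned((uintptr_t)buf, len);
	size_t i;

	for (i = 0; i < n; i++, line += DEMO_CACHE_LINE)
		demo_cldemote((const volatile void *)line);
	return n;
}
```

Since each call is only a hint, whether demoting a whole buffer pays off is
something to measure on the target hardware rather than assume.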
* [dpdk-dev] [PATCH v3] EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h
  2020-09-10  1:16 [dpdk-dev] [PATCH] EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h Omkar Maslekar
  2020-09-10  1:16 ` Omkar Maslekar
  2020-09-11 16:51 ` [dpdk-dev] [PATCH v2] " Omkar Maslekar
@ 2020-09-11 21:22 ` Omkar Maslekar
  2020-09-11 21:22   ` Omkar Maslekar
  2020-09-22  1:59 ` [dpdk-dev] [PATCH v4] eal: add cache-line demote support Omkar Maslekar
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Omkar Maslekar @ 2020-09-11 21:22 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, ciara.loftus, omkar.maslekar
We are including this in rte_prefetch.h since it is the most closely
related code location. rte_cldemote is similar to a prefetch hint, in reverse.
Omkar Maslekar (1):
  EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h
 doc/guides/rel_notes/release_20_11.rst        |  6 ++++++
 lib/librte_eal/arm/include/rte_prefetch_32.h  |  5 +++++
 lib/librte_eal/arm/include/rte_prefetch_64.h  |  5 +++++
 lib/librte_eal/include/generic/rte_prefetch.h | 13 +++++++++++++
 lib/librte_eal/ppc/include/rte_prefetch.h     |  5 +++++
 lib/librte_eal/x86/include/rte_prefetch.h     |  9 +++++++++
 6 files changed, 43 insertions(+)
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3] EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h
  2020-09-11 21:22 ` [dpdk-dev] [PATCH v3] " Omkar Maslekar
@ 2020-09-11 21:22   ` Omkar Maslekar
  0 siblings, 0 replies; 38+ messages in thread
From: Omkar Maslekar @ 2020-09-11 21:22 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, ciara.loftus, omkar.maslekar
rte_cldemote is similar to a prefetch hint, in reverse. cldemote(addr)
lets software hint to hardware that a cache line is likely to be shared,
which is useful in core-to-core communication. The ARM and PPC
implementations are provided as NOPs; real implementations can be added
if equivalent instructions become available on those architectures.
Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
---
v3: fixed warning regarding whitespace
*
v2: documentation updated
---
---
 doc/guides/rel_notes/release_20_11.rst        |  6 ++++++
 lib/librte_eal/arm/include/rte_prefetch_32.h  |  5 +++++
 lib/librte_eal/arm/include/rte_prefetch_64.h  |  5 +++++
 lib/librte_eal/include/generic/rte_prefetch.h | 13 +++++++++++++
 lib/librte_eal/ppc/include/rte_prefetch.h     |  5 +++++
 lib/librte_eal/x86/include/rte_prefetch.h     |  9 +++++++++
 6 files changed, 43 insertions(+)
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index df227a1..248f2a8 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -55,6 +55,12 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+     EAL: Added new function rte_cldemote in rte_prefetch.h.
+
+     Added a hardware hint CLDEMOTE, which is similar to prefetch in reverse.
+     CLDEMOTE moves the cache line to a more remote cache, where it expects
+     sharing to be efficient. Moving the cache line to a level more distant from
+     the processor helps to accelerate core-to-core communication.
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h b/lib/librte_eal/arm/include/rte_prefetch_32.h
index e53420a..ad91edd 100644
--- a/lib/librte_eal/arm/include/rte_prefetch_32.h
+++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
@@ -33,6 +33,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	rte_prefetch0(p);
 }
 
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h b/lib/librte_eal/arm/include/rte_prefetch_64.h
index fc2b391..35d278a 100644
--- a/lib/librte_eal/arm/include/rte_prefetch_64.h
+++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
@@ -32,6 +32,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));
 }
 
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/include/generic/rte_prefetch.h b/lib/librte_eal/include/generic/rte_prefetch.h
index 6e47bdf..8742412 100644
--- a/lib/librte_eal/include/generic/rte_prefetch.h
+++ b/lib/librte_eal/include/generic/rte_prefetch.h
@@ -51,4 +51,17 @@
  */
 static inline void rte_prefetch_non_temporal(const volatile void *p);
 
+/**
+ * Demote a cache line to a more distant level of cache from the processor.
+ *
+ * CLDEMOTE hints to hardware to move (demote) a cache line from the cache
+ * closest to the processor to a level more distant from the processor. It is
+ * a hint and not a guarantee. rte_cldemote is intended to speed up things at
+ * the producer, in the producer-consumer case.
+ *
+ * @param p
+ *   Address to demote
+ */
+static inline void rte_cldemote(const volatile void *p);
+
 #endif /* _RTE_PREFETCH_H_ */
diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h b/lib/librte_eal/ppc/include/rte_prefetch.h
index 9ba07c8..3fe9655 100644
--- a/lib/librte_eal/ppc/include/rte_prefetch.h
+++ b/lib/librte_eal/ppc/include/rte_prefetch.h
@@ -34,6 +34,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	rte_prefetch0(p);
 }
 
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/x86/include/rte_prefetch.h b/lib/librte_eal/x86/include/rte_prefetch.h
index 384c6b3..029d06e 100644
--- a/lib/librte_eal/x86/include/rte_prefetch.h
+++ b/lib/librte_eal/x86/include/rte_prefetch.h
@@ -32,6 +32,15 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	asm volatile ("prefetchnta %[p]" : : [p] "m" (*(const volatile char *)p));
 }
 
+/*
+ * we're using raw byte codes for now as only the newest compiler
+ * versions support this instruction natively.
+ */
+static inline void rte_cldemote(const volatile void *p)
+{
+	asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 38+ messages in thread
* [dpdk-dev] [PATCH v4] eal: add cache-line demote support
  2020-09-10  1:16 [dpdk-dev] [PATCH] EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h Omkar Maslekar
                   ` (2 preceding siblings ...)
  2020-09-11 21:22 ` [dpdk-dev] [PATCH v3] " Omkar Maslekar
@ 2020-09-22  1:59 ` Omkar Maslekar
  2020-09-22  1:59   ` Omkar Maslekar
  2020-10-01  0:28 ` [dpdk-dev] [PATCH v5] " Omkar Maslekar
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Omkar Maslekar @ 2020-09-22  1:59 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, ciara.loftus, omkar.maslekar
We are including this in rte_prefetch.h since it is the most closely
related code location. rte_cldemote is similar to a prefetch hint, in reverse.
Omkar Maslekar (1):
  EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h
 doc/guides/rel_notes/release_20_11.rst        |  6 ++++++
 lib/librte_eal/arm/include/rte_prefetch_32.h  |  5 +++++
 lib/librte_eal/arm/include/rte_prefetch_64.h  |  5 +++++
 lib/librte_eal/include/generic/rte_prefetch.h | 13 +++++++++++++
 lib/librte_eal/ppc/include/rte_prefetch.h     |  5 +++++
 lib/librte_eal/x86/include/rte_prefetch.h     |  9 +++++++++
 6 files changed, 43 insertions(+)
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 38+ messages in thread
* [dpdk-dev] [PATCH v4] eal: add cache-line demote support
  2020-09-22  1:59 ` [dpdk-dev] [PATCH v4] eal: add cache-line demote support Omkar Maslekar
@ 2020-09-22  1:59   ` Omkar Maslekar
  2020-09-22  8:28     ` Bruce Richardson
  0 siblings, 1 reply; 38+ messages in thread
From: Omkar Maslekar @ 2020-09-22  1:59 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, ciara.loftus, omkar.maslekar
rte_cldemote is similar to a prefetch hint, in reverse. cldemote(addr)
lets software hint to hardware that a cache line is likely to be shared.
It is useful in core-to-core communication, where a cache line is likely
to be shared. The ARM and PPC implementations are provided as no-ops; real
implementations can be added later if those architectures have equivalent
instructions that could be used.
Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
---
v4: updated bold text for title and fixed margin in release notes
*
v3: fixed warning regarding whitespace
*
v2: documentation updated
---
---
 doc/guides/rel_notes/release_20_11.rst        |  6 ++++++
 lib/librte_eal/arm/include/rte_prefetch_32.h  |  5 +++++
 lib/librte_eal/arm/include/rte_prefetch_64.h  |  5 +++++
 lib/librte_eal/include/generic/rte_prefetch.h | 13 +++++++++++++
 lib/librte_eal/ppc/include/rte_prefetch.h     |  5 +++++
 lib/librte_eal/x86/include/rte_prefetch.h     |  9 +++++++++
 6 files changed, 43 insertions(+)
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index df227a1..b844b96 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -55,6 +55,12 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added new function rte_cldemote in rte_prefetch.h.**
+
+  Added a hardware hint CLDEMOTE, which is similar to prefetch in reverse.
+  CLDEMOTE moves the cache line to the more remote cache, where it expects
+  sharing to be efficient. Moving the cache line to a level more distant from
+  the processor helps to accelerate core-to-core communication.
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h b/lib/librte_eal/arm/include/rte_prefetch_32.h
index e53420a..ad91edd 100644
--- a/lib/librte_eal/arm/include/rte_prefetch_32.h
+++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
@@ -33,6 +33,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	rte_prefetch0(p);
 }
 
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h b/lib/librte_eal/arm/include/rte_prefetch_64.h
index fc2b391..35d278a 100644
--- a/lib/librte_eal/arm/include/rte_prefetch_64.h
+++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
@@ -32,6 +32,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));
 }
 
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/include/generic/rte_prefetch.h b/lib/librte_eal/include/generic/rte_prefetch.h
index 6e47bdf..8742412 100644
--- a/lib/librte_eal/include/generic/rte_prefetch.h
+++ b/lib/librte_eal/include/generic/rte_prefetch.h
@@ -51,4 +51,17 @@
  */
 static inline void rte_prefetch_non_temporal(const volatile void *p);
 
+/**
+ * Demote a cache line to a more distant level of cache from the processor.
+ *
+ * CLDEMOTE hints to hardware to move (demote) a cache line from the closest to
+ * the processor to a level more distant from the processor. It is a hint and
+ * not guarantee. rte_cldemote is intended to speed up things at the producer,
+ * in the producer-consumer case.
+ *
+ * @param p
+ *   Address to demote
+ */
+static inline void rte_cldemote(const volatile void *p);
+
 #endif /* _RTE_PREFETCH_H_ */
diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h b/lib/librte_eal/ppc/include/rte_prefetch.h
index 9ba07c8..3fe9655 100644
--- a/lib/librte_eal/ppc/include/rte_prefetch.h
+++ b/lib/librte_eal/ppc/include/rte_prefetch.h
@@ -34,6 +34,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	rte_prefetch0(p);
 }
 
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/x86/include/rte_prefetch.h b/lib/librte_eal/x86/include/rte_prefetch.h
index 384c6b3..029d06e 100644
--- a/lib/librte_eal/x86/include/rte_prefetch.h
+++ b/lib/librte_eal/x86/include/rte_prefetch.h
@@ -32,6 +32,15 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	asm volatile ("prefetchnta %[p]" : : [p] "m" (*(const volatile char *)p));
 }
 
+/*
+ * we're using raw byte codes for now as only the newest compiler
+ * versions support this instruction natively.
+ */
+static inline void rte_cldemote(const volatile void *p)
+{
+	asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [dpdk-dev] [PATCH v4] eal: add cache-line demote support
  2020-09-22  1:59   ` Omkar Maslekar
@ 2020-09-22  8:28     ` Bruce Richardson
  2020-09-22 21:53       ` Maslekar, Omkar
  0 siblings, 1 reply; 38+ messages in thread
From: Bruce Richardson @ 2020-09-22  8:28 UTC (permalink / raw)
  To: Omkar Maslekar; +Cc: dev, ciara.loftus
On Mon, Sep 21, 2020 at 06:59:27PM -0700, Omkar Maslekar wrote:
> rte_cldemote is similar to a prefetch hint - in reverse. cldemote(addr)
> enables software to hint to hardware that line is likely to be shared.
> Useful in core-to-core communications where cache-line is likely to be
> shared. ARM and PPC implementation is provided with NOP and can be added
> if any equivalent instructions could be used for implementation on those
> architectures.
> 
> Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
>
A few minor suggestions below. With those fixed, feel free to add my ack to
future versions of this patch.
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
 
> ---
> v4: updated bold text for title and fixed margin in release notes
> *
> v3: fixed warning regarding whitespace
> *
> v2: documentation updated
> ---
> ---
>  doc/guides/rel_notes/release_20_11.rst        |  6 ++++++
>  lib/librte_eal/arm/include/rte_prefetch_32.h  |  5 +++++
>  lib/librte_eal/arm/include/rte_prefetch_64.h  |  5 +++++
>  lib/librte_eal/include/generic/rte_prefetch.h | 13 +++++++++++++
>  lib/librte_eal/ppc/include/rte_prefetch.h     |  5 +++++
>  lib/librte_eal/x86/include/rte_prefetch.h     |  9 +++++++++
>  6 files changed, 43 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index df227a1..b844b96 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -55,6 +55,12 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =======================================================
>  
> +* **Added new function rte_cldemote in rte_prefetch.h.**
> +
> +  Added a hardware hint CLDEMOTE, which is similar to prefetch in reverse.
> +  CLDEMOTE moves the cache line to the more remote cache, where it expects
> +  sharing to be efficient. Moving the cache line to a level more distant from
> +  the processor helps to accelerate core-to-core communication.
>  
I think you need two blank lines between sections here, not just one.
>  Removed Items
>  -------------
> diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h b/lib/librte_eal/arm/include/rte_prefetch_32.h
> index e53420a..ad91edd 100644
> --- a/lib/librte_eal/arm/include/rte_prefetch_32.h
> +++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
> @@ -33,6 +33,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>  	rte_prefetch0(p);
>  }
>  
> +static inline void rte_cldemote(const volatile void *p)
> +{
> +	RTE_SET_USED(p);
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h b/lib/librte_eal/arm/include/rte_prefetch_64.h
> index fc2b391..35d278a 100644
> --- a/lib/librte_eal/arm/include/rte_prefetch_64.h
> +++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
> @@ -32,6 +32,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>  	asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));
>  }
>  
> +static inline void rte_cldemote(const volatile void *p)
> +{
> +	RTE_SET_USED(p);
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/include/generic/rte_prefetch.h b/lib/librte_eal/include/generic/rte_prefetch.h
> index 6e47bdf..8742412 100644
> --- a/lib/librte_eal/include/generic/rte_prefetch.h
> +++ b/lib/librte_eal/include/generic/rte_prefetch.h
> @@ -51,4 +51,17 @@
>   */
>  static inline void rte_prefetch_non_temporal(const volatile void *p);
>  
> +/**
> + * Demote a cache line to a more distant level of cache from the processor.
> + *
> + * CLDEMOTE hints to hardware to move (demote) a cache line from the closest to
> + * the processor to a level more distant from the processor. It is a hint and
> + * not guarantee. rte_cldemote is intended to speed up things at the producer,
> + * in the producer-consumer case.
> + *
Two thoughts here:
1. Is it not the consumer who benefits more, since they are the ones
receiving the demoted value, while the producer pays a higher cost since
they have to demote the value on send?
2. Rather than talking about the producer-consumer case specifically, I think
it would be good to replace the last sentence with what you have in the
cover letter about it being for sharing, and to indicate that a line may be
accessed by a different core in the future.
> + * @param p
> + *   Address to demote
> + */
> +static inline void rte_cldemote(const volatile void *p);
> +
>  #endif /* _RTE_PREFETCH_H_ */
> diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h b/lib/librte_eal/ppc/include/rte_prefetch.h
> index 9ba07c8..3fe9655 100644
> --- a/lib/librte_eal/ppc/include/rte_prefetch.h
> +++ b/lib/librte_eal/ppc/include/rte_prefetch.h
> @@ -34,6 +34,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>  	rte_prefetch0(p);
>  }
>  
> +static inline void rte_cldemote(const volatile void *p)
> +{
> +	RTE_SET_USED(p);
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/x86/include/rte_prefetch.h b/lib/librte_eal/x86/include/rte_prefetch.h
> index 384c6b3..029d06e 100644
> --- a/lib/librte_eal/x86/include/rte_prefetch.h
> +++ b/lib/librte_eal/x86/include/rte_prefetch.h
> @@ -32,6 +32,15 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>  	asm volatile ("prefetchnta %[p]" : : [p] "m" (*(const volatile char *)p));
>  }
>  
> +/*
> + * we're using raw byte codes for now as only the newest compiler
> + * versions support this instruction natively.
> + */
> +static inline void rte_cldemote(const volatile void *p)
> +{
> +	asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> -- 
> 1.8.3.1
> 
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [dpdk-dev] [PATCH v4] eal: add cache-line demote support
  2020-09-22  8:28     ` Bruce Richardson
@ 2020-09-22 21:53       ` Maslekar, Omkar
  0 siblings, 0 replies; 38+ messages in thread
From: Maslekar, Omkar @ 2020-09-22 21:53 UTC (permalink / raw)
  To: Richardson, Bruce; +Cc: dev, Loftus, Ciara
Hi Bruce,
My comments are inline
 >-----Original Message-----
 >From: Bruce Richardson <bruce.richardson@intel.com>
 >Sent: Tuesday, September 22, 2020 1:28 AM
 >To: Maslekar, Omkar <omkar.maslekar@intel.com>
 >Cc: dev@dpdk.org; Loftus, Ciara <ciara.loftus@intel.com>
 >Subject: Re: [PATCH v4] eal: add cache-line demote support
 >
 >On Mon, Sep 21, 2020 at 06:59:27PM -0700, Omkar Maslekar wrote:
 >> rte_cldemote is similar to a prefetch hint - in reverse.
 >> cldemote(addr) enables software to hint to hardware that line is likely to be
 >shared.
 >> Useful in core-to-core communications where cache-line is likely to be
 >> shared. ARM and PPC implementation is provided with NOP and can be
 >> added if any equivalent instructions could be used for implementation
 >> on those architectures.
 >>
 >> Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
 >>
 >Few minor suggestions below. With those fixed, feel free to add my ack to
 >future versions of this patch.
 >
 >Acked-by: Bruce Richardson <bruce.richardson@intel.com>
 >
 >> ---
 >> v4: updated bold text for title and fixed margin in release notes
 >> *
 >> v3: fixed warning regarding whitespace
 >> *
 >> v2: documentation updated
 >> ---
 >> ---
 >>  doc/guides/rel_notes/release_20_11.rst        |  6 ++++++
 >>  lib/librte_eal/arm/include/rte_prefetch_32.h  |  5 +++++
 >> lib/librte_eal/arm/include/rte_prefetch_64.h  |  5 +++++
 >> lib/librte_eal/include/generic/rte_prefetch.h | 13 +++++++++++++
 >>  lib/librte_eal/ppc/include/rte_prefetch.h     |  5 +++++
 >>  lib/librte_eal/x86/include/rte_prefetch.h     |  9 +++++++++
 >>  6 files changed, 43 insertions(+)
 >>
 >> diff --git a/doc/guides/rel_notes/release_20_11.rst
 >> b/doc/guides/rel_notes/release_20_11.rst
 >> index df227a1..b844b96 100644
 >> --- a/doc/guides/rel_notes/release_20_11.rst
 >> +++ b/doc/guides/rel_notes/release_20_11.rst
 >> @@ -55,6 +55,12 @@ New Features
 >>       Also, make sure to start the actual text at the margin.
 >>       =======================================================
 >>
 >> +* **Added new function rte_cldemote in rte_prefetch.h.**
 >> +
 >> +  Added a hardware hint CLDEMOTE, which is similar to prefetch in
 >reverse.
 >> +  CLDEMOTE moves the cache line to the more remote cache, where it
 >> + expects  sharing to be efficient. Moving the cache line to a level
 >> + more distant from  the processor helps to accelerate core-to-core
 >communication.
 >>
 >
 >I think you need two blank lines between sections here, not just one.
[OM] You are right, I will fix this in v5.
 >
 >>  Removed Items
 >>  -------------
 >> diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h
 >> b/lib/librte_eal/arm/include/rte_prefetch_32.h
 >> index e53420a..ad91edd 100644
 >> --- a/lib/librte_eal/arm/include/rte_prefetch_32.h
 >> +++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
 >> @@ -33,6 +33,11 @@ static inline void rte_prefetch_non_temporal(const
 >volatile void *p)
 >>  	rte_prefetch0(p);
 >>  }
 >>
 >> +static inline void rte_cldemote(const volatile void *p) {
 >> +	RTE_SET_USED(p);
 >> +}
 >> +
 >>  #ifdef __cplusplus
 >>  }
 >>  #endif
 >> diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h
 >> b/lib/librte_eal/arm/include/rte_prefetch_64.h
 >> index fc2b391..35d278a 100644
 >> --- a/lib/librte_eal/arm/include/rte_prefetch_64.h
 >> +++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
 >> @@ -32,6 +32,11 @@ static inline void rte_prefetch_non_temporal(const
 >volatile void *p)
 >>  	asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));  }
 >>
 >> +static inline void rte_cldemote(const volatile void *p) {
 >> +	RTE_SET_USED(p);
 >> +}
 >> +
 >>  #ifdef __cplusplus
 >>  }
 >>  #endif
 >> diff --git a/lib/librte_eal/include/generic/rte_prefetch.h
 >> b/lib/librte_eal/include/generic/rte_prefetch.h
 >> index 6e47bdf..8742412 100644
 >> --- a/lib/librte_eal/include/generic/rte_prefetch.h
 >> +++ b/lib/librte_eal/include/generic/rte_prefetch.h
 >> @@ -51,4 +51,17 @@
 >>   */
 >>  static inline void rte_prefetch_non_temporal(const volatile void *p);
 >>
 >> +/**
 >> + * Demote a cache line to a more distant level of cache from the
 >processor.
 >> + *
 >> + * CLDEMOTE hints to hardware to move (demote) a cache line from the
 >> +closest to
 >> + * the processor to a level more distant from the processor. It is a
 >> +hint and
 >> + * not guarantee. rte_cldemote is intended to speed up things at the
 >> +producer,
 >> + * in the producer-consumer case.
 >> + *
 >
 >Two thoughts here:
 >1. Is it not more the consumer who benefits more since they are the ones
 >receiving the demoted value, while the producer pays a higher cost since
 >they have to demote the value on send?
[OM] CLDEMOTE benefits the consumer. My statement "speed up things at the producer" was meant to indicate proximity, where the distance is reduced.
But I will make it simpler and more readable.
 >2. Rather than talking about producer consumer case specifically, I think it
 >would be good to replace the last sentence with what you have in the cover
 >letter about it being for sharing, and to indicate that a line may be accessed
 >by a different core in the future.
 
[OM] Good point, there could be many other cores that can benefit instead of just a single consumer. I will update this.
 >
 >> + * @param p
 >> + *   Address to demote
 >> + */
 >> +static inline void rte_cldemote(const volatile void *p);
 >> +
 >>  #endif /* _RTE_PREFETCH_H_ */
 >> diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h
 >> b/lib/librte_eal/ppc/include/rte_prefetch.h
 >> index 9ba07c8..3fe9655 100644
 >> --- a/lib/librte_eal/ppc/include/rte_prefetch.h
 >> +++ b/lib/librte_eal/ppc/include/rte_prefetch.h
 >> @@ -34,6 +34,11 @@ static inline void rte_prefetch_non_temporal(const
 >volatile void *p)
 >>  	rte_prefetch0(p);
 >>  }
 >>
 >> +static inline void rte_cldemote(const volatile void *p) {
 >> +	RTE_SET_USED(p);
 >> +}
 >> +
 >>  #ifdef __cplusplus
 >>  }
 >>  #endif
 >> diff --git a/lib/librte_eal/x86/include/rte_prefetch.h
 >> b/lib/librte_eal/x86/include/rte_prefetch.h
 >> index 384c6b3..029d06e 100644
 >> --- a/lib/librte_eal/x86/include/rte_prefetch.h
 >> +++ b/lib/librte_eal/x86/include/rte_prefetch.h
 >> @@ -32,6 +32,15 @@ static inline void rte_prefetch_non_temporal(const
 >volatile void *p)
 >>  	asm volatile ("prefetchnta %[p]" : : [p] "m" (*(const volatile char
 >> *)p));  }
 >>
 >> +/*
 >> + * we're using raw byte codes for now as only the newest compiler
 >> + * versions support this instruction natively.
 >> + */
 >> +static inline void rte_cldemote(const volatile void *p) {
 >> +	asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p)); }
 >> +
 >>  #ifdef __cplusplus
 >>  }
 >>  #endif
 >> --
 >> 1.8.3.1
 >>
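The sharing use case discussed above (a line written by one core and later read by another) can be sketched as a helper that hints every cache line of a buffer once the writer is done with it. This is only an illustration under stated assumptions: demote_hint() is a hypothetical stand-in for rte_cldemote(), written as a no-op like the ARM/PPC stubs in the patch, demote_range() is not part of the patch, and the 64-byte line size is assumed rather than taken from RTE_CACHE_LINE_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 64u /* assumed cache-line size, not read from DPDK */

/* Hypothetical stand-in for rte_cldemote(); a no-op like the ARM/PPC stubs. */
static inline void demote_hint(const volatile void *p)
{
	(void)p;
}

/*
 * Hint every cache line that overlaps [buf, buf + len).
 * Returns the number of lines hinted. Being a hint, it never changes
 * the contents of the buffer.
 */
static inline size_t demote_range(const volatile void *buf, size_t len)
{
	uintptr_t start, end, line;
	size_t n = 0;

	assert(buf != NULL || len == 0);
	if (len == 0)
		return 0;

	/* Round the start address down to a cache-line boundary. */
	start = (uintptr_t)buf & ~(uintptr_t)(LINE_SIZE - 1);
	end = (uintptr_t)buf + len;

	for (line = start; line < end; line += LINE_SIZE) {
		demote_hint((const volatile void *)line);
		n++;
	}
	return n;
}
```

A writer would typically fill the buffer first and call demote_range(buf, len) afterwards. Since the instruction is only a hint, correctness must never depend on the demotion having taken place.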
^ permalink raw reply	[flat|nested] 38+ messages in thread
* [dpdk-dev] [PATCH v5] eal: add cache-line demote support
  2020-09-10  1:16 [dpdk-dev] [PATCH] EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h Omkar Maslekar
                   ` (3 preceding siblings ...)
  2020-09-22  1:59 ` [dpdk-dev] [PATCH v4] eal: add cache-line demote support Omkar Maslekar
@ 2020-10-01  0:28 ` Omkar Maslekar
  2020-10-01  0:28   ` Omkar Maslekar
  2020-10-12 10:19 ` [dpdk-dev] [PATCH v6] " Omkar Maslekar
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Omkar Maslekar @ 2020-10-01  0:28 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, ciara.loftus, omkar.maslekar
We are including this in rte_prefetch.h since it is the most closely
related code location. rte_cldemote is similar to a prefetch hint, in reverse.
Omkar Maslekar (1):
  eal: add cache-line demote support
 doc/guides/rel_notes/release_20_11.rst        |  7 +++++++
 lib/librte_eal/arm/include/rte_prefetch_32.h  |  5 +++++
 lib/librte_eal/arm/include/rte_prefetch_64.h  |  5 +++++
 lib/librte_eal/include/generic/rte_prefetch.h | 14 ++++++++++++++
 lib/librte_eal/ppc/include/rte_prefetch.h     |  5 +++++
 lib/librte_eal/x86/include/rte_prefetch.h     |  9 +++++++++
 6 files changed, 45 insertions(+)
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 38+ messages in thread
* [dpdk-dev] [PATCH v5] eal: add cache-line demote support
  2020-10-01  0:28 ` [dpdk-dev] [PATCH v5] " Omkar Maslekar
@ 2020-10-01  0:28   ` Omkar Maslekar
  2020-10-08  7:09     ` David Marchand
  2020-10-08 13:12     ` Jerin Jacob
  0 siblings, 2 replies; 38+ messages in thread
From: Omkar Maslekar @ 2020-10-01  0:28 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, ciara.loftus, omkar.maslekar
rte_cldemote is similar to a prefetch hint, in reverse. cldemote(addr)
lets software hint to hardware that a cache line is likely to be shared.
It is useful in core-to-core communication, where a cache line is likely
to be shared. The ARM and PPC implementations are provided as no-ops; real
implementations can be added later if those architectures have equivalent
instructions that could be used.
Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
v5: documentation updated
    fixed formatting issue in release notes
    added Acked-by: Bruce Richardson <bruce.richardson@intel.com>
*
v4: updated bold text for title and fixed margin in release notes
*
v3: fixed warning regarding whitespace
*
v2: documentation updated
---
---
 doc/guides/rel_notes/release_20_11.rst        |  7 +++++++
 lib/librte_eal/arm/include/rte_prefetch_32.h  |  5 +++++
 lib/librte_eal/arm/include/rte_prefetch_64.h  |  5 +++++
 lib/librte_eal/include/generic/rte_prefetch.h | 14 ++++++++++++++
 lib/librte_eal/ppc/include/rte_prefetch.h     |  5 +++++
 lib/librte_eal/x86/include/rte_prefetch.h     |  9 +++++++++
 6 files changed, 45 insertions(+)
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index df227a1..dc402ab 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -55,6 +55,13 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added new function rte_cldemote in rte_prefetch.h.**
+
+  Added a hardware hint CLDEMOTE, which is similar to prefetch in reverse.
+  CLDEMOTE moves the cache line to the more remote cache, where it expects
+  sharing to be efficient. Moving the cache line to a level more distant from
+  the processor helps to accelerate core-to-core communication.
+
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h b/lib/librte_eal/arm/include/rte_prefetch_32.h
index e53420a..ad91edd 100644
--- a/lib/librte_eal/arm/include/rte_prefetch_32.h
+++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
@@ -33,6 +33,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	rte_prefetch0(p);
 }
 
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h b/lib/librte_eal/arm/include/rte_prefetch_64.h
index fc2b391..35d278a 100644
--- a/lib/librte_eal/arm/include/rte_prefetch_64.h
+++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
@@ -32,6 +32,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));
 }
 
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/include/generic/rte_prefetch.h b/lib/librte_eal/include/generic/rte_prefetch.h
index 6e47bdf..5500cd5 100644
--- a/lib/librte_eal/include/generic/rte_prefetch.h
+++ b/lib/librte_eal/include/generic/rte_prefetch.h
@@ -51,4 +51,18 @@
  */
 static inline void rte_prefetch_non_temporal(const volatile void *p);
 
+/**
+ * Demote a cache line to a more distant level of cache from the processor.
+ *
+ * CLDEMOTE hints to hardware to move (demote) a cache line from the closest to
+ * the processor to a level more distant from the processor. It is a hint and
+ * not guarantee. rte_cldemote is intended to move the cache line to the more
+ * remote cache, where it expects sharing to be efficient and to indicate that a
+ * line may be accessed by a different core in the future.
+ *
+ * @param p
+ *   Address to demote
+ */
+static inline void rte_cldemote(const volatile void *p);
+
 #endif /* _RTE_PREFETCH_H_ */
diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h b/lib/librte_eal/ppc/include/rte_prefetch.h
index 9ba07c8..3fe9655 100644
--- a/lib/librte_eal/ppc/include/rte_prefetch.h
+++ b/lib/librte_eal/ppc/include/rte_prefetch.h
@@ -34,6 +34,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	rte_prefetch0(p);
 }
 
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/x86/include/rte_prefetch.h b/lib/librte_eal/x86/include/rte_prefetch.h
index 384c6b3..029d06e 100644
--- a/lib/librte_eal/x86/include/rte_prefetch.h
+++ b/lib/librte_eal/x86/include/rte_prefetch.h
@@ -32,6 +32,15 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	asm volatile ("prefetchnta %[p]" : : [p] "m" (*(const volatile char *)p));
 }
 
+/*
+ * we're using raw byte codes for now as only the newest compiler
+ * versions support this instruction natively.
+ */
+static inline void rte_cldemote(const volatile void *p)
+{
+	asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.8.3.1
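For readers decoding the raw bytes in the x86 hunk: 0x0f 0x1c is the CLDEMOTE opcode (0F 1C /0) and 0x06 is the ModRM byte. Its mod field is 00 and its rm field is 110, which in 32- and 64-bit addressing selects [esi]/[rsi] with no displacement. That matches the "S" constraint, which places p in that register. The reg field 000 is the /0 opcode extension. A minimal sketch of that arithmetic follows; the helper names modrm_decode() and modrm_encode() are hypothetical and not part of DPDK.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an x86 ModRM byte: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0). */
struct modrm {
	uint8_t mod;
	uint8_t reg;
	uint8_t rm;
};

/* Split a ModRM byte into its three fields. */
static inline struct modrm modrm_decode(uint8_t byte)
{
	struct modrm m = {
		.mod = (uint8_t)(byte >> 6),
		.reg = (uint8_t)((byte >> 3) & 0x7),
		.rm = (uint8_t)(byte & 0x7),
	};
	return m;
}

/* Pack three fields back into a ModRM byte. */
static inline uint8_t modrm_encode(uint8_t mod, uint8_t reg, uint8_t rm)
{
	assert(mod < 4 && reg < 8 && rm < 8);
	return (uint8_t)((mod << 6) | (reg << 3) | rm);
}
```

Decoding 0x06 gives mod = 0, reg = 0, rm = 6, and encoding those fields gives 0x06 back. Changing the register constraint in the inline asm would therefore require a different ModRM byte.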
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [dpdk-dev] [PATCH v5] eal: add cache-line demote support
  2020-10-01  0:28   ` Omkar Maslekar
@ 2020-10-08  7:09     ` David Marchand
  2020-10-08  9:02       ` Bruce Richardson
  2020-10-08 13:12     ` Jerin Jacob
  1 sibling, 1 reply; 38+ messages in thread
From: David Marchand @ 2020-10-08  7:09 UTC (permalink / raw)
  To: Omkar Maslekar
  Cc: dev, Bruce Richardson, Ananyev, Konstantin, Ciara Loftus,
	David Christensen, Jerin Jacob Kollanukkaran,
	Honnappa Nagarahalli, Ruifeng Wang (Arm Technology China),
	Jan Viktorin, Thomas Monjalon
On Thu, Oct 1, 2020 at 2:30 AM Omkar Maslekar <omkar.maslekar@intel.com> wrote:
>
> rte_cldemote is similar to a prefetch hint - in reverse. cldemote(addr)
> enables software to hint to hardware that line is likely to be shared.
> Useful in core-to-core communications where cache-line is likely to be
> shared. ARM and PPC implementation is provided with NOP and can be added
> if any equivalent instructions could be used for implementation on those
> architectures.
>
> Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
I find this "rte_cldemote" name too close to the Intel instruction,
but I can see no complaint from other arch maintainers, so I guess
everyone is happy with it.
In any case, this is a new API, so it should be marked experimental.
As for unit tests, not sure there is much to do, maybe rename
test_prefetch.c and call this new API too, wdyt?
-- 
David Marchand
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [dpdk-dev] [PATCH v5] eal: add cache-line demote support
  2020-10-08  7:09     ` David Marchand
@ 2020-10-08  9:02       ` Bruce Richardson
  2020-10-12  9:41         ` David Marchand
  0 siblings, 1 reply; 38+ messages in thread
From: Bruce Richardson @ 2020-10-08  9:02 UTC (permalink / raw)
  To: David Marchand
  Cc: Omkar Maslekar, dev, Ananyev, Konstantin, Ciara Loftus,
	David Christensen, Jerin Jacob Kollanukkaran,
	Honnappa Nagarahalli, Ruifeng Wang (Arm Technology China),
	Jan Viktorin, Thomas Monjalon
On Thu, Oct 08, 2020 at 09:09:52AM +0200, David Marchand wrote:
> On Thu, Oct 1, 2020 at 2:30 AM Omkar Maslekar <omkar.maslekar@intel.com> wrote:
> >
> > rte_cldemote is similar to a prefetch hint - in reverse. cldemote(addr)
> > enables software to hint to hardware that line is likely to be shared.
> > Useful in core-to-core communications where cache-line is likely to be
> > shared. ARM and PPC implementation is provided with NOP and can be added
> > if any equivalent instructions could be used for implementation on those
> > architectures.
> >
> > Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
> > Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> I find this "rte_cldemote" name too close to the Intel instruction,
> but I can see no complaint from other arch maintainers, so I guess
> everyone is happy with it.
It is very close, alright, though the name does also convey fairly well the
likely action performed by the instruction. Is there a suggestion for a
better, more generic name?
> In any case, this is a new API, so it should be marked experimental.
> 
Agreed.
> As for unit tests, not sure there is much to do, maybe rename
> test_prefetch.c and call this new API too, wdyt?
> 
I'm not sure how much value this would provide, but it can be done.
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [dpdk-dev] [PATCH v5] eal: add cache-line demote support
  2020-10-01  0:28   ` Omkar Maslekar
  2020-10-08  7:09     ` David Marchand
@ 2020-10-08 13:12     ` Jerin Jacob
  1 sibling, 0 replies; 38+ messages in thread
From: Jerin Jacob @ 2020-10-08 13:12 UTC (permalink / raw)
  To: Omkar Maslekar; +Cc: dpdk-dev, Richardson, Bruce, ciara.loftus
On Thu, Oct 1, 2020 at 6:00 AM Omkar Maslekar <omkar.maslekar@intel.com> wrote:
>
> rte_cldemote is similar to a prefetch hint - in reverse. cldemote(addr)
> enables software to hint to hardware that line is likely to be shared.
> Useful in core-to-core communications where cache-line is likely to be
> shared. ARM and PPC implementation is provided with NOP and can be added
> if any equivalent instructions could be used for implementation on those
> architectures.
>
> Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
>
> ---
> v5: documentation updated
>     fixed formatting issue in release notes
>     added Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> *
> v4: updated bold text for title and fixed margin in release notes
> *
> v3: fixed warning regarding whitespace
> *
> v2: documentation updated
> ---
> ---
>  doc/guides/rel_notes/release_20_11.rst        |  7 +++++++
>  lib/librte_eal/arm/include/rte_prefetch_32.h  |  5 +++++
>  lib/librte_eal/arm/include/rte_prefetch_64.h  |  5 +++++
>  lib/librte_eal/include/generic/rte_prefetch.h | 14 ++++++++++++++
>  lib/librte_eal/ppc/include/rte_prefetch.h     |  5 +++++
>  lib/librte_eal/x86/include/rte_prefetch.h     |  9 +++++++++
>  6 files changed, 45 insertions(+)
>
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index df227a1..dc402ab 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -55,6 +55,13 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =======================================================
>
> +* **Added new function rte_cldemote in rte_prefetch.h.**
> +
> +  Added a hardware hint CLDEMOTE, which is similar to prefetch in reverse.
> +  CLDEMOTE moves the cache line to the more remote cache, where it expects
> +  sharing to be efficient. Moving the cache line to a level more distant from
> +  the processor helps to accelerate core-to-core communication.
> +
>
>  Removed Items
>  -------------
> diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h b/lib/librte_eal/arm/include/rte_prefetch_32.h
> index e53420a..ad91edd 100644
> --- a/lib/librte_eal/arm/include/rte_prefetch_32.h
> +++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
> @@ -33,6 +33,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>         rte_prefetch0(p);
>  }
>
> +static inline void rte_cldemote(const volatile void *p)
> +{
> +       RTE_SET_USED(p);
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h b/lib/librte_eal/arm/include/rte_prefetch_64.h
> index fc2b391..35d278a 100644
> --- a/lib/librte_eal/arm/include/rte_prefetch_64.h
> +++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
> @@ -32,6 +32,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>         asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));
>  }
>
> +static inline void rte_cldemote(const volatile void *p)
> +{
> +       RTE_SET_USED(p);
> +}
ARM64 does not have this support so NOP is fine for this.
Acked-by: Jerin Jacob <jerinj@marvell.com>
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/include/generic/rte_prefetch.h b/lib/librte_eal/include/generic/rte_prefetch.h
> index 6e47bdf..5500cd5 100644
> --- a/lib/librte_eal/include/generic/rte_prefetch.h
> +++ b/lib/librte_eal/include/generic/rte_prefetch.h
> @@ -51,4 +51,18 @@
>   */
>  static inline void rte_prefetch_non_temporal(const volatile void *p);
>
> +/**
> + * Demote a cache line to a more distant level of cache from the processor.
> + *
> + * CLDEMOTE hints to hardware to move (demote) a cache line from the closest to
> + * the processor to a level more distant from the processor. It is a hint and
> + * not guarantee. rte_cldemote is intended to move the cache line to the more
> + * remote cache, where it expects sharing to be efficient and to indicate that a
> + * line may be accessed by a different core in the future.
> + *
> + * @param p
> + *   Address to demote
> + */
> +static inline void rte_cldemote(const volatile void *p);
> +
>  #endif /* _RTE_PREFETCH_H_ */
> diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h b/lib/librte_eal/ppc/include/rte_prefetch.h
> index 9ba07c8..3fe9655 100644
> --- a/lib/librte_eal/ppc/include/rte_prefetch.h
> +++ b/lib/librte_eal/ppc/include/rte_prefetch.h
> @@ -34,6 +34,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>         rte_prefetch0(p);
>  }
>
> +static inline void rte_cldemote(const volatile void *p)
> +{
> +       RTE_SET_USED(p);
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/x86/include/rte_prefetch.h b/lib/librte_eal/x86/include/rte_prefetch.h
> index 384c6b3..029d06e 100644
> --- a/lib/librte_eal/x86/include/rte_prefetch.h
> +++ b/lib/librte_eal/x86/include/rte_prefetch.h
> @@ -32,6 +32,15 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>         asm volatile ("prefetchnta %[p]" : : [p] "m" (*(const volatile char *)p));
>  }
>
> +/*
> + * we're using raw byte codes for now as only the newest compiler
> + * versions support this instruction natively.
> + */
> +static inline void rte_cldemote(const volatile void *p)
> +{
> +       asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> --
> 1.8.3.1
>
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [dpdk-dev] [PATCH v5] eal: add cache-line demote support
  2020-10-08  9:02       ` Bruce Richardson
@ 2020-10-12  9:41         ` David Marchand
  0 siblings, 0 replies; 38+ messages in thread
From: David Marchand @ 2020-10-12  9:41 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Omkar Maslekar, dev, Ananyev, Konstantin, Ciara Loftus,
	David Christensen, Jerin Jacob Kollanukkaran,
	Honnappa Nagarahalli, Ruifeng Wang (Arm Technology China),
	Jan Viktorin, Thomas Monjalon
On Thu, Oct 8, 2020 at 11:02 AM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Thu, Oct 08, 2020 at 09:09:52AM +0200, David Marchand wrote:
> > On Thu, Oct 1, 2020 at 2:30 AM Omkar Maslekar <omkar.maslekar@intel.com> wrote:
> > >
> > > rte_cldemote is similar to a prefetch hint - in reverse. cldemote(addr)
> > > enables software to hint to hardware that line is likely to be shared.
> > > Useful in core-to-core communications where cache-line is likely to be
> > > shared. ARM and PPC implementation is provided with NOP and can be added
> > > if any equivalent instructions could be used for implementation on those
> > > architectures.
> > >
> > > Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
> > > Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> >
> > I find this "rte_cldemote" name too close to the Intel instruction,
> > but I can see no complaint from other arch maintainers, so I guess
> > everyone is happy with it.
>
> It is very close, alright - though the name does convey fairly well the
> likely action performed by the instruction. Is there a suggestion for a
> better, more generic name?
I don't have a better suggestion.
The prefetch API has some hints on the level of cache to put data in.
For this new API, we have no indication, would it make sense?
Is this available on all Intel CPUs supported with DPDK?
No cpuflag check needed?
>
> > In any case, this is a new API, so it should be marked experimental.
> >
> Agreed.
>
> > As for unit tests, not sure there is much to do, maybe rename
> > test_prefetch.c and call this new API too, wdyt?
> >
> I'm not sure how much value this would provide, but it can be done.
As much as the existing test, checking we can call this API.
If you think it is not worth it, we can drop the prefetch ut code.
-- 
David Marchand
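
On the cpuflag question above, here is a minimal sketch of how CLDEMOTE
support could be detected at run time. It assumes GCC or Clang and their
<cpuid.h> helper __get_cpuid_count(). CPUID leaf 7, sub-leaf 0, reports
CLDEMOTE in ECX bit 25. The name cldemote_supported is hypothetical and is
not part of the patch or of DPDK. As far as the Intel SDM describes it,
processors without the feature execute the encoding as a NOP, so a check
like this would be informational rather than required for correctness.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper, not part of the patch: report whether the CPU
 * advertises CLDEMOTE (CPUID.(EAX=7,ECX=0):ECX bit 25). */
static inline bool cldemote_supported(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

	/* __get_cpuid_count() returns 0 when leaf 7 is not available. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return false;
	return (ecx >> 25) & 1u;
#else
	/* Non-x86 targets: the patch provides a NOP, so report no support. */
	return false;
#endif
}
```

An application could call this once at startup and log the result. The
hint itself can be issued unconditionally either way.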
^ permalink raw reply	[flat|nested] 38+ messages in thread
* [dpdk-dev] [PATCH v6] eal: add cache-line demote support
  2020-09-10  1:16 [dpdk-dev] [PATCH] EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h Omkar Maslekar
                   ` (4 preceding siblings ...)
  2020-10-01  0:28 ` [dpdk-dev] [PATCH v5] " Omkar Maslekar
@ 2020-10-12 10:19 ` Omkar Maslekar
  2020-10-12 10:19   ` Omkar Maslekar
  2020-10-13  9:43 ` [dpdk-dev] [PATCH v7] " Omkar Maslekar
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Omkar Maslekar @ 2020-10-12 10:19 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, ciara.loftus, omkar.maslekar, drc, jerinj,
	ruifeng.wang, honnappa.nagarahalli
We are including this in rte_prefetch.h since it is the most closely
related code location. rte_cldemote is similar to a prefetch hint - in reverse.
Omkar Maslekar (1):
  eal: add cache-line demote support
 app/test/test_prefetch.c                      |  4 ++++
 doc/guides/rel_notes/release_20_11.rst        |  7 +++++++
 lib/librte_eal/arm/include/rte_prefetch_32.h  |  8 ++++++++
 lib/librte_eal/arm/include/rte_prefetch_64.h  |  8 ++++++++
 lib/librte_eal/include/generic/rte_prefetch.h | 16 ++++++++++++++++
 lib/librte_eal/ppc/include/rte_prefetch.h     |  8 ++++++++
 lib/librte_eal/x86/include/rte_prefetch.h     | 12 ++++++++++++
 7 files changed, 63 insertions(+)
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 38+ messages in thread
* [dpdk-dev] [PATCH v6] eal: add cache-line demote support
  2020-10-12 10:19 ` [dpdk-dev] [PATCH v6] " Omkar Maslekar
@ 2020-10-12 10:19   ` Omkar Maslekar
  2020-10-12 19:31     ` David Christensen
  2020-10-13  2:59     ` Ruifeng Wang
  0 siblings, 2 replies; 38+ messages in thread
From: Omkar Maslekar @ 2020-10-12 10:19 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, ciara.loftus, omkar.maslekar, drc, jerinj,
	ruifeng.wang, honnappa.nagarahalli
rte_cldemote is similar to a prefetch hint - in reverse. cldemote(addr)
enables software to hint to hardware that line is likely to be shared.
Useful in core-to-core communications where cache-line is likely to be
shared. ARM and PPC implementation is provided with NOP and can be added
if any equivalent instructions could be used for implementation on those
architectures.
Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
v6: marked rte_cldemote as experimental
    added rte_cldemote call in existing app/test_prefetch.c
v5: documentation updated
    fixed formatting issue in release notes
    added Acked-by: Bruce Richardson <bruce.richardson@intel.com>
*
v4: updated bold text for title and fixed margin in release notes
*
v3: fixed warning regarding whitespace
*
v2: documentation updated
---
---
 app/test/test_prefetch.c                      |  4 ++++
 doc/guides/rel_notes/release_20_11.rst        |  7 +++++++
 lib/librte_eal/arm/include/rte_prefetch_32.h  |  8 ++++++++
 lib/librte_eal/arm/include/rte_prefetch_64.h  |  8 ++++++++
 lib/librte_eal/include/generic/rte_prefetch.h | 16 ++++++++++++++++
 lib/librte_eal/ppc/include/rte_prefetch.h     |  8 ++++++++
 lib/librte_eal/x86/include/rte_prefetch.h     | 12 ++++++++++++
 7 files changed, 63 insertions(+)
diff --git a/app/test/test_prefetch.c b/app/test/test_prefetch.c
index 41f219a..5c58d0c 100644
--- a/app/test/test_prefetch.c
+++ b/app/test/test_prefetch.c
@@ -26,7 +26,11 @@
 	rte_prefetch1(&a);
 	rte_prefetch2(&a);
 
+/* test for marking a line as shared to test cldemote functionality */
+	rte_cldemote(&a);
+
 	return 0;
 }
 
+
 REGISTER_TEST_COMMAND(prefetch_autotest, test_prefetch);
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index df227a1..dc402ab 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -55,6 +55,13 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added new function rte_cldemote in rte_prefetch.h.**
+
+  Added a hardware hint CLDEMOTE, which is similar to prefetch in reverse.
+  CLDEMOTE moves the cache line to the more remote cache, where it expects
+  sharing to be efficient. Moving the cache line to a level more distant from
+  the processor helps to accelerate core-to-core communication.
+
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h b/lib/librte_eal/arm/include/rte_prefetch_32.h
index e53420a..062ed27 100644
--- a/lib/librte_eal/arm/include/rte_prefetch_32.h
+++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
@@ -10,6 +10,7 @@
 #endif
 
 #include <rte_common.h>
+#include <rte_compat.h>
 #include "generic/rte_prefetch.h"
 
 static inline void rte_prefetch0(const volatile void *p)
@@ -33,6 +34,13 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	rte_prefetch0(p);
 }
 
+static inline void
+__rte_experimental
+rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h b/lib/librte_eal/arm/include/rte_prefetch_64.h
index fc2b391..6e5ee07 100644
--- a/lib/librte_eal/arm/include/rte_prefetch_64.h
+++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
@@ -10,6 +10,7 @@
 #endif
 
 #include <rte_common.h>
+#include <rte_compat.h>
 #include "generic/rte_prefetch.h"
 
 static inline void rte_prefetch0(const volatile void *p)
@@ -32,6 +33,13 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));
 }
 
+static inline void
+__rte_experimental
+rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/include/generic/rte_prefetch.h b/lib/librte_eal/include/generic/rte_prefetch.h
index 6e47bdf..3474548 100644
--- a/lib/librte_eal/include/generic/rte_prefetch.h
+++ b/lib/librte_eal/include/generic/rte_prefetch.h
@@ -51,4 +51,20 @@
  */
 static inline void rte_prefetch_non_temporal(const volatile void *p);
 
+/**
+ * Demote a cache line to a more distant level of cache from the processor.
+ *
+ * CLDEMOTE hints to hardware to move (demote) a cache line from the closest to
+ * the processor to a level more distant from the processor. It is a hint and
+ * not guarantee. rte_cldemote is intended to move the cache line to the more
+ * remote cache, where it expects sharing to be efficient and to indicate that a
+ * line may be accessed by a different core in the future.
+ *
+ * @param p
+ *   Address to demote
+ */
+static inline void
+__rte_experimental
+rte_cldemote(const volatile void *p);
+
 #endif /* _RTE_PREFETCH_H_ */
diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h b/lib/librte_eal/ppc/include/rte_prefetch.h
index 9ba07c8..9630227 100644
--- a/lib/librte_eal/ppc/include/rte_prefetch.h
+++ b/lib/librte_eal/ppc/include/rte_prefetch.h
@@ -11,6 +11,7 @@
 #endif
 
 #include <rte_common.h>
+#include <rte_compat.h>
 #include "generic/rte_prefetch.h"
 
 static inline void rte_prefetch0(const volatile void *p)
@@ -34,6 +35,13 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	rte_prefetch0(p);
 }
 
+static inline void
+__rte_experimental
+rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/x86/include/rte_prefetch.h b/lib/librte_eal/x86/include/rte_prefetch.h
index 384c6b3..e1e120e 100644
--- a/lib/librte_eal/x86/include/rte_prefetch.h
+++ b/lib/librte_eal/x86/include/rte_prefetch.h
@@ -10,6 +10,7 @@
 #endif
 
 #include <rte_common.h>
+#include <rte_compat.h>
 #include "generic/rte_prefetch.h"
 
 static inline void rte_prefetch0(const volatile void *p)
@@ -32,6 +33,17 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	asm volatile ("prefetchnta %[p]" : : [p] "m" (*(const volatile char *)p));
 }
 
+/*
+ * we're using raw byte codes for now as only the newest compiler
+ * versions support this instruction natively.
+ */
+static inline void
+__rte_experimental
+rte_cldemote(const volatile void *p)
+{
+	asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [dpdk-dev] [PATCH v6] eal: add cache-line demote support
  2020-10-12 10:19   ` Omkar Maslekar
@ 2020-10-12 19:31     ` David Christensen
  2020-10-13  2:59     ` Ruifeng Wang
  1 sibling, 0 replies; 38+ messages in thread
From: David Christensen @ 2020-10-12 19:31 UTC (permalink / raw)
  To: Omkar Maslekar, dev
  Cc: bruce.richardson, ciara.loftus, jerinj, ruifeng.wang,
	honnappa.nagarahalli
On 10/12/20 3:19 AM, Omkar Maslekar wrote:
> rte_cldemote is similar to a prefetch hint - in reverse. cldemote(addr)
> enables software to hint to hardware that line is likely to be shared.
> Useful in core-to-core communications where cache-line is likely to be
> shared. ARM and PPC implementation is provided with NOP and can be added
> if any equivalent instructions could be used for implementation on those
> architectures.
> 
> Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> ---
> v6: marked rte_cldemote as experimental
>      added rte_cldemote call in existing app/test_prefetch.c
> 
> v5: documentation updated
>      fixed formatting issue in release notes
>      added Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> *
> v4: updated bold text for title and fixed margin in release notes
> *
> v3: fixed warning regarding whitespace
> *
> v2: documentation updated
> ---
> ---
>   app/test/test_prefetch.c                      |  4 ++++
>   doc/guides/rel_notes/release_20_11.rst        |  7 +++++++
>   lib/librte_eal/arm/include/rte_prefetch_32.h  |  8 ++++++++
>   lib/librte_eal/arm/include/rte_prefetch_64.h  |  8 ++++++++
>   lib/librte_eal/include/generic/rte_prefetch.h | 16 ++++++++++++++++
>   lib/librte_eal/ppc/include/rte_prefetch.h     |  8 ++++++++
>   lib/librte_eal/x86/include/rte_prefetch.h     | 12 ++++++++++++
>   7 files changed, 63 insertions(+)
...snip...
> diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h b/lib/librte_eal/ppc/include/rte_prefetch.h
> index 9ba07c8..9630227 100644
> --- a/lib/librte_eal/ppc/include/rte_prefetch.h
> +++ b/lib/librte_eal/ppc/include/rte_prefetch.h
> @@ -11,6 +11,7 @@
>   #endif
> 
>   #include <rte_common.h>
> +#include <rte_compat.h>
>   #include "generic/rte_prefetch.h"
> 
>   static inline void rte_prefetch0(const volatile void *p)
> @@ -34,6 +35,13 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>   	rte_prefetch0(p);
>   }
> 
> +static inline void
> +__rte_experimental
> +rte_cldemote(const volatile void *p)
> +{
> +	RTE_SET_USED(p);
> +}
> +
>   #ifdef __cplusplus
>   }
>   #endif
I don't see an equivalent operation in the 3.1 ISA for POWER processors,
so NOP is the right implementation.
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
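
The per-architecture split discussed in this thread (raw opcode on x86, NOP
elsewhere) and the intended core-to-core use can be sketched as follows.
This is an illustration only: demo_cldemote, struct demo_msg and
demo_publish are hypothetical names, the x86 branch copies the byte
sequence from the patch, and the 64-byte alignment is an assumed
cache-line size.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for rte_cldemote(): a hint only, never required
 * for correctness. */
static inline void demo_cldemote(const volatile void *p)
{
#if defined(__x86_64__) || defined(__i386__)
	/* Same raw encoding as the patch: cldemote on the address held in
	 * rsi/esi. CPUs without the feature treat this opcode as a NOP. */
	__asm__ volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
#else
	(void)p; /* no equivalent instruction: NOP, as on ARM and PPC */
#endif
}

/* One message slot, aligned to an assumed 64-byte cache line. */
struct demo_msg {
	uint32_t len;
	uint8_t data[60];
} __attribute__((aligned(64)));

/* Producer side: fill the slot, then hint that another core will read it. */
static inline void demo_publish(struct demo_msg *slot, const void *buf,
				uint32_t len)
{
	if (len > sizeof(slot->data))
		len = sizeof(slot->data);
	memcpy(slot->data, buf, len);
	slot->len = len;
	demo_cldemote(slot);
}
```

A real producer would still need the usual ring or flag synchronisation
between cores. The demote hint only influences where the line is likely to
sit in the cache hierarchy when the consumer reads it.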
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [dpdk-dev] [PATCH v6] eal: add cache-line demote support
  2020-10-12 10:19   ` Omkar Maslekar
  2020-10-12 19:31     ` David Christensen
@ 2020-10-13  2:59     ` Ruifeng Wang
  2020-10-13 16:20       ` Bruce Richardson
  1 sibling, 1 reply; 38+ messages in thread
From: Ruifeng Wang @ 2020-10-13  2:59 UTC (permalink / raw)
  To: Omkar Maslekar, dev
  Cc: bruce.richardson, ciara.loftus, drc, jerinj, Honnappa Nagarahalli, nd
> -----Original Message-----
> From: Omkar Maslekar <omkar.maslekar@intel.com>
> Sent: Monday, October 12, 2020 6:20 PM
> To: dev@dpdk.org
> Cc: bruce.richardson@intel.com; ciara.loftus@intel.com;
> omkar.maslekar@intel.com; drc@linux.vnet.ibm.com; jerinj@marvell.com;
> Ruifeng Wang <Ruifeng.Wang@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>
> Subject: [PATCH v6] eal: add cache-line demote support
> 
> rte_cldemote is similar to a prefetch hint - in reverse. cldemote(addr)
> enables software to hint to hardware that line is likely to be shared.
> Useful in core-to-core communications where cache-line is likely to be
> shared. ARM and PPC implementation is provided with NOP and can be
> added if any equivalent instructions could be used for implementation on
> those architectures.
> 
> Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> ---
> v6: marked rte_cldemote as experimental
>     added rte_cldemote call in existing app/test_prefetch.c
> 
> v5: documentation updated
>     fixed formatting issue in release notes
>     added Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> *
> v4: updated bold text for title and fixed margin in release notes
> *
> v3: fixed warning regarding whitespace
> *
> v2: documentation updated
> ---
> ---
>  app/test/test_prefetch.c                      |  4 ++++
>  doc/guides/rel_notes/release_20_11.rst        |  7 +++++++
>  lib/librte_eal/arm/include/rte_prefetch_32.h  |  8 ++++++++
>  lib/librte_eal/arm/include/rte_prefetch_64.h  |  8 ++++++++
>  lib/librte_eal/include/generic/rte_prefetch.h | 16 ++++++++++++++++
>  lib/librte_eal/ppc/include/rte_prefetch.h     |  8 ++++++++
>  lib/librte_eal/x86/include/rte_prefetch.h     | 12 ++++++++++++
>  7 files changed, 63 insertions(+)
> 
> diff --git a/app/test/test_prefetch.c b/app/test/test_prefetch.c
> index 41f219a..5c58d0c 100644
> --- a/app/test/test_prefetch.c
> +++ b/app/test/test_prefetch.c
> @@ -26,7 +26,11 @@
>  	rte_prefetch1(&a);
>  	rte_prefetch2(&a);
> 
> +/* test for marking a line as shared to test cldemote functionality */
> +	rte_cldemote(&a);
> +
>  	return 0;
>  }
> 
> +
>  REGISTER_TEST_COMMAND(prefetch_autotest, test_prefetch);
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index df227a1..dc402ab 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -55,6 +55,13 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =======================================================
> 
> +* **Added new function rte_cldemote in rte_prefetch.h.**
> +
> +  Added a hardware hint CLDEMOTE, which is similar to prefetch in reverse.
> +  CLDEMOTE moves the cache line to the more remote cache, where it expects
> +  sharing to be efficient. Moving the cache line to a level more distant from
> +  the processor helps to accelerate core-to-core communication.
> +
The patch does not apply. A rebase may be needed.
> 
>  Removed Items
>  -------------
> diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h b/lib/librte_eal/arm/include/rte_prefetch_32.h
> index e53420a..062ed27 100644
> --- a/lib/librte_eal/arm/include/rte_prefetch_32.h
> +++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
> @@ -10,6 +10,7 @@
>  #endif
> 
>  #include <rte_common.h>
> +#include <rte_compat.h>
>  #include "generic/rte_prefetch.h"
> 
>  static inline void rte_prefetch0(const volatile void *p)
> @@ -33,6 +34,13 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>  	rte_prefetch0(p);
>  }
> 
> +static inline void
> +__rte_experimental
See below.
> +rte_cldemote(const volatile void *p)
> +{
> +	RTE_SET_USED(p);
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h b/lib/librte_eal/arm/include/rte_prefetch_64.h
> index fc2b391..6e5ee07 100644
> --- a/lib/librte_eal/arm/include/rte_prefetch_64.h
> +++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
> @@ -10,6 +10,7 @@
>  #endif
> 
>  #include <rte_common.h>
> +#include <rte_compat.h>
>  #include "generic/rte_prefetch.h"
> 
>  static inline void rte_prefetch0(const volatile void *p)
> @@ -32,6 +33,13 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>  	asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));
>  }
> 
> +static inline void
> +__rte_experimental
> +rte_cldemote(const volatile void *p)
> +{
> +	RTE_SET_USED(p);
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/include/generic/rte_prefetch.h b/lib/librte_eal/include/generic/rte_prefetch.h
> index 6e47bdf..3474548 100644
> --- a/lib/librte_eal/include/generic/rte_prefetch.h
> +++ b/lib/librte_eal/include/generic/rte_prefetch.h
> @@ -51,4 +51,20 @@
>   */
>  static inline void rte_prefetch_non_temporal(const volatile void *p);
> 
> +/**
> + * Demote a cache line to a more distant level of cache from the processor.
> + *
> + * CLDEMOTE hints to hardware to move (demote) a cache line from the closest to
> + * the processor to a level more distant from the processor. It is a hint and
> + * not guarantee. rte_cldemote is intended to move the cache line to the more
> + * remote cache, where it expects sharing to be efficient and to indicate that a
> + * line may be accessed by a different core in the future.
> + *
> + * @param p
> + *   Address to demote
> + */
> +static inline void
> +__rte_experimental
1. The experimental tag is only needed in this file. The tags at other places can be removed.
2. To align with other code, the experimental tag can be put above the 'static inline void' line.
> +rte_cldemote(const volatile void *p);
> +
>  #endif /* _RTE_PREFETCH_H_ */
> diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h b/lib/librte_eal/ppc/include/rte_prefetch.h
> index 9ba07c8..9630227 100644
> --- a/lib/librte_eal/ppc/include/rte_prefetch.h
> +++ b/lib/librte_eal/ppc/include/rte_prefetch.h
> @@ -11,6 +11,7 @@
>  #endif
> 
>  #include <rte_common.h>
> +#include <rte_compat.h>
>  #include "generic/rte_prefetch.h"
> 
>  static inline void rte_prefetch0(const volatile void *p)
> @@ -34,6 +35,13 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>  	rte_prefetch0(p);
>  }
> 
> +static inline void
> +__rte_experimental
> +rte_cldemote(const volatile void *p)
> +{
> +	RTE_SET_USED(p);
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/x86/include/rte_prefetch.h b/lib/librte_eal/x86/include/rte_prefetch.h
> index 384c6b3..e1e120e 100644
> --- a/lib/librte_eal/x86/include/rte_prefetch.h
> +++ b/lib/librte_eal/x86/include/rte_prefetch.h
> @@ -10,6 +10,7 @@
>  #endif
> 
>  #include <rte_common.h>
> +#include <rte_compat.h>
>  #include "generic/rte_prefetch.h"
> 
>  static inline void rte_prefetch0(const volatile void *p)
> @@ -32,6 +33,17 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>  	asm volatile ("prefetchnta %[p]" : : [p] "m" (*(const volatile char *)p));
>  }
> 
> +/*
> + * we're using raw byte codes for now as only the newest compiler
> + * versions support this instruction natively.
> + */
> +static inline void
> +__rte_experimental
> +rte_cldemote(const volatile void *p)
> +{
> +	asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> --
> 1.8.3.1
^ permalink raw reply	[flat|nested] 38+ messages in thread
* [dpdk-dev] [PATCH v7] eal: add cache-line demote support
  2020-09-10  1:16 [dpdk-dev] [PATCH] EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h Omkar Maslekar
                   ` (5 preceding siblings ...)
  2020-10-12 10:19 ` [dpdk-dev] [PATCH v6] " Omkar Maslekar
@ 2020-10-13  9:43 ` Omkar Maslekar
  2020-10-13  9:43   ` Omkar Maslekar
  2020-10-15 15:18 ` [dpdk-dev] [PATCH v8] " Omkar Maslekar
  2020-10-15 23:20 ` [dpdk-dev] [PATCH v9] " Omkar Maslekar
  8 siblings, 1 reply; 38+ messages in thread
From: Omkar Maslekar @ 2020-10-13  9:43 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, ciara.loftus, omkar.maslekar, drc, jerinj,
	ruifeng.wang, honnappa.nagarahalli
We are including this in rte_prefetch.h since it is the most closely
related code location. rte_cldemote is similar to a prefetch hint - in reverse.
Omkar Maslekar (1):
  eal: add cache-line demote support
 app/test/test_prefetch.c                      |  4 ++++
 doc/guides/rel_notes/release_20_11.rst        |  7 +++++++
 lib/librte_eal/arm/include/rte_prefetch_32.h  |  7 +++++++
 lib/librte_eal/arm/include/rte_prefetch_64.h  |  7 +++++++
 lib/librte_eal/include/generic/rte_prefetch.h | 15 +++++++++++++++
 lib/librte_eal/ppc/include/rte_prefetch.h     |  7 +++++++
 lib/librte_eal/x86/include/rte_prefetch.h     | 11 +++++++++++
 7 files changed, 58 insertions(+)
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 38+ messages in thread
* [dpdk-dev] [PATCH v7] eal: add cache-line demote support
  2020-10-13  9:43 ` [dpdk-dev] [PATCH v7] " Omkar Maslekar
@ 2020-10-13  9:43   ` Omkar Maslekar
  2020-10-14  7:24     ` Ruifeng Wang
  2020-10-15  8:01     ` David Marchand
  0 siblings, 2 replies; 38+ messages in thread
From: Omkar Maslekar @ 2020-10-13  9:43 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, ciara.loftus, omkar.maslekar, drc, jerinj,
	ruifeng.wang, honnappa.nagarahalli
rte_cldemote is similar to a prefetch hint - in reverse. cldemote(addr)
enables software to hint to hardware that line is likely to be shared.
Useful in core-to-core communications where cache-line is likely to be
shared. ARM and PPC implementation is provided with NOP and can be added
if any equivalent instructions could be used for implementation on those
architectures.
Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
---
v7: fixed experimental tag
v6: marked rte_cldemote as experimental
    added rte_cldemote call in existing app/test_prefetch.c
v5: documentation updated
    fixed formatting issue in release notes
    added Acked-by: Bruce Richardson <bruce.richardson@intel.com>
*
v4: updated bold text for title and fixed margin in release notes
*
v3: fixed warning regarding whitespace
*
v2: documentation updated
---
---
 app/test/test_prefetch.c                      |  4 ++++
 doc/guides/rel_notes/release_20_11.rst        |  7 +++++++
 lib/librte_eal/arm/include/rte_prefetch_32.h  |  7 +++++++
 lib/librte_eal/arm/include/rte_prefetch_64.h  |  7 +++++++
 lib/librte_eal/include/generic/rte_prefetch.h | 15 +++++++++++++++
 lib/librte_eal/ppc/include/rte_prefetch.h     |  7 +++++++
 lib/librte_eal/x86/include/rte_prefetch.h     | 11 +++++++++++
 7 files changed, 58 insertions(+)
diff --git a/app/test/test_prefetch.c b/app/test/test_prefetch.c
index 41f219a..5c58d0c 100644
--- a/app/test/test_prefetch.c
+++ b/app/test/test_prefetch.c
@@ -26,7 +26,11 @@
 	rte_prefetch1(&a);
 	rte_prefetch2(&a);
 
+/* test for marking a line as shared to test cldemote functionality */
+	rte_cldemote(&a);
+
 	return 0;
 }
 
+
 REGISTER_TEST_COMMAND(prefetch_autotest, test_prefetch);
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index b7881f2..8a1ed01 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -171,6 +171,13 @@ New Features
   * Extern objects and functions can be plugged into the pipeline.
   * Transaction-oriented table updates.
 
+* **Added new function rte_cldemote in rte_prefetch.h.**
+
+  Added a hardware hint CLDEMOTE, which is similar to prefetch in reverse.
+  CLDEMOTE moves the cache line to the more remote cache, where it expects
+  sharing to be efficient. Moving the cache line to a level more distant from
+  the processor helps to accelerate core-to-core communication.
+
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h b/lib/librte_eal/arm/include/rte_prefetch_32.h
index e53420a..28b3d48 100644
--- a/lib/librte_eal/arm/include/rte_prefetch_32.h
+++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
@@ -10,6 +10,7 @@
 #endif
 
 #include <rte_common.h>
+#include <rte_compat.h>
 #include "generic/rte_prefetch.h"
 
 static inline void rte_prefetch0(const volatile void *p)
@@ -33,6 +34,12 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	rte_prefetch0(p);
 }
 
+__rte_experimental
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h b/lib/librte_eal/arm/include/rte_prefetch_64.h
index fc2b391..1c722eb 100644
--- a/lib/librte_eal/arm/include/rte_prefetch_64.h
+++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
@@ -10,6 +10,7 @@
 #endif
 
 #include <rte_common.h>
+#include <rte_compat.h>
 #include "generic/rte_prefetch.h"
 
 static inline void rte_prefetch0(const volatile void *p)
@@ -32,6 +33,12 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));
 }
 
+__rte_experimental
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/include/generic/rte_prefetch.h b/lib/librte_eal/include/generic/rte_prefetch.h
index 6e47bdf..ad9844c 100644
--- a/lib/librte_eal/include/generic/rte_prefetch.h
+++ b/lib/librte_eal/include/generic/rte_prefetch.h
@@ -51,4 +51,19 @@
  */
 static inline void rte_prefetch_non_temporal(const volatile void *p);
 
+/**
+ * Demote a cache line to a more distant level of cache from the processor.
+ *
+ * CLDEMOTE hints to hardware to move (demote) a cache line from the closest to
+ * the processor to a level more distant from the processor. It is a hint and
+ * not guarantee. rte_cldemote is intended to move the cache line to the more
+ * remote cache, where it expects sharing to be efficient and to indicate that a
+ * line may be accessed by a different core in the future.
+ *
+ * @param p
+ *   Address to demote
+ */
+__rte_experimental
+static inline void rte_cldemote(const volatile void *p);
+
 #endif /* _RTE_PREFETCH_H_ */
diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h b/lib/librte_eal/ppc/include/rte_prefetch.h
index 9ba07c8..b55cac4 100644
--- a/lib/librte_eal/ppc/include/rte_prefetch.h
+++ b/lib/librte_eal/ppc/include/rte_prefetch.h
@@ -11,6 +11,7 @@
 #endif
 
 #include <rte_common.h>
+#include <rte_compat.h>
 #include "generic/rte_prefetch.h"
 
 static inline void rte_prefetch0(const volatile void *p)
@@ -34,6 +35,12 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	rte_prefetch0(p);
 }
 
+__rte_experimental
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/x86/include/rte_prefetch.h b/lib/librte_eal/x86/include/rte_prefetch.h
index 384c6b3..92ba05a 100644
--- a/lib/librte_eal/x86/include/rte_prefetch.h
+++ b/lib/librte_eal/x86/include/rte_prefetch.h
@@ -10,6 +10,7 @@
 #endif
 
 #include <rte_common.h>
+#include <rte_compat.h>
 #include "generic/rte_prefetch.h"
 
 static inline void rte_prefetch0(const volatile void *p)
@@ -32,6 +33,16 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	asm volatile ("prefetchnta %[p]" : : [p] "m" (*(const volatile char *)p));
 }
 
+/*
+ * we're using raw byte codes for now as only the newest compiler
+ * versions support this instruction natively.
+ */
+__rte_experimental
+static inline void rte_cldemote(const volatile void *p)
+{
+	asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.8.3.1
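
For readers puzzling over the raw bytes in the x86 body: 0x0f 0x1c is the CLDEMOTE opcode and 0x06 is its ModRM byte. The sketch below is only an illustration and is not part of the patch; the names sketch_modrm and sketch_decode_modrm are hypothetical. It splits a ModRM byte into its three fields. For 0x06 the fields are mod = 0, reg = 0 (the /0 opcode extension) and rm = 6, which in 32-bit and 64-bit addressing selects a register-indirect operand through ESI/RSI. That is why the inline asm passes the pointer with the "S" constraint.

```c
#include <stdint.h>

/* Hypothetical helper type: the three fields of an x86 ModRM byte. */
struct sketch_modrm {
	unsigned int mod; /* bits 7..6: addressing mode */
	unsigned int reg; /* bits 5..3: register or opcode extension */
	unsigned int rm;  /* bits 2..0: register/memory operand */
};

/* Split a ModRM byte into its fields. */
static inline struct sketch_modrm sketch_decode_modrm(uint8_t b)
{
	struct sketch_modrm m;

	m.mod = (b >> 6) & 0x3;
	m.reg = (b >> 3) & 0x7;
	m.rm = b & 0x7;
	return m;
}
```

Decoding the third byte of the sequence, sketch_decode_modrm(0x06), gives the fields described above.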
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [dpdk-dev] [PATCH v6] eal: add cache-line demote support
  2020-10-13  2:59     ` Ruifeng Wang
@ 2020-10-13 16:20       ` Bruce Richardson
  2020-10-14  1:55         ` Ruifeng Wang
  2020-10-14  7:14         ` David Marchand
  0 siblings, 2 replies; 38+ messages in thread
From: Bruce Richardson @ 2020-10-13 16:20 UTC (permalink / raw)
  To: Ruifeng Wang
  Cc: Omkar Maslekar, dev, ciara.loftus, drc, jerinj, Honnappa Nagarahalli, nd
On Tue, Oct 13, 2020 at 02:59:24AM +0000, Ruifeng Wang wrote:
> 
> > -----Original Message-----
> > From: Omkar Maslekar <omkar.maslekar@intel.com>
> > Sent: Monday, October 12, 2020 6:20 PM
> > To: dev@dpdk.org
> > Cc: bruce.richardson@intel.com; ciara.loftus@intel.com;
> > omkar.maslekar@intel.com; drc@linux.vnet.ibm.com; jerinj@marvell.com;
> > Ruifeng Wang <Ruifeng.Wang@arm.com>; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli@arm.com>
> > Subject: [PATCH v6] eal: add cache-line demote support
> > 
> > rte_cldemote is similar to a prefetch hint - in reverse. cldemote(addr)
> > enables software to hint to hardware that line is likely to be shared.
> > Useful in core-to-core communications where cache-line is likely to be
> > shared. ARM and PPC implementation is provided with NOP and can be
> > added if any equivalent instructions could be used for implementation on
> > those architectures.
> > 
> > Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
> > Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> > 
> > ---
> > v6: marked rte_cldemote as experimental
> >     added rte_cldemote call in existing app/test_prefetch.c
> > 
> > v5: documentation updated
> >     fixed formatting issue in release notes
> >     added Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> > *
> > v4: updated bold text for title and fixed margin in release notes
> > *
> > v3: fixed warning regarding whitespace
> > *
> > v2: documentation updated
> > ---
> > ---
<snip>
> 
> > +/**
> > + * Demote a cache line to a more distant level of cache from the processor.
> > + *
> > + * CLDEMOTE hints to hardware to move (demote) a cache line from the closest to
> > + * the processor to a level more distant from the processor. It is a hint and
> > + * not guarantee. rte_cldemote is intended to move the cache line to the more
> > + * remote cache, where it expects sharing to be efficient and to indicate that a
> > + * line may be accessed by a different core in the future.
> > + *
> > + * @param p
> > + *   Address to demote
> > + */
> > +static inline void
> > +__rte_experimental
> 
> 1. Experimental tag is only needed in this file. Tags at other places can be removed.
I'm not sure that is the case. The generic file is used when preparing the
docs, so the experimental tag needs to go there for the docs, but when
actually using the function in compiled code the "generic" version is
unused. Therefore we need the experimental tag there to trigger a build
warning about using the function if the appropriate ALLOW_EXPERIMENTAL_APIS
flag is not set.
/Bruce
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [dpdk-dev] [PATCH v6] eal: add cache-line demote support
  2020-10-13 16:20       ` Bruce Richardson
@ 2020-10-14  1:55         ` Ruifeng Wang
  2020-10-14  7:14         ` David Marchand
  1 sibling, 0 replies; 38+ messages in thread
From: Ruifeng Wang @ 2020-10-14  1:55 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Omkar Maslekar, dev, ciara.loftus, drc, jerinj,
	Honnappa Nagarahalli, nd, david.marchand, nd
> -----Original Message-----
> From: Bruce Richardson <bruce.richardson@intel.com>
> Sent: Wednesday, October 14, 2020 12:20 AM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>
> Cc: Omkar Maslekar <omkar.maslekar@intel.com>; dev@dpdk.org;
> ciara.loftus@intel.com; drc@linux.vnet.ibm.com; jerinj@marvell.com;
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH v6] eal: add cache-line demote support
> 
> On Tue, Oct 13, 2020 at 02:59:24AM +0000, Ruifeng Wang wrote:
> >
> > > -----Original Message-----
> > > From: Omkar Maslekar <omkar.maslekar@intel.com>
> > > Sent: Monday, October 12, 2020 6:20 PM
> > > To: dev@dpdk.org
> > > Cc: bruce.richardson@intel.com; ciara.loftus@intel.com;
> > > omkar.maslekar@intel.com; drc@linux.vnet.ibm.com;
> > > jerinj@marvell.com; Ruifeng Wang <Ruifeng.Wang@arm.com>;
> Honnappa
> > > Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > > Subject: [PATCH v6] eal: add cache-line demote support
> > >
> > > rte_cldemote is similar to a prefetch hint - in reverse.
> > > cldemote(addr) enables software to hint to hardware that line is likely to be shared.
> > > Useful in core-to-core communications where cache-line is likely to
> > > be shared. ARM and PPC implementation is provided with NOP and can
> > > be added if any equivalent instructions could be used for
> > > implementation on those architectures.
> > >
> > > Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
> > > Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> > >
> > > ---
> > > v6: marked rte_cldemote as experimental
> > >     added rte_cldemote call in existing app/test_prefetch.c
> > >
> > > v5: documentation updated
> > >     fixed formatting issue in release notes
> > >     added Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> > > *
> > > v4: updated bold text for title and fixed margin in release notes
> > > *
> > > v3: fixed warning regarding whitespace
> > > *
> > > v2: documentation updated
> > > ---
> > > ---
> <snip>
> >
> > > +/**
> > > + * Demote a cache line to a more distant level of cache from the processor.
> > > + *
> > > + * CLDEMOTE hints to hardware to move (demote) a cache line from the closest to
> > > + * the processor to a level more distant from the processor. It is a hint and
> > > + * not guarantee. rte_cldemote is intended to move the cache line to the more
> > > + * remote cache, where it expects sharing to be efficient and to indicate that a
> > > + * line may be accessed by a different core in the future.
> > > + *
> > > + * @param p
> > > + *   Address to demote
> > > + */
> > > +static inline void
> > > +__rte_experimental
> >
> > 1. Experimental tag is only needed in this file. Tags at other places can be removed.
> 
> I'm not sure that is the case. The generic file is used when preparing the docs,
> so the experimental tag needs to go there for the docs, but when actually
> using the function in compiled code the "generic" version is unused.
> Therefore we need the experimental tag there to trigger a build warning
> about using the function if the appropriate ALLOW_EXPERIMENTAL_APIS flag
> is not set.
> 
+David in cc.
I learnt this from David's comment in thread:
http://patches.dpdk.org/patch/61573/
"We only need it in the function prototype"
Hi David,
Can you comment if my understanding of experimental tag usage is correct?
/Ruifeng
> /Bruce
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [dpdk-dev] [PATCH v6] eal: add cache-line demote support
  2020-10-13 16:20       ` Bruce Richardson
  2020-10-14  1:55         ` Ruifeng Wang
@ 2020-10-14  7:14         ` David Marchand
  2020-10-14  7:51           ` Ruifeng Wang
  1 sibling, 1 reply; 38+ messages in thread
From: David Marchand @ 2020-10-14  7:14 UTC (permalink / raw)
  To: Bruce Richardson, Ruifeng Wang
  Cc: Omkar Maslekar, dev, ciara.loftus, drc, jerinj, Honnappa Nagarahalli, nd
On Tue, Oct 13, 2020 at 6:21 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
> > 1. Experimental tag is only needed in this file. Tags at other places can be removed.
>
> I'm not sure that is the case. The generic file is used when preparing the
> docs, so the experimental tag needs to go there for the docs, but when
> actually using the function in compiled code the "generic" version is
> unused. Therefore we need the experimental tag there to trigger a build
> warning about using the function if the appropriate ALLOW_EXPERIMENTAL_APIS
> flag is not set.
It is enough to put an experimental tag when declaring a symbol.
Here, the generic/ header only contains the doxygen part and there is
no common declaration: the tag is needed in the arch specific header.
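
A minimal sketch of the mechanism under discussion, assuming a GCC or Clang style compiler. The macro and function names (SKETCH_ALLOW_EXPERIMENTAL, sketch_experimental, sketch_hint, sketch_caller) are hypothetical stand-ins and are not the DPDK definitions. The tag expands to a deprecation attribute unless the opt-in macro is defined, so the warning is reported wherever the tagged symbol the caller actually compiles against is used.

```c
/*
 * Opt-in switch, playing the role of the allow-experimental build flag.
 * Comment it out and GCC/Clang report a deprecation warning at the call
 * site in sketch_caller() below.
 */
#define SKETCH_ALLOW_EXPERIMENTAL

#ifdef SKETCH_ALLOW_EXPERIMENTAL
#define sketch_experimental
#else
#define sketch_experimental \
	__attribute__((deprecated("symbol is not yet part of the stable API")))
#endif

/* Tagged definition, as an arch-specific header would provide it. */
sketch_experimental
static inline int sketch_hint(const volatile void *p)
{
	(void)p; /* no-op body, like the ARM and PPC stubs in the patch */
	return 0;
}

/* A caller: this is where the warning would be reported. */
static inline int sketch_caller(void)
{
	int a = 0;

	return sketch_hint(&a);
}
```

With the opt-in macro defined the code builds silently; without it, every use of sketch_hint() is flagged, which is the behaviour the tag is meant to provide.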
-- 
David Marchand
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [dpdk-dev] [PATCH v7] eal: add cache-line demote support
  2020-10-13  9:43   ` Omkar Maslekar
@ 2020-10-14  7:24     ` Ruifeng Wang
  2020-10-15  8:01     ` David Marchand
  1 sibling, 0 replies; 38+ messages in thread
From: Ruifeng Wang @ 2020-10-14  7:24 UTC (permalink / raw)
  To: Omkar Maslekar, dev
  Cc: bruce.richardson, ciara.loftus, drc, jerinj, Honnappa Nagarahalli, nd
> -----Original Message-----
> From: Omkar Maslekar <omkar.maslekar@intel.com>
> Sent: Tuesday, October 13, 2020 5:43 PM
> To: dev@dpdk.org
> Cc: bruce.richardson@intel.com; ciara.loftus@intel.com;
> omkar.maslekar@intel.com; drc@linux.vnet.ibm.com; jerinj@marvell.com;
> Ruifeng Wang <Ruifeng.Wang@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>
> Subject: [PATCH v7] eal: add cache-line demote support
> 
> rte_cldemote is similar to a prefetch hint - in reverse. cldemote(addr)
> enables software to hint to hardware that line is likely to be shared.
> Useful in core-to-core communications where cache-line is likely to be
> shared. ARM and PPC implementation is provided with NOP and can be
> added if any equivalent instructions could be used for implementation on
> those architectures.
> 
> Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> Acked-by: David Christensen <drc@linux.vnet.ibm.com>
> Acked-by: Jerin Jacob <jerinj@marvell.com>
> 
> ---
> v7: fixed experimental tag
> 
> v6: marked rte_cldemote as experimental
>     added rte_cldemote call in existing app/test_prefetch.c
> 
> v5: documentation updated
>     fixed formatting issue in release notes
>     added Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> *
> v4: updated bold text for title and fixed margin in release notes
> *
> v3: fixed warning regarding whitespace
> *
> v2: documentation updated
> ---
> ---
>  app/test/test_prefetch.c                      |  4 ++++
>  doc/guides/rel_notes/release_20_11.rst        |  7 +++++++
>  lib/librte_eal/arm/include/rte_prefetch_32.h  |  7 +++++++
> lib/librte_eal/arm/include/rte_prefetch_64.h  |  7 +++++++
> lib/librte_eal/include/generic/rte_prefetch.h | 15 +++++++++++++++
>  lib/librte_eal/ppc/include/rte_prefetch.h     |  7 +++++++
>  lib/librte_eal/x86/include/rte_prefetch.h     | 11 +++++++++++
>  7 files changed, 58 insertions(+)
> 
> diff --git a/app/test/test_prefetch.c b/app/test/test_prefetch.c
> index 41f219a..5c58d0c 100644
> --- a/app/test/test_prefetch.c
> +++ b/app/test/test_prefetch.c
> @@ -26,7 +26,11 @@
>  	rte_prefetch1(&a);
>  	rte_prefetch2(&a);
> 
> +/* test for marking a line as shared to test cldemote functionality */
> +	rte_cldemote(&a);
> +
>  	return 0;
>  }
> 
> +
>  REGISTER_TEST_COMMAND(prefetch_autotest, test_prefetch);
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index b7881f2..8a1ed01 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -171,6 +171,13 @@ New Features
>    * Extern objects and functions can be plugged into the pipeline.
>    * Transaction-oriented table updates.
> 
> +* **Added new function rte_cldemote in rte_prefetch.h.**
> +
> +  Added a hardware hint CLDEMOTE, which is similar to prefetch in reverse.
> +  CLDEMOTE moves the cache line to the more remote cache, where it expects
> +  sharing to be efficient. Moving the cache line to a level more distant from
> +  the processor helps to accelerate core-to-core communication.
> +
> 
>  Removed Items
>  -------------
> diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h
> b/lib/librte_eal/arm/include/rte_prefetch_32.h
> index e53420a..28b3d48 100644
> --- a/lib/librte_eal/arm/include/rte_prefetch_32.h
> +++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
> @@ -10,6 +10,7 @@
>  #endif
> 
>  #include <rte_common.h>
> +#include <rte_compat.h>
>  #include "generic/rte_prefetch.h"
> 
>  static inline void rte_prefetch0(const volatile void *p)
> @@ -33,6 +34,12 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>  	rte_prefetch0(p);
>  }
> 
> +__rte_experimental
> +static inline void rte_cldemote(const volatile void *p)
> +{
> +	RTE_SET_USED(p);
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h
> b/lib/librte_eal/arm/include/rte_prefetch_64.h
> index fc2b391..1c722eb 100644
> --- a/lib/librte_eal/arm/include/rte_prefetch_64.h
> +++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
> @@ -10,6 +10,7 @@
>  #endif
> 
>  #include <rte_common.h>
> +#include <rte_compat.h>
>  #include "generic/rte_prefetch.h"
> 
>  static inline void rte_prefetch0(const volatile void *p)
> @@ -32,6 +33,12 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>  	asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));
>  }
> 
> +__rte_experimental
> +static inline void rte_cldemote(const volatile void *p)
> +{
> +	RTE_SET_USED(p);
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/include/generic/rte_prefetch.h
> b/lib/librte_eal/include/generic/rte_prefetch.h
> index 6e47bdf..ad9844c 100644
> --- a/lib/librte_eal/include/generic/rte_prefetch.h
> +++ b/lib/librte_eal/include/generic/rte_prefetch.h
> @@ -51,4 +51,19 @@
>   */
>  static inline void rte_prefetch_non_temporal(const volatile void *p);
> 
> +/**
> + * Demote a cache line to a more distant level of cache from the processor.
> + *
> + * CLDEMOTE hints to hardware to move (demote) a cache line from the closest to
> + * the processor to a level more distant from the processor. It is a hint and
> + * not guarantee. rte_cldemote is intended to move the cache line to the more
> + * remote cache, where it expects sharing to be efficient and to indicate that a
> + * line may be accessed by a different core in the future.
> + *
> + * @param p
> + *   Address to demote
> + */
> +__rte_experimental
> +static inline void rte_cldemote(const volatile void *p);
> +
>  #endif /* _RTE_PREFETCH_H_ */
> diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h
> b/lib/librte_eal/ppc/include/rte_prefetch.h
> index 9ba07c8..b55cac4 100644
> --- a/lib/librte_eal/ppc/include/rte_prefetch.h
> +++ b/lib/librte_eal/ppc/include/rte_prefetch.h
> @@ -11,6 +11,7 @@
>  #endif
> 
>  #include <rte_common.h>
> +#include <rte_compat.h>
>  #include "generic/rte_prefetch.h"
> 
>  static inline void rte_prefetch0(const volatile void *p)
> @@ -34,6 +35,12 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>  	rte_prefetch0(p);
>  }
> 
> +__rte_experimental
> +static inline void rte_cldemote(const volatile void *p)
> +{
> +	RTE_SET_USED(p);
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/x86/include/rte_prefetch.h
> b/lib/librte_eal/x86/include/rte_prefetch.h
> index 384c6b3..92ba05a 100644
> --- a/lib/librte_eal/x86/include/rte_prefetch.h
> +++ b/lib/librte_eal/x86/include/rte_prefetch.h
> @@ -10,6 +10,7 @@
>  #endif
> 
>  #include <rte_common.h>
> +#include <rte_compat.h>
>  #include "generic/rte_prefetch.h"
> 
>  static inline void rte_prefetch0(const volatile void *p)
> @@ -32,6 +33,16 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>  	asm volatile ("prefetchnta %[p]" : : [p] "m" (*(const volatile char *)p));
>  }
> 
> +/*
> + * we're using raw byte codes for now as only the newest compiler
> + * versions support this instruction natively.
> + */
> +__rte_experimental
> +static inline void rte_cldemote(const volatile void *p)
> +{
> +	asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> --
> 1.8.3.1
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [dpdk-dev] [PATCH v6] eal: add cache-line demote support
  2020-10-14  7:14         ` David Marchand
@ 2020-10-14  7:51           ` Ruifeng Wang
  0 siblings, 0 replies; 38+ messages in thread
From: Ruifeng Wang @ 2020-10-14  7:51 UTC (permalink / raw)
  To: David Marchand, Bruce Richardson
  Cc: Omkar Maslekar, dev, ciara.loftus, drc, jerinj,
	Honnappa Nagarahalli, nd, nd
> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Wednesday, October 14, 2020 3:14 PM
> To: Bruce Richardson <bruce.richardson@intel.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>
> Cc: Omkar Maslekar <omkar.maslekar@intel.com>; dev@dpdk.org;
> ciara.loftus@intel.com; drc@linux.vnet.ibm.com; jerinj@marvell.com;
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v6] eal: add cache-line demote support
> 
> On Tue, Oct 13, 2020 at 6:21 PM Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> > > 1. Experimental tag is only needed in this file. Tags at other places can be
> removed.
> >
> > I'm not sure that is the case. The generic file is used when preparing
> > the docs, so the experimental tag needs to go there for the docs, but
> > when actually using the function in compiled code the "generic"
> > version is unused. Therefore we need the experimental tag there to
> > trigger a build warning about using the function if the appropriate
> > ALLOW_EXPERIMENTAL_APIS flag is not set.
> 
> It is enough to put an experimental tag when declaring a symbol.
> Here, the generic/ header only contains the doxygen part and there is no
> common declaration: the tag is needed in the arch specific header.
> 
Thank you David for the clarification.
I added my reviewed-by tag to v7.
> 
> --
> David Marchand
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [dpdk-dev] [PATCH v7] eal: add cache-line demote support
  2020-10-13  9:43   ` Omkar Maslekar
  2020-10-14  7:24     ` Ruifeng Wang
@ 2020-10-15  8:01     ` David Marchand
  2020-10-15 14:41       ` Maslekar, Omkar
  1 sibling, 1 reply; 38+ messages in thread
From: David Marchand @ 2020-10-15  8:01 UTC (permalink / raw)
  To: Omkar Maslekar
  Cc: dev, Bruce Richardson, Ciara Loftus, David Christensen,
	Jerin Jacob Kollanukkaran, Ruifeng Wang (Arm Technology China),
	Honnappa Nagarahalli
Repeating my questions:
- would there be a point in hinting at where the "demoted" line goes?
- is this instruction available on all x86 CPUs?
See comments:
On Tue, Oct 13, 2020 at 6:47 PM Omkar Maslekar <omkar.maslekar@intel.com> wrote:
> diff --git a/app/test/test_prefetch.c b/app/test/test_prefetch.c
> index 41f219a..5c58d0c 100644
> --- a/app/test/test_prefetch.c
> +++ b/app/test/test_prefetch.c
> @@ -26,7 +26,11 @@
>         rte_prefetch1(&a);
>         rte_prefetch2(&a);
>
> +/* test for marking a line as shared to test cldemote functionality */
Non indented comment that gives no more info than the call itself.
Please remove.
> +       rte_cldemote(&a);
> +
>         return 0;
>  }
>
> +
Please remove this empty line.
>  REGISTER_TEST_COMMAND(prefetch_autotest, test_prefetch);
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index b7881f2..8a1ed01 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -171,6 +171,13 @@ New Features
>    * Extern objects and functions can be plugged into the pipeline.
>    * Transaction-oriented table updates.
>
> +* **Added new function rte_cldemote in rte_prefetch.h.**
> +
> +  Added a hardware hint CLDEMOTE, which is similar to prefetch in reverse.
This should come at the top of the features list (but after "write
combining store" entry that got in first).
Please add a mention that it only concerns x86.
> +  CLDEMOTE moves the cache line to the more remote cache, where it expects
> +  sharing to be efficient. Moving the cache line to a level more distant from
> +  the processor helps to accelerate core-to-core communication.
> +
>
>  Removed Items
>  -------------
> diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h b/lib/librte_eal/arm/include/rte_prefetch_32.h
> index e53420a..28b3d48 100644
> --- a/lib/librte_eal/arm/include/rte_prefetch_32.h
> +++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
> @@ -10,6 +10,7 @@
>  #endif
>
>  #include <rte_common.h>
> +#include <rte_compat.h>
Move rte_compat.h inclusion from the arch headers to the
generic/rte_prefetch.h header only.
>  #include "generic/rte_prefetch.h"
>
>  static inline void rte_prefetch0(const volatile void *p)
> @@ -33,6 +34,12 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>         rte_prefetch0(p);
>  }
>
> +__rte_experimental
> +static inline void rte_cldemote(const volatile void *p)
> +{
> +       RTE_SET_USED(p);
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h b/lib/librte_eal/arm/include/rte_prefetch_64.h
> index fc2b391..1c722eb 100644
> --- a/lib/librte_eal/arm/include/rte_prefetch_64.h
> +++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
> @@ -10,6 +10,7 @@
>  #endif
>
>  #include <rte_common.h>
> +#include <rte_compat.h>
>  #include "generic/rte_prefetch.h"
>
>  static inline void rte_prefetch0(const volatile void *p)
> @@ -32,6 +33,12 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>         asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));
>  }
>
> +__rte_experimental
> +static inline void rte_cldemote(const volatile void *p)
> +{
> +       RTE_SET_USED(p);
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/include/generic/rte_prefetch.h b/lib/librte_eal/include/generic/rte_prefetch.h
> index 6e47bdf..ad9844c 100644
> --- a/lib/librte_eal/include/generic/rte_prefetch.h
> +++ b/lib/librte_eal/include/generic/rte_prefetch.h
> @@ -51,4 +51,19 @@
>   */
>  static inline void rte_prefetch_non_temporal(const volatile void *p);
>
> +/**
> + * Demote a cache line to a more distant level of cache from the processor.
> + *
> + * CLDEMOTE hints to hardware to move (demote) a cache line from the closest to
> + * the processor to a level more distant from the processor. It is a hint and
> + * not guarantee. rte_cldemote is intended to move the cache line to the more
guaranteed*
> + * remote cache, where it expects sharing to be efficient and to indicate that a
> + * line may be accessed by a different core in the future.
> + *
> + * @param p
> + *   Address to demote
> + */
> +__rte_experimental
> +static inline void rte_cldemote(const volatile void *p);
> +
>  #endif /* _RTE_PREFETCH_H_ */
> diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h b/lib/librte_eal/ppc/include/rte_prefetch.h
> index 9ba07c8..b55cac4 100644
> --- a/lib/librte_eal/ppc/include/rte_prefetch.h
> +++ b/lib/librte_eal/ppc/include/rte_prefetch.h
> @@ -11,6 +11,7 @@
>  #endif
>
>  #include <rte_common.h>
> +#include <rte_compat.h>
>  #include "generic/rte_prefetch.h"
>
>  static inline void rte_prefetch0(const volatile void *p)
> @@ -34,6 +35,12 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>         rte_prefetch0(p);
>  }
>
> +__rte_experimental
> +static inline void rte_cldemote(const volatile void *p)
> +{
> +       RTE_SET_USED(p);
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/x86/include/rte_prefetch.h b/lib/librte_eal/x86/include/rte_prefetch.h
> index 384c6b3..92ba05a 100644
> --- a/lib/librte_eal/x86/include/rte_prefetch.h
> +++ b/lib/librte_eal/x86/include/rte_prefetch.h
> @@ -10,6 +10,7 @@
>  #endif
>
>  #include <rte_common.h>
> +#include <rte_compat.h>
>  #include "generic/rte_prefetch.h"
>
>  static inline void rte_prefetch0(const volatile void *p)
> @@ -32,6 +33,16 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>         asm volatile ("prefetchnta %[p]" : : [p] "m" (*(const volatile char *)p));
>  }
>
> +/*
> + * we're using raw byte codes for now as only the newest compiler
We use
> + * versions support this instruction natively.
> + */
> +__rte_experimental
> +static inline void rte_cldemote(const volatile void *p)
> +{
> +       asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> --
> 1.8.3.1
>
-- 
David Marchand
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [dpdk-dev] [PATCH v7] eal: add cache-line demote support
  2020-10-15  8:01     ` David Marchand
@ 2020-10-15 14:41       ` Maslekar, Omkar
  2020-10-15 20:32         ` David Marchand
  0 siblings, 1 reply; 38+ messages in thread
From: Maslekar, Omkar @ 2020-10-15 14:41 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, Richardson, Bruce, Loftus, Ciara, David Christensen,
	Jerin Jacob Kollanukkaran, Ruifeng Wang (Arm Technology China),
	Honnappa Nagarahalli
Hi David,
 >-----Original Message-----
 >From: David Marchand <david.marchand@redhat.com>
 >Sent: Thursday, October 15, 2020 1:01 AM
 >To: Maslekar, Omkar <omkar.maslekar@intel.com>
 >Cc: dev <dev@dpdk.org>; Richardson, Bruce <bruce.richardson@intel.com>;
 >Loftus, Ciara <ciara.loftus@intel.com>; David Christensen
 ><drc@linux.vnet.ibm.com>; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
 >Ruifeng Wang (Arm Technology China) <ruifeng.wang@arm.com>;
 >Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
 >Subject: Re: [dpdk-dev] [PATCH v7] eal: add cache-line demote support
 >
 >Repeating my questions:
 >- would there be a point in hinting at where the "demoted" line goes?
Yes, it is worth mentioning that the demoted line goes to the last shared level of the cache hierarchy. Demotion to a specific, chosen cache level is not possible.
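
To illustrate the intended use, here is a minimal producer-side sketch. It is an assumption-laden illustration, not code from the patch: sketch_cldemote, sketch_slot and sketch_publish are hypothetical names, and a 64-byte cache line is assumed. The producer fills a slot, publishes it with a release store, then hints that the line can move towards a cache level the consuming core can reach more cheaply.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the patch's x86 body; a no-op elsewhere. */
static inline void sketch_cldemote(const volatile void *p)
{
#if defined(__x86_64__) || defined(__i386__)
	/* 0x0f 0x1c 0x06 encodes CLDEMOTE with the address in ESI/RSI. */
	__asm__ volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
#else
	(void)p;
#endif
}

/* One message slot, aligned to an assumed 64-byte cache line. */
struct sketch_slot {
	uint64_t payload;
	uint32_t ready;
} __attribute__((aligned(64)));

/* Producer side: write the data, publish it, then demote the line. */
static inline void sketch_publish(struct sketch_slot *s, uint64_t value)
{
	s->payload = value;
	__atomic_store_n(&s->ready, 1, __ATOMIC_RELEASE);
	sketch_cldemote(s); /* hint only; correctness never depends on it */
}
```

Because the demotion is only a hint, the consumer still has to synchronise on the ready flag with an acquire load as usual.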
 >- is this instruction available on all x86 CPUs?
Yes, this instruction can be executed on all x86 CPUs: it takes effect on the latest CPUs and is treated as a NOP on older generations.
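
If an application wants to know whether the hint does anything on the running CPU, a check along these lines could be used. This is a sketch under assumptions, not part of the patch: sketch_has_cldemote is a hypothetical name and the code relies on the cpuid.h helper shipped with GCC and Clang. CLDEMOTE support is reported in CPUID leaf 7, subleaf 0, ECX bit 25.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Returns true when CPUID reports CLDEMOTE (leaf 7, subleaf 0, ECX bit 25). */
static inline bool sketch_has_cldemote(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid_count() returns 0 when the leaf is not supported. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return false;
	return ((ecx >> 25) & 1u) != 0;
#else
	return false; /* non-x86 builds use the no-op stub anyway */
#endif
}
```

The check is optional since the encoding is harmless where unsupported, but it can help when interpreting benchmark results.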
 >
 >
 >See comments:
 >
 >On Tue, Oct 13, 2020 at 6:47 PM Omkar Maslekar
 ><omkar.maslekar@intel.com> wrote:
 >> diff --git a/app/test/test_prefetch.c b/app/test/test_prefetch.c
 >> index 41f219a..5c58d0c 100644
 >> --- a/app/test/test_prefetch.c
 >> +++ b/app/test/test_prefetch.c
 >> @@ -26,7 +26,11 @@
 >>         rte_prefetch1(&a);
 >>         rte_prefetch2(&a);
 >>
 >> +/* test for marking a line as shared to test cldemote functionality */
 >
 >Non indented comment that gives no more info than the call itself.
 >Please remove.
I will fix it
 >
 >> +       rte_cldemote(&a);
 >> +
 >>         return 0;
 >>  }
 >>
 >> +
 >
 >Please remove this empty line.
 I will fix it
 >
 >>  REGISTER_TEST_COMMAND(prefetch_autotest, test_prefetch);
 >> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
 >> index b7881f2..8a1ed01 100644
 >> --- a/doc/guides/rel_notes/release_20_11.rst
 >> +++ b/doc/guides/rel_notes/release_20_11.rst
 >> @@ -171,6 +171,13 @@ New Features
 >>    * Extern objects and functions can be plugged into the pipeline.
 >>    * Transaction-oriented table updates.
 >>
 >> +* **Added new function rte_cldemote in rte_prefetch.h.**
 >> +
 >> +  Added a hardware hint CLDEMOTE, which is similar to prefetch in reverse.
 >
 >This should come at the top of the features list (but after "write combining
 >store" entry that got in first).
 >
 >Please add a mention that it only concerns x86.
I will modify the sequence in the release notes
 >
 >
 >> +  CLDEMOTE moves the cache line to the more remote cache, where it expects
 >> +  sharing to be efficient. Moving the cache line to a level more distant from
 >> +  the processor helps to accelerate core-to-core communication.
 >> +
 >>
 >>  Removed Items
 >>  -------------
 >> diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h
 >> b/lib/librte_eal/arm/include/rte_prefetch_32.h
 >> index e53420a..28b3d48 100644
 >> --- a/lib/librte_eal/arm/include/rte_prefetch_32.h
 >> +++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
 >> @@ -10,6 +10,7 @@
 >>  #endif
 >>
 >>  #include <rte_common.h>
 >> +#include <rte_compat.h>
 >
 >Move rte_compat.h inclusion from the arch headers to the
 >generic/rte_prefetch.h header only.
I got the build error below when I moved the rte_compat.h inclusion from the arch headers to the generic/rte_prefetch.h header only. I will remove it and send out a new patch v8.
In file included from ../lib/librte_eal/x86/include/rte_prefetch.h:14:0,
                 from ../lib/librte_table/rte_swx_table_em.c:10:
../lib/librte_eal/include/generic/rte_prefetch.h:67:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘static’
 static inline void rte_cldemote(const volatile void *p);
 >
 >
 >>  #include "generic/rte_prefetch.h"
 >>
 >>  static inline void rte_prefetch0(const volatile void *p)
 >> @@ -33,6 +34,12 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 >>         rte_prefetch0(p);
 >>  }
 >>
 >> +__rte_experimental
 >> +static inline void rte_cldemote(const volatile void *p)
 >> +{
 >> +       RTE_SET_USED(p);
 >> +}
 >> +
 >>  #ifdef __cplusplus
 >>  }
 >>  #endif
 >> diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h
 >> b/lib/librte_eal/arm/include/rte_prefetch_64.h
 >> index fc2b391..1c722eb 100644
 >> --- a/lib/librte_eal/arm/include/rte_prefetch_64.h
 >> +++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
 >> @@ -10,6 +10,7 @@
 >>  #endif
 >>
 >>  #include <rte_common.h>
 >> +#include <rte_compat.h>
 >>  #include "generic/rte_prefetch.h"
 >>
 >>  static inline void rte_prefetch0(const volatile void *p)
 >> @@ -32,6 +33,12 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 >>         asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));
 >>  }
 >>
 >> +__rte_experimental
 >> +static inline void rte_cldemote(const volatile void *p)
 >> +{
 >> +       RTE_SET_USED(p);
 >> +}
 >> +
 >>  #ifdef __cplusplus
 >>  }
 >>  #endif
 >> diff --git a/lib/librte_eal/include/generic/rte_prefetch.h b/lib/librte_eal/include/generic/rte_prefetch.h
 >> index 6e47bdf..ad9844c 100644
 >> --- a/lib/librte_eal/include/generic/rte_prefetch.h
 >> +++ b/lib/librte_eal/include/generic/rte_prefetch.h
 >> @@ -51,4 +51,19 @@
 >>   */
 >>  static inline void rte_prefetch_non_temporal(const volatile void *p);
 >>
 >> +/**
 >> + * Demote a cache line to a more distant level of cache from the processor.
 >> + *
 >> + * CLDEMOTE hints to hardware to move (demote) a cache line from the closest to
 >> + * the processor to a level more distant from the processor. It is a hint and
 >> + * not guarantee. rte_cldemote is intended to move the cache line to the more
 >
 >guaranteed*
I will fix this
 >
 >
 >> + * remote cache, where it expects sharing to be efficient and to indicate that a
 >> + * line may be accessed by a different core in the future.
 >> + *
 >> + * @param p
 >> + *   Address to demote
 >> + */
 >> +__rte_experimental
 >> +static inline void rte_cldemote(const volatile void *p);
 >> +
 >>  #endif /* _RTE_PREFETCH_H_ */
 >> diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h b/lib/librte_eal/ppc/include/rte_prefetch.h
 >> index 9ba07c8..b55cac4 100644
 >> --- a/lib/librte_eal/ppc/include/rte_prefetch.h
 >> +++ b/lib/librte_eal/ppc/include/rte_prefetch.h
 >> @@ -11,6 +11,7 @@
 >>  #endif
 >>
 >>  #include <rte_common.h>
 >> +#include <rte_compat.h>
 >>  #include "generic/rte_prefetch.h"
 >>
 >>  static inline void rte_prefetch0(const volatile void *p)
 >> @@ -34,6 +35,12 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 >>         rte_prefetch0(p);
 >>  }
 >>
 >> +__rte_experimental
 >> +static inline void rte_cldemote(const volatile void *p)
 >> +{
 >> +       RTE_SET_USED(p);
 >> +}
 >> +
 >>  #ifdef __cplusplus
 >>  }
 >>  #endif
 >> diff --git a/lib/librte_eal/x86/include/rte_prefetch.h b/lib/librte_eal/x86/include/rte_prefetch.h
 >> index 384c6b3..92ba05a 100644
 >> --- a/lib/librte_eal/x86/include/rte_prefetch.h
 >> +++ b/lib/librte_eal/x86/include/rte_prefetch.h
 >> @@ -10,6 +10,7 @@
 >>  #endif
 >>
 >>  #include <rte_common.h>
 >> +#include <rte_compat.h>
 >>  #include "generic/rte_prefetch.h"
 >>
 >>  static inline void rte_prefetch0(const volatile void *p)
 >> @@ -32,6 +33,16 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 >>         asm volatile ("prefetchnta %[p]" : : [p] "m" (*(const volatile char *)p));
 >>  }
 >>
 >> +/*
 >> + * we're using raw byte codes for now as only the newest compiler
 >
 >We use
I will fix this
 >
 >> + * versions support this instruction natively.
 >> + */
 >> +__rte_experimental
 >> +static inline void rte_cldemote(const volatile void *p)
 >> +{
 >> +       asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
 >> +}
 >> +
 >>  #ifdef __cplusplus
 >>  }
 >>  #endif
 >> --
 >> 1.8.3.1
 >>
 >
 >
 >--
 >David Marchand
* [dpdk-dev] [PATCH v8] eal: add cache-line demote support
  2020-09-10  1:16 [dpdk-dev] [PATCH] EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h Omkar Maslekar
                   ` (6 preceding siblings ...)
  2020-10-13  9:43 ` [dpdk-dev] [PATCH v7] " Omkar Maslekar
@ 2020-10-15 15:18 ` Omkar Maslekar
  2020-10-15 15:18   ` Omkar Maslekar
  2020-10-15 23:20 ` [dpdk-dev] [PATCH v9] " Omkar Maslekar
  8 siblings, 1 reply; 38+ messages in thread
From: Omkar Maslekar @ 2020-10-15 15:18 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, ciara.loftus, omkar.maslekar, drc, jerinj,
	ruifeng.wang, honnappa.nagarahalli
We are including this in rte_prefetch.h since it is the most closely
related code location. rte_cldemote is similar to a prefetch hint, in reverse.
Omkar Maslekar (1):
  eal: add cache-line demote support
 app/test/test_prefetch.c                      |  2 ++
 doc/guides/rel_notes/release_20_11.rst        |  8 ++++++++
 lib/librte_eal/arm/include/rte_prefetch_32.h  |  5 +++++
 lib/librte_eal/arm/include/rte_prefetch_64.h  |  5 +++++
 lib/librte_eal/include/generic/rte_prefetch.h | 18 ++++++++++++++++++
 lib/librte_eal/ppc/include/rte_prefetch.h     |  5 +++++
 lib/librte_eal/x86/include/rte_prefetch.h     |  9 +++++++++
 7 files changed, 52 insertions(+)
-- 
1.8.3.1
* [dpdk-dev] [PATCH v8] eal: add cache-line demote support
  2020-10-15 15:18 ` [dpdk-dev] [PATCH v8] " Omkar Maslekar
@ 2020-10-15 15:18   ` Omkar Maslekar
  0 siblings, 0 replies; 38+ messages in thread
From: Omkar Maslekar @ 2020-10-15 15:18 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, ciara.loftus, omkar.maslekar, drc, jerinj,
	ruifeng.wang, honnappa.nagarahalli
rte_cldemote is similar to a prefetch hint, but in reverse. cldemote(addr)
lets software hint to hardware that a cache line is likely to be shared.
This is useful in core-to-core communication, where a cache line is likely
to be shared. The ARM and PPC implementations are provided as no-ops and
can be filled in if equivalent instructions exist on those
architectures.
Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
v8: removed unnecessary comment in test_prefetch.h
    removed header file rte_compat.h from specific arch
    rearranged sequence in the release notes
    fixed coding style in test_prefetch.h and grammar issue in documentation
    added tag Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
v7: fixed experimental tag
v6: marked rte_cldemote as experimental
    added rte_cldemote call in existing app/test_prefetch.c
v5: documentation updated
    fixed formatting issue in release notes
    added Acked-by: Bruce Richardson <bruce.richardson@intel.com>
v4: updated bold text for title and fixed margin in release notes
v3: fixed warning regarding whitespace
v2: documentation updated
---
---
 app/test/test_prefetch.c                      |  2 ++
 doc/guides/rel_notes/release_20_11.rst        |  8 ++++++++
 lib/librte_eal/arm/include/rte_prefetch_32.h  |  5 +++++
 lib/librte_eal/arm/include/rte_prefetch_64.h  |  5 +++++
 lib/librte_eal/include/generic/rte_prefetch.h | 18 ++++++++++++++++++
 lib/librte_eal/ppc/include/rte_prefetch.h     |  5 +++++
 lib/librte_eal/x86/include/rte_prefetch.h     |  9 +++++++++
 7 files changed, 52 insertions(+)
diff --git a/app/test/test_prefetch.c b/app/test/test_prefetch.c
index 32e08f8..5489885 100644
--- a/app/test/test_prefetch.c
+++ b/app/test/test_prefetch.c
@@ -30,6 +30,8 @@
 	rte_prefetch1_write(&a);
 	rte_prefetch2_write(&a);
 
+	rte_cldemote(&a);
+
 	return 0;
 }
 
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 708ebb0..bff213e 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -68,6 +68,14 @@ New Features
   which allow the programmer to prefetch a cache line and also indicate
   the intention to write.
 
+* **Added new function rte_cldemote in rte_prefetch.h.**
+
+  Added a hardware hint CLDEMOTE, which is similar to prefetch in reverse.
+  CLDEMOTE moves the cache line to the more remote cache, where it expects
+  sharing to be efficient. Moving the cache line to a level more distant from
+  the processor helps to accelerate core-to-core communication. This is an
+  x86-specific implementation.
+
 * **Updated CRC modules of the net library.**
 
   * Added runtime selection of the optimal architecture-specific CRC path.
diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h b/lib/librte_eal/arm/include/rte_prefetch_32.h
index e53420a..ad91edd 100644
--- a/lib/librte_eal/arm/include/rte_prefetch_32.h
+++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
@@ -33,6 +33,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	rte_prefetch0(p);
 }
 
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h b/lib/librte_eal/arm/include/rte_prefetch_64.h
index fc2b391..35d278a 100644
--- a/lib/librte_eal/arm/include/rte_prefetch_64.h
+++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
@@ -32,6 +32,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));
 }
 
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/include/generic/rte_prefetch.h b/lib/librte_eal/include/generic/rte_prefetch.h
index df9764e..f9fab5e 100644
--- a/lib/librte_eal/include/generic/rte_prefetch.h
+++ b/lib/librte_eal/include/generic/rte_prefetch.h
@@ -116,4 +116,22 @@
 	__builtin_prefetch(p, 1, 1);
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Demote a cache line to a more distant level of cache from the processor.
+ * CLDEMOTE hints to hardware to move (demote) a cache line from the closest to
+ * the processor to a level more distant from the processor. It is a hint and
+ * not guaranteed. rte_cldemote is intended to move the cache line to the more
+ * remote cache, where it expects sharing to be efficient and to indicate that
+ * a line may be accessed by a different core in the future.
+ *
+ * @param p
+ *   Address to demote
+ */
+__rte_experimental
+static inline void
+rte_cldemote(const volatile void *p);
+
 #endif /* _RTE_PREFETCH_H_ */
diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h b/lib/librte_eal/ppc/include/rte_prefetch.h
index 9ba07c8..3fe9655 100644
--- a/lib/librte_eal/ppc/include/rte_prefetch.h
+++ b/lib/librte_eal/ppc/include/rte_prefetch.h
@@ -34,6 +34,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	rte_prefetch0(p);
 }
 
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/x86/include/rte_prefetch.h b/lib/librte_eal/x86/include/rte_prefetch.h
index 384c6b3..3a9b488 100644
--- a/lib/librte_eal/x86/include/rte_prefetch.h
+++ b/lib/librte_eal/x86/include/rte_prefetch.h
@@ -32,6 +32,15 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	asm volatile ("prefetchnta %[p]" : : [p] "m" (*(const volatile char *)p));
 }
 
+/*
+ * we use raw byte codes for now as only the newest compiler
+ * versions support this instruction natively.
+ */
+static inline void rte_cldemote(const volatile void *p)
+{
+	asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.8.3.1
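The x86 hunk above emits CLDEMOTE as raw bytes because, per its comment, only the newest compilers know the mnemonic. A minimal sketch of how a wrapper could prefer the compiler intrinsic when it is available and otherwise fall back to the same raw encoding is given below. It assumes GCC or Clang; `demo_cldemote`, `demo_cldemote_range` and `DEMO_CACHE_LINE` are hypothetical names that are not part of the patch, and the 64-byte line size is an assumption. As far as I know, Intel documents this encoding as behaving like a NOP on processors without CLDEMOTE support, which is what makes the unconditional raw-byte form reasonable.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__CLDEMOTE__)
#include <immintrin.h>	/* _cldemote(), available when built with -mcldemote */
#endif

#define DEMO_CACHE_LINE 64	/* assumed line size, illustrative only */

/* Hypothetical wrapper; a sketch, not the DPDK implementation. */
static inline void demo_cldemote(const volatile void *p)
{
#if defined(__CLDEMOTE__)
	/* The compiler knows the instruction: use the intrinsic. */
	_cldemote((void *)(uintptr_t)p);
#elif defined(__x86_64__) || defined(__i386__)
	/* Same raw encoding as the patch; "S" places the address in (R|E)SI. */
	__asm__ volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
#else
	(void)p;	/* no-op, like the ARM and PPC versions in the patch */
#endif
}

/* Hint every cache line overlapping [buf, buf + len); returns lines hinted. */
static inline size_t demo_cldemote_range(const volatile void *buf, size_t len)
{
	uintptr_t addr, end;
	size_t lines = 0;

	if (len == 0)
		return 0;

	/* Round the start down to a line boundary so partial lines count. */
	addr = (uintptr_t)buf & ~(uintptr_t)(DEMO_CACHE_LINE - 1);
	end = (uintptr_t)buf + len;

	for (; addr < end; addr += DEMO_CACHE_LINE) {
		demo_cldemote((const volatile void *)addr);
		lines++;
	}
	return lines;
}
```

Since the instruction is only a hint, the range helper returns the number of lines it touched purely so that callers and tests have something observable; it says nothing about whether the hardware acted on the hint.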
* Re: [dpdk-dev] [PATCH v7] eal: add cache-line demote support
  2020-10-15 14:41       ` Maslekar, Omkar
@ 2020-10-15 20:32         ` David Marchand
  0 siblings, 0 replies; 38+ messages in thread
From: David Marchand @ 2020-10-15 20:32 UTC (permalink / raw)
  To: Maslekar, Omkar
  Cc: dev, Richardson, Bruce, Loftus, Ciara, David Christensen,
	Jerin Jacob Kollanukkaran, Ruifeng Wang (Arm Technology China),
	Honnappa Nagarahalli
On Thu, Oct 15, 2020 at 4:41 PM Maslekar, Omkar
<omkar.maslekar@intel.com> wrote:
>  >> diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h
>  >> b/lib/librte_eal/arm/include/rte_prefetch_32.h
>  >> index e53420a..28b3d48 100644
>  >> --- a/lib/librte_eal/arm/include/rte_prefetch_32.h
>  >> +++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
>  >> @@ -10,6 +10,7 @@
>  >>  #endif
>  >>
>  >>  #include <rte_common.h>
>  >> +#include <rte_compat.h>
>  >
>  >Move rte_compat.h inclusion from the arch headers to the
>  >generic/rte_prefetch.h header only.
> I get the build error below if I move the rte_compat.h inclusion from the arch headers to the generic/rte_prefetch.h header only. I will remove it and send out a new patch, v8.
> In file included from ../lib/librte_eal/x86/include/rte_prefetch.h:14:0,
>                  from ../lib/librte_table/rte_swx_table_em.c:10:
> ../lib/librte_eal/include/generic/rte_prefetch.h:67:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘static’
Please rebase on main as I took Harry patch which was ready.
Thanks.
-- 
David Marchand
* [dpdk-dev] [PATCH v9] eal: add cache-line demote support
  2020-09-10  1:16 [dpdk-dev] [PATCH] EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h Omkar Maslekar
                   ` (7 preceding siblings ...)
  2020-10-15 15:18 ` [dpdk-dev] [PATCH v8] " Omkar Maslekar
@ 2020-10-15 23:20 ` Omkar Maslekar
  2020-10-15 23:20   ` Omkar Maslekar
  8 siblings, 1 reply; 38+ messages in thread
From: Omkar Maslekar @ 2020-10-15 23:20 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, ciara.loftus, omkar.maslekar, drc, jerinj,
	ruifeng.wang, honnappa.nagarahalli
We are including this in rte_prefetch.h since it is the most closely
related code location. rte_cldemote is similar to a prefetch hint, in reverse.
Omkar Maslekar (1):
  eal: add cache-line demote support
 app/test/test_prefetch.c                      |  2 ++
 doc/guides/rel_notes/release_20_11.rst        |  8 ++++++++
 lib/librte_eal/arm/include/rte_prefetch_32.h  |  7 +++++++
 lib/librte_eal/arm/include/rte_prefetch_64.h  |  7 +++++++
 lib/librte_eal/include/generic/rte_prefetch.h | 18 ++++++++++++++++++
 lib/librte_eal/ppc/include/rte_prefetch.h     |  7 +++++++
 lib/librte_eal/x86/include/rte_prefetch.h     | 11 +++++++++++
 7 files changed, 60 insertions(+)
-- 
1.8.3.1
* [dpdk-dev] [PATCH v9] eal: add cache-line demote support
  2020-10-15 23:20 ` [dpdk-dev] [PATCH v9] " Omkar Maslekar
@ 2020-10-15 23:20   ` Omkar Maslekar
  2020-10-16 12:14     ` David Marchand
  0 siblings, 1 reply; 38+ messages in thread
From: Omkar Maslekar @ 2020-10-15 23:20 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, ciara.loftus, omkar.maslekar, drc, jerinj,
	ruifeng.wang, honnappa.nagarahalli
rte_cldemote is similar to a prefetch hint, but in reverse. cldemote(addr)
lets software hint to hardware that a cache line is likely to be shared.
This is useful in core-to-core communication, where a cache line is likely
to be shared. The ARM and PPC implementations are provided as no-ops and
can be filled in if equivalent instructions exist on those
architectures.
Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
v9: added experimental tag in arch specific files
v8: removed unnecessary comment in test_prefetch.h
    removed header file rte_compat.h from specific arch
    rearranged sequence in the release notes
    fixed coding style in test_prefetch.h and grammar issue in documentation
    added tag Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
v7: fixed experimental tag
v6: marked rte_cldemote as experimental
    added rte_cldemote call in existing app/test_prefetch.c
v5: documentation updated
    fixed formatting issue in release notes
    added Acked-by: Bruce Richardson <bruce.richardson@intel.com>
v4: updated bold text for title and fixed margin in release notes
v3: fixed warning regarding whitespace
v2: documentation updated
---
---
 app/test/test_prefetch.c                      |  2 ++
 doc/guides/rel_notes/release_20_11.rst        |  8 ++++++++
 lib/librte_eal/arm/include/rte_prefetch_32.h  |  7 +++++++
 lib/librte_eal/arm/include/rte_prefetch_64.h  |  7 +++++++
 lib/librte_eal/include/generic/rte_prefetch.h | 18 ++++++++++++++++++
 lib/librte_eal/ppc/include/rte_prefetch.h     |  7 +++++++
 lib/librte_eal/x86/include/rte_prefetch.h     | 11 +++++++++++
 7 files changed, 60 insertions(+)
diff --git a/app/test/test_prefetch.c b/app/test/test_prefetch.c
index 32e08f8..5489885 100644
--- a/app/test/test_prefetch.c
+++ b/app/test/test_prefetch.c
@@ -30,6 +30,8 @@
 	rte_prefetch1_write(&a);
 	rte_prefetch2_write(&a);
 
+	rte_cldemote(&a);
+
 	return 0;
 }
 
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index cda5b2f..7095727 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -68,6 +68,14 @@ New Features
   which allow the programmer to prefetch a cache line and also indicate
   the intention to write.
 
+* **Added new function rte_cldemote in rte_prefetch.h.**
+
+  Added a hardware hint CLDEMOTE, which is similar to prefetch in reverse.
+  CLDEMOTE moves the cache line to the more remote cache, where it expects
+  sharing to be efficient. Moving the cache line to a level more distant from
+  the processor helps to accelerate core-to-core communication. This is an
+  x86-specific implementation.
+
 * **Updated CRC modules of the net library.**
 
   * Added runtime selection of the optimal architecture-specific CRC path.
diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h b/lib/librte_eal/arm/include/rte_prefetch_32.h
index e53420a..303caaa 100644
--- a/lib/librte_eal/arm/include/rte_prefetch_32.h
+++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
@@ -33,6 +33,13 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	rte_prefetch0(p);
 }
 
+__rte_experimental
+static inline void
+rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h b/lib/librte_eal/arm/include/rte_prefetch_64.h
index fc2b391..e28b66f 100644
--- a/lib/librte_eal/arm/include/rte_prefetch_64.h
+++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
@@ -32,6 +32,13 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));
 }
 
+__rte_experimental
+static inline void
+rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/include/generic/rte_prefetch.h b/lib/librte_eal/include/generic/rte_prefetch.h
index df9764e..f9fab5e 100644
--- a/lib/librte_eal/include/generic/rte_prefetch.h
+++ b/lib/librte_eal/include/generic/rte_prefetch.h
@@ -116,4 +116,22 @@
 	__builtin_prefetch(p, 1, 1);
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Demote a cache line to a more distant level of cache from the processor.
+ * CLDEMOTE hints to hardware to move (demote) a cache line from the closest to
+ * the processor to a level more distant from the processor. It is a hint and
+ * not guaranteed. rte_cldemote is intended to move the cache line to the more
+ * remote cache, where it expects sharing to be efficient and to indicate that
+ * a line may be accessed by a different core in the future.
+ *
+ * @param p
+ *   Address to demote
+ */
+__rte_experimental
+static inline void
+rte_cldemote(const volatile void *p);
+
 #endif /* _RTE_PREFETCH_H_ */
diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h b/lib/librte_eal/ppc/include/rte_prefetch.h
index 9ba07c8..6df8087 100644
--- a/lib/librte_eal/ppc/include/rte_prefetch.h
+++ b/lib/librte_eal/ppc/include/rte_prefetch.h
@@ -34,6 +34,13 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	rte_prefetch0(p);
 }
 
+__rte_experimental
+static inline void
+rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/x86/include/rte_prefetch.h b/lib/librte_eal/x86/include/rte_prefetch.h
index 384c6b3..05d49fc 100644
--- a/lib/librte_eal/x86/include/rte_prefetch.h
+++ b/lib/librte_eal/x86/include/rte_prefetch.h
@@ -32,6 +32,17 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
 	asm volatile ("prefetchnta %[p]" : : [p] "m" (*(const volatile char *)p));
 }
 
+/*
+ * we use raw byte codes for now as only the newest compiler
+ * versions support this instruction natively.
+ */
+__rte_experimental
+static inline void
+rte_cldemote(const volatile void *p)
+{
+	asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.8.3.1
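The commit message describes the intended use as core-to-core communication, where the writer expects another core to read the line next. A minimal sketch of that pattern follows, assuming a one-slot mailbox that fits in a single 64-byte cache line. All names here (`demo_mailbox`, `demo_publish`, `demo_consume`, `demo_demote_hint`) are hypothetical; in a DPDK application the hint would presumably be `rte_cldemote()`, but a no-op stand-in is used so the sketch builds without DPDK.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CACHE_LINE 64	/* assumed line size, illustrative only */

/*
 * Hypothetical stand-in for rte_cldemote(); a no-op like the ARM and PPC
 * versions in the patch, so the sketch builds on any architecture.
 */
static inline void demo_demote_hint(const volatile void *p)
{
	(void)p;
}

/* Hypothetical one-slot mailbox occupying a single cache line. */
struct demo_mailbox {
	_Alignas(DEMO_CACHE_LINE) uint64_t payload;
	atomic_bool ready;
};

/* Producer side: write the payload, publish it, then hint the demotion. */
static inline void demo_publish(struct demo_mailbox *m, uint64_t value)
{
	m->payload = value;
	atomic_store_explicit(&m->ready, true, memory_order_release);
	/* Hint only: correctness must never depend on it taking effect. */
	demo_demote_hint(m);
}

/* Consumer side: returns true and fills *out if a value was pending. */
static inline bool demo_consume(struct demo_mailbox *m, uint64_t *out)
{
	if (!atomic_load_explicit(&m->ready, memory_order_acquire))
		return false;
	*out = m->payload;
	atomic_store_explicit(&m->ready, false, memory_order_release);
	return true;
}
```

The ordering between producer and consumer comes entirely from the release/acquire pair on `ready`. The demote call is placed after the release store because it is purely a performance hint, so dropping it, or replacing it with a no-op as on ARM and PPC, leaves the behaviour unchanged.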
* Re: [dpdk-dev] [PATCH v9] eal: add cache-line demote support
  2020-10-15 23:20   ` Omkar Maslekar
@ 2020-10-16 12:14     ` David Marchand
  0 siblings, 0 replies; 38+ messages in thread
From: David Marchand @ 2020-10-16 12:14 UTC (permalink / raw)
  To: Omkar Maslekar
  Cc: dev, Bruce Richardson, Ciara Loftus, David Christensen,
	Jerin Jacob Kollanukkaran, Ruifeng Wang (Arm Technology China),
	Honnappa Nagarahalli
On Fri, Oct 16, 2020 at 8:24 AM Omkar Maslekar <omkar.maslekar@intel.com> wrote:
>
> rte_cldemote is similar to a prefetch hint, but in reverse. cldemote(addr)
> lets software hint to hardware that a cache line is likely to be shared.
> This is useful in core-to-core communication, where a cache line is likely
> to be shared. The ARM and PPC implementations are provided as no-ops and
> can be filled in if equivalent instructions exist on those
> architectures.
>
> Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> Acked-by: David Christensen <drc@linux.vnet.ibm.com>
> Acked-by: Jerin Jacob <jerinj@marvell.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Applied, thanks Omkar.
-- 
David Marchand
Thread overview: 38+ messages
2020-09-10  1:16 [dpdk-dev] [PATCH] EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h Omkar Maslekar
2020-09-10  1:16 ` Omkar Maslekar
2020-09-10  8:55   ` Bruce Richardson
2020-09-10 23:30     ` Maslekar, Omkar
2020-09-10 22:04   ` David Christensen
2020-09-11 16:51 ` [dpdk-dev] [PATCH v2] " Omkar Maslekar
2020-09-11 16:51   ` Omkar Maslekar
2020-09-11 21:22 ` [dpdk-dev] [PATCH v3] " Omkar Maslekar
2020-09-11 21:22   ` Omkar Maslekar
2020-09-22  1:59 ` [dpdk-dev] [PATCH v4] eal: add cache-line demote support Omkar Maslekar
2020-09-22  1:59   ` Omkar Maslekar
2020-09-22  8:28     ` Bruce Richardson
2020-09-22 21:53       ` Maslekar, Omkar
2020-10-01  0:28 ` [dpdk-dev] [PATCH v5] " Omkar Maslekar
2020-10-01  0:28   ` Omkar Maslekar
2020-10-08  7:09     ` David Marchand
2020-10-08  9:02       ` Bruce Richardson
2020-10-12  9:41         ` David Marchand
2020-10-08 13:12     ` Jerin Jacob
2020-10-12 10:19 ` [dpdk-dev] [PATCH v6] " Omkar Maslekar
2020-10-12 10:19   ` Omkar Maslekar
2020-10-12 19:31     ` David Christensen
2020-10-13  2:59     ` Ruifeng Wang
2020-10-13 16:20       ` Bruce Richardson
2020-10-14  1:55         ` Ruifeng Wang
2020-10-14  7:14         ` David Marchand
2020-10-14  7:51           ` Ruifeng Wang
2020-10-13  9:43 ` [dpdk-dev] [PATCH v7] " Omkar Maslekar
2020-10-13  9:43   ` Omkar Maslekar
2020-10-14  7:24     ` Ruifeng Wang
2020-10-15  8:01     ` David Marchand
2020-10-15 14:41       ` Maslekar, Omkar
2020-10-15 20:32         ` David Marchand
2020-10-15 15:18 ` [dpdk-dev] [PATCH v8] " Omkar Maslekar
2020-10-15 15:18   ` Omkar Maslekar
2020-10-15 23:20 ` [dpdk-dev] [PATCH v9] " Omkar Maslekar
2020-10-15 23:20   ` Omkar Maslekar
2020-10-16 12:14     ` David Marchand