From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 4B6A7A04E1; Tue, 22 Sep 2020 10:28:10 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 191781D417; Tue, 22 Sep 2020 10:28:10 +0200 (CEST) Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by dpdk.org (Postfix) with ESMTP id 0A5771C2DD for ; Tue, 22 Sep 2020 10:28:07 +0200 (CEST) IronPort-SDR: ADSJfuqFj5K5OL60QtUKTQD4fhxOGqgL7FZUomJnpyxcVC/m+nLWGbQ9mf1Z67tR1+CvKTzmyG FFlSkmcZcKDg== X-IronPort-AV: E=McAfee;i="6000,8403,9751"; a="148312379" X-IronPort-AV: E=Sophos;i="5.77,290,1596524400"; d="scan'208";a="148312379" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2020 01:28:07 -0700 IronPort-SDR: dBbEcTvjUMBXYz0Ld0fitLMbz+dHxnMteS/wX884aH8Aoe+5lUT3OXSQsOIW+PKKCwkzG3ueew /ehOIRkA9kKA== X-IronPort-AV: E=Sophos;i="5.77,290,1596524400"; d="scan'208";a="485863611" Received: from bricha3-mobl.ger.corp.intel.com ([10.249.150.55]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-SHA; 22 Sep 2020 01:28:05 -0700 Date: Tue, 22 Sep 2020 09:28:01 +0100 From: Bruce Richardson To: Omkar Maslekar Cc: dev@dpdk.org, ciara.loftus@intel.com Message-ID: <20200922082801.GA1604@bricha3-MOBL.ger.corp.intel.com> References: <1599700614-22809-1-git-send-email-omkar.maslekar@intel.com> <1600739967-6499-1-git-send-email-omkar.maslekar@intel.com> <1600739967-6499-2-git-send-email-omkar.maslekar@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1600739967-6499-2-git-send-email-omkar.maslekar@intel.com> Subject: Re: [dpdk-dev] [PATCH v4] eal: add cache-line demote support X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Mon, Sep 21, 2020 at 06:59:27PM -0700, Omkar Maslekar wrote: > rte_cldemote is similar to a prefetch hint - in reverse. cldemote(addr) > enables software to hint to hardware that line is likely to be shared. > Useful in core-to-core communications where cache-line is likely to be > shared. ARM and PPC implementation is provided with NOP and can be added > if any equivalent instructions could be used for implementation on those > architectures. > > Signed-off-by: Omkar Maslekar > Few minor suggestions below. With those fixed, feel free to add my ack to future versions of this patch. Acked-by: Bruce Richardson > --- > v4: updated bold text for title and fixed margin in release notes > * > v3: fixed warning regarding whitespace > * > v2: documentation updated > --- > --- > doc/guides/rel_notes/release_20_11.rst | 6 ++++++ > lib/librte_eal/arm/include/rte_prefetch_32.h | 5 +++++ > lib/librte_eal/arm/include/rte_prefetch_64.h | 5 +++++ > lib/librte_eal/include/generic/rte_prefetch.h | 13 +++++++++++++ > lib/librte_eal/ppc/include/rte_prefetch.h | 5 +++++ > lib/librte_eal/x86/include/rte_prefetch.h | 9 +++++++++ > 6 files changed, 43 insertions(+) > > diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst > index df227a1..b844b96 100644 > --- a/doc/guides/rel_notes/release_20_11.rst > +++ b/doc/guides/rel_notes/release_20_11.rst > @@ -55,6 +55,12 @@ New Features > Also, make sure to start the actual text at the margin. > ======================================================= > > +* **Added new function rte_cldemote in rte_prefetch.h.** > + > + Added a hardware hint CLDEMOTE, which is similar to prefetch in reverse. > + CLDEMOTE moves the cache line to the more remote cache, where it expects > + sharing to be efficient. Moving the cache line to a level more distant from > + the processor helps to accelerate core-to-core communication. > I think you need two blank lines between sections here, not just one. > Removed Items > ------------- > diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h b/lib/librte_eal/arm/include/rte_prefetch_32.h > index e53420a..ad91edd 100644 > --- a/lib/librte_eal/arm/include/rte_prefetch_32.h > +++ b/lib/librte_eal/arm/include/rte_prefetch_32.h > @@ -33,6 +33,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p) > rte_prefetch0(p); > } > > +static inline void rte_cldemote(const volatile void *p) > +{ > + RTE_SET_USED(p); > +} > + > #ifdef __cplusplus > } > #endif > diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h b/lib/librte_eal/arm/include/rte_prefetch_64.h > index fc2b391..35d278a 100644 > --- a/lib/librte_eal/arm/include/rte_prefetch_64.h > +++ b/lib/librte_eal/arm/include/rte_prefetch_64.h > @@ -32,6 +32,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p) > asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p)); > } > > +static inline void rte_cldemote(const volatile void *p) > +{ > + RTE_SET_USED(p); > +} > + > #ifdef __cplusplus > } > #endif > diff --git a/lib/librte_eal/include/generic/rte_prefetch.h b/lib/librte_eal/include/generic/rte_prefetch.h > index 6e47bdf..8742412 100644 > --- a/lib/librte_eal/include/generic/rte_prefetch.h > +++ b/lib/librte_eal/include/generic/rte_prefetch.h > @@ -51,4 +51,17 @@ > */ > static inline void rte_prefetch_non_temporal(const volatile void *p); > > +/** > + * Demote a cache line to a more distant level of cache from the processor. > + * > + * CLDEMOTE hints to hardware to move (demote) a cache line from the closest to > + * the processor to a level more distant from the processor. It is a hint and > + * not guarantee. rte_cldemote is intended to speed up things at the producer, > + * in the producer-consumer case. > + * Two thoughts here: 1. Is it not more the consumer who benefits more since they are the ones receiving the demoted value, while the producer pays a higher cost since they have to demote the value on send? 2. Rather than talking about producer consumer case specifically, I think it would be good to replace the last sentence with what you have in the cover letter about it being for sharing, and to indicate that a line may be accessed by a different core in the future. > + * @param p > + * Address to demote > + */ > +static inline void rte_cldemote(const volatile void *p); > + > #endif /* _RTE_PREFETCH_H_ */ > diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h b/lib/librte_eal/ppc/include/rte_prefetch.h > index 9ba07c8..3fe9655 100644 > --- a/lib/librte_eal/ppc/include/rte_prefetch.h > +++ b/lib/librte_eal/ppc/include/rte_prefetch.h > @@ -34,6 +34,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p) > rte_prefetch0(p); > } > > +static inline void rte_cldemote(const volatile void *p) > +{ > + RTE_SET_USED(p); > +} > + > #ifdef __cplusplus > } > #endif > diff --git a/lib/librte_eal/x86/include/rte_prefetch.h b/lib/librte_eal/x86/include/rte_prefetch.h > index 384c6b3..029d06e 100644 > --- a/lib/librte_eal/x86/include/rte_prefetch.h > +++ b/lib/librte_eal/x86/include/rte_prefetch.h > @@ -32,6 +32,15 @@ static inline void rte_prefetch_non_temporal(const volatile void *p) > asm volatile ("prefetchnta %[p]" : : [p] "m" (*(const volatile char *)p)); > } > > +/* > + * we're using raw byte codes for now as only the newest compiler > + * versions support this instruction natively. > + */ > +static inline void rte_cldemote(const volatile void *p) > +{ > + asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p)); > +} > + > #ifdef __cplusplus > } > #endif > -- > 1.8.3.1 >