From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 08ACFA0350; Wed, 1 Jul 2020 15:14:55 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id AA4A81C295; Wed, 1 Jul 2020 15:14:54 +0200 (CEST) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id 8DED61C28F for ; Wed, 1 Jul 2020 15:14:52 +0200 (CEST) IronPort-SDR: mNYp44Kjz9x4qoolaZOtD0Y13YqxQJdDoVXe4ljWJWbdxofYwiiRh1arwlC3RiLhrQFSM4lUgL BmjNJDBIQhVQ== X-IronPort-AV: E=McAfee;i="6000,8403,9668"; a="134838293" X-IronPort-AV: E=Sophos;i="5.75,300,1589266800"; d="scan'208";a="134838293" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Jul 2020 06:14:43 -0700 IronPort-SDR: BP5FtHVyywamkHjund9YbuHx5QGuUuICW9uJ7vdgI+qNk1KXpLiVw7r10sq9KD2tePKmDhVIuw KBd2dicXTYsg== X-IronPort-AV: E=Sophos;i="5.75,300,1589266800"; d="scan'208";a="425590426" Received: from bricha3-mobl.ger.corp.intel.com ([10.251.80.77]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-SHA; 01 Jul 2020 06:14:41 -0700 Date: Wed, 1 Jul 2020 14:14:38 +0100 From: Bruce Richardson To: Radu Nicolau Cc: dev@dpdk.org, beilei.xing@intel.com, jia.guo@intel.com, konstantin.ananyev@intel.com, jerinjacobk@gmail.com Message-ID: <20200701131438.GC595@bricha3-MOBL.ger.corp.intel.com> References: <1591870283-7776-1-git-send-email-radu.nicolau@intel.com> <1592568380-27120-1-git-send-email-radu.nicolau@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1592568380-27120-1-git-send-email-radu.nicolau@intel.com> Subject: Re: [dpdk-dev] [PATCH v2 1/2] eal: add WC store functions X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Fri, Jun 19, 2020 at 01:06:19PM +0100, Radu Nicolau wrote: > Add rte_write32_wc and rte_write32_wc_relaxed functions > that implement 32bit stores using write combining memory protocol. > Provided generic stubs and x86 implementation. > > Signed-off-by: Radu Nicolau > --- > v2 rework new eal io functions > I think you need to add into the cover letter, as suggest on V1 by Konstantin, a bit about using WC memory for better performance. > lib/librte_eal/include/generic/rte_io.h | 47 ++++++++++++++++++++++++++ > lib/librte_eal/x86/include/rte_io.h | 59 +++++++++++++++++++++++++++++++++ > 2 files changed, 106 insertions(+) > > diff --git a/lib/librte_eal/include/generic/rte_io.h b/lib/librte_eal/include/generic/rte_io.h > index da457f7..7391782 100644 > --- a/lib/librte_eal/include/generic/rte_io.h > +++ b/lib/librte_eal/include/generic/rte_io.h > @@ -229,6 +229,39 @@ rte_write32(uint32_t value, volatile void *addr); > static inline void > rte_write64(uint64_t value, volatile void *addr); > > +/** > + * Write a 32-bit value to I/O device memory address addr using write > + * combining memory write protocol. Depending on the platform write combining > + * may not be available and/or may be treated as a hint and the behavior may > + * fallback to a regular store. > + * > + * @param value > + * Value to write > + * @param addr > + * I/O memory address to write the value to > + */ > +static inline void > +rte_write32_wc(uint32_t value, volatile void *addr); > + > +/** > + * Write a 32-bit value to I/O device memory address addr using write > + * combining memory write protocol. Depending on the platform write combining > + * may not be available and/or may be treated as a hint and the behavior may > + * fallback to a regular store. > + * > + * The relaxed version does not have additional I/O memory barrier, useful in > + * accessing the device registers of integrated controllers which implicitly > + * strongly ordered with respect to memory access. > + * > + * @param value > + * Value to write > + * @param addr > + * I/O memory address to write the value to > + */ > +static inline void > +rte_write32_wc_relaxed(uint32_t value, volatile void *addr); > + > + > #endif /* __DOXYGEN__ */ > > #ifndef RTE_OVERRIDE_IO_H > @@ -345,6 +378,20 @@ rte_write64(uint64_t value, volatile void *addr) > rte_write64_relaxed(value, addr); > } > > +#ifndef RTE_NATIVE_WRITE32_WC > +rte_write32_wc(uint32_t value, volatile void *addr) > +{ > + rte_write32(value, addr); > +} > + > +static __rte_always_inline void > +rte_write32_wc_relaxed(uint32_t value, volatile void *addr) > +{ > + rte_write32_relaxed(value, addr); > +} > +#endif /* RTE_NATIVE_WRITE32_WC */ > + > + > #endif /* RTE_OVERRIDE_IO_H */ > I like this approach, since it saves duplicating the non-overridden functions. Nice! > #endif /* _RTE_IO_H_ */ > diff --git a/lib/librte_eal/x86/include/rte_io.h b/lib/librte_eal/x86/include/rte_io.h > index 2db71b1..5efbf0d 100644 > --- a/lib/librte_eal/x86/include/rte_io.h > +++ b/lib/librte_eal/x86/include/rte_io.h > @@ -9,8 +9,67 @@ > extern "C" { > #endif > > +#include "rte_cpuflags.h" > + > +#define RTE_NATIVE_WRITE32_WC > #include "generic/rte_io.h" > > +/** > + * @internal > + * MOVDIRI wrapper. > + */ > +static __rte_always_inline void > +_rte_x86_movdiri(uint32_t value, volatile void *addr) > +{ > + asm volatile( > + /* MOVDIRI */ > + ".byte 0x40, 0x0f, 0x38, 0xf9, 0x02" > + : > + : "a" (value), "d" (addr)); > +} > + > +static __rte_always_inline void > +rte_write32_wc(uint32_t value, volatile void *addr) > +{ > + static int _x86_movdiri_flag = -1; > + if (_x86_movdiri_flag == 1) { > + rte_wmb(); > + _rte_x86_movdiri(value, addr); > + } else if (_x86_movdiri_flag == 0) { > + rte_write32(value, addr); > + } else { > + _x86_movdiri_flag = > + (rte_cpu_get_flag_enabled(RTE_CPUFLAG_MOVDIRI) > 0); > + if (_x86_movdiri_flag == 1) { > + rte_wmb(); > + _rte_x86_movdiri(value, addr); > + } else { > + rte_write32(value, addr); > + } > + } > +} > + > +static __rte_always_inline void > +rte_write32_wc_relaxed(uint32_t value, volatile void *addr) > +{ > + static int _x86_movdiri_flag = -1; > + if (_x86_movdiri_flag == 1) { > + _rte_x86_movdiri(value, addr); > + } else if (_x86_movdiri_flag == 0) { > + rte_write32_relaxed(value, addr); > + } else { > + _x86_movdiri_flag = > + (rte_cpu_get_flag_enabled(RTE_CPUFLAG_MOVDIRI) > 0); > + if (_x86_movdiri_flag == 1) > + _rte_x86_movdiri(value, addr); > + else > + rte_write32_relaxed(value, addr); > + } > +} > + > + > + > + Rather a lot of whitespace here. > #ifdef __cplusplus > } > #endif > -- > 2.7.4 With the nits called out above fixed: Acked-by: Bruce Richardson