DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK
@ 2014-09-26  9:33 Chao Zhu
  2014-09-26  9:33 ` [dpdk-dev] [PATCH 1/7] Split atomic operations to architecture specific Chao Zhu
                   ` (8 more replies)
  0 siblings, 9 replies; 16+ messages in thread
From: Chao Zhu @ 2014-09-26  9:33 UTC (permalink / raw)
  To: dev

The set of patches split x86 architecture specific operations from DPDK and put them to the
arch directories of i686 and x86_64 architecture. This will make the adpotion of DPDK much easier
on other computer architecture. For a new architecture, just add an architecture specific
directory and necessary building configuration files, then DPDK can support it.

Chao Zhu (7):
  Split atomic operations to architecture specific
  Split byte order operations to architecture specific
  Split CPU cycle operation to architecture specific
  Split prefetch operations to architecture specific
  Split spinlock operations to architecture specific
  Split memcpy operation to architecture specific
  Split CPU flags operations to architecture specific

 lib/librte_eal/common/Makefile                     |    2 +-
 lib/librte_eal/common/eal_common_cpuflags.c        |  173 +---------
 .../common/include/i686/arch/rte_atomic_arch.h     |  378 ++++++++++++++++++++
 .../common/include/i686/arch/rte_byteorder_arch.h  |   95 +++++
 .../common/include/i686/arch/rte_cpuflags_arch.h   |  335 +++++++++++++++++
 .../common/include/i686/arch/rte_cycles_arch.h     |  108 ++++++
 .../common/include/i686/arch/rte_memcpy_arch.h     |  199 ++++++++++
 .../common/include/i686/arch/rte_prefetch_arch.h   |   68 ++++
 .../common/include/i686/arch/rte_spinlock_arch.h   |  128 +++++++
 lib/librte_eal/common/include/rte_atomic.h         |  172 +--------
 lib/librte_eal/common/include/rte_byteorder.h      |   58 +---
 lib/librte_eal/common/include/rte_cpuflags.h       |  110 +------
 lib/librte_eal/common/include/rte_cycles.h         |   31 +--
 lib/librte_eal/common/include/rte_memcpy.h         |   95 +-----
 lib/librte_eal/common/include/rte_prefetch.h       |    7 +-
 lib/librte_eal/common/include/rte_spinlock.h       |   55 +---
 .../common/include/x86_64/arch/rte_atomic_arch.h   |  378 ++++++++++++++++++++
 .../include/x86_64/arch/rte_byteorder_arch.h       |   95 +++++
 .../common/include/x86_64/arch/rte_cpuflags_arch.h |  335 +++++++++++++++++
 .../common/include/x86_64/arch/rte_cycles_arch.h   |  108 ++++++
 .../common/include/x86_64/arch/rte_memcpy_arch.h   |  199 ++++++++++
 .../common/include/x86_64/arch/rte_prefetch_arch.h |   68 ++++
 .../common/include/x86_64/arch/rte_spinlock_arch.h |  128 +++++++
 23 files changed, 2660 insertions(+), 665 deletions(-)
 create mode 100644 lib/librte_eal/common/include/i686/arch/rte_atomic_arch.h
 create mode 100644 lib/librte_eal/common/include/i686/arch/rte_byteorder_arch.h
 create mode 100644 lib/librte_eal/common/include/i686/arch/rte_cpuflags_arch.h
 create mode 100644 lib/librte_eal/common/include/i686/arch/rte_cycles_arch.h
 create mode 100644 lib/librte_eal/common/include/i686/arch/rte_memcpy_arch.h
 create mode 100644 lib/librte_eal/common/include/i686/arch/rte_prefetch_arch.h
 create mode 100644 lib/librte_eal/common/include/i686/arch/rte_spinlock_arch.h
 create mode 100644 lib/librte_eal/common/include/x86_64/arch/rte_atomic_arch.h
 create mode 100644 lib/librte_eal/common/include/x86_64/arch/rte_byteorder_arch.h
 create mode 100644 lib/librte_eal/common/include/x86_64/arch/rte_cpuflags_arch.h
 create mode 100644 lib/librte_eal/common/include/x86_64/arch/rte_cycles_arch.h
 create mode 100644 lib/librte_eal/common/include/x86_64/arch/rte_memcpy_arch.h
 create mode 100644 lib/librte_eal/common/include/x86_64/arch/rte_prefetch_arch.h
 create mode 100644 lib/librte_eal/common/include/x86_64/arch/rte_spinlock_arch.h

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [dpdk-dev] [PATCH 1/7] Split atomic operations to architecture specific
  2014-09-26  9:33 [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK Chao Zhu
@ 2014-09-26  9:33 ` Chao Zhu
  2014-09-29 11:05   ` Bruce Richardson
  2014-09-26  9:33 ` [dpdk-dev] [PATCH 2/7] Split byte order " Chao Zhu
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 16+ messages in thread
From: Chao Zhu @ 2014-09-26  9:33 UTC (permalink / raw)
  To: dev

This patch splits the atomic operations from DPDK and push them to
architecture specific arch directories, so that other processor
architecture to support DPDK can be easily adopted.

Signed-off-by: Chao Zhu <bjzhuc@cn.ibm.com>
---
 lib/librte_eal/common/Makefile                     |    2 +-
 .../common/include/i686/arch/rte_atomic_arch.h     |  378 ++++++++++++++++++++
 lib/librte_eal/common/include/rte_atomic.h         |  172 +--------
 .../common/include/x86_64/arch/rte_atomic_arch.h   |  378 ++++++++++++++++++++
 4 files changed, 772 insertions(+), 158 deletions(-)
 create mode 100644 lib/librte_eal/common/include/i686/arch/rte_atomic_arch.h
 create mode 100644 lib/librte_eal/common/include/x86_64/arch/rte_atomic_arch.h

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 7f27966..d730de5 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -46,7 +46,7 @@ ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
 INC += rte_warnings.h
 endif
 
-ARCH_INC := rte_atomic.h
+ARCH_INC := rte_atomic.h rte_atomic_arch.h
 
 SYMLINK-$(CONFIG_RTE_LIBRTE_EAL)-include := $(addprefix include/,$(INC))
 SYMLINK-$(CONFIG_RTE_LIBRTE_EAL)-include/arch := \
diff --git a/lib/librte_eal/common/include/i686/arch/rte_atomic_arch.h b/lib/librte_eal/common/include/i686/arch/rte_atomic_arch.h
new file mode 100644
index 0000000..cb2d91d
--- /dev/null
+++ b/lib/librte_eal/common/include/i686/arch/rte_atomic_arch.h
@@ -0,0 +1,378 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ATOMIC_H_
+#error "don't include this file directly, please include generic <rte_atomic.h>"
+#endif
+
+#ifndef _RTE_ATOMIC_ARCH_H_
+#define _RTE_ATOMIC_ARCH_H_
+
+#include <stdint.h>
+
+#if RTE_MAX_LCORE == 1
+#define MPLOCKED                        /**< No need to insert MP lock prefix. */
+#else
+#define MPLOCKED        "lock ; "       /**< Insert MP lock prefix. */
+#endif
+
+/**
+ * General memory barrier.
+ *
+ * Guarantees that the LOAD and STORE operations generated before the
+ * barrier occur before the LOAD and STORE operations generated after.
+ */
+#define	rte_arch_mb() _mm_mfence()
+
+/**
+ * Write memory barrier.
+ *
+ * Guarantees that the STORE operations generated before the barrier
+ * occur before the STORE operations generated after.
+ */
+#define	rte_arch_wmb() _mm_sfence()
+
+/**
+ * Read memory barrier.
+ *
+ * Guarantees that the LOAD operations generated before the barrier
+ * occur before the LOAD operations generated after.
+ */
+#define	rte_arch_rmb() _mm_lfence()
+
+/**
+ * Compiler barrier.
+ *
+ * Guarantees that operation reordering does not occur at compile time
+ * for operations directly before and after the barrier.
+ */
+#define	rte_arch_compiler_barrier() do {		\
+	asm volatile ("" : : : "memory");	\
+} while(0)
+
+#include <emmintrin.h>
+
+/*------------------------- 16 bit atomic operations -------------------------*/
+
+/**
+ * The atomic counter structure.
+ */
+typedef struct {
+	volatile int16_t cnt; /**< An internal counter value. */
+} rte_atomic16_t;
+
+/**
+ * Atomic compare and set.
+ *
+ * (atomic) equivalent to:
+ *   if (*dst == exp)
+ *     *dst = src (all 16-bit words)
+ *
+ * @param dst
+ *   The destination location into which the value will be written.
+ * @param exp
+ *   The expected value.
+ * @param src
+ *   The new value.
+ * @return
+ *   Non-zero on success; 0 on failure.
+ */
+static inline int
+rte_arch_atomic16_cmpset(volatile uint16_t *dst, uint16_t exp, uint16_t src)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	uint8_t res;
+
+	asm volatile(
+			MPLOCKED
+			"cmpxchgw %[src], %[dst];"
+			"sete %[res];"
+			: [res] "=a" (res),     /* output */
+			  [dst] "=m" (*dst)
+			: [src] "r" (src),      /* input */
+			  "a" (exp),
+			  "m" (*dst)
+			: "memory");            /* no-clobber list */
+	return res;
+#else
+	return __sync_bool_compare_and_swap(dst, exp, src);
+#endif
+}
+
+/**
+ * Atomically increment a counter by one.
+ *
+ * @param v
+ *   A pointer to the atomic counter.
+ */
+static inline void
+rte_arch_atomic16_inc(rte_atomic16_t *v)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	asm volatile(
+			MPLOCKED
+			"incw %[cnt]"
+			: [cnt] "=m" (v->cnt)   /* output */
+			: "m" (v->cnt)          /* input */
+			);
+#else
+	rte_atomic16_add(v, 1);
+#endif
+}
+
+/**
+ * Atomically decrement a counter by one.
+ *
+ * @param v
+ *   A pointer to the atomic counter.
+ */
+static inline void
+rte_arch_atomic16_dec(rte_atomic16_t *v)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	asm volatile(
+			MPLOCKED
+			"decw %[cnt]"
+			: [cnt] "=m" (v->cnt)   /* output */
+			: "m" (v->cnt)          /* input */
+			);
+#else
+	rte_atomic16_sub(v, 1);
+#endif
+}
+
+/**
+ * Atomically increment a 16-bit counter by one and test.
+ *
+ * Atomically increments the atomic counter (v) by one and returns true if
+ * the result is 0, or false in all other cases.
+ *
+ * @param v
+ *   A pointer to the atomic counter.
+ * @return
+ *   True if the result after the increment operation is 0; false otherwise.
+ */
+static inline int rte_arch_atomic16_inc_and_test(rte_atomic16_t *v)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	uint8_t ret;
+
+	asm volatile(
+			MPLOCKED
+			"incw %[cnt] ; "
+			"sete %[ret]"
+			: [cnt] "+m" (v->cnt),  /* output */
+			  [ret] "=qm" (ret)
+			);
+	return (ret != 0);
+#else
+	return (__sync_add_and_fetch(&v->cnt, 1) == 0);
+#endif
+}
+
+/**
+ * Atomically decrement a 16-bit counter by one and test.
+ *
+ * Atomically decrements the atomic counter (v) by one and returns true if
+ * the result is 0, or false in all other cases.
+ *
+ * @param v
+ *   A pointer to the atomic counter.
+ * @return
+ *   True if the result after the decrement operation is 0; false otherwise.
+ */
+static inline int rte_arch_atomic16_dec_and_test(rte_atomic16_t *v)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	uint8_t ret;
+
+	asm volatile(MPLOCKED
+			"decw %[cnt] ; "
+			"sete %[ret]"
+			: [cnt] "+m" (v->cnt),  /* output */
+			  [ret] "=qm" (ret)
+			);
+	return (ret != 0);
+#else
+	return (__sync_sub_and_fetch(&v->cnt, 1) == 0);
+#endif
+}
+
+/*------------------------- 32 bit atomic operations -------------------------*/
+
+/**
+ * The atomic counter structure.
+ */
+typedef struct {
+	volatile int32_t cnt; /**< An internal counter value. */
+} rte_atomic32_t;
+
+/**
+ * Atomic compare and set.
+ *
+ * (atomic) equivalent to:
+ *   if (*dst == exp)
+ *     *dst = src (all 32-bit words)
+ *
+ * @param dst
+ *   The destination location into which the value will be written.
+ * @param exp
+ *   The expected value.
+ * @param src
+ *   The new value.
+ * @return
+ *   Non-zero on success; 0 on failure.
+ */
+static inline int
+rte_arch_atomic32_cmpset(volatile uint32_t *dst, uint32_t exp, uint32_t src)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	uint8_t res;
+
+	asm volatile(
+			MPLOCKED
+			"cmpxchgl %[src], %[dst];"
+			"sete %[res];"
+			: [res] "=a" (res),     /* output */
+			  [dst] "=m" (*dst)
+			: [src] "r" (src),      /* input */
+			  "a" (exp),
+			  "m" (*dst)
+			: "memory");            /* no-clobber list */
+	return res;
+#else
+	return __sync_bool_compare_and_swap(dst, exp, src);
+#endif
+}
+
+/**
+ * Atomically increment a counter by one.
+ *
+ * @param v
+ *   A pointer to the atomic counter.
+ */
+static inline void
+rte_arch_atomic32_inc(rte_atomic32_t *v)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	asm volatile(
+			MPLOCKED
+			"incl %[cnt]"
+			: [cnt] "=m" (v->cnt)   /* output */
+			: "m" (v->cnt)          /* input */
+			);
+#else
+	rte_atomic32_add(v, 1);
+#endif
+}
+
+/**
+ * Atomically decrement a counter by one.
+ *
+ * @param v
+ *   A pointer to the atomic counter.
+ */
+static inline void
+rte_arch_atomic32_dec(rte_atomic32_t *v)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	asm volatile(
+			MPLOCKED
+			"decl %[cnt]"
+			: [cnt] "=m" (v->cnt)   /* output */
+			: "m" (v->cnt)          /* input */
+			);
+#else
+	rte_atomic32_sub(v,1);
+#endif
+}
+
+/**
+ * Atomically increment a 32-bit counter by one and test.
+ *
+ * Atomically increments the atomic counter (v) by one and returns true if
+ * the result is 0, or false in all other cases.
+ *
+ * @param v
+ *   A pointer to the atomic counter.
+ * @return
+ *   True if the result after the increment operation is 0; false otherwise.
+ */
+static inline int rte_arch_atomic32_inc_and_test(rte_atomic32_t *v)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	uint8_t ret;
+
+	asm volatile(
+			MPLOCKED
+			"incl %[cnt] ; "
+			"sete %[ret]"
+			: [cnt] "+m" (v->cnt),  /* output */
+			  [ret] "=qm" (ret)
+			);
+	return (ret != 0);
+#else
+	return (__sync_add_and_fetch(&v->cnt, 1) == 0);
+#endif
+}
+
+/**
+ * Atomically decrement a 32-bit counter by one and test.
+ *
+ * Atomically decrements the atomic counter (v) by one and returns true if
+ * the result is 0, or false in all other cases.
+ *
+ * @param v
+ *   A pointer to the atomic counter.
+ * @return
+ *   True if the result after the decrement operation is 0; false otherwise.
+ */
+static inline int rte_arch_atomic32_dec_and_test(rte_atomic32_t *v)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	uint8_t ret;
+
+	asm volatile(MPLOCKED
+			"decl %[cnt] ; "
+			"sete %[ret]"
+			: [cnt] "+m" (v->cnt),  /* output */
+			  [ret] "=qm" (ret)
+			);
+	return (ret != 0);
+#else
+	return (__sync_sub_and_fetch(&v->cnt, 1) == 0);
+#endif
+}
+
+#endif /* _RTE_ATOMIC_ARCH_H_ */
+
diff --git a/lib/librte_eal/common/include/rte_atomic.h b/lib/librte_eal/common/include/rte_atomic.h
index a5b6eec..24ba5d0 100644
--- a/lib/librte_eal/common/include/rte_atomic.h
+++ b/lib/librte_eal/common/include/rte_atomic.h
@@ -49,13 +49,7 @@
 extern "C" {
 #endif
 
-#include <stdint.h>
-
-#if RTE_MAX_LCORE == 1
-#define MPLOCKED                        /**< No need to insert MP lock prefix. */
-#else
-#define MPLOCKED        "lock ; "       /**< Insert MP lock prefix. */
-#endif
+#include "arch/rte_atomic_arch.h"
 
 /**
  * General memory barrier.
@@ -63,7 +57,7 @@ extern "C" {
  * Guarantees that the LOAD and STORE operations generated before the
  * barrier occur before the LOAD and STORE operations generated after.
  */
-#define	rte_mb() _mm_mfence()
+#define	rte_mb() rte_arch_mb()
 
 /**
  * Write memory barrier.
@@ -71,7 +65,7 @@ extern "C" {
  * Guarantees that the STORE operations generated before the barrier
  * occur before the STORE operations generated after.
  */
-#define	rte_wmb() _mm_sfence()
+#define	rte_wmb() rte_arch_wmb()
 
 /**
  * Read memory barrier.
@@ -79,7 +73,7 @@ extern "C" {
  * Guarantees that the LOAD operations generated before the barrier
  * occur before the LOAD operations generated after.
  */
-#define	rte_rmb() _mm_lfence()
+#define	rte_rmb() rte_arch_rmb()
 
 /**
  * Compiler barrier.
@@ -87,11 +81,7 @@ extern "C" {
  * Guarantees that operation reordering does not occur at compile time
  * for operations directly before and after the barrier.
  */
-#define	rte_compiler_barrier() do {		\
-	asm volatile ("" : : : "memory");	\
-} while(0)
-
-#include <emmintrin.h>
+#define	rte_compiler_barrier() rte_arch_compiler_barrier()
 
 /**
  * @file
@@ -119,33 +109,10 @@ extern "C" {
 static inline int
 rte_atomic16_cmpset(volatile uint16_t *dst, uint16_t exp, uint16_t src)
 {
-#ifndef RTE_FORCE_INTRINSICS
-	uint8_t res;
-
-	asm volatile(
-			MPLOCKED
-			"cmpxchgw %[src], %[dst];"
-			"sete %[res];"
-			: [res] "=a" (res),     /* output */
-			  [dst] "=m" (*dst)
-			: [src] "r" (src),      /* input */
-			  "a" (exp),
-			  "m" (*dst)
-			: "memory");            /* no-clobber list */
-	return res;
-#else
-	return __sync_bool_compare_and_swap(dst, exp, src);
-#endif
+	return rte_arch_atomic16_cmpset(dst, exp, src);
 }
 
 /**
- * The atomic counter structure.
- */
-typedef struct {
-	volatile int16_t cnt; /**< An internal counter value. */
-} rte_atomic16_t;
-
-/**
  * Static initializer for an atomic counter.
  */
 #define RTE_ATOMIC16_INIT(val) { (val) }
@@ -227,16 +194,7 @@ rte_atomic16_sub(rte_atomic16_t *v, int16_t dec)
 static inline void
 rte_atomic16_inc(rte_atomic16_t *v)
 {
-#ifndef RTE_FORCE_INTRINSICS
-	asm volatile(
-			MPLOCKED
-			"incw %[cnt]"
-			: [cnt] "=m" (v->cnt)   /* output */
-			: "m" (v->cnt)          /* input */
-			);
-#else
-	rte_atomic16_add(v, 1);
-#endif
+	rte_arch_atomic16_inc(v);
 }
 
 /**
@@ -248,16 +206,7 @@ rte_atomic16_inc(rte_atomic16_t *v)
 static inline void
 rte_atomic16_dec(rte_atomic16_t *v)
 {
-#ifndef RTE_FORCE_INTRINSICS
-	asm volatile(
-			MPLOCKED
-			"decw %[cnt]"
-			: [cnt] "=m" (v->cnt)   /* output */
-			: "m" (v->cnt)          /* input */
-			);
-#else
-	rte_atomic16_sub(v, 1);
-#endif
+	rte_arch_atomic16_dec(v);
 }
 
 /**
@@ -312,20 +261,7 @@ rte_atomic16_sub_return(rte_atomic16_t *v, int16_t dec)
  */
 static inline int rte_atomic16_inc_and_test(rte_atomic16_t *v)
 {
-#ifndef RTE_FORCE_INTRINSICS
-	uint8_t ret;
-
-	asm volatile(
-			MPLOCKED
-			"incw %[cnt] ; "
-			"sete %[ret]"
-			: [cnt] "+m" (v->cnt),  /* output */
-			  [ret] "=qm" (ret)
-			);
-	return (ret != 0);
-#else
-	return (__sync_add_and_fetch(&v->cnt, 1) == 0);
-#endif
+	return rte_arch_atomic16_inc_and_test(v);
 }
 
 /**
@@ -341,19 +277,7 @@ static inline int rte_atomic16_inc_and_test(rte_atomic16_t *v)
  */
 static inline int rte_atomic16_dec_and_test(rte_atomic16_t *v)
 {
-#ifndef RTE_FORCE_INTRINSICS
-	uint8_t ret;
-
-	asm volatile(MPLOCKED
-			"decw %[cnt] ; "
-			"sete %[ret]"
-			: [cnt] "+m" (v->cnt),  /* output */
-			  [ret] "=qm" (ret)
-			);
-	return (ret != 0);
-#else
-	return (__sync_sub_and_fetch(&v->cnt, 1) == 0);
-#endif
+	return rte_arch_atomic16_dec_and_test(v);
 }
 
 /**
@@ -404,33 +328,10 @@ static inline void rte_atomic16_clear(rte_atomic16_t *v)
 static inline int
 rte_atomic32_cmpset(volatile uint32_t *dst, uint32_t exp, uint32_t src)
 {
-#ifndef RTE_FORCE_INTRINSICS
-	uint8_t res;
-
-	asm volatile(
-			MPLOCKED
-			"cmpxchgl %[src], %[dst];"
-			"sete %[res];"
-			: [res] "=a" (res),     /* output */
-			  [dst] "=m" (*dst)
-			: [src] "r" (src),      /* input */
-			  "a" (exp),
-			  "m" (*dst)
-			: "memory");            /* no-clobber list */
-	return res;
-#else
-	return __sync_bool_compare_and_swap(dst, exp, src);
-#endif
+	return rte_arch_atomic32_cmpset(dst, exp, src);
 }
 
 /**
- * The atomic counter structure.
- */
-typedef struct {
-	volatile int32_t cnt; /**< An internal counter value. */
-} rte_atomic32_t;
-
-/**
  * Static initializer for an atomic counter.
  */
 #define RTE_ATOMIC32_INIT(val) { (val) }
@@ -512,16 +413,7 @@ rte_atomic32_sub(rte_atomic32_t *v, int32_t dec)
 static inline void
 rte_atomic32_inc(rte_atomic32_t *v)
 {
-#ifndef RTE_FORCE_INTRINSICS
-	asm volatile(
-			MPLOCKED
-			"incl %[cnt]"
-			: [cnt] "=m" (v->cnt)   /* output */
-			: "m" (v->cnt)          /* input */
-			);
-#else
-	rte_atomic32_add(v, 1);
-#endif
+	rte_arch_atomic32_inc(v);
 }
 
 /**
@@ -533,16 +425,7 @@ rte_atomic32_inc(rte_atomic32_t *v)
 static inline void
 rte_atomic32_dec(rte_atomic32_t *v)
 {
-#ifndef RTE_FORCE_INTRINSICS
-	asm volatile(
-			MPLOCKED
-			"decl %[cnt]"
-			: [cnt] "=m" (v->cnt)   /* output */
-			: "m" (v->cnt)          /* input */
-			);
-#else
-	rte_atomic32_sub(v,1);
-#endif
+	rte_arch_atomic32_dec(v);
 }
 
 /**
@@ -597,20 +480,7 @@ rte_atomic32_sub_return(rte_atomic32_t *v, int32_t dec)
  */
 static inline int rte_atomic32_inc_and_test(rte_atomic32_t *v)
 {
-#ifndef RTE_FORCE_INTRINSICS
-	uint8_t ret;
-
-	asm volatile(
-			MPLOCKED
-			"incl %[cnt] ; "
-			"sete %[ret]"
-			: [cnt] "+m" (v->cnt),  /* output */
-			  [ret] "=qm" (ret)
-			);
-	return (ret != 0);
-#else
-	return (__sync_add_and_fetch(&v->cnt, 1) == 0);
-#endif
+	return rte_arch_atomic32_inc_and_test(v);
 }
 
 /**
@@ -626,19 +496,7 @@ static inline int rte_atomic32_inc_and_test(rte_atomic32_t *v)
  */
 static inline int rte_atomic32_dec_and_test(rte_atomic32_t *v)
 {
-#ifndef RTE_FORCE_INTRINSICS
-	uint8_t ret;
-
-	asm volatile(MPLOCKED
-			"decl %[cnt] ; "
-			"sete %[ret]"
-			: [cnt] "+m" (v->cnt),  /* output */
-			  [ret] "=qm" (ret)
-			);
-	return (ret != 0);
-#else
-	return (__sync_sub_and_fetch(&v->cnt, 1) == 0);
-#endif
+	return rte_arch_atomic32_dec_and_test(v);
 }
 
 /**
diff --git a/lib/librte_eal/common/include/x86_64/arch/rte_atomic_arch.h b/lib/librte_eal/common/include/x86_64/arch/rte_atomic_arch.h
new file mode 100644
index 0000000..cb2d91d
--- /dev/null
+++ b/lib/librte_eal/common/include/x86_64/arch/rte_atomic_arch.h
@@ -0,0 +1,378 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ATOMIC_H_
+#error "don't include this file directly, please include generic <rte_atomic.h>"
+#endif
+
+#ifndef _RTE_ATOMIC_ARCH_H_
+#define _RTE_ATOMIC_ARCH_H_
+
+#include <stdint.h>
+
+#if RTE_MAX_LCORE == 1
+#define MPLOCKED                        /**< No need to insert MP lock prefix. */
+#else
+#define MPLOCKED        "lock ; "       /**< Insert MP lock prefix. */
+#endif
+
+/**
+ * General memory barrier.
+ *
+ * Guarantees that the LOAD and STORE operations generated before the
+ * barrier occur before the LOAD and STORE operations generated after.
+ */
+#define	rte_arch_mb() _mm_mfence()
+
+/**
+ * Write memory barrier.
+ *
+ * Guarantees that the STORE operations generated before the barrier
+ * occur before the STORE operations generated after.
+ */
+#define	rte_arch_wmb() _mm_sfence()
+
+/**
+ * Read memory barrier.
+ *
+ * Guarantees that the LOAD operations generated before the barrier
+ * occur before the LOAD operations generated after.
+ */
+#define	rte_arch_rmb() _mm_lfence()
+
+/**
+ * Compiler barrier.
+ *
+ * Guarantees that operation reordering does not occur at compile time
+ * for operations directly before and after the barrier.
+ */
+#define	rte_arch_compiler_barrier() do {		\
+	asm volatile ("" : : : "memory");	\
+} while(0)
+
+#include <emmintrin.h>
+
+/*------------------------- 16 bit atomic operations -------------------------*/
+
+/**
+ * The atomic counter structure.
+ */
+typedef struct {
+	volatile int16_t cnt; /**< An internal counter value. */
+} rte_atomic16_t;
+
+/**
+ * Atomic compare and set.
+ *
+ * (atomic) equivalent to:
+ *   if (*dst == exp)
+ *     *dst = src (all 16-bit words)
+ *
+ * @param dst
+ *   The destination location into which the value will be written.
+ * @param exp
+ *   The expected value.
+ * @param src
+ *   The new value.
+ * @return
+ *   Non-zero on success; 0 on failure.
+ */
+static inline int
+rte_arch_atomic16_cmpset(volatile uint16_t *dst, uint16_t exp, uint16_t src)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	uint8_t res;
+
+	asm volatile(
+			MPLOCKED
+			"cmpxchgw %[src], %[dst];"
+			"sete %[res];"
+			: [res] "=a" (res),     /* output */
+			  [dst] "=m" (*dst)
+			: [src] "r" (src),      /* input */
+			  "a" (exp),
+			  "m" (*dst)
+			: "memory");            /* no-clobber list */
+	return res;
+#else
+	return __sync_bool_compare_and_swap(dst, exp, src);
+#endif
+}
+
+/**
+ * Atomically increment a counter by one.
+ *
+ * @param v
+ *   A pointer to the atomic counter.
+ */
+static inline void
+rte_arch_atomic16_inc(rte_atomic16_t *v)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	asm volatile(
+			MPLOCKED
+			"incw %[cnt]"
+			: [cnt] "=m" (v->cnt)   /* output */
+			: "m" (v->cnt)          /* input */
+			);
+#else
+	rte_atomic16_add(v, 1);
+#endif
+}
+
+/**
+ * Atomically decrement a counter by one.
+ *
+ * @param v
+ *   A pointer to the atomic counter.
+ */
+static inline void
+rte_arch_atomic16_dec(rte_atomic16_t *v)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	asm volatile(
+			MPLOCKED
+			"decw %[cnt]"
+			: [cnt] "=m" (v->cnt)   /* output */
+			: "m" (v->cnt)          /* input */
+			);
+#else
+	rte_atomic16_sub(v, 1);
+#endif
+}
+
+/**
+ * Atomically increment a 16-bit counter by one and test.
+ *
+ * Atomically increments the atomic counter (v) by one and returns true if
+ * the result is 0, or false in all other cases.
+ *
+ * @param v
+ *   A pointer to the atomic counter.
+ * @return
+ *   True if the result after the increment operation is 0; false otherwise.
+ */
+static inline int rte_arch_atomic16_inc_and_test(rte_atomic16_t *v)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	uint8_t ret;
+
+	asm volatile(
+			MPLOCKED
+			"incw %[cnt] ; "
+			"sete %[ret]"
+			: [cnt] "+m" (v->cnt),  /* output */
+			  [ret] "=qm" (ret)
+			);
+	return (ret != 0);
+#else
+	return (__sync_add_and_fetch(&v->cnt, 1) == 0);
+#endif
+}
+
+/**
+ * Atomically decrement a 16-bit counter by one and test.
+ *
+ * Atomically decrements the atomic counter (v) by one and returns true if
+ * the result is 0, or false in all other cases.
+ *
+ * @param v
+ *   A pointer to the atomic counter.
+ * @return
+ *   True if the result after the decrement operation is 0; false otherwise.
+ */
+static inline int rte_arch_atomic16_dec_and_test(rte_atomic16_t *v)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	uint8_t ret;
+
+	asm volatile(MPLOCKED
+			"decw %[cnt] ; "
+			"sete %[ret]"
+			: [cnt] "+m" (v->cnt),  /* output */
+			  [ret] "=qm" (ret)
+			);
+	return (ret != 0);
+#else
+	return (__sync_sub_and_fetch(&v->cnt, 1) == 0);
+#endif
+}
+
+/*------------------------- 32 bit atomic operations -------------------------*/
+
+/**
+ * The atomic counter structure.
+ */
+typedef struct {
+	volatile int32_t cnt; /**< An internal counter value. */
+} rte_atomic32_t;
+
+/**
+ * Atomic compare and set.
+ *
+ * (atomic) equivalent to:
+ *   if (*dst == exp)
+ *     *dst = src (all 32-bit words)
+ *
+ * @param dst
+ *   The destination location into which the value will be written.
+ * @param exp
+ *   The expected value.
+ * @param src
+ *   The new value.
+ * @return
+ *   Non-zero on success; 0 on failure.
+ */
+static inline int
+rte_arch_atomic32_cmpset(volatile uint32_t *dst, uint32_t exp, uint32_t src)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	uint8_t res;
+
+	asm volatile(
+			MPLOCKED
+			"cmpxchgl %[src], %[dst];"
+			"sete %[res];"
+			: [res] "=a" (res),     /* output */
+			  [dst] "=m" (*dst)
+			: [src] "r" (src),      /* input */
+			  "a" (exp),
+			  "m" (*dst)
+			: "memory");            /* no-clobber list */
+	return res;
+#else
+	return __sync_bool_compare_and_swap(dst, exp, src);
+#endif
+}
+
+/**
+ * Atomically increment a counter by one.
+ *
+ * @param v
+ *   A pointer to the atomic counter.
+ */
+static inline void
+rte_arch_atomic32_inc(rte_atomic32_t *v)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	asm volatile(
+			MPLOCKED
+			"incl %[cnt]"
+			: [cnt] "=m" (v->cnt)   /* output */
+			: "m" (v->cnt)          /* input */
+			);
+#else
+	rte_atomic32_add(v, 1);
+#endif
+}
+
+/**
+ * Atomically decrement a counter by one.
+ *
+ * @param v
+ *   A pointer to the atomic counter.
+ */
+static inline void
+rte_arch_atomic32_dec(rte_atomic32_t *v)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	asm volatile(
+			MPLOCKED
+			"decl %[cnt]"
+			: [cnt] "=m" (v->cnt)   /* output */
+			: "m" (v->cnt)          /* input */
+			);
+#else
+	rte_atomic32_sub(v,1);
+#endif
+}
+
+/**
+ * Atomically increment a 32-bit counter by one and test.
+ *
+ * Atomically increments the atomic counter (v) by one and returns true if
+ * the result is 0, or false in all other cases.
+ *
+ * @param v
+ *   A pointer to the atomic counter.
+ * @return
+ *   True if the result after the increment operation is 0; false otherwise.
+ */
+static inline int rte_arch_atomic32_inc_and_test(rte_atomic32_t *v)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	uint8_t ret;
+
+	asm volatile(
+			MPLOCKED
+			"incl %[cnt] ; "
+			"sete %[ret]"
+			: [cnt] "+m" (v->cnt),  /* output */
+			  [ret] "=qm" (ret)
+			);
+	return (ret != 0);
+#else
+	return (__sync_add_and_fetch(&v->cnt, 1) == 0);
+#endif
+}
+
+/**
+ * Atomically decrement a 32-bit counter by one and test.
+ *
+ * Atomically decrements the atomic counter (v) by one and returns true if
+ * the result is 0, or false in all other cases.
+ *
+ * @param v
+ *   A pointer to the atomic counter.
+ * @return
+ *   True if the result after the decrement operation is 0; false otherwise.
+ */
+static inline int rte_arch_atomic32_dec_and_test(rte_atomic32_t *v)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	uint8_t ret;
+
+	asm volatile(MPLOCKED
+			"decl %[cnt] ; "
+			"sete %[ret]"
+			: [cnt] "+m" (v->cnt),  /* output */
+			  [ret] "=qm" (ret)
+			);
+	return (ret != 0);
+#else
+	return (__sync_sub_and_fetch(&v->cnt, 1) == 0);
+#endif
+}
+
+#endif /* _RTE_ATOMIC_ARCH_H_ */
+
-- 
1.7.1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [dpdk-dev] [PATCH 2/7] Split byte order operations to architecture specific
  2014-09-26  9:33 [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK Chao Zhu
  2014-09-26  9:33 ` [dpdk-dev] [PATCH 1/7] Split atomic operations to architecture specific Chao Zhu
@ 2014-09-26  9:33 ` Chao Zhu
  2014-09-26  9:33 ` [dpdk-dev] [PATCH 3/7] Split CPU cycle operation " Chao Zhu
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Chao Zhu @ 2014-09-26  9:33 UTC (permalink / raw)
  To: dev

This patch splits the byte order operations from DPDK and push them to
architecture specific arch directories, so that other processor
architecture to support DPDK can be easily adopted.

Signed-off-by: Chao Zhu <bjzhuc@cn.ibm.com>
---
 lib/librte_eal/common/Makefile                     |    2 +-
 .../common/include/i686/arch/rte_byteorder_arch.h  |   95 ++++++++++++++++++++
 lib/librte_eal/common/include/rte_byteorder.h      |   58 +------------
 .../include/x86_64/arch/rte_byteorder_arch.h       |   95 ++++++++++++++++++++
 4 files changed, 193 insertions(+), 57 deletions(-)
 create mode 100644 lib/librte_eal/common/include/i686/arch/rte_byteorder_arch.h
 create mode 100644 lib/librte_eal/common/include/x86_64/arch/rte_byteorder_arch.h

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index d730de5..d588c94 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -46,7 +46,7 @@ ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
 INC += rte_warnings.h
 endif
 
-ARCH_INC := rte_atomic.h rte_atomic_arch.h
+ARCH_INC := rte_atomic.h rte_atomic_arch.h rte_byteorder_arch.h
 
 SYMLINK-$(CONFIG_RTE_LIBRTE_EAL)-include := $(addprefix include/,$(INC))
 SYMLINK-$(CONFIG_RTE_LIBRTE_EAL)-include/arch := \
diff --git a/lib/librte_eal/common/include/i686/arch/rte_byteorder_arch.h b/lib/librte_eal/common/include/i686/arch/rte_byteorder_arch.h
new file mode 100644
index 0000000..06c1afc
--- /dev/null
+++ b/lib/librte_eal/common/include/i686/arch/rte_byteorder_arch.h
@@ -0,0 +1,95 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_BYTEORDER_ARCH_H_
+#define _RTE_BYTEORDER_ARCH_H_
+
+#include <stdint.h>
+
+/*
+ * An architecture-optimized byte swap for a 16-bit value.
+ *
+ * Do not use this function directly. The preferred function is rte_bswap16().
+ */
+static inline uint16_t rte_arch_bswap16(uint16_t _x)
+{
+	register uint16_t x = _x;
+	asm volatile ("xchgb %b[x1],%h[x2]"
+		      : [x1] "=Q" (x)
+		      : [x2] "0" (x)
+		      );
+	return x;
+}
+
+/*
+ * An architecture-optimized byte swap for a 32-bit value.
+ *
+ * Do not use this function directly. The preferred function is rte_bswap32().
+ */
+static inline uint32_t rte_arch_bswap32(uint32_t _x)
+{
+	register uint32_t x = _x;
+	asm volatile ("bswap %[x]"
+		      : [x] "+r" (x)
+		      );
+	return x;
+}
+
+/*
+ * An architecture-optimized byte swap for a 64-bit value.
+ *
+  * Do not use this function directly. The preferred function is rte_bswap64().
+ */
+#ifdef RTE_ARCH_X86_64
+/* 64-bit mode */
+static inline uint64_t rte_arch_bswap64(uint64_t _x)
+{
+	register uint64_t x = _x;
+	asm volatile ("bswap %[x]"
+		      : [x] "+r" (x)
+		      );
+	return x;
+}
+#else /* ! RTE_ARCH_X86_64 */
+/* Compat./Leg. mode */
+static inline uint64_t rte_arch_bswap64(uint64_t x)
+{
+	uint64_t ret = 0;
+	ret |= ((uint64_t)rte_arch_bswap32(x & 0xffffffffUL) << 32);
+	ret |= ((uint64_t)rte_arch_bswap32((x >> 32) & 0xffffffffUL));
+	return ret;
+}
+#endif /* RTE_ARCH_X86_64 */
+
+#endif /* _RTE_BYTEORDER_ARCH_H_ */
+
diff --git a/lib/librte_eal/common/include/rte_byteorder.h b/lib/librte_eal/common/include/rte_byteorder.h
index 30fbd56..98e3764 100644
--- a/lib/librte_eal/common/include/rte_byteorder.h
+++ b/lib/librte_eal/common/include/rte_byteorder.h
@@ -34,6 +34,8 @@
 #ifndef _RTE_BYTEORDER_H_
 #define _RTE_BYTEORDER_H_
 
+#include "arch/rte_byteorder_arch.h"
+
 /**
  * @file
  *
@@ -96,62 +98,6 @@ rte_constant_bswap64(uint64_t x)
 		((x & 0xff00000000000000ULL) >> 56);
 }
 
-/*
- * An architecture-optimized byte swap for a 16-bit value.
- *
- * Do not use this function directly. The preferred function is rte_bswap16().
- */
-static inline uint16_t rte_arch_bswap16(uint16_t _x)
-{
-	register uint16_t x = _x;
-	asm volatile ("xchgb %b[x1],%h[x2]"
-		      : [x1] "=Q" (x)
-		      : [x2] "0" (x)
-		      );
-	return x;
-}
-
-/*
- * An architecture-optimized byte swap for a 32-bit value.
- *
- * Do not use this function directly. The preferred function is rte_bswap32().
- */
-static inline uint32_t rte_arch_bswap32(uint32_t _x)
-{
-	register uint32_t x = _x;
-	asm volatile ("bswap %[x]"
-		      : [x] "+r" (x)
-		      );
-	return x;
-}
-
-/*
- * An architecture-optimized byte swap for a 64-bit value.
- *
-  * Do not use this function directly. The preferred function is rte_bswap64().
- */
-#ifdef RTE_ARCH_X86_64
-/* 64-bit mode */
-static inline uint64_t rte_arch_bswap64(uint64_t _x)
-{
-	register uint64_t x = _x;
-	asm volatile ("bswap %[x]"
-		      : [x] "+r" (x)
-		      );
-	return x;
-}
-#else /* ! RTE_ARCH_X86_64 */
-/* Compat./Leg. mode */
-static inline uint64_t rte_arch_bswap64(uint64_t x)
-{
-	uint64_t ret = 0;
-	ret |= ((uint64_t)rte_arch_bswap32(x & 0xffffffffUL) << 32);
-	ret |= ((uint64_t)rte_arch_bswap32((x >> 32) & 0xffffffffUL));
-	return ret;
-}
-#endif /* RTE_ARCH_X86_64 */
-
-
 #ifndef RTE_FORCE_INTRINSICS
 /**
  * Swap bytes in a 16-bit value.
diff --git a/lib/librte_eal/common/include/x86_64/arch/rte_byteorder_arch.h b/lib/librte_eal/common/include/x86_64/arch/rte_byteorder_arch.h
new file mode 100644
index 0000000..06c1afc
--- /dev/null
+++ b/lib/librte_eal/common/include/x86_64/arch/rte_byteorder_arch.h
@@ -0,0 +1,95 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_BYTEORDER_ARCH_H_
+#define _RTE_BYTEORDER_ARCH_H_
+
+#include <stdint.h>
+
+/*
+ * An architecture-optimized byte swap for a 16-bit value.
+ *
+ * Do not use this function directly. The preferred function is rte_bswap16().
+ */
+static inline uint16_t rte_arch_bswap16(uint16_t _x)
+{
+	register uint16_t x = _x;
+	asm volatile ("xchgb %b[x1],%h[x2]"
+		      : [x1] "=Q" (x)
+		      : [x2] "0" (x)
+		      );
+	return x;
+}
+
+/*
+ * An architecture-optimized byte swap for a 32-bit value.
+ *
+ * Do not use this function directly. The preferred function is rte_bswap32().
+ */
+static inline uint32_t rte_arch_bswap32(uint32_t _x)
+{
+	register uint32_t x = _x;
+	asm volatile ("bswap %[x]"
+		      : [x] "+r" (x)
+		      );
+	return x;
+}
+
+/*
+ * An architecture-optimized byte swap for a 64-bit value.
+ *
+  * Do not use this function directly. The preferred function is rte_bswap64().
+ */
+#ifdef RTE_ARCH_X86_64
+/* 64-bit mode */
+static inline uint64_t rte_arch_bswap64(uint64_t _x)
+{
+	register uint64_t x = _x;
+	asm volatile ("bswap %[x]"
+		      : [x] "+r" (x)
+		      );
+	return x;
+}
+#else /* ! RTE_ARCH_X86_64 */
+/* Compat./Leg. mode */
+static inline uint64_t rte_arch_bswap64(uint64_t x)
+{
+	uint64_t ret = 0;
+	ret |= ((uint64_t)rte_arch_bswap32(x & 0xffffffffUL) << 32);
+	ret |= ((uint64_t)rte_arch_bswap32((x >> 32) & 0xffffffffUL));
+	return ret;
+}
+#endif /* RTE_ARCH_X86_64 */
+
+#endif /* _RTE_BYTEORDER_ARCH_H_ */
+
-- 
1.7.1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [dpdk-dev] [PATCH 3/7] Split CPU cycle operation to architecture specific
  2014-09-26  9:33 [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK Chao Zhu
  2014-09-26  9:33 ` [dpdk-dev] [PATCH 1/7] Split atomic operations to architecture specific Chao Zhu
  2014-09-26  9:33 ` [dpdk-dev] [PATCH 2/7] Split byte order " Chao Zhu
@ 2014-09-26  9:33 ` Chao Zhu
  2014-09-26  9:33 ` [dpdk-dev] [PATCH 4/7] Split prefetch operations " Chao Zhu
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Chao Zhu @ 2014-09-26  9:33 UTC (permalink / raw)
  To: dev

This patch splits the CPU TSC read operations from DPDK and push them to
architecture specific arch directories, so that other processor
architecture to support DPDK can be easily adopted.

Signed-off-by: Chao Zhu <bjzhuc@cn.ibm.com>
---
 lib/librte_eal/common/Makefile                     |    2 +-
 .../common/include/i686/arch/rte_cycles_arch.h     |  108 ++++++++++++++++++++
 lib/librte_eal/common/include/rte_cycles.h         |   31 +-----
 .../common/include/x86_64/arch/rte_cycles_arch.h   |  108 ++++++++++++++++++++
 4 files changed, 219 insertions(+), 30 deletions(-)
 create mode 100644 lib/librte_eal/common/include/i686/arch/rte_cycles_arch.h
 create mode 100644 lib/librte_eal/common/include/x86_64/arch/rte_cycles_arch.h

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index d588c94..0863aeb 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -46,7 +46,7 @@ ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
 INC += rte_warnings.h
 endif
 
-ARCH_INC := rte_atomic.h rte_atomic_arch.h rte_byteorder_arch.h
+ARCH_INC := rte_atomic.h rte_atomic_arch.h rte_byteorder_arch.h rte_cycles_arch.h
 
 SYMLINK-$(CONFIG_RTE_LIBRTE_EAL)-include := $(addprefix include/,$(INC))
 SYMLINK-$(CONFIG_RTE_LIBRTE_EAL)-include/arch := \
diff --git a/lib/librte_eal/common/include/i686/arch/rte_cycles_arch.h b/lib/librte_eal/common/include/i686/arch/rte_cycles_arch.h
new file mode 100644
index 0000000..1a4b3e0
--- /dev/null
+++ b/lib/librte_eal/common/include/i686/arch/rte_cycles_arch.h
@@ -0,0 +1,108 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+/*   BSD LICENSE
+ *
+ *   Copyright(c) 2013 6WIND.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_CYCLES_ARCH_H_
+#define _RTE_CYCLES_ARCH_H_
+
+#include <stdint.h>
+
+#ifdef RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT
+/** Global switch to use VMWARE mapping of TSC instead of RDTSC */
+extern int rte_cycles_vmware_tsc_map;
+#include <rte_branch_prediction.h>
+#endif
+
+/**
+ * Read the TSC register.
+ *
+ * @return
+ *   The TSC for this lcore.
+ */
+static inline uint64_t
+rte_arch_rdtsc(void)
+{
+	union {
+		uint64_t tsc_64;
+		struct {
+			uint32_t lo_32;
+			uint32_t hi_32;
+		};
+	} tsc;
+
+#ifdef RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT
+	if (unlikely(rte_cycles_vmware_tsc_map)) {
+		/* ecx = 0x10000 corresponds to the physical TSC for VMware */
+		asm volatile("rdpmc" :
+		             "=a" (tsc.lo_32),
+		             "=d" (tsc.hi_32) :
+		             "c"(0x10000));
+		return tsc.tsc_64;
+	}
+#endif
+
+	asm volatile("rdtsc" :
+		     "=a" (tsc.lo_32),
+		     "=d" (tsc.hi_32));
+	return tsc.tsc_64;
+}
+#endif /* _RTE_CYCLES_ARCH_H_ */
diff --git a/lib/librte_eal/common/include/rte_cycles.h b/lib/librte_eal/common/include/rte_cycles.h
index 9b4dbe1..022cfcc 100644
--- a/lib/librte_eal/common/include/rte_cycles.h
+++ b/lib/librte_eal/common/include/rte_cycles.h
@@ -74,15 +74,10 @@
 extern "C" {
 #endif
 
-#include <stdint.h>
 #include <rte_debug.h>
 #include <rte_atomic.h>
+#include <arch/rte_cycles_arch.h>
 
-#ifdef RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT
-/** Global switch to use VMWARE mapping of TSC instead of RDTSC */
-extern int rte_cycles_vmware_tsc_map;
-#include <rte_branch_prediction.h>
-#endif
 
 #define MS_PER_S 1000
 #define US_PER_S 1000000
@@ -103,29 +98,7 @@ extern enum timer_source eal_timer_source;
 static inline uint64_t
 rte_rdtsc(void)
 {
-	union {
-		uint64_t tsc_64;
-		struct {
-			uint32_t lo_32;
-			uint32_t hi_32;
-		};
-	} tsc;
-
-#ifdef RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT
-	if (unlikely(rte_cycles_vmware_tsc_map)) {
-		/* ecx = 0x10000 corresponds to the physical TSC for VMware */
-		asm volatile("rdpmc" :
-		             "=a" (tsc.lo_32),
-		             "=d" (tsc.hi_32) :
-		             "c"(0x10000));
-		return tsc.tsc_64;
-	}
-#endif
-
-	asm volatile("rdtsc" :
-		     "=a" (tsc.lo_32),
-		     "=d" (tsc.hi_32));
-	return tsc.tsc_64;
+	return rte_arch_rdtsc();
 }
 
 /**
diff --git a/lib/librte_eal/common/include/x86_64/arch/rte_cycles_arch.h b/lib/librte_eal/common/include/x86_64/arch/rte_cycles_arch.h
new file mode 100644
index 0000000..1a4b3e0
--- /dev/null
+++ b/lib/librte_eal/common/include/x86_64/arch/rte_cycles_arch.h
@@ -0,0 +1,108 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+/*   BSD LICENSE
+ *
+ *   Copyright(c) 2013 6WIND.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_CYCLES_ARCH_H_
+#define _RTE_CYCLES_ARCH_H_
+
+#include <stdint.h>
+
+#ifdef RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT
+/** Global switch to use VMWARE mapping of TSC instead of RDTSC */
+extern int rte_cycles_vmware_tsc_map;
+#include <rte_branch_prediction.h>
+#endif
+
+/**
+ * Read the TSC register.
+ *
+ * @return
+ *   The TSC for this lcore.
+ */
+static inline uint64_t
+rte_arch_rdtsc(void)
+{
+	union {
+		uint64_t tsc_64;
+		struct {
+			uint32_t lo_32;
+			uint32_t hi_32;
+		};
+	} tsc;
+
+#ifdef RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT
+	if (unlikely(rte_cycles_vmware_tsc_map)) {
+		/* ecx = 0x10000 corresponds to the physical TSC for VMware */
+		asm volatile("rdpmc" :
+		             "=a" (tsc.lo_32),
+		             "=d" (tsc.hi_32) :
+		             "c"(0x10000));
+		return tsc.tsc_64;
+	}
+#endif
+
+	asm volatile("rdtsc" :
+		     "=a" (tsc.lo_32),
+		     "=d" (tsc.hi_32));
+	return tsc.tsc_64;
+}
+#endif /* _RTE_CYCLES_ARCH_H_ */
-- 
1.7.1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [dpdk-dev] [PATCH 4/7] Split prefetch operations to architecture specific
  2014-09-26  9:33 [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK Chao Zhu
                   ` (2 preceding siblings ...)
  2014-09-26  9:33 ` [dpdk-dev] [PATCH 3/7] Split CPU cycle operation " Chao Zhu
@ 2014-09-26  9:33 ` Chao Zhu
  2014-09-26  9:33 ` [dpdk-dev] [PATCH 5/7] Split spinlock " Chao Zhu
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Chao Zhu @ 2014-09-26  9:33 UTC (permalink / raw)
  To: dev

This patch splits the prefetch operations from DPDK and push them to
architecture specific arch directories, so that other processor
architecture to support DPDK can be easily adopted.

Signed-off-by: Chao Zhu <bjzhuc@cn.ibm.com>
---
 lib/librte_eal/common/Makefile                     |    2 +-
 .../common/include/i686/arch/rte_prefetch_arch.h   |   68 ++++++++++++++++++++
 lib/librte_eal/common/include/rte_prefetch.h       |    7 +-
 .../common/include/x86_64/arch/rte_prefetch_arch.h |   68 ++++++++++++++++++++
 4 files changed, 141 insertions(+), 4 deletions(-)
 create mode 100644 lib/librte_eal/common/include/i686/arch/rte_prefetch_arch.h
 create mode 100644 lib/librte_eal/common/include/x86_64/arch/rte_prefetch_arch.h

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 0863aeb..bb175ca 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -46,7 +46,7 @@ ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
 INC += rte_warnings.h
 endif
 
-ARCH_INC := rte_atomic.h rte_atomic_arch.h rte_byteorder_arch.h rte_cycles_arch.h
+ARCH_INC := rte_atomic.h rte_atomic_arch.h rte_byteorder_arch.h rte_cycles_arch.h rte_prefetch_arch.h
 
 SYMLINK-$(CONFIG_RTE_LIBRTE_EAL)-include := $(addprefix include/,$(INC))
 SYMLINK-$(CONFIG_RTE_LIBRTE_EAL)-include/arch := \
diff --git a/lib/librte_eal/common/include/i686/arch/rte_prefetch_arch.h b/lib/librte_eal/common/include/i686/arch/rte_prefetch_arch.h
new file mode 100644
index 0000000..48cfaf5
--- /dev/null
+++ b/lib/librte_eal/common/include/i686/arch/rte_prefetch_arch.h
@@ -0,0 +1,68 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PREFETCH_ARCH_H_
+#define _RTE_PREFETCH_ARCH_H_
+
+/**
+ * Prefetch a cache line into all cache levels.
+ * @param p
+ *   Address to prefetch
+ */
+static inline void rte_arch_prefetch0(volatile void *p)
+{
+	asm volatile ("prefetcht0 %[p]" : [p] "+m" (*(volatile char *)p));
+}
+
+/**
+ * Prefetch a cache line into all cache levels except the 0th cache level.
+ * @param p
+ *   Address to prefetch
+ */
+static inline void rte_arch_prefetch1(volatile void *p)
+{
+	asm volatile ("prefetcht1 %[p]" : [p] "+m" (*(volatile char *)p));
+}
+
+/**
+ * Prefetch a cache line into all cache levels except the 0th and 1th cache
+ * levels.
+ * @param p
+ *   Address to prefetch
+ */
+static inline void rte_arch_prefetch2(volatile void *p)
+{
+	asm volatile ("prefetcht2 %[p]" : [p] "+m" (*(volatile char *)p));
+}
+
+#endif /* _RTE_PREFETCH_ARCH_H_ */
diff --git a/lib/librte_eal/common/include/rte_prefetch.h b/lib/librte_eal/common/include/rte_prefetch.h
index 8a691ef..0a45176 100644
--- a/lib/librte_eal/common/include/rte_prefetch.h
+++ b/lib/librte_eal/common/include/rte_prefetch.h
@@ -34,6 +34,7 @@
 #ifndef _RTE_PREFETCH_H_
 #define _RTE_PREFETCH_H_
 
+#include <arch/rte_prefetch_arch.h>
 /**
  * @file
  *
@@ -57,7 +58,7 @@ extern "C" {
  */
 static inline void rte_prefetch0(volatile void *p)
 {
-	asm volatile ("prefetcht0 %[p]" : [p] "+m" (*(volatile char *)p));
+	rte_arch_prefetch0(p);
 }
 
 /**
@@ -67,7 +68,7 @@ static inline void rte_prefetch0(volatile void *p)
  */
 static inline void rte_prefetch1(volatile void *p)
 {
-	asm volatile ("prefetcht1 %[p]" : [p] "+m" (*(volatile char *)p));
+	rte_arch_prefetch1(p);
 }
 
 /**
@@ -78,7 +79,7 @@ static inline void rte_prefetch1(volatile void *p)
  */
 static inline void rte_prefetch2(volatile void *p)
 {
-	asm volatile ("prefetcht2 %[p]" : [p] "+m" (*(volatile char *)p));
+	rte_arch_prefetch2(p);
 }
 
 #ifdef __cplusplus
diff --git a/lib/librte_eal/common/include/x86_64/arch/rte_prefetch_arch.h b/lib/librte_eal/common/include/x86_64/arch/rte_prefetch_arch.h
new file mode 100644
index 0000000..48cfaf5
--- /dev/null
+++ b/lib/librte_eal/common/include/x86_64/arch/rte_prefetch_arch.h
@@ -0,0 +1,68 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PREFETCH_ARCH_H_
+#define _RTE_PREFETCH_ARCH_H_
+
+/**
+ * Prefetch a cache line into all cache levels.
+ * @param p
+ *   Address to prefetch
+ */
+static inline void rte_arch_prefetch0(volatile void *p)
+{
+	asm volatile ("prefetcht0 %[p]" : [p] "+m" (*(volatile char *)p));
+}
+
+/**
+ * Prefetch a cache line into all cache levels except the 0th cache level.
+ * @param p
+ *   Address to prefetch
+ */
+static inline void rte_arch_prefetch1(volatile void *p)
+{
+	asm volatile ("prefetcht1 %[p]" : [p] "+m" (*(volatile char *)p));
+}
+
+/**
+ * Prefetch a cache line into all cache levels except the 0th and 1th cache
+ * levels.
+ * @param p
+ *   Address to prefetch
+ */
+static inline void rte_arch_prefetch2(volatile void *p)
+{
+	asm volatile ("prefetcht2 %[p]" : [p] "+m" (*(volatile char *)p));
+}
+
+#endif /* _RTE_PREFETCH_ARCH_H_ */
-- 
1.7.1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [dpdk-dev] [PATCH 5/7] Split spinlock operations to architecture specific
  2014-09-26  9:33 [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK Chao Zhu
                   ` (3 preceding siblings ...)
  2014-09-26  9:33 ` [dpdk-dev] [PATCH 4/7] Split prefetch operations " Chao Zhu
@ 2014-09-26  9:33 ` Chao Zhu
  2014-09-26  9:33 ` [dpdk-dev] [PATCH 6/7] Split memcpy operation " Chao Zhu
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Chao Zhu @ 2014-09-26  9:33 UTC (permalink / raw)
  To: dev

This patch splits the spinlock operations from DPDK and push them to
architecture specific arch directories, so that other processor
architecture to support DPDK can be easily adopted.

Signed-off-by: Chao Zhu <bjzhuc@cn.ibm.com>
---
 lib/librte_eal/common/Makefile                     |    2 +-
 .../common/include/i686/arch/rte_spinlock_arch.h   |  128 ++++++++++++++++++++
 lib/librte_eal/common/include/rte_spinlock.h       |   55 +--------
 .../common/include/x86_64/arch/rte_spinlock_arch.h |  128 ++++++++++++++++++++
 4 files changed, 261 insertions(+), 52 deletions(-)
 create mode 100644 lib/librte_eal/common/include/i686/arch/rte_spinlock_arch.h
 create mode 100644 lib/librte_eal/common/include/x86_64/arch/rte_spinlock_arch.h

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index bb175ca..249ea2f 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -46,7 +46,7 @@ ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
 INC += rte_warnings.h
 endif
 
-ARCH_INC := rte_atomic.h rte_atomic_arch.h rte_byteorder_arch.h rte_cycles_arch.h rte_prefetch_arch.h
+ARCH_INC := rte_atomic.h rte_atomic_arch.h rte_byteorder_arch.h rte_cycles_arch.h rte_prefetch_arch.h rte_spinlock_arch.h
 
 SYMLINK-$(CONFIG_RTE_LIBRTE_EAL)-include := $(addprefix include/,$(INC))
 SYMLINK-$(CONFIG_RTE_LIBRTE_EAL)-include/arch := \
diff --git a/lib/librte_eal/common/include/i686/arch/rte_spinlock_arch.h b/lib/librte_eal/common/include/i686/arch/rte_spinlock_arch.h
new file mode 100644
index 0000000..2b13dcd
--- /dev/null
+++ b/lib/librte_eal/common/include/i686/arch/rte_spinlock_arch.h
@@ -0,0 +1,128 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_SPINLOCK_ARCH_H_
+#define _RTE_SPINLOCK_ARCH_H_
+
+#include <rte_lcore.h>
+#ifdef RTE_FORCE_INTRINSICS
+#include <rte_common.h>
+#endif
+
+/**
+ * The rte_spinlock_t type.
+ */
+typedef struct {
+	volatile int locked; /**< lock status 0 = unlocked, 1 = locked */
+} rte_spinlock_t;
+
+/**
+ * Take the spinlock.
+ *
+ * @param sl
+ *   A pointer to the spinlock.
+ */
+static inline void
+rte_arch_spinlock_lock(rte_spinlock_t *sl)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	int lock_val = 1;
+	asm volatile (
+			"1:\n"
+			"xchg %[locked], %[lv]\n"
+			"test %[lv], %[lv]\n"
+			"jz 3f\n"
+			"2:\n"
+			"pause\n"
+			"cmpl $0, %[locked]\n"
+			"jnz 2b\n"
+			"jmp 1b\n"
+			"3:\n"
+			: [locked] "=m" (sl->locked), [lv] "=q" (lock_val)
+			: "[lv]" (lock_val)
+			: "memory");
+#else
+	while (__sync_lock_test_and_set(&sl->locked, 1))
+		while(sl->locked)
+			rte_pause();
+#endif
+}
+
+/**
+ * Release the spinlock.
+ *
+ * @param sl
+ *   A pointer to the spinlock.
+ */
+static inline void
+rte_arch_spinlock_unlock (rte_spinlock_t *sl)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	int unlock_val = 0;
+	asm volatile (
+			"xchg %[locked], %[ulv]\n"
+			: [locked] "=m" (sl->locked), [ulv] "=q" (unlock_val)
+			: "[ulv]" (unlock_val)
+			: "memory");
+#else
+	__sync_lock_release(&sl->locked);
+#endif
+}
+
+/**
+ * Try to take the lock.
+ *
+ * @param sl
+ *   A pointer to the spinlock.
+ * @return
+ *   1 if the lock is successfully taken; 0 otherwise.
+ */
+static inline int
+rte_arch_spinlock_trylock (rte_spinlock_t *sl)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	int lockval = 1;
+
+	asm volatile (
+			"xchg %[locked], %[lockval]"
+			: [locked] "=m" (sl->locked), [lockval] "=q" (lockval)
+			: "[lockval]" (lockval)
+			: "memory");
+
+	return (lockval == 0);
+#else
+	return (__sync_lock_test_and_set(&sl->locked,1) == 0);
+#endif
+}
+
+#endif /* _RTE_SPINLOCK_ARCH_H_ */
\ No newline at end of file
diff --git a/lib/librte_eal/common/include/rte_spinlock.h b/lib/librte_eal/common/include/rte_spinlock.h
index 661908d..1cab17f 100644
--- a/lib/librte_eal/common/include/rte_spinlock.h
+++ b/lib/librte_eal/common/include/rte_spinlock.h
@@ -55,13 +55,7 @@ extern "C" {
 #ifdef RTE_FORCE_INTRINSICS
 #include <rte_common.h>
 #endif
-
-/**
- * The rte_spinlock_t type.
- */
-typedef struct {
-	volatile int locked; /**< lock status 0 = unlocked, 1 = locked */
-} rte_spinlock_t;
+#include <arch/rte_spinlock_arch.h>
 
 /**
  * A static spinlock initializer.
@@ -89,27 +83,7 @@ rte_spinlock_init(rte_spinlock_t *sl)
 static inline void
 rte_spinlock_lock(rte_spinlock_t *sl)
 {
-#ifndef RTE_FORCE_INTRINSICS
-	int lock_val = 1;
-	asm volatile (
-			"1:\n"
-			"xchg %[locked], %[lv]\n"
-			"test %[lv], %[lv]\n"
-			"jz 3f\n"
-			"2:\n"
-			"pause\n"
-			"cmpl $0, %[locked]\n"
-			"jnz 2b\n"
-			"jmp 1b\n"
-			"3:\n"
-			: [locked] "=m" (sl->locked), [lv] "=q" (lock_val)
-			: "[lv]" (lock_val)
-			: "memory");
-#else
-	while (__sync_lock_test_and_set(&sl->locked, 1))
-		while(sl->locked)
-			rte_pause();
-#endif
+	rte_arch_spinlock_lock(sl);
 }
 
 /**
@@ -121,16 +95,7 @@ rte_spinlock_lock(rte_spinlock_t *sl)
 static inline void
 rte_spinlock_unlock (rte_spinlock_t *sl)
 {
-#ifndef RTE_FORCE_INTRINSICS
-	int unlock_val = 0;
-	asm volatile (
-			"xchg %[locked], %[ulv]\n"
-			: [locked] "=m" (sl->locked), [ulv] "=q" (unlock_val)
-			: "[ulv]" (unlock_val)
-			: "memory");
-#else
-	__sync_lock_release(&sl->locked);
-#endif
+	rte_arch_spinlock_unlock(sl);
 }
 
 /**
@@ -144,19 +109,7 @@ rte_spinlock_unlock (rte_spinlock_t *sl)
 static inline int
 rte_spinlock_trylock (rte_spinlock_t *sl)
 {
-#ifndef RTE_FORCE_INTRINSICS
-	int lockval = 1;
-
-	asm volatile (
-			"xchg %[locked], %[lockval]"
-			: [locked] "=m" (sl->locked), [lockval] "=q" (lockval)
-			: "[lockval]" (lockval)
-			: "memory");
-
-	return (lockval == 0);
-#else
-	return (__sync_lock_test_and_set(&sl->locked,1) == 0);
-#endif
+	return rte_arch_spinlock_trylock(sl);
 }
 
 /**
diff --git a/lib/librte_eal/common/include/x86_64/arch/rte_spinlock_arch.h b/lib/librte_eal/common/include/x86_64/arch/rte_spinlock_arch.h
new file mode 100644
index 0000000..2b13dcd
--- /dev/null
+++ b/lib/librte_eal/common/include/x86_64/arch/rte_spinlock_arch.h
@@ -0,0 +1,128 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_SPINLOCK_ARCH_H_
+#define _RTE_SPINLOCK_ARCH_H_
+
+#include <rte_lcore.h>
+#ifdef RTE_FORCE_INTRINSICS
+#include <rte_common.h>
+#endif
+
+/**
+ * The rte_spinlock_t type.
+ */
+typedef struct {
+	volatile int locked; /**< lock status 0 = unlocked, 1 = locked */
+} rte_spinlock_t;
+
+/**
+ * Take the spinlock.
+ *
+ * @param sl
+ *   A pointer to the spinlock.
+ */
+static inline void
+rte_arch_spinlock_lock(rte_spinlock_t *sl)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	int lock_val = 1;
+	asm volatile (
+			"1:\n"
+			"xchg %[locked], %[lv]\n"
+			"test %[lv], %[lv]\n"
+			"jz 3f\n"
+			"2:\n"
+			"pause\n"
+			"cmpl $0, %[locked]\n"
+			"jnz 2b\n"
+			"jmp 1b\n"
+			"3:\n"
+			: [locked] "=m" (sl->locked), [lv] "=q" (lock_val)
+			: "[lv]" (lock_val)
+			: "memory");
+#else
+	while (__sync_lock_test_and_set(&sl->locked, 1))
+		while(sl->locked)
+			rte_pause();
+#endif
+}
+
+/**
+ * Release the spinlock.
+ *
+ * @param sl
+ *   A pointer to the spinlock.
+ */
+static inline void
+rte_arch_spinlock_unlock (rte_spinlock_t *sl)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	int unlock_val = 0;
+	asm volatile (
+			"xchg %[locked], %[ulv]\n"
+			: [locked] "=m" (sl->locked), [ulv] "=q" (unlock_val)
+			: "[ulv]" (unlock_val)
+			: "memory");
+#else
+	__sync_lock_release(&sl->locked);
+#endif
+}
+
+/**
+ * Try to take the lock.
+ *
+ * @param sl
+ *   A pointer to the spinlock.
+ * @return
+ *   1 if the lock is successfully taken; 0 otherwise.
+ */
+static inline int
+rte_arch_spinlock_trylock (rte_spinlock_t *sl)
+{
+#ifndef RTE_FORCE_INTRINSICS
+	int lockval = 1;
+
+	asm volatile (
+			"xchg %[locked], %[lockval]"
+			: [locked] "=m" (sl->locked), [lockval] "=q" (lockval)
+			: "[lockval]" (lockval)
+			: "memory");
+
+	return (lockval == 0);
+#else
+	return (__sync_lock_test_and_set(&sl->locked,1) == 0);
+#endif
+}
+
+#endif /* _RTE_SPINLOCK_ARCH_H_ */
\ No newline at end of file
-- 
1.7.1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [dpdk-dev] [PATCH 6/7] Split memcpy operation to architecture specific
  2014-09-26  9:33 [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK Chao Zhu
                   ` (4 preceding siblings ...)
  2014-09-26  9:33 ` [dpdk-dev] [PATCH 5/7] Split spinlock " Chao Zhu
@ 2014-09-26  9:33 ` Chao Zhu
  2014-09-26  9:33 ` [dpdk-dev] [PATCH 7/7] Split CPU flags operations " Chao Zhu
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Chao Zhu @ 2014-09-26  9:33 UTC (permalink / raw)
  To: dev

This patch splits the vector instruction based memory copy from DPDK and
push them to architecture specific arch directories, so that other
processor architecture to support DPDK can be easily adopted.

Signed-off-by: Chao Zhu <bjzhuc@cn.ibm.com>
---
 lib/librte_eal/common/Makefile                     |    2 +-
 .../common/include/i686/arch/rte_memcpy_arch.h     |  199 ++++++++++++++++++++
 lib/librte_eal/common/include/rte_memcpy.h         |   95 +---------
 .../common/include/x86_64/arch/rte_memcpy_arch.h   |  199 ++++++++++++++++++++
 4 files changed, 406 insertions(+), 89 deletions(-)
 create mode 100644 lib/librte_eal/common/include/i686/arch/rte_memcpy_arch.h
 create mode 100644 lib/librte_eal/common/include/x86_64/arch/rte_memcpy_arch.h

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 249ea2f..4add1c1 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -46,7 +46,7 @@ ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
 INC += rte_warnings.h
 endif
 
-ARCH_INC := rte_atomic.h rte_atomic_arch.h rte_byteorder_arch.h rte_cycles_arch.h rte_prefetch_arch.h rte_spinlock_arch.h
+ARCH_INC := rte_atomic.h rte_atomic_arch.h rte_byteorder_arch.h rte_cycles_arch.h rte_prefetch_arch.h rte_spinlock_arch.h rte_memcpy_arch.h 
 
 SYMLINK-$(CONFIG_RTE_LIBRTE_EAL)-include := $(addprefix include/,$(INC))
 SYMLINK-$(CONFIG_RTE_LIBRTE_EAL)-include/arch := \
diff --git a/lib/librte_eal/common/include/i686/arch/rte_memcpy_arch.h b/lib/librte_eal/common/include/i686/arch/rte_memcpy_arch.h
new file mode 100644
index 0000000..44f7760
--- /dev/null
+++ b/lib/librte_eal/common/include/i686/arch/rte_memcpy_arch.h
@@ -0,0 +1,199 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_MEMCPY_ARCH_H_
+#define _RTE_MEMCPY_ARCH_H_
+
+#include <stdint.h>
+#include <string.h>
+#include <emmintrin.h>
+
+#ifdef __INTEL_COMPILER
+#pragma warning(disable:593) /* Stop unused variable warning (reg_a etc). */
+#endif
+
+/**
+ * Copy 16 bytes from one location to another using optimised SSE
+ * instructions. The locations should not overlap.
+ *
+ * @param dst
+ *   Pointer to the destination of the data.
+ * @param src
+ *   Pointer to the source data.
+ */
+static inline void
+rte_arch_mov16(uint8_t *dst, const uint8_t *src)
+{
+	__m128i reg_a;
+	asm volatile (
+		"movdqu (%[src]), %[reg_a]\n\t"
+		"movdqu %[reg_a], (%[dst])\n\t"
+		: [reg_a] "=x" (reg_a)
+		: [src] "r" (src),
+		  [dst] "r"(dst)
+		: "memory"
+	);
+}
+
+/**
+ * Copy 32 bytes from one location to another using optimised SSE
+ * instructions. The locations should not overlap.
+ *
+ * @param dst
+ *   Pointer to the destination of the data.
+ * @param src
+ *   Pointer to the source data.
+ */
+static inline void
+rte_arch_mov32(uint8_t *dst, const uint8_t *src)
+{
+	__m128i reg_a, reg_b;
+	asm volatile (
+		"movdqu (%[src]), %[reg_a]\n\t"
+		"movdqu 16(%[src]), %[reg_b]\n\t"
+		"movdqu %[reg_a], (%[dst])\n\t"
+		"movdqu %[reg_b], 16(%[dst])\n\t"
+		: [reg_a] "=x" (reg_a),
+		  [reg_b] "=x" (reg_b)
+		: [src] "r" (src),
+		  [dst] "r"(dst)
+		: "memory"
+	);
+}
+
+/**
+ * Copy 48 bytes from one location to another using optimised SSE
+ * instructions. The locations should not overlap.
+ *
+ * @param dst
+ *   Pointer to the destination of the data.
+ * @param src
+ *   Pointer to the source data.
+ */
+static inline void
+rte_arch_mov48(uint8_t *dst, const uint8_t *src)
+{
+	__m128i reg_a, reg_b, reg_c;
+	asm volatile (
+		"movdqu (%[src]), %[reg_a]\n\t"
+		"movdqu 16(%[src]), %[reg_b]\n\t"
+		"movdqu 32(%[src]), %[reg_c]\n\t"
+		"movdqu %[reg_a], (%[dst])\n\t"
+		"movdqu %[reg_b], 16(%[dst])\n\t"
+		"movdqu %[reg_c], 32(%[dst])\n\t"
+		: [reg_a] "=x" (reg_a),
+		  [reg_b] "=x" (reg_b),
+		  [reg_c] "=x" (reg_c)
+		: [src] "r" (src),
+		  [dst] "r"(dst)
+		: "memory"
+	);
+}
+
+/**
+ * Copy 64 bytes from one location to another using optimised SSE
+ * instructions. The locations should not overlap.
+ *
+ * @param dst
+ *   Pointer to the destination of the data.
+ * @param src
+ *   Pointer to the source data.
+ */
+static inline void
+rte_arch_mov64(uint8_t *dst, const uint8_t *src)
+{
+	__m128i reg_a, reg_b, reg_c, reg_d;
+	asm volatile (
+		"movdqu (%[src]), %[reg_a]\n\t"
+		"movdqu 16(%[src]), %[reg_b]\n\t"
+		"movdqu 32(%[src]), %[reg_c]\n\t"
+		"movdqu 48(%[src]), %[reg_d]\n\t"
+		"movdqu %[reg_a], (%[dst])\n\t"
+		"movdqu %[reg_b], 16(%[dst])\n\t"
+		"movdqu %[reg_c], 32(%[dst])\n\t"
+		"movdqu %[reg_d], 48(%[dst])\n\t"
+		: [reg_a] "=x" (reg_a),
+		  [reg_b] "=x" (reg_b),
+		  [reg_c] "=x" (reg_c),
+		  [reg_d] "=x" (reg_d)
+		: [src] "r" (src),
+		  [dst] "r"(dst)
+		: "memory"
+	);
+}
+
+/**
+ * Copy 128 bytes from one location to another using optimised SSE
+ * instructions. The locations should not overlap.
+ *
+ * @param dst
+ *   Pointer to the destination of the data.
+ * @param src
+ *   Pointer to the source data.
+ */
+static inline void
+rte_arch_mov128(uint8_t *dst, const uint8_t *src)
+{
+	__m128i reg_a, reg_b, reg_c, reg_d, reg_e, reg_f, reg_g, reg_h;
+	asm volatile (
+		"movdqu (%[src]), %[reg_a]\n\t"
+		"movdqu 16(%[src]), %[reg_b]\n\t"
+		"movdqu 32(%[src]), %[reg_c]\n\t"
+		"movdqu 48(%[src]), %[reg_d]\n\t"
+		"movdqu 64(%[src]), %[reg_e]\n\t"
+		"movdqu 80(%[src]), %[reg_f]\n\t"
+		"movdqu 96(%[src]), %[reg_g]\n\t"
+		"movdqu 112(%[src]), %[reg_h]\n\t"
+		"movdqu %[reg_a], (%[dst])\n\t"
+		"movdqu %[reg_b], 16(%[dst])\n\t"
+		"movdqu %[reg_c], 32(%[dst])\n\t"
+		"movdqu %[reg_d], 48(%[dst])\n\t"
+		"movdqu %[reg_e], 64(%[dst])\n\t"
+		"movdqu %[reg_f], 80(%[dst])\n\t"
+		"movdqu %[reg_g], 96(%[dst])\n\t"
+		"movdqu %[reg_h], 112(%[dst])\n\t"
+		: [reg_a] "=x" (reg_a),
+		  [reg_b] "=x" (reg_b),
+		  [reg_c] "=x" (reg_c),
+		  [reg_d] "=x" (reg_d),
+		  [reg_e] "=x" (reg_e),
+		  [reg_f] "=x" (reg_f),
+		  [reg_g] "=x" (reg_g),
+		  [reg_h] "=x" (reg_h)
+		: [src] "r" (src),
+		  [dst] "r"(dst)
+		: "memory"
+	);
+}
+
+#endif /* _RTE_MEMCPY_ARCH_H_ */
\ No newline at end of file
diff --git a/lib/librte_eal/common/include/rte_memcpy.h b/lib/librte_eal/common/include/rte_memcpy.h
index 131b196..11a099e 100644
--- a/lib/librte_eal/common/include/rte_memcpy.h
+++ b/lib/librte_eal/common/include/rte_memcpy.h
@@ -37,12 +37,10 @@
 /**
  * @file
  *
- * Functions for SSE implementation of memcpy().
+ * Functions for vector instruction implementation of memcpy().
  */
 
-#include <stdint.h>
-#include <string.h>
-#include <emmintrin.h>
+#include "arch/rte_memcpy_arch.h"
 
 #ifdef __cplusplus
 extern "C" {
@@ -64,15 +62,7 @@ extern "C" {
 static inline void
 rte_mov16(uint8_t *dst, const uint8_t *src)
 {
-	__m128i reg_a;
-	asm volatile (
-		"movdqu (%[src]), %[reg_a]\n\t"
-		"movdqu %[reg_a], (%[dst])\n\t"
-		: [reg_a] "=x" (reg_a)
-		: [src] "r" (src),
-		  [dst] "r"(dst)
-		: "memory"
-	);
+	rte_arch_mov16(dst, src);
 }
 
 /**
@@ -87,18 +77,7 @@ rte_mov16(uint8_t *dst, const uint8_t *src)
 static inline void
 rte_mov32(uint8_t *dst, const uint8_t *src)
 {
-	__m128i reg_a, reg_b;
-	asm volatile (
-		"movdqu (%[src]), %[reg_a]\n\t"
-		"movdqu 16(%[src]), %[reg_b]\n\t"
-		"movdqu %[reg_a], (%[dst])\n\t"
-		"movdqu %[reg_b], 16(%[dst])\n\t"
-		: [reg_a] "=x" (reg_a),
-		  [reg_b] "=x" (reg_b)
-		: [src] "r" (src),
-		  [dst] "r"(dst)
-		: "memory"
-	);
+	rte_arch_mov32(dst, src);
 }
 
 /**
@@ -113,21 +92,7 @@ rte_mov32(uint8_t *dst, const uint8_t *src)
 static inline void
 rte_mov48(uint8_t *dst, const uint8_t *src)
 {
-	__m128i reg_a, reg_b, reg_c;
-	asm volatile (
-		"movdqu (%[src]), %[reg_a]\n\t"
-		"movdqu 16(%[src]), %[reg_b]\n\t"
-		"movdqu 32(%[src]), %[reg_c]\n\t"
-		"movdqu %[reg_a], (%[dst])\n\t"
-		"movdqu %[reg_b], 16(%[dst])\n\t"
-		"movdqu %[reg_c], 32(%[dst])\n\t"
-		: [reg_a] "=x" (reg_a),
-		  [reg_b] "=x" (reg_b),
-		  [reg_c] "=x" (reg_c)
-		: [src] "r" (src),
-		  [dst] "r"(dst)
-		: "memory"
-	);
+	rte_arch_mov48(dst, src);
 }
 
 /**
@@ -142,24 +107,7 @@ rte_mov48(uint8_t *dst, const uint8_t *src)
 static inline void
 rte_mov64(uint8_t *dst, const uint8_t *src)
 {
-	__m128i reg_a, reg_b, reg_c, reg_d;
-	asm volatile (
-		"movdqu (%[src]), %[reg_a]\n\t"
-		"movdqu 16(%[src]), %[reg_b]\n\t"
-		"movdqu 32(%[src]), %[reg_c]\n\t"
-		"movdqu 48(%[src]), %[reg_d]\n\t"
-		"movdqu %[reg_a], (%[dst])\n\t"
-		"movdqu %[reg_b], 16(%[dst])\n\t"
-		"movdqu %[reg_c], 32(%[dst])\n\t"
-		"movdqu %[reg_d], 48(%[dst])\n\t"
-		: [reg_a] "=x" (reg_a),
-		  [reg_b] "=x" (reg_b),
-		  [reg_c] "=x" (reg_c),
-		  [reg_d] "=x" (reg_d)
-		: [src] "r" (src),
-		  [dst] "r"(dst)
-		: "memory"
-	);
+	rte_arch_mov64(dst, src);
 }
 
 /**
@@ -174,36 +122,7 @@ rte_mov64(uint8_t *dst, const uint8_t *src)
 static inline void
 rte_mov128(uint8_t *dst, const uint8_t *src)
 {
-	__m128i reg_a, reg_b, reg_c, reg_d, reg_e, reg_f, reg_g, reg_h;
-	asm volatile (
-		"movdqu (%[src]), %[reg_a]\n\t"
-		"movdqu 16(%[src]), %[reg_b]\n\t"
-		"movdqu 32(%[src]), %[reg_c]\n\t"
-		"movdqu 48(%[src]), %[reg_d]\n\t"
-		"movdqu 64(%[src]), %[reg_e]\n\t"
-		"movdqu 80(%[src]), %[reg_f]\n\t"
-		"movdqu 96(%[src]), %[reg_g]\n\t"
-		"movdqu 112(%[src]), %[reg_h]\n\t"
-		"movdqu %[reg_a], (%[dst])\n\t"
-		"movdqu %[reg_b], 16(%[dst])\n\t"
-		"movdqu %[reg_c], 32(%[dst])\n\t"
-		"movdqu %[reg_d], 48(%[dst])\n\t"
-		"movdqu %[reg_e], 64(%[dst])\n\t"
-		"movdqu %[reg_f], 80(%[dst])\n\t"
-		"movdqu %[reg_g], 96(%[dst])\n\t"
-		"movdqu %[reg_h], 112(%[dst])\n\t"
-		: [reg_a] "=x" (reg_a),
-		  [reg_b] "=x" (reg_b),
-		  [reg_c] "=x" (reg_c),
-		  [reg_d] "=x" (reg_d),
-		  [reg_e] "=x" (reg_e),
-		  [reg_f] "=x" (reg_f),
-		  [reg_g] "=x" (reg_g),
-		  [reg_h] "=x" (reg_h)
-		: [src] "r" (src),
-		  [dst] "r"(dst)
-		: "memory"
-	);
+	rte_arch_mov128(dst, src);
 }
 
 #ifdef __INTEL_COMPILER
diff --git a/lib/librte_eal/common/include/x86_64/arch/rte_memcpy_arch.h b/lib/librte_eal/common/include/x86_64/arch/rte_memcpy_arch.h
new file mode 100644
index 0000000..44f7760
--- /dev/null
+++ b/lib/librte_eal/common/include/x86_64/arch/rte_memcpy_arch.h
@@ -0,0 +1,199 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_MEMCPY_ARCH_H_
+#define _RTE_MEMCPY_ARCH_H_
+
+#include <stdint.h>
+#include <string.h>
+#include <emmintrin.h>
+
+#ifdef __INTEL_COMPILER
+#pragma warning(disable:593) /* Stop unused variable warning (reg_a etc). */
+#endif
+
+/**
+ * Copy 16 bytes from one location to another using optimised SSE
+ * instructions. The locations should not overlap.
+ *
+ * @param dst
+ *   Pointer to the destination of the data.
+ * @param src
+ *   Pointer to the source data.
+ */
+static inline void
+rte_arch_mov16(uint8_t *dst, const uint8_t *src)
+{
+	__m128i reg_a;
+	asm volatile (
+		"movdqu (%[src]), %[reg_a]\n\t"
+		"movdqu %[reg_a], (%[dst])\n\t"
+		: [reg_a] "=x" (reg_a)
+		: [src] "r" (src),
+		  [dst] "r"(dst)
+		: "memory"
+	);
+}
+
+/**
+ * Copy 32 bytes from one location to another using optimised SSE
+ * instructions. The locations should not overlap.
+ *
+ * @param dst
+ *   Pointer to the destination of the data.
+ * @param src
+ *   Pointer to the source data.
+ */
+static inline void
+rte_arch_mov32(uint8_t *dst, const uint8_t *src)
+{
+	__m128i reg_a, reg_b;
+	asm volatile (
+		"movdqu (%[src]), %[reg_a]\n\t"
+		"movdqu 16(%[src]), %[reg_b]\n\t"
+		"movdqu %[reg_a], (%[dst])\n\t"
+		"movdqu %[reg_b], 16(%[dst])\n\t"
+		: [reg_a] "=x" (reg_a),
+		  [reg_b] "=x" (reg_b)
+		: [src] "r" (src),
+		  [dst] "r"(dst)
+		: "memory"
+	);
+}
+
+/**
+ * Copy 48 bytes from one location to another using optimised SSE
+ * instructions. The locations should not overlap.
+ *
+ * @param dst
+ *   Pointer to the destination of the data.
+ * @param src
+ *   Pointer to the source data.
+ */
+static inline void
+rte_arch_mov48(uint8_t *dst, const uint8_t *src)
+{
+	__m128i reg_a, reg_b, reg_c;
+	asm volatile (
+		"movdqu (%[src]), %[reg_a]\n\t"
+		"movdqu 16(%[src]), %[reg_b]\n\t"
+		"movdqu 32(%[src]), %[reg_c]\n\t"
+		"movdqu %[reg_a], (%[dst])\n\t"
+		"movdqu %[reg_b], 16(%[dst])\n\t"
+		"movdqu %[reg_c], 32(%[dst])\n\t"
+		: [reg_a] "=x" (reg_a),
+		  [reg_b] "=x" (reg_b),
+		  [reg_c] "=x" (reg_c)
+		: [src] "r" (src),
+		  [dst] "r"(dst)
+		: "memory"
+	);
+}
+
+/**
+ * Copy 64 bytes from one location to another using optimised SSE
+ * instructions. The locations should not overlap.
+ *
+ * @param dst
+ *   Pointer to the destination of the data.
+ * @param src
+ *   Pointer to the source data.
+ */
+static inline void
+rte_arch_mov64(uint8_t *dst, const uint8_t *src)
+{
+	__m128i reg_a, reg_b, reg_c, reg_d;
+	asm volatile (
+		"movdqu (%[src]), %[reg_a]\n\t"
+		"movdqu 16(%[src]), %[reg_b]\n\t"
+		"movdqu 32(%[src]), %[reg_c]\n\t"
+		"movdqu 48(%[src]), %[reg_d]\n\t"
+		"movdqu %[reg_a], (%[dst])\n\t"
+		"movdqu %[reg_b], 16(%[dst])\n\t"
+		"movdqu %[reg_c], 32(%[dst])\n\t"
+		"movdqu %[reg_d], 48(%[dst])\n\t"
+		: [reg_a] "=x" (reg_a),
+		  [reg_b] "=x" (reg_b),
+		  [reg_c] "=x" (reg_c),
+		  [reg_d] "=x" (reg_d)
+		: [src] "r" (src),
+		  [dst] "r"(dst)
+		: "memory"
+	);
+}
+
+/**
+ * Copy 128 bytes from one location to another using optimised SSE
+ * instructions. The locations should not overlap.
+ *
+ * @param dst
+ *   Pointer to the destination of the data.
+ * @param src
+ *   Pointer to the source data.
+ */
+static inline void
+rte_arch_mov128(uint8_t *dst, const uint8_t *src)
+{
+	__m128i reg_a, reg_b, reg_c, reg_d, reg_e, reg_f, reg_g, reg_h;
+	asm volatile (
+		"movdqu (%[src]), %[reg_a]\n\t"
+		"movdqu 16(%[src]), %[reg_b]\n\t"
+		"movdqu 32(%[src]), %[reg_c]\n\t"
+		"movdqu 48(%[src]), %[reg_d]\n\t"
+		"movdqu 64(%[src]), %[reg_e]\n\t"
+		"movdqu 80(%[src]), %[reg_f]\n\t"
+		"movdqu 96(%[src]), %[reg_g]\n\t"
+		"movdqu 112(%[src]), %[reg_h]\n\t"
+		"movdqu %[reg_a], (%[dst])\n\t"
+		"movdqu %[reg_b], 16(%[dst])\n\t"
+		"movdqu %[reg_c], 32(%[dst])\n\t"
+		"movdqu %[reg_d], 48(%[dst])\n\t"
+		"movdqu %[reg_e], 64(%[dst])\n\t"
+		"movdqu %[reg_f], 80(%[dst])\n\t"
+		"movdqu %[reg_g], 96(%[dst])\n\t"
+		"movdqu %[reg_h], 112(%[dst])\n\t"
+		: [reg_a] "=x" (reg_a),
+		  [reg_b] "=x" (reg_b),
+		  [reg_c] "=x" (reg_c),
+		  [reg_d] "=x" (reg_d),
+		  [reg_e] "=x" (reg_e),
+		  [reg_f] "=x" (reg_f),
+		  [reg_g] "=x" (reg_g),
+		  [reg_h] "=x" (reg_h)
+		: [src] "r" (src),
+		  [dst] "r"(dst)
+		: "memory"
+	);
+}
+
+#endif /* _RTE_MEMCPY_ARCH_H_ */
\ No newline at end of file
-- 
1.7.1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [dpdk-dev] [PATCH 7/7] Split CPU flags operations to architecture specific
  2014-09-26  9:33 [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK Chao Zhu
                   ` (5 preceding siblings ...)
  2014-09-26  9:33 ` [dpdk-dev] [PATCH 6/7] Split memcpy operation " Chao Zhu
@ 2014-09-26  9:33 ` Chao Zhu
  2014-10-03 13:21 ` [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK David Marchand
  2014-10-06 21:46 ` Cyril Chemparathy
  8 siblings, 0 replies; 16+ messages in thread
From: Chao Zhu @ 2014-09-26  9:33 UTC (permalink / raw)
  To: dev

This patch splits CPU flags related operations from DPDK and push them
to architecture specific arch directories, so that other processor
architecture to support DPDK can be easily adopted.

Signed-off-by: Chao Zhu <bjzhuc@cn.ibm.com>
---
 lib/librte_eal/common/Makefile                     |    2 +-
 lib/librte_eal/common/eal_common_cpuflags.c        |  173 +----------
 .../common/include/i686/arch/rte_cpuflags_arch.h   |  335 ++++++++++++++++++++
 lib/librte_eal/common/include/rte_cpuflags.h       |  110 +-------
 .../common/include/x86_64/arch/rte_cpuflags_arch.h |  335 ++++++++++++++++++++
 5 files changed, 674 insertions(+), 281 deletions(-)
 create mode 100644 lib/librte_eal/common/include/i686/arch/rte_cpuflags_arch.h
 create mode 100644 lib/librte_eal/common/include/x86_64/arch/rte_cpuflags_arch.h

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 4add1c1..ff56255 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -46,7 +46,7 @@ ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
 INC += rte_warnings.h
 endif
 
-ARCH_INC := rte_atomic.h rte_atomic_arch.h rte_byteorder_arch.h rte_cycles_arch.h rte_prefetch_arch.h rte_spinlock_arch.h rte_memcpy_arch.h 
+ARCH_INC := rte_atomic.h rte_atomic_arch.h rte_byteorder_arch.h rte_cycles_arch.h rte_prefetch_arch.h rte_spinlock_arch.h rte_memcpy_arch.h rte_cpuflags_arch.h
 
 SYMLINK-$(CONFIG_RTE_LIBRTE_EAL)-include := $(addprefix include/,$(INC))
 SYMLINK-$(CONFIG_RTE_LIBRTE_EAL)-include/arch := \
diff --git a/lib/librte_eal/common/eal_common_cpuflags.c b/lib/librte_eal/common/eal_common_cpuflags.c
index 9e79179..aa58b05 100644
--- a/lib/librte_eal/common/eal_common_cpuflags.c
+++ b/lib/librte_eal/common/eal_common_cpuflags.c
@@ -30,10 +30,6 @@
  *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
-#include <stdlib.h>
-#include <stdio.h>
-#include <errno.h>
-#include <stdint.h>
 #include <rte_cpuflags.h>
 
 /*
@@ -49,131 +45,6 @@
 #endif
 #endif
 
-/**
- * Enumeration of CPU registers
- */
-enum cpu_register_t {
-	REG_EAX = 0,
-	REG_EBX,
-	REG_ECX,
-	REG_EDX,
-};
-
-typedef uint32_t cpuid_registers_t[4];
-
-#define CPU_FLAG_NAME_MAX_LEN 64
-
-/**
- * Struct to hold a processor feature entry
- */
-struct feature_entry {
-	uint32_t leaf;				/**< cpuid leaf */
-	uint32_t subleaf;			/**< cpuid subleaf */
-	uint32_t reg;				/**< cpuid register */
-	uint32_t bit;				/**< cpuid register bit */
-	char name[CPU_FLAG_NAME_MAX_LEN];       /**< String for printing */
-};
-
-#define FEAT_DEF(name, leaf, subleaf, reg, bit) \
-	[RTE_CPUFLAG_##name] = {leaf, subleaf, reg, bit, #name },
-
-/**
- * An array that holds feature entries
- */
-static const struct feature_entry cpu_feature_table[] = {
-	FEAT_DEF(SSE3, 0x00000001, 0, REG_ECX,  0)
-	FEAT_DEF(PCLMULQDQ, 0x00000001, 0, REG_ECX,  1)
-	FEAT_DEF(DTES64, 0x00000001, 0, REG_ECX,  2)
-	FEAT_DEF(MONITOR, 0x00000001, 0, REG_ECX,  3)
-	FEAT_DEF(DS_CPL, 0x00000001, 0, REG_ECX,  4)
-	FEAT_DEF(VMX, 0x00000001, 0, REG_ECX,  5)
-	FEAT_DEF(SMX, 0x00000001, 0, REG_ECX,  6)
-	FEAT_DEF(EIST, 0x00000001, 0, REG_ECX,  7)
-	FEAT_DEF(TM2, 0x00000001, 0, REG_ECX,  8)
-	FEAT_DEF(SSSE3, 0x00000001, 0, REG_ECX,  9)
-	FEAT_DEF(CNXT_ID, 0x00000001, 0, REG_ECX, 10)
-	FEAT_DEF(FMA, 0x00000001, 0, REG_ECX, 12)
-	FEAT_DEF(CMPXCHG16B, 0x00000001, 0, REG_ECX, 13)
-	FEAT_DEF(XTPR, 0x00000001, 0, REG_ECX, 14)
-	FEAT_DEF(PDCM, 0x00000001, 0, REG_ECX, 15)
-	FEAT_DEF(PCID, 0x00000001, 0, REG_ECX, 17)
-	FEAT_DEF(DCA, 0x00000001, 0, REG_ECX, 18)
-	FEAT_DEF(SSE4_1, 0x00000001, 0, REG_ECX, 19)
-	FEAT_DEF(SSE4_2, 0x00000001, 0, REG_ECX, 20)
-	FEAT_DEF(X2APIC, 0x00000001, 0, REG_ECX, 21)
-	FEAT_DEF(MOVBE, 0x00000001, 0, REG_ECX, 22)
-	FEAT_DEF(POPCNT, 0x00000001, 0, REG_ECX, 23)
-	FEAT_DEF(TSC_DEADLINE, 0x00000001, 0, REG_ECX, 24)
-	FEAT_DEF(AES, 0x00000001, 0, REG_ECX, 25)
-	FEAT_DEF(XSAVE, 0x00000001, 0, REG_ECX, 26)
-	FEAT_DEF(OSXSAVE, 0x00000001, 0, REG_ECX, 27)
-	FEAT_DEF(AVX, 0x00000001, 0, REG_ECX, 28)
-	FEAT_DEF(F16C, 0x00000001, 0, REG_ECX, 29)
-	FEAT_DEF(RDRAND, 0x00000001, 0, REG_ECX, 30)
-
-	FEAT_DEF(FPU, 0x00000001, 0, REG_EDX,  0)
-	FEAT_DEF(VME, 0x00000001, 0, REG_EDX,  1)
-	FEAT_DEF(DE, 0x00000001, 0, REG_EDX,  2)
-	FEAT_DEF(PSE, 0x00000001, 0, REG_EDX,  3)
-	FEAT_DEF(TSC, 0x00000001, 0, REG_EDX,  4)
-	FEAT_DEF(MSR, 0x00000001, 0, REG_EDX,  5)
-	FEAT_DEF(PAE, 0x00000001, 0, REG_EDX,  6)
-	FEAT_DEF(MCE, 0x00000001, 0, REG_EDX,  7)
-	FEAT_DEF(CX8, 0x00000001, 0, REG_EDX,  8)
-	FEAT_DEF(APIC, 0x00000001, 0, REG_EDX,  9)
-	FEAT_DEF(SEP, 0x00000001, 0, REG_EDX, 11)
-	FEAT_DEF(MTRR, 0x00000001, 0, REG_EDX, 12)
-	FEAT_DEF(PGE, 0x00000001, 0, REG_EDX, 13)
-	FEAT_DEF(MCA, 0x00000001, 0, REG_EDX, 14)
-	FEAT_DEF(CMOV, 0x00000001, 0, REG_EDX, 15)
-	FEAT_DEF(PAT, 0x00000001, 0, REG_EDX, 16)
-	FEAT_DEF(PSE36, 0x00000001, 0, REG_EDX, 17)
-	FEAT_DEF(PSN, 0x00000001, 0, REG_EDX, 18)
-	FEAT_DEF(CLFSH, 0x00000001, 0, REG_EDX, 19)
-	FEAT_DEF(DS, 0x00000001, 0, REG_EDX, 21)
-	FEAT_DEF(ACPI, 0x00000001, 0, REG_EDX, 22)
-	FEAT_DEF(MMX, 0x00000001, 0, REG_EDX, 23)
-	FEAT_DEF(FXSR, 0x00000001, 0, REG_EDX, 24)
-	FEAT_DEF(SSE, 0x00000001, 0, REG_EDX, 25)
-	FEAT_DEF(SSE2, 0x00000001, 0, REG_EDX, 26)
-	FEAT_DEF(SS, 0x00000001, 0, REG_EDX, 27)
-	FEAT_DEF(HTT, 0x00000001, 0, REG_EDX, 28)
-	FEAT_DEF(TM, 0x00000001, 0, REG_EDX, 29)
-	FEAT_DEF(PBE, 0x00000001, 0, REG_EDX, 31)
-
-	FEAT_DEF(DIGTEMP, 0x00000006, 0, REG_EAX,  0)
-	FEAT_DEF(TRBOBST, 0x00000006, 0, REG_EAX,  1)
-	FEAT_DEF(ARAT, 0x00000006, 0, REG_EAX,  2)
-	FEAT_DEF(PLN, 0x00000006, 0, REG_EAX,  4)
-	FEAT_DEF(ECMD, 0x00000006, 0, REG_EAX,  5)
-	FEAT_DEF(PTM, 0x00000006, 0, REG_EAX,  6)
-
-	FEAT_DEF(MPERF_APERF_MSR, 0x00000006, 0, REG_ECX,  0)
-	FEAT_DEF(ACNT2, 0x00000006, 0, REG_ECX,  1)
-	FEAT_DEF(ENERGY_EFF, 0x00000006, 0, REG_ECX,  3)
-
-	FEAT_DEF(FSGSBASE, 0x00000007, 0, REG_EBX,  0)
-	FEAT_DEF(BMI1, 0x00000007, 0, REG_EBX,  2)
-	FEAT_DEF(HLE, 0x00000007, 0, REG_EBX,  4)
-	FEAT_DEF(AVX2, 0x00000007, 0, REG_EBX,  5)
-	FEAT_DEF(SMEP, 0x00000007, 0, REG_EBX,  6)
-	FEAT_DEF(BMI2, 0x00000007, 0, REG_EBX,  7)
-	FEAT_DEF(ERMS, 0x00000007, 0, REG_EBX,  8)
-	FEAT_DEF(INVPCID, 0x00000007, 0, REG_EBX, 10)
-	FEAT_DEF(RTM, 0x00000007, 0, REG_EBX, 11)
-
-	FEAT_DEF(LAHF_SAHF, 0x80000001, 0, REG_ECX,  0)
-	FEAT_DEF(LZCNT, 0x80000001, 0, REG_ECX,  4)
-
-	FEAT_DEF(SYSCALL, 0x80000001, 0, REG_EDX, 11)
-	FEAT_DEF(XD, 0x80000001, 0, REG_EDX, 20)
-	FEAT_DEF(1GB_PG, 0x80000001, 0, REG_EDX, 26)
-	FEAT_DEF(RDTSCP, 0x80000001, 0, REG_EDX, 27)
-	FEAT_DEF(EM64T, 0x80000001, 0, REG_EDX, 29)
-
-	FEAT_DEF(INVTSC, 0x80000007, 0, REG_EDX,  8)
-};
-
 /*
  * Execute CPUID instruction and get contents of a specific register
  *
@@ -183,24 +54,7 @@ static const struct feature_entry cpu_feature_table[] = {
 static inline void
 rte_cpu_get_features(uint32_t leaf, uint32_t subleaf, cpuid_registers_t out)
 {
-#if defined(__i386__) && defined(__PIC__)
-    /* %ebx is a forbidden register if we compile with -fPIC or -fPIE */
-    asm volatile("movl %%ebx,%0 ; cpuid ; xchgl %%ebx,%0"
-		 : "=r" (out[REG_EBX]),
-		   "=a" (out[REG_EAX]),
-		   "=c" (out[REG_ECX]),
-		   "=d" (out[REG_EDX])
-		 : "a" (leaf), "c" (subleaf));
-#else
-
-    asm volatile("cpuid"
-		 : "=a" (out[REG_EAX]),
-		   "=b" (out[REG_EBX]),
-		   "=c" (out[REG_ECX]),
-		   "=d" (out[REG_EDX])
-		 : "a" (leaf), "c" (subleaf));
-
-#endif
+    rte_arch_cpu_get_features(leaf, subleaf, out);
 }
 
 /*
@@ -209,30 +63,7 @@ rte_cpu_get_features(uint32_t leaf, uint32_t subleaf, cpuid_registers_t out)
 int
 rte_cpu_get_flag_enabled(enum rte_cpu_flag_t feature)
 {
-	const struct feature_entry *feat;
-	cpuid_registers_t regs;
-
-
-	if (feature >= RTE_CPUFLAG_NUMFLAGS)
-		/* Flag does not match anything in the feature tables */
-		return -ENOENT;
-
-	feat = &cpu_feature_table[feature];
-
-	if (!feat->leaf)
-		/* This entry in the table wasn't filled out! */
-		return -EFAULT;
-
-	rte_cpu_get_features(feat->leaf & 0xffff0000, 0, regs);
-	if (((regs[REG_EAX] ^ feat->leaf) & 0xffff0000) ||
-	      regs[REG_EAX] < feat->leaf)
-		return 0;
-
-	/* get the cpuid leaf containing the desired feature */
-	rte_cpu_get_features(feat->leaf, feat->subleaf, regs);
-
-	/* check if the feature is enabled */
-	return (regs[feat->reg] >> feat->bit) & 1;
+	return rte_arch_cpu_get_flag_enabled(feature);
 }
 
 /**
diff --git a/lib/librte_eal/common/include/i686/arch/rte_cpuflags_arch.h b/lib/librte_eal/common/include/i686/arch/rte_cpuflags_arch.h
new file mode 100644
index 0000000..b2f078a
--- /dev/null
+++ b/lib/librte_eal/common/include/i686/arch/rte_cpuflags_arch.h
@@ -0,0 +1,335 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_CPUFLAGS_ARCH_H_
+#define _RTE_CPUFLAGS_ARCH_H_
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <errno.h>
+#include <stdint.h>
+
+/**
+ * Enumeration of CPU registers
+ */
+enum cpu_register_t {
+	REG_EAX = 0,
+	REG_EBX,
+	REG_ECX,
+	REG_EDX,
+};
+
+/**
+ * Enumeration of all CPU features supported
+ */
+enum rte_cpu_flag_t {
+	/* (EAX 01h) ECX features*/
+	RTE_CPUFLAG_SSE3 = 0,               /**< SSE3 */
+	RTE_CPUFLAG_PCLMULQDQ,              /**< PCLMULQDQ */
+	RTE_CPUFLAG_DTES64,                 /**< DTES64 */
+	RTE_CPUFLAG_MONITOR,                /**< MONITOR */
+	RTE_CPUFLAG_DS_CPL,                 /**< DS_CPL */
+	RTE_CPUFLAG_VMX,                    /**< VMX */
+	RTE_CPUFLAG_SMX,                    /**< SMX */
+	RTE_CPUFLAG_EIST,                   /**< EIST */
+	RTE_CPUFLAG_TM2,                    /**< TM2 */
+	RTE_CPUFLAG_SSSE3,                  /**< SSSE3 */
+	RTE_CPUFLAG_CNXT_ID,                /**< CNXT_ID */
+	RTE_CPUFLAG_FMA,                    /**< FMA */
+	RTE_CPUFLAG_CMPXCHG16B,             /**< CMPXCHG16B */
+	RTE_CPUFLAG_XTPR,                   /**< XTPR */
+	RTE_CPUFLAG_PDCM,                   /**< PDCM */
+	RTE_CPUFLAG_PCID,                   /**< PCID */
+	RTE_CPUFLAG_DCA,                    /**< DCA */
+	RTE_CPUFLAG_SSE4_1,                 /**< SSE4_1 */
+	RTE_CPUFLAG_SSE4_2,                 /**< SSE4_2 */
+	RTE_CPUFLAG_X2APIC,                 /**< X2APIC */
+	RTE_CPUFLAG_MOVBE,                  /**< MOVBE */
+	RTE_CPUFLAG_POPCNT,                 /**< POPCNT */
+	RTE_CPUFLAG_TSC_DEADLINE,           /**< TSC_DEADLINE */
+	RTE_CPUFLAG_AES,                    /**< AES */
+	RTE_CPUFLAG_XSAVE,                  /**< XSAVE */
+	RTE_CPUFLAG_OSXSAVE,                /**< OSXSAVE */
+	RTE_CPUFLAG_AVX,                    /**< AVX */
+	RTE_CPUFLAG_F16C,                   /**< F16C */
+	RTE_CPUFLAG_RDRAND,                 /**< RDRAND */
+
+	/* (EAX 01h) EDX features */
+	RTE_CPUFLAG_FPU,                    /**< FPU */
+	RTE_CPUFLAG_VME,                    /**< VME */
+	RTE_CPUFLAG_DE,                     /**< DE */
+	RTE_CPUFLAG_PSE,                    /**< PSE */
+	RTE_CPUFLAG_TSC,                    /**< TSC */
+	RTE_CPUFLAG_MSR,                    /**< MSR */
+	RTE_CPUFLAG_PAE,                    /**< PAE */
+	RTE_CPUFLAG_MCE,                    /**< MCE */
+	RTE_CPUFLAG_CX8,                    /**< CX8 */
+	RTE_CPUFLAG_APIC,                   /**< APIC */
+	RTE_CPUFLAG_SEP,                    /**< SEP */
+	RTE_CPUFLAG_MTRR,                   /**< MTRR */
+	RTE_CPUFLAG_PGE,                    /**< PGE */
+	RTE_CPUFLAG_MCA,                    /**< MCA */
+	RTE_CPUFLAG_CMOV,                   /**< CMOV */
+	RTE_CPUFLAG_PAT,                    /**< PAT */
+	RTE_CPUFLAG_PSE36,                  /**< PSE36 */
+	RTE_CPUFLAG_PSN,                    /**< PSN */
+	RTE_CPUFLAG_CLFSH,                  /**< CLFSH */
+	RTE_CPUFLAG_DS,                     /**< DS */
+	RTE_CPUFLAG_ACPI,                   /**< ACPI */
+	RTE_CPUFLAG_MMX,                    /**< MMX */
+	RTE_CPUFLAG_FXSR,                   /**< FXSR */
+	RTE_CPUFLAG_SSE,                    /**< SSE */
+	RTE_CPUFLAG_SSE2,                   /**< SSE2 */
+	RTE_CPUFLAG_SS,                     /**< SS */
+	RTE_CPUFLAG_HTT,                    /**< HTT */
+	RTE_CPUFLAG_TM,                     /**< TM */
+	RTE_CPUFLAG_PBE,                    /**< PBE */
+
+	/* (EAX 06h) EAX features */
+	RTE_CPUFLAG_DIGTEMP,                /**< DIGTEMP */
+	RTE_CPUFLAG_TRBOBST,                /**< TRBOBST */
+	RTE_CPUFLAG_ARAT,                   /**< ARAT */
+	RTE_CPUFLAG_PLN,                    /**< PLN */
+	RTE_CPUFLAG_ECMD,                   /**< ECMD */
+	RTE_CPUFLAG_PTM,                    /**< PTM */
+
+	/* (EAX 06h) ECX features */
+	RTE_CPUFLAG_MPERF_APERF_MSR,        /**< MPERF_APERF_MSR */
+	RTE_CPUFLAG_ACNT2,                  /**< ACNT2 */
+	RTE_CPUFLAG_ENERGY_EFF,             /**< ENERGY_EFF */
+
+	/* (EAX 07h, ECX 0h) EBX features */
+	RTE_CPUFLAG_FSGSBASE,               /**< FSGSBASE */
+	RTE_CPUFLAG_BMI1,                   /**< BMI1 */
+	RTE_CPUFLAG_HLE,                    /**< Hardware Lock elision */
+	RTE_CPUFLAG_AVX2,                   /**< AVX2 */
+	RTE_CPUFLAG_SMEP,                   /**< SMEP */
+	RTE_CPUFLAG_BMI2,                   /**< BMI2 */
+	RTE_CPUFLAG_ERMS,                   /**< ERMS */
+	RTE_CPUFLAG_INVPCID,                /**< INVPCID */
+	RTE_CPUFLAG_RTM,                    /**< Transactional memory */
+
+	/* (EAX 80000001h) ECX features */
+	RTE_CPUFLAG_LAHF_SAHF,              /**< LAHF_SAHF */
+	RTE_CPUFLAG_LZCNT,                  /**< LZCNT */
+
+	/* (EAX 80000001h) EDX features */
+	RTE_CPUFLAG_SYSCALL,                /**< SYSCALL */
+	RTE_CPUFLAG_XD,                     /**< XD */
+	RTE_CPUFLAG_1GB_PG,                 /**< 1GB_PG */
+	RTE_CPUFLAG_RDTSCP,                 /**< RDTSCP */
+	RTE_CPUFLAG_EM64T,                  /**< EM64T */
+
+	/* (EAX 80000007h) EDX features */
+	RTE_CPUFLAG_INVTSC,                 /**< INVTSC */
+
+	/* The last item */
+	RTE_CPUFLAG_NUMFLAGS,               /**< This should always be the last! */
+};
+
+typedef uint32_t cpuid_registers_t[4];
+
+#define CPU_FLAG_NAME_MAX_LEN 64
+
+/**
+ * Struct to hold a processor feature entry
+ */
+struct feature_entry {
+	uint32_t leaf;				/**< cpuid leaf */
+	uint32_t subleaf;			/**< cpuid subleaf */
+	uint32_t reg;				/**< cpuid register */
+	uint32_t bit;				/**< cpuid register bit */
+	char name[CPU_FLAG_NAME_MAX_LEN];       /**< String for printing */
+};
+
+#define FEAT_DEF(name, leaf, subleaf, reg, bit) \
+	[RTE_CPUFLAG_##name] = {leaf, subleaf, reg, bit, #name },
+
+/**
+ * An array that holds feature entries
+ */
+static const struct feature_entry cpu_feature_table[] = {
+	FEAT_DEF(SSE3, 0x00000001, 0, REG_ECX,  0)
+	FEAT_DEF(PCLMULQDQ, 0x00000001, 0, REG_ECX,  1)
+	FEAT_DEF(DTES64, 0x00000001, 0, REG_ECX,  2)
+	FEAT_DEF(MONITOR, 0x00000001, 0, REG_ECX,  3)
+	FEAT_DEF(DS_CPL, 0x00000001, 0, REG_ECX,  4)
+	FEAT_DEF(VMX, 0x00000001, 0, REG_ECX,  5)
+	FEAT_DEF(SMX, 0x00000001, 0, REG_ECX,  6)
+	FEAT_DEF(EIST, 0x00000001, 0, REG_ECX,  7)
+	FEAT_DEF(TM2, 0x00000001, 0, REG_ECX,  8)
+	FEAT_DEF(SSSE3, 0x00000001, 0, REG_ECX,  9)
+	FEAT_DEF(CNXT_ID, 0x00000001, 0, REG_ECX, 10)
+	FEAT_DEF(FMA, 0x00000001, 0, REG_ECX, 12)
+	FEAT_DEF(CMPXCHG16B, 0x00000001, 0, REG_ECX, 13)
+	FEAT_DEF(XTPR, 0x00000001, 0, REG_ECX, 14)
+	FEAT_DEF(PDCM, 0x00000001, 0, REG_ECX, 15)
+	FEAT_DEF(PCID, 0x00000001, 0, REG_ECX, 17)
+	FEAT_DEF(DCA, 0x00000001, 0, REG_ECX, 18)
+	FEAT_DEF(SSE4_1, 0x00000001, 0, REG_ECX, 19)
+	FEAT_DEF(SSE4_2, 0x00000001, 0, REG_ECX, 20)
+	FEAT_DEF(X2APIC, 0x00000001, 0, REG_ECX, 21)
+	FEAT_DEF(MOVBE, 0x00000001, 0, REG_ECX, 22)
+	FEAT_DEF(POPCNT, 0x00000001, 0, REG_ECX, 23)
+	FEAT_DEF(TSC_DEADLINE, 0x00000001, 0, REG_ECX, 24)
+	FEAT_DEF(AES, 0x00000001, 0, REG_ECX, 25)
+	FEAT_DEF(XSAVE, 0x00000001, 0, REG_ECX, 26)
+	FEAT_DEF(OSXSAVE, 0x00000001, 0, REG_ECX, 27)
+	FEAT_DEF(AVX, 0x00000001, 0, REG_ECX, 28)
+	FEAT_DEF(F16C, 0x00000001, 0, REG_ECX, 29)
+	FEAT_DEF(RDRAND, 0x00000001, 0, REG_ECX, 30)
+
+	FEAT_DEF(FPU, 0x00000001, 0, REG_EDX,  0)
+	FEAT_DEF(VME, 0x00000001, 0, REG_EDX,  1)
+	FEAT_DEF(DE, 0x00000001, 0, REG_EDX,  2)
+	FEAT_DEF(PSE, 0x00000001, 0, REG_EDX,  3)
+	FEAT_DEF(TSC, 0x00000001, 0, REG_EDX,  4)
+	FEAT_DEF(MSR, 0x00000001, 0, REG_EDX,  5)
+	FEAT_DEF(PAE, 0x00000001, 0, REG_EDX,  6)
+	FEAT_DEF(MCE, 0x00000001, 0, REG_EDX,  7)
+	FEAT_DEF(CX8, 0x00000001, 0, REG_EDX,  8)
+	FEAT_DEF(APIC, 0x00000001, 0, REG_EDX,  9)
+	FEAT_DEF(SEP, 0x00000001, 0, REG_EDX, 11)
+	FEAT_DEF(MTRR, 0x00000001, 0, REG_EDX, 12)
+	FEAT_DEF(PGE, 0x00000001, 0, REG_EDX, 13)
+	FEAT_DEF(MCA, 0x00000001, 0, REG_EDX, 14)
+	FEAT_DEF(CMOV, 0x00000001, 0, REG_EDX, 15)
+	FEAT_DEF(PAT, 0x00000001, 0, REG_EDX, 16)
+	FEAT_DEF(PSE36, 0x00000001, 0, REG_EDX, 17)
+	FEAT_DEF(PSN, 0x00000001, 0, REG_EDX, 18)
+	FEAT_DEF(CLFSH, 0x00000001, 0, REG_EDX, 19)
+	FEAT_DEF(DS, 0x00000001, 0, REG_EDX, 21)
+	FEAT_DEF(ACPI, 0x00000001, 0, REG_EDX, 22)
+	FEAT_DEF(MMX, 0x00000001, 0, REG_EDX, 23)
+	FEAT_DEF(FXSR, 0x00000001, 0, REG_EDX, 24)
+	FEAT_DEF(SSE, 0x00000001, 0, REG_EDX, 25)
+	FEAT_DEF(SSE2, 0x00000001, 0, REG_EDX, 26)
+	FEAT_DEF(SS, 0x00000001, 0, REG_EDX, 27)
+	FEAT_DEF(HTT, 0x00000001, 0, REG_EDX, 28)
+	FEAT_DEF(TM, 0x00000001, 0, REG_EDX, 29)
+	FEAT_DEF(PBE, 0x00000001, 0, REG_EDX, 31)
+
+	FEAT_DEF(DIGTEMP, 0x00000006, 0, REG_EAX,  0)
+	FEAT_DEF(TRBOBST, 0x00000006, 0, REG_EAX,  1)
+	FEAT_DEF(ARAT, 0x00000006, 0, REG_EAX,  2)
+	FEAT_DEF(PLN, 0x00000006, 0, REG_EAX,  4)
+	FEAT_DEF(ECMD, 0x00000006, 0, REG_EAX,  5)
+	FEAT_DEF(PTM, 0x00000006, 0, REG_EAX,  6)
+
+	FEAT_DEF(MPERF_APERF_MSR, 0x00000006, 0, REG_ECX,  0)
+	FEAT_DEF(ACNT2, 0x00000006, 0, REG_ECX,  1)
+	FEAT_DEF(ENERGY_EFF, 0x00000006, 0, REG_ECX,  3)
+
+	FEAT_DEF(FSGSBASE, 0x00000007, 0, REG_EBX,  0)
+	FEAT_DEF(BMI1, 0x00000007, 0, REG_EBX,  2)
+	FEAT_DEF(HLE, 0x00000007, 0, REG_EBX,  4)
+	FEAT_DEF(AVX2, 0x00000007, 0, REG_EBX,  5)
+	FEAT_DEF(SMEP, 0x00000007, 0, REG_EBX,  6)
+	FEAT_DEF(BMI2, 0x00000007, 0, REG_EBX,  7)
+	FEAT_DEF(ERMS, 0x00000007, 0, REG_EBX,  8)
+	FEAT_DEF(INVPCID, 0x00000007, 0, REG_EBX, 10)
+	FEAT_DEF(RTM, 0x00000007, 0, REG_EBX, 11)
+
+	FEAT_DEF(LAHF_SAHF, 0x80000001, 0, REG_ECX,  0)
+	FEAT_DEF(LZCNT, 0x80000001, 0, REG_ECX,  4)
+
+	FEAT_DEF(SYSCALL, 0x80000001, 0, REG_EDX, 11)
+	FEAT_DEF(XD, 0x80000001, 0, REG_EDX, 20)
+	FEAT_DEF(1GB_PG, 0x80000001, 0, REG_EDX, 26)
+	FEAT_DEF(RDTSCP, 0x80000001, 0, REG_EDX, 27)
+	FEAT_DEF(EM64T, 0x80000001, 0, REG_EDX, 29)
+
+	FEAT_DEF(INVTSC, 0x80000007, 0, REG_EDX,  8)
+};
+
+/*
+ * Execute CPUID instruction and get contents of a specific register
+ *
+ * This function, when compiled with GCC, will generate architecture-neutral
+ * code, as per GCC manual.
+ */
+static inline void
+rte_arch_cpu_get_features(uint32_t leaf, uint32_t subleaf, cpuid_registers_t out)
+{
+#if defined(__i386__) && defined(__PIC__)
+    /* %ebx is a forbidden register if we compile with -fPIC or -fPIE */
+    asm volatile("movl %%ebx,%0 ; cpuid ; xchgl %%ebx,%0"
+		 : "=r" (out[REG_EBX]),
+		   "=a" (out[REG_EAX]),
+		   "=c" (out[REG_ECX]),
+		   "=d" (out[REG_EDX])
+		 : "a" (leaf), "c" (subleaf));
+#else
+
+    asm volatile("cpuid"
+		 : "=a" (out[REG_EAX]),
+		   "=b" (out[REG_EBX]),
+		   "=c" (out[REG_ECX]),
+		   "=d" (out[REG_EDX])
+		 : "a" (leaf), "c" (subleaf));
+
+#endif
+}
+
+/*
+ * Checks if a particular flag is available on current machine.
+ */
+static inline int
+rte_arch_cpu_get_flag_enabled(enum rte_cpu_flag_t feature)
+{
+	const struct feature_entry *feat;
+	cpuid_registers_t regs;
+
+	if (feature >= RTE_CPUFLAG_NUMFLAGS)
+		/* Flag does not match anything in the feature tables */
+		return -ENOENT;
+
+	feat = &cpu_feature_table[feature];
+
+	if (!feat->leaf)
+		/* This entry in the table wasn't filled out! */
+		return -EFAULT;
+
+	rte_arch_cpu_get_features(feat->leaf & 0xffff0000, 0, regs);
+	if (((regs[REG_EAX] ^ feat->leaf) & 0xffff0000) ||
+	      regs[REG_EAX] < feat->leaf)
+		return 0;
+
+	/* get the cpuid leaf containing the desired feature */
+	rte_arch_cpu_get_features(feat->leaf, feat->subleaf, regs);
+
+	/* check if the feature is enabled */
+	return (regs[feat->reg] >> feat->bit) & 1;
+}
+
+#endif /* _RTE_CPUFLAGS_ARCH_H_ */
\ No newline at end of file
diff --git a/lib/librte_eal/common/include/rte_cpuflags.h b/lib/librte_eal/common/include/rte_cpuflags.h
index 5fa96db..4179c61 100644
--- a/lib/librte_eal/common/include/rte_cpuflags.h
+++ b/lib/librte_eal/common/include/rte_cpuflags.h
@@ -43,115 +43,7 @@
 extern "C" {
 #endif
 
-
-/**
- * Enumeration of all CPU features supported
- */
-enum rte_cpu_flag_t {
-	/* (EAX 01h) ECX features*/
-	RTE_CPUFLAG_SSE3 = 0,               /**< SSE3 */
-	RTE_CPUFLAG_PCLMULQDQ,              /**< PCLMULQDQ */
-	RTE_CPUFLAG_DTES64,                 /**< DTES64 */
-	RTE_CPUFLAG_MONITOR,                /**< MONITOR */
-	RTE_CPUFLAG_DS_CPL,                 /**< DS_CPL */
-	RTE_CPUFLAG_VMX,                    /**< VMX */
-	RTE_CPUFLAG_SMX,                    /**< SMX */
-	RTE_CPUFLAG_EIST,                   /**< EIST */
-	RTE_CPUFLAG_TM2,                    /**< TM2 */
-	RTE_CPUFLAG_SSSE3,                  /**< SSSE3 */
-	RTE_CPUFLAG_CNXT_ID,                /**< CNXT_ID */
-	RTE_CPUFLAG_FMA,                    /**< FMA */
-	RTE_CPUFLAG_CMPXCHG16B,             /**< CMPXCHG16B */
-	RTE_CPUFLAG_XTPR,                   /**< XTPR */
-	RTE_CPUFLAG_PDCM,                   /**< PDCM */
-	RTE_CPUFLAG_PCID,                   /**< PCID */
-	RTE_CPUFLAG_DCA,                    /**< DCA */
-	RTE_CPUFLAG_SSE4_1,                 /**< SSE4_1 */
-	RTE_CPUFLAG_SSE4_2,                 /**< SSE4_2 */
-	RTE_CPUFLAG_X2APIC,                 /**< X2APIC */
-	RTE_CPUFLAG_MOVBE,                  /**< MOVBE */
-	RTE_CPUFLAG_POPCNT,                 /**< POPCNT */
-	RTE_CPUFLAG_TSC_DEADLINE,           /**< TSC_DEADLINE */
-	RTE_CPUFLAG_AES,                    /**< AES */
-	RTE_CPUFLAG_XSAVE,                  /**< XSAVE */
-	RTE_CPUFLAG_OSXSAVE,                /**< OSXSAVE */
-	RTE_CPUFLAG_AVX,                    /**< AVX */
-	RTE_CPUFLAG_F16C,                   /**< F16C */
-	RTE_CPUFLAG_RDRAND,                 /**< RDRAND */
-
-	/* (EAX 01h) EDX features */
-	RTE_CPUFLAG_FPU,                    /**< FPU */
-	RTE_CPUFLAG_VME,                    /**< VME */
-	RTE_CPUFLAG_DE,                     /**< DE */
-	RTE_CPUFLAG_PSE,                    /**< PSE */
-	RTE_CPUFLAG_TSC,                    /**< TSC */
-	RTE_CPUFLAG_MSR,                    /**< MSR */
-	RTE_CPUFLAG_PAE,                    /**< PAE */
-	RTE_CPUFLAG_MCE,                    /**< MCE */
-	RTE_CPUFLAG_CX8,                    /**< CX8 */
-	RTE_CPUFLAG_APIC,                   /**< APIC */
-	RTE_CPUFLAG_SEP,                    /**< SEP */
-	RTE_CPUFLAG_MTRR,                   /**< MTRR */
-	RTE_CPUFLAG_PGE,                    /**< PGE */
-	RTE_CPUFLAG_MCA,                    /**< MCA */
-	RTE_CPUFLAG_CMOV,                   /**< CMOV */
-	RTE_CPUFLAG_PAT,                    /**< PAT */
-	RTE_CPUFLAG_PSE36,                  /**< PSE36 */
-	RTE_CPUFLAG_PSN,                    /**< PSN */
-	RTE_CPUFLAG_CLFSH,                  /**< CLFSH */
-	RTE_CPUFLAG_DS,                     /**< DS */
-	RTE_CPUFLAG_ACPI,                   /**< ACPI */
-	RTE_CPUFLAG_MMX,                    /**< MMX */
-	RTE_CPUFLAG_FXSR,                   /**< FXSR */
-	RTE_CPUFLAG_SSE,                    /**< SSE */
-	RTE_CPUFLAG_SSE2,                   /**< SSE2 */
-	RTE_CPUFLAG_SS,                     /**< SS */
-	RTE_CPUFLAG_HTT,                    /**< HTT */
-	RTE_CPUFLAG_TM,                     /**< TM */
-	RTE_CPUFLAG_PBE,                    /**< PBE */
-
-	/* (EAX 06h) EAX features */
-	RTE_CPUFLAG_DIGTEMP,                /**< DIGTEMP */
-	RTE_CPUFLAG_TRBOBST,                /**< TRBOBST */
-	RTE_CPUFLAG_ARAT,                   /**< ARAT */
-	RTE_CPUFLAG_PLN,                    /**< PLN */
-	RTE_CPUFLAG_ECMD,                   /**< ECMD */
-	RTE_CPUFLAG_PTM,                    /**< PTM */
-
-	/* (EAX 06h) ECX features */
-	RTE_CPUFLAG_MPERF_APERF_MSR,        /**< MPERF_APERF_MSR */
-	RTE_CPUFLAG_ACNT2,                  /**< ACNT2 */
-	RTE_CPUFLAG_ENERGY_EFF,             /**< ENERGY_EFF */
-
-	/* (EAX 07h, ECX 0h) EBX features */
-	RTE_CPUFLAG_FSGSBASE,               /**< FSGSBASE */
-	RTE_CPUFLAG_BMI1,                   /**< BMI1 */
-	RTE_CPUFLAG_HLE,                    /**< Hardware Lock elision */
-	RTE_CPUFLAG_AVX2,                   /**< AVX2 */
-	RTE_CPUFLAG_SMEP,                   /**< SMEP */
-	RTE_CPUFLAG_BMI2,                   /**< BMI2 */
-	RTE_CPUFLAG_ERMS,                   /**< ERMS */
-	RTE_CPUFLAG_INVPCID,                /**< INVPCID */
-	RTE_CPUFLAG_RTM,                    /**< Transactional memory */
-
-	/* (EAX 80000001h) ECX features */
-	RTE_CPUFLAG_LAHF_SAHF,              /**< LAHF_SAHF */
-	RTE_CPUFLAG_LZCNT,                  /**< LZCNT */
-
-	/* (EAX 80000001h) EDX features */
-	RTE_CPUFLAG_SYSCALL,                /**< SYSCALL */
-	RTE_CPUFLAG_XD,                     /**< XD */
-	RTE_CPUFLAG_1GB_PG,                 /**< 1GB_PG */
-	RTE_CPUFLAG_RDTSCP,                 /**< RDTSCP */
-	RTE_CPUFLAG_EM64T,                  /**< EM64T */
-
-	/* (EAX 80000007h) EDX features */
-	RTE_CPUFLAG_INVTSC,                 /**< INVTSC */
-
-	/* The last item */
-	RTE_CPUFLAG_NUMFLAGS,               /**< This should always be the last! */
-};
-
+#include "arch/rte_cpuflags_arch.h"
 
 /**
  * Function for checking a CPU flag availability
diff --git a/lib/librte_eal/common/include/x86_64/arch/rte_cpuflags_arch.h b/lib/librte_eal/common/include/x86_64/arch/rte_cpuflags_arch.h
new file mode 100644
index 0000000..b2f078a
--- /dev/null
+++ b/lib/librte_eal/common/include/x86_64/arch/rte_cpuflags_arch.h
@@ -0,0 +1,335 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_CPUFLAGS_ARCH_H_
+#define _RTE_CPUFLAGS_ARCH_H_
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <errno.h>
+#include <stdint.h>
+
+/**
+ * Enumeration of CPU registers
+ */
+enum cpu_register_t {
+	REG_EAX = 0,
+	REG_EBX,
+	REG_ECX,
+	REG_EDX,
+};
+
+/**
+ * Enumeration of all CPU features supported
+ */
+enum rte_cpu_flag_t {
+	/* (EAX 01h) ECX features*/
+	RTE_CPUFLAG_SSE3 = 0,               /**< SSE3 */
+	RTE_CPUFLAG_PCLMULQDQ,              /**< PCLMULQDQ */
+	RTE_CPUFLAG_DTES64,                 /**< DTES64 */
+	RTE_CPUFLAG_MONITOR,                /**< MONITOR */
+	RTE_CPUFLAG_DS_CPL,                 /**< DS_CPL */
+	RTE_CPUFLAG_VMX,                    /**< VMX */
+	RTE_CPUFLAG_SMX,                    /**< SMX */
+	RTE_CPUFLAG_EIST,                   /**< EIST */
+	RTE_CPUFLAG_TM2,                    /**< TM2 */
+	RTE_CPUFLAG_SSSE3,                  /**< SSSE3 */
+	RTE_CPUFLAG_CNXT_ID,                /**< CNXT_ID */
+	RTE_CPUFLAG_FMA,                    /**< FMA */
+	RTE_CPUFLAG_CMPXCHG16B,             /**< CMPXCHG16B */
+	RTE_CPUFLAG_XTPR,                   /**< XTPR */
+	RTE_CPUFLAG_PDCM,                   /**< PDCM */
+	RTE_CPUFLAG_PCID,                   /**< PCID */
+	RTE_CPUFLAG_DCA,                    /**< DCA */
+	RTE_CPUFLAG_SSE4_1,                 /**< SSE4_1 */
+	RTE_CPUFLAG_SSE4_2,                 /**< SSE4_2 */
+	RTE_CPUFLAG_X2APIC,                 /**< X2APIC */
+	RTE_CPUFLAG_MOVBE,                  /**< MOVBE */
+	RTE_CPUFLAG_POPCNT,                 /**< POPCNT */
+	RTE_CPUFLAG_TSC_DEADLINE,           /**< TSC_DEADLINE */
+	RTE_CPUFLAG_AES,                    /**< AES */
+	RTE_CPUFLAG_XSAVE,                  /**< XSAVE */
+	RTE_CPUFLAG_OSXSAVE,                /**< OSXSAVE */
+	RTE_CPUFLAG_AVX,                    /**< AVX */
+	RTE_CPUFLAG_F16C,                   /**< F16C */
+	RTE_CPUFLAG_RDRAND,                 /**< RDRAND */
+
+	/* (EAX 01h) EDX features */
+	RTE_CPUFLAG_FPU,                    /**< FPU */
+	RTE_CPUFLAG_VME,                    /**< VME */
+	RTE_CPUFLAG_DE,                     /**< DE */
+	RTE_CPUFLAG_PSE,                    /**< PSE */
+	RTE_CPUFLAG_TSC,                    /**< TSC */
+	RTE_CPUFLAG_MSR,                    /**< MSR */
+	RTE_CPUFLAG_PAE,                    /**< PAE */
+	RTE_CPUFLAG_MCE,                    /**< MCE */
+	RTE_CPUFLAG_CX8,                    /**< CX8 */
+	RTE_CPUFLAG_APIC,                   /**< APIC */
+	RTE_CPUFLAG_SEP,                    /**< SEP */
+	RTE_CPUFLAG_MTRR,                   /**< MTRR */
+	RTE_CPUFLAG_PGE,                    /**< PGE */
+	RTE_CPUFLAG_MCA,                    /**< MCA */
+	RTE_CPUFLAG_CMOV,                   /**< CMOV */
+	RTE_CPUFLAG_PAT,                    /**< PAT */
+	RTE_CPUFLAG_PSE36,                  /**< PSE36 */
+	RTE_CPUFLAG_PSN,                    /**< PSN */
+	RTE_CPUFLAG_CLFSH,                  /**< CLFSH */
+	RTE_CPUFLAG_DS,                     /**< DS */
+	RTE_CPUFLAG_ACPI,                   /**< ACPI */
+	RTE_CPUFLAG_MMX,                    /**< MMX */
+	RTE_CPUFLAG_FXSR,                   /**< FXSR */
+	RTE_CPUFLAG_SSE,                    /**< SSE */
+	RTE_CPUFLAG_SSE2,                   /**< SSE2 */
+	RTE_CPUFLAG_SS,                     /**< SS */
+	RTE_CPUFLAG_HTT,                    /**< HTT */
+	RTE_CPUFLAG_TM,                     /**< TM */
+	RTE_CPUFLAG_PBE,                    /**< PBE */
+
+	/* (EAX 06h) EAX features */
+	RTE_CPUFLAG_DIGTEMP,                /**< DIGTEMP */
+	RTE_CPUFLAG_TRBOBST,                /**< TRBOBST */
+	RTE_CPUFLAG_ARAT,                   /**< ARAT */
+	RTE_CPUFLAG_PLN,                    /**< PLN */
+	RTE_CPUFLAG_ECMD,                   /**< ECMD */
+	RTE_CPUFLAG_PTM,                    /**< PTM */
+
+	/* (EAX 06h) ECX features */
+	RTE_CPUFLAG_MPERF_APERF_MSR,        /**< MPERF_APERF_MSR */
+	RTE_CPUFLAG_ACNT2,                  /**< ACNT2 */
+	RTE_CPUFLAG_ENERGY_EFF,             /**< ENERGY_EFF */
+
+	/* (EAX 07h, ECX 0h) EBX features */
+	RTE_CPUFLAG_FSGSBASE,               /**< FSGSBASE */
+	RTE_CPUFLAG_BMI1,                   /**< BMI1 */
+	RTE_CPUFLAG_HLE,                    /**< Hardware Lock elision */
+	RTE_CPUFLAG_AVX2,                   /**< AVX2 */
+	RTE_CPUFLAG_SMEP,                   /**< SMEP */
+	RTE_CPUFLAG_BMI2,                   /**< BMI2 */
+	RTE_CPUFLAG_ERMS,                   /**< ERMS */
+	RTE_CPUFLAG_INVPCID,                /**< INVPCID */
+	RTE_CPUFLAG_RTM,                    /**< Transactional memory */
+
+	/* (EAX 80000001h) ECX features */
+	RTE_CPUFLAG_LAHF_SAHF,              /**< LAHF_SAHF */
+	RTE_CPUFLAG_LZCNT,                  /**< LZCNT */
+
+	/* (EAX 80000001h) EDX features */
+	RTE_CPUFLAG_SYSCALL,                /**< SYSCALL */
+	RTE_CPUFLAG_XD,                     /**< XD */
+	RTE_CPUFLAG_1GB_PG,                 /**< 1GB_PG */
+	RTE_CPUFLAG_RDTSCP,                 /**< RDTSCP */
+	RTE_CPUFLAG_EM64T,                  /**< EM64T */
+
+	/* (EAX 80000007h) EDX features */
+	RTE_CPUFLAG_INVTSC,                 /**< INVTSC */
+
+	/* The last item */
+	RTE_CPUFLAG_NUMFLAGS,               /**< This should always be the last! */
+};
+
+typedef uint32_t cpuid_registers_t[4];
+
+#define CPU_FLAG_NAME_MAX_LEN 64
+
+/**
+ * Struct to hold a processor feature entry
+ */
+struct feature_entry {
+	uint32_t leaf;				/**< cpuid leaf */
+	uint32_t subleaf;			/**< cpuid subleaf */
+	uint32_t reg;				/**< cpuid register */
+	uint32_t bit;				/**< cpuid register bit */
+	char name[CPU_FLAG_NAME_MAX_LEN];       /**< String for printing */
+};
+
+#define FEAT_DEF(name, leaf, subleaf, reg, bit) \
+	[RTE_CPUFLAG_##name] = {leaf, subleaf, reg, bit, #name },
+
+/**
+ * An array that holds feature entries
+ */
+static const struct feature_entry cpu_feature_table[] = {
+	FEAT_DEF(SSE3, 0x00000001, 0, REG_ECX,  0)
+	FEAT_DEF(PCLMULQDQ, 0x00000001, 0, REG_ECX,  1)
+	FEAT_DEF(DTES64, 0x00000001, 0, REG_ECX,  2)
+	FEAT_DEF(MONITOR, 0x00000001, 0, REG_ECX,  3)
+	FEAT_DEF(DS_CPL, 0x00000001, 0, REG_ECX,  4)
+	FEAT_DEF(VMX, 0x00000001, 0, REG_ECX,  5)
+	FEAT_DEF(SMX, 0x00000001, 0, REG_ECX,  6)
+	FEAT_DEF(EIST, 0x00000001, 0, REG_ECX,  7)
+	FEAT_DEF(TM2, 0x00000001, 0, REG_ECX,  8)
+	FEAT_DEF(SSSE3, 0x00000001, 0, REG_ECX,  9)
+	FEAT_DEF(CNXT_ID, 0x00000001, 0, REG_ECX, 10)
+	FEAT_DEF(FMA, 0x00000001, 0, REG_ECX, 12)
+	FEAT_DEF(CMPXCHG16B, 0x00000001, 0, REG_ECX, 13)
+	FEAT_DEF(XTPR, 0x00000001, 0, REG_ECX, 14)
+	FEAT_DEF(PDCM, 0x00000001, 0, REG_ECX, 15)
+	FEAT_DEF(PCID, 0x00000001, 0, REG_ECX, 17)
+	FEAT_DEF(DCA, 0x00000001, 0, REG_ECX, 18)
+	FEAT_DEF(SSE4_1, 0x00000001, 0, REG_ECX, 19)
+	FEAT_DEF(SSE4_2, 0x00000001, 0, REG_ECX, 20)
+	FEAT_DEF(X2APIC, 0x00000001, 0, REG_ECX, 21)
+	FEAT_DEF(MOVBE, 0x00000001, 0, REG_ECX, 22)
+	FEAT_DEF(POPCNT, 0x00000001, 0, REG_ECX, 23)
+	FEAT_DEF(TSC_DEADLINE, 0x00000001, 0, REG_ECX, 24)
+	FEAT_DEF(AES, 0x00000001, 0, REG_ECX, 25)
+	FEAT_DEF(XSAVE, 0x00000001, 0, REG_ECX, 26)
+	FEAT_DEF(OSXSAVE, 0x00000001, 0, REG_ECX, 27)
+	FEAT_DEF(AVX, 0x00000001, 0, REG_ECX, 28)
+	FEAT_DEF(F16C, 0x00000001, 0, REG_ECX, 29)
+	FEAT_DEF(RDRAND, 0x00000001, 0, REG_ECX, 30)
+
+	FEAT_DEF(FPU, 0x00000001, 0, REG_EDX,  0)
+	FEAT_DEF(VME, 0x00000001, 0, REG_EDX,  1)
+	FEAT_DEF(DE, 0x00000001, 0, REG_EDX,  2)
+	FEAT_DEF(PSE, 0x00000001, 0, REG_EDX,  3)
+	FEAT_DEF(TSC, 0x00000001, 0, REG_EDX,  4)
+	FEAT_DEF(MSR, 0x00000001, 0, REG_EDX,  5)
+	FEAT_DEF(PAE, 0x00000001, 0, REG_EDX,  6)
+	FEAT_DEF(MCE, 0x00000001, 0, REG_EDX,  7)
+	FEAT_DEF(CX8, 0x00000001, 0, REG_EDX,  8)
+	FEAT_DEF(APIC, 0x00000001, 0, REG_EDX,  9)
+	FEAT_DEF(SEP, 0x00000001, 0, REG_EDX, 11)
+	FEAT_DEF(MTRR, 0x00000001, 0, REG_EDX, 12)
+	FEAT_DEF(PGE, 0x00000001, 0, REG_EDX, 13)
+	FEAT_DEF(MCA, 0x00000001, 0, REG_EDX, 14)
+	FEAT_DEF(CMOV, 0x00000001, 0, REG_EDX, 15)
+	FEAT_DEF(PAT, 0x00000001, 0, REG_EDX, 16)
+	FEAT_DEF(PSE36, 0x00000001, 0, REG_EDX, 17)
+	FEAT_DEF(PSN, 0x00000001, 0, REG_EDX, 18)
+	FEAT_DEF(CLFSH, 0x00000001, 0, REG_EDX, 19)
+	FEAT_DEF(DS, 0x00000001, 0, REG_EDX, 21)
+	FEAT_DEF(ACPI, 0x00000001, 0, REG_EDX, 22)
+	FEAT_DEF(MMX, 0x00000001, 0, REG_EDX, 23)
+	FEAT_DEF(FXSR, 0x00000001, 0, REG_EDX, 24)
+	FEAT_DEF(SSE, 0x00000001, 0, REG_EDX, 25)
+	FEAT_DEF(SSE2, 0x00000001, 0, REG_EDX, 26)
+	FEAT_DEF(SS, 0x00000001, 0, REG_EDX, 27)
+	FEAT_DEF(HTT, 0x00000001, 0, REG_EDX, 28)
+	FEAT_DEF(TM, 0x00000001, 0, REG_EDX, 29)
+	FEAT_DEF(PBE, 0x00000001, 0, REG_EDX, 31)
+
+	FEAT_DEF(DIGTEMP, 0x00000006, 0, REG_EAX,  0)
+	FEAT_DEF(TRBOBST, 0x00000006, 0, REG_EAX,  1)
+	FEAT_DEF(ARAT, 0x00000006, 0, REG_EAX,  2)
+	FEAT_DEF(PLN, 0x00000006, 0, REG_EAX,  4)
+	FEAT_DEF(ECMD, 0x00000006, 0, REG_EAX,  5)
+	FEAT_DEF(PTM, 0x00000006, 0, REG_EAX,  6)
+
+	FEAT_DEF(MPERF_APERF_MSR, 0x00000006, 0, REG_ECX,  0)
+	FEAT_DEF(ACNT2, 0x00000006, 0, REG_ECX,  1)
+	FEAT_DEF(ENERGY_EFF, 0x00000006, 0, REG_ECX,  3)
+
+	FEAT_DEF(FSGSBASE, 0x00000007, 0, REG_EBX,  0)
+	FEAT_DEF(BMI1, 0x00000007, 0, REG_EBX,  2)
+	FEAT_DEF(HLE, 0x00000007, 0, REG_EBX,  4)
+	FEAT_DEF(AVX2, 0x00000007, 0, REG_EBX,  5)
+	FEAT_DEF(SMEP, 0x00000007, 0, REG_EBX,  6)
+	FEAT_DEF(BMI2, 0x00000007, 0, REG_EBX,  7)
+	FEAT_DEF(ERMS, 0x00000007, 0, REG_EBX,  8)
+	FEAT_DEF(INVPCID, 0x00000007, 0, REG_EBX, 10)
+	FEAT_DEF(RTM, 0x00000007, 0, REG_EBX, 11)
+
+	FEAT_DEF(LAHF_SAHF, 0x80000001, 0, REG_ECX,  0)
+	FEAT_DEF(LZCNT, 0x80000001, 0, REG_ECX,  4)
+
+	FEAT_DEF(SYSCALL, 0x80000001, 0, REG_EDX, 11)
+	FEAT_DEF(XD, 0x80000001, 0, REG_EDX, 20)
+	FEAT_DEF(1GB_PG, 0x80000001, 0, REG_EDX, 26)
+	FEAT_DEF(RDTSCP, 0x80000001, 0, REG_EDX, 27)
+	FEAT_DEF(EM64T, 0x80000001, 0, REG_EDX, 29)
+
+	FEAT_DEF(INVTSC, 0x80000007, 0, REG_EDX,  8)
+};
+
+/*
+ * Execute CPUID instruction and get contents of a specific register
+ *
+ * This function, when compiled with GCC, will generate architecture-neutral
+ * code, as per GCC manual.
+ */
+static inline void
+rte_arch_cpu_get_features(uint32_t leaf, uint32_t subleaf, cpuid_registers_t out)
+{
+#if defined(__i386__) && defined(__PIC__)
+    /* %ebx is a forbidden register if we compile with -fPIC or -fPIE */
+    asm volatile("movl %%ebx,%0 ; cpuid ; xchgl %%ebx,%0"
+		 : "=r" (out[REG_EBX]),
+		   "=a" (out[REG_EAX]),
+		   "=c" (out[REG_ECX]),
+		   "=d" (out[REG_EDX])
+		 : "a" (leaf), "c" (subleaf));
+#else
+
+    asm volatile("cpuid"
+		 : "=a" (out[REG_EAX]),
+		   "=b" (out[REG_EBX]),
+		   "=c" (out[REG_ECX]),
+		   "=d" (out[REG_EDX])
+		 : "a" (leaf), "c" (subleaf));
+
+#endif
+}
+
+/*
+ * Checks if a particular flag is available on current machine.
+ */
+static inline int
+rte_arch_cpu_get_flag_enabled(enum rte_cpu_flag_t feature)
+{
+	const struct feature_entry *feat;
+	cpuid_registers_t regs;
+
+	if (feature >= RTE_CPUFLAG_NUMFLAGS)
+		/* Flag does not match anything in the feature tables */
+		return -ENOENT;
+
+	feat = &cpu_feature_table[feature];
+
+	if (!feat->leaf)
+		/* This entry in the table wasn't filled out! */
+		return -EFAULT;
+
+	rte_arch_cpu_get_features(feat->leaf & 0xffff0000, 0, regs);
+	if (((regs[REG_EAX] ^ feat->leaf) & 0xffff0000) ||
+	      regs[REG_EAX] < feat->leaf)
+		return 0;
+
+	/* get the cpuid leaf containing the desired feature */
+	rte_arch_cpu_get_features(feat->leaf, feat->subleaf, regs);
+
+	/* check if the feature is enabled */
+	return (regs[feat->reg] >> feat->bit) & 1;
+}
+
+#endif /* _RTE_CPUFLAGS_ARCH_H_ */
\ No newline at end of file
-- 
1.7.1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 1/7] Split atomic operations to architecture specific
  2014-09-26  9:33 ` [dpdk-dev] [PATCH 1/7] Split atomic operations to architecture specific Chao Zhu
@ 2014-09-29 11:05   ` Bruce Richardson
  2014-09-29 15:24     ` Neil Horman
  0 siblings, 1 reply; 16+ messages in thread
From: Bruce Richardson @ 2014-09-29 11:05 UTC (permalink / raw)
  To: Chao Zhu; +Cc: dev

On Fri, Sep 26, 2014 at 05:33:32AM -0400, Chao Zhu wrote:
> This patch splits the atomic operations from DPDK and push them to
> architecture specific arch directories, so that other processor
> architecture to support DPDK can be easily adopted.
> 
> Signed-off-by: Chao Zhu <bjzhuc@cn.ibm.com>
> ---
>  lib/librte_eal/common/Makefile                     |    2 +-
>  .../common/include/i686/arch/rte_atomic_arch.h     |  378 ++++++++++++++++++++
>  lib/librte_eal/common/include/rte_atomic.h         |  172 +--------
>  .../common/include/x86_64/arch/rte_atomic_arch.h   |  378 ++++++++++++++++++++
>  4 files changed, 772 insertions(+), 158 deletions(-)
>  create mode 100644 lib/librte_eal/common/include/i686/arch/rte_atomic_arch.h
>  create mode 100644 lib/librte_eal/common/include/x86_64/arch/rte_atomic_arch.h
> 
<...snip...>
> +#define	rte_compiler_barrier() rte_arch_compiler_barrier()

Small question: shouldn't the compiler barrier be independent of 
architecture?

/Bruce

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 1/7] Split atomic operations to architecture specific
  2014-09-29 11:05   ` Bruce Richardson
@ 2014-09-29 15:24     ` Neil Horman
  2014-09-30  2:18       ` Chao CH Zhu
  0 siblings, 1 reply; 16+ messages in thread
From: Neil Horman @ 2014-09-29 15:24 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, Chao Zhu

On Mon, Sep 29, 2014 at 12:05:22PM +0100, Bruce Richardson wrote:
> On Fri, Sep 26, 2014 at 05:33:32AM -0400, Chao Zhu wrote:
> > This patch splits the atomic operations from DPDK and push them to
> > architecture specific arch directories, so that other processor
> > architecture to support DPDK can be easily adopted.
> > 
> > Signed-off-by: Chao Zhu <bjzhuc@cn.ibm.com>
> > ---
> >  lib/librte_eal/common/Makefile                     |    2 +-
> >  .../common/include/i686/arch/rte_atomic_arch.h     |  378 ++++++++++++++++++++
> >  lib/librte_eal/common/include/rte_atomic.h         |  172 +--------
> >  .../common/include/x86_64/arch/rte_atomic_arch.h   |  378 ++++++++++++++++++++
> >  4 files changed, 772 insertions(+), 158 deletions(-)
> >  create mode 100644 lib/librte_eal/common/include/i686/arch/rte_atomic_arch.h
> >  create mode 100644 lib/librte_eal/common/include/x86_64/arch/rte_atomic_arch.h
> > 
> <...snip...>
> > +#define	rte_compiler_barrier() rte_arch_compiler_barrier()
> 
> Small question: shouldn't the compiler barrier be independent of 
> architecture?
> 
Agreed, compiler intrinsics I thought were used to define barriers, regardless
of arch (__memory_barrier() is the gcc intrinsic IIRC)
Neil

> /Bruce
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 1/7] Split atomic operations to architecture specific
  2014-09-29 15:24     ` Neil Horman
@ 2014-09-30  2:18       ` Chao CH Zhu
  0 siblings, 0 replies; 16+ messages in thread
From: Chao CH Zhu @ 2014-09-30  2:18 UTC (permalink / raw)
  To: Neil Horman, Bruce Richardson; +Cc: dev

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="GB2312", Size: 2289 bytes --]

Bruce and Neil,

Thanks for your comments! Actually, the compiler hides the difference with 
different architecture.
I'll submit another patch to correct this!


Best Regards!
------------------------------
Chao Zhu (×£³¬)
Research Staff Member
Cloud Infrastructure and Technology Group
IBM China Research Lab
Building 19 Zhongguancun Software Park
8 Dongbeiwang West Road, Haidian District,
Beijing, PRC. 100193
Tel: +86-10-58748711
Email: bjzhuc@cn.ibm.com




From:   Neil Horman <nhorman@tuxdriver.com>
To:     Bruce Richardson <bruce.richardson@intel.com>
Cc:     Chao CH Zhu/China/IBM@IBMCN, dev@dpdk.org
Date:   2014/09/29 23:23
Subject:        Re: [dpdk-dev] [PATCH 1/7] Split atomic operations to 
architecture specific



On Mon, Sep 29, 2014 at 12:05:22PM +0100, Bruce Richardson wrote:
> On Fri, Sep 26, 2014 at 05:33:32AM -0400, Chao Zhu wrote:
> > This patch splits the atomic operations from DPDK and push them to
> > architecture specific arch directories, so that other processor
> > architecture to support DPDK can be easily adopted.
> > 
> > Signed-off-by: Chao Zhu <bjzhuc@cn.ibm.com>
> > ---
> >  lib/librte_eal/common/Makefile                     |    2 +-
> >  .../common/include/i686/arch/rte_atomic_arch.h     |  378 
++++++++++++++++++++
> >  lib/librte_eal/common/include/rte_atomic.h         |  172 +--------
> >  .../common/include/x86_64/arch/rte_atomic_arch.h   |  378 
++++++++++++++++++++
> >  4 files changed, 772 insertions(+), 158 deletions(-)
> >  create mode 100644 
lib/librte_eal/common/include/i686/arch/rte_atomic_arch.h
> >  create mode 100644 
lib/librte_eal/common/include/x86_64/arch/rte_atomic_arch.h
> > 
> <...snip...>
> > +#define             rte_compiler_barrier() 
rte_arch_compiler_barrier()
> 
> Small question: shouldn't the compiler barrier be independent of 
> architecture?
> 
Agreed, compiler intrinsics I thought were used to define barriers, 
regardless
of arch (__memory_barrier() is the gcc intrinsic IIRC)
Neil

> /Bruce
> 
> 



\x16º&†æ°z,bz)ízW(™;žIêwÓN7ãž6Ó^\x11zÛ«œö­†^[šÁ豉觵é\¢d^qè¯y×ë¢i kM¢ž×¥r‰¦­6Š{^•Ê&×~5ߍwëm^[ÉÚ]’Šà>‹-~,pŠØDHÄωÑvÛÎy÷MŸ¢·^½Ú]’ŠàNç·Ñ'©ÛMxӍøç\x7f´ÛM\x02\x11$ÑyÇ¢½ç_®‰šÎÉ kM5r\x18§µé\¢mtÛ^õõ¼¨®É k]5ø§µé\¢l"¶\x11\x1213öõ'©ÛMx×Þ5ß];ÓEÄÆÒ袝u\Šèœú+´\x05D

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK
  2014-09-26  9:33 [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK Chao Zhu
                   ` (6 preceding siblings ...)
  2014-09-26  9:33 ` [dpdk-dev] [PATCH 7/7] Split CPU flags operations " Chao Zhu
@ 2014-10-03 13:21 ` David Marchand
  2014-10-03 13:29   ` Bruce Richardson
  2014-10-13  2:36   ` Chao CH Zhu
  2014-10-06 21:46 ` Cyril Chemparathy
  8 siblings, 2 replies; 16+ messages in thread
From: David Marchand @ 2014-10-03 13:21 UTC (permalink / raw)
  To: Chao Zhu; +Cc: dev

Hello Chao,

On Fri, Sep 26, 2014 at 11:33 AM, Chao Zhu <bjzhuc@cn.ibm.com> wrote:

> The set of patches split x86 architecture specific operations from DPDK
> and put them to the
> arch directories of i686 and x86_64 architecture. This will make the
> adpotion of DPDK much easier
> on other computer architecture. For a new architecture, just add an
> architecture specific
> directory and necessary building configuration files, then DPDK can
> support it.
>
>
Here is a different approach for the headers splitting.

If we are going to support multiple architectures, the best would be to
have a specific header for each arch which implements a common API (no need
for any _arch suffix).
These headers would be located in lib/librte_eal/common/include/arch/$arch/
rather than lib/librte_eal/common/include/$arch/arch/ (which looks odd to
me).
Makefiles can add some -I for dpdk to build itself (and we can remove those
symlinks from the makefiles).
Makefiles only install the specific headers in RTE_SDK/include for use by
applications.

For common code and documentation, we can add a "generic" directory in
lib/librte_eal/common/include (or "arch-generic", or "shared" ... any
better idea ?).
DPDK makefiles installs the generic headers in RTE_SDK/include/generic.
arch headers (like rte_atomic.h) include the generic one
(<generic/rte_atomic.h>).

These generic headers can be implemented using compiler intrinsics when
possible.
They also include the doxygen stuff in a single place.


This would look like something like this, for rte_atomic.h :
- in DPDK sources
$ ls lib/librte_eal/common/include/*/rte_atomic.h
lib/librte_eal/common/include/i686/rte_atomic.h
lib/librte_eal/common/include/x86_64/rte_atomic.h
lib/librte_eal/common/include/generic/rte_atomic.h

- in installed RTE_SDK
$ ls RTE_SDK/include/{,*/}rte_atomic.h
RTE_SDK/include/rte_atomic.h
RTE_SDK/include/generic/rte_atomic.h

Comments ?


I am only focusing on the first patchset at the moment, but if we can find
consensus here, a respin of the two patchsets would be great.

Thanks.

-- 
David Marchand

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK
  2014-10-03 13:21 ` [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK David Marchand
@ 2014-10-03 13:29   ` Bruce Richardson
  2014-10-13  2:36   ` Chao CH Zhu
  1 sibling, 0 replies; 16+ messages in thread
From: Bruce Richardson @ 2014-10-03 13:29 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, Chao Zhu

On Fri, Oct 03, 2014 at 03:21:53PM +0200, David Marchand wrote:
> Hello Chao,
> 
> On Fri, Sep 26, 2014 at 11:33 AM, Chao Zhu <bjzhuc@cn.ibm.com> wrote:
> 
> > The set of patches split x86 architecture specific operations from DPDK
> > and put them to the
> > arch directories of i686 and x86_64 architecture. This will make the
> > adpotion of DPDK much easier
> > on other computer architecture. For a new architecture, just add an
> > architecture specific
> > directory and necessary building configuration files, then DPDK can
> > support it.
> >
> >
> Here is a different approach for the headers splitting.
> 
> If we are going to support multiple architectures, the best would be to
> have a specific header for each arch which implements a common API (no need
> for any _arch suffix).
> These headers would be located in lib/librte_eal/common/include/arch/$arch/
> rather than lib/librte_eal/common/include/$arch/arch/ (which looks odd to
> me).
> Makefiles can add some -I for dpdk to build itself (and we can remove those
> symlinks from the makefiles).
> Makefiles only install the specific headers in RTE_SDK/include for use by
> applications.
> 
> For common code and documentation, we can add a "generic" directory in
> lib/librte_eal/common/include (or "arch-generic", or "shared" ... any
> better idea ?).
> DPDK makefiles installs the generic headers in RTE_SDK/include/generic.
> arch headers (like rte_atomic.h) include the generic one
> (<generic/rte_atomic.h>).
> 
> These generic headers can be implemented using compiler intrinsics when
> possible.
> They also include the doxygen stuff in a single place.
> 
> 
> This would look like something like this, for rte_atomic.h :
> - in DPDK sources
> $ ls lib/librte_eal/common/include/*/rte_atomic.h
> lib/librte_eal/common/include/i686/rte_atomic.h
> lib/librte_eal/common/include/x86_64/rte_atomic.h
> lib/librte_eal/common/include/generic/rte_atomic.h
> 
> - in installed RTE_SDK
> $ ls RTE_SDK/include/{,*/}rte_atomic.h
> RTE_SDK/include/rte_atomic.h
> RTE_SDK/include/generic/rte_atomic.h
> 
> Comments ?
> 
> 
> I am only focusing on the first patchset at the moment, but if we can find
> consensus here, a respin of the two patchsets would be great.
> 
> Thanks.
> 
> -- 
> David Marchand


I would have no objection to such a scheme. However, I'm not seeing much 
advantage over the existing way of doing things. I think I'd rather see the 
proposed patch sets merged first and then any additional cleanup done, 
rather than holding up a worthwhile submission for a bit of tidy-up.

/Bruce

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK
  2014-09-26  9:33 [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK Chao Zhu
                   ` (7 preceding siblings ...)
  2014-10-03 13:21 ` [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK David Marchand
@ 2014-10-06 21:46 ` Cyril Chemparathy
  2014-10-12  9:14   ` Chao CH Zhu
  8 siblings, 1 reply; 16+ messages in thread
From: Cyril Chemparathy @ 2014-10-06 21:46 UTC (permalink / raw)
  To: Chao Zhu, dev

On 9/26/2014 2:33 AM, Chao Zhu wrote:
> The set of patches split x86 architecture specific operations from DPDK and put them to the
> arch directories of i686 and x86_64 architecture. This will make the adpotion of DPDK much easier
> on other computer architecture. For a new architecture, just add an architecture specific
> directory and necessary building configuration files, then DPDK can support it.

Wouldn't the SSE specifics in rte_common.h and rte_common_vect.h need to 
be similarly split out into architecture specifics?

Thanks
-- Cyril.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK
  2014-10-06 21:46 ` Cyril Chemparathy
@ 2014-10-12  9:14   ` Chao CH Zhu
  0 siblings, 0 replies; 16+ messages in thread
From: Chao CH Zhu @ 2014-10-12  9:14 UTC (permalink / raw)
  To: Cyril Chemparathy; +Cc: dev

Cyril,

Thanks for your comments! You are right. SSE needs to be splited. The 
current split is not a completed one. I'll continue to contribute.

Best Regards!
------------------------------
Chao Zhu (祝超)
Research Staff Member
Cloud Infrastructure and Technology Group
IBM China Research Lab
Building 19 Zhongguancun Software Park
8 Dongbeiwang West Road, Haidian District,
Beijing, PRC. 100193
Tel: +86-10-58748711
Email: bjzhuc@cn.ibm.com




From:   Cyril Chemparathy <cchemparathy@tilera.com>
To:     Chao CH Zhu/China/IBM@IBMCN, <dev@dpdk.org>
Date:   2014/10/07 05:39
Subject:        Re: [dpdk-dev] [PATCH 0/7] Patches to split architecture 
specific operations from DPDK



On 9/26/2014 2:33 AM, Chao Zhu wrote:
> The set of patches split x86 architecture specific operations from DPDK 
and put them to the
> arch directories of i686 and x86_64 architecture. This will make the 
adpotion of DPDK much easier
> on other computer architecture. For a new architecture, just add an 
architecture specific
> directory and necessary building configuration files, then DPDK can 
support it.

Wouldn't the SSE specifics in rte_common.h and rte_common_vect.h need to 
be similarly split out into architecture specifics?

Thanks
-- Cyril.




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK
  2014-10-03 13:21 ` [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK David Marchand
  2014-10-03 13:29   ` Bruce Richardson
@ 2014-10-13  2:36   ` Chao CH Zhu
  1 sibling, 0 replies; 16+ messages in thread
From: Chao CH Zhu @ 2014-10-13  2:36 UTC (permalink / raw)
  To: David Marchand; +Cc: dev

David,

I'll update the patches acccording to your comments. 
Thanks!

Best Regards!
------------------------------
Chao Zhu 



From:   David Marchand <david.marchand@6wind.com>
To:     Chao CH Zhu/China/IBM@IBMCN
Cc:     "dev@dpdk.org" <dev@dpdk.org>
Date:   2014/10/03 21:21
Subject:        Re: [dpdk-dev] [PATCH 0/7] Patches to split architecture 
specific operations from DPDK



Hello Chao, 

On Fri, Sep 26, 2014 at 11:33 AM, Chao Zhu <bjzhuc@cn.ibm.com> wrote:
The set of patches split x86 architecture specific operations from DPDK 
and put them to the
arch directories of i686 and x86_64 architecture. This will make the 
adpotion of DPDK much easier
on other computer architecture. For a new architecture, just add an 
architecture specific
directory and necessary building configuration files, then DPDK can 
support it.

 
Here is a different approach for the headers splitting.

If we are going to support multiple architectures, the best would be to 
have a specific header for each arch which implements a common API (no 
need for any _arch suffix).
These headers would be located in 
lib/librte_eal/common/include/arch/$arch/ rather than 
lib/librte_eal/common/include/$arch/arch/ (which looks odd to me).
Makefiles can add some -I for dpdk to build itself (and we can remove 
those symlinks from the makefiles).
Makefiles only install the specific headers in RTE_SDK/include for use by 
applications.

For common code and documentation, we can add a "generic" directory in 
lib/librte_eal/common/include (or "arch-generic", or "shared" ... any 
better idea ?).
DPDK makefiles installs the generic headers in RTE_SDK/include/generic.
arch headers (like rte_atomic.h) include the generic one 
(<generic/rte_atomic.h>).

These generic headers can be implemented using compiler intrinsics when 
possible.
They also include the doxygen stuff in a single place.


This would look like something like this, for rte_atomic.h :
- in DPDK sources
$ ls lib/librte_eal/common/include/*/rte_atomic.h
lib/librte_eal/common/include/i686/rte_atomic.h
lib/librte_eal/common/include/x86_64/rte_atomic.h
lib/librte_eal/common/include/generic/rte_atomic.h

- in installed RTE_SDK
$ ls RTE_SDK/include/{,*/}rte_atomic.h
RTE_SDK/include/rte_atomic.h
RTE_SDK/include/generic/rte_atomic.h

Comments ?


I am only focusing on the first patchset at the moment, but if we can find 
consensus here, a respin of the two patchsets would be great.

Thanks.

-- 
David Marchand

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-10-13  2:30 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-26  9:33 [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK Chao Zhu
2014-09-26  9:33 ` [dpdk-dev] [PATCH 1/7] Split atomic operations to architecture specific Chao Zhu
2014-09-29 11:05   ` Bruce Richardson
2014-09-29 15:24     ` Neil Horman
2014-09-30  2:18       ` Chao CH Zhu
2014-09-26  9:33 ` [dpdk-dev] [PATCH 2/7] Split byte order " Chao Zhu
2014-09-26  9:33 ` [dpdk-dev] [PATCH 3/7] Split CPU cycle operation " Chao Zhu
2014-09-26  9:33 ` [dpdk-dev] [PATCH 4/7] Split prefetch operations " Chao Zhu
2014-09-26  9:33 ` [dpdk-dev] [PATCH 5/7] Split spinlock " Chao Zhu
2014-09-26  9:33 ` [dpdk-dev] [PATCH 6/7] Split memcpy operation " Chao Zhu
2014-09-26  9:33 ` [dpdk-dev] [PATCH 7/7] Split CPU flags operations " Chao Zhu
2014-10-03 13:21 ` [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK David Marchand
2014-10-03 13:29   ` Bruce Richardson
2014-10-13  2:36   ` Chao CH Zhu
2014-10-06 21:46 ` Cyril Chemparathy
2014-10-12  9:14   ` Chao CH Zhu

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git