DPDK patches and discussions
* [dpdk-dev] [PATCH 0/5] ARMv8 additions to ARMv7 support
@ 2015-10-29 17:29 David Hunt
  2015-10-29 17:29 ` [dpdk-dev] [PATCH 1/5] eal: split arm rte_memcpy.h into 32-bit and 64-bit versions David Hunt
                   ` (6 more replies)
  0 siblings, 7 replies; 32+ messages in thread
From: David Hunt @ 2015-10-29 17:29 UTC (permalink / raw)
  To: dev

Hi DPDK Community,

This is an updated patchset for ARMv8 that now sits on top of the previously
submitted ARMv7 code by RehiveTech. It re-uses much of that code and splits
several header files into 32-bit and 64-bit versions, so both architectures
share the same arm include directory.

Tested on an XGene 64-bit ARM server board with PCI slots. Passes traffic between
two physical ports on an Intel 82599 dual-port 10Gig NIC. It should work with many
other NICs, but these are as yet untested.

Compiles igb_uio, kni and all the physical device PMDs.

ACL and LPM are disabled due to compilation issues.

A note has been added to the release notes.


David Hunt (5):
  eal/arm: split arm rte_memcpy.h into 32 and 64 bit versions.
  eal/arm: split arm rte_prefetch.h into 32 and 64 bit versions
  eal/arm: fix 64-bit compilation for armv8
  mk: Add makefile support for armv8 architecture
  test: add test for cpu flags on armv8

 MAINTAINERS                                        |   3 +-
 app/test/test_cpuflags.c                           |  13 +-
 config/defconfig_arm64-armv8a-linuxapp-gcc         |  56 ++++
 doc/guides/rel_notes/release_2_2.rst               |   7 +-
 .../common/include/arch/arm/rte_cpuflags.h         |   9 +
 .../common/include/arch/arm/rte_memcpy.h           | 302 +------------------
 .../common/include/arch/arm/rte_memcpy_32.h        | 334 +++++++++++++++++++++
 .../common/include/arch/arm/rte_memcpy_64.h        | 322 ++++++++++++++++++++
 .../common/include/arch/arm/rte_prefetch.h         |  31 +-
 .../common/include/arch/arm/rte_prefetch_32.h      |  61 ++++
 .../common/include/arch/arm/rte_prefetch_64.h      |  61 ++++
 mk/arch/arm64/rte.vars.mk                          |  58 ++++
 mk/machine/armv8a/rte.vars.mk                      |  57 ++++
 13 files changed, 986 insertions(+), 328 deletions(-)
 create mode 100644 config/defconfig_arm64-armv8a-linuxapp-gcc
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_prefetch_32.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_prefetch_64.h
 create mode 100644 mk/arch/arm64/rte.vars.mk
 create mode 100644 mk/machine/armv8a/rte.vars.mk

-- 
1.9.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [dpdk-dev] [PATCH 1/5] eal: split arm rte_memcpy.h into 32-bit and 64-bit versions
  2015-10-29 17:29 [dpdk-dev] [PATCH 0/5] ARMv8 additions to ARMv7 support David Hunt
@ 2015-10-29 17:29 ` David Hunt
  2015-10-29 17:29 ` [dpdk-dev] [PATCH 2/5] eal: split arm rte_prefetch.h " David Hunt
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 32+ messages in thread
From: David Hunt @ 2015-10-29 17:29 UTC (permalink / raw)
  To: dev

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 .../common/include/arch/arm/rte_memcpy.h           | 302 +------------------
 .../common/include/arch/arm/rte_memcpy_32.h        | 334 +++++++++++++++++++++
 .../common/include/arch/arm/rte_memcpy_64.h        | 308 +++++++++++++++++++
 3 files changed, 647 insertions(+), 297 deletions(-)
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h

diff --git a/lib/librte_eal/common/include/arch/arm/rte_memcpy.h b/lib/librte_eal/common/include/arch/arm/rte_memcpy.h
index f41648a..19c98e1 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_memcpy.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy.h
@@ -1,7 +1,7 @@
 /*
  *   BSD LICENSE
  *
- *   Copyright(c) 2015 RehiveTech. All rights reserved.
+ *   Copyright(c) 2015 Intel Corporation. All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -13,7 +13,7 @@
  *       notice, this list of conditions and the following disclaimer in
  *       the documentation and/or other materials provided with the
  *       distribution.
- *     * Neither the name of RehiveTech nor the names of its
+ *     * Neither the name of Intel Corporation nor the names of its
  *       contributors may be used to endorse or promote products derived
  *       from this software without specific prior written permission.
  *
@@ -33,302 +33,10 @@
 #ifndef _RTE_MEMCPY_ARM_H_
 #define _RTE_MEMCPY_ARM_H_
 
-#include <stdint.h>
-#include <string.h>
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#include "generic/rte_memcpy.h"
-
-#ifdef __ARM_NEON_FP
-
-/* ARM NEON Intrinsics are used to copy data */
-#include <arm_neon.h>
-
-static inline void
-rte_mov16(uint8_t *dst, const uint8_t *src)
-{
-	vst1q_u8(dst, vld1q_u8(src));
-}
-
-static inline void
-rte_mov32(uint8_t *dst, const uint8_t *src)
-{
-	asm volatile (
-		"vld1.8 {d0-d3}, [%0]\n\t"
-		"vst1.8 {d0-d3}, [%1]\n\t"
-		: "+r" (src), "+r" (dst)
-		: : "memory", "d0", "d1", "d2", "d3");
-}
-
-static inline void
-rte_mov48(uint8_t *dst, const uint8_t *src)
-{
-	asm volatile (
-		"vld1.8 {d0-d3}, [%0]!\n\t"
-		"vld1.8 {d4-d5}, [%0]\n\t"
-		"vst1.8 {d0-d3}, [%1]!\n\t"
-		"vst1.8 {d4-d5}, [%1]\n\t"
-		: "+r" (src), "+r" (dst)
-		:
-		: "memory", "d0", "d1", "d2", "d3", "d4", "d5");
-}
-
-static inline void
-rte_mov64(uint8_t *dst, const uint8_t *src)
-{
-	asm volatile (
-		"vld1.8 {d0-d3}, [%0]!\n\t"
-		"vld1.8 {d4-d7}, [%0]\n\t"
-		"vst1.8 {d0-d3}, [%1]!\n\t"
-		"vst1.8 {d4-d7}, [%1]\n\t"
-		: "+r" (src), "+r" (dst)
-		:
-		: "memory", "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7");
-}
-
-static inline void
-rte_mov128(uint8_t *dst, const uint8_t *src)
-{
-	asm volatile ("pld [%0, #64]" : : "r" (src));
-	asm volatile (
-		"vld1.8 {d0-d3},   [%0]!\n\t"
-		"vld1.8 {d4-d7},   [%0]!\n\t"
-		"vld1.8 {d8-d11},  [%0]!\n\t"
-		"vld1.8 {d12-d15}, [%0]\n\t"
-		"vst1.8 {d0-d3},   [%1]!\n\t"
-		"vst1.8 {d4-d7},   [%1]!\n\t"
-		"vst1.8 {d8-d11},  [%1]!\n\t"
-		"vst1.8 {d12-d15}, [%1]\n\t"
-		: "+r" (src), "+r" (dst)
-		:
-		: "memory", "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7",
-		"d8", "d9", "d10", "d11", "d12", "d13", "d14", "d15");
-}
-
-static inline void
-rte_mov256(uint8_t *dst, const uint8_t *src)
-{
-	asm volatile ("pld [%0,  #64]" : : "r" (src));
-	asm volatile ("pld [%0, #128]" : : "r" (src));
-	asm volatile ("pld [%0, #192]" : : "r" (src));
-	asm volatile ("pld [%0, #256]" : : "r" (src));
-	asm volatile ("pld [%0, #320]" : : "r" (src));
-	asm volatile ("pld [%0, #384]" : : "r" (src));
-	asm volatile ("pld [%0, #448]" : : "r" (src));
-	asm volatile (
-		"vld1.8 {d0-d3},   [%0]!\n\t"
-		"vld1.8 {d4-d7},   [%0]!\n\t"
-		"vld1.8 {d8-d11},  [%0]!\n\t"
-		"vld1.8 {d12-d15}, [%0]!\n\t"
-		"vld1.8 {d16-d19}, [%0]!\n\t"
-		"vld1.8 {d20-d23}, [%0]!\n\t"
-		"vld1.8 {d24-d27}, [%0]!\n\t"
-		"vld1.8 {d28-d31}, [%0]\n\t"
-		"vst1.8 {d0-d3},   [%1]!\n\t"
-		"vst1.8 {d4-d7},   [%1]!\n\t"
-		"vst1.8 {d8-d11},  [%1]!\n\t"
-		"vst1.8 {d12-d15}, [%1]!\n\t"
-		"vst1.8 {d16-d19}, [%1]!\n\t"
-		"vst1.8 {d20-d23}, [%1]!\n\t"
-		"vst1.8 {d24-d27}, [%1]!\n\t"
-		"vst1.8 {d28-d31}, [%1]!\n\t"
-		: "+r" (src), "+r" (dst)
-		:
-		: "memory", "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7",
-		"d8", "d9", "d10", "d11", "d12", "d13", "d14", "d15",
-		"d16", "d17", "d18", "d19", "d20", "d21", "d22", "d23",
-		"d24", "d25", "d26", "d27", "d28", "d29", "d30", "d31");
-}
-
-#define rte_memcpy(dst, src, n)              \
-	({ (__builtin_constant_p(n)) ?       \
-	memcpy((dst), (src), (n)) :          \
-	rte_memcpy_func((dst), (src), (n)); })
-
-static inline void *
-rte_memcpy_func(void *dst, const void *src, size_t n)
-{
-	void *ret = dst;
-
-	/* We can't copy < 16 bytes using XMM registers so do it manually. */
-	if (n < 16) {
-		if (n & 0x01) {
-			*(uint8_t *)dst = *(const uint8_t *)src;
-			dst = (uint8_t *)dst + 1;
-			src = (const uint8_t *)src + 1;
-		}
-		if (n & 0x02) {
-			*(uint16_t *)dst = *(const uint16_t *)src;
-			dst = (uint16_t *)dst + 1;
-			src = (const uint16_t *)src + 1;
-		}
-		if (n & 0x04) {
-			*(uint32_t *)dst = *(const uint32_t *)src;
-			dst = (uint32_t *)dst + 1;
-			src = (const uint32_t *)src + 1;
-		}
-		if (n & 0x08) {
-			/* ARMv7 can not handle unaligned access to long long
-			 * (uint64_t). Therefore two uint32_t operations are
-			 * used.
-			 */
-			*(uint32_t *)dst = *(const uint32_t *)src;
-			dst = (uint32_t *)dst + 1;
-			src = (const uint32_t *)src + 1;
-			*(uint32_t *)dst = *(const uint32_t *)src;
-		}
-		return ret;
-	}
-
-	/* Special fast cases for <= 128 bytes */
-	if (n <= 32) {
-		rte_mov16((uint8_t *)dst, (const uint8_t *)src);
-		rte_mov16((uint8_t *)dst - 16 + n,
-			(const uint8_t *)src - 16 + n);
-		return ret;
-	}
-
-	if (n <= 64) {
-		rte_mov32((uint8_t *)dst, (const uint8_t *)src);
-		rte_mov32((uint8_t *)dst - 32 + n,
-			(const uint8_t *)src - 32 + n);
-		return ret;
-	}
-
-	if (n <= 128) {
-		rte_mov64((uint8_t *)dst, (const uint8_t *)src);
-		rte_mov64((uint8_t *)dst - 64 + n,
-			(const uint8_t *)src - 64 + n);
-		return ret;
-	}
-
-	/*
-	 * For large copies > 128 bytes. This combination of 256, 64 and 16 byte
-	 * copies was found to be faster than doing 128 and 32 byte copies as
-	 * well.
-	 */
-	for ( ; n >= 256; n -= 256) {
-		rte_mov256((uint8_t *)dst, (const uint8_t *)src);
-		dst = (uint8_t *)dst + 256;
-		src = (const uint8_t *)src + 256;
-	}
-
-	/*
-	 * We split the remaining bytes (which will be less than 256) into
-	 * 64byte (2^6) chunks.
-	 * Using incrementing integers in the case labels of a switch statement
-	 * enourages the compiler to use a jump table. To get incrementing
-	 * integers, we shift the 2 relevant bits to the LSB position to first
-	 * get decrementing integers, and then subtract.
-	 */
-	switch (3 - (n >> 6)) {
-	case 0x00:
-		rte_mov64((uint8_t *)dst, (const uint8_t *)src);
-		n -= 64;
-		dst = (uint8_t *)dst + 64;
-		src = (const uint8_t *)src + 64;      /* fallthrough */
-	case 0x01:
-		rte_mov64((uint8_t *)dst, (const uint8_t *)src);
-		n -= 64;
-		dst = (uint8_t *)dst + 64;
-		src = (const uint8_t *)src + 64;      /* fallthrough */
-	case 0x02:
-		rte_mov64((uint8_t *)dst, (const uint8_t *)src);
-		n -= 64;
-		dst = (uint8_t *)dst + 64;
-		src = (const uint8_t *)src + 64;      /* fallthrough */
-	default:
-		break;
-	}
-
-	/*
-	 * We split the remaining bytes (which will be less than 64) into
-	 * 16byte (2^4) chunks, using the same switch structure as above.
-	 */
-	switch (3 - (n >> 4)) {
-	case 0x00:
-		rte_mov16((uint8_t *)dst, (const uint8_t *)src);
-		n -= 16;
-		dst = (uint8_t *)dst + 16;
-		src = (const uint8_t *)src + 16;      /* fallthrough */
-	case 0x01:
-		rte_mov16((uint8_t *)dst, (const uint8_t *)src);
-		n -= 16;
-		dst = (uint8_t *)dst + 16;
-		src = (const uint8_t *)src + 16;      /* fallthrough */
-	case 0x02:
-		rte_mov16((uint8_t *)dst, (const uint8_t *)src);
-		n -= 16;
-		dst = (uint8_t *)dst + 16;
-		src = (const uint8_t *)src + 16;      /* fallthrough */
-	default:
-		break;
-	}
-
-	/* Copy any remaining bytes, without going beyond end of buffers */
-	if (n != 0)
-		rte_mov16((uint8_t *)dst - 16 + n,
-			(const uint8_t *)src - 16 + n);
-	return ret;
-}
-
+#ifdef RTE_ARCH_64
+#include "rte_memcpy_64.h"
 #else
-
-static inline void
-rte_mov16(uint8_t *dst, const uint8_t *src)
-{
-	memcpy(dst, src, 16);
-}
-
-static inline void
-rte_mov32(uint8_t *dst, const uint8_t *src)
-{
-	memcpy(dst, src, 32);
-}
-
-static inline void
-rte_mov48(uint8_t *dst, const uint8_t *src)
-{
-	memcpy(dst, src, 48);
-}
-
-static inline void
-rte_mov64(uint8_t *dst, const uint8_t *src)
-{
-	memcpy(dst, src, 64);
-}
-
-static inline void
-rte_mov128(uint8_t *dst, const uint8_t *src)
-{
-	memcpy(dst, src, 128);
-}
-
-static inline void
-rte_mov256(uint8_t *dst, const uint8_t *src)
-{
-	memcpy(dst, src, 256);
-}
-
-static inline void *
-rte_memcpy(void *dst, const void *src, size_t n)
-{
-	return memcpy(dst, src, n);
-}
-
-static inline void *
-rte_memcpy_func(void *dst, const void *src, size_t n)
-{
-	return memcpy(dst, src, n);
-}
-
-#endif /* __ARM_NEON_FP */
-
-#ifdef __cplusplus
-}
+#include "rte_memcpy_32.h"
 #endif
 
 #endif /* _RTE_MEMCPY_ARM_H_ */
diff --git a/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h b/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
new file mode 100644
index 0000000..f41648a
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
@@ -0,0 +1,334 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 RehiveTech. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of RehiveTech nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_MEMCPY_ARM32_H_
+#define _RTE_MEMCPY_ARM32_H_
+
+#include <stdint.h>
+#include <string.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_memcpy.h"
+
+#ifdef __ARM_NEON_FP
+
+/* ARM NEON Intrinsics are used to copy data */
+#include <arm_neon.h>
+
+static inline void
+rte_mov16(uint8_t *dst, const uint8_t *src)
+{
+	vst1q_u8(dst, vld1q_u8(src));
+}
+
+static inline void
+rte_mov32(uint8_t *dst, const uint8_t *src)
+{
+	asm volatile (
+		"vld1.8 {d0-d3}, [%0]\n\t"
+		"vst1.8 {d0-d3}, [%1]\n\t"
+		: "+r" (src), "+r" (dst)
+		: : "memory", "d0", "d1", "d2", "d3");
+}
+
+static inline void
+rte_mov48(uint8_t *dst, const uint8_t *src)
+{
+	asm volatile (
+		"vld1.8 {d0-d3}, [%0]!\n\t"
+		"vld1.8 {d4-d5}, [%0]\n\t"
+		"vst1.8 {d0-d3}, [%1]!\n\t"
+		"vst1.8 {d4-d5}, [%1]\n\t"
+		: "+r" (src), "+r" (dst)
+		:
+		: "memory", "d0", "d1", "d2", "d3", "d4", "d5");
+}
+
+static inline void
+rte_mov64(uint8_t *dst, const uint8_t *src)
+{
+	asm volatile (
+		"vld1.8 {d0-d3}, [%0]!\n\t"
+		"vld1.8 {d4-d7}, [%0]\n\t"
+		"vst1.8 {d0-d3}, [%1]!\n\t"
+		"vst1.8 {d4-d7}, [%1]\n\t"
+		: "+r" (src), "+r" (dst)
+		:
+		: "memory", "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7");
+}
+
+static inline void
+rte_mov128(uint8_t *dst, const uint8_t *src)
+{
+	asm volatile ("pld [%0, #64]" : : "r" (src));
+	asm volatile (
+		"vld1.8 {d0-d3},   [%0]!\n\t"
+		"vld1.8 {d4-d7},   [%0]!\n\t"
+		"vld1.8 {d8-d11},  [%0]!\n\t"
+		"vld1.8 {d12-d15}, [%0]\n\t"
+		"vst1.8 {d0-d3},   [%1]!\n\t"
+		"vst1.8 {d4-d7},   [%1]!\n\t"
+		"vst1.8 {d8-d11},  [%1]!\n\t"
+		"vst1.8 {d12-d15}, [%1]\n\t"
+		: "+r" (src), "+r" (dst)
+		:
+		: "memory", "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7",
+		"d8", "d9", "d10", "d11", "d12", "d13", "d14", "d15");
+}
+
+static inline void
+rte_mov256(uint8_t *dst, const uint8_t *src)
+{
+	asm volatile ("pld [%0,  #64]" : : "r" (src));
+	asm volatile ("pld [%0, #128]" : : "r" (src));
+	asm volatile ("pld [%0, #192]" : : "r" (src));
+	asm volatile ("pld [%0, #256]" : : "r" (src));
+	asm volatile ("pld [%0, #320]" : : "r" (src));
+	asm volatile ("pld [%0, #384]" : : "r" (src));
+	asm volatile ("pld [%0, #448]" : : "r" (src));
+	asm volatile (
+		"vld1.8 {d0-d3},   [%0]!\n\t"
+		"vld1.8 {d4-d7},   [%0]!\n\t"
+		"vld1.8 {d8-d11},  [%0]!\n\t"
+		"vld1.8 {d12-d15}, [%0]!\n\t"
+		"vld1.8 {d16-d19}, [%0]!\n\t"
+		"vld1.8 {d20-d23}, [%0]!\n\t"
+		"vld1.8 {d24-d27}, [%0]!\n\t"
+		"vld1.8 {d28-d31}, [%0]\n\t"
+		"vst1.8 {d0-d3},   [%1]!\n\t"
+		"vst1.8 {d4-d7},   [%1]!\n\t"
+		"vst1.8 {d8-d11},  [%1]!\n\t"
+		"vst1.8 {d12-d15}, [%1]!\n\t"
+		"vst1.8 {d16-d19}, [%1]!\n\t"
+		"vst1.8 {d20-d23}, [%1]!\n\t"
+		"vst1.8 {d24-d27}, [%1]!\n\t"
+		"vst1.8 {d28-d31}, [%1]!\n\t"
+		: "+r" (src), "+r" (dst)
+		:
+		: "memory", "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7",
+		"d8", "d9", "d10", "d11", "d12", "d13", "d14", "d15",
+		"d16", "d17", "d18", "d19", "d20", "d21", "d22", "d23",
+		"d24", "d25", "d26", "d27", "d28", "d29", "d30", "d31");
+}
+
+#define rte_memcpy(dst, src, n)              \
+	({ (__builtin_constant_p(n)) ?       \
+	memcpy((dst), (src), (n)) :          \
+	rte_memcpy_func((dst), (src), (n)); })
+
+static inline void *
+rte_memcpy_func(void *dst, const void *src, size_t n)
+{
+	void *ret = dst;
+
+	/* We can't copy < 16 bytes using NEON registers so do it manually. */
+	if (n < 16) {
+		if (n & 0x01) {
+			*(uint8_t *)dst = *(const uint8_t *)src;
+			dst = (uint8_t *)dst + 1;
+			src = (const uint8_t *)src + 1;
+		}
+		if (n & 0x02) {
+			*(uint16_t *)dst = *(const uint16_t *)src;
+			dst = (uint16_t *)dst + 1;
+			src = (const uint16_t *)src + 1;
+		}
+		if (n & 0x04) {
+			*(uint32_t *)dst = *(const uint32_t *)src;
+			dst = (uint32_t *)dst + 1;
+			src = (const uint32_t *)src + 1;
+		}
+		if (n & 0x08) {
+			/* ARMv7 can not handle unaligned access to long long
+			 * (uint64_t). Therefore two uint32_t operations are
+			 * used.
+			 */
+			*(uint32_t *)dst = *(const uint32_t *)src;
+			dst = (uint32_t *)dst + 1;
+			src = (const uint32_t *)src + 1;
+			*(uint32_t *)dst = *(const uint32_t *)src;
+		}
+		return ret;
+	}
+
+	/* Special fast cases for <= 128 bytes */
+	if (n <= 32) {
+		rte_mov16((uint8_t *)dst, (const uint8_t *)src);
+		rte_mov16((uint8_t *)dst - 16 + n,
+			(const uint8_t *)src - 16 + n);
+		return ret;
+	}
+
+	if (n <= 64) {
+		rte_mov32((uint8_t *)dst, (const uint8_t *)src);
+		rte_mov32((uint8_t *)dst - 32 + n,
+			(const uint8_t *)src - 32 + n);
+		return ret;
+	}
+
+	if (n <= 128) {
+		rte_mov64((uint8_t *)dst, (const uint8_t *)src);
+		rte_mov64((uint8_t *)dst - 64 + n,
+			(const uint8_t *)src - 64 + n);
+		return ret;
+	}
+
+	/*
+	 * For large copies > 128 bytes. This combination of 256, 64 and 16 byte
+	 * copies was found to be faster than doing 128 and 32 byte copies as
+	 * well.
+	 */
+	for ( ; n >= 256; n -= 256) {
+		rte_mov256((uint8_t *)dst, (const uint8_t *)src);
+		dst = (uint8_t *)dst + 256;
+		src = (const uint8_t *)src + 256;
+	}
+
+	/*
+	 * We split the remaining bytes (which will be less than 256) into
+	 * 64byte (2^6) chunks.
+	 * Using incrementing integers in the case labels of a switch statement
+	 * encourages the compiler to use a jump table. To get incrementing
+	 * integers, we shift the 2 relevant bits to the LSB position to first
+	 * get decrementing integers, and then subtract.
+	 */
+	switch (3 - (n >> 6)) {
+	case 0x00:
+		rte_mov64((uint8_t *)dst, (const uint8_t *)src);
+		n -= 64;
+		dst = (uint8_t *)dst + 64;
+		src = (const uint8_t *)src + 64;      /* fallthrough */
+	case 0x01:
+		rte_mov64((uint8_t *)dst, (const uint8_t *)src);
+		n -= 64;
+		dst = (uint8_t *)dst + 64;
+		src = (const uint8_t *)src + 64;      /* fallthrough */
+	case 0x02:
+		rte_mov64((uint8_t *)dst, (const uint8_t *)src);
+		n -= 64;
+		dst = (uint8_t *)dst + 64;
+		src = (const uint8_t *)src + 64;      /* fallthrough */
+	default:
+		break;
+	}
+
+	/*
+	 * We split the remaining bytes (which will be less than 64) into
+	 * 16byte (2^4) chunks, using the same switch structure as above.
+	 */
+	switch (3 - (n >> 4)) {
+	case 0x00:
+		rte_mov16((uint8_t *)dst, (const uint8_t *)src);
+		n -= 16;
+		dst = (uint8_t *)dst + 16;
+		src = (const uint8_t *)src + 16;      /* fallthrough */
+	case 0x01:
+		rte_mov16((uint8_t *)dst, (const uint8_t *)src);
+		n -= 16;
+		dst = (uint8_t *)dst + 16;
+		src = (const uint8_t *)src + 16;      /* fallthrough */
+	case 0x02:
+		rte_mov16((uint8_t *)dst, (const uint8_t *)src);
+		n -= 16;
+		dst = (uint8_t *)dst + 16;
+		src = (const uint8_t *)src + 16;      /* fallthrough */
+	default:
+		break;
+	}
+
+	/* Copy any remaining bytes, without going beyond end of buffers */
+	if (n != 0)
+		rte_mov16((uint8_t *)dst - 16 + n,
+			(const uint8_t *)src - 16 + n);
+	return ret;
+}
+
+#else
+
+static inline void
+rte_mov16(uint8_t *dst, const uint8_t *src)
+{
+	memcpy(dst, src, 16);
+}
+
+static inline void
+rte_mov32(uint8_t *dst, const uint8_t *src)
+{
+	memcpy(dst, src, 32);
+}
+
+static inline void
+rte_mov48(uint8_t *dst, const uint8_t *src)
+{
+	memcpy(dst, src, 48);
+}
+
+static inline void
+rte_mov64(uint8_t *dst, const uint8_t *src)
+{
+	memcpy(dst, src, 64);
+}
+
+static inline void
+rte_mov128(uint8_t *dst, const uint8_t *src)
+{
+	memcpy(dst, src, 128);
+}
+
+static inline void
+rte_mov256(uint8_t *dst, const uint8_t *src)
+{
+	memcpy(dst, src, 256);
+}
+
+static inline void *
+rte_memcpy(void *dst, const void *src, size_t n)
+{
+	return memcpy(dst, src, n);
+}
+
+static inline void *
+rte_memcpy_func(void *dst, const void *src, size_t n)
+{
+	return memcpy(dst, src, n);
+}
+
+#endif /* __ARM_NEON_FP */
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_MEMCPY_ARM32_H_ */
diff --git a/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h b/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
new file mode 100644
index 0000000..6d85113
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
@@ -0,0 +1,308 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright (C) IBM Corporation 2014.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IBM Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+*/
+
+#ifndef _RTE_MEMCPY_ARM_64_H_
+#define _RTE_MEMCPY_ARM_64_H_
+
+#include <stdint.h>
+#include <string.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_memcpy.h"
+
+#ifdef __ARM_NEON_FP
+
+/* ARM NEON Intrinsics are used to copy data */
+#include <arm_neon.h>
+
+static inline void
+rte_mov16(uint8_t *dst, const uint8_t *src)
+{
+	asm volatile("LDP d0, d1, [%0]\n\t"
+		     "STP d0, d1, [%1]\n\t"
+		     : : "r" (src), "r" (dst) :
+	);
+}
+
+static inline void
+rte_mov32(uint8_t *dst, const uint8_t *src)
+{
+	asm volatile("LDP q0, q1, [%0]\n\t"
+		     "STP q0, q1, [%1]\n\t"
+		     : : "r" (src), "r" (dst) :
+	);
+}
+
+static inline void
+rte_mov48(uint8_t *dst, const uint8_t *src)
+{
+	asm volatile("LDP q0, q1, [%0]\n\t"
+		     "STP q0, q1, [%1]\n\t"
+		     "LDP d0, d1, [%0 , #32]\n\t"
+		     "STP d0, d1, [%1 , #32]\n\t"
+		     : : "r" (src), "r" (dst) :
+	);
+}
+
+static inline void
+rte_mov64(uint8_t *dst, const uint8_t *src)
+{
+	asm volatile("LDP q0, q1, [%0]\n\t"
+		     "STP q0, q1, [%1]\n\t"
+		     "LDP q0, q1, [%0 , #32]\n\t"
+		     "STP q0, q1, [%1 , #32]\n\t"
+		     : : "r" (src), "r" (dst) :
+	);
+}
+
+static inline void
+rte_mov128(uint8_t *dst, const uint8_t *src)
+{
+	asm volatile("LDP q0, q1, [%0]\n\t"
+		     "STP q0, q1, [%1]\n\t"
+		     "LDP q0, q1, [%0 , #32]\n\t"
+		     "STP q0, q1, [%1 , #32]\n\t"
+		     "LDP q0, q1, [%0 , #64]\n\t"
+		     "STP q0, q1, [%1 , #64]\n\t"
+		     "LDP q0, q1, [%0 , #96]\n\t"
+		     "STP q0, q1, [%1 , #96]\n\t"
+		     : : "r" (src), "r" (dst) :
+	);
+}
+
+static inline void
+rte_mov256(uint8_t *dst, const uint8_t *src)
+{
+	asm volatile("LDP q0, q1, [%0]\n\t"
+		     "STP q0, q1, [%1]\n\t"
+		     "LDP q0, q1, [%0 , #32]\n\t"
+		     "STP q0, q1, [%1 , #32]\n\t"
+		     "LDP q0, q1, [%0 , #64]\n\t"
+		     "STP q0, q1, [%1 , #64]\n\t"
+		     "LDP q0, q1, [%0 , #96]\n\t"
+		     "STP q0, q1, [%1 , #96]\n\t"
+		     "LDP q0, q1, [%0 , #128]\n\t"
+		     "STP q0, q1, [%1 , #128]\n\t"
+		     "LDP q0, q1, [%0 , #160]\n\t"
+		     "STP q0, q1, [%1 , #160]\n\t"
+		     "LDP q0, q1, [%0 , #192]\n\t"
+		     "STP q0, q1, [%1 , #192]\n\t"
+		     "LDP q0, q1, [%0 , #224]\n\t"
+		     "STP q0, q1, [%1 , #224]\n\t"
+		     : : "r" (src), "r" (dst) :
+	);
+}
+
+#define rte_memcpy(dst, src, n)              \
+	({ (__builtin_constant_p(n)) ?       \
+	memcpy((dst), (src), (n)) :          \
+	rte_memcpy_func((dst), (src), (n)); })
+
+static inline void *
+rte_memcpy_func(void *dst, const void *src, size_t n)
+{
+	void *ret = dst;
+
+	/* We can't copy < 16 bytes using NEON registers so do it manually. */
+	if (n < 16) {
+		if (n & 0x01) {
+			*(uint8_t *)dst = *(const uint8_t *)src;
+			dst = (uint8_t *)dst + 1;
+			src = (const uint8_t *)src + 1;
+		}
+		if (n & 0x02) {
+			*(uint16_t *)dst = *(const uint16_t *)src;
+			dst = (uint16_t *)dst + 1;
+			src = (const uint16_t *)src + 1;
+		}
+		if (n & 0x04) {
+			*(uint32_t *)dst = *(const uint32_t *)src;
+			dst = (uint32_t *)dst + 1;
+			src = (const uint32_t *)src + 1;
+		}
+		if (n & 0x08)
+			*(uint64_t *)dst = *(const uint64_t *)src;
+		return ret;
+	}
+
+	/* Special fast cases for <= 128 bytes */
+	if (n <= 32) {
+		rte_mov16((uint8_t *)dst, (const uint8_t *)src);
+		rte_mov16((uint8_t *)dst - 16 + n,
+			(const uint8_t *)src - 16 + n);
+		return ret;
+	}
+
+	if (n <= 64) {
+		rte_mov32((uint8_t *)dst, (const uint8_t *)src);
+		rte_mov32((uint8_t *)dst - 32 + n,
+			(const uint8_t *)src - 32 + n);
+		return ret;
+	}
+
+	if (n <= 128) {
+		rte_mov64((uint8_t *)dst, (const uint8_t *)src);
+		rte_mov64((uint8_t *)dst - 64 + n,
+			(const uint8_t *)src - 64 + n);
+		return ret;
+	}
+
+	/*
+	 * For large copies > 128 bytes. This combination of 256, 64 and 16 byte
+	 * copies was found to be faster than doing 128 and 32 byte copies as
+	 * well.
+	 */
+	for ( ; n >= 256; n -= 256) {
+		rte_mov256((uint8_t *)dst, (const uint8_t *)src);
+		dst = (uint8_t *)dst + 256;
+		src = (const uint8_t *)src + 256;
+	}
+
+	/*
+	 * We split the remaining bytes (which will be less than 256) into
+	 * 64byte (2^6) chunks.
+	 * Using incrementing integers in the case labels of a switch statement
+	 * encourages the compiler to use a jump table. To get incrementing
+	 * integers, we shift the 2 relevant bits to the LSB position to first
+	 * get decrementing integers, and then subtract.
+	 */
+	switch (3 - (n >> 6)) {
+	case 0x00:
+		rte_mov64((uint8_t *)dst, (const uint8_t *)src);
+		n -= 64;
+		dst = (uint8_t *)dst + 64;
+		src = (const uint8_t *)src + 64;      /* fallthrough */
+	case 0x01:
+		rte_mov64((uint8_t *)dst, (const uint8_t *)src);
+		n -= 64;
+		dst = (uint8_t *)dst + 64;
+		src = (const uint8_t *)src + 64;      /* fallthrough */
+	case 0x02:
+		rte_mov64((uint8_t *)dst, (const uint8_t *)src);
+		n -= 64;
+		dst = (uint8_t *)dst + 64;
+		src = (const uint8_t *)src + 64;      /* fallthrough */
+	default:
+		break;
+	}
+
+	/*
+	 * We split the remaining bytes (which will be less than 64) into
+	 * 16byte (2^4) chunks, using the same switch structure as above.
+	 */
+	switch (3 - (n >> 4)) {
+	case 0x00:
+		rte_mov16((uint8_t *)dst, (const uint8_t *)src);
+		n -= 16;
+		dst = (uint8_t *)dst + 16;
+		src = (const uint8_t *)src + 16;      /* fallthrough */
+	case 0x01:
+		rte_mov16((uint8_t *)dst, (const uint8_t *)src);
+		n -= 16;
+		dst = (uint8_t *)dst + 16;
+		src = (const uint8_t *)src + 16;      /* fallthrough */
+	case 0x02:
+		rte_mov16((uint8_t *)dst, (const uint8_t *)src);
+		n -= 16;
+		dst = (uint8_t *)dst + 16;
+		src = (const uint8_t *)src + 16;      /* fallthrough */
+	default:
+		break;
+	}
+
+	/* Copy any remaining bytes, without going beyond end of buffers */
+	if (n != 0)
+		rte_mov16((uint8_t *)dst - 16 + n,
+			(const uint8_t *)src - 16 + n);
+	return ret;
+}
+
+#else
+
+static inline void
+rte_mov16(uint8_t *dst, const uint8_t *src)
+{
+	memcpy(dst, src, 16);
+}
+
+static inline void
+rte_mov32(uint8_t *dst, const uint8_t *src)
+{
+	memcpy(dst, src, 32);
+}
+
+static inline void
+rte_mov48(uint8_t *dst, const uint8_t *src)
+{
+	memcpy(dst, src, 48);
+}
+
+static inline void
+rte_mov64(uint8_t *dst, const uint8_t *src)
+{
+	memcpy(dst, src, 64);
+}
+
+static inline void
+rte_mov128(uint8_t *dst, const uint8_t *src)
+{
+	memcpy(dst, src, 128);
+}
+
+static inline void
+rte_mov256(uint8_t *dst, const uint8_t *src)
+{
+	memcpy(dst, src, 256);
+}
+
+static inline void *
+rte_memcpy(void *dst, const void *src, size_t n)
+{
+	return memcpy(dst, src, n);
+}
+
+static inline void *
+rte_memcpy_func(void *dst, const void *src, size_t n)
+{
+	return memcpy(dst, src, n);
+}
+
+#endif /* __ARM_NEON_FP */
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_MEMCPY_ARM_64_H_ */
-- 
1.9.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [dpdk-dev] [PATCH 2/5] eal: split arm rte_prefetch.h into 32-bit and 64-bit versions
  2015-10-29 17:29 [dpdk-dev] [PATCH 0/5] ARMv8 additions to ARMv7 support David Hunt
  2015-10-29 17:29 ` [dpdk-dev] [PATCH 1/5] eal: split arm rte_memcpy.h into 32-bit and 64-bit versions David Hunt
@ 2015-10-29 17:29 ` David Hunt
  2015-10-29 17:29 ` [dpdk-dev] [PATCH 3/5] eal: fix compilation for armv8 64-bit David Hunt
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 32+ messages in thread
From: David Hunt @ 2015-10-29 17:29 UTC (permalink / raw)
  To: dev

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 .../common/include/arch/arm/rte_prefetch.h         | 31 +++--------
 .../common/include/arch/arm/rte_prefetch_32.h      | 61 ++++++++++++++++++++++
 .../common/include/arch/arm/rte_prefetch_64.h      | 61 ++++++++++++++++++++++
 3 files changed, 128 insertions(+), 25 deletions(-)
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_prefetch_32.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_prefetch_64.h

diff --git a/lib/librte_eal/common/include/arch/arm/rte_prefetch.h b/lib/librte_eal/common/include/arch/arm/rte_prefetch.h
index 62c3991..0c6473a 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_prefetch.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_prefetch.h
@@ -1,7 +1,7 @@
 /*
  *   BSD LICENSE
  *
- *   Copyright(c) 2015 RehiveTech. All rights reserved.
+ *   Copyright(c) 2015 Intel Corporation. All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -13,7 +13,7 @@
  *       notice, this list of conditions and the following disclaimer in
  *       the documentation and/or other materials provided with the
  *       distribution.
- *     * Neither the name of RehiveTech nor the names of its
+ *     * Neither the name of Intel Corporation nor the names of its
  *       contributors may be used to endorse or promote products derived
  *       from this software without specific prior written permission.
  *
@@ -33,29 +33,10 @@
 #ifndef _RTE_PREFETCH_ARM_H_
 #define _RTE_PREFETCH_ARM_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#include "generic/rte_prefetch.h"
-
-static inline void rte_prefetch0(const volatile void *p)
-{
-	asm volatile ("pld [%0]" : : "r" (p));
-}
-
-static inline void rte_prefetch1(const volatile void *p)
-{
-	asm volatile ("pld [%0]" : : "r" (p));
-}
-
-static inline void rte_prefetch2(const volatile void *p)
-{
-	asm volatile ("pld [%0]" : : "r" (p));
-}
-
-#ifdef __cplusplus
-}
+#ifdef RTE_ARCH_64
+#include "rte_prefetch_64.h"
+#else
+#include "rte_prefetch_32.h"
 #endif
 
 #endif /* _RTE_PREFETCH_ARM_H_ */
diff --git a/lib/librte_eal/common/include/arch/arm/rte_prefetch_32.h b/lib/librte_eal/common/include/arch/arm/rte_prefetch_32.h
new file mode 100644
index 0000000..62c3991
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/arm/rte_prefetch_32.h
@@ -0,0 +1,61 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 RehiveTech. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of RehiveTech nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PREFETCH_ARM32_H_
+#define _RTE_PREFETCH_ARM32_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_prefetch.h"
+
+static inline void rte_prefetch0(const volatile void *p)
+{
+	asm volatile ("pld [%0]" : : "r" (p));
+}
+
+static inline void rte_prefetch1(const volatile void *p)
+{
+	asm volatile ("pld [%0]" : : "r" (p));
+}
+
+static inline void rte_prefetch2(const volatile void *p)
+{
+	asm volatile ("pld [%0]" : : "r" (p));
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PREFETCH_ARM32_H_ */
diff --git a/lib/librte_eal/common/include/arch/arm/rte_prefetch_64.h b/lib/librte_eal/common/include/arch/arm/rte_prefetch_64.h
new file mode 100644
index 0000000..b0d9170
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/arm/rte_prefetch_64.h
@@ -0,0 +1,61 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PREFETCH_ARM_64_H_
+#define _RTE_PREFETCH_ARM_64_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_prefetch.h"
+/* May want to add PSTL1KEEP instructions for prefetch for ownership. */
+static inline void rte_prefetch0(const volatile void *p)
+{
+	asm volatile ("PRFM PLDL1KEEP, [%0]" : : "r" (p));
+}
+
+static inline void rte_prefetch1(const volatile void *p)
+{
+	asm volatile ("PRFM PLDL2KEEP, [%0]" : : "r" (p));
+}
+
+static inline void rte_prefetch2(const volatile void *p)
+{
+	asm volatile ("PRFM PLDL3KEEP, [%0]" : : "r" (p));
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PREFETCH_ARM_64_H_ */
-- 
1.9.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [dpdk-dev] [PATCH 3/5] eal: fix compilation for armv8 64-bit
  2015-10-29 17:29 [dpdk-dev] [PATCH 0/5] ARMv8 additions to ARMv7 support David Hunt
  2015-10-29 17:29 ` [dpdk-dev] [PATCH 1/5] eal: split arm rte_memcpy.h into 32-bit and 64-bit versions David Hunt
  2015-10-29 17:29 ` [dpdk-dev] [PATCH 2/5] eal: split arm rte_prefetch.h " David Hunt
@ 2015-10-29 17:29 ` David Hunt
  2015-10-29 17:38   ` Jan Viktorin
  2015-10-29 17:29 ` [dpdk-dev] [PATCH 4/5] mk: add support for armv8 on top of armv7 David Hunt
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 32+ messages in thread
From: David Hunt @ 2015-10-29 17:29 UTC (permalink / raw)
  To: dev

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_eal/common/include/arch/arm/rte_cpuflags.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/librte_eal/common/include/arch/arm/rte_cpuflags.h b/lib/librte_eal/common/include/arch/arm/rte_cpuflags.h
index 7ce9d14..27d49c0 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_cpuflags.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_cpuflags.h
@@ -141,12 +141,21 @@ rte_cpu_get_features(__attribute__((unused)) uint32_t leaf,
 	__attribute__((unused)) uint32_t subleaf, cpuid_registers_t out)
 {
 	int auxv_fd;
+#ifdef RTE_ARCH_64
+	Elf64_auxv_t auxv;
+#else
 	Elf32_auxv_t auxv;
+#endif
 
 	auxv_fd = open("/proc/self/auxv", O_RDONLY);
 	assert(auxv_fd);
+#ifdef RTE_ARCH_64
+	while (read(auxv_fd, &auxv,
+		sizeof(Elf64_auxv_t)) == sizeof(Elf64_auxv_t)) {
+#else
 	while (read(auxv_fd, &auxv,
 		sizeof(Elf32_auxv_t)) == sizeof(Elf32_auxv_t)) {
+#endif
 		if (auxv.a_type == AT_HWCAP)
 			out[REG_HWCAP] = auxv.a_un.a_val;
 		else if (auxv.a_type == AT_HWCAP2)
-- 
1.9.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [dpdk-dev] [PATCH 4/5] mk: add support for armv8 on top of armv7
  2015-10-29 17:29 [dpdk-dev] [PATCH 0/5] ARMv8 additions to ARMv7 support David Hunt
                   ` (2 preceding siblings ...)
  2015-10-29 17:29 ` [dpdk-dev] [PATCH 3/5] eal: fix compilation for armv8 64-bit David Hunt
@ 2015-10-29 17:29 ` David Hunt
  2015-10-29 17:39   ` Jan Viktorin
  2015-10-29 17:42   ` Jan Viktorin
  2015-10-29 17:29 ` [dpdk-dev] [PATCH 5/5] test: add checks for cpu flags on armv8 David Hunt
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 32+ messages in thread
From: David Hunt @ 2015-10-29 17:29 UTC (permalink / raw)
  To: dev

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 MAINTAINERS                                |  3 +-
 config/defconfig_arm64-armv8a-linuxapp-gcc | 56 +++++++++++++++++++++++++++++
 doc/guides/rel_notes/release_2_2.rst       |  7 ++--
 mk/arch/arm64/rte.vars.mk                  | 58 ++++++++++++++++++++++++++++++
 mk/machine/armv8a/rte.vars.mk              | 57 +++++++++++++++++++++++++++++
 5 files changed, 177 insertions(+), 4 deletions(-)
 create mode 100644 config/defconfig_arm64-armv8a-linuxapp-gcc
 create mode 100644 mk/arch/arm64/rte.vars.mk
 create mode 100644 mk/machine/armv8a/rte.vars.mk

diff --git a/MAINTAINERS b/MAINTAINERS
index a8933eb..4569f13 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -124,8 +124,9 @@ IBM POWER
 M: Chao Zhu <chaozhu@linux.vnet.ibm.com>
 F: lib/librte_eal/common/include/arch/ppc_64/
 
-ARM v7
+ARM
 M: Jan Viktorin <viktorin@rehivetech.com>
+M: David Hunt <david.hunt@intel.com>
 F: lib/librte_eal/common/include/arch/arm/
 
 Intel x86
diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc
new file mode 100644
index 0000000..79a9533
--- /dev/null
+++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
@@ -0,0 +1,56 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+#
+
+#include "common_linuxapp"
+
+CONFIG_RTE_MACHINE="armv8a"
+
+CONFIG_RTE_ARCH="arm64"
+CONFIG_RTE_ARCH_ARM64=y
+CONFIG_RTE_ARCH_64=y
+CONFIG_RTE_ARCH_ARM_NEON=y
+
+CONFIG_RTE_TOOLCHAIN="gcc"
+CONFIG_RTE_TOOLCHAIN_GCC=y
+
+CONFIG_RTE_IXGBE_INC_VECTOR=n
+CONFIG_RTE_LIBRTE_VIRTIO_PMD=n
+CONFIG_RTE_LIBRTE_IVSHMEM=n
+CONFIG_RTE_LIBRTE_EAL_HOTPLUG=n
+
+CONFIG_RTE_LIBRTE_LPM=n
+CONFIG_RTE_LIBRTE_ACL=n
+CONFIG_RTE_LIBRTE_TABLE=n
+CONFIG_RTE_LIBRTE_PIPELINE=n
+
+# This is used to adjust the generic arm timer to align with the cpu cycle count.
+CONFIG_RTE_TIMER_MULTIPLIER=48
diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 43a3a3c..2b806f5 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -23,10 +23,11 @@ New Features
 
 * **Added vhost-user multiple queue support.**
 
-* **Introduce ARMv7 architecture**
+* **Introduce ARMv7 and ARMv8 architectures**
 
-  It is now possible to build DPDK for the ARMv7 platform and test with
-  virtual PMD drivers.
+  * It is now possible to build DPDK for the ARMv7 and ARMv8 platforms.
+  * ARMv7 can be tested with virtual PMD drivers.
+  * ARMv8 can be tested with virtual and physicla PMD drivers.
 
 
 Resolved Issues
diff --git a/mk/arch/arm64/rte.vars.mk b/mk/arch/arm64/rte.vars.mk
new file mode 100644
index 0000000..3aad712
--- /dev/null
+++ b/mk/arch/arm64/rte.vars.mk
@@ -0,0 +1,58 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2015 Intel Corporation. All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+#
+# arch:
+#
+#   - define ARCH variable (overridden by cmdline or by previous
+#     optional define in machine .mk)
+#   - define CROSS variable (overridden by cmdline or previous define
+#     in machine .mk)
+#   - define CPU_CFLAGS variable (overridden by cmdline or previous
+#     define in machine .mk)
+#   - define CPU_LDFLAGS variable (overridden by cmdline or previous
+#     define in machine .mk)
+#   - define CPU_ASFLAGS variable (overridden by cmdline or previous
+#     define in machine .mk)
+#   - may override any previously defined variable
+#
+# examples for CONFIG_RTE_ARCH: i686, x86_64, x86_64_32
+#
+
+ARCH  ?= arm64
+# common arch dir in eal headers
+ARCH_DIR := arm
+CROSS ?=
+
+CPU_CFLAGS  ?= -DRTE_CACHE_LINE_SIZE=64
+CPU_LDFLAGS ?=
+CPU_ASFLAGS ?= -felf
+
+export ARCH CROSS CPU_CFLAGS CPU_LDFLAGS CPU_ASFLAGS
diff --git a/mk/machine/armv8a/rte.vars.mk b/mk/machine/armv8a/rte.vars.mk
new file mode 100644
index 0000000..b785062
--- /dev/null
+++ b/mk/machine/armv8a/rte.vars.mk
@@ -0,0 +1,57 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2015 Intel Corporation. All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+#
+# machine:
+#
+#   - can define ARCH variable (overridden by cmdline value)
+#   - can define CROSS variable (overridden by cmdline value)
+#   - define MACHINE_CFLAGS variable (overridden by cmdline value)
+#   - define MACHINE_LDFLAGS variable (overridden by cmdline value)
+#   - define MACHINE_ASFLAGS variable (overridden by cmdline value)
+#   - can define CPU_CFLAGS variable (overridden by cmdline value) that
+#     overrides the one defined in arch.
+#   - can define CPU_LDFLAGS variable (overridden by cmdline value) that
+#     overrides the one defined in arch.
+#   - can define CPU_ASFLAGS variable (overridden by cmdline value) that
+#     overrides the one defined in arch.
+#   - may override any previously defined variable
+#
+
+# ARCH =
+# CROSS =
+# MACHINE_CFLAGS =
+# MACHINE_LDFLAGS =
+# MACHINE_ASFLAGS =
+# CPU_CFLAGS =
+# CPU_LDFLAGS =
+# CPU_ASFLAGS =
+
+MACHINE_CFLAGS += -march=armv8-a
-- 
1.9.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [dpdk-dev] [PATCH 5/5] test: add checks for cpu flags on armv8
  2015-10-29 17:29 [dpdk-dev] [PATCH 0/5] ARMv8 additions to ARMv7 support David Hunt
                   ` (3 preceding siblings ...)
  2015-10-29 17:29 ` [dpdk-dev] [PATCH 4/5] mk: add support for armv8 on top of armv7 David Hunt
@ 2015-10-29 17:29 ` David Hunt
  2015-10-29 18:27 ` [dpdk-dev] [PATCH 0/5] ARMv8 additions to ARMv7 support Thomas Monjalon
  2015-10-30  0:17 ` [dpdk-dev] [PATCH 0/5] ARMv8 additions to ARMv7 support Jan Viktorin
  6 siblings, 0 replies; 32+ messages in thread
From: David Hunt @ 2015-10-29 17:29 UTC (permalink / raw)
  To: dev

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 app/test/test_cpuflags.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/app/test/test_cpuflags.c b/app/test/test_cpuflags.c
index 557458f..1689048 100644
--- a/app/test/test_cpuflags.c
+++ b/app/test/test_cpuflags.c
@@ -1,4 +1,4 @@
-/*-
+/*
  *   BSD LICENSE
  *
  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
@@ -115,9 +115,18 @@ test_cpuflags(void)
 	CHECK_FOR_FLAG(RTE_CPUFLAG_ICACHE_SNOOP);
 #endif
 
-#if defined(RTE_ARCH_ARM)
+#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
+	printf("Checking for Floating Point:\t\t");
+	CHECK_FOR_FLAG(RTE_CPUFLAG_FPA);
+
 	printf("Check for NEON:\t\t");
 	CHECK_FOR_FLAG(RTE_CPUFLAG_NEON);
+
+	printf("Checking for ARM32 mode:\t\t");
+	CHECK_FOR_FLAG(RTE_CPUFLAG_AARCH32);
+
+	printf("Checking for ARM64 mode:\t\t");
+	CHECK_FOR_FLAG(RTE_CPUFLAG_AARCH64);
 #endif
 
 #if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_I686)
-- 
1.9.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [dpdk-dev] [PATCH 3/5] eal: fix compilation for armv8 64-bit
  2015-10-29 17:29 ` [dpdk-dev] [PATCH 3/5] eal: fix compilation for armv8 64-bit David Hunt
@ 2015-10-29 17:38   ` Jan Viktorin
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Viktorin @ 2015-10-29 17:38 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

Hello Dave,

On Thu, 29 Oct 2015 17:29:52 +0000
David Hunt <david.hunt@intel.com> wrote:

> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  lib/librte_eal/common/include/arch/arm/rte_cpuflags.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_cpuflags.h b/lib/librte_eal/common/include/arch/arm/rte_cpuflags.h
> index 7ce9d14..27d49c0 100644
> --- a/lib/librte_eal/common/include/arch/arm/rte_cpuflags.h
> +++ b/lib/librte_eal/common/include/arch/arm/rte_cpuflags.h
> @@ -141,12 +141,21 @@ rte_cpu_get_features(__attribute__((unused)) uint32_t leaf,
>  	__attribute__((unused)) uint32_t subleaf, cpuid_registers_t out)
>  {
>  	int auxv_fd;
> +#ifdef RTE_ARCH_64
> +	Elf64_auxv_t auxv;
> +#else
>  	Elf32_auxv_t auxv;
> +#endif
>  
>  	auxv_fd = open("/proc/self/auxv", O_RDONLY);
>  	assert(auxv_fd);
> +#ifdef RTE_ARCH_64
> +	while (read(auxv_fd, &auxv,
> +		sizeof(Elf64_auxv_t)) == sizeof(Elf64_auxv_t)) {
> +#else
>  	while (read(auxv_fd, &auxv,
>  		sizeof(Elf32_auxv_t)) == sizeof(Elf32_auxv_t)) {
> +#endif
>  		if (auxv.a_type == AT_HWCAP)
>  			out[REG_HWCAP] = auxv.a_un.a_val;
>  		else if (auxv.a_type == AT_HWCAP2)

I think, it might be better to do a typedef (or define) like

#ifdef RTE_ARCH_64
typedef Elf64_auxv_t Elf_auxv_t;
#else
typedef Elf32_auxv_t Elf_auxv_t;
#endif

while leaving the above code almost untouched (just Elf32_auxv_t ->
Elf_auxv_t). This is like spaghetti... :)

Regards
Jan

-- 
   Jan Viktorin                  E-mail: Viktorin@RehiveTech.com
   System Architect              Web:    www.RehiveTech.com
   RehiveTech
   Brno, Czech Republic

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [dpdk-dev] [PATCH 4/5] mk: add support for armv8 on top of armv7
  2015-10-29 17:29 ` [dpdk-dev] [PATCH 4/5] mk: add support for armv8 on top of armv7 David Hunt
@ 2015-10-29 17:39   ` Jan Viktorin
  2015-10-29 17:42   ` Jan Viktorin
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Viktorin @ 2015-10-29 17:39 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Thu, 29 Oct 2015 17:29:53 +0000
David Hunt <david.hunt@intel.com> wrote:

> +* **Introduce ARMv7 and ARMv8 architectures**
>  
> -  It is now possible to build DPDK for the ARMv7 platform and test with
> -  virtual PMD drivers.
> +  * It is now possible to build DPDK for the ARMv7 and ARMv8 platforms.
> +  * ARMv7 can be tested with virtual PMD drivers.
> +  * ARMv8 can be tested with virtual and physicla PMD drivers.

Typo "physical"

>  
>  
>  Resolved Issues
> diff --git a/mk/arch/arm64/rte.vars.mk b/mk/arch/arm64/rte.vars.mk
> new file mode 100644


-- 
   Jan Viktorin                  E-mail: Viktorin@RehiveTech.com
   System Architect              Web:    www.RehiveTech.com
   RehiveTech
   Brno, Czech Republic

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [dpdk-dev] [PATCH 4/5] mk: add support for armv8 on top of armv7
  2015-10-29 17:29 ` [dpdk-dev] [PATCH 4/5] mk: add support for armv8 on top of armv7 David Hunt
  2015-10-29 17:39   ` Jan Viktorin
@ 2015-10-29 17:42   ` Jan Viktorin
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Viktorin @ 2015-10-29 17:42 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Thu, 29 Oct 2015 17:29:53 +0000
David Hunt <david.hunt@intel.com> wrote:

> +
> +CONFIG_RTE_LIBRTE_LPM=n
> +CONFIG_RTE_LIBRTE_ACL=n
> +CONFIG_RTE_LIBRTE_TABLE=n
> +CONFIG_RTE_LIBRTE_PIPELINE=n
> +
> +# This is used to adjust the generic arm timer to align with the cpu cycle count.
> +CONFIG_RTE_TIMER_MULTIPLIER=48

Where is the rte_cycles.h for ARMv8? Did you forget it? I could not
find it in the patch set.

Jan

> diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
> index 43a3a3c..2b806f5 100644
> --- a/doc/guides/rel_notes/release_2_2.rst
> +++ b/doc/guides/rel_notes/release_2_2.rst
> @@ -23,10 +23,11 @@ New Features
>  
>  * **Added vhost-user multiple queue support.**
>  
> -* **Introduce ARMv7 architecture**


-- 
   Jan Viktorin                  E-mail: Viktorin@RehiveTech.com
   System Architect              Web:    www.RehiveTech.com
   RehiveTech
   Brno, Czech Republic

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [dpdk-dev] [PATCH 0/5] ARMv8 additions to ARMv7 support
  2015-10-29 17:29 [dpdk-dev] [PATCH 0/5] ARMv8 additions to ARMv7 support David Hunt
                   ` (4 preceding siblings ...)
  2015-10-29 17:29 ` [dpdk-dev] [PATCH 5/5] test: add checks for cpu flags on armv8 David Hunt
@ 2015-10-29 18:27 ` Thomas Monjalon
  2015-10-30  0:25   ` [dpdk-dev] [PATCH v5 00/15] Support ARMv7 architecture Jan Viktorin
  2015-10-30  0:17 ` [dpdk-dev] [PATCH 0/5] ARMv8 additions to ARMv7 support Jan Viktorin
  6 siblings, 1 reply; 32+ messages in thread
From: Thomas Monjalon @ 2015-10-29 18:27 UTC (permalink / raw)
  To: David Hunt, Jan Viktorin; +Cc: dev

Thanks David.

2015-10-29 17:29, David Hunt:
> This is an updated patchset for ARMv8 that now sits on top of the previously 
> submitted ARMv7 code by RehiveTech. It re-uses a lot of that code, and splits
> some header files into 32-bit and 64-bit versions, so uses the same arm include
> directory. 
[...]
>   eal/arm: split arm rte_memcpy.h into 32 and 64 bit versions.
>   eal/arm: split arm rte_prefetch.h into 32 and 64 bit versions
[...]
>  .../common/include/arch/arm/rte_memcpy.h           | 302 +------------------
>  .../common/include/arch/arm/rte_memcpy_32.h        | 334 +++++++++++++++++++++
>  .../common/include/arch/arm/rte_memcpy_64.h        | 322 ++++++++++++++++++++
>  .../common/include/arch/arm/rte_prefetch.h         |  31 +-
>  .../common/include/arch/arm/rte_prefetch_32.h      |  61 ++++
>  .../common/include/arch/arm/rte_prefetch_64.h      |  61 ++++

Jan, it would be easier to review if your patchset was creating the 32-bit
versions of these files. Then David just has to add the 64-bit ones.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [dpdk-dev] [PATCH 0/5] ARMv8 additions to ARMv7 support
  2015-10-29 17:29 [dpdk-dev] [PATCH 0/5] ARMv8 additions to ARMv7 support David Hunt
                   ` (5 preceding siblings ...)
  2015-10-29 18:27 ` [dpdk-dev] [PATCH 0/5] ARMv8 additions to ARMv7 support Thomas Monjalon
@ 2015-10-30  0:17 ` Jan Viktorin
  2015-10-30  8:52   ` Hunt, David
  6 siblings, 1 reply; 32+ messages in thread
From: Jan Viktorin @ 2015-10-30  0:17 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

I've failed to compile kni/igb for ARMv8. Any ideas? Is it Linux 4.2
compatible?

  CC [M]  /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/dpdk-armv8-hunt-v1/build/build/lib/librte_eal/linuxapp/kni/igb_main.o
/home/jviki/Projects/buildroot-armv8/qemu-armv8/build/dpdk-armv8-hunt-v1/build/build/lib/librte_eal/linuxapp/kni/igb_main.c: In functi
on ‘igb_ndo_bridge_getlink’:
/home/jviki/Projects/buildroot-armv8/qemu-armv8/build/dpdk-armv8-hunt-v1/build/build/lib/librte_eal/linuxapp/kni/igb_main.c:2279:9: er
ror: too few arguments to function ‘ndo_dflt_bridge_getlink’
  return ndo_dflt_bridge_getlink(skb, pid, seq, dev, mode, 0, 0, nlflags);
         ^
In file included from /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/linux-4.2/include/net/dst.h:13:0,
                 from /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/linux-4.2/include/net/sock.h:67,
                 from /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/linux-4.2/include/linux/tcp.h:22,
                 from /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/dpdk-armv8-hunt-v1/build/build/lib/librte_eal/linuxapp/kni/igb_main.c:34:
/home/jviki/Projects/buildroot-armv8/qemu-armv8/build/linux-4.2/include/linux/rtnetlink.h:115:12: note: declared here
 extern int ndo_dflt_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
            ^
/home/jviki/Projects/buildroot-armv8/qemu-armv8/build/dpdk-armv8-hunt-v1/build/build/lib/librte_eal/linuxapp/kni/igb_main.c:2286:1: error: control reaches end of non-void function [-Werror=return-type]
 }
 ^
cc1: all warnings being treated as errors
/home/jviki/Projects/buildroot-armv8/qemu-armv8/build/linux-4.2/scripts/Makefile.build:258: recipe for target '/home/jviki/Projects/bu
ildroot-armv8/qemu-armv8/build/dpdk-armv8-hunt-v1/build/build/lib/librte_eal/linuxapp/kni/igb_main.o' failed

Regards
Jan

On Thu, 29 Oct 2015 17:29:49 +0000
David Hunt <david.hunt@intel.com> wrote:

> Hi DPDK Community. 
> 
> This is an updated patchset for ARMv8 that now sits on top of the previously 
> submitted ARMv7 code by RehiveTech. It re-uses a lot of that code, and splits
> some header files into 32-bit and 64-bit versions, so uses the same arm include
> directory. 
> 
> Tested on an XGene 64-bit arm server board, with PCI slots. Passes traffic between
> two physical ports on an Intel 82599 dual-port 10Gig NIC. Should work with many
> other NICS, but these are as yet untested. 
> 
> Compiles igb_uio, kni and all the physical device PMDs. 
> 
> ACL and LPM are disabled due to compilation issues. 
> 
> Note added to the Release notes. 
> 
> 
> David Hunt (5):
>   eal/arm: split arm rte_memcpy.h into 32 and 64 bit versions.
>   eal/arm: split arm rte_prefetch.h into 32 and 64 bit versions
>   eal/arm: fix 64-bit compilation for armv8
>   mk: Add makefile support for armv8 architecture
>   test: add test for cpu flags on armv8
> 
>  MAINTAINERS                                        |   3 +-
>  app/test/test_cpuflags.c                           |  13 +-
>  config/defconfig_arm64-armv8a-linuxapp-gcc         |  56 ++++
>  doc/guides/rel_notes/release_2_2.rst               |   7 +-
>  .../common/include/arch/arm/rte_cpuflags.h         |   9 +
>  .../common/include/arch/arm/rte_memcpy.h           | 302 +------------------
>  .../common/include/arch/arm/rte_memcpy_32.h        | 334 +++++++++++++++++++++
>  .../common/include/arch/arm/rte_memcpy_64.h        | 322 ++++++++++++++++++++
>  .../common/include/arch/arm/rte_prefetch.h         |  31 +-
>  .../common/include/arch/arm/rte_prefetch_32.h      |  61 ++++
>  .../common/include/arch/arm/rte_prefetch_64.h      |  61 ++++
>  mk/arch/arm64/rte.vars.mk                          |  58 ++++
>  mk/machine/armv8a/rte.vars.mk                      |  57 ++++
>  13 files changed, 986 insertions(+), 328 deletions(-)
>  create mode 100644 config/defconfig_arm64-armv8a-linuxapp-gcc
>  create mode 100644 lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
>  create mode 100644 lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
>  create mode 100644 lib/librte_eal/common/include/arch/arm/rte_prefetch_32.h
>  create mode 100644 lib/librte_eal/common/include/arch/arm/rte_prefetch_64.h
>  create mode 100644 mk/arch/arm64/rte.vars.mk
>  create mode 100644 mk/machine/armv8a/rte.vars.mk
> 



-- 
  Jan Viktorin                E-mail: Viktorin@RehiveTech.com
  System Architect            Web:    www.RehiveTech.com
  RehiveTech
  Brno, Czech Republic

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [dpdk-dev] [PATCH v5 00/15] Support ARMv7 architecture
  2015-10-29 18:27 ` [dpdk-dev] [PATCH 0/5] ARMv8 additions to ARMv7 support Thomas Monjalon
@ 2015-10-30  0:25   ` Jan Viktorin
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 01/15] eal/arm: atomic operations for ARM Jan Viktorin
                       ` (14 more replies)
  0 siblings, 15 replies; 32+ messages in thread
From: Jan Viktorin @ 2015-10-30  0:25 UTC (permalink / raw)
  To: david.marchand, David Hunt, Thomas Monjalon; +Cc: dev

Hello,

as Thomas M. suggested, I've made a few changes to the ARMv7 code to
make the ARMv8 inclusion easier. I can only say that it compiles;
however, as there are no functional changes, I would expect it to be OK.

Regards
Jan

---

You can pull the changes from

  https://github.com/RehiveTech/dpdk.git arm-support-v5

since commit 82fb702077f67585d64a07de0080e5cb6a924a72:

  ixgbe: support new flow director modes for X550 (2015-10-29 00:06:01 +0100)

up to 285d29f6226d53c8af8035ebaf4c9edf635e2c56:

  maintainers: claim responsibility for ARMv7 (2015-10-30 01:13:26 +0100)

---

Jan Viktorin (7):
  eal/arm: implement rdtsc by PMU or clock_gettime
  eal/arm: use vector memcpy only when NEON is enabled
  eal/arm: detect arm architecture in cpu flags
  eal/arm: rwlock support for ARM
  eal/arm: add very incomplete rte_vect
  gcc/arm: avoid alignment errors to break build
  maintainers: claim responsibility for ARMv7

Vlastimil Kosar (8):
  eal/arm: atomic operations for ARM
  eal/arm: byte order operations for ARM
  eal/arm: cpu cycle operations for ARM
  eal/arm: prefetch operations for ARM
  eal/arm: spinlock operations for ARM (without HTM)
  eal/arm: vector memcpy for ARM
  eal/arm: cpu flag checks for ARM
  mk: Introduce ARMv7 architecture

 MAINTAINERS                                        |   4 +
 app/test/test_cpuflags.c                           |   5 +
 config/defconfig_arm-armv7a-linuxapp-gcc           |  74 +++++
 doc/guides/rel_notes/release_2_2.rst               |   5 +
 .../common/include/arch/arm/rte_atomic.h           | 256 ++++++++++++++++
 .../common/include/arch/arm/rte_byteorder.h        | 150 +++++++++
 .../common/include/arch/arm/rte_cpuflags.h         | 193 ++++++++++++
 .../common/include/arch/arm/rte_cycles.h           |  38 +++
 .../common/include/arch/arm/rte_cycles_32.h        | 121 ++++++++
 .../common/include/arch/arm/rte_memcpy.h           |  38 +++
 .../common/include/arch/arm/rte_memcpy_32.h        | 334 +++++++++++++++++++++
 .../common/include/arch/arm/rte_prefetch.h         |  38 +++
 .../common/include/arch/arm/rte_prefetch_32.h      |  61 ++++
 .../common/include/arch/arm/rte_rwlock.h           |  40 +++
 .../common/include/arch/arm/rte_spinlock.h         | 114 +++++++
 lib/librte_eal/common/include/arch/arm/rte_vect.h  |  84 ++++++
 mk/arch/arm/rte.vars.mk                            |  39 +++
 mk/machine/armv7-a/rte.vars.mk                     |  67 +++++
 mk/rte.cpuflags.mk                                 |   6 +
 mk/toolchain/gcc/rte.vars.mk                       |   6 +
 20 files changed, 1673 insertions(+)
 create mode 100644 config/defconfig_arm-armv7a-linuxapp-gcc
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_atomic.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_byteorder.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_cpuflags.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_cycles.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_cycles_32.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_memcpy.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_prefetch.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_prefetch_32.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_rwlock.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_spinlock.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_vect.h
 create mode 100644 mk/arch/arm/rte.vars.mk
 create mode 100644 mk/machine/armv7-a/rte.vars.mk

-- 
2.6.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [dpdk-dev] [PATCH v5 01/15] eal/arm: atomic operations for ARM
  2015-10-30  0:25   ` [dpdk-dev] [PATCH v5 00/15] Support ARMv7 architecture Jan Viktorin
@ 2015-10-30  0:25     ` Jan Viktorin
  2015-11-02  5:53       ` Jerin Jacob
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 02/15] eal/arm: byte order " Jan Viktorin
                       ` (13 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Jan Viktorin @ 2015-10-30  0:25 UTC (permalink / raw)
  To: david.marchand, David Hunt, Thomas Monjalon; +Cc: Vlastimil Kosar, dev

From: Vlastimil Kosar <kosar@rehivetech.com>

This patch adds an architecture-specific atomic operations file
for the ARM architecture. It utilizes compiler intrinsics only.

Signed-off-by: Vlastimil Kosar <kosar@rehivetech.com>
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
---
v1 -> v2:
* improve rte_wmb()
* use __atomic_* or __sync_*? (may affect the required GCC version)

v4:
* checkpatch complains about the volatile keyword (but it seems OK to me)
* checkpatch complains about do { ... } while (0) for a single statement
  with asm volatile (but I didn't find a way to write it without
  the checkpatch complaints)
* checkpatch is now happy with whitespaces
---
 .../common/include/arch/arm/rte_atomic.h           | 256 +++++++++++++++++++++
 1 file changed, 256 insertions(+)
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_atomic.h

diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic.h b/lib/librte_eal/common/include/arch/arm/rte_atomic.h
new file mode 100644
index 0000000..ea1e485
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/arm/rte_atomic.h
@@ -0,0 +1,256 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 RehiveTech. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of RehiveTech nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ATOMIC_ARM_H_
+#define _RTE_ATOMIC_ARM_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_atomic.h"
+
+/**
+ * General memory barrier.
+ *
+ * Guarantees that the LOAD and STORE operations generated before the
+ * barrier occur before the LOAD and STORE operations generated after.
+ */
+#define	rte_mb()  __sync_synchronize()
+
+/**
+ * Write memory barrier.
+ *
+ * Guarantees that the STORE operations generated before the barrier
+ * occur before the STORE operations generated after.
+ */
+#define	rte_wmb() do { asm volatile ("dmb st" : : : "memory"); } while (0)
+
+/**
+ * Read memory barrier.
+ *
+ * Guarantees that the LOAD operations generated before the barrier
+ * occur before the LOAD operations generated after.
+ */
+#define	rte_rmb() __sync_synchronize()
+
+/*------------------------- 16 bit atomic operations -------------------------*/
+
+#ifndef RTE_FORCE_INTRINSICS
+static inline int
+rte_atomic16_cmpset(volatile uint16_t *dst, uint16_t exp, uint16_t src)
+{
+	return __atomic_compare_exchange(dst, &exp, &src, 0, __ATOMIC_ACQUIRE,
+		__ATOMIC_ACQUIRE) ? 1 : 0;
+}
+
+static inline int rte_atomic16_test_and_set(rte_atomic16_t *v)
+{
+	return rte_atomic16_cmpset((volatile uint16_t *)&v->cnt, 0, 1);
+}
+
+static inline void
+rte_atomic16_inc(rte_atomic16_t *v)
+{
+	__atomic_add_fetch(&v->cnt, 1, __ATOMIC_ACQUIRE);
+}
+
+static inline void
+rte_atomic16_dec(rte_atomic16_t *v)
+{
+	__atomic_sub_fetch(&v->cnt, 1, __ATOMIC_ACQUIRE);
+}
+
+static inline int rte_atomic16_inc_and_test(rte_atomic16_t *v)
+{
+	return (__atomic_add_fetch(&v->cnt, 1, __ATOMIC_ACQUIRE) == 0);
+}
+
+static inline int rte_atomic16_dec_and_test(rte_atomic16_t *v)
+{
+	return (__atomic_sub_fetch(&v->cnt, 1, __ATOMIC_ACQUIRE) == 0);
+}
+
+/*------------------------- 32 bit atomic operations -------------------------*/
+
+static inline int
+rte_atomic32_cmpset(volatile uint32_t *dst, uint32_t exp, uint32_t src)
+{
+	return __atomic_compare_exchange(dst, &exp, &src, 0, __ATOMIC_ACQUIRE,
+		__ATOMIC_ACQUIRE) ? 1 : 0;
+}
+
+static inline int rte_atomic32_test_and_set(rte_atomic32_t *v)
+{
+	return rte_atomic32_cmpset((volatile uint32_t *)&v->cnt, 0, 1);
+}
+
+static inline void
+rte_atomic32_inc(rte_atomic32_t *v)
+{
+	__atomic_add_fetch(&v->cnt, 1, __ATOMIC_ACQUIRE);
+}
+
+static inline void
+rte_atomic32_dec(rte_atomic32_t *v)
+{
+	__atomic_sub_fetch(&v->cnt, 1, __ATOMIC_ACQUIRE);
+}
+
+static inline int rte_atomic32_inc_and_test(rte_atomic32_t *v)
+{
+	return (__atomic_add_fetch(&v->cnt, 1, __ATOMIC_ACQUIRE) == 0);
+}
+
+static inline int rte_atomic32_dec_and_test(rte_atomic32_t *v)
+{
+	return (__atomic_sub_fetch(&v->cnt, 1, __ATOMIC_ACQUIRE) == 0);
+}
+
+/*------------------------- 64 bit atomic operations -------------------------*/
+
+static inline int
+rte_atomic64_cmpset(volatile uint64_t *dst, uint64_t exp, uint64_t src)
+{
+	return __atomic_compare_exchange(dst, &exp, &src, 0, __ATOMIC_ACQUIRE,
+		__ATOMIC_ACQUIRE) ? 1 : 0;
+}
+
+static inline void
+rte_atomic64_init(rte_atomic64_t *v)
+{
+	int success = 0;
+	uint64_t tmp;
+
+	while (success == 0) {
+		tmp = v->cnt;
+		success = rte_atomic64_cmpset(
+				(volatile uint64_t *)&v->cnt, tmp, 0);
+	}
+}
+
+static inline int64_t
+rte_atomic64_read(rte_atomic64_t *v)
+{
+	int success = 0;
+	uint64_t tmp;
+
+	while (success == 0) {
+		tmp = v->cnt;
+		/* replace the value by itself */
+		success = rte_atomic64_cmpset(
+				(volatile uint64_t *) &v->cnt, tmp, tmp);
+	}
+	return tmp;
+}
+
+static inline void
+rte_atomic64_set(rte_atomic64_t *v, int64_t new_value)
+{
+	int success = 0;
+	uint64_t tmp;
+
+	while (success == 0) {
+		tmp = v->cnt;
+		success = rte_atomic64_cmpset(
+				(volatile uint64_t *)&v->cnt, tmp, new_value);
+	}
+}
+
+static inline void
+rte_atomic64_add(rte_atomic64_t *v, int64_t inc)
+{
+	__atomic_fetch_add(&v->cnt, inc, __ATOMIC_ACQUIRE);
+}
+
+static inline void
+rte_atomic64_sub(rte_atomic64_t *v, int64_t dec)
+{
+	__atomic_fetch_sub(&v->cnt, dec, __ATOMIC_ACQUIRE);
+}
+
+static inline void
+rte_atomic64_inc(rte_atomic64_t *v)
+{
+	__atomic_fetch_add(&v->cnt, 1, __ATOMIC_ACQUIRE);
+}
+
+static inline void
+rte_atomic64_dec(rte_atomic64_t *v)
+{
+	__atomic_fetch_sub(&v->cnt, 1, __ATOMIC_ACQUIRE);
+}
+
+static inline int64_t
+rte_atomic64_add_return(rte_atomic64_t *v, int64_t inc)
+{
+	return __atomic_add_fetch(&v->cnt, inc, __ATOMIC_ACQUIRE);
+}
+
+static inline int64_t
+rte_atomic64_sub_return(rte_atomic64_t *v, int64_t dec)
+{
+	return __atomic_sub_fetch(&v->cnt, dec, __ATOMIC_ACQUIRE);
+}
+
+static inline int rte_atomic64_inc_and_test(rte_atomic64_t *v)
+{
+	return (__atomic_add_fetch(&v->cnt, 1, __ATOMIC_ACQUIRE) == 0);
+}
+
+static inline int rte_atomic64_dec_and_test(rte_atomic64_t *v)
+{
+	return (__atomic_sub_fetch(&v->cnt, 1, __ATOMIC_ACQUIRE) == 0);
+}
+
+static inline int rte_atomic64_test_and_set(rte_atomic64_t *v)
+{
+	return rte_atomic64_cmpset((volatile uint64_t *)&v->cnt, 0, 1);
+}
+
+/**
+ * Atomically set a 64-bit counter to 0.
+ *
+ * @param v
+ *   A pointer to the atomic counter.
+ */
+static inline void rte_atomic64_clear(rte_atomic64_t *v)
+{
+	rte_atomic64_set(v, 0);
+}
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_ATOMIC_ARM_H_ */
-- 
2.6.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [dpdk-dev] [PATCH v5 02/15] eal/arm: byte order operations for ARM
  2015-10-30  0:25   ` [dpdk-dev] [PATCH v5 00/15] Support ARMv7 architecture Jan Viktorin
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 01/15] eal/arm: atomic operations for ARM Jan Viktorin
@ 2015-10-30  0:25     ` Jan Viktorin
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 03/15] eal/arm: cpu cycle " Jan Viktorin
                       ` (12 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Jan Viktorin @ 2015-10-30  0:25 UTC (permalink / raw)
  To: david.marchand, David Hunt, Thomas Monjalon; +Cc: Vlastimil Kosar, dev

From: Vlastimil Kosar <kosar@rehivetech.com>

This patch adds architecture-specific byte order operations
for ARM. The architecture supports both big and little endian.

Signed-off-by: Vlastimil Kosar <kosar@rehivetech.com>
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
---
v4: fix passing params to asm volatile for checkpatch
---
 .../common/include/arch/arm/rte_byteorder.h        | 150 +++++++++++++++++++++
 1 file changed, 150 insertions(+)
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_byteorder.h

diff --git a/lib/librte_eal/common/include/arch/arm/rte_byteorder.h b/lib/librte_eal/common/include/arch/arm/rte_byteorder.h
new file mode 100644
index 0000000..5776997
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/arm/rte_byteorder.h
@@ -0,0 +1,150 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 RehiveTech. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of RehiveTech nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_BYTEORDER_ARM_H_
+#define _RTE_BYTEORDER_ARM_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_byteorder.h"
+
+/*
+ * An architecture-optimized byte swap for a 16-bit value.
+ *
+ * Do not use this function directly. The preferred function is rte_bswap16().
+ */
+static inline uint16_t rte_arch_bswap16(uint16_t _x)
+{
+	register uint16_t x = _x;
+
+	asm volatile ("rev16 %0,%1"
+		      : "=r" (x)
+		      : "r" (x)
+		      );
+	return x;
+}
+
+/*
+ * An architecture-optimized byte swap for a 32-bit value.
+ *
+ * Do not use this function directly. The preferred function is rte_bswap32().
+ */
+static inline uint32_t rte_arch_bswap32(uint32_t _x)
+{
+	register uint32_t x = _x;
+
+	asm volatile ("rev %0,%1"
+		      : "=r" (x)
+		      : "r" (x)
+		      );
+	return x;
+}
+
+/*
+ * An architecture-optimized byte swap for a 64-bit value.
+ *
+ * Do not use this function directly. The preferred function is rte_bswap64().
+ */
+/* 64-bit mode */
+static inline uint64_t rte_arch_bswap64(uint64_t _x)
+{
+	return  __builtin_bswap64(_x);
+}
+
+#ifndef RTE_FORCE_INTRINSICS
+#define rte_bswap16(x) ((uint16_t)(__builtin_constant_p(x) ?		\
+				   rte_constant_bswap16(x) :		\
+				   rte_arch_bswap16(x)))
+
+#define rte_bswap32(x) ((uint32_t)(__builtin_constant_p(x) ?		\
+				   rte_constant_bswap32(x) :		\
+				   rte_arch_bswap32(x)))
+
+#define rte_bswap64(x) ((uint64_t)(__builtin_constant_p(x) ?		\
+				   rte_constant_bswap64(x) :		\
+				   rte_arch_bswap64(x)))
+#else
+/*
+ * __builtin_bswap16 is only available in GCC 4.8 and later
+ */
+#if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 8)
+#define rte_bswap16(x) ((uint16_t)(__builtin_constant_p(x) ?		\
+				   rte_constant_bswap16(x) :		\
+				   rte_arch_bswap16(x)))
+#endif
+#endif
+
+/* ARM architecture is bi-endian (both big and little). */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+
+#define rte_cpu_to_le_16(x) (x)
+#define rte_cpu_to_le_32(x) (x)
+#define rte_cpu_to_le_64(x) (x)
+
+#define rte_cpu_to_be_16(x) rte_bswap16(x)
+#define rte_cpu_to_be_32(x) rte_bswap32(x)
+#define rte_cpu_to_be_64(x) rte_bswap64(x)
+
+#define rte_le_to_cpu_16(x) (x)
+#define rte_le_to_cpu_32(x) (x)
+#define rte_le_to_cpu_64(x) (x)
+
+#define rte_be_to_cpu_16(x) rte_bswap16(x)
+#define rte_be_to_cpu_32(x) rte_bswap32(x)
+#define rte_be_to_cpu_64(x) rte_bswap64(x)
+
+#else /* RTE_BIG_ENDIAN */
+
+#define rte_cpu_to_le_16(x) rte_bswap16(x)
+#define rte_cpu_to_le_32(x) rte_bswap32(x)
+#define rte_cpu_to_le_64(x) rte_bswap64(x)
+
+#define rte_cpu_to_be_16(x) (x)
+#define rte_cpu_to_be_32(x) (x)
+#define rte_cpu_to_be_64(x) (x)
+
+#define rte_le_to_cpu_16(x) rte_bswap16(x)
+#define rte_le_to_cpu_32(x) rte_bswap32(x)
+#define rte_le_to_cpu_64(x) rte_bswap64(x)
+
+#define rte_be_to_cpu_16(x) (x)
+#define rte_be_to_cpu_32(x) (x)
+#define rte_be_to_cpu_64(x) (x)
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BYTEORDER_ARM_H_ */
-- 
2.6.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [dpdk-dev] [PATCH v5 03/15] eal/arm: cpu cycle operations for ARM
  2015-10-30  0:25   ` [dpdk-dev] [PATCH v5 00/15] Support ARMv7 architecture Jan Viktorin
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 01/15] eal/arm: atomic operations for ARM Jan Viktorin
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 02/15] eal/arm: byte order " Jan Viktorin
@ 2015-10-30  0:25     ` Jan Viktorin
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 04/15] eal/arm: implement rdtsc by PMU or clock_gettime Jan Viktorin
                       ` (11 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Jan Viktorin @ 2015-10-30  0:25 UTC (permalink / raw)
  To: david.marchand, David Hunt, Thomas Monjalon; +Cc: Vlastimil Kosar, dev

From: Vlastimil Kosar <kosar@rehivetech.com>

The ARM architecture doesn't have a suitable source of CPU cycle
counts. This patch uses clock_gettime() instead. The implementation
should be improved in the future.

Signed-off-by: Vlastimil Kosar <kosar@rehivetech.com>
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
---
v5: prepare for applying ARMv8
---
 .../common/include/arch/arm/rte_cycles.h           | 38 ++++++++++
 .../common/include/arch/arm/rte_cycles_32.h        | 85 ++++++++++++++++++++++
 2 files changed, 123 insertions(+)
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_cycles.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_cycles_32.h

diff --git a/lib/librte_eal/common/include/arch/arm/rte_cycles.h b/lib/librte_eal/common/include/arch/arm/rte_cycles.h
new file mode 100644
index 0000000..b2372fa
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/arm/rte_cycles.h
@@ -0,0 +1,38 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 RehiveTech. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of RehiveTech nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_CYCLES_ARM_H_
+#define _RTE_CYCLES_ARM_H_
+
+#include <rte_cycles_32.h>
+
+#endif /* _RTE_CYCLES_ARM_H_ */
diff --git a/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h b/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h
new file mode 100644
index 0000000..755cc4a
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h
@@ -0,0 +1,85 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 RehiveTech. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of RehiveTech nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_CYCLES_ARM32_H_
+#define _RTE_CYCLES_ARM32_H_
+
+/* ARM v7 does not have a suitable source of clock signals. The only clock
+   counter available in the core is 32 bits wide. It is therefore unsuitable,
+   as it overflows every few seconds and is probably not accessible by
+   userspace programs. We therefore use clock_gettime(CLOCK_MONOTONIC_RAW) to
+   simulate a counter running at 1 GHz.
+*/
+
+#include <time.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_cycles.h"
+
+/**
+ * Read the time base register.
+ *
+ * @return
+ *   The time base for this lcore.
+ */
+static inline uint64_t
+rte_rdtsc(void)
+{
+	struct timespec val;
+	uint64_t v;
+
+	while (clock_gettime(CLOCK_MONOTONIC_RAW, &val) != 0)
+		/* no body */;
+
+	v  = (uint64_t) val.tv_sec * 1000000000LL;
+	v += (uint64_t) val.tv_nsec;
+	return v;
+}
+
+static inline uint64_t
+rte_rdtsc_precise(void)
+{
+	rte_mb();
+	return rte_rdtsc();
+}
+
+static inline uint64_t
+rte_get_tsc_cycles(void) { return rte_rdtsc(); }
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_CYCLES_ARM32_H_ */
-- 
2.6.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [dpdk-dev] [PATCH v5 04/15] eal/arm: implement rdtsc by PMU or clock_gettime
  2015-10-30  0:25   ` [dpdk-dev] [PATCH v5 00/15] Support ARMv7 architecture Jan Viktorin
                       ` (2 preceding siblings ...)
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 03/15] eal/arm: cpu cycle " Jan Viktorin
@ 2015-10-30  0:25     ` Jan Viktorin
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 05/15] eal/arm: prefetch operations for ARM Jan Viktorin
                       ` (10 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Jan Viktorin @ 2015-10-30  0:25 UTC (permalink / raw)
  To: david.marchand, David Hunt, Thomas Monjalon; +Cc: dev

Allow choosing the preferred way to read the timer, based on the
configuration entry CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU.
The PMU-based reader requires a kernel module (not included here)
to enable userspace access to the counter.

Based on the patch by David Hunt and Amruta Zende:

  lib: added support for armv7 architecture

Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Signed-off-by: Amruta Zende <amruta.zende@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
---
 .../common/include/arch/arm/rte_cycles_32.h        | 38 +++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h b/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h
index 755cc4a..6c6098e 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h
@@ -54,8 +54,14 @@ extern "C" {
  * @return
  *   The time base for this lcore.
  */
+#ifndef CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU
+
+/**
+ * This call is easily portable to any ARM architecture; however,
+ * it may be damn slow and imprecise for some tasks.
+ */
 static inline uint64_t
-rte_rdtsc(void)
+__rte_rdtsc_syscall(void)
 {
 	struct timespec val;
 	uint64_t v;
@@ -67,6 +73,36 @@ rte_rdtsc(void)
 	v += (uint64_t) val.tv_nsec;
 	return v;
 }
+#define rte_rdtsc __rte_rdtsc_syscall
+
+#else
+
+/**
+ * This function requires to configure the PMCCNTR and enable
+ * userspace access to it:
+ *
+ *      asm volatile("mcr p15, 0, %0, c9, c14, 0" : : "r"(1));
+ *      asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r"(29));
+ *      asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r"(0x8000000f));
+ *
+ * which is possible only from privileged mode (kernel space).
+ */
+static inline uint64_t
+__rte_rdtsc_pmccntr(void)
+{
+	unsigned tsc;
+	uint64_t final_tsc;
+
+	/* Read PMCCNTR */
+	asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(tsc));
+	/* 1 tick = 64 clocks */
+	final_tsc = ((uint64_t)tsc) << 6;
+
+	return (uint64_t)final_tsc;
+}
+#define rte_rdtsc __rte_rdtsc_pmccntr
+
+#endif /* RTE_ARM_EAL_RDTSC_USE_PMU */
 
 static inline uint64_t
 rte_rdtsc_precise(void)
-- 
2.6.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [dpdk-dev] [PATCH v5 05/15] eal/arm: prefetch operations for ARM
  2015-10-30  0:25   ` [dpdk-dev] [PATCH v5 00/15] Support ARMv7 architecture Jan Viktorin
                       ` (3 preceding siblings ...)
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 04/15] eal/arm: implement rdtsc by PMU or clock_gettime Jan Viktorin
@ 2015-10-30  0:25     ` Jan Viktorin
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 06/15] eal/arm: spinlock operations for ARM (without HTM) Jan Viktorin
                       ` (9 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Jan Viktorin @ 2015-10-30  0:25 UTC (permalink / raw)
  To: david.marchand, David Hunt, Thomas Monjalon; +Cc: Vlastimil Kosar, dev

From: Vlastimil Kosar <kosar@rehivetech.com>

This patch adds architecture-specific prefetch operations for the
ARM architecture. It utilizes the pld instruction, which starts
filling the appropriate cache line without blocking.

Signed-off-by: Vlastimil Kosar <kosar@rehivetech.com>
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
---
v4:
* checkpatch does not like the syntax of naming params
    to asm volatile; switched to %0, %1 syntax
* checkpatch complains about volatile (seems OK to me)

v5: prepare for applying ARMv8
---
 .../common/include/arch/arm/rte_prefetch.h         | 38 ++++++++++++++
 .../common/include/arch/arm/rte_prefetch_32.h      | 61 ++++++++++++++++++++++
 2 files changed, 99 insertions(+)
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_prefetch.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_prefetch_32.h

diff --git a/lib/librte_eal/common/include/arch/arm/rte_prefetch.h b/lib/librte_eal/common/include/arch/arm/rte_prefetch.h
new file mode 100644
index 0000000..1f46697
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/arm/rte_prefetch.h
@@ -0,0 +1,38 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 RehiveTech. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of RehiveTech nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PREFETCH_ARM_H_
+#define _RTE_PREFETCH_ARM_H_
+
+#include <rte_prefetch_32.h>
+
+#endif /* _RTE_PREFETCH_ARM_H_ */
diff --git a/lib/librte_eal/common/include/arch/arm/rte_prefetch_32.h b/lib/librte_eal/common/include/arch/arm/rte_prefetch_32.h
new file mode 100644
index 0000000..b716384
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/arm/rte_prefetch_32.h
@@ -0,0 +1,61 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 RehiveTech. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of RehiveTech nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PREFETCH_ARM32_H_
+#define _RTE_PREFETCH_ARM32_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_prefetch.h"
+
+static inline void rte_prefetch0(const volatile void *p)
+{
+	asm volatile ("pld [%0]" : : "r" (p));
+}
+
+static inline void rte_prefetch1(const volatile void *p)
+{
+	asm volatile ("pld [%0]" : : "r" (p));
+}
+
+static inline void rte_prefetch2(const volatile void *p)
+{
+	asm volatile ("pld [%0]" : : "r" (p));
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PREFETCH_ARM32_H_ */
-- 
2.6.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [dpdk-dev] [PATCH v5 06/15] eal/arm: spinlock operations for ARM (without HTM)
  2015-10-30  0:25   ` [dpdk-dev] [PATCH v5 00/15] Support ARMv7 architecture Jan Viktorin
                       ` (4 preceding siblings ...)
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 05/15] eal/arm: prefetch operations for ARM Jan Viktorin
@ 2015-10-30  0:25     ` Jan Viktorin
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 07/15] eal/arm: vector memcpy for ARM Jan Viktorin
                       ` (8 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Jan Viktorin @ 2015-10-30  0:25 UTC (permalink / raw)
  To: david.marchand, David Hunt, Thomas Monjalon; +Cc: Vlastimil Kosar, dev

From: Vlastimil Kosar <kosar@rehivetech.com>

This patch adds spinlock operations for the ARM architecture.
We do not support HTM in spinlocks on ARM; the *_tm variants
fall back to the regular spinlock operations.

Signed-off-by: Vlastimil Kosar <kosar@rehivetech.com>
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
---
 .../common/include/arch/arm/rte_spinlock.h         | 114 +++++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_spinlock.h

diff --git a/lib/librte_eal/common/include/arch/arm/rte_spinlock.h b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
new file mode 100644
index 0000000..cd5ab8b
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
@@ -0,0 +1,114 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 RehiveTech. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of RehiveTech nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_SPINLOCK_ARM_H_
+#define _RTE_SPINLOCK_ARM_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_common.h>
+#include "generic/rte_spinlock.h"
+
+/* Intrinsics are used to implement the spinlock on ARM architecture */
+
+#ifndef RTE_FORCE_INTRINSICS
+
+static inline void
+rte_spinlock_lock(rte_spinlock_t *sl)
+{
+	while (__sync_lock_test_and_set(&sl->locked, 1))
+		while (sl->locked)
+			rte_pause();
+}
+
+static inline void
+rte_spinlock_unlock(rte_spinlock_t *sl)
+{
+	__sync_lock_release(&sl->locked);
+}
+
+static inline int
+rte_spinlock_trylock(rte_spinlock_t *sl)
+{
+	return (__sync_lock_test_and_set(&sl->locked, 1) == 0);
+}
+
+#endif
+
+static inline int rte_tm_supported(void)
+{
+	return 0;
+}
+
+static inline void
+rte_spinlock_lock_tm(rte_spinlock_t *sl)
+{
+	rte_spinlock_lock(sl); /* fall-back */
+}
+
+static inline int
+rte_spinlock_trylock_tm(rte_spinlock_t *sl)
+{
+	return rte_spinlock_trylock(sl);
+}
+
+static inline void
+rte_spinlock_unlock_tm(rte_spinlock_t *sl)
+{
+	rte_spinlock_unlock(sl);
+}
+
+static inline void
+rte_spinlock_recursive_lock_tm(rte_spinlock_recursive_t *slr)
+{
+	rte_spinlock_recursive_lock(slr); /* fall-back */
+}
+
+static inline void
+rte_spinlock_recursive_unlock_tm(rte_spinlock_recursive_t *slr)
+{
+	rte_spinlock_recursive_unlock(slr);
+}
+
+static inline int
+rte_spinlock_recursive_trylock_tm(rte_spinlock_recursive_t *slr)
+{
+	return rte_spinlock_recursive_trylock(slr);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_SPINLOCK_ARM_H_ */
-- 
2.6.1

* [dpdk-dev] [PATCH v5 07/15] eal/arm: vector memcpy for ARM
  2015-10-30  0:25   ` [dpdk-dev] [PATCH v5 00/15] Support ARMv7 architecture Jan Viktorin
                       ` (5 preceding siblings ...)
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 06/15] eal/arm: spinlock operations for ARM (without HTM) Jan Viktorin
@ 2015-10-30  0:25     ` Jan Viktorin
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 08/15] eal/arm: use vector memcpy only when NEON is enabled Jan Viktorin
                       ` (7 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Jan Viktorin @ 2015-10-30  0:25 UTC (permalink / raw)
  To: david.marchand, David Hunt, Thomas Monjalon; +Cc: Vlastimil Kosar, dev

From: Vlastimil Kosar <kosar@rehivetech.com>

The SSE-based memory copy in DPDK supports only x86. This patch
adds ARM NEON based memory copy functions for the ARM architecture.

The implementation improves copies of short or well-aligned
data buffers. The following measurements show the improvement
over libc memcpy on Cortex CPUs.

             speed-up over libc memcpy (%)
Length (B)   a15    a7     a9
   1         4.9  15.2    3.2
   7        56.9  48.2   40.3
   8        37.3  39.8   29.6
   9        69.3  38.7   33.9
  15        60.8  35.3   23.7
  16        50.6  35.9   35.0
  17        57.7  35.7   31.1
  31        16.0  23.3    9.0
  32        65.9  13.5   21.4
  33         3.9  10.3   -3.7
  63         2.0  12.9   -2.0
  64        66.5   0.0   16.5
  65         2.7   7.6  -35.6
 127         0.1   4.5  -18.9
 128        66.2   1.5  -51.4
 129        -0.8   3.2  -35.8
 255        -3.1  -0.9  -69.1
 256        67.9   1.2    7.2
 257        -3.6  -1.9  -36.9
 320        67.7   1.4    0.0
 384        66.8   1.4  -14.2
 511       -44.9  -2.3  -41.9
 512        67.3   1.4   -6.8
 513       -41.7  -3.0  -36.2
1023       -82.4  -2.8  -41.2
1024        68.3   1.4  -11.6
1025       -80.1  -3.3  -38.1
1518       -47.3  -5.0  -38.3
1522       -48.3  -6.0  -37.9
1600        65.4   1.3  -27.3
2048        59.5   1.5  -10.9
3072        52.3   1.5  -12.2
4096        45.3   1.4  -12.5
5120        40.6   1.5  -14.5
6144        35.4   1.4  -13.4
7168        32.9   1.4  -13.9
8192        28.2   1.4  -15.1

Signed-off-by: Vlastimil Kosar <kosar@rehivetech.com>
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
---
v4:
* fix whitespace issues reported by checkpatch
* fix passing params to asm volatile for checkpatch

v5: prepare for applying ARMv8
---
 .../common/include/arch/arm/rte_memcpy.h           |  38 +++
 .../common/include/arch/arm/rte_memcpy_32.h        | 279 +++++++++++++++++++++
 2 files changed, 317 insertions(+)
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_memcpy.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h

diff --git a/lib/librte_eal/common/include/arch/arm/rte_memcpy.h b/lib/librte_eal/common/include/arch/arm/rte_memcpy.h
new file mode 100644
index 0000000..d9f5bf1
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy.h
@@ -0,0 +1,38 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 RehiveTech. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of RehiveTech nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_MEMCPY_ARM_H_
+#define _RTE_MEMCPY_ARM_H_
+
+#include <rte_memcpy_32.h>
+
+#endif /* _RTE_MEMCPY_ARM_H_ */
diff --git a/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h b/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
new file mode 100644
index 0000000..11f8241
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
@@ -0,0 +1,279 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 RehiveTech. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of RehiveTech nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_MEMCPY_ARM32_H_
+#define _RTE_MEMCPY_ARM32_H_
+
+#include <stdint.h>
+#include <string.h>
+/* ARM NEON Intrinsics are used to copy data */
+#include <arm_neon.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_memcpy.h"
+
+static inline void
+rte_mov16(uint8_t *dst, const uint8_t *src)
+{
+	vst1q_u8(dst, vld1q_u8(src));
+}
+
+static inline void
+rte_mov32(uint8_t *dst, const uint8_t *src)
+{
+	asm volatile (
+		"vld1.8 {d0-d3}, [%0]\n\t"
+		"vst1.8 {d0-d3}, [%1]\n\t"
+		: "+r" (src), "+r" (dst)
+		: : "memory", "d0", "d1", "d2", "d3");
+}
+
+static inline void
+rte_mov48(uint8_t *dst, const uint8_t *src)
+{
+	asm volatile (
+		"vld1.8 {d0-d3}, [%0]!\n\t"
+		"vld1.8 {d4-d5}, [%0]\n\t"
+		"vst1.8 {d0-d3}, [%1]!\n\t"
+		"vst1.8 {d4-d5}, [%1]\n\t"
+		: "+r" (src), "+r" (dst)
+		:
+		: "memory", "d0", "d1", "d2", "d3", "d4", "d5");
+}
+
+static inline void
+rte_mov64(uint8_t *dst, const uint8_t *src)
+{
+	asm volatile (
+		"vld1.8 {d0-d3}, [%0]!\n\t"
+		"vld1.8 {d4-d7}, [%0]\n\t"
+		"vst1.8 {d0-d3}, [%1]!\n\t"
+		"vst1.8 {d4-d7}, [%1]\n\t"
+		: "+r" (src), "+r" (dst)
+		:
+		: "memory", "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7");
+}
+
+static inline void
+rte_mov128(uint8_t *dst, const uint8_t *src)
+{
+	asm volatile ("pld [%0, #64]" : : "r" (src));
+	asm volatile (
+		"vld1.8 {d0-d3},   [%0]!\n\t"
+		"vld1.8 {d4-d7},   [%0]!\n\t"
+		"vld1.8 {d8-d11},  [%0]!\n\t"
+		"vld1.8 {d12-d15}, [%0]\n\t"
+		"vst1.8 {d0-d3},   [%1]!\n\t"
+		"vst1.8 {d4-d7},   [%1]!\n\t"
+		"vst1.8 {d8-d11},  [%1]!\n\t"
+		"vst1.8 {d12-d15}, [%1]\n\t"
+		: "+r" (src), "+r" (dst)
+		:
+		: "memory", "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7",
+		"d8", "d9", "d10", "d11", "d12", "d13", "d14", "d15");
+}
+
+static inline void
+rte_mov256(uint8_t *dst, const uint8_t *src)
+{
+	asm volatile ("pld [%0,  #64]" : : "r" (src));
+	asm volatile ("pld [%0, #128]" : : "r" (src));
+	asm volatile ("pld [%0, #192]" : : "r" (src));
+	asm volatile ("pld [%0, #256]" : : "r" (src));
+	asm volatile ("pld [%0, #320]" : : "r" (src));
+	asm volatile ("pld [%0, #384]" : : "r" (src));
+	asm volatile ("pld [%0, #448]" : : "r" (src));
+	asm volatile (
+		"vld1.8 {d0-d3},   [%0]!\n\t"
+		"vld1.8 {d4-d7},   [%0]!\n\t"
+		"vld1.8 {d8-d11},  [%0]!\n\t"
+		"vld1.8 {d12-d15}, [%0]!\n\t"
+		"vld1.8 {d16-d19}, [%0]!\n\t"
+		"vld1.8 {d20-d23}, [%0]!\n\t"
+		"vld1.8 {d24-d27}, [%0]!\n\t"
+		"vld1.8 {d28-d31}, [%0]\n\t"
+		"vst1.8 {d0-d3},   [%1]!\n\t"
+		"vst1.8 {d4-d7},   [%1]!\n\t"
+		"vst1.8 {d8-d11},  [%1]!\n\t"
+		"vst1.8 {d12-d15}, [%1]!\n\t"
+		"vst1.8 {d16-d19}, [%1]!\n\t"
+		"vst1.8 {d20-d23}, [%1]!\n\t"
+		"vst1.8 {d24-d27}, [%1]!\n\t"
+		"vst1.8 {d28-d31}, [%1]!\n\t"
+		: "+r" (src), "+r" (dst)
+		:
+		: "memory", "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7",
+		"d8", "d9", "d10", "d11", "d12", "d13", "d14", "d15",
+		"d16", "d17", "d18", "d19", "d20", "d21", "d22", "d23",
+		"d24", "d25", "d26", "d27", "d28", "d29", "d30", "d31");
+}
+
+#define rte_memcpy(dst, src, n)              \
+	({ (__builtin_constant_p(n)) ?       \
+	memcpy((dst), (src), (n)) :          \
+	rte_memcpy_func((dst), (src), (n)); })
+
+static inline void *
+rte_memcpy_func(void *dst, const void *src, size_t n)
+{
+	void *ret = dst;
+
+	/* We can't copy < 16 bytes using NEON registers so do it manually. */
+	if (n < 16) {
+		if (n & 0x01) {
+			*(uint8_t *)dst = *(const uint8_t *)src;
+			dst = (uint8_t *)dst + 1;
+			src = (const uint8_t *)src + 1;
+		}
+		if (n & 0x02) {
+			*(uint16_t *)dst = *(const uint16_t *)src;
+			dst = (uint16_t *)dst + 1;
+			src = (const uint16_t *)src + 1;
+		}
+		if (n & 0x04) {
+			*(uint32_t *)dst = *(const uint32_t *)src;
+			dst = (uint32_t *)dst + 1;
+			src = (const uint32_t *)src + 1;
+		}
+		if (n & 0x08) {
+			/* ARMv7 cannot handle unaligned access to long long
+			 * (uint64_t). Therefore two uint32_t operations are
+			 * used.
+			 */
+			*(uint32_t *)dst = *(const uint32_t *)src;
+			dst = (uint32_t *)dst + 1;
+			src = (const uint32_t *)src + 1;
+			*(uint32_t *)dst = *(const uint32_t *)src;
+		}
+		return ret;
+	}
+
+	/* Special fast cases for <= 128 bytes */
+	if (n <= 32) {
+		rte_mov16((uint8_t *)dst, (const uint8_t *)src);
+		rte_mov16((uint8_t *)dst - 16 + n,
+			(const uint8_t *)src - 16 + n);
+		return ret;
+	}
+
+	if (n <= 64) {
+		rte_mov32((uint8_t *)dst, (const uint8_t *)src);
+		rte_mov32((uint8_t *)dst - 32 + n,
+			(const uint8_t *)src - 32 + n);
+		return ret;
+	}
+
+	if (n <= 128) {
+		rte_mov64((uint8_t *)dst, (const uint8_t *)src);
+		rte_mov64((uint8_t *)dst - 64 + n,
+			(const uint8_t *)src - 64 + n);
+		return ret;
+	}
+
+	/*
+	 * For large copies > 128 bytes. This combination of 256, 64 and 16 byte
+	 * copies was found to be faster than doing 128 and 32 byte copies as
+	 * well.
+	 */
+	for ( ; n >= 256; n -= 256) {
+		rte_mov256((uint8_t *)dst, (const uint8_t *)src);
+		dst = (uint8_t *)dst + 256;
+		src = (const uint8_t *)src + 256;
+	}
+
+	/*
+	 * We split the remaining bytes (which will be less than 256) into
+	 * 64byte (2^6) chunks.
+	 * Using incrementing integers in the case labels of a switch statement
+	 * encourages the compiler to use a jump table. To get incrementing
+	 * integers, we shift the 2 relevant bits to the LSB position to first
+	 * get decrementing integers, and then subtract.
+	 */
+	switch (3 - (n >> 6)) {
+	case 0x00:
+		rte_mov64((uint8_t *)dst, (const uint8_t *)src);
+		n -= 64;
+		dst = (uint8_t *)dst + 64;
+		src = (const uint8_t *)src + 64;      /* fallthrough */
+	case 0x01:
+		rte_mov64((uint8_t *)dst, (const uint8_t *)src);
+		n -= 64;
+		dst = (uint8_t *)dst + 64;
+		src = (const uint8_t *)src + 64;      /* fallthrough */
+	case 0x02:
+		rte_mov64((uint8_t *)dst, (const uint8_t *)src);
+		n -= 64;
+		dst = (uint8_t *)dst + 64;
+		src = (const uint8_t *)src + 64;      /* fallthrough */
+	default:
+		break;
+	}
+
+	/*
+	 * We split the remaining bytes (which will be less than 64) into
+	 * 16byte (2^4) chunks, using the same switch structure as above.
+	 */
+	switch (3 - (n >> 4)) {
+	case 0x00:
+		rte_mov16((uint8_t *)dst, (const uint8_t *)src);
+		n -= 16;
+		dst = (uint8_t *)dst + 16;
+		src = (const uint8_t *)src + 16;      /* fallthrough */
+	case 0x01:
+		rte_mov16((uint8_t *)dst, (const uint8_t *)src);
+		n -= 16;
+		dst = (uint8_t *)dst + 16;
+		src = (const uint8_t *)src + 16;      /* fallthrough */
+	case 0x02:
+		rte_mov16((uint8_t *)dst, (const uint8_t *)src);
+		n -= 16;
+		dst = (uint8_t *)dst + 16;
+		src = (const uint8_t *)src + 16;      /* fallthrough */
+	default:
+		break;
+	}
+
+	/* Copy any remaining bytes, without going beyond end of buffers */
+	if (n != 0)
+		rte_mov16((uint8_t *)dst - 16 + n,
+			(const uint8_t *)src - 16 + n);
+	return ret;
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_MEMCPY_ARM32_H_ */
-- 
2.6.1

* [dpdk-dev] [PATCH v5 08/15] eal/arm: use vector memcpy only when NEON is enabled
  2015-10-30  0:25   ` [dpdk-dev] [PATCH v5 00/15] Support ARMv7 architecture Jan Viktorin
                       ` (6 preceding siblings ...)
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 07/15] eal/arm: vector memcpy for ARM Jan Viktorin
@ 2015-10-30  0:25     ` Jan Viktorin
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 09/15] eal/arm: cpu flag checks for ARM Jan Viktorin
                       ` (6 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Jan Viktorin @ 2015-10-30  0:25 UTC (permalink / raw)
  To: david.marchand, David Hunt, Thomas Monjalon; +Cc: dev

GCC can be configured to avoid using the NEON extensions.
For that case, we provide a plain memcpy-based implementation
of rte_memcpy.

Based on the patch by David Hunt and Amruta Zende:

  lib: added support for armv7 architecture

Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Signed-off-by: Amruta Zende <amruta.zende@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
---
v5: prepare for applying ARMv8
---
 .../common/include/arch/arm/rte_memcpy_32.h        | 59 +++++++++++++++++++++-
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h b/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
index 11f8241..df47c0d 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
@@ -35,8 +35,6 @@
 
 #include <stdint.h>
 #include <string.h>
-/* ARM NEON Intrinsics are used to copy data */
-#include <arm_neon.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -44,6 +42,11 @@ extern "C" {
 
 #include "generic/rte_memcpy.h"
 
+#ifdef __ARM_NEON_FP
+
+/* ARM NEON Intrinsics are used to copy data */
+#include <arm_neon.h>
+
 static inline void
 rte_mov16(uint8_t *dst, const uint8_t *src)
 {
@@ -272,6 +275,58 @@ rte_memcpy_func(void *dst, const void *src, size_t n)
 	return ret;
 }
 
+#else
+
+static inline void
+rte_mov16(uint8_t *dst, const uint8_t *src)
+{
+	memcpy(dst, src, 16);
+}
+
+static inline void
+rte_mov32(uint8_t *dst, const uint8_t *src)
+{
+	memcpy(dst, src, 32);
+}
+
+static inline void
+rte_mov48(uint8_t *dst, const uint8_t *src)
+{
+	memcpy(dst, src, 48);
+}
+
+static inline void
+rte_mov64(uint8_t *dst, const uint8_t *src)
+{
+	memcpy(dst, src, 64);
+}
+
+static inline void
+rte_mov128(uint8_t *dst, const uint8_t *src)
+{
+	memcpy(dst, src, 128);
+}
+
+static inline void
+rte_mov256(uint8_t *dst, const uint8_t *src)
+{
+	memcpy(dst, src, 256);
+}
+
+static inline void *
+rte_memcpy(void *dst, const void *src, size_t n)
+{
+	return memcpy(dst, src, n);
+}
+
+static inline void *
+rte_memcpy_func(void *dst, const void *src, size_t n)
+{
+	return memcpy(dst, src, n);
+}
+
+#endif /* __ARM_NEON_FP */
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.6.1

* [dpdk-dev] [PATCH v5 09/15] eal/arm: cpu flag checks for ARM
  2015-10-30  0:25   ` [dpdk-dev] [PATCH v5 00/15] Support ARMv7 architecture Jan Viktorin
                       ` (7 preceding siblings ...)
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 08/15] eal/arm: use vector memcpy only when NEON is enabled Jan Viktorin
@ 2015-10-30  0:25     ` Jan Viktorin
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 10/15] eal/arm: detect arm architecture in cpu flags Jan Viktorin
                       ` (5 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Jan Viktorin @ 2015-10-30  0:25 UTC (permalink / raw)
  To: david.marchand, David Hunt, Thomas Monjalon; +Cc: Vlastimil Kosar, dev

From: Vlastimil Kosar <kosar@rehivetech.com>

This implementation is based on the IBM POWER version of
rte_cpuflags. We use software emulation of the HW capability
registers because they are usually not directly accessible
from userspace on ARM.

Signed-off-by: Vlastimil Kosar <kosar@rehivetech.com>
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
---
 app/test/test_cpuflags.c                           |   5 +
 .../common/include/arch/arm/rte_cpuflags.h         | 177 +++++++++++++++++++++
 mk/rte.cpuflags.mk                                 |   6 +
 3 files changed, 188 insertions(+)
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_cpuflags.h

diff --git a/app/test/test_cpuflags.c b/app/test/test_cpuflags.c
index 5b92061..557458f 100644
--- a/app/test/test_cpuflags.c
+++ b/app/test/test_cpuflags.c
@@ -115,6 +115,11 @@ test_cpuflags(void)
 	CHECK_FOR_FLAG(RTE_CPUFLAG_ICACHE_SNOOP);
 #endif
 
+#if defined(RTE_ARCH_ARM)
+	printf("Check for NEON:\t\t");
+	CHECK_FOR_FLAG(RTE_CPUFLAG_NEON);
+#endif
+
 #if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_I686)
 	printf("Check for SSE:\t\t");
 	CHECK_FOR_FLAG(RTE_CPUFLAG_SSE);
diff --git a/lib/librte_eal/common/include/arch/arm/rte_cpuflags.h b/lib/librte_eal/common/include/arch/arm/rte_cpuflags.h
new file mode 100644
index 0000000..1eadb33
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/arm/rte_cpuflags.h
@@ -0,0 +1,177 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 RehiveTech. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of RehiveTech nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_CPUFLAGS_ARM_H_
+#define _RTE_CPUFLAGS_ARM_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <elf.h>
+#include <fcntl.h>
+#include <assert.h>
+#include <unistd.h>
+
+#include "generic/rte_cpuflags.h"
+
+#ifndef AT_HWCAP
+#define AT_HWCAP 16
+#endif
+
+#ifndef AT_HWCAP2
+#define AT_HWCAP2 26
+#endif
+
+/* software based registers */
+enum cpu_register_t {
+	REG_HWCAP = 0,
+	REG_HWCAP2,
+};
+
+/**
+ * Enumeration of all CPU features supported
+ */
+enum rte_cpu_flag_t {
+	RTE_CPUFLAG_SWP = 0,
+	RTE_CPUFLAG_HALF,
+	RTE_CPUFLAG_THUMB,
+	RTE_CPUFLAG_A26BIT,
+	RTE_CPUFLAG_FAST_MULT,
+	RTE_CPUFLAG_FPA,
+	RTE_CPUFLAG_VFP,
+	RTE_CPUFLAG_EDSP,
+	RTE_CPUFLAG_JAVA,
+	RTE_CPUFLAG_IWMMXT,
+	RTE_CPUFLAG_CRUNCH,
+	RTE_CPUFLAG_THUMBEE,
+	RTE_CPUFLAG_NEON,
+	RTE_CPUFLAG_VFPv3,
+	RTE_CPUFLAG_VFPv3D16,
+	RTE_CPUFLAG_TLS,
+	RTE_CPUFLAG_VFPv4,
+	RTE_CPUFLAG_IDIVA,
+	RTE_CPUFLAG_IDIVT,
+	RTE_CPUFLAG_VFPD32,
+	RTE_CPUFLAG_LPAE,
+	RTE_CPUFLAG_EVTSTRM,
+	RTE_CPUFLAG_AES,
+	RTE_CPUFLAG_PMULL,
+	RTE_CPUFLAG_SHA1,
+	RTE_CPUFLAG_SHA2,
+	RTE_CPUFLAG_CRC32,
+	/* The last item */
+	RTE_CPUFLAG_NUMFLAGS,/**< This should always be the last! */
+};
+
+static const struct feature_entry cpu_feature_table[] = {
+	FEAT_DEF(SWP,       0x00000001, 0, REG_HWCAP,  0)
+	FEAT_DEF(HALF,      0x00000001, 0, REG_HWCAP,  1)
+	FEAT_DEF(THUMB,     0x00000001, 0, REG_HWCAP,  2)
+	FEAT_DEF(A26BIT,    0x00000001, 0, REG_HWCAP,  3)
+	FEAT_DEF(FAST_MULT, 0x00000001, 0, REG_HWCAP,  4)
+	FEAT_DEF(FPA,       0x00000001, 0, REG_HWCAP,  5)
+	FEAT_DEF(VFP,       0x00000001, 0, REG_HWCAP,  6)
+	FEAT_DEF(EDSP,      0x00000001, 0, REG_HWCAP,  7)
+	FEAT_DEF(JAVA,      0x00000001, 0, REG_HWCAP,  8)
+	FEAT_DEF(IWMMXT,    0x00000001, 0, REG_HWCAP,  9)
+	FEAT_DEF(CRUNCH,    0x00000001, 0, REG_HWCAP,  10)
+	FEAT_DEF(THUMBEE,   0x00000001, 0, REG_HWCAP,  11)
+	FEAT_DEF(NEON,      0x00000001, 0, REG_HWCAP,  12)
+	FEAT_DEF(VFPv3,     0x00000001, 0, REG_HWCAP,  13)
+	FEAT_DEF(VFPv3D16,  0x00000001, 0, REG_HWCAP,  14)
+	FEAT_DEF(TLS,       0x00000001, 0, REG_HWCAP,  15)
+	FEAT_DEF(VFPv4,     0x00000001, 0, REG_HWCAP,  16)
+	FEAT_DEF(IDIVA,     0x00000001, 0, REG_HWCAP,  17)
+	FEAT_DEF(IDIVT,     0x00000001, 0, REG_HWCAP,  18)
+	FEAT_DEF(VFPD32,    0x00000001, 0, REG_HWCAP,  19)
+	FEAT_DEF(LPAE,      0x00000001, 0, REG_HWCAP,  20)
+	FEAT_DEF(EVTSTRM,   0x00000001, 0, REG_HWCAP,  21)
+	FEAT_DEF(AES,       0x00000001, 0, REG_HWCAP2,  0)
+	FEAT_DEF(PMULL,     0x00000001, 0, REG_HWCAP2,  1)
+	FEAT_DEF(SHA1,      0x00000001, 0, REG_HWCAP2,  2)
+	FEAT_DEF(SHA2,      0x00000001, 0, REG_HWCAP2,  3)
+	FEAT_DEF(CRC32,     0x00000001, 0, REG_HWCAP2,  4)
+};
+
+/*
+ * Read AUXV software register and get cpu features for ARM
+ */
+static inline void
+rte_cpu_get_features(__attribute__((unused)) uint32_t leaf,
+	__attribute__((unused)) uint32_t subleaf, cpuid_registers_t out)
+{
+	int auxv_fd;
+	Elf32_auxv_t auxv;
+
+	auxv_fd = open("/proc/self/auxv", O_RDONLY);
+	assert(auxv_fd != -1);
+	while (read(auxv_fd, &auxv,
+		sizeof(Elf32_auxv_t)) == sizeof(Elf32_auxv_t)) {
+		if (auxv.a_type == AT_HWCAP)
+			out[REG_HWCAP] = auxv.a_un.a_val;
+		else if (auxv.a_type == AT_HWCAP2)
+			out[REG_HWCAP2] = auxv.a_un.a_val;
+	}
+}
+
+/*
+ * Checks if a particular flag is available on current machine.
+ */
+static inline int
+rte_cpu_get_flag_enabled(enum rte_cpu_flag_t feature)
+{
+	const struct feature_entry *feat;
+	cpuid_registers_t regs = {0};
+
+	if (feature >= RTE_CPUFLAG_NUMFLAGS)
+		/* Flag does not match anything in the feature tables */
+		return -ENOENT;
+
+	feat = &cpu_feature_table[feature];
+
+	if (!feat->leaf)
+		/* This entry in the table wasn't filled out! */
+		return -EFAULT;
+
+	/* get the cpuid leaf containing the desired feature */
+	rte_cpu_get_features(feat->leaf, feat->subleaf, regs);
+
+	/* check if the feature is enabled */
+	return (regs[feat->reg] >> feat->bit) & 1;
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_CPUFLAGS_ARM_H_ */
diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
index f595cd0..bec7bdd 100644
--- a/mk/rte.cpuflags.mk
+++ b/mk/rte.cpuflags.mk
@@ -106,6 +106,12 @@ ifneq ($(filter $(AUTO_CPUFLAGS),__builtin_vsx_xvnmaddadp),)
 CPUFLAGS += VSX
 endif
 
+# ARM flags
+ifneq ($(filter $(AUTO_CPUFLAGS),__ARM_NEON_FP),)
+CPUFLAGS += NEON
+endif
+
+
 MACHINE_CFLAGS += $(addprefix -DRTE_MACHINE_CPUFLAG_,$(CPUFLAGS))
 
 # To strip whitespace
-- 
2.6.1

* [dpdk-dev] [PATCH v5 10/15] eal/arm: detect arm architecture in cpu flags
  2015-10-30  0:25   ` [dpdk-dev] [PATCH v5 00/15] Support ARMv7 architecture Jan Viktorin
                       ` (8 preceding siblings ...)
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 09/15] eal/arm: cpu flag checks for ARM Jan Viktorin
@ 2015-10-30  0:25     ` Jan Viktorin
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 11/15] eal/arm: rwlock support for ARM Jan Viktorin
                       ` (4 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Jan Viktorin @ 2015-10-30  0:25 UTC (permalink / raw)
  To: david.marchand, David Hunt, Thomas Monjalon; +Cc: dev

Based on the patch by David Hunt and Amruta Zende:

  lib: added support for armv7 architecture

Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Signed-off-by: Amruta Zende <amruta.zende@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
---
v2 -> v3: fixed forgotten include of string.h
v4: checkpatch reports a few characters over 80 for the aarch64 check
---
 lib/librte_eal/common/include/arch/arm/rte_cpuflags.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/arch/arm/rte_cpuflags.h b/lib/librte_eal/common/include/arch/arm/rte_cpuflags.h
index 1eadb33..7ce9d14 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_cpuflags.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_cpuflags.h
@@ -41,6 +41,7 @@ extern "C" {
 #include <fcntl.h>
 #include <assert.h>
 #include <unistd.h>
+#include <string.h>
 
 #include "generic/rte_cpuflags.h"
 
@@ -52,10 +53,15 @@ extern "C" {
 #define AT_HWCAP2 26
 #endif
 
+#ifndef AT_PLATFORM
+#define AT_PLATFORM 15
+#endif
+
 /* software based registers */
 enum cpu_register_t {
 	REG_HWCAP = 0,
 	REG_HWCAP2,
+	REG_PLATFORM,
 };
 
 /**
@@ -89,6 +95,8 @@ enum rte_cpu_flag_t {
 	RTE_CPUFLAG_SHA1,
 	RTE_CPUFLAG_SHA2,
 	RTE_CPUFLAG_CRC32,
+	RTE_CPUFLAG_AARCH32,
+	RTE_CPUFLAG_AARCH64,
 	/* The last item */
 	RTE_CPUFLAG_NUMFLAGS,/**< This should always be the last! */
 };
@@ -121,6 +129,8 @@ static const struct feature_entry cpu_feature_table[] = {
 	FEAT_DEF(SHA1,      0x00000001, 0, REG_HWCAP2,  2)
 	FEAT_DEF(SHA2,      0x00000001, 0, REG_HWCAP2,  3)
 	FEAT_DEF(CRC32,     0x00000001, 0, REG_HWCAP2,  4)
+	FEAT_DEF(AARCH32,   0x00000001, 0, REG_PLATFORM, 0)
+	FEAT_DEF(AARCH64,   0x00000001, 0, REG_PLATFORM, 1)
 };
 
 /*
@@ -141,6 +151,12 @@ rte_cpu_get_features(__attribute__((unused)) uint32_t leaf,
 			out[REG_HWCAP] = auxv.a_un.a_val;
 		else if (auxv.a_type == AT_HWCAP2)
 			out[REG_HWCAP2] = auxv.a_un.a_val;
+		else if (auxv.a_type == AT_PLATFORM) {
+			if (!strcmp((const char *)auxv.a_un.a_val, "aarch32"))
+				out[REG_PLATFORM] = 0x0001;
+			else if (!strcmp((const char *)auxv.a_un.a_val, "aarch64"))
+				out[REG_PLATFORM] = 0x0002;
+		}
 	}
 }
 
-- 
2.6.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [dpdk-dev] [PATCH v5 11/15] eal/arm: rwlock support for ARM
  2015-10-30  0:25   ` [dpdk-dev] [PATCH v5 00/15] Support ARMv7 architecture Jan Viktorin
                       ` (9 preceding siblings ...)
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 10/15] eal/arm: detect arm architecture in cpu flags Jan Viktorin
@ 2015-10-30  0:25     ` Jan Viktorin
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 12/15] eal/arm: add very incomplete rte_vect Jan Viktorin
                       ` (3 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Jan Viktorin @ 2015-10-30  0:25 UTC (permalink / raw)
  To: david.marchand, David Hunt, Thomas Monjalon; +Cc: dev

Just a copy from PPC.

Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
---
 .../common/include/arch/arm/rte_rwlock.h           | 40 ++++++++++++++++++++++
 1 file changed, 40 insertions(+)
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_rwlock.h

diff --git a/lib/librte_eal/common/include/arch/arm/rte_rwlock.h b/lib/librte_eal/common/include/arch/arm/rte_rwlock.h
new file mode 100644
index 0000000..664bec8
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/arm/rte_rwlock.h
@@ -0,0 +1,40 @@
+/* copied from ppc_64 */
+
+#ifndef _RTE_RWLOCK_ARM_H_
+#define _RTE_RWLOCK_ARM_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_rwlock.h"
+
+static inline void
+rte_rwlock_read_lock_tm(rte_rwlock_t *rwl)
+{
+	rte_rwlock_read_lock(rwl);
+}
+
+static inline void
+rte_rwlock_read_unlock_tm(rte_rwlock_t *rwl)
+{
+	rte_rwlock_read_unlock(rwl);
+}
+
+static inline void
+rte_rwlock_write_lock_tm(rte_rwlock_t *rwl)
+{
+	rte_rwlock_write_lock(rwl);
+}
+
+static inline void
+rte_rwlock_write_unlock_tm(rte_rwlock_t *rwl)
+{
+	rte_rwlock_write_unlock(rwl);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RWLOCK_ARM_H_ */
-- 
2.6.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [dpdk-dev] [PATCH v5 12/15] eal/arm: add very incomplete rte_vect
  2015-10-30  0:25   ` [dpdk-dev] [PATCH v5 00/15] Support ARMv7 architecture Jan Viktorin
                       ` (10 preceding siblings ...)
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 11/15] eal/arm: rwlock support for ARM Jan Viktorin
@ 2015-10-30  0:25     ` Jan Viktorin
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 13/15] gcc/arm: avoid alignment errors to break build Jan Viktorin
                       ` (2 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Jan Viktorin @ 2015-10-30  0:25 UTC (permalink / raw)
  To: david.marchand, David Hunt, Thomas Monjalon; +Cc: dev

This patch does not map x86 SIMD operations to the ARM ones.
It just fills the necessary gap between the platforms to enable
compilation of the LPM library (it includes rte_vect.h, and lpm_test
needs these SIMD functions) and the ACL library (it includes
rte_vect.h).

Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
---
v4: checkpatch reports warning for the new typedef
---
 lib/librte_eal/common/include/arch/arm/rte_vect.h | 84 +++++++++++++++++++++++
 1 file changed, 84 insertions(+)
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_vect.h

diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h
new file mode 100644
index 0000000..7d5de97
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
@@ -0,0 +1,84 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 RehiveTech. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of RehiveTech nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_VECT_ARM_H_
+#define _RTE_VECT_ARM_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define XMM_SIZE 16
+#define XMM_MASK (XMM_SIZE - 1)
+
+typedef struct {
+	union uint128 {
+		uint8_t uint8[16];
+		uint32_t uint32[4];
+	} val;
+} __m128i;
+
+static inline __m128i
+_mm_set_epi32(uint32_t v0, uint32_t v1, uint32_t v2, uint32_t v3)
+{
+	__m128i res;
+
+	res.val.uint32[0] = v0;
+	res.val.uint32[1] = v1;
+	res.val.uint32[2] = v2;
+	res.val.uint32[3] = v3;
+	return res;
+}
+
+static inline __m128i
+_mm_loadu_si128(__m128i *v)
+{
+	__m128i res;
+
+	res = *v;
+	return res;
+}
+
+static inline __m128i
+_mm_load_si128(__m128i *v)
+{
+	__m128i res;
+
+	res = *v;
+	return res;
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
2.6.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [dpdk-dev] [PATCH v5 13/15] gcc/arm: avoid alignment errors to break build
  2015-10-30  0:25   ` [dpdk-dev] [PATCH v5 00/15] Support ARMv7 architecture Jan Viktorin
                       ` (11 preceding siblings ...)
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 12/15] eal/arm: add very incomplete rte_vect Jan Viktorin
@ 2015-10-30  0:25     ` Jan Viktorin
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 14/15] mk: Introduce ARMv7 architecture Jan Viktorin
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 15/15] maintainers: claim responsibility for ARMv7 Jan Viktorin
  14 siblings, 0 replies; 32+ messages in thread
From: Jan Viktorin @ 2015-10-30  0:25 UTC (permalink / raw)
  To: david.marchand, David Hunt, Thomas Monjalon; +Cc: dev, Vlastimil Kosar

There are several alignment issues when compiling for ARMv7.
They are not considered fatal (ARMv7 supports unaligned access
of 32-bit words), so we just leave them as warnings. They should
be solved later, however.

Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Signed-off-by: Vlastimil Kosar <kosar@rehivetech.com>
---
v4: restrict -Wno-error to the cast-align only
---
 mk/toolchain/gcc/rte.vars.mk | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mk/toolchain/gcc/rte.vars.mk b/mk/toolchain/gcc/rte.vars.mk
index 0f51c66..c2c5255 100644
--- a/mk/toolchain/gcc/rte.vars.mk
+++ b/mk/toolchain/gcc/rte.vars.mk
@@ -77,6 +77,12 @@ WERROR_FLAGS += -Wcast-align -Wnested-externs -Wcast-qual
 WERROR_FLAGS += -Wformat-nonliteral -Wformat-security
 WERROR_FLAGS += -Wundef -Wwrite-strings
 
+# There are many issues reported for ARMv7 architecture
+# which are not necessarily fatal. Report as warnings.
+ifeq ($(CONFIG_RTE_ARCH_ARMv7),y)
+WERROR_FLAGS += -Wno-error=cast-align
+endif
+
 # process cpu flags
 include $(RTE_SDK)/mk/toolchain/$(RTE_TOOLCHAIN)/rte.toolchain-compat.mk
 
-- 
2.6.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [dpdk-dev] [PATCH v5 14/15] mk: Introduce ARMv7 architecture
  2015-10-30  0:25   ` [dpdk-dev] [PATCH v5 00/15] Support ARMv7 architecture Jan Viktorin
                       ` (12 preceding siblings ...)
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 13/15] gcc/arm: avoid alignment errors to break build Jan Viktorin
@ 2015-10-30  0:25     ` Jan Viktorin
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 15/15] maintainers: claim responsibility for ARMv7 Jan Viktorin
  14 siblings, 0 replies; 32+ messages in thread
From: Jan Viktorin @ 2015-10-30  0:25 UTC (permalink / raw)
  To: david.marchand, David Hunt, Thomas Monjalon; +Cc: Vlastimil Kosar, dev

From: Vlastimil Kosar <kosar@rehivetech.com>

Make DPDK run on ARMv7-A architecture. This patch assumes
ARM Cortex-A9. However, it is known to work on Cortex-A7
and Cortex-A15.

Signed-off-by: Vlastimil Kosar <kosar@rehivetech.com>
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
---
v2:
* the -mtune parameter of GCC is configurable now
* the -mfpu=neon can be turned off

v3: XMM_SIZE is defined in rte_vect.h in a following patch

v4:
* update release notes for 2.2
* get rid of CONFIG_RTE_BITMAP_OPTIMIZATIONS=0 setting
* rename arm defconfig: "armv7-a" -> "armv7a"
* disable pipeline and table modules unless lpm is fixed
---
 config/defconfig_arm-armv7a-linuxapp-gcc | 74 ++++++++++++++++++++++++++++++++
 doc/guides/rel_notes/release_2_2.rst     |  5 +++
 mk/arch/arm/rte.vars.mk                  | 39 +++++++++++++++++
 mk/machine/armv7-a/rte.vars.mk           | 67 +++++++++++++++++++++++++++++
 4 files changed, 185 insertions(+)
 create mode 100644 config/defconfig_arm-armv7a-linuxapp-gcc
 create mode 100644 mk/arch/arm/rte.vars.mk
 create mode 100644 mk/machine/armv7-a/rte.vars.mk

diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
new file mode 100644
index 0000000..d623222
--- /dev/null
+++ b/config/defconfig_arm-armv7a-linuxapp-gcc
@@ -0,0 +1,74 @@
+#   BSD LICENSE
+#
+#   Copyright (C) 2015 RehiveTech. All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of RehiveTech nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+#include "common_linuxapp"
+
+CONFIG_RTE_MACHINE="armv7-a"
+
+CONFIG_RTE_ARCH="arm"
+CONFIG_RTE_ARCH_ARM=y
+CONFIG_RTE_ARCH_ARMv7=y
+CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9"
+CONFIG_RTE_ARCH_ARM_NEON=y
+
+CONFIG_RTE_TOOLCHAIN="gcc"
+CONFIG_RTE_TOOLCHAIN_GCC=y
+
+# ARM doesn't have support for vmware TSC map
+CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=n
+
+# KNI is not supported on 32-bit
+CONFIG_RTE_LIBRTE_KNI=n
+
+# PCI is usually not used on ARM
+CONFIG_RTE_EAL_IGB_UIO=n
+
+# fails to compile on ARM
+CONFIG_RTE_LIBRTE_ACL=n
+CONFIG_RTE_LIBRTE_LPM=n
+CONFIG_RTE_LIBRTE_TABLE=n
+CONFIG_RTE_LIBRTE_PIPELINE=n
+
+# cannot use those on ARM
+CONFIG_RTE_KNI_KMOD=n
+CONFIG_RTE_LIBRTE_EM_PMD=n
+CONFIG_RTE_LIBRTE_IGB_PMD=n
+CONFIG_RTE_LIBRTE_CXGBE_PMD=n
+CONFIG_RTE_LIBRTE_E1000_PMD=n
+CONFIG_RTE_LIBRTE_ENIC_PMD=n
+CONFIG_RTE_LIBRTE_FM10K_PMD=n
+CONFIG_RTE_LIBRTE_I40E_PMD=n
+CONFIG_RTE_LIBRTE_IXGBE_PMD=n
+CONFIG_RTE_LIBRTE_MLX4_PMD=n
+CONFIG_RTE_LIBRTE_MPIPE_PMD=n
+CONFIG_RTE_LIBRTE_VIRTIO_PMD=n
+CONFIG_RTE_LIBRTE_VMXNET3_PMD=n
+CONFIG_RTE_LIBRTE_PMD_XENVIRT=n
+CONFIG_RTE_LIBRTE_PMD_BNX2X=n
diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index be6f827..43a3a3c 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -23,6 +23,11 @@ New Features
 
 * **Added vhost-user multiple queue support.**
 
+* **Introduce ARMv7 architecture**
+
+  It is now possible to build DPDK for the ARMv7 platform and test with
+  virtual PMD drivers.
+
 
 Resolved Issues
 ---------------
diff --git a/mk/arch/arm/rte.vars.mk b/mk/arch/arm/rte.vars.mk
new file mode 100644
index 0000000..df0c043
--- /dev/null
+++ b/mk/arch/arm/rte.vars.mk
@@ -0,0 +1,39 @@
+#   BSD LICENSE
+#
+#   Copyright (C) 2015 RehiveTech. All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of RehiveTech nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+ARCH  ?= arm
+CROSS ?=
+
+CPU_CFLAGS  ?= -marm -DRTE_CACHE_LINE_SIZE=64 -munaligned-access
+CPU_LDFLAGS ?=
+CPU_ASFLAGS ?= -felf
+
+export ARCH CROSS CPU_CFLAGS CPU_LDFLAGS CPU_ASFLAGS
diff --git a/mk/machine/armv7-a/rte.vars.mk b/mk/machine/armv7-a/rte.vars.mk
new file mode 100644
index 0000000..48d3979
--- /dev/null
+++ b/mk/machine/armv7-a/rte.vars.mk
@@ -0,0 +1,67 @@
+#   BSD LICENSE
+#
+#   Copyright (C) 2015 RehiveTech. All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of RehiveTech nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+#
+# machine:
+#
+#   - can define ARCH variable (overridden by cmdline value)
+#   - can define CROSS variable (overridden by cmdline value)
+#   - define MACHINE_CFLAGS variable (overridden by cmdline value)
+#   - define MACHINE_LDFLAGS variable (overridden by cmdline value)
+#   - define MACHINE_ASFLAGS variable (overridden by cmdline value)
+#   - can define CPU_CFLAGS variable (overridden by cmdline value) that
+#     overrides the one defined in arch.
+#   - can define CPU_LDFLAGS variable (overridden by cmdline value) that
+#     overrides the one defined in arch.
+#   - can define CPU_ASFLAGS variable (overridden by cmdline value) that
+#     overrides the one defined in arch.
+#   - may override any previously defined variable
+#
+
+# ARCH =
+# CROSS =
+# MACHINE_CFLAGS =
+# MACHINE_LDFLAGS =
+# MACHINE_ASFLAGS =
+# CPU_CFLAGS =
+# CPU_LDFLAGS =
+# CPU_ASFLAGS =
+
+CPU_CFLAGS += -mfloat-abi=softfp
+
+MACHINE_CFLAGS += -march=armv7-a
+
+ifdef CONFIG_RTE_ARCH_ARM_TUNE
+MACHINE_CFLAGS += -mtune=$(CONFIG_RTE_ARCH_ARM_TUNE)
+endif
+
+ifeq ($(CONFIG_RTE_ARCH_ARM_NEON),y)
+MACHINE_CFLAGS += -mfpu=neon
+endif
-- 
2.6.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [dpdk-dev] [PATCH v5 15/15] maintainers: claim responsibility for ARMv7
  2015-10-30  0:25   ` [dpdk-dev] [PATCH v5 00/15] Support ARMv7 architecture Jan Viktorin
                       ` (13 preceding siblings ...)
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 14/15] mk: Introduce ARMv7 architecture Jan Viktorin
@ 2015-10-30  0:25     ` Jan Viktorin
  14 siblings, 0 replies; 32+ messages in thread
From: Jan Viktorin @ 2015-10-30  0:25 UTC (permalink / raw)
  To: david.marchand, David Hunt, Thomas Monjalon; +Cc: dev

Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
---
 MAINTAINERS | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 080a8e8..a8933eb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -124,6 +124,10 @@ IBM POWER
 M: Chao Zhu <chaozhu@linux.vnet.ibm.com>
 F: lib/librte_eal/common/include/arch/ppc_64/
 
+ARM v7
+M: Jan Viktorin <viktorin@rehivetech.com>
+F: lib/librte_eal/common/include/arch/arm/
+
 Intel x86
 M: Bruce Richardson <bruce.richardson@intel.com>
 M: Konstantin Ananyev <konstantin.ananyev@intel.com>
-- 
2.6.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [dpdk-dev] [PATCH 0/5] ARMv8 additions to ARMv7 support
  2015-10-30  0:17 ` [dpdk-dev] [PATCH 0/5] ARMv8 additions to ARMv7 support Jan Viktorin
@ 2015-10-30  8:52   ` Hunt, David
  2015-10-30 10:48     ` Jan Viktorin
  0 siblings, 1 reply; 32+ messages in thread
From: Hunt, David @ 2015-10-30  8:52 UTC (permalink / raw)
  To: Jan Viktorin; +Cc: dev

On 30/10/2015 00:17, Jan Viktorin wrote:
> I've failed to compile kni/igb for ARMv8. Any ideas? Is it Linux 4.2
> compatible?
>
>    CC [M]  /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/dpdk-armv8-hunt-v1/build/build/lib/librte_eal/linuxapp/kni/igb_main.o
> /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/dpdk-armv8-hunt-v1/build/build/lib/librte_eal/linuxapp/kni/igb_main.c: In functi
> on ‘igb_ndo_bridge_getlink’:
> /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/dpdk-armv8-hunt-v1/build/build/lib/librte_eal/linuxapp/kni/igb_main.c:2279:9: er
> ror: too few arguments to function ‘ndo_dflt_bridge_getlink’
>    return ndo_dflt_bridge_getlink(skb, pid, seq, dev, mode, 0, 0, nlflags);
>           ^
> In file included from /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/linux-4.2/include/net/dst.h:13:0,
>                   from /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/linux-4.2/include/net/sock.h:67,
>                   from /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/linux-4.2/include/linux/tcp.h:22,
>                   from /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/dpdk-armv8-hunt-v1/build/build/lib/librte_eal/linuxapp/kni/igb_main.c:34:
> /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/linux-4.2/include/linux/rtnetlink.h:115:12: note: declared here
>   extern int ndo_dflt_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
>              ^
> /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/dpdk-armv8-hunt-v1/build/build/lib/librte_eal/linuxapp/kni/igb_main.c:2286:1: error: control reaches end of non-void function [-Werror=return-type]
>   }
>   ^
> cc1: all warnings being treated as errors
> /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/linux-4.2/scripts/Makefile.build:258: recipe for target '/home/jviki/Projects/bu
> ildroot-armv8/qemu-armv8/build/dpdk-armv8-hunt-v1/build/build/lib/librte_eal/linuxapp/kni/igb_main.o' failed
>
> Regards
> Jan

Jan,

To compile DPDK on kernels 4.2 and later, you need two patches submitted 
to the list last week. The ID's are

   7518 - kni-rename-HAVE_NDO_BRIDGE_GETLINK_FILTER_MASK-macro
   7519 - kni-fix-igb-build-with-kernel-4.2

And if you're on a 4.3 kernel:

   8131 - fix igb_uio's access to pci_dev->msi_list for kernels >= 4.3

Regards,
Dave.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [dpdk-dev] [PATCH 0/5] ARMv8 additions to ARMv7 support
  2015-10-30  8:52   ` Hunt, David
@ 2015-10-30 10:48     ` Jan Viktorin
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Viktorin @ 2015-10-30 10:48 UTC (permalink / raw)
  To: Hunt, David; +Cc: dev

Thanks for that hint. I am able to run it in QEMU. I tried several
tests from the test suite and they work.

Jan

On Fri, 30 Oct 2015 08:52:49 +0000
"Hunt, David" <david.hunt@intel.com> wrote:

> On 30/10/2015 00:17, Jan Viktorin wrote:
> > I've failed to compile kni/igb for ARMv8. Any ideas? Is it Linux 4.2
> > compatible?
> >
> >    CC [M]  /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/dpdk-armv8-hunt-v1/build/build/lib/librte_eal/linuxapp/kni/igb_main.o
> > /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/dpdk-armv8-hunt-v1/build/build/lib/librte_eal/linuxapp/kni/igb_main.c: In functi
> > on ‘igb_ndo_bridge_getlink’:
> > /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/dpdk-armv8-hunt-v1/build/build/lib/librte_eal/linuxapp/kni/igb_main.c:2279:9: er
> > ror: too few arguments to function ‘ndo_dflt_bridge_getlink’
> >    return ndo_dflt_bridge_getlink(skb, pid, seq, dev, mode, 0, 0, nlflags);
> >           ^
> > In file included from /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/linux-4.2/include/net/dst.h:13:0,
> >                   from /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/linux-4.2/include/net/sock.h:67,
> >                   from /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/linux-4.2/include/linux/tcp.h:22,
> >                   from /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/dpdk-armv8-hunt-v1/build/build/lib/librte_eal/linuxapp/kni/igb_main.c:34:
> > /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/linux-4.2/include/linux/rtnetlink.h:115:12: note: declared here
> >   extern int ndo_dflt_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
> >              ^
> > /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/dpdk-armv8-hunt-v1/build/build/lib/librte_eal/linuxapp/kni/igb_main.c:2286:1: error: control reaches end of non-void function [-Werror=return-type]
> >   }
> >   ^
> > cc1: all warnings being treated as errors
> > /home/jviki/Projects/buildroot-armv8/qemu-armv8/build/linux-4.2/scripts/Makefile.build:258: recipe for target '/home/jviki/Projects/bu
> > ildroot-armv8/qemu-armv8/build/dpdk-armv8-hunt-v1/build/build/lib/librte_eal/linuxapp/kni/igb_main.o' failed
> >
> > Regards
> > Jan  
> 
> Jan,
> 
> To compile DPDK on kernels 4.2 and later, you need two patches submitted 
> to the list last week. The ID's are
> 
>    7518 - kni-rename-HAVE_NDO_BRIDGE_GETLINK_FILTER_MASK-macro
>    7519 - kni-fix-igb-build-with-kernel-4.2
> 
> And if you're on a 4.3 kernel:
> 
>    8131 - fix igb_uio's access to pci_dev->msi_list for kernels >= 4.3
> 
> Regards,
> Dave.
> 
> 



-- 
   Jan Viktorin                  E-mail: Viktorin@RehiveTech.com
   System Architect              Web:    www.RehiveTech.com
   RehiveTech
   Brno, Czech Republic

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [dpdk-dev] [PATCH v5 01/15] eal/arm: atomic operations for ARM
  2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 01/15] eal/arm: atomic operations for ARM Jan Viktorin
@ 2015-11-02  5:53       ` Jerin Jacob
  2015-11-02 13:00         ` Jan Viktorin
  2015-11-02 13:10         ` Jan Viktorin
  0 siblings, 2 replies; 32+ messages in thread
From: Jerin Jacob @ 2015-11-02  5:53 UTC (permalink / raw)
  To: Jan Viktorin; +Cc: Vlastimil Kosar, dev

On Fri, Oct 30, 2015 at 01:25:28AM +0100, Jan Viktorin wrote:
> From: Vlastimil Kosar <kosar@rehivetech.com>
>
> This patch adds architecture specific atomic operation file
> for ARM architecture. It utilizes compiler intrinsics only.
>
> Signed-off-by: Vlastimil Kosar <kosar@rehivetech.com>
> Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
> ---
> v1 -> v2:
> * improve rte_wmb()
> * use __atomic_* or __sync_*? (may affect the required GCC version)
>
> v4:
> * checkpatch complaints about volatile keyword (but seems to be OK to me)
> * checkpatch complaints about do { ... } while (0) for single statement
>   with asm volatile (but I didn't find a way how to write it without
>   the checkpatch complaints)
> * checkpatch is now happy with whitespaces
> ---
>  .../common/include/arch/arm/rte_atomic.h           | 256 +++++++++++++++++++++
>  1 file changed, 256 insertions(+)
>  create mode 100644 lib/librte_eal/common/include/arch/arm/rte_atomic.h
>
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic.h b/lib/librte_eal/common/include/arch/arm/rte_atomic.h
> new file mode 100644
> index 0000000..ea1e485
> --- /dev/null
> +++ b/lib/librte_eal/common/include/arch/arm/rte_atomic.h
> @@ -0,0 +1,256 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2015 RehiveTech. All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of RehiveTech nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_ATOMIC_ARM_H_
> +#define _RTE_ATOMIC_ARM_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include "generic/rte_atomic.h"
> +
> +/**
> + * General memory barrier.
> + *
> + * Guarantees that the LOAD and STORE operations generated before the
> + * barrier occur before the LOAD and STORE operations generated after.
> + */
> +#define	rte_mb()  __sync_synchronize()
> +
> +/**
> + * Write memory barrier.
> + *
> + * Guarantees that the STORE operations generated before the barrier
> + * occur before the STORE operations generated after.
> + */
> +#define	rte_wmb() do { asm volatile ("dmb st" : : : "memory"); } while (0)
> +
> +/**
> + * Read memory barrier.
> + *
> + * Guarantees that the LOAD operations generated before the barrier
> + * occur before the LOAD operations generated after.
> + */
> +#define	rte_rmb() __sync_synchronize()
> +

#define dmb(opt)        asm volatile("dmb " #opt : : : "memory")

static inline void rte_mb(void)
{
        dmb(ish);
}

static inline void rte_wmb(void)
{
        dmb(ishst);
}

static inline void rte_rmb(void)
{
        dmb(ishld);
}

For armv8, it makes sense to have the above definitions for rte_*mb(). If
that doesn't make sense for armv7, then we need to split this file into
rte_atomic_32/64.h.



> +/*------------------------- 16 bit atomic operations -------------------------*/
> +
> +#ifndef RTE_FORCE_INTRINSICS
> +static inline int
> +rte_atomic16_cmpset(volatile uint16_t *dst, uint16_t exp, uint16_t src)
> +{
> +	return __atomic_compare_exchange(dst, &exp, &src, 0, __ATOMIC_ACQUIRE,
> +		__ATOMIC_ACQUIRE) ? 1 : 0;
> +}

IMO, it should be __ATOMIC_SEQ_CST instead of __ATOMIC_ACQUIRE.
__ATOMIC_ACQUIRE works in conjunction with __ATOMIC_RELEASE.
AFAIK, the DPDK atomic API expects a full barrier; the C11 memory model is
not yet used. So why can't we use the RTE_FORCE_INTRINSICS-based generic
implementation? The same holds true for the spinlock implementation too
(i.e. using RTE_FORCE_INTRINSICS). Am I missing something here?
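A minimal sketch of the suggested change (hypothetical helper name; it uses
the same GCC __atomic builtins the patch already relies on, only with
sequentially consistent ordering):

```c
#include <stdint.h>

/* Sketch: cmpset with a full barrier (__ATOMIC_SEQ_CST), as suggested
 * above, instead of acquire-only ordering. Returns 1 if *dst equalled
 * exp and was set to src, 0 otherwise. */
static inline int
atomic16_cmpset_seqcst(volatile uint16_t *dst, uint16_t exp, uint16_t src)
{
	return __atomic_compare_exchange(dst, &exp, &src, 0,
		__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST) ? 1 : 0;
}
```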



> +
> +static inline int rte_atomic16_test_and_set(rte_atomic16_t *v)
> +{
> +	return rte_atomic16_cmpset((volatile uint16_t *)&v->cnt, 0, 1);
> +}
> +
> +static inline void
> +rte_atomic16_inc(rte_atomic16_t *v)
> +{
> +	__atomic_add_fetch(&v->cnt, 1, __ATOMIC_ACQUIRE);
> +}
> +
> +static inline void
> +rte_atomic16_dec(rte_atomic16_t *v)
> +{
> +	__atomic_sub_fetch(&v->cnt, 1, __ATOMIC_ACQUIRE);
> +}
> +
> +static inline int rte_atomic16_inc_and_test(rte_atomic16_t *v)
> +{
> +	return (__atomic_add_fetch(&v->cnt, 1, __ATOMIC_ACQUIRE) == 0);
> +}
> +
> +static inline int rte_atomic16_dec_and_test(rte_atomic16_t *v)
> +{
> +	return (__atomic_sub_fetch(&v->cnt, 1, __ATOMIC_ACQUIRE) == 0);
> +}
> +
> +/*------------------------- 32 bit atomic operations -------------------------*/
> +
> +static inline int
> +rte_atomic32_cmpset(volatile uint32_t *dst, uint32_t exp, uint32_t src)
> +{
> +	return __atomic_compare_exchange(dst, &exp, &src, 0, __ATOMIC_ACQUIRE,
> +		__ATOMIC_ACQUIRE) ? 1 : 0;
> +}
> +
> +static inline int rte_atomic32_test_and_set(rte_atomic32_t *v)
> +{
> +	return rte_atomic32_cmpset((volatile uint32_t *)&v->cnt, 0, 1);
> +}
> +
> +static inline void
> +rte_atomic32_inc(rte_atomic32_t *v)
> +{
> +	__atomic_add_fetch(&v->cnt, 1, __ATOMIC_ACQUIRE);
> +}
> +
> +static inline void
> +rte_atomic32_dec(rte_atomic32_t *v)
> +{
> +	__atomic_sub_fetch(&v->cnt, 1, __ATOMIC_ACQUIRE);
> +}
> +
> +static inline int rte_atomic32_inc_and_test(rte_atomic32_t *v)
> +{
> +	return (__atomic_add_fetch(&v->cnt, 1, __ATOMIC_ACQUIRE) == 0);
> +}
> +
> +static inline int rte_atomic32_dec_and_test(rte_atomic32_t *v)
> +{
> +	return (__atomic_sub_fetch(&v->cnt, 1, __ATOMIC_ACQUIRE) == 0);
> +}
> +
> +/*------------------------- 64 bit atomic operations -------------------------*/
> +
> +static inline int
> +rte_atomic64_cmpset(volatile uint64_t *dst, uint64_t exp, uint64_t src)
> +{
> +	return __atomic_compare_exchange(dst, &exp, &src, 0, __ATOMIC_ACQUIRE,
> +		__ATOMIC_ACQUIRE) ? 1 : 0;
> +}
> +
> +static inline void
> +rte_atomic64_init(rte_atomic64_t *v)
> +{
> +	int success = 0;
> +	uint64_t tmp;
> +
> +	while (success == 0) {
> +		tmp = v->cnt;
> +		success = rte_atomic64_cmpset(
> +				(volatile uint64_t *)&v->cnt, tmp, 0);
> +	}
> +}
> +
> +static inline int64_t
> +rte_atomic64_read(rte_atomic64_t *v)
> +{
> +	int success = 0;
> +	uint64_t tmp;
> +
> +	while (success == 0) {
> +		tmp = v->cnt;
> +		/* replace the value by itself */
> +		success = rte_atomic64_cmpset(
> +				(volatile uint64_t *) &v->cnt, tmp, tmp);
> +	}
> +	return tmp;
> +}

This will be overkill for arm64. The generic implementation has an
__LP64__-based check for 64-bit platforms.
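For reference, a sketch of the generic approach being referred to
(hypothetical type and function names; the point is the __LP64__ branch,
where an aligned 64-bit load is single-copy atomic and no cmpset loop is
needed):

```c
#include <stdint.h>

typedef struct { volatile int64_t cnt; } atomic64_sketch_t;

/* Sketch: on LP64 targets, read the counter with a plain load; only
 * 32-bit targets need the compare-and-swap loop. */
static inline int64_t
atomic64_read_sketch(atomic64_sketch_t *v)
{
#ifdef __LP64__
	return v->cnt;	/* native, single-copy-atomic 64-bit load */
#else
	uint64_t tmp;
	int success = 0;

	while (success == 0) {
		tmp = (uint64_t)v->cnt;
		/* replace the value by itself */
		success = __atomic_compare_exchange_n(
			(volatile uint64_t *)&v->cnt, &tmp, tmp, 0,
			__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	}
	return (int64_t)tmp;
#endif
}
```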


> +
> +static inline void
> +rte_atomic64_set(rte_atomic64_t *v, int64_t new_value)
> +{
> +	int success = 0;
> +	uint64_t tmp;
> +
> +	while (success == 0) {
> +		tmp = v->cnt;
> +		success = rte_atomic64_cmpset(
> +				(volatile uint64_t *)&v->cnt, tmp, new_value);
> +	}
> +}
> +
> +static inline void
> +rte_atomic64_add(rte_atomic64_t *v, int64_t inc)
> +{
> +	__atomic_fetch_add(&v->cnt, inc, __ATOMIC_ACQUIRE);
> +}
> +
> +static inline void
> +rte_atomic64_sub(rte_atomic64_t *v, int64_t dec)
> +{
> +	__atomic_fetch_sub(&v->cnt, dec, __ATOMIC_ACQUIRE);
> +}
> +

__atomic_fetch_* operations on 64-bit values work (lock-free) only when the
compiler supports it (__GCC_ATOMIC_LLONG_LOCK_FREE >= 2).

If the DPDK APIs expect a full barrier, not the C11 memory model based
__ATOMIC_ACQUIRE, then it is better to use the generic implementation.
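A sketch of such a guard (hypothetical function name; a value of 2 for
__GCC_ATOMIC_LLONG_LOCK_FREE means the compiler guarantees always-lock-free
64-bit atomics on the target):

```c
#include <stdint.h>

/* Sketch: use the __atomic builtin only when the compiler reports
 * lock-free 64-bit atomics; otherwise fall back to the older
 * full-barrier __sync builtin. */
static inline int64_t
add64_guarded(volatile int64_t *cnt, int64_t inc)
{
#if defined(__GCC_ATOMIC_LLONG_LOCK_FREE) && __GCC_ATOMIC_LLONG_LOCK_FREE >= 2
	return __atomic_add_fetch(cnt, inc, __ATOMIC_SEQ_CST);
#else
	return __sync_add_and_fetch(cnt, inc);
#endif
}
```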

> +static inline void
> +rte_atomic64_inc(rte_atomic64_t *v)
> +{
> +	__atomic_fetch_add(&v->cnt, 1, __ATOMIC_ACQUIRE);
> +}
> +
> +static inline void
> +rte_atomic64_dec(rte_atomic64_t *v)
> +{
> +	__atomic_fetch_sub(&v->cnt, 1, __ATOMIC_ACQUIRE);
> +}
> +
> +static inline int64_t
> +rte_atomic64_add_return(rte_atomic64_t *v, int64_t inc)
> +{
> +	return __atomic_add_fetch(&v->cnt, inc, __ATOMIC_ACQUIRE);
> +}
> +
> +static inline int64_t
> +rte_atomic64_sub_return(rte_atomic64_t *v, int64_t dec)
> +{
> +	return __atomic_sub_fetch(&v->cnt, dec, __ATOMIC_ACQUIRE);
> +}
> +
> +static inline int rte_atomic64_inc_and_test(rte_atomic64_t *v)
> +{
> +	return (__atomic_add_fetch(&v->cnt, 1, __ATOMIC_ACQUIRE) == 0);
> +}
> +
> +static inline int rte_atomic64_dec_and_test(rte_atomic64_t *v)
> +{
> +	return (__atomic_sub_fetch(&v->cnt, 1, __ATOMIC_ACQUIRE) == 0);
> +}
> +
> +static inline int rte_atomic64_test_and_set(rte_atomic64_t *v)
> +{
> +	return rte_atomic64_cmpset((volatile uint64_t *)&v->cnt, 0, 1);
> +}
> +
> +/**
> + * Atomically set a 64-bit counter to 0.
> + *
> + * @param v
> + *   A pointer to the atomic counter.
> + */
> +static inline void rte_atomic64_clear(rte_atomic64_t *v)
> +{
> +	rte_atomic64_set(v, 0);
> +}
> +#endif
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_ATOMIC_ARM_H_ */
> --
> 2.6.1
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [dpdk-dev] [PATCH v5 01/15] eal/arm: atomic operations for ARM
  2015-11-02  5:53       ` Jerin Jacob
@ 2015-11-02 13:00         ` Jan Viktorin
  2015-11-02 13:10         ` Jan Viktorin
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Viktorin @ 2015-11-02 13:00 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: Vlastimil Kosar, dev

On Mon, 2 Nov 2015 11:23:05 +0530
Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:

--snip--
> > +/*------------------------- 16 bit atomic operations -------------------------*/
> > +
> > +#ifndef RTE_FORCE_INTRINSICS
> > +static inline int
> > +rte_atomic16_cmpset(volatile uint16_t *dst, uint16_t exp, uint16_t src)
> > +{
> > +	return __atomic_compare_exchange(dst, &exp, &src, 0, __ATOMIC_ACQUIRE,
> > +		__ATOMIC_ACQUIRE) ? 1 : 0;
> > +}  
> 
> IMO, it should be __ATOMIC_SEQ_CST instead of __ATOMIC_ACQUIRE.
> __ATOMIC_ACQUIRE works in conjunction with __ATOMIC_RELEASE.
> AFAIK, the DPDK atomic API expects a full barrier; the C11 memory model is
> not yet used.

Seems to be reasonable, thanks.

> So why can't we use the RTE_FORCE_INTRINSICS-based generic
> implementation? The same holds true for the spinlock implementation too
> (i.e. using RTE_FORCE_INTRINSICS). Am I missing something here?

True. This was done with the intention of rewriting it later as
platform-specific assembly, but that has never been done... If you mean to
set RTE_FORCE_INTRINSICS=y in the defconfig and remove this code entirely
(at least for ARMv7), I would agree.
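In concrete terms, the suggestion amounts to a one-line defconfig change
(the exact defconfig filename is assumed here from the ARMv7 patchset):

```
# e.g. in config/defconfig_arm-armv7a-linuxapp-gcc:
CONFIG_RTE_FORCE_INTRINSICS=y
```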

> 
> 
> 
> > +
> > +static inline int rte_atomic16_test_and_set(rte_atomic16_t *v)
> > +{
> > +	return rte_atomic16_cmpset((volatile uint16_t *)&v->cnt, 0, 1);
> > +}
--snip--

-- 
   Jan Viktorin                  E-mail: Viktorin@RehiveTech.com
   System Architect              Web:    www.RehiveTech.com
   RehiveTech
   Brno, Czech Republic

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [dpdk-dev] [PATCH v5 01/15] eal/arm: atomic operations for ARM
  2015-11-02  5:53       ` Jerin Jacob
  2015-11-02 13:00         ` Jan Viktorin
@ 2015-11-02 13:10         ` Jan Viktorin
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Viktorin @ 2015-11-02 13:10 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: Vlastimil Kosar, dev

On Mon, 2 Nov 2015 11:23:05 +0530
Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:

--snip--
> > +#ifndef _RTE_ATOMIC_ARM_H_
> > +#define _RTE_ATOMIC_ARM_H_
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include "generic/rte_atomic.h"
> > +
> > +/**
> > + * General memory barrier.
> > + *
> > + * Guarantees that the LOAD and STORE operations generated before the
> > + * barrier occur before the LOAD and STORE operations generated after.
> > + */
> > +#define	rte_mb()  __sync_synchronize()
> > +
> > +/**
> > + * Write memory barrier.
> > + *
> > + * Guarantees that the STORE operations generated before the barrier
> > + * occur before the STORE operations generated after.
> > + */
> > +#define	rte_wmb() do { asm volatile ("dmb st" : : : "memory"); } while (0)
> > +
> > +/**
> > + * Read memory barrier.
> > + *
> > + * Guarantees that the LOAD operations generated before the barrier
> > + * occur before the LOAD operations generated after.
> > + */
> > +#define	rte_rmb() __sync_synchronize()
> > +  
> 
> #define dmb(opt)        asm volatile("dmb " #opt : : : "memory")
> 
> static inline void rte_mb(void)
> {
>         dmb(ish);
> }
> 
> static inline void rte_wmb(void)
> {
>         dmb(ishst);
> }
> 
> static inline void rte_rmb(void)
> {
>         dmb(ishld);

I cannot see this option in the doc for ARMv7
(http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0588b/CIHGHHIE.html).

> }
> 
> For armv8, it makes sense to have the above definitions for rte_*mb().

If it is OK to restrict the barriers to the inner shareable domain then OK.
Quite frankly, I don't know.

> If that doesn't make sense for armv7 then we need to split this file into rte_atomic_32/64.h.
> 
> 


-- 
   Jan Viktorin                  E-mail: Viktorin@RehiveTech.com
   System Architect              Web:    www.RehiveTech.com
   RehiveTech
   Brno, Czech Republic

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2015-11-02 13:12 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-29 17:29 [dpdk-dev] [PATCH 0/5] ARMv8 additions to ARMv7 support David Hunt
2015-10-29 17:29 ` [dpdk-dev] [PATCH 1/5] eal: split arm rte_memcpy.h into 32-bit and 64-bit versions David Hunt
2015-10-29 17:29 ` [dpdk-dev] [PATCH 2/5] eal: split arm rte_prefetch.h " David Hunt
2015-10-29 17:29 ` [dpdk-dev] [PATCH 3/5] eal: fix compilation for armv8 64-bit David Hunt
2015-10-29 17:38   ` Jan Viktorin
2015-10-29 17:29 ` [dpdk-dev] [PATCH 4/5] mk: add support for armv8 on top of armv7 David Hunt
2015-10-29 17:39   ` Jan Viktorin
2015-10-29 17:42   ` Jan Viktorin
2015-10-29 17:29 ` [dpdk-dev] [PATCH 5/5] test: add checks for cpu flags on armv8 David Hunt
2015-10-29 18:27 ` [dpdk-dev] [PATCH 0/5] ARMv8 additions to ARMv7 support Thomas Monjalon
2015-10-30  0:25   ` [dpdk-dev] [PATCH v5 00/15] Support ARMv7 architecture Jan Viktorin
2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 01/15] eal/arm: atomic operations for ARM Jan Viktorin
2015-11-02  5:53       ` Jerin Jacob
2015-11-02 13:00         ` Jan Viktorin
2015-11-02 13:10         ` Jan Viktorin
2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 02/15] eal/arm: byte order " Jan Viktorin
2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 03/15] eal/arm: cpu cycle " Jan Viktorin
2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 04/15] eal/arm: implement rdtsc by PMU or clock_gettime Jan Viktorin
2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 05/15] eal/arm: prefetch operations for ARM Jan Viktorin
2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 06/15] eal/arm: spinlock operations for ARM (without HTM) Jan Viktorin
2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 07/15] eal/arm: vector memcpy for ARM Jan Viktorin
2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 08/15] eal/arm: use vector memcpy only when NEON is enabled Jan Viktorin
2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 09/15] eal/arm: cpu flag checks for ARM Jan Viktorin
2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 10/15] eal/arm: detect arm architecture in cpu flags Jan Viktorin
2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 11/15] eal/arm: rwlock support for ARM Jan Viktorin
2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 12/15] eal/arm: add very incomplete rte_vect Jan Viktorin
2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 13/15] gcc/arm: avoid alignment errors to break build Jan Viktorin
2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 14/15] mk: Introduce ARMv7 architecture Jan Viktorin
2015-10-30  0:25     ` [dpdk-dev] [PATCH v5 15/15] maintainers: claim responsibility for ARMv7 Jan Viktorin
2015-10-30  0:17 ` [dpdk-dev] [PATCH 0/5] ARMv8 additions to ARMv7 support Jan Viktorin
2015-10-30  8:52   ` Hunt, David
2015-10-30 10:48     ` Jan Viktorin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).