From: Jan Viktorin
To: dev@dpdk.org
Cc: Jan Viktorin
Date: Sat, 3 Oct 2015 10:58:06 +0200
Subject: [dpdk-dev] [PATCH v1 00/12] Support for ARM(v7)

Dear DPDK community,

I am proposing a patch series that adds support for the ARMv7
architecture to DPDK.

The patch series does not introduce any PMD. It is possible to compile
it, boot it and test it with a virtual PMD (e.g. pcap). It is rebased
on top of v2.1.0.

All but the last two patches (11, 12) are quite straightforward and
mostly based on the ppc_64 architecture.

Notes:

* we test on Cortex-A9 (mostly Xilinx Zynq at the moment)
* atomic operations and spinlocks are implemented by (GCC) intrinsics
* the CPU cycle counter is implemented by clock_gettime because there
  is no standard 64-bit counter available
* we have to set -Wno-error to pass the build process because quite
  a lot of alignment problems are reported (we have not found any real
  issues so far)

The last two patches (11, 12) are not to be merged into mainline. They
are just a temporary workaround for the two libraries (ACL, LPM) which
rely heavily on SSE. It is not possible to easily convert the SSE calls
to NEON SIMD operations.

============

It is important to note that the current Linux kernel does not contain
support for huge pages on non-LPAE ARM architectures (Cortex-A9).
There is a patch available on the Internet but it is not going to be
merged for now (4/2014):

   http://thread.gmane.org/gmane.linux.kernel.mm/115788

We ported this patch to 3.18 and it can improve the performance. Here
are the results of our tests of several algorithms, showing the
execution time reduction:

   CPU median 3x3          -  0.2 %
   NEON median 3x3         - 19.5 %
   Random read             -  0.0 %
   Random write            -  6.2 %
   Matrix multiplication   - 31.0 %
   NEON copy               -  4.2 %

============

We are working on the PMD + kernel-support part. At the moment, we have
a working PMD for Xilinx Zynq's EMAC. However, it uses some dirty
features. We have to rethink it a bit before going to the mainline.

We are facing some problems during the implementation (some are already
being solved on the mailing list):

* rte_eth_dev is defined as a PCI device. As ARMs are SoCs with an
  EMAC integrated on the chip and an external PHY, we need a different
  approach. There can be an ARM computer with PCI-E, but then you plug
  in a network card and use a different kind of driver (this is not
  very common at the moment).

* ARM does not have coherent memory for DMA transfers. It is possible
  to allocate non-cacheable memory (DMA transfers can be as fast as
  possible) but it slows down the payload processing on the CPU. For
  this purpose, we have to call dma_map/unmap_* in the kernel. A custom
  kernel driver is needed and it should not be UIO because UIO is quite
  limited (almost non-extendable mmap, no support for custom ioctl and
  write).

* We are not going to put the PHY layer into userspace, so it will stay
  in the kernel. There is also a need for CLK control (clock gating)
  in the PMD.
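
To illustrate two of the notes above (cycle counting through
clock_gettime and spinlocks built on GCC intrinsics), here is a minimal
sketch. It is not the code from the patches: the sketch_* names are
hypothetical, CLOCK_MONOTONIC and the nanosecond resolution are
assumptions, and the lock uses the generic GCC __sync builtins.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical cycle counter: with no standard 64-bit hardware counter
 * to read, derive a 64-bit monotonic value from clock_gettime().
 * One "cycle" here is one nanosecond (an assumption of this sketch).
 */
static inline uint64_t
sketch_get_cycles(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Hypothetical spinlock type: 0 = unlocked, 1 = locked. */
typedef struct {
	volatile int locked;
} sketch_spinlock_t;

/* Spin until the atomic exchange observes the previous value 0. */
static inline void
sketch_spinlock_lock(sketch_spinlock_t *sl)
{
	while (__sync_lock_test_and_set(&sl->locked, 1))
		while (sl->locked)
			; /* wait with plain reads before retrying the exchange */
}

/* Return 1 if the lock was taken, 0 if it was already held. */
static inline int
sketch_spinlock_trylock(sketch_spinlock_t *sl)
{
	return __sync_lock_test_and_set(&sl->locked, 1) == 0;
}

/* Release the lock (writes 0 with release semantics). */
static inline void
sketch_spinlock_unlock(sketch_spinlock_t *sl)
{
	__sync_lock_release(&sl->locked);
}
```

A caller would initialise the lock to { 0 }, wrap the critical section
in sketch_spinlock_lock()/sketch_spinlock_unlock(), and take the
difference of two sketch_get_cycles() readings to time it. A syscall
based counter is much more expensive than reading a hardware register,
so it is only suitable for coarse measurements.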

Regards
Jan Viktorin

Jan Viktorin (2):
  eal/arm: rwlock support for ARM
  gcc/arm: avoid alignment errors to break build

Vlastimil Kosar (10):
  mk: Introduce ARMv7 architecture
  eal/arm: atomic operations for ARM
  eal/arm: byte order operations for ARM
  eal/arm: cpu cycle operations for ARM
  eal/arm: prefetch operations for ARM
  eal/arm: spinlock operations for ARM (without HTM)
  eal/arm: vector memcpy for ARM
  eal/arm: cpu flag checks for ARM
  lpm/arm: implement rte_lpm_lookupx4 using rte_lpm_lookup_bulk on for-x86
  arm: Disable usage of SSE optimized code in librte_acl

 app/test/test_cpuflags.c                    |   5 +
 config/defconfig_arm-armv7-a-linuxapp-gcc   |  72 ++++++
 lib/librte_acl/acl.h                        |   2 +
 lib/librte_acl/rte_acl.c                    |   8 +-
 lib/librte_acl/rte_acl_osdep.h              |   2 +
 .../common/include/arch/arm/rte_atomic.h    | 257 ++++++++++++++++++++
 .../common/include/arch/arm/rte_byteorder.h | 148 +++++++++++
 .../common/include/arch/arm/rte_cpuflags.h  | 169 +++++++++++++
 .../common/include/arch/arm/rte_cycles.h    |  85 +++++++
 .../common/include/arch/arm/rte_memcpy.h    | 270 +++++++++++++++++++++
 .../common/include/arch/arm/rte_prefetch.h  |  61 +++++
 .../common/include/arch/arm/rte_rwlock.h    |  40 +++
 .../common/include/arch/arm/rte_spinlock.h  | 114 +++++++++
 lib/librte_lpm/rte_lpm.h                    |  71 ++++++
 mk/arch/arm/rte.vars.mk                     |  39 +++
 mk/machine/armv7-a/rte.vars.mk              |  60 +++++
 mk/rte.cpuflags.mk                          |   6 +
 mk/toolchain/gcc/rte.vars.mk                |   6 +
 18 files changed, 1414 insertions(+), 1 deletion(-)
 create mode 100644 config/defconfig_arm-armv7-a-linuxapp-gcc
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_atomic.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_byteorder.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_cpuflags.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_cycles.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_memcpy.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_prefetch.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_rwlock.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_spinlock.h
 create mode 100644 mk/arch/arm/rte.vars.mk
 create mode 100644 mk/machine/armv7-a/rte.vars.mk

--
2.5.2