DPDK patches and discussions
* [dpdk-dev] [PATCH 0/2] librte_net: add crc computation support
@ 2017-02-24 20:54 Jasvinder Singh
  2017-02-24 20:54 ` [dpdk-dev] [PATCH 1/2] librte_net: add crc init and compute APIs Jasvinder Singh
  2017-02-24 20:54 ` [dpdk-dev] [PATCH " Jasvinder Singh
  0 siblings, 2 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-02-24 20:54 UTC
  To: dev; +Cc: declan.doherty

In some applications, a CRC (Cyclic Redundancy Check) needs to be computed or
updated during packet processing. This patchset adds software
implementations of two common standard CRCs: the 32-bit Ethernet CRC (as per
ISO/IEC 8802-3) and the 16-bit CCITT CRC (as per ITU-T X.25). Two
versions of each of the 32-bit and 16-bit CRC calculations are proposed.
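As a point of reference, both CRCs named above can be computed bit by bit in a few lines of portable C. This sketch is not taken from the patch; it assumes the usual reflected parameter sets (CRC-16/X-25: polynomial 0x1021 processed LSB-first as 0x8408, initial value and final XOR 0xFFFF; Ethernet CRC-32: 0x04C11DB7 processed as 0xEDB88320, initial value and final XOR 0xFFFFFFFF), which match the initial values and final inversion used by the handlers in patch 1/2. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/X-25: CCITT polynomial 0x1021, processed LSB-first (reflected
 * constant 0x8408), initial value 0xFFFF, final XOR 0xFFFF. */
static uint16_t crc16_x25_bitwise(const uint8_t *data, size_t len)
{
	uint16_t crc = 0xFFFF;

	while (len--) {
		crc ^= *data++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0x8408)
					: (uint16_t)(crc >> 1);
	}
	return (uint16_t)~crc;
}

/* Ethernet CRC-32: polynomial 0x04C11DB7, processed LSB-first (reflected
 * constant 0xEDB88320), initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF. */
static uint32_t crc32_eth_bitwise(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *data++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
	}
	return ~crc;
}
```

For the ASCII string "123456789" these return the standard check values 0x906E and 0xCBF43926, which makes them convenient oracles when validating faster implementations.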

The first version provides fast and efficient CRC generation on IA
processors by using the carry-less multiplication instruction PCLMULQDQ
(a CPU feature separate from, and used alongside, SSE4.2). It uses a folding
approach to first reduce an arbitrary-length buffer to a single fixed-size
16-byte block with the help of precomputed constants. This 16-byte block is
then reduced with the Barrett reduction method to produce the final CRC
value. For more details on the implementation, see reference [1].
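The two primitives behind this approach can be modelled in portable C. This is an illustration under stated assumptions, not the patch's code: `clmul64()` is a hypothetical software model of what one 64x64-bit PCLMULQDQ computes (polynomial multiplication over GF(2), where partial products are combined with XOR instead of addition), and `xpow_mod()` is a hypothetical helper computing x^n mod P(x), the kind of precomputed constant the folding and reduction steps rely on, using the same shift-and-XOR loop as `get_poly_constant()` in patch 1/2.

```c
#include <stdint.h>

/* Software model of a 64x64 -> 128-bit carry-less multiply: partial
 * products are XORed together (polynomial multiplication over GF(2)). */
static void clmul64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
	uint64_t h = 0, l = 0;

	for (int i = 0; i < 64; i++) {
		if ((b >> i) & 1) {
			l ^= a << i;
			if (i != 0)	/* avoid the undefined shift a >> 64 */
				h ^= a >> (64 - i);
		}
	}
	*hi = h;
	*lo = l;
}

/* x^n mod P(x) for a 32-bit CRC polynomial given without its x^32 term,
 * valid for n >= 32: start from x^32 mod P == poly, multiply by x n-32
 * times, folding the overflowing x^32 term back in as poly. */
static uint32_t xpow_mod(uint32_t poly, uint32_t n)
{
	uint32_t res = poly;

	for (uint32_t i = 32; i < n; i++)
		res = (res & 0x80000000u) ? (res << 1) ^ poly : res << 1;
	return res;
}
```

For example, (x + 1)(x + 1) = x^2 + 1 over GF(2), so `clmul64(3, 3, ...)` yields 5 with a zero high half, and `xpow_mod(0x04C11DB7, 33)` yields 0x09823B6E because the top bit of 0x04C11DB7 is clear.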

The second version is a fallback that generates the CRC without needing any
specific CPU support (for example, SSE4.2 or PCLMULQDQ). It is based on the
generic Look-Up Table (LUT) algorithm, which uses a precomputed 256-entry
table, as explained in reference [2].
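A minimal sketch of the table-driven idea, assuming the reflected Ethernet CRC-32 parameters. Unlike the patch, which builds its table by reflecting the normal polynomial 0x04C11DB7 around an MSB-first loop, this version uses the pre-reflected constant 0xEDB88320 directly; both constructions should yield the same 256-entry table. The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc32_lut[256];

/* Entry i is the register update for byte value i: the eight
 * shift/conditional-XOR steps of the bitwise algorithm, done once. */
static void crc32_lut_init(void)
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t crc = i;

		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
		crc32_lut[i] = crc;
	}
}

/* One table lookup per input byte instead of eight bit steps. */
static uint32_t crc32_lut_calc(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (len--)
		crc = crc32_lut[(crc ^ *data++) & 0xFF] ^ (crc >> 8);
	return ~crc;
}
```

The trade-off is 1 KiB of table per polynomial in exchange for processing a byte per iteration; the per-byte update has the same shape as the patch's `crc32_eth_calc_lut()`.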

The following APIs have been added:

(i)  rte_net_crc_init()
(ii) rte_net_crc_calc()

The first API (i) takes the compute mode (scalar or SSE4.2) and initialises
the data structures required for CRC computation. It should be called only
once in the application, before the second API (ii) is used for 16-bit and
32-bit CRC calculations.
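The init-then-compute contract can be illustrated with a self-contained sketch of the dispatch-table pattern used in patch 1/2, where `rte_net_crc_init()` selects a table of per-type handlers and `rte_net_crc_calc()` indexes it. Everything here is hypothetical: `crc_init()`, `crc_calc()`, the enum and the scalar handlers stand in for the DPDK names and for the mbuf-based parameters. The sketch also adds the guards a caller would want, so a call made before initialisation, or with an out-of-range type, returns -1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the patch's CRC type enum and handler type. */
enum crc_type { CRC16_CCITT = 0, CRC32_ETH, CRC_TYPES };
typedef uint32_t (*crc_handler)(const uint8_t *data, size_t len);

/* Bitwise scalar handlers (CRC-16/X-25 and Ethernet CRC-32). */
static uint32_t crc16_scalar(const uint8_t *d, size_t n)
{
	uint16_t crc = 0xFFFF;

	while (n--) {
		crc ^= *d++;
		for (int b = 0; b < 8; b++)
			crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0x8408)
					: (uint16_t)(crc >> 1);
	}
	return (uint16_t)~crc;
}

static uint32_t crc32_scalar(const uint8_t *d, size_t n)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (n--) {
		crc ^= *d++;
		for (int b = 0; b < 8; b++)
			crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
	}
	return ~crc;
}

/* One handler per CRC type; a vectorised table would be chosen the same way. */
static const crc_handler scalar_handlers[CRC_TYPES] = {
	[CRC16_CCITT] = crc16_scalar,
	[CRC32_ETH] = crc32_scalar,
};

static const crc_handler *handlers;	/* NULL until crc_init() runs */

/* Select the handler table once, before any crc_calc() call. */
static int crc_init(void)
{
	handlers = scalar_handlers;
	return 0;
}

/* Dispatch by type; int64_t keeps -1 distinct from a CRC of 0xFFFFFFFF. */
static int64_t crc_calc(enum crc_type type, const uint8_t *data, size_t len)
{
	if (handlers == NULL || (unsigned int)type >= CRC_TYPES)
		return -1;
	return handlers[type](data, len);
}
```

The wider return type is a design choice worth noting: with a plain `int` return, as in `rte_net_crc_calc()`, the error value -1 and a genuine 32-bit CRC of 0xFFFFFFFF cannot be told apart.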

References:
[1] Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
[2] A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS
http://www.ross.net/crc/download/crc_v3.txt

Jasvinder Singh (2):
  librte_net: add crc init and compute APIs
  app/test: add unit test for CRC computation

 app/test/Makefile                  |   2 +
 app/test/test_crc.c                | 229 +++++++++++++
 lib/librte_net/Makefile            |   2 +
 lib/librte_net/rte_net_crc.c       | 657 +++++++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_crc.h       | 101 ++++++
 lib/librte_net/rte_net_version.map |   8 +
 6 files changed, 999 insertions(+)
 create mode 100644 app/test/test_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h

-- 
2.5.5


* [dpdk-dev] [PATCH 1/2] librte_net: add crc init and compute APIs
  2017-02-24 20:54 [dpdk-dev] [PATCH 0/2] librte_net: add crc computation support Jasvinder Singh
@ 2017-02-24 20:54 ` Jasvinder Singh
  2017-02-28 12:08   ` [dpdk-dev] [PATCH v2 0/2] librte_net: add crc computation support Jasvinder Singh
  2017-02-24 20:54 ` [dpdk-dev] [PATCH " Jasvinder Singh
  1 sibling, 1 reply; 69+ messages in thread
From: Jasvinder Singh @ 2017-02-24 20:54 UTC
  To: dev; +Cc: declan.doherty

APIs for initialising and computing CRCs (16-bit and 32-bit) are
added. For the CRC calculation, both a scalar version and an x86 intrinsic
(SSE4.2 and PCLMULQDQ) version are implemented.

The scalar version is based on the generic Look-Up Table (LUT) algorithm, while
the x86 intrinsic version uses the carry-less multiplication method for fast CRC
computation.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
---
 lib/librte_net/Makefile            |   2 +
 lib/librte_net/rte_net_crc.c       | 657 +++++++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_crc.h       | 101 ++++++
 lib/librte_net/rte_net_version.map |   8 +
 4 files changed, 768 insertions(+)
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h

diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
index 20cf664..41be751 100644
--- a/lib/librte_net/Makefile
+++ b/lib/librte_net/Makefile
@@ -39,11 +39,13 @@ EXPORT_MAP := rte_net_version.map
 LIBABIVER := 1
 
 SRCS-$(CONFIG_RTE_LIBRTE_NET) := rte_net.c
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += rte_net_crc.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc.h
 
 DEPDIRS-$(CONFIG_RTE_LIBRTE_NET) += lib/librte_eal lib/librte_mbuf
 
diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
new file mode 100644
index 0000000..592431c
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.c
@@ -0,0 +1,657 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_net_crc.h>
+#include <stddef.h>
+
+/* Macros for printing using RTE_LOG */
+#define RTE_LOGTYPE_CRC RTE_LOGTYPE_USER1
+
+/** CRC polynomials */
+#define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
+#define CRC16_CCITT_POLYNOMIAL 0x1021U
+
+typedef int (*rte_net_crc_handler)(struct rte_net_crc_params *);
+
+static int rte_crc16_ccitt_handler(struct rte_net_crc_params *p);
+static int rte_crc32_eth_handler(struct rte_net_crc_params *p);
+static int rte_crc_invalid_handler(struct rte_net_crc_params *p);
+
+static rte_net_crc_handler *handlers;
+
+static rte_net_crc_handler handlers_scalar[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
+	[RTE_NET_CRC_REQS] = rte_crc_invalid_handler,
+};
+
+int
+rte_crc_invalid_handler(__rte_unused struct rte_net_crc_params *p)
+{
+	RTE_LOG(ERR, CRC, "CRC type not supported!\n");
+	return -1;	/* Error */
+}
+
+#if defined(RTE_ARCH_X86_64) && defined(RTE_MACHINE_CPUFLAG_SSE4_2)
+
+#include <cpuid.h>
+
+/** PCLMULQDQ CRC computation context structure */
+struct crc_pclmulqdq_ctx {
+	__m128i rk1_rk2;
+	__m128i rk5_rk6;
+	__m128i rk7_rk8;
+};
+
+static struct crc_pclmulqdq_ctx crc32_eth_pclmulqdq __rte_aligned(16);
+static struct crc_pclmulqdq_ctx crc16_ccitt_pclmulqdq __rte_aligned(16);
+/**
+ * @brief Performs one folding round
+ *
+ * Logically function operates as follows:
+ *     DATA = READ_NEXT_16BYTES();
+ *     F1 = LSB8(FOLD)
+ *     F2 = MSB8(FOLD)
+ *     T1 = CLMUL(F1, RK1)
+ *     T2 = CLMUL(F2, RK2)
+ *     FOLD = XOR(T1, T2, DATA)
+ *
+ * @param data_block 16 byte data block
+ * @param precomp precomputed rk1 and rk2 constants
+ * @param fold running 16 byte folded data
+ *
+ * @return New 16 byte folded data
+ */
+static inline __attribute__((always_inline)) __m128i
+crcr32_folding_round(const __m128i data_block,
+		const __m128i precomp,
+		const __m128i fold)
+{
+	__m128i tmp0 = _mm_clmulepi64_si128(fold, precomp, 0x01);
+	__m128i tmp1 = _mm_clmulepi64_si128(fold, precomp, 0x10);
+
+	return _mm_xor_si128(tmp1, _mm_xor_si128(data_block, tmp0));
+}
+
+/**
+ * @brief Performs reduction from 128 bits to 64 bits
+ *
+ * @param data128 128 bits data to be reduced
+ * @param precomp rk5 and rk6 precomputed constants
+ *
+ * @return data reduced to 64 bits
+ */
+
+static inline __attribute__((always_inline)) __m128i
+crcr32_reduce_128_to_64(__m128i data128,
+	const __m128i precomp)
+{
+	__m128i tmp0, tmp1, tmp2;
+
+	/* 64b fold */
+	tmp0 = _mm_clmulepi64_si128(data128, precomp, 0x00);
+	tmp1 = _mm_srli_si128(data128, 8);
+	tmp0 = _mm_xor_si128(tmp0, tmp1);
+
+	/* 32b fold */
+	tmp2 = _mm_slli_si128(tmp0, 4);
+	tmp1 = _mm_clmulepi64_si128(tmp2, precomp, 0x10);
+
+	return _mm_xor_si128(tmp1, tmp0);
+}
+
+/**
+ * @brief Performs Barrett reduction from 64 bits to 32 bits
+ *
+ * @param data64 64 bits data to be reduced
+ * @param precomp rk7 precomputed constant
+ *
+ * @return data reduced to 32 bits
+ */
+
+static inline __attribute__((always_inline)) uint32_t
+crcr32_reduce_64_to_32(__m128i data64,
+	const __m128i precomp)
+{
+	static const uint32_t mask1[4] __rte_aligned(16) = {
+		0xffffffff, 0xffffffff, 0x00000000, 0x00000000
+	};
+
+	static const uint32_t mask2[4] __rte_aligned(16) = {
+		0x00000000, 0xffffffff, 0xffffffff, 0xffffffff
+	};
+	__m128i tmp0, tmp1, tmp2;
+
+	tmp0 = _mm_and_si128(data64, _mm_load_si128((const __m128i *)mask2));
+
+	tmp1 = _mm_clmulepi64_si128(tmp0, precomp, 0x00);
+	tmp1 = _mm_xor_si128(tmp1, tmp0);
+	tmp1 = _mm_and_si128(tmp1, _mm_load_si128((const __m128i *)mask1));
+
+	tmp2 = _mm_clmulepi64_si128(tmp1, precomp, 0x10);
+	tmp2 = _mm_xor_si128(tmp2, tmp1);
+	tmp2 = _mm_xor_si128(tmp2, tmp0);
+
+	return _mm_extract_epi32(tmp2, 2);
+}
+
+/**
+ * @brief Computes constant for CLMUL algorithm
+ *
+ * Result is: X^exp mod poly
+ *
+ * @param poly polynomial
+ * @param exp exponent
+ *
+ * @return constant value
+ */
+
+static inline uint32_t
+get_poly_constant(const uint32_t poly, const uint32_t exp)
+{
+	uint32_t i, res = poly;
+
+	for (i = 32; i < exp; i++)
+		if (res & 0x80000000)
+			res = (res << 1) ^ poly;
+		else
+			res = (res << 1);
+
+	return res;
+}
+
+/**
+ * @brief Calculates quotient and remainder of X^64 / P(X)
+ *
+ * @param poly P(X)
+ * @param q_ptr place to store quotient
+ * @param r_ptr place to store remainder
+ */
+static inline void
+div_poly(const uint64_t poly,
+	uint64_t *q_ptr,
+	uint64_t *r_ptr)
+{
+	uint64_t p = 0, q = 0, r = 0;
+	int i;
+
+	p = poly | 0x100000000ULL;
+
+	r = p;
+	r = r << 32;
+
+	i = 32;
+	do {
+		uint64_t one_shl_n = 0;
+
+		q = q << 1;
+		if ((i + 32) < 64)
+			one_shl_n = 1ULL << (32 + i);
+
+		if (r & one_shl_n) {
+			r ^= (p << i);
+			q |= 1;
+		}
+		i--;
+	} while (i >= 0);
+
+	if (q_ptr != NULL)
+		*q_ptr = q;
+
+	if (r_ptr != NULL)
+		*r_ptr = r;
+}
+
+/**
+ * @brief Reflects selected group of bits in \a v
+ *
+ * @param v value to be reflected
+ * @param n size of the bit field to be reflected
+ *
+ * @return bit reflected value
+ */
+static uint64_t
+reflect(uint64_t v, const uint32_t n)
+{
+	uint32_t i;
+	uint64_t r = 0;
+
+	for (i = 0; i < n; i++) {
+		if (i != 0) {
+			r <<= 1;
+			v >>= 1;
+		}
+		r |= (v & 1);
+	}
+
+	return r;
+}
+
+static const uint8_t crc_xmm_shift_tab[48] __rte_aligned(16) = {
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+};
+
+/**
+ * @brief Shifts left 128 bit register by specified number of bytes
+ *
+ * @param reg 128 bit value
+ * @param num number of bytes to shift left reg by (0-16)
+ *
+ * @return reg << (num * 8)
+ */
+
+static inline __attribute__((always_inline)) __m128i
+xmm_shift_left(__m128i reg, const unsigned int num)
+{
+	const __m128i *p = (const __m128i *)(crc_xmm_shift_tab + 16 - num);
+
+	return _mm_shuffle_epi8(reg, _mm_loadu_si128(p));
+}
+
+/**
+ * @brief Initializes CRC computation context structure for given polynomial
+ *
+ * @param pctx PCLMULQDQ CRC computation context structure to be initialized
+ * @param poly CRC polynomial
+ */
+static inline __attribute__((always_inline)) int
+crc32_eth_init_pclmulqdq(
+	struct crc_pclmulqdq_ctx *pctx,
+	const uint64_t poly)
+{
+	uint64_t k1, k2, k5, k6;
+	uint64_t p = 0, q = 0;
+
+	if (pctx == NULL)
+		return -1;
+
+	k1 = get_poly_constant(poly, 128 - 32);
+	k2 = get_poly_constant(poly, 128 + 32);
+	k5 = get_poly_constant(poly, 96);
+	k6 = get_poly_constant(poly, 64);
+
+	div_poly(poly, &q, NULL);
+	q = q & 0xffffffff;			/** quotient X^64 / P(X) */
+	p = poly | 0x100000000ULL;	/** P(X) */
+
+	k1 = reflect(k1 << 32, 64) << 1;
+	k2 = reflect(k2 << 32, 64) << 1;
+	k5 = reflect(k5 << 32, 64) << 1;
+	k6 = reflect(k6 << 32, 64) << 1;
+	q = reflect(q, 33);
+	p = reflect(p, 33);
+
+	/** Save the params in context structure */
+	pctx->rk1_rk2 = _mm_setr_epi64(_m_from_int64(k1), _m_from_int64(k2));
+	pctx->rk5_rk6 = _mm_setr_epi64(_m_from_int64(k5), _m_from_int64(k6));
+	pctx->rk7_rk8 = _mm_setr_epi64(_m_from_int64(q), _m_from_int64(p));
+
+	return 0;
+}
+
+static inline __attribute__((always_inline)) uint32_t
+crc32_eth_calc_pclmulqdq(
+	const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const struct crc_pclmulqdq_ctx *params)
+{
+	__m128i temp, fold, k;
+	uint32_t n;
+
+	if (unlikely(data == NULL))
+		return crc;
+
+	if (unlikely(data_len == 0))
+		return crc;
+
+	if (unlikely(params == NULL))
+		return crc;
+
+	/* Get CRC init value */
+	temp = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);
+
+	/**
+	  * Folding all data into single 16 byte data block
+	  * Assumes: fold holds first 16 bytes of data
+	  */
+
+	if (unlikely(data_len < 32)) {
+		if (unlikely(data_len == 16)) {
+			/* 16 bytes */
+			fold = _mm_loadu_si128((const __m128i *)data);
+			fold = _mm_xor_si128(fold, temp);
+			goto reduction_128_64;
+		}
+
+		if (unlikely(data_len < 16)) {
+			/* 0 to 15 bytes */
+			uint8_t buffer[16] __rte_aligned(16);
+
+			memset(buffer, 0, sizeof(buffer));
+			memcpy(buffer, data, data_len);
+
+			fold = _mm_load_si128((const __m128i *)buffer);
+			fold = _mm_xor_si128(fold, temp);
+			if (unlikely(data_len < 4)) {
+				fold = xmm_shift_left(fold, 8 - data_len);
+				goto barret_reduction;
+			}
+			fold = xmm_shift_left(fold, 16 - data_len);
+			goto reduction_128_64;
+		}
+		/* 17 to 31 bytes */
+		fold = _mm_loadu_si128((const __m128i *)data);
+		fold = _mm_xor_si128(fold, temp);
+		n = 16;
+		k = params->rk1_rk2;
+		goto partial_bytes;
+	}
+
+	/** At least 32 bytes in the buffer */
+	/** Apply CRC initial value */
+	fold = _mm_loadu_si128((const __m128i *)data);
+	fold = _mm_xor_si128(fold, temp);
+
+	/** Main folding loop - the last 16 bytes is processed separately */
+	k = params->rk1_rk2;
+	for (n = 16; (n + 16) <= data_len; n += 16) {
+		temp = _mm_loadu_si128((const __m128i *)&data[n]);
+		fold = crcr32_folding_round(temp, k, fold);
+	}
+
+partial_bytes:
+	if (likely(n < data_len)) {
+
+		const uint32_t mask3[4] __rte_aligned(16) = {
+			0x80808080, 0x80808080, 0x80808080, 0x80808080
+		};
+
+		const uint8_t shf_table[32] __rte_aligned(16) = {
+			0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+			0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+			0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+		};
+
+		__m128i last16, a, b;
+
+		last16 = _mm_loadu_si128((const __m128i *)&data[data_len - 16]);
+
+		temp = _mm_loadu_si128((const __m128i *)
+			&shf_table[data_len & 15]);
+		a = _mm_shuffle_epi8(fold, temp);
+
+		temp = _mm_xor_si128(temp,
+			_mm_load_si128((const __m128i *)mask3));
+		b = _mm_shuffle_epi8(fold, temp);
+		b = _mm_blendv_epi8(b, last16, temp);
+
+		/* k = rk1 & rk2 */
+		temp = _mm_clmulepi64_si128(a, k, 0x01);
+		fold = _mm_clmulepi64_si128(a, k, 0x10);
+
+		fold = _mm_xor_si128(fold, temp);
+		fold = _mm_xor_si128(fold, b);
+	}
+
+	/** Reduction 128 -> 32 Assumes: fold holds 128bit folded data */
+reduction_128_64:
+	k = params->rk5_rk6;
+	fold = crcr32_reduce_128_to_64(fold, k);
+
+barret_reduction:
+	k = params->rk7_rk8;
+	n = crcr32_reduce_64_to_32(fold, k);
+
+	return n;
+}
+
+
+static int
+rte_net_crc_sse42_init(void)
+{
+	int status = 0;
+
+	/** Initialize CRC functions */
+	status = crc32_eth_init_pclmulqdq(&crc16_ccitt_pclmulqdq,
+		CRC16_CCITT_POLYNOMIAL << 16);
+	if (status == -1)
+		return -1;
+
+	status = crc32_eth_init_pclmulqdq(&crc32_eth_pclmulqdq,
+		CRC32_ETH_POLYNOMIAL);
+	if (status == -1)
+		return -1;
+
+	_mm_empty();
+
+	return 0;
+}
+
+static inline int
+rte_crc16_ccitt_sse42_handler(struct rte_net_crc_params *p)
+{
+	uint16_t ret;
+	const uint8_t *data =
+		rte_pktmbuf_mtod_offset(p->mbuf, uint8_t *, p->data_offset);
+
+	ret = (uint16_t)~crc32_eth_calc_pclmulqdq(data,
+		p->data_len,
+		0xffff,
+		&crc16_ccitt_pclmulqdq);
+
+	return ret;
+}
+
+static inline int
+rte_crc32_eth_sse42_handler(struct rte_net_crc_params *p)
+{
+	uint32_t ret;
+	const uint8_t *data =
+		rte_pktmbuf_mtod_offset(p->mbuf, uint8_t *, p->data_offset);
+
+	ret = ~crc32_eth_calc_pclmulqdq(data,
+		p->data_len,
+		0xffffffffUL,
+		&crc32_eth_pclmulqdq);
+
+	return ret;
+}
+
+static rte_net_crc_handler handlers_sse42[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler,
+	[RTE_NET_CRC_REQS] = rte_crc_invalid_handler,
+};
+
+#endif
+
+/** Local data */
+static uint32_t crc32_eth_lut[256];
+static uint32_t crc16_ccitt_lut[256];
+
+/**
+ * @brief Reflect the bits about the middle
+ *
+ * @param val value to be reflected
+ *
+ * @return reflected value
+ */
+static uint32_t
+reflect_32bits(const uint32_t val)
+{
+	uint32_t i, res = 0;
+
+	for (i = 0; i < 32; i++)
+		if ((val & (1U << i)) != 0)
+			res |= (uint32_t)(1U << (31 - i));
+
+	return res;
+}
+
+static int
+crc32_eth_init_lut(const uint32_t poly,
+	uint32_t *lut)
+{
+	uint_fast32_t i, j;
+
+	if (lut == NULL)
+		return -1;
+
+	for (i = 0; i < 256; i++) {
+		uint_fast32_t crc = reflect_32bits(i);
+
+		for (j = 0; j < 8; j++) {
+			if (crc & 0x80000000L)
+				crc = (crc << 1) ^ poly;
+			else
+				crc <<= 1;
+		}
+		lut[i] = reflect_32bits(crc);
+	}
+
+	return 0;
+}
+
+static inline __attribute__((always_inline)) uint32_t
+crc32_eth_calc_lut(const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const uint32_t *lut)
+{
+	if (unlikely(data == NULL || lut == NULL))
+		return crc;
+
+	while (data_len--)
+		crc = lut[(crc ^ *data++) & 0xffL] ^ (crc >> 8);
+
+	return crc;
+}
+
+static int
+rte_net_crc_scalar_init(void)
+{
+	int status = 0;
+
+	/** 32-bit crc init */
+	status = crc32_eth_init_lut(CRC32_ETH_POLYNOMIAL,
+		crc32_eth_lut);
+	if (status == -1)
+		return -1;
+
+	/** 16-bit CRC init */
+	status = crc32_eth_init_lut(CRC16_CCITT_POLYNOMIAL << 16,
+		crc16_ccitt_lut);
+	if (status == -1)
+		return -1;
+
+	return 0;
+}
+
+static inline int
+rte_crc16_ccitt_handler(struct rte_net_crc_params *p)
+{
+	uint16_t ret;
+	const uint8_t *data =
+		rte_pktmbuf_mtod_offset(p->mbuf, uint8_t *, p->data_offset);
+
+	ret = (uint16_t)~crc32_eth_calc_lut(data,
+		p->data_len,
+		0xffff,
+		crc16_ccitt_lut);
+
+	return ret;
+}
+
+static inline int
+rte_crc32_eth_handler(struct rte_net_crc_params *p)
+{
+	uint32_t ret;
+	const uint8_t *data =
+		rte_pktmbuf_mtod_offset(p->mbuf, uint8_t *, p->data_offset);
+
+	ret = ~crc32_eth_calc_lut(data,
+		p->data_len,
+		0xffffffffUL,
+		crc32_eth_lut);
+
+	return ret;
+}
+
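+int
+rte_net_crc_init(enum rte_net_crc_mode m)
+{
+	int status;
+
+	switch (m) {
+#if defined(RTE_ARCH_X86_64) && defined(RTE_MACHINE_CPUFLAG_SSE4_2)
+	case RTE_NET_CRC_SSE42:
+		/* the fast path needs PCLMULQDQ, which is not part of SSE4.2 */
+		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2) &&
+			rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ)) {
+			status = rte_net_crc_sse42_init();
+			if (status == 0)
+				handlers = handlers_sse42;
+		} else {
+			RTE_LOG(ERR, CRC, "%s: SSE4.2/PCLMULQDQ not supported\n",
+				__func__);
+			status = -1;
+		}
+		return status;
+#endif
+	case RTE_NET_CRC_SCALAR:
+	default:
+		status = rte_net_crc_scalar_init();
+		handlers = handlers_scalar;
+		return status;
+	}
+}
+
+int
+rte_net_crc_calc(struct rte_net_crc_params *p)
+{
+	rte_net_crc_handler f_handle;
+
+	/* reject calls made before init or with an unknown CRC type */
+	if (handlers == NULL || (unsigned int)p->type >= RTE_NET_CRC_REQS)
+		return rte_crc_invalid_handler(p);
+
+	f_handle = handlers[p->type];
+	return f_handle(p);
+}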
diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
new file mode 100644
index 0000000..accbeea
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.h
@@ -0,0 +1,101 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_NET_CRC_H_
+#define _RTE_NET_CRC_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+#include <rte_mbuf.h>
+
+/** CRC types */
+enum rte_net_crc_type {
+	RTE_NET_CRC16_CCITT = 0,
+	RTE_NET_CRC32_ETH,
+	RTE_NET_CRC_REQS
+};
+
+/** CRC compute mode */
+enum rte_net_crc_mode {
+	RTE_NET_CRC_SCALAR = 0,
+	RTE_NET_CRC_SSE42,
+	RTE_NET_CRC_DEFAULT
+};
+
+/** CRC calc APIs params */
+struct rte_net_crc_params {
+	struct rte_mbuf *mbuf;		/**< packet mbuf */
+	uint32_t data_offset;		/**< offset to the data */
+	uint32_t data_len;			/**< length of the data */
+	enum rte_net_crc_type type;	/**< crc type */
+};
+
+/**
+ * CRC Initialisation API
+ *
+ *  This API should be called only once to initialise the internal CRC
+ *  data structures before using the CRC compute API.
+ *
+ * @param m
+ *   CRC compute mode (RTE_NET_CRC_SCALAR or RTE_NET_CRC_SSE42)
+ *
+ * @return
+ *   0 on success, -1 otherwise
+ */
+
+int
+rte_net_crc_init(enum rte_net_crc_mode m);
+
+/**
+ * CRC compute API
+ *
+ * @param p
+ *   CRC parameters: mbuf, data offset, data length and CRC type
+ *
+ * @return
+ *   CRC value (cast to uint16_t/uint32_t), or -1 on an invalid CRC type
+ */
+
+int
+rte_net_crc_calc(struct rte_net_crc_params *p);
+
+#ifdef __cplusplus
+}
+#endif
+
+
+#endif /* _RTE_NET_CRC_H_ */
diff --git a/lib/librte_net/rte_net_version.map b/lib/librte_net/rte_net_version.map
index 3b15e65..d391834 100644
--- a/lib/librte_net/rte_net_version.map
+++ b/lib/librte_net/rte_net_version.map
@@ -4,3 +4,11 @@ DPDK_16.11 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_net_crc_init;
+	rte_net_crc_calc;
+
+} DPDK_16.11;
-- 
2.5.5


* [dpdk-dev] [PATCH 2/2] app/test: add unit test for CRC computation
  2017-02-24 20:54 [dpdk-dev] [PATCH 0/2] librte_net: add crc computation support Jasvinder Singh
  2017-02-24 20:54 ` [dpdk-dev] [PATCH 1/2] librte_net: add crc init and compute APIs Jasvinder Singh
@ 2017-02-24 20:54 ` Jasvinder Singh
  1 sibling, 0 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-02-24 20:54 UTC
  To: dev; +Cc: declan.doherty

This patch provides a set of unit tests for verifying the functional
correctness of the 16-bit and 32-bit CRC APIs.
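The tests in this patch check fixed vectors in RTE_NET_CRC_SCALAR mode only. A complementary approach, sketched here as a suggestion rather than as part of the patch, is a differential test: compute the CRC of the same buffer with two independent implementations for every length over a range, so that the length-dependent branches of the PCLMULQDQ path (fewer than 4 bytes, fewer than 16, exactly 16, 17 to 31, and 32 or more with a partial tail) are all exercised. To stay self-contained, the sketch compares a bitwise and a table-driven CRC-32; in the DPDK test one would instead compare results obtained under RTE_NET_CRC_SCALAR and RTE_NET_CRC_SSE42, assuming re-initialisation between modes is acceptable. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference: bit-at-a-time reflected Ethernet CRC-32. */
static uint32_t crc32_bitwise(const uint8_t *d, size_t n)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (n--) {
		crc ^= *d++;
		for (int b = 0; b < 8; b++)
			crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
	}
	return ~crc;
}

/* Implementation under test: byte-at-a-time table lookup. */
static uint32_t lut[256];

static void lut_init(void)
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t c = i;

		for (int b = 0; b < 8; b++)
			c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
		lut[i] = c;
	}
}

static uint32_t crc32_table(const uint8_t *d, size_t n)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (n--)
		crc = lut[(crc ^ *d++) & 0xFF] ^ (crc >> 8);
	return ~crc;
}

/* Compare both implementations for every length 0..max_len-1 over a
 * deterministic buffer; returns the number of mismatching lengths. */
static int crc_cross_check(size_t max_len)
{
	uint8_t buf[256];
	int mismatches = 0;

	if (max_len > sizeof(buf))
		max_len = sizeof(buf);
	for (size_t i = 0; i < sizeof(buf); i++)
		buf[i] = (uint8_t)(i * 31u + 7u);	/* arbitrary fill pattern */

	lut_init();
	for (size_t len = 0; len < max_len; len++)
		if (crc32_bitwise(buf, len) != crc32_table(buf, len))
			mismatches++;
	return mismatches;
}
```

Sweeping every length, rather than a handful of fixed vectors, is what catches off-by-one errors in tail handling, which is where folding implementations tend to break.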

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
---
 app/test/Makefile   |   2 +
 app/test/test_crc.c | 229 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 231 insertions(+)
 create mode 100644 app/test/test_crc.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 1a5e03d..2a497f7 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -160,6 +160,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_cirbuf.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_string.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_lib.c
 
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += test_crc.c
+
 ifeq ($(CONFIG_RTE_LIBRTE_SCHED),y)
 SRCS-y += test_red.c
 SRCS-y += test_sched.c
diff --git a/app/test/test_crc.c b/app/test/test_crc.c
new file mode 100644
index 0000000..17c9161
--- /dev/null
+++ b/app/test/test_crc.c
@@ -0,0 +1,229 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_net_crc.h>
+#include <rte_malloc.h>
+
+#include "test.h"
+
+#define MBUF_DATA_SIZE          2048
+#define NB_MBUF                 64
+#define CRC_VEC_LEN				32
+#define CRC32_VEC_LEN1			1512
+#define CRC32_VEC_LEN2			348
+#define CRC16_VEC_LEN1			12
+#define CRC16_VEC_LEN2			2
+
+/* CRC test vector */
+static const uint8_t crc_vec[CRC_VEC_LEN] = {
+	'0', '1', '2', '3', '4', '5', '6', '7',
+	'8', '9', 'a', 'b', 'c', 'd', 'e', 'f',
+	'g', 'h', 'i', 'j', 'A', 'B', 'C', 'D',
+	'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L',
+};
+
+/* 32-bit CRC test vector */
+static const uint8_t crc32_vec1[12] = {
+	0xBE, 0xD7, 0x23, 0x47, 0x6B, 0x8F,
+	0xB3, 0x14, 0x5E, 0xFB, 0x35, 0x59,
+};
+
+/* 16-bit CRC test vector 1*/
+static const uint8_t crc16_vec1[CRC16_VEC_LEN1] = {
+	0x0D, 0x01, 0x01, 0x23, 0x45, 0x67,
+	0x89, 0x01, 0x23, 0x45, 0x00, 0x01,
+};
+
+/* 16-bit CRC test vector 2*/
+static const uint8_t crc16_vec2[CRC16_VEC_LEN2] = {
+	0x03, 0x3f,
+};
+/** CRC results */
+static const uint32_t crc32_vec_res = 0xb491aab4;
+static const uint32_t crc32_vec1_res = 0xac54d294;
+static const uint32_t crc32_vec2_res = 0xefaae02f;
+static const uint16_t crc16_vec_res = 0x6bec;
+static const uint16_t crc16_vec1_res = 0x8cdd;
+static const uint16_t crc16_vec2_res = 0xec5b;
+
+static int
+crc_calc(struct rte_mempool *mp,
+	const uint8_t *vec,
+	uint32_t vec_len,
+	uint32_t crc_r,
+	enum rte_net_crc_type type)
+{
+	struct rte_net_crc_params p;
+	uint8_t *data;
+
+	/* alloc first mbuf from the mempool */
+	p.mbuf = rte_pktmbuf_alloc(mp);
+	if (p.mbuf == NULL) {
+		printf("rte_pktmbuf_alloc() failed!\n");
+		return -1;
+	}
+
+	/* copy parameters */
+	p.type = type;
+	p.data_offset = 0;
+	p.data_len = vec_len;
+
+	/* append data length to an mbuf */
+	data = (uint8_t *)rte_pktmbuf_append(p.mbuf, p.data_len);
+
+	/* copy ref_vector */
+	rte_memcpy(data, vec, p.data_len);
+
+	/* dump mbuf data on console*/
+	rte_pktmbuf_dump(stdout, p.mbuf, p.data_len);
+
+	/* compute CRC */
+	int ret = rte_net_crc_calc(&p);
+
+	if (crc_r != (uint32_t) ret) {
+		rte_pktmbuf_free(p.mbuf);
+		return -1;
+	}
+
+	/* free mbuf */
+	rte_pktmbuf_free(p.mbuf);
+
+	return 0;
+}
+
+static int
+test_crc(void) {
+	struct rte_mempool *pktmbuf_pool = NULL;
+	uint32_t i;
+	enum rte_net_crc_type type;
+	uint8_t *test_data;
+	int ret = 0;
+
+	/* init crc */
+	if (rte_net_crc_init(RTE_NET_CRC_SCALAR) == -1) {
+		printf("test_crc: rte_net_crc_init() failed!\n");
+		return -1;
+	}
+
+	/* create pktmbuf pool */
+	pktmbuf_pool = rte_pktmbuf_pool_create("pktmbuf_pool",
+			NB_MBUF, 32, 0, MBUF_DATA_SIZE, SOCKET_ID_ANY);
+
+	if (pktmbuf_pool == NULL) {
+		printf("test_crc: cannot allocate mbuf pool!\n");
+		return -1;
+	}
+
+	/* 32-bit ethernet CRC: Test 1 */
+	type = RTE_NET_CRC32_ETH;
+
+	ret = crc_calc(pktmbuf_pool,
+		crc_vec,
+		CRC_VEC_LEN,
+		crc32_vec_res,
+		type);
+	if (ret) {
+		printf("test_crc(32-bit): test1 failed!\n");
+		return -1;
+	}
+
+	/* 32-bit ethernet CRC: Test 2 */
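+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN1, 0);
+	if (test_data == NULL)
+		return -1;
+	/* fill the buffer with repeated copies of the 12-byte crc32_vec1 */
+	for (i = 0; i < CRC32_VEC_LEN1; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	ret = crc_calc(pktmbuf_pool, test_data, CRC32_VEC_LEN1,
+		crc32_vec1_res, type);
+	rte_free(test_data);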
+	if (ret) {
+		printf("test_crc(32-bit): test2 failed!\n");
+		return -1;
+	}
+
+	/* 32-bit ethernet CRC: Test 3 */
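+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN2, 0);
+	if (test_data == NULL)
+		return -1;
+	/* fill the buffer with repeated copies of the 12-byte crc32_vec1 */
+	for (i = 0; i < CRC32_VEC_LEN2; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	ret = crc_calc(pktmbuf_pool, test_data, CRC32_VEC_LEN2,
+		crc32_vec2_res, type);
+	rte_free(test_data);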
+	if (ret) {
+		printf("test_crc(32-bit): test3 failed!\n");
+		return -1;
+	}
+
+	/* 16-bit CCITT CRC:  Test 4 */
+	type = RTE_NET_CRC16_CCITT;
+	ret = crc_calc(pktmbuf_pool,
+		crc_vec,
+		CRC_VEC_LEN,
+		crc16_vec_res,
+		type);
+	if (ret) {
+		printf("test_crc (16-bit): test4 failed!\n");
+		return -1;
+	}
+
+	/* 16-bit CCITT CRC:  Test 5 */
+	ret = crc_calc(pktmbuf_pool,
+		crc16_vec1,
+		CRC16_VEC_LEN1,
+		crc16_vec1_res,
+		type);
+	if (ret) {
+		printf("test_crc (16-bit): test5 failed!\n");
+		return -1;
+	}
+
+	/* 16-bit CCITT CRC:  Test 6 */
+	ret = crc_calc(pktmbuf_pool,
+		crc16_vec2,
+		CRC16_VEC_LEN2,
+		crc16_vec2_res,
+		type);
+	if (ret) {
+		printf("test_crc (16-bit): test6 failed!\n");
+		return -1;
+	}
+	rte_mempool_free(pktmbuf_pool);
+	return 0;
+}
+
+REGISTER_TEST_COMMAND(crc_autotest, test_crc);
-- 
2.5.5


* [dpdk-dev] [PATCH v2 0/2] librte_net: add crc computation support
  2017-02-24 20:54 ` [dpdk-dev] [PATCH 1/2] librte_net: add crc init and compute APIs Jasvinder Singh
@ 2017-02-28 12:08   ` Jasvinder Singh
  2017-02-28 12:08     ` [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute APIs Jasvinder Singh
  2017-02-28 12:08     ` [dpdk-dev] [PATCH v2 " Jasvinder Singh
  0 siblings, 2 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-02-28 12:08 UTC (permalink / raw)
  To: dev; +Cc: declan.doherty

In some applications, a CRC (Cyclic Redundancy Check) needs to be computed or
updated during packet processing. This patchset adds software implementations
of two common standard CRCs: the 32-bit Ethernet CRC (as per Ethernet/ISO/IEC
8802-3) and the 16-bit CCITT CRC (ITU-T X.25). Two versions of each of the
32-bit and 16-bit CRC calculations are proposed.

The first version provides fast and efficient CRC generation on IA
processors using the carry-less multiplication instruction PCLMULQDQ
together with SSE4.x intrinsics (PCLMULQDQ itself has its own CPUID feature
flag and is not part of SSE4.2). In this implementation, a parallelized
folding approach first reduces an arbitrary-length buffer to a single
16-byte block with the help of precomputed constants. That block is then
reduced from 128 to 64 bits and finally, by the Barrett reduction method,
to the final CRC value. For more details on the implementation, see
reference [1].
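The folding step rests on two primitives: a carry-less multiply (partial products combined with XOR instead of addition), which is what one PCLMULQDQ lane computes, and constants of the form x^n mod P(x), which is what get_poly_constant() in the patch computes. The portable sketch below models both in plain C for illustration only; clmul64(), xpow_mod() and mod_p() are hypothetical names, and the code works in the non-reflected (MSB-first) bit order rather than the reflected order the patch uses. Folding works because, modulo P, multiplying a block by a precomputed x^n mod P is equivalent to shifting it by n bits; the accompanying check confirms this for n = 32 by squaring x^32 mod P and comparing the reduced product with x^64 mod P.

```c
#include <stdint.h>

/* Hypothetical helper: software model of one 64x64 -> 128-bit carry-less
 * multiply (XOR instead of add), the operation PCLMULQDQ performs. */
static void
clmul64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
	uint64_t h = 0, l = 0;

	for (int i = 0; i < 64; i++) {
		if ((b >> i) & 1) {
			l ^= a << i;
			if (i != 0)
				h ^= a >> (64 - i);	/* product bits above bit 63 */
		}
	}
	*hi = h;
	*lo = l;
}

/* x^exp mod P(x) for exp >= 32, where P(x) = x^32 + poly (MSB-first). */
static uint32_t
xpow_mod(uint32_t poly, uint32_t exp)
{
	uint32_t r = poly;	/* x^32 mod P(x) */

	for (uint32_t i = 32; i < exp; i++)
		r = (r & 0x80000000u) ? (r << 1) ^ poly : r << 1;
	return r;
}

/* Reduce a polynomial of degree < 64 modulo P(x) = x^32 + poly. */
static uint32_t
mod_p(uint64_t v, uint32_t poly)
{
	const uint64_t p = (1ULL << 32) | poly;

	for (int bit = 63; bit >= 32; bit--)
		if ((v >> bit) & 1)
			v ^= p << (bit - 32);
	return (uint32_t)v;
}
```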

The second version is a fallback that computes the CRC without needing any
specific support from the CPU (for example, SSE4.2 intrinsics). It is based on
the generic look-up table (LUT) algorithm, which uses a precomputed 256-entry
table as explained in reference [2].
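A minimal sketch of the table-driven method, shown for the Ethernet CRC-32 only. It builds the 256-entry table directly in reflected form from 0xEDB88320 (the bit-reversed form of 0x04c11db7, the CRC32_ETH_POLYNOMIAL in the patch); the patch instead generates each entry MSB-first and bit-reverses it with reflect_32bits(), which should yield the same table. The names crc32_table, crc32_table_init() and crc32_lut() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc32_table[256];

/* Entry i is the CRC register after shifting byte i through 8 bit steps. */
static void
crc32_table_init(void)
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t c = i;

		for (int j = 0; j < 8; j++)
			c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
		crc32_table[i] = c;
	}
}

/* One table lookup per input byte replaces eight shift/XOR steps. */
static uint32_t
crc32_lut(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;	/* initial value */

	while (len--)
		crc = crc32_table[(crc ^ *data++) & 0xFF] ^ (crc >> 8);
	return ~crc;			/* final XOR */
}
```

The loop body has the same shape as crc32_eth_calc_lut() in the patch; the standard CRC-32 check value for the ASCII string "123456789" is 0xCBF43926.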

The following APIs have been added:

(i)  rte_net_crc_init()
(ii) rte_net_crc_calc()

The first API (i) initialises the data structures required for CRC computation.
It should be called only once by the application, before the second API (ii)
is used for 16-bit and 32-bit CRC calculations.
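The init-once contract can be illustrated with a standalone sketch of the pattern the patch uses: an init function selects a handler table, and the compute function indexes it by CRC type. Everything here is hypothetical and independent of DPDK (crc_sketch_init(), crc_sketch_calc() and the enum are made-up names), and only bitwise scalar handlers are wired in. They assume the reflected convention the patch's handlers appear to use (initial value 0xffffffff or 0xffff, final bitwise NOT), with 0xEDB88320 and 0x8408 as the bit-reversed forms of 0x04c11db7 and 0x1021; under that assumption the 16-bit variant matches the CRC commonly catalogued as CRC-16/X-25, whose check value for "123456789" is 0x906E.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CRC types for this sketch (not the DPDK enum). */
enum crc_sketch_type {
	CRC_SKETCH_16_X25 = 0,
	CRC_SKETCH_32_ETH,
	CRC_SKETCH_TYPE_MAX
};

typedef uint32_t (*crc_sketch_fn)(const uint8_t *data, size_t len);

/* Bit-at-a-time reflected CRC; init and final XOR both use 'mask'. */
static uint32_t
crc_reflected(const uint8_t *d, size_t len, uint32_t poly_r, uint32_t mask)
{
	uint32_t crc = mask;

	while (len--) {
		crc ^= *d++;
		for (int b = 0; b < 8; b++)
			crc = (crc & 1) ? (crc >> 1) ^ poly_r : crc >> 1;
	}
	return crc ^ mask;
}

static uint32_t
crc16_scalar(const uint8_t *d, size_t len)
{
	return crc_reflected(d, len, 0x8408u, 0xFFFFu);
}

static uint32_t
crc32_scalar(const uint8_t *d, size_t len)
{
	return crc_reflected(d, len, 0xEDB88320u, 0xFFFFFFFFu);
}

static const crc_sketch_fn scalar_handlers[CRC_SKETCH_TYPE_MAX] = {
	[CRC_SKETCH_16_X25] = crc16_scalar,
	[CRC_SKETCH_32_ETH] = crc32_scalar,
};

/* Handler table chosen by crc_sketch_init(); NULL until then. */
static const crc_sketch_fn *active_handlers;

static int
crc_sketch_init(void)
{
	/* A SIMD table could be selected here after a CPU feature check. */
	active_handlers = scalar_handlers;
	return 0;
}

/* Returns 0 and stores the CRC in *out, or -1 on misuse. */
static int
crc_sketch_calc(enum crc_sketch_type type, const uint8_t *data, size_t len,
		uint32_t *out)
{
	if (active_handlers == NULL ||
	    (unsigned int)type >= CRC_SKETCH_TYPE_MAX ||
	    data == NULL || out == NULL)
		return -1;
	*out = active_handlers[type](data, len);
	return 0;
}
```

Returning the status separately from the CRC is one possible design choice: with a single int return, a 32-bit CRC of 0xffffffff cannot be told apart from the -1 error value.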

References:
[1] Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
[2] A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS
http://www.ross.net/crc/download/crc_v3.txt

v2 changes:
- fix build errors for target i686-native-linuxapp-gcc
- fix checkpatch warnings

Notes:
- The build fails with clang versions earlier than 3.7.0 due to
  missing intrinsics. Refer to the DPDK known issues section for more details.

Jasvinder Singh (2):
  librte_net: add crc init and compute APIs
  app/test: add unit test for CRC computation

 app/test/Makefile                  |   2 +
 app/test/test_crc.c                | 229 +++++++++++++
 lib/librte_net/Makefile            |   2 +
 lib/librte_net/rte_net_crc.c       | 664 +++++++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_crc.h       | 101 ++++++
 lib/librte_net/rte_net_version.map |   8 +
 6 files changed, 1006 insertions(+)
 create mode 100644 app/test/test_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h

-- 
2.5.5


* [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute APIs
  2017-02-28 12:08   ` [dpdk-dev] [PATCH v2 0/2] librte_net: add crc computation support Jasvinder Singh
@ 2017-02-28 12:08     ` Jasvinder Singh
  2017-02-28 12:15       ` Jerin Jacob
                         ` (2 more replies)
  2017-02-28 12:08     ` [dpdk-dev] [PATCH v2 " Jasvinder Singh
  1 sibling, 3 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-02-28 12:08 UTC (permalink / raw)
  To: dev; +Cc: declan.doherty

APIs for initialising and computing CRCs (16-bit and 32-bit) are added.
For the CRC calculation, both a scalar version and an x86 intrinsic (SSE4.2)
version are implemented.

The scalar version is based on the generic look-up table (LUT) algorithm,
while the x86 intrinsic version uses carry-less multiplication for
fast CRC computation.
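The intrinsic path relies on more than SSE4.2: _mm_clmulepi64_si128 needs PCLMULQDQ, _mm_shuffle_epi8 needs SSSE3, and _mm_blendv_epi8, _mm_insert_epi32 and _mm_extract_epi32 need SSE4.1, while rte_net_crc_init() checks only RTE_CPUFLAG_SSE4_2 at run time. PCLMULQDQ has its own CPUID flag and is not implied by SSE4.2. As a hedged sketch, a runtime check covering all three could look like the following. The function name is hypothetical, and the sketch assumes a GCC- or clang-compatible <cpuid.h>, the header the patch already includes, which provides __get_cpuid() and the bit_* masks.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/*
 * Hypothetical helper: true only if CPUID leaf 1 reports every extension
 * the intrinsic CRC path uses (PCLMULQDQ, SSSE3, SSE4.1).
 */
static bool
crc_clmul_path_usable(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false;	/* CPUID leaf 1 not available */
	return (ecx & bit_PCLMUL) != 0 &&
	       (ecx & bit_SSSE3) != 0 &&
	       (ecx & bit_SSE4_1) != 0;
#else
	return false;	/* non-x86 target: no PCLMULQDQ */
#endif
}
```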

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
---
 lib/librte_net/Makefile            |   2 +
 lib/librte_net/rte_net_crc.c       | 664 +++++++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_crc.h       | 101 ++++++
 lib/librte_net/rte_net_version.map |   8 +
 4 files changed, 775 insertions(+)
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h

diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
index 20cf664..41be751 100644
--- a/lib/librte_net/Makefile
+++ b/lib/librte_net/Makefile
@@ -39,11 +39,13 @@ EXPORT_MAP := rte_net_version.map
 LIBABIVER := 1
 
 SRCS-$(CONFIG_RTE_LIBRTE_NET) := rte_net.c
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += rte_net_crc.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc.h
 
 DEPDIRS-$(CONFIG_RTE_LIBRTE_NET) += lib/librte_eal lib/librte_mbuf
 
diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
new file mode 100644
index 0000000..78a49dd
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.c
@@ -0,0 +1,664 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_net_crc.h>
+#include <stddef.h>
+
+/* Macros for printing using RTE_LOG */
+#define RTE_LOGTYPE_CRC RTE_LOGTYPE_USER1
+
+/** CRC polynomials */
+#define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
+#define CRC16_CCITT_POLYNOMIAL 0x1021U
+
+typedef int (*rte_net_crc_handler)(struct rte_net_crc_params *);
+
+static int rte_crc16_ccitt_handler(struct rte_net_crc_params *p);
+static int rte_crc32_eth_handler(struct rte_net_crc_params *p);
+static int rte_crc_invalid_handler(struct rte_net_crc_params *p);
+
+static rte_net_crc_handler *handlers;
+
+static rte_net_crc_handler handlers_scalar[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
+	[RTE_NET_CRC_REQS] = rte_crc_invalid_handler,
+};
+
+int
+rte_crc_invalid_handler(__rte_unused struct rte_net_crc_params *p)
+{
+	RTE_LOG(ERR, CRC, "CRC type not supported!\n");
+	return -1;	/* Error */
+}
+
+#if defined RTE_ARCH_X86_64 && defined RTE_MACHINE_CPUFLAG_SSE4_2
+
+#include <cpuid.h>
+
+/** PCLMULQDQ CRC computation context structure */
+struct crc_pclmulqdq_ctx {
+	__m128i rk1_rk2;
+	__m128i rk5_rk6;
+	__m128i rk7_rk8;
+};
+
+struct crc_pclmulqdq_ctx crc32_eth_pclmulqdq __rte_aligned(16);
+struct crc_pclmulqdq_ctx crc16_ccitt_pclmulqdq __rte_aligned(16);
+/**
+ * @brief Performs one folding round
+ *
+ * Logically, the function operates as follows:
+ *     DATA = READ_NEXT_16BYTES();
+ *     F1 = LSB8(FOLD)
+ *     F2 = MSB8(FOLD)
+ *     T1 = CLMUL(F1, RK1)
+ *     T2 = CLMUL(F2, RK2)
+ *     FOLD = XOR(T1, T2, DATA)
+ *
+ * @param data_block 16 byte data block
+ * @param precomp precomputed rk1 and rk2 constants
+ * @param fold running 16 byte folded data
+ *
+ * @return New 16 byte folded data
+ */
+static inline __attribute__((always_inline)) __m128i
+crcr32_folding_round(const __m128i data_block,
+		const __m128i precomp,
+		const __m128i fold)
+{
+	__m128i tmp0 = _mm_clmulepi64_si128(fold, precomp, 0x01);
+	__m128i tmp1 = _mm_clmulepi64_si128(fold, precomp, 0x10);
+
+	return _mm_xor_si128(tmp1, _mm_xor_si128(data_block, tmp0));
+}
+
+/**
+ * Performs reduction from 128 bits to 64 bits
+ *
+ * @param data128 128 bits data to be reduced
+ * @param precomp rk5 and rk6 precomputed constants
+ *
+ * @return data reduced to 64 bits
+ */
+
+static inline __attribute__((always_inline)) __m128i
+crcr32_reduce_128_to_64(__m128i data128,
+	const __m128i precomp)
+{
+	__m128i tmp0, tmp1, tmp2;
+
+	/* 64b fold */
+	tmp0 = _mm_clmulepi64_si128(data128, precomp, 0x00);
+	tmp1 = _mm_srli_si128(data128, 8);
+	tmp0 = _mm_xor_si128(tmp0, tmp1);
+
+	/* 32b fold */
+	tmp2 = _mm_slli_si128(tmp0, 4);
+	tmp1 = _mm_clmulepi64_si128(tmp2, precomp, 0x10);
+
+	return _mm_xor_si128(tmp1, tmp0);
+}
+
+/**
+ * Performs Barrett reduction from 64 bits to 32 bits
+ *
+ * @param data64 64 bits data to be reduced
+ * @param precomp rk7 and rk8 precomputed constants
+ *
+ * @return data reduced to 32 bits
+ */
+
+static inline __attribute__((always_inline)) uint32_t
+crcr32_reduce_64_to_32(__m128i data64,
+	const __m128i precomp)
+{
+	static const uint32_t mask1[4] __rte_aligned(16) = {
+		0xffffffff, 0xffffffff, 0x00000000, 0x00000000
+	};
+
+	static const uint32_t mask2[4] __rte_aligned(16) = {
+		0x00000000, 0xffffffff, 0xffffffff, 0xffffffff
+	};
+	__m128i tmp0, tmp1, tmp2;
+
+	tmp0 = _mm_and_si128(data64, _mm_load_si128((const __m128i *)mask2));
+
+	tmp1 = _mm_clmulepi64_si128(tmp0, precomp, 0x00);
+	tmp1 = _mm_xor_si128(tmp1, tmp0);
+	tmp1 = _mm_and_si128(tmp1, _mm_load_si128((const __m128i *)mask1));
+
+	tmp2 = _mm_clmulepi64_si128(tmp1, precomp, 0x10);
+	tmp2 = _mm_xor_si128(tmp2, tmp1);
+	tmp2 = _mm_xor_si128(tmp2, tmp0);
+
+	return _mm_extract_epi32(tmp2, 2);
+}
+
+/**
+ * Computes constant for CLMUL algorithm
+ *
+ * Result is: X^exp mod poly
+ *
+ * @param poly polynomial
+ * @param exp exponent
+ *
+ * @return constant value
+ */
+
+static inline uint32_t
+get_poly_constant(const uint32_t poly, const uint32_t exp)
+{
+	uint32_t i, res = poly;
+
+	for (i = 32; i < exp; i++)
+		if (res & 0x80000000)
+			res = (res << 1) ^ poly;
+		else
+			res = (res << 1);
+
+	return res;
+}
+
+/**
+ * Calculates quotient and remainder of X^64 / P(X)
+ *
+ * @param poly P(X)
+ * @param q_ptr place to store quotient
+ * @param r_ptr place to store remainder
+ */
+static inline void
+div_poly(const uint64_t poly,
+	uint64_t *q_ptr,
+	uint64_t *r_ptr)
+{
+	uint64_t p = 0, q = 0, r = 0;
+	int i;
+
+	p = poly | 0x100000000ULL;
+
+	r = p;
+	r = r << 32;
+
+	i = 32;
+	do {
+		uint64_t one_shl_n = 0;
+
+		q = q << 1;
+		if ((i + 32) < 64)
+			one_shl_n = 1ULL << (32 + i);
+
+		if (r & one_shl_n) {
+			r ^= (p << i);
+			q |= 1;
+		}
+		i--;
+	} while (i >= 0);
+
+	if (q_ptr != NULL)
+		*q_ptr = q;
+
+	if (r_ptr != NULL)
+		*r_ptr = r;
+}
+
+/**
+ * Reflects selected group of bits in \a v
+ *
+ * @param v value to be reflected
+ * @param n size of the bit field to be reflected
+ *
+ * @return bit reflected value
+ */
+static uint64_t
+reflect(uint64_t v, const uint32_t n)
+{
+	uint32_t i;
+	uint64_t r = 0;
+
+	for (i = 0; i < n; i++) {
+		if (i != 0) {
+			r <<= 1;
+			v >>= 1;
+		}
+		r |= (v & 1);
+	}
+
+	return r;
+}
+
+const uint8_t crc_xmm_shift_tab[48] __rte_aligned(16) = {
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+};
+
+/**
+ * Shifts left 128 bit register by specified number of bytes
+ *
+ * @param reg 128 bit value
+ * @param num number of bytes to shift left reg by (0-16)
+ *
+ * @return reg << (num * 8)
+ */
+
+static inline __attribute__((always_inline)) __m128i
+xmm_shift_left(__m128i reg, const unsigned int num)
+{
+	const __m128i *p = (const __m128i *)(crc_xmm_shift_tab + 16 - num);
+
+	return _mm_shuffle_epi8(reg, _mm_loadu_si128(p));
+}
+
+/**
+ * Initializes CRC computation context structure for given polynomial
+ *
+ * @param pctx pclmulqdq CRC computation context structure to be initialized
+ * @param poly CRC polynomial
+ *
+ * @return 0 on success, -1 otherwise
+ */
+static inline __attribute__((always_inline)) int
+crc32_eth_init_pclmulqdq(
+	struct crc_pclmulqdq_ctx *pctx,
+	const uint64_t poly)
+{
+	uint64_t k1, k2, k5, k6;
+	uint64_t p = 0, q = 0;
+
+	if (pctx == NULL)
+		return -1;
+
+	k1 = get_poly_constant(poly, 128 - 32);
+	k2 = get_poly_constant(poly, 128 + 32);
+	k5 = get_poly_constant(poly, 96);
+	k6 = get_poly_constant(poly, 64);
+
+	div_poly(poly, &q, NULL);
+	q = q & 0xffffffff;			/** quotient X^64 / P(X) */
+	p = poly | 0x100000000ULL;	/** P(X) */
+
+	k1 = reflect(k1 << 32, 64) << 1;
+	k2 = reflect(k2 << 32, 64) << 1;
+	k5 = reflect(k5 << 32, 64) << 1;
+	k6 = reflect(k6 << 32, 64) << 1;
+	q = reflect(q, 33);
+	p = reflect(p, 33);
+
+	/** Save the params in context structure */
+	pctx->rk1_rk2 = _mm_setr_epi64(_m_from_int64(k1), _m_from_int64(k2));
+	pctx->rk5_rk6 = _mm_setr_epi64(_m_from_int64(k5), _m_from_int64(k6));
+	pctx->rk7_rk8 = _mm_setr_epi64(_m_from_int64(q), _m_from_int64(p));
+
+	return 0;
+}
+
+static inline __attribute__((always_inline)) uint32_t
+crc32_eth_calc_pclmulqdq(
+	const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const struct crc_pclmulqdq_ctx *params)
+{
+	__m128i temp, fold, k;
+	uint32_t n;
+
+	if (unlikely(data == NULL))
+		return crc;
+
+	if (unlikely(data_len == 0))
+		return crc;
+
+	if (unlikely(params == NULL))
+		return crc;
+
+	/* Get CRC init value */
+	temp = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);
+
+	/**
+	 * Folding all data into single 16 byte data block
+	 * Assumes: fold holds first 16 bytes of data
+	 */
+
+	if (unlikely(data_len < 32)) {
+		if (unlikely(data_len == 16)) {
+			/* 16 bytes */
+			fold = _mm_loadu_si128((const __m128i *)data);
+			fold = _mm_xor_si128(fold, temp);
+			goto reduction_128_64;
+		}
+
+		if (unlikely(data_len < 16)) {
+			/* 0 to 15 bytes */
+			uint8_t buffer[16] __rte_aligned(16);
+
+			memset(buffer, 0, sizeof(buffer));
+			memcpy(buffer, data, data_len);
+
+			fold = _mm_load_si128((const __m128i *)buffer);
+			fold = _mm_xor_si128(fold, temp);
+			if (unlikely(data_len < 4)) {
+				fold = xmm_shift_left(fold, 8 - data_len);
+				goto barret_reduction;
+			}
+			fold = xmm_shift_left(fold, 16 - data_len);
+			goto reduction_128_64;
+		}
+		/* 17 to 31 bytes */
+		fold = _mm_loadu_si128((const __m128i *)data);
+		fold = _mm_xor_si128(fold, temp);
+		n = 16;
+		k = params->rk1_rk2;
+		goto partial_bytes;
+	}
+
+	/** At least 32 bytes in the buffer */
+	/** Apply CRC initial value */
+	fold = _mm_loadu_si128((const __m128i *)data);
+	fold = _mm_xor_si128(fold, temp);
+
+	/** Main folding loop; any trailing partial block is handled below */
+	k = params->rk1_rk2;
+	for (n = 16; (n + 16) <= data_len; n += 16) {
+		temp = _mm_loadu_si128((const __m128i *)&data[n]);
+		fold = crcr32_folding_round(temp, k, fold);
+	}
+
+partial_bytes:
+	if (likely(n < data_len)) {
+
+		const uint32_t mask3[4] __rte_aligned(16) = {
+			0x80808080, 0x80808080, 0x80808080, 0x80808080
+		};
+
+		const uint8_t shf_table[32] __rte_aligned(16) = {
+			0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+			0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+			0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+		};
+
+		__m128i last16, a, b;
+
+		last16 = _mm_loadu_si128((const __m128i *)&data[data_len - 16]);
+
+		temp = _mm_loadu_si128((const __m128i *)
+			&shf_table[data_len & 15]);
+		a = _mm_shuffle_epi8(fold, temp);
+
+		temp = _mm_xor_si128(temp,
+			_mm_load_si128((const __m128i *)mask3));
+		b = _mm_shuffle_epi8(fold, temp);
+		b = _mm_blendv_epi8(b, last16, temp);
+
+		/* k = rk1 & rk2 */
+		temp = _mm_clmulepi64_si128(a, k, 0x01);
+		fold = _mm_clmulepi64_si128(a, k, 0x10);
+
+		fold = _mm_xor_si128(fold, temp);
+		fold = _mm_xor_si128(fold, b);
+	}
+
+	/** Reduction 128 -> 32 Assumes: fold holds 128bit folded data */
+reduction_128_64:
+	k = params->rk5_rk6;
+	fold = crcr32_reduce_128_to_64(fold, k);
+
+barret_reduction:
+	k = params->rk7_rk8;
+	n = crcr32_reduce_64_to_32(fold, k);
+
+	return n;
+}
+
+
+static int
+rte_net_crc_sse42_init(void)
+{
+	int status = 0;
+
+	/** Initialize CRC functions */
+	status = crc32_eth_init_pclmulqdq(&crc16_ccitt_pclmulqdq,
+		CRC16_CCITT_POLYNOMIAL << 16);
+	if (status == -1)
+		return -1;
+
+	status = crc32_eth_init_pclmulqdq(&crc32_eth_pclmulqdq,
+		CRC32_ETH_POLYNOMIAL);
+	if (status == -1)
+		return -1;
+
+	_mm_empty();
+
+	return 0;
+}
+
+static inline int
+rte_crc16_ccitt_sse42_handler(struct rte_net_crc_params *p)
+{
+	uint16_t ret;
+	const uint8_t *data =
+		rte_pktmbuf_mtod_offset(p->mbuf, uint8_t *, p->data_offset);
+
+	ret = (uint16_t)~crc32_eth_calc_pclmulqdq(data,
+		p->data_len,
+		0xffff,
+		&crc16_ccitt_pclmulqdq);
+
+	return ret;
+}
+
+static inline int
+rte_crc32_eth_sse42_handler(struct rte_net_crc_params *p)
+{
+	uint32_t ret;
+	const uint8_t *data =
+		rte_pktmbuf_mtod_offset(p->mbuf, uint8_t *, p->data_offset);
+
+	ret = ~crc32_eth_calc_pclmulqdq(data,
+		p->data_len,
+		0xffffffffUL,
+		&crc32_eth_pclmulqdq);
+
+	return ret;
+}
+
+static rte_net_crc_handler handlers_sse42[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler,
+	[RTE_NET_CRC_REQS] = rte_crc_invalid_handler,
+};
+
+#endif
+
+/** Local data */
+static uint32_t crc32_eth_lut[256];
+static uint32_t crc16_ccitt_lut[256];
+
+/**
+ * @brief Reflect the bits about the middle
+ *
+ * @param val value to be reflected
+ *
+ * @return reflected value
+ */
+static uint32_t
+reflect_32bits(const uint32_t val)
+{
+	uint32_t i, res = 0;
+
+	for (i = 0; i < 32; i++)
+		if ((val & (1U << i)) != 0)
+			res |= (uint32_t)(1U << (31 - i));
+
+	return res;
+}
+
+static int
+crc32_eth_init_lut(const uint32_t poly,
+	uint32_t *lut)
+{
+	uint_fast32_t i, j;
+
+	if (lut == NULL)
+		return -1;
+
+	for (i = 0; i < 256; i++) {
+		uint_fast32_t crc = reflect_32bits(i);
+
+		for (j = 0; j < 8; j++) {
+			if (crc & 0x80000000L)
+				crc = (crc << 1) ^ poly;
+			else
+				crc <<= 1;
+		}
+		lut[i] = reflect_32bits(crc);
+	}
+
+	return 0;
+}
+
+static inline __attribute__((always_inline)) uint32_t
+crc32_eth_calc_lut(const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const uint32_t *lut)
+{
+	if (unlikely(data == NULL || lut == NULL))
+		return crc;
+
+	while (data_len--)
+		crc = lut[(crc ^ *data++) & 0xffL] ^ (crc >> 8);
+
+	return crc;
+}
+
+static int
+rte_net_crc_scalar_init(void)
+{
+	int status = 0;
+
+	/** 32-bit crc init */
+	status = crc32_eth_init_lut(CRC32_ETH_POLYNOMIAL,
+		crc32_eth_lut);
+	if (status == -1)
+		return -1;
+
+	/** 16-bit CRC init */
+	status = crc32_eth_init_lut(CRC16_CCITT_POLYNOMIAL << 16,
+		crc16_ccitt_lut);
+	if (status == -1)
+		return -1;
+
+	return 0;
+}
+
+static inline int
+rte_crc16_ccitt_handler(struct rte_net_crc_params *p)
+{
+	uint16_t ret;
+	const uint8_t *data =
+		rte_pktmbuf_mtod_offset(p->mbuf, uint8_t *, p->data_offset);
+
+	ret = (uint16_t)~crc32_eth_calc_lut(data,
+		p->data_len,
+		0xffff,
+		crc16_ccitt_lut);
+
+	return ret;
+}
+
+static inline int
+rte_crc32_eth_handler(struct rte_net_crc_params *p)
+{
+	uint32_t ret;
+	const uint8_t *data =
+		rte_pktmbuf_mtod_offset(p->mbuf, uint8_t *, p->data_offset);
+
+	ret = ~crc32_eth_calc_lut(data,
+		p->data_len,
+		0xffffffffUL,
+		crc32_eth_lut);
+
+	return ret;
+}
+
+int
+rte_net_crc_init(enum rte_net_crc_mode m)
+{
+	int status;
+
+	switch (m) {
+
+	case RTE_NET_CRC_SSE42:
+#if defined RTE_ARCH_X86_64 && defined RTE_MACHINE_CPUFLAG_SSE4_2
+		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2)) {
+			status = rte_net_crc_sse42_init();
+			handlers = handlers_sse42;
+		} else {
+			status = -1;
+			RTE_LOG(ERR, CRC,
+			"%s: bad configuration (SSE4.2 not supported)\n",
+			__func__);
+		}
+
+		return status;
+#else
+		RTE_LOG(ERR, CRC,
+			"%s: bad configuration (x86_64 SSE4.2 build required)\n",
+			__func__);
+		return -1;
+#endif
+	case RTE_NET_CRC_SCALAR:
+	default:
+		status = rte_net_crc_scalar_init();
+		handlers = handlers_scalar;
+
+		return status;
+	}
+}
+
+int
+rte_net_crc_calc(struct rte_net_crc_params *p)
+{
+	int ret;
+	rte_net_crc_handler f_handle;
+
+	f_handle = handlers[p->type];
+	ret = f_handle(p);
+
+	return ret;
+}
diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
new file mode 100644
index 0000000..accbeea
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.h
@@ -0,0 +1,101 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_NET_CRC_H_
+#define _RTE_NET_CRC_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+#include <rte_mbuf.h>
+
+/** CRC types */
+enum rte_net_crc_type {
+	RTE_NET_CRC16_CCITT = 0,
+	RTE_NET_CRC32_ETH,
+	RTE_NET_CRC_REQS
+};
+
+/** CRC compute mode */
+enum rte_net_crc_mode {
+	RTE_NET_CRC_SCALAR = 0,
+	RTE_NET_CRC_SSE42,
+	RTE_NET_CRC_DEFAULT
+};
+
+/** CRC calc APIs params */
+struct rte_net_crc_params {
+	struct rte_mbuf *mbuf;		/**< packet mbuf */
+	uint32_t data_offset;		/**< offset to the data */
+	uint32_t data_len;			/**< length of the data */
+	enum rte_net_crc_type type;	/**< crc type */
+};
+
+/**
+ * CRC Initialisation API
+ *
+ *  This API should be called only once to initialise the internal crc
+ *  data structures before using the CRC compute API.
+ *
+ * @param m
+ *   crc compute mode
+ *
+ * @return
+ *   0 on success, -1 otherwise
+ */
+
+int
+rte_net_crc_init(enum rte_net_crc_mode m);
+
+/**
+ * CRC compute API
+ *
+ * @param p
+ *   pointer to the rte_net_crc_params structure
+ *
+ * @return
+ *   computed CRC value, or -1 if the CRC type is not supported
+ */
+
+int
+rte_net_crc_calc(struct rte_net_crc_params *p);
+
+#ifdef __cplusplus
+}
+#endif
+
+
+#endif /* _RTE_NET_CRC_H_ */
diff --git a/lib/librte_net/rte_net_version.map b/lib/librte_net/rte_net_version.map
index 3b15e65..d391834 100644
--- a/lib/librte_net/rte_net_version.map
+++ b/lib/librte_net/rte_net_version.map
@@ -4,3 +4,11 @@ DPDK_16.11 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_net_crc_init;
+	rte_net_crc_calc;
+
+} DPDK_16.11;
-- 
2.5.5


* [dpdk-dev] [PATCH v2 2/2] app/test: add unit test for CRC computation
  2017-02-28 12:08   ` [dpdk-dev] [PATCH v2 0/2] librte_net: add crc computation support Jasvinder Singh
  2017-02-28 12:08     ` [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute APIs Jasvinder Singh
@ 2017-02-28 12:08     ` Jasvinder Singh
  1 sibling, 0 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-02-28 12:08 UTC (permalink / raw)
  To: dev; +Cc: declan.doherty

This patch provides a set of tests for verifying the functional
correctness of the 16-bit and 32-bit CRC APIs.
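The vectors in this patch use buffer lengths of 2, 12, 32, 348 and 1512 bytes, and the test initialises only RTE_NET_CRC_SCALAR. Because the PCLMULQDQ path branches on length (under 4, under 16, exactly 16, 17 to 31 bytes, and a trailing partial block), a length sweep compared against a simple reference could be a useful addition. The sketch below shows the idea in standalone C, with two independent CRC-32 implementations standing in for the optimized and reference sides; in a DPDK test one side would presumably be rte_net_crc_calc() after rte_net_crc_init(RTE_NET_CRC_SSE42). All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference side: bit-at-a-time reflected CRC-32 (Ethernet polynomial). */
static uint32_t
crc32_bitwise(const uint8_t *d, size_t n)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (n--) {
		crc ^= *d++;
		for (int b = 0; b < 8; b++)
			crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
	}
	return ~crc;
}

/* Stand-in for the optimized side: byte-wise table lookup. */
static uint32_t lut[256];

static void
lut_init(void)
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t c = i;

		for (int b = 0; b < 8; b++)
			c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
		lut[i] = c;
	}
}

static uint32_t
crc32_table(const uint8_t *d, size_t n)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (n--)
		crc = lut[(crc ^ *d++) & 0xFF] ^ (crc >> 8);
	return ~crc;
}

/*
 * Compare both sides for every length 0..max_len over a fixed pattern.
 * Returns -1 if all agree, the first failing length otherwise, or -2 if
 * max_len exceeds the local buffer.
 */
static int
crc32_length_sweep(size_t max_len)
{
	uint8_t buf[256];

	if (max_len > sizeof(buf))
		return -2;
	for (size_t i = 0; i < sizeof(buf); i++)
		buf[i] = (uint8_t)(i * 31u + 7u);
	lut_init();
	for (size_t len = 0; len <= max_len; len++)
		if (crc32_bitwise(buf, len) != crc32_table(buf, len))
			return (int)len;
	return -1;
}
```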

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
---
 app/test/Makefile   |   2 +
 app/test/test_crc.c | 229 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 231 insertions(+)
 create mode 100644 app/test/test_crc.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 1a5e03d..2a497f7 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -160,6 +160,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_cirbuf.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_string.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_lib.c
 
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += test_crc.c
+
 ifeq ($(CONFIG_RTE_LIBRTE_SCHED),y)
 SRCS-y += test_red.c
 SRCS-y += test_sched.c
diff --git a/app/test/test_crc.c b/app/test/test_crc.c
new file mode 100644
index 0000000..17c9161
--- /dev/null
+++ b/app/test/test_crc.c
@@ -0,0 +1,229 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_net_crc.h>
+#include <rte_malloc.h>
+
+#include "test.h"
+
+#define MBUF_DATA_SIZE          2048
+#define NB_MBUF                 64
+#define CRC_VEC_LEN				32
+#define CRC32_VEC_LEN1			1512
+#define CRC32_VEC_LEN2			348
+#define CRC16_VEC_LEN1			12
+#define CRC16_VEC_LEN2			2
+
+/* CRC test vector */
+static const uint8_t crc_vec[CRC_VEC_LEN] = {
+	'0', '1', '2', '3', '4', '5', '6', '7',
+	'8', '9', 'a', 'b', 'c', 'd', 'e', 'f',
+	'g', 'h', 'i', 'j', 'A', 'B', 'C', 'D',
+	'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L',
+};
+
+/* 32-bit CRC test vector */
+static const uint8_t crc32_vec1[12] = {
+	0xBE, 0xD7, 0x23, 0x47, 0x6B, 0x8F,
+	0xB3, 0x14, 0x5E, 0xFB, 0x35, 0x59,
+};
+
+/* 16-bit CRC test vector 1*/
+static const uint8_t crc16_vec1[CRC16_VEC_LEN1] = {
+	0x0D, 0x01, 0x01, 0x23, 0x45, 0x67,
+	0x89, 0x01, 0x23, 0x45, 0x00, 0x01,
+};
+
+/* 16-bit CRC test vector 2*/
+static const uint8_t crc16_vec2[CRC16_VEC_LEN2] = {
+	0x03, 0x3f,
+};
+/** CRC results */
+static const uint32_t crc32_vec_res = 0xb491aab4;
+static const uint32_t crc32_vec1_res = 0xac54d294;
+static const uint32_t crc32_vec2_res = 0xefaae02f;
+static const uint32_t crc16_vec_res = 0x6bec;
+static const uint16_t crc16_vec1_res = 0x8cdd;
+static const uint16_t crc16_vec2_res = 0xec5b;
+
+static int
+crc_calc(struct rte_mempool *mp,
+	const uint8_t *vec,
+	uint32_t vec_len,
+	uint32_t crc_r,
+	enum rte_net_crc_type type)
+{
+	struct rte_net_crc_params p;
+	uint8_t *data;
+
+	/* alloc first mbuf from the mempool */
+	p.mbuf = rte_pktmbuf_alloc(mp);
+	if (p.mbuf == NULL) {
+		printf("rte_pktmbuf_alloc() failed!\n");
+		return -1;
+	}
+
+	/* copy parameters */
+	p.type = type;
+	p.data_offset = 0;
+	p.data_len = vec_len;
+
+	/* reserve data_len bytes at the end of the mbuf */
+	data = (uint8_t *)rte_pktmbuf_append(p.mbuf, p.data_len);
+
+	/* copy ref_vector */
+	rte_memcpy(data, vec, p.data_len);
+
+	/* dump mbuf data on console*/
+	rte_pktmbuf_dump(stdout, p.mbuf, p.data_len);
+
+	/* compute CRC */
+	int ret = rte_net_crc_calc(&p);
+
+	if (crc_r != (uint32_t) ret) {
+		rte_pktmbuf_free(p.mbuf);
+		return -1;
+	}
+
+	/* free mbuf */
+	rte_pktmbuf_free(p.mbuf);
+
+	return 0;
+}
+
+static int
+test_crc(void) {
+	struct rte_mempool *pktmbuf_pool = NULL;
+	uint32_t i;
+	enum rte_net_crc_type type;
+	uint8_t *test_data;
+	int ret = 0;
+
+	/* init crc */
+	if (rte_net_crc_init(RTE_NET_CRC_SCALAR) == -1) {
+		printf("test_crc: rte_net_crc_init() failed!\n");
+		return -1;
+	}
+
+	/* create pktmbuf pool */
+	pktmbuf_pool = rte_pktmbuf_pool_create("pktmbuf_pool",
+			NB_MBUF, 32, 0, MBUF_DATA_SIZE, SOCKET_ID_ANY);
+
+	if (pktmbuf_pool == NULL) {
+		printf("test_crc: cannot allocate mbuf pool!\n");
+		return -1;
+	}
+
+	/* 32-bit ethernet CRC: Test 1 */
+	type = RTE_NET_CRC32_ETH;
+
+	ret = crc_calc(pktmbuf_pool,
+		crc_vec,
+		CRC_VEC_LEN,
+		crc32_vec_res,
+		type);
+	if (ret) {
+		printf("test_crc(32-bit): test1 failed!\n");
+		return -1;
+	}
+
+	/* 32-bit ethernet CRC: Test 2 */
+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN1, 0);
+
+	for (i = 0; i < CRC32_VEC_LEN1; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	ret = crc_calc(pktmbuf_pool,
+		test_data,
+		CRC32_VEC_LEN1,
+		crc32_vec1_res,
+		type);
+	if (ret) {
+		printf("test_crc(32-bit): test2 failed!\n");
+		return -1;
+	}
+
+	/* 32-bit ethernet CRC: Test 3 */
+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN2, 0);
+
+	for (i = 0; i < CRC32_VEC_LEN2; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	ret = crc_calc(pktmbuf_pool,
+		test_data,
+		CRC32_VEC_LEN2,
+		crc32_vec2_res,
+		type);
+	if (ret) {
+		printf("test_crc(32-bit): test3 failed!\n");
+		return -1;
+	}
+
+	/* 16-bit CCITT CRC:  Test 4 */
+	type = RTE_NET_CRC16_CCITT;
+	ret = crc_calc(pktmbuf_pool,
+		crc_vec,
+		CRC_VEC_LEN,
+		crc16_vec_res,
+		type);
+	if (ret) {
+		printf("test_crc (16-bit): test4 failed!\n");
+		return -1;
+	}
+
+	/* 16-bit CCITT CRC:  Test 5 */
+	ret = crc_calc(pktmbuf_pool,
+		crc16_vec1,
+		CRC16_VEC_LEN1,
+		crc16_vec1_res,
+		type);
+	if (ret) {
+		printf("test_crc (16-bit): test5 failed!\n");
+		return -1;
+	}
+
+	/* 16-bit CCITT CRC:  Test 6 */
+	ret = crc_calc(pktmbuf_pool,
+		crc16_vec2,
+		CRC16_VEC_LEN2,
+		crc16_vec2_res,
+		type);
+	if (ret) {
+		printf("test_crc (16-bit): test6 failed!\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+REGISTER_TEST_COMMAND(crc_autotest, test_crc);
-- 
2.5.5


* Re: [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute APIs
  2017-02-28 12:08     ` [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute APIs Jasvinder Singh
@ 2017-02-28 12:15       ` Jerin Jacob
  2017-03-01 18:46       ` Thomas Monjalon
  2017-03-12 21:33       ` [dpdk-dev] [PATCH v3 0/2] librte_net: add crc computation support Jasvinder Singh
  2 siblings, 0 replies; 69+ messages in thread
From: Jerin Jacob @ 2017-02-28 12:15 UTC (permalink / raw)
  To: Jasvinder Singh; +Cc: dev, declan.doherty

On Tue, Feb 28, 2017 at 12:08:20PM +0000, Jasvinder Singh wrote:
> APIs for initialising and computing the crc (16-bit and 32-bit CRCs)
> are added. For CRCs calculation, scalar as well as x86 intrinsic(sse4.2)
> versions are implemented.
> 
> The scalar version is based on generic Look-Up Table(LUT) algorithm,
> while x86 intrinsic version uses carry-less multiplication for
> fast CRC computation.
> 
> Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
> ---
>  lib/librte_net/Makefile            |   2 +
>  lib/librte_net/rte_net_crc.c       | 664 +++++++++++++++++++++++++++++++++++++
>  lib/librte_net/rte_net_crc.h       | 101 ++++++
>  lib/librte_net/rte_net_version.map |   8 +
>  4 files changed, 775 insertions(+)
>  create mode 100644 lib/librte_net/rte_net_crc.c
>  create mode 100644 lib/librte_net/rte_net_crc.h
> 
> diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
> index 20cf664..41be751 100644
> --- a/lib/librte_net/Makefile
> +++ b/lib/librte_net/Makefile
> @@ -39,11 +39,13 @@ EXPORT_MAP := rte_net_version.map
>  LIBABIVER := 1
>  
>  SRCS-$(CONFIG_RTE_LIBRTE_NET) := rte_net.c
> +SRCS-$(CONFIG_RTE_LIBRTE_NET) += rte_net_crc.c
>  
>  # install includes
>  SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h
>  SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h
>  SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h
> +SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc.h
>  
>  DEPDIRS-$(CONFIG_RTE_LIBRTE_NET) += lib/librte_eal lib/librte_mbuf
>  
> diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
> new file mode 100644
> index 0000000..78a49dd
> --- /dev/null
> +++ b/lib/librte_net/rte_net_crc.c
> @@ -0,0 +1,664 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <rte_net_crc.h>
> +#include <stddef.h>
> +
> +/* Macros for printing using RTE_LOG */
> +#define RTE_LOGTYPE_CRC RTE_LOGTYPE_USER1
> +
> +/** CRC polynomials */
> +#define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
> +#define CRC16_CCITT_POLYNOMIAL 0x1021U
> +
> +typedef int (*rte_net_crc_handler)(struct rte_net_crc_params *);
> +
> +static int rte_crc16_ccitt_handler(struct rte_net_crc_params *p);
> +static int rte_crc32_eth_handler(struct rte_net_crc_params *p);
> +static int rte_crc_invalid_handler(struct rte_net_crc_params *p);
> +
> +static rte_net_crc_handler *handlers;
> +
> +static rte_net_crc_handler handlers_scalar[] = {
> +	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
> +	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
> +	[RTE_NET_CRC_REQS] = rte_crc_invalid_handler,
> +};
> +
> +int
> +rte_crc_invalid_handler(__rte_unused struct rte_net_crc_params *p)
> +{
> +	RTE_LOG(ERR, CRC, "CRC type not supported!\n");
> +	return -1;	/* Error */
> +}
> +
> +#if defined RTE_ARCH_X86_64 && defined RTE_MACHINE_CPUFLAG_SSE4_2

Could you please abstract the vector function and move the SSE implementation
to a separate file, to allow clean NEON and AltiVec integration in the future?

Reference: lib/librte_lpm/rte_lpm_sse.h


* Re: [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute APIs
  2017-02-28 12:08     ` [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute APIs Jasvinder Singh
  2017-02-28 12:15       ` Jerin Jacob
@ 2017-03-01 18:46       ` Thomas Monjalon
  2017-03-02 13:03         ` Singh, Jasvinder
  2017-03-12 21:33       ` [dpdk-dev] [PATCH v3 0/2] librte_net: add crc computation support Jasvinder Singh
  2 siblings, 1 reply; 69+ messages in thread
From: Thomas Monjalon @ 2017-03-01 18:46 UTC (permalink / raw)
  To: Jasvinder Singh; +Cc: dev, declan.doherty

2017-02-28 12:08, Jasvinder Singh:
>  lib/librte_net/rte_net_crc.c       | 664 +++++++++++++++++++++++++++++++++++++
>  lib/librte_net/rte_net_crc.h       | 101 ++++++

I think it should be in librte_hash.

Please check lib/librte_hash/rte_hash_crc.h


* Re: [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute APIs
  2017-03-01 18:46       ` Thomas Monjalon
@ 2017-03-02 13:03         ` Singh, Jasvinder
  2017-03-06 15:27           ` Thomas Monjalon
  0 siblings, 1 reply; 69+ messages in thread
From: Singh, Jasvinder @ 2017-03-02 13:03 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Doherty, Declan

Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Wednesday, March 1, 2017 6:46 PM
> To: Singh, Jasvinder <jasvinder.singh@intel.com>
> Cc: dev@dpdk.org; Doherty, Declan <declan.doherty@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute
> APIs
> 
> 2017-02-28 12:08, Jasvinder Singh:
> >  lib/librte_net/rte_net_crc.c       | 664
> +++++++++++++++++++++++++++++++++++++
> >  lib/librte_net/rte_net_crc.h       | 101 ++++++
> 
> I think it should be in librte_hash.
> 
> Please check lib/librte_hash/rte_hash_crc.h

Is it good to include payload crc calculation in hash library as I see all hash related functionality there?


* Re: [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute APIs
  2017-03-02 13:03         ` Singh, Jasvinder
@ 2017-03-06 15:27           ` Thomas Monjalon
  2017-03-08 11:08             ` De Lara Guarch, Pablo
  0 siblings, 1 reply; 69+ messages in thread
From: Thomas Monjalon @ 2017-03-06 15:27 UTC (permalink / raw)
  To: Singh, Jasvinder; +Cc: dev, Doherty, Declan, Pablo DeLara Guarch

2017-03-02 13:03, Singh, Jasvinder:
> Hi Thomas,
> 
> > -----Original Message-----
> > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > Sent: Wednesday, March 1, 2017 6:46 PM
> > To: Singh, Jasvinder <jasvinder.singh@intel.com>
> > Cc: dev@dpdk.org; Doherty, Declan <declan.doherty@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute
> > APIs
> > 
> > 2017-02-28 12:08, Jasvinder Singh:
> > >  lib/librte_net/rte_net_crc.c       | 664
> > +++++++++++++++++++++++++++++++++++++
> > >  lib/librte_net/rte_net_crc.h       | 101 ++++++
> > 
> > I think it should be in librte_hash.
> > 
> > Please check lib/librte_hash/rte_hash_crc.h
> 
> Is it good to include payload crc calculation in hash library as I see all hash related functionality there?

I think yes. Pablo?


* Re: [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute APIs
  2017-03-06 15:27           ` Thomas Monjalon
@ 2017-03-08 11:08             ` De Lara Guarch, Pablo
  2017-03-15 17:35               ` Thomas Monjalon
  0 siblings, 1 reply; 69+ messages in thread
From: De Lara Guarch, Pablo @ 2017-03-08 11:08 UTC (permalink / raw)
  To: Thomas Monjalon, Singh, Jasvinder; +Cc: dev, Doherty, Declan



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Monday, March 06, 2017 3:28 PM
> To: Singh, Jasvinder
> Cc: dev@dpdk.org; Doherty, Declan; De Lara Guarch, Pablo
> Subject: Re: [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute
> APIs
> 
> 2017-03-02 13:03, Singh, Jasvinder:
> > Hi Thomas,
> >
> > > -----Original Message-----
> > > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > Sent: Wednesday, March 1, 2017 6:46 PM
> > > To: Singh, Jasvinder <jasvinder.singh@intel.com>
> > > Cc: dev@dpdk.org; Doherty, Declan <declan.doherty@intel.com>
> > > Subject: Re: [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and
> compute
> > > APIs
> > >
> > > 2017-02-28 12:08, Jasvinder Singh:
> > > >  lib/librte_net/rte_net_crc.c       | 664
> > > +++++++++++++++++++++++++++++++++++++
> > > >  lib/librte_net/rte_net_crc.h       | 101 ++++++
> > >
> > > I think it should be in librte_hash.
> > >
> > > Please check lib/librte_hash/rte_hash_crc.h
> >
> > Is it good to include payload crc calculation in hash library as I see all hash
> related functionality there?
> 
> I think yes. Pablo?

I think this doesn't belong in the hash library. These new functions calculate CRC, but not as a hash function.
Yes, CRC can be used as hash function (in fact, it is used as such in the hash library,
the CRC32C version, and I assume that's why it is in there), but its use is much broader
(its main purpose is not to be a hash function, but for data error detection, for any data).

Therefore, I would suggest either creating a separate library for this, if we want it to have broader use,
or leaving it in the net library, if we want to focus on calculating CRC for Ethernet frames.

Regarding the CRC that we have in the hash library, if we go for a separate library,
we could move that function there, but then it would have to follow the function prototype of a hash function
defined in the hash library.

Thanks,
Pablo
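To illustrate the last point, a CRC can be exposed through a hash-style prototype taking (key, key_len, init_val). The typedef below is a local stand-in assumed to mirror the hash library's rte_hash_function, and crc32_eth_as_hash() is a hypothetical bitwise CRC-32 (reflected Ethernet polynomial) that uses init_val as the starting register value and skips the final XOR, so results can be chained across buffers.

```c
#include <stdint.h>

/* Local stand-in, assumed to mirror the hash library's function prototype. */
typedef uint32_t (*hash_function)(const void *key, uint32_t key_len,
				  uint32_t init_val);

/* Hypothetical CRC-32 (reflected Ethernet polynomial 0xEDB88320) behind the
 * hash prototype: init_val seeds the register and no final XOR is applied,
 * so the output of one call can seed the next. */
static uint32_t crc32_eth_as_hash(const void *key, uint32_t key_len,
				  uint32_t init_val)
{
	const uint8_t *p = key;
	uint32_t crc = init_val;

	while (key_len--) {
		crc ^= *p++;
		/* One bit per step: shift right, XOR the polynomial if LSB was 1. */
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc;
}

/* The CRC can now be stored wherever a hash function pointer is expected. */
static const hash_function crc_hash = crc32_eth_as_hash;
```

Applying the final XOR (^ 0xFFFFFFFF) to crc_hash("123456789", 9, 0xFFFFFFFF) gives the standard Ethernet CRC-32 check value 0xCBF43926; a hash table would simply use the raw register value.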


* [dpdk-dev] [PATCH v3 0/2] librte_net: add crc computation support
  2017-02-28 12:08     ` [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute APIs Jasvinder Singh
  2017-02-28 12:15       ` Jerin Jacob
  2017-03-01 18:46       ` Thomas Monjalon
@ 2017-03-12 21:33       ` Jasvinder Singh
  2017-03-12 21:33         ` [dpdk-dev] [PATCH v3 1/2] librte_net: add crc compute APIs Jasvinder Singh
  2017-03-12 21:33         ` [dpdk-dev] [PATCH v3 " Jasvinder Singh
  2 siblings, 2 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-03-12 21:33 UTC (permalink / raw)
  To: dev; +Cc: declan.doherty, pablo.de.lara.guarch


In some applications, CRC (Cyclic Redundancy Check) needs to be computed
or updated during packet processing operations. This patchset adds
software implementations of some common standard CRCs (32-bit Ethernet
CRC as per Ethernet/[ISO/IEC 8802-3] and 16-bit CCITT-CRC [ITU-T X.25]).
Two versions of both the 32-bit and the 16-bit CRC calculation are proposed.

The first version provides fast and efficient CRC generation on IA
processors by using the carry-less multiplication instruction PCLMULQDQ
(through compiler intrinsics; this path is selected on CPUs that report
SSE4.2 support). In this implementation, a parallelized folding approach
is used to first reduce an arbitrary-length buffer to a small fixed-size
buffer (16 bytes) with the help of precomputed constants. The resulting
16-byte chunk is then reduced with the Barrett reduction method to produce
the final CRC value. For more details on the implementation, see
reference [1].
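The folding step can be illustrated in portable C. In GF(2) arithmetic, a message M(x) = H(x)*x^64 + L(x) leaves the same remainder modulo the CRC polynomial P(x) as H(x)*(x^64 mod P(x)) + L(x), so a high block can be folded onto the next one with a single carry-less multiply by a precomputed constant. The sketch below is an illustration under simplifying assumptions, not the patch's code: helper names are hypothetical, it uses the non-reflected CRC-32 polynomial without the bit reflection the Ethernet CRC applies, and it replaces PCLMULQDQ with a shift/XOR loop and Barrett reduction with plain polynomial long division.

```c
#include <stdint.h>

/* Full CRC-32 (Ethernet) generator polynomial, including the x^32 term. */
#define CRC32_ETH_POLY_FULL 0x104C11DB7ULL

/* Carry-less (GF(2)) product of two 64-bit polynomials into hi:lo; this is
 * what one PCLMULQDQ computes in hardware. */
static void clmul64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
	uint64_t h = 0, l = 0;

	for (int i = 0; i < 64; i++) {
		if ((b >> i) & 1) {
			l ^= a << i;            /* XOR instead of add: no carries */
			if (i != 0)
				h ^= a >> (64 - i);
		}
	}
	*hi = h;
	*lo = l;
}

/* Remainder of the 128-bit polynomial hi:lo modulo the CRC-32 polynomial,
 * by long division (the SSE code gets the same remainder via Barrett). */
static uint32_t gf2_mod(uint64_t hi, uint64_t lo)
{
	for (int bit = 127; bit >= 32; bit--) {
		uint64_t set = bit >= 64 ? (hi >> (bit - 64)) & 1 : (lo >> bit) & 1;
		int s = bit - 32;       /* align x^32 of P with the current bit */

		if (!set)
			continue;
		if (s >= 64) {
			hi ^= CRC32_ETH_POLY_FULL << (s - 64);
		} else {
			lo ^= CRC32_ETH_POLY_FULL << s;
			if (s != 0)
				hi ^= CRC32_ETH_POLY_FULL >> (64 - s);
		}
	}
	return (uint32_t)lo;
}

/* One folding step: replace H(x)*x^64 by H(x)*(x^64 mod P), then reduce. */
static uint32_t fold_then_mod(uint64_t h, uint64_t l)
{
	uint64_t k = gf2_mod(1, 0);     /* precomputed constant x^64 mod P */
	uint64_t ph, pl;

	clmul64(h, k, &ph, &pl);
	return gf2_mod(ph, pl ^ l);
}
```

fold_then_mod() and gf2_mod() return the same remainder for any 64-bit H and L; the SSE code applies the same identity to 128-bit blocks, with its precomputed rk constants playing an analogous role to x^64 mod P here.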

The second version is a fallback that supports CRC generation without
requiring any specific CPU support (for example, SSE4.2 intrinsics). It is
based on the generic Look-Up Table (LUT) algorithm, which uses a
precomputed 256-element table as explained in reference [2].
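A minimal sketch of the byte-wise LUT approach for the 32-bit Ethernet CRC, assuming the common reflected formulation (reflected polynomial 0xEDB88320, initial value 0xFFFFFFFF, final XOR with 0xFFFFFFFF). The names are illustrative rather than the library's; the update step mirrors the lookup used in the patch's crc32_eth_calc_lut().

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected form of the Ethernet CRC-32 polynomial 0x04C11DB7. */
#define CRC32_ETH_POLY_REFLECTED 0xEDB88320u

static uint32_t crc32_lut[256];

/* Precompute the CRC contribution of every possible byte value. */
static void crc32_lut_init(void)
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t crc = i;

		for (int j = 0; j < 8; j++)
			crc = (crc & 1) ? (crc >> 1) ^ CRC32_ETH_POLY_REFLECTED : crc >> 1;
		crc32_lut[i] = crc;
	}
}

/* Process one byte per table lookup instead of one bit per iteration. */
static uint32_t crc32_eth_lut(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;             /* initial value */

	while (len--)
		crc = crc32_lut[(crc ^ *data++) & 0xFF] ^ (crc >> 8);
	return ~crc;                            /* final XOR */
}
```

Each input byte costs one table lookup, one shift and two XORs, which is why this path needs no special CPU features; the table itself occupies 1 KiB (256 entries of 4 bytes).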

During initialisation, all the data structures required for CRC
computation are initialised, and either the x86-specific CRC
implementation (if supported by the platform) or the scalar version is
enabled.
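The initialisation described above could look roughly like the following sketch, which runs a constructor before main() to build the tables and pick a default. This is an illustration under assumptions: the enum and function names are hypothetical, and instead of DPDK's rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2) it uses the GCC/Clang builtins __builtin_cpu_init() and __builtin_cpu_supports(); it also checks for PCLMULQDQ ("pclmul"), which the carry-less path relies on.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the library's algorithm enum and state. */
enum crc_alg { CRC_ALG_SCALAR, CRC_ALG_SSE42 };

static enum crc_alg crc_default_alg = CRC_ALG_SCALAR;
static bool crc_tables_ready;

static void crc_build_tables(void)
{
	/* A real library would fill its 256-entry lookup tables here. */
	crc_tables_ready = true;
}

static bool crc_cpu_has_simd(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	/* Needed because this runs from a constructor, before CPU detection
	 * would otherwise have been initialised. */
	__builtin_cpu_init();
	return __builtin_cpu_supports("sse4.2") && __builtin_cpu_supports("pclmul");
#else
	return false;
#endif
}

/* Runs before main(): build tables, then pick the fastest supported path. */
static void __attribute__((constructor)) crc_lib_init(void)
{
	crc_build_tables();
	crc_default_alg = crc_cpu_has_simd() ? CRC_ALG_SSE42 : CRC_ALG_SCALAR;
}
```

Doing this in a constructor means callers never need an explicit init call, at the cost of running the detection in every process that links the library.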

The following APIs have been added:

(i)  rte_net_crc_set_alg()
(ii) rte_net_crc_calc()

The first API (i) allows the user to select a specific CRC implementation
at run time, while the second API (ii) computes the 16-bit or 32-bit
CRC.
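One way to support run-time selection is a table of handlers per algorithm, indexed by CRC type, as the patch does with handlers_scalar and handlers_sse42. The sketch below is hypothetical: names are illustrative, both tables point at bitwise reference CRCs (a real build would point the second table at SIMD code), and CPU capability is passed in as a flag so the fallback logic is easy to see.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirrors of rte_net_crc_type and rte_net_crc_alg. */
enum crc_type { CRC16_CCITT, CRC32_ETH, CRC_TYPE_MAX };
enum crc_alg { CRC_ALG_SCALAR, CRC_ALG_SSE42 };

typedef uint32_t (*crc_handler)(const uint8_t *data, uint32_t len);

/* Bitwise reflected CRC-16 (poly 0x8408, init/xorout 0xFFFF). */
static uint32_t crc16_x25(const uint8_t *data, uint32_t len)
{
	uint32_t crc = 0xFFFFu;

	while (len--) {
		crc ^= *data++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x8408u & (0u - (crc & 1u)));
	}
	return ~crc & 0xFFFFu;
}

/* Bitwise reflected CRC-32 (poly 0xEDB88320, init/xorout 0xFFFFFFFF). */
static uint32_t crc32_eth(const uint8_t *data, uint32_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *data++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

static const crc_handler handlers_scalar[CRC_TYPE_MAX] = {
	[CRC16_CCITT] = crc16_x25,
	[CRC32_ETH] = crc32_eth,
};

/* Placeholder: a real build would point these at SIMD implementations. */
static const crc_handler handlers_simd[CRC_TYPE_MAX] = {
	[CRC16_CCITT] = crc16_x25,
	[CRC32_ETH] = crc32_eth,
};

static const crc_handler *handlers = handlers_scalar;

/* Select a table at run time; fall back to scalar if SIMD is unavailable. */
static enum crc_alg crc_set_alg(enum crc_alg alg, int cpu_has_simd)
{
	if (alg == CRC_ALG_SSE42 && cpu_has_simd) {
		handlers = handlers_simd;
		return CRC_ALG_SSE42;
	}
	handlers = handlers_scalar;
	return CRC_ALG_SCALAR;
}

static uint32_t crc_calc(const uint8_t *data, uint32_t len, enum crc_type type)
{
	assert(type < CRC_TYPE_MAX);    /* index must be a valid table slot */
	return handlers[type](data, len);
}
```

For example, crc_set_alg(CRC_ALG_SSE42, 0) falls back to the scalar table, after which crc_calc() on "123456789" returns the standard check values 0xCBF43926 (CRC32_ETH) and 0x906E (CRC16_CCITT). Keeping the selection in a single pointer means the per-call cost of dispatch is one indirect call.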

References:
[1] Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
[2] A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS
http://www.ross.net/crc/download/crc_v3.txt

v3 changes:
- separate the x86-specific implementation into a new file
- improve the unit test

v2 changes:
- fix build errors for target i686-native-linuxapp-gcc
- fix checkpatch warnings

Notes:
- The build fails with clang versions earlier than 3.7.0 due to missing
  intrinsics. Refer to the DPDK known issues section for more details.

Jasvinder Singh (2):
  librte_net: add crc compute APIs
  app/test: add unit test for CRC computation

 app/test/Makefile                  |   2 +
 app/test/test_crc.c                | 265 ++++++++++++++++++++++++++++
 lib/librte_net/Makefile            |   3 +
 lib/librte_net/rte_net_crc.c       | 205 ++++++++++++++++++++++
 lib/librte_net/rte_net_crc.h       | 104 +++++++++++
 lib/librte_net/rte_net_crc_sse.h   | 351 +++++++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_version.map |   8 +
 7 files changed, 938 insertions(+)
 create mode 100644 app/test/test_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h
 create mode 100644 lib/librte_net/rte_net_crc_sse.h

-- 
2.5.5


* [dpdk-dev] [PATCH v3 1/2] librte_net: add crc compute APIs
  2017-03-12 21:33       ` [dpdk-dev] [PATCH v3 0/2] librte_net: add crc computation support Jasvinder Singh
@ 2017-03-12 21:33         ` Jasvinder Singh
  2017-03-13  3:06           ` Ananyev, Konstantin
  2017-03-20 19:29           ` [dpdk-dev] [PATCH v4 0/2] librte_net: add crc computation support Jasvinder Singh
  2017-03-12 21:33         ` [dpdk-dev] [PATCH v3 " Jasvinder Singh
  1 sibling, 2 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-03-12 21:33 UTC (permalink / raw)
  To: dev; +Cc: declan.doherty, pablo.de.lara.guarch

APIs for selecting the architecture-specific implementation and for
computing the CRC (16-bit and 32-bit CRCs) are added. For CRC calculation,
both scalar and x86 intrinsic (SSE4.2) versions are implemented.

The scalar version is based on the generic Look-Up Table (LUT) algorithm,
while the x86 intrinsic version uses carry-less multiplication for
fast CRC computation.
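The scalar tables in this patch are built by one generator for both widths: the 16-bit CCITT polynomial is shifted left by 16 so that it lines up with a 32-bit register, and the entries are bit-reflected, as in crc32_eth_init_lut() and rte_net_crc_scalar_init() in the diff that follows. A standalone sketch of that idea is given here; function names are illustrative, and by our reading the 16-bit result corresponds to CRC-16/X-25 (reflected, initial value 0xFFFF, final XOR 0xFFFF), whose check value for "123456789" is 0x906E.

```c
#include <stdint.h>

/* Reverse the bit order of a 32-bit value (unsigned shifts avoid UB at 31). */
static uint32_t reflect32(uint32_t v)
{
	uint32_t r = 0;

	for (int i = 0; i < 32; i++)
		if (v & (1u << i))
			r |= 1u << (31 - i);
	return r;
}

/* Build a reflected 256-entry table from a non-reflected polynomial that is
 * aligned to bit 31 (MSB-first division, then reflect the result). */
static void lut_init(uint32_t poly, uint32_t lut[256])
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t crc = reflect32(i);

		for (int j = 0; j < 8; j++)
			crc = (crc & 0x80000000u) ? (crc << 1) ^ poly : crc << 1;
		lut[i] = reflect32(crc);
	}
}

/* Shared byte-wise update loop, usable with either table. */
static uint32_t lut_calc(const uint8_t *data, uint32_t len, uint32_t crc,
			 const uint32_t lut[256])
{
	while (len--)
		crc = lut[(crc ^ *data++) & 0xFF] ^ (crc >> 8);
	return crc;
}

static uint32_t crc32_lut[256], crc16_lut[256];

static void tables_init(void)
{
	lut_init(0x04C11DB7u, crc32_lut);       /* Ethernet CRC-32 */
	lut_init(0x1021u << 16, crc16_lut);     /* CCITT CRC-16, shifted to bit 31 */
}

static uint32_t crc32_eth(const uint8_t *d, uint32_t n)
{
	return ~lut_calc(d, n, 0xFFFFFFFFu, crc32_lut);
}

static uint16_t crc16_ccitt(const uint8_t *d, uint32_t n)
{
	return (uint16_t)~lut_calc(d, n, 0xFFFFu, crc16_lut);
}
```

Because the shifted 16-bit polynomial only touches bits 16 to 31 before reflection, every CRC-16 table entry ends up in the low 16 bits, so the same 32-bit update loop serves both widths.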

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
---
 lib/librte_net/Makefile            |   3 +
 lib/librte_net/rte_net_crc.c       | 205 ++++++++++++++++++++++
 lib/librte_net/rte_net_crc.h       | 104 +++++++++++
 lib/librte_net/rte_net_crc_sse.h   | 351 +++++++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_version.map |   8 +
 5 files changed, 671 insertions(+)
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h
 create mode 100644 lib/librte_net/rte_net_crc_sse.h

diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
index 20cf664..39ff1cc 100644
--- a/lib/librte_net/Makefile
+++ b/lib/librte_net/Makefile
@@ -39,11 +39,14 @@ EXPORT_MAP := rte_net_version.map
 LIBABIVER := 1
 
 SRCS-$(CONFIG_RTE_LIBRTE_NET) := rte_net.c
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += rte_net_crc.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc_sse.h
 
 DEPDIRS-$(CONFIG_RTE_LIBRTE_NET) += lib/librte_eal lib/librte_mbuf
 
diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
new file mode 100644
index 0000000..c460ab0
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.c
@@ -0,0 +1,205 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_net_crc.h>
+#include <stddef.h>
+
+/** crc tables */
+static uint32_t crc32_eth_lut[256];
+static uint32_t crc16_ccitt_lut[256];
+
+static uint32_t
+rte_crc16_ccitt_handler(const uint8_t *data, uint32_t data_len);
+
+static uint32_t
+rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len);
+
+typedef uint32_t
+(*rte_net_crc_handler)(const uint8_t *data, uint32_t data_len);
+
+static rte_net_crc_handler *handlers;
+
+static rte_net_crc_handler handlers_scalar[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
+};
+
+#if defined(RTE_ARCH_X86_64) || defined(RTE_CPU_FALGS_SSE_4_2)
+static rte_net_crc_handler handlers_sse42[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler,
+};
+#endif
+
+/**
+ * Reflect the bits about the middle
+ *
+ * @param val value to be reflected
+ *
+ * @return reflected value
+ */
+static uint32_t
+reflect_32bits(const uint32_t val)
+{
+	uint32_t i, res = 0;
+
+	for (i = 0; i < 32; i++)
+		if ((val & (1U << i)) != 0)
+			res |= (uint32_t)(1U << (31 - i));
+
+	return res;
+}
+
+static void
+crc32_eth_init_lut(const uint32_t poly,
+	uint32_t *lut)
+{
+	uint_fast32_t i, j;
+
+	for (i = 0; i < 256; i++) {
+		uint_fast32_t crc = reflect_32bits(i);
+
+		for (j = 0; j < 8; j++) {
+			if (crc & 0x80000000L)
+				crc = (crc << 1) ^ poly;
+			else
+				crc <<= 1;
+		}
+		lut[i] = reflect_32bits(crc);
+	}
+}
+
+static inline __attribute__((always_inline)) uint32_t
+crc32_eth_calc_lut(const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const uint32_t *lut)
+{
+	if (unlikely(data == NULL || lut == NULL))
+		return crc;
+
+	while (data_len--)
+		crc = lut[(crc ^ *data++) & 0xffL] ^ (crc >> 8);
+
+	return crc;
+}
+
+static void
+rte_net_crc_scalar_init(void)
+{
+	/** 32-bit crc init */
+	crc32_eth_init_lut(CRC32_ETH_POLYNOMIAL, crc32_eth_lut);
+
+	/** 16-bit CRC init */
+	crc32_eth_init_lut(CRC16_CCITT_POLYNOMIAL << 16, crc16_ccitt_lut);
+
+}
+
+static inline uint32_t
+rte_crc16_ccitt_handler(const uint8_t *data, uint32_t data_len)
+{
+	return (uint16_t)~crc32_eth_calc_lut(data,
+		data_len,
+		0xffff,
+		crc16_ccitt_lut);
+}
+
+static inline uint32_t
+rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
+{
+	return ~crc32_eth_calc_lut(data,
+		data_len,
+		0xffffffffUL,
+		crc32_eth_lut);
+}
+
+void
+rte_net_crc_set_alg(enum rte_net_crc_alg alg)
+{
+	switch (alg) {
+
+	case RTE_NET_CRC_SSE42:
+#ifdef RTE_ARCH_X86_64
+		if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2))
+			alg = RTE_NET_CRC_SCALAR;
+		else {
+			handlers = handlers_sse42;
+			break;
+		}
+#endif
+	case RTE_NET_CRC_SCALAR:
+	default:
+		handlers = handlers_scalar;
+		break;
+	}
+}
+
+uint32_t
+rte_net_crc_calc(struct rte_mbuf *mbuf,
+	uint32_t data_offset,
+	uint32_t data_len,
+	enum rte_net_crc_type type)
+{
+	uint32_t ret;
+	rte_net_crc_handler f_handle;
+
+	const uint8_t *data =
+		(const uint8_t *) rte_pktmbuf_mtod_offset(mbuf,
+			uint8_t *,
+			data_offset);
+
+	f_handle = handlers[type];
+	ret = f_handle(data, data_len);
+
+	return ret;
+}
+
+/*
+ * Select highest available crc algorithm as default one.
+ */
+static inline void __attribute__((constructor))
+rte_net_crc_init(void)
+{
+	enum rte_net_crc_alg alg = RTE_NET_CRC_SCALAR;
+
+	rte_net_crc_scalar_init();
+
+#ifdef RTE_ARCH_X86_64
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2)) {
+		alg = RTE_NET_CRC_SSE42;
+		rte_net_crc_sse42_init();
+	}
+#endif
+
+	rte_net_crc_set_alg(alg);
+}
diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
new file mode 100644
index 0000000..26b6406
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.h
@@ -0,0 +1,104 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_NET_CRC_H_
+#define _RTE_NET_CRC_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+#include <rte_mbuf.h>
+
+/** CRC polynomials */
+#define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
+#define CRC16_CCITT_POLYNOMIAL 0x1021U
+
+/** CRC types */
+enum rte_net_crc_type {
+	RTE_NET_CRC16_CCITT = 0,
+	RTE_NET_CRC32_ETH,
+	RTE_NET_CRC_REQS
+};
+
+/** CRC compute algorithm */
+enum rte_net_crc_alg {
+	RTE_NET_CRC_SCALAR = 0,
+	RTE_NET_CRC_SSE42,
+};
+
+/**
+ * CRC algorithm selection API
+ *
+ *  This API sets the CRC computation algorithm and updates the
+ *  internal CRC handler table accordingly.
+ *
+ * @param alg
+ *   CRC compute algorithm
+ */
+void
+rte_net_crc_set_alg(enum rte_net_crc_alg alg);
+
+/**
+ * CRC compute API
+ *
+ * @param mbuf
+ *  Pointer to the packet mbuf for crc computation
+ * @param data_offset
+ *  Offset to the data in the mbuf
+ * @param data_len
+ *  length of the data to compute the crc on
+ * @param type
+ *  crc type
+ *
+ * @return
+ *  computed crc value
+ */
+uint32_t
+rte_net_crc_calc(struct rte_mbuf *mbuf,
+	uint32_t data_offset,
+	uint32_t data_len,
+	enum rte_net_crc_type type);
+
+#if defined(RTE_ARCH_X86_64) || defined(RTE_CPU_FALGS_SSE_4_2)
+#include <rte_net_crc_sse.h>
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+
+#endif /* _RTE_NET_CRC_H_ */
diff --git a/lib/librte_net/rte_net_crc_sse.h b/lib/librte_net/rte_net_crc_sse.h
new file mode 100644
index 0000000..332b95b
--- /dev/null
+++ b/lib/librte_net/rte_net_crc_sse.h
@@ -0,0 +1,351 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_NET_CRC_SSE_H_
+#define _RTE_NET_CRC_SSE_H_
+
+#include <cpuid.h>
+#include <rte_net_crc.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** PCLMULQDQ CRC computation context structure */
+struct crc_pclmulqdq_ctx {
+	__m128i rk1_rk2;
+	__m128i rk5_rk6;
+	__m128i rk7_rk8;
+};
+
+struct crc_pclmulqdq_ctx crc32_eth_pclmulqdq __rte_aligned(16);
+struct crc_pclmulqdq_ctx crc16_ccitt_pclmulqdq __rte_aligned(16);
+/**
+ * @brief Performs one folding round
+ *
+ * Logically function operates as follows:
+ *     DATA = READ_NEXT_16BYTES();
+ *     F1 = LSB8(FOLD)
+ *     F2 = MSB8(FOLD)
+ *     T1 = CLMUL(F1, RK1)
+ *     T2 = CLMUL(F2, RK2)
+ *     FOLD = XOR(T1, T2, DATA)
+ *
+ * @param data_block 16 byte data block
+ * @param precomp precomputed rk1 and rk2 constants
+ * @param fold running 16 byte folded data
+ *
+ * @return New 16 byte folded data
+ */
+static inline __attribute__((always_inline)) __m128i
+crcr32_folding_round(const __m128i data_block,
+		const __m128i precomp,
+		const __m128i fold)
+{
+	__m128i tmp0 = _mm_clmulepi64_si128(fold, precomp, 0x01);
+	__m128i tmp1 = _mm_clmulepi64_si128(fold, precomp, 0x10);
+
+	return _mm_xor_si128(tmp1, _mm_xor_si128(data_block, tmp0));
+}
+
+/**
+ * Performs reduction from 128 bits to 64 bits
+ *
+ * @param data128 128 bits data to be reduced
+ * @param precomp rk5 and rk6 precomputed constants
+ *
+ * @return data reduced to 64 bits
+ */
+
+static inline __attribute__((always_inline)) __m128i
+crcr32_reduce_128_to_64(__m128i data128,
+	const __m128i precomp)
+{
+	__m128i tmp0, tmp1, tmp2;
+
+	/* 64b fold */
+	tmp0 = _mm_clmulepi64_si128(data128, precomp, 0x00);
+	tmp1 = _mm_srli_si128(data128, 8);
+	tmp0 = _mm_xor_si128(tmp0, tmp1);
+
+	/* 32b fold */
+	tmp2 = _mm_slli_si128(tmp0, 4);
+	tmp1 = _mm_clmulepi64_si128(tmp2, precomp, 0x10);
+
+	return _mm_xor_si128(tmp1, tmp0);
+}
+
+/**
+ * Performs Barrett reduction from 64 bits to 32 bits
+ *
+ * @param data64 64 bits data to be reduced
+ * @param precomp rk7 and rk8 precomputed constants
+ *
+ * @return data reduced to 32 bits
+ */
+
+static inline __attribute__((always_inline)) uint32_t
+crcr32_reduce_64_to_32(__m128i data64,
+	const __m128i precomp)
+{
+	static const uint32_t mask1[4] __rte_aligned(16) = {
+		0xffffffff, 0xffffffff, 0x00000000, 0x00000000
+	};
+
+	static const uint32_t mask2[4] __rte_aligned(16) = {
+		0x00000000, 0xffffffff, 0xffffffff, 0xffffffff
+	};
+	__m128i tmp0, tmp1, tmp2;
+
+	tmp0 = _mm_and_si128(data64, _mm_load_si128((const __m128i *)mask2));
+
+	tmp1 = _mm_clmulepi64_si128(tmp0, precomp, 0x00);
+	tmp1 = _mm_xor_si128(tmp1, tmp0);
+	tmp1 = _mm_and_si128(tmp1, _mm_load_si128((const __m128i *)mask1));
+
+	tmp2 = _mm_clmulepi64_si128(tmp1, precomp, 0x10);
+	tmp2 = _mm_xor_si128(tmp2, tmp1);
+	tmp2 = _mm_xor_si128(tmp2, tmp0);
+
+	return _mm_extract_epi32(tmp2, 2);
+}
+
+static const uint8_t crc_xmm_shift_tab[48] __rte_aligned(16) = {
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+};
+
+/**
+ * Shifts left 128 bit register by specified number of bytes
+ *
+ * @param reg 128 bit value
+ * @param num number of bytes to shift left reg by (0-16)
+ *
+ * @return reg << (num * 8)
+ */
+
+static inline __attribute__((always_inline)) __m128i
+xmm_shift_left(__m128i reg, const unsigned int num)
+{
+	const __m128i *p = (const __m128i *)(crc_xmm_shift_tab + 16 - num);
+
+	return _mm_shuffle_epi8(reg, _mm_loadu_si128(p));
+}
+
+static inline __attribute__((always_inline)) uint32_t
+crc32_eth_calc_pclmulqdq(
+	const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const struct crc_pclmulqdq_ctx *params)
+{
+	__m128i temp, fold, k;
+	uint32_t n;
+
+	if (unlikely(data == NULL))
+		return crc;
+
+	if (unlikely(data_len == 0))
+		return crc;
+
+	if (unlikely(params == NULL))
+		return crc;
+
+	/* Get CRC init value */
+	temp = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);
+
+	/**
+	 * Folding all data into single 16 byte data block
+	 * Assumes: fold holds first 16 bytes of data
+	 */
+
+	if (unlikely(data_len < 32)) {
+		if (unlikely(data_len == 16)) {
+			/* 16 bytes */
+			fold = _mm_loadu_si128((const __m128i *)data);
+			fold = _mm_xor_si128(fold, temp);
+			goto reduction_128_64;
+		}
+
+		if (unlikely(data_len < 16)) {
+			/* 0 to 15 bytes */
+			uint8_t buffer[16] __rte_aligned(16);
+
+			memset(buffer, 0, sizeof(buffer));
+			memcpy(buffer, data, data_len);
+
+			fold = _mm_load_si128((const __m128i *)buffer);
+			fold = _mm_xor_si128(fold, temp);
+			if (unlikely(data_len < 4)) {
+				fold = xmm_shift_left(fold, 8 - data_len);
+				goto barret_reduction;
+			}
+			fold = xmm_shift_left(fold, 16 - data_len);
+			goto reduction_128_64;
+		}
+		/* 17 to 31 bytes */
+		fold = _mm_loadu_si128((const __m128i *)data);
+		fold = _mm_xor_si128(fold, temp);
+		n = 16;
+		k = params->rk1_rk2;
+		goto partial_bytes;
+	}
+
+	/** At least 32 bytes in the buffer */
+	/** Apply CRC initial value */
+	fold = _mm_loadu_si128((const __m128i *)data);
+	fold = _mm_xor_si128(fold, temp);
+
+	/** Main folding loop - bytes past the last full 16-byte block are processed separately */
+	k = params->rk1_rk2;
+	for (n = 16; (n + 16) <= data_len; n += 16) {
+		temp = _mm_loadu_si128((const __m128i *)&data[n]);
+		fold = crcr32_folding_round(temp, k, fold);
+	}
+
+partial_bytes:
+	if (likely(n < data_len)) {
+
+		const uint32_t mask3[4] __rte_aligned(16) = {
+			0x80808080, 0x80808080, 0x80808080, 0x80808080
+		};
+
+		const uint8_t shf_table[32] __rte_aligned(16) = {
+			0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+			0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+			0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+		};
+
+		__m128i last16, a, b;
+
+		last16 = _mm_loadu_si128((const __m128i *)&data[data_len - 16]);
+
+		temp = _mm_loadu_si128((const __m128i *)
+			&shf_table[data_len & 15]);
+		a = _mm_shuffle_epi8(fold, temp);
+
+		temp = _mm_xor_si128(temp,
+			_mm_load_si128((const __m128i *)mask3));
+		b = _mm_shuffle_epi8(fold, temp);
+		b = _mm_blendv_epi8(b, last16, temp);
+
+		/* k = rk1 & rk2 */
+		temp = _mm_clmulepi64_si128(a, k, 0x01);
+		fold = _mm_clmulepi64_si128(a, k, 0x10);
+
+		fold = _mm_xor_si128(fold, temp);
+		fold = _mm_xor_si128(fold, b);
+	}
+
+	/** Reduction 128 -> 32 Assumes: fold holds 128bit folded data */
+reduction_128_64:
+	k = params->rk5_rk6;
+	fold = crcr32_reduce_128_to_64(fold, k);
+
+barret_reduction:
+	k = params->rk7_rk8;
+	n = crcr32_reduce_64_to_32(fold, k);
+
+	return n;
+}
+
+
+static inline void
+rte_net_crc_sse42_init(void)
+{
+	uint64_t k1, k2, k5, k6;
+	uint64_t p = 0, q = 0;
+
+	/** Initialize CRC16 data */
+	k1 = 0x189aeLLU;
+	k2 = 0x8e10LLU;
+	k5 = 0x189aeLLU;
+	k6 = 0x114aaLLU;
+	q =  0x11c581910LLU;
+	p =  0x10811LLU;
+
+	/** Save the params in context structure */
+	crc16_ccitt_pclmulqdq.rk1_rk2 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k1),	_mm_cvtsi64_m64(k2));
+	crc16_ccitt_pclmulqdq.rk5_rk6 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k5),	_mm_cvtsi64_m64(k6));
+	crc16_ccitt_pclmulqdq.rk7_rk8 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(q), _mm_cvtsi64_m64(p));
+
+	/** Initialize CRC32 data */
+	k1 = 0xccaa009eLLU;
+	k2 = 0x1751997d0LLU;
+	k5 = 0xccaa009eLLU;
+	k6 = 0x163cd6124LLU;
+	q =  0x1f7011640LLU;
+	p =  0x1db710641LLU;
+
+	/** Save the params in context structure */
+	crc32_eth_pclmulqdq.rk1_rk2 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k1),	_mm_cvtsi64_m64(k2));
+	crc32_eth_pclmulqdq.rk5_rk6 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k5), _mm_cvtsi64_m64(k6));
+	crc32_eth_pclmulqdq.rk7_rk8 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(q), _mm_cvtsi64_m64(p));
+
+	_mm_empty();
+
+}
+
+static inline uint32_t
+rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len)
+{
+	return (uint16_t)~crc32_eth_calc_pclmulqdq(data,
+		data_len,
+		0xffff,
+		&crc16_ccitt_pclmulqdq);
+}
+
+static inline uint32_t
+rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len)
+{
+	return ~crc32_eth_calc_pclmulqdq(data,
+		data_len,
+		0xffffffffUL,
+		&crc32_eth_pclmulqdq);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_NET_CRC_SSE_H_ */
diff --git a/lib/librte_net/rte_net_version.map b/lib/librte_net/rte_net_version.map
index 3b15e65..c6716ec 100644
--- a/lib/librte_net/rte_net_version.map
+++ b/lib/librte_net/rte_net_version.map
@@ -4,3 +4,11 @@ DPDK_16.11 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_net_crc_set_alg;
+	rte_net_crc_calc;
+
+} DPDK_16.11;
-- 
2.5.5
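
The SSE4.2 path above is built on `_mm_clmulepi64_si128` (PCLMULQDQ), a carry-less multiply: partial products are combined with XOR instead of addition, which is polynomial multiplication over GF(2). The portable sketch below is only a reading aid and is not part of the patch; `clmul64`, `clmul_select` and `struct u128` are hypothetical names. It mirrors how the immediate operand picks the 64-bit halves, so the `0x10` used in `crcr32_reduce_64_to_32()` multiplies the low qword of the data by the high qword of `rk7_rk8` (the `p` constant, since `_mm_setr_epi64()` places its first argument in the low qword).

```c
#include <assert.h>
#include <stdint.h>

/* 128-bit carry-less product, split into 64-bit halves. */
struct u128 {
	uint64_t lo;
	uint64_t hi;
};

/*
 * Bit-serial carry-less (GF(2)) multiply: partial products are XORed,
 * so no carries propagate between bit positions.
 */
static struct u128
clmul64(uint64_t a, uint64_t b)
{
	struct u128 r = { 0, 0 };
	int i;

	for (i = 0; i < 64; i++) {
		if ((b >> i) & 1) {
			r.lo ^= a << i;
			if (i != 0) /* bits shifted out of the low half */
				r.hi ^= a >> (64 - i);
		}
	}
	return r;
}

/*
 * Operand selection as documented for _mm_clmulepi64_si128(a, b, imm):
 * imm bit 0 picks the qword of a, imm bit 4 picks the qword of b.
 * Index 0 is the low qword, matching _mm_setr_epi64() argument order.
 */
static struct u128
clmul_select(const uint64_t a[2], const uint64_t b[2], int imm)
{
	return clmul64(a[imm & 0x01], b[(imm >> 4) & 0x01]);
}

/* (x + 1) * (x + 1) = x^2 + 1 over GF(2): 3 "times" 3 gives 5, not 9. */
static void
clmul_self_check(void)
{
	struct u128 r = clmul64(3, 3);

	assert(r.lo == 5 && r.hi == 0);
}
```

Two degree-63 polynomials multiply to at most degree 126, which is why the instruction produces a full 128-bit result even though each input is only 64 bits.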

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [dpdk-dev] [PATCH v3 2/2] app/test: add unit test for CRC computation
  2017-03-12 21:33       ` [dpdk-dev] [PATCH v3 0/2] librte_net: add crc computation support Jasvinder Singh
  2017-03-12 21:33         ` [dpdk-dev] [PATCH v3 1/2] librte_net: add crc compute APIs Jasvinder Singh
@ 2017-03-12 21:33         ` Jasvinder Singh
  1 sibling, 0 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-03-12 21:33 UTC (permalink / raw)
  To: dev; +Cc: declan.doherty, pablo.de.lara.guarch

This patch adds a set of tests for verifying the functional correctness
of the 16-bit and 32-bit CRC APIs, by checking that the SSE4.2 results
match the scalar results for the same test vectors.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
---
 app/test/Makefile   |   2 +
 app/test/test_crc.c | 265 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 267 insertions(+)
 create mode 100644 app/test/test_crc.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 1a5e03d..2a497f7 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -160,6 +160,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_cirbuf.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_string.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_lib.c
 
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += test_crc.c
+
 ifeq ($(CONFIG_RTE_LIBRTE_SCHED),y)
 SRCS-y += test_red.c
 SRCS-y += test_sched.c
diff --git a/app/test/test_crc.c b/app/test/test_crc.c
new file mode 100644
index 0000000..6cc0364
--- /dev/null
+++ b/app/test/test_crc.c
@@ -0,0 +1,265 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_net_crc.h>
+#include <rte_malloc.h>
+
+#include "test.h"
+
+#define MBUF_DATA_SIZE          2048
+#define NB_MBUF                 64
+#define CRC_VEC_LEN				32
+#define CRC32_VEC_LEN1			1512
+#define CRC32_VEC_LEN2			348
+#define CRC16_VEC_LEN1			12
+#define CRC16_VEC_LEN2			2
+
+/* CRC test vector */
+static const uint8_t crc_vec[CRC_VEC_LEN] = {
+	'0', '1', '2', '3', '4', '5', '6', '7',
+	'8', '9', 'a', 'b', 'c', 'd', 'e', 'f',
+	'g', 'h', 'i', 'j', 'A', 'B', 'C', 'D',
+	'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L',
+};
+
+/* 32-bit CRC test vector */
+static const uint8_t crc32_vec1[12] = {
+	0xBE, 0xD7, 0x23, 0x47, 0x6B, 0x8F,
+	0xB3, 0x14, 0x5E, 0xFB, 0x35, 0x59,
+};
+
+/* 16-bit CRC test vector 1*/
+static const uint8_t crc16_vec1[CRC16_VEC_LEN1] = {
+	0x0D, 0x01, 0x01, 0x23, 0x45, 0x67,
+	0x89, 0x01, 0x23, 0x45, 0x00, 0x01,
+};
+
+/* 16-bit CRC test vector 2*/
+static const uint8_t crc16_vec2[CRC16_VEC_LEN2] = {
+	0x03, 0x3f,
+};
+/** CRC results */
+uint32_t crc32_vec_res, crc32_vec1_res, crc32_vec2_res;
+uint32_t crc16_vec_res, crc16_vec1_res, crc16_vec2_res;
+
+static int
+crc_calc(struct rte_mempool *mp,
+	const uint8_t *vec,
+	uint32_t vec_len,
+	enum rte_net_crc_type type)
+{
+	uint8_t *data;
+	struct rte_mbuf *mbuf;
+
+	/* alloc first mbuf from the mempool */
+	mbuf = rte_pktmbuf_alloc(mp);
+	if (mbuf == NULL) {
+		printf("rte_pktmbuf_alloc() failed!\n");
+		return -1;
+	}
+
+	/* append data length to an mbuf */
+	data = (uint8_t *)rte_pktmbuf_append(mbuf, vec_len);
+
+	/* copy ref_vector */
+	rte_memcpy(data, vec, vec_len);
+
+	/* dump mbuf data on console*/
+	rte_pktmbuf_dump(stdout, mbuf, vec_len);
+
+	/* compute CRC */
+	uint32_t ret = rte_net_crc_calc(mbuf, 0, vec_len, type);
+
+	/* free mbuf */
+	rte_pktmbuf_free(mbuf);
+
+	return  ret;
+}
+
+static void
+crc_calc_scalar(struct rte_mempool *mp)
+{
+	uint32_t i;
+	enum rte_net_crc_type type;
+	uint8_t *test_data;
+
+	/* 32-bit ethernet CRC: scalar result */
+	type = RTE_NET_CRC32_ETH;
+	crc32_vec_res = crc_calc(mp,
+						crc_vec,
+						CRC_VEC_LEN,
+						type);
+
+	/* 32-bit ethernet CRC scalar result*/
+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN1, 0);
+
+	for (i = 0; i < CRC32_VEC_LEN1; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	crc32_vec1_res = crc_calc(mp,
+						test_data,
+						CRC32_VEC_LEN1,
+						type);
+
+	/* 32-bit ethernet CRC scalar result */
+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN2, 0);
+
+	for (i = 0; i < CRC32_VEC_LEN2; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	crc32_vec2_res = crc_calc(mp,
+						test_data,
+						CRC32_VEC_LEN2,
+						type);
+
+	/* 16-bit CCITT CRC scalar result */
+	type = RTE_NET_CRC16_CCITT;
+	crc16_vec_res = crc_calc(mp,
+						crc_vec,
+						CRC_VEC_LEN,
+						type);
+
+	/* 16-bit CCITT CRC scalar result */
+	crc16_vec1_res = crc_calc(mp,
+						crc16_vec1,
+						CRC16_VEC_LEN1,
+						type);
+
+	/* 16-bit CCITT CRC scalar result*/
+	crc16_vec2_res = crc_calc(mp,
+						crc16_vec2,
+						CRC16_VEC_LEN2,
+						type);
+}
+
+static int
+test_crc(void) {
+	struct rte_mempool *pktmbuf_pool = NULL;
+	uint32_t i;
+	enum rte_net_crc_type type;
+	uint8_t *test_data;
+	uint32_t ret;
+
+	/* create pktmbuf pool */
+	pktmbuf_pool = rte_pktmbuf_pool_create("pktmbuf_pool",
+			NB_MBUF, 32, 0, MBUF_DATA_SIZE, SOCKET_ID_ANY);
+
+	if (pktmbuf_pool == NULL) {
+		printf("test_crc: cannot allocate mbuf pool!\n");
+		return -1;
+	}
+
+	/* set crc scalar mode */
+	rte_net_crc_set_alg(RTE_NET_CRC_SCALAR);
+	crc_calc_scalar(pktmbuf_pool);
+
+	/* set crc sse4.2 mode */
+	rte_net_crc_set_alg(RTE_NET_CRC_SSE42);
+
+	/* 32-bit ethernet CRC: Test 1 */
+	type = RTE_NET_CRC32_ETH;
+
+	ret = crc_calc(pktmbuf_pool,
+		crc_vec,
+		CRC_VEC_LEN,
+		type);
+	if (ret != crc32_vec_res) {
+		printf("test_crc(32-bit): test1 failed !\n");
+		return -1;
+	}
+
+	/* 32-bit ethernet CRC: Test 2 */
+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN1, 0);
+
+	for (i = 0; i < CRC32_VEC_LEN1; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	ret = crc_calc(pktmbuf_pool,
+		test_data,
+		CRC32_VEC_LEN1,
+		type);
+	if (ret != crc32_vec1_res) {
+		printf("test_crc(32-bit): test2 failed!\n");
+		return -1;
+	}
+
+	/* 32-bit ethernet CRC: Test 3 */
+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN2, 0);
+
+	for (i = 0; i < CRC32_VEC_LEN2; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	ret = crc_calc(pktmbuf_pool,
+		test_data,
+		CRC32_VEC_LEN2,
+		type);
+	if (ret != crc32_vec2_res) {
+		printf("test_crc(32-bit): test3 failed!\n");
+		return -1;
+	}
+
+	/* 16-bit CCITT CRC:  Test 4 */
+	type = RTE_NET_CRC16_CCITT;
+	ret = crc_calc(pktmbuf_pool,
+		crc_vec,
+		CRC_VEC_LEN,
+		type);
+	if (ret != crc16_vec_res) {
+		printf("test_crc (16-bit): test4 failed!\n");
+		return -1;
+	}
+
+	/* 16-bit CCITT CRC:  Test 5 */
+	ret = crc_calc(pktmbuf_pool,
+		crc16_vec1,
+		CRC16_VEC_LEN1,
+		type);
+	if (ret != crc16_vec1_res) {
+		printf("test_crc (16-bit): test5 failed!\n");
+		return -1;
+	}
+
+	/* 16-bit CCITT CRC:  Test 6 */
+	ret = crc_calc(pktmbuf_pool,
+		crc16_vec2,
+		CRC16_VEC_LEN2,
+		type);
+	if (ret != crc16_vec2_res) {
+		printf("test_crc (16-bit): test6 failed!\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+REGISTER_TEST_COMMAND(crc_autotest, test_crc);
-- 
2.5.5
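
As written, `test_crc()` only checks that the SSE4.2 path agrees with the scalar path, so a defect shared by both would still pass. One possible complement is an independent 256-entry table-driven reference, the same LUT style the cover letter describes for the fallback, checked against the published "123456789" check values. This sketch assumes the Ethernet CRC is the reflected CRC-32 (polynomial 0xEDB88320, init 0xFFFFFFFF, final inversion) and that the 16-bit variant uses the X.25 parameters (reflected polynomial 0x8408, init 0xFFFF, final inversion), which is what the `0xffff` seed and `~` in `rte_crc16_ccitt_sse42_handler()` suggest. All names are hypothetical, not DPDK APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t crc32_eth_lut[256];
static uint16_t crc16_ccitt_lut[256];

/* Build reflected (LSB-first) 256-entry tables for both polynomials. */
static void
crc_ref_init(void)
{
	uint32_t i, j;

	for (i = 0; i < 256; i++) {
		uint32_t c32 = i;
		uint16_t c16 = (uint16_t)i;

		for (j = 0; j < 8; j++) {
			c32 = (c32 & 1) ? (c32 >> 1) ^ 0xEDB88320u : c32 >> 1;
			c16 = (c16 & 1) ? (uint16_t)((c16 >> 1) ^ 0x8408u)
					: (uint16_t)(c16 >> 1);
		}
		crc32_eth_lut[i] = c32;
		crc16_ccitt_lut[i] = c16;
	}
}

/* CRC-32 (Ethernet FCS parameters): init 0xFFFFFFFF, final inversion. */
static uint32_t
crc32_eth_ref(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (len--)
		crc = (crc >> 8) ^ crc32_eth_lut[(crc ^ *data++) & 0xFF];
	return ~crc;
}

/* CRC-16 CCITT with X.25 parameters: init 0xFFFF, final inversion. */
static uint16_t
crc16_ccitt_ref(const uint8_t *data, size_t len)
{
	uint16_t crc = 0xFFFF;

	while (len--)
		crc = (uint16_t)((crc >> 8) ^
			crc16_ccitt_lut[(crc ^ *data++) & 0xFF]);
	return (uint16_t)~crc;
}

/* Standard catalogue check values for the ASCII string "123456789". */
static void
crc_ref_self_check(void)
{
	static const uint8_t check[] = "123456789";

	crc_ref_init();
	assert(crc32_eth_ref(check, 9) == 0xCBF43926u);
	assert(crc16_ccitt_ref(check, 9) == 0x906Eu);
}
```

If the library's scalar and SSE4.2 results both match these functions on the existing `crc_vec`, `crc32_vec1` and `crc16_vec*` inputs, the comparison in `test_crc()` is anchored to the standard rather than only to itself.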


* Re: [dpdk-dev] [PATCH v3 1/2] librte_net: add crc compute APIs
  2017-03-12 21:33         ` [dpdk-dev] [PATCH v3 1/2] librte_net: add crc compute APIs Jasvinder Singh
@ 2017-03-13  3:06           ` Ananyev, Konstantin
  2017-03-13  9:05             ` Singh, Jasvinder
  2017-03-20 19:29           ` [dpdk-dev] [PATCH v4 0/2] librte_net: add crc computation support Jasvinder Singh
  1 sibling, 1 reply; 69+ messages in thread
From: Ananyev, Konstantin @ 2017-03-13  3:06 UTC (permalink / raw)
  To: Singh, Jasvinder, dev; +Cc: Doherty, Declan, De Lara Guarch, Pablo

Hi Jasvinder,

> 
> APIs for selecting the architecture-specific implementation and computing
> the CRC (16-bit and 32-bit CRCs) are added. For CRC calculation, scalar
> as well as x86 intrinsic (SSE4.2) versions are implemented.
> 
> The scalar version is based on the generic Look-Up Table (LUT) algorithm,
> while the x86 intrinsic version uses carry-less multiplication for
> fast CRC computation.
> 
> Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
> ---
>  lib/librte_net/Makefile            |   3 +
>  lib/librte_net/rte_net_crc.c       | 205 ++++++++++++++++++++++
>  lib/librte_net/rte_net_crc.h       | 104 +++++++++++
>  lib/librte_net/rte_net_crc_sse.h   | 351 +++++++++++++++++++++++++++++++++++++
>  lib/librte_net/rte_net_version.map |   8 +
>  5 files changed, 671 insertions(+)
>  create mode 100644 lib/librte_net/rte_net_crc.c
>  create mode 100644 lib/librte_net/rte_net_crc.h
>  create mode 100644 lib/librte_net/rte_net_crc_sse.h
> 
> +
> +/**
> + * CRC compute API
> + *
> + * @param mbuf
> + *  Pointer to the packet mbuf for crc computation
> + * @param data_offset
> + *  Offset to the data in the mbuf
> + * @param data_len
> + *  length of the data to compute the crc on
> + * @param type
> + *  crc type
> + *
> + * @return
> + *  computed crc value
> + */
> +uint32_t
> +rte_net_crc_calc(struct rte_mbuf *mbuf,
> +	uint32_t data_offset,
> +	uint32_t data_len,
> +	enum rte_net_crc_type type);


I think it will probably be convenient to have this API not dependent on mbuf,
something like:
 
uint32_t rte_net_crc_calc(const void *buf, uint32_t data_len, enum rte_net_crc_type type);

And if we like to have an extra function that would do similar thing for mbuf,
that's fine too, but I suppose such function would have to handle multi-segment case too.
Konstantin
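
A minimal sketch of the split suggested above: a flat-buffer function plus a wrapper that walks a chain of segments. It uses a hypothetical `struct seg` in place of `struct rte_mbuf` and a bitwise CRC-32 so that it compiles on its own; every name here is illustrative. The design point is that the running CRC register is carried across segment boundaries and the final inversion is applied only once, so the chained result equals the CRC of the concatenated bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a segmented packet (NOT struct rte_mbuf). */
struct seg {
	const uint8_t *data;
	uint32_t len;
	const struct seg *next;
};

/* Raw CRC-32 register update, reflected polynomial 0xEDB88320. */
static uint32_t
crc32_update(uint32_t crc, const uint8_t *buf, uint32_t len)
{
	uint32_t i;
	int b;

	for (i = 0; i < len; i++) {
		crc ^= buf[i];
		for (b = 0; b < 8; b++)
			crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
	}
	return crc;
}

/* Buffer-based API in the spirit of the proposed signature. */
static uint32_t
crc32_buf(const void *buf, uint32_t len)
{
	return ~crc32_update(0xFFFFFFFFu, buf, len);
}

/*
 * Skip `offset` bytes, then feed `len` bytes across segment boundaries
 * into one running register and finalize once. *ok is set to 0 if the
 * chain holds fewer than offset + len bytes.
 */
static uint32_t
crc32_chain(const struct seg *s, uint32_t offset, uint32_t len, int *ok)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (; s != NULL && len > 0; s = s->next) {
		uint32_t n;

		if (offset >= s->len) {
			offset -= s->len;
			continue;
		}
		n = s->len - offset;
		if (n > len)
			n = len;
		crc = crc32_update(crc, s->data + offset, n);
		offset = 0;
		len -= n;
	}
	*ok = (len == 0);
	return ~crc;
}

static void
crc_chain_self_check(void)
{
	assert(crc32_buf("123456789", 9) == 0xCBF43926u);
}
```

For example, splitting "123456789" into segments "123", "4567" and "89" and calling `crc32_chain()` over all nine bytes gives the same value as `crc32_buf()` on the contiguous string.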



* Re: [dpdk-dev] [PATCH v3 1/2] librte_net: add crc compute APIs
  2017-03-13  3:06           ` Ananyev, Konstantin
@ 2017-03-13  9:05             ` Singh, Jasvinder
  0 siblings, 0 replies; 69+ messages in thread
From: Singh, Jasvinder @ 2017-03-13  9:05 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev; +Cc: Doherty, Declan, De Lara Guarch, Pablo

Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Monday, March 13, 2017 3:06 AM
> To: Singh, Jasvinder <jasvinder.singh@intel.com>; dev@dpdk.org
> Cc: Doherty, Declan <declan.doherty@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v3 1/2] librte_net: add crc compute APIs
> 
> Hi Jasvinder,
> 
> >
> > APIs for selecting the architecture-specific implementation and
> > computing the CRC (16-bit and 32-bit CRCs) are added. For CRC
> > calculation, scalar as well as x86 intrinsic (SSE4.2) versions are implemented.
> >
> > The scalar version is based on the generic Look-Up Table (LUT) algorithm,
> > while the x86 intrinsic version uses carry-less multiplication for fast
> > CRC computation.
> >
> > Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
> > ---
> >  lib/librte_net/Makefile            |   3 +
> >  lib/librte_net/rte_net_crc.c       | 205 ++++++++++++++++++++++
> >  lib/librte_net/rte_net_crc.h       | 104 +++++++++++
> >  lib/librte_net/rte_net_crc_sse.h   | 351 +++++++++++++++++++++++++++++++++++++
> >  lib/librte_net/rte_net_version.map |   8 +
> >  5 files changed, 671 insertions(+)
> >  create mode 100644 lib/librte_net/rte_net_crc.c
> >  create mode 100644 lib/librte_net/rte_net_crc.h
> >  create mode 100644 lib/librte_net/rte_net_crc_sse.h
> >
> > +
> > +/**
> > + * CRC compute API
> > + *
> > + * @param mbuf
> > + *  Pointer to the packet mbuf for crc computation
> > + * @param data_offset
> > + *  Offset to the data in the mbuf
> > + * @param data_len
> > + *  length of the data to compute the crc on
> > + * @param type
> > + *  crc type
> > + *
> > + * @return
> > + *  computed crc value
> > + */
> > +uint32_t
> > +rte_net_crc_calc(struct rte_mbuf *mbuf,
> > +	uint32_t data_offset,
> > +	uint32_t data_len,
> > +	enum rte_net_crc_type type);
> 
> 
> I think it will probably be convenient to have this API not dependent on
> mbuf, something like:
> 
> uint32_t rte_net_crc_calc(const void *buf, uint32_t data_len, enum
> rte_net_crc_type type);
> 
> And if we like to have an extra function that would do similar thing for mbuf,
> that's fine too, but I suppose such function would have to handle multi-
> segment case too.
> Konstantin

I will change the api as you suggested. Thanks.



* Re: [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute APIs
  2017-03-08 11:08             ` De Lara Guarch, Pablo
@ 2017-03-15 17:35               ` Thomas Monjalon
  2017-03-15 19:03                 ` Dumitrescu, Cristian
  2017-03-15 19:09                 ` Dumitrescu, Cristian
  0 siblings, 2 replies; 69+ messages in thread
From: Thomas Monjalon @ 2017-03-15 17:35 UTC (permalink / raw)
  To: De Lara Guarch, Pablo; +Cc: Singh, Jasvinder, dev, Doherty, Declan

2017-03-08 11:08, De Lara Guarch, Pablo:
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > 2017-03-02 13:03, Singh, Jasvinder:
> > > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > > 2017-02-28 12:08, Jasvinder Singh:
> > > > >  lib/librte_net/rte_net_crc.c       | 664 +++++++++++++++++++++++++++++++++++++
> > > > >  lib/librte_net/rte_net_crc.h       | 101 ++++++
> > > >
> > > > I think it should be in librte_hash.
> > > >
> > > > Please check lib/librte_hash/rte_hash_crc.h
> > >
> > > Is it good to include payload crc calculation in hash library as I see all hash
> > > related functionality there?
> > 
> > I think yes. Pablo?
> 
> I think this doesn't belong in the hash library. These new functions calculate CRC, but not as a hash function.

Can't we say that a CRC is a hash? What is a hash?
A function generating the same output bytes from given input bytes.

I think you must separate hash functions and hash table management.

> Yes, CRC can be used as hash function (in fact, it is used as such in the hash library,
> the CRC32C version, and I assume that's why it is in there), but its use is much broader
> (its main purpose is not to be a hash function, but for data error detection, for any data).

The librte_hash has several hash functions, including CRC32C and Toeplitz.

> Therefore, I would suggest either creating a separate library for this, if we want to use this as a broader use,
> or leave it in net library, if we want to focus on calculating CRC for Ethernet frames.

I don't think Toeplitz should go in librte_net.
That's why I suggest to keep every kind of hash functions in librte_hash.

> Regarding to the CRC that we have in the hash library, if we go for a separate library,
> we could move that function there, but then it would have to follow the function prototype of a hash function,
> defined in the hash library.


* Re: [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute APIs
  2017-03-15 17:35               ` Thomas Monjalon
@ 2017-03-15 19:03                 ` Dumitrescu, Cristian
  2017-03-15 20:15                   ` Thomas Monjalon
  2017-03-15 19:09                 ` Dumitrescu, Cristian
  1 sibling, 1 reply; 69+ messages in thread
From: Dumitrescu, Cristian @ 2017-03-15 19:03 UTC (permalink / raw)
  To: Thomas Monjalon, De Lara Guarch, Pablo
  Cc: Singh, Jasvinder, dev, Doherty, Declan

... <snip>

> > > > > I think it should be in librte_hash.
> > > > >
> > > > > Please check lib/librte_hash/rte_hash_crc.h
> > > >
> > > > Is it good to include payload crc calculation in hash library as I see all
> > > > hash related functionality there?
> > >
> > > I think yes. Pablo?
> >
> > I think this doesn't belong in the hash library. These new functions calculate
> > CRC, but not as a hash function.
> 
> Can't we say that a CRC is a hash? What is a hash?
> A function generating the same output bytes from given input bytes.
> 
> I think you must separate hash functions and hash table management.
> 

The fact that CRC32 instruction is opportunistically used to compute a hash digest/signature for load balancing (affinity-based) or hash tables (flow tables, ARP cache, etc) does not mean that all the code that uses CRC32 instruction should be placed in librte_hash.

The purpose of the hash functions in librte_hash is to compute a digest/signature for a given array of bytes (the key) as fast as possible. Any hash function that produces a hash signature with good uniform distribution in a small amount of cycles belongs here, including those opportunistically using specialized CPU instructions such as CRC32 (or XOR, AESNI, etc).

The code proposed in this patch is used to compute packet fields for various protocols that have a CRC field, such as FCS of Ethernet frames, etc., according to the relevant standard (IEEE 802, others). It is a utility to be used for implementing protocol-level functionality for various protocols from the network stack, similar to e.g. IP or UDP checksum. If we were to add an IP or UDP checksum calculator, would you put it in librte_hash?

The code from this patch is never going to be used to compute a fast signature/digest. Typically this CRC is computed over the entire frame/packet rather than just selected fields from the packet representing the application-specific flow key. Also note that the signature produced by CRC32 hash function from librte_hash is actually not the correct Cyclic Redundancy Check of that array of bytes (or, for math guys, of the associated polynomial), it is just a partial/intermediate value.

Therefore, I suggest placing this code into: librte_ether (given that it can be used to compute Ethernet FCS), or librte_net, or librte_crc. Definitely not in librte_hash.

Regards,
Cristian
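
The last point above can be illustrated with CRC-32C (Castagnoli), the polynomial implemented by the SSE4.2 `crc32` instruction. This sketch assumes a hash-style caller passes its own seed and uses the raw CRC register as the signature, while the standardized CRC-32C fixes the seed at 0xFFFFFFFF and inverts the result; the exact behaviour of the DPDK functions should be checked in `rte_hash_crc.h`. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise CRC-32C register update, reflected Castagnoli polynomial. */
static uint32_t
crc32c_update(uint32_t crc, const uint8_t *buf, uint32_t len)
{
	uint32_t i;
	int b;

	for (i = 0; i < len; i++) {
		crc ^= buf[i];
		for (b = 0; b < 8; b++)
			crc = (crc & 1) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
	}
	return crc;
}

/* Hash-style use (assumed): caller-chosen seed, raw register returned. */
static uint32_t
hash_style_crc32c(const uint8_t *key, uint32_t len, uint32_t seed)
{
	return crc32c_update(seed, key, len);
}

/* Standard CRC-32C: fixed init 0xFFFFFFFF and a final inversion. */
static uint32_t
crc32c_std(const uint8_t *buf, uint32_t len)
{
	return ~crc32c_update(0xFFFFFFFFu, buf, len);
}

static void
crc32c_self_check(void)
{
	/* Published CRC-32C check value for "123456789". */
	assert(crc32c_std((const uint8_t *)"123456789", 9) == 0xE3069283u);
}
```

With seed 0xFFFFFFFF the hash-style value is just the bitwise complement of the standard CRC-32C (0x1CF96D7C for "123456789"), and neither equals the Ethernet CRC-32 of the same bytes (0xCBF43926), since the polynomials differ.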


* Re: [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute APIs
  2017-03-15 17:35               ` Thomas Monjalon
  2017-03-15 19:03                 ` Dumitrescu, Cristian
@ 2017-03-15 19:09                 ` Dumitrescu, Cristian
  1 sibling, 0 replies; 69+ messages in thread
From: Dumitrescu, Cristian @ 2017-03-15 19:09 UTC (permalink / raw)
  To: Thomas Monjalon, De Lara Guarch, Pablo
  Cc: Singh, Jasvinder, dev, Doherty, Declan



> -----Original Message-----
> From: Dumitrescu, Cristian
> Sent: Wednesday, March 15, 2017 7:04 PM
> To: 'Thomas Monjalon' <thomas.monjalon@6wind.com>; De Lara Guarch,
> Pablo <pablo.de.lara.guarch@intel.com>
> Cc: Singh, Jasvinder <jasvinder.singh@intel.com>; dev@dpdk.org; Doherty,
> Declan <declan.doherty@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute
> APIs
> 
> ... <snip>
> 
> > > > > > I think it should be in librte_hash.
> > > > > >
> > > > > > Please check lib/librte_hash/rte_hash_crc.h
> > > > >
> > > > > Is it good to include payload crc calculation in hash library as I see all
> > > > > hash related functionality there?
> > > >
> > > > I think yes. Pablo?
> > >
> > > I think this doesn't belong in the hash library. These new functions
> > > calculate CRC, but not as a hash function.
> >
> > Can't we say that a CRC is a hash? What is a hash?
> > A function generating the same output bytes from given input bytes.
> >
> > I think you must separate hash functions and hash table management.
> >
> 
> The fact that CRC32 instruction is opportunistically used to compute a hash
> digest/signature for load balancing (affinity-based) or hash tables (flow
> tables, ARP cache, etc) does not mean that all the code that uses CRC32
> instruction should be placed in librte_hash.
> 
> The purpose of the hash functions in librte_hash is to compute a
> digest/signature for a given array of bytes (the key) as fast as possible. Any
> hash function that produces a hash signature with good uniform distribution
> in a small amount of cycles belongs here, including those opportunistically
> using specialized CPU instructions such as CRC32 (or XOR, AESNI, etc).
> 
> The code proposed in this patch is used to compute packet fields for various
> protocols that have a CRC field, such as FCS of Ethernet frames, etc.
> according to the relevant standard (IEEE 802, others). It is a utility to be used
> for implementing protocol-level functionality for various protocols from the
> network stack, similar to e.g. IP or UDP checksum. If we were to add an IP or
> UDP checksum calculator, would you put it in librte_hash?
> 
> The code from this patch is never going to be used to compute a fast
> signature/digest. Typically this CRC is computed over the entire frame/packet
> rather than just selected fields from the packet representing the application-
> specific flow key. Also note that the signature produced by CRC32 hash
> function from librte_hash is actually not the correct Cyclic Redundancy Check
> of that array of bytes (or, for math guys, of the associated polynomial), it is
> just a partial/intermediate value.
> 
> Therefore, I suggest placing this code into: librte_ether (given that it can be
> used to compute Ethernet FCS), or librte_net, or librte_crc. Definitely not in
> librte_hash.
> 

Sorry, my bad on librte_ether: rte_ether.h, where the Ethernet frame is defined, is located in librte_net, not librte_ether, so librte_ether should not be on the above list. Therefore, my suggestion is: librte_net or a new library, librte_crc.

> Regards,
> Cristian


* Re: [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute APIs
  2017-03-15 19:03                 ` Dumitrescu, Cristian
@ 2017-03-15 20:15                   ` Thomas Monjalon
  2017-03-15 21:11                     ` Dumitrescu, Cristian
  0 siblings, 1 reply; 69+ messages in thread
From: Thomas Monjalon @ 2017-03-15 20:15 UTC (permalink / raw)
  To: Dumitrescu, Cristian
  Cc: De Lara Guarch, Pablo, Singh, Jasvinder, dev, Doherty, Declan

2017-03-15 19:03, Dumitrescu, Cristian:
> ... <snip>
> 
> > > > > > I think it should be in librte_hash.
> > > > > >
> > > > > > Please check lib/librte_hash/rte_hash_crc.h
> > > > >
> > > > > Is it good to include payload crc calculation in hash library as I see all
> > > > > hash related functionality there?
> > > >
> > > > I think yes. Pablo?
> > >
> > > I think this doesn't belong in the hash library. These new functions calculate
> > > CRC, but not as a hash function.
> > 
> > Can't we say that a CRC is a hash? What is a hash?
> > A function generating the same output bytes from given input bytes.
> > 
> > I think you must separate hash functions and hash table management.
> > 
> 
> The fact that CRC32 instruction is opportunistically used to compute a hash digest/signature for load balancing (affinity-based) or hash tables (flow tables, ARP cache, etc) does not mean that all the code that uses CRC32 instruction should be placed in librte_hash.
> 
> The purpose of the hash functions in librte_hash is to compute a digest/signature for a given array of bytes (the key) as fast as possible. Any hash function that produces a hash signature with good uniform distribution in a small amount of cycles belongs here, including those opportunistically using specialized CPU instructions such as CRC32 (or XOR, AESNI, etc).
> 
> The code proposed in this patch is used to compute packet fields for various protocols that have a CRC field, such as FCS of Ethernet frames, etc., according to the relevant standard (IEEE 802, others). It is a utility to be used for implementing protocol-level functionality for various protocols from the network stack, similar to e.g. IP or UDP checksum. If we were to add an IP or UDP checksum calculator, would you put it in librte_hash?
> 
> The code from this patch is never going to be used to compute a fast signature/digest. Typically this CRC is computed over the entire frame/packet rather than just selected fields from the packet representing the application-specific flow key. Also note that the signature produced by CRC32 hash function from librte_hash is actually not the correct Cyclic Redundancy Check of that array of bytes (or, for math guys, of the associated polynomial), it is just a partial/intermediate value.
> 
> Therefore, I suggest placing this code into: librte_ether (given that it can be used to compute Ethernet FCS), or librte_net, or librte_crc. Definitely not in librte_hash.

I agree with you Cristian that the protocol layer must be in librte_net.
But I think most of this patch is not protocol level.
I think you agree with me that the code computing a
"digest/signature for a given array of bytes" must go to librte_hash?


* Re: [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute APIs
  2017-03-15 20:15                   ` Thomas Monjalon
@ 2017-03-15 21:11                     ` Dumitrescu, Cristian
  0 siblings, 0 replies; 69+ messages in thread
From: Dumitrescu, Cristian @ 2017-03-15 21:11 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: De Lara Guarch, Pablo, Singh, Jasvinder, dev, Doherty, Declan



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Wednesday, March 15, 2017 8:16 PM
> To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; Singh,
> Jasvinder <jasvinder.singh@intel.com>; dev@dpdk.org; Doherty, Declan
> <declan.doherty@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute
> APIs
> 
> 2017-03-15 19:03, Dumitrescu, Cristian:
> > ... <snip>
> >
> > > > > > > I think it should be in librte_hash.
> > > > > > >
> > > > > > > Please check lib/librte_hash/rte_hash_crc.h
> > > > > >
> > > > > > Is it good to include payload crc calculation in hash library as I see all
> > > > > > hash related functionality there?
> > > > >
> > > > > I think yes. Pablo?
> > > >
> > > > I think this doesn't belong in the hash library. These new functions
> calculate
> > > CRC, but not as a hash function.
> > >
> > > Can't we say that a CRC is a hash? What is a hash?
> > > A function generating the same output bytes from given input bytes.
> > >
> > > I think you must separate hash functions and hash table management.
> > >
> >
> > The fact that CRC32 instruction is opportunistically used to compute a hash
> digest/signature for load balancing (affinity-based) or hash tables (flow
> tables, ARP cache, etc) does not mean that all the code that uses CRC32
> instruction should be placed in librte_hash.
> >
> > The purpose of the hash functions in librte_hash is to compute a
> digest/signature for a given array of bytes (the key) as fast as possible. Any
> hash function that produces a hash signature with good uniform distribution
> in a small amount of cycles belongs here, including those opportunistically
> using specialized CPU instructions such as CRC32 (or XOR, AESNI, etc).
> >
> > The code proposed in this patch is used to compute packet fields for
> various protocols that have a CRC field, such as FCS of Ethernet frames, etc.
> according to the relevant standard (IEEE 802, others). It is a utility to be used
> for implementing protocol-level functionality for various protocols from the
> network stack, similar to e.g. IP or UDP checksum. If we were to add an IP or
> UDP checksum calculator, would you put it in librte_hash?
> >
> > The code from this patch is never going to be used to compute a fast
> signature/digest. Typically this CRC is computed over the entire frame/packet
> rather than just selected fields from the packet representing the application-
> specific flow key. Also note that the signature produced by CRC32 hash
> function from librte_hash is actually not the correct Cyclic Redundancy Check
> of that array of bytes (or, for math guys, of the associated polynomial), it is
> just a partial/intermediate value.
> >
> > Therefore, I suggest placing this code into: librte_ether (given that it can be
> used to compute Ethernet FCS), or librte_net, or librte_crc. Definitely not in
> librte_hash.
> 
> I agree with you Cristian that the protocol layer must be in librte_net.
> But I think most of this patch is not protocol level.

Nope, this is the true CRC computed over the entire protocol header and/or payload, similar to the IPv4/UDP/TCP checksum. The only reason for computing it is that the protocol specs require it for data integrity checks; it has nothing to do with our signature for load balancing/hash tables.

More details on covered protocols from a reliable source :) [1]:
	CRC-32 (polynomial = 0x04C11DB7): used for HDLC, ANSI X3.66, ITU-T V.42, Ethernet, Serial ATA, MPEG-2, PKZIP, Gzip, Bzip2, PNG, many others
	CRC-16-CCITT (polynomial = 0x1021): used for X.25, V.41, HDLC FCS, XMODEM, Bluetooth, PACTOR, SD, DigRF, many others

[1] Wikipedia: https://en.wikipedia.org/wiki/Cyclic_redundancy_check#Standards_and_common_use

> I think you agree with me that the code computing a
> "digest/signature for a given array of bytes" must go to librte_hash?

Yes, but this is the true protocol-level CRC, not a digest/signature.

Non-cryptographic hash digest/signature:
-computed over selected packet fields (flow key) for load balancing (affinity scheme) or hash table; key size typical range: 8 .. 192 bytes
-required by application requirements (such as flow packet ordering preservation), not by protocol standards
-has uniform distribution
-requires small amount of cycles to compute
-used as meta-data, not written in the packet
-can be opportunistically generated using specialized CPU instructions, such as CRC32 (or XOR, or AESNI); in this case, it is a partial/intermediate value, not the correct CRC of the array of bytes

Protocol CRC:
-computed over the entire packet header and/or payload
-protocol overhead (required by standards)
-computational cost is typically high and proportional to the packet length; typical packet length range: 64 .. 1514 .. 9K bytes
-written in the packet (by the application SW or by the HW)
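To make the "partial/intermediate value" point concrete, here is a minimal bitwise sketch (not code from the patch or from librte_hash; the function names are hypothetical). crc_register() returns the raw running register for a caller-chosen seed with no final inversion, which is the kind of value a hash function can use as a signature. crc32_eth() applies the initial value and final XOR that the Ethernet CRC-32 definition requires, using the reflected form 0xEDB88320 of polynomial 0x04C11DB7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected (LSB-first) form of the CRC-32 polynomial 0x04C11DB7. */
#define CRC32_POLY_REFLECTED 0xEDB88320u

/* Raw CRC register: caller-chosen seed, no final inversion.
 * Usable as a hash signature, but not the standard Ethernet CRC. */
static uint32_t crc_register(const uint8_t *p, size_t len, uint32_t seed)
{
	uint32_t crc = seed;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc & 1u) ? (crc >> 1) ^ CRC32_POLY_REFLECTED
					 : crc >> 1;
	}
	return crc;
}

/* Protocol-level CRC-32 (Ethernet FCS): init 0xFFFFFFFF, final XOR 0xFFFFFFFF. */
static uint32_t crc32_eth(const void *data, size_t len)
{
	return ~crc_register(data, len, 0xFFFFFFFFu);
}
```

For the ASCII string "123456789", crc32_eth() yields the standard CRC-32 check value 0xCBF43926, while the raw register with the same seed is its bitwise complement, 0x340BC6D9.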


* [dpdk-dev] [PATCH v4 0/2] librte_net: add crc computation support
  2017-03-12 21:33         ` [dpdk-dev] [PATCH v3 1/2] librte_net: add crc compute APIs Jasvinder Singh
  2017-03-13  3:06           ` Ananyev, Konstantin
@ 2017-03-20 19:29           ` Jasvinder Singh
  2017-03-20 19:29             ` [dpdk-dev] [PATCH v4 1/2] librte_net: add crc compute APIs Jasvinder Singh
  2017-03-20 19:29             ` [dpdk-dev] [PATCH v4 2/2] app/test: " Jasvinder Singh
  1 sibling, 2 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-03-20 19:29 UTC (permalink / raw)
  To: dev; +Cc: declan.doherty, pablo.de.lara.guarch

In some applications, CRC (Cyclic Redundancy Check) needs to be computed
or updated during packet processing operations. This patchset adds a
software implementation of some common standard CRCs (32-bit Ethernet
CRC as per Ethernet/[ISO/IEC 8802-3] and 16-bit CCITT-CRC [ITU-T X.25]).
Two versions of each of the 32-bit and 16-bit CRC calculations are proposed.

The first version provides fast and efficient CRC generation on IA
processors by using the carry-less multiplication instruction PCLMULQDQ
(i.e. SSE4.2 intrinsics). In this implementation, a parallelized folding
approach is used to first reduce an arbitrary-length buffer to a small
fixed-size buffer (16 bytes) with the help of precomputed constants.
The resulting single 16-byte chunk is further reduced by the Barrett
reduction method to generate the final CRC value. For more details on the
implementation, see reference [1].
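The folding identity can be illustrated at toy scale. The sketch below is not the patch's code. It uses 32-bit words, the non-reflected CRC-32 polynomial, and a software carry-less multiply standing in for PCLMULQDQ, and all function names are hypothetical. For a two-word message w1:w2, the augmented message is w1*x^64 + w2*x^32. Because K = x^64 mod P(x) is a constant, w1 can be folded onto w2 with one carry-less multiplication instead of being divided bit by bit. The patch applies the same idea to 128-bit blocks using its precomputed rk constants.

```c
#include <assert.h>
#include <stdint.h>

/* P(x) = x^32 + POLY: the CRC-32 generator in MSB-first form. */
#define POLY 0x04C11DB7u

/* Software carry-less (GF(2)) multiply: 32 x 32 -> 63-bit product. */
static uint64_t clmul32(uint32_t a, uint32_t b)
{
	uint64_t r = 0;

	for (int i = 0; i < 32; i++)
		if (b & (1u << i))
			r ^= (uint64_t)a << i;
	return r;
}

/* Reduce a polynomial of degree < 64 modulo P(x) by long division. */
static uint32_t mod_p(uint64_t v)
{
	for (int bit = 63; bit >= 32; bit--)
		if (v & (1ULL << bit))
			v ^= ((1ULL << 32) | POLY) << (bit - 32);
	return (uint32_t)v;
}

/* Reference: M(x) * x^32 mod P(x), one bit at a time (init 0, no final XOR). */
static uint32_t crc_bitwise(const uint32_t *w, int n)
{
	uint32_t crc = 0;

	for (int i = 0; i < n; i++)
		for (int bit = 31; bit >= 0; bit--) {
			uint32_t top = (crc >> 31) ^ ((w[i] >> bit) & 1u);

			crc <<= 1;
			if (top)
				crc ^= POLY;
		}
	return crc;
}

/* Same remainder by folding: w1*x^64 == clmul(w1, K) (mod P),
 * so XOR that product onto w2*x^32 and reduce once. */
static uint32_t crc_folded(uint32_t w1, uint32_t w2)
{
	/* K = x^64 mod P(x); a real implementation precomputes it. */
	const uint32_t k = mod_p((uint64_t)POLY << 32);

	return mod_p(clmul32(w1, k) ^ ((uint64_t)w2 << 32));
}
```

Both paths produce the same 32-bit remainder. The folded path replaces 32 bit-serial division steps for w1 with one multiplication by a precomputed constant, which is what makes the PCLMULQDQ approach fast on long buffers.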

The second version is a fallback solution that supports CRC generation
without needing any specific support from the CPU (for example,
SSE4.2 intrinsics). It is based on the generic Look-Up Table (LUT)
algorithm that uses a precomputed 256-element table, as explained in reference [2].
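A minimal, self-contained sketch of the table-driven approach follows. It assumes the reflected (LSB-first) bit order used by both CRCs here: one 256-entry table per polynomial, then one lookup and one shift per input byte. The per-byte update mirrors crc32_eth_calc_lut() from the patch. Unlike the patch, the tables are built directly from the bit-reversed polynomials (0xEDB88320 and 0x8408). All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-reversed generator polynomials for LSB-first processing. */
#define CRC32_ETH_POLY_REFLECTED   0xEDB88320u /* 0x04C11DB7 reversed */
#define CRC16_CCITT_POLY_REFLECTED 0x8408u     /* 0x1021 reversed */

static uint32_t crc32_lut[256];
static uint32_t crc16_lut[256];

/* Entry i = register after shifting byte value i through 8 division steps. */
static void build_lut(uint32_t poly, uint32_t lut[256])
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t crc = i;

		for (int j = 0; j < 8; j++)
			crc = (crc & 1u) ? (crc >> 1) ^ poly : crc >> 1;
		lut[i] = crc;
	}
}

static void crc_tables_init(void)
{
	build_lut(CRC32_ETH_POLY_REFLECTED, crc32_lut);
	build_lut(CRC16_CCITT_POLY_REFLECTED, crc16_lut);
}

/* One table lookup per byte, same update rule as the patch's LUT loop. */
static uint32_t crc_lut_update(const uint8_t *p, size_t len, uint32_t crc,
			       const uint32_t *lut)
{
	while (len--)
		crc = lut[(crc ^ *p++) & 0xFFu] ^ (crc >> 8);
	return crc;
}

/* CRC-32 as used for the Ethernet FCS. */
static uint32_t crc32_eth(const void *data, size_t len)
{
	return ~crc_lut_update(data, len, 0xFFFFFFFFu, crc32_lut);
}

/* 16-bit CCITT CRC with init 0xFFFF and final inversion (X.25 style). */
static uint16_t crc16_ccitt(const void *data, size_t len)
{
	return (uint16_t)~crc_lut_update(data, len, 0xFFFFu, crc16_lut);
}
```

After crc_tables_init(), the check values for the ASCII string "123456789" are 0xCBF43926 for the 32-bit CRC and 0x906E for the 16-bit CRC.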

During initialisation, all the data structures required for CRC computation
are initialised. Also, either the x86-specific CRC implementation (if
supported by the platform) or the scalar version is enabled.

The following APIs have been added:

(i)  rte_net_crc_set_alg()
(ii) rte_net_crc_calc()

The first API (i) allows the user to select a specific CRC implementation
at run time, while the second API (ii) is used for computing the 16-bit and
32-bit CRCs.
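The run-time selection can be pictured as a table of function pointers that the set-algorithm call swaps and the calc call indexes by CRC type. The sketch below is a self-contained mock of that design, not the DPDK API. The enums and handlers are hypothetical stand-ins. The "fast" table reuses the bitwise handlers so the example runs on any CPU. cpu_has_fast_path stands in for a real CPU feature check such as rte_cpu_get_flag_enabled().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for rte_net_crc_type / rte_net_crc_alg. */
enum crc_type { CRC16_CCITT = 0, CRC32_ETH, CRC_TYPE_MAX };
enum crc_alg { CRC_ALG_SCALAR = 0, CRC_ALG_FAST };

typedef uint32_t (*crc_handler)(const uint8_t *data, uint32_t len);

/* Reflected bitwise CRC shared by both handlers below. */
static uint32_t crc_reflected(const uint8_t *p, uint32_t len,
			      uint32_t crc, uint32_t poly)
{
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc & 1u) ? (crc >> 1) ^ poly : crc >> 1;
	}
	return crc;
}

static uint32_t crc16_ccitt_scalar(const uint8_t *p, uint32_t len)
{
	return (uint16_t)~crc_reflected(p, len, 0xFFFFu, 0x8408u);
}

static uint32_t crc32_eth_scalar(const uint8_t *p, uint32_t len)
{
	return ~crc_reflected(p, len, 0xFFFFFFFFu, 0xEDB88320u);
}

static const crc_handler handlers_scalar[CRC_TYPE_MAX] = {
	[CRC16_CCITT] = crc16_ccitt_scalar,
	[CRC32_ETH] = crc32_eth_scalar,
};

/* A real build would point these at SIMD handlers. */
static const crc_handler handlers_fast[CRC_TYPE_MAX] = {
	[CRC16_CCITT] = crc16_ccitt_scalar,
	[CRC32_ETH] = crc32_eth_scalar,
};

static const crc_handler *handlers = handlers_scalar;

/* Select the handler table, falling back to scalar when the
 * requested path is unavailable on this CPU. */
static void crc_set_alg(enum crc_alg alg, int cpu_has_fast_path)
{
	if (alg == CRC_ALG_FAST && cpu_has_fast_path)
		handlers = handlers_fast;
	else
		handlers = handlers_scalar;
}

static uint32_t crc_calc(const void *data, uint32_t len, enum crc_type type)
{
	return handlers[type](data, len);
}
```

An explicit else-branch keeps the fallback independent of any preprocessor-guarded switch fall-through. Callers never see which table is active, because both tables expose the same handler signature.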

References:
[1] Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
[2] A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS
http://www.ross.net/crc/download/crc_v3.txt

v4 changes:
- change CRC compute API parameters to make it more generic
- change the unit test to accommodate the CRC compute API change

v3 changes:
- separate the x86 specific implementation into new file
- improve the unit test

v2 changes:
- fix build errors for target i686-native-linuxapp-gcc
- fix checkpatch warnings

Notes:
- The build fails with clang versions earlier than 3.7.0 due to
  missing intrinsics. Refer to the DPDK known issues section for details.

Jasvinder Singh (2):
  librte_net: add crc compute APIs
  app/test: add unit test for CRC computation

 app/test/Makefile                  |   2 +
 app/test/test_crc.c                | 223 ++++++++++++++++++++++++
 lib/librte_net/Makefile            |   3 +
 lib/librte_net/rte_net_crc.c       | 196 +++++++++++++++++++++
 lib/librte_net/rte_net_crc.h       | 101 +++++++++++
 lib/librte_net/rte_net_crc_sse.h   | 345 +++++++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_version.map |   8 +
 7 files changed, 878 insertions(+)
 create mode 100644 app/test/test_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h
 create mode 100644 lib/librte_net/rte_net_crc_sse.h

-- 
2.5.5


* [dpdk-dev] [PATCH v4 1/2] librte_net: add crc compute APIs
  2017-03-20 19:29           ` [dpdk-dev] [PATCH v4 0/2] librte_net: add crc computation support Jasvinder Singh
@ 2017-03-20 19:29             ` Jasvinder Singh
  2017-03-21 14:45               ` [dpdk-dev] [PATCH v5 0/2] librte_net: add crc computation support Jasvinder Singh
  2017-03-20 19:29             ` [dpdk-dev] [PATCH v4 2/2] app/test: " Jasvinder Singh
  1 sibling, 1 reply; 69+ messages in thread
From: Jasvinder Singh @ 2017-03-20 19:29 UTC (permalink / raw)
  To: dev; +Cc: declan.doherty, pablo.de.lara.guarch

APIs for selecting the architecture-specific implementation and computing
the CRC (16-bit and 32-bit CRCs) are added. For CRC calculation, scalar
as well as x86 intrinsic (SSE4.2) versions are implemented.

The scalar version is based on the generic Look-Up Table (LUT) algorithm,
while the x86 intrinsic version uses carry-less multiplication for
fast CRC computation.
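The scalar code builds its reflected tables indirectly. crc32_eth_init_lut() bit-reverses the index, runs MSB-first division with the normal polynomial, and bit-reverses the result. For CRC-16 it shifts 0x1021 into the upper half of the 32-bit register. The sketch below, with hypothetical function names, checks that this produces the same entries as building the table directly with the bit-reversed polynomial. The shifts use unsigned constants (1u << i), since shifting a signed 1 into bit 31 is undefined behaviour in C.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the order of the 32 bits in v. */
static uint32_t reflect32(uint32_t v)
{
	uint32_t r = 0;

	for (int i = 0; i < 32; i++)
		if (v & (1u << i))
			r |= 1u << (31 - i);
	return r;
}

/* Patch-style entry: reflect, MSB-first division, reflect back. */
static uint32_t entry_via_reflection(uint32_t i, uint32_t poly_msb)
{
	uint32_t crc = reflect32(i);

	for (int j = 0; j < 8; j++)
		crc = (crc & 0x80000000u) ? (crc << 1) ^ poly_msb : crc << 1;
	return reflect32(crc);
}

/* Direct entry: LSB-first division with the bit-reversed polynomial. */
static uint32_t entry_direct(uint32_t i, uint32_t poly_lsb)
{
	uint32_t crc = i;

	for (int j = 0; j < 8; j++)
		crc = (crc & 1u) ? (crc >> 1) ^ poly_lsb : crc >> 1;
	return crc;
}

/* Return 1 if both constructions agree on all 256 entries. */
static int tables_match(uint32_t poly_msb, uint32_t poly_lsb)
{
	for (uint32_t i = 0; i < 256; i++)
		if (entry_via_reflection(i, poly_msb) != entry_direct(i, poly_lsb))
			return 0;
	return 1;
}
```

Both constructions are equivalent, so the choice is one of convenience. The direct form avoids the two reflections per entry, though the table is built only once at start-up either way.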

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
---
 lib/librte_net/Makefile            |   3 +
 lib/librte_net/rte_net_crc.c       | 196 +++++++++++++++++++++
 lib/librte_net/rte_net_crc.h       | 101 +++++++++++
 lib/librte_net/rte_net_crc_sse.h   | 345 +++++++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_version.map |   8 +
 5 files changed, 653 insertions(+)
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h
 create mode 100644 lib/librte_net/rte_net_crc_sse.h

diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
index 20cf664..39ff1cc 100644
--- a/lib/librte_net/Makefile
+++ b/lib/librte_net/Makefile
@@ -39,11 +39,14 @@ EXPORT_MAP := rte_net_version.map
 LIBABIVER := 1
 
 SRCS-$(CONFIG_RTE_LIBRTE_NET) := rte_net.c
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += rte_net_crc.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc_sse.h
 
 DEPDIRS-$(CONFIG_RTE_LIBRTE_NET) += lib/librte_eal lib/librte_mbuf
 
diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
new file mode 100644
index 0000000..89edd80
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.c
@@ -0,0 +1,196 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_net_crc.h>
+#include <stddef.h>
+
+/** crc tables */
+static uint32_t crc32_eth_lut[256];
+static uint32_t crc16_ccitt_lut[256];
+
+static uint32_t rte_crc16_ccitt_handler(const uint8_t *data,
+	uint32_t data_len);
+
+static uint32_t
+rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len);
+
+typedef uint32_t
+(*rte_net_crc_handler)(const uint8_t *data, uint32_t data_len);
+
+static rte_net_crc_handler *handlers;
+
+static rte_net_crc_handler handlers_scalar[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
+};
+
+#if defined(RTE_ARCH_X86_64) || defined(RTE_CPU_FALGS_SSE_4_2)
+static rte_net_crc_handler handlers_sse42[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler,
+};
+#endif
+
+/**
+ * Reflect the bits about the middle
+ *
+ * @param val value to be reflected
+ *
+ * @return reflected value
+ */
+static uint32_t
+reflect_32bits(const uint32_t val)
+{
+	uint32_t i, res = 0;
+
+	for (i = 0; i < 32; i++)
+		if ((val & (1U << i)) != 0)
+			res |= 1U << (31 - i);
+
+	return res;
+}
+
+static void
+crc32_eth_init_lut(const uint32_t poly,
+	uint32_t *lut)
+{
+	uint_fast32_t i, j;
+
+	for (i = 0; i < 256; i++) {
+		uint_fast32_t crc = reflect_32bits(i);
+
+		for (j = 0; j < 8; j++) {
+			if (crc & 0x80000000L)
+				crc = (crc << 1) ^ poly;
+			else
+				crc <<= 1;
+		}
+		lut[i] = reflect_32bits(crc);
+	}
+}
+
+static inline __attribute__((always_inline)) uint32_t
+crc32_eth_calc_lut(const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const uint32_t *lut)
+{
+	while (data_len--)
+		crc = lut[(crc ^ *data++) & 0xffL] ^ (crc >> 8);
+
+	return crc;
+}
+
+static void
+rte_net_crc_scalar_init(void)
+{
+	/** 32-bit crc init */
+	crc32_eth_init_lut(CRC32_ETH_POLYNOMIAL, crc32_eth_lut);
+
+	/** 16-bit CRC init */
+	crc32_eth_init_lut(CRC16_CCITT_POLYNOMIAL << 16, crc16_ccitt_lut);
+
+}
+
+static inline uint32_t
+rte_crc16_ccitt_handler(const uint8_t *data, uint32_t data_len)
+{
+	return (uint16_t)~crc32_eth_calc_lut(data,
+		data_len,
+		0xffff,
+		crc16_ccitt_lut);
+}
+
+static inline uint32_t
+rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
+{
+	return ~crc32_eth_calc_lut(data,
+		data_len,
+		0xffffffffUL,
+		crc32_eth_lut);
+}
+
+void
+rte_net_crc_set_alg(enum rte_net_crc_alg alg)
+{
+	switch (alg) {
+
+	case RTE_NET_CRC_SSE42:
+#ifdef RTE_ARCH_X86_64
+		if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2))
+			alg = RTE_NET_CRC_SCALAR;
+		else {
+			handlers = handlers_sse42;
+			break;
+		}
+#endif
+	case RTE_NET_CRC_SCALAR:
+	default:
+		handlers = handlers_scalar;
+		break;
+	}
+}
+
+uint32_t
+rte_net_crc_calc(const void *data,
+	uint32_t data_len,
+	enum rte_net_crc_type type)
+{
+	uint32_t ret;
+	rte_net_crc_handler f_handle;
+
+	f_handle = handlers[type];
+	ret = f_handle((const uint8_t *) data, data_len);
+
+	return ret;
+}
+
+/*
+ * Select highest available crc algorithm as default one.
+ */
+static inline void __attribute__((constructor))
+rte_net_crc_init(void)
+{
+	enum rte_net_crc_alg alg = RTE_NET_CRC_SCALAR;
+
+	rte_net_crc_scalar_init();
+
+#ifdef RTE_ARCH_X86_64
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2)) {
+		alg = RTE_NET_CRC_SSE42;
+		rte_net_crc_sse42_init();
+	}
+#endif
+
+	rte_net_crc_set_alg(alg);
+}
diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
new file mode 100644
index 0000000..f8c9075
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.h
@@ -0,0 +1,101 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_NET_CRC_H_
+#define _RTE_NET_CRC_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+#include <rte_mbuf.h>
+
+/** CRC polynomials */
+#define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
+#define CRC16_CCITT_POLYNOMIAL 0x1021U
+
+/** CRC types */
+enum rte_net_crc_type {
+	RTE_NET_CRC16_CCITT = 0,
+	RTE_NET_CRC32_ETH,
+	RTE_NET_CRC_REQS
+};
+
+/** CRC compute algorithm */
+enum rte_net_crc_alg {
+	RTE_NET_CRC_SCALAR = 0,
+	RTE_NET_CRC_SSE42,
+};
+
+/**
+ *  This API sets the CRC computation algorithm (i.e. scalar version,
+ *  x86 64-bit SSE4.2 intrinsic version, etc.) and the internal data
+ *  structure.
+ *
+ * @param alg
+ *   - RTE_NET_CRC_SCALAR
+ *   - RTE_NET_CRC_SSE42 (Use 64-bit SSE4.2 intrinsic)
+ */
+void
+rte_net_crc_set_alg(enum rte_net_crc_alg alg);
+
+/**
+ * CRC compute API
+ *
+ * @param data
+ *  Pointer to the packet data for crc computation
+ * @param data_len
+ *  Data length for crc computation
+ * @param type
+ *  Crc type (enum rte_net_crc_type)
+ *
+ * @return
+ *  crc value
+ */
+uint32_t
+rte_net_crc_calc(const void *data,
+	uint32_t data_len,
+	enum rte_net_crc_type type);
+
+#if defined(RTE_ARCH_X86_64) || defined(RTE_CPU_FALGS_SSE_4_2)
+#include <rte_net_crc_sse.h>
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+
+#endif /* _RTE_NET_CRC_H_ */
diff --git a/lib/librte_net/rte_net_crc_sse.h b/lib/librte_net/rte_net_crc_sse.h
new file mode 100644
index 0000000..1339201
--- /dev/null
+++ b/lib/librte_net/rte_net_crc_sse.h
@@ -0,0 +1,345 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_NET_CRC_SSE_H_
+#define _RTE_NET_CRC_SSE_H_
+
+#include <cpuid.h>
+#include <rte_net_crc.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** PCLMULQDQ CRC computation context structure */
+struct crc_pclmulqdq_ctx {
+	__m128i rk1_rk2;
+	__m128i rk5_rk6;
+	__m128i rk7_rk8;
+};
+
+static struct crc_pclmulqdq_ctx crc32_eth_pclmulqdq __rte_aligned(16);
+static struct crc_pclmulqdq_ctx crc16_ccitt_pclmulqdq __rte_aligned(16);
+/**
+ * @brief Performs one folding round
+ *
+ * Logically function operates as follows:
+ *     DATA = READ_NEXT_16BYTES();
+ *     F1 = LSB8(FOLD)
+ *     F2 = MSB8(FOLD)
+ *     T1 = CLMUL(F1, RK1)
+ *     T2 = CLMUL(F2, RK2)
+ *     FOLD = XOR(T1, T2, DATA)
+ *
+ * @param data_block 16 byte data block
+ * @param precomp precomputed rk1 and rk2 constants
+ * @param fold running 16 byte folded data
+ *
+ * @return New 16 byte folded data
+ */
+static inline __attribute__((always_inline)) __m128i
+crcr32_folding_round(const __m128i data_block,
+		const __m128i precomp,
+		const __m128i fold)
+{
+	__m128i tmp0 = _mm_clmulepi64_si128(fold, precomp, 0x01);
+	__m128i tmp1 = _mm_clmulepi64_si128(fold, precomp, 0x10);
+
+	return _mm_xor_si128(tmp1, _mm_xor_si128(data_block, tmp0));
+}
+
+/**
+ * Performs reduction from 128 bits to 64 bits
+ *
+ * @param data128 128 bits data to be reduced
+ * @param precomp rk5 and rk6 precomputed constants
+ *
+ * @return data reduced to 64 bits
+ */
+
+static inline __attribute__((always_inline)) __m128i
+crcr32_reduce_128_to_64(__m128i data128,
+	const __m128i precomp)
+{
+	__m128i tmp0, tmp1, tmp2;
+
+	/* 64b fold */
+	tmp0 = _mm_clmulepi64_si128(data128, precomp, 0x00);
+	tmp1 = _mm_srli_si128(data128, 8);
+	tmp0 = _mm_xor_si128(tmp0, tmp1);
+
+	/* 32b fold */
+	tmp2 = _mm_slli_si128(tmp0, 4);
+	tmp1 = _mm_clmulepi64_si128(tmp2, precomp, 0x10);
+
+	return _mm_xor_si128(tmp1, tmp0);
+}
+
+/**
+ * Performs Barrett reduction from 64 bits to 32 bits
+ *
+ * @param data64 64-bit data to be reduced
+ * @param precomp rk7 and rk8 precomputed constants
+ *
+ * @return data reduced to 32 bits
+ */
+
+static inline __attribute__((always_inline)) uint32_t
+crcr32_reduce_64_to_32(__m128i data64,
+	const __m128i precomp)
+{
+	static const uint32_t mask1[4] __rte_aligned(16) = {
+		0xffffffff, 0xffffffff, 0x00000000, 0x00000000
+	};
+
+	static const uint32_t mask2[4] __rte_aligned(16) = {
+		0x00000000, 0xffffffff, 0xffffffff, 0xffffffff
+	};
+	__m128i tmp0, tmp1, tmp2;
+
+	tmp0 = _mm_and_si128(data64, _mm_load_si128((const __m128i *)mask2));
+
+	tmp1 = _mm_clmulepi64_si128(tmp0, precomp, 0x00);
+	tmp1 = _mm_xor_si128(tmp1, tmp0);
+	tmp1 = _mm_and_si128(tmp1, _mm_load_si128((const __m128i *)mask1));
+
+	tmp2 = _mm_clmulepi64_si128(tmp1, precomp, 0x10);
+	tmp2 = _mm_xor_si128(tmp2, tmp1);
+	tmp2 = _mm_xor_si128(tmp2, tmp0);
+
+	return _mm_extract_epi32(tmp2, 2);
+}
+
+static const uint8_t crc_xmm_shift_tab[48] __rte_aligned(16) = {
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+};
+
+/**
+ * Shifts left 128 bit register by specified number of bytes
+ *
+ * @param reg 128 bit value
+ * @param num number of bytes to shift left reg by (0-16)
+ *
+ * @return reg << (num * 8)
+ */
+
+static inline __attribute__((always_inline)) __m128i
+xmm_shift_left(__m128i reg, const unsigned int num)
+{
+	const __m128i *p = (const __m128i *)(crc_xmm_shift_tab + 16 - num);
+
+	return _mm_shuffle_epi8(reg, _mm_loadu_si128(p));
+}
+
+static inline __attribute__((always_inline)) uint32_t
+crc32_eth_calc_pclmulqdq(
+	const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const struct crc_pclmulqdq_ctx *params)
+{
+	__m128i temp, fold, k;
+	uint32_t n;
+
+	/* Get CRC init value */
+	temp = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);
+
+	/**
+	 * Folding all data into single 16 byte data block
+	 * Assumes: fold holds first 16 bytes of data
+	 */
+
+	if (unlikely(data_len < 32)) {
+		if (unlikely(data_len == 16)) {
+			/* 16 bytes */
+			fold = _mm_loadu_si128((const __m128i *)data);
+			fold = _mm_xor_si128(fold, temp);
+			goto reduction_128_64;
+		}
+
+		if (unlikely(data_len < 16)) {
+			/* 0 to 15 bytes */
+			uint8_t buffer[16] __rte_aligned(16);
+
+			memset(buffer, 0, sizeof(buffer));
+			memcpy(buffer, data, data_len);
+
+			fold = _mm_load_si128((const __m128i *)buffer);
+			fold = _mm_xor_si128(fold, temp);
+			if (unlikely(data_len < 4)) {
+				fold = xmm_shift_left(fold, 8 - data_len);
+				goto barret_reduction;
+			}
+			fold = xmm_shift_left(fold, 16 - data_len);
+			goto reduction_128_64;
+		}
+		/* 17 to 31 bytes */
+		fold = _mm_loadu_si128((const __m128i *)data);
+		fold = _mm_xor_si128(fold, temp);
+		n = 16;
+		k = params->rk1_rk2;
+		goto partial_bytes;
+	}
+
+	/** At least 32 bytes in the buffer */
+	/** Apply CRC initial value */
+	fold = _mm_loadu_si128((const __m128i *)data);
+	fold = _mm_xor_si128(fold, temp);
+
+	/** Main folding loop - the last 16 bytes is processed separately */
+	k = params->rk1_rk2;
+	for (n = 16; (n + 16) <= data_len; n += 16) {
+		temp = _mm_loadu_si128((const __m128i *)&data[n]);
+		fold = crcr32_folding_round(temp, k, fold);
+	}
+
+partial_bytes:
+	if (likely(n < data_len)) {
+
+		const uint32_t mask3[4] __rte_aligned(16) = {
+			0x80808080, 0x80808080, 0x80808080, 0x80808080
+		};
+
+		const uint8_t shf_table[32] __rte_aligned(16) = {
+			0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+			0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+			0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+		};
+
+		__m128i last16, a, b;
+
+		last16 = _mm_loadu_si128((const __m128i *)&data[data_len - 16]);
+
+		temp = _mm_loadu_si128((const __m128i *)
+			&shf_table[data_len & 15]);
+		a = _mm_shuffle_epi8(fold, temp);
+
+		temp = _mm_xor_si128(temp,
+			_mm_load_si128((const __m128i *)mask3));
+		b = _mm_shuffle_epi8(fold, temp);
+		b = _mm_blendv_epi8(b, last16, temp);
+
+		/* k = rk1 & rk2 */
+		temp = _mm_clmulepi64_si128(a, k, 0x01);
+		fold = _mm_clmulepi64_si128(a, k, 0x10);
+
+		fold = _mm_xor_si128(fold, temp);
+		fold = _mm_xor_si128(fold, b);
+	}
+
+	/** Reduction 128 -> 32 Assumes: fold holds 128bit folded data */
+reduction_128_64:
+	k = params->rk5_rk6;
+	fold = crcr32_reduce_128_to_64(fold, k);
+
+barret_reduction:
+	k = params->rk7_rk8;
+	n = crcr32_reduce_64_to_32(fold, k);
+
+	return n;
+}
+
+
+static inline void
+rte_net_crc_sse42_init(void)
+{
+	uint64_t k1, k2, k5, k6;
+	uint64_t p = 0, q = 0;
+
+	/** Initialize CRC16 data */
+	k1 = 0x189aeLLU;
+	k2 = 0x8e10LLU;
+	k5 = 0x189aeLLU;
+	k6 = 0x114aaLLU;
+	q =  0x11c581910LLU;
+	p =  0x10811LLU;
+
+	/** Save the params in context structure */
+	crc16_ccitt_pclmulqdq.rk1_rk2 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k1),	_mm_cvtsi64_m64(k2));
+	crc16_ccitt_pclmulqdq.rk5_rk6 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k5),	_mm_cvtsi64_m64(k6));
+	crc16_ccitt_pclmulqdq.rk7_rk8 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(q), _mm_cvtsi64_m64(p));
+
+	/** Initialize CRC32 data */
+	k1 = 0xccaa009eLLU;
+	k2 = 0x1751997d0LLU;
+	k5 = 0xccaa009eLLU;
+	k6 = 0x163cd6124LLU;
+	q =  0x1f7011640LLU;
+	p =  0x1db710641LLU;
+
+	/** Save the params in context structure */
+	crc32_eth_pclmulqdq.rk1_rk2 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k1),	_mm_cvtsi64_m64(k2));
+	crc32_eth_pclmulqdq.rk5_rk6 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k5), _mm_cvtsi64_m64(k6));
+	crc32_eth_pclmulqdq.rk7_rk8 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(q), _mm_cvtsi64_m64(p));
+
+	_mm_empty();
+
+}
+
+static inline uint32_t
+rte_crc16_ccitt_sse42_handler(const uint8_t *data,
+	uint32_t data_len)
+{
+	return (uint16_t)~crc32_eth_calc_pclmulqdq(data,
+		data_len,
+		0xffff,
+		&crc16_ccitt_pclmulqdq);
+}
+
+static inline uint32_t
+rte_crc32_eth_sse42_handler(const uint8_t *data,
+	uint32_t data_len)
+{
+	return ~crc32_eth_calc_pclmulqdq(data,
+		data_len,
+		0xffffffffUL,
+		&crc32_eth_pclmulqdq);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_NET_CRC_SSE_H_ */
diff --git a/lib/librte_net/rte_net_version.map b/lib/librte_net/rte_net_version.map
index 3b15e65..c6716ec 100644
--- a/lib/librte_net/rte_net_version.map
+++ b/lib/librte_net/rte_net_version.map
@@ -4,3 +4,11 @@ DPDK_16.11 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_net_crc_set_alg;
+	rte_net_crc_calc;
+
+} DPDK_16.11;
-- 
2.5.5


* [dpdk-dev] [PATCH v4 2/2] app/test: add unit test for CRC computation
  2017-03-20 19:29           ` [dpdk-dev] [PATCH v4 0/2] librte_net: add crc computation support Jasvinder Singh
  2017-03-20 19:29             ` [dpdk-dev] [PATCH v4 1/2] librte_net: add crc compute APIs Jasvinder Singh
@ 2017-03-20 19:29             ` Jasvinder Singh
  2017-03-21  7:14               ` Peng, Yuan
  1 sibling, 1 reply; 69+ messages in thread
From: Jasvinder Singh @ 2017-03-20 19:29 UTC (permalink / raw)
  To: dev; +Cc: declan.doherty, pablo.de.lara.guarch

This patch provides a set of tests for verifying the functional
correctness of the 16-bit and 32-bit CRC APIs.
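Besides comparing against precomputed results, a test could check a structural property of the Ethernet CRC-32. This is sketched here with a hypothetical bitwise helper, not the rte_net_crc API. Appending the computed FCS, least significant byte first, and recomputing the CRC over payload plus FCS always gives the constant 0x2144DF1C, whatever the payload. That constant is the complement of the well-known CRC-32 residue 0xDEBB20E3, and it is how a receiver can validate a frame without locating the FCS field first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise Ethernet CRC-32: reflected poly 0xEDB88320, init and final XOR ~0. */
static uint32_t crc32_eth(const uint8_t *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
	}
	return ~crc;
}

/* Append the FCS least significant byte first, then return the CRC
 * over payload + FCS. buf must have room for len + 4 bytes. */
static uint32_t crc32_with_fcs(uint8_t *buf, size_t len)
{
	uint32_t fcs = crc32_eth(buf, len);

	for (int i = 0; i < 4; i++)
		buf[len + i] = (uint8_t)(fcs >> (8 * i));
	return crc32_eth(buf, len + 4);
}
```

Because the result does not depend on the payload, a single expected constant covers arbitrary test vectors, including the random-length buffers a unit test might generate.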

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
---
 app/test/Makefile   |   2 +
 app/test/test_crc.c | 223 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 225 insertions(+)
 create mode 100644 app/test/test_crc.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 1a5e03d..2a497f7 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -160,6 +160,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_cirbuf.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_string.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_lib.c
 
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += test_crc.c
+
 ifeq ($(CONFIG_RTE_LIBRTE_SCHED),y)
 SRCS-y += test_red.c
 SRCS-y += test_sched.c
diff --git a/app/test/test_crc.c b/app/test/test_crc.c
new file mode 100644
index 0000000..2eb0bff
--- /dev/null
+++ b/app/test/test_crc.c
@@ -0,0 +1,223 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_net_crc.h>
+#include <rte_malloc.h>
+#include <rte_hexdump.h>
+
+#include "test.h"
+
+#define CRC_VEC_LEN				32
+#define CRC32_VEC_LEN1			1512
+#define CRC32_VEC_LEN2			348
+#define CRC16_VEC_LEN1			12
+#define CRC16_VEC_LEN2			2
+#define LINE_LEN 75
+
+/* CRC test vector */
+static const uint8_t crc_vec[CRC_VEC_LEN] = {
+	'0', '1', '2', '3', '4', '5', '6', '7',
+	'8', '9', 'a', 'b', 'c', 'd', 'e', 'f',
+	'g', 'h', 'i', 'j', 'A', 'B', 'C', 'D',
+	'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L',
+};
+
+/* 32-bit CRC test vector */
+static const uint8_t crc32_vec1[12] = {
+	0xBE, 0xD7, 0x23, 0x47, 0x6B, 0x8F,
+	0xB3, 0x14, 0x5E, 0xFB, 0x35, 0x59,
+};
+
+/* 16-bit CRC test vector 1*/
+static const uint8_t crc16_vec1[CRC16_VEC_LEN1] = {
+	0x0D, 0x01, 0x01, 0x23, 0x45, 0x67,
+	0x89, 0x01, 0x23, 0x45, 0x00, 0x01,
+};
+
+/* 16-bit CRC test vector 2*/
+static const uint8_t crc16_vec2[CRC16_VEC_LEN2] = {
+	0x03, 0x3f,
+};
+/** CRC results */
+static uint32_t crc32_vec_res, crc32_vec1_res, crc32_vec2_res;
+static uint32_t crc16_vec_res, crc16_vec1_res, crc16_vec2_res;
+
+static uint32_t
+crc_calc(const uint8_t *vec,
+	uint32_t vec_len,
+	enum rte_net_crc_type type)
+{
+	/* compute CRC */
+	uint32_t ret = rte_net_crc_calc(vec, vec_len, type);
+
+	/* dump data on console */
+	rte_hexdump(stdout, NULL, vec, vec_len);
+
+	return ret;
+}
+
+static void
+crc_calc_scalar(void)
+{
+	uint32_t i;
+	enum rte_net_crc_type type;
+	uint8_t *test_data;
+
+	/* 32-bit ethernet CRC: scalar result */
+	type = RTE_NET_CRC32_ETH;
+	crc32_vec_res = crc_calc(crc_vec,
+						CRC_VEC_LEN,
+						type);
+
+	/* 32-bit ethernet CRC scalar result*/
+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN1, 0);
+
+	for (i = 0; i < CRC32_VEC_LEN1; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	crc32_vec1_res = crc_calc(test_data,
+						CRC32_VEC_LEN1,
+						type);
+
+	/* 32-bit ethernet CRC scalar result */
+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN2, 0);
+
+	for (i = 0; i < CRC32_VEC_LEN2; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	crc32_vec2_res = crc_calc(test_data,
+						CRC32_VEC_LEN2,
+						type);
+
+	/* 16-bit CCITT CRC scalar result */
+	type = RTE_NET_CRC16_CCITT;
+	crc16_vec_res = crc_calc(crc_vec,
+						CRC_VEC_LEN,
+						type);
+
+	/* 16-bit CCITT CRC scalar result */
+	crc16_vec1_res = crc_calc(crc16_vec1,
+						CRC16_VEC_LEN1,
+						type);
+
+	/* 16-bit CCITT CRC scalar result*/
+	crc16_vec2_res = crc_calc(crc16_vec2,
+						CRC16_VEC_LEN2,
+						type);
+}
+
+static int
+test_crc(void) {
+	uint32_t i;
+	enum rte_net_crc_type type;
+	uint8_t *test_data;
+	uint32_t ret;
+
+	/* set crc scalar mode */
+	rte_net_crc_set_alg(RTE_NET_CRC_SCALAR);
+	crc_calc_scalar();
+
+	/* set crc sse4.2 mode */
+	rte_net_crc_set_alg(RTE_NET_CRC_SSE42);
+
+	/* 32-bit ethernet CRC: Test 1 */
+	type = RTE_NET_CRC32_ETH;
+
+	ret = crc_calc(crc_vec,
+		CRC_VEC_LEN,
+		type);
+	if (ret != crc32_vec_res) {
+		printf("test_crc(32-bit): test1 failed!\n");
+		return -1;
+	}
+
+	/* 32-bit ethernet CRC: Test 2 */
+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN1, 0);
+
+	for (i = 0; i < CRC32_VEC_LEN1; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	ret = crc_calc(test_data,
+		CRC32_VEC_LEN1,
+		type);
+	if (ret != crc32_vec1_res) {
+		printf("test_crc(32-bit): test2 failed!\n");
+		return -1;
+	}
+
+	/* 32-bit ethernet CRC: Test 3 */
+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN2, 0);
+
+	for (i = 0; i < CRC32_VEC_LEN2; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	ret = crc_calc(test_data,
+		CRC32_VEC_LEN2,
+		type);
+	if (ret != crc32_vec2_res) {
+		printf("test_crc(32-bit): test3 failed!\n");
+		return -1;
+	}
+
+	/* 16-bit CCITT CRC:  Test 4 */
+	type = RTE_NET_CRC16_CCITT;
+	ret = crc_calc(crc_vec,
+		CRC_VEC_LEN,
+		type);
+	if (ret != crc16_vec_res) {
+		printf("test_crc (16-bit): test4 failed!\n");
+		return -1;
+	}
+
+	/* 16-bit CCITT CRC:  Test 5 */
+	ret = crc_calc(crc16_vec1,
+		CRC16_VEC_LEN1,
+		type);
+	if (ret != crc16_vec1_res) {
+		printf("test_crc (16-bit): test5 failed!\n");
+		return -1;
+	}
+
+	/* 16-bit CCITT CRC:  Test 6 */
+	ret = crc_calc(crc16_vec2,
+		CRC16_VEC_LEN2,
+		type);
+	if (ret != crc16_vec2_res) {
+		printf("test_crc (16-bit): test6 failed!\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+REGISTER_TEST_COMMAND(crc_autotest, test_crc);
-- 
2.5.5


* Re: [dpdk-dev] [PATCH v4 2/2] app/test: add unit test for CRC computation
  2017-03-20 19:29             ` [dpdk-dev] [PATCH v4 2/2] app/test: " Jasvinder Singh
@ 2017-03-21  7:14               ` Peng, Yuan
  0 siblings, 0 replies; 69+ messages in thread
From: Peng, Yuan @ 2017-03-21  7:14 UTC (permalink / raw)
  To: Singh, Jasvinder, dev; +Cc: Doherty, Declan, De Lara Guarch, Pablo

Hi Singh,

The second patch cannot be applied, since the test folder has moved to the DPDK root directory.

[root@localhost dpdk]# git apply dpdk-dev-v4-2-2-app-test-add-unit-test-for-CRC-computation.patch
error: app/test/Makefile: No such file or directory

Thank you.
Yuan.

-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jasvinder Singh
Sent: Tuesday, March 21, 2017 3:30 AM
To: dev@dpdk.org
Cc: Doherty, Declan <declan.doherty@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
Subject: [dpdk-dev] [PATCH v4 2/2] app/test: add unit test for CRC computation

This patch provides a set of tests for verifying the functional correctness of 16-bit and 32-bit CRC APIs.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>


* [dpdk-dev] [PATCH v5 0/2] librte_net: add crc computation support
  2017-03-20 19:29             ` [dpdk-dev] [PATCH v4 1/2] librte_net: add crc compute APIs Jasvinder Singh
@ 2017-03-21 14:45               ` Jasvinder Singh
  2017-03-21 14:45                 ` [dpdk-dev] [PATCH v5 1/2] librte_net: add crc compute APIs Jasvinder Singh
  2017-03-21 14:45                 ` [dpdk-dev] [PATCH v5 " Jasvinder Singh
  0 siblings, 2 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-03-21 14:45 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, declan.doherty, pablo.de.lara.guarch


In some applications, a CRC (Cyclic Redundancy Check) needs to be computed
or updated during packet processing. This patchset adds software
implementations of two common standard CRCs: the 32-bit Ethernet CRC as
per Ethernet/[ISO/IEC 8802-3] and the 16-bit CCITT CRC [ITU-T X.25].
Two versions of each of the 32-bit and 16-bit CRC calculations are proposed.
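For reference, the two CRC variants can be pinned down with a bit-at-a-time sketch. The parameters used here (both CRCs bit-reflected; CRC-32 with polynomial 0x04C11DB7, initial value and final XOR 0xFFFFFFFF; CRC-16 with polynomial 0x1021, initial value and final XOR 0xFFFF, i.e. the CRC-16/X-25 variant) are inferred from the scalar code in patch 1/2 rather than stated in this cover letter, and the function names are illustrative only. For the usual check input, the ASCII string "123456789", these parameters give 0xCBF43926 and 0x906E respectively.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Bitwise, bit-reflected CRC-32 (Ethernet). 0xEDB88320 is the
 * polynomial 0x04C11DB7 with its bit order reversed. */
static uint32_t crc32_eth_bitwise(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;               /* initial value */

	while (len--) {
		crc ^= *data++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
	}
	return ~crc;                              /* final XOR */
}

/* Bitwise, bit-reflected CRC-16 with the CCITT polynomial
 * (CRC-16/X-25). 0x8408 is 0x1021 with its bit order reversed. */
static uint16_t crc16_ccitt_bitwise(const uint8_t *data, size_t len)
{
	uint16_t crc = 0xFFFFu;                   /* initial value */

	while (len--) {
		crc ^= *data++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 1u) ? (uint16_t)((crc >> 1) ^ 0x8408u)
					 : (uint16_t)(crc >> 1);
	}
	return (uint16_t)~crc;                    /* final XOR */
}
```

A slow reference like this is mainly useful as an oracle: the faster table-driven and PCLMULQDQ versions must produce the same values.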

The first version provides fast and efficient CRC generation on IA
processors by using the carry-less multiplication instruction PCLMULQDQ
(accessed through x86 SIMD intrinsics; the API refers to this path as
SSE4.2). In this implementation, a parallelized folding approach first
reduces an arbitrary-length buffer to a small fixed-size buffer (16 bytes)
with the help of precomputed constants. The resulting 16-byte chunk is
then reduced to the final CRC value using Barrett reduction. For more
details on the implementation, see reference [1].
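The two building blocks named above, carry-less multiplication and Barrett reduction, can be modelled in portable C. This sketch is illustrative and not taken from the patch: clmul64() computes in software what PCLMULQDQ computes in hardware (a 64x64-bit polynomial product over GF(2), where partial products are combined with XOR so no carries propagate), and barrett_reduce() uses it to reduce a 64-bit polynomial modulo a degree-32 polynomial with two multiplications instead of bit-by-bit long division. For simplicity it works on non-reflected (MSB-first) polynomials, whereas the patch operates on bit-reflected data with precomputed constants.

```c
#include <stdint.h>
#include <assert.h>

/* 128-bit carry-less product, split into two 64-bit halves. */
struct clmul128 {
	uint64_t lo;
	uint64_t hi;
};

/* Software model of PCLMULQDQ on one pair of 64-bit operands. */
static struct clmul128 clmul64(uint64_t a, uint64_t b)
{
	struct clmul128 r = { 0, 0 };

	for (int i = 0; i < 64; i++) {
		if ((b >> i) & 1u) {
			r.lo ^= a << i;
			if (i != 0)             /* bits shifted past bit 63 */
				r.hi ^= a >> (64 - i);
		}
	}
	return r;
}

/* Reference remainder t mod p by long division; p has degree 32
 * (bit 32 set) and t has degree < 64. */
static uint32_t gf2_mod(uint64_t t, uint64_t p)
{
	for (int bit = 63; bit >= 32; bit--)
		if ((t >> bit) & 1u)
			t ^= p << (bit - 32);
	return (uint32_t)t;
}

/* Barrett constant mu = floor(x^64 / p), computed by long division. */
static uint64_t barrett_mu(uint64_t p)
{
	uint64_t rem = 0, q = 0;

	for (int k = 64; k >= 0; k--) {
		rem = (rem << 1) | (k == 64);   /* x^64 has a single 1 bit */
		if ((rem >> 32) & 1u) {         /* first possible at k == 32 */
			rem ^= p;
			q |= 1ULL << k;
		}
	}
	return q;
}

/* Barrett reduction of t (degree < 64) modulo p (degree 32). */
static uint32_t barrett_reduce(uint64_t t, uint64_t p, uint64_t mu)
{
	/* q = floor(floor(t / x^32) * mu / x^32); the product fits in 64 bits */
	uint64_t q = clmul64(t >> 32, mu).lo >> 32;

	/* subtraction is XOR in GF(2); the result has degree < 32 */
	return (uint32_t)(t ^ clmul64(q, p).lo);
}
```

For polynomials over GF(2) the Barrett quotient estimate is exact when the input degree is below twice the modulus degree, so no correction step is needed; that is what makes it attractive as the final step after folding.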

The second version is a fallback that computes the CRC without requiring
any specific CPU support (for example, SSE4.2 intrinsics). It is based on
the generic look-up table (LUT) algorithm, which uses a precomputed
256-entry table as explained in reference [2].
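The table-driven scheme can be sketched in portable C as follows. It mirrors the approach of the scalar code in patch 1/2, though the helper names here are illustrative: each of the 256 entries is computed with an MSB-first shift-and-XOR loop on the bit-reversed index and then reflected back, which yields the table for the bit-reflected CRC. Shifting a 16-bit polynomial left by 16 lets the same generator build the CRC-16 table. Each input byte then costs one table lookup plus a shift, a mask and two XORs, instead of eight bit-level steps.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Reverse the bit order of a 32-bit word. */
static uint32_t reflect32(uint32_t v)
{
	uint32_t r = 0;

	for (int i = 0; i < 32; i++)
		if (v & (1u << i))
			r |= 1u << (31 - i);
	return r;
}

/* Build the 256-entry table of a bit-reflected CRC. The polynomial is
 * given MSB-aligned in a 32-bit word, so a 16-bit polynomial must be
 * shifted left by 16; its table entries then occupy the low 16 bits. */
static void crc_init_lut(uint32_t poly, uint32_t lut[256])
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t crc = reflect32(i);

		for (int j = 0; j < 8; j++)
			crc = (crc & 0x80000000u) ? (crc << 1) ^ poly : crc << 1;
		lut[i] = reflect32(crc);
	}
}

/* Consume one byte per step using the precomputed table. */
static uint32_t crc_calc_lut(const uint8_t *data, size_t len,
			     uint32_t crc, const uint32_t lut[256])
{
	while (len--)
		crc = lut[(crc ^ *data++) & 0xFFu] ^ (crc >> 8);
	return crc;
}
```

Usage, assuming the parameters inferred from the patch: initialise with 0xFFFFFFFF (CRC-32) or 0xFFFF (CRC-16), then complement the result and, for CRC-16, truncate it to 16 bits.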

During initialisation, all the data structures required for CRC computation
are initialised, and either the x86-specific CRC implementation (if
supported by the platform) or the scalar version is enabled.

The following APIs have been added:

(i)  rte_net_crc_set_alg()
(ii) rte_net_crc_calc()

The first API (i) allows the user to select a specific CRC implementation
at run time, while the second API (ii) computes the 16-bit and 32-bit
CRCs.
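The initialisation and selection behaviour described above follows a common run-time dispatch pattern: one handler table per algorithm, indexed by CRC type; a setter that falls back to the scalar table when the CPU lacks the required feature; and a constructor that picks the best table before main() runs. The sketch below assumes GCC or Clang (for `__attribute__((constructor))` and, on x86-64, `__builtin_cpu_supports()`). All names are hypothetical, and the handlers are stand-ins that return a tag so the selection can be observed. Like the patch, it checks only for SSE4.2; a stricter check would also test the PCLMULQDQ feature flag, which CPUID reports separately.

```c
#include <stdint.h>
#include <assert.h>

enum crc_type { CRC16_CCITT = 0, CRC32_ETH, CRC_TYPES };
enum crc_alg { CRC_ALG_SCALAR = 0, CRC_ALG_SIMD };

typedef uint32_t (*crc_handler)(const uint8_t *data, uint32_t len);

/* Stand-in handlers: each returns a tag instead of a real CRC so that
 * the dispatch decision is visible. */
static uint32_t crc16_scalar(const uint8_t *d, uint32_t n) { (void)d; (void)n; return 0x160; }
static uint32_t crc32_scalar(const uint8_t *d, uint32_t n) { (void)d; (void)n; return 0x320; }
static uint32_t crc16_simd(const uint8_t *d, uint32_t n) { (void)d; (void)n; return 0x161; }
static uint32_t crc32_simd(const uint8_t *d, uint32_t n) { (void)d; (void)n; return 0x321; }

static const crc_handler handlers_scalar[CRC_TYPES] = {
	[CRC16_CCITT] = crc16_scalar,
	[CRC32_ETH] = crc32_scalar,
};

static const crc_handler handlers_simd[CRC_TYPES] = {
	[CRC16_CCITT] = crc16_simd,
	[CRC32_ETH] = crc32_simd,
};

/* Active table: written by crc_set_alg(), read by crc_calc(). */
static const crc_handler *handlers = handlers_scalar;

static int cpu_has_simd(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
	__builtin_cpu_init();   /* required when called from a constructor */
	return __builtin_cpu_supports("sse4.2");
#else
	return 0;
#endif
}

/* Select an algorithm, falling back to scalar when it is unavailable. */
static void crc_set_alg(enum crc_alg alg)
{
	handlers = (alg == CRC_ALG_SIMD && cpu_has_simd()) ?
		handlers_simd : handlers_scalar;
}

/* Pick the best available algorithm before main() runs. */
static void __attribute__((constructor)) crc_init(void)
{
	crc_set_alg(CRC_ALG_SIMD);
}

static uint32_t crc_calc(const void *data, uint32_t len, enum crc_type type)
{
	return handlers[type]((const uint8_t *)data, len);
}
```

Swapping a single table pointer keeps the per-call cost to one indirect call, and the fallback inside the setter means callers never end up on a path the CPU cannot execute.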

References:
[1] Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
[2] A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS
http://www.ross.net/crc/download/crc_v3.txt

v5 changes:
- rebase onto master

v4 changes:
- change crc compute api parameters to make it more generic
- change the unit test to accommodate the crc compute api change

v3 changes:
- separate the x86 specific implementation into new file
- improve the unit test

v2 changes:
- fix build errors for target i686-native-linuxapp-gcc
- fix checkpatch warnings

Notes:
- The build fails with clang versions earlier than 3.7.0 due to missing
  intrinsics. Refer to the DPDK known issues section for more details.

Jasvinder Singh (2):
  librte_net: add crc compute APIs
  test/test: add unit test for CRC computation

 lib/librte_net/Makefile            |   3 +
 lib/librte_net/rte_net_crc.c       | 196 +++++++++++++++++++++
 lib/librte_net/rte_net_crc.h       | 101 +++++++++++
 lib/librte_net/rte_net_crc_sse.h   | 345 +++++++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_version.map |   8 +
 test/test/Makefile                 |   2 +
 test/test/test_crc.c               | 223 ++++++++++++++++++++++++
 7 files changed, 878 insertions(+)
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h
 create mode 100644 lib/librte_net/rte_net_crc_sse.h
 create mode 100644 test/test/test_crc.c

-- 
2.5.5


* [dpdk-dev] [PATCH v5 1/2] librte_net: add crc compute APIs
  2017-03-21 14:45               ` [dpdk-dev] [PATCH v5 0/2] librte_net: add crc computation support Jasvinder Singh
@ 2017-03-21 14:45                 ` Jasvinder Singh
  2017-03-28 18:04                   ` De Lara Guarch, Pablo
  2017-03-29 12:42                   ` [dpdk-dev] [PATCH v6 0/2] librte_net: add crc computation support Jasvinder Singh
  2017-03-21 14:45                 ` [dpdk-dev] [PATCH v5 " Jasvinder Singh
  1 sibling, 2 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-03-21 14:45 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, declan.doherty, pablo.de.lara.guarch

APIs are added for selecting the architecture-specific implementation and
for computing 16-bit and 32-bit CRCs. For the CRC calculation, both scalar
and x86 intrinsic (SSE4.2) versions are implemented.

The scalar version is based on the generic look-up table (LUT) algorithm,
while the x86 intrinsic version uses carry-less multiplication for
fast CRC computation.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
---
 lib/librte_net/Makefile            |   3 +
 lib/librte_net/rte_net_crc.c       | 196 +++++++++++++++++++++
 lib/librte_net/rte_net_crc.h       | 101 +++++++++++
 lib/librte_net/rte_net_crc_sse.h   | 345 +++++++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_version.map |   8 +
 5 files changed, 653 insertions(+)
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h
 create mode 100644 lib/librte_net/rte_net_crc_sse.h

diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
index 20cf664..39ff1cc 100644
--- a/lib/librte_net/Makefile
+++ b/lib/librte_net/Makefile
@@ -39,11 +39,14 @@ EXPORT_MAP := rte_net_version.map
 LIBABIVER := 1
 
 SRCS-$(CONFIG_RTE_LIBRTE_NET) := rte_net.c
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += rte_net_crc.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc_sse.h
 
 DEPDIRS-$(CONFIG_RTE_LIBRTE_NET) += lib/librte_eal lib/librte_mbuf
 
diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
new file mode 100644
index 0000000..89edd80
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.c
@@ -0,0 +1,196 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_net_crc.h>
+#include <stddef.h>
+
+/** crc tables */
+static uint32_t crc32_eth_lut[256];
+static uint32_t crc16_ccitt_lut[256];
+
+static uint32_t rte_crc16_ccitt_handler(const uint8_t *data,
+	uint32_t data_len);
+
+static uint32_t
+rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len);
+
+typedef uint32_t
+(*rte_net_crc_handler)(const uint8_t *data, uint32_t data_len);
+
+static rte_net_crc_handler *handlers;
+
+static rte_net_crc_handler handlers_scalar[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
+};
+
+#ifdef RTE_ARCH_X86_64
+static rte_net_crc_handler handlers_sse42[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler,
+};
+#endif
+
+/**
+ * Reflect the bits about the middle
+ *
+ * @param val value to be reflected
+ *
+ * @return reflected value
+ */
+static uint32_t
+reflect_32bits(const uint32_t val)
+{
+	uint32_t i, res = 0;
+
+	for (i = 0; i < 32; i++)
+		if ((val & (1U << i)) != 0)
+			res |= 1U << (31 - i);
+
+	return res;
+}
+
+static void
+crc32_eth_init_lut(const uint32_t poly,
+	uint32_t *lut)
+{
+	uint_fast32_t i, j;
+
+	for (i = 0; i < 256; i++) {
+		uint_fast32_t crc = reflect_32bits(i);
+
+		for (j = 0; j < 8; j++) {
+			if (crc & 0x80000000L)
+				crc = (crc << 1) ^ poly;
+			else
+				crc <<= 1;
+		}
+		lut[i] = reflect_32bits(crc);
+	}
+}
+
+static inline __attribute__((always_inline)) uint32_t
+crc32_eth_calc_lut(const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const uint32_t *lut)
+{
+	while (data_len--)
+		crc = lut[(crc ^ *data++) & 0xffL] ^ (crc >> 8);
+
+	return crc;
+}
+
+static void
+rte_net_crc_scalar_init(void)
+{
+	/** 32-bit crc init */
+	crc32_eth_init_lut(CRC32_ETH_POLYNOMIAL, crc32_eth_lut);
+
+	/** 16-bit CRC init */
+	crc32_eth_init_lut(CRC16_CCITT_POLYNOMIAL << 16, crc16_ccitt_lut);
+
+}
+
+static inline uint32_t
+rte_crc16_ccitt_handler(const uint8_t *data, uint32_t data_len)
+{
+	return (uint16_t)~crc32_eth_calc_lut(data,
+		data_len,
+		0xffff,
+		crc16_ccitt_lut);
+}
+
+static inline uint32_t
+rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
+{
+	return ~crc32_eth_calc_lut(data,
+		data_len,
+		0xffffffffUL,
+		crc32_eth_lut);
+}
+
+void
+rte_net_crc_set_alg(enum rte_net_crc_alg alg)
+{
+	switch (alg) {
+
+	case RTE_NET_CRC_SSE42:
+#ifdef RTE_ARCH_X86_64
+		if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2))
+			alg = RTE_NET_CRC_SCALAR;
+		else {
+			handlers = handlers_sse42;
+			break;
+		}
+#endif
+	case RTE_NET_CRC_SCALAR:
+	default:
+		handlers = handlers_scalar;
+		break;
+	}
+}
+
+uint32_t
+rte_net_crc_calc(const void *data,
+	uint32_t data_len,
+	enum rte_net_crc_type type)
+{
+	uint32_t ret;
+	rte_net_crc_handler f_handle;
+
+	f_handle = handlers[type];
+	ret = f_handle((const uint8_t *) data, data_len);
+
+	return ret;
+}
+
+/*
+ * Select highest available crc algorithm as default one.
+ */
+static inline void __attribute__((constructor))
+rte_net_crc_init(void)
+{
+	enum rte_net_crc_alg alg = RTE_NET_CRC_SCALAR;
+
+	rte_net_crc_scalar_init();
+
+#ifdef RTE_ARCH_X86_64
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2)) {
+		alg = RTE_NET_CRC_SSE42;
+		rte_net_crc_sse42_init();
+	}
+#endif
+
+	rte_net_crc_set_alg(alg);
+}
diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
new file mode 100644
index 0000000..f8c9075
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.h
@@ -0,0 +1,101 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_NET_CRC_H_
+#define _RTE_NET_CRC_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+#include <rte_mbuf.h>
+
+/** CRC polynomials */
+#define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
+#define CRC16_CCITT_POLYNOMIAL 0x1021U
+
+/** CRC types */
+enum rte_net_crc_type {
+	RTE_NET_CRC16_CCITT = 0,
+	RTE_NET_CRC32_ETH,
+	RTE_NET_CRC_REQS
+};
+
+/** CRC compute algorithm */
+enum rte_net_crc_alg {
+	RTE_NET_CRC_SCALAR = 0,
+	RTE_NET_CRC_SSE42,
+};
+
+/**
+ *  This API sets the CRC computation algorithm (i.e. scalar version,
+ *  x86 64-bit SSE4.2 intrinsic version, etc.) used by rte_net_crc_calc().
+ *  If the requested algorithm is not supported, the scalar one is used.
+ *
+ * @param alg
+ *   - RTE_NET_CRC_SCALAR
+ *   - RTE_NET_CRC_SSE42 (use 64-bit SSE4.2 intrinsics)
+ */
+void
+rte_net_crc_set_alg(enum rte_net_crc_alg alg);
+
+/**
+ * CRC compute API
+ *
+ * @param data
+ *  Pointer to the packet data for CRC computation
+ * @param data_len
+ *  Data length in bytes for CRC computation
+ * @param type
+ *  CRC type (enum rte_net_crc_type)
+ *
+ * @return
+ *  CRC value
+ */
+uint32_t
+rte_net_crc_calc(const void *data,
+	uint32_t data_len,
+	enum rte_net_crc_type type);
+
+#ifdef RTE_ARCH_X86_64
+#include <rte_net_crc_sse.h>
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+
+#endif /* _RTE_NET_CRC_H_ */
diff --git a/lib/librte_net/rte_net_crc_sse.h b/lib/librte_net/rte_net_crc_sse.h
new file mode 100644
index 0000000..e9af22d
--- /dev/null
+++ b/lib/librte_net/rte_net_crc_sse.h
@@ -0,0 +1,345 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_NET_CRC_SSE_H_
+#define _RTE_NET_CRC_SSE_H_
+
+#include <cpuid.h>
+#include <rte_net_crc.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** PCLMULQDQ CRC computation context structure */
+struct crc_pclmulqdq_ctx {
+	__m128i rk1_rk2;
+	__m128i rk5_rk6;
+	__m128i rk7_rk8;
+};
+
+struct crc_pclmulqdq_ctx crc32_eth_pclmulqdq __rte_aligned(16);
+struct crc_pclmulqdq_ctx crc16_ccitt_pclmulqdq __rte_aligned(16);
+/**
+ * @brief Performs one folding round
+ *
+ * Logically, the function operates as follows:
+ *     DATA = READ_NEXT_16BYTES();
+ *     F1 = LSB8(FOLD)
+ *     F2 = MSB8(FOLD)
+ *     T1 = CLMUL(F1, RK1)
+ *     T2 = CLMUL(F2, RK2)
+ *     FOLD = XOR(T1, T2, DATA)
+ *
+ * @param data_block 16 byte data block
+ * @param precomp precomputed rk1 and rk2 constants
+ * @param fold running 16 byte folded data
+ *
+ * @return New 16 byte folded data
+ */
+static inline __attribute__((always_inline)) __m128i
+crcr32_folding_round(const __m128i data_block,
+		const __m128i precomp,
+		const __m128i fold)
+{
+	__m128i tmp0 = _mm_clmulepi64_si128(fold, precomp, 0x01);
+	__m128i tmp1 = _mm_clmulepi64_si128(fold, precomp, 0x10);
+
+	return _mm_xor_si128(tmp1, _mm_xor_si128(data_block, tmp0));
+}
+
+/**
+ * Performs reduction from 128 bits to 64 bits
+ *
+ * @param data128 128 bits data to be reduced
+ * @param precomp rk5 and rk6 precomputed constants
+ *
+ * @return data reduced to 64 bits
+ */
+
+static inline __attribute__((always_inline)) __m128i
+crcr32_reduce_128_to_64(__m128i data128,
+	const __m128i precomp)
+{
+	__m128i tmp0, tmp1, tmp2;
+
+	/* 64b fold */
+	tmp0 = _mm_clmulepi64_si128(data128, precomp, 0x00);
+	tmp1 = _mm_srli_si128(data128, 8);
+	tmp0 = _mm_xor_si128(tmp0, tmp1);
+
+	/* 32b fold */
+	tmp2 = _mm_slli_si128(tmp0, 4);
+	tmp1 = _mm_clmulepi64_si128(tmp2, precomp, 0x10);
+
+	return _mm_xor_si128(tmp1, tmp0);
+}
+
+/**
+ * Performs Barrett reduction from 64 bits to 32 bits
+ *
+ * @param data64 64 bits data to be reduced
+ * @param precomp rk7 and rk8 precomputed constants
+ *
+ * @return data reduced to 32 bits
+ */
+
+static inline __attribute__((always_inline)) uint32_t
+crcr32_reduce_64_to_32(__m128i data64,
+	const __m128i precomp)
+{
+	static const uint32_t mask1[4] __rte_aligned(16) = {
+		0xffffffff, 0xffffffff, 0x00000000, 0x00000000
+	};
+
+	static const uint32_t mask2[4] __rte_aligned(16) = {
+		0x00000000, 0xffffffff, 0xffffffff, 0xffffffff
+	};
+	__m128i tmp0, tmp1, tmp2;
+
+	tmp0 = _mm_and_si128(data64, _mm_load_si128((const __m128i *)mask2));
+
+	tmp1 = _mm_clmulepi64_si128(tmp0, precomp, 0x00);
+	tmp1 = _mm_xor_si128(tmp1, tmp0);
+	tmp1 = _mm_and_si128(tmp1, _mm_load_si128((const __m128i *)mask1));
+
+	tmp2 = _mm_clmulepi64_si128(tmp1, precomp, 0x10);
+	tmp2 = _mm_xor_si128(tmp2, tmp1);
+	tmp2 = _mm_xor_si128(tmp2, tmp0);
+
+	return _mm_extract_epi32(tmp2, 2);
+}
+
+static const uint8_t crc_xmm_shift_tab[48] __rte_aligned(16) = {
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+};
+
+/**
+ * Shifts left 128 bit register by specified number of bytes
+ *
+ * @param reg 128 bit value
+ * @param num number of bytes to shift left reg by (0-16)
+ *
+ * @return reg << (num * 8)
+ */
+
+static inline __attribute__((always_inline)) __m128i
+xmm_shift_left(__m128i reg, const unsigned int num)
+{
+	const __m128i *p = (const __m128i *)(crc_xmm_shift_tab + 16 - num);
+
+	return _mm_shuffle_epi8(reg, _mm_loadu_si128(p));
+}
+
+static inline __attribute__((always_inline)) uint32_t
+crc32_eth_calc_pclmulqdq(
+	const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const struct crc_pclmulqdq_ctx *params)
+{
+	__m128i temp, fold, k;
+	uint32_t n;
+
+	/* Get CRC init value */
+	temp = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);
+
+	/**
+	 * Folding all data into single 16 byte data block
+	 * Assumes: fold holds first 16 bytes of data
+	 */
+
+	if (unlikely(data_len < 32)) {
+		if (unlikely(data_len == 16)) {
+			/* 16 bytes */
+			fold = _mm_loadu_si128((const __m128i *)data);
+			fold = _mm_xor_si128(fold, temp);
+			goto reduction_128_64;
+		}
+
+		if (unlikely(data_len < 16)) {
+			/* 0 to 15 bytes */
+			uint8_t buffer[16] __rte_aligned(16);
+
+			memset(buffer, 0, sizeof(buffer));
+			memcpy(buffer, data, data_len);
+
+			fold = _mm_load_si128((const __m128i *)buffer);
+			fold = _mm_xor_si128(fold, temp);
+			if (unlikely(data_len < 4)) {
+				fold = xmm_shift_left(fold, 8 - data_len);
+				goto barret_reduction;
+			}
+			fold = xmm_shift_left(fold, 16 - data_len);
+			goto reduction_128_64;
+		}
+		/* 17 to 31 bytes */
+		fold = _mm_loadu_si128((const __m128i *)data);
+		fold = _mm_xor_si128(fold, temp);
+		n = 16;
+		k = params->rk1_rk2;
+		goto partial_bytes;
+	}
+
+	/** At least 32 bytes in the buffer */
+	/** Apply CRC initial value */
+	fold = _mm_loadu_si128((const __m128i *)data);
+	fold = _mm_xor_si128(fold, temp);
+
+	/** Main folding loop - the last 16 bytes are processed separately */
+	k = params->rk1_rk2;
+	for (n = 16; (n + 16) <= data_len; n += 16) {
+		temp = _mm_loadu_si128((const __m128i *)&data[n]);
+		fold = crcr32_folding_round(temp, k, fold);
+	}
+
+partial_bytes:
+	if (likely(n < data_len)) {
+
+		const uint32_t mask3[4] __rte_aligned(16) = {
+			0x80808080, 0x80808080, 0x80808080, 0x80808080
+		};
+
+		const uint8_t shf_table[32] __rte_aligned(16) = {
+			0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+			0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+			0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+		};
+
+		__m128i last16, a, b;
+
+		last16 = _mm_loadu_si128((const __m128i *)&data[data_len - 16]);
+
+		temp = _mm_loadu_si128((const __m128i *)
+			&shf_table[data_len & 15]);
+		a = _mm_shuffle_epi8(fold, temp);
+
+		temp = _mm_xor_si128(temp,
+			_mm_load_si128((const __m128i *)mask3));
+		b = _mm_shuffle_epi8(fold, temp);
+		b = _mm_blendv_epi8(b, last16, temp);
+
+		/* k = rk1 & rk2 */
+		temp = _mm_clmulepi64_si128(a, k, 0x01);
+		fold = _mm_clmulepi64_si128(a, k, 0x10);
+
+		fold = _mm_xor_si128(fold, temp);
+		fold = _mm_xor_si128(fold, b);
+	}
+
+	/** Reduction 128 -> 32 Assumes: fold holds 128bit folded data */
+reduction_128_64:
+	k = params->rk5_rk6;
+	fold = crcr32_reduce_128_to_64(fold, k);
+
+barret_reduction:
+	k = params->rk7_rk8;
+	n = crcr32_reduce_64_to_32(fold, k);
+
+	return n;
+}
+
+
+static inline void
+rte_net_crc_sse42_init(void)
+{
+	uint64_t k1, k2, k5, k6;
+	uint64_t p = 0, q = 0;
+
+	/** Initialize CRC16 data */
+	k1 = 0x189aeLLU;
+	k2 = 0x8e10LLU;
+	k5 = 0x189aeLLU;
+	k6 = 0x114aaLLU;
+	q =  0x11c581910LLU;
+	p =  0x10811LLU;
+
+	/** Save the params in context structure */
+	crc16_ccitt_pclmulqdq.rk1_rk2 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k1),	_mm_cvtsi64_m64(k2));
+	crc16_ccitt_pclmulqdq.rk5_rk6 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k5),	_mm_cvtsi64_m64(k6));
+	crc16_ccitt_pclmulqdq.rk7_rk8 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(q), _mm_cvtsi64_m64(p));
+
+	/** Initialize CRC32 data */
+	k1 = 0xccaa009eLLU;
+	k2 = 0x1751997d0LLU;
+	k5 = 0xccaa009eLLU;
+	k6 = 0x163cd6124LLU;
+	q =  0x1f7011640LLU;
+	p =  0x1db710641LLU;
+
+	/** Save the params in context structure */
+	crc32_eth_pclmulqdq.rk1_rk2 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k1),	_mm_cvtsi64_m64(k2));
+	crc32_eth_pclmulqdq.rk5_rk6 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k5), _mm_cvtsi64_m64(k6));
+	crc32_eth_pclmulqdq.rk7_rk8 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(q), _mm_cvtsi64_m64(p));
+
+	_mm_empty();
+
+}
+
+static inline uint32_t
+rte_crc16_ccitt_sse42_handler(const uint8_t *data,
+	uint32_t data_len)
+{
+	return (uint16_t)~crc32_eth_calc_pclmulqdq(data,
+		data_len,
+		0xffff,
+		&crc16_ccitt_pclmulqdq);
+}
+
+static inline uint32_t
+rte_crc32_eth_sse42_handler(const uint8_t *data,
+	uint32_t data_len)
+{
+	return ~crc32_eth_calc_pclmulqdq(data,
+		data_len,
+		0xffffffffUL,
+		&crc32_eth_pclmulqdq);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_NET_CRC_SSE_H_ */
diff --git a/lib/librte_net/rte_net_version.map b/lib/librte_net/rte_net_version.map
index 3b15e65..c6716ec 100644
--- a/lib/librte_net/rte_net_version.map
+++ b/lib/librte_net/rte_net_version.map
@@ -4,3 +4,11 @@ DPDK_16.11 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_net_crc_set_alg;
+	rte_net_crc_calc;
+
+} DPDK_16.11;
-- 
2.5.5
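When reviewing the PCLMULQDQ path above, a slow bitwise reference is a handy cross-check. The sketch below assumes the standard reflected Ethernet CRC-32 (reflected polynomial 0xEDB88320, seed 0xFFFFFFFF, final inversion), which appears to match the `0xffffffffUL` seed and final `~` used by `rte_crc32_eth_sse42_handler`. The function name `crc32_eth_ref` is hypothetical and not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit-at-a-time reference for the Ethernet CRC-32
 * (reflected polynomial 0xEDB88320, seed 0xFFFFFFFF, final inversion).
 * Slow, but useful for cross-checking table-driven or CLMUL versions. */
static uint32_t
crc32_eth_ref(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= data[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
	}
	return ~crc;
}
```

The widely published check value for the ASCII string "123456789" is 0xCBF43926, so `crc32_eth_ref((const uint8_t *)"123456789", 9)` should return it. The same input could be passed to `rte_net_crc_calc()` with `RTE_NET_CRC32_ETH` under each algorithm and compared.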

* [dpdk-dev] [PATCH v5 2/2] test/test: add unit test for CRC computation
  2017-03-21 14:45               ` [dpdk-dev] [PATCH v5 0/2] librte_net: add crc computation support Jasvinder Singh
  2017-03-21 14:45                 ` [dpdk-dev] [PATCH v5 1/2] librte_net: add crc compute APIs Jasvinder Singh
@ 2017-03-21 14:45                 ` Jasvinder Singh
  2017-03-28 19:23                   ` De Lara Guarch, Pablo
  1 sibling, 1 reply; 69+ messages in thread
From: Jasvinder Singh @ 2017-03-21 14:45 UTC
  To: dev; +Cc: olivier.matz, declan.doherty, pablo.de.lara.guarch

This patch provides a set of tests for verifying the functional
correctness of 16-bit and 32-bit CRC APIs.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
---
 test/test/Makefile   |   2 +
 test/test/test_crc.c | 223 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 225 insertions(+)
 create mode 100644 test/test/test_crc.c

diff --git a/test/test/Makefile b/test/test/Makefile
index 1a5e03d..2a497f7 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -160,6 +160,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_cirbuf.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_string.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_lib.c
 
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += test_crc.c
+
 ifeq ($(CONFIG_RTE_LIBRTE_SCHED),y)
 SRCS-y += test_red.c
 SRCS-y += test_sched.c
diff --git a/test/test/test_crc.c b/test/test/test_crc.c
new file mode 100644
index 0000000..2eb0bff
--- /dev/null
+++ b/test/test/test_crc.c
@@ -0,0 +1,223 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_net_crc.h>
+#include <rte_malloc.h>
+#include <rte_hexdump.h>
+
+#include "test.h"
+
+#define CRC_VEC_LEN				32
+#define CRC32_VEC_LEN1			1512
+#define CRC32_VEC_LEN2			348
+#define CRC16_VEC_LEN1			12
+#define CRC16_VEC_LEN2			2
+#define LINE_LEN 75
+
+/* CRC test vector */
+static const uint8_t crc_vec[CRC_VEC_LEN] = {
+	'0', '1', '2', '3', '4', '5', '6', '7',
+	'8', '9', 'a', 'b', 'c', 'd', 'e', 'f',
+	'g', 'h', 'i', 'j', 'A', 'B', 'C', 'D',
+	'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L',
+};
+
+/* 32-bit CRC test vector */
+static const uint8_t crc32_vec1[12] = {
+	0xBE, 0xD7, 0x23, 0x47, 0x6B, 0x8F,
+	0xB3, 0x14, 0x5E, 0xFB, 0x35, 0x59,
+};
+
+/* 16-bit CRC test vector 1 */
+static const uint8_t crc16_vec1[CRC16_VEC_LEN1] = {
+	0x0D, 0x01, 0x01, 0x23, 0x45, 0x67,
+	0x89, 0x01, 0x23, 0x45, 0x00, 0x01,
+};
+
+/* 16-bit CRC test vector 2 */
+static const uint8_t crc16_vec2[CRC16_VEC_LEN2] = {
+	0x03, 0x3f,
+};
+/** CRC results */
+static uint32_t crc32_vec_res, crc32_vec1_res, crc32_vec2_res;
+static uint32_t crc16_vec_res, crc16_vec1_res, crc16_vec2_res;
+
+static uint32_t
+crc_calc(const uint8_t *vec,
+	uint32_t vec_len,
+	enum rte_net_crc_type type)
+{
+	/* compute CRC */
+	uint32_t ret = rte_net_crc_calc(vec, vec_len, type);
+
+	/* dump data on console */
+	rte_hexdump(stdout, NULL, vec, vec_len);
+
+	return ret;
+}
+
+static void
+crc_calc_scalar(void)
+{
+	uint32_t i;
+	enum rte_net_crc_type type;
+	uint8_t *test_data;
+
+	/* 32-bit ethernet CRC: scalar result */
+	type = RTE_NET_CRC32_ETH;
+	crc32_vec_res = crc_calc(crc_vec,
+						CRC_VEC_LEN,
+						type);
+
+	/* 32-bit ethernet CRC scalar result */
+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN1, 0);
+
+	for (i = 0; i < CRC32_VEC_LEN1; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	crc32_vec1_res = crc_calc(test_data,
+						CRC32_VEC_LEN1,
+						type);
+
+	/* 32-bit ethernet CRC scalar result */
+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN2, 0);
+
+	for (i = 0; i < CRC32_VEC_LEN2; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	crc32_vec2_res = crc_calc(test_data,
+						CRC32_VEC_LEN2,
+						type);
+
+	/* 16-bit CCITT CRC scalar result */
+	type = RTE_NET_CRC16_CCITT;
+	crc16_vec_res = crc_calc(crc_vec,
+						CRC_VEC_LEN,
+						type);
+
+	/* 16-bit CCITT CRC scalar result */
+	crc16_vec1_res = crc_calc(crc16_vec1,
+						CRC16_VEC_LEN1,
+						type);
+
+	/* 16-bit CCITT CRC scalar result */
+	crc16_vec2_res = crc_calc(crc16_vec2,
+						CRC16_VEC_LEN2,
+						type);
+}
+
+static int
+test_crc(void) {
+	uint32_t i;
+	enum rte_net_crc_type type;
+	uint8_t *test_data;
+	uint32_t ret;
+
+	/* set crc scalar mode */
+	rte_net_crc_set_alg(RTE_NET_CRC_SCALAR);
+	crc_calc_scalar();
+
+	/* set crc sse4.2 mode */
+	rte_net_crc_set_alg(RTE_NET_CRC_SSE42);
+
+	/* 32-bit ethernet CRC: Test 1 */
+	type = RTE_NET_CRC32_ETH;
+
+	ret = crc_calc(crc_vec,
+		CRC_VEC_LEN,
+		type);
+	if (ret != crc32_vec_res) {
+		printf("test_crc(32-bit): test1 failed !\n");
+		return -1;
+	}
+
+	/* 32-bit ethernet CRC: Test 2 */
+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN1, 0);
+
+	for (i = 0; i < CRC32_VEC_LEN1; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	ret = crc_calc(test_data,
+		CRC32_VEC_LEN1,
+		type);
+	if (ret != crc32_vec1_res) {
+		printf("test_crc(32-bit): test2 failed!\n");
+		return -1;
+	}
+
+	/* 32-bit ethernet CRC: Test 3 */
+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN2, 0);
+
+	for (i = 0; i < CRC32_VEC_LEN2; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	ret = crc_calc(test_data,
+		CRC32_VEC_LEN2,
+		type);
+	if (ret != crc32_vec2_res) {
+		printf("test_crc(32-bit): test3 failed!\n");
+		return -1;
+	}
+
+	/* 16-bit CCITT CRC:  Test 4 */
+	type = RTE_NET_CRC16_CCITT;
+	ret = crc_calc(crc_vec,
+		CRC_VEC_LEN,
+		type);
+	if (ret != crc16_vec_res) {
+		printf("test_crc (16-bit): test4 failed!\n");
+		return -1;
+	}
+
+	/* 16-bit CCITT CRC:  Test 5 */
+	ret = crc_calc(crc16_vec1,
+		CRC16_VEC_LEN1,
+		type);
+	if (ret != crc16_vec1_res) {
+		printf("test_crc (16-bit): test5 failed!\n");
+		return -1;
+	}
+
+	/* 16-bit CCITT CRC:  Test 6 */
+	ret = crc_calc(crc16_vec2,
+		CRC16_VEC_LEN2,
+		type);
+	if (ret != crc16_vec2_res) {
+		printf("test_crc (16-bit): test6 failed!\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+REGISTER_TEST_COMMAND(crc_autotest, test_crc);
-- 
2.5.5
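The tests above only check that the SSE4.2 results match the scalar ones, so an error common to both paths would go unnoticed. A known-answer value could complement them. The scalar code passes the polynomial as `CRC16_CCITT_POLYNOMIAL << 16` to a reflected table, uses seed 0xffff, and applies a final inversion truncated to 16 bits. Assuming `CRC16_CCITT_POLYNOMIAL` is 0x1021, the 16-bit variant should then correspond to the parameter set usually catalogued as CRC-16/X-25. The hypothetical `crc16_x25_ref` below is a bitwise sketch of that variant.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bitwise reference for CRC-16/X-25: polynomial 0x1021
 * processed LSB-first (reflected form 0x8408), seed 0xFFFF, final inversion. */
static uint16_t
crc16_x25_ref(const uint8_t *data, size_t len)
{
	uint16_t crc = 0xFFFF;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= data[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0x8408)
					: (uint16_t)(crc >> 1);
	}
	return (uint16_t)~crc;
}
```

For the ASCII string "123456789" the catalogued CRC-16/X-25 check value is 0x906E. If the assumption above holds, a test could compare `rte_net_crc_calc(..., RTE_NET_CRC16_CCITT)` against that constant instead of only against the scalar result.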

* Re: [dpdk-dev] [PATCH v5 1/2] librte_net: add crc compute APIs
  2017-03-21 14:45                 ` [dpdk-dev] [PATCH v5 1/2] librte_net: add crc compute APIs Jasvinder Singh
@ 2017-03-28 18:04                   ` De Lara Guarch, Pablo
  2017-03-28 18:07                     ` De Lara Guarch, Pablo
  2017-03-28 19:21                     ` Singh, Jasvinder
  2017-03-29 12:42                   ` [dpdk-dev] [PATCH v6 0/2] librte_net: add crc computation support Jasvinder Singh
  1 sibling, 2 replies; 69+ messages in thread
From: De Lara Guarch, Pablo @ 2017-03-28 18:04 UTC
  To: Singh, Jasvinder, dev; +Cc: olivier.matz, Doherty, Declan

Hi Jasvinder,

> -----Original Message-----
> From: Singh, Jasvinder
> Sent: Tuesday, March 21, 2017 2:46 PM
> To: dev@dpdk.org
> Cc: olivier.matz@6wind.com; Doherty, Declan; De Lara Guarch, Pablo
> Subject: [PATCH v5 1/2] librte_net: add crc compute APIs
> 
> APIs for selecting the architecture-specific implementation and computing
> the CRC (16-bit and 32-bit CRCs) are added. For CRC calculation, scalar
> as well as x86 intrinsic (SSE4.2) versions are implemented.
> 
> The scalar version is based on a generic Look-Up Table (LUT) algorithm,
> while the x86 intrinsic version uses carry-less multiplication for
> fast CRC computation.
> 
> Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
> ---

> diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
> new file mode 100644
> index 0000000..89edd80
> --- /dev/null
> +++ b/lib/librte_net/rte_net_crc.c

...

> +
> +#include <rte_net_crc.h>
> +#include <stddef.h>
> +
> +/** crc tables */
> +static uint32_t crc32_eth_lut[256];
> +static uint32_t crc16_ccitt_lut[256];

Use a macro for 256, so that you can also use it in crc32_eth_init_lut.
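A minimal sketch of the suggestion, using a hypothetical `CRC_LUT_SIZE` macro for both the table declaration and the generation loop. For brevity it builds the reflected CRC-32 table directly from the reflected polynomial 0xEDB88320, rather than via `reflect_32bits()` and the normal-form polynomial as the patch does; both constructions should yield the same table.

```c
#include <stdint.h>

#define CRC_LUT_SIZE 256 /* hypothetical name: one entry per byte value */

static uint32_t crc32_eth_lut[CRC_LUT_SIZE];

/* Build a reflected (LSB-first) byte-wise CRC table from a reflected polynomial. */
static void
crc32_init_lut_reflected(uint32_t rpoly, uint32_t *lut)
{
	uint32_t i, j;

	for (i = 0; i < CRC_LUT_SIZE; i++) {
		uint32_t crc = i;

		for (j = 0; j < 8; j++)
			crc = (crc & 1) ? (crc >> 1) ^ rpoly : crc >> 1;
		lut[i] = crc;
	}
}
```

After `crc32_init_lut_reflected(0xEDB88320, crc32_eth_lut)`, entry 1 is 0x77073096 and entry 128 is 0xEDB88320, matching the widely published Ethernet CRC-32 table.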
> +
> +static uint32_t rte_crc16_ccitt_handler(const uint8_t *data,
> +	uint32_t data_len);

Put "static uint32_t" on a separate line.


> +/**
> + * Reflect the bits about the middle
> + *
> + * @param x value to be reflected

Should be "val".

> + *
> + * @return reflected value
> + */
> +static uint32_t
> +reflect_32bits(const uint32_t val)

No need for "const" here, as it is not a pointer.
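Besides dropping `const`, note that `1 << i` and `1 << (31 - i)` operate on a signed `int`, and shifting 1 into bit 31 of an `int` is undefined behaviour in C. A sketch of the helper with unsigned shifts, otherwise the same logic as the patch, might look like this:

```c
#include <stdint.h>

/* Reverse the bit order of a 32-bit value (bit 0 <-> bit 31, ...).
 * Unsigned constants keep every shift well defined, including bit 31. */
static uint32_t
reflect_32bits(uint32_t val)
{
	uint32_t i, res = 0;

	for (i = 0; i < 32; i++)
		if ((val & (UINT32_C(1) << i)) != 0)
			res |= UINT32_C(1) << (31 - i);

	return res;
}
```

As a sanity check, reflecting the normal-form Ethernet polynomial 0x04C11DB7 gives its reflected form 0xEDB88320.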

> +{
> +	uint32_t i, res = 0;
> +
> +	for (i = 0; i < 32; i++)
> +		if ((val & (1 << i)) != 0)
> +			res |= (uint32_t)(1 << (31 - i));
> +
> +	return res;
> +}
> +
> +static void
> +crc32_eth_init_lut(const uint32_t poly,

No need for "const" here.

> +	uint32_t *lut)
> +{
> +	uint_fast32_t i, j;
> +
> +	for (i = 0; i < 256; i++) {
> +		uint_fast32_t crc = reflect_32bits(i);
> +
> +		for (j = 0; j < 8; j++) {
> +			if (crc & 0x80000000L)
> +				crc = (crc << 1) ^ poly;
> +			else
> +				crc <<= 1;
> +		}
> +	lut[i] = reflect_32bits(crc);

Wrong indentation.

> +	}
> +}
> +
> +static inline __attribute__((always_inline)) uint32_t
> +crc32_eth_calc_lut(const uint8_t *data,
> +	uint32_t data_len,
> +	uint32_t crc,
> +	const uint32_t *lut)
> +{
> +	while (data_len--)
> +		crc = lut[(crc ^ *data++) & 0xffL] ^ (crc >> 8);
> +
> +	return crc;
> +}
> +
> +static void
> +rte_net_crc_scalar_init(void)
> +{
> +	/** 32-bit crc init */
> +	crc32_eth_init_lut(CRC32_ETH_POLYNOMIAL, crc32_eth_lut);
> +
> +	/** 16-bit CRC init */
> +	crc32_eth_init_lut(CRC16_CCITT_POLYNOMIAL << 16, crc16_ccitt_lut);
> +

Remove this blank line.

> +}
> +
> +static inline uint32_t
> +rte_crc16_ccitt_handler(const uint8_t *data, uint32_t data_len)
> +{
> +	return (uint16_t)~crc32_eth_calc_lut(data,
> +		data_len,
> +		0xffff,
> +		crc16_ccitt_lut);

Since you are casting to uint16_t even though the return type is uint32_t,
I would add a comment explaining why.
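The cast can be explained with a short worked example: `~` complements all 32 bits, so the unused upper half becomes 0xFFFF, and the `uint16_t` cast drops it before the value is widened back to the `uint32_t` return type. The helper name `crc16_finalize` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the handler's final step.
 * For crc = 0x00001234: ~crc = 0xFFFFEDCB, (uint16_t)~crc = 0xEDCB,
 * and the function returns 0x0000EDCB. */
static uint32_t
crc16_finalize(uint32_t crc)
{
	return (uint16_t)~crc;
}
```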

> +}
> +


> diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
> new file mode 100644
> index 0000000..f8c9075
> --- /dev/null
> +++ b/lib/librte_net/rte_net_crc.h
> @@ -0,0 +1,101 @@

...

> +
> +/**
> + *  This API set the crc computation algorithm (i.e. scalar version,
> + *  x86 64-bit sse4.2 intrinsic version, etc.) and internal data
> + *  structure.
> + *
> + * @param alg

Add extra information (CRC algorithm?).

> + *   - RTE_NET_CRC_SCALAR
> + *   - RTE_NET_CRC_SSE42 (Use 64-bit SSE4.2 intrinsic)
> + */
> +void
> +rte_net_crc_set_alg(enum rte_net_crc_alg alg);
> +
> +/**
> + * CRC compute API
> + *
> + * @param data
> + *  Pointer to the packet data for crc computation
> + * @param data_len
> + *  Data length for crc computation
> + * @param type
> + *  Crc type (enum rte_net_crc_type)

Should be "CRC".

> + *
> + * @return
> + *  crc value

Add two spaces after "@param" and "@return".

> + */
> +uint32_t
> +rte_net_crc_calc(const void *data,
> +	uint32_t data_len,
> +	enum rte_net_crc_type type);
> +
> +#if defined(RTE_ARCH_X86_64) || defined(RTE_CPU_FALGS_SSE_4_2)

Typo in RTE_CPU_FALGS_SSE_4_2 (I missed the same one in rte_net_crc.c).
Also, should it be "&&"?


> +#include <rte_net_crc_sse.h>
> +#endif
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +
> +#endif /* _RTE_NET_CRC_H_ */
> diff --git a/lib/librte_net/rte_net_crc_sse.h b/lib/librte_net/rte_net_crc_sse.h
> new file mode 100644
> index 0000000..e9af22d
> --- /dev/null
> +++ b/lib/librte_net/rte_net_crc_sse.h
> @@ -0,0 +1,345 @@

...

> + * @brief Performs one folding round
> + *
> + * Logically function operates as follows:
> + *     DATA = READ_NEXT_16BYTES();
> + *     F1 = LSB8(FOLD)
> + *     F2 = MSB8(FOLD)
> + *     T1 = CLMUL(F1, RK1)
> + *     T2 = CLMUL(F2, RK2)
> + *     FOLD = XOR(T1, T2, DATA)
> + *
> + * @param data_block 16 byte data block
> + * @param precomp rk1 and rk2 precomputed constants
> + * @param fold running 16 byte folded data
> + *
> + * @return New 16 byte folded data

Move each parameter/return description to a separate line (same for the other functions).

> + */
> +static inline __attribute__((always_inline)) __m128i
> +crcr32_folding_round(const __m128i data_block,
> +		const __m128i precomp,
> +		const __m128i fold)

No need to use "const" here.

> +{
> +	__m128i tmp0 = _mm_clmulepi64_si128(fold, precomp, 0x01);
> +	__m128i tmp1 = _mm_clmulepi64_si128(fold, precomp, 0x10);
> +
> +	return _mm_xor_si128(tmp1, _mm_xor_si128(data_block, tmp0));
> +}
> +
> +/**
> + * Performs reduction from 128 bits to 64 bits
> + *
> + * @param data128 128 bits data to be reduced
> + * @param precomp rk5 and rk6 precomputed constants
> + *
> + * @return data reduced to 64 bits
> + */
> +
> +static inline __attribute__((always_inline)) __m128i
> +crcr32_reduce_128_to_64(__m128i data128,
> +	const __m128i precomp)

No need to use "const" here.

...

> +
> +
> +static inline void
> +rte_net_crc_sse42_init(void)
> +{
> +	uint64_t k1, k2, k5, k6;
> +	uint64_t p = 0, q = 0;
> +
> +	/** Initialize CRC16 data */
> +	k1 = 0x189aeLLU;
> +	k2 = 0x8e10LLU;
> +	k5 = 0x189aeLLU;
> +	k6 = 0x114aaLLU;
> +	q =  0x11c581910LLU;
> +	p =  0x10811LLU;
> +
> +	/** Save the params in context structure */
> +	crc16_ccitt_pclmulqdq.rk1_rk2 =
> +		_mm_setr_epi64(_mm_cvtsi64_m64(k1),	_mm_cvtsi64_m64(k2));
> +	crc16_ccitt_pclmulqdq.rk5_rk6 =
> +		_mm_setr_epi64(_mm_cvtsi64_m64(k5),	_mm_cvtsi64_m64(k6));
> +	crc16_ccitt_pclmulqdq.rk7_rk8 =
> +		_mm_setr_epi64(_mm_cvtsi64_m64(q), _mm_cvtsi64_m64(p));
> +
> +	/** Initialize CRC32 data */
> +	k1 = 0xccaa009eLLU;
> +	k2 = 0x1751997d0LLU;
> +	k5 = 0xccaa009eLLU;
> +	k6 = 0x163cd6124LLU;
> +	q =  0x1f7011640LLU;
> +	p =  0x1db710641LLU;
> +
> +	/** Save the params in context structure */
> +	crc32_eth_pclmulqdq.rk1_rk2 =
> +		_mm_setr_epi64(_mm_cvtsi64_m64(k1),	_mm_cvtsi64_m64(k2));

Add extra tab for better readability.

> +	crc32_eth_pclmulqdq.rk5_rk6 =
> +		_mm_setr_epi64(_mm_cvtsi64_m64(k5), _mm_cvtsi64_m64(k6));
> +	crc32_eth_pclmulqdq.rk7_rk8 =
> +		_mm_setr_epi64(_mm_cvtsi64_m64(q), _mm_cvtsi64_m64(p));
> +
> +	_mm_empty();

Maybe we need a comment here.
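A possible comment: `_mm_cvtsi64_m64()` produces MMX `__m64` values, and `_mm_empty()` (EMMS) clears the MMX state so that subsequent x87 floating-point code is not affected. As an alternative, and not what the patch does, the constant pairs could be built with the SSE2 intrinsic `_mm_set_epi64x()`, which involves no `__m64` and therefore needs no EMMS. `pack_rk` is a hypothetical helper name.

```c
#include <emmintrin.h> /* SSE2: _mm_set_epi64x, _mm_storeu_si128 */
#include <stdint.h>

/* Place 'lo' in bits 0-63 and 'hi' in bits 64-127, the same lane order as
 * _mm_setr_epi64(_mm_cvtsi64_m64(lo), _mm_cvtsi64_m64(hi)), without
 * creating any MMX (__m64) value. */
static __m128i
pack_rk(uint64_t lo, uint64_t hi)
{
	return _mm_set_epi64x((long long)hi, (long long)lo);
}
```

With such a helper, the CRC32 folding pair would become `crc32_eth_pclmulqdq.rk1_rk2 = pack_rk(k1, k2);`, and the trailing `_mm_empty()` could be dropped.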

> +
> +}
> +
> +static inline uint32_t
> +rte_crc16_ccitt_sse42_handler(const uint8_t *data,
> +	uint32_t data_len)
> +{
> +	return (uint16_t)~crc32_eth_calc_pclmulqdq(data,
> +		data_len,
> +		0xffff,
> +		&crc16_ccitt_pclmulqdq);

Same comment about the casting here.

> +}
> +
> +static inline uint32_t
> +rte_crc32_eth_sse42_handler(const uint8_t *data,
> +	uint32_t data_len)
> +{
> +	return ~crc32_eth_calc_pclmulqdq(data,
> +		data_len,
> +		0xffffffffUL,
> +		&crc32_eth_pclmulqdq);
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_NET_CRC_SSE_H_ */
> diff --git a/lib/librte_net/rte_net_version.map b/lib/librte_net/rte_net_version.map
> index 3b15e65..c6716ec 100644
> --- a/lib/librte_net/rte_net_version.map
> +++ b/lib/librte_net/rte_net_version.map
> @@ -4,3 +4,11 @@ DPDK_16.11 {
> 
>  	local: *;
>  };
> +
> +DPDK_17.05 {
> +	global:
> +
> +	rte_net_crc_set_alg;
> +	rte_net_crc_calc;

This has to be alphabetically sorted.

> +
> +} DPDK_16.11;
> --
> 2.5.5

* Re: [dpdk-dev] [PATCH v5 1/2] librte_net: add crc compute APIs
  2017-03-28 18:04                   ` De Lara Guarch, Pablo
@ 2017-03-28 18:07                     ` De Lara Guarch, Pablo
  2017-03-28 19:21                     ` Singh, Jasvinder
  1 sibling, 0 replies; 69+ messages in thread
From: De Lara Guarch, Pablo @ 2017-03-28 18:07 UTC
  To: De Lara Guarch, Pablo, Singh, Jasvinder, dev
  Cc: olivier.matz, Doherty, Declan



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of De Lara Guarch,
> Pablo
> Sent: Tuesday, March 28, 2017 7:04 PM
> To: Singh, Jasvinder; dev@dpdk.org
> Cc: olivier.matz@6wind.com; Doherty, Declan
> Subject: Re: [dpdk-dev] [PATCH v5 1/2] librte_net: add crc compute APIs
> 
> Hi Jasvinder,

There is also an issue when compiling the net library as a shared library.

rte_net_crc.o: In function `rte_net_crc_set_alg':
rte_net_crc.c:(.text+0x502): undefined reference to `rte_cpu_get_flag_enabled'
rte_net_crc.o: In function `rte_net_crc_init':
rte_net_crc.c:(.text.startup+0x15c): undefined reference to `rte_cpu_get_flag_enabled'
collect2: error: ld returned 1 exit status
/tmp/dpdk/mk/rte.lib.mk:127: recipe for target 'librte_net.so.1.1' failed

> 

* Re: [dpdk-dev] [PATCH v5 1/2] librte_net: add crc compute APIs
  2017-03-28 18:04                   ` De Lara Guarch, Pablo
  2017-03-28 18:07                     ` De Lara Guarch, Pablo
@ 2017-03-28 19:21                     ` Singh, Jasvinder
  1 sibling, 0 replies; 69+ messages in thread
From: Singh, Jasvinder @ 2017-03-28 19:21 UTC
  To: De Lara Guarch, Pablo, dev; +Cc: olivier.matz, Doherty, Declan

Hi Pablo,


> -----Original Message-----
> From: De Lara Guarch, Pablo
> Sent: Tuesday, March 28, 2017 7:04 PM
> To: Singh, Jasvinder <jasvinder.singh@intel.com>; dev@dpdk.org
> Cc: olivier.matz@6wind.com; Doherty, Declan <declan.doherty@intel.com>
> Subject: RE: [PATCH v5 1/2] librte_net: add crc compute APIs
> 
> Hi Jasvinder,
> 
> > -----Original Message-----
> > From: Singh, Jasvinder
> > Sent: Tuesday, March 21, 2017 2:46 PM
> > To: dev@dpdk.org
> > Cc: olivier.matz@6wind.com; Doherty, Declan; De Lara Guarch, Pablo
> > Subject: [PATCH v5 1/2] librte_net: add crc compute APIs
> >
> > APIs for selecting the architecure specific implementation and
> > computing the crc (16-bit and 32-bit CRCs) are added. For CRCs
> > calculation, scalar as well as x86 intrinsic(sse4.2) versions are implemented.
> >
> > The scalar version is based on generic Look-Up Table(LUT) algorithm,
> > while x86 intrinsic version uses carry-less multiplication for fast
> > CRC computation.
> >
> > Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
> > ---
> 
> > diff --git a/lib/librte_net/rte_net_crc.c
> > b/lib/librte_net/rte_net_crc.c new file mode 100644 index
> > 0000000..89edd80
> > --- /dev/null
> > +++ b/lib/librte_net/rte_net_crc.c
> 
> ...
> 
> > +
> > +#include <rte_net_crc.h>
> > +#include <stddef.h>
> > +
> > +/** crc tables */
> > +static uint32_t crc32_eth_lut[256];
> > +static uint32_t crc16_ccitt_lut[256];
> 
> Use a macro for 256, that you can use in crc32_eth_init_lut.
> > +
> > +static uint32_t rte_crc16_ccitt_handler(const uint8_t *data,
> > +	uint32_t data_len);
> 
> Separate "static uint32_t" in another line.
> 
> 
> > +/**
> > + * Reflect the bits about the middle
> > + *
> > + * @param x value to be reflected
> 
> Should be "val".
> 
> > + *
> > + * @return reflected value
> > + */
> > +static uint32_t
> > +reflect_32bits(const uint32_t val)
> 
> No need for "const" here, as it is not a pointer.
> 
> > +{
> > +	uint32_t i, res = 0;
> > +
> > +	for (i = 0; i < 32; i++)
> > +		if ((val & (1 << i)) != 0)
> > +			res |= (uint32_t)(1 << (31 - i));
> > +
> > +	return res;
> > +}
> > +
> > +static void
> > +crc32_eth_init_lut(const uint32_t poly,
> 
> No need for "const" here.
> 
> > +	uint32_t *lut)
> > +{
> > +	uint_fast32_t i, j;
> > +
> > +	for (i = 0; i < 256; i++) {
> > +		uint_fast32_t crc = reflect_32bits(i);
> > +
> > +		for (j = 0; j < 8; j++) {
> > +			if (crc & 0x80000000L)
> > +				crc = (crc << 1) ^ poly;
> > +			else
> > +				crc <<= 1;
> > +		}
> > +	lut[i] = reflect_32bits(crc);
> 
> Wrong indentation.
> 
> > +	}
> > +}
> > +
> > +static inline __attribute__((always_inline)) uint32_t
> > +crc32_eth_calc_lut(const uint8_t *data,
> > +	uint32_t data_len,
> > +	uint32_t crc,
> > +	const uint32_t *lut)
> > +{
> > +	while (data_len--)
> > +		crc = lut[(crc ^ *data++) & 0xffL] ^ (crc >> 8);
> > +
> > +	return crc;
> > +}
> > +
> > +static void
> > +rte_net_crc_scalar_init(void)
> > +{
> > +	/** 32-bit crc init */
> > +	crc32_eth_init_lut(CRC32_ETH_POLYNOMIAL, crc32_eth_lut);
> > +
> > +	/** 16-bit CRC init */
> > +	crc32_eth_init_lut(CRC16_CCITT_POLYNOMIAL << 16,
> > crc16_ccitt_lut);
> > +
> 
> Remove this blank line.
> 
> > +}
> > +
> > +static inline uint32_t
> > +rte_crc16_ccitt_handler(const uint8_t *data, uint32_t data_len) {
> > +	return (uint16_t)~crc32_eth_calc_lut(data,
> > +		data_len,
> > +		0xffff,
> > +		crc16_ccitt_lut);
> 
> Since you are casting to uint16_t, when you are supposed to cast to uint32_t
> (given the return type), I would add a comment explaining why.
> 
> > +}
> > +
> 
> 
> > diff --git a/lib/librte_net/rte_net_crc.h
> > b/lib/librte_net/rte_net_crc.h new file mode 100644 index
> > 0000000..f8c9075
> > --- /dev/null
> > +++ b/lib/librte_net/rte_net_crc.h
> > @@ -0,0 +1,101 @@
> 
> ...
> 
> > +
> > +/**
> > + *  This API set the crc computation algorithm (i.e. scalar version,
> > + *  x86 64-bit sse4.2 intrinsic version, etc.) and internal data
> > + *  structure.
> > + *
> > + * @param alg
> 
> Add extra information (CRC algorithm?).
> 
> > + *   - RTE_NET_CRC_SCALAR
> > + *   - RTE_NET_CRC_SSE42 (Use 64-bit SSE4.2 intrinsic)
> > + */
> > +void
> > +rte_net_crc_set_alg(enum rte_net_crc_alg alg);
> > +
> > +/**
> > + * CRC compute API
> > + *
> > + * @param data
> > + *  Pointer to the packet data for crc computation
> > + * @param data_len
> > + *  Data length for crc computation
> > + * @param type
> > + *  Crc type (enum rte_net_crc_type)
> 
> CRC
> 
> > + *
> > + * @return
> > + *  crc value
> 
> Add two spaces after "@param" and "@return".
> 
> > + */
> > +uint32_t
> > +rte_net_crc_calc(const void *data,
> > +	uint32_t data_len,
> > +	enum rte_net_crc_type type);
> > +
> > +#if defined(RTE_ARCH_X86_64) || defined(RTE_CPU_FALGS_SSE_4_2)
> 
> Typo in RTE_CPU_FALGS_SSE_4_2 (I missed the same one in rte_net_crc.c ).
> Also, should it be "&&"?
> 
> 
> > +#include <rte_net_crc_sse.h>
> > +#endif
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +
> > +#endif /* _RTE_NET_CRC_H_ */
> > diff --git a/lib/librte_net/rte_net_crc_sse.h
> > b/lib/librte_net/rte_net_crc_sse.h
> > new file mode 100644
> > index 0000000..e9af22d
> > --- /dev/null
> > +++ b/lib/librte_net/rte_net_crc_sse.h
> > @@ -0,0 +1,345 @@
> 
> ...
> 
> > + * @brief Performs one folding round
> > + *
> > + * Logically function operates as follows:
> > + *     DATA = READ_NEXT_16BYTES();
> > + *     F1 = LSB8(FOLD)
> > + *     F2 = MSB8(FOLD)
> > + *     T1 = CLMUL(F1, RK1)
> > + *     T2 = CLMUL(F2, RK2)
> > + *     FOLD = XOR(T1, T2, DATA)
> > + *
> > + * @param data_block 16 byte data block
> > + * @param precomp precomputed rk1 constanst
> > + * @param fold running 16 byte folded data
> > + *
> > + * @return New 16 byte folded data
> 
> Move parameter/rturn description in a separate line (same for other
> functions).
> 
> > + */
> > +static inline __attribute__((always_inline)) __m128i
> > +crcr32_folding_round(const __m128i data_block,
> > +		const __m128i precomp,
> > +		const __m128i fold)
> 
> No need to use "const" here.
> 
> > +{
> > +	__m128i tmp0 = _mm_clmulepi64_si128(fold, precomp, 0x01);
> > +	__m128i tmp1 = _mm_clmulepi64_si128(fold, precomp, 0x10);
> > +
> > +	return _mm_xor_si128(tmp1, _mm_xor_si128(data_block, tmp0)); }
> > +
> > +/**
> > + * Performs reduction from 128 bits to 64 bits
> > + *
> > + * @param data128 128 bits data to be reduced
> > + * @param precomp rk5 and rk6 precomputed constants
> > + *
> > + * @return data reduced to 64 bits
> > + */
> > +
> > +static inline __attribute__((always_inline)) __m128i
> > +crcr32_reduce_128_to_64(__m128i data128,
> > +	const __m128i precomp)
> 
> No need to use "const" here.
> 
> ...
> 
> > +
> > +
> > +static inline void
> > +rte_net_crc_sse42_init(void)
> > +{
> > +	uint64_t k1, k2, k5, k6;
> > +	uint64_t p = 0, q = 0;
> > +
> > +	/** Initialize CRC16 data */
> > +	k1 = 0x189aeLLU;
> > +	k2 = 0x8e10LLU;
> > +	k5 = 0x189aeLLU;
> > +	k6 = 0x114aaLLU;
> > +	q =  0x11c581910LLU;
> > +	p =  0x10811LLU;
> > +
> > +	/** Save the params in context structure */
> > +	crc16_ccitt_pclmulqdq.rk1_rk2 =
> > +		_mm_setr_epi64(_mm_cvtsi64_m64(k1),
> > 	_mm_cvtsi64_m64(k2));
> > +	crc16_ccitt_pclmulqdq.rk5_rk6 =
> > +		_mm_setr_epi64(_mm_cvtsi64_m64(k5),
> > 	_mm_cvtsi64_m64(k6));
> > +	crc16_ccitt_pclmulqdq.rk7_rk8 =
> > +		_mm_setr_epi64(_mm_cvtsi64_m64(q),
> > _mm_cvtsi64_m64(p));
> > +
> > +	/** Initialize CRC32 data */
> > +	k1 = 0xccaa009eLLU;
> > +	k2 = 0x1751997d0LLU;
> > +	k5 = 0xccaa009eLLU;
> > +	k6 = 0x163cd6124LLU;
> > +	q =  0x1f7011640LLU;
> > +	p =  0x1db710641LLU;
> > +
> > +	/** Save the params in context structure */
> > +	crc32_eth_pclmulqdq.rk1_rk2 =
> > +		_mm_setr_epi64(_mm_cvtsi64_m64(k1),
> > 	_mm_cvtsi64_m64(k2));
> 
> Add extra tab for better readability.
> 
> > +	crc32_eth_pclmulqdq.rk5_rk6 =
> > +		_mm_setr_epi64(_mm_cvtsi64_m64(k5),
> > _mm_cvtsi64_m64(k6));
> > +	crc32_eth_pclmulqdq.rk7_rk8 =
> > +		_mm_setr_epi64(_mm_cvtsi64_m64(q),
> > _mm_cvtsi64_m64(p));
> > +
> > +	_mm_empty();
> 
> Maybe we need a comment here.
> 
> > +
> > +}
> > +
> > +static inline uint32_t
> > +rte_crc16_ccitt_sse42_handler(const uint8_t *data,
> > +	uint32_t data_len)
> > +{
> > +	return (uint16_t)~crc32_eth_calc_pclmulqdq(data,
> > +		data_len,
> > +		0xffff,
> > +		&crc16_ccitt_pclmulqdq);
> 
> Same comment about the casting here.
> 
> > +}
> > +
> > +static inline uint32_t
> > +rte_crc32_eth_sse42_handler(const uint8_t *data,
> > +	uint32_t data_len)
> > +{
> > +	return ~crc32_eth_calc_pclmulqdq(data,
> > +		data_len,
> > +		0xffffffffUL,
> > +		&crc32_eth_pclmulqdq);
> > +}
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_NET_CRC_SSE_H_ */
> > diff --git a/lib/librte_net/rte_net_version.map b/lib/librte_net/rte_net_version.map
> > index 3b15e65..c6716ec 100644
> > --- a/lib/librte_net/rte_net_version.map
> > +++ b/lib/librte_net/rte_net_version.map
> > @@ -4,3 +4,11 @@ DPDK_16.11 {
> >
> >  	local: *;
> >  };
> > +
> > +DPDK_17.05 {
> > +	global:
> > +
> > +	rte_net_crc_set_alg;
> > +	rte_net_crc_calc;
> 
> This has to be alphabetically sorted.

Thank you for the detailed review. I will revise the patch following your comments and send v6.

Jasvinder

* Re: [dpdk-dev] [PATCH v5 2/2] test/test: add unit test for CRC computation
  2017-03-21 14:45                 ` [dpdk-dev] [PATCH v5 " Jasvinder Singh
@ 2017-03-28 19:23                   ` De Lara Guarch, Pablo
  2017-03-28 19:27                     ` Singh, Jasvinder
  0 siblings, 1 reply; 69+ messages in thread
From: De Lara Guarch, Pablo @ 2017-03-28 19:23 UTC (permalink / raw)
  To: Singh, Jasvinder, dev; +Cc: olivier.matz, Doherty, Declan

Hi Jasvinder,


> -----Original Message-----
> From: Singh, Jasvinder
> Sent: Tuesday, March 21, 2017 2:46 PM
> To: dev@dpdk.org
> Cc: olivier.matz@6wind.com; Doherty, Declan; De Lara Guarch, Pablo
> Subject: [PATCH v5 2/2] test/test: add unit test for CRC computation
> 
> This patch provides a set of tests for verifying the functional
> correctness of 16-bit and 32-bit CRC APIs.
> 
> Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
> ---
>  test/test/Makefile   |   2 +
>  test/test/test_crc.c | 223 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 225 insertions(+)
>  create mode 100644 test/test/test_crc.c
> 

> diff --git a/test/test/test_crc.c b/test/test/test_crc.c
> new file mode 100644
> index 0000000..2eb0bff
> --- /dev/null
> +++ b/test/test/test_crc.c
> @@ -0,0 +1,223 @@


> +
> +#include "test.h"
> +
> +#define CRC_VEC_LEN				32

Unnecessary extra tab.

> +#define CRC32_VEC_LEN1			1512
> +#define CRC32_VEC_LEN2			348
> +#define CRC16_VEC_LEN1			12
> +#define CRC16_VEC_LEN2			2
> +#define LINE_LEN 75

I would align this to the other values above.

...

> +
> +/* 16-bit CRC test vector 1*/

Missing space before "*/".

> +static const uint8_t crc16_vec1[CRC16_VEC_LEN1] = {
> +	0x0D, 0x01, 0x01, 0x23, 0x45, 0x67,
> +	0x89, 0x01, 0x23, 0x45, 0x00, 0x01,
> +};
> +
> +/* 16-bit CRC test vector 2*/

Same here.

> +static const uint8_t crc16_vec2[CRC16_VEC_LEN2] = {
> +	0x03, 0x3f,
> +};
> +/** CRC results */
> +uint32_t crc32_vec_res, crc32_vec1_res, crc32_vec2_res;
> +uint32_t crc16_vec_res, crc16_vec1_res, crc16_vec2_res;
> +
> +static int
> +crc_calc(const uint8_t *vec,
> +	uint32_t vec_len,
> +	enum rte_net_crc_type type)
> +{
> +	/* compute CRC */
> +	uint32_t ret = rte_net_crc_calc(vec, vec_len, type);
> +
> +	/* dump data on console*/

Same here.

> +	rte_hexdump(stdout, NULL, vec, vec_len);

I would use TEST_HEXDUMP, which dumps the stream only if debug is enabled (to avoid too much output).

> +
> +	return  ret;
> +}
> +
> +static void
> +crc_calc_scalar(void)
> +{
> +	uint32_t i;
> +	enum rte_net_crc_type type;
> +	uint8_t *test_data;
> +
> +	/* 32-bit ethernet CRC: scalar result */
> +	type = RTE_NET_CRC32_ETH;
> +	crc32_vec_res = crc_calc(crc_vec,
> +						CRC_VEC_LEN,
> +						type);

Remove unnecessary tabs (more of this in the rest of the file).

> +
> +	/* 32-bit ethernet CRC scalar result*/
> +	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN1, 0);
> +
> +	for (i = 0; i < CRC32_VEC_LEN1; i += 12)
> +	rte_memcpy(&test_data[i], crc32_vec1, 12);

Wrong indentation (more of this in the rest of the file).
Also, why not a single memcpy call with CRC32_VEC_LEN1?
A comment explaining why would be good.

...

> +static int
> +test_crc(void) {

Looks strange to start the parameters in a new line.
I would say you can start in the same line as the function name.

> +	uint32_t i;
> +	enum rte_net_crc_type type;
> +	uint8_t *test_data;
> +	uint32_t ret;
> +
> +	/* set crc scalar mode */

Probably the references of "crc" in comments should be uppercase "CRC".

> +	rte_net_crc_set_alg(RTE_NET_CRC_SCALAR);
> +	crc_calc_scalar();
> +
> +	/* set crc sse4.2 mode */
> +	rte_net_crc_set_alg(RTE_NET_CRC_SSE42);
> +
> +	/* 32-bit ethernet CRC: Test 1 */
> +	type = RTE_NET_CRC32_ETH;
> +
> +	ret = crc_calc(crc_vec,
> +		CRC_VEC_LEN,
> +		type);
> +	if (ret != crc32_vec_res) {
> +		printf("test_crc(32-bit): test1 failed !\n");

Extra space before "!" (keep consistency with the other printfs).

> +		return -1;
> +	}
> +
> +	/* 32-bit ethernet CRC: Test 2 */
> +	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN1, 0);

Free the memory after the test (also for other calls).
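
Regarding the single-memcpy question: assuming crc32_vec1 is a 12-byte pattern (its definition is elided above), one copy of CRC32_VEC_LEN1 bytes from it would read past its end, so the buffer has to be tiled. A plain-C sketch of that tiling, with calloc()/free() standing in for rte_zmalloc()/rte_free() and a hypothetical helper name, which also leaves freeing to the caller:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: allocate a zeroed buffer of total_len bytes and
 * fill it by repeating pattern. The last copy is truncated if total_len
 * is not a multiple of pattern_len. Returns NULL on allocation failure;
 * the caller owns the buffer and must free() it after the test.
 */
static uint8_t *
make_repeated_buffer(const uint8_t *pattern, size_t pattern_len,
		     size_t total_len)
{
	uint8_t *buf = calloc(1, total_len);
	size_t off;

	if (buf == NULL || pattern_len == 0)
		return buf;

	for (off = 0; off < total_len; off += pattern_len) {
		size_t n = (total_len - off < pattern_len) ?
			   total_len - off : pattern_len;

		/* never read more than pattern_len bytes from the source */
		memcpy(buf + off, pattern, n);
	}
	return buf;
}
```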

* Re: [dpdk-dev] [PATCH v5 2/2] test/test: add unit test for CRC computation
  2017-03-28 19:23                   ` De Lara Guarch, Pablo
@ 2017-03-28 19:27                     ` Singh, Jasvinder
  0 siblings, 0 replies; 69+ messages in thread
From: Singh, Jasvinder @ 2017-03-28 19:27 UTC (permalink / raw)
  To: De Lara Guarch, Pablo, dev; +Cc: olivier.matz, Doherty, Declan

Hi Pablo,

Will correct the patch and send the next version. Thank you.

Jasvinder

* [dpdk-dev] [PATCH v6 0/2] librte_net: add crc computation support
  2017-03-21 14:45                 ` [dpdk-dev] [PATCH v5 1/2] librte_net: add crc compute APIs Jasvinder Singh
  2017-03-28 18:04                   ` De Lara Guarch, Pablo
@ 2017-03-29 12:42                   ` Jasvinder Singh
  2017-03-29 12:42                     ` [dpdk-dev] [PATCH v6 1/2] librte_net: add crc compute APIs Jasvinder Singh
  2017-03-29 12:42                     ` [dpdk-dev] [PATCH v6 " Jasvinder Singh
  1 sibling, 2 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-03-29 12:42 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, declan.doherty, pablo.de.lara.guarch

In some applications, a CRC (Cyclic Redundancy Check) needs to be computed
or updated during packet processing operations. This patchset adds
software implementations of some common standard CRCs (32-bit Ethernet
CRC as per Ethernet/[ISO/IEC 8802-3] and 16-bit CCITT-CRC [ITU-T X.25]).
Two versions of each of the 32-bit and 16-bit CRC calculations are proposed.

The first version provides fast and efficient CRC generation on IA
processors using the carry-less multiplication instruction PCLMULQDQ
(i.e. SSE4.2 intrinsics). In this implementation, a parallelized folding
approach first reduces an arbitrary-length buffer to a small fixed-size
buffer (16 bytes) with the help of precomputed constants. The resulting
single 16-byte chunk is then reduced by the Barrett reduction method to
generate the final CRC value. For more details on the implementation,
see reference [1].
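
For readers unfamiliar with the instruction: carry-less multiplication is polynomial multiplication over GF(2), in which partial products are combined with XOR, so no carries propagate. The sketch below is a portable, hypothetical software model of one 64x64-bit product (clmul64 and struct clmul128 are illustrative names, not part of this patchset). In hardware, _mm_clmulepi64_si128(a, b, imm) computes the same 128-bit product, with bit 0 of imm selecting the 64-bit half of a and bit 4 the half of b; crcr32_folding_round() in the patch uses 0x01 (high half of fold times rk1) and 0x10 (low half of fold times rk2).

```c
#include <stdint.h>

/* 128-bit result of a 64x64 carry-less multiply, split into halves. */
struct clmul128 {
	uint64_t lo;
	uint64_t hi;
};

/*
 * Software carry-less multiply: for every set bit i of b, XOR a shifted
 * left by i into the 128-bit accumulator. XOR replaces addition, so
 * (x + 1) * (x + 1) = x^2 + 1, i.e. 0b11 * 0b11 = 0b101.
 */
static struct clmul128
clmul64(uint64_t a, uint64_t b)
{
	struct clmul128 r = { 0, 0 };
	unsigned int i;

	for (i = 0; i < 64; i++) {
		if ((b >> i) & 1) {
			r.lo ^= a << i;
			/* bits of a shifted out of the low half; skip i == 0
			 * to avoid an undefined shift by 64 */
			if (i != 0)
				r.hi ^= a >> (64 - i);
		}
	}
	return r;
}
```

A loop like this costs up to 64 iterations per product, which is why the folding approach relies on the single-instruction hardware version.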

The second version is a fallback that supports CRC generation without
any specific support from the CPU (for example, SSE4.2 intrinsics). It is
based on the generic Look-Up Table (LUT) algorithm, which uses a
precomputed 256-element table as explained in reference [2].
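
A minimal sketch of the table-driven approach, assuming the reflected (LSB-first) form of the Ethernet CRC-32 with initial value and final XOR of all ones. The names are hypothetical. The patch's crc32_eth_init_lut() builds an equivalent table by bit-reversing the index, shifting MSB-first with 0x04C11DB7 and reversing the result; this sketch uses the more common right-shifting form with the reversed polynomial 0xEDB88320. The per-byte update is the same expression as in crc32_eth_calc_lut().

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-reversed form of the Ethernet polynomial 0x04C11DB7. */
#define CRC32_POLY_REFLECTED 0xEDB88320u

static uint32_t crc32_table[256];

/* Precompute the CRC contribution of every possible byte value. */
static void
crc32_table_init(void)
{
	uint32_t i, j;

	for (i = 0; i < 256; i++) {
		uint32_t crc = i;

		for (j = 0; j < 8; j++)
			crc = (crc & 1) ? (crc >> 1) ^ CRC32_POLY_REFLECTED
					: crc >> 1;
		crc32_table[i] = crc;
	}
}

/* One table lookup per input byte instead of eight shift/XOR steps. */
static uint32_t
crc32_lut(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;	/* initial value */

	while (len--)
		crc = crc32_table[(crc ^ *data++) & 0xFF] ^ (crc >> 8);

	return ~crc;			/* final XOR */
}
```

After crc32_table_init(), the standard check string "123456789" gives 0xCBF43926.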

During initialisation, all the data structures required for CRC computation
are initialised. Also, the x86-specific CRC implementation (if supported by
the platform) or the scalar version is enabled.

The following APIs have been added:

(i)  rte_net_crc_set_alg()
(ii) rte_net_crc_calc()

The first API (i) allows the user to select a specific CRC implementation
at run time, while the second API (ii) is used to compute the 16-bit and
32-bit CRCs.
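
The run-time selection described above can be sketched in plain C as one table of function pointers per algorithm, indexed by CRC type, with a constructor choosing the default before main() runs. Everything here is hypothetical and simplified: the enums mirror rte_net_crc_type and rte_net_crc_alg, bit-at-a-time handlers stand in for the LUT and PCLMULQDQ code, and the accelerated table is omitted, so a request for the fast path falls back to scalar as rte_net_crc_set_alg() does on CPUs without SSE4.2.

```c
#include <stdint.h>

/* Hypothetical stand-ins for enum rte_net_crc_type / rte_net_crc_alg. */
enum crc_type { CRC16_X25 = 0, CRC32_ETH, CRC_TYPE_MAX };
enum crc_alg { ALG_SCALAR = 0, ALG_FAST };

typedef uint32_t (*crc_handler)(const uint8_t *data, uint32_t len);

/* Bit-at-a-time reflected CRC; stands in for the LUT or PCLMULQDQ code. */
static uint32_t
crc_bitwise(const uint8_t *data, uint32_t len, uint32_t poly, uint32_t crc)
{
	int j;

	while (len--) {
		crc ^= *data++;
		for (j = 0; j < 8; j++)
			crc = (crc & 1) ? (crc >> 1) ^ poly : crc >> 1;
	}
	return crc;
}

static uint32_t
crc16_scalar(const uint8_t *data, uint32_t len)
{
	/* reflected 0x1021 is 0x8408; init 0xFFFF, final XOR 0xFFFF */
	return (uint16_t)~crc_bitwise(data, len, 0x8408u, 0xFFFFu);
}

static uint32_t
crc32_scalar(const uint8_t *data, uint32_t len)
{
	/* reflected 0x04C11DB7 is 0xEDB88320; init and final XOR all ones */
	return ~crc_bitwise(data, len, 0xEDB88320u, 0xFFFFFFFFu);
}

/* One handler table per algorithm, indexed by CRC type. */
static const crc_handler handlers_scalar[CRC_TYPE_MAX] = {
	[CRC16_X25] = crc16_scalar,
	[CRC32_ETH] = crc32_scalar,
};

/* Statically initialised, so crc_calc() is safe even before any setup. */
static const crc_handler *handlers = handlers_scalar;

static void
crc_set_alg(enum crc_alg alg)
{
	switch (alg) {
	case ALG_FAST:
		/* A real build would install an accelerated table here when
		 * the CPU supports it; this sketch has none. */
		/* fall through */
	case ALG_SCALAR:
	default:
		handlers = handlers_scalar;
		break;
	}
}

/* Runs before main() (GCC/Clang extension): pick the best default. */
static void __attribute__((constructor))
crc_default_init(void)
{
	crc_set_alg(ALG_FAST);
}

static uint32_t
crc_calc(const void *data, uint32_t len, enum crc_type type)
{
	return handlers[type]((const uint8_t *)data, len);
}
```

One design difference: here handlers is statically initialised, whereas in the patch it stays NULL until the constructor rte_net_crc_init() runs, so rte_net_crc_calc() depends on that constructor having executed.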

References:
[1] Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
[2] A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS
http://www.ross.net/crc/download/crc_v3.txt

v6 changes:
- fixed build error when compiling as a shared library
- addressed review comments on v5 (thanks Pablo)

v5 changes:
- rebase onto master

v4 changes:
- change CRC compute API parameters to make them more generic
- change the unit test to accommodate the CRC compute API change

v3 changes:
- separate the x86 specific implementation into new file
- improve the unit test

v2 changes:
- fix build errors for target i686-native-linuxapp-gcc
- fix checkpatch warnings

Notes:
- Build fails with clang versions earlier than 3.7.0 due to
  missing intrinsics. Refer to the DPDK known issues section for more details.

Jasvinder Singh (2):
  librte_net: add crc compute APIs
  test/test: add unit test for CRC computation

 lib/Makefile                       |   2 +-
 lib/librte_net/Makefile            |   3 +
 lib/librte_net/rte_net_crc.c       | 199 ++++++++++++++++++++
 lib/librte_net/rte_net_crc.h       | 104 +++++++++++
 lib/librte_net/rte_net_crc_sse.h   | 361 +++++++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_version.map |   8 +
 test/test/Makefile                 |   2 +
 test/test/test_crc.c               | 241 +++++++++++++++++++++++++
 8 files changed, 919 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h
 create mode 100644 lib/librte_net/rte_net_crc_sse.h
 create mode 100644 test/test/test_crc.c

-- 
2.5.5

* [dpdk-dev] [PATCH v6 1/2] librte_net: add crc compute APIs
  2017-03-29 12:42                   ` [dpdk-dev] [PATCH v6 0/2] librte_net: add crc computation support Jasvinder Singh
@ 2017-03-29 12:42                     ` Jasvinder Singh
  2017-03-29 16:14                       ` De Lara Guarch, Pablo
  2017-03-29 17:15                       ` [dpdk-dev] [PATCH v7 0/2] librte_net: add crc computation support Jasvinder Singh
  2017-03-29 12:42                     ` [dpdk-dev] [PATCH v6 " Jasvinder Singh
  1 sibling, 2 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-03-29 12:42 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, declan.doherty, pablo.de.lara.guarch

APIs for selecting the architecture-specific implementation and computing
the CRC (16-bit and 32-bit CRCs) are added. For CRC calculation, scalar
as well as x86 intrinsic (SSE4.2) versions are implemented.

The scalar version is based on the generic Look-Up Table (LUT) algorithm,
while the x86 intrinsic version uses carry-less multiplication for
fast CRC computation.
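
In the scalar path, rte_net_crc_scalar_init() reuses the 32-bit table generator for the 16-bit CRC by passing CRC16_CCITT_POLYNOMIAL << 16. The following self-contained check (hypothetical helper names; lut_entry_msb() mirrors the per-entry logic of crc32_eth_init_lut()) compares that construction with a native reflected CRC-16 table built from 0x8408, the bit-reversal of 0x1021.

```c
#include <stdint.h>

/* Bit-reverse a 32-bit value (unsigned shifts avoid 1 << 31 overflow). */
static uint32_t
reflect32(uint32_t v)
{
	uint32_t r = 0;
	int i;

	for (i = 0; i < 32; i++)
		if (v & (1u << i))
			r |= 1u << (31 - i);
	return r;
}

/* Same shape as crc32_eth_init_lut(): reflect the index, run eight
 * MSB-first steps with poly, then reflect the result. */
static uint32_t
lut_entry_msb(uint32_t poly, uint32_t i)
{
	uint32_t crc = reflect32(i);
	int j;

	for (j = 0; j < 8; j++)
		crc = (crc & 0x80000000u) ? (crc << 1) ^ poly : crc << 1;
	return reflect32(crc);
}

/* Native reflected CRC-16 entry for x^16 + x^12 + x^5 + 1 (0x8408). */
static uint32_t
lut_entry_crc16(uint32_t i)
{
	uint32_t crc = i;
	int j;

	for (j = 0; j < 8; j++)
		crc = (crc & 1) ? (crc >> 1) ^ 0x8408u : crc >> 1;
	return crc;
}
```

The entries agree because the shifted polynomial has a zero low half and the reflected index occupies only the top byte, so the working register stays within bits 16..31; after reflection each entry lies in bits 0..15. That is why crc32_eth_calc_lut() can be reused with a 0xffff seed and a 16-bit cast of the complemented result.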

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
---
 lib/Makefile                       |   2 +-
 lib/librte_net/Makefile            |   3 +
 lib/librte_net/rte_net_crc.c       | 199 ++++++++++++++++++++
 lib/librte_net/rte_net_crc.h       | 104 +++++++++++
 lib/librte_net/rte_net_crc_sse.h   | 361 +++++++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_version.map |   8 +
 6 files changed, 676 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h
 create mode 100644 lib/librte_net/rte_net_crc_sse.h

diff --git a/lib/Makefile b/lib/Makefile
index 5ad3c7c..456eb38 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -62,7 +62,7 @@ DEPDIRS-librte_lpm := librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_ACL) += librte_acl
 DEPDIRS-librte_acl := librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_NET) += librte_net
-DEPDIRS-librte_net := librte_mbuf
+DEPDIRS-librte_net := librte_mbuf librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += librte_ip_frag
 DEPDIRS-librte_ip_frag := librte_eal librte_mempool librte_mbuf librte_ether
 DEPDIRS-librte_ip_frag += librte_hash
diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
index abd5c46..757f3bc 100644
--- a/lib/librte_net/Makefile
+++ b/lib/librte_net/Makefile
@@ -39,10 +39,13 @@ EXPORT_MAP := rte_net_version.map
 LIBABIVER := 1
 
 SRCS-$(CONFIG_RTE_LIBRTE_NET) := rte_net.c
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += rte_net_crc.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc_sse.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
new file mode 100644
index 0000000..0f58f07
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.c
@@ -0,0 +1,199 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+
+#include <rte_net_crc.h>
+
+/** crc tables */
+static uint32_t crc32_eth_lut[CRC_LUT_SIZE];
+static uint32_t crc16_ccitt_lut[CRC_LUT_SIZE];
+
+static uint32_t
+rte_crc16_ccitt_handler(const uint8_t *data, uint32_t data_len);
+
+static uint32_t
+rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len);
+
+typedef uint32_t
+(*rte_net_crc_handler)(const uint8_t *data, uint32_t data_len);
+
+static rte_net_crc_handler *handlers;
+
+static rte_net_crc_handler handlers_scalar[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
+};
+
+#if defined(RTE_ARCH_X86_64) && defined(RTE_MACHINE_CPUFLAG_SSE4_2)
+static rte_net_crc_handler handlers_sse42[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler,
+};
+#endif
+
+/**
+ * Reflect the bits about the middle
+ *
+ * @param val
+ *   value to be reflected
+ *
+ * @return
+ *   reflected value
+ */
+static uint32_t
+reflect_32bits(uint32_t val)
+{
+	uint32_t i, res = 0;
+
+	for (i = 0; i < 32; i++)
+		if ((val & (1U << i)) != 0)
+			res |= 1U << (31 - i);
+
+	return res;
+}
+
+static void
+crc32_eth_init_lut(uint32_t poly,
+	uint32_t *lut)
+{
+	uint32_t i, j;
+
+	for (i = 0; i < CRC_LUT_SIZE; i++) {
+		uint32_t crc = reflect_32bits(i);
+
+		for (j = 0; j < 8; j++) {
+			if (crc & 0x80000000L)
+				crc = (crc << 1) ^ poly;
+			else
+				crc <<= 1;
+		}
+		lut[i] = reflect_32bits(crc);
+	}
+}
+
+static inline __attribute__((always_inline)) uint32_t
+crc32_eth_calc_lut(const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const uint32_t *lut)
+{
+	while (data_len--)
+		crc = lut[(crc ^ *data++) & 0xffL] ^ (crc >> 8);
+
+	return crc;
+}
+
+static void
+rte_net_crc_scalar_init(void)
+{
+	/** 32-bit crc init */
+	crc32_eth_init_lut(CRC32_ETH_POLYNOMIAL, crc32_eth_lut);
+
+	/** 16-bit CRC init */
+	crc32_eth_init_lut(CRC16_CCITT_POLYNOMIAL << 16, crc16_ccitt_lut);
+}
+
+static inline uint32_t
+rte_crc16_ccitt_handler(const uint8_t *data, uint32_t data_len)
+{
+	/** return 16-bit CRC value */
+	return (uint16_t)~crc32_eth_calc_lut(data,
+		data_len,
+		0xffff,
+		crc16_ccitt_lut);
+}
+
+static inline uint32_t
+rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
+{
+	/** return 32-bit CRC value */
+	return ~crc32_eth_calc_lut(data,
+		data_len,
+		0xffffffffUL,
+		crc32_eth_lut);
+}
+
+void
+rte_net_crc_set_alg(enum rte_net_crc_alg alg)
+{
+	switch (alg) {
+	case RTE_NET_CRC_SSE42:
+#ifdef RTE_ARCH_X86_64
+		if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2))
+			alg = RTE_NET_CRC_SCALAR;
+		else {
+			handlers = handlers_sse42;
+			break;
+		}
+#endif
+	case RTE_NET_CRC_SCALAR:
+	default:
+		handlers = handlers_scalar;
+		break;
+	}
+}
+
+uint32_t
+rte_net_crc_calc(const void *data,
+	uint32_t data_len,
+	enum rte_net_crc_type type)
+{
+	uint32_t ret;
+	rte_net_crc_handler f_handle;
+
+	f_handle = handlers[type];
+	ret = f_handle((const uint8_t *) data, data_len);
+
+	return ret;
+}
+
+/*
+ * Select the highest available CRC algorithm as the default.
+ */
+static inline void __attribute__((constructor))
+rte_net_crc_init(void)
+{
+	enum rte_net_crc_alg alg = RTE_NET_CRC_SCALAR;
+
+	rte_net_crc_scalar_init();
+
+#ifdef RTE_ARCH_X86_64
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2)) {
+		alg = RTE_NET_CRC_SSE42;
+		rte_net_crc_sse42_init();
+	}
+#endif
+
+	rte_net_crc_set_alg(alg);
+}
diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
new file mode 100644
index 0000000..dd6c110
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.h
@@ -0,0 +1,104 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_NET_CRC_H_
+#define _RTE_NET_CRC_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+#include <rte_mbuf.h>
+
+/** CRC polynomials */
+#define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
+#define CRC16_CCITT_POLYNOMIAL 0x1021U
+
+#define CRC_LUT_SIZE 256
+
+/** CRC types */
+enum rte_net_crc_type {
+	RTE_NET_CRC16_CCITT = 0,
+	RTE_NET_CRC32_ETH,
+	RTE_NET_CRC_REQS
+};
+
+/** CRC compute algorithm */
+enum rte_net_crc_alg {
+	RTE_NET_CRC_SCALAR = 0,
+	RTE_NET_CRC_SSE42,
+};
+
+/**
+ * This API sets the CRC computation algorithm (i.e. scalar version,
+ * x86 64-bit SSE4.2 intrinsic version, etc.) and internal data
+ * structure.
+ *
+ * @param alg
+ *   This parameter is used to select the CRC implementation version.
+ *   - RTE_NET_CRC_SCALAR
+ *   - RTE_NET_CRC_SSE42 (Use 64-bit SSE4.2 intrinsic)
+ */
+void
+rte_net_crc_set_alg(enum rte_net_crc_alg alg);
+
+/**
+ * CRC compute API
+ *
+ * @param data
+ *   Pointer to the packet data for CRC computation
+ * @param data_len
+ *   Data length for CRC computation
+ * @param type
+ *   CRC type (enum rte_net_crc_type)
+ *
+ * @return
+ *   CRC value
+ */
+uint32_t
+rte_net_crc_calc(const void *data,
+	uint32_t data_len,
+	enum rte_net_crc_type type);
+
+#if defined(RTE_ARCH_X86_64) && defined(RTE_MACHINE_CPUFLAG_SSE4_2)
+#include <rte_net_crc_sse.h>
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+
+#endif /* _RTE_NET_CRC_H_ */
diff --git a/lib/librte_net/rte_net_crc_sse.h b/lib/librte_net/rte_net_crc_sse.h
new file mode 100644
index 0000000..b04901c
--- /dev/null
+++ b/lib/librte_net/rte_net_crc_sse.h
@@ -0,0 +1,361 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_NET_CRC_SSE_H_
+#define _RTE_NET_CRC_SSE_H_
+
+#include <cpuid.h>
+#include <rte_net_crc.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** PCLMULQDQ CRC computation context structure */
+struct crc_pclmulqdq_ctx {
+	__m128i rk1_rk2;
+	__m128i rk5_rk6;
+	__m128i rk7_rk8;
+};
+
+struct crc_pclmulqdq_ctx crc32_eth_pclmulqdq __rte_aligned(16);
+struct crc_pclmulqdq_ctx crc16_ccitt_pclmulqdq __rte_aligned(16);
+/**
+ * @brief Performs one folding round
+ *
+ * Logically function operates as follows:
+ *     DATA = READ_NEXT_16BYTES();
+ *     F1 = LSB8(FOLD)
+ *     F2 = MSB8(FOLD)
+ *     T1 = CLMUL(F1, RK1)
+ *     T2 = CLMUL(F2, RK2)
+ *     FOLD = XOR(T1, T2, DATA)
+ *
+ * @param data_block
+ *   16 byte data block
+ * @param precomp
+ *   Precomputed rk1 and rk2 constants
+ * @param fold
+ *   Current 16-byte folded data
+ *
+ * @return
+ *   New 16 byte folded data
+ */
+static inline __attribute__((always_inline)) __m128i
+crcr32_folding_round(__m128i data_block,
+		__m128i precomp,
+		__m128i fold)
+{
+	__m128i tmp0 = _mm_clmulepi64_si128(fold, precomp, 0x01);
+	__m128i tmp1 = _mm_clmulepi64_si128(fold, precomp, 0x10);
+
+	return _mm_xor_si128(tmp1, _mm_xor_si128(data_block, tmp0));
+}
+
+/**
+ * Performs reduction from 128 bits to 64 bits
+ *
+ * @param data128
+ *   128 bits data to be reduced
+ * @param precomp
+ *   precomputed constants rk5, rk6
+ *
+ * @return
+ *  64 bits reduced data
+ */
+
+static inline __attribute__((always_inline)) __m128i
+crcr32_reduce_128_to_64(__m128i data128, __m128i precomp)
+{
+	__m128i tmp0, tmp1, tmp2;
+
+	/* 64b fold */
+	tmp0 = _mm_clmulepi64_si128(data128, precomp, 0x00);
+	tmp1 = _mm_srli_si128(data128, 8);
+	tmp0 = _mm_xor_si128(tmp0, tmp1);
+
+	/* 32b fold */
+	tmp2 = _mm_slli_si128(tmp0, 4);
+	tmp1 = _mm_clmulepi64_si128(tmp2, precomp, 0x10);
+
+	return _mm_xor_si128(tmp1, tmp0);
+}
+
+/**
+ * Performs Barrett's reduction from 64 bits to 32 bits
+ *
+ * @param data64
+ *   64 bits data to be reduced
+ * @param precomp
+ *   rk7 precomputed constant
+ *
+ * @return
+ *   reduced 32 bits data
+ */
+
+static inline __attribute__((always_inline)) uint32_t
+crcr32_reduce_64_to_32(__m128i data64, __m128i precomp)
+{
+	static const uint32_t mask1[4] __rte_aligned(16) = {
+		0xffffffff, 0xffffffff, 0x00000000, 0x00000000
+	};
+
+	static const uint32_t mask2[4] __rte_aligned(16) = {
+		0x00000000, 0xffffffff, 0xffffffff, 0xffffffff
+	};
+	__m128i tmp0, tmp1, tmp2;
+
+	tmp0 = _mm_and_si128(data64, _mm_load_si128((const __m128i *)mask2));
+
+	tmp1 = _mm_clmulepi64_si128(tmp0, precomp, 0x00);
+	tmp1 = _mm_xor_si128(tmp1, tmp0);
+	tmp1 = _mm_and_si128(tmp1, _mm_load_si128((const __m128i *)mask1));
+
+	tmp2 = _mm_clmulepi64_si128(tmp1, precomp, 0x10);
+	tmp2 = _mm_xor_si128(tmp2, tmp1);
+	tmp2 = _mm_xor_si128(tmp2, tmp0);
+
+	return _mm_extract_epi32(tmp2, 2);
+}
+
+static const uint8_t crc_xmm_shift_tab[48] __rte_aligned(16) = {
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+};
+
+/**
+ * Shifts left 128 bit register by specified number of bytes
+ *
+ * @param reg
+ *   128 bit value
+ * @param num
+ *   number of bytes to shift left reg by (0-16)
+ *
+ * @return
+ *   reg << (num * 8)
+ */
+
+static inline __attribute__((always_inline)) __m128i
+xmm_shift_left(__m128i reg, const unsigned int num)
+{
+	const __m128i *p = (const __m128i *)(crc_xmm_shift_tab + 16 - num);
+
+	return _mm_shuffle_epi8(reg, _mm_loadu_si128(p));
+}
+
+static inline __attribute__((always_inline)) uint32_t
+crc32_eth_calc_pclmulqdq(
+	const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const struct crc_pclmulqdq_ctx *params)
+{
+	__m128i temp, fold, k;
+	uint32_t n;
+
+	/* Get CRC init value */
+	temp = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);
+
+	/**
+	 * Folding all data into single 16 byte data block
+	 * Assumes: fold holds first 16 bytes of data
+	 */
+
+	if (unlikely(data_len < 32)) {
+		if (unlikely(data_len == 16)) {
+			/* 16 bytes */
+			fold = _mm_loadu_si128((const __m128i *)data);
+			fold = _mm_xor_si128(fold, temp);
+			goto reduction_128_64;
+		}
+
+		if (unlikely(data_len < 16)) {
+			/* 0 to 15 bytes */
+			uint8_t buffer[16] __rte_aligned(16);
+
+			memset(buffer, 0, sizeof(buffer));
+			memcpy(buffer, data, data_len);
+
+			fold = _mm_load_si128((const __m128i *)buffer);
+			fold = _mm_xor_si128(fold, temp);
+			if (unlikely(data_len < 4)) {
+				fold = xmm_shift_left(fold, 8 - data_len);
+				goto barret_reduction;
+			}
+			fold = xmm_shift_left(fold, 16 - data_len);
+			goto reduction_128_64;
+		}
+		/* 17 to 31 bytes */
+		fold = _mm_loadu_si128((const __m128i *)data);
+		fold = _mm_xor_si128(fold, temp);
+		n = 16;
+		k = params->rk1_rk2;
+		goto partial_bytes;
+	}
+
+	/** At least 32 bytes in the buffer */
+	/** Apply CRC initial value */
+	fold = _mm_loadu_si128((const __m128i *)data);
+	fold = _mm_xor_si128(fold, temp);
+
+	/** Main folding loop - the last 16 bytes are processed separately */
+	k = params->rk1_rk2;
+	for (n = 16; (n + 16) <= data_len; n += 16) {
+		temp = _mm_loadu_si128((const __m128i *)&data[n]);
+		fold = crcr32_folding_round(temp, k, fold);
+	}
+
+partial_bytes:
+	if (likely(n < data_len)) {
+
+		const uint32_t mask3[4] __rte_aligned(16) = {
+			0x80808080, 0x80808080, 0x80808080, 0x80808080
+		};
+
+		const uint8_t shf_table[32] __rte_aligned(16) = {
+			0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+			0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+			0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+		};
+
+		__m128i last16, a, b;
+
+		last16 = _mm_loadu_si128((const __m128i *)&data[data_len - 16]);
+
+		temp = _mm_loadu_si128((const __m128i *)
+			&shf_table[data_len & 15]);
+		a = _mm_shuffle_epi8(fold, temp);
+
+		temp = _mm_xor_si128(temp,
+			_mm_load_si128((const __m128i *)mask3));
+		b = _mm_shuffle_epi8(fold, temp);
+		b = _mm_blendv_epi8(b, last16, temp);
+
+		/* k = rk1 & rk2 */
+		temp = _mm_clmulepi64_si128(a, k, 0x01);
+		fold = _mm_clmulepi64_si128(a, k, 0x10);
+
+		fold = _mm_xor_si128(fold, temp);
+		fold = _mm_xor_si128(fold, b);
+	}
+
+	/** Reduction 128 -> 32 Assumes: fold holds 128bit folded data */
+reduction_128_64:
+	k = params->rk5_rk6;
+	fold = crcr32_reduce_128_to_64(fold, k);
+
+barret_reduction:
+	k = params->rk7_rk8;
+	n = crcr32_reduce_64_to_32(fold, k);
+
+	return n;
+}
+
+
+static inline void
+rte_net_crc_sse42_init(void)
+{
+	uint64_t k1, k2, k5, k6;
+	uint64_t p = 0, q = 0;
+
+	/** Initialize CRC16 data */
+	k1 = 0x189aeLLU;
+	k2 = 0x8e10LLU;
+	k5 = 0x189aeLLU;
+	k6 = 0x114aaLLU;
+	q =  0x11c581910LLU;
+	p =  0x10811LLU;
+
+	/** Save the params in context structure */
+	crc16_ccitt_pclmulqdq.rk1_rk2 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k1), _mm_cvtsi64_m64(k2));
+	crc16_ccitt_pclmulqdq.rk5_rk6 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k5), _mm_cvtsi64_m64(k6));
+	crc16_ccitt_pclmulqdq.rk7_rk8 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(q), _mm_cvtsi64_m64(p));
+
+	/** Initialize CRC32 data */
+	k1 = 0xccaa009eLLU;
+	k2 = 0x1751997d0LLU;
+	k5 = 0xccaa009eLLU;
+	k6 = 0x163cd6124LLU;
+	q =  0x1f7011640LLU;
+	p =  0x1db710641LLU;
+
+	/** Save the params in context structure */
+	crc32_eth_pclmulqdq.rk1_rk2 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k1), _mm_cvtsi64_m64(k2));
+	crc32_eth_pclmulqdq.rk5_rk6 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k5), _mm_cvtsi64_m64(k6));
+	crc32_eth_pclmulqdq.rk7_rk8 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(q), _mm_cvtsi64_m64(p));
+
+	/**
+	 * Clear the MMX state (EMMS) left by the __m64 conversions above,
+	 * since MMX registers alias the x87 floating-point register stack.
+	 */
+	_mm_empty();
+
+}
+
+static inline uint32_t
+rte_crc16_ccitt_sse42_handler(const uint8_t *data,
+	uint32_t data_len)
+{
+	/** return 16-bit CRC value */
+	return (uint16_t)~crc32_eth_calc_pclmulqdq(data,
+		data_len,
+		0xffff,
+		&crc16_ccitt_pclmulqdq);
+}
+
+static inline uint32_t
+rte_crc32_eth_sse42_handler(const uint8_t *data,
+	uint32_t data_len)
+{
+	return ~crc32_eth_calc_pclmulqdq(data,
+		data_len,
+		0xffffffffUL,
+		&crc32_eth_pclmulqdq);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_NET_CRC_SSE_H_ */
diff --git a/lib/librte_net/rte_net_version.map b/lib/librte_net/rte_net_version.map
index 3b15e65..687c40e 100644
--- a/lib/librte_net/rte_net_version.map
+++ b/lib/librte_net/rte_net_version.map
@@ -4,3 +4,11 @@ DPDK_16.11 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_net_crc_calc;
+	rte_net_crc_set_alg;
+
+} DPDK_16.11;
-- 
2.5.5
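
The `xmm_shift_left()` helper above relies on PSHUFB (`_mm_shuffle_epi8`): each control byte either selects a source byte or, when its top bit is set, produces zero, so loading 16 control bytes from `crc_xmm_shift_tab + 16 - num` gives a byte-wise left shift by `num`. The scalar sketch below models that behaviour so the table trick can be checked without SSE; `shuffle_bytes()`, `shift_left_bytes()` and `shift_tab` are illustrative names, not part of the patch.

```c
#include <stdint.h>

/* Same layout as crc_xmm_shift_tab in the patch: 16 x 0xff, 0x00..0x0f, 16 x 0xff. */
static const uint8_t shift_tab[48] = {
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
};

/*
 * Scalar model of PSHUFB: a control byte with bit 7 set yields 0,
 * otherwise its low 4 bits select a source byte. out must not alias src.
 */
static void
shuffle_bytes(uint8_t out[16], const uint8_t src[16], const uint8_t ctl[16])
{
	int i;

	for (i = 0; i < 16; i++)
		out[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0f];
}

/*
 * Left shift of a 16-byte little-endian value by num (0-16) bytes,
 * using the same 16-byte window into the table as xmm_shift_left().
 */
static void
shift_left_bytes(uint8_t out[16], const uint8_t src[16], unsigned int num)
{
	shuffle_bytes(out, src, shift_tab + 16 - num);
}
```

A right shift by `num` bytes would use the window starting at `shift_tab + 16 + num`, which is what the trailing 0xff bytes make possible.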

* [dpdk-dev] [PATCH v6 2/2] test/test: add unit test for CRC computation
  2017-03-29 12:42                   ` [dpdk-dev] [PATCH v6 0/2] librte_net: add crc computation support Jasvinder Singh
  2017-03-29 12:42                     ` [dpdk-dev] [PATCH v6 1/2] librte_net: add crc compute APIs Jasvinder Singh
@ 2017-03-29 12:42                     ` Jasvinder Singh
  2017-03-29 16:12                       ` De Lara Guarch, Pablo
  1 sibling, 1 reply; 69+ messages in thread
From: Jasvinder Singh @ 2017-03-29 12:42 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, declan.doherty, pablo.de.lara.guarch

This patch provides a set of tests for verifying the functional
correctness of the 16-bit and 32-bit CRC APIs.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
---
 test/test/Makefile   |   2 +
 test/test/test_crc.c | 241 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 243 insertions(+)
 create mode 100644 test/test/test_crc.c

diff --git a/test/test/Makefile b/test/test/Makefile
index 79f0c61..06d8d5d 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -160,6 +160,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_cirbuf.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_string.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_lib.c
 
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += test_crc.c
+
 ifeq ($(CONFIG_RTE_LIBRTE_SCHED),y)
 SRCS-y += test_red.c
 SRCS-y += test_sched.c
diff --git a/test/test/test_crc.c b/test/test/test_crc.c
new file mode 100644
index 0000000..c25790d
--- /dev/null
+++ b/test/test/test_crc.c
@@ -0,0 +1,241 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "test.h"
+
+#include <rte_malloc.h>
+#include <rte_net_crc.h>
+
+#define CRC_VEC_LEN        32
+#define CRC32_VEC_LEN1     1512
+#define CRC32_VEC_LEN2     348
+#define CRC16_VEC_LEN1     12
+#define CRC16_VEC_LEN2     2
+#define LINE_LEN           75
+
+/* CRC test vector */
+static const uint8_t crc_vec[CRC_VEC_LEN] = {
+	'0', '1', '2', '3', '4', '5', '6', '7',
+	'8', '9', 'a', 'b', 'c', 'd', 'e', 'f',
+	'g', 'h', 'i', 'j', 'A', 'B', 'C', 'D',
+	'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L',
+};
+
+/* 32-bit CRC test vector */
+static const uint8_t crc32_vec1[12] = {
+	0xBE, 0xD7, 0x23, 0x47, 0x6B, 0x8F,
+	0xB3, 0x14, 0x5E, 0xFB, 0x35, 0x59,
+};
+
+/* 16-bit CRC test vector 1 */
+static const uint8_t crc16_vec1[CRC16_VEC_LEN1] = {
+	0x0D, 0x01, 0x01, 0x23, 0x45, 0x67,
+	0x89, 0x01, 0x23, 0x45, 0x00, 0x01,
+};
+
+/* 16-bit CRC test vector 2 */
+static const uint8_t crc16_vec2[CRC16_VEC_LEN2] = {
+	0x03, 0x3f,
+};
+/** CRC results */
+static const uint32_t crc32_vec_res = 0xb491aab4;
+static const uint32_t crc32_vec1_res = 0xac54d294;
+static const uint32_t crc32_vec2_res = 0xefaae02f;
+static const uint16_t crc16_vec_res = 0x6bec;
+static const uint16_t crc16_vec1_res = 0x8cdd;
+static const uint16_t crc16_vec2_res = 0xec5b;
+
+static uint32_t
+crc_calc(const uint8_t *vec,
+	uint32_t vec_len,
+	enum rte_net_crc_type type)
+{
+	/* compute CRC */
+	uint32_t ret = rte_net_crc_calc(vec, vec_len, type);
+
+	/* dump data on console */
+	TEST_HEXDUMP(stdout, NULL, vec, vec_len);
+
+	return ret;
+}
+
+static int
+test_crc_scalar(void)
+{
+	uint32_t i;
+	enum rte_net_crc_type type;
+	uint8_t *test_data;
+	uint32_t ret;
+
+	/* 32-bit ethernet CRC: scalar result */
+	type = RTE_NET_CRC32_ETH;
+	ret = crc_calc(crc_vec, CRC_VEC_LEN, type);
+	if (ret != crc32_vec_res) {
+		printf("test_crc(32-bit, scalar): test1 failed!!\n");
+		return -1;
+	}
+
+	/* 32-bit ethernet CRC: scalar result */
+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN1, 0);
+
+	for (i = 0; i < CRC32_VEC_LEN1; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	ret = crc_calc(test_data, CRC32_VEC_LEN1, type);
+	if (ret != crc32_vec1_res) {
+		printf("test_crc(32-bit, scalar): test2 failed!!\n");
+		rte_free(test_data);
+		return -1;
+	}
+
+	/* 32-bit ethernet CRC scalar result */
+	for (i = 0; i < CRC32_VEC_LEN2; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	ret = crc_calc(test_data, CRC32_VEC_LEN2, type);
+	if (ret != crc32_vec2_res) {
+		printf("test_crc(32-bit, scalar): test3 failed!!\n");
+		rte_free(test_data);
+		return -1;
+	}
+
+	/* 16-bit CCITT CRC scalar result */
+	type = RTE_NET_CRC16_CCITT;
+	ret = crc_calc(crc_vec, CRC_VEC_LEN, type);
+	if (ret != crc16_vec_res) {
+		printf("test_crc (16-bit, scalar): test4 failed!!\n");
+		return -1;
+	}
+
+	/* 16-bit CCITT CRC scalar result */
+	ret = crc_calc(crc16_vec1, CRC16_VEC_LEN1, type);
+	if (ret != crc16_vec1_res) {
+		printf("test_crc (16-bit, scalar): test5 failed!!\n");
+		return -1;
+	}
+	/* 16-bit CCITT CRC scalar result */
+	ret = crc_calc(crc16_vec2, CRC16_VEC_LEN2, type);
+	if (ret != crc16_vec2_res) {
+		printf("test_crc (16-bit, scalar): test6 failed!!\n");
+		return -1;
+	}
+
+	rte_free(test_data);
+	return 0;
+}
+
+static int
+test_crc_x86_sse42(void)
+{
+	uint32_t i;
+	enum rte_net_crc_type type;
+	uint8_t *test_data;
+	uint32_t ret;
+
+	/* 32-bit ethernet CRC: Test 1 */
+	type = RTE_NET_CRC32_ETH;
+
+	ret = crc_calc(crc_vec, CRC_VEC_LEN, type);
+	if (ret != crc32_vec_res) {
+		printf("test_crc(32-bit, x86_SSE4.2): test7 failed!!\n");
+		return -1;
+	}
+
+	/* 32-bit ethernet CRC: Test 2 */
+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN1, 0);
+
+	for (i = 0; i < CRC32_VEC_LEN1; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	ret = crc_calc(test_data, CRC32_VEC_LEN1, type);
+	if (ret != crc32_vec1_res) {
+		printf("test_crc(32-bit, x86_SSE4.2): test8 failed!!\n");
+		rte_free(test_data);
+		return -1;
+	}
+
+	/* 32-bit ethernet CRC: Test 3 */
+	for (i = 0; i < CRC32_VEC_LEN2; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	ret = crc_calc(test_data, CRC32_VEC_LEN2, type);
+	if (ret != crc32_vec2_res) {
+		printf("test_crc(32-bit, x86_SSE4.2): test9 failed!!\n");
+		rte_free(test_data);
+		return -1;
+	}
+
+
+	/* 16-bit CCITT CRC: Test 4 */
+	type = RTE_NET_CRC16_CCITT;
+	ret = crc_calc(crc_vec, CRC_VEC_LEN, type);
+	if (ret != crc16_vec_res) {
+		printf("test_crc (16-bit, x86_SSE4.2): test10 failed!!\n");
+		return -1;
+	}
+
+	/* 16-bit CCITT CRC: Test 5 */
+	ret = crc_calc(crc16_vec1, CRC16_VEC_LEN1, type);
+	if (ret != crc16_vec1_res) {
+		printf("test_crc (16-bit, x86_SSE4.2): test11 failed!!\n");
+		return -1;
+	}
+
+	/* 16-bit CCITT CRC: Test 6 */
+	ret = crc_calc(crc16_vec2, CRC16_VEC_LEN2, type);
+	if (ret != crc16_vec2_res) {
+		printf("test_crc (16-bit, x86_SSE4.2): test12 failed!!\n");
+		return -1;
+	}
+
+	rte_free(test_data);
+	return 0;
+}
+
+static int
+test_crc(void)
+{
+	/* set CRC scalar mode */
+	rte_net_crc_set_alg(RTE_NET_CRC_SCALAR);
+	if (test_crc_scalar() == -1)
+		return -1;
+
+	/* set CRC sse4.2 mode */
+	rte_net_crc_set_alg(RTE_NET_CRC_SSE42);
+	if (test_crc_x86_sse42() == -1)
+		return -1;
+
+	return 0;
+}
+
+REGISTER_TEST_COMMAND(crc_autotest, test_crc);
-- 
2.5.5

* Re: [dpdk-dev] [PATCH v6 2/2] test/test: add unit test for CRC computation
  2017-03-29 12:42                     ` [dpdk-dev] [PATCH v6 " Jasvinder Singh
@ 2017-03-29 16:12                       ` De Lara Guarch, Pablo
  0 siblings, 0 replies; 69+ messages in thread
From: De Lara Guarch, Pablo @ 2017-03-29 16:12 UTC (permalink / raw)
  To: Singh, Jasvinder, dev; +Cc: olivier.matz, Doherty, Declan

Hi Jasvinder,

> -----Original Message-----
> From: Singh, Jasvinder
> Sent: Wednesday, March 29, 2017 1:42 PM
> To: dev@dpdk.org
> Cc: olivier.matz@6wind.com; Doherty, Declan; De Lara Guarch, Pablo
> Subject: [PATCH v6 2/2] test/test: add unit test for CRC computation
> 
> This patch provides a set of tests for verifying the functional
> correctness of the 16-bit and 32-bit CRC APIs.
> 
> Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
> ---
>  test/test/Makefile   |   2 +
>  test/test/test_crc.c | 241
> +++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 243 insertions(+)
>  create mode 100644 test/test/test_crc.c
> 

...

> diff --git a/test/test/test_crc.c b/test/test/test_crc.c
> new file mode 100644
> index 0000000..c25790d
> --- /dev/null
> +++ b/test/test/test_crc.c
> @@ -0,0 +1,241 @@

...

> +static int
> +test_crc_scalar(void)

Could you merge this function and the next one into a single function?
They are duplicates (except for the comments/printfs, which can be passed as a parameter of the function).

Apart from this:

Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
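
One possible shape for the merge requested above is sketched below. It is a generic illustration, not the v7 code: the CRC routine is passed in as a function pointer together with the label used in failure messages, and `crc32_bitwise()` is only a stand-in so the sketch runs on its own. With DPDK, `fn` would presumably be a small wrapper around `rte_net_crc_calc()` for one `enum rte_net_crc_type`, called once after each `rte_net_crc_set_alg()`. All names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical signature standing in for the CRC routine under test. */
typedef uint32_t (*crc_fn_t)(const uint8_t *data, uint32_t len);

struct crc_test_case {
	const uint8_t *data;
	uint32_t len;
	uint32_t expected;
};

/* Stand-in implementation: bitwise reflected CRC-32 (Ethernet polynomial). */
static uint32_t
crc32_bitwise(const uint8_t *data, uint32_t len)
{
	uint32_t crc = 0xffffffffu;
	int bit;

	while (len--) {
		crc ^= *data++;
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xedb88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Standard CRC-32 check value: CRC of the ASCII string "123456789". */
static const struct crc_test_case crc32_cases[] = {
	{ (const uint8_t *)"123456789", 9, 0xcbf43926u },
};

/* One test body shared by every implementation; only the label differs. */
static int
test_crc_cases(const char *label, crc_fn_t fn,
	const struct crc_test_case *cases, size_t n_cases)
{
	size_t i;

	for (i = 0; i < n_cases; i++) {
		uint32_t got = fn(cases[i].data, cases[i].len);

		if (got != cases[i].expected) {
			printf("test_crc(%s): case %zu failed: 0x%08x != 0x%08x\n",
				label, i + 1, (unsigned int)got,
				(unsigned int)cases[i].expected);
			return -1;
		}
	}
	return 0;
}
```

Adding a new algorithm then only means adding one more call with a different label, instead of copying the whole test body.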

* Re: [dpdk-dev] [PATCH v6 1/2] librte_net: add crc compute APIs
  2017-03-29 12:42                     ` [dpdk-dev] [PATCH v6 1/2] librte_net: add crc compute APIs Jasvinder Singh
@ 2017-03-29 16:14                       ` De Lara Guarch, Pablo
  2017-03-29 17:15                       ` [dpdk-dev] [PATCH v7 0/2] librte_net: add crc computation support Jasvinder Singh
  1 sibling, 0 replies; 69+ messages in thread
From: De Lara Guarch, Pablo @ 2017-03-29 16:14 UTC (permalink / raw)
  To: Singh, Jasvinder, dev; +Cc: olivier.matz, Doherty, Declan



> -----Original Message-----
> From: Singh, Jasvinder
> Sent: Wednesday, March 29, 2017 1:42 PM
> To: dev@dpdk.org
> Cc: olivier.matz@6wind.com; Doherty, Declan; De Lara Guarch, Pablo
> Subject: [PATCH v6 1/2] librte_net: add crc compute APIs
> 
> APIs for selecting the architecture-specific implementation and computing
> the CRC (16-bit and 32-bit CRCs) are added. For CRC calculation, scalar
> as well as x86 intrinsic (SSE4.2) versions are implemented.
> 
> The scalar version is based on the generic Look-Up Table (LUT) algorithm,
> while the x86 intrinsic version uses carry-less multiplication for
> fast CRC computation.
> 
> Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>

Thanks for your work addressing all my comments.

Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

* [dpdk-dev] [PATCH v7 0/2] librte_net: add crc computation support
  2017-03-29 12:42                     ` [dpdk-dev] [PATCH v6 1/2] librte_net: add crc compute APIs Jasvinder Singh
  2017-03-29 16:14                       ` De Lara Guarch, Pablo
@ 2017-03-29 17:15                       ` Jasvinder Singh
  2017-03-29 17:15                         ` [dpdk-dev] [PATCH v7 1/2] librte_net: add crc compute APIs Jasvinder Singh
  2017-03-29 17:15                         ` [dpdk-dev] [PATCH v7 " Jasvinder Singh
  1 sibling, 2 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-03-29 17:15 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, declan.doherty, pablo.de.lara.guarch

In some applications, a CRC (Cyclic Redundancy Check) needs to be computed
or updated during packet processing. This patchset adds software
implementations of two common standard CRCs: the 32-bit Ethernet CRC as per
Ethernet/[ISO/IEC 8802-3] and the 16-bit CCITT CRC [ITU-T X.25]. Two
versions of each of the 32-bit and 16-bit CRC calculations are proposed.

The first version provides fast and efficient CRC generation on IA
processors using the carry-less multiplication instruction PCLMULQDQ
through x86 SIMD intrinsics (this path is enabled via the SSE4.2 CPU
flag). It uses a parallelized folding approach to first reduce an
arbitrary-length buffer to a single fixed-size 16-byte block with the help
of precomputed constants. This 16-byte block is then reduced with the
Barrett reduction method to produce the final CRC value. For more details
on the implementation, see reference [1].
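
Both folding and Barrett reduction are built on carry-less (GF(2) polynomial) multiplication, which PCLMULQDQ performs in hardware. The portable sketch below is not the patch's code: it works in the non-reflected bit order with plain 64-bit integers, computes mu = floor(x^64 / P(x)) for the Ethernet polynomial, and checks that Barrett reduction (two carry-less multiplies and an XOR) gives the same remainder as bit-by-bit long division. The patch stores bit-reflected constants instead; for example its `p = 0x1db710641` is the 33-bit reflection of P(x) = 0x104c11db7. Helper names are hypothetical.

```c
#include <stdint.h>

/* Ethernet CRC-32 generator P(x) = x^32 + 0x04c11db7 (33 bits, non-reflected). */
#define CRC32_P 0x104c11db7ULL

/*
 * Carry-less multiply (XOR instead of ADD), keeping the low 64 bits.
 * The callers below only use operands whose product fits in 64 bits.
 */
static uint64_t
clmul_lo64(uint64_t a, uint64_t b)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 64; i++)
		if ((b >> i) & 1)
			r ^= a << i;
	return r;
}

/* Reference remainder a(x) mod P(x) by bit-by-bit polynomial long division. */
static uint32_t
poly_mod_p(uint64_t a)
{
	int bit;

	for (bit = 63; bit >= 32; bit--)
		if ((a >> bit) & 1)
			a ^= CRC32_P << (bit - 32);
	return (uint32_t)a;
}

/* mu = floor(x^64 / P(x)): long division of x^64 (only bit 64 set) by P(x). */
static uint64_t
barrett_mu(void)
{
	uint64_t rem = 0, quot = 0;
	int bit;

	for (bit = 64; bit >= 0; bit--) {
		/* bring down the next dividend bit */
		rem = (rem << 1) | (uint64_t)(bit == 64);
		quot <<= 1;
		if (rem & (1ULL << 32)) {	/* degree 32 reached: subtract (XOR) P */
			rem ^= CRC32_P;
			quot |= 1;
		}
	}
	return quot;
}

/*
 * Barrett reduction of a 64-bit polynomial a(x) modulo P(x):
 *   q(x) = floor(floor(a / x^32) * mu / x^32)
 *   r(x) = a(x) + q(x) * P(x)   (addition is XOR in GF(2))
 * For deg a < 64 the estimated quotient is exact, so r has degree < 32.
 */
static uint32_t
barrett_reduce(uint64_t a, uint64_t mu)
{
	uint64_t q = clmul_lo64(a >> 32, mu) >> 32;

	return (uint32_t)(a ^ clmul_lo64(q, CRC32_P));
}
```

For the Ethernet polynomial mu comes out as 0x104d101df. In the patch, the same two multiplies appear as the `_mm_clmulepi64_si128()` calls in `crcr32_reduce_64_to_32()`, operating on reflected data.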

The second version is a fallback that generates the CRC without requiring
any specific CPU support (for example, SSE4.2 intrinsics). It is based on
the generic Look-Up Table (LUT) algorithm, which uses a precomputed
256-entry table as explained in reference [2].
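
A portable sketch of the table-driven method follows. It mirrors the recurrence in the patch's `crc32_eth_calc_lut()` but builds each table directly from the reflected polynomial (0xedb88320 for 0x04c11db7, 0x8408 for 0x1021) rather than reflecting every entry as `crc32_eth_init_lut()` does; the function names are hypothetical. The 16-bit parameters (initial value 0xffff, final inversion) follow `rte_crc16_ccitt_handler()`, which corresponds to the CRC catalogue entry CRC-16/X-25. The test uses the standard check values for the ASCII string "123456789".

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected (LSB-first) forms of the patch's polynomials. */
#define CRC32_ETH_POLY_REFLECTED   0xedb88320u	/* 0x04c11db7 bit-reversed */
#define CRC16_CCITT_POLY_REFLECTED 0x8408u	/* 0x1021 bit-reversed */

static uint32_t crc32_lut[256];
static uint32_t crc16_lut[256];

/* Entry i is the CRC register after shifting byte i through 8 LSB-first steps. */
static void
crc_init_lut(uint32_t poly_reflected, uint32_t lut[256])
{
	uint32_t i;
	int bit;

	for (i = 0; i < 256; i++) {
		uint32_t crc = i;

		for (bit = 0; bit < 8; bit++)
			crc = (crc & 1u) ? (crc >> 1) ^ poly_reflected : crc >> 1;
		lut[i] = crc;
	}
}

/* One table lookup per input byte, same recurrence as crc32_eth_calc_lut(). */
static uint32_t
crc_calc_lut(const uint8_t *data, size_t len, uint32_t crc,
	const uint32_t lut[256])
{
	while (len--)
		crc = lut[(crc ^ *data++) & 0xffu] ^ (crc >> 8);
	return crc;
}

static uint32_t
crc32_eth(const uint8_t *data, size_t len)
{
	return ~crc_calc_lut(data, len, 0xffffffffu, crc32_lut);
}

static uint16_t
crc16_x25(const uint8_t *data, size_t len)
{
	return (uint16_t)~crc_calc_lut(data, len, 0xffffu, crc16_lut);
}
```

The cost is one table lookup per byte plus a 1 KiB table per polynomial (256 entries of `uint32_t`), with no dependency on CPU features.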

Initialisation runs automatically when the library is loaded (through a
constructor function): all the data structures required for CRC
computation are set up, and either the x86-specific CRC implementation (if
supported by the platform) or the scalar version is enabled.

The following APIs have been added:

(i)  rte_net_crc_set_alg()
(ii) rte_net_crc_calc()

The first API (i) allows the user to select a specific CRC implementation
at run time (requesting the SSE4.2 version on a CPU without SSE4.2 falls
back to the scalar version), while the second API (ii) computes the 16-bit
or 32-bit CRC.
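
The selection logic behind these two APIs is essentially a swap of function-pointer tables indexed by CRC type (see `rte_net_crc_set_alg()` and `rte_net_crc_calc()` in patch 1/2). The model below is a simplified, hypothetical reconstruction: the `demo_*` names, the `fast_path_available` flag standing in for the CPU-flag check, and the placeholder handlers that only return a tag are all illustrative. It shows how a request for the SSE4.2 path falls back to the scalar table when the CPU lacks support.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for enum rte_net_crc_type / enum rte_net_crc_alg. */
enum demo_crc_type { DEMO_CRC16_CCITT, DEMO_CRC32_ETH, DEMO_CRC_TYPES };
enum demo_crc_alg { DEMO_CRC_SCALAR, DEMO_CRC_SSE42 };

typedef uint32_t (*demo_crc_handler)(const uint8_t *data, uint32_t len);

/* Placeholder handlers: they return a tag instead of a CRC. */
static uint32_t
scalar_handler(const uint8_t *data, uint32_t len)
{
	(void)data;
	(void)len;
	return 1;
}

static uint32_t
simd_handler(const uint8_t *data, uint32_t len)
{
	(void)data;
	(void)len;
	return 2;
}

static const demo_crc_handler handlers_scalar[DEMO_CRC_TYPES] = {
	[DEMO_CRC16_CCITT] = scalar_handler,
	[DEMO_CRC32_ETH] = scalar_handler,
};

static const demo_crc_handler handlers_simd[DEMO_CRC_TYPES] = {
	[DEMO_CRC16_CCITT] = simd_handler,
	[DEMO_CRC32_ETH] = simd_handler,
};

/* Stands in for the runtime CPU-flag check (e.g. SSE4.2 support). */
static bool fast_path_available;

/* Currently selected table; one global shared by all callers. */
static const demo_crc_handler *handlers = handlers_scalar;

static void
demo_crc_set_alg(enum demo_crc_alg alg)
{
	if (alg == DEMO_CRC_SSE42 && fast_path_available)
		handlers = handlers_simd;
	else
		handlers = handlers_scalar;	/* scalar requested, or no CPU support */
}

static uint32_t
demo_crc_calc(const void *data, uint32_t len, enum demo_crc_type type)
{
	return handlers[type]((const uint8_t *)data, len);
}
```

As in the patch, the table pointer is a single unsynchronized global, so switching algorithms is probably best done before worker threads start calling the compute function.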

References:
[1] Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
[2] A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS
http://www.ross.net/crc/download/crc_v3.txt

v7 changes:
- remove the duplicate function in unit test.

v6 changes:
- fix build error when compiling the net library as a shared library
- address review comments on v5 (thanks Pablo)

v5 changes:
- rebase on master

v4 changes:
- change CRC compute API parameters to make it more generic
- change the unit test to accommodate the CRC compute API change

v3 changes:
- separate the x86 specific implementation into new file
- improve the unit test

v2 changes:
- fix build errors for target i686-native-linuxapp-gcc
- fix checkpatch warnings

Notes:
- The build fails with clang versions earlier than 3.7.0 due to missing
  intrinsics. Refer to the DPDK known issues section for more details.

Jasvinder Singh (2):
  librte_net: add crc compute APIs
  test/test: add unit test for CRC computation

 lib/Makefile                       |   2 +-
 lib/librte_net/Makefile            |   3 +
 lib/librte_net/rte_net_crc.c       | 199 ++++++++++++++++++++
 lib/librte_net/rte_net_crc.h       | 104 +++++++++++
 lib/librte_net/rte_net_crc_sse.h   | 361 +++++++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_version.map |   8 +
 test/test/Makefile                 |   2 +
 test/test/test_crc.c               | 173 ++++++++++++++++++
 8 files changed, 851 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h
 create mode 100644 lib/librte_net/rte_net_crc_sse.h
 create mode 100644 test/test/test_crc.c

-- 
2.5.5

* [dpdk-dev] [PATCH v7 1/2] librte_net: add crc compute APIs
  2017-03-29 17:15                       ` [dpdk-dev] [PATCH v7 0/2] librte_net: add crc computation support Jasvinder Singh
@ 2017-03-29 17:15                         ` Jasvinder Singh
  2017-03-30 11:30                           ` [dpdk-dev] [PATCH v8 0/2] librte_net: add crc computation support Jasvinder Singh
  2017-03-29 17:15                         ` [dpdk-dev] [PATCH v7 " Jasvinder Singh
  1 sibling, 1 reply; 69+ messages in thread
From: Jasvinder Singh @ 2017-03-29 17:15 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, declan.doherty, pablo.de.lara.guarch

APIs for selecting the architecture-specific implementation and computing
the CRC (16-bit and 32-bit CRCs) are added. For CRC calculation, scalar
as well as x86 intrinsic (SSE4.2) versions are implemented.

The scalar version is based on the generic Look-Up Table (LUT) algorithm,
while the x86 intrinsic version uses carry-less multiplication for
fast CRC computation.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
---
 lib/Makefile                       |   2 +-
 lib/librte_net/Makefile            |   3 +
 lib/librte_net/rte_net_crc.c       | 199 ++++++++++++++++++++
 lib/librte_net/rte_net_crc.h       | 104 +++++++++++
 lib/librte_net/rte_net_crc_sse.h   | 361 +++++++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_version.map |   8 +
 6 files changed, 676 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h
 create mode 100644 lib/librte_net/rte_net_crc_sse.h

diff --git a/lib/Makefile b/lib/Makefile
index 5ad3c7c..456eb38 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -62,7 +62,7 @@ DEPDIRS-librte_lpm := librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_ACL) += librte_acl
 DEPDIRS-librte_acl := librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_NET) += librte_net
-DEPDIRS-librte_net := librte_mbuf
+DEPDIRS-librte_net := librte_mbuf librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += librte_ip_frag
 DEPDIRS-librte_ip_frag := librte_eal librte_mempool librte_mbuf librte_ether
 DEPDIRS-librte_ip_frag += librte_hash
diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
index abd5c46..757f3bc 100644
--- a/lib/librte_net/Makefile
+++ b/lib/librte_net/Makefile
@@ -39,10 +39,13 @@ EXPORT_MAP := rte_net_version.map
 LIBABIVER := 1
 
 SRCS-$(CONFIG_RTE_LIBRTE_NET) := rte_net.c
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += rte_net_crc.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc_sse.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
new file mode 100644
index 0000000..0f58f07
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.c
@@ -0,0 +1,199 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+
+#include <rte_net_crc.h>
+
+/** crc tables */
+static uint32_t crc32_eth_lut[CRC_LUT_SIZE];
+static uint32_t crc16_ccitt_lut[CRC_LUT_SIZE];
+
+static uint32_t
+rte_crc16_ccitt_handler(const uint8_t *data, uint32_t data_len);
+
+static uint32_t
+rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len);
+
+typedef uint32_t
+(*rte_net_crc_handler)(const uint8_t *data, uint32_t data_len);
+
+static rte_net_crc_handler *handlers;
+
+static rte_net_crc_handler handlers_scalar[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
+};
+
+#if defined(RTE_ARCH_X86_64) && defined(RTE_MACHINE_CPUFLAG_SSE4_2)
+static rte_net_crc_handler handlers_sse42[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler,
+};
+#endif
+
+/**
+ * Reflect the bits about the middle
+ *
+ * @param val
+ *   value to be reflected
+ *
+ * @return
+ *   reflected value
+ */
+static uint32_t
+reflect_32bits(uint32_t val)
+{
+	uint32_t i, res = 0;
+
+	for (i = 0; i < 32; i++)
+		if ((val & (1U << i)) != 0)
+			res |= 1U << (31 - i);
+
+	return res;
+}
+
+static void
+crc32_eth_init_lut(uint32_t poly,
+	uint32_t *lut)
+{
+	uint32_t i, j;
+
+	for (i = 0; i < CRC_LUT_SIZE; i++) {
+		uint32_t crc = reflect_32bits(i);
+
+		for (j = 0; j < 8; j++) {
+			if (crc & 0x80000000L)
+				crc = (crc << 1) ^ poly;
+			else
+				crc <<= 1;
+		}
+		lut[i] = reflect_32bits(crc);
+	}
+}
+
+static inline __attribute__((always_inline)) uint32_t
+crc32_eth_calc_lut(const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const uint32_t *lut)
+{
+	while (data_len--)
+		crc = lut[(crc ^ *data++) & 0xffL] ^ (crc >> 8);
+
+	return crc;
+}
+
+static void
+rte_net_crc_scalar_init(void)
+{
+	/** 32-bit crc init */
+	crc32_eth_init_lut(CRC32_ETH_POLYNOMIAL, crc32_eth_lut);
+
+	/** 16-bit CRC init */
+	crc32_eth_init_lut(CRC16_CCITT_POLYNOMIAL << 16, crc16_ccitt_lut);
+}
+
+static inline uint32_t
+rte_crc16_ccitt_handler(const uint8_t *data, uint32_t data_len)
+{
+	/** return 16-bit CRC value */
+	return (uint16_t)~crc32_eth_calc_lut(data,
+		data_len,
+		0xffff,
+		crc16_ccitt_lut);
+}
+
+static inline uint32_t
+rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
+{
+	/** return 32-bit CRC value */
+	return ~crc32_eth_calc_lut(data,
+		data_len,
+		0xffffffffUL,
+		crc32_eth_lut);
+}
+
+void
+rte_net_crc_set_alg(enum rte_net_crc_alg alg)
+{
+	switch (alg) {
+	case RTE_NET_CRC_SSE42:
+#if defined(RTE_ARCH_X86_64) && defined(RTE_MACHINE_CPUFLAG_SSE4_2)
+		if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2))
+			alg = RTE_NET_CRC_SCALAR;
+		else {
+			handlers = handlers_sse42;
+			break;
+		}
+#endif
+	case RTE_NET_CRC_SCALAR:
+	default:
+		handlers = handlers_scalar;
+		break;
+	}
+}
+
+uint32_t
+rte_net_crc_calc(const void *data,
+	uint32_t data_len,
+	enum rte_net_crc_type type)
+{
+	uint32_t ret;
+	rte_net_crc_handler f_handle;
+
+	f_handle = handlers[type];
+	ret = f_handle((const uint8_t *) data, data_len);
+
+	return ret;
+}
+
+/*
+ * Select highest available crc algorithm as default one.
+ */
+static inline void __attribute__((constructor))
+rte_net_crc_init(void)
+{
+	enum rte_net_crc_alg alg = RTE_NET_CRC_SCALAR;
+
+	rte_net_crc_scalar_init();
+
+#if defined(RTE_ARCH_X86_64) && defined(RTE_MACHINE_CPUFLAG_SSE4_2)
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2)) {
+		alg = RTE_NET_CRC_SSE42;
+		rte_net_crc_sse42_init();
+	}
+#endif
+
+	rte_net_crc_set_alg(alg);
+}
diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
new file mode 100644
index 0000000..dd6c110
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.h
@@ -0,0 +1,104 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_NET_CRC_H_
+#define _RTE_NET_CRC_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+#include <rte_mbuf.h>
+
+/** CRC polynomials */
+#define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
+#define CRC16_CCITT_POLYNOMIAL 0x1021U
+
+#define CRC_LUT_SIZE 256
+
+/** CRC types */
+enum rte_net_crc_type {
+	RTE_NET_CRC16_CCITT = 0,
+	RTE_NET_CRC32_ETH,
+	RTE_NET_CRC_REQS
+};
+
+/** CRC compute algorithm */
+enum rte_net_crc_alg {
+	RTE_NET_CRC_SCALAR = 0,
+	RTE_NET_CRC_SSE42,
+};
+
+/**
+ * This API sets the CRC computation algorithm (i.e. scalar version,
+ * x86 64-bit SSE4.2 intrinsic version, etc.) and the internal data
+ * structure.
+ *
+ * @param alg
+ *   This parameter is used to select the CRC implementation version.
+ *   - RTE_NET_CRC_SCALAR
+ *   - RTE_NET_CRC_SSE42 (use 64-bit SSE4.2 intrinsics; scalar if unsupported)
+ */
+void
+rte_net_crc_set_alg(enum rte_net_crc_alg alg);
+
+/**
+ * CRC compute API
+ *
+ * @param data
+ *   Pointer to the packet data for CRC computation
+ * @param data_len
+ *   Data length for CRC computation
+ * @param type
+ *   CRC type (enum rte_net_crc_type)
+ *
+ * @return
+ *   CRC value
+ */
+uint32_t
+rte_net_crc_calc(const void *data,
+	uint32_t data_len,
+	enum rte_net_crc_type type);
+
+#if defined(RTE_ARCH_X86_64) && defined(RTE_MACHINE_CPUFLAG_SSE4_2)
+#include <rte_net_crc_sse.h>
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+
+#endif /* _RTE_NET_CRC_H_ */
diff --git a/lib/librte_net/rte_net_crc_sse.h b/lib/librte_net/rte_net_crc_sse.h
new file mode 100644
index 0000000..514b400
--- /dev/null
+++ b/lib/librte_net/rte_net_crc_sse.h
@@ -0,0 +1,361 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_NET_CRC_SSE_H_
+#define _RTE_NET_CRC_SSE_H_
+
+#include <cpuid.h>
+#include <rte_net_crc.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** PCLMULQDQ CRC computation context structure */
+struct crc_pclmulqdq_ctx {
+	__m128i rk1_rk2;
+	__m128i rk5_rk6;
+	__m128i rk7_rk8;
+};
+
+static struct crc_pclmulqdq_ctx crc32_eth_pclmulqdq __rte_aligned(16);
+static struct crc_pclmulqdq_ctx crc16_ccitt_pclmulqdq __rte_aligned(16);
+/**
+ * @brief Performs one folding round
+ *
+ * Logically, the function operates as follows:
+ *     DATA = READ_NEXT_16BYTES();
+ *     F1 = MSB8(FOLD)
+ *     F2 = LSB8(FOLD)
+ *     T1 = CLMUL(F1, RK1)
+ *     T2 = CLMUL(F2, RK2)
+ *     FOLD = XOR(T1, T2, DATA)
+ *
+ * @param data_block
+ *   16 byte data block
+ * @param precomp
+ *   Precomputed rk1 and rk2 constants
+ * @param fold
+ *   Current 16 byte folded data
+ *
+ * @return
+ *   New 16 byte folded data
+ */
+static inline __attribute__((always_inline)) __m128i
+crcr32_folding_round(__m128i data_block,
+		__m128i precomp,
+		__m128i fold)
+{
+	__m128i tmp0 = _mm_clmulepi64_si128(fold, precomp, 0x01);
+	__m128i tmp1 = _mm_clmulepi64_si128(fold, precomp, 0x10);
+
+	return _mm_xor_si128(tmp1, _mm_xor_si128(data_block, tmp0));
+}
+
+/**
+ * Performs reduction from 128 bits to 64 bits
+ *
+ * @param data128
+ *   128 bits data to be reduced
+ * @param precomp
+ *   precomputed constants rk5, rk6
+ *
+ * @return
+ *  64 bits reduced data
+ */
+
+static inline __attribute__((always_inline)) __m128i
+crcr32_reduce_128_to_64(__m128i data128, __m128i precomp)
+{
+	__m128i tmp0, tmp1, tmp2;
+
+	/* 64b fold */
+	tmp0 = _mm_clmulepi64_si128(data128, precomp, 0x00);
+	tmp1 = _mm_srli_si128(data128, 8);
+	tmp0 = _mm_xor_si128(tmp0, tmp1);
+
+	/* 32b fold */
+	tmp2 = _mm_slli_si128(tmp0, 4);
+	tmp1 = _mm_clmulepi64_si128(tmp2, precomp, 0x10);
+
+	return _mm_xor_si128(tmp1, tmp0);
+}
+
+/**
+ * Performs Barrett reduction from 64 bits to 32 bits
+ *
+ * @param data64
+ *   64 bits data to be reduced
+ * @param precomp
+ *   rk7 and rk8 precomputed constants
+ *
+ * @return
+ *   reduced 32 bits data
+ */
+
+static inline __attribute__((always_inline)) uint32_t
+crcr32_reduce_64_to_32(__m128i data64, __m128i precomp)
+{
+	static const uint32_t mask1[4] __rte_aligned(16) = {
+		0xffffffff, 0xffffffff, 0x00000000, 0x00000000
+	};
+
+	static const uint32_t mask2[4] __rte_aligned(16) = {
+		0x00000000, 0xffffffff, 0xffffffff, 0xffffffff
+	};
+	__m128i tmp0, tmp1, tmp2;
+
+	tmp0 = _mm_and_si128(data64, _mm_load_si128((const __m128i *)mask2));
+
+	tmp1 = _mm_clmulepi64_si128(tmp0, precomp, 0x00);
+	tmp1 = _mm_xor_si128(tmp1, tmp0);
+	tmp1 = _mm_and_si128(tmp1, _mm_load_si128((const __m128i *)mask1));
+
+	tmp2 = _mm_clmulepi64_si128(tmp1, precomp, 0x10);
+	tmp2 = _mm_xor_si128(tmp2, tmp1);
+	tmp2 = _mm_xor_si128(tmp2, tmp0);
+
+	return _mm_extract_epi32(tmp2, 2);
+}
+
+static const uint8_t crc_xmm_shift_tab[48] __rte_aligned(16) = {
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+};
+
+/**
+ * Shifts left 128 bit register by specified number of bytes
+ *
+ * @param reg
+ *   128 bit value
+ * @param num
+ *   number of bytes to shift left reg by (0-16)
+ *
+ * @return
+ *   reg << (num * 8)
+ */
+
+static inline __attribute__((always_inline)) __m128i
+xmm_shift_left(__m128i reg, const unsigned int num)
+{
+	const __m128i *p = (const __m128i *)(crc_xmm_shift_tab + 16 - num);
+
+	return _mm_shuffle_epi8(reg, _mm_loadu_si128(p));
+}
+
+static inline __attribute__((always_inline)) uint32_t
+crc32_eth_calc_pclmulqdq(
+	const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const struct crc_pclmulqdq_ctx *params)
+{
+	__m128i temp, fold, k;
+	uint32_t n;
+
+	/* Get CRC init value */
+	temp = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);
+
+	/**
+	 * Folding all data into single 16 byte data block
+	 * Assumes: fold holds first 16 bytes of data
+	 */
+
+	if (unlikely(data_len < 32)) {
+		if (unlikely(data_len == 16)) {
+			/* 16 bytes */
+			fold = _mm_loadu_si128((const __m128i *)data);
+			fold = _mm_xor_si128(fold, temp);
+			goto reduction_128_64;
+		}
+
+		if (unlikely(data_len < 16)) {
+			/* 0 to 15 bytes */
+			uint8_t buffer[16] __rte_aligned(16);
+
+			memset(buffer, 0, sizeof(buffer));
+			memcpy(buffer, data, data_len);
+
+			fold = _mm_load_si128((const __m128i *)buffer);
+			fold = _mm_xor_si128(fold, temp);
+			if (unlikely(data_len < 4)) {
+				fold = xmm_shift_left(fold, 8 - data_len);
+				goto barret_reduction;
+			}
+			fold = xmm_shift_left(fold, 16 - data_len);
+			goto reduction_128_64;
+		}
+		/* 17 to 31 bytes */
+		fold = _mm_loadu_si128((const __m128i *)data);
+		fold = _mm_xor_si128(fold, temp);
+		n = 16;
+		k = params->rk1_rk2;
+		goto partial_bytes;
+	}
+
+	/** At least 32 bytes in the buffer */
+	/** Apply CRC initial value */
+	fold = _mm_loadu_si128((const __m128i *)data);
+	fold = _mm_xor_si128(fold, temp);
+
+	/** Main folding loop - the last 16 bytes are processed separately */
+	k = params->rk1_rk2;
+	for (n = 16; (n + 16) <= data_len; n += 16) {
+		temp = _mm_loadu_si128((const __m128i *)&data[n]);
+		fold = crcr32_folding_round(temp, k, fold);
+	}
+
+partial_bytes:
+	if (likely(n < data_len)) {
+
+		const uint32_t mask3[4] __rte_aligned(16) = {
+			0x80808080, 0x80808080, 0x80808080, 0x80808080
+		};
+
+		const uint8_t shf_table[32] __rte_aligned(16) = {
+			0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+			0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+			0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+		};
+
+		__m128i last16, a, b;
+
+		last16 = _mm_loadu_si128((const __m128i *)&data[data_len - 16]);
+
+		temp = _mm_loadu_si128((const __m128i *)
+			&shf_table[data_len & 15]);
+		a = _mm_shuffle_epi8(fold, temp);
+
+		temp = _mm_xor_si128(temp,
+			_mm_load_si128((const __m128i *)mask3));
+		b = _mm_shuffle_epi8(fold, temp);
+		b = _mm_blendv_epi8(b, last16, temp);
+
+		/* k = rk1 & rk2 */
+		temp = _mm_clmulepi64_si128(a, k, 0x01);
+		fold = _mm_clmulepi64_si128(a, k, 0x10);
+
+		fold = _mm_xor_si128(fold, temp);
+		fold = _mm_xor_si128(fold, b);
+	}
+
+	/** Reduction 128 -> 32 Assumes: fold holds 128bit folded data */
+reduction_128_64:
+	k = params->rk5_rk6;
+	fold = crcr32_reduce_128_to_64(fold, k);
+
+barret_reduction:
+	k = params->rk7_rk8;
+	n = crcr32_reduce_64_to_32(fold, k);
+
+	return n;
+}
+
+
+static inline void
+rte_net_crc_sse42_init(void)
+{
+	uint64_t k1, k2, k5, k6;
+	uint64_t p = 0, q = 0;
+
+	/** Initialize CRC16 data */
+	k1 = 0x189aeLLU;
+	k2 = 0x8e10LLU;
+	k5 = 0x189aeLLU;
+	k6 = 0x114aaLLU;
+	q =  0x11c581910LLU;
+	p =  0x10811LLU;
+
+	/** Save the params in context structure */
+	crc16_ccitt_pclmulqdq.rk1_rk2 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k1), _mm_cvtsi64_m64(k2));
+	crc16_ccitt_pclmulqdq.rk5_rk6 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k5), _mm_cvtsi64_m64(k6));
+	crc16_ccitt_pclmulqdq.rk7_rk8 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(q), _mm_cvtsi64_m64(p));
+
+	/** Initialize CRC32 data */
+	k1 = 0xccaa009eLLU;
+	k2 = 0x1751997d0LLU;
+	k5 = 0xccaa009eLLU;
+	k6 = 0x163cd6124LLU;
+	q =  0x1f7011640LLU;
+	p =  0x1db710641LLU;
+
+	/** Save the params in context structure */
+	crc32_eth_pclmulqdq.rk1_rk2 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k1), _mm_cvtsi64_m64(k2));
+	crc32_eth_pclmulqdq.rk5_rk6 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k5), _mm_cvtsi64_m64(k6));
+	crc32_eth_pclmulqdq.rk7_rk8 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(q), _mm_cvtsi64_m64(p));
+
+	/**
+	 * Reset the MMX state, as the following code may use
+	 * floating-point data types such as float, double, etc.
+	 */
+	_mm_empty();
+
+}
+
+static inline uint32_t
+rte_crc16_ccitt_sse42_handler(const uint8_t *data,
+	uint32_t data_len)
+{
+	/** return 16-bit CRC value */
+	return (uint16_t)~crc32_eth_calc_pclmulqdq(data,
+		data_len,
+		0xffff,
+		&crc16_ccitt_pclmulqdq);
+}
+
+static inline uint32_t
+rte_crc32_eth_sse42_handler(const uint8_t *data,
+	uint32_t data_len)
+{
+	return ~crc32_eth_calc_pclmulqdq(data,
+		data_len,
+		0xffffffffUL,
+		&crc32_eth_pclmulqdq);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_NET_CRC_SSE_H_ */
diff --git a/lib/librte_net/rte_net_version.map b/lib/librte_net/rte_net_version.map
index 3b15e65..687c40e 100644
--- a/lib/librte_net/rte_net_version.map
+++ b/lib/librte_net/rte_net_version.map
@@ -4,3 +4,11 @@ DPDK_16.11 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_net_crc_calc;
+	rte_net_crc_set_alg;
+
+} DPDK_16.11;
-- 
2.5.5

* [dpdk-dev] [PATCH v7 2/2] test/test: add unit test for CRC computation
  2017-03-29 17:15                       ` [dpdk-dev] [PATCH v7 0/2] librte_net: add crc computation support Jasvinder Singh
  2017-03-29 17:15                         ` [dpdk-dev] [PATCH v7 1/2] librte_net: add crc compute APIs Jasvinder Singh
@ 2017-03-29 17:15                         ` Jasvinder Singh
  1 sibling, 0 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-03-29 17:15 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, declan.doherty, pablo.de.lara.guarch

This patch provides a set of tests for verifying the functional
correctness of the 16-bit and 32-bit CRC APIs.
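As an independent cross-check for tests like these, a slow bit-at-a-time implementation can serve as a reference oracle. The sketch below is hypothetical (`struct crc_model` and `crc_bitwise()` are not DPDK APIs). It assumes, based on a reading of the scalar handlers in this series, that RTE_NET_CRC32_ETH is the reflected CRC-32 with initial value and final XOR of 0xffffffff, and that RTE_NET_CRC16_CCITT is the reflected CRC-16 (polynomial 0x1021) with initial value and final XOR of 0xffff, the parameter set usually catalogued as CRC-16/X-25. The check values for "123456789" are the standard catalogue values, not results taken from this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Parameters of a reflected (LSB-first) CRC, hypothetical names. */
struct crc_model {
	unsigned int width;	/* CRC width in bits (16 or 32) */
	uint32_t poly_rev;	/* bit-reversed generator polynomial */
	uint32_t init;		/* initial register value */
	uint32_t xorout;	/* value XORed into the final register */
};

/* Ethernet CRC-32: polynomial 0x04c11db7, bit-reversed 0xedb88320 */
static const struct crc_model crc32_eth_model = {
	32, 0xedb88320u, 0xffffffffu, 0xffffffffu
};

/* CRC-16 X.25: polynomial 0x1021, bit-reversed 0x8408 */
static const struct crc_model crc16_x25_model = {
	16, 0x8408u, 0xffffu, 0xffffu
};

/* Reference implementation: processes every input bit, LSB first. */
static uint32_t
crc_bitwise(const struct crc_model *m, const uint8_t *data, size_t len)
{
	uint32_t mask = (m->width == 32) ? 0xffffffffu :
		((1u << m->width) - 1u);
	uint32_t crc = m->init;
	size_t i;
	int b;

	for (i = 0; i < len; i++) {
		crc ^= data[i];
		for (b = 0; b < 8; b++)
			crc = (crc & 1u) ? (crc >> 1) ^ m->poly_rev : crc >> 1;
	}
	return (crc ^ m->xorout) & mask;
}
```

Because it shares no code with either the LUT or the PCLMULQDQ path, a mismatch against such an oracle points at the optimised implementation rather than at the reference.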

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
---
 test/test/Makefile   |   2 +
 test/test/test_crc.c | 173 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 175 insertions(+)
 create mode 100644 test/test/test_crc.c

diff --git a/test/test/Makefile b/test/test/Makefile
index 79f0c61..06d8d5d 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -160,6 +160,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_cirbuf.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_string.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_lib.c
 
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += test_crc.c
+
 ifeq ($(CONFIG_RTE_LIBRTE_SCHED),y)
 SRCS-y += test_red.c
 SRCS-y += test_sched.c
diff --git a/test/test/test_crc.c b/test/test/test_crc.c
new file mode 100644
index 0000000..524cb29
--- /dev/null
+++ b/test/test/test_crc.c
@@ -0,0 +1,173 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "test.h"
+
+#include <rte_malloc.h>
+#include <rte_net_crc.h>
+
+#define CRC_VEC_LEN        32
+#define CRC32_VEC_LEN1     1512
+#define CRC32_VEC_LEN2     348
+#define CRC16_VEC_LEN1     12
+#define CRC16_VEC_LEN2     2
+#define LINE_LEN           75
+
+/* CRC test vector */
+static const uint8_t crc_vec[CRC_VEC_LEN] = {
+	'0', '1', '2', '3', '4', '5', '6', '7',
+	'8', '9', 'a', 'b', 'c', 'd', 'e', 'f',
+	'g', 'h', 'i', 'j', 'A', 'B', 'C', 'D',
+	'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L',
+};
+
+/* 32-bit CRC test vector */
+static const uint8_t crc32_vec1[12] = {
+	0xBE, 0xD7, 0x23, 0x47, 0x6B, 0x8F,
+	0xB3, 0x14, 0x5E, 0xFB, 0x35, 0x59,
+};
+
+/* 16-bit CRC test vector 1 */
+static const uint8_t crc16_vec1[CRC16_VEC_LEN1] = {
+	0x0D, 0x01, 0x01, 0x23, 0x45, 0x67,
+	0x89, 0x01, 0x23, 0x45, 0x00, 0x01,
+};
+
+/* 16-bit CRC test vector 2 */
+static const uint8_t crc16_vec2[CRC16_VEC_LEN2] = {
+	0x03, 0x3f,
+};
+/** CRC results */
+static const uint32_t crc32_vec_res = 0xb491aab4;
+static const uint32_t crc32_vec1_res = 0xac54d294;
+static const uint32_t crc32_vec2_res = 0xefaae02f;
+static const uint32_t crc16_vec_res = 0x6bec;
+static const uint16_t crc16_vec1_res = 0x8cdd;
+static const uint16_t crc16_vec2_res = 0xec5b;
+
+static uint32_t
+crc_calc(const uint8_t *vec,
+	uint32_t vec_len,
+	enum rte_net_crc_type type)
+{
+	/* compute CRC */
+	uint32_t ret = rte_net_crc_calc(vec, vec_len, type);
+
+	/* dump data on console */
+	TEST_HEXDUMP(stdout, NULL, vec, vec_len);
+
+	return ret;
+}
+
+static int
+test_crc_calc(void)
+{
+	uint32_t i;
+	enum rte_net_crc_type type;
+	uint8_t *test_data;
+	uint32_t ret;
+
+	/* 32-bit ethernet CRC: Test 1 */
+	type = RTE_NET_CRC32_ETH;
+
+	ret = crc_calc(crc_vec, CRC_VEC_LEN, type);
+	if (ret != crc32_vec_res)
+		return -1;
+
+	/* 32-bit ethernet CRC: Test 2 */
+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN1, 0);
+
+	for (i = 0; i < CRC32_VEC_LEN1; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	ret = crc_calc(test_data, CRC32_VEC_LEN1, type);
+	if (ret != crc32_vec1_res) {
+		rte_free(test_data);
+		return -2;
+	}
+
+	/* 32-bit ethernet CRC: Test 3 */
+	for (i = 0; i < CRC32_VEC_LEN2; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	ret = crc_calc(test_data, CRC32_VEC_LEN2, type);
+	if (ret != crc32_vec2_res) {
+		rte_free(test_data);
+		return -3;
+	}
+	rte_free(test_data);
+
+	/* 16-bit CCITT CRC: Test 4 */
+	type = RTE_NET_CRC16_CCITT;
+	ret = crc_calc(crc_vec, CRC_VEC_LEN, type);
+	if (ret != crc16_vec_res)
+		return -4;
+
+	/* 16-bit CCITT CRC: Test 5 */
+	ret = crc_calc(crc16_vec1, CRC16_VEC_LEN1, type);
+	if (ret != crc16_vec1_res)
+		return -5;
+
+	/* 16-bit CCITT CRC: Test 6 */
+	ret = crc_calc(crc16_vec2, CRC16_VEC_LEN2, type);
+	if (ret != crc16_vec2_res)
+		return -6;
+
+	return 0;
+}
+
+static int
+test_crc(void)
+{
+	int ret;
+	/* set CRC scalar mode */
+	rte_net_crc_set_alg(RTE_NET_CRC_SCALAR);
+
+	ret = test_crc_calc();
+	if (ret < 0) {
+		printf("test_crc (scalar): failed (%d)\n", ret);
+		return ret;
+	}
+	/* set CRC sse4.2 mode */
+	rte_net_crc_set_alg(RTE_NET_CRC_SSE42);
+
+	ret = test_crc_calc();
+	if (ret < 0) {
+		printf("test_crc (x86_64_SSE4.2): failed (%d)\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+REGISTER_TEST_COMMAND(crc_autotest, test_crc);
-- 
2.5.5

* [dpdk-dev] [PATCH v8 0/2] librte_net: add crc computation support
  2017-03-29 17:15                         ` [dpdk-dev] [PATCH v7 1/2] librte_net: add crc compute APIs Jasvinder Singh
@ 2017-03-30 11:30                           ` Jasvinder Singh
  2017-03-30 11:30                             ` [dpdk-dev] [PATCH v8 1/2] librte_net: add crc compute APIs Jasvinder Singh
  2017-03-30 11:30                             ` [dpdk-dev] [PATCH v8 2/2] test/test: add unit test for CRC computation Jasvinder Singh
  0 siblings, 2 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-03-30 11:30 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, declan.doherty, pablo.de.lara.guarch

In some applications, a CRC (Cyclic Redundancy Check) needs to be computed
or updated during packet processing. This patchset adds software
implementations of two common standard CRCs: the 32-bit Ethernet CRC
(Ethernet/ISO/IEC 8802-3) and the 16-bit CCITT CRC (ITU-T X.25). Two
versions of each of the 32-bit and 16-bit CRC calculations are proposed.

The first version provides fast and efficient CRC generation on IA
processors by using the carry-less multiplication instruction PCLMULQDQ
through SSE intrinsics (this version is referred to as SSE4.2 throughout
the patchset, although PCLMULQDQ is reported by its own CPUID feature
flag). In this implementation, a folding approach first reduces an
arbitrary-length buffer to a single 16-byte block with the help of
precomputed constants. That 16-byte block is then reduced to the final
CRC value by a last fold followed by Barrett reduction. For more details
on the implementation, see reference [1].
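Carry-less multiplication is ordinary long multiplication in which the partial products are combined with XOR instead of addition, i.e. polynomial multiplication over GF(2). The portable model below (`clmul64()` and `struct u128` are hypothetical names, not part of the patch) computes the 64x64 to 128-bit product that a single PCLMULQDQ instruction produces. In the intrinsic form `_mm_clmulepi64_si128(a, b, imm)`, bit 0 of `imm` selects the low or high 64-bit half of `a` and bit 4 selects the half of `b`, which is how a folding round can multiply each half of the running 16-byte remainder by a different precomputed constant.

```c
#include <stdint.h>

/* 128-bit result as two 64-bit halves (hypothetical type). */
struct u128 {
	uint64_t lo;
	uint64_t hi;
};

/*
 * Carry-less product of two 64-bit operands: for every set bit i of b,
 * XOR (a << i) into the 128-bit result. No carries ever propagate.
 */
static struct u128
clmul64(uint64_t a, uint64_t b)
{
	struct u128 r = { 0, 0 };
	int i;

	for (i = 0; i < 64; i++) {
		if ((b >> i) & 1) {
			r.lo ^= a << i;
			/* bits of a shifted past bit 63 land in the high half */
			if (i != 0)
				r.hi ^= a >> (64 - i);
		}
	}
	return r;
}
```

For example, `clmul64(3, 3)` yields 5 rather than 9, because (x + 1)^2 = x^2 + 1 when coefficients are added modulo 2.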

The second version is a fallback that supports CRC generation without
needing any specific support from the CPU (for example, SSE4.2
intrinsics). It is based on the generic look-up table (LUT) algorithm,
which uses a precomputed 256-entry table, as explained in reference [2].
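A minimal sketch of the table-driven approach for the 32-bit Ethernet CRC, assuming the reflected form (bit-reversed polynomial 0xedb88320, initial value and final XOR of 0xffffffff). The function names are hypothetical. The per-byte update line has the same shape as the scalar loop in this patch: each input byte costs one table lookup, one shift and two XORs instead of eight shift/XOR steps.

```c
#include <stddef.h>
#include <stdint.h>

#define LUT_SIZE 256

/* Entry i holds the CRC remainder contributed by byte value i. */
static void
crc32_build_lut(uint32_t lut[LUT_SIZE])
{
	uint32_t i, c;
	int b;

	for (i = 0; i < LUT_SIZE; i++) {
		c = i;
		for (b = 0; b < 8; b++)
			c = (c & 1u) ? (c >> 1) ^ 0xedb88320u : c >> 1;
		lut[i] = c;
	}
}

/* Byte-at-a-time CRC-32 using the precomputed table. */
static uint32_t
crc32_lut(const uint32_t lut[LUT_SIZE], const uint8_t *data, size_t len)
{
	uint32_t crc = 0xffffffffu;	/* initial value */

	while (len--)
		crc = lut[(crc ^ *data++) & 0xffu] ^ (crc >> 8);
	return ~crc;			/* final inversion */
}
```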

During initialisation (performed automatically by a library constructor),
all data structures required for CRC computation are initialised, and the
x86-specific CRC implementation is enabled if the platform supports it;
otherwise the scalar version is enabled.

The following APIs have been added:

(i)  rte_net_crc_set_alg()
(ii) rte_net_crc_calc()

The first API (i) allows the user to select a specific CRC implementation
at run time, while the second API (ii) computes the 16-bit and 32-bit
CRCs.
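The run-time selection described above can be modelled as a per-algorithm table of function pointers indexed by CRC type, with a fallback to the scalar table when the faster version is not available. Everything in this sketch is a hypothetical stand-in (the enums, `crc_set_alg()`, `crc_calc()` and a `cpu_has_simd_crc()` probe that always reports "not supported"); the placeholder handlers return tags identifying which implementation ran rather than real CRC values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for rte_net_crc_type and rte_net_crc_alg. */
enum crc_type { CRC16, CRC32, CRC_TYPES };
enum crc_alg { ALG_SCALAR, ALG_SIMD };

typedef uint32_t (*crc_handler)(const uint8_t *data, uint32_t len);

/* Placeholder handlers: each returns a tag naming the implementation. */
static uint32_t scalar16(const uint8_t *d, uint32_t n) { (void)d; (void)n; return 0x016; }
static uint32_t scalar32(const uint8_t *d, uint32_t n) { (void)d; (void)n; return 0x032; }
static uint32_t simd16(const uint8_t *d, uint32_t n) { (void)d; (void)n; return 0x116; }
static uint32_t simd32(const uint8_t *d, uint32_t n) { (void)d; (void)n; return 0x132; }

static const crc_handler handlers_scalar[CRC_TYPES] = {
	[CRC16] = scalar16,
	[CRC32] = scalar32,
};

static const crc_handler handlers_simd[CRC_TYPES] = {
	[CRC16] = simd16,
	[CRC32] = simd32,
};

/* Active table; scalar until crc_set_alg() selects otherwise. */
static const crc_handler *handlers = handlers_scalar;

/* Hypothetical CPU feature probe; always "not supported" here. */
static bool
cpu_has_simd_crc(void)
{
	return false;
}

static void
crc_set_alg(enum crc_alg alg)
{
	if (alg == ALG_SIMD && cpu_has_simd_crc())
		handlers = handlers_simd;
	else
		handlers = handlers_scalar;	/* fallback */
}

static uint32_t
crc_calc(const void *data, uint32_t len, enum crc_type type)
{
	return handlers[type]((const uint8_t *)data, len);
}
```

With this design each call costs a single indirect call, and the type is used directly as an array index, so callers must pass a valid type.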

References:
[1] Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
[2] A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS
http://www.ross.net/crc/download/crc_v3.txt

v8 changes:
- fixed memory leaks in unit test.

v7 changes:
- removed the duplicate function in unit test.

v6 changes:
- fixed build error when compiling net library as a shared library
- addressed review comments on v5 (thanks Pablo)

v5 changes:
- rebase onto master

v4 changes:
- change the CRC compute API parameters to make it more generic
- change the unit test to accommodate the CRC compute API change

v3 changes:
- separate the x86 specific implementation into new file
- improve the unit test

v2 changes:
- fix build errors for target i686-native-linuxapp-gcc
- fix checkpatch warnings

Notes:
- The build fails with clang versions earlier than 3.7.0 due to missing
  intrinsics. Refer to the DPDK known issues section for more details.

Jasvinder Singh (2):
  librte_net: add crc compute APIs
  test/test: add unit test for CRC computation

 lib/Makefile                       |   2 +-
 lib/librte_net/Makefile            |   3 +
 lib/librte_net/rte_net_crc.c       | 199 ++++++++++++++++++++
 lib/librte_net/rte_net_crc.h       | 104 +++++++++++
 lib/librte_net/rte_net_crc_sse.h   | 361 +++++++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_version.map |   8 +
 test/test/Makefile                 |   2 +
 test/test/test_crc.c               | 182 +++++++++++++++++++
 8 files changed, 860 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h
 create mode 100644 lib/librte_net/rte_net_crc_sse.h
 create mode 100644 test/test/test_crc.c

-- 
2.5.5

* [dpdk-dev] [PATCH v8 1/2] librte_net: add crc compute APIs
  2017-03-30 11:30                           ` [dpdk-dev] [PATCH v8 0/2] librte_net: add crc computation support Jasvinder Singh
@ 2017-03-30 11:30                             ` Jasvinder Singh
  2017-03-30 11:31                               ` Ananyev, Konstantin
  2017-03-30 16:15                               ` [dpdk-dev] [PATCH v9 0/3] librte_net: add crc computation support Jasvinder Singh
  2017-03-30 11:30                             ` [dpdk-dev] [PATCH v8 2/2] test/test: add unit test for CRC computation Jasvinder Singh
  1 sibling, 2 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-03-30 11:30 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, declan.doherty, pablo.de.lara.guarch

APIs for selecting the architecture-specific implementation and for
computing CRCs (16-bit and 32-bit) are added. For CRC calculation, both
scalar and x86 intrinsic (SSE4.2) versions are implemented.

The scalar version is based on the generic look-up table (LUT) algorithm,
while the x86 intrinsic version uses carry-less multiplication for fast
CRC computation.
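The patch builds its lookup tables by reflecting the table index, running eight MSB-first shift/XOR steps with the normal polynomial (0x04c11db7, or 0x1021 shifted left by 16 for the CCITT case), and reflecting the result. The sketch below (hypothetical helper names) compares that construction with the more common direct one, which shifts LSB first using the bit-reversed polynomial (0xedb88320 and 0x8408 respectively). The two agree because reflecting a register turns a left shift into a right shift and maps the polynomial to its bit-reversed form.

```c
#include <stdint.h>

/* Bit reversal of a 32-bit word, like reflect_32bits() in the patch
 * but using unsigned shifts throughout. */
static uint32_t
reflect32(uint32_t v)
{
	uint32_t r = 0;
	int i;

	for (i = 0; i < 32; i++)
		if (v & (1u << i))
			r |= 1u << (31 - i);
	return r;
}

/* Entry built as crc32_eth_init_lut() does: reflect the index, run
 * eight MSB-first steps with the normal polynomial, reflect back. */
static uint32_t
lut_entry_via_reflect(uint32_t poly, uint32_t idx)
{
	uint32_t crc = reflect32(idx);
	int j;

	for (j = 0; j < 8; j++)
		crc = (crc & 0x80000000u) ? (crc << 1) ^ poly : crc << 1;
	return reflect32(crc);
}

/* The same entry built directly, shifting LSB first with the
 * bit-reversed polynomial. */
static uint32_t
lut_entry_direct(uint32_t poly_rev, uint32_t idx)
{
	uint32_t crc = idx;
	int j;

	for (j = 0; j < 8; j++)
		crc = (crc & 1u) ? (crc >> 1) ^ poly_rev : crc >> 1;
	return crc;
}
```

Since the CCITT table entries produced this way fit in the low 16 bits, the same 32-bit update loop can serve both CRC widths, which is why the scalar code can reuse crc32_eth_calc_lut() for the 16-bit case.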

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
---
 lib/Makefile                       |   2 +-
 lib/librte_net/Makefile            |   3 +
 lib/librte_net/rte_net_crc.c       | 199 ++++++++++++++++++++
 lib/librte_net/rte_net_crc.h       | 104 +++++++++++
 lib/librte_net/rte_net_crc_sse.h   | 361 +++++++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_version.map |   8 +
 6 files changed, 676 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h
 create mode 100644 lib/librte_net/rte_net_crc_sse.h

diff --git a/lib/Makefile b/lib/Makefile
index 5ad3c7c..456eb38 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -62,7 +62,7 @@ DEPDIRS-librte_lpm := librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_ACL) += librte_acl
 DEPDIRS-librte_acl := librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_NET) += librte_net
-DEPDIRS-librte_net := librte_mbuf
+DEPDIRS-librte_net := librte_mbuf librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += librte_ip_frag
 DEPDIRS-librte_ip_frag := librte_eal librte_mempool librte_mbuf librte_ether
 DEPDIRS-librte_ip_frag += librte_hash
diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
index abd5c46..757f3bc 100644
--- a/lib/librte_net/Makefile
+++ b/lib/librte_net/Makefile
@@ -39,10 +39,13 @@ EXPORT_MAP := rte_net_version.map
 LIBABIVER := 1
 
 SRCS-$(CONFIG_RTE_LIBRTE_NET) := rte_net.c
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += rte_net_crc.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc_sse.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
new file mode 100644
index 0000000..0f58f07
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.c
@@ -0,0 +1,199 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+
+#include <rte_net_crc.h>
+
+/** crc tables */
+static uint32_t crc32_eth_lut[CRC_LUT_SIZE];
+static uint32_t crc16_ccitt_lut[CRC_LUT_SIZE];
+
+static uint32_t
+rte_crc16_ccitt_handler(const uint8_t *data, uint32_t data_len);
+
+static uint32_t
+rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len);
+
+typedef uint32_t
+(*rte_net_crc_handler)(const uint8_t *data, uint32_t data_len);
+
+static rte_net_crc_handler *handlers;
+
+static rte_net_crc_handler handlers_scalar[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
+};
+
+#if defined(RTE_ARCH_X86_64) && defined(RTE_MACHINE_CPUFLAG_SSE4_2)
+static rte_net_crc_handler handlers_sse42[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler,
+};
+#endif
+
+/**
+ * Reflect the bits about the middle
+ *
+ * @param val
+ *   value to be reflected
+ *
+ * @return
+ *   reflected value
+ */
+static uint32_t
+reflect_32bits(uint32_t val)
+{
+	uint32_t i, res = 0;
+
+	for (i = 0; i < 32; i++)
+		if ((val & (1U << i)) != 0)
+			res |= (uint32_t)(1U << (31 - i));
+
+	return res;
+}
+
+static void
+crc32_eth_init_lut(uint32_t poly,
+	uint32_t *lut)
+{
+	uint32_t i, j;
+
+	for (i = 0; i < CRC_LUT_SIZE; i++) {
+		uint32_t crc = reflect_32bits(i);
+
+		for (j = 0; j < 8; j++) {
+			if (crc & 0x80000000L)
+				crc = (crc << 1) ^ poly;
+			else
+				crc <<= 1;
+		}
+		lut[i] = reflect_32bits(crc);
+	}
+}
+
+static inline __attribute__((always_inline)) uint32_t
+crc32_eth_calc_lut(const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const uint32_t *lut)
+{
+	while (data_len--)
+		crc = lut[(crc ^ *data++) & 0xffL] ^ (crc >> 8);
+
+	return crc;
+}
+
+static void
+rte_net_crc_scalar_init(void)
+{
+	/** 32-bit crc init */
+	crc32_eth_init_lut(CRC32_ETH_POLYNOMIAL, crc32_eth_lut);
+
+	/** 16-bit CRC init */
+	crc32_eth_init_lut(CRC16_CCITT_POLYNOMIAL << 16, crc16_ccitt_lut);
+}
+
+static inline uint32_t
+rte_crc16_ccitt_handler(const uint8_t *data, uint32_t data_len)
+{
+	/** return 16-bit CRC value */
+	return (uint16_t)~crc32_eth_calc_lut(data,
+		data_len,
+		0xffff,
+		crc16_ccitt_lut);
+}
+
+static inline uint32_t
+rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
+{
+	/** return 32-bit CRC value */
+	return ~crc32_eth_calc_lut(data,
+		data_len,
+		0xffffffffUL,
+		crc32_eth_lut);
+}
+
+void
+rte_net_crc_set_alg(enum rte_net_crc_alg alg)
+{
+	switch (alg) {
+	case RTE_NET_CRC_SSE42:
+#if defined(RTE_ARCH_X86_64) && defined(RTE_MACHINE_CPUFLAG_SSE4_2)
+		if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2))
+			alg = RTE_NET_CRC_SCALAR;
+		else {
+			handlers = handlers_sse42;
+			break;
+		}
+#endif
+	case RTE_NET_CRC_SCALAR:
+	default:
+		handlers = handlers_scalar;
+		break;
+	}
+}
+
+uint32_t
+rte_net_crc_calc(const void *data,
+	uint32_t data_len,
+	enum rte_net_crc_type type)
+{
+	uint32_t ret;
+	rte_net_crc_handler f_handle;
+
+	f_handle = handlers[type];
+	ret = f_handle((const uint8_t *) data, data_len);
+
+	return ret;
+}
+
+/*
+ * Select highest available crc algorithm as default one.
+ */
+static inline void __attribute__((constructor))
+rte_net_crc_init(void)
+{
+	enum rte_net_crc_alg alg = RTE_NET_CRC_SCALAR;
+
+	rte_net_crc_scalar_init();
+
+#if defined(RTE_ARCH_X86_64) && defined(RTE_MACHINE_CPUFLAG_SSE4_2)
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2)) {
+		alg = RTE_NET_CRC_SSE42;
+		rte_net_crc_sse42_init();
+	}
+#endif
+
+	rte_net_crc_set_alg(alg);
+}
diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
new file mode 100644
index 0000000..dd6c110
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.h
@@ -0,0 +1,104 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_NET_CRC_H_
+#define _RTE_NET_CRC_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+#include <rte_mbuf.h>
+
+/** CRC polynomials */
+#define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
+#define CRC16_CCITT_POLYNOMIAL 0x1021U
+
+#define CRC_LUT_SIZE 256
+
+/** CRC types */
+enum rte_net_crc_type {
+	RTE_NET_CRC16_CCITT = 0,
+	RTE_NET_CRC32_ETH,
+	RTE_NET_CRC_REQS
+};
+
+/** CRC compute algorithm */
+enum rte_net_crc_alg {
+	RTE_NET_CRC_SCALAR = 0,
+	RTE_NET_CRC_SSE42,
+};
+
+/**
+ * This API sets the CRC computation algorithm (i.e. scalar version,
+ * x86 64-bit SSE4.2 intrinsic version, etc.) and the internal data
+ * structures.
+ *
+ * @param alg
+ *   This parameter is used to select the CRC implementation version.
+ *   - RTE_NET_CRC_SCALAR
+ *   - RTE_NET_CRC_SSE42 (uses x86 64-bit SSE4.2 intrinsics)
+ */
+void
+rte_net_crc_set_alg(enum rte_net_crc_alg alg);
+
+/**
+ * CRC compute API
+ *
+ * @param data
+ *   Pointer to the packet data for CRC computation
+ * @param data_len
+ *   Data length for CRC computation
+ * @param type
+ *   CRC type (enum rte_net_crc_type)
+ *
+ * @return
+ *   CRC value
+ */
+uint32_t
+rte_net_crc_calc(const void *data,
+	uint32_t data_len,
+	enum rte_net_crc_type type);
+
+#if defined(RTE_ARCH_X86_64) && defined(RTE_MACHINE_CPUFLAG_SSE4_2)
+#include <rte_net_crc_sse.h>
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+
+#endif /* _RTE_NET_CRC_H_ */
diff --git a/lib/librte_net/rte_net_crc_sse.h b/lib/librte_net/rte_net_crc_sse.h
new file mode 100644
index 0000000..514b400
--- /dev/null
+++ b/lib/librte_net/rte_net_crc_sse.h
@@ -0,0 +1,361 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_NET_CRC_SSE_H_
+#define _RTE_NET_CRC_SSE_H_
+
+#include <cpuid.h>
+#include <rte_net_crc.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** PCLMULQDQ CRC computation context structure */
+struct crc_pclmulqdq_ctx {
+	__m128i rk1_rk2;
+	__m128i rk5_rk6;
+	__m128i rk7_rk8;
+};
+
+static struct crc_pclmulqdq_ctx crc32_eth_pclmulqdq __rte_aligned(16);
+static struct crc_pclmulqdq_ctx crc16_ccitt_pclmulqdq __rte_aligned(16);
+/**
+ * @brief Performs one folding round
+ *
+ * Logically function operates as follows:
+ *     DATA = READ_NEXT_16BYTES();
+ *     F1 = LSB8(FOLD)
+ *     F2 = MSB8(FOLD)
+ *     T1 = CLMUL(F1, RK1)
+ *     T2 = CLMUL(F2, RK2)
+ *     FOLD = XOR(T1, T2, DATA)
+ *
+ * @param data_block
+ *   16 byte data block
+ * @param precomp
+ *   Precomputed rk1 and rk2 constants
+ * @param fold
+ *   Current 16-byte folded data
+ *
+ * @return
+ *   New 16 byte folded data
+ */
+static inline __attribute__((always_inline)) __m128i
+crcr32_folding_round(__m128i data_block,
+		__m128i precomp,
+		__m128i fold)
+{
+	__m128i tmp0 = _mm_clmulepi64_si128(fold, precomp, 0x01);
+	__m128i tmp1 = _mm_clmulepi64_si128(fold, precomp, 0x10);
+
+	return _mm_xor_si128(tmp1, _mm_xor_si128(data_block, tmp0));
+}
+
+/**
+ * Performs reduction from 128 bits to 64 bits
+ *
+ * @param data128
+ *   128 bits data to be reduced
+ * @param precomp
+ *   precomputed constants rk5, rk6
+ *
+ * @return
+ *  64 bits reduced data
+ */
+
+static inline __attribute__((always_inline)) __m128i
+crcr32_reduce_128_to_64(__m128i data128, __m128i precomp)
+{
+	__m128i tmp0, tmp1, tmp2;
+
+	/* 64b fold */
+	tmp0 = _mm_clmulepi64_si128(data128, precomp, 0x00);
+	tmp1 = _mm_srli_si128(data128, 8);
+	tmp0 = _mm_xor_si128(tmp0, tmp1);
+
+	/* 32b fold */
+	tmp2 = _mm_slli_si128(tmp0, 4);
+	tmp1 = _mm_clmulepi64_si128(tmp2, precomp, 0x10);
+
+	return _mm_xor_si128(tmp1, tmp0);
+}
+
+/**
+ * Performs Barrett's reduction from 64 bits to 32 bits
+ *
+ * @param data64
+ *   64 bits data to be reduced
+ * @param precomp
+ *   rk7 precomputed constant
+ *
+ * @return
+ *   reduced 32 bits data
+ */
+
+static inline __attribute__((always_inline)) uint32_t
+crcr32_reduce_64_to_32(__m128i data64, __m128i precomp)
+{
+	static const uint32_t mask1[4] __rte_aligned(16) = {
+		0xffffffff, 0xffffffff, 0x00000000, 0x00000000
+	};
+
+	static const uint32_t mask2[4] __rte_aligned(16) = {
+		0x00000000, 0xffffffff, 0xffffffff, 0xffffffff
+	};
+	__m128i tmp0, tmp1, tmp2;
+
+	tmp0 = _mm_and_si128(data64, _mm_load_si128((const __m128i *)mask2));
+
+	tmp1 = _mm_clmulepi64_si128(tmp0, precomp, 0x00);
+	tmp1 = _mm_xor_si128(tmp1, tmp0);
+	tmp1 = _mm_and_si128(tmp1, _mm_load_si128((const __m128i *)mask1));
+
+	tmp2 = _mm_clmulepi64_si128(tmp1, precomp, 0x10);
+	tmp2 = _mm_xor_si128(tmp2, tmp1);
+	tmp2 = _mm_xor_si128(tmp2, tmp0);
+
+	return _mm_extract_epi32(tmp2, 2);
+}
+
+static const uint8_t crc_xmm_shift_tab[48] __rte_aligned(16) = {
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+};
+
+/**
+ * Shifts left 128 bit register by specified number of bytes
+ *
+ * @param reg
+ *   128 bit value
+ * @param num
+ *   number of bytes to shift left reg by (0-16)
+ *
+ * @return
+ *   reg << (num * 8)
+ */
+
+static inline __attribute__((always_inline)) __m128i
+xmm_shift_left(__m128i reg, const unsigned int num)
+{
+	const __m128i *p = (const __m128i *)(crc_xmm_shift_tab + 16 - num);
+
+	return _mm_shuffle_epi8(reg, _mm_loadu_si128(p));
+}
+
+static inline __attribute__((always_inline)) uint32_t
+crc32_eth_calc_pclmulqdq(
+	const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const struct crc_pclmulqdq_ctx *params)
+{
+	__m128i temp, fold, k;
+	uint32_t n;
+
+	/* Get CRC init value */
+	temp = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);
+
+	/**
+	 * Folding all data into single 16 byte data block
+	 * Assumes: fold holds first 16 bytes of data
+	 */
+
+	if (unlikely(data_len < 32)) {
+		if (unlikely(data_len == 16)) {
+			/* 16 bytes */
+			fold = _mm_loadu_si128((const __m128i *)data);
+			fold = _mm_xor_si128(fold, temp);
+			goto reduction_128_64;
+		}
+
+		if (unlikely(data_len < 16)) {
+			/* 0 to 15 bytes */
+			uint8_t buffer[16] __rte_aligned(16);
+
+			memset(buffer, 0, sizeof(buffer));
+			memcpy(buffer, data, data_len);
+
+			fold = _mm_load_si128((const __m128i *)buffer);
+			fold = _mm_xor_si128(fold, temp);
+			if (unlikely(data_len < 4)) {
+				fold = xmm_shift_left(fold, 8 - data_len);
+				goto barret_reduction;
+			}
+			fold = xmm_shift_left(fold, 16 - data_len);
+			goto reduction_128_64;
+		}
+		/* 17 to 31 bytes */
+		fold = _mm_loadu_si128((const __m128i *)data);
+		fold = _mm_xor_si128(fold, temp);
+		n = 16;
+		k = params->rk1_rk2;
+		goto partial_bytes;
+	}
+
+	/** At least 32 bytes in the buffer */
+	/** Apply CRC initial value */
+	fold = _mm_loadu_si128((const __m128i *)data);
+	fold = _mm_xor_si128(fold, temp);
+
+	/** Main folding loop - the last 16 bytes are processed separately */
+	k = params->rk1_rk2;
+	for (n = 16; (n + 16) <= data_len; n += 16) {
+		temp = _mm_loadu_si128((const __m128i *)&data[n]);
+		fold = crcr32_folding_round(temp, k, fold);
+	}
+
+partial_bytes:
+	if (likely(n < data_len)) {
+
+		const uint32_t mask3[4] __rte_aligned(16) = {
+			0x80808080, 0x80808080, 0x80808080, 0x80808080
+		};
+
+		const uint8_t shf_table[32] __rte_aligned(16) = {
+			0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+			0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+			0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+		};
+
+		__m128i last16, a, b;
+
+		last16 = _mm_loadu_si128((const __m128i *)&data[data_len - 16]);
+
+		temp = _mm_loadu_si128((const __m128i *)
+			&shf_table[data_len & 15]);
+		a = _mm_shuffle_epi8(fold, temp);
+
+		temp = _mm_xor_si128(temp,
+			_mm_load_si128((const __m128i *)mask3));
+		b = _mm_shuffle_epi8(fold, temp);
+		b = _mm_blendv_epi8(b, last16, temp);
+
+		/* k = rk1 & rk2 */
+		temp = _mm_clmulepi64_si128(a, k, 0x01);
+		fold = _mm_clmulepi64_si128(a, k, 0x10);
+
+		fold = _mm_xor_si128(fold, temp);
+		fold = _mm_xor_si128(fold, b);
+	}
+
+	/** Reduction 128 -> 32 Assumes: fold holds 128bit folded data */
+reduction_128_64:
+	k = params->rk5_rk6;
+	fold = crcr32_reduce_128_to_64(fold, k);
+
+barret_reduction:
+	k = params->rk7_rk8;
+	n = crcr32_reduce_64_to_32(fold, k);
+
+	return n;
+}
+
+
+static inline void
+rte_net_crc_sse42_init(void)
+{
+	uint64_t k1, k2, k5, k6;
+	uint64_t p = 0, q = 0;
+
+	/** Initialize CRC16 data */
+	k1 = 0x189aeLLU;
+	k2 = 0x8e10LLU;
+	k5 = 0x189aeLLU;
+	k6 = 0x114aaLLU;
+	q =  0x11c581910LLU;
+	p =  0x10811LLU;
+
+	/** Save the params in context structure */
+	crc16_ccitt_pclmulqdq.rk1_rk2 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k1), _mm_cvtsi64_m64(k2));
+	crc16_ccitt_pclmulqdq.rk5_rk6 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k5), _mm_cvtsi64_m64(k6));
+	crc16_ccitt_pclmulqdq.rk7_rk8 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(q), _mm_cvtsi64_m64(p));
+
+	/** Initialize CRC32 data */
+	k1 = 0xccaa009eLLU;
+	k2 = 0x1751997d0LLU;
+	k5 = 0xccaa009eLLU;
+	k6 = 0x163cd6124LLU;
+	q =  0x1f7011640LLU;
+	p =  0x1db710641LLU;
+
+	/** Save the params in context structure */
+	crc32_eth_pclmulqdq.rk1_rk2 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k1), _mm_cvtsi64_m64(k2));
+	crc32_eth_pclmulqdq.rk5_rk6 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k5), _mm_cvtsi64_m64(k6));
+	crc32_eth_pclmulqdq.rk7_rk8 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(q), _mm_cvtsi64_m64(p));
+
+	/**
+	 * Reset the register as following calculation may
+	 * use other data types such as float, double, etc.
+	 */
+	_mm_empty();
+
+}
+
+static inline uint32_t
+rte_crc16_ccitt_sse42_handler(const uint8_t *data,
+	uint32_t data_len)
+{
+	/** return 16-bit CRC value */
+	return (uint16_t)~crc32_eth_calc_pclmulqdq(data,
+		data_len,
+		0xffff,
+		&crc16_ccitt_pclmulqdq);
+}
+
+static inline uint32_t
+rte_crc32_eth_sse42_handler(const uint8_t *data,
+	uint32_t data_len)
+{
+	return ~crc32_eth_calc_pclmulqdq(data,
+		data_len,
+		0xffffffffUL,
+		&crc32_eth_pclmulqdq);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_NET_CRC_SSE_H_ */
diff --git a/lib/librte_net/rte_net_version.map b/lib/librte_net/rte_net_version.map
index 3b15e65..687c40e 100644
--- a/lib/librte_net/rte_net_version.map
+++ b/lib/librte_net/rte_net_version.map
@@ -4,3 +4,11 @@ DPDK_16.11 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_net_crc_calc;
+	rte_net_crc_set_alg;
+
+} DPDK_16.11;
-- 
2.5.5

^ permalink raw reply	[flat|nested] 69+ messages in thread
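The comment on crcr32_folding_round() in patch 1/2 describes each folding step in terms of carry-less multiplication. As a reading aid, the plain-C sketch below emulates `_mm_clmulepi64_si128` on 64-bit halves and reproduces the dataflow of that function. The `u128` type and the `clmul64()`, `clmul_sel()` and `folding_round()` helpers are hypothetical names introduced here. The qword-selection rule is the one defined for PCLMULQDQ: bit 0 of the immediate selects a half of the first operand, and bit 4 selects a half of the second. The sketch only illustrates the arithmetic. It does not compute a CRC on its own, and it is far slower than the intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit value held as two 64-bit halves (the two qwords of an __m128i). */
typedef struct {
	uint64_t lo;
	uint64_t hi;
} u128;

/* Carry-less (GF(2)) 64x64 -> 128-bit multiply: partial products are
 * combined with XOR, so no carries propagate between bit positions. */
static u128
clmul64(uint64_t a, uint64_t b)
{
	u128 r = { 0, 0 };
	int i;

	for (i = 0; i < 64; i++) {
		if ((b >> i) & 1) {
			r.lo ^= a << i;
			if (i != 0)
				r.hi ^= a >> (64 - i); /* bits shifted past bit 63 */
		}
	}
	return r;
}

/* Emulates _mm_clmulepi64_si128(a, b, imm): imm bit 0 selects the qword
 * of a, imm bit 4 selects the qword of b (0 = low, 1 = high). */
static u128
clmul_sel(u128 a, u128 b, int imm)
{
	uint64_t x = (imm & 0x01) ? a.hi : a.lo;
	uint64_t y = (imm & 0x10) ? b.hi : b.lo;

	assert((imm & ~0x11) == 0); /* only bits 0 and 4 are meaningful */
	return clmul64(x, y);
}

/* Mirrors crcr32_folding_round(): precomp.lo holds rk1 and precomp.hi holds
 * rk2 (the layout produced by _mm_setr_epi64(k1, k2)); both products are
 * XORed into the next 16-byte data block. */
static u128
folding_round(u128 data_block, u128 precomp, u128 fold)
{
	u128 t0 = clmul_sel(fold, precomp, 0x01); /* fold.hi * rk1 */
	u128 t1 = clmul_sel(fold, precomp, 0x10); /* fold.lo * rk2 */
	u128 r;

	r.lo = data_block.lo ^ t0.lo ^ t1.lo;
	r.hi = data_block.hi ^ t0.hi ^ t1.hi;
	return r;
}
```

For example, clmul64(0x3, 0x3) yields 0x5. Over GF(2), (x + 1)^2 = x^2 + 1 because the cross terms cancel under XOR. That cancellation is what lets the fold constants shift a remainder forward without carries.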

* [dpdk-dev] [PATCH v8 2/2] test/test: add unit test for CRC computation
  2017-03-30 11:30                           ` [dpdk-dev] [PATCH v8 0/2] librte_net: add crc computation support Jasvinder Singh
  2017-03-30 11:30                             ` [dpdk-dev] [PATCH v8 1/2] librte_net: add crc compute APIs Jasvinder Singh
@ 2017-03-30 11:30                             ` Jasvinder Singh
  1 sibling, 0 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-03-30 11:30 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, declan.doherty, pablo.de.lara.guarch

This patch provides a set of tests for verifying the functional
correctness of 16-bit and 32-bit CRC APIs.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
---
 test/test/Makefile   |   2 +
 test/test/test_crc.c | 182 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 184 insertions(+)
 create mode 100644 test/test/test_crc.c

diff --git a/test/test/Makefile b/test/test/Makefile
index 79f0c61..06d8d5d 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -160,6 +160,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_cirbuf.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_string.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_lib.c
 
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += test_crc.c
+
 ifeq ($(CONFIG_RTE_LIBRTE_SCHED),y)
 SRCS-y += test_red.c
 SRCS-y += test_sched.c
diff --git a/test/test/test_crc.c b/test/test/test_crc.c
new file mode 100644
index 0000000..b6c5517
--- /dev/null
+++ b/test/test/test_crc.c
@@ -0,0 +1,182 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "test.h"
+
+#include <rte_malloc.h>
+#include <rte_net_crc.h>
+
+#define CRC_VEC_LEN        32
+#define CRC32_VEC_LEN1     1512
+#define CRC32_VEC_LEN2     348
+#define CRC16_VEC_LEN1     12
+#define CRC16_VEC_LEN2     2
+#define LINE_LEN           75
+
+/* CRC test vector */
+static const uint8_t crc_vec[CRC_VEC_LEN] = {
+	'0', '1', '2', '3', '4', '5', '6', '7',
+	'8', '9', 'a', 'b', 'c', 'd', 'e', 'f',
+	'g', 'h', 'i', 'j', 'A', 'B', 'C', 'D',
+	'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L',
+};
+
+/* 32-bit CRC test vector */
+static const uint8_t crc32_vec1[12] = {
+	0xBE, 0xD7, 0x23, 0x47, 0x6B, 0x8F,
+	0xB3, 0x14, 0x5E, 0xFB, 0x35, 0x59,
+};
+
+/* 16-bit CRC test vector 1 */
+static const uint8_t crc16_vec1[CRC16_VEC_LEN1] = {
+	0x0D, 0x01, 0x01, 0x23, 0x45, 0x67,
+	0x89, 0x01, 0x23, 0x45, 0x00, 0x01,
+};
+
+/* 16-bit CRC test vector 2 */
+static const uint8_t crc16_vec2[CRC16_VEC_LEN2] = {
+	0x03, 0x3f,
+};
+/** CRC results */
+static const uint32_t crc32_vec_res = 0xb491aab4;
+static const uint32_t crc32_vec1_res = 0xac54d294;
+static const uint32_t crc32_vec2_res = 0xefaae02f;
+static const uint16_t crc16_vec_res = 0x6bec;
+static const uint16_t crc16_vec1_res = 0x8cdd;
+static const uint16_t crc16_vec2_res = 0xec5b;
+
+static uint32_t
+crc_calc(const uint8_t *vec,
+	uint32_t vec_len,
+	enum rte_net_crc_type type)
+{
+	/* compute CRC */
+	uint32_t ret = rte_net_crc_calc(vec, vec_len, type);
+
+	/* dump data on console */
+	TEST_HEXDUMP(stdout, NULL, vec, vec_len);
+
+	return ret;
+}
+
+static int
+test_crc_calc(void)
+{
+	uint32_t i;
+	enum rte_net_crc_type type;
+	uint8_t *test_data;
+	uint32_t result;
+	int error;
+
+	/* 32-bit ethernet CRC: Test 1 */
+	type = RTE_NET_CRC32_ETH;
+
+	result = crc_calc(crc_vec, CRC_VEC_LEN, type);
+	if (result != crc32_vec_res)
+		return -1;
+
+	/* 32-bit ethernet CRC: Test 2 */
+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN1, 0);
+	if (test_data == NULL)
+		return -7;
+	for (i = 0; i < CRC32_VEC_LEN1; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+	result = crc_calc(test_data, CRC32_VEC_LEN1, type);
+	if (result != crc32_vec1_res) {
+		error = -2;
+		goto fail;
+	}
+
+	/* 32-bit ethernet CRC: Test 3 */
+	for (i = 0; i < CRC32_VEC_LEN2; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	result = crc_calc(test_data, CRC32_VEC_LEN2, type);
+	if (result != crc32_vec2_res) {
+		error = -3;
+		goto fail;
+	}
+
+	/* 16-bit CCITT CRC:  Test 4 */
+	type = RTE_NET_CRC16_CCITT;
+	result = crc_calc(crc_vec, CRC_VEC_LEN, type);
+	if (result != crc16_vec_res) {
+		error = -4;
+		goto fail;
+	}
+	/* 16-bit CCITT CRC:  Test 5 */
+	result = crc_calc(crc16_vec1, CRC16_VEC_LEN1, type);
+	if (result != crc16_vec1_res) {
+		error = -5;
+		goto fail;
+	}
+	/* 16-bit CCITT CRC:  Test 6 */
+	result = crc_calc(crc16_vec2, CRC16_VEC_LEN2, type);
+	if (result != crc16_vec2_res) {
+		error = -6;
+		goto fail;
+	}
+
+	rte_free(test_data);
+	return 0;
+
+fail:
+	rte_free(test_data);
+	return error;
+}
+
+static int
+test_crc(void)
+{
+	int ret;
+	/* set CRC scalar mode */
+	rte_net_crc_set_alg(RTE_NET_CRC_SCALAR);
+
+	ret = test_crc_calc();
+	if (ret < 0) {
+		printf("test_crc (scalar): failed (%d)\n", ret);
+		return ret;
+	}
+	/* set CRC sse4.2 mode */
+	rte_net_crc_set_alg(RTE_NET_CRC_SSE42);
+
+	ret = test_crc_calc();
+	if (ret < 0) {
+		printf("test_crc (x86_64_SSE4.2): failed (%d)\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+REGISTER_TEST_COMMAND(crc_autotest, test_crc);
-- 
2.5.5

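A bit-at-a-time reference can cross-check the expected values in this test independently of both the scalar and SSE4.2 paths. The sketch below assumes that the 16-bit CCITT CRC here is the reflected X.25 variant: polynomial 0x1021 processed LSB-first as 0x8408, initial value 0xFFFF, and final inversion. That assumption matches the 0xffff seed and final `~` in rte_crc16_ccitt_sse42_handler(). It also reproduces the expected 0xec5b for crc16_vec2 ({0x03, 0x3f}). The names `ref_crc32_eth()` and `ref_crc16_ccitt()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time Ethernet CRC-32: reflected polynomial 0xEDB88320
 * (bit-reversed 0x04C11DB7), initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF. */
static uint32_t
ref_crc32_eth(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;
	size_t i;
	int bit;

	assert(data != NULL || len == 0);
	for (i = 0; i < len; i++) {
		crc ^= data[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
	}
	return crc ^ 0xFFFFFFFFu;
}

/* Bit-at-a-time 16-bit CCITT CRC in the reflected X.25 form (assumed):
 * polynomial 0x8408 (bit-reversed 0x1021), initial value 0xFFFF,
 * final XOR 0xFFFF. */
static uint16_t
ref_crc16_ccitt(const uint8_t *data, size_t len)
{
	uint16_t crc = 0xFFFF;
	size_t i;
	int bit;

	assert(data != NULL || len == 0);
	for (i = 0; i < len; i++) {
		crc ^= data[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0x8408u)
					: (uint16_t)(crc >> 1);
	}
	return (uint16_t)(crc ^ 0xFFFFu);
}
```

On the usual check string "123456789", these return the catalogued check values 0xCBF43926 (CRC-32) and 0x906E (CRC-16/X-25). New vectors can therefore be validated before they are added to test_crc.c.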

* Re: [dpdk-dev] [PATCH v8 1/2] librte_net: add crc compute APIs
  2017-03-30 11:30                             ` [dpdk-dev] [PATCH v8 1/2] librte_net: add crc compute APIs Jasvinder Singh
@ 2017-03-30 11:31                               ` Ananyev, Konstantin
  2017-03-30 12:06                                 ` Singh, Jasvinder
  2017-03-30 14:40                                 ` Olivier Matz
  2017-03-30 16:15                               ` [dpdk-dev] [PATCH v9 0/3] librte_net: add crc computation support Jasvinder Singh
  1 sibling, 2 replies; 69+ messages in thread
From: Ananyev, Konstantin @ 2017-03-30 11:31 UTC (permalink / raw)
  To: Singh, Jasvinder, dev
  Cc: olivier.matz, Doherty, Declan, De Lara Guarch, Pablo

Hi Jasvinder,

> diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
> new file mode 100644
> index 0000000..dd6c110
> --- /dev/null
> +++ b/lib/librte_net/rte_net_crc.h
> @@ -0,0 +1,104 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_NET_CRC_H_
> +#define _RTE_NET_CRC_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <stdint.h>
> +
> +#include <rte_mbuf.h>

As a nit: you probably don't need that include.
Konstantin

> +
> +/** CRC polynomials */
> +#define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
> +#define CRC16_CCITT_POLYNOMIAL 0x1021U
> +
> +#define CRC_LUT_SIZE 256
> +
> +/** CRC types */
> +enum rte_net_crc_type {
> +	RTE_NET_CRC16_CCITT = 0,
> +	RTE_NET_CRC32_ETH,
> +	RTE_NET_CRC_REQS
> +};
> +
> +/** CRC compute algorithm */
> +enum rte_net_crc_alg {
> +	RTE_NET_CRC_SCALAR = 0,
> +	RTE_NET_CRC_SSE42,
> +};
> +
> +/**
> + * This API set the CRC computation algorithm (i.e. scalar version,
> + * x86 64-bit sse4.2 intrinsic version, etc.) and internal data
> + * structure.
> + *
> + * @param alg
> + *   This parameter is used to select the CRC implementation version.
> + *   - RTE_NET_CRC_SCALAR
> + *   - RTE_NET_CRC_SSE42 (Use 64-bit SSE4.2 intrinsic)
> + */
> +void
> +rte_net_crc_set_alg(enum rte_net_crc_alg alg);
> +
> +/**
> + * CRC compute API
> + *
> + * @param data
> + *   Pointer to the packet data for CRC computation
> + * @param data_len
> + *   Data length for CRC computation
> + * @param type
> + *   CRC type (enum rte_net_crc_type)
> + *
> + * @return
> + *   CRC value
> + */
> +uint32_t
> +rte_net_crc_calc(const void *data,
> +	uint32_t data_len,
> +	enum rte_net_crc_type type);
> +
> +#if defined(RTE_ARCH_X86_64) && defined(RTE_MACHINE_CPUFLAG_SSE4_2)
> +#include <rte_net_crc_sse.h>
> +#endif
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +


* Re: [dpdk-dev] [PATCH v8 1/2] librte_net: add crc compute APIs
  2017-03-30 11:31                               ` Ananyev, Konstantin
@ 2017-03-30 12:06                                 ` Singh, Jasvinder
  2017-03-30 14:40                                 ` Olivier Matz
  1 sibling, 0 replies; 69+ messages in thread
From: Singh, Jasvinder @ 2017-03-30 12:06 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev
  Cc: olivier.matz, Doherty, Declan, De Lara Guarch, Pablo



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Thursday, March 30, 2017 12:32 PM
> To: Singh, Jasvinder <jasvinder.singh@intel.com>; dev@dpdk.org
> Cc: olivier.matz@6wind.com; Doherty, Declan <declan.doherty@intel.com>;
> De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v8 1/2] librte_net: add crc compute APIs
> 
> Hi Jasvinder,
> 
> > diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
> > new file mode 100644
> > index 0000000..dd6c110
> > --- /dev/null
> > +++ b/lib/librte_net/rte_net_crc.h
> > @@ -0,0 +1,104 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2017 Intel Corporation.
> > + *   All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +#ifndef _RTE_NET_CRC_H_
> > +#define _RTE_NET_CRC_H_
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include <stdint.h>
> > +
> > +#include <rte_mbuf.h>
> 
> As a nit: you probably don't need that include.
> Konstantin
> 

Oh, I forgot to remove this; I will send another version. Thanks, Konstantin.


* Re: [dpdk-dev] [PATCH v8 1/2] librte_net: add crc compute APIs
  2017-03-30 11:31                               ` Ananyev, Konstantin
  2017-03-30 12:06                                 ` Singh, Jasvinder
@ 2017-03-30 14:40                                 ` Olivier Matz
  2017-03-30 15:14                                   ` Singh, Jasvinder
  1 sibling, 1 reply; 69+ messages in thread
From: Olivier Matz @ 2017-03-30 14:40 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Singh, Jasvinder, dev, Doherty, Declan, De Lara Guarch, Pablo

Hi Jasvinder,

On Thu, 30 Mar 2017 11:31:54 +0000, "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
> Hi Jasvinder,
> 
> > diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
> > new file mode 100644
> > index 0000000..dd6c110
> > --- /dev/null
> > +++ b/lib/librte_net/rte_net_crc.h
> > @@ -0,0 +1,104 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2017 Intel Corporation.
> > + *   All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +#ifndef _RTE_NET_CRC_H_
> > +#define _RTE_NET_CRC_H_
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include <stdint.h>
> > +
> > +#include <rte_mbuf.h>  
> 
> As a nit: you probably don't need that include.
> Konstantin
> 
> > +
> > +/** CRC polynomials */
> > +#define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
> > +#define CRC16_CCITT_POLYNOMIAL 0x1021U
> > +
> > +#define CRC_LUT_SIZE 256
> > +
> > +/** CRC types */
> > +enum rte_net_crc_type {
> > +	RTE_NET_CRC16_CCITT = 0,
> > +	RTE_NET_CRC32_ETH,
> > +	RTE_NET_CRC_REQS
> > +};
> > +
> > +/** CRC compute algorithm */
> > +enum rte_net_crc_alg {
> > +	RTE_NET_CRC_SCALAR = 0,
> > +	RTE_NET_CRC_SSE42,
> > +};
> > +
> > +/**
> > + * This API set the CRC computation algorithm (i.e. scalar version,
> > + * x86 64-bit sse4.2 intrinsic version, etc.) and internal data
> > + * structure.
> > + *
> > + * @param alg
> > + *   This parameter is used to select the CRC implementation version.
> > + *   - RTE_NET_CRC_SCALAR
> > + *   - RTE_NET_CRC_SSE42 (Use 64-bit SSE4.2 intrinsic)
> > + */
> > +void
> > +rte_net_crc_set_alg(enum rte_net_crc_alg alg);
> > +
> > +/**
> > + * CRC compute API
> > + *
> > + * @param data
> > + *   Pointer to the packet data for CRC computation
> > + * @param data_len
> > + *   Data length for CRC computation
> > + * @param type
> > + *   CRC type (enum rte_net_crc_type)
> > + *
> > + * @return
> > + *   CRC value
> > + */
> > +uint32_t
> > +rte_net_crc_calc(const void *data,
> > +	uint32_t data_len,
> > +	enum rte_net_crc_type type);
> > +
> > +#if defined(RTE_ARCH_X86_64) && defined(RTE_MACHINE_CPUFLAG_SSE4_2)
> > +#include <rte_net_crc_sse.h>
> > +#endif

I think rte_net_crc_sse.h should not be included from rte_net_crc.h.

From what I see, the API is the same for sse and non-sse, so this include
could be private, included only from the .c file. If you also remove
the include to rte_mbuf.h as suggested by Konstantin, it will require the
following includes in rte_net_crc.c:

 #include <stddef.h>
 #include <string.h>
 
 #include <rte_common.h>
 #include <rte_cpuflags.h>
 #include <rte_branch_prediction.h>
 #include <rte_vect.h>
 #include <rte_net_crc.h>
 #if defined(RTE_ARCH_X86_64) && defined(RTE_MACHINE_CPUFLAG_SSE4_2)
 #include <rte_net_crc_sse.h>
 #endif

If the sse file is only used in the .c, this line could also be
removed from the Makefile:

SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc_sse.h


I'm not very familiar with crc and sse code. Could you add yourself
as maintainer for these files in MAINTAINERS?


Thanks
Olivier

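Olivier's suggestion relies on the public API being the same for both implementations. Callers only see an algorithm selector and a compute call, so SSE-specific types (`__m128i`, the PCLMULQDQ contexts) never have to appear in an installed header. The sketch below shows that shape under two assumptions: a per-algorithm table of function pointers indexed by CRC type, and a scalar path that uses the 256-entry look-up table approach described in the cover letter. All names are hypothetical stand-ins, not the contents of the patch's rte_net_crc.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for enum rte_net_crc_type and enum rte_net_crc_alg. */
enum crc_type { CRC16_CCITT = 0, CRC32_ETH, CRC_TYPES };
enum crc_alg { CRC_ALG_SCALAR = 0, CRC_ALG_SSE42 };

typedef uint32_t (*crc_handler_t)(const uint8_t *data, uint32_t len);

static uint32_t crc32_lut[256];
static uint32_t crc16_lut[256];

/* Fill a 256-entry table for a reflected (LSB-first) polynomial. */
static void
crc_lut_init(uint32_t *lut, uint32_t poly)
{
	uint32_t i, c;
	int bit;

	for (i = 0; i < 256; i++) {
		c = i;
		for (bit = 0; bit < 8; bit++)
			c = (c & 1) ? (c >> 1) ^ poly : c >> 1;
		lut[i] = c;
	}
}

/* Byte-at-a-time table-driven update: one lookup per input byte. */
static uint32_t
crc_lut_update(const uint8_t *data, uint32_t len, uint32_t crc,
	const uint32_t *lut)
{
	uint32_t i;

	for (i = 0; i < len; i++)
		crc = lut[(crc ^ data[i]) & 0xFF] ^ (crc >> 8);
	return crc;
}

static uint32_t
crc16_ccitt_scalar(const uint8_t *data, uint32_t len)
{
	/* Same seed and final inversion as rte_crc16_ccitt_sse42_handler(). */
	return (uint16_t)~crc_lut_update(data, len, 0xFFFF, crc16_lut);
}

static uint32_t
crc32_eth_scalar(const uint8_t *data, uint32_t len)
{
	return ~crc_lut_update(data, len, 0xFFFFFFFFu, crc32_lut);
}

/* One handler table per algorithm, indexed by CRC type. */
static const crc_handler_t handlers_scalar[CRC_TYPES] = {
	[CRC16_CCITT] = crc16_ccitt_scalar,
	[CRC32_ETH] = crc32_eth_scalar,
};

static const crc_handler_t *handlers = handlers_scalar;

/* Build the look-up tables; must run once before crc_calc(). */
static void
crc_init(void)
{
	crc_lut_init(crc32_lut, 0xEDB88320u); /* reflected 0x04C11DB7 */
	crc_lut_init(crc16_lut, 0x8408u);     /* reflected 0x1021 */
}

/* Select the implementation; callers never see SIMD types. */
static void
crc_set_alg(enum crc_alg alg)
{
	/* A real build would pick an SSE4.2 table for CRC_ALG_SSE42 when
	 * compiled for and running on x86-64; this sketch has none, so every
	 * request falls back to the scalar table. */
	(void)alg;
	handlers = handlers_scalar;
}

static uint32_t
crc_calc(const void *data, uint32_t len, enum crc_type type)
{
	assert(type < CRC_TYPES);
	return handlers[type]((const uint8_t *)data, len);
}
```

Call crc_init() once before the first crc_calc(). In this sketch, crc_set_alg(CRC_ALG_SSE42) silently falls back to the scalar table because it contains no SIMD code. Only the translation unit that fills the SSE table would need the SSE header.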

* Re: [dpdk-dev] [PATCH v8 1/2] librte_net: add crc compute APIs
  2017-03-30 14:40                                 ` Olivier Matz
@ 2017-03-30 15:14                                   ` Singh, Jasvinder
  0 siblings, 0 replies; 69+ messages in thread
From: Singh, Jasvinder @ 2017-03-30 15:14 UTC (permalink / raw)
  To: Olivier Matz, Ananyev, Konstantin
  Cc: dev, Doherty, Declan, De Lara Guarch, Pablo

Hi Olivier,

> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Thursday, March 30, 2017 3:41 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: Singh, Jasvinder <jasvinder.singh@intel.com>; dev@dpdk.org; Doherty,
> Declan <declan.doherty@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v8 1/2] librte_net: add crc compute APIs


<snip>

> I think this include should not be included from rte_net_crc.h.
> 
> From what I see, the API is the same for sse and non-sse, so this include
> could be private, included only from the .c file. If you also remove the include
> to rte_mbuf.h as suggested by Konstantin, it will require the following
> includes in rte_net_crc.c:
> 
>  #include <stddef.h>
>  #include <string.h>
> 
>  #include <rte_common.h>
>  #include <rte_cpuflags.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_vect.h>
>  #include <rte_net_crc.h>
>  #if defined(RTE_ARCH_X86_64) && defined(RTE_MACHINE_CPUFLAG_SSE4_2)
>  #include <rte_net_crc_sse.h>
>  #endif
> 
> If the sse file is only used in the .c, this line could also be removed in the
> Makefile:
> 
> SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc_sse.h
> 
> 
> I'm not very familiar with crc and sse code. Could you add yourself as
> maintainer for these files in MAINTAINERS?
> 
> 
> Thanks
> Olivier

Thank you for the review. I will make the changes suggested above in the next version.

Jasvinder

* [dpdk-dev] [PATCH v9 0/3] librte_net: add crc computation support
  2017-03-30 11:30                             ` [dpdk-dev] [PATCH v8 1/2] librte_net: add crc compute APIs Jasvinder Singh
  2017-03-30 11:31                               ` Ananyev, Konstantin
@ 2017-03-30 16:15                               ` Jasvinder Singh
  2017-03-30 16:15                                 ` [dpdk-dev] [PATCH v9 1/3] librte_net: add crc compute APIs Jasvinder Singh
                                                   ` (3 more replies)
  1 sibling, 4 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-03-30 16:15 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, declan.doherty, pablo.de.lara.guarch

In some applications, CRC (Cyclic Redundancy Check) needs to be computed
or updated during packet processing operations. This patchset adds a
software implementation of some common standard CRCs (32-bit Ethernet
CRC as per Ethernet/[ISO/IEC 8802-3] and 16-bit CCITT-CRC [ITU-T X.25]).
Two versions of each of the 32-bit and 16-bit CRC calculations are proposed.

The first version provides fast and efficient CRC generation on IA
processors by using the carry-less multiplication instruction PCLMULQDQ
(i.e. SSE4.2 intrinsics). In this implementation, a parallelized folding
approach is used to first reduce an arbitrary-length buffer to a small
fixed-size buffer (16 bytes) with the help of precomputed constants.
The resulting single 16-byte chunk is then reduced by the Barrett
reduction method to generate the final CRC value. For more details on the
implementation, see reference [1].
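
The folding and reduction steps can be summarised with polynomial arithmetic over GF(2), where "+" is XOR. The following is a simplified sketch of the identities behind [1], not a transcription of the patch: it ignores the bit-reflection and exponent-offset conventions that the real precomputed constants (rk1 ... rk8) absorb. Here P(x) is the degree-32 generator polynomial, F(x) = F_H(x) x^64 + F_L(x) is the current 128-bit fold, and D(x) is the next 128-bit data block.

```latex
% One folding round: move the 128-bit fold past the next 128-bit block D(x).
% Each product on the right has degree < 96, so it fits in 128 bits.
F(x)\,x^{128} + D(x)
  \equiv F_H(x)\,\bigl(x^{192} \bmod P(x)\bigr)
       + F_L(x)\,\bigl(x^{128} \bmod P(x)\bigr)
       + D(x) \pmod{P(x)}

% Barrett reduction of a remainder R(x) with \deg R < 64 to the 32-bit CRC,
% using the precomputed quotient \mu(x) = \lfloor x^{64} / P(x) \rfloor
T_1(x) = \bigl\lfloor R(x) / x^{32} \bigr\rfloor \cdot \mu(x), \qquad
T_2(x) = \bigl\lfloor T_1(x) / x^{32} \bigr\rfloor \cdot P(x), \qquad
\mathrm{CRC}(x) = \bigl(R(x) + T_2(x)\bigr) \bmod x^{32}
```

Because the constants x^k mod P(x) and mu(x) depend only on P(x), they can be computed once at initialisation, which is why the folding loop itself needs only carry-less multiplies and XORs.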

The second version is a fallback solution that supports CRC generation
without needing any specific support from the CPU (for example, SSE4.2
intrinsics). It is based on the generic Look-Up Table (LUT) algorithm,
which uses a precomputed 256-element table as explained in reference [2].

During initialisation, all the data structures required for CRC computation
are initialised. In addition, the x86-specific CRC implementation is enabled
if the platform supports it; otherwise the scalar version is used.

The following APIs have been added:

(i)  rte_net_crc_set_alg()
(ii) rte_net_crc_calc()

The first API (i) allows the user to select a specific CRC implementation
at run time, while the second API (ii) is used to compute the 16-bit and
32-bit CRCs.

References:
[1] Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
[2] A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS
http://www.ross.net/crc/download/crc_v3.txt

v9 changes:
- included header files.
- added maintainership

v8 changes:
- improve unit test case.
 
v7 changes:
- remove the duplicate function in unit test.

v6 changes:
- fixed build error when compiling net library as a shared library
- addressed review comments on v5 (thanks Pablo)

v5 changes:
- rebase to the master

v4 changes:
- change crc compute api parameters to make it more generic
- change the unit test to accommodate the crc compute api change

v3 changes:
- separate the x86 specific implementation into new file
- improve the unit test

v2 changes:
- fix build errors for target i686-native-linuxapp-gcc
- fix checkpatch warnings

Notes:
- The build fails with clang versions earlier than 3.7.0 due to
  missing intrinsics. Refer to the DPDK known issues section for more details.

Jasvinder Singh (3):
  librte_net: add crc compute APIs
  test/test: add unit test for CRC computation
  maintainers: add packet crc section and claim maintainership

 MAINTAINERS                        |   6 +
 lib/Makefile                       |   2 +-
 lib/librte_net/Makefile            |   2 +
 lib/librte_net/rte_net_crc.c       | 206 +++++++++++++++++++++
 lib/librte_net/rte_net_crc.h       |  96 ++++++++++
 lib/librte_net/rte_net_crc_sse.h   | 363 +++++++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_version.map |   8 +
 test/test/Makefile                 |   2 +
 test/test/test_crc.c               | 183 +++++++++++++++++++
 9 files changed, 867 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h
 create mode 100644 lib/librte_net/rte_net_crc_sse.h
 create mode 100644 test/test/test_crc.c

-- 
2.5.5

* [dpdk-dev] [PATCH v9 1/3] librte_net: add crc compute APIs
  2017-03-30 16:15                               ` [dpdk-dev] [PATCH v9 0/3] librte_net: add crc computation support Jasvinder Singh
@ 2017-03-30 16:15                                 ` Jasvinder Singh
  2017-04-04 20:00                                   ` Thomas Monjalon
  2017-04-05 14:58                                   ` [dpdk-dev] [PATCH v10 0/2] librte_net: add crc computation support Jasvinder Singh
  2017-03-30 16:15                                 ` [dpdk-dev] [PATCH v9 2/3] " Jasvinder Singh
                                                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-03-30 16:15 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, declan.doherty, pablo.de.lara.guarch

APIs for selecting the architecture-specific implementation and computing
the CRC (16-bit and 32-bit) are added. For CRC calculation, scalar
as well as x86 intrinsic (SSE4.2) versions are implemented.

The scalar version is based on the generic Look-Up Table (LUT) algorithm,
while the x86 intrinsic version uses carry-less multiplication for
fast CRC computation.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
---
 lib/Makefile                       |   2 +-
 lib/librte_net/Makefile            |   2 +
 lib/librte_net/rte_net_crc.c       | 206 +++++++++++++++++++++
 lib/librte_net/rte_net_crc.h       |  96 ++++++++++
 lib/librte_net/rte_net_crc_sse.h   | 363 +++++++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_version.map |   8 +
 6 files changed, 676 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h
 create mode 100644 lib/librte_net/rte_net_crc_sse.h

diff --git a/lib/Makefile b/lib/Makefile
index 5ad3c7c..456eb38 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -62,7 +62,7 @@ DEPDIRS-librte_lpm := librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_ACL) += librte_acl
 DEPDIRS-librte_acl := librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_NET) += librte_net
-DEPDIRS-librte_net := librte_mbuf
+DEPDIRS-librte_net := librte_mbuf librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += librte_ip_frag
 DEPDIRS-librte_ip_frag := librte_eal librte_mempool librte_mbuf librte_ether
 DEPDIRS-librte_ip_frag += librte_hash
diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
index abd5c46..56727c4 100644
--- a/lib/librte_net/Makefile
+++ b/lib/librte_net/Makefile
@@ -39,10 +39,12 @@ EXPORT_MAP := rte_net_version.map
 LIBABIVER := 1
 
 SRCS-$(CONFIG_RTE_LIBRTE_NET) := rte_net.c
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += rte_net_crc.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
new file mode 100644
index 0000000..4601d6a
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.c
@@ -0,0 +1,206 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+#include <string.h>
+#include <stdint.h>
+
+#include <rte_cpuflags.h>
+#include <rte_common.h>
+#include <rte_net_crc.h>
+#if defined(RTE_ARCH_X86_64) && defined(RTE_MACHINE_CPUFLAG_SSE4_2)
+#include <rte_net_crc_sse.h>
+#endif
+
+/** crc tables */
+static uint32_t crc32_eth_lut[CRC_LUT_SIZE];
+static uint32_t crc16_ccitt_lut[CRC_LUT_SIZE];
+
+static uint32_t
+rte_crc16_ccitt_handler(const uint8_t *data, uint32_t data_len);
+
+static uint32_t
+rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len);
+
+typedef uint32_t
+(*rte_net_crc_handler)(const uint8_t *data, uint32_t data_len);
+
+static rte_net_crc_handler *handlers;
+
+static rte_net_crc_handler handlers_scalar[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
+};
+
+#if defined(RTE_ARCH_X86_64) && defined(RTE_MACHINE_CPUFLAG_SSE4_2)
+static rte_net_crc_handler handlers_sse42[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler,
+};
+#endif
+
+/**
+ * Reflect the bits about the middle
+ *
+ * @param val
+ *   value to be reflected
+ *
+ * @return
+ *   reflected value
+ */
+static uint32_t
+reflect_32bits(uint32_t val)
+{
+	uint32_t i, res = 0;
+
+	for (i = 0; i < 32; i++)
+		if ((val & (1U << i)) != 0)
+			res |= 1U << (31 - i);
+
+	return res;
+}
+
+static void
+crc32_eth_init_lut(uint32_t poly,
+	uint32_t *lut)
+{
+	uint32_t i, j;
+
+	for (i = 0; i < CRC_LUT_SIZE; i++) {
+		uint32_t crc = reflect_32bits(i);
+
+		for (j = 0; j < 8; j++) {
+			if (crc & 0x80000000L)
+				crc = (crc << 1) ^ poly;
+			else
+				crc <<= 1;
+		}
+		lut[i] = reflect_32bits(crc);
+	}
+}
+
+static inline __attribute__((always_inline)) uint32_t
+crc32_eth_calc_lut(const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const uint32_t *lut)
+{
+	while (data_len--)
+		crc = lut[(crc ^ *data++) & 0xffL] ^ (crc >> 8);
+
+	return crc;
+}
+
+static void
+rte_net_crc_scalar_init(void)
+{
+	/** 32-bit crc init */
+	crc32_eth_init_lut(CRC32_ETH_POLYNOMIAL, crc32_eth_lut);
+
+	/** 16-bit CRC init */
+	crc32_eth_init_lut(CRC16_CCITT_POLYNOMIAL << 16, crc16_ccitt_lut);
+}
+
+static inline uint32_t
+rte_crc16_ccitt_handler(const uint8_t *data, uint32_t data_len)
+{
+	/** return 16-bit CRC value */
+	return (uint16_t)~crc32_eth_calc_lut(data,
+		data_len,
+		0xffff,
+		crc16_ccitt_lut);
+}
+
+static inline uint32_t
+rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
+{
+	/** return 32-bit CRC value */
+	return ~crc32_eth_calc_lut(data,
+		data_len,
+		0xffffffffUL,
+		crc32_eth_lut);
+}
+
+void
+rte_net_crc_set_alg(enum rte_net_crc_alg alg)
+{
+	switch (alg) {
+	case RTE_NET_CRC_SSE42:
+#if defined(RTE_ARCH_X86_64) && defined(RTE_MACHINE_CPUFLAG_SSE4_2)
+		if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2))
+			alg = RTE_NET_CRC_SCALAR;
+		else {
+			handlers = handlers_sse42;
+			break;
+		}
+#endif
+	case RTE_NET_CRC_SCALAR:
+	default:
+		handlers = handlers_scalar;
+		break;
+	}
+}
+
+uint32_t
+rte_net_crc_calc(const void *data,
+	uint32_t data_len,
+	enum rte_net_crc_type type)
+{
+	uint32_t ret;
+	rte_net_crc_handler f_handle;
+
+	f_handle = handlers[type];
+	ret = f_handle((const uint8_t *) data, data_len);
+
+	return ret;
+}
+
+/*
+ * Select the highest available CRC algorithm as the default.
+ */
+static inline void __attribute__((constructor))
+rte_net_crc_init(void)
+{
+	enum rte_net_crc_alg alg = RTE_NET_CRC_SCALAR;
+
+	rte_net_crc_scalar_init();
+
+#if defined(RTE_ARCH_X86_64) && defined(RTE_MACHINE_CPUFLAG_SSE4_2)
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2)) {
+		alg = RTE_NET_CRC_SSE42;
+		rte_net_crc_sse42_init();
+	}
+#endif
+
+	rte_net_crc_set_alg(alg);
+}
diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
new file mode 100644
index 0000000..76fd129
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.h
@@ -0,0 +1,96 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_NET_CRC_H_
+#define _RTE_NET_CRC_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** CRC polynomials */
+#define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
+#define CRC16_CCITT_POLYNOMIAL 0x1021U
+
+#define CRC_LUT_SIZE 256
+
+/** CRC types */
+enum rte_net_crc_type {
+	RTE_NET_CRC16_CCITT = 0,
+	RTE_NET_CRC32_ETH,
+	RTE_NET_CRC_REQS
+};
+
+/** CRC compute algorithm */
+enum rte_net_crc_alg {
+	RTE_NET_CRC_SCALAR = 0,
+	RTE_NET_CRC_SSE42,
+};
+
+/**
+ * This API sets the CRC computation algorithm (i.e. scalar version,
+ * x86 64-bit SSE4.2 intrinsic version, etc.) and the internal data
+ * structure.
+ *
+ * @param alg
+ *   This parameter is used to select the CRC implementation version.
+ *   - RTE_NET_CRC_SCALAR
+ *   - RTE_NET_CRC_SSE42 (Use 64-bit SSE4.2 intrinsic)
+ */
+void
+rte_net_crc_set_alg(enum rte_net_crc_alg alg);
+
+/**
+ * CRC compute API
+ *
+ * @param data
+ *   Pointer to the packet data for CRC computation
+ * @param data_len
+ *   Data length for CRC computation
+ * @param type
+ *   CRC type (enum rte_net_crc_type)
+ *
+ * @return
+ *   CRC value
+ */
+uint32_t
+rte_net_crc_calc(const void *data,
+	uint32_t data_len,
+	enum rte_net_crc_type type);
+
+#ifdef __cplusplus
+}
+#endif
+
+
+#endif /* _RTE_NET_CRC_H_ */
diff --git a/lib/librte_net/rte_net_crc_sse.h b/lib/librte_net/rte_net_crc_sse.h
new file mode 100644
index 0000000..8bce522
--- /dev/null
+++ b/lib/librte_net/rte_net_crc_sse.h
@@ -0,0 +1,363 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_NET_CRC_SSE_H_
+#define _RTE_NET_CRC_SSE_H_
+
+#include <rte_branch_prediction.h>
+
+#include <x86intrin.h>
+#include <cpuid.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** PCLMULQDQ CRC computation context structure */
+struct crc_pclmulqdq_ctx {
+	__m128i rk1_rk2;
+	__m128i rk5_rk6;
+	__m128i rk7_rk8;
+};
+
+struct crc_pclmulqdq_ctx crc32_eth_pclmulqdq __rte_aligned(16);
+struct crc_pclmulqdq_ctx crc16_ccitt_pclmulqdq __rte_aligned(16);
+/**
+ * @brief Performs one folding round
+ *
+ * Logically function operates as follows:
+ *     DATA = READ_NEXT_16BYTES();
+ *     F1 = LSB8(FOLD)
+ *     F2 = MSB8(FOLD)
+ *     T1 = CLMUL(F1, RK1)
+ *     T2 = CLMUL(F2, RK2)
+ *     FOLD = XOR(T1, T2, DATA)
+ *
+ * @param data_block
+ *   16 byte data block
+ * @param precomp
+ *   Precomputed rk1 and rk2 constants
+ * @param fold
+ *   Current 16-byte folded data
+ *
+ * @return
+ *   New 16 byte folded data
+ */
+static inline __attribute__((always_inline)) __m128i
+crcr32_folding_round(__m128i data_block,
+		__m128i precomp,
+		__m128i fold)
+{
+	__m128i tmp0 = _mm_clmulepi64_si128(fold, precomp, 0x01);
+	__m128i tmp1 = _mm_clmulepi64_si128(fold, precomp, 0x10);
+
+	return _mm_xor_si128(tmp1, _mm_xor_si128(data_block, tmp0));
+}
+
+/**
+ * Performs reduction from 128 bits to 64 bits
+ *
+ * @param data128
+ *   128 bits data to be reduced
+ * @param precomp
+ *   precomputed constants rk5, rk6
+ *
+ * @return
+ *  64 bits reduced data
+ */
+
+static inline __attribute__((always_inline)) __m128i
+crcr32_reduce_128_to_64(__m128i data128, __m128i precomp)
+{
+	__m128i tmp0, tmp1, tmp2;
+
+	/* 64b fold */
+	tmp0 = _mm_clmulepi64_si128(data128, precomp, 0x00);
+	tmp1 = _mm_srli_si128(data128, 8);
+	tmp0 = _mm_xor_si128(tmp0, tmp1);
+
+	/* 32b fold */
+	tmp2 = _mm_slli_si128(tmp0, 4);
+	tmp1 = _mm_clmulepi64_si128(tmp2, precomp, 0x10);
+
+	return _mm_xor_si128(tmp1, tmp0);
+}
+
+/**
+ * Performs Barrett reduction from 64 bits to 32 bits
+ *
+ * @param data64
+ *   64 bits data to be reduced
+ * @param precomp
+ *   rk7 precomputed constant
+ *
+ * @return
+ *   reduced 32 bits data
+ */
+
+static inline __attribute__((always_inline)) uint32_t
+crcr32_reduce_64_to_32(__m128i data64, __m128i precomp)
+{
+	static const uint32_t mask1[4] __rte_aligned(16) = {
+		0xffffffff, 0xffffffff, 0x00000000, 0x00000000
+	};
+
+	static const uint32_t mask2[4] __rte_aligned(16) = {
+		0x00000000, 0xffffffff, 0xffffffff, 0xffffffff
+	};
+	__m128i tmp0, tmp1, tmp2;
+
+	tmp0 = _mm_and_si128(data64, _mm_load_si128((const __m128i *)mask2));
+
+	tmp1 = _mm_clmulepi64_si128(tmp0, precomp, 0x00);
+	tmp1 = _mm_xor_si128(tmp1, tmp0);
+	tmp1 = _mm_and_si128(tmp1, _mm_load_si128((const __m128i *)mask1));
+
+	tmp2 = _mm_clmulepi64_si128(tmp1, precomp, 0x10);
+	tmp2 = _mm_xor_si128(tmp2, tmp1);
+	tmp2 = _mm_xor_si128(tmp2, tmp0);
+
+	return _mm_extract_epi32(tmp2, 2);
+}
+
+static const uint8_t crc_xmm_shift_tab[48] __rte_aligned(16) = {
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+};
+
+/**
+ * Shifts left 128 bit register by specified number of bytes
+ *
+ * @param reg
+ *   128 bit value
+ * @param num
+ *   number of bytes to shift left reg by (0-16)
+ *
+ * @return
+ *   reg << (num * 8)
+ */
+
+static inline __attribute__((always_inline)) __m128i
+xmm_shift_left(__m128i reg, const unsigned int num)
+{
+	const __m128i *p = (const __m128i *)(crc_xmm_shift_tab + 16 - num);
+
+	return _mm_shuffle_epi8(reg, _mm_loadu_si128(p));
+}
+
+static inline __attribute__((always_inline)) uint32_t
+crc32_eth_calc_pclmulqdq(
+	const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const struct crc_pclmulqdq_ctx *params)
+{
+	__m128i temp, fold, k;
+	uint32_t n;
+
+	/* Get CRC init value */
+	temp = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);
+
+	/**
+	 * Folding all data into single 16 byte data block
+	 * Assumes: fold holds first 16 bytes of data
+	 */
+
+	if (unlikely(data_len < 32)) {
+		if (unlikely(data_len == 16)) {
+			/* 16 bytes */
+			fold = _mm_loadu_si128((const __m128i *)data);
+			fold = _mm_xor_si128(fold, temp);
+			goto reduction_128_64;
+		}
+
+		if (unlikely(data_len < 16)) {
+			/* 0 to 15 bytes */
+			uint8_t buffer[16] __rte_aligned(16);
+
+			memset(buffer, 0, sizeof(buffer));
+			memcpy(buffer, data, data_len);
+
+			fold = _mm_load_si128((const __m128i *)buffer);
+			fold = _mm_xor_si128(fold, temp);
+			if (unlikely(data_len < 4)) {
+				fold = xmm_shift_left(fold, 8 - data_len);
+				goto barret_reduction;
+			}
+			fold = xmm_shift_left(fold, 16 - data_len);
+			goto reduction_128_64;
+		}
+		/* 17 to 31 bytes */
+		fold = _mm_loadu_si128((const __m128i *)data);
+		fold = _mm_xor_si128(fold, temp);
+		n = 16;
+		k = params->rk1_rk2;
+		goto partial_bytes;
+	}
+
+	/** At least 32 bytes in the buffer */
+	/** Apply CRC initial value */
+	fold = _mm_loadu_si128((const __m128i *)data);
+	fold = _mm_xor_si128(fold, temp);
+
+	/** Main folding loop - the last 16 bytes are processed separately */
+	k = params->rk1_rk2;
+	for (n = 16; (n + 16) <= data_len; n += 16) {
+		temp = _mm_loadu_si128((const __m128i *)&data[n]);
+		fold = crcr32_folding_round(temp, k, fold);
+	}
+
+partial_bytes:
+	if (likely(n < data_len)) {
+
+		const uint32_t mask3[4] __rte_aligned(16) = {
+			0x80808080, 0x80808080, 0x80808080, 0x80808080
+		};
+
+		const uint8_t shf_table[32] __rte_aligned(16) = {
+			0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+			0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+			0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+		};
+
+		__m128i last16, a, b;
+
+		last16 = _mm_loadu_si128((const __m128i *)&data[data_len - 16]);
+
+		temp = _mm_loadu_si128((const __m128i *)
+			&shf_table[data_len & 15]);
+		a = _mm_shuffle_epi8(fold, temp);
+
+		temp = _mm_xor_si128(temp,
+			_mm_load_si128((const __m128i *)mask3));
+		b = _mm_shuffle_epi8(fold, temp);
+		b = _mm_blendv_epi8(b, last16, temp);
+
+		/* k = rk1 & rk2 */
+		temp = _mm_clmulepi64_si128(a, k, 0x01);
+		fold = _mm_clmulepi64_si128(a, k, 0x10);
+
+		fold = _mm_xor_si128(fold, temp);
+		fold = _mm_xor_si128(fold, b);
+	}
+
+	/** Reduction 128 -> 32 Assumes: fold holds 128bit folded data */
+reduction_128_64:
+	k = params->rk5_rk6;
+	fold = crcr32_reduce_128_to_64(fold, k);
+
+barret_reduction:
+	k = params->rk7_rk8;
+	n = crcr32_reduce_64_to_32(fold, k);
+
+	return n;
+}
+
+
+static inline void
+rte_net_crc_sse42_init(void)
+{
+	uint64_t k1, k2, k5, k6;
+	uint64_t p = 0, q = 0;
+
+	/** Initialize CRC16 data */
+	k1 = 0x189aeLLU;
+	k2 = 0x8e10LLU;
+	k5 = 0x189aeLLU;
+	k6 = 0x114aaLLU;
+	q =  0x11c581910LLU;
+	p =  0x10811LLU;
+
+	/** Save the params in context structure */
+	crc16_ccitt_pclmulqdq.rk1_rk2 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k1), _mm_cvtsi64_m64(k2));
+	crc16_ccitt_pclmulqdq.rk5_rk6 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k5), _mm_cvtsi64_m64(k6));
+	crc16_ccitt_pclmulqdq.rk7_rk8 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(q), _mm_cvtsi64_m64(p));
+
+	/** Initialize CRC32 data */
+	k1 = 0xccaa009eLLU;
+	k2 = 0x1751997d0LLU;
+	k5 = 0xccaa009eLLU;
+	k6 = 0x163cd6124LLU;
+	q =  0x1f7011640LLU;
+	p =  0x1db710641LLU;
+
+	/** Save the params in context structure */
+	crc32_eth_pclmulqdq.rk1_rk2 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k1), _mm_cvtsi64_m64(k2));
+	crc32_eth_pclmulqdq.rk5_rk6 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k5), _mm_cvtsi64_m64(k6));
+	crc32_eth_pclmulqdq.rk7_rk8 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(q), _mm_cvtsi64_m64(p));
+
+	/**
+	 * Reset the register as following calculation may
+	 * use other data types such as float, double, etc.
+	 */
+	_mm_empty();
+
+}
+
+static inline uint32_t
+rte_crc16_ccitt_sse42_handler(const uint8_t *data,
+	uint32_t data_len)
+{
+	/** return 16-bit CRC value */
+	return (uint16_t)~crc32_eth_calc_pclmulqdq(data,
+		data_len,
+		0xffff,
+		&crc16_ccitt_pclmulqdq);
+}
+
+static inline uint32_t
+rte_crc32_eth_sse42_handler(const uint8_t *data,
+	uint32_t data_len)
+{
+	return ~crc32_eth_calc_pclmulqdq(data,
+		data_len,
+		0xffffffffUL,
+		&crc32_eth_pclmulqdq);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_NET_CRC_SSE_H_ */
diff --git a/lib/librte_net/rte_net_version.map b/lib/librte_net/rte_net_version.map
index 3b15e65..687c40e 100644
--- a/lib/librte_net/rte_net_version.map
+++ b/lib/librte_net/rte_net_version.map
@@ -4,3 +4,11 @@ DPDK_16.11 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_net_crc_calc;
+	rte_net_crc_set_alg;
+
+} DPDK_16.11;
-- 
2.5.5

* [dpdk-dev] [PATCH v9 2/3] test/test: add unit test for CRC computation
  2017-03-30 16:15                               ` [dpdk-dev] [PATCH v9 0/3] librte_net: add crc computation support Jasvinder Singh
  2017-03-30 16:15                                 ` [dpdk-dev] [PATCH v9 1/3] librte_net: add crc compute APIs Jasvinder Singh
@ 2017-03-30 16:15                                 ` Jasvinder Singh
  2017-03-30 16:15                                 ` [dpdk-dev] [PATCH v9 3/3] maintainers: add packet crc section and claim maintainership Jasvinder Singh
  2017-04-04 20:02                                 ` [dpdk-dev] [PATCH v9 0/3] librte_net: add crc computation support Thomas Monjalon
  3 siblings, 0 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-03-30 16:15 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, declan.doherty, pablo.de.lara.guarch

This patch provides a set of tests for verifying the functional
correctness of 16-bit and 32-bit CRC APIs.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
---
 test/test/Makefile   |   2 +
 test/test/test_crc.c | 183 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 185 insertions(+)
 create mode 100644 test/test/test_crc.c

diff --git a/test/test/Makefile b/test/test/Makefile
index 79f0c61..06d8d5d 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -160,6 +160,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_cirbuf.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_string.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_lib.c
 
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += test_crc.c
+
 ifeq ($(CONFIG_RTE_LIBRTE_SCHED),y)
 SRCS-y += test_red.c
 SRCS-y += test_sched.c
diff --git a/test/test/test_crc.c b/test/test/test_crc.c
new file mode 100644
index 0000000..ea61ca9
--- /dev/null
+++ b/test/test/test_crc.c
@@ -0,0 +1,183 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "test.h"
+
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_net_crc.h>
+
+#define CRC_VEC_LEN        32
+#define CRC32_VEC_LEN1     1512
+#define CRC32_VEC_LEN2     348
+#define CRC16_VEC_LEN1     12
+#define CRC16_VEC_LEN2     2
+#define LINE_LEN           75
+
+/* CRC test vector */
+static const uint8_t crc_vec[CRC_VEC_LEN] = {
+	'0', '1', '2', '3', '4', '5', '6', '7',
+	'8', '9', 'a', 'b', 'c', 'd', 'e', 'f',
+	'g', 'h', 'i', 'j', 'A', 'B', 'C', 'D',
+	'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L',
+};
+
+/* 32-bit CRC test vector */
+static const uint8_t crc32_vec1[12] = {
+	0xBE, 0xD7, 0x23, 0x47, 0x6B, 0x8F,
+	0xB3, 0x14, 0x5E, 0xFB, 0x35, 0x59,
+};
+
+/* 16-bit CRC test vector 1 */
+static const uint8_t crc16_vec1[CRC16_VEC_LEN1] = {
+	0x0D, 0x01, 0x01, 0x23, 0x45, 0x67,
+	0x89, 0x01, 0x23, 0x45, 0x00, 0x01,
+};
+
+/* 16-bit CRC test vector 2 */
+static const uint8_t crc16_vec2[CRC16_VEC_LEN2] = {
+	0x03, 0x3f,
+};
+/** CRC results */
+static const uint32_t crc32_vec_res = 0xb491aab4;
+static const uint32_t crc32_vec1_res = 0xac54d294;
+static const uint32_t crc32_vec2_res = 0xefaae02f;
+static const uint16_t crc16_vec_res = 0x6bec;
+static const uint16_t crc16_vec1_res = 0x8cdd;
+static const uint16_t crc16_vec2_res = 0xec5b;
+
+static uint32_t
+crc_calc(const uint8_t *vec,
+	uint32_t vec_len,
+	enum rte_net_crc_type type)
+{
+	/* compute CRC */
+	uint32_t ret = rte_net_crc_calc(vec, vec_len, type);
+
+	/* dump data on console */
+	TEST_HEXDUMP(stdout, NULL, vec, vec_len);
+
+	return ret;
+}
+
+static int
+test_crc_calc(void)
+{
+	uint32_t i;
+	enum rte_net_crc_type type;
+	uint8_t *test_data;
+	uint32_t result;
+	int error;
+
+	/* 32-bit ethernet CRC: Test 1 */
+	type = RTE_NET_CRC32_ETH;
+
+	result = crc_calc(crc_vec, CRC_VEC_LEN, type);
+	if (result != crc32_vec_res)
+		return -1;
+
+	/* 32-bit ethernet CRC: Test 2 */
+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN1, 0);
+	if (test_data == NULL)
+		return -7;
+	for (i = 0; i < CRC32_VEC_LEN1; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+	result = crc_calc(test_data, CRC32_VEC_LEN1, type);
+	if (result != crc32_vec1_res) {
+		error = -2;
+		goto fail;
+	}
+
+	/* 32-bit ethernet CRC: Test 3 */
+	for (i = 0; i < CRC32_VEC_LEN2; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	result = crc_calc(test_data, CRC32_VEC_LEN2, type);
+	if (result != crc32_vec2_res) {
+		error = -3;
+		goto fail;
+	}
+
+	/* 16-bit CCITT CRC: Test 4 */
+	type = RTE_NET_CRC16_CCITT;
+	result = crc_calc(crc_vec, CRC_VEC_LEN, type);
+	if (result != crc16_vec_res) {
+		error = -4;
+		goto fail;
+	}
+	/* 16-bit CCITT CRC: Test 5 */
+	result = crc_calc(crc16_vec1, CRC16_VEC_LEN1, type);
+	if (result != crc16_vec1_res) {
+		error = -5;
+		goto fail;
+	}
+	/* 16-bit CCITT CRC: Test 6 */
+	result = crc_calc(crc16_vec2, CRC16_VEC_LEN2, type);
+	if (result != crc16_vec2_res) {
+		error = -6;
+		goto fail;
+	}
+
+	rte_free(test_data);
+	return 0;
+
+fail:
+	rte_free(test_data);
+	return error;
+}
+
+static int
+test_crc(void)
+{
+	int ret;
+	/* set CRC scalar mode */
+	rte_net_crc_set_alg(RTE_NET_CRC_SCALAR);
+
+	ret = test_crc_calc();
+	if (ret < 0) {
+		printf("test_crc (scalar): failed (%d)\n", ret);
+		return ret;
+	}
+	/* set CRC sse4.2 mode */
+	rte_net_crc_set_alg(RTE_NET_CRC_SSE42);
+
+	ret = test_crc_calc();
+	if (ret < 0) {
+		printf("test_crc (x86_64_SSE4.2): failed (%d)\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+REGISTER_TEST_COMMAND(crc_autotest, test_crc);
-- 
2.5.5

* [dpdk-dev] [PATCH v9 3/3] maintainers: add packet crc section and claim maintainership
  2017-03-30 16:15                               ` [dpdk-dev] [PATCH v9 0/3] librte_net: add crc computation support Jasvinder Singh
  2017-03-30 16:15                                 ` [dpdk-dev] [PATCH v9 1/3] librte_net: add crc compute APIs Jasvinder Singh
  2017-03-30 16:15                                 ` [dpdk-dev] [PATCH v9 2/3] " Jasvinder Singh
@ 2017-03-30 16:15                                 ` Jasvinder Singh
  2017-04-04 19:55                                   ` Thomas Monjalon
  2017-04-04 20:02                                 ` [dpdk-dev] [PATCH v9 0/3] librte_net: add crc computation support Thomas Monjalon
  3 siblings, 1 reply; 69+ messages in thread
From: Jasvinder Singh @ 2017-03-30 16:15 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, declan.doherty, pablo.de.lara.guarch

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
---
 MAINTAINERS | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0b1524d..270c2fe 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -480,6 +480,12 @@ Network headers
 M: Olivier Matz <olivier.matz@6wind.com>
 F: lib/librte_net/
 
+Packet CRC
+M: Jasvinder Singh <jasvinder.singh@intel.com>
+F: lib/librte_net/rte_net_crc*
+F: lib/librte_net/rte_net_crc_sse.h
+F: test/test/test_crc.c
+
 IP fragmentation & reassembly
 M: Konstantin Ananyev <konstantin.ananyev@intel.com>
 F: lib/librte_ip_frag/
-- 
2.5.5

* Re: [dpdk-dev] [PATCH v9 3/3] maintainers: add packet crc section and claim maintainership
  2017-03-30 16:15                                 ` [dpdk-dev] [PATCH v9 3/3] maintainers: add packet crc section and claim maintainership Jasvinder Singh
@ 2017-04-04 19:55                                   ` Thomas Monjalon
  0 siblings, 0 replies; 69+ messages in thread
From: Thomas Monjalon @ 2017-04-04 19:55 UTC (permalink / raw)
  To: Jasvinder Singh; +Cc: dev, olivier.matz, declan.doherty, pablo.de.lara.guarch

2017-03-30 17:15, Jasvinder Singh:
> +Packet CRC
> +M: Jasvinder Singh <jasvinder.singh@intel.com>
> +F: lib/librte_net/rte_net_crc*
> +F: lib/librte_net/rte_net_crc_sse.h

The above lines should be in the first patch.

> +F: test/test/test_crc.c

This line should be in the second patch.

* Re: [dpdk-dev] [PATCH v9 1/3] librte_net: add crc compute APIs
  2017-03-30 16:15                                 ` [dpdk-dev] [PATCH v9 1/3] librte_net: add crc compute APIs Jasvinder Singh
@ 2017-04-04 20:00                                   ` Thomas Monjalon
  2017-04-05 14:58                                   ` [dpdk-dev] [PATCH v10 0/2] librte_net: add crc computation support Jasvinder Singh
  1 sibling, 0 replies; 69+ messages in thread
From: Thomas Monjalon @ 2017-04-04 20:00 UTC (permalink / raw)
  To: Jasvinder Singh, pablo.de.lara.guarch, bruce.richardson
  Cc: dev, olivier.matz, declan.doherty

2017-03-30 17:15, Jasvinder Singh:
> +/**
> + * CRC compute API
> + *
> + * @param data
> + *   Pointer to the packet data for CRC computation
> + * @param data_len
> + *   Data length for CRC computation
> + * @param type
> + *   CRC type (enum rte_net_crc_type)
> + *
> + * @return
> + *   CRC value
> + */
> +uint32_t
> +rte_net_crc_calc(const void *data,
> +       uint32_t data_len,
> +       enum rte_net_crc_type type);

I still think that returning a value computed from input data is a kind of hash.
And this is my Wikipedia argument:
	https://en.wikipedia.org/wiki/List_of_hash_functions
"This is a list of hash functions, including cyclic redundancy checks,
checksum functions, and cryptographic hash functions."

Anyway, I must accept the community decision.
Now I would like to see a clear explanation of which algorithms are in
librte_hash, and why we have CRC32c and Toeplitz there.
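For comparison only, here is a minimal sketch (not taken from this patchset or from librte_hash) showing that the Ethernet CRC-32 and CRC32c can share one bitwise routine and differ only in the generator polynomial. It assumes the common reflected (LSB-first) form with an initial value and final XOR of 0xFFFFFFFF; the function and macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected generator polynomials (hypothetical macro names). */
#define POLY_CRC32_ETH 0xEDB88320u /* 0x04C11DB7, IEEE 802.3 */
#define POLY_CRC32C    0x82F63B78u /* 0x1EDC6F41, Castagnoli */

/* Bit-at-a-time reflected CRC-32: init 0xFFFFFFFF, final XOR 0xFFFFFFFF.
 * Only 'poly' changes between the Ethernet CRC and CRC32c. */
static uint32_t
crc32_reflected(const uint8_t *data, size_t len, uint32_t poly)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *data++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1u) ? poly : 0u);
	}
	return ~crc;
}
```

With the ASCII string "123456789" as input, POLY_CRC32_ETH gives the standard check value 0xCBF43926 and POLY_CRC32C gives 0xE3069283.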

* Re: [dpdk-dev] [PATCH v9 0/3] librte_net: add crc computation support
  2017-03-30 16:15                               ` [dpdk-dev] [PATCH v9 0/3] librte_net: add crc computation support Jasvinder Singh
                                                   ` (2 preceding siblings ...)
  2017-03-30 16:15                                 ` [dpdk-dev] [PATCH v9 3/3] maintainers: add packet crc section and claim maintainership Jasvinder Singh
@ 2017-04-04 20:02                                 ` Thomas Monjalon
  2017-04-05  8:34                                   ` Singh, Jasvinder
  3 siblings, 1 reply; 69+ messages in thread
From: Thomas Monjalon @ 2017-04-04 20:02 UTC (permalink / raw)
  To: Jasvinder Singh; +Cc: dev, olivier.matz, declan.doherty, pablo.de.lara.guarch

2017-03-30 17:15, Jasvinder Singh:
> In some applications, CRC (Cyclic Redundancy Check) needs to be computed
> or updated during packet processing operations. This patchset adds
> software implementation of some common standard CRCs (32-bit Ethernet
> CRC as per Ethernet/[ISO/IEC 8802-3] and 16-bit CCITT-CRC [ITU-T X.25]).
> Two versions of each 32-bit and 16-bit CRC calculation are proposed.
> 
> The first version presents a fast and efficient CRC generation on IA
> processors by using the carry-less multiplication instruction PCLMULQDQ
> (i.e. SSE4.2 intrinsics). In this implementation, a parallelized folding
> approach has been used to first reduce an arbitrary length buffer to a small
> fixed size length buffer (16 bytes) with the help of precomputed constants.
> The resultant single 16-bytes chunk is further reduced by Barrett reduction
> method to generate final CRC value. For more details on the implementation,
> see reference [1].
> 
> The second version presents the fallback solution to support the CRC
> generation without needing any specific support from CPU (for example,
> SSE4.2 intrinsics). It is based on generic Look-Up Table (LUT) algorithm
> that uses precomputed 256 element table as explained in reference [2].
> 
> During initialisation, all the data structures required for CRC computation
> are initialised. Also, x86 specific crc implementation (if supported by
> the platform) or scalar version is enabled.

As you can see in patchwork, it does not compile on FreeBSD:
	http://dpdk.org/ml/archives/test-report/2017-April/016943.html

* Re: [dpdk-dev] [PATCH v9 0/3] librte_net: add crc computation support
  2017-04-04 20:02                                 ` [dpdk-dev] [PATCH v9 0/3] librte_net: add crc computation support Thomas Monjalon
@ 2017-04-05  8:34                                   ` Singh, Jasvinder
  2017-04-05  9:01                                     ` Thomas Monjalon
  0 siblings, 1 reply; 69+ messages in thread
From: Singh, Jasvinder @ 2017-04-05  8:34 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, olivier.matz, Doherty, Declan, De Lara Guarch, Pablo

Hi Thomas,


> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Tuesday, April 4, 2017 9:02 PM
> To: Singh, Jasvinder <jasvinder.singh@intel.com>
> Cc: dev@dpdk.org; olivier.matz@6wind.com; Doherty, Declan
> <declan.doherty@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v9 0/3] librte_net: add crc computation
> support
> 
> 2017-03-30 17:15, Jasvinder Singh:
> > [...]
> 
> As you can see in patchwork, it does not compile on FreeBSD:
> 	http://dpdk.org/ml/archives/test-report/2017-April/016943.html

As I stated in the cover letter notes as well, the patchset build fails on clang versions earlier than 3.7.0 due to
missing intrinsics, and this issue is listed in the DPDK known issues section. The FreeBSD build with the gcc target should work fine.

Jasvinder
 

* Re: [dpdk-dev] [PATCH v9 0/3] librte_net: add crc computation support
  2017-04-05  8:34                                   ` Singh, Jasvinder
@ 2017-04-05  9:01                                     ` Thomas Monjalon
  2017-04-05  9:37                                       ` Richardson, Bruce
  0 siblings, 1 reply; 69+ messages in thread
From: Thomas Monjalon @ 2017-04-05  9:01 UTC (permalink / raw)
  To: Singh, Jasvinder
  Cc: dev, olivier.matz, Doherty, Declan, De Lara Guarch, Pablo

2017-04-05 08:34, Singh, Jasvinder:
> Hi Thomas,
> 
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > 2017-03-30 17:15, Jasvinder Singh:
> > > [...]
> > 
> > As you can see in patchwork, it does not compile on FreeBSD:
> > 	http://dpdk.org/ml/archives/test-report/2017-April/016943.html
> 
> As I stated in the cover letter notes as well, the patchset build fails on clang versions earlier than 3.7.0 due to
> missing intrinsics, and this issue is listed in the DPDK known issues section. The FreeBSD build with the gcc target should work fine.

Ah, I had not seen this explanation.

However, we cannot let the build fail.
It is a blocker for patch admission.

Can you at least disable the code for some compiler versions?
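As a sketch of one way to do this (the macro and function names are hypothetical, and the clang 3.7 cut-off is taken from the failure described in this thread), the vector path can be compiled in only when the compiler's predefined macros report an x86-64 target with SSE4.2 and PCLMUL enabled, and never for older clang:

```c
/* Older clang releases lack some of the intrinsics needed by the
 * PCLMULQDQ path, per the build failure discussed in this thread. */
#if defined(__clang__) && \
	(__clang_major__ < 3 || (__clang_major__ == 3 && __clang_minor__ < 7))
#define CRC_OLD_CLANG 1
#else
#define CRC_OLD_CLANG 0
#endif

/* Build the vector path only when the target flags enable both
 * SSE4.2 and PCLMUL (e.g. -msse4.2 -mpclmul) and clang is new enough. */
#if defined(__x86_64__) && defined(__SSE4_2__) && defined(__PCLMUL__) && \
	!CRC_OLD_CLANG
#define CRC_HAVE_PCLMUL_PATH 1
#else
#define CRC_HAVE_PCLMUL_PATH 0
#endif

/* Tells run-time code whether vector handlers exist at all; when this
 * returns 0, only the scalar LUT version can be selected. */
static int
crc_pclmul_path_built(void)
{
	return CRC_HAVE_PCLMUL_PATH;
}
```

A compile-time guard like this only avoids the build failure; a run-time CPU check is still needed before selecting the vector handlers. Apple's clang uses its own version numbering, so a real check might need to treat it separately.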

* Re: [dpdk-dev] [PATCH v9 0/3] librte_net: add crc computation support
  2017-04-05  9:01                                     ` Thomas Monjalon
@ 2017-04-05  9:37                                       ` Richardson, Bruce
  2017-04-05 12:52                                         ` Singh, Jasvinder
  0 siblings, 1 reply; 69+ messages in thread
From: Richardson, Bruce @ 2017-04-05  9:37 UTC (permalink / raw)
  To: Thomas Monjalon, Singh, Jasvinder
  Cc: dev, olivier.matz, Doherty, Declan, De Lara Guarch, Pablo



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Wednesday, April 5, 2017 10:01 AM
> To: Singh, Jasvinder <jasvinder.singh@intel.com>
> Cc: dev@dpdk.org; olivier.matz@6wind.com; Doherty, Declan
> <declan.doherty@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v9 0/3] librte_net: add crc computation
> support
> 
> 2017-04-05 08:34, Singh, Jasvinder:
> > Hi Thomas,
> >
> > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > 2017-03-30 17:15, Jasvinder Singh:
> > > > [...]
> > >
> > > As you can see in patchwork, it does not compile on FreeBSD:
> > > 	http://dpdk.org/ml/archives/test-report/2017-April/016943.html
> >
> > As I stated in the cover letter notes as well, the patchset build
> > fails on clang versions earlier than 3.7.0 due to missing intrinsics,
> > and this issue is listed in the DPDK known issues section. The FreeBSD
> > build with the gcc target should work fine.
> 
> Ah, I had not seen this explanation.
>
> However, we cannot let the build fail.
> It is a blocker for patch admission.
>
> Can you at least disable the code for some compiler versions?

Hi Jasvinder,

Is there any chance of a work-around for this issue? The default compiler on FreeBSD is clang, and the FreeBSD 10 series of releases uses v3.4. This means this functionality will be unavailable to anyone using DPDK from the FreeBSD ports on FreeBSD 10.

/Bruce

* Re: [dpdk-dev] [PATCH v9 0/3] librte_net: add crc computation support
  2017-04-05  9:37                                       ` Richardson, Bruce
@ 2017-04-05 12:52                                         ` Singh, Jasvinder
  0 siblings, 0 replies; 69+ messages in thread
From: Singh, Jasvinder @ 2017-04-05 12:52 UTC (permalink / raw)
  To: Richardson, Bruce, Thomas Monjalon
  Cc: dev, olivier.matz, Doherty, Declan, De Lara Guarch, Pablo

Hi Bruce,

> -----Original Message-----
> From: Richardson, Bruce
> Sent: Wednesday, April 5, 2017 10:37 AM
> To: Thomas Monjalon <thomas.monjalon@6wind.com>; Singh, Jasvinder
> <jasvinder.singh@intel.com>
> Cc: dev@dpdk.org; olivier.matz@6wind.com; Doherty, Declan
> <declan.doherty@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v9 0/3] librte_net: add crc computation
> support
> 
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> > Sent: Wednesday, April 5, 2017 10:01 AM
> > To: Singh, Jasvinder <jasvinder.singh@intel.com>
> > Cc: dev@dpdk.org; olivier.matz@6wind.com; Doherty, Declan
> > <declan.doherty@intel.com>; De Lara Guarch, Pablo
> > <pablo.de.lara.guarch@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH v9 0/3] librte_net: add crc computation
> > support
> >
> > 2017-04-05 08:34, Singh, Jasvinder:
> > > Hi Thomas,
> > >
> > > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > > 2017-03-30 17:15, Jasvinder Singh:
> > > > > [...]
> > > >
> > > > As you can see in patchwork, it does not compile on FreeBSD:
> > > > 	http://dpdk.org/ml/archives/test-report/2017-April/016943.html
> > >
> > > As I stated in the cover letter notes as well, the patchset
> > > build fails on clang versions earlier than 3.7.0 due to missing
> > > intrinsics, and this issue is listed in the DPDK known issues
> > > section. The FreeBSD build with the gcc target should work fine.
> >
> > Ah, I had not seen this explanation.
> >
> > However, we cannot let the build fail.
> > It is a blocker for patch admission.
> >
> > Can you at least disable the code for some compiler versions?
> 
> Hi Jasvinder,
> 
> Is there any chance of a work-around for this issue? The default compiler
> on FreeBSD is clang, and the FreeBSD 10 series of releases uses v3.4. This
> means this functionality will be unavailable to anyone using DPDK from the
> FreeBSD ports on FreeBSD 10.
> 
> /Bruce


I will have a look at this and send another version with a fix.

Jasvinder

* [dpdk-dev] [PATCH v10 0/2] librte_net: add crc computation support
  2017-03-30 16:15                                 ` [dpdk-dev] [PATCH v9 1/3] librte_net: add crc compute APIs Jasvinder Singh
  2017-04-04 20:00                                   ` Thomas Monjalon
@ 2017-04-05 14:58                                   ` Jasvinder Singh
  2017-04-05 14:58                                     ` [dpdk-dev] [PATCH v10 1/2] librte_net: add crc compute APIs Jasvinder Singh
  2017-04-05 14:58                                     ` [dpdk-dev] [PATCH v10 2/2] test/test: add unit test for CRC computation Jasvinder Singh
  1 sibling, 2 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-04-05 14:58 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, declan.doherty, pablo.de.lara.guarch

In some applications, CRC (Cyclic Redundancy Check) needs to be computed
or updated during packet processing operations. This patchset adds
software implementations of some common standard CRCs (32-bit Ethernet
CRC as per Ethernet/[ISO/IEC 8802-3] and 16-bit CCITT-CRC [ITU-T X.25]).
Two versions of each of the 32-bit and 16-bit CRC calculations are proposed.

The first version provides fast and efficient CRC generation on IA
processors by using the carry-less multiplication instruction PCLMULQDQ
(together with SSE4.2 intrinsics). In this implementation, a parallelized
folding approach first reduces an arbitrary-length buffer to a small
fixed-size buffer (16 bytes) with the help of precomputed constants. The
resulting 16-byte chunk is then reduced by the Barrett reduction method
to generate the final CRC value. For more details on the implementation,
see reference [1].
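The folding step relies on a simple identity over GF(2), where addition is XOR: if the buffer is written as M(x) = H(x)*x^k + L(x), then M(x) mod P(x) = (H(x)*(x^k mod P(x)) + L(x)) mod P(x). A leading chunk H can therefore be folded forward by multiplying it by the precomputed constant x^k mod P(x), which is where the carry-less multiply comes in. The sketch below is only a portable reference for that multiply, i.e. what PCLMULQDQ computes for one pair of 64-bit operands; it is not the patch's SSE code, and the names are hypothetical.

```c
#include <stdint.h>

/* 128-bit product of two 64-bit polynomials over GF(2). */
struct clmul128 {
	uint64_t lo;
	uint64_t hi;
};

/* Portable carry-less multiply: partial products are combined with XOR
 * instead of addition, so no carries propagate between bit positions. */
static struct clmul128
clmul64(uint64_t a, uint64_t b)
{
	struct clmul128 r = { 0, 0 };

	for (int i = 0; i < 64; i++) {
		if ((b >> i) & 1u) {
			r.lo ^= a << i;
			/* bits of 'a' shifted past bit 63 land in the high half */
			if (i != 0)
				r.hi ^= a >> (64 - i);
		}
	}
	return r;
}
```

For example, clmul64(3, 3) yields lo = 5, since (x + 1)^2 = x^2 + 1 once carries are discarded. A software loop like this is far slower than the instruction and is mainly useful as a test reference.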

The second version is a fallback that supports CRC generation without
needing any specific CPU support (for example, SSE4.2 intrinsics). It is
based on the generic Look-Up Table (LUT) algorithm, which uses a
precomputed 256-element table as explained in reference [2].
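A minimal table-driven sketch of the same idea, assuming the usual reflected (LSB-first) conventions: the Ethernet CRC-32 with polynomial 0x04C11DB7 (0xEDB88320 reflected), and a CRC-16 with polynomial 0x1021 (0x8408 reflected), init 0xFFFF and final XOR 0xFFFF as used by X.25/HDLC. Whether these CRC-16 parameters reproduce the patch's RTE_NET_CRC16_CCITT results exactly is an assumption not checked here, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC_LUT_SIZE 256

static uint32_t crc32_lut[CRC_LUT_SIZE];
static uint32_t crc16_lut[CRC_LUT_SIZE];

/* Entry i is the register value after shifting byte value i through a
 * reflected (LSB-first) CRC register eight times. */
static void
crc_lut_init(uint32_t *lut, uint32_t reflected_poly)
{
	for (uint32_t i = 0; i < CRC_LUT_SIZE; i++) {
		uint32_t crc = i;

		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1u) ? reflected_poly : 0u);
		lut[i] = crc;
	}
}

/* One table lookup per input byte replaces eight shift/XOR steps. */
static uint32_t
crc_lut_update(const uint32_t *lut, uint32_t crc,
	       const uint8_t *data, size_t len)
{
	while (len--)
		crc = lut[(crc ^ *data++) & 0xFFu] ^ (crc >> 8);
	return crc;
}

/* Build both tables once, before any CRC is computed. */
static void
crc_luts_init(void)
{
	crc_lut_init(crc32_lut, 0xEDB88320u); /* reflected 0x04C11DB7 */
	crc_lut_init(crc16_lut, 0x8408u);     /* reflected 0x1021 */
}

/* Ethernet CRC-32: init 0xFFFFFFFF, final XOR 0xFFFFFFFF. */
static uint32_t
crc32_eth_lut(const uint8_t *data, size_t len)
{
	return ~crc_lut_update(crc32_lut, 0xFFFFFFFFu, data, len);
}

/* X.25/HDLC CRC-16: init 0xFFFF, final XOR 0xFFFF. */
static uint16_t
crc16_x25_lut(const uint8_t *data, size_t len)
{
	return (uint16_t)~crc_lut_update(crc16_lut, 0xFFFFu, data, len);
}
```

Each table is built once, after which every input byte costs one lookup, one shift and one XOR. For "123456789" the standard check values are 0xCBF43926 (CRC-32) and 0x906E (CRC-16/X.25).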

During initialisation, all the data structures required for CRC
computation are initialised. Also, the x86-specific CRC implementation
(if supported by the platform) or the scalar version is enabled.
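The run-time selection described above can be sketched as a handler table indexed by algorithm and CRC type. This is a hypothetical illustration, not the patch's code: the enum and function names are invented, the vector row is left empty, and falling back to scalar is a design choice of this sketch rather than confirmed behaviour of rte_net_crc_set_alg(). Both CRCs use a compact bitwise loop to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enums, loosely modelled on the ones used in the patch. */
enum crc_type { CRC16_X25, CRC32_ETH, CRC_TYPE_MAX };
enum crc_alg { CRC_ALG_SCALAR, CRC_ALG_VECTOR, CRC_ALG_MAX };

typedef uint32_t (*crc_handler)(const uint8_t *data, size_t len);

/* Bitwise reflected CRC; 'mask' is both the initial value and the final
 * XOR (0xFFFF for the 16-bit CRC, 0xFFFFFFFF for the 32-bit CRC). */
static uint32_t
crc_bitwise(const uint8_t *data, size_t len, uint32_t poly, uint32_t mask)
{
	uint32_t crc = mask;

	while (len--) {
		crc ^= *data++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1u) ? poly : 0u);
	}
	return crc ^ mask;
}

static uint32_t
crc16_scalar(const uint8_t *data, size_t len)
{
	return crc_bitwise(data, len, 0x8408u, 0xFFFFu);
}

static uint32_t
crc32_scalar(const uint8_t *data, size_t len)
{
	return crc_bitwise(data, len, 0xEDB88320u, 0xFFFFFFFFu);
}

/* Rows are algorithms, columns are CRC types. The vector row stays NULL
 * here; a real build would fill it only when PCLMULQDQ code exists. */
static const crc_handler handlers[CRC_ALG_MAX][CRC_TYPE_MAX] = {
	[CRC_ALG_SCALAR] = {
		[CRC16_X25] = crc16_scalar,
		[CRC32_ETH] = crc32_scalar,
	},
};

static const crc_handler *active = &handlers[CRC_ALG_SCALAR][0];

/* Select an algorithm, falling back to scalar if it is unknown or has
 * no handlers. */
static void
crc_set_alg(enum crc_alg alg)
{
	if ((unsigned int)alg >= CRC_ALG_MAX ||
	    handlers[alg][CRC16_X25] == NULL ||
	    handlers[alg][CRC32_ETH] == NULL)
		alg = CRC_ALG_SCALAR;
	active = &handlers[alg][0];
}

/* Compute a CRC with the selected algorithm; 'type' must be valid. */
static uint32_t
crc_compute(const void *data, size_t len, enum crc_type type)
{
	return active[type](data, len);
}
```

Keeping the selected row behind a pointer means each call costs one indirect call, and switching implementations does not affect callers.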

The following APIs have been added:

(i)  rte_net_crc_set_alg()
(ii) rte_net_crc_calc()

The first API (i) allows the user to select a specific CRC implementation
at run time, while the second API (ii) is used to compute the 16-bit and
32-bit CRCs.

References:
[1] Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
[2] A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS
http://www.ross.net/crc/download/crc_v3.txt

v10 changes:
- added a check for PCLMULQDQ instruction support to fix the
  FreeBSD 10.4 build error (clang 3.4.0)
- added maintainership claim in the respective patches

v9 changes:
- included header files.
- added maintainership

v8 changes:
- improve unit test case.
 
v7 changes:
- remove the duplicate function in unit test.

v6 changes:
- fixed build error when compiling net library as a shared library
- addressed review comments on v5, (thanks Pablo) 

v5 changes:
- rebase to the master

v4 changes:
- change crc compute api parameters to make it more generic
- change the unit test to accommodate the crc compute api change

v3 changes:
- separate the x86-specific implementation into a new file
- improve the unit test

v2 changes:
- fix build errors for target i686-native-linuxapp-gcc
- fix checkpatch warnings

Jasvinder Singh (2):
  librte_net: add crc compute APIs
  test/test: add unit test for CRC computation

 MAINTAINERS                        |   6 +
 lib/Makefile                       |   2 +-
 lib/librte_net/Makefile            |   2 +
 lib/librte_net/rte_net_crc.c       | 209 +++++++++++++++++++++
 lib/librte_net/rte_net_crc.h       |  96 ++++++++++
 lib/librte_net/rte_net_crc_sse.h   | 363 +++++++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_version.map |   8 +
 test/test/Makefile                 |   2 +
 test/test/test_crc.c               | 183 +++++++++++++++++++
 9 files changed, 870 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h
 create mode 100644 lib/librte_net/rte_net_crc_sse.h
 create mode 100644 test/test/test_crc.c

-- 
2.5.5

* [dpdk-dev] [PATCH v10 1/2] librte_net: add crc compute APIs
  2017-04-05 14:58                                   ` [dpdk-dev] [PATCH v10 0/2] librte_net: add crc computation support Jasvinder Singh
@ 2017-04-05 14:58                                     ` Jasvinder Singh
  2017-04-05 17:49                                       ` Thomas Monjalon
  2017-04-05 20:49                                       ` [dpdk-dev] [PATCH v11 0/2] librte_net: add crc computation support Jasvinder Singh
  2017-04-05 14:58                                     ` [dpdk-dev] [PATCH v10 2/2] test/test: add unit test for CRC computation Jasvinder Singh
  1 sibling, 2 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-04-05 14:58 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, declan.doherty, pablo.de.lara.guarch

APIs for selecting the architecture-specific implementation and for
computing CRCs (16-bit and 32-bit) are added. For CRC calculation, both
scalar and x86 intrinsic (SSE4.2) versions are implemented.

The scalar version is based on the generic Look-Up Table (LUT) algorithm,
while the x86 intrinsic version uses carry-less multiplication for fast
CRC computation.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
---
 MAINTAINERS                        |   5 +
 lib/Makefile                       |   2 +-
 lib/librte_net/Makefile            |   2 +
 lib/librte_net/rte_net_crc.c       | 209 +++++++++++++++++++++
 lib/librte_net/rte_net_crc.h       |  96 ++++++++++
 lib/librte_net/rte_net_crc_sse.h   | 363 +++++++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_version.map |   8 +
 7 files changed, 684 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h
 create mode 100644 lib/librte_net/rte_net_crc_sse.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 0b1524d..c03320d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -480,6 +480,11 @@ Network headers
 M: Olivier Matz <olivier.matz@6wind.com>
 F: lib/librte_net/
 
+Packet CRC
+M: Jasvinder Singh <jasvinder.singh@intel.com>
+F: lib/librte_net/rte_net_crc*
+F: lib/librte_net/rte_net_crc_sse.h
+
 IP fragmentation & reassembly
 M: Konstantin Ananyev <konstantin.ananyev@intel.com>
 F: lib/librte_ip_frag/
diff --git a/lib/Makefile b/lib/Makefile
index 5ad3c7c..456eb38 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -62,7 +62,7 @@ DEPDIRS-librte_lpm := librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_ACL) += librte_acl
 DEPDIRS-librte_acl := librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_NET) += librte_net
-DEPDIRS-librte_net := librte_mbuf
+DEPDIRS-librte_net := librte_mbuf librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += librte_ip_frag
 DEPDIRS-librte_ip_frag := librte_eal librte_mempool librte_mbuf librte_ether
 DEPDIRS-librte_ip_frag += librte_hash
diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
index abd5c46..56727c4 100644
--- a/lib/librte_net/Makefile
+++ b/lib/librte_net/Makefile
@@ -39,10 +39,12 @@ EXPORT_MAP := rte_net_version.map
 LIBABIVER := 1
 
 SRCS-$(CONFIG_RTE_LIBRTE_NET) := rte_net.c
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += rte_net_crc.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
new file mode 100644
index 0000000..5c03695
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.c
@@ -0,0 +1,209 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+#include <string.h>
+#include <stdint.h>
+
+#include <rte_cpuflags.h>
+#include <rte_common.h>
+#include <rte_net_crc.h>
+
+#if defined(RTE_ARCH_X86_64)				\
+	&& defined(RTE_MACHINE_CPUFLAG_SSE4_2)		\
+	&& defined(RTE_MACHINE_CPUFLAG_PCLMULQDQ)
+#define X86_64_SSE42_PCLMULQDQ     1
+#endif
+
+#ifdef X86_64_SSE42_PCLMULQDQ
+#include <rte_net_crc_sse.h>
+#endif
+
+/** crc tables */
+static uint32_t crc32_eth_lut[CRC_LUT_SIZE];
+static uint32_t crc16_ccitt_lut[CRC_LUT_SIZE];
+
+static uint32_t
+rte_crc16_ccitt_handler(const uint8_t *data, uint32_t data_len);
+
+static uint32_t
+rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len);
+
+typedef uint32_t
+(*rte_net_crc_handler)(const uint8_t *data, uint32_t data_len);
+
+static rte_net_crc_handler *handlers;
+
+static rte_net_crc_handler handlers_scalar[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
+};
+
+#ifdef X86_64_SSE42_PCLMULQDQ
+static rte_net_crc_handler handlers_sse42[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler,
+};
+#endif
+
+/**
+ * Reflect the bits about the middle
+ *
+ * @param val
+ *   value to be reflected
+ *
+ * @return
+ *   reflected value
+ */
+static uint32_t
+reflect_32bits(uint32_t val)
+{
+	uint32_t i, res = 0;
+
+	for (i = 0; i < 32; i++)
+		if ((val & (1U << i)) != 0)
+			res |= 1U << (31 - i);
+
+	return res;
+}
+
+static void
+crc32_eth_init_lut(uint32_t poly,
+	uint32_t *lut)
+{
+	uint32_t i, j;
+
+	for (i = 0; i < CRC_LUT_SIZE; i++) {
+		uint32_t crc = reflect_32bits(i);
+
+		for (j = 0; j < 8; j++) {
+			if (crc & 0x80000000L)
+				crc = (crc << 1) ^ poly;
+			else
+				crc <<= 1;
+		}
+		lut[i] = reflect_32bits(crc);
+	}
+}
+
+static inline __attribute__((always_inline)) uint32_t
+crc32_eth_calc_lut(const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const uint32_t *lut)
+{
+	while (data_len--)
+		crc = lut[(crc ^ *data++) & 0xffL] ^ (crc >> 8);
+
+	return crc;
+}
+
+static void
+rte_net_crc_scalar_init(void)
+{
+	/** 32-bit crc init */
+	crc32_eth_init_lut(CRC32_ETH_POLYNOMIAL, crc32_eth_lut);
+
+	/** 16-bit CRC init */
+	crc32_eth_init_lut(CRC16_CCITT_POLYNOMIAL << 16, crc16_ccitt_lut);
+}
+
+static inline uint32_t
+rte_crc16_ccitt_handler(const uint8_t *data, uint32_t data_len)
+{
+	/** return 16-bit CRC value */
+	return (uint16_t)~crc32_eth_calc_lut(data,
+		data_len,
+		0xffff,
+		crc16_ccitt_lut);
+}
+
+static inline uint32_t
+rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
+{
+	/** return 32-bit CRC value */
+	return ~crc32_eth_calc_lut(data,
+		data_len,
+		0xffffffffUL,
+		crc32_eth_lut);
+}
+
+void
+rte_net_crc_set_alg(enum rte_net_crc_alg alg)
+{
+	switch (alg) {
+	case RTE_NET_CRC_SSE42:
+#ifdef X86_64_SSE42_PCLMULQDQ
+		handlers = handlers_sse42;
+		break;
+#else
+		/* SSE4.2/PCLMULQDQ not available: fall back to scalar */
+#endif
+	case RTE_NET_CRC_SCALAR:
+	default:
+		handlers = handlers_scalar;
+		break;
+	}
+}
+
+uint32_t
+rte_net_crc_calc(const void *data,
+	uint32_t data_len,
+	enum rte_net_crc_type type)
+{
+	uint32_t ret;
+	rte_net_crc_handler f_handle;
+
+	f_handle = handlers[type];
+	ret = f_handle((const uint8_t *) data, data_len);
+
+	return ret;
+}
+
+/*
+ * Select highest available crc algorithm as default one.
+ */
+static inline void __attribute__((constructor))
+rte_net_crc_init(void)
+{
+	enum rte_net_crc_alg alg = RTE_NET_CRC_SCALAR;
+
+	rte_net_crc_scalar_init();
+
+#ifdef X86_64_SSE42_PCLMULQDQ
+	alg = RTE_NET_CRC_SSE42;
+	rte_net_crc_sse42_init();
+#endif
+
+	rte_net_crc_set_alg(alg);
+}
diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
new file mode 100644
index 0000000..76fd129
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.h
@@ -0,0 +1,96 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_NET_CRC_H_
+#define _RTE_NET_CRC_H_
+#include <stdint.h>
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** CRC polynomials */
+#define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
+#define CRC16_CCITT_POLYNOMIAL 0x1021U
+
+#define CRC_LUT_SIZE 256
+
+/** CRC types */
+enum rte_net_crc_type {
+	RTE_NET_CRC16_CCITT = 0,
+	RTE_NET_CRC32_ETH,
+	RTE_NET_CRC_REQS
+};
+
+/** CRC compute algorithm */
+enum rte_net_crc_alg {
+	RTE_NET_CRC_SCALAR = 0,
+	RTE_NET_CRC_SSE42,
+};
+
+/**
+ * This API sets the CRC computation algorithm (e.g. scalar version,
+ * x86 64-bit SSE4.2 intrinsic version) and selects the corresponding
+ * internal handler table.
+ *
+ * @param alg
+ *   This parameter is used to select the CRC implementation version.
+ *   - RTE_NET_CRC_SCALAR
+ *   - RTE_NET_CRC_SSE42 (Use 64-bit SSE4.2 intrinsic)
+ */
+void
+rte_net_crc_set_alg(enum rte_net_crc_alg alg);
+
+/**
+ * CRC compute API
+ *
+ * @param data
+ *   Pointer to the packet data for CRC computation
+ * @param data_len
+ *   Data length for CRC computation
+ * @param type
+ *   CRC type (enum rte_net_crc_type)
+ *
+ * @return
+ *   CRC value (16-bit CRCs are returned in the low 16 bits)
+ */
+uint32_t
+rte_net_crc_calc(const void *data,
+	uint32_t data_len,
+	enum rte_net_crc_type type);
+
+#ifdef __cplusplus
+}
+#endif
+
+
+#endif /* _RTE_NET_CRC_H_ */
diff --git a/lib/librte_net/rte_net_crc_sse.h b/lib/librte_net/rte_net_crc_sse.h
new file mode 100644
index 0000000..8bce522
--- /dev/null
+++ b/lib/librte_net/rte_net_crc_sse.h
@@ -0,0 +1,363 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_NET_CRC_SSE_H_
+#define _RTE_NET_CRC_SSE_H_
+
+#include <rte_branch_prediction.h>
+
+#include <x86intrin.h>
+#include <cpuid.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** PCLMULQDQ CRC computation context structure */
+struct crc_pclmulqdq_ctx {
+	__m128i rk1_rk2;
+	__m128i rk5_rk6;
+	__m128i rk7_rk8;
+};
+
+static struct crc_pclmulqdq_ctx crc32_eth_pclmulqdq __rte_aligned(16);
+static struct crc_pclmulqdq_ctx crc16_ccitt_pclmulqdq __rte_aligned(16);
+/**
+ * @brief Performs one folding round
+ *
+ * Logically function operates as follows:
+ *     DATA = READ_NEXT_16BYTES();
+ *     F1 = LSB8(FOLD)
+ *     F2 = MSB8(FOLD)
+ *     T1 = CLMUL(F1, RK1)
+ *     T2 = CLMUL(F2, RK2)
+ *     FOLD = XOR(T1, T2, DATA)
+ *
+ * @param data_block
+ *   16 byte data block
+ * @param precomp
+ *   Precomputed rk1, rk2 constants
+ * @param fold
+ *   Current 16 byte folded data
+ *
+ * @return
+ *   New 16 byte folded data
+ */
+static inline __attribute__((always_inline)) __m128i
+crcr32_folding_round(__m128i data_block,
+		__m128i precomp,
+		__m128i fold)
+{
+	__m128i tmp0 = _mm_clmulepi64_si128(fold, precomp, 0x01);
+	__m128i tmp1 = _mm_clmulepi64_si128(fold, precomp, 0x10);
+
+	return _mm_xor_si128(tmp1, _mm_xor_si128(data_block, tmp0));
+}
+
+/**
+ * Performs reduction from 128 bits to 64 bits
+ *
+ * @param data128
+ *   128 bits data to be reduced
+ * @param precomp
+ *   precomputed constants rk5, rk6
+ *
+ * @return
+ *  64 bits reduced data
+ */
+
+static inline __attribute__((always_inline)) __m128i
+crcr32_reduce_128_to_64(__m128i data128, __m128i precomp)
+{
+	__m128i tmp0, tmp1, tmp2;
+
+	/* 64b fold */
+	tmp0 = _mm_clmulepi64_si128(data128, precomp, 0x00);
+	tmp1 = _mm_srli_si128(data128, 8);
+	tmp0 = _mm_xor_si128(tmp0, tmp1);
+
+	/* 32b fold */
+	tmp2 = _mm_slli_si128(tmp0, 4);
+	tmp1 = _mm_clmulepi64_si128(tmp2, precomp, 0x10);
+
+	return _mm_xor_si128(tmp1, tmp0);
+}
+
+/**
+ * Performs Barrett reduction from 64 bits to 32 bits
+ *
+ * @param data64
+ *   64 bits data to be reduced
+ * @param precomp
+ *   precomputed constants rk7, rk8
+ *
+ * @return
+ *   reduced 32 bits data
+ */
+
+static inline __attribute__((always_inline)) uint32_t
+crcr32_reduce_64_to_32(__m128i data64, __m128i precomp)
+{
+	static const uint32_t mask1[4] __rte_aligned(16) = {
+		0xffffffff, 0xffffffff, 0x00000000, 0x00000000
+	};
+
+	static const uint32_t mask2[4] __rte_aligned(16) = {
+		0x00000000, 0xffffffff, 0xffffffff, 0xffffffff
+	};
+	__m128i tmp0, tmp1, tmp2;
+
+	tmp0 = _mm_and_si128(data64, _mm_load_si128((const __m128i *)mask2));
+
+	tmp1 = _mm_clmulepi64_si128(tmp0, precomp, 0x00);
+	tmp1 = _mm_xor_si128(tmp1, tmp0);
+	tmp1 = _mm_and_si128(tmp1, _mm_load_si128((const __m128i *)mask1));
+
+	tmp2 = _mm_clmulepi64_si128(tmp1, precomp, 0x10);
+	tmp2 = _mm_xor_si128(tmp2, tmp1);
+	tmp2 = _mm_xor_si128(tmp2, tmp0);
+
+	return _mm_extract_epi32(tmp2, 2);
+}
+
+static const uint8_t crc_xmm_shift_tab[48] __rte_aligned(16) = {
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+};
+
+/**
+ * Shifts left 128 bit register by specified number of bytes
+ *
+ * @param reg
+ *   128 bit value
+ * @param num
+ *   number of bytes to shift left reg by (0-16)
+ *
+ * @return
+ *   reg << (num * 8)
+ */
+
+static inline __attribute__((always_inline)) __m128i
+xmm_shift_left(__m128i reg, const unsigned int num)
+{
+	const __m128i *p = (const __m128i *)(crc_xmm_shift_tab + 16 - num);
+
+	return _mm_shuffle_epi8(reg, _mm_loadu_si128(p));
+}
+
+static inline __attribute__((always_inline)) uint32_t
+crc32_eth_calc_pclmulqdq(
+	const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const struct crc_pclmulqdq_ctx *params)
+{
+	__m128i temp, fold, k;
+	uint32_t n;
+
+	/* Get CRC init value */
+	temp = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);
+
+	/**
+	 * Folding all data into single 16 byte data block
+	 * Assumes: fold holds first 16 bytes of data
+	 */
+
+	if (unlikely(data_len < 32)) {
+		if (unlikely(data_len == 16)) {
+			/* 16 bytes */
+			fold = _mm_loadu_si128((const __m128i *)data);
+			fold = _mm_xor_si128(fold, temp);
+			goto reduction_128_64;
+		}
+
+		if (unlikely(data_len < 16)) {
+			/* 0 to 15 bytes */
+			uint8_t buffer[16] __rte_aligned(16);
+
+			memset(buffer, 0, sizeof(buffer));
+			memcpy(buffer, data, data_len);
+
+			fold = _mm_load_si128((const __m128i *)buffer);
+			fold = _mm_xor_si128(fold, temp);
+			if (unlikely(data_len < 4)) {
+				fold = xmm_shift_left(fold, 8 - data_len);
+				goto barret_reduction;
+			}
+			fold = xmm_shift_left(fold, 16 - data_len);
+			goto reduction_128_64;
+		}
+		/* 17 to 31 bytes */
+		fold = _mm_loadu_si128((const __m128i *)data);
+		fold = _mm_xor_si128(fold, temp);
+		n = 16;
+		k = params->rk1_rk2;
+		goto partial_bytes;
+	}
+
+	/** At least 32 bytes in the buffer */
+	/** Apply CRC initial value */
+	fold = _mm_loadu_si128((const __m128i *)data);
+	fold = _mm_xor_si128(fold, temp);
+
+	/** Main folding loop - the last 16 bytes are processed separately */
+	k = params->rk1_rk2;
+	for (n = 16; (n + 16) <= data_len; n += 16) {
+		temp = _mm_loadu_si128((const __m128i *)&data[n]);
+		fold = crcr32_folding_round(temp, k, fold);
+	}
+
+partial_bytes:
+	if (likely(n < data_len)) {
+
+		const uint32_t mask3[4] __rte_aligned(16) = {
+			0x80808080, 0x80808080, 0x80808080, 0x80808080
+		};
+
+		const uint8_t shf_table[32] __rte_aligned(16) = {
+			0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+			0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+			0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+		};
+
+		__m128i last16, a, b;
+
+		last16 = _mm_loadu_si128((const __m128i *)&data[data_len - 16]);
+
+		temp = _mm_loadu_si128((const __m128i *)
+			&shf_table[data_len & 15]);
+		a = _mm_shuffle_epi8(fold, temp);
+
+		temp = _mm_xor_si128(temp,
+			_mm_load_si128((const __m128i *)mask3));
+		b = _mm_shuffle_epi8(fold, temp);
+		b = _mm_blendv_epi8(b, last16, temp);
+
+		/* k = rk1 & rk2 */
+		temp = _mm_clmulepi64_si128(a, k, 0x01);
+		fold = _mm_clmulepi64_si128(a, k, 0x10);
+
+		fold = _mm_xor_si128(fold, temp);
+		fold = _mm_xor_si128(fold, b);
+	}
+
+	/** Reduction 128 -> 32 Assumes: fold holds 128bit folded data */
+reduction_128_64:
+	k = params->rk5_rk6;
+	fold = crcr32_reduce_128_to_64(fold, k);
+
+barret_reduction:
+	k = params->rk7_rk8;
+	n = crcr32_reduce_64_to_32(fold, k);
+
+	return n;
+}
+
+
+static inline void
+rte_net_crc_sse42_init(void)
+{
+	uint64_t k1, k2, k5, k6;
+	uint64_t p = 0, q = 0;
+
+	/** Initialize CRC16 data */
+	k1 = 0x189aeLLU;
+	k2 = 0x8e10LLU;
+	k5 = 0x189aeLLU;
+	k6 = 0x114aaLLU;
+	q =  0x11c581910LLU;
+	p =  0x10811LLU;
+
+	/** Save the params in context structure */
+	crc16_ccitt_pclmulqdq.rk1_rk2 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k1), _mm_cvtsi64_m64(k2));
+	crc16_ccitt_pclmulqdq.rk5_rk6 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k5), _mm_cvtsi64_m64(k6));
+	crc16_ccitt_pclmulqdq.rk7_rk8 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(q), _mm_cvtsi64_m64(p));
+
+	/** Initialize CRC32 data */
+	k1 = 0xccaa009eLLU;
+	k2 = 0x1751997d0LLU;
+	k5 = 0xccaa009eLLU;
+	k6 = 0x163cd6124LLU;
+	q =  0x1f7011640LLU;
+	p =  0x1db710641LLU;
+
+	/** Save the params in context structure */
+	crc32_eth_pclmulqdq.rk1_rk2 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k1), _mm_cvtsi64_m64(k2));
+	crc32_eth_pclmulqdq.rk5_rk6 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k5), _mm_cvtsi64_m64(k6));
+	crc32_eth_pclmulqdq.rk7_rk8 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(q), _mm_cvtsi64_m64(p));
+
+	/**
+	 * Reset the register as following calculation may
+	 * use other data types such as float, double, etc.
+	 */
+	_mm_empty();
+
+}
+
+static inline uint32_t
+rte_crc16_ccitt_sse42_handler(const uint8_t *data,
+	uint32_t data_len)
+{
+	/** return 16-bit CRC value */
+	return (uint16_t)~crc32_eth_calc_pclmulqdq(data,
+		data_len,
+		0xffff,
+		&crc16_ccitt_pclmulqdq);
+}
+
+static inline uint32_t
+rte_crc32_eth_sse42_handler(const uint8_t *data,
+	uint32_t data_len)
+{
+	return ~crc32_eth_calc_pclmulqdq(data,
+		data_len,
+		0xffffffffUL,
+		&crc32_eth_pclmulqdq);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_NET_CRC_SSE_H_ */
diff --git a/lib/librte_net/rte_net_version.map b/lib/librte_net/rte_net_version.map
index 3b15e65..687c40e 100644
--- a/lib/librte_net/rte_net_version.map
+++ b/lib/librte_net/rte_net_version.map
@@ -4,3 +4,11 @@ DPDK_16.11 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_net_crc_calc;
+	rte_net_crc_set_alg;
+
+} DPDK_16.11;
-- 
2.5.5

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [dpdk-dev] [PATCH v10 2/2] test/test: add unit test for CRC computation
  2017-04-05 14:58                                   ` [dpdk-dev] [PATCH v10 0/2] librte_net: add crc computation support Jasvinder Singh
  2017-04-05 14:58                                     ` [dpdk-dev] [PATCH v10 1/2] librte_net: add crc compute APIs Jasvinder Singh
@ 2017-04-05 14:58                                     ` Jasvinder Singh
  1 sibling, 0 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-04-05 14:58 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, declan.doherty, pablo.de.lara.guarch

This patch provides a set of tests for verifying the functional
correctness of the 16-bit and 32-bit CRC APIs.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
---
 MAINTAINERS          |   1 +
 test/test/Makefile   |   2 +
 test/test/test_crc.c | 183 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 186 insertions(+)
 create mode 100644 test/test/test_crc.c

diff --git a/MAINTAINERS b/MAINTAINERS
index c03320d..270c2fe 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -484,6 +484,7 @@ Packet CRC
 M: Jasvinder Singh <jasvinder.singh@intel.com>
 F: lib/librte_net/rte_net_crc*
 F: lib/librte_net/rte_net_crc_sse.h
+F: test/test/test_crc.c
 
 IP fragmentation & reassembly
 M: Konstantin Ananyev <konstantin.ananyev@intel.com>
diff --git a/test/test/Makefile b/test/test/Makefile
index 79f0c61..06d8d5d 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -160,6 +160,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_cirbuf.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_string.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_lib.c
 
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += test_crc.c
+
 ifeq ($(CONFIG_RTE_LIBRTE_SCHED),y)
 SRCS-y += test_red.c
 SRCS-y += test_sched.c
diff --git a/test/test/test_crc.c b/test/test/test_crc.c
new file mode 100644
index 0000000..ea61ca9
--- /dev/null
+++ b/test/test/test_crc.c
@@ -0,0 +1,183 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "test.h"
+
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_net_crc.h>
+
+#define CRC_VEC_LEN        32
+#define CRC32_VEC_LEN1     1512
+#define CRC32_VEC_LEN2     348
+#define CRC16_VEC_LEN1     12
+#define CRC16_VEC_LEN2     2
+#define LINE_LEN           75
+
+/* CRC test vector */
+static const uint8_t crc_vec[CRC_VEC_LEN] = {
+	'0', '1', '2', '3', '4', '5', '6', '7',
+	'8', '9', 'a', 'b', 'c', 'd', 'e', 'f',
+	'g', 'h', 'i', 'j', 'A', 'B', 'C', 'D',
+	'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L',
+};
+
+/* 32-bit CRC test vector */
+static const uint8_t crc32_vec1[12] = {
+	0xBE, 0xD7, 0x23, 0x47, 0x6B, 0x8F,
+	0xB3, 0x14, 0x5E, 0xFB, 0x35, 0x59,
+};
+
+/* 16-bit CRC test vector 1 */
+static const uint8_t crc16_vec1[CRC16_VEC_LEN1] = {
+	0x0D, 0x01, 0x01, 0x23, 0x45, 0x67,
+	0x89, 0x01, 0x23, 0x45, 0x00, 0x01,
+};
+
+/* 16-bit CRC test vector 2 */
+static const uint8_t crc16_vec2[CRC16_VEC_LEN2] = {
+	0x03, 0x3f,
+};
+/** CRC results */
+static const uint32_t crc32_vec_res = 0xb491aab4;
+static const uint32_t crc32_vec1_res = 0xac54d294;
+static const uint32_t crc32_vec2_res = 0xefaae02f;
+static const uint32_t crc16_vec_res = 0x6bec;
+static const uint16_t crc16_vec1_res = 0x8cdd;
+static const uint16_t crc16_vec2_res = 0xec5b;
+
+static uint32_t
+crc_calc(const uint8_t *vec,
+	uint32_t vec_len,
+	enum rte_net_crc_type type)
+{
+	/* compute CRC */
+	uint32_t ret = rte_net_crc_calc(vec, vec_len, type);
+
+	/* dump data on console */
+	TEST_HEXDUMP(stdout, NULL, vec, vec_len);
+
+	return ret;
+}
+
+static int
+test_crc_calc(void)
+{
+	uint32_t i;
+	enum rte_net_crc_type type;
+	uint8_t *test_data;
+	uint32_t result;
+	int error;
+
+	/* 32-bit ethernet CRC: Test 1 */
+	type = RTE_NET_CRC32_ETH;
+
+	result = crc_calc(crc_vec, CRC_VEC_LEN, type);
+	if (result != crc32_vec_res)
+		return -1;
+
+	/* 32-bit ethernet CRC: Test 2 */
+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN1, 0);
+
+	for (i = 0; i < CRC32_VEC_LEN1; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	result = crc_calc(test_data, CRC32_VEC_LEN1, type);
+	if (result != crc32_vec1_res) {
+		error = -2;
+		goto fail;
+	}
+
+	/* 32-bit ethernet CRC: Test 3 */
+	for (i = 0; i < CRC32_VEC_LEN2; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	result = crc_calc(test_data, CRC32_VEC_LEN2, type);
+	if (result != crc32_vec2_res) {
+		error = -3;
+		goto fail;
+	}
+
+	/* 16-bit CCITT CRC:  Test 4 */
+	type = RTE_NET_CRC16_CCITT;
+	result = crc_calc(crc_vec, CRC_VEC_LEN, type);
+	if (result != crc16_vec_res) {
+		error = -4;
+		goto fail;
+	}
+	/* 16-bit CCITT CRC:  Test 5 */
+	result = crc_calc(crc16_vec1, CRC16_VEC_LEN1, type);
+	if (result != crc16_vec1_res) {
+		error = -5;
+		goto fail;
+	}
+	/* 16-bit CCITT CRC:  Test 6 */
+	result = crc_calc(crc16_vec2, CRC16_VEC_LEN2, type);
+	if (result != crc16_vec2_res) {
+		error = -6;
+		goto fail;
+	}
+
+	rte_free(test_data);
+	return 0;
+
+fail:
+	rte_free(test_data);
+	return error;
+}
+
+static int
+test_crc(void)
+{
+	int ret;
+	/* set CRC scalar mode */
+	rte_net_crc_set_alg(RTE_NET_CRC_SCALAR);
+
+	ret = test_crc_calc();
+	if (ret < 0) {
+		printf("test_crc (scalar): failed (%d)\n", ret);
+		return ret;
+	}
+	/* set CRC sse4.2 mode */
+	rte_net_crc_set_alg(RTE_NET_CRC_SSE42);
+
+	ret = test_crc_calc();
+	if (ret < 0) {
+		printf("test_crc (x86_64_SSE4.2): failed (%d)\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+REGISTER_TEST_COMMAND(crc_autotest, test_crc);
-- 
2.5.5

* Re: [dpdk-dev] [PATCH v10 1/2] librte_net: add crc compute APIs
  2017-04-05 14:58                                     ` [dpdk-dev] [PATCH v10 1/2] librte_net: add crc compute APIs Jasvinder Singh
@ 2017-04-05 17:49                                       ` Thomas Monjalon
  2017-04-05 19:22                                         ` Singh, Jasvinder
  2017-04-05 20:49                                       ` [dpdk-dev] [PATCH v11 0/2] librte_net: add crc computation support Jasvinder Singh
  1 sibling, 1 reply; 69+ messages in thread
From: Thomas Monjalon @ 2017-04-05 17:49 UTC (permalink / raw)
  To: Jasvinder Singh; +Cc: dev, olivier.matz, declan.doherty, pablo.de.lara.guarch

2017-04-05 15:58, Jasvinder Singh:
> APIs for selecting the architecture specific implementation and computing
> the crc (16-bit and 32-bit CRCs) are added. For CRCs calculation, scalar
> as well as x86 intrinsic(sse4.2) versions are implemented.
> 
> The scalar version is based on generic Look-Up Table(LUT) algorithm,
> while x86 intrinsic version uses carry-less multiplication for
> fast CRC computation.
> 
> Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
> Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

There is a remaining error with doxygen:
	lib/librte_net/rte_net_crc_sse.h:153:
	warning: documented symbol `static const uint8_t crc_xmm_shift_tab'
	was not declared or defined.

> --- a/lib/librte_net/Makefile
> +++ b/lib/librte_net/Makefile
> @@ -39,10 +39,12 @@ EXPORT_MAP := rte_net_version.map
>  LIBABIVER := 1
>  
>  SRCS-$(CONFIG_RTE_LIBRTE_NET) := rte_net.c
> +SRCS-$(CONFIG_RTE_LIBRTE_NET) += rte_net_crc.c
>  
>  # install includes
>  SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h
>  SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h
>  SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h
> +SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc.h

As rte_net_crc_sse.h is not exported, you should avoid doxygen generation.
I suggest removing the rte_ prefix of the filename, so it will make
clear that it is a private header and doxygen should ignore it
(because of FILE_PATTERNS = rte_*.h).

* Re: [dpdk-dev] [PATCH v10 1/2] librte_net: add crc compute APIs
  2017-04-05 17:49                                       ` Thomas Monjalon
@ 2017-04-05 19:22                                         ` Singh, Jasvinder
  0 siblings, 0 replies; 69+ messages in thread
From: Singh, Jasvinder @ 2017-04-05 19:22 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, olivier.matz, Doherty, Declan, De Lara Guarch, Pablo

Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Wednesday, April 5, 2017 6:50 PM
> To: Singh, Jasvinder <jasvinder.singh@intel.com>
> Cc: dev@dpdk.org; olivier.matz@6wind.com; Doherty, Declan
> <declan.doherty@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v10 1/2] librte_net: add crc compute APIs
> 
> 2017-04-05 15:58, Jasvinder Singh:
> > APIs for selecting the architecture specific implementation and
> > computing the crc (16-bit and 32-bit CRCs) are added. For CRCs
> > calculation, scalar as well as x86 intrinsic(sse4.2) versions are implemented.
> >
> > The scalar version is based on generic Look-Up Table(LUT) algorithm,
> > while x86 intrinsic version uses carry-less multiplication for fast
> > CRC computation.
> >
> > Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
> > Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
> 
> There is a remaining error with doxygen:
> 	lib/librte_net/rte_net_crc_sse.h:153:
> 	warning: documented symbol `static const uint8_t crc_xmm_shift_tab'
> 	was not declared or defined.
> 
> > --- a/lib/librte_net/Makefile
> > +++ b/lib/librte_net/Makefile
> > @@ -39,10 +39,12 @@ EXPORT_MAP := rte_net_version.map
> >  LIBABIVER := 1
> >
> >  SRCS-$(CONFIG_RTE_LIBRTE_NET) := rte_net.c
> > +SRCS-$(CONFIG_RTE_LIBRTE_NET) += rte_net_crc.c
> >
> >  # install includes
> >  SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h
> >  SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h
> >  SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h
> > +SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc.h
> 
> As rte_net_crc_sse.h is not exported, you should avoid doxygen generation.
> I suggest removing the rte_ prefix of the filename, so it will make clear that it
> is a private header and doxygen should ignore it (because of FILE_PATTERNS
> = rte_*.h).

Thanks, will fix above in next version.

* [dpdk-dev] [PATCH v11 0/2] librte_net: add crc computation support
  2017-04-05 14:58                                     ` [dpdk-dev] [PATCH v10 1/2] librte_net: add crc compute APIs Jasvinder Singh
  2017-04-05 17:49                                       ` Thomas Monjalon
@ 2017-04-05 20:49                                       ` Jasvinder Singh
  2017-04-05 20:49                                         ` [dpdk-dev] [PATCH v11 1/2] librte_net: add crc compute APIs Jasvinder Singh
                                                           ` (2 more replies)
  1 sibling, 3 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-04-05 20:49 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, declan.doherty, pablo.de.lara.guarch

In some applications, a CRC (Cyclic Redundancy Check) needs to be computed
or updated during packet processing operations. This patchset adds
software implementations of some common standard CRCs (the 32-bit Ethernet
CRC as per Ethernet/[ISO/IEC 8802-3] and the 16-bit CCITT CRC [ITU-T X.25]).
Two versions of each of the 32-bit and 16-bit CRC calculations are proposed.

The first version provides fast and efficient CRC generation on IA
processors by using the carry-less multiplication instruction PCLMULQDQ
together with SSE4.2 intrinsics. In this implementation, a parallelized
folding approach first reduces an arbitrary-length buffer to a small
fixed-size buffer (16 bytes) with the help of precomputed constants.
The resulting 16-byte chunk is then reduced with the Barrett reduction
method to produce the final CRC value. For more details on the
implementation, see reference [1].

The second version is a fallback that generates the CRC without requiring
any specific CPU support (for example, SSE4.2 intrinsics). It is based on
the generic Look-Up Table (LUT) algorithm, which uses a precomputed
256-element table as explained in reference [2].

During initialisation, all the data structures required for CRC
computation are initialised, and either the x86-specific CRC
implementation (if supported by the platform) or the scalar version
is enabled.

The following APIs have been added:

(i)  rte_net_crc_set_alg()
(ii) rte_net_crc_calc()

The first API (i) allows the user to select a specific CRC implementation
at run time, while the second API (ii) computes the 16-bit or 32-bit CRC.

References:
[1] Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
[2] A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS
http://www.ross.net/crc/download/crc_v3.txt

v11 changes:
- fixed doxygen warning

v10 changes:
- added check for PCLMULQDQ instruction support to fix
  FreeBSD 10.4 build error (clang 3.4.0)

v9 changes:
- included header files.
- added maintainership

v8 changes:
- improve unit test case.
 
v7 changes:
- remove the duplicate function in unit test.

v6 changes:
- fixed build error when compiling net library as a shared library
- addressed review comments on v5 (thanks Pablo)

v5 changes:
- rebase onto master

v4 changes:
- change crc compute api parameters to make it more generic
- change the unit test to accommodate the crc compute api change

v3 changes:
- separate the x86 specific implementation into new file
- improve the unit test

v2 changes:
- fix build errors for target i686-native-linuxapp-gcc
- fix checkpatch warnings

Jasvinder Singh (2):
  librte_net: add crc compute APIs
  test/test: add unit test for CRC computation

 MAINTAINERS                        |   6 +
 lib/Makefile                       |   2 +-
 lib/librte_net/Makefile            |   2 +
 lib/librte_net/net_crc_sse.h       | 363 +++++++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_crc.c       | 207 +++++++++++++++++++++
 lib/librte_net/rte_net_crc.h       |  96 ++++++++++
 lib/librte_net/rte_net_version.map |   8 +
 test/test/Makefile                 |   2 +
 test/test/test_crc.c               | 183 +++++++++++++++++++
 9 files changed, 868 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_net/net_crc_sse.h
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h
 create mode 100644 test/test/test_crc.c

-- 
2.5.5


* [dpdk-dev] [PATCH v11 1/2] librte_net: add crc compute APIs
  2017-04-05 20:49                                       ` [dpdk-dev] [PATCH v11 0/2] librte_net: add crc computation support Jasvinder Singh
@ 2017-04-05 20:49                                         ` Jasvinder Singh
  2017-04-05 20:49                                         ` [dpdk-dev] [PATCH v11 2/2] test/test: add unit test for CRC computation Jasvinder Singh
  2017-04-05 21:00                                         ` [dpdk-dev] [PATCH v11 0/2] librte_net: add crc computation support Thomas Monjalon
  2 siblings, 0 replies; 69+ messages in thread
From: Jasvinder Singh @ 2017-04-05 20:49 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, declan.doherty, pablo.de.lara.guarch

APIs for selecting the architecture-specific implementation and computing
the CRC (16-bit and 32-bit CRCs) are added. For CRC calculation, scalar
as well as x86 intrinsic (SSE4.2) versions are implemented.

The scalar version is based on the generic Look-Up Table (LUT) algorithm,
while the x86 intrinsic version uses carry-less multiplication for
fast CRC computation.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
---
 MAINTAINERS                        |   5 +
 lib/Makefile                       |   2 +-
 lib/librte_net/Makefile            |   2 +
 lib/librte_net/net_crc_sse.h       | 363 +++++++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_crc.c       | 207 +++++++++++++++++++++
 lib/librte_net/rte_net_crc.h       |  96 ++++++++++
 lib/librte_net/rte_net_version.map |   8 +
 7 files changed, 682 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_net/net_crc_sse.h
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 0b1524d..a76d0c3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -480,6 +480,11 @@ Network headers
 M: Olivier Matz <olivier.matz@6wind.com>
 F: lib/librte_net/
 
+Packet CRC
+M: Jasvinder Singh <jasvinder.singh@intel.com>
+F: lib/librte_net/rte_net_crc*
+F: lib/librte_net/net_crc_sse.h
+
 IP fragmentation & reassembly
 M: Konstantin Ananyev <konstantin.ananyev@intel.com>
 F: lib/librte_ip_frag/
diff --git a/lib/Makefile b/lib/Makefile
index 5ad3c7c..456eb38 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -62,7 +62,7 @@ DEPDIRS-librte_lpm := librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_ACL) += librte_acl
 DEPDIRS-librte_acl := librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_NET) += librte_net
-DEPDIRS-librte_net := librte_mbuf
+DEPDIRS-librte_net := librte_mbuf librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += librte_ip_frag
 DEPDIRS-librte_ip_frag := librte_eal librte_mempool librte_mbuf librte_ether
 DEPDIRS-librte_ip_frag += librte_hash
diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
index abd5c46..56727c4 100644
--- a/lib/librte_net/Makefile
+++ b/lib/librte_net/Makefile
@@ -39,10 +39,12 @@ EXPORT_MAP := rte_net_version.map
 LIBABIVER := 1
 
 SRCS-$(CONFIG_RTE_LIBRTE_NET) := rte_net.c
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += rte_net_crc.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_net/net_crc_sse.h b/lib/librte_net/net_crc_sse.h
new file mode 100644
index 0000000..8bce522
--- /dev/null
+++ b/lib/librte_net/net_crc_sse.h
@@ -0,0 +1,363 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_NET_CRC_SSE_H_
+#define _RTE_NET_CRC_SSE_H_
+
+#include <rte_branch_prediction.h>
+
+#include <x86intrin.h>
+#include <cpuid.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** PCLMULQDQ CRC computation context structure */
+struct crc_pclmulqdq_ctx {
+	__m128i rk1_rk2;
+	__m128i rk5_rk6;
+	__m128i rk7_rk8;
+};
+
+struct crc_pclmulqdq_ctx crc32_eth_pclmulqdq __rte_aligned(16);
+struct crc_pclmulqdq_ctx crc16_ccitt_pclmulqdq __rte_aligned(16);
+/**
+ * @brief Performs one folding round
+ *
+ * Logically function operates as follows:
+ *     DATA = READ_NEXT_16BYTES();
+ *     F1 = LSB8(FOLD)
+ *     F2 = MSB8(FOLD)
+ *     T1 = CLMUL(F1, RK1)
+ *     T2 = CLMUL(F2, RK2)
+ *     FOLD = XOR(T1, T2, DATA)
+ *
+ * @param data_block
+ *   16 byte data block
+ * @param precomp
+ *   Precomputed rk1 and rk2 constants
+ * @param fold
+ *   Current 16-byte folded data
+ *
+ * @return
+ *   New 16 byte folded data
+ */
+static inline __attribute__((always_inline)) __m128i
+crcr32_folding_round(__m128i data_block,
+		__m128i precomp,
+		__m128i fold)
+{
+	__m128i tmp0 = _mm_clmulepi64_si128(fold, precomp, 0x01);
+	__m128i tmp1 = _mm_clmulepi64_si128(fold, precomp, 0x10);
+
+	return _mm_xor_si128(tmp1, _mm_xor_si128(data_block, tmp0));
+}
+
+/**
+ * Performs reduction from 128 bits to 64 bits
+ *
+ * @param data128
+ *   128 bits data to be reduced
+ * @param precomp
+ *   precomputed constants rk5, rk6
+ *
+ * @return
+ *  64 bits reduced data
+ */
+
+static inline __attribute__((always_inline)) __m128i
+crcr32_reduce_128_to_64(__m128i data128, __m128i precomp)
+{
+	__m128i tmp0, tmp1, tmp2;
+
+	/* 64b fold */
+	tmp0 = _mm_clmulepi64_si128(data128, precomp, 0x00);
+	tmp1 = _mm_srli_si128(data128, 8);
+	tmp0 = _mm_xor_si128(tmp0, tmp1);
+
+	/* 32b fold */
+	tmp2 = _mm_slli_si128(tmp0, 4);
+	tmp1 = _mm_clmulepi64_si128(tmp2, precomp, 0x10);
+
+	return _mm_xor_si128(tmp1, tmp0);
+}
+
+/**
+ * Performs Barrett reduction from 64 bits to 32 bits
+ *
+ * @param data64
+ *   64 bits data to be reduced
+ * @param precomp
+ *   rk7 and rk8 precomputed constants
+ *
+ * @return
+ *   reduced 32 bits data
+ */
+
+static inline __attribute__((always_inline)) uint32_t
+crcr32_reduce_64_to_32(__m128i data64, __m128i precomp)
+{
+	static const uint32_t mask1[4] __rte_aligned(16) = {
+		0xffffffff, 0xffffffff, 0x00000000, 0x00000000
+	};
+
+	static const uint32_t mask2[4] __rte_aligned(16) = {
+		0x00000000, 0xffffffff, 0xffffffff, 0xffffffff
+	};
+	__m128i tmp0, tmp1, tmp2;
+
+	tmp0 = _mm_and_si128(data64, _mm_load_si128((const __m128i *)mask2));
+
+	tmp1 = _mm_clmulepi64_si128(tmp0, precomp, 0x00);
+	tmp1 = _mm_xor_si128(tmp1, tmp0);
+	tmp1 = _mm_and_si128(tmp1, _mm_load_si128((const __m128i *)mask1));
+
+	tmp2 = _mm_clmulepi64_si128(tmp1, precomp, 0x10);
+	tmp2 = _mm_xor_si128(tmp2, tmp1);
+	tmp2 = _mm_xor_si128(tmp2, tmp0);
+
+	return _mm_extract_epi32(tmp2, 2);
+}
+
+static const uint8_t crc_xmm_shift_tab[48] __rte_aligned(16) = {
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+};
+
+/**
+ * Shifts left 128 bit register by specified number of bytes
+ *
+ * @param reg
+ *   128 bit value
+ * @param num
+ *   number of bytes to shift left reg by (0-16)
+ *
+ * @return
+ *   reg << (num * 8)
+ */
+
+static inline __attribute__((always_inline)) __m128i
+xmm_shift_left(__m128i reg, const unsigned int num)
+{
+	const __m128i *p = (const __m128i *)(crc_xmm_shift_tab + 16 - num);
+
+	return _mm_shuffle_epi8(reg, _mm_loadu_si128(p));
+}
+
+static inline __attribute__((always_inline)) uint32_t
+crc32_eth_calc_pclmulqdq(
+	const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const struct crc_pclmulqdq_ctx *params)
+{
+	__m128i temp, fold, k;
+	uint32_t n;
+
+	/* Get CRC init value */
+	temp = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);
+
+	/**
+	 * Folding all data into single 16 byte data block
+	 * Assumes: fold holds first 16 bytes of data
+	 */
+
+	if (unlikely(data_len < 32)) {
+		if (unlikely(data_len == 16)) {
+			/* 16 bytes */
+			fold = _mm_loadu_si128((const __m128i *)data);
+			fold = _mm_xor_si128(fold, temp);
+			goto reduction_128_64;
+		}
+
+		if (unlikely(data_len < 16)) {
+			/* 0 to 15 bytes */
+			uint8_t buffer[16] __rte_aligned(16);
+
+			memset(buffer, 0, sizeof(buffer));
+			memcpy(buffer, data, data_len);
+
+			fold = _mm_load_si128((const __m128i *)buffer);
+			fold = _mm_xor_si128(fold, temp);
+			if (unlikely(data_len < 4)) {
+				fold = xmm_shift_left(fold, 8 - data_len);
+				goto barret_reduction;
+			}
+			fold = xmm_shift_left(fold, 16 - data_len);
+			goto reduction_128_64;
+		}
+		/* 17 to 31 bytes */
+		fold = _mm_loadu_si128((const __m128i *)data);
+		fold = _mm_xor_si128(fold, temp);
+		n = 16;
+		k = params->rk1_rk2;
+		goto partial_bytes;
+	}
+
+	/** At least 32 bytes in the buffer */
+	/** Apply CRC initial value */
+	fold = _mm_loadu_si128((const __m128i *)data);
+	fold = _mm_xor_si128(fold, temp);
+
+	/** Main folding loop - the last 16 bytes is processed separately */
+	k = params->rk1_rk2;
+	for (n = 16; (n + 16) <= data_len; n += 16) {
+		temp = _mm_loadu_si128((const __m128i *)&data[n]);
+		fold = crcr32_folding_round(temp, k, fold);
+	}
+
+partial_bytes:
+	if (likely(n < data_len)) {
+
+		const uint32_t mask3[4] __rte_aligned(16) = {
+			0x80808080, 0x80808080, 0x80808080, 0x80808080
+		};
+
+		const uint8_t shf_table[32] __rte_aligned(16) = {
+			0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+			0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+			0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+		};
+
+		__m128i last16, a, b;
+
+		last16 = _mm_loadu_si128((const __m128i *)&data[data_len - 16]);
+
+		temp = _mm_loadu_si128((const __m128i *)
+			&shf_table[data_len & 15]);
+		a = _mm_shuffle_epi8(fold, temp);
+
+		temp = _mm_xor_si128(temp,
+			_mm_load_si128((const __m128i *)mask3));
+		b = _mm_shuffle_epi8(fold, temp);
+		b = _mm_blendv_epi8(b, last16, temp);
+
+		/* k = rk1 & rk2 */
+		temp = _mm_clmulepi64_si128(a, k, 0x01);
+		fold = _mm_clmulepi64_si128(a, k, 0x10);
+
+		fold = _mm_xor_si128(fold, temp);
+		fold = _mm_xor_si128(fold, b);
+	}
+
+	/** Reduction 128 -> 32 Assumes: fold holds 128bit folded data */
+reduction_128_64:
+	k = params->rk5_rk6;
+	fold = crcr32_reduce_128_to_64(fold, k);
+
+barret_reduction:
+	k = params->rk7_rk8;
+	n = crcr32_reduce_64_to_32(fold, k);
+
+	return n;
+}
+
+
+static inline void
+rte_net_crc_sse42_init(void)
+{
+	uint64_t k1, k2, k5, k6;
+	uint64_t p = 0, q = 0;
+
+	/** Initialize CRC16 data */
+	k1 = 0x189aeLLU;
+	k2 = 0x8e10LLU;
+	k5 = 0x189aeLLU;
+	k6 = 0x114aaLLU;
+	q =  0x11c581910LLU;
+	p =  0x10811LLU;
+
+	/** Save the params in context structure */
+	crc16_ccitt_pclmulqdq.rk1_rk2 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k1), _mm_cvtsi64_m64(k2));
+	crc16_ccitt_pclmulqdq.rk5_rk6 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k5), _mm_cvtsi64_m64(k6));
+	crc16_ccitt_pclmulqdq.rk7_rk8 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(q), _mm_cvtsi64_m64(p));
+
+	/** Initialize CRC32 data */
+	k1 = 0xccaa009eLLU;
+	k2 = 0x1751997d0LLU;
+	k5 = 0xccaa009eLLU;
+	k6 = 0x163cd6124LLU;
+	q =  0x1f7011640LLU;
+	p =  0x1db710641LLU;
+
+	/** Save the params in context structure */
+	crc32_eth_pclmulqdq.rk1_rk2 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k1), _mm_cvtsi64_m64(k2));
+	crc32_eth_pclmulqdq.rk5_rk6 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(k5), _mm_cvtsi64_m64(k6));
+	crc32_eth_pclmulqdq.rk7_rk8 =
+		_mm_setr_epi64(_mm_cvtsi64_m64(q), _mm_cvtsi64_m64(p));
+
+	/**
+	 * Reset the register as following calculation may
+	 * use other data types such as float, double, etc.
+	 */
+	_mm_empty();
+
+}
+
+static inline uint32_t
+rte_crc16_ccitt_sse42_handler(const uint8_t *data,
+	uint32_t data_len)
+{
+	/** return 16-bit CRC value */
+	return (uint16_t)~crc32_eth_calc_pclmulqdq(data,
+		data_len,
+		0xffff,
+		&crc16_ccitt_pclmulqdq);
+}
+
+static inline uint32_t
+rte_crc32_eth_sse42_handler(const uint8_t *data,
+	uint32_t data_len)
+{
+	return ~crc32_eth_calc_pclmulqdq(data,
+		data_len,
+		0xffffffffUL,
+		&crc32_eth_pclmulqdq);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_NET_CRC_SSE_H_ */
diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
new file mode 100644
index 0000000..72efe20
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.c
@@ -0,0 +1,207 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+#include <string.h>
+#include <stdint.h>
+
+#include <rte_cpuflags.h>
+#include <rte_common.h>
+#include <rte_net_crc.h>
+
+#if defined(RTE_ARCH_X86_64)				\
+	&& defined(RTE_MACHINE_CPUFLAG_SSE4_2)		\
+	&& defined(RTE_MACHINE_CPUFLAG_PCLMULQDQ)
+#define X86_64_SSE42_PCLMULQDQ     1
+#endif
+
+#ifdef X86_64_SSE42_PCLMULQDQ
+#include <net_crc_sse.h>
+#endif
+
+/* crc tables */
+static uint32_t crc32_eth_lut[CRC_LUT_SIZE];
+static uint32_t crc16_ccitt_lut[CRC_LUT_SIZE];
+
+static uint32_t
+rte_crc16_ccitt_handler(const uint8_t *data, uint32_t data_len);
+
+static uint32_t
+rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len);
+
+typedef uint32_t
+(*rte_net_crc_handler)(const uint8_t *data, uint32_t data_len);
+
+static rte_net_crc_handler *handlers;
+
+static rte_net_crc_handler handlers_scalar[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
+};
+
+#ifdef X86_64_SSE42_PCLMULQDQ
+static rte_net_crc_handler handlers_sse42[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler,
+};
+#endif
+
+/**
+ * Reflect the bits about the middle
+ *
+ * @param val
+ *   value to be reflected
+ *
+ * @return
+ *   reflected value
+ */
+static uint32_t
+reflect_32bits(uint32_t val)
+{
+	uint32_t i, res = 0;
+
+	for (i = 0; i < 32; i++)
+		if ((val & (1 << i)) != 0)
+			res |= (uint32_t)(1 << (31 - i));
+
+	return res;
+}
+
+static void
+crc32_eth_init_lut(uint32_t poly,
+	uint32_t *lut)
+{
+	uint32_t i, j;
+
+	for (i = 0; i < CRC_LUT_SIZE; i++) {
+		uint32_t crc = reflect_32bits(i);
+
+		for (j = 0; j < 8; j++) {
+			if (crc & 0x80000000L)
+				crc = (crc << 1) ^ poly;
+			else
+				crc <<= 1;
+		}
+		lut[i] = reflect_32bits(crc);
+	}
+}
+
+static inline __attribute__((always_inline)) uint32_t
+crc32_eth_calc_lut(const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const uint32_t *lut)
+{
+	while (data_len--)
+		crc = lut[(crc ^ *data++) & 0xffL] ^ (crc >> 8);
+
+	return crc;
+}
+
+static void
+rte_net_crc_scalar_init(void)
+{
+	/* 32-bit crc init */
+	crc32_eth_init_lut(CRC32_ETH_POLYNOMIAL, crc32_eth_lut);
+
+	/* 16-bit CRC init */
+	crc32_eth_init_lut(CRC16_CCITT_POLYNOMIAL << 16, crc16_ccitt_lut);
+}
+
+static inline uint32_t
+rte_crc16_ccitt_handler(const uint8_t *data, uint32_t data_len)
+{
+	/* return 16-bit CRC value */
+	return (uint16_t)~crc32_eth_calc_lut(data,
+		data_len,
+		0xffff,
+		crc16_ccitt_lut);
+}
+
+static inline uint32_t
+rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
+{
+	/* return 32-bit CRC value */
+	return ~crc32_eth_calc_lut(data,
+		data_len,
+		0xffffffffUL,
+		crc32_eth_lut);
+}
+
+void
+rte_net_crc_set_alg(enum rte_net_crc_alg alg)
+{
+	switch (alg) {
+	case RTE_NET_CRC_SSE42:
+#ifdef X86_64_SSE42_PCLMULQDQ
+		handlers = handlers_sse42;
+		break;
+#else
+		/* no SSE4.2/PCLMULQDQ support: fall back to scalar */
+#endif
+	case RTE_NET_CRC_SCALAR:
+	default:
+		handlers = handlers_scalar;
+		break;
+	}
+}
+
+uint32_t
+rte_net_crc_calc(const void *data,
+	uint32_t data_len,
+	enum rte_net_crc_type type)
+{
+	uint32_t ret;
+	rte_net_crc_handler f_handle;
+
+	f_handle = handlers[type];
+	ret = f_handle((const uint8_t *) data, data_len);
+
+	return ret;
+}
+
+/* Select highest available crc algorithm as default one */
+static inline void __attribute__((constructor))
+rte_net_crc_init(void)
+{
+	enum rte_net_crc_alg alg = RTE_NET_CRC_SCALAR;
+
+	rte_net_crc_scalar_init();
+
+#ifdef X86_64_SSE42_PCLMULQDQ
+	alg = RTE_NET_CRC_SSE42;
+	rte_net_crc_sse42_init();
+#endif
+
+	rte_net_crc_set_alg(alg);
+}
diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
new file mode 100644
index 0000000..76fd129
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.h
@@ -0,0 +1,96 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_NET_CRC_H_
+#define _RTE_NET_CRC_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** CRC polynomials */
+#define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
+#define CRC16_CCITT_POLYNOMIAL 0x1021U
+
+#define CRC_LUT_SIZE 256
+
+/** CRC types */
+enum rte_net_crc_type {
+	RTE_NET_CRC16_CCITT = 0,
+	RTE_NET_CRC32_ETH,
+	RTE_NET_CRC_REQS
+};
+
+/** CRC compute algorithm */
+enum rte_net_crc_alg {
+	RTE_NET_CRC_SCALAR = 0,
+	RTE_NET_CRC_SSE42,
+};
+
+/**
+ * This API sets the CRC computation algorithm (i.e. scalar version,
+ * x86 64-bit SSE4.2 intrinsic version, etc.) and the internal data
+ * structures.
+ *
+ * @param alg
+ *   This parameter is used to select the CRC implementation version.
+ *   - RTE_NET_CRC_SCALAR
+ *   - RTE_NET_CRC_SSE42 (Use 64-bit SSE4.2 intrinsic)
+ */
+void
+rte_net_crc_set_alg(enum rte_net_crc_alg alg);
+
+/**
+ * CRC compute API
+ *
+ * @param data
+ *   Pointer to the packet data for CRC computation
+ * @param data_len
+ *   Data length for CRC computation
+ * @param type
+ *   CRC type (enum rte_net_crc_type)
+ *
+ * @return
+ *   CRC value
+ */
+uint32_t
+rte_net_crc_calc(const void *data,
+	uint32_t data_len,
+	enum rte_net_crc_type type);
+
+#ifdef __cplusplus
+}
+#endif
+
+
+#endif /* _RTE_NET_CRC_H_ */
diff --git a/lib/librte_net/rte_net_version.map b/lib/librte_net/rte_net_version.map
index 3b15e65..687c40e 100644
--- a/lib/librte_net/rte_net_version.map
+++ b/lib/librte_net/rte_net_version.map
@@ -4,3 +4,11 @@ DPDK_16.11 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_net_crc_calc;
+	rte_net_crc_set_alg;
+
+} DPDK_16.11;
-- 
2.5.5


* [dpdk-dev] [PATCH v11 2/2] test/test: add unit test for CRC computation
  2017-04-05 20:49                                       ` [dpdk-dev] [PATCH v11 0/2] librte_net: add crc computation support Jasvinder Singh
  2017-04-05 20:49                                         ` [dpdk-dev] [PATCH v11 1/2] librte_net: add crc compute APIs Jasvinder Singh
@ 2017-04-05 20:49                                         ` Jasvinder Singh
  2017-04-05 20:59                                           ` Thomas Monjalon
  2017-04-05 21:00                                         ` [dpdk-dev] [PATCH v11 0/2] librte_net: add crc computation support Thomas Monjalon
  2 siblings, 1 reply; 69+ messages in thread
From: Jasvinder Singh @ 2017-04-05 20:49 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, declan.doherty, pablo.de.lara.guarch

This patch provides a set of tests for verifying the functional
correctness of the 16-bit and 32-bit CRC APIs.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
---
 MAINTAINERS          |   1 +
 test/test/Makefile   |   2 +
 test/test/test_crc.c | 183 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 186 insertions(+)
 create mode 100644 test/test/test_crc.c

diff --git a/MAINTAINERS b/MAINTAINERS
index a76d0c3..81fe121 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -484,6 +484,7 @@ Packet CRC
 M: Jasvinder Singh <jasvinder.singh@intel.com>
 F: lib/librte_net/rte_net_crc*
 F: lib/librte_net/net_crc_sse.h
+F: test/test/test_crc.c
 
 IP fragmentation & reassembly
 M: Konstantin Ananyev <konstantin.ananyev@intel.com>
diff --git a/test/test/Makefile b/test/test/Makefile
index 79f0c61..06d8d5d 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -160,6 +160,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_cirbuf.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_string.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_lib.c
 
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += test_crc.c
+
 ifeq ($(CONFIG_RTE_LIBRTE_SCHED),y)
 SRCS-y += test_red.c
 SRCS-y += test_sched.c
diff --git a/test/test/test_crc.c b/test/test/test_crc.c
new file mode 100644
index 0000000..ea61ca9
--- /dev/null
+++ b/test/test/test_crc.c
@@ -0,0 +1,183 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "test.h"
+
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_net_crc.h>
+
+#define CRC_VEC_LEN        32
+#define CRC32_VEC_LEN1     1512
+#define CRC32_VEC_LEN2     348
+#define CRC16_VEC_LEN1     12
+#define CRC16_VEC_LEN2     2
+#define LINE_LEN           75
+
+/* CRC test vector */
+static const uint8_t crc_vec[CRC_VEC_LEN] = {
+	'0', '1', '2', '3', '4', '5', '6', '7',
+	'8', '9', 'a', 'b', 'c', 'd', 'e', 'f',
+	'g', 'h', 'i', 'j', 'A', 'B', 'C', 'D',
+	'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L',
+};
+
+/* 32-bit CRC test vector */
+static const uint8_t crc32_vec1[12] = {
+	0xBE, 0xD7, 0x23, 0x47, 0x6B, 0x8F,
+	0xB3, 0x14, 0x5E, 0xFB, 0x35, 0x59,
+};
+
+/* 16-bit CRC test vector 1 */
+static const uint8_t crc16_vec1[CRC16_VEC_LEN1] = {
+	0x0D, 0x01, 0x01, 0x23, 0x45, 0x67,
+	0x89, 0x01, 0x23, 0x45, 0x00, 0x01,
+};
+
+/* 16-bit CRC test vector 2 */
+static const uint8_t crc16_vec2[CRC16_VEC_LEN2] = {
+	0x03, 0x3f,
+};
+/** CRC results */
+static const uint32_t crc32_vec_res = 0xb491aab4;
+static const uint32_t crc32_vec1_res = 0xac54d294;
+static const uint32_t crc32_vec2_res = 0xefaae02f;
+static const uint32_t crc16_vec_res = 0x6bec;
+static const uint16_t crc16_vec1_res = 0x8cdd;
+static const uint16_t crc16_vec2_res = 0xec5b;
+
+static uint32_t
+crc_calc(const uint8_t *vec,
+	uint32_t vec_len,
+	enum rte_net_crc_type type)
+{
+	/* compute CRC */
+	uint32_t ret = rte_net_crc_calc(vec, vec_len, type);
+
+	/* dump data on console */
+	TEST_HEXDUMP(stdout, NULL, vec, vec_len);
+
+	return  ret;
+}
+
+static int
+test_crc_calc(void)
+{
+	uint32_t i;
+	enum rte_net_crc_type type;
+	uint8_t *test_data;
+	uint32_t result;
+	int error;
+
+	/* 32-bit ethernet CRC: Test 1 */
+	type = RTE_NET_CRC32_ETH;
+
+	result = crc_calc(crc_vec, CRC_VEC_LEN, type);
+	if (result != crc32_vec_res)
+		return -1;
+
+	/* 32-bit ethernet CRC: Test 2 */
+	test_data = rte_zmalloc(NULL, CRC32_VEC_LEN1, 0);
+	if (test_data == NULL)
+		return -7;
+
+	for (i = 0; i < CRC32_VEC_LEN1; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	result = crc_calc(test_data, CRC32_VEC_LEN1, type);
+	if (result != crc32_vec1_res) {
+		error = -2;
+		goto fail;
+	}
+
+	/* 32-bit ethernet CRC: Test 3 */
+	for (i = 0; i < CRC32_VEC_LEN2; i += 12)
+		rte_memcpy(&test_data[i], crc32_vec1, 12);
+
+	result = crc_calc(test_data, CRC32_VEC_LEN2, type);
+	if (result != crc32_vec2_res) {
+		error = -3;
+		goto fail;
+	}
+
+	/* 16-bit CCITT CRC:  Test 4 */
+	type = RTE_NET_CRC16_CCITT;
+	result = crc_calc(crc_vec, CRC_VEC_LEN, type);
+	if (result != crc16_vec_res) {
+		error = -4;
+		goto fail;
+	}
+	/* 16-bit CCITT CRC:  Test 5 */
+	result = crc_calc(crc16_vec1, CRC16_VEC_LEN1, type);
+	if (result != crc16_vec1_res) {
+		error = -5;
+		goto fail;
+	}
+	/* 16-bit CCITT CRC:  Test 6 */
+	result = crc_calc(crc16_vec2, CRC16_VEC_LEN2, type);
+	if (result != crc16_vec2_res) {
+		error = -6;
+		goto fail;
+	}
+
+	rte_free(test_data);
+	return 0;
+
+fail:
+	rte_free(test_data);
+	return error;
+}
+
+static int
+test_crc(void)
+{
+	int ret;
+	/* set CRC scalar mode */
+	rte_net_crc_set_alg(RTE_NET_CRC_SCALAR);
+
+	ret = test_crc_calc();
+	if (ret < 0) {
+		printf("test_crc (scalar): failed (%d)\n", ret);
+		return ret;
+	}
+	/* set CRC sse4.2 mode */
+	rte_net_crc_set_alg(RTE_NET_CRC_SSE42);
+
+	ret = test_crc_calc();
+	if (ret < 0) {
+		printf("test_crc (x86_64_SSE4.2): failed (%d)\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+REGISTER_TEST_COMMAND(crc_autotest, test_crc);
-- 
2.5.5
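The known-answer tests above compare `rte_net_crc_calc()` against hard-coded constants. As an illustration only, an independent bit-at-a-time model like the following sketch can be used to cross-check such constants outside DPDK. The function and macro names are hypothetical, not DPDK API. The CRC-16 parameters (reflected generator 0x1021, initial value and final XOR of 0xFFFF, i.e. the X.25 variant) are an assumption, chosen because they reproduce Test 6: the vector {0x03, 0x3f} yields 0xec5b, the value of `crc16_vec2_res`.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected (LSB-first) form of the Ethernet generator 0x04C11DB7. */
#define CRC32_ETH_POLY_REFLECTED   0xEDB88320u
/* Reflected form of the CCITT generator 0x1021 (assumed X.25 usage). */
#define CRC16_CCITT_POLY_REFLECTED 0x8408u

/* Bit-at-a-time CRC-32 (Ethernet): init 0xFFFFFFFF, reflected, final XOR 0xFFFFFFFF. */
static uint32_t crc32_eth_bitwise(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *data++;
		/* shift out one bit per iteration, XOR in the generator on a 1 */
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 1) ? (crc >> 1) ^ CRC32_ETH_POLY_REFLECTED : crc >> 1;
	}
	return ~crc;
}

/* Bit-at-a-time CRC-16 (hypothetical X.25 parameters): init 0xFFFF, reflected, final XOR 0xFFFF. */
static uint16_t crc16_ccitt_bitwise(const uint8_t *data, size_t len)
{
	uint16_t crc = 0xFFFFu;

	while (len--) {
		crc ^= *data++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 1) ? (crc >> 1) ^ CRC16_CCITT_POLY_REFLECTED : crc >> 1;
	}
	return (uint16_t)~crc;
}
```

For example, `crc32_eth_bitwise((const uint8_t *)"123456789", 9)` returns 0xCBF43926, the standard check value for the Ethernet CRC-32, and the CRC-16 model returns 0x906E for the same input.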

* Re: [dpdk-dev] [PATCH v11 2/2] test/test: add unit test for CRC computation
  2017-04-05 20:49                                         ` [dpdk-dev] [PATCH v11 2/2] test/test: add unit test for CRC computation Jasvinder Singh
@ 2017-04-05 20:59                                           ` Thomas Monjalon
  0 siblings, 0 replies; 69+ messages in thread
From: Thomas Monjalon @ 2017-04-05 20:59 UTC (permalink / raw)
  To: Jasvinder Singh; +Cc: dev, olivier.matz, declan.doherty, pablo.de.lara.guarch

2017-04-05 21:49, Jasvinder Singh:
> This patch provides a set of tests for verifying the functional
> correctness of 16-bit and 32-bit CRC APIs.
> 
> Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
> Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

The first patch is now fine, but this one has an issue:

test_crc.c:88:2: fatal error:
implicit declaration of function 'rte_hexdump' is invalid in C99
        TEST_HEXDUMP(stdout, NULL, vec, vec_len);

It is fixed by adding
#include <rte_hexdump.h>

* Re: [dpdk-dev] [PATCH v11 0/2] librte_net: add crc computation support
  2017-04-05 20:49                                       ` [dpdk-dev] [PATCH v11 0/2] librte_net: add crc computation support Jasvinder Singh
  2017-04-05 20:49                                         ` [dpdk-dev] [PATCH v11 1/2] librte_net: add crc compute APIs Jasvinder Singh
  2017-04-05 20:49                                         ` [dpdk-dev] [PATCH v11 2/2] test/test: add unit test for CRC computation Jasvinder Singh
@ 2017-04-05 21:00                                         ` Thomas Monjalon
  2 siblings, 0 replies; 69+ messages in thread
From: Thomas Monjalon @ 2017-04-05 21:00 UTC (permalink / raw)
  To: Jasvinder Singh; +Cc: dev, olivier.matz, declan.doherty, pablo.de.lara.guarch

2017-04-05 21:49, Jasvinder Singh:
> In some applications, CRC (Cyclic Redundancy Check) needs to be computed
> or updated during packet processing operations. This patchset adds
> software implementations of some common standard CRCs (32-bit Ethernet
> CRC as per Ethernet/[ISO/IEC 8802-3] and 16-bit CCITT-CRC [ITU-T X.25]).
> Two versions of each of the 32-bit and 16-bit CRC calculations are proposed.
> 
> The first version provides fast and efficient CRC generation on IA
> processors by using the carry-less multiplication instruction PCLMULQDQ
> (via compiler intrinsics; PCLMULQDQ is a CPU feature distinct from
> SSE4.2). In this implementation, a parallelized folding approach is
> first used to reduce an arbitrary-length buffer to a small fixed-size
> buffer (16 bytes) with the help of precomputed constants. The resulting
> single 16-byte chunk is then reduced by the Barrett reduction method to
> generate the final CRC value. For more details on the implementation,
> see reference [1].
> 
> The second version is a fallback that generates the CRC without
> requiring any specific CPU support (for example, SSE4.2 intrinsics).
> It is based on the generic Look-Up Table (LUT) algorithm, which uses a
> precomputed 256-entry table as explained in reference [2].
> 
> During initialisation, all the data structures required for CRC
> computation are initialised, and either the x86-specific CRC
> implementation (if supported by the platform) or the scalar version
> is enabled.

Applied (with one last fix), thanks
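The folding and Barrett steps in the cover letter can be modelled in portable C. The sketch below is a scaled-down illustration, not the patch's PCLMULQDQ code: it folds one 32-bit word per step instead of 16-byte blocks in parallel, emulates the carry-less multiply with a loop, uses non-reflected bit order, and computes the bare remainder M(x) mod P(x), leaving out the initial value, bit reflection and final XOR that the Ethernet CRC adds. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Ethernet CRC-32 generator P(x), including the x^32 term. */
#define CRC32_POLY 0x104C11DB7ULL

/* Carry-less (GF(2)) multiply; callers keep deg(a) + deg(b) <= 63. */
static uint64_t clmul64(uint64_t a, uint64_t b)
{
	uint64_t r = 0;

	for (int i = 0; i < 64; i++)
		if ((a >> i) & 1)
			r ^= b << i;
	return r;
}

/* Reference: R(x) mod P(x) by bitwise long division. */
static uint32_t poly_mod64_ref(uint64_t r, uint64_t p)
{
	uint64_t rem = 0;

	for (int i = 63; i >= 0; i--) {
		rem = (rem << 1) | ((r >> i) & 1);
		if (rem & (1ULL << 32))
			rem ^= p;
	}
	return (uint32_t)rem;
}

/* Barrett constant mu = floor(x^64 / P(x)), by long division of x^64. */
static uint64_t barrett_mu(uint64_t p)
{
	uint64_t rem = 0, q = 0;

	for (int i = 64; i >= 0; i--) {
		rem = (rem << 1) | (i == 64);	/* only the x^64 coefficient is 1 */
		q <<= 1;
		if (rem & (1ULL << 32)) {
			rem ^= p;
			q |= 1;
		}
	}
	return q;
}

/* Barrett reduction of a 64-bit polynomial R: two multiplies, no division. */
static uint32_t barrett_mod(uint64_t r, uint64_t p, uint64_t mu)
{
	uint64_t q = clmul64(r >> 32, mu) >> 32;	/* q = floor(R / P) */

	return (uint32_t)(r ^ clmul64(q, p));		/* R - q*P has degree < 32 */
}

/* Fold 32-bit words (first word = highest degree) into 64 bits, then reduce. */
static uint32_t fold_mod(const uint32_t *w, size_t n, uint64_t p)
{
	uint64_t k = poly_mod64_ref((p & 0xFFFFFFFFULL) << 32, p); /* x^64 mod P */
	uint64_t mu = barrett_mu(p);
	uint64_t acc = 0;

	for (size_t i = 0; i < n; i++)
		/* acc*x^32 + w, with the top half's x^64 factor replaced by k */
		acc = clmul64(acc >> 32, k) ^ (acc << 32) ^ w[i];
	return barrett_mod(acc, p, mu);
}
```

Each step replaces the x^64 part of the accumulator with k = x^64 mod P(x), so the accumulator never exceeds 64 bits. Barrett reduction then needs two carry-less multiplies (by mu = floor(x^64 / P(x)), which is 0x104D101DF for this generator, and by P(x)) and no division; a real implementation would precompute k and mu rather than derive them on every call.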

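The LUT fallback and the one-time initialisation described in the cover letter can be sketched as below. This assumes reflected (LSB-first) processing with an all-ones initial value and final XOR, which are the usual Ethernet CRC-32 parameters; the CRC-16 parameters (reflected 0x1021, X.25 style) and all names (`crc_tables_init`, `crc32_eth_lut_calc`, ...) are assumptions for illustration, not the DPDK API.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC_LUT_SIZE 256

static uint32_t crc32_eth_lut[CRC_LUT_SIZE];
static uint32_t crc16_ccitt_lut[CRC_LUT_SIZE];

/* Fill a 256-entry table for a reflected CRC: entry i = 8 bit-steps applied to i. */
static void crc_init_lut(uint32_t poly_reflected, uint32_t *lut)
{
	for (uint32_t i = 0; i < CRC_LUT_SIZE; i++) {
		uint32_t crc = i;

		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 1) ? (crc >> 1) ^ poly_reflected : crc >> 1;
		lut[i] = crc;
	}
}

/* One-time initialisation (hypothetical): build both tables before any calc call. */
static void crc_tables_init(void)
{
	crc_init_lut(0xEDB88320u, crc32_eth_lut);	/* reflected 0x04C11DB7 */
	crc_init_lut(0x8408u, crc16_ccitt_lut);		/* reflected 0x1021 */
}

/* Byte-at-a-time update: one table lookup per input byte. */
static uint32_t crc_lut_update(const uint8_t *data, size_t len,
			       uint32_t crc, const uint32_t *lut)
{
	while (len--)
		crc = lut[(crc ^ *data++) & 0xFF] ^ (crc >> 8);
	return crc;
}

static uint32_t crc32_eth_lut_calc(const uint8_t *data, size_t len)
{
	return ~crc_lut_update(data, len, 0xFFFFFFFFu, crc32_eth_lut);
}

static uint16_t crc16_ccitt_lut_calc(const uint8_t *data, size_t len)
{
	/* running value stays below 0x10000, so truncating after ~ is exact */
	return (uint16_t)~crc_lut_update(data, len, 0xFFFFu, crc16_ccitt_lut);
}
```

The tables are filled once, after which each input byte costs one lookup, one shift and one XOR. The 16-bit CRC reuses the 32-bit update routine because its table entries and running value never exceed 16 bits. Selecting between this scalar path and a PCLMULQDQ path at initialisation could then be a matter of storing one of two function pointers.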
Thread overview: 69+ messages
2017-02-24 20:54 [dpdk-dev] [PATCH 0/2] librte_net: add crc computation support Jasvinder Singh
2017-02-24 20:54 ` [dpdk-dev] [PATCH 1/2] librte_net: add crc init and compute APIs Jasvinder Singh
2017-02-28 12:08   ` [dpdk-dev] [PATCH v2 0/2] librte_net: add crc computation support Jasvinder Singh
2017-02-28 12:08     ` [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute APIs Jasvinder Singh
2017-02-28 12:15       ` Jerin Jacob
2017-03-01 18:46       ` Thomas Monjalon
2017-03-02 13:03         ` Singh, Jasvinder
2017-03-06 15:27           ` Thomas Monjalon
2017-03-08 11:08             ` De Lara Guarch, Pablo
2017-03-15 17:35               ` Thomas Monjalon
2017-03-15 19:03                 ` Dumitrescu, Cristian
2017-03-15 20:15                   ` Thomas Monjalon
2017-03-15 21:11                     ` Dumitrescu, Cristian
2017-03-15 19:09                 ` Dumitrescu, Cristian
2017-03-12 21:33       ` [dpdk-dev] [PATCH v3 0/2] librte_net: add crc computation support Jasvinder Singh
2017-03-12 21:33         ` [dpdk-dev] [PATCH v3 1/2] librte_net: add crc compute APIs Jasvinder Singh
2017-03-13  3:06           ` Ananyev, Konstantin
2017-03-13  9:05             ` Singh, Jasvinder
2017-03-20 19:29           ` [dpdk-dev] [PATCH v4 0/2] librte_net: add crc computation support Jasvinder Singh
2017-03-20 19:29             ` [dpdk-dev] [PATCH v4 1/2] librte_net: add crc compute APIs Jasvinder Singh
2017-03-21 14:45               ` [dpdk-dev] [PATCH v5 0/2] librte_net: add crc computation support Jasvinder Singh
2017-03-21 14:45                 ` [dpdk-dev] [PATCH v5 1/2] librte_net: add crc compute APIs Jasvinder Singh
2017-03-28 18:04                   ` De Lara Guarch, Pablo
2017-03-28 18:07                     ` De Lara Guarch, Pablo
2017-03-28 19:21                     ` Singh, Jasvinder
2017-03-29 12:42                   ` [dpdk-dev] [PATCH v6 0/2] librte_net: add crc computation support Jasvinder Singh
2017-03-29 12:42                     ` [dpdk-dev] [PATCH v6 1/2] librte_net: add crc compute APIs Jasvinder Singh
2017-03-29 16:14                       ` De Lara Guarch, Pablo
2017-03-29 17:15                       ` [dpdk-dev] [PATCH v7 0/2] librte_net: add crc computation support Jasvinder Singh
2017-03-29 17:15                         ` [dpdk-dev] [PATCH v7 1/2] librte_net: add crc compute APIs Jasvinder Singh
2017-03-30 11:30                           ` [dpdk-dev] [PATCH v8 0/2] librte_net: add crc computation support Jasvinder Singh
2017-03-30 11:30                             ` [dpdk-dev] [PATCH v8 1/2] librte_net: add crc compute APIs Jasvinder Singh
2017-03-30 11:31                               ` Ananyev, Konstantin
2017-03-30 12:06                                 ` Singh, Jasvinder
2017-03-30 14:40                                 ` Olivier Matz
2017-03-30 15:14                                   ` Singh, Jasvinder
2017-03-30 16:15                               ` [dpdk-dev] [PATCH v9 0/3] librte_net: add crc computation support Jasvinder Singh
2017-03-30 16:15                                 ` [dpdk-dev] [PATCH v9 1/3] librte_net: add crc compute APIs Jasvinder Singh
2017-04-04 20:00                                   ` Thomas Monjalon
2017-04-05 14:58                                   ` [dpdk-dev] [PATCH v10 0/2] librte_net: add crc computation support Jasvinder Singh
2017-04-05 14:58                                     ` [dpdk-dev] [PATCH v10 1/2] librte_net: add crc compute APIs Jasvinder Singh
2017-04-05 17:49                                       ` Thomas Monjalon
2017-04-05 19:22                                         ` Singh, Jasvinder
2017-04-05 20:49                                       ` [dpdk-dev] [PATCH v11 0/2] librte_net: add crc computation support Jasvinder Singh
2017-04-05 20:49                                         ` [dpdk-dev] [PATCH v11 1/2] librte_net: add crc compute APIs Jasvinder Singh
2017-04-05 20:49                                         ` [dpdk-dev] [PATCH v11 2/2] test/test: add unit test for CRC computation Jasvinder Singh
2017-04-05 20:59                                           ` Thomas Monjalon
2017-04-05 21:00                                         ` [dpdk-dev] [PATCH v11 0/2] librte_net: add crc computation support Thomas Monjalon
2017-04-05 14:58                                     ` [dpdk-dev] [PATCH v10 2/2] test/test: add unit test for CRC computation Jasvinder Singh
2017-03-30 16:15                                 ` [dpdk-dev] [PATCH v9 2/3] " Jasvinder Singh
2017-03-30 16:15                                 ` [dpdk-dev] [PATCH v9 3/3] maintainers: add packet crc section and claim maintainership Jasvinder Singh
2017-04-04 19:55                                   ` Thomas Monjalon
2017-04-04 20:02                                 ` [dpdk-dev] [PATCH v9 0/3] librte_net: add crc computation support Thomas Monjalon
2017-04-05  8:34                                   ` Singh, Jasvinder
2017-04-05  9:01                                     ` Thomas Monjalon
2017-04-05  9:37                                       ` Richardson, Bruce
2017-04-05 12:52                                         ` Singh, Jasvinder
2017-03-30 11:30                             ` [dpdk-dev] [PATCH v8 2/2] test/test: add unit test for CRC computation Jasvinder Singh
2017-03-29 17:15                         ` [dpdk-dev] [PATCH v7 " Jasvinder Singh
2017-03-29 12:42                     ` [dpdk-dev] [PATCH v6 " Jasvinder Singh
2017-03-29 16:12                       ` De Lara Guarch, Pablo
2017-03-21 14:45                 ` [dpdk-dev] [PATCH v5 " Jasvinder Singh
2017-03-28 19:23                   ` De Lara Guarch, Pablo
2017-03-28 19:27                     ` Singh, Jasvinder
2017-03-20 19:29             ` [dpdk-dev] [PATCH v4 2/2] app/test: " Jasvinder Singh
2017-03-21  7:14               ` Peng, Yuan
2017-03-12 21:33         ` [dpdk-dev] [PATCH v3 " Jasvinder Singh
2017-02-28 12:08     ` [dpdk-dev] [PATCH v2 " Jasvinder Singh
2017-02-24 20:54 ` [dpdk-dev] [PATCH " Jasvinder Singh
