DPDK patches and discussions
* [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call
@ 2014-09-03  6:05 Yerden Zhumabekov
  2014-09-03  6:05 ` [dpdk-dev] [PATCH 1/2] hash: add new rte_hash_crc_8byte call Yerden Zhumabekov
                   ` (10 more replies)
  0 siblings, 11 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-09-03  6:05 UTC (permalink / raw)
  To: dev

Since SSE4.2 provides CRC32 instructions with both 32- and 64-bit operands,
a new rte_hash_crc_8byte() call backed by the _mm_crc32_u64 intrinsic may be
useful.

The rte_hash_crc() function is then reworked to take advantage of both 32-
and 64-bit operands. This improves the function's performance significantly.

Results of my test run on a single CPU core are below.

CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
Number of iterations/chunks: 52428800
Chunk size: 24
  rte_hash_crc:            0.379 sec, hash: 0x14c64e11
  rte_hash_crc_new:        0.253 sec, hash: 0x14c64e11
Chunk size: 25
  rte_hash_crc:            0.442 sec, hash: 0xa9afc779
  rte_hash_crc_new:        0.316 sec, hash: 0xa9afc779
Chunk size: 26
  rte_hash_crc:            0.442 sec, hash: 0x92f2284b
  rte_hash_crc_new:        0.316 sec, hash: 0x92f2284b
Chunk size: 27
  rte_hash_crc:            0.442 sec, hash: 0x7c4655ff
  rte_hash_crc_new:        0.316 sec, hash: 0x7c4655ff
Chunk size: 28
  rte_hash_crc:            0.442 sec, hash: 0xf577c6b4
  rte_hash_crc_new:        0.316 sec, hash: 0xf577c6b4
Chunk size: 29
  rte_hash_crc:            0.505 sec, hash: 0x6e18ba55
  rte_hash_crc_new:        0.337 sec, hash: 0x6e18ba55
Chunk size: 30
  rte_hash_crc:            0.505 sec, hash: 0x35f07dbb
  rte_hash_crc_new:        0.337 sec, hash: 0x35f07dbb
Chunk size: 31
  rte_hash_crc:            0.505 sec, hash: 0x1bf2ee8c
  rte_hash_crc_new:        0.337 sec, hash: 0x1bf2ee8c
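
The test program itself is not part of this posting. As a rough sketch of how
such a timing run could be structured (bench_hash(), struct bench_result and
the byte-wise software CRC32C stand-in are hypothetical names introduced here,
and chaining each result into the next call is an assumption, not necessarily
what the original test did):

```c
#include <stdint.h>
#include <time.h>

/* Signature shared by rte_hash_crc()-style functions. */
typedef uint32_t (*hash_fn)(const void *data, uint32_t len, uint32_t init);

struct bench_result {
	double seconds;	/* CPU time spent in the loop */
	uint32_t hash;	/* final chained hash, for comparing implementations */
};

/* Stand-in hash (byte-wise software CRC32C); swap in the function under test. */
static uint32_t
crc32c_bytewise(const void *data, uint32_t len, uint32_t crc)
{
	const uint8_t *p = data;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Hypothetical helper: hash the same chunk `iters` times and time it. */
static struct bench_result
bench_hash(hash_fn fn, const void *buf, uint32_t chunk, uint32_t iters)
{
	struct bench_result r;
	uint32_t h = 0;
	clock_t start = clock();

	/* Feed each result into the next call so the loop cannot be elided. */
	for (uint32_t i = 0; i < iters; i++)
		h = fn(buf, chunk, h);

	r.seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
	r.hash = h;
	return r;
}
```

Calling bench_hash() once per implementation on the same buffer and comparing
both the time and the final hash gives output of the kind listed above.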

Yerden Zhumabekov (2):
  hash: add new rte_hash_crc_8byte call
  hash: rte_hash_crc uses 8- and 4-byte CRC32 intrinsics

 lib/librte_hash/rte_hash_crc.h |   47 +++++++++++++++++++++++++++++++++-------
 1 file changed, 39 insertions(+), 8 deletions(-)

-- 
1.7.9.5


* [dpdk-dev] [PATCH 1/2] hash: add new rte_hash_crc_8byte call
  2014-09-03  6:05 [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call Yerden Zhumabekov
@ 2014-09-03  6:05 ` Yerden Zhumabekov
  2014-09-03  6:05 ` [dpdk-dev] [PATCH 2/2] hash: rte_hash_crc uses 8- and 4-byte CRC32 intrinsics Yerden Zhumabekov
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-09-03  6:05 UTC (permalink / raw)
  To: dev

SSE4.2 provides the _mm_crc32_u64 intrinsic, which takes an 8-byte operand.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index b48b0db..102b2a0 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -64,6 +64,22 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 }
 
 /**
+ * Use a single crc32 instruction to perform a hash on an 8 byte value.
+ *
+ * @param data
+ *   Data to perform hash on.
+ * @param init_val
+ *   Value to initialise hash generator.
+ * @return
+ *   32bit calculated hash value.
+ */
+static inline uint32_t
+rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
+{
+	return _mm_crc32_u64(init_val, data);
+}
+
+/**
  * Use crc32 instruction to perform a hash.
  *
  * @param data
-- 
1.7.9.5


* [dpdk-dev] [PATCH 2/2] hash: rte_hash_crc uses 8- and 4-byte CRC32 intrinsics
  2014-09-03  6:05 [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call Yerden Zhumabekov
  2014-09-03  6:05 ` [dpdk-dev] [PATCH 1/2] hash: add new rte_hash_crc_8byte call Yerden Zhumabekov
@ 2014-09-03  6:05 ` Yerden Zhumabekov
  2014-11-13 17:33 ` [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call Thomas Monjalon
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-09-03  6:05 UTC (permalink / raw)
  To: dev

Calculating a hash for data of variable length is more efficient
when the data is sliced into 8-byte pieces. The remainder of the data
is hashed using either the 8-byte or the 4-byte CRC32 intrinsic.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |   31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 102b2a0..d023e5d 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -95,23 +95,38 @@ static inline uint32_t
 rte_hash_crc(const void *data, uint32_t data_len, uint32_t init_val)
 {
 	unsigned i;
-	uint32_t temp = 0;
-	const uint32_t *p32 = (const uint32_t *)data;
+	uint64_t temp = 0;
+	const uint64_t *p64 = (const uint64_t *)data;
 
-	for (i = 0; i < data_len / 4; i++) {
-		init_val = rte_hash_crc_4byte(*p32++, init_val);
+	for (i = 0; i < data_len / 8; i++) {
+		init_val = rte_hash_crc_8byte(*p64++, init_val);
 	}
 
-	switch (3 - (data_len & 0x03)) {
+	switch (7 - (data_len & 0x07)) {
 	case 0:
-		temp |= *((const uint8_t *)p32 + 2) << 16;
+		temp |= (uint64_t) *((const uint8_t *)p64 + 6) << 48;
 		/* Fallthrough */
 	case 1:
-		temp |= *((const uint8_t *)p32 + 1) << 8;
+		temp |= (uint64_t) *((const uint8_t *)p64 + 5) << 40;
 		/* Fallthrough */
 	case 2:
-		temp |= *((const uint8_t *)p32);
+		temp |= (uint64_t) *((const uint8_t *)p64 + 4) << 32;
+		temp |= *((const uint32_t *)p64);
+		init_val = rte_hash_crc_8byte(temp, init_val);
+		break;
+	case 3:
+		init_val = rte_hash_crc_4byte(*(const uint32_t *)p64, init_val);
+		break;
+	case 4:
+		temp |= *((const uint8_t *)p64 + 2) << 16;
+		/* Fallthrough */
+	case 5:
+		temp |= *((const uint8_t *)p64 + 1) << 8;
+		/* Fallthrough */
+	case 6:
+		temp |= *((const uint8_t *)p64);
 		init_val = rte_hash_crc_4byte(temp, init_val);
+		/* Fallthrough */
 	default:
 		break;
 	}
-- 
1.7.9.5
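
For readers following the tail handling in the diff above, here is a portable
sketch of the same slicing idea. It is not the patch code: crc32c_u8(),
crc32c_u32(), crc32c_u64(), load_le() and hash_crc_sliced() are hypothetical
names, and the CRC steps are software stand-ins for the SSE4.2 instructions so
that the sketch runs anywhere. It assumes the little-endian byte order of x86.
Whole 8-byte pieces go through the 8-byte step; a remainder of 5 to 7 bytes is
zero-padded into one more 8-byte step, and a remainder of 1 to 4 bytes into one
4-byte step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software stand-in for one CRC32C step on a single byte (reflected
 * Castagnoli polynomial), so the sketch does not need SSE4.2. */
static uint32_t
crc32c_u8(uint32_t crc, uint8_t b)
{
	crc ^= b;
	for (int k = 0; k < 8; k++)
		crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	return crc;
}

/* Stand-ins for the 4- and 8-byte steps: feed bytes lowest-first. */
static uint32_t
crc32c_u32(uint32_t crc, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		crc = crc32c_u8(crc, (uint8_t)(v >> (8 * i)));
	return crc;
}

static uint32_t
crc32c_u64(uint32_t crc, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		crc = crc32c_u8(crc, (uint8_t)(v >> (8 * i)));
	return crc;
}

/* Assemble n (<= 8) bytes into a zero-padded little-endian word. */
static uint64_t
load_le(const uint8_t *p, size_t n)
{
	uint64_t v = 0;

	assert(n <= 8);
	for (size_t i = 0; i < n; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

/* Hypothetical equivalent of the reworked rte_hash_crc() loop and tail. */
static uint32_t
hash_crc_sliced(const void *data, uint32_t len, uint32_t init_val)
{
	const uint8_t *p = data;
	uint32_t rem = len & 7;

	for (uint32_t i = 0; i < len / 8; i++, p += 8)
		init_val = crc32c_u64(init_val, load_le(p, 8));

	if (rem > 4)		/* 5..7 bytes left: one padded 8-byte step */
		init_val = crc32c_u64(init_val, load_le(p, rem));
	else if (rem > 0)	/* 1..4 bytes left: one padded 4-byte step */
		init_val = crc32c_u32(init_val, (uint32_t)load_le(p, rem));
	return init_val;
}
```

With this layout the result equals a byte-wise CRC over the input zero-padded
to a multiple of 4 bytes, which is also what the old 4-byte loop computes; that
is consistent with the identical hash values in the cover letter.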


* Re: [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call
  2014-09-03  6:05 [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call Yerden Zhumabekov
  2014-09-03  6:05 ` [dpdk-dev] [PATCH 1/2] hash: add new rte_hash_crc_8byte call Yerden Zhumabekov
  2014-09-03  6:05 ` [dpdk-dev] [PATCH 2/2] hash: rte_hash_crc uses 8- and 4-byte CRC32 intrinsics Yerden Zhumabekov
@ 2014-11-13 17:33 ` Thomas Monjalon
  2014-11-14  0:52   ` Neil Horman
  2014-11-16 17:59 ` [dpdk-dev] [PATCH v2 0/4] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 98+ messages in thread
From: Thomas Monjalon @ 2014-11-13 17:33 UTC (permalink / raw)
  To: dev

Any comment on these patches?

2014-09-03 12:05, Yerden Zhumabekov:
> Since SSE4.2 provides CRC32 instructions with both 32- and 64-bit operands,
> a new rte_hash_crc_8byte() call backed by the _mm_crc32_u64 intrinsic may be
> useful.
> 
> The rte_hash_crc() function is then reworked to take advantage of both 32-
> and 64-bit operands. This improves the function's performance significantly.
> 
> Results of my test run on a single CPU core are below.
> 
> CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
> Number of iterations/chunks: 52428800
> Chunk size: 24
>   rte_hash_crc:            0.379 sec, hash: 0x14c64e11
>   rte_hash_crc_new:        0.253 sec, hash: 0x14c64e11
> Chunk size: 25
>   rte_hash_crc:            0.442 sec, hash: 0xa9afc779
>   rte_hash_crc_new:        0.316 sec, hash: 0xa9afc779
> Chunk size: 26
>   rte_hash_crc:            0.442 sec, hash: 0x92f2284b
>   rte_hash_crc_new:        0.316 sec, hash: 0x92f2284b
> Chunk size: 27
>   rte_hash_crc:            0.442 sec, hash: 0x7c4655ff
>   rte_hash_crc_new:        0.316 sec, hash: 0x7c4655ff
> Chunk size: 28
>   rte_hash_crc:            0.442 sec, hash: 0xf577c6b4
>   rte_hash_crc_new:        0.316 sec, hash: 0xf577c6b4
> Chunk size: 29
>   rte_hash_crc:            0.505 sec, hash: 0x6e18ba55
>   rte_hash_crc_new:        0.337 sec, hash: 0x6e18ba55
> Chunk size: 30
>   rte_hash_crc:            0.505 sec, hash: 0x35f07dbb
>   rte_hash_crc_new:        0.337 sec, hash: 0x35f07dbb
> Chunk size: 31
>   rte_hash_crc:            0.505 sec, hash: 0x1bf2ee8c
>   rte_hash_crc_new:        0.337 sec, hash: 0x1bf2ee8c
> 
> Yerden Zhumabekov (2):
>   hash: add new rte_hash_crc_8byte call
>   hash: rte_hash_crc uses 8- and 4-byte CRC32 intrinsics
> 
>  lib/librte_hash/rte_hash_crc.h |   47 +++++++++++++++++++++++++++++++++-------
>  1 file changed, 39 insertions(+), 8 deletions(-)


* Re: [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call
  2014-11-13 17:33 ` [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call Thomas Monjalon
@ 2014-11-14  0:52   ` Neil Horman
  2014-11-14  7:15     ` Yerden Zhumabekov
  0 siblings, 1 reply; 98+ messages in thread
From: Neil Horman @ 2014-11-14  0:52 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Thu, Nov 13, 2014 at 06:33:14PM +0100, Thomas Monjalon wrote:
> Any comment on these patches?
> 
> 2014-09-03 12:05, Yerden Zhumabekov:
> > Since SSE4.2 provides CRC32 instructions with both 32- and 64-bit operands,
> > a new rte_hash_crc_8byte() call backed by the _mm_crc32_u64 intrinsic may be
> > useful.
> > 
> > The rte_hash_crc() function is then reworked to take advantage of both 32-
> > and 64-bit operands. This improves the function's performance significantly.
> > 
> > Results of my test run on a single CPU core are below.
> > 
> > CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
> > Number of iterations/chunks: 52428800
> > Chunk size: 24
> >   rte_hash_crc:            0.379 sec, hash: 0x14c64e11
> >   rte_hash_crc_new:        0.253 sec, hash: 0x14c64e11
> > Chunk size: 25
> >   rte_hash_crc:            0.442 sec, hash: 0xa9afc779
> >   rte_hash_crc_new:        0.316 sec, hash: 0xa9afc779
> > Chunk size: 26
> >   rte_hash_crc:            0.442 sec, hash: 0x92f2284b
> >   rte_hash_crc_new:        0.316 sec, hash: 0x92f2284b
> > Chunk size: 27
> >   rte_hash_crc:            0.442 sec, hash: 0x7c4655ff
> >   rte_hash_crc_new:        0.316 sec, hash: 0x7c4655ff
> > Chunk size: 28
> >   rte_hash_crc:            0.442 sec, hash: 0xf577c6b4
> >   rte_hash_crc_new:        0.316 sec, hash: 0xf577c6b4
> > Chunk size: 29
> >   rte_hash_crc:            0.505 sec, hash: 0x6e18ba55
> >   rte_hash_crc_new:        0.337 sec, hash: 0x6e18ba55
> > Chunk size: 30
> >   rte_hash_crc:            0.505 sec, hash: 0x35f07dbb
> >   rte_hash_crc_new:        0.337 sec, hash: 0x35f07dbb
> > Chunk size: 31
> >   rte_hash_crc:            0.505 sec, hash: 0x1bf2ee8c
> >   rte_hash_crc_new:        0.337 sec, hash: 0x1bf2ee8c
> > 
> > Yerden Zhumabekov (2):
> >   hash: add new rte_hash_crc_8byte call
> >   hash: rte_hash_crc uses 8- and 4-byte CRC32 intrinsics
> > 
> >  lib/librte_hash/rte_hash_crc.h |   47 +++++++++++++++++++++++++++++++++-------
> >  1 file changed, 39 insertions(+), 8 deletions(-)
> 
> 
Yeah, sorry I didn't speak up earlier.  I meant to ask whether the _mm_crc32_u64
intrinsic emits a software-emulated version of the SSE4.2 instruction in the
event that you build with a config that doesn't enable SSE4.2.  If not, then
NAK, since this will break on the default build.  In that event you'll have to
modify the new function to do a runtime CPU flags check and either use the
instruction inlined with some asm, or emulate it in software.

Neil
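
A minimal sketch of the runtime check described above, for the 4-byte case
only. This is an illustration, not code from the patches: hash_crc_4byte(),
crc32c_hw_4byte(), crc32c_sw_4byte(), cpu_has_sse42() and HAVE_X86_CRC32 are
hypothetical names, and it assumes a GCC-compatible compiler that provides
__builtin_cpu_supports() on x86. The instruction is emitted through inline
asm, so the file does not have to be built with SSE4.2 enabled.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_CRC32 1
#else
#define HAVE_X86_CRC32 0
#endif

/* Software emulation of one 32-bit CRC32C step (reflected polynomial). */
static uint32_t
crc32c_sw_4byte(uint32_t data, uint32_t crc)
{
	crc ^= data;
	for (int k = 0; k < 32; k++)
		crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	return crc;
}

#if HAVE_X86_CRC32
/* The crc32 instruction itself, emitted without needing -msse4.2. */
static uint32_t
crc32c_hw_4byte(uint32_t data, uint32_t crc)
{
	__asm__("crc32l %1, %0" : "+r"(crc) : "rm"(data));
	return crc;
}
#endif

/* Runtime CPU flag check; always false on non-x86 builds. */
static int
cpu_has_sse42(void)
{
#if HAVE_X86_CRC32
	return __builtin_cpu_supports("sse4.2") != 0;
#else
	return 0;
#endif
}

/* Use the instruction when the CPU has it, otherwise emulate it. */
static uint32_t
hash_crc_4byte(uint32_t data, uint32_t init_val)
{
#if HAVE_X86_CRC32
	if (cpu_has_sse42())
		return crc32c_hw_4byte(data, init_val);
#endif
	return crc32c_sw_4byte(data, init_val);
}
```

In real code the flag would more likely be checked once at init time and
cached, rather than on every call as in this sketch.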


* Re: [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call
  2014-11-14  0:52   ` Neil Horman
@ 2014-11-14  7:15     ` Yerden Zhumabekov
  2014-11-14 11:33       ` Neil Horman
  0 siblings, 1 reply; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-14  7:15 UTC (permalink / raw)
  To: Neil Horman, Thomas Monjalon; +Cc: dev

14.11.2014 6:52, Neil Horman wrote:
> On Thu, Nov 13, 2014 at 06:33:14PM +0100, Thomas Monjalon wrote:
>> Any comment on these patches?
>>
>> 2014-09-03 12:05, Yerden Zhumabekov:
>>> Since SSE4.2 provides CRC32 instructions with both 32- and 64-bit operands,
>>> a new rte_hash_crc_8byte() call backed by the _mm_crc32_u64 intrinsic may be
>>> useful.
>>>
>>> ... <skipped> ...
>>
> Yeah, sorry I didn't speak up earlier.  I meant to ask whether the _mm_crc32_u64
> intrinsic emits a software-emulated version of the SSE4.2 instruction in the
> event that you build with a config that doesn't enable SSE4.2.  If not, then
> NAK, since this will break on the default build.  In that event you'll have to
> modify the new function to do a runtime CPU flags check and either use the
> instruction inlined with some asm, or emulate it in software.

Hello,

A quick grep of the DPDK source shows that rte_hash_crc() is used in
librte_hash in the following contexts:

In rte_hash.c:
/* Hash function used if none is specified */
#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
#include <rte_hash_crc.h>
#define DEFAULT_HASH_FUNC       rte_hash_crc
#else
#include <rte_jhash.h>
#define DEFAULT_HASH_FUNC       rte_jhash
#endif

In rte_fbk_hash.h:
#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
#include <rte_hash_crc.h>
/** Default four-byte key hash function if none is specified. */
#define RTE_FBK_HASH_FUNC_DEFAULT       rte_hash_crc_4byte
#else
#include <rte_jhash.h>
#define RTE_FBK_HASH_FUNC_DEFAULT       rte_jhash_1word
#endif
#endif


I guess it covers the cpu flags check you're talking about.

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ


* Re: [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call
  2014-11-14  7:15     ` Yerden Zhumabekov
@ 2014-11-14 11:33       ` Neil Horman
  2014-11-14 11:57         ` Yerden Zhumabekov
  0 siblings, 1 reply; 98+ messages in thread
From: Neil Horman @ 2014-11-14 11:33 UTC (permalink / raw)
  To: Yerden Zhumabekov; +Cc: dev

On Fri, Nov 14, 2014 at 01:15:12PM +0600, Yerden Zhumabekov wrote:
> 14.11.2014 6:52, Neil Horman wrote:
> > On Thu, Nov 13, 2014 at 06:33:14PM +0100, Thomas Monjalon wrote:
> >> Any comment on these patches?
> >>
> >> 2014-09-03 12:05, Yerden Zhumabekov:
> >>> Since SSE4.2 provides CRC32 instructions with both 32- and 64-bit operands,
> >>> a new rte_hash_crc_8byte() call backed by the _mm_crc32_u64 intrinsic may be
> >>> useful.
> >>>
> >>> ... <skipped> ...
> >>
> > Yeah, sorry I didn't speak up earlier.  I meant to ask whether the _mm_crc32_u64
> > intrinsic emits a software-emulated version of the SSE4.2 instruction in the
> > event that you build with a config that doesn't enable SSE4.2.  If not, then
> > NAK, since this will break on the default build.  In that event you'll have to
> > modify the new function to do a runtime CPU flags check and either use the
> > instruction inlined with some asm, or emulate it in software.
> 
> Hello,
> 
> A quick grep of the DPDK source shows that rte_hash_crc() is used in
> librte_hash in the following contexts:
> 
> In rte_hash.c:
> /* Hash function used if none is specified */
> #ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> #include <rte_hash_crc.h>
> #define DEFAULT_HASH_FUNC       rte_hash_crc
> #else
> #include <rte_jhash.h>
> #define DEFAULT_HASH_FUNC       rte_jhash
> #endif
> 
> In rte_fbk_hash.h:
> #ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> #include <rte_hash_crc.h>
> /** Default four-byte key hash function if none is specified. */
> #define RTE_FBK_HASH_FUNC_DEFAULT       rte_hash_crc_4byte
> #else
> #include <rte_jhash.h>
> #define RTE_FBK_HASH_FUNC_DEFAULT       rte_jhash_1word
> #endif
> #endif
> 
> 
> I guess it covers the cpu flags check you're talking about.
> 

Not really.  That covers the case of applications selecting the hash function
using the DEFAULT_HASH_FUNC macro, but does nothing for applications using
the function directly.  test_hash_perf is an example of this: ostensibly
because of the behavior without SSE4.2, it defines these huge test tables twice
based on the availability of SSE4.2.  It would be better if we could allow
applications to use rte_hash_crc regardless, and make the code it uses
configurable at run time.
Neil

> -- 
> Sincerely,
> 
> Yerden Zhumabekov
> State Technical Service
> Astana, KZ
> 
> 
> 


* Re: [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call
  2014-11-14 11:33       ` Neil Horman
@ 2014-11-14 11:57         ` Yerden Zhumabekov
  2014-11-14 13:53           ` Neil Horman
  0 siblings, 1 reply; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-14 11:57 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev


14.11.2014 17:33, Neil Horman wrote:
> On Fri, Nov 14, 2014 at 01:15:12PM +0600, Yerden Zhumabekov wrote:
>>
>> Hello,
>>
>> A quick grep of the DPDK source shows that rte_hash_crc() is used in
>> librte_hash in the following contexts:
>>
>> In rte_hash.c:
>> /* Hash function used if none is specified */
>> #ifdef RTE_MACHINE_CPUFLAG_SSE4_2
>> #include <rte_hash_crc.h>
>> #define DEFAULT_HASH_FUNC       rte_hash_crc
>> #else
>> #include <rte_jhash.h>
>> #define DEFAULT_HASH_FUNC       rte_jhash
>> #endif
>>
>> In rte_fbk_hash.h:
>> #ifdef RTE_MACHINE_CPUFLAG_SSE4_2
>> #include <rte_hash_crc.h>
>> /** Default four-byte key hash function if none is specified. */
>> #define RTE_FBK_HASH_FUNC_DEFAULT       rte_hash_crc_4byte
>> #else
>> #include <rte_jhash.h>
>> #define RTE_FBK_HASH_FUNC_DEFAULT       rte_jhash_1word
>> #endif
>> #endif
>>
>>
>> I guess it covers the cpu flags check you're talking about.
>>
> Not really.  That covers the case of applications selecting the hash function
> using the DEFAULT_HASH_FUNC macro, but does nothing for applications using
> the function directly.  test_hash_perf is an example of this: ostensibly
> because of the behavior without SSE4.2, it defines these huge test tables twice
> based on the availability of SSE4.2.  It would be better if we could allow
> applications to use rte_hash_crc regardless, and make the code it uses
> configurable at run time.

I see, then we have a problem here :)

Actually, that was one of my concerns when developing these patches. I
looked through the source code of the libs and examples and saw that the
'#ifdef..#include..#endif'-like approach to selecting the hash function
was common. So I organized the patches to minimize the impact on the API and
not to contradict this approach.

If we prefer to change this approach then, I guess, we need to introduce
broader changes to the rte_hash library and change other code which uses it.
If that's what's needed, then it'll take some time for me to rework
these patches.

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ


* Re: [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call
  2014-11-14 11:57         ` Yerden Zhumabekov
@ 2014-11-14 13:53           ` Neil Horman
  2014-11-14 14:33             ` Thomas Monjalon
  2014-11-14 16:43             ` Yerden Zhumabekov
  0 siblings, 2 replies; 98+ messages in thread
From: Neil Horman @ 2014-11-14 13:53 UTC (permalink / raw)
  To: Yerden Zhumabekov; +Cc: dev

On Fri, Nov 14, 2014 at 05:57:51PM +0600, Yerden Zhumabekov wrote:
> 
> 14.11.2014 17:33, Neil Horman wrote:
> > On Fri, Nov 14, 2014 at 01:15:12PM +0600, Yerden Zhumabekov wrote:
> >>
> >> Hello,
> >>
> >> A quick grep of the DPDK source shows that rte_hash_crc() is used in
> >> librte_hash in the following contexts:
> >>
> >> In rte_hash.c:
> >> /* Hash function used if none is specified */
> >> #ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> >> #include <rte_hash_crc.h>
> >> #define DEFAULT_HASH_FUNC       rte_hash_crc
> >> #else
> >> #include <rte_jhash.h>
> >> #define DEFAULT_HASH_FUNC       rte_jhash
> >> #endif
> >>
> >> In rte_fbk_hash.h:
> >> #ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> >> #include <rte_hash_crc.h>
> >> /** Default four-byte key hash function if none is specified. */
> >> #define RTE_FBK_HASH_FUNC_DEFAULT       rte_hash_crc_4byte
> >> #else
> >> #include <rte_jhash.h>
> >> #define RTE_FBK_HASH_FUNC_DEFAULT       rte_jhash_1word
> >> #endif
> >> #endif
> >>
> >>
> >> I guess it covers the cpu flags check you're talking about.
> >>
> > Not really.  That covers the case of applications selecting the hash function
> > using the DEFAULT_HASH_FUNC macro, but does nothing for applications using
> > the function directly.  test_hash_perf is an example of this: ostensibly
> > because of the behavior without SSE4.2, it defines these huge test tables twice
> > based on the availability of SSE4.2.  It would be better if we could allow
> > applications to use rte_hash_crc regardless, and make the code it uses
> > configurable at run time.
> 
> I see, then we have a problem here :)
> 
> Actually, that was one of my concerns when developing these patches. I
> looked through the source code of the libs and examples and saw that the
> '#ifdef..#include..#endif'-like approach to selecting the hash function
> was common. So I organized the patches to minimize the impact on the API and
> not to contradict this approach.
> 
That's a reasonable approach, but I really hate the idea of continuing this need
to select CPU features at compile time if it's not necessary.

> If we prefer to change this approach then, I guess, we need to introduce
> broader changes to the rte_hash library and change other code which uses it.
> If that's what's needed, then it'll take some time for me to rework
> these patches.
> 
Well, it's possible you'll get lucky.  CRC is such a common operation that it's
entirely possible the gcc intrinsic emits a software-based CRC computation if
the SSE4.2 instructions aren't enabled.  I recommend modifying the test_hash_crc
function to use rte_hash_crc with SSE4.2 disabled, and seeing if you get a crash.
If you don't, examine the disassembly of your new function and confirm that
something reasonable that doesn't use SSE4.2 is emitted.  If that's the case,
your patch is fine, and we can focus on how to change the ifdefs in the existing
code, as use of the rte_hash_crc functions should be safe.

Best
Neil

> -- 
> Sincerely,
> 
> Yerden Zhumabekov
> State Technical Service
> Astana, KZ
> 
> 


* Re: [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call
  2014-11-14 13:53           ` Neil Horman
@ 2014-11-14 14:33             ` Thomas Monjalon
  2014-11-14 16:43             ` Yerden Zhumabekov
  1 sibling, 0 replies; 98+ messages in thread
From: Thomas Monjalon @ 2014-11-14 14:33 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

2014-11-14 08:53, Neil Horman:
> On Fri, Nov 14, 2014 at 05:57:51PM +0600, Yerden Zhumabekov wrote:
> > 14.11.2014 17:33, Neil Horman wrote:
> > > On Fri, Nov 14, 2014 at 01:15:12PM +0600, Yerden Zhumabekov wrote:
> > >> A quick grep of the DPDK source shows that rte_hash_crc() is used in
> > >> librte_hash in the following contexts:
> > >>
> > >> In rte_hash.c:
> > >> /* Hash function used if none is specified */
> > >> #ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> > >> #include <rte_hash_crc.h>
> > >> #define DEFAULT_HASH_FUNC       rte_hash_crc
> > >> #else
> > >> #include <rte_jhash.h>
> > >> #define DEFAULT_HASH_FUNC       rte_jhash
> > >> #endif
> > >>
> > >> In rte_fbk_hash.h:
> > >> #ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> > >> #include <rte_hash_crc.h>
> > >> /** Default four-byte key hash function if none is specified. */
> > >> #define RTE_FBK_HASH_FUNC_DEFAULT       rte_hash_crc_4byte
> > >> #else
> > >> #include <rte_jhash.h>
> > >> #define RTE_FBK_HASH_FUNC_DEFAULT       rte_jhash_1word
> > >> #endif
> > >> #endif
> > >>
> > >>
> > >> I guess it covers the cpu flags check you're talking about.
> > >>
> > > Not really.  That covers the case of applications selecting the hash function
> > > using the DEFAULT_HASH_FUNC macro, but does nothing for applications using
> > > the function directly.  test_hash_perf is an example of this: ostensibly
> > > because of the behavior without SSE4.2, it defines these huge test tables twice
> > > based on the availability of SSE4.2.  It would be better if we could allow
> > > applications to use rte_hash_crc regardless, and make the code it uses
> > > configurable at run time.
> > 
> > I see, then we have a problem here :)
> > 
> > Actually, that was one of my concerns when developing these patches. I
> > looked through the source code of the libs and examples and saw that the
> > '#ifdef..#include..#endif'-like approach to selecting the hash function
> > was common. So I organized the patches to minimize the impact on the API and
> > not to contradict this approach.
> > 
> That's a reasonable approach, but I really hate the idea of continuing this need
> to select CPU features at compile time if it's not necessary.

Yes, it's better to make it work on all architectures.

> > If we prefer to change this approach then, I guess, we need to introduce
> > broader changes to the rte_hash library and change other code which uses it.
> > If that's what's needed, then it'll take some time for me to rework
> > these patches.
> > 
> Well, it's possible you'll get lucky.  CRC is such a common operation that it's
> entirely possible the gcc intrinsic emits a software-based CRC computation if
> the SSE4.2 instructions aren't enabled.  I recommend modifying the test_hash_crc
> function to use rte_hash_crc with SSE4.2 disabled, and seeing if you get a crash.
> If you don't, examine the disassembly of your new function and confirm that
> something reasonable that doesn't use SSE4.2 is emitted.  If that's the case,
> your patch is fine, and we can focus on how to change the ifdefs in the existing
> code, as use of the rte_hash_crc functions should be safe.

That would be the ideal case.

Let me recall the consensus we had about having different code paths:
	http://dpdk.org/ml/archives/dev/2014-August/004670.html
The idea is to make the code work on all architectures thanks to generic
code which leverages build-time optimizations.
In the meantime, we want to be able to build DPDK for a default platform with
different code paths. In case a really interesting optimization requires more
CPU features, we can check CPU flags at runtime and select the best code path.
In the ACL code, it is the responsibility of the app to choose the code path:
	http://dpdk.org/browse/dpdk/tree/lib/librte_acl/rte_acl.h#n346

-- 
Thomas


* Re: [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call
  2014-11-14 13:53           ` Neil Horman
  2014-11-14 14:33             ` Thomas Monjalon
@ 2014-11-14 16:43             ` Yerden Zhumabekov
  2014-11-14 18:41               ` Neil Horman
  1 sibling, 1 reply; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-14 16:43 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev


14.11.2014 19:53, Neil Horman wrote:
> On Fri, Nov 14, 2014 at 05:57:51PM +0600, Yerden Zhumabekov wrote:
>> 14.11.2014 17:33, Neil Horman wrote:
>>> Not really.  That covers the case of applications selecting the hash function
>>> using the DEFAULT_HASH_FUNC macro, but does nothing for applications using
>>> the function directly.  test_hash_perf is an example of this: ostensibly
>>> because of the behavior without SSE4.2, it defines these huge test tables twice
>>> based on the availability of SSE4.2.  It would be better if we could allow
>>> applications to use rte_hash_crc regardless, and make the code it uses
>>> configurable at run time.
>> I see, then we have a problem here :)
>>
>> Actually, that was one of my concerns when developing these patches. I
>> looked through the source code of the libs and examples and saw that the
>> '#ifdef..#include..#endif'-like approach to selecting the hash function
>> was common. So I organized the patches to minimize the impact on the API and
>> not to contradict this approach.
>>
> That's a reasonable approach, but I really hate the idea of continuing this need
> to select CPU features at compile time if it's not necessary.
>
>> If we prefer to change this approach then, I guess, we need to introduce
>> broader changes to the rte_hash library and change other code which uses it.
>> If that's what's needed, then it'll take some time for me to rework
>> these patches.
>>
> Well, it's possible you'll get lucky.  CRC is such a common operation that it's
> entirely possible the gcc intrinsic emits a software-based CRC computation if
> the SSE4.2 instructions aren't enabled.  I recommend modifying the test_hash_crc
> function to use rte_hash_crc with SSE4.2 disabled, and seeing if you get a crash.
> If you don't, examine the disassembly of your new function and confirm that
> something reasonable that doesn't use SSE4.2 is emitted.  If that's the case,
> your patch is fine, and we can focus on how to change the ifdefs in the existing
> code, as use of the rte_hash_crc functions should be safe.
>

Unfortunately, that doesn't seem to be the case. A test program force-compiled
with the _mm_crc32_u32 intrinsic and run on a computer with no SSE4.2
support fails with an "Illegal instruction" error. So it looks like GCC does
not fall back to a software CRC32 implementation.

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ


* Re: [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call
  2014-11-14 16:43             ` Yerden Zhumabekov
@ 2014-11-14 18:41               ` Neil Horman
  2014-11-15 21:45                 ` Yerden Zhumabekov
  0 siblings, 1 reply; 98+ messages in thread
From: Neil Horman @ 2014-11-14 18:41 UTC (permalink / raw)
  To: Yerden Zhumabekov; +Cc: dev

On Fri, Nov 14, 2014 at 10:43:39PM +0600, Yerden Zhumabekov wrote:
> 
> 14.11.2014 19:53, Neil Horman wrote:
> > On Fri, Nov 14, 2014 at 05:57:51PM +0600, Yerden Zhumabekov wrote:
> >> 14.11.2014 17:33, Neil Horman wrote:
> >>> Not really.  That covers the case of applications selecting the hash function
> >>> using the DEFAULT_HASH_FUNC macro, but does nothing for applications using
> >>> the function directly.  test_hash_perf is an example of this: ostensibly
> >>> because of the behavior without SSE4.2, it defines these huge test tables twice
> >>> based on the availability of SSE4.2.  It would be better if we could allow
> >>> applications to use rte_hash_crc regardless, and make the code it uses
> >>> configurable at run time.
> >> I see, then we have a problem here :)
> >>
> >> Actually, that was one of my concerns when developing these patches. I
> >> looked through the source code of the libs and examples and saw that the
> >> '#ifdef..#include..#endif'-like approach to selecting the hash function
> >> was common. So I organized the patches to minimize the impact on the API and
> >> not to contradict this approach.
> >>
> > That's a reasonable approach, but I really hate the idea of continuing this need
> > to select CPU features at compile time if it's not necessary.
> >
> >> If we prefer to change this approach then, I guess, we need to introduce
> >> broader changes to the rte_hash library and change other code which uses it.
> >> If that's what's needed, then it'll take some time for me to rework
> >> these patches.
> >>
> > Well, it's possible you'll get lucky.  CRC is such a common operation that it's
> > entirely possible the gcc intrinsic emits a software-based CRC computation if
> > the SSE4.2 instructions aren't enabled.  I recommend modifying the test_hash_crc
> > function to use rte_hash_crc with SSE4.2 disabled, and seeing if you get a crash.
> > If you don't, examine the disassembly of your new function and confirm that
> > something reasonable that doesn't use SSE4.2 is emitted.  If that's the case,
> > your patch is fine, and we can focus on how to change the ifdefs in the existing
> > code, as use of the rte_hash_crc functions should be safe.
> >
> 
> Unfortunately, that doesn't seem to be the case. A test program force-compiled
> with the _mm_crc32_u32 intrinsic and run on a computer with no SSE4.2
> support fails with an "Illegal instruction" error. So it looks like GCC does
> not fall back to a software CRC32 implementation.
> 
Ok, but crc32 is pretty easy to implement in software.  Just appropriate the
calculate_crc32c function from the BSD or Linux kernels and, if
(unlikely(!support_sse42)), fall back to the calculate_crc32c operation.

Neil
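
For reference, a software CRC32C along the lines suggested above can be quite small. The following is only a minimal bitwise sketch, not the kernels' calculate_crc32c; the name crc32c_sw_update is hypothetical. Like the crc32 instruction, it applies no initial or final inversion itself, so the caller chooses the starting value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected form of the CRC32C (Castagnoli) polynomial. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Hypothetical helper: fold len bytes of buf into crc, one bit at a time. */
static inline uint32_t
crc32c_sw_update(uint32_t crc, const uint8_t *buf, size_t len)
{
	size_t i;
	int bit;

	assert(buf != NULL || len == 0);
	for (i = 0; i < len; i++) {
		crc ^= buf[i];
		for (bit = 0; bit < 8; bit++)
			/* Shift out one bit; XOR in the polynomial if that bit was set. */
			crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
	}
	return crc;
}
```

A bitwise loop like this is slow; it is meant only to show that the fallback path needs nothing beyond plain C.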

> -- 
> Sincerely,
> 
> Yerden Zhumabekov
> State Technical Service
> Astana, KZ
> 
> 
> 


* Re: [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call
  2014-11-14 18:41               ` Neil Horman
@ 2014-11-15 21:45                 ` Yerden Zhumabekov
  0 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-15 21:45 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev


15.11.2014 0:41, Neil Horman wrote:
> On Fri, Nov 14, 2014 at 10:43:39PM +0600, Yerden Zhumabekov wrote:
>> 14.11.2014 19:53, Neil Horman wrote:
>>>
>>> Well, it's possible you'll get lucky.  CRC is such a common operation that it's
>>> entirely possible the gcc intrinsic emits a software-based CRC computation if
>>> the SSE4.2 instructions aren't enabled.  I recommend modifying the test_hash_crc
>>> function to use rte_hash_crc with SSE4.2 disabled, and seeing if you get a crash.
>>> If you don't, examine the disassembly of your new function and confirm that
>>> something reasonable that doesn't use SSE4.2 is emitted.  If that's the case,
>>> your patch is fine, and we can focus on how to change the ifdefs in the existing
>>> code, as use of the rte_hash_crc functions should be safe.
>>>
>> Unfortunately, it seems not to be the case. Trying to force compiling a
>> test program with _mm_crc32_u32 intrinsic on computer with no SSE4.2
>> support leads to "Illegal instruction error". So it looks like GCC does
>> not fall back to crc32 software implementation.
>>
> Ok, but crc32 is pretty easy to implement in software.  Just appropriate the
> calculate_crc32c function from the BSD or Linux kernels and, if
> (unlikely(!support_sse42)), fall back to the calculate_crc32c operation.
>

I've almost finished reworking the patches, but there's one more issue I was
wondering about.

If we use a flag (say, 'sse42_flag') to determine the code path, where
should it be defined?
Should there be some sort of rte_hash_crc_init() call in the init stage of
the application?

Alternatively, I could implement it like this:


static uint8_t sse42_flag = FLAG_UNKNOWN;
....
static inline uint32_t
rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
{
    /* detect SSE4.2 support once, on the first call */
    if (unlikely(sse42_flag == FLAG_UNKNOWN))
        sse42_flag = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2) ?
            FLAG_SUPPORTED : FLAG_NOTSUPPORTED;

    if (likely(sse42_flag == FLAG_SUPPORTED))
        return _mm_crc32_u32(init_val, data);
    .....
}
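
The lazy-detection idea above can be tried outside DPDK. The sketch below is an illustration under stated assumptions, not the patch itself: it uses the GCC/Clang builtin __builtin_cpu_supports() in place of rte_cpu_get_flag_enabled(), the target("sse4.2") function attribute so the file builds without -msse4.2, and a bitwise software path. The names crc32c_4byte, crc32c_hw_4byte and crc32c_sw_4byte are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <nmmintrin.h>
#define SKETCH_X86 1
#else
#define SKETCH_X86 0
#endif

enum { FLAG_UNKNOWN = 0, FLAG_SUPPORTED, FLAG_NOTSUPPORTED };

static uint8_t sse42_flag = FLAG_UNKNOWN;

/* Software path: bitwise CRC32C over the 32 bits of data, LSB first. */
static inline uint32_t
crc32c_sw_4byte(uint32_t data, uint32_t init_val)
{
	int bit;

	init_val ^= data;
	for (bit = 0; bit < 32; bit++)
		init_val = (init_val >> 1) ^ (0x82F63B78u & (0u - (init_val & 1u)));
	return init_val;
}

#if SKETCH_X86
/* Hardware path: compiled for SSE4.2 regardless of the global -m flags. */
__attribute__((target("sse4.2")))
static uint32_t
crc32c_hw_4byte(uint32_t data, uint32_t init_val)
{
	return _mm_crc32_u32(init_val, data);
}
#endif

static inline uint32_t
crc32c_4byte(uint32_t data, uint32_t init_val)
{
	/* Detect once, on the first call. */
	if (__builtin_expect(sse42_flag == FLAG_UNKNOWN, 0)) {
#if SKETCH_X86
		sse42_flag = __builtin_cpu_supports("sse4.2") ?
			FLAG_SUPPORTED : FLAG_NOTSUPPORTED;
#else
		sse42_flag = FLAG_NOTSUPPORTED;
#endif
	}
	assert(sse42_flag != FLAG_UNKNOWN);

#if SKETCH_X86
	if (__builtin_expect(sse42_flag == FLAG_SUPPORTED, 1))
		return crc32c_hw_4byte(data, init_val);
#endif
	return crc32c_sw_4byte(data, init_val);
}
```

The cost of this variant is one extra well-predicted branch on every call; whether that matters for a given workload would have to be measured.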



-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ


* [dpdk-dev] [PATCH v2 0/4] rte_hash_crc reworked to be platform-independent
  2014-09-03  6:05 [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call Yerden Zhumabekov
                   ` (2 preceding siblings ...)
  2014-11-13 17:33 ` [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call Thomas Monjalon
@ 2014-11-16 17:59 ` Yerden Zhumabekov
  2014-11-17 11:31   ` Neil Horman
  2014-11-18  3:21   ` [dpdk-dev] [PATCH v3 0/5] " Yerden Zhumabekov
  2014-11-16 17:59 ` [dpdk-dev] [PATCH v2 1/4] hash: add software CRC32 implementation Yerden Zhumabekov
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-16 17:59 UTC (permalink / raw)
  To: dev

This is a rework of my previous patches improving performance of rte_hash_crc. In addition, this revision brings a fallback mechanism to ensure that CRC32 hash is calculated regardless of hardware support from CPU (i.e. SSE4.2 intrinsics).

Summary of changes:
* added CRC32 software implementation, which is used as a fallback in case SSE4.2 is not available, or if SSE4.2 is intentionally disabled.
* added rte_hash_crc_set_alg() function to control availability of SSE4.2.
* added rte_hash_crc_8byte() function to calculate CRC32 on 8-byte operand.
* reworked rte_hash_crc() function which leverages both versions of CRC32 hash calculation functions with 4 and 8-byte operands.

Patches were tested on machines both with and without SSE4.2 support. The software implementation seems to be about 15 times slower than the SSE4.2-enabled one. Of course, both return identical results.

Yerden Zhumabekov (4):
  hash: add software CRC32 implementation
  hash: add new rte_hash_crc_8byte call
  hash: add fallback to software CRC32 implementation
  hash: rte_hash_crc() slices data into 8-byte pieces

 lib/librte_hash/rte_hash_crc.h |  212 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 202 insertions(+), 10 deletions(-)

-- 
1.7.9.5


* [dpdk-dev] [PATCH v2 1/4] hash: add software CRC32 implementation
  2014-09-03  6:05 [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call Yerden Zhumabekov
                   ` (3 preceding siblings ...)
  2014-11-16 17:59 ` [dpdk-dev] [PATCH v2 0/4] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
@ 2014-11-16 17:59 ` Yerden Zhumabekov
  2014-11-16 17:59 ` [dpdk-dev] [PATCH v2 2/4] hash: add new rte_hash_crc_8byte call Yerden Zhumabekov
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-16 17:59 UTC (permalink / raw)
  To: dev

Add lookup table for CRC32 algorithm, crc32c_1word() and
crc32c_2words() functions returning hash of 32-bit and
64-bit operand.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |  105 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 105 insertions(+)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index b48b0db..3c368c5 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -47,6 +47,111 @@ extern "C" {
 #include <stdint.h>
 #include <nmmintrin.h>
 
+/* Lookup table for software implementation of CRC32C */
+static const uint32_t crc32c_table[256] = {
+	0x00000000L, 0xF26B8303L, 0xE13B70F7L, 0x1350F3F4L,
+	0xC79A971FL, 0x35F1141CL, 0x26A1E7E8L, 0xD4CA64EBL,
+	0x8AD958CFL, 0x78B2DBCCL, 0x6BE22838L, 0x9989AB3BL,
+	0x4D43CFD0L, 0xBF284CD3L, 0xAC78BF27L, 0x5E133C24L,
+	0x105EC76FL, 0xE235446CL, 0xF165B798L, 0x030E349BL,
+	0xD7C45070L, 0x25AFD373L, 0x36FF2087L, 0xC494A384L,
+	0x9A879FA0L, 0x68EC1CA3L, 0x7BBCEF57L, 0x89D76C54L,
+	0x5D1D08BFL, 0xAF768BBCL, 0xBC267848L, 0x4E4DFB4BL,
+	0x20BD8EDEL, 0xD2D60DDDL, 0xC186FE29L, 0x33ED7D2AL,
+	0xE72719C1L, 0x154C9AC2L, 0x061C6936L, 0xF477EA35L,
+	0xAA64D611L, 0x580F5512L, 0x4B5FA6E6L, 0xB93425E5L,
+	0x6DFE410EL, 0x9F95C20DL, 0x8CC531F9L, 0x7EAEB2FAL,
+	0x30E349B1L, 0xC288CAB2L, 0xD1D83946L, 0x23B3BA45L,
+	0xF779DEAEL, 0x05125DADL, 0x1642AE59L, 0xE4292D5AL,
+	0xBA3A117EL, 0x4851927DL, 0x5B016189L, 0xA96AE28AL,
+	0x7DA08661L, 0x8FCB0562L, 0x9C9BF696L, 0x6EF07595L,
+	0x417B1DBCL, 0xB3109EBFL, 0xA0406D4BL, 0x522BEE48L,
+	0x86E18AA3L, 0x748A09A0L, 0x67DAFA54L, 0x95B17957L,
+	0xCBA24573L, 0x39C9C670L, 0x2A993584L, 0xD8F2B687L,
+	0x0C38D26CL, 0xFE53516FL, 0xED03A29BL, 0x1F682198L,
+	0x5125DAD3L, 0xA34E59D0L, 0xB01EAA24L, 0x42752927L,
+	0x96BF4DCCL, 0x64D4CECFL, 0x77843D3BL, 0x85EFBE38L,
+	0xDBFC821CL, 0x2997011FL, 0x3AC7F2EBL, 0xC8AC71E8L,
+	0x1C661503L, 0xEE0D9600L, 0xFD5D65F4L, 0x0F36E6F7L,
+	0x61C69362L, 0x93AD1061L, 0x80FDE395L, 0x72966096L,
+	0xA65C047DL, 0x5437877EL, 0x4767748AL, 0xB50CF789L,
+	0xEB1FCBADL, 0x197448AEL, 0x0A24BB5AL, 0xF84F3859L,
+	0x2C855CB2L, 0xDEEEDFB1L, 0xCDBE2C45L, 0x3FD5AF46L,
+	0x7198540DL, 0x83F3D70EL, 0x90A324FAL, 0x62C8A7F9L,
+	0xB602C312L, 0x44694011L, 0x5739B3E5L, 0xA55230E6L,
+	0xFB410CC2L, 0x092A8FC1L, 0x1A7A7C35L, 0xE811FF36L,
+	0x3CDB9BDDL, 0xCEB018DEL, 0xDDE0EB2AL, 0x2F8B6829L,
+	0x82F63B78L, 0x709DB87BL, 0x63CD4B8FL, 0x91A6C88CL,
+	0x456CAC67L, 0xB7072F64L, 0xA457DC90L, 0x563C5F93L,
+	0x082F63B7L, 0xFA44E0B4L, 0xE9141340L, 0x1B7F9043L,
+	0xCFB5F4A8L, 0x3DDE77ABL, 0x2E8E845FL, 0xDCE5075CL,
+	0x92A8FC17L, 0x60C37F14L, 0x73938CE0L, 0x81F80FE3L,
+	0x55326B08L, 0xA759E80BL, 0xB4091BFFL, 0x466298FCL,
+	0x1871A4D8L, 0xEA1A27DBL, 0xF94AD42FL, 0x0B21572CL,
+	0xDFEB33C7L, 0x2D80B0C4L, 0x3ED04330L, 0xCCBBC033L,
+	0xA24BB5A6L, 0x502036A5L, 0x4370C551L, 0xB11B4652L,
+	0x65D122B9L, 0x97BAA1BAL, 0x84EA524EL, 0x7681D14DL,
+	0x2892ED69L, 0xDAF96E6AL, 0xC9A99D9EL, 0x3BC21E9DL,
+	0xEF087A76L, 0x1D63F975L, 0x0E330A81L, 0xFC588982L,
+	0xB21572C9L, 0x407EF1CAL, 0x532E023EL, 0xA145813DL,
+	0x758FE5D6L, 0x87E466D5L, 0x94B49521L, 0x66DF1622L,
+	0x38CC2A06L, 0xCAA7A905L, 0xD9F75AF1L, 0x2B9CD9F2L,
+	0xFF56BD19L, 0x0D3D3E1AL, 0x1E6DCDEEL, 0xEC064EEDL,
+	0xC38D26C4L, 0x31E6A5C7L, 0x22B65633L, 0xD0DDD530L,
+	0x0417B1DBL, 0xF67C32D8L, 0xE52CC12CL, 0x1747422FL,
+	0x49547E0BL, 0xBB3FFD08L, 0xA86F0EFCL, 0x5A048DFFL,
+	0x8ECEE914L, 0x7CA56A17L, 0x6FF599E3L, 0x9D9E1AE0L,
+	0xD3D3E1ABL, 0x21B862A8L, 0x32E8915CL, 0xC083125FL,
+	0x144976B4L, 0xE622F5B7L, 0xF5720643L, 0x07198540L,
+	0x590AB964L, 0xAB613A67L, 0xB831C993L, 0x4A5A4A90L,
+	0x9E902E7BL, 0x6CFBAD78L, 0x7FAB5E8CL, 0x8DC0DD8FL,
+	0xE330A81AL, 0x115B2B19L, 0x020BD8EDL, 0xF0605BEEL,
+	0x24AA3F05L, 0xD6C1BC06L, 0xC5914FF2L, 0x37FACCF1L,
+	0x69E9F0D5L, 0x9B8273D6L, 0x88D28022L, 0x7AB90321L,
+	0xAE7367CAL, 0x5C18E4C9L, 0x4F48173DL, 0xBD23943EL,
+	0xF36E6F75L, 0x0105EC76L, 0x12551F82L, 0xE03E9C81L,
+	0x34F4F86AL, 0xC69F7B69L, 0xD5CF889DL, 0x27A40B9EL,
+	0x79B737BAL, 0x8BDCB4B9L, 0x988C474DL, 0x6AE7C44EL,
+	0xBE2DA0A5L, 0x4C4623A6L, 0x5F16D052L, 0xAD7D5351L
+};
+
+#define CRC32C_UPD(crc, byte) \
+	(crc = crc32c_table[((crc) ^ (byte)) & 0xFFL] ^ ((crc) >> 8))
+
+static inline uint32_t
+crc32c_1word(uint32_t data, uint32_t init_val)
+{
+	union {
+		uint32_t u32;
+		uint8_t u8[4];
+	} d;
+	d.u32 = data;
+	CRC32C_UPD(init_val, d.u8[0]);
+	CRC32C_UPD(init_val, d.u8[1]);
+	CRC32C_UPD(init_val, d.u8[2]);
+	CRC32C_UPD(init_val, d.u8[3]);
+	return init_val;
+}
+
+static inline uint32_t
+crc32c_2words(uint64_t data, uint32_t init_val)
+{
+	union {
+		uint64_t u64;
+		uint8_t u8[8];
+	} d;
+	d.u64 = data;
+	CRC32C_UPD(init_val, d.u8[0]);
+	CRC32C_UPD(init_val, d.u8[1]);
+	CRC32C_UPD(init_val, d.u8[2]);
+	CRC32C_UPD(init_val, d.u8[3]);
+	CRC32C_UPD(init_val, d.u8[4]);
+	CRC32C_UPD(init_val, d.u8[5]);
+	CRC32C_UPD(init_val, d.u8[6]);
+	CRC32C_UPD(init_val, d.u8[7]);
+	return init_val;
+}
+
 /**
  * Use single crc32 instruction to perform a hash on a 4 byte value.
  *
-- 
1.7.9.5
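
As a cross-check on the lookup table in this patch, the 256 entries can be regenerated from the reflected CRC32C polynomial 0x82F63B78 (which appears in the table as entry 128). This is a hedged sketch; crc32c_table_sketch, crc32c_table_init and crc32c_upd are hypothetical names, and the asserts compare only a few spot entries against the values listed in the patch.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t crc32c_table_sketch[256];

/* Fill the table: entry i is the CRC32C state after feeding byte i into 0. */
static void
crc32c_table_init(void)
{
	uint32_t i, crc;
	int bit;

	for (i = 0; i < 256; i++) {
		crc = i;
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
		crc32c_table_sketch[i] = crc;
	}
	/* Spot-check against the second and last entries of the patch's table. */
	assert(crc32c_table_sketch[1] == 0xF26B8303u);
	assert(crc32c_table_sketch[255] == 0xAD7D5351u);
}

/* Same update step as the CRC32C_UPD macro, written as a function. */
static inline uint32_t
crc32c_upd(uint32_t crc, uint8_t byte)
{
	return crc32c_table_sketch[(crc ^ byte) & 0xFFu] ^ (crc >> 8);
}
```

Call crc32c_table_init() once before the first crc32c_upd(); a static const table, as in the patch, avoids that initialization step.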


* [dpdk-dev] [PATCH v2 2/4] hash: add new rte_hash_crc_8byte call
  2014-09-03  6:05 [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call Yerden Zhumabekov
                   ` (4 preceding siblings ...)
  2014-11-16 17:59 ` [dpdk-dev] [PATCH v2 1/4] hash: add software CRC32 implementation Yerden Zhumabekov
@ 2014-11-16 17:59 ` Yerden Zhumabekov
  2014-11-16 17:59 ` [dpdk-dev] [PATCH v2 3/4] hash: add fallback to software CRC32 implementation Yerden Zhumabekov
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-16 17:59 UTC (permalink / raw)
  To: dev

SSE4.2 provides _mm_crc32_u64 intrinsic with 8-byte operand.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 3c368c5..74e2d92 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -169,6 +169,22 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 }
 
 /**
+ * Use single crc32 instruction to perform a hash on a 8 byte value.
+ *
+ * @param data
+ *   Data to perform hash on.
+ * @param init_val
+ *   Value to initialise hash generator.
+ * @return
+ *   32bit calculated hash value.
+ */
+static inline uint32_t
+rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
+{
+	return _mm_crc32_u64(init_val, data);
+}
+
+/**
  * Use crc32 instruction to perform a hash.
  *
  * @param data
-- 
1.7.9.5


* [dpdk-dev] [PATCH v2 3/4] hash: add fallback to software CRC32 implementation
  2014-09-03  6:05 [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call Yerden Zhumabekov
                   ` (5 preceding siblings ...)
  2014-11-16 17:59 ` [dpdk-dev] [PATCH v2 2/4] hash: add new rte_hash_crc_8byte call Yerden Zhumabekov
@ 2014-11-16 17:59 ` Yerden Zhumabekov
  2014-11-17 12:34   ` Ananyev, Konstantin
  2014-11-16 17:59 ` [dpdk-dev] [PATCH v2 4/4] hash: rte_hash_crc() slices data into 8-byte pieces Yerden Zhumabekov
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-16 17:59 UTC (permalink / raw)
  To: dev

Initially, SSE4.2 support is detected via CPUID instruction.

Added rte_hash_crc_set_alg() function to detect and set CRC32
implementation if necessary. SSE4.2 is allowed by default. If it's
not available, fall back to sw implementation.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |   60 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 58 insertions(+), 2 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 74e2d92..178b162 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -45,7 +45,11 @@ extern "C" {
 #endif
 
 #include <stdint.h>
+#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 #include <nmmintrin.h>
+#endif
+#include <rte_cpuflags.h>
+#include <rte_branch_prediction.h>
 
 /* Lookup table for software implementation of CRC32C */
 static const uint32_t crc32c_table[256] = {
@@ -152,8 +156,42 @@ crc32c_2words(uint64_t data, uint32_t init_val)
 	return init_val;
 }
 
+enum crc32_alg_t {
+	CRC32_SW = 0,
+	CRC32_SSE42,
+	CRC32_AUTODETECT
+};
+
+/* Default algorithm is left for autodetection,
+ * it is detected on first run of hash function
+ */
+static enum crc32_alg_t crc32_alg = CRC32_AUTODETECT;
+
+/**
+ * Allow or disallow use of SSE4.2 intrinsics for CRC32
+ * hash calculation.
+ *
+ * @param alg
+ *   Algorithm to use, one of:
+ *   - (CRC32_SW) Don't use SSE4.2 intrinsics
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available, set by default
+ */
+static inline void
+rte_hash_crc_set_alg(enum crc32_alg_t alg)
+{
+	int sse42_supp = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2);
+	enum crc32_alg_t alg_supp = sse42_supp ? CRC32_SSE42 : CRC32_SW;
+
+	if (alg == CRC32_SSE42)
+		crc32_alg = alg_supp;
+	else
+		crc32_alg = CRC32_SW;
+}
+
 /**
  * Use single crc32 instruction to perform a hash on a 4 byte value.
+ * Fall back to software crc32 implementation in case SSE4.2 is
+ * not supported
  *
  * @param data
  *   Data to perform hash on.
@@ -165,11 +203,21 @@ crc32c_2words(uint64_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-	return _mm_crc32_u32(init_val, data);
+	if (unlikely(crc32_alg == CRC32_AUTODETECT))
+		rte_hash_crc_set_alg(CRC32_SSE42);
+
+#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+	if (likely(crc32_alg == CRC32_SSE42))
+		return _mm_crc32_u32(init_val, data);
+#endif
+
+	return crc32c_1word(data, init_val);
 }
 
 /**
  * Use single crc32 instruction to perform a hash on a 8 byte value.
+ * Fall back to software crc32 implementation in case SSE4.2 is
+ * not supported
  *
  * @param data
  *   Data to perform hash on.
@@ -181,7 +229,15 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
-	return _mm_crc32_u64(init_val, data);
+	if (unlikely(crc32_alg == CRC32_AUTODETECT))
+		rte_hash_crc_set_alg(CRC32_SSE42);
+
+#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+	if (likely(crc32_alg == CRC32_SSE42))
+		return _mm_crc32_u64(init_val, data);
+#endif
+
+	return crc32c_2words(data, init_val);
 }
 
 /**
-- 
1.7.9.5


* [dpdk-dev] [PATCH v2 4/4] hash: rte_hash_crc() slices data into 8-byte pieces
  2014-09-03  6:05 [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call Yerden Zhumabekov
                   ` (6 preceding siblings ...)
  2014-11-16 17:59 ` [dpdk-dev] [PATCH v2 3/4] hash: add fallback to software CRC32 implementation Yerden Zhumabekov
@ 2014-11-16 17:59 ` Yerden Zhumabekov
  2014-11-18 14:03 ` [dpdk-dev] [PATCH v4 0/5] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-16 17:59 UTC (permalink / raw)
  To: dev

Calculating the hash for data of variable length is more efficient
when that data is sliced into 8-byte pieces. The remaining bytes
are hashed using the CRC32 functions with either 8- or 4-byte operands.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |   33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 178b162..3d8dafe 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -241,7 +241,7 @@ rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 }
 
 /**
- * Use crc32 instruction to perform a hash.
+ * Calculate CRC32 hash on user-supplied byte array.
  *
  * @param data
  *   Data to perform hash on.
@@ -256,23 +256,38 @@ static inline uint32_t
 rte_hash_crc(const void *data, uint32_t data_len, uint32_t init_val)
 {
 	unsigned i;
-	uint32_t temp = 0;
-	const uint32_t *p32 = (const uint32_t *)data;
+	uint64_t temp = 0;
+	const uint64_t *p64 = (const uint64_t *)data;
 
-	for (i = 0; i < data_len / 4; i++) {
-		init_val = rte_hash_crc_4byte(*p32++, init_val);
+	for (i = 0; i < data_len / 8; i++) {
+		init_val = rte_hash_crc_8byte(*p64++, init_val);
 	}
 
-	switch (3 - (data_len & 0x03)) {
+	switch (7 - (data_len & 0x07)) {
 	case 0:
-		temp |= *((const uint8_t *)p32 + 2) << 16;
+		temp |= (uint64_t) *((const uint8_t *)p64 + 6) << 48;
 		/* Fallthrough */
 	case 1:
-		temp |= *((const uint8_t *)p32 + 1) << 8;
+		temp |= (uint64_t) *((const uint8_t *)p64 + 5) << 40;
 		/* Fallthrough */
 	case 2:
-		temp |= *((const uint8_t *)p32);
+		temp |= (uint64_t) *((const uint8_t *)p64 + 4) << 32;
+		temp |= *((const uint32_t *)p64);
+		init_val = rte_hash_crc_8byte(temp, init_val);
+		break;
+	case 3:
+		init_val = rte_hash_crc_4byte(*(const uint32_t *)p64, init_val);
+		break;
+	case 4:
+		temp |= *((const uint8_t *)p64 + 2) << 16;
+		/* Fallthrough */
+	case 5:
+		temp |= *((const uint8_t *)p64 + 1) << 8;
+		/* Fallthrough */
+	case 6:
+		temp |= *((const uint8_t *)p64);
 		init_val = rte_hash_crc_4byte(temp, init_val);
+		/* Fallthrough */
 	default:
 		break;
 	}
-- 
1.7.9.5
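
The slicing in this patch can be illustrated with a self-contained sketch. It is written under assumptions and is not the patch's code: it targets a little-endian host, uses memcpy() instead of pointer casts to sidestep alignment and aliasing concerns, and uses a bit-serial software CRC32C in place of the SSE4.2 intrinsics. It also differs in the tail: the patch zero-pads the leftover bytes into a 4- or 8-byte operand, whereas this sketch feeds them one byte at a time, so results for lengths that are not a multiple of 4 will not match rte_hash_crc(). The names crc32c_bits and crc32c_sliced are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold the low nbits of data into crc, least significant bit first. */
static inline uint32_t
crc32c_bits(uint32_t crc, uint64_t data, int nbits)
{
	int b;

	for (b = 0; b < nbits; b++) {
		uint32_t feedback = (crc ^ (uint32_t)(data >> b)) & 1u;

		crc = (crc >> 1) ^ (0x82F63B78u & (0u - feedback));
	}
	return crc;
}

/* Consume 8-byte words first, then one 4-byte word, then single bytes. */
static uint32_t
crc32c_sliced(const void *data, size_t len, uint32_t init_val)
{
	const uint8_t *p = data;
	uint64_t w64;
	uint32_t w32;

	assert(p != NULL || len == 0);
	for (; len >= 8; len -= 8, p += 8) {
		memcpy(&w64, p, 8);
		init_val = crc32c_bits(init_val, w64, 64);
	}
	if (len >= 4) {
		memcpy(&w32, p, 4);
		init_val = crc32c_bits(init_val, w32, 32);
		p += 4;
		len -= 4;
	}
	for (; len > 0; len--, p++)
		init_val = crc32c_bits(init_val, *p, 8);
	return init_val;
}
```

With hardware support, each crc32c_bits() call with 64, 32 or 8 bits would map to a single crc32 instruction of the matching operand width, which is where the speedup from wider slices comes from.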


* Re: [dpdk-dev] [PATCH v2 0/4] rte_hash_crc reworked to be platform-independent
  2014-11-16 17:59 ` [dpdk-dev] [PATCH v2 0/4] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
@ 2014-11-17 11:31   ` Neil Horman
  2014-11-17 11:54     ` Yerden Zhumabekov
  2014-11-18  3:21   ` [dpdk-dev] [PATCH v3 0/5] " Yerden Zhumabekov
  1 sibling, 1 reply; 98+ messages in thread
From: Neil Horman @ 2014-11-17 11:31 UTC (permalink / raw)
  To: Yerden Zhumabekov; +Cc: dev

On Sun, Nov 16, 2014 at 11:59:16PM +0600, Yerden Zhumabekov wrote:
> This is a rework of my previous patches improving performance of rte_hash_crc. In addition, this revision brings a fallback mechanism to ensure that CRC32 hash is calculated regardless of hardware support from CPU (i.e. SSE4.2 intrinsics).
> 
> Summary of changes:
> * added CRC32 software implementation, which is used as a fallback in case SSE4.2 is not available, or if SSE4.2 is intentionally disabled.
> * added rte_hash_crc_set_alg() function to control availability of SSE4.2.
> * added rte_hash_crc_8byte() function to calculate CRC32 on 8-byte operand.
> * reworked rte_hash_crc() function which leverages both versions of CRC32 hash calculation functions with 4 and 8-byte operands.
> 
> Patches were tested on machines both with and without SSE4.2 support. The software implementation seems to be about 15 times slower than the SSE4.2-enabled one. Of course, both return identical results.
> 
> Yerden Zhumabekov (4):
>   hash: add software CRC32 implementation
>   hash: add new rte_hash_crc_8byte call
>   hash: add fallback to software CRC32 implementation
>   hash: rte_hash_crc() slices data into 8-byte pieces
> 
>  lib/librte_hash/rte_hash_crc.h |  212 ++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 202 insertions(+), 10 deletions(-)
> 
> -- 
> 1.7.9.5
> 
> 
Functionally this all looks great, but I think you want to add a 5th patch to
the series in which you remove the ifdef SSE4.2 bits from test_hash_perf, since
this makes rte_hash_crc usable in all cases.  Not sure if you would rather just
ditch rte_hash_jhash altogether, or make testing it a command line runtime
option.

Neil


* Re: [dpdk-dev] [PATCH v2 0/4] rte_hash_crc reworked to be platform-independent
  2014-11-17 11:31   ` Neil Horman
@ 2014-11-17 11:54     ` Yerden Zhumabekov
  2014-11-25 17:05       ` Stephen Hemminger
  0 siblings, 1 reply; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-17 11:54 UTC (permalink / raw)
  To: Neil Horman, dev, e_zhumabekov


17.11.2014 17:31, Neil Horman wrote:
> On Sun, Nov 16, 2014 at 11:59:16PM +0600, Yerden Zhumabekov wrote:
>> This is a rework of my previous patches improving performance of rte_hash_crc. In addition, this revision brings a fallback mechanism to ensure that CRC32 hash is calculated regardless of hardware support from CPU (i.e. SSE4.2 intrinsics).
>>
>> Summary of changes:
>> * added CRC32 software implementation, which is used as a fallback in case SSE4.2 is not available, or if SSE4.2 is intentionally disabled.
>> * added rte_hash_crc_set_alg() function to control availability of SSE4.2.
>> * added rte_hash_crc_8byte() function to calculate CRC32 on 8-byte operand.
>> * reworked rte_hash_crc() function which leverages both versions of CRC32 hash calculation functions with 4 and 8-byte operands.
>>
>> Patches were tested on machines both with and without SSE4.2 support. The software implementation seems to be about 15 times slower than the SSE4.2-enabled one. Of course, both return identical results.
>>
>> Yerden Zhumabekov (4):
>>   hash: add software CRC32 implementation
>>   hash: add new rte_hash_crc_8byte call
>>   hash: add fallback to software CRC32 implementation
>>   hash: rte_hash_crc() slices data into 8-byte pieces
>>
>>  lib/librte_hash/rte_hash_crc.h |  212 ++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 202 insertions(+), 10 deletions(-)
>>
>> -- 
>> 1.7.9.5
>>
>>
> Functionally this all looks great, but I think you want to add a 5th patch to
> the series in which you remove the ifdef SSE4.2 bits from test_hash_perf, since
> this makes rte_hash_crc usable in all cases.  Not sure if you would rather just
> ditch rte_hash_jhash altogether, or make testing it a command line runtime
> option.

Meanwhile, I've borrowed some of Intel's code (BSD licensed) for the CRC32 sw
algorithm; it runs 4 times faster, at the cost of memory (2K) for additional
lookup tables. I'd like to include it as well. As for test_hash_perf,
I'll look at it.
Should I just send the new series as 'v3'? Any approval/disapproval of
the current series?

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ


* Re: [dpdk-dev] [PATCH v2 3/4] hash: add fallback to software CRC32 implementation
  2014-11-16 17:59 ` [dpdk-dev] [PATCH v2 3/4] hash: add fallback to software CRC32 implementation Yerden Zhumabekov
@ 2014-11-17 12:34   ` Ananyev, Konstantin
  2014-11-17 12:41     ` Yerden Zhumabekov
  2014-11-17 14:06     ` Neil Horman
  0 siblings, 2 replies; 98+ messages in thread
From: Ananyev, Konstantin @ 2014-11-17 12:34 UTC (permalink / raw)
  To: Yerden Zhumabekov, dev


Hi Yerden,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yerden Zhumabekov
> Sent: Sunday, November 16, 2014 5:59 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v2 3/4] hash: add fallback to software CRC32 implementation
> 
> Initially, SSE4.2 support is detected via CPUID instruction.
> 
> Added rte_hash_crc_set_alg() function to detect and set CRC32
> implementation if necessary. SSE4.2 is allowed by default. If it's
> not available, fall back to sw implementation.
> 
> Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
> ---
>  lib/librte_hash/rte_hash_crc.h |   60 ++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 58 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
> index 74e2d92..178b162 100644
> --- a/lib/librte_hash/rte_hash_crc.h
> +++ b/lib/librte_hash/rte_hash_crc.h
> @@ -45,7 +45,11 @@ extern "C" {
>  #endif
> 
>  #include <stdint.h>
> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
>  #include <nmmintrin.h>
> +#endif
> +#include <rte_cpuflags.h>
> +#include <rte_branch_prediction.h>
> 
>  /* Lookup table for software implementation of CRC32C */
>  static const uint32_t crc32c_table[256] = {
> @@ -152,8 +156,42 @@ crc32c_2words(uint64_t data, uint32_t init_val)
>  	return init_val;
>  }
> 
> +enum crc32_alg_t {
> +	CRC32_SW = 0,
> +	CRC32_SSE42,
> +	CRC32_AUTODETECT
> +};
> +
> +/* Default algorithm is left for autodetection,
> + * it is detected on first run of hash function
> + */
> +static enum crc32_alg_t crc32_alg = CRC32_AUTODETECT;
> +
> +/**
> + * Allow or disallow use of SSE4.2 intrinsics for CRC32
> + * hash calculation.
> + *
> + * @param alg
> + *   Algorithm to use, one of:
> + *   - (CRC32_SW) Don't use SSE4.2 intrinsics
> + *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available, set by default
> + */
> +static inline void
> +rte_hash_crc_set_alg(enum crc32_alg_t alg)
> +{
> +	int sse42_supp = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2);
> +	enum crc32_alg_t alg_supp = sse42_supp ? CRC32_SSE42 : CRC32_SW;
> +
> +	if (alg == CRC32_SSE42)
> +		crc32_alg = alg_supp;
> +	else
> +		crc32_alg = CRC32_SW;
> +}
> +

I wonder whether we can define that function with __attribute__((constructor)).
Then, I suppose, we can remove CRC32_AUTODETECT, and remove:
if (unlikely(crc32_alg == CRC32_AUTODETECT))
   rte_hash_crc_set_alg(CRC32_SSE42);
from rte_hash_crc_*byte().

Konstantin
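
A standalone sketch of the constructor approach, under assumptions: it relies on the GCC/Clang constructor attribute and uses __builtin_cpu_supports() where the patch uses rte_cpu_get_flag_enabled(). The names crc32_alg_sketch, selected_alg, set_alg_sketch and init_alg_sketch are hypothetical and not part of the patch.

```c
#include <assert.h>

enum crc32_alg_sketch { ALG_UNSET = 0, ALG_SW, ALG_SSE42 };

static enum crc32_alg_sketch selected_alg = ALG_UNSET;

/* Select the requested algorithm, downgrading to software if unsupported. */
static void
set_alg_sketch(enum crc32_alg_sketch alg)
{
	int sse42_supp = 0;

	assert(alg == ALG_SW || alg == ALG_SSE42);
#if defined(__x86_64__) || defined(__i386__)
	/* Needed when the check may run before other constructors. */
	__builtin_cpu_init();
	sse42_supp = __builtin_cpu_supports("sse4.2");
#endif
	if (alg == ALG_SSE42 && sse42_supp)
		selected_alg = ALG_SSE42;
	else
		selected_alg = ALG_SW;
}

/* Runs before main(), so the hash functions never see ALG_UNSET. */
__attribute__((constructor))
static void
init_alg_sketch(void)
{
	set_alg_sketch(ALG_SSE42);
}
```

Because the selection happens before main(), the per-call autodetect branch can be dropped from the hot path; an application can still call set_alg_sketch(ALG_SW) later to force the software path.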

>  /**
>   * Use single crc32 instruction to perform a hash on a 4 byte value.
> + * Fall back to software crc32 implementation in case SSE4.2 is
> + * not supported
>   *
>   * @param data
>   *   Data to perform hash on.
> @@ -165,11 +203,21 @@ crc32c_2words(uint64_t data, uint32_t init_val)
>  static inline uint32_t
>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
>  {
> -	return _mm_crc32_u32(init_val, data);
> +	if (unlikely(crc32_alg == CRC32_AUTODETECT))
> +		rte_hash_crc_set_alg(CRC32_SSE42);
> +
> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> +	if (likely(crc32_alg == CRC32_SSE42))
> +		return _mm_crc32_u32(init_val, data);
> +#endif
> +
> +	return crc32c_1word(data, init_val);
>  }
> 
>  /**
>   * Use single crc32 instruction to perform a hash on a 8 byte value.
> + * Fall back to software crc32 implementation in case SSE4.2 is
> + * not supported
>   *
>   * @param data
>   *   Data to perform hash on.
> @@ -181,7 +229,15 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
>  static inline uint32_t
>  rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
>  {
> -	return _mm_crc32_u64(init_val, data);
> +	if (unlikely(crc32_alg == CRC32_AUTODETECT))
> +		rte_hash_crc_set_alg(CRC32_SSE42);
> +
> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> +	if (likely(crc32_alg == CRC32_SSE42))
> +		return _mm_crc32_u64(init_val, data);
> +#endif
> +
> +	return crc32c_2words(data, init_val);
>  }
> 
>  /**
> --
> 1.7.9.5


* Re: [dpdk-dev] [PATCH v2 3/4] hash: add fallback to software CRC32 implementation
  2014-11-17 12:34   ` Ananyev, Konstantin
@ 2014-11-17 12:41     ` Yerden Zhumabekov
  2014-11-17 14:06     ` Neil Horman
  1 sibling, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-17 12:41 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev


17.11.2014 18:34, Ananyev, Konstantin wrote:
> Hi Yerden,
>
>> +static inline void
>> +rte_hash_crc_set_alg(enum crc32_alg_t alg)
>> +{
>> +	int sse42_supp = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2);
>> +	enum crc32_alg_t alg_supp = sse42_supp ? CRC32_SSE42 : CRC32_SW;
>> +
>> +	if (alg == CRC32_SSE42)
>> +		crc32_alg = alg_supp;
>> +	else
>> +		crc32_alg = CRC32_SW;
>> +}
>> +
> Wonder can we define that function with __attribute__((constructor))?
> Then, I suppose we can remove CRC32_AUTODETECT, and remove:
> if (unlikely(crc32_alg == CRC32_AUTODETECT))
>    rte_hash_crc_set_alg(CRC32_SSE42);   
> from rte_hash_crc_*byte().
Nice feature, one I was unfamiliar with :)
Since I'm going to revise the patch series anyway, I'll apply it and
test. Thank you.

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ


* Re: [dpdk-dev] [PATCH v2 3/4] hash: add fallback to software CRC32 implementation
  2014-11-17 12:34   ` Ananyev, Konstantin
  2014-11-17 12:41     ` Yerden Zhumabekov
@ 2014-11-17 14:06     ` Neil Horman
  1 sibling, 0 replies; 98+ messages in thread
From: Neil Horman @ 2014-11-17 14:06 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

On Mon, Nov 17, 2014 at 12:34:04PM +0000, Ananyev, Konstantin wrote:
> 
> Hi Yerden,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yerden Zhumabekov
> > Sent: Sunday, November 16, 2014 5:59 PM
> > To: dev@dpdk.org
> > Subject: [dpdk-dev] [PATCH v2 3/4] hash: add fallback to software CRC32 implementation
> > 
> > Initially, SSE4.2 support is detected via CPUID instruction.
> > 
> > Added rte_hash_crc_set_alg() function to detect and set CRC32
> > implementation if necessary. SSE4.2 is allowed by default. If it's
> > not available, fall back to sw implementation.
> > 
> > Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
> > ---
> >  lib/librte_hash/rte_hash_crc.h |   60 ++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 58 insertions(+), 2 deletions(-)
> > 
> > diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
> > index 74e2d92..178b162 100644
> > --- a/lib/librte_hash/rte_hash_crc.h
> > +++ b/lib/librte_hash/rte_hash_crc.h
> > @@ -45,7 +45,11 @@ extern "C" {
> >  #endif
> > 
> >  #include <stdint.h>
> > +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> >  #include <nmmintrin.h>
> > +#endif
> > +#include <rte_cpuflags.h>
> > +#include <rte_branch_prediction.h>
> > 
> >  /* Lookup table for software implementation of CRC32C */
> >  static const uint32_t crc32c_table[256] = {
> > @@ -152,8 +156,42 @@ crc32c_2words(uint64_t data, uint32_t init_val)
> >  	return init_val;
> >  }
> > 
> > +enum crc32_alg_t {
> > +	CRC32_SW = 0,
> > +	CRC32_SSE42,
> > +	CRC32_AUTODETECT
> > +};
> > +
> > +/* Default algorithm is left for autodetection,
> > + * it is detected on first run of hash function
> > + */
> > +static enum crc32_alg_t crc32_alg = CRC32_AUTODETECT;
> > +
> > +/**
> > + * Allow or disallow use of SSE4.2 intrinsics for CRC32
> > + * hash calculation.
> > + *
> > + * @param alg
> > + *   Algorithm to use, one of:
> > + *   - (CRC32_SW) Don't use SSE4.2 intrinsics
> > + *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available, set by default
> > + */
> > +static inline void
> > +rte_hash_crc_set_alg(enum crc32_alg_t alg)
> > +{
> > +	int sse42_supp = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2);
> > +	enum crc32_alg_t alg_supp = sse42_supp ? CRC32_SSE42 : CRC32_SW;
> > +
> > +	if (alg == CRC32_SSE42)
> > +		crc32_alg = alg_supp;
> > +	else
> > +		crc32_alg = CRC32_SW;
> > +}
> > +
> 
> I wonder whether we can define that function with __attribute__((constructor)).
> Then, I suppose, we can remove CRC32_AUTODETECT, and remove:
> if (unlikely(crc32_alg == CRC32_AUTODETECT))
>    rte_hash_crc_set_alg(CRC32_SSE42);   
> from rte_hash_crc_*byte().
> 
That would make sense.
Neil
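
A minimal sketch of the constructor approach suggested above, assuming GCC or Clang. The names crc32_select_alg(), crc32_init_alg() and cpu_has_sse42() are hypothetical, and __builtin_cpu_supports() stands in for rte_cpu_get_flag_enabled() so the example builds without DPDK headers. The selection runs once before main(), so the hash functions would need no CRC32_AUTODETECT check on the hot path.

```c
enum crc32_alg { CRC32_SW = 0, CRC32_SSE42 };

/* Safe default until the constructor has run */
static enum crc32_alg crc32_alg = CRC32_SW;

/* Hypothetical helper: run-time SSE4.2 check (GCC/Clang builtins on x86) */
static int
cpu_has_sse42(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
	/* required before __builtin_cpu_supports() when called from a constructor */
	__builtin_cpu_init();
	return __builtin_cpu_supports("sse4.2") ? 1 : 0;
#else
	return 0;
#endif
}

/* Honour a request for SSE4.2 only if the CPU really supports it */
static void
crc32_select_alg(enum crc32_alg alg)
{
	if (alg == CRC32_SSE42 && cpu_has_sse42())
		crc32_alg = CRC32_SSE42;
	else
		crc32_alg = CRC32_SW;
}

/* Runs before main(); picks the best available implementation */
__attribute__((constructor))
static void
crc32_init_alg(void)
{
	crc32_select_alg(CRC32_SSE42);
}
```

Since the variable and the constructor are static and would live in a header, every translation unit including it gets its own copy. That is harmless for autodetection, but an explicit crc32_select_alg(CRC32_SW) call would only affect the calling unit.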

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [dpdk-dev] [PATCH v3 0/5] rte_hash_crc reworked to be platform-independent
  2014-11-16 17:59 ` [dpdk-dev] [PATCH v2 0/4] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
  2014-11-17 11:31   ` Neil Horman
@ 2014-11-18  3:21   ` Yerden Zhumabekov
  2014-11-18  3:21     ` [dpdk-dev] [PATCH v3 1/5] hash: add software CRC32 implementation Yerden Zhumabekov
                       ` (4 more replies)
  1 sibling, 5 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-18  3:21 UTC (permalink / raw)
  To: dev

This is a rework of my previous patches improving the performance of rte_hash_crc. In addition, this revision adds a fallback mechanism so that the CRC32 hash is calculated regardless of hardware support in the CPU (i.e. SSE4.2 intrinsics). The performance of the software CRC32 implementation is also improved.

Summary of changes:
* added a software CRC32 implementation, which is used as a fallback when SSE4.2 is not available or is intentionally disabled.
* added rte_hash_crc_set_alg() function to control the use of SSE4.2.
* added rte_hash_crc_8byte() function to calculate CRC32 on an 8-byte operand.
* reworked rte_hash_crc() function to use both the 4-byte and 8-byte versions of the CRC32 hash calculation functions.
* removed compile-time checks from test_hash_perf and test_hash.
* the default algorithm implementation is now set in a constructor at application startup.

Patches were tested on machines both with and without SSE4.2 support. The software implementation seems to be about 4-5 times slower than the SSE4.2-enabled one. Of course, both return identical results.
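
To illustrate the reworked rte_hash_crc() flow described in the summary, here is a rough sketch of slicing a buffer into 8-byte pieces, then one 4-byte piece, then a short tail. All names (crc_step4, crc_step8, hash_crc_sliced) are hypothetical, the per-word steps are plain bitwise CRC32C stand-ins rather than the intrinsics or table lookups, and zero-padding the 1..3 trailing bytes into one word is an assumption, not something taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CRC32C_POLY 0x82F63B78u /* reflected CRC32C (Castagnoli) polynomial */

/* Bitwise stand-in for a 4-byte CRC32C step; the low-order byte goes first */
static uint32_t
crc_step4(uint32_t data, uint32_t crc)
{
	int i;

	crc ^= data;
	for (i = 0; i < 32; i++)
		crc = (crc >> 1) ^ (CRC32C_POLY & (0u - (crc & 1u)));
	return crc;
}

/* 8-byte step expressed as two chained 4-byte steps, low half first */
static uint32_t
crc_step8(uint64_t data, uint32_t crc)
{
	crc = crc_step4((uint32_t)data, crc);
	return crc_step4((uint32_t)(data >> 32), crc);
}

static uint32_t
hash_crc_sliced(const void *buf, size_t len, uint32_t init_val)
{
	const uint8_t *p = buf;
	uint32_t crc = init_val;
	uint64_t w8;
	uint32_t w4;

	/* bulk of the data: one step per 8 bytes */
	while (len >= 8) {
		memcpy(&w8, p, 8); /* memcpy avoids unaligned loads */
		crc = crc_step8(w8, crc);
		p += 8;
		len -= 8;
	}
	/* at most one 4-byte piece remains */
	if (len >= 4) {
		memcpy(&w4, p, 4);
		crc = crc_step4(w4, crc);
		p += 4;
		len -= 4;
	}
	/* assumption: zero-pad the 1..3 trailing bytes into one word */
	if (len > 0) {
		w4 = 0;
		memcpy(&w4, p, len);
		crc = crc_step4(w4, crc);
	}
	return crc;
}
```

A 24-byte key takes three 8-byte steps here instead of six 4-byte ones, which is where the speedup for larger chunk sizes would come from.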

Yerden Zhumabekov (5):
  hash: add software CRC32 implementation
  hash: add new rte_hash_crc_8byte call
  hash: add fallback to software CRC32 implementation
  hash: rte_hash_crc() slices data into 8-byte pieces
  test: remove redundant compile checks

 app/test/test_hash.c           |    7 -
 app/test/test_hash_perf.c      |   11 --
 lib/librte_hash/rte_hash_crc.h |  427 +++++++++++++++++++++++++++++++++++++++-
 3 files changed, 417 insertions(+), 28 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [dpdk-dev] [PATCH v3 1/5] hash: add software CRC32 implementation
  2014-11-18  3:21   ` [dpdk-dev] [PATCH v3 0/5] " Yerden Zhumabekov
@ 2014-11-18  3:21     ` Yerden Zhumabekov
  2014-11-25 17:34       ` Stephen Hemminger
  2014-11-18  3:21     ` [dpdk-dev] [PATCH v3 2/5] hash: add new rte_hash_crc_8byte call Yerden Zhumabekov
                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-18  3:21 UTC (permalink / raw)
  To: dev

Add lookup tables for the CRC32 algorithm, and crc32c_1word() and
crc32c_2words() functions returning the hash of a 32-bit and a
64-bit operand, respectively.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |  316 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 316 insertions(+)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index b48b0db..4d7532a 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -47,6 +47,322 @@ extern "C" {
 #include <stdint.h>
 #include <nmmintrin.h>
 
+/* Lookup tables for software implementation of CRC32C */
+static uint32_t crc32c_tables[8][256] = {{
+ 0x00000000, 0xF26B8303, 0xE13B70F7, 0x1350F3F4, 0xC79A971F, 0x35F1141C, 0x26A1E7E8, 0xD4CA64EB,
+ 0x8AD958CF, 0x78B2DBCC, 0x6BE22838, 0x9989AB3B, 0x4D43CFD0, 0xBF284CD3, 0xAC78BF27, 0x5E133C24,
+ 0x105EC76F, 0xE235446C, 0xF165B798, 0x030E349B, 0xD7C45070, 0x25AFD373, 0x36FF2087, 0xC494A384,
+ 0x9A879FA0, 0x68EC1CA3, 0x7BBCEF57, 0x89D76C54, 0x5D1D08BF, 0xAF768BBC, 0xBC267848, 0x4E4DFB4B,
+ 0x20BD8EDE, 0xD2D60DDD, 0xC186FE29, 0x33ED7D2A, 0xE72719C1, 0x154C9AC2, 0x061C6936, 0xF477EA35,
+ 0xAA64D611, 0x580F5512, 0x4B5FA6E6, 0xB93425E5, 0x6DFE410E, 0x9F95C20D, 0x8CC531F9, 0x7EAEB2FA,
+ 0x30E349B1, 0xC288CAB2, 0xD1D83946, 0x23B3BA45, 0xF779DEAE, 0x05125DAD, 0x1642AE59, 0xE4292D5A,
+ 0xBA3A117E, 0x4851927D, 0x5B016189, 0xA96AE28A, 0x7DA08661, 0x8FCB0562, 0x9C9BF696, 0x6EF07595,
+ 0x417B1DBC, 0xB3109EBF, 0xA0406D4B, 0x522BEE48, 0x86E18AA3, 0x748A09A0, 0x67DAFA54, 0x95B17957,
+ 0xCBA24573, 0x39C9C670, 0x2A993584, 0xD8F2B687, 0x0C38D26C, 0xFE53516F, 0xED03A29B, 0x1F682198,
+ 0x5125DAD3, 0xA34E59D0, 0xB01EAA24, 0x42752927, 0x96BF4DCC, 0x64D4CECF, 0x77843D3B, 0x85EFBE38,
+ 0xDBFC821C, 0x2997011F, 0x3AC7F2EB, 0xC8AC71E8, 0x1C661503, 0xEE0D9600, 0xFD5D65F4, 0x0F36E6F7,
+ 0x61C69362, 0x93AD1061, 0x80FDE395, 0x72966096, 0xA65C047D, 0x5437877E, 0x4767748A, 0xB50CF789,
+ 0xEB1FCBAD, 0x197448AE, 0x0A24BB5A, 0xF84F3859, 0x2C855CB2, 0xDEEEDFB1, 0xCDBE2C45, 0x3FD5AF46,
+ 0x7198540D, 0x83F3D70E, 0x90A324FA, 0x62C8A7F9, 0xB602C312, 0x44694011, 0x5739B3E5, 0xA55230E6,
+ 0xFB410CC2, 0x092A8FC1, 0x1A7A7C35, 0xE811FF36, 0x3CDB9BDD, 0xCEB018DE, 0xDDE0EB2A, 0x2F8B6829,
+ 0x82F63B78, 0x709DB87B, 0x63CD4B8F, 0x91A6C88C, 0x456CAC67, 0xB7072F64, 0xA457DC90, 0x563C5F93,
+ 0x082F63B7, 0xFA44E0B4, 0xE9141340, 0x1B7F9043, 0xCFB5F4A8, 0x3DDE77AB, 0x2E8E845F, 0xDCE5075C,
+ 0x92A8FC17, 0x60C37F14, 0x73938CE0, 0x81F80FE3, 0x55326B08, 0xA759E80B, 0xB4091BFF, 0x466298FC,
+ 0x1871A4D8, 0xEA1A27DB, 0xF94AD42F, 0x0B21572C, 0xDFEB33C7, 0x2D80B0C4, 0x3ED04330, 0xCCBBC033,
+ 0xA24BB5A6, 0x502036A5, 0x4370C551, 0xB11B4652, 0x65D122B9, 0x97BAA1BA, 0x84EA524E, 0x7681D14D,
+ 0x2892ED69, 0xDAF96E6A, 0xC9A99D9E, 0x3BC21E9D, 0xEF087A76, 0x1D63F975, 0x0E330A81, 0xFC588982,
+ 0xB21572C9, 0x407EF1CA, 0x532E023E, 0xA145813D, 0x758FE5D6, 0x87E466D5, 0x94B49521, 0x66DF1622,
+ 0x38CC2A06, 0xCAA7A905, 0xD9F75AF1, 0x2B9CD9F2, 0xFF56BD19, 0x0D3D3E1A, 0x1E6DCDEE, 0xEC064EED,
+ 0xC38D26C4, 0x31E6A5C7, 0x22B65633, 0xD0DDD530, 0x0417B1DB, 0xF67C32D8, 0xE52CC12C, 0x1747422F,
+ 0x49547E0B, 0xBB3FFD08, 0xA86F0EFC, 0x5A048DFF, 0x8ECEE914, 0x7CA56A17, 0x6FF599E3, 0x9D9E1AE0,
+ 0xD3D3E1AB, 0x21B862A8, 0x32E8915C, 0xC083125F, 0x144976B4, 0xE622F5B7, 0xF5720643, 0x07198540,
+ 0x590AB964, 0xAB613A67, 0xB831C993, 0x4A5A4A90, 0x9E902E7B, 0x6CFBAD78, 0x7FAB5E8C, 0x8DC0DD8F,
+ 0xE330A81A, 0x115B2B19, 0x020BD8ED, 0xF0605BEE, 0x24AA3F05, 0xD6C1BC06, 0xC5914FF2, 0x37FACCF1,
+ 0x69E9F0D5, 0x9B8273D6, 0x88D28022, 0x7AB90321, 0xAE7367CA, 0x5C18E4C9, 0x4F48173D, 0xBD23943E,
+ 0xF36E6F75, 0x0105EC76, 0x12551F82, 0xE03E9C81, 0x34F4F86A, 0xC69F7B69, 0xD5CF889D, 0x27A40B9E,
+ 0x79B737BA, 0x8BDCB4B9, 0x988C474D, 0x6AE7C44E, 0xBE2DA0A5, 0x4C4623A6, 0x5F16D052, 0xAD7D5351
+},
+{
+ 0x00000000, 0x13A29877, 0x274530EE, 0x34E7A899, 0x4E8A61DC, 0x5D28F9AB, 0x69CF5132, 0x7A6DC945,
+ 0x9D14C3B8, 0x8EB65BCF, 0xBA51F356, 0xA9F36B21, 0xD39EA264, 0xC03C3A13, 0xF4DB928A, 0xE7790AFD,
+ 0x3FC5F181, 0x2C6769F6, 0x1880C16F, 0x0B225918, 0x714F905D, 0x62ED082A, 0x560AA0B3, 0x45A838C4,
+ 0xA2D13239, 0xB173AA4E, 0x859402D7, 0x96369AA0, 0xEC5B53E5, 0xFFF9CB92, 0xCB1E630B, 0xD8BCFB7C,
+ 0x7F8BE302, 0x6C297B75, 0x58CED3EC, 0x4B6C4B9B, 0x310182DE, 0x22A31AA9, 0x1644B230, 0x05E62A47,
+ 0xE29F20BA, 0xF13DB8CD, 0xC5DA1054, 0xD6788823, 0xAC154166, 0xBFB7D911, 0x8B507188, 0x98F2E9FF,
+ 0x404E1283, 0x53EC8AF4, 0x670B226D, 0x74A9BA1A, 0x0EC4735F, 0x1D66EB28, 0x298143B1, 0x3A23DBC6,
+ 0xDD5AD13B, 0xCEF8494C, 0xFA1FE1D5, 0xE9BD79A2, 0x93D0B0E7, 0x80722890, 0xB4958009, 0xA737187E,
+ 0xFF17C604, 0xECB55E73, 0xD852F6EA, 0xCBF06E9D, 0xB19DA7D8, 0xA23F3FAF, 0x96D89736, 0x857A0F41,
+ 0x620305BC, 0x71A19DCB, 0x45463552, 0x56E4AD25, 0x2C896460, 0x3F2BFC17, 0x0BCC548E, 0x186ECCF9,
+ 0xC0D23785, 0xD370AFF2, 0xE797076B, 0xF4359F1C, 0x8E585659, 0x9DFACE2E, 0xA91D66B7, 0xBABFFEC0,
+ 0x5DC6F43D, 0x4E646C4A, 0x7A83C4D3, 0x69215CA4, 0x134C95E1, 0x00EE0D96, 0x3409A50F, 0x27AB3D78,
+ 0x809C2506, 0x933EBD71, 0xA7D915E8, 0xB47B8D9F, 0xCE1644DA, 0xDDB4DCAD, 0xE9537434, 0xFAF1EC43,
+ 0x1D88E6BE, 0x0E2A7EC9, 0x3ACDD650, 0x296F4E27, 0x53028762, 0x40A01F15, 0x7447B78C, 0x67E52FFB,
+ 0xBF59D487, 0xACFB4CF0, 0x981CE469, 0x8BBE7C1E, 0xF1D3B55B, 0xE2712D2C, 0xD69685B5, 0xC5341DC2,
+ 0x224D173F, 0x31EF8F48, 0x050827D1, 0x16AABFA6, 0x6CC776E3, 0x7F65EE94, 0x4B82460D, 0x5820DE7A,
+ 0xFBC3FAF9, 0xE861628E, 0xDC86CA17, 0xCF245260, 0xB5499B25, 0xA6EB0352, 0x920CABCB, 0x81AE33BC,
+ 0x66D73941, 0x7575A136, 0x419209AF, 0x523091D8, 0x285D589D, 0x3BFFC0EA, 0x0F186873, 0x1CBAF004,
+ 0xC4060B78, 0xD7A4930F, 0xE3433B96, 0xF0E1A3E1, 0x8A8C6AA4, 0x992EF2D3, 0xADC95A4A, 0xBE6BC23D,
+ 0x5912C8C0, 0x4AB050B7, 0x7E57F82E, 0x6DF56059, 0x1798A91C, 0x043A316B, 0x30DD99F2, 0x237F0185,
+ 0x844819FB, 0x97EA818C, 0xA30D2915, 0xB0AFB162, 0xCAC27827, 0xD960E050, 0xED8748C9, 0xFE25D0BE,
+ 0x195CDA43, 0x0AFE4234, 0x3E19EAAD, 0x2DBB72DA, 0x57D6BB9F, 0x447423E8, 0x70938B71, 0x63311306,
+ 0xBB8DE87A, 0xA82F700D, 0x9CC8D894, 0x8F6A40E3, 0xF50789A6, 0xE6A511D1, 0xD242B948, 0xC1E0213F,
+ 0x26992BC2, 0x353BB3B5, 0x01DC1B2C, 0x127E835B, 0x68134A1E, 0x7BB1D269, 0x4F567AF0, 0x5CF4E287,
+ 0x04D43CFD, 0x1776A48A, 0x23910C13, 0x30339464, 0x4A5E5D21, 0x59FCC556, 0x6D1B6DCF, 0x7EB9F5B8,
+ 0x99C0FF45, 0x8A626732, 0xBE85CFAB, 0xAD2757DC, 0xD74A9E99, 0xC4E806EE, 0xF00FAE77, 0xE3AD3600,
+ 0x3B11CD7C, 0x28B3550B, 0x1C54FD92, 0x0FF665E5, 0x759BACA0, 0x663934D7, 0x52DE9C4E, 0x417C0439,
+ 0xA6050EC4, 0xB5A796B3, 0x81403E2A, 0x92E2A65D, 0xE88F6F18, 0xFB2DF76F, 0xCFCA5FF6, 0xDC68C781,
+ 0x7B5FDFFF, 0x68FD4788, 0x5C1AEF11, 0x4FB87766, 0x35D5BE23, 0x26772654, 0x12908ECD, 0x013216BA,
+ 0xE64B1C47, 0xF5E98430, 0xC10E2CA9, 0xD2ACB4DE, 0xA8C17D9B, 0xBB63E5EC, 0x8F844D75, 0x9C26D502,
+ 0x449A2E7E, 0x5738B609, 0x63DF1E90, 0x707D86E7, 0x0A104FA2, 0x19B2D7D5, 0x2D557F4C, 0x3EF7E73B,
+ 0xD98EEDC6, 0xCA2C75B1, 0xFECBDD28, 0xED69455F, 0x97048C1A, 0x84A6146D, 0xB041BCF4, 0xA3E32483
+},
+{
+ 0x00000000, 0xA541927E, 0x4F6F520D, 0xEA2EC073, 0x9EDEA41A, 0x3B9F3664, 0xD1B1F617, 0x74F06469,
+ 0x38513EC5, 0x9D10ACBB, 0x773E6CC8, 0xD27FFEB6, 0xA68F9ADF, 0x03CE08A1, 0xE9E0C8D2, 0x4CA15AAC,
+ 0x70A27D8A, 0xD5E3EFF4, 0x3FCD2F87, 0x9A8CBDF9, 0xEE7CD990, 0x4B3D4BEE, 0xA1138B9D, 0x045219E3,
+ 0x48F3434F, 0xEDB2D131, 0x079C1142, 0xA2DD833C, 0xD62DE755, 0x736C752B, 0x9942B558, 0x3C032726,
+ 0xE144FB14, 0x4405696A, 0xAE2BA919, 0x0B6A3B67, 0x7F9A5F0E, 0xDADBCD70, 0x30F50D03, 0x95B49F7D,
+ 0xD915C5D1, 0x7C5457AF, 0x967A97DC, 0x333B05A2, 0x47CB61CB, 0xE28AF3B5, 0x08A433C6, 0xADE5A1B8,
+ 0x91E6869E, 0x34A714E0, 0xDE89D493, 0x7BC846ED, 0x0F382284, 0xAA79B0FA, 0x40577089, 0xE516E2F7,
+ 0xA9B7B85B, 0x0CF62A25, 0xE6D8EA56, 0x43997828, 0x37691C41, 0x92288E3F, 0x78064E4C, 0xDD47DC32,
+ 0xC76580D9, 0x622412A7, 0x880AD2D4, 0x2D4B40AA, 0x59BB24C3, 0xFCFAB6BD, 0x16D476CE, 0xB395E4B0,
+ 0xFF34BE1C, 0x5A752C62, 0xB05BEC11, 0x151A7E6F, 0x61EA1A06, 0xC4AB8878, 0x2E85480B, 0x8BC4DA75,
+ 0xB7C7FD53, 0x12866F2D, 0xF8A8AF5E, 0x5DE93D20, 0x29195949, 0x8C58CB37, 0x66760B44, 0xC337993A,
+ 0x8F96C396, 0x2AD751E8, 0xC0F9919B, 0x65B803E5, 0x1148678C, 0xB409F5F2, 0x5E273581, 0xFB66A7FF,
+ 0x26217BCD, 0x8360E9B3, 0x694E29C0, 0xCC0FBBBE, 0xB8FFDFD7, 0x1DBE4DA9, 0xF7908DDA, 0x52D11FA4,
+ 0x1E704508, 0xBB31D776, 0x511F1705, 0xF45E857B, 0x80AEE112, 0x25EF736C, 0xCFC1B31F, 0x6A802161,
+ 0x56830647, 0xF3C29439, 0x19EC544A, 0xBCADC634, 0xC85DA25D, 0x6D1C3023, 0x8732F050, 0x2273622E,
+ 0x6ED23882, 0xCB93AAFC, 0x21BD6A8F, 0x84FCF8F1, 0xF00C9C98, 0x554D0EE6, 0xBF63CE95, 0x1A225CEB,
+ 0x8B277743, 0x2E66E53D, 0xC448254E, 0x6109B730, 0x15F9D359, 0xB0B84127, 0x5A968154, 0xFFD7132A,
+ 0xB3764986, 0x1637DBF8, 0xFC191B8B, 0x595889F5, 0x2DA8ED9C, 0x88E97FE2, 0x62C7BF91, 0xC7862DEF,
+ 0xFB850AC9, 0x5EC498B7, 0xB4EA58C4, 0x11ABCABA, 0x655BAED3, 0xC01A3CAD, 0x2A34FCDE, 0x8F756EA0,
+ 0xC3D4340C, 0x6695A672, 0x8CBB6601, 0x29FAF47F, 0x5D0A9016, 0xF84B0268, 0x1265C21B, 0xB7245065,
+ 0x6A638C57, 0xCF221E29, 0x250CDE5A, 0x804D4C24, 0xF4BD284D, 0x51FCBA33, 0xBBD27A40, 0x1E93E83E,
+ 0x5232B292, 0xF77320EC, 0x1D5DE09F, 0xB81C72E1, 0xCCEC1688, 0x69AD84F6, 0x83834485, 0x26C2D6FB,
+ 0x1AC1F1DD, 0xBF8063A3, 0x55AEA3D0, 0xF0EF31AE, 0x841F55C7, 0x215EC7B9, 0xCB7007CA, 0x6E3195B4,
+ 0x2290CF18, 0x87D15D66, 0x6DFF9D15, 0xC8BE0F6B, 0xBC4E6B02, 0x190FF97C, 0xF321390F, 0x5660AB71,
+ 0x4C42F79A, 0xE90365E4, 0x032DA597, 0xA66C37E9, 0xD29C5380, 0x77DDC1FE, 0x9DF3018D, 0x38B293F3,
+ 0x7413C95F, 0xD1525B21, 0x3B7C9B52, 0x9E3D092C, 0xEACD6D45, 0x4F8CFF3B, 0xA5A23F48, 0x00E3AD36,
+ 0x3CE08A10, 0x99A1186E, 0x738FD81D, 0xD6CE4A63, 0xA23E2E0A, 0x077FBC74, 0xED517C07, 0x4810EE79,
+ 0x04B1B4D5, 0xA1F026AB, 0x4BDEE6D8, 0xEE9F74A6, 0x9A6F10CF, 0x3F2E82B1, 0xD50042C2, 0x7041D0BC,
+ 0xAD060C8E, 0x08479EF0, 0xE2695E83, 0x4728CCFD, 0x33D8A894, 0x96993AEA, 0x7CB7FA99, 0xD9F668E7,
+ 0x9557324B, 0x3016A035, 0xDA386046, 0x7F79F238, 0x0B899651, 0xAEC8042F, 0x44E6C45C, 0xE1A75622,
+ 0xDDA47104, 0x78E5E37A, 0x92CB2309, 0x378AB177, 0x437AD51E, 0xE63B4760, 0x0C158713, 0xA954156D,
+ 0xE5F54FC1, 0x40B4DDBF, 0xAA9A1DCC, 0x0FDB8FB2, 0x7B2BEBDB, 0xDE6A79A5, 0x3444B9D6, 0x91052BA8
+},
+{
+ 0x00000000, 0xDD45AAB8, 0xBF672381, 0x62228939, 0x7B2231F3, 0xA6679B4B, 0xC4451272, 0x1900B8CA,
+ 0xF64463E6, 0x2B01C95E, 0x49234067, 0x9466EADF, 0x8D665215, 0x5023F8AD, 0x32017194, 0xEF44DB2C,
+ 0xE964B13D, 0x34211B85, 0x560392BC, 0x8B463804, 0x924680CE, 0x4F032A76, 0x2D21A34F, 0xF06409F7,
+ 0x1F20D2DB, 0xC2657863, 0xA047F15A, 0x7D025BE2, 0x6402E328, 0xB9474990, 0xDB65C0A9, 0x06206A11,
+ 0xD725148B, 0x0A60BE33, 0x6842370A, 0xB5079DB2, 0xAC072578, 0x71428FC0, 0x136006F9, 0xCE25AC41,
+ 0x2161776D, 0xFC24DDD5, 0x9E0654EC, 0x4343FE54, 0x5A43469E, 0x8706EC26, 0xE524651F, 0x3861CFA7,
+ 0x3E41A5B6, 0xE3040F0E, 0x81268637, 0x5C632C8F, 0x45639445, 0x98263EFD, 0xFA04B7C4, 0x27411D7C,
+ 0xC805C650, 0x15406CE8, 0x7762E5D1, 0xAA274F69, 0xB327F7A3, 0x6E625D1B, 0x0C40D422, 0xD1057E9A,
+ 0xABA65FE7, 0x76E3F55F, 0x14C17C66, 0xC984D6DE, 0xD0846E14, 0x0DC1C4AC, 0x6FE34D95, 0xB2A6E72D,
+ 0x5DE23C01, 0x80A796B9, 0xE2851F80, 0x3FC0B538, 0x26C00DF2, 0xFB85A74A, 0x99A72E73, 0x44E284CB,
+ 0x42C2EEDA, 0x9F874462, 0xFDA5CD5B, 0x20E067E3, 0x39E0DF29, 0xE4A57591, 0x8687FCA8, 0x5BC25610,
+ 0xB4868D3C, 0x69C32784, 0x0BE1AEBD, 0xD6A40405, 0xCFA4BCCF, 0x12E11677, 0x70C39F4E, 0xAD8635F6,
+ 0x7C834B6C, 0xA1C6E1D4, 0xC3E468ED, 0x1EA1C255, 0x07A17A9F, 0xDAE4D027, 0xB8C6591E, 0x6583F3A6,
+ 0x8AC7288A, 0x57828232, 0x35A00B0B, 0xE8E5A1B3, 0xF1E51979, 0x2CA0B3C1, 0x4E823AF8, 0x93C79040,
+ 0x95E7FA51, 0x48A250E9, 0x2A80D9D0, 0xF7C57368, 0xEEC5CBA2, 0x3380611A, 0x51A2E823, 0x8CE7429B,
+ 0x63A399B7, 0xBEE6330F, 0xDCC4BA36, 0x0181108E, 0x1881A844, 0xC5C402FC, 0xA7E68BC5, 0x7AA3217D,
+ 0x52A0C93F, 0x8FE56387, 0xEDC7EABE, 0x30824006, 0x2982F8CC, 0xF4C75274, 0x96E5DB4D, 0x4BA071F5,
+ 0xA4E4AAD9, 0x79A10061, 0x1B838958, 0xC6C623E0, 0xDFC69B2A, 0x02833192, 0x60A1B8AB, 0xBDE41213,
+ 0xBBC47802, 0x6681D2BA, 0x04A35B83, 0xD9E6F13B, 0xC0E649F1, 0x1DA3E349, 0x7F816A70, 0xA2C4C0C8,
+ 0x4D801BE4, 0x90C5B15C, 0xF2E73865, 0x2FA292DD, 0x36A22A17, 0xEBE780AF, 0x89C50996, 0x5480A32E,
+ 0x8585DDB4, 0x58C0770C, 0x3AE2FE35, 0xE7A7548D, 0xFEA7EC47, 0x23E246FF, 0x41C0CFC6, 0x9C85657E,
+ 0x73C1BE52, 0xAE8414EA, 0xCCA69DD3, 0x11E3376B, 0x08E38FA1, 0xD5A62519, 0xB784AC20, 0x6AC10698,
+ 0x6CE16C89, 0xB1A4C631, 0xD3864F08, 0x0EC3E5B0, 0x17C35D7A, 0xCA86F7C2, 0xA8A47EFB, 0x75E1D443,
+ 0x9AA50F6F, 0x47E0A5D7, 0x25C22CEE, 0xF8878656, 0xE1873E9C, 0x3CC29424, 0x5EE01D1D, 0x83A5B7A5,
+ 0xF90696D8, 0x24433C60, 0x4661B559, 0x9B241FE1, 0x8224A72B, 0x5F610D93, 0x3D4384AA, 0xE0062E12,
+ 0x0F42F53E, 0xD2075F86, 0xB025D6BF, 0x6D607C07, 0x7460C4CD, 0xA9256E75, 0xCB07E74C, 0x16424DF4,
+ 0x106227E5, 0xCD278D5D, 0xAF050464, 0x7240AEDC, 0x6B401616, 0xB605BCAE, 0xD4273597, 0x09629F2F,
+ 0xE6264403, 0x3B63EEBB, 0x59416782, 0x8404CD3A, 0x9D0475F0, 0x4041DF48, 0x22635671, 0xFF26FCC9,
+ 0x2E238253, 0xF36628EB, 0x9144A1D2, 0x4C010B6A, 0x5501B3A0, 0x88441918, 0xEA669021, 0x37233A99,
+ 0xD867E1B5, 0x05224B0D, 0x6700C234, 0xBA45688C, 0xA345D046, 0x7E007AFE, 0x1C22F3C7, 0xC167597F,
+ 0xC747336E, 0x1A0299D6, 0x782010EF, 0xA565BA57, 0xBC65029D, 0x6120A825, 0x0302211C, 0xDE478BA4,
+ 0x31035088, 0xEC46FA30, 0x8E647309, 0x5321D9B1, 0x4A21617B, 0x9764CBC3, 0xF54642FA, 0x2803E842
+},
+{
+ 0x00000000, 0x38116FAC, 0x7022DF58, 0x4833B0F4, 0xE045BEB0, 0xD854D11C, 0x906761E8, 0xA8760E44,
+ 0xC5670B91, 0xFD76643D, 0xB545D4C9, 0x8D54BB65, 0x2522B521, 0x1D33DA8D, 0x55006A79, 0x6D1105D5,
+ 0x8F2261D3, 0xB7330E7F, 0xFF00BE8B, 0xC711D127, 0x6F67DF63, 0x5776B0CF, 0x1F45003B, 0x27546F97,
+ 0x4A456A42, 0x725405EE, 0x3A67B51A, 0x0276DAB6, 0xAA00D4F2, 0x9211BB5E, 0xDA220BAA, 0xE2336406,
+ 0x1BA8B557, 0x23B9DAFB, 0x6B8A6A0F, 0x539B05A3, 0xFBED0BE7, 0xC3FC644B, 0x8BCFD4BF, 0xB3DEBB13,
+ 0xDECFBEC6, 0xE6DED16A, 0xAEED619E, 0x96FC0E32, 0x3E8A0076, 0x069B6FDA, 0x4EA8DF2E, 0x76B9B082,
+ 0x948AD484, 0xAC9BBB28, 0xE4A80BDC, 0xDCB96470, 0x74CF6A34, 0x4CDE0598, 0x04EDB56C, 0x3CFCDAC0,
+ 0x51EDDF15, 0x69FCB0B9, 0x21CF004D, 0x19DE6FE1, 0xB1A861A5, 0x89B90E09, 0xC18ABEFD, 0xF99BD151,
+ 0x37516AAE, 0x0F400502, 0x4773B5F6, 0x7F62DA5A, 0xD714D41E, 0xEF05BBB2, 0xA7360B46, 0x9F2764EA,
+ 0xF236613F, 0xCA270E93, 0x8214BE67, 0xBA05D1CB, 0x1273DF8F, 0x2A62B023, 0x625100D7, 0x5A406F7B,
+ 0xB8730B7D, 0x806264D1, 0xC851D425, 0xF040BB89, 0x5836B5CD, 0x6027DA61, 0x28146A95, 0x10050539,
+ 0x7D1400EC, 0x45056F40, 0x0D36DFB4, 0x3527B018, 0x9D51BE5C, 0xA540D1F0, 0xED736104, 0xD5620EA8,
+ 0x2CF9DFF9, 0x14E8B055, 0x5CDB00A1, 0x64CA6F0D, 0xCCBC6149, 0xF4AD0EE5, 0xBC9EBE11, 0x848FD1BD,
+ 0xE99ED468, 0xD18FBBC4, 0x99BC0B30, 0xA1AD649C, 0x09DB6AD8, 0x31CA0574, 0x79F9B580, 0x41E8DA2C,
+ 0xA3DBBE2A, 0x9BCAD186, 0xD3F96172, 0xEBE80EDE, 0x439E009A, 0x7B8F6F36, 0x33BCDFC2, 0x0BADB06E,
+ 0x66BCB5BB, 0x5EADDA17, 0x169E6AE3, 0x2E8F054F, 0x86F90B0B, 0xBEE864A7, 0xF6DBD453, 0xCECABBFF,
+ 0x6EA2D55C, 0x56B3BAF0, 0x1E800A04, 0x269165A8, 0x8EE76BEC, 0xB6F60440, 0xFEC5B4B4, 0xC6D4DB18,
+ 0xABC5DECD, 0x93D4B161, 0xDBE70195, 0xE3F66E39, 0x4B80607D, 0x73910FD1, 0x3BA2BF25, 0x03B3D089,
+ 0xE180B48F, 0xD991DB23, 0x91A26BD7, 0xA9B3047B, 0x01C50A3F, 0x39D46593, 0x71E7D567, 0x49F6BACB,
+ 0x24E7BF1E, 0x1CF6D0B2, 0x54C56046, 0x6CD40FEA, 0xC4A201AE, 0xFCB36E02, 0xB480DEF6, 0x8C91B15A,
+ 0x750A600B, 0x4D1B0FA7, 0x0528BF53, 0x3D39D0FF, 0x954FDEBB, 0xAD5EB117, 0xE56D01E3, 0xDD7C6E4F,
+ 0xB06D6B9A, 0x887C0436, 0xC04FB4C2, 0xF85EDB6E, 0x5028D52A, 0x6839BA86, 0x200A0A72, 0x181B65DE,
+ 0xFA2801D8, 0xC2396E74, 0x8A0ADE80, 0xB21BB12C, 0x1A6DBF68, 0x227CD0C4, 0x6A4F6030, 0x525E0F9C,
+ 0x3F4F0A49, 0x075E65E5, 0x4F6DD511, 0x777CBABD, 0xDF0AB4F9, 0xE71BDB55, 0xAF286BA1, 0x9739040D,
+ 0x59F3BFF2, 0x61E2D05E, 0x29D160AA, 0x11C00F06, 0xB9B60142, 0x81A76EEE, 0xC994DE1A, 0xF185B1B6,
+ 0x9C94B463, 0xA485DBCF, 0xECB66B3B, 0xD4A70497, 0x7CD10AD3, 0x44C0657F, 0x0CF3D58B, 0x34E2BA27,
+ 0xD6D1DE21, 0xEEC0B18D, 0xA6F30179, 0x9EE26ED5, 0x36946091, 0x0E850F3D, 0x46B6BFC9, 0x7EA7D065,
+ 0x13B6D5B0, 0x2BA7BA1C, 0x63940AE8, 0x5B856544, 0xF3F36B00, 0xCBE204AC, 0x83D1B458, 0xBBC0DBF4,
+ 0x425B0AA5, 0x7A4A6509, 0x3279D5FD, 0x0A68BA51, 0xA21EB415, 0x9A0FDBB9, 0xD23C6B4D, 0xEA2D04E1,
+ 0x873C0134, 0xBF2D6E98, 0xF71EDE6C, 0xCF0FB1C0, 0x6779BF84, 0x5F68D028, 0x175B60DC, 0x2F4A0F70,
+ 0xCD796B76, 0xF56804DA, 0xBD5BB42E, 0x854ADB82, 0x2D3CD5C6, 0x152DBA6A, 0x5D1E0A9E, 0x650F6532,
+ 0x081E60E7, 0x300F0F4B, 0x783CBFBF, 0x402DD013, 0xE85BDE57, 0xD04AB1FB, 0x9879010F, 0xA0686EA3
+},
+{
+ 0x00000000, 0xEF306B19, 0xDB8CA0C3, 0x34BCCBDA, 0xB2F53777, 0x5DC55C6E, 0x697997B4, 0x8649FCAD,
+ 0x6006181F, 0x8F367306, 0xBB8AB8DC, 0x54BAD3C5, 0xD2F32F68, 0x3DC34471, 0x097F8FAB, 0xE64FE4B2,
+ 0xC00C303E, 0x2F3C5B27, 0x1B8090FD, 0xF4B0FBE4, 0x72F90749, 0x9DC96C50, 0xA975A78A, 0x4645CC93,
+ 0xA00A2821, 0x4F3A4338, 0x7B8688E2, 0x94B6E3FB, 0x12FF1F56, 0xFDCF744F, 0xC973BF95, 0x2643D48C,
+ 0x85F4168D, 0x6AC47D94, 0x5E78B64E, 0xB148DD57, 0x370121FA, 0xD8314AE3, 0xEC8D8139, 0x03BDEA20,
+ 0xE5F20E92, 0x0AC2658B, 0x3E7EAE51, 0xD14EC548, 0x570739E5, 0xB83752FC, 0x8C8B9926, 0x63BBF23F,
+ 0x45F826B3, 0xAAC84DAA, 0x9E748670, 0x7144ED69, 0xF70D11C4, 0x183D7ADD, 0x2C81B107, 0xC3B1DA1E,
+ 0x25FE3EAC, 0xCACE55B5, 0xFE729E6F, 0x1142F576, 0x970B09DB, 0x783B62C2, 0x4C87A918, 0xA3B7C201,
+ 0x0E045BEB, 0xE13430F2, 0xD588FB28, 0x3AB89031, 0xBCF16C9C, 0x53C10785, 0x677DCC5F, 0x884DA746,
+ 0x6E0243F4, 0x813228ED, 0xB58EE337, 0x5ABE882E, 0xDCF77483, 0x33C71F9A, 0x077BD440, 0xE84BBF59,
+ 0xCE086BD5, 0x213800CC, 0x1584CB16, 0xFAB4A00F, 0x7CFD5CA2, 0x93CD37BB, 0xA771FC61, 0x48419778,
+ 0xAE0E73CA, 0x413E18D3, 0x7582D309, 0x9AB2B810, 0x1CFB44BD, 0xF3CB2FA4, 0xC777E47E, 0x28478F67,
+ 0x8BF04D66, 0x64C0267F, 0x507CEDA5, 0xBF4C86BC, 0x39057A11, 0xD6351108, 0xE289DAD2, 0x0DB9B1CB,
+ 0xEBF65579, 0x04C63E60, 0x307AF5BA, 0xDF4A9EA3, 0x5903620E, 0xB6330917, 0x828FC2CD, 0x6DBFA9D4,
+ 0x4BFC7D58, 0xA4CC1641, 0x9070DD9B, 0x7F40B682, 0xF9094A2F, 0x16392136, 0x2285EAEC, 0xCDB581F5,
+ 0x2BFA6547, 0xC4CA0E5E, 0xF076C584, 0x1F46AE9D, 0x990F5230, 0x763F3929, 0x4283F2F3, 0xADB399EA,
+ 0x1C08B7D6, 0xF338DCCF, 0xC7841715, 0x28B47C0C, 0xAEFD80A1, 0x41CDEBB8, 0x75712062, 0x9A414B7B,
+ 0x7C0EAFC9, 0x933EC4D0, 0xA7820F0A, 0x48B26413, 0xCEFB98BE, 0x21CBF3A7, 0x1577387D, 0xFA475364,
+ 0xDC0487E8, 0x3334ECF1, 0x0788272B, 0xE8B84C32, 0x6EF1B09F, 0x81C1DB86, 0xB57D105C, 0x5A4D7B45,
+ 0xBC029FF7, 0x5332F4EE, 0x678E3F34, 0x88BE542D, 0x0EF7A880, 0xE1C7C399, 0xD57B0843, 0x3A4B635A,
+ 0x99FCA15B, 0x76CCCA42, 0x42700198, 0xAD406A81, 0x2B09962C, 0xC439FD35, 0xF08536EF, 0x1FB55DF6,
+ 0xF9FAB944, 0x16CAD25D, 0x22761987, 0xCD46729E, 0x4B0F8E33, 0xA43FE52A, 0x90832EF0, 0x7FB345E9,
+ 0x59F09165, 0xB6C0FA7C, 0x827C31A6, 0x6D4C5ABF, 0xEB05A612, 0x0435CD0B, 0x308906D1, 0xDFB96DC8,
+ 0x39F6897A, 0xD6C6E263, 0xE27A29B9, 0x0D4A42A0, 0x8B03BE0D, 0x6433D514, 0x508F1ECE, 0xBFBF75D7,
+ 0x120CEC3D, 0xFD3C8724, 0xC9804CFE, 0x26B027E7, 0xA0F9DB4A, 0x4FC9B053, 0x7B757B89, 0x94451090,
+ 0x720AF422, 0x9D3A9F3B, 0xA98654E1, 0x46B63FF8, 0xC0FFC355, 0x2FCFA84C, 0x1B736396, 0xF443088F,
+ 0xD200DC03, 0x3D30B71A, 0x098C7CC0, 0xE6BC17D9, 0x60F5EB74, 0x8FC5806D, 0xBB794BB7, 0x544920AE,
+ 0xB206C41C, 0x5D36AF05, 0x698A64DF, 0x86BA0FC6, 0x00F3F36B, 0xEFC39872, 0xDB7F53A8, 0x344F38B1,
+ 0x97F8FAB0, 0x78C891A9, 0x4C745A73, 0xA344316A, 0x250DCDC7, 0xCA3DA6DE, 0xFE816D04, 0x11B1061D,
+ 0xF7FEE2AF, 0x18CE89B6, 0x2C72426C, 0xC3422975, 0x450BD5D8, 0xAA3BBEC1, 0x9E87751B, 0x71B71E02,
+ 0x57F4CA8E, 0xB8C4A197, 0x8C786A4D, 0x63480154, 0xE501FDF9, 0x0A3196E0, 0x3E8D5D3A, 0xD1BD3623,
+ 0x37F2D291, 0xD8C2B988, 0xEC7E7252, 0x034E194B, 0x8507E5E6, 0x6A378EFF, 0x5E8B4525, 0xB1BB2E3C
+},
+{
+ 0x00000000, 0x68032CC8, 0xD0065990, 0xB8057558, 0xA5E0C5D1, 0xCDE3E919, 0x75E69C41, 0x1DE5B089,
+ 0x4E2DFD53, 0x262ED19B, 0x9E2BA4C3, 0xF628880B, 0xEBCD3882, 0x83CE144A, 0x3BCB6112, 0x53C84DDA,
+ 0x9C5BFAA6, 0xF458D66E, 0x4C5DA336, 0x245E8FFE, 0x39BB3F77, 0x51B813BF, 0xE9BD66E7, 0x81BE4A2F,
+ 0xD27607F5, 0xBA752B3D, 0x02705E65, 0x6A7372AD, 0x7796C224, 0x1F95EEEC, 0xA7909BB4, 0xCF93B77C,
+ 0x3D5B83BD, 0x5558AF75, 0xED5DDA2D, 0x855EF6E5, 0x98BB466C, 0xF0B86AA4, 0x48BD1FFC, 0x20BE3334,
+ 0x73767EEE, 0x1B755226, 0xA370277E, 0xCB730BB6, 0xD696BB3F, 0xBE9597F7, 0x0690E2AF, 0x6E93CE67,
+ 0xA100791B, 0xC90355D3, 0x7106208B, 0x19050C43, 0x04E0BCCA, 0x6CE39002, 0xD4E6E55A, 0xBCE5C992,
+ 0xEF2D8448, 0x872EA880, 0x3F2BDDD8, 0x5728F110, 0x4ACD4199, 0x22CE6D51, 0x9ACB1809, 0xF2C834C1,
+ 0x7AB7077A, 0x12B42BB2, 0xAAB15EEA, 0xC2B27222, 0xDF57C2AB, 0xB754EE63, 0x0F519B3B, 0x6752B7F3,
+ 0x349AFA29, 0x5C99D6E1, 0xE49CA3B9, 0x8C9F8F71, 0x917A3FF8, 0xF9791330, 0x417C6668, 0x297F4AA0,
+ 0xE6ECFDDC, 0x8EEFD114, 0x36EAA44C, 0x5EE98884, 0x430C380D, 0x2B0F14C5, 0x930A619D, 0xFB094D55,
+ 0xA8C1008F, 0xC0C22C47, 0x78C7591F, 0x10C475D7, 0x0D21C55E, 0x6522E996, 0xDD279CCE, 0xB524B006,
+ 0x47EC84C7, 0x2FEFA80F, 0x97EADD57, 0xFFE9F19F, 0xE20C4116, 0x8A0F6DDE, 0x320A1886, 0x5A09344E,
+ 0x09C17994, 0x61C2555C, 0xD9C72004, 0xB1C40CCC, 0xAC21BC45, 0xC422908D, 0x7C27E5D5, 0x1424C91D,
+ 0xDBB77E61, 0xB3B452A9, 0x0BB127F1, 0x63B20B39, 0x7E57BBB0, 0x16549778, 0xAE51E220, 0xC652CEE8,
+ 0x959A8332, 0xFD99AFFA, 0x459CDAA2, 0x2D9FF66A, 0x307A46E3, 0x58796A2B, 0xE07C1F73, 0x887F33BB,
+ 0xF56E0EF4, 0x9D6D223C, 0x25685764, 0x4D6B7BAC, 0x508ECB25, 0x388DE7ED, 0x808892B5, 0xE88BBE7D,
+ 0xBB43F3A7, 0xD340DF6F, 0x6B45AA37, 0x034686FF, 0x1EA33676, 0x76A01ABE, 0xCEA56FE6, 0xA6A6432E,
+ 0x6935F452, 0x0136D89A, 0xB933ADC2, 0xD130810A, 0xCCD53183, 0xA4D61D4B, 0x1CD36813, 0x74D044DB,
+ 0x27180901, 0x4F1B25C9, 0xF71E5091, 0x9F1D7C59, 0x82F8CCD0, 0xEAFBE018, 0x52FE9540, 0x3AFDB988,
+ 0xC8358D49, 0xA036A181, 0x1833D4D9, 0x7030F811, 0x6DD54898, 0x05D66450, 0xBDD31108, 0xD5D03DC0,
+ 0x8618701A, 0xEE1B5CD2, 0x561E298A, 0x3E1D0542, 0x23F8B5CB, 0x4BFB9903, 0xF3FEEC5B, 0x9BFDC093,
+ 0x546E77EF, 0x3C6D5B27, 0x84682E7F, 0xEC6B02B7, 0xF18EB23E, 0x998D9EF6, 0x2188EBAE, 0x498BC766,
+ 0x1A438ABC, 0x7240A674, 0xCA45D32C, 0xA246FFE4, 0xBFA34F6D, 0xD7A063A5, 0x6FA516FD, 0x07A63A35,
+ 0x8FD9098E, 0xE7DA2546, 0x5FDF501E, 0x37DC7CD6, 0x2A39CC5F, 0x423AE097, 0xFA3F95CF, 0x923CB907,
+ 0xC1F4F4DD, 0xA9F7D815, 0x11F2AD4D, 0x79F18185, 0x6414310C, 0x0C171DC4, 0xB412689C, 0xDC114454,
+ 0x1382F328, 0x7B81DFE0, 0xC384AAB8, 0xAB878670, 0xB66236F9, 0xDE611A31, 0x66646F69, 0x0E6743A1,
+ 0x5DAF0E7B, 0x35AC22B3, 0x8DA957EB, 0xE5AA7B23, 0xF84FCBAA, 0x904CE762, 0x2849923A, 0x404ABEF2,
+ 0xB2828A33, 0xDA81A6FB, 0x6284D3A3, 0x0A87FF6B, 0x17624FE2, 0x7F61632A, 0xC7641672, 0xAF673ABA,
+ 0xFCAF7760, 0x94AC5BA8, 0x2CA92EF0, 0x44AA0238, 0x594FB2B1, 0x314C9E79, 0x8949EB21, 0xE14AC7E9,
+ 0x2ED97095, 0x46DA5C5D, 0xFEDF2905, 0x96DC05CD, 0x8B39B544, 0xE33A998C, 0x5B3FECD4, 0x333CC01C,
+ 0x60F48DC6, 0x08F7A10E, 0xB0F2D456, 0xD8F1F89E, 0xC5144817, 0xAD1764DF, 0x15121187, 0x7D113D4F
+},
+{
+ 0x00000000, 0x493C7D27, 0x9278FA4E, 0xDB448769, 0x211D826D, 0x6821FF4A, 0xB3657823, 0xFA590504,
+ 0x423B04DA, 0x0B0779FD, 0xD043FE94, 0x997F83B3, 0x632686B7, 0x2A1AFB90, 0xF15E7CF9, 0xB86201DE,
+ 0x847609B4, 0xCD4A7493, 0x160EF3FA, 0x5F328EDD, 0xA56B8BD9, 0xEC57F6FE, 0x37137197, 0x7E2F0CB0,
+ 0xC64D0D6E, 0x8F717049, 0x5435F720, 0x1D098A07, 0xE7508F03, 0xAE6CF224, 0x7528754D, 0x3C14086A,
+ 0x0D006599, 0x443C18BE, 0x9F789FD7, 0xD644E2F0, 0x2C1DE7F4, 0x65219AD3, 0xBE651DBA, 0xF759609D,
+ 0x4F3B6143, 0x06071C64, 0xDD439B0D, 0x947FE62A, 0x6E26E32E, 0x271A9E09, 0xFC5E1960, 0xB5626447,
+ 0x89766C2D, 0xC04A110A, 0x1B0E9663, 0x5232EB44, 0xA86BEE40, 0xE1579367, 0x3A13140E, 0x732F6929,
+ 0xCB4D68F7, 0x827115D0, 0x593592B9, 0x1009EF9E, 0xEA50EA9A, 0xA36C97BD, 0x782810D4, 0x31146DF3,
+ 0x1A00CB32, 0x533CB615, 0x8878317C, 0xC1444C5B, 0x3B1D495F, 0x72213478, 0xA965B311, 0xE059CE36,
+ 0x583BCFE8, 0x1107B2CF, 0xCA4335A6, 0x837F4881, 0x79264D85, 0x301A30A2, 0xEB5EB7CB, 0xA262CAEC,
+ 0x9E76C286, 0xD74ABFA1, 0x0C0E38C8, 0x453245EF, 0xBF6B40EB, 0xF6573DCC, 0x2D13BAA5, 0x642FC782,
+ 0xDC4DC65C, 0x9571BB7B, 0x4E353C12, 0x07094135, 0xFD504431, 0xB46C3916, 0x6F28BE7F, 0x2614C358,
+ 0x1700AEAB, 0x5E3CD38C, 0x857854E5, 0xCC4429C2, 0x361D2CC6, 0x7F2151E1, 0xA465D688, 0xED59ABAF,
+ 0x553BAA71, 0x1C07D756, 0xC743503F, 0x8E7F2D18, 0x7426281C, 0x3D1A553B, 0xE65ED252, 0xAF62AF75,
+ 0x9376A71F, 0xDA4ADA38, 0x010E5D51, 0x48322076, 0xB26B2572, 0xFB575855, 0x2013DF3C, 0x692FA21B,
+ 0xD14DA3C5, 0x9871DEE2, 0x4335598B, 0x0A0924AC, 0xF05021A8, 0xB96C5C8F, 0x6228DBE6, 0x2B14A6C1,
+ 0x34019664, 0x7D3DEB43, 0xA6796C2A, 0xEF45110D, 0x151C1409, 0x5C20692E, 0x8764EE47, 0xCE589360,
+ 0x763A92BE, 0x3F06EF99, 0xE44268F0, 0xAD7E15D7, 0x572710D3, 0x1E1B6DF4, 0xC55FEA9D, 0x8C6397BA,
+ 0xB0779FD0, 0xF94BE2F7, 0x220F659E, 0x6B3318B9, 0x916A1DBD, 0xD856609A, 0x0312E7F3, 0x4A2E9AD4,
+ 0xF24C9B0A, 0xBB70E62D, 0x60346144, 0x29081C63, 0xD3511967, 0x9A6D6440, 0x4129E329, 0x08159E0E,
+ 0x3901F3FD, 0x703D8EDA, 0xAB7909B3, 0xE2457494, 0x181C7190, 0x51200CB7, 0x8A648BDE, 0xC358F6F9,
+ 0x7B3AF727, 0x32068A00, 0xE9420D69, 0xA07E704E, 0x5A27754A, 0x131B086D, 0xC85F8F04, 0x8163F223,
+ 0xBD77FA49, 0xF44B876E, 0x2F0F0007, 0x66337D20, 0x9C6A7824, 0xD5560503, 0x0E12826A, 0x472EFF4D,
+ 0xFF4CFE93, 0xB67083B4, 0x6D3404DD, 0x240879FA, 0xDE517CFE, 0x976D01D9, 0x4C2986B0, 0x0515FB97,
+ 0x2E015D56, 0x673D2071, 0xBC79A718, 0xF545DA3F, 0x0F1CDF3B, 0x4620A21C, 0x9D642575, 0xD4585852,
+ 0x6C3A598C, 0x250624AB, 0xFE42A3C2, 0xB77EDEE5, 0x4D27DBE1, 0x041BA6C6, 0xDF5F21AF, 0x96635C88,
+ 0xAA7754E2, 0xE34B29C5, 0x380FAEAC, 0x7133D38B, 0x8B6AD68F, 0xC256ABA8, 0x19122CC1, 0x502E51E6,
+ 0xE84C5038, 0xA1702D1F, 0x7A34AA76, 0x3308D751, 0xC951D255, 0x806DAF72, 0x5B29281B, 0x1215553C,
+ 0x230138CF, 0x6A3D45E8, 0xB179C281, 0xF845BFA6, 0x021CBAA2, 0x4B20C785, 0x906440EC, 0xD9583DCB,
+ 0x613A3C15, 0x28064132, 0xF342C65B, 0xBA7EBB7C, 0x4027BE78, 0x091BC35F, 0xD25F4436, 0x9B633911,
+ 0xA777317B, 0xEE4B4C5C, 0x350FCB35, 0x7C33B612, 0x866AB316, 0xCF56CE31, 0x14124958, 0x5D2E347F,
+ 0xE54C35A1, 0xAC704886, 0x7734CFEF, 0x3E08B2C8, 0xC451B7CC, 0x8D6DCAEB, 0x56294D82, 0x1F1530A5
+}};
+
+#define CRC32_UPD(crc, n) \
+	(crc32c_tables[(n)][(crc) & 0xFF] ^ \
+	 crc32c_tables[(n)-1][((crc) >> 8) & 0xFF])
+
+static inline uint32_t
+crc32c_1word(uint32_t data, uint32_t init_val)
+{
+	uint32_t crc, term1, term2;
+	crc = init_val;
+	crc ^= data;
+
+	term1 = CRC32_UPD(crc, 3);
+	term2 = crc >> 16;
+	crc = term1 ^ CRC32_UPD(term2, 1);
+
+	return crc;
+}
+
+static inline uint32_t
+crc32c_2words(uint64_t data, uint32_t init_val)
+{
+	union {
+		uint64_t u64;
+		uint32_t u32[2];
+	} d;
+	d.u64 = data;
+
+	uint32_t crc, term1, term2;
+
+	crc = init_val;
+	crc ^= d.u32[0];
+
+	term1 = CRC32_UPD(crc, 7);
+	term2 = crc >> 16;
+	crc = term1 ^ CRC32_UPD(term2, 5);
+	term1 = CRC32_UPD(d.u32[1], 3);
+	term2 = d.u32[1] >> 16;
+	crc ^= term1 ^ CRC32_UPD(term2, 1);
+
+	return crc;
+}
+
 /**
  * Use single crc32 instruction to perform a hash on a 4 byte value.
  *
-- 
1.7.9.5
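
For readers wondering where the eight lookup tables come from, here is a sketch that regenerates them, assuming they are the usual slicing-by-8 tables for the reflected CRC32C polynomial 0x82F63B78. The names sb8_tables, sb8_init, sb8_1word and bitwise_1word are hypothetical. The first entries it produces match the patch, e.g. crc32c_tables[0][1] = 0xF26B8303 and crc32c_tables[1][1] = 0x13A29877.

```c
#include <stdint.h>

#define CRC32C_POLY 0x82F63B78u /* reflected CRC32C polynomial */

static uint32_t sb8_tables[8][256];

static void
sb8_init(void)
{
	uint32_t i, k, crc;
	int bit;

	/* table 0: CRC of a single byte, computed bit by bit */
	for (i = 0; i < 256; i++) {
		crc = i;
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (CRC32C_POLY & (0u - (crc & 1u)));
		sb8_tables[0][i] = crc;
	}
	/* table k: the same byte followed by k zero bytes */
	for (k = 1; k < 8; k++) {
		for (i = 0; i < 256; i++) {
			crc = sb8_tables[k - 1][i];
			sb8_tables[k][i] = (crc >> 8) ^ sb8_tables[0][crc & 0xFF];
		}
	}
}

/* Slicing-by-4 step with the same table indexing as crc32c_1word() */
static uint32_t
sb8_1word(uint32_t data, uint32_t init_val)
{
	uint32_t crc = init_val ^ data;

	return sb8_tables[3][crc & 0xFF] ^
	       sb8_tables[2][(crc >> 8) & 0xFF] ^
	       sb8_tables[1][(crc >> 16) & 0xFF] ^
	       sb8_tables[0][crc >> 24];
}

/* Bit-at-a-time reference for the same 4-byte step */
static uint32_t
bitwise_1word(uint32_t data, uint32_t init_val)
{
	uint32_t crc = init_val ^ data;
	int bit;

	for (bit = 0; bit < 32; bit++)
		crc = (crc >> 1) ^ (CRC32C_POLY & (0u - (crc & 1u)));
	return crc;
}
```

sb8_init() must be called once before sb8_1word(). The table-driven step replaces 32 shift/xor rounds with four lookups, at the cost of keeping the tables in cache.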

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [dpdk-dev] [PATCH v3 2/5] hash: add new rte_hash_crc_8byte call
  2014-11-18  3:21   ` [dpdk-dev] [PATCH v3 0/5] " Yerden Zhumabekov
  2014-11-18  3:21     ` [dpdk-dev] [PATCH v3 1/5] hash: add software CRC32 implementation Yerden Zhumabekov
@ 2014-11-18  3:21     ` Yerden Zhumabekov
  2014-11-18  3:21     ` [dpdk-dev] [PATCH v3 3/5] hash: add fallback to software CRC32 implementation Yerden Zhumabekov
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-18  3:21 UTC (permalink / raw)
  To: dev

SSE4.2 provides the _mm_crc32_u64 intrinsic, which takes an 8-byte operand.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 4d7532a..15f687a 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -380,6 +380,22 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 }
 
 /**
+ * Use single crc32 instruction to perform a hash on an 8 byte value.
+ *
+ * @param data
+ *   Data to perform hash on.
+ * @param init_val
+ *   Value to initialise hash generator.
+ * @return
+ *   32bit calculated hash value.
+ */
+static inline uint32_t
+rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
+{
+	return _mm_crc32_u64(init_val, data);
+}
+
+/**
  * Use crc32 instruction to perform a hash.
  *
  * @param data
-- 
1.7.9.5
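
Note that _mm_crc32_u64 is only available on x86-64 builds with SSE4.2 enabled. A sketch of a guarded 8-byte call, with a byte-at-a-time software stand-in used elsewhere, could look like the following. The names crc_8byte_sw, crc_8byte and HAVE_CRC32_U64 are hypothetical and the compile-time guard is only one possible choice, not what this patch does.

```c
#include <stdint.h>

#if defined(__SSE4_2__) && defined(__x86_64__)
#include <nmmintrin.h>
#define HAVE_CRC32_U64 1
#endif

#define CRC32C_POLY 0x82F63B78u /* reflected CRC32C polynomial */

/* Software stand-in: feeds the 8 bytes of data low-order byte first */
static uint32_t
crc_8byte_sw(uint64_t data, uint32_t init_val)
{
	uint32_t crc = init_val;
	int i, bit;

	for (i = 0; i < 8; i++) {
		crc ^= (uint32_t)(data >> (8 * i)) & 0xFFu;
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (CRC32C_POLY & (0u - (crc & 1u)));
	}
	return crc;
}

static uint32_t
crc_8byte(uint64_t data, uint32_t init_val)
{
#ifdef HAVE_CRC32_U64
	/* the intrinsic takes and returns 64-bit values; the CRC is in the low 32 bits */
	return (uint32_t)_mm_crc32_u64(init_val, data);
#else
	return crc_8byte_sw(data, init_val);
#endif
}
```

Both paths compute the same CRC32C step, so callers see identical hash values whichever branch is compiled in.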

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [dpdk-dev] [PATCH v3 3/5] hash: add fallback to software CRC32 implementation
  2014-11-18  3:21   ` [dpdk-dev] [PATCH v3 0/5] " Yerden Zhumabekov
  2014-11-18  3:21     ` [dpdk-dev] [PATCH v3 1/5] hash: add software CRC32 implementation Yerden Zhumabekov
  2014-11-18  3:21     ` [dpdk-dev] [PATCH v3 2/5] hash: add new rte_hash_crc_8byte call Yerden Zhumabekov
@ 2014-11-18  3:21     ` Yerden Zhumabekov
  2014-11-18  4:56       ` Yerden Zhumabekov
  2014-11-18  3:21     ` [dpdk-dev] [PATCH v3 4/5] hash: rte_hash_crc() slices data into 8-byte pieces Yerden Zhumabekov
  2014-11-18  3:25     ` [dpdk-dev] [PATCH v3 5/5] test: remove redundant compile checks Yerden Zhumabekov
  4 siblings, 1 reply; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-18  3:21 UTC (permalink / raw)
  To: dev

Initially, SSE4.2 support is detected via the CPUID instruction.

Added the rte_hash_crc_set_alg() function to detect and set the CRC32
implementation if necessary. SSE4.2 is allowed by default. If it's
not available, fall back to the software implementation.

Depending on compiler support for attributes, the best available
algorithm may be detected upon application startup.
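
The selection rule described above can be sketched in isolation as follows. This is a minimal model, assuming the CPU capability is passed in as a plain flag; resolve_alg() and the ALG_* names are hypothetical, and cpu_has_sse42 stands in for whatever the real CPU-flag query returns.

```c
/* Hypothetical mirror of the algorithm choices in the patch. */
enum crc32_alg {
	ALG_SW = 0,
	ALG_SSE42,
	ALG_AUTODETECT
};

/* Resolve the requested algorithm against what the CPU supports.
 * cpu_has_sse42 is an assumed input standing in for the result of
 * the CPU-flag check; only an explicit SSE4.2 request can select
 * the hardware path, anything else resolves to software. */
static enum crc32_alg
resolve_alg(enum crc32_alg requested, int cpu_has_sse42)
{
	enum crc32_alg best = cpu_has_sse42 ? ALG_SSE42 : ALG_SW;

	return (requested == ALG_SSE42) ? best : ALG_SW;
}
```

One consequence of this rule is that passing the autodetect value as a request resolves to the software path, so autodetect is useful only as an initial "not yet decided" state.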

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |   64 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 62 insertions(+), 2 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 15f687a..c1b75e8 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -45,7 +45,11 @@ extern "C" {
 #endif
 
 #include <stdint.h>
+#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 #include <nmmintrin.h>
+#endif
+#include <rte_cpuflags.h>
+#include <rte_branch_prediction.h>
 
 /* Lookup tables for software implementation of CRC32C */
 static uint32_t crc32c_tables[8][256] = {{
@@ -363,8 +367,44 @@ crc32c_2words(uint64_t data, uint32_t init_val)
 	return crc;
 }
 
+enum crc32_alg_t {
+	CRC32_SW = 0,
+	CRC32_SSE42,
+	CRC32_AUTODETECT
+};
+
+static enum crc32_alg_t crc32_alg = CRC32_AUTODETECT;
+
+/**
+ * Allow or disallow use of SSE4.2 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   Requested CRC32 algorithm
+ *   - (CRC32_SW) Don't use SSE4.2 intrinsics
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available, set by default
+ */
+static inline void
+rte_hash_crc_set_alg(enum crc32_alg_t alg)
+{
+	int sse42_supp = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2);
+	enum crc32_alg_t alg_supp = sse42_supp ? CRC32_SSE42 : CRC32_SW;
+	crc32_alg = (alg == CRC32_SSE42) ? alg_supp : CRC32_SW;
+}
+
+/* Best available algorithm is detected via CPUID instruction */
+#ifndef __INTEL_COMPILER
+static inline void __attribute__((constructor))
+rte_hash_crc_try_sse42(void)
+{
+	rte_hash_crc_set_alg(CRC32_SSE42);
+}
+#endif
+
 /**
  * Use single crc32 instruction to perform a hash on a 4 byte value.
+ * Fall back to software crc32 implementation in case SSE4.2 is
+ * not supported
  *
  * @param data
  *   Data to perform hash on.
@@ -376,11 +416,22 @@ crc32c_2words(uint64_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-	return _mm_crc32_u32(init_val, data);
+#ifdef __INTEL_COMPILER
+	if (unlikely(crc32_alg == CRC32_AUTODETECT))
+		rte_hash_crc_set_alg(CRC32_SSE42);
+#endif
+#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+	if (likely(crc32_alg == CRC32_SSE42))
+		return _mm_crc32_u32(init_val, data);
+#endif
+
+	return crc32c_1word(data, init_val);
 }
 
 /**
  * Use single crc32 instruction to perform a hash on a 8 byte value.
+ * Fall back to software crc32 implementation in case SSE4.2 is
+ * not supported
  *
  * @param data
  *   Data to perform hash on.
@@ -392,7 +443,16 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
-	return _mm_crc32_u64(init_val, data);
+#ifdef __INTEL_COMPILER
+	if (unlikely(crc32_alg == CRC32_AUTODETECT))
+		rte_hash_crc_set_alg(CRC32_SSE42);
+#endif
+#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+	if (likely(crc32_alg == CRC32_SSE42))
+		return _mm_crc32_u64(init_val, data);
+#endif
+
+	return crc32c_2words(data, init_val);
 }
 
 /**
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [dpdk-dev] [PATCH v3 4/5] hash: rte_hash_crc() slices data into 8-byte pieces
  2014-11-18  3:21   ` [dpdk-dev] [PATCH v3 0/5] " Yerden Zhumabekov
                       ` (2 preceding siblings ...)
  2014-11-18  3:21     ` [dpdk-dev] [PATCH v3 3/5] hash: add fallback to software CRC32 implementation Yerden Zhumabekov
@ 2014-11-18  3:21     ` Yerden Zhumabekov
  2014-11-18  3:25     ` [dpdk-dev] [PATCH v3 5/5] test: remove redundant compile checks Yerden Zhumabekov
  4 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-18  3:21 UTC (permalink / raw)
  To: dev

Calculating a hash for data of variable length is more efficient
when that data is sliced into 8-byte pieces. The remaining part of
the data is hashed using CRC32 functions with either 8- or 4-byte
operands.
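
The slicing scheme can be sketched in portable C as below. This is an illustration under assumptions, not the patch code: crc32c_bytes() is a hypothetical bitwise stand-in for the 4- and 8-byte CRC steps, and the tail is copied into a zeroed buffer with memcpy() instead of being read through casted pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC32C over n bytes; hypothetical stand-in for the
 * hardware or table-driven 4-byte and 8-byte steps. */
static uint32_t
crc32c_bytes(const uint8_t *p, size_t n, uint32_t crc)
{
	for (size_t i = 0; i < n; i++) {
		crc ^= p[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Hash len bytes: full 8-byte pieces first, then one final step.
 * A tail of 5..7 bytes is zero-padded to 8 bytes, a tail of 1..4
 * bytes is zero-padded to 4 bytes, an empty tail adds nothing. */
static uint32_t
crc32c_sliced(const void *data, size_t len, uint32_t crc)
{
	const uint8_t *p = data;
	uint8_t tail[8] = {0};
	size_t rem = len & 7;

	assert(data != NULL || len == 0);

	for (size_t i = 0; i < len / 8; i++, p += 8)
		crc = crc32c_bytes(p, 8, crc);

	if (rem == 0)
		return crc;

	memcpy(tail, p, rem);
	return crc32c_bytes(tail, rem > 4 ? 8 : 4, crc);
}
```

In this model the result equals a CRC over the input zero-padded to a multiple of 4 bytes, which is also what 4-byte slicing with a padded tail would produce.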

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |   33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index c1b75e8..2d95e3c 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -456,7 +456,7 @@ rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 }
 
 /**
- * Use crc32 instruction to perform a hash.
+ * Calculate CRC32 hash on user-supplied byte array.
  *
  * @param data
  *   Data to perform hash on.
@@ -471,23 +471,38 @@ static inline uint32_t
 rte_hash_crc(const void *data, uint32_t data_len, uint32_t init_val)
 {
 	unsigned i;
-	uint32_t temp = 0;
-	const uint32_t *p32 = (const uint32_t *)data;
+	uint64_t temp = 0;
+	const uint64_t *p64 = (const uint64_t *)data;
 
-	for (i = 0; i < data_len / 4; i++) {
-		init_val = rte_hash_crc_4byte(*p32++, init_val);
+	for (i = 0; i < data_len / 8; i++) {
+		init_val = rte_hash_crc_8byte(*p64++, init_val);
 	}
 
-	switch (3 - (data_len & 0x03)) {
+	switch (7 - (data_len & 0x07)) {
 	case 0:
-		temp |= *((const uint8_t *)p32 + 2) << 16;
+		temp |= (uint64_t) *((const uint8_t *)p64 + 6) << 48;
 		/* Fallthrough */
 	case 1:
-		temp |= *((const uint8_t *)p32 + 1) << 8;
+		temp |= (uint64_t) *((const uint8_t *)p64 + 5) << 40;
 		/* Fallthrough */
 	case 2:
-		temp |= *((const uint8_t *)p32);
+		temp |= (uint64_t) *((const uint8_t *)p64 + 4) << 32;
+		temp |= *((const uint32_t *)p64);
+		init_val = rte_hash_crc_8byte(temp, init_val);
+		break;
+	case 3:
+		init_val = rte_hash_crc_4byte(*(const uint32_t *)p64, init_val);
+		break;
+	case 4:
+		temp |= *((const uint8_t *)p64 + 2) << 16;
+		/* Fallthrough */
+	case 5:
+		temp |= *((const uint8_t *)p64 + 1) << 8;
+		/* Fallthrough */
+	case 6:
+		temp |= *((const uint8_t *)p64);
 		init_val = rte_hash_crc_4byte(temp, init_val);
+		/* Fallthrough */
 	default:
 		break;
 	}
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [dpdk-dev] [PATCH v3 5/5] test: remove redundant compile checks
  2014-11-18  3:21   ` [dpdk-dev] [PATCH v3 0/5] " Yerden Zhumabekov
                       ` (3 preceding siblings ...)
  2014-11-18  3:21     ` [dpdk-dev] [PATCH v3 4/5] hash: rte_hash_crc() slices data into 8-byte pieces Yerden Zhumabekov
@ 2014-11-18  3:25     ` Yerden Zhumabekov
  4 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-18  3:25 UTC (permalink / raw)
  To: dev

Since rte_hash_crc() can now be run regardless of SSE4.2 support,
we can safely remove compile checks for RTE_MACHINE_CPUFLAG_SSE4_2
in test utilities.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 app/test/test_hash.c      |    7 -------
 app/test/test_hash_perf.c |   11 -----------
 2 files changed, 18 deletions(-)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index 178ec3f..76b1b8f 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -55,10 +55,7 @@
 #include <rte_hash.h>
 #include <rte_fbk_hash.h>
 #include <rte_jhash.h>
-
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 #include <rte_hash_crc.h>
-#endif
 
 /*******************************************************************************
  * Hash function performance test configuration section. Each performance test
@@ -67,11 +64,7 @@
  * The five arrays below control what tests are performed. Every combination
  * from the array entries is tested.
  */
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 static rte_hash_function hashtest_funcs[] = {rte_jhash, rte_hash_crc};
-#else
-static rte_hash_function hashtest_funcs[] = {rte_jhash};
-#endif
 static uint32_t hashtest_initvals[] = {0};
 static uint32_t hashtest_key_lens[] = {0, 2, 4, 5, 6, 7, 8, 10, 11, 15, 16, 21, 31, 32, 33, 63, 64};
 /******************************************************************************/
diff --git a/app/test/test_hash_perf.c b/app/test/test_hash_perf.c
index be34957..05a88ec 100644
--- a/app/test/test_hash_perf.c
+++ b/app/test/test_hash_perf.c
@@ -56,10 +56,7 @@
 #include <rte_hash.h>
 #include <rte_fbk_hash.h>
 #include <rte_jhash.h>
-
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 #include <rte_hash_crc.h>
-#endif
 
 /* Types of hash table performance test that can be performed */
 enum hash_test_t {
@@ -97,11 +94,7 @@ struct tbl_perf_test_params {
  */
 #define HASHTEST_ITERATIONS 1000000
 
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 static rte_hash_function hashtest_funcs[] = {rte_jhash, rte_hash_crc};
-#else
-static rte_hash_function hashtest_funcs[] = {rte_jhash};
-#endif
 static uint32_t hashtest_initvals[] = {0};
 static uint32_t hashtest_key_lens[] = {2, 4, 5, 6, 7, 8, 10, 11, 15, 16, 21, 31, 32, 33, 63, 64};
 /******************************************************************************/
@@ -243,7 +236,6 @@ struct tbl_perf_test_params tbl_perf_params[] =
 {       LOOKUP,  ITERATIONS,  1048576,           4,      64,    rte_jhash,   0},
 {       LOOKUP,  ITERATIONS,  1048576,           8,      64,    rte_jhash,   0},
 {       LOOKUP,  ITERATIONS,  1048576,          16,      64,    rte_jhash,   0},
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 /* Small table, add */
 /*  Test type | Iterations | Entries | BucketSize | KeyLen |    HashFunc | InitVal */
 { ADD_ON_EMPTY,        1024,     1024,           1,      16, rte_hash_crc,   0},
@@ -376,7 +368,6 @@ struct tbl_perf_test_params tbl_perf_params[] =
 {       LOOKUP,  ITERATIONS,  1048576,           4,      64, rte_hash_crc,   0},
 {       LOOKUP,  ITERATIONS,  1048576,           8,      64, rte_hash_crc,   0},
 {       LOOKUP,  ITERATIONS,  1048576,          16,      64, rte_hash_crc,   0},
-#endif
 };
 
 /******************************************************************************/
@@ -423,10 +414,8 @@ static const char *get_hash_name(rte_hash_function f)
 	if (f == rte_jhash)
 		return "jhash";
 
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 	if (f == rte_hash_crc)
 		return "rte_hash_crc";
-#endif
 
 	return "UnknownHash";
 }
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v3 3/5] hash: add fallback to software CRC32 implementation
  2014-11-18  3:21     ` [dpdk-dev] [PATCH v3 3/5] hash: add fallback to software CRC32 implementation Yerden Zhumabekov
@ 2014-11-18  4:56       ` Yerden Zhumabekov
  2014-11-18 13:33         ` Neil Horman
  0 siblings, 1 reply; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-18  4:56 UTC (permalink / raw)
  To: dev

Sorry, maybe I made a mistake here.

According to the code in lib/librte_eal/common/eal_common_cpuflags.c, it
seemed to me that the constructor attribute is not supported by the Intel
compiler. So in that case I decided to keep the autodetection code here.
Am I correct?


-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v3 3/5] hash: add fallback to software CRC32 implementation
  2014-11-18  4:56       ` Yerden Zhumabekov
@ 2014-11-18 13:33         ` Neil Horman
  2014-11-18 13:37           ` Yerden Zhumabekov
  2014-11-18 13:43           ` Thomas Monjalon
  0 siblings, 2 replies; 98+ messages in thread
From: Neil Horman @ 2014-11-18 13:33 UTC (permalink / raw)
  To: Yerden Zhumabekov; +Cc: dev

On Tue, Nov 18, 2014 at 10:56:24AM +0600, Yerden Zhumabekov wrote:
> Sorry, maybe I made a mistake here.
> 
> According to lib/librte_eal/common/eal_common_cpuflags.c code, it seemed
> to me that constructor attribute is not supported by intel compiler. So
> in that case here I decided to leave the code for autodetection. Am I
> correct?
> 

I don't think that's correct. The Intel Compiler claims support for most GCC
features, except where explicitly stated in the release notes, and I don't find
any documentation clearly excepting the constructor attribute from that list.
That said, since the Intel compiler isn't open, I don't have access to it and
cannot confirm either way, though if it's the case, DPDK has a major issue,
as __attribute__((constructor)) is used extensively throughout the code.
Neil
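
For readers unfamiliar with the attribute under discussion, here is a minimal sketch of how __attribute__((constructor)) behaves with GCC-compatible compilers: the marked function runs before main(). The names crc_alg_ready and select_alg_at_startup() are hypothetical, and whether a given Intel compiler release honours the attribute is not settled by this example.

```c
/* 0 until the constructor below has run. Hypothetical flag. */
static int crc_alg_ready;

/* With GCC-compatible compilers, a function marked with the
 * constructor attribute is called during program startup, before
 * main() is entered. A real initialiser would pick the CRC32
 * implementation here; this sketch only records that it ran. */
static void __attribute__((constructor))
select_alg_at_startup(void)
{
	crc_alg_ready = 1;
}
```

Any code reached from main() can then rely on crc_alg_ready being set, which is what makes a per-call autodetect check unnecessary when the attribute is supported.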




^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v3 3/5] hash: add fallback to software CRC32 implementation
  2014-11-18 13:33         ` Neil Horman
@ 2014-11-18 13:37           ` Yerden Zhumabekov
  2014-11-18 13:43           ` Thomas Monjalon
  1 sibling, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-18 13:37 UTC (permalink / raw)
  To: Neil Horman, dev


18.11.2014 19:33, Neil Horman wrote:
> On Tue, Nov 18, 2014 at 10:56:24AM +0600, Yerden Zhumabekov wrote:
>> Sorry, maybe I made a mistake here.
>>
>> According to lib/librte_eal/common/eal_common_cpuflags.c code, it seemed
>> to me that constructor attribute is not supported by intel compiler. So
>> in that case here I decided to leave the code for autodetection. Am I
>> correct?
>>
> I don't think that's correct. The Intel Compiler claims support for most GCC
> features, except where explicitly stated in the release notes, and I don't find
> any documentation clearly excepting the constructor attribute from that list.
> That said, since the Intel compiler isn't open, I don't have access to it and
> cannot confirm either way, though if it's the case, DPDK has a major issue,
> as __attribute__((constructor)) is used extensively throughout the code.
> Neil

My bad. OK, I'll redo it and send the series as 'v4'.
Thanks.

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v3 3/5] hash: add fallback to software CRC32 implementation
  2014-11-18 13:33         ` Neil Horman
  2014-11-18 13:37           ` Yerden Zhumabekov
@ 2014-11-18 13:43           ` Thomas Monjalon
  1 sibling, 0 replies; 98+ messages in thread
From: Thomas Monjalon @ 2014-11-18 13:43 UTC (permalink / raw)
  To: dev

2014-11-18 08:33, Neil Horman:
> On Tue, Nov 18, 2014 at 10:56:24AM +0600, Yerden Zhumabekov wrote:
> > Sorry, maybe I made a mistake here.
> > 
> > According to lib/librte_eal/common/eal_common_cpuflags.c code, it seemed
> > to me that constructor attribute is not supported by intel compiler. So
> > in that case here I decided to leave the code for autodetection. Am I
> > correct?
> 
> I don't think that's correct. The Intel Compiler claims support for most GCC
> features, except where explicitly stated in the release notes, and I don't find
> any documentation clearly excepting the constructor attribute from that list.
> That said, since the Intel compiler isn't open, I don't have access to it and
> cannot confirm either way, though if it's the case, DPDK has a major issue,
> as __attribute__((constructor)) is used extensively throughout the code.

The comment of rte_cpu_check_supported() is:
	"with ICC, the check is generated by the compiler"
So in my understanding, the constructor attribute is not set because this function
isn't needed for ICC.
I'd like to see an explanation of how CPU flags are set by ICC.

-- 
Thomas

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [dpdk-dev] [PATCH v4 0/5] rte_hash_crc reworked to be platform-independent
  2014-09-03  6:05 [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call Yerden Zhumabekov
                   ` (7 preceding siblings ...)
  2014-11-16 17:59 ` [dpdk-dev] [PATCH v2 4/4] hash: rte_hash_crc() slices data into 8-byte pieces Yerden Zhumabekov
@ 2014-11-18 14:03 ` Yerden Zhumabekov
  2014-11-18 14:03   ` [dpdk-dev] [PATCH v4 1/5] hash: add software CRC32 implementation Yerden Zhumabekov
                     ` (4 more replies)
  2014-11-20  5:15 ` [dpdk-dev] [PATCH v5 0/7] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
  2015-01-29  8:48 ` [dpdk-dev] [PATCH v6 " Yerden Zhumabekov
  10 siblings, 5 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-18 14:03 UTC (permalink / raw)
  To: dev

This is a rework of my previous patches improving the performance of rte_hash_crc. In addition, this revision brings a fallback mechanism to ensure that the CRC32 hash is calculated regardless of hardware support from the CPU (i.e. SSE4.2 intrinsics). The performance of the software CRC32 implementation is also improved.

Summary of changes:
* added a CRC32 software implementation, which is used as a fallback in case SSE4.2 is not available, or if SSE4.2 is intentionally disabled.
* added the rte_hash_crc_set_alg() function to control availability of SSE4.2.
* added the rte_hash_crc_8byte() function to calculate CRC32 on an 8-byte operand.
* reworked the rte_hash_crc() function so that it leverages both CRC32 hash calculation functions, with 4- and 8-byte operands.
* removed compile-time checks from test_hash_perf and test_hash.
* the default algorithm implementation is now set by a constructor at application startup.
* compared to v3, the icc-specific code was removed.

Patches were tested on machines both with and without SSE4.2 support. The software implementation seems to be about 4-5 times slower than the SSE4.2-enabled one. Of course, they return identical results.

Yerden Zhumabekov (5):
  hash: add software CRC32 implementation
  hash: add new rte_hash_crc_8byte call
  hash: add fallback to software CRC32 implementation
  hash: rte_hash_crc() slices data into 8-byte pieces
  test: remove redundant compile checks

 app/test/test_hash.c           |    7 -
 app/test/test_hash_perf.c      |   11 --
 lib/librte_hash/rte_hash_crc.h |  416 +++++++++++++++++++++++++++++++++++++++-
 3 files changed, 406 insertions(+), 28 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [dpdk-dev] [PATCH v4 1/5] hash: add software CRC32 implementation
  2014-11-18 14:03 ` [dpdk-dev] [PATCH v4 0/5] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
@ 2014-11-18 14:03   ` Yerden Zhumabekov
  2014-11-18 14:03   ` [dpdk-dev] [PATCH v4 2/5] hash: add new rte_hash_crc_8byte call Yerden Zhumabekov
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-18 14:03 UTC (permalink / raw)
  To: dev

Add lookup tables for the CRC32 algorithm, plus the crc32c_1word() and
crc32c_2words() functions, which return the hash of a 32-bit and a
64-bit operand respectively.
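
For reference, lookup tables of this shape can be generated rather than typed in. The sketch below is one plausible way to do it, assuming the reflected Castagnoli polynomial 0x82F63B78 and the usual slicing layout in which each table advances the previous one by one more zero byte; sketch_tables and crc32c_build_tables() are hypothetical names, and the patch itself ships the values as constants.

```c
#include <stdint.h>

#define CRC32C_POLY 0x82F63B78u	/* reflected Castagnoli polynomial */

/* Hypothetical counterpart of the 8 x 256 constant table. */
static uint32_t sketch_tables[8][256];

/* Fill table 0 with the CRC of every single byte value, then derive
 * table k from table k - 1 by pushing one extra zero byte through. */
static void
crc32c_build_tables(void)
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t crc = i;

		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (CRC32C_POLY & (0u - (crc & 1u)));
		sketch_tables[0][i] = crc;
	}

	for (int k = 1; k < 8; k++) {
		for (uint32_t i = 0; i < 256; i++) {
			uint32_t prev = sketch_tables[k - 1][i];

			sketch_tables[k][i] =
				(prev >> 8) ^ sketch_tables[0][prev & 0xFF];
		}
	}
}
```

The first entries produced this way agree with the constants in the patch, for example 0xF26B8303 at [0][1], 0x13A29877 at [1][1] and 0xA541927E at [2][1].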

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |  316 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 316 insertions(+)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index b48b0db..4d7532a 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -47,6 +47,322 @@ extern "C" {
 #include <stdint.h>
 #include <nmmintrin.h>
 
+/* Lookup tables for software implementation of CRC32C */
+static uint32_t crc32c_tables[8][256] = {{
+ 0x00000000, 0xF26B8303, 0xE13B70F7, 0x1350F3F4, 0xC79A971F, 0x35F1141C, 0x26A1E7E8, 0xD4CA64EB,
+ 0x8AD958CF, 0x78B2DBCC, 0x6BE22838, 0x9989AB3B, 0x4D43CFD0, 0xBF284CD3, 0xAC78BF27, 0x5E133C24,
+ 0x105EC76F, 0xE235446C, 0xF165B798, 0x030E349B, 0xD7C45070, 0x25AFD373, 0x36FF2087, 0xC494A384,
+ 0x9A879FA0, 0x68EC1CA3, 0x7BBCEF57, 0x89D76C54, 0x5D1D08BF, 0xAF768BBC, 0xBC267848, 0x4E4DFB4B,
+ 0x20BD8EDE, 0xD2D60DDD, 0xC186FE29, 0x33ED7D2A, 0xE72719C1, 0x154C9AC2, 0x061C6936, 0xF477EA35,
+ 0xAA64D611, 0x580F5512, 0x4B5FA6E6, 0xB93425E5, 0x6DFE410E, 0x9F95C20D, 0x8CC531F9, 0x7EAEB2FA,
+ 0x30E349B1, 0xC288CAB2, 0xD1D83946, 0x23B3BA45, 0xF779DEAE, 0x05125DAD, 0x1642AE59, 0xE4292D5A,
+ 0xBA3A117E, 0x4851927D, 0x5B016189, 0xA96AE28A, 0x7DA08661, 0x8FCB0562, 0x9C9BF696, 0x6EF07595,
+ 0x417B1DBC, 0xB3109EBF, 0xA0406D4B, 0x522BEE48, 0x86E18AA3, 0x748A09A0, 0x67DAFA54, 0x95B17957,
+ 0xCBA24573, 0x39C9C670, 0x2A993584, 0xD8F2B687, 0x0C38D26C, 0xFE53516F, 0xED03A29B, 0x1F682198,
+ 0x5125DAD3, 0xA34E59D0, 0xB01EAA24, 0x42752927, 0x96BF4DCC, 0x64D4CECF, 0x77843D3B, 0x85EFBE38,
+ 0xDBFC821C, 0x2997011F, 0x3AC7F2EB, 0xC8AC71E8, 0x1C661503, 0xEE0D9600, 0xFD5D65F4, 0x0F36E6F7,
+ 0x61C69362, 0x93AD1061, 0x80FDE395, 0x72966096, 0xA65C047D, 0x5437877E, 0x4767748A, 0xB50CF789,
+ 0xEB1FCBAD, 0x197448AE, 0x0A24BB5A, 0xF84F3859, 0x2C855CB2, 0xDEEEDFB1, 0xCDBE2C45, 0x3FD5AF46,
+ 0x7198540D, 0x83F3D70E, 0x90A324FA, 0x62C8A7F9, 0xB602C312, 0x44694011, 0x5739B3E5, 0xA55230E6,
+ 0xFB410CC2, 0x092A8FC1, 0x1A7A7C35, 0xE811FF36, 0x3CDB9BDD, 0xCEB018DE, 0xDDE0EB2A, 0x2F8B6829,
+ 0x82F63B78, 0x709DB87B, 0x63CD4B8F, 0x91A6C88C, 0x456CAC67, 0xB7072F64, 0xA457DC90, 0x563C5F93,
+ 0x082F63B7, 0xFA44E0B4, 0xE9141340, 0x1B7F9043, 0xCFB5F4A8, 0x3DDE77AB, 0x2E8E845F, 0xDCE5075C,
+ 0x92A8FC17, 0x60C37F14, 0x73938CE0, 0x81F80FE3, 0x55326B08, 0xA759E80B, 0xB4091BFF, 0x466298FC,
+ 0x1871A4D8, 0xEA1A27DB, 0xF94AD42F, 0x0B21572C, 0xDFEB33C7, 0x2D80B0C4, 0x3ED04330, 0xCCBBC033,
+ 0xA24BB5A6, 0x502036A5, 0x4370C551, 0xB11B4652, 0x65D122B9, 0x97BAA1BA, 0x84EA524E, 0x7681D14D,
+ 0x2892ED69, 0xDAF96E6A, 0xC9A99D9E, 0x3BC21E9D, 0xEF087A76, 0x1D63F975, 0x0E330A81, 0xFC588982,
+ 0xB21572C9, 0x407EF1CA, 0x532E023E, 0xA145813D, 0x758FE5D6, 0x87E466D5, 0x94B49521, 0x66DF1622,
+ 0x38CC2A06, 0xCAA7A905, 0xD9F75AF1, 0x2B9CD9F2, 0xFF56BD19, 0x0D3D3E1A, 0x1E6DCDEE, 0xEC064EED,
+ 0xC38D26C4, 0x31E6A5C7, 0x22B65633, 0xD0DDD530, 0x0417B1DB, 0xF67C32D8, 0xE52CC12C, 0x1747422F,
+ 0x49547E0B, 0xBB3FFD08, 0xA86F0EFC, 0x5A048DFF, 0x8ECEE914, 0x7CA56A17, 0x6FF599E3, 0x9D9E1AE0,
+ 0xD3D3E1AB, 0x21B862A8, 0x32E8915C, 0xC083125F, 0x144976B4, 0xE622F5B7, 0xF5720643, 0x07198540,
+ 0x590AB964, 0xAB613A67, 0xB831C993, 0x4A5A4A90, 0x9E902E7B, 0x6CFBAD78, 0x7FAB5E8C, 0x8DC0DD8F,
+ 0xE330A81A, 0x115B2B19, 0x020BD8ED, 0xF0605BEE, 0x24AA3F05, 0xD6C1BC06, 0xC5914FF2, 0x37FACCF1,
+ 0x69E9F0D5, 0x9B8273D6, 0x88D28022, 0x7AB90321, 0xAE7367CA, 0x5C18E4C9, 0x4F48173D, 0xBD23943E,
+ 0xF36E6F75, 0x0105EC76, 0x12551F82, 0xE03E9C81, 0x34F4F86A, 0xC69F7B69, 0xD5CF889D, 0x27A40B9E,
+ 0x79B737BA, 0x8BDCB4B9, 0x988C474D, 0x6AE7C44E, 0xBE2DA0A5, 0x4C4623A6, 0x5F16D052, 0xAD7D5351
+},
+{
+ 0x00000000, 0x13A29877, 0x274530EE, 0x34E7A899, 0x4E8A61DC, 0x5D28F9AB, 0x69CF5132, 0x7A6DC945,
+ 0x9D14C3B8, 0x8EB65BCF, 0xBA51F356, 0xA9F36B21, 0xD39EA264, 0xC03C3A13, 0xF4DB928A, 0xE7790AFD,
+ 0x3FC5F181, 0x2C6769F6, 0x1880C16F, 0x0B225918, 0x714F905D, 0x62ED082A, 0x560AA0B3, 0x45A838C4,
+ 0xA2D13239, 0xB173AA4E, 0x859402D7, 0x96369AA0, 0xEC5B53E5, 0xFFF9CB92, 0xCB1E630B, 0xD8BCFB7C,
+ 0x7F8BE302, 0x6C297B75, 0x58CED3EC, 0x4B6C4B9B, 0x310182DE, 0x22A31AA9, 0x1644B230, 0x05E62A47,
+ 0xE29F20BA, 0xF13DB8CD, 0xC5DA1054, 0xD6788823, 0xAC154166, 0xBFB7D911, 0x8B507188, 0x98F2E9FF,
+ 0x404E1283, 0x53EC8AF4, 0x670B226D, 0x74A9BA1A, 0x0EC4735F, 0x1D66EB28, 0x298143B1, 0x3A23DBC6,
+ 0xDD5AD13B, 0xCEF8494C, 0xFA1FE1D5, 0xE9BD79A2, 0x93D0B0E7, 0x80722890, 0xB4958009, 0xA737187E,
+ 0xFF17C604, 0xECB55E73, 0xD852F6EA, 0xCBF06E9D, 0xB19DA7D8, 0xA23F3FAF, 0x96D89736, 0x857A0F41,
+ 0x620305BC, 0x71A19DCB, 0x45463552, 0x56E4AD25, 0x2C896460, 0x3F2BFC17, 0x0BCC548E, 0x186ECCF9,
+ 0xC0D23785, 0xD370AFF2, 0xE797076B, 0xF4359F1C, 0x8E585659, 0x9DFACE2E, 0xA91D66B7, 0xBABFFEC0,
+ 0x5DC6F43D, 0x4E646C4A, 0x7A83C4D3, 0x69215CA4, 0x134C95E1, 0x00EE0D96, 0x3409A50F, 0x27AB3D78,
+ 0x809C2506, 0x933EBD71, 0xA7D915E8, 0xB47B8D9F, 0xCE1644DA, 0xDDB4DCAD, 0xE9537434, 0xFAF1EC43,
+ 0x1D88E6BE, 0x0E2A7EC9, 0x3ACDD650, 0x296F4E27, 0x53028762, 0x40A01F15, 0x7447B78C, 0x67E52FFB,
+ 0xBF59D487, 0xACFB4CF0, 0x981CE469, 0x8BBE7C1E, 0xF1D3B55B, 0xE2712D2C, 0xD69685B5, 0xC5341DC2,
+ 0x224D173F, 0x31EF8F48, 0x050827D1, 0x16AABFA6, 0x6CC776E3, 0x7F65EE94, 0x4B82460D, 0x5820DE7A,
+ 0xFBC3FAF9, 0xE861628E, 0xDC86CA17, 0xCF245260, 0xB5499B25, 0xA6EB0352, 0x920CABCB, 0x81AE33BC,
+ 0x66D73941, 0x7575A136, 0x419209AF, 0x523091D8, 0x285D589D, 0x3BFFC0EA, 0x0F186873, 0x1CBAF004,
+ 0xC4060B78, 0xD7A4930F, 0xE3433B96, 0xF0E1A3E1, 0x8A8C6AA4, 0x992EF2D3, 0xADC95A4A, 0xBE6BC23D,
+ 0x5912C8C0, 0x4AB050B7, 0x7E57F82E, 0x6DF56059, 0x1798A91C, 0x043A316B, 0x30DD99F2, 0x237F0185,
+ 0x844819FB, 0x97EA818C, 0xA30D2915, 0xB0AFB162, 0xCAC27827, 0xD960E050, 0xED8748C9, 0xFE25D0BE,
+ 0x195CDA43, 0x0AFE4234, 0x3E19EAAD, 0x2DBB72DA, 0x57D6BB9F, 0x447423E8, 0x70938B71, 0x63311306,
+ 0xBB8DE87A, 0xA82F700D, 0x9CC8D894, 0x8F6A40E3, 0xF50789A6, 0xE6A511D1, 0xD242B948, 0xC1E0213F,
+ 0x26992BC2, 0x353BB3B5, 0x01DC1B2C, 0x127E835B, 0x68134A1E, 0x7BB1D269, 0x4F567AF0, 0x5CF4E287,
+ 0x04D43CFD, 0x1776A48A, 0x23910C13, 0x30339464, 0x4A5E5D21, 0x59FCC556, 0x6D1B6DCF, 0x7EB9F5B8,
+ 0x99C0FF45, 0x8A626732, 0xBE85CFAB, 0xAD2757DC, 0xD74A9E99, 0xC4E806EE, 0xF00FAE77, 0xE3AD3600,
+ 0x3B11CD7C, 0x28B3550B, 0x1C54FD92, 0x0FF665E5, 0x759BACA0, 0x663934D7, 0x52DE9C4E, 0x417C0439,
+ 0xA6050EC4, 0xB5A796B3, 0x81403E2A, 0x92E2A65D, 0xE88F6F18, 0xFB2DF76F, 0xCFCA5FF6, 0xDC68C781,
+ 0x7B5FDFFF, 0x68FD4788, 0x5C1AEF11, 0x4FB87766, 0x35D5BE23, 0x26772654, 0x12908ECD, 0x013216BA,
+ 0xE64B1C47, 0xF5E98430, 0xC10E2CA9, 0xD2ACB4DE, 0xA8C17D9B, 0xBB63E5EC, 0x8F844D75, 0x9C26D502,
+ 0x449A2E7E, 0x5738B609, 0x63DF1E90, 0x707D86E7, 0x0A104FA2, 0x19B2D7D5, 0x2D557F4C, 0x3EF7E73B,
+ 0xD98EEDC6, 0xCA2C75B1, 0xFECBDD28, 0xED69455F, 0x97048C1A, 0x84A6146D, 0xB041BCF4, 0xA3E32483
+},
+{
+ 0x00000000, 0xA541927E, 0x4F6F520D, 0xEA2EC073, 0x9EDEA41A, 0x3B9F3664, 0xD1B1F617, 0x74F06469,
+ 0x38513EC5, 0x9D10ACBB, 0x773E6CC8, 0xD27FFEB6, 0xA68F9ADF, 0x03CE08A1, 0xE9E0C8D2, 0x4CA15AAC,
+ 0x70A27D8A, 0xD5E3EFF4, 0x3FCD2F87, 0x9A8CBDF9, 0xEE7CD990, 0x4B3D4BEE, 0xA1138B9D, 0x045219E3,
+ 0x48F3434F, 0xEDB2D131, 0x079C1142, 0xA2DD833C, 0xD62DE755, 0x736C752B, 0x9942B558, 0x3C032726,
+ 0xE144FB14, 0x4405696A, 0xAE2BA919, 0x0B6A3B67, 0x7F9A5F0E, 0xDADBCD70, 0x30F50D03, 0x95B49F7D,
+ 0xD915C5D1, 0x7C5457AF, 0x967A97DC, 0x333B05A2, 0x47CB61CB, 0xE28AF3B5, 0x08A433C6, 0xADE5A1B8,
+ 0x91E6869E, 0x34A714E0, 0xDE89D493, 0x7BC846ED, 0x0F382284, 0xAA79B0FA, 0x40577089, 0xE516E2F7,
+ 0xA9B7B85B, 0x0CF62A25, 0xE6D8EA56, 0x43997828, 0x37691C41, 0x92288E3F, 0x78064E4C, 0xDD47DC32,
+ 0xC76580D9, 0x622412A7, 0x880AD2D4, 0x2D4B40AA, 0x59BB24C3, 0xFCFAB6BD, 0x16D476CE, 0xB395E4B0,
+ 0xFF34BE1C, 0x5A752C62, 0xB05BEC11, 0x151A7E6F, 0x61EA1A06, 0xC4AB8878, 0x2E85480B, 0x8BC4DA75,
+ 0xB7C7FD53, 0x12866F2D, 0xF8A8AF5E, 0x5DE93D20, 0x29195949, 0x8C58CB37, 0x66760B44, 0xC337993A,
+ 0x8F96C396, 0x2AD751E8, 0xC0F9919B, 0x65B803E5, 0x1148678C, 0xB409F5F2, 0x5E273581, 0xFB66A7FF,
+ 0x26217BCD, 0x8360E9B3, 0x694E29C0, 0xCC0FBBBE, 0xB8FFDFD7, 0x1DBE4DA9, 0xF7908DDA, 0x52D11FA4,
+ 0x1E704508, 0xBB31D776, 0x511F1705, 0xF45E857B, 0x80AEE112, 0x25EF736C, 0xCFC1B31F, 0x6A802161,
+ 0x56830647, 0xF3C29439, 0x19EC544A, 0xBCADC634, 0xC85DA25D, 0x6D1C3023, 0x8732F050, 0x2273622E,
+ 0x6ED23882, 0xCB93AAFC, 0x21BD6A8F, 0x84FCF8F1, 0xF00C9C98, 0x554D0EE6, 0xBF63CE95, 0x1A225CEB,
+ 0x8B277743, 0x2E66E53D, 0xC448254E, 0x6109B730, 0x15F9D359, 0xB0B84127, 0x5A968154, 0xFFD7132A,
+ 0xB3764986, 0x1637DBF8, 0xFC191B8B, 0x595889F5, 0x2DA8ED9C, 0x88E97FE2, 0x62C7BF91, 0xC7862DEF,
+ 0xFB850AC9, 0x5EC498B7, 0xB4EA58C4, 0x11ABCABA, 0x655BAED3, 0xC01A3CAD, 0x2A34FCDE, 0x8F756EA0,
+ 0xC3D4340C, 0x6695A672, 0x8CBB6601, 0x29FAF47F, 0x5D0A9016, 0xF84B0268, 0x1265C21B, 0xB7245065,
+ 0x6A638C57, 0xCF221E29, 0x250CDE5A, 0x804D4C24, 0xF4BD284D, 0x51FCBA33, 0xBBD27A40, 0x1E93E83E,
+ 0x5232B292, 0xF77320EC, 0x1D5DE09F, 0xB81C72E1, 0xCCEC1688, 0x69AD84F6, 0x83834485, 0x26C2D6FB,
+ 0x1AC1F1DD, 0xBF8063A3, 0x55AEA3D0, 0xF0EF31AE, 0x841F55C7, 0x215EC7B9, 0xCB7007CA, 0x6E3195B4,
+ 0x2290CF18, 0x87D15D66, 0x6DFF9D15, 0xC8BE0F6B, 0xBC4E6B02, 0x190FF97C, 0xF321390F, 0x5660AB71,
+ 0x4C42F79A, 0xE90365E4, 0x032DA597, 0xA66C37E9, 0xD29C5380, 0x77DDC1FE, 0x9DF3018D, 0x38B293F3,
+ 0x7413C95F, 0xD1525B21, 0x3B7C9B52, 0x9E3D092C, 0xEACD6D45, 0x4F8CFF3B, 0xA5A23F48, 0x00E3AD36,
+ 0x3CE08A10, 0x99A1186E, 0x738FD81D, 0xD6CE4A63, 0xA23E2E0A, 0x077FBC74, 0xED517C07, 0x4810EE79,
+ 0x04B1B4D5, 0xA1F026AB, 0x4BDEE6D8, 0xEE9F74A6, 0x9A6F10CF, 0x3F2E82B1, 0xD50042C2, 0x7041D0BC,
+ 0xAD060C8E, 0x08479EF0, 0xE2695E83, 0x4728CCFD, 0x33D8A894, 0x96993AEA, 0x7CB7FA99, 0xD9F668E7,
+ 0x9557324B, 0x3016A035, 0xDA386046, 0x7F79F238, 0x0B899651, 0xAEC8042F, 0x44E6C45C, 0xE1A75622,
+ 0xDDA47104, 0x78E5E37A, 0x92CB2309, 0x378AB177, 0x437AD51E, 0xE63B4760, 0x0C158713, 0xA954156D,
+ 0xE5F54FC1, 0x40B4DDBF, 0xAA9A1DCC, 0x0FDB8FB2, 0x7B2BEBDB, 0xDE6A79A5, 0x3444B9D6, 0x91052BA8
+},
+{
+ 0x00000000, 0xDD45AAB8, 0xBF672381, 0x62228939, 0x7B2231F3, 0xA6679B4B, 0xC4451272, 0x1900B8CA,
+ 0xF64463E6, 0x2B01C95E, 0x49234067, 0x9466EADF, 0x8D665215, 0x5023F8AD, 0x32017194, 0xEF44DB2C,
+ 0xE964B13D, 0x34211B85, 0x560392BC, 0x8B463804, 0x924680CE, 0x4F032A76, 0x2D21A34F, 0xF06409F7,
+ 0x1F20D2DB, 0xC2657863, 0xA047F15A, 0x7D025BE2, 0x6402E328, 0xB9474990, 0xDB65C0A9, 0x06206A11,
+ 0xD725148B, 0x0A60BE33, 0x6842370A, 0xB5079DB2, 0xAC072578, 0x71428FC0, 0x136006F9, 0xCE25AC41,
+ 0x2161776D, 0xFC24DDD5, 0x9E0654EC, 0x4343FE54, 0x5A43469E, 0x8706EC26, 0xE524651F, 0x3861CFA7,
+ 0x3E41A5B6, 0xE3040F0E, 0x81268637, 0x5C632C8F, 0x45639445, 0x98263EFD, 0xFA04B7C4, 0x27411D7C,
+ 0xC805C650, 0x15406CE8, 0x7762E5D1, 0xAA274F69, 0xB327F7A3, 0x6E625D1B, 0x0C40D422, 0xD1057E9A,
+ 0xABA65FE7, 0x76E3F55F, 0x14C17C66, 0xC984D6DE, 0xD0846E14, 0x0DC1C4AC, 0x6FE34D95, 0xB2A6E72D,
+ 0x5DE23C01, 0x80A796B9, 0xE2851F80, 0x3FC0B538, 0x26C00DF2, 0xFB85A74A, 0x99A72E73, 0x44E284CB,
+ 0x42C2EEDA, 0x9F874462, 0xFDA5CD5B, 0x20E067E3, 0x39E0DF29, 0xE4A57591, 0x8687FCA8, 0x5BC25610,
+ 0xB4868D3C, 0x69C32784, 0x0BE1AEBD, 0xD6A40405, 0xCFA4BCCF, 0x12E11677, 0x70C39F4E, 0xAD8635F6,
+ 0x7C834B6C, 0xA1C6E1D4, 0xC3E468ED, 0x1EA1C255, 0x07A17A9F, 0xDAE4D027, 0xB8C6591E, 0x6583F3A6,
+ 0x8AC7288A, 0x57828232, 0x35A00B0B, 0xE8E5A1B3, 0xF1E51979, 0x2CA0B3C1, 0x4E823AF8, 0x93C79040,
+ 0x95E7FA51, 0x48A250E9, 0x2A80D9D0, 0xF7C57368, 0xEEC5CBA2, 0x3380611A, 0x51A2E823, 0x8CE7429B,
+ 0x63A399B7, 0xBEE6330F, 0xDCC4BA36, 0x0181108E, 0x1881A844, 0xC5C402FC, 0xA7E68BC5, 0x7AA3217D,
+ 0x52A0C93F, 0x8FE56387, 0xEDC7EABE, 0x30824006, 0x2982F8CC, 0xF4C75274, 0x96E5DB4D, 0x4BA071F5,
+ 0xA4E4AAD9, 0x79A10061, 0x1B838958, 0xC6C623E0, 0xDFC69B2A, 0x02833192, 0x60A1B8AB, 0xBDE41213,
+ 0xBBC47802, 0x6681D2BA, 0x04A35B83, 0xD9E6F13B, 0xC0E649F1, 0x1DA3E349, 0x7F816A70, 0xA2C4C0C8,
+ 0x4D801BE4, 0x90C5B15C, 0xF2E73865, 0x2FA292DD, 0x36A22A17, 0xEBE780AF, 0x89C50996, 0x5480A32E,
+ 0x8585DDB4, 0x58C0770C, 0x3AE2FE35, 0xE7A7548D, 0xFEA7EC47, 0x23E246FF, 0x41C0CFC6, 0x9C85657E,
+ 0x73C1BE52, 0xAE8414EA, 0xCCA69DD3, 0x11E3376B, 0x08E38FA1, 0xD5A62519, 0xB784AC20, 0x6AC10698,
+ 0x6CE16C89, 0xB1A4C631, 0xD3864F08, 0x0EC3E5B0, 0x17C35D7A, 0xCA86F7C2, 0xA8A47EFB, 0x75E1D443,
+ 0x9AA50F6F, 0x47E0A5D7, 0x25C22CEE, 0xF8878656, 0xE1873E9C, 0x3CC29424, 0x5EE01D1D, 0x83A5B7A5,
+ 0xF90696D8, 0x24433C60, 0x4661B559, 0x9B241FE1, 0x8224A72B, 0x5F610D93, 0x3D4384AA, 0xE0062E12,
+ 0x0F42F53E, 0xD2075F86, 0xB025D6BF, 0x6D607C07, 0x7460C4CD, 0xA9256E75, 0xCB07E74C, 0x16424DF4,
+ 0x106227E5, 0xCD278D5D, 0xAF050464, 0x7240AEDC, 0x6B401616, 0xB605BCAE, 0xD4273597, 0x09629F2F,
+ 0xE6264403, 0x3B63EEBB, 0x59416782, 0x8404CD3A, 0x9D0475F0, 0x4041DF48, 0x22635671, 0xFF26FCC9,
+ 0x2E238253, 0xF36628EB, 0x9144A1D2, 0x4C010B6A, 0x5501B3A0, 0x88441918, 0xEA669021, 0x37233A99,
+ 0xD867E1B5, 0x05224B0D, 0x6700C234, 0xBA45688C, 0xA345D046, 0x7E007AFE, 0x1C22F3C7, 0xC167597F,
+ 0xC747336E, 0x1A0299D6, 0x782010EF, 0xA565BA57, 0xBC65029D, 0x6120A825, 0x0302211C, 0xDE478BA4,
+ 0x31035088, 0xEC46FA30, 0x8E647309, 0x5321D9B1, 0x4A21617B, 0x9764CBC3, 0xF54642FA, 0x2803E842
+},
+{
+ 0x00000000, 0x38116FAC, 0x7022DF58, 0x4833B0F4, 0xE045BEB0, 0xD854D11C, 0x906761E8, 0xA8760E44,
+ 0xC5670B91, 0xFD76643D, 0xB545D4C9, 0x8D54BB65, 0x2522B521, 0x1D33DA8D, 0x55006A79, 0x6D1105D5,
+ 0x8F2261D3, 0xB7330E7F, 0xFF00BE8B, 0xC711D127, 0x6F67DF63, 0x5776B0CF, 0x1F45003B, 0x27546F97,
+ 0x4A456A42, 0x725405EE, 0x3A67B51A, 0x0276DAB6, 0xAA00D4F2, 0x9211BB5E, 0xDA220BAA, 0xE2336406,
+ 0x1BA8B557, 0x23B9DAFB, 0x6B8A6A0F, 0x539B05A3, 0xFBED0BE7, 0xC3FC644B, 0x8BCFD4BF, 0xB3DEBB13,
+ 0xDECFBEC6, 0xE6DED16A, 0xAEED619E, 0x96FC0E32, 0x3E8A0076, 0x069B6FDA, 0x4EA8DF2E, 0x76B9B082,
+ 0x948AD484, 0xAC9BBB28, 0xE4A80BDC, 0xDCB96470, 0x74CF6A34, 0x4CDE0598, 0x04EDB56C, 0x3CFCDAC0,
+ 0x51EDDF15, 0x69FCB0B9, 0x21CF004D, 0x19DE6FE1, 0xB1A861A5, 0x89B90E09, 0xC18ABEFD, 0xF99BD151,
+ 0x37516AAE, 0x0F400502, 0x4773B5F6, 0x7F62DA5A, 0xD714D41E, 0xEF05BBB2, 0xA7360B46, 0x9F2764EA,
+ 0xF236613F, 0xCA270E93, 0x8214BE67, 0xBA05D1CB, 0x1273DF8F, 0x2A62B023, 0x625100D7, 0x5A406F7B,
+ 0xB8730B7D, 0x806264D1, 0xC851D425, 0xF040BB89, 0x5836B5CD, 0x6027DA61, 0x28146A95, 0x10050539,
+ 0x7D1400EC, 0x45056F40, 0x0D36DFB4, 0x3527B018, 0x9D51BE5C, 0xA540D1F0, 0xED736104, 0xD5620EA8,
+ 0x2CF9DFF9, 0x14E8B055, 0x5CDB00A1, 0x64CA6F0D, 0xCCBC6149, 0xF4AD0EE5, 0xBC9EBE11, 0x848FD1BD,
+ 0xE99ED468, 0xD18FBBC4, 0x99BC0B30, 0xA1AD649C, 0x09DB6AD8, 0x31CA0574, 0x79F9B580, 0x41E8DA2C,
+ 0xA3DBBE2A, 0x9BCAD186, 0xD3F96172, 0xEBE80EDE, 0x439E009A, 0x7B8F6F36, 0x33BCDFC2, 0x0BADB06E,
+ 0x66BCB5BB, 0x5EADDA17, 0x169E6AE3, 0x2E8F054F, 0x86F90B0B, 0xBEE864A7, 0xF6DBD453, 0xCECABBFF,
+ 0x6EA2D55C, 0x56B3BAF0, 0x1E800A04, 0x269165A8, 0x8EE76BEC, 0xB6F60440, 0xFEC5B4B4, 0xC6D4DB18,
+ 0xABC5DECD, 0x93D4B161, 0xDBE70195, 0xE3F66E39, 0x4B80607D, 0x73910FD1, 0x3BA2BF25, 0x03B3D089,
+ 0xE180B48F, 0xD991DB23, 0x91A26BD7, 0xA9B3047B, 0x01C50A3F, 0x39D46593, 0x71E7D567, 0x49F6BACB,
+ 0x24E7BF1E, 0x1CF6D0B2, 0x54C56046, 0x6CD40FEA, 0xC4A201AE, 0xFCB36E02, 0xB480DEF6, 0x8C91B15A,
+ 0x750A600B, 0x4D1B0FA7, 0x0528BF53, 0x3D39D0FF, 0x954FDEBB, 0xAD5EB117, 0xE56D01E3, 0xDD7C6E4F,
+ 0xB06D6B9A, 0x887C0436, 0xC04FB4C2, 0xF85EDB6E, 0x5028D52A, 0x6839BA86, 0x200A0A72, 0x181B65DE,
+ 0xFA2801D8, 0xC2396E74, 0x8A0ADE80, 0xB21BB12C, 0x1A6DBF68, 0x227CD0C4, 0x6A4F6030, 0x525E0F9C,
+ 0x3F4F0A49, 0x075E65E5, 0x4F6DD511, 0x777CBABD, 0xDF0AB4F9, 0xE71BDB55, 0xAF286BA1, 0x9739040D,
+ 0x59F3BFF2, 0x61E2D05E, 0x29D160AA, 0x11C00F06, 0xB9B60142, 0x81A76EEE, 0xC994DE1A, 0xF185B1B6,
+ 0x9C94B463, 0xA485DBCF, 0xECB66B3B, 0xD4A70497, 0x7CD10AD3, 0x44C0657F, 0x0CF3D58B, 0x34E2BA27,
+ 0xD6D1DE21, 0xEEC0B18D, 0xA6F30179, 0x9EE26ED5, 0x36946091, 0x0E850F3D, 0x46B6BFC9, 0x7EA7D065,
+ 0x13B6D5B0, 0x2BA7BA1C, 0x63940AE8, 0x5B856544, 0xF3F36B00, 0xCBE204AC, 0x83D1B458, 0xBBC0DBF4,
+ 0x425B0AA5, 0x7A4A6509, 0x3279D5FD, 0x0A68BA51, 0xA21EB415, 0x9A0FDBB9, 0xD23C6B4D, 0xEA2D04E1,
+ 0x873C0134, 0xBF2D6E98, 0xF71EDE6C, 0xCF0FB1C0, 0x6779BF84, 0x5F68D028, 0x175B60DC, 0x2F4A0F70,
+ 0xCD796B76, 0xF56804DA, 0xBD5BB42E, 0x854ADB82, 0x2D3CD5C6, 0x152DBA6A, 0x5D1E0A9E, 0x650F6532,
+ 0x081E60E7, 0x300F0F4B, 0x783CBFBF, 0x402DD013, 0xE85BDE57, 0xD04AB1FB, 0x9879010F, 0xA0686EA3
+},
+{
+ 0x00000000, 0xEF306B19, 0xDB8CA0C3, 0x34BCCBDA, 0xB2F53777, 0x5DC55C6E, 0x697997B4, 0x8649FCAD,
+ 0x6006181F, 0x8F367306, 0xBB8AB8DC, 0x54BAD3C5, 0xD2F32F68, 0x3DC34471, 0x097F8FAB, 0xE64FE4B2,
+ 0xC00C303E, 0x2F3C5B27, 0x1B8090FD, 0xF4B0FBE4, 0x72F90749, 0x9DC96C50, 0xA975A78A, 0x4645CC93,
+ 0xA00A2821, 0x4F3A4338, 0x7B8688E2, 0x94B6E3FB, 0x12FF1F56, 0xFDCF744F, 0xC973BF95, 0x2643D48C,
+ 0x85F4168D, 0x6AC47D94, 0x5E78B64E, 0xB148DD57, 0x370121FA, 0xD8314AE3, 0xEC8D8139, 0x03BDEA20,
+ 0xE5F20E92, 0x0AC2658B, 0x3E7EAE51, 0xD14EC548, 0x570739E5, 0xB83752FC, 0x8C8B9926, 0x63BBF23F,
+ 0x45F826B3, 0xAAC84DAA, 0x9E748670, 0x7144ED69, 0xF70D11C4, 0x183D7ADD, 0x2C81B107, 0xC3B1DA1E,
+ 0x25FE3EAC, 0xCACE55B5, 0xFE729E6F, 0x1142F576, 0x970B09DB, 0x783B62C2, 0x4C87A918, 0xA3B7C201,
+ 0x0E045BEB, 0xE13430F2, 0xD588FB28, 0x3AB89031, 0xBCF16C9C, 0x53C10785, 0x677DCC5F, 0x884DA746,
+ 0x6E0243F4, 0x813228ED, 0xB58EE337, 0x5ABE882E, 0xDCF77483, 0x33C71F9A, 0x077BD440, 0xE84BBF59,
+ 0xCE086BD5, 0x213800CC, 0x1584CB16, 0xFAB4A00F, 0x7CFD5CA2, 0x93CD37BB, 0xA771FC61, 0x48419778,
+ 0xAE0E73CA, 0x413E18D3, 0x7582D309, 0x9AB2B810, 0x1CFB44BD, 0xF3CB2FA4, 0xC777E47E, 0x28478F67,
+ 0x8BF04D66, 0x64C0267F, 0x507CEDA5, 0xBF4C86BC, 0x39057A11, 0xD6351108, 0xE289DAD2, 0x0DB9B1CB,
+ 0xEBF65579, 0x04C63E60, 0x307AF5BA, 0xDF4A9EA3, 0x5903620E, 0xB6330917, 0x828FC2CD, 0x6DBFA9D4,
+ 0x4BFC7D58, 0xA4CC1641, 0x9070DD9B, 0x7F40B682, 0xF9094A2F, 0x16392136, 0x2285EAEC, 0xCDB581F5,
+ 0x2BFA6547, 0xC4CA0E5E, 0xF076C584, 0x1F46AE9D, 0x990F5230, 0x763F3929, 0x4283F2F3, 0xADB399EA,
+ 0x1C08B7D6, 0xF338DCCF, 0xC7841715, 0x28B47C0C, 0xAEFD80A1, 0x41CDEBB8, 0x75712062, 0x9A414B7B,
+ 0x7C0EAFC9, 0x933EC4D0, 0xA7820F0A, 0x48B26413, 0xCEFB98BE, 0x21CBF3A7, 0x1577387D, 0xFA475364,
+ 0xDC0487E8, 0x3334ECF1, 0x0788272B, 0xE8B84C32, 0x6EF1B09F, 0x81C1DB86, 0xB57D105C, 0x5A4D7B45,
+ 0xBC029FF7, 0x5332F4EE, 0x678E3F34, 0x88BE542D, 0x0EF7A880, 0xE1C7C399, 0xD57B0843, 0x3A4B635A,
+ 0x99FCA15B, 0x76CCCA42, 0x42700198, 0xAD406A81, 0x2B09962C, 0xC439FD35, 0xF08536EF, 0x1FB55DF6,
+ 0xF9FAB944, 0x16CAD25D, 0x22761987, 0xCD46729E, 0x4B0F8E33, 0xA43FE52A, 0x90832EF0, 0x7FB345E9,
+ 0x59F09165, 0xB6C0FA7C, 0x827C31A6, 0x6D4C5ABF, 0xEB05A612, 0x0435CD0B, 0x308906D1, 0xDFB96DC8,
+ 0x39F6897A, 0xD6C6E263, 0xE27A29B9, 0x0D4A42A0, 0x8B03BE0D, 0x6433D514, 0x508F1ECE, 0xBFBF75D7,
+ 0x120CEC3D, 0xFD3C8724, 0xC9804CFE, 0x26B027E7, 0xA0F9DB4A, 0x4FC9B053, 0x7B757B89, 0x94451090,
+ 0x720AF422, 0x9D3A9F3B, 0xA98654E1, 0x46B63FF8, 0xC0FFC355, 0x2FCFA84C, 0x1B736396, 0xF443088F,
+ 0xD200DC03, 0x3D30B71A, 0x098C7CC0, 0xE6BC17D9, 0x60F5EB74, 0x8FC5806D, 0xBB794BB7, 0x544920AE,
+ 0xB206C41C, 0x5D36AF05, 0x698A64DF, 0x86BA0FC6, 0x00F3F36B, 0xEFC39872, 0xDB7F53A8, 0x344F38B1,
+ 0x97F8FAB0, 0x78C891A9, 0x4C745A73, 0xA344316A, 0x250DCDC7, 0xCA3DA6DE, 0xFE816D04, 0x11B1061D,
+ 0xF7FEE2AF, 0x18CE89B6, 0x2C72426C, 0xC3422975, 0x450BD5D8, 0xAA3BBEC1, 0x9E87751B, 0x71B71E02,
+ 0x57F4CA8E, 0xB8C4A197, 0x8C786A4D, 0x63480154, 0xE501FDF9, 0x0A3196E0, 0x3E8D5D3A, 0xD1BD3623,
+ 0x37F2D291, 0xD8C2B988, 0xEC7E7252, 0x034E194B, 0x8507E5E6, 0x6A378EFF, 0x5E8B4525, 0xB1BB2E3C
+},
+{
+ 0x00000000, 0x68032CC8, 0xD0065990, 0xB8057558, 0xA5E0C5D1, 0xCDE3E919, 0x75E69C41, 0x1DE5B089,
+ 0x4E2DFD53, 0x262ED19B, 0x9E2BA4C3, 0xF628880B, 0xEBCD3882, 0x83CE144A, 0x3BCB6112, 0x53C84DDA,
+ 0x9C5BFAA6, 0xF458D66E, 0x4C5DA336, 0x245E8FFE, 0x39BB3F77, 0x51B813BF, 0xE9BD66E7, 0x81BE4A2F,
+ 0xD27607F5, 0xBA752B3D, 0x02705E65, 0x6A7372AD, 0x7796C224, 0x1F95EEEC, 0xA7909BB4, 0xCF93B77C,
+ 0x3D5B83BD, 0x5558AF75, 0xED5DDA2D, 0x855EF6E5, 0x98BB466C, 0xF0B86AA4, 0x48BD1FFC, 0x20BE3334,
+ 0x73767EEE, 0x1B755226, 0xA370277E, 0xCB730BB6, 0xD696BB3F, 0xBE9597F7, 0x0690E2AF, 0x6E93CE67,
+ 0xA100791B, 0xC90355D3, 0x7106208B, 0x19050C43, 0x04E0BCCA, 0x6CE39002, 0xD4E6E55A, 0xBCE5C992,
+ 0xEF2D8448, 0x872EA880, 0x3F2BDDD8, 0x5728F110, 0x4ACD4199, 0x22CE6D51, 0x9ACB1809, 0xF2C834C1,
+ 0x7AB7077A, 0x12B42BB2, 0xAAB15EEA, 0xC2B27222, 0xDF57C2AB, 0xB754EE63, 0x0F519B3B, 0x6752B7F3,
+ 0x349AFA29, 0x5C99D6E1, 0xE49CA3B9, 0x8C9F8F71, 0x917A3FF8, 0xF9791330, 0x417C6668, 0x297F4AA0,
+ 0xE6ECFDDC, 0x8EEFD114, 0x36EAA44C, 0x5EE98884, 0x430C380D, 0x2B0F14C5, 0x930A619D, 0xFB094D55,
+ 0xA8C1008F, 0xC0C22C47, 0x78C7591F, 0x10C475D7, 0x0D21C55E, 0x6522E996, 0xDD279CCE, 0xB524B006,
+ 0x47EC84C7, 0x2FEFA80F, 0x97EADD57, 0xFFE9F19F, 0xE20C4116, 0x8A0F6DDE, 0x320A1886, 0x5A09344E,
+ 0x09C17994, 0x61C2555C, 0xD9C72004, 0xB1C40CCC, 0xAC21BC45, 0xC422908D, 0x7C27E5D5, 0x1424C91D,
+ 0xDBB77E61, 0xB3B452A9, 0x0BB127F1, 0x63B20B39, 0x7E57BBB0, 0x16549778, 0xAE51E220, 0xC652CEE8,
+ 0x959A8332, 0xFD99AFFA, 0x459CDAA2, 0x2D9FF66A, 0x307A46E3, 0x58796A2B, 0xE07C1F73, 0x887F33BB,
+ 0xF56E0EF4, 0x9D6D223C, 0x25685764, 0x4D6B7BAC, 0x508ECB25, 0x388DE7ED, 0x808892B5, 0xE88BBE7D,
+ 0xBB43F3A7, 0xD340DF6F, 0x6B45AA37, 0x034686FF, 0x1EA33676, 0x76A01ABE, 0xCEA56FE6, 0xA6A6432E,
+ 0x6935F452, 0x0136D89A, 0xB933ADC2, 0xD130810A, 0xCCD53183, 0xA4D61D4B, 0x1CD36813, 0x74D044DB,
+ 0x27180901, 0x4F1B25C9, 0xF71E5091, 0x9F1D7C59, 0x82F8CCD0, 0xEAFBE018, 0x52FE9540, 0x3AFDB988,
+ 0xC8358D49, 0xA036A181, 0x1833D4D9, 0x7030F811, 0x6DD54898, 0x05D66450, 0xBDD31108, 0xD5D03DC0,
+ 0x8618701A, 0xEE1B5CD2, 0x561E298A, 0x3E1D0542, 0x23F8B5CB, 0x4BFB9903, 0xF3FEEC5B, 0x9BFDC093,
+ 0x546E77EF, 0x3C6D5B27, 0x84682E7F, 0xEC6B02B7, 0xF18EB23E, 0x998D9EF6, 0x2188EBAE, 0x498BC766,
+ 0x1A438ABC, 0x7240A674, 0xCA45D32C, 0xA246FFE4, 0xBFA34F6D, 0xD7A063A5, 0x6FA516FD, 0x07A63A35,
+ 0x8FD9098E, 0xE7DA2546, 0x5FDF501E, 0x37DC7CD6, 0x2A39CC5F, 0x423AE097, 0xFA3F95CF, 0x923CB907,
+ 0xC1F4F4DD, 0xA9F7D815, 0x11F2AD4D, 0x79F18185, 0x6414310C, 0x0C171DC4, 0xB412689C, 0xDC114454,
+ 0x1382F328, 0x7B81DFE0, 0xC384AAB8, 0xAB878670, 0xB66236F9, 0xDE611A31, 0x66646F69, 0x0E6743A1,
+ 0x5DAF0E7B, 0x35AC22B3, 0x8DA957EB, 0xE5AA7B23, 0xF84FCBAA, 0x904CE762, 0x2849923A, 0x404ABEF2,
+ 0xB2828A33, 0xDA81A6FB, 0x6284D3A3, 0x0A87FF6B, 0x17624FE2, 0x7F61632A, 0xC7641672, 0xAF673ABA,
+ 0xFCAF7760, 0x94AC5BA8, 0x2CA92EF0, 0x44AA0238, 0x594FB2B1, 0x314C9E79, 0x8949EB21, 0xE14AC7E9,
+ 0x2ED97095, 0x46DA5C5D, 0xFEDF2905, 0x96DC05CD, 0x8B39B544, 0xE33A998C, 0x5B3FECD4, 0x333CC01C,
+ 0x60F48DC6, 0x08F7A10E, 0xB0F2D456, 0xD8F1F89E, 0xC5144817, 0xAD1764DF, 0x15121187, 0x7D113D4F
+},
+{
+ 0x00000000, 0x493C7D27, 0x9278FA4E, 0xDB448769, 0x211D826D, 0x6821FF4A, 0xB3657823, 0xFA590504,
+ 0x423B04DA, 0x0B0779FD, 0xD043FE94, 0x997F83B3, 0x632686B7, 0x2A1AFB90, 0xF15E7CF9, 0xB86201DE,
+ 0x847609B4, 0xCD4A7493, 0x160EF3FA, 0x5F328EDD, 0xA56B8BD9, 0xEC57F6FE, 0x37137197, 0x7E2F0CB0,
+ 0xC64D0D6E, 0x8F717049, 0x5435F720, 0x1D098A07, 0xE7508F03, 0xAE6CF224, 0x7528754D, 0x3C14086A,
+ 0x0D006599, 0x443C18BE, 0x9F789FD7, 0xD644E2F0, 0x2C1DE7F4, 0x65219AD3, 0xBE651DBA, 0xF759609D,
+ 0x4F3B6143, 0x06071C64, 0xDD439B0D, 0x947FE62A, 0x6E26E32E, 0x271A9E09, 0xFC5E1960, 0xB5626447,
+ 0x89766C2D, 0xC04A110A, 0x1B0E9663, 0x5232EB44, 0xA86BEE40, 0xE1579367, 0x3A13140E, 0x732F6929,
+ 0xCB4D68F7, 0x827115D0, 0x593592B9, 0x1009EF9E, 0xEA50EA9A, 0xA36C97BD, 0x782810D4, 0x31146DF3,
+ 0x1A00CB32, 0x533CB615, 0x8878317C, 0xC1444C5B, 0x3B1D495F, 0x72213478, 0xA965B311, 0xE059CE36,
+ 0x583BCFE8, 0x1107B2CF, 0xCA4335A6, 0x837F4881, 0x79264D85, 0x301A30A2, 0xEB5EB7CB, 0xA262CAEC,
+ 0x9E76C286, 0xD74ABFA1, 0x0C0E38C8, 0x453245EF, 0xBF6B40EB, 0xF6573DCC, 0x2D13BAA5, 0x642FC782,
+ 0xDC4DC65C, 0x9571BB7B, 0x4E353C12, 0x07094135, 0xFD504431, 0xB46C3916, 0x6F28BE7F, 0x2614C358,
+ 0x1700AEAB, 0x5E3CD38C, 0x857854E5, 0xCC4429C2, 0x361D2CC6, 0x7F2151E1, 0xA465D688, 0xED59ABAF,
+ 0x553BAA71, 0x1C07D756, 0xC743503F, 0x8E7F2D18, 0x7426281C, 0x3D1A553B, 0xE65ED252, 0xAF62AF75,
+ 0x9376A71F, 0xDA4ADA38, 0x010E5D51, 0x48322076, 0xB26B2572, 0xFB575855, 0x2013DF3C, 0x692FA21B,
+ 0xD14DA3C5, 0x9871DEE2, 0x4335598B, 0x0A0924AC, 0xF05021A8, 0xB96C5C8F, 0x6228DBE6, 0x2B14A6C1,
+ 0x34019664, 0x7D3DEB43, 0xA6796C2A, 0xEF45110D, 0x151C1409, 0x5C20692E, 0x8764EE47, 0xCE589360,
+ 0x763A92BE, 0x3F06EF99, 0xE44268F0, 0xAD7E15D7, 0x572710D3, 0x1E1B6DF4, 0xC55FEA9D, 0x8C6397BA,
+ 0xB0779FD0, 0xF94BE2F7, 0x220F659E, 0x6B3318B9, 0x916A1DBD, 0xD856609A, 0x0312E7F3, 0x4A2E9AD4,
+ 0xF24C9B0A, 0xBB70E62D, 0x60346144, 0x29081C63, 0xD3511967, 0x9A6D6440, 0x4129E329, 0x08159E0E,
+ 0x3901F3FD, 0x703D8EDA, 0xAB7909B3, 0xE2457494, 0x181C7190, 0x51200CB7, 0x8A648BDE, 0xC358F6F9,
+ 0x7B3AF727, 0x32068A00, 0xE9420D69, 0xA07E704E, 0x5A27754A, 0x131B086D, 0xC85F8F04, 0x8163F223,
+ 0xBD77FA49, 0xF44B876E, 0x2F0F0007, 0x66337D20, 0x9C6A7824, 0xD5560503, 0x0E12826A, 0x472EFF4D,
+ 0xFF4CFE93, 0xB67083B4, 0x6D3404DD, 0x240879FA, 0xDE517CFE, 0x976D01D9, 0x4C2986B0, 0x0515FB97,
+ 0x2E015D56, 0x673D2071, 0xBC79A718, 0xF545DA3F, 0x0F1CDF3B, 0x4620A21C, 0x9D642575, 0xD4585852,
+ 0x6C3A598C, 0x250624AB, 0xFE42A3C2, 0xB77EDEE5, 0x4D27DBE1, 0x041BA6C6, 0xDF5F21AF, 0x96635C88,
+ 0xAA7754E2, 0xE34B29C5, 0x380FAEAC, 0x7133D38B, 0x8B6AD68F, 0xC256ABA8, 0x19122CC1, 0x502E51E6,
+ 0xE84C5038, 0xA1702D1F, 0x7A34AA76, 0x3308D751, 0xC951D255, 0x806DAF72, 0x5B29281B, 0x1215553C,
+ 0x230138CF, 0x6A3D45E8, 0xB179C281, 0xF845BFA6, 0x021CBAA2, 0x4B20C785, 0x906440EC, 0xD9583DCB,
+ 0x613A3C15, 0x28064132, 0xF342C65B, 0xBA7EBB7C, 0x4027BE78, 0x091BC35F, 0xD25F4436, 0x9B633911,
+ 0xA777317B, 0xEE4B4C5C, 0x350FCB35, 0x7C33B612, 0x866AB316, 0xCF56CE31, 0x14124958, 0x5D2E347F,
+ 0xE54C35A1, 0xAC704886, 0x7734CFEF, 0x3E08B2C8, 0xC451B7CC, 0x8D6DCAEB, 0x56294D82, 0x1F1530A5
+}};
+
+#define CRC32_UPD(crc, n) \
+	(crc32c_tables[(n)][(crc) & 0xFF] ^ \
+	 crc32c_tables[(n)-1][((crc) >> 8) & 0xFF])
+
+static inline uint32_t
+crc32c_1word(uint32_t data, uint32_t init_val)
+{
+	uint32_t crc, term1, term2;
+	crc = init_val;
+	crc ^= data;
+
+	term1 = CRC32_UPD(crc, 3);
+	term2 = crc >> 16;
+	crc = term1 ^ CRC32_UPD(term2, 1);
+
+	return crc;
+}
+
+static inline uint32_t
+crc32c_2words(uint64_t data, uint32_t init_val)
+{
+	union {
+		uint64_t u64;
+		uint32_t u32[2];
+	} d;
+	d.u64 = data;
+
+	uint32_t crc, term1, term2;
+
+	crc = init_val;
+	crc ^= d.u32[0];
+
+	term1 = CRC32_UPD(crc, 7);
+	term2 = crc >> 16;
+	crc = term1 ^ CRC32_UPD(term2, 5);
+	term1 = CRC32_UPD(d.u32[1], 3);
+	term2 = d.u32[1] >> 16;
+	crc ^= term1 ^ CRC32_UPD(term2, 1);
+
+	return crc;
+}
+
 /**
  * Use single crc32 instruction to perform a hash on a 4 byte value.
  *
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [dpdk-dev] [PATCH v4 2/5] hash: add new rte_hash_crc_8byte call
  2014-11-18 14:03 ` [dpdk-dev] [PATCH v4 0/5] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
  2014-11-18 14:03   ` [dpdk-dev] [PATCH v4 1/5] hash: add software CRC32 implementation Yerden Zhumabekov
@ 2014-11-18 14:03   ` Yerden Zhumabekov
  2014-11-18 14:03   ` [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation Yerden Zhumabekov
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-18 14:03 UTC (permalink / raw)
  To: dev

SSE4.2 provides the _mm_crc32_u64 intrinsic, which takes an 8-byte
operand. Add rte_hash_crc_8byte() as a thin wrapper around it.
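
For reference, here is a minimal bitwise sketch of what I understand the
8-byte operation to compute: CRC32C with the reflected polynomial
0x82F63B78, low byte first, with no initial or final inversion. The
demo_* names are hypothetical and are not part of this patch. The sketch
only illustrates that one 8-byte step equals two chained 4-byte steps
(low word first).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bitwise CRC32C over the low nbytes of value, least significant byte
 * first. Assumption: this mirrors the semantics of the crc32 instruction
 * (no inversion of the initial value or of the result).
 */
static uint32_t
demo_crc32c(uint32_t crc, uint64_t value, unsigned nbytes)
{
	unsigned i, bit;

	assert(nbytes <= 8);
	for (i = 0; i < nbytes; i++) {
		crc ^= (uint8_t)(value >> (8 * i));
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Hypothetical software counterpart of rte_hash_crc_4byte(). */
static uint32_t
demo_crc_4byte(uint32_t data, uint32_t init_val)
{
	return demo_crc32c(init_val, data, 4);
}

/* Hypothetical software counterpart of rte_hash_crc_8byte(). */
static uint32_t
demo_crc_8byte(uint64_t data, uint32_t init_val)
{
	return demo_crc32c(init_val, data, 8);
}
```

So demo_crc_8byte(x, init) should give the same value as
demo_crc_4byte(x >> 32, demo_crc_4byte((uint32_t)x, init)).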

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 4d7532a..15f687a 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -380,6 +380,22 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 }
 
 /**
+ * Use single crc32 instruction to perform a hash on a 8 byte value.
+ *
+ * @param data
+ *   Data to perform hash on.
+ * @param init_val
+ *   Value to initialise hash generator.
+ * @return
+ *   32bit calculated hash value.
+ */
+static inline uint32_t
+rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
+{
+	return _mm_crc32_u64(init_val, data);
+}
+
+/**
  * Use crc32 instruction to perform a hash.
  *
  * @param data
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-18 14:03 ` [dpdk-dev] [PATCH v4 0/5] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
  2014-11-18 14:03   ` [dpdk-dev] [PATCH v4 1/5] hash: add software CRC32 implementation Yerden Zhumabekov
  2014-11-18 14:03   ` [dpdk-dev] [PATCH v4 2/5] hash: add new rte_hash_crc_8byte call Yerden Zhumabekov
@ 2014-11-18 14:03   ` Yerden Zhumabekov
  2014-11-18 14:41     ` Neil Horman
  2014-11-18 14:03   ` [dpdk-dev] [PATCH v4 4/5] hash: rte_hash_crc() slices data into 8-byte pieces Yerden Zhumabekov
  2014-11-18 14:05   ` [dpdk-dev] [PATCH v4 5/5] test: remove redundant compile checks Yerden Zhumabekov
  4 siblings, 1 reply; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-18 14:03 UTC (permalink / raw)
  To: dev

SSE4.2 support is detected at run time via the CPUID instruction.

Add the rte_hash_crc_set_alg() function to select the CRC32
implementation. SSE4.2 is requested by default; if the CPU does not
support it, the software implementation is used instead.

The best available algorithm is selected at application startup
by the constructor function rte_hash_crc_try_sse42().
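
The selection logic can be sketched outside of DPDK roughly as follows.
This is only an illustration under assumptions: the demo_* names are
hypothetical, demo_cpu_has_sse42() stands in for
rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2), and the GCC/clang builtins
and the constructor attribute are assumed to be available.

```c
#include <assert.h>

enum demo_crc32_alg {
	DEMO_CRC32_SW = 0,
	DEMO_CRC32_SSE42
};

static enum demo_crc32_alg demo_alg;

/* Hypothetical stand-in for the DPDK CPU flag query. */
static int
demo_cpu_has_sse42(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
	(defined(__x86_64__) || defined(__i386__))
	/* Needed because this may run before other constructors. */
	__builtin_cpu_init();
	return __builtin_cpu_supports("sse4.2") != 0;
#else
	return 0;
#endif
}

/* Honour the request only if the CPU supports it, else use software. */
static void
demo_set_alg(enum demo_crc32_alg alg)
{
	assert(alg == DEMO_CRC32_SW || alg == DEMO_CRC32_SSE42);
	if (alg == DEMO_CRC32_SSE42 && !demo_cpu_has_sse42())
		alg = DEMO_CRC32_SW;
	demo_alg = alg;
}

/* Runs before main() and picks the best available implementation. */
static void __attribute__((constructor))
demo_try_sse42(void)
{
	demo_set_alg(DEMO_CRC32_SSE42);
}
```

An application can still call demo_set_alg(DEMO_CRC32_SW) later to force
the software path, e.g. for testing.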

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |   53 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 51 insertions(+), 2 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 15f687a..332ed99 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -45,7 +45,11 @@ extern "C" {
 #endif
 
 #include <stdint.h>
+#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 #include <nmmintrin.h>
+#endif
+#include <rte_cpuflags.h>
+#include <rte_branch_prediction.h>
 
 /* Lookup tables for software implementation of CRC32C */
 static uint32_t crc32c_tables[8][256] = {{
@@ -363,8 +367,41 @@ crc32c_2words(uint64_t data, uint32_t init_val)
 	return crc;
 }
 
+enum crc32_alg_t {
+	CRC32_SW = 0,
+	CRC32_SSE42
+};
+
+static enum crc32_alg_t crc32_alg;
+
+/**
+ * Allow or disallow use of SSE4.2 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   Requested CRC32 implementation
+ *   - (CRC32_SW) Don't use SSE4.2 intrinsics
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available, set by default
+ */
+static inline void
+rte_hash_crc_set_alg(enum crc32_alg_t alg)
+{
+	int sse42_supp = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2);
+	enum crc32_alg_t alg_supp = sse42_supp ? CRC32_SSE42 : CRC32_SW;
+	crc32_alg = (alg == CRC32_SSE42) ? alg_supp : CRC32_SW;
+}
+
+/* Best available algorithm is detected via CPUID instruction */
+static inline void __attribute__((constructor))
+rte_hash_crc_try_sse42(void)
+{
+	rte_hash_crc_set_alg(CRC32_SSE42);
+}
+
 /**
  * Use single crc32 instruction to perform a hash on a 4 byte value.
+ * Fall back to software crc32 implementation in case SSE4.2 is
+ * not supported
  *
  * @param data
  *   Data to perform hash on.
@@ -376,11 +413,18 @@ crc32c_2words(uint64_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-	return _mm_crc32_u32(init_val, data);
+#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+	if (likely(crc32_alg == CRC32_SSE42))
+		return _mm_crc32_u32(init_val, data);
+#endif
+
+	return crc32c_1word(data, init_val);
 }
 
 /**
  * Use single crc32 instruction to perform a hash on a 8 byte value.
+ * Fall back to software crc32 implementation in case SSE4.2 is
+ * not supported
  *
  * @param data
  *   Data to perform hash on.
@@ -392,7 +436,12 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
-	return _mm_crc32_u64(init_val, data);
+#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+	if (likely(crc32_alg == CRC32_SSE42))
+		return _mm_crc32_u64(init_val, data);
+#endif
+
+	return crc32c_2words(data, init_val);
 }
 
 /**
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [dpdk-dev] [PATCH v4 4/5] hash: rte_hash_crc() slices data into 8-byte pieces
  2014-11-18 14:03 ` [dpdk-dev] [PATCH v4 0/5] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
                     ` (2 preceding siblings ...)
  2014-11-18 14:03   ` [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation Yerden Zhumabekov
@ 2014-11-18 14:03   ` Yerden Zhumabekov
  2014-11-18 14:05   ` [dpdk-dev] [PATCH v4 5/5] test: remove redundant compile checks Yerden Zhumabekov
  4 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-18 14:03 UTC (permalink / raw)
  To: dev

Calculating a hash for data of variable length is more efficient
when the data is sliced into 8-byte pieces. The remainder of the data
is hashed using CRC32 functions with either 8- or 4-byte operands.
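
A portable sketch of the slicing scheme, as I read the diff below: full
8-byte pieces first, then a tail of 5..7 bytes goes through one
zero-padded 8-byte step and a tail of 1..4 bytes through one zero-padded
4-byte step. The demo_* names and the bitwise CRC32C helper are
hypothetical and only stand in for rte_hash_crc_8byte() and
rte_hash_crc_4byte(). Bytes are assembled explicitly in little-endian
order, so the sketch does not rely on unaligned loads.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise CRC32C over the low nbytes of value, low byte first. */
static uint32_t
demo_crc32c(uint32_t crc, uint64_t value, unsigned nbytes)
{
	unsigned i, bit;

	assert(nbytes <= 8);
	for (i = 0; i < nbytes; i++) {
		crc ^= (uint8_t)(value >> (8 * i));
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Assemble up to 8 bytes into a zero-padded little-endian value. */
static uint64_t
demo_load_le(const uint8_t *p, unsigned nbytes)
{
	uint64_t v = 0;
	unsigned i;

	for (i = 0; i < nbytes; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

static uint32_t
demo_hash_crc(const void *data, uint32_t data_len, uint32_t init_val)
{
	const uint8_t *p = (const uint8_t *)data;
	uint32_t rest = data_len & 0x07;
	uint32_t i;

	/* Full 8-byte pieces. */
	for (i = 0; i < data_len / 8; i++, p += 8)
		init_val = demo_crc32c(init_val, demo_load_le(p, 8), 8);

	if (rest > 4)       /* 5..7 bytes left: one zero-padded 8-byte step */
		init_val = demo_crc32c(init_val, demo_load_le(p, rest), 8);
	else if (rest > 0)  /* 1..4 bytes left: one zero-padded 4-byte step */
		init_val = demo_crc32c(init_val, demo_load_le(p, rest), 4);

	return init_val;
}
```

One consequence of the zero padding is that, for example, a 5-byte key
and the same key followed by three zero bytes hash to the same value.
That seems acceptable for a hash function, but the result is not a
byte-exact CRC32C of the input for every length.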

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |   33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 332ed99..e7819f3 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -445,7 +445,7 @@ rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 }
 
 /**
- * Use crc32 instruction to perform a hash.
+ * Calculate CRC32 hash on user-supplied byte array.
  *
  * @param data
  *   Data to perform hash on.
@@ -460,23 +460,38 @@ static inline uint32_t
 rte_hash_crc(const void *data, uint32_t data_len, uint32_t init_val)
 {
 	unsigned i;
-	uint32_t temp = 0;
-	const uint32_t *p32 = (const uint32_t *)data;
+	uint64_t temp = 0;
+	const uint64_t *p64 = (const uint64_t *)data;
 
-	for (i = 0; i < data_len / 4; i++) {
-		init_val = rte_hash_crc_4byte(*p32++, init_val);
+	for (i = 0; i < data_len / 8; i++) {
+		init_val = rte_hash_crc_8byte(*p64++, init_val);
 	}
 
-	switch (3 - (data_len & 0x03)) {
+	switch (7 - (data_len & 0x07)) {
 	case 0:
-		temp |= *((const uint8_t *)p32 + 2) << 16;
+		temp |= (uint64_t) *((const uint8_t *)p64 + 6) << 48;
 		/* Fallthrough */
 	case 1:
-		temp |= *((const uint8_t *)p32 + 1) << 8;
+		temp |= (uint64_t) *((const uint8_t *)p64 + 5) << 40;
 		/* Fallthrough */
 	case 2:
-		temp |= *((const uint8_t *)p32);
+		temp |= (uint64_t) *((const uint8_t *)p64 + 4) << 32;
+		temp |= *((const uint32_t *)p64);
+		init_val = rte_hash_crc_8byte(temp, init_val);
+		break;
+	case 3:
+		init_val = rte_hash_crc_4byte(*(const uint32_t *)p64, init_val);
+		break;
+	case 4:
+		temp |= *((const uint8_t *)p64 + 2) << 16;
+		/* Fallthrough */
+	case 5:
+		temp |= *((const uint8_t *)p64 + 1) << 8;
+		/* Fallthrough */
+	case 6:
+		temp |= *((const uint8_t *)p64);
 		init_val = rte_hash_crc_4byte(temp, init_val);
+		/* Fallthrough */
 	default:
 		break;
 	}
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [dpdk-dev] [PATCH v4 5/5] test: remove redundant compile checks
  2014-11-18 14:03 ` [dpdk-dev] [PATCH v4 0/5] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
                     ` (3 preceding siblings ...)
  2014-11-18 14:03   ` [dpdk-dev] [PATCH v4 4/5] hash: rte_hash_crc() slices data into 8-byte pieces Yerden Zhumabekov
@ 2014-11-18 14:05   ` Yerden Zhumabekov
  4 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-18 14:05 UTC (permalink / raw)
  To: dev

Since rte_hash_crc() can now be run regardless of SSE4.2 support,
we can safely remove compile checks for RTE_MACHINE_CPUFLAG_SSE4_2
in test utilities.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 app/test/test_hash.c      |    7 -------
 app/test/test_hash_perf.c |   11 -----------
 2 files changed, 18 deletions(-)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index 178ec3f..76b1b8f 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -55,10 +55,7 @@
 #include <rte_hash.h>
 #include <rte_fbk_hash.h>
 #include <rte_jhash.h>
-
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 #include <rte_hash_crc.h>
-#endif
 
 /*******************************************************************************
  * Hash function performance test configuration section. Each performance test
@@ -67,11 +64,7 @@
  * The five arrays below control what tests are performed. Every combination
  * from the array entries is tested.
  */
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 static rte_hash_function hashtest_funcs[] = {rte_jhash, rte_hash_crc};
-#else
-static rte_hash_function hashtest_funcs[] = {rte_jhash};
-#endif
 static uint32_t hashtest_initvals[] = {0};
 static uint32_t hashtest_key_lens[] = {0, 2, 4, 5, 6, 7, 8, 10, 11, 15, 16, 21, 31, 32, 33, 63, 64};
 /******************************************************************************/
diff --git a/app/test/test_hash_perf.c b/app/test/test_hash_perf.c
index be34957..05a88ec 100644
--- a/app/test/test_hash_perf.c
+++ b/app/test/test_hash_perf.c
@@ -56,10 +56,7 @@
 #include <rte_hash.h>
 #include <rte_fbk_hash.h>
 #include <rte_jhash.h>
-
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 #include <rte_hash_crc.h>
-#endif
 
 /* Types of hash table performance test that can be performed */
 enum hash_test_t {
@@ -97,11 +94,7 @@ struct tbl_perf_test_params {
  */
 #define HASHTEST_ITERATIONS 1000000
 
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 static rte_hash_function hashtest_funcs[] = {rte_jhash, rte_hash_crc};
-#else
-static rte_hash_function hashtest_funcs[] = {rte_jhash};
-#endif
 static uint32_t hashtest_initvals[] = {0};
 static uint32_t hashtest_key_lens[] = {2, 4, 5, 6, 7, 8, 10, 11, 15, 16, 21, 31, 32, 33, 63, 64};
 /******************************************************************************/
@@ -243,7 +236,6 @@ struct tbl_perf_test_params tbl_perf_params[] =
 {       LOOKUP,  ITERATIONS,  1048576,           4,      64,    rte_jhash,   0},
 {       LOOKUP,  ITERATIONS,  1048576,           8,      64,    rte_jhash,   0},
 {       LOOKUP,  ITERATIONS,  1048576,          16,      64,    rte_jhash,   0},
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 /* Small table, add */
 /*  Test type | Iterations | Entries | BucketSize | KeyLen |    HashFunc | InitVal */
 { ADD_ON_EMPTY,        1024,     1024,           1,      16, rte_hash_crc,   0},
@@ -376,7 +368,6 @@ struct tbl_perf_test_params tbl_perf_params[] =
 {       LOOKUP,  ITERATIONS,  1048576,           4,      64, rte_hash_crc,   0},
 {       LOOKUP,  ITERATIONS,  1048576,           8,      64, rte_hash_crc,   0},
 {       LOOKUP,  ITERATIONS,  1048576,          16,      64, rte_hash_crc,   0},
-#endif
 };
 
 /******************************************************************************/
@@ -423,10 +414,8 @@ static const char *get_hash_name(rte_hash_function f)
 	if (f == rte_jhash)
 		return "jhash";
 
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 	if (f == rte_hash_crc)
 		return "rte_hash_crc";
-#endif
 
 	return "UnknownHash";
 }
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-18 14:03   ` [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation Yerden Zhumabekov
@ 2014-11-18 14:41     ` Neil Horman
  2014-11-18 15:06       ` Yerden Zhumabekov
  0 siblings, 1 reply; 98+ messages in thread
From: Neil Horman @ 2014-11-18 14:41 UTC (permalink / raw)
  To: Yerden Zhumabekov; +Cc: dev

On Tue, Nov 18, 2014 at 08:03:40PM +0600, Yerden Zhumabekov wrote:
> Initially, SSE4.2 support is detected via CPUID instruction.
> 
> Added rte_hash_crc_set_alg() function to detect and set CRC32
> implementation if necessary. SSE4.2 is allowed by default. If it's
> not available, fall back to sw implementation.
> 
> Best available algorithm is detected upon application startup
> through the constructor function rte_hash_crc_try_sse42().
> 
> Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
> ---
>  lib/librte_hash/rte_hash_crc.h |   53 ++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 51 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
> index 15f687a..332ed99 100644
> --- a/lib/librte_hash/rte_hash_crc.h
> +++ b/lib/librte_hash/rte_hash_crc.h
> @@ -45,7 +45,11 @@ extern "C" {
>  #endif
>  
>  #include <stdint.h>
> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
>  #include <nmmintrin.h>
> +#endif
> +#include <rte_cpuflags.h>
> +#include <rte_branch_prediction.h>
>  
>  /* Lookup tables for software implementation of CRC32C */
>  static uint32_t crc32c_tables[8][256] = {{
> @@ -363,8 +367,41 @@ crc32c_2words(uint64_t data, uint32_t init_val)
>  	return crc;
>  }
>  
> +enum crc32_alg_t {
> +	CRC32_SW = 0,
> +	CRC32_SSE42
> +};
> +
> +static enum crc32_alg_t crc32_alg;
> +
> +/**
> + * Allow or disallow use of SSE4.2 intrinsics for CRC32 hash
> + * calculation.
> + *
> + * @param flag
> + *   unsigned integer flag
> + *   - (CRC32_SW) Don't use SSE4.2 intrinsics
> + *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available, set by default
> + */
> +static inline void
> +rte_hash_crc_set_alg(enum crc32_alg_t alg)
> +{
> +	int sse42_supp = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2);
> +	enum crc32_alg_t alg_supp = sse42_supp ? CRC32_SSE42 : CRC32_SW;
> +	crc32_alg = (alg == CRC32_SSE42) ? alg_supp : CRC32_SW;
> +}
> +
> +/* Best available algorithm is detected via CPUID instruction */
> +static inline void __attribute__((constructor))
> +rte_hash_crc_try_sse42(void)
> +{
> +	rte_hash_crc_set_alg(CRC32_SSE42);
> +}
> +
>  /**
>   * Use single crc32 instruction to perform a hash on a 4 byte value.
> + * Fall back to software crc32 implementation in case SSE4.2 is
> + * not supported
>   *
>   * @param data
>   *   Data to perform hash on.
> @@ -376,11 +413,18 @@ crc32c_2words(uint64_t data, uint32_t init_val)
>  static inline uint32_t
>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
>  {
> -	return _mm_crc32_u32(init_val, data);
> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> +	if (likely(crc32_alg == CRC32_SSE42))
> +		return _mm_crc32_u32(init_val, data);
> +#endif

You don't really need these ifdefs here anymore, given that you have a
constructor to do the algorithm selection.  In fact, you need to remove them to
cover the case where you build on a system that doesn't support SSE4.2 but run
on a system that does.
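
For illustration, here is a minimal sketch of runtime selection that does not
depend on a build-time CPU flag. It is not the code from the patch. It assumes
GCC >= 4.9 or Clang on x86, uses the compiler's __builtin_cpu_supports() in
place of rte_cpu_get_flag_enabled(), and uses a per-function target attribute
rather than a makefile change. All names other than _mm_crc32_u32 and the
crc32_alg_t enum are made up for the example.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <nmmintrin.h>
#define SKETCH_X86 1
#endif

enum crc32_alg_t { CRC32_SW = 0, CRC32_SSE42 };

/* Software path is the default until the constructor says otherwise. */
static enum crc32_alg_t crc32_alg = CRC32_SW;

/* Bitwise CRC32C update (reflected Castagnoli polynomial 0x82F63B78). */
static uint32_t
crc32c_sw_4byte(uint32_t data, uint32_t init_val)
{
	uint32_t crc = init_val ^ data;
	int i;

	for (i = 0; i < 32; i++)
		crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	return crc;
}

#ifdef SKETCH_X86
/* Only this function is compiled for SSE4.2 (assumption: no -msse4.2 flag). */
static __attribute__((target("sse4.2"))) uint32_t
crc32c_hw_4byte(uint32_t data, uint32_t init_val)
{
	return _mm_crc32_u32(init_val, data);
}
#endif

/* Runs before main(): pick the best implementation for the running CPU. */
static void __attribute__((constructor))
crc32_pick_alg(void)
{
#ifdef SKETCH_X86
	__builtin_cpu_init();
	if (__builtin_cpu_supports("sse4.2"))
		crc32_alg = CRC32_SSE42;
#endif
}

/* Hypothetical stand-in for rte_hash_crc_4byte(): dispatch on crc32_alg. */
static uint32_t
hash_crc_4byte(uint32_t data, uint32_t init_val)
{
#ifdef SKETCH_X86
	if (crc32_alg == CRC32_SSE42)
		return crc32c_hw_4byte(data, init_val);
#endif
	return crc32c_sw_4byte(data, init_val);
}
```

The only preprocessor check left is on the architecture, so a binary built on
a machine without SSE4.2 can still take the hardware path on a machine that
has it.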

Neil

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-18 14:41     ` Neil Horman
@ 2014-11-18 15:06       ` Yerden Zhumabekov
  2014-11-18 16:00         ` Neil Horman
  0 siblings, 1 reply; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-18 15:06 UTC (permalink / raw)
  To: Neil Horman, dev


18.11.2014 20:41, Neil Horman wrote:
> On Tue, Nov 18, 2014 at 08:03:40PM +0600, Yerden Zhumabekov wrote:
>> Initially, SSE4.2 support is detected via CPUID instruction.
>>
>> Added rte_hash_crc_set_alg() function to detect and set CRC32
>> implementation if necessary. SSE4.2 is allowed by default. If it's
>> not available, fall back to sw implementation.
>>
>> Best available algorithm is detected upon application startup
>> through the constructor function rte_hash_crc_try_sse42().
>>
>> Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
>> ---
>>  lib/librte_hash/rte_hash_crc.h |   53 ++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 51 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
>> index 15f687a..332ed99 100644
>> --- a/lib/librte_hash/rte_hash_crc.h
>> +++ b/lib/librte_hash/rte_hash_crc.h
>> @@ -45,7 +45,11 @@ extern "C" {
>>  #endif
>>  
>>  #include <stdint.h>
>> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
>>  #include <nmmintrin.h>
>> +#endif
>> +#include <rte_cpuflags.h>
>> +#include <rte_branch_prediction.h>
>>  
>>  /* Lookup tables for software implementation of CRC32C */
>>  static uint32_t crc32c_tables[8][256] = {{
>> @@ -363,8 +367,41 @@ crc32c_2words(uint64_t data, uint32_t init_val)
>>  	return crc;
>>  }
>>  
>> +enum crc32_alg_t {
>> +	CRC32_SW = 0,
>> +	CRC32_SSE42
>> +};
>> +
>> +static enum crc32_alg_t crc32_alg;
>> +
>> +/**
>> + * Allow or disallow use of SSE4.2 intrinsics for CRC32 hash
>> + * calculation.
>> + *
>> + * @param flag
>> + *   unsigned integer flag
>> + *   - (CRC32_SW) Don't use SSE4.2 intrinsics
>> + *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available, set by default
>> + */
>> +static inline void
>> +rte_hash_crc_set_alg(enum crc32_alg_t alg)
>> +{
>> +	int sse42_supp = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2);
>> +	enum crc32_alg_t alg_supp = sse42_supp ? CRC32_SSE42 : CRC32_SW;
>> +	crc32_alg = (alg == CRC32_SSE42) ? alg_supp : CRC32_SW;
>> +}
>> +
>> +/* Best available algorithm is detected via CPUID instruction */
>> +static inline void __attribute__((constructor))
>> +rte_hash_crc_try_sse42(void)
>> +{
>> +	rte_hash_crc_set_alg(CRC32_SSE42);
>> +}
>> +
>>  /**
>>   * Use single crc32 instruction to perform a hash on a 4 byte value.
>> + * Fall back to software crc32 implementation in case SSE4.2 is
>> + * not supported
>>   *
>>   * @param data
>>   *   Data to perform hash on.
>> @@ -376,11 +413,18 @@ crc32c_2words(uint64_t data, uint32_t init_val)
>>  static inline uint32_t
>>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
>>  {
>> -	return _mm_crc32_u32(init_val, data);
>> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
>> +	if (likely(crc32_alg == CRC32_SSE42))
>> +		return _mm_crc32_u32(init_val, data);
>> +#endif
> you don't really need these ifdefs here anymore given that you have a
> constructor to do the algorithm selection.  In fact you need to remove them, in
> the event you build on a system that doesn't support SSE42, but run on a system
> that does.

Originally, I thought so as well. I wrote the code without these ifdefs,
but it didn't compile on my machine, which doesn't support SSE4.2. The error
was triggered by nmmintrin.h, which checks that the respective GCC
extension is enabled. So I think these ifdefs are indeed required.

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-18 15:06       ` Yerden Zhumabekov
@ 2014-11-18 16:00         ` Neil Horman
  2014-11-18 16:04           ` Bruce Richardson
  2014-11-18 17:13           ` Yerden Zhumabekov
  0 siblings, 2 replies; 98+ messages in thread
From: Neil Horman @ 2014-11-18 16:00 UTC (permalink / raw)
  To: Yerden Zhumabekov; +Cc: dev

On Tue, Nov 18, 2014 at 09:06:35PM +0600, Yerden Zhumabekov wrote:
> 
> 18.11.2014 20:41, Neil Horman wrote:
> > On Tue, Nov 18, 2014 at 08:03:40PM +0600, Yerden Zhumabekov wrote:
> >> Initially, SSE4.2 support is detected via CPUID instruction.
> >>
> >> Added rte_hash_crc_set_alg() function to detect and set CRC32
> >> implementation if necessary. SSE4.2 is allowed by default. If it's
> >> not available, fall back to sw implementation.
> >>
> >> Best available algorithm is detected upon application startup
> >> through the constructor function rte_hash_crc_try_sse42().
> >>
> >> Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
> >> ---
> >>  lib/librte_hash/rte_hash_crc.h |   53 ++++++++++++++++++++++++++++++++++++++--
> >>  1 file changed, 51 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
> >> index 15f687a..332ed99 100644
> >> --- a/lib/librte_hash/rte_hash_crc.h
> >> +++ b/lib/librte_hash/rte_hash_crc.h
> >> @@ -45,7 +45,11 @@ extern "C" {
> >>  #endif
> >>  
> >>  #include <stdint.h>
> >> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> >>  #include <nmmintrin.h>
> >> +#endif
> >> +#include <rte_cpuflags.h>
> >> +#include <rte_branch_prediction.h>
> >>  
> >>  /* Lookup tables for software implementation of CRC32C */
> >>  static uint32_t crc32c_tables[8][256] = {{
> >> @@ -363,8 +367,41 @@ crc32c_2words(uint64_t data, uint32_t init_val)
> >>  	return crc;
> >>  }
> >>  
> >> +enum crc32_alg_t {
> >> +	CRC32_SW = 0,
> >> +	CRC32_SSE42
> >> +};
> >> +
> >> +static enum crc32_alg_t crc32_alg;
> >> +
> >> +/**
> >> + * Allow or disallow use of SSE4.2 intrinsics for CRC32 hash
> >> + * calculation.
> >> + *
> >> + * @param flag
> >> + *   unsigned integer flag
> >> + *   - (CRC32_SW) Don't use SSE4.2 intrinsics
> >> + *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available, set by default
> >> + */
> >> +static inline void
> >> +rte_hash_crc_set_alg(enum crc32_alg_t alg)
> >> +{
> >> +	int sse42_supp = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2);
> >> +	enum crc32_alg_t alg_supp = sse42_supp ? CRC32_SSE42 : CRC32_SW;
> >> +	crc32_alg = (alg == CRC32_SSE42) ? alg_supp : CRC32_SW;
> >> +}
> >> +
> >> +/* Best available algorithm is detected via CPUID instruction */
> >> +static inline void __attribute__((constructor))
> >> +rte_hash_crc_try_sse42(void)
> >> +{
> >> +	rte_hash_crc_set_alg(CRC32_SSE42);
> >> +}
> >> +
> >>  /**
> >>   * Use single crc32 instruction to perform a hash on a 4 byte value.
> >> + * Fall back to software crc32 implementation in case SSE4.2 is
> >> + * not supported
> >>   *
> >>   * @param data
> >>   *   Data to perform hash on.
> >> @@ -376,11 +413,18 @@ crc32c_2words(uint64_t data, uint32_t init_val)
> >>  static inline uint32_t
> >>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
> >>  {
> >> -	return _mm_crc32_u32(init_val, data);
> >> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> >> +	if (likely(crc32_alg == CRC32_SSE42))
> >> +		return _mm_crc32_u32(init_val, data);
> >> +#endif
> > you don't really need these ifdefs here anymore given that you have a
> > constructor to do the algorithm selection.  In fact you need to remove them, in
> > the event you build on a system that doesn't support SSE42, but run on a system
> > that does.
> 
> Originally, I thought so as well. I wrote the code without these ifdefs,
> but it didn't compile on my machine which doesn't support SSE4.2. Error
> was triggered by nmmintrin.h which has a check for respective GCC
> extension. So I think these ifdefs are indeed required.
> 
You need to edit the makefile so that the compiler gets passed the option
-msse4.2.  That way it will know to emit SSE4.2 instructions. It will also allow
you to remove the ifdef from the include file.
Neil

> -- 
> Sincerely,
> 
> Yerden Zhumabekov
> State Technical Service
> Astana, KZ
> 
> 
> 

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-18 16:00         ` Neil Horman
@ 2014-11-18 16:04           ` Bruce Richardson
  2014-11-18 16:08             ` Bruce Richardson
  2014-11-18 16:38             ` Neil Horman
  2014-11-18 17:13           ` Yerden Zhumabekov
  1 sibling, 2 replies; 98+ messages in thread
From: Bruce Richardson @ 2014-11-18 16:04 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

On Tue, Nov 18, 2014 at 11:00:05AM -0500, Neil Horman wrote:
> On Tue, Nov 18, 2014 at 09:06:35PM +0600, Yerden Zhumabekov wrote:
> > 
> > 18.11.2014 20:41, Neil Horman wrote:
> > > On Tue, Nov 18, 2014 at 08:03:40PM +0600, Yerden Zhumabekov wrote:
> > >> Initially, SSE4.2 support is detected via CPUID instruction.
> > >>
> > >> Added rte_hash_crc_set_alg() function to detect and set CRC32
> > >> implementation if necessary. SSE4.2 is allowed by default. If it's
> > >> not available, fall back to sw implementation.
> > >>
> > >> Best available algorithm is detected upon application startup
> > >> through the constructor function rte_hash_crc_try_sse42().
> > >>
> > >> Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
> > >> ---
> > >>  lib/librte_hash/rte_hash_crc.h |   53 ++++++++++++++++++++++++++++++++++++++--
> > >>  1 file changed, 51 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
> > >> index 15f687a..332ed99 100644
> > >> --- a/lib/librte_hash/rte_hash_crc.h
> > >> +++ b/lib/librte_hash/rte_hash_crc.h
> > >> @@ -45,7 +45,11 @@ extern "C" {
> > >>  #endif
> > >>  
> > >>  #include <stdint.h>
> > >> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> > >>  #include <nmmintrin.h>
> > >> +#endif
> > >> +#include <rte_cpuflags.h>
> > >> +#include <rte_branch_prediction.h>
> > >>  
> > >>  /* Lookup tables for software implementation of CRC32C */
> > >>  static uint32_t crc32c_tables[8][256] = {{
> > >> @@ -363,8 +367,41 @@ crc32c_2words(uint64_t data, uint32_t init_val)
> > >>  	return crc;
> > >>  }
> > >>  
> > >> +enum crc32_alg_t {
> > >> +	CRC32_SW = 0,
> > >> +	CRC32_SSE42
> > >> +};
> > >> +
> > >> +static enum crc32_alg_t crc32_alg;
> > >> +
> > >> +/**
> > >> + * Allow or disallow use of SSE4.2 intrinsics for CRC32 hash
> > >> + * calculation.
> > >> + *
> > >> + * @param flag
> > >> + *   unsigned integer flag
> > >> + *   - (CRC32_SW) Don't use SSE4.2 intrinsics
> > >> + *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available, set by default
> > >> + */
> > >> +static inline void
> > >> +rte_hash_crc_set_alg(enum crc32_alg_t alg)
> > >> +{
> > >> +	int sse42_supp = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2);
> > >> +	enum crc32_alg_t alg_supp = sse42_supp ? CRC32_SSE42 : CRC32_SW;
> > >> +	crc32_alg = (alg == CRC32_SSE42) ? alg_supp : CRC32_SW;
> > >> +}
> > >> +
> > >> +/* Best available algorithm is detected via CPUID instruction */
> > >> +static inline void __attribute__((constructor))
> > >> +rte_hash_crc_try_sse42(void)
> > >> +{
> > >> +	rte_hash_crc_set_alg(CRC32_SSE42);
> > >> +}
> > >> +
> > >>  /**
> > >>   * Use single crc32 instruction to perform a hash on a 4 byte value.
> > >> + * Fall back to software crc32 implementation in case SSE4.2 is
> > >> + * not supported
> > >>   *
> > >>   * @param data
> > >>   *   Data to perform hash on.
> > >> @@ -376,11 +413,18 @@ crc32c_2words(uint64_t data, uint32_t init_val)
> > >>  static inline uint32_t
> > >>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
> > >>  {
> > >> -	return _mm_crc32_u32(init_val, data);
> > >> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> > >> +	if (likely(crc32_alg == CRC32_SSE42))
> > >> +		return _mm_crc32_u32(init_val, data);
> > >> +#endif
> > > you don't really need these ifdefs here anymore given that you have a
> > > constructor to do the algorithm selection.  In fact you need to remove them, in
> > > the event you build on a system that doesn't support SSE42, but run on a system
> > > that does.
> > 
> > Originally, I thought so as well. I wrote the code without these ifdefs,
> > but it didn't compile on my machine which doesn't support SSE4.2. Error
> > was triggered by nmmintrin.h which has a check for respective GCC
> > extension. So I think these ifdefs are indeed required.
> > 
> You need to edit the makefile so that the compiler gets passed the option
> -msse4.2.  That way it will know to emit sse42 instructions. It will also allow
> you to remove the ifdef from the include file
> Neil
> 

Question: does that then limit the compiler to emitting sse42 instructions? If,
for instance, the rest of DPDK is being compiled for a target supporting AVX2,
does that flag then prevent the compiler from auto-vectorising using SSE?

/Bruce

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-18 16:04           ` Bruce Richardson
@ 2014-11-18 16:08             ` Bruce Richardson
  2014-11-18 16:38             ` Neil Horman
  1 sibling, 0 replies; 98+ messages in thread
From: Bruce Richardson @ 2014-11-18 16:08 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

On Tue, Nov 18, 2014 at 04:04:26PM +0000, Bruce Richardson wrote:
> On Tue, Nov 18, 2014 at 11:00:05AM -0500, Neil Horman wrote:
> > On Tue, Nov 18, 2014 at 09:06:35PM +0600, Yerden Zhumabekov wrote:
> > > 
> > > 18.11.2014 20:41, Neil Horman wrote:
> > > > On Tue, Nov 18, 2014 at 08:03:40PM +0600, Yerden Zhumabekov wrote:
> > > >> Initially, SSE4.2 support is detected via CPUID instruction.
> > > >>
> > > >> Added rte_hash_crc_set_alg() function to detect and set CRC32
> > > >> implementation if necessary. SSE4.2 is allowed by default. If it's
> > > >> not available, fall back to sw implementation.
> > > >>
> > > >> Best available algorithm is detected upon application startup
> > > >> through the constructor function rte_hash_crc_try_sse42().
> > > >>
> > > >> Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
> > > >> ---
> > > >>  lib/librte_hash/rte_hash_crc.h |   53 ++++++++++++++++++++++++++++++++++++++--
> > > >>  1 file changed, 51 insertions(+), 2 deletions(-)
> > > >>
> > > >> diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
> > > >> index 15f687a..332ed99 100644
> > > >> --- a/lib/librte_hash/rte_hash_crc.h
> > > >> +++ b/lib/librte_hash/rte_hash_crc.h
> > > >> @@ -45,7 +45,11 @@ extern "C" {
> > > >>  #endif
> > > >>  
> > > >>  #include <stdint.h>
> > > >> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> > > >>  #include <nmmintrin.h>
> > > >> +#endif
> > > >> +#include <rte_cpuflags.h>
> > > >> +#include <rte_branch_prediction.h>
> > > >>  
> > > >>  /* Lookup tables for software implementation of CRC32C */
> > > >>  static uint32_t crc32c_tables[8][256] = {{
> > > >> @@ -363,8 +367,41 @@ crc32c_2words(uint64_t data, uint32_t init_val)
> > > >>  	return crc;
> > > >>  }
> > > >>  
> > > >> +enum crc32_alg_t {
> > > >> +	CRC32_SW = 0,
> > > >> +	CRC32_SSE42
> > > >> +};
> > > >> +
> > > >> +static enum crc32_alg_t crc32_alg;
> > > >> +
> > > >> +/**
> > > >> + * Allow or disallow use of SSE4.2 intrinsics for CRC32 hash
> > > >> + * calculation.
> > > >> + *
> > > >> + * @param flag
> > > >> + *   unsigned integer flag
> > > >> + *   - (CRC32_SW) Don't use SSE4.2 intrinsics
> > > >> + *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available, set by default
> > > >> + */
> > > >> +static inline void
> > > >> +rte_hash_crc_set_alg(enum crc32_alg_t alg)
> > > >> +{
> > > >> +	int sse42_supp = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2);
> > > >> +	enum crc32_alg_t alg_supp = sse42_supp ? CRC32_SSE42 : CRC32_SW;
> > > >> +	crc32_alg = (alg == CRC32_SSE42) ? alg_supp : CRC32_SW;
> > > >> +}
> > > >> +
> > > >> +/* Best available algorithm is detected via CPUID instruction */
> > > >> +static inline void __attribute__((constructor))
> > > >> +rte_hash_crc_try_sse42(void)
> > > >> +{
> > > >> +	rte_hash_crc_set_alg(CRC32_SSE42);
> > > >> +}
> > > >> +
> > > >>  /**
> > > >>   * Use single crc32 instruction to perform a hash on a 4 byte value.
> > > >> + * Fall back to software crc32 implementation in case SSE4.2 is
> > > >> + * not supported
> > > >>   *
> > > >>   * @param data
> > > >>   *   Data to perform hash on.
> > > >> @@ -376,11 +413,18 @@ crc32c_2words(uint64_t data, uint32_t init_val)
> > > >>  static inline uint32_t
> > > >>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
> > > >>  {
> > > >> -	return _mm_crc32_u32(init_val, data);
> > > >> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> > > >> +	if (likely(crc32_alg == CRC32_SSE42))
> > > >> +		return _mm_crc32_u32(init_val, data);
> > > >> +#endif
> > > > you don't really need these ifdefs here anymore given that you have a
> > > > constructor to do the algorithm selection.  In fact you need to remove them, in
> > > > the event you build on a system that doesn't support SSE42, but run on a system
> > > > that does.
> > > 
> > > Originally, I thought so as well. I wrote the code without these ifdefs,
> > > but it didn't compile on my machine which doesn't support SSE4.2. Error
> > > was triggered by nmmintrin.h which has a check for respective GCC
> > > extension. So I think these ifdefs are indeed required.
> > > 
> > You need to edit the makefile so that the compiler gets passed the option
> > -msse4.2.  That way it will know to emit sse42 instructions. It will also allow
> > you to remove the ifdef from the include file
> > Neil
> > 
> 
Email V2, with fix for the last word:

Question: does that then limit the compiler to emitting sse42 instructions? If,
for instance, the rest of DPDK is being compiled for a target supporting AVX2,
does that flag then prevent the compiler from auto-vectorising using AVX?

/Bruce

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-18 16:04           ` Bruce Richardson
  2014-11-18 16:08             ` Bruce Richardson
@ 2014-11-18 16:38             ` Neil Horman
  1 sibling, 0 replies; 98+ messages in thread
From: Neil Horman @ 2014-11-18 16:38 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On Tue, Nov 18, 2014 at 04:04:26PM +0000, Bruce Richardson wrote:
> On Tue, Nov 18, 2014 at 11:00:05AM -0500, Neil Horman wrote:
> > On Tue, Nov 18, 2014 at 09:06:35PM +0600, Yerden Zhumabekov wrote:
> > > 
> > > 18.11.2014 20:41, Neil Horman wrote:
> > > > On Tue, Nov 18, 2014 at 08:03:40PM +0600, Yerden Zhumabekov wrote:
> > > >> Initially, SSE4.2 support is detected via CPUID instruction.
> > > >>
> > > >> Added rte_hash_crc_set_alg() function to detect and set CRC32
> > > >> implementation if necessary. SSE4.2 is allowed by default. If it's
> > > >> not available, fall back to sw implementation.
> > > >>
> > > >> Best available algorithm is detected upon application startup
> > > >> through the constructor function rte_hash_crc_try_sse42().
> > > >>
> > > >> Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
> > > >> ---
> > > >>  lib/librte_hash/rte_hash_crc.h |   53 ++++++++++++++++++++++++++++++++++++++--
> > > >>  1 file changed, 51 insertions(+), 2 deletions(-)
> > > >>
> > > >> diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
> > > >> index 15f687a..332ed99 100644
> > > >> --- a/lib/librte_hash/rte_hash_crc.h
> > > >> +++ b/lib/librte_hash/rte_hash_crc.h
> > > >> @@ -45,7 +45,11 @@ extern "C" {
> > > >>  #endif
> > > >>  
> > > >>  #include <stdint.h>
> > > >> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> > > >>  #include <nmmintrin.h>
> > > >> +#endif
> > > >> +#include <rte_cpuflags.h>
> > > >> +#include <rte_branch_prediction.h>
> > > >>  
> > > >>  /* Lookup tables for software implementation of CRC32C */
> > > >>  static uint32_t crc32c_tables[8][256] = {{
> > > >> @@ -363,8 +367,41 @@ crc32c_2words(uint64_t data, uint32_t init_val)
> > > >>  	return crc;
> > > >>  }
> > > >>  
> > > >> +enum crc32_alg_t {
> > > >> +	CRC32_SW = 0,
> > > >> +	CRC32_SSE42
> > > >> +};
> > > >> +
> > > >> +static enum crc32_alg_t crc32_alg;
> > > >> +
> > > >> +/**
> > > >> + * Allow or disallow use of SSE4.2 intrinsics for CRC32 hash
> > > >> + * calculation.
> > > >> + *
> > > >> + * @param flag
> > > >> + *   unsigned integer flag
> > > >> + *   - (CRC32_SW) Don't use SSE4.2 intrinsics
> > > >> + *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available, set by default
> > > >> + */
> > > >> +static inline void
> > > >> +rte_hash_crc_set_alg(enum crc32_alg_t alg)
> > > >> +{
> > > >> +	int sse42_supp = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2);
> > > >> +	enum crc32_alg_t alg_supp = sse42_supp ? CRC32_SSE42 : CRC32_SW;
> > > >> +	crc32_alg = (alg == CRC32_SSE42) ? alg_supp : CRC32_SW;
> > > >> +}
> > > >> +
> > > >> +/* Best available algorithm is detected via CPUID instruction */
> > > >> +static inline void __attribute__((constructor))
> > > >> +rte_hash_crc_try_sse42(void)
> > > >> +{
> > > >> +	rte_hash_crc_set_alg(CRC32_SSE42);
> > > >> +}
> > > >> +
> > > >>  /**
> > > >>   * Use single crc32 instruction to perform a hash on a 4 byte value.
> > > >> + * Fall back to software crc32 implementation in case SSE4.2 is
> > > >> + * not supported
> > > >>   *
> > > >>   * @param data
> > > >>   *   Data to perform hash on.
> > > >> @@ -376,11 +413,18 @@ crc32c_2words(uint64_t data, uint32_t init_val)
> > > >>  static inline uint32_t
> > > >>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
> > > >>  {
> > > >> -	return _mm_crc32_u32(init_val, data);
> > > >> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> > > >> +	if (likely(crc32_alg == CRC32_SSE42))
> > > >> +		return _mm_crc32_u32(init_val, data);
> > > >> +#endif
> > > > you don't really need these ifdefs here anymore given that you have a
> > > > constructor to do the algorithm selection.  In fact you need to remove them, in
> > > > the event you build on a system that doesn't support SSE42, but run on a system
> > > > that does.
> > > 
> > > Originally, I thought so as well. I wrote the code without these ifdefs,
> > > but it didn't compile on my machine which doesn't support SSE4.2. Error
> > > was triggered by nmmintrin.h which has a check for respective GCC
> > > extension. So I think these ifdefs are indeed required.
> > > 
> > You need to edit the makefile so that the compiler gets passed the option
> > -msse4.2.  That way it will know to emit sse42 instructions. It will also allow
> > you to remove the ifdef from the include file
> > Neil
> > 
> 
> Question: does that then limit the compiler to emitting sse42 instructions? If,
> for instance, the rest of DPDK is being compiled for a target supporting AVX2,
> does that flag then prevent the compiler from auto-vectorising using SSE?
> 
It should be a last-option-wins model, I think. That is to say, if the command
line specifies -msse4.2 and -march=core-avx2, then you will get both instruction
sets, since core-avx2 supports SSE4.2.
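
One way to check how the two options combine is to ask the compiler which
feature macros it predefines. The sketch below assumes GCC or Clang, which
define __SSE4_2__ and __AVX2__ when the corresponding instruction sets are
enabled for the translation unit. The helper names are made up for the
example.

```c
#include <stdio.h>

/* 1 if this translation unit was compiled with SSE4.2 enabled, else 0. */
static int
built_with_sse42(void)
{
#ifdef __SSE4_2__
	return 1;
#else
	return 0;
#endif
}

/* 1 if this translation unit was compiled with AVX2 enabled, else 0. */
static int
built_with_avx2(void)
{
#ifdef __AVX2__
	return 1;
#else
	return 0;
#endif
}

/* Print the build-time view; this says nothing about the running CPU. */
static void
report_build_isa(void)
{
	printf("SSE4.2: %d, AVX2: %d\n", built_with_sse42(), built_with_avx2());
}
```

Built with both -msse4.2 and -march=core-avx2, report_build_isa() is expected
to show both as enabled, because -msse4.2 only adds SSE4.2 to what -march
already turns on and does not switch AVX off.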

Neil

> /Bruce
> 
> 

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-18 16:00         ` Neil Horman
  2014-11-18 16:04           ` Bruce Richardson
@ 2014-11-18 17:13           ` Yerden Zhumabekov
  2014-11-18 17:29             ` Wang, Shawn
  2014-11-18 17:46             ` Neil Horman
  1 sibling, 2 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-18 17:13 UTC (permalink / raw)
  To: Neil Horman, Richardson, Bruce, dev


18.11.2014 22:00, Neil Horman wrote:
> On Tue, Nov 18, 2014 at 09:06:35PM +0600, Yerden Zhumabekov wrote:
>> 18.11.2014 20:41, Neil Horman wrote:
>>> On Tue, Nov 18, 2014 at 08:03:40PM +0600, Yerden Zhumabekov wrote:
>>>>  /**
>>>>   * Use single crc32 instruction to perform a hash on a 4 byte value.
>>>> + * Fall back to software crc32 implementation in case SSE4.2 is
>>>> + * not supported
>>>>   *
>>>>   * @param data
>>>>   *   Data to perform hash on.
>>>> @@ -376,11 +413,18 @@ crc32c_2words(uint64_t data, uint32_t init_val)
>>>>  static inline uint32_t
>>>>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
>>>>  {
>>>> -	return _mm_crc32_u32(init_val, data);
>>>> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
>>>> +	if (likely(crc32_alg == CRC32_SSE42))
>>>> +		return _mm_crc32_u32(init_val, data);
>>>> +#endif
>>> you don't really need these ifdefs here anymore given that you have a
>>> constructor to do the algorithm selection.  In fact you need to remove them, in
>>> the event you build on a system that doesn't support SSE42, but run on a system
>>> that does.
>> Originally, I thought so as well. I wrote the code without these ifdefs,
>> but it didn't compile on my machine which doesn't support SSE4.2. Error
>> was triggered by nmmintrin.h which has a check for respective GCC
>> extension. So I think these ifdefs are indeed required.
>>
> You need to edit the makefile so that the compiler gets passed the option
> -msse4.2.  That way it will know to emit sse42 instructions. It will also allow
> you to remove the ifdef from the include file

In this case, I guess there are two options:
1) modify all makefiles which use librte_hash
2) move all function bodies from rte_hash_crc.h to a separate module,
leaving only the prototypes there.

Everybody's up for the second option? :)
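
A rough sketch of what the second option could look like, shown as a single
file so it builds on its own. The file names hash_crc.h and hash_crc.c and all
function names are hypothetical. The target attribute stands in for compiling
only that one module with -msse4.2, and __builtin_cpu_supports() stands in for
rte_cpu_get_flag_enabled(). It assumes GCC >= 4.9 or Clang on x86.

```c
#include <stdint.h>

/* ---- would stay in the public header (hypothetical hash_crc.h) ---- */
uint32_t hash_crc_4byte(uint32_t data, uint32_t init_val);

/* ---- would move to its own module (hypothetical hash_crc.c) ---- */
#if defined(__x86_64__) || defined(__i386__)
#include <nmmintrin.h>
#define SKETCH_X86 1
#endif

/* Bitwise CRC32C update (reflected Castagnoli polynomial 0x82F63B78). */
static uint32_t
crc_sw_4byte(uint32_t data, uint32_t init_val)
{
	uint32_t crc = init_val ^ data;
	int i;

	for (i = 0; i < 32; i++)
		crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	return crc;
}

#ifdef SKETCH_X86
/* The intrinsic is confined to this module; callers never include nmmintrin.h. */
static __attribute__((target("sse4.2"))) uint32_t
crc_hw_4byte(uint32_t data, uint32_t init_val)
{
	return _mm_crc32_u32(init_val, data);
}
#endif

/* Implementation chosen once at startup; software is the safe default. */
static uint32_t (*crc_4byte_impl)(uint32_t, uint32_t) = crc_sw_4byte;

static void __attribute__((constructor))
crc_select_impl(void)
{
#ifdef SKETCH_X86
	__builtin_cpu_init();
	if (__builtin_cpu_supports("sse4.2"))
		crc_4byte_impl = crc_hw_4byte;
#endif
}

/* Out-of-line entry point declared in the header part above. */
uint32_t
hash_crc_4byte(uint32_t data, uint32_t init_val)
{
	return crc_4byte_impl(data, init_val);
}
```

With this layout applications need no extra compiler flags, at the cost of an
out-of-line call where the header version could be inlined.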

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-18 17:13           ` Yerden Zhumabekov
@ 2014-11-18 17:29             ` Wang, Shawn
  2014-11-19  4:07               ` Yerden Zhumabekov
  2014-11-18 17:46             ` Neil Horman
  1 sibling, 1 reply; 98+ messages in thread
From: Wang, Shawn @ 2014-11-18 17:29 UTC (permalink / raw)
  To: Yerden Zhumabekov, Neil Horman, Richardson, Bruce, dev

I have a general question about using CPUID to detect the supported instruction set.
What if we compile the software on old hardware that does not support SSE4.2, but run it on newer hardware that does? Is there still a static way to force the compiler to turn on SSE4.2 support?
I guess most CPUs support SSE4.2 by now, but for AVX2 that might not be the case.
________________________________________
From: dev [dev-bounces@dpdk.org] on behalf of Yerden Zhumabekov [e_zhumabekov@sts.kz]
Sent: Tuesday, November 18, 2014 9:13 AM
To: Neil Horman; Richardson, Bruce; dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32     implementation

18.11.2014 22:00, Neil Horman wrote:
> On Tue, Nov 18, 2014 at 09:06:35PM +0600, Yerden Zhumabekov wrote:
>> 18.11.2014 20:41, Neil Horman wrote:
>>> On Tue, Nov 18, 2014 at 08:03:40PM +0600, Yerden Zhumabekov wrote:
>>>>  /**
>>>>   * Use single crc32 instruction to perform a hash on a 4 byte value.
>>>> + * Fall back to software crc32 implementation in case SSE4.2 is
>>>> + * not supported
>>>>   *
>>>>   * @param data
>>>>   *   Data to perform hash on.
>>>> @@ -376,11 +413,18 @@ crc32c_2words(uint64_t data, uint32_t init_val)
>>>>  static inline uint32_t
>>>>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
>>>>  {
>>>> -  return _mm_crc32_u32(init_val, data);
>>>> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
>>>> +  if (likely(crc32_alg == CRC32_SSE42))
>>>> +          return _mm_crc32_u32(init_val, data);
>>>> +#endif
>>> you don't really need these ifdefs here anymore given that you have a
>>> constructor to do the algorithm selection.  In fact you need to remove them, in
>>> the event you build on a system that doesn't support SSE42, but run on a system
>>> that does.
>> Originally, I thought so as well. I wrote the code without these ifdefs,
>> but it didn't compile on my machine which doesn't support SSE4.2. Error
>> was triggered by nmmintrin.h which has a check for respective GCC
>> extension. So I think these ifdefs are indeed required.
>>
> You need to edit the makefile so that the compiler gets passed the option
> -msse42.  That way it will know to emit sse42 instructions. It will also allow
> you to remove the ifdef from the include file

In this case, I guess there are two options:
1) modify all makefiles which use librte_hash
2) move all function bodies from rte_hash_crc.h to separate module,
leaving prototype definitions there only.

Everybody's up for the second option? :)

--
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-18 17:13           ` Yerden Zhumabekov
  2014-11-18 17:29             ` Wang, Shawn
@ 2014-11-18 17:46             ` Neil Horman
  2014-11-18 17:52               ` Bruce Richardson
  2014-11-18 17:58               ` Yerden Zhumabekov
  1 sibling, 2 replies; 98+ messages in thread
From: Neil Horman @ 2014-11-18 17:46 UTC (permalink / raw)
  To: Yerden Zhumabekov; +Cc: dev

On Tue, Nov 18, 2014 at 11:13:17PM +0600, Yerden Zhumabekov wrote:
> 
> 18.11.2014 22:00, Neil Horman wrote:
> > On Tue, Nov 18, 2014 at 09:06:35PM +0600, Yerden Zhumabekov wrote:
> >> 18.11.2014 20:41, Neil Horman wrote:
> >>> On Tue, Nov 18, 2014 at 08:03:40PM +0600, Yerden Zhumabekov wrote:
> >>>>  /**
> >>>>   * Use single crc32 instruction to perform a hash on a 4 byte value.
> >>>> + * Fall back to software crc32 implementation in case SSE4.2 is
> >>>> + * not supported
> >>>>   *
> >>>>   * @param data
> >>>>   *   Data to perform hash on.
> >>>> @@ -376,11 +413,18 @@ crc32c_2words(uint64_t data, uint32_t init_val)
> >>>>  static inline uint32_t
> >>>>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
> >>>>  {
> >>>> -	return _mm_crc32_u32(init_val, data);
> >>>> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> >>>> +	if (likely(crc32_alg == CRC32_SSE42))
> >>>> +		return _mm_crc32_u32(init_val, data);
> >>>> +#endif
> >>> you don't really need these ifdefs here anymore given that you have a
> >>> constructor to do the algorithm selection.  In fact you need to remove them, in
> >>> the event you build on a system that doesn't support SSE42, but run on a system
> >>> that does.
> >> Originally, I thought so as well. I wrote the code without these ifdefs,
> >> but it didn't compile on my machine which doesn't support SSE4.2. Error
> >> was triggered by nmmintrin.h which has a check for respective GCC
> >> extension. So I think these ifdefs are indeed required.
> >>
> > You need to edit the makefile so that the compiler gets passed the option
> > -msse42.  That way it will know to emit sse42 instructions. It will also allow
> > you to remove the ifdef from the include file
> 
> In this case, I guess there are two options:
> 1) modify all makefiles which use librte_hash
> 2) move all function bodies from rte_hash_crc.h to separate module,
> leaving prototype definitions there only.
> 
> Everybody's up for the second option? :)
> 
Crud, you're right, I didn't think about the header inclusion issue.  Is it
worth adding the jump to enable the dynamic hash selection?
Neil

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-18 17:46             ` Neil Horman
@ 2014-11-18 17:52               ` Bruce Richardson
  2014-11-18 21:36                 ` Neil Horman
  2014-11-18 17:58               ` Yerden Zhumabekov
  1 sibling, 1 reply; 98+ messages in thread
From: Bruce Richardson @ 2014-11-18 17:52 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

On Tue, Nov 18, 2014 at 12:46:19PM -0500, Neil Horman wrote:
> On Tue, Nov 18, 2014 at 11:13:17PM +0600, Yerden Zhumabekov wrote:
> > 
> > 18.11.2014 22:00, Neil Horman wrote:
> > > On Tue, Nov 18, 2014 at 09:06:35PM +0600, Yerden Zhumabekov wrote:
> > >> 18.11.2014 20:41, Neil Horman wrote:
> > >>> On Tue, Nov 18, 2014 at 08:03:40PM +0600, Yerden Zhumabekov wrote:
> > >>>>  /**
> > >>>>   * Use single crc32 instruction to perform a hash on a 4 byte value.
> > >>>> + * Fall back to software crc32 implementation in case SSE4.2 is
> > >>>> + * not supported
> > >>>>   *
> > >>>>   * @param data
> > >>>>   *   Data to perform hash on.
> > >>>> @@ -376,11 +413,18 @@ crc32c_2words(uint64_t data, uint32_t init_val)
> > >>>>  static inline uint32_t
> > >>>>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
> > >>>>  {
> > >>>> -	return _mm_crc32_u32(init_val, data);
> > >>>> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> > >>>> +	if (likely(crc32_alg == CRC32_SSE42))
> > >>>> +		return _mm_crc32_u32(init_val, data);
> > >>>> +#endif
> > >>> you don't really need these ifdefs here anymore given that you have a
> > >>> constructor to do the algorithm selection.  In fact you need to remove them, in
> > >>> the event you build on a system that doesn't support SSE42, but run on a system
> > >>> that does.
> > >> Originally, I thought so as well. I wrote the code without these ifdefs,
> > >> but it didn't compile on my machine which doesn't support SSE4.2. Error
> > >> was triggered by nmmintrin.h which has a check for respective GCC
> > >> extension. So I think these ifdefs are indeed required.
> > >>
> > > You need to edit the makefile so that the compiler gets passed the option
> > > -msse42.  That way it will know to emit sse42 instructions. It will also allow
> > > you to remove the ifdef from the include file
> > 
> > In this case, I guess there are two options:
> > 1) modify all makefiles which use librte_hash
> > 2) move all function bodies from rte_hash_crc.h to separate module,
> > leaving prototype definitions there only.
> > 
> > Everybody's up for the second option? :)
> > 
> Crud, you're right, I didn't think about the header inclusion issue.  Is it
> worth adding the jump to enable the dynamic hash selection?
> Neil

Maybe for cases where SSE4.2 is not currently available, i.e. for generic builds.
For builds where we have hardware support confirmed at compile time, just use
the function from the header file.
Does that make sense?

/Bruce

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-18 17:46             ` Neil Horman
  2014-11-18 17:52               ` Bruce Richardson
@ 2014-11-18 17:58               ` Yerden Zhumabekov
  1 sibling, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-18 17:58 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev


18.11.2014 23:46, Neil Horman wrote:
> On Tue, Nov 18, 2014 at 11:13:17PM +0600, Yerden Zhumabekov wrote:
>> 18.11.2014 22:00, Neil Horman wrote:
>>>
>>> You need to edit the makefile so that the compiler gets passed the option
>>> -msse42.  That way it will know to emit sse42 instructions. It will also allow
>>> you to remove the ifdef from the include file
>> In this case, I guess there are two options:
>> 1) modify all makefiles which use librte_hash
>> 2) move all function bodies from rte_hash_crc.h to separate module,
>> leaving prototype definitions there only.
>>
>> Everybody's up for the second option? :)
>>
> Crud, you're right, I didn't think about the header inclusion issue.  Is it
> worth adding the jump to enable the dynamic hash selection?

If I understood you correctly: I've already added a function,
rte_hash_crc_set_alg(), that changes the CRC32 implementation at run time.
I can rework the patches once again if everybody's fine with the separate
module.
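
For illustration, a minimal sketch of what run-time selection along these
lines could look like. It is not the code from the patch: every name below
(crc32_set_alg, crc32_u32, and so on) is made up for the example, and it
assumes GCC or Clang, whose __builtin_cpu_supports() is used for detection.

```c
#include <assert.h>
#include <stdint.h>

/* All names are hypothetical; this is not the patch's actual code. */
enum crc32_alg { CRC32_SW, CRC32_SSE42 };

typedef uint32_t (*crc32_u32_fn)(uint32_t data, uint32_t init_val);

/* Bitwise CRC32C (reflected polynomial 0x82F63B78); needs no SSE4.2. */
static uint32_t
crc32c_sw_u32(uint32_t data, uint32_t init_val)
{
	uint32_t crc = init_val ^ data;
	int i;

	for (i = 0; i < 32; i++)
		crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	return crc;
}

#if defined(__x86_64__) || defined(__i386__)
/* Inline asm, so this file does not need to be built with -msse4.2. */
static uint32_t
crc32c_hw_u32(uint32_t data, uint32_t init_val)
{
	__asm__("crc32l %[data], %[init_val]"
		: [init_val] "+r" (init_val)
		: [data] "rm" (data));
	return init_val;
}
#endif

/* Callers go through this pointer; software is the safe default. */
static crc32_u32_fn crc32_u32 = crc32c_sw_u32;

/* Select an implementation; falls back to software without SSE4.2. */
static void
crc32_set_alg(enum crc32_alg alg)
{
	assert(alg == CRC32_SW || alg == CRC32_SSE42);
	crc32_u32 = crc32c_sw_u32;
#if defined(__x86_64__) || defined(__i386__)
	__builtin_cpu_init();
	if (alg == CRC32_SSE42 && __builtin_cpu_supports("sse4.2"))
		crc32_u32 = crc32c_hw_u32;
#endif
}

/* Runs before main() and picks the fastest available implementation. */
__attribute__((constructor))
static void
crc32_init_alg(void)
{
	crc32_set_alg(CRC32_SSE42);
}
```

The indirect call through crc32_u32 is the "jump" being discussed: it costs
a little on every hash, in exchange for one binary that works both with and
without SSE4.2.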

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-18 17:52               ` Bruce Richardson
@ 2014-11-18 21:36                 ` Neil Horman
  2014-11-19  3:51                   ` Yerden Zhumabekov
  2014-11-19 10:16                   ` Bruce Richardson
  0 siblings, 2 replies; 98+ messages in thread
From: Neil Horman @ 2014-11-18 21:36 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On Tue, Nov 18, 2014 at 05:52:27PM +0000, Bruce Richardson wrote:
> On Tue, Nov 18, 2014 at 12:46:19PM -0500, Neil Horman wrote:
> > On Tue, Nov 18, 2014 at 11:13:17PM +0600, Yerden Zhumabekov wrote:
> > > 
> > > 18.11.2014 22:00, Neil Horman wrote:
> > > > On Tue, Nov 18, 2014 at 09:06:35PM +0600, Yerden Zhumabekov wrote:
> > > >> 18.11.2014 20:41, Neil Horman wrote:
> > > >>> On Tue, Nov 18, 2014 at 08:03:40PM +0600, Yerden Zhumabekov wrote:
> > > >>>>  /**
> > > >>>>   * Use single crc32 instruction to perform a hash on a 4 byte value.
> > > >>>> + * Fall back to software crc32 implementation in case SSE4.2 is
> > > >>>> + * not supported
> > > >>>>   *
> > > >>>>   * @param data
> > > >>>>   *   Data to perform hash on.
> > > >>>> @@ -376,11 +413,18 @@ crc32c_2words(uint64_t data, uint32_t init_val)
> > > >>>>  static inline uint32_t
> > > >>>>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
> > > >>>>  {
> > > >>>> -	return _mm_crc32_u32(init_val, data);
> > > >>>> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> > > >>>> +	if (likely(crc32_alg == CRC32_SSE42))
> > > >>>> +		return _mm_crc32_u32(init_val, data);
> > > >>>> +#endif
> > > >>> you don't really need these ifdefs here anymore given that you have a
> > > >>> constructor to do the algorithm selection.  In fact you need to remove them, in
> > > >>> the event you build on a system that doesn't support SSE42, but run on a system
> > > >>> that does.
> > > >> Originally, I thought so as well. I wrote the code without these ifdefs,
> > > >> but it didn't compile on my machine which doesn't support SSE4.2. Error
> > > >> was triggered by nmmintrin.h which has a check for respective GCC
> > > >> extension. So I think these ifdefs are indeed required.
> > > >>
> > > > You need to edit the makefile so that the compiler gets passed the option
> > > > -msse42.  That way it will know to emit sse42 instructions. It will also allow
> > > > you to remove the ifdef from the include file
> > > 
> > > In this case, I guess there are two options:
> > > 1) modify all makefiles which use librte_hash
> > > 2) move all function bodies from rte_hash_crc.h to separate module,
> > > leaving prototype definitions there only.
> > > 
> > > Everybody's up for the second option? :)
> > > 
> > Crud, you're right, I didn't think about the header inclusion issue.  Is it
> > worth adding the jump to enable the dynamic hash selection?
> > Neil
> 
> Maybe for cases where SSE4.2 is not currently available, i.e. for generic builds.
> For builds where we have hardware support confirmed at compile time, just use
> the function from the header file.
> Does that make sense?
> 
I'm not certain of that, as I don't think anything can be 'confirmed' at compile
time.  I.e. just because you have sse42 at compile time doesn't guarantee you
have it at run time with a DSO.  If you have these as macros, you need to enable
sse42 wherever you include the file so that the intrinsic works properly.

An alternate option would be to not use the intrinsic, and craft some explicit
__asm__ statement that executes the right sse42 instructions.  That way the asm
is directly emitted, without requiring the -msse4.2 flag at all, and it will just
work in all the files that call it.

Neil

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-18 21:36                 ` Neil Horman
@ 2014-11-19  3:51                   ` Yerden Zhumabekov
  2014-11-19 10:16                   ` Bruce Richardson
  1 sibling, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-19  3:51 UTC (permalink / raw)
  To: Neil Horman, Bruce Richardson; +Cc: dev


19.11.2014 3:36, Neil Horman wrote:
> On Tue, Nov 18, 2014 at 05:52:27PM +0000, Bruce Richardson wrote:
>> On Tue, Nov 18, 2014 at 12:46:19PM -0500, Neil Horman wrote:
>>> On Tue, Nov 18, 2014 at 11:13:17PM +0600, Yerden Zhumabekov wrote:
>>>> Everybody's up for the second option? :)
>>>>
>>> Crud, you're right, I didn't think about the header inclusion issue.  Is it
>>> worth adding the jump to enable the dynamic hash selection?
>>> Neil
>> Maybe for cases where SSE4.2 is not currently available, i.e. for generic builds.
>> For builds where we have hardware support confirmed at compile time, just use
>> the function from the header file.
>> Does that make sense?
>>
> I'm not certain of that, as I don't think anything can be 'confirmed' at compile
> time.  I.e. just because you have sse42 at compile time doesn't guarantee you
> have it at run time with a DSO.  If you have these as macros, you need to enable
> sse42 wherever you include the file so that the intrinsic works properly.
>
> An alternate option would be to not use the intrinsic, and craft some explicit
> __asm__ statement that executes the right sse42 instructions.  That way the asm
> is directly emitted, without requiring the -msse4.2 flag at all, and it will just
> work in all the files that call it.

Thanks for the discussion. To summarize it with my suggestions for 'v5':
1) replace the intrinsics with asm code and stop including nmmintrin.h;
2) detect the arch (EM64T flag) at run time, because the crc32 instruction
with a 64-bit operand is not available on 32-bit x86;
3) separate the function prototypes (left in the header) from the bodies,
and add the new source file to SRCS in the Makefile.
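
A rough sketch of how points 1) and 2) could fit together for the 8-byte
case. The helper names are hypothetical, and for brevity it chooses between
the 64-bit and 32-bit paths at compile time with __x86_64__ instead of the
run-time EM64T check suggested above, so it shows the instruction split and
not the detection.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)

/* Hypothetical helper: CRC32C of one 4-byte word via inline asm. */
static inline uint32_t
crc32c_asm_u32(uint32_t data, uint32_t init_val)
{
	__asm__("crc32l %[data], %[init_val]"
		: [init_val] "+r" (init_val)
		: [data] "rm" (data));
	return init_val;
}

/* Hypothetical helper: CRC32C of one 8-byte word. */
static inline uint32_t
crc32c_asm_u64(uint64_t data, uint32_t init_val)
{
#if defined(__x86_64__)
	/* crc32q takes a 64-bit destination register. */
	uint64_t crc = init_val;

	__asm__("crc32q %[data], %[crc]"
		: [crc] "+r" (crc)
		: [data] "rm" (data));
	return (uint32_t)crc;
#else
	/* 32-bit x86: feed the low word first, then the high word. */
	init_val = crc32c_asm_u32((uint32_t)data, init_val);
	return crc32c_asm_u32((uint32_t)(data >> 32), init_val);
#endif
}

/* Both paths must agree; only checked on CPUs that report SSE4.2. */
static void
crc32c_u64_selfcheck(void)
{
	uint64_t v = 0x0123456789abcdefULL;
	uint32_t lo;

	__builtin_cpu_init();
	if (!__builtin_cpu_supports("sse4.2"))
		return;
	lo = crc32c_asm_u32((uint32_t)v, 0xffffffffu);
	assert(crc32c_asm_u64(v, 0xffffffffu) ==
	       crc32c_asm_u32((uint32_t)(v >> 32), lo));
}

#endif /* x86 */
```

Because the asm is emitted as written, none of this needs nmmintrin.h or
-msse4.2, but callers still have to check the CPU before using it.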

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-18 17:29             ` Wang, Shawn
@ 2014-11-19  4:07               ` Yerden Zhumabekov
  0 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-19  4:07 UTC (permalink / raw)
  To: Wang, Shawn, dev


18.11.2014 23:29, Wang, Shawn wrote:
> I have a general question about using CPUID to detect the supported instruction set.
> What if we compile the software on old hardware that does not support SSE4.2, but run it on new hardware that does? Is there still a static way to force the compiler to turn on SSE4.2 support?
> I guess most CPUs support SSE4.2 by now, but for AVX2 this might not be the case.

According to the GCC 4.7 release notes (https://gcc.gnu.org/gcc-4.7/changes.html),
AVX2 instructions are supported starting with that version.
Use -mavx2 or -march=core-avx2. The latter seems to be supported by ICC
as well, according to Google :)
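
To illustrate the distinction behind the question: the -m flags only decide
what the compiler is allowed to emit, while what the CPU can execute has to
be asked for at run time. A small sketch, assuming GCC or Clang on x86 (the
function names are made up; __SSE4_2__ and __AVX2__ are the macros the
compiler predefines when the matching flag is on):

```c
#include <assert.h>
#include <stdio.h>

/* Build-time knowledge: was the compiler allowed to emit SSE4.2? */
static int
built_with_sse42(void)
{
#ifdef __SSE4_2__
	return 1;
#else
	return 0;
#endif
}

/* Build-time knowledge: was the compiler allowed to emit AVX2? */
static int
built_with_avx2(void)
{
#ifdef __AVX2__
	return 1;
#else
	return 0;
#endif
}

/* Run-time knowledge: does the CPU we run on report SSE4.2? */
static int
cpu_has_sse42(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_cpu_init();
	return __builtin_cpu_supports("sse4.2") != 0;
#else
	return 0;
#endif
}

/* Run-time knowledge: does the CPU we run on report AVX2? */
static int
cpu_has_avx2(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_cpu_init();
	return __builtin_cpu_supports("avx2") != 0;
#else
	return 0;
#endif
}

static void
report_cpu_features(void)
{
	/* A binary built with the flag must only run on CPUs that have it. */
	assert(!built_with_sse42() || cpu_has_sse42());
	assert(!built_with_avx2() || cpu_has_avx2());
	printf("sse4.2: built=%d cpu=%d\n", built_with_sse42(), cpu_has_sse42());
	printf("avx2:   built=%d cpu=%d\n", built_with_avx2(), cpu_has_avx2());
}
```

The build machine itself does not need the instructions; only the machine
that runs the result does.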

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-18 21:36                 ` Neil Horman
  2014-11-19  3:51                   ` Yerden Zhumabekov
@ 2014-11-19 10:16                   ` Bruce Richardson
  2014-11-19 11:34                     ` Neil Horman
  2014-11-19 11:35                     ` Yerden Zhumabekov
  1 sibling, 2 replies; 98+ messages in thread
From: Bruce Richardson @ 2014-11-19 10:16 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

On Tue, Nov 18, 2014 at 04:36:24PM -0500, Neil Horman wrote:
> On Tue, Nov 18, 2014 at 05:52:27PM +0000, Bruce Richardson wrote:
> > On Tue, Nov 18, 2014 at 12:46:19PM -0500, Neil Horman wrote:
> > > On Tue, Nov 18, 2014 at 11:13:17PM +0600, Yerden Zhumabekov wrote:
> > > > 
> > > > 18.11.2014 22:00, Neil Horman wrote:
> > > > > On Tue, Nov 18, 2014 at 09:06:35PM +0600, Yerden Zhumabekov wrote:
> > > > >> 18.11.2014 20:41, Neil Horman wrote:
> > > > >>> On Tue, Nov 18, 2014 at 08:03:40PM +0600, Yerden Zhumabekov wrote:
> > > > >>>>  /**
> > > > >>>>   * Use single crc32 instruction to perform a hash on a 4 byte value.
> > > > >>>> + * Fall back to software crc32 implementation in case SSE4.2 is
> > > > >>>> + * not supported
> > > > >>>>   *
> > > > >>>>   * @param data
> > > > >>>>   *   Data to perform hash on.
> > > > >>>> @@ -376,11 +413,18 @@ crc32c_2words(uint64_t data, uint32_t init_val)
> > > > >>>>  static inline uint32_t
> > > > >>>>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
> > > > >>>>  {
> > > > >>>> -	return _mm_crc32_u32(init_val, data);
> > > > >>>> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> > > > >>>> +	if (likely(crc32_alg == CRC32_SSE42))
> > > > >>>> +		return _mm_crc32_u32(init_val, data);
> > > > >>>> +#endif
> > > > >>> you don't really need these ifdefs here anymore given that you have a
> > > > >>> constructor to do the algorithm selection.  In fact you need to remove them, in
> > > > >>> the event you build on a system that doesn't support SSE42, but run on a system
> > > > >>> that does.
> > > > >> Originally, I thought so as well. I wrote the code without these ifdefs,
> > > > >> but it didn't compile on my machine which doesn't support SSE4.2. Error
> > > > >> was triggered by nmmintrin.h which has a check for respective GCC
> > > > >> extension. So I think these ifdefs are indeed required.
> > > > >>
> > > > > You need to edit the makefile so that the compiler gets passed the option
> > > > > -msse42.  That way it will know to emit sse42 instructions. It will also allow
> > > > > you to remove the ifdef from the include file
> > > > 
> > > > In this case, I guess there are two options:
> > > > 1) modify all makefiles which use librte_hash
> > > > 2) move all function bodies from rte_hash_crc.h to separate module,
> > > > leaving prototype definitions there only.
> > > > 
> > > > Everybody's up for the second option? :)
> > > > 
> > > Crud, you're right, I didn't think about the header inclusion issue.  Is it
> > > worth adding the jump to enable the dynamic hash selection?
> > > Neil
> > 
> > Maybe for cases where SSE4.2 is not currently available, i.e. for generic builds.
> > For builds where we have hardware support confirmed at compile time, just use
> > the function from the header file.
> > Does that make sense?
> > 
> I'm not certain of that, as I don't think anything can be 'confirmed' at compile
> time.  I.e. just because you have sse42 at compile time doesn't guarantee you
> have it at run time with a DSO.  If you have these as macros, you need to enable
> sse42 whereever you include the file so that the intrinsic works properly.

Well, if you compile with sse42 at compile time, the compiler is free to insert
sse4 instructions at any place it feels like, irrespective of whether or not you
use SSE4 intrinsics, so I would never expect such a DSO to work on a system
without SSE42 support.

> 
> an alternate option would be to not use the intrinsic, and craft some explicit
> __asm__ statement that executes the right sse42 instructions.  That way the asm
> is directly emitted, without requiring the -msse42 flag at all, and it will just
> work in all the files that call it.
> 

I really don't like that approach. I think using intrinsics is much more 
maintainable.

/Bruce

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-19 10:16                   ` Bruce Richardson
@ 2014-11-19 11:34                     ` Neil Horman
  2014-11-19 11:38                       ` Bruce Richardson
  2014-11-19 11:35                     ` Yerden Zhumabekov
  1 sibling, 1 reply; 98+ messages in thread
From: Neil Horman @ 2014-11-19 11:34 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On Wed, Nov 19, 2014 at 10:16:14AM +0000, Bruce Richardson wrote:
> On Tue, Nov 18, 2014 at 04:36:24PM -0500, Neil Horman wrote:
> > On Tue, Nov 18, 2014 at 05:52:27PM +0000, Bruce Richardson wrote:
> > > On Tue, Nov 18, 2014 at 12:46:19PM -0500, Neil Horman wrote:
> > > > On Tue, Nov 18, 2014 at 11:13:17PM +0600, Yerden Zhumabekov wrote:
> > > > > 
> > > > > 18.11.2014 22:00, Neil Horman wrote:
> > > > > > On Tue, Nov 18, 2014 at 09:06:35PM +0600, Yerden Zhumabekov wrote:
> > > > > >> 18.11.2014 20:41, Neil Horman wrote:
> > > > > >>> On Tue, Nov 18, 2014 at 08:03:40PM +0600, Yerden Zhumabekov wrote:
> > > > > >>>>  /**
> > > > > >>>>   * Use single crc32 instruction to perform a hash on a 4 byte value.
> > > > > >>>> + * Fall back to software crc32 implementation in case SSE4.2 is
> > > > > >>>> + * not supported
> > > > > >>>>   *
> > > > > >>>>   * @param data
> > > > > >>>>   *   Data to perform hash on.
> > > > > >>>> @@ -376,11 +413,18 @@ crc32c_2words(uint64_t data, uint32_t init_val)
> > > > > >>>>  static inline uint32_t
> > > > > >>>>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
> > > > > >>>>  {
> > > > > >>>> -	return _mm_crc32_u32(init_val, data);
> > > > > >>>> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> > > > > >>>> +	if (likely(crc32_alg == CRC32_SSE42))
> > > > > >>>> +		return _mm_crc32_u32(init_val, data);
> > > > > >>>> +#endif
> > > > > >>> you don't really need these ifdefs here anymore given that you have a
> > > > > >>> constructor to do the algorithm selection.  In fact you need to remove them, in
> > > > > >>> the event you build on a system that doesn't support SSE42, but run on a system
> > > > > >>> that does.
> > > > > >> Originally, I thought so as well. I wrote the code without these ifdefs,
> > > > > >> but it didn't compile on my machine which doesn't support SSE4.2. Error
> > > > > >> was triggered by nmmintrin.h which has a check for respective GCC
> > > > > >> extension. So I think these ifdefs are indeed required.
> > > > > >>
> > > > > > You need to edit the makefile so that the compiler gets passed the option
> > > > > > -msse42.  That way it will know to emit sse42 instructions. It will also allow
> > > > > > you to remove the ifdef from the include file
> > > > > 
> > > > > In this case, I guess there are two options:
> > > > > 1) modify all makefiles which use librte_hash
> > > > > 2) move all function bodies from rte_hash_crc.h to separate module,
> > > > > leaving prototype definitions there only.
> > > > > 
> > > > > Everybody's up for the second option? :)
> > > > > 
> > > > Crud, you're right, I didn't think about the header inclusion issue.  Is it
> > > > worth adding the jump to enable the dynamic hash selection?
> > > > Neil
> > > 
> > > Maybe for cases where SSE4.2 is not currently available, i.e. for generic builds.
> > > For builds where we have hardware support confirmed at compile time, just use
> > > the function from the header file.
> > > Does that make sense?
> > > 
> > I'm not certain of that, as I don't think anything can be 'confirmed' at compile
> > time.  I.e. just because you have sse42 at compile time doesn't guarantee you
> > have it at run time with a DSO.  If you have these as macros, you need to enable
> > sse42 whereever you include the file so that the intrinsic works properly.
> 
> Well, if you compile with sse42 at compile time, the compiler is free to insert
> sse4 instructions at any place it feels like, irrespective of whether or not you
> use SSE4 intrinsics, so I would never expect such a DSO to work on a system
> without SSE42 support.
> 
> > 
> > an alternate option would be to not use the intrinsic, and craft some explicit
> > __asm__ statement that executes the right sse42 instructions.  That way the asm
> > is directly emitted, without requiring the -msse42 flag at all, and it will just
> > work in all the files that call it.
> > 
> 
> I really don't like that approach. I think using intrinsics is much more 
> maintainable.
> 
I grant you that using an intrinsic is easier to read, but if the code doesn't
compile with the intrinsic unless sse42 is turned on, I'm not sure what choice
we have.  And inline asm isn't that hard to maintain.  We're talking about
three instructions:

asm("mov    %[crc_in], %%eax\n\t"
    "mov    %[val], %%edx\n\t"
    "crc32l %%edx, %%eax"
    : "=&a" (crc)           /* output: result is left in eax */
    : [crc_in] "r" (crc),   /* inputs */
      [val] "r" (val)
    : "edx");               /* clobber */

I haven't tested this, but the intent is pretty easy to read.  It's not like
we don't have precedent for this; the atomic interface and several PMDs do it
frequently.

Neil


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-19 10:16                   ` Bruce Richardson
  2014-11-19 11:34                     ` Neil Horman
@ 2014-11-19 11:35                     ` Yerden Zhumabekov
  2014-11-19 15:07                       ` Neil Horman
  1 sibling, 1 reply; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-19 11:35 UTC (permalink / raw)
  To: Bruce Richardson, Neil Horman; +Cc: dev


19.11.2014 16:16, Bruce Richardson wrote:
> On Tue, Nov 18, 2014 at 04:36:24PM -0500, Neil Horman wrote:
>> An alternate option would be to not use the intrinsic, and craft some explicit
>> __asm__ statement that executes the right sse42 instructions.  That way the asm
>> is directly emitted, without requiring the -msse4.2 flag at all, and it will just
>> work in all the files that call it.
>>
> I really don't like that approach. I think using intrinsics is much more 
> maintainable.
>

static inline uint32_t
crc32_sse42_u32(uint32_t data, uint32_t init_val)
{
	__asm__ volatile(
		"crc32l %[data], %[init_val];"
		: [init_val] "+r" (init_val)
		: [data] "rm" (data));
	return init_val;
}

But wait, would the __builtin_ia32_crc32si and __builtin_ia32_crc32di
builtins do the trick? Does ICC have them?
And what about keeping only the prototypes in the header and moving the
function bodies to a separate module? Would that break anything?
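
On the builtins: as far as I know, GCC only accepts them when SSE4.2 is
enabled for the enclosing function, so on their own they have the same flag
problem as the intrinsics. One possible way around it, sketched below under
the assumption that the compiler supports the per-function target attribute
(GCC and Clang do), is to enable SSE4.2 for a single function and guard the
call with a run-time check. Function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Portable bitwise CRC32C (reflected polynomial 0x82F63B78). */
static uint32_t
crc32c_sw_u32(uint32_t data, uint32_t init_val)
{
	uint32_t crc = init_val ^ data;
	int i;

	for (i = 0; i < 32; i++)
		crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	return crc;
}

#if defined(__x86_64__) || defined(__i386__)
/* SSE4.2 is enabled for this one function only, not the whole file. */
static __attribute__((target("sse4.2"))) uint32_t
crc32c_builtin_u32(uint32_t data, uint32_t init_val)
{
	return __builtin_ia32_crc32si(init_val, data);
}
#endif

/* Hypothetical wrapper: check the CPU at run time, then pick a path. */
static uint32_t
crc32c_u32(uint32_t data, uint32_t init_val)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_cpu_init();
	if (__builtin_cpu_supports("sse4.2"))
		return crc32c_builtin_u32(data, init_val);
#endif
	return crc32c_sw_u32(data, init_val);
}

/* Both paths compute the same CRC32C, so this holds on any CPU. */
static void
crc32c_u32_selfcheck(void)
{
	assert(crc32c_u32(0x12345678u, 0xffffffffu) ==
	       crc32c_sw_u32(0x12345678u, 0xffffffffu));
}
```

The rest of the file is still compiled for the baseline target, so the
compiler cannot scatter SSE4.2 instructions outside crc32c_builtin_u32().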

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-19 11:34                     ` Neil Horman
@ 2014-11-19 11:38                       ` Bruce Richardson
  2014-11-19 11:50                         ` Ananyev, Konstantin
  0 siblings, 1 reply; 98+ messages in thread
From: Bruce Richardson @ 2014-11-19 11:38 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

On Wed, Nov 19, 2014 at 06:34:08AM -0500, Neil Horman wrote:
> On Wed, Nov 19, 2014 at 10:16:14AM +0000, Bruce Richardson wrote:
> > On Tue, Nov 18, 2014 at 04:36:24PM -0500, Neil Horman wrote:
> > > On Tue, Nov 18, 2014 at 05:52:27PM +0000, Bruce Richardson wrote:
> > > > On Tue, Nov 18, 2014 at 12:46:19PM -0500, Neil Horman wrote:
> > > > > On Tue, Nov 18, 2014 at 11:13:17PM +0600, Yerden Zhumabekov wrote:
> > > > > > 
> > > > > > 18.11.2014 22:00, Neil Horman wrote:
> > > > > > > On Tue, Nov 18, 2014 at 09:06:35PM +0600, Yerden Zhumabekov wrote:
> > > > > > >> 18.11.2014 20:41, Neil Horman wrote:
> > > > > > >>> On Tue, Nov 18, 2014 at 08:03:40PM +0600, Yerden Zhumabekov wrote:
> > > > > > >>>>  /**
> > > > > > >>>>   * Use single crc32 instruction to perform a hash on a 4 byte value.
> > > > > > >>>> + * Fall back to software crc32 implementation in case SSE4.2 is
> > > > > > >>>> + * not supported
> > > > > > >>>>   *
> > > > > > >>>>   * @param data
> > > > > > >>>>   *   Data to perform hash on.
> > > > > > >>>> @@ -376,11 +413,18 @@ crc32c_2words(uint64_t data, uint32_t init_val)
> > > > > > >>>>  static inline uint32_t
> > > > > > >>>>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
> > > > > > >>>>  {
> > > > > > >>>> -	return _mm_crc32_u32(init_val, data);
> > > > > > >>>> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> > > > > > >>>> +	if (likely(crc32_alg == CRC32_SSE42))
> > > > > > >>>> +		return _mm_crc32_u32(init_val, data);
> > > > > > >>>> +#endif
> > > > > > >>> you don't really need these ifdefs here anymore given that you have a
> > > > > > >>> constructor to do the algorithm selection.  In fact you need to remove them, in
> > > > > > >>> the event you build on a system that doesn't support SSE42, but run on a system
> > > > > > >>> that does.
> > > > > > >> Originally, I thought so as well. I wrote the code without these ifdefs,
> > > > > > >> but it didn't compile on my machine which doesn't support SSE4.2. Error
> > > > > > >> was triggered by nmmintrin.h which has a check for respective GCC
> > > > > > >> extension. So I think these ifdefs are indeed required.
> > > > > > >>
> > > > > > > You need to edit the makefile so that the compiler gets passed the option
> > > > > > > -msse42.  That way it will know to emit sse42 instructions. It will also allow
> > > > > > > you to remove the ifdef from the include file
> > > > > > 
> > > > > > In this case, I guess there are two options:
> > > > > > 1) modify all makefiles which use librte_hash
> > > > > > 2) move all function bodies from rte_hash_crc.h to separate module,
> > > > > > leaving prototype definitions there only.
> > > > > > 
> > > > > > Everybody's up for the second option? :)
> > > > > > 
> > > > > Crud, you're right, I didn't think about the header inclusion issue.  Is it
> > > > > worth adding the jump to enable the dynamic hash selection?
> > > > > Neil
> > > > 
> > > > Maybe for cases where SSE4.2 is not currently available, i.e. for generic builds.
> > > > For builds where we have hardware support confirmed at compile time, just use
> > > > the function from the header file.
> > > > Does that make sense?
> > > > 
> > > I'm not certain of that, as I don't think anything can be 'confirmed' at compile
> > > time.  I.e. just because you have sse42 at compile time doesn't guarantee you
> > > have it at run time with a DSO.  If you have these as macros, you need to enable
> > > sse42 whereever you include the file so that the intrinsic works properly.
> > 
> > Well, if you compile with sse42 at compile time, the compiler is free to insert
> > sse4 instructions at any place it feels like, irrespective of whether or not you
> > use SSE4 intrinsics, so I would never expect such a DSO to work on a system
> > without SSE42 support.
> > 
> > > 
> > > an alternate option would be to not use the intrinsic, and craft some explicit
> > > __asm__ statement that executes the right sse42 instructions.  That way the asm
> > > is directly emitted, without requiring the -msse42 flag at all, and it will just
> > > work in all the files that call it.
> > > 
> > 
> > I really don't like that approach. I think using intrinsics is much more 
> > maintainable.
> > 
> I grant you that using an intrinsic is easier to read, but if the code doesn't
> compile with the intrinsic unless sse42 is turned on, I'm not sure what choice
> we have.  And inline asm isn't that hard to maintain.  We're talking about
> three instructions:
>
> asm("mov    %[crc_in], %%eax\n\t"
>     "mov    %[val], %%edx\n\t"
>     "crc32l %%edx, %%eax"
>     : "=&a" (crc)           /* output: result is left in eax */
>     : [crc_in] "r" (crc),   /* inputs */
>       [val] "r" (val)
>     : "edx");               /* clobber */
>
> I haven't tested this, but the intent is pretty easy to read.  It's not like
> we don't have precedent for this; the atomic interface and several PMDs do it
> frequently.
> 
> Neil

Fair point. If everyone else is happy enough with it, I'm ok too.

/Bruce


* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-19 11:38                       ` Bruce Richardson
@ 2014-11-19 11:50                         ` Ananyev, Konstantin
  2014-11-19 11:59                           ` Yerden Zhumabekov
  2014-11-19 15:05                           ` Neil Horman
  0 siblings, 2 replies; 98+ messages in thread
From: Ananyev, Konstantin @ 2014-11-19 11:50 UTC (permalink / raw)
  To: Richardson, Bruce, Neil Horman; +Cc: dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Wednesday, November 19, 2014 11:38 AM
> To: Neil Horman
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
> 
> On Wed, Nov 19, 2014 at 06:34:08AM -0500, Neil Horman wrote:
> > On Wed, Nov 19, 2014 at 10:16:14AM +0000, Bruce Richardson wrote:
> > > On Tue, Nov 18, 2014 at 04:36:24PM -0500, Neil Horman wrote:
> > > > On Tue, Nov 18, 2014 at 05:52:27PM +0000, Bruce Richardson wrote:
> > > > > On Tue, Nov 18, 2014 at 12:46:19PM -0500, Neil Horman wrote:
> > > > > > On Tue, Nov 18, 2014 at 11:13:17PM +0600, Yerden Zhumabekov wrote:
> > > > > > >
> > > > > > > 18.11.2014 22:00, Neil Horman пишет:
> > > > > > > > On Tue, Nov 18, 2014 at 09:06:35PM +0600, Yerden Zhumabekov wrote:
> > > > > > > >> 18.11.2014 20:41, Neil Horman пишет:
> > > > > > > >>> On Tue, Nov 18, 2014 at 08:03:40PM +0600, Yerden Zhumabekov wrote:
> > > > > > > >>>>  /**
> > > > > > > >>>>   * Use single crc32 instruction to perform a hash on a 4 byte value.
> > > > > > > >>>> + * Fall back to software crc32 implementation in case SSE4.2 is
> > > > > > > >>>> + * not supported
> > > > > > > >>>>   *
> > > > > > > >>>>   * @param data
> > > > > > > >>>>   *   Data to perform hash on.
> > > > > > > >>>> @@ -376,11 +413,18 @@ crc32c_2words(uint64_t data, uint32_t init_val)
> > > > > > > >>>>  static inline uint32_t
> > > > > > > >>>>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
> > > > > > > >>>>  {
> > > > > > > >>>> -	return _mm_crc32_u32(init_val, data);
> > > > > > > >>>> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> > > > > > > >>>> +	if (likely(crc32_alg == CRC32_SSE42))
> > > > > > > >>>> +		return _mm_crc32_u32(init_val, data);
> > > > > > > >>>> +#endif
> > > > > > > >>> you don't really need these ifdefs here anymore given that you have a
> > > > > > > >>> constructor to do the algorithm selection.  In fact you need to remove them, in
> > > > > > > >>> the event you build on a system that doesn't support SSE42, but run on a system
> > > > > > > >>> that does.
> > > > > > > >> Originally, I thought so as well. I wrote the code without these ifdefs,
> > > > > > > >> but it didn't compile on my machine which doesn't support SSE4.2. Error
> > > > > > > >> was triggered by nmmintrin.h which has a check for respective GCC
> > > > > > > >> extension. So I think these ifdefs are indeed required.
> > > > > > > >>
> > > > > > > > You need to edit the makefile so that the compiler gets passed the option
> > > > > > > > -msse42.  That way it will know to emit sse42 instructions. It will also allow
> > > > > > > > you to remove the ifdef from the include file
> > > > > > >
> > > > > > > In this case, I guess there are two options:
> > > > > > > 1) modify all makefiles which use librte_hash
> > > > > > > 2) move all function bodies from rte_hash_crc.h to separate module,
> > > > > > > leaving prototype definitions there only.
> > > > > > >
> > > > > > > Everybody's up for the second option? :)
> > > > > > >
> > > > > > Crud, you're right, I didn't think about the header inclusion issue.  Is it
> > > > > > worth adding the jump to enable the dynamic hash selection?
> > > > > > Neil
> > > > >
> > > > > Maybe for cases where SSE4.2 is not currently available, i.e. for generic builds.
> > > > > For builds where we have hardware support confirmed at compile time, just use
> > > > > the function from the header file.
> > > > > Does that make sense?
> > > > >
> > > > I'm not certain of that, as I don't think anything can be 'confirmed' at compile
> > > > time.  I.e. just because you have sse42 at compile time doesn't guarantee you
> > > > have it at run time with a DSO.  If you have these as macros, you need to enable
> > > > sse42 wherever you include the file so that the intrinsic works properly.
> > >
> > > Well, if you compile with sse42 at compile time, the compiler is free to insert
> > > sse4 instructions at any place it feels like, irrespective of whether or not you
> > > use SSE4 intrinsics, so I would never expect such a DSO to work on a system
> > > without SSE42 support.
> > >
> > > >
> > > > an alternate option would be to not use the intrinsic, and craft some explicit
> > > > __asm__ statement that executes the right sse42 instructions.  That way the asm
> > > > is directly emitted, without requiring the -msse42 flag at all, and it will just
> > > > work in all the files that call it.
> > > >
> > >
> > > I really don't like that approach. I think using intrinsics is much more
> > > maintainable.
> > >
> > I grant you that using an intrinsic is easier to read, but if the code doesn't
> > compile when using the intrinsic unless you have sse42 turned on, I'm not sure
> > what choice we have.  and inline asm isn't that hard to maintain.  We're talking
> > about three lines of code:
> > asm(
> >  "mov    %[1],%eax
> >  mov    %[2],%edx
> >  crc32l %edx,%eax":
> >  [edx] "r" (crc) /*output*/
> >  :
> >  [1] "r" (crc), /* input */
> >  [2] "r" (val)
> >  :
> >  [eax] "r" /* clobber */
> > )
> >
> > I don't have the syntax quite right, but it's pretty easy to read the intent.
> > It's not like we don't have precedent for this; the atomic interface and several
> > PMDs do this frequently.
> >
> > Neil
> 
> Fair point. If everyone else is happy enough with it, I'm ok too.

As I remember, with gcc and icc it is possible to specify that you'd like to compile a particular function
for a different target.
From https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html:
"target
The target attribute is used to specify that a function is to be compiled with different target options than specified on the command line. This can be used for instance to have functions compiled with a different ISA (instruction set architecture) than the default. You can also use the ‘#pragma GCC target’ pragma to set more than one function to be compiled with specific target options. See Function Specific Option Pragmas, for details about the ‘#pragma GCC target’ pragma.
For instance on a 386, you could compile one function with target("sse4.1,arch=core2") and another with target("sse4a,arch=amdfam10"). This is equivalent to compiling the first function with -msse4.1 and -march=core2 options, and the second function with -msse4a and -march=amdfam10 options. It is up to the user to make sure that a function is only invoked on a machine that supports the particular ISA it is compiled for (for example by using cpuid on 386 to determine what feature bits and architecture family are used).

          int core2_func (void) __attribute__ ((__target__ ("arch=core2")));
          int sse3_func (void) __attribute__ ((__target__ ("sse3")));
You can either use multiple strings to specify multiple options, or separate the options with a comma (‘,’).

The target attribute is presently implemented for i386/x86_64, PowerPC, and Nios II targets only. The options supported are specific to each target.

On the 386, the following options are allowed:
...
 ‘sse4.2’
‘no-sse4.2’"

Wouldn't that suit your purposes?
Probably you can even keep your function inline with that approach.
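
A minimal sketch of that idea, assuming a compiler whose nmmintrin.h can be included without -msse4.2 (GCC 4.9 or later, or a recent clang). The names crc32c_sw_u32, crc32c_hw_u32 and crc32c_u32 are made up for illustration and are not part of the patch. Only the one function carries the sse4.2 target; the caller checks the CPU at run time, as the GCC text above requires.

```c
#include <stdint.h>

/* Bitwise CRC32C step over one 32-bit word (reflected polynomial
 * 0x82F63B78), with no pre- or post-inversion, mirroring what the
 * crc32 instruction computes. Hypothetical helper. */
static uint32_t
crc32c_sw_u32(uint32_t init_val, uint32_t data)
{
	uint32_t crc = init_val ^ data;
	int i;

	for (i = 0; i < 32; i++)
		crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	return crc;
}

#if defined(__x86_64__) || defined(__i386__)
#include <nmmintrin.h>

/* Only this function is compiled for SSE4.2; the rest of the file keeps
 * the target options given on the command line. */
static __attribute__((target("sse4.2"))) uint32_t
crc32c_hw_u32(uint32_t init_val, uint32_t data)
{
	return _mm_crc32_u32(init_val, data);
}
#endif

/* The caller must make sure the SSE4.2 version only runs on a CPU that
 * has the instruction. */
static uint32_t
crc32c_u32(uint32_t init_val, uint32_t data)
{
#if defined(__x86_64__) || defined(__i386__)
	if (__builtin_cpu_supports("sse4.2"))
		return crc32c_hw_u32(init_val, data);
#endif
	return crc32c_sw_u32(init_val, data);
}
```

Note that the compiler will generally not inline crc32c_hw_u32 into a caller built without SSE4.2, so whether the call overhead is acceptable would need measuring.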

Konstantin


> 
> /Bruce


* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-19 11:50                         ` Ananyev, Konstantin
@ 2014-11-19 11:59                           ` Yerden Zhumabekov
  2014-11-19 15:05                           ` Neil Horman
  1 sibling, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-19 11:59 UTC (permalink / raw)
  To: Ananyev, Konstantin, Richardson, Bruce, Neil Horman; +Cc: dev


19.11.2014 17:50, Ananyev, Konstantin wrote:
>
> As I remember, with gcc and icc it is possible to specify that you'd like to compile a particular function
> for a different target.
> From https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html:
> "target
> The target attribute is used to specify that a function is to be compiled with different target options than specified on the command line. This can be used for instance to have functions compiled with a different ISA (instruction set architecture) than the default. You can also use the ‘#pragma GCC target’ pragma to set more than one function to be compiled with specific target options. See Function Specific Option Pragmas, for details about the ‘#pragma GCC target’ pragma.
> For instance on a 386, you could compile one function with target("sse4.1,arch=core2") and another with target("sse4a,arch=amdfam10"). This is equivalent to compiling the first function with -msse4.1 and -march=core2 options, and the second function with -msse4a and -march=amdfam10 options. It is up to the user to make sure that a function is only invoked on a machine that supports the particular ISA it is compiled for (for example by using cpuid on 386 to determine what feature bits and architecture family are used).
>
>           int core2_func (void) __attribute__ ((__target__ ("arch=core2")));
>           int sse3_func (void) __attribute__ ((__target__ ("sse3")));
> You can either use multiple strings to specify multiple options, or separate the options with a comma (‘,’).
>
> The target attribute is presently implemented for i386/x86_64, PowerPC, and Nios II targets only. The options supported are specific to each target.
>
> On the 386, the following options are allowed:
> ...
>  ‘sse4.2’
> ‘no-sse4.2’"
>
> Wouldn't that suit your purposes?
> Probably you can even keep your function inline with that approach.
Very nice. Thank you. I will test it.


-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ


* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-19 11:50                         ` Ananyev, Konstantin
  2014-11-19 11:59                           ` Yerden Zhumabekov
@ 2014-11-19 15:05                           ` Neil Horman
  2014-11-19 16:51                             ` Ananyev, Konstantin
  1 sibling, 1 reply; 98+ messages in thread
From: Neil Horman @ 2014-11-19 15:05 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

On Wed, Nov 19, 2014 at 11:50:40AM +0000, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> > Sent: Wednesday, November 19, 2014 11:38 AM
> > To: Neil Horman
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
> > 
> > On Wed, Nov 19, 2014 at 06:34:08AM -0500, Neil Horman wrote:
> > > On Wed, Nov 19, 2014 at 10:16:14AM +0000, Bruce Richardson wrote:
> > > > On Tue, Nov 18, 2014 at 04:36:24PM -0500, Neil Horman wrote:
> > > > > On Tue, Nov 18, 2014 at 05:52:27PM +0000, Bruce Richardson wrote:
> > > > > > On Tue, Nov 18, 2014 at 12:46:19PM -0500, Neil Horman wrote:
> > > > > > > On Tue, Nov 18, 2014 at 11:13:17PM +0600, Yerden Zhumabekov wrote:
> > > > > > > >
> > > > > > > > 18.11.2014 22:00, Neil Horman пишет:
> > > > > > > > > On Tue, Nov 18, 2014 at 09:06:35PM +0600, Yerden Zhumabekov wrote:
> > > > > > > > >> 18.11.2014 20:41, Neil Horman пишет:
> > > > > > > > >>> On Tue, Nov 18, 2014 at 08:03:40PM +0600, Yerden Zhumabekov wrote:
> > > > > > > > >>>>  /**
> > > > > > > > >>>>   * Use single crc32 instruction to perform a hash on a 4 byte value.
> > > > > > > > >>>> + * Fall back to software crc32 implementation in case SSE4.2 is
> > > > > > > > >>>> + * not supported
> > > > > > > > >>>>   *
> > > > > > > > >>>>   * @param data
> > > > > > > > >>>>   *   Data to perform hash on.
> > > > > > > > >>>> @@ -376,11 +413,18 @@ crc32c_2words(uint64_t data, uint32_t init_val)
> > > > > > > > >>>>  static inline uint32_t
> > > > > > > > >>>>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
> > > > > > > > >>>>  {
> > > > > > > > >>>> -	return _mm_crc32_u32(init_val, data);
> > > > > > > > >>>> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> > > > > > > > >>>> +	if (likely(crc32_alg == CRC32_SSE42))
> > > > > > > > >>>> +		return _mm_crc32_u32(init_val, data);
> > > > > > > > >>>> +#endif
> > > > > > > > >>> you don't really need these ifdefs here anymore given that you have a
> > > > > > > > >>> constructor to do the algorithm selection.  In fact you need to remove them, in
> > > > > > > > >>> the event you build on a system that doesn't support SSE42, but run on a system
> > > > > > > > >>> that does.
> > > > > > > > >> Originally, I thought so as well. I wrote the code without these ifdefs,
> > > > > > > > >> but it didn't compile on my machine which doesn't support SSE4.2. Error
> > > > > > > > >> was triggered by nmmintrin.h which has a check for respective GCC
> > > > > > > > >> extension. So I think these ifdefs are indeed required.
> > > > > > > > >>
> > > > > > > > > You need to edit the makefile so that the compiler gets passed the option
> > > > > > > > > -msse42.  That way it will know to emit sse42 instructions. It will also allow
> > > > > > > > > you to remove the ifdef from the include file
> > > > > > > >
> > > > > > > > In this case, I guess there are two options:
> > > > > > > > 1) modify all makefiles which use librte_hash
> > > > > > > > 2) move all function bodies from rte_hash_crc.h to separate module,
> > > > > > > > leaving prototype definitions there only.
> > > > > > > >
> > > > > > > > Everybody's up for the second option? :)
> > > > > > > >
> > > > > > > Crud, you're right, I didn't think about the header inclusion issue.  Is it
> > > > > > > worth adding the jump to enable the dynamic hash selection?
> > > > > > > Neil
> > > > > >
> > > > > > Maybe for cases where SSE4.2 is not currently available, i.e. for generic builds.
> > > > > > For builds where we have hardware support confirmed at compile time, just use
> > > > > > the function from the header file.
> > > > > > Does that make sense?
> > > > > >
> > > > > I'm not certain of that, as I don't think anything can be 'confirmed' at compile
> > > > > time.  I.e. just because you have sse42 at compile time doesn't guarantee you
> > > > > have it at run time with a DSO.  If you have these as macros, you need to enable
> > > > > sse42 wherever you include the file so that the intrinsic works properly.
> > > >
> > > > Well, if you compile with sse42 at compile time, the compiler is free to insert
> > > > sse4 instructions at any place it feels like, irrespective of whether or not you
> > > > use SSE4 intrinsics, so I would never expect such a DSO to work on a system
> > > > without SSE42 support.
> > > >
> > > > >
> > > > > an alternate option would be to not use the intrinsic, and craft some explicit
> > > > > __asm__ statement that executes the right sse42 instructions.  That way the asm
> > > > > is directly emitted, without requiring the -msse42 flag at all, and it will just
> > > > > work in all the files that call it.
> > > > >
> > > >
> > > > I really don't like that approach. I think using intrinsics is much more
> > > > maintainable.
> > > >
> > > I grant you that using an intrinsic is easier to read, but if the code doesn't
> > > compile when using the intrinsic unless you have sse42 turned on, I'm not sure
> > > what choice we have.  and inline asm isn't that hard to maintain.  We're talking
> > > about three lines of code:
> > > asm(
> > >  "mov    %[1],%eax
> > >  mov    %[2],%edx
> > >  crc32l %edx,%eax":
> > >  [edx] "r" (crc) /*output*/
> > >  :
> > >  [1] "r" (crc), /* input */
> > >  [2] "r" (val)
> > >  :
> > >  [eax] "r" /* clobber */
> > > )
> > >
> > > I don't have the syntax quite right, but it's pretty easy to read the intent.
> > > It's not like we don't have precedent for this; the atomic interface and several
> > > PMDs do this frequently.
> > >
> > > Neil
> > 
> > Fair point. If everyone else is happy enough with it, I'm ok too.
> 
> As I remember, with gcc and icc it is possible to specify that you'd like to compile a particular function
> for a different target.
> From https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html:
> "target
> The target attribute is used to specify that a function is to be compiled with different target options than specified on the command line. This can be used for instance to have functions compiled with a different ISA (instruction set architecture) than the default. You can also use the ‘#pragma GCC target’ pragma to set more than one function to be compiled with specific target options. See Function Specific Option Pragmas, for details about the ‘#pragma GCC target’ pragma.
> For instance on a 386, you could compile one function with target("sse4.1,arch=core2") and another with target("sse4a,arch=amdfam10"). This is equivalent to compiling the first function with -msse4.1 and -march=core2 options, and the second function with -msse4a and -march=amdfam10 options. It is up to the user to make sure that a function is only invoked on a machine that supports the particular ISA it is compiled for (for example by using cpuid on 386 to determine what feature bits and architecture family are used).
> 
>           int core2_func (void) __attribute__ ((__target__ ("arch=core2")));
>           int sse3_func (void) __attribute__ ((__target__ ("sse3")));
> You can either use multiple strings to specify multiple options, or separate the options with a comma (‘,’).
> 
> The target attribute is presently implemented for i386/x86_64, PowerPC, and Nios II targets only. The options supported are specific to each target.
> 
> On the 386, the following options are allowed:
> ...
>  ‘sse4.2’
> ‘no-sse4.2’"
> 
> Wouldn't that suit your purposes?
> Probably you can even keep your function inline with that approach.
> 
That would definitely work, and be a great solution in this case.  However, it's
limited to only the most recent version of gcc.  If that's an acceptable
constraint on the DPDK, then it's ok, but distributions are only starting to
include that version now.  Not sure of the icc status of that attribute.

Neil

> Konstantin
> 
> 
> > 
> > /Bruce


* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-19 11:35                     ` Yerden Zhumabekov
@ 2014-11-19 15:07                       ` Neil Horman
  2014-11-20  3:04                         ` Yerden Zhumabekov
  0 siblings, 1 reply; 98+ messages in thread
From: Neil Horman @ 2014-11-19 15:07 UTC (permalink / raw)
  To: Yerden Zhumabekov; +Cc: dev

On Wed, Nov 19, 2014 at 05:35:51PM +0600, Yerden Zhumabekov wrote:
> 
> 19.11.2014 16:16, Bruce Richardson wrote:
> > On Tue, Nov 18, 2014 at 04:36:24PM -0500, Neil Horman wrote:
> >> an alternate option would be to not use the intrinsic, and craft some explicit
> >> __asm__ statement that executes the right sse42 instructions.  That way the asm
> >> is directly emitted, without requiring the -msse42 flag at all, and it will just
> >> work in all the files that call it.
> >>
> > I really don't like that approach. I think using intrinsics is much more 
> > maintainable.
> >
> 
> static inline uint32_t
> crc32_sse42_u32(uint32_t data, uint32_t init_val)
> {
> /*  __asm__ volatile(
>             "crc32l %[data], %[init_val];"
>             : [init_val] "+r" (init_val)
>             : [data] "rm" (data));
>     return init_val;*/
> 
> But wait, will __builtin_ia32_crc32si and __builtin_ia32_crc32di
> functions do the trick? ICC has them?
If builtins work on both icc and gcc, yes, that would be a solution as it
creates non sse instructions when the target cpu doesn't support it.

> What about prototyping functions and extracting their bodies to separate
> module? Does it break anything?
> 
That would be a variant on the asm inline idea, but yes, I think that would work
too
Neil


> -- 
> Sincerely,
> 
> Yerden Zhumabekov
> State Technical Service
> Astana, KZ
> 
> 


* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-19 15:05                           ` Neil Horman
@ 2014-11-19 16:51                             ` Ananyev, Konstantin
  0 siblings, 0 replies; 98+ messages in thread
From: Ananyev, Konstantin @ 2014-11-19 16:51 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev



> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Wednesday, November 19, 2014 3:06 PM
> To: Ananyev, Konstantin
> Cc: Richardson, Bruce; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
> 
> On Wed, Nov 19, 2014 at 11:50:40AM +0000, Ananyev, Konstantin wrote:
> >
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> > > Sent: Wednesday, November 19, 2014 11:38 AM
> > > To: Neil Horman
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
> > >
> > > On Wed, Nov 19, 2014 at 06:34:08AM -0500, Neil Horman wrote:
> > > > On Wed, Nov 19, 2014 at 10:16:14AM +0000, Bruce Richardson wrote:
> > > > > On Tue, Nov 18, 2014 at 04:36:24PM -0500, Neil Horman wrote:
> > > > > > On Tue, Nov 18, 2014 at 05:52:27PM +0000, Bruce Richardson wrote:
> > > > > > > On Tue, Nov 18, 2014 at 12:46:19PM -0500, Neil Horman wrote:
> > > > > > > > On Tue, Nov 18, 2014 at 11:13:17PM +0600, Yerden Zhumabekov wrote:
> > > > > > > > >
> > > > > > > > > 18.11.2014 22:00, Neil Horman пишет:
> > > > > > > > > > On Tue, Nov 18, 2014 at 09:06:35PM +0600, Yerden Zhumabekov wrote:
> > > > > > > > > >> 18.11.2014 20:41, Neil Horman пишет:
> > > > > > > > > >>> On Tue, Nov 18, 2014 at 08:03:40PM +0600, Yerden Zhumabekov wrote:
> > > > > > > > > >>>>  /**
> > > > > > > > > >>>>   * Use single crc32 instruction to perform a hash on a 4 byte value.
> > > > > > > > > >>>> + * Fall back to software crc32 implementation in case SSE4.2 is
> > > > > > > > > >>>> + * not supported
> > > > > > > > > >>>>   *
> > > > > > > > > >>>>   * @param data
> > > > > > > > > >>>>   *   Data to perform hash on.
> > > > > > > > > >>>> @@ -376,11 +413,18 @@ crc32c_2words(uint64_t data, uint32_t init_val)
> > > > > > > > > >>>>  static inline uint32_t
> > > > > > > > > >>>>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
> > > > > > > > > >>>>  {
> > > > > > > > > >>>> -	return _mm_crc32_u32(init_val, data);
> > > > > > > > > >>>> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> > > > > > > > > >>>> +	if (likely(crc32_alg == CRC32_SSE42))
> > > > > > > > > >>>> +		return _mm_crc32_u32(init_val, data);
> > > > > > > > > >>>> +#endif
> > > > > > > > > >>> you don't really need these ifdefs here anymore given that you have a
> > > > > > > > > >>> constructor to do the algorithm selection.  In fact you need to remove them, in
> > > > > > > > > >>> the event you build on a system that doesn't support SSE42, but run on a system
> > > > > > > > > >>> that does.
> > > > > > > > > >> Originally, I thought so as well. I wrote the code without these ifdefs,
> > > > > > > > > >> but it didn't compile on my machine which doesn't support SSE4.2. Error
> > > > > > > > > >> was triggered by nmmintrin.h which has a check for respective GCC
> > > > > > > > > >> extension. So I think these ifdefs are indeed required.
> > > > > > > > > >>
> > > > > > > > > > You need to edit the makefile so that the compiler gets passed the option
> > > > > > > > > > -msse42.  That way it will know to emit sse42 instructions. It will also allow
> > > > > > > > > > you to remove the ifdef from the include file
> > > > > > > > >
> > > > > > > > > In this case, I guess there are two options:
> > > > > > > > > 1) modify all makefiles which use librte_hash
> > > > > > > > > 2) move all function bodies from rte_hash_crc.h to separate module,
> > > > > > > > > leaving prototype definitions there only.
> > > > > > > > >
> > > > > > > > > Everybody's up for the second option? :)
> > > > > > > > >
> > > > > > > > Crud, you're right, I didn't think about the header inclusion issue.  Is it
> > > > > > > > worth adding the jump to enable the dynamic hash selection?
> > > > > > > > Neil
> > > > > > >
> > > > > > > Maybe for cases where SSE4.2 is not currently available, i.e. for generic builds.
> > > > > > > For builds where we have hardware support confirmed at compile time, just use
> > > > > > > the function from the header file.
> > > > > > > Does that make sense?
> > > > > > >
> > > > > > I'm not certain of that, as I don't think anything can be 'confirmed' at compile
> > > > > > time.  I.e. just because you have sse42 at compile time doesn't guarantee you
> > > > > > have it at run time with a DSO.  If you have these as macros, you need to enable
> > > > > > sse42 wherever you include the file so that the intrinsic works properly.
> > > > >
> > > > > Well, if you compile with sse42 at compile time, the compiler is free to insert
> > > > > sse4 instructions at any place it feels like, irrespective of whether or not you
> > > > > use SSE4 intrinsics, so I would never expect such a DSO to work on a system
> > > > > without SSE42 support.
> > > > >
> > > > > >
> > > > > > an alternate option would be to not use the intrinsic, and craft some explicit
> > > > > > __asm__ statement that executes the right sse42 instructions.  That way the asm
> > > > > > is directly emitted, without requiring the -msse42 flag at all, and it will just
> > > > > > work in all the files that call it.
> > > > > >
> > > > >
> > > > > I really don't like that approach. I think using intrinsics is much more
> > > > > maintainable.
> > > > >
> > > > I grant you that using an intrinsic is easier to read, but if the code doesn't
> > > > compile when using the intrinsic unless you have sse42 turned on, I'm not sure
> > > > what choice we have.  and inline asm isn't that hard to maintain.  We're talking
> > > > about three lines of code:
> > > > asm(
> > > >  "mov    %[1],%eax
> > > >  mov    %[2],%edx
> > > >  crc32l %edx,%eax":
> > > >  [edx] "r" (crc) /*output*/
> > > >  :
> > > >  [1] "r" (crc), /* input */
> > > >  [2] "r" (val)
> > > >  :
> > > >  [eax] "r" /* clobber */
> > > > )
> > > >
> > > > I don't have the syntax quite right, but it's pretty easy to read the intent.
> > > > It's not like we don't have precedent for this; the atomic interface and several
> > > > PMDs do this frequently.
> > > >
> > > > Neil
> > >
> > > Fair point. If everyone else is happy enough with it, I'm ok too.
> >
> > As I remember, with gcc and icc it is possible to specify that you'd like to compile a particular function
> > for a different target.
> > From https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html:
> > "target
> > > The target attribute is used to specify that a function is to be compiled with different target options than specified on the command line. This can be used for instance to have functions compiled with a different ISA (instruction set architecture) than the default. You can also use the ‘#pragma GCC target’ pragma to set more than one function to be compiled with specific target options. See Function Specific Option Pragmas, for details about the ‘#pragma GCC target’ pragma.
> > > For instance on a 386, you could compile one function with target("sse4.1,arch=core2") and another with target("sse4a,arch=amdfam10"). This is equivalent to compiling the first function with -msse4.1 and -march=core2 options, and the second function with -msse4a and -march=amdfam10 options. It is up to the user to make sure that a function is only invoked on a machine that supports the particular ISA it is compiled for (for example by using cpuid on 386 to determine what feature bits and architecture family are used).
> >
> >           int core2_func (void) __attribute__ ((__target__ ("arch=core2")));
> >           int sse3_func (void) __attribute__ ((__target__ ("sse3")));
> > You can either use multiple strings to specify multiple options, or separate the options with a comma (‘,’).
> >
> > > The target attribute is presently implemented for i386/x86_64, PowerPC, and Nios II targets only. The options supported are specific to each target.
> >
> > On the 386, the following options are allowed:
> > ...
> >  ‘sse4.2’
> > ‘no-sse4.2’"
> >
> > Wouldn't that suit your purposes?
> > Probably you can even keep your function inline with that approach.
> >
> That would definitely work, and be a great solution in this case.  However, it's
> limited to only the most recent version of gcc.  If that's an acceptable
> constraint on the DPDK, then it's ok, but distributions are only starting to
> include that version now.  Not sure of the icc status of that attribute.

Yes, as far as I can see that feature was introduced in gcc 4.4, and we have to support versions back to gcc 4.3...
Though I suppose for gcc 4.3 we can just always switch to the scalar version, can't we?
Something like this:

$ cat ./tatrg1.c
#include <stdint.h>

uint64_t
ffx1_gen(uint64_t x)
{
        /* should contain scalar CRC implementation. */
        return (x * x);
}

#pragma GCC target ("sse4.2")

#if defined __SSE4_2__

#include <smmintrin.h>

uint64_t
ffx1_sse42(uint64_t x)
{
        return _mm_crc32_u64(x, x);
}

#else

uint64_t
ffx1_sse42(uint64_t x)
{
        /* should contain scalar CRC implementation. */
        return (x * x);
}

#endif
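
The same fallback idea can also be sketched with an explicit compiler-version gate and a cpuid check. This is only an illustration under stated assumptions: the macro CRC_HAVE_TARGET_ATTR and the crc32c_* and cpu_has_sse42 names are hypothetical, the gate simply encodes the "gcc 4.4 or later" observation above (so compilers that report an older GNUC version always take the scalar path), and the builtin __builtin_ia32_crc32si is used instead of the header intrinsic so that nmmintrin.h does not have to be included.

```c
#include <stdint.h>

/* Hypothetical gate: use the target attribute only where it is expected
 * to exist (x86, GCC >= 4.4); otherwise always use the scalar version. */
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__) && \
	(__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4))
#define CRC_HAVE_TARGET_ATTR 1
#include <cpuid.h>
#else
#define CRC_HAVE_TARGET_ATTR 0
#endif

/* Scalar CRC32C step on a 32-bit word (reflected polynomial 0x82F63B78). */
static uint32_t
crc32c_scalar_u32(uint32_t init_val, uint32_t data)
{
	uint32_t crc = init_val ^ data;
	int i;

	for (i = 0; i < 32; i++)
		crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	return crc;
}

#if CRC_HAVE_TARGET_ATTR
/* Compiled for SSE4.2 regardless of the command-line options. */
static __attribute__((target("sse4.2"))) uint32_t
crc32c_sse42_u32(uint32_t init_val, uint32_t data)
{
	return __builtin_ia32_crc32si(init_val, data);
}

/* CPUID leaf 1: ECX bit 20 reports SSE4.2. */
static int
cpu_has_sse42(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return 0;
	return (int)((ecx >> 20) & 1u);
}
#endif

static uint32_t
crc32c_any_u32(uint32_t init_val, uint32_t data)
{
#if CRC_HAVE_TARGET_ATTR
	if (cpu_has_sse42())
		return crc32c_sse42_u32(init_val, data);
#endif
	return crc32c_scalar_u32(init_val, data);
}
```

In real code the cpuid result would be cached once rather than queried on every call.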

Konstantin

> 
> Neil
> 
> > Konstantin
> >
> >
> > >
> > > /Bruce


* Re: [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation
  2014-11-19 15:07                       ` Neil Horman
@ 2014-11-20  3:04                         ` Yerden Zhumabekov
  0 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-20  3:04 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev


19.11.2014 21:07, Neil Horman wrote:
> On Wed, Nov 19, 2014 at 05:35:51PM +0600, Yerden Zhumabekov wrote:
>> static inline uint32_t
>> crc32_sse42_u32(uint32_t data, uint32_t init_val)
>> {
>> /*  __asm__ volatile(
>>             "crc32l %[data], %[init_val];"
>>             : [init_val] "+r" (init_val)
>>             : [data] "rm" (data));
>>     return init_val;*/
>>
>> But wait, will __builtin_ia32_crc32si and __builtin_ia32_crc32di
>> functions do the trick? ICC has them?
> If builtins work on both icc and gcc, yes, that would be a solution as it
> creates non sse instructions when the target cpu doesn't support it.

Can anyone acknowledge?

>
>> What about prototyping functions and extracting their bodies to separate
>> module? Does it break anything?
>>
> That would be a variant on the asm inline idea, but yes, I think that would work
> too

No luck. Performance degrades by up to 30-50 percent when the functions are
extracted to a separate module.

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ


* [dpdk-dev] [PATCH v5 0/7] rte_hash_crc reworked to be platform-independent
  2014-09-03  6:05 [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call Yerden Zhumabekov
                   ` (8 preceding siblings ...)
  2014-11-18 14:03 ` [dpdk-dev] [PATCH v4 0/5] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
@ 2014-11-20  5:15 ` Yerden Zhumabekov
  2014-11-20  5:16   ` [dpdk-dev] [PATCH v5 1/7] hash: add software CRC32 implementation Yerden Zhumabekov
                     ` (7 more replies)
  2015-01-29  8:48 ` [dpdk-dev] [PATCH v6 " Yerden Zhumabekov
  10 siblings, 8 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-20  5:15 UTC (permalink / raw)
  To: dev

These patches add a fallback mechanism so that the CRC32 hash can be calculated regardless of hardware support from the CPU (i.e. SSE4.2 intrinsics).
Performance is also improved by processing data in 8-byte slices.

The patches were tested on machines both with and without SSE4.2 support.

The software implementation seems to be about 4-5 times slower than the SSE4.2-enabled one. Both return identical results.

Summary of changes:
* added a software CRC32 implementation, used as a fallback when SSE4.2 is not available or is intentionally disabled.
* added the rte_hash_crc_set_alg() function to control the use of SSE4.2.
* added the rte_hash_crc_8byte() function to calculate CRC32 on an 8-byte operand.
* reworked the rte_hash_crc() function to use both the 4-byte and 8-byte CRC32 calculation functions.
* removed compile-time checks from test_hash_perf and test_hash.
* the default algorithm implementation is now set by a constructor at application startup.
* SSE4.2 intrinsics are implemented through inline assembly code.
* added a run-time check for 64-bit support.
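
The fallback and constructor items above can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the code from the patches: enum crc_alg, crc_set_alg(), crc_init_alg() and hash_crc_4byte() are hypothetical names, and to stay portable the "fast" path is a table-driven stand-in rather than the SSE4.2 implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical algorithm identifiers, not the values used by the patches. */
enum crc_alg { CRC_ALG_SW, CRC_ALG_FAST };

typedef uint32_t (*crc32c_u32_fn)(uint32_t data, uint32_t init_val);

/* Fallback path: bitwise CRC32C update over one 32-bit word. */
static uint32_t
crc32c_sw_u32(uint32_t data, uint32_t init_val)
{
	uint32_t crc = init_val ^ data;
	int i;

	for (i = 0; i < 32; i++)
		crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	return crc;
}

/* Stand-in for the accelerated path: one table lookup per byte.
 * A real build would call the SSE4.2 implementation here instead. */
static uint32_t byte_table[256];

static uint32_t
crc32c_fast_u32(uint32_t data, uint32_t init_val)
{
	uint32_t crc = init_val ^ data;
	int i;

	for (i = 0; i < 4; i++)
		crc = (crc >> 8) ^ byte_table[crc & 0xFF];
	return crc;
}

/* All callers go through this pointer; it defaults to the fallback. */
static crc32c_u32_fn crc32c_u32_handler = crc32c_sw_u32;

/* Select the implementation at run time (fills the table when needed). */
static void
crc_set_alg(enum crc_alg alg)
{
	uint32_t i, crc;
	int j;

	if (alg != CRC_ALG_FAST) {
		crc32c_u32_handler = crc32c_sw_u32;
		return;
	}
	for (i = 0; i < 256; i++) {
		crc = i;
		for (j = 0; j < 8; j++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
		byte_table[i] = crc;
	}
	crc32c_u32_handler = crc32c_fast_u32;
}

#if defined(__GNUC__)
/* Runs before main() and picks the default implementation. */
static void __attribute__((constructor))
crc_init_alg(void)
{
	crc_set_alg(CRC_ALG_FAST);
}
#endif

static inline uint32_t
hash_crc_4byte(uint32_t data, uint32_t init_val)
{
	return crc32c_u32_handler(data, init_val);
}

/* Switching algorithms must not change the result. */
static void
crc_dispatch_selfcheck(void)
{
	uint32_t sw, fast;

	crc_set_alg(CRC_ALG_SW);
	sw = hash_crc_4byte(0xdeadbeefu, 0xffffffffu);
	crc_set_alg(CRC_ALG_FAST);
	fast = hash_crc_4byte(0xdeadbeefu, 0xffffffffu);
	assert(sw == fast);
}
```

An application could call crc_set_alg(CRC_ALG_SW) to force the fallback, for example when comparing the two paths. The indirection through a function pointer is one possible design; it trades a little call overhead for the ability to switch at run time.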

Yerden Zhumabekov (7):
  hash: add software CRC32 implementation
  hash: add assembly implementation of CRC32 intrinsics
  hash: replace built-in functions implementing SSE4.2
  hash: add rte_hash_crc_8byte function
  hash: add fallback to software CRC32 implementation
  hash: rte_hash_crc() slices data into 8-byte pieces
  test: remove redundant compile checks

 app/test/test_hash.c           |    7 -
 app/test/test_hash_perf.c      |   11 -
 lib/librte_hash/rte_hash_crc.h |  459 +++++++++++++++++++++++++++++++++++++++-
 3 files changed, 448 insertions(+), 29 deletions(-)

-- 
1.7.9.5


* [dpdk-dev] [PATCH v5 1/7] hash: add software CRC32 implementation
  2014-11-20  5:15 ` [dpdk-dev] [PATCH v5 0/7] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
@ 2014-11-20  5:16   ` Yerden Zhumabekov
  2014-11-20  5:16   ` [dpdk-dev] [PATCH v5 2/7] hash: add assembly implementation of CRC32 intrinsics Yerden Zhumabekov
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-20  5:16 UTC (permalink / raw)
  To: dev

Add lookup tables for the CRC32C algorithm, along with the crc32c_1word()
and crc32c_2words() functions, which return the hash of a 32-bit and a
64-bit operand respectively.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |  316 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 316 insertions(+)
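
The lookup tables described in the commit message above do not have to be read as opaque constants. The sketch below regenerates tables of the same shape from the reflected CRC32C polynomial and applies them to one 32-bit word with the same table indexing as crc32c_1word(). The names tables, crc32c_init_tables() and crc32c_word() are hypothetical; the expected values in the self-check are the second entries of the first four tables in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define CRC32C_POLY 0x82F63B78u /* reflected Castagnoli polynomial */

static uint32_t tables[8][256];

/* Fill the eight 256-entry tables used for slicing-by-8.
 * tables[k][b] is the CRC state after byte b followed by k zero bytes. */
static void
crc32c_init_tables(void)
{
	uint32_t i, j, crc;

	for (i = 0; i < 256; i++) {
		crc = i;
		for (j = 0; j < 8; j++)
			crc = (crc >> 1) ^ (CRC32C_POLY & (0u - (crc & 1u)));
		tables[0][i] = crc;
	}
	for (i = 0; i < 256; i++) {
		for (j = 1; j < 8; j++) {
			crc = tables[j - 1][i];
			tables[j][i] = (crc >> 8) ^ tables[0][crc & 0xFF];
		}
	}
}

/* Update the CRC with one 32-bit word: four independent lookups, one per
 * byte, XORed together (the lowest byte uses the highest-numbered table). */
static uint32_t
crc32c_word(uint32_t data, uint32_t init_val)
{
	uint32_t crc = init_val ^ data;

	return tables[3][crc & 0xFF] ^
	       tables[2][(crc >> 8) & 0xFF] ^
	       tables[1][(crc >> 16) & 0xFF] ^
	       tables[0][crc >> 24];
}

/* Compare generated entries with constants that appear in the patch. */
static void
crc32c_tables_selfcheck(void)
{
	crc32c_init_tables();
	assert(tables[0][1] == 0xF26B8303u);
	assert(tables[1][1] == 0x13A29877u);
	assert(tables[2][1] == 0xA541927Eu);
	assert(tables[3][1] == 0xDD45AAB8u);
	assert(crc32c_word(1, 0) == 0xDD45AAB8u);
}
```

The patch ships the tables as static data instead, which avoids any start-up cost; generating them is shown here only to make their structure visible.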

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index b48b0db..4d7532a 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -47,6 +47,322 @@ extern "C" {
 #include <stdint.h>
 #include <nmmintrin.h>
 
+/* Lookup tables for software implementation of CRC32C */
+static uint32_t crc32c_tables[8][256] = {{
+ 0x00000000, 0xF26B8303, 0xE13B70F7, 0x1350F3F4, 0xC79A971F, 0x35F1141C, 0x26A1E7E8, 0xD4CA64EB,
+ 0x8AD958CF, 0x78B2DBCC, 0x6BE22838, 0x9989AB3B, 0x4D43CFD0, 0xBF284CD3, 0xAC78BF27, 0x5E133C24,
+ 0x105EC76F, 0xE235446C, 0xF165B798, 0x030E349B, 0xD7C45070, 0x25AFD373, 0x36FF2087, 0xC494A384,
+ 0x9A879FA0, 0x68EC1CA3, 0x7BBCEF57, 0x89D76C54, 0x5D1D08BF, 0xAF768BBC, 0xBC267848, 0x4E4DFB4B,
+ 0x20BD8EDE, 0xD2D60DDD, 0xC186FE29, 0x33ED7D2A, 0xE72719C1, 0x154C9AC2, 0x061C6936, 0xF477EA35,
+ 0xAA64D611, 0x580F5512, 0x4B5FA6E6, 0xB93425E5, 0x6DFE410E, 0x9F95C20D, 0x8CC531F9, 0x7EAEB2FA,
+ 0x30E349B1, 0xC288CAB2, 0xD1D83946, 0x23B3BA45, 0xF779DEAE, 0x05125DAD, 0x1642AE59, 0xE4292D5A,
+ 0xBA3A117E, 0x4851927D, 0x5B016189, 0xA96AE28A, 0x7DA08661, 0x8FCB0562, 0x9C9BF696, 0x6EF07595,
+ 0x417B1DBC, 0xB3109EBF, 0xA0406D4B, 0x522BEE48, 0x86E18AA3, 0x748A09A0, 0x67DAFA54, 0x95B17957,
+ 0xCBA24573, 0x39C9C670, 0x2A993584, 0xD8F2B687, 0x0C38D26C, 0xFE53516F, 0xED03A29B, 0x1F682198,
+ 0x5125DAD3, 0xA34E59D0, 0xB01EAA24, 0x42752927, 0x96BF4DCC, 0x64D4CECF, 0x77843D3B, 0x85EFBE38,
+ 0xDBFC821C, 0x2997011F, 0x3AC7F2EB, 0xC8AC71E8, 0x1C661503, 0xEE0D9600, 0xFD5D65F4, 0x0F36E6F7,
+ 0x61C69362, 0x93AD1061, 0x80FDE395, 0x72966096, 0xA65C047D, 0x5437877E, 0x4767748A, 0xB50CF789,
+ 0xEB1FCBAD, 0x197448AE, 0x0A24BB5A, 0xF84F3859, 0x2C855CB2, 0xDEEEDFB1, 0xCDBE2C45, 0x3FD5AF46,
+ 0x7198540D, 0x83F3D70E, 0x90A324FA, 0x62C8A7F9, 0xB602C312, 0x44694011, 0x5739B3E5, 0xA55230E6,
+ 0xFB410CC2, 0x092A8FC1, 0x1A7A7C35, 0xE811FF36, 0x3CDB9BDD, 0xCEB018DE, 0xDDE0EB2A, 0x2F8B6829,
+ 0x82F63B78, 0x709DB87B, 0x63CD4B8F, 0x91A6C88C, 0x456CAC67, 0xB7072F64, 0xA457DC90, 0x563C5F93,
+ 0x082F63B7, 0xFA44E0B4, 0xE9141340, 0x1B7F9043, 0xCFB5F4A8, 0x3DDE77AB, 0x2E8E845F, 0xDCE5075C,
+ 0x92A8FC17, 0x60C37F14, 0x73938CE0, 0x81F80FE3, 0x55326B08, 0xA759E80B, 0xB4091BFF, 0x466298FC,
+ 0x1871A4D8, 0xEA1A27DB, 0xF94AD42F, 0x0B21572C, 0xDFEB33C7, 0x2D80B0C4, 0x3ED04330, 0xCCBBC033,
+ 0xA24BB5A6, 0x502036A5, 0x4370C551, 0xB11B4652, 0x65D122B9, 0x97BAA1BA, 0x84EA524E, 0x7681D14D,
+ 0x2892ED69, 0xDAF96E6A, 0xC9A99D9E, 0x3BC21E9D, 0xEF087A76, 0x1D63F975, 0x0E330A81, 0xFC588982,
+ 0xB21572C9, 0x407EF1CA, 0x532E023E, 0xA145813D, 0x758FE5D6, 0x87E466D5, 0x94B49521, 0x66DF1622,
+ 0x38CC2A06, 0xCAA7A905, 0xD9F75AF1, 0x2B9CD9F2, 0xFF56BD19, 0x0D3D3E1A, 0x1E6DCDEE, 0xEC064EED,
+ 0xC38D26C4, 0x31E6A5C7, 0x22B65633, 0xD0DDD530, 0x0417B1DB, 0xF67C32D8, 0xE52CC12C, 0x1747422F,
+ 0x49547E0B, 0xBB3FFD08, 0xA86F0EFC, 0x5A048DFF, 0x8ECEE914, 0x7CA56A17, 0x6FF599E3, 0x9D9E1AE0,
+ 0xD3D3E1AB, 0x21B862A8, 0x32E8915C, 0xC083125F, 0x144976B4, 0xE622F5B7, 0xF5720643, 0x07198540,
+ 0x590AB964, 0xAB613A67, 0xB831C993, 0x4A5A4A90, 0x9E902E7B, 0x6CFBAD78, 0x7FAB5E8C, 0x8DC0DD8F,
+ 0xE330A81A, 0x115B2B19, 0x020BD8ED, 0xF0605BEE, 0x24AA3F05, 0xD6C1BC06, 0xC5914FF2, 0x37FACCF1,
+ 0x69E9F0D5, 0x9B8273D6, 0x88D28022, 0x7AB90321, 0xAE7367CA, 0x5C18E4C9, 0x4F48173D, 0xBD23943E,
+ 0xF36E6F75, 0x0105EC76, 0x12551F82, 0xE03E9C81, 0x34F4F86A, 0xC69F7B69, 0xD5CF889D, 0x27A40B9E,
+ 0x79B737BA, 0x8BDCB4B9, 0x988C474D, 0x6AE7C44E, 0xBE2DA0A5, 0x4C4623A6, 0x5F16D052, 0xAD7D5351
+},
+{
+ 0x00000000, 0x13A29877, 0x274530EE, 0x34E7A899, 0x4E8A61DC, 0x5D28F9AB, 0x69CF5132, 0x7A6DC945,
+ 0x9D14C3B8, 0x8EB65BCF, 0xBA51F356, 0xA9F36B21, 0xD39EA264, 0xC03C3A13, 0xF4DB928A, 0xE7790AFD,
+ 0x3FC5F181, 0x2C6769F6, 0x1880C16F, 0x0B225918, 0x714F905D, 0x62ED082A, 0x560AA0B3, 0x45A838C4,
+ 0xA2D13239, 0xB173AA4E, 0x859402D7, 0x96369AA0, 0xEC5B53E5, 0xFFF9CB92, 0xCB1E630B, 0xD8BCFB7C,
+ 0x7F8BE302, 0x6C297B75, 0x58CED3EC, 0x4B6C4B9B, 0x310182DE, 0x22A31AA9, 0x1644B230, 0x05E62A47,
+ 0xE29F20BA, 0xF13DB8CD, 0xC5DA1054, 0xD6788823, 0xAC154166, 0xBFB7D911, 0x8B507188, 0x98F2E9FF,
+ 0x404E1283, 0x53EC8AF4, 0x670B226D, 0x74A9BA1A, 0x0EC4735F, 0x1D66EB28, 0x298143B1, 0x3A23DBC6,
+ 0xDD5AD13B, 0xCEF8494C, 0xFA1FE1D5, 0xE9BD79A2, 0x93D0B0E7, 0x80722890, 0xB4958009, 0xA737187E,
+ 0xFF17C604, 0xECB55E73, 0xD852F6EA, 0xCBF06E9D, 0xB19DA7D8, 0xA23F3FAF, 0x96D89736, 0x857A0F41,
+ 0x620305BC, 0x71A19DCB, 0x45463552, 0x56E4AD25, 0x2C896460, 0x3F2BFC17, 0x0BCC548E, 0x186ECCF9,
+ 0xC0D23785, 0xD370AFF2, 0xE797076B, 0xF4359F1C, 0x8E585659, 0x9DFACE2E, 0xA91D66B7, 0xBABFFEC0,
+ 0x5DC6F43D, 0x4E646C4A, 0x7A83C4D3, 0x69215CA4, 0x134C95E1, 0x00EE0D96, 0x3409A50F, 0x27AB3D78,
+ 0x809C2506, 0x933EBD71, 0xA7D915E8, 0xB47B8D9F, 0xCE1644DA, 0xDDB4DCAD, 0xE9537434, 0xFAF1EC43,
+ 0x1D88E6BE, 0x0E2A7EC9, 0x3ACDD650, 0x296F4E27, 0x53028762, 0x40A01F15, 0x7447B78C, 0x67E52FFB,
+ 0xBF59D487, 0xACFB4CF0, 0x981CE469, 0x8BBE7C1E, 0xF1D3B55B, 0xE2712D2C, 0xD69685B5, 0xC5341DC2,
+ 0x224D173F, 0x31EF8F48, 0x050827D1, 0x16AABFA6, 0x6CC776E3, 0x7F65EE94, 0x4B82460D, 0x5820DE7A,
+ 0xFBC3FAF9, 0xE861628E, 0xDC86CA17, 0xCF245260, 0xB5499B25, 0xA6EB0352, 0x920CABCB, 0x81AE33BC,
+ 0x66D73941, 0x7575A136, 0x419209AF, 0x523091D8, 0x285D589D, 0x3BFFC0EA, 0x0F186873, 0x1CBAF004,
+ 0xC4060B78, 0xD7A4930F, 0xE3433B96, 0xF0E1A3E1, 0x8A8C6AA4, 0x992EF2D3, 0xADC95A4A, 0xBE6BC23D,
+ 0x5912C8C0, 0x4AB050B7, 0x7E57F82E, 0x6DF56059, 0x1798A91C, 0x043A316B, 0x30DD99F2, 0x237F0185,
+ 0x844819FB, 0x97EA818C, 0xA30D2915, 0xB0AFB162, 0xCAC27827, 0xD960E050, 0xED8748C9, 0xFE25D0BE,
+ 0x195CDA43, 0x0AFE4234, 0x3E19EAAD, 0x2DBB72DA, 0x57D6BB9F, 0x447423E8, 0x70938B71, 0x63311306,
+ 0xBB8DE87A, 0xA82F700D, 0x9CC8D894, 0x8F6A40E3, 0xF50789A6, 0xE6A511D1, 0xD242B948, 0xC1E0213F,
+ 0x26992BC2, 0x353BB3B5, 0x01DC1B2C, 0x127E835B, 0x68134A1E, 0x7BB1D269, 0x4F567AF0, 0x5CF4E287,
+ 0x04D43CFD, 0x1776A48A, 0x23910C13, 0x30339464, 0x4A5E5D21, 0x59FCC556, 0x6D1B6DCF, 0x7EB9F5B8,
+ 0x99C0FF45, 0x8A626732, 0xBE85CFAB, 0xAD2757DC, 0xD74A9E99, 0xC4E806EE, 0xF00FAE77, 0xE3AD3600,
+ 0x3B11CD7C, 0x28B3550B, 0x1C54FD92, 0x0FF665E5, 0x759BACA0, 0x663934D7, 0x52DE9C4E, 0x417C0439,
+ 0xA6050EC4, 0xB5A796B3, 0x81403E2A, 0x92E2A65D, 0xE88F6F18, 0xFB2DF76F, 0xCFCA5FF6, 0xDC68C781,
+ 0x7B5FDFFF, 0x68FD4788, 0x5C1AEF11, 0x4FB87766, 0x35D5BE23, 0x26772654, 0x12908ECD, 0x013216BA,
+ 0xE64B1C47, 0xF5E98430, 0xC10E2CA9, 0xD2ACB4DE, 0xA8C17D9B, 0xBB63E5EC, 0x8F844D75, 0x9C26D502,
+ 0x449A2E7E, 0x5738B609, 0x63DF1E90, 0x707D86E7, 0x0A104FA2, 0x19B2D7D5, 0x2D557F4C, 0x3EF7E73B,
+ 0xD98EEDC6, 0xCA2C75B1, 0xFECBDD28, 0xED69455F, 0x97048C1A, 0x84A6146D, 0xB041BCF4, 0xA3E32483
+},
+{
+ 0x00000000, 0xA541927E, 0x4F6F520D, 0xEA2EC073, 0x9EDEA41A, 0x3B9F3664, 0xD1B1F617, 0x74F06469,
+ 0x38513EC5, 0x9D10ACBB, 0x773E6CC8, 0xD27FFEB6, 0xA68F9ADF, 0x03CE08A1, 0xE9E0C8D2, 0x4CA15AAC,
+ 0x70A27D8A, 0xD5E3EFF4, 0x3FCD2F87, 0x9A8CBDF9, 0xEE7CD990, 0x4B3D4BEE, 0xA1138B9D, 0x045219E3,
+ 0x48F3434F, 0xEDB2D131, 0x079C1142, 0xA2DD833C, 0xD62DE755, 0x736C752B, 0x9942B558, 0x3C032726,
+ 0xE144FB14, 0x4405696A, 0xAE2BA919, 0x0B6A3B67, 0x7F9A5F0E, 0xDADBCD70, 0x30F50D03, 0x95B49F7D,
+ 0xD915C5D1, 0x7C5457AF, 0x967A97DC, 0x333B05A2, 0x47CB61CB, 0xE28AF3B5, 0x08A433C6, 0xADE5A1B8,
+ 0x91E6869E, 0x34A714E0, 0xDE89D493, 0x7BC846ED, 0x0F382284, 0xAA79B0FA, 0x40577089, 0xE516E2F7,
+ 0xA9B7B85B, 0x0CF62A25, 0xE6D8EA56, 0x43997828, 0x37691C41, 0x92288E3F, 0x78064E4C, 0xDD47DC32,
+ 0xC76580D9, 0x622412A7, 0x880AD2D4, 0x2D4B40AA, 0x59BB24C3, 0xFCFAB6BD, 0x16D476CE, 0xB395E4B0,
+ 0xFF34BE1C, 0x5A752C62, 0xB05BEC11, 0x151A7E6F, 0x61EA1A06, 0xC4AB8878, 0x2E85480B, 0x8BC4DA75,
+ 0xB7C7FD53, 0x12866F2D, 0xF8A8AF5E, 0x5DE93D20, 0x29195949, 0x8C58CB37, 0x66760B44, 0xC337993A,
+ 0x8F96C396, 0x2AD751E8, 0xC0F9919B, 0x65B803E5, 0x1148678C, 0xB409F5F2, 0x5E273581, 0xFB66A7FF,
+ 0x26217BCD, 0x8360E9B3, 0x694E29C0, 0xCC0FBBBE, 0xB8FFDFD7, 0x1DBE4DA9, 0xF7908DDA, 0x52D11FA4,
+ 0x1E704508, 0xBB31D776, 0x511F1705, 0xF45E857B, 0x80AEE112, 0x25EF736C, 0xCFC1B31F, 0x6A802161,
+ 0x56830647, 0xF3C29439, 0x19EC544A, 0xBCADC634, 0xC85DA25D, 0x6D1C3023, 0x8732F050, 0x2273622E,
+ 0x6ED23882, 0xCB93AAFC, 0x21BD6A8F, 0x84FCF8F1, 0xF00C9C98, 0x554D0EE6, 0xBF63CE95, 0x1A225CEB,
+ 0x8B277743, 0x2E66E53D, 0xC448254E, 0x6109B730, 0x15F9D359, 0xB0B84127, 0x5A968154, 0xFFD7132A,
+ 0xB3764986, 0x1637DBF8, 0xFC191B8B, 0x595889F5, 0x2DA8ED9C, 0x88E97FE2, 0x62C7BF91, 0xC7862DEF,
+ 0xFB850AC9, 0x5EC498B7, 0xB4EA58C4, 0x11ABCABA, 0x655BAED3, 0xC01A3CAD, 0x2A34FCDE, 0x8F756EA0,
+ 0xC3D4340C, 0x6695A672, 0x8CBB6601, 0x29FAF47F, 0x5D0A9016, 0xF84B0268, 0x1265C21B, 0xB7245065,
+ 0x6A638C57, 0xCF221E29, 0x250CDE5A, 0x804D4C24, 0xF4BD284D, 0x51FCBA33, 0xBBD27A40, 0x1E93E83E,
+ 0x5232B292, 0xF77320EC, 0x1D5DE09F, 0xB81C72E1, 0xCCEC1688, 0x69AD84F6, 0x83834485, 0x26C2D6FB,
+ 0x1AC1F1DD, 0xBF8063A3, 0x55AEA3D0, 0xF0EF31AE, 0x841F55C7, 0x215EC7B9, 0xCB7007CA, 0x6E3195B4,
+ 0x2290CF18, 0x87D15D66, 0x6DFF9D15, 0xC8BE0F6B, 0xBC4E6B02, 0x190FF97C, 0xF321390F, 0x5660AB71,
+ 0x4C42F79A, 0xE90365E4, 0x032DA597, 0xA66C37E9, 0xD29C5380, 0x77DDC1FE, 0x9DF3018D, 0x38B293F3,
+ 0x7413C95F, 0xD1525B21, 0x3B7C9B52, 0x9E3D092C, 0xEACD6D45, 0x4F8CFF3B, 0xA5A23F48, 0x00E3AD36,
+ 0x3CE08A10, 0x99A1186E, 0x738FD81D, 0xD6CE4A63, 0xA23E2E0A, 0x077FBC74, 0xED517C07, 0x4810EE79,
+ 0x04B1B4D5, 0xA1F026AB, 0x4BDEE6D8, 0xEE9F74A6, 0x9A6F10CF, 0x3F2E82B1, 0xD50042C2, 0x7041D0BC,
+ 0xAD060C8E, 0x08479EF0, 0xE2695E83, 0x4728CCFD, 0x33D8A894, 0x96993AEA, 0x7CB7FA99, 0xD9F668E7,
+ 0x9557324B, 0x3016A035, 0xDA386046, 0x7F79F238, 0x0B899651, 0xAEC8042F, 0x44E6C45C, 0xE1A75622,
+ 0xDDA47104, 0x78E5E37A, 0x92CB2309, 0x378AB177, 0x437AD51E, 0xE63B4760, 0x0C158713, 0xA954156D,
+ 0xE5F54FC1, 0x40B4DDBF, 0xAA9A1DCC, 0x0FDB8FB2, 0x7B2BEBDB, 0xDE6A79A5, 0x3444B9D6, 0x91052BA8
+},
+{
+ 0x00000000, 0xDD45AAB8, 0xBF672381, 0x62228939, 0x7B2231F3, 0xA6679B4B, 0xC4451272, 0x1900B8CA,
+ 0xF64463E6, 0x2B01C95E, 0x49234067, 0x9466EADF, 0x8D665215, 0x5023F8AD, 0x32017194, 0xEF44DB2C,
+ 0xE964B13D, 0x34211B85, 0x560392BC, 0x8B463804, 0x924680CE, 0x4F032A76, 0x2D21A34F, 0xF06409F7,
+ 0x1F20D2DB, 0xC2657863, 0xA047F15A, 0x7D025BE2, 0x6402E328, 0xB9474990, 0xDB65C0A9, 0x06206A11,
+ 0xD725148B, 0x0A60BE33, 0x6842370A, 0xB5079DB2, 0xAC072578, 0x71428FC0, 0x136006F9, 0xCE25AC41,
+ 0x2161776D, 0xFC24DDD5, 0x9E0654EC, 0x4343FE54, 0x5A43469E, 0x8706EC26, 0xE524651F, 0x3861CFA7,
+ 0x3E41A5B6, 0xE3040F0E, 0x81268637, 0x5C632C8F, 0x45639445, 0x98263EFD, 0xFA04B7C4, 0x27411D7C,
+ 0xC805C650, 0x15406CE8, 0x7762E5D1, 0xAA274F69, 0xB327F7A3, 0x6E625D1B, 0x0C40D422, 0xD1057E9A,
+ 0xABA65FE7, 0x76E3F55F, 0x14C17C66, 0xC984D6DE, 0xD0846E14, 0x0DC1C4AC, 0x6FE34D95, 0xB2A6E72D,
+ 0x5DE23C01, 0x80A796B9, 0xE2851F80, 0x3FC0B538, 0x26C00DF2, 0xFB85A74A, 0x99A72E73, 0x44E284CB,
+ 0x42C2EEDA, 0x9F874462, 0xFDA5CD5B, 0x20E067E3, 0x39E0DF29, 0xE4A57591, 0x8687FCA8, 0x5BC25610,
+ 0xB4868D3C, 0x69C32784, 0x0BE1AEBD, 0xD6A40405, 0xCFA4BCCF, 0x12E11677, 0x70C39F4E, 0xAD8635F6,
+ 0x7C834B6C, 0xA1C6E1D4, 0xC3E468ED, 0x1EA1C255, 0x07A17A9F, 0xDAE4D027, 0xB8C6591E, 0x6583F3A6,
+ 0x8AC7288A, 0x57828232, 0x35A00B0B, 0xE8E5A1B3, 0xF1E51979, 0x2CA0B3C1, 0x4E823AF8, 0x93C79040,
+ 0x95E7FA51, 0x48A250E9, 0x2A80D9D0, 0xF7C57368, 0xEEC5CBA2, 0x3380611A, 0x51A2E823, 0x8CE7429B,
+ 0x63A399B7, 0xBEE6330F, 0xDCC4BA36, 0x0181108E, 0x1881A844, 0xC5C402FC, 0xA7E68BC5, 0x7AA3217D,
+ 0x52A0C93F, 0x8FE56387, 0xEDC7EABE, 0x30824006, 0x2982F8CC, 0xF4C75274, 0x96E5DB4D, 0x4BA071F5,
+ 0xA4E4AAD9, 0x79A10061, 0x1B838958, 0xC6C623E0, 0xDFC69B2A, 0x02833192, 0x60A1B8AB, 0xBDE41213,
+ 0xBBC47802, 0x6681D2BA, 0x04A35B83, 0xD9E6F13B, 0xC0E649F1, 0x1DA3E349, 0x7F816A70, 0xA2C4C0C8,
+ 0x4D801BE4, 0x90C5B15C, 0xF2E73865, 0x2FA292DD, 0x36A22A17, 0xEBE780AF, 0x89C50996, 0x5480A32E,
+ 0x8585DDB4, 0x58C0770C, 0x3AE2FE35, 0xE7A7548D, 0xFEA7EC47, 0x23E246FF, 0x41C0CFC6, 0x9C85657E,
+ 0x73C1BE52, 0xAE8414EA, 0xCCA69DD3, 0x11E3376B, 0x08E38FA1, 0xD5A62519, 0xB784AC20, 0x6AC10698,
+ 0x6CE16C89, 0xB1A4C631, 0xD3864F08, 0x0EC3E5B0, 0x17C35D7A, 0xCA86F7C2, 0xA8A47EFB, 0x75E1D443,
+ 0x9AA50F6F, 0x47E0A5D7, 0x25C22CEE, 0xF8878656, 0xE1873E9C, 0x3CC29424, 0x5EE01D1D, 0x83A5B7A5,
+ 0xF90696D8, 0x24433C60, 0x4661B559, 0x9B241FE1, 0x8224A72B, 0x5F610D93, 0x3D4384AA, 0xE0062E12,
+ 0x0F42F53E, 0xD2075F86, 0xB025D6BF, 0x6D607C07, 0x7460C4CD, 0xA9256E75, 0xCB07E74C, 0x16424DF4,
+ 0x106227E5, 0xCD278D5D, 0xAF050464, 0x7240AEDC, 0x6B401616, 0xB605BCAE, 0xD4273597, 0x09629F2F,
+ 0xE6264403, 0x3B63EEBB, 0x59416782, 0x8404CD3A, 0x9D0475F0, 0x4041DF48, 0x22635671, 0xFF26FCC9,
+ 0x2E238253, 0xF36628EB, 0x9144A1D2, 0x4C010B6A, 0x5501B3A0, 0x88441918, 0xEA669021, 0x37233A99,
+ 0xD867E1B5, 0x05224B0D, 0x6700C234, 0xBA45688C, 0xA345D046, 0x7E007AFE, 0x1C22F3C7, 0xC167597F,
+ 0xC747336E, 0x1A0299D6, 0x782010EF, 0xA565BA57, 0xBC65029D, 0x6120A825, 0x0302211C, 0xDE478BA4,
+ 0x31035088, 0xEC46FA30, 0x8E647309, 0x5321D9B1, 0x4A21617B, 0x9764CBC3, 0xF54642FA, 0x2803E842
+},
+{
+ 0x00000000, 0x38116FAC, 0x7022DF58, 0x4833B0F4, 0xE045BEB0, 0xD854D11C, 0x906761E8, 0xA8760E44,
+ 0xC5670B91, 0xFD76643D, 0xB545D4C9, 0x8D54BB65, 0x2522B521, 0x1D33DA8D, 0x55006A79, 0x6D1105D5,
+ 0x8F2261D3, 0xB7330E7F, 0xFF00BE8B, 0xC711D127, 0x6F67DF63, 0x5776B0CF, 0x1F45003B, 0x27546F97,
+ 0x4A456A42, 0x725405EE, 0x3A67B51A, 0x0276DAB6, 0xAA00D4F2, 0x9211BB5E, 0xDA220BAA, 0xE2336406,
+ 0x1BA8B557, 0x23B9DAFB, 0x6B8A6A0F, 0x539B05A3, 0xFBED0BE7, 0xC3FC644B, 0x8BCFD4BF, 0xB3DEBB13,
+ 0xDECFBEC6, 0xE6DED16A, 0xAEED619E, 0x96FC0E32, 0x3E8A0076, 0x069B6FDA, 0x4EA8DF2E, 0x76B9B082,
+ 0x948AD484, 0xAC9BBB28, 0xE4A80BDC, 0xDCB96470, 0x74CF6A34, 0x4CDE0598, 0x04EDB56C, 0x3CFCDAC0,
+ 0x51EDDF15, 0x69FCB0B9, 0x21CF004D, 0x19DE6FE1, 0xB1A861A5, 0x89B90E09, 0xC18ABEFD, 0xF99BD151,
+ 0x37516AAE, 0x0F400502, 0x4773B5F6, 0x7F62DA5A, 0xD714D41E, 0xEF05BBB2, 0xA7360B46, 0x9F2764EA,
+ 0xF236613F, 0xCA270E93, 0x8214BE67, 0xBA05D1CB, 0x1273DF8F, 0x2A62B023, 0x625100D7, 0x5A406F7B,
+ 0xB8730B7D, 0x806264D1, 0xC851D425, 0xF040BB89, 0x5836B5CD, 0x6027DA61, 0x28146A95, 0x10050539,
+ 0x7D1400EC, 0x45056F40, 0x0D36DFB4, 0x3527B018, 0x9D51BE5C, 0xA540D1F0, 0xED736104, 0xD5620EA8,
+ 0x2CF9DFF9, 0x14E8B055, 0x5CDB00A1, 0x64CA6F0D, 0xCCBC6149, 0xF4AD0EE5, 0xBC9EBE11, 0x848FD1BD,
+ 0xE99ED468, 0xD18FBBC4, 0x99BC0B30, 0xA1AD649C, 0x09DB6AD8, 0x31CA0574, 0x79F9B580, 0x41E8DA2C,
+ 0xA3DBBE2A, 0x9BCAD186, 0xD3F96172, 0xEBE80EDE, 0x439E009A, 0x7B8F6F36, 0x33BCDFC2, 0x0BADB06E,
+ 0x66BCB5BB, 0x5EADDA17, 0x169E6AE3, 0x2E8F054F, 0x86F90B0B, 0xBEE864A7, 0xF6DBD453, 0xCECABBFF,
+ 0x6EA2D55C, 0x56B3BAF0, 0x1E800A04, 0x269165A8, 0x8EE76BEC, 0xB6F60440, 0xFEC5B4B4, 0xC6D4DB18,
+ 0xABC5DECD, 0x93D4B161, 0xDBE70195, 0xE3F66E39, 0x4B80607D, 0x73910FD1, 0x3BA2BF25, 0x03B3D089,
+ 0xE180B48F, 0xD991DB23, 0x91A26BD7, 0xA9B3047B, 0x01C50A3F, 0x39D46593, 0x71E7D567, 0x49F6BACB,
+ 0x24E7BF1E, 0x1CF6D0B2, 0x54C56046, 0x6CD40FEA, 0xC4A201AE, 0xFCB36E02, 0xB480DEF6, 0x8C91B15A,
+ 0x750A600B, 0x4D1B0FA7, 0x0528BF53, 0x3D39D0FF, 0x954FDEBB, 0xAD5EB117, 0xE56D01E3, 0xDD7C6E4F,
+ 0xB06D6B9A, 0x887C0436, 0xC04FB4C2, 0xF85EDB6E, 0x5028D52A, 0x6839BA86, 0x200A0A72, 0x181B65DE,
+ 0xFA2801D8, 0xC2396E74, 0x8A0ADE80, 0xB21BB12C, 0x1A6DBF68, 0x227CD0C4, 0x6A4F6030, 0x525E0F9C,
+ 0x3F4F0A49, 0x075E65E5, 0x4F6DD511, 0x777CBABD, 0xDF0AB4F9, 0xE71BDB55, 0xAF286BA1, 0x9739040D,
+ 0x59F3BFF2, 0x61E2D05E, 0x29D160AA, 0x11C00F06, 0xB9B60142, 0x81A76EEE, 0xC994DE1A, 0xF185B1B6,
+ 0x9C94B463, 0xA485DBCF, 0xECB66B3B, 0xD4A70497, 0x7CD10AD3, 0x44C0657F, 0x0CF3D58B, 0x34E2BA27,
+ 0xD6D1DE21, 0xEEC0B18D, 0xA6F30179, 0x9EE26ED5, 0x36946091, 0x0E850F3D, 0x46B6BFC9, 0x7EA7D065,
+ 0x13B6D5B0, 0x2BA7BA1C, 0x63940AE8, 0x5B856544, 0xF3F36B00, 0xCBE204AC, 0x83D1B458, 0xBBC0DBF4,
+ 0x425B0AA5, 0x7A4A6509, 0x3279D5FD, 0x0A68BA51, 0xA21EB415, 0x9A0FDBB9, 0xD23C6B4D, 0xEA2D04E1,
+ 0x873C0134, 0xBF2D6E98, 0xF71EDE6C, 0xCF0FB1C0, 0x6779BF84, 0x5F68D028, 0x175B60DC, 0x2F4A0F70,
+ 0xCD796B76, 0xF56804DA, 0xBD5BB42E, 0x854ADB82, 0x2D3CD5C6, 0x152DBA6A, 0x5D1E0A9E, 0x650F6532,
+ 0x081E60E7, 0x300F0F4B, 0x783CBFBF, 0x402DD013, 0xE85BDE57, 0xD04AB1FB, 0x9879010F, 0xA0686EA3
+},
+{
+ 0x00000000, 0xEF306B19, 0xDB8CA0C3, 0x34BCCBDA, 0xB2F53777, 0x5DC55C6E, 0x697997B4, 0x8649FCAD,
+ 0x6006181F, 0x8F367306, 0xBB8AB8DC, 0x54BAD3C5, 0xD2F32F68, 0x3DC34471, 0x097F8FAB, 0xE64FE4B2,
+ 0xC00C303E, 0x2F3C5B27, 0x1B8090FD, 0xF4B0FBE4, 0x72F90749, 0x9DC96C50, 0xA975A78A, 0x4645CC93,
+ 0xA00A2821, 0x4F3A4338, 0x7B8688E2, 0x94B6E3FB, 0x12FF1F56, 0xFDCF744F, 0xC973BF95, 0x2643D48C,
+ 0x85F4168D, 0x6AC47D94, 0x5E78B64E, 0xB148DD57, 0x370121FA, 0xD8314AE3, 0xEC8D8139, 0x03BDEA20,
+ 0xE5F20E92, 0x0AC2658B, 0x3E7EAE51, 0xD14EC548, 0x570739E5, 0xB83752FC, 0x8C8B9926, 0x63BBF23F,
+ 0x45F826B3, 0xAAC84DAA, 0x9E748670, 0x7144ED69, 0xF70D11C4, 0x183D7ADD, 0x2C81B107, 0xC3B1DA1E,
+ 0x25FE3EAC, 0xCACE55B5, 0xFE729E6F, 0x1142F576, 0x970B09DB, 0x783B62C2, 0x4C87A918, 0xA3B7C201,
+ 0x0E045BEB, 0xE13430F2, 0xD588FB28, 0x3AB89031, 0xBCF16C9C, 0x53C10785, 0x677DCC5F, 0x884DA746,
+ 0x6E0243F4, 0x813228ED, 0xB58EE337, 0x5ABE882E, 0xDCF77483, 0x33C71F9A, 0x077BD440, 0xE84BBF59,
+ 0xCE086BD5, 0x213800CC, 0x1584CB16, 0xFAB4A00F, 0x7CFD5CA2, 0x93CD37BB, 0xA771FC61, 0x48419778,
+ 0xAE0E73CA, 0x413E18D3, 0x7582D309, 0x9AB2B810, 0x1CFB44BD, 0xF3CB2FA4, 0xC777E47E, 0x28478F67,
+ 0x8BF04D66, 0x64C0267F, 0x507CEDA5, 0xBF4C86BC, 0x39057A11, 0xD6351108, 0xE289DAD2, 0x0DB9B1CB,
+ 0xEBF65579, 0x04C63E60, 0x307AF5BA, 0xDF4A9EA3, 0x5903620E, 0xB6330917, 0x828FC2CD, 0x6DBFA9D4,
+ 0x4BFC7D58, 0xA4CC1641, 0x9070DD9B, 0x7F40B682, 0xF9094A2F, 0x16392136, 0x2285EAEC, 0xCDB581F5,
+ 0x2BFA6547, 0xC4CA0E5E, 0xF076C584, 0x1F46AE9D, 0x990F5230, 0x763F3929, 0x4283F2F3, 0xADB399EA,
+ 0x1C08B7D6, 0xF338DCCF, 0xC7841715, 0x28B47C0C, 0xAEFD80A1, 0x41CDEBB8, 0x75712062, 0x9A414B7B,
+ 0x7C0EAFC9, 0x933EC4D0, 0xA7820F0A, 0x48B26413, 0xCEFB98BE, 0x21CBF3A7, 0x1577387D, 0xFA475364,
+ 0xDC0487E8, 0x3334ECF1, 0x0788272B, 0xE8B84C32, 0x6EF1B09F, 0x81C1DB86, 0xB57D105C, 0x5A4D7B45,
+ 0xBC029FF7, 0x5332F4EE, 0x678E3F34, 0x88BE542D, 0x0EF7A880, 0xE1C7C399, 0xD57B0843, 0x3A4B635A,
+ 0x99FCA15B, 0x76CCCA42, 0x42700198, 0xAD406A81, 0x2B09962C, 0xC439FD35, 0xF08536EF, 0x1FB55DF6,
+ 0xF9FAB944, 0x16CAD25D, 0x22761987, 0xCD46729E, 0x4B0F8E33, 0xA43FE52A, 0x90832EF0, 0x7FB345E9,
+ 0x59F09165, 0xB6C0FA7C, 0x827C31A6, 0x6D4C5ABF, 0xEB05A612, 0x0435CD0B, 0x308906D1, 0xDFB96DC8,
+ 0x39F6897A, 0xD6C6E263, 0xE27A29B9, 0x0D4A42A0, 0x8B03BE0D, 0x6433D514, 0x508F1ECE, 0xBFBF75D7,
+ 0x120CEC3D, 0xFD3C8724, 0xC9804CFE, 0x26B027E7, 0xA0F9DB4A, 0x4FC9B053, 0x7B757B89, 0x94451090,
+ 0x720AF422, 0x9D3A9F3B, 0xA98654E1, 0x46B63FF8, 0xC0FFC355, 0x2FCFA84C, 0x1B736396, 0xF443088F,
+ 0xD200DC03, 0x3D30B71A, 0x098C7CC0, 0xE6BC17D9, 0x60F5EB74, 0x8FC5806D, 0xBB794BB7, 0x544920AE,
+ 0xB206C41C, 0x5D36AF05, 0x698A64DF, 0x86BA0FC6, 0x00F3F36B, 0xEFC39872, 0xDB7F53A8, 0x344F38B1,
+ 0x97F8FAB0, 0x78C891A9, 0x4C745A73, 0xA344316A, 0x250DCDC7, 0xCA3DA6DE, 0xFE816D04, 0x11B1061D,
+ 0xF7FEE2AF, 0x18CE89B6, 0x2C72426C, 0xC3422975, 0x450BD5D8, 0xAA3BBEC1, 0x9E87751B, 0x71B71E02,
+ 0x57F4CA8E, 0xB8C4A197, 0x8C786A4D, 0x63480154, 0xE501FDF9, 0x0A3196E0, 0x3E8D5D3A, 0xD1BD3623,
+ 0x37F2D291, 0xD8C2B988, 0xEC7E7252, 0x034E194B, 0x8507E5E6, 0x6A378EFF, 0x5E8B4525, 0xB1BB2E3C
+},
+{
+ 0x00000000, 0x68032CC8, 0xD0065990, 0xB8057558, 0xA5E0C5D1, 0xCDE3E919, 0x75E69C41, 0x1DE5B089,
+ 0x4E2DFD53, 0x262ED19B, 0x9E2BA4C3, 0xF628880B, 0xEBCD3882, 0x83CE144A, 0x3BCB6112, 0x53C84DDA,
+ 0x9C5BFAA6, 0xF458D66E, 0x4C5DA336, 0x245E8FFE, 0x39BB3F77, 0x51B813BF, 0xE9BD66E7, 0x81BE4A2F,
+ 0xD27607F5, 0xBA752B3D, 0x02705E65, 0x6A7372AD, 0x7796C224, 0x1F95EEEC, 0xA7909BB4, 0xCF93B77C,
+ 0x3D5B83BD, 0x5558AF75, 0xED5DDA2D, 0x855EF6E5, 0x98BB466C, 0xF0B86AA4, 0x48BD1FFC, 0x20BE3334,
+ 0x73767EEE, 0x1B755226, 0xA370277E, 0xCB730BB6, 0xD696BB3F, 0xBE9597F7, 0x0690E2AF, 0x6E93CE67,
+ 0xA100791B, 0xC90355D3, 0x7106208B, 0x19050C43, 0x04E0BCCA, 0x6CE39002, 0xD4E6E55A, 0xBCE5C992,
+ 0xEF2D8448, 0x872EA880, 0x3F2BDDD8, 0x5728F110, 0x4ACD4199, 0x22CE6D51, 0x9ACB1809, 0xF2C834C1,
+ 0x7AB7077A, 0x12B42BB2, 0xAAB15EEA, 0xC2B27222, 0xDF57C2AB, 0xB754EE63, 0x0F519B3B, 0x6752B7F3,
+ 0x349AFA29, 0x5C99D6E1, 0xE49CA3B9, 0x8C9F8F71, 0x917A3FF8, 0xF9791330, 0x417C6668, 0x297F4AA0,
+ 0xE6ECFDDC, 0x8EEFD114, 0x36EAA44C, 0x5EE98884, 0x430C380D, 0x2B0F14C5, 0x930A619D, 0xFB094D55,
+ 0xA8C1008F, 0xC0C22C47, 0x78C7591F, 0x10C475D7, 0x0D21C55E, 0x6522E996, 0xDD279CCE, 0xB524B006,
+ 0x47EC84C7, 0x2FEFA80F, 0x97EADD57, 0xFFE9F19F, 0xE20C4116, 0x8A0F6DDE, 0x320A1886, 0x5A09344E,
+ 0x09C17994, 0x61C2555C, 0xD9C72004, 0xB1C40CCC, 0xAC21BC45, 0xC422908D, 0x7C27E5D5, 0x1424C91D,
+ 0xDBB77E61, 0xB3B452A9, 0x0BB127F1, 0x63B20B39, 0x7E57BBB0, 0x16549778, 0xAE51E220, 0xC652CEE8,
+ 0x959A8332, 0xFD99AFFA, 0x459CDAA2, 0x2D9FF66A, 0x307A46E3, 0x58796A2B, 0xE07C1F73, 0x887F33BB,
+ 0xF56E0EF4, 0x9D6D223C, 0x25685764, 0x4D6B7BAC, 0x508ECB25, 0x388DE7ED, 0x808892B5, 0xE88BBE7D,
+ 0xBB43F3A7, 0xD340DF6F, 0x6B45AA37, 0x034686FF, 0x1EA33676, 0x76A01ABE, 0xCEA56FE6, 0xA6A6432E,
+ 0x6935F452, 0x0136D89A, 0xB933ADC2, 0xD130810A, 0xCCD53183, 0xA4D61D4B, 0x1CD36813, 0x74D044DB,
+ 0x27180901, 0x4F1B25C9, 0xF71E5091, 0x9F1D7C59, 0x82F8CCD0, 0xEAFBE018, 0x52FE9540, 0x3AFDB988,
+ 0xC8358D49, 0xA036A181, 0x1833D4D9, 0x7030F811, 0x6DD54898, 0x05D66450, 0xBDD31108, 0xD5D03DC0,
+ 0x8618701A, 0xEE1B5CD2, 0x561E298A, 0x3E1D0542, 0x23F8B5CB, 0x4BFB9903, 0xF3FEEC5B, 0x9BFDC093,
+ 0x546E77EF, 0x3C6D5B27, 0x84682E7F, 0xEC6B02B7, 0xF18EB23E, 0x998D9EF6, 0x2188EBAE, 0x498BC766,
+ 0x1A438ABC, 0x7240A674, 0xCA45D32C, 0xA246FFE4, 0xBFA34F6D, 0xD7A063A5, 0x6FA516FD, 0x07A63A35,
+ 0x8FD9098E, 0xE7DA2546, 0x5FDF501E, 0x37DC7CD6, 0x2A39CC5F, 0x423AE097, 0xFA3F95CF, 0x923CB907,
+ 0xC1F4F4DD, 0xA9F7D815, 0x11F2AD4D, 0x79F18185, 0x6414310C, 0x0C171DC4, 0xB412689C, 0xDC114454,
+ 0x1382F328, 0x7B81DFE0, 0xC384AAB8, 0xAB878670, 0xB66236F9, 0xDE611A31, 0x66646F69, 0x0E6743A1,
+ 0x5DAF0E7B, 0x35AC22B3, 0x8DA957EB, 0xE5AA7B23, 0xF84FCBAA, 0x904CE762, 0x2849923A, 0x404ABEF2,
+ 0xB2828A33, 0xDA81A6FB, 0x6284D3A3, 0x0A87FF6B, 0x17624FE2, 0x7F61632A, 0xC7641672, 0xAF673ABA,
+ 0xFCAF7760, 0x94AC5BA8, 0x2CA92EF0, 0x44AA0238, 0x594FB2B1, 0x314C9E79, 0x8949EB21, 0xE14AC7E9,
+ 0x2ED97095, 0x46DA5C5D, 0xFEDF2905, 0x96DC05CD, 0x8B39B544, 0xE33A998C, 0x5B3FECD4, 0x333CC01C,
+ 0x60F48DC6, 0x08F7A10E, 0xB0F2D456, 0xD8F1F89E, 0xC5144817, 0xAD1764DF, 0x15121187, 0x7D113D4F
+},
+{
+ 0x00000000, 0x493C7D27, 0x9278FA4E, 0xDB448769, 0x211D826D, 0x6821FF4A, 0xB3657823, 0xFA590504,
+ 0x423B04DA, 0x0B0779FD, 0xD043FE94, 0x997F83B3, 0x632686B7, 0x2A1AFB90, 0xF15E7CF9, 0xB86201DE,
+ 0x847609B4, 0xCD4A7493, 0x160EF3FA, 0x5F328EDD, 0xA56B8BD9, 0xEC57F6FE, 0x37137197, 0x7E2F0CB0,
+ 0xC64D0D6E, 0x8F717049, 0x5435F720, 0x1D098A07, 0xE7508F03, 0xAE6CF224, 0x7528754D, 0x3C14086A,
+ 0x0D006599, 0x443C18BE, 0x9F789FD7, 0xD644E2F0, 0x2C1DE7F4, 0x65219AD3, 0xBE651DBA, 0xF759609D,
+ 0x4F3B6143, 0x06071C64, 0xDD439B0D, 0x947FE62A, 0x6E26E32E, 0x271A9E09, 0xFC5E1960, 0xB5626447,
+ 0x89766C2D, 0xC04A110A, 0x1B0E9663, 0x5232EB44, 0xA86BEE40, 0xE1579367, 0x3A13140E, 0x732F6929,
+ 0xCB4D68F7, 0x827115D0, 0x593592B9, 0x1009EF9E, 0xEA50EA9A, 0xA36C97BD, 0x782810D4, 0x31146DF3,
+ 0x1A00CB32, 0x533CB615, 0x8878317C, 0xC1444C5B, 0x3B1D495F, 0x72213478, 0xA965B311, 0xE059CE36,
+ 0x583BCFE8, 0x1107B2CF, 0xCA4335A6, 0x837F4881, 0x79264D85, 0x301A30A2, 0xEB5EB7CB, 0xA262CAEC,
+ 0x9E76C286, 0xD74ABFA1, 0x0C0E38C8, 0x453245EF, 0xBF6B40EB, 0xF6573DCC, 0x2D13BAA5, 0x642FC782,
+ 0xDC4DC65C, 0x9571BB7B, 0x4E353C12, 0x07094135, 0xFD504431, 0xB46C3916, 0x6F28BE7F, 0x2614C358,
+ 0x1700AEAB, 0x5E3CD38C, 0x857854E5, 0xCC4429C2, 0x361D2CC6, 0x7F2151E1, 0xA465D688, 0xED59ABAF,
+ 0x553BAA71, 0x1C07D756, 0xC743503F, 0x8E7F2D18, 0x7426281C, 0x3D1A553B, 0xE65ED252, 0xAF62AF75,
+ 0x9376A71F, 0xDA4ADA38, 0x010E5D51, 0x48322076, 0xB26B2572, 0xFB575855, 0x2013DF3C, 0x692FA21B,
+ 0xD14DA3C5, 0x9871DEE2, 0x4335598B, 0x0A0924AC, 0xF05021A8, 0xB96C5C8F, 0x6228DBE6, 0x2B14A6C1,
+ 0x34019664, 0x7D3DEB43, 0xA6796C2A, 0xEF45110D, 0x151C1409, 0x5C20692E, 0x8764EE47, 0xCE589360,
+ 0x763A92BE, 0x3F06EF99, 0xE44268F0, 0xAD7E15D7, 0x572710D3, 0x1E1B6DF4, 0xC55FEA9D, 0x8C6397BA,
+ 0xB0779FD0, 0xF94BE2F7, 0x220F659E, 0x6B3318B9, 0x916A1DBD, 0xD856609A, 0x0312E7F3, 0x4A2E9AD4,
+ 0xF24C9B0A, 0xBB70E62D, 0x60346144, 0x29081C63, 0xD3511967, 0x9A6D6440, 0x4129E329, 0x08159E0E,
+ 0x3901F3FD, 0x703D8EDA, 0xAB7909B3, 0xE2457494, 0x181C7190, 0x51200CB7, 0x8A648BDE, 0xC358F6F9,
+ 0x7B3AF727, 0x32068A00, 0xE9420D69, 0xA07E704E, 0x5A27754A, 0x131B086D, 0xC85F8F04, 0x8163F223,
+ 0xBD77FA49, 0xF44B876E, 0x2F0F0007, 0x66337D20, 0x9C6A7824, 0xD5560503, 0x0E12826A, 0x472EFF4D,
+ 0xFF4CFE93, 0xB67083B4, 0x6D3404DD, 0x240879FA, 0xDE517CFE, 0x976D01D9, 0x4C2986B0, 0x0515FB97,
+ 0x2E015D56, 0x673D2071, 0xBC79A718, 0xF545DA3F, 0x0F1CDF3B, 0x4620A21C, 0x9D642575, 0xD4585852,
+ 0x6C3A598C, 0x250624AB, 0xFE42A3C2, 0xB77EDEE5, 0x4D27DBE1, 0x041BA6C6, 0xDF5F21AF, 0x96635C88,
+ 0xAA7754E2, 0xE34B29C5, 0x380FAEAC, 0x7133D38B, 0x8B6AD68F, 0xC256ABA8, 0x19122CC1, 0x502E51E6,
+ 0xE84C5038, 0xA1702D1F, 0x7A34AA76, 0x3308D751, 0xC951D255, 0x806DAF72, 0x5B29281B, 0x1215553C,
+ 0x230138CF, 0x6A3D45E8, 0xB179C281, 0xF845BFA6, 0x021CBAA2, 0x4B20C785, 0x906440EC, 0xD9583DCB,
+ 0x613A3C15, 0x28064132, 0xF342C65B, 0xBA7EBB7C, 0x4027BE78, 0x091BC35F, 0xD25F4436, 0x9B633911,
+ 0xA777317B, 0xEE4B4C5C, 0x350FCB35, 0x7C33B612, 0x866AB316, 0xCF56CE31, 0x14124958, 0x5D2E347F,
+ 0xE54C35A1, 0xAC704886, 0x7734CFEF, 0x3E08B2C8, 0xC451B7CC, 0x8D6DCAEB, 0x56294D82, 0x1F1530A5
+}};
+
+#define CRC32_UPD(crc, n) \
+	(crc32c_tables[(n)][(crc) & 0xFF] ^ \
+	 crc32c_tables[(n)-1][((crc) >> 8) & 0xFF])
+
+static inline uint32_t
+crc32c_1word(uint32_t data, uint32_t init_val)
+{
+	uint32_t crc, term1, term2;
+	crc = init_val;
+	crc ^= data;
+
+	term1 = CRC32_UPD(crc, 3);
+	term2 = crc >> 16;
+	crc = term1 ^ CRC32_UPD(term2, 1);
+
+	return crc;
+}
+
+static inline uint32_t
+crc32c_2words(uint64_t data, uint32_t init_val)
+{
+	union {
+		uint64_t u64;
+		uint32_t u32[2];
+	} d;
+	d.u64 = data;
+
+	uint32_t crc, term1, term2;
+
+	crc = init_val;
+	crc ^= d.u32[0];
+
+	term1 = CRC32_UPD(crc, 7);
+	term2 = crc >> 16;
+	crc = term1 ^ CRC32_UPD(term2, 5);
+	term1 = CRC32_UPD(d.u32[1], 3);
+	term2 = d.u32[1] >> 16;
+	crc ^= term1 ^ CRC32_UPD(term2, 1);
+
+	return crc;
+}
+
 /**
  * Use single crc32 instruction to perform a hash on a 4 byte value.
  *
-- 
1.7.9.5


* [dpdk-dev] [PATCH v5 2/7] hash: add assembly implementation of CRC32 intrinsics
  2014-11-20  5:15 ` [dpdk-dev] [PATCH v5 0/7] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
  2014-11-20  5:16   ` [dpdk-dev] [PATCH v5 1/7] hash: add software CRC32 implementation Yerden Zhumabekov
@ 2014-11-20  5:16   ` Yerden Zhumabekov
  2014-11-20  5:16   ` [dpdk-dev] [PATCH v5 3/7] hash: replace built-in functions implementing SSE4.2 Yerden Zhumabekov
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-20  5:16 UTC (permalink / raw)
  To: dev

Added:
- crc32c_sse42_u32(), which emits the 'crc32l' asm instruction;
- crc32c_sse42_u64(), which emits the 'crc32q' asm instruction;
- crc32c_sse42_u64_mimic(), a wrapper for use on 32-bit platforms.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |   34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)
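
The reason a 32-bit wrapper can stand in for the 64-bit operation is that a CRC update over 8 bytes equals two chained 4-byte updates, low half first. The portable sketch below checks that property with bitwise reference code instead of the crc32l/crc32q instructions; crc32c_ref_u32(), crc32c_ref_u64() and crc32c_u64_mimic() are hypothetical names. It splits the operand with shifts, whereas the patch uses a union, which gives the same halves on little-endian x86.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise CRC32C update over one 32-bit word (no inversions). */
static uint32_t
crc32c_ref_u32(uint32_t data, uint32_t init_val)
{
	uint32_t crc = init_val ^ data;
	int i;

	for (i = 0; i < 32; i++)
		crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	return crc;
}

/* Bitwise CRC32C update over one 64-bit word, low byte first. The CRC is
 * XORed into the low 32 bits and the value is shifted out bit by bit. */
static uint32_t
crc32c_ref_u64(uint64_t data, uint32_t init_val)
{
	uint64_t state = data ^ init_val;
	int i;

	for (i = 0; i < 64; i++)
		state = (state >> 1) ^
			(0x82F63B78ull & (0ull - (state & 1ull)));
	return (uint32_t)state;
}

/* Same result built from two 32-bit updates: low half, then high half. */
static uint32_t
crc32c_u64_mimic(uint64_t data, uint32_t init_val)
{
	init_val = crc32c_ref_u32((uint32_t)data, init_val);
	return crc32c_ref_u32((uint32_t)(data >> 32), init_val);
}

/* The chained 32-bit updates must match the single 64-bit update. */
static void
crc32c_mimic_selfcheck(void)
{
	assert(crc32c_u64_mimic(0x0123456789abcdefull, 0xffffffffu) ==
	       crc32c_ref_u64(0x0123456789abcdefull, 0xffffffffu));
	assert(crc32c_u64_mimic(1, 0) == crc32c_ref_u64(1, 0));
}
```

The order of the halves matters: feeding the high half first would give a different, incompatible hash.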

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 4d7532a..9bd0cf6 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -363,6 +363,40 @@ crc32c_2words(uint64_t data, uint32_t init_val)
 	return crc;
 }
 
+static inline uint32_t
+crc32c_sse42_u32(uint32_t data, uint32_t init_val)
+{
+	__asm__ volatile(
+			"crc32l %[data], %[init_val];"
+			: [init_val] "+r" (init_val)
+			: [data] "rm" (data));
+	return init_val;
+}
+
+static inline uint32_t
+crc32c_sse42_u64(uint64_t data, uint64_t init_val)
+{
+	__asm__ volatile(
+			"crc32q %[data], %[init_val];"
+			: [init_val] "+r" (init_val)
+			: [data] "rm" (data));
+	return init_val;
+}
+
+static inline uint32_t
+crc32c_sse42_u64_mimic(uint64_t data, uint64_t init_val)
+{
+	union {
+		uint32_t u32[2];
+		uint64_t u64;
+	} d;
+
+	d.u64 = data;
+	init_val = crc32c_sse42_u32(d.u32[0], init_val);
+	init_val = crc32c_sse42_u32(d.u32[1], init_val);
+	return init_val;
+}
+
 /**
  * Use single crc32 instruction to perform a hash on a 4 byte value.
  *
-- 
1.7.9.5


* [dpdk-dev] [PATCH v5 3/7] hash: replace built-in functions implementing SSE4.2
  2014-11-20  5:15 ` [dpdk-dev] [PATCH v5 0/7] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
  2014-11-20  5:16   ` [dpdk-dev] [PATCH v5 1/7] hash: add software CRC32 implementation Yerden Zhumabekov
  2014-11-20  5:16   ` [dpdk-dev] [PATCH v5 2/7] hash: add assembly implementation of CRC32 intrinsics Yerden Zhumabekov
@ 2014-11-20  5:16   ` Yerden Zhumabekov
  2014-11-20  5:16   ` [dpdk-dev] [PATCH v5 4/7] hash: add rte_hash_crc_8byte function Yerden Zhumabekov
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-20  5:16 UTC (permalink / raw)
  To: dev

Stop using the built-in intrinsics and use our own assembly
implementation instead. Remove the #include <nmmintrin.h> entry as well.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 9bd0cf6..cd28833 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -45,7 +45,6 @@ extern "C" {
 #endif
 
 #include <stdint.h>
-#include <nmmintrin.h>
 
 /* Lookup tables for software implementation of CRC32C */
 static uint32_t crc32c_tables[8][256] = {{
@@ -410,7 +409,7 @@ crc32c_sse42_u64_mimic(uint64_t data, uint64_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-	return _mm_crc32_u32(init_val, data);
+	return crc32c_sse42_u32(data, init_val);
 }
 
 /**
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [dpdk-dev] [PATCH v5 4/7] hash: add rte_hash_crc_8byte function
  2014-11-20  5:15 ` [dpdk-dev] [PATCH v5 0/7] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
                     ` (2 preceding siblings ...)
  2014-11-20  5:16   ` [dpdk-dev] [PATCH v5 3/7] hash: replace built-in functions implementing SSE4.2 Yerden Zhumabekov
@ 2014-11-20  5:16   ` Yerden Zhumabekov
  2014-11-21 11:22     ` Neil Horman
  2014-11-20  5:17   ` [dpdk-dev] [PATCH v5 6/7] hash: rte_hash_crc() slices data into 8-byte pieces Yerden Zhumabekov
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-20  5:16 UTC (permalink / raw)
  To: dev

SSE4.2 provides CRC32 intrinsic with 8-byte operand.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index cd28833..2c8ec99 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -413,6 +413,22 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 }
 
 /**
+ * Use single crc32 instruction to perform a hash on a 8 byte value.
+ *
+ * @param data
+ *   Data to perform hash on.
+ * @param init_val
+ *   Value to initialise hash generator.
+ * @return
+ *   32bit calculated hash value.
+ */
+static inline uint32_t
+rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
+{
+	return crc32c_sse42_u64(data, init_val);
+}
+
+/**
  * Use crc32 instruction to perform a hash.
  *
  * @param data
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [dpdk-dev] [PATCH v5 6/7] hash: rte_hash_crc() slices data into 8-byte pieces
  2014-11-20  5:15 ` [dpdk-dev] [PATCH v5 0/7] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
                     ` (3 preceding siblings ...)
  2014-11-20  5:16   ` [dpdk-dev] [PATCH v5 4/7] hash: add rte_hash_crc_8byte function Yerden Zhumabekov
@ 2014-11-20  5:17   ` Yerden Zhumabekov
  2014-11-20  5:17   ` [dpdk-dev] [PATCH v5 7/7] test: remove redundant compile checks Yerden Zhumabekov
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-20  5:17 UTC (permalink / raw)
  To: dev

Calculating a hash for data of variable length is more efficient
when that data is sliced into 8-byte pieces. The remaining part of the data
is hashed using CRC32 functions with either 8- or 4-byte operands.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |   33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 469b4f5..39d0569 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -486,7 +486,7 @@ rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 }
 
 /**
- * Use crc32 instruction to perform a hash.
+ * Calculate CRC32 hash on user-supplied byte array.
  *
  * @param data
  *   Data to perform hash on.
@@ -501,23 +501,38 @@ static inline uint32_t
 rte_hash_crc(const void *data, uint32_t data_len, uint32_t init_val)
 {
 	unsigned i;
-	uint32_t temp = 0;
-	const uint32_t *p32 = (const uint32_t *)data;
+	uint64_t temp = 0;
+	const uint64_t *p64 = (const uint64_t *)data;
 
-	for (i = 0; i < data_len / 4; i++) {
-		init_val = rte_hash_crc_4byte(*p32++, init_val);
+	for (i = 0; i < data_len / 8; i++) {
+		init_val = rte_hash_crc_8byte(*p64++, init_val);
 	}
 
-	switch (3 - (data_len & 0x03)) {
+	switch (7 - (data_len & 0x07)) {
 	case 0:
-		temp |= *((const uint8_t *)p32 + 2) << 16;
+		temp |= (uint64_t) *((const uint8_t *)p64 + 6) << 48;
 		/* Fallthrough */
 	case 1:
-		temp |= *((const uint8_t *)p32 + 1) << 8;
+		temp |= (uint64_t) *((const uint8_t *)p64 + 5) << 40;
 		/* Fallthrough */
 	case 2:
-		temp |= *((const uint8_t *)p32);
+		temp |= (uint64_t) *((const uint8_t *)p64 + 4) << 32;
+		temp |= *((const uint32_t *)p64);
+		init_val = rte_hash_crc_8byte(temp, init_val);
+		break;
+	case 3:
+		init_val = rte_hash_crc_4byte(*(const uint32_t *)p64, init_val);
+		break;
+	case 4:
+		temp |= *((const uint8_t *)p64 + 2) << 16;
+		/* Fallthrough */
+	case 5:
+		temp |= *((const uint8_t *)p64 + 1) << 8;
+		/* Fallthrough */
+	case 6:
+		temp |= *((const uint8_t *)p64);
 		init_val = rte_hash_crc_4byte(temp, init_val);
+		/* Fallthrough */
 	default:
 		break;
 	}
-- 
1.7.9.5
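
The slicing scheme in the patch above can be illustrated with a portable sketch. This is not the code from the patch: crc32c_step(), load_le() and hash_crc_sliced() are hypothetical names, the crc32 instruction is replaced by a bitwise software loop, and words are assembled byte by byte in little-endian order instead of being read through a cast pointer. As in the patch, a tail of 5 to 7 bytes is zero-padded into one 8-byte step and a tail of 1 to 4 bytes into one 4-byte step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software stand-in for the crc32 instruction: fold the low nbits
 * (8, 32 or 64) of data into crc using the reflected CRC32C polynomial.
 * data must not have bits set above nbits. */
static uint32_t
crc32c_step(uint64_t data, uint32_t crc, unsigned nbits)
{
	uint64_t c = crc ^ data;
	unsigned i;

	for (i = 0; i < nbits; i++)
		c = (c & 1) ? (c >> 1) ^ 0x82F63B78u : c >> 1;
	return (uint32_t)c;
}

/* Assemble n (<= 8) bytes into a little-endian word, zero-padding the rest. */
static uint64_t
load_le(const uint8_t *p, size_t n)
{
	uint64_t v = 0;
	size_t i;

	assert(n <= 8);
	for (i = 0; i < n; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

/* Hash a byte array in 8-byte slices, then handle the 0..7 byte tail. */
static uint32_t
hash_crc_sliced(const void *data, size_t len, uint32_t init_val)
{
	const uint8_t *p = data;

	for (; len >= 8; p += 8, len -= 8)
		init_val = crc32c_step(load_le(p, 8), init_val, 64);

	if (len > 4)		/* 5..7 bytes left: one zero-padded 8-byte step */
		init_val = crc32c_step(load_le(p, len), init_val, 64);
	else if (len > 0)	/* 1..4 bytes left: one zero-padded 4-byte step */
		init_val = crc32c_step(load_le(p, len), init_val, 32);
	return init_val;
}
```

Because the tail is zero-padded, a 13-byte key hashes like the same key extended to 16 bytes with zeros, so the result is a hash value rather than a standard CRC32C checksum of the input.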

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [dpdk-dev] [PATCH v5 7/7] test: remove redundant compile checks
  2014-11-20  5:15 ` [dpdk-dev] [PATCH v5 0/7] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
                     ` (4 preceding siblings ...)
  2014-11-20  5:17   ` [dpdk-dev] [PATCH v5 6/7] hash: rte_hash_crc() slices data into 8-byte pieces Yerden Zhumabekov
@ 2014-11-20  5:17   ` Yerden Zhumabekov
  2014-11-20  5:17   ` [dpdk-dev] [PATCH v5 5/7] hash: add fallback to software CRC32 implementation Yerden Zhumabekov
  2014-11-27 21:04   ` [dpdk-dev] [PATCH v5 0/7] rte_hash_crc reworked to be platform-independent Thomas Monjalon
  7 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-20  5:17 UTC (permalink / raw)
  To: dev

Since rte_hash_crc() can now be run regardless of SSE4.2 support,
we can safely remove compile checks for RTE_MACHINE_CPUFLAG_SSE4_2
in test utilities.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 app/test/test_hash.c      |    7 -------
 app/test/test_hash_perf.c |   11 -----------
 2 files changed, 18 deletions(-)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index 178ec3f..76b1b8f 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -55,10 +55,7 @@
 #include <rte_hash.h>
 #include <rte_fbk_hash.h>
 #include <rte_jhash.h>
-
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 #include <rte_hash_crc.h>
-#endif
 
 /*******************************************************************************
  * Hash function performance test configuration section. Each performance test
@@ -67,11 +64,7 @@
  * The five arrays below control what tests are performed. Every combination
  * from the array entries is tested.
  */
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 static rte_hash_function hashtest_funcs[] = {rte_jhash, rte_hash_crc};
-#else
-static rte_hash_function hashtest_funcs[] = {rte_jhash};
-#endif
 static uint32_t hashtest_initvals[] = {0};
 static uint32_t hashtest_key_lens[] = {0, 2, 4, 5, 6, 7, 8, 10, 11, 15, 16, 21, 31, 32, 33, 63, 64};
 /******************************************************************************/
diff --git a/app/test/test_hash_perf.c b/app/test/test_hash_perf.c
index be34957..05a88ec 100644
--- a/app/test/test_hash_perf.c
+++ b/app/test/test_hash_perf.c
@@ -56,10 +56,7 @@
 #include <rte_hash.h>
 #include <rte_fbk_hash.h>
 #include <rte_jhash.h>
-
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 #include <rte_hash_crc.h>
-#endif
 
 /* Types of hash table performance test that can be performed */
 enum hash_test_t {
@@ -97,11 +94,7 @@ struct tbl_perf_test_params {
  */
 #define HASHTEST_ITERATIONS 1000000
 
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 static rte_hash_function hashtest_funcs[] = {rte_jhash, rte_hash_crc};
-#else
-static rte_hash_function hashtest_funcs[] = {rte_jhash};
-#endif
 static uint32_t hashtest_initvals[] = {0};
 static uint32_t hashtest_key_lens[] = {2, 4, 5, 6, 7, 8, 10, 11, 15, 16, 21, 31, 32, 33, 63, 64};
 /******************************************************************************/
@@ -243,7 +236,6 @@ struct tbl_perf_test_params tbl_perf_params[] =
 {       LOOKUP,  ITERATIONS,  1048576,           4,      64,    rte_jhash,   0},
 {       LOOKUP,  ITERATIONS,  1048576,           8,      64,    rte_jhash,   0},
 {       LOOKUP,  ITERATIONS,  1048576,          16,      64,    rte_jhash,   0},
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 /* Small table, add */
 /*  Test type | Iterations | Entries | BucketSize | KeyLen |    HashFunc | InitVal */
 { ADD_ON_EMPTY,        1024,     1024,           1,      16, rte_hash_crc,   0},
@@ -376,7 +368,6 @@ struct tbl_perf_test_params tbl_perf_params[] =
 {       LOOKUP,  ITERATIONS,  1048576,           4,      64, rte_hash_crc,   0},
 {       LOOKUP,  ITERATIONS,  1048576,           8,      64, rte_hash_crc,   0},
 {       LOOKUP,  ITERATIONS,  1048576,          16,      64, rte_hash_crc,   0},
-#endif
 };
 
 /******************************************************************************/
@@ -423,10 +414,8 @@ static const char *get_hash_name(rte_hash_function f)
 	if (f == rte_jhash)
 		return "jhash";
 
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 	if (f == rte_hash_crc)
 		return "rte_hash_crc";
-#endif
 
 	return "UnknownHash";
 }
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [dpdk-dev] [PATCH v5 5/7] hash: add fallback to software CRC32 implementation
  2014-11-20  5:15 ` [dpdk-dev] [PATCH v5 0/7] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
                     ` (5 preceding siblings ...)
  2014-11-20  5:17   ` [dpdk-dev] [PATCH v5 7/7] test: remove redundant compile checks Yerden Zhumabekov
@ 2014-11-20  5:17   ` Yerden Zhumabekov
  2014-11-27 21:04   ` [dpdk-dev] [PATCH v5 0/7] rte_hash_crc reworked to be platform-independent Thomas Monjalon
  7 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-20  5:17 UTC (permalink / raw)
  To: dev

Initially, SSE4.2 support is detected with the CPUID instruction in
a constructor function.

Add rte_hash_crc_set_alg() function to detect and set the CRC32
implementation if necessary. SSE4.2 is allowed by default.

Rework rte_hash_crc_*byte() functions so that they choose an available
CRC32 implementation at run time.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |   61 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 59 insertions(+), 2 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 2c8ec99..469b4f5 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -45,6 +45,8 @@ extern "C" {
 #endif
 
 #include <stdint.h>
+#include <rte_cpuflags.h>
+#include <rte_branch_prediction.h>
 
 /* Lookup tables for software implementation of CRC32C */
 static uint32_t crc32c_tables[8][256] = {{
@@ -396,8 +398,52 @@ crc32c_sse42_u64_mimic(uint64_t data, uint64_t init_val)
 	return init_val;
 }
 
+#define CRC32_SW            (1U << 0)
+#define CRC32_SSE42         (1U << 1)
+#define CRC32_x64           (1U << 2)
+#define CRC32_SSE42_x64     (CRC32_x64|CRC32_SSE42)
+
+static uint8_t crc32_alg = CRC32_SW;
+
+/**
+ * Allow or disallow use of SSE4.2 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   An OR of following flags:
+ *   - (CRC32_SW) Don't use SSE4.2 intrinsics
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available
+ *   - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default)
+ *
+ */
+static inline void
+rte_hash_crc_set_alg(uint8_t alg)
+{
+	switch (alg) {
+	case CRC32_SSE42_x64:
+		if (! rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T))
+			alg = CRC32_SSE42;
+	case CRC32_SSE42:
+		if (! rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2))
+			alg = CRC32_SW;
+	case CRC32_SW:
+		crc32_alg = alg;
+	default:
+		break;
+	}
+}
+
+/* Setting the best available algorithm */
+static inline void __attribute__((constructor))
+rte_hash_crc_init_alg(void)
+{
+	rte_hash_crc_set_alg(CRC32_SSE42_x64);
+}
+
 /**
  * Use single crc32 instruction to perform a hash on a 4 byte value.
+ * Fall back to software crc32 implementation in case SSE4.2 is
+ * not supported
  *
  * @param data
  *   Data to perform hash on.
@@ -409,11 +455,16 @@ crc32c_sse42_u64_mimic(uint64_t data, uint64_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-	return crc32c_sse42_u32(data, init_val);
+	if (likely(crc32_alg & CRC32_SSE42))
+		return crc32c_sse42_u32(data, init_val);
+
+	return crc32c_1word(data, init_val);
 }
 
 /**
  * Use single crc32 instruction to perform a hash on a 8 byte value.
+ * Fall back to software crc32 implementation in case SSE4.2 is
+ * not supported
  *
  * @param data
  *   Data to perform hash on.
@@ -425,7 +476,13 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
-	return crc32c_sse42_u64(data, init_val);
+	if (likely(crc32_alg == CRC32_SSE42_x64))
+		return crc32c_sse42_u64(data, init_val);
+
+	if (likely(crc32_alg & CRC32_SSE42))
+		return crc32c_sse42_u64_mimic(data, init_val);
+
+	return crc32c_2words(data, init_val);
 }
 
 /**
-- 
1.7.9.5
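
The selection logic of rte_hash_crc_set_alg() in the patch above can be followed in isolation with a small sketch. It is a simplified model, not the patch code: resolve_alg() and the ALG_* names are hypothetical, and the two CPU capabilities are passed in as plain arguments instead of being queried with rte_cpu_get_flag_enabled(), so the fall-through order can be exercised on any machine.

```c
#include <stdint.h>

/* Same bit layout as the CRC32_* flags in the patch (names are hypothetical). */
#define ALG_SW        (1U << 0)
#define ALG_SSE42     (1U << 1)
#define ALG_X64       (1U << 2)
#define ALG_SSE42_X64 (ALG_X64 | ALG_SSE42)

/*
 * Return the algorithm that would actually be used for a request,
 * given the CPU capabilities. Returns 0 for an unknown request,
 * in which case the caller keeps its previous setting.
 */
static uint8_t
resolve_alg(uint8_t requested, int has_sse42, int has_64bit)
{
	switch (requested) {
	case ALG_SSE42_X64:
		if (!has_64bit)
			requested = ALG_SSE42;	/* no 64-bit operand form */
		/* Fallthrough */
	case ALG_SSE42:
		if (!has_sse42)
			requested = ALG_SW;	/* no crc32 instruction at all */
		/* Fallthrough */
	case ALG_SW:
		return requested;
	default:
		return 0;
	}
}
```

For example, resolve_alg(ALG_SSE42_X64, 1, 0) yields ALG_SSE42, and resolve_alg(ALG_SSE42_X64, 0, 1) yields ALG_SW. Each case degrades the request by one level and then falls through to the next check.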

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v5 4/7] hash: add rte_hash_crc_8byte function
  2014-11-20  5:16   ` [dpdk-dev] [PATCH v5 4/7] hash: add rte_hash_crc_8byte function Yerden Zhumabekov
@ 2014-11-21 11:22     ` Neil Horman
  2014-11-21 11:26       ` Yerden Zhumabekov
  0 siblings, 1 reply; 98+ messages in thread
From: Neil Horman @ 2014-11-21 11:22 UTC (permalink / raw)
  To: Yerden Zhumabekov; +Cc: dev

On Thu, Nov 20, 2014 at 11:16:34AM +0600, Yerden Zhumabekov wrote:
> SSE4.2 provides CRC32 intrinsic with 8-byte operand.
> 
> Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
> ---
>  lib/librte_hash/rte_hash_crc.h |   16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
> index cd28833..2c8ec99 100644
> --- a/lib/librte_hash/rte_hash_crc.h
> +++ b/lib/librte_hash/rte_hash_crc.h
> @@ -413,6 +413,22 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
>  }
>  
>  /**
> + * Use single crc32 instruction to perform a hash on a 8 byte value.
> + *
> + * @param data
> + *   Data to perform hash on.
> + * @param init_val
> + *   Value to initialise hash generator.
> + * @return
> + *   32bit calculated hash value.
> + */
> +static inline uint32_t
> +rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
> +{
> +	return crc32c_sse42_u64(data, init_val);
> +}
> +
> +/**
>   * Use crc32 instruction to perform a hash.
>   *
>   * @param data
> -- 
> 1.7.9.5
> 
> 

I'm sorry, it may be early here, so I may be missing something. The assembly
implementations look great, but if a user calls rte_hash_crc_8byte on a system
that doesn't support SSE4.2, how do they wind up getting into the software crc
implementation given what you have above?
Neil

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v5 4/7] hash: add rte_hash_crc_8byte function
  2014-11-21 11:22     ` Neil Horman
@ 2014-11-21 11:26       ` Yerden Zhumabekov
  0 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-21 11:26 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev


21.11.2014 17:22, Neil Horman wrote:
> On Thu, Nov 20, 2014 at 11:16:34AM +0600, Yerden Zhumabekov wrote:
>> SSE4.2 provides CRC32 intrinsic with 8-byte operand.
>>
>> Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
>> ---
>>  lib/librte_hash/rte_hash_crc.h |   16 ++++++++++++++++
>>  1 file changed, 16 insertions(+)
>>
>> diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
>> index cd28833..2c8ec99 100644
>> --- a/lib/librte_hash/rte_hash_crc.h
>> +++ b/lib/librte_hash/rte_hash_crc.h
>> @@ -413,6 +413,22 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
>>  }
>>  
>>  /**
>> + * Use single crc32 instruction to perform a hash on a 8 byte value.
>> + *
>> + * @param data
>> + *   Data to perform hash on.
>> + * @param init_val
>> + *   Value to initialise hash generator.
>> + * @return
>> + *   32bit calculated hash value.
>> + */
>> +static inline uint32_t
>> +rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
>> +{
>> +	return crc32c_sse42_u64(data, init_val);
>> +}
>> +
>> +/**
>>   * Use crc32 instruction to perform a hash.
>>   *
>>   * @param data
>> -- 
>> 1.7.9.5
>>
>>
> I'm sorry, it may be early here, so I may be missing something. The assembly
> implementations look great, but if a user calls rte_hash_crc_8byte on a system
> that doesn't support SSE4.2, how do they wind up getting into the software crc
> implementation given what you have above?
> Neil

After applying patch 4 of 7 there is no fallback. The fallback to the SW
crc32 algorithm is added in patch 5/7.

Moreover, after patch 5/7 there is a run-time check for whether the platform
supports 64-bit; if it does not, 64-bit operand support is mimicked using two
32-bit function calls.
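
The equivalence behind that mimicking can be checked in software. The sketch below is only an illustration under stated assumptions: the *_sw functions are hypothetical bitwise models of the crc32l/crc32q instructions (reflected CRC32C polynomial), not the inline assembly from the series. Folding the low 32 bits first and the high 32 bits second gives the same result as one 64-bit step.

```c
#include <stdint.h>

/* Bitwise model of a CRC32C step on a 32-bit operand. */
static uint32_t
crc32c_u32_sw(uint32_t data, uint32_t crc)
{
	int i;

	crc ^= data;
	for (i = 0; i < 32; i++)
		crc = (crc & 1) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
	return crc;
}

/* Bitwise model of a CRC32C step on a 64-bit operand. */
static uint32_t
crc32c_u64_sw(uint64_t data, uint32_t crc)
{
	uint64_t c = crc ^ data;
	int i;

	for (i = 0; i < 64; i++)
		c = (c & 1) ? (c >> 1) ^ 0x82F63B78u : c >> 1;
	return (uint32_t)c;
}

/* 64-bit step expressed as two 32-bit steps: low half, then high half. */
static uint32_t
crc32c_u64_mimic_sw(uint64_t data, uint32_t crc)
{
	crc = crc32c_u32_sw((uint32_t)data, crc);
	return crc32c_u32_sw((uint32_t)(data >> 32), crc);
}
```

The order of the halves matters: swapping them would hash the bytes in a different order and in general give a different value.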

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/4] rte_hash_crc reworked to be platform-independent
  2014-11-17 11:54     ` Yerden Zhumabekov
@ 2014-11-25 17:05       ` Stephen Hemminger
  0 siblings, 0 replies; 98+ messages in thread
From: Stephen Hemminger @ 2014-11-25 17:05 UTC (permalink / raw)
  To: Yerden Zhumabekov; +Cc: dev

I found that other hash functions are faster than the crc32 SSE instruction.

Though the hardware instruction seems like it would be faster, it takes more
cycles than simple multiplicative or murmur hash.
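
For readers unfamiliar with the term, a multiplicative hash of the kind mentioned above can be as small as the following sketch. The message does not say which variant or constant was measured; the function name mult_hash32() and the golden-ratio constant 2654435769 (Fibonacci hashing) are assumptions chosen for illustration, and no performance claim is implied.

```c
#include <stdint.h>

/*
 * Multiplicative (Fibonacci) hash of a 32-bit key into `bits` bits.
 * bits must be in the range 1..32. The constant is 2^32 divided by
 * the golden ratio, rounded to an odd integer.
 */
static uint32_t
mult_hash32(uint32_t key, unsigned bits)
{
	return (uint32_t)(key * 2654435769u) >> (32 - bits);
}
```

A table with 2^bits buckets would use mult_hash32(key, bits) directly as the bucket index.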

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/5] hash: add software CRC32 implementation
  2014-11-18  3:21     ` [dpdk-dev] [PATCH v3 1/5] hash: add software CRC32 implementation Yerden Zhumabekov
@ 2014-11-25 17:34       ` Stephen Hemminger
  0 siblings, 0 replies; 98+ messages in thread
From: Stephen Hemminger @ 2014-11-25 17:34 UTC (permalink / raw)
  To: Yerden Zhumabekov; +Cc: dev

On Tue, 18 Nov 2014 09:21:54 +0600
Yerden Zhumabekov <e_zhumabekov@sts.kz> wrote:

> +/* Lookup tables for software implementation of CRC32C */
> +static uint32_t crc32c_tables[8][256] = {{
> + 0x00000000, 0xF26B8303, 0xE13B70F7, 0x1350F3F4,

Table should be declared const
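
As a cross-check, the table entries quoted above can be reproduced at run time. The sketch assumes the tables follow the usual slicing-by-8 construction for the reflected CRC32C polynomial; the thread does not state how they were generated, and tables[] and crc32c_tables_init() are hypothetical names. The generated table is deliberately not const here, since it is filled in by code rather than by an initializer.

```c
#include <stdint.h>

static uint32_t tables[8][256];

/* Fill the eight 256-entry tables used by a slicing-by-8 CRC32C. */
static void
crc32c_tables_init(void)
{
	unsigned i, k;
	int bit;

	/* Table 0: CRC of each single byte value, one bit at a time. */
	for (i = 0; i < 256; i++) {
		uint32_t c = i;

		for (bit = 0; bit < 8; bit++)
			c = (c & 1) ? (c >> 1) ^ 0x82F63B78u : c >> 1;
		tables[0][i] = c;
	}

	/* Table k: entry of table k-1 advanced by one more zero byte. */
	for (k = 1; k < 8; k++) {
		for (i = 0; i < 256; i++) {
			uint32_t prev = tables[k - 1][i];

			tables[k][i] = (prev >> 8) ^ tables[0][prev & 0xFF];
		}
	}
}
```

After crc32c_tables_init(), the first four entries of tables[0] match the values in the quoted line.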

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v5 0/7] rte_hash_crc reworked to be platform-independent
  2014-11-20  5:15 ` [dpdk-dev] [PATCH v5 0/7] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
                     ` (6 preceding siblings ...)
  2014-11-20  5:17   ` [dpdk-dev] [PATCH v5 5/7] hash: add fallback to software CRC32 implementation Yerden Zhumabekov
@ 2014-11-27 21:04   ` Thomas Monjalon
  2014-11-28  3:28     ` Yerden Zhumabekov
  7 siblings, 1 reply; 98+ messages in thread
From: Thomas Monjalon @ 2014-11-27 21:04 UTC (permalink / raw)
  To: Yerden Zhumabekov; +Cc: dev

2014-11-20 11:15, Yerden Zhumabekov:
> These patches bring a fallback mechanism to ensure that CRC32 hash is calculated regardless of hardware support from CPU (i.e. SSE4.2 intrinsics).
> Performance is also improved by slicing data in 8 bytes.
> 
> Patches were tested on machines either with and without SSE4.2 support.
> 
> Software implementation seems to be about 4-5 times slower than SSE4.2-enabled one. Of course, they return identical results.
> 
> Summary of changes:
> * added CRC32 software implementation, which is used as a fallback in case SSE4.2 is not available, or if SSE4.2 is intentionally disabled.
> * added rte_hash_crc_set_alg() function to control availability of SSE4.2.
> * added rte_hash_crc_8byte() function to calculate CRC32 on 8-byte operand.
> * reworked rte_hash_crc() function which leverages both versions of CRC32 hash calculation functions with 4 and 8-byte operands.
> * removed compile-time checks from test_hash_perf and test_hash.
> * setting default algorithm implementation as a constructor while application startup.
> * SSE4.2 intrinsics are implemented through inline assembly code.
> * added additional run-time check for 64-bit support.

So you don't want to use the target attribute as suggested by Konstantin?

Why did the discussion end without any acknowledgement?

-- 
Thomas

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [dpdk-dev] [PATCH v5 0/7] rte_hash_crc reworked to be platform-independent
  2014-11-27 21:04   ` [dpdk-dev] [PATCH v5 0/7] rte_hash_crc reworked to be platform-independent Thomas Monjalon
@ 2014-11-28  3:28     ` Yerden Zhumabekov
  0 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2014-11-28  3:28 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev


28.11.2014 3:04, Thomas Monjalon wrote:
> 2014-11-20 11:15, Yerden Zhumabekov:
>> These patches bring a fallback mechanism to ensure that CRC32 hash is calculated regardless of hardware support from CPU (i.e. SSE4.2 intrinsics).
>> Performance is also improved by slicing data in 8 bytes.
>>
>> Patches were tested on machines either with and without SSE4.2 support.
>>
>> Software implementation seems to be about 4-5 times slower than SSE4.2-enabled one. Of course, they return identical results.
>>
>> Summary of changes:
>> * added CRC32 software implementation, which is used as a fallback in case SSE4.2 is not available, or if SSE4.2 is intentionally disabled.
>> * added rte_hash_crc_set_alg() function to control availability of SSE4.2.
>> * added rte_hash_crc_8byte() function to calculate CRC32 on 8-byte operand.
>> * reworked rte_hash_crc() function which leverages both versions of CRC32 hash calculation functions with 4 and 8-byte operands.
>> * removed compile-time checks from test_hash_perf and test_hash.
>> * setting default algorithm implementation as a constructor while application startup.
>> * SSE4.2 intrinsics are implemented through inline assembly code.
>> * added additional run-time check for 64-bit support.
> So you don't want to use the target attribute as suggested by Konstantin?
>
> Why did the discussion end without any acknowledgement?
>

I decided to emit the SSE4.2 instructions directly from the code, because:
* it is supported by gcc 4.3;
* use of the target attribute (in the way suggested by Konstantin) presumably
still requires us to use #ifdef, which we want to avoid.

Admittedly, I didn't investigate it further after that. I'm quite happy with
the last revision, but I'm open to ideas and discussion.
I have made a new patch series whose only change is declaring the crc32c
tables 'const', just as Stephen suggested, and I may post it. But I'd like
to see confirmation of what I've done so far.

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [dpdk-dev] [PATCH v6 0/7] rte_hash_crc reworked to be platform-independent
  2014-09-03  6:05 [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call Yerden Zhumabekov
                   ` (9 preceding siblings ...)
  2014-11-20  5:15 ` [dpdk-dev] [PATCH v5 0/7] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
@ 2015-01-29  8:48 ` Yerden Zhumabekov
  2015-01-29  8:48   ` [dpdk-dev] [PATCH v6 1/7] hash: add software CRC32 implementation Yerden Zhumabekov
                     ` (7 more replies)
  10 siblings, 8 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2015-01-29  8:48 UTC (permalink / raw)
  To: dev

This is a rework of my previous patches improving performance of rte_hash_crc.

Summary of changes:
* software implementation of CRC32 introduced;
* at run time, the algorithm can fall back to the software version if the CPU doesn't support SSE4.2;
* best available algorithm is automatically detected upon application startup;
* redundant compile checks removed from test utilities;
* assembly code for emitting SSE4.2 instructions is used instead of built-in intrinsics;
* rte_hash_crc() function performance significantly improved.

v6 changes:
* added 'const' qualifier to crc32c lookup tables declaration.

v5 changes:
* gave up gcc's built-in SSE4.2 intrinsics;
* added assembly code for emitting SSE4.2 instructions.

v4 changes:
* icc-specific compile checks removed.

v3 changes:
* the default algorithm implementation is set in a constructor at application startup;
* crc32 software implementation improved;
* removed compile-time checks from test_hash_perf and test_hash.

v2 changes:
* added CRC32 software implementation;
* added rte_hash_crc_set_alg() function to control availability of SSE4.2;
* added fallback to sw crc32 in case SSE4.2 is not available, or if SSE4.2 is intentionally disabled.

Initial version (v1) changes:
* added rte_hash_crc_8byte() function to calculate CRC32 on 8-byte operand;
* reworked rte_hash_crc() function which leverages both versions of CRC32 hash calculation functions with 4 and 8-byte operands.


Yerden Zhumabekov (7):
  hash: add software CRC32 implementation
  hash: add assembly implementation of CRC32 intrinsics
  hash: replace built-in functions implementing SSE4.2
  hash: add rte_hash_crc_8byte function
  hash: add fallback to software CRC32 implementation
  hash: rte_hash_crc() slices data into 8-byte pieces
  test: remove redundant compile checks

 app/test/test_hash.c           |    7 -
 app/test/test_hash_perf.c      |   11 -
 lib/librte_hash/rte_hash_crc.h |  459 +++++++++++++++++++++++++++++++++++++++-
 3 files changed, 448 insertions(+), 29 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [dpdk-dev] [PATCH v6 1/7] hash: add software CRC32 implementation
  2015-01-29  8:48 ` [dpdk-dev] [PATCH v6 " Yerden Zhumabekov
@ 2015-01-29  8:48   ` Yerden Zhumabekov
  2015-01-29  8:48   ` [dpdk-dev] [PATCH v6 2/7] hash: add assembly implementation of CRC32 intrinsics Yerden Zhumabekov
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2015-01-29  8:48 UTC (permalink / raw)
  To: dev

Add lookup tables for the CRC32 algorithm, and the crc32c_1word() and
crc32c_2words() functions returning the hash of a 32-bit and a 64-bit
operand, respectively.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |  316 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 316 insertions(+)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index b48b0db..4da7ca4 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -47,6 +47,322 @@ extern "C" {
 #include <stdint.h>
 #include <nmmintrin.h>
 
+/* Lookup tables for software implementation of CRC32C */
+static const uint32_t crc32c_tables[8][256] = {{
+ 0x00000000, 0xF26B8303, 0xE13B70F7, 0x1350F3F4, 0xC79A971F, 0x35F1141C, 0x26A1E7E8, 0xD4CA64EB,
+ 0x8AD958CF, 0x78B2DBCC, 0x6BE22838, 0x9989AB3B, 0x4D43CFD0, 0xBF284CD3, 0xAC78BF27, 0x5E133C24,
+ 0x105EC76F, 0xE235446C, 0xF165B798, 0x030E349B, 0xD7C45070, 0x25AFD373, 0x36FF2087, 0xC494A384,
+ 0x9A879FA0, 0x68EC1CA3, 0x7BBCEF57, 0x89D76C54, 0x5D1D08BF, 0xAF768BBC, 0xBC267848, 0x4E4DFB4B,
+ 0x20BD8EDE, 0xD2D60DDD, 0xC186FE29, 0x33ED7D2A, 0xE72719C1, 0x154C9AC2, 0x061C6936, 0xF477EA35,
+ 0xAA64D611, 0x580F5512, 0x4B5FA6E6, 0xB93425E5, 0x6DFE410E, 0x9F95C20D, 0x8CC531F9, 0x7EAEB2FA,
+ 0x30E349B1, 0xC288CAB2, 0xD1D83946, 0x23B3BA45, 0xF779DEAE, 0x05125DAD, 0x1642AE59, 0xE4292D5A,
+ 0xBA3A117E, 0x4851927D, 0x5B016189, 0xA96AE28A, 0x7DA08661, 0x8FCB0562, 0x9C9BF696, 0x6EF07595,
+ 0x417B1DBC, 0xB3109EBF, 0xA0406D4B, 0x522BEE48, 0x86E18AA3, 0x748A09A0, 0x67DAFA54, 0x95B17957,
+ 0xCBA24573, 0x39C9C670, 0x2A993584, 0xD8F2B687, 0x0C38D26C, 0xFE53516F, 0xED03A29B, 0x1F682198,
+ 0x5125DAD3, 0xA34E59D0, 0xB01EAA24, 0x42752927, 0x96BF4DCC, 0x64D4CECF, 0x77843D3B, 0x85EFBE38,
+ 0xDBFC821C, 0x2997011F, 0x3AC7F2EB, 0xC8AC71E8, 0x1C661503, 0xEE0D9600, 0xFD5D65F4, 0x0F36E6F7,
+ 0x61C69362, 0x93AD1061, 0x80FDE395, 0x72966096, 0xA65C047D, 0x5437877E, 0x4767748A, 0xB50CF789,
+ 0xEB1FCBAD, 0x197448AE, 0x0A24BB5A, 0xF84F3859, 0x2C855CB2, 0xDEEEDFB1, 0xCDBE2C45, 0x3FD5AF46,
+ 0x7198540D, 0x83F3D70E, 0x90A324FA, 0x62C8A7F9, 0xB602C312, 0x44694011, 0x5739B3E5, 0xA55230E6,
+ 0xFB410CC2, 0x092A8FC1, 0x1A7A7C35, 0xE811FF36, 0x3CDB9BDD, 0xCEB018DE, 0xDDE0EB2A, 0x2F8B6829,
+ 0x82F63B78, 0x709DB87B, 0x63CD4B8F, 0x91A6C88C, 0x456CAC67, 0xB7072F64, 0xA457DC90, 0x563C5F93,
+ 0x082F63B7, 0xFA44E0B4, 0xE9141340, 0x1B7F9043, 0xCFB5F4A8, 0x3DDE77AB, 0x2E8E845F, 0xDCE5075C,
+ 0x92A8FC17, 0x60C37F14, 0x73938CE0, 0x81F80FE3, 0x55326B08, 0xA759E80B, 0xB4091BFF, 0x466298FC,
+ 0x1871A4D8, 0xEA1A27DB, 0xF94AD42F, 0x0B21572C, 0xDFEB33C7, 0x2D80B0C4, 0x3ED04330, 0xCCBBC033,
+ 0xA24BB5A6, 0x502036A5, 0x4370C551, 0xB11B4652, 0x65D122B9, 0x97BAA1BA, 0x84EA524E, 0x7681D14D,
+ 0x2892ED69, 0xDAF96E6A, 0xC9A99D9E, 0x3BC21E9D, 0xEF087A76, 0x1D63F975, 0x0E330A81, 0xFC588982,
+ 0xB21572C9, 0x407EF1CA, 0x532E023E, 0xA145813D, 0x758FE5D6, 0x87E466D5, 0x94B49521, 0x66DF1622,
+ 0x38CC2A06, 0xCAA7A905, 0xD9F75AF1, 0x2B9CD9F2, 0xFF56BD19, 0x0D3D3E1A, 0x1E6DCDEE, 0xEC064EED,
+ 0xC38D26C4, 0x31E6A5C7, 0x22B65633, 0xD0DDD530, 0x0417B1DB, 0xF67C32D8, 0xE52CC12C, 0x1747422F,
+ 0x49547E0B, 0xBB3FFD08, 0xA86F0EFC, 0x5A048DFF, 0x8ECEE914, 0x7CA56A17, 0x6FF599E3, 0x9D9E1AE0,
+ 0xD3D3E1AB, 0x21B862A8, 0x32E8915C, 0xC083125F, 0x144976B4, 0xE622F5B7, 0xF5720643, 0x07198540,
+ 0x590AB964, 0xAB613A67, 0xB831C993, 0x4A5A4A90, 0x9E902E7B, 0x6CFBAD78, 0x7FAB5E8C, 0x8DC0DD8F,
+ 0xE330A81A, 0x115B2B19, 0x020BD8ED, 0xF0605BEE, 0x24AA3F05, 0xD6C1BC06, 0xC5914FF2, 0x37FACCF1,
+ 0x69E9F0D5, 0x9B8273D6, 0x88D28022, 0x7AB90321, 0xAE7367CA, 0x5C18E4C9, 0x4F48173D, 0xBD23943E,
+ 0xF36E6F75, 0x0105EC76, 0x12551F82, 0xE03E9C81, 0x34F4F86A, 0xC69F7B69, 0xD5CF889D, 0x27A40B9E,
+ 0x79B737BA, 0x8BDCB4B9, 0x988C474D, 0x6AE7C44E, 0xBE2DA0A5, 0x4C4623A6, 0x5F16D052, 0xAD7D5351
+},
+{
+ 0x00000000, 0x13A29877, 0x274530EE, 0x34E7A899, 0x4E8A61DC, 0x5D28F9AB, 0x69CF5132, 0x7A6DC945,
+ 0x9D14C3B8, 0x8EB65BCF, 0xBA51F356, 0xA9F36B21, 0xD39EA264, 0xC03C3A13, 0xF4DB928A, 0xE7790AFD,
+ 0x3FC5F181, 0x2C6769F6, 0x1880C16F, 0x0B225918, 0x714F905D, 0x62ED082A, 0x560AA0B3, 0x45A838C4,
+ 0xA2D13239, 0xB173AA4E, 0x859402D7, 0x96369AA0, 0xEC5B53E5, 0xFFF9CB92, 0xCB1E630B, 0xD8BCFB7C,
+ 0x7F8BE302, 0x6C297B75, 0x58CED3EC, 0x4B6C4B9B, 0x310182DE, 0x22A31AA9, 0x1644B230, 0x05E62A47,
+ 0xE29F20BA, 0xF13DB8CD, 0xC5DA1054, 0xD6788823, 0xAC154166, 0xBFB7D911, 0x8B507188, 0x98F2E9FF,
+ 0x404E1283, 0x53EC8AF4, 0x670B226D, 0x74A9BA1A, 0x0EC4735F, 0x1D66EB28, 0x298143B1, 0x3A23DBC6,
+ 0xDD5AD13B, 0xCEF8494C, 0xFA1FE1D5, 0xE9BD79A2, 0x93D0B0E7, 0x80722890, 0xB4958009, 0xA737187E,
+ 0xFF17C604, 0xECB55E73, 0xD852F6EA, 0xCBF06E9D, 0xB19DA7D8, 0xA23F3FAF, 0x96D89736, 0x857A0F41,
+ 0x620305BC, 0x71A19DCB, 0x45463552, 0x56E4AD25, 0x2C896460, 0x3F2BFC17, 0x0BCC548E, 0x186ECCF9,
+ 0xC0D23785, 0xD370AFF2, 0xE797076B, 0xF4359F1C, 0x8E585659, 0x9DFACE2E, 0xA91D66B7, 0xBABFFEC0,
+ 0x5DC6F43D, 0x4E646C4A, 0x7A83C4D3, 0x69215CA4, 0x134C95E1, 0x00EE0D96, 0x3409A50F, 0x27AB3D78,
+ 0x809C2506, 0x933EBD71, 0xA7D915E8, 0xB47B8D9F, 0xCE1644DA, 0xDDB4DCAD, 0xE9537434, 0xFAF1EC43,
+ 0x1D88E6BE, 0x0E2A7EC9, 0x3ACDD650, 0x296F4E27, 0x53028762, 0x40A01F15, 0x7447B78C, 0x67E52FFB,
+ 0xBF59D487, 0xACFB4CF0, 0x981CE469, 0x8BBE7C1E, 0xF1D3B55B, 0xE2712D2C, 0xD69685B5, 0xC5341DC2,
+ 0x224D173F, 0x31EF8F48, 0x050827D1, 0x16AABFA6, 0x6CC776E3, 0x7F65EE94, 0x4B82460D, 0x5820DE7A,
+ 0xFBC3FAF9, 0xE861628E, 0xDC86CA17, 0xCF245260, 0xB5499B25, 0xA6EB0352, 0x920CABCB, 0x81AE33BC,
+ 0x66D73941, 0x7575A136, 0x419209AF, 0x523091D8, 0x285D589D, 0x3BFFC0EA, 0x0F186873, 0x1CBAF004,
+ 0xC4060B78, 0xD7A4930F, 0xE3433B96, 0xF0E1A3E1, 0x8A8C6AA4, 0x992EF2D3, 0xADC95A4A, 0xBE6BC23D,
+ 0x5912C8C0, 0x4AB050B7, 0x7E57F82E, 0x6DF56059, 0x1798A91C, 0x043A316B, 0x30DD99F2, 0x237F0185,
+ 0x844819FB, 0x97EA818C, 0xA30D2915, 0xB0AFB162, 0xCAC27827, 0xD960E050, 0xED8748C9, 0xFE25D0BE,
+ 0x195CDA43, 0x0AFE4234, 0x3E19EAAD, 0x2DBB72DA, 0x57D6BB9F, 0x447423E8, 0x70938B71, 0x63311306,
+ 0xBB8DE87A, 0xA82F700D, 0x9CC8D894, 0x8F6A40E3, 0xF50789A6, 0xE6A511D1, 0xD242B948, 0xC1E0213F,
+ 0x26992BC2, 0x353BB3B5, 0x01DC1B2C, 0x127E835B, 0x68134A1E, 0x7BB1D269, 0x4F567AF0, 0x5CF4E287,
+ 0x04D43CFD, 0x1776A48A, 0x23910C13, 0x30339464, 0x4A5E5D21, 0x59FCC556, 0x6D1B6DCF, 0x7EB9F5B8,
+ 0x99C0FF45, 0x8A626732, 0xBE85CFAB, 0xAD2757DC, 0xD74A9E99, 0xC4E806EE, 0xF00FAE77, 0xE3AD3600,
+ 0x3B11CD7C, 0x28B3550B, 0x1C54FD92, 0x0FF665E5, 0x759BACA0, 0x663934D7, 0x52DE9C4E, 0x417C0439,
+ 0xA6050EC4, 0xB5A796B3, 0x81403E2A, 0x92E2A65D, 0xE88F6F18, 0xFB2DF76F, 0xCFCA5FF6, 0xDC68C781,
+ 0x7B5FDFFF, 0x68FD4788, 0x5C1AEF11, 0x4FB87766, 0x35D5BE23, 0x26772654, 0x12908ECD, 0x013216BA,
+ 0xE64B1C47, 0xF5E98430, 0xC10E2CA9, 0xD2ACB4DE, 0xA8C17D9B, 0xBB63E5EC, 0x8F844D75, 0x9C26D502,
+ 0x449A2E7E, 0x5738B609, 0x63DF1E90, 0x707D86E7, 0x0A104FA2, 0x19B2D7D5, 0x2D557F4C, 0x3EF7E73B,
+ 0xD98EEDC6, 0xCA2C75B1, 0xFECBDD28, 0xED69455F, 0x97048C1A, 0x84A6146D, 0xB041BCF4, 0xA3E32483
+},
+{
+ 0x00000000, 0xA541927E, 0x4F6F520D, 0xEA2EC073, 0x9EDEA41A, 0x3B9F3664, 0xD1B1F617, 0x74F06469,
+ 0x38513EC5, 0x9D10ACBB, 0x773E6CC8, 0xD27FFEB6, 0xA68F9ADF, 0x03CE08A1, 0xE9E0C8D2, 0x4CA15AAC,
+ 0x70A27D8A, 0xD5E3EFF4, 0x3FCD2F87, 0x9A8CBDF9, 0xEE7CD990, 0x4B3D4BEE, 0xA1138B9D, 0x045219E3,
+ 0x48F3434F, 0xEDB2D131, 0x079C1142, 0xA2DD833C, 0xD62DE755, 0x736C752B, 0x9942B558, 0x3C032726,
+ 0xE144FB14, 0x4405696A, 0xAE2BA919, 0x0B6A3B67, 0x7F9A5F0E, 0xDADBCD70, 0x30F50D03, 0x95B49F7D,
+ 0xD915C5D1, 0x7C5457AF, 0x967A97DC, 0x333B05A2, 0x47CB61CB, 0xE28AF3B5, 0x08A433C6, 0xADE5A1B8,
+ 0x91E6869E, 0x34A714E0, 0xDE89D493, 0x7BC846ED, 0x0F382284, 0xAA79B0FA, 0x40577089, 0xE516E2F7,
+ 0xA9B7B85B, 0x0CF62A25, 0xE6D8EA56, 0x43997828, 0x37691C41, 0x92288E3F, 0x78064E4C, 0xDD47DC32,
+ 0xC76580D9, 0x622412A7, 0x880AD2D4, 0x2D4B40AA, 0x59BB24C3, 0xFCFAB6BD, 0x16D476CE, 0xB395E4B0,
+ 0xFF34BE1C, 0x5A752C62, 0xB05BEC11, 0x151A7E6F, 0x61EA1A06, 0xC4AB8878, 0x2E85480B, 0x8BC4DA75,
+ 0xB7C7FD53, 0x12866F2D, 0xF8A8AF5E, 0x5DE93D20, 0x29195949, 0x8C58CB37, 0x66760B44, 0xC337993A,
+ 0x8F96C396, 0x2AD751E8, 0xC0F9919B, 0x65B803E5, 0x1148678C, 0xB409F5F2, 0x5E273581, 0xFB66A7FF,
+ 0x26217BCD, 0x8360E9B3, 0x694E29C0, 0xCC0FBBBE, 0xB8FFDFD7, 0x1DBE4DA9, 0xF7908DDA, 0x52D11FA4,
+ 0x1E704508, 0xBB31D776, 0x511F1705, 0xF45E857B, 0x80AEE112, 0x25EF736C, 0xCFC1B31F, 0x6A802161,
+ 0x56830647, 0xF3C29439, 0x19EC544A, 0xBCADC634, 0xC85DA25D, 0x6D1C3023, 0x8732F050, 0x2273622E,
+ 0x6ED23882, 0xCB93AAFC, 0x21BD6A8F, 0x84FCF8F1, 0xF00C9C98, 0x554D0EE6, 0xBF63CE95, 0x1A225CEB,
+ 0x8B277743, 0x2E66E53D, 0xC448254E, 0x6109B730, 0x15F9D359, 0xB0B84127, 0x5A968154, 0xFFD7132A,
+ 0xB3764986, 0x1637DBF8, 0xFC191B8B, 0x595889F5, 0x2DA8ED9C, 0x88E97FE2, 0x62C7BF91, 0xC7862DEF,
+ 0xFB850AC9, 0x5EC498B7, 0xB4EA58C4, 0x11ABCABA, 0x655BAED3, 0xC01A3CAD, 0x2A34FCDE, 0x8F756EA0,
+ 0xC3D4340C, 0x6695A672, 0x8CBB6601, 0x29FAF47F, 0x5D0A9016, 0xF84B0268, 0x1265C21B, 0xB7245065,
+ 0x6A638C57, 0xCF221E29, 0x250CDE5A, 0x804D4C24, 0xF4BD284D, 0x51FCBA33, 0xBBD27A40, 0x1E93E83E,
+ 0x5232B292, 0xF77320EC, 0x1D5DE09F, 0xB81C72E1, 0xCCEC1688, 0x69AD84F6, 0x83834485, 0x26C2D6FB,
+ 0x1AC1F1DD, 0xBF8063A3, 0x55AEA3D0, 0xF0EF31AE, 0x841F55C7, 0x215EC7B9, 0xCB7007CA, 0x6E3195B4,
+ 0x2290CF18, 0x87D15D66, 0x6DFF9D15, 0xC8BE0F6B, 0xBC4E6B02, 0x190FF97C, 0xF321390F, 0x5660AB71,
+ 0x4C42F79A, 0xE90365E4, 0x032DA597, 0xA66C37E9, 0xD29C5380, 0x77DDC1FE, 0x9DF3018D, 0x38B293F3,
+ 0x7413C95F, 0xD1525B21, 0x3B7C9B52, 0x9E3D092C, 0xEACD6D45, 0x4F8CFF3B, 0xA5A23F48, 0x00E3AD36,
+ 0x3CE08A10, 0x99A1186E, 0x738FD81D, 0xD6CE4A63, 0xA23E2E0A, 0x077FBC74, 0xED517C07, 0x4810EE79,
+ 0x04B1B4D5, 0xA1F026AB, 0x4BDEE6D8, 0xEE9F74A6, 0x9A6F10CF, 0x3F2E82B1, 0xD50042C2, 0x7041D0BC,
+ 0xAD060C8E, 0x08479EF0, 0xE2695E83, 0x4728CCFD, 0x33D8A894, 0x96993AEA, 0x7CB7FA99, 0xD9F668E7,
+ 0x9557324B, 0x3016A035, 0xDA386046, 0x7F79F238, 0x0B899651, 0xAEC8042F, 0x44E6C45C, 0xE1A75622,
+ 0xDDA47104, 0x78E5E37A, 0x92CB2309, 0x378AB177, 0x437AD51E, 0xE63B4760, 0x0C158713, 0xA954156D,
+ 0xE5F54FC1, 0x40B4DDBF, 0xAA9A1DCC, 0x0FDB8FB2, 0x7B2BEBDB, 0xDE6A79A5, 0x3444B9D6, 0x91052BA8
+},
+{
+ 0x00000000, 0xDD45AAB8, 0xBF672381, 0x62228939, 0x7B2231F3, 0xA6679B4B, 0xC4451272, 0x1900B8CA,
+ 0xF64463E6, 0x2B01C95E, 0x49234067, 0x9466EADF, 0x8D665215, 0x5023F8AD, 0x32017194, 0xEF44DB2C,
+ 0xE964B13D, 0x34211B85, 0x560392BC, 0x8B463804, 0x924680CE, 0x4F032A76, 0x2D21A34F, 0xF06409F7,
+ 0x1F20D2DB, 0xC2657863, 0xA047F15A, 0x7D025BE2, 0x6402E328, 0xB9474990, 0xDB65C0A9, 0x06206A11,
+ 0xD725148B, 0x0A60BE33, 0x6842370A, 0xB5079DB2, 0xAC072578, 0x71428FC0, 0x136006F9, 0xCE25AC41,
+ 0x2161776D, 0xFC24DDD5, 0x9E0654EC, 0x4343FE54, 0x5A43469E, 0x8706EC26, 0xE524651F, 0x3861CFA7,
+ 0x3E41A5B6, 0xE3040F0E, 0x81268637, 0x5C632C8F, 0x45639445, 0x98263EFD, 0xFA04B7C4, 0x27411D7C,
+ 0xC805C650, 0x15406CE8, 0x7762E5D1, 0xAA274F69, 0xB327F7A3, 0x6E625D1B, 0x0C40D422, 0xD1057E9A,
+ 0xABA65FE7, 0x76E3F55F, 0x14C17C66, 0xC984D6DE, 0xD0846E14, 0x0DC1C4AC, 0x6FE34D95, 0xB2A6E72D,
+ 0x5DE23C01, 0x80A796B9, 0xE2851F80, 0x3FC0B538, 0x26C00DF2, 0xFB85A74A, 0x99A72E73, 0x44E284CB,
+ 0x42C2EEDA, 0x9F874462, 0xFDA5CD5B, 0x20E067E3, 0x39E0DF29, 0xE4A57591, 0x8687FCA8, 0x5BC25610,
+ 0xB4868D3C, 0x69C32784, 0x0BE1AEBD, 0xD6A40405, 0xCFA4BCCF, 0x12E11677, 0x70C39F4E, 0xAD8635F6,
+ 0x7C834B6C, 0xA1C6E1D4, 0xC3E468ED, 0x1EA1C255, 0x07A17A9F, 0xDAE4D027, 0xB8C6591E, 0x6583F3A6,
+ 0x8AC7288A, 0x57828232, 0x35A00B0B, 0xE8E5A1B3, 0xF1E51979, 0x2CA0B3C1, 0x4E823AF8, 0x93C79040,
+ 0x95E7FA51, 0x48A250E9, 0x2A80D9D0, 0xF7C57368, 0xEEC5CBA2, 0x3380611A, 0x51A2E823, 0x8CE7429B,
+ 0x63A399B7, 0xBEE6330F, 0xDCC4BA36, 0x0181108E, 0x1881A844, 0xC5C402FC, 0xA7E68BC5, 0x7AA3217D,
+ 0x52A0C93F, 0x8FE56387, 0xEDC7EABE, 0x30824006, 0x2982F8CC, 0xF4C75274, 0x96E5DB4D, 0x4BA071F5,
+ 0xA4E4AAD9, 0x79A10061, 0x1B838958, 0xC6C623E0, 0xDFC69B2A, 0x02833192, 0x60A1B8AB, 0xBDE41213,
+ 0xBBC47802, 0x6681D2BA, 0x04A35B83, 0xD9E6F13B, 0xC0E649F1, 0x1DA3E349, 0x7F816A70, 0xA2C4C0C8,
+ 0x4D801BE4, 0x90C5B15C, 0xF2E73865, 0x2FA292DD, 0x36A22A17, 0xEBE780AF, 0x89C50996, 0x5480A32E,
+ 0x8585DDB4, 0x58C0770C, 0x3AE2FE35, 0xE7A7548D, 0xFEA7EC47, 0x23E246FF, 0x41C0CFC6, 0x9C85657E,
+ 0x73C1BE52, 0xAE8414EA, 0xCCA69DD3, 0x11E3376B, 0x08E38FA1, 0xD5A62519, 0xB784AC20, 0x6AC10698,
+ 0x6CE16C89, 0xB1A4C631, 0xD3864F08, 0x0EC3E5B0, 0x17C35D7A, 0xCA86F7C2, 0xA8A47EFB, 0x75E1D443,
+ 0x9AA50F6F, 0x47E0A5D7, 0x25C22CEE, 0xF8878656, 0xE1873E9C, 0x3CC29424, 0x5EE01D1D, 0x83A5B7A5,
+ 0xF90696D8, 0x24433C60, 0x4661B559, 0x9B241FE1, 0x8224A72B, 0x5F610D93, 0x3D4384AA, 0xE0062E12,
+ 0x0F42F53E, 0xD2075F86, 0xB025D6BF, 0x6D607C07, 0x7460C4CD, 0xA9256E75, 0xCB07E74C, 0x16424DF4,
+ 0x106227E5, 0xCD278D5D, 0xAF050464, 0x7240AEDC, 0x6B401616, 0xB605BCAE, 0xD4273597, 0x09629F2F,
+ 0xE6264403, 0x3B63EEBB, 0x59416782, 0x8404CD3A, 0x9D0475F0, 0x4041DF48, 0x22635671, 0xFF26FCC9,
+ 0x2E238253, 0xF36628EB, 0x9144A1D2, 0x4C010B6A, 0x5501B3A0, 0x88441918, 0xEA669021, 0x37233A99,
+ 0xD867E1B5, 0x05224B0D, 0x6700C234, 0xBA45688C, 0xA345D046, 0x7E007AFE, 0x1C22F3C7, 0xC167597F,
+ 0xC747336E, 0x1A0299D6, 0x782010EF, 0xA565BA57, 0xBC65029D, 0x6120A825, 0x0302211C, 0xDE478BA4,
+ 0x31035088, 0xEC46FA30, 0x8E647309, 0x5321D9B1, 0x4A21617B, 0x9764CBC3, 0xF54642FA, 0x2803E842
+},
+{
+ 0x00000000, 0x38116FAC, 0x7022DF58, 0x4833B0F4, 0xE045BEB0, 0xD854D11C, 0x906761E8, 0xA8760E44,
+ 0xC5670B91, 0xFD76643D, 0xB545D4C9, 0x8D54BB65, 0x2522B521, 0x1D33DA8D, 0x55006A79, 0x6D1105D5,
+ 0x8F2261D3, 0xB7330E7F, 0xFF00BE8B, 0xC711D127, 0x6F67DF63, 0x5776B0CF, 0x1F45003B, 0x27546F97,
+ 0x4A456A42, 0x725405EE, 0x3A67B51A, 0x0276DAB6, 0xAA00D4F2, 0x9211BB5E, 0xDA220BAA, 0xE2336406,
+ 0x1BA8B557, 0x23B9DAFB, 0x6B8A6A0F, 0x539B05A3, 0xFBED0BE7, 0xC3FC644B, 0x8BCFD4BF, 0xB3DEBB13,
+ 0xDECFBEC6, 0xE6DED16A, 0xAEED619E, 0x96FC0E32, 0x3E8A0076, 0x069B6FDA, 0x4EA8DF2E, 0x76B9B082,
+ 0x948AD484, 0xAC9BBB28, 0xE4A80BDC, 0xDCB96470, 0x74CF6A34, 0x4CDE0598, 0x04EDB56C, 0x3CFCDAC0,
+ 0x51EDDF15, 0x69FCB0B9, 0x21CF004D, 0x19DE6FE1, 0xB1A861A5, 0x89B90E09, 0xC18ABEFD, 0xF99BD151,
+ 0x37516AAE, 0x0F400502, 0x4773B5F6, 0x7F62DA5A, 0xD714D41E, 0xEF05BBB2, 0xA7360B46, 0x9F2764EA,
+ 0xF236613F, 0xCA270E93, 0x8214BE67, 0xBA05D1CB, 0x1273DF8F, 0x2A62B023, 0x625100D7, 0x5A406F7B,
+ 0xB8730B7D, 0x806264D1, 0xC851D425, 0xF040BB89, 0x5836B5CD, 0x6027DA61, 0x28146A95, 0x10050539,
+ 0x7D1400EC, 0x45056F40, 0x0D36DFB4, 0x3527B018, 0x9D51BE5C, 0xA540D1F0, 0xED736104, 0xD5620EA8,
+ 0x2CF9DFF9, 0x14E8B055, 0x5CDB00A1, 0x64CA6F0D, 0xCCBC6149, 0xF4AD0EE5, 0xBC9EBE11, 0x848FD1BD,
+ 0xE99ED468, 0xD18FBBC4, 0x99BC0B30, 0xA1AD649C, 0x09DB6AD8, 0x31CA0574, 0x79F9B580, 0x41E8DA2C,
+ 0xA3DBBE2A, 0x9BCAD186, 0xD3F96172, 0xEBE80EDE, 0x439E009A, 0x7B8F6F36, 0x33BCDFC2, 0x0BADB06E,
+ 0x66BCB5BB, 0x5EADDA17, 0x169E6AE3, 0x2E8F054F, 0x86F90B0B, 0xBEE864A7, 0xF6DBD453, 0xCECABBFF,
+ 0x6EA2D55C, 0x56B3BAF0, 0x1E800A04, 0x269165A8, 0x8EE76BEC, 0xB6F60440, 0xFEC5B4B4, 0xC6D4DB18,
+ 0xABC5DECD, 0x93D4B161, 0xDBE70195, 0xE3F66E39, 0x4B80607D, 0x73910FD1, 0x3BA2BF25, 0x03B3D089,
+ 0xE180B48F, 0xD991DB23, 0x91A26BD7, 0xA9B3047B, 0x01C50A3F, 0x39D46593, 0x71E7D567, 0x49F6BACB,
+ 0x24E7BF1E, 0x1CF6D0B2, 0x54C56046, 0x6CD40FEA, 0xC4A201AE, 0xFCB36E02, 0xB480DEF6, 0x8C91B15A,
+ 0x750A600B, 0x4D1B0FA7, 0x0528BF53, 0x3D39D0FF, 0x954FDEBB, 0xAD5EB117, 0xE56D01E3, 0xDD7C6E4F,
+ 0xB06D6B9A, 0x887C0436, 0xC04FB4C2, 0xF85EDB6E, 0x5028D52A, 0x6839BA86, 0x200A0A72, 0x181B65DE,
+ 0xFA2801D8, 0xC2396E74, 0x8A0ADE80, 0xB21BB12C, 0x1A6DBF68, 0x227CD0C4, 0x6A4F6030, 0x525E0F9C,
+ 0x3F4F0A49, 0x075E65E5, 0x4F6DD511, 0x777CBABD, 0xDF0AB4F9, 0xE71BDB55, 0xAF286BA1, 0x9739040D,
+ 0x59F3BFF2, 0x61E2D05E, 0x29D160AA, 0x11C00F06, 0xB9B60142, 0x81A76EEE, 0xC994DE1A, 0xF185B1B6,
+ 0x9C94B463, 0xA485DBCF, 0xECB66B3B, 0xD4A70497, 0x7CD10AD3, 0x44C0657F, 0x0CF3D58B, 0x34E2BA27,
+ 0xD6D1DE21, 0xEEC0B18D, 0xA6F30179, 0x9EE26ED5, 0x36946091, 0x0E850F3D, 0x46B6BFC9, 0x7EA7D065,
+ 0x13B6D5B0, 0x2BA7BA1C, 0x63940AE8, 0x5B856544, 0xF3F36B00, 0xCBE204AC, 0x83D1B458, 0xBBC0DBF4,
+ 0x425B0AA5, 0x7A4A6509, 0x3279D5FD, 0x0A68BA51, 0xA21EB415, 0x9A0FDBB9, 0xD23C6B4D, 0xEA2D04E1,
+ 0x873C0134, 0xBF2D6E98, 0xF71EDE6C, 0xCF0FB1C0, 0x6779BF84, 0x5F68D028, 0x175B60DC, 0x2F4A0F70,
+ 0xCD796B76, 0xF56804DA, 0xBD5BB42E, 0x854ADB82, 0x2D3CD5C6, 0x152DBA6A, 0x5D1E0A9E, 0x650F6532,
+ 0x081E60E7, 0x300F0F4B, 0x783CBFBF, 0x402DD013, 0xE85BDE57, 0xD04AB1FB, 0x9879010F, 0xA0686EA3
+},
+{
+ 0x00000000, 0xEF306B19, 0xDB8CA0C3, 0x34BCCBDA, 0xB2F53777, 0x5DC55C6E, 0x697997B4, 0x8649FCAD,
+ 0x6006181F, 0x8F367306, 0xBB8AB8DC, 0x54BAD3C5, 0xD2F32F68, 0x3DC34471, 0x097F8FAB, 0xE64FE4B2,
+ 0xC00C303E, 0x2F3C5B27, 0x1B8090FD, 0xF4B0FBE4, 0x72F90749, 0x9DC96C50, 0xA975A78A, 0x4645CC93,
+ 0xA00A2821, 0x4F3A4338, 0x7B8688E2, 0x94B6E3FB, 0x12FF1F56, 0xFDCF744F, 0xC973BF95, 0x2643D48C,
+ 0x85F4168D, 0x6AC47D94, 0x5E78B64E, 0xB148DD57, 0x370121FA, 0xD8314AE3, 0xEC8D8139, 0x03BDEA20,
+ 0xE5F20E92, 0x0AC2658B, 0x3E7EAE51, 0xD14EC548, 0x570739E5, 0xB83752FC, 0x8C8B9926, 0x63BBF23F,
+ 0x45F826B3, 0xAAC84DAA, 0x9E748670, 0x7144ED69, 0xF70D11C4, 0x183D7ADD, 0x2C81B107, 0xC3B1DA1E,
+ 0x25FE3EAC, 0xCACE55B5, 0xFE729E6F, 0x1142F576, 0x970B09DB, 0x783B62C2, 0x4C87A918, 0xA3B7C201,
+ 0x0E045BEB, 0xE13430F2, 0xD588FB28, 0x3AB89031, 0xBCF16C9C, 0x53C10785, 0x677DCC5F, 0x884DA746,
+ 0x6E0243F4, 0x813228ED, 0xB58EE337, 0x5ABE882E, 0xDCF77483, 0x33C71F9A, 0x077BD440, 0xE84BBF59,
+ 0xCE086BD5, 0x213800CC, 0x1584CB16, 0xFAB4A00F, 0x7CFD5CA2, 0x93CD37BB, 0xA771FC61, 0x48419778,
+ 0xAE0E73CA, 0x413E18D3, 0x7582D309, 0x9AB2B810, 0x1CFB44BD, 0xF3CB2FA4, 0xC777E47E, 0x28478F67,
+ 0x8BF04D66, 0x64C0267F, 0x507CEDA5, 0xBF4C86BC, 0x39057A11, 0xD6351108, 0xE289DAD2, 0x0DB9B1CB,
+ 0xEBF65579, 0x04C63E60, 0x307AF5BA, 0xDF4A9EA3, 0x5903620E, 0xB6330917, 0x828FC2CD, 0x6DBFA9D4,
+ 0x4BFC7D58, 0xA4CC1641, 0x9070DD9B, 0x7F40B682, 0xF9094A2F, 0x16392136, 0x2285EAEC, 0xCDB581F5,
+ 0x2BFA6547, 0xC4CA0E5E, 0xF076C584, 0x1F46AE9D, 0x990F5230, 0x763F3929, 0x4283F2F3, 0xADB399EA,
+ 0x1C08B7D6, 0xF338DCCF, 0xC7841715, 0x28B47C0C, 0xAEFD80A1, 0x41CDEBB8, 0x75712062, 0x9A414B7B,
+ 0x7C0EAFC9, 0x933EC4D0, 0xA7820F0A, 0x48B26413, 0xCEFB98BE, 0x21CBF3A7, 0x1577387D, 0xFA475364,
+ 0xDC0487E8, 0x3334ECF1, 0x0788272B, 0xE8B84C32, 0x6EF1B09F, 0x81C1DB86, 0xB57D105C, 0x5A4D7B45,
+ 0xBC029FF7, 0x5332F4EE, 0x678E3F34, 0x88BE542D, 0x0EF7A880, 0xE1C7C399, 0xD57B0843, 0x3A4B635A,
+ 0x99FCA15B, 0x76CCCA42, 0x42700198, 0xAD406A81, 0x2B09962C, 0xC439FD35, 0xF08536EF, 0x1FB55DF6,
+ 0xF9FAB944, 0x16CAD25D, 0x22761987, 0xCD46729E, 0x4B0F8E33, 0xA43FE52A, 0x90832EF0, 0x7FB345E9,
+ 0x59F09165, 0xB6C0FA7C, 0x827C31A6, 0x6D4C5ABF, 0xEB05A612, 0x0435CD0B, 0x308906D1, 0xDFB96DC8,
+ 0x39F6897A, 0xD6C6E263, 0xE27A29B9, 0x0D4A42A0, 0x8B03BE0D, 0x6433D514, 0x508F1ECE, 0xBFBF75D7,
+ 0x120CEC3D, 0xFD3C8724, 0xC9804CFE, 0x26B027E7, 0xA0F9DB4A, 0x4FC9B053, 0x7B757B89, 0x94451090,
+ 0x720AF422, 0x9D3A9F3B, 0xA98654E1, 0x46B63FF8, 0xC0FFC355, 0x2FCFA84C, 0x1B736396, 0xF443088F,
+ 0xD200DC03, 0x3D30B71A, 0x098C7CC0, 0xE6BC17D9, 0x60F5EB74, 0x8FC5806D, 0xBB794BB7, 0x544920AE,
+ 0xB206C41C, 0x5D36AF05, 0x698A64DF, 0x86BA0FC6, 0x00F3F36B, 0xEFC39872, 0xDB7F53A8, 0x344F38B1,
+ 0x97F8FAB0, 0x78C891A9, 0x4C745A73, 0xA344316A, 0x250DCDC7, 0xCA3DA6DE, 0xFE816D04, 0x11B1061D,
+ 0xF7FEE2AF, 0x18CE89B6, 0x2C72426C, 0xC3422975, 0x450BD5D8, 0xAA3BBEC1, 0x9E87751B, 0x71B71E02,
+ 0x57F4CA8E, 0xB8C4A197, 0x8C786A4D, 0x63480154, 0xE501FDF9, 0x0A3196E0, 0x3E8D5D3A, 0xD1BD3623,
+ 0x37F2D291, 0xD8C2B988, 0xEC7E7252, 0x034E194B, 0x8507E5E6, 0x6A378EFF, 0x5E8B4525, 0xB1BB2E3C
+},
+{
+ 0x00000000, 0x68032CC8, 0xD0065990, 0xB8057558, 0xA5E0C5D1, 0xCDE3E919, 0x75E69C41, 0x1DE5B089,
+ 0x4E2DFD53, 0x262ED19B, 0x9E2BA4C3, 0xF628880B, 0xEBCD3882, 0x83CE144A, 0x3BCB6112, 0x53C84DDA,
+ 0x9C5BFAA6, 0xF458D66E, 0x4C5DA336, 0x245E8FFE, 0x39BB3F77, 0x51B813BF, 0xE9BD66E7, 0x81BE4A2F,
+ 0xD27607F5, 0xBA752B3D, 0x02705E65, 0x6A7372AD, 0x7796C224, 0x1F95EEEC, 0xA7909BB4, 0xCF93B77C,
+ 0x3D5B83BD, 0x5558AF75, 0xED5DDA2D, 0x855EF6E5, 0x98BB466C, 0xF0B86AA4, 0x48BD1FFC, 0x20BE3334,
+ 0x73767EEE, 0x1B755226, 0xA370277E, 0xCB730BB6, 0xD696BB3F, 0xBE9597F7, 0x0690E2AF, 0x6E93CE67,
+ 0xA100791B, 0xC90355D3, 0x7106208B, 0x19050C43, 0x04E0BCCA, 0x6CE39002, 0xD4E6E55A, 0xBCE5C992,
+ 0xEF2D8448, 0x872EA880, 0x3F2BDDD8, 0x5728F110, 0x4ACD4199, 0x22CE6D51, 0x9ACB1809, 0xF2C834C1,
+ 0x7AB7077A, 0x12B42BB2, 0xAAB15EEA, 0xC2B27222, 0xDF57C2AB, 0xB754EE63, 0x0F519B3B, 0x6752B7F3,
+ 0x349AFA29, 0x5C99D6E1, 0xE49CA3B9, 0x8C9F8F71, 0x917A3FF8, 0xF9791330, 0x417C6668, 0x297F4AA0,
+ 0xE6ECFDDC, 0x8EEFD114, 0x36EAA44C, 0x5EE98884, 0x430C380D, 0x2B0F14C5, 0x930A619D, 0xFB094D55,
+ 0xA8C1008F, 0xC0C22C47, 0x78C7591F, 0x10C475D7, 0x0D21C55E, 0x6522E996, 0xDD279CCE, 0xB524B006,
+ 0x47EC84C7, 0x2FEFA80F, 0x97EADD57, 0xFFE9F19F, 0xE20C4116, 0x8A0F6DDE, 0x320A1886, 0x5A09344E,
+ 0x09C17994, 0x61C2555C, 0xD9C72004, 0xB1C40CCC, 0xAC21BC45, 0xC422908D, 0x7C27E5D5, 0x1424C91D,
+ 0xDBB77E61, 0xB3B452A9, 0x0BB127F1, 0x63B20B39, 0x7E57BBB0, 0x16549778, 0xAE51E220, 0xC652CEE8,
+ 0x959A8332, 0xFD99AFFA, 0x459CDAA2, 0x2D9FF66A, 0x307A46E3, 0x58796A2B, 0xE07C1F73, 0x887F33BB,
+ 0xF56E0EF4, 0x9D6D223C, 0x25685764, 0x4D6B7BAC, 0x508ECB25, 0x388DE7ED, 0x808892B5, 0xE88BBE7D,
+ 0xBB43F3A7, 0xD340DF6F, 0x6B45AA37, 0x034686FF, 0x1EA33676, 0x76A01ABE, 0xCEA56FE6, 0xA6A6432E,
+ 0x6935F452, 0x0136D89A, 0xB933ADC2, 0xD130810A, 0xCCD53183, 0xA4D61D4B, 0x1CD36813, 0x74D044DB,
+ 0x27180901, 0x4F1B25C9, 0xF71E5091, 0x9F1D7C59, 0x82F8CCD0, 0xEAFBE018, 0x52FE9540, 0x3AFDB988,
+ 0xC8358D49, 0xA036A181, 0x1833D4D9, 0x7030F811, 0x6DD54898, 0x05D66450, 0xBDD31108, 0xD5D03DC0,
+ 0x8618701A, 0xEE1B5CD2, 0x561E298A, 0x3E1D0542, 0x23F8B5CB, 0x4BFB9903, 0xF3FEEC5B, 0x9BFDC093,
+ 0x546E77EF, 0x3C6D5B27, 0x84682E7F, 0xEC6B02B7, 0xF18EB23E, 0x998D9EF6, 0x2188EBAE, 0x498BC766,
+ 0x1A438ABC, 0x7240A674, 0xCA45D32C, 0xA246FFE4, 0xBFA34F6D, 0xD7A063A5, 0x6FA516FD, 0x07A63A35,
+ 0x8FD9098E, 0xE7DA2546, 0x5FDF501E, 0x37DC7CD6, 0x2A39CC5F, 0x423AE097, 0xFA3F95CF, 0x923CB907,
+ 0xC1F4F4DD, 0xA9F7D815, 0x11F2AD4D, 0x79F18185, 0x6414310C, 0x0C171DC4, 0xB412689C, 0xDC114454,
+ 0x1382F328, 0x7B81DFE0, 0xC384AAB8, 0xAB878670, 0xB66236F9, 0xDE611A31, 0x66646F69, 0x0E6743A1,
+ 0x5DAF0E7B, 0x35AC22B3, 0x8DA957EB, 0xE5AA7B23, 0xF84FCBAA, 0x904CE762, 0x2849923A, 0x404ABEF2,
+ 0xB2828A33, 0xDA81A6FB, 0x6284D3A3, 0x0A87FF6B, 0x17624FE2, 0x7F61632A, 0xC7641672, 0xAF673ABA,
+ 0xFCAF7760, 0x94AC5BA8, 0x2CA92EF0, 0x44AA0238, 0x594FB2B1, 0x314C9E79, 0x8949EB21, 0xE14AC7E9,
+ 0x2ED97095, 0x46DA5C5D, 0xFEDF2905, 0x96DC05CD, 0x8B39B544, 0xE33A998C, 0x5B3FECD4, 0x333CC01C,
+ 0x60F48DC6, 0x08F7A10E, 0xB0F2D456, 0xD8F1F89E, 0xC5144817, 0xAD1764DF, 0x15121187, 0x7D113D4F
+},
+{
+ 0x00000000, 0x493C7D27, 0x9278FA4E, 0xDB448769, 0x211D826D, 0x6821FF4A, 0xB3657823, 0xFA590504,
+ 0x423B04DA, 0x0B0779FD, 0xD043FE94, 0x997F83B3, 0x632686B7, 0x2A1AFB90, 0xF15E7CF9, 0xB86201DE,
+ 0x847609B4, 0xCD4A7493, 0x160EF3FA, 0x5F328EDD, 0xA56B8BD9, 0xEC57F6FE, 0x37137197, 0x7E2F0CB0,
+ 0xC64D0D6E, 0x8F717049, 0x5435F720, 0x1D098A07, 0xE7508F03, 0xAE6CF224, 0x7528754D, 0x3C14086A,
+ 0x0D006599, 0x443C18BE, 0x9F789FD7, 0xD644E2F0, 0x2C1DE7F4, 0x65219AD3, 0xBE651DBA, 0xF759609D,
+ 0x4F3B6143, 0x06071C64, 0xDD439B0D, 0x947FE62A, 0x6E26E32E, 0x271A9E09, 0xFC5E1960, 0xB5626447,
+ 0x89766C2D, 0xC04A110A, 0x1B0E9663, 0x5232EB44, 0xA86BEE40, 0xE1579367, 0x3A13140E, 0x732F6929,
+ 0xCB4D68F7, 0x827115D0, 0x593592B9, 0x1009EF9E, 0xEA50EA9A, 0xA36C97BD, 0x782810D4, 0x31146DF3,
+ 0x1A00CB32, 0x533CB615, 0x8878317C, 0xC1444C5B, 0x3B1D495F, 0x72213478, 0xA965B311, 0xE059CE36,
+ 0x583BCFE8, 0x1107B2CF, 0xCA4335A6, 0x837F4881, 0x79264D85, 0x301A30A2, 0xEB5EB7CB, 0xA262CAEC,
+ 0x9E76C286, 0xD74ABFA1, 0x0C0E38C8, 0x453245EF, 0xBF6B40EB, 0xF6573DCC, 0x2D13BAA5, 0x642FC782,
+ 0xDC4DC65C, 0x9571BB7B, 0x4E353C12, 0x07094135, 0xFD504431, 0xB46C3916, 0x6F28BE7F, 0x2614C358,
+ 0x1700AEAB, 0x5E3CD38C, 0x857854E5, 0xCC4429C2, 0x361D2CC6, 0x7F2151E1, 0xA465D688, 0xED59ABAF,
+ 0x553BAA71, 0x1C07D756, 0xC743503F, 0x8E7F2D18, 0x7426281C, 0x3D1A553B, 0xE65ED252, 0xAF62AF75,
+ 0x9376A71F, 0xDA4ADA38, 0x010E5D51, 0x48322076, 0xB26B2572, 0xFB575855, 0x2013DF3C, 0x692FA21B,
+ 0xD14DA3C5, 0x9871DEE2, 0x4335598B, 0x0A0924AC, 0xF05021A8, 0xB96C5C8F, 0x6228DBE6, 0x2B14A6C1,
+ 0x34019664, 0x7D3DEB43, 0xA6796C2A, 0xEF45110D, 0x151C1409, 0x5C20692E, 0x8764EE47, 0xCE589360,
+ 0x763A92BE, 0x3F06EF99, 0xE44268F0, 0xAD7E15D7, 0x572710D3, 0x1E1B6DF4, 0xC55FEA9D, 0x8C6397BA,
+ 0xB0779FD0, 0xF94BE2F7, 0x220F659E, 0x6B3318B9, 0x916A1DBD, 0xD856609A, 0x0312E7F3, 0x4A2E9AD4,
+ 0xF24C9B0A, 0xBB70E62D, 0x60346144, 0x29081C63, 0xD3511967, 0x9A6D6440, 0x4129E329, 0x08159E0E,
+ 0x3901F3FD, 0x703D8EDA, 0xAB7909B3, 0xE2457494, 0x181C7190, 0x51200CB7, 0x8A648BDE, 0xC358F6F9,
+ 0x7B3AF727, 0x32068A00, 0xE9420D69, 0xA07E704E, 0x5A27754A, 0x131B086D, 0xC85F8F04, 0x8163F223,
+ 0xBD77FA49, 0xF44B876E, 0x2F0F0007, 0x66337D20, 0x9C6A7824, 0xD5560503, 0x0E12826A, 0x472EFF4D,
+ 0xFF4CFE93, 0xB67083B4, 0x6D3404DD, 0x240879FA, 0xDE517CFE, 0x976D01D9, 0x4C2986B0, 0x0515FB97,
+ 0x2E015D56, 0x673D2071, 0xBC79A718, 0xF545DA3F, 0x0F1CDF3B, 0x4620A21C, 0x9D642575, 0xD4585852,
+ 0x6C3A598C, 0x250624AB, 0xFE42A3C2, 0xB77EDEE5, 0x4D27DBE1, 0x041BA6C6, 0xDF5F21AF, 0x96635C88,
+ 0xAA7754E2, 0xE34B29C5, 0x380FAEAC, 0x7133D38B, 0x8B6AD68F, 0xC256ABA8, 0x19122CC1, 0x502E51E6,
+ 0xE84C5038, 0xA1702D1F, 0x7A34AA76, 0x3308D751, 0xC951D255, 0x806DAF72, 0x5B29281B, 0x1215553C,
+ 0x230138CF, 0x6A3D45E8, 0xB179C281, 0xF845BFA6, 0x021CBAA2, 0x4B20C785, 0x906440EC, 0xD9583DCB,
+ 0x613A3C15, 0x28064132, 0xF342C65B, 0xBA7EBB7C, 0x4027BE78, 0x091BC35F, 0xD25F4436, 0x9B633911,
+ 0xA777317B, 0xEE4B4C5C, 0x350FCB35, 0x7C33B612, 0x866AB316, 0xCF56CE31, 0x14124958, 0x5D2E347F,
+ 0xE54C35A1, 0xAC704886, 0x7734CFEF, 0x3E08B2C8, 0xC451B7CC, 0x8D6DCAEB, 0x56294D82, 0x1F1530A5
+}};
+
+#define CRC32_UPD(crc, n) \
+	(crc32c_tables[(n)][(crc) & 0xFF] ^ \
+	 crc32c_tables[(n)-1][((crc) >> 8) & 0xFF])
+
+static inline uint32_t
+crc32c_1word(uint32_t data, uint32_t init_val)
+{
+	uint32_t crc, term1, term2;
+	crc = init_val;
+	crc ^= data;
+
+	term1 = CRC32_UPD(crc, 3);
+	term2 = crc >> 16;
+	crc = term1 ^ CRC32_UPD(term2, 1);
+
+	return crc;
+}
+
+static inline uint32_t
+crc32c_2words(uint64_t data, uint32_t init_val)
+{
+	union {
+		uint64_t u64;
+		uint32_t u32[2];
+	} d;
+	d.u64 = data;
+
+	uint32_t crc, term1, term2;
+
+	crc = init_val;
+	crc ^= d.u32[0];
+
+	term1 = CRC32_UPD(crc, 7);
+	term2 = crc >> 16;
+	crc = term1 ^ CRC32_UPD(term2, 5);
+	term1 = CRC32_UPD(d.u32[1], 3);
+	term2 = d.u32[1] >> 16;
+	crc ^= term1 ^ CRC32_UPD(term2, 1);
+
+	return crc;
+}
+
 /**
  * Use single crc32 instruction to perform a hash on a 4 byte value.
  *
-- 
1.7.9.5

* [dpdk-dev] [PATCH v6 2/7] hash: add assembly implementation of CRC32 intrinsics
  2015-01-29  8:48 ` [dpdk-dev] [PATCH v6 " Yerden Zhumabekov
  2015-01-29  8:48   ` [dpdk-dev] [PATCH v6 1/7] hash: add software CRC32 implementation Yerden Zhumabekov
@ 2015-01-29  8:48   ` Yerden Zhumabekov
  2015-02-02  5:15     ` Liang, Cunming
  2015-01-29  8:49   ` [dpdk-dev] [PATCH v6 3/7] hash: replace built-in functions implementing SSE4.2 Yerden Zhumabekov
                     ` (5 subsequent siblings)
  7 siblings, 1 reply; 98+ messages in thread
From: Yerden Zhumabekov @ 2015-01-29  8:48 UTC (permalink / raw)
  To: dev

Added:
- crc32c_sse42_u32() emits 'crc32l' asm instruction;
- crc32c_sse42_u64() emits 'crc32q' asm instruction;
- crc32c_sse42_u64_mimic(), a wrapper for 32-bit platforms that emulates
  the 64-bit call with two 'crc32l' instructions.
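
For illustration only (not part of this patch): a portable bitwise model
of what the CRC32 instructions compute, assuming the reflected CRC32C
polynomial 0x82F63B78. All crc32c_ref_*() names are hypothetical. It
sketches why splitting a 64-bit operand into its low and high 32-bit
halves, as the mimic wrapper does, gives the same result as feeding the
bytes one at a time.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Reference model: fold one byte into the running CRC, bit by bit. */
static uint32_t
crc32c_ref_u8(uint8_t data, uint32_t crc)
{
	int i;

	crc ^= data;
	for (i = 0; i < 8; i++)
		crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
	return crc;
}

/* Reference model of a 4-byte step: XOR the word in, then 32 bit steps. */
static uint32_t
crc32c_ref_u32(uint32_t data, uint32_t crc)
{
	int i;

	crc ^= data;
	for (i = 0; i < 32; i++)
		crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
	return crc;
}

/* 8-byte step built from two 4-byte steps: low half first, then high. */
static uint32_t
crc32c_ref_u64_mimic(uint64_t data, uint32_t crc)
{
	crc = crc32c_ref_u32((uint32_t)data, crc);
	return crc32c_ref_u32((uint32_t)(data >> 32), crc);
}

/* Byte-wise CRC over a buffer, used here only as a cross-check. */
static uint32_t
crc32c_ref_bytes(const void *buf, size_t len, uint32_t crc)
{
	const uint8_t *p = buf;
	size_t i;

	for (i = 0; i < len; i++)
		crc = crc32c_ref_u8(p[i], crc);
	return crc;
}
```

Note that the model, like the instruction, applies no initial or final
inversion; callers that want the standard CRC32C checksum have to pass
0xFFFFFFFF as init value and invert the result themselves.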

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |   34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 4da7ca4..fe35996 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -363,6 +363,40 @@ crc32c_2words(uint64_t data, uint32_t init_val)
 	return crc;
 }
 
+static inline uint32_t
+crc32c_sse42_u32(uint32_t data, uint32_t init_val)
+{
+	__asm__ volatile(
+			"crc32l %[data], %[init_val];"
+			: [init_val] "+r" (init_val)
+			: [data] "rm" (data));
+	return init_val;
+}
+
+static inline uint32_t
+crc32c_sse42_u64(uint64_t data, uint64_t init_val)
+{
+	__asm__ volatile(
+			"crc32q %[data], %[init_val];"
+			: [init_val] "+r" (init_val)
+			: [data] "rm" (data));
+	return init_val;
+}
+
+static inline uint32_t
+crc32c_sse42_u64_mimic(uint64_t data, uint64_t init_val)
+{
+	union {
+		uint32_t u32[2];
+		uint64_t u64;
+	} d;
+
+	d.u64 = data;
+	init_val = crc32c_sse42_u32(d.u32[0], init_val);
+	init_val = crc32c_sse42_u32(d.u32[1], init_val);
+	return init_val;
+}
+
 /**
  * Use single crc32 instruction to perform a hash on a 4 byte value.
  *
-- 
1.7.9.5

* [dpdk-dev] [PATCH v6 3/7] hash: replace built-in functions implementing SSE4.2
  2015-01-29  8:48 ` [dpdk-dev] [PATCH v6 " Yerden Zhumabekov
  2015-01-29  8:48   ` [dpdk-dev] [PATCH v6 1/7] hash: add software CRC32 implementation Yerden Zhumabekov
  2015-01-29  8:48   ` [dpdk-dev] [PATCH v6 2/7] hash: add assembly implementation of CRC32 intrinsics Yerden Zhumabekov
@ 2015-01-29  8:49   ` Yerden Zhumabekov
  2015-01-29  8:49   ` [dpdk-dev] [PATCH v6 4/7] hash: add rte_hash_crc_8byte function Yerden Zhumabekov
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2015-01-29  8:49 UTC (permalink / raw)
  To: dev

Stop using the built-in intrinsics and use our own assembly
implementation instead. Remove the now-unneeded #include <nmmintrin.h>
as well.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index fe35996..45b0dce 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -45,7 +45,6 @@ extern "C" {
 #endif
 
 #include <stdint.h>
-#include <nmmintrin.h>
 
 /* Lookup tables for software implementation of CRC32C */
 static const uint32_t crc32c_tables[8][256] = {{
@@ -410,7 +409,7 @@ crc32c_sse42_u64_mimic(uint64_t data, uint64_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-	return _mm_crc32_u32(init_val, data);
+	return crc32c_sse42_u32(data, init_val);
 }
 
 /**
-- 
1.7.9.5

* [dpdk-dev] [PATCH v6 4/7] hash: add rte_hash_crc_8byte function
  2015-01-29  8:48 ` [dpdk-dev] [PATCH v6 " Yerden Zhumabekov
                     ` (2 preceding siblings ...)
  2015-01-29  8:49   ` [dpdk-dev] [PATCH v6 3/7] hash: replace built-in functions implementing SSE4.2 Yerden Zhumabekov
@ 2015-01-29  8:49   ` Yerden Zhumabekov
  2015-01-29  8:50   ` [dpdk-dev] [PATCH v6 5/7] hash: add fallback to software CRC32 implementation Yerden Zhumabekov
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2015-01-29  8:49 UTC (permalink / raw)
  To: dev

SSE4.2 provides a CRC32 instruction with an 8-byte operand. Expose it
as rte_hash_crc_8byte().

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 45b0dce..6cc67cd 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -413,6 +413,22 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 }
 
 /**
+ * Use single crc32 instruction to perform a hash on a 8 byte value.
+ *
+ * @param data
+ *   Data to perform hash on.
+ * @param init_val
+ *   Value to initialise hash generator.
+ * @return
+ *   32bit calculated hash value.
+ */
+static inline uint32_t
+rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
+{
+	return crc32c_sse42_u64(data, init_val);
+}
+
+/**
  * Use crc32 instruction to perform a hash.
  *
  * @param data
-- 
1.7.9.5

* [dpdk-dev] [PATCH v6 5/7] hash: add fallback to software CRC32 implementation
  2015-01-29  8:48 ` [dpdk-dev] [PATCH v6 " Yerden Zhumabekov
                     ` (3 preceding siblings ...)
  2015-01-29  8:49   ` [dpdk-dev] [PATCH v6 4/7] hash: add rte_hash_crc_8byte function Yerden Zhumabekov
@ 2015-01-29  8:50   ` Yerden Zhumabekov
  2015-01-29  8:50   ` [dpdk-dev] [PATCH v6 6/7] hash: rte_hash_crc() slices data into 8-byte pieces Yerden Zhumabekov
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2015-01-29  8:50 UTC (permalink / raw)
  To: dev

SSE4.2 support is detected at startup by a constructor function.

Add rte_hash_crc_set_alg() to select the CRC32 implementation. If the
requested implementation is not supported by the CPU, it falls back to
the next best one. The constructor requests the 64-bit SSE4.2 variant
by default.

Rework the rte_hash_crc_*byte() functions so that they choose the
available CRC32 implementation at run time.
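
For illustration only (not part of this patch): a standalone sketch of
the downgrade logic in rte_hash_crc_set_alg(). The helper name
pick_crc32_alg() and the has_sse42/has_x64 arguments are hypothetical
stand-ins for the rte_cpu_get_flag_enabled() checks, so that the
selection can be exercised without DPDK.

```c
#include <stdint.h>

#define CRC32_SW            (1U << 0)
#define CRC32_SSE42         (1U << 1)
#define CRC32_x64           (1U << 2)
#define CRC32_SSE42_x64     (CRC32_x64|CRC32_SSE42)

/*
 * Hypothetical helper: return the algorithm that would end up selected
 * for a given request. Unknown requests leave the current choice as is.
 */
static uint8_t
pick_crc32_alg(uint8_t requested, uint8_t current, int has_sse42, int has_x64)
{
	switch (requested) {
	case CRC32_SSE42_x64:
		if (!has_x64)
			requested = CRC32_SSE42;
		/* Fallthrough */
	case CRC32_SSE42:
		if (!has_sse42)
			requested = CRC32_SW;
		/* Fallthrough */
	case CRC32_SW:
		return requested;
	default:
		return current;
	}
}
```

A request for CRC32_SSE42_x64 thus degrades to CRC32_SSE42 on a 32-bit
CPU and to CRC32_SW when SSE4.2 is missing altogether.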

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |   61 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 59 insertions(+), 2 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 6cc67cd..435048e 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -45,6 +45,8 @@ extern "C" {
 #endif
 
 #include <stdint.h>
+#include <rte_cpuflags.h>
+#include <rte_branch_prediction.h>
 
 /* Lookup tables for software implementation of CRC32C */
 static const uint32_t crc32c_tables[8][256] = {{
@@ -396,8 +398,52 @@ crc32c_sse42_u64_mimic(uint64_t data, uint64_t init_val)
 	return init_val;
 }
 
+#define CRC32_SW            (1U << 0)
+#define CRC32_SSE42         (1U << 1)
+#define CRC32_x64           (1U << 2)
+#define CRC32_SSE42_x64     (CRC32_x64|CRC32_SSE42)
+
+static uint8_t crc32_alg = CRC32_SW;
+
+/**
+ * Allow or disallow use of SSE4.2 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   One of the following values:
+ *   - (CRC32_SW) Don't use SSE4.2 intrinsics
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available
+ *   - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default)
+ *
+ */
+static inline void
+rte_hash_crc_set_alg(uint8_t alg)
+{
+	switch (alg) {
+	case CRC32_SSE42_x64:
+		if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T))
+			alg = CRC32_SSE42;
+	case CRC32_SSE42:
+		if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2))
+			alg = CRC32_SW;
+	case CRC32_SW:
+		crc32_alg = alg;
+	default:
+		break;
+	}
+}
+
+/* Setting the best available algorithm */
+static inline void __attribute__((constructor))
+rte_hash_crc_init_alg(void)
+{
+	rte_hash_crc_set_alg(CRC32_SSE42_x64);
+}
+
 /**
  * Use single crc32 instruction to perform a hash on a 4 byte value.
+ * Fall back to the software crc32 implementation in case SSE4.2 is
+ * not supported.
  *
  * @param data
  *   Data to perform hash on.
@@ -409,11 +455,16 @@ crc32c_sse42_u64_mimic(uint64_t data, uint64_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-	return crc32c_sse42_u32(data, init_val);
+	if (likely(crc32_alg & CRC32_SSE42))
+		return crc32c_sse42_u32(data, init_val);
+
+	return crc32c_1word(data, init_val);
 }
 
 /**
  * Use single crc32 instruction to perform a hash on a 8 byte value.
+ * Fall back to the software crc32 implementation in case SSE4.2 is
+ * not supported.
  *
  * @param data
  *   Data to perform hash on.
@@ -425,7 +476,13 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
-	return crc32c_sse42_u64(data, init_val);
+	if (likely(crc32_alg == CRC32_SSE42_x64))
+		return crc32c_sse42_u64(data, init_val);
+
+	if (likely(crc32_alg & CRC32_SSE42))
+		return crc32c_sse42_u64_mimic(data, init_val);
+
+	return crc32c_2words(data, init_val);
 }
 
 /**
-- 
1.7.9.5

* [dpdk-dev] [PATCH v6 6/7] hash: rte_hash_crc() slices data into 8-byte pieces
  2015-01-29  8:48 ` [dpdk-dev] [PATCH v6 " Yerden Zhumabekov
                     ` (4 preceding siblings ...)
  2015-01-29  8:50   ` [dpdk-dev] [PATCH v6 5/7] hash: add fallback to software CRC32 implementation Yerden Zhumabekov
@ 2015-01-29  8:50   ` Yerden Zhumabekov
  2015-01-29  8:50   ` [dpdk-dev] [PATCH v6 7/7] test: remove redundant compile checks Yerden Zhumabekov
  2015-02-01 14:13   ` [dpdk-dev] [PATCH v6 0/7] rte_hash_crc reworked to be platform-independent Neil Horman
  7 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2015-01-29  8:50 UTC (permalink / raw)
  To: dev

Calculating a hash over data of variable length is more efficient
when that data is sliced into 8-byte pieces. The remaining tail
is hashed using CRC32 functions with either an 8-byte or a 4-byte operand.
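
For illustration only (not part of this patch): a portable sketch of the
slicing scheme, with a bitwise CRC32C model standing in for the
rte_hash_crc_*byte() calls. The names crc_step(), load_le() and
hash_crc_sliced() are hypothetical, and the sketch assembles operands in
little-endian order, whereas the patch reads them in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C over the low 'nbytes' bytes of 'data', lowest byte first. */
static uint32_t
crc_step(uint64_t data, unsigned nbytes, uint32_t crc)
{
	unsigned i;
	int b;

	for (i = 0; i < nbytes; i++) {
		crc ^= (uint8_t)(data >> (8 * i));
		for (b = 0; b < 8; b++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Assemble up to 8 bytes into a zero-padded little-endian word. */
static uint64_t
load_le(const uint8_t *p, size_t n)
{
	uint64_t v = 0;
	size_t i;

	for (i = 0; i < n; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

static uint32_t
hash_crc_sliced(const void *data, size_t len, uint32_t crc)
{
	const uint8_t *p = data;
	size_t tail = len & 7;

	/* Main loop: one 8-byte operand per iteration. */
	for (; len >= 8; len -= 8, p += 8)
		crc = crc_step(load_le(p, 8), 8, crc);

	if (tail > 4)		/* 5..7 bytes: zero-pad to an 8-byte operand */
		crc = crc_step(load_le(p, tail), 8, crc);
	else if (tail > 0)	/* 1..4 bytes: zero-pad to a 4-byte operand */
		crc = crc_step(load_le(p, tail), 4, crc);
	return crc;
}
```

Because a tail of 1..3 or 5..7 bytes is zero-padded up to the operand
size, the result for such lengths is a hash value, not the CRC32C
checksum of the exact byte string.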

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 lib/librte_hash/rte_hash_crc.h |   33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 435048e..5c04fe4 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -486,7 +486,7 @@ rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 }
 
 /**
- * Use crc32 instruction to perform a hash.
+ * Calculate CRC32 hash on user-supplied byte array.
  *
  * @param data
  *   Data to perform hash on.
@@ -501,23 +501,38 @@ static inline uint32_t
 rte_hash_crc(const void *data, uint32_t data_len, uint32_t init_val)
 {
 	unsigned i;
-	uint32_t temp = 0;
-	const uint32_t *p32 = (const uint32_t *)data;
+	uint64_t temp = 0;
+	const uint64_t *p64 = (const uint64_t *)data;
 
-	for (i = 0; i < data_len / 4; i++) {
-		init_val = rte_hash_crc_4byte(*p32++, init_val);
+	for (i = 0; i < data_len / 8; i++) {
+		init_val = rte_hash_crc_8byte(*p64++, init_val);
 	}
 
-	switch (3 - (data_len & 0x03)) {
+	switch (7 - (data_len & 0x07)) {
 	case 0:
-		temp |= *((const uint8_t *)p32 + 2) << 16;
+		temp |= (uint64_t) *((const uint8_t *)p64 + 6) << 48;
 		/* Fallthrough */
 	case 1:
-		temp |= *((const uint8_t *)p32 + 1) << 8;
+		temp |= (uint64_t) *((const uint8_t *)p64 + 5) << 40;
 		/* Fallthrough */
 	case 2:
-		temp |= *((const uint8_t *)p32);
+		temp |= (uint64_t) *((const uint8_t *)p64 + 4) << 32;
+		temp |= *((const uint32_t *)p64);
+		init_val = rte_hash_crc_8byte(temp, init_val);
+		break;
+	case 3:
+		init_val = rte_hash_crc_4byte(*(const uint32_t *)p64, init_val);
+		break;
+	case 4:
+		temp |= *((const uint8_t *)p64 + 2) << 16;
+		/* Fallthrough */
+	case 5:
+		temp |= *((const uint8_t *)p64 + 1) << 8;
+		/* Fallthrough */
+	case 6:
+		temp |= *((const uint8_t *)p64);
 		init_val = rte_hash_crc_4byte(temp, init_val);
+		/* Fallthrough */
 	default:
 		break;
 	}
-- 
1.7.9.5
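
For illustration, the slicing described in the commit message above can be sketched in portable C roughly as follows. This is not the patch's code: every sketch_* name is hypothetical, a slow bitwise CRC32C step stands in for the 4- and 8-byte CRC32 operations, and bytes are assembled with shifts instead of pointer casts so the sketch does not depend on alignment or host byte order. As in the patch, a tail of 5 to 7 bytes is zero-padded into one 8-byte operation and a tail of 1 to 3 bytes into one 4-byte operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC32C (Castagnoli) polynomial, the one the SSE4.2 crc32 instruction uses. */
#define SKETCH_CRC32C_POLY 0x82F63B78u

/* Fold one byte into the CRC, bit by bit (slow reference step). */
static uint32_t
sketch_crc32c_byte(uint8_t byte, uint32_t crc)
{
	int bit;

	crc ^= byte;
	for (bit = 0; bit < 8; bit++)
		crc = (crc >> 1) ^ (SKETCH_CRC32C_POLY & (0u - (crc & 1u)));
	return crc;
}

/* Plain byte-at-a-time CRC over a buffer, used as the reference. */
uint32_t
sketch_crc32c_bytes(const void *data, size_t len, uint32_t crc)
{
	const uint8_t *p = data;

	while (len--)
		crc = sketch_crc32c_byte(*p++, crc);
	return crc;
}

/* Assemble n (at most 8) bytes into a word, lowest address in the lowest
 * byte; bytes that are not present stay zero. */
static uint64_t
sketch_load(const uint8_t *p, unsigned n)
{
	uint64_t word = 0;
	unsigned i;

	assert(n <= 8);
	for (i = 0; i < n; i++)
		word |= (uint64_t)p[i] << (8 * i);
	return word;
}

/* Stand-in for a 4- or 8-byte CRC32 operation: fold the low `width` bytes. */
static uint32_t
sketch_crc32c_word(uint64_t word, unsigned width, uint32_t crc)
{
	unsigned i;

	for (i = 0; i < width; i++)
		crc = sketch_crc32c_byte((uint8_t)(word >> (8 * i)), crc);
	return crc;
}

uint32_t
sketch_hash_crc(const void *data, uint32_t data_len, uint32_t init_val)
{
	const uint8_t *p = data;
	uint32_t left = data_len;

	/* Main loop: whole 8-byte pieces. */
	for (; left >= 8; left -= 8, p += 8)
		init_val = sketch_crc32c_word(sketch_load(p, 8), 8, init_val);

	if (left > 4)		/* 5..7 bytes: one zero-padded 8-byte operation */
		init_val = sketch_crc32c_word(sketch_load(p, left), 8, init_val);
	else if (left > 0)	/* 1..4 bytes: one 4-byte operation, zero-padded if short */
		init_val = sketch_crc32c_word(sketch_load(p, left), 4, init_val);
	return init_val;
}
```

Because of the zero padding, the result for lengths that are not a multiple of 4 is a hash of the padded data rather than the CRC of the exact byte string; for a 9-byte key it equals the bytewise CRC of the key followed by three zero bytes.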

* [dpdk-dev] [PATCH v6 7/7] test: remove redundant compile checks
  2015-01-29  8:48 ` [dpdk-dev] [PATCH v6 " Yerden Zhumabekov
                     ` (5 preceding siblings ...)
  2015-01-29  8:50   ` [dpdk-dev] [PATCH v6 6/7] hash: rte_hash_crc() slices data into 8-byte pieces Yerden Zhumabekov
@ 2015-01-29  8:50   ` Yerden Zhumabekov
  2015-02-01 14:13   ` [dpdk-dev] [PATCH v6 0/7] rte_hash_crc reworked to be platform-independent Neil Horman
  7 siblings, 0 replies; 98+ messages in thread
From: Yerden Zhumabekov @ 2015-01-29  8:50 UTC (permalink / raw)
  To: dev

Since rte_hash_crc() can now be run regardless of SSE4.2 support,
we can safely remove compile checks for RTE_MACHINE_CPUFLAG_SSE4_2
in test utilities.

Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
---
 app/test/test_hash.c      |    7 -------
 app/test/test_hash_perf.c |   11 -----------
 2 files changed, 18 deletions(-)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index 178ec3f..76b1b8f 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -55,10 +55,7 @@
 #include <rte_hash.h>
 #include <rte_fbk_hash.h>
 #include <rte_jhash.h>
-
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 #include <rte_hash_crc.h>
-#endif
 
 /*******************************************************************************
  * Hash function performance test configuration section. Each performance test
@@ -67,11 +64,7 @@
  * The five arrays below control what tests are performed. Every combination
  * from the array entries is tested.
  */
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 static rte_hash_function hashtest_funcs[] = {rte_jhash, rte_hash_crc};
-#else
-static rte_hash_function hashtest_funcs[] = {rte_jhash};
-#endif
 static uint32_t hashtest_initvals[] = {0};
 static uint32_t hashtest_key_lens[] = {0, 2, 4, 5, 6, 7, 8, 10, 11, 15, 16, 21, 31, 32, 33, 63, 64};
 /******************************************************************************/
diff --git a/app/test/test_hash_perf.c b/app/test/test_hash_perf.c
index be34957..05a88ec 100644
--- a/app/test/test_hash_perf.c
+++ b/app/test/test_hash_perf.c
@@ -56,10 +56,7 @@
 #include <rte_hash.h>
 #include <rte_fbk_hash.h>
 #include <rte_jhash.h>
-
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 #include <rte_hash_crc.h>
-#endif
 
 /* Types of hash table performance test that can be performed */
 enum hash_test_t {
@@ -97,11 +94,7 @@ struct tbl_perf_test_params {
  */
 #define HASHTEST_ITERATIONS 1000000
 
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 static rte_hash_function hashtest_funcs[] = {rte_jhash, rte_hash_crc};
-#else
-static rte_hash_function hashtest_funcs[] = {rte_jhash};
-#endif
 static uint32_t hashtest_initvals[] = {0};
 static uint32_t hashtest_key_lens[] = {2, 4, 5, 6, 7, 8, 10, 11, 15, 16, 21, 31, 32, 33, 63, 64};
 /******************************************************************************/
@@ -243,7 +236,6 @@ struct tbl_perf_test_params tbl_perf_params[] =
 {       LOOKUP,  ITERATIONS,  1048576,           4,      64,    rte_jhash,   0},
 {       LOOKUP,  ITERATIONS,  1048576,           8,      64,    rte_jhash,   0},
 {       LOOKUP,  ITERATIONS,  1048576,          16,      64,    rte_jhash,   0},
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 /* Small table, add */
 /*  Test type | Iterations | Entries | BucketSize | KeyLen |    HashFunc | InitVal */
 { ADD_ON_EMPTY,        1024,     1024,           1,      16, rte_hash_crc,   0},
@@ -376,7 +368,6 @@ struct tbl_perf_test_params tbl_perf_params[] =
 {       LOOKUP,  ITERATIONS,  1048576,           4,      64, rte_hash_crc,   0},
 {       LOOKUP,  ITERATIONS,  1048576,           8,      64, rte_hash_crc,   0},
 {       LOOKUP,  ITERATIONS,  1048576,          16,      64, rte_hash_crc,   0},
-#endif
 };
 
 /******************************************************************************/
@@ -423,10 +414,8 @@ static const char *get_hash_name(rte_hash_function f)
 	if (f == rte_jhash)
 		return "jhash";
 
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 	if (f == rte_hash_crc)
 		return "rte_hash_crc";
-#endif
 
 	return "UnknownHash";
 }
-- 
1.7.9.5

* Re: [dpdk-dev] [PATCH v6 0/7] rte_hash_crc reworked to be platform-independent
  2015-01-29  8:48 ` [dpdk-dev] [PATCH v6 " Yerden Zhumabekov
                     ` (6 preceding siblings ...)
  2015-01-29  8:50   ` [dpdk-dev] [PATCH v6 7/7] test: remove redundant compile checks Yerden Zhumabekov
@ 2015-02-01 14:13   ` Neil Horman
  2015-02-02  3:07     ` Yerden Zhumabekov
  7 siblings, 1 reply; 98+ messages in thread
From: Neil Horman @ 2015-02-01 14:13 UTC (permalink / raw)
  To: Yerden Zhumabekov; +Cc: dev

On Thu, Jan 29, 2015 at 02:48:11PM +0600, Yerden Zhumabekov wrote:
> This is a rework of my previous patches improving performance of rte_hash_crc.
> 
> Summary of changes:
> * software implementation of CRC32 introduced;
> * at run time, the algorithm can fall back to the software version if the CPU doesn't support SSE4.2;
> * best available algorithm is automatically detected upon application startup;
> * redundant compile checks removed from test utilities;
> * assembly code for emitting SSE4.2 instructions is used instead of built-in intrinsics;
> * rte_hash_crc() function performance significantly improved.
> 
> v6 changes:
> * added 'const' qualifier to crc32c lookup tables declaration.
> 
> v5 changes:
> * gave up on gcc's built-in SSE4.2 intrinsics;
> * add assembly code for emitting SSE4.2 instructions.
> 
> v4 changes:
> * icc-specific compile checks removed.
> 
> v3 changes:
> * the default algorithm implementation is set in a constructor at application startup;
> * crc32 software implementation improved;
> * removed compile-time checks from test_hash_perf and test_hash.
> 
> v2 changes:
> * added CRC32 software implementation;
> * added rte_hash_crc_set_alg() function to control availability of SSE4.2;
> * added fallback to sw crc32 in case SSE4.2 is not available, or if SSE4.2 is intentionally disabled.
> 
> Initial version (v1) changes:
> * added rte_hash_crc_8byte() function to calculate CRC32 on 8-byte operand;
> * reworked rte_hash_crc() function which leverages both versions of CRC32 hash calculation functions with 4 and 8-byte operands.
> 
> 
> Yerden Zhumabekov (7):
>   hash: add software CRC32 implementation
>   hash: add assembly implementation of CRC32 intrinsics
>   hash: replace built-in functions implementing SSE4.2
>   hash: add rte_hash_crc_8byte function
>   hash: add fallback to software CRC32 implementation
>   hash: rte_hash_crc() slices data into 8-byte pieces
>   test: remove redundant compile checks
> 
>  app/test/test_hash.c           |    7 -
>  app/test/test_hash_perf.c      |   11 -
>  lib/librte_hash/rte_hash_crc.h |  459 +++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 448 insertions(+), 29 deletions(-)
> 
> -- 
> 1.7.9.5
> 
> 
Just to be clear, this does build if you compile it against the "default"
machine type, correct?
Neil

* Re: [dpdk-dev] [PATCH v6 0/7] rte_hash_crc reworked to be platform-independent
  2015-02-01 14:13   ` [dpdk-dev] [PATCH v6 0/7] rte_hash_crc reworked to be platform-independent Neil Horman
@ 2015-02-02  3:07     ` Yerden Zhumabekov
  2015-02-02  3:31       ` Neil Horman
  0 siblings, 1 reply; 98+ messages in thread
From: Yerden Zhumabekov @ 2015-02-02  3:07 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev


01.02.2015 20:13, Neil Horman wrote:
> On Thu, Jan 29, 2015 at 02:48:11PM +0600, Yerden Zhumabekov wrote:
>> This is a rework of my previous patches improving performance of rte_hash_crc.
>>
>> Summary of changes:
>> * software implementation of CRC32 introduced;
>> * at run time, the algorithm can fall back to the software version if the CPU doesn't support SSE4.2;
>> * best available algorithm is automatically detected upon application startup;
>> * redundant compile checks removed from test utilities;
>> * assembly code for emitting SSE4.2 instructions is used instead of built-in intrinsics;
>> * rte_hash_crc() function performance significantly improved.
>>
>> v6 changes:
>> * added 'const' qualifier to crc32c lookup tables declaration.
> Just to be clear, this does build if you compile it against the "default"
> machine type, correct?
> Neil

I think so; I've just successfully built it against the latest snapshot with
RTE_TARGET equal to 'x86_64-native-linuxapp-gcc'.

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ

* Re: [dpdk-dev] [PATCH v6 0/7] rte_hash_crc reworked to be platform-independent
  2015-02-02  3:07     ` Yerden Zhumabekov
@ 2015-02-02  3:31       ` Neil Horman
  2015-02-02  5:18         ` [dpdk-dev] HA: " Жумабеков Ерден Мирзагулович
  2015-02-02  5:39         ` [dpdk-dev] " Yerden Zhumabekov
  0 siblings, 2 replies; 98+ messages in thread
From: Neil Horman @ 2015-02-02  3:31 UTC (permalink / raw)
  To: Yerden Zhumabekov; +Cc: dev

On Mon, Feb 02, 2015 at 09:07:45AM +0600, Yerden Zhumabekov wrote:
> 
> 01.02.2015 20:13, Neil Horman wrote:
> > On Thu, Jan 29, 2015 at 02:48:11PM +0600, Yerden Zhumabekov wrote:
> >> This is a rework of my previous patches improving performance of rte_hash_crc.
> >>
> >> Summary of changes:
> >> * software implementation of CRC32 introduced;
> >> * at run time, the algorithm can fall back to the software version if the CPU doesn't support SSE4.2;
> >> * best available algorithm is automatically detected upon application startup;
> >> * redundant compile checks removed from test utilities;
> >> * assembly code for emitting SSE4.2 instructions is used instead of built-in intrinsics;
> >> * rte_hash_crc() function performance significantly improved.
> >>
> >> v6 changes:
> >> * added 'const' qualifier to crc32c lookup tables declaration.
> > Just to be clear, this does build if you compile it against the "default"
> > machine type, correct?
> > Neil
> 
> I think so; I've just successfully built it against the latest snapshot with
> RTE_TARGET equal to 'x86_64-native-linuxapp-gcc'.
> 
Please confirm that setting the machine type to default builds and runs properly.
Neil

> -- 
> Sincerely,
> 
> Yerden Zhumabekov
> State Technical Service
> Astana, KZ
> 
> 
> 

* Re: [dpdk-dev] [PATCH v6 2/7] hash: add assembly implementation of CRC32 intrinsics
  2015-01-29  8:48   ` [dpdk-dev] [PATCH v6 2/7] hash: add assembly implementation of CRC32 intrinsics Yerden Zhumabekov
@ 2015-02-02  5:15     ` Liang, Cunming
  2015-02-02  5:34       ` Yerden Zhumabekov
  0 siblings, 1 reply; 98+ messages in thread
From: Liang, Cunming @ 2015-02-02  5:15 UTC (permalink / raw)
  To: Yerden Zhumabekov, dev


On 1/29/2015 4:48 PM, Yerden Zhumabekov wrote:
> Added:
> - crc32c_sse42_u32() emits 'crc32l' asm instruction;
> - crc32c_sse42_u64() emits 'crc32q' asm instruction;
> - crc32c_sse42_u64_mimic(), wrapper in case of run on 32-bit platform.
>
> Signed-off-by: Yerden Zhumabekov <e_zhumabekov@sts.kz>
> ---
>   lib/librte_hash/rte_hash_crc.h |   34 ++++++++++++++++++++++++++++++++++
>   1 file changed, 34 insertions(+)
>
> diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
> index 4da7ca4..fe35996 100644
> --- a/lib/librte_hash/rte_hash_crc.h
> +++ b/lib/librte_hash/rte_hash_crc.h
> @@ -363,6 +363,40 @@ crc32c_2words(uint64_t data, uint32_t init_val)
>   	return crc;
>   }
>   
> +static inline uint32_t
> +crc32c_sse42_u32(uint32_t data, uint32_t init_val)
> +{
> +	__asm__ volatile(
> +			"crc32l %[data], %[init_val];"
> +			: [init_val] "+r" (init_val)
> +			: [data] "rm" (data));
> +	return init_val;
> +}
> +
> +static inline uint32_t
> +crc32c_sse42_u64(uint64_t data, uint64_t init_val)
> +{
> +	__asm__ volatile(
> +			"crc32q %[data], %[init_val];"
> +			: [init_val] "+r" (init_val)
> +			: [data] "rm" (data));
> +	return init_val;
> +}
[LCM] I'm curious: what is the benefit of replacing the CRC32 intrinsics
"_mm_crc32_u32/64" with inline assembly?
> +
> +static inline uint32_t
> +crc32c_sse42_u64_mimic(uint64_t data, uint64_t init_val)
> +{
> +	union {
> +		uint32_t u32[2];
> +		uint64_t u64;
> +	} d;
> +
> +	d.u64 = data;
> +	init_val = crc32c_sse42_u32(d.u32[0], init_val);
> +	init_val = crc32c_sse42_u32(d.u32[1], init_val);
> +	return init_val;
> +}
> +
>   /**
>    * Use single crc32 instruction to perform a hash on a 4 byte value.
>    *

* [dpdk-dev] HA: [PATCH v6 0/7] rte_hash_crc reworked to be platform-independent
  2015-02-02  3:31       ` Neil Horman
@ 2015-02-02  5:18         ` Жумабеков Ерден Мирзагулович
  2015-02-02  5:39         ` [dpdk-dev] " Yerden Zhumabekov
  1 sibling, 0 replies; 98+ messages in thread
From: Жумабеков Ерден Мирзагулович @ 2015-02-02  5:18 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

I've set CONFIG_RTE_MACHINE="default" in the config and the build was successful.

________________________________________
From: Neil Horman [nhorman@tuxdriver.com]
Sent: February 2, 2015 9:31
To: Жумабеков Ерден Мирзагулович
Cc: thomas.monjalon@6wind.com; dev@dpdk.org
Subject: Re: [PATCH v6 0/7] rte_hash_crc reworked to be platform-independent

On Mon, Feb 02, 2015 at 09:07:45AM +0600, Yerden Zhumabekov wrote:
>
> 01.02.2015 20:13, Neil Horman wrote:
> > On Thu, Jan 29, 2015 at 02:48:11PM +0600, Yerden Zhumabekov wrote:
> >> This is a rework of my previous patches improving performance of rte_hash_crc.
> >>
> >> Summary of changes:
> >> * software implementation of CRC32 introduced;
> >> * at run time, the algorithm can fall back to the software version if the CPU doesn't support SSE4.2;
> >> * best available algorithm is automatically detected upon application startup;
> >> * redundant compile checks removed from test utilities;
> >> * assembly code for emitting SSE4.2 instructions is used instead of built-in intrinsics;
> >> * rte_hash_crc() function performance significantly improved.
> >>
> >> v6 changes:
> >> * added 'const' qualifier to crc32c lookup tables declaration.
> > Just to be clear, this does build if you compile it against the "default"
> > machine type, correct?
> > Neil
>
> I think so; I've just successfully built it against the latest snapshot with
> RTE_TARGET equal to 'x86_64-native-linuxapp-gcc'.
>
Please confirm that setting the machine type to default builds and runs properly.

* Re: [dpdk-dev] [PATCH v6 2/7] hash: add assembly implementation of CRC32 intrinsics
  2015-02-02  5:15     ` Liang, Cunming
@ 2015-02-02  5:34       ` Yerden Zhumabekov
  2015-02-02  5:59         ` Liang, Cunming
  0 siblings, 1 reply; 98+ messages in thread
From: Yerden Zhumabekov @ 2015-02-02  5:34 UTC (permalink / raw)
  To: Liang, Cunming, dev


02.02.2015 11:15, Liang, Cunming wrote:
>
>> +static inline uint32_t
>> +crc32c_sse42_u64(uint64_t data, uint64_t init_val)
>> +{
>> +    __asm__ volatile(
>> +            "crc32q %[data], %[init_val];"
>> +            : [init_val] "+r" (init_val)
>> +            : [data] "rm" (data));
>> +    return init_val;
>> +}
> [LCM] I'm curious about the benefit of replacing CRC32 intrinsic
> "_mm_crc32_u32/64".

These intrinsics are not available on a platform which has no SSE4.2
support so the build would fail.

See previous suggestion from Neil: 
http://dpdk.org/ml/archives/dev/2014-November/008353.html
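
A minimal sketch of the idea, under stated assumptions: the crc32q instruction is emitted through inline assembly, so the file compiles without SSE4.2 being enabled for the build, and the instruction is only executed after a run-time CPU check. The sketch_* names are hypothetical and the check uses the GCC/clang builtin __builtin_cpu_supports() rather than whatever detection the patch set uses. The patch selects the algorithm once at startup and caches it in crc32_alg; querying on every call here is only for brevity. On targets other than x86-64 the sketch always takes the software path.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__) && defined(__x86_64__)
#define SKETCH_HAVE_X86_64_ASM 1
#else
#define SKETCH_HAVE_X86_64_ASM 0
#endif

/* Software fallback: fold 8 bytes, lowest byte first, bit by bit,
 * using the reflected CRC32C polynomial. */
static uint32_t
sketch_crc32c_sw_u64(uint64_t data, uint32_t crc)
{
	unsigned i;
	int bit;

	for (i = 0; i < 8; i++) {
		crc ^= (uint8_t)(data >> (8 * i));
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

#if SKETCH_HAVE_X86_64_ASM
/* Emits crc32q directly; the assembler accepts it even when the compiler
 * was not told to target SSE4.2. */
static uint32_t
sketch_crc32c_asm_u64(uint64_t data, uint32_t init_val)
{
	uint64_t crc = init_val;

	__asm__("crc32q %[data], %[crc]"
		: [crc] "+r" (crc)
		: [data] "rm" (data));
	return (uint32_t)crc;
}
#endif

/* Use the instruction only when the running CPU reports SSE4.2. */
uint32_t
sketch_crc_8byte(uint64_t data, uint32_t init_val)
{
#if SKETCH_HAVE_X86_64_ASM
	if (__builtin_cpu_supports("sse4.2"))
		return sketch_crc32c_asm_u64(data, init_val);
#endif
	return sketch_crc32c_sw_u64(data, init_val);
}

/* Sanity check: whichever path is selected must agree with the software one. */
void
sketch_crc_selfcheck(void)
{
	const uint64_t sample = 0x0123456789abcdefULL;

	assert(sketch_crc_8byte(sample, 0) == sketch_crc32c_sw_u64(sample, 0));
}
```

With the intrinsic, the same function would have to be compiled with SSE4.2 enabled, which is what makes a "default" machine build fail.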

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ

* Re: [dpdk-dev] [PATCH v6 0/7] rte_hash_crc reworked to be platform-independent
  2015-02-02  3:31       ` Neil Horman
  2015-02-02  5:18         ` [dpdk-dev] HA: " Жумабеков Ерден Мирзагулович
@ 2015-02-02  5:39         ` Yerden Zhumabekov
  2015-02-19 15:21           ` Bruce Richardson
  1 sibling, 1 reply; 98+ messages in thread
From: Yerden Zhumabekov @ 2015-02-02  5:39 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev


02.02.2015 9:31, Neil Horman wrote:
> On Mon, Feb 02, 2015 at 09:07:45AM +0600, Yerden Zhumabekov wrote:
>
>> I think so; I've just successfully built it against the latest snapshot with
>> RTE_TARGET equal to 'x86_64-native-linuxapp-gcc'.
>>
> Please confirm that setting the machine type to default builds and runs properly.

If I understood you correctly, I set CONFIG_RTE_MACHINE="default" in the
config and the build was successful.

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ

* Re: [dpdk-dev] [PATCH v6 2/7] hash: add assembly implementation of CRC32 intrinsics
  2015-02-02  5:34       ` Yerden Zhumabekov
@ 2015-02-02  5:59         ` Liang, Cunming
  0 siblings, 0 replies; 98+ messages in thread
From: Liang, Cunming @ 2015-02-02  5:59 UTC (permalink / raw)
  To: Yerden Zhumabekov, dev

Got it, thanks.

> -----Original Message-----
> From: Yerden Zhumabekov [mailto:e_zhumabekov@sts.kz]
> Sent: Monday, February 02, 2015 1:34 PM
> To: Liang, Cunming; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 2/7] hash: add assembly implementation of
> CRC32 intrinsics
> 
> 
> 02.02.2015 11:15, Liang, Cunming wrote:
> >
> >> +static inline uint32_t
> >> +crc32c_sse42_u64(uint64_t data, uint64_t init_val)
> >> +{
> >> +    __asm__ volatile(
> >> +            "crc32q %[data], %[init_val];"
> >> +            : [init_val] "+r" (init_val)
> >> +            : [data] "rm" (data));
> >> +    return init_val;
> >> +}
> > [LCM] I'm curious about the benefit of replacing CRC32 intrinsic
> > "_mm_crc32_u32/64".
> 
> These intrinsics are not available on a platform which has no SSE4.2
> support so the build would fail.
> 
> See previous suggestion from Neil:
> http://dpdk.org/ml/archives/dev/2014-November/008353.html
> 
> --
> Sincerely,
> 
> Yerden Zhumabekov
> State Technical Service
> Astana, KZ


* Re: [dpdk-dev] [PATCH v6 0/7] rte_hash_crc reworked to be platform-independent
  2015-02-02  5:39         ` [dpdk-dev] " Yerden Zhumabekov
@ 2015-02-19 15:21           ` Bruce Richardson
  2015-02-23 17:36             ` Thomas Monjalon
  0 siblings, 1 reply; 98+ messages in thread
From: Bruce Richardson @ 2015-02-19 15:21 UTC (permalink / raw)
  To: Yerden Zhumabekov; +Cc: dev

On Mon, Feb 02, 2015 at 11:39:18AM +0600, Yerden Zhumabekov wrote:
> 
> 02.02.2015 9:31, Neil Horman wrote:
> > On Mon, Feb 02, 2015 at 09:07:45AM +0600, Yerden Zhumabekov wrote:
> >
> >> I think so; I've just successfully built it against the latest snapshot with
> >> RTE_TARGET equal to 'x86_64-native-linuxapp-gcc'.
> >>
> > Please confirm that setting the machine type to default builds and runs properly.
> 
> If I understood you correctly, I set CONFIG_RTE_MACHINE="default" in the
> config and the build was successful.
> 

Confirmed, this worked for me too.
Looking at the patches, they look good. However, one thing I think we are missing
is a unit test to verify that all our CRC implementations give the same result.
That would be useful as a sanity check of the software fallback especially. The
existing hash tests test the hash table implementation rather than the
mathematical algorithm used to compute the hash values.

Overall, though, software fallback for CRC is something well worthwhile having.

Series Acked-by: Bruce Richardson <bruce.richardson@intel.com>

> -- 
> Sincerely,
> 
> Yerden Zhumabekov
> State Technical Service
> Astana, KZ
> 
> 

* Re: [dpdk-dev] [PATCH v6 0/7] rte_hash_crc reworked to be platform-independent
  2015-02-19 15:21           ` Bruce Richardson
@ 2015-02-23 17:36             ` Thomas Monjalon
  2015-02-24  3:00               ` Yerden Zhumabekov
  0 siblings, 1 reply; 98+ messages in thread
From: Thomas Monjalon @ 2015-02-23 17:36 UTC (permalink / raw)
  To: Yerden Zhumabekov; +Cc: dev

2015-02-19 15:21, Bruce Richardson:
> Confirmed, this worked for me too.
> Looking at the patches, they look good. However, one thing I think we are missing
> is a unit test to verify that all our CRC implementations give the same result.
> That would be useful as a sanity check of the software fallback especially. The
> existing hash tests test the hash table implementation rather than the
> mathematical algorithm used to compute the hash values.
> 
> Overall, though, software fallback for CRC is something well worthwhile having.
> 
> Series Acked-by: Bruce Richardson <bruce.richardson@intel.com>

Applied, thanks

Note: running the doxygen build helped me find and fix a small
mismatch (the parameter alg was named flag in the comment).

* Re: [dpdk-dev] [PATCH v6 0/7] rte_hash_crc reworked to be platform-independent
  2015-02-23 17:36             ` Thomas Monjalon
@ 2015-02-24  3:00               ` Yerden Zhumabekov
  2015-02-24  3:10                 ` Thomas Monjalon
  0 siblings, 1 reply; 98+ messages in thread
From: Yerden Zhumabekov @ 2015-02-24  3:00 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev


23.02.2015 23:36, Thomas Monjalon wrote:
> 2015-02-19 15:21, Bruce Richardson:
>> Confirmed, this worked for me too.
>> Looking at the patches, they look good. However, one thing I think we are missing
>> is a unit test to verify that all our CRC implementations give the same result.
>> That would be useful as a sanity check of the software fallback especially. The
>> existing hash tests test the hash table implementation rather than the
>> mathematical algorithm used to compute the hash values.
>>
>> Overall, though, software fallback for CRC is something well worthwhile having.
>>
>> Series Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> Applied, thanks
>
> Note: running the doxygen build helped me find and fix a small
> mismatch (the parameter alg was named flag in the comment).

Thanks, Bruce, Thomas.

As for yielding the same hash value, I made a test which runs every
CRC32 implementation across a number of randomly generated data sets.
Results are equal on my trial run.

I can post a patch for test_hash.c a bit later if this kind of check
suffices.
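
A check of this kind could look roughly like the sketch below. It is not the test that was run: two stand-in implementations (a bitwise CRC32C and a table-driven one, both with hypothetical sketch_* names) are compared on pseudo-random buffers of random length with random initial values. In the DPDK tree the compared paths would presumably be selected with rte_hash_crc_set_alg(), assuming it behaves as the v2 changelog describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CRC32C_POLY 0x82F63B78u	/* reflected CRC32C polynomial */
#define SKETCH_MAX_LEN 64

static uint32_t sketch_table[256];

/* Fold one byte into the CRC, bit by bit. */
static uint32_t
sketch_bit_step(uint8_t byte, uint32_t crc)
{
	int bit;

	crc ^= byte;
	for (bit = 0; bit < 8; bit++)
		crc = (crc >> 1) ^ (SKETCH_CRC32C_POLY & (0u - (crc & 1u)));
	return crc;
}

/* Build the 256-entry lookup table from the bitwise step. */
void
sketch_table_init(void)
{
	unsigned i;

	for (i = 0; i < 256; i++)
		sketch_table[i] = sketch_bit_step((uint8_t)i, 0);
}

/* Implementation 1: bitwise reference. */
uint32_t
sketch_crc32c_bitwise(const void *data, size_t len, uint32_t crc)
{
	const uint8_t *p = data;

	while (len--)
		crc = sketch_bit_step(*p++, crc);
	return crc;
}

/* Implementation 2: table-driven, one lookup per byte. */
uint32_t
sketch_crc32c_lookup(const void *data, size_t len, uint32_t crc)
{
	const uint8_t *p = data;

	while (len--)
		crc = sketch_table[(crc ^ *p++) & 0xffu] ^ (crc >> 8);
	return crc;
}

/* Small deterministic generator so that failures are reproducible. */
static uint32_t
sketch_xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Returns the number of rounds in which the two implementations disagree. */
unsigned
sketch_count_mismatches(unsigned rounds, uint32_t seed)
{
	uint8_t buf[SKETCH_MAX_LEN];
	uint32_t state = seed ? seed : 1u;	/* xorshift state must be nonzero */
	unsigned r, mismatches = 0;

	sketch_table_init();
	for (r = 0; r < rounds; r++) {
		size_t len = sketch_xorshift32(&state) % (SKETCH_MAX_LEN + 1);
		uint32_t init = sketch_xorshift32(&state);
		size_t i;

		assert(len <= sizeof(buf));
		for (i = 0; i < len; i++)
			buf[i] = (uint8_t)sketch_xorshift32(&state);
		if (sketch_crc32c_bitwise(buf, len, init) !=
		    sketch_crc32c_lookup(buf, len, init))
			mismatches++;
	}
	return mismatches;
}
```

Random data shows that the implementations agree with each other; adding one fixed known-answer vector would also catch the case where all of them share the same mistake.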

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ

* Re: [dpdk-dev] [PATCH v6 0/7] rte_hash_crc reworked to be platform-independent
  2015-02-24  3:00               ` Yerden Zhumabekov
@ 2015-02-24  3:10                 ` Thomas Monjalon
  2015-02-24  9:12                   ` Bruce Richardson
  0 siblings, 1 reply; 98+ messages in thread
From: Thomas Monjalon @ 2015-02-24  3:10 UTC (permalink / raw)
  To: Yerden Zhumabekov; +Cc: dev

2015-02-24 09:00, Yerden Zhumabekov:
> 
> 23.02.2015 23:36, Thomas Monjalon wrote:
> > 2015-02-19 15:21, Bruce Richardson:
> >> Confirmed, this worked for me too.
> >> Looking at the patches, they look good. However, one thing I think we are missing
> >> is a unit test to verify that all our CRC implementations give the same result.
> >> That would be useful as a sanity check of the software fallback especially. The
> >> existing hash tests test the hash table implementation rather than the
> >> mathematical algorithm used to compute the hash values.
> >>
> >> Overall, though, software fallback for CRC is something well worthwhile having.
> >>
> >> Series Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> > Applied, thanks
> >
> > Note: running the doxygen build helped me find and fix a small
> > mismatch (the parameter alg was named flag in the comment).
> 
> Thanks, Bruce, Thomas.
> 
> As for yielding the same hash value, I made a test which runs every
> CRC32 implementation across a number of randomly generated data sets.
> Results are equal on my trial run.
> 
> I can post a patch for test_hash.c a bit later if this kind of check
> suffices.

Yes, seems interesting. Thanks

* Re: [dpdk-dev] [PATCH v6 0/7] rte_hash_crc reworked to be platform-independent
  2015-02-24  3:10                 ` Thomas Monjalon
@ 2015-02-24  9:12                   ` Bruce Richardson
  0 siblings, 0 replies; 98+ messages in thread
From: Bruce Richardson @ 2015-02-24  9:12 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Tue, Feb 24, 2015 at 04:10:34AM +0100, Thomas Monjalon wrote:
> 2015-02-24 09:00, Yerden Zhumabekov:
> > 
> > 23.02.2015 23:36, Thomas Monjalon wrote:
> > > 2015-02-19 15:21, Bruce Richardson:
> > >> Confirmed, this worked for me too.
> > >> Looking at the patches, they look good. However, one thing I think we are missing
> > >> is a unit test to verify that all our CRC implementations give the same result.
> > >> That would be useful as a sanity check of the software fallback especially. The
> > >> existing hash tests test the hash table implementation rather than the
> > >> mathematical algorithm used to compute the hash values.
> > >>
> > >> Overall, though, software fallback for CRC is something well worthwhile having.
> > >>
> > >> Series Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> > > Applied, thanks
> > >
> > > Note: running the doxygen build helped me find and fix a small
> > > mismatch (the parameter alg was named flag in the comment).
> > 
> > Thanks, Bruce, Thomas.
> > 
> > As for yielding the same hash value, I made a test which runs every
> > CRC32 implementation across a number of randomly generated data sets.
> > Results are equal on my trial run.
> > 
> > I can post a patch for test_hash.c a bit later if this kind of check
> > suffices.
> 
> Yes, seems interesting. Thanks
> 
+1

end of thread, other threads:[~2015-02-24  9:12 UTC | newest]

Thread overview: 98+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-03  6:05 [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call Yerden Zhumabekov
2014-09-03  6:05 ` [dpdk-dev] [PATCH 1/2] hash: add new rte_hash_crc_8byte call Yerden Zhumabekov
2014-09-03  6:05 ` [dpdk-dev] [PATCH 2/2] hash: rte_hash_crc uses 8- and 4-byte CRC32 intrinsics Yerden Zhumabekov
2014-11-13 17:33 ` [dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call Thomas Monjalon
2014-11-14  0:52   ` Neil Horman
2014-11-14  7:15     ` Yerden Zhumabekov
2014-11-14 11:33       ` Neil Horman
2014-11-14 11:57         ` Yerden Zhumabekov
2014-11-14 13:53           ` Neil Horman
2014-11-14 14:33             ` Thomas Monjalon
2014-11-14 16:43             ` Yerden Zhumabekov
2014-11-14 18:41               ` Neil Horman
2014-11-15 21:45                 ` Yerden Zhumabekov
2014-11-16 17:59 ` [dpdk-dev] [PATCH v2 0/4] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
2014-11-17 11:31   ` Neil Horman
2014-11-17 11:54     ` Yerden Zhumabekov
2014-11-25 17:05       ` Stephen Hemminger
2014-11-18  3:21   ` [dpdk-dev] [PATCH v3 0/5] " Yerden Zhumabekov
2014-11-18  3:21     ` [dpdk-dev] [PATCH v3 1/5] hash: add software CRC32 implementation Yerden Zhumabekov
2014-11-25 17:34       ` Stephen Hemminger
2014-11-18  3:21     ` [dpdk-dev] [PATCH v3 2/5] hash: add new rte_hash_crc_8byte call Yerden Zhumabekov
2014-11-18  3:21     ` [dpdk-dev] [PATCH v3 3/5] hash: add fallback to software CRC32 implementation Yerden Zhumabekov
2014-11-18  4:56       ` Yerden Zhumabekov
2014-11-18 13:33         ` Neil Horman
2014-11-18 13:37           ` Yerden Zhumabekov
2014-11-18 13:43           ` Thomas Monjalon
2014-11-18  3:21     ` [dpdk-dev] [PATCH v3 4/5] hash: rte_hash_crc() slices data into 8-byte pieces Yerden Zhumabekov
2014-11-18  3:25     ` [dpdk-dev] [PATCH v3 5/5] test: remove redundant compile checks Yerden Zhumabekov
2014-11-16 17:59 ` [dpdk-dev] [PATCH v2 1/4] hash: add software CRC32 implementation Yerden Zhumabekov
2014-11-16 17:59 ` [dpdk-dev] [PATCH v2 2/4] hash: add new rte_hash_crc_8byte call Yerden Zhumabekov
2014-11-16 17:59 ` [dpdk-dev] [PATCH v2 3/4] hash: add fallback to software CRC32 implementation Yerden Zhumabekov
2014-11-17 12:34   ` Ananyev, Konstantin
2014-11-17 12:41     ` Yerden Zhumabekov
2014-11-17 14:06     ` Neil Horman
2014-11-16 17:59 ` [dpdk-dev] [PATCH v2 4/4] hash: rte_hash_crc() slices data into 8-byte pieces Yerden Zhumabekov
2014-11-18 14:03 ` [dpdk-dev] [PATCH v4 0/5] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
2014-11-18 14:03   ` [dpdk-dev] [PATCH v4 1/5] hash: add software CRC32 implementation Yerden Zhumabekov
2014-11-18 14:03   ` [dpdk-dev] [PATCH v4 2/5] hash: add new rte_hash_crc_8byte call Yerden Zhumabekov
2014-11-18 14:03   ` [dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation Yerden Zhumabekov
2014-11-18 14:41     ` Neil Horman
2014-11-18 15:06       ` Yerden Zhumabekov
2014-11-18 16:00         ` Neil Horman
2014-11-18 16:04           ` Bruce Richardson
2014-11-18 16:08             ` Bruce Richardson
2014-11-18 16:38             ` Neil Horman
2014-11-18 17:13           ` Yerden Zhumabekov
2014-11-18 17:29             ` Wang, Shawn
2014-11-19  4:07               ` Yerden Zhumabekov
2014-11-18 17:46             ` Neil Horman
2014-11-18 17:52               ` Bruce Richardson
2014-11-18 21:36                 ` Neil Horman
2014-11-19  3:51                   ` Yerden Zhumabekov
2014-11-19 10:16                   ` Bruce Richardson
2014-11-19 11:34                     ` Neil Horman
2014-11-19 11:38                       ` Bruce Richardson
2014-11-19 11:50                         ` Ananyev, Konstantin
2014-11-19 11:59                           ` Yerden Zhumabekov
2014-11-19 15:05                           ` Neil Horman
2014-11-19 16:51                             ` Ananyev, Konstantin
2014-11-19 11:35                     ` Yerden Zhumabekov
2014-11-19 15:07                       ` Neil Horman
2014-11-20  3:04                         ` Yerden Zhumabekov
2014-11-18 17:58               ` Yerden Zhumabekov
2014-11-18 14:03   ` [dpdk-dev] [PATCH v4 4/5] hash: rte_hash_crc() slices data into 8-byte pieces Yerden Zhumabekov
2014-11-18 14:05   ` [dpdk-dev] [PATCH v4 5/5] test: remove redundant compile checks Yerden Zhumabekov
2014-11-20  5:15 ` [dpdk-dev] [PATCH v5 0/7] rte_hash_crc reworked to be platform-independent Yerden Zhumabekov
2014-11-20  5:16   ` [dpdk-dev] [PATCH v5 1/7] hash: add software CRC32 implementation Yerden Zhumabekov
2014-11-20  5:16   ` [dpdk-dev] [PATCH v5 2/7] hash: add assembly implementation of CRC32 intrinsics Yerden Zhumabekov
2014-11-20  5:16   ` [dpdk-dev] [PATCH v5 3/7] hash: replace built-in functions implementing SSE4.2 Yerden Zhumabekov
2014-11-20  5:16   ` [dpdk-dev] [PATCH v5 4/7] hash: add rte_hash_crc_8byte function Yerden Zhumabekov
2014-11-21 11:22     ` Neil Horman
2014-11-21 11:26       ` Yerden Zhumabekov
2014-11-20  5:17   ` [dpdk-dev] [PATCH v5 6/7] hash: rte_hash_crc() slices data into 8-byte pieces Yerden Zhumabekov
2014-11-20  5:17   ` [dpdk-dev] [PATCH v5 7/7] test: remove redundant compile checks Yerden Zhumabekov
2014-11-20  5:17   ` [dpdk-dev] [PATCH v5 5/7] hash: add fallback to software CRC32 implementation Yerden Zhumabekov
2014-11-27 21:04   ` [dpdk-dev] [PATCH v5 0/7] rte_hash_crc reworked to be platform-independent Thomas Monjalon
2014-11-28  3:28     ` Yerden Zhumabekov
2015-01-29  8:48 ` [dpdk-dev] [PATCH v6 " Yerden Zhumabekov
2015-01-29  8:48   ` [dpdk-dev] [PATCH v6 1/7] hash: add software CRC32 implementation Yerden Zhumabekov
2015-01-29  8:48   ` [dpdk-dev] [PATCH v6 2/7] hash: add assembly implementation of CRC32 intrinsics Yerden Zhumabekov
2015-02-02  5:15     ` Liang, Cunming
2015-02-02  5:34       ` Yerden Zhumabekov
2015-02-02  5:59         ` Liang, Cunming
2015-01-29  8:49   ` [dpdk-dev] [PATCH v6 3/7] hash: replace built-in functions implementing SSE4.2 Yerden Zhumabekov
2015-01-29  8:49   ` [dpdk-dev] [PATCH v6 4/7] hash: add rte_hash_crc_8byte function Yerden Zhumabekov
2015-01-29  8:50   ` [dpdk-dev] [PATCH v6 5/7] hash: add fallback to software CRC32 implementation Yerden Zhumabekov
2015-01-29  8:50   ` [dpdk-dev] [PATCH v6 6/7] hash: rte_hash_crc() slices data into 8-byte pieces Yerden Zhumabekov
2015-01-29  8:50   ` [dpdk-dev] [PATCH v6 7/7] test: remove redundant compile checks Yerden Zhumabekov
2015-02-01 14:13   ` [dpdk-dev] [PATCH v6 0/7] rte_hash_crc reworked to be platform-independent Neil Horman
2015-02-02  3:07     ` Yerden Zhumabekov
2015-02-02  3:31       ` Neil Horman
2015-02-02  5:18         ` [dpdk-dev] HA: " Жумабеков Ерден Мирзагулович
2015-02-02  5:39         ` [dpdk-dev] " Yerden Zhumabekov
2015-02-19 15:21           ` Bruce Richardson
2015-02-23 17:36             ` Thomas Monjalon
2015-02-24  3:00               ` Yerden Zhumabekov
2015-02-24  3:10                 ` Thomas Monjalon
2015-02-24  9:12                   ` Bruce Richardson
