From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <konstantin.ananyev@intel.com>
Received: from mga14.intel.com (mga14.intel.com [192.55.52.115])
 by dpdk.org (Postfix) with ESMTP id D089A2A07
 for <dev@dpdk.org>; Thu,  7 May 2015 13:11:21 +0200 (CEST)
Received: from fmsmga002.fm.intel.com ([10.253.24.26])
 by fmsmga103.fm.intel.com with ESMTP; 07 May 2015 04:11:17 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.13,384,1427785200"; d="scan'208";a="722147973"
Received: from irsmsx102.ger.corp.intel.com ([163.33.3.155])
 by fmsmga002.fm.intel.com with ESMTP; 07 May 2015 04:11:12 -0700
Received: from irsmsx105.ger.corp.intel.com ([169.254.7.178]) by
 IRSMSX102.ger.corp.intel.com ([169.254.2.145]) with mapi id 14.03.0224.002;
 Thu, 7 May 2015 12:11:08 +0100
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: "De Lara Guarch, Pablo" <pablo.de.lara.guarch@intel.com>, "dev@dpdk.org"
 <dev@dpdk.org>
Thread-Topic: [dpdk-dev] [PATCH v3 3/6] hash: update jhash function with the
 latest available
Thread-Index: AQHQh0Ifa916QN+t0kuol3MAxqDHRp1uDeowgACTEICAAGVfwIABVYBQ
Date: Thu, 7 May 2015 11:11:08 +0000
Message-ID: <2601191342CEEE43887BDE71AB97725821425383@irsmsx105.ger.corp.intel.com>
References: <1429874587-17939-1-git-send-email-pablo.de.lara.guarch@intel.com>
 <1430837034-21031-1-git-send-email-pablo.de.lara.guarch@intel.com>
 <1430837034-21031-4-git-send-email-pablo.de.lara.guarch@intel.com>
 <2601191342CEEE43887BDE71AB97725821424ED1@irsmsx105.ger.corp.intel.com>
 <E115CCD9D858EF4F90C690B0DCB4D8972729A7B7@IRSMSX108.ger.corp.intel.com> 
Accept-Language: en-IE, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [163.33.239.182]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH v3 3/6] hash: update jhash function with
	the	latest available
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Thu, 07 May 2015 11:11:23 -0000



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Wednesday, May 06, 2015 5:11 PM
> To: De Lara Guarch, Pablo; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v3 3/6] hash: update jhash function with t=
he latest available
>=20
> Hi Pablo,
>=20
> > -----Original Message-----
> > From: De Lara Guarch, Pablo
> > Sent: Wednesday, May 06, 2015 10:36 AM
> > To: Ananyev, Konstantin; dev@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH v3 3/6] hash: update jhash function with=
 the latest available
> >
> > Hi Konstantin,
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Wednesday, May 06, 2015 1:36 AM
> > > To: De Lara Guarch, Pablo; dev@dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH v3 3/6] hash: update jhash function wi=
th the
> > > latest available
> > >
> > >
> > > Hi Pablo,
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Pablo de Lara
> > > > Sent: Tuesday, May 05, 2015 3:44 PM
> > > > To: dev@dpdk.org
> > > > Subject: [dpdk-dev] [PATCH v3 3/6] hash: update jhash function with=
 the
> > > latest available
> > > >
> > > > Jenkins hash function was developed originally in 1996,
> > > > and was integrated in first versions of DPDK.
> > > > The function has been improved in 2006,
> > > > achieving up to 60% better performance, compared to the original on=
e.
> > > >
> > > > This patch integrates that code into the rte_jhash library.
> > > >
> > > > Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
> > > > ---
> > > >  lib/librte_hash/rte_jhash.h |  261
> > > +++++++++++++++++++++++++++++++------------
> > > >  1 files changed, 188 insertions(+), 73 deletions(-)
> > > >
> > > > diff --git a/lib/librte_hash/rte_jhash.h b/lib/librte_hash/rte_jhas=
h.h
> > > > index a4bf5a1..0e96b7c 100644
> > > > --- a/lib/librte_hash/rte_jhash.h
> > > > +++ b/lib/librte_hash/rte_jhash.h
> > > > @@ -1,7 +1,7 @@
> > > >  /*-
> > > >   *   BSD LICENSE
> > > >   *
> > > > - *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved=
.
> > > > + *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved=
.
> > > >   *   All rights reserved.
> > > >   *
> > > >   *   Redistribution and use in source and binary forms, with or wi=
thout
> > > > @@ -45,38 +45,68 @@ extern "C" {
> > > >  #endif
> > > >
> > > >  #include <stdint.h>
> > > > +#include <string.h>
> > > > +#include <rte_byteorder.h>
> > > >
> > > >  /* jhash.h: Jenkins hash support.
> > > >   *
> > > > - * Copyright (C) 1996 Bob Jenkins (bob_jenkins@burtleburtle.net)
> > > > + * Copyright (C) 2006 Bob Jenkins (bob_jenkins@burtleburtle.net)
> > > >   *
> > > >   * http://burtleburtle.net/bob/hash/
> > > >   *
> > > >   * These are the credits from Bob's sources:
> > > >   *
> > > > - * lookup2.c, by Bob Jenkins, December 1996, Public Domain.
> > > > - * hash(), hash2(), hash3, and mix() are externally useful functio=
ns.
> > > > - * Routines to test the hash are included if SELF_TEST is defined.
> > > > - * You can use this free for any purpose.  It has no warranty.
> > > > + * lookup3.c, by Bob Jenkins, May 2006, Public Domain.
> > > > + *
> > > > + * These are functions for producing 32-bit hashes for hash table =
lookup.
> > > > + * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and =
final()
> > > > + * are externally useful functions.  Routines to test the hash are=
 included
> > > > + * if SELF_TEST is defined.  You can use this free for any purpose=
.  It's in
> > > > + * the public domain.  It has no warranty.
> > > >   *
> > > >   * $FreeBSD$
> > > >   */
> > > >
> > > > +#define rot(x, k) (((x) << (k)) | ((x) >> (32-(k))))
> > > > +
> > > >  /** @internal Internal function. NOTE: Arguments are modified. */
> > > >  #define __rte_jhash_mix(a, b, c) do { \
> > > > -	a -=3D b; a -=3D c; a ^=3D (c>>13); \
> > > > -	b -=3D c; b -=3D a; b ^=3D (a<<8); \
> > > > -	c -=3D a; c -=3D b; c ^=3D (b>>13); \
> > > > -	a -=3D b; a -=3D c; a ^=3D (c>>12); \
> > > > -	b -=3D c; b -=3D a; b ^=3D (a<<16); \
> > > > -	c -=3D a; c -=3D b; c ^=3D (b>>5); \
> > > > -	a -=3D b; a -=3D c; a ^=3D (c>>3); \
> > > > -	b -=3D c; b -=3D a; b ^=3D (a<<10); \
> > > > -	c -=3D a; c -=3D b; c ^=3D (b>>15); \
> > > > +	a -=3D c; a ^=3D rot(c, 4); c +=3D b; \
> > > > +	b -=3D a; b ^=3D rot(a, 6); a +=3D c; \
> > > > +	c -=3D b; c ^=3D rot(b, 8); b +=3D a; \
> > > > +	a -=3D c; a ^=3D rot(c, 16); c +=3D b; \
> > > > +	b -=3D a; b ^=3D rot(a, 19); a +=3D c; \
> > > > +	c -=3D b; c ^=3D rot(b, 4); b +=3D a; \
> > > > +} while (0)
> > > > +
> > > > +#define __rte_jhash_final(a, b, c) do { \
> > > > +	c ^=3D b; c -=3D rot(b, 14); \
> > > > +	a ^=3D c; a -=3D rot(c, 11); \
> > > > +	b ^=3D a; b -=3D rot(a, 25); \
> > > > +	c ^=3D b; c -=3D rot(b, 16); \
> > > > +	a ^=3D c; a -=3D rot(c, 4);  \
> > > > +	b ^=3D a; b -=3D rot(a, 14); \
> > > > +	c ^=3D b; c -=3D rot(b, 24); \
> > > >  } while (0)
> > > >
> > > >  /** The golden ratio: an arbitrary value. */
> > > > -#define RTE_JHASH_GOLDEN_RATIO      0x9e3779b9
> > > > +#define RTE_JHASH_GOLDEN_RATIO      0xdeadbeef
> > > > +
> > > > +#if RTE_BYTE_ORDER =3D=3D RTE_LITTLE_ENDIAN
> > > > +#define RTE_JHASH_BYTE0_SHIFT 0
> > > > +#define RTE_JHASH_BYTE1_SHIFT 8
> > > > +#define RTE_JHASH_BYTE2_SHIFT 16
> > > > +#define RTE_JHASH_BYTE3_SHIFT 24
> > > > +#else
> > > > +#define RTE_JHASH_BYTE0_SHIFT 24
> > > > +#define RTE_JHASH_BYTE1_SHIFT 16
> > > > +#define RTE_JHASH_BYTE2_SHIFT 8
> > > > +#define RTE_JHASH_BYTE3_SHIFT 0
> > > > +#endif
> > > > +
> > > > +#define LOWER8b_MASK rte_le_to_cpu_32(0xff)
> > > > +#define LOWER16b_MASK rte_le_to_cpu_32(0xffff)
> > > > +#define LOWER24b_MASK rte_le_to_cpu_32(0xffffff)
> > > >
> > > >  /**
> > > >   * The most generic version, hashes an arbitrary sequence
> > > > @@ -95,42 +125,119 @@ extern "C" {
> > > >  static inline uint32_t
> > > >  rte_jhash(const void *key, uint32_t length, uint32_t initval)
> > > >  {
> > > > -	uint32_t a, b, c, len;
> > > > -	const uint8_t *k =3D (const uint8_t *)key;
> > > > -	const uint32_t *k32 =3D (const uint32_t *)key;
> > > > +	uint32_t a, b, c;
> > > > +	union {
> > > > +		const void *ptr;
> > > > +		size_t i;
> > > > +	} u;
> > > >
> > > > -	len =3D length;
> > > > -	a =3D b =3D RTE_JHASH_GOLDEN_RATIO;
> > > > -	c =3D initval;
> > > > +	/* Set up the internal state */
> > > > +	a =3D b =3D c =3D RTE_JHASH_GOLDEN_RATIO + ((uint32_t)length) + i=
nitval;
> > > >
> > > > -	while (len >=3D 12) {
> > > > -		a +=3D k32[0];
> > > > -		b +=3D k32[1];
> > > > -		c +=3D k32[2];
> > > > +	u.ptr =3D key;
> > > >
> > > > -		__rte_jhash_mix(a,b,c);
> > > > +	/* Check key alignment. For x86 architecture, first case is alway=
s
> > > optimal */
> > > > +	if (!strcmp(RTE_ARCH,"x86_64") || !strcmp(RTE_ARCH,"i686") || (u.=
i
> > > & 0x3) =3D=3D 0) {
> > >
> > > Wonder why strcmp(), why not something like: 'if defined(RTE_ARCH_I68=
6)
> > > || defined(RTE_ARCH_X86_64)' as in all other places?
> > > Another question what would be in case of RTE_ARCH=3D"x86_x32"?
> > > Konstantin
> >
> > Functionally is the same and using this method, I can integrate all con=
ditions in one line, so it takes less code.
> > I also checked the assembly code, and the compiler removes the check if=
 it is Intel architecture, so performance remains the same.
>=20
> Well,  yes I think most modern compilers  treat strcmp() as a builtin fun=
ction and are able to optimise these strcmp() calls off for that
> case.
> But  we probably can't guarantee that it would always be the case for all=
 different compiler/libc combinations.
> Again, by some reason user might need to use ' -fno-builtin' flag while b=
uilding his stuff.
> So I would use pre-processor macros here, it is more predictable.
> Again, that way it is consistent with other places.
>=20
> Actually I wonder do you really need such sort of diversity for aligned/n=
on-aligned case?
> Wonder wouldn't something like that work for you:
>=20
> #infdef  RTE_ARCH_X86
>         const uint32_t *k =3D (uint32_t *)((uintptr_t)key & (uintptr_t)~3=
);
>         const uint32_t s =3D ((uintptr_t)key & 3) * CHAR_BIT;
> #else /*X86*/
>         const uint32_t *k =3D key;
>         const uint32_t s =3D 0;
> #endif
>=20
>   while (len > 12) {
>                 a +=3D k[0] >> s | (uint64_t)k[1] << (32 - s);
>                 b +=3D k[1] >> s | (uint64_t)k[2] << (32 - s);
>                 c +=3D k[2] >> s | (uint64_t)k[3] << (32 - s);
>                 k +=3D 3;
>                 length -=3D 12;
> }
>=20
> switch (length) {
> case 12:
>     a +=3D k[0] >> s | (uint64_t)k[1] << (32 - s);
>     b +=3D k[1] >> s | (uint64_t)k[2] << (32 - s);
>     c +=3D k[2] >> s | (uint64_t)k[3] << (32 - s);
>     break;
> case 11:
>     a +=3D k[0] >> s | (uint64_t)k[1] << (32 - s);
>     b +=3D k[1] >> s | (uint64_t)k[2] << (32 - s);
>     c +=3D (k[2] >> s | (uint64_t)k[3] << (32 - s)) & & LOWER24b_MASK;
>     break;
> ...
> case 1:
>    a +=3D (k[0] >> s | (uint64_t)k[1] << (32 - s)) & LOWER8b_MASK;
>    break;
> ...
>=20
> In that way, even for non-aligned you don't need do 4B reads.
> For x86, compiler would do it's optimisation work and strip off '>> s | (=
uint64_t)k[..] << (32 - s);'.
>=20

Actually, as Sergio pointed out, that approach might penalise non-x86 4B al=
igned case.=20
So probably, a special path for s=3D=3D 0 is still needed, i.e:
if (s=3D=3D0) {...; a +=3D k[0]; ...} else {...; a +=3D k[0] >> s | (uint64=
_t)k[1] << (32 - s);...}
Konstantin

> >
> > Re x86_x32, you are right, probably I need to include it. Although, I j=
ust realized that it is not used in any other place.
> > Wonder if we should include it somewhere else? E.g. rte_hash_crc.h
>=20
> Yep, that's true we are not doing it for hash_crc also...
> Would probably good to have some sort of ' RTE_ARCH_X86' - that would be =
defined for all x86 targets and use it whenever applicable.
> But I suppose, that's a subject for another patch.
>=20
> Konstantin
>=20