From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pablo.de.lara.guarch@intel.com>
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by dpdk.org (Postfix) with ESMTP id 8D3FAC33A
 for <dev@dpdk.org>; Fri, 17 Apr 2015 18:03:48 +0200 (CEST)
Received: from orsmga001.jf.intel.com ([10.7.209.18])
 by fmsmga101.fm.intel.com with ESMTP; 17 Apr 2015 09:03:47 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.11,595,1422950400"; d="scan'208";a="681750081"
Received: from irsmsx103.ger.corp.intel.com ([163.33.3.157])
 by orsmga001.jf.intel.com with ESMTP; 17 Apr 2015 09:03:46 -0700
Received: from irsmsx108.ger.corp.intel.com ([169.254.11.216]) by
 IRSMSX103.ger.corp.intel.com ([169.254.3.129]) with mapi id 14.03.0224.002;
 Fri, 17 Apr 2015 17:03:45 +0100
From: "De Lara Guarch, Pablo" <pablo.de.lara.guarch@intel.com>
To: "Richardson, Bruce" <bruce.richardson@intel.com>
Thread-Topic: [dpdk-dev] [PATCH] hash: update jhash function with the latest
 available
Thread-Index: AQHQeE3fP6lDRQCLwE6sqGc+LPfFfJ1RXobw
Date: Fri, 17 Apr 2015 16:03:44 +0000
Message-ID: <E115CCD9D858EF4F90C690B0DCB4D89727289E30@IRSMSX108.ger.corp.intel.com>
References: <1429190819-27402-1-git-send-email-pablo.de.lara.guarch@intel.com>
 <20150416140115.GA10552@bricha3-MOBL3>
In-Reply-To: <20150416140115.GA10552@bricha3-MOBL3>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [163.33.239.180]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] hash: update jhash function with the latest
 available
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 17 Apr 2015 16:03:49 -0000


> -----Original Message-----
> From: Richardson, Bruce
> Sent: Thursday, April 16, 2015 3:01 PM
> To: De Lara Guarch, Pablo
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] hash: update jhash function with the late=
st
> available
>=20
> On Thu, Apr 16, 2015 at 02:26:59PM +0100, Pablo de Lara wrote:
> > Jenkins hash function was developed originally in 1996,
> > and was integrated in first versions of DPDK.
> > The function has been improved in 2006,
> > achieving up to 60% better performance, compared to the original one.
> >
> > Check out: http://burtleburtle.net/bob/c/lookup3.c
> >
> > This patch integrates that code in the rte_jhash library,
> > adding also a new function rte_jhash_word2,
> > that returns two different hash values, for a single key.
> >
>=20
> Should the addition of the new functionality not be a separate patch from
> the
> update to the existing code?

True, actually, I miss one extra function (2 in total).

> Also, do the new functions return the exact same values as the previous
> versions,
> just faster?

The new functions return different values from the previous version
AND faster (some cases, MUCH faster)

[...]

> > +#if RTE_BYTE_ORDER =3D=3D RTE_LITTLE_ENDIAN
> > +		case 12:
> > +			c +=3D k[2]; b +=3D k[1]; a +=3D k[0]; break;
> > +		case 11:
> > +			c +=3D k[2]&0xffffff; b +=3D k[1]; a +=3D k[0]; break;
> > +		case 10:
> > +			c +=3D k[2]&0xffff; b +=3D k[1]; a +=3D k[0]; break;
> > +		case 9:
> > +			c +=3D k[2]&0xff; b +=3D k[1]; a +=3D k[0]; break;
> > +		case 8:
> > +			b +=3D k[1]; a +=3D k[0]; break;
> > +		case 7:
> > +			b +=3D k[1]&0xffffff; a +=3D k[0]; break;
> > +		case 6:
> > +			b +=3D k[1]&0xffff; a +=3D k[0]; break;
> > +		case 5:
> > +			b +=3D k[1]&0xff; a +=3D k[0]; break;
> > +		case 4:
> > +			a +=3D k[0]; break;
> > +		case 3:
> > +			a +=3D k[0]&0xffffff; break;
> > +		case 2:
> > +			a +=3D k[0]&0xffff; break;
> > +		case 1:
> > +			a +=3D k[0]&0xff; break;
> > +#else
> > +		case 12:
> > +			c +=3D k[2]; b +=3D k[1]; a +=3D k[0]; break;
> > +		case 11:
> > +			c +=3D k[2]&0xffffff00; b +=3D k[1]; a +=3D k[0]; break;
> > +		case 10:
> > +			c +=3D k[2]&0xffff0000; b +=3D k[1]; a +=3D k[0]; break;
> > +		case 9:
> > +			c +=3D k[2]&0xff000000; b +=3D k[1]; a +=3D k[0]; break;
> > +		case 8:
> > +			b +=3D k[1]; a +=3D k[0]; break;
> > +		case 7:
> > +			b +=3D k[1]&0xffffff00; a +=3D k[0]; break;
> > +		case 6:
> > +			b +=3D k[1]&0xffff0000; a +=3D k[0]; break;
> > +		case 5:
> > +			b +=3D k[1]&0xff000000; a +=3D k[0]; break;
> > +		case 4:
> > +			a +=3D k[0]; break;
> > +		case 3:
> > +			a +=3D k[0]&0xffffff00; break;
> > +		case 2:
> > +			a +=3D k[0]&0xffff0000; break;
> > +		case 1:
> > +			a +=3D k[0]&0xff000000; break;
> > +#endif
>=20
> Only the constants seem different in this block. Can we get rid of the
> #ifdefs using rte_XX_to_cpu() calls instead?

Will add that in next version.

>=20
> > +		/* zero length strings require no mixing */
> > +		case 0:
> > +			return c;
> > +		};
> > +#if RTE_BYTE_ORDER =3D=3D RTE_LITTLE_ENDIAN
> > +	} else if ((u.i & 0x1) =3D=3D 0) {
> > +		/* read 16-bit chunks */
> > +		const uint16_t *k =3D (const uint16_t *)key;
> > +		const uint8_t  *k8;
> > +
> > +		/* all but last block: aligned reads and different mixing */
> > +		while (length > 12) {
> > +			a +=3D k[0] + (((uint32_t)k[1])<<16);
> > +			b +=3D k[2] + (((uint32_t)k[3])<<16);
> > +			c +=3D k[4] + (((uint32_t)k[5])<<16);
> > +
> > +			__rte_jhash_mix(a, b, c);
> > +
> > +			k +=3D 6;
> > +			length -=3D 12;
> > +		}
> > +
> > +		/* handle the last (probably partial) block */
> > +		k8 =3D (const uint8_t *)k;
> > +		switch (length) {
> > +		case 12:
> > +			c +=3D k[4]+(((uint32_t)k[5])<<16);
> > +			b +=3D k[2]+(((uint32_t)k[3])<<16);
> > +			a +=3D k[0]+(((uint32_t)k[1])<<16);
> > +			break;
> > +		case 11:
> > +			/* fall through */
> > +			c +=3D ((uint32_t)k8[10])<<16;
> > +		case 10:
> > +			c +=3D k[4];
> > +			b +=3D k[2]+(((uint32_t)k[3])<<16);
> > +			a +=3D k[0]+(((uint32_t)k[1])<<16);
> > +			break;
> > +		case 9:
> > +			/* fall through */
> > +			c +=3D k8[8];
> > +		case 8:
> > +			b +=3D k[2]+(((uint32_t)k[3])<<16);
> > +			a +=3D k[0]+(((uint32_t)k[1])<<16);
> > +			break;
> > +		case 7:
> > +			/* fall through */
> > +			b +=3D ((uint32_t)k8[6])<<16;
> > +		case 6:
> > +			b +=3D k[2];
> > +			a +=3D k[0]+(((uint32_t)k[1])<<16);
> > +			break;
> > +		case 5:
> > +			/* fall through */
> > +			b +=3D k8[4];
> > +		case 4:
> > +			a +=3D k[0]+(((uint32_t)k[1])<<16);
> > +			break;
> > +		case 3:
> > +			/* fall through */
> > +			a +=3D ((uint32_t)k8[2])<<16;
> > +		case 2:
> > +			a +=3D k[0];
> > +			break;
> > +		case 1:
> > +			a +=3D k8[0];
> > +			break;
> > +		case 0:
> > +			/* zero length requires no mixing */
> > +			return c;
> > +		}
> > +#endif
>=20
> No else block for this ifdef?

According to the code, for big endian, it only covers 4-byte alignment and =
rest of cases.

>=20
> > +	} else {
> > +		const uint8_t *k =3D (const uint8_t *)key;
> > +
> > +		/* all but the last block: affect some 32 bits of (a, b, c) */
> > +		while (length > 12) {
> > +#if RTE_BYTE_ORDER =3D=3D RTE_LITTLE_ENDIAN
> > +			a +=3D k[0];
> > +			a +=3D ((uint32_t)k[1])<<8;
> > +			a +=3D ((uint32_t)k[2])<<16;
> > +			a +=3D ((uint32_t)k[3])<<24;
> > +			b +=3D k[4];
> > +			b +=3D ((uint32_t)k[5])<<8;
> > +			b +=3D ((uint32_t)k[6])<<16;
> > +			b +=3D ((uint32_t)k[7])<<24;
> > +			c +=3D k[8];
> > +			c +=3D ((uint32_t)k[9])<<8;
> > +			c +=3D ((uint32_t)k[10])<<16;
> > +			c +=3D ((uint32_t)k[11])<<24;
> > +#else
> > +			a +=3D ((uint32_t)k[0])<<24;
> > +			a +=3D ((uint32_t)k[1])<<16;
> > +			a +=3D ((uint32_t)k[2])<<8;
> > +			a +=3D ((uint32_t)k[3]);
> > +			b +=3D ((uint32_t)k[4])<<24;
> > +			b +=3D ((uint32_t)k[5])<<16;
> > +			b +=3D ((uint32_t)k[6])<<8;
> > +			b +=3D ((uint32_t)k[7]);
> > +			c +=3D ((uint32_t)k[8])<<32;
> > +			c +=3D ((uint32_t)k[9])<<16;
> > +			c +=3D ((uint32_t)k[10])<<8;
> > +			c +=3D ((uint32_t)k[11]);
> > +#endif
>=20
> Maybe find a better way to shorten/remove this #ifdef also. E.g. shorter
> ifdef defining macros for the different shift amounts, 0, 8, 16, 24.

Agree. Will change that in v2.

[...]

Thanks for the comments. I will send a v2 soon.