From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by dpdk.org (Postfix) with ESMTP id 8D3FAC33A for ; Fri, 17 Apr 2015 18:03:48 +0200 (CEST) Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga101.fm.intel.com with ESMTP; 17 Apr 2015 09:03:47 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.11,595,1422950400"; d="scan'208";a="681750081" Received: from irsmsx103.ger.corp.intel.com ([163.33.3.157]) by orsmga001.jf.intel.com with ESMTP; 17 Apr 2015 09:03:46 -0700 Received: from irsmsx108.ger.corp.intel.com ([169.254.11.216]) by IRSMSX103.ger.corp.intel.com ([169.254.3.129]) with mapi id 14.03.0224.002; Fri, 17 Apr 2015 17:03:45 +0100 From: "De Lara Guarch, Pablo" To: "Richardson, Bruce" Thread-Topic: [dpdk-dev] [PATCH] hash: update jhash function with the latest available Thread-Index: AQHQeE3fP6lDRQCLwE6sqGc+LPfFfJ1RXobw Date: Fri, 17 Apr 2015 16:03:44 +0000 Message-ID: References: <1429190819-27402-1-git-send-email-pablo.de.lara.guarch@intel.com> <20150416140115.GA10552@bricha3-MOBL3> In-Reply-To: <20150416140115.GA10552@bricha3-MOBL3> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.180] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCH] hash: update jhash function with the latest available X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Apr 2015 16:03:49 -0000 > -----Original Message----- > From: Richardson, Bruce > Sent: Thursday, April 16, 2015 3:01 PM > To: De Lara Guarch, Pablo > Cc: dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH] hash: update jhash function with the late= st > available >=20 > On Thu, Apr 16, 2015 at 02:26:59PM +0100, Pablo de Lara wrote: > > Jenkins hash function was developed originally in 1996, > > and was integrated in first versions of DPDK. > > The function has been improved in 2006, > > achieving up to 60% better performance, compared to the original one. > > > > Check out: http://burtleburtle.net/bob/c/lookup3.c > > > > This patch integrates that code in the rte_jhash library, > > adding also a new function rte_jhash_word2, > > that returns two different hash values, for a single key. > > >=20 > Should the addition of the new functionality not be a separate patch from > the > update to the existing code? True, actually, I miss one extra function (2 in total). > Also, do the new functions return the exact same values as the previous > versions, > just faster? The new functions return different values from the previous version AND faster (some cases, MUCH faster) [...] > > +#if RTE_BYTE_ORDER =3D=3D RTE_LITTLE_ENDIAN > > + case 12: > > + c +=3D k[2]; b +=3D k[1]; a +=3D k[0]; break; > > + case 11: > > + c +=3D k[2]&0xffffff; b +=3D k[1]; a +=3D k[0]; break; > > + case 10: > > + c +=3D k[2]&0xffff; b +=3D k[1]; a +=3D k[0]; break; > > + case 9: > > + c +=3D k[2]&0xff; b +=3D k[1]; a +=3D k[0]; break; > > + case 8: > > + b +=3D k[1]; a +=3D k[0]; break; > > + case 7: > > + b +=3D k[1]&0xffffff; a +=3D k[0]; break; > > + case 6: > > + b +=3D k[1]&0xffff; a +=3D k[0]; break; > > + case 5: > > + b +=3D k[1]&0xff; a +=3D k[0]; break; > > + case 4: > > + a +=3D k[0]; break; > > + case 3: > > + a +=3D k[0]&0xffffff; break; > > + case 2: > > + a +=3D k[0]&0xffff; break; > > + case 1: > > + a +=3D k[0]&0xff; break; > > +#else > > + case 12: > > + c +=3D k[2]; b +=3D k[1]; a +=3D k[0]; break; > > + case 11: > > + c +=3D k[2]&0xffffff00; b +=3D k[1]; a +=3D k[0]; break; > > + case 10: > > + c +=3D k[2]&0xffff0000; b +=3D k[1]; a +=3D k[0]; break; > > + case 9: > > + c +=3D k[2]&0xff000000; b +=3D k[1]; a +=3D k[0]; break; > > + case 8: > > + b +=3D k[1]; a +=3D k[0]; break; > > + case 7: > > + b +=3D k[1]&0xffffff00; a +=3D k[0]; break; > > + case 6: > > + b +=3D k[1]&0xffff0000; a +=3D k[0]; break; > > + case 5: > > + b +=3D k[1]&0xff000000; a +=3D k[0]; break; > > + case 4: > > + a +=3D k[0]; break; > > + case 3: > > + a +=3D k[0]&0xffffff00; break; > > + case 2: > > + a +=3D k[0]&0xffff0000; break; > > + case 1: > > + a +=3D k[0]&0xff000000; break; > > +#endif >=20 > Only the constants seem different in this block. Can we get rid of the > #ifdefs using rte_XX_to_cpu() calls instead? Will add that in next version. >=20 > > + /* zero length strings require no mixing */ > > + case 0: > > + return c; > > + }; > > +#if RTE_BYTE_ORDER =3D=3D RTE_LITTLE_ENDIAN > > + } else if ((u.i & 0x1) =3D=3D 0) { > > + /* read 16-bit chunks */ > > + const uint16_t *k =3D (const uint16_t *)key; > > + const uint8_t *k8; > > + > > + /* all but last block: aligned reads and different mixing */ > > + while (length > 12) { > > + a +=3D k[0] + (((uint32_t)k[1])<<16); > > + b +=3D k[2] + (((uint32_t)k[3])<<16); > > + c +=3D k[4] + (((uint32_t)k[5])<<16); > > + > > + __rte_jhash_mix(a, b, c); > > + > > + k +=3D 6; > > + length -=3D 12; > > + } > > + > > + /* handle the last (probably partial) block */ > > + k8 =3D (const uint8_t *)k; > > + switch (length) { > > + case 12: > > + c +=3D k[4]+(((uint32_t)k[5])<<16); > > + b +=3D k[2]+(((uint32_t)k[3])<<16); > > + a +=3D k[0]+(((uint32_t)k[1])<<16); > > + break; > > + case 11: > > + /* fall through */ > > + c +=3D ((uint32_t)k8[10])<<16; > > + case 10: > > + c +=3D k[4]; > > + b +=3D k[2]+(((uint32_t)k[3])<<16); > > + a +=3D k[0]+(((uint32_t)k[1])<<16); > > + break; > > + case 9: > > + /* fall through */ > > + c +=3D k8[8]; > > + case 8: > > + b +=3D k[2]+(((uint32_t)k[3])<<16); > > + a +=3D k[0]+(((uint32_t)k[1])<<16); > > + break; > > + case 7: > > + /* fall through */ > > + b +=3D ((uint32_t)k8[6])<<16; > > + case 6: > > + b +=3D k[2]; > > + a +=3D k[0]+(((uint32_t)k[1])<<16); > > + break; > > + case 5: > > + /* fall through */ > > + b +=3D k8[4]; > > + case 4: > > + a +=3D k[0]+(((uint32_t)k[1])<<16); > > + break; > > + case 3: > > + /* fall through */ > > + a +=3D ((uint32_t)k8[2])<<16; > > + case 2: > > + a +=3D k[0]; > > + break; > > + case 1: > > + a +=3D k8[0]; > > + break; > > + case 0: > > + /* zero length requires no mixing */ > > + return c; > > + } > > +#endif >=20 > No else block for this ifdef? According to the code, for big endian, it only covers 4-byte alignment and = rest of cases. >=20 > > + } else { > > + const uint8_t *k =3D (const uint8_t *)key; > > + > > + /* all but the last block: affect some 32 bits of (a, b, c) */ > > + while (length > 12) { > > +#if RTE_BYTE_ORDER =3D=3D RTE_LITTLE_ENDIAN > > + a +=3D k[0]; > > + a +=3D ((uint32_t)k[1])<<8; > > + a +=3D ((uint32_t)k[2])<<16; > > + a +=3D ((uint32_t)k[3])<<24; > > + b +=3D k[4]; > > + b +=3D ((uint32_t)k[5])<<8; > > + b +=3D ((uint32_t)k[6])<<16; > > + b +=3D ((uint32_t)k[7])<<24; > > + c +=3D k[8]; > > + c +=3D ((uint32_t)k[9])<<8; > > + c +=3D ((uint32_t)k[10])<<16; > > + c +=3D ((uint32_t)k[11])<<24; > > +#else > > + a +=3D ((uint32_t)k[0])<<24; > > + a +=3D ((uint32_t)k[1])<<16; > > + a +=3D ((uint32_t)k[2])<<8; > > + a +=3D ((uint32_t)k[3]); > > + b +=3D ((uint32_t)k[4])<<24; > > + b +=3D ((uint32_t)k[5])<<16; > > + b +=3D ((uint32_t)k[6])<<8; > > + b +=3D ((uint32_t)k[7]); > > + c +=3D ((uint32_t)k[8])<<32; > > + c +=3D ((uint32_t)k[9])<<16; > > + c +=3D ((uint32_t)k[10])<<8; > > + c +=3D ((uint32_t)k[11]); > > +#endif >=20 > Maybe find a better way to shorten/remove this #ifdef also. E.g. shorter > ifdef defining macros for the different shift amounts, 0, 8, 16, 24. Agree. Will change that in v2. [...] Thanks for the comments. I will send a v2 soon.