From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <sameh.gobriel@intel.com>
Received: from mga05.intel.com (mga05.intel.com [192.55.52.43])
 by dpdk.org (Postfix) with ESMTP id D2E6E592F
 for <dev@dpdk.org>; Fri, 30 Sep 2016 21:53:16 +0200 (CEST)
Received: from orsmga001.jf.intel.com ([10.7.209.18])
 by fmsmga105.fm.intel.com with ESMTP; 30 Sep 2016 12:53:15 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.31,274,1473145200"; d="scan'208";a="1038786913"
Received: from orsmsx104.amr.corp.intel.com ([10.22.225.131])
 by orsmga001.jf.intel.com with ESMTP; 30 Sep 2016 12:53:15 -0700
Received: from orsmsx113.amr.corp.intel.com ([169.254.9.161]) by
 ORSMSX104.amr.corp.intel.com ([169.254.4.228]) with mapi id 14.03.0248.002;
 Fri, 30 Sep 2016 12:53:15 -0700
From: "Gobriel, Sameh" <sameh.gobriel@intel.com>
To: "De Lara Guarch, Pablo" <pablo.de.lara.guarch@intel.com>, "dev@dpdk.org"
 <dev@dpdk.org>
CC: "Richardson, Bruce" <bruce.richardson@intel.com>
Thread-Topic: [dpdk-dev] [PATCH v4 0/4] Cuckoo hash enhancements
Thread-Index: AQHSGu2hSeXbYz26q0GPj60j4uzuRaCScnfQ
Date: Fri, 30 Sep 2016 19:53:14 +0000
Message-ID: <D6455DCED8CA9B4B940598A7ACF7B9FD71825A28@ORSMSX113.amr.corp.intel.com>
References: <1473190397-120741-1-git-send-email-pablo.de.lara.guarch@intel.com>
 <1475221136-213246-1-git-send-email-pablo.de.lara.guarch@intel.com>
In-Reply-To: <1475221136-213246-1-git-send-email-pablo.de.lara.guarch@intel.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [10.22.254.139]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH v4 0/4] Cuckoo hash enhancements
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 30 Sep 2016 19:53:17 -0000



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of De Lara Guarch, Pablo
> Sent: Friday, September 30, 2016 12:39 AM
> To: dev@dpdk.org
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Subject: [dpdk-dev] [PATCH v4 0/4] Cuckoo hash enhancements
>
> This patchset improves lookup performance in the current hash library by
> replacing the existing bulk lookup pipeline (a 4-stage, 2-entry pipeline)
> with an improved one based on a loop-and-jump model.
> In addition, x86 vectorized intrinsics are used to speed up signature
> comparison.
>
> The first patch reorders the fields of the hash structure.
> The structure takes more than one 64-byte cache line, but not all of its
> fields are used in the lookup operation (the most common operation).
> Therefore, all the fields used in lookup have been grouped together so
> that they fit in one cache line (as of v4, the line following the name
> field), slightly improving performance in some scenarios.
>
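
To make the layout idea above concrete, here is a minimal C11 sketch.
The struct and its field names are hypothetical and are not the actual
rte_hash definition; the sketch only assumes a 64-byte cache line and
shows cold fields first, with the fields read by lookup starting on
their own line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64	/* assumed cache line size in bytes */

/* Hypothetical table header, for illustration only. */
struct hash_sketch {
	/* Cold fields: only touched when the table is created or freed. */
	char name[32];
	uint32_t entries;

	/* Hot fields: everything the lookup path reads, starting on a
	 * fresh cache line so a lookup touches a single line. */
	_Alignas(CACHE_LINE) uint32_t key_len;
	uint32_t bucket_bitmask;
	uint32_t key_entry_size;
	void *key_store;
	void *buckets;
};

/* Compile-time checks that the hot fields start at a line boundary
 * and end before the next one. */
static_assert(offsetof(struct hash_sketch, key_len) == CACHE_LINE,
	      "hot fields must start on a cache line boundary");
static_assert(offsetof(struct hash_sketch, buckets) + sizeof(void *)
	      <= 2 * CACHE_LINE,
	      "hot fields must fit in one cache line");
```

The static assertions turn the layout intent into a build failure if a
later field addition pushes a lookup field onto another line.
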
> The second patch changes the layout of the bucket structure.
> Currently, the buckets store the current and alternative signatures
> together.
> A vectorized signature comparison requires all current signatures to be
> contiguous, so the bucket has been reordered to separate the current
> signatures from the alternative signatures.
>
> The third patch introduces x86 vectorized intrinsics.
> In a bulk lookup, all current signatures in a bucket are compared against
> the signature of the key being looked up.
> Now that they are contiguous, a vectorized comparison can be performed,
> which takes fewer instructions.
> On a machine with AVX2, the number of entries per bucket is increased
> from 4 to 8, as AVX2 can compare two 256-bit values, each holding
> 8 x 32-bit integers, which are the 8 signatures in the bucket.
>
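
As a rough illustration of the comparison described above (not the code
from the patch), the sketch below assumes a hypothetical bucket with 8
contiguous current signatures. With SSE2 it uses two 128-bit compares;
otherwise it falls back to a scalar loop. The names bucket_sketch and
match_mask are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define BUCKET_ENTRIES 8	/* entries per bucket, as in v2 of the series */

/* Hypothetical bucket: current signatures contiguous, alternatives after. */
struct bucket_sketch {
	uint32_t sig_current[BUCKET_ENTRIES];
	uint32_t sig_alt[BUCKET_ENTRIES];
};

/* Return a bitmask with bit i set when sig_current[i] == sig. */
static uint32_t
match_mask(const struct bucket_sketch *bkt, uint32_t sig)
{
	assert(bkt != NULL);
#if defined(__SSE2__)
	const __m128i key = _mm_set1_epi32((int)sig);
	const __m128i lo =
		_mm_loadu_si128((const __m128i *)&bkt->sig_current[0]);
	const __m128i hi =
		_mm_loadu_si128((const __m128i *)&bkt->sig_current[4]);
	/* cmpeq sets a lane to all ones on equality; movemask collects
	 * the top bit of each 32-bit lane into a 4-bit mask. */
	uint32_t mlo = (uint32_t)_mm_movemask_ps(
		_mm_castsi128_ps(_mm_cmpeq_epi32(lo, key)));
	uint32_t mhi = (uint32_t)_mm_movemask_ps(
		_mm_castsi128_ps(_mm_cmpeq_epi32(hi, key)));
	return mlo | (mhi << 4);
#else
	uint32_t mask = 0;
	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++)
		mask |= (uint32_t)(bkt->sig_current[i] == sig) << i;
	return mask;
#endif
}
```

A non-zero mask means at least one slot holds a matching signature; the
full key still has to be compared, since signatures can collide.
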
> The fourth (and last) patch modifies the pipeline of the bulk lookup
> function.
> The new pipeline is based on a loop-and-jump model. The two key
> improvements are:
>
> - Better prefetching: the first 4 keys to be looked up are prefetched,
>   and after that, the remaining keys are prefetched while the signatures
>   are being calculated. This gives the CPU more time to prefetch the
>   requested data before it is actually needed, which results in fewer
>   cache misses and therefore higher throughput.
>
> - Lower performance penalty when using the fallback: the bulk lookup
>   algorithm assumes that most of the time there will be no collision in
>   a bucket, but two or more signatures might be equal, which means that
>   more than one key comparison might be necessary. In that case, only
>   the key of the first hit is prefetched, as in the current
>   implementation. The difference now is that if this comparison results
>   in a miss, the information about the other keys to be compared has
>   already been stored, unlike the current implementation, which needs to
>   perform an entire simple lookup again.
>
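
The fallback behaviour described in the second bullet can be sketched
as follows. This is only an illustration under stated assumptions: keys
have a fixed, hypothetical length KEY_LEN, the bucket's keys are stored
back to back in one array, and hitmask has bit i set for every slot
whose signature matched. The function name resolve_hits is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUCKET_ENTRIES 8
#define KEY_LEN 16	/* hypothetical fixed key length in bytes */

/*
 * Try each candidate slot recorded in hitmask, lowest slot first.
 * Because all signature hits were stored in the mask, a key mismatch
 * on the first hit just moves on to the next candidate instead of
 * restarting the lookup. Returns the matching slot, or -1 if none.
 */
static int
resolve_hits(uint32_t hitmask, const uint8_t *keys, const uint8_t *key)
{
	assert(keys != NULL && key != NULL);
	for (int slot = 0; slot < BUCKET_ENTRIES; slot++) {
		if (((hitmask >> slot) & 1u) == 0)
			continue;	/* signature did not match here */
		if (memcmp(&keys[(size_t)slot * KEY_LEN], key, KEY_LEN) == 0)
			return slot;
	}
	return -1;
}
```

In the common case the mask has a single bit set and the loop performs
exactly one key comparison.
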
> Changes in v4:
> - Reordered hash structure, so alt signature is at the start
>   of the next cache line, and explain in the commit message
>   why it has been moved
> - Reordered hash structure, so name field is on top of the structure,
>   leaving all the fields used in lookup in the next cache line
>   (instead of the first cache line)
>
> Changes in v3:
> - Corrected the cover letter (wrong number of patches)
>
> Changes in v2:
> - Increased entries per bucket from 4 to 8 for all cases,
>   so it is not architecture dependent any longer.
> - Replaced compile-time signature comparison function election
>   with run-time election, so best optimization available
>   will be used from a single binary.
> - Reordered the hash structure, so all the fields used by lookup
>   are in the same cache line (first).
>
> Byron Marohn (3):
>   hash: reorganize bucket structure
>   hash: add vectorized comparison
>   hash: modify lookup bulk pipeline
>
> Pablo de Lara (1):
>   hash: reorder hash structure
>
>  lib/librte_hash/rte_cuckoo_hash.c     | 455 ++++++++++++++--------------------
>  lib/librte_hash/rte_cuckoo_hash.h     |  56 +++--
>  lib/librte_hash/rte_cuckoo_hash_x86.h |  20 +-
>  3 files changed, 228 insertions(+), 303 deletions(-)
>
> --
> 2.7.4

Series-acked-by: Sameh Gobriel <sameh.gobriel@intel.com>