From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id DB8AA58C5 for ; Fri, 26 Aug 2016 23:33:56 +0200 (CEST) Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga103.fm.intel.com with ESMTP; 26 Aug 2016 14:33:55 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.28,582,1464678000"; d="scan'208";a="1031835869" Received: from sie-lab-214-036.ir.intel.com (HELO silpixa00394365.ir.intel.com) ([10.237.214.36]) by fmsmga001.fm.intel.com with ESMTP; 26 Aug 2016 14:33:54 -0700 From: Pablo de Lara To: dev@dpdk.org Cc: bruce.richardson@intel.com, Pablo de Lara Date: Fri, 26 Aug 2016 22:34:44 +0100 Message-Id: <1472247287-167011-1-git-send-email-pablo.de.lara.guarch@intel.com> X-Mailer: git-send-email 2.7.4 Subject: [dpdk-dev] [PATCH 0/3] Cuckoo hash lookup enhancements X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Aug 2016 21:33:57 -0000 This patchset improves lookup performance on the current hash library by changing the existing lookup bulk pipeline, with an improved pipeline, based on a loop-and-jump model, instead of the current 4-stage 2-entry pipeline. Also, x86 vectorized intrinsics are used to improve performance when comparing signatures. First patch modifies the order of the bucket structure. Currently, the buckets store all the signatures together (current and alternative). In order to be able to perform a vectorized signature comparison, all current signatures have to be together, so the order of the bucket has been changed, having separated all the current signatures from the alternative signatures. Second patch introduces x86 vectorized intrinsics. When performing a lookup bulk operation, all current signatures in a bucket are compared against the signature of the key being looked up. Now that they all are together, a vectorized comparison can be performed, which takes less instructions to be carried out. In case of having a machine with AVX2, number of entries per bucket are increased from 4 to 8, as AVX2 allows comparing two 256-bit values, with 8x32-bit integers, which are the 8 signatures on the bucket. Third (and last) patch modifies the current pipeline of the lookup bulk function. The new pipeline is based on a loop-and-jump model. The two key improvements are: - Better prefetching: in this case, first 4 keys to be looked up are prefetched, and after that, the rest of the keys are prefetched at the time the calculation of the signatures are being performed. This gives more time for the CPU to prefetch the data requesting before actually need it, which result in less cache misses and therefore, higher throughput. - Lower performance penalty when using fallback: the lookup bulk algorithm assumes that most times there will not be a collision in a bucket, but it might happen that two or more signatures are equal, which means that more than one key comparison might be necessary. In that case, only the key of the first hit is prefetched, like in the current implementation. The difference now is that if this comparison results in a miss, the information of the other keys to be compared has been stored, unlike the current implementation, which needs to perform an entire simple lookup again. This patchset depends on the following patchset: "Hash library fixes" (http://dpdk.org/ml/archives/dev/2016-August/045780.html) Byron Marohn (3): hash: reorganize bucket structure hash: add vectorized comparison hash: modify lookup bulk pipeline lib/librte_hash/rte_cuckoo_hash.c | 427 +++++++++++++--------------------- lib/librte_hash/rte_cuckoo_hash.h | 24 +- lib/librte_hash/rte_cuckoo_hash_x86.h | 20 +- 3 files changed, 179 insertions(+), 292 deletions(-) -- 2.7.4