From: Ravi Kerur
To: dev@dpdk.org
Date: Fri, 8 May 2015 14:19:06 -0700
Message-Id: <1431119946-32078-1-git-send-email-rkerur@gmail.com>
Subject: [dpdk-dev] [PATCH v2] Implement rte_memcmp with AVX/SSE instructions.

Background:

After preliminary discussion with John (Zhihong) and Tim from Intel, it was
decided that it would be beneficial to use AVX/SSE instructions for memcmp,
similar to what is being implemented for memcpy.
In addition, we decided to use librte_hash as a test candidate for both
functionality and performance. librte_hash currently uses memcmp for key
comparisons; the key length can vary, and the maximum key length is defined
as 64 bytes.

Preliminary tests on memory comparison alone show that the AVX/SSE version
takes 1/3rd of the CPU ticks of the regular memcmp function. Furthermore,
hash_perf_autotest shows better results in all categories. Please note that
memory comparison is only a small portion of the hash functionality, and that
CPU Ticks/Op is measured for whole hash operations (Add on Empty, Add update,
Lookup). Only hash lookup results are shown below; I can send complete
results if anyone is interested.

The test was conducted on an Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz,
Ubuntu 14.04, x86_64, 16GB DDR3 system.

PS: I would like to keep "rte_memcmp" simple, with return codes

  0 - match
  1 - no-match

since usage in DPDK is for equality or inequality, and I have not seen any
instance where a less-than/greater-than comparison is needed. Hence the
"if (unlikely(...))" portion of the code will probably be removed, and the
function will be made specific to DPDK rather than generic.

/*************Existing code**********************************/

*** Hash table performance test results ***
Hash Func.  , Operation, Key size (bytes), Entries, Entries per bucket, Errors, Avg. bucket entries, Ticks/Op.
rte_hash_crc, Lookup   , 16              , 1024   , 1                 , 10000 , 0.00               , 88.55
rte_hash_crc, Lookup   , 16              , 1024   , 2                 , 10000 , 0.00               , 99.28
rte_hash_crc, Lookup   , 16              , 1024   , 4                 , 10000 , 0.00               , 106.73
rte_hash_crc, Lookup   , 16              , 1024   , 8                 , 10000 , 0.00               , 126.99
rte_hash_crc, Lookup   , 16              , 1024   , 16                , 10000 , 0.00               , 159.80
rte_hash_crc, Lookup   , 16              , 1048576, 1                 , 51    , 0.01               , 175.23
rte_hash_crc, Lookup   , 16              , 1048576, 2                 , 2     , 0.02               , 171.24
rte_hash_crc, Lookup   , 16              , 1048576, 4                 , 0     , 0.04               , 145.48
rte_hash_crc, Lookup   , 16              , 1048576, 8                 , 0     , 0.08               , 162.35
rte_hash_crc, Lookup   , 16              , 1048576, 16                , 0     , 0.15               , 182.42
jhash       , Lookup   , 16              , 1048576, 1                 , 33    , 0.01               , 219.71
jhash       , Lookup   , 16              , 1048576, 2                 , 1     , 0.02               , 216.44
jhash       , Lookup   , 16              , 1048576, 4                 , 0     , 0.04               , 188.29
jhash       , Lookup   , 16              , 1048576, 8                 , 0     , 0.08               , 203.70
jhash       , Lookup   , 16              , 1048576, 16                , 0     , 0.15               , 229.50

/**************New AVX/SSE code******************************/

Hash Func.  , Operation, Key size (bytes), Entries, Entries per bucket, Errors, Avg. bucket entries, Ticks/Op.
rte_hash_crc, Lookup   , 16              , 1024   , 1                 , 10000 , 0.00               , 85.69
rte_hash_crc, Lookup   , 16              , 1024   , 2                 , 10000 , 0.00               , 93.95
rte_hash_crc, Lookup   , 16              , 1024   , 4                 , 10000 , 0.00               , 102.80
rte_hash_crc, Lookup   , 16              , 1024   , 8                 , 10000 , 0.00               , 122.60
rte_hash_crc, Lookup   , 16              , 1024   , 16                , 10000 , 0.00               , 156.58
rte_hash_crc, Lookup   , 16              , 1048576, 1                 , 41    , 0.01               , 156.84
rte_hash_crc, Lookup   , 16              , 1048576, 2                 , 0     , 0.02               , 157.90
rte_hash_crc, Lookup   , 16              , 1048576, 4                 , 0     , 0.04               , 134.92
rte_hash_crc, Lookup   , 16              , 1048576, 8                 , 0     , 0.08               , 150.99
rte_hash_crc, Lookup   , 16              , 1048576, 16                , 0     , 0.15               , 174.08
jhash       , Lookup   , 16              , 1048576, 1                 , 45    , 0.01               , 212.03
jhash       , Lookup   , 16              , 1048576, 2                 , 2     , 0.02               , 210.65
jhash       , Lookup   , 16              , 1048576, 4                 , 0     , 0.04               , 185.90
jhash       , Lookup   , 16              , 1048576, 8                 , 0     , 0.08               , 201.35
jhash       , Lookup   , 16              , 1048576, 16                , 0     , 0.15               , 223.54

Ravi Kerur (1):
  Implement memcmp using AVX/SSE instructions.
 app/test/test_hash_perf.c                          |  36 +-
 .../common/include/arch/ppc_64/rte_memcmp.h        |  62 +++
 .../common/include/arch/x86/rte_memcmp.h           | 421 +++++++++++++++++++++
 lib/librte_eal/common/include/generic/rte_memcmp.h | 131 +++++++
 lib/librte_hash/rte_hash.c                         |  59 ++-
 5 files changed, 675 insertions(+), 34 deletions(-)
 create mode 100644 lib/librte_eal/common/include/arch/ppc_64/rte_memcmp.h
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcmp.h
 create mode 100644 lib/librte_eal/common/include/generic/rte_memcmp.h

-- 
1.9.1
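
For readers who want to see what the 0 (match) / 1 (no-match) contract
from the PS could look like, here is a minimal sketch. It is not the code
from the patch: the name sketch_memcmp_eq is hypothetical, it assumes SSE2
on x86 (16-byte blocks only, no AVX path), and it falls back to plain
memcmp on other targets and for the tail bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper, for illustration only (not part of the patch).
 * Returns 0 if the first n bytes of a and b are equal, 1 otherwise.
 * No less-than/greater-than ordering is reported.
 */
static inline int
sketch_memcmp_eq(const void *a, const void *b, size_t n)
{
	const uint8_t *pa = (const uint8_t *)a;
	const uint8_t *pb = (const uint8_t *)b;

#if defined(__SSE2__)
	/* Compare 16 bytes per iteration using unaligned loads. */
	while (n >= 16) {
		__m128i va = _mm_loadu_si128((const __m128i *)pa);
		__m128i vb = _mm_loadu_si128((const __m128i *)pb);

		/*
		 * cmpeq sets every equal byte lane to 0xFF; movemask
		 * packs the top bit of each lane into a 16-bit mask,
		 * so 0xFFFF means all 16 bytes matched.
		 */
		if (_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)) != 0xFFFF)
			return 1;

		pa += 16;
		pb += 16;
		n -= 16;
	}
#endif
	/* Tail bytes (or the whole buffer when SSE2 is unavailable). */
	return memcmp(pa, pb, n) != 0;
}
```

With a helper like this, a 16-byte hash key check would read
"if (sketch_memcmp_eq(key, stored_key, 16) == 0)", and the caller never
depends on the sign of the result.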