From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by dpdk.org (Postfix) with ESMTP id DB51D1BD17 for ; Fri, 21 Dec 2018 00:30:13 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 20 Dec 2018 15:30:12 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,379,1539673200"; d="scan'208";a="103637123" Received: from fyigit-mobl.ger.corp.intel.com (HELO [10.237.221.46]) ([10.237.221.46]) by orsmga008.jf.intel.com with ESMTP; 20 Dec 2018 15:30:11 -0800 To: "Wang, Zhihong" , Ravi Kerur References: <1457391583-29604-1-git-send-email-rkerur@gmail.com> <1457391644-29645-1-git-send-email-rkerur@gmail.com> <8F6C2BD409508844A0EFC19955BE094110743936@SHSMSX103.ccr.corp.intel.com> From: Ferruh Yigit Openpgp: preference=signencrypt Autocrypt: addr=ferruh.yigit@intel.com; prefer-encrypt=mutual; keydata= mQINBFXZCFABEADCujshBOAaqPZpwShdkzkyGpJ15lmxiSr3jVMqOtQS/sB3FYLT0/d3+bvy qbL9YnlbPyRvZfnP3pXiKwkRoR1RJwEo2BOf6hxdzTmLRtGtwWzI9MwrUPj6n/ldiD58VAGQ +iR1I/z9UBUN/ZMksElA2D7Jgg7vZ78iKwNnd+vLBD6I61kVrZ45Vjo3r+pPOByUBXOUlxp9 GWEKKIrJ4eogqkVNSixN16VYK7xR+5OUkBYUO+sE6etSxCr7BahMPKxH+XPlZZjKrxciaWQb +dElz3Ab4Opl+ZT/bK2huX+W+NJBEBVzjTkhjSTjcyRdxvS1gwWRuXqAml/sh+KQjPV1PPHF YK5LcqLkle+OKTCa82OvUb7cr+ALxATIZXQkgmn+zFT8UzSS3aiBBohg3BtbTIWy51jNlYdy ezUZ4UxKSsFuUTPt+JjHQBvF7WKbmNGS3fCid5Iag4tWOfZoqiCNzxApkVugltxoc6rG2TyX CmI2rP0mQ0GOsGXA3+3c1MCdQFzdIn/5tLBZyKy4F54UFo35eOX8/g7OaE+xrgY/4bZjpxC1 1pd66AAtKb3aNXpHvIfkVV6NYloo52H+FUE5ZDPNCGD0/btFGPWmWRmkPybzColTy7fmPaGz cBcEEqHK4T0aY4UJmE7Ylvg255Kz7s6wGZe6IR3N0cKNv++O7QARAQABtCVGZXJydWggWWln aXQgPGZlcnJ1aC55aWdpdEBpbnRlbC5jb20+iQJVBBMBAgA/AhsDBgsJCAcDAgYVCAIJCgsE FgIDAQIeAQIXgBYhBNI2U4dCLsKE45mBx/kz60PfE2EfBQJbughWBQkHwjOGAAoJEPkz60Pf E2Eft84QAIbKWqhgqRfoiw/BbXbA1+qm2o4UgkCRQ0yJgt9QsnbpOmPKydHH0ixCliNz1J8e mRXCkMini1bTpnzp7spOjQGLeAFkNFz6BMq8YF2mVWbGEDE9WgnAxZdi0eLY7ZQnHbE6AxKL SXmpe9INb6z3ztseFt7mqje/W/6DWYIMnH3Yz9KzxujFWDcq8UCAvPkxVQXLTMpauhFgYeEx Nub5HbvhxTfUkapLwRQsSd/HbywzqZ3s/bbYMjj5JO3tgMiM9g9HOjv1G2f1dQjHi5YQiTZl 1eIIqQ3pTic6ROaiZqNmQFXPsoOOFfXF8nN2zg8kl/sSdoXWHhama5hbwwtl1vdaygQYlmdK H2ueiFh/UvT3WG3waNv2eZiEbHV8Rk52Xyn2w1G90lV0fYC6Ket1Xjoch7kjwbx793Kz/RfQ rmBY8/S4DTGn3oq3dMdQY+b6+7VMUeLMMh2CXYO9ErkOq+qNTD1IY+cBAkXnaDbQfz0zbste ZGWH74FAZ9nCpDOqbRTrBL42aMGhfOWEyeA1x7+hl6JZfabBWAuf4nnCXuorKHzBXTrf7u7p fXsKQClWRW77PF1VmzrtKNVSytQAmlCWApQIw20AarFipXmVdIjHmJPU611WoyxZPb4JTOxx 5cv9B+nr/RIB+v5dcStyHCCwO1be7nBDdCgd4F6kTQPLuQINBFfWTL4BEACnNA29e8TarUsB L5n6eLZHXcFvVwNLVlirWOClHXf44o2KnN3ww+eBEmKVfEFo9MSuGDNHS8Zw1NiGMYxLIUgd U6gGrVVs/VrQWL82pbMk6jCj98N+BXIri+6K1z+AImz7ax7iF1kDgRAnFWU0znWWBgM2mM8Y gDjcxfXk4sCKnvf6Gjo08Ey5zmqx7dekAKU2EEp8Q1EJY3jbymLdZWRP4AFFMTS1rGMk0/tt v71NBg1GobCcbNfn9chK/jhqxYhAJqq86RdJQkt3/9x1U1Oq0vXCt4JVVHmkxePtUiuWTTt+ aYlUAsKYZsWvncExvw77x2ArYDmaK0yfjh37wp0lY7DOJHFxoyT8tyWZlLci/VMRG2Ja33xj 0CN4C1yBg+QDeV3QFxQo42iA/ykdXPUR3ezmsND3XKvVLTC4DNb3V/EZQ7jBj64+bEK0VW4G B31VP00ApNQvSoczsIOAKdk97RNbpmPw6q10ILIB+9T1xbnFYzshzGF17oC0/GENIHATx8vZ masOZoDiOZQpeneLgnFE9JfzhLTxv6wNZcc/HLXRQVTkDsQr8ERtkAoHCf1E5+b5Yr7pfnE4 YuhET746o25S53ELUYPIs49qoJsEJL34/oexMfPGyPIlrbufiNyty5jc/1MRwUlhJlJ5IOHy ZUa+6CLR7GdImusFkPJUJwARAQABiQI8BBgBAgAmAhsMFiEE0jZTh0IuwoTjmYHH+TPrQ98T YR8FAlu6CHAFCQXE7zIACgkQ+TPrQ98TYR9nXxAAqNBgkYNyGuWUuy0GwDQCbu3iiMyH1+D7 llafPcK4NYy1Z4AYuVwC9nmLaoj+ozdqS3ncRo57ncRsKEJC46nDJJZYZ5LSJVn63Y3NBF86 lxQAgjj2oyZEwaLKtKbAFsXL43jv1pUGgSvWwYtDwHITXXFQto9rZEuUDRFSx4sg9OR+Q6/6 LY+nQQ3OdHlBkflzYMPcWgDcvcTAO6yasLEUf7UcYoSWTyMYjLB4QuNlXzTswzGVMssJF/vo V8lD1eqqaSUWG3STF6GVLQOr1NLvN5+kUBiEStHFxBpgSCvYY9sNV8FS6N24CAWMBl+10W+D 2h1yiiP5dOdPcBDYKsgqDD91/sP0WdyMJkwdQJtD49f9f+lYloxHnSAxMleOpyscg1pldw+i mPaUY1bmIknLhhkqfMmjywQOXpac5LRMibAAYkcB8v7y3kwELnt8mhqqZy6LUsqcWygNbH/W K3GGt5tRpeIXeJ25x8gg5EBQ0Jnvp/IbBYQfPLtXH0Myq2QuAhk/1q2yEIbVjS+7iowEZNyE 56K63WBJxsJPB2mvmLgn98GqB4G6GufP1ndS0XDti/2K0o8rep9xoY/JDGi0n0L0tk9BHyoP Y7kaEpu7UyY3nVdRLe5H1/MnFG8hdJ97WqnPS0buYZlrbTV0nRFL/NI2VABl18vEEXvNQiO+ vM8= Cc: dpdk-dev , Thomas Monjalon Message-ID: Date: Thu, 20 Dec 2018 23:30:10 +0000 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.3.3 MIME-Version: 1.0 In-Reply-To: <8F6C2BD409508844A0EFC19955BE094110743936@SHSMSX103.ccr.corp.intel.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [dpdk-dev] [PATCH v1 1/2] rte_memcmp functions using Intel AVX and SSE intrinsics X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Dec 2018 23:30:14 -0000 On 5/26/2016 9:57 AM, zhihong.wang at intel.com (Wang, Zhihong) wrote: >> -----Original Message----- >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ravi Kerur >> Sent: Tuesday, March 8, 2016 7:01 AM >> To: dev at dpdk.org >> Subject: [dpdk-dev] [PATCH v1 1/2] rte_memcmp functions using Intel AVX and >> SSE intrinsics >> >> v1: >> This patch adds memcmp functionality using AVX and SSE >> intrinsics provided by Intel. For other architectures >> supported by DPDK regular memcmp function is used. >> >> Compiled and tested on Ubuntu 14.04(non-NUMA) and 15.10(NUMA) >> systems. >> > [...] > >> + if (unlikely(!_mm_testz_si128(xmm2, xmm2))) { >> + __m128i idx = >> + _mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); > > line over 80 characters ;) > >> + >> + /* >> + * Reverse byte order >> + */ >> + xmm0 = _mm_shuffle_epi8(xmm0, idx); >> + xmm1 = _mm_shuffle_epi8(xmm1, idx); >> + >> + /* >> + * Compare unsigned bytes with instructions for signed bytes >> + */ >> + xmm0 = _mm_xor_si128(xmm0, _mm_set1_epi8(0x80)); >> + xmm1 = _mm_xor_si128(xmm1, _mm_set1_epi8(0x80)); >> + >> + return _mm_movemask_epi8(xmm0 > xmm1) - >> _mm_movemask_epi8(xmm1 > xmm0); >> + } >> + >> + return 0; >> +} > > [...] > >> +static inline int >> +rte_memcmp(const void *_src_1, const void *_src_2, size_t n) >> +{ >> + const uint8_t *src_1 = (const uint8_t *)_src_1; >> + const uint8_t *src_2 = (const uint8_t *)_src_2; >> + int ret = 0; >> + >> + if (n < 16) >> + return rte_memcmp_regular(src_1, src_2, n); > [...] >> + >> + while (n > 512) { >> + ret = rte_cmp256(src_1 + 0 * 256, src_2 + 0 * 256); > > Thanks for the great work! > > Seems to me there's a big improvement area before going into detailed > instruction layout tuning that -- No unalignment handling here for large > size memcmp. > > So almost without a doubt the performance will be low in micro-architectures > like Sandy Bridge if the start address is unaligned, which might be a > common case. Patch is waiting for comment for a long time, since 2016 May. Updating patch status as rejected. Anyone planning to work on vectorized version of rte_memcmp() can benefit from this patch: https://patches.dpdk.org/patch/11156/ https://patches.dpdk.org/patch/11157/