From: "Wang, Zhihong"
To: Ravi Kerur, "dev@dpdk.org"
Date: Thu, 26 May 2016 08:57:30 +0000
Subject: Re: [dpdk-dev] [PATCH v1 1/2] rte_memcmp functions using Intel AVX and SSE intrinsics

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ravi Kerur
> Sent: Tuesday, March 8, 2016 7:01 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v1 1/2] rte_memcmp functions using Intel AVX and
> SSE intrinsics
>
> v1:
>         This patch adds memcmp functionality using AVX and SSE
>         intrinsics provided by Intel. For other architectures
>         supported by DPDK regular memcmp function is used.
>
>         Compiled and tested on Ubuntu 14.04(non-NUMA) and 15.10(NUMA)
>         systems.
>
[...]

> +	if (unlikely(!_mm_testz_si128(xmm2, xmm2))) {
> +		__m128i idx =
> +			_mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);

Line over 80 characters ;)

> +
> +		/*
> +		 * Reverse byte order
> +		 */
> +		xmm0 = _mm_shuffle_epi8(xmm0, idx);
> +		xmm1 = _mm_shuffle_epi8(xmm1, idx);
> +
> +		/*
> +		 * Compare unsigned bytes with instructions for signed bytes
> +		 */
> +		xmm0 = _mm_xor_si128(xmm0, _mm_set1_epi8(0x80));
> +		xmm1 = _mm_xor_si128(xmm1, _mm_set1_epi8(0x80));
> +
> +		return _mm_movemask_epi8(xmm0 > xmm1) - _mm_movemask_epi8(xmm1 > xmm0);
> +	}
> +
> +	return 0;
> +}

[...]
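For illustration only, here is a minimal sketch of a 16-byte compare that needs nothing beyond SSE2, which is baseline on x86-64. It builds a mismatch mask with _mm_cmpeq_epi8 and _mm_movemask_epi8, finds the first differing byte with __builtin_ctz, and subtracts that byte pair as unsigned values. This is a different technique from the byte-reversal plus 0x80-bias approach in the patch, and it is not the patch's code. The name cmp16_sketch is hypothetical, and the scalar branch is only there so the sketch also builds without SSE2. One related detail: with GCC and Clang, __m128i is declared as a vector of two long long elements, so a plain `>` between two __m128i values compares 64-bit lanes, not bytes. For that reason the sketch uses byte-wise intrinsics explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper (not DPDK API): compare exactly 16 bytes and
 * return a memcmp-style result (<0, 0, >0).
 */
static inline int
cmp16_sketch(const void *a, const void *b)
{
#if defined(__SSE2__)
	__m128i va = _mm_loadu_si128((const __m128i *)a);
	__m128i vb = _mm_loadu_si128((const __m128i *)b);
	/* bit i of eq is set when byte i of a equals byte i of b */
	unsigned int eq = (unsigned int)_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));
	unsigned int diff = ~eq & 0xFFFFu;

	if (diff == 0)
		return 0;

	/* lowest set bit = first differing byte in memory order */
	unsigned int i = (unsigned int)__builtin_ctz(diff);
	return (int)((const uint8_t *)a)[i] - (int)((const uint8_t *)b)[i];
#else
	/* portable fallback when SSE2 is not available */
	const uint8_t *pa = a, *pb = b;
	for (size_t i = 0; i < 16; i++)
		if (pa[i] != pb[i])
			return (int)pa[i] - (int)pb[i];
	return 0;
#endif
}
```

Because the differing bytes are read back as uint8_t, the result follows unsigned byte order. That is the property the patch gets from its XOR with 0x80 before a signed compare.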
> +static inline int
> +rte_memcmp(const void *_src_1, const void *_src_2, size_t n)
> +{
> +	const uint8_t *src_1 = (const uint8_t *)_src_1;
> +	const uint8_t *src_2 = (const uint8_t *)_src_2;
> +	int ret = 0;
> +
> +	if (n < 16)
> +		return rte_memcmp_regular(src_1, src_2, n);
[...]
> +
> +	while (n > 512) {
> +		ret = rte_cmp256(src_1 + 0 * 256, src_2 + 0 * 256);

Thanks for the great work!

Seems to me there's a big area for improvement before going into detailed
instruction layout tuning: there's no unaligned-address handling here for
large-size memcmp.

So, almost without a doubt, performance will be low on microarchitectures
like Sandy Bridge if the start address is unaligned, which might be a
common case.

> +		if (unlikely(ret != 0))
> +			return ret;
> +
> +		ret = rte_cmp256(src_1 + 1 * 256, src_2 + 1 * 256);
> +		if (unlikely(ret != 0))
> +			return ret;
> +
> +		src_1 = src_1 + 512;
> +		src_2 = src_2 + 512;
> +		n -= 512;
> +	}
> +	goto CMP_BLOCK_LESS_THAN_512;
> +}
> +
> +#else /* RTE_MACHINE_CPUFLAG_AVX2 */
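The alignment point can be made concrete with a minimal sketch that rests on a few assumptions. It uses 16-byte SSE2 loads in place of the patch's 256-bit AVX2 path, and memcmp_align_sketch is a hypothetical name, not DPDK code. The sketch first compares a short head with plain memcmp so that the first source reaches a 16-byte boundary. It then uses aligned loads for that source in the main loop and handles the remaining tail with memcmp. The second source keeps unaligned loads, because both pointers can only be aligned together when they share the same offset from the boundary. For 32-byte AVX loads, the same idea would apply with a 32-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical sketch (not DPDK code): large-size memcmp that aligns
 * s1 to 16 bytes before the vector loop. s2 may stay misaligned.
 */
static int
memcmp_align_sketch(const void *s1, const void *s2, size_t n)
{
#if !defined(__SSE2__)
	/* no SSE2: fall back to the C library */
	return memcmp(s1, s2, n);
#else
	const uint8_t *p1 = s1;
	const uint8_t *p2 = s2;
	size_t head = (16 - ((uintptr_t)p1 & 15)) & 15;
	int ret;

	if (head > n)
		head = n;

	/* head: at most 15 bytes, compared with plain memcmp */
	if (head != 0) {
		ret = memcmp(p1, p2, head);
		if (ret != 0)
			return ret;
		p1 += head;
		p2 += head;
		n -= head;
	}

	/* body: p1 is 16-byte aligned whenever this loop runs */
	while (n >= 16) {
		assert(((uintptr_t)p1 & 15) == 0);
		__m128i a = _mm_load_si128((const __m128i *)p1);  /* aligned */
		__m128i b = _mm_loadu_si128((const __m128i *)p2); /* unaligned */
		unsigned int diff =
			~(unsigned int)_mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) & 0xFFFFu;

		if (diff != 0) {
			/* first differing byte, compared as unsigned */
			unsigned int i = (unsigned int)__builtin_ctz(diff);
			return (int)p1[i] - (int)p2[i];
		}
		p1 += 16;
		p2 += 16;
		n -= 16;
	}

	/* tail: fewer than 16 bytes left */
	return n ? memcmp(p1, p2, n) : 0;
#endif
}
```

Like memcmp, the sketch guarantees only the sign of a non-zero result, not its magnitude.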