From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by dpdk.org (Postfix) with ESMTP id 341905694 for ; Wed, 13 May 2015 12:12:22 +0200 (CEST) Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP; 13 May 2015 03:12:21 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.13,420,1427785200"; d="scan'208";a="493010901" Received: from irsmsx106.ger.corp.intel.com ([163.33.3.31]) by FMSMGA003.fm.intel.com with ESMTP; 13 May 2015 03:12:20 -0700 Received: from irsmsx105.ger.corp.intel.com ([169.254.7.73]) by IRSMSX106.ger.corp.intel.com ([169.254.8.189]) with mapi id 14.03.0224.002; Wed, 13 May 2015 11:12:19 +0100 From: "Ananyev, Konstantin" To: "Ravi Kerur (rkerur@gmail.com)" Thread-Topic: [dpdk-dev] [PATCH v2] Implement memcmp using AVX/SSE instructions. Thread-Index: AQHQjBHuQIC0CvBLZEiNp/yP/VFCLp13FljAgAAAK+CAABfEgIACgUewgAAAH/A= Date: Wed, 13 May 2015 10:12:19 +0000 Message-ID: <2601191342CEEE43887BDE71AB9772582142EBD4@irsmsx105.ger.corp.intel.com> References: <1431119946-32078-1-git-send-email-rkerur@gmail.com> <1431119989-32124-1-git-send-email-rkerur@gmail.com> <2601191342CEEE43887BDE71AB9772582142E106@irsmsx105.ger.corp.intel.com> <2601191342CEEE43887BDE71AB9772582142E44A@irsmsx105.ger.corp.intel.com> <2601191342CEEE43887BDE71AB9772582142E475@irsmsx105.ger.corp.intel.com> <2601191342CEEE43887BDE71AB9772582142EBB5@irsmsx105.ger.corp.intel.com> In-Reply-To: <2601191342CEEE43887BDE71AB9772582142EBB5@irsmsx105.ger.corp.intel.com> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.180] Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCH v2] Implement memcmp using AVX/SSE instructions. X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 13 May 2015 10:12:23 -0000 Hi Ravi, > -----Original Message----- > From: Ananyev, Konstantin > Sent: Wednesday, May 13, 2015 11:02 AM > To: Ananyev, Konstantin > Subject: FW: [dpdk-dev] [PATCH v2] Implement memcmp using AVX/SSE instruc= tions. >=20 >=20 >=20 > From: Ravi Kerur [mailto:rkerur@gmail.com] > Sent: Monday, May 11, 2015 9:47 PM > To: Ananyev, Konstantin > Cc: dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v2] Implement memcmp using AVX/SSE instruc= tions. >=20 > Hi Konstantin, >=20 > On Mon, May 11, 2015 at 12:35 PM, Ananyev, Konstantin wrote: >=20 > Hi Ravi, >=20 > > > > From: Ravi Kerur [mailto:rkerur@gmail.com] > > Sent: Monday, May 11, 2015 6:43 PM > > To: Ananyev, Konstantin > > Cc: Matt Laswell; dev@dpdk.org > > Subject: Re: [dpdk-dev] [PATCH v2] Implement memcmp using AVX/SSE instr= uctions. > > > > Hi Konstantin, > > > > > > On Mon, May 11, 2015 at 2:51 AM, Ananyev, Konstantin wrote: > > Hi Ravi, > > > > > -----Original Message----- > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ravi Kerur > > > Sent: Friday, May 08, 2015 11:55 PM > > > To: Matt Laswell > > > Cc: dev@dpdk.org > > > Subject: Re: [dpdk-dev] [PATCH v2] Implement memcmp using AVX/SSE ins= tructions. > > > > > > On Fri, May 8, 2015 at 3:29 PM, Matt Laswell = wrote: > > > > > > > > > > > > > > > On Fri, May 8, 2015 at 4:19 PM, Ravi Kerur wrote= : > > > > > > > >> This patch replaces memcmp in librte_hash with rte_memcmp which is > > > >> implemented with AVX/SSE instructions. > > > >> > > > >> +static inline int > > > >> +rte_memcmp(const void *_src_1, const void *_src_2, size_t n) > > > >> +{ > > > >> +=A0 =A0 =A0 =A0const uint8_t *src_1 =3D (const uint8_t *)_src_1; > > > >> +=A0 =A0 =A0 =A0const uint8_t *src_2 =3D (const uint8_t *)_src_2; > > > >> +=A0 =A0 =A0 =A0int ret =3D 0; > > > >> + > > > >> +=A0 =A0 =A0 =A0if (n & 0x80) > > > >> +=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return rte_cmp128(src_1, src_2); > > > >> + > > > >> +=A0 =A0 =A0 =A0if (n & 0x40) > > > >> +=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return rte_cmp64(src_1, src_2); > > > >> + > > > >> +=A0 =A0 =A0 =A0if (n & 0x20) { > > > >> +=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0ret =3D rte_cmp32(src_1, src_2); > > > >> +=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0n -=3D 0x20; > > > >> +=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0src_1 +=3D 0x20; > > > >> +=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0src_2 +=3D 0x20; > > > >> +=A0 =A0 =A0 =A0} > > > >> > > > >> > > > > Pardon me for butting in, but this seems incorrect for the first tw= o cases > > > > listed above, as the function as written will only compare the firs= t 128 or > > > > 64 bytes of each source and return the result.=A0 The pattern expre= ssed in > > > > the 32 byte case appears more correct, as it compares the first 32 = bytes > > > > and then lets later pieces of the function handle the smaller remai= ning > > > > bits of the sources. Also, if this function is to handle arbitraril= y large > > > > source data, the 128 byte case needs to be in a loop. > > > > > > > > What am I missing? > > > > > > > > > > Current max hash key length supported is 64 bytes, hence no compariso= n is > > > done after 64 bytes. 128 bytes comparison is added to measure perform= ance > > > only and there is no use-case as of now. With the current use-cases i= ts not > > > required but if there is a need to handle large arbitrary data upto 1= 28 > > > bytes it can be modified. > > So on x86 let say rte_memcmp(k1, k2, 65) might produce invalid results,= right? > > While on PPC will work as expected (as it calls memcpu underneath)? > > That looks really weird to me. > > If you plan to use rte_memcmp only for hash comparisons, then probably > > you should put it somewhere into librte_hash and name it accordingly: r= te_hash_key_cmp() or something. > > And put a big comment around it, that it only works with particular len= gths. > > If you want it to be a generic function inside EAL, then it probably ne= ed to handle different lengths properly > > on all supported architectures. > > Konstantin > > > > > > Let me just explain it here and probably add it to document as well. > > > > rte_memcmp is not > > > > 1. a replacement to memcmp > > > > 2. =A0restricted to hash key comparison > > > > rte_memcmp is > > > > 1. optimized comparison for 16 to 128 bytes, v1 patch series had this s= upport. Changed some of the logic in v2 due to concerns raised > > for unavailable use-cases beyond 64 bytes comparison. > From what I see in v2 it supposed to work correctly for len in [0,64] and= =A0 len=3D128, right? > Not sure I get it: so for v1 it was able to handle any length correctly, = but then you removed it? > If so, I wonder what was the reason? Make it faster? >=20 > My initial discussion was with Zhilong(John) from Intel and we decided to= implement up to 128 bytes comparison and use rte_hash > and rte_lpm6 as a candidate for testing. When I sent out v1 patch, Bruce = comments were on use-case for 128 bytes comparison and > was it really required? Hence I decided in v2 to support only up to 64 by= tes and added 128 bytes only for performance measurement. > Personally I think support for up to 128 bytes comparison is required, th= ere might not be use-cases today but it will definitely be > useful. Ok, we don't have a real usage case for it now, but it still probably good = to have it work with arbitrary key-length. Again, as Don suggested in another mail, we can have an optimised implement= ation for particular sizes and fall back to slow-path (memcp) for all other cases= . Even if you'll decide to limit len to particular value (64/128), it is prob= ably not very good to have a gap in between, as it exists now [65-127]. >=20 > Another thing that looks strange to me: > While all rte_cmp*() uses actual data values for comparison results, > rte_memcmp_remainder() return value depends not only on data values but a= lso on data locations: >=20 > +static inline int > +rte_memcmp_remainder(const uint8_t *src_1u, const uint8_t *src_2u, size_= t n) > +{ > ... > exit: > + > +=A0 =A0 =A0 =A0return src_1u < src_2u ? -1 : 1; > +} >=20 > This is a bug and its not supposed to be there. I will fix it. Thanks for= catching it. >=20 > If you just test for equal/not equal that doesn't really matter. > If this is supposed to be a 'proper' comparison function, then the result= is sort of unpredictable. > > With minor tuning over the weekend I am able to get better performance = for > > anything between 16 to 128 bytes comparison. > > > > 2. will be specific to DPDK =A0i.e. currently all memcmp usage in DPDK = are for equality or inequality hence "less than" or "greater than" > > implementation in rte_memcmp doesn't make sense and will be removed in = subsequent patches, it will return 0 or 1 for > > equal/unequal cases. >=20 > If you don't plan your function to follow memcmp() semantics and syntax, = why to name it rte_memcmp()? > I=A0 think that will make a lot of confusion around. > Why not to name it differently(and put a clear comment in the declaration= of course)? >=20 > Following memcmp semantics is not hard but there are no use-cases for it = in DPDK currently. Keeping it specific to DPDK usage > simplifies code as well. I can change the name to "rte_compare" and add c= omments to the function. Will it work? Yep, either rte_compare(), or as Don suggested rte_testequal() - both seems= good to me. Konstantin >=20 >=20 > > > > rte_hash will be the first candidate to move to rte_memcmp and subseque= ntly rte_lpm6 which uses 16 bytes comparison will be > > moved > > > > Later on RING_SIZE which uses large size for comparison will be moved. = I am currently studying/understanding that logic and will > make > > changes to rte_memcmp to support that. >=20 > Sorry, didn't get you here. >=20 > Once rte_hash, rte_lpm6 changes and new compare function code are reviewe= d and accepted I plan to move to different > components (RING_SIZE is currently defined to be from 256 to 16384 bytes)= and memcmp function being used in test_ring, > test_pmd_ring and other functions. I did not want to add all component ch= anges into one patch series as it causes high review latency > or patch series just dies down silently. Instead make patches small and i= ncremental in every series, hope this clarifies. > Thanks, > Ravi > Konstantin >=20 > > > > I don't want to make lot of changes in one shot and see that patch seri= es die a slow death with no takers. > > > > Thanks, > > Ravi > > > > > > > > > > > > > -- > > > > Matt Laswell > > > > infinite io, inc. > > > > laswell@infiniteio.com > > > > > > > >