* [dpdk-dev] cuckoo hash in dpdk @ 2017-08-23 14:15 Pragash Vijayaragavan 2017-08-23 18:28 ` Dumitrescu, Cristian 2017-08-24 14:53 ` Andriy Berestovskyy 0 siblings, 2 replies; 7+ messages in thread From: Pragash Vijayaragavan @ 2017-08-23 14:15 UTC (permalink / raw) To: dev; +Cc: Minseok Kwon Hi, I got the chance to look at the cuckoo hash used in dpdk and have a query. would using division and modulo operations be slower than bitwise operations on RTE_HASH_BUCKET_ENTRIES, specially since RTE_HASH_BUCKET_ENTRIES is a power of 2. For example, to do a modulo we can do a "AND" operation on (RTE_HASH_BUCKET_ENTRIES - 1), which might be faster. We did a cuckoo filter for VPP and doing this gave a slight improvement in speed. Is there any particular reason its done this way. Sorry if i am being wrong in any way, i was just curious. Thanks, Pragash Vijayaragavan Grad Student at Rochester Institute of Technology email : pxv3620@rit.edu ph : 585 764 4662 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-dev] cuckoo hash in dpdk 2017-08-23 14:15 [dpdk-dev] cuckoo hash in dpdk Pragash Vijayaragavan @ 2017-08-23 18:28 ` Dumitrescu, Cristian 2017-08-23 20:20 ` Pragash Vijayaragavan 2017-08-24 14:53 ` Andriy Berestovskyy 1 sibling, 1 reply; 7+ messages in thread From: Dumitrescu, Cristian @ 2017-08-23 18:28 UTC (permalink / raw) To: pxv3620, dev; +Cc: Minseok Kwon > -----Original Message----- > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Pragash > Vijayaragavan > Sent: Wednesday, August 23, 2017 3:16 PM > To: dev@dpdk.org > Cc: Minseok Kwon <mxkvcs@rit.edu> > Subject: [dpdk-dev] cuckoo hash in dpdk > > Hi, > > I got the chance to look at the cuckoo hash used in dpdk and have a query. > > would using division and modulo operations be slower than bitwise > operations on RTE_HASH_BUCKET_ENTRIES, specially since > RTE_HASH_BUCKET_ENTRIES is a power of 2. > For example, to do a modulo we can do a "AND" operation on > (RTE_HASH_BUCKET_ENTRIES - 1), which might be faster. We did a cuckoo > filter for VPP and doing this gave a slight improvement in speed. > Is there any particular reason its done this way. > > Sorry if i am being wrong in any way, i was just curious. > > Thanks, > > Pragash Vijayaragavan > Grad Student at Rochester Institute of Technology > email : pxv3620@rit.edu > ph : 585 764 4662 Bitwise AND typically takes 1 cycle on any CPU, while modulo takes dozens of cycles. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-dev] cuckoo hash in dpdk 2017-08-23 18:28 ` Dumitrescu, Cristian @ 2017-08-23 20:20 ` Pragash Vijayaragavan 2017-08-23 20:39 ` Pragash Vijayaragavan 0 siblings, 1 reply; 7+ messages in thread From: Pragash Vijayaragavan @ 2017-08-23 20:20 UTC (permalink / raw) To: Dumitrescu, Cristian; +Cc: dev, Minseok Kwon Hi , The performance will depend on the time taken for calculating the hash1 and hash2 values for each lookup. Can i know which hash functions are used to calculate the hash values for each incoming key. We use CRC32 which uses xxhash i guess. I could not find the implementation of the hash functions. Thanks, Pragash Vijayaragavan Grad Student at Rochester Institute of Technology email : pxv3620@rit.edu ph : 585 764 4662 On Wed, Aug 23, 2017 at 2:28 PM, Dumitrescu, Cristian < cristian.dumitrescu@intel.com> wrote: > > > > -----Original Message----- > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Pragash > > Vijayaragavan > > Sent: Wednesday, August 23, 2017 3:16 PM > > To: dev@dpdk.org > > Cc: Minseok Kwon <mxkvcs@rit.edu> > > Subject: [dpdk-dev] cuckoo hash in dpdk > > > > Hi, > > > > I got the chance to look at the cuckoo hash used in dpdk and have a > query. > > > > would using division and modulo operations be slower than bitwise > > operations on RTE_HASH_BUCKET_ENTRIES, specially since > > RTE_HASH_BUCKET_ENTRIES is a power of 2. > > For example, to do a modulo we can do a "AND" operation on > > (RTE_HASH_BUCKET_ENTRIES - 1), which might be faster. We did a cuckoo > > filter for VPP and doing this gave a slight improvement in speed. > > Is there any particular reason its done this way. > > > > Sorry if i am being wrong in any way, i was just curious. > > > > Thanks, > > > > Pragash Vijayaragavan > > Grad Student at Rochester Institute of Technology > > email : pxv3620@rit.edu > > ph : 585 764 4662 > > Bitwise AND typically takes 1 cycle on any CPU, while modulo takes dozens > of cycles. > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-dev] cuckoo hash in dpdk 2017-08-23 20:20 ` Pragash Vijayaragavan @ 2017-08-23 20:39 ` Pragash Vijayaragavan 0 siblings, 0 replies; 7+ messages in thread From: Pragash Vijayaragavan @ 2017-08-23 20:39 UTC (permalink / raw) To: Dumitrescu, Cristian; +Cc: dev, Minseok Kwon And yea, mod will take only 1 Clock cycle that way :) Also if divisor is a power of 2, a / b can be done like ( Given a is unsigned, a > b and since our hash values are greater than 0, a is our hash value, b the power of 2) while(b != 1) { a >>= 1; b >>= 1; } which is less clock cycles than normal division i guess :) Thanks, Pragash Vijayaragavan Grad Student at Rochester Institute of Technology email : pxv3620@rit.edu ph : 585 764 4662 On Wed, Aug 23, 2017 at 4:20 PM, Pragash Vijayaragavan <pxv3620@g.rit.edu> wrote: > Hi , > > The performance will depend on the time taken for calculating the hash1 > and hash2 values for each lookup. > > Can i know which hash functions are used to calculate the hash values for > each incoming key. We use CRC32 which uses xxhash i guess. > I could not find the implementation of the hash functions. > > > Thanks, > > Pragash Vijayaragavan > Grad Student at Rochester Institute of Technology > email : pxv3620@rit.edu > ph : 585 764 4662 <(585)%20764-4662> > > > On Wed, Aug 23, 2017 at 2:28 PM, Dumitrescu, Cristian < > cristian.dumitrescu@intel.com> wrote: > >> >> >> > -----Original Message----- >> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Pragash >> > Vijayaragavan >> > Sent: Wednesday, August 23, 2017 3:16 PM >> > To: dev@dpdk.org >> > Cc: Minseok Kwon <mxkvcs@rit.edu> >> > Subject: [dpdk-dev] cuckoo hash in dpdk >> > >> > Hi, >> > >> > I got the chance to look at the cuckoo hash used in dpdk and have a >> query. >> > >> > would using division and modulo operations be slower than bitwise >> > operations on RTE_HASH_BUCKET_ENTRIES, specially since >> > RTE_HASH_BUCKET_ENTRIES is a power of 2. >> > For example, to do a modulo we can do a "AND" operation on >> > (RTE_HASH_BUCKET_ENTRIES - 1), which might be faster. We did a cuckoo >> > filter for VPP and doing this gave a slight improvement in speed. >> > Is there any particular reason its done this way. >> > >> > Sorry if i am being wrong in any way, i was just curious. >> > >> > Thanks, >> > >> > Pragash Vijayaragavan >> > Grad Student at Rochester Institute of Technology >> > email : pxv3620@rit.edu >> > ph : 585 764 4662 >> >> Bitwise AND typically takes 1 cycle on any CPU, while modulo takes dozens >> of cycles. >> > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-dev] cuckoo hash in dpdk 2017-08-23 14:15 [dpdk-dev] cuckoo hash in dpdk Pragash Vijayaragavan 2017-08-23 18:28 ` Dumitrescu, Cristian @ 2017-08-24 14:53 ` Andriy Berestovskyy 2017-08-24 18:54 ` Pragash Vijayaragavan 1 sibling, 1 reply; 7+ messages in thread From: Andriy Berestovskyy @ 2017-08-24 14:53 UTC (permalink / raw) To: pxv3620; +Cc: DPDK, Minseok Kwon Hey Pragash, I am not the author of the code, but I guess it is done that way because modern compilers do recognize power of two constants and do substitute division and modulo operations with corresponding bit manipulations. Just try to compile a small program like the following: volatile unsigned a = 123, b, c; int main(int argc, char **argv) { b = a / 4; c = a % 4; printf("%x %x %x\n", a, b, c); } and then disassemble it with gdb: (gdb) disassemble /s main [...] 13 b = a / 4; 0x0000000000400464 <+20>: shr $0x2,%eax 0x0000000000400467 <+23>: mov %eax,0x200bd3(%rip) # 0x601040 <b> 14 c = a % 4; 0x000000000040046d <+29>: mov 0x200bc5(%rip),%eax # 0x601038 <a> 0x0000000000400473 <+35>: and $0x3,%eax 0x0000000000400476 <+38>: mov %eax,0x200bc8(%rip) # 0x601044 <c> [...] As you can see both division and modulo was substituted with "shr" and "and". So basically nowadays there is no need to worry about that and complicate code with explicit low-level optimizations. Hope that answers your question. Regards, Andriy On Wed, Aug 23, 2017 at 4:15 PM, Pragash Vijayaragavan <pxv3620@rit.edu> wrote: > Hi, > > I got the chance to look at the cuckoo hash used in dpdk and have a query. > > would using division and modulo operations be slower than bitwise > operations on RTE_HASH_BUCKET_ENTRIES, specially since > RTE_HASH_BUCKET_ENTRIES is a power of 2. > For example, to do a modulo we can do a "AND" operation on > (RTE_HASH_BUCKET_ENTRIES - 1), which might be faster. We did a cuckoo > filter for VPP and doing this gave a slight improvement in speed. > Is there any particular reason its done this way. > > Sorry if i am being wrong in any way, i was just curious. > > Thanks, > > Pragash Vijayaragavan > Grad Student at Rochester Institute of Technology > email : pxv3620@rit.edu > ph : 585 764 4662 -- Andriy Berestovskyy ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-dev] cuckoo hash in dpdk 2017-08-24 14:53 ` Andriy Berestovskyy @ 2017-08-24 18:54 ` Pragash Vijayaragavan 2017-08-25 9:00 ` Andriy Berestovskyy 0 siblings, 1 reply; 7+ messages in thread From: Pragash Vijayaragavan @ 2017-08-24 18:54 UTC (permalink / raw) To: Andriy Berestovskyy; +Cc: dev, Minseok Kwon Thats great, what about the hash functions. On 24 Aug 2017 10:54, "Andriy Berestovskyy" <aber@semihalf.com> wrote: > Hey Pragash, > I am not the author of the code, but I guess it is done that way > because modern compilers do recognize power of two constants and do > substitute division and modulo operations with corresponding bit > manipulations. > > Just try to compile a small program like the following: > > volatile unsigned a = 123, b, c; > int main(int argc, char **argv) > { > b = a / 4; > c = a % 4; > printf("%x %x %x\n", a, b, c); > } > > > and then disassemble it with gdb: > > (gdb) disassemble /s main > [...] > 13 b = a / 4; > 0x0000000000400464 <+20>: shr $0x2,%eax > 0x0000000000400467 <+23>: mov %eax,0x200bd3(%rip) # 0x601040 > <b> > > 14 c = a % 4; > 0x000000000040046d <+29>: mov 0x200bc5(%rip),%eax # 0x601038 > <a> > 0x0000000000400473 <+35>: and $0x3,%eax > 0x0000000000400476 <+38>: mov %eax,0x200bc8(%rip) # 0x601044 > <c> > [...] > > As you can see both division and modulo was substituted with "shr" and > "and". > > So basically nowadays there is no need to worry about that and > complicate code with explicit low-level optimizations. Hope that > answers your question. > > Regards, > Andriy > > > On Wed, Aug 23, 2017 at 4:15 PM, Pragash Vijayaragavan <pxv3620@rit.edu> > wrote: > > Hi, > > > > I got the chance to look at the cuckoo hash used in dpdk and have a > query. > > > > would using division and modulo operations be slower than bitwise > > operations on RTE_HASH_BUCKET_ENTRIES, specially since > > RTE_HASH_BUCKET_ENTRIES is a power of 2. > > For example, to do a modulo we can do a "AND" operation on > > (RTE_HASH_BUCKET_ENTRIES - 1), which might be faster. We did a cuckoo > > filter for VPP and doing this gave a slight improvement in speed. > > Is there any particular reason its done this way. > > > > Sorry if i am being wrong in any way, i was just curious. > > > > Thanks, > > > > Pragash Vijayaragavan > > Grad Student at Rochester Institute of Technology > > email : pxv3620@rit.edu > > ph : 585 764 4662 > > > > -- > Andriy Berestovskyy > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-dev] cuckoo hash in dpdk 2017-08-24 18:54 ` Pragash Vijayaragavan @ 2017-08-25 9:00 ` Andriy Berestovskyy 0 siblings, 0 replies; 7+ messages in thread From: Andriy Berestovskyy @ 2017-08-25 9:00 UTC (permalink / raw) To: pxv3620; +Cc: DPDK, users Hey Pragash, You can pass your own hash function to rte_hash_create() otherwise a default one will be used, see http://dpdk.org/browse/dpdk/tree/lib/librte_hash/rte_cuckoo_hash.c#n281 The default hash function is rte_hash_crc() or in some cases rte_jhash(), see http://dpdk.org/browse/dpdk/tree/lib/librte_hash/rte_cuckoo_hash.h#n61 You can find the implementation of rte_hash_crc() over here: http://dpdk.org/browse/dpdk/tree/lib/librte_hash/rte_hash_crc.h#n588 Please note there is a separate mailing list for DPDK usage discussions: http://dpdk.org/ml/listinfo/users The dev@ list is mostly for patch reviews and RFCs... Andriy On Thu, Aug 24, 2017 at 8:54 PM, Pragash Vijayaragavan <pxv3620@rit.edu> wrote: > Thats great, what about the hash functions. > > On 24 Aug 2017 10:54, "Andriy Berestovskyy" <aber@semihalf.com> wrote: >> >> Hey Pragash, >> I am not the author of the code, but I guess it is done that way >> because modern compilers do recognize power of two constants and do >> substitute division and modulo operations with corresponding bit >> manipulations. >> >> Just try to compile a small program like the following: >> >> volatile unsigned a = 123, b, c; >> int main(int argc, char **argv) >> { >> b = a / 4; >> c = a % 4; >> printf("%x %x %x\n", a, b, c); >> } >> >> >> and then disassemble it with gdb: >> >> (gdb) disassemble /s main >> [...] >> 13 b = a / 4; >> 0x0000000000400464 <+20>: shr $0x2,%eax >> 0x0000000000400467 <+23>: mov %eax,0x200bd3(%rip) # 0x601040 >> <b> >> >> 14 c = a % 4; >> 0x000000000040046d <+29>: mov 0x200bc5(%rip),%eax # 0x601038 >> <a> >> 0x0000000000400473 <+35>: and $0x3,%eax >> 0x0000000000400476 <+38>: mov %eax,0x200bc8(%rip) # 0x601044 >> <c> >> [...] >> >> As you can see both division and modulo was substituted with "shr" and >> "and". >> >> So basically nowadays there is no need to worry about that and >> complicate code with explicit low-level optimizations. Hope that >> answers your question. >> >> Regards, >> Andriy >> >> >> On Wed, Aug 23, 2017 at 4:15 PM, Pragash Vijayaragavan <pxv3620@rit.edu> >> wrote: >> > Hi, >> > >> > I got the chance to look at the cuckoo hash used in dpdk and have a >> > query. >> > >> > would using division and modulo operations be slower than bitwise >> > operations on RTE_HASH_BUCKET_ENTRIES, specially since >> > RTE_HASH_BUCKET_ENTRIES is a power of 2. >> > For example, to do a modulo we can do a "AND" operation on >> > (RTE_HASH_BUCKET_ENTRIES - 1), which might be faster. We did a cuckoo >> > filter for VPP and doing this gave a slight improvement in speed. >> > Is there any particular reason its done this way. >> > >> > Sorry if i am being wrong in any way, i was just curious. >> > >> > Thanks, >> > >> > Pragash Vijayaragavan >> > Grad Student at Rochester Institute of Technology >> > email : pxv3620@rit.edu >> > ph : 585 764 4662 >> >> >> >> -- >> Andriy Berestovskyy -- Andriy Berestovskyy ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2017-08-25 9:00 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-08-23 14:15 [dpdk-dev] cuckoo hash in dpdk Pragash Vijayaragavan 2017-08-23 18:28 ` Dumitrescu, Cristian 2017-08-23 20:20 ` Pragash Vijayaragavan 2017-08-23 20:39 ` Pragash Vijayaragavan 2017-08-24 14:53 ` Andriy Berestovskyy 2017-08-24 18:54 ` Pragash Vijayaragavan 2017-08-25 9:00 ` Andriy Berestovskyy
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).