From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail.lysator.liu.se (mail.lysator.liu.se [130.236.254.3])
    by dpdk.org (Postfix) with ESMTP id 41FBF2BB8
    for ; Thu, 16 May 2019 11:03:13 +0200 (CEST)
Received: from mail.lysator.liu.se (localhost [127.0.0.1])
    by mail.lysator.liu.se (Postfix) with ESMTP id C9EDE40008
    for ; Thu, 16 May 2019 11:03:12 +0200 (CEST)
Received: by mail.lysator.liu.se (Postfix, from userid 1004)
    id B754E40007; Thu, 16 May 2019 11:03:12 +0200 (CEST)
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on
    bernadotte.lysator.liu.se
X-Spam-Level:
X-Spam-Status: No, score=-0.9 required=5.0 tests=ALL_TRUSTED,AWL
    autolearn=disabled version=3.4.1
X-Spam-Score: -0.9
Received: from [192.168.1.59] (host-90-232-127-248.mobileonline.telia.com [90.232.127.248])
    (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
    (No client certificate requested)
    by mail.lysator.liu.se (Postfix) with ESMTPSA id 433F540004;
    Thu, 16 May 2019 11:03:11 +0200 (CEST)
To: Stephen Hemminger , dev@dpdk.org
References: <20190515221952.21959-1-stephen@networkplumber.org>
    <20190515221952.21959-5-stephen@networkplumber.org>
From: =?UTF-8?Q?Mattias_R=c3=b6nnblom?=
Message-ID: <95e9a56f-5d33-2a53-033d-d8963193cbea@ericsson.com>
Date: Thu, 16 May 2019 11:03:10 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
    Thunderbird/60.6.1
MIME-Version: 1.0
In-Reply-To: <20190515221952.21959-5-stephen@networkplumber.org>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV using ClamSMTP
Subject: Re: [dpdk-dev] [RFC 4/4] net/ether: use bitops to speedup comparison
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 16 May 2019 09:03:13 -0000

On 2019-05-16 00:19, Stephen Hemminger wrote:
> Using bit operations like or and xor is faster than a loop
> on all architectures. Really just explicit unrolling.
>
> Similar cast to uint16 unaligned is already done in
> other functions here.
>
> Signed-off-by: Stephen Hemminger
> ---
>  lib/librte_net/rte_ether.h | 17 +++++++----------
>  1 file changed, 7 insertions(+), 10 deletions(-)
>
> diff --git a/lib/librte_net/rte_ether.h b/lib/librte_net/rte_ether.h
> index b94e64b2195e..5d9242cda230 100644
> --- a/lib/librte_net/rte_ether.h
> +++ b/lib/librte_net/rte_ether.h
> @@ -78,11 +78,10 @@ struct ether_addr {
>  static inline int is_same_ether_addr(const struct ether_addr *ea1,
>                                       const struct ether_addr *ea2)
>  {
> -    int i;
> -    for (i = 0; i < ETHER_ADDR_LEN; i++)
> -        if (ea1->addr_bytes[i] != ea2->addr_bytes[i])
> -            return 0;
> -    return 1;
> +    const unaligned_uint16_t *w1 = (const uint16_t *)ea1;
> +    const unaligned_uint16_t *w2 = (const uint16_t *)ea2;
> +
> +    return ((w1[0] ^ w2[0]) | (w1[1] ^ w2[1]) | (w1[2] ^ w2[2])) == 0;
>  }
>

If you want to shave off a couple of instructions, you can switch the
three 16-bit loads to one 32-bit and one 16-bit load.

Something like:

    const uint8_t *ea1_b = (const uint8_t *)ea1;
    const uint8_t *ea2_b = (const uint8_t *)ea2;
    uint32_t ea1_h;
    uint32_t ea2_h;
    uint16_t ea1_l;
    uint16_t ea2_l;

    memcpy(&ea1_h, &ea1_b[0], sizeof(ea1_h));
    memcpy(&ea1_l, &ea1_b[sizeof(ea1_h)], sizeof(ea1_l));
    memcpy(&ea2_h, &ea2_b[0], sizeof(ea2_h));
    memcpy(&ea2_l, &ea2_b[sizeof(ea2_h)], sizeof(ea2_l));

    return ((ea1_l ^ ea2_l) | (ea1_h ^ ea2_h)) == 0;

Code is not as clean as your solution though.
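
For reference, the one 32-bit plus one 16-bit load idea can be written as a
self-contained sketch that compiles outside DPDK. The names struct mac_addr,
mac_addr_equal() and mac_addr_is_zero() are hypothetical and used only for
illustration; they are not DPDK API. The zero check is an extension of the
same idea and is not part of the suggestion above. Whether this really saves
instructions depends on the compiler and target, so it should be measured
rather than assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for DPDK's struct ether_addr: six address bytes. */
struct mac_addr {
	uint8_t addr_bytes[6];
};

static_assert(sizeof(struct mac_addr) == 6, "address must be exactly 6 bytes");

/*
 * Return 1 if both addresses are equal, 0 otherwise.
 * Each address is read as one 32-bit and one 16-bit value. memcpy() into
 * locals avoids casting to a wider pointer type, so alignment and aliasing
 * are not a concern; compilers commonly lower such fixed-size copies to
 * plain loads, but that is not guaranteed.
 */
static inline int mac_addr_equal(const struct mac_addr *a,
				 const struct mac_addr *b)
{
	uint32_t a_h, b_h;	/* bytes 0..3 */
	uint16_t a_l, b_l;	/* bytes 4..5 */

	memcpy(&a_h, &a->addr_bytes[0], sizeof(a_h));
	memcpy(&a_l, &a->addr_bytes[sizeof(a_h)], sizeof(a_l));
	memcpy(&b_h, &b->addr_bytes[0], sizeof(b_h));
	memcpy(&b_l, &b->addr_bytes[sizeof(b_h)], sizeof(b_l));

	/* XOR is zero only where bits match; OR merges both differences. */
	return ((a_h ^ b_h) | (uint32_t)(a_l ^ b_l)) == 0;
}

/* Return 1 if all six bytes are zero (assumed extension of the same idea). */
static inline int mac_addr_is_zero(const struct mac_addr *a)
{
	uint32_t h;
	uint16_t l;

	memcpy(&h, &a->addr_bytes[0], sizeof(h));
	memcpy(&l, &a->addr_bytes[sizeof(h)], sizeof(l));

	return (h | l) == 0;
}
```

Because the result only depends on whether any bit differs, the byte order of
the loaded values does not matter here.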

>  /**
> @@ -97,11 +96,9 @@ static inline int is_same_ether_addr(const struct ether_addr *ea1,
>   */
>  static inline int is_zero_ether_addr(const struct ether_addr *ea)
>  {
> -    int i;
> -    for (i = 0; i < ETHER_ADDR_LEN; i++)
> -        if (ea->addr_bytes[i] != 0x00)
> -            return 0;
> -    return 1;
> +    const unaligned_uint16_t *w = (const uint16_t *)ea;
> +
> +    return (w[0] | w[1] | w[2]) == 0;
>  }
>
>  /**
>