DPDK patches and discussions
 help / color / mirror / Atom feed
From: Bili Dong <qobilidop@gmail.com>
To: "Dumitrescu, Cristian" <cristian.dumitrescu@intel.com>
Cc: "Richardson, Bruce" <bruce.richardson@intel.com>,
	"Gobriel, Sameh" <sameh.gobriel@intel.com>,
	 "thomas@monjalon.net" <thomas@monjalon.net>,
	"dev@dpdk.org" <dev@dpdk.org>,
	 "Wang, Yipeng1" <yipeng1.wang@intel.com>,
	 "Medvedkin, Vladimir" <vladimir.medvedkin@intel.com>
Subject: Re: [PATCH v3] hash: add XOR32 hash function
Date: Mon, 20 Feb 2023 12:44:06 -0800	[thread overview]
Message-ID: <CAFnY8dG9Q47P+qCQSs8RtSTR6ag5SKK-bcmSZqkeipC2OU=BGA@mail.gmail.com> (raw)
In-Reply-To: <CH0PR11MB5724856C57CAA89AF40DA315EBA49@CH0PR11MB5724.namprd11.prod.outlook.com>

[-- Attachment #1: Type: text/plain, Size: 5384 bytes --]

Hi Cristian,

I agree the 64-bit version could enable better performance, and I will do
it in the next version.

For endianness, see my comments below (inline):

On Mon, Feb 20, 2023 at 12:19 PM Dumitrescu, Cristian <
cristian.dumitrescu@intel.com> wrote:

> HI Bili,
>
> Comments inline below:
>
> <snip>
>
> > diff --git a/lib/hash/rte_hash_xor.h b/lib/hash/rte_hash_xor.h
> > new file mode 100644
> > index 0000000000..7004f83ec2
> > --- /dev/null
> > +++ b/lib/hash/rte_hash_xor.h
> > @@ -0,0 +1,65 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2023 Intel Corporation
> > + */
> > +
> > +#ifndef _RTE_HASH_XOR_H_
> > +#define _RTE_HASH_XOR_H_
> > +
> > +/**
> > + * @file
> > + *
> > + * RTE XOR Hash
> > + */
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include <stdint.h>
> > +
> > +#include <rte_byteorder.h>
> > +
> > +#define LEFT8b_MASK rte_cpu_to_be_32(0xff000000)
>
> I know that cpu_to_be() and be_to_cpu() are doing the same thing, but to
> me the correct function to use here is be_to_cpu(), as the for loop in the
> function below works with values in the CPU endianness. Would you agree?
>
What I have in mind is the 0xff000000 literal is in CPU endianness, and I'm
converting it to big-endian. For performance reasons, I think it's better
to work in big-endian in the loop.
Also as Vladimir suggested above, I'll remove these masks and switch to
using shifts directly. So I guess this won't matter anymore.

>
> > +#define LEFT16b_MASK rte_cpu_to_be_32(0xffff0000)
>
> I know that cpu_to_be() and be_to_cpu() are doing the same thing, but to
> me the correct function to use here is be_to_cpu(), as the for loop in the
> function below works with values in the CPU endianness. Would you agree?
>
> > +
> > +/**
> > + * Calculate XOR32 hash on user-supplied byte array.
> > + *
> > + * @param data
> > + *   Data to perform hash on.
> > + * @param data_len
> > + *   How many bytes to use to calculate hash value.
> > + * @param init_val
> > + *   Value to initialise hash generator.
> > + * @return
> > + *   32bit calculated hash value.
> > + */
> > +static inline uint32_t
> > +rte_hash_xor(const void *data, uint32_t data_len, uint32_t init_val)
> > +{
> > +     uint32_t i;
> > +     uintptr_t pd = (uintptr_t) data;
> > +     init_val = rte_cpu_to_be_32(init_val);
> I don't think this is correct, I the correct version here is to remove the
> above assignment, as I think the intention of this function (for
> performance reasons) is to do the endianness conversion only when needed,
> which is once at the end of the function (and also when handling the 2-byte
> and 1-byte left-overs).
>
What I have in mind is to convert init_val to big-endian once here, instead
of having to convert every byte array chunks from big-endian to host endian
in the loop (more conversions, worse performance, see comment below).

>
> > +
> > +     for (i = 0; i < data_len / 4; i++) {
> > +             init_val ^= *(const uint32_t *)pd;
>
Look at this line, if init_val is still in host endianness, we need to do
init_val ^= rte_be_to_cpu_32(*(const uint32_t *)pd) instead.
I think the essential tradeoff here is that, if we convert init_val from
host to be and back, we pay the constant cost of 2 conversions. The
alternative is to convert byte array chunks from be to host. The number of
conversions depends on data_len. Now I think of it more, maybe the best
option is to condition on the value of data_len, and choose the method with
fewer conversions.

> > +             pd += 4;
> > +     }
> > +
> > +     if (data_len & 0x2) {
> > +             init_val ^= *(const uint32_t *)pd & LEFT16b_MASK;
> > +             pd += 2;
> > +     }
> > +
> > +     if (data_len & 0x1)
> > +             init_val ^= *(const uint32_t *)pd & LEFT8b_MASK;
> > +
> > +     init_val = rte_be_to_cpu_32(init_val);
> > +     return init_val;
> > +}
> > +
>
> Due to the XOR properties (endianness-insensitivity, no carry bits, etc)
> and for performance reasons, I would also recommend implementing a 64-bit
> version of this function (while keeping the 32-bit result), similar to this:
>
> uint64_t *data64 = (uint64_t *)data;
> uint64_t result = init_data;
>
> /* Read & accumulate input data in 8-byte increments. */
> for (i = 0; i < data_len / 8; i++)
>         result ^= *data64++;
>
> data_len &= 0x7;
>
> /* Handle any remaining bytes in the input data (up to 7 bytes). */
> if (data_len >= 4) {
>         uint32_t *data32 = (uint32_t *)data64;
>
>         /* Read and accumulate  the next 4 bytes from the input data. */
>         result ^= *data32++;
>         data_len -= 4;
>
>         if (data_len >= 2) {
>                 uint16_t *data16 = (uint16_t *)data32;
>
>                 /* Read and accumulate the next 2 bytes from the input
> data. */
>                 result ^= *data16++
>                 data_len -= 2;
>
>                 if (data_len) {
>                         uint8_t *data8 = (uint8_t *)data8;
>
>                         /* Read and accumulate the next byte from the
> input data. */
>                         result ^= *data8;
>                 }
>         }
> }
>
> /* Accumulate the upper 32 bits on top of the lower 32 bits. */
> result ^= result >> 32;
>
> /* Single endianness swap at the very end. */
> return rte_cpu_to_be32((uint32_t)result);
>
> What do you think?
>
> Regards,
> Cristian
>
>

[-- Attachment #2: Type: text/html, Size: 7040 bytes --]

  reply	other threads:[~2023-02-20 20:44 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-15 10:30 [PATCH] " Bili Dong
2023-02-15 10:54 ` [PATCH v2] " Bili Dong
2023-02-15 11:06   ` [PATCH v3] " Bili Dong
2023-02-15 11:39     ` Morten Brørup
2023-02-15 21:39       ` Bili Dong
2023-02-16  9:49         ` Morten Brørup
2023-02-20 13:49     ` Thomas Monjalon
2023-02-20 17:21       ` Bili Dong
2023-02-20 17:38         ` Thomas Monjalon
2023-02-20 17:51           ` Bruce Richardson
2023-02-20 17:54             ` Bili Dong
2023-02-20 18:19               ` Dumitrescu, Cristian
2023-02-20 18:50                 ` Bili Dong
2023-02-20 20:10     ` Medvedkin, Vladimir
2023-02-20 20:17       ` Bili Dong
2023-02-20 20:19     ` Dumitrescu, Cristian
2023-02-20 20:44       ` Bili Dong [this message]
2023-02-20 23:04         ` Stephen Hemminger
2023-02-21  1:38           ` Bili Dong
2023-02-21 16:47     ` [PATCH v4] " Bili Dong
2023-02-21 17:55       ` [PATCH v5] " Bili Dong
2023-02-21 19:37         ` [PATCH v6] " Bili Dong
2023-02-21 21:35           ` Bili Dong
2023-06-12 14:56           ` Thomas Monjalon
2023-06-15 17:14           ` Vladimir Medvedkin
2023-06-16 17:15             ` Bili Dong
2023-06-17 20:34               ` Mattias Rönnblom
2023-06-20 19:12           ` [PATCH v7 1/1] " Bili Dong
2023-06-28 19:19             ` Medvedkin, Vladimir
2023-06-29 17:33             ` [PATCH v8] " Bili Dong
2023-07-06 20:08               ` Stephen Hemminger
2023-07-10 22:01                 ` Bili Dong
2023-07-10 21:59               ` [PATCH v9] " Bili Dong
2023-09-29 15:38                 ` David Marchand
2023-10-03 16:51                 ` Medvedkin, Vladimir
2023-10-06  8:06                 ` David Marchand
2023-06-17 20:59 ` [PATCH] " Stephen Hemminger
2023-06-20 19:27   ` Bili Dong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFnY8dG9Q47P+qCQSs8RtSTR6ag5SKK-bcmSZqkeipC2OU=BGA@mail.gmail.com' \
    --to=qobilidop@gmail.com \
    --cc=bruce.richardson@intel.com \
    --cc=cristian.dumitrescu@intel.com \
    --cc=dev@dpdk.org \
    --cc=sameh.gobriel@intel.com \
    --cc=thomas@monjalon.net \
    --cc=vladimir.medvedkin@intel.com \
    --cc=yipeng1.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).