DPDK patches and discussions
 help / color / mirror / Atom feed
From: Bruce Richardson <bruce.richardson@intel.com>
To: Shreesh Adiga <16567adigashreesh@gmail.com>
Cc: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>,
	Jasvinder Singh <jasvinder.singh@intel.com>, <dev@dpdk.org>
Subject: Re: [PATCH] net/crc: reduce usage of static arrays in net_crc_sse.c
Date: Thu, 9 Oct 2025 17:41:35 +0100	[thread overview]
Message-ID: <aOflvyl1Ow9N7Mve@bricha3-mobl1.ger.corp.intel.com> (raw)
In-Reply-To: <20250716103439.831760-1-16567adigashreesh@gmail.com>

On Wed, Jul 16, 2025 at 04:04:39PM +0530, Shreesh Adiga wrote:
> Replace the clearing of lower 32 bits of XMM register with blend of
> zero register.
> Replace the clearing of upper 64 bits of XMM register with _mm_move_epi64.
> Clang is able to optimize away the AND + memory operand with the
> above sequence, however GCC is still emitting the code for AND with
> memory operands which is being explicitly eliminated here.
> 
> Additionally replace the 48 byte crc_xmm_shift_tab with the contents of
> shf_table which is 32 bytes, achieving the same functionality.
> 
> Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
> ---
>  lib/net/net_crc_sse.c | 30 +++++++-----------------------
>  1 file changed, 7 insertions(+), 23 deletions(-)
> 

See inline below. Changes to the reduce_64_to_32 look ok, I don't know
enough to understand fully the other changes you made. Maybe split the
patch into two patches for review and merge separately?

/Bruce

> diff --git a/lib/net/net_crc_sse.c b/lib/net/net_crc_sse.c
> index 112dc94ac1..eec854e587 100644
> --- a/lib/net/net_crc_sse.c
> +++ b/lib/net/net_crc_sse.c
> @@ -96,20 +96,13 @@ crcr32_reduce_128_to_64(__m128i data128, __m128i precomp)
>  static __rte_always_inline uint32_t
>  crcr32_reduce_64_to_32(__m128i data64, __m128i precomp)
>  {
> -	static const alignas(16) uint32_t mask1[4] = {
> -		0xffffffff, 0xffffffff, 0x00000000, 0x00000000
> -	};
> -
> -	static const alignas(16) uint32_t mask2[4] = {
> -		0x00000000, 0xffffffff, 0xffffffff, 0xffffffff
> -	};
>  	__m128i tmp0, tmp1, tmp2;
>  
> -	tmp0 = _mm_and_si128(data64, _mm_load_si128((const __m128i *)mask2));
> +	tmp0 = _mm_blend_epi16(_mm_setzero_si128(), data64, 252);

Minor nit: 252 would be better in hex to make it clearer that it's the
lower two bits are unset. Even better, how about switching the operands so
that the constant is just "3", which is clearer again.

>  
>  	tmp1 = _mm_clmulepi64_si128(tmp0, precomp, 0x00);
>  	tmp1 = _mm_xor_si128(tmp1, tmp0);
> -	tmp1 = _mm_and_si128(tmp1, _mm_load_si128((const __m128i *)mask1));
> +	tmp1 = _mm_move_epi64(tmp1);
>  

This change LGTM.

>  	tmp2 = _mm_clmulepi64_si128(tmp1, precomp, 0x10);
>  	tmp2 = _mm_xor_si128(tmp2, tmp1);
> @@ -118,13 +111,11 @@ crcr32_reduce_64_to_32(__m128i data64, __m128i precomp)
>  	return _mm_extract_epi32(tmp2, 2);
>  }
>  
> -static const alignas(16) uint8_t crc_xmm_shift_tab[48] = {
> -	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> -	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +static const alignas(16) uint8_t crc_xmm_shift_tab[32] = {
> +	0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
> +	0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
>  	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
> -	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
> -	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> -	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
> +	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
>  };
>  

Can you perhaps explain how changing this table doesn't break existing uses
of the table as it now is in the code? Specifically, does xmm_shift_left
function not now have different behaviour?

>  /**
> @@ -216,19 +207,12 @@ crc32_eth_calc_pclmulqdq(
>  			0x80808080, 0x80808080, 0x80808080, 0x80808080
>  		};
>  
> -		const alignas(16) uint8_t shf_table[32] = {
> -			0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
> -			0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
> -			0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
> -			0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
> -		};
> -
>  		__m128i last16, a, b;
>  
>  		last16 = _mm_loadu_si128((const __m128i *)&data[data_len - 16]);
>  
>  		temp = _mm_loadu_si128((const __m128i *)
> -			&shf_table[data_len & 15]);
> +			&crc_xmm_shift_tab[data_len & 15]);
>  		a = _mm_shuffle_epi8(fold, temp);
>  
>  		temp = _mm_xor_si128(temp,
> -- 
> 2.49.1
> 

  parent reply	other threads:[~2025-10-09 16:41 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-16 10:34 Shreesh Adiga
2025-09-24 14:58 ` Thomas Monjalon
2025-09-29 16:28   ` Shreesh Adiga
2025-10-01  7:55     ` Thomas Monjalon
2025-10-01 10:24       ` Shreesh Adiga
2025-10-01 12:16         ` Thomas Monjalon
2025-10-09 16:41 ` Bruce Richardson [this message]
2025-10-10 14:31   ` Shreesh Adiga
2025-10-11 11:29 Shreesh Adiga

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aOflvyl1Ow9N7Mve@bricha3-mobl1.ger.corp.intel.com \
    --to=bruce.richardson@intel.com \
    --cc=16567adigashreesh@gmail.com \
    --cc=dev@dpdk.org \
    --cc=jasvinder.singh@intel.com \
    --cc=konstantin.v.ananyev@yandex.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).