DPDK patches and discussions
From: Bruce Richardson <bruce.richardson@intel.com>
To: Shreesh Adiga <16567adigashreesh@gmail.com>
Cc: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>,
	Jasvinder Singh <jasvinder.singh@intel.com>, <dev@dpdk.org>
Subject: Re: [PATCH] net/crc: reduce usage of static arrays in net_crc_sse.c
Date: Fri, 14 Nov 2025 15:49:06 +0000
Message-ID: <aRdPcuSLTPah6Zry@bricha3-mobl1.ger.corp.intel.com>
In-Reply-To: <20251011113202.937991-1-16567adigashreesh@gmail.com>

On Sat, Oct 11, 2025 at 04:59:34PM +0530, Shreesh Adiga wrote:
> Replace the clearing of lower 32 bits of XMM register with blend of
> zero register.
> Remove the clearing of upper 64 bits of tmp1 as it is redundant.
> tmp1 after clearing upper bits was being XORed with tmp2 before the
> bits 95:64 from tmp2 were returned. Bits 95:64 of the XOR result
> remain unchanged due to tmp1 having bits 95:64 cleared to 0.
> After removing the xor operation, the clearing of upper 64 bits of tmp1
> becomes redundant and hence can be removed.
> Clang is able to optimize away the AND + memory operand with the
> above sequence, however GCC is still emitting the code for AND with
> memory operands which is being explicitly eliminated here.
> 
> Additionally replace the 48 byte crc_xmm_shift_tab with the contents of
> shf_table which is 32 bytes, achieving the same functionality.
> 
> Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
> ---

Sorry for the delay in getting back to look at the second version of this.
The explanation of the second set of changes, given in response to the
questions on v1, makes sense.

Acked-by: Bruce Richardson <bruce.richardson@intel.com>

Ideally, this patch should have been sent in response to v1 to keep the
thread together. Also, I think this would be better split into two patches,
one for the crcr32_reduce_64_to_32 change and another for the shift table
change. That way, you could include the fuller explanation of the second
change in the commit log to make review easier.
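As a rough illustration of why the crcr32_reduce_64_to_32 change should be
safe, the scalar model below compares the old and new instruction sequences
on pseudo-random inputs. It is only a sketch: u128, clmul64, reduce_old,
reduce_new and count_reduce_mismatches are hypothetical names, the precomp
values are random rather than the real CRC constants, and the intrinsics
are modelled in plain C instead of being called directly.

```c
#include <assert.h>
#include <stdint.h>

/* 128-bit value as two 64-bit halves; stands in for __m128i in this model. */
typedef struct { uint64_t lo, hi; } u128;

/* Carry-less 64x64 -> 128 multiply, a scalar model of PCLMULQDQ. */
static u128 clmul64(uint64_t a, uint64_t b)
{
	u128 r = { 0, 0 };

	for (int i = 0; i < 64; i++) {
		if ((b >> i) & 1) {
			r.lo ^= a << i;
			if (i != 0)
				r.hi ^= a >> (64 - i);
		}
	}
	return r;
}

/* Old sequence: AND with mask2, AND with mask1, extra XOR with tmp1. */
static uint32_t reduce_old(u128 data64, u128 precomp)
{
	u128 tmp0 = { data64.lo & 0xffffffff00000000ULL, data64.hi };
	u128 tmp1 = clmul64(tmp0.lo, precomp.lo);	/* imm 0x00 */
	tmp1.lo ^= tmp0.lo;
	tmp1.hi ^= tmp0.hi;
	tmp1.hi = 0;					/* AND with mask1 */
	u128 tmp2 = clmul64(tmp1.lo, precomp.hi);	/* imm 0x10 */
	tmp2.lo ^= tmp1.lo;
	tmp2.hi ^= tmp1.hi;				/* XOR with zero */
	tmp2.lo ^= tmp0.lo;
	tmp2.hi ^= tmp0.hi;
	return (uint32_t)tmp2.hi;			/* bits 95:64 */
}

/* New sequence: blend with zero (imm 0x3), no mask1, no XOR with tmp1. */
static uint32_t reduce_new(u128 data64, u128 precomp)
{
	u128 tmp0 = { data64.lo & ~0xffffffffULL, data64.hi };
	u128 tmp1 = clmul64(tmp0.lo, precomp.lo);
	tmp1.lo ^= tmp0.lo;		/* only the low half is used below */
	u128 tmp2 = clmul64(tmp1.lo, precomp.hi);
	tmp2.hi ^= tmp0.hi;
	return (uint32_t)tmp2.hi;
}

/* xorshift64 generator, used only to produce test inputs. */
static uint64_t next_rand(uint64_t *s)
{
	*s ^= *s << 13;
	*s ^= *s >> 7;
	*s ^= *s << 17;
	return *s;
}

/* Returns how many of the given rounds produced different results. */
int count_reduce_mismatches(int rounds)
{
	uint64_t seed = 0x9e3779b97f4a7c15ULL;
	int bad = 0;

	for (int i = 0; i < rounds; i++) {
		u128 d = { next_rand(&seed), next_rand(&seed) };
		u128 p = { next_rand(&seed), next_rand(&seed) };

		if (reduce_old(d, p) != reduce_new(d, p))
			bad++;
	}
	return bad;
}

/* Example use: abort if the two sequences ever disagree. */
void reduce_self_check(void)
{
	assert(count_reduce_mismatches(1000) == 0);
}
```

The model makes the argument in the commit log visible: the second
carry-less multiply only consumes the low 64 bits of tmp1, and the removed
XOR only ever added zeros to the bits that are finally extracted.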

> Changes since v1:
> Reversed the operands in the blend operation for readability.
> Removed tmp1 operations that are not affecting the result and hence
> avoid clearing the upper 64 bits for tmp1.
> 
>  lib/net/net_crc_sse.c | 30 ++++++------------------------
>  1 file changed, 6 insertions(+), 24 deletions(-)
> 
> diff --git a/lib/net/net_crc_sse.c b/lib/net/net_crc_sse.c
> index 112dc94ac1..e590aeb5ac 100644
> --- a/lib/net/net_crc_sse.c
> +++ b/lib/net/net_crc_sse.c
> @@ -96,35 +96,24 @@ crcr32_reduce_128_to_64(__m128i data128, __m128i precomp)
>  static __rte_always_inline uint32_t
>  crcr32_reduce_64_to_32(__m128i data64, __m128i precomp)
>  {
> -	static const alignas(16) uint32_t mask1[4] = {
> -		0xffffffff, 0xffffffff, 0x00000000, 0x00000000
> -	};
> -
> -	static const alignas(16) uint32_t mask2[4] = {
> -		0x00000000, 0xffffffff, 0xffffffff, 0xffffffff
> -	};
>  	__m128i tmp0, tmp1, tmp2;
> 
> -	tmp0 = _mm_and_si128(data64, _mm_load_si128((const __m128i *)mask2));
> +	tmp0 = _mm_blend_epi16(data64, _mm_setzero_si128(), 0x3);
> 
>  	tmp1 = _mm_clmulepi64_si128(tmp0, precomp, 0x00);
>  	tmp1 = _mm_xor_si128(tmp1, tmp0);
> -	tmp1 = _mm_and_si128(tmp1, _mm_load_si128((const __m128i *)mask1));
> 
>  	tmp2 = _mm_clmulepi64_si128(tmp1, precomp, 0x10);
> -	tmp2 = _mm_xor_si128(tmp2, tmp1);
>  	tmp2 = _mm_xor_si128(tmp2, tmp0);
> 
>  	return _mm_extract_epi32(tmp2, 2);
>  }
> 
> -static const alignas(16) uint8_t crc_xmm_shift_tab[48] = {
> -	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> -	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +static const alignas(16) uint8_t crc_xmm_shift_tab[32] = {
> +	0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
> +	0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
>  	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
> -	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
> -	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> -	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
> +	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
>  };
> 
>  /**
> @@ -216,19 +205,12 @@ crc32_eth_calc_pclmulqdq(
>  			0x80808080, 0x80808080, 0x80808080, 0x80808080
>  		};
> 
> -		const alignas(16) uint8_t shf_table[32] = {
> -			0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
> -			0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
> -			0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
> -			0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
> -		};
> -
>  		__m128i last16, a, b;
> 
>  		last16 = _mm_loadu_si128((const __m128i *)&data[data_len - 16]);
> 
>  		temp = _mm_loadu_si128((const __m128i *)
> -			&shf_table[data_len & 15]);
> +			&crc_xmm_shift_tab[data_len & 15]);
>  		a = _mm_shuffle_epi8(fold, temp);
> 
>  		temp = _mm_xor_si128(temp,
> --
> 2.49.1
> 

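The shift table change can be checked in a similar way. The sketch below
models _mm_shuffle_epi8 in scalar C and compares the old 48-byte table with
the new 32-byte one. It assumes the table is read as 16 bytes starting at
offset 16 - num to shift a register left by num bytes; that indexing is not
shown in the diff above, and pshufb_model, shift_left_model and
count_shift_mismatches are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Old 48-byte table, as removed by the diff. */
static const uint8_t old_tab[48] = {
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
};

/* New 32-byte table, as added by the diff. */
static const uint8_t new_tab[32] = {
	0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
	0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
};

/*
 * Scalar model of _mm_shuffle_epi8: a control byte with the high bit set
 * yields zero, otherwise its low 4 bits select a source byte.
 */
static void pshufb_model(uint8_t dst[16], const uint8_t src[16],
		const uint8_t ctl[16])
{
	for (int i = 0; i < 16; i++)
		dst[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0f];
}

/* Byte-wise left shift by num (0..15) using the new table (assumed use). */
void shift_left_model(uint8_t dst[16], const uint8_t src[16], unsigned int num)
{
	pshufb_model(dst, src, new_tab + 16 - num);
}

/* Returns the number of shift amounts in 0..15 where the tables disagree. */
int count_shift_mismatches(void)
{
	uint8_t src[16], a[16], b[16];
	int bad = 0;

	for (int i = 0; i < 16; i++)
		src[i] = (uint8_t)(0xa0 + i);

	for (unsigned int num = 0; num < 16; num++) {
		pshufb_model(a, src, old_tab + 16 - num);
		pshufb_model(b, src, new_tab + 16 - num);
		if (memcmp(a, b, 16) != 0)
			bad++;
	}
	return bad;
}

/* Example use: abort if any shift amount in 0..15 behaves differently. */
void shift_self_check(void)
{
	assert(count_shift_mismatches() == 0);
}
```

The loop stops at 15 on purpose. At num == 16 the new table starts with
0x00 rather than a byte with the high bit set, so the two loads would
differ there; the sketch assumes callers never request a full 16-byte
shift.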
Thread overview: 14+ messages
2025-10-11 11:29 Shreesh Adiga
2025-10-22 17:30 ` Patrick Robb
2025-11-14 15:49 ` Bruce Richardson [this message]
2025-11-15 10:07   ` Shreesh Adiga
2025-11-15 10:09     ` [PATCH 1/2] net/crc: remove redundant operations in crcr32_reduce_64_to_32 Shreesh Adiga
2025-11-15 10:09       ` [PATCH 2/2] net/crc: reduce usage of static arrays in net_crc_sse.c Shreesh Adiga
  -- strict thread matches above, loose matches on Subject: below --
2025-07-16 10:34 [PATCH] " Shreesh Adiga
2025-09-24 14:58 ` Thomas Monjalon
2025-09-29 16:28   ` Shreesh Adiga
2025-10-01  7:55     ` Thomas Monjalon
2025-10-01 10:24       ` Shreesh Adiga
2025-10-01 12:16         ` Thomas Monjalon
2025-10-09 16:41 ` Bruce Richardson
2025-10-10 14:31   ` Shreesh Adiga
