DPDK patches and discussions
From: Shreesh Adiga <16567adigashreesh@gmail.com>
To: Bruce Richardson <bruce.richardson@intel.com>
Cc: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>,
	 Jasvinder Singh <jasvinder.singh@intel.com>,
	dev@dpdk.org
Subject: Re: [PATCH] net/crc: reduce usage of static arrays in net_crc_sse.c
Date: Sat, 15 Nov 2025 15:37:02 +0530
Message-ID: <CA+-x59aamcto-nwQooHFGTN_K_ahE+u7eb_aJG2kk6muo+MnGw@mail.gmail.com>
In-Reply-To: <aRdPcuSLTPah6Zry@bricha3-mobl1.ger.corp.intel.com>

On Fri, Nov 14, 2025 at 9:19 PM Bruce Richardson <bruce.richardson@intel.com>
wrote:

> On Sat, Oct 11, 2025 at 04:59:34PM +0530, Shreesh Adiga wrote:
> > Replace the clearing of lower 32 bits of XMM register with blend of
> > zero register.
> > Remove the clearing of upper 64 bits of tmp1 as it is redundant.
> > After its upper bits were cleared, tmp1 was XORed into tmp2 before
> > bits 95:64 of tmp2 were returned. That XOR leaves bits 95:64
> > unchanged, because tmp1 has bits 127:64 cleared to 0.
> > Once the XOR is removed, clearing the upper 64 bits of tmp1 has no
> > effect either, since the second carry-less multiply only reads the
> > lower 64 bits of tmp1, and hence it can be removed as well.
> > Clang is able to optimize away the AND + memory operand with the
> > above sequence, however GCC is still emitting the code for AND with
> > memory operands which is being explicitly eliminated here.
> >
> > Additionally replace the 48 byte crc_xmm_shift_tab with the contents of
> > shf_table which is 32 bytes, achieving the same functionality.
> >
> > Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
> > ---
>
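
For readers following the argument in the commit message above, here is a
minimal scalar sketch of the old and new reduction sequences. It is not the
DPDK code: u128, clmul64(), xor128(), clear_low32(), reduce_old(),
reduce_new() and count_mismatches() are hypothetical names introduced only
for illustration, and the carry-less multiply is modelled in plain C rather
than with _mm_clmulepi64_si128(). Clearing bits 31:0 stands in for both the
old AND with mask2 and the new _mm_blend_epi16(data64, zero, 0x3).

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for __m128i: lo = bits 63:0, hi = bits 127:64. */
typedef struct { uint64_t lo, hi; } u128;

/* Carry-less 64x64 -> 128-bit multiply, modelling a single
 * _mm_clmulepi64_si128() once the operand halves have been selected. */
static u128 clmul64(uint64_t a, uint64_t b)
{
	u128 r = { 0, 0 };

	for (int i = 0; i < 64; i++) {
		if ((b >> i) & 1) {
			r.lo ^= a << i;
			if (i != 0)
				r.hi ^= a >> (64 - i);
		}
	}
	return r;
}

static u128 xor128(u128 a, u128 b)
{
	u128 r = { a.lo ^ b.lo, a.hi ^ b.hi };
	return r;
}

/* Clears bits 31:0, which is what both the mask2 AND and the
 * blend with a zero register (immediate 0x3) do. */
static u128 clear_low32(u128 x)
{
	x.lo &= 0xffffffff00000000ULL;
	return x;
}

/* Model of the sequence before the patch. */
static uint32_t reduce_old(u128 data64, u128 precomp)
{
	u128 tmp0 = clear_low32(data64);
	u128 tmp1 = xor128(clmul64(tmp0.lo, precomp.lo), tmp0); /* imm 0x00 */
	tmp1.hi = 0;                                 /* AND with mask1 */
	u128 tmp2 = clmul64(tmp1.lo, precomp.hi);    /* imm 0x10 */
	tmp2 = xor128(tmp2, tmp1);                   /* only changes bits 63:0 */
	tmp2 = xor128(tmp2, tmp0);
	return (uint32_t)tmp2.hi;                    /* bits 95:64 */
}

/* Model of the sequence after the patch: no mask1, no XOR with tmp1. */
static uint32_t reduce_new(u128 data64, u128 precomp)
{
	u128 tmp0 = clear_low32(data64);
	u128 tmp1 = xor128(clmul64(tmp0.lo, precomp.lo), tmp0);
	u128 tmp2 = clmul64(tmp1.lo, precomp.hi);    /* reads tmp1.lo only */
	tmp2 = xor128(tmp2, tmp0);
	return (uint32_t)tmp2.hi;
}

/* xorshift64 generator, used only to produce varied test inputs. */
static uint64_t next_rand(uint64_t *s)
{
	*s ^= *s << 13;
	*s ^= *s >> 7;
	*s ^= *s << 17;
	return *s;
}

/* Returns how many pseudo-random inputs make the two models disagree. */
int count_mismatches(int rounds)
{
	uint64_t s = 0x9e3779b97f4a7c15ULL;
	int bad = 0;

	assert(rounds > 0);
	for (int i = 0; i < rounds; i++) {
		u128 d = { next_rand(&s), next_rand(&s) };
		u128 p = { next_rand(&s), next_rand(&s) };

		if (reduce_old(d, p) != reduce_new(d, p))
			bad++;
	}
	return bad;
}
```

In this model the two functions differ only in operations that touch
tmp1.hi or bits 63:0 of tmp2, neither of which reaches the returned bits
95:64, so count_mismatches() is expected to return 0 for any round count.
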
> Sorry for the delay in getting back to look at the second version of this.
> The explanation, given in response to questions on v1, of the second set
> of changes in this makes sense.
>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
>
> Ideally, this patch should have been sent in response to v1 to keep the
> thread together. Also, I think this would be better split into two patches,
> one for the reduce64_to_32 change and another for the shift table change.
> That way, you could include the fuller explanation of the second change in
> the commit log to make review easier.
>

Sure, I will send an updated patch after splitting it into two patches.
Since I am not familiar with email-based patch submissions, it ended up
being a new thread, sorry about that. I will try to update this thread
with the new revision soon.


> > Changes since v1:
> > Reversed the operands in the blend operation for readability.
> > Removed tmp1 operations that are not affecting the result and hence
> > avoid clearing the upper 64 bits for tmp1.
> >
> >  lib/net/net_crc_sse.c | 30 ++++++------------------------
> >  1 file changed, 6 insertions(+), 24 deletions(-)
> >
> > diff --git a/lib/net/net_crc_sse.c b/lib/net/net_crc_sse.c
> > index 112dc94ac1..e590aeb5ac 100644
> > --- a/lib/net/net_crc_sse.c
> > +++ b/lib/net/net_crc_sse.c
> > @@ -96,35 +96,24 @@ crcr32_reduce_128_to_64(__m128i data128, __m128i precomp)
> >  static __rte_always_inline uint32_t
> >  crcr32_reduce_64_to_32(__m128i data64, __m128i precomp)
> >  {
> > -     static const alignas(16) uint32_t mask1[4] = {
> > -             0xffffffff, 0xffffffff, 0x00000000, 0x00000000
> > -     };
> > -
> > -     static const alignas(16) uint32_t mask2[4] = {
> > -             0x00000000, 0xffffffff, 0xffffffff, 0xffffffff
> > -     };
> >       __m128i tmp0, tmp1, tmp2;
> >
> > -     tmp0 = _mm_and_si128(data64, _mm_load_si128((const __m128i *)mask2));
> > +     tmp0 = _mm_blend_epi16(data64, _mm_setzero_si128(), 0x3);
> >
> >       tmp1 = _mm_clmulepi64_si128(tmp0, precomp, 0x00);
> >       tmp1 = _mm_xor_si128(tmp1, tmp0);
> > -     tmp1 = _mm_and_si128(tmp1, _mm_load_si128((const __m128i *)mask1));
> >
> >       tmp2 = _mm_clmulepi64_si128(tmp1, precomp, 0x10);
> > -     tmp2 = _mm_xor_si128(tmp2, tmp1);
> >       tmp2 = _mm_xor_si128(tmp2, tmp0);
> >
> >       return _mm_extract_epi32(tmp2, 2);
> >  }
> >
> > -static const alignas(16) uint8_t crc_xmm_shift_tab[48] = {
> > -     0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> > -     0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> > +static const alignas(16) uint8_t crc_xmm_shift_tab[32] = {
> > +     0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
> > +     0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
> >       0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
> > -     0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
> > -     0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> > -     0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
> > +     0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
> >  };
> >
> >  /**
> > @@ -216,19 +205,12 @@ crc32_eth_calc_pclmulqdq(
> >                       0x80808080, 0x80808080, 0x80808080, 0x80808080
> >               };
> >
> > -             const alignas(16) uint8_t shf_table[32] = {
> > -                     0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
> > -                     0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
> > -                     0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
> > -                     0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
> > -             };
> > -
> >               __m128i last16, a, b;
> >
> >               last16 = _mm_loadu_si128((const __m128i *)&data[data_len - 16]);
> >
> >               temp = _mm_loadu_si128((const __m128i *)
> > -                     &shf_table[data_len & 15]);
> > +                     &crc_xmm_shift_tab[data_len & 15]);
> >               a = _mm_shuffle_epi8(fold, temp);
> >
> >               temp = _mm_xor_si128(temp,
> > --
> > 2.49.1
> >
>
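
The shift table change in the diff above can be sanity-checked with a small
scalar model of the byte shuffle. This is only a sketch under assumptions:
pshufb_model() and tables_agree() are hypothetical names, and the load
offset of 16 - num is an assumption about how the shift helper indexes
crc_xmm_shift_tab, since that helper is not part of the quoted diff. The
model follows the documented _mm_shuffle_epi8() rule that a control byte
with bit 7 set produces zero and otherwise its low four bits select a
source byte.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Contents of crc_xmm_shift_tab before the patch (48 bytes). */
static const uint8_t old_tab[48] = {
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
};

/* Contents after the patch (32 bytes, formerly shf_table). */
static const uint8_t new_tab[32] = {
	0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
	0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
};

/* Scalar model of _mm_shuffle_epi8(): bit 7 of the control byte
 * zeroes the output byte, otherwise bits 3:0 pick the source byte. */
static void pshufb_model(uint8_t out[16], const uint8_t in[16],
			 const uint8_t ctl[16])
{
	for (int i = 0; i < 16; i++)
		out[i] = (ctl[i] & 0x80) ? 0 : in[ctl[i] & 0x0f];
}

/* Returns 1 when both tables yield the same shuffle result for a left
 * shift of num bytes (num must be at most 16), assuming the control
 * vector is loaded from offset 16 - num. */
int tables_agree(unsigned int num)
{
	uint8_t in[16], a[16], b[16];

	assert(num <= 16);
	for (int i = 0; i < 16; i++)
		in[i] = (uint8_t)(0xa0 + i);    /* distinct non-zero bytes */

	pshufb_model(a, in, old_tab + 16 - num);
	pshufb_model(b, in, new_tab + 16 - num);
	return memcmp(a, b, 16) == 0;
}
```

Under these assumptions the two tables agree for shift counts 0 to 15. A
count of 16 would differ, because byte 0 of the new table is 0x00 rather
than 0xff, so the equivalence relies on callers never shifting by a full
16 bytes.
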

Thread overview: 14+ messages
2025-10-11 11:29 Shreesh Adiga
2025-10-22 17:30 ` Patrick Robb
2025-11-14 15:49 ` Bruce Richardson
2025-11-15 10:07   ` Shreesh Adiga [this message]
2025-11-15 10:09     ` [PATCH 1/2] net/crc: remove redundant operations in crcr32_reduce_64_to_32 Shreesh Adiga
2025-11-15 10:09       ` [PATCH 2/2] net/crc: reduce usage of static arrays in net_crc_sse.c Shreesh Adiga
  -- strict thread matches above, loose matches on Subject: below --
2025-07-16 10:34 [PATCH] " Shreesh Adiga
2025-09-24 14:58 ` Thomas Monjalon
2025-09-29 16:28   ` Shreesh Adiga
2025-10-01  7:55     ` Thomas Monjalon
2025-10-01 10:24       ` Shreesh Adiga
2025-10-01 12:16         ` Thomas Monjalon
2025-10-09 16:41 ` Bruce Richardson
2025-10-10 14:31   ` Shreesh Adiga
