From mboxrd@z Thu Jan  1 00:00:00 1970
From: Shreesh Adiga <16567adigashreesh@gmail.com>
Date: Fri, 10 Oct 2025 20:01:56 +0530
Subject: Re: [PATCH] net/crc: reduce usage of static arrays in net_crc_sse.c
To: Bruce Richardson <bruce.richardson@intel.com>
Cc: Konstantin Ananyev, Jasvinder Singh, dev@dpdk.org
References: <20250716103439.831760-1-16567adigashreesh@gmail.com>
List-Id: DPDK patches and discussions
On Thu, Oct 9, 2025 at 10:11 PM Bruce Richardson <bruce.richardson@intel.com> wrote:
> On Wed, Jul 16, 2025 at 04:04:39PM +0530, Shreesh Adiga wrote:
> > Replace the clearing of the lower 32 bits of an XMM register with a
> > blend against a zero register.
> > Replace the clearing of the upper 64 bits of an XMM register with
> > _mm_move_epi64.
> > Clang is able to optimize away the AND + memory operand with the
> > above sequence; however, GCC still emits the code for the AND with
> > memory operands, which is explicitly eliminated here.
> >
> > Additionally, replace the 48-byte crc_xmm_shift_tab with the contents
> > of shf_table, which is 32 bytes, achieving the same functionality.
> >
> > Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
> > ---
> >  lib/net/net_crc_sse.c | 30 +++++++-----------------------
> >  1 file changed, 7 insertions(+), 23 deletions(-)
> >
>
> See inline below. The changes to reduce_64_to_32 look OK; I don't know
> enough to fully understand the other changes you made. Maybe split the
> patch into two patches for review and merge separately?
>
> /Bruce
>
> > diff --git a/lib/net/net_crc_sse.c b/lib/net/net_crc_sse.c
> > index 112dc94ac1..eec854e587 100644
> > --- a/lib/net/net_crc_sse.c
> > +++ b/lib/net/net_crc_sse.c
> > @@ -96,20 +96,13 @@ crcr32_reduce_128_to_64(__m128i data128, __m128i precomp)
> >  static __rte_always_inline uint32_t
> >  crcr32_reduce_64_to_32(__m128i data64, __m128i precomp)
> >  {
> > -	static const alignas(16) uint32_t mask1[4] = {
> > -		0xffffffff, 0xffffffff, 0x00000000, 0x00000000
> > -	};
> > -
> > -	static const alignas(16) uint32_t mask2[4] = {
> > -		0x00000000, 0xffffffff, 0xffffffff, 0xffffffff
> > -	};
> >  	__m128i tmp0, tmp1, tmp2;
> >
> > -	tmp0 = _mm_and_si128(data64, _mm_load_si128((const __m128i *)mask2));
> > +	tmp0 = _mm_blend_epi16(_mm_setzero_si128(), data64, 252);
>
> Minor nit: 252 would be better in hex, to make it clearer that the
> lower two bits are unset. Even better, how about switching the operands
> so that the constant is just "3", which is clearer again.
>
Okay, I will update it with your suggestion in the next patch.
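For reference, here is a minimal standalone sketch (not part of the patch;
a hypothetical test file, assuming an SSE4.1 build) showing that the
swapped-operand form with constant 3 clears the same two 16-bit lanes:

/* blend_check.c -- build with: cc -msse4.1 blend_check.c */
#include <stdio.h>
#include <stdint.h>
#include <smmintrin.h>	/* SSE4.1: _mm_blend_epi16 */

int main(void)
{
	const __m128i data = _mm_set_epi32(0x44444444, 0x33333333,
			0x22222222, 0x11111111);
	/* Patch form: lanes 2..7 come from data, lanes 0..1 from zero. */
	__m128i a = _mm_blend_epi16(_mm_setzero_si128(), data, 0xFC);
	/* Suggested form: lanes 0..1 come from zero, the rest from data. */
	__m128i b = _mm_blend_epi16(data, _mm_setzero_si128(), 0x03);

	uint32_t va[4], vb[4];
	_mm_storeu_si128((__m128i *)va, a);
	_mm_storeu_si128((__m128i *)vb, b);
	for (int i = 0; i < 4; i++)	/* both columns should be identical */
		printf("%08x %08x\n", (unsigned)va[i], (unsigned)vb[i]);
	return 0;
}

Both forms should print the same vector, with the low 32 bits cleared and
the upper three dwords untouched.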
> >  	tmp1 = _mm_clmulepi64_si128(tmp0, precomp, 0x00);
> >  	tmp1 = _mm_xor_si128(tmp1, tmp0);
> > -	tmp1 = _mm_and_si128(tmp1, _mm_load_si128((const __m128i *)mask1));
> > +	tmp1 = _mm_move_epi64(tmp1);
> >
>
> This change LGTM.
>
> >  	tmp2 = _mm_clmulepi64_si128(tmp1, precomp, 0x10);
> >  	tmp2 = _mm_xor_si128(tmp2, tmp1);
> > @@ -118,13 +111,11 @@ crcr32_reduce_64_to_32(__m128i data64, __m128i precomp)
> >  	return _mm_extract_epi32(tmp2, 2);
> >  }
> >
> > -static const alignas(16) uint8_t crc_xmm_shift_tab[48] = {
> > -	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> > -	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> > +static const alignas(16) uint8_t crc_xmm_shift_tab[32] = {
> > +	0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
> > +	0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
> >  	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
> > -	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
> > -	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> > -	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
> > +	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
> >  };
> >
>
> Can you perhaps explain how changing this table doesn't break existing
> uses of the table as it now is in the code? Specifically, does the
> xmm_shift_left function not now have different behaviour?
>
Sure. crc_xmm_shift_tab is only used inside xmm_shift_left, which is only
used when the total data_len is < 16. We call xmm_shift_left(fold, 8 -
data_len) when len <= 4, and xmm_shift_left(fold, 16 - data_len) when
5 <= len <= 15. This results in accessing crc_xmm_shift_tab at indices
between 1 and 31, i.e. element 0 is never accessed.

Now, taking the specific case of xmm_shift_left(fold, 10): previously the
shuffle register would be loaded from (crc_xmm_shift_tab + 16 - 10), i.e.
crc_xmm_shift_tab + 6:
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x01,
0x02, 0x03, 0x04, 0x05},
which, when used with PSHUFB, results in the first 10 bytes of reg being 0
and the lower 6 elements moving left:
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, d1, d2, d3, d4, d5, d6}. The zeroes are
inserted because PSHUFB with an index > 0x7f (MSB set) produces 0.

Now that the contents of crc_xmm_shift_tab have been replaced with those
of shf_table, we instead load:
{0x86, 0x87, 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f, 0x00, 0x01,
0x02, 0x03, 0x04, 0x05}
into the index register. Since the first 10 elements have the MSB set,
PSHUFB again produces the same vector as above, with the first 10 elements
zeroed and the 6 data bytes moving left as intended.

Since xmm_shift_left is called with num between 1 and 11, we never access
crc_xmm_shift_tab[0], and the remaining 0x8{i} elements behave identically
to 0xff when used with PSHUFB. Thus xmm_shift_left behaves identically for
all num values currently used in this file. (A small scalar sanity check
is included after the quoted diff below.)

> >  /**
> > @@ -216,19 +207,12 @@ crc32_eth_calc_pclmulqdq(
> >  			0x80808080, 0x80808080, 0x80808080, 0x80808080
> >  		};
> >
> > -		const alignas(16) uint8_t shf_table[32] = {
> > -			0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
> > -			0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
> > -			0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
> > -			0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
> > -		};
> > -
> >  		__m128i last16, a, b;
> >
> >  		last16 = _mm_loadu_si128((const __m128i *)&data[data_len - 16]);
> >
> >  		temp = _mm_loadu_si128((const __m128i *)
> > -			&shf_table[data_len & 15]);
> > +			&crc_xmm_shift_tab[data_len & 15]);
> >  		a = _mm_shuffle_epi8(fold, temp);
> >
> >  		temp = _mm_xor_si128(temp,
> > --
> > 2.49.1
> >
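As the scalar sanity check mentioned above: this is an illustrative sketch
(a hypothetical test file, not part of the patch) that models the PSHUFB
rule in plain C and compares the old 48-byte table against the new 32-byte
one for every num value that xmm_shift_left can receive here:

/* tab_check.c -- scalar PSHUFB model comparing the two shift tables */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* PSHUFB rule: a byte is zeroed when the index MSB is set,
 * otherwise it selects src[idx & 0x0f]. */
static void pshufb(uint8_t dst[16], const uint8_t src[16],
		   const uint8_t idx[16])
{
	for (int i = 0; i < 16; i++)
		dst[i] = (idx[i] & 0x80) ? 0 : src[idx[i] & 0x0f];
}

static const uint8_t old_tab[48] = {
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
};

static const uint8_t new_tab[32] = {
	0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
	0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
};

int main(void)
{
	uint8_t src[16], a[16], b[16];

	for (int i = 0; i < 16; i++)
		src[i] = (uint8_t)(i + 1);	/* d1..d16 */

	/* xmm_shift_left loads its shuffle mask from tab + 16 - num,
	 * and num stays in [1, 11] for the call sites in this file. */
	for (unsigned int num = 1; num <= 11; num++) {
		pshufb(a, src, old_tab + 16 - num);
		pshufb(b, src, new_tab + 16 - num);
		printf("num=%2u: %s\n", num,
		       memcmp(a, b, 16) ? "DIFFER" : "match");
	}
	return 0;
}

Every num in that range should print "match", which is exactly the
equivalence argued above.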