29/09/2025 18:28, Shreesh Adiga:
> On Wed, Sep 24, 2025 at 8:28 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> > Hello,
> >
> > 16/07/2025 12:34, Shreesh Adiga:
> > > Replace the clearing of lower 32 bits of XMM register with blend of
> > > zero register.
> > > Replace the clearing of upper 64 bits of XMM register with _mm_move_epi64.
> > > Clang is able to optimize away the AND + memory operand with the
> > > above sequence, however GCC is still emitting the code for AND with
> > > memory operands which is being explicitly eliminated here.
> > >
> > > Additionally replace the 48 byte crc_xmm_shift_tab with the contents of
> > > shf_table which is 32 bytes, achieving the same functionality.
> > >
> > > Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
> >
> > Sorry I'm not following.
> > Please could you start with defining the goal of this patch?
> > Is it a code simplification or a performance optimization?
>
> It is intended to be a minor performance optimization.

Please could you give some performance numbers in the commit log?

I don't think that this change can be reliably measured. The changes only affect
the final 64-bit to 32-bit CRC fold and the computation on the last 16 bytes. The
impact will only be a couple of clock cycles at best. I also don't know whether the
reduced static array usage can be reliably measured, especially since it does not
affect the main loop.
This patch can be ignored if minor incremental changes are not desirable.
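
For readers following the thread, the two replacements described in the commit message can be sketched roughly as below. This is only an illustration, not the code from the patch: the function names (clear_low32_and, clear_low32_blend, clear_high64_and, clear_high64_move, same128, self_check) and the SSE41 macro are made up for this example, and it assumes an x86 target with GCC or Clang, since it relies on the target("sse4.1") function attribute.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_and_si128, _mm_move_epi64 */
#include <smmintrin.h>  /* SSE4.1: _mm_blend_epi16 */

/* Assumption: GCC/Clang on x86; enables SSE4.1 per function. */
#define SSE41 __attribute__((target("sse4.1")))

/* Clear the low 32 bits by ANDing with a constant mask
 * (the compiler may load this mask from memory). */
static __m128i clear_low32_and(__m128i v)
{
	const __m128i mask = _mm_set_epi32(-1, -1, -1, 0);
	return _mm_and_si128(v, mask);
}

/* Same result: take the two lowest 16-bit lanes from a zero register. */
SSE41 static __m128i clear_low32_blend(__m128i v)
{
	return _mm_blend_epi16(v, _mm_setzero_si128(), 0x03);
}

/* Clear the high 64 bits by ANDing with a constant mask. */
static __m128i clear_high64_and(__m128i v)
{
	const __m128i mask = _mm_set_epi32(0, 0, -1, -1);
	return _mm_and_si128(v, mask);
}

/* Same result: _mm_move_epi64 copies the low 64 bits and zeroes the rest. */
static __m128i clear_high64_move(__m128i v)
{
	return _mm_move_epi64(v);
}

/* Return 1 if all 128 bits of a and b are equal. */
static int same128(__m128i a, __m128i b)
{
	return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}

/* Check that both variants agree on an arbitrary test value. */
static void self_check(void)
{
	const __m128i v = _mm_set_epi32(0x11223344, 0x55667788,
					(int)0x99aabbccu, (int)0xddeeff00u);

	assert(same128(clear_low32_and(v), clear_low32_blend(v)));
	assert(same128(clear_high64_and(v), clear_high64_move(v)));
}
```

Calling self_check() from a test program checks that the mask-free variants produce the same register contents as the AND-with-mask variants. Whether that saves any cycles in practice is exactly the open question in this thread.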