[AMD Official Use Only - AMD Internal Distribution Only]

<snipped>
 
> > > >> --- a/app/test-pmd/macswap_sse.h
> > > >> +++ b/app/test-pmd/macswap_sse.h
> > > >> @@ -16,13 +16,13 @@ do_macswap(struct rte_mbuf *pkts[], uint16_t
> nb,
> > > >>        uint64_t ol_flags;
> > > >>        int i;
> > > >>        int r;
> > > >> -     __m128i addr0, addr1, addr2, addr3;
> > > >> +     register __m128i addr0, addr1, addr2, addr3;
> > > > Some compilers treat register as a no-op. Are you sure? Did you check
> with godbolt.
> > >
> > > Thank you Stephen, I have tested the code changes on Linux using GCC
> > > and Clang compiler.
> > >
> > > In both cases in Linux environment, we have seen the the values
> > > loaded onto register `xmm`.
> > >
> > > ```
> > > registerconst__m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12, 5, 4,
> > > 3, 2, 1, 0, 11, 10, 9, 8, 7, 6); vmovdqaxmm0, xmmwordptr[rip+
> > > .LCPI0_0]
>
> Yep, that what I would probably expect: one time load before the loop starts,
> right?
> Curious  what exactly it would generate then if 'register' keyword is missed?
> BTW, on my box,  gcc-11  with '-O3 -msse4.2 ...'  I am seeing expected
> behavior without 'register' keyword.
> Is it some particular compiler version that misbehaves?
 
Thank you, Konstantin, for this pointer. I have been trying this understand this a bit more internally. Here are my observations
 
1. shuf simd ISA works on XMM register only.
2. Any values from variables has to be loaded to `xmm` register before processing.
3. when compiled for `-march=native` with compiler not aware (SoC Arch gcc weights) without patch might have generating with ` movzx   eax, BYTE PTR [rbp-48]`
4. when register keyword is applied for both shufl_mask and addr, the compiler generates trying to get the variables directly into xmm using ` vmovdqu (%rsi),%xmm1`
 
So, I think you are right, from gcc12.3 and gcc 13.1 which supports `-march=znver4` this problem will not come.
 
>
> > >
> > > ```
> > >
> > > Both cases we have performance improvement.
> > >
> > >
> > > Can you please help us understand if we have missed out something?
> >
> > Ok, not sure why compiler would not decide to already use a register here?