On 8/21/2024 8:25 PM, Stephen Hemminger wrote:

On Wed, 21 Aug 2024 20:08:55 +0530
Vipin Varghese <vipin.varghese@amd.com> wrote:

diff --git a/app/test-pmd/macswap_sse.h b/app/test-pmd/macswap_sse.h
index 223f87a539..29088843b7 100644
--- a/app/test-pmd/macswap_sse.h
+++ b/app/test-pmd/macswap_sse.h
@@ -16,13 +16,13 @@ do_macswap(struct rte_mbuf *pkts[], uint16_t nb,
      uint64_t ol_flags;
      int i;
      int r;
-     __m128i addr0, addr1, addr2, addr3;
+     register __m128i addr0, addr1, addr2, addr3;

Some compilers treat `register` as a no-op. Are you sure? Did you check with godbolt?

Thank you, Stephen. I have tested the code changes on Linux with both the GCC and Clang compilers.

In both cases, the generated code loads the values into `xmm` registers:

```
register const __m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12, 5, 4, 3, 2, 1, 0, 11, 10, 9, 8, 7, 6);
vmovdqa xmm0, xmmword ptr [rip + .LCPI0_0]

```
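
For reference, here is what the mask in the listing above encodes. `_mm_set_epi8` takes its arguments from byte 15 down to byte 0, so the shuffled result holds source bytes 6..11 first, then bytes 0..5, and leaves bytes 12..15 unchanged, i.e. the two MAC addresses trade places. Below is a minimal scalar sketch of that permutation, assuming the mask is consumed by `_mm_shuffle_epi8` as the name `shfl_msk` suggests. The names `shfl_msk_bytes`, `shuffle_bytes` and `macswap_scalar` are hypothetical and only for illustration; they are not part of test-pmd.

```c
#include <stdint.h>
#include <string.h>

/*
 * Mask bytes in memory order (byte 0 first). This is the reverse of the
 * _mm_set_epi8() argument list, which is written from byte 15 down to 0.
 */
static const uint8_t shfl_msk_bytes[16] = {
	6, 7, 8, 9, 10, 11,	/* bytes 0..5  <- source MAC */
	0, 1, 2, 3, 4, 5,	/* bytes 6..11 <- destination MAC */
	12, 13, 14, 15		/* bytes 12..15 are left in place */
};

/*
 * Scalar model of a 16-byte shuffle: dst[i] = src[mask[i] & 0x0f].
 * Assumption: every mask byte has its top bit clear, so the zeroing
 * behaviour of the real instruction is not modelled here.
 */
static void
shuffle_bytes(uint8_t dst[16], const uint8_t src[16], const uint8_t mask[16])
{
	int i;

	for (i = 0; i < 16; i++)
		dst[i] = src[mask[i] & 0x0f];
}

/* Swap the two MAC addresses in the first 16 bytes of an Ethernet frame. */
static void
macswap_scalar(uint8_t hdr[16])
{
	uint8_t tmp[16];

	shuffle_bytes(tmp, hdr, shfl_msk_bytes);
	memcpy(hdr, tmp, sizeof(tmp));
}
```

Calling `macswap_scalar()` on a header that starts with destination `aa:aa:aa:aa:aa:aa` and source `bb:bb:bb:bb:bb:bb` leaves the `bb` address first and the `aa` address second. This only illustrates the byte layout; it says nothing about register allocation.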

In both cases we see a performance improvement.


Could you please help us understand whether we have missed something?