Hi Bruce,

Thanks for highlighting the variance. We found this was an internal test bed
configuration issue. We are sharing the next version of the same patch with
updated numbers.

On 7/23/2024 10:42 PM, Bruce Richardson wrote:
> On Tue, Jul 23, 2024 at 05:45:57PM +0100, Ferruh Yigit wrote:
>> On 7/16/2024 7:37 AM, Vipin Varghese wrote:
>>> Goal of the patch is to improve SSE macswap on x86_64 by reducing
>>> the stalls in backend engine. Original implementation of the SSE
>>> macswap makes loop call to multiple load, shuffle & store. Using
>>> SIMD ISA interleaving we can reduce the stalls for
>>>  - load SSE token exhaustion
>>>  - Shuffle and Load dependency
>>>
>>> Also other changes which improves packet per second are
>>>  - Filling access to MBUF for offload flags which is separate cacheline,
>>>  - using register keyword
>>>
>>> Build test using meson script:
>>> ``````````````````````````````
>>>
>>> build-gcc-static
>>> buildtools
>>> build-gcc-shared
>>> build-mini
>>> build-clang-static
>>> build-clang-shared
>>> build-x86-generic
>>>
>>> Test Results:
>>> `````````````
>>>
>>> Platform-1: AMD EPYC SIENA 8594P @2.3GHz, no boost
>>>
>>> ------------------------------------------------
>>> TEST IO 64B: baseline
>>>  - mellanox CX-7 2*200Gbps : 42.0
>>>  - intel E810 1*100Gbps : 82.0
>>>  - intel E810 2*200Gbps (2CQ-DA2): 82.45
>>> ------------------------------------------------
>>> TEST MACSWAP 64B:
>>>  - mellanox CX-7 2*200Gbps : 31.533 : 31.90
>>>  - intel E810 1*100Gbps : 50.380 : 47.0
>>>  - intel E810 2*200Gbps (2CQ-DA2): 48.840 : 49.827
>>> ------------------------------------------------
>>> TEST MACSWAP 128B:
>>>  - mellanox CX-7 2*200Gbps: 30.946 : 31.770
>>>  - intel E810 1*100Gbps: 49.386 : 46.366
>>>  - intel E810 2*200Gbps (2CQ-DA2): 47.979 : 49.503
>>> ------------------------------------------------
>>> TEST MACSWAP 256B:
>>>  - mellanox CX-7 2*200Gbps: 32.480 : 33.150
>>>  - intel E810 1 * 100Gbps: 45.29 : 44.571
>>>  - intel E810 2 * 200Gbps (2CQ-DA2): 45.033 : 45.117
>>> ------------------------------------------------
>>>
>>> Platform-2: AMD EPYC 9554 @3.1GHz, no boost
>>>
>>> ------------------------------------------------
>>> TEST IO 64B: baseline
>>>  - intel E810 2*200Gbps (2CQ-DA2): 82.49
>>> ------------------------------------------------
>>>
>>> TEST MACSWAP: 1Q 1C1T
>>>  64B: : 45.0 : 45.54
>>>  128B: : 44.48 : 44.43
>>>  256B: : 42.0 : 41.99
>>> +++++++++++++++++++++++++
>>> TEST MACSWAP: 2Q 2C2T
>>>  64B: : 59.5 : 60.55
>>>  128B: : 56.78 : 58.1
>>>  256B: : 41.85 : 41.99
>>> ------------------------------------------------
>>>
>>> Signed-off-by: Vipin Varghese
>>>
>> Hi Bruce, John,
>>
>> Can you please help testing macswap performance with this patch on Intel
>> platforms, to be sure it is not causing regression?
>>
> Hi Ferruh,
>
> We can try and get some Intel numbers for you, but I think at this point it
> is better deferred to 24.11 due to lack of discussion and analysis of the
> numbers. This is because the numbers above already show that it is causing
> regressions - in fact many of the regressions are larger than the benefits
> shown. This may be acceptable, but it would imply that we shouldn't be too
> hasty in applying the patch.
>
> Regards,
> /Bruce
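
The comparison of regressions against gains raised above can be checked by
computing the relative change for each row. The sketch below does this for
the three Platform-1 64B MACSWAP rows, assuming each "a : b" pair in the
quoted table reads as baseline : patched (the table does not label its
columns). The function name and row labels are illustrative only.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Platform-1, TEST MACSWAP 64B figures copied from the quoted table.
# ASSUMPTION: first value = baseline, second value = with the patch.
rows = {
    "CX-7 2*200Gbps": (31.533, 31.90),
    "E810 1*100Gbps": (50.380, 47.0),
    "E810 2*200Gbps (2CQ-DA2)": (48.840, 49.827),
}

for name, (before, after) in rows.items():
    # Negative values are regressions, positive values are gains.
    print(f"{name}: {pct_change(before, after):+.2f}%")
```

Under that column assumption, a negative result marks a regression and a
positive one a gain, and the same loop can be extended to the remaining rows.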