From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Varghese, Vipin"
To: Konstantin Ananyev, Stephen Hemminger
CC: "Yigit, Ferruh", "bruce.richardson@intel.com", "konstantin.v.ananyev@yandex.ru", "aman.deep.singh@intel.com", "dev@dpdk.org"
Subject: RE: [PATCH v2 1/3] app/testpmd: add register keyword
Date: Fri, 6 Sep 2024 13:02:01 +0000
References: <20240716063724.850-1-vipin.varghese@amd.com> <20240821143857.1972-1-vipin.varghese@amd.com> <20240821143857.1972-2-vipin.varghese@amd.com> <20240821075502.3faa0997@hermes.local> <20240827103924.1d1d2711@hermes.local> <0ae233fb72ce49cea5186e1f924db76b@huawei.com>
In-Reply-To: <0ae233fb72ce49cea5186e1f924db76b@huawei.com>
Content-Type: multipart/alternative; boundary="_000_PH7PR12MB859639DA1EDC9494B9703817829E2PH7PR12MB8596namp_"
MIME-Version: 1.0
List-Id: DPDK patches and discussions

--_000_PH7PR12MB859639DA1EDC9494B9703817829E2PH7PR12MB8596namp_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

[AMD Official Use Only - AMD Internal Distribution Only]

<snipped>

> > > >> --- a/app/test-pmd/macswap_sse.h
> > > >> +++ b/app/test-pmd/macswap_sse.h
> > > >> @@ -16,13 +16,13 @@ do_macswap(struct rte_mbuf *pkts[], uint16_t nb,
> > > >>        uint64_t ol_flags;
> > > >>        int i;
> > > >>        int r;
> > > >> -     __m128i addr0, addr1, addr2, addr3;
> > > >> +     register __m128i addr0, addr1, addr2, addr3;
> > > > Some compilers treat register as a no-op. Are you sure? Did you check
> > > > with godbolt.
> > >
> > > Thank you Stephen, I have tested the code changes on Linux using GCC
> > > and Clang compiler.
> > >
> > > In both cases in Linux environment, we have seen the values
> > > loaded onto register `xmm`.
> > >
> > > ```
> > > register const __m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12, 5, 4,
> > > 3, 2, 1, 0, 11, 10, 9, 8, 7, 6);
> > > vmovdqa xmm0, xmmword ptr [rip + .LCPI0_0]
>
> Yep, that's what I would probably expect: one time load before the loop starts,
> right?
> Curious what exactly it would generate then if 'register' keyword is missed?
> BTW, on my box, gcc-11 with '-O3 -msse4.2 ...' I am seeing expected
> behavior without 'register' keyword.
> Is it some particular compiler version that misbehaves?

Thank you, Konstantin, for this pointer. I have been trying to understand this a bit more internally. Here are my observations:

1. The shuf SIMD instruction works on XMM registers only.
2. Any value held in a variable has to be loaded into an `xmm` register before processing.
3. When compiling with `-march=native` using a compiler that is not aware of the target (no SoC architecture weights in gcc), the code without the patch might be generated with `movzx   eax, BYTE PTR [rbp-48]`.
4. When the register keyword is applied to both shfl_msk and addr, the compiler tries to get the variables directly into xmm using `vmovdqu (%rsi),%xmm1`.

So, I think you are right: from gcc 12.3 and gcc 13.1 onwards, which support `-march=znver4`, this problem will not occur.
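
For anyone who wants to reproduce the comparison, here is a minimal sketch of the pattern under discussion. It is not the macswap_sse.h code: the function names macswap_scalar and macswap_burst are hypothetical, it works on raw byte buffers instead of rte_mbuf, and it assumes every frame has at least 16 accessible bytes. The mask is the same one quoted above and is declared without the register keyword; since it is loop-invariant, the compiler is free to load it into an XMM register once, before the loop.

```c
/* Sketch only: hypothetical names, raw byte buffers instead of rte_mbuf. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference: swap destination MAC (bytes 0-5) and source MAC (bytes 6-11). */
static void macswap_scalar(uint8_t *frame)
{
	uint8_t tmp[6];

	memcpy(tmp, frame, 6);
	memcpy(frame, frame + 6, 6);
	memcpy(frame + 6, tmp, 6);
}

#if defined(__x86_64__)
#include <tmmintrin.h> /* SSSE3: _mm_shuffle_epi8 */

/* The target attribute (GCC/Clang) lets this build without -mssse3. */
__attribute__((target("ssse3")))
static void macswap_burst(uint8_t *frames[], size_t nb)
{
	/* Loop-invariant shuffle mask, deliberately without 'register'. */
	const __m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12, 5, 4, 3, 2,
					      1, 0, 11, 10, 9, 8, 7, 6);
	size_t i;

	assert(frames != NULL);
	for (i = 0; i < nb; i++) {
		/* Unaligned load of the first 16 bytes of the frame. */
		__m128i addr = _mm_loadu_si128((const __m128i *)frames[i]);

		/* Swap the two MAC addresses; bytes 12-15 stay in place. */
		addr = _mm_shuffle_epi8(addr, shfl_msk);
		_mm_storeu_si128((__m128i *)frames[i], addr);
	}
}
#else
/* Assumed fallback for non-x86 builds: plain scalar swap per frame. */
static void macswap_burst(uint8_t *frames[], size_t nb)
{
	size_t i;

	assert(frames != NULL);
	for (i = 0; i < nb; i++)
		macswap_scalar(frames[i]);
}
#endif
```

Building this once as is and once with register added to shfl_msk and addr, then comparing the assembly (for example with `gcc -O3 -S`, or on godbolt), should show whether the mask load sits outside the loop for a given compiler version and -march setting.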

>
> > >
> > > ```
> > >
> > > Both cases we have performance improvement.
> > >
> > >
> > > Can you please help us understand if we have missed out something?
> >
> > Ok, not sure why compiler would not decide to already use a register here?
--_000_PH7PR12MB859639DA1EDC9494B9703817829E2PH7PR12MB8596namp_--