From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from NAM03-BY2-obe.outbound.protection.outlook.com
 (mail-by2nam03on0051.outbound.protection.outlook.com [104.47.42.51])
 by dpdk.org (Postfix) with ESMTP id B6F245B26;
 Sat, 6 Oct 2018 09:41:59 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=CAVIUMNETWORKS.onmicrosoft.com; s=selector1-cavium-com;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=z5vgECTU1HqA+Yid9MeKlXsALwp30x0eNrKEkHyOP+o=;
 b=ZSRmx203NglDiJFvWhGl1TpSMF4id/1JYJxUEgWI9QVro+eU5l2Zj2nFlfTSobdI8SBs0/grYyYreafbChtxQhuJwmfM75QpzrkQ+XHVlAn7mJ+/r3CBXI3TP/XHWIG5x4hCIPVG9xmM7S93Xp0asMqufrCJS+RqMsH1rW7B99A=
Authentication-Results: spf=none (sender IP is )
 smtp.mailfrom=Jerin.JacobKollanukkaran@cavium.com;
Received: from jerin (115.113.156.3) by
 DM6PR07MB5003.namprd07.prod.outlook.com (2603:10b6:5:25::24) with Microsoft
 SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384)
 id 15.20.1185.25; Sat, 6 Oct 2018 07:41:53 +0000
Date: Sat, 6 Oct 2018 13:11:35 +0530
From: Jerin Jacob
To: Ola Liljedahl
Cc: Honnappa Nagarahalli, "Ananyev, Konstantin",
 "Gavin Hu (Arm Technology China)", "dev@dpdk.org", Steve Capper, nd,
 "stable@dpdk.org"
Message-ID: <20181006074126.GA16715@jerin>
References: <621E373E-048D-4808-8CE8-84373EA98D2F@arm.com>
 <2601191342CEEE43887BDE71AB9772580102FE2951@IRSMSX106.ger.corp.intel.com>
 <20181005170725.GA18671@jerin>
 <1555626C-F2B8-44EB-98A3-79B1F7002587@arm.com>
 <60055965-A7C8-4E9F-8668-0AE1DCE57515@arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <60055965-A7C8-4E9F-8668-0AE1DCE57515@arm.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Originating-IP: [115.113.156.3]
X-ClientProxiedBy: PN1PR01CA0087.INDPRD01.PROD.OUTLOOK.COM
 (2603:1096:c00:1::27) To DM6PR07MB5003.namprd07.prod.outlook.com
 (2603:10b6:5:25::24)
X-MS-PublicTrafficType: Email
Subject: Re: [dpdk-dev] [PATCH v3 1/3] ring: read tail using atomic load
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
X-List-Received-Date: Sat, 06 Oct 2018 07:42:00 -0000

-----Original Message-----
> Date: Fri, 5 Oct 2018 20:34:15 +0000
> From: Ola Liljedahl
> To: Honnappa Nagarahalli, Jerin Jacob
> CC: "Ananyev, Konstantin", "Gavin Hu (Arm Technology China)",
>  "dev@dpdk.org", Steve Capper, nd, "stable@dpdk.org"
> Subject: Re: [PATCH v3 1/3] ring: read tail using atomic load
> user-agent: Microsoft-MacOutlook/10.10.0.180812
>
> External Email
>
> On 05/10/2018, 22:29, "Honnappa Nagarahalli" wrote:
>
> >
> > I doubt it is possible to benchmark with such a precision so to see the
> > potential difference of one ADD instruction.
> > Just changes in function alignment can affect performance by percents. And
> > the natural variation when not using a 100% deterministic system is going to
> > be a lot larger than one cycle per ring buffer operation.
> >
> > Some of the other patches are also for correctness (e.g. load-acquire of tail)
> The discussion is about this patch alone. Other patches are already Acked.
> So the benchmarking then makes zero sense.

Why?

> >
> > so while performance measurements may be interesting, we can't skip a bug
> > fix just because it proves to decrease performance.
> IMO, this patch is not a bug fix - in terms of it fixing any failures with
> the current code.
> It's a fix for correctness. Per the C++11 (and probably C11 as well due to
> the shared memory model), we have undefined behaviour here. If the compiler
> detects UB, it is allowed to do anything. Current compilers might not
> exploit this but future compilers could.

All I am saying is this: the code is not the same, and the compiler (the
very latest gcc 8.2) is not smart enough to understand that it is dead code.
I think the moment any GCC builtin comes in, the compiler adds a predefined
template which has the additional "add" instruction.

I think in this specific case we ALL know that:
a) ht->tail will be 32 bit for the lifetime of DPDK, and loading it will be
atomic on all DPDK supported processors
b) the rte_pause() below it never allows any compiler reordering etc.

So why lose one cycle in the worst case? It is easy to lose one cycle and
very difficult to get one back in the fastpath.

>
> >
> >
> > -- Ola
> >
> > On 05/10/2018, 22:06, "Honnappa Nagarahalli" wrote:
> >
> > Hi Jerin,
> > Thank you for generating the disassembly, that is really helpful. I
> > agree with you that we have the option of moving parts 2 and 3 forward. I
> > will let Gavin take a decision.
> >
> > I suggest that we run benchmarks on this patch alone and in combination
> > with other patches in the series. We have few Arm machines and we will run
> > on all of them along with x86. We take a decision based on that.
> >
> > Would that be a way to move forward? I think this should address both
> > your and Ola's concerns.
> >
> > I am open for other suggestions as well.
> >
> > Thank you,
> > Honnappa
> >
> > >
> > > So you don't want to write the proper C11 code because the compiler
> > > generates one extra instruction that way?
> > > You don't even know if that one extra instruction has any measurable
> > > impact on performance. E.g. it could be issued the cycle before together
> > > with other instructions.
> > >
> > > We can complain to the compiler writers that the code generation for
> > > __atomic_load_n(, __ATOMIC_RELAXED) is not optimal (at least on
> > > ARM/A64). I think the problem is that the __atomic builtins only accept a
> > > base address without any offset and this is possibly because e.g.
> > > load/store exclusive (LDX/STX) and load-acquire (LDAR) and store-release
> > > (STLR) only accept a base register with no offset. So any offset has to
> > > be added before the actual "atomic" instruction, LDR in this case.
> > >
> > >
> > > -- Ola
> > >
> > >
> > > On 05/10/2018, 19:07, "Jerin Jacob" wrote:
> > >
> > > -----Original Message-----
> > > > Date: Fri, 5 Oct 2018 15:11:44 +0000
> > > > From: Honnappa Nagarahalli
> > > > To: "Ananyev, Konstantin", Ola Liljedahl,
> > > >  "Gavin Hu (Arm Technology China)", Jerin Jacob
> > > > CC: "dev@dpdk.org", Steve Capper, nd, "stable@dpdk.org"
> > > > Subject: RE: [PATCH v3 1/3] ring: read tail using atomic load
> > > >
> > > > > > > Hi Jerin,
> > > > > > >
> > > > > > > Thanks for your review, inline comments from our internal
> > > > > > > discussions.
> > > > > > >
> > > > > > > BR.
> > > > > > > Gavin
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Jerin Jacob
> > > > > > > > Sent: Saturday, September 29, 2018 6:49 PM
> > > > > > > > To: Gavin Hu (Arm Technology China)
> > > > > > > > Cc: dev@dpdk.org; Honnappa Nagarahalli; Steve Capper;
> > > > > > > >  Ola Liljedahl; nd; stable@dpdk.org
> > > > > > > > Subject: Re: [PATCH v3 1/3] ring: read tail using atomic load
> > > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > > Date: Mon, 17 Sep 2018 16:17:22 +0800
> > > > > > > > > From: Gavin Hu
> > > > > > > > > To: dev@dpdk.org
> > > > > > > > > CC: gavin.hu@arm.com, Honnappa.Nagarahalli@arm.com,
> > > > > > > > >  steve.capper@arm.com, Ola.Liljedahl@arm.com,
> > > > > > > > >  jerin.jacob@caviumnetworks.com, nd@arm.com, stable@dpdk.org
> > > > > > > > > Subject: [PATCH v3 1/3] ring: read tail using atomic load
> > > > > > > > > X-Mailer: git-send-email 2.7.4
> > > > > > > > >
> > > > > > > > > External Email
> > > > > > > > >
> > > > > > > > > In update_tail, read ht->tail using __atomic_load.Although the
> > > > > > > > > compiler currently seems to be doing the right thing even without
> > > > > > > > > _atomic_load, we don't want to give the compiler freedom to optimise
> > > > > > > > > what should be an atomic load, it should not be arbitarily moved
> > > > > > > > > around.
> > > > > > > > >
> > > > > > > > > Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
> > > > > > > > > Cc: stable@dpdk.org
> > > > > > > > >
> > > > > > > > > Signed-off-by: Gavin Hu
> > > > > > > > > Reviewed-by: Honnappa Nagarahalli
> > > > > > > > > Reviewed-by: Steve Capper
> > > > > > > > > Reviewed-by: Ola Liljedahl
> > > > > > > > > ---
> > > > > > > > > lib/librte_ring/rte_ring_c11_mem.h | 3 ++-
> > > > > > > > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > > > > >
> > > > > > > The read of ht->tail needs to be atomic, a non-atomic read would not
> > > > > > > be correct.
> > > > > >
> > > > > > That's a 32bit value load.
> > > > > > AFAIK on all CPUs that we support it is an atomic operation.
> > > > > > [Ola] But that the ordinary C load is translated to an atomic load
> > > > > > for the target architecture is incidental.
> > > > > >
> > > > > > If the design requires an atomic load (which is the case here), we
> > > > > > should use an atomic load on the language level. Then we can be sure
> > > > > > it will always be translated to an atomic load for the target in
> > > > > > question or compilation will fail. We don't have to depend on
> > > > > > assumptions.
> > > > >
> > > > > We all know that 32bit load/store on cpu we support - are atomic.
> > > > > If it wouldn't be the case - DPDK would be broken in dozen places.
> > > > > So what the point to pretend that "it might be not atomic" if we do
> > > > > know for sure that it is?
> > > > > I do understand that you want to use atomic_load(relaxed) here for
> > > > > consistency, and to conform with C11 mem-model and I don't see any
> > > > > harm in that.
> > > > We can continue to discuss the topic, it is a good discussion. But, as far
> > > > this patch is concerned, can I consider this as us having a consensus?
> > > > The file rte_ring_c11_mem.h is specifically for C11 memory model and I
> > > > also do not see any harm in having code that completely conforms to C11
> > > > memory model.
> > >
> > > Have you guys checked the output assembly with and without atomic
> > > load?
> > > There is an extra "add" instruction with at least the code I have checked.
> > > I think, compiler is not smart enough to understand it is a dead code for
> > > arm64.
> > >
> > > ➜ [~] $ aarch64-linux-gnu-gcc -v
> > > Using built-in specs.
> > > COLLECT_GCC=aarch64-linux-gnu-gcc
> > > COLLECT_LTO_WRAPPER=/usr/lib/gcc/aarch64-linux-gnu/8.2.0/lto-wrapper
> > > Target: aarch64-linux-gnu
> > > Configured with: /build/aarch64-linux-gnu-gcc/src/gcc-8.2.0/configure
> > > --prefix=/usr --program-prefix=aarch64-linux-gnu-
> > > --with-local-prefix=/usr/aarch64-linux-gnu
> > > --with-sysroot=/usr/aarch64-linux-gnu
> > > --with-build-sysroot=/usr/aarch64-linux-gnu --libdir=/usr/lib
> > > --libexecdir=/usr/lib --target=aarch64-linux-gnu
> > > --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-nls
> > > --enable-languages=c,c++ --enable-shared --enable-threads=posix
> > > --with-system-zlib --with-isl --enable-__cxa_atexit
> > > --disable-libunwind-exceptions --enable-clocale=gnu
> > > --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object
> > > --enable-linker-build-id --enable-lto --enable-plugin
> > > --enable-install-libiberty --with-linker-hash-style=gnu
> > > --enable-gnu-indirect-function --disable-multilib --disable-werror
> > > --enable-checking=release
> > > Thread model: posix
> > > gcc version 8.2.0 (GCC)
> > >
> > >
> > > # build setup
> > > make -j 8 config T=arm64-armv8a-linuxapp-gcc CROSS=aarch64-linux-gnu-
> > > make -j 8 test-build CROSS=aarch64-linux-gnu-
> > >
> > > # generate asm
> > > aarch64-linux-gnu-gdb -batch -ex 'file build/app/test ' -ex 'disassemble /rs bucket_enqueue_single'
> > >
> > > I have uploaded generated
> > > file for your convenience
> > > with_atomic_load.txt(includes patch 1,2,3)
> > > -----------------------
> > > https://pastebin.com/SQ6w1yRu
> > >
> > > without_atomic_load.txt(includes patch 2,3)
> > > -----------------------
> > > https://pastebin.com/BpvnD0CA
> > >
> > >
> > > without_atomic
> > > -------------
> > > 23 if (!single)
> > >    0x000000000068d290 <+240>: 85 00 00 35 cbnz w5, 0x68d2a0
> > >    0x000000000068d294 <+244>: 82 04 40 b9 ldr w2, [x4, #4]
> > >    0x000000000068d298 <+248>: 5f 00 01 6b cmp w2, w1
> > >    0x000000000068d29c <+252>: 21 01 00 54 b.ne 0x68d2c0 // b.any
> > >
> > > 24 while (unlikely(ht->tail != old_val))
> > > 25 rte_pause();
> > >
> > >
> > > with_atomic
> > > -----------
> > > 23 if (!single)
> > >    0x000000000068ceb0 <+240>: 00 10 04 91 add x0, x0, #0x104
> > >    0x000000000068ceb4 <+244>: 84 00 00 35 cbnz w4, 0x68cec4
> > >    0x000000000068ceb8 <+248>: 02 00 40 b9 ldr w2, [x0]
> > >    0x000000000068cebc <+252>: 3f 00 02 6b cmp w1, w2
> > >    0x000000000068cec0 <+256>: 01 09 00 54 b.ne 0x68cfe0 // b.any
> > >
> > > 24 while (unlikely(old_val != __atomic_load_n(&ht->tail,
> > > __ATOMIC_RELAXED)))
> > >
> > >
> > > I don't want to block this series of patches due this patch. Can we make
> > > re spin one series with 2 and 3 patches. And Wait for patch 1 to conclude?
> > >
> > > Thoughts?
> > >
> > > > >
> > > > > But argument that we shouldn't assume 32bit load/store ops as atomic
> > > > > sounds a bit flaky to me.
> > > > > Konstantin
> > > > >
> > > > >
> > > > > > >
> > > > > > > But there are no memory ordering requirements (with
> > > > > > > regards to other loads and/or stores by this thread) so relaxed
> > > > > > > memory order is sufficient.
> > > > > > > Another aspect of using __atomic_load_n() is that the
> > > > > > > compiler cannot "optimise" this load (e.g.
> > > > > > > combine, hoist etc), it has to be done as
> > > > > > > specified in the source code which is also what we need here.
> > > > > >
> > > > > > I think Jerin points that rte_pause() acts here as compiler barrier too,
> > > > > > so no need to worry that compiler would optimize out the loop.
> > > > > > [Ola] Sorry missed that. But the barrier behaviour of rte_pause()
> > > > > > is not part of C11, is it essentially a hand-made feature to support
> > > > > > the legacy multithreaded memory model (which uses explicit HW and
> > > > > > compiler barriers). I'd prefer code using the C11 memory model not to
> > > > > > depend on such legacy features.
> > > > > >
> > > > > >
> > > > > > Konstantin
> > > > > >
> > > > > > >
> > > > > > > One point worth mentioning though is that this change is for
> > > > > > > the rte_ring_c11_mem.h file, not the legacy ring. It may be worth
> > > > > > > persisting with getting the C11 code right when people are less
> > > > > > > excited about sending a release out?
> > > > > > >
> > > > > > > We can explain that for C11 we would prefer to do loads and stores
> > > > > > > as per the C11 memory model. In the case of rte_ring, the code is
> > > > > > > separated cleanly into C11 specific files anyway.
> > > > > > >
> > > > > > > I think reading ht->tail using __atomic_load_n() is the most
> > > > > > > appropriate way. We show that ht->tail is used for synchronization,
> > > > > > > we acknowledge that ht->tail may be written by other threads
> > > > > > > without any other kind of synchronization (e.g. no lock involved)
> > > > > > > and we require
> > > > > > > an atomic load (any write to ht->tail must also be atomic).
> > > > > > >
> > > > > > > Using volatile and explicit compiler (or processor) memory barriers
> > > > > > > (fences) is the legacy pre-C11 way of accomplishing these things.
> > > > > > > There's a reason why C11/C++11 moved away from the old ways.
> > > > > > >
> > > > > > > > >
> > > > > > > > > __atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
> > > > > > > > > --
> > > > > > > > > 2.7.4