From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 8 Oct 2018 16:19:45 +0530
From: Jerin Jacob
To: "Gavin Hu (Arm Technology China)"
Cc: Ola Liljedahl, "dev@dpdk.org", Honnappa Nagarahalli,
 "Ananyev, Konstantin", Steve Capper, nd, "stable@dpdk.org"
Message-ID: <20181008104944.GD11081@jerin>
References: <60055965-A7C8-4E9F-8668-0AE1DCE57515@arm.com>
 <20181006074126.GA16715@jerin> <20181007040243.GA1850@jerin>
 <7A156041-23EC-4CCB-B129-3607AF34A992@arm.com>
 <20181008060629.GA5228@jerin>
 <063A95EC-CFC1-42F7-B864-DFB9C6718AC8@arm.com>
 <20181008100004.GB11081@jerin>
User-Agent: Mutt/1.10.1 (2018-07-13)
Subject: Re: [dpdk-dev] [PATCH v3 1/3] ring: read tail using atomic load
List-Id: DPDK patches and discussions

-----Original Message-----
> Date: Mon, 8 Oct 2018 10:33:43 +0000
> From: "Gavin Hu (Arm Technology China)"
> To: Ola Liljedahl, Jerin Jacob
> CC: "dev@dpdk.org", Honnappa Nagarahalli, "Ananyev, Konstantin",
>  Steve Capper, nd, "stable@dpdk.org"
> Subject: RE: [PATCH v3 1/3] ring: read tail using atomic load
>
> I did benchmarking w/o and w/ the patch; it did not show any noticeable
> differences in terms of latency.
> Here is the full log (3 runs w/o the patch and 2 runs w/ the patch).
>
> sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 --socket-mem=1024 -- -i

These counters are running at 100MHz. Use PMU counters to get more
accurate results.

https://doc.dpdk.org/guides/prog_guide/profile_app.html
See: 55.2. Profiling on ARM64

> > -----Original Message-----
> > From: Ola Liljedahl
> > Sent: Monday, October 8, 2018 6:26 PM
> > To: Jerin Jacob
> > Cc: dev@dpdk.org; Honnappa Nagarahalli; Ananyev, Konstantin;
> >  Gavin Hu (Arm Technology China); Steve Capper; nd; stable@dpdk.org
> > Subject: Re: [PATCH v3 1/3] ring: read tail using atomic load
> >
> > On 08/10/2018, 12:00, "Jerin Jacob" wrote:
> >
> > -----Original Message-----
> > > Date: Mon, 8 Oct 2018 09:22:05 +0000
> > > From: Ola Liljedahl
> > > To: Jerin Jacob
> > > CC: "dev@dpdk.org", Honnappa Nagarahalli, "Ananyev, Konstantin",
> > >  "Gavin Hu (Arm Technology China)", Steve Capper, nd,
> > >  "stable@dpdk.org"
> > > Subject: Re: [PATCH v3 1/3] ring: read tail using atomic load
> > > user-agent: Microsoft-MacOutlook/10.11.0.180909
> > >
> > > External Email
> > >
> > > On 08/10/2018, 08:06, "Jerin Jacob" wrote:
> > >
> > > -----Original Message-----
> > > > Date: Sun, 7 Oct 2018 20:44:54 +0000
> > > > From: Ola Liljedahl
> > > > To: Jerin Jacob
> > > > CC: "dev@dpdk.org", Honnappa Nagarahalli, "Ananyev, Konstantin",
> > > >  "Gavin Hu (Arm Technology China)", Steve Capper, nd,
> > > >  "stable@dpdk.org"
> > > > Subject: Re: [PATCH v3 1/3] ring: read tail using atomic load
> > > > user-agent: Microsoft-MacOutlook/10.11.0.180909
> > > >
> > >
> > > Could you please fix the email client for inline reply.
> > > Sorry that doesn't seem to be possible with Outlook for Mac 16 or
> > > Office365. The official Office365/Outlook documentation doesn't match
> > > the actual user interface...
> > >
> > > https://www.kernel.org/doc/html/v4.19-rc7/process/email-clients.html
> > >
> > > > On 07/10/2018, 06:03, "Jerin Jacob" wrote:
> > > >
> > > > In arm64 case, it will have ATOMIC_RELAXED followed by asm volatile
> > > > ("":::"memory") of rte_pause().
> > > > I wouldn't have any issue if the generated code is the same or
> > > > better than the existing case, but that is not the case, right?
> > > > The existing case is actually not interesting (IMO) as it exposes
> > > > undefined behaviour which allows the compiler to do anything. But
> > > > you seem to be satisfied with "works for me, right here right now".
> > > > I think the cost of avoiding undefined behaviour is acceptable
> > > > (actually I don't think it even will be noticeable).
> > >
> > > I am not convinced, because of the use of volatile in the head and
> > > tail indexes. For me that brings the defined behavior.
> > > As long as you don't mix in C11 atomic accesses (just use "plain"
> > > accesses to volatile objects), it is AFAIK defined behaviour (but not
> > > necessarily using atomic loads and stores). But I quoted the C11 spec
> > > where it explicitly mentions that mixing atomic and non-atomic
> > > accesses to the same object is undefined behaviour. Don't argue with
> > > me, argue with the C11 spec.
> > > If you want to disobey the spec, this should at least be called out
> > > in the code with a comment.
> >
> > That boils down to only one question: should we follow the C11 spec?
> > Why not take only load-acquire and store-release semantics, just like
> > the Linux kernel and FreeBSD?
> > And introduce even more undefined behaviour?
> >
> > It does not look like the C11 memory model is super efficient in terms
> > of the gcc implementation.
> > You are making a chicken out of a feather.
> >
> > I think this "problem" with one additional ADD instruction will only
> > concern __atomic_load_n(__ATOMIC_RELAXED) and
> > __atomic_store_n(__ATOMIC_RELAXED) because the compiler separates the
> > address generation (add offset of struct member) from the load or
> > store itself. For other atomic operations and memory orderings (e.g.
> > __atomic_load_n(__ATOMIC_ACQUIRE)), the extra ADD instruction will be
> > included anyway (as long as we access a non-first struct member)
> > because e.g. LDAR only accepts a base register with no offset.
> >
> > I suggest that minimising the imposed memory orderings can have a much
> > larger (positive) effect on performance compared to avoiding one ADD
> > instruction (memory accesses are much slower than CPU ALU
> > instructions).
> > Using the C11 memory model and identifying exactly which objects are
> > used for synchronisation and whether (any) updates to shared memory
> > are acquired or released (no updates to shared memory means relaxed
> > order can be used) will provide maximum freedom to the compiler and
> > hardware to get the best result.
> >
> > The FreeBSD and DPDK ring buffers show some fundamental
> > misunderstandings here. Instead, excessive orderings and explicit
> > barriers have been used as band-aids, with unknown effects on
> > performance.
> >
> > > That is the reason why I shared the generated assembly code. If you
> > > think otherwise, pick any compiler and see the generated output.
> > > This is what one compiler for one architecture generates today. These
> > > things change. Other things that used to work, or worked for some
> > > specific architecture, have stopped working in newer versions of the
> > > compiler.
> > >
> > > And
> > >
> > > the FreeBSD implementation of the ring buffer (which DPDK is derived
> > > from) doesn't have such logic, see
> > > https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h#L108
> > > It looks like FreeBSD uses some kind of C11 atomic memory
> > > model-inspired API although I don't see exactly how e.g.
> > > atomic_store_rel_int() is implemented. The code also mixes in
> > > explicit barriers so definitively not pure C11 memory model usage.
> > > And finally, it doesn't establish the proper
> > > load-acquire/store-release relationships (e.g.
> > > store-release cons_tail requires a load-acquire cons_tail, same for
> > > prod_tail).
> > >
> > > "* multi-producer safe lock-free ring buffer enqueue"
> > > The comment is also wrong. This design is not lock-free, how could it
> > > be when there is spinning (waiting) for other threads in the code? If
> > > a thread must wait for other threads, then by definition the design
> > > is blocking.
> > >
> > > So you are saying that because FreeBSD is doing it wrong, DPDK can
> > > also do it wrong?
> > >
> > > See below too.
> > >
> > > > Skipping the compiler memory barrier in rte_pause() potentially
> > > > allows for optimisations that provide much more benefit, e.g.
> > > > hiding some cache miss latency for later loads. The DPDK ring
> > > > buffer implementation is defined so as to enable inlining of
> > > > enqueue/dequeue functions into the caller, so any code could
> > > > immediately follow these calls.
> > > >
> > > > From INTERNATIONAL STANDARD ©ISO/IEC ISO/IEC 9899:201x
> > > > Programming languages - C
> > > >
> > > > 5.1.2.4
> > > > 4 Two expression evaluations conflict if one of them modifies a
> > > > memory location and the other one reads or modifies the same memory
> > > > location.
> > > >
> > > > 25 The execution of a program contains a data race if it contains
> > > > two conflicting actions in different threads, at least one of which
> > > > is not atomic, and neither happens before the other. Any such data
> > > > race results in undefined behavior.
> > >
> > > IMO, both conditions will be satisfied if the variable is volatile,
> > > and a 32-bit read will be atomic on 32-bit and 64-bit machines. If
> > > not, the problem persists for the generic case as well
> > > (lib/librte_ring/rte_ring_generic.h).
> > > The read from a volatile object is not an atomic access per the C11
> > > spec. It just happens to be translated to an instruction (on x86-64
> > > and AArch64/A64) that implements an atomic load.
> > > I don't think any compiler would change this code generation and
> > > suddenly generate some non-atomic load instruction for a program that
> > > *only* uses volatile to do "atomic" accesses.
> > > But a future compiler could detect the mix of atomic and non-atomic
> > > accesses and mark this expression as causing undefined behaviour and
> > > that would have consequences for code generation.
> > >
> > > I agree with you on C11 memory model semantics usage. The reason why
> > > I proposed the name rte_ring_c11_mem.h for the file is that DPDK
> > > itself did not have definitions for load-acquire and store-release
> > > semantics.
> > > I was looking at taking load-acquire and store-release semantics
> > > from C11 instead of creating a new API like the Linux kernel or
> > > FreeBSD (APIs like atomic_load_acq_32(), atomic_store_rel_32()). If
> > > the file name is your concern then we could create new abstractions
> > > as well. That would help the existing KNI problem as well.
> > > I appreciate your embrace of the C11 memory model. I think it is
> > > better for describing (both to the compiler and to humans) which
> > > objects are used for synchronisation, and how.
> > >
> > > However, I don't think an API as you suggest (and others have
> > > suggested before, e.g. as done in ODP) is a good idea. There is an
> > > infinite number of possible base types, an increasing number of
> > > operations and a bunch of different memory orderings; a "complete"
> > > API would be very large and difficult to test, and most members of
> > > the API would never be used.
> > > GCC and Clang both support the __atomic intrinsics. This API avoids
> > > the problems I described above. Or we could use the official C11
> > > syntax (stdatomic.h). But then we have the problem with using
> > > pre-C11 compilers...
> >
> > I have no objection if everyone agrees to move to the C11 memory model
> > with __atomic intrinsics. But if we need to keep both, then an
> > atomic_load_acq_32() kind of API makes sense.
> >
> > > I think it is currently mixed usage because the same variable
> > > declaration is used for C11 vs non-C11 usage. Ideally we won't need
> > > "volatile" for the C11 case. Either we need to change only to C11
> > > mode OR have APIs for atomic_load_acq_() and atomic_store_rel_() to
> > > allow both models like the Linux kernel and FreeBSD.
> > >
> > > > -- Ola
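
The load-acquire/store-release pairing argued for in the quoted discussion can be sketched with the GCC/Clang __atomic intrinsics. This is only a minimal illustration under stated assumptions, not DPDK's rte_ring code: the struct, the sketch_* names and the ring size are hypothetical, and it assumes exactly one producer thread and one consumer thread. Each tail index is stored with release by its owner and loaded with acquire by the other side; a thread reads its own index with relaxed order because nobody else writes it.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 8u                      /* hypothetical, power of two */
#define SKETCH_RING_MASK (SKETCH_RING_SIZE - 1u)

/* Hypothetical single-producer/single-consumer ring (not rte_ring). */
struct sketch_ring {
    uint32_t prod_tail;                /* written by producer only */
    uint32_t cons_tail;                /* written by consumer only */
    uint32_t slots[SKETCH_RING_SIZE];
};

/* Returns 1 on success, 0 if the ring is full. */
int sketch_enqueue(struct sketch_ring *r, uint32_t val)
{
    /* Own index: no other thread writes it, so relaxed is enough. */
    uint32_t tail = __atomic_load_n(&r->prod_tail, __ATOMIC_RELAXED);
    /* Pairs with the consumer's store-release of cons_tail, so the
     * consumer has finished reading a slot before we overwrite it. */
    uint32_t head = __atomic_load_n(&r->cons_tail, __ATOMIC_ACQUIRE);

    if (tail - head == SKETCH_RING_SIZE)
        return 0;
    r->slots[tail & SKETCH_RING_MASK] = val;
    /* Publish the slot write before the new tail becomes visible. */
    __atomic_store_n(&r->prod_tail, tail + 1, __ATOMIC_RELEASE);
    return 1;
}

/* Returns 1 on success, 0 if the ring is empty. */
int sketch_dequeue(struct sketch_ring *r, uint32_t *val)
{
    uint32_t head = __atomic_load_n(&r->cons_tail, __ATOMIC_RELAXED);
    /* Pairs with the producer's store-release of prod_tail. */
    uint32_t tail = __atomic_load_n(&r->prod_tail, __ATOMIC_ACQUIRE);

    if (head == tail)
        return 0;
    *val = r->slots[head & SKETCH_RING_MASK];
    /* Release the slot back to the producer. */
    __atomic_store_n(&r->cons_tail, head + 1, __ATOMIC_RELEASE);
    return 1;
}

/* Single-threaded smoke test of the index logic; returns 0 on success. */
int sketch_ring_demo(void)
{
    struct sketch_ring r = {0};
    uint32_t v = 0;

    assert((SKETCH_RING_SIZE & SKETCH_RING_MASK) == 0);
    for (uint32_t i = 0; i < SKETCH_RING_SIZE; i++)
        if (!sketch_enqueue(&r, i + 100))
            return -1;
    if (sketch_enqueue(&r, 999))       /* ring must report full */
        return -2;
    for (uint32_t i = 0; i < SKETCH_RING_SIZE; i++)
        if (!sketch_dequeue(&r, &v) || v != i + 100)
            return -3;
    if (sketch_dequeue(&r, &v))        /* ring must report empty */
        return -4;
    return 0;
}
```

Note that no index is declared volatile and every shared access goes through an __atomic intrinsic, so atomic and non-atomic accesses to the same object are never mixed. The smoke test only exercises the index arithmetic in one thread; it does not demonstrate the cross-thread ordering itself.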