From: "Wang, Yipeng1"
To: Honnappa Nagarahalli, "Richardson, Bruce", "De Lara Guarch, Pablo"
CC: "dev@dpdk.org", "Gavin Hu (Arm Technology China)", Steve Capper, Ola Liljedahl, nd, "Gobriel, Sameh"
Date: Mon, 1 Oct 2018 22:41:56 +0000
References: <1536253938-192391-1-git-send-email-honnappa.nagarahalli@arm.com> <1536253938-192391-3-git-send-email-honnappa.nagarahalli@arm.com>
Subject: Re: [dpdk-dev] [PATCH 2/4] hash: add memory ordering to avoid race conditions

>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Sunday, September 30, 2018 3:21 PM
>To: Wang, Yipeng1; Richardson, Bruce; De Lara Guarch, Pablo
>Cc: dev@dpdk.org; Gavin Hu (Arm Technology China); Steve Capper; Ola Liljedahl; nd
>Subject: RE: [dpdk-dev] [PATCH 2/4] hash: add memory ordering to avoid race conditions
>
>>
>> Some general comments for the various __atomic_store/load added,
>>
>> 1. Although it passes the compiler check, I just want to confirm whether we
>> should use GCC/clang builtins, or whether there are higher-level APIs in DPDK
>> for atomic operations.
>>
>I have used gcc builtins (just like rte_ring does)
[Wang, Yipeng] I checked rte_ring; it also has a specific header for C11. Since it is a C11 standard, do we need something similar here?
>
>> 2. We believe the compiler will translate the atomic_store/load to a regular MOV
>> instruction on a Total Store Order architecture (e.g. x86_64). But we ran the
>> perf test on x86, and here is the relative slowdown on lookup compared to
>> master head.
>> I am not sure if the performance drop comes from the atomic
>> builtins.
>>
>C11 atomics also block compiler reordering. Other than this, the retry loop is an addition to lookup.
>The patch also has the alignment corrected. I am not sure how that is affecting the perf numbers.
>
>> Keysize | single lookup | bulk lookup
>> 4       | 0.93          | 0.95
>> 8       | 0.95          | 0.96
>> 16      | 0.97          | 0.96
>> 32      | 0.97          | 1.00
>> 48      | 1.03          | 0.99
>> 64      | 1.04          | 0.98
>> 9       | 0.91          | 0.96
>> 13      | 0.97          | 0.98
>> 37      | 1.04          | 1.03
>> 40      | 1.02          | 0.98
>>
>I assume this is the data from the test cases in the test_hash_perf.c file. I tried to reproduce this data, but my data is worse. Can you
>specify the actual test from test_hash_perf.c you are using (With locks/Pre-computed hash/With data/Elements in primary)?
>IMO, the differences you have provided are not high.
[Wang, Yipeng] As I remember, the performance data I used is from the no-lock, without hash, with 8-byte data case, in both primary and secondary. I compared the master head to the one with your first two commits.
>
>> [Wang, Yipeng] I think even for the current code, we need to check empty_slot.
>> Could you export this as a bug fix commit?
>>
>In the existing code, there is a check 'if (!!key_idx & !rte_hash....)'. Are you referring to '!!key_idx'? I think this should be changed to
>'(key_idx != EMPTY_SLOT)'.
[Wang, Yipeng] Yeah, I guess I did not see that part. Then I guess there is no need to export it as a bug fix for now, since it is not a functional issue. Your change is good.
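
For reference, the ordering and the empty-slot check discussed above can be sketched roughly as below. This is only an illustration using the GCC/clang __atomic builtins, not code from the patch: the struct names and the sketch_publish/sketch_lookup helpers are hypothetical, the layout is not the real rte_hash layout, EMPTY_SLOT is assumed to be 0 (which is what makes '!!key_idx' and 'key_idx != EMPTY_SLOT' equivalent), and the retry loop mentioned in the thread is not modelled.

```c
#include <stdint.h>

/* Assumption: index 0 is reserved to mean "no key stored here". */
#define EMPTY_SLOT 0u

/* Hypothetical, simplified bucket entry (not the rte_hash layout). */
struct sketch_entry {
	uint32_t key_idx; /* index into the key store, or EMPTY_SLOT */
};

/* Hypothetical key-store slot holding the key and its data. */
struct sketch_key {
	uint64_t key;
	uint64_t data;
};

/* Writer: fill the key slot first, then publish its index with a
 * store-release so the key/data writes cannot be reordered after it. */
static void
sketch_publish(struct sketch_entry *e, struct sketch_key *store,
	       uint32_t idx, uint64_t key, uint64_t data)
{
	store[idx].key = key;
	store[idx].data = data;
	__atomic_store_n(&e->key_idx, idx, __ATOMIC_RELEASE);
}

/* Reader: the load-acquire pairs with the store-release above, so a
 * reader that observes a non-empty index also observes the key/data
 * written before it. Returns 0 on hit, -1 on miss. */
static int
sketch_lookup(struct sketch_entry *e, struct sketch_key *store,
	      uint64_t key, uint64_t *data)
{
	uint32_t idx = __atomic_load_n(&e->key_idx, __ATOMIC_ACQUIRE);

	/* Explicit form of the '!!key_idx' test. */
	if (idx != EMPTY_SLOT && store[idx].key == key) {
		*data = store[idx].data;
		return 0;
	}
	return -1;
}
```

On x86_64 both builtins are expected to compile down to plain MOV instructions, so any cost there would come from the restricted compiler reordering rather than from extra fence instructions.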