Date: Mon, 23 Oct 2017 15:36:18 +0530
From: Jerin Jacob
To: Jia He
Cc: "Ananyev, Konstantin", "Zhao, Bing", Olivier MATZ, "dev@dpdk.org",
 "jia.he@hxt-semitech.com", "jie2.liu@hxt-semitech.com",
 "bing.zhao@hxt-semitech.com", "Richardson, Bruce"
Message-ID: <20171023100617.GA17957@jerin>
References: <20171012172311.GA8524@jerin>
 <2601191342CEEE43887BDE71AB9772585FAAB171@IRSMSX103.ger.corp.intel.com>
 <8806e2bd-c57b-03ff-a315-0a311690f1d9@163.com>
 <2601191342CEEE43887BDE71AB9772585FAAB404@IRSMSX103.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB9772585FAAB570@IRSMSX103.ger.corp.intel.com>
 <3e580cd7-2854-d855-be9c-7c4ce06e3ed5@gmail.com>
 <20171020054319.GA4249@jerin>
Subject: Re: [dpdk-dev] [PATCH] ring: guarantee ordering of cons/prod loading when doing enqueue/dequeue
List-Id: DPDK patches and discussions

-----Original Message-----
> Date: Mon, 23 Oct 2017 16:49:01 +0800
> From: Jia He
> To: Jerin Jacob
> Cc: "Ananyev, Konstantin", "Zhao, Bing", Olivier MATZ, "dev@dpdk.org",
>  "jia.he@hxt-semitech.com", "jie2.liu@hxt-semitech.com",
>  "bing.zhao@hxt-semitech.com", "Richardson, Bruce"
> Subject: Re: [dpdk-dev] [PATCH] ring: guarantee ordering of cons/prod
>  loading when doing enqueue/dequeue
>
> Hi Jerin
>
>
> On 10/20/2017 1:43 PM, Jerin Jacob Wrote:
> > -----Original Message-----
> > > > [...]
> > > dependent on each other.
> > > Thus a memory barrier is necessary.
> > Yes. The barrier is necessary.
> > In fact, upstream FreeBSD fixed this issue for arm64. The DPDK ring
> > implementation is derived from FreeBSD's buf_ring.h.
> > https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h#L166
> >
> > I think the only outstanding issue is how to reduce the performance
> > impact for arm64. I believe using acquire/release semantics instead
> > of rte_smp_rmb() will reduce the performance overhead, as in the similar
> > ring implementations below:
> > freebsd: https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h#L166
> > odp: https://github.com/Linaro/odp/blob/master/platform/linux-generic/pktio/ring.c
> >
> > Jia,
> > 1) Can you verify that the use of acquire/release semantics fixes the problem on your
> > platform? For example, the use of atomic_load_acq* in the reference code.
> > 2) If so, what is the overhead of acquire/release compared with plain rte_smp_rmb()
> > barriers? Based on that, we need to decide which path to take.
> I've tested 3 cases. The new 3rd case is to use the load_acquire barrier
> (half barrier) you mentioned
> at the above link.
> The patch looks like:
> @@ -408,8 +466,8 @@ __rte_ring_move_prod_head(struct rte_ring *r, int is_sp,
>                 /* Reset n to the initial burst count */
>                 n = max;
>
> -               *old_head = r->prod.head;
> -               const uint32_t cons_tail = r->cons.tail;
> +               *old_head = atomic_load_acq_32(&r->prod.head);
> +               const uint32_t cons_tail = atomic_load_acq_32(&r->cons.tail);
>
> @@ -516,14 +576,15 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_s
>                 /* Restore n as it may change every loop */
>                 n = max;
>
> -               *old_head = r->cons.head;
> -               const uint32_t prod_tail = r->prod.tail;
> +               *old_head = atomic_load_acq_32(&r->cons.head);
> +               const uint32_t prod_tail = atomic_load_acq_32(&r->prod.tail);
>                 /* The subtraction is done between two unsigned 32bits value
>                  * (the result is always modulo 32 bits even if we have
>                  * cons_head > prod_tail). So 'entries' is always between 0
>                  * and size(ring)-1. */
>
> The half barrier patch passed the functional test.
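For readers without atomic_load_acq_32() at hand (it is the FreeBSD name), here is a minimal sketch of the same pair of loads written with C11 <stdatomic.h>. It is not the code from the patch: struct toy_ring, its field names, toy_ring_init() and toy_free_entries() are made up for illustration and are not the rte_ring API, and the free-space formula (capacity + cons_tail - prod_head) is an assumption about how the two loaded values are used.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the ring indexes (not struct rte_ring). */
struct toy_ring {
	uint32_t capacity;          /* usable slots: size - 1 */
	_Atomic uint32_t prod_head; /* next slot a producer will claim */
	_Atomic uint32_t cons_tail; /* oldest slot not yet released by consumers */
};

static void toy_ring_init(struct toy_ring *r, uint32_t size)
{
	/* size must be a non-zero power of two */
	assert(size != 0 && (size & (size - 1)) == 0);
	r->capacity = size - 1;
	atomic_init(&r->prod_head, 0);
	atomic_init(&r->cons_tail, 0);
}

/* Free slots as seen by a producer; also returns the head it observed. */
static uint32_t toy_free_entries(struct toy_ring *r, uint32_t *old_head)
{
	/* Acquire load: the cons_tail load below cannot be reordered before it. */
	*old_head = atomic_load_explicit(&r->prod_head, memory_order_acquire);
	uint32_t cons_tail = atomic_load_explicit(&r->cons_tail, memory_order_acquire);

	/* Unsigned 32-bit arithmetic wraps, so the result stays in 0..capacity. */
	return r->capacity + cons_tail - *old_head;
}
```

With size 8, prod_head = 5 and cons_tail = 2 this gives 7 + 2 - 5 = 4 free slots, and the same count is returned when the indexes have wrapped past 2^32 (for example prod_head = 1, cons_tail = 0xFFFFFFFE).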
>
> As for the performance comparison on *arm64* (the debug patch is at
> http://dpdk.org/ml/archives/dev/2017-October/079012.html), please see the
> test results
> below:
>
> [case 1] old codes, no barrier
> ============================================
>  Performance counter stats for './test --no-huge -l 1-10':
>
>      689275.001200      task-clock (msec)         #    9.771 CPUs utilized
>               6223      context-switches          #    0.009 K/sec
>                 10      cpu-migrations            #    0.000 K/sec
>                653      page-faults               #    0.001 K/sec
>      1721190914583      cycles                    #    2.497 GHz
>      3363238266430      instructions              #    1.95  insn per cycle
>                         branches
>           27804740      branch-misses             #    0.00% of all branches
>
>       70.540618825 seconds time elapsed
>
> [case 2] full barrier with rte_smp_rmb()
> ============================================
>  Performance counter stats for './test --no-huge -l 1-10':
>
>      582557.895850      task-clock (msec)         #    9.752 CPUs utilized
>               5242      context-switches          #    0.009 K/sec
>                 10      cpu-migrations            #    0.000 K/sec
>                665      page-faults               #    0.001 K/sec
>      1454360730055      cycles                    #    2.497 GHz
>       587197839907      instructions              #    0.40  insn per cycle
>                         branches
>           27799687      branch-misses             #    0.00% of all branches
>
>       59.735582356 seconds time elapsed
>
> [case 3] half barrier with load_acquire
> ============================================
>  Performance counter stats for './test --no-huge -l 1-10':
>
>      660758.877050      task-clock (msec)         #    9.764 CPUs utilized
>               5982      context-switches          #    0.009 K/sec
>                 11      cpu-migrations            #    0.000 K/sec
>                657      page-faults               #    0.001 K/sec
>      1649875318044      cycles                    #    2.497 GHz
>       591583257765      instructions              #    0.36  insn per cycle
>                         branches
>           27994903      branch-misses             #    0.00% of all branches
>
>       67.672855107 seconds time elapsed
>
> Please see the context-switches in the perf results.
> The test results sorted by time are:
> full barrier < half barrier < no barrier
>
> AFAICT, in this case, the CPU reordering will add the possibility of
> context switching and
> increase the running time.
> Any ideas?

Regarding the performance test, it is better to use the ring perf test case
on _isolated_ cores to measure the impact on the number of enqueue/dequeue
operations.

example:
./build/app/test -c 0xff -n 4
>>ring_perf_autotest

By default, arm64+dpdk will be using the el0 counter to measure the cycles.
I think in your SoC it will be running at 50MHz or 100MHz. So, you can
follow the scheme below to get accurate cycle measurements:

See: http://dpdk.org/doc/guides/prog_guide/profile_app.html
check: 44.2.2. High-resolution cycle counter
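
To put a rough number on why the default counter is too coarse, here is a back-of-the-envelope sketch. It assumes the 2.497 GHz core clock reported by perf above and the 50 MHz or 100 MHz counter frequency guessed above; neither was measured for this purpose, and both helper names are made up for illustration.

```c
/* Hypothetical helper: CPU cycles that elapse during one tick of a slower counter. */
static double cpu_cycles_per_tick(double cpu_hz, double counter_hz)
{
	return cpu_hz / counter_hz;
}

/* Hypothetical helper: duration of one counter tick in nanoseconds. */
static double tick_ns(double counter_hz)
{
	return 1e9 / counter_hz;
}
```

At 100 MHz one tick lasts 10 ns, which is about 25 core cycles at 2.497 GHz (about 50 core cycles at 50 MHz). An enqueue or dequeue that costs fewer cycles than that cannot be resolved individually by such a counter, which is why the high-resolution cycle counter is the better choice for this measurement.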