From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR03-DB5-obe.outbound.protection.outlook.com (mail-eopbgr40085.outbound.protection.outlook.com [40.107.4.85]) by dpdk.org (Postfix) with ESMTP id 5A8701B133; Wed, 26 Sep 2018 11:59:03 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector1-arm-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=AwfasgKLG/a98Jc5Rnrun90DHHXKtDQXt0QX7OeqyXU=; b=lz+RkvNTfg7MB+eaukjEw4gfOD+iURdhaMVCuR7MGH5GjRIv8h8vKZvXkCfH6Wog2rZulLoCIPWiYuhSWPHZ4SA7p3LqHSfk//HOsw+lEHgqUXZ+teFdhPlcRqsEQ9ulWvXmicN9+QONu5Sk9QELbYgqoOO4Q3ppKOQ9Mtt9Sx0= Received: from DB7PR08MB3082.eurprd08.prod.outlook.com (52.134.110.24) by DB7PR08MB3561.eurprd08.prod.outlook.com (20.177.120.87) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1164.22; Wed, 26 Sep 2018 09:59:02 +0000 Received: from DB7PR08MB3082.eurprd08.prod.outlook.com ([fe80::6c4d:ef1:bd1c:4589]) by DB7PR08MB3082.eurprd08.prod.outlook.com ([fe80::6c4d:ef1:bd1c:4589%6]) with mapi id 15.20.1143.022; Wed, 26 Sep 2018 09:59:02 +0000 From: Justin He To: "Gavin Hu (Arm Technology China)" , "dev@dpdk.org" CC: Honnappa Nagarahalli , Steve Capper , Ola Liljedahl , "jerin.jacob@caviumnetworks.com" , nd , "stable@dpdk.org" Thread-Topic: [PATCH v3 2/3] ring: synchronize the load and store of the tail Thread-Index: AQHUTl7pRLknpW1AnkGfCembAZIwWKUCWgmQgAAFlfA= Date: Wed, 26 Sep 2018 09:59:02 +0000 Message-ID: References: <20180807031943.5331-1-gavin.hu@arm.com> <1537172244-64874-1-git-send-email-gavin.hu@arm.com> <1537172244-64874-2-git-send-email-gavin.hu@arm.com> In-Reply-To: Accept-Language: zh-CN, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: spf=none (sender IP is ) smtp.mailfrom=Justin.He@arm.com; x-originating-ip: [113.29.88.7] x-ms-publictraffictype: Email x-microsoft-exchange-diagnostics: 1; DB7PR08MB3561; 6:ouP/YcQZiJyGQGF+8cL5NuQ9LUyaufsqrRxfTNFVfEmIzbb0vTg6CbzddCVLGmit6cs732f08mQkCY98tAaO0efVy5q+fEW5194H84fhh040H3njXn0LWb8eH3xDPVzLOscEW5LiWub6jonxBUZXZ2b/c0KMFHmvV8+b8vKpzTY0MTXGXyAU0+lSDuoztrRe4AAYq4J/hOsfEbqKeh7EZ2qhtcfixcVyveT3FfFFeXl8ORtLlTz+d1DvP81WILwAjBU9XnAST0N3BS9g1bemoUqRBzJe5lPH3TKtfrNdn4h9lLpH99hKQ0Q4leNaC1Mpwo8uMJsG7DmTQEw0eOJWhw7uDuH0BNr+GpsQk6Jz5yDzHNtpnQSOwX8LB2J0YEMUeZzTjxM9fY9UePOiMO/M3xnhOmWdBzDYssaBt+a2Yd+QEtNJn4rNmT/4L4wVHt1w3Vz+h1yrAnvAWSi5sJa6RA==; 5:Peh9CjyAMS5tkpJPibYoB/qxtdTejKqx4By09ZE2+Ui2Lr+8PzCoyvar0CkSud0mnNehs4fNwh1BZG777u1NEY6SkKppOX056aIdyOJTFpn3Fb8KBREl419ndMzWeG2Cm+8RHiKw12+TSppBZYBpCnz2TIGA+2DHwAEhYPaVVBQ=; 7:iq6ykrbFC/DIgaesq6FuabMyG1ejYlAzzV3RoIn5pKlQFzU4QvapAcGch0mBq4AuxExgaf9gYfTbSGR5C/kQXilV+RKTwf8U8/FPXyDwKJISXHCNJjIDEEk9A/PlXRncBjZ49RVp35hIhaeTLXcZnaUNerhLuMaPWjzBMleckyULA7DziwbkCQaihkQhHslYJ3uu2QTkI3ixTqFb+ar1mv6WrdOJeXa2K4IIDpTc/SWK9x0/jQB8fqbJ7mO5Bver x-ms-exchange-antispam-srfa-diagnostics: SOS;SOR; x-ms-office365-filtering-correlation-id: c60e61c6-e533-4a34-c258-08d62396afe6 x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0; PCL:0; RULEID:(7020095)(4652040)(8989299)(5600074)(711020)(4618075)(2017052603328)(7153060)(7193020); SRVR:DB7PR08MB3561; x-ms-traffictypediagnostic: DB7PR08MB3561: x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:(180628864354917); x-ms-exchange-senderadcheck: 1 x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:(8211001083)(6040522)(2401047)(8121501046)(5005006)(3002001)(93006095)(93001095)(10201501046)(3231355)(944501410)(52105095)(6055026)(149066)(150057)(6041310)(20161123562045)(20161123564045)(201703131423095)(201702281528075)(20161123555045)(201703061421075)(201703061406153)(20161123558120)(20161123560045)(201708071742011)(7699051); SRVR:DB7PR08MB3561; BCL:0; PCL:0; RULEID:; SRVR:DB7PR08MB3561; x-forefront-prvs: 08076ABC99 x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(376002)(366004)(39860400002)(136003)(396003)(346002)(199004)(189003)(13464003)(40434004)(446003)(5250100002)(3846002)(25786009)(6116002)(478600001)(99286004)(2900100001)(2501003)(68736007)(11346002)(966005)(66066001)(256004)(14444005)(5024004)(81156014)(81166006)(8936002)(5660300001)(72206003)(33656002)(54906003)(2906002)(110136005)(14454004)(316002)(105586002)(34290500001)(97736004)(71200400001)(106356001)(71190400001)(102836004)(4326008)(305945005)(93886005)(74316002)(6246003)(476003)(7736002)(7696005)(76176011)(6436002)(229853002)(486006)(26005)(9686003)(8676002)(53936002)(86362001)(6306002)(53546011)(6506007)(55016002); DIR:OUT; SFP:1101; SCL:1; SRVR:DB7PR08MB3561; H:DB7PR08MB3082.eurprd08.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; A:1; MX:1; received-spf: None (protection.outlook.com: arm.com does not designate permitted sender hosts) x-microsoft-antispam-message-info: DLv2CFN3fPVLx4qm3COvOSZ7sJxFlFvMXqEdmWyGbzEgAi8Qbkm0Kp9/3c6y4rKSv6cgsN7JNsufVHQpxfo5SftQOq2HIpzPpE0K29O/p7aEJTe5OEYBoTJxCt7i4kkznnj+x9Xhx+A/ddNEyNV8J3+2O+O36ieMFbnRyeiZ2hd7pCqkQIGxXuaJtXCvrCPBXGMziIQ3hgg7xizfgDXj5eDzCShQNopufClFVAmQE75Mo6FptBCNay4dqvamSCRB6B7MLlb/1Atn8Uas9EvasCBIhIFToNi3P77HLswG57HnzCr5V/mNlv8owiMpYSyURxMG86w/C2sicpBK5mVm7mf71ztXMNmUhSrkQoyT0FI= spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="iso-2022-jp" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-Network-Message-Id: c60e61c6-e533-4a34-c258-08d62396afe6 X-MS-Exchange-CrossTenant-originalarrivaltime: 26 Sep 2018 09:59:02.1895 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB7PR08MB3561 X-Mailman-Approved-At: Thu, 27 Sep 2018 13:25:05 +0200 Subject: Re: [dpdk-dev] [PATCH v3 2/3] ring: synchronize the load and store of the tail X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Sep 2018 09:59:03 -0000 Hi Gavin > -----Original Message----- > From: Gavin Hu (Arm Technology China) > Sent: 2018=1B$BG/=1B(B9=1B$B7n=1B(B26=1B$BF|=1B(B 17:30 > To: Gavin Hu (Arm Technology China) ; dev@dpdk.org > Cc: Honnappa Nagarahalli ; Steve Capper > ; Ola Liljedahl ; > jerin.jacob@caviumnetworks.com; nd ; stable@dpdk.org; Justin > He > Subject: RE: [PATCH v3 2/3] ring: synchronize the load and store of the t= ail > > +Justin He for review. > > > -----Original Message----- > > From: Gavin Hu > > Sent: Monday, September 17, 2018 4:17 PM > > To: dev@dpdk.org > > Cc: Gavin Hu (Arm Technology China) ; Honnappa > > Nagarahalli ; Steve Capper > > ; Ola Liljedahl ; > > jerin.jacob@caviumnetworks.com; nd ; stable@dpdk.org > > Subject: [PATCH v3 2/3] ring: synchronize the load and store of the > > tail > > > > Synchronize the load-acquire of the tail and the store-release within > > update_tail, the store release ensures all the ring operations, > > enqueue or dequeue, are seen by the observers on the other side as > > soon as they see the updated tail. The load-acquire is needed here as > > the data dependency is not a reliable way for ordering as the compiler > > might break it by saving to temporary values to boost performance. > > When computing the free_entries and avail_entries, use atomic > > semantics to load the heads and tails instead. > > > > The patch was benchmarked with test/ring_perf_autotest and it > > decreases the enqueue/dequeue latency by 5% ~ 27.6% with two lcores, > > the real gains are dependent on the number of lcores, depth of the ring= , SPSC > or MPMC. > > For 1 lcore, it also improves a little, about 3 ~ 4%. > > It is a big improvement, in case of MPMC, with two lcores and ring > > size of 32, it saves latency up to (3.26-2.36)/3.26 =3D 27.6%. > > > > This patch is a bug fix, while the improvement is a bonus. In our > > analysis the improvement comes from the cacheline pre-filling after > > hoisting load- acquire from _atomic_compare_exchange_n up above. > > > > The test command: > > $sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 > > --socket-mem=3D\ > > 1024 -- -i > > > > Test result with this patch(two cores): > > SP/SC bulk enq/dequeue (size: 8): 5.86 MP/MC bulk enq/dequeue (size: > > 8): 10.15 SP/SC bulk enq/dequeue (size: > > 32): 1.94 MP/MC bulk enq/dequeue (size: 32): 2.36 > > > > In comparison of the test result without this patch: > > SP/SC bulk enq/dequeue (size: 8): 6.67 MP/MC bulk enq/dequeue (size: > > 8): 13.12 SP/SC bulk enq/dequeue (size: > > 32): 2.04 MP/MC bulk enq/dequeue (size: 32): 3.26 > > > > Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option") > > Cc: stable@dpdk.org > > > > Signed-off-by: Gavin Hu > > Reviewed-by: Honnappa Nagarahalli > > Reviewed-by: Steve Capper > > Reviewed-by: Ola Liljedahl > > --- > > lib/librte_ring/rte_ring_c11_mem.h | 20 ++++++++++++++++---- > > 1 file changed, 16 insertions(+), 4 deletions(-) > > > > diff --git a/lib/librte_ring/rte_ring_c11_mem.h > > b/lib/librte_ring/rte_ring_c11_mem.h > > index 234fea0..0eae3b3 100644 > > --- a/lib/librte_ring/rte_ring_c11_mem.h > > +++ b/lib/librte_ring/rte_ring_c11_mem.h > > @@ -68,13 +68,18 @@ __rte_ring_move_prod_head(struct rte_ring *r, > > unsigned int is_sp, > > *old_head =3D __atomic_load_n(&r->prod.head, > > __ATOMIC_ACQUIRE); > > > > -/* > > - * The subtraction is done between two unsigned 32bits > > value > > +/* load-acquire synchronize with store-release of ht->tail > > + * in update_tail. > > + */ > > +const uint32_t cons_tail =3D __atomic_load_n(&r->cons.tail, > > + > > __ATOMIC_ACQUIRE); I ever noticed that freebsd also used the double __atomic_load_n. And as we discussed it several months ago [1], seems the second load_acquire is not necessary. But as you verified, it looks good to me [1] http://mails.dpdk.org/archives/dev/2017-November/080983.html So, Reviewed-by: Jia He Cheers, Justin (Jia He) > > + > > +/* The subtraction is done between two unsigned 32bits > > value > > * (the result is always modulo 32 bits even if we have > > * *old_head > cons_tail). So 'free_entries' is always between 0 > > * and capacity (which is < size). > > */ > > -*free_entries =3D (capacity + r->cons.tail - *old_head); > > +*free_entries =3D (capacity + cons_tail - *old_head); > > > > /* check that we have enough room in ring */ > > if (unlikely(n > *free_entries)) > > @@ -132,15 +137,22 @@ __rte_ring_move_cons_head(struct rte_ring *r, > > int is_sc, > > do { > > /* Restore n as it may change every loop */ > > n =3D max; > > + > > *old_head =3D __atomic_load_n(&r->cons.head, > > __ATOMIC_ACQUIRE); > > > > +/* this load-acquire synchronize with store-release of ht->tail > > + * in update_tail. > > + */ > > +const uint32_t prod_tail =3D __atomic_load_n(&r->prod.tail, > > +__ATOMIC_ACQUIRE); > > + > > /* The subtraction is done between two unsigned 32bits value > > * (the result is always modulo 32 bits even if we have > > * cons_head > prod_tail). So 'entries' is always between 0 > > * and size(ring)-1. > > */ > > -*entries =3D (r->prod.tail - *old_head); > > +*entries =3D (prod_tail - *old_head); > > > > /* Set the actual entries for dequeue */ > > if (n > *entries) > > -- > > 2.7.4 IMPORTANT NOTICE: The contents of this email and any attachments are confid= ential and may also be privileged. If you are not the intended recipient, p= lease notify the sender immediately and do not disclose the contents to any= other person, use it for any purpose, or store or copy the information in = any medium. Thank you.