From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Gavin Hu (Arm Technology China)"
To: "Gavin Hu (Arm Technology China)", "dev@dpdk.org", "jerin.jacob@caviumnetworks.com"
CC: Honnappa Nagarahalli, "stable@dpdk.org", Ola Liljedahl
Date: Wed, 17 Oct 2018 06:35:31 +0000
Subject: Re: [dpdk-dev] [PATCH 1/2] ring: synchronize the load and store of the tail
References: <1537172244-64874-2-git-send-email-gavin.hu@arm.com> <1539757786-226178-1-git-send-email-gavin.hu@arm.com>
In-Reply-To: <1539757786-226178-1-git-send-email-gavin.hu@arm.com>
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Precedence: list
List-Id: DPDK patches and discussions
X-List-Received-Date: Wed, 17 Oct 2018 06:35:36 -0000

Hi Jerin,

As the first patch of the 3-patch set has not been concluded, I am submitting
this 2-patch series to unblock the merge.

Best Regards,
Gavin

> -----Original Message-----
> From: Gavin Hu
> Sent: Wednesday, October 17, 2018 2:30 PM
> To: dev@dpdk.org
> Cc: Gavin Hu (Arm Technology China); Honnappa Nagarahalli;
> jerin.jacob@caviumnetworks.com; stable@dpdk.org
> Subject: [PATCH 1/2] ring: synchronize the load and store of the tail
>
> Synchronize the load-acquire of the tail with the store-release within
> update_tail. The store-release ensures that all ring operations, enqueue
> or dequeue, are seen by observers on the other side as soon as they see
> the updated tail. The load-acquire is needed here because a data
> dependency is not a reliable way to enforce ordering: the compiler might
> break it by saving to temporary values to boost performance.
> When computing free_entries and avail_entries, use atomic semantics to
> load the heads and tails instead.
>
> The patch was benchmarked with test/ring_perf_autotest and it decreases
> the enqueue/dequeue latency by 5% ~ 27.6% with two lcores. The real gains
> depend on the number of lcores, the depth of the ring, and whether it is
> SPSC or MPMC.
> For 1 lcore, it also improves a little, about 3 ~ 4%.
> It is a big improvement in the case of MPMC: with two lcores and a ring
> size of 32, it saves latency up to (3.26-2.36)/3.26 = 27.6%.
>
> This patch is a bug fix, while the improvement is a bonus. In our
> analysis, the improvement comes from the cacheline pre-filling after
> hoisting the load-acquire from __atomic_compare_exchange_n up above.
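
To illustrate the acquire/release pairing described in the commit message, here is a minimal sketch, assuming a simplified single-producer/single-consumer ring with free-running 32-bit indices. The `demo_ring` struct and the `demo_enqueue`/`demo_dequeue` helpers are hypothetical names made up for this example and are not the rte_ring API; only the `__atomic_load_n`/`__atomic_store_n` GCC builtins are the same ones the patch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ring for illustration only (not struct rte_ring). */
#define DEMO_RING_SIZE 8u                 /* must be a power of two */
#define DEMO_RING_MASK (DEMO_RING_SIZE - 1u)

struct demo_ring {
	uint32_t prod_tail;               /* written only by the producer */
	uint32_t cons_tail;               /* written only by the consumer */
	int slots[DEMO_RING_SIZE];
};

/* Producer side: returns 0 on success, -1 if the ring is full. */
static int demo_enqueue(struct demo_ring *r, int value)
{
	/* The producer owns prod_tail, so a relaxed load is enough here. */
	uint32_t head = __atomic_load_n(&r->prod_tail, __ATOMIC_RELAXED);

	/* Load-acquire pairs with the consumer's store-release of cons_tail:
	 * once we see a new cons_tail, the consumer's reads of the freed
	 * slots are complete and the slots may be overwritten.
	 */
	uint32_t cons_tail = __atomic_load_n(&r->cons_tail, __ATOMIC_ACQUIRE);

	if (head - cons_tail == DEMO_RING_SIZE)
		return -1;

	r->slots[head & DEMO_RING_MASK] = value;

	/* Store-release publishes the slot write before the new tail. */
	__atomic_store_n(&r->prod_tail, head + 1, __ATOMIC_RELEASE);
	return 0;
}

/* Consumer side: returns 0 on success, -1 if the ring is empty. */
static int demo_dequeue(struct demo_ring *r, int *value)
{
	assert(value != NULL);

	uint32_t head = __atomic_load_n(&r->cons_tail, __ATOMIC_RELAXED);

	/* Load-acquire pairs with the producer's store-release of prod_tail:
	 * once we see a new prod_tail, the slot contents are visible too.
	 */
	uint32_t prod_tail = __atomic_load_n(&r->prod_tail, __ATOMIC_ACQUIRE);

	if (prod_tail - head == 0)
		return -1;

	*value = r->slots[head & DEMO_RING_MASK];

	/* Store-release tells the producer the slot is free again. */
	__atomic_store_n(&r->cons_tail, head + 1, __ATOMIC_RELEASE);
	return 0;
}
```

In this sketch the tail is read once into a local with explicit acquire semantics and the local is then used for the arithmetic, which is the same shape as the `cons_tail` and `prod_tail` locals introduced by the patch. The multi-producer/multi-consumer case in rte_ring additionally needs the compare-exchange on the head, which is left out here.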
>
> The test command:
> $sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 --socket-mem=1024 -- -i
>
> Test result with this patch (two cores):
> SP/SC bulk enq/dequeue (size: 8): 5.86
> MP/MC bulk enq/dequeue (size: 8): 10.15
> SP/SC bulk enq/dequeue (size: 32): 1.94
> MP/MC bulk enq/dequeue (size: 32): 2.36
>
> For comparison, the test result without this patch:
> SP/SC bulk enq/dequeue (size: 8): 6.67
> MP/MC bulk enq/dequeue (size: 8): 13.12
> SP/SC bulk enq/dequeue (size: 32): 2.04
> MP/MC bulk enq/dequeue (size: 32): 3.26
>
> Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
> Cc: stable@dpdk.org
>
> Signed-off-by: Gavin Hu
> Reviewed-by: Honnappa Nagarahalli
> Reviewed-by: Steve Capper
> Reviewed-by: Ola Liljedahl
> Reviewed-by: Jia He
> Acked-by: Jerin Jacob
> Tested-by: Jerin Jacob
> ---
>  lib/librte_ring/rte_ring_c11_mem.h | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
> index 94df3c4..4851763 100644
> --- a/lib/librte_ring/rte_ring_c11_mem.h
> +++ b/lib/librte_ring/rte_ring_c11_mem.h
> @@ -67,13 +67,18 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
>  		*old_head = __atomic_load_n(&r->prod.head,
>  					__ATOMIC_ACQUIRE);
>
> -		/*
> -		 * The subtraction is done between two unsigned 32bits value
> +		/* load-acquire synchronize with store-release of ht->tail
> +		 * in update_tail.
> +		 */
> +		const uint32_t cons_tail = __atomic_load_n(&r->cons.tail,
> +							__ATOMIC_ACQUIRE);
> +
> +		/* The subtraction is done between two unsigned 32bits value
>  		 * (the result is always modulo 32 bits even if we have
>  		 * *old_head > cons_tail). So 'free_entries' is always between 0
>  		 * and capacity (which is < size).
>  		 */
> -		*free_entries = (capacity + r->cons.tail - *old_head);
> +		*free_entries = (capacity + cons_tail - *old_head);
>
>  		/* check that we have enough room in ring */
>  		if (unlikely(n > *free_entries))
> @@ -131,15 +136,22 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
>  	do {
>  		/* Restore n as it may change every loop */
>  		n = max;
> +
>  		*old_head = __atomic_load_n(&r->cons.head,
>  					__ATOMIC_ACQUIRE);
>
> +		/* this load-acquire synchronize with store-release of ht->tail
> +		 * in update_tail.
> +		 */
> +		const uint32_t prod_tail = __atomic_load_n(&r->prod.tail,
> +					__ATOMIC_ACQUIRE);
> +
>  		/* The subtraction is done between two unsigned 32bits value
>  		 * (the result is always modulo 32 bits even if we have
>  		 * cons_head > prod_tail). So 'entries' is always between 0
>  		 * and size(ring)-1.
>  		 */
> -		*entries = (r->prod.tail - *old_head);
> +		*entries = (prod_tail - *old_head);
>
>  		/* Set the actual entries for dequeue */
>  		if (n > *entries)
> --
> 2.7.4
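
The comments in the patch rely on unsigned 32-bit wrap-around to keep 'free_entries' and 'entries' in range even after the indices overflow. The following is a small sketch of that arithmetic only, assuming free-running `uint32_t` indices; `demo_free_entries` and `demo_avail_entries` are hypothetical helper names for illustration and do not exist in rte_ring.

```c
#include <assert.h>
#include <stdint.h>

/* Free slots seen by a producer: capacity minus the entries in use
 * (prod_head - cons_tail), computed modulo 2^32.
 */
static uint32_t demo_free_entries(uint32_t capacity, uint32_t cons_tail,
				  uint32_t prod_head)
{
	uint32_t used = prod_head - cons_tail;

	/* Precondition assumed by this sketch: indices are consistent. */
	assert(used <= capacity);
	return capacity + cons_tail - prod_head;
}

/* Entries available to a consumer, computed modulo 2^32. */
static uint32_t demo_avail_entries(uint32_t prod_tail, uint32_t cons_head)
{
	return prod_tail - cons_head;
}
```

For example, with a capacity of 31, a consumer tail of 0xFFFFFFF0 and a producer head that has already wrapped to 5, there are 21 entries in use and 10 free, even though the producer index is numerically smaller than the consumer index.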