From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Gavin Hu (Arm Technology China)"
To: "Gavin Hu (Arm Technology China)", "dev@dpdk.org"
CC: Honnappa Nagarahalli, Steve Capper, Ola Liljedahl, "jerin.jacob@caviumnetworks.com", nd, "stable@dpdk.org", Justin He
Date: Wed, 26 Sep 2018 09:29:36 +0000
References: <20180807031943.5331-1-gavin.hu@arm.com> <1537172244-64874-1-git-send-email-gavin.hu@arm.com> <1537172244-64874-2-git-send-email-gavin.hu@arm.com>
In-Reply-To: <1537172244-64874-2-git-send-email-gavin.hu@arm.com>
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH v3 2/3] ring: synchronize the load and store of the tail
List-Id: DPDK patches and discussions
X-List-Received-Date: Wed, 26 Sep 2018 09:29:39 -0000

+Justin He for review.

> -----Original Message-----
> From: Gavin Hu
> Sent: Monday, September 17, 2018 4:17 PM
> To: dev@dpdk.org
> Cc: Gavin Hu (Arm Technology China); Honnappa Nagarahalli; Steve Capper;
> Ola Liljedahl; jerin.jacob@caviumnetworks.com; nd; stable@dpdk.org
> Subject: [PATCH v3 2/3] ring: synchronize the load and store of the tail
>
> Synchronize the load-acquire of the tail with the store-release within
> update_tail. The store-release ensures that all the ring operations,
> enqueue or dequeue, are seen by the observers on the other side as soon
> as they see the updated tail. The load-acquire is needed here because the
> data dependency is not a reliable way of ordering: the compiler might
> break it by saving to temporary values to boost performance.
> When computing the free_entries and avail_entries, use atomic semantics
> to load the heads and tails instead.
>
> The patch was benchmarked with test/ring_perf_autotest and it decreases
> the enqueue/dequeue latency by 5% ~ 27.6% with two lcores; the real gains
> depend on the number of lcores, the depth of the ring, and SPSC or MPMC.
> For 1 lcore, it also improves a little, about 3 ~ 4%.
> It is a big improvement in the case of MPMC: with two lcores and a ring
> size of 32, it saves latency up to (3.26-2.36)/3.26 = 27.6%.
>
> This patch is a bug fix, while the improvement is a bonus. In our analysis
> the improvement comes from the cacheline pre-filling after hoisting the
> load-acquire from __atomic_compare_exchange_n up above.
>
> The test command:
> $sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 --socket-mem=\
> 1024 -- -i
>
> Test result with this patch (two cores):
> SP/SC bulk enq/dequeue (size: 8): 5.86
> MP/MC bulk enq/dequeue (size: 8): 10.15
> SP/SC bulk enq/dequeue (size: 32): 1.94
> MP/MC bulk enq/dequeue (size: 32): 2.36
>
> For comparison, the test result without this patch:
> SP/SC bulk enq/dequeue (size: 8): 6.67
> MP/MC bulk enq/dequeue (size: 8): 13.12
> SP/SC bulk enq/dequeue (size: 32): 2.04
> MP/MC bulk enq/dequeue (size: 32): 3.26
>
> Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
> Cc: stable@dpdk.org
>
> Signed-off-by: Gavin Hu
> Reviewed-by: Honnappa Nagarahalli
> Reviewed-by: Steve Capper
> Reviewed-by: Ola Liljedahl
> ---
>  lib/librte_ring/rte_ring_c11_mem.h | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
> index 234fea0..0eae3b3 100644
> --- a/lib/librte_ring/rte_ring_c11_mem.h
> +++ b/lib/librte_ring/rte_ring_c11_mem.h
> @@ -68,13 +68,18 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
>  		*old_head = __atomic_load_n(&r->prod.head,
>  					__ATOMIC_ACQUIRE);
>
> -		/*
> -		 * The subtraction is done between two unsigned 32bits value
> +		/* load-acquire synchronize with store-release of ht->tail
> +		 * in update_tail.
> +		 */
> +		const uint32_t cons_tail = __atomic_load_n(&r->cons.tail,
> +							__ATOMIC_ACQUIRE);
> +
> +		/* The subtraction is done between two unsigned 32bits value
>  		 * (the result is always modulo 32 bits even if we have
>  		 * *old_head > cons_tail). So 'free_entries' is always between 0
>  		 * and capacity (which is < size).
>  		 */
> -		*free_entries = (capacity + r->cons.tail - *old_head);
> +		*free_entries = (capacity + cons_tail - *old_head);
>
>  		/* check that we have enough room in ring */
>  		if (unlikely(n > *free_entries))
> @@ -132,15 +137,22 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
>  	do {
>  		/* Restore n as it may change every loop */
>  		n = max;
> +
>  		*old_head = __atomic_load_n(&r->cons.head,
>  					__ATOMIC_ACQUIRE);
>
> +		/* this load-acquire synchronize with store-release of ht->tail
> +		 * in update_tail.
> +		 */
> +		const uint32_t prod_tail = __atomic_load_n(&r->prod.tail,
> +							__ATOMIC_ACQUIRE);
> +
>  		/* The subtraction is done between two unsigned 32bits value
>  		 * (the result is always modulo 32 bits even if we have
>  		 * cons_head > prod_tail). So 'entries' is always between 0
>  		 * and size(ring)-1.
>  		 */
> -		*entries = (r->prod.tail - *old_head);
> +		*entries = (prod_tail - *old_head);
>
>  		/* Set the actual entries for dequeue */
>  		if (n > *entries)
> --
> 2.7.4
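
For reviewers who want the pairing in isolation, here is a minimal single-producer/single-consumer sketch of the idea in the commit message. It is not the rte_ring code: demo_ring, demo_enqueue, demo_dequeue and DEMO_RING_SIZE are hypothetical names, it has no head/tail split or compare-exchange loop, and unlike rte_ring it uses every slot. It only assumes the GCC/Clang __atomic builtins that the patch itself uses. Each side loads the other side's tail with acquire semantics, which pairs with the store-release that publishes that tail, and the entry counts rely on unsigned 32-bit wraparound just as the comments in the patch describe.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8u                     /* hypothetical size */
#define DEMO_RING_MASK (DEMO_RING_SIZE - 1u)

static_assert((DEMO_RING_SIZE & DEMO_RING_MASK) == 0,
	      "ring size must be a power of two");

/* Hypothetical SPSC ring; indices are free-running 32-bit counters. */
struct demo_ring {
	uint32_t prod_tail;   /* written only by the producer */
	uint32_t cons_tail;   /* written only by the consumer */
	uint32_t slots[DEMO_RING_SIZE];
};

/* Returns 1 on success, 0 if the ring is full. */
static int demo_enqueue(struct demo_ring *r, uint32_t value)
{
	/* Only the producer writes prod_tail, so relaxed is enough here. */
	uint32_t head = __atomic_load_n(&r->prod_tail, __ATOMIC_RELAXED);

	/* Load-acquire pairs with the consumer's store-release of
	 * cons_tail: the consumer has finished reading every slot it
	 * released before we overwrite it.
	 */
	uint32_t cons_tail = __atomic_load_n(&r->cons_tail, __ATOMIC_ACQUIRE);

	/* Unsigned arithmetic is modulo 2^32, so this stays correct
	 * even after the counters wrap.
	 */
	uint32_t free_entries = DEMO_RING_SIZE + cons_tail - head;

	if (free_entries == 0)
		return 0;

	r->slots[head & DEMO_RING_MASK] = value;

	/* Store-release publishes the slot write before the new tail. */
	__atomic_store_n(&r->prod_tail, head + 1, __ATOMIC_RELEASE);
	return 1;
}

/* Returns 1 on success, 0 if the ring is empty. */
static int demo_dequeue(struct demo_ring *r, uint32_t *value)
{
	uint32_t head = __atomic_load_n(&r->cons_tail, __ATOMIC_RELAXED);

	/* Load-acquire pairs with the producer's store-release of
	 * prod_tail: the slot contents are visible once we see the tail.
	 */
	uint32_t prod_tail = __atomic_load_n(&r->prod_tail, __ATOMIC_ACQUIRE);
	uint32_t entries = prod_tail - head;

	if (entries == 0)
		return 0;

	*value = r->slots[head & DEMO_RING_MASK];

	/* Store-release hands the slot back to the producer. */
	__atomic_store_n(&r->cons_tail, head + 1, __ATOMIC_RELEASE);
	return 1;
}
```

Starting both counters close to UINT32_MAX is an easy way to exercise the wraparound case in a unit test.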