From: Honnappa Nagarahalli
To: Konstantin Ananyev, Wathsala Wathawana Vithanage, konstantin.v.ananyev@yandex.ru, thomas@monjalon.net, Ruifeng Wang
CC: dev@dpdk.org, nd, Justin He, nd
Subject: RE: [RFC] ring: further performance improvements with C11
Date: Tue, 15 Aug 2023 05:14:44 +0000
In-Reply-To: <184a99eee3bf41f1ace54273678565fb@huawei.com>

> -----Original Message-----
> From: Konstantin Ananyev
> Sent: Wednesday, August 9, 2023 1:19 PM
> To: Wathsala Wathawana Vithanage; Honnappa Nagarahalli;
> konstantin.v.ananyev@yandex.ru; thomas@monjalon.net; Ruifeng Wang
> Cc: dev@dpdk.org; nd; Justin He; nd
> Subject: RE: [RFC] ring: further performance improvements with C11
>
>
> > > > For improved performance over the current C11-based ring
> > > > implementation, the following changes were made.
> > > > (1) Replace the tail store with RELEASE semantics in
> > > > __rte_ring_update_tail with a RELEASE fence. Replace the load of the
> > > > tail with ACQUIRE semantics in __rte_ring_move_prod_head and
> > > > __rte_ring_move_cons_head with ACQUIRE fences.
> > > > (2) Remove the ACQUIRE fences between the load of the old_head and the
> > > > load of the cons_tail in __rte_ring_move_prod_head and
> > > > __rte_ring_move_cons_head. These two fences are not required for the
> > > > safety of the ring library.
> > >
> > > Hmm... with these changes, aren't we re-introducing the old bug
> > > fixed by this commit:
> >
> > The cover letter explains why this barrier does not solve what it
> > intends to solve and why it should not matter.
> > https://mails.dpdk.org/archives/dev/2023-June/270874.html
>
> Ok, let's consider a case similar to your (i), but where r->prod.head has
> moved by a distance greater than r->capacity.
> To be more specific, let's start with the same initial state:
> capacity = 32
> r->cons.tail = 5
> r->cons.head = 5
> r->prod.head = 10
> r->prod.tail = 10
>
> time 0, thread1:
> /* re-ordered load */
> cons_tail = r->cons.tail; // = 5
>
> Now, thread1 was stalled for a bit; meanwhile there were a few

What exactly do you mean by 'stalled'? If you mean that the thread is preempted, the ring algorithm is not designed for that; the restrictions are described in [1]. However, we do need to handle the re-ordering case.

[1] https://doc.dpdk.org/guides/prog_guide/env_abstraction_layer.html#known-issues

> enqueues/dequeues done by other threads, so the current state of the ring is:
> r->cons.tail = 105
> r->cons.head = 105
> r->prod.head = 110
> r->prod.tail = 110
>
> time 1, thread1:
> old_head = r->prod.head; // = 110
> *free_entries = (capacity + cons_tail - old_head);
> // = (uint32_t)(32 + 5 - 110) == (uint32_t)-73 == 4294967223
>
> So the free_entries value is way too big, and the comparison:
>
> if (unlikely(n > *free_entries))
>
> might give the wrong result.
>
> So I still think we do need some sort of _read_fence_ between these two loads.
> As I said before, this looks exactly like the old bug fixed a while ago:
> http://git.dpdk.org/dpdk/commit/?id=9bc2cbb007c0a3335c5582357ae9f6d37ea0b654
> but now re-introduced for the C11 case.

Agree that the re-ordering case should be handled. I am thinking that a check for (*free_entries > capacity), restarting the loop when it is true, might suffice (without the barrier)?
>
> > >
> > > commit 9bc2cbb007c0a3335c5582357ae9f6d37ea0b654
> > > Author: Jia He
> > > Date:   Fri Nov 10 03:30:42 2017 +0000
> > >
> > >     ring: guarantee load/load order in enqueue and dequeue
> > >
> > >     We watched a rte panic of mbuf_autotest in our qualcomm arm64 server
> > >     (Amberwing).
> > >
> > >     Root cause:
> > >     In __rte_ring_move_cons_head()
> > >     ...
> > >             do {
> > >                     /* Restore n as it may change every loop */
> > >                     n = max;
> > >
> > >                     *old_head = r->cons.head;                //1st load
> > >                     const uint32_t prod_tail = r->prod.tail; //2nd load
> > >
> > >     In weak memory order architectures (powerpc,arm), the 2nd load might be
> > >     reodered before the 1st load, that makes *entries is bigger than we wanted.
> > >     This nasty reording messed enque/deque up.
> > >     ....
> > >
> > > >
> > > > Signed-off-by: Wathsala Vithanage
> > > > Reviewed-by: Honnappa Nagarahalli
> > > > Reviewed-by: Ruifeng Wang
> > > > ---
> > > >  .mailmap                    |  1 +
> > > >  lib/ring/rte_ring_c11_pvt.h | 35 ++++++++++++++++++++---------------
> > > >  2 files changed, 21 insertions(+), 15 deletions(-)
> > > >
> > > > diff --git a/.mailmap b/.mailmap
> > > > index 4018f0fc47..367115d134 100644
> > > > --- a/.mailmap
> > > > +++ b/.mailmap
> > > > @@ -1430,6 +1430,7 @@ Walter Heymans
> > > >  Wang Sheng-Hui
> > > >  Wangyu (Eric)
> > > >  Waterman Cao
> > > > +Wathsala Vithanage
> > > >  Weichun Chen
> > > >  Wei Dai
> > > >  Weifeng Li
> > > > diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> > > > index f895950df4..63fe58ce9e 100644
> > > > --- a/lib/ring/rte_ring_c11_pvt.h
> > > > +++ b/lib/ring/rte_ring_c11_pvt.h
> > > > @@ -16,6 +16,13 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
> > > >  		uint32_t new_val, uint32_t single, uint32_t enqueue)
> > > >  {
> > > >  	RTE_SET_USED(enqueue);
> > > > +	/*
> > > > +	 * Updating of ht->tail cannot happen before elements are added to or
> > > > +	 * removed from the ring, as it could result in data races between
> > > > +	 * producer and consumer threads. Therefore we need a release
> > > > +	 * barrier here.
> > > > +	 */
> > > > +	rte_atomic_thread_fence(__ATOMIC_RELEASE);
> > > >
> > > >  	/*
> > > >  	 * If there are other enqueues/dequeues in progress that preceded us,
> > > > @@ -24,7 +31,7 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
> > > >  	if (!single)
> > > >  		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
> > > >
> > > > -	__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
> > > > +	__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELAXED);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -66,14 +73,8 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
> > > >  		/* Reset n to the initial burst count */
> > > >  		n = max;
> > > >
> > > > -		/* Ensure the head is read before tail */
> > > > -		__atomic_thread_fence(__ATOMIC_ACQUIRE);
> > > > -
> > > > -		/* load-acquire synchronize with store-release of ht->tail
> > > > -		 * in update_tail.
> > > > -		 */
> > > >  		cons_tail = __atomic_load_n(&r->cons.tail,
> > > > -					__ATOMIC_ACQUIRE);
> > > > +					__ATOMIC_RELAXED);
> > > >
> > > >  		/* The subtraction is done between two unsigned 32bits value
> > > >  		 * (the result is always modulo 32 bits even if we have
> > > > @@ -100,6 +101,11 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
> > > >  							0, __ATOMIC_RELAXED,
> > > >  							__ATOMIC_RELAXED);
> > > >  	} while (unlikely(success == 0));
> > > > +	/*
> > > > +	 * Ensure that updates to the ring doesn't rise above
> > > > +	 * load of the new_head in SP and MP cases.
> > > > +	 */
> > > > +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> > > >  	return n;
> > > >  }
> > > >
> > > > @@ -142,14 +148,8 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
> > > >  		/* Restore n as it may change every loop */
> > > >  		n = max;
> > > >
> > > > -		/* Ensure the head is read before tail */
> > > > -		__atomic_thread_fence(__ATOMIC_ACQUIRE);
> > > > -
> > > > -		/* this load-acquire synchronize with store-release of ht->tail
> > > > -		 * in update_tail.
> > > > -		 */
> > > >  		prod_tail = __atomic_load_n(&r->prod.tail,
> > > > -					__ATOMIC_ACQUIRE);
> > > > +					__ATOMIC_RELAXED);
> > > >
> > > >  		/* The subtraction is done between two unsigned 32bits value
> > > >  		 * (the result is always modulo 32 bits even if we have
> > > > @@ -175,6 +175,11 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
> > > >  							0, __ATOMIC_RELAXED,
> > > >  							__ATOMIC_RELAXED);
> > > >  	} while (unlikely(success == 0));
> > > > +	/*
> > > > +	 * Ensure that updates to the ring doesn't rise above
> > > > +	 * load of the new_head in SP and MP cases.
> > > > +	 */
> > > > +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> > > >  	return n;
> > > >  }
> > > >
> > > > --
> > > > 2.25.1
> > > >