From: Konstantin Ananyev
To: "dev@dpdk.org"
CC: "honnappa.nagarahalli@arm.com", Jerin Jacob, Wathsala Vithanage, "drc@linux.ibm.com"
Subject: rte_ring move head question for machines with relaxed MO (arm/ppc)
Date: Tue, 8 Oct 2024 12:58:11 +0000
Message-ID: <8139916ad4814629b8804525bd785d58@huawei.com>
List-Id: DPDK patches and discussions

Hi lads,

Looking at the rte_ring move_head functions, I noticed that each of them
uses a slightly different approach to guarantee the desired order of
memory accesses:

1. rte_ring_generic_pvt.h:
=====================

pseudo-c-code                        // related armv8 instructions
--------------------                 --------------------------------------
head.load()                          // ldr [head]
rte_smp_rmb()                        // dmb ishld
opposite_tail.load()                 // ldr [opposite_tail]
...
rte_atomic32_cmpset(head, ...)       // ldrex[head]; ... stlex[head]

2. rte_ring_c11_pvt.h
=====================

pseudo-c-code                        // related armv8 instructions
--------------------                 --------------------------------------
head.atomic_load(relaxed)            // ldr[head]
atomic_thread_fence(acquire)         // dmb ish
opposite_tail.atomic_load(acquire)   // lda[opposite_tail]
...
head.atomic_cas(..., relaxed)        // ldrex[head]; ... strex[head]

3. rte_ring_hts_elem_pvt.h
==========================

pseudo-c-code                        // related armv8 instructions
--------------------                 --------------------------------------
head.atomic_load(acquire)            // lda [head]
opposite_tail.load()                 // ldr [opposite_tail]
...
head.atomic_cas(..., acquire)        // ldaex[head]; ... strex[head]

The questions that arose from these observations:
a) Are all 3 approaches equivalent in terms of functionality?
b) If yes, is there any difference in terms of performance between
   "ldr; dmb; ldr;" and "lda; ldr;"?
c) Comparing 1) and 2) above, the combination
   ldr [head]; dmb; lda [opposite_tail];
   looks like overkill to me. Wouldn't just
   ldr [head]; dmb; ldr [opposite_tail];
   be sufficient here?
   I.e., for reading the tail value we don't need to use load(acquire).
Or am I missing something obvious here?
Thanks
Konstantin

For convenience, I created a godbolt page with all these variants:
https://godbolt.org/z/Yjj73b8xa
#1 - __rte_ring_headtail_move_head()
#2 - __rte_ring_headtail_move_head_c11_v1
#3 - __rte_ring_headtail_move_head_c11_v2
#2 with c) - __rte_ring_headtail_move_head_c11_v3
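
In case the link goes stale, below is a rough C11 sketch of variants #2, #2 with c), and #3, so the memory orders in question can be compared side by side. It is not the DPDK code: struct ht_sketch, free_slots() and the move_head_*() names are hypothetical, only the producer-side arithmetic is shown, and the HTS variant is simplified (the real one, as far as I can tell, packs head and tail into one 64-bit word and waits for head == tail, which is omitted here). Variant #1 is left out because rte_smp_rmb() has no exact plain-C11 equivalent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one side (prod or cons) of a ring. */
struct ht_sketch {
    _Atomic uint32_t head;
    _Atomic uint32_t tail;
};

/* Free slots as seen by a producer; unsigned wrap-around is intentional. */
static uint32_t
free_slots(uint32_t capacity, uint32_t opposite_tail, uint32_t head)
{
    assert(capacity != 0);
    return capacity + opposite_tail - head;
}

/* #2: relaxed head load, acquire fence, acquire tail load, relaxed CAS. */
static uint32_t
move_head_c11_v1(struct ht_sketch *own, struct ht_sketch *opp,
                 uint32_t capacity, uint32_t n, uint32_t *old_head)
{
    uint32_t want = n, tail, avail;

    *old_head = atomic_load_explicit(&own->head, memory_order_relaxed);
    do {
        atomic_thread_fence(memory_order_acquire);
        tail = atomic_load_explicit(&opp->tail, memory_order_acquire);
        avail = free_slots(capacity, tail, *old_head);
        n = want < avail ? want : avail;
        if (n == 0)
            return 0;
        /* On failure the CAS refreshes *old_head and the loop retries. */
    } while (!atomic_compare_exchange_strong_explicit(&own->head, old_head,
                 *old_head + n, memory_order_relaxed, memory_order_relaxed));
    return n;
}

/* #2 with c): same as above, but the tail load after the fence is relaxed. */
static uint32_t
move_head_c11_v3(struct ht_sketch *own, struct ht_sketch *opp,
                 uint32_t capacity, uint32_t n, uint32_t *old_head)
{
    uint32_t want = n, tail, avail;

    *old_head = atomic_load_explicit(&own->head, memory_order_relaxed);
    do {
        atomic_thread_fence(memory_order_acquire);
        tail = atomic_load_explicit(&opp->tail, memory_order_relaxed);
        avail = free_slots(capacity, tail, *old_head);
        n = want < avail ? want : avail;
        if (n == 0)
            return 0;
    } while (!atomic_compare_exchange_strong_explicit(&own->head, old_head,
                 *old_head + n, memory_order_relaxed, memory_order_relaxed));
    return n;
}

/* #3, simplified: acquire head load, no fence, relaxed tail load, acquire CAS. */
static uint32_t
move_head_hts_like(struct ht_sketch *own, struct ht_sketch *opp,
                   uint32_t capacity, uint32_t n, uint32_t *old_head)
{
    uint32_t want = n, tail, avail;

    *old_head = atomic_load_explicit(&own->head, memory_order_acquire);
    do {
        tail = atomic_load_explicit(&opp->tail, memory_order_relaxed);
        avail = free_slots(capacity, tail, *old_head);
        n = want < avail ? want : avail;
        if (n == 0)
            return 0;
    } while (!atomic_compare_exchange_strong_explicit(&own->head, old_head,
                 *old_head + n, memory_order_acquire, memory_order_acquire));
    return n;
}
```

A single-threaded call sequence only exercises the index arithmetic; it says nothing about ordering. To compare the variants for question b), the generated aarch64 or ppc64 assembly is the thing to look at.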