From: Konstantin Ananyev
To: Konstantin Ananyev, "Wathsala Wathawana Vithanage", "dev@dpdk.org"
CC: Honnappa Nagarahalli, "jerinj@marvell.com", "drc@linux.ibm.com", nd
Subject: RE: rte_ring move head question for machines with relaxed MO (arm/ppc)
Date: Tue, 8 Oct 2024 15:56:30 +0000
References: <8139916ad4814629b8804525bd785d58@huawei.com> <0badc1b8ea524bf3b69d0b7b316bdc8f@huawei.com>
In-Reply-To: <0badc1b8ea524bf3b69d0b7b316bdc8f@huawei.com>
List-Id: DPDK patches and discussions

> -----Original Message-----
> From: Konstantin Ananyev
> Sent: Tuesday, October 8, 2024 4:46 PM
> To: Wathsala Wathawana Vithanage; dev@dpdk.org
> Cc: Honnappa Nagarahalli; jerinj@marvell.com; drc@linux.ibm.com; nd
> Subject: RE: rte_ring move head question for machines with relaxed MO (arm/ppc)
>
>
> > > 1. rte_ring_generic_pvt.h:
> > > =====================
> > >
> > > pseudo-c-code                        // related armv8 instructions
> > > --------------------                 --------------------------------------
> > > head.load()                          // ldr [head]
> > > rte_smp_rmb()                        // dmb ishld
> > > opposite_tail.load()                 // ldr [opposite_tail]
> > > ...
> > > rte_atomic32_cmpset(head, ...)       // ldrex[head];... stlex[head]
> > >
> > >
> > > 2. rte_ring_c11_pvt.h
> > > =====================
> > >
> > > pseudo-c-code                        // related armv8 instructions
> > > --------------------                 --------------------------------------
> > > head.atomic_load(relaxed)            // ldr[head]
> > > atomic_thread_fence(acquire)         // dmb ish
> > > opposite_tail.atomic_load(acquire)   // lda[opposite_tail]
> > > ...
> > > head.atomic_cas(..., relaxed)        // ldrex[head]; ... strex[head]
> > >
> > >
> > > 3. rte_ring_hts_elem_pvt.h
> > > ==========================
> > >
> > > pseudo-c-code                        // related armv8 instructions
> > > --------------------                 --------------------------------------
> > > head.atomic_load(acquire)            // lda [head]
> > > opposite_tail.load()                 // ldr [opposite_tail]
> > > ...
> > > head.atomic_cas(..., acquire)        // ldaex[head]; ... strex[head]
> > >
> > > The questions that arose from these observations:
> > > a) are all 3 approaches equivalent in terms of functionality?
> > Different, lda (Load with acquire semantics) and ldr (load) are different.
>
> I understand that, my question was:
> lda [head]; ldr [tail]
> vs
> ldr [head]; dmb ishld; ldr [tail];
>
> Is there any difference in terms of functionality (memory ops ordering/observability)?

To be more precise:

lda [head]; ldr [tail]
vs
ldr [head]; dmb ishld; ldr [tail];
vs
ldr [head]; dmb ishld; lda [tail];

what would be the difference between these 3 cases?

>
> > >
> > > b) if yes, is there any difference in terms of performance between:
> > > "ldr; dmb; ldr;" vs "lda; ldr;"
> > > ?
> > dmb is a full barrier, performance is poor.
> > I would assume (haven't measured) ldr; dmb; ldr to be less performant than lda;ldr;
>
> Throughout this mail I am talking about 'dmb ishld', sorry for not being clear upfront.
>
> > >
> > > c) Comparing 1) and 2) above, the combination of
> > > ldr [head]; dmb; lda [opposite_tail]:
> > > looks like an overkill to me. Wouldn't just:
> > > ldr [head]; dmb; ldr[opposite_tail];
> > > be sufficient here?
> > lda [opposite_tail]: synchronizes with stlr in tail update that happens after array update.
> > So, it cannot be changed to ldr.
>
> Can you explain to me a bit more here why it is not possible?
> From here:
> https://developer.arm.com/documentation/dui0802/b/A32-and-T32-Instructions/LDA-and-STL
> "There is no requirement that a load-acquire and store-release be paired."
> Do I misinterpret this statement somehow?
>
> > lda can be replaced with ldapr (LDA with release consistency - processor consistency)
> > which is more performant as lda is allowed to rise above stlr. Can be done with -mcpu=+rcpc
> >
> > --wathsala
> >
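
To make the three cases above concrete at the source level, here is a minimal C11 sketch. It is not code from any of the rte_ring headers: struct ring_sketch, ring_sketch_init() and the free_entries_case*() helpers are made-up names for illustration, and a C11 acquire fence is used as a stand-in for rte_smp_rmb(), which is an assumption. The instruction sequences in the comments are the ones discussed in this mail; what a particular compiler emits may differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for one side of a ring; not the DPDK layout. */
struct ring_sketch {
    _Atomic uint32_t head;          /* our own head */
    _Atomic uint32_t opposite_tail; /* tail owned by the other side */
    uint32_t capacity;
};

static inline void ring_sketch_init(struct ring_sketch *r, uint32_t capacity,
                                    uint32_t head, uint32_t opposite_tail)
{
    assert(capacity > 0);
    r->capacity = capacity;
    atomic_init(&r->head, head);
    atomic_init(&r->opposite_tail, opposite_tail);
}

/* Case 1: acquire load of head, relaxed load of tail.
 * Discussed above as: lda [head]; ldr [tail] */
static inline uint32_t free_entries_case1(struct ring_sketch *r)
{
    uint32_t h = atomic_load_explicit(&r->head, memory_order_acquire);
    uint32_t t = atomic_load_explicit(&r->opposite_tail, memory_order_relaxed);
    return r->capacity + t - h; /* unsigned wrap-around is intended */
}

/* Case 2: relaxed load of head, fence, relaxed load of tail.
 * Discussed above as: ldr [head]; dmb ishld; ldr [tail]
 * Assumption: the acquire fence plays the role of rte_smp_rmb(). */
static inline uint32_t free_entries_case2(struct ring_sketch *r)
{
    uint32_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    uint32_t t = atomic_load_explicit(&r->opposite_tail, memory_order_relaxed);
    return r->capacity + t - h;
}

/* Case 3: relaxed load of head, fence, acquire load of tail.
 * Discussed above as: ldr [head]; dmb ishld; lda [tail] */
static inline uint32_t free_entries_case3(struct ring_sketch *r)
{
    uint32_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    uint32_t t = atomic_load_explicit(&r->opposite_tail, memory_order_acquire);
    return r->capacity + t - h;
}
```

Called from a single thread, all three helpers return the same value, so this sketch only pins down which loads and fences each case contains. The question above is about what other cores may observe between the two loads, and that cannot be shown by a single-threaded run.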