From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
From: <pbhagavatula@marvell.com>
To: <jerinj@marvell.com>, Pavan Nikhilesh <pbhagavatula@marvell.com>
CC: <dev@dpdk.org>
Date: Tue, 23 Mar 2021 14:14:36 +0530
Message-ID: <20210323084439.3898-2-pbhagavatula@marvell.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20210323084439.3898-1-pbhagavatula@marvell.com>
References: <20210321084915.2649-1-pbhagavatula@marvell.com>
 <20210323084439.3898-1-pbhagavatula@marvell.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
Subject: [dpdk-dev] [PATCH v3 2/4] event/octeontx2: optimize timer arm
 routine
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

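Use relaxed load-exclusive (ldxr) instead of load-acquire exclusive
(ldaxr), and relaxed instead of acquire atomic loads on non-ARM64
builds, when polling for other threads or hardware to complete.
Update the timer state before releasing the bucket lock.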

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 v3 Changes:
 - Fix incorrect asm register usage detected by clang.
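
 For reviewers, a minimal standalone sketch of the polling pattern used
 in the generic (non-ARM64) path: spin on relaxed loads, then issue one
 acquire fence after the loop exits, as tim_add_entry_mp does after
 waiting on the lock count. The demo_* names and the struct layout are
 hypothetical and are not the driver's definitions. The GCC builtin
 __atomic_thread_fence stands in for rte_atomic_thread_fence so the
 snippet builds without DPDK.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the bucket word; not the driver's layout. */
struct demo_bkt {
	uint64_t w1;      /* bit 33 plays the role of the "busy" flag */
	uint64_t payload; /* data written before the flag is cleared */
};

#define DEMO_BUSY (UINT64_C(1) << 33)

/* Writer side: store the payload, then clear the busy bit with release. */
static inline void
demo_publish(struct demo_bkt *b, uint64_t val)
{
	b->payload = val;
	__atomic_fetch_and(&b->w1, ~DEMO_BUSY, __ATOMIC_RELEASE);
}

/* Poller side: spin with relaxed loads, fence once after the loop exits. */
static inline uint64_t
demo_wait_and_read(struct demo_bkt *b)
{
	while (__atomic_load_n(&b->w1, __ATOMIC_RELAXED) & DEMO_BUSY)
		;
	/* Orders the payload read after the load that saw the bit clear. */
	__atomic_thread_fence(__ATOMIC_ACQUIRE);
	return b->payload;
}
```

 The relaxed loads keep the spin loop free of per-iteration ordering
 cost. The single fence pairs with the writer's release so the payload
 read cannot be reordered ahead of the final flag check.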

 drivers/event/octeontx2/otx2_tim_worker.c |   1 +
 drivers/event/octeontx2/otx2_tim_worker.h | 163 ++++++++++++----------
 2 files changed, 90 insertions(+), 74 deletions(-)

diff --git a/drivers/event/octeontx2/otx2_tim_worker.c b/drivers/event/octeontx2/otx2_tim_worker.c
index eb901844d..6a3511ec0 100644
--- a/drivers/event/octeontx2/otx2_tim_worker.c
+++ b/drivers/event/octeontx2/otx2_tim_worker.c
@@ -170,6 +170,7 @@ otx2_tim_timer_cancel_burst(const struct rte_event_timer_adapter *adptr,
 	int ret;

 	RTE_SET_USED(adptr);
+	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
 	for (index = 0; index < nb_timers; index++) {
 		if (tim[index]->state == RTE_EVENT_TIMER_CANCELED) {
 			rte_errno = EALREADY;
diff --git a/drivers/event/octeontx2/otx2_tim_worker.h b/drivers/event/octeontx2/otx2_tim_worker.h
index f03912b81..5ece8fd05 100644
--- a/drivers/event/octeontx2/otx2_tim_worker.h
+++ b/drivers/event/octeontx2/otx2_tim_worker.h
@@ -84,7 +84,13 @@ tim_bkt_inc_lock(struct otx2_tim_bkt *bktp)
 static inline void
 tim_bkt_dec_lock(struct otx2_tim_bkt *bktp)
 {
-	__atomic_add_fetch(&bktp->lock, 0xff, __ATOMIC_RELEASE);
+	__atomic_fetch_sub(&bktp->lock, 1, __ATOMIC_RELEASE);
+}
+
+static inline void
+tim_bkt_dec_lock_relaxed(struct otx2_tim_bkt *bktp)
+{
+	__atomic_fetch_sub(&bktp->lock, 1, __ATOMIC_RELAXED);
 }

 static inline uint32_t
@@ -246,22 +252,20 @@ tim_add_entry_sp(struct otx2_tim_ring * const tim_ring,
 		if (tim_bkt_get_nent(lock_sema) != 0) {
 			uint64_t hbt_state;
 #ifdef RTE_ARCH_ARM64
-			asm volatile(
-					"	ldaxr %[hbt], [%[w1]]	\n"
-					"	tbz %[hbt], 33, dne%=	\n"
-					"	sevl			\n"
-					"rty%=: wfe			\n"
-					"	ldaxr %[hbt], [%[w1]]	\n"
-					"	tbnz %[hbt], 33, rty%=	\n"
-					"dne%=:				\n"
-					: [hbt] "=&r" (hbt_state)
-					: [w1] "r" ((&bkt->w1))
-					: "memory"
-				    );
+			asm volatile("		ldxr %[hbt], [%[w1]]	\n"
+				     "		tbz %[hbt], 33, dne%=	\n"
+				     "		sevl			\n"
+				     "rty%=:	wfe			\n"
+				     "		ldxr %[hbt], [%[w1]]	\n"
+				     "		tbnz %[hbt], 33, rty%=	\n"
+				     "dne%=:				\n"
+				     : [hbt] "=&r"(hbt_state)
+				     : [w1] "r"((&bkt->w1))
+				     : "memory");
 #else
 			do {
 				hbt_state = __atomic_load_n(&bkt->w1,
-						__ATOMIC_ACQUIRE);
+							    __ATOMIC_RELAXED);
 			} while (hbt_state & BIT_ULL(33));
 #endif

@@ -282,10 +286,10 @@ tim_add_entry_sp(struct otx2_tim_ring * const tim_ring,

 		if (unlikely(chunk == NULL)) {
 			bkt->chunk_remainder = 0;
-			tim_bkt_dec_lock(bkt);
 			tim->impl_opaque[0] = 0;
 			tim->impl_opaque[1] = 0;
 			tim->state = RTE_EVENT_TIMER_ERROR;
+			tim_bkt_dec_lock(bkt);
 			return -ENOMEM;
 		}
 		mirr_bkt->current_chunk = (uintptr_t)chunk;
@@ -298,12 +302,11 @@ tim_add_entry_sp(struct otx2_tim_ring * const tim_ring,
 	/* Copy work entry. */
 	*chunk = *pent;

-	tim_bkt_inc_nent(bkt);
-	tim_bkt_dec_lock(bkt);
-
 	tim->impl_opaque[0] = (uintptr_t)chunk;
 	tim->impl_opaque[1] = (uintptr_t)bkt;
-	tim->state = RTE_EVENT_TIMER_ARMED;
+	__atomic_store_n(&tim->state, RTE_EVENT_TIMER_ARMED, __ATOMIC_RELEASE);
+	tim_bkt_inc_nent(bkt);
+	tim_bkt_dec_lock_relaxed(bkt);

 	return 0;
 }
@@ -331,22 +334,20 @@ tim_add_entry_mp(struct otx2_tim_ring * const tim_ring,
 		if (tim_bkt_get_nent(lock_sema) != 0) {
 			uint64_t hbt_state;
 #ifdef RTE_ARCH_ARM64
-			asm volatile(
-					"	ldaxr %[hbt], [%[w1]]	\n"
-					"	tbz %[hbt], 33, dne%=	\n"
-					"	sevl			\n"
-					"rty%=: wfe			\n"
-					"	ldaxr %[hbt], [%[w1]]	\n"
-					"	tbnz %[hbt], 33, rty%=	\n"
-					"dne%=:				\n"
-					: [hbt] "=&r" (hbt_state)
-					: [w1] "r" ((&bkt->w1))
-					: "memory"
-				    );
+			asm volatile("		ldxr %[hbt], [%[w1]]	\n"
+				     "		tbz %[hbt], 33, dne%=	\n"
+				     "		sevl			\n"
+				     "rty%=:	wfe			\n"
+				     "		ldxr %[hbt], [%[w1]]	\n"
+				     "		tbnz %[hbt], 33, rty%=	\n"
+				     "dne%=:				\n"
+				     : [hbt] "=&r"(hbt_state)
+				     : [w1] "r"((&bkt->w1))
+				     : "memory");
 #else
 			do {
 				hbt_state = __atomic_load_n(&bkt->w1,
-						__ATOMIC_ACQUIRE);
+							    __ATOMIC_RELAXED);
 			} while (hbt_state & BIT_ULL(33));
 #endif

@@ -359,26 +360,24 @@ tim_add_entry_mp(struct otx2_tim_ring * const tim_ring,

 	rem = tim_bkt_fetch_rem(lock_sema);
 	if (rem < 0) {
+		tim_bkt_dec_lock(bkt);
 #ifdef RTE_ARCH_ARM64
-		asm volatile(
-				"	ldaxrh %w[rem], [%[crem]]	\n"
-				"	tbz %w[rem], 15, dne%=		\n"
-				"	sevl				\n"
-				"rty%=: wfe				\n"
-				"	ldaxrh %w[rem], [%[crem]]	\n"
-				"	tbnz %w[rem], 15, rty%=		\n"
-				"dne%=:					\n"
-				: [rem] "=&r" (rem)
-				: [crem] "r" (&bkt->chunk_remainder)
-				: "memory"
-			    );
+		uint64_t w1;
+		asm volatile("		ldxr %[w1], [%[crem]]	\n"
+			     "		tbz %[w1], 63, dne%=		\n"
+			     "		sevl				\n"
+			     "rty%=:	wfe				\n"
+			     "		ldxr %[w1], [%[crem]]	\n"
+			     "		tbnz %[w1], 63, rty%=		\n"
+			     "dne%=:					\n"
+			     : [w1] "=&r"(w1)
+			     : [crem] "r"(&bkt->w1)
+			     : "memory");
 #else
-		while (__atomic_load_n(&bkt->chunk_remainder,
-				       __ATOMIC_ACQUIRE) < 0)
+		while (__atomic_load_n((int64_t *)&bkt->w1, __ATOMIC_RELAXED) <
+		       0)
 			;
 #endif
-		/* Goto diff bucket. */
-		tim_bkt_dec_lock(bkt);
 		goto __retry;
 	} else if (!rem) {
 		/* Only one thread can be here*/
@@ -388,18 +387,21 @@ tim_add_entry_mp(struct otx2_tim_ring * const tim_ring,
 			chunk = tim_insert_chunk(bkt, mirr_bkt, tim_ring);

 		if (unlikely(chunk == NULL)) {
-			tim_bkt_set_rem(bkt, 0);
-			tim_bkt_dec_lock(bkt);
 			tim->impl_opaque[0] = 0;
 			tim->impl_opaque[1] = 0;
 			tim->state = RTE_EVENT_TIMER_ERROR;
+			tim_bkt_set_rem(bkt, 0);
+			tim_bkt_dec_lock(bkt);
 			return -ENOMEM;
 		}
 		*chunk = *pent;
-		while (tim_bkt_fetch_lock(lock_sema) !=
-				(-tim_bkt_fetch_rem(lock_sema)))
-			lock_sema = __atomic_load_n(&bkt->w1, __ATOMIC_ACQUIRE);
-
+		if (tim_bkt_fetch_lock(lock_sema)) {
+			do {
+				lock_sema = __atomic_load_n(&bkt->w1,
+							    __ATOMIC_RELAXED);
+			} while (tim_bkt_fetch_lock(lock_sema) - 1);
+			rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+		}
 		mirr_bkt->current_chunk = (uintptr_t)chunk;
 		__atomic_store_n(&bkt->chunk_remainder,
 				tim_ring->nb_chunk_slots - 1, __ATOMIC_RELEASE);
@@ -409,12 +411,11 @@ tim_add_entry_mp(struct otx2_tim_ring * const tim_ring,
 		*chunk = *pent;
 	}

-	/* Copy work entry. */
-	tim_bkt_inc_nent(bkt);
-	tim_bkt_dec_lock(bkt);
 	tim->impl_opaque[0] = (uintptr_t)chunk;
 	tim->impl_opaque[1] = (uintptr_t)bkt;
-	tim->state = RTE_EVENT_TIMER_ARMED;
+	__atomic_store_n(&tim->state, RTE_EVENT_TIMER_ARMED, __ATOMIC_RELEASE);
+	tim_bkt_inc_nent(bkt);
+	tim_bkt_dec_lock_relaxed(bkt);

 	return 0;
 }
@@ -463,6 +464,23 @@ tim_add_entry_brst(struct otx2_tim_ring * const tim_ring,

 	if (lock_cnt) {
 		tim_bkt_dec_lock(bkt);
+#ifdef RTE_ARCH_ARM64
+		asm volatile("		ldxrb %w[lock_cnt], [%[lock]]	\n"
+			     "		tst %w[lock_cnt], 255		\n"
+			     "		beq dne%=			\n"
+			     "		sevl				\n"
+			     "rty%=:	wfe				\n"
+			     "		ldxrb %w[lock_cnt], [%[lock]]	\n"
+			     "		tst %w[lock_cnt], 255		\n"
+			     "		bne rty%=			\n"
+			     "dne%=:					\n"
+			     : [lock_cnt] "=&r"(lock_cnt)
+			     : [lock] "r"(&bkt->lock)
+			     : "memory");
+#else
+		while (__atomic_load_n(&bkt->lock, __ATOMIC_RELAXED))
+			;
+#endif
 		goto __retry;
 	}

@@ -471,22 +489,20 @@ tim_add_entry_brst(struct otx2_tim_ring * const tim_ring,
 		if (tim_bkt_get_nent(lock_sema) != 0) {
 			uint64_t hbt_state;
 #ifdef RTE_ARCH_ARM64
-			asm volatile(
-					"	ldaxr %[hbt], [%[w1]]	\n"
-					"	tbz %[hbt], 33, dne%=	\n"
-					"	sevl			\n"
-					"rty%=: wfe			\n"
-					"	ldaxr %[hbt], [%[w1]]	\n"
-					"	tbnz %[hbt], 33, rty%=	\n"
-					"dne%=:				\n"
-					: [hbt] "=&r" (hbt_state)
-					: [w1] "r" ((&bkt->w1))
-					: "memory"
-					);
+			asm volatile("		ldxr %[hbt], [%[w1]]	\n"
+				     "		tbz %[hbt], 33, dne%=	\n"
+				     "		sevl			\n"
+				     "rty%=:	wfe			\n"
+				     "		ldxr %[hbt], [%[w1]]	\n"
+				     "		tbnz %[hbt], 33, rty%=	\n"
+				     "dne%=:				\n"
+				     : [hbt] "=&r"(hbt_state)
+				     : [w1] "r"((&bkt->w1))
+				     : "memory");
 #else
 			do {
 				hbt_state = __atomic_load_n(&bkt->w1,
-						__ATOMIC_ACQUIRE);
+							    __ATOMIC_RELAXED);
 			} while (hbt_state & BIT_ULL(33));
 #endif

@@ -563,19 +579,18 @@ tim_rm_entry(struct rte_event_timer *tim)
 	bkt = (struct otx2_tim_bkt *)(uintptr_t)tim->impl_opaque[1];
 	lock_sema = tim_bkt_inc_lock(bkt);
 	if (tim_bkt_get_hbt(lock_sema) || !tim_bkt_get_nent(lock_sema)) {
-		tim_bkt_dec_lock(bkt);
 		tim->impl_opaque[0] = 0;
 		tim->impl_opaque[1] = 0;
+		tim_bkt_dec_lock(bkt);
 		return -ENOENT;
 	}

 	entry->w0 = 0;
 	entry->wqe = 0;
-	tim_bkt_dec_lock(bkt);
-
 	tim->state = RTE_EVENT_TIMER_CANCELED;
 	tim->impl_opaque[0] = 0;
 	tim->impl_opaque[1] = 0;
+	tim_bkt_dec_lock(bkt);

 	return 0;
 }
--
2.17.1