Subject: Re: [PATCH 1/1] eal: correct memory ordering in MCS lock
Date: Mon, 3 Nov 2025 09:12:39 -0600
From: Wathsala Vithanage
To: Honnappa Nagarahalli, Tyler Retzlaff
Cc: dev@dpdk.org, Ola Liljedahl, vattunuru@marvell.com
References: <20251023184724.1759497-1-wathsala.vithanage@arm.com>
In-Reply-To: <20251023184724.1759497-1-wathsala.vithanage@arm.com>

The MCS lock is broken; it is only a matter of time before it runs into
a deadlock. drivers/dma/cnxk is a user of the MCS lock.

--wathsala

On 10/23/25 13:47, Wathsala Vithanage wrote:
> Fix incorrect memory ordering in the MCS lock implementation by
> adding proper synchronizing edges to establish clear happens-before
> relationships between threads invoking lock() and unlock(). These
> synchronizing edges prevent potential deadlocks caused by improper
> ordering and are documented in detail through in-code comments.
>
> The previously relaxed load of the successor’s lock object pointer
> in unlock() has been upgraded to a load-acquire operation. This
> change ensures that the successor’s initialization does not
> overwrite the current lock holder’s update to the locked field,
> which could otherwise lead to deadlocks.
>
> Remove two unnecessary fences:
>
> The acquire fence in unlock() had no matching release fence, making
> it ineffective for enforcing memory order. The associated comment
> suggested it prevented speculative reordering, but such fences (data
> memory barriers) only establish memory ordering and do not control
> instruction speculation.
>
> The release-acquire fence pair in lock() was previously justified as
> preventing reordering between the load-acquire loop of me->locked
> and the store-release of prev->next. This is no longer needed, as the
> new synchronizing edges ensure a chain of happens-before
> relationships between memory operations of threads calling lock() and
> unlock().
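To make the A0..R4 pairings referenced in the diff below easier to follow,
here is a condensed, standalone sketch of the same hand-off scheme written
against plain C11 <stdatomic.h> and pthreads instead of the rte_* wrappers.
The file and symbol names (mcs_sketch.c, mcs_node, mcs_lock, mcs_unlock,
worker, NTHREADS, ITERS) are made up for illustration; treat this as a
model of the intended synchronizing edges, not a drop-in implementation.

/* mcs_sketch.c: illustrative C11 model of the acquire/release edges. */
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

typedef struct mcs_node {
        _Atomic(struct mcs_node *) next;
        atomic_uint locked;
} mcs_node;

static _Atomic(mcs_node *) tail;  /* the lock: queue tail, NULL means free */
static unsigned long counter;     /* protected by the lock */

static void
mcs_lock(_Atomic(mcs_node *) *msl, mcs_node *me)
{
        atomic_store_explicit(&me->locked, 1, memory_order_relaxed);
        atomic_store_explicit(&me->next, NULL, memory_order_relaxed);

        /* A0/R0: acquire pairs with the predecessor's R0 and with R2 in
         * mcs_unlock(); release publishes the two initializing stores above. */
        mcs_node *prev = atomic_exchange_explicit(msl, me, memory_order_acq_rel);
        if (prev == NULL)
                return;  /* queue was empty, lock taken */

        /* R1: link behind the predecessor; pairs with its A2/A3. */
        atomic_store_explicit(&prev->next, me, memory_order_release);

        /* A1: wait for the hand-off; pairs with the predecessor's R3. */
        while (atomic_load_explicit(&me->locked, memory_order_acquire) != 0)
                ;
}

static void
mcs_unlock(_Atomic(mcs_node *) *msl, mcs_node *me)
{
        /* A2: acquire load of the successor pointer; pairs with R1. */
        mcs_node *succ = atomic_load_explicit(&me->next, memory_order_acquire);

        if (succ == NULL) {
                mcs_node *expected = me;
                /* R2: releasing CAS; pairs with the next owner's A0. */
                if (atomic_compare_exchange_strong_explicit(msl, &expected, NULL,
                                memory_order_release, memory_order_relaxed))
                        return;  /* no successor, lock is free again */
                /* A3: a successor is linking itself in; wait for its R1. */
                while ((succ = atomic_load_explicit(&me->next,
                                memory_order_acquire)) == NULL)
                        ;
        }
        /* R3: hand the lock over; pairs with the successor's A1. */
        atomic_store_explicit(&succ->locked, 0, memory_order_release);
}

#define NTHREADS 4
#define ITERS    100000

static void *
worker(void *arg)
{
        (void)arg;
        for (int i = 0; i < ITERS; i++) {
                mcs_node me;  /* per-acquisition queue node */
                mcs_lock(&tail, &me);
                counter++;    /* critical section */
                mcs_unlock(&tail, &me);
        }
        return NULL;
}

int
main(void)
{
        pthread_t t[NTHREADS];

        for (int i = 0; i < NTHREADS; i++)
                pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < NTHREADS; i++)
                pthread_join(t[i], NULL);
        printf("counter = %lu (expected %d)\n", counter, NTHREADS * ITERS);
        return 0;
}

Built with something like "cc -O2 -pthread mcs_sketch.c", the final count
should come out to NTHREADS * ITERS; weakening any of these acquire/release
pairs to relaxed reintroduces the kinds of hazards the patch describes.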
>
> Signed-off-by: Wathsala Vithanage
> Reviewed-by: Ola Liljedahl
> ---
>  lib/eal/include/rte_mcslock.h | 100 ++++++++++++++++++++++------------
>  1 file changed, 64 insertions(+), 36 deletions(-)
>
> diff --git a/lib/eal/include/rte_mcslock.h b/lib/eal/include/rte_mcslock.h
> index bb218d2e50..0af7a94a06 100644
> --- a/lib/eal/include/rte_mcslock.h
> +++ b/lib/eal/include/rte_mcslock.h
> @@ -57,11 +57,21 @@ rte_mcslock_lock(RTE_ATOMIC(rte_mcslock_t *) *msl, rte_mcslock_t *me)
>          rte_atomic_store_explicit(&me->locked, 1, rte_memory_order_relaxed);
>          rte_atomic_store_explicit(&me->next, NULL, rte_memory_order_relaxed);
>
> -        /* If the queue is empty, the exchange operation is enough to acquire
> -         * the lock. Hence, the exchange operation requires acquire semantics.
> -         * The store to me->next above should complete before the node is
> -         * visible to other CPUs/threads. Hence, the exchange operation requires
> -         * release semantics as well.
> +        /*
> +         * A0/R0: Queue might be empty, perform the exchange (RMW) with both acquire and
> +         * release semantics:
> +         * A0: Acquire — synchronizes with both R0 and R2.
> +         * Must synchronize with R0 to ensure that this thread observes the predecessor's
> +         * initialization of its lock object, or risk that initialization overwriting this thread's
> +         * update to the next field of the same object via the store to prev->next.
> +         *
> +         * Must synchronize with R2, the releasing CAS in unlock(); this ensures
> +         * that all prior critical-section writes become visible to this thread.
> +         *
> +         * R0: Release — ensures the successor observes our initialization of me->next;
> +         * without it, me->next could be overwritten to NULL after the successor
> +         * sets its own address, causing deadlock. This release synchronizes with
> +         * A0 above.
>           */
>          prev = rte_atomic_exchange_explicit(msl, me, rte_memory_order_acq_rel);
>          if (likely(prev == NULL)) {
> @@ -70,24 +80,26 @@ rte_mcslock_lock(RTE_ATOMIC(rte_mcslock_t *) *msl, rte_mcslock_t *me)
>                   */
>                  return;
>          }
> -        /* The store to me->next above should also complete before the node is
> -         * visible to predecessor thread releasing the lock. Hence, the store
> -         * prev->next also requires release semantics. Note that, for example,
> -         * on ARM, the release semantics in the exchange operation is not
> -         * strong as a release fence and is not sufficient to enforce the
> -         * desired order here.
> +
> +        /*
> +         * R1: With the relaxed memory model of C/C++, it's essential that after
> +         * we link ourselves by storing prev->next = me, the owner of prev must
> +         * observe our prior initialization of me->locked. Otherwise it could
> +         * clear me->locked before we set it to 1, which may deadlock.
> +         * Perform a releasing store so the predecessor's acquire loads A2 and A3
> +         * observe our initialization, establishing a happens-before from those
> +         * writes.
>           */
>          rte_atomic_store_explicit(&prev->next, me, rte_memory_order_release);
>
> -        /* The while-load of me->locked should not move above the previous
> -         * store to prev->next. Otherwise it will cause a deadlock. Need a
> -         * store-load barrier.
> -         */
> -        rte_atomic_thread_fence(rte_memory_order_acq_rel);
> -        /* If the lock has already been acquired, it first atomically
> +        /*
> +         * A1: If the lock has already been acquired, it first atomically
>           * places the node at the end of the queue and then proceeds
>           * to spin on me->locked until the previous lock holder resets
> -         * the me->locked using mcslock_unlock().
> +         * the me->locked in rte_mcslock_unlock().
> +         * This load must synchronize with store-release R3 to ensure that
> +         * all updates to the critical section by the previous lock holder are visible
> +         * to this thread after acquiring the lock.
>           */
>          rte_wait_until_equal_32((uint32_t *)(uintptr_t)&me->locked, 0, rte_memory_order_acquire);
>  }
> @@ -103,30 +115,46 @@ rte_mcslock_lock(RTE_ATOMIC(rte_mcslock_t *) *msl, rte_mcslock_t *me)
>  static inline void
>  rte_mcslock_unlock(RTE_ATOMIC(rte_mcslock_t *) *msl, RTE_ATOMIC(rte_mcslock_t *) me)
>  {
> -        /* Check if there are more nodes in the queue. */
> -        if (likely(rte_atomic_load_explicit(&me->next, rte_memory_order_relaxed) == NULL)) {
> +        /*
> +         * A2: Check whether a successor is already queued.
> +         * Load me->next with acquire semantics so it can synchronize with the
> +         * successor’s release store R1. This guarantees that the successor’s
> +         * initialization of its lock object (me) is completed before we observe
> +         * it here, preventing a race between this thread’s store-release to
> +         * me->next->locked and the successor’s store to me->locked.
> +         */
> +        if (likely(rte_atomic_load_explicit(&me->next, rte_memory_order_acquire) == NULL)) {
>                  /* No, last member in the queue. */
> -                rte_mcslock_t *save_me = rte_atomic_load_explicit(&me, rte_memory_order_relaxed);
> +                rte_mcslock_t *save_me = me;
>
> -                /* Release the lock by setting it to NULL */
> +                /*
> +                 * R2: Try to release the lock by swinging *msl from save_me to NULL.
> +                 * Use release semantics so all critical section writes become
> +                 * visible to the next lock acquirer.
> +                 */
>                  if (likely(rte_atomic_compare_exchange_strong_explicit(msl, &save_me, NULL,
>                                  rte_memory_order_release, rte_memory_order_relaxed)))
>                          return;
>
> -                /* Speculative execution would be allowed to read in the
> -                 * while-loop first. This has the potential to cause a
> -                 * deadlock. Need a load barrier.
> -                 */
> -                rte_atomic_thread_fence(rte_memory_order_acquire);
> -                /* More nodes added to the queue by other CPUs.
> -                 * Wait until the next pointer is set.
> +                /*
> +                 * A3: Another thread was enqueued concurrently, so the CAS and the lock
> +                 * release failed. Wait until the successor sets our 'next' pointer.
> +                 * This load must synchronize with the successor’s release store (R1) to
> +                 * ensure that the successor’s initialization completes before we observe
> +                 * it here. This ordering prevents a race between this thread’s later
> +                 * store-release to me->next->locked and the successor’s store to me->locked.
>                   */
>                  RTE_ATOMIC(uintptr_t) *next;
>                  next = (__rte_atomic uintptr_t *)&me->next;
> -                RTE_WAIT_UNTIL_MASKED(next, UINTPTR_MAX, !=, 0, rte_memory_order_relaxed);
> +                RTE_WAIT_UNTIL_MASKED(next, UINTPTR_MAX, !=, 0, rte_memory_order_acquire);
>          }
>
> -        /* Pass lock to next waiter. */
> +        /*
> +         * R3: Pass the lock to the successor.
> +         * Use a release store to synchronize with A1 when clearing me->next->locked
> +         * so the successor observes our critical section writes after it sees locked
> +         * become 0.
> +         */
>          rte_atomic_store_explicit(&me->next->locked, 0, rte_memory_order_release);
>  }
>
> @@ -149,11 +177,11 @@ rte_mcslock_trylock(RTE_ATOMIC(rte_mcslock_t *) *msl, rte_mcslock_t *me)
>          /* Try to lock */
>          rte_mcslock_t *expected = NULL;
>
> -        /* The lock can be taken only when the queue is empty. Hence,
> -         * the compare-exchange operation requires acquire semantics.
> -         * The store to me->next above should complete before the node
> -         * is visible to other CPUs/threads. Hence, the compare-exchange
> -         * operation requires release semantics as well.
> +        /*
> +         * A4/R4: The lock can be acquired only when the queue is empty.
> +         * The compare-and-exchange operation must use acquire and release
> +         * semantics for the same reasons described in the rte_mcslock_lock()
> +         * function’s empty-queue case (see A0/R0 for details).
>           */
>          return rte_atomic_compare_exchange_strong_explicit(msl, &expected, me,
>                          rte_memory_order_acq_rel, rte_memory_order_relaxed);
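
For callers such as drivers/dma/cnxk the usage contract is unchanged; only
the ordering inside lock()/unlock() moves. As a reminder, here is a minimal
sketch of the usual calling pattern, assuming the rte_mcslock.h API shown
in the hunks above (example_lock, critical_update and critical_update_try
are made-up names for illustration):

#include <rte_mcslock.h>

/* Shared tail pointer; NULL means the lock is free. */
static RTE_ATOMIC(rte_mcslock_t *) example_lock;

static void
critical_update(void)
{
        /* Per-caller queue node; must stay valid until unlock returns. */
        rte_mcslock_t me;

        rte_mcslock_lock(&example_lock, &me);
        /* ... critical section ... */
        rte_mcslock_unlock(&example_lock, &me);
}

static int
critical_update_try(void)
{
        rte_mcslock_t me;

        if (rte_mcslock_trylock(&example_lock, &me) == 0)
                return -1;      /* queue not empty, lock not taken */
        /* ... critical section ... */
        rte_mcslock_unlock(&example_lock, &me);
        return 0;
}

The queue node belongs to the caller, so it only has to outlive the matching
unlock; a stack variable is enough when lock and unlock happen in the same
function.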