Date: Mon, 29 Mar 2021 12:58:21 -0700
From: Stephen Hemminger
To: Honnappa Nagarahalli
Cc: dev@dpdk.org, Stephen Hemminger, nd
Message-ID: <20210329125821.3808b0c5@hermes.local>
References: <20210212013838.312623-1-sthemmin@microsoft.com>
	<20210303191945.94617-1-sthemmin@microsoft.com>
Subject: Re: [dpdk-dev] [PATCH v3] pflock: implementation of phase-fair reader writer locks

Meta question: is implementing trylock worth it? The original did not
have it, and there is a tradeoff between the number of supported
readers and added complexity in the code.

> > diff --git a/lib/librte_eal/include/generic/rte_pflock.h
> > b/lib/librte_eal/include/generic/rte_pflock.h
> > new file mode 100644
> > index 000000000000..6808c70c34a2
> > --- /dev/null
> > +++ b/lib/librte_eal/include/generic/rte_pflock.h
> > @@ -0,0 +1,273 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2021 Microsoft Corp.
> > + * Copyright 2011-2015 Samy Al Bahra.
> Any reason for adding the above copyright?

The code originally came from Concurrency Kit, so I wanted to keep the
attribution to the original author.

> > + * The rte_pflock_t type.
> > + */
> > +struct rte_pflock {
> > +	union rte_pflock_ticket {
> > +		uint32_t tickets;
> > +		struct {
> > +			uint16_t in;
> > +			uint16_t out;
> > +		};
> > +	} rd, wr;
> Just wondering if placing these on 2 different cache lines would help
> the performance?

That won't work, because the implementation of trylock requires a
compare-and-exchange of the whole structure as a single atomic
operation.

> > +};
> > +typedef struct rte_pflock rte_pflock_t;
> > +
> > +/**
> > + * Constants used to map the bits in reader counter
> > + *
> > + * +-----------------+-+-+
> > + * |     Readers     |W|P|
> > + * |                 |R|H|
> > + * +-----------------+-+-+
> It would be good to indicate the reserved part.

Ok.

> > + */
> > +
> > +#define RTE_PFLOCK_LSB   0xFFF0
> Based on the value of RTE_PFLOCK_RINC, should this be 0xFF00?

The unused bits never get set, so it doesn't matter.

> > +#define RTE_PFLOCK_RINC  0x100 /* Reader increment value. */
> Does this mean there can be only 256 concurrent readers?

Yes, there is a tradeoff. If you assume that the largest atomic
operation is 64 bits and you want to support trylock, then 256 readers
is the limit. The original code had 32-bit counters but no trylock.

> > +__rte_experimental
> > +static inline void
> > +rte_pflock_read_lock(rte_pflock_t *pf)
> > +{
> > +	uint32_t w;
> > +
> > +	/*
> > +	 * If no writer is present, then the operation has completed
> > +	 * successfully.
> > +	 */
> > +	w = __atomic_fetch_add(&pf->rd.in, RTE_PFLOCK_RINC,
> > +		__ATOMIC_ACQ_REL) & RTE_PFLOCK_WBITS;
> Any reason for the RELEASE? I think ACQUIRE is enough, as the write to
> rd.in is not releasing any previous memory operations.
That makes sense, will fix.

> > +__rte_experimental
> > +static inline int
> > +rte_pflock_read_trylock(rte_pflock_t *pf)
> > +{
> > +	union rte_pflock_ticket old, new;
> > +
> > +	/* Get current state of the lock */
> > +	old.tickets = __atomic_load_n(&pf->rd.tickets, __ATOMIC_RELAXED);
> > +
> > +	/* loop until writer shows up */
> > +	while ((old.in & RTE_PFLOCK_WBITS) == 0) {
> > +		new.out = old.out;
> > +		new.in = old.in + RTE_PFLOCK_RINC;
> > +		if (__atomic_compare_exchange_n(&pf->rd.tickets,
> > +				&old.tickets, new.tickets,
> > +				0, __ATOMIC_ACQ_REL,
>                                        ^^^ I think ACQUIRE is enough.
> We are not releasing anything to other threads.

Fixed.

> > +				__ATOMIC_RELAXED))
> > +			return 0; /* got it */
> > +
> > +		/* either new reader got in (so retry) or a writer */
> > +	}

> > +__rte_experimental
> > +static inline void
> > +rte_pflock_write_lock(rte_pflock_t *pf)
> > +{
> > +	uint16_t ticket;
> > +
> > +	/* Acquire ownership of write-phase. */
> > +	ticket = __atomic_fetch_add(&pf->wr.in, 1, __ATOMIC_ACQUIRE);
> > +	rte_wait_until_equal_16(&pf->wr.out, ticket, __ATOMIC_RELAXED);
> > +
> > +	/*
> > +	 * Acquire ticket on read-side in order to allow them
> > +	 * to flush. Indicates to any incoming reader that a
> > +	 * write-phase is pending.
> > +	 *
> > +	 * Need ACQUIRE to prevent speculative execution of the wait loop
> I do not think the entire wait loop will be executed speculatively.
> Only the load of pf->rd.out would happen speculatively. There is a
> dependency on the 'ticket' variable, so the load of 'ticket' should
> happen after 'ticket' is updated below.

> > +	 */
> > +	ticket = __atomic_fetch_add(&pf->rd.in,
> > +			(ticket & RTE_PFLOCK_PHID) | RTE_PFLOCK_PRES,
> > +			__ATOMIC_ACQUIRE);
> Since it is ok to execute part of the wait loop above this, we could
> make this RELAXED.
> Also, since we just need to set the 2 bits, is it better to use
> __atomic_fetch_or? It also matches the use of __atomic_fetch_and in
> the unlock API.
> ticket = __atomic_fetch_or(&pf->rd.in,
>		(ticket & RTE_PFLOCK_PHID) | RTE_PFLOCK_PRES,
>		__ATOMIC_RELAXED);

> > +
> > +	/* Wait for any pending readers to flush. */
> > +	rte_wait_until_equal_16(&pf->rd.out, ticket, __ATOMIC_RELAXED);
> RELAXED here will allow the critical section to execute above the
> wait loop. Hence it is better to make this ACQUIRE.

Would it be better to add a fence instead?

> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Try to take the pflock for write.
> > + *
> > + * @param pf
> > + *   A pointer to the pflock.
> > + * @return
> > + *   - zero if the lock is successfully taken
> > + *   - -EBUSY if lock could not be acquired for writing because
> > + *     another writer holds the lock
> What about the readers holding the lock?

Originally I had it return -EBUSY, but then all the write trylocks
would fail. Trylock doesn't seem to play well with the phase-fair
nature. Writing is a two-part operation in this model: if the first
part succeeds (which changes the phase), then there is no way to back
out or undo the ticket.

> > + */
> > +__rte_experimental
> > +static inline int
> > +rte_pflock_write_trylock(rte_pflock_t *pf)
> > +{
> > +	union rte_pflock_ticket old, new;
> > +	uint16_t ticket;
> > +
> > +	/* Get current state of the lock */
> > +	old.tickets = __atomic_load_n(&pf->wr.tickets, __ATOMIC_RELAXED);
> > +	new.out = old.out;
> > +	new.in = old.in + 1;
> > +	ticket = new.in;
> > +
> > +	/* if writer is already present then too busy */
> > +	if (old.out != new.in ||
> > +	    !__atomic_compare_exchange_n(&pf->wr.tickets,
> > +			&old.tickets, new.tickets,
> > +			0, __ATOMIC_ACQ_REL, __ATOMIC_RELAXED))
> > +		return -EBUSY; /* another writer is present already */
> > +
> > +	/*
> > +	 * We now own the write phase, but still need to tell
> > +	 * readers and wait for them.
> The write lock is taken only if there are no readers AND no writers
> (unlike the read lock, which is taken only if there are no writers
> waiting).
> Since this is a trylock, should we wait for the readers to give up
> the lock?
> I think, if readers are present, we should give up the write phase
> and return.

> > +	 *
> > +	 * Need ACQUIRE semantics to avoid speculative execution of
> > +	 * the wait loop
> > +	 */
> > +	ticket = __atomic_fetch_add(&pf->rd.in,
> > +			(ticket & RTE_PFLOCK_PHID) | RTE_PFLOCK_PRES,
> > +			__ATOMIC_ACQUIRE);
> > +
> > +	/* Wait for any pending readers to flush. */
> > +	rte_wait_until_equal_16(&pf->rd.out, ticket, __ATOMIC_RELAXED);
> > +	return 0;
> > +}
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* RTE_PFLOCK_H */
> > diff --git a/lib/librte_eal/ppc/include/meson.build
> > b/lib/librte_eal/ppc/include/meson.build
> > index dae40ede546e..7692a531ccba 100644
> > --- a/lib/librte_eal/ppc/include/meson.build
> > +++ b/lib/librte_eal/ppc/include/meson.build
> > @@ -11,6 +11,7 @@ arch_headers = files(
> >  	'rte_mcslock.h',
> >  	'rte_memcpy.h',
> >  	'rte_pause.h',
> > +	'rte_pflock.h',
> >  	'rte_power_intrinsics.h',
> >  	'rte_prefetch.h',
> >  	'rte_rwlock.h',
> > diff --git a/lib/librte_eal/ppc/include/rte_pflock.h
> > b/lib/librte_eal/ppc/include/rte_pflock.h
> > new file mode 100644
> > index 000000000000..e7b875ac56a8
> > --- /dev/null
> > +++ b/lib/librte_eal/ppc/include/rte_pflock.h
> > @@ -0,0 +1,16 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause */
> Copyright header missing?

> > +#ifndef _RTE_PFLOCK_PPC_64_H_
> > +#define _RTE_PFLOCK_PPC_64_H_
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include "generic/rte_pflock.h"
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_PFLOCK_PPC_64_H_ */
> > diff --git a/lib/librte_eal/x86/include/meson.build
> > b/lib/librte_eal/x86/include/meson.build
> > index 1a6ad0b17342..f43645c20899 100644
> > --- a/lib/librte_eal/x86/include/meson.build
> > +++ b/lib/librte_eal/x86/include/meson.build
> > @@ -10,6 +10,7 @@ arch_headers = files(
> >  	'rte_mcslock.h',
> >  	'rte_memcpy.h',
> >  	'rte_pause.h',
> > +	'rte_pflock.h',
> >  	'rte_power_intrinsics.h',
> >  	'rte_prefetch.h',
> >  	'rte_rtm.h',
> > diff --git a/lib/librte_eal/x86/include/rte_pflock.h
> > b/lib/librte_eal/x86/include/rte_pflock.h
> > new file mode 100644
> > index 000000000000..c2d876062c08
> > --- /dev/null
> > +++ b/lib/librte_eal/x86/include/rte_pflock.h
> > @@ -0,0 +1,18 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2021 Microsoft Corporation */
> > +
> > +#ifndef _RTE_PFLOCK_X86_64_H_
> > +#define _RTE_PFLOCK_X86_64_H_
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include "generic/rte_pflock.h"
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_PFLOCK_X86_64_H_ */
> > --
> > 2.30.1
>