Re: [dpdk-dev] [PATCH v3] pflock: implementation of phase-fair reader writer locks

DPDK patches and discussions
 help / color / mirror / Atom feed

From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
To: Stephen Hemminger <stephen@networkplumber.org>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	Stephen Hemminger <sthemmin@microsoft.com>,  nd <nd@arm.com>,
	Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	nd <nd@arm.com>
Subject: Re: [dpdk-dev] [PATCH v3] pflock: implementation of phase-fair reader writer locks
Date: Tue, 30 Mar 2021 00:18:40 +0000	[thread overview]
Message-ID: <DBAPR08MB581458D378B895AA2200A4B1987D9@DBAPR08MB5814.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <20210329125821.3808b0c5@hermes.local>

<snip>

> Subject: Re: [PATCH v3] pflock: implementation of phase-fair reader writer locks
> 
> Meta question: is implementing trylock worth it?
> The original did not have it.
If there is no use for it currently, I suggest not to add. If someone sees a need, they can always add. I am ok if you add as well.

> 
> There are tradeoffs about number of readers and added complexity in the code?
If we increase the size of 'in' and 'out' to 32b and 'tickets' to 64b, that should increase the number of readers. Do you see any other complexity?
This would mean 64b compare-swap in try lock, that should be fine I think.

> 
> > > diff --git a/lib/librte_eal/include/generic/rte_pflock.h
> > > b/lib/librte_eal/include/generic/rte_pflock.h
> > > new file mode 100644
> > > index 000000000000..6808c70c34a2
> > > --- /dev/null
> > > +++ b/lib/librte_eal/include/generic/rte_pflock.h
> > > @@ -0,0 +1,273 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright(c) 2021 Microsoft Corp.
> > > + * Copyright 2011-2015 Samy Al Bahra.
> > Any reason for adding the above copy right?
> 
> Code originally came from Concurrency Kit, so wanted to keep attribution to
> original author
Ack

> 
> > > + * The rte_pflock_t type.
> > > + */
> > > +struct rte_pflock {
> > > +	union rte_pflock_ticket {
> > > +		uint32_t tickets;
> > > +		struct {
> > > +			uint16_t in;
> > > +			uint16_t out;
> > > +		};
> > > +	} rd, wr;
> > Just wondering if placing these on 2 different cache lines would help the
> performance?
> 
> That won't work because the implementation of trylock requires
> compare/exchange of the whole structure as an atomic operation.
I meant, placing 'rd' and 'wr' on separate cache lines. It might help in the reader-writer contention case.

> 
> >
> > > +};
> > > +typedef struct rte_pflock rte_pflock_t;
> > > +
> > > +/**
> > > + * Constants used to map the bits in reader counter
> > > + *
> > > + * +-----------------+-+-+
> > > + * |     Readers     |W|P|
> > > + * |                 |R|H|
> > > + * +-----------------+-+-+
> > It would be good to indicate the reserved part.
> 
> Ok
> 
> >
> > > + */
> > > +
> > > +#define RTE_PFLOCK_LSB   0xFFF0
> > Based on the value of RTE_PFLOCK_RINC, should this be 0xFF00?
> 
> The unused bits never get set so it doesn't matter
Agree, it just creates confusion while reading the code.

> 
> >
> > > +#define RTE_PFLOCK_RINC  0x100		/* Reader increment value.
> > Does this mean, there can be only 256 concurrent readers?
> 
> Yes, there is a tradeoff.  If you assume that the largest atomic operation is 64
> bits, and you want to support trylock then 256 readers is the limit.
>
May be I am missing something, I see that you are using 32b atomic operations. 'union rte_pflock_ticket' is 32b.

> The original code has 32 bit counters but no trylock.
> 
> 
> > > +__rte_experimental
> > > +static inline void
> > > +rte_pflock_read_lock(rte_pflock_t *pf) {
> > > +	uint32_t w;
> > > +
> > > +	/*
> > > +	 * If no writer is present, then the operation has completed
> > > +	 * successfully.
> > > +	 */
> > > +	w = __atomic_fetch_add(&pf->rd.in, RTE_PFLOCK_RINC,
> > > __ATOMIC_ACQ_REL) & RTE_PFLOCK_WBITS;
> > Any reason for the RELEASE? I think ACQUIRE is enough as the write to rd.in is
> not releasing any previous memory operations.
> 
> That make sense, will fix
> 
> > > +__rte_experimental
> > > +static inline int
> > > +rte_pflock_read_trylock(rte_pflock_t *pf) {
> > > +	union rte_pflock_ticket old, new;
> > > +
> > > +	/* Get current state of the lock */
> > > +	old.tickets = __atomic_load_n(&pf->rd.tickets,
> > > __ATOMIC_RELAXED);
> > > +
> > > +	/* loop until writer shows up */
> > > +	while ((old.in & RTE_PFLOCK_WBITS) == 0) {
> > > +		new.out = old.out;
> > > +		new.in = old.in + RTE_PFLOCK_RINC;
> > > +		if (__atomic_compare_exchange_n(&pf->rd.tickets,
> > > &old.tickets, new.tickets,
> > > +						0, __ATOMIC_ACQ_REL,
> >
>               ^^^ I think ACQUIRE is enough. We are not releasing anything to other
> threads.
> 
> Fixed.
> 
> >
> > > __ATOMIC_RELAXED))
> > > +			return 0;	/* got it */
> > > +
> > > +		/* either new reader got in (so retry) or a writer */
> > > +	}
> > > +
> 
> > > +__rte_experimental
> > > +static inline void
> > > +rte_pflock_write_lock(rte_pflock_t *pf) {
> > > +	uint16_t ticket;
> > > +
> > > +	/* Acquire ownership of write-phase. */
> > > +	ticket = __atomic_fetch_add(&pf->wr.in, 1, __ATOMIC_ACQUIRE);
> > > +	rte_wait_until_equal_16(&pf->wr.out, ticket, __ATOMIC_RELAXED);
> > > +
> > > +	/*
> > > +	 * Acquire ticket on read-side in order to allow them
> > > +	 * to flush. Indicates to any incoming reader that a
> > > +	 * write-phase is pending.
> > > +	 *
> > > +	 * Need ACQUIRE to prevent speculative execution of the wait loop
> > I do not think the entire wait loop will be executed speculatively. Only the load
> of pf->rd.out would happen speculatively. There is a dependency on 'ticket'
> variable. So, the load of the 'ticket' variable should happen after 'ticket' is
> updated below.
> >
> > > +	 */
> > > +	ticket = __atomic_fetch_add(&pf->rd.in,
> > > +				    (ticket & RTE_PFLOCK_PHID) |
> > > RTE_PFLOCK_PRES,
> > > +				    __ATOMIC_ACQUIRE);
> > Since, it is ok to execute part of the wait loop above this. We could make this
> RELAXED.
> > Also, since we just need to set the 2 bits, is it better to use __atomic_fetch_or?
> It also matches with the use of __atomic_fetch_and in the unlock API.
> 
> 
> 	ticket = __atomic_fetch_or(&pf->rd.in,
> 				    (ticket & RTE_PFLOCK_PHID) |
> RTE_PFLOCK_PRES,
> 				    __ATOMIC_RELAXED
Ack

> 
> > > +
> > > +	/* Wait for any pending readers to flush. */
> > > +	rte_wait_until_equal_16(&pf->rd.out, ticket, __ATOMIC_RELAXED);
> > RELAXED here will allow the critical section to execute above the wait loop.
> Hence it is better to make this ACQUIRE.
> 
> 	Would it be better to add a fence instead?
Acquire fence here will have more restrictions for the micro-architecture to optimize. It does not allow the earlier reads to cross the fence (a load-acquire would allow it).
Good to run performance tests to confirm.

> >
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice.
> > > + *
> > > + * Try to take the pflock for write.
> > > + *
> > > + * @param pf
> > > + *   A pointer to the pflock.
> > > + * @return
> > > + *   - zero if the lock is successfully taken
> > > + *   - -EBUSY if lock could not be acquired for writing because
> > > + *     another writer holds the lock
> > What about the readers holding the lock?
> 
> Originally, I had it return -EBUSY, but then all the write trylock would fail.
Ok. It is also not clear what the try lock should do? Without the clear requirements we will bind ourselves into an implement which might not be suitable in the future. May be it is better to skip it.

> Trylock doesn't seem to play well with phase fair nature.
> 
> Writing is a two part operation in this model, if the 1st part succeeds (which
> changes the phase), then there is no way to backout/undo the ticket.
The undo operation is similar to 'rte_pflock_write_unlock'? Reset the phase bits and increment the ticket.

> 
> 
> 
> >
> > > + */
> > > +__rte_experimental
> > > +static inline int
> > > +rte_pflock_write_trylock(rte_pflock_t *pf) {
> > > +	union rte_pflock_ticket old, new;
> > > +	uint16_t ticket;
> > > +
> > > +	/* Get current state of the lock */
> > > +	old.tickets = __atomic_load_n(&pf->wr.tickets,
> > > __ATOMIC_RELAXED);
> > > +	new.out = old.out;
> > > +	new.in  = old.in + 1;
> > > +	ticket = new.in;
> > > +
> > > +	/* if writer is already present then too busy */
> > > +	if (old.out != new.in ||
> > > +	    !__atomic_compare_exchange_n(&pf->wr.tickets, &old.tickets,
> > > new.tickets,
> > > +					 0, __ATOMIC_ACQ_REL,
> > > __ATOMIC_RELAXED))
> > > +		return -EBUSY; /* another writer is present already */
> > > +
> > > +	/*
> > > +	 * We now own the write phase, but still need to tell
> > > +	 * readers and wait for them.
> > The write lock is taken if there are no readers AND no writers (unlike the read
> lock which is taken if there are no writers waiting (only))
> > Since this is a try lock, should we wait for the readers to give up the lock?
> > I think, if the readers are present, we should give up the writer phase and
> return.
> >
> > > +	 *
> > > +	 * Need ACQUIRE semantics to avoid speculative execution of wait
> > > loop
> > > +	 */
> > > +	ticket  = __atomic_fetch_add(&pf->rd.in,
> > > +				 (ticket & RTE_PFLOCK_PHID) |
> > > RTE_PFLOCK_PRES,
> > > +				 __ATOMIC_ACQUIRE);
> > > +
> > > +	/* Wait for any pending readers to flush. */
> > > +	rte_wait_until_equal_16(&pf->rd.out, ticket, __ATOMIC_RELAXED);
> > > +	return 0;
> > > +}
> > > +
> > > +#ifdef __cplusplus
> > > +}
> > > +#endif
> > > +
> > > +#endif /* RTE_PFLOCK_H */
> > > diff --git a/lib/librte_eal/ppc/include/meson.build
> > > b/lib/librte_eal/ppc/include/meson.build
> > > index dae40ede546e..7692a531ccba 100644
> > > --- a/lib/librte_eal/ppc/include/meson.build
> > > +++ b/lib/librte_eal/ppc/include/meson.build
> > > @@ -11,6 +11,7 @@ arch_headers = files(
> > >  	'rte_mcslock.h',
> > >  	'rte_memcpy.h',
> > >  	'rte_pause.h',
> > > +	'rte_pflock.h',
> > >  	'rte_power_intrinsics.h',
> > >  	'rte_prefetch.h',
> > >  	'rte_rwlock.h',
> > > diff --git a/lib/librte_eal/ppc/include/rte_pflock.h
> > > b/lib/librte_eal/ppc/include/rte_pflock.h
> > > new file mode 100644
> > > index 000000000000..e7b875ac56a8
> > > --- /dev/null
> > > +++ b/lib/librte_eal/ppc/include/rte_pflock.h
> > > @@ -0,0 +1,16 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause  */ #ifndef
> > Copyright header missing?
> >
> > > +_RTE_PFLOCK_PPC_64_H_ #define _RTE_PFLOCK_PPC_64_H_
> > > +
> > > +#ifdef __cplusplus
> > > +extern "C" {
> > > +#endif
> > > +
> > > +#include "generic/rte_pflock.h"
> > > +
> > > +#ifdef __cplusplus
> > > +}
> > > +#endif
> > > +
> > > +#endif /* _RTE_PFLOCK_PPC_64_H_ */
> > > diff --git a/lib/librte_eal/x86/include/meson.build
> > > b/lib/librte_eal/x86/include/meson.build
> > > index 1a6ad0b17342..f43645c20899 100644
> > > --- a/lib/librte_eal/x86/include/meson.build
> > > +++ b/lib/librte_eal/x86/include/meson.build
> > > @@ -10,6 +10,7 @@ arch_headers = files(
> > >  	'rte_mcslock.h',
> > >  	'rte_memcpy.h',
> > >  	'rte_pause.h',
> > > +	'rte_pflock.h',
> > >  	'rte_power_intrinsics.h',
> > >  	'rte_prefetch.h',
> > >  	'rte_rtm.h',
> > > diff --git a/lib/librte_eal/x86/include/rte_pflock.h
> > > b/lib/librte_eal/x86/include/rte_pflock.h
> > > new file mode 100644
> > > index 000000000000..c2d876062c08
> > > --- /dev/null
<snip>

next prev parent reply	other threads:[~2021-03-30  0:19 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-12  6:05 [dpdk-dev] [RFC] eal: add fair reader writer lock Stephen Hemminger
2021-01-14 17:34 ` [dpdk-dev] [PATCH v1] eal: add ticket based " Stephen Hemminger
2021-01-27 10:25   ` Ruifeng Wang
2021-01-28  1:32     ` Stephen Hemminger
2021-01-28  1:16   ` [dpdk-dev] [PATCH v2] " Stephen Hemminger
2021-02-12  1:38   ` [dpdk-dev] [RFC] pflock: add implementation of phase-fair locks Stephen Hemminger
2021-02-28 17:21     ` [dpdk-dev] [PATCH v1] pflock: implementation of phase-fair reader writer locks Stephen Hemminger
2021-03-03 18:30     ` [dpdk-dev] [PATCH v2] " Stephen Hemminger
2021-03-03 19:19     ` [dpdk-dev] [PATCH v3] " Stephen Hemminger
2021-03-26 17:17       ` Stephen Hemminger
2021-03-29  3:14       ` Honnappa Nagarahalli
2021-03-29 17:22         ` Stephen Hemminger
2021-03-29 18:09           ` Honnappa Nagarahalli
2021-03-29 19:58         ` Stephen Hemminger
2021-03-30  0:18           ` Honnappa Nagarahalli [this message]
2021-03-30  4:56             ` Stephen Hemminger
2021-03-30  5:00     ` [dpdk-dev] [PATCH v4] pflock: add " Stephen Hemminger
2021-03-30  5:14       ` Stephen Hemminger
2021-03-31  4:19       ` Honnappa Nagarahalli
2021-03-31 16:32         ` Stephen Hemminger
2021-04-02  1:37         ` Stephen Hemminger
2021-04-02  1:42     ` [dpdk-dev] [PATCH v5] pflock: implementation of " Stephen Hemminger
2021-04-06 21:56       ` Honnappa Nagarahalli
2021-04-06 22:33         ` Stephen Hemminger
2021-04-07  0:17           ` Honnappa Nagarahalli
2021-04-07 15:09       ` Ananyev, Konstantin
2021-04-14 15:36         ` David Marchand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DBAPR08MB581458D378B895AA2200A4B1987D9@DBAPR08MB5814.eurprd08.prod.outlook.com \
    --to=honnappa.nagarahalli@arm.com \
    --cc=dev@dpdk.org \
    --cc=nd@arm.com \
    --cc=stephen@networkplumber.org \
    --cc=sthemmin@microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).