The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling for a memory location to become equal to a given value'[1]. Use the API for the rte spinlock and ring implementations. [1] http://patches.dpdk.org/cover/62703/ Gavin Hu (2): spinlock: use wfe to reduce contention on aarch64 ring: use wfe to wait for ring tail update on aarch64 lib/librte_eal/include/generic/rte_spinlock.h | 5 ++--- lib/librte_ring/rte_ring_c11_mem.h | 4 ++-- lib/librte_ring/rte_ring_generic.h | 3 +-- 3 files changed, 5 insertions(+), 7 deletions(-) -- 2.17.1
In acquiring a spinlock, cores repeatedly poll the lock variable. This is replaced by rte_wait_until_equal API. Running the micro benchmarking and the testpmd and l3fwd traffic tests on ThunderX2, Ampere eMAG80 and Arm N1SDP, everything went well and no notable performance gain nor degradation was measured. Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Tested-by: Pavan Nikhilesh <pbhagavatula@marvell.com> --- lib/librte_eal/include/generic/rte_spinlock.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/lib/librte_eal/include/generic/rte_spinlock.h b/lib/librte_eal/include/generic/rte_spinlock.h index 87ae7a4f1..5cc123247 100644 --- a/lib/librte_eal/include/generic/rte_spinlock.h +++ b/lib/librte_eal/include/generic/rte_spinlock.h @@ -28,7 +28,7 @@ * The rte_spinlock_t type. */ typedef struct { - volatile int locked; /**< lock status 0 = unlocked, 1 = locked */ + volatile uint32_t locked; /**< lock status 0 = unlocked, 1 = locked */ } rte_spinlock_t; /** @@ -65,8 +65,7 @@ rte_spinlock_lock(rte_spinlock_t *sl) while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) { - while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED)) - rte_pause(); + rte_wait_until_equal_32(&sl->locked, 0, __ATOMIC_RELAXED); exp = 0; } } -- 2.17.1
Instead of polling for tail to be updated, use wfe instruction. Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> --- lib/librte_ring/rte_ring_c11_mem.h | 4 ++-- lib/librte_ring/rte_ring_generic.h | 3 +-- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h index 0fb73a337..764d8f186 100644 --- a/lib/librte_ring/rte_ring_c11_mem.h +++ b/lib/librte_ring/rte_ring_c11_mem.h @@ -2,6 +2,7 @@ * * Copyright (c) 2017,2018 HXT-semitech Corporation. * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org + * Copyright (c) 2019 Arm Limited * All rights reserved. * Derived from FreeBSD's bufring.h * Used as BSD-3 Licensed with permission from Kip Macy. @@ -21,8 +22,7 @@ update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val, * we need to wait for them to complete */ if (!single) - while (unlikely(ht->tail != old_val)) - rte_pause(); + rte_wait_until_equal_relaxed_32(&ht->tail, old_val); __atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE); } diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h index 953cdbbd5..682852783 100644 --- a/lib/librte_ring/rte_ring_generic.h +++ b/lib/librte_ring/rte_ring_generic.h @@ -23,8 +23,7 @@ update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val, * we need to wait for them to complete */ if (!single) - while (unlikely(ht->tail != old_val)) - rte_pause(); + rte_wait_until_equal_relaxed_32(&ht->tail, old_val); ht->tail = new_val; } -- 2.17.1
> > Instead of polling for tail to be updated, use wfe instruction. > > Signed-off-by: Gavin Hu <gavin.hu@arm.com> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> > Reviewed-by: Steve Capper <steve.capper@arm.com> > Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> > --- > lib/librte_ring/rte_ring_c11_mem.h | 4 ++-- > lib/librte_ring/rte_ring_generic.h | 3 +-- > 2 files changed, 3 insertions(+), 4 deletions(-) > > diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h > index 0fb73a337..764d8f186 100644 > --- a/lib/librte_ring/rte_ring_c11_mem.h > +++ b/lib/librte_ring/rte_ring_c11_mem.h > @@ -2,6 +2,7 @@ > * > * Copyright (c) 2017,2018 HXT-semitech Corporation. > * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org > + * Copyright (c) 2019 Arm Limited > * All rights reserved. > * Derived from FreeBSD's bufring.h > * Used as BSD-3 Licensed with permission from Kip Macy. > @@ -21,8 +22,7 @@ update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val, > * we need to wait for them to complete > */ > if (!single) > - while (unlikely(ht->tail != old_val)) > - rte_pause(); > + rte_wait_until_equal_relaxed_32(&ht->tail, old_val); > > __atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE); > } > diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h > index 953cdbbd5..682852783 100644 > --- a/lib/librte_ring/rte_ring_generic.h > +++ b/lib/librte_ring/rte_ring_generic.h > @@ -23,8 +23,7 @@ update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val, > * we need to wait for them to complete > */ > if (!single) > - while (unlikely(ht->tail != old_val)) > - rte_pause(); > + rte_wait_until_equal_relaxed_32(&ht->tail, old_val); ./lib/librte_ring/rte_ring_generic.h:26:3: error: implicit declaration of function 'rte_wait_until_equal_relaxed_32'; did you mean 'rte_wait_until_equal_32'? [-Werror=implicit-function-declaration] rte_wait_until_equal_relaxed_32(&ht->tail, old_val); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ BTW, after patch #1 compilation fails also: ../lib/librte_eal/x86/include/rte_spinlock.h:125:24: error: pointer targets in passing argument 1 of 'rte_try_tm' differ in signedness [-Werror=pointer-sign] if (likely(rte_try_tm(&sl->locked))) ^ > > ht->tail = new_val; > } > -- > 2.17.1
<snip>
Hi Gavin,
There are other sync modes added to rte_ring library. Can you please extend this to other rte_pause() calls in the library?
Thanks,
Honnappa
> Subject: [PATCH v1 2/2] ring: use wfe to wait for ring tail update on aarch64
>
> Instead of polling for tail to be updated, use wfe instruction.
>
> Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Steve Capper <steve.capper@arm.com>
> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
> lib/librte_ring/rte_ring_c11_mem.h | 4 ++--
> lib/librte_ring/rte_ring_generic.h | 3 +--
> 2 files changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/lib/librte_ring/rte_ring_c11_mem.h
> b/lib/librte_ring/rte_ring_c11_mem.h
> index 0fb73a337..764d8f186 100644
> --- a/lib/librte_ring/rte_ring_c11_mem.h
> +++ b/lib/librte_ring/rte_ring_c11_mem.h
> @@ -2,6 +2,7 @@
> *
> * Copyright (c) 2017,2018 HXT-semitech Corporation.
> * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> + * Copyright (c) 2019 Arm Limited
> * All rights reserved.
> * Derived from FreeBSD's bufring.h
> * Used as BSD-3 Licensed with permission from Kip Macy.
> @@ -21,8 +22,7 @@ update_tail(struct rte_ring_headtail *ht, uint32_t
> old_val, uint32_t new_val,
> * we need to wait for them to complete
> */
> if (!single)
> - while (unlikely(ht->tail != old_val))
> - rte_pause();
> + rte_wait_until_equal_relaxed_32(&ht->tail, old_val);
>
> __atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE); } diff --
> git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
> index 953cdbbd5..682852783 100644
> --- a/lib/librte_ring/rte_ring_generic.h
> +++ b/lib/librte_ring/rte_ring_generic.h
> @@ -23,8 +23,7 @@ update_tail(struct rte_ring_headtail *ht, uint32_t
> old_val, uint32_t new_val,
> * we need to wait for them to complete
> */
> if (!single)
> - while (unlikely(ht->tail != old_val))
> - rte_pause();
> + rte_wait_until_equal_relaxed_32(&ht->tail, old_val);
>
> ht->tail = new_val;
> }
> --
> 2.17.1
Hi Honnappa, > -----Original Message----- > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> > Sent: Friday, April 24, 2020 10:46 PM > To: Gavin Hu <Gavin.Hu@arm.com>; dev@dpdk.org > Cc: nd <nd@arm.com>; david.marchand@redhat.com; > thomas@monjalon.net; jerinj@marvell.com; Ruifeng Wang > <Ruifeng.Wang@arm.com>; Phil Yang <Phil.Yang@arm.com>; Joyce Kong > <Joyce.Kong@arm.com>; Honnappa Nagarahalli > <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com> > Subject: RE: [PATCH v1 2/2] ring: use wfe to wait for ring tail update on > aarch64 > > <snip> > > Hi Gavin, > There are other sync modes added to rte_ring library. Can you > please extend this to other rte_pause() calls in the library? I looked into the other two calls to rte_cause, they can't use the rte_wait_until_equal_XX APIs simply. Maybe new APIs are required? Here is the 32-bit API for your reference: rte_wait_until_equal_32(volatile uint32_t *addr, uint32_t expected, int memorder) > > Thanks, > Honnappa > > > Subject: [PATCH v1 2/2] ring: use wfe to wait for ring tail update on > aarch64 > > > > Instead of polling for tail to be updated, use wfe instruction. > > > > Signed-off-by: Gavin Hu <gavin.hu@arm.com> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> > > Reviewed-by: Steve Capper <steve.capper@arm.com> > > Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> > > --- > > lib/librte_ring/rte_ring_c11_mem.h | 4 ++-- > > lib/librte_ring/rte_ring_generic.h | 3 +-- > > 2 files changed, 3 insertions(+), 4 deletions(-) > > > > diff --git a/lib/librte_ring/rte_ring_c11_mem.h > > b/lib/librte_ring/rte_ring_c11_mem.h > > index 0fb73a337..764d8f186 100644 > > --- a/lib/librte_ring/rte_ring_c11_mem.h > > +++ b/lib/librte_ring/rte_ring_c11_mem.h > > @@ -2,6 +2,7 @@ > > * > > * Copyright (c) 2017,2018 HXT-semitech Corporation. > > * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org > > + * Copyright (c) 2019 Arm Limited > > * All rights reserved. > > * Derived from FreeBSD's bufring.h > > * Used as BSD-3 Licensed with permission from Kip Macy. > > @@ -21,8 +22,7 @@ update_tail(struct rte_ring_headtail *ht, uint32_t > > old_val, uint32_t new_val, > > * we need to wait for them to complete > > */ > > if (!single) > > - while (unlikely(ht->tail != old_val)) > > - rte_pause(); > > + rte_wait_until_equal_relaxed_32(&ht->tail, old_val); > > > > __atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE); } diff -- > > git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h > > index 953cdbbd5..682852783 100644 > > --- a/lib/librte_ring/rte_ring_generic.h > > +++ b/lib/librte_ring/rte_ring_generic.h > > @@ -23,8 +23,7 @@ update_tail(struct rte_ring_headtail *ht, uint32_t > > old_val, uint32_t new_val, > > * we need to wait for them to complete > > */ > > if (!single) > > - while (unlikely(ht->tail != old_val)) > > - rte_pause(); > > + rte_wait_until_equal_relaxed_32(&ht->tail, old_val); > > > > ht->tail = new_val; > > } > > -- > > 2.17.1
The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling for a memory location to become equal to a given value'[1]. Use the API for the rte spinlock and ring implementations. [1] http://patches.dpdk.org/cover/62703/ Gavin Hu (2): spinlock: use wfe to reduce contention on aarch64 ring: use wfe to wait for ring tail update on aarch64 lib/librte_eal/include/generic/rte_spinlock.h | 4 ++-- lib/librte_ring/rte_ring_c11_mem.h | 4 ++-- lib/librte_ring/rte_ring_generic.h | 3 +-- 3 files changed, 5 insertions(+), 6 deletions(-) -- 2.17.1
In acquiring a spinlock, cores repeatedly poll the lock variable. This is replaced by rte_wait_until_equal API. Running the micro benchmarking and the testpmd and l3fwd traffic tests on ThunderX2, Ampere eMAG80 and Arm N1SDP, everything went well and no notable performance gain nor degradation was measured. Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Tested-by: Pavan Nikhilesh <pbhagavatula@marvell.com> --- lib/librte_eal/include/generic/rte_spinlock.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/include/generic/rte_spinlock.h b/lib/librte_eal/include/generic/rte_spinlock.h index 87ae7a4f1..40fe49d5a 100644 --- a/lib/librte_eal/include/generic/rte_spinlock.h +++ b/lib/librte_eal/include/generic/rte_spinlock.h @@ -65,8 +65,8 @@ rte_spinlock_lock(rte_spinlock_t *sl) while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) { - while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED)) - rte_pause(); + rte_wait_until_equal_32((volatile uint32_t *)&sl->locked, + 0, __ATOMIC_RELAXED); exp = 0; } } -- 2.17.1
Instead of polling for tail to be updated, use wfe instruction. Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> --- lib/librte_ring/rte_ring_c11_mem.h | 4 ++-- lib/librte_ring/rte_ring_generic.h | 3 +-- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h index 0fb73a337..b37ef5995 100644 --- a/lib/librte_ring/rte_ring_c11_mem.h +++ b/lib/librte_ring/rte_ring_c11_mem.h @@ -2,6 +2,7 @@ * * Copyright (c) 2017,2018 HXT-semitech Corporation. * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org + * Copyright (c) 2019 Arm Limited * All rights reserved. * Derived from FreeBSD's bufring.h * Used as BSD-3 Licensed with permission from Kip Macy. @@ -21,8 +22,7 @@ update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val, * we need to wait for them to complete */ if (!single) - while (unlikely(ht->tail != old_val)) - rte_pause(); + rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED); __atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE); } diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h index 953cdbbd5..afce89aec 100644 --- a/lib/librte_ring/rte_ring_generic.h +++ b/lib/librte_ring/rte_ring_generic.h @@ -23,8 +23,7 @@ update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val, * we need to wait for them to complete */ if (!single) - while (unlikely(ht->tail != old_val)) - rte_pause(); + rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED); ht->tail = new_val; } -- 2.17.1
Hi Konstantin, > -----Original Message----- > From: Ananyev, Konstantin <konstantin.ananyev@intel.com> > Sent: Friday, April 24, 2020 9:24 PM > To: Gavin Hu <Gavin.Hu@arm.com>; dev@dpdk.org > Cc: nd <nd@arm.com>; david.marchand@redhat.com; > thomas@monjalon.net; jerinj@marvell.com; Honnappa Nagarahalli > <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang > <Ruifeng.Wang@arm.com>; Phil Yang <Phil.Yang@arm.com>; Joyce Kong > <Joyce.Kong@arm.com> > Subject: RE: [dpdk-dev] [PATCH v1 2/2] ring: use wfe to wait for ring tail > update on aarch64 > > > > > > Instead of polling for tail to be updated, use wfe instruction. > > > > Signed-off-by: Gavin Hu <gavin.hu@arm.com> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> > > Reviewed-by: Steve Capper <steve.capper@arm.com> > > Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> > > --- > > lib/librte_ring/rte_ring_c11_mem.h | 4 ++-- > > lib/librte_ring/rte_ring_generic.h | 3 +-- > > 2 files changed, 3 insertions(+), 4 deletions(-) > > > > diff --git a/lib/librte_ring/rte_ring_c11_mem.h > b/lib/librte_ring/rte_ring_c11_mem.h > > index 0fb73a337..764d8f186 100644 > > --- a/lib/librte_ring/rte_ring_c11_mem.h > > +++ b/lib/librte_ring/rte_ring_c11_mem.h > > @@ -2,6 +2,7 @@ > > * > > * Copyright (c) 2017,2018 HXT-semitech Corporation. > > * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org > > + * Copyright (c) 2019 Arm Limited > > * All rights reserved. > > * Derived from FreeBSD's bufring.h > > * Used as BSD-3 Licensed with permission from Kip Macy. > > @@ -21,8 +22,7 @@ update_tail(struct rte_ring_headtail *ht, uint32_t > old_val, uint32_t new_val, > > * we need to wait for them to complete > > */ > > if (!single) > > - while (unlikely(ht->tail != old_val)) > > - rte_pause(); > > + rte_wait_until_equal_relaxed_32(&ht->tail, old_val); > > > > __atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE); > > } > > diff --git a/lib/librte_ring/rte_ring_generic.h > b/lib/librte_ring/rte_ring_generic.h > > index 953cdbbd5..682852783 100644 > > --- a/lib/librte_ring/rte_ring_generic.h > > +++ b/lib/librte_ring/rte_ring_generic.h > > @@ -23,8 +23,7 @@ update_tail(struct rte_ring_headtail *ht, uint32_t > old_val, uint32_t new_val, > > * we need to wait for them to complete > > */ > > if (!single) > > - while (unlikely(ht->tail != old_val)) > > - rte_pause(); > > + rte_wait_until_equal_relaxed_32(&ht->tail, old_val); > > ./lib/librte_ring/rte_ring_generic.h:26:3: error: implicit declaration of > function 'rte_wait_until_equal_relaxed_32'; did you mean > 'rte_wait_until_equal_32'? [-Werror=implicit-function-declaration] > rte_wait_until_equal_relaxed_32(&ht->tail, old_val); > ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Sorry for the leakage, fixed in v2. > > BTW, after patch #1 compilation fails also: > ../lib/librte_eal/x86/include/rte_spinlock.h:125:24: error: pointer targets in > passing argument 1 of 'rte_try_tm' differ in signedness [-Werror=pointer- > sign] > if (likely(rte_try_tm(&sl->locked))) Fixed in v2, thanks! > ^ > > > > > > > ht->tail = new_val; > > } > > -- > > 2.17.1
On Sun, Apr 26, 2020 at 2:09 PM Gavin Hu <gavin.hu@arm.com> wrote: > > In acquiring a spinlock, cores repeatedly poll the lock variable. > This is replaced by rte_wait_until_equal API. > > Running the micro benchmarking and the testpmd and l3fwd traffic tests > on ThunderX2, Ampere eMAG80 and Arm N1SDP, everything went well and no > notable performance gain nor degradation was measured. > > Signed-off-by: Gavin Hu <gavin.hu@arm.com> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> > Reviewed-by: Phil Yang <phil.yang@arm.com> > Reviewed-by: Steve Capper <steve.capper@arm.com> > Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> > Tested-by: Pavan Nikhilesh <pbhagavatula@marvell.com> Acked-by: Jerin Jacob <jerinj@marvell.com> > --- > lib/librte_eal/include/generic/rte_spinlock.h | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/lib/librte_eal/include/generic/rte_spinlock.h b/lib/librte_eal/include/generic/rte_spinlock.h > index 87ae7a4f1..40fe49d5a 100644 > --- a/lib/librte_eal/include/generic/rte_spinlock.h > +++ b/lib/librte_eal/include/generic/rte_spinlock.h > @@ -65,8 +65,8 @@ rte_spinlock_lock(rte_spinlock_t *sl) > > while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0, > __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) { > - while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED)) > - rte_pause(); > + rte_wait_until_equal_32((volatile uint32_t *)&sl->locked, > + 0, __ATOMIC_RELAXED); > exp = 0; > } > } > -- > 2.17.1 >
> > The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling > for a memory location to become equal to a given value'[1]. > > Use the API for the rte spinlock and ring implementations. > > [1] http://patches.dpdk.org/cover/62703/ > > Gavin Hu (2): > spinlock: use wfe to reduce contention on aarch64 > ring: use wfe to wait for ring tail update on aarch64 > > lib/librte_eal/include/generic/rte_spinlock.h | 4 ++-- > lib/librte_ring/rte_ring_c11_mem.h | 4 ++-- > lib/librte_ring/rte_ring_generic.h | 3 +-- > 3 files changed, 5 insertions(+), 6 deletions(-) > Series Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> > -- > 2.17.1
On Sun, Apr 26, 2020 at 10:39 AM Gavin Hu <gavin.hu@arm.com> wrote:
>
> The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling
> for a memory location to become equal to a given value'[1].
>
> Use the API for the rte spinlock and ring implementations.
>
> [1] http://patches.dpdk.org/cover/62703/
>
> Gavin Hu (2):
> spinlock: use wfe to reduce contention on aarch64
> ring: use wfe to wait for ring tail update on aarch64
This would result in rte_ring and rte_spinlock APIs becoming
experimental and this breaks compilation for external applications
using stable ring and spinlock APIs.
IIRC, it was the reason why these patches were dropped with the
introduction of the rte_wait_until_equal_* API.
--
David Marchand
<snip> > Subject: Re: [PATCH v2 0/2] Use WFE for spinlock and ring > > On Sun, Apr 26, 2020 at 10:39 AM Gavin Hu <gavin.hu@arm.com> wrote: > > > > The rte_wait_until_equal_xxx APIs abstract the functionality of > > 'polling for a memory location to become equal to a given value'[1]. > > > > Use the API for the rte spinlock and ring implementations. > > > > [1] http://patches.dpdk.org/cover/62703/ > > > > Gavin Hu (2): > > spinlock: use wfe to reduce contention on aarch64 > > ring: use wfe to wait for ring tail update on aarch64 > > This would result in rte_ring and rte_spinlock APIs becoming experimental > and this breaks compilation for external applications using stable ring and > spinlock APIs. > IIRC, it was the reason why these patches were dropped with the introduction > of the rte_wait_until_equal_* API. Agreed, the rte_ring new sync modes are resulting in different use cases for these APIs. We need to take a relook at the APIs. > > > -- > David Marchand
Hello Honnappa,
On Sun, May 3, 2020 at 11:32 PM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:
>
> <snip>
>
> > Subject: Re: [PATCH v2 0/2] Use WFE for spinlock and ring
> >
> > On Sun, Apr 26, 2020 at 10:39 AM Gavin Hu <gavin.hu@arm.com> wrote:
> > >
> > > The rte_wait_until_equal_xxx APIs abstract the functionality of
> > > 'polling for a memory location to become equal to a given value'[1].
> > >
> > > Use the API for the rte spinlock and ring implementations.
> > >
> > > [1] http://patches.dpdk.org/cover/62703/
> > >
> > > Gavin Hu (2):
> > > spinlock: use wfe to reduce contention on aarch64
> > > ring: use wfe to wait for ring tail update on aarch64
> >
> > This would result in rte_ring and rte_spinlock APIs becoming experimental
> > and this breaks compilation for external applications using stable ring and
> > spinlock APIs.
> > IIRC, it was the reason why these patches were dropped with the introduction
> > of the rte_wait_until_equal_* API.
> Agreed, the rte_ring new sync modes are resulting in different use cases for these APIs. We need to take a relook at the APIs.
Will we reconsider this series now that the wait until equal is going stable?
--
David Marchand
Hello,
On Fri, Sep 25, 2020 at 4:30 PM David Marchand
<david.marchand@redhat.com> wrote:
>
> Hello Honnappa,
>
> On Sun, May 3, 2020 at 11:32 PM Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com> wrote:
> >
> > <snip>
> >
> > > Subject: Re: [PATCH v2 0/2] Use WFE for spinlock and ring
> > >
> > > On Sun, Apr 26, 2020 at 10:39 AM Gavin Hu <gavin.hu@arm.com> wrote:
> > > >
> > > > The rte_wait_until_equal_xxx APIs abstract the functionality of
> > > > 'polling for a memory location to become equal to a given value'[1].
> > > >
> > > > Use the API for the rte spinlock and ring implementations.
> > > >
> > > > [1] http://patches.dpdk.org/cover/62703/
> > > >
> > > > Gavin Hu (2):
> > > > spinlock: use wfe to reduce contention on aarch64
> > > > ring: use wfe to wait for ring tail update on aarch64
> > >
> > > This would result in rte_ring and rte_spinlock APIs becoming experimental
> > > and this breaks compilation for external applications using stable ring and
> > > spinlock APIs.
> > > IIRC, it was the reason why these patches were dropped with the introduction
> > > of the rte_wait_until_equal_* API.
> > Agreed, the rte_ring new sync modes are resulting in different use cases for these APIs. We need to take a relook at the APIs.
>
> Will we reconsider this series now that the wait until equal is going stable?
Any update?
I suppose this would need some rebasing after the ring library changes.
Thanks.
--
David Marchand
The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling for a memory location to become equal to a given value'[1]. Use the API for the rte spinlock and ring implementations. With the wait until equal APIs being stable, changes will not impact ABI. [1] http://patches.dpdk.org/cover/62703/ v3: Series rebased. (David) Gavin Hu (1): spinlock: use wfe to reduce contention on aarch64 Ruifeng Wang (1): ring: use wfe to wait for ring tail update on aarch64 lib/eal/include/generic/rte_spinlock.h | 4 ++-- lib/ring/rte_ring_c11_pvt.h | 4 ++-- lib/ring/rte_ring_generic_pvt.h | 3 +-- 3 files changed, 5 insertions(+), 6 deletions(-) -- 2.25.1
From: Gavin Hu <gavin.hu@arm.com> In acquiring a spinlock, cores repeatedly poll the lock variable. This is replaced by rte_wait_until_equal API. Running the micro benchmarking and the testpmd and l3fwd traffic tests on ThunderX2, Ampere eMAG80 and Arm N1SDP, everything went well and no notable performance gain nor degradation was measured. Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Tested-by: Pavan Nikhilesh <pbhagavatula@marvell.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Jerin Jacob <jerinj@marvell.com> --- lib/eal/include/generic/rte_spinlock.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/eal/include/generic/rte_spinlock.h b/lib/eal/include/generic/rte_spinlock.h index 87ae7a4f18..40fe49d5ad 100644 --- a/lib/eal/include/generic/rte_spinlock.h +++ b/lib/eal/include/generic/rte_spinlock.h @@ -65,8 +65,8 @@ rte_spinlock_lock(rte_spinlock_t *sl) while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) { - while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED)) - rte_pause(); + rte_wait_until_equal_32((volatile uint32_t *)&sl->locked, + 0, __ATOMIC_RELAXED); exp = 0; } } -- 2.25.1
Instead of polling for tail to be updated, use wfe instruction. Signed-off-by: Gavin Hu <gavin.hu@arm.com> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> --- lib/ring/rte_ring_c11_pvt.h | 4 ++-- lib/ring/rte_ring_generic_pvt.h | 3 +-- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h index 759192f4c4..37e0b2afd6 100644 --- a/lib/ring/rte_ring_c11_pvt.h +++ b/lib/ring/rte_ring_c11_pvt.h @@ -2,6 +2,7 @@ * * Copyright (c) 2017,2018 HXT-semitech Corporation. * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org + * Copyright (c) 2021 Arm Limited * All rights reserved. * Derived from FreeBSD's bufring.h * Used as BSD-3 Licensed with permission from Kip Macy. @@ -21,8 +22,7 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val, * we need to wait for them to complete */ if (!single) - while (unlikely(ht->tail != old_val)) - rte_pause(); + rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED); __atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE); } diff --git a/lib/ring/rte_ring_generic_pvt.h b/lib/ring/rte_ring_generic_pvt.h index 532deb5e7a..c95ad7e12c 100644 --- a/lib/ring/rte_ring_generic_pvt.h +++ b/lib/ring/rte_ring_generic_pvt.h @@ -23,8 +23,7 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val, * we need to wait for them to complete */ if (!single) - while (unlikely(ht->tail != old_val)) - rte_pause(); + rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED); ht->tail = new_val; } -- 2.25.1
> -----Original Message----- > From: David Marchand <david.marchand@redhat.com> > Sent: Thursday, March 25, 2021 10:58 PM > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> > Cc: dev <dev@dpdk.org>; nd <nd@arm.com>; thomas@monjalon.net; > jerinj@marvell.com; Ruifeng Wang <Ruifeng.Wang@arm.com>; Phil Yang > <Phil.Yang@arm.com>; Joyce Kong <Joyce.Kong@arm.com>; Ananyev, > Konstantin <konstantin.ananyev@intel.com> > Subject: Re: [dpdk-dev] [PATCH v2 0/2] Use WFE for spinlock and ring > > Hello, > > On Fri, Sep 25, 2020 at 4:30 PM David Marchand > <david.marchand@redhat.com> wrote: > > > > Hello Honnappa, > > > > On Sun, May 3, 2020 at 11:32 PM Honnappa Nagarahalli > > <Honnappa.Nagarahalli@arm.com> wrote: > > > > > > <snip> > > > > > > > Subject: Re: [PATCH v2 0/2] Use WFE for spinlock and ring > > > > > > > > On Sun, Apr 26, 2020 at 10:39 AM Gavin Hu <gavin.hu@arm.com> wrote: > > > > > > > > > > The rte_wait_until_equal_xxx APIs abstract the functionality of > > > > > 'polling for a memory location to become equal to a given value'[1]. > > > > > > > > > > Use the API for the rte spinlock and ring implementations. > > > > > > > > > > [1] http://patches.dpdk.org/cover/62703/ > > > > > > > > > > Gavin Hu (2): > > > > > spinlock: use wfe to reduce contention on aarch64 > > > > > ring: use wfe to wait for ring tail update on aarch64 > > > > > > > > This would result in rte_ring and rte_spinlock APIs becoming > > > > experimental and this breaks compilation for external applications > > > > using stable ring and spinlock APIs. > > > > IIRC, it was the reason why these patches were dropped with the > > > > introduction of the rte_wait_until_equal_* API. > > > Agreed, the rte_ring new sync modes are resulting in different use cases > for these APIs. We need to take a relook at the APIs. > > > > Will we reconsider this series now that the wait until equal is going stable? > > Any update? > I suppose this would need some rebasing after the ring library changes. Sorry for the late response. This series has been rebased and v3 posted. I have problem to mark v2 as superseded. Thank you. > > Thanks. > > > -- > David Marchand
Ruifeng Wang:
> From: David Marchand <david.marchand@redhat.com>
> > Any update?
> > I suppose this would need some rebasing after the ring library changes.
>
> Sorry for the late response.
> This series has been rebased and v3 posted.
> I have problem to mark v2 as superseded.
I did change v2 as superseded.
On Sun, Apr 25, 2021 at 11:27 AM Ruifeng Wang <ruifeng.wang@arm.com> wrote: > > Instead of polling for tail to be updated, use wfe instruction. > > Signed-off-by: Gavin Hu <gavin.hu@arm.com> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> > Reviewed-by: Steve Capper <steve.capper@arm.com> > Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Jerin Jacob <jerinj@marvell.com> > --- > lib/ring/rte_ring_c11_pvt.h | 4 ++-- > lib/ring/rte_ring_generic_pvt.h | 3 +-- > 2 files changed, 3 insertions(+), 4 deletions(-) > > diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h > index 759192f4c4..37e0b2afd6 100644 > --- a/lib/ring/rte_ring_c11_pvt.h > +++ b/lib/ring/rte_ring_c11_pvt.h > @@ -2,6 +2,7 @@ > * > * Copyright (c) 2017,2018 HXT-semitech Corporation. > * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org > + * Copyright (c) 2021 Arm Limited > * All rights reserved. > * Derived from FreeBSD's bufring.h > * Used as BSD-3 Licensed with permission from Kip Macy. > @@ -21,8 +22,7 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val, > * we need to wait for them to complete > */ > if (!single) > - while (unlikely(ht->tail != old_val)) > - rte_pause(); > + rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED); > > __atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE); > } > diff --git a/lib/ring/rte_ring_generic_pvt.h b/lib/ring/rte_ring_generic_pvt.h > index 532deb5e7a..c95ad7e12c 100644 > --- a/lib/ring/rte_ring_generic_pvt.h > +++ b/lib/ring/rte_ring_generic_pvt.h > @@ -23,8 +23,7 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val, > * we need to wait for them to complete > */ > if (!single) > - while (unlikely(ht->tail != old_val)) > - rte_pause(); > + rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED); > > ht->tail = new_val; > } > -- > 2.25.1 >
Hello Ruifeng,
On Sun, Apr 25, 2021 at 7:57 AM Ruifeng Wang <ruifeng.wang@arm.com> wrote:
>
> The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling
> for a memory location to become equal to a given value'[1].
>
> Use the API for the rte spinlock and ring implementations.
> With the wait until equal APIs being stable, changes will not impact ABI.
Afaics, there is no ARM target with WFE enabled and we lost ability to
enable WFE support with removal of the make build system.
$ git grep RTE_ARM_USE_WFE
config/arm/meson.build: ['RTE_ARM_USE_WFE', false],
lib/eal/arm/include/rte_pause_64.h:#ifdef RTE_ARM_USE_WFE
How did you enable WFE to test this series?
--
David Marchand
> -----Original Message----- > From: David Marchand <david.marchand@redhat.com> > Sent: Wednesday, April 28, 2021 3:42 PM > To: Ruifeng Wang <Ruifeng.Wang@arm.com> > Cc: dev <dev@dpdk.org>; thomas@monjalon.net; jerinj@marvell.com; nd > <nd@arm.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> > Subject: Re: Use WFE for spinlock and ring > > Hello Ruifeng, Hello David, > > On Sun, Apr 25, 2021 at 7:57 AM Ruifeng Wang <ruifeng.wang@arm.com> > wrote: > > > > The rte_wait_until_equal_xxx APIs abstract the functionality of > > 'polling for a memory location to become equal to a given value'[1]. > > > > Use the API for the rte spinlock and ring implementations. > > With the wait until equal APIs being stable, changes will not impact ABI. > > Afaics, there is no ARM target with WFE enabled and we lost ability to enable > WFE support with removal of the make build system. WFE can be enabled with direct meson file change. WFE is not intended to be enabled by default. It can be enabled based on benchmarking result on hardware. > > $ git grep RTE_ARM_USE_WFE > config/arm/meson.build: ['RTE_ARM_USE_WFE', false], > lib/eal/arm/include/rte_pause_64.h:#ifdef RTE_ARM_USE_WFE > > How did you enable WFE to test this series? I modified meson file to test. Tests were also done with WFE disabled to make sure no degradation with generic implementation. > > > -- > David Marchand
28/04/2021 11:30, Ruifeng Wang:
> From: David Marchand <david.marchand@redhat.com>
> > On Sun, Apr 25, 2021 at 7:57 AM Ruifeng Wang <ruifeng.wang@arm.com>
> > wrote:
> > >
> > > The rte_wait_until_equal_xxx APIs abstract the functionality of
> > > 'polling for a memory location to become equal to a given value'[1].
> > >
> > > Use the API for the rte spinlock and ring implementations.
> > > With the wait until equal APIs being stable, changes will not impact ABI.
> >
> > Afaics, there is no ARM target with WFE enabled and we lost ability to enable
> > WFE support with removal of the make build system.
>
> WFE can be enabled with direct meson file change.
> WFE is not intended to be enabled by default. It can be enabled based on benchmarking
> result on hardware.
> >
> > $ git grep RTE_ARM_USE_WFE
> > config/arm/meson.build: ['RTE_ARM_USE_WFE', false],
> > lib/eal/arm/include/rte_pause_64.h:#ifdef RTE_ARM_USE_WFE
> >
> > How did you enable WFE to test this series?
>
> I modified meson file to test.
> Tests were also done with WFE disabled to make sure no degradation with generic implementation.
I don't understand the usage.
Which platform should use it?
Should it be a compile-time option?
On Sun, 25 Apr 2021 05:56:53 +0000
Ruifeng Wang <ruifeng.wang@arm.com> wrote:
> Instead of polling for tail to be updated, use wfe instruction.
>
> Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Steve Capper <steve.capper@arm.com>
> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Looks ok to me, but it does raise an interesting question.
Shouldn't the original code have been using atomic load to look at ht->tail.
This another place where "volatile considered harmful" applies.
> -----Original Message----- > From: Thomas Monjalon <thomas@monjalon.net> > Sent: Wednesday, April 28, 2021 7:14 PM > To: Ruifeng Wang <Ruifeng.Wang@arm.com> > Cc: David Marchand <david.marchand@redhat.com>; dev <dev@dpdk.org>; > jerinj@marvell.com; nd <nd@arm.com>; Honnappa Nagarahalli > <Honnappa.Nagarahalli@arm.com> > Subject: Re: Use WFE for spinlock and ring > > 28/04/2021 11:30, Ruifeng Wang: > > From: David Marchand <david.marchand@redhat.com> > > > On Sun, Apr 25, 2021 at 7:57 AM Ruifeng Wang <ruifeng.wang@arm.com> > > > wrote: > > > > > > > > The rte_wait_until_equal_xxx APIs abstract the functionality of > > > > 'polling for a memory location to become equal to a given value'[1]. > > > > > > > > Use the API for the rte spinlock and ring implementations. > > > > With the wait until equal APIs being stable, changes will not impact ABI. > > > > > > Afaics, there is no ARM target with WFE enabled and we lost ability > > > to enable WFE support with removal of the make build system. > > > > WFE can be enabled with direct meson file change. > > WFE is not intended to be enabled by default. It can be enabled based > > on benchmarking result on hardware. > > > > > > $ git grep RTE_ARM_USE_WFE > > > config/arm/meson.build: ['RTE_ARM_USE_WFE', false], > > > lib/eal/arm/include/rte_pause_64.h:#ifdef RTE_ARM_USE_WFE > > > > > > How did you enable WFE to test this series? > > > > I modified meson file to test. > > Tests were also done with WFE disabled to make sure no degradation with > generic implementation. > > I don't understand the usage. > Which platform should use it? Platforms that implement WFE semantic (e.g. N1) can use. The user can enable this feature for power efficiency purpose. But there is something to note as described in commit message 1be7855d77 when the API was introduced. > Should it be a compile-time option? Yes, it should be a compile-time option. It can be configured via c_args meson option? >
> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Thursday, April 29, 2021 1:17 AM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>
> Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Konstantin
> Ananyev <konstantin.ananyev@intel.com>; dev@dpdk.org;
> david.marchand@redhat.com; thomas@monjalon.net; jerinj@marvell.com;
> nd <nd@arm.com>; Gavin Hu <Gavin.Hu@arm.com>; Steve Capper
> <Steve.Capper@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v3 2/2] ring: use wfe to wait for ring tail
> update on aarch64
>
> On Sun, 25 Apr 2021 05:56:53 +0000
> Ruifeng Wang <ruifeng.wang@arm.com> wrote:
>
> > Instead of polling for tail to be updated, use wfe instruction.
> >
> > Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Steve Capper <steve.capper@arm.com>
> > Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
>
> Looks ok to me, but it does raise an interesting question.
> Shouldn't the original code have been using atomic load to look at ht->tail.
>
> This another place where "volatile considered harmful" applies.
Do you mean 'volatile' should be removed from rte_wait_until_equal_xxx parameters?
On Thu, 29 Apr 2021 14:35:35 +0000
Ruifeng Wang <Ruifeng.Wang@arm.com> wrote:
> > -----Original Message-----
> > From: Stephen Hemminger <stephen@networkplumber.org>
> > Sent: Thursday, April 29, 2021 1:17 AM
> > To: Ruifeng Wang <Ruifeng.Wang@arm.com>
> > Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Konstantin
> > Ananyev <konstantin.ananyev@intel.com>; dev@dpdk.org;
> > david.marchand@redhat.com; thomas@monjalon.net; jerinj@marvell.com;
> > nd <nd@arm.com>; Gavin Hu <Gavin.Hu@arm.com>; Steve Capper
> > <Steve.Capper@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>
> > Subject: Re: [dpdk-dev] [PATCH v3 2/2] ring: use wfe to wait for ring tail
> > update on aarch64
> >
> > On Sun, 25 Apr 2021 05:56:53 +0000
> > Ruifeng Wang <ruifeng.wang@arm.com> wrote:
> >
> > > Instead of polling for tail to be updated, use wfe instruction.
> > >
> > > Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > Reviewed-by: Steve Capper <steve.capper@arm.com>
> > > Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
> > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> >
> > Looks ok to me, but it does raise an interesting question.
> > Shouldn't the original code have been using atomic load to look at ht->tail.
> >
> > This another place where "volatile considered harmful" applies.
>
> Do you mean 'volatile' should be removed from rte_wait_until_equal_xxx parameters?
>
I meant that all access to tail should be via C11 atomic builtin. At that point,
the volatile on the data structure elements does not matter.
29/04/2021 16:28, Ruifeng Wang:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 28/04/2021 11:30, Ruifeng Wang:
> > > From: David Marchand <david.marchand@redhat.com>
> > > > On Sun, Apr 25, 2021 at 7:57 AM Ruifeng Wang <ruifeng.wang@arm.com>
> > > > wrote:
> > > > >
> > > > > The rte_wait_until_equal_xxx APIs abstract the functionality of
> > > > > 'polling for a memory location to become equal to a given value'[1].
> > > > >
> > > > > Use the API for the rte spinlock and ring implementations.
> > > > > With the wait until equal APIs being stable, changes will not impact ABI.
> > > >
> > > > Afaics, there is no ARM target with WFE enabled and we lost ability
> > > > to enable WFE support with removal of the make build system.
> > >
> > > WFE can be enabled with direct meson file change.
> > > WFE is not intended to be enabled by default. It can be enabled based
> > > on benchmarking result on hardware.
> > > >
> > > > $ git grep RTE_ARM_USE_WFE
> > > > config/arm/meson.build: ['RTE_ARM_USE_WFE', false],
> > > > lib/eal/arm/include/rte_pause_64.h:#ifdef RTE_ARM_USE_WFE
> > > >
> > > > How did you enable WFE to test this series?
> > >
> > > I modified meson file to test.
> > > Tests were also done with WFE disabled to make sure no degradation with
> > generic implementation.
> >
> > I don't understand the usage.
> > Which platform should use it?
>
> Platforms that implement WFE semantic (e.g. N1) can use.
> The user can enable this feature for power efficiency purpose. But there is something to
> note as described in commit message 1be7855d77 when the API was introduced.
>
> > Should it be a compile-time option?
>
> Yes, it should be a compile-time option.
> It can be configured via c_args meson option?
+Cc Bruce for discussing how to enable such feature.
The problem with c_args is that the application has no way to know.
On Thu, Apr 29, 2021 at 05:20:05PM +0200, Thomas Monjalon wrote:
> 29/04/2021 16:28, Ruifeng Wang:
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > 28/04/2021 11:30, Ruifeng Wang:
> > > > From: David Marchand <david.marchand@redhat.com>
> > > > > On Sun, Apr 25, 2021 at 7:57 AM Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > wrote:
> > > > > >
> > > > > > The rte_wait_until_equal_xxx APIs abstract the functionality of
> > > > > > 'polling for a memory location to become equal to a given value'[1].
> > > > > >
> > > > > > Use the API for the rte spinlock and ring implementations.
> > > > > > With the wait until equal APIs being stable, changes will not impact ABI.
> > > > >
> > > > > Afaics, there is no ARM target with WFE enabled and we lost ability
> > > > > to enable WFE support with removal of the make build system.
> > > >
> > > > WFE can be enabled with direct meson file change.
> > > > WFE is not intended to be enabled by default. It can be enabled based
> > > > on benchmarking result on hardware.
> > > > >
> > > > > $ git grep RTE_ARM_USE_WFE
> > > > > config/arm/meson.build: ['RTE_ARM_USE_WFE', false],
> > > > > lib/eal/arm/include/rte_pause_64.h:#ifdef RTE_ARM_USE_WFE
> > > > >
> > > > > How did you enable WFE to test this series?
> > > >
> > > > I modified meson file to test.
> > > > Tests were also done with WFE disabled to make sure no degradation with
> > > generic implementation.
> > >
> > > I don't understand the usage.
> > > Which platform should use it?
> >
> > Platforms that implement WFE semantic (e.g. N1) can use.
> > The user can enable this feature for power efficiency purpose. But there is something to
> > note as described in commit message 1be7855d77 when the API was introduced.
> >
> > > Should it be a compile-time option?
> >
> > Yes, it should be a compile-time option.
> > It can be configured via c_args meson option?
>
> +Cc Bruce for discussing how to enable such feature.
>
> The problem with c_args is that the application has no way to know.
>
Agree about c_args not being a great choice. Why does this need to be a
compile-time option? Can runtime support not be detected in some manner?
<snip>
> > > > > > >
> > > > > > > The rte_wait_until_equal_xxx APIs abstract the functionality
> > > > > > > of 'polling for a memory location to become equal to a given
> value'[1].
> > > > > > >
> > > > > > > Use the API for the rte spinlock and ring implementations.
> > > > > > > With the wait until equal APIs being stable, changes will not impact
> ABI.
> > > > > >
> > > > > > Afaics, there is no ARM target with WFE enabled and we lost
> > > > > > ability to enable WFE support with removal of the make build
> system.
> > > > >
> > > > > WFE can be enabled with direct meson file change.
> > > > > WFE is not intended to be enabled by default. It can be enabled
> > > > > based on benchmarking result on hardware.
> > > > > >
> > > > > > $ git grep RTE_ARM_USE_WFE
> > > > > > config/arm/meson.build: ['RTE_ARM_USE_WFE', false],
> > > > > > lib/eal/arm/include/rte_pause_64.h:#ifdef RTE_ARM_USE_WFE
> > > > > >
> > > > > > How did you enable WFE to test this series?
> > > > >
> > > > > I modified meson file to test.
> > > > > Tests were also done with WFE disabled to make sure no
> > > > > degradation with
> > > > generic implementation.
> > > >
> > > > I don't understand the usage.
> > > > Which platform should use it?
> > >
> > > Platforms that implement WFE semantic (e.g. N1) can use.
> > > The user can enable this feature for power efficiency purpose. But
> > > there is something to note as described in commit message 1be7855d77
> when the API was introduced.
> > >
> > > > Should it be a compile-time option?
> > >
> > > Yes, it should be a compile-time option.
> > > It can be configured via c_args meson option?
> >
> > +Cc Bruce for discussing how to enable such feature.
> >
> > The problem with c_args is that the application has no way to know.
> >
> Agree about c_args not being a great choice. Why does this need to be a
> compile-time option? Can runtime support not be detected in some
> manner?
The problem is inconsistency in performance on different Arm platforms. We had decided that each platform needs to enable it after some testing.
On Fri, Apr 30, 2021 at 01:41:22PM +0000, Honnappa Nagarahalli wrote:
> <snip>
>
> > > > > > > >
> > > > > > > > The rte_wait_until_equal_xxx APIs abstract the functionality
> > > > > > > > of 'polling for a memory location to become equal to a given
> > value'[1].
> > > > > > > >
> > > > > > > > Use the API for the rte spinlock and ring implementations.
> > > > > > > > With the wait until equal APIs being stable, changes will not impact
> > ABI.
> > > > > > >
> > > > > > > Afaics, there is no ARM target with WFE enabled and we lost
> > > > > > > ability to enable WFE support with removal of the make build
> > system.
> > > > > >
> > > > > > WFE can be enabled with direct meson file change.
> > > > > > WFE is not intended to be enabled by default. It can be enabled
> > > > > > based on benchmarking result on hardware.
> > > > > > >
> > > > > > > $ git grep RTE_ARM_USE_WFE
> > > > > > > config/arm/meson.build: ['RTE_ARM_USE_WFE', false],
> > > > > > > lib/eal/arm/include/rte_pause_64.h:#ifdef RTE_ARM_USE_WFE
> > > > > > >
> > > > > > > How did you enable WFE to test this series?
> > > > > >
> > > > > > I modified meson file to test.
> > > > > > Tests were also done with WFE disabled to make sure no
> > > > > > degradation with
> > > > > generic implementation.
> > > > >
> > > > > I don't understand the usage.
> > > > > Which platform should use it?
> > > >
> > > > Platforms that implement WFE semantic (e.g. N1) can use.
> > > > The user can enable this feature for power efficiency purpose. But
> > > > there is something to note as described in commit message 1be7855d77
> > when the API was introduced.
> > > >
> > > > > Should it be a compile-time option?
> > > >
> > > > Yes, it should be a compile-time option.
> > > > It can be configured via c_args meson option?
> > >
> > > +Cc Bruce for discussing how to enable such feature.
> > >
> > > The problem with c_args is that the application has no way to know.
> > >
> > Agree about c_args not being a great choice. Why does this need to be a
> > compile-time option? Can runtime support not be detected in some
> > manner?
> The problem is inconsistency in performance on different Arm platforms. We had decided that each platform needs to enable it after some testing.
Then it sounds like it does indeed need to be a build option. Does it need
to be added to the meson_options.txt, or can it just be specified in
cross-files and optionally via c_args?
> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Thursday, April 29, 2021 11:06 PM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>
> Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Konstantin
> Ananyev <konstantin.ananyev@intel.com>; dev@dpdk.org;
> david.marchand@redhat.com; thomas@monjalon.net; jerinj@marvell.com;
> nd <nd@arm.com>; Gavin Hu <Gavin.Hu@arm.com>; Steve Capper
> <Steve.Capper@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v3 2/2] ring: use wfe to wait for ring tail
> update on aarch64
>
> On Thu, 29 Apr 2021 14:35:35 +0000
> Ruifeng Wang <Ruifeng.Wang@arm.com> wrote:
>
> > > -----Original Message-----
> > > From: Stephen Hemminger <stephen@networkplumber.org>
> > > Sent: Thursday, April 29, 2021 1:17 AM
> > > To: Ruifeng Wang <Ruifeng.Wang@arm.com>
> > > Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Konstantin
> > > Ananyev <konstantin.ananyev@intel.com>; dev@dpdk.org;
> > > david.marchand@redhat.com; thomas@monjalon.net;
> jerinj@marvell.com;
> > > nd <nd@arm.com>; Gavin Hu <Gavin.Hu@arm.com>; Steve Capper
> > > <Steve.Capper@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>
> > > Subject: Re: [dpdk-dev] [PATCH v3 2/2] ring: use wfe to wait for
> > > ring tail update on aarch64
> > >
> > > On Sun, 25 Apr 2021 05:56:53 +0000
> > > Ruifeng Wang <ruifeng.wang@arm.com> wrote:
> > >
> > > > Instead of polling for tail to be updated, use wfe instruction.
> > > >
> > > > Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > Reviewed-by: Steve Capper <steve.capper@arm.com>
> > > > Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
> > > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > >
> > > Looks ok to me, but it does raise an interesting question.
> > > Shouldn't the original code have been using atomic load to look at ht->tail.
> > >
> > > This another place where "volatile considered harmful" applies.
> >
> > Do you mean 'volatile' should be removed from rte_wait_until_equal_xxx
> parameters?
> >
>
> I meant that all access to tail should be via C11 atomic builtin. At that point,
> the volatile on the data structure elements does not matter.
Agreed. If synchronization is ensured by using C11 atomic builtin, 'volatile' on elements can be removed.
> -----Original Message-----
> From: Bruce Richardson <bruce.richardson@intel.com>
> Sent: Friday, April 30, 2021 10:19 PM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Cc: thomas@monjalon.net; Ruifeng Wang <Ruifeng.Wang@arm.com>; David
> Marchand <david.marchand@redhat.com>; dev <dev@dpdk.org>;
> jerinj@marvell.com; nd <nd@arm.com>
> Subject: Re: Use WFE for spinlock and ring
>
> On Fri, Apr 30, 2021 at 01:41:22PM +0000, Honnappa Nagarahalli wrote:
> > <snip>
> >
> > > > > > > > >
> > > > > > > > > The rte_wait_until_equal_xxx APIs abstract the
> > > > > > > > > functionality of 'polling for a memory location to
> > > > > > > > > become equal to a given
> > > value'[1].
> > > > > > > > >
> > > > > > > > > Use the API for the rte spinlock and ring implementations.
> > > > > > > > > With the wait until equal APIs being stable, changes
> > > > > > > > > will not impact
> > > ABI.
> > > > > > > >
> > > > > > > > Afaics, there is no ARM target with WFE enabled and we
> > > > > > > > lost ability to enable WFE support with removal of the
> > > > > > > > make build
> > > system.
> > > > > > >
> > > > > > > WFE can be enabled with direct meson file change.
> > > > > > > WFE is not intended to be enabled by default. It can be
> > > > > > > enabled based on benchmarking result on hardware.
> > > > > > > >
> > > > > > > > $ git grep RTE_ARM_USE_WFE
> > > > > > > > config/arm/meson.build: ['RTE_ARM_USE_WFE', false],
> > > > > > > > lib/eal/arm/include/rte_pause_64.h:#ifdef RTE_ARM_USE_WFE
> > > > > > > >
> > > > > > > > How did you enable WFE to test this series?
> > > > > > >
> > > > > > > I modified meson file to test.
> > > > > > > Tests were also done with WFE disabled to make sure no
> > > > > > > degradation with
> > > > > > generic implementation.
> > > > > >
> > > > > > I don't understand the usage.
> > > > > > Which platform should use it?
> > > > >
> > > > > Platforms that implement WFE semantic (e.g. N1) can use.
> > > > > The user can enable this feature for power efficiency purpose.
> > > > > But there is something to note as described in commit message
> > > > > 1be7855d77
> > > when the API was introduced.
> > > > >
> > > > > > Should it be a compile-time option?
> > > > >
> > > > > Yes, it should be a compile-time option.
> > > > > It can be configured via c_args meson option?
> > > >
> > > > +Cc Bruce for discussing how to enable such feature.
> > > >
> > > > The problem with c_args is that the application has no way to know.
> > > >
> > > Agree about c_args not being a great choice. Why does this need to
> > > be a compile-time option? Can runtime support not be detected in
> > > some manner?
> > The problem is inconsistency in performance on different Arm platforms.
> We had decided that each platform needs to enable it after some testing.
>
> Then it sounds like it does indeed need to be a build option. Does it need to
> be added to the meson_options.txt, or can it just be specified in cross-files
> and optionally via c_args?
Add it to the meson_options.txt is good as the option will be clearly exposed.
My concern is more options (e.g. RTE_ARCH_ARM64_MEMCPY) need to be added in.
Will the options bloat meson_options.txt?
On Fri, May 07, 2021 at 10:18:52AM +0000, Ruifeng Wang wrote:
> > -----Original Message-----
> > From: Bruce Richardson <bruce.richardson@intel.com>
> > Sent: Friday, April 30, 2021 10:19 PM
> > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > Cc: thomas@monjalon.net; Ruifeng Wang <Ruifeng.Wang@arm.com>; David
> > Marchand <david.marchand@redhat.com>; dev <dev@dpdk.org>;
> > jerinj@marvell.com; nd <nd@arm.com>
> > Subject: Re: Use WFE for spinlock and ring
> >
> > On Fri, Apr 30, 2021 at 01:41:22PM +0000, Honnappa Nagarahalli wrote:
> > > <snip>
> > >
> > > > > > > > > >
> > > > > > > > > > The rte_wait_until_equal_xxx APIs abstract the
> > > > > > > > > > functionality of 'polling for a memory location to
> > > > > > > > > > become equal to a given
> > > > value'[1].
> > > > > > > > > >
> > > > > > > > > > Use the API for the rte spinlock and ring implementations.
> > > > > > > > > > With the wait until equal APIs being stable, changes
> > > > > > > > > > will not impact
> > > > ABI.
> > > > > > > > >
> > > > > > > > > Afaics, there is no ARM target with WFE enabled and we
> > > > > > > > > lost ability to enable WFE support with removal of the
> > > > > > > > > make build
> > > > system.
> > > > > > > >
> > > > > > > > WFE can be enabled with direct meson file change.
> > > > > > > > WFE is not intended to be enabled by default. It can be
> > > > > > > > enabled based on benchmarking result on hardware.
> > > > > > > > >
> > > > > > > > > $ git grep RTE_ARM_USE_WFE
> > > > > > > > > config/arm/meson.build: ['RTE_ARM_USE_WFE', false],
> > > > > > > > > lib/eal/arm/include/rte_pause_64.h:#ifdef RTE_ARM_USE_WFE
> > > > > > > > >
> > > > > > > > > How did you enable WFE to test this series?
> > > > > > > >
> > > > > > > > I modified meson file to test.
> > > > > > > > Tests were also done with WFE disabled to make sure no
> > > > > > > > degradation with
> > > > > > > generic implementation.
> > > > > > >
> > > > > > > I don't understand the usage.
> > > > > > > Which platform should use it?
> > > > > >
> > > > > > Platforms that implement WFE semantic (e.g. N1) can use.
> > > > > > The user can enable this feature for power efficiency purpose.
> > > > > > But there is something to note as described in commit message
> > > > > > 1be7855d77
> > > > when the API was introduced.
> > > > > >
> > > > > > > Should it be a compile-time option?
> > > > > >
> > > > > > Yes, it should be a compile-time option.
> > > > > > It can be configured via c_args meson option?
> > > > >
> > > > > +Cc Bruce for discussing how to enable such feature.
> > > > >
> > > > > The problem with c_args is that the application has no way to know.
> > > > >
> > > > Agree about c_args not being a great choice. Why does this need to
> > > > be a compile-time option? Can runtime support not be detected in
> > > > some manner?
> > > The problem is inconsistency in performance on different Arm platforms.
> > We had decided that each platform needs to enable it after some testing.
> >
> > Then it sounds like it does indeed need to be a build option. Does it need to
> > be added to the meson_options.txt, or can it just be specified in cross-files
> > and optionally via c_args?
>
> Add it to the meson_options.txt is good as the option will be clearly exposed.
> My concern is more options (e.g. RTE_ARCH_ARM64_MEMCPY) need to be added in.
> Will the options bloat meson_options.txt?
That bloat would indeed become a concern. We may need to look at more use
of auto-detection and cross files for such options.
On Fri, May 7, 2021 at 12:24 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
> > > > The problem is inconsistency in performance on different Arm platforms.
> > > We had decided that each platform needs to enable it after some testing.
> > >
> > > Then it sounds like it does indeed need to be a build option. Does it need to
> > > be added to the meson_options.txt, or can it just be specified in cross-files
> > > and optionally via c_args?
> >
> > Add it to the meson_options.txt is good as the option will be clearly exposed.
> > My concern is more options (e.g. RTE_ARCH_ARM64_MEMCPY) need to be added in.
> > Will the options bloat meson_options.txt?
>
> That bloat would indeed become a concern. We may need to look at more use
> of auto-detection and cross files for such options.
Ruifeng,
This series is blocked on this last topic.
Do you intend to send a n+1?
--
David Marchand
> -----Original Message----- > From: David Marchand <david.marchand@redhat.com> > Sent: Monday, July 5, 2021 4:52 PM > To: Ruifeng Wang <Ruifeng.Wang@arm.com> > Cc: Bruce Richardson <bruce.richardson@intel.com>; Honnappa Nagarahalli > <Honnappa.Nagarahalli@arm.com>; thomas@monjalon.net; dev > <dev@dpdk.org>; jerinj@marvell.com; nd <nd@arm.com> > Subject: Re: Use WFE for spinlock and ring > > On Fri, May 7, 2021 at 12:24 PM Bruce Richardson > <bruce.richardson@intel.com> wrote: > > > > > The problem is inconsistency in performance on different Arm > platforms. > > > > We had decided that each platform needs to enable it after some > testing. > > > > > > > > Then it sounds like it does indeed need to be a build option. Does > > > > it need to be added to the meson_options.txt, or can it just be > > > > specified in cross-files and optionally via c_args? > > > > > > Add it to the meson_options.txt is good as the option will be clearly > exposed. > > > My concern is more options (e.g. RTE_ARCH_ARM64_MEMCPY) need to > be added in. > > > Will the options bloat meson_options.txt? > > > > That bloat would indeed become a concern. We may need to look at more > > use of auto-detection and cross files for such options. > > Ruifeng, > > This series is blocked on this last topic. > Do you intend to send a n+1? I think the best way to expose this config is adding it to meson_options. Will send v4 to update. > > > -- > David Marchand
The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling for a memory location to become equal to a given value'[1]. Use the API for the rte spinlock and ring implementations. With the wait until equal APIs being stable, changes will not impact ABI. [1] http://patches.dpdk.org/cover/62703/ v4: Added meson option to expose WFE. (David, Bruce) v3: Series rebased. (David) Gavin Hu (1): spinlock: use wfe to reduce contention on aarch64 Ruifeng Wang (2): ring: use wfe to wait for ring tail update on aarch64 build: add option to enable wait until equal config/arm/meson.build | 2 +- lib/eal/include/generic/rte_spinlock.h | 4 ++-- lib/ring/rte_ring_c11_pvt.h | 4 ++-- lib/ring/rte_ring_generic_pvt.h | 3 +-- meson_options.txt | 2 ++ 5 files changed, 8 insertions(+), 7 deletions(-) -- 2.25.1
From: Gavin Hu <gavin.hu@arm.com> In acquiring a spinlock, cores repeatedly poll the lock variable. This is replaced by rte_wait_until_equal API. Running the micro benchmarking and the testpmd and l3fwd traffic tests on ThunderX2, Ampere eMAG80 and Arm N1SDP, everything went well and no notable performance gain nor degradation was measured. Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Tested-by: Pavan Nikhilesh <pbhagavatula@marvell.com> --- lib/eal/include/generic/rte_spinlock.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/eal/include/generic/rte_spinlock.h b/lib/eal/include/generic/rte_spinlock.h index 87ae7a4f18..40fe49d5ad 100644 --- a/lib/eal/include/generic/rte_spinlock.h +++ b/lib/eal/include/generic/rte_spinlock.h @@ -65,8 +65,8 @@ rte_spinlock_lock(rte_spinlock_t *sl) while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) { - while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED)) - rte_pause(); + rte_wait_until_equal_32((volatile uint32_t *)&sl->locked, + 0, __ATOMIC_RELAXED); exp = 0; } } -- 2.25.1
Instead of polling for tail to be updated, use wfe instruction. Signed-off-by: Gavin Hu <gavin.hu@arm.com> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> --- lib/ring/rte_ring_c11_pvt.h | 4 ++-- lib/ring/rte_ring_generic_pvt.h | 3 +-- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h index 759192f4c4..37e0b2afd6 100644 --- a/lib/ring/rte_ring_c11_pvt.h +++ b/lib/ring/rte_ring_c11_pvt.h @@ -2,6 +2,7 @@ * * Copyright (c) 2017,2018 HXT-semitech Corporation. * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org + * Copyright (c) 2021 Arm Limited * All rights reserved. * Derived from FreeBSD's bufring.h * Used as BSD-3 Licensed with permission from Kip Macy. @@ -21,8 +22,7 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val, * we need to wait for them to complete */ if (!single) - while (unlikely(ht->tail != old_val)) - rte_pause(); + rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED); __atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE); } diff --git a/lib/ring/rte_ring_generic_pvt.h b/lib/ring/rte_ring_generic_pvt.h index 532deb5e7a..c95ad7e12c 100644 --- a/lib/ring/rte_ring_generic_pvt.h +++ b/lib/ring/rte_ring_generic_pvt.h @@ -23,8 +23,7 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val, * we need to wait for them to complete */ if (!single) - while (unlikely(ht->tail != old_val)) - rte_pause(); + rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED); ht->tail = new_val; } -- 2.25.1
Introduce a meson option 'use_wfe' to select wait until equal method. The default is disable. Traditional polling loop is used. When enabled, architecture specific mechanism is relied on to do the wait. Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> --- config/arm/meson.build | 2 +- meson_options.txt | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/config/arm/meson.build b/config/arm/meson.build index 9b147c0b93..9ea478fb77 100644 --- a/config/arm/meson.build +++ b/config/arm/meson.build @@ -18,7 +18,6 @@ flags_common = [ # ['RTE_ARM64_MEMCPY_STRICT_ALIGN', false], ['RTE_SCHED_VECTOR', false], - ['RTE_ARM_USE_WFE', false], ['RTE_ARCH_ARM64', true], ['RTE_CACHE_LINE_SIZE', 128] ] @@ -371,6 +370,7 @@ socs = { dpdk_conf.set('RTE_ARCH_ARM', 1) dpdk_conf.set('RTE_FORCE_INTRINSICS', 1) +dpdk_conf.set('RTE_ARM_USE_WFE', get_option('use_wfe')) if dpdk_conf.get('RTE_ARCH_32') # armv7 build diff --git a/meson_options.txt b/meson_options.txt index 56bdfd0f0a..5012803c77 100644 --- a/meson_options.txt +++ b/meson_options.txt @@ -44,3 +44,5 @@ option('tests', type: 'boolean', value: true, description: 'build unit tests') option('use_hpet', type: 'boolean', value: false, description: 'use HPET timer in EAL') +option('use_wfe', type: 'boolean', value: false, description: + 'use Wait Until Equal for polling loops') -- 2.25.1
The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling for a memory location to become equal to a given value'[1]. Use the API for the rte spinlock and ring implementations. With the wait until equal APIs being stable, changes will not impact ABI. [1] http://patches.dpdk.org/cover/62703/ v4: Added meson option to expose WFE. (David, Bruce) v3: Series rebased. (David) Gavin Hu (1): spinlock: use wfe to reduce contention on aarch64 Ruifeng Wang (2): ring: use wfe to wait for ring tail update on aarch64 build: add option to enable wait until equal config/arm/meson.build | 2 +- lib/eal/include/generic/rte_spinlock.h | 4 ++-- lib/ring/rte_ring_c11_pvt.h | 4 ++-- lib/ring/rte_ring_generic_pvt.h | 3 +-- meson_options.txt | 2 ++ 5 files changed, 8 insertions(+), 7 deletions(-) -- 2.25.1
From: Gavin Hu <gavin.hu@arm.com> In acquiring a spinlock, cores repeatedly poll the lock variable. This is replaced by rte_wait_until_equal API. Running the micro benchmarking and the testpmd and l3fwd traffic tests on ThunderX2, Ampere eMAG80 and Arm N1SDP, everything went well and no notable performance gain nor degradation was measured. Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Tested-by: Pavan Nikhilesh <pbhagavatula@marvell.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Jerin Jacob <jerinj@marvell.com> --- lib/eal/include/generic/rte_spinlock.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/eal/include/generic/rte_spinlock.h b/lib/eal/include/generic/rte_spinlock.h index 87ae7a4f18..40fe49d5ad 100644 --- a/lib/eal/include/generic/rte_spinlock.h +++ b/lib/eal/include/generic/rte_spinlock.h @@ -65,8 +65,8 @@ rte_spinlock_lock(rte_spinlock_t *sl) while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) { - while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED)) - rte_pause(); + rte_wait_until_equal_32((volatile uint32_t *)&sl->locked, + 0, __ATOMIC_RELAXED); exp = 0; } } -- 2.25.1
Instead of polling for tail to be updated, use wfe instruction. Signed-off-by: Gavin Hu <gavin.hu@arm.com> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Jerin Jacob <jerinj@marvell.com> --- lib/ring/rte_ring_c11_pvt.h | 4 ++-- lib/ring/rte_ring_generic_pvt.h | 3 +-- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h index 759192f4c4..37e0b2afd6 100644 --- a/lib/ring/rte_ring_c11_pvt.h +++ b/lib/ring/rte_ring_c11_pvt.h @@ -2,6 +2,7 @@ * * Copyright (c) 2017,2018 HXT-semitech Corporation. * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org + * Copyright (c) 2021 Arm Limited * All rights reserved. * Derived from FreeBSD's bufring.h * Used as BSD-3 Licensed with permission from Kip Macy. @@ -21,8 +22,7 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val, * we need to wait for them to complete */ if (!single) - while (unlikely(ht->tail != old_val)) - rte_pause(); + rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED); __atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE); } diff --git a/lib/ring/rte_ring_generic_pvt.h b/lib/ring/rte_ring_generic_pvt.h index 532deb5e7a..c95ad7e12c 100644 --- a/lib/ring/rte_ring_generic_pvt.h +++ b/lib/ring/rte_ring_generic_pvt.h @@ -23,8 +23,7 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val, * we need to wait for them to complete */ if (!single) - while (unlikely(ht->tail != old_val)) - rte_pause(); + rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED); ht->tail = new_val; } -- 2.25.1
Introduce a meson option 'use_wfe' to select wait until equal method. The default is disable. Traditional polling loop is used. When enabled, architecture specific mechanism is relied on to do the wait. Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> --- config/arm/meson.build | 2 +- meson_options.txt | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/config/arm/meson.build b/config/arm/meson.build index 9b147c0b93..9ea478fb77 100644 --- a/config/arm/meson.build +++ b/config/arm/meson.build @@ -18,7 +18,6 @@ flags_common = [ # ['RTE_ARM64_MEMCPY_STRICT_ALIGN', false], ['RTE_SCHED_VECTOR', false], - ['RTE_ARM_USE_WFE', false], ['RTE_ARCH_ARM64', true], ['RTE_CACHE_LINE_SIZE', 128] ] @@ -371,6 +370,7 @@ socs = { dpdk_conf.set('RTE_ARCH_ARM', 1) dpdk_conf.set('RTE_FORCE_INTRINSICS', 1) +dpdk_conf.set('RTE_ARM_USE_WFE', get_option('use_wfe')) if dpdk_conf.get('RTE_ARCH_32') # armv7 build diff --git a/meson_options.txt b/meson_options.txt index 56bdfd0f0a..5012803c77 100644 --- a/meson_options.txt +++ b/meson_options.txt @@ -44,3 +44,5 @@ option('tests', type: 'boolean', value: true, description: 'build unit tests') option('use_hpet', type: 'boolean', value: false, description: 'use HPET timer in EAL') +option('use_wfe', type: 'boolean', value: false, description: + 'use Wait Until Equal for polling loops') -- 2.25.1
07/07/2021 07:48, Ruifeng Wang:
> Introduce a meson option 'use_wfe' to select wait until equal method.
> The default is disable. Traditional polling loop is used.
> When enabled, architecture specific mechanism is relied on to do the
> wait.
Why do we need an option?
Can it be automatic to enable it when supported?
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Wednesday, July 7, 2021 2:32 PM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>
> Cc: Jan Viktorin <viktorin@rehivetech.com>; jerinj@marvell.com; Bruce
> Richardson <bruce.richardson@intel.com>; dev@dpdk.org;
> david.marchand@redhat.com; nd <nd@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>
> Subject: Re: [PATCH v4 3/3] build: add option to enable wait until equal
>
> 07/07/2021 07:48, Ruifeng Wang:
> > Introduce a meson option 'use_wfe' to select wait until equal method.
> > The default is disable. Traditional polling loop is used.
> > When enabled, architecture specific mechanism is relied on to do the
> > wait.
>
> Why do we need an option?
> Can it be automatic to enable it when supported?
>
The problem is inconsistency in performance on different Arm platforms. We had decided that each platform needs to enable it after some testing.
On Wed, Jul 7, 2021 at 7:44 AM Ruifeng Wang <ruifeng.wang@arm.com> wrote:
> diff --git a/meson_options.txt b/meson_options.txt
> index 56bdfd0f0a..5012803c77 100644
> --- a/meson_options.txt
> +++ b/meson_options.txt
> @@ -44,3 +44,5 @@ option('tests', type: 'boolean', value: true, description:
> 'build unit tests')
> option('use_hpet', type: 'boolean', value: false, description:
> 'use HPET timer in EAL')
> +option('use_wfe', type: 'boolean', value: false, description:
> + 'use Wait Until Equal for polling loops')
This is badly described.
We have a wait until equal API.
What you want to enable with this option is the hw assisted implementation.
--
David Marchand
On Wed, Jul 07, 2021 at 06:46:33AM +0000, Ruifeng Wang wrote:
> > -----Original Message-----
> > From: Thomas Monjalon <thomas@monjalon.net>
> > Sent: Wednesday, July 7, 2021 2:32 PM
> > To: Ruifeng Wang <Ruifeng.Wang@arm.com>
> > Cc: Jan Viktorin <viktorin@rehivetech.com>; jerinj@marvell.com; Bruce
> > Richardson <bruce.richardson@intel.com>; dev@dpdk.org;
> > david.marchand@redhat.com; nd <nd@arm.com>; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli@arm.com>
> > Subject: Re: [PATCH v4 3/3] build: add option to enable wait until equal
> >
> > 07/07/2021 07:48, Ruifeng Wang:
> > > Introduce a meson option 'use_wfe' to select wait until equal method.
> > > The default is disable. Traditional polling loop is used.
> > > When enabled, architecture specific mechanism is relied on to do the
> > > wait.
> >
> > Why do we need an option?
> > Can it be automatic to enable it when supported?
> >
> The problem is inconsistency in performance on different Arm platforms. We had decided that each platform needs to enable it after some testing.
>
Can that not be done via variables in the cross-file for the builds, or via
automatic detection if it's a native build? Is it likely that individual
users of DPDK will be knowledgable enough to use this option correctly?
On Wed, Jul 7, 2021 at 5:57 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Wed, Jul 07, 2021 at 06:46:33AM +0000, Ruifeng Wang wrote:
> > > -----Original Message-----
> > > From: Thomas Monjalon <thomas@monjalon.net>
> > > Sent: Wednesday, July 7, 2021 2:32 PM
> > > To: Ruifeng Wang <Ruifeng.Wang@arm.com>
> > > Cc: Jan Viktorin <viktorin@rehivetech.com>; jerinj@marvell.com; Bruce
> > > Richardson <bruce.richardson@intel.com>; dev@dpdk.org;
> > > david.marchand@redhat.com; nd <nd@arm.com>; Honnappa Nagarahalli
> > > <Honnappa.Nagarahalli@arm.com>
> > > Subject: Re: [PATCH v4 3/3] build: add option to enable wait until equal
> > >
> > > 07/07/2021 07:48, Ruifeng Wang:
> > > > Introduce a meson option 'use_wfe' to select wait until equal method.
> > > > The default is disable. Traditional polling loop is used.
> > > > When enabled, architecture specific mechanism is relied on to do the
> > > > wait.
> > >
> > > Why do we need an option?
> > > Can it be automatic to enable it when supported?
> > >
> > The problem is inconsistency in performance on different Arm platforms. We had decided that each platform needs to enable it after some testing.
> >
> Can that not be done via variables in the cross-file for the builds, or via
> automatic detection if it's a native build? Is it likely that individual
> users of DPDK will be knowledgable enough to use this option correctly?
+1 to add this in cross-file instead of the top of config option as
scope if is only for arm64 builds.
On Sun, 25 Apr 2021 05:56:51 +0000
Ruifeng Wang <ruifeng.wang@arm.com> wrote:
> The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling
> for a memory location to become equal to a given value'[1].
>
> Use the API for the rte spinlock and ring implementations.
> With the wait until equal APIs being stable, changes will not impact ABI.
>
> [1] http://patches.dpdk.org/cover/62703/
>
> v3:
> Series rebased. (David)
>
> Gavin Hu (1):
> spinlock: use wfe to reduce contention on aarch64
>
> Ruifeng Wang (1):
> ring: use wfe to wait for ring tail update on aarch64
>
> lib/eal/include/generic/rte_spinlock.h | 4 ++--
> lib/ring/rte_ring_c11_pvt.h | 4 ++--
> lib/ring/rte_ring_generic_pvt.h | 3 +--
> 3 files changed, 5 insertions(+), 6 deletions(-)
>
Other places that should use WFE:
rte_mcslock.h:rte_mcslock_lock()
rte_mcslock_unlock:rte_mcslock_unlock()
rte_pflock.h:rte_pflock_lock()
rte_rwlock.h:rte_rwlock_read_lock()
rte_rwlock.h:rte_rwlock_write_lock()
You should also introduce rte_wait_while_XXX variants to handle some
of these cases.
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Wednesday, July 7, 2021 8:37 PM
> To: Bruce Richardson <bruce.richardson@intel.com>
> Cc: Ruifeng Wang <Ruifeng.Wang@arm.com>; thomas@monjalon.net; Jan
> Viktorin <viktorin@rehivetech.com>; jerinj@marvell.com; dev@dpdk.org;
> david.marchand@redhat.com; nd <nd@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v4 3/3] build: add option to enable wait until
> equal
>
> On Wed, Jul 7, 2021 at 5:57 PM Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> >
> > On Wed, Jul 07, 2021 at 06:46:33AM +0000, Ruifeng Wang wrote:
> > > > -----Original Message-----
> > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > Sent: Wednesday, July 7, 2021 2:32 PM
> > > > To: Ruifeng Wang <Ruifeng.Wang@arm.com>
> > > > Cc: Jan Viktorin <viktorin@rehivetech.com>; jerinj@marvell.com;
> > > > Bruce Richardson <bruce.richardson@intel.com>; dev@dpdk.org;
> > > > david.marchand@redhat.com; nd <nd@arm.com>; Honnappa
> Nagarahalli
> > > > <Honnappa.Nagarahalli@arm.com>
> > > > Subject: Re: [PATCH v4 3/3] build: add option to enable wait until
> > > > equal
> > > >
> > > > 07/07/2021 07:48, Ruifeng Wang:
> > > > > Introduce a meson option 'use_wfe' to select wait until equal method.
> > > > > The default is disable. Traditional polling loop is used.
> > > > > When enabled, architecture specific mechanism is relied on to do
> > > > > the wait.
> > > >
> > > > Why do we need an option?
> > > > Can it be automatic to enable it when supported?
> > > >
> > > The problem is inconsistency in performance on different Arm platforms.
> We had decided that each platform needs to enable it after some testing.
> > >
> > Can that not be done via variables in the cross-file for the builds,
> > or via automatic detection if it's a native build? Is it likely that
> > individual users of DPDK will be knowledgable enough to use this option
> correctly?
>
> +1 to add this in cross-file instead of the top of config option as
> scope if is only for arm64 builds.
Currently this option is in config/arm/meson.build (flags_common). SoCs can build
with this option enabled/disabled. And the ability is available for both native build
and cross build as cross build also goes through meson.build.
If a SoC needs to enable the option by default, an entry and be added to the SoC flags.
The key difference here is whether this option need to be exposed to the top level config.
On Thu, Jul 8, 2021 at 11:55 AM Ruifeng Wang <Ruifeng.Wang@arm.com> wrote:
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Wednesday, July 7, 2021 8:37 PM
> > To: Bruce Richardson <bruce.richardson@intel.com>
> > Cc: Ruifeng Wang <Ruifeng.Wang@arm.com>; thomas@monjalon.net; Jan
> > Viktorin <viktorin@rehivetech.com>; jerinj@marvell.com; dev@dpdk.org;
> > david.marchand@redhat.com; nd <nd@arm.com>; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli@arm.com>
> > Subject: Re: [dpdk-dev] [PATCH v4 3/3] build: add option to enable wait until
> > equal
> >
> > On Wed, Jul 7, 2021 at 5:57 PM Bruce Richardson
> > <bruce.richardson@intel.com> wrote:
> > >
> > > On Wed, Jul 07, 2021 at 06:46:33AM +0000, Ruifeng Wang wrote:
> > > > > -----Original Message-----
> > > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > Sent: Wednesday, July 7, 2021 2:32 PM
> > > > > To: Ruifeng Wang <Ruifeng.Wang@arm.com>
> > > > > Cc: Jan Viktorin <viktorin@rehivetech.com>; jerinj@marvell.com;
> > > > > Bruce Richardson <bruce.richardson@intel.com>; dev@dpdk.org;
> > > > > david.marchand@redhat.com; nd <nd@arm.com>; Honnappa
> > Nagarahalli
> > > > > <Honnappa.Nagarahalli@arm.com>
> > > > > Subject: Re: [PATCH v4 3/3] build: add option to enable wait until
> > > > > equal
> > > > >
> > > > > 07/07/2021 07:48, Ruifeng Wang:
> > > > > > Introduce a meson option 'use_wfe' to select wait until equal method.
> > > > > > The default is disable. Traditional polling loop is used.
> > > > > > When enabled, architecture specific mechanism is relied on to do
> > > > > > the wait.
> > > > >
> > > > > Why do we need an option?
> > > > > Can it be automatic to enable it when supported?
> > > > >
> > > > The problem is inconsistency in performance on different Arm platforms.
> > We had decided that each platform needs to enable it after some testing.
> > > >
> > > Can that not be done via variables in the cross-file for the builds,
> > > or via automatic detection if it's a native build? Is it likely that
> > > individual users of DPDK will be knowledgable enough to use this option
> > correctly?
> >
> > +1 to add this in cross-file instead of the top of config option as
> > scope if is only for arm64 builds.
>
> Currently this option is in config/arm/meson.build (flags_common). SoCs can build
> with this option enabled/disabled. And the ability is available for both native build
> and cross build as cross build also goes through meson.build.
> If a SoC needs to enable the option by default, an entry and be added to the SoC flags.
>
> The key difference here is whether this option need to be exposed to the top level config.
In the view of limiting top-level config options and it is specific to
Arm, I think, it better to
be a cross file only option.
08/07/2021 08:32, Jerin Jacob:
> On Thu, Jul 8, 2021 at 11:55 AM Ruifeng Wang <Ruifeng.Wang@arm.com> wrote:
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > On Wed, Jul 7, 2021 at 5:57 PM Bruce Richardson
> > > <bruce.richardson@intel.com> wrote:
> > > > On Wed, Jul 07, 2021 at 06:46:33AM +0000, Ruifeng Wang wrote:
> > > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > > 07/07/2021 07:48, Ruifeng Wang:
> > > > > > > Introduce a meson option 'use_wfe' to select wait until equal method.
> > > > > > > The default is disable. Traditional polling loop is used.
> > > > > > > When enabled, architecture specific mechanism is relied on to do
> > > > > > > the wait.
> > > > > >
> > > > > > Why do we need an option?
> > > > > > Can it be automatic to enable it when supported?
> > > > > >
> > > > > The problem is inconsistency in performance on different Arm platforms.
> > > We had decided that each platform needs to enable it after some testing.
> > > > >
> > > > Can that not be done via variables in the cross-file for the builds,
> > > > or via automatic detection if it's a native build? Is it likely that
> > > > individual users of DPDK will be knowledgable enough to use this option
> > > correctly?
> > >
> > > +1 to add this in cross-file instead of the top of config option as
> > > scope if is only for arm64 builds.
> >
> > Currently this option is in config/arm/meson.build (flags_common). SoCs can build
> > with this option enabled/disabled. And the ability is available for both native build
> > and cross build as cross build also goes through meson.build.
> > If a SoC needs to enable the option by default, an entry and be added to the SoC flags.
> >
> > The key difference here is whether this option need to be exposed to the top level config.
>
> In the view of limiting top-level config options and it is specific to
> Arm, I think, it better to
> be a cross file only option.
+1, sorry for the late notice.
I would advocate to take this patch in 21.08-rc2.
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, July 8, 2021 3:33 PM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; Jerin Jacob
> <jerinjacobk@gmail.com>
> Cc: Bruce Richardson <bruce.richardson@intel.com>; Jan Viktorin
> <viktorin@rehivetech.com>; jerinj@marvell.com; dev@dpdk.org;
> david.marchand@redhat.com; nd <nd@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v4 3/3] build: add option to enable wait until
> equal
>
> 08/07/2021 08:32, Jerin Jacob:
> > On Thu, Jul 8, 2021 at 11:55 AM Ruifeng Wang <Ruifeng.Wang@arm.com>
> wrote:
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > On Wed, Jul 7, 2021 at 5:57 PM Bruce Richardson
> > > > <bruce.richardson@intel.com> wrote:
> > > > > On Wed, Jul 07, 2021 at 06:46:33AM +0000, Ruifeng Wang wrote:
> > > > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > > > 07/07/2021 07:48, Ruifeng Wang:
> > > > > > > > Introduce a meson option 'use_wfe' to select wait until equal
> method.
> > > > > > > > The default is disable. Traditional polling loop is used.
> > > > > > > > When enabled, architecture specific mechanism is relied on
> > > > > > > > to do the wait.
> > > > > > >
> > > > > > > Why do we need an option?
> > > > > > > Can it be automatic to enable it when supported?
> > > > > > >
> > > > > > The problem is inconsistency in performance on different Arm
> platforms.
> > > > We had decided that each platform needs to enable it after some
> testing.
> > > > > >
> > > > > Can that not be done via variables in the cross-file for the
> > > > > builds, or via automatic detection if it's a native build? Is it
> > > > > likely that individual users of DPDK will be knowledgable enough
> > > > > to use this option
> > > > correctly?
> > > >
> > > > +1 to add this in cross-file instead of the top of config option
> > > > +as
> > > > scope if is only for arm64 builds.
> > >
> > > Currently this option is in config/arm/meson.build (flags_common).
> > > SoCs can build with this option enabled/disabled. And the ability is
> > > available for both native build and cross build as cross build also goes
> through meson.build.
> > > If a SoC needs to enable the option by default, an entry and be added to
> the SoC flags.
> > >
> > > The key difference here is whether this option need to be exposed to
> the top level config.
> >
> > In the view of limiting top-level config options and it is specific to
> > Arm, I think, it better to be a cross file only option.
>
> +1, sorry for the late notice.
> I would advocate to take this patch in 21.08-rc2.
>
If the decision is not to expose this option in top-level config, we can just drop 3/3 patch.
Do I need to send a new version?
> -----Original Message----- > From: Stephen Hemminger <stephen@networkplumber.org> > Sent: Wednesday, July 7, 2021 10:48 PM > To: Ruifeng Wang <Ruifeng.Wang@arm.com> > Cc: dev@dpdk.org; david.marchand@redhat.com; thomas@monjalon.net; > jerinj@marvell.com; nd <nd@arm.com>; Honnappa Nagarahalli > <Honnappa.Nagarahalli@arm.com> > Subject: Re: [dpdk-dev] Use WFE for spinlock and ring > > On Sun, 25 Apr 2021 05:56:51 +0000 > Ruifeng Wang <ruifeng.wang@arm.com> wrote: > > > The rte_wait_until_equal_xxx APIs abstract the functionality of > > 'polling for a memory location to become equal to a given value'[1]. > > > > Use the API for the rte spinlock and ring implementations. > > With the wait until equal APIs being stable, changes will not impact ABI. > > > > [1] http://patches.dpdk.org/cover/62703/ > > > > v3: > > Series rebased. (David) > > > > Gavin Hu (1): > > spinlock: use wfe to reduce contention on aarch64 > > > > Ruifeng Wang (1): > > ring: use wfe to wait for ring tail update on aarch64 > > > > lib/eal/include/generic/rte_spinlock.h | 4 ++-- > > lib/ring/rte_ring_c11_pvt.h | 4 ++-- > > lib/ring/rte_ring_generic_pvt.h | 3 +-- > > 3 files changed, 5 insertions(+), 6 deletions(-) > > > > Other places that should use WFE: Thank you Stephen for looking into this. > > rte_mcslock.h:rte_mcslock_lock() Existing API can be used in this one. > rte_mcslock_unlock:rte_mcslock_unlock() This one needs rte_wait_while_xxx variant. > > rte_pflock.h:rte_pflock_lock() > rte_rwlock.h:rte_rwlock_read_lock() > rte_rwlock.h:rte_rwlock_write_lock() These occurrences have extra logic (AND, conditional branch, CAS) in the loop. I'm not sure generic API can be abstracted from these use cases. > > > You should also introduce rte_wait_while_XXX variants to handle some of > these cases. >
08/07/2021 11:21, Ruifeng Wang:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 08/07/2021 08:32, Jerin Jacob:
> > > On Thu, Jul 8, 2021 at 11:55 AM Ruifeng Wang <Ruifeng.Wang@arm.com>
> > wrote:
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > On Wed, Jul 7, 2021 at 5:57 PM Bruce Richardson
> > > > > <bruce.richardson@intel.com> wrote:
> > > > > > On Wed, Jul 07, 2021 at 06:46:33AM +0000, Ruifeng Wang wrote:
> > > > > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > > > > 07/07/2021 07:48, Ruifeng Wang:
> > > > > > > > > Introduce a meson option 'use_wfe' to select wait until equal
> > method.
> > > > > > > > > The default is disable. Traditional polling loop is used.
> > > > > > > > > When enabled, architecture specific mechanism is relied on
> > > > > > > > > to do the wait.
> > > > > > > >
> > > > > > > > Why do we need an option?
> > > > > > > > Can it be automatic to enable it when supported?
> > > > > > > >
> > > > > > > The problem is inconsistency in performance on different Arm
> > platforms.
> > > > > We had decided that each platform needs to enable it after some
> > testing.
> > > > > > >
> > > > > > Can that not be done via variables in the cross-file for the
> > > > > > builds, or via automatic detection if it's a native build? Is it
> > > > > > likely that individual users of DPDK will be knowledgable enough
> > > > > > to use this option
> > > > > correctly?
> > > > >
> > > > > +1 to add this in cross-file instead of the top of config option
> > > > > +as
> > > > > scope if is only for arm64 builds.
> > > >
> > > > Currently this option is in config/arm/meson.build (flags_common).
> > > > SoCs can build with this option enabled/disabled. And the ability is
> > > > available for both native build and cross build as cross build also goes
> > through meson.build.
> > > > If a SoC needs to enable the option by default, an entry and be added to
> > the SoC flags.
> > > >
> > > > The key difference here is whether this option need to be exposed to
> > the top level config.
> > >
> > > In the view of limiting top-level config options and it is specific to
> > > Arm, I think, it better to be a cross file only option.
> >
> > +1, sorry for the late notice.
> > I would advocate to take this patch in 21.08-rc2.
> >
> If the decision is not to expose this option in top-level config, we can just drop 3/3 patch.
> Do I need to send a new version?
No it looks not needed to send a new version, I can take patches 1 & 2.
<snip> > > > > On Sun, 25 Apr 2021 05:56:51 +0000 > > Ruifeng Wang <ruifeng.wang@arm.com> wrote: > > > > > The rte_wait_until_equal_xxx APIs abstract the functionality of > > > 'polling for a memory location to become equal to a given value'[1]. > > > > > > Use the API for the rte spinlock and ring implementations. > > > With the wait until equal APIs being stable, changes will not impact ABI. > > > > > > [1] http://patches.dpdk.org/cover/62703/ > > > > > > v3: > > > Series rebased. (David) > > > > > > Gavin Hu (1): > > > spinlock: use wfe to reduce contention on aarch64 > > > > > > Ruifeng Wang (1): > > > ring: use wfe to wait for ring tail update on aarch64 > > > > > > lib/eal/include/generic/rte_spinlock.h | 4 ++-- > > > lib/ring/rte_ring_c11_pvt.h | 4 ++-- > > > lib/ring/rte_ring_generic_pvt.h | 3 +-- > > > 3 files changed, 5 insertions(+), 6 deletions(-) > > > > > > > Other places that should use WFE: > Thank you Stephen for looking into this. > > > > > rte_mcslock.h:rte_mcslock_lock() > Existing API can be used in this one. > > > rte_mcslock_unlock:rte_mcslock_unlock() > This one needs rte_wait_while_xxx variant. > > > > > rte_pflock.h:rte_pflock_lock() > > rte_rwlock.h:rte_rwlock_read_lock() > > rte_rwlock.h:rte_rwlock_write_lock() > These occurrences have extra logic (AND, conditional branch, CAS) in the loop. > I'm not sure generic API can be abstracted from these use cases. I think it is possible to create additional abstractions to address these cases. > > > > > > > You should also introduce rte_wait_while_XXX variants to handle some > > of these cases. > > > >
07/07/2021 07:48, Ruifeng Wang:
> The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling
> for a memory location to become equal to a given value'[1].
>
> Use the API for the rte spinlock and ring implementations.
> With the wait until equal APIs being stable, changes will not impact ABI.
>
> Gavin Hu (1):
> spinlock: use wfe to reduce contention on aarch64
>
> Ruifeng Wang (2):
> ring: use wfe to wait for ring tail update on aarch64
> build: add option to enable wait until equal
As discussed in the thread, patches 1 & 2 are applied.
The patch 3 (meson option) is rejected.