DPDK patches and discussions
From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	"olivier.matz@6wind.com" <olivier.matz@6wind.com>,
	"sthemmin@microsoft.com" <sthemmin@microsoft.com>,
	"jerinj@marvell.com" <jerinj@marvell.com>,
	"Richardson, Bruce" <bruce.richardson@intel.com>,
	"david.marchand@redhat.com" <david.marchand@redhat.com>,
	"pbhagavatula@marvell.com" <pbhagavatula@marvell.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	Dharmik Thakkar <Dharmik.Thakkar@arm.com>,
	 "Ruifeng Wang (Arm Technology China)" <Ruifeng.Wang@arm.com>,
	"Gavin Hu (Arm Technology China)" <Gavin.Hu@arm.com>,
	"stephen@networkplumber.org" <stephen@networkplumber.org>,
	Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	nd <nd@arm.com>, nd <nd@arm.com>
Subject: Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
Date: Thu, 17 Oct 2019 04:46:27 +0000
Message-ID: <VE1PR08MB5149D51FA4EDB55D6DEFA129986D0@VE1PR08MB5149.eurprd08.prod.outlook.com>
In-Reply-To: <2601191342CEEE43887BDE71AB97725801A8C68A99@IRSMSX104.ger.corp.intel.com>

<snip>

> Hi Honnappa,
> 
> > > > >
> > > > > Current APIs assume ring elements to be pointers. However, in
> > > > > many use cases, the size can be different. Add new APIs to
> > > > > support configurable ring element sizes.
> > > > >
> > > > > Signed-off-by: Honnappa Nagarahalli
> > > > > <honnappa.nagarahalli@arm.com>
> > > > > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > ---
> > > > >  lib/librte_ring/Makefile             |   3 +-
> > > > >  lib/librte_ring/meson.build          |   3 +
> > > > >  lib/librte_ring/rte_ring.c           |  45 +-
> > > > >  lib/librte_ring/rte_ring.h           |   1 +
> > > > > >  lib/librte_ring/rte_ring_elem.h      | 946 +++++++++++++++++++++++++++
> > > > > >  lib/librte_ring/rte_ring_version.map |   2 +
> > > > > >  6 files changed, 991 insertions(+), 9 deletions(-)
> > > > > >  create mode 100644 lib/librte_ring/rte_ring_elem.h
> > > > >
> > > > > diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
> > > > > index 21a36770d..515a967bb 100644
> > > > > --- a/lib/librte_ring/Makefile
> > > > > +++ b/lib/librte_ring/Makefile

<snip>

> > > > > +
> > > > > > +# rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
> > > > > > +allow_experimental_apis = true
> > > > > > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > > > > > index d9b308036..6fed3648b 100644
> > > > > --- a/lib/librte_ring/rte_ring.c
> > > > > +++ b/lib/librte_ring/rte_ring.c
> > > > > @@ -33,6 +33,7 @@
> > > > >  #include <rte_tailq.h>
> > > > >
> > > > >  #include "rte_ring.h"
> > > > > +#include "rte_ring_elem.h"
> > > > >

<snip>

> > > > > > diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> > > > > > new file mode 100644
> > > > > > index 000000000..860f059ad
> > > > > --- /dev/null
> > > > > +++ b/lib/librte_ring/rte_ring_elem.h
> > > > > @@ -0,0 +1,946 @@
> > > > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > > > + *
> > > > > + * Copyright (c) 2019 Arm Limited
> > > > > + * Copyright (c) 2010-2017 Intel Corporation
> > > > > + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> > > > > + * All rights reserved.
> > > > > + * Derived from FreeBSD's bufring.h
> > > > > + * Used as BSD-3 Licensed with permission from Kip Macy.
> > > > > + */
> > > > > +
> > > > > +#ifndef _RTE_RING_ELEM_H_
> > > > > +#define _RTE_RING_ELEM_H_
> > > > > +

<snip>

> > > > > +
> > > > > +/* the actual enqueue of pointers on the ring.
> > > > > + * Placed here since identical code needed in both
> > > > > + * single and multi producer enqueue functions.
> > > > > + */
> > > > > > +#define ENQUEUE_PTRS_ELEM(r, ring_start, prod_head, obj_table, esize, n) do { \
> > > > > > +	if (esize == 4) \
> > > > > > +		ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n); \
> > > > > > +	else if (esize == 8) \
> > > > > > +		ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n); \
> > > > > > +	else if (esize == 16) \
> > > > > > +		ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n); \
> > > > > > +} while (0)
> > > > > +
> > > > > > +#define ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n) do { \
> > > > > +	unsigned int i; \
> > > > > +	const uint32_t size = (r)->size; \
> > > > > +	uint32_t idx = prod_head & (r)->mask; \
> > > > > +	uint32_t *ring = (uint32_t *)ring_start; \
> > > > > +	uint32_t *obj = (uint32_t *)obj_table; \
> > > > > +	if (likely(idx + n < size)) { \
> > > > > +		for (i = 0; i < (n & ((~(unsigned)0x7))); i += 8, idx += 8)
> { \
> > > > > +			ring[idx] = obj[i]; \
> > > > > +			ring[idx + 1] = obj[i + 1]; \
> > > > > +			ring[idx + 2] = obj[i + 2]; \
> > > > > +			ring[idx + 3] = obj[i + 3]; \
> > > > > +			ring[idx + 4] = obj[i + 4]; \
> > > > > +			ring[idx + 5] = obj[i + 5]; \
> > > > > +			ring[idx + 6] = obj[i + 6]; \
> > > > > +			ring[idx + 7] = obj[i + 7]; \
> > > > > +		} \
> > > > > +		switch (n & 0x7) { \
> > > > > +		case 7: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 6: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 5: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 4: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 3: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 2: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 1: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		} \
> > > > > +	} else { \
> > > > > +		for (i = 0; idx < size; i++, idx++)\
> > > > > +			ring[idx] = obj[i]; \
> > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > +			ring[idx] = obj[i]; \
> > > > > +	} \
> > > > > +} while (0)
> > > > > +
> > > > > > +#define ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n) do { \
> > > > > +	unsigned int i; \
> > > > > +	const uint32_t size = (r)->size; \
> > > > > +	uint32_t idx = prod_head & (r)->mask; \
> > > > > +	uint64_t *ring = (uint64_t *)ring_start; \
> > > > > +	uint64_t *obj = (uint64_t *)obj_table; \
> > > > > +	if (likely(idx + n < size)) { \
> > > > > +		for (i = 0; i < (n & ((~(unsigned)0x3))); i += 4, idx += 4)
> { \
> > > > > +			ring[idx] = obj[i]; \
> > > > > +			ring[idx + 1] = obj[i + 1]; \
> > > > > +			ring[idx + 2] = obj[i + 2]; \
> > > > > +			ring[idx + 3] = obj[i + 3]; \
> > > > > +		} \
> > > > > +		switch (n & 0x3) { \
> > > > > +		case 3: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 2: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 1: \
> > > > > +			ring[idx++] = obj[i++]; \
> > > > > +		} \
> > > > > +	} else { \
> > > > > +		for (i = 0; idx < size; i++, idx++)\
> > > > > +			ring[idx] = obj[i]; \
> > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > +			ring[idx] = obj[i]; \
> > > > > +	} \
> > > > > +} while (0)
> > > > > +
> > > > > > +#define ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n) do { \
> > > > > +	unsigned int i; \
> > > > > +	const uint32_t size = (r)->size; \
> > > > > +	uint32_t idx = prod_head & (r)->mask; \
> > > > > +	__uint128_t *ring = (__uint128_t *)ring_start; \
> > > > > +	__uint128_t *obj = (__uint128_t *)obj_table; \
> > > > > +	if (likely(idx + n < size)) { \
> > > > > +		for (i = 0; i < (n >> 1); i += 2, idx += 2) { \
> > > > > +			ring[idx] = obj[i]; \
> > > > > +			ring[idx + 1] = obj[i + 1]; \
> > > > > +		} \
> > > > > +		switch (n & 0x1) { \
> > > > > +		case 1: \
> > > > > +			ring[idx++] = obj[i++]; \
> > > > > +		} \
> > > > > +	} else { \
> > > > > +		for (i = 0; idx < size; i++, idx++)\
> > > > > +			ring[idx] = obj[i]; \
> > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > +			ring[idx] = obj[i]; \
> > > > > +	} \
> > > > > +} while (0)
> > > > > +
> > > > > +/* the actual copy of pointers on the ring to obj_table.
> > > > > + * Placed here since identical code needed in both
> > > > > + * single and multi consumer dequeue functions.
> > > > > + */
> > > > > > +#define DEQUEUE_PTRS_ELEM(r, ring_start, cons_head, obj_table, esize, n) do { \
> > > > > > +	if (esize == 4) \
> > > > > > +		DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n); \
> > > > > > +	else if (esize == 8) \
> > > > > > +		DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n); \
> > > > > > +	else if (esize == 16) \
> > > > > > +		DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n); \
> > > > > > +} while (0)
> > > > > +
> > > > > > +#define DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n) do { \
> > > > > +	unsigned int i; \
> > > > > +	uint32_t idx = cons_head & (r)->mask; \
> > > > > +	const uint32_t size = (r)->size; \
> > > > > +	uint32_t *ring = (uint32_t *)ring_start; \
> > > > > +	uint32_t *obj = (uint32_t *)obj_table; \
> > > > > +	if (likely(idx + n < size)) { \
> > > > > +		for (i = 0; i < (n & (~(unsigned)0x7)); i += 8, idx += 8)
> {\
> > > > > +			obj[i] = ring[idx]; \
> > > > > +			obj[i + 1] = ring[idx + 1]; \
> > > > > +			obj[i + 2] = ring[idx + 2]; \
> > > > > +			obj[i + 3] = ring[idx + 3]; \
> > > > > +			obj[i + 4] = ring[idx + 4]; \
> > > > > +			obj[i + 5] = ring[idx + 5]; \
> > > > > +			obj[i + 6] = ring[idx + 6]; \
> > > > > +			obj[i + 7] = ring[idx + 7]; \
> > > > > +		} \
> > > > > +		switch (n & 0x7) { \
> > > > > +		case 7: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 6: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 5: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 4: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 3: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 2: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 1: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		} \
> > > > > +	} else { \
> > > > > +		for (i = 0; idx < size; i++, idx++) \
> > > > > +			obj[i] = ring[idx]; \
> > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > +			obj[i] = ring[idx]; \
> > > > > +	} \
> > > > > +} while (0)
> > > > > +
> > > > > > +#define DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n) do { \
> > > > > +	unsigned int i; \
> > > > > +	uint32_t idx = cons_head & (r)->mask; \
> > > > > +	const uint32_t size = (r)->size; \
> > > > > +	uint64_t *ring = (uint64_t *)ring_start; \
> > > > > +	uint64_t *obj = (uint64_t *)obj_table; \
> > > > > +	if (likely(idx + n < size)) { \
> > > > > +		for (i = 0; i < (n & (~(unsigned)0x3)); i += 4, idx += 4)
> {\
> > > > > +			obj[i] = ring[idx]; \
> > > > > +			obj[i + 1] = ring[idx + 1]; \
> > > > > +			obj[i + 2] = ring[idx + 2]; \
> > > > > +			obj[i + 3] = ring[idx + 3]; \
> > > > > +		} \
> > > > > +		switch (n & 0x3) { \
> > > > > +		case 3: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 2: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 1: \
> > > > > +			obj[i++] = ring[idx++]; \
> > > > > +		} \
> > > > > +	} else { \
> > > > > +		for (i = 0; idx < size; i++, idx++) \
> > > > > +			obj[i] = ring[idx]; \
> > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > +			obj[i] = ring[idx]; \
> > > > > +	} \
> > > > > +} while (0)
> > > > > +
> > > > > > +#define DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n) do { \
> > > > > +	unsigned int i; \
> > > > > +	uint32_t idx = cons_head & (r)->mask; \
> > > > > +	const uint32_t size = (r)->size; \
> > > > > +	__uint128_t *ring = (__uint128_t *)ring_start; \
> > > > > +	__uint128_t *obj = (__uint128_t *)obj_table; \
> > > > > +	if (likely(idx + n < size)) { \
> > > > > +		for (i = 0; i < (n >> 1); i += 2, idx += 2) { \
> > > > > +			obj[i] = ring[idx]; \
> > > > > +			obj[i + 1] = ring[idx + 1]; \
> > > > > +		} \
> > > > > +		switch (n & 0x1) { \
> > > > > +		case 1: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		} \
> > > > > +	} else { \
> > > > > +		for (i = 0; idx < size; i++, idx++) \
> > > > > +			obj[i] = ring[idx]; \
> > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > +			obj[i] = ring[idx]; \
> > > > > +	} \
> > > > > +} while (0)
> > > > > +
> > > > > > +/* Between load and load. there might be cpu reorder in weak model
> > > > > > + * (powerpc/arm).
> > > > > > + * There are 2 choices for the users
> > > > > > + * 1.use rmb() memory barrier
> > > > > > + * 2.use one-direction load_acquire/store_release barrier,defined by
> > > > > > + * CONFIG_RTE_USE_C11_MEM_MODEL=y
> > > > > > + * It depends on performance test results.
> > > > > > + * By default, move common functions to rte_ring_generic.h
> > > > > > + */
> > > > > > +#ifdef RTE_USE_C11_MEM_MODEL
> > > > > > +#include "rte_ring_c11_mem.h"
> > > > > +#else
> > > > > +#include "rte_ring_generic.h"
> > > > > +#endif
> > > > > +
> > > > > +/**
> > > > > + * @internal Enqueue several objects on the ring
> > > > > + *
> > > > > + * @param r
> > > > > + *   A pointer to the ring structure.
> > > > > + * @param obj_table
> > > > > + *   A pointer to a table of void * pointers (objects).
> > > > > + * @param esize
> > > > > + *   The size of ring element, in bytes. It must be a multiple of 4.
> > > > > > + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> > > > > > + *   as passed while creating the ring, otherwise the results are undefined.
> > > > > > + * @param n
> > > > > > + *   The number of objects to add in the ring from the obj_table.
> > > > > > + * @param behavior
> > > > > > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> > > > > > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
> > > > > > + * @param is_sp
> > > > > > + *   Indicates whether to use single producer or multi-producer head update
> > > > > > + * @param free_space
> > > > > > + *   returns the amount of space after the enqueue operation has finished
> > > > > + * @return
> > > > > + *   Actual number of objects enqueued.
> > > > > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > > > + */
> > > > > +static __rte_always_inline unsigned int
> > > > > > +__rte_ring_do_enqueue_elem(struct rte_ring *r, void * const obj_table,
> > > > > > +		unsigned int esize, unsigned int n,
> > > > > > +		enum rte_ring_queue_behavior behavior, unsigned int is_sp,
> > > > > > +		unsigned int *free_space)
> > >
> > >
> > > I like the idea to add esize as an argument to the public API, so
> > > the compiler can do its job optimizing calls with constant esize.
> > > Though I am not very happy with the rest of the implementation:
> > > 1. It doesn't really provide configurable elem size - only 4/8/16B
> > > elems are supported.
> > Agree. I was thinking other sizes can be added on need basis.
> > However, I am wondering if we should just provide for 4B and then the
> > users can use bulk operations to construct whatever they need?
> 
> I suppose it could be plan B... if there would be no agreement on generic case.
> And for 4B elems, I guess you do have a particular use-case?
Yes

> 
> > It
> > would mean extra work for the users.
> >
> > > 2. A lot of code duplication with these 3 copies of ENQUEUE/DEQUEUE
> > > macros.
> > >
> > > Looking at ENQUEUE/DEQUEUE macros, I can see that main loop always
> > > does 32B copy per iteration.
> > Yes, I tried to keep it the same as the existing one (originally, I
> > guess the intention was to allow for 256b vector instructions to be
> > generated)
> >
> > > So wonder can we make a generic function that would do 32B copy per
> > > iteration in a main loop, and copy tail  by 4B chunks?
> > > That would avoid copy duplication and will allow user to have any
> > > elem size (multiple of 4B) he wants.
> > > Something like that (note didn't test it, just a rough idea):
> > >
> > > static inline void
> > > copy_elems(uint32_t du32[], const uint32_t su32[], uint32_t num,
> > >                 uint32_t esize) {
> > >         uint32_t i, sz;
> > >
> > >         sz = (num * esize) / sizeof(uint32_t);
> > If 'num' is a compile time constant, 'sz' will be a compile time constant.
> > Otherwise, this will result in a multiplication operation.
> 
> Not always.
> If esize is a compile time constant, then for esize as a power of 2 (4, 8, 16, ...), it
> would be just one shift.
> For other constant values it could be a 'mul' or in many cases just 2 shifts plus
> 'add' (if the compiler is smart enough).
> I.e. let's say for a 24B elem it would be either num * 6 or (num << 2) + (num << 1).
With num * 15, it has to be (num << 3) + (num << 2) + (num << 1) + num.
I am not sure the compiler will do this.
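
The shift/add forms mentioned in this exchange can be checked with a minimal sketch. The function names below are hypothetical and are not part of the patch; the sketch only shows that the decompositions are arithmetically equivalent to the multiplications, and that num * 15 also has a shorter shift-and-subtract form. Whether a given compiler actually emits any of these forms is not claimed here.

```c
#include <assert.h>
#include <stdint.h>

/* num * 15 as the four-term shift/add sum written out in the reply. */
static inline uint32_t
mul15_shift_add(uint32_t num)
{
	return (num << 3) + (num << 2) + (num << 1) + num;
}

/* The same product with one shift and one subtract, since 15 = 16 - 1. */
static inline uint32_t
mul15_shift_sub(uint32_t num)
{
	return (num << 4) - num;
}

/* The 24B element case: 24 / 4 = 6 words per element, and 6 = 4 + 2. */
static inline uint32_t
mul6_shift_add(uint32_t num)
{
	return (num << 2) + (num << 1);
}

/* Check the identities against plain multiplication for small counts. */
static inline void
check_shift_identities(void)
{
	uint32_t num;

	for (num = 0; num < 4096; num++) {
		assert(mul15_shift_add(num) == num * 15);
		assert(mul15_shift_sub(num) == num * 15);
		assert(mul6_shift_add(num) == num * 6);
	}
}
```

Calling check_shift_identities() once is enough to exercise all three helpers.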

> I suppose for non-power of 2 elems it might be ok to get such small perf hit.
Agreed, it should be OK not to focus on this right now.

> 
> > I have tried
> > to avoid the multiplication operation and try to use shift and mask
> > operations (just like how the rest of the ring code does).
> >
> > >
> > >         for (i = 0; i < (sz & ~7); i += 8)
> > >                 memcpy(du32 + i, su32 + i, 8 * sizeof(uint32_t));
> > I had used memcpy to start with (for the entire copy operation),
> > performance is not the same for 64b elements when compared with the
> > existing ring APIs (some cases more and some cases less).
> 
> I remember that from one of your previous mails, that's why here I suggest to
> use memcpy() with a fixed size in a loop.
> That way, for each iteration the compiler will replace memcpy() with instructions
> to copy 32B in a way it thinks is optimal (same as for the original macro, I think).
I tried this. On x86 (Xeon(R) Gold 6132 CPU @ 2.60GHz), the results are as follows. The numbers in brackets are the results with the code on master.
Compiler: gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0

RTE>>ring_perf_elem_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 5
MP/MC single enq/dequeue: 40 (35)
SP/SC burst enq/dequeue (size: 8): 2
MP/MC burst enq/dequeue (size: 8): 6
SP/SC burst enq/dequeue (size: 32): 1 (2)
MP/MC burst enq/dequeue (size: 32): 2

### Testing empty dequeue ###
SC empty dequeue: 2.11
MC empty dequeue: 1.41 (2.11)

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 2.15 (2.86)
MP/MC bulk enq/dequeue (size: 8): 6.35 (6.91)
SP/SC bulk enq/dequeue (size: 32): 1.35 (2.06)
MP/MC bulk enq/dequeue (size: 32): 2.38 (2.95)

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 73.81 (15.33)
MP/MC bulk enq/dequeue (size: 8): 75.10 (71.27)
SP/SC bulk enq/dequeue (size: 32): 21.14 (9.58)
MP/MC bulk enq/dequeue (size: 32): 25.74 (20.91)

### Testing using two NUMA nodes ###
SP/SC bulk enq/dequeue (size: 8): 164.32 (50.66)
MP/MC bulk enq/dequeue (size: 8): 176.02 (173.43)
SP/SC bulk enq/dequeue (size: 32): 50.78 (23)
MP/MC bulk enq/dequeue (size: 32): 63.17 (46.74)

On one of the Arm platforms:
MP/MC bulk enq/dequeue (size: 32): 0.37 (0.33) (~12% hit, the rest are OK)

On another Arm platform, all numbers are the same or slightly better.
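
For reference, the "~12% hit" follows from the relative change of the new figure against the bracketed master figure. The sketch below assumes the reported figures are costs where lower is better, which is what the word "hit" implies; the helper name is hypothetical.

```c
#include <assert.h>

/* Relative change of 'with_patch' against 'baseline', in percent.
 * A positive result means the patched figure is higher, which is a
 * regression if the figure is a cost (lower is better). */
static inline double
rel_change_pct(double with_patch, double baseline)
{
	assert(baseline > 0.0);
	return (with_patch - baseline) * 100.0 / baseline;
}
```

With the Arm MP/MC figures, rel_change_pct(0.37, 0.33) comes out at about 12%. The same formula applied to the x86 two-core SP/SC (size: 8) pair, 73.81 against 15.33, gives a change of several hundred percent.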

I can post the patch with this change if you want to run some benchmarks on your platform.
I have not used the exact code you suggested; instead, I have used the same logic in a single macro with memcpy.
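
The macro itself is not included in this mail. As a rough, self-contained illustration of the idea under discussion (a fixed-size memcpy in the main loop, a word-by-word tail, and a split copy at the ring wrap-around), here is a minimal sketch. It is not the code from the patch: struct sketch_ring, copy_words() and sketch_enqueue_elems() are hypothetical stand-ins, and the head/tail synchronization of the real ring is left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy 'nwords' 32-bit words. The main loop uses a constant-size memcpy
 * so the compiler is free to expand each call into a 32B copy sequence. */
static inline void
copy_words(uint32_t *dst, const uint32_t *src, uint32_t nwords)
{
	uint32_t i;

	for (i = 0; i < (nwords & ~(uint32_t)7); i += 8)
		memcpy(dst + i, src + i, 8 * sizeof(uint32_t));
	for (; i < nwords; i++)
		dst[i] = src[i];
}

/* Hypothetical stand-in for the ring storage, not struct rte_ring. */
struct sketch_ring {
	uint32_t size;   /* number of slots, a power of two */
	uint32_t mask;   /* size - 1 */
	uint32_t esize;  /* element size in bytes, a multiple of 4 */
	uint32_t *slots; /* size * esize / 4 words of storage */
};

/* Copy 'num' elements from obj_table into the slots starting at
 * prod_head, splitting the copy in two when it wraps around. */
static inline void
sketch_enqueue_elems(struct sketch_ring *r, uint32_t prod_head,
		const void *obj_table, uint32_t num)
{
	const uint32_t wpe = r->esize / sizeof(uint32_t); /* words per elem */
	const uint32_t idx = prod_head & r->mask;
	const uint32_t *src = obj_table;

	assert(r->esize % sizeof(uint32_t) == 0);

	if (idx + num <= r->size) {
		copy_words(r->slots + (size_t)idx * wpe, src, num * wpe);
	} else {
		const uint32_t n = r->size - idx; /* elements before the wrap */

		copy_words(r->slots + (size_t)idx * wpe, src, n * wpe);
		copy_words(r->slots, src + (size_t)n * wpe, (num - n) * wpe);
	}
}
```

Because the offsets are computed in 32-bit words per element, the same function covers any element size that is a multiple of 4B, which is the property the discussion is after.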

> 
> >
> > IMO, we have to keep the performance of the 64b and 128b the same as
> > what we get with the existing ring and event-ring APIs. That would allow us
> > to replace them with these new APIs. I suggest that we keep the macros in
> > this patch for 64b and 128b.
> 
> I still think we probably can achieve that without duplicating macros, while
> still supporting arbitrary elem size.
> See above.
> 
> > For the rest of the sizes, we could put a for loop around 32b macro (this
> > would allow for all sizes as well).
> >
> > >
> > >         switch (sz & 7) {
> > >         case 7: du32[sz - 7] = su32[sz - 7]; /* fallthrough */
> > >         case 6: du32[sz - 6] = su32[sz - 6]; /* fallthrough */
> > >         case 5: du32[sz - 5] = su32[sz - 5]; /* fallthrough */
> > >         case 4: du32[sz - 4] = su32[sz - 4]; /* fallthrough */
> > >         case 3: du32[sz - 3] = su32[sz - 3]; /* fallthrough */
> > >         case 2: du32[sz - 2] = su32[sz - 2]; /* fallthrough */
> > >         case 1: du32[sz - 1] = su32[sz - 1]; /* fallthrough */
> > >         }
> > > }
> > >
> > > static inline void
> > > enqueue_elems(struct rte_ring *r, void *ring_start, uint32_t prod_head,
> > >                 void *obj_table, uint32_t num, uint32_t esize) {
> > >         uint32_t idx, n;
> > >         uint32_t *du32;
> > >
> > >         const uint32_t size = r->size;
> > >
> > >         idx = prod_head & (r)->mask;
> > >
> > >         du32 = ring_start + idx * esize;
> > >
> > >         if (idx + num < size)
> > >                 copy_elems(du32, obj_table, num, esize);
> > >         else {
> > >                 n = size - idx;
> > >                 copy_elems(du32, obj_table, n, esize);
> > >                 copy_elems(ring_start, obj_table + n * esize,
> > >                         num - n, esize);
> > >         }
> > > }
> > >
> > > And then, in that function, instead of ENQUEUE_PTRS_ELEM(), just:
> > >
> > > enqueue_elems(r, &r[1], prod_head, obj_table, n, esize);
> > >
> > >
> > > > > +{
> > > > > +	uint32_t prod_head, prod_next;
> > > > > +	uint32_t free_entries;
> > > > > +
> > > > > +	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
> > > > > +			&prod_head, &prod_next, &free_entries);
> > > > > +	if (n == 0)
> > > > > +		goto end;
> > > > > +
> > > > > +	ENQUEUE_PTRS_ELEM(r, &r[1], prod_head, obj_table, esize,
> n);
> > > > > +
> > > > > +	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
> > > > > +end:
> > > > > +	if (free_space != NULL)
> > > > > +		*free_space = free_entries - n;
> > > > > +	return n;
> > > > > +}
> > > > > +

2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 5/6] lib/hash: use ring with 32b element size to save memory Honnappa Nagarahalli
2020-01-17 20:27       ` David Marchand
2020-01-17 20:54         ` Honnappa Nagarahalli
2020-01-17 21:07           ` David Marchand
2020-01-17 22:24             ` Wang, Yipeng1
2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 6/6] lib/eventdev: use custom element size ring for event rings Honnappa Nagarahalli
2020-01-17 14:41       ` Jerin Jacob
2020-01-17 16:12         ` David Marchand
2020-01-16 16:36     ` [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
2020-01-17 12:14       ` David Marchand
2020-01-17 13:34         ` Jerin Jacob
2020-01-17 16:37           ` Mattias Rönnblom
2020-01-17 14:28         ` Honnappa Nagarahalli
2020-01-17 14:36           ` Honnappa Nagarahalli
2020-01-17 16:15           ` David Marchand
2020-01-17 16:32             ` Honnappa Nagarahalli
2020-01-17 17:15     ` Olivier Matz
2020-01-18 19:32   ` [dpdk-dev] [PATCH v10 " Honnappa Nagarahalli
2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 1/6] test/ring: use division for cycle count calculation Honnappa Nagarahalli
2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 2/6] lib/ring: apis to support configurable element size Honnappa Nagarahalli
2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 3/6] test/ring: add functional tests for rte_ring_xxx_elem APIs Honnappa Nagarahalli
2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 4/6] test/ring: modify perf test cases to use " Honnappa Nagarahalli
2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 5/6] lib/hash: use ring with 32b element size to save memory Honnappa Nagarahalli
2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 6/6] eventdev: use custom element size ring for event rings Honnappa Nagarahalli
2020-01-19 19:31     ` [dpdk-dev] [PATCH v10 0/6] lib/ring: APIs to support custom element size David Marchand
