DPDK patches and discussions
* [RFC] lib/st_ring: add single thread ring
@ 2023-08-21  6:04 Honnappa Nagarahalli
  2023-08-21  7:37 ` Morten Brørup
                   ` (3 more replies)
  0 siblings, 4 replies; 48+ messages in thread
From: Honnappa Nagarahalli @ 2023-08-21  6:04 UTC (permalink / raw)
  To: jackmin, konstantin.v.ananyev
  Cc: dev, ruifeng.wang, aditya.ambadipudi, wathsala.vithanage, nd,
	Honnappa Nagarahalli

Add a single thread safe and multi-thread unsafe ring data structure.
This library provides a simple and efficient alternative to the multi-thread
safe ring when multi-thread safety is not required.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
v1:
1) The code is very preliminary and is not even compiled
2) This is intended to show the APIs and some thoughts on implementation
3) More APIs and the rest of the implementation will come in subsequent
   versions

 lib/st_ring/rte_st_ring.h | 567 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 567 insertions(+)
 create mode 100644 lib/st_ring/rte_st_ring.h

diff --git a/lib/st_ring/rte_st_ring.h b/lib/st_ring/rte_st_ring.h
new file mode 100644
index 0000000000..8cb8832591
--- /dev/null
+++ b/lib/st_ring/rte_st_ring.h
@@ -0,0 +1,567 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2023 Arm Limited
+ */
+
+#ifndef _RTE_ST_RING_H_
+#define _RTE_ST_RING_H_
+
+/**
+ * @file
+ * RTE Single Thread Ring (ST Ring)
+ *
+ * The ST Ring is a fixed-size queue intended to be accessed
+ * by one thread at a time. It does not provide concurrent access to
+ * multiple threads. If there are multiple threads accessing the ST ring,
+ * then the threads have to use locks to protect the ring from
+ * getting corrupted.
+ *
+ * - FIFO (First In First Out)
+ * - Maximum size is fixed; the pointers are stored in a table.
+ * - Consumer and producer are part of the same thread.
+ * - Multi-threaded producers and consumers need locking.
+ * - Single/bulk/burst dequeue at tail or head
+ * - Single/bulk/burst enqueue at head or tail
+ *
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_st_ring_core.h>
+#include <rte_st_ring_elem.h>
+
+/**
+ * Calculate the memory size needed for a ST ring
+ *
+ * This function returns the number of bytes needed for a ST ring, given
+ * the number of elements in it. This value is the sum of the size of
+ * the structure rte_st_ring and the size of the memory needed by the
+ * elements. The value is aligned to a cache line size.
+ *
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @return
+ *   - The memory size needed for the ST ring on success.
+ *   - -EINVAL if count is not a power of 2.
+ */
+ssize_t rte_st_ring_get_memsize(unsigned int count);
+
+/**
+ * Initialize a ST ring structure.
+ *
+ * Initialize a ST ring structure in memory pointed by "r". The size of the
+ * memory area must be large enough to store the ring structure and the
+ * object table. It is advised to use rte_st_ring_get_memsize() to get the
+ * appropriate size.
+ *
+ * The ST ring size is set to *count*, which must be a power of two.
+ * The real usable ring size is *count-1* instead of *count* to
+ * differentiate a full ring from an empty ring.
+ *
+ * The ring is not added in RTE_TAILQ_ST_RING global list. Indeed, the
+ * memory given by the caller may not be shareable among dpdk
+ * processes.
+ *
+ * @param r
+ *   The pointer to the ring structure followed by the elements table.
+ * @param name
+ *   The name of the ring.
+ * @param count
+ *   The number of elements in the ring (must be a power of 2,
+ *   unless RTE_ST_RING_F_EXACT_SZ is set in flags).
+ * @param flags
+ *   An OR of the following:
+ *   - RTE_ST_RING_F_EXACT_SZ: If this flag is set, the ring will hold
+ *     exactly the requested number of entries, and the requested size
+ *     will be rounded up to the next power of two, but the usable space
+ *     will be exactly that requested. Worst case, if a power-of-2 size is
+ *     requested, half the ring space will be wasted.
+ *     Without this flag set, the ring size requested must be a power of 2,
+ *     and the usable space will be that size - 1.
+ * @return
+ *   0 on success, or a negative value on error.
+ */
+int rte_st_ring_init(struct rte_st_ring *r, const char *name,
+	unsigned int count, unsigned int flags);
+
+/**
+ * Create a new ST ring named *name* in memory.
+ *
+ * This function uses ``memzone_reserve()`` to allocate memory. Then it
+ * calls rte_st_ring_init() to initialize an empty ring.
+ *
+ * The new ring size is set to *count*, which must be a power of two.
+ * The real usable ring size is *count-1* instead of *count* to
+ * differentiate a full ring from an empty ring.
+ *
+ * The ring is added in RTE_TAILQ_ST_RING list.
+ *
+ * @param name
+ *   The name of the ring.
+ * @param count
+ *   The size of the ring (must be a power of 2,
+ *   unless RTE_ST_RING_F_EXACT_SZ is set in flags).
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in case of
+ *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
+ *   constraint for the reserved zone.
+ * @param flags
+ *   - RTE_ST_RING_F_EXACT_SZ: If this flag is set, the ring will hold exactly the
+ *     requested number of entries, and the requested size will be rounded up
+ *     to the next power of two, but the usable space will be exactly that
+ *     requested. Worst case, if a power-of-2 size is requested, half the
+ *     ring space will be wasted.
+ *     Without this flag set, the ring size requested must be a power of 2,
+ *     and the usable space will be that size - 1.
+ * @return
+ *   On success, the pointer to the new allocated ring. NULL on error with
+ *    rte_errno set appropriately. Possible errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - EINVAL - count provided is not a power of 2
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+struct rte_st_ring *rte_st_ring_create(const char *name, unsigned int count,
+				 int socket_id, unsigned int flags);
+
+/**
+ * De-allocate all memory used by the ring.
+ *
+ * @param r
+ *   Ring to free.
+ *   If r is NULL, the function does nothing.
+ */
+void rte_st_ring_free(struct rte_st_ring *r);
+
+/**
+ * Dump the status of the ring to a file.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @param r
+ *   A pointer to the ring structure.
+ */
+void rte_st_ring_dump(FILE *f, const struct rte_st_ring *r);
+
+/**
+ * Enqueue fixed number of objects on a ST ring.
+ *
+ * This function copies the objects at the head of the ring and
+ * moves the head index.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_st_ring_enqueue_bulk(struct rte_st_ring *r, void * const *obj_table,
+		      unsigned int n, unsigned int *free_space)
+{
+	return rte_st_ring_enqueue_bulk_elem(r, obj_table, sizeof(void *),
+			n, free_space);
+}
+
+/**
+ * Enqueue up to a maximum number of objects on a ST ring.
+ *
+ * This function copies the objects at the head of the ring and
+ * moves the head index.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned int
+rte_st_ring_enqueue_burst(struct rte_st_ring *r, void * const *obj_table,
+		      unsigned int n, unsigned int *free_space)
+{
+	return rte_st_ring_enqueue_burst_elem(r, obj_table, sizeof(void *),
+			n, free_space);
+}
+
+/**
+ * Enqueue one object on a ST ring.
+ *
+ * This function copies one object at the head of the ring and
+ * moves the head index.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_st_ring_enqueue(struct rte_st_ring *r, void *obj)
+{
+	return rte_st_ring_enqueue_elem(r, &obj, sizeof(void *));
+}
+
+/**
+ * Enqueue fixed number of objects on a ST ring at the tail.
+ *
+ * This function copies the objects at the tail of the ring and
+ * moves the tail index (backwards).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_st_ring_enqueue_at_tail_bulk(struct rte_st_ring *r,
+				 void * const *obj_table, unsigned int n,
+				 unsigned int *free_space)
+{
+	return rte_st_ring_enqueue_at_tail_bulk_elem(r, obj_table,
+			sizeof(void *), n, free_space);
+}
+
+/**
+ * Enqueue up to a maximum number of objects on a ST ring at the tail.
+ *
+ * This function copies the objects at the tail of the ring and
+ * moves the tail index (backwards).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned int
+rte_st_ring_enqueue_at_tail_burst(struct rte_st_ring *r,
+				  void * const *obj_table, unsigned int n,
+				  unsigned int *free_space)
+{
+	return rte_st_ring_enqueue_at_tail_burst_elem(r, obj_table,
+			sizeof(void *), n, free_space);
+}
+
+/**
+ * Enqueue one object on a ST ring at tail.
+ *
+ * This function copies one object at the tail of the ring and
+ * moves the tail index (backwards).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_st_ring_enqueue_at_tail(struct rte_st_ring *r, void *obj)
+{
+	return rte_st_ring_enqueue_at_tail_elem(r, &obj, sizeof(void *));
+}
+
+/**
+ * Dequeue a fixed number of objects from a ST ring.
+ *
+ * This function copies the objects from the tail of the ring and
+ * moves the tail index.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_st_ring_dequeue_bulk(struct rte_st_ring *r, void **obj_table, unsigned int n,
+		unsigned int *available)
+{
+	return rte_st_ring_dequeue_bulk_elem(r, obj_table, sizeof(void *),
+			n, available);
+}
+
+/**
+ * Dequeue up to a maximum number of objects from a ST ring.
+ *
+ * This function copies the objects from the tail of the ring and
+ * moves the tail index.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - Number of objects dequeued
+ */
+static __rte_always_inline unsigned int
+rte_st_ring_dequeue_burst(struct rte_st_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_st_ring_dequeue_burst_elem(r, obj_table, sizeof(void *),
+			n, available);
+}
+
+/**
+ * Dequeue one object from a ST ring.
+ *
+ * This function copies one object from the tail of the ring and
+ * moves the tail index.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @return
+ *   - 0: Success, objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_st_ring_dequeue(struct rte_st_ring *r, void **obj_p)
+{
+	return rte_st_ring_dequeue_elem(r, obj_p, sizeof(void *));
+}
+
+/**
+ * Dequeue a fixed number of objects from a ST ring from the head.
+ *
+ * This function copies the objects from the head of the ring and
+ * moves the head index (backwards).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_st_ring_dequeue_at_head_bulk(struct rte_st_ring *r, void **obj_table, unsigned int n,
+		unsigned int *available)
+{
+	return rte_st_ring_dequeue_at_head_bulk_elem(r, obj_table,
+			sizeof(void *), n, available);
+}
+
+/**
+ * Dequeue up to a maximum number of objects from a ST ring from the head.
+ *
+ * This function copies the objects from the head of the ring and
+ * moves the head index (backwards).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - Number of objects dequeued
+ */
+static __rte_always_inline unsigned int
+rte_st_ring_dequeue_at_head_burst(struct rte_st_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_st_ring_dequeue_at_head_burst_elem(r, obj_table,
+			sizeof(void *), n, available);
+}
+
+/**
+ * Dequeue one object from a ST ring from the head.
+ *
+ * This function copies one object from the head of the ring and
+ * moves the head index (backwards).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @return
+ *   - 0: Success, objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_st_ring_dequeue_at_head(struct rte_st_ring *r, void **obj_p)
+{
+	return rte_st_ring_dequeue_at_head_elem(r, obj_p, sizeof(void *));
+}
+
+/**
+ * Flush a ST ring.
+ *
+ * This function flushes all the elements in a ST ring.
+ *
+ * @warning
+ * Make sure the ring is not in use while calling this function.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ */
+void
+rte_st_ring_reset(struct rte_st_ring *r);
+
+/**
+ * Return the number of entries in a ST ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   The number of entries in the ring.
+ */
+static inline unsigned int
+rte_st_ring_count(const struct rte_st_ring *r)
+{
+	uint32_t count = (r->head - r->tail) & r->mask;
+	return count;
+}
+
+/**
+ * Return the number of free entries in a ST ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   The number of free entries in the ring.
+ */
+static inline unsigned int
+rte_st_ring_free_count(const struct rte_st_ring *r)
+{
+	return r->capacity - rte_st_ring_count(r);
+}
+
+/**
+ * Test if a ST ring is full.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   - 1: The ring is full.
+ *   - 0: The ring is not full.
+ */
+static inline int
+rte_st_ring_full(const struct rte_st_ring *r)
+{
+	return rte_st_ring_free_count(r) == 0;
+}
+
+/**
+ * Test if a ST ring is empty.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   - 1: The ring is empty.
+ *   - 0: The ring is not empty.
+ */
+static inline int
+rte_st_ring_empty(const struct rte_st_ring *r)
+{
+	return r->tail == r->head;
+}
+
+/**
+ * Return the size of the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   The size of the data store used by the ring.
+ *   NOTE: this is not the same as the usable space in the ring. To query that
+ *   use ``rte_st_ring_get_capacity()``.
+ */
+static inline unsigned int
+rte_st_ring_get_size(const struct rte_st_ring *r)
+{
+	return r->size;
+}
+
+/**
+ * Return the number of elements which can be stored in the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   The usable size of the ring.
+ */
+static inline unsigned int
+rte_st_ring_get_capacity(const struct rte_st_ring *r)
+{
+	return r->capacity;
+}
+
+/**
+ * Dump the status of all rings on the console
+ *
+ * @param f
+ *   A pointer to a file for output
+ */
+void rte_st_ring_list_dump(FILE *f);
+
+/**
+ * Search a ST ring from its name
+ *
+ * @param name
+ *   The name of the ring.
+ * @return
+ *   The pointer to the ring matching the name, or NULL if not found,
+ *   with rte_errno set appropriately. Possible rte_errno values include:
+ *    - ENOENT - required entry not available to return.
+ */
+struct rte_st_ring *rte_st_ring_lookup(const char *name);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_ST_RING_H_ */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: [RFC] lib/st_ring: add single thread ring
  2023-08-21  6:04 [RFC] lib/st_ring: add single thread ring Honnappa Nagarahalli
@ 2023-08-21  7:37 ` Morten Brørup
  2023-08-22  5:47   ` Honnappa Nagarahalli
  2023-08-21 21:14 ` Mattias Rönnblom
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 48+ messages in thread
From: Morten Brørup @ 2023-08-21  7:37 UTC (permalink / raw)
  To: Honnappa Nagarahalli, jackmin, konstantin.v.ananyev
  Cc: dev, ruifeng.wang, aditya.ambadipudi, wathsala.vithanage, nd

> From: Honnappa Nagarahalli [mailto:honnappa.nagarahalli@arm.com]
> Sent: Monday, 21 August 2023 08.04
> 
> Add a single thread safe and multi-thread unsafe ring data structure.
> This library provides a simple and efficient alternative to multi-thread
> safe ring when multi-thread safety is not required.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---

Good idea.

However, I prefer it to be implemented in the ring lib as one more ring type. That would also give us a lot of the infrastructure (management functions, documentation and tests) for free.

The ring lib already has performance-optimized APIs for single-consumer and single-producer use, rte_ring_sc_dequeue_bulk() and rte_ring_sp_enqueue_burst(). Similar performance-optimized APIs for single-thread use could be added: rte_ring_st_dequeue_bulk() and rte_ring_st_enqueue_burst().

Regardless if added to the ring lib or as a separate lib, "reverse" APIs (for single-thread use only) and zero-copy APIs can be added at any time later.



* Re: [RFC] lib/st_ring: add single thread ring
  2023-08-21  6:04 [RFC] lib/st_ring: add single thread ring Honnappa Nagarahalli
  2023-08-21  7:37 ` Morten Brørup
@ 2023-08-21 21:14 ` Mattias Rönnblom
  2023-08-22  5:43   ` Honnappa Nagarahalli
  2023-09-04 10:13 ` Konstantin Ananyev
  2024-04-01  1:37 ` [PATCH v1 0/2] deque: add multithread unsafe deque library Aditya Ambadipudi
  3 siblings, 1 reply; 48+ messages in thread
From: Mattias Rönnblom @ 2023-08-21 21:14 UTC (permalink / raw)
  To: Honnappa Nagarahalli, jackmin, konstantin.v.ananyev
  Cc: dev, ruifeng.wang, aditya.ambadipudi, wathsala.vithanage, nd

On 2023-08-21 08:04, Honnappa Nagarahalli wrote:
> Add a single thread safe and multi-thread unsafe ring data structure.

One must have set the bar very low, if one needs to specify that an API 
is single-thread safe.

> This library provides a simple and efficient alternative to multi-thread
> safe ring when multi-thread safety is not required.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
> v1:
> 1) The code is very preliminary and is not even compiled
> 2) This is intended to show the APIs and some thoughts on implementation

If you haven't done it already, maybe it might be worth looking around 
in the code base for already-existing, more-or-less open-coded 
fifo/circular buffer type data structures. Just to make sure those can 
be eliminated if this makes it into DPDK.

There's one in rte_event_eth_rx_adapter.c, and I think one in the SW 
eventdev as well. Seems to be one in cmdline_cirbuf.h as well. I'm sure 
there are many more.

You could pick some other name for it, instead of the slightly awkward 
"st_ring" (e.g., "fifo", "cbuf", "cbuffer", "circ_buffer"). That would 
also leave you with more freedom to stray from the MT safe ring API 
without surprising the user, if needed (and I think it is needed).

Hopefully you can reduce API complexity compared to the MT-safe version. 
Having a name for these kinds of data structures doesn't make a lot of 
sense, for example. Skip the dump function. Relax from always_inline to 
just regular inline.

I'm not sure you need bulk/burst type operations. Without any memory 
fences, an optimizing compiler should do a pretty good job of unrolling 
multiple-element access type operations, assuming you leave the ST ring 
code in the header files (otherwise LTO is needed).

I think you will want a peek-type operation on the reader side. That 
more for convenience, rather than that I think the copies will actually 
be there in the object code (such should be eliminated by the compiler, 
given that the barriers are gone).

> 3) More APIs and the rest of the implementation will come in subsequent
>     versions
> 
>   lib/st_ring/rte_st_ring.h | 567 ++++++++++++++++++++++++++++++++++++++
>   1 file changed, 567 insertions(+)
>   create mode 100644 lib/st_ring/rte_st_ring.h
> 
> diff --git a/lib/st_ring/rte_st_ring.h b/lib/st_ring/rte_st_ring.h
> new file mode 100644
> index 0000000000..8cb8832591
> --- /dev/null
> +++ b/lib/st_ring/rte_st_ring.h
> @@ -0,0 +1,567 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2023 Arm Limited
> + */
> +
> +#ifndef _RTE_ST_RING_H_
> +#define _RTE_ST_RING_H_
> +
> +/**
> + * @file
> + * RTE Single Thread Ring (ST Ring)
> + *
> + * The ST Ring is a fixed-size queue intended to be accessed
> + * by one thread at a time. It does not provide concurrent access to
> + * multiple threads. If there are multiple threads accessing the ST ring,
> + * then the threads have to use locks to protect the ring from
> + * getting corrupted.

You are basically saying the same thing three times here.

> + *
> + * - FIFO (First In First Out)
> + * - Maximum size is fixed; the pointers are stored in a table.
> + * - Consumer and producer part of same thread.
> + * - Multi-thread producers and consumers need locking.

...two more times here. One might get the impression you really don't 
trust the reader.

> + * - Single/Bulk/burst dequeue at Tail or Head
> + * - Single/Bulk/burst enqueue at Head or Tail

Does this not sound more like a deque, than a FIFO/circular buffer? Are 
there any examples where this functionality (the double-endedness) is 
needed in the DPDK code base?

> + *
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <rte_st_ring_core.h>
> +#include <rte_st_ring_elem.h>

Is the intention to provide a ring with compile-time variable element 
size? In other words, where the elements of a particular ring instance 
has the same element size, but different rings may have different 
element sizes.

Seems like a good idea to me, in that case. Although often you will have 
pointers, it would be useful to store larger things like small structs, 
and maybe smaller elements as well.

> +
> +/**
> + * Calculate the memory size needed for a ST ring
> + *
> + * This function returns the number of bytes needed for a ST ring, given
> + * the number of elements in it. This value is the sum of the size of
> + * the structure rte_st_ring and the size of the memory needed by the
> + * elements. The value is aligned to a cache line size.
> + *
> + * @param count
> + *   The number of elements in the ring (must be a power of 2).
> + * @return
> + *   - The memory size needed for the ST ring on success.
> + *   - -EINVAL if count is not a power of 2.
> + */
> +ssize_t rte_st_ring_get_memsize(unsigned int count);
> +
> +/**
> + * Initialize a ST ring structure.
> + *
> + * Initialize a ST ring structure in memory pointed by "r". The size of the
> + * memory area must be large enough to store the ring structure and the
> + * object table. It is advised to use rte_st_ring_get_memsize() to get the
> + * appropriate size.
> + *
> + * The ST ring size is set to *count*, which must be a power of two.
> + * The real usable ring size is *count-1* instead of *count* to
> + * differentiate a full ring from an empty ring.
> + *
> + * The ring is not added in RTE_TAILQ_ST_RING global list. Indeed, the
> + * memory given by the caller may not be shareable among dpdk
> + * processes.
> + *
> + * @param r
> + *   The pointer to the ring structure followed by the elements table.
> + * @param name
> + *   The name of the ring.
> + * @param count
> + *   The number of elements in the ring (must be a power of 2,
> + *   unless RTE_ST_RING_F_EXACT_SZ is set in flags).
> + * @param flags
> + *   An OR of the following:
> + *   - RTE_ST_RING_F_EXACT_SZ: If this flag is set, the ring will hold
> + *     exactly the requested number of entries, and the requested size
> + *     will be rounded up to the next power of two, but the usable space
> + *     will be exactly that requested. Worst case, if a power-of-2 size is
> + *     requested, half the ring space will be wasted.
> + *     Without this flag set, the ring size requested must be a power of 2,
> + *     and the usable space will be that size - 1.
> + * @return
> + *   0 on success, or a negative value on error.
> + */
> +int rte_st_ring_init(struct rte_st_ring *r, const char *name,
> +	unsigned int count, unsigned int flags);
> +
> +/**
> + * Create a new ST ring named *name* in memory.
> + *
> + * This function uses ``memzone_reserve()`` to allocate memory. Then it
> + * calls rte_st_ring_init() to initialize an empty ring.
> + *
> + * The new ring size is set to *count*, which must be a power of two.
> + * The real usable ring size is *count-1* instead of *count* to
> + * differentiate a full ring from an empty ring.
> + *
> + * The ring is added in RTE_TAILQ_ST_RING list.
> + *
> + * @param name
> + *   The name of the ring.
> + * @param count
> + *   The size of the ring (must be a power of 2,
> + *   unless RTE_ST_RING_F_EXACT_SZ is set in flags).
> + * @param socket_id
> + *   The *socket_id* argument is the socket identifier in case of
> + *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
> + *   constraint for the reserved zone.
> + * @param flags
> + *   - RTE_ST_RING_F_EXACT_SZ: If this flag is set, the ring will hold exactly the
> + *     requested number of entries, and the requested size will be rounded up
> + *     to the next power of two, but the usable space will be exactly that
> + *     requested. Worst case, if a power-of-2 size is requested, half the
> + *     ring space will be wasted.
> + *     Without this flag set, the ring size requested must be a power of 2,
> + *     and the usable space will be that size - 1.
> + * @return
> + *   On success, the pointer to the new allocated ring. NULL on error with
> + *    rte_errno set appropriately. Possible errno values include:
> + *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
> + *    - EINVAL - count provided is not a power of 2
> + *    - ENOSPC - the maximum number of memzones has already been allocated
> + *    - EEXIST - a memzone with the same name already exists
> + *    - ENOMEM - no appropriate memory area found in which to create memzone
> + */
> +struct rte_st_ring *rte_st_ring_create(const char *name, unsigned int count,
> +				 int socket_id, unsigned int flags);
> +
> +/**
> + * De-allocate all memory used by the ring.
> + *
> + * @param r
> + *   Ring to free.
> + *   If NULL then, the function does nothing.
> + */
> +void rte_st_ring_free(struct rte_st_ring *r);
> +
> +/**
> + * Dump the status of the ring to a file.
> + *
> + * @param f
> + *   A pointer to a file for output
> + * @param r
> + *   A pointer to the ring structure.
> + */
> +void rte_st_ring_dump(FILE *f, const struct rte_st_ring *r);
> +
> +/**
> + * Enqueue fixed number of objects on a ST ring.
> + *
> + * This function copies the objects at the head of the ring and
> + * moves the head index.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   The number of objects enqueued, either 0 or n
> + */
> +static __rte_always_inline unsigned int
> +rte_st_ring_enqueue_bulk(struct rte_st_ring *r, void * const *obj_table,
> +		      unsigned int n, unsigned int *free_space)
> +{
> +	return rte_st_ring_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> +			n, free_space);
> +}
> +
> +/**
> + * Enqueue upto a maximum number of objects on a ST ring.
> + *
> + * This function copies the objects at the head of the ring and
> + * moves the head index.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   - n: Actual number of objects enqueued.
> + */
> +static __rte_always_inline unsigned int
> +rte_st_ring_enqueue_burst(struct rte_st_ring *r, void * const *obj_table,
> +		      unsigned int n, unsigned int *free_space)
> +{
> +	return rte_st_ring_enqueue_burst_elem(r, obj_table, sizeof(void *),
> +			n, free_space);
> +}
> +
> +/**
> + * Enqueue one object on a ST ring.
> + *
> + * This function copies one object at the head of the ring and
> + * moves the head index.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj
> + *   A pointer to the object to be added.
> + * @return
> + *   - 0: Success; objects enqueued.
> + *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
> + */
> +static __rte_always_inline int
> +rte_st_ring_enqueue(struct rte_st_ring *r, void *obj)
> +{
> +	return rte_st_ring_enqueue_elem(r, &obj, sizeof(void *));
> +}
> +
> +/**
> + * Enqueue a fixed number of objects on a ST ring at the tail.
> + *
> + * This function copies the objects at the tail of the ring and
> + * moves the tail index (backwards).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   The number of objects enqueued, either 0 or n
> + */
> +static __rte_always_inline unsigned int
> +rte_st_ring_enqueue_at_tail_bulk(struct rte_st_ring *r,
> +				 void * const *obj_table, unsigned int n,
> +				 unsigned int *free_space)
> +{
> +	return rte_st_ring_enqueue_at_tail_bulk_elem(r, obj_table,
> +			sizeof(void *), n, free_space);
> +}
> +
> +/**
> + * Enqueue up to a maximum number of objects on a ST ring at the tail.
> + *
> + * This function copies the objects at the tail of the ring and
> + * moves the tail index (backwards).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   - n: Actual number of objects enqueued.
> + */
> +static __rte_always_inline unsigned int
> +rte_st_ring_enqueue_at_tail_burst(struct rte_st_ring *r,
> +				  void * const *obj_table, unsigned int n,
> +				  unsigned int *free_space)
> +{
> +	return rte_st_ring_enqueue_at_tail_burst_elem(r, obj_table,
> +			sizeof(void *), n, free_space);
> +}
> +
> +/**
> + * Enqueue one object on a ST ring at the tail.
> + *
> + * This function copies one object at the tail of the ring and
> + * moves the tail index (backwards).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj
> + *   A pointer to the object to be added.
> + * @return
> + *   - 0: Success; objects enqueued.
> + *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
> + */
> +static __rte_always_inline int
> +rte_st_ring_enqueue_at_tail(struct rte_st_ring *r, void *obj)
> +{
> +	return rte_st_ring_enqueue_at_tail_elem(r, &obj, sizeof(void *));
> +}
> +
> +/**
> + * Dequeue a fixed number of objects from a ST ring.
> + *
> + * This function copies the objects from the tail of the ring and
> + * moves the tail index.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   The number of objects dequeued, either 0 or n
> + */
> +static __rte_always_inline unsigned int
> +rte_st_ring_dequeue_bulk(struct rte_st_ring *r, void **obj_table, unsigned int n,
> +		unsigned int *available)
> +{
> +	return rte_st_ring_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> +			n, available);
> +}
> +
> +/**
> + * Dequeue up to a maximum number of objects from a ST ring.
> + *
> + * This function copies the objects from the tail of the ring and
> + * moves the tail index.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   - Number of objects dequeued
> + */
> +static __rte_always_inline unsigned int
> +rte_st_ring_dequeue_burst(struct rte_st_ring *r, void **obj_table,
> +		unsigned int n, unsigned int *available)
> +{
> +	return rte_st_ring_dequeue_burst_elem(r, obj_table, sizeof(void *),
> +			n, available);
> +}
> +
> +/**
> + * Dequeue one object from a ST ring.
> + *
> + * This function copies one object from the tail of the ring and
> + * moves the tail index.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_p
> + *   A pointer to a void * pointer (object) that will be filled.
> + * @return
> + *   - 0: Success, objects dequeued.
> + *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
> + *     dequeued.
> + */
> +static __rte_always_inline int
> +rte_st_ring_dequeue(struct rte_st_ring *r, void **obj_p)
> +{
> +	return rte_st_ring_dequeue_elem(r, obj_p, sizeof(void *));
> +}
> +
> +/**
> + * Dequeue a fixed number of objects from the head of a ST ring.
> + *
> + * This function copies the objects from the head of the ring and
> + * moves the head index (backwards).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   The number of objects dequeued, either 0 or n
> + */
> +static __rte_always_inline unsigned int
> +rte_st_ring_dequeue_at_head_bulk(struct rte_st_ring *r, void **obj_table, unsigned int n,
> +		unsigned int *available)
> +{
> +	return rte_st_ring_dequeue_at_head_bulk_elem(r, obj_table,
> +			sizeof(void *), n, available);
> +}
> +
> +/**
> + * Dequeue up to a maximum number of objects from the head of a ST ring.
> + *
> + * This function copies the objects from the head of the ring and
> + * moves the head index (backwards).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   - Number of objects dequeued
> + */
> +static __rte_always_inline unsigned int
> +rte_st_ring_dequeue_at_head_burst(struct rte_st_ring *r, void **obj_table,
> +		unsigned int n, unsigned int *available)
> +{
> +	return rte_st_ring_dequeue_at_head_burst_elem(r, obj_table,
> +			sizeof(void *), n, available);
> +}
> +
> +/**
> + * Dequeue one object from the head of a ST ring.
> + *
> + * This function copies one object from the head of the ring and
> + * moves the head index (backwards).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_p
> + *   A pointer to a void * pointer (object) that will be filled.
> + * @return
> + *   - 0: Success, objects dequeued.
> + *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
> + *     dequeued.
> + */
> +static __rte_always_inline int
> +rte_st_ring_dequeue_at_head(struct rte_st_ring *r, void **obj_p)
> +{
> +	return rte_st_ring_dequeue_at_head_elem(r, obj_p, sizeof(void *));
> +}
> +
> +/**
> + * Flush a ST ring.
> + *
> + * This function flushes all the elements in a ST ring.
> + *
> + * @warning
> + * Make sure the ring is not in use while calling this function.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + */
> +void
> +rte_st_ring_reset(struct rte_st_ring *r);
> +
> +/**
> + * Return the number of entries in a ST ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   The number of entries in the ring.
> + */
> +static inline unsigned int
> +rte_st_ring_count(const struct rte_st_ring *r)
> +{
> +	uint32_t count = (r->head - r->tail) & r->mask;
> +	return count;
> +}
> +
> +/**
> + * Return the number of free entries in a ST ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   The number of free entries in the ring.
> + */
> +static inline unsigned int
> +rte_st_ring_free_count(const struct rte_st_ring *r)
> +{
> +	return r->capacity - rte_st_ring_count(r);
> +}
> +
> +/**
> + * Test if a ST ring is full.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   - 1: The ring is full.
> + *   - 0: The ring is not full.
> + */
> +static inline int
> +rte_st_ring_full(const struct rte_st_ring *r)
> +{
> +	return rte_st_ring_free_count(r) == 0;
> +}
> +
> +/**
> + * Test if a ST ring is empty.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   - 1: The ring is empty.
> + *   - 0: The ring is not empty.
> + */
> +static inline int
> +rte_st_ring_empty(const struct rte_st_ring *r)
> +{
> +	return r->tail == r->head;
> +}
> +
> +/**
> + * Return the size of the ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   The size of the data store used by the ring.
> + *   NOTE: this is not the same as the usable space in the ring. To query that
> + *   use ``rte_st_ring_get_capacity()``.
> + */
> +static inline unsigned int
> +rte_st_ring_get_size(const struct rte_st_ring *r)
> +{
> +	return r->size;
> +}
> +
> +/**
> + * Return the number of elements which can be stored in the ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   The usable size of the ring.
> + */
> +static inline unsigned int
> +rte_st_ring_get_capacity(const struct rte_st_ring *r)
> +{
> +	return r->capacity;
> +}
> +
> +/**
> + * Dump the status of all rings on the console
> + *
> + * @param f
> + *   A pointer to a file for output
> + */
> +void rte_st_ring_list_dump(FILE *f);
> +
> +/**
> + * Search for a ST ring by its name
> + *
> + * @param name
> + *   The name of the ring.
> + * @return
> + *   The pointer to the ring matching the name, or NULL if not found,
> + *   with rte_errno set appropriately. Possible rte_errno values include:
> + *    - ENOENT - required entry not available to return.
> + */
> +struct rte_st_ring *rte_st_ring_lookup(const char *name);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_ST_RING_H_ */

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: [RFC] lib/st_ring: add single thread ring
  2023-08-21 21:14 ` Mattias Rönnblom
@ 2023-08-22  5:43   ` Honnappa Nagarahalli
  2023-08-22  8:04     ` Mattias Rönnblom
  0 siblings, 1 reply; 48+ messages in thread
From: Honnappa Nagarahalli @ 2023-08-22  5:43 UTC (permalink / raw)
  To: Mattias Rönnblom, jackmin, konstantin.v.ananyev
  Cc: dev, Ruifeng Wang, Aditya Ambadipudi,
	Wathsala Wathawana Vithanage, nd, nd



> -----Original Message-----
> From: Mattias Rönnblom <hofors@lysator.liu.se>
> Sent: Monday, August 21, 2023 4:14 PM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> jackmin@nvidia.com; konstantin.v.ananyev@yandex.ru
> Cc: dev@dpdk.org; Ruifeng Wang <Ruifeng.Wang@arm.com>; Aditya
> Ambadipudi <Aditya.Ambadipudi@arm.com>; Wathsala Wathawana
> Vithanage <wathsala.vithanage@arm.com>; nd <nd@arm.com>
> Subject: Re: [RFC] lib/st_ring: add single thread ring
> 
> On 2023-08-21 08:04, Honnappa Nagarahalli wrote:
> > Add a single thread safe and multi-thread unsafe ring data structure.
> 
> One must have set the bar very low, if one needs to specify that an API is
> single-thread safe.
> 
> > This library provides a simple and efficient alternative to
> > multi-thread safe ring when multi-thread safety is not required.
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > ---
> > v1:
> > 1) The code is very preliminary and is not even compiled
> > 2) This is intended to show the APIs and some thoughts on
> > implementation
> 
> If you haven't done it already, maybe it might be worth looking around in the
> code base for already-existing, more-or-less open-coded fifo/circular buffer
> type data structures. Just to make sure those can be eliminated if this makes
> it into DPDK.
> 
> There's one in rte_event_eth_rx_adapter.c, and I think one in the SW
> eventdev as well. Seems to be one in cmdline_cirbuf.h as well. I'm sure there
> are many more.
I know there are some, but I have not looked at them yet. I will look at them.

> 
> You could pick some other name for it, instead of the slightly awkward
> "st_ring" (e.g., "fifo", "cbuf", "cbuffer", "circ_buffer"). That would also leave
> you with more freedom to stray from the MT safe ring API without surprising
> the user, if needed (and I think it is needed).
The thought was to make it clear that this is for single-thread use (i.e. not even producer and consumer on different threads); maybe I do not need to try that hard.
"fifo" might not be good option given that dequeue/enqueue at both ends of the ring are required/allowed.
Wikipedia [1] and others [2], [3] indicate that this data structure should be called a 'deque' (pronounced 'deck'). I would prefer to go with this (assuming this will be outside of 'rte_ring').

[1] https://en.wikipedia.org/wiki/Double-ended_queue
[2] https://www.geeksforgeeks.org/deque-set-1-introduction-applications/
[3] https://stackoverflow.com/questions/3880254/why-do-we-need-deque-data-structures-in-the-real-world#:~:text=A%20Deque%20is%20a%20double,thing%20on%20front%20of%20queue.

> 
> Hopefully you can reduce API complexity compared to the MT-safe version.
> Having a name for these kinds of data structures doesn't make a lot of sense,
> for example. Skip the dump function. Relax from always_inline to just regular
> inline.
Yes, plan is to reduce complexity (compared to rte_ring) and some APIs can be skipped until there is a need.

> 
> I'm not sure you need bulk/burst type operations. Without any memory
> fences, an optimizing compiler should do a pretty good job of unrolling
> multiple-element access type operations, assuming you leave the ST ring
> code in the header files (otherwise LTO is needed).
IMO, bulk/burst APIs are about the functionality rather than loop unrolling. APIs to work with single objects can be skipped (use bulk APIs with n=1).

> 
> I think you will want a peek-type operation on the reader side. That more for
> convenience, rather than that I think the copies will actually be there in the
> object code (such should be eliminated by the compiler, given that the
> barriers are gone).
> 
> > 3) More APIs and the rest of the implementation will come in subsequent
> >     versions
> >
> >   lib/st_ring/rte_st_ring.h | 567
> ++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 567 insertions(+)
> >   create mode 100644 lib/st_ring/rte_st_ring.h
> >
> > diff --git a/lib/st_ring/rte_st_ring.h b/lib/st_ring/rte_st_ring.h new
> > file mode 100644 index 0000000000..8cb8832591
> > --- /dev/null
> > +++ b/lib/st_ring/rte_st_ring.h
> > @@ -0,0 +1,567 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2023 Arm Limited
> > + */
> > +
> > +#ifndef _RTE_ST_RING_H_
> > +#define _RTE_ST_RING_H_
> > +
> > +/**
> > + * @file
> > + * RTE Single Thread Ring (ST Ring)
> > + *
> > + * The ST Ring is a fixed-size queue intended to be accessed
> > + * by one thread at a time. It does not provide concurrent access to
> > + * multiple threads. If there are multiple threads accessing the ST
> > +ring,
> > + * then the threads have to use locks to protect the ring from
> > + * getting corrupted.
> 
> You are basically saying the same thing three times here.
> 
> > + *
> > + * - FIFO (First In First Out)
> > + * - Maximum size is fixed; the pointers are stored in a table.
> > + * - Consumer and producer part of same thread.
> > + * - Multi-thread producers and consumers need locking.
> 
> ...two more times here. One might get the impression you really don't trust
> the reader.
> 
> > + * - Single/Bulk/burst dequeue at Tail or Head
> > + * - Single/Bulk/burst enqueue at Head or Tail
> 
> Does this not sound more like a deque, than a FIFO/circular buffer? Are there
> any examples where this functionality (the double-endedness) is needed in
> the DPDK code base?
I see, you are calling it 'deque' as well. Basically, this patch originated due to a requirement in MLX PMD [1]

[1] https://github.com/DPDK/dpdk/blob/main/drivers/net/mlx5/mlx5_hws_cnt.h#L381

> 
> > + *
> > + */
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include <rte_st_ring_core.h>
> > +#include <rte_st_ring_elem.h>
> 
> Is the intention to provide a ring with compile-time variable element size? In
> other words, where the elements of a particular ring instance has the same
> element size, but different rings may have different element sizes.
> 
> Seems like a good idea to me, in that case. Although often you will have
> pointers, it would be useful to store larger things like small structs, and
> maybe smaller elements as well.
Yes, the idea is to make the element size flexible across rings, while keeping it a compile-time constant for a given ring.

> 
> > +
> > +/**
> > + * Calculate the memory size needed for a ST ring
> > + *
> > + * This function returns the number of bytes needed for a ST ring,
> > +given
> > + * the number of elements in it. This value is the sum of the size of
> > + * the structure rte_st_ring and the size of the memory needed by the
> > + * elements. The value is aligned to a cache line size.
> > + *
> > + * @param count
> > + *   The number of elements in the ring (must be a power of 2).
> > + * @return
> > + *   - The memory size needed for the ST ring on success.
> > + *   - -EINVAL if count is not a power of 2.
> > + */
> > +ssize_t rte_st_ring_get_memsize(unsigned int count);
> > +
> > +/**
> > + * Initialize a ST ring structure.
> > + *
> > + * Initialize a ST ring structure in memory pointed by "r". The size
> > +of the
> > + * memory area must be large enough to store the ring structure and
> > +the
> > + * object table. It is advised to use rte_st_ring_get_memsize() to
> > +get the
> > + * appropriate size.
> > + *
> > + * The ST ring size is set to *count*, which must be a power of two.
> > + * The real usable ring size is *count-1* instead of *count* to
> > + * differentiate a full ring from an empty ring.
> > + *
> > + * The ring is not added in RTE_TAILQ_ST_RING global list. Indeed,
> > +the
> > + * memory given by the caller may not be shareable among dpdk
> > + * processes.
> > + *
> > + * @param r
> > + *   The pointer to the ring structure followed by the elements table.
> > + * @param name
> > + *   The name of the ring.
> > + * @param count
> > + *   The number of elements in the ring (must be a power of 2,
> > + *   unless RTE_ST_RING_F_EXACT_SZ is set in flags).
> > + * @param flags
> > + *   An OR of the following:
> > + *   - RTE_ST_RING_F_EXACT_SZ: If this flag is set, the ring will hold
> > + *     exactly the requested number of entries, and the requested size
> > + *     will be rounded up to the next power of two, but the usable space
> > + *     will be exactly that requested. Worst case, if a power-of-2 size is
> > + *     requested, half the ring space will be wasted.
> > + *     Without this flag set, the ring size requested must be a power of 2,
> > + *     and the usable space will be that size - 1.
> > + * @return
> > + *   0 on success, or a negative value on error.
> > + */
> > +int rte_st_ring_init(struct rte_st_ring *r, const char *name,
> > +	unsigned int count, unsigned int flags);
> > +
> > +/**
> > + * Create a new ST ring named *name* in memory.
> > + *
> > + * This function uses ``memzone_reserve()`` to allocate memory. Then
> > +it
> > + * calls rte_st_ring_init() to initialize an empty ring.
> > + *
> > + * The new ring size is set to *count*, which must be a power of two.
> > + * The real usable ring size is *count-1* instead of *count* to
> > + * differentiate a full ring from an empty ring.
> > + *
> > + * The ring is added in RTE_TAILQ_ST_RING list.
> > + *
> > + * @param name
> > + *   The name of the ring.
> > + * @param count
> > + *   The size of the ring (must be a power of 2,
> > + *   unless RTE_ST_RING_F_EXACT_SZ is set in flags).
> > + * @param socket_id
> > + *   The *socket_id* argument is the socket identifier in case of
> > + *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
> > + *   constraint for the reserved zone.
> > + * @param flags
> > + *   - RTE_ST_RING_F_EXACT_SZ: If this flag is set, the ring will hold exactly
> the
> > + *     requested number of entries, and the requested size will be rounded
> up
> > + *     to the next power of two, but the usable space will be exactly that
> > + *     requested. Worst case, if a power-of-2 size is requested, half the
> > + *     ring space will be wasted.
> > + *     Without this flag set, the ring size requested must be a power of 2,
> > + *     and the usable space will be that size - 1.
> > + * @return
> > + *   On success, the pointer to the new allocated ring. NULL on error with
> > + *    rte_errno set appropriately. Possible errno values include:
> > + *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config
> structure
> > + *    - EINVAL - count provided is not a power of 2
> > + *    - ENOSPC - the maximum number of memzones has already been
> allocated
> > + *    - EEXIST - a memzone with the same name already exists
> > + *    - ENOMEM - no appropriate memory area found in which to create
> memzone
> > + */
> > +struct rte_st_ring *rte_st_ring_create(const char *name, unsigned int
> count,
> > +				 int socket_id, unsigned int flags);
> > +
> > +/**
> > + * De-allocate all memory used by the ring.
> > + *
> > + * @param r
> > + *   Ring to free.
> > + *   If NULL then, the function does nothing.
> > + */
> > +void rte_st_ring_free(struct rte_st_ring *r);
> > +
> > +/**
> > + * Dump the status of the ring to a file.
> > + *
> > + * @param f
> > + *   A pointer to a file for output
> > + * @param r
> > + *   A pointer to the ring structure.
> > + */
> > +void rte_st_ring_dump(FILE *f, const struct rte_st_ring *r);
> > +
> > +/**
> > + * Enqueue fixed number of objects on a ST ring.
> > + *
> > + * This function copies the objects at the head of the ring and
> > + * moves the head index.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to add in the ring from the obj_table.
> > + * @param free_space
> > + *   if non-NULL, returns the amount of space in the ring after the
> > + *   enqueue operation has finished.
> > + * @return
> > + *   The number of objects enqueued, either 0 or n
> > + */
> > +static __rte_always_inline unsigned int
> > +rte_st_ring_enqueue_bulk(struct rte_st_ring *r, void * const *obj_table,
> > +		      unsigned int n, unsigned int *free_space) {
> > +	return rte_st_ring_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> > +			n, free_space);
> > +}
> > +
> > +/**
> > + * Enqueue up to a maximum number of objects on a ST ring.
> > + *
> > + * This function copies the objects at the head of the ring and
> > + * moves the head index.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to add in the ring from the obj_table.
> > + * @param free_space
> > + *   if non-NULL, returns the amount of space in the ring after the
> > + *   enqueue operation has finished.
> > + * @return
> > + *   - n: Actual number of objects enqueued.
> > + */
> > +static __rte_always_inline unsigned int
> > +rte_st_ring_enqueue_burst(struct rte_st_ring *r, void * const *obj_table,
> > +		      unsigned int n, unsigned int *free_space) {
> > +	return rte_st_ring_enqueue_burst_elem(r, obj_table, sizeof(void *),
> > +			n, free_space);
> > +}
> > +
> > +/**
> > + * Enqueue one object on a ST ring.
> > + *
> > + * This function copies one object at the head of the ring and
> > + * moves the head index.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj
> > + *   A pointer to the object to be added.
> > + * @return
> > + *   - 0: Success; objects enqueued.
> > + *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is
> enqueued.
> > + */
> > +static __rte_always_inline int
> > +rte_st_ring_enqueue(struct rte_st_ring *r, void *obj) {
> > +	return rte_st_ring_enqueue_elem(r, &obj, sizeof(void *)); }
> > +
> > +/**
> > + * Enqueue fixed number of objects on a ST ring at the tail.
> > + *
> > + * This function copies the objects at the tail of the ring and
> > + * moves the tail index (backwards).
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to add in the ring from the obj_table.
> > + * @param free_space
> > + *   if non-NULL, returns the amount of space in the ring after the
> > + *   enqueue operation has finished.
> > + * @return
> > + *   The number of objects enqueued, either 0 or n
> > + */
> > +static __rte_always_inline unsigned int
> > +rte_st_ring_enqueue_at_tail_bulk(struct rte_st_ring *r,
> > +				 void * const *obj_table, unsigned int n,
> > +				 unsigned int *free_space)
> > +{
> > +	return rte_st_ring_enqueue_at_tail_bulk_elem(r, obj_table,
> > +			sizeof(void *), n, free_space);
> > +}
> > +
> > +/**
> > + * Enqueue up to a maximum number of objects on a ST ring at the tail.
> > + *
> > + * This function copies the objects at the tail of the ring and
> > + * moves the tail index (backwards).
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to add in the ring from the obj_table.
> > + * @param free_space
> > + *   if non-NULL, returns the amount of space in the ring after the
> > + *   enqueue operation has finished.
> > + * @return
> > + *   - n: Actual number of objects enqueued.
> > + */
> > +static __rte_always_inline unsigned int
> > +rte_st_ring_enqueue_at_tail_burst(struct rte_st_ring *r,
> > +				  void * const *obj_table, unsigned int n,
> > +				  unsigned int *free_space)
> > +{
> > +	return rte_st_ring_enqueue_at_tail_burst_elem(r, obj_table,
> > +			sizeof(void *), n, free_space);
> > +}
> > +
> > +/**
> > + * Enqueue one object on a ST ring at tail.
> > + *
> > + * This function copies one object at the tail of the ring and
> > + * moves the tail index (backwards).
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj
> > + *   A pointer to the object to be added.
> > + * @return
> > + *   - 0: Success; objects enqueued.
> > + *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is
> enqueued.
> > + */
> > +static __rte_always_inline int
> > +rte_st_ring_enqueue_at_tail(struct rte_st_ring *r, void *obj) {
> > +	return rte_st_ring_enqueue_at_tail_elem(r, &obj, sizeof(void *)); }
> > +
> > +/**
> > + * Dequeue a fixed number of objects from a ST ring.
> > + *
> > + * This function copies the objects from the tail of the ring and
> > + * moves the tail index.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects) that will be filled.
> > + * @param n
> > + *   The number of objects to dequeue from the ring to the obj_table.
> > + * @param available
> > + *   If non-NULL, returns the number of remaining ring entries after the
> > + *   dequeue has finished.
> > + * @return
> > + *   The number of objects dequeued, either 0 or n
> > + */
> > +static __rte_always_inline unsigned int
> > +rte_st_ring_dequeue_bulk(struct rte_st_ring *r, void **obj_table,
> unsigned int n,
> > +		unsigned int *available)
> > +{
> > +	return rte_st_ring_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> > +			n, available);
> > +}
> > +
> > +/**
> > + * Dequeue up to a maximum number of objects from a ST ring.
> > + *
> > + * This function copies the objects from the tail of the ring and
> > + * moves the tail index.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects) that will be filled.
> > + * @param n
> > + *   The number of objects to dequeue from the ring to the obj_table.
> > + * @param available
> > + *   If non-NULL, returns the number of remaining ring entries after the
> > + *   dequeue has finished.
> > + * @return
> > + *   - Number of objects dequeued
> > + */
> > +static __rte_always_inline unsigned int
> > +rte_st_ring_dequeue_burst(struct rte_st_ring *r, void **obj_table,
> > +		unsigned int n, unsigned int *available) {
> > +	return rte_st_ring_dequeue_burst_elem(r, obj_table, sizeof(void *),
> > +			n, available);
> > +}
> > +
> > +/**
> > + * Dequeue one object from a ST ring.
> > + *
> > + * This function copies one object from the tail of the ring and
> > + * moves the tail index.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_p
> > + *   A pointer to a void * pointer (object) that will be filled.
> > + * @return
> > + *   - 0: Success, objects dequeued.
> > + *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
> > + *     dequeued.
> > + */
> > +static __rte_always_inline int
> > +rte_st_ring_dequeue(struct rte_st_ring *r, void **obj_p) {
> > +	return rte_st_ring_dequeue_elem(r, obj_p, sizeof(void *)); }
> > +
> > +/**
> > + * Dequeue a fixed number of objects from a ST ring from the head.
> > + *
> > + * This function copies the objects from the head of the ring and
> > + * moves the head index (backwards).
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects) that will be filled.
> > + * @param n
> > + *   The number of objects to dequeue from the ring to the obj_table.
> > + * @param available
> > + *   If non-NULL, returns the number of remaining ring entries after the
> > + *   dequeue has finished.
> > + * @return
> > + *   The number of objects dequeued, either 0 or n
> > + */
> > +static __rte_always_inline unsigned int
> > +rte_st_ring_dequeue_at_head_bulk(struct rte_st_ring *r, void
> **obj_table, unsigned int n,
> > +		unsigned int *available)
> > +{
> > +	return rte_st_ring_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> > +			n, available);
> > +}
> > +
> > +/**
> > + * Dequeue up to a maximum number of objects from a ST ring from the
> head.
> > + *
> > + * This function copies the objects from the head of the ring and
> > + * moves the head index (backwards).
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects) that will be filled.
> > + * @param n
> > + *   The number of objects to dequeue from the ring to the obj_table.
> > + * @param available
> > + *   If non-NULL, returns the number of remaining ring entries after the
> > + *   dequeue has finished.
> > + * @return
> > + *   - Number of objects dequeued
> > + */
> > +static __rte_always_inline unsigned int
> > +rte_st_ring_dequeue_at_head_burst(struct rte_st_ring *r, void
> **obj_table,
> > +		unsigned int n, unsigned int *available) {
> > +	return rte_st_ring_dequeue_burst_elem(r, obj_table, sizeof(void *),
> > +			n, available);
> > +}
> > +
> > +/**
> > + * Dequeue one object from a ST ring from the head.
> > + *
> > + * This function copies the objects from the head of the ring and
> > + * moves the head index (backwards).
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_p
> > + *   A pointer to a void * pointer (object) that will be filled.
> > + * @return
> > + *   - 0: Success, object dequeued.
> > + *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
> > + *     dequeued.
> > + */
> > +static __rte_always_inline int
> > +rte_st_ring_at_head_dequeue(struct rte_st_ring *r, void **obj_p) {
> > +	return rte_st_ring_at_head_dequeue_elem(r, obj_p, sizeof(void *)); }
> > +
> > +/**
> > + * Flush a ST ring.
> > + *
> > + * This function flushes all the elements in an ST ring.
> > + *
> > + * @warning
> > + * Make sure the ring is not in use while calling this function.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + */
> > +void
> > +rte_st_ring_reset(struct rte_st_ring *r);
> > +
> > +/**
> > + * Return the number of entries in a ST ring.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @return
> > + *   The number of entries in the ring.
> > + */
> > +static inline unsigned int
> > +rte_st_ring_count(const struct rte_st_ring *r) {
> > +	uint32_t count = (r->head - r->tail) & r->mask;
> > +	return count;
> > +}
> > +
> > +/**
> > + * Return the number of free entries in a ST ring.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @return
> > + *   The number of free entries in the ring.
> > + */
> > +static inline unsigned int
> > +rte_st_ring_free_count(const struct rte_st_ring *r) {
> > +	return r->capacity - rte_st_ring_count(r); }
> > +
> > +/**
> > + * Test if a ST ring is full.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @return
> > + *   - 1: The ring is full.
> > + *   - 0: The ring is not full.
> > + */
> > +static inline int
> > +rte_st_ring_full(const struct rte_st_ring *r) {
> > +	return rte_st_ring_free_count(r) == 0; }
> > +
> > +/**
> > + * Test if a ST ring is empty.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @return
> > + *   - 1: The ring is empty.
> > + *   - 0: The ring is not empty.
> > + */
> > +static inline int
> > +rte_st_ring_empty(const struct rte_st_ring *r) {
> > +	return r->tail == r->head;
> > +}
> > +
> > +/**
> > + * Return the size of the ring.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @return
> > + *   The size of the data store used by the ring.
> > + *   NOTE: this is not the same as the usable space in the ring. To query that
> > + *   use ``rte_st_ring_get_capacity()``.
> > + */
> > +static inline unsigned int
> > +rte_st_ring_get_size(const struct rte_st_ring *r) {
> > +	return r->size;
> > +}
> > +
> > +/**
> > + * Return the number of elements which can be stored in the ring.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @return
> > + *   The usable size of the ring.
> > + */
> > +static inline unsigned int
> > +rte_st_ring_get_capacity(const struct rte_st_ring *r) {
> > +	return r->capacity;
> > +}
> > +
> > +/**
> > + * Dump the status of all rings on the console
> > + *
> > + * @param f
> > + *   A pointer to a file for output
> > + */
> > +void rte_st_ring_list_dump(FILE *f);
> > +
> > +/**
> > + * Search a ST ring from its name
> > + *
> > + * @param name
> > + *   The name of the ring.
> > + * @return
> > + *   The pointer to the ring matching the name, or NULL if not found,
> > + *   with rte_errno set appropriately. Possible rte_errno values include:
> > + *    - ENOENT - required entry not available to return.
> > + */
> > +struct rte_st_ring *rte_st_ring_lookup(const char *name);
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_ST_RING_H_ */

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: [RFC] lib/st_ring: add single thread ring
  2023-08-21  7:37 ` Morten Brørup
@ 2023-08-22  5:47   ` Honnappa Nagarahalli
  2023-08-24  8:05     ` Morten Brørup
  0 siblings, 1 reply; 48+ messages in thread
From: Honnappa Nagarahalli @ 2023-08-22  5:47 UTC (permalink / raw)
  To: Morten Brørup, jackmin, konstantin.v.ananyev
  Cc: dev, Ruifeng Wang, Aditya Ambadipudi,
	Wathsala Wathawana Vithanage, nd, nd



> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Monday, August 21, 2023 2:37 AM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> jackmin@nvidia.com; konstantin.v.ananyev@yandex.ru
> Cc: dev@dpdk.org; Ruifeng Wang <Ruifeng.Wang@arm.com>; Aditya
> Ambadipudi <Aditya.Ambadipudi@arm.com>; Wathsala Wathawana Vithanage
> <wathsala.vithanage@arm.com>; nd <nd@arm.com>
> Subject: RE: [RFC] lib/st_ring: add single thread ring
> 
> > From: Honnappa Nagarahalli [mailto:honnappa.nagarahalli@arm.com]
> > Sent: Monday, 21 August 2023 08.04
> >
> > Add a single thread safe and multi-thread unsafe ring data structure.
> > This library provides a simple and efficient alternative to the multi-
> > thread safe ring when multi-thread safety is not required.
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > ---
> 
> Good idea.
> 
> However, I prefer it to be implemented in the ring lib as one more ring type.
> That would also give us a lot of the infrastructure (management functions,
> documentation and tests) for free.
IMO, the current rte_ring code is already complex, with its C11 and generic implementations, separate APIs for pointer objects vs. flexible element sizes, etc. I did not want to introduce one more flavor and add to that complexity.
The requirements are different as well. For example, the single-thread ring needs APIs for dequeuing and enqueuing at both ends of the ring, which do not apply to the existing RTE ring.

But I see how the existing infra can be reused easily.
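To make the reuse argument concrete, here is a minimal sketch of the part both flavors share, the power-of-two head/tail index arithmetic. The names here are illustrative toys, not the proposed DPDK API; the point is that the single-thread flavor is just this arithmetic with plain loads and stores, with no atomics or fences:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u                 /* must be a power of two */
#define TOY_RING_MASK (TOY_RING_SIZE - 1u)

/* Toy single-thread ring: one thread owns both indices, so plain
 * loads/stores suffice -- no __atomic builtins, no memory fences. */
struct toy_st_ring {
	uint32_t head;                   /* producer index */
	uint32_t tail;                   /* consumer index */
	void *slots[TOY_RING_SIZE];
};

/* Usable capacity is SIZE - 1 so that a full ring and an empty ring
 * can be told apart, as in rte_ring. */
static unsigned int toy_count(const struct toy_st_ring *r)
{
	return (r->head - r->tail) & TOY_RING_MASK;
}

static int toy_enqueue(struct toy_st_ring *r, void *obj)
{
	if (toy_count(r) == TOY_RING_MASK)
		return -1;               /* full */
	r->slots[r->head & TOY_RING_MASK] = obj;
	r->head++;                       /* plain store, no release fence */
	return 0;
}

static int toy_dequeue(struct toy_st_ring *r, void **obj)
{
	if (r->head == r->tail)
		return -1;               /* empty */
	*obj = r->slots[r->tail & TOY_RING_MASK];
	r->tail++;
	return 0;
}
```

Everything else (memzone-backed creation, lookup by name, dump, the elem-size machinery) is the kind of infra that could come from the existing ring lib.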

> 
> The ring lib already has performance-optimized APIs for single-consumer and
> single-producer use, rte_ring_sc_dequeue_bulk() and
> rte_ring_sp_enqueue_burst(). Similar performance-optimized APIs for single-
> thread use could be added: rte_ring_st_dequeue_bulk() and
> rte_ring_st_enqueue_burst().
Yes, the names look fine.
Looking through the code, we have the sync type enum:

/** prod/cons sync types */
enum rte_ring_sync_type {
        RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
        RTE_RING_SYNC_ST,     /**< single thread only */
        RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
        RTE_RING_SYNC_MT_HTS, /**< multi-thread head/tail sync */
};

The type RTE_RING_SYNC_ST needs a better explanation (not a problem). But this name would have been ideal for the single-thread ring.
This enum does not need to be exposed to the users. However, APIs such as rte_ring_get_prod/cons_sync_type seem to be exposed to the user.
All this means we need a sync type named RTE_RING_SYNC_MT_UNSAFE (any better name?), which then affects API naming: rte_ring_mt_unsafe_dequeue_bulk?
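A hypothetical sketch of that direction (placeholder names, nothing agreed upon): the enum grows one value, and ring code can branch on it to skip the producer/consumer ordering entirely:

```c
#include <assert.h>

/* Hypothetical extension -- mirrors the layout of rte_ring_sync_type,
 * with one proposed new value at the end. */
enum toy_ring_sync_type {
	TOY_RING_SYNC_MT,        /* multi-thread safe (default mode) */
	TOY_RING_SYNC_ST,        /* single producer and single consumer */
	TOY_RING_SYNC_MT_RTS,    /* multi-thread relaxed tail sync */
	TOY_RING_SYNC_MT_HTS,    /* multi-thread head/tail sync */
	TOY_RING_SYNC_MT_UNSAFE, /* proposed: one thread owns both ends */
};

/* Even the ST mode needs release/acquire ordering between the producer
 * thread and the consumer thread; only an MT_UNSAFE flavor could use
 * plain loads and stores throughout. */
static int toy_needs_ordering(enum toy_ring_sync_type st)
{
	return st != TOY_RING_SYNC_MT_UNSAFE;
}
```

This also illustrates why RTE_RING_SYNC_ST cannot simply be repurposed: it still implies two threads, one per end.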

> 
> Regardless if added to the ring lib or as a separate lib, "reverse" APIs (for single-
> thread use only) and zero-copy APIs can be added at any time later.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] lib/st_ring: add single thread ring
  2023-08-22  5:43   ` Honnappa Nagarahalli
@ 2023-08-22  8:04     ` Mattias Rönnblom
  2023-08-22 16:28       ` Honnappa Nagarahalli
  0 siblings, 1 reply; 48+ messages in thread
From: Mattias Rönnblom @ 2023-08-22  8:04 UTC (permalink / raw)
  To: Honnappa Nagarahalli, jackmin, konstantin.v.ananyev
  Cc: dev, Ruifeng Wang, Aditya Ambadipudi, Wathsala Wathawana Vithanage, nd

On 2023-08-22 07:43, Honnappa Nagarahalli wrote:
> 
> 
>> -----Original Message-----
>> From: Mattias Rönnblom <hofors@lysator.liu.se>
>> Sent: Monday, August 21, 2023 4:14 PM
>> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
>> jackmin@nvidia.com; konstantin.v.ananyev@yandex.ru
>> Cc: dev@dpdk.org; Ruifeng Wang <Ruifeng.Wang@arm.com>; Aditya
>> Ambadipudi <Aditya.Ambadipudi@arm.com>; Wathsala Wathawana
>> Vithanage <wathsala.vithanage@arm.com>; nd <nd@arm.com>
>> Subject: Re: [RFC] lib/st_ring: add single thread ring
>>
>> On 2023-08-21 08:04, Honnappa Nagarahalli wrote:
>>> Add a single thread safe and multi-thread unsafe ring data structure.
>>
>> One must have set the bar very low, if one needs to specify that an API is
>> single-thread safe.
>>
>> This library provides a simple and efficient alternative to
>>> multi-thread safe ring when multi-thread safety is not required.
>>>
>>> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>>> ---
>>> v1:
>>> 1) The code is very preliminary and is not even compiled
>>> 2) This is intended to show the APIs and some thoughts on
>>> implementation
>>
>> If you haven't done it already, maybe it might be worth looking around in the
>> code base for already-existing, more-or-less open-coded fifo/circular buffer
>> type data structures. Just to make sure those can be eliminated if this makes
>> it into DPDK.
>>
>> There's one in rte_event_eth_rx_adapter.c, and I think one in the SW
>> eventdev as well. Seems to be one in cmdline_cirbuf.h as well. I'm sure there
>> are many more.
> I knew there are some, but have not looked at them yet. I will look at them.
> 
>>
>> You could pick some other name for it, instead of the slightly awkward
>> "st_ring" (e.g., "fifo", "cbuf", "cbuffer", "circ_buffer"). That would also leave
>> you with more freedom to stray from the MT safe ring API without surprising
>> the user, if needed (and I think it is needed).
> The thought was to make it clear that this is for single-thread use (i.e., not even the producer and consumer on different threads); maybe I do not need to try so hard.
> "fifo" might not be a good option given that dequeue/enqueue at both ends of the ring are required/allowed.
> Wikipedia [1] and others [2], [3] indicate that this data structure should be called a 'deque' (pronounced 'deck'). I would prefer to go with this (assuming it will live outside of 'rte_ring').
> 
> [1] https://en.wikipedia.org/wiki/Double-ended_queue
> [2] https://www.geeksforgeeks.org/deque-set-1-introduction-applications/
> [3] https://stackoverflow.com/questions/3880254/why-do-we-need-deque-data-structures-in-the-real-world#:~:text=A%20Deque%20is%20a%20double,thing%20on%20front%20of%20queue.
> 
>>
>> Hopefully you can reduce API complexity compared to the MT-safe version.
>> Having a name for these kinds of data structures doesn't make a lot of sense,
>> for example. Skip the dump function. Relax from always_inline to just regular
>> inline.
> Yes, the plan is to reduce complexity (compared to rte_ring), and some APIs can be skipped until there is a need.
> 
>>
>> I'm not sure you need bulk/burst type operations. Without any memory
>> fences, an optimizing compiler should do a pretty good job of unrolling
>> multiple-element access type operations, assuming you leave the ST ring
>> code in the header files (otherwise LTO is needed).
> IMO, bulk/burst APIs are about the functionality rather than loop unrolling. APIs to work with single objects can be skipped (use bulk APIs with n=1).
> 

Given that this data structure will often be used in conjunction with 
other burst/bulk type operations, I agree.

What about peek? I guess you could have a burst/bulk peek as well, should 
that operation be needed. I think it will be needed, but the 
introduction of such API elements can always be deferred.

>>
>> I think you will want a peek-type operation on the reader side. That more for
>> convenience, rather than that I think the copies will actually be there in the
>> object code (such should be eliminated by the compiler, given that the
>> barriers are gone).
>>
>>> 3) More APIs and the rest of the implementation will come in subsequent
>>>      versions
>>>
>>>    lib/st_ring/rte_st_ring.h | 567
>> ++++++++++++++++++++++++++++++++++++++
>>>    1 file changed, 567 insertions(+)
>>>    create mode 100644 lib/st_ring/rte_st_ring.h
>>>
>>> diff --git a/lib/st_ring/rte_st_ring.h b/lib/st_ring/rte_st_ring.h new
>>> file mode 100644 index 0000000000..8cb8832591
>>> --- /dev/null
>>> +++ b/lib/st_ring/rte_st_ring.h
>>> @@ -0,0 +1,567 @@
>>> +/* SPDX-License-Identifier: BSD-3-Clause
>>> + * Copyright(c) 2023 Arm Limited
>>> + */
>>> +
>>> +#ifndef _RTE_ST_RING_H_
>>> +#define _RTE_ST_RING_H_
>>> +
>>> +/**
>>> + * @file
>>> + * RTE Single Thread Ring (ST Ring)
>>> + *
>>> + * The ST Ring is a fixed-size queue intended to be accessed
>>> + * by one thread at a time. It does not provide concurrent access to
>>> + * multiple threads. If there are multiple threads accessing the ST
>>> +ring,
>>> + * then the threads have to use locks to protect the ring from
>>> + * getting corrupted.
>>
>> You are basically saying the same thing three times here.
>>
>>> + *
>>> + * - FIFO (First In First Out)
>>> + * - Maximum size is fixed; the pointers are stored in a table.
>>> + * - Consumer and producer part of same thread.
>>> + * - Multi-thread producers and consumers need locking.
>>
>> ...two more times here. One might get the impression you really don't trust
>> the reader.
>>
>>> + * - Single/Bulk/burst dequeue at Tail or Head
>>> + * - Single/Bulk/burst enqueue at Head or Tail
>>
>> Does this not sound more like a deque, than a FIFO/circular buffer? Are there
>> any examples where this functionality (the double-endedness) is needed in
>> the DPDK code base?
> I see, you are calling it 'deque' as well. Basically, this patch originated due to a requirement in MLX PMD [1]
> 
> [1] https://github.com/DPDK/dpdk/blob/main/drivers/net/mlx5/mlx5_hws_cnt.h#L381
> 
>>
>>> + *
>>> + */
>>> +
>>> +#ifdef __cplusplus
>>> +extern "C" {
>>> +#endif
>>> +
>>> +#include <rte_st_ring_core.h>
>>> +#include <rte_st_ring_elem.h>
>>
>> Is the intention to provide a ring with compile-time variable element size? In
>> other words, where the elements of a particular ring instance has the same
>> element size, but different rings may have different element sizes.
>>
>> Seems like a good idea to me, in that case. Although often you will have
>> pointers, it would be useful to store larger things like small structs, and
>> maybe smaller elements as well.
> Yes, the idea is to make the element size flexible per ring, while keeping it a compile-time constant.
> 
>>
>>> +
>>> +/**
>>> + * Calculate the memory size needed for a ST ring
>>> + *
>>> + * This function returns the number of bytes needed for a ST ring,
>>> +given
>>> + * the number of elements in it. This value is the sum of the size of
>>> + * the structure rte_st_ring and the size of the memory needed by the
>>> + * elements. The value is aligned to a cache line size.
>>> + *
>>> + * @param count
>>> + *   The number of elements in the ring (must be a power of 2).
>>> + * @return
>>> + *   - The memory size needed for the ST ring on success.
>>> + *   - -EINVAL if count is not a power of 2.
>>> + */
>>> +ssize_t rte_st_ring_get_memsize(unsigned int count);
>>> +
>>> +/**
>>> + * Initialize a ST ring structure.
>>> + *
>>> + * Initialize a ST ring structure in memory pointed by "r". The size
>>> +of the
>>> + * memory area must be large enough to store the ring structure and
>>> +the
>>> + * object table. It is advised to use rte_st_ring_get_memsize() to
>>> +get the
>>> + * appropriate size.
>>> + *
>>> + * The ST ring size is set to *count*, which must be a power of two.
>>> + * The real usable ring size is *count-1* instead of *count* to
>>> + * differentiate a full ring from an empty ring.
>>> + *
>>> + * The ring is not added in RTE_TAILQ_ST_RING global list. Indeed,
>>> +the
>>> + * memory given by the caller may not be shareable among dpdk
>>> + * processes.
>>> + *
>>> + * @param r
>>> + *   The pointer to the ring structure followed by the elements table.
>>> + * @param name
>>> + *   The name of the ring.
>>> + * @param count
>>> + *   The number of elements in the ring (must be a power of 2,
>>> + *   unless RTE_ST_RING_F_EXACT_SZ is set in flags).
>>> + * @param flags
>>> + *   An OR of the following:
>>> + *   - RTE_ST_RING_F_EXACT_SZ: If this flag is set, the ring will hold
>>> + *     exactly the requested number of entries, and the requested size
>>> + *     will be rounded up to the next power of two, but the usable space
>>> + *     will be exactly that requested. Worst case, if a power-of-2 size is
>>> + *     requested, half the ring space will be wasted.
>>> + *     Without this flag set, the ring size requested must be a power of 2,
>>> + *     and the usable space will be that size - 1.
>>> + * @return
>>> + *   0 on success, or a negative value on error.
>>> + */
>>> +int rte_st_ring_init(struct rte_st_ring *r, const char *name,
>>> +	unsigned int count, unsigned int flags);
>>> +
>>> +/**
>>> + * Create a new ST ring named *name* in memory.
>>> + *
>>> + * This function uses ``memzone_reserve()`` to allocate memory. Then
>>> +it
>>> + * calls rte_st_ring_init() to initialize an empty ring.
>>> + *
>>> + * The new ring size is set to *count*, which must be a power of two.
>>> + * The real usable ring size is *count-1* instead of *count* to
>>> + * differentiate a full ring from an empty ring.
>>> + *
>>> + * The ring is added in RTE_TAILQ_ST_RING list.
>>> + *
>>> + * @param name
>>> + *   The name of the ring.
>>> + * @param count
>>> + *   The size of the ring (must be a power of 2,
>>> + *   unless RTE_ST_RING_F_EXACT_SZ is set in flags).
>>> + * @param socket_id
>>> + *   The *socket_id* argument is the socket identifier in case of
>>> + *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
>>> + *   constraint for the reserved zone.
>>> + * @param flags
>>> + *   - RTE_ST_RING_F_EXACT_SZ: If this flag is set, the ring will hold exactly
>> the
>>> + *     requested number of entries, and the requested size will be rounded
>> up
>>> + *     to the next power of two, but the usable space will be exactly that
>>> + *     requested. Worst case, if a power-of-2 size is requested, half the
>>> + *     ring space will be wasted.
>>> + *     Without this flag set, the ring size requested must be a power of 2,
>>> + *     and the usable space will be that size - 1.
>>> + * @return
>>> + *   On success, the pointer to the new allocated ring. NULL on error with
>>> + *    rte_errno set appropriately. Possible errno values include:
>>> + *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config
>> structure
>>> + *    - EINVAL - count provided is not a power of 2
>>> + *    - ENOSPC - the maximum number of memzones has already been
>> allocated
>>> + *    - EEXIST - a memzone with the same name already exists
>>> + *    - ENOMEM - no appropriate memory area found in which to create
>> memzone
>>> + */
>>> +struct rte_st_ring *rte_st_ring_create(const char *name, unsigned int
>> count,
>>> +				 int socket_id, unsigned int flags);
>>> +
>>> +/**
>>> + * De-allocate all memory used by the ring.
>>> + *
>>> + * @param r
>>> + *   Ring to free.
>>> + *   If NULL then, the function does nothing.
>>> + */
>>> +void rte_st_ring_free(struct rte_st_ring *r);
>>> +
>>> +/**
>>> + * Dump the status of the ring to a file.
>>> + *
>>> + * @param f
>>> + *   A pointer to a file for output
>>> + * @param r
>>> + *   A pointer to the ring structure.
>>> + */
>>> +void rte_st_ring_dump(FILE *f, const struct rte_st_ring *r);
>>> +
>>> +/**
>>> + * Enqueue fixed number of objects on a ST ring.
>>> + *
>>> + * This function copies the objects at the head of the ring and
>>> + * moves the head index.
>>> + *
>>> + * @param r
>>> + *   A pointer to the ring structure.
>>> + * @param obj_table
>>> + *   A pointer to a table of void * pointers (objects).
>>> + * @param n
>>> + *   The number of objects to add in the ring from the obj_table.
>>> + * @param free_space
>>> + *   if non-NULL, returns the amount of space in the ring after the
>>> + *   enqueue operation has finished.
>>> + * @return
>>> + *   The number of objects enqueued, either 0 or n
>>> + */
>>> +static __rte_always_inline unsigned int
>>> +rte_st_ring_enqueue_bulk(struct rte_st_ring *r, void * const *obj_table,
>>> +		      unsigned int n, unsigned int *free_space) {
>>> +	return rte_st_ring_enqueue_bulk_elem(r, obj_table, sizeof(void *),
>>> +			n, free_space);
>>> +}
>>> +
>>> +/**
>>> + * Enqueue up to a maximum number of objects on an ST ring.
>>> + *
>>> + * This function copies the objects at the head of the ring and
>>> + * moves the head index.
>>> + *
>>> + * @param r
>>> + *   A pointer to the ring structure.
>>> + * @param obj_table
>>> + *   A pointer to a table of void * pointers (objects).
>>> + * @param n
>>> + *   The number of objects to add in the ring from the obj_table.
>>> + * @param free_space
>>> + *   if non-NULL, returns the amount of space in the ring after the
>>> + *   enqueue operation has finished.
>>> + * @return
>>> + *   - n: Actual number of objects enqueued.
>>> + */
>>> +static __rte_always_inline unsigned int
>>> +rte_st_ring_enqueue_burst(struct rte_st_ring *r, void * const *obj_table,
>>> +		      unsigned int n, unsigned int *free_space) {
>>> +	return rte_st_ring_enqueue_burst_elem(r, obj_table, sizeof(void *),
>>> +			n, free_space);
>>> +}
>>> +
>>> +/**
>>> + * Enqueue one object on a ST ring.
>>> + *
>>> + * This function copies one object at the head of the ring and
>>> + * moves the head index.
>>> + *
>>> + * @param r
>>> + *   A pointer to the ring structure.
>>> + * @param obj
>>> + *   A pointer to the object to be added.
>>> + * @return
>>> + *   - 0: Success; object enqueued.
>>> + *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is
>> enqueued.
>>> + */
>>> +static __rte_always_inline int
>>> +rte_st_ring_enqueue(struct rte_st_ring *r, void *obj) {
>>> +	return rte_st_ring_enqueue_elem(r, &obj, sizeof(void *)); }
>>> +
>>> +/**
>>> + * Enqueue fixed number of objects on a ST ring at the tail.
>>> + *
>>> + * This function copies the objects at the tail of the ring and
>>> + * moves the tail index (backwards).
>>> + *
>>> + * @param r
>>> + *   A pointer to the ring structure.
>>> + * @param obj_table
>>> + *   A pointer to a table of void * pointers (objects).
>>> + * @param n
>>> + *   The number of objects to add in the ring from the obj_table.
>>> + * @param free_space
>>> + *   if non-NULL, returns the amount of space in the ring after the
>>> + *   enqueue operation has finished.
>>> + * @return
>>> + *   The number of objects enqueued, either 0 or n
>>> + */
>>> +static __rte_always_inline unsigned int
>>> +rte_st_ring_enqueue_at_tail_bulk(struct rte_st_ring *r,
>>> +				 void * const *obj_table, unsigned int n,
>>> +				 unsigned int *free_space)
>>> +{
>>> +	return rte_st_ring_enqueue_at_tail_bulk_elem(r, obj_table,
>>> +			sizeof(void *), n, free_space);
>>> +}
>>> +
>>> +/**
>>> + * Enqueue up to a maximum number of objects on an ST ring at the tail.
>>> + *
>>> + * This function copies the objects at the tail of the ring and
>>> + * moves the tail index (backwards).
>>> + *
>>> + * @param r
>>> + *   A pointer to the ring structure.
>>> + * @param obj_table
>>> + *   A pointer to a table of void * pointers (objects).
>>> + * @param n
>>> + *   The number of objects to add in the ring from the obj_table.
>>> + * @param free_space
>>> + *   if non-NULL, returns the amount of space in the ring after the
>>> + *   enqueue operation has finished.
>>> + * @return
>>> + *   - n: Actual number of objects enqueued.
>>> + */
>>> +static __rte_always_inline unsigned int
>>> +rte_st_ring_enqueue_at_tail_burst(struct rte_st_ring *r,
>>> +				  void * const *obj_table, unsigned int n,
>>> +				  unsigned int *free_space)
>>> +{
>>> +	return rte_st_ring_enqueue_at_tail_burst_elem(r, obj_table,
>>> +			sizeof(void *), n, free_space);
>>> +}
>>> +
>>> +/**
>>> + * Enqueue one object on a ST ring at tail.
>>> + *
>>> + * This function copies one object at the tail of the ring and
>>> + * moves the tail index (backwards).
>>> + *
>>> + * @param r
>>> + *   A pointer to the ring structure.
>>> + * @param obj
>>> + *   A pointer to the object to be added.
>>> + * @return
>>> + *   - 0: Success; object enqueued.
>>> + *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is
>> enqueued.
>>> + */
>>> +static __rte_always_inline int
>>> +rte_st_ring_enqueue_at_tail(struct rte_st_ring *r, void *obj) {
>>> +	return rte_st_ring_enqueue_at_tail_elem(r, &obj, sizeof(void *)); }
>>> +
>>> +/**
>>> + * Dequeue a fixed number of objects from a ST ring.
>>> + *
>>> + * This function copies the objects from the tail of the ring and
>>> + * moves the tail index.
>>> + *
>>> + * @param r
>>> + *   A pointer to the ring structure.
>>> + * @param obj_table
>>> + *   A pointer to a table of void * pointers (objects) that will be filled.
>>> + * @param n
>>> + *   The number of objects to dequeue from the ring to the obj_table.
>>> + * @param available
>>> + *   If non-NULL, returns the number of remaining ring entries after the
>>> + *   dequeue has finished.
>>> + * @return
>>> + *   The number of objects dequeued, either 0 or n
>>> + */
>>> +static __rte_always_inline unsigned int
>>> +rte_st_ring_dequeue_bulk(struct rte_st_ring *r, void **obj_table,
>> unsigned int n,
>>> +		unsigned int *available)
>>> +{
>>> +	return rte_st_ring_dequeue_bulk_elem(r, obj_table, sizeof(void *),
>>> +			n, available);
>>> +}
>>> +
>>> +/**
>>> + * Dequeue up to a maximum number of objects from an ST ring.
>>> + *
>>> + * This function copies the objects from the tail of the ring and
>>> + * moves the tail index.
>>> + *
>>> + * @param r
>>> + *   A pointer to the ring structure.
>>> + * @param obj_table
>>> + *   A pointer to a table of void * pointers (objects) that will be filled.
>>> + * @param n
>>> + *   The number of objects to dequeue from the ring to the obj_table.
>>> + * @param available
>>> + *   If non-NULL, returns the number of remaining ring entries after the
>>> + *   dequeue has finished.
>>> + * @return
>>> + *   - Number of objects dequeued
>>> + */
>>> +static __rte_always_inline unsigned int
>>> +rte_st_ring_dequeue_burst(struct rte_st_ring *r, void **obj_table,
>>> +		unsigned int n, unsigned int *available) {
>>> +	return rte_st_ring_dequeue_burst_elem(r, obj_table, sizeof(void *),
>>> +			n, available);
>>> +}
>>> +
>>> +/**
>>> + * Dequeue one object from a ST ring.
>>> + *
>>> + * This function copies one object from the tail of the ring and
>>> + * moves the tail index.
>>> + *
>>> + * @param r
>>> + *   A pointer to the ring structure.
>>> + * @param obj_p
>>> + *   A pointer to a void * pointer (object) that will be filled.
>>> + * @return
>>> + *   - 0: Success, object dequeued.
>>> + *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
>>> + *     dequeued.
>>> + */
>>> +static __rte_always_inline int
>>> +rte_st_ring_dequeue(struct rte_st_ring *r, void **obj_p) {
>>> +	return rte_st_ring_dequeue_elem(r, obj_p, sizeof(void *)); }
>>> +
>>> +/**
>>> + * Dequeue a fixed number of objects from a ST ring from the head.
>>> + *
>>> + * This function copies the objects from the head of the ring and
>>> + * moves the head index (backwards).
>>> + *
>>> + * @param r
>>> + *   A pointer to the ring structure.
>>> + * @param obj_table
>>> + *   A pointer to a table of void * pointers (objects) that will be filled.
>>> + * @param n
>>> + *   The number of objects to dequeue from the ring to the obj_table.
>>> + * @param available
>>> + *   If non-NULL, returns the number of remaining ring entries after the
>>> + *   dequeue has finished.
>>> + * @return
>>> + *   The number of objects dequeued, either 0 or n
>>> + */
>>> +static __rte_always_inline unsigned int
>>> +rte_st_ring_dequeue_at_head_bulk(struct rte_st_ring *r, void **obj_table,
>>> +		unsigned int n, unsigned int *available)
>>> +{
>>> +	return rte_st_ring_dequeue_at_head_bulk_elem(r, obj_table,
>>> +			sizeof(void *), n, available);
>>> +}
>>> +
>>> +/**
>>> + * Dequeue up to a maximum number of objects from an ST ring from the head.
>>> + *
>>> + * This function copies the objects from the head of the ring and
>>> + * moves the head index (backwards).
>>> + *
>>> + * @param r
>>> + *   A pointer to the ring structure.
>>> + * @param obj_table
>>> + *   A pointer to a table of void * pointers (objects) that will be filled.
>>> + * @param n
>>> + *   The number of objects to dequeue from the ring to the obj_table.
>>> + * @param available
>>> + *   If non-NULL, returns the number of remaining ring entries after the
>>> + *   dequeue has finished.
>>> + * @return
>>> + *   - Number of objects dequeued
>>> + */
>>> +static __rte_always_inline unsigned int
>>> +rte_st_ring_dequeue_at_head_burst(struct rte_st_ring *r, void **obj_table,
>>> +		unsigned int n, unsigned int *available)
>>> +{
>>> +	return rte_st_ring_dequeue_at_head_burst_elem(r, obj_table,
>>> +			sizeof(void *), n, available);
>>> +}
>>> +
>>> +/**
>>> + * Dequeue one object from a ST ring from the head.
>>> + *
>>> + * This function copies one object from the head of the ring and
>>> + * moves the head index (backwards).
>>> + *
>>> + * @param r
>>> + *   A pointer to the ring structure.
>>> + * @param obj_p
>>> + *   A pointer to a void * pointer (object) that will be filled.
>>> + * @return
>>> + *   - 0: Success, object dequeued.
>>> + *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
>>> + *     dequeued.
>>> + */
>>> +static __rte_always_inline int
>>> +rte_st_ring_at_head_dequeue(struct rte_st_ring *r, void **obj_p) {
>>> +	return rte_st_ring_at_head_dequeue_elem(r, obj_p, sizeof(void *)); }
>>> +
>>> +/**
>>> + * Flush a ST ring.
>>> + *
>>> + * This function flushes all the elements in an ST ring.
>>> + *
>>> + * @warning
>>> + * Make sure the ring is not in use while calling this function.
>>> + *
>>> + * @param r
>>> + *   A pointer to the ring structure.
>>> + */
>>> +void
>>> +rte_st_ring_reset(struct rte_st_ring *r);
>>> +
>>> +/**
>>> + * Return the number of entries in a ST ring.
>>> + *
>>> + * @param r
>>> + *   A pointer to the ring structure.
>>> + * @return
>>> + *   The number of entries in the ring.
>>> + */
>>> +static inline unsigned int
>>> +rte_st_ring_count(const struct rte_st_ring *r) {
>>> +	uint32_t count = (r->head - r->tail) & r->mask;
>>> +	return count;
>>> +}
>>> +
>>> +/**
>>> + * Return the number of free entries in a ST ring.
>>> + *
>>> + * @param r
>>> + *   A pointer to the ring structure.
>>> + * @return
>>> + *   The number of free entries in the ring.
>>> + */
>>> +static inline unsigned int
>>> +rte_st_ring_free_count(const struct rte_st_ring *r) {
>>> +	return r->capacity - rte_st_ring_count(r); }
>>> +
>>> +/**
>>> + * Test if a ST ring is full.
>>> + *
>>> + * @param r
>>> + *   A pointer to the ring structure.
>>> + * @return
>>> + *   - 1: The ring is full.
>>> + *   - 0: The ring is not full.
>>> + */
>>> +static inline int
>>> +rte_st_ring_full(const struct rte_st_ring *r) {
>>> +	return rte_st_ring_free_count(r) == 0; }
>>> +
>>> +/**
>>> + * Test if a ST ring is empty.
>>> + *
>>> + * @param r
>>> + *   A pointer to the ring structure.
>>> + * @return
>>> + *   - 1: The ring is empty.
>>> + *   - 0: The ring is not empty.
>>> + */
>>> +static inline int
>>> +rte_st_ring_empty(const struct rte_st_ring *r) {
>>> +	return r->tail == r->head;
>>> +}
>>> +
>>> +/**
>>> + * Return the size of the ring.
>>> + *
>>> + * @param r
>>> + *   A pointer to the ring structure.
>>> + * @return
>>> + *   The size of the data store used by the ring.
>>> + *   NOTE: this is not the same as the usable space in the ring. To query that
>>> + *   use ``rte_st_ring_get_capacity()``.
>>> + */
>>> +static inline unsigned int
>>> +rte_st_ring_get_size(const struct rte_st_ring *r) {
>>> +	return r->size;
>>> +}
>>> +
>>> +/**
>>> + * Return the number of elements which can be stored in the ring.
>>> + *
>>> + * @param r
>>> + *   A pointer to the ring structure.
>>> + * @return
>>> + *   The usable size of the ring.
>>> + */
>>> +static inline unsigned int
>>> +rte_st_ring_get_capacity(const struct rte_st_ring *r) {
>>> +	return r->capacity;
>>> +}
>>> +
>>> +/**
>>> + * Dump the status of all rings on the console
>>> + *
>>> + * @param f
>>> + *   A pointer to a file for output
>>> + */
>>> +void rte_st_ring_list_dump(FILE *f);
>>> +
>>> +/**
>>> + * Search a ST ring from its name
>>> + *
>>> + * @param name
>>> + *   The name of the ring.
>>> + * @return
>>> + *   The pointer to the ring matching the name, or NULL if not found,
>>> + *   with rte_errno set appropriately. Possible rte_errno values include:
>>> + *    - ENOENT - required entry not available to return.
>>> + */
>>> +struct rte_st_ring *rte_st_ring_lookup(const char *name);
>>> +
>>> +#ifdef __cplusplus
>>> +}
>>> +#endif
>>> +
>>> +#endif /* _RTE_ST_RING_H_ */
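The count/free/full/empty helpers above all hinge on one identity: with free-running 32-bit head/tail indices and a power-of-two size, `(head - tail) & mask` yields the occupancy even across wrap-around. A minimal standalone sketch of that math (the struct here is a stand-in with only the fields the math needs, not the real rte_st_ring layout):

```c
#include <stdint.h>

/* Stand-in for struct rte_st_ring; field names are illustrative.
 * Size must be a power of two so that "& mask" performs the modulo. */
struct st_ring {
	uint32_t head;     /* next slot to write (free-running) */
	uint32_t tail;     /* next slot to read (free-running) */
	uint32_t mask;     /* size - 1 */
	uint32_t capacity; /* usable entries, typically size - 1 */
};

static inline uint32_t st_ring_count(const struct st_ring *r)
{
	/* Unsigned subtraction stays correct when head wraps past
	 * the size boundary or past UINT32_MAX. */
	return (r->head - r->tail) & r->mask;
}

static inline uint32_t st_ring_free_count(const struct st_ring *r)
{
	return r->capacity - st_ring_count(r);
}

static inline int st_ring_full(const struct st_ring *r)
{
	return st_ring_free_count(r) == 0;
}

static inline int st_ring_empty(const struct st_ring *r)
{
	return r->head == r->tail;
}
```

Reserving one slot (capacity = size - 1) is what lets head == tail unambiguously mean "empty" rather than "full".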

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: [RFC] lib/st_ring: add single thread ring
  2023-08-22  8:04     ` Mattias Rönnblom
@ 2023-08-22 16:28       ` Honnappa Nagarahalli
  0 siblings, 0 replies; 48+ messages in thread
From: Honnappa Nagarahalli @ 2023-08-22 16:28 UTC (permalink / raw)
  To: Mattias Rönnblom, jackmin, konstantin.v.ananyev
  Cc: dev, Ruifeng Wang, Aditya Ambadipudi,
	Wathsala Wathawana Vithanage, nd, nd

<snip>

> >> Subject: Re: [RFC] lib/st_ring: add single thread ring
> >>
> >> On 2023-08-21 08:04, Honnappa Nagarahalli wrote:
> >>> Add a single thread safe and multi-thread unsafe ring data structure.
> >>
> >> One must have set the bar very low, if one needs to specify that an
> >> API is single-thread safe.
> >>
> >>> This library provides a simple and efficient alternative to
> >>> multi-thread safe ring when multi-thread safety is not required.
> >>>
> >>> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> >>> ---
> >>> v1:
> >>> 1) The code is very preliminary and is not even compiled
> >>> 2) This is intended to show the APIs and some thoughts on
> >>> implementation
> >>
> >> If you haven't done it already, maybe it might be worth looking
> >> around in the code base for already-existing, more-or-less open-coded
> >> fifo/circular buffer type data structures. Just to make sure those
> >> can be eliminated if this makes it into DPDK.
> >>
> >> There's one in rte_event_eth_rx_adapter.c, and I think one in the SW
> >> eventdev as well. Seems to be one in cmdline_cirbuf.h as well. I'm
> >> sure there are many more.
> > I knew there are some, but have not looked at them yet. I will look at them.
> >
> >>
> >> You could pick some other name for it, instead of the slightly
> >> awkward "st_ring" (e.g., "fifo", "cbuf", "cbuffer", "circ_buffer").
> >> That would also leave you with more freedom to stray from the MT safe
> >> ring API without surprising the user, if needed (and I think it is needed).
> > The thought was to make it clear that this is for single-thread use (i.e. not even
> producer and consumer on different threads); maybe I do not need to try so hard.
> > "fifo" might not be good option given that dequeue/enqueue at both ends of
> the ring are required/allowed.
> > Wikipedia [1] and others [2], [3] indicates that this data structure
> > should be called 'deque' (pronounced as deck). I would prefer to go
> > with this (assuming this will be outside of 'rte_ring')
> >
> > [1] https://en.wikipedia.org/wiki/Double-ended_queue
> > [2]
> > https://www.geeksforgeeks.org/deque-set-1-introduction-applications/
> > [3] https://stackoverflow.com/questions/3880254/why-do-we-need-deque-
> data-structures-in-the-real-
> world#:~:text=A%20Deque%20is%20a%20double,thing%20on%20front%20of
> %20queue.
> >
> >>
> >> Hopefully you can reduce API complexity compared to the MT-safe version.
> >> Having a name for these kinds of data structures doesn't make a lot
> >> of sense, for example. Skip the dump function. Relax from
> >> always_inline to just regular inline.
> > Yes, plan is to reduce complexity (compared to rte_ring) and some APIs can be
> skipped until there is a need.
> >
> >>
> >> I'm not sure you need bulk/burst type operations. Without any memory
> >> fences, an optimizing compiler should do a pretty good job of
> >> unrolling multiple-element access type operations, assuming you leave
> >> the ST ring code in the header files (otherwise LTO is needed).
> > IMO, bulk/burst APIs are about the functionality rather than loop unrolling.
> APIs to work with single objects can be skipped (use bulk APIs with n=1).
> >
> 
> Given that this data structure will often be use in conjunction with other
> burst/bulk type operations, I agree.
> 
> What about peek? I guess you could have a burst/bulk peek as well, would that
> operation be needed? I think it will be needed, but the introduction of such API
> elements could always be deferred.
Yes, I will provide this functionality, but will defer it for later.
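For the record, one possible shape for such a read-side peek, on a single-thread ring with free-running indices: copy up to n of the oldest entries without advancing the tail. All names and the struct layout below are illustrative assumptions, not a committed API:

```c
#include <stdint.h>

#define ST_RING_SZ 8	/* power of two */

struct st_ring {
	void *objs[ST_RING_SZ];
	uint32_t head;	/* next write slot (free-running) */
	uint32_t tail;	/* next read slot (free-running) */
	uint32_t mask;	/* ST_RING_SZ - 1 */
};

/* Hypothetical peek: returns how many objects were copied out. */
static inline uint32_t
st_ring_peek_burst(const struct st_ring *r, void **objs, uint32_t n)
{
	uint32_t avail = (r->head - r->tail) & r->mask;
	uint32_t i;

	if (n > avail)
		n = avail;
	for (i = 0; i < n; i++)
		objs[i] = r->objs[(r->tail + i) & r->mask];
	/* Tail is untouched: a subsequent dequeue returns the same
	 * objects, and a repeated peek is idempotent. */
	return n;
}
```

Since there are no memory fences, the compiler is free to elide the copies when the peeked values are consumed immediately.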

> 
> >>
> >> I think you will want a peek-type operation on the reader side. That
> >> more for convenience, rather than that I think the copies will
> >> actually be there in the object code (such should be eliminated by
> >> the compiler, given that the barriers are gone).
> >>
> >>> 3) More APIs and the rest of the implementation will come in subsequent
> >>>      versions
> >>>
> >>>    lib/st_ring/rte_st_ring.h | 567
> >> ++++++++++++++++++++++++++++++++++++++
> >>>    1 file changed, 567 insertions(+)
> >>>    create mode 100644 lib/st_ring/rte_st_ring.h
> >>>
> >>> diff --git a/lib/st_ring/rte_st_ring.h b/lib/st_ring/rte_st_ring.h
> >>> new file mode 100644 index 0000000000..8cb8832591
> >>> --- /dev/null
> >>> +++ b/lib/st_ring/rte_st_ring.h
> >>> @@ -0,0 +1,567 @@
> >>> +/* SPDX-License-Identifier: BSD-3-Clause
> >>> + * Copyright(c) 2023 Arm Limited
> >>> + */
> >>> +
> >>> +#ifndef _RTE_ST_RING_H_
> >>> +#define _RTE_ST_RING_H_
> >>> +
> >>> +/**
> >>> + * @file
> >>> + * RTE Single Thread Ring (ST Ring)
> >>> + *
> >>> + * The ST Ring is a fixed-size queue intended to be accessed
> >>> + * by one thread at a time. It does not provide concurrent access
> >>> +to
> >>> + * multiple threads. If there are multiple threads accessing the ST
> >>> +ring,
> >>> + * then the threads have to use locks to protect the ring from
> >>> + * getting corrupted.
> >>
> >> You are basically saying the same thing three times here.
> >>
> >>> + *
> >>> + * - FIFO (First In First Out)
> >>> + * - Maximum size is fixed; the pointers are stored in a table.
> >>> + * - Consumer and producer part of same thread.
> >>> + * - Multi-thread producers and consumers need locking.
> >>
> >> ...two more times here. One might get the impression you really don't
> >> trust the reader.
> >>
> >>> + * - Single/Bulk/burst dequeue at Tail or Head
> >>> + * - Single/Bulk/burst enqueue at Head or Tail
> >>
> >> Does this not sound more like a deque, than a FIFO/circular buffer?
> >> Are there any examples where this functionality (the
> >> double-endedness) is needed in the DPDK code base?
> > I see, you are calling it 'deque' as well. Basically, this patch
> > originated due to a requirement in MLX PMD [1]
> >
> > [1]
> > https://github.com/DPDK/dpdk/blob/main/drivers/net/mlx5/mlx5_hws_cnt.h
> > #L381
> >
> >>
> >>> + *
> >>> + */
> >>> +
> >>> +#ifdef __cplusplus
> >>> +extern "C" {
> >>> +#endif
> >>> +
> >>> +#include <rte_st_ring_core.h>
> >>> +#include <rte_st_ring_elem.h>
> >>
> >> Is the intention to provide a ring with compile-time variable element
> >> size? In other words, where the elements of a particular ring
> >> instance has the same element size, but different rings may have different
> element sizes.
> >>
> >> Seems like a good idea to me, in that case. Although often you will
> >> have pointers, it would be useful to store larger things like small
> >> structs, and maybe smaller elements as well.
> > Yes, the idea is to make the element size flexible and also compile-time
> constant.
> >
> >>
> >>> +
> >>> +/**

<snip>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: [RFC] lib/st_ring: add single thread ring
  2023-08-22  5:47   ` Honnappa Nagarahalli
@ 2023-08-24  8:05     ` Morten Brørup
  2023-08-24 10:52       ` Mattias Rönnblom
  0 siblings, 1 reply; 48+ messages in thread
From: Morten Brørup @ 2023-08-24  8:05 UTC (permalink / raw)
  To: Honnappa Nagarahalli, jackmin, konstantin.v.ananyev, hofors
  Cc: dev, Ruifeng Wang, Aditya Ambadipudi,
	Wathsala Wathawana Vithanage, nd, nd

> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Tuesday, 22 August 2023 07.47
> 
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: Monday, August 21, 2023 2:37 AM
> >
> > > From: Honnappa Nagarahalli [mailto:honnappa.nagarahalli@arm.com]
> > > Sent: Monday, 21 August 2023 08.04
> > >
> > > Add a single thread safe and multi-thread unsafe ring data structure.
> > > This library provides a simple and efficient alternative to multi-
> > > thread safe ring when multi-thread safety is not required.
> > >
> > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > ---
> >
> > Good idea.
> >
> > However, I prefer it to be implemented in the ring lib as one more ring
> type.
> > That would also give us a lot of the infrastructure (management functions,
> > documentation and tests) for free.
> IMO, the current code for rte_ring seems complex with C11 and generic
> implementations, APIs for pointer objects vs APIs for flexible element size
> etc. I did not want to introduce one more flavor and make it more complex.

From the user perspective, I think one more ring flavor is less complex than an entirely separate (very similar) library with its own set of (very similar) APIs.

I agree that the ring lib has grown somewhat over-engineered, but please don't use that as an argument for making the same-thread ring a separate lib.

On the other hand: If the addition of an optimized same-thread ring flavor would require too many invasive modifications of the existing ring lib, I would accept that as an argument for not adding it as another ring flavor to the existing ring lib.

> The requirements are different as well. For ex: single thread ring needs APIs
> for dequeuing and enqueuing at both ends of the ring which is not applicable
> to existing RTE ring.

Yes, I will address this topic at the end of this mail.

> 
> But, I see how the existing infra can be reused easily.

This also goes for future infrastructure. I doubt that new infrastructure added to the ring lib will also be added to the same-thread ring lib... for reference, consider the PMDs containing copy-pasted code from the mempool lib... none of the later improvements of the mempool lib were implemented in those PMDs.

In essence, I think this lib overlaps the existing ring lib too much to justify making it a separate lib.

> 
> >
> > The ring lib already has performance-optimized APIs for single-consumer and
> > single-producer use, rte_ring_sc_dequeue_bulk() and
> > rte_ring_sp_enqueue_burst(). Similar performance-optimized APIs for single-
> > thread use could be added: rte_ring_st_dequeue_bulk() and
> > rte_ring_st_enqueue_burst().
> Yes, the names look fine.
> Looking through the code. We have the sync type enum:
> 
> /** prod/cons sync types */
> enum rte_ring_sync_type {
>         RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
>         RTE_RING_SYNC_ST,     /**< single thread only */
>         RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
>         RTE_RING_SYNC_MT_HTS, /**< multi-thread head/tail sync */
> };
> 
> The type RTE_RING_SYNC_ST needs better explanation (not a problem). But, this
> name would have been ideal to use for single thread ring.
> This enum does not need to be exposed to the users. However, there are
> rte_ring_get_prod/cons_sync_type etc which seem to be exposed to the user.
> This all means, we need to have a sync type name RTE_RING_SYNC_MT_UNSAFE (any
> other better name?) which then affects API naming.
> rte_ring_mt_unsafe_dequeue_bulk?

As always, naming is difficult.
The enum rte_ring_sync_type describes the producer and consumer independently, whereas this ring type uses the same thread for both producer and consumer.
I think we should avoid MT in the names for this variant. How about:

RTE_RING_SYNC_STPC /**< same thread for both producer and consumer */

And:

rte_ring_spc_dequeue_bulk() and rte_ring_spc_enqueue_burst()

> 
> >
> > Regardless if added to the ring lib or as a separate lib, "reverse" APIs
> (for single-
> > thread use only) and zero-copy APIs can be added at any time later.

As the only current use case for "reverse" (i.e. dequeue at tail, enqueue at head) APIs is for the same-thread ring flavor, we could start by adding only the specialized variants of the "reverse" APIs, rte_ring_spc_reverse_xxx(), and initially omit the generic rte_ring_reverse_xxx() APIs. (We need better names; I used "reverse" for explanation only.)
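The double-ended behaviour being debated (enqueue at head, dequeue at either end) can be sketched in a few lines on a power-of-two ring; dequeuing at the tail gives the usual FIFO, dequeuing at the head turns the same structure into a LIFO, which is roughly what the mlx5 counter pool open-codes. Names and layout below are illustrative only; single-thread use is assumed, with one slot reserved to distinguish full from empty:

```c
#include <stdint.h>

#define DQ_SZ 8	/* power of two */

struct st_deque {
	uintptr_t objs[DQ_SZ];
	uint32_t head, tail;	/* free-running indices */
};

static inline int dq_enqueue_head(struct st_deque *d, uintptr_t obj)
{
	if (((d->head - d->tail) & (DQ_SZ - 1)) == DQ_SZ - 1)
		return -1;	/* full */
	d->objs[d->head & (DQ_SZ - 1)] = obj;
	d->head++;
	return 0;
}

static inline int dq_dequeue_tail(struct st_deque *d, uintptr_t *obj)
{
	if (d->head == d->tail)
		return -1;	/* empty */
	*obj = d->objs[d->tail & (DQ_SZ - 1)];
	d->tail++;
	return 0;	/* FIFO order */
}

static inline int dq_dequeue_head(struct st_deque *d, uintptr_t *obj)
{
	if (d->head == d->tail)
		return -1;	/* empty */
	d->head--;
	*obj = d->objs[d->head & (DQ_SZ - 1)];
	return 0;	/* LIFO order: undoes the most recent enqueue */
}
```

Without barriers, both dequeue directions cost the same; the API surface, not performance, is what separates this from rte_ring and rte_stack.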


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] lib/st_ring: add single thread ring
  2023-08-24  8:05     ` Morten Brørup
@ 2023-08-24 10:52       ` Mattias Rönnblom
  2023-08-24 11:22         ` Morten Brørup
  0 siblings, 1 reply; 48+ messages in thread
From: Mattias Rönnblom @ 2023-08-24 10:52 UTC (permalink / raw)
  To: Morten Brørup, Honnappa Nagarahalli, jackmin, konstantin.v.ananyev
  Cc: dev, Ruifeng Wang, Aditya Ambadipudi, Wathsala Wathawana Vithanage, nd

On 2023-08-24 10:05, Morten Brørup wrote:
>> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>> Sent: Tuesday, 22 August 2023 07.47
>>
>>> From: Morten Brørup <mb@smartsharesystems.com>
>>> Sent: Monday, August 21, 2023 2:37 AM
>>>
>>>> From: Honnappa Nagarahalli [mailto:honnappa.nagarahalli@arm.com]
>>>> Sent: Monday, 21 August 2023 08.04
>>>>
>>>> Add a single thread safe and multi-thread unsafe ring data structure.
> >>>> This library provides a simple and efficient alternative to multi-
>>>> thread safe ring when multi-thread safety is not required.
>>>>
>>>> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>>>> ---
>>>
>>> Good idea.
>>>
>>> However, I prefer it to be implemented in the ring lib as one more ring
>> type.
>>> That would also give us a lot of the infrastructure (management functions,
>>> documentation and tests) for free.
>> IMO, the current code for rte_ring seems complex with C11 and generic
>> implementations, APIs for pointer objects vs APIs for flexible element size
>> etc. I did not want to introduce one more flavor and make it more complex.
> 
>  From the user perspective, I think one more ring flavor is less complex than an entirely separate (very similar) library with its own set of (very similar) APIs.
> 
> I agree that the ring lib has grown somewhat over-engineered, but please don't use that as an argument for making the same-thread ring a separate lib.
> 

What's being proposed is a double-ended queue, not a ring (in the DPDK 
sense).

If you want to Swiss army knifify the rte_ring further and make it a 
deque, then rte_stack should scrapped as well, since it's will become 
just a subset of the new rte_ring_now_really_a_deque.

> On the other hand: If the addition of an optimized same-thread ring flavor would require too many invasive modifications of the existing ring lib, I would accept that as an argument for not adding it as another ring flavor to the existing ring lib.
> 
>> The requirements are different as well. For ex: single thread ring needs APIs
>> for dequeuing and enqueuing at both ends of the ring which is not applicable
>> to existing RTE ring.
> 
> Yes, I will address this topic at the end of this mail.
> 
>>
>> But, I see how the existing infra can be reused easily.
> 
> This also goes for future infrastructure. I doubt that new infrastructure added to the ring lib will also be added to the same-thread ring lib... for reference, consider the PMDs containing copy-pasted code from the mempool lib... none of the later improvements of the mempool lib were implemented in those PMDs.
> 
> In essence, I think this lib overlaps the existing ring lib too much to justify making it a separate lib.
> 
>>
>>>
>>> The ring lib already has performance-optimized APIs for single-consumer and
>>> single-producer use, rte_ring_sc_dequeue_bulk() and
>>> rte_ring_sp_enqueue_burst(). Similar performance-optimized APIs for single-
>>> thread use could be added: rte_ring_st_dequeue_bulk() and
>>> rte_ring_st_enqueue_burst().
>> Yes, the names look fine.
>> Looking through the code. We have the sync type enum:
>>
>> /** prod/cons sync types */
>> enum rte_ring_sync_type {
>>          RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
>>          RTE_RING_SYNC_ST,     /**< single thread only */
>>          RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
>>          RTE_RING_SYNC_MT_HTS, /**< multi-thread head/tail sync */
>> };
>>
>> The type RTE_RING_SYNC_ST needs better explanation (not a problem). But, this
>> name would have been ideal to use for single thread ring.
>> This enum does not need to be exposed to the users. However, there are
>> rte_ring_get_prod/cons_sync_type etc which seem to be exposed to the user.
>> This all means, we need to have a sync type name RTE_RING_SYNC_MT_UNSAFE (any
>> other better name?) which then affects API naming.
>> rte_ring_mt_unsafe_dequeue_bulk?
> 
> As always, naming is difficult.
> The enum rte_ring_sync_type describes the producer and consumer independently, whereas this ring type uses the same thread for both producer and consumer.
> I think we should avoid MT in the names for this variant. How about:
> 
> RTE_RING_SYNC_STPC /**< same thread for both producer and consumer */
> 
> And:
> 
> rte_ring_spc_dequeue_bulk() and rte_ring_spc_enqueue_burst()
> 
>>
>>>
>>> Regardless if added to the ring lib or as a separate lib, "reverse" APIs
>> (for single-
>>> thread use only) and zero-copy APIs can be added at any time later.
> 
> As the only current use case for "reverse" (i.e. dequeue at tail, enqueue at head) APIs is for the same-thread ring flavor, we could start by adding only the specialized variants of the "reverse" APIs, rte_ring_spc_reverse_xxx(), and initially omit the generic rte_ring_reverse_xxx() APIs. (We need better names; I used "reverse" for explanation only.)
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: [RFC] lib/st_ring: add single thread ring
  2023-08-24 10:52       ` Mattias Rönnblom
@ 2023-08-24 11:22         ` Morten Brørup
  2023-08-26 23:34           ` Honnappa Nagarahalli
  0 siblings, 1 reply; 48+ messages in thread
From: Morten Brørup @ 2023-08-24 11:22 UTC (permalink / raw)
  To: Mattias Rönnblom, Honnappa Nagarahalli, jackmin,
	konstantin.v.ananyev
  Cc: dev, Ruifeng Wang, Aditya Ambadipudi, Wathsala Wathawana Vithanage, nd

> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Thursday, 24 August 2023 12.53
> 
> On 2023-08-24 10:05, Morten Brørup wrote:
> >> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> >> Sent: Tuesday, 22 August 2023 07.47
> >>
> >>> From: Morten Brørup <mb@smartsharesystems.com>
> >>> Sent: Monday, August 21, 2023 2:37 AM
> >>>
> >>>> From: Honnappa Nagarahalli [mailto:honnappa.nagarahalli@arm.com]
> >>>> Sent: Monday, 21 August 2023 08.04
> >>>>
> >>>> Add a single thread safe and multi-thread unsafe ring data structure.
> >>>> This library provides a simple and efficient alternative to multi-
> >>>> thread safe ring when multi-thread safety is not required.
> >>>>
> >>>> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> >>>> ---
> >>>
> >>> Good idea.
> >>>
> >>> However, I prefer it to be implemented in the ring lib as one more ring
> >> type.
> >>> That would also give us a lot of the infrastructure (management functions,
> >>> documentation and tests) for free.
> >> IMO, the current code for rte_ring seems complex with C11 and generic
> >> implementations, APIs for pointer objects vs APIs for flexible element size
> >> etc. I did not want to introduce one more flavor and make it more complex.
> >
> >  From the user perspective, I think one more ring flavor is less complex
> than an entirely separate (very similar) library with its own set of (very
> similar) APIs.
> >
> > I agree that the ring lib has grown somewhat over-engineered, but please
> don't use that as an argument for making the same-thread ring a separate lib.
> >
> 
> What's being proposed is a double-ended queue, not a ring (in the DPDK
> sense).
> 
> If you want to Swiss army knifify the rte_ring further and make it a
> deque, then rte_stack should be scrapped as well, since it will become
> just a subset of the new rte_ring_now_really_a_deque.

OK. I accept that argument for not hacking it into the ring lib.

Then I will suggest that the new "deque" library should be designed with multi-threading in mind, like its two sibling libs (ring and stack). This makes it easier to use, and leaves it open for expansion to other flavors in the future.

It is perfectly acceptable that the first version only supports the same-thread deque flavor, and only the same-thread specialized APIs are exposed. I don't require any APIs or implementations supporting single-threaded (individual producer/consumer threads) or multi-threaded flavors, I only request that the design and API resemble those of its two sibling libraries. (And if there are no use cases for multi-threading flavors, they might never be added to this lib.)

> 
> > On the other hand: If the addition of an optimized same-thread ring flavor
> would require too many invasive modifications of the existing ring lib, I
> would accept that as an argument for not adding it as another ring flavor to
> the existing ring lib.
> >
> >> The requirements are different as well. For ex: single thread ring needs
> APIs
> >> for dequeuing and enqueuing at both ends of the ring which is not
> applicable
> >> to existing RTE ring.
> >
> > Yes, I will address this topic at the end of this mail.
> >
> >>
> >> But, I see how the existing infra can be reused easily.
> >
> > This also goes for future infrastructure. I doubt that new infrastructure
> added to the ring lib will also be added to the same-thread ring lib... for
> reference, consider the PMDs containing copy-pasted code from the mempool
> lib... none of the later improvements of the mempool lib were implemented in
> those PMDs.
> >
> > In essence, I think this lib overlaps the existing ring lib too much to
> justify making it a separate lib.
> >
> >>
> >>>
> >>> The ring lib already has performance-optimized APIs for single-consumer
> and
> >>> single-producer use, rte_ring_sc_dequeue_bulk() and
> >>> rte_ring_sp_enqueue_burst(). Similar performance-optimized APIs for
> single-
> >>> thread use could be added: rte_ring_st_dequeue_bulk() and
> >>> rte_ring_st_enqueue_burst().
> >> Yes, the names look fine.
> >> Looking through the code. We have the sync type enum:
> >>
> >> /** prod/cons sync types */
> >> enum rte_ring_sync_type {
> >>          RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
> >>          RTE_RING_SYNC_ST,     /**< single thread only */
> >>          RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
> >>          RTE_RING_SYNC_MT_HTS, /**< multi-thread head/tail sync */
> >> };
> >>
> >> The type RTE_RING_SYNC_ST needs better explanation (not a problem). But,
> this
> >> name would have been ideal to use for single thread ring.
> >> This enum does not need to be exposed to the users. However, there are
> >> rte_ring_get_prod/cons_sync_type etc which seem to be exposed to the user.
> >> This all means, we need to have a sync type name RTE_RING_SYNC_MT_UNSAFE
> (any
> >> other better name?) which then affects API naming.
> >> rte_ring_mt_unsafe_dequeue_bulk?
> >
> > As always, naming is difficult.
> > The enum rte_ring_sync_type describes the producer and consumer
> independently, whereas this ring type uses the same thread for both producer
> and consumer.
> > I think we should avoid MT in the names for this variant. How about:
> >
> > RTE_RING_SYNC_STPC /**< same thread for both producer and consumer */
> >
> > And:
> >
> > rte_ring_spc_dequeue_bulk() and rte_ring_spc_enqueue_burst()
> >
> >>
> >>>
> >>> Regardless if added to the ring lib or as a separate lib, "reverse" APIs
> >> (for single-
> >>> thread use only) and zero-copy APIs can be added at any time later.
> >
> > As the only current use case for "reverse" (i.e. dequeue at tail, enqueue at
> head) APIs is for the same-thread ring flavor, we could start by adding only
> the specialized variants of the "reverse" APIs, rte_ring_spc_reverse_xxx(),
> and initially omit the generic rte_ring_reverse_xxx() APIs. (We need better
> names; I used "reverse" for explanation only.)
> >

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: [RFC] lib/st_ring: add single thread ring
  2023-08-24 11:22         ` Morten Brørup
@ 2023-08-26 23:34           ` Honnappa Nagarahalli
  0 siblings, 0 replies; 48+ messages in thread
From: Honnappa Nagarahalli @ 2023-08-26 23:34 UTC (permalink / raw)
  To: Morten Brørup, Mattias Rönnblom, jackmin, konstantin.v.ananyev
  Cc: dev, Ruifeng Wang, Aditya Ambadipudi,
	Wathsala Wathawana Vithanage, nd, nd

<snip>

> 
> > From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> > Sent: Thursday, 24 August 2023 12.53
> >
> > On 2023-08-24 10:05, Morten Brørup wrote:
> > >> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> > >> Sent: Tuesday, 22 August 2023 07.47
> > >>
> > >>> From: Morten Brørup <mb@smartsharesystems.com>
> > >>> Sent: Monday, August 21, 2023 2:37 AM
> > >>>
> > >>>> From: Honnappa Nagarahalli
> [mailto:honnappa.nagarahalli@arm.com]
> > >>>> Sent: Monday, 21 August 2023 08.04
> > >>>>
> > >>>> Add a single thread safe and multi-thread unsafe ring data structure.
> > >>>> This library provides a simple and efficient alternative to
> > >>>> multi-thread safe ring when multi-thread safety is not required.
> > >>>>
> > >>>> Signed-off-by: Honnappa Nagarahalli
> > >>>> <honnappa.nagarahalli@arm.com>
> > >>>> ---
> > >>>
> > >>> Good idea.
> > >>>
> > >>> However, I prefer it to be implemented in the ring lib as one more
> > >>> ring
> > >> type.
> > >>> That would also give us a lot of the infrastructure (management
> > >>> functions, documentation and tests) for free.
> > >> IMO, the current code for rte_ring seems complex with C11 and
> > >> generic implementations, APIs for pointer objects vs APIs for
> > >> flexible element size etc. I did not want to introduce one more flavor and
> make it more complex.
> > >
> > >  From the user perspective, I think one more ring flavor is less
> > > complex
> > than an entirely separate (very similar) library with its own set of
> > (very
> > similar) APIs.
> > >
> > > I agree that the ring lib has grown somewhat over-engineered, but
> > > please
> > don't use that as an argument for making the same-thread ring a separate
> lib.
> > >
> >
> > What's being proposed is a double-ended queue, not a ring (in the DPDK
> > sense).
> >
> > If you want to Swiss army knifify the rte_ring further and make it a
> > deque, then rte_stack should be scrapped as well, since it will become
> > just a subset of the new rte_ring_now_really_a_deque.
> 
> OK. I accept that argument for not hacking it into the ring lib.
> 
> Then I will suggest that the new "deque" library should be designed with
> multi-threading in mind, like its two sibling libs (ring and stack). This makes it
> easier to use, and leaves it open for expansion to other flavors in the future.
> 
> It is perfectly acceptable that the first version only supports the same-thread
> deque flavor, and only the same-thread specialized APIs are exposed. I don't
> require any APIs or implementations supporting single-threaded (individual
> producer/consumer threads) or multi-threaded flavors, I only request that the
> design and API resemble those of its two sibling libraries. (And if there are no
> use cases for multi-threading flavors, they might never be added to this lib.)
+1, will aim for this

> 
> >
> > > On the other hand: If the addition of an optimized same-thread ring
> > > flavor
> > would require too many invasive modifications of the existing ring
> > lib, I would accept that as an argument for not adding it as another
> > ring flavor to the existing ring lib.
> > >
> > >> The requirements are different as well. For ex: single thread ring
> > >> needs
> > APIs
> > >> for dequeuing and enqueuing at both ends of the ring which is not
> > applicable
> > >> to existing RTE ring.
> > >
> > > Yes, I will address this topic at the end of this mail.
> > >
> > >>
> > >> But, I see how the existing infra can be reused easily.
> > >
> > > This also goes for future infrastructure. I doubt that new
> > > infrastructure
> > added to the ring lib will also be added to the same-thread ring
> > lib... for reference, consider the PMDs containing copy-pasted code
> > from the mempool lib... none of the later improvements of the mempool
> > lib were implemented in those PMDs.
> > >
> > > In essence, I think this lib overlaps the existing ring lib too much
> > > to
> > justify making it a separate lib.
> > >
> > >>
> > >>>
> > >>> The ring lib already has performance-optimized APIs for
> > >>> single-consumer
> > and
> > >>> single-producer use, rte_ring_sc_dequeue_bulk() and
> > >>> rte_ring_sp_enqueue_burst(). Similar performance-optimized APIs
> > >>> for
> > single-
> > >>> thread use could be added: rte_ring_st_dequeue_bulk() and
> > >>> rte_ring_st_enqueue_burst().
> > >> Yes, the names look fine.
> > >> Looking through the code. We have the sync type enum:
> > >>
> > >> /** prod/cons sync types */
> > >> enum rte_ring_sync_type {
> > >>          RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
> > >>          RTE_RING_SYNC_ST,     /**< single thread only */
> > >>          RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
> > >>          RTE_RING_SYNC_MT_HTS, /**< multi-thread head/tail sync */
> > >> };
> > >>
> > >> The type RTE_RING_SYNC_ST needs better explanation (not a problem).
> > >> But,
> > this
> > >> name would have been ideal to use for single thread ring.
> > >> This enum does not need to be exposed to the users. However, there
> > >> are rte_ring_get_prod/cons_sync_type etc which seem to be exposed to
> the user.
> > >> This all means, we need to have a sync type name
> > >> RTE_RING_SYNC_MT_UNSAFE
> > (any
> > >> other better name?) which then affects API naming.
> > >> rte_ring_mt_unsafe_dequeue_bulk?
> > >
> > > As always, naming is difficult.
> > > The enum rte_ring_sync_type describes the producer and consumer
> > independently, whereas this ring type uses the same thread for both
> > producer and consumer.
> > > I think we should avoid MT in the names for this variant. How about:
> > >
> > > RTE_RING_SYNC_STPC /**< same thread for both producer and consumer
> > > */
> > >
> > > And:
> > >
> > > rte_ring_spc_dequeue_bulk() and rte_ring_spc_enqueue_burst()
> > >
> > >>
> > >>>
> > >>> Regardless if added to the ring lib or as a separate lib,
> > >>> "reverse" APIs
> > >> (for single-
> > >>> thread use only) and zero-copy APIs can be added at any time later.
> > >
> > > As the only current use case for "reverse" (i.e. dequeue at tail,
> > > enqueue at
> > head) APIs is for the same-thread ring flavor, we could start by
> > adding only the specialized variants of the "reverse" APIs,
> > rte_ring_spc_reverse_xxx(), and initially omit the generic
> > rte_ring_reverse_xxx() APIs. (We need better names; I used "reverse"
> > for explanation only.)
> > >

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: [RFC] lib/st_ring: add single thread ring
  2023-08-21  6:04 [RFC] lib/st_ring: add single thread ring Honnappa Nagarahalli
  2023-08-21  7:37 ` Morten Brørup
  2023-08-21 21:14 ` Mattias Rönnblom
@ 2023-09-04 10:13 ` Konstantin Ananyev
  2023-09-04 18:10   ` Honnappa Nagarahalli
  2024-04-01  1:37 ` [PATCH v1 0/2] deque: add multithread unsafe deque library Aditya Ambadipudi
  3 siblings, 1 reply; 48+ messages in thread
From: Konstantin Ananyev @ 2023-09-04 10:13 UTC (permalink / raw)
  To: Honnappa Nagarahalli, jackmin, konstantin.v.ananyev
  Cc: dev, ruifeng.wang, aditya.ambadipudi, wathsala.vithanage, nd



> Add a single thread safe and multi-thread unsafe ring data structure.
> This library provides a simple and efficient alternative to the
> multi-thread safe ring when multi-thread safety is not required.

Just a thought: do we really need a whole new library for that?
From what I understand, all we need right now is just one extra function:
rte_ring_mt_unsafe_prod_deque(...)
Sorry for the ugly name :)
To dequeue N elems from prod.tail.
Or do you think there would be some extra advantages in an ST version of
the ring: extra usages, better performance, etc.?
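The suggested operation — handing the most recently enqueued elements back from the producer side — can be sketched on a toy single-thread ring model. This is illustrative only: the struct layout, field names, and indexing below are assumptions for the sketch, not the real `struct rte_ring` internals.

```c
#include <assert.h>
#include <stdint.h>

/* Toy single-thread ring model; power-of-two size, free-running indices. */
#define TOY_RING_SIZE 8u
#define TOY_RING_MASK (TOY_RING_SIZE - 1u)

struct toy_ring {
	uint32_t prod;               /* index of the next free slot */
	uint32_t cons;               /* index of the oldest element */
	void *slots[TOY_RING_SIZE];
};

static uint32_t toy_ring_count(const struct toy_ring *r)
{
	return r->prod - r->cons;    /* relies on unsigned wrap-around */
}

static uint32_t toy_ring_enqueue(struct toy_ring *r, void *obj)
{
	if (toy_ring_count(r) == TOY_RING_SIZE)
		return 0;
	r->slots[r->prod & TOY_RING_MASK] = obj;
	r->prod++;
	return 1;
}

/* The proposed helper: take back up to n of the most recently enqueued
 * elements by walking the producer index backwards, newest first. This
 * is only safe when no other thread touches the ring concurrently. */
static uint32_t toy_ring_prod_dequeue(struct toy_ring *r, void **objs,
				      uint32_t n)
{
	uint32_t avail = toy_ring_count(r);
	uint32_t i;

	if (n > avail)
		n = avail;
	for (i = 0; i < n; i++) {
		r->prod--;
		objs[i] = r->slots[r->prod & TOY_RING_MASK];
	}
	return n;
}
```

Enqueuing a, b, c and then taking back two elements returns c and b (newest first) and leaves a in the ring.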

> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
> v1:
> 1) The code is very preliminary and is not even compiled
> 2) This is intended to show the APIs and some thoughts on implementation
> 3) More APIs and the rest of the implementation will come in subsequent
>    versions
> 
>  lib/st_ring/rte_st_ring.h | 567 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 567 insertions(+)
>  create mode 100644 lib/st_ring/rte_st_ring.h
> 
> diff --git a/lib/st_ring/rte_st_ring.h b/lib/st_ring/rte_st_ring.h
> new file mode 100644
> index 0000000000..8cb8832591
> --- /dev/null
> +++ b/lib/st_ring/rte_st_ring.h
> @@ -0,0 +1,567 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2023 Arm Limited
> + */
> +
> +#ifndef _RTE_ST_RING_H_
> +#define _RTE_ST_RING_H_
> +
> +/**
> + * @file
> + * RTE Single Thread Ring (ST Ring)
> + *
> + * The ST Ring is a fixed-size queue intended to be accessed
> + * by one thread at a time. It does not support concurrent access
> + * by multiple threads. If multiple threads access the ST ring,
> + * they have to use locks to protect the ring from
> + * getting corrupted.
> + *
> + * - FIFO (First In First Out)
> + * - Maximum size is fixed; the pointers are stored in a table.
> + * - Consumer and producer part of same thread.
> + * - Multi-thread producers and consumers need locking.
> + * - Single/Bulk/burst dequeue at Tail or Head
> + * - Single/Bulk/burst enqueue at Head or Tail
> + *
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <rte_st_ring_core.h>
> +#include <rte_st_ring_elem.h>
> +
> +/**
> + * Calculate the memory size needed for a ST ring
> + *
> + * This function returns the number of bytes needed for a ST ring, given
> + * the number of elements in it. This value is the sum of the size of
> + * the structure rte_st_ring and the size of the memory needed by the
> + * elements. The value is aligned to a cache line size.
> + *
> + * @param count
> + *   The number of elements in the ring (must be a power of 2).
> + * @return
> + *   - The memory size needed for the ST ring on success.
> + *   - -EINVAL if count is not a power of 2.
> + */
> +ssize_t rte_st_ring_get_memsize(unsigned int count);
> +
> +/**
> + * Initialize a ST ring structure.
> + *
> + * Initialize a ST ring structure in memory pointed by "r". The size of the
> + * memory area must be large enough to store the ring structure and the
> + * object table. It is advised to use rte_st_ring_get_memsize() to get the
> + * appropriate size.
> + *
> + * The ST ring size is set to *count*, which must be a power of two.
> + * The real usable ring size is *count-1* instead of *count* to
> + * differentiate a full ring from an empty ring.
> + *
> + * The ring is not added in RTE_TAILQ_ST_RING global list. Indeed, the
> + * memory given by the caller may not be shareable among dpdk
> + * processes.
> + *
> + * @param r
> + *   The pointer to the ring structure followed by the elements table.
> + * @param name
> + *   The name of the ring.
> + * @param count
> + *   The number of elements in the ring (must be a power of 2,
> + *   unless RTE_ST_RING_F_EXACT_SZ is set in flags).
> + * @param flags
> + *   An OR of the following:
> + *   - RTE_ST_RING_F_EXACT_SZ: If this flag is set, the ring will hold
> + *     exactly the requested number of entries, and the requested size
> + *     will be rounded up to the next power of two, but the usable space
> + *     will be exactly that requested. Worst case, if a power-of-2 size is
> + *     requested, half the ring space will be wasted.
> + *     Without this flag set, the ring size requested must be a power of 2,
> + *     and the usable space will be that size - 1.
> + * @return
> + *   0 on success, or a negative value on error.
> + */
> +int rte_st_ring_init(struct rte_st_ring *r, const char *name,
> +	unsigned int count, unsigned int flags);
> +
> +/**
> + * Create a new ST ring named *name* in memory.
> + *
> + * This function uses ``memzone_reserve()`` to allocate memory. Then it
> + * calls rte_st_ring_init() to initialize an empty ring.
> + *
> + * The new ring size is set to *count*, which must be a power of two.
> + * The real usable ring size is *count-1* instead of *count* to
> + * differentiate a full ring from an empty ring.
> + *
> + * The ring is added in RTE_TAILQ_ST_RING list.
> + *
> + * @param name
> + *   The name of the ring.
> + * @param count
> + *   The size of the ring (must be a power of 2,
> + *   unless RTE_ST_RING_F_EXACT_SZ is set in flags).
> + * @param socket_id
> + *   The *socket_id* argument is the socket identifier in case of
> + *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
> + *   constraint for the reserved zone.
> + * @param flags
> + *   - RTE_ST_RING_F_EXACT_SZ: If this flag is set, the ring will hold exactly the
> + *     requested number of entries, and the requested size will be rounded up
> + *     to the next power of two, but the usable space will be exactly that
> + *     requested. Worst case, if a power-of-2 size is requested, half the
> + *     ring space will be wasted.
> + *     Without this flag set, the ring size requested must be a power of 2,
> + *     and the usable space will be that size - 1.
> + * @return
> + *   On success, the pointer to the new allocated ring. NULL on error with
> + *    rte_errno set appropriately. Possible errno values include:
> + *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
> + *    - EINVAL - count provided is not a power of 2
> + *    - ENOSPC - the maximum number of memzones has already been allocated
> + *    - EEXIST - a memzone with the same name already exists
> + *    - ENOMEM - no appropriate memory area found in which to create memzone
> + */
> +struct rte_st_ring *rte_st_ring_create(const char *name, unsigned int count,
> +				 int socket_id, unsigned int flags);
> +
> +/**
> + * De-allocate all memory used by the ring.
> + *
> + * @param r
> + *   Ring to free.
> + *   If NULL, the function does nothing.
> + */
> +void rte_st_ring_free(struct rte_st_ring *r);
> +
> +/**
> + * Dump the status of the ring to a file.
> + *
> + * @param f
> + *   A pointer to a file for output
> + * @param r
> + *   A pointer to the ring structure.
> + */
> +void rte_st_ring_dump(FILE *f, const struct rte_st_ring *r);
> +
> +/**
> + * Enqueue a fixed number of objects on a ST ring.
> + *
> + * This function copies the objects at the head of the ring and
> + * moves the head index.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   The number of objects enqueued, either 0 or n
> + */
> +static __rte_always_inline unsigned int
> +rte_st_ring_enqueue_bulk(struct rte_st_ring *r, void * const *obj_table,
> +		      unsigned int n, unsigned int *free_space)
> +{
> +	return rte_st_ring_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> +			n, free_space);
> +}
> +
> +/**
> + * Enqueue up to a maximum number of objects on a ST ring.
> + *
> + * This function copies the objects at the head of the ring and
> + * moves the head index.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   - n: Actual number of objects enqueued.
> + */
> +static __rte_always_inline unsigned int
> +rte_st_ring_enqueue_burst(struct rte_st_ring *r, void * const *obj_table,
> +		      unsigned int n, unsigned int *free_space)
> +{
> +	return rte_st_ring_enqueue_burst_elem(r, obj_table, sizeof(void *),
> +			n, free_space);
> +}
> +
> +/**
> + * Enqueue one object on a ST ring.
> + *
> + * This function copies one object at the head of the ring and
> + * moves the head index.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj
> + *   A pointer to the object to be added.
> + * @return
> + *   - 0: Success; objects enqueued.
> + *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
> + */
> +static __rte_always_inline int
> +rte_st_ring_enqueue(struct rte_st_ring *r, void *obj)
> +{
> +	return rte_st_ring_enqueue_elem(r, &obj, sizeof(void *));
> +}
> +
> +/**
> + * Enqueue a fixed number of objects on a ST ring at the tail.
> + *
> + * This function copies the objects at the tail of the ring and
> + * moves the tail index (backwards).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   The number of objects enqueued, either 0 or n
> + */
> +static __rte_always_inline unsigned int
> +rte_st_ring_enqueue_at_tail_bulk(struct rte_st_ring *r,
> +				 void * const *obj_table, unsigned int n,
> +				 unsigned int *free_space)
> +{
> +	return rte_st_ring_enqueue_at_tail_bulk_elem(r, obj_table,
> +			sizeof(void *), n, free_space);
> +}
> +
> +/**
> + * Enqueue up to a maximum number of objects on a ST ring at the tail.
> + *
> + * This function copies the objects at the tail of the ring and
> + * moves the tail index (backwards).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   - n: Actual number of objects enqueued.
> + */
> +static __rte_always_inline unsigned int
> +rte_st_ring_enqueue_at_tail_burst(struct rte_st_ring *r,
> +				  void * const *obj_table, unsigned int n,
> +				  unsigned int *free_space)
> +{
> +	return rte_st_ring_enqueue_at_tail_burst_elem(r, obj_table,
> +			sizeof(void *), n, free_space);
> +}
> +
> +/**
> + * Enqueue one object on a ST ring at tail.
> + *
> + * This function copies one object at the tail of the ring and
> + * moves the tail index (backwards).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj
> + *   A pointer to the object to be added.
> + * @return
> + *   - 0: Success; objects enqueued.
> + *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
> + */
> +static __rte_always_inline int
> +rte_st_ring_enqueue_at_tail(struct rte_st_ring *r, void *obj)
> +{
> +	return rte_st_ring_enqueue_at_tail_elem(r, &obj, sizeof(void *));
> +}
> +
> +/**
> + * Dequeue a fixed number of objects from a ST ring.
> + *
> + * This function copies the objects from the tail of the ring and
> + * moves the tail index.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   The number of objects dequeued, either 0 or n
> + */
> +static __rte_always_inline unsigned int
> +rte_st_ring_dequeue_bulk(struct rte_st_ring *r, void **obj_table, unsigned int n,
> +		unsigned int *available)
> +{
> +	return rte_st_ring_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> +			n, available);
> +}
> +
> +/**
> + * Dequeue up to a maximum number of objects from a ST ring.
> + *
> + * This function copies the objects from the tail of the ring and
> + * moves the tail index.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   - Number of objects dequeued
> + */
> +static __rte_always_inline unsigned int
> +rte_st_ring_dequeue_burst(struct rte_st_ring *r, void **obj_table,
> +		unsigned int n, unsigned int *available)
> +{
> +	return rte_st_ring_dequeue_burst_elem(r, obj_table, sizeof(void *),
> +			n, available);
> +}
> +
> +/**
> + * Dequeue one object from a ST ring.
> + *
> + * This function copies one object from the tail of the ring and
> + * moves the tail index.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_p
> + *   A pointer to a void * pointer (object) that will be filled.
> + * @return
> + *   - 0: Success, objects dequeued.
> + *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
> + *     dequeued.
> + */
> +static __rte_always_inline int
> +rte_st_ring_dequeue(struct rte_st_ring *r, void **obj_p)
> +{
> +	return rte_st_ring_dequeue_elem(r, obj_p, sizeof(void *));
> +}
> +
> +/**
> + * Dequeue a fixed number of objects from a ST ring from the head.
> + *
> + * This function copies the objects from the head of the ring and
> + * moves the head index (backwards).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   The number of objects dequeued, either 0 or n
> + */
> +static __rte_always_inline unsigned int
> +rte_st_ring_dequeue_at_head_bulk(struct rte_st_ring *r, void **obj_table, unsigned int n,
> +		unsigned int *available)
> +{
> +	return rte_st_ring_dequeue_at_head_bulk_elem(r, obj_table,
> +			sizeof(void *), n, available);
> +}
> +
> +/**
> + * Dequeue up to a maximum number of objects from a ST ring from the head.
> + *
> + * This function copies the objects from the head of the ring and
> + * moves the head index (backwards).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   - Number of objects dequeued
> + */
> +static __rte_always_inline unsigned int
> +rte_st_ring_dequeue_at_head_burst(struct rte_st_ring *r, void **obj_table,
> +		unsigned int n, unsigned int *available)
> +{
> +	return rte_st_ring_dequeue_at_head_burst_elem(r, obj_table,
> +			sizeof(void *), n, available);
> +}
> +
> +/**
> + * Dequeue one object from a ST ring from the head.
> + *
> + * This function copies the objects from the head of the ring and
> + * moves the head index (backwards).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_p
> + *   A pointer to a void * pointer (object) that will be filled.
> + * @return
> + *   - 0: Success, objects dequeued.
> + *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
> + *     dequeued.
> + */
> +static __rte_always_inline int
> +rte_st_ring_dequeue_at_head(struct rte_st_ring *r, void **obj_p)
> +{
> +	return rte_st_ring_dequeue_at_head_elem(r, obj_p, sizeof(void *));
> +}
> +
> +/**
> + * Flush a ST ring.
> + *
> + * This function flushes all the elements in a ST ring
> + *
> + * @warning
> + * Make sure the ring is not in use while calling this function.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + */
> +void
> +rte_st_ring_reset(struct rte_st_ring *r);
> +
> +/**
> + * Return the number of entries in a ST ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   The number of entries in the ring.
> + */
> +static inline unsigned int
> +rte_st_ring_count(const struct rte_st_ring *r)
> +{
> +	uint32_t count = (r->head - r->tail) & r->mask;
> +	return count;
> +}
> +
> +/**
> + * Return the number of free entries in a ST ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   The number of free entries in the ring.
> + */
> +static inline unsigned int
> +rte_st_ring_free_count(const struct rte_st_ring *r)
> +{
> +	return r->capacity - rte_st_ring_count(r);
> +}
> +
> +/**
> + * Test if a ST ring is full.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   - 1: The ring is full.
> + *   - 0: The ring is not full.
> + */
> +static inline int
> +rte_st_ring_full(const struct rte_st_ring *r)
> +{
> +	return rte_st_ring_free_count(r) == 0;
> +}
> +
> +/**
> + * Test if a ST ring is empty.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   - 1: The ring is empty.
> + *   - 0: The ring is not empty.
> + */
> +static inline int
> +rte_st_ring_empty(const struct rte_st_ring *r)
> +{
> +	return r->tail == r->head;
> +}
> +
> +/**
> + * Return the size of the ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   The size of the data store used by the ring.
> + *   NOTE: this is not the same as the usable space in the ring. To query that
> + *   use ``rte_st_ring_get_capacity()``.
> + */
> +static inline unsigned int
> +rte_st_ring_get_size(const struct rte_st_ring *r)
> +{
> +	return r->size;
> +}
> +
> +/**
> + * Return the number of elements which can be stored in the ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   The usable size of the ring.
> + */
> +static inline unsigned int
> +rte_st_ring_get_capacity(const struct rte_st_ring *r)
> +{
> +	return r->capacity;
> +}
> +
> +/**
> + * Dump the status of all rings on the console
> + *
> + * @param f
> + *   A pointer to a file for output
> + */
> +void rte_st_ring_list_dump(FILE *f);
> +
> +/**
> + * Search a ST ring from its name
> + *
> + * @param name
> + *   The name of the ring.
> + * @return
> + *   The pointer to the ring matching the name, or NULL if not found,
> + *   with rte_errno set appropriately. Possible rte_errno values include:
> + *    - ENOENT - required entry not available to return.
> + */
> +struct rte_st_ring *rte_st_ring_lookup(const char *name);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_ST_RING_H_ */
> --
> 2.25.1
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: [RFC] lib/st_ring: add single thread ring
  2023-09-04 10:13 ` Konstantin Ananyev
@ 2023-09-04 18:10   ` Honnappa Nagarahalli
  2023-09-05  8:19     ` Konstantin Ananyev
  0 siblings, 1 reply; 48+ messages in thread
From: Honnappa Nagarahalli @ 2023-09-04 18:10 UTC (permalink / raw)
  To: Konstantin Ananyev, jackmin, konstantin.v.ananyev
  Cc: dev, Ruifeng Wang, Aditya Ambadipudi,
	Wathsala Wathawana Vithanage, nd, nd



> -----Original Message-----
> From: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> Sent: Monday, September 4, 2023 5:13 AM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> jackmin@nvidia.com; konstantin.v.ananyev@yandex.ru
> Cc: dev@dpdk.org; Ruifeng Wang <Ruifeng.Wang@arm.com>; Aditya
> Ambadipudi <Aditya.Ambadipudi@arm.com>; Wathsala Wathawana Vithanage
> <wathsala.vithanage@arm.com>; nd <nd@arm.com>
> Subject: RE: [RFC] lib/st_ring: add single thread ring
> 
> 
> 
> > Add a single thread safe and multi-thread unsafe ring data structure.
> > This library provides a simple and efficient alternative to the
> > multi-thread safe ring when multi-thread safety is not required.
> 
> Just a thought: do we really need a whole new library for that?
> From what I understand, all we need right now is just one extra function:
> rte_ring_mt_unsafe_prod_deque(...)
> Sorry for the ugly name :)
> To dequeue N elems from prod.tail.
> Or do you think there would be some extra advantages in an ST version of
> the ring: extra usages, better performance, etc.?
There are multiple implementations of the ST ring being used in other parts of DPDK. Mattias Ronnblom pointed out some existing ones [1] (distributed scheduler, eth RX adapter, cmdline) which will be replaced by this one.
This implementation will not use atomic instructions, the head and tail indices will be in the same cache line, and it will be a double-ended queue. So, I am expecting better performance and more use cases (some might not be applicable currently).

[1] https://mails.dpdk.org/archives/dev/2023-August/275003.html
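The properties described above (no atomics, head and tail indices kept together, double-ended access) can be sketched with a minimal toy deque. The names and layout here are illustrative assumptions for the sketch, not the proposed rte_st_ring implementation.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_DEQ_SIZE 8u
#define TOY_DEQ_MASK (TOY_DEQ_SIZE - 1u)

/* Head and tail sit side by side: with no concurrent access there is no
 * false-sharing concern, so one cache line can hold both indices. */
struct toy_deque {
	uint32_t head;  /* next slot to enqueue at the head */
	uint32_t tail;  /* oldest element, dequeued in FIFO order */
	void *slots[TOY_DEQ_SIZE];
};

static uint32_t toy_deque_count(const struct toy_deque *d)
{
	return d->head - d->tail;   /* relies on unsigned wrap-around */
}

static int toy_deque_enqueue_head(struct toy_deque *d, void *obj)
{
	if (toy_deque_count(d) == TOY_DEQ_SIZE)
		return -1;
	d->slots[d->head & TOY_DEQ_MASK] = obj;
	d->head++;
	return 0;
}

/* FIFO flavor: take the oldest element from the tail. */
static int toy_deque_dequeue_tail(struct toy_deque *d, void **obj)
{
	if (toy_deque_count(d) == 0)
		return -1;
	*obj = d->slots[d->tail & TOY_DEQ_MASK];
	d->tail++;
	return 0;
}

/* LIFO flavor: take the newest element back from the head. */
static int toy_deque_dequeue_head(struct toy_deque *d, void **obj)
{
	if (toy_deque_count(d) == 0)
		return -1;
	d->head--;
	*obj = d->slots[d->head & TOY_DEQ_MASK];
	return 0;
}
```

Because one thread owns both ends, every operation is plain loads and stores on two adjacent indices; this is where the expected perf win over the atomics-based rte_ring comes from.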

> 
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > ---
> > v1:
> > 1) The code is very preliminary and is not even compiled
> > 2) This is intended to show the APIs and some thoughts on
> > implementation
> > 3) More APIs and the rest of the implementation will come in subsequent
> >    versions
> >
> >  lib/st_ring/rte_st_ring.h | 567
> > ++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 567 insertions(+)
> >  create mode 100644 lib/st_ring/rte_st_ring.h
> >
> > diff --git a/lib/st_ring/rte_st_ring.h b/lib/st_ring/rte_st_ring.h new
> > file mode 100644 index 0000000000..8cb8832591
> > --- /dev/null
> > +++ b/lib/st_ring/rte_st_ring.h
> > @@ -0,0 +1,567 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2023 Arm Limited
> > + */
> > +
> > +#ifndef _RTE_ST_RING_H_
> > +#define _RTE_ST_RING_H_
> > +
> > +/**
> > + * @file
> > + * RTE Single Thread Ring (ST Ring)
> > + *
> > + * The ST Ring is a fixed-size queue intended to be accessed
> > + * by one thread at a time. It does not support concurrent access
> > + * by multiple threads. If multiple threads access the ST ring,
> > + * they have to use locks to protect the ring from
> > + * getting corrupted.
> > + *
> > + * - FIFO (First In First Out)
> > + * - Maximum size is fixed; the pointers are stored in a table.
> > + * - Consumer and producer part of same thread.
> > + * - Multi-thread producers and consumers need locking.
> > + * - Single/Bulk/burst dequeue at Tail or Head
> > + * - Single/Bulk/burst enqueue at Head or Tail
> > + *
> > + */
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include <rte_st_ring_core.h>
> > +#include <rte_st_ring_elem.h>
> > +
> > +/**
> > + * Calculate the memory size needed for a ST ring
> > + *
> > + * This function returns the number of bytes needed for a ST ring,
> > +given
> > + * the number of elements in it. This value is the sum of the size of
> > + * the structure rte_st_ring and the size of the memory needed by the
> > + * elements. The value is aligned to a cache line size.
> > + *
> > + * @param count
> > + *   The number of elements in the ring (must be a power of 2).
> > + * @return
> > + *   - The memory size needed for the ST ring on success.
> > + *   - -EINVAL if count is not a power of 2.
> > + */
> > +ssize_t rte_st_ring_get_memsize(unsigned int count);
> > +
> > +/**
> > + * Initialize a ST ring structure.
> > + *
> > + * Initialize a ST ring structure in memory pointed by "r". The size
> > +of the
> > + * memory area must be large enough to store the ring structure and
> > +the
> > + * object table. It is advised to use rte_st_ring_get_memsize() to
> > +get the
> > + * appropriate size.
> > + *
> > + * The ST ring size is set to *count*, which must be a power of two.
> > + * The real usable ring size is *count-1* instead of *count* to
> > + * differentiate a full ring from an empty ring.
> > + *
> > + * The ring is not added in RTE_TAILQ_ST_RING global list. Indeed,
> > +the
> > + * memory given by the caller may not be shareable among dpdk
> > + * processes.
> > + *
> > + * @param r
> > + *   The pointer to the ring structure followed by the elements table.
> > + * @param name
> > + *   The name of the ring.
> > + * @param count
> > + *   The number of elements in the ring (must be a power of 2,
> > + *   unless RTE_ST_RING_F_EXACT_SZ is set in flags).
> > + * @param flags
> > + *   An OR of the following:
> > + *   - RTE_ST_RING_F_EXACT_SZ: If this flag is set, the ring will hold
> > + *     exactly the requested number of entries, and the requested size
> > + *     will be rounded up to the next power of two, but the usable space
> > + *     will be exactly that requested. Worst case, if a power-of-2 size is
> > + *     requested, half the ring space will be wasted.
> > + *     Without this flag set, the ring size requested must be a power of 2,
> > + *     and the usable space will be that size - 1.
> > + * @return
> > + *   0 on success, or a negative value on error.
> > + */
> > +int rte_st_ring_init(struct rte_st_ring *r, const char *name,
> > +	unsigned int count, unsigned int flags);
> > +
> > +/**
> > + * Create a new ST ring named *name* in memory.
> > + *
> > + * This function uses ``memzone_reserve()`` to allocate memory. Then
> > +it
> > + * calls rte_st_ring_init() to initialize an empty ring.
> > + *
> > + * The new ring size is set to *count*, which must be a power of two.
> > + * The real usable ring size is *count-1* instead of *count* to
> > + * differentiate a full ring from an empty ring.
> > + *
> > + * The ring is added in RTE_TAILQ_ST_RING list.
> > + *
> > + * @param name
> > + *   The name of the ring.
> > + * @param count
> > + *   The size of the ring (must be a power of 2,
> > + *   unless RTE_ST_RING_F_EXACT_SZ is set in flags).
> > + * @param socket_id
> > + *   The *socket_id* argument is the socket identifier in case of
> > + *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
> > + *   constraint for the reserved zone.
> > + * @param flags
> > + *   - RTE_ST_RING_F_EXACT_SZ: If this flag is set, the ring will hold exactly
> the
> > + *     requested number of entries, and the requested size will be rounded up
> > + *     to the next power of two, but the usable space will be exactly that
> > + *     requested. Worst case, if a power-of-2 size is requested, half the
> > + *     ring space will be wasted.
> > + *     Without this flag set, the ring size requested must be a power of 2,
> > + *     and the usable space will be that size - 1.
> > + * @return
> > + *   On success, the pointer to the new allocated ring. NULL on error with
> > + *    rte_errno set appropriately. Possible errno values include:
> > + *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config
> structure
> > + *    - EINVAL - count provided is not a power of 2
> > + *    - ENOSPC - the maximum number of memzones has already been
> allocated
> > + *    - EEXIST - a memzone with the same name already exists
> > + *    - ENOMEM - no appropriate memory area found in which to create
> memzone
> > + */
> > +struct rte_st_ring *rte_st_ring_create(const char *name, unsigned int count,
> > +				 int socket_id, unsigned int flags);
> > +
> > +/**
> > + * De-allocate all memory used by the ring.
> > + *
> > + * @param r
> > + *   Ring to free.
> > + *   If NULL, the function does nothing.
> > + */
> > +void rte_st_ring_free(struct rte_st_ring *r);
> > +
> > +/**
> > + * Dump the status of the ring to a file.
> > + *
> > + * @param f
> > + *   A pointer to a file for output
> > + * @param r
> > + *   A pointer to the ring structure.
> > + */
> > +void rte_st_ring_dump(FILE *f, const struct rte_st_ring *r);
> > +
> > +/**
> > + * Enqueue a fixed number of objects on a ST ring.
> > + *
> > + * This function copies the objects at the head of the ring and
> > + * moves the head index.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to add in the ring from the obj_table.
> > + * @param free_space
> > + *   if non-NULL, returns the amount of space in the ring after the
> > + *   enqueue operation has finished.
> > + * @return
> > + *   The number of objects enqueued, either 0 or n
> > + */
> > +static __rte_always_inline unsigned int
> > +rte_st_ring_enqueue_bulk(struct rte_st_ring *r, void * const *obj_table,
> > +		      unsigned int n, unsigned int *free_space) {
> > +	return rte_st_ring_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> > +			n, free_space);
> > +}
> > +
> > +/**
> > + * Enqueue up to a maximum number of objects on a ST ring.
> > + *
> > + * This function copies the objects at the head of the ring and
> > + * moves the head index.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to add in the ring from the obj_table.
> > + * @param free_space
> > + *   if non-NULL, returns the amount of space in the ring after the
> > + *   enqueue operation has finished.
> > + * @return
> > + *   - n: Actual number of objects enqueued.
> > + */
> > +static __rte_always_inline unsigned int
> > +rte_st_ring_enqueue_burst(struct rte_st_ring *r, void * const *obj_table,
> > +		      unsigned int n, unsigned int *free_space) {
> > +	return rte_st_ring_enqueue_burst_elem(r, obj_table, sizeof(void *),
> > +			n, free_space);
> > +}
> > +
> > +/**
> > + * Enqueue one object on a ST ring.
> > + *
> > + * This function copies one object at the head of the ring and
> > + * moves the head index.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj
> > + *   A pointer to the object to be added.
> > + * @return
> > + *   - 0: Success; objects enqueued.
> > + *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is
> enqueued.
> > + */
> > +static __rte_always_inline int
> > +rte_st_ring_enqueue(struct rte_st_ring *r, void *obj) {
> > +	return rte_st_ring_enqueue_elem(r, &obj, sizeof(void *)); }
> > +
> > +/**
> > + * Enqueue a fixed number of objects on a ST ring at the tail.
> > + *
> > + * This function copies the objects at the tail of the ring and
> > + * moves the tail index (backwards).
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to add in the ring from the obj_table.
> > + * @param free_space
> > + *   if non-NULL, returns the amount of space in the ring after the
> > + *   enqueue operation has finished.
> > + * @return
> > + *   The number of objects enqueued, either 0 or n
> > + */
> > +static __rte_always_inline unsigned int
> > +rte_st_ring_enqueue_at_tail_bulk(struct rte_st_ring *r,
> > +				 void * const *obj_table, unsigned int n,
> > +				 unsigned int *free_space)
> > +{
> > +	return rte_st_ring_enqueue_at_tail_bulk_elem(r, obj_table,
> > +			sizeof(void *), n, free_space);
> > +}
> > +
> > +/**
> > + * Enqueue up to a maximum number of objects on a ST ring at the tail.
> > + *
> > + * This function copies the objects at the tail of the ring and
> > + * moves the tail index (backwards).
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to add in the ring from the obj_table.
> > + * @param free_space
> > + *   if non-NULL, returns the amount of space in the ring after the
> > + *   enqueue operation has finished.
> > + * @return
> > + *   - n: Actual number of objects enqueued.
> > + */
> > +static __rte_always_inline unsigned int
> > +rte_st_ring_enqueue_at_tail_burst(struct rte_st_ring *r,
> > +				  void * const *obj_table, unsigned int n,
> > +				  unsigned int *free_space)
> > +{
> > +	return rte_st_ring_enqueue_at_tail_burst_elem(r, obj_table,
> > +			sizeof(void *), n, free_space);
> > +}
> > +
> > +/**
> > + * Enqueue one object on a ST ring at tail.
> > + *
> > + * This function copies one object at the tail of the ring and
> > + * moves the tail index (backwards).
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj
> > + *   A pointer to the object to be added.
> > + * @return
> > + *   - 0: Success; objects enqueued.
> > + *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
> > + */
> > +static __rte_always_inline int
> > +rte_st_ring_enqueue_at_tail(struct rte_st_ring *r, void *obj) {
> > +	return rte_st_ring_enqueue_at_tail_elem(r, &obj, sizeof(void *)); }
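Taken together, the head-side and tail-side enqueue/dequeue operations quoted above make the ST ring a double-ended queue. The following standalone sketch (illustrative names and layout, not the proposed DPDK API) shows the index arithmetic the documentation implies: head advances forward on a head-side enqueue, tail moves backwards on a tail-side enqueue, and a power-of-two size lets `& mask` handle wraparound of the free-running 32-bit indices.

```c
#include <stdint.h>

#define SKETCH_SIZE 8u              /* must be a power of two */
#define SKETCH_MASK (SKETCH_SIZE - 1u)

struct sketch_ring {
	uint32_t head;              /* next free slot on the head side */
	uint32_t tail;              /* first occupied slot on the tail side */
	void *objs[SKETCH_SIZE];
};

static inline uint32_t sketch_count(const struct sketch_ring *r)
{
	return (r->head - r->tail) & SKETCH_MASK;
}

/* enqueue at head: store, then move head forward */
static inline int sketch_enqueue(struct sketch_ring *r, void *obj)
{
	if (sketch_count(r) == SKETCH_SIZE - 1)   /* one slot kept free */
		return -1;
	r->objs[r->head & SKETCH_MASK] = obj;
	r->head++;
	return 0;
}

/* enqueue at tail: move tail backwards, then store */
static inline int sketch_enqueue_at_tail(struct sketch_ring *r, void *obj)
{
	if (sketch_count(r) == SKETCH_SIZE - 1)
		return -1;
	r->tail--;                  /* unsigned wraparound is intended */
	r->objs[r->tail & SKETCH_MASK] = obj;
	return 0;
}

/* dequeue from tail: load, then move tail forward */
static inline int sketch_dequeue(struct sketch_ring *r, void **obj)
{
	if (r->head == r->tail)     /* empty */
		return -1;
	*obj = r->objs[r->tail & SKETCH_MASK];
	r->tail++;
	return 0;
}
```

An object enqueued at the tail is the first one returned by a tail-side dequeue, while objects enqueued at the head come out in FIFO order, which is what gives the structure its deque behaviour.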
> > +
> > +/**
> > + * Dequeue a fixed number of objects from a ST ring.
> > + *
> > + * This function copies the objects from the tail of the ring and
> > + * moves the tail index.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects) that will be filled.
> > + * @param n
> > + *   The number of objects to dequeue from the ring to the obj_table.
> > + * @param available
> > + *   If non-NULL, returns the number of remaining ring entries after the
> > + *   dequeue has finished.
> > + * @return
> > + *   The number of objects dequeued, either 0 or n
> > + */
> > +static __rte_always_inline unsigned int
> > +rte_st_ring_dequeue_bulk(struct rte_st_ring *r, void **obj_table, unsigned int n,
> > +		unsigned int *available)
> > +{
> > +	return rte_st_ring_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> > +			n, available);
> > +}
> > +
> > +/**
> > + * Dequeue up to a maximum number of objects from a ST ring.
> > + *
> > + * This function copies the objects from the tail of the ring and
> > + * moves the tail index.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects) that will be filled.
> > + * @param n
> > + *   The number of objects to dequeue from the ring to the obj_table.
> > + * @param available
> > + *   If non-NULL, returns the number of remaining ring entries after the
> > + *   dequeue has finished.
> > + * @return
> > + *   - Number of objects dequeued
> > + */
> > +static __rte_always_inline unsigned int
> > +rte_st_ring_dequeue_burst(struct rte_st_ring *r, void **obj_table,
> > +		unsigned int n, unsigned int *available) {
> > +	return rte_st_ring_dequeue_burst_elem(r, obj_table, sizeof(void *),
> > +			n, available);
> > +}
> > +
> > +/**
> > + * Dequeue one object from a ST ring.
> > + *
> > + * This function copies one object from the tail of the ring and
> > + * moves the tail index.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_p
> > + *   A pointer to a void * pointer (object) that will be filled.
> > + * @return
> > + *   - 0: Success, objects dequeued.
> > + *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
> > + *     dequeued.
> > + */
> > +static __rte_always_inline int
> > +rte_st_ring_dequeue(struct rte_st_ring *r, void **obj_p) {
> > +	return rte_st_ring_dequeue_elem(r, obj_p, sizeof(void *)); }
> > +
> > +/**
> > + * Dequeue a fixed number of objects from a ST ring from the head.
> > + *
> > + * This function copies the objects from the head of the ring and
> > + * moves the head index (backwards).
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects) that will be filled.
> > + * @param n
> > + *   The number of objects to dequeue from the ring to the obj_table.
> > + * @param available
> > + *   If non-NULL, returns the number of remaining ring entries after the
> > + *   dequeue has finished.
> > + * @return
> > + *   The number of objects dequeued, either 0 or n
> > + */
> > +static __rte_always_inline unsigned int
> > +rte_st_ring_dequeue_at_head_bulk(struct rte_st_ring *r, void **obj_table, unsigned int n,
> > +		unsigned int *available)
> > +{
> > +	return rte_st_ring_dequeue_at_head_bulk_elem(r, obj_table, sizeof(void *),
> > +			n, available);
> > +}
> > +
> > +/**
> > + * Dequeue up to a maximum number of objects from a ST ring from the
> > + * head.
> > + *
> > + * This function copies the objects from the head of the ring and
> > + * moves the head index (backwards).
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects) that will be filled.
> > + * @param n
> > + *   The number of objects to dequeue from the ring to the obj_table.
> > + * @param available
> > + *   If non-NULL, returns the number of remaining ring entries after the
> > + *   dequeue has finished.
> > + * @return
> > + *   - Number of objects dequeued
> > + */
> > +static __rte_always_inline unsigned int
> > +rte_st_ring_dequeue_at_head_burst(struct rte_st_ring *r, void **obj_table,
> > +		unsigned int n, unsigned int *available) {
> > +	return rte_st_ring_dequeue_at_head_burst_elem(r, obj_table, sizeof(void *),
> > +			n, available);
> > +}
> > +
> > +/**
> > + * Dequeue one object from a ST ring from the head.
> > + *
> > + * This function copies one object from the head of the ring and
> > + * moves the head index (backwards).
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_p
> > + *   A pointer to a void * pointer (object) that will be filled.
> > + * @return
> > + *   - 0: Success, objects dequeued.
> > + *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
> > + *     dequeued.
> > + */
> > +static __rte_always_inline int
> > +rte_st_ring_at_head_dequeue(struct rte_st_ring *r, void **obj_p) {
> > +	return rte_st_ring_dequeue_at_head_elem(r, obj_p, sizeof(void *)); }
> > +
> > +/**
> > + * Flush a ST ring.
> > + *
> > + * This function flushes all the elements in a ST ring.
> > + *
> > + * @warning
> > + * Make sure the ring is not in use while calling this function.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + */
> > +void
> > +rte_st_ring_reset(struct rte_st_ring *r);
> > +
> > +/**
> > + * Return the number of entries in a ST ring.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @return
> > + *   The number of entries in the ring.
> > + */
> > +static inline unsigned int
> > +rte_st_ring_count(const struct rte_st_ring *r) {
> > +	uint32_t count = (r->head - r->tail) & r->mask;
> > +	return count;
> > +}
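The `(head - tail) & r->mask` expression above relies on unsigned modular arithmetic. A small standalone sketch (hypothetical values, not ring code) shows why it stays correct even after the free-running 32-bit indices wrap around zero, provided the ring size is a power of two:

```c
#include <stdint.h>

static inline uint32_t ring_count(uint32_t head, uint32_t tail, uint32_t mask)
{
	/* Unsigned subtraction is performed modulo 2^32, so even when head
	 * has wrapped past 0 while tail has not, the difference is the true
	 * distance between them; masking then reduces it modulo the
	 * (power-of-two) ring size. */
	return (head - tail) & mask;
}
```

For example, with mask = 7, head = 1, and tail = 0xFFFFFFFE, the subtraction yields 3 modulo 2^32, which is the correct entry count despite the wraparound.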
> > +
> > +/**
> > + * Return the number of free entries in a ST ring.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @return
> > + *   The number of free entries in the ring.
> > + */
> > +static inline unsigned int
> > +rte_st_ring_free_count(const struct rte_st_ring *r) {
> > +	return r->capacity - rte_st_ring_count(r); }
> > +
> > +/**
> > + * Test if a ST ring is full.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @return
> > + *   - 1: The ring is full.
> > + *   - 0: The ring is not full.
> > + */
> > +static inline int
> > +rte_st_ring_full(const struct rte_st_ring *r) {
> > +	return rte_st_ring_free_count(r) == 0; }
> > +
> > +/**
> > + * Test if a ST ring is empty.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @return
> > + *   - 1: The ring is empty.
> > + *   - 0: The ring is not empty.
> > + */
> > +static inline int
> > +rte_st_ring_empty(const struct rte_st_ring *r) {
> > +	return r->tail == r->head;
> > +}
> > +
> > +/**
> > + * Return the size of the ring.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @return
> > + *   The size of the data store used by the ring.
> > + *   NOTE: this is not the same as the usable space in the ring. To query that
> > + *   use ``rte_st_ring_get_capacity()``.
> > + */
> > +static inline unsigned int
> > +rte_st_ring_get_size(const struct rte_st_ring *r) {
> > +	return r->size;
> > +}
> > +
> > +/**
> > + * Return the number of elements which can be stored in the ring.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @return
> > + *   The usable size of the ring.
> > + */
> > +static inline unsigned int
> > +rte_st_ring_get_capacity(const struct rte_st_ring *r) {
> > +	return r->capacity;
> > +}
> > +
> > +/**
> > + * Dump the status of all rings on the console
> > + *
> > + * @param f
> > + *   A pointer to a file for output
> > + */
> > +void rte_st_ring_list_dump(FILE *f);
> > +
> > +/**
> > + * Search a ST ring from its name
> > + *
> > + * @param name
> > + *   The name of the ring.
> > + * @return
> > + *   The pointer to the ring matching the name, or NULL if not found,
> > + *   with rte_errno set appropriately. Possible rte_errno values include:
> > + *    - ENOENT - required entry not available to return.
> > + */
> > +struct rte_st_ring *rte_st_ring_lookup(const char *name);
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_ST_RING_H_ */
> > --
> > 2.25.1
> >


^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: [RFC] lib/st_ring: add single thread ring
  2023-09-04 18:10   ` Honnappa Nagarahalli
@ 2023-09-05  8:19     ` Konstantin Ananyev
  0 siblings, 0 replies; 48+ messages in thread
From: Konstantin Ananyev @ 2023-09-05  8:19 UTC (permalink / raw)
  To: Honnappa Nagarahalli, jackmin, konstantin.v.ananyev
  Cc: dev, Ruifeng Wang, Aditya Ambadipudi,
	Wathsala Wathawana Vithanage, nd, nd



> > > Add a single thread safe and multi-thread unsafe ring data structure.
> > > This library provides a simple and efficient alternative to
> > > multi-thread safe ring when multi-thread safety is not required.
> >
> > Just a thought: do we really need whole new library for that?
> > From what I understand all we need right now just one extra function:
> > rte_ring_mt_unsafe_prod_deque(...)
> > Sorry for ugly name :)
> > To dequeue N elems from prod.tail.
> > Or you think there would be some extra advantages in ST version of the ring:
> > extra usages, better performance, etc.?
> There are multiple implementations of the ST ring in use in other parts of DPDK. Mattias Ronnblom pointed out some existing ones
> (distributed scheduler, eth RX adapter, cmdline) [1] which this library will replace.
> This implementation will not use atomic instructions, the head and tail indices will be in the same cache line, and it will be a
> double-ended queue. So, I am expecting better performance and more use cases (some might not be applicable currently).

Yep, I do understand that we can skip the sync logic for the ST case.
Ok, if we do have multiple use-cases it might be reasonable to have a separate API for it.
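As a hedged illustration of the point being agreed on here (struct layout and names are assumptions for the sketch, not the proposed API): with a single accessing thread, the enqueue path needs no atomics or barriers, and head and tail can share one cache line instead of living on separate lines as in rte_ring.

```c
#include <stdint.h>

#define CACHE_LINE 64

struct st_ring_sketch {
	uint32_t head;      /* plain loads/stores; no C11/rte atomics */
	uint32_t tail;      /* same cache line as head: one line per op */
	uint32_t mask;      /* size - 1, size a power of two */
	uint32_t capacity;  /* usable entries, at most size - 1 */
} __attribute__((aligned(CACHE_LINE)));

/* A single-thread enqueue is just: check space, store, bump head.
 * No memory barriers, no compare-and-swap, and no separate head/tail
 * "update" phase as in the MT-safe rte_ring. */
static inline int st_enqueue_sketch(struct st_ring_sketch *r, void **table,
				    void *obj)
{
	if (((r->head - r->tail) & r->mask) == r->capacity)
		return -1;              /* -ENOBUFS in the real API */
	table[r->head & r->mask] = obj;
	r->head++;                      /* plain increment, not __atomic_store_n */
	return 0;
}
```

This is where the expected performance win over an MT-safe ring comes from: every operation touches a single cache line and issues only ordinary loads and stores.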

> 
> [1] https://mails.dpdk.org/archives/dev/2023-August/275003.html
> 
> >
> > >
> > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > ---
> > > v1:
> > > 1) The code is very preliminary and is not even compiled
> > > 2) This is intended to show the APIs and some thoughts on
> > > implementation
> > > 3) More APIs and the rest of the implementation will come in subsequent
> > >    versions
> > >
> > >  lib/st_ring/rte_st_ring.h | 567
> > > ++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 567 insertions(+)
> > >  create mode 100644 lib/st_ring/rte_st_ring.h
> > >
> > > diff --git a/lib/st_ring/rte_st_ring.h b/lib/st_ring/rte_st_ring.h new
> > > file mode 100644 index 0000000000..8cb8832591
> > > --- /dev/null
> > > +++ b/lib/st_ring/rte_st_ring.h
> > > @@ -0,0 +1,567 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright(c) 2023 Arm Limited
> > > + */
> > > +
> > > +#ifndef _RTE_ST_RING_H_
> > > +#define _RTE_ST_RING_H_
> > > +
> > > +/**
> > > + * @file
> > > + * RTE Single Thread Ring (ST Ring)
> > > + *
> > > + * The ST Ring is a fixed-size queue intended to be accessed
> > > + * by one thread at a time. It does not provide concurrent access to
> > > + * multiple threads. If there are multiple threads accessing the ST ring,
> > > + * then the threads have to use locks to protect the ring from
> > > + * getting corrupted.
> > > + *
> > > + * - FIFO (First In First Out)
> > > + * - Maximum size is fixed; the pointers are stored in a table.
> > > + * - Consumer and producer part of same thread.
> > > + * - Multi-thread producers and consumers need locking.
> > > + * - Single/Bulk/burst dequeue at Tail or Head
> > > + * - Single/Bulk/burst enqueue at Head or Tail
> > > + *
> > > + */
> > > +
> > > +#ifdef __cplusplus
> > > +extern "C" {
> > > +#endif
> > > +
> > > +#include <rte_st_ring_core.h>
> > > +#include <rte_st_ring_elem.h>
> > > +
> > > +/**
> > > + * Calculate the memory size needed for a ST ring
> > > + *
> > > + * This function returns the number of bytes needed for a ST ring, given
> > > + * the number of elements in it. This value is the sum of the size of
> > > + * the structure rte_st_ring and the size of the memory needed by the
> > > + * elements. The value is aligned to a cache line size.
> > > + *
> > > + * @param count
> > > + *   The number of elements in the ring (must be a power of 2).
> > > + * @return
> > > + *   - The memory size needed for the ST ring on success.
> > > + *   - -EINVAL if count is not a power of 2.
> > > + */
> > > +ssize_t rte_st_ring_get_memsize(unsigned int count);
> > > +
> > > +/**
> > > + * Initialize a ST ring structure.
> > > + *
> > > + * Initialize a ST ring structure in memory pointed by "r". The size of the
> > > + * memory area must be large enough to store the ring structure and the
> > > + * object table. It is advised to use rte_st_ring_get_memsize() to get the
> > > + * appropriate size.
> > > + *
> > > + * The ST ring size is set to *count*, which must be a power of two.
> > > + * The real usable ring size is *count-1* instead of *count* to
> > > + * differentiate a full ring from an empty ring.
> > > + *
> > > + * The ring is not added in RTE_TAILQ_ST_RING global list. Indeed, the
> > > + * memory given by the caller may not be shareable among dpdk
> > > + * processes.
> > > + *
> > > + * @param r
> > > + *   The pointer to the ring structure followed by the elements table.
> > > + * @param name
> > > + *   The name of the ring.
> > > + * @param count
> > > + *   The number of elements in the ring (must be a power of 2,
> > > + *   unless RTE_ST_RING_F_EXACT_SZ is set in flags).
> > > + * @param flags
> > > + *   An OR of the following:
> > > + *   - RTE_ST_RING_F_EXACT_SZ: If this flag is set, the ring will hold
> > > + *     exactly the requested number of entries, and the requested size
> > > + *     will be rounded up to the next power of two, but the usable space
> > > + *     will be exactly that requested. Worst case, if a power-of-2 size is
> > > + *     requested, half the ring space will be wasted.
> > > + *     Without this flag set, the ring size requested must be a power of 2,
> > > + *     and the usable space will be that size - 1.
> > > + * @return
> > > + *   0 on success, or a negative value on error.
> > > + */
> > > +int rte_st_ring_init(struct rte_st_ring *r, const char *name,
> > > +	unsigned int count, unsigned int flags);
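The "usable size is count-1" rule documented above can be made concrete with a tiny sketch (illustrative helper functions, not DPDK code). Because the entry count is computed as `(head - tail) & mask`, a ring holding all `count` entries would yield a count of 0, indistinguishable from empty; sacrificing one slot keeps the two states distinct.

```c
#include <stdint.h>

/* "Empty" is head == tail. If count entries were allowed in flight,
 * a full ring would also give (head - tail) & (count - 1) == 0, the
 * same as empty -- hence usable capacity is count - 1. */
static inline int ring_is_empty(uint32_t head, uint32_t tail)
{
	return head == tail;
}

static inline int ring_is_full(uint32_t head, uint32_t tail, uint32_t count)
{
	/* full when count - 1 entries are in use; count is a power of two */
	return ((head - tail) & (count - 1)) == count - 1;
}
```

The RTE_ST_RING_F_EXACT_SZ flag described below works around this by rounding the internal size up to the next power of two so that the requested number of entries remains usable.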
> > > +
> > > +/**
> > > + * Create a new ST ring named *name* in memory.
> > > + *
> > > + * This function uses ``memzone_reserve()`` to allocate memory. Then it
> > > + * calls rte_st_ring_init() to initialize an empty ring.
> > > + *
> > > + * The new ring size is set to *count*, which must be a power of two.
> > > + * The real usable ring size is *count-1* instead of *count* to
> > > + * differentiate a full ring from an empty ring.
> > > + *
> > > + * The ring is added in RTE_TAILQ_ST_RING list.
> > > + *
> > > + * @param name
> > > + *   The name of the ring.
> > > + * @param count
> > > + *   The size of the ring (must be a power of 2,
> > > + *   unless RTE_ST_RING_F_EXACT_SZ is set in flags).
> > > + * @param socket_id
> > > + *   The *socket_id* argument is the socket identifier in case of
> > > + *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
> > > + *   constraint for the reserved zone.
> > > + * @param flags
> > > + *   - RTE_ST_RING_F_EXACT_SZ: If this flag is set, the ring will hold exactly the
> > > + *     requested number of entries, and the requested size will be rounded up
> > > + *     to the next power of two, but the usable space will be exactly that
> > > + *     requested. Worst case, if a power-of-2 size is requested, half the
> > > + *     ring space will be wasted.
> > > + *     Without this flag set, the ring size requested must be a power of 2,
> > > + *     and the usable space will be that size - 1.
> > > + * @return
> > > + *   On success, the pointer to the new allocated ring. NULL on error with
> > > + *    rte_errno set appropriately. Possible errno values include:
> > > + *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
> > > + *    - EINVAL - count provided is not a power of 2
> > > + *    - ENOSPC - the maximum number of memzones has already been allocated
> > > + *    - EEXIST - a memzone with the same name already exists
> > > + *    - ENOMEM - no appropriate memory area found in which to create memzone
> > > + */
> > > +struct rte_st_ring *rte_st_ring_create(const char *name, unsigned int count,
> > > +				 int socket_id, unsigned int flags);
> > > +
> > > +/**
> > > + * De-allocate all memory used by the ring.
> > > + *
> > > + * @param r
> > > + *   Ring to free.
> > > + *   If NULL then, the function does nothing.
> > > + */
> > > +void rte_st_ring_free(struct rte_st_ring *r);
> > > +
> > > +/**
> > > + * Dump the status of the ring to a file.
> > > + *
> > > + * @param f
> > > + *   A pointer to a file for output
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + */
> > > +void rte_st_ring_dump(FILE *f, const struct rte_st_ring *r);
> > > +
> > > +/**
> > > + * Enqueue fixed number of objects on a ST ring.
> > > + *
> > > + * This function copies the objects at the head of the ring and
> > > + * moves the head index.
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @param obj_table
> > > + *   A pointer to a table of void * pointers (objects).
> > > + * @param n
> > > + *   The number of objects to add in the ring from the obj_table.
> > > + * @param free_space
> > > + *   if non-NULL, returns the amount of space in the ring after the
> > > + *   enqueue operation has finished.
> > > + * @return
> > > + *   The number of objects enqueued, either 0 or n
> > > + */
> > > +static __rte_always_inline unsigned int
> > > +rte_st_ring_enqueue_bulk(struct rte_st_ring *r, void * const *obj_table,
> > > +		      unsigned int n, unsigned int *free_space) {
> > > +	return rte_st_ring_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> > > +			n, free_space);
> > > +}
> > > +
> > > +/**
> > > + * Enqueue up to a maximum number of objects on a ST ring.
> > > + *
> > > + * This function copies the objects at the head of the ring and
> > > + * moves the head index.
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @param obj_table
> > > + *   A pointer to a table of void * pointers (objects).
> > > + * @param n
> > > + *   The number of objects to add in the ring from the obj_table.
> > > + * @param free_space
> > > + *   if non-NULL, returns the amount of space in the ring after the
> > > + *   enqueue operation has finished.
> > > + * @return
> > > + *   - n: Actual number of objects enqueued.
> > > + */
> > > +static __rte_always_inline unsigned int
> > > +rte_st_ring_enqueue_burst(struct rte_st_ring *r, void * const *obj_table,
> > > +		      unsigned int n, unsigned int *free_space) {
> > > +	return rte_st_ring_enqueue_burst_elem(r, obj_table, sizeof(void *),
> > > +			n, free_space);
> > > +}
> > > +
> > > +/**
> > > + * Enqueue one object on a ST ring.
> > > + *
> > > + * This function copies one object at the head of the ring and
> > > + * moves the head index.
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @param obj
> > > + *   A pointer to the object to be added.
> > > + * @return
> > > + *   - 0: Success; objects enqueued.
> > > + *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
> > > + */
> > > +static __rte_always_inline int
> > > +rte_st_ring_enqueue(struct rte_st_ring *r, void *obj) {
> > > +	return rte_st_ring_enqueue_elem(r, &obj, sizeof(void *)); }
> > > +
> > > +/**
> > > + * Enqueue fixed number of objects on a ST ring at the tail.
> > > + *
> > > + * This function copies the objects at the tail of the ring and
> > > + * moves the tail index (backwards).
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @param obj_table
> > > + *   A pointer to a table of void * pointers (objects).
> > > + * @param n
> > > + *   The number of objects to add in the ring from the obj_table.
> > > + * @param free_space
> > > + *   if non-NULL, returns the amount of space in the ring after the
> > > + *   enqueue operation has finished.
> > > + * @return
> > > + *   The number of objects enqueued, either 0 or n
> > > + */
> > > +static __rte_always_inline unsigned int
> > > +rte_st_ring_enqueue_at_tail_bulk(struct rte_st_ring *r,
> > > +				 void * const *obj_table, unsigned int n,
> > > +				 unsigned int *free_space)
> > > +{
> > > +	return rte_st_ring_enqueue_at_tail_bulk_elem(r, obj_table,
> > > +			sizeof(void *), n, free_space);
> > > +}
> > > +
> > > +/**
> > > + * Enqueue up to a maximum number of objects on a ST ring at the tail.
> > > + *
> > > + * This function copies the objects at the tail of the ring and
> > > + * moves the tail index (backwards).
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @param obj_table
> > > + *   A pointer to a table of void * pointers (objects).
> > > + * @param n
> > > + *   The number of objects to add in the ring from the obj_table.
> > > + * @param free_space
> > > + *   if non-NULL, returns the amount of space in the ring after the
> > > + *   enqueue operation has finished.
> > > + * @return
> > > + *   - n: Actual number of objects enqueued.
> > > + */
> > > +static __rte_always_inline unsigned int
> > > +rte_st_ring_enqueue_at_tail_burst(struct rte_st_ring *r,
> > > +				  void * const *obj_table, unsigned int n,
> > > +				  unsigned int *free_space)
> > > +{
> > > +	return rte_st_ring_enqueue_at_tail_burst_elem(r, obj_table,
> > > +			sizeof(void *), n, free_space);
> > > +}
> > > +
> > > +/**
> > > + * Enqueue one object on a ST ring at tail.
> > > + *
> > > + * This function copies one object at the tail of the ring and
> > > + * moves the tail index (backwards).
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @param obj
> > > + *   A pointer to the object to be added.
> > > + * @return
> > > + *   - 0: Success; objects enqueued.
> > > + *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
> > > + */
> > > +static __rte_always_inline int
> > > +rte_st_ring_enqueue_at_tail(struct rte_st_ring *r, void *obj) {
> > > +	return rte_st_ring_enqueue_at_tail_elem(r, &obj, sizeof(void *)); }
> > > +
> > > +/**
> > > + * Dequeue a fixed number of objects from a ST ring.
> > > + *
> > > + * This function copies the objects from the tail of the ring and
> > > + * moves the tail index.
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @param obj_table
> > > + *   A pointer to a table of void * pointers (objects) that will be filled.
> > > + * @param n
> > > + *   The number of objects to dequeue from the ring to the obj_table.
> > > + * @param available
> > > + *   If non-NULL, returns the number of remaining ring entries after the
> > > + *   dequeue has finished.
> > > + * @return
> > > + *   The number of objects dequeued, either 0 or n
> > > + */
> > > +static __rte_always_inline unsigned int
> > > +rte_st_ring_dequeue_bulk(struct rte_st_ring *r, void **obj_table, unsigned int n,
> > > +		unsigned int *available)
> > > +{
> > > +	return rte_st_ring_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> > > +			n, available);
> > > +}
> > > +
> > > +/**
> > > + * Dequeue up to a maximum number of objects from a ST ring.
> > > + *
> > > + * This function copies the objects from the tail of the ring and
> > > + * moves the tail index.
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @param obj_table
> > > + *   A pointer to a table of void * pointers (objects) that will be filled.
> > > + * @param n
> > > + *   The number of objects to dequeue from the ring to the obj_table.
> > > + * @param available
> > > + *   If non-NULL, returns the number of remaining ring entries after the
> > > + *   dequeue has finished.
> > > + * @return
> > > + *   - Number of objects dequeued
> > > + */
> > > +static __rte_always_inline unsigned int
> > > +rte_st_ring_dequeue_burst(struct rte_st_ring *r, void **obj_table,
> > > +		unsigned int n, unsigned int *available) {
> > > +	return rte_st_ring_dequeue_burst_elem(r, obj_table, sizeof(void *),
> > > +			n, available);
> > > +}
> > > +
> > > +/**
> > > + * Dequeue one object from a ST ring.
> > > + *
> > > + * This function copies one object from the tail of the ring and
> > > + * moves the tail index.
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @param obj_p
> > > + *   A pointer to a void * pointer (object) that will be filled.
> > > + * @return
> > > + *   - 0: Success, objects dequeued.
> > > + *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
> > > + *     dequeued.
> > > + */
> > > +static __rte_always_inline int
> > > +rte_st_ring_dequeue(struct rte_st_ring *r, void **obj_p) {
> > > +	return rte_st_ring_dequeue_elem(r, obj_p, sizeof(void *)); }
> > > +
> > > +/**
> > > + * Dequeue a fixed number of objects from a ST ring from the head.
> > > + *
> > > + * This function copies the objects from the head of the ring and
> > > + * moves the head index (backwards).
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @param obj_table
> > > + *   A pointer to a table of void * pointers (objects) that will be filled.
> > > + * @param n
> > > + *   The number of objects to dequeue from the ring to the obj_table.
> > > + * @param available
> > > + *   If non-NULL, returns the number of remaining ring entries after the
> > > + *   dequeue has finished.
> > > + * @return
> > > + *   The number of objects dequeued, either 0 or n
> > > + */
> > > +static __rte_always_inline unsigned int
> > > +rte_st_ring_dequeue_at_head_bulk(struct rte_st_ring *r, void **obj_table, unsigned int n,
> > > +		unsigned int *available)
> > > +{
> > > +	return rte_st_ring_dequeue_at_head_bulk_elem(r, obj_table, sizeof(void *),
> > > +			n, available);
> > > +}
> > > +
> > > +/**
> > > + * Dequeue up to a maximum number of objects from a ST ring from the
> > > + * head.
> > > + *
> > > + * This function copies the objects from the head of the ring and
> > > + * moves the head index (backwards).
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @param obj_table
> > > + *   A pointer to a table of void * pointers (objects) that will be filled.
> > > + * @param n
> > > + *   The number of objects to dequeue from the ring to the obj_table.
> > > + * @param available
> > > + *   If non-NULL, returns the number of remaining ring entries after the
> > > + *   dequeue has finished.
> > > + * @return
> > > + *   - Number of objects dequeued
> > > + */
> > > +static __rte_always_inline unsigned int
> > > +rte_st_ring_dequeue_at_head_burst(struct rte_st_ring *r, void **obj_table,
> > > +		unsigned int n, unsigned int *available) {
> > > +	return rte_st_ring_dequeue_at_head_burst_elem(r, obj_table, sizeof(void *),
> > > +			n, available);
> > > +}
> > > +
> > > +/**
> > > + * Dequeue one object from a ST ring from the head.
> > > + *
> > > + * This function copies one object from the head of the ring and
> > > + * moves the head index (backwards).
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @param obj_p
> > > + *   A pointer to a void * pointer (object) that will be filled.
> > > + * @return
> > > + *   - 0: Success, objects dequeued.
> > > + *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
> > > + *     dequeued.
> > > + */
> > > +static __rte_always_inline int
> > > +rte_st_ring_at_head_dequeue(struct rte_st_ring *r, void **obj_p) {
> > > +	return rte_st_ring_dequeue_at_head_elem(r, obj_p, sizeof(void *)); }
> > > +
> > > +/**
> > > + * Flush a ST ring.
> > > + *
> > > + * This function flushes all the elements in a ST ring.
> > > + *
> > > + * @warning
> > > + * Make sure the ring is not in use while calling this function.
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + */
> > > +void
> > > +rte_st_ring_reset(struct rte_st_ring *r);
> > > +
> > > +/**
> > > + * Return the number of entries in a ST ring.
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @return
> > > + *   The number of entries in the ring.
> > > + */
> > > +static inline unsigned int
> > > +rte_st_ring_count(const struct rte_st_ring *r) {
> > > +	uint32_t count = (r->head - r->tail) & r->mask;
> > > +	return count;
> > > +}
> > > +
> > > +/**
> > > + * Return the number of free entries in a ST ring.
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @return
> > > + *   The number of free entries in the ring.
> > > + */
> > > +static inline unsigned int
> > > +rte_st_ring_free_count(const struct rte_st_ring *r) {
> > > +	return r->capacity - rte_st_ring_count(r); }
> > > +
> > > +/**
> > > + * Test if a ST ring is full.
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @return
> > > + *   - 1: The ring is full.
> > > + *   - 0: The ring is not full.
> > > + */
> > > +static inline int
> > > +rte_st_ring_full(const struct rte_st_ring *r) {
> > > +	return rte_st_ring_free_count(r) == 0; }
> > > +
> > > +/**
> > > + * Test if a ST ring is empty.
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @return
> > > + *   - 1: The ring is empty.
> > > + *   - 0: The ring is not empty.
> > > + */
> > > +static inline int
> > > +rte_st_ring_empty(const struct rte_st_ring *r) {
> > > +	return r->tail == r->head;
> > > +}
> > > +
> > > +/**
> > > + * Return the size of the ring.
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @return
> > > + *   The size of the data store used by the ring.
> > > + *   NOTE: this is not the same as the usable space in the ring. To query that
> > > + *   use ``rte_st_ring_get_capacity()``.
> > > + */
> > > +static inline unsigned int
> > > +rte_st_ring_get_size(const struct rte_st_ring *r)
> > > +{
> > > +	return r->size;
> > > +}
> > > +
> > > +/**
> > > + * Return the number of elements which can be stored in the ring.
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @return
> > > + *   The usable size of the ring.
> > > + */
> > > +static inline unsigned int
> > > +rte_st_ring_get_capacity(const struct rte_st_ring *r)
> > > +{
> > > +	return r->capacity;
> > > +}
> > > +
> > > +/**
> > > + * Dump the status of all rings to a file.
> > > + *
> > > + * @param f
> > > + *   A pointer to a file for output
> > > + */
> > > +void rte_st_ring_list_dump(FILE *f);
> > > +
> > > +/**
> > > + * Search for a ST ring by its name.
> > > + *
> > > + * @param name
> > > + *   The name of the ring.
> > > + * @return
> > > + *   The pointer to the ring matching the name, or NULL if not found,
> > > + *   with rte_errno set appropriately. Possible rte_errno values include:
> > > + *    - ENOENT - no ring with the given name was found.
> > > + */
> > > +struct rte_st_ring *rte_st_ring_lookup(const char *name);
> > > +
> > > +#ifdef __cplusplus
> > > +}
> > > +#endif
> > > +
> > > +#endif /* _RTE_ST_RING_H_ */
> > > --
> > > 2.25.1
> > >


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v1 0/2] deque: add multithread unsafe deque library
  2023-08-21  6:04 [RFC] lib/st_ring: add single thread ring Honnappa Nagarahalli
                   ` (2 preceding siblings ...)
  2023-09-04 10:13 ` Konstantin Ananyev
@ 2024-04-01  1:37 ` Aditya Ambadipudi
  2024-04-01  1:37   ` [PATCH v1 1/2] deque: add multi-thread unsafe double ended queue Aditya Ambadipudi
                     ` (2 more replies)
  3 siblings, 3 replies; 48+ messages in thread
From: Aditya Ambadipudi @ 2024-04-01  1:37 UTC (permalink / raw)
  To: dev, jackmin, stephen, matan, viacheslavo, roretzla,
	konstantin.v.ananyev, konstantin.ananyev, mb, hofors
  Cc: Honnappa.Nagarahalli, Dhruv.Tripathi, wathsala.vithanage,
	aditya.ambadipudi, ganeshaditya1, nd

As previously discussed on the mailing list [1], we are sending out this
patch, which provides the implementation and unit test cases for the
RTE_DEQUE library. This includes functions for creating an RTE_DEQUE
object and allocating memory for it, deleting that object and freeing the
memory associated with it, enqueue/dequeue functions, and functions for
the zero-copy API.

[1] https://mails.dpdk.org/archives/dev/2023-August/275003.html

Aditya Ambadipudi (1):
  deque: add unit tests for the deque library

Honnappa Nagarahalli (1):
  deque: add multi-thread unsafe double ended queue

 .mailmap                               |    1 +
 app/test/meson.build                   |    2 +
 app/test/test_deque_enqueue_dequeue.c  | 1231 ++++++++++++++++++++++++
 app/test/test_deque_helper_functions.c |  170 ++++
 lib/deque/meson.build                  |   11 +
 lib/deque/rte_deque.c                  |  194 ++++
 lib/deque/rte_deque.h                  |  533 ++++++++++
 lib/deque/rte_deque_core.h             |   82 ++
 lib/deque/rte_deque_pvt.h              |  538 +++++++++++
 lib/deque/rte_deque_zc.h               |  430 +++++++++
 lib/deque/version.map                  |   14 +
 lib/meson.build                        |    1 +
 12 files changed, 3207 insertions(+)
 create mode 100644 app/test/test_deque_enqueue_dequeue.c
 create mode 100644 app/test/test_deque_helper_functions.c
 create mode 100644 lib/deque/meson.build
 create mode 100644 lib/deque/rte_deque.c
 create mode 100644 lib/deque/rte_deque.h
 create mode 100644 lib/deque/rte_deque_core.h
 create mode 100644 lib/deque/rte_deque_pvt.h
 create mode 100644 lib/deque/rte_deque_zc.h
 create mode 100644 lib/deque/version.map

-- 
2.25.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v1 1/2] deque: add multi-thread unsafe double ended queue
  2024-04-01  1:37 ` [PATCH v1 0/2] deque: add multithread unsafe deque library Aditya Ambadipudi
@ 2024-04-01  1:37   ` Aditya Ambadipudi
  2024-04-06  9:35     ` Morten Brørup
  2024-04-24 13:42     ` [PATCH v2 0/2] deque: add multithread unsafe deque library Aditya Ambadipudi
  2024-04-01  1:37   ` [PATCH v1 " Aditya Ambadipudi
  2024-04-01 14:05   ` [PATCH v1 0/2] deque: add multithread unsafe " Stephen Hemminger
  2 siblings, 2 replies; 48+ messages in thread
From: Aditya Ambadipudi @ 2024-04-01  1:37 UTC (permalink / raw)
  To: dev, jackmin, stephen, matan, viacheslavo, roretzla,
	konstantin.v.ananyev, konstantin.ananyev, mb, hofors
  Cc: Honnappa.Nagarahalli, Dhruv.Tripathi, wathsala.vithanage,
	aditya.ambadipudi, ganeshaditya1, nd, Honnappa Nagarahalli

From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

Add a multi-thread unsafe double ended queue data structure. This
library provides a simple and efficient alternative to multi-thread
safe ring when multi-thread safety is not required.

Signed-off-by: Aditya Ambadipudi <aditya.ambadipudi@arm.com>
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 .mailmap                   |   1 +
 lib/deque/meson.build      |  11 +
 lib/deque/rte_deque.c      | 194 +++++++++++++
 lib/deque/rte_deque.h      | 533 ++++++++++++++++++++++++++++++++++++
 lib/deque/rte_deque_core.h |  82 ++++++
 lib/deque/rte_deque_pvt.h  | 538 +++++++++++++++++++++++++++++++++++++
 lib/deque/rte_deque_zc.h   | 430 +++++++++++++++++++++++++++++
 lib/deque/version.map      |  14 +
 lib/meson.build            |   1 +
 9 files changed, 1804 insertions(+)
 create mode 100644 lib/deque/meson.build
 create mode 100644 lib/deque/rte_deque.c
 create mode 100644 lib/deque/rte_deque.h
 create mode 100644 lib/deque/rte_deque_core.h
 create mode 100644 lib/deque/rte_deque_pvt.h
 create mode 100644 lib/deque/rte_deque_zc.h
 create mode 100644 lib/deque/version.map

diff --git a/.mailmap b/.mailmap
index 3843868716..8e705ab6ab 100644
--- a/.mailmap
+++ b/.mailmap
@@ -17,6 +17,7 @@ Adam Bynes <adambynes@outlook.com>
 Adam Dybkowski <adamx.dybkowski@intel.com>
 Adam Ludkiewicz <adam.ludkiewicz@intel.com>
 Adham Masarwah <adham@nvidia.com> <adham@mellanox.com>
+Aditya Ambadipudi <aditya.ambadipudi@arm.com>
 Adrian Moreno <amorenoz@redhat.com>
 Adrian Podlawski <adrian.podlawski@intel.com>
 Adrien Mazarguil <adrien.mazarguil@6wind.com>
diff --git a/lib/deque/meson.build b/lib/deque/meson.build
new file mode 100644
index 0000000000..1ff45fc39f
--- /dev/null
+++ b/lib/deque/meson.build
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 Arm Limited
+
+sources = files('rte_deque.c')
+headers = files('rte_deque.h')
+# most sub-headers are not for direct inclusion
+indirect_headers += files (
+        'rte_deque_core.h',
+        'rte_deque_pvt.h',
+        'rte_deque_zc.h'
+)
diff --git a/lib/deque/rte_deque.c b/lib/deque/rte_deque.c
new file mode 100644
index 0000000000..3b08b91a98
--- /dev/null
+++ b/lib/deque/rte_deque.c
@@ -0,0 +1,194 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#include <stdio.h>
+#include <stdalign.h>
+#include <string.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <sys/queue.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_eal_memconfig.h>
+#include <rte_errno.h>
+#include <rte_string_fns.h>
+
+#include "rte_deque.h"
+
+/* mask of all valid flag values to deque_create() */
+#define __RTE_DEQUE_F_MASK (RTE_DEQUE_F_EXACT_SZ)
+ssize_t
+rte_deque_get_memsize_elem(unsigned int esize, unsigned int count)
+{
+	ssize_t sz;
+
+	/* Check if element size is a multiple of 4B */
+	if (esize % 4 != 0) {
+		rte_log(RTE_LOG_ERR, rte_deque_log_type,
+			"%s(): element size is not a multiple of 4\n",
+			__func__);
+
+		return -EINVAL;
+	}
+
+	/* count must be a power of 2 */
+	if ((!RTE_IS_POWER_OF_2(count)) || (count > RTE_DEQUE_SZ_MASK)) {
+		rte_log(RTE_LOG_ERR, rte_deque_log_type,
+			"%s(): Requested number of elements is invalid, "
+			"must be a power of 2, and not exceed %u\n",
+			__func__, RTE_DEQUE_SZ_MASK);
+
+		return -EINVAL;
+	}
+
+	sz = sizeof(struct rte_deque) + (ssize_t)count * esize;
+	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
+	return sz;
+}
+
+void
+rte_deque_reset(struct rte_deque *d)
+{
+	d->head = 0;
+	d->tail = 0;
+}
+
+int
+rte_deque_init(struct rte_deque *d, const char *name, unsigned int count,
+	unsigned int flags)
+{
+	int ret;
+
+	/* compilation-time checks */
+	RTE_BUILD_BUG_ON((sizeof(struct rte_deque) &
+			  RTE_CACHE_LINE_MASK) != 0);
+
+	/* future proof flags, only allow supported values */
+	if (flags & ~__RTE_DEQUE_F_MASK) {
+		rte_log(RTE_LOG_ERR, rte_deque_log_type,
+			"%s(): Unsupported flags requested %#x\n",
+			__func__, flags);
+		return -EINVAL;
+	}
+
+	/* init the deque structure */
+	memset(d, 0, sizeof(*d));
+	ret = strlcpy(d->name, name, sizeof(d->name));
+	if (ret < 0 || ret >= (int)sizeof(d->name))
+		return -ENAMETOOLONG;
+	d->flags = flags;
+
+	if (flags & RTE_DEQUE_F_EXACT_SZ) {
+		d->size = rte_align32pow2(count + 1);
+		d->mask = d->size - 1;
+		d->capacity = count;
+	} else {
+		if ((!RTE_IS_POWER_OF_2(count)) || (count > RTE_DEQUE_SZ_MASK)) {
+			rte_log(RTE_LOG_ERR, rte_deque_log_type,
+				"%s(): Requested size is invalid, must be power"
+				" of 2, and not exceed the size limit %u\n",
+				__func__, RTE_DEQUE_SZ_MASK);
+			return -EINVAL;
+		}
+		d->size = count;
+		d->mask = count - 1;
+		d->capacity = d->mask;
+	}
+
+	return 0;
+}
+
+/* create the deque for a given element size */
+struct rte_deque *
+rte_deque_create(const char *name, unsigned int esize, unsigned int count,
+		int socket_id, unsigned int flags)
+{
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	struct rte_deque *d;
+	const struct rte_memzone *mz;
+	ssize_t deque_size;
+	int mz_flags = 0;
+	const unsigned int requested_count = count;
+	int ret;
+
+	/* for an exact size deque, round up from count to a power of two */
+	if (flags & RTE_DEQUE_F_EXACT_SZ)
+		count = rte_align32pow2(count + 1);
+
+	deque_size = rte_deque_get_memsize_elem(esize, count);
+	if (deque_size < 0) {
+		rte_errno = -deque_size;
+		return NULL;
+	}
+
+	ret = snprintf(mz_name, sizeof(mz_name), "%s%s",
+		RTE_DEQUE_MZ_PREFIX, name);
+	if (ret < 0 || ret >= (int)sizeof(mz_name)) {
+		rte_errno = ENAMETOOLONG;
+		return NULL;
+	}
+
+	/* reserve a memory zone for this deque. If we can't get rte_config or
+	 * we are secondary process, the memzone_reserve function will set
+	 * rte_errno for us appropriately - hence no check in this function
+	 */
+	mz = rte_memzone_reserve_aligned(mz_name, deque_size, socket_id,
+					 mz_flags, alignof(struct rte_deque));
+	if (mz != NULL) {
+		d = mz->addr;
+		/* no need to check return value here, we already checked the
+		 * arguments above
+		 */
+		rte_deque_init(d, name, requested_count, flags);
+		d->memzone = mz;
+	} else {
+		d = NULL;
+		rte_log(RTE_LOG_ERR, rte_deque_log_type,
+			"%s(): Cannot reserve memory\n", __func__);
+	}
+	return d;
+}
+
+/* free the deque */
+void
+rte_deque_free(struct rte_deque *d)
+{
+	if (d == NULL)
+		return;
+
+	/*
+	 * Deque was not created with rte_deque_create,
+	 * therefore, there is no memzone to free.
+	 */
+	if (d->memzone == NULL) {
+		rte_log(RTE_LOG_ERR, rte_deque_log_type,
+			"%s(): Cannot free deque, not created "
+			"with rte_deque_create()\n", __func__);
+		return;
+	}
+
+	if (rte_memzone_free(d->memzone) != 0)
+		rte_log(RTE_LOG_ERR, rte_deque_log_type,
+			"%s(): Cannot free memory\n", __func__);
+}
+
+/* dump the status of the deque on the console */
+void
+rte_deque_dump(FILE *f, const struct rte_deque *d)
+{
+	fprintf(f, "deque <%s>@%p\n", d->name, d);
+	fprintf(f, "  flags=%x\n", d->flags);
+	fprintf(f, "  size=%"PRIu32"\n", d->size);
+	fprintf(f, "  capacity=%"PRIu32"\n", d->capacity);
+	fprintf(f, "  head=%"PRIu32"\n", d->head);
+	fprintf(f, "  tail=%"PRIu32"\n", d->tail);
+	fprintf(f, "  used=%u\n", rte_deque_count(d));
+	fprintf(f, "  avail=%u\n", rte_deque_free_count(d));
+}
+
+RTE_LOG_REGISTER_DEFAULT(rte_deque_log_type, ERR);
diff --git a/lib/deque/rte_deque.h b/lib/deque/rte_deque.h
new file mode 100644
index 0000000000..1ac74ca539
--- /dev/null
+++ b/lib/deque/rte_deque.h
@@ -0,0 +1,533 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#ifndef _RTE_DEQUE_H_
+#define _RTE_DEQUE_H_
+
+/**
+ * @file
+ * RTE double ended queue (Deque)
+ *
+ * This fixed-size queue does not provide concurrent access by
+ * multiple threads. If required, the application should use locks
+ * to protect the deque from concurrent access.
+ *
+ * - Double ended queue
+ * - Maximum size is fixed
+ * - Store objects of any size (the size must be a multiple of 4 bytes)
+ * - Single/bulk/burst dequeue at tail or head
+ * - Single/bulk/burst enqueue at head or tail
+ *
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_deque_core.h>
+#include <rte_deque_pvt.h>
+#include <rte_deque_zc.h>
+
+/**
+ * Calculate the memory size needed for a deque
+ *
+ * This function returns the number of bytes needed for a deque, given
+ * the number of objects and the object size. This value is the sum of
+ * the size of the structure rte_deque and the size of the memory needed
+ * by the objects. The value is aligned to a cache line size.
+ *
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ * @param count
+ *   The number of objects in the deque (must be a power of 2).
+ * @return
+ *   - The memory size needed for the deque on success.
+ *   - -EINVAL if count is not a power of 2.
+ */
+__rte_experimental
+ssize_t rte_deque_get_memsize_elem(unsigned int esize, unsigned int count);
+
+/**
+ * Initialize a deque structure.
+ *
+ * Initialize a deque structure in memory pointed by "d". The size of the
+ * memory area must be large enough to store the deque structure and the
+ * object table. It is advised to use rte_deque_get_memsize() to get the
+ * appropriate size.
+ *
+ * The deque size is set to *count*, which must be a power of two.
+ * The real usable deque size is *count-1* instead of *count* to
+ * differentiate a full deque from an empty deque.
+ *
+ * @param d
+ *   The pointer to the deque structure followed by the objects table.
+ * @param name
+ *   The name of the deque.
+ * @param count
+ *   The number of objects in the deque (must be a power of 2,
+ *   unless RTE_DEQUE_F_EXACT_SZ is set in flags).
+ * @param flags
+ *   - RTE_DEQUE_F_EXACT_SZ: If this flag is set, the deque will hold
+ *     exactly the requested number of objects, and the requested size
+ *     will be rounded up to the next power of two, but the usable space
+ *     will be exactly that requested. Worst case, if a power-of-2 size is
+ *     requested, half the deque space will be wasted.
+ *     Without this flag set, the deque size requested must be a power of 2,
+ *     and the usable space will be that size - 1.
+ * @return
+ *   0 on success, or a negative value on error.
+ */
+__rte_experimental
+int rte_deque_init(struct rte_deque *d, const char *name, unsigned int count,
+		unsigned int flags);
+
+/**
+ * Create a new deque named *name* in memory.
+ *
+ * This function uses ``memzone_reserve()`` to allocate memory. Then it
+ * calls rte_deque_init() to initialize an empty deque.
+ *
+ * The new deque size is set to *count*, which must be a power of two.
+ * The real usable deque size is *count-1* instead of *count* to
+ * differentiate a full deque from an empty deque.
+ *
+ * @param name
+ *   The name of the deque.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ * @param count
+ *   The size of the deque (must be a power of 2,
+ *   unless RTE_DEQUE_F_EXACT_SZ is set in flags).
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in case of
+ *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
+ *   constraint for the reserved zone.
+ * @param flags
+ *   - RTE_DEQUE_F_EXACT_SZ: If this flag is set, the deque will hold exactly the
+ *     requested number of entries, and the requested size will be rounded up
+ *     to the next power of two, but the usable space will be exactly that
+ *     requested. Worst case, if a power-of-2 size is requested, half the
+ *     deque space will be wasted.
+ *     Without this flag set, the deque size requested must be a power of 2,
+ *     and the usable space will be that size - 1.
+ * @return
+ *   On success, the pointer to the new allocated deque. NULL on error with
+ *    rte_errno set appropriately. Possible errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - EINVAL - count provided is not a power of 2
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+__rte_experimental
+struct rte_deque *rte_deque_create(const char *name, unsigned int esize,
+				unsigned int count, int socket_id,
+				unsigned int flags);
+
+/**
+ * De-allocate all memory used by the deque.
+ *
+ * @param d
+ *   Deque to free.
+ *   If NULL then, the function does nothing.
+ */
+__rte_experimental
+void rte_deque_free(struct rte_deque *d);
+
+/**
+ * Dump the status of the deque to a file.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @param d
+ *   A pointer to the deque structure.
+ */
+__rte_experimental
+void rte_deque_dump(FILE *f, const struct rte_deque *d);
+
+/**
+ * Return the number of entries in a deque.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @return
+ *   The number of entries in the deque.
+ */
+static inline unsigned int
+rte_deque_count(const struct rte_deque *d)
+{
+	return (d->head - d->tail) & d->mask;
+}
+
+/**
+ * Return the number of free entries in a deque.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @return
+ *   The number of free entries in the deque.
+ */
+static inline unsigned int
+rte_deque_free_count(const struct rte_deque *d)
+{
+	return d->capacity - rte_deque_count(d);
+}
+
+/**
+ * Enqueue fixed number of objects on a deque.
+ *
+ * This function copies the objects at the head of the deque and
+ * moves the head index.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the deque from the obj_table.
+ * @param free_space
+ *   Returns the amount of space in the deque after the enqueue operation
+ *   has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_enqueue_bulk_elem(struct rte_deque *d,
+			const void *obj_table,
+			unsigned int esize,
+			unsigned int n,
+			unsigned int *free_space)
+{
+	*free_space = rte_deque_free_count(d);
+	if (unlikely(n > *free_space))
+		return 0;
+	*free_space -= n;
+	return __rte_deque_enqueue_at_head(d, obj_table, esize, n);
+}
+
+/**
+ * Enqueue up to a maximum number of objects on a deque.
+ *
+ * This function copies the objects at the head of the deque and
+ * moves the head index.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the deque from the obj_table.
+ * @param free_space
+ *   Returns the amount of space in the deque after the enqueue operation
+ *   has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_enqueue_burst_elem(struct rte_deque *d, const void *obj_table,
+			unsigned int esize, unsigned int n,
+			unsigned int *free_space)
+{
+	unsigned int avail_space = rte_deque_free_count(d);
+	unsigned int to_be_enqueued = (n <= avail_space ? n : avail_space);
+	*free_space = avail_space - to_be_enqueued;
+	return __rte_deque_enqueue_at_head(d, obj_table, esize, to_be_enqueued);
+}
+
+/**
+ * Enqueue fixed number of objects on a deque at the tail.
+ *
+ * This function copies the objects at the tail of the deque and
+ * moves the tail index (backwards).
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the deque from the obj_table.
+ * @param free_space
+ *   Returns the amount of space in the deque after the enqueue operation
+ *   has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_enqueue_at_tail_bulk_elem(struct rte_deque *d,
+				 const void *obj_table, unsigned int esize,
+				 unsigned int n, unsigned int *free_space)
+{
+	*free_space = rte_deque_free_count(d);
+	if (unlikely(n > *free_space))
+		return 0;
+	*free_space -= n;
+	return __rte_deque_enqueue_at_tail(d, obj_table, esize, n);
+}
+
+/**
+ * Enqueue up to a maximum number of objects on a deque at the tail.
+ *
+ * This function copies the objects at the tail of the deque and
+ * moves the tail index (backwards).
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the deque from the obj_table.
+ * @param free_space
+ *   Returns the amount of space in the deque after the enqueue operation
+ *   has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_enqueue_at_tail_burst_elem(struct rte_deque *d,
+				const void *obj_table, unsigned int esize,
+				unsigned int n, unsigned int *free_space)
+{
+	unsigned int avail_space = rte_deque_free_count(d);
+	unsigned int to_be_enqueued = (n <= avail_space ? n : avail_space);
+	*free_space = avail_space - to_be_enqueued;
+	return __rte_deque_enqueue_at_tail(d, obj_table, esize, to_be_enqueued);
+}
+
+/**
+ * Dequeue a fixed number of objects from a deque.
+ *
+ * This function copies the objects from the tail of the deque and
+ * moves the tail index.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the deque to the obj_table.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue
+ *   has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_dequeue_bulk_elem(struct rte_deque *d, void *obj_table,
+			unsigned int esize, unsigned int n,
+			unsigned int *available)
+{
+	*available = rte_deque_count(d);
+	if (unlikely(n > *available))
+		return 0;
+	*available -= n;
+	return __rte_deque_dequeue_at_tail(d, obj_table, esize, n);
+}
+
+/**
+ * Dequeue up to a maximum number of objects from a deque.
+ *
+ * This function copies the objects from the tail of the deque and
+ * moves the tail index.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the deque to the obj_table.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue
+ *   has finished.
+ * @return
+ *   - Number of objects dequeued
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_dequeue_burst_elem(struct rte_deque *d, void *obj_table,
+			unsigned int esize, unsigned int n,
+			unsigned int *available)
+{
+	unsigned int count = rte_deque_count(d);
+	unsigned int to_be_dequeued = (n <= count ? n : count);
+	*available = count - to_be_dequeued;
+	return __rte_deque_dequeue_at_tail(d, obj_table, esize, to_be_dequeued);
+}
+
+/**
+ * Dequeue a fixed number of objects from a deque from the head.
+ *
+ * This function copies the objects from the head of the deque and
+ * moves the head index (backwards).
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the deque to the obj_table.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue
+ *   has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_dequeue_at_head_bulk_elem(struct rte_deque *d, void *obj_table,
+			unsigned int esize, unsigned int n,
+			unsigned int *available)
+{
+	*available = rte_deque_count(d);
+	if (unlikely(n > *available))
+		return 0;
+	*available -= n;
+	return __rte_deque_dequeue_at_head(d, obj_table, esize, n);
+}
+
+/**
+ * Dequeue up to a maximum number of objects from a deque from the head.
+ *
+ * This function copies the objects from the head of the deque and
+ * moves the head index (backwards).
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the deque to the obj_table.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue
+ *   has finished.
+ * @return
+ *   - Number of objects dequeued
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_dequeue_at_head_burst_elem(struct rte_deque *d, void *obj_table,
+			unsigned int esize, unsigned int n,
+			unsigned int *available)
+{
+	unsigned int count = rte_deque_count(d);
+	unsigned int to_be_dequeued = (n <= count ? n : count);
+	*available = count - to_be_dequeued;
+	return __rte_deque_dequeue_at_head(d, obj_table, esize, to_be_dequeued);
+}
+
+/**
+ * Flush a deque.
+ *
+ * This function flushes all the objects in a deque.
+ *
+ * @warning
+ * Make sure the deque is not in use while calling this function.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ */
+__rte_experimental
+void rte_deque_reset(struct rte_deque *d);
+
+/**
+ * Test if a deque is full.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @return
+ *   - 1: The deque is full.
+ *   - 0: The deque is not full.
+ */
+static inline int
+rte_deque_full(const struct rte_deque *d)
+{
+	return rte_deque_free_count(d) == 0;
+}
+
+/**
+ * Test if a deque is empty.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @return
+ *   - 1: The deque is empty.
+ *   - 0: The deque is not empty.
+ */
+static inline int
+rte_deque_empty(const struct rte_deque *d)
+{
+	return d->tail == d->head;
+}
+
+/**
+ * Return the size of the deque.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @return
+ *   The size of the data store used by the deque.
+ *   NOTE: this is not the same as the usable space in the deque. To query that
+ *   use ``rte_deque_get_capacity()``.
+ */
+static inline unsigned int
+rte_deque_get_size(const struct rte_deque *d)
+{
+	return d->size;
+}
+
+/**
+ * Return the number of objects which can be stored in the deque.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @return
+ *   The usable size of the deque.
+ */
+static inline unsigned int
+rte_deque_get_capacity(const struct rte_deque *d)
+{
+	return d->capacity;
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_DEQUE_H_ */
diff --git a/lib/deque/rte_deque_core.h b/lib/deque/rte_deque_core.h
new file mode 100644
index 0000000000..ff82b80d38
--- /dev/null
+++ b/lib/deque/rte_deque_core.h
@@ -0,0 +1,82 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#ifndef _RTE_DEQUE_CORE_H_
+#define _RTE_DEQUE_CORE_H_
+
+/**
+ * @file
+ * This file contains definition of RTE deque structure, init flags and
+ * some related macros. This file should not be included directly,
+ * include rte_deque.h instead.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdio.h>
+#include <stdint.h>
+#include <string.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+#include <rte_debug.h>
+
+extern int rte_deque_log_type;
+
+#define RTE_DEQUE_MZ_PREFIX "DEQUE_"
+/** The maximum length of a deque name. */
+#define RTE_DEQUE_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
+			   sizeof(RTE_DEQUE_MZ_PREFIX) + 1)
+
+/**
+ * Double ended queue (deque) structure.
+ *
+ * The producer and the consumer have a head and a tail index. These indices
+ * are not confined to the range 0 to size(deque)-1; they are free-running
+ * unsigned 32-bit values, and their value is masked while accessing the
+ * objects in the deque. Hence the result of the subtraction is always
+ * modulo 2^32 and lies between 0 and capacity.
+ */
+struct rte_deque {
+	alignas(RTE_CACHE_LINE_SIZE) char name[RTE_DEQUE_NAMESIZE];
+	/**< Name of the deque */
+	int flags;
+	/**< Flags supplied at creation. */
+	const struct rte_memzone *memzone;
+	/**< Memzone, if any, containing the rte_deque */
+
+	alignas(RTE_CACHE_LINE_SIZE) char pad0; /**< empty cache line */
+
+	uint32_t size;           /**< Size of deque. */
+	uint32_t mask;           /**< Mask (size-1) of deque. */
+	uint32_t capacity;       /**< Usable size of deque */
+	/** Deque head and tail indices. */
+	volatile uint32_t head;
+	volatile uint32_t tail;
+};
+
+/**
+ * The deque is to hold exactly the requested number of entries.
+ * Without this flag set, the deque size requested must be a power of 2, and the
+ * usable space will be that size - 1. With the flag, the requested size will
+ * be rounded up to the next power of two, but the usable space will be exactly
+ * that requested. Worst case, if a power-of-2 size is requested, half the
+ * deque space will be wasted.
+ */
+#define RTE_DEQUE_F_EXACT_SZ 0x0004
+#define RTE_DEQUE_SZ_MASK  (0x7fffffffU) /**< Deque size mask */
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_DEQUE_CORE_H_ */
diff --git a/lib/deque/rte_deque_pvt.h b/lib/deque/rte_deque_pvt.h
new file mode 100644
index 0000000000..931bbd4d19
--- /dev/null
+++ b/lib/deque/rte_deque_pvt.h
@@ -0,0 +1,538 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#ifndef _RTE_DEQUE_PVT_H_
+#define _RTE_DEQUE_PVT_H_
+
+#define __RTE_DEQUE_COUNT(d) ((d->head - d->tail) & d->mask)
+#define __RTE_DEQUE_FREE_SPACE(d) (d->capacity - __RTE_DEQUE_COUNT(d))
+
+static __rte_always_inline void
+__rte_deque_enqueue_elems_head_32(struct rte_deque *d,
+				const unsigned int size,
+				uint32_t idx,
+				const void *obj_table,
+				unsigned int n)
+{
+	unsigned int i;
+	uint32_t *deque = (uint32_t *)&d[1];
+	const uint32_t *obj = (const uint32_t *)obj_table;
+	if (likely(idx + n <= size)) {
+		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
+			deque[idx] = obj[i];
+			deque[idx + 1] = obj[i + 1];
+			deque[idx + 2] = obj[i + 2];
+			deque[idx + 3] = obj[i + 3];
+			deque[idx + 4] = obj[i + 4];
+			deque[idx + 5] = obj[i + 5];
+			deque[idx + 6] = obj[i + 6];
+			deque[idx + 7] = obj[i + 7];
+		}
+		switch (n & 0x7) {
+		case 7:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 6:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 5:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 4:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 3:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 2:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 1:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			deque[idx] = obj[i];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			deque[idx] = obj[i];
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_enqueue_elems_head_64(struct rte_deque *d,
+				const void *obj_table,
+				unsigned int n)
+{
+	unsigned int i;
+	const uint32_t size = d->size;
+	uint32_t idx = (d->head & d->mask);
+	uint64_t *deque = (uint64_t *)&d[1];
+	const unaligned_uint64_t *obj = (const unaligned_uint64_t *)obj_table;
+	if (likely(idx + n <= size)) {
+		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
+			deque[idx] = obj[i];
+			deque[idx + 1] = obj[i + 1];
+			deque[idx + 2] = obj[i + 2];
+			deque[idx + 3] = obj[i + 3];
+		}
+		switch (n & 0x3) {
+		case 3:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 2:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 1:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			deque[idx] = obj[i];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			deque[idx] = obj[i];
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_enqueue_elems_head_128(struct rte_deque *d,
+				const void *obj_table,
+				unsigned int n)
+{
+	unsigned int i;
+	const uint32_t size = d->size;
+	uint32_t idx = (d->head & d->mask);
+	rte_int128_t *deque = (rte_int128_t *)&d[1];
+	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
+	if (likely(idx + n <= size)) {
+		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
+			memcpy((void *)(deque + idx),
+				(const void *)(obj + i), 32);
+		switch (n & 0x1) {
+		case 1:
+			memcpy((void *)(deque + idx),
+				(const void *)(obj + i), 16);
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			memcpy((void *)(deque + idx),
+				(const void *)(obj + i), 16);
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			memcpy((void *)(deque + idx),
+				(const void *)(obj + i), 16);
+	}
+}
+
+static __rte_always_inline unsigned int
+__rte_deque_enqueue_at_head(struct rte_deque *d,
+			const void *obj_table,
+			unsigned int esize,
+			unsigned int n)
+{
+	/* 8B and 16B copies implemented individually because on some platforms
+	 * there are 64 bit and 128 bit registers available for direct copying.
+	 */
+	if (esize == 8)
+		__rte_deque_enqueue_elems_head_64(d, obj_table, n);
+	else if (esize == 16)
+		__rte_deque_enqueue_elems_head_128(d, obj_table, n);
+	else {
+		uint32_t idx, scale, nd_idx, nd_num, nd_size;
+
+		/* Normalize to uint32_t */
+		scale = esize / sizeof(uint32_t);
+		nd_num = n * scale;
+		idx = d->head & d->mask;
+		nd_idx = idx * scale;
+		nd_size = d->size * scale;
+		__rte_deque_enqueue_elems_head_32(d, nd_size, nd_idx,
+						obj_table, nd_num);
+	}
+	d->head = (d->head + n) & d->mask;
+	return n;
+}
+
+static __rte_always_inline void
+__rte_deque_enqueue_elems_tail_32(struct rte_deque *d,
+				const unsigned int mask,
+				uint32_t idx,
+				const void *obj_table,
+				unsigned int n,
+				const unsigned int scale,
+				const unsigned int elem_size)
+{
+	unsigned int i;
+	uint32_t *deque = (uint32_t *)&d[1];
+	const uint32_t *obj = (const uint32_t *)obj_table;
+
+	if (likely(idx >= n)) {
+		for (i = 0; i < n; idx -= scale, i += scale)
+			memcpy(&deque[idx], &obj[i], elem_size);
+	} else {
+		for (i = 0; (int32_t)idx >= 0; idx -= scale, i += scale)
+			memcpy(&deque[idx], &obj[i], elem_size);
+
+		/* Start at the ending */
+		idx = mask;
+		for (; i < n; idx -= scale, i += scale)
+			memcpy(&deque[idx], &obj[i], elem_size);
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_enqueue_elems_tail_64(struct rte_deque *d,
+				const void *obj_table,
+				unsigned int n)
+{
+	unsigned int i;
+	uint32_t idx = (d->tail & d->mask);
+	uint64_t *deque = (uint64_t *)&d[1];
+	const unaligned_uint64_t *obj = (const unaligned_uint64_t *)obj_table;
+	if (likely((int32_t)(idx - n) >= 0)) {
+		for (i = 0; i < (n & ~0x3); i += 4, idx -= 4) {
+			deque[idx] = obj[i];
+			deque[idx - 1] = obj[i + 1];
+			deque[idx - 2] = obj[i + 2];
+			deque[idx - 3] = obj[i + 3];
+		}
+		switch (n & 0x3) {
+		case 3:
+			deque[idx--] = obj[i++]; /* fallthrough */
+		case 2:
+			deque[idx--] = obj[i++]; /* fallthrough */
+		case 1:
+			deque[idx--] = obj[i++]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; (int32_t)idx >= 0; i++, idx--)
+			deque[idx] = obj[i];
+		/* Start at the ending */
+		for (idx = d->mask; i < n; i++, idx--)
+			deque[idx] = obj[i];
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_enqueue_elems_tail_128(struct rte_deque *d,
+				const void *obj_table,
+				unsigned int n)
+{
+	unsigned int i;
+	uint32_t idx = (d->tail & d->mask);
+	rte_int128_t *deque = (rte_int128_t *)&d[1];
+	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
+	if (likely((int32_t)(idx - n) >= 0)) {
+		for (i = 0; i < (n & ~0x1); i += 2, idx -= 2) {
+			deque[idx] = obj[i];
+			deque[idx - 1] = obj[i + 1];
+		}
+		switch (n & 0x1) {
+		case 1:
+			memcpy((void *)(deque + idx),
+				(const void *)(obj + i), 16);
+		}
+	} else {
+		for (i = 0; (int32_t)idx >= 0; i++, idx--)
+			memcpy((void *)(deque + idx),
+				(const void *)(obj + i), 16);
+		/* Start at the ending */
+		for (idx = d->mask; i < n; i++, idx--)
+			memcpy((void *)(deque + idx),
+				(const void *)(obj + i), 16);
+	}
+}
+
+static __rte_always_inline unsigned int
+__rte_deque_enqueue_at_tail(struct rte_deque *d,
+			const void *obj_table,
+			unsigned int esize,
+			unsigned int n)
+{
+	/* The tail must point at an empty cell when enqueuing */
+	d->tail--;
+
+	/* 8B and 16B copies implemented individually because on some platforms
+	 * there are 64 bit and 128 bit registers available for direct copying.
+	 */
+	if (esize == 8)
+		__rte_deque_enqueue_elems_tail_64(d, obj_table, n);
+	else if (esize == 16)
+		__rte_deque_enqueue_elems_tail_128(d, obj_table, n);
+	else {
+		uint32_t idx, scale, nd_idx, nd_num, nd_mask;
+
+		/* Normalize to uint32_t */
+		scale = esize / sizeof(uint32_t);
+		nd_num = n * scale;
+		idx = d->tail & d->mask;
+		nd_idx = idx * scale;
+		nd_mask = d->mask * scale;
+		__rte_deque_enqueue_elems_tail_32(d, nd_mask, nd_idx, obj_table,
+						nd_num, scale, esize);
+	}
+
+	/* The +1 is because the tail needs to point at a
+	 * non-empty memory location after the enqueuing operation.
+	 */
+	d->tail = (d->tail - n + 1) & d->mask;
+	return n;
+}
+
+static __rte_always_inline void
+__rte_deque_dequeue_elems_32(struct rte_deque *d,
+			const unsigned int size,
+			uint32_t idx,
+			void *obj_table,
+			unsigned int n)
+{
+	unsigned int i;
+	const uint32_t *deque = (const uint32_t *)&d[1];
+	uint32_t *obj = (uint32_t *)obj_table;
+	if (likely(idx + n <= size)) {
+		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
+			obj[i] = deque[idx];
+			obj[i + 1] = deque[idx + 1];
+			obj[i + 2] = deque[idx + 2];
+			obj[i + 3] = deque[idx + 3];
+			obj[i + 4] = deque[idx + 4];
+			obj[i + 5] = deque[idx + 5];
+			obj[i + 6] = deque[idx + 6];
+			obj[i + 7] = deque[idx + 7];
+		}
+		switch (n & 0x7) {
+		case 7:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 6:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 5:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 4:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 3:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 2:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 1:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			obj[i] = deque[idx];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			obj[i] = deque[idx];
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_dequeue_elems_64(struct rte_deque *d, void *obj_table,
+			unsigned int n)
+{
+	unsigned int i;
+	const uint32_t size = d->size;
+	uint32_t idx = (d->tail & d->mask);
+	const uint64_t *deque = (const uint64_t *)&d[1];
+	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
+	if (likely(idx + n <= size)) {
+		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
+			obj[i] = deque[idx];
+			obj[i + 1] = deque[idx + 1];
+			obj[i + 2] = deque[idx + 2];
+			obj[i + 3] = deque[idx + 3];
+		}
+		switch (n & 0x3) {
+		case 3:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 2:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 1:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			obj[i] = deque[idx];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			obj[i] = deque[idx];
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_dequeue_elems_128(struct rte_deque *d,
+			void *obj_table,
+			unsigned int n)
+{
+	unsigned int i;
+	const uint32_t size = d->size;
+	uint32_t idx = (d->tail & d->mask);
+	const rte_int128_t *deque = (const rte_int128_t *)&d[1];
+	rte_int128_t *obj = (rte_int128_t *)obj_table;
+	if (likely(idx + n <= size)) {
+		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
+			memcpy((void *)(obj + i),
+				(const void *)(deque + idx), 32);
+		switch (n & 0x1) {
+		case 1:
+			memcpy((void *)(obj + i),
+				(const void *)(deque + idx), 16);
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			memcpy((void *)(obj + i),
+				(const void *)(deque + idx), 16);
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			memcpy((void *)(obj + i),
+				(const void *)(deque + idx), 16);
+	}
+}
+
+static __rte_always_inline unsigned int
+__rte_deque_dequeue_at_tail(struct rte_deque *d,
+			void *obj_table,
+			unsigned int esize,
+			unsigned int n)
+{
+	/* 8B and 16B copies implemented individually because on some platforms
+	 * there are 64 bit and 128 bit registers available for direct copying.
+	 */
+	if (esize == 8)
+		__rte_deque_dequeue_elems_64(d, obj_table, n);
+	else if (esize == 16)
+		__rte_deque_dequeue_elems_128(d, obj_table, n);
+	else {
+		uint32_t idx, scale, nd_idx, nd_num, nd_size;
+
+		/* Normalize to uint32_t */
+		scale = esize / sizeof(uint32_t);
+		nd_num = n * scale;
+		idx = d->tail & d->mask;
+		nd_idx = idx * scale;
+		nd_size = d->size * scale;
+		__rte_deque_dequeue_elems_32(d, nd_size, nd_idx,
+					obj_table, nd_num);
+	}
+	d->tail = (d->tail + n) & d->mask;
+	return n;
+}
+
+static __rte_always_inline void
+__rte_deque_dequeue_elems_head_32(struct rte_deque *d,
+				const unsigned int mask,
+				uint32_t idx,
+				void *obj_table,
+				unsigned int n,
+				const unsigned int scale,
+				const unsigned int elem_size)
+{
+	unsigned int i;
+	const uint32_t *deque = (uint32_t *)&d[1];
+	uint32_t *obj = (uint32_t *)obj_table;
+
+	if (likely(idx >= n)) {
+		for (i = 0; i < n; idx -= scale, i += scale)
+			memcpy(&obj[i], &deque[idx], elem_size);
+	} else {
+		for (i = 0; (int32_t)idx >= 0; idx -= scale, i += scale)
+			memcpy(&obj[i], &deque[idx], elem_size);
+		/* Start at the ending */
+		idx = mask;
+		for (; i < n; idx -= scale, i += scale)
+			memcpy(&obj[i], &deque[idx], elem_size);
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_dequeue_elems_head_64(struct rte_deque *d,
+				void *obj_table,
+				unsigned int n)
+{
+	unsigned int i;
+	uint32_t idx = (d->head & d->mask);
+	const uint64_t *deque = (uint64_t *)&d[1];
+	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
+	if (likely((int32_t)(idx - n) >= 0)) {
+		for (i = 0; i < (n & ~0x3); i += 4, idx -= 4) {
+			obj[i] = deque[idx];
+			obj[i + 1] = deque[idx - 1];
+			obj[i + 2] = deque[idx - 2];
+			obj[i + 3] = deque[idx - 3];
+		}
+		switch (n & 0x3) {
+		case 3:
+			obj[i++] = deque[idx--];  /* fallthrough */
+		case 2:
+			obj[i++] = deque[idx--]; /* fallthrough */
+		case 1:
+			obj[i++] = deque[idx--]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; (int32_t)idx >= 0; i++, idx--)
+			obj[i] = deque[idx];
+		/* Start at the ending */
+		for (idx = d->mask; i < n; i++, idx--)
+			obj[i] = deque[idx];
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_dequeue_elems_head_128(struct rte_deque *d,
+				void *obj_table,
+				unsigned int n)
+{
+	unsigned int i;
+	uint32_t idx = (d->head & d->mask);
+	const rte_int128_t *deque = (rte_int128_t *)&d[1];
+	rte_int128_t *obj = (rte_int128_t *)obj_table;
+	if (likely((int32_t)(idx - n) >= 0)) {
+		for (i = 0; i < (n & ~0x1); i += 2, idx -= 2) {
+			obj[i] = deque[idx];
+			obj[i + 1] = deque[idx - 1];
+		}
+		switch (n & 0x1) {
+		case 1:
+			memcpy((void *)(obj + i),
+				(const void *)(deque + idx), 16);
+		}
+	} else {
+		for (i = 0; (int32_t)idx >= 0; i++, idx--)
+			memcpy((void *)(obj + i),
+				(const void *)(deque + idx), 16);
+		/* Start at the ending */
+		for (idx = d->mask; i < n; i++, idx--)
+			memcpy((void *)(obj + i),
+				(const void *)(deque + idx), 16);
+	}
+}
+
+static __rte_always_inline unsigned int
+__rte_deque_dequeue_at_head(struct rte_deque *d,
+			void *obj_table,
+			unsigned int esize,
+			unsigned int n)
+{
+	/* The head must point at an empty cell when dequeuing */
+	d->head--;
+
+	/* 8B and 16B copies implemented individually because on some platforms
+	 * there are 64 bit and 128 bit registers available for direct copying.
+	 */
+	if (esize == 8)
+		__rte_deque_dequeue_elems_head_64(d, obj_table, n);
+	else if (esize == 16)
+		__rte_deque_dequeue_elems_head_128(d, obj_table, n);
+	else {
+		uint32_t idx, scale, nd_idx, nd_num, nd_mask;
+
+		/* Normalize to uint32_t */
+		scale = esize / sizeof(uint32_t);
+		nd_num = n * scale;
+		idx = d->head & d->mask;
+		nd_idx = idx * scale;
+		nd_mask = d->mask * scale;
+		__rte_deque_dequeue_elems_head_32(d, nd_mask, nd_idx, obj_table,
+						nd_num, scale, esize);
+	}
+
+	/* The +1 is because the head needs to point at an
+	 * empty memory location after the dequeuing operation.
+	 */
+	d->head = (d->head - n + 1) & d->mask;
+	return n;
+}
+#endif /* _RTE_DEQUE_PVT_H_ */
diff --git a/lib/deque/rte_deque_zc.h b/lib/deque/rte_deque_zc.h
new file mode 100644
index 0000000000..75a7e6fddb
--- /dev/null
+++ b/lib/deque/rte_deque_zc.h
@@ -0,0 +1,430 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+#ifndef _RTE_DEQUE_ZC_H_
+#define _RTE_DEQUE_ZC_H_
+
+/**
+ * @file
+ * This file should not be included directly, include rte_deque.h instead.
+ *
+ * Deque Zero Copy APIs
+ * These APIs make it possible to split public enqueue/dequeue API
+ * into 3 parts:
+ * - enqueue/dequeue start
+ * - copy data to/from the deque
+ * - enqueue/dequeue finish
+ * These APIs provide the ability to avoid copying the data to a temporary area.
+ *
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Deque zero-copy information structure.
+ *
+ * This structure contains the pointers and length of the space
+ * reserved on the Deque storage.
+ */
+struct __rte_cache_aligned rte_deque_zc_data {
+	/* Pointer to the first space in the deque */
+	void *ptr1;
+	/* Pointer to the second space in the deque if there is wrap-around.
+	 * It contains valid value only if wrap-around happens.
+	 */
+	void *ptr2;
+	/* Number of elements in the first pointer. If this is equal to
+	 * the number of elements requested, then ptr2 is NULL.
+	 * Otherwise, subtracting n1 from number of elements requested
+	 * will give the number of elements available at ptr2.
+	 */
+	unsigned int n1;
+};
+
+static __rte_always_inline void
+__rte_deque_get_elem_addr(struct rte_deque *d, uint32_t pos,
+	uint32_t esize, uint32_t num, void **dst1, uint32_t *n1, void **dst2,
+	bool low_to_high)
+{
+	uint32_t idx, scale, nr_idx;
+	uint32_t *deque = (uint32_t *)&d[1];
+
+	/* Normalize to uint32_t */
+	scale = esize / sizeof(uint32_t);
+	idx = pos & d->mask;
+	nr_idx = idx * scale;
+
+	*dst1 = deque + nr_idx;
+	*n1 = num;
+
+	if (low_to_high) {
+		if (idx + num > d->size) {
+			*n1 = d->size - idx;
+			*dst2 = deque;
+		} else
+			*dst2 = NULL;
+	} else {
+		if ((int32_t)(idx - num) < 0) {
+			*n1 = idx + 1;
+			*dst2 = (void *)&deque[(-1 & d->mask) * scale];
+		} else
+			*dst2 = NULL;
+	}
+}
+
+/**
+ * Start to enqueue several objects on the deque.
+ * Note that no actual objects are put in the deque by this function,
+ * it just reserves space for the user on the deque.
+ * User has to copy objects into the deque using the returned pointers.
+ * User should call rte_deque_enqueue_zc_elem_finish to complete the
+ * enqueue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to add in the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param free_space
+ *   Returns the amount of space in the deque after the reservation operation
+ *   has finished.
+ * @return
+ *   The number of objects that can be enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_enqueue_zc_bulk_elem_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *free_space)
+{
+
+	*free_space = __RTE_DEQUE_FREE_SPACE(d);
+	if (unlikely(*free_space < n))
+		return 0;
+	__rte_deque_get_elem_addr(d, d->head, esize, n, &zcd->ptr1,
+							&zcd->n1, &zcd->ptr2, true);
+
+	*free_space -= n;
+	return n;
+}
+
+/**
+ * Complete enqueuing several pointers to objects on the deque.
+ * Note that the number of objects to enqueue should not exceed the
+ * previous enqueue_start return value.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param n
+ *   The number of pointers to objects to add to the deque.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_deque_enqueue_zc_elem_finish(struct rte_deque *d, unsigned int n)
+{
+	d->head = (d->head + n) & d->mask;
+}
+
+/**
+ * Start to enqueue several objects on the deque.
+ * Note that no actual objects are put in the deque by this function,
+ * it just reserves space for the user on the deque.
+ * User has to copy objects into the deque using the returned pointers.
+ * User should call rte_deque_enqueue_zc_elem_finish to complete the
+ * enqueue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to add in the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param free_space
+ *   Returns the amount of space in the deque after the reservation operation
+ *   has finished.
+ * @return
+ *   The number of objects that can be enqueued, between 0 and n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_enqueue_zc_burst_elem_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *free_space)
+{
+	*free_space = __RTE_DEQUE_FREE_SPACE(d);
+	n = n > *free_space ? *free_space : n;
+	return rte_deque_enqueue_zc_bulk_elem_start(d, esize, n, zcd, free_space);
+}
+
+/**
+ * Start to enqueue several objects at the tail of the deque.
+ * Note that no actual objects are put in the deque by this function,
+ * it just reserves space for the user on the deque.
+ * User has to copy objects into the deque using the returned pointers.
+ * User should call rte_deque_enqueue_zc_elem_tail_finish to complete the
+ * enqueue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to add in the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param free_space
+ *   Returns the amount of space in the deque after the reservation operation
+ *   has finished.
+ * @return
+ *   The number of objects that can be enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_enqueue_zc_bulk_elem_tail_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *free_space)
+{
+	*free_space = __RTE_DEQUE_FREE_SPACE(d);
+	if (unlikely(*free_space < n))
+		return 0;
+	__rte_deque_get_elem_addr(d, d->tail - 1, esize, n, &zcd->ptr1,
+							  &zcd->n1, &zcd->ptr2, false);
+
+	*free_space -= n;
+	return n;
+}
+
+/**
+ * Complete enqueuing several pointers to objects on the deque.
+ * Note that the number of objects to enqueue should not exceed the
+ * previous enqueue_start return value.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param n
+ *   The number of pointers to objects to add to the deque.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_deque_enqueue_zc_elem_tail_finish(struct rte_deque *d, unsigned int n)
+{
+	d->tail = (d->tail - n) & d->mask;
+}
+
+/**
+ * Start to enqueue several objects at the tail of the deque.
+ * Note that no actual objects are put in the deque by this function,
+ * it just reserves space for the user on the deque.
+ * User has to copy objects into the deque using the returned pointers.
+ * User should call rte_deque_enqueue_zc_elem_tail_finish to complete the
+ * enqueue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to add in the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param free_space
+ *   Returns the amount of space in the deque after the reservation operation
+ *   has finished.
+ * @return
+ *   The number of objects that can be enqueued, between 0 and n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_enqueue_zc_burst_elem_tail_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *free_space)
+{
+	*free_space = __RTE_DEQUE_FREE_SPACE(d);
+	n = n > *free_space ? *free_space : n;
+	return rte_deque_enqueue_zc_bulk_elem_tail_start(d, esize, n, zcd, free_space);
+}
+
+/**
+ * Start to dequeue several objects from the deque.
+ * Note that no actual objects are copied from the deque by this function.
+ * User has to copy objects from the deque using the returned pointers.
+ * User should call rte_deque_dequeue_zc_elem_finish to complete the
+ * dequeue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to remove from the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue has
+ *   finished.
+ * @return
+ *   The number of objects that can be dequeued, either 0 or n.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_dequeue_zc_bulk_elem_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *available)
+{
+	*available = __RTE_DEQUE_COUNT(d);
+	if (unlikely(*available < n))
+		return 0;
+	__rte_deque_get_elem_addr(d, d->tail, esize, n, &zcd->ptr1,
+							&zcd->n1, &zcd->ptr2, true);
+
+	*available -= n;
+	return n;
+}
+
+/**
+ * Complete dequeuing several objects from the deque.
+ * Note that the number of objects to dequeue should not exceed the
+ * previous dequeue_start return value.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param n
+ *   The number of objects to remove from the deque.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_deque_dequeue_zc_elem_finish(struct rte_deque *d, unsigned int n)
+{
+	d->tail = (d->tail + n) & d->mask;
+}
+
+/**
+ * Start to dequeue several objects from the deque.
+ * Note that no actual objects are copied from the deque by this function.
+ * User has to copy objects from the deque using the returned pointers.
+ * User should call rte_deque_dequeue_zc_elem_finish to complete the
+ * dequeue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to remove from the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue has
+ *   finished.
+ * @return
+ *   The number of objects that can be dequeued, between 0 and n.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_dequeue_zc_burst_elem_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *available)
+{
+	*available = __RTE_DEQUE_COUNT(d);
+	n = n > *available ? *available : n;
+	return rte_deque_dequeue_zc_bulk_elem_start(d, esize, n, zcd, available);
+}
+
+/**
+ * Start to dequeue several objects from the head of the deque.
+ * Note that no actual objects are copied from the deque by this function.
+ * User has to copy objects from the deque using the returned pointers.
+ * User should call rte_deque_dequeue_zc_elem_head_finish to complete the
+ * dequeue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to remove from the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue has
+ *   finished.
+ * @return
+ *   The number of objects that can be dequeued, either 0 or n.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_dequeue_zc_bulk_elem_head_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *available)
+{
+	*available = __RTE_DEQUE_COUNT(d);
+	if (unlikely(*available < n))
+		return 0;
+	__rte_deque_get_elem_addr(d, d->head - 1, esize, n, &zcd->ptr1,
+							&zcd->n1, &zcd->ptr2, false);
+
+	*available -= n;
+	return n;
+}
+
+/**
+ * Complete dequeuing several objects from the deque.
+ * Note that the number of objects to dequeue should not exceed the
+ * previous dequeue_start return value.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param n
+ *   The number of objects to remove from the deque.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_deque_dequeue_zc_elem_head_finish(struct rte_deque *d, unsigned int n)
+{
+	d->head = (d->head - n) & d->mask;
+}
+
+/**
+ * Start to dequeue several objects from the head of the deque.
+ * Note that no actual objects are copied from the deque by this function.
+ * User has to copy objects from the deque using the returned pointers.
+ * User should call rte_deque_dequeue_zc_elem_head_finish to complete the
+ * dequeue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to remove from the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue has
+ *   finished.
+ * @return
+ *   The number of objects that can be dequeued, between 0 and n.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_dequeue_zc_burst_elem_head_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *available)
+{
+	*available = __RTE_DEQUE_COUNT(d);
+	n = n > *available ? *available : n;
+	return rte_deque_dequeue_zc_bulk_elem_head_start(d, esize, n, zcd, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_DEQUE_ZC_H_ */
diff --git a/lib/deque/version.map b/lib/deque/version.map
new file mode 100644
index 0000000000..103fd3b512
--- /dev/null
+++ b/lib/deque/version.map
@@ -0,0 +1,14 @@
+EXPERIMENTAL {
+	global:
+
+	# added in 24.07
+	rte_deque_log_type;
+	rte_deque_create;
+	rte_deque_dump;
+	rte_deque_free;
+	rte_deque_get_memsize_elem;
+	rte_deque_init;
+	rte_deque_reset;
+
+	local: *;
+};
diff --git a/lib/meson.build b/lib/meson.build
index 179a272932..8c8c1e98e2 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -14,6 +14,7 @@ libraries = [
         'argparse',
         'telemetry', # basic info querying
         'eal', # everything depends on eal
+        'deque',
         'ring',
         'rcu', # rcu depends on ring
         'mempool',
-- 
2.25.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v1 2/2] deque: add unit tests for the deque library
  2024-04-01  1:37 ` [PATCH v1 0/2] deque: add multithread unsafe deque library Aditya Ambadipudi
  2024-04-01  1:37   ` [PATCH v1 1/2] deque: add multi-thread unsafe double ended queue Aditya Ambadipudi
@ 2024-04-01  1:37   ` Aditya Ambadipudi
  2024-04-01 14:05   ` [PATCH v1 0/2] deque: add multithread unsafe " Stephen Hemminger
  2 siblings, 0 replies; 48+ messages in thread
From: Aditya Ambadipudi @ 2024-04-01  1:37 UTC (permalink / raw)
  To: dev, jackmin, stephen, matan, viacheslavo, roretzla,
	konstantin.v.ananyev, konstantin.ananyev, mb, hofors
  Cc: Honnappa.Nagarahalli, Dhruv.Tripathi, wathsala.vithanage,
	aditya.ambadipudi, ganeshaditya1, nd, Honnappa Nagarahalli

Add unit test cases that test all of the enqueue/dequeue functions.
Both normal enqueue/dequeue functions and the zerocopy API functions.

Signed-off-by: Aditya Ambadipudi <aditya.ambadipudi@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.nagarahalli@arm.com>
---
 app/test/meson.build                   |    2 +
 app/test/test_deque_enqueue_dequeue.c  | 1231 ++++++++++++++++++++++++
 app/test/test_deque_helper_functions.c |  170 ++++
 3 files changed, 1403 insertions(+)
 create mode 100644 app/test/test_deque_enqueue_dequeue.c
 create mode 100644 app/test/test_deque_helper_functions.c

diff --git a/app/test/meson.build b/app/test/meson.build
index 7d909039ae..8913050c9b 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -60,6 +60,8 @@ source_file_deps = {
     'test_cryptodev_security_tls_record.c': ['cryptodev', 'security'],
     'test_cycles.c': [],
     'test_debug.c': [],
+    'test_deque_enqueue_dequeue.c': ['deque'],
+    'test_deque_helper_functions.c': ['deque'],
     'test_devargs.c': ['kvargs'],
     'test_dispatcher.c': ['dispatcher'],
     'test_distributor.c': ['distributor'],
diff --git a/app/test/test_deque_enqueue_dequeue.c b/app/test/test_deque_enqueue_dequeue.c
new file mode 100644
index 0000000000..35f2dd4451
--- /dev/null
+++ b/app/test/test_deque_enqueue_dequeue.c
@@ -0,0 +1,1231 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#include "test.h"
+
+#include <assert.h>
+#include <inttypes.h>
+#include <signal.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_deque.h>
+#include <rte_errno.h>
+#include <rte_hexdump.h>
+#include <rte_malloc.h>
+#include <rte_random.h>
+
+struct rte_deque *deque;
+
+static const int esize[] = {4, 8, 16, 20};
+#define DEQUE_SIZE 4096
+#define MAX_BULK 32
+#define TEST_DEQUE_FULL_EMPTY_ITER 8
+
+/*
+ * Validate the return value of test cases and print details of the
+ * deque if validation fails
+ *
+ * @param exp
+ *   Expression to validate return value.
+ * @param d
+ *   A pointer to the deque structure.
+ * @param errst
+ *   Statement to execute on validation failure (e.g. goto fail).
+ */
+#define TEST_DEQUE_VERIFY(exp, d, errst) do {				\
+	if (!(exp)) {							\
+		printf("error at %s:%d\tcondition " #exp " failed\n",	\
+			__func__, __LINE__);				\
+		rte_deque_dump(stdout, (d));				\
+		errst;							\
+	}								\
+} while (0)
+
+static int
+test_deque_mem_cmp(void *src, void *dst, unsigned int size)
+{
+	int ret;
+
+	ret = memcmp(src, dst, size);
+	if (ret) {
+		rte_hexdump(stdout, "src", src, size);
+		rte_hexdump(stdout, "dst", dst, size);
+		printf("data after dequeue is not the same\n");
+	}
+
+	return ret;
+}
+
+static int
+test_deque_mem_cmp_rvs(void *src, void *dst,
+		unsigned int count, unsigned int esize)
+{
+	int ret = 0;
+	uint32_t *src32 = ((uint32_t *)src), *dst32 = ((uint32_t *)dst);
+	uint32_t scale = esize/(sizeof(uint32_t));
+
+	/* Start at the end of the dst and compare from there.*/
+	dst32 += (count - 1) * scale;
+	for (unsigned int i = 0; i < count; i++) {
+		for (unsigned int j = 0; j < scale; j++) {
+			if (src32[j] != dst32[j]) {
+				ret = -1;
+				break;
+			}
+		}
+		if (ret)
+			break;
+		dst32 -= scale;
+		src32 += scale;
+	}
+	if (ret) {
+		rte_hexdump(stdout, "src", src, count * esize);
+		rte_hexdump(stdout, "dst", dst, count * esize);
+		printf("data after dequeue is not the same\n");
+	}
+
+	return ret;
+}
+
+static inline void *
+test_deque_calloc(unsigned int dsize, int esize)
+{
+	void *p;
+
+	p = rte_zmalloc(NULL, dsize * esize, RTE_CACHE_LINE_SIZE);
+	if (p == NULL)
+		printf("Failed to allocate memory\n");
+
+	return p;
+}
+
+static void
+test_deque_mem_init(void *obj, unsigned int count, int esize)
+{
+	for (unsigned int i = 0; i < (count * esize / sizeof(uint32_t)); i++)
+		((uint32_t *)obj)[i] = i;
+}
+
+static inline void *
+test_deque_inc_ptr(void *obj, int esize, unsigned int n)
+{
+	return (void *)((uint32_t *)obj + (n * esize / sizeof(uint32_t)));
+}
+
+/* Copy to the deque memory */
+static inline void
+test_deque_zc_copy_to_deque(struct rte_deque_zc_data *zcd, const void *src, int esize,
+	unsigned int num)
+{
+	memcpy(zcd->ptr1, src, esize * zcd->n1);
+	if (zcd->n1 != num) {
+		const void *inc_src = (const void *)((const char *)src +
+						(zcd->n1 * esize));
+		memcpy(zcd->ptr2, inc_src, esize * (num - zcd->n1));
+	}
+}
+
+static inline void
+test_deque_zc_copy_to_deque_rev(struct rte_deque_zc_data *zcd, const void *src,
+					int esize, unsigned int num)
+{
+	void *ptr1 = zcd->ptr1;
+	for (unsigned int i = 0; i < zcd->n1; i++) {
+		memcpy(ptr1, src, esize);
+		src = (const void *)((const char *)src + esize);
+		ptr1 = (void *)((char *)ptr1 - esize);
+	}
+	if (zcd->n1 != num) {
+		void *ptr2 = zcd->ptr2;
+		for (unsigned int i = 0; i < (num - zcd->n1); i++) {
+			memcpy(ptr2, src, esize);
+			src = (const void *)((const char *)src + esize);
+			ptr2 = (void *)((char *)ptr2 - esize);
+		}
+	}
+}
+
+/* Copy from the deque memory */
+static inline void
+test_deque_zc_copy_from_deque(struct rte_deque_zc_data *zcd, void *dst, int esize,
+	unsigned int num)
+{
+	memcpy(dst, zcd->ptr1, esize * zcd->n1);
+
+	if (zcd->n1 != num) {
+		dst = test_deque_inc_ptr(dst, esize, zcd->n1);
+		memcpy(dst, zcd->ptr2, esize * (num - zcd->n1));
+	}
+}
+
+static inline void
+test_deque_zc_copy_from_deque_rev(struct rte_deque_zc_data *zcd, void *dst, int esize,
+	unsigned int num)
+{
+	void *ptr1 = zcd->ptr1;
+	for (unsigned int i = 0; i < zcd->n1; i++) {
+		memcpy(dst, ptr1, esize);
+		dst = (void *)((char *)dst + esize);
+		ptr1 = (void *)((char *)ptr1 - esize);
+	}
+	if (zcd->n1 != num) {
+		void *ptr2 = zcd->ptr2;
+		for (unsigned int i = 0; i < (num - zcd->n1); i++) {
+			memcpy(dst, ptr2, esize);
+			dst = (void *)((char *)dst + esize);
+			ptr2 = (void *)((char *)ptr2 - esize);
+		}
+	}
+}
+
+/* Wrappers around the zero-copy APIs. The wrappers match
+ * the normal enqueue/dequeue API declarations.
+ */
+static unsigned int
+test_deque_enqueue_zc_bulk_elem(struct rte_deque *d, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	uint32_t ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_enqueue_zc_bulk_elem_start(d, esize, n,
+						&zcd, free_space);
+	if (ret != 0) {
+		/* Copy the data to the deque */
+		test_deque_zc_copy_to_deque(&zcd, obj_table, esize, ret);
+		rte_deque_enqueue_zc_elem_finish(d, ret);
+	}
+
+	return ret;
+}
+
+static unsigned int
+test_deque_dequeue_zc_bulk_elem(struct rte_deque *d, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	unsigned int ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_dequeue_zc_bulk_elem_start(d, esize, n,
+				&zcd, available);
+	if (ret != 0) {
+		/* Copy the data from the deque */
+		test_deque_zc_copy_from_deque(&zcd, obj_table, esize, ret);
+		rte_deque_dequeue_zc_elem_finish(d, ret);
+	}
+
+	return ret;
+}
+
+static unsigned int
+test_deque_enqueue_zc_burst_elem(struct rte_deque *d, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	uint32_t ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_enqueue_zc_burst_elem_start(d, esize, n,
+						&zcd, free_space);
+	if (ret != 0) {
+		/* Copy the data to the deque */
+		test_deque_zc_copy_to_deque(&zcd, obj_table, esize, ret);
+		rte_deque_enqueue_zc_elem_finish(d, ret);
+	}
+
+	return ret;
+}
+
+static unsigned int
+test_deque_dequeue_zc_burst_elem(struct rte_deque *d, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	unsigned int ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_dequeue_zc_burst_elem_start(d, esize, n,
+				&zcd, available);
+	if (ret != 0) {
+		/* Copy the data from the deque */
+		test_deque_zc_copy_from_deque(&zcd, obj_table, esize, ret);
+		rte_deque_dequeue_zc_elem_finish(d, ret);
+	}
+	return ret;
+}
+
+static unsigned int
+test_deque_enqueue_zc_bulk_elem_tail(struct rte_deque *d, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	uint32_t ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_enqueue_zc_bulk_elem_tail_start(d, esize, n,
+							&zcd, free_space);
+	if (ret != 0) {
+		/* Copy the data to the deque */
+		test_deque_zc_copy_to_deque_rev(&zcd, obj_table, esize, ret);
+		rte_deque_enqueue_zc_elem_tail_finish(d, ret);
+	}
+
+	return ret;
+}
+
+static unsigned int
+test_deque_dequeue_zc_bulk_elem_head(struct rte_deque *d, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	unsigned int ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_dequeue_zc_bulk_elem_head_start(d, esize, n,
+				&zcd, available);
+	if (ret != 0) {
+		/* Copy the data from the deque */
+		test_deque_zc_copy_from_deque_rev(&zcd, obj_table, esize, ret);
+		rte_deque_dequeue_zc_elem_head_finish(d, ret);
+	}
+	return ret;
+}
+
+static unsigned int
+test_deque_enqueue_zc_burst_elem_tail(struct rte_deque *d,
+	const void *obj_table, unsigned int esize, unsigned int n,
+	unsigned int *free_space)
+{
+	uint32_t ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_enqueue_zc_burst_elem_tail_start(d, esize, n,
+							&zcd, free_space);
+	if (ret != 0) {
+		/* Copy the data to the deque */
+		test_deque_zc_copy_to_deque_rev(&zcd, obj_table, esize, ret);
+		rte_deque_enqueue_zc_elem_tail_finish(d, ret);
+	}
+
+	return ret;
+}
+
+static unsigned int
+test_deque_dequeue_zc_burst_elem_head(struct rte_deque *d, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	unsigned int ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_dequeue_zc_burst_elem_head_start(d, esize, n,
+				&zcd, available);
+	if (ret != 0) {
+		/* Copy the data from the deque */
+		test_deque_zc_copy_from_deque_rev(&zcd, obj_table, esize, ret);
+		rte_deque_dequeue_zc_elem_head_finish(d, ret);
+	}
+	return ret;
+}
+
+#define TEST_DEQUE_ELEM_BULK 8
+#define TEST_DEQUE_ELEM_BURST 16
+static const struct {
+	const char *desc;
+	const int api_flags;
+	unsigned int (*enq)(struct rte_deque *d, const void *obj_table,
+		unsigned int esize, unsigned int n,
+		unsigned int *free_space);
+	unsigned int (*deq)(struct rte_deque *d, void *obj_table,
+			unsigned int esize, unsigned int n,
+			unsigned int *available);
+	/* This dequeues in the opposite direction of enqueue.
+	 * This is used for testing stack behavior
+	 */
+	unsigned int (*deq_opp)(struct rte_deque *d, void *obj_table,
+			unsigned int esize, unsigned int n,
+			unsigned int *available);
+} test_enqdeq_impl[] = {
+	{
+		.desc = "Deque forward direction bulkmode",
+		.api_flags = TEST_DEQUE_ELEM_BULK,
+		.enq = rte_deque_enqueue_bulk_elem,
+		.deq = rte_deque_dequeue_bulk_elem,
+		.deq_opp = rte_deque_dequeue_at_head_bulk_elem,
+	},
+	{
+		.desc = "Deque forward direction burstmode",
+		.api_flags = TEST_DEQUE_ELEM_BURST,
+		.enq = rte_deque_enqueue_burst_elem,
+		.deq = rte_deque_dequeue_burst_elem,
+		.deq_opp = rte_deque_dequeue_at_head_burst_elem,
+	},
+	{
+		.desc = "Deque reverse direction bulkmode",
+		.api_flags = TEST_DEQUE_ELEM_BULK,
+		.enq = rte_deque_enqueue_at_tail_bulk_elem,
+		.deq = rte_deque_dequeue_at_head_bulk_elem,
+		.deq_opp = rte_deque_dequeue_bulk_elem,
+	},
+	{
+		.desc = "Deque reverse direction burstmode",
+		.api_flags = TEST_DEQUE_ELEM_BURST,
+		.enq = rte_deque_enqueue_at_tail_burst_elem,
+		.deq = rte_deque_dequeue_at_head_burst_elem,
+		.deq_opp = rte_deque_dequeue_burst_elem,
+	},
+	{
+		.desc = "Deque forward direction bulkmode zero copy",
+		.api_flags = TEST_DEQUE_ELEM_BULK,
+		.enq = test_deque_enqueue_zc_bulk_elem,
+		.deq = test_deque_dequeue_zc_bulk_elem,
+		.deq_opp = test_deque_dequeue_zc_bulk_elem_head,
+	},
+	{
+		.desc = "Deque forward direction burstmode zero copy",
+		.api_flags = TEST_DEQUE_ELEM_BURST,
+		.enq = test_deque_enqueue_zc_burst_elem,
+		.deq = test_deque_dequeue_zc_burst_elem,
+		.deq_opp = test_deque_dequeue_zc_burst_elem_head,
+	},
+	{
+		.desc = "Deque reverse direction bulkmode zero copy",
+		.api_flags = TEST_DEQUE_ELEM_BULK,
+		.enq = test_deque_enqueue_zc_bulk_elem_tail,
+		.deq = test_deque_dequeue_zc_bulk_elem_head,
+		.deq_opp = test_deque_dequeue_zc_bulk_elem,
+	},
+	{
+		.desc = "Deque reverse direction burstmode zero copy",
+		.api_flags = TEST_DEQUE_ELEM_BURST,
+		.enq = test_deque_enqueue_zc_burst_elem_tail,
+		.deq = test_deque_dequeue_zc_burst_elem_head,
+		.deq_opp = test_deque_dequeue_zc_burst_elem,
+	},
+};
+
+/*
+ * Burst and bulk operations in regular mode and zero copy mode.
+ * A random number of elements is enqueued and dequeued.
+ */
+static int
+test_deque_burst_bulk_tests1(unsigned int test_idx)
+{
+	struct rte_deque *d;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	unsigned int ret;
+	unsigned int i, j, temp_sz, free_space, available;
+	const unsigned int dsz = DEQUE_SIZE - 1;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("\n%s, esize: %d\n", test_enqdeq_impl[test_idx].desc,
+			esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "Over the boundary deque.";
+		d = rte_deque_create(DEQUE_NAME, esize[i], DEQUE_SIZE, 0, 0);
+
+		/* alloc dummy object pointers */
+		src = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+
+		test_deque_mem_init(src, DEQUE_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		printf("Random full/empty test\n");
+
+		for (j = 0; j != TEST_DEQUE_FULL_EMPTY_ITER; j++) {
+			/* random shift in the deque */
+			unsigned int rand = RTE_MAX(rte_rand() % DEQUE_SIZE, 1UL);
+			printf("%s: iteration %u, random shift: %u;\n",
+				__func__, j, rand);
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+							rand, &free_space);
+			TEST_DEQUE_VERIFY(ret == (unsigned int)rand, d, goto fail);
+
+			ret = test_enqdeq_impl[test_idx].deq(d, cur_dst, esize[i],
+							rand, &available);
+			TEST_DEQUE_VERIFY(ret == (unsigned int)rand, d, goto fail);
+
+			/* fill the deque */
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src,
+							esize[i], dsz,
+							&free_space);
+			TEST_DEQUE_VERIFY(ret == (int)dsz, d, goto fail);
+
+			TEST_DEQUE_VERIFY(rte_deque_free_count(d) == 0, d,
+					goto fail);
+			TEST_DEQUE_VERIFY(dsz == rte_deque_count(d), d,
+					goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_full(d), d, goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_empty(d) == 0, d, goto fail);
+
+			/* empty the deque */
+			ret = test_enqdeq_impl[test_idx].deq(d, cur_dst,
+							esize[i], dsz,
+							&available);
+			TEST_DEQUE_VERIFY(ret == (int)dsz, d, goto fail);
+
+			TEST_DEQUE_VERIFY(dsz == rte_deque_free_count(d), d,
+					goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_count(d) == 0, d, goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_full(d) == 0, d, goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_empty(d), d, goto fail);
+
+			/* check data */
+			temp_sz = dsz * esize[i];
+			TEST_DEQUE_VERIFY(test_deque_mem_cmp(src, dst, temp_sz) == 0,
+							d, goto fail);
+		}
+
+		/* Free memory before test completed */
+		rte_deque_free(d);
+		rte_free(src);
+		rte_free(dst);
+		d = NULL;
+		src = NULL;
+		dst = NULL;
+	}
+
+	return 0;
+fail:
+	rte_deque_free(d);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
+}
+
+/*
+ * Burst and bulk operations with regular & zero copy mode.
+ * Performs a sequence of simple enqueues/dequeues and validates that
+ * the dequeued data matches the enqueued data.
+ */
+static int
+test_deque_burst_bulk_tests2(unsigned int test_idx)
+{
+	struct rte_deque *d;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	int ret;
+	unsigned int i, free_space, available;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("\n%s, esize: %d\n", test_enqdeq_impl[test_idx].desc,
+			esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "Multiple enqs, deqs.";
+		d = rte_deque_create(DEQUE_NAME, esize[i], DEQUE_SIZE, 0, 0);
+
+		/* alloc dummy object pointers */
+		src = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+
+		test_deque_mem_init(src, DEQUE_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		printf("enqueue 1 obj\n");
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						1, &free_space);
+		TEST_DEQUE_VERIFY(ret == 1, d, goto fail);
+		cur_src = test_deque_inc_ptr(cur_src, esize[i], 1);
+
+		printf("enqueue 2 objs\n");
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						2, &free_space);
+		TEST_DEQUE_VERIFY(ret == 2, d, goto fail);
+		cur_src = test_deque_inc_ptr(cur_src, esize[i], 2);
+
+		printf("enqueue MAX_BULK objs\n");
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						MAX_BULK, &free_space);
+		TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+
+		printf("dequeue 1 obj\n");
+		ret = test_enqdeq_impl[test_idx].deq(d, cur_dst, esize[i],
+						1, &available);
+		TEST_DEQUE_VERIFY(ret == 1, d, goto fail);
+		cur_dst = test_deque_inc_ptr(cur_dst, esize[i], 1);
+
+		printf("dequeue 2 objs\n");
+		ret = test_enqdeq_impl[test_idx].deq(d, cur_dst, esize[i],
+						2, &available);
+		TEST_DEQUE_VERIFY(ret == 2, d, goto fail);
+		cur_dst = test_deque_inc_ptr(cur_dst, esize[i], 2);
+
+		printf("dequeue MAX_BULK objs\n");
+		ret = test_enqdeq_impl[test_idx].deq(d, cur_dst, esize[i],
+						MAX_BULK, &available);
+		TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+		cur_dst = test_deque_inc_ptr(cur_dst, esize[i], MAX_BULK);
+
+		/* check data */
+		TEST_DEQUE_VERIFY(test_deque_mem_cmp(src, dst,
+				RTE_PTR_DIFF(cur_dst, dst)) == 0,
+				d, goto fail);
+
+		/* Free memory before test completed */
+		rte_deque_free(d);
+		rte_free(src);
+		rte_free(dst);
+		d = NULL;
+		src = NULL;
+		dst = NULL;
+	}
+
+	return 0;
+fail:
+	rte_deque_free(d);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
+}
+
+/*
+ * Burst and bulk operations with normal mode & zero copy mode.
+ * Enqueue and dequeue to cover the entire deque length.
+ */
+static int
+test_deque_burst_bulk_tests3(unsigned int test_idx)
+{
+	struct rte_deque *d;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	int ret;
+	unsigned int i, j, free_space, available;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("\n%s, esize: %d\n", test_enqdeq_impl[test_idx].desc,
+			esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "Full deque length test";
+		d = rte_deque_create(DEQUE_NAME, esize[i], DEQUE_SIZE, 0, 0);
+
+		/* alloc dummy object pointers */
+		src = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+		test_deque_mem_init(src, DEQUE_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		printf("fill and empty the deque\n");
+		for (j = 0; j < DEQUE_SIZE / MAX_BULK; j++) {
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src,
+							esize[i], MAX_BULK,
+							&free_space);
+			TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+			cur_src = test_deque_inc_ptr(cur_src, esize[i],
+								MAX_BULK);
+
+			ret = test_enqdeq_impl[test_idx].deq(d, cur_dst,
+							esize[i], MAX_BULK,
+							&available);
+			TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+			cur_dst = test_deque_inc_ptr(cur_dst, esize[i],
+								MAX_BULK);
+		}
+
+		/* check data */
+		TEST_DEQUE_VERIFY(test_deque_mem_cmp(src, dst,
+					RTE_PTR_DIFF(cur_dst, dst)) == 0,
+					d, goto fail);
+
+		/* Free memory before test completed */
+		rte_deque_free(d);
+		rte_free(src);
+		rte_free(dst);
+		d = NULL;
+		src = NULL;
+		dst = NULL;
+	}
+
+	return 0;
+fail:
+	rte_deque_free(d);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
+}
+
+/*
+ * Burst and bulk operations with normal mode & zero copy mode.
+ * Enqueue till the deque is full and dequeue till the deque becomes empty.
+ */
+static int
+test_deque_burst_bulk_tests4(unsigned int test_idx)
+{
+	struct rte_deque *d;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	int ret;
+	unsigned int i, j, available, free_space;
+	unsigned int num_elems, api_type;
+	api_type = test_enqdeq_impl[test_idx].api_flags;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("\n%s, esize: %d\n", test_enqdeq_impl[test_idx].desc,
+			esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "Full deque length test";
+		d = rte_deque_create(DEQUE_NAME, esize[i], DEQUE_SIZE, 0, 0);
+
+		/* alloc dummy object pointers */
+		src = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+		test_deque_mem_init(src, DEQUE_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		printf("Test enqueue without enough memory space\n");
+		for (j = 0; j < (DEQUE_SIZE/MAX_BULK - 1); j++) {
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src,
+							esize[i], MAX_BULK,
+							&free_space);
+			TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+			cur_src = test_deque_inc_ptr(cur_src, esize[i],
+								MAX_BULK);
+		}
+
+		printf("Enqueue 2 objects, free entries = MAX_BULK - 2\n");
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						2, &free_space);
+		TEST_DEQUE_VERIFY(ret == 2, d, goto fail);
+		cur_src = test_deque_inc_ptr(cur_src, esize[i], 2);
+
+		printf("Enqueue the remaining entries = MAX_BULK - 3\n");
+		/* Bulk APIs enqueue exact number of elements */
+		if ((api_type & TEST_DEQUE_ELEM_BULK))
+			num_elems = MAX_BULK - 3;
+		else
+			num_elems = MAX_BULK;
+		/* Always one free entry left */
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						num_elems, &free_space);
+		TEST_DEQUE_VERIFY(ret == (MAX_BULK - 3), d, goto fail);
+		cur_src = test_deque_inc_ptr(cur_src, esize[i],
+							(MAX_BULK - 3));
+
+		printf("Test if deque is full\n");
+		TEST_DEQUE_VERIFY(rte_deque_full(d) == 1, d, goto fail);
+
+		printf("Test enqueue for a full entry\n");
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						1, &free_space);
+		TEST_DEQUE_VERIFY(ret == 0, d, goto fail);
+
+		printf("Test dequeue without enough objects\n");
+		for (j = 0; j < DEQUE_SIZE / MAX_BULK - 1; j++) {
+			ret = test_enqdeq_impl[test_idx].deq(d, cur_dst, esize[i],
+							MAX_BULK, &available);
+			TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+			cur_dst = test_deque_inc_ptr(cur_dst, esize[i],
+						MAX_BULK);
+		}
+
+		/* MAX_BULK - 1 entries remain; dequeue 2 of them */
+		ret = test_enqdeq_impl[test_idx].deq(d, cur_dst, esize[i],
+						2, &available);
+		TEST_DEQUE_VERIFY(ret == 2, d, goto fail);
+		cur_dst = test_deque_inc_ptr(cur_dst, esize[i], 2);
+
+		/* Bulk APIs enqueue exact number of elements */
+		if ((api_type & TEST_DEQUE_ELEM_BULK))
+			num_elems = MAX_BULK - 3;
+		else
+			num_elems = MAX_BULK;
+		ret = test_enqdeq_impl[test_idx].deq(d, cur_dst, esize[i],
+						num_elems, &available);
+		TEST_DEQUE_VERIFY(ret == MAX_BULK - 3, d, goto fail);
+		cur_dst = test_deque_inc_ptr(cur_dst, esize[i], MAX_BULK - 3);
+
+		printf("Test if deque is empty\n");
+		/* Check if deque is empty */
+		TEST_DEQUE_VERIFY(rte_deque_empty(d) == 1, d, goto fail);
+
+		/* check data */
+		TEST_DEQUE_VERIFY(test_deque_mem_cmp(src, dst,
+					RTE_PTR_DIFF(cur_dst, dst)) == 0,
+					d, goto fail);
+
+		/* Free memory before test completed */
+		rte_deque_free(d);
+		rte_free(src);
+		rte_free(dst);
+		d = NULL;
+		src = NULL;
+		dst = NULL;
+	}
+
+	return 0;
+fail:
+	rte_deque_free(d);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
+}
+
+/*
+ * Basic test cases with exact size deque.
+ */
+static int
+test_deque_with_exact_size(void)
+{
+	struct rte_deque *std_d = NULL, *exact_sz_d = NULL;
+	void *src_orig = NULL, *dst_orig = NULL;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	const unsigned int deque_sz = 16;
+	unsigned int i, j, free_space, available;
+	int ret = -1;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("\nTest exact size deque. Esize: %d\n", esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "std sized deque";
+		std_d = rte_deque_create(DEQUE_NAME, esize[i], deque_sz, 0, 0);
+
+		if (std_d == NULL) {
+			printf("%s: error, can't create std deque\n", __func__);
+			goto test_fail;
+		}
+		static const char *DEQUE_NAME2 = "Exact sized deque";
+		exact_sz_d = rte_deque_create(DEQUE_NAME2, esize[i], deque_sz,
+					0, RTE_DEQUE_F_EXACT_SZ);
+		if (exact_sz_d == NULL) {
+			printf("%s: error, can't create exact size deque\n",
+					__func__);
+			goto test_fail;
+		}
+
+		/* alloc object pointers. Allocate one extra object
+		 * and create an unaligned address.
+		 */
+		src_orig = test_deque_calloc(17, esize[i]);
+		if (src_orig == NULL)
+			goto test_fail;
+		test_deque_mem_init(src_orig, 17, esize[i]);
+		src = (void *)((uintptr_t)src_orig + 1);
+		cur_src = src;
+
+		dst_orig = test_deque_calloc(17, esize[i]);
+		if (dst_orig == NULL)
+			goto test_fail;
+		dst = (void *)((uintptr_t)dst_orig + 1);
+		cur_dst = dst;
+
+		/*
+		 * Check that the exact size deque is bigger than the
+		 * standard deque
+		 */
+		TEST_DEQUE_VERIFY(rte_deque_get_size(std_d) <=
+				rte_deque_get_size(exact_sz_d),
+				std_d, goto test_fail);
+
+		/*
+		 * check that the exact_sz_deque can hold one more element
+		 * than the standard deque. (16 vs 15 elements)
+		 */
+		for (j = 0; j < deque_sz - 1; j++) {
+			ret = test_enqdeq_impl[0].enq(std_d, cur_src, esize[i],
+						1, &free_space);
+			TEST_DEQUE_VERIFY(ret == 1, std_d, goto test_fail);
+			ret = test_enqdeq_impl[0].enq(exact_sz_d, cur_src,
+						esize[i], 1, &free_space);
+			TEST_DEQUE_VERIFY(ret == 1, exact_sz_d, goto test_fail);
+			cur_src = test_deque_inc_ptr(cur_src, esize[i], 1);
+		}
+		ret = test_enqdeq_impl[0].enq(std_d, cur_src, esize[i], 1,
+					&free_space);
+		TEST_DEQUE_VERIFY(ret == 0, std_d, goto test_fail);
+		ret = test_enqdeq_impl[0].enq(exact_sz_d, cur_src, esize[i], 1,
+					&free_space);
+		TEST_DEQUE_VERIFY(ret == 1, exact_sz_d, goto test_fail);
+
+		/* check that dequeue returns the expected number of elements */
+		ret = test_enqdeq_impl[0].deq(exact_sz_d, cur_dst, esize[i],
+					deque_sz, &available);
+		TEST_DEQUE_VERIFY(ret == (int)deque_sz, exact_sz_d,
+				goto test_fail);
+		cur_dst = test_deque_inc_ptr(cur_dst, esize[i], deque_sz);
+
+		/* check that the capacity function returns expected value */
+		TEST_DEQUE_VERIFY(rte_deque_get_capacity(exact_sz_d) == deque_sz,
+				exact_sz_d, goto test_fail);
+
+		/* check data */
+		TEST_DEQUE_VERIFY(test_deque_mem_cmp(src, dst,
+					RTE_PTR_DIFF(cur_dst, dst)) == 0,
+					exact_sz_d, goto test_fail);
+
+		rte_free(src_orig);
+		rte_free(dst_orig);
+		rte_deque_free(std_d);
+		rte_deque_free(exact_sz_d);
+		src_orig = NULL;
+		dst_orig = NULL;
+		std_d = NULL;
+		exact_sz_d = NULL;
+	}
+
+	return 0;
+
+test_fail:
+	rte_free(src_orig);
+	rte_free(dst_orig);
+	rte_deque_free(std_d);
+	rte_deque_free(exact_sz_d);
+	return -1;
+}
+
+/*
+ * Burst and bulk operations in regular mode and zero copy mode.
+ * A random number of elements is enqueued and dequeued first, which
+ * brings both head and tail to somewhere in the middle of the deque.
+ * From that point, stack behavior of the deque is tested.
+ */
+static int
+test_deque_stack_random_tests1(unsigned int test_idx)
+{
+	struct rte_deque *d;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	unsigned int ret;
+	unsigned int i, j, free_space, available;
+	const unsigned int dsz = DEQUE_SIZE - 1;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("Stackmode tests1.\n");
+		printf("\n%s, esize: %d\n", test_enqdeq_impl[test_idx].desc,
+			esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "Over the boundary deque.";
+		d = rte_deque_create(DEQUE_NAME, esize[i], DEQUE_SIZE, 0, 0);
+
+		/* alloc dummy object pointers */
+		src = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+
+		test_deque_mem_init(src, DEQUE_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		printf("Random starting point stack test\n");
+
+		for (j = 0; j != TEST_DEQUE_FULL_EMPTY_ITER; j++) {
+			/* random shift in the deque */
+			unsigned int rand = RTE_MAX(rte_rand() % DEQUE_SIZE, 1UL);
+			printf("%s: iteration %u, random shift: %u;\n",
+				__func__, j, rand);
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src,
+							esize[i], rand,
+							&free_space);
+			TEST_DEQUE_VERIFY(ret != 0, d, goto fail);
+
+			ret = test_enqdeq_impl[test_idx].deq(d, cur_dst,
+							esize[i], rand,
+							&available);
+			TEST_DEQUE_VERIFY(ret == (unsigned int)rand, d,
+					goto fail);
+
+			/* fill the deque */
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+							dsz, &free_space);
+			TEST_DEQUE_VERIFY(ret != 0, d, goto fail);
+
+			TEST_DEQUE_VERIFY(rte_deque_free_count(d) == 0, d,
+					goto fail);
+			TEST_DEQUE_VERIFY(dsz == rte_deque_count(d), d,
+					goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_full(d), d,
+					goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_empty(d) == 0, d,
+					goto fail);
+
+			/* empty the deque */
+			ret = test_enqdeq_impl[test_idx].deq_opp(d, cur_dst,
+								esize[i], dsz,
+								&available);
+			TEST_DEQUE_VERIFY(ret == (int)dsz, d, goto fail);
+
+			TEST_DEQUE_VERIFY(dsz == rte_deque_free_count(d), d,
+					goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_count(d) == 0, d,
+					goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_full(d) == 0, d,
+					goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_empty(d), d, goto fail);
+
+			/* check data */
+			TEST_DEQUE_VERIFY(test_deque_mem_cmp_rvs(src, dst,
+					dsz, esize[i]) == 0, d, goto fail);
+		}
+
+		/* Free memory before test completed */
+		rte_deque_free(d);
+		rte_free(src);
+		rte_free(dst);
+		d = NULL;
+		src = NULL;
+		dst = NULL;
+	}
+
+	return 0;
+fail:
+	rte_deque_free(d);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
+}
+
+/* Tests both standard mode and zero-copy mode.
+ * Keep enqueuing 1, 2, MAX_BULK elements till the deque is full.
+ * Then dequeue them all and make sure the data comes out in the
+ * opposite order of what was enqueued.
+ */
+static int
+test_deque_stack_random_tests2(unsigned int test_idx)
+{
+	struct rte_deque *d;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	unsigned int ret;
+	unsigned int i, free_space, available;
+	const unsigned int dsz = DEQUE_SIZE - 1;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("Stackmode tests2.\n");
+		printf("\n%s, esize: %d\n", test_enqdeq_impl[test_idx].desc,
+			esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "Multiple enqs, deqs.";
+		d = rte_deque_create(DEQUE_NAME, esize[i], DEQUE_SIZE, 0, 0);
+
+		/* alloc dummy object pointers */
+		src = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+
+		test_deque_mem_init(src, DEQUE_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		printf("Enqueue objs till the deque is full.\n");
+		unsigned int count = 0;
+		const unsigned int perIterCount = 1 + 2 + MAX_BULK;
+		while (count + perIterCount < DEQUE_SIZE - 1) {
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+							1, &free_space);
+			TEST_DEQUE_VERIFY(ret == 1, d, goto fail);
+			cur_src = test_deque_inc_ptr(cur_src, esize[i], 1);
+
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+							2, &free_space);
+			TEST_DEQUE_VERIFY(ret == 2, d, goto fail);
+			cur_src = test_deque_inc_ptr(cur_src, esize[i], 2);
+
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+							MAX_BULK, &free_space);
+			TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+			cur_src = test_deque_inc_ptr(cur_src, esize[i], MAX_BULK);
+			count += perIterCount;
+		}
+		unsigned int leftOver = DEQUE_SIZE - 1 - count;
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						leftOver, &free_space);
+		TEST_DEQUE_VERIFY(ret == leftOver, d, goto fail);
+		cur_src = test_deque_inc_ptr(cur_src, esize[i], leftOver);
+
+		printf("Dequeue all the enqueued objs.\n");
+		count = 0;
+		while (count + perIterCount < DEQUE_SIZE - 1) {
+			ret = test_enqdeq_impl[test_idx].deq_opp(d, cur_dst,
+							esize[i], 1, &available);
+			TEST_DEQUE_VERIFY(ret == 1, d, goto fail);
+			cur_dst = test_deque_inc_ptr(cur_dst, esize[i], 1);
+
+			ret = test_enqdeq_impl[test_idx].deq_opp(d, cur_dst,
+								esize[i], 2,
+								&available);
+			TEST_DEQUE_VERIFY(ret == 2, d, goto fail);
+			cur_dst = test_deque_inc_ptr(cur_dst, esize[i], 2);
+
+			ret = test_enqdeq_impl[test_idx].deq_opp(d, cur_dst,
+								esize[i],
+								MAX_BULK,
+								&available);
+			TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+			cur_dst = test_deque_inc_ptr(cur_dst, esize[i], MAX_BULK);
+			count += perIterCount;
+		}
+		leftOver = DEQUE_SIZE - 1 - count;
+		ret = test_enqdeq_impl[test_idx].deq_opp(d, cur_dst, esize[i],
+							leftOver, &available);
+		TEST_DEQUE_VERIFY(ret == leftOver, d, goto fail);
+		cur_dst = test_deque_inc_ptr(cur_dst, esize[i], leftOver);
+
+		/* check data */
+		TEST_DEQUE_VERIFY(test_deque_mem_cmp_rvs(src, dst,
+						dsz, esize[i]) == 0, d,
+						goto fail);
+
+		/* Free memory before test completed */
+		rte_deque_free(d);
+		rte_free(src);
+		rte_free(dst);
+		d = NULL;
+		src = NULL;
+		dst = NULL;
+	}
+
+	return 0;
+fail:
+	rte_deque_free(d);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
+}
+
+/*
+ * Tests both normal mode and zero-copy mode.
+ * Fill up the whole deque, and drain the deque.
+ * Make sure the data matches in reverse order.
+ */
+static int
+test_deque_stack_random_tests3(unsigned int test_idx)
+{
+	struct rte_deque *d;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	int ret;
+	unsigned int i, available, free_space;
+	const unsigned int dsz = DEQUE_SIZE - 1;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("Stackmode tests3.\n");
+		printf("\n%s, esize: %d\n", test_enqdeq_impl[test_idx].desc,
+			esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "Full deque length test";
+		d = rte_deque_create(DEQUE_NAME, esize[i], DEQUE_SIZE, 0, 0);
+		if (d == NULL)
+			goto fail;
+
+		/* alloc dummy object pointers */
+		src = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+		test_deque_mem_init(src, DEQUE_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		/* fill the deque */
+		printf("Fill the whole deque using a single enqueue operation.\n");
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						dsz, &free_space);
+		TEST_DEQUE_VERIFY(ret == (int)dsz, d, goto fail);
+
+		TEST_DEQUE_VERIFY(rte_deque_free_count(d) == 0, d, goto fail);
+		TEST_DEQUE_VERIFY(dsz == rte_deque_count(d), d, goto fail);
+		TEST_DEQUE_VERIFY(rte_deque_full(d), d, goto fail);
+		TEST_DEQUE_VERIFY(rte_deque_empty(d) == 0, d, goto fail);
+
+		/* empty the deque */
+		printf("Empty the whole deque.\n");
+		ret = test_enqdeq_impl[test_idx].deq_opp(d, cur_dst, esize[i],
+							dsz, &available);
+		TEST_DEQUE_VERIFY(ret == (int)dsz, d, goto fail);
+
+		TEST_DEQUE_VERIFY(dsz == rte_deque_free_count(d), d, goto fail);
+		TEST_DEQUE_VERIFY(rte_deque_count(d) == 0, d, goto fail);
+		TEST_DEQUE_VERIFY(rte_deque_full(d) == 0, d, goto fail);
+		TEST_DEQUE_VERIFY(rte_deque_empty(d), d, goto fail);
+
+		/* check data */
+		TEST_DEQUE_VERIFY(test_deque_mem_cmp_rvs(src, dst,
+					dsz, esize[i]) == 0, d, goto fail);
+
+		/* Free memory before test completed */
+		rte_deque_free(d);
+		rte_free(src);
+		rte_free(dst);
+		d = NULL;
+		src = NULL;
+		dst = NULL;
+	}
+
+	return 0;
+fail:
+	rte_deque_free(d);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
+}
+
+static int
+deque_enqueue_dequeue_autotest_fn(void)
+{
+	if (test_deque_with_exact_size() != 0)
+		goto fail;
+	int (*test_fns[])(unsigned int test_fn_idx) = {
+		test_deque_burst_bulk_tests1,
+		test_deque_burst_bulk_tests2,
+		test_deque_burst_bulk_tests3,
+		test_deque_burst_bulk_tests4,
+		test_deque_stack_random_tests1,
+		test_deque_stack_random_tests2,
+		test_deque_stack_random_tests3
+	};
+	for (unsigned int test_impl_idx = 0;
+		test_impl_idx < RTE_DIM(test_enqdeq_impl); test_impl_idx++) {
+		for (unsigned int test_fn_idx = 0;
+			test_fn_idx < RTE_DIM(test_fns); test_fn_idx++) {
+			if (test_fns[test_fn_idx](test_impl_idx) != 0)
+				goto fail;
+		}
+	}
+	return 0;
+fail:
+	return -1;
+}
+
+REGISTER_FAST_TEST(deque_enqueue_dequeue_autotest, true, true,
+		deque_enqueue_dequeue_autotest_fn);
diff --git a/app/test/test_deque_helper_functions.c b/app/test/test_deque_helper_functions.c
new file mode 100644
index 0000000000..78f3185c1b
--- /dev/null
+++ b/app/test/test_deque_helper_functions.c
@@ -0,0 +1,170 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#include "test.h"
+
+#include <assert.h>
+#include <inttypes.h>
+#include <signal.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_deque.h>
+#include <rte_errno.h>
+#include <rte_malloc.h>
+#include <rte_tailq.h>
+
+static int
+test_deque_get_memsize(void)
+{
+	const ssize_t RTE_DEQUE_SZ = sizeof(struct rte_deque);
+	/* (1) Should return EINVAL when the supplied size of deque is not a
+	 * power of 2.
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_get_memsize_elem(4, 9), -EINVAL,
+					  "Get memsize function failed.");
+
+	/* (2) Should return EINVAL when the supplied element size is not a
+	 * multiple of 4.
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_get_memsize_elem(5, 8), -EINVAL,
+					  "Get memsize function failed.");
+
+	/* (3) Requested count of the deque should not exceed
+	 * RTE_DEQUE_SZ_MASK.
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_get_memsize_elem(4, RTE_DEQUE_SZ_MASK), -EINVAL,
+					  "Get memsize function failed.");
+
+	/* (4) A deque of count 1, where the element size is 0, should not allocate
+	 * any more memory than necessary to hold the deque structure.
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_get_memsize_elem(0, 1), RTE_DEQUE_SZ,
+					  "Get memsize function failed.");
+
+	/* (5) Make sure the function is calculating the size correctly.
+	 * size of deque: 128. Size for two elements each of size esize: 8
+	 * total: 128 + 8 = 136
+	 * Cache align'd size = 192.
+	 */
+	const ssize_t calculated_sz = RTE_ALIGN(RTE_DEQUE_SZ + 8, RTE_CACHE_LINE_SIZE);
+	TEST_ASSERT_EQUAL(rte_deque_get_memsize_elem(4, 2), calculated_sz,
+					  "Get memsize function failed.");
+	return 0;
+}
+
+/* Define a Test macro that will allow us to correctly free all the rte_deque
+ * objects that were created as a part of the test in case of a failure.
+ */
+
+#define TEST_DEQUE_MEMSAFE(exp, msg, stmt) do { \
+	if (!(exp)) { \
+		printf("error at %s:%d\tcondition " #exp " failed. Msg: %s\n",	\
+			__func__, __LINE__, msg); \
+		stmt; \
+	} \
+} while (0)
+
+static int
+test_deque_init(void)
+{
+	{
+	/* (1) Make sure init fails when the flags are not correctly passed in. */
+	struct rte_deque deque;
+
+	/* Calling init with undefined flags should fail. */
+	TEST_ASSERT_EQUAL(rte_deque_init(&deque, "Deque", 10, 0x8),
+					  -EINVAL, "Init failed.");
+
+	/* Calling init with a count that is not a power of 2,
+	 * without setting the RTE_DEQUE_F_EXACT_SZ flag,
+	 * should fail.
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_init(&deque, "Deque", 10, 0),
+					  -EINVAL, "Init failed.");
+
+	/* Calling init with a count that is not a power of 2
+	 * should succeed only if the RTE_DEQUE_F_EXACT_SZ flag is set.
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_init(&deque, "Deque", 10, RTE_DEQUE_F_EXACT_SZ),
+					  0, "Init failed.");
+	}
+
+	{
+	/* Make sure all the fields are being correctly set when creating a
+	 * Deque of a size that is not a power of 2.
+	 */
+	struct rte_deque deque;
+	static const char NAME[] = "Deque";
+
+	/* Calling init with a count that is not a power of 2,
+	 * but with RTE_DEQUE_F_EXACT_SZ, should succeed.
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_init(&deque, NAME, 10, RTE_DEQUE_F_EXACT_SZ),
+					  0, "Init failed.");
+
+	TEST_ASSERT_BUFFERS_ARE_EQUAL(deque.name, NAME, sizeof(NAME), "Init failed.");
+	TEST_ASSERT_EQUAL(deque.flags, RTE_DEQUE_F_EXACT_SZ, "Init failed.");
+	TEST_ASSERT_EQUAL(deque.size, 16, "Init failed.");
+	TEST_ASSERT_EQUAL(deque.mask, 15, "Init failed.");
+	TEST_ASSERT_EQUAL(deque.capacity, 10, "Init failed.");
+	}
+
+	{
+	/* Make sure all the fields are being correctly set when creating a
+	 * Deque of a size that is a power of 2.
+	 */
+	struct rte_deque deque;
+	static const char NAME[] = "Deque";
+
+	/* Calling init with a count that is a power of 2
+	 * and no special flags should succeed.
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_init(&deque, NAME, 16, 0), 0, "Init failed.");
+
+	TEST_ASSERT_EQUAL(deque.size, 16, "Init failed.");
+	TEST_ASSERT_EQUAL(deque.mask, 15, "Init failed.");
+	TEST_ASSERT_EQUAL(deque.capacity, 15, "Init failed.");
+	}
+	return 0;
+}
+
+static int
+test_deque_create(void)
+{
+	struct rte_deque *deque;
+	const char *NAME = "Deque";
+	deque = rte_deque_create(NAME, 4, 16, 0, 0);
+
+	/* Make sure the deque creation is successful. */
+	TEST_DEQUE_MEMSAFE(deque != NULL, "Deque creation failed.", goto fail);
+	TEST_DEQUE_MEMSAFE(deque->memzone != NULL, "Deque creation failed.", goto fail);
+	return 0;
+fail:
+	rte_deque_free(deque);
+	return -1;
+}
+
+#undef TEST_DEQUE_MEMSAFE
+
+static struct unit_test_suite deque_helper_functions_testsuite = {
+	.suite_name = "Deque library helper functions test suite",
+	.unit_test_cases = {
+		TEST_CASE(test_deque_get_memsize),
+		TEST_CASE(test_deque_init),
+		TEST_CASE(test_deque_create),
+		TEST_CASES_END(), /**< NULL terminate unit test array */
+	},
+};
+
+static int
+deque_helper_functions_autotest_fn(void)
+{
+	return unit_test_suite_runner(&deque_helper_functions_testsuite);
+}
+
+REGISTER_FAST_TEST(deque_helper_functions_autotest, true, true,
+		deque_helper_functions_autotest_fn);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] deque: add multithread unsafe deque library
  2024-04-01  1:37 ` [PATCH v1 0/2] deque: add multithread unsafe deque library Aditya Ambadipudi
  2024-04-01  1:37   ` [PATCH v1 1/2] deque: add multi-thread unsafe double ended queue Aditya Ambadipudi
  2024-04-01  1:37   ` [PATCH v1 " Aditya Ambadipudi
@ 2024-04-01 14:05   ` Stephen Hemminger
  2024-04-01 22:28     ` Aditya Ambadipudi
  2 siblings, 1 reply; 48+ messages in thread
From: Stephen Hemminger @ 2024-04-01 14:05 UTC (permalink / raw)
  To: Aditya Ambadipudi
  Cc: dev, jackmin, matan, viacheslavo, roretzla, konstantin.v.ananyev,
	konstantin.ananyev, mb, hofors, Honnappa.Nagarahalli,
	Dhruv.Tripathi, wathsala.vithanage, ganeshaditya1, nd

On Sun, 31 Mar 2024 20:37:27 -0500
Aditya Ambadipudi <aditya.ambadipudi@arm.com> wrote:

> As previously discussed in the mailing list [1] we are sending out this
> patch that provides the implementation and unit test cases for the
> RTE_DEQUE library. This includes functions for creating a RTE_DEQUE 
> object. Allocating memory to it. Deleting that object and free'ing the
> memory associated with it. Enqueue/Dequeue functions. Functions for 
> zero-copy API.
> 
> [1] https://mails.dpdk.org/archives/dev/2023-August/275003.html

Does this build without errors with the Microsoft Visual C compiler?

Want to make sure that all new code does not create more work for the
Windows maintainers.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] deque: add multithread unsafe deque library
  2024-04-01 14:05   ` [PATCH v1 0/2] deque: add multithread unsafe " Stephen Hemminger
@ 2024-04-01 22:28     ` Aditya Ambadipudi
  2024-04-02  0:05       ` Tyler Retzlaff
  2024-04-02  0:47       ` Stephen Hemminger
  0 siblings, 2 replies; 48+ messages in thread
From: Aditya Ambadipudi @ 2024-04-01 22:28 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, jackmin, matan, viacheslavo, roretzla, konstantin.v.ananyev,
	konstantin.ananyev, mb, hofors, Honnappa Nagarahalli,
	Dhruv Tripathi, Wathsala Wathawana Vithanage, nd

[-- Attachment #1: Type: text/plain, Size: 1801 bytes --]

Thanks, Stephen, for the comment.

Unfortunately, we don't have the dev setup nor the resources to test out this change using MSVC.

Thank you,
Aditya Ambadipudi


________________________________
From: Stephen Hemminger <stephen@networkplumber.org>
Sent: Monday, April 1, 2024 9:05 AM
To: Aditya Ambadipudi <Aditya.Ambadipudi@arm.com>
Cc: dev@dpdk.org <dev@dpdk.org>; jackmin@nvidia.com <jackmin@nvidia.com>; matan@nvidia.com <matan@nvidia.com>; viacheslavo@nvidia.com <viacheslavo@nvidia.com>; roretzla@linux.microsoft.com <roretzla@linux.microsoft.com>; konstantin.v.ananyev@yandex.ru <konstantin.v.ananyev@yandex.ru>; konstantin.ananyev@huawei.com <konstantin.ananyev@huawei.com>; mb@smartsharesystems.com <mb@smartsharesystems.com>; hofors@lysator.liu.se <hofors@lysator.liu.se>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Dhruv Tripathi <Dhruv.Tripathi@arm.com>; Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>; ganeshaditya1@gmail.com <ganeshaditya1@gmail.com>; nd <nd@arm.com>
Subject: Re: [PATCH v1 0/2] deque: add multithread unsafe deque library

On Sun, 31 Mar 2024 20:37:27 -0500
Aditya Ambadipudi <aditya.ambadipudi@arm.com> wrote:

> As previously discussed in the mailing list [1] we are sending out this
> patch that provides the implementation and unit test cases for the
> RTE_DEQUE library. This includes functions for creating a RTE_DEQUE
> object. Allocating memory to it. Deleting that object and free'ing the
> memory associated with it. Enqueue/Dequeue functions. Functions for
> zero-copy API.
>
> [1] https://mails.dpdk.org/archives/dev/2023-August/275003.html

Does this build without errors with the Microsoft Visual C compiler?

Want to make sure that all new code does not create more work for the
Windows maintainers.

[-- Attachment #2: Type: text/html, Size: 3046 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] deque: add multithread unsafe deque library
  2024-04-01 22:28     ` Aditya Ambadipudi
@ 2024-04-02  0:05       ` Tyler Retzlaff
  2024-04-02  0:47       ` Stephen Hemminger
  1 sibling, 0 replies; 48+ messages in thread
From: Tyler Retzlaff @ 2024-04-02  0:05 UTC (permalink / raw)
  To: Aditya Ambadipudi
  Cc: Stephen Hemminger, dev, jackmin, matan, viacheslavo,
	konstantin.v.ananyev, konstantin.ananyev, mb, hofors,
	Honnappa Nagarahalli, Dhruv Tripathi,
	Wathsala Wathawana Vithanage, nd

On Mon, Apr 01, 2024 at 10:28:52PM +0000, Aditya Ambadipudi wrote:
> Thanks, Stephen, for the comment.
> 
> Unfortunately, we don't have the dev setup nor the resources to test out this change using MSVC.

what are the dependencies of this lib?

you've provided an agnostic api and unit tests, you can enable it in the
build and the CI will provide a minimum test bar.

> 
> Thank you,
> Aditya Ambadipudi
> 
> 
> ________________________________
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Monday, April 1, 2024 9:05 AM
> To: Aditya Ambadipudi <Aditya.Ambadipudi@arm.com>
> Cc: dev@dpdk.org <dev@dpdk.org>; jackmin@nvidia.com <jackmin@nvidia.com>; matan@nvidia.com <matan@nvidia.com>; viacheslavo@nvidia.com <viacheslavo@nvidia.com>; roretzla@linux.microsoft.com <roretzla@linux.microsoft.com>; konstantin.v.ananyev@yandex.ru <konstantin.v.ananyev@yandex.ru>; konstantin.ananyev@huawei.com <konstantin.ananyev@huawei.com>; mb@smartsharesystems.com <mb@smartsharesystems.com>; hofors@lysator.liu.se <hofors@lysator.liu.se>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Dhruv Tripathi <Dhruv.Tripathi@arm.com>; Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>; ganeshaditya1@gmail.com <ganeshaditya1@gmail.com>; nd <nd@arm.com>
> Subject: Re: [PATCH v1 0/2] deque: add multithread unsafe deque library
> 
> On Sun, 31 Mar 2024 20:37:27 -0500
> Aditya Ambadipudi <aditya.ambadipudi@arm.com> wrote:
> 
> > As previously discussed in the mailing list [1] we are sending out this
> > patch that provides the implementation and unit test cases for the
> > RTE_DEQUE library. This includes functions for creating a RTE_DEQUE
> > object. Allocating memory to it. Deleting that object and free'ing the
> > memory associated with it. Enqueue/Dequeue functions. Functions for
> > zero-copy API.
> >
> > [1] https://mails.dpdk.org/archives/dev/2023-August/275003.html
> 
> Does this build without errors with the Microsoft Visual C compiler?
> 
> Want to make sure that all new code does not create more work for the
> Windows maintainers.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] deque: add multithread unsafe deque library
  2024-04-01 22:28     ` Aditya Ambadipudi
  2024-04-02  0:05       ` Tyler Retzlaff
@ 2024-04-02  0:47       ` Stephen Hemminger
  2024-04-02  1:35         ` Honnappa Nagarahalli
  2024-04-02  6:05         ` Mattias Rönnblom
  1 sibling, 2 replies; 48+ messages in thread
From: Stephen Hemminger @ 2024-04-02  0:47 UTC (permalink / raw)
  To: Aditya Ambadipudi
  Cc: dev, jackmin, matan, viacheslavo, roretzla, konstantin.v.ananyev,
	konstantin.ananyev, mb, hofors, Honnappa Nagarahalli,
	Dhruv Tripathi, Wathsala Wathawana Vithanage, nd

On Mon, 1 Apr 2024 22:28:52 +0000
Aditya Ambadipudi <Aditya.Ambadipudi@arm.com> wrote:

> Thanks, Stephen, for the comment.
> 
> Unfortunately, we don't have the dev setup nor the resources to test out this change using MSVC.
> 
> Thank you,
> Aditya Ambadipudi

All it requires is the community version of MSVC which is free. And setting up a Windows
VM with KVM is free and easy.

IMHO all new libraries have to build on all environments, unless they are enabling
platform specific features.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] deque: add multithread unsafe deque library
  2024-04-02  0:47       ` Stephen Hemminger
@ 2024-04-02  1:35         ` Honnappa Nagarahalli
  2024-04-02  2:00           ` Stephen Hemminger
  2024-04-02  6:05         ` Mattias Rönnblom
  1 sibling, 1 reply; 48+ messages in thread
From: Honnappa Nagarahalli @ 2024-04-02  1:35 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Aditya Ambadipudi, dev, jackmin, matan, viacheslavo, roretzla,
	konstantin.v.ananyev, konstantin.ananyev, mb, hofors,
	Dhruv Tripathi, Wathsala Wathawana Vithanage, nd



> On Apr 1, 2024, at 7:47 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
> On Mon, 1 Apr 2024 22:28:52 +0000
> Aditya Ambadipudi <Aditya.Ambadipudi@arm.com> wrote:
> 
>> Thanks, Stephen, for the comment.
>> 
>> Unfortunately, we don't have the dev setup nor the resources to test out this change using MSVC.
>> 
>> Thank you,
>> Aditya Ambadipudi
> 
> All it requires is the community version of MSVC which is free. And setting up a Windows
> VM with KVM is free and easy.
> 
> IMHO all new libraries have to build on all environments, unless they are enabling
> platform specific features.
I see that UNH CI is running Windows VMs and the tests are passing there. So, we do not need to do anything.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] deque: add multithread unsafe deque library
  2024-04-02  1:35         ` Honnappa Nagarahalli
@ 2024-04-02  2:00           ` Stephen Hemminger
  2024-04-02  2:14             ` Honnappa Nagarahalli
  0 siblings, 1 reply; 48+ messages in thread
From: Stephen Hemminger @ 2024-04-02  2:00 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Aditya Ambadipudi, dev, jackmin, matan, viacheslavo, roretzla,
	konstantin.v.ananyev, konstantin.ananyev, mb, hofors,
	Dhruv Tripathi, Wathsala Wathawana Vithanage, nd

On Tue, 2 Apr 2024 01:35:28 +0000
Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:

> > On Apr 1, 2024, at 7:47 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> > 
> > On Mon, 1 Apr 2024 22:28:52 +0000
> > Aditya Ambadipudi <Aditya.Ambadipudi@arm.com> wrote:
> >   
> >> Thanks, Stephen, for the comment.
> >> 
> >> Unfortunately, we don't have the dev setup nor the resources to test out this change using MSVC.
> >> 
> >> Thank you,
> >> Aditya Ambadipudi  
> > 
> > All it requires is the community version of MSVC which is free. And setting up a Windows
> > VM with KVM is free and easy.
> > 
> > IMHO all new libraries have to build on all environments, unless they are enabling
> > platform specific features.  
> I see that UNH CI is running Windows VMs, the tests are passing there. So, we do not need to anything.
> 

That only tests the clang part.
You need to modify lib/meson.build to get it tested with the windows compiler.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] deque: add multithread unsafe deque library
  2024-04-02  2:00           ` Stephen Hemminger
@ 2024-04-02  2:14             ` Honnappa Nagarahalli
  2024-04-02  2:53               ` Stephen Hemminger
  0 siblings, 1 reply; 48+ messages in thread
From: Honnappa Nagarahalli @ 2024-04-02  2:14 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Aditya Ambadipudi, dev, jackmin, matan, viacheslavo, roretzla,
	konstantin.v.ananyev, konstantin.ananyev, mb, hofors,
	Dhruv Tripathi, Wathsala Wathawana Vithanage, nd



> On Apr 1, 2024, at 9:00 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
> On Tue, 2 Apr 2024 01:35:28 +0000
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:
> 
>>> On Apr 1, 2024, at 7:47 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
>>> 
>>> On Mon, 1 Apr 2024 22:28:52 +0000
>>> Aditya Ambadipudi <Aditya.Ambadipudi@arm.com> wrote:
>>> 
>>>> Thanks, Stephen, for the comment.
>>>> 
>>>> Unfortunately, we don't have the dev setup nor the resources to test out this change using MSVC.
>>>> 
>>>> Thank you,
>>>> Aditya Ambadipudi  
>>> 
>>> All it requires is the community version of MSVC which is free. And setting up a Windows
>>> VM with KVM is free and easy.
>>> 
>>> IMHO all new libraries have to build on all environments, unless they are enabling
>>> platform specific features.  
>> I see that UNH CI is running Windows VMs, the tests are passing there. So, we do not need to anything.
>> 
> 
> That only tests the clang part.
> You need to modify lib/meson.build to get it tested with the windows compiler.
Any idea on when this is getting added to CI?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] deque: add multithread unsafe deque library
  2024-04-02  2:14             ` Honnappa Nagarahalli
@ 2024-04-02  2:53               ` Stephen Hemminger
       [not found]                 ` <PAVPR08MB9185DC373708CBD16A38EFA8EF3E2@PAVPR08MB9185.eurprd08.prod.outlook.com>
                                   ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Stephen Hemminger @ 2024-04-02  2:53 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Aditya Ambadipudi, dev, jackmin, matan, viacheslavo, roretzla,
	konstantin.v.ananyev, konstantin.ananyev, mb, hofors,
	Dhruv Tripathi, Wathsala Wathawana Vithanage, nd

On Tue, 2 Apr 2024 02:14:06 +0000
Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:

> > On Apr 1, 2024, at 9:00 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> > 
> > On Tue, 2 Apr 2024 01:35:28 +0000
> > Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:
> >   
> >>> On Apr 1, 2024, at 7:47 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> >>> 
> >>> On Mon, 1 Apr 2024 22:28:52 +0000
> >>> Aditya Ambadipudi <Aditya.Ambadipudi@arm.com> wrote:
> >>>   
> >>>> Thanks, Stephen, for the comment.
> >>>> 
> >>>> Unfortunately, we don't have the dev setup nor the resources to test out this change using MSVC.
> >>>> 
> >>>> Thank you,
> >>>> Aditya Ambadipudi    
> >>> 
> >>> All it requires is the community version of MSVC which is free. And setting up a Windows
> >>> VM with KVM is free and easy.
> >>> 
> >>> IMHO all new libraries have to build on all environments, unless they are enabling
> >>> platform specific features.    
> >> I see that UNH CI is running Windows VMs, the tests are passing there. So, we do not need to anything.
> >>   
> > 
> > That only tests the clang part.
> > You need to modify lib/meson.build to get it tested with the windows compiler.  
> Any idea on when is this getting added to CI?

You need to add this to next version of the patch.
I tried it and MSVC has no problems with the new code.

Another issue is the naming. Right now the choice of 'deque' generates lots of
checkpatch errors, and every bit of new code that uses the library will get a warning
as well. Can you think of a better name?

diff --git a/lib/meson.build b/lib/meson.build
index 8c8c1e98e2..127e4dc68c 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -75,6 +75,7 @@ if is_ms_compiler
             'kvargs',
             'telemetry',
             'eal',
+            'deque',
             'ring',
     ]
 endif

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] deque: add multithread unsafe deque library
       [not found]                 ` <PAVPR08MB9185DC373708CBD16A38EFA8EF3E2@PAVPR08MB9185.eurprd08.prod.outlook.com>
@ 2024-04-02  4:20                   ` Tyler Retzlaff
  2024-04-02 23:44                     ` Stephen Hemminger
  0 siblings, 1 reply; 48+ messages in thread
From: Tyler Retzlaff @ 2024-04-02  4:20 UTC (permalink / raw)
  To: Aditya Ambadipudi
  Cc: Stephen Hemminger, Honnappa Nagarahalli, dev, jackmin, matan,
	viacheslavo, konstantin.v.ananyev, konstantin.ananyev, mb,
	hofors, Dhruv Tripathi, Wathsala Wathawana Vithanage, nd

On Tue, Apr 02, 2024 at 03:03:13AM +0000, Aditya Ambadipudi wrote:
> Hello Stephen,
> 
> I have a copy of CLRS with me. And Deque is a very standard word in computer science. Even CLRS which is considered one of the most foundational books in computer science uses the word deque.

i'm kind of inclined to agree with this. double ended queue is pretty
well known as ``deque`` perhaps most notably from stl.

https://en.cppreference.com/w/cpp/container/deque

i would however strongly advise that there be no use of ``deque`` bare
(without the rte_ prefix) in any public header. (i.e. inline function
variables , parameter names, etc...). that would almost certainly result
in frustration for C++ consumers of dpdk that may be doing the
following:

#include <deque>
#include <rte_deque.h>

using namespace std;

a quick pass of the patches and i don't see any instances without the
rte_ prefix so only cautioning that we would want to avoid it.

> 
> I don't think there is any better word to describe the data structure we are building other than "deque".
> 
> Is there a way to add an exception for that word in the dictionary words check we run? I genuinely think the readability of this library that we are building will suffer if we don't use the word "deque" here.
> 
> Thank you,
> Aditya Ambadipudi
> 
> 
> ________________________________
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Monday, April 1, 2024 9:53 PM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Cc: Aditya Ambadipudi <Aditya.Ambadipudi@arm.com>; dev@dpdk.org <dev@dpdk.org>; jackmin@nvidia.com <jackmin@nvidia.com>; matan@nvidia.com <matan@nvidia.com>; viacheslavo@nvidia.com <viacheslavo@nvidia.com>; roretzla@linux.microsoft.com <roretzla@linux.microsoft.com>; konstantin.v.ananyev@yandex.ru <konstantin.v.ananyev@yandex.ru>; konstantin.ananyev@huawei.com <konstantin.ananyev@huawei.com>; mb@smartsharesystems.com <mb@smartsharesystems.com>; hofors@lysator.liu.se <hofors@lysator.liu.se>; Dhruv Tripathi <Dhruv.Tripathi@arm.com>; Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH v1 0/2] deque: add multithread unsafe deque library
> 
> On Tue, 2 Apr 2024 02:14:06 +0000
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:
> 
> > > On Apr 1, 2024, at 9:00 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> > >
> > > On Tue, 2 Apr 2024 01:35:28 +0000
> > > Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:
> > >
> > >>> On Apr 1, 2024, at 7:47 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> > >>>
> > >>> On Mon, 1 Apr 2024 22:28:52 +0000
> > >>> Aditya Ambadipudi <Aditya.Ambadipudi@arm.com> wrote:
> > >>>
> > >>>> Thanks, Stephen, for the comment.
> > >>>>
> > >>>> Unfortunately, we don't have the dev setup nor the resources to test out this change using MSVC.
> > >>>>
> > >>>> Thank you,
> > >>>> Aditya Ambadipudi
> > >>>
> > >>> All it requires is the community version of MSVC which is free. And setting up a Windows
> > >>> VM with KVM is free and easy.
> > >>>
> > >>> IMHO all new libraries have to build on all environments, unless they are enabling
> > >>> platform specific features.
> > >> I see that UNH CI is running Windows VMs, the tests are passing there. So, we do not need to anything.
> > >>
> > >
> > > That only tests the clang part.
> > > You need to modify lib/meson.build to get it tested with the windows compiler.
> > Any idea on when is this getting added to CI?
> 
> You need to add this to next version of the patch.
> I tried it and MSVC has no problems with the new code.
> 
> Another issue is the naming. Right now the choice of 'deque' generates lots of
> checkpatch errors, and every bit of new code that uses the library will get a warning
> as well. Can you think of a better name?
> 
> diff --git a/lib/meson.build b/lib/meson.build
> index 8c8c1e98e2..127e4dc68c 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -75,6 +75,7 @@ if is_ms_compiler
>              'kvargs',
>              'telemetry',
>              'eal',
> +            'deque',
>              'ring',
>      ]
>  endif




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] deque: add multithread unsafe deque library
  2024-04-02  2:53               ` Stephen Hemminger
       [not found]                 ` <PAVPR08MB9185DC373708CBD16A38EFA8EF3E2@PAVPR08MB9185.eurprd08.prod.outlook.com>
@ 2024-04-02  4:20                 ` Tyler Retzlaff
  2024-04-03 16:50                 ` Honnappa Nagarahalli
  2 siblings, 0 replies; 48+ messages in thread
From: Tyler Retzlaff @ 2024-04-02  4:20 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Honnappa Nagarahalli, Aditya Ambadipudi, dev, jackmin, matan,
	viacheslavo, konstantin.v.ananyev, konstantin.ananyev, mb,
	hofors, Dhruv Tripathi, Wathsala Wathawana Vithanage, nd

On Mon, Apr 01, 2024 at 07:53:48PM -0700, Stephen Hemminger wrote:
> On Tue, 2 Apr 2024 02:14:06 +0000
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:
> 
> > > On Apr 1, 2024, at 9:00 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> > > 
> > > On Tue, 2 Apr 2024 01:35:28 +0000
> > > Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:
> > >   
> > >>> On Apr 1, 2024, at 7:47 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> > >>> 
> > >>> On Mon, 1 Apr 2024 22:28:52 +0000
> > >>> Aditya Ambadipudi <Aditya.Ambadipudi@arm.com> wrote:
> > >>>   
> > >>>> Thanks, Stephen, for the comment.
> > >>>> 
> > >>>> Unfortunately, we don't have the dev setup nor the resources to test out this change using MSVC.
> > >>>> 
> > >>>> Thank you,
> > >>>> Aditya Ambadipudi    
> > >>> 
> > >>> All it requires is the community version of MSVC which is free. And setting up a Windows
> > >>> VM with KVM is free and easy.
> > >>> 
> > >>> IMHO all new libraries have to build on all environments, unless they are enabling
> > >>> platform specific features.    
> > >> I see that UNH CI is running Windows VMs, the tests are passing there. So, we do not need to anything.
> > >>   
> > > 
> > > That only tests the clang part.
> > > You need to modify lib/meson.build to get it tested with the windows compiler.  
> > Any idea on when is this getting added to CI?
> 
> You need to add this to next version of the patch.
> I tried it and MSVC has no problems with the new code.
> 
> Another issue is the naming. Right now the choice of 'deque' generates lots of
> checkpatch errors, and every bit of new code that uses the library will get a warning
> as well. Can you think of a better name?
> 
> diff --git a/lib/meson.build b/lib/meson.build
> index 8c8c1e98e2..127e4dc68c 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -75,6 +75,7 @@ if is_ms_compiler
>              'kvargs',
>              'telemetry',
>              'eal',
> +            'deque',
>              'ring',
>      ]
>  endif

please do this.

thanks

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] deque: add multithread unsafe deque library
  2024-04-02  0:47       ` Stephen Hemminger
  2024-04-02  1:35         ` Honnappa Nagarahalli
@ 2024-04-02  6:05         ` Mattias Rönnblom
  2024-04-02 15:25           ` Stephen Hemminger
  1 sibling, 1 reply; 48+ messages in thread
From: Mattias Rönnblom @ 2024-04-02  6:05 UTC (permalink / raw)
  To: Stephen Hemminger, Aditya Ambadipudi
  Cc: dev, jackmin, matan, viacheslavo, roretzla, konstantin.v.ananyev,
	konstantin.ananyev, mb, Honnappa Nagarahalli, Dhruv Tripathi,
	Wathsala Wathawana Vithanage, nd

On 2024-04-02 02:47, Stephen Hemminger wrote:
> On Mon, 1 Apr 2024 22:28:52 +0000
> Aditya Ambadipudi <Aditya.Ambadipudi@arm.com> wrote:
> 
>> Thanks, Stephen, for the comment.
>>
>> Unfortunately, we don't have the dev setup nor the resources to test out this change using MSVC.
>>
>> Thank you,
>> Aditya Ambadipudi
> 
> All it requires is the community version of MSVC which is free. And setting up a Windows
> VM with KVM is free and easy.
> 
> IMHO all new libraries have to build on all environments, unless they are enabling
> platform specific features.

Requiring all contributors to build and test on what is, in this 
context, a pretty obscure platform with a pretty obscure compiler seems 
like a bad idea to me.

It will raise the bar for contributions further.

In principle I agree though. Your contribution should not only build, 
but also run (and be tested) on all platforms. Otherwise, Windows isn't 
supported in the upstream, but rather we have a Windows port (which 
happens to live in the same source tree).

I never tested any contribution on a FreeBSD system, but at least those 
use the de-facto standard compilers and a standard API (POSIX), so the 
likelihood of things actually working is greater (but maybe not great 
enough).

Surely, this is something the tech board must have discussed when it 
agreed to support Windows *and* MSVC. Many, if not most, of the 
man-hours involved won't be spent by the Windows maintainer, but by 
individual future contributors.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] deque: add multithread unsafe deque library
  2024-04-02  6:05         ` Mattias Rönnblom
@ 2024-04-02 15:25           ` Stephen Hemminger
  0 siblings, 0 replies; 48+ messages in thread
From: Stephen Hemminger @ 2024-04-02 15:25 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Aditya Ambadipudi, dev, jackmin, matan, viacheslavo, roretzla,
	konstantin.v.ananyev, konstantin.ananyev, mb,
	Honnappa Nagarahalli, Dhruv Tripathi,
	Wathsala Wathawana Vithanage, nd

On Tue, 2 Apr 2024 08:05:41 +0200
Mattias Rönnblom <hofors@lysator.liu.se> wrote:

> On 2024-04-02 02:47, Stephen Hemminger wrote:
> > On Mon, 1 Apr 2024 22:28:52 +0000
> > Aditya Ambadipudi <Aditya.Ambadipudi@arm.com> wrote:
> >   
> >> Thanks, Stephen, for the comment.
> >>
> >> Unfortunately, we don't have the dev setup nor the resources to test out this change using MSVC.
> >>
> >> Thank you,
> >> Aditya Ambadipudi  
> > 
> > All it requires is the community version of MSVC which is free. And setting up a Windows
> > VM with KVM is free and easy.
> > 
> > IMHO all new libraries have to build on all environments, unless they are enabling
> > platform specific features.  
> 
> Requiring all contributors to build and test on what is, in this 
> context, a pretty obscure platform with a pretty obscure compiler seems 
> like a bad idea to me.
> 
> It will raise the bar for contributions further.
> 
> In principle I agree though. Your contribution should not only build, 
> but also run (and be tested) on all platforms. Otherwise, Windows isn't 
> supported in the upstream, but rather we have a Windows port (which 
> happens to live in the same source tree).
> 
> I never tested any contribution on a FreeBSD system, but at least those 
> use the de-facto standard compilers and a standard API (POSIX), so the 
> likelihood of things actually working is greater (but maybe not great 
> enough).
> 
> Surely, this is something the tech board must have discussed when it 
> agreed to supporting Windows *and* MSVC. Many if not most of the 
> man-hours involved won't be spent by the Windows maintainer, but the 
> individual future contributors.

This is what CI systems are for. FreeBSD and Windows are enabled in the
build system; we just need to make sure all new libraries are enabled.

That said, I am all for keeping focus. So open to discussions on
dropping lesser platforms or build environments if they interfere.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] deque: add multithread unsafe deque library
  2024-04-02  4:20                   ` Tyler Retzlaff
@ 2024-04-02 23:44                     ` Stephen Hemminger
  2024-04-03  0:12                       ` Honnappa Nagarahalli
  0 siblings, 1 reply; 48+ messages in thread
From: Stephen Hemminger @ 2024-04-02 23:44 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: Aditya Ambadipudi, Honnappa Nagarahalli, dev, jackmin, matan,
	viacheslavo, konstantin.v.ananyev, konstantin.ananyev, mb,
	hofors, Dhruv Tripathi, Wathsala Wathawana Vithanage, nd

On Mon, 1 Apr 2024 21:20:13 -0700
Tyler Retzlaff <roretzla@linux.microsoft.com> wrote:

> On Tue, Apr 02, 2024 at 03:03:13AM +0000, Aditya Ambadipudi wrote:
> > Hello Stephen,
> > 
> > I have a copy of CLRS with me. And Deque is a very standard word in computer science. Even CLRS which is considered one of the most foundational books in computer science uses the word deque.  
> 
> i'm kind of inclined to agree with this. double ended queue is pretty
> well known as ``deque`` perhaps most notably from stl.
> 
> https://en.cppreference.com/w/cpp/container/deque

The root cause of my complaint is the codespell dictionary used in the patchwork tests.
It is a bit of a twisted path to fix this, though. CI runs checkpatch.sh, which runs the kernel's
checkpatch.pl, which calls the codespell library. Although codespell has an ignore list, there is
no option to feed that through the kernel's checkpatch.pl file.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] deque: add multithread unsafe deque library
  2024-04-02 23:44                     ` Stephen Hemminger
@ 2024-04-03  0:12                       ` Honnappa Nagarahalli
  2024-04-03 23:52                         ` Variable name issues with codespell Stephen Hemminger
  0 siblings, 1 reply; 48+ messages in thread
From: Honnappa Nagarahalli @ 2024-04-03  0:12 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Tyler Retzlaff, Aditya Ambadipudi, dev, jackmin, matan,
	viacheslavo, konstantin.v.ananyev, konstantin.ananyev, mb,
	hofors, Dhruv Tripathi, Wathsala Wathawana Vithanage, nd



> On Apr 2, 2024, at 6:44 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
> On Mon, 1 Apr 2024 21:20:13 -0700
> Tyler Retzlaff <roretzla@linux.microsoft.com> wrote:
> 
>> On Tue, Apr 02, 2024 at 03:03:13AM +0000, Aditya Ambadipudi wrote:
>>> Hello Stephen,
>>> 
>>> I have a copy of CLRS with me. And Deque is a very standard word in computer science. Even CLRS which is considered one of the most foundational books in computer science uses the word deque.  
>> 
>> i'm kind of inclined to agree with this. double ended queue is pretty
>> well known as ``deque`` perhaps most notably from stl.
>> 
>> https://en.cppreference.com/w/cpp/container/deque
> 
> The root cause of my complaint is the codespell dictionary used in the patchwork tests.
> It is a bit of a twisted path to fix this, though. CI runs checkpatch.sh, which runs the kernel's
> checkpatch.pl, which calls the codespell library. Although codespell has an ignore list, there is
> no option to feed that through the kernel's checkpatch.pl file.
So, this is not under DPDK control. Maybe we need to ignore the errors manually.

> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] deque: add multithread unsafe deque library
  2024-04-02  2:53               ` Stephen Hemminger
       [not found]                 ` <PAVPR08MB9185DC373708CBD16A38EFA8EF3E2@PAVPR08MB9185.eurprd08.prod.outlook.com>
  2024-04-02  4:20                 ` [PATCH v1 0/2] deque: add multithread unsafe deque library Tyler Retzlaff
@ 2024-04-03 16:50                 ` Honnappa Nagarahalli
  2024-04-03 17:46                   ` Tyler Retzlaff
  2 siblings, 1 reply; 48+ messages in thread
From: Honnappa Nagarahalli @ 2024-04-03 16:50 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Aditya Ambadipudi, dev, jackmin, matan, viacheslavo, roretzla,
	konstantin.v.ananyev, konstantin.ananyev, mb, hofors,
	Dhruv Tripathi, Wathsala Wathawana Vithanage, nd



> On Apr 1, 2024, at 9:53 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
> On Tue, 2 Apr 2024 02:14:06 +0000
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:
> 
>>> On Apr 1, 2024, at 9:00 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
>>> 
>>> On Tue, 2 Apr 2024 01:35:28 +0000
>>> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:
>>> 
>>>>> On Apr 1, 2024, at 7:47 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
>>>>> 
>>>>> On Mon, 1 Apr 2024 22:28:52 +0000
>>>>> Aditya Ambadipudi <Aditya.Ambadipudi@arm.com> wrote:
>>>>> 
>>>>>> Thanks, Stephen, for the comment.
>>>>>> 
>>>>>> Unfortunately, we don't have the dev setup nor the resources to test out this change using MSVC.
>>>>>> 
>>>>>> Thank you,
>>>>>> Aditya Ambadipudi    
>>>>> 
>>>>> All it requires is the community version of MSVC which is free. And setting up a Windows
>>>>> VM with KVM is free and easy.
>>>>> 
>>>>> IMHO all new libraries have to build on all environments, unless they are enabling
>>>>> platform specific features.    
>>>> I see that UNH CI is running Windows VMs, the tests are passing there. So, we do not need to do anything.
>>>> 
>>> 
>>> That only tests the clang part.
>>> You need to modify lib/meson.build to get it tested with the windows compiler.  
>> Any idea on when is this getting added to CI?
> 
> You need to add this to next version of the patch.
> I tried it and MSVC has no problems with the new code.
> 
> Another issue is the naming. Right now the choice of 'deque' generates lots of
> checkpatch errors, and every bit of new code that uses the library will get a warning
> as well. Can you think of a better name?
> 
> diff --git a/lib/meson.build b/lib/meson.build
> index 8c8c1e98e2..127e4dc68c 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -75,6 +75,7 @@ if is_ms_compiler
>             'kvargs',
>             'telemetry',
>             'eal',
> +            'deque',
>             'ring',
>     ]
> endif
As discussed in Techboard meeting, the above change is not required as Tyler has a patch [1] that addresses this.

[1] https://patches.dpdk.org/project/dpdk/patch/1712076948-25853-2-git-send-email-roretzla@linux.microsoft.com/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] deque: add multithread unsafe deque library
  2024-04-03 16:50                 ` Honnappa Nagarahalli
@ 2024-04-03 17:46                   ` Tyler Retzlaff
  0 siblings, 0 replies; 48+ messages in thread
From: Tyler Retzlaff @ 2024-04-03 17:46 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Stephen Hemminger, Aditya Ambadipudi, dev, jackmin, matan,
	viacheslavo, konstantin.v.ananyev, konstantin.ananyev, mb,
	hofors, Dhruv Tripathi, Wathsala Wathawana Vithanage, nd

On Wed, Apr 03, 2024 at 04:50:02PM +0000, Honnappa Nagarahalli wrote:
> 
> 
> > On Apr 1, 2024, at 9:53 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> > 
> > On Tue, 2 Apr 2024 02:14:06 +0000
> > Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:
> > 
> >>> On Apr 1, 2024, at 9:00 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> >>> 
> >>> On Tue, 2 Apr 2024 01:35:28 +0000
> >>> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:
> >>> 
> >>>>> On Apr 1, 2024, at 7:47 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> >>>>> 
> >>>>> On Mon, 1 Apr 2024 22:28:52 +0000
> >>>>> Aditya Ambadipudi <Aditya.Ambadipudi@arm.com> wrote:
> >>>>> 
> >>>>>> Thanks, Stephen, for the comment.
> >>>>>> 
> >>>>>> Unfortunately, we don't have the dev setup nor the resources to test out this change using MSVC.
> >>>>>> 
> >>>>>> Thank you,
> >>>>>> Aditya Ambadipudi    
> >>>>> 
> >>>>> All it requires is the community version of MSVC which is free. And setting up a Windows
> >>>>> VM with KVM is free and easy.
> >>>>> 
> >>>>> IMHO all new libraries have to build on all environments, unless they are enabling
> >>>>> platform specific features.    
> >>>> I see that UNH CI is running Windows VMs, the tests are passing there. So, we do not need to do anything.
> >>>> 
> >>> 
> >>> That only tests the clang part.
> >>> You need to modify lib/meson.build to get it tested with the windows compiler.  
> >> Any idea on when is this getting added to CI?
> > 
> > You need to add this to next version of the patch.
> > I tried it and MSVC has no problems with the new code.
> > 
> > Another issue is the naming. Right now the choice of 'deque' generates lots of
> > checkpatch errors, and every bit of new code that uses the library will get a warning
> > as well. Can you think of a better name?
> > 
> > diff --git a/lib/meson.build b/lib/meson.build
> > index 8c8c1e98e2..127e4dc68c 100644
> > --- a/lib/meson.build
> > +++ b/lib/meson.build
> > @@ -75,6 +75,7 @@ if is_ms_compiler
> >             'kvargs',
> >             'telemetry',
> >             'eal',
> > +            'deque',
> >             'ring',
> >     ]
> > endif
> As discussed in Techboard meeting, the above change is not required as Tyler has a patch [1] that addresses this.
> 
> [1] https://patches.dpdk.org/project/dpdk/patch/1712076948-25853-2-git-send-email-roretzla@linux.microsoft.com/

agreed, please merge my series first to get the correct outcome.

thanks!

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Variable name issues with codespell.
  2024-04-03  0:12                       ` Honnappa Nagarahalli
@ 2024-04-03 23:52                         ` Stephen Hemminger
  0 siblings, 0 replies; 48+ messages in thread
From: Stephen Hemminger @ 2024-04-03 23:52 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Tyler Retzlaff, Aditya Ambadipudi, dev, jackmin, matan,
	viacheslavo, konstantin.v.ananyev, konstantin.ananyev, mb,
	hofors, Dhruv Tripathi, Wathsala Wathawana Vithanage, nd

On Wed, 3 Apr 2024 00:12:19 +0000
Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:

> > https://en.cppreference.com/w/cpp/container/deque  
> > 
> > The root cause of my complaint is the codespell dictionary used in the patchwork tests.
> > It is a bit of a twisted path to fix this, though. CI runs checkpatch.sh, which runs the kernel's
> > checkpatch.pl, which calls the codespell library. Although codespell has an ignore list, there is
> > no option to feed that through the kernel's checkpatch.pl file.  

> So, this is not under DPDK control. Maybe we need to ignore the errors manually.

One solution would be for DPDK to provide its own codespell dictionary, so that
all users and CI would see the same word list: clone the upstream dictionary and
trim away known issues, and maybe even have a devtools script that fetches the
current dictionary from GitHub and fixes it up.

Then we could get rid of the false positives for 'deque' and 'stdio'.
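The dictionary approach described above could be sketched roughly as follows.
This is a hypothetical illustration, not an existing DPDK script: the file
names and sample entries are invented here, and a real script would fetch the
actual upstream codespell dictionary rather than create sample entries locally.

```shell
# Hypothetical sketch: derive a DPDK-local codespell dictionary with
# known false positives removed. The sample entries below stand in for
# the real upstream dictionary, which a devtools script would download.
printf 'deque->dequeue\nstdio->studio\nrecieve->receive\n' > upstream-dict.txt

# Trim the entries known to be false positives for DPDK code.
sed -e '/^deque->/d' -e '/^stdio->/d' upstream-dict.txt > dpdk-dictionary.txt
```

The same sed-based filtering style is already used by the devtools/build-dict.sh
script touched in the patch below.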

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: [PATCH v1 1/2] deque: add multi-thread unsafe double ended queue
  2024-04-01  1:37   ` [PATCH v1 1/2] deque: add multi-thread unsafe double ended queue Aditya Ambadipudi
@ 2024-04-06  9:35     ` Morten Brørup
  2024-04-24 13:42     ` [PATCH v2 0/2] deque: add multithread unsafe deque library Aditya Ambadipudi
  1 sibling, 0 replies; 48+ messages in thread
From: Morten Brørup @ 2024-04-06  9:35 UTC (permalink / raw)
  To: Aditya Ambadipudi, dev, jackmin, stephen, matan, viacheslavo,
	roretzla, konstantin.v.ananyev, konstantin.ananyev, hofors
  Cc: Honnappa.Nagarahalli, Dhruv.Tripathi, wathsala.vithanage,
	ganeshaditya1, nd, Honnappa Nagarahalli

> From: Aditya Ambadipudi [mailto:aditya.ambadipudi@arm.com]
> Sent: Monday, 1 April 2024 03.37
> 
> From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> 
> Add a multi-thread unsafe double ended queue data structure. This
> library provides a simple and efficient alternative to multi-thread
> safe ring when multi-thread safety is not required.
> 
> Signed-off-by: Aditya Ambadipudi <aditya.ambadipudi@arm.com>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---

This is a good contribution, thank you.

Two high-level comments:

1.
Please use head/tail explicitly in function names, not implicitly.
E.g. rename rte_deque_enqueue_bulk_elem() to rte_deque_enqueue_at_head_bulk_elem(), like rte_deque_enqueue_at_tail_bulk_elem().
Also consider removing "at" from the function names, e.g. rte_deque_(head|tail)_enqueue_bulk_elem() instead of rte_deque_enqueue_at_(head|tail)_bulk_elem().

2.
In the future, someone might come up with a lock-free implementation, like for the stack.
Please ensure that the API and documentation are prepared for that, so we don't have to use a different namespace for such a lock-free implementation.
I haven't reviewed the patch thoroughly; if it is already prepared for this, you can ignore this comment.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v2 0/2] deque: add multithread unsafe deque library
  2024-04-01  1:37   ` [PATCH v1 1/2] deque: add multi-thread unsafe double ended queue Aditya Ambadipudi
  2024-04-06  9:35     ` Morten Brørup
@ 2024-04-24 13:42     ` Aditya Ambadipudi
  2024-04-24 13:42       ` [PATCH v2 1/2] deque: add multi-thread unsafe double ended queue Aditya Ambadipudi
  2024-04-24 13:42       ` [PATCH v2 2/2] deque: add unit tests for the " Aditya Ambadipudi
  1 sibling, 2 replies; 48+ messages in thread
From: Aditya Ambadipudi @ 2024-04-24 13:42 UTC (permalink / raw)
  To: dev, jackmin, stephen, matan, viacheslavo, roretzla,
	konstantin.ananyev, mb, hofors
  Cc: wathsala.vithanage, dhruv.tripathi, honnappa.nagarahalli, nd,
	Aditya Ambadipudi

As previously discussed in the mailing list [1], we are sending out this
patch that provides the implementation and unit test cases for the
RTE_DEQUE library. This includes functions for creating an RTE_DEQUE
object and allocating memory for it, deleting that object and freeing the
memory associated with it, enqueue/dequeue functions, and functions for the
zero-copy API.

Aditya Ambadipudi (1):
  deque: add unit tests for the deque library

Honnappa Nagarahalli (1):
  deque: add multi-thread unsafe double ended queue

 .mailmap                               |    1 +
 app/test/meson.build                   |    2 +
 app/test/test_deque_enqueue_dequeue.c  | 1228 ++++++++++++++++++++++++
 app/test/test_deque_helper_functions.c |  169 ++++
 devtools/build-dict.sh                 |    1 +
 lib/deque/meson.build                  |   11 +
 lib/deque/rte_deque.c                  |  193 ++++
 lib/deque/rte_deque.h                  |  533 ++++++++++
 lib/deque/rte_deque_core.h             |   81 ++
 lib/deque/rte_deque_pvt.h              |  538 +++++++++++
 lib/deque/rte_deque_zc.h               |  430 +++++++++
 lib/deque/version.map                  |   14 +
 lib/meson.build                        |    2 +
 13 files changed, 3203 insertions(+)
 create mode 100644 app/test/test_deque_enqueue_dequeue.c
 create mode 100644 app/test/test_deque_helper_functions.c
 create mode 100644 lib/deque/meson.build
 create mode 100644 lib/deque/rte_deque.c
 create mode 100644 lib/deque/rte_deque.h
 create mode 100644 lib/deque/rte_deque_core.h
 create mode 100644 lib/deque/rte_deque_pvt.h
 create mode 100644 lib/deque/rte_deque_zc.h
 create mode 100644 lib/deque/version.map

-- 
2.25.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v2 1/2] deque: add multi-thread unsafe double ended queue
  2024-04-24 13:42     ` [PATCH v2 0/2] deque: add multithread unsafe deque library Aditya Ambadipudi
@ 2024-04-24 13:42       ` Aditya Ambadipudi
  2024-04-24 15:16         ` Morten Brørup
                           ` (2 more replies)
  2024-04-24 13:42       ` [PATCH v2 2/2] deque: add unit tests for the " Aditya Ambadipudi
  1 sibling, 3 replies; 48+ messages in thread
From: Aditya Ambadipudi @ 2024-04-24 13:42 UTC (permalink / raw)
  To: dev, jackmin, stephen, matan, viacheslavo, roretzla,
	konstantin.ananyev, mb, hofors
  Cc: wathsala.vithanage, dhruv.tripathi, honnappa.nagarahalli, nd,
	Aditya Ambadipudi

From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

Add a multi-thread unsafe double ended queue data structure. This
library provides a simple and efficient alternative to multi-thread
safe ring when multi-thread safety is not required.

Signed-off-by: Aditya Ambadipudi <aditya.ambadipudi@arm.com>
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Change-Id: I6f66fa2ebf750adb22ac75f8cb3c2fe8bdb5fa9e
---
v2:
  * Addressed the spell check warning issue with the word "Deque"
  * Tried to rename all objects that are named deque to avoid collision with
    std::deque
  * Added the deque library to msvc section in meson.build
  * Renamed API functions to explicitly state whether the function inserts at
    head or tail.

 .mailmap                   |   1 +
 devtools/build-dict.sh     |   1 +
 lib/deque/meson.build      |  11 +
 lib/deque/rte_deque.c      | 193 +++++++++++++
 lib/deque/rte_deque.h      | 533 ++++++++++++++++++++++++++++++++++++
 lib/deque/rte_deque_core.h |  81 ++++++
 lib/deque/rte_deque_pvt.h  | 538 +++++++++++++++++++++++++++++++++++++
 lib/deque/rte_deque_zc.h   | 430 +++++++++++++++++++++++++++++
 lib/deque/version.map      |  14 +
 lib/meson.build            |   2 +
 10 files changed, 1804 insertions(+)
 create mode 100644 lib/deque/meson.build
 create mode 100644 lib/deque/rte_deque.c
 create mode 100644 lib/deque/rte_deque.h
 create mode 100644 lib/deque/rte_deque_core.h
 create mode 100644 lib/deque/rte_deque_pvt.h
 create mode 100644 lib/deque/rte_deque_zc.h
 create mode 100644 lib/deque/version.map

diff --git a/.mailmap b/.mailmap
index 3843868716..8e705ab6ab 100644
--- a/.mailmap
+++ b/.mailmap
@@ -17,6 +17,7 @@ Adam Bynes <adambynes@outlook.com>
 Adam Dybkowski <adamx.dybkowski@intel.com>
 Adam Ludkiewicz <adam.ludkiewicz@intel.com>
 Adham Masarwah <adham@nvidia.com> <adham@mellanox.com>
+Aditya Ambadipudi <aditya.ambadipudi@arm.com>
 Adrian Moreno <amorenoz@redhat.com>
 Adrian Podlawski <adrian.podlawski@intel.com>
 Adrien Mazarguil <adrien.mazarguil@6wind.com>
diff --git a/devtools/build-dict.sh b/devtools/build-dict.sh
index a8cac49029..595d8f9277 100755
--- a/devtools/build-dict.sh
+++ b/devtools/build-dict.sh
@@ -17,6 +17,7 @@ sed '/^..->/d' |
 sed '/^uint->/d' |
 sed "/^doesn'->/d" |
 sed '/^wasn->/d' |
+sed '/^deque.*->/d' |
 
 # print to stdout
 cat
diff --git a/lib/deque/meson.build b/lib/deque/meson.build
new file mode 100644
index 0000000000..1ff45fc39f
--- /dev/null
+++ b/lib/deque/meson.build
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 Arm Limited
+
+sources = files('rte_deque.c')
+headers = files('rte_deque.h')
+# most sub-headers are not for direct inclusion
+indirect_headers += files (
+        'rte_deque_core.h',
+        'rte_deque_pvt.h',
+        'rte_deque_zc.h'
+)
diff --git a/lib/deque/rte_deque.c b/lib/deque/rte_deque.c
new file mode 100644
index 0000000000..b83a6c43c4
--- /dev/null
+++ b/lib/deque/rte_deque.c
@@ -0,0 +1,193 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#include <stdalign.h>
+#include <string.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <sys/queue.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_eal_memconfig.h>
+#include <rte_errno.h>
+#include <rte_string_fns.h>
+
+#include "rte_deque.h"
+
+/* mask of all valid flag values to deque_create() */
+#define __RTE_DEQUE_F_MASK (RTE_DEQUE_F_EXACT_SZ)
+ssize_t
+rte_deque_get_memsize_elem(unsigned int esize, unsigned int count)
+{
+	ssize_t sz;
+
+	/* Check if element size is a multiple of 4B */
+	if (esize % 4 != 0) {
+		rte_log(RTE_LOG_ERR, rte_deque_log_type,
+			"%s(): element size is not a multiple of 4\n",
+			__func__);
+
+		return -EINVAL;
+	}
+
+	/* count must be a power of 2 */
+	if ((!RTE_IS_POWER_OF_2(count)) || (count > RTE_DEQUE_SZ_MASK)) {
+		rte_log(RTE_LOG_ERR, rte_deque_log_type,
+			"%s(): Requested number of elements is invalid,"
+			"must be power of 2, and not exceed %u\n",
+			__func__, RTE_DEQUE_SZ_MASK);
+
+		return -EINVAL;
+	}
+
+	sz = sizeof(struct rte_deque) + (ssize_t)count * esize;
+	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
+	return sz;
+}
+
+void
+rte_deque_reset(struct rte_deque *d)
+{
+	d->head = 0;
+	d->tail = 0;
+}
+
+int
+rte_deque_init(struct rte_deque *d, const char *name, unsigned int count,
+	unsigned int flags)
+{
+	int ret;
+
+	/* compilation-time checks */
+	RTE_BUILD_BUG_ON((sizeof(struct rte_deque) &
+			  RTE_CACHE_LINE_MASK) != 0);
+
+	/* future proof flags, only allow supported values */
+	if (flags & ~__RTE_DEQUE_F_MASK) {
+		rte_log(RTE_LOG_ERR, rte_deque_log_type,
+			"%s(): Unsupported flags requested %#x\n",
+			__func__, flags);
+		return -EINVAL;
+	}
+
+	/* init the deque structure */
+	memset(d, 0, sizeof(*d));
+	ret = strlcpy(d->name, name, sizeof(d->name));
+	if (ret < 0 || ret >= (int)sizeof(d->name))
+		return -ENAMETOOLONG;
+	d->flags = flags;
+
+	if (flags & RTE_DEQUE_F_EXACT_SZ) {
+		d->size = rte_align32pow2(count + 1);
+		d->mask = d->size - 1;
+		d->capacity = count;
+	} else {
+		if ((!RTE_IS_POWER_OF_2(count)) || (count > RTE_DEQUE_SZ_MASK)) {
+			rte_log(RTE_LOG_ERR, rte_deque_log_type,
+				"%s(): Requested size is invalid, must be power"
+				" of 2, and not exceed the size limit %u\n",
+				__func__, RTE_DEQUE_SZ_MASK);
+			return -EINVAL;
+		}
+		d->size = count;
+		d->mask = count - 1;
+		d->capacity = d->mask;
+	}
+
+	return 0;
+}
+
+/* create the deque for a given element size */
+struct rte_deque *
+rte_deque_create(const char *name, unsigned int esize, unsigned int count,
+		int socket_id, unsigned int flags)
+{
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	struct rte_deque *d;
+	const struct rte_memzone *mz;
+	ssize_t deque_size;
+	int mz_flags = 0;
+	const unsigned int requested_count = count;
+	int ret;
+
+	/* for an exact size deque, round up from count to a power of two */
+	if (flags & RTE_DEQUE_F_EXACT_SZ)
+		count = rte_align32pow2(count + 1);
+
+	deque_size = rte_deque_get_memsize_elem(esize, count);
+	if (deque_size < 0) {
+		rte_errno = -deque_size;
+		return NULL;
+	}
+
+	ret = snprintf(mz_name, sizeof(mz_name), "%s%s",
+		RTE_DEQUE_MZ_PREFIX, name);
+	if (ret < 0 || ret >= (int)sizeof(mz_name)) {
+		rte_errno = ENAMETOOLONG;
+		return NULL;
+	}
+
+	/* reserve a memory zone for this deque. If we can't get rte_config or
+	 * we are secondary process, the memzone_reserve function will set
+	 * rte_errno for us appropriately - hence no check in this function
+	 */
+	mz = rte_memzone_reserve_aligned(mz_name, deque_size, socket_id,
+					 mz_flags, alignof(struct rte_deque));
+	if (mz != NULL) {
+		d = mz->addr;
+		/* no need to check return value here, we already checked the
+		 * arguments above
+		 */
+		rte_deque_init(d, name, requested_count, flags);
+		d->memzone = mz;
+	} else {
+		d = NULL;
+		rte_log(RTE_LOG_ERR, rte_deque_log_type,
+			"%s(): Cannot reserve memory\n", __func__);
+	}
+	return d;
+}
+
+/* free the deque */
+void
+rte_deque_free(struct rte_deque *d)
+{
+	if (d == NULL)
+		return;
+
+	/*
+	 * Deque was not created with rte_deque_create,
+	 * therefore, there is no memzone to free.
+	 */
+	if (d->memzone == NULL) {
+		rte_log(RTE_LOG_ERR, rte_deque_log_type,
+			"%s(): Cannot free deque, not created "
+			"with rte_deque_create()\n", __func__);
+		return;
+	}
+
+	if (rte_memzone_free(d->memzone) != 0)
+		rte_log(RTE_LOG_ERR, rte_deque_log_type,
+			"%s(): Cannot free memory\n", __func__);
+}
+
+/* dump the status of the deque on the console */
+void
+rte_deque_dump(FILE *f, const struct rte_deque *d)
+{
+	fprintf(f, "deque <%s>@%p\n", d->name, d);
+	fprintf(f, "  flags=%x\n", d->flags);
+	fprintf(f, "  size=%"PRIu32"\n", d->size);
+	fprintf(f, "  capacity=%"PRIu32"\n", d->capacity);
+	fprintf(f, "  head=%"PRIu32"\n", d->head);
+	fprintf(f, "  tail=%"PRIu32"\n", d->tail);
+	fprintf(f, "  used=%u\n", rte_deque_count(d));
+	fprintf(f, "  avail=%u\n", rte_deque_free_count(d));
+}
+
+RTE_LOG_REGISTER_DEFAULT(rte_deque_log_type, ERR);
diff --git a/lib/deque/rte_deque.h b/lib/deque/rte_deque.h
new file mode 100644
index 0000000000..6633eab377
--- /dev/null
+++ b/lib/deque/rte_deque.h
@@ -0,0 +1,533 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#ifndef _RTE_DEQUE_H_
+#define _RTE_DEQUE_H_
+
+/**
+ * @file
+ * RTE double ended queue (Deque)
+ *
+ * This fixed-size queue does not provide concurrent access by
+ * multiple threads. If required, the application should use locks
+ * to protect the deque from concurrent access.
+ *
+ * - Double ended queue
+ * - Maximum size is fixed
+ * - Store objects of any size
+ * - Single/bulk/burst dequeue at tail or head
+ * - Single/bulk/burst enqueue at head or tail
+ *
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_deque_core.h>
+#include <rte_deque_pvt.h>
+#include <rte_deque_zc.h>
+
+/**
+ * Calculate the memory size needed for a deque
+ *
+ * This function returns the number of bytes needed for a deque, given
+ * the number of objects and the object size. This value is the sum of
+ * the size of the structure rte_deque and the size of the memory needed
+ * by the objects. The value is aligned to a cache line size.
+ *
+ * @param esize
+ *   The size of a deque object, in bytes. It must be a multiple of 4.
+ * @param count
+ *   The number of objects in the deque (must be a power of 2).
+ * @return
+ *   - The memory size needed for the deque on success.
+ *   - -EINVAL if count is not a power of 2.
+ */
+__rte_experimental
+ssize_t rte_deque_get_memsize_elem(unsigned int esize, unsigned int count);
+
+/**
+ * Initialize a deque structure.
+ *
+ * Initialize a deque structure in memory pointed by "d". The size of the
+ * memory area must be large enough to store the deque structure and the
+ * object table. It is advised to use rte_deque_get_memsize_elem() to get the
+ * appropriate size.
+ *
+ * The deque size is set to *count*, which must be a power of two.
+ * The real usable deque size is *count-1* instead of *count* to
+ * differentiate a full deque from an empty deque.
+ *
+ * @param d
+ *   The pointer to the deque structure followed by the objects table.
+ * @param name
+ *   The name of the deque.
+ * @param count
+ *   The number of objects in the deque (must be a power of 2,
+ *   unless RTE_DEQUE_F_EXACT_SZ is set in flags).
+ * @param flags
+ *   - RTE_DEQUE_F_EXACT_SZ: If this flag is set, the deque will hold
+ *     exactly the requested number of objects, and the requested size
+ *     will be rounded up to the next power of two, but the usable space
+ *     will be exactly that requested. Worst case, if a power-of-2 size is
+ *     requested, half the deque space will be wasted.
+ *     Without this flag set, the deque size requested must be a power of 2,
+ *     and the usable space will be that size - 1.
+ * @return
+ *   0 on success, or a negative value on error.
+ */
+__rte_experimental
+int rte_deque_init(struct rte_deque *d, const char *name, unsigned int count,
+		unsigned int flags);
+
+/**
+ * Create a new deque named *name* in memory.
+ *
+ * This function uses ``memzone_reserve()`` to allocate memory. Then it
+ * calls rte_deque_init() to initialize an empty deque.
+ *
+ * The new deque size is set to *count*, which must be a power of two.
+ * The real usable deque size is *count-1* instead of *count* to
+ * differentiate a full deque from an empty deque.
+ *
+ * @param name
+ *   The name of the deque.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ * @param count
+ *   The size of the deque (must be a power of 2,
+ *   unless RTE_DEQUE_F_EXACT_SZ is set in flags).
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in case of
+ *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
+ *   constraint for the reserved zone.
+ * @param flags
+ *   - RTE_DEQUE_F_EXACT_SZ: If this flag is set, the deque will hold exactly the
+ *     requested number of entries, and the requested size will be rounded up
+ *     to the next power of two, but the usable space will be exactly that
+ *     requested. Worst case, if a power-of-2 size is requested, half the
+ *     deque space will be wasted.
+ *     Without this flag set, the deque size requested must be a power of 2,
+ *     and the usable space will be that size - 1.
+ * @return
+ *   On success, the pointer to the newly allocated deque. NULL on error with
+ *   rte_errno set appropriately. Possible errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - EINVAL - count provided is not a power of 2
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+__rte_experimental
+struct rte_deque *rte_deque_create(const char *name, unsigned int esize,
+				unsigned int count, int socket_id,
+				unsigned int flags);
+
+/**
+ * De-allocate all memory used by the deque.
+ *
+ * @param d
+ *   Deque to free.
+ *   If d is NULL, the function does nothing.
+ */
+__rte_experimental
+void rte_deque_free(struct rte_deque *d);
+
+/**
+ * Dump the status of the deque to a file.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @param d
+ *   A pointer to the deque structure.
+ */
+__rte_experimental
+void rte_deque_dump(FILE *f, const struct rte_deque *d);
+
+/**
+ * Return the number of entries in a deque.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @return
+ *   The number of entries in the deque.
+ */
+static inline unsigned int
+rte_deque_count(const struct rte_deque *d)
+{
+	return (d->head - d->tail) & d->mask;
+}
+
+/**
+ * Return the number of free entries in a deque.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @return
+ *   The number of free entries in the deque.
+ */
+static inline unsigned int
+rte_deque_free_count(const struct rte_deque *d)
+{
+	return d->capacity - rte_deque_count(d);
+}
+
+/**
+ * Enqueue a fixed number of objects on a deque at the head.
+ *
+ * This function copies the objects at the head of the deque and
+ * moves the head index.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the deque from the obj_table.
+ * @param free_space
+ *   Returns the amount of space in the deque after the enqueue operation
+ *   has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_head_enqueue_bulk_elem(struct rte_deque *d,
+			const void *obj_table,
+			unsigned int esize,
+			unsigned int n,
+			unsigned int *free_space)
+{
+	*free_space = rte_deque_free_count(d);
+	if (unlikely(n > *free_space))
+		return 0;
+	*free_space -= n;
+	return __rte_deque_enqueue_at_head(d, obj_table, esize, n);
+}
+
+/**
+ * Enqueue up to a maximum number of objects on a deque at the head.
+ *
+ * This function copies the objects at the head of the deque and
+ * moves the head index.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the deque from the obj_table.
+ * @param free_space
+ *   Returns the amount of space in the deque after the enqueue operation
+ *   has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_head_enqueue_burst_elem(struct rte_deque *d, const void *obj_table,
+			unsigned int esize, unsigned int n,
+			unsigned int *free_space)
+{
+	unsigned int avail_space = rte_deque_free_count(d);
+	unsigned int to_be_enqueued = (n <= avail_space ? n : avail_space);
+	*free_space = avail_space - to_be_enqueued;
+	return __rte_deque_enqueue_at_head(d, obj_table, esize, to_be_enqueued);
+}
+
+/**
+ * Enqueue a fixed number of objects on a deque at the tail.
+ *
+ * This function copies the objects at the tail of the deque and
+ * moves the tail index (backwards).
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the deque from the obj_table.
+ * @param free_space
+ *   Returns the amount of space in the deque after the enqueue operation
+ *   has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_tail_enqueue_bulk_elem(struct rte_deque *d,
+				 const void *obj_table, unsigned int esize,
+				 unsigned int n, unsigned int *free_space)
+{
+	*free_space = rte_deque_free_count(d);
+	if (unlikely(n > *free_space))
+		return 0;
+	*free_space -= n;
+	return __rte_deque_enqueue_at_tail(d, obj_table, esize, n);
+}
+
+/**
+ * Enqueue up to a maximum number of objects on a deque at the tail.
+ *
+ * This function copies the objects at the tail of the deque and
+ * moves the tail index (backwards).
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the deque from the obj_table.
+ * @param free_space
+ *   Returns the amount of space in the deque after the enqueue operation
+ *   has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_tail_enqueue_burst_elem(struct rte_deque *d,
+				const void *obj_table, unsigned int esize,
+				unsigned int n, unsigned int *free_space)
+{
+	unsigned int avail_space = rte_deque_free_count(d);
+	unsigned int to_be_enqueued = (n <= avail_space ? n : avail_space);
+	*free_space = avail_space - to_be_enqueued;
+	return __rte_deque_enqueue_at_tail(d, obj_table, esize, to_be_enqueued);
+}
+
+/**
+ * Dequeue a fixed number of objects from a deque at the tail.
+ *
+ * This function copies the objects from the tail of the deque and
+ * moves the tail index.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the deque to the obj_table.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue
+ *   has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_tail_dequeue_bulk_elem(struct rte_deque *d, void *obj_table,
+			unsigned int esize, unsigned int n,
+			unsigned int *available)
+{
+	*available = rte_deque_count(d);
+	if (unlikely(n > *available))
+		return 0;
+	*available -= n;
+	return __rte_deque_dequeue_at_tail(d, obj_table, esize, n);
+}
+
+/**
+ * Dequeue up to a maximum number of objects from a deque at the tail.
+ *
+ * This function copies the objects from the tail of the deque and
+ * moves the tail index.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the deque to the obj_table.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue
+ *   has finished.
+ * @return
+ *   - Number of objects dequeued
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_tail_dequeue_burst_elem(struct rte_deque *d, void *obj_table,
+			unsigned int esize, unsigned int n,
+			unsigned int *available)
+{
+	unsigned int count = rte_deque_count(d);
+	unsigned int to_be_dequeued = (n <= count ? n : count);
+	*available = count - to_be_dequeued;
+	return __rte_deque_dequeue_at_tail(d, obj_table, esize, to_be_dequeued);
+}
+
+/**
+ * Dequeue a fixed number of objects from a deque at the head.
+ *
+ * This function copies the objects from the head of the deque and
+ * moves the head index (backwards).
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the deque to the obj_table.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue
+ *   has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_head_dequeue_bulk_elem(struct rte_deque *d, void *obj_table,
+			unsigned int esize, unsigned int n,
+			unsigned int *available)
+{
+	*available = rte_deque_count(d);
+	if (unlikely(n > *available))
+		return 0;
+	*available -= n;
+	return __rte_deque_dequeue_at_head(d, obj_table, esize, n);
+}
+
+/**
+ * Dequeue up to a maximum number of objects from a deque at the head.
+ *
+ * This function copies the objects from the head of the deque and
+ * moves the head index (backwards).
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the deque to the obj_table.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue
+ *   has finished.
+ * @return
+ *   - Number of objects dequeued
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_head_dequeue_burst_elem(struct rte_deque *d, void *obj_table,
+			unsigned int esize, unsigned int n,
+			unsigned int *available)
+{
+	unsigned int count = rte_deque_count(d);
+	unsigned int to_be_dequeued = (n <= count ? n : count);
+	*available = count - to_be_dequeued;
+	return __rte_deque_dequeue_at_head(d, obj_table, esize, to_be_dequeued);
+}
+
+/**
+ * Flush a deque.
+ *
+ * This function flushes all the objects in the deque.
+ *
+ * @warning
+ * Make sure the deque is not in use while calling this function.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ */
+__rte_experimental
+void rte_deque_reset(struct rte_deque *d);
+
+/**
+ * Test if a deque is full.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @return
+ *   - 1: The deque is full.
+ *   - 0: The deque is not full.
+ */
+static inline int
+rte_deque_full(const struct rte_deque *d)
+{
+	return rte_deque_free_count(d) == 0;
+}
+
+/**
+ * Test if a deque is empty.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @return
+ *   - 1: The deque is empty.
+ *   - 0: The deque is not empty.
+ */
+static inline int
+rte_deque_empty(const struct rte_deque *d)
+{
+	return d->tail == d->head;
+}
+
+/**
+ * Return the size of the deque.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @return
+ *   The size of the data store used by the deque.
+ *   NOTE: this is not the same as the usable space in the deque. To query that
+ *   use ``rte_deque_get_capacity()``.
+ */
+static inline unsigned int
+rte_deque_get_size(const struct rte_deque *d)
+{
+	return d->size;
+}
+
+/**
+ * Return the number of objects which can be stored in the deque.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @return
+ *   The usable size of the deque.
+ */
+static inline unsigned int
+rte_deque_get_capacity(const struct rte_deque *d)
+{
+	return d->capacity;
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_DEQUE_H_ */
diff --git a/lib/deque/rte_deque_core.h b/lib/deque/rte_deque_core.h
new file mode 100644
index 0000000000..0bb8695c8a
--- /dev/null
+++ b/lib/deque/rte_deque_core.h
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#ifndef _RTE_DEQUE_CORE_H_
+#define _RTE_DEQUE_CORE_H_
+
+/**
+ * @file
+ * This file contains definition of RTE deque structure, init flags and
+ * some related macros. This file should not be included directly,
+ * include rte_deque.h instead.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+#include <string.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+#include <rte_debug.h>
+
+extern int rte_deque_log_type;
+
+#define RTE_DEQUE_MZ_PREFIX "DEQUE_"
+/** The maximum length of a deque name. */
+#define RTE_DEQUE_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
+			   sizeof(RTE_DEQUE_MZ_PREFIX) + 1)
+
+/**
+ * Double ended queue (deque) structure.
+ *
+ * The deque has a head and a tail index. Both indices are unsigned
+ * 32-bit values and are masked with *size - 1* when the objects in the
+ * deque are accessed. Because the indices are unsigned, the subtraction
+ * head - tail always wraps modulo 2^32, and masking the result yields
+ * the number of used entries, between 0 and capacity.
+ */
+struct rte_deque {
+	alignas(RTE_CACHE_LINE_SIZE) char name[RTE_DEQUE_NAMESIZE];
+	/**< Name of the deque */
+	int flags;
+	/**< Flags supplied at creation. */
+	const struct rte_memzone *memzone;
+	/**< Memzone, if any, containing the rte_deque */
+
+	alignas(RTE_CACHE_LINE_SIZE) char pad0; /**< empty cache line */
+
+	uint32_t size;           /**< Size of deque. */
+	uint32_t mask;           /**< Mask (size-1) of deque. */
+	uint32_t capacity;       /**< Usable size of deque */
+	/** Ring head and tail pointers. */
+	volatile uint32_t head;
+	volatile uint32_t tail;
+};
+
+/**
+ * Deque is to hold exactly requested number of entries.
+ * Without this flag set, the deque size requested must be a power of 2, and the
+ * usable space will be that size - 1. With the flag, the requested size will
+ * be rounded up to the next power of two, but the usable space will be exactly
+ * that requested. Worst case, if a power-of-2 size is requested, half the
+ * deque space will be wasted.
+ */
+#define RTE_DEQUE_F_EXACT_SZ 0x0004
+#define RTE_DEQUE_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_DEQUE_CORE_H_ */
diff --git a/lib/deque/rte_deque_pvt.h b/lib/deque/rte_deque_pvt.h
new file mode 100644
index 0000000000..931bbd4d19
--- /dev/null
+++ b/lib/deque/rte_deque_pvt.h
@@ -0,0 +1,538 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#ifndef _RTE_DEQUE_PVT_H_
+#define _RTE_DEQUE_PVT_H_
+
+#define __RTE_DEQUE_COUNT(d) ((d->head - d->tail) & d->mask)
+#define __RTE_DEQUE_FREE_SPACE(d) (d->capacity - __RTE_DEQUE_COUNT(d))
+
+static __rte_always_inline void
+__rte_deque_enqueue_elems_head_32(struct rte_deque *d,
+				const unsigned int size,
+				uint32_t idx,
+				const void *obj_table,
+				unsigned int n)
+{
+	unsigned int i;
+	uint32_t *deque = (uint32_t *)&d[1];
+	const uint32_t *obj = (const uint32_t *)obj_table;
+	if (likely(idx + n <= size)) {
+		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
+			deque[idx] = obj[i];
+			deque[idx + 1] = obj[i + 1];
+			deque[idx + 2] = obj[i + 2];
+			deque[idx + 3] = obj[i + 3];
+			deque[idx + 4] = obj[i + 4];
+			deque[idx + 5] = obj[i + 5];
+			deque[idx + 6] = obj[i + 6];
+			deque[idx + 7] = obj[i + 7];
+		}
+		switch (n & 0x7) {
+		case 7:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 6:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 5:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 4:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 3:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 2:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 1:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			deque[idx] = obj[i];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			deque[idx] = obj[i];
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_enqueue_elems_head_64(struct rte_deque *d,
+				const void *obj_table,
+				unsigned int n)
+{
+	unsigned int i;
+	const uint32_t size = d->size;
+	uint32_t idx = (d->head & d->mask);
+	uint64_t *deque = (uint64_t *)&d[1];
+	const unaligned_uint64_t *obj = (const unaligned_uint64_t *)obj_table;
+	if (likely(idx + n <= size)) {
+		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
+			deque[idx] = obj[i];
+			deque[idx + 1] = obj[i + 1];
+			deque[idx + 2] = obj[i + 2];
+			deque[idx + 3] = obj[i + 3];
+		}
+		switch (n & 0x3) {
+		case 3:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 2:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 1:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			deque[idx] = obj[i];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			deque[idx] = obj[i];
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_enqueue_elems_head_128(struct rte_deque *d,
+				const void *obj_table,
+				unsigned int n)
+{
+	unsigned int i;
+	const uint32_t size = d->size;
+	uint32_t idx = (d->head & d->mask);
+	rte_int128_t *deque = (rte_int128_t *)&d[1];
+	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
+	if (likely(idx + n <= size)) {
+		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
+			memcpy((void *)(deque + idx),
+				(const void *)(obj + i), 32);
+		switch (n & 0x1) {
+		case 1:
+			memcpy((void *)(deque + idx),
+				(const void *)(obj + i), 16);
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			memcpy((void *)(deque + idx),
+				(const void *)(obj + i), 16);
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			memcpy((void *)(deque + idx),
+				(const void *)(obj + i), 16);
+	}
+}
+
+static __rte_always_inline unsigned int
+__rte_deque_enqueue_at_head(struct rte_deque *d,
+			const void *obj_table,
+			unsigned int esize,
+			unsigned int n)
+{
+	/* 8B and 16B copies implemented individually because on some platforms
+	 * there are 64 bit and 128 bit registers available for direct copying.
+	 */
+	if (esize == 8)
+		__rte_deque_enqueue_elems_head_64(d, obj_table, n);
+	else if (esize == 16)
+		__rte_deque_enqueue_elems_head_128(d, obj_table, n);
+	else {
+		uint32_t idx, scale, nd_idx, nd_num, nd_size;
+
+		/* Normalize to uint32_t */
+		scale = esize / sizeof(uint32_t);
+		nd_num = n * scale;
+		idx = d->head & d->mask;
+		nd_idx = idx * scale;
+		nd_size = d->size * scale;
+		__rte_deque_enqueue_elems_head_32(d, nd_size, nd_idx,
+						obj_table, nd_num);
+	}
+	d->head = (d->head + n) & d->mask;
+	return n;
+}
+
+static __rte_always_inline void
+__rte_deque_enqueue_elems_tail_32(struct rte_deque *d,
+				const unsigned int mask,
+				uint32_t idx,
+				const void *obj_table,
+				unsigned int n,
+				const unsigned int scale,
+				const unsigned int elem_size)
+{
+	unsigned int i;
+	uint32_t *deque = (uint32_t *)&d[1];
+	const uint32_t *obj = (const uint32_t *)obj_table;
+
+	if (likely(idx >= n)) {
+		for (i = 0; i < n; idx -= scale, i += scale)
+			memcpy(&deque[idx], &obj[i], elem_size);
+	} else {
+		for (i = 0; (int32_t)idx >= 0; idx -= scale, i += scale)
+			memcpy(&deque[idx], &obj[i], elem_size);
+
+		/* Start at the ending */
+		idx = mask;
+		for (; i < n; idx -= scale, i += scale)
+			memcpy(&deque[idx], &obj[i], elem_size);
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_enqueue_elems_tail_64(struct rte_deque *d,
+				const void *obj_table,
+				unsigned int n)
+{
+	unsigned int i;
+	uint32_t idx = (d->tail & d->mask);
+	uint64_t *deque = (uint64_t *)&d[1];
+	const unaligned_uint64_t *obj = (const unaligned_uint64_t *)obj_table;
+	if (likely((int32_t)(idx - n) >= 0)) {
+		for (i = 0; i < (n & ~0x3); i += 4, idx -= 4) {
+			deque[idx] = obj[i];
+			deque[idx - 1] = obj[i + 1];
+			deque[idx - 2] = obj[i + 2];
+			deque[idx - 3] = obj[i + 3];
+		}
+		switch (n & 0x3) {
+		case 3:
+			deque[idx--] = obj[i++]; /* fallthrough */
+		case 2:
+			deque[idx--] = obj[i++]; /* fallthrough */
+		case 1:
+			deque[idx--] = obj[i++]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; (int32_t)idx >= 0; i++, idx--)
+			deque[idx] = obj[i];
+		/* Start at the ending */
+		for (idx = d->mask; i < n; i++, idx--)
+			deque[idx] = obj[i];
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_enqueue_elems_tail_128(struct rte_deque *d,
+				const void *obj_table,
+				unsigned int n)
+{
+	unsigned int i;
+	uint32_t idx = (d->tail & d->mask);
+	rte_int128_t *deque = (rte_int128_t *)&d[1];
+	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
+	if (likely((int32_t)(idx - n) >= 0)) {
+		for (i = 0; i < (n & ~0x1); i += 2, idx -= 2) {
+			deque[idx] = obj[i];
+			deque[idx - 1] = obj[i + 1];
+		}
+		switch (n & 0x1) {
+		case 1:
+			memcpy((void *)(deque + idx),
+				(const void *)(obj + i), 16);
+		}
+	} else {
+		for (i = 0; (int32_t)idx >= 0; i++, idx--)
+			memcpy((void *)(deque + idx),
+				(const void *)(obj + i), 16);
+		/* Start at the ending */
+		for (idx = d->mask; i < n; i++, idx--)
+			memcpy((void *)(deque + idx),
+				(const void *)(obj + i), 16);
+	}
+}
+
+static __rte_always_inline unsigned int
+__rte_deque_enqueue_at_tail(struct rte_deque *d,
+			const void *obj_table,
+			unsigned int esize,
+			unsigned int n)
+{
+	/* Step the tail back so it points at an empty cell for enqueuing */
+	d->tail--;
+
+	/* 8B and 16B copies implemented individually because on some platforms
+	 * there are 64 bit and 128 bit registers available for direct copying.
+	 */
+	if (esize == 8)
+		__rte_deque_enqueue_elems_tail_64(d, obj_table, n);
+	else if (esize == 16)
+		__rte_deque_enqueue_elems_tail_128(d, obj_table, n);
+	else {
+		uint32_t idx, scale, nd_idx, nd_num, nd_mask;
+
+		/* Normalize to uint32_t */
+		scale = esize / sizeof(uint32_t);
+		nd_num = n * scale;
+		idx = d->tail & d->mask;
+		nd_idx = idx * scale;
+		nd_mask = d->mask * scale;
+		__rte_deque_enqueue_elems_tail_32(d, nd_mask, nd_idx, obj_table,
+						nd_num, scale, esize);
+	}
+
+	/* The +1 is because the tail needs to point at a
+	 * non-empty memory location after the enqueuing operation.
+	 */
+	d->tail = (d->tail - n + 1) & d->mask;
+	return n;
+}
+
+static __rte_always_inline void
+__rte_deque_dequeue_elems_32(struct rte_deque *d,
+			const unsigned int size,
+			uint32_t idx,
+			void *obj_table,
+			unsigned int n)
+{
+	unsigned int i;
+	const uint32_t *deque = (const uint32_t *)&d[1];
+	uint32_t *obj = (uint32_t *)obj_table;
+	if (likely(idx + n <= size)) {
+		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
+			obj[i] = deque[idx];
+			obj[i + 1] = deque[idx + 1];
+			obj[i + 2] = deque[idx + 2];
+			obj[i + 3] = deque[idx + 3];
+			obj[i + 4] = deque[idx + 4];
+			obj[i + 5] = deque[idx + 5];
+			obj[i + 6] = deque[idx + 6];
+			obj[i + 7] = deque[idx + 7];
+		}
+		switch (n & 0x7) {
+		case 7:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 6:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 5:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 4:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 3:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 2:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 1:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			obj[i] = deque[idx];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			obj[i] = deque[idx];
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_dequeue_elems_64(struct rte_deque *d, void *obj_table,
+			unsigned int n)
+{
+	unsigned int i;
+	const uint32_t size = d->size;
+	uint32_t idx = (d->tail & d->mask);
+	const uint64_t *deque = (const uint64_t *)&d[1];
+	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
+	if (likely(idx + n <= size)) {
+		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
+			obj[i] = deque[idx];
+			obj[i + 1] = deque[idx + 1];
+			obj[i + 2] = deque[idx + 2];
+			obj[i + 3] = deque[idx + 3];
+		}
+		switch (n & 0x3) {
+		case 3:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 2:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 1:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			obj[i] = deque[idx];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			obj[i] = deque[idx];
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_dequeue_elems_128(struct rte_deque *d,
+			void *obj_table,
+			unsigned int n)
+{
+	unsigned int i;
+	const uint32_t size = d->size;
+	uint32_t idx = (d->tail & d->mask);
+	const rte_int128_t *deque = (const rte_int128_t *)&d[1];
+	rte_int128_t *obj = (rte_int128_t *)obj_table;
+	if (likely(idx + n <= size)) {
+		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
+			memcpy((void *)(obj + i),
+				(const void *)(deque + idx), 32);
+		switch (n & 0x1) {
+		case 1:
+			memcpy((void *)(obj + i),
+				(const void *)(deque + idx), 16);
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			memcpy((void *)(obj + i),
+				(const void *)(deque + idx), 16);
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			memcpy((void *)(obj + i),
+				(const void *)(deque + idx), 16);
+	}
+}
+
+static __rte_always_inline unsigned int
+__rte_deque_dequeue_at_tail(struct rte_deque *d,
+			void *obj_table,
+			unsigned int esize,
+			unsigned int n)
+{
+	/* 8B and 16B copies implemented individually because on some platforms
+	 * there are 64 bit and 128 bit registers available for direct copying.
+	 */
+	if (esize == 8)
+		__rte_deque_dequeue_elems_64(d, obj_table, n);
+	else if (esize == 16)
+		__rte_deque_dequeue_elems_128(d, obj_table, n);
+	else {
+		uint32_t idx, scale, nd_idx, nd_num, nd_size;
+
+		/* Normalize to uint32_t */
+		scale = esize / sizeof(uint32_t);
+		nd_num = n * scale;
+		idx = d->tail & d->mask;
+		nd_idx = idx * scale;
+		nd_size = d->size * scale;
+		__rte_deque_dequeue_elems_32(d, nd_size, nd_idx,
+					obj_table, nd_num);
+	}
+	d->tail = (d->tail + n) & d->mask;
+	return n;
+}
+
+static __rte_always_inline void
+__rte_deque_dequeue_elems_head_32(struct rte_deque *d,
+				const unsigned int mask,
+				uint32_t idx,
+				void *obj_table,
+				unsigned int n,
+				const unsigned int scale,
+				const unsigned int elem_size)
+{
+	unsigned int i;
+	const uint32_t *deque = (uint32_t *)&d[1];
+	uint32_t *obj = (uint32_t *)obj_table;
+
+	if (likely(idx >= n)) {
+		for (i = 0; i < n; idx -= scale, i += scale)
+			memcpy(&obj[i], &deque[idx], elem_size);
+	} else {
+		for (i = 0; (int32_t)idx >= 0; idx -= scale, i += scale)
+			memcpy(&obj[i], &deque[idx], elem_size);
+		/* Start at the ending */
+		idx = mask;
+		for (; i < n; idx -= scale, i += scale)
+			memcpy(&obj[i], &deque[idx], elem_size);
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_dequeue_elems_head_64(struct rte_deque *d,
+				void *obj_table,
+				unsigned int n)
+{
+	unsigned int i;
+	uint32_t idx = (d->head & d->mask);
+	const uint64_t *deque = (uint64_t *)&d[1];
+	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
+	if (likely((int32_t)(idx - n) >= 0)) {
+		for (i = 0; i < (n & ~0x3); i += 4, idx -= 4) {
+			obj[i] = deque[idx];
+			obj[i + 1] = deque[idx - 1];
+			obj[i + 2] = deque[idx - 2];
+			obj[i + 3] = deque[idx - 3];
+		}
+		switch (n & 0x3) {
+		case 3:
+			obj[i++] = deque[idx--]; /* fallthrough */
+		case 2:
+			obj[i++] = deque[idx--]; /* fallthrough */
+		case 1:
+			obj[i++] = deque[idx--]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; (int32_t)idx >= 0; i++, idx--)
+			obj[i] = deque[idx];
+		/* Start at the ending */
+		for (idx = d->mask; i < n; i++, idx--)
+			obj[i] = deque[idx];
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_dequeue_elems_head_128(struct rte_deque *d,
+				void *obj_table,
+				unsigned int n)
+{
+	unsigned int i;
+	uint32_t idx = (d->head & d->mask);
+	const rte_int128_t *deque = (rte_int128_t *)&d[1];
+	rte_int128_t *obj = (rte_int128_t *)obj_table;
+	if (likely((int32_t)(idx - n) >= 0)) {
+		for (i = 0; i < (n & ~0x1); i += 2, idx -= 2) {
+			obj[i] = deque[idx];
+			obj[i + 1] = deque[idx - 1];
+		}
+		switch (n & 0x1) {
+		case 1:
+			memcpy((void *)(obj + i),
+				(const void *)(deque + idx), 16);
+		}
+	} else {
+		for (i = 0; (int32_t)idx >= 0; i++, idx--)
+			memcpy((void *)(obj + i),
+				(const void *)(deque + idx), 16);
+		/* Start at the ending */
+		for (idx = d->mask; i < n; i++, idx--)
+			memcpy((void *)(obj + i),
+				(const void *)(deque + idx), 16);
+	}
+}
+
+static __rte_always_inline unsigned int
+__rte_deque_dequeue_at_head(struct rte_deque *d,
+			void *obj_table,
+			unsigned int esize,
+			unsigned int n)
+{
+	/* Step the head back so it points at the last full cell for dequeuing */
+	d->head--;
+
+	/* 8B and 16B copies implemented individually because on some platforms
+	 * there are 64 bit and 128 bit registers available for direct copying.
+	 */
+	if (esize == 8)
+		__rte_deque_dequeue_elems_head_64(d, obj_table, n);
+	else if (esize == 16)
+		__rte_deque_dequeue_elems_head_128(d, obj_table, n);
+	else {
+		uint32_t idx, scale, nd_idx, nd_num, nd_mask;
+
+		/* Normalize to uint32_t */
+		scale = esize / sizeof(uint32_t);
+		nd_num = n * scale;
+		idx = d->head & d->mask;
+		nd_idx = idx * scale;
+		nd_mask = d->mask * scale;
+		__rte_deque_dequeue_elems_head_32(d, nd_mask, nd_idx, obj_table,
+						nd_num, scale, esize);
+	}
+
+	/* The +1 is because the head needs to point at an
+	 * empty memory location after the dequeuing operation.
+	 */
+	d->head = (d->head - n + 1) & d->mask;
+	return n;
+}
+#endif /* _RTE_DEQUE_PVT_H_ */
diff --git a/lib/deque/rte_deque_zc.h b/lib/deque/rte_deque_zc.h
new file mode 100644
index 0000000000..6d7167e158
--- /dev/null
+++ b/lib/deque/rte_deque_zc.h
@@ -0,0 +1,430 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+#ifndef _RTE_DEQUE_ZC_H_
+#define _RTE_DEQUE_ZC_H_
+
+/**
+ * @file
+ * This file should not be included directly, include rte_deque.h instead.
+ *
+ * Deque Zero Copy APIs
+ * These APIs make it possible to split public enqueue/dequeue API
+ * into 3 parts:
+ * - enqueue/dequeue start
+ * - copy data to/from the deque
+ * - enqueue/dequeue finish
+ * These APIs make it possible to avoid copying the data to a temporary area.
+ *
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Deque zero-copy information structure.
+ *
+ * This structure contains the pointers and length of the space
+ * reserved on the Deque storage.
+ */
+struct __rte_cache_aligned rte_deque_zc_data {
+	/* Pointer to the first space in the deque */
+	void *ptr1;
+	/* Pointer to the second space in the deque if there is wrap-around.
+	 * It contains valid value only if wrap-around happens.
+	 */
+	void *ptr2;
+	/* Number of elements in the first pointer. If this is equal to
+	 * the number of elements requested, then ptr2 is NULL.
+	 * Otherwise, subtracting n1 from the number of elements requested
+	 * gives the number of elements available at ptr2.
+	 */
+	unsigned int n1;
+};
+
+static __rte_always_inline void
+__rte_deque_get_elem_addr(struct rte_deque *d, uint32_t pos,
+	uint32_t esize, uint32_t num, void **dst1, uint32_t *n1, void **dst2,
+	bool low_to_high)
+{
+	uint32_t idx, scale, nr_idx;
+	uint32_t *deque_ptr = (uint32_t *)&d[1];
+
+	/* Normalize to uint32_t */
+	scale = esize / sizeof(uint32_t);
+	idx = pos & d->mask;
+	nr_idx = idx * scale;
+
+	*dst1 = deque_ptr + nr_idx;
+	*n1 = num;
+
+	if (low_to_high) {
+		if (idx + num > d->size) {
+			*n1 = d->size - idx;
+			*dst2 = deque_ptr;
+		} else
+			*dst2 = NULL;
+	} else {
+		if ((int32_t)(idx - num) < 0) {
+			*n1 = idx + 1;
+			*dst2 = (void *)&deque_ptr[(-1 & d->mask) * scale];
+		} else
+			*dst2 = NULL;
+	}
+}
+
+/**
+ * Start to enqueue several objects on the deque.
+ * Note that no actual objects are put in the deque by this function,
+ * it just reserves space for the user on the deque.
+ * User has to copy objects into the deque using the returned pointers.
+ * User should call rte_deque_head_enqueue_zc_elem_finish to complete the
+ * enqueue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to add in the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param free_space
+ *   Returns the amount of space in the deque after the reservation operation
+ *   has finished.
+ * @return
+ *   The number of objects that can be enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_head_enqueue_zc_bulk_elem_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *free_space)
+{
+	*free_space = __RTE_DEQUE_FREE_SPACE(d);
+	if (unlikely(*free_space < n))
+		return 0;
+	__rte_deque_get_elem_addr(d, d->head, esize, n, &zcd->ptr1,
+							&zcd->n1, &zcd->ptr2, true);
+
+	*free_space -= n;
+	return n;
+}
+
+/**
+ * Complete enqueuing several objects on the deque.
+ * Note that the number of objects to enqueue should not exceed the
+ * previous enqueue_start return value.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param n
+ *   The number of objects to add to the deque.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_deque_head_enqueue_zc_elem_finish(struct rte_deque *d, unsigned int n)
+{
+	d->head = (d->head + n) & d->mask;
+}
+
+/**
+ * Start to enqueue several objects on the deque.
+ * Note that no actual objects are put in the queue by this function,
+ * it just reserves space for the user on the deque.
+ * User has to copy objects into the queue using the returned pointers.
+ * User should call rte_deque_enqueue_zc_elem_finish to complete the
+ * enqueue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to add in the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param free_space
+ *   Returns the amount of space in the deque after the reservation operation
+ *   has finished.
+ * @return
+ *   The actual number of objects that can be enqueued, which may be less than n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_head_enqueue_zc_burst_elem_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *free_space)
+{
+	*free_space = __RTE_DEQUE_FREE_SPACE(d);
+	n = n > *free_space ? *free_space : n;
+	return rte_deque_head_enqueue_zc_bulk_elem_start(d, esize, n, zcd, free_space);
+}
+
+/**
+ * Start to enqueue several objects on the deque.
+ * Note that no actual objects are put in the deque by this function,
+ * it just reserves space for the user on the deque.
+ * User has to copy objects into the deque using the returned pointers.
+ * User should call rte_deque_tail_enqueue_zc_elem_finish to complete the
+ * enqueue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to add in the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param free_space
+ *   Returns the amount of space in the deque after the reservation operation
+ *   has finished.
+ * @return
+ *   The number of objects that can be enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_tail_enqueue_zc_bulk_elem_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *free_space)
+{
+	*free_space = __RTE_DEQUE_FREE_SPACE(d);
+	if (unlikely(*free_space < n))
+		return 0;
+	__rte_deque_get_elem_addr(d, d->tail - 1, esize, n, &zcd->ptr1,
+							  &zcd->n1, &zcd->ptr2, false);
+
+	*free_space -= n;
+	return n;
+}
+
+/**
+ * Complete enqueuing several objects on the deque.
+ * Note that the number of objects to enqueue should not exceed the
+ * previous enqueue_start return value.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param n
+ *   The number of objects to add to the deque.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_deque_tail_enqueue_zc_elem_finish(struct rte_deque *d, unsigned int n)
+{
+	d->tail = (d->tail - n) & d->mask;
+}
+
+/**
+ * Start to enqueue several objects on the deque.
+ * Note that no actual objects are put in the deque by this function,
+ * it just reserves space for the user on the deque.
+ * User has to copy objects into the deque using the returned pointers.
+ * User should call rte_deque_tail_enqueue_zc_elem_finish to complete the
+ * enqueue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to add in the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param free_space
+ *   Returns the amount of space in the deque after the reservation operation
+ *   has finished.
+ * @return
+ *   The actual number of objects that can be enqueued, which may be less than n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_tail_enqueue_zc_burst_elem_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *free_space)
+{
+	*free_space = __RTE_DEQUE_FREE_SPACE(d);
+	n = n > *free_space ? *free_space : n;
+	return rte_deque_tail_enqueue_zc_bulk_elem_start(d, esize, n, zcd, free_space);
+}
+
+/**
+ * Start to dequeue several objects from the deque.
+ * Note that no actual objects are copied from the deque by this function.
+ * User has to copy objects from the deque using the returned pointers.
+ * User should call rte_deque_tail_dequeue_zc_elem_finish to complete the
+ * dequeue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to remove from the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue has
+ *   finished.
+ * @return
+ *   The number of objects that can be dequeued, either 0 or n.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_tail_dequeue_zc_bulk_elem_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *available)
+{
+	*available = __RTE_DEQUE_COUNT(d);
+	if (unlikely(*available < n))
+		return 0;
+	__rte_deque_get_elem_addr(d, d->tail, esize, n, &zcd->ptr1,
+							&zcd->n1, &zcd->ptr2, true);
+
+	*available -= n;
+	return n;
+}
+
+/**
+ * Complete dequeuing several objects from the deque.
+ * Note that the number of objects to dequeue should not exceed the
+ * previous dequeue_start return value.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param n
+ *   The number of objects to remove from the deque.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_deque_tail_dequeue_zc_elem_finish(struct rte_deque *d, unsigned int n)
+{
+	d->tail = (d->tail + n) & d->mask;
+}
+
+/**
+ * Start to dequeue several objects from the deque.
+ * Note that no actual objects are copied from the deque by this function.
+ * User has to copy objects from the deque using the returned pointers.
+ * User should call rte_deque_tail_dequeue_zc_elem_finish to complete the
+ * dequeue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to remove from the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue has
+ *   finished.
+ * @return
+ *   The actual number of objects that can be dequeued, which may be less than n.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_tail_dequeue_zc_burst_elem_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *available)
+{
+	*available = __RTE_DEQUE_COUNT(d);
+	n = n > *available ? *available : n;
+	return rte_deque_tail_dequeue_zc_bulk_elem_start(d, esize, n, zcd, available);
+}
+
+/**
+ * Start to dequeue several objects from the deque.
+ * Note that no actual objects are copied from the deque by this function.
+ * User has to copy objects from the deque using the returned pointers.
+ * User should call rte_deque_head_dequeue_zc_elem_finish to complete the
+ * dequeue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to remove from the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue has
+ *   finished.
+ * @return
+ *   The number of objects that can be dequeued, either 0 or n.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_head_dequeue_zc_bulk_elem_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *available)
+{
+	*available = __RTE_DEQUE_COUNT(d);
+	if (unlikely(*available < n))
+		return 0;
+	__rte_deque_get_elem_addr(d, d->head - 1, esize, n, &zcd->ptr1,
+							&zcd->n1, &zcd->ptr2, false);
+
+	*available -= n;
+	return n;
+}
+
+/**
+ * Complete dequeuing several objects from the deque.
+ * Note that the number of objects to dequeue should not exceed the
+ * previous dequeue_start return value.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param n
+ *   The number of objects to remove from the deque.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_deque_head_dequeue_zc_elem_finish(struct rte_deque *d, unsigned int n)
+{
+	d->head = (d->head - n) & d->mask;
+}
+
+/**
+ * Start to dequeue several objects from the deque.
+ * Note that no actual objects are copied from the deque by this function.
+ * User has to copy objects from the deque using the returned pointers.
+ * User should call rte_deque_head_dequeue_zc_elem_finish to complete the
+ * dequeue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to remove from the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue has
+ *   finished.
+ * @return
+ *   The actual number of objects that can be dequeued, which may be less than n.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_head_dequeue_zc_burst_elem_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *available)
+{
+	*available = __RTE_DEQUE_COUNT(d);
+	n = n > *available ? *available : n;
+	return rte_deque_head_dequeue_zc_bulk_elem_start(d, esize, n, zcd, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_DEQUE_ZC_H_ */
diff --git a/lib/deque/version.map b/lib/deque/version.map
new file mode 100644
index 0000000000..103fd3b512
--- /dev/null
+++ b/lib/deque/version.map
@@ -0,0 +1,14 @@
+EXPERIMENTAL {
+	global:
+
+	# added in 24.07
+	rte_deque_log_type;
+	rte_deque_create;
+	rte_deque_dump;
+	rte_deque_free;
+	rte_deque_get_memsize_elem;
+	rte_deque_init;
+	rte_deque_reset;
+
+	local: *;
+};
diff --git a/lib/meson.build b/lib/meson.build
index 179a272932..82929b7a11 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -14,6 +14,7 @@ libraries = [
         'argparse',
         'telemetry', # basic info querying
         'eal', # everything depends on eal
+        'deque',
         'ring',
         'rcu', # rcu depends on ring
         'mempool',
@@ -74,6 +75,7 @@ if is_ms_compiler
             'kvargs',
             'telemetry',
             'eal',
+            'deque',
             'ring',
     ]
 endif
-- 
2.25.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v2 2/2] deque: add unit tests for the deque library
  2024-04-24 13:42     ` [PATCH v2 0/2] deque: add multithread unsafe deque library Aditya Ambadipudi
  2024-04-24 13:42       ` [PATCH v2 1/2] deque: add multi-thread unsafe double ended queue Aditya Ambadipudi
@ 2024-04-24 13:42       ` Aditya Ambadipudi
  1 sibling, 0 replies; 48+ messages in thread
From: Aditya Ambadipudi @ 2024-04-24 13:42 UTC (permalink / raw)
  To: dev, jackmin, stephen, matan, viacheslavo, roretzla,
	konstantin.ananyev, mb, hofors
  Cc: wathsala.vithanage, dhruv.tripathi, honnappa.nagarahalli, nd,
	Aditya Ambadipudi

Add unit test cases that test all of the enqueue/dequeue functions,
both the normal enqueue/dequeue functions and the zero-copy API functions.

Signed-off-by: Aditya Ambadipudi <aditya.ambadipudi@arm.com>
Change-Id: Ida5bdefdd9d001b792a8d4be011387ff4f84c154
---
v2:
  * Addressed the spell check warning issue with the word "Deque"
  * Tried to rename all objects that are named deque to avoid collision with
    std::deque

 app/test/meson.build                   |    2 +
 app/test/test_deque_enqueue_dequeue.c  | 1228 ++++++++++++++++++++++++
 app/test/test_deque_helper_functions.c |  169 ++++
 3 files changed, 1399 insertions(+)
 create mode 100644 app/test/test_deque_enqueue_dequeue.c
 create mode 100644 app/test/test_deque_helper_functions.c

diff --git a/app/test/meson.build b/app/test/meson.build
index 7d909039ae..8913050c9b 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -60,6 +60,8 @@ source_file_deps = {
     'test_cryptodev_security_tls_record.c': ['cryptodev', 'security'],
     'test_cycles.c': [],
     'test_debug.c': [],
+    'test_deque_enqueue_dequeue.c': ['deque'],
+    'test_deque_helper_functions.c': ['deque'],
     'test_devargs.c': ['kvargs'],
     'test_dispatcher.c': ['dispatcher'],
     'test_distributor.c': ['distributor'],
diff --git a/app/test/test_deque_enqueue_dequeue.c b/app/test/test_deque_enqueue_dequeue.c
new file mode 100644
index 0000000000..5816218235
--- /dev/null
+++ b/app/test/test_deque_enqueue_dequeue.c
@@ -0,0 +1,1228 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#include "test.h"
+
+#include <assert.h>
+#include <inttypes.h>
+#include <signal.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_deque.h>
+#include <rte_errno.h>
+#include <rte_malloc.h>
+#include <rte_random.h>
+
+static const int esize[] = {4, 8, 16, 20};
+#define DEQUE_SIZE 4096
+#define MAX_BULK 32
+#define TEST_DEQUE_FULL_EMPTY_ITER 8
+
+/*
+ * Validate the return value of test cases and print details of the
+ * deque if validation fails
+ *
+ * @param exp
+ *   Expression to validate return value.
+ * @param d
+ *   A pointer to the deque structure.
+ */
+#define TEST_DEQUE_VERIFY(exp, d, errst) do {				\
+	if (!(exp)) {							\
+		printf("error at %s:%d\tcondition " #exp " failed\n",	\
+			__func__, __LINE__);				\
+		rte_deque_dump(stdout, (d));				\
+		errst;							\
+	}								\
+} while (0)
+
+static int
+test_deque_mem_cmp(void *src, void *dst, unsigned int size)
+{
+	int ret;
+
+	ret = memcmp(src, dst, size);
+	if (ret) {
+		rte_hexdump(stdout, "src", src, size);
+		rte_hexdump(stdout, "dst", dst, size);
+		printf("data after dequeue is not the same\n");
+	}
+
+	return ret;
+}
+
+static int
+test_deque_mem_cmp_rvs(void *src, void *dst,
+		unsigned int count, unsigned int esize)
+{
+	int ret = 0;
+	uint32_t *src32 = ((uint32_t *)src), *dst32 = ((uint32_t *)dst);
+	uint32_t scale = esize / sizeof(uint32_t);
+
+	/* Start at the end of the dst and compare from there.*/
+	dst32 += (count - 1) * scale;
+	for (unsigned int i = 0; i < count; i++) {
+		for (unsigned int j = 0; j < scale; j++) {
+			if (src32[j] != dst32[j]) {
+				ret = -1;
+				break;
+			}
+		}
+		if (ret)
+			break;
+		dst32 -= scale;
+		src32 += scale;
+	}
+	if (ret) {
+		rte_hexdump(stdout, "src", src, count * esize);
+		rte_hexdump(stdout, "dst", dst, count * esize);
+		printf("data after dequeue is not the same\n");
+	}
+
+	return ret;
+}
+
+static inline void *
+test_deque_calloc(unsigned int dsize, int esize)
+{
+	void *p;
+
+	p = rte_zmalloc(NULL, dsize * esize, RTE_CACHE_LINE_SIZE);
+	if (p == NULL)
+		printf("Failed to allocate memory\n");
+
+	return p;
+}
+
+static void
+test_deque_mem_init(void *obj, unsigned int count, int esize)
+{
+	for (unsigned int i = 0; i < (count * esize / sizeof(uint32_t)); i++)
+		((uint32_t *)obj)[i] = i;
+}
+
+static inline void *
+test_deque_inc_ptr(void *obj, int esize, unsigned int n)
+{
+	return (void *)((uint32_t *)obj + (n * esize / sizeof(uint32_t)));
+}
+
+/* Copy to the deque memory */
+static inline void
+test_deque_zc_copy_to_deque(struct rte_deque_zc_data *zcd, const void *src, int esize,
+	unsigned int num)
+{
+	memcpy(zcd->ptr1, src, esize * zcd->n1);
+	if (zcd->n1 != num) {
+		const void *inc_src = (const void *)((const char *)src +
+						(zcd->n1 * esize));
+		memcpy(zcd->ptr2, inc_src, esize * (num - zcd->n1));
+	}
+}
+
+static inline void
+test_deque_zc_copy_to_deque_rev(struct rte_deque_zc_data *zcd, const void *src,
+					int esize, unsigned int num)
+{
+	void *ptr1 = zcd->ptr1;
+	for (unsigned int i = 0; i < zcd->n1; i++) {
+		memcpy(ptr1, src, esize);
+		src = (const void *)((const char *)src + esize);
+		ptr1 = (void *)((char *)ptr1 - esize);
+	}
+	if (zcd->n1 != num) {
+		void *ptr2 = zcd->ptr2;
+		for (unsigned int i = 0; i < (num - zcd->n1); i++) {
+			memcpy(ptr2, src, esize);
+			src = (const void *)((const char *)src + esize);
+			ptr2 = (void *)((char *)ptr2 - esize);
+		}
+	}
+}
+
+/* Copy from the deque memory */
+static inline void
+test_deque_zc_copy_from_deque(struct rte_deque_zc_data *zcd, void *dst, int esize,
+	unsigned int num)
+{
+	memcpy(dst, zcd->ptr1, esize * zcd->n1);
+
+	if (zcd->n1 != num) {
+		dst = test_deque_inc_ptr(dst, esize, zcd->n1);
+		memcpy(dst, zcd->ptr2, esize * (num - zcd->n1));
+	}
+}
+
+static inline void
+test_deque_zc_copy_from_deque_rev(struct rte_deque_zc_data *zcd, void *dst, int esize,
+	unsigned int num)
+{
+	void *ptr1 = zcd->ptr1;
+	for (unsigned int i = 0; i < zcd->n1; i++) {
+		memcpy(dst, ptr1, esize);
+		dst = (void *)((char *)dst + esize);
+		ptr1 = (void *)((char *)ptr1 - esize);
+	}
+	if (zcd->n1 != num) {
+		void *ptr2 = zcd->ptr2;
+		for (unsigned int i = 0; i < (num - zcd->n1); i++) {
+			memcpy(dst, ptr2, esize);
+			dst = (void *)((char *)dst + esize);
+			ptr2 = (void *)((char *)ptr2 - esize);
+		}
+	}
+}
+
+/* Wrappers around the zero-copy APIs. The wrappers match
+ * the normal enqueue/dequeue API declarations.
+ */
+static unsigned int
+test_deque_head_enqueue_zc_bulk_elem(struct rte_deque *d, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	uint32_t ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_head_enqueue_zc_bulk_elem_start(d, esize, n,
+						&zcd, free_space);
+	if (ret != 0) {
+		/* Copy the data to the deque */
+		test_deque_zc_copy_to_deque(&zcd, obj_table, esize, ret);
+		rte_deque_head_enqueue_zc_elem_finish(d, ret);
+	}
+
+	return ret;
+}
+
+static unsigned int
+test_deque_tail_dequeue_zc_bulk_elem(struct rte_deque *d, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	unsigned int ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_tail_dequeue_zc_bulk_elem_start(d, esize, n,
+				&zcd, available);
+	if (ret != 0) {
+		/* Copy the data from the deque */
+		test_deque_zc_copy_from_deque(&zcd, obj_table, esize, ret);
+		rte_deque_tail_dequeue_zc_elem_finish(d, ret);
+	}
+
+	return ret;
+}
+
+static unsigned int
+test_deque_head_enqueue_zc_burst_elem(struct rte_deque *d, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	uint32_t ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_head_enqueue_zc_burst_elem_start(d, esize, n,
+						&zcd, free_space);
+	if (ret != 0) {
+		/* Copy the data to the deque */
+		test_deque_zc_copy_to_deque(&zcd, obj_table, esize, ret);
+		rte_deque_head_enqueue_zc_elem_finish(d, ret);
+	}
+
+	return ret;
+}
+
+static unsigned int
+test_deque_tail_dequeue_zc_burst_elem(struct rte_deque *d, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	unsigned int ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_tail_dequeue_zc_burst_elem_start(d, esize, n,
+				&zcd, available);
+	if (ret != 0) {
+		/* Copy the data from the deque */
+		test_deque_zc_copy_from_deque(&zcd, obj_table, esize, ret);
+		rte_deque_tail_dequeue_zc_elem_finish(d, ret);
+	}
+	return ret;
+}
+
+static unsigned int
+test_deque_tail_enqueue_zc_bulk_elem(struct rte_deque *d, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	uint32_t ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_tail_enqueue_zc_bulk_elem_start(d, esize, n,
+							&zcd, free_space);
+	if (ret != 0) {
+		/* Copy the data to the deque */
+		test_deque_zc_copy_to_deque_rev(&zcd, obj_table, esize, ret);
+		rte_deque_tail_enqueue_zc_elem_finish(d, ret);
+	}
+
+	return ret;
+}
+
+static unsigned int
+test_deque_head_dequeue_zc_bulk_elem(struct rte_deque *d, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	unsigned int ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_head_dequeue_zc_bulk_elem_start(d, esize, n,
+				&zcd, available);
+	if (ret != 0) {
+		/* Copy the data from the deque */
+		test_deque_zc_copy_from_deque_rev(&zcd, obj_table, esize, ret);
+		rte_deque_head_dequeue_zc_elem_finish(d, ret);
+	}
+	return ret;
+}
+
+static unsigned int
+test_deque_tail_enqueue_zc_burst_elem(struct rte_deque *d,
+	const void *obj_table, unsigned int esize, unsigned int n,
+	unsigned int *free_space)
+{
+	uint32_t ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_tail_enqueue_zc_burst_elem_start(d, esize, n,
+							&zcd, free_space);
+	if (ret != 0) {
+		/* Copy the data to the deque */
+		test_deque_zc_copy_to_deque_rev(&zcd, obj_table, esize, ret);
+		rte_deque_tail_enqueue_zc_elem_finish(d, ret);
+	}
+
+	return ret;
+}
+
+static unsigned int
+test_deque_head_dequeue_zc_burst_elem(struct rte_deque *d, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	unsigned int ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_head_dequeue_zc_burst_elem_start(d, esize, n,
+				&zcd, available);
+	if (ret != 0) {
+		/* Copy the data from the deque */
+		test_deque_zc_copy_from_deque_rev(&zcd, obj_table, esize, ret);
+		rte_deque_head_dequeue_zc_elem_finish(d, ret);
+	}
+	return ret;
+}
+
+#define TEST_DEQUE_ELEM_BULK 8
+#define TEST_DEQUE_ELEM_BURST 16
+static const struct {
+	const char *desc;
+	const int api_flags;
+	unsigned int (*enq)(struct rte_deque *d, const void *obj_table,
+		unsigned int esize, unsigned int n,
+		unsigned int *free_space);
+	unsigned int (*deq)(struct rte_deque *d, void *obj_table,
+			unsigned int esize, unsigned int n,
+			unsigned int *available);
+	/* This dequeues in the opposite direction of enqueue.
+	 * This is used for testing stack behavior
+	 */
+	unsigned int (*deq_opp)(struct rte_deque *d, void *obj_table,
+			unsigned int esize, unsigned int n,
+			unsigned int *available);
+} test_enqdeq_impl[] = {
+	{
+		.desc = "Deque forward direction bulkmode",
+		.api_flags = TEST_DEQUE_ELEM_BULK,
+		.enq = rte_deque_head_enqueue_bulk_elem,
+		.deq = rte_deque_tail_dequeue_bulk_elem,
+		.deq_opp = rte_deque_head_dequeue_bulk_elem,
+	},
+	{
+		.desc = "Deque forward direction burstmode",
+		.api_flags = TEST_DEQUE_ELEM_BURST,
+		.enq = rte_deque_head_enqueue_burst_elem,
+		.deq = rte_deque_tail_dequeue_burst_elem,
+		.deq_opp = rte_deque_head_dequeue_burst_elem,
+	},
+	{
+		.desc = "Deque reverse direction bulkmode",
+		.api_flags = TEST_DEQUE_ELEM_BULK,
+		.enq = rte_deque_tail_enqueue_bulk_elem,
+		.deq = rte_deque_head_dequeue_bulk_elem,
+		.deq_opp = rte_deque_tail_dequeue_bulk_elem,
+	},
+	{
+		.desc = "Deque reverse direction burstmode",
+		.api_flags = TEST_DEQUE_ELEM_BURST,
+		.enq = rte_deque_tail_enqueue_burst_elem,
+		.deq = rte_deque_head_dequeue_burst_elem,
+		.deq_opp = rte_deque_tail_dequeue_burst_elem,
+	},
+	{
+		.desc = "Deque forward direction bulkmode zero copy",
+		.api_flags = TEST_DEQUE_ELEM_BULK,
+		.enq = test_deque_head_enqueue_zc_bulk_elem,
+		.deq = test_deque_tail_dequeue_zc_bulk_elem,
+		.deq_opp = test_deque_head_dequeue_zc_bulk_elem,
+	},
+	{
+		.desc = "Deque forward direction burstmode zero copy",
+		.api_flags = TEST_DEQUE_ELEM_BURST,
+		.enq = test_deque_head_enqueue_zc_burst_elem,
+		.deq = test_deque_tail_dequeue_zc_burst_elem,
+		.deq_opp = test_deque_head_dequeue_zc_burst_elem,
+	},
+	{
+		.desc = "Deque reverse direction bulkmode zero copy",
+		.api_flags = TEST_DEQUE_ELEM_BULK,
+		.enq = test_deque_tail_enqueue_zc_bulk_elem,
+		.deq = test_deque_head_dequeue_zc_bulk_elem,
+		.deq_opp = test_deque_tail_dequeue_zc_bulk_elem,
+	},
+	{
+		.desc = "Deque reverse direction burstmode zero copy",
+		.api_flags = TEST_DEQUE_ELEM_BURST,
+		.enq = test_deque_tail_enqueue_zc_burst_elem,
+		.deq = test_deque_head_dequeue_zc_burst_elem,
+		.deq_opp = test_deque_tail_dequeue_zc_burst_elem,
+	},
+};
+
+/*
+ * Burst and bulk operations in regular mode and zero copy mode.
+ * Random number of elements are enqueued and dequeued.
+ */
+static int
+test_deque_burst_bulk_tests1(unsigned int test_idx)
+{
+	struct rte_deque *d;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	unsigned int ret;
+	unsigned int i, j, temp_sz, free_space, available;
+	const unsigned int dsz = DEQUE_SIZE - 1;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("\n%s, esize: %d\n", test_enqdeq_impl[test_idx].desc,
+			esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "Over the boundary deque.";
+		d = rte_deque_create(DEQUE_NAME, esize[i], DEQUE_SIZE, 0, 0);
+
+		/* alloc dummy object pointers */
+		src = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+
+		test_deque_mem_init(src, DEQUE_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		printf("Random full/empty test\n");
+
+		for (j = 0; j != TEST_DEQUE_FULL_EMPTY_ITER; j++) {
+			/* random shift in the deque */
+			unsigned int rand = RTE_MAX(rte_rand() % DEQUE_SIZE, 1UL);
+			printf("%s: iteration %u, random shift: %u;\n",
+				__func__, j, rand);
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+							rand, &free_space);
+			TEST_DEQUE_VERIFY(ret == (unsigned int)rand, d, goto fail);
+
+			ret = test_enqdeq_impl[test_idx].deq(d, cur_dst, esize[i],
+							rand, &available);
+			TEST_DEQUE_VERIFY(ret == (unsigned int)rand, d, goto fail);
+
+			/* fill the deque */
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src,
+							esize[i], dsz,
+							&free_space);
+			TEST_DEQUE_VERIFY(ret == (unsigned int)dsz, d, goto fail);
+
+			TEST_DEQUE_VERIFY(rte_deque_free_count(d) == 0, d,
+					goto fail);
+			TEST_DEQUE_VERIFY(dsz == rte_deque_count(d), d,
+					goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_full(d), d, goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_empty(d) == 0, d, goto fail);
+
+			/* empty the deque */
+			ret = test_enqdeq_impl[test_idx].deq(d, cur_dst,
+							esize[i], dsz,
+							&available);
+			TEST_DEQUE_VERIFY(ret == (unsigned int)dsz, d, goto fail);
+
+			TEST_DEQUE_VERIFY(dsz == rte_deque_free_count(d), d,
+					goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_count(d) == 0, d, goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_full(d) == 0, d, goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_empty(d), d, goto fail);
+
+			/* check data */
+			temp_sz = dsz * esize[i];
+			TEST_DEQUE_VERIFY(test_deque_mem_cmp(src, dst, temp_sz) == 0,
+							d, goto fail);
+		}
+
+		/* Free memory before test completed */
+		rte_deque_free(d);
+		rte_free(src);
+		rte_free(dst);
+		d = NULL;
+		src = NULL;
+		dst = NULL;
+	}
+
+	return 0;
+fail:
+	rte_deque_free(d);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
+}
+
+/*
+ * Burst and bulk operations with regular & zero copy mode.
+ * Sequence of simple enqueues/dequeues and validate the enqueued and
+ * dequeued data.
+ */
+static int
+test_deque_burst_bulk_tests2(unsigned int test_idx)
+{
+	struct rte_deque *d;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	int ret;
+	unsigned int i, free_space, available;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("\n%s, esize: %d\n", test_enqdeq_impl[test_idx].desc,
+			esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "Multiple enqs, deqs.";
+		d = rte_deque_create(DEQUE_NAME, esize[i], DEQUE_SIZE, 0, 0);
+
+		/* alloc dummy object pointers */
+		src = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+
+		test_deque_mem_init(src, DEQUE_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		printf("enqueue 1 obj\n");
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						1, &free_space);
+		TEST_DEQUE_VERIFY(ret == 1, d, goto fail);
+		cur_src = test_deque_inc_ptr(cur_src, esize[i], 1);
+
+		printf("enqueue 2 objs\n");
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						2, &free_space);
+		TEST_DEQUE_VERIFY(ret == 2, d, goto fail);
+		cur_src = test_deque_inc_ptr(cur_src, esize[i], 2);
+
+		printf("enqueue MAX_BULK objs\n");
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						MAX_BULK, &free_space);
+		TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+
+		printf("dequeue 1 obj\n");
+		ret = test_enqdeq_impl[test_idx].deq(d, cur_dst, esize[i],
+						1, &available);
+		TEST_DEQUE_VERIFY(ret == 1, d, goto fail);
+		cur_dst = test_deque_inc_ptr(cur_dst, esize[i], 1);
+
+		printf("dequeue 2 objs\n");
+		ret = test_enqdeq_impl[test_idx].deq(d, cur_dst, esize[i],
+						2, &available);
+		TEST_DEQUE_VERIFY(ret == 2, d, goto fail);
+		cur_dst = test_deque_inc_ptr(cur_dst, esize[i], 2);
+
+		printf("dequeue MAX_BULK objs\n");
+		ret = test_enqdeq_impl[test_idx].deq(d, cur_dst, esize[i],
+						MAX_BULK, &available);
+		TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+		cur_dst = test_deque_inc_ptr(cur_dst, esize[i], MAX_BULK);
+
+		/* check data */
+		TEST_DEQUE_VERIFY(test_deque_mem_cmp(src, dst,
+				RTE_PTR_DIFF(cur_dst, dst)) == 0,
+				d, goto fail);
+
+		/* Free memory before the test completes */
+		rte_deque_free(d);
+		rte_free(src);
+		rte_free(dst);
+		d = NULL;
+		src = NULL;
+		dst = NULL;
+	}
+
+	return 0;
+fail:
+	rte_deque_free(d);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
+}
+
+/*
+ * Burst and bulk operations with normal mode & zero copy mode.
+ * Enqueue and dequeue to cover the entire deque length.
+ */
+static int
+test_deque_burst_bulk_tests3(unsigned int test_idx)
+{
+	struct rte_deque *d;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	int ret;
+	unsigned int i, j, free_space, available;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("\n%s, esize: %d\n", test_enqdeq_impl[test_idx].desc,
+			esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "Full deque length test";
+		d = rte_deque_create(DEQUE_NAME, esize[i], DEQUE_SIZE, 0, 0);
+		if (d == NULL)
+			goto fail;
+
+		/* alloc dummy object pointers */
+		src = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+		test_deque_mem_init(src, DEQUE_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		printf("fill and empty the deque\n");
+		for (j = 0; j < DEQUE_SIZE / MAX_BULK; j++) {
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src,
+							esize[i], MAX_BULK,
+							&free_space);
+			TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+			cur_src = test_deque_inc_ptr(cur_src, esize[i],
+								MAX_BULK);
+
+			ret = test_enqdeq_impl[test_idx].deq(d, cur_dst,
+							esize[i], MAX_BULK,
+							&available);
+			TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+			cur_dst = test_deque_inc_ptr(cur_dst, esize[i],
+								MAX_BULK);
+		}
+
+		/* check data */
+		TEST_DEQUE_VERIFY(test_deque_mem_cmp(src, dst,
+					RTE_PTR_DIFF(cur_dst, dst)) == 0,
+					d, goto fail);
+
+		/* Free memory before the test completes */
+		rte_deque_free(d);
+		rte_free(src);
+		rte_free(dst);
+		d = NULL;
+		src = NULL;
+		dst = NULL;
+	}
+
+	return 0;
+fail:
+	rte_deque_free(d);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
+}
+
+/*
+ * Burst and bulk operations with normal mode & zero copy mode.
+ * Enqueue until the deque is full, then dequeue until it becomes empty.
+ */
+static int
+test_deque_burst_bulk_tests4(unsigned int test_idx)
+{
+	struct rte_deque *d;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	int ret;
+	unsigned int i, j, available, free_space;
+	unsigned int num_elems, api_type;
+	api_type = test_enqdeq_impl[test_idx].api_flags;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("\n%s, esize: %d\n", test_enqdeq_impl[test_idx].desc,
+			esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "Full deque length test";
+		d = rte_deque_create(DEQUE_NAME, esize[i], DEQUE_SIZE, 0, 0);
+		if (d == NULL)
+			goto fail;
+
+		/* alloc dummy object pointers */
+		src = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+		test_deque_mem_init(src, DEQUE_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		printf("Test enqueue without enough memory space\n");
+		for (j = 0; j < (DEQUE_SIZE/MAX_BULK - 1); j++) {
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src,
+							esize[i], MAX_BULK,
+							&free_space);
+			TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+			cur_src = test_deque_inc_ptr(cur_src, esize[i],
+								MAX_BULK);
+		}
+
+		printf("Enqueue 2 objects, free entries = MAX_BULK - 2\n");
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						2, &free_space);
+		TEST_DEQUE_VERIFY(ret == 2, d, goto fail);
+		cur_src = test_deque_inc_ptr(cur_src, esize[i], 2);
+
+		printf("Enqueue the remaining entries = MAX_BULK - 3\n");
+		/* Bulk APIs enqueue exact number of elements */
+		if ((api_type & TEST_DEQUE_ELEM_BULK))
+			num_elems = MAX_BULK - 3;
+		else
+			num_elems = MAX_BULK;
+		/* Always one free entry left */
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						num_elems, &free_space);
+		TEST_DEQUE_VERIFY(ret == (MAX_BULK - 3), d, goto fail);
+		cur_src = test_deque_inc_ptr(cur_src, esize[i],
+							(MAX_BULK - 3));
+
+		printf("Test if deque is full\n");
+		TEST_DEQUE_VERIFY(rte_deque_full(d) == 1, d, goto fail);
+
+		printf("Test enqueue for a full entry\n");
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						1, &free_space);
+		TEST_DEQUE_VERIFY(ret == 0, d, goto fail);
+
+		printf("Test dequeue without enough objects\n");
+		for (j = 0; j < DEQUE_SIZE / MAX_BULK - 1; j++) {
+			ret = test_enqdeq_impl[test_idx].deq(d, cur_dst, esize[i],
+							MAX_BULK, &available);
+			TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+			cur_dst = test_deque_inc_ptr(cur_dst, esize[i],
+						MAX_BULK);
+		}
+
+		/* Available memory space for the exact MAX_BULK entries */
+		ret = test_enqdeq_impl[test_idx].deq(d, cur_dst, esize[i],
+						2, &available);
+		TEST_DEQUE_VERIFY(ret == 2, d, goto fail);
+		cur_dst = test_deque_inc_ptr(cur_dst, esize[i], 2);
+
+		/* Bulk APIs enqueue exact number of elements */
+		if ((api_type & TEST_DEQUE_ELEM_BULK))
+			num_elems = MAX_BULK - 3;
+		else
+			num_elems = MAX_BULK;
+		ret = test_enqdeq_impl[test_idx].deq(d, cur_dst, esize[i],
+						num_elems, &available);
+		TEST_DEQUE_VERIFY(ret == MAX_BULK - 3, d, goto fail);
+		cur_dst = test_deque_inc_ptr(cur_dst, esize[i], MAX_BULK - 3);
+
+		printf("Test if deque is empty\n");
+		/* Check if deque is empty */
+		TEST_DEQUE_VERIFY(rte_deque_empty(d) == 1, d, goto fail);
+
+		/* check data */
+		TEST_DEQUE_VERIFY(test_deque_mem_cmp(src, dst,
+					RTE_PTR_DIFF(cur_dst, dst)) == 0,
+					d, goto fail);
+
+		/* Free memory before the test completes */
+		rte_deque_free(d);
+		rte_free(src);
+		rte_free(dst);
+		d = NULL;
+		src = NULL;
+		dst = NULL;
+	}
+
+	return 0;
+fail:
+	rte_deque_free(d);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
+}
+
+/*
+ * Basic test cases with exact size deque.
+ */
+static int
+test_deque_with_exact_size(void)
+{
+	struct rte_deque *std_d = NULL, *exact_sz_d = NULL;
+	void *src_orig = NULL, *dst_orig = NULL;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	const unsigned int deque_sz = 16;
+	unsigned int i, j, free_space, available;
+	int ret = -1;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("\nTest exact size deque. Esize: %d\n", esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "std sized deque";
+		std_d = rte_deque_create(DEQUE_NAME, esize[i], deque_sz, 0, 0);
+
+		if (std_d == NULL) {
+			printf("%s: error, can't create std deque\n", __func__);
+			goto test_fail;
+		}
+		static const char *DEQUE_NAME2 = "Exact sized deque";
+		exact_sz_d = rte_deque_create(DEQUE_NAME2, esize[i], deque_sz,
+					0, RTE_DEQUE_F_EXACT_SZ);
+		if (exact_sz_d == NULL) {
+			printf("%s: error, can't create exact size deque\n",
+					__func__);
+			goto test_fail;
+		}
+
+		/* alloc object pointers. Allocate one extra object
+		 * and create an unaligned address.
+		 */
+		src_orig = test_deque_calloc(17, esize[i]);
+		if (src_orig == NULL)
+			goto test_fail;
+		test_deque_mem_init(src_orig, 17, esize[i]);
+		src = (void *)((uintptr_t)src_orig + 1);
+		cur_src = src;
+
+		dst_orig = test_deque_calloc(17, esize[i]);
+		if (dst_orig == NULL)
+			goto test_fail;
+		dst = (void *)((uintptr_t)dst_orig + 1);
+		cur_dst = dst;
+
+		/*
+		 * Check that the exact size deque is at least as big as the
+		 * standard deque
+		 */
+		TEST_DEQUE_VERIFY(rte_deque_get_size(std_d) <=
+				rte_deque_get_size(exact_sz_d),
+				std_d, goto test_fail);
+
+		/*
+		 * check that the exact_sz_deque can hold one more element
+		 * than the standard deque. (16 vs 15 elements)
+		 */
+		for (j = 0; j < deque_sz - 1; j++) {
+			ret = test_enqdeq_impl[0].enq(std_d, cur_src, esize[i],
+						1, &free_space);
+			TEST_DEQUE_VERIFY(ret == 1, std_d, goto test_fail);
+			ret = test_enqdeq_impl[0].enq(exact_sz_d, cur_src,
+						esize[i], 1, &free_space);
+			TEST_DEQUE_VERIFY(ret == 1, exact_sz_d, goto test_fail);
+			cur_src = test_deque_inc_ptr(cur_src, esize[i], 1);
+		}
+		ret = test_enqdeq_impl[0].enq(std_d, cur_src, esize[i], 1,
+					&free_space);
+		TEST_DEQUE_VERIFY(ret == 0, std_d, goto test_fail);
+		ret = test_enqdeq_impl[0].enq(exact_sz_d, cur_src, esize[i], 1,
+					&free_space);
+		TEST_DEQUE_VERIFY(ret == 1, exact_sz_d, goto test_fail);
+
+		/* check that dequeue returns the expected number of elements */
+		ret = test_enqdeq_impl[0].deq(exact_sz_d, cur_dst, esize[i],
+					deque_sz, &available);
+		TEST_DEQUE_VERIFY(ret == (unsigned int)deque_sz, exact_sz_d,
+				goto test_fail);
+		cur_dst = test_deque_inc_ptr(cur_dst, esize[i], deque_sz);
+
+		/* check that the capacity function returns expected value */
+		TEST_DEQUE_VERIFY(rte_deque_get_capacity(exact_sz_d) == deque_sz,
+				exact_sz_d, goto test_fail);
+
+		/* check data */
+		TEST_DEQUE_VERIFY(test_deque_mem_cmp(src, dst,
+					RTE_PTR_DIFF(cur_dst, dst)) == 0,
+					exact_sz_d, goto test_fail);
+
+		rte_free(src_orig);
+		rte_free(dst_orig);
+		rte_deque_free(std_d);
+		rte_deque_free(exact_sz_d);
+		src_orig = NULL;
+		dst_orig = NULL;
+		std_d = NULL;
+		exact_sz_d = NULL;
+	}
+
+	return 0;
+
+test_fail:
+	rte_free(src_orig);
+	rte_free(dst_orig);
+	rte_deque_free(std_d);
+	rte_deque_free(exact_sz_d);
+	return -1;
+}
+
+/*
+ * Burst and bulk operations in regular mode and zero copy mode.
+ * A random number of elements is enqueued and dequeued first, which
+ * brings both head and tail to somewhere in the middle of the deque.
+ * From that point, the stack behavior of the deque is tested.
+ */
+static int
+test_deque_stack_random_tests1(unsigned int test_idx)
+{
+	struct rte_deque *d;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	unsigned int ret;
+	unsigned int i, j, free_space, available;
+	const unsigned int dsz = DEQUE_SIZE - 1;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("Stackmode tests1.\n");
+		printf("\n%s, esize: %d\n", test_enqdeq_impl[test_idx].desc,
+			esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "Over the boundary deque.";
+		d = rte_deque_create(DEQUE_NAME, esize[i], DEQUE_SIZE, 0, 0);
+		if (d == NULL)
+			goto fail;
+
+		/* alloc dummy object pointers */
+		src = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+
+		test_deque_mem_init(src, DEQUE_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		printf("Random starting point stack test\n");
+
+		for (j = 0; j != TEST_DEQUE_FULL_EMPTY_ITER; j++) {
+			/* random shift in the deque */
+			unsigned int rand = RTE_MAX(rte_rand() % DEQUE_SIZE, 1UL);
+			printf("%s: iteration %u, random shift: %u;\n",
+				__func__, j, rand);
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src,
+							esize[i], rand,
+							&free_space);
+			TEST_DEQUE_VERIFY(ret != 0, d, goto fail);
+
+			ret = test_enqdeq_impl[test_idx].deq(d, cur_dst,
+							esize[i], rand,
+							&available);
+			TEST_DEQUE_VERIFY(ret == (unsigned int)rand, d,
+					goto fail);
+
+			/* fill the deque */
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+							dsz, &free_space);
+			TEST_DEQUE_VERIFY(ret != 0, d, goto fail);
+
+			TEST_DEQUE_VERIFY(rte_deque_free_count(d) == 0, d,
+					goto fail);
+			TEST_DEQUE_VERIFY(dsz == rte_deque_count(d), d,
+					goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_full(d), d,
+					goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_empty(d) == 0, d,
+					goto fail);
+
+			/* empty the deque */
+			ret = test_enqdeq_impl[test_idx].deq_opp(d, cur_dst,
+								esize[i], dsz,
+								&available);
+			TEST_DEQUE_VERIFY(ret == (unsigned int)dsz, d, goto fail);
+
+			TEST_DEQUE_VERIFY(dsz == rte_deque_free_count(d), d,
+					goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_count(d) == 0, d,
+					goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_full(d) == 0, d,
+					goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_empty(d), d, goto fail);
+
+			/* check data */
+			TEST_DEQUE_VERIFY(test_deque_mem_cmp_rvs(src, dst,
+					dsz, esize[i]) == 0, d, goto fail);
+		}
+
+		/* Free memory before the test completes */
+		rte_deque_free(d);
+		rte_free(src);
+		rte_free(dst);
+		d = NULL;
+		src = NULL;
+		dst = NULL;
+	}
+
+	return 0;
+fail:
+	rte_deque_free(d);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
+}
+
+/* Tests both standard mode and zero-copy mode.
+ * Keep enqueuing 1, 2, and MAX_BULK elements until the deque is full.
+ * Then dequeue them all and make sure the data comes back in the
+ * reverse of the order in which it was enqueued.
+ */
+static int
+test_deque_stack_random_tests2(unsigned int test_idx)
+{
+	struct rte_deque *d;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	unsigned int ret;
+	unsigned int i, free_space, available;
+	const unsigned int dsz = DEQUE_SIZE - 1;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("Stackmode tests2.\n");
+		printf("\n%s, esize: %d\n", test_enqdeq_impl[test_idx].desc,
+			esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "Multiple enqs, deqs.";
+		d = rte_deque_create(DEQUE_NAME, esize[i], DEQUE_SIZE, 0, 0);
+		if (d == NULL)
+			goto fail;
+
+		/* alloc dummy object pointers */
+		src = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+
+		test_deque_mem_init(src, DEQUE_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+
+		printf("Enqueue objs till the deque is full.\n");
+		unsigned int count = 0;
+		const unsigned int perIterCount = 1 + 2 + MAX_BULK;
+		while (count + perIterCount < DEQUE_SIZE - 1) {
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+							1, &free_space);
+			TEST_DEQUE_VERIFY(ret == 1, d, goto fail);
+			cur_src = test_deque_inc_ptr(cur_src, esize[i], 1);
+
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+							2, &free_space);
+			TEST_DEQUE_VERIFY(ret == 2, d, goto fail);
+			cur_src = test_deque_inc_ptr(cur_src, esize[i], 2);
+
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+							MAX_BULK, &free_space);
+			TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+			cur_src = test_deque_inc_ptr(cur_src, esize[i], MAX_BULK);
+			count += perIterCount;
+		}
+		unsigned int leftOver = DEQUE_SIZE - 1 - count;
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						leftOver, &free_space);
+		TEST_DEQUE_VERIFY(ret == leftOver, d, goto fail);
+		cur_src = test_deque_inc_ptr(cur_src, esize[i], leftOver);
+
+		printf("Dequeue all the enqueued objs.\n");
+		count = 0;
+		while (count + perIterCount < DEQUE_SIZE - 1) {
+			ret = test_enqdeq_impl[test_idx].deq_opp(d, cur_dst,
+							esize[i], 1, &available);
+			TEST_DEQUE_VERIFY(ret == 1, d, goto fail);
+			cur_dst = test_deque_inc_ptr(cur_dst, esize[i], 1);
+
+			ret = test_enqdeq_impl[test_idx].deq_opp(d, cur_dst,
+								esize[i], 2,
+								&available);
+			TEST_DEQUE_VERIFY(ret == 2, d, goto fail);
+			cur_dst = test_deque_inc_ptr(cur_dst, esize[i], 2);
+
+			ret = test_enqdeq_impl[test_idx].deq_opp(d, cur_dst,
+								esize[i],
+								MAX_BULK,
+								&available);
+			TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+			cur_dst = test_deque_inc_ptr(cur_dst, esize[i], MAX_BULK);
+			count += perIterCount;
+		}
+		leftOver = DEQUE_SIZE - 1 - count;
+		ret = test_enqdeq_impl[test_idx].deq_opp(d, cur_dst, esize[i],
+							leftOver, &available);
+		TEST_DEQUE_VERIFY(ret == leftOver, d, goto fail);
+		cur_dst = test_deque_inc_ptr(cur_dst, esize[i], leftOver);
+
+		/* check data */
+		TEST_DEQUE_VERIFY(test_deque_mem_cmp_rvs(src, dst,
+						dsz, esize[i]) == 0, d,
+						goto fail);
+
+		/* Free memory before the test completes */
+		rte_deque_free(d);
+		rte_free(src);
+		rte_free(dst);
+		d = NULL;
+		src = NULL;
+		dst = NULL;
+	}
+
+	return 0;
+fail:
+	rte_deque_free(d);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
+}
+
+/*
+ * Tests both normal mode and zero-copy mode.
+ * Fill up the whole deque, and drain the deque.
+ * Make sure the data matches in reverse order.
+ */
+static int
+test_deque_stack_random_tests3(unsigned int test_idx)
+{
+	struct rte_deque *d;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	int ret;
+	unsigned int i, available, free_space;
+	const unsigned int dsz = DEQUE_SIZE - 1;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("Stackmode tests3.\n");
+		printf("\n%s, esize: %d\n", test_enqdeq_impl[test_idx].desc,
+			esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "Full deque length test";
+		d = rte_deque_create(DEQUE_NAME, esize[i], DEQUE_SIZE, 0, 0);
+		if (d == NULL)
+			goto fail;
+
+		/* alloc dummy object pointers */
+		src = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+		test_deque_mem_init(src, DEQUE_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		/* fill the deque */
+		printf("Fill the whole deque using a single enqueue operation.\n");
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						dsz, &free_space);
+		TEST_DEQUE_VERIFY(ret == (unsigned int)dsz, d, goto fail);
+
+		TEST_DEQUE_VERIFY(rte_deque_free_count(d) == 0, d, goto fail);
+		TEST_DEQUE_VERIFY(dsz == rte_deque_count(d), d, goto fail);
+		TEST_DEQUE_VERIFY(rte_deque_full(d), d, goto fail);
+		TEST_DEQUE_VERIFY(rte_deque_empty(d) == 0, d, goto fail);
+
+		/* empty the deque */
+		printf("Empty the whole deque.\n");
+		ret = test_enqdeq_impl[test_idx].deq_opp(d, cur_dst, esize[i],
+							dsz, &available);
+		TEST_DEQUE_VERIFY(ret == (unsigned int)dsz, d, goto fail);
+
+		TEST_DEQUE_VERIFY(dsz == rte_deque_free_count(d), d, goto fail);
+		TEST_DEQUE_VERIFY(rte_deque_count(d) == 0, d, goto fail);
+		TEST_DEQUE_VERIFY(rte_deque_full(d) == 0, d, goto fail);
+		TEST_DEQUE_VERIFY(rte_deque_empty(d), d, goto fail);
+
+		/* check data */
+		TEST_DEQUE_VERIFY(test_deque_mem_cmp_rvs(src, dst,
+					dsz, esize[i]) == 0, d, goto fail);
+
+		/* Free memory before the test completes */
+		rte_deque_free(d);
+		rte_free(src);
+		rte_free(dst);
+		d = NULL;
+		src = NULL;
+		dst = NULL;
+	}
+
+	return 0;
+fail:
+	rte_deque_free(d);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
+}
+
+static int
+deque_enqueue_dequeue_autotest_fn(void)
+{
+	if (test_deque_with_exact_size() != 0)
+		goto fail;
+	int (*test_fns[])(unsigned int test_fn_idx) = {
+		test_deque_burst_bulk_tests1,
+		test_deque_burst_bulk_tests2,
+		test_deque_burst_bulk_tests3,
+		test_deque_burst_bulk_tests4,
+		test_deque_stack_random_tests1,
+		test_deque_stack_random_tests2,
+		test_deque_stack_random_tests3
+	};
+	for (unsigned int test_impl_idx = 0;
+		test_impl_idx < RTE_DIM(test_enqdeq_impl); test_impl_idx++) {
+		for (unsigned int test_fn_idx = 0;
+			test_fn_idx < RTE_DIM(test_fns); test_fn_idx++) {
+			if (test_fns[test_fn_idx](test_impl_idx) != 0)
+				goto fail;
+		}
+	}
+	return 0;
+fail:
+	return -1;
+}
+
+REGISTER_FAST_TEST(deque_enqueue_dequeue_autotest, true, true,
+		deque_enqueue_dequeue_autotest_fn);
diff --git a/app/test/test_deque_helper_functions.c b/app/test/test_deque_helper_functions.c
new file mode 100644
index 0000000000..0e47db7fcb
--- /dev/null
+++ b/app/test/test_deque_helper_functions.c
@@ -0,0 +1,169 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#include "test.h"
+
+#include <assert.h>
+#include <inttypes.h>
+#include <signal.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_deque.h>
+#include <rte_errno.h>
+#include <rte_malloc.h>
+#include <rte_tailq.h>
+
+static int
+test_deque_get_memsize(void)
+{
+	const ssize_t RTE_DEQUE_SZ = sizeof(struct rte_deque);
+	/* (1) Should return EINVAL when the supplied size of deque is not a
+	 * power of 2.
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_get_memsize_elem(4, 9), -EINVAL,
+					  "Get memsize function failed.");
+
+	/* (2) Should return EINVAL when the supplied size of deque is not a
+	 * multiple of 4.
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_get_memsize_elem(5, 8), -EINVAL,
+					  "Get memsize function failed.");
+
+	/* (3) Requested size of the deque should be less than or equal to
+	 * RTE_DEQUE_SZ_MASK
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_get_memsize_elem(4, RTE_DEQUE_SZ_MASK), -EINVAL,
+					  "Get memsize function failed.");
+
+	/* (4) A deque of count 1, where the element size is 0, should not allocate
+	 * any more memory than necessary to hold the deque structure.
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_get_memsize_elem(0, 1), RTE_DEQUE_SZ,
+					  "Get memsize function failed.");
+
+	/* (5) Make sure the function is calculating the size correctly.
+	 * Size of the deque structure: 128. Size for two elements, each of size esize: 8.
+	 * Total: 128 + 8 = 136.
+	 * Cache-aligned size = 192.
+	 */
+	const ssize_t calculated_sz = RTE_ALIGN(RTE_DEQUE_SZ + 8, RTE_CACHE_LINE_SIZE);
+	TEST_ASSERT_EQUAL(rte_deque_get_memsize_elem(4, 2), calculated_sz,
+					  "Get memsize function failed.");
+	return 0;
+}
+
+/* Define a Test macro that will allow us to correctly free all the rte_deque
+ * objects that were created as a part of the test in case of a failure.
+ */
+
+#define TEST_DEQUE_MEMSAFE(exp, msg, stmt) do { \
+	if (!(exp)) { \
+		printf("error at %s:%d\tcondition " #exp " failed. Msg: %s\n",	\
+			__func__, __LINE__, msg); \
+		stmt; \
+	 } \
+} while (0)
+
+static int
+test_deque_init(void)
+{
+	{
+	/* (1) Make sure init fails when the flags are not correctly passed in. */
+	struct rte_deque deque;
+
+	/* Calling init with undefined flags should fail. */
+	TEST_ASSERT_EQUAL(rte_deque_init(&deque, "Deque", 10, 0x8),
+					  -EINVAL, "Init failed.");
+
+	/* Calling init with a count that is not a power of 2
+	 * and without setting the RTE_DEQUE_F_EXACT_SZ
+	 * flag should fail.
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_init(&deque, "Deque", 10, 0),
+					  -EINVAL, "Init failed.");
+
+	/* Calling init with a count that is not a power of 2
+	 * Should succeed only if the RTE_DEQUE_F_EXACT_SZ flag is set.
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_init(&deque, "Deque", 10, RTE_DEQUE_F_EXACT_SZ),
+					  0, "Init failed.");
+	}
+
+	{
+	/* Make sure all the fields are being correctly set when creating a
+	 * Deque of a size that is not a power of 2.
+	 */
+	struct rte_deque deque;
+	static const char NAME[] = "Deque";
+
+	/* Calling init with a count that is not a power of 2
+	 * But with RTE_DEQUE_F_EXACT_SZ should succeed.
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_init(&deque, NAME, 10, RTE_DEQUE_F_EXACT_SZ),
+					  0, "Init failed.");
+
+	TEST_ASSERT_BUFFERS_ARE_EQUAL(deque.name, NAME, sizeof(NAME), "Init failed.");
+	TEST_ASSERT_EQUAL(deque.flags, RTE_DEQUE_F_EXACT_SZ, "Init failed.");
+	TEST_ASSERT_EQUAL(deque.size, 16, "Init failed.");
+	TEST_ASSERT_EQUAL(deque.mask, 15, "Init failed.");
+	TEST_ASSERT_EQUAL(deque.capacity, 10, "Init failed.");
+	}
+
+	{
+	/* Make sure all the fields are being correctly set when creating a
+	 * Deque of a size that is a power of 2.
+	 */
+	struct rte_deque deque;
+	static const char NAME[] = "Deque";
+
+	/* Calling init with a count that is a power of 2
+	 * and no flags should also succeed.
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_init(&deque, NAME, 16, 0), 0, "Init failed.");
+
+	TEST_ASSERT_EQUAL(deque.size, 16, "Init failed.");
+	TEST_ASSERT_EQUAL(deque.mask, 15, "Init failed.");
+	TEST_ASSERT_EQUAL(deque.capacity, 15, "Init failed.");
+	}
+	return 0;
+}
+
+static int
+test_deque_create(void)
+{
+	struct rte_deque *deque;
+	const char *NAME = "Deque";
+	deque = rte_deque_create(NAME, 4, 16, 0, 0);
+
+	/* Make sure the deque creation is successful. */
+	TEST_DEQUE_MEMSAFE(deque != NULL, "Deque creation failed.", goto fail);
+	TEST_DEQUE_MEMSAFE(deque->memzone != NULL, "Deque creation failed.", goto fail);
+	return 0;
+fail:
+	rte_deque_free(deque);
+	return -1;
+}
+
+#undef TEST_DEQUE_MEMSAFE
+
+static struct unit_test_suite deque_helper_functions_testsuite = {
+	.suite_name = "Deque library helper functions test suite",
+	.unit_test_cases = {
+		TEST_CASE(test_deque_get_memsize),
+		TEST_CASE(test_deque_init),
+		TEST_CASE(test_deque_create),
+		TEST_CASES_END(), /**< NULL terminate unit test array */
+	},
+};
+
+static int
+deque_helper_functions_autotest_fn(void)
+{
+	return unit_test_suite_runner(&deque_helper_functions_testsuite);
+}
+
+REGISTER_FAST_TEST(deque_helper_functions_autotest, true, true,
+		deque_helper_functions_autotest_fn);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: [PATCH v2 1/2] deque: add multi-thread unsafe double ended queue
  2024-04-24 13:42       ` [PATCH v2 1/2] deque: add multi-thread unsafe double ended queue Aditya Ambadipudi
@ 2024-04-24 15:16         ` Morten Brørup
  2024-04-24 17:21           ` Patrick Robb
  2024-04-24 23:28         ` Mattias Rönnblom
  2024-05-02 20:19         ` [PATCH v3 0/2] deque: add multithread unsafe deque library Aditya Ambadipudi
  2 siblings, 1 reply; 48+ messages in thread
From: Morten Brørup @ 2024-04-24 15:16 UTC (permalink / raw)
  To: Aditya Ambadipudi, dev, jackmin, stephen, matan, viacheslavo,
	roretzla, konstantin.ananyev, hofors
  Cc: wathsala.vithanage, dhruv.tripathi, honnappa.nagarahalli, nd

[...]

> +
> +/* mask of all valid flag values to deque_create() */
> +#define __RTE_DEQUE_F_MASK (RTE_DEQUE_F_EXACT_SZ)
> +ssize_t
> +rte_deque_get_memsize_elem(unsigned int esize, unsigned int count)
> +{
> +	ssize_t sz;
> +
> +	/* Check if element size is a multiple of 4B */
> +	if (esize % 4 != 0) {
> +		rte_log(RTE_LOG_ERR, rte_deque_log_type,
> +			"%s(): element size is not a multiple of 4\n",
> +			__func__);

Double indent when continuing on the next line:

+		rte_log(RTE_LOG_ERR, rte_deque_log_type,
+				"%s(): element size is not a multiple of 4\n",
+				__func__);

Not just here, but multiple locations in the code.

> +
> +		return -EINVAL;
> +	}
> +
> +	/* count must be a power of 2 */
> +	if ((!RTE_IS_POWER_OF_2(count)) || (count > RTE_DEQUE_SZ_MASK)) {
> +		rte_log(RTE_LOG_ERR, rte_deque_log_type,
> +			"%s(): Requested number of elements is invalid,"
> +			"must be power of 2, and not exceed %u\n",
> +			__func__, RTE_DEQUE_SZ_MASK);

Please use shorter error messages, so they can fit on one line in the source code.

Note: DPDK coding style allows 100 chars source code line length, not just 80.

[...]

> +/* create the deque for a given element size */
> +struct rte_deque *
> +rte_deque_create(const char *name, unsigned int esize, unsigned int count,
> +		int socket_id, unsigned int flags)
> +{
> +	char mz_name[RTE_MEMZONE_NAMESIZE];
> +	struct rte_deque *d;
> +	const struct rte_memzone *mz;
> +	ssize_t deque_size;
> +	int mz_flags = 0;
> +	const unsigned int requested_count = count;
> +	int ret;
> +
> +	/* for an exact size deque, round up from count to a power of two */
> +	if (flags & RTE_DEQUE_F_EXACT_SZ)
> +		count = rte_align32pow2(count + 1);
> +
> +	deque_size = rte_deque_get_memsize_elem(esize, count);
> +	if (deque_size < 0) {
> +		rte_errno = -deque_size;
> +		return NULL;
> +	}
> +
> +	ret = snprintf(mz_name, sizeof(mz_name), "%s%s",
> +		RTE_DEQUE_MZ_PREFIX, name);
> +	if (ret < 0 || ret >= (int)sizeof(mz_name)) {
> +		rte_errno = ENAMETOOLONG;
> +		return NULL;
> +	}
> +
> +	/* reserve a memory zone for this deque. If we can't get rte_config or
> +	 * we are secondary process, the memzone_reserve function will set
> +	 * rte_errno for us appropriately - hence no check in this function
> +	 */
> +	mz = rte_memzone_reserve_aligned(mz_name, deque_size, socket_id,
> +					 mz_flags, alignof(struct rte_deque));
> +	if (mz != NULL) {
> +		d = mz->addr;
> +		/* no need to check return value here, we already checked the
> +		 * arguments above
> +		 */
> +		rte_deque_init(d, name, requested_count, flags);

rte_deque_init() error handling is missing here.

> +		d->memzone = mz;
> +	} else {
> +		d = NULL;
> +		rte_log(RTE_LOG_ERR, rte_deque_log_type,
> +			"%s(): Cannot reserve memory\n", __func__);
> +	}
> +	return d;
> +}

[...]

> +#define RTE_DEQUE_MZ_PREFIX "DEQUE_"
> +/** The maximum length of a deque name. */
> +#define RTE_DEQUE_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
> +			   sizeof(RTE_DEQUE_MZ_PREFIX) + 1)
> +
> +/**
> + * Double ended queue (deque) structure.
> + *
> + * The producer and the consumer have a head and a tail index. These indices
> + * are not between 0 and size(deque)-1. These indices are between 0 and
> + * 2^32 -1. Their value is masked while accessing the objects in deque.
> + * These indices are unsigned 32bits. Hence the result of the subtraction is
> + * always a modulo of 2^32 and it is between 0 and capacity.
> + */
> +struct rte_deque {
> +	alignas(RTE_CACHE_LINE_SIZE) char name[RTE_DEQUE_NAMESIZE];

Suggest alternative:
+struct __rte_cache_aligned rte_deque {
+	char name[RTE_DEQUE_NAMESIZE];

> +	/**< Name of the deque */
> +	int flags;
> +	/**< Flags supplied at creation. */
> +	const struct rte_memzone *memzone;
> +	/**< Memzone, if any, containing the rte_deque */
> +
> +	alignas(RTE_CACHE_LINE_SIZE) char pad0; /**< empty cache line */

Why the cache alignment here?

If required, omit the pad0 field and cache align the size field instead.

Alternatively, use RTE_CACHE_GUARD, if that is what you are trying to achieve.

> +
> +	uint32_t size;           /**< Size of deque. */
> +	uint32_t mask;           /**< Mask (size-1) of deque. */
> +	uint32_t capacity;       /**< Usable size of deque */
> +	/** Ring head and tail pointers. */
> +	volatile uint32_t head;
> +	volatile uint32_t tail;
> +};

[...]

> +static __rte_always_inline void
> +__rte_deque_enqueue_elems_head_128(struct rte_deque *d,
> +				const void *obj_table,
> +				unsigned int n)
> +{
> +	unsigned int i;
> +	const uint32_t size = d->size;
> +	uint32_t idx = (d->head & d->mask);
> +	rte_int128_t *deque = (rte_int128_t *)&d[1];
> +	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
> +	if (likely(idx + n <= size)) {
> +		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
> +			memcpy((void *)(deque + idx),
> +				(const void *)(obj + i), 32);

With 100 chars source code line length, this memcpy() fits on one line.
Not just here, but in all the functions.

> +		switch (n & 0x1) {
> +		case 1:
> +			memcpy((void *)(deque + idx),
> +				(const void *)(obj + i), 16);
> +		}
> +	} else {
> +		for (i = 0; idx < size; i++, idx++)
> +			memcpy((void *)(deque + idx),
> +				(const void *)(obj + i), 16);
> +		/* Start at the beginning */
> +		for (idx = 0; i < n; i++, idx++)
> +			memcpy((void *)(deque + idx),
> +				(const void *)(obj + i), 16);
> +	}
> +}
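
For reference, the wrap-around copy in the else-branch above reduces to this
generic sketch (uint32_t elements for simplicity; names are illustrative,
not the DPDK implementation): copy from idx up to the end of the ring
storage, then continue from index 0.

```c
#include <stdint.h>

/* Copy n objects into a ring of `size` slots starting at `idx`,
 * wrapping to slot 0 when the end of the storage is reached. */
static void ring_copy_in(uint32_t *ring, uint32_t size, uint32_t idx,
			 const uint32_t *obj, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n && idx < size; i++, idx++)
		ring[idx] = obj[i];
	/* Start at the beginning */
	for (idx = 0; i < n; i++, idx++)
		ring[idx] = obj[i];
}
```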


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/2] deque: add multi-thread unsafe double ended queue
  2024-04-24 15:16         ` Morten Brørup
@ 2024-04-24 17:21           ` Patrick Robb
  2024-04-25  7:43             ` Ali Alnubani
  0 siblings, 1 reply; 48+ messages in thread
From: Patrick Robb @ 2024-04-24 17:21 UTC (permalink / raw)
  To: Ali Alnubani
  Cc: Aditya Ambadipudi, dev, jackmin, stephen, matan, viacheslavo,
	roretzla, konstantin.ananyev, hofors, wathsala.vithanage,
	dhruv.tripathi, honnappa.nagarahalli, nd, Morten Brørup

[-- Attachment #1: Type: text/plain, Size: 708 bytes --]

Hi Ali,

Wathsala reached out asking how the checkpatch CI check can be updated so
that this series passes checkpatch.

If building the dictionary is a one-time operation for you, you may need to
apply this patch and re-run devtools/build-dict.sh so that the new
dictionary is in place for a v3 of this series.

It looks like these dictionary exceptions are submitted quite rarely. But
if they become more common in the future, you could add a step to your
automation that rebuilds the dictionary every time you run checkpatch,
based on any additions to the exception list that come with a patch. It's
probably not worth the effort at the current low volume of word-exception
additions, though.

Thanks.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 1/2] deque: add multi-thread unsafe double ended queue
  2024-04-24 13:42       ` [PATCH v2 1/2] deque: add multi-thread unsafe double ended queue Aditya Ambadipudi
  2024-04-24 15:16         ` Morten Brørup
@ 2024-04-24 23:28         ` Mattias Rönnblom
  2024-05-02 20:19         ` [PATCH v3 0/2] deque: add multithread unsafe deque library Aditya Ambadipudi
  2 siblings, 0 replies; 48+ messages in thread
From: Mattias Rönnblom @ 2024-04-24 23:28 UTC (permalink / raw)
  To: Aditya Ambadipudi, dev, jackmin, stephen, matan, viacheslavo,
	roretzla, konstantin.ananyev, mb
  Cc: wathsala.vithanage, dhruv.tripathi, honnappa.nagarahalli, nd

On 2024-04-24 15:42, Aditya Ambadipudi wrote:
> From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> 
> Add a multi-thread unsafe double ended queue data structure. This
> library provides a simple and efficient alternative to multi-thread
> safe ring when multi-thread safety is not required.
> 
> Signed-off-by: Aditya Ambadipudi <aditya.ambadipudi@arm.com>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Change-Id: I6f66fa2ebf750adb22ac75f8cb3c2fe8bdb5fa9e
> ---
> v2:
>    * Addressed the spell check warning issue with the word "Deque"
>    * Tried to rename all objects that are named deque to avoid collision with
>      std::deque
>    * Added the deque library to msvc section in meson.build
>    * Renamed API functions to explicitly state whether the function inserts
>      at the head or the tail.
> 
>   .mailmap                   |   1 +
>   devtools/build-dict.sh     |   1 +
>   lib/deque/meson.build      |  11 +
>   lib/deque/rte_deque.c      | 193 +++++++++++++
>   lib/deque/rte_deque.h      | 533 ++++++++++++++++++++++++++++++++++++
>   lib/deque/rte_deque_core.h |  81 ++++++
>   lib/deque/rte_deque_pvt.h  | 538 +++++++++++++++++++++++++++++++++++++
>   lib/deque/rte_deque_zc.h   | 430 +++++++++++++++++++++++++++++
>   lib/deque/version.map      |  14 +
>   lib/meson.build            |   2 +
>   10 files changed, 1804 insertions(+)
>   create mode 100644 lib/deque/meson.build
>   create mode 100644 lib/deque/rte_deque.c
>   create mode 100644 lib/deque/rte_deque.h
>   create mode 100644 lib/deque/rte_deque_core.h
>   create mode 100644 lib/deque/rte_deque_pvt.h
>   create mode 100644 lib/deque/rte_deque_zc.h
>   create mode 100644 lib/deque/version.map
> 
> diff --git a/.mailmap b/.mailmap
> index 3843868716..8e705ab6ab 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -17,6 +17,7 @@ Adam Bynes <adambynes@outlook.com>
>   Adam Dybkowski <adamx.dybkowski@intel.com>
>   Adam Ludkiewicz <adam.ludkiewicz@intel.com>
>   Adham Masarwah <adham@nvidia.com> <adham@mellanox.com>
> +Aditya Ambadipudi <aditya.ambadipudi@arm.com>
>   Adrian Moreno <amorenoz@redhat.com>
>   Adrian Podlawski <adrian.podlawski@intel.com>
>   Adrien Mazarguil <adrien.mazarguil@6wind.com>
> diff --git a/devtools/build-dict.sh b/devtools/build-dict.sh
> index a8cac49029..595d8f9277 100755
> --- a/devtools/build-dict.sh
> +++ b/devtools/build-dict.sh
> @@ -17,6 +17,7 @@ sed '/^..->/d' |
>   sed '/^uint->/d' |
>   sed "/^doesn'->/d" |
>   sed '/^wasn->/d' |
> +sed '/^deque.*->/d' |
>   
>   # print to stdout
>   cat
> diff --git a/lib/deque/meson.build b/lib/deque/meson.build
> new file mode 100644
> index 0000000000..1ff45fc39f
> --- /dev/null
> +++ b/lib/deque/meson.build
> @@ -0,0 +1,11 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2024 Arm Limited
> +
> +sources = files('rte_deque.c')
> +headers = files('rte_deque.h')
> +# most sub-headers are not for direct inclusion
> +indirect_headers += files (
> +        'rte_deque_core.h',
> +        'rte_deque_pvt.h',
> +        'rte_deque_zc.h'
> +)
> diff --git a/lib/deque/rte_deque.c b/lib/deque/rte_deque.c
> new file mode 100644
> index 0000000000..b83a6c43c4
> --- /dev/null
> +++ b/lib/deque/rte_deque.c
> @@ -0,0 +1,193 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2024 Arm Limited
> + */
> +
> +#include <stdalign.h>
> +#include <string.h>
> +#include <stdint.h>
> +#include <inttypes.h>
> +#include <errno.h>
> +#include <sys/queue.h>
> +
> +#include <rte_common.h>
> +#include <rte_log.h>
> +#include <rte_memzone.h>
> +#include <rte_malloc.h>
> +#include <rte_eal_memconfig.h>
> +#include <rte_errno.h>
> +#include <rte_string_fns.h>
> +
> +#include "rte_deque.h"
> +
> +/* mask of all valid flag values to deque_create() */
> +#define __RTE_DEQUE_F_MASK (RTE_DEQUE_F_EXACT_SZ)
> +ssize_t
> +rte_deque_get_memsize_elem(unsigned int esize, unsigned int count)
> +{
> +	ssize_t sz;
> +
> +	/* Check if element size is a multiple of 4B */
> +	if (esize % 4 != 0) {
> +		rte_log(RTE_LOG_ERR, rte_deque_log_type,
> +			"%s(): element size is not a multiple of 4\n",
> +			__func__);
> +
> +		return -EINVAL;
> +	}
> +

Use RTE_ASSERT()/VERIFY() instead of returning an error code for API 
contract violations. The application can't do anything useful with those 
anyway. (If you think otherwise, please give an example of an app 
recovering from one of these -EINVAL).

> +	/* count must be a power of 2 */
> +	if ((!RTE_IS_POWER_OF_2(count)) || (count > RTE_DEQUE_SZ_MASK)) {
> +		rte_log(RTE_LOG_ERR, rte_deque_log_type,
> +			"%s(): Requested number of elements is invalid, "
> +			"must be a power of 2, and must not exceed %u\n",
> +			__func__, RTE_DEQUE_SZ_MASK);
> +
> +		return -EINVAL;
> +	}
> +
> +	sz = sizeof(struct rte_deque) + (ssize_t)count * esize;
> +	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);

Why is the size cache-line aligned?

> +	return sz;
> +}
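
For illustration, the size computation above reduces to the following
standalone sketch (hypothetical 64-byte cache line and a stand-in header
size; the real code uses sizeof(struct rte_deque) and RTE_ALIGN):

```c
#include <stddef.h>

#define CACHE_LINE 64	/* hypothetical cache line size */

/* Header size plus count * esize, rounded up to a whole cache line. */
static size_t deque_memsize(size_t hdr_sz, unsigned int esize,
			    unsigned int count)
{
	size_t sz = hdr_sz + (size_t)count * esize;

	return (sz + CACHE_LINE - 1) & ~(size_t)(CACHE_LINE - 1);
}
```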
> +
> +void
> +rte_deque_reset(struct rte_deque *d)
> +{
> +	d->head = 0;
> +	d->tail = 0;
> +}
> +
> +int
> +rte_deque_init(struct rte_deque *d, const char *name, unsigned int count,
> +	unsigned int flags)
> +{
> +	int ret;
> +
> +	/* compilation-time checks */
> +	RTE_BUILD_BUG_ON((sizeof(struct rte_deque) &
> +			  RTE_CACHE_LINE_MASK) != 0);
> +
> +	/* future proof flags, only allow supported values */
> +	if (flags & ~__RTE_DEQUE_F_MASK) {

More RTE_VERIFY().

> +		rte_log(RTE_LOG_ERR, rte_deque_log_type,
> +			"%s(): Unsupported flags requested %#x\n",
> +			__func__, flags);
> +		return -EINVAL;
> +	}
> +
> +	/* init the deque structure */
> +	memset(d, 0, sizeof(*d));
> +	ret = strlcpy(d->name, name, sizeof(d->name));
> +	if (ret < 0 || ret >= (int)sizeof(d->name))
> +		return -ENAMETOOLONG;

Is the max name length known? In that case, RTE_ASSERT().

> +	d->flags = flags;
> +
> +	if (flags & RTE_DEQUE_F_EXACT_SZ) {
> +		d->size = rte_align32pow2(count + 1);
> +		d->mask = d->size - 1;
> +		d->capacity = count;
> +	} else {
> +		if ((!RTE_IS_POWER_OF_2(count)) || (count > RTE_DEQUE_SZ_MASK)) {
> +			rte_log(RTE_LOG_ERR, rte_deque_log_type,
> +				"%s(): Requested size is invalid, must be a power"
> +				" of 2, and must not exceed the size limit %u\n",
> +				__func__, RTE_DEQUE_SZ_MASK);
> +			return -EINVAL;
> +		}
> +		d->size = count;
> +		d->mask = count - 1;
> +		d->capacity = d->mask;
> +	}
> +
> +	return 0;
> +}
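
The sizing logic for the RTE_DEQUE_F_EXACT_SZ case can be checked with a
standalone sketch (align32pow2 re-implemented here so the example compiles
without DPDK headers; helper names are illustrative):

```c
#include <stdint.h>

/* Round up to the next power of two (32-bit). */
static uint32_t align32pow2(uint32_t x)
{
	x--;
	x |= x >> 1;
	x |= x >> 2;
	x |= x >> 4;
	x |= x >> 8;
	x |= x >> 16;
	return x + 1;
}

/* Exact-size case: storage is the next power of two above count + 1,
 * but the usable capacity is exactly the requested count. Returns the
 * capacity and fills in the storage size and index mask. */
static uint32_t exact_sz_capacity(uint32_t count, uint32_t *size,
				  uint32_t *mask)
{
	*size = align32pow2(count + 1);
	*mask = *size - 1;
	return count;
}
```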
> +
> +/* create the deque for a given element size */
> +struct rte_deque *
> +rte_deque_create(const char *name, unsigned int esize, unsigned int count,
> +		int socket_id, unsigned int flags)
> +{
> +	char mz_name[RTE_MEMZONE_NAMESIZE];
> +	struct rte_deque *d;
> +	const struct rte_memzone *mz;
> +	ssize_t deque_size;
> +	int mz_flags = 0;
> +	const unsigned int requested_count = count;
> +	int ret;
> +
> +	/* for an exact size deque, round up from count to a power of two */
> +	if (flags & RTE_DEQUE_F_EXACT_SZ)
> +		count = rte_align32pow2(count + 1);
> +
> +	deque_size = rte_deque_get_memsize_elem(esize, count);
> +	if (deque_size < 0) {
> +		rte_errno = -deque_size;
> +		return NULL;
> +	}
> +
> +	ret = snprintf(mz_name, sizeof(mz_name), "%s%s",
> +		RTE_DEQUE_MZ_PREFIX, name);
> +	if (ret < 0 || ret >= (int)sizeof(mz_name)) {
> +		rte_errno = ENAMETOOLONG;
> +		return NULL;
> +	}
> +
> +	/* reserve a memory zone for this deque. If we can't get rte_config or
> +	 * we are secondary process, the memzone_reserve function will set
> +	 * rte_errno for us appropriately - hence no check in this function
> +	 */

Why not use rte_malloc()?

> +	mz = rte_memzone_reserve_aligned(mz_name, deque_size, socket_id,
> +					 mz_flags, alignof(struct rte_deque));
> +	if (mz != NULL) {
> +		d = mz->addr;
> +		/* no need to check return value here, we already checked the
> +		 * arguments above
> +		 */
> +		rte_deque_init(d, name, requested_count, flags);
> +		d->memzone = mz;
> +	} else {
> +		d = NULL;
> +		rte_log(RTE_LOG_ERR, rte_deque_log_type,
> +			"%s(): Cannot reserve memory\n", __func__);
> +	}
> +	return d;
> +}
> +
> +/* free the deque */
> +void
> +rte_deque_free(struct rte_deque *d)
> +{
> +	if (d == NULL)
> +		return;
> +
> +	/*
> +	 * Deque was not created with rte_deque_create,
> +	 * therefore, there is no memzone to free.
> +	 */

In case it wasn't created, it should not be free'd, I would argue. Add a 
separate function (deinit?) to reverse init.

> +	if (d->memzone == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_deque_log_type,
> +			"%s(): Cannot free deque, not created "
> +			"with rte_deque_create()\n", __func__);
> +		return;
> +	}
> +
> +	if (rte_memzone_free(d->memzone) != 0)
> +		rte_log(RTE_LOG_ERR, rte_deque_log_type,
> +			"%s(): Cannot free memory\n", __func__);
> +}
> +
> +/* dump the status of the deque on the console */
> +void
> +rte_deque_dump(FILE *f, const struct rte_deque *d)
> +{
> +	fprintf(f, "deque <%s>@%p\n", d->name, d);
> +	fprintf(f, "  flags=%x\n", d->flags);
> +	fprintf(f, "  size=%"PRIu32"\n", d->size);
> +	fprintf(f, "  capacity=%"PRIu32"\n", d->capacity);
> +	fprintf(f, "  head=%"PRIu32"\n", d->head);
> +	fprintf(f, "  tail=%"PRIu32"\n", d->tail);
> +	fprintf(f, "  used=%u\n", rte_deque_count(d));
> +	fprintf(f, "  avail=%u\n", rte_deque_free_count(d));
> +}
> +
> +RTE_LOG_REGISTER_DEFAULT(rte_deque_log_type, ERR);
> diff --git a/lib/deque/rte_deque.h b/lib/deque/rte_deque.h
> new file mode 100644
> index 0000000000..6633eab377
> --- /dev/null
> +++ b/lib/deque/rte_deque.h
> @@ -0,0 +1,533 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2024 Arm Limited
> + */
> +
> +#ifndef _RTE_DEQUE_H_
> +#define _RTE_DEQUE_H_
> +
> +/**
> + * @file
> + * RTE double ended queue (Deque)
> + *
> + * This fixed-size queue does not provide concurrent access by
> + * multiple threads. If required, the application should use locks
> + * to protect the deque from concurrent access.
> + *
> + * - Double ended queue
> + * - Maximum size is fixed
> + * - Store objects of any size
> + * - Single/bulk/burst dequeue at tail or head
> + * - Single/bulk/burst enqueue at head or tail
> + *
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <rte_deque_core.h>
> +#include <rte_deque_pvt.h>
> +#include <rte_deque_zc.h>
> +
> +/**
> + * Calculate the memory size needed for a deque
> + *
> + * This function returns the number of bytes needed for a deque, given
> + * the number of objects and the object size. This value is the sum of
> + * the size of the structure rte_deque and the size of the memory needed
> + * by the objects. The value is aligned to a cache line size.
> + *
> + * @param esize
> + *   The size of deque object, in bytes. It must be a multiple of 4.
> + * @param count
> + *   The number of objects in the deque (must be a power of 2).
> + * @return
> + *   - The memory size needed for the deque on success.
> + *   - -EINVAL if count is not a power of 2.
> + */
> +__rte_experimental
> +ssize_t rte_deque_get_memsize_elem(unsigned int esize, unsigned int count);
> +
> +/**
> + * Initialize a deque structure.
> + *
> + * Initialize a deque structure in memory pointed by "d". The size of the
> + * memory area must be large enough to store the deque structure and the
> + * object table. It is advised to use rte_deque_get_memsize() to get the
> + * appropriate size.
> + *
> + * The deque size is set to *count*, which must be a power of two.
> + * The real usable deque size is *count-1* instead of *count* to
> + * differentiate a full deque from an empty deque.
> + *
> + * @param d
> + *   The pointer to the deque structure followed by the objects table.
> + * @param name
> + *   The name of the deque.
> + * @param count
> + *   The number of objects in the deque (must be a power of 2,
> + *   unless RTE_DEQUE_F_EXACT_SZ is set in flags).

What would be the performance implications of always having exact sizes, 
and exact-length allocations?

You can't have a mask, but do you need to?

> + * @param flags
> + *   - RTE_DEQUE_F_EXACT_SZ: If this flag is set, the deque will hold
> + *     exactly the requested number of objects, and the requested size
> + *     will be rounded up to the next power of two, but the usable space
> + *     will be exactly that requested. Worst case, if a power-of-2 size is
> + *     requested, half the deque space will be wasted.
> + *     Without this flag set, the deque size requested must be a power of 2,
> + *     and the usable space will be that size - 1.
> + * @return
> + *   0 on success, or a negative value on error.
> + */
> +__rte_experimental
> +int rte_deque_init(struct rte_deque *d, const char *name, unsigned int count,
> +		unsigned int flags);
> +
> +/**
> + * Create a new deque named *name* in memory.
> + *

Why does deques have names, when linked lists don't?

> + * This function uses ``memzone_reserve()`` to allocate memory. Then it
> + * calls rte_deque_init() to initialize an empty deque.
> + *
> + * The new deque size is set to *count*, which must be a power of two.
> + * The real usable deque size is *count-1* instead of *count* to
> + * differentiate a full deque from an empty deque.
> + *
> + * @param name
> + *   The name of the deque.
> + * @param esize
> + *   The size of deque object, in bytes. It must be a multiple of 4.
> + * @param count
> + *   The size of the deque (must be a power of 2,
> + *   unless RTE_DEQUE_F_EXACT_SZ is set in flags).
> + * @param socket_id
> + *   The *socket_id* argument is the socket identifier in case of
> + *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
> + *   constraint for the reserved zone.
> + * @param flags
> + *   - RTE_DEQUE_F_EXACT_SZ: If this flag is set, the deque will hold exactly the
> + *     requested number of entries, and the requested size will be rounded up
> + *     to the next power of two, but the usable space will be exactly that
> + *     requested. Worst case, if a power-of-2 size is requested, half the
> + *     deque space will be wasted.
> + *     Without this flag set, the deque size requested must be a power of 2,
> + *     and the usable space will be that size - 1.
> + * @return
> + *   On success, the pointer to the new allocated deque. NULL on error with
> + *    rte_errno set appropriately. Possible errno values include:
> + *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
> + *    - EINVAL - count provided is not a power of 2
> + *    - ENOSPC - the maximum number of memzones has already been allocated
> + *    - EEXIST - a memzone with the same name already exists
> + *    - ENOMEM - no appropriate memory area found in which to create memzone
> + */
> +__rte_experimental
> +struct rte_deque *rte_deque_create(const char *name, unsigned int esize,
> +				unsigned int count, int socket_id,
> +				unsigned int flags);
> +
> +/**
> + * De-allocate all memory used by the deque.
> + *
> + * @param d
> + *   Deque to free.
> + *   If NULL then, the function does nothing.
> + */
> +__rte_experimental
> +void rte_deque_free(struct rte_deque *d);
> +
> +/**
> + * Dump the status of the deque to a file.
> + *
> + * @param f
> + *   A pointer to a file for output
> + * @param d
> + *   A pointer to the deque structure.
> + */
> +__rte_experimental
> +void rte_deque_dump(FILE *f, const struct rte_deque *d);
> +
> +/**
> + * Return the number of entries in a deque.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @return
> + *   The number of entries in the deque.
> + */
> +static inline unsigned int
> +rte_deque_count(const struct rte_deque *d)
> +{
> +	return (d->head - d->tail) & d->mask;
> +}
> +
> +/**
> + * Return the number of free entries in a deque.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @return
> + *   The number of free entries in the deque.
> + */
> +static inline unsigned int
> +rte_deque_free_count(const struct rte_deque *d)
> +{
> +	return d->capacity - rte_deque_count(d);
> +}
> +
> +/**
> + * Enqueue fixed number of objects on a deque at the head.
> + *
> + * This function copies the objects at the head of the deque and
> + * moves the head index.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @param obj_table
> + *   A pointer to a table of objects.

Use "array", not "table".

> + * @param esize
> + *   The size of deque object, in bytes. It must be a multiple of 4.
> + *   This must be the same value used while creating the deque. Otherwise
> + *   the results are undefined.
> + * @param n
> + *   The number of objects to add in the deque from the obj_table.
> + * @param free_space
> + *   Returns the amount of space in the deque after the enqueue operation
> + *   has finished.

I think you should remove the parameter. Just use the free count 
function if you need this information.

> + * @return
> + *   The number of objects enqueued, either 0 or n

Do we really need both a "bulk" and a "burst" function? Seems to me like 
burst-only would be good enough, and in case you want to know if you can 
fit the whole array, you can just check first. No concurrency issues, 
since this thingy is not MT safe.

> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_deque_head_enqueue_bulk_elem(struct rte_deque *d,

Maybe use "push" and "pop" instead of "enqueue"/"dequeue"? Or maybe 
"append" and "pop" (like Python does). I think it make sense to not copy 
too much of the rte_ring terminology and design, since this thing is 
something else, way simpler, non-MT safe. Python also uses "left" and 
"right", rather than head and tail. I guess in the deque case, what is 
head and what is tail is not entirely clear.

Also, doesn't "enqueue" imply the operation is working against the tail, 
not the head?

> +			const void *obj_table,
> +			unsigned int esize,
> +			unsigned int n,
> +			unsigned int *free_space)
> +{
> +	*free_space = rte_deque_free_count(d);
> +	if (unlikely(n > *free_space))
> +		return 0;
> +	*free_space -= n;
> +	return __rte_deque_enqueue_at_head(d, obj_table, esize, n);
> +}
> +
> +/**
> + * Enqueue up to a maximum number of objects on a deque at the head.
> + *
> + * This function copies the objects at the head of the deque and
> + * moves the head index.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @param obj_table
> + *   A pointer to a table of objects.
> + * @param esize
> + *   The size of deque object, in bytes. It must be a multiple of 4.
> + *   This must be the same value used while creating the deque. Otherwise
> + *   the results are undefined.
> + * @param n
> + *   The number of objects to add in the deque from the obj_table.
> + * @param free_space
> + *   Returns the amount of space in the deque after the enqueue operation
> + *   has finished.
> + * @return
> + *   - n: Actual number of objects enqueued.
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_deque_head_enqueue_burst_elem(struct rte_deque *d, const void *obj_table,
> +			unsigned int esize, unsigned int n,
> +			unsigned int *free_space)
> +{
> +	unsigned int avail_space = rte_deque_free_count(d);
> +	unsigned int to_be_enqueued = (n <= avail_space ? n : avail_space);
> +	*free_space = avail_space - to_be_enqueued;
> +	return __rte_deque_enqueue_at_head(d, obj_table, esize, to_be_enqueued);
> +}
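
For comparison, the bulk and burst contracts boil down to the following
(illustrative helpers, not the DPDK API): bulk is all-or-nothing, burst
moves as many objects as currently fit.

```c
/* All-or-nothing: returns n only if all n objects fit, else 0. */
static unsigned int enqueue_bulk(unsigned int free_entries, unsigned int n)
{
	return n <= free_entries ? n : 0;
}

/* Best-effort: returns however many of the n objects fit. */
static unsigned int enqueue_burst(unsigned int free_entries, unsigned int n)
{
	return n <= free_entries ? n : free_entries;
}
```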
> +
> +/**
> + * Enqueue fixed number of objects on a deque at the tail.
> + *
> + * This function copies the objects at the tail of the deque and
> + * moves the tail index (backwards).
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @param obj_table
> + *   A pointer to a table of objects.
> + * @param esize
> + *   The size of deque object, in bytes. It must be a multiple of 4.
> + *   This must be the same value used while creating the deque. Otherwise
> + *   the results are undefined.
> + * @param n
> + *   The number of objects to add in the deque from the obj_table.
> + * @param free_space
> + *   Returns the amount of space in the deque after the enqueue operation
> + *   has finished.
> + * @return
> + *   The number of objects enqueued, either 0 or n
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_deque_tail_enqueue_bulk_elem(struct rte_deque *d,
> +				 const void *obj_table, unsigned int esize,
> +				 unsigned int n, unsigned int *free_space)
> +{
> +	*free_space = rte_deque_free_count(d);
> +	if (unlikely(n > *free_space))
> +		return 0;
> +	*free_space -= n;
> +	return __rte_deque_enqueue_at_tail(d, obj_table, esize, n);
> +}
> +
> +/**
> + * Enqueue up to a maximum number of objects on a deque at the tail.
> + *
> + * This function copies the objects at the tail of the deque and
> + * moves the tail index (backwards).
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @param obj_table
> + *   A pointer to a table of objects.
> + * @param esize
> + *   The size of deque object, in bytes. It must be a multiple of 4.
> + *   This must be the same value used while creating the deque. Otherwise
> + *   the results are undefined.
> + * @param n
> + *   The number of objects to add in the deque from the obj_table.
> + * @param free_space
> + *   Returns the amount of space in the deque after the enqueue operation
> + *   has finished.
> + * @return
> + *   - n: Actual number of objects enqueued.
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_deque_tail_enqueue_burst_elem(struct rte_deque *d,
> +				const void *obj_table, unsigned int esize,
> +				unsigned int n, unsigned int *free_space)
> +{
> +	unsigned int avail_space = rte_deque_free_count(d);
> +	unsigned int to_be_enqueued = (n <= avail_space ? n : avail_space);
> +	*free_space = avail_space - to_be_enqueued;
> +	return __rte_deque_enqueue_at_tail(d, obj_table, esize, to_be_enqueued);
> +}
> +
> +/**
> + * Dequeue a fixed number of objects from a deque at tail.
> + *
> + * This function copies the objects from the tail of the deque and
> + * moves the tail index.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @param obj_table
> + *   A pointer to a table of objects that will be filled.
> + * @param esize
> + *   The size of deque object, in bytes. It must be a multiple of 4.
> + *   This must be the same value used while creating the deque. Otherwise
> + *   the results are undefined.
> + * @param n
> + *   The number of objects to dequeue from the deque to the obj_table.
> + * @param available
> + *   Returns the number of remaining deque entries after the dequeue
> + *   has finished.
> + * @return
> + *   The number of objects dequeued, either 0 or n
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_deque_tail_dequeue_bulk_elem(struct rte_deque *d, void *obj_table,
> +			unsigned int esize, unsigned int n,
> +			unsigned int *available)
> +{
> +	*available = rte_deque_count(d);
> +	if (unlikely(n > *available))
> +		return 0;
> +	*available -= n;
> +	return __rte_deque_dequeue_at_tail(d, obj_table, esize, n);
> +}
> +
> +/**
> + * Dequeue up to a maximum number of objects from a deque at tail.
> + *
> + * This function copies the objects from the tail of the deque and
> + * moves the tail index.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @param obj_table
> + *   A pointer to a table of objects that will be filled.
> + * @param esize
> + *   The size of deque object, in bytes. It must be a multiple of 4.
> + *   This must be the same value used while creating the deque. Otherwise
> + *   the results are undefined.
> + * @param n
> + *   The number of objects to dequeue from the deque to the obj_table.
> + * @param available
> + *   Returns the number of remaining deque entries after the dequeue
> + *   has finished.
> + * @return
> + *   - Number of objects dequeued
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_deque_tail_dequeue_burst_elem(struct rte_deque *d, void *obj_table,
> +			unsigned int esize, unsigned int n,
> +			unsigned int *available)
> +{
> +	unsigned int count = rte_deque_count(d);
> +	unsigned int to_be_dequeued = (n <= count ? n : count);
> +	*available = count - to_be_dequeued;
> +	return __rte_deque_dequeue_at_tail(d, obj_table, esize, to_be_dequeued);
> +}
> +
> +/**
> + * Dequeue a fixed number of objects from a deque from the head.
> + *
> + * This function copies the objects from the head of the deque and
> + * moves the head index (backwards).
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @param obj_table
> + *   A pointer to a table of objects that will be filled.
> + * @param esize
> + *   The size of deque object, in bytes. It must be a multiple of 4.
> + *   This must be the same value used while creating the deque. Otherwise
> + *   the results are undefined.
> + * @param n
> + *   The number of objects to dequeue from the deque to the obj_table.
> + * @param available
> + *   Returns the number of remaining deque entries after the dequeue
> + *   has finished.
> + * @return
> + *   The number of objects dequeued, either 0 or n
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_deque_head_dequeue_bulk_elem(struct rte_deque *d, void *obj_table,
> +			unsigned int esize, unsigned int n,
> +			unsigned int *available)
> +{
> +	*available = rte_deque_count(d);
> +	if (unlikely(n > *available))
> +		return 0;
> +	*available -= n;
> +	return __rte_deque_dequeue_at_head(d, obj_table, esize, n);
> +}
> +
> +/**
> + * Dequeue up to a maximum number of objects from a deque from the head.
> + *
> + * This function copies the objects from the head of the deque and
> + * moves the head index (backwards).
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @param obj_table
> + *   A pointer to a table of objects that will be filled.
> + * @param esize
> + *   The size of deque object, in bytes. It must be a multiple of 4.
> + *   This must be the same value used while creating the deque. Otherwise
> + *   the results are undefined.
> + * @param n
> + *   The number of objects to dequeue from the deque to the obj_table.
> + * @param available
> + *   Returns the number of remaining deque entries after the dequeue
> + *   has finished.
> + * @return
> + *   - Number of objects dequeued
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_deque_head_dequeue_burst_elem(struct rte_deque *d, void *obj_table,
> +			unsigned int esize, unsigned int n,
> +			unsigned int *available)
> +{
> +	unsigned int count = rte_deque_count(d);
> +	unsigned int to_be_dequeued = (n <= count ? n : count);
> +	*available = count - to_be_dequeued;
> +	return __rte_deque_dequeue_at_head(d, obj_table, esize, to_be_dequeued);
> +}
> +
> +/**
> + * Flush a deque.
> + *
> + * This function flush all the objects in a deque
> + *
> + * @warning
> + * Make sure the deque is not in use while calling this function.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + */
> +__rte_experimental
> +void rte_deque_reset(struct rte_deque *d);
> +
> +/**
> + * Test if a deque is full.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @return
> + *   - 1: The deque is full.
> + *   - 0: The deque is not full.
> + */
> +static inline int
> +rte_deque_full(const struct rte_deque *d)
> +{
> +	return rte_deque_free_count(d) == 0;
> +}
> +
> +/**
> + * Test if a deque is empty.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @return
> + *   - 1: The deque is empty.
> + *   - 0: The deque is not empty.
> + */
> +static inline int
> +rte_deque_empty(const struct rte_deque *d)
> +{
> +	return d->tail == d->head;
> +}
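
On why the usable capacity is size - 1 in the power-of-two case: head ==
tail must unambiguously mean "empty", so one slot is sacrificed to keep
"full" distinguishable. A standalone sketch (illustrative, not the DPDK
API):

```c
#include <stdint.h>

/* Empty: the two free-running indices coincide (count == 0). */
static int deque_empty(uint32_t head, uint32_t tail)
{
	return head == tail;
}

/* Full: count == capacity == mask == size - 1. If all "size" slots were
 * usable, a full deque would also have head == tail after masking and
 * could not be told apart from an empty one. */
static int deque_full(uint32_t head, uint32_t tail, uint32_t mask)
{
	return ((head - tail) & mask) == mask;
}
```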
> +
> +/**
> + * Return the size of the deque.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @return
> + *   The size of the data store used by the deque.

What is the "data store"? And the size in what unit: elements or bytes?

> + *   NOTE: this is not the same as the usable space in the deque. To query that
> + *   use ``rte_deque_get_capacity()``.
> + */
> +static inline unsigned int
> +rte_deque_get_size(const struct rte_deque *d)
> +{
> +	return d->size;
> +}
> +
> +/**
> + * Return the number of objects which can be stored in the deque.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @return
> + *   The usable size of the deque.
> + */
> +static inline unsigned int
> +rte_deque_get_capacity(const struct rte_deque *d)
> +{
> +	return d->capacity;
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_DEQUE_H_ */
> diff --git a/lib/deque/rte_deque_core.h b/lib/deque/rte_deque_core.h
> new file mode 100644
> index 0000000000..0bb8695c8a
> --- /dev/null
> +++ b/lib/deque/rte_deque_core.h
> @@ -0,0 +1,81 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2024 Arm Limited
> + */
> +
> +#ifndef _RTE_DEQUE_CORE_H_
> +#define _RTE_DEQUE_CORE_H_
> +
> +/**
> + * @file
> + * This file contains definition of RTE deque structure, init flags and
> + * some related macros. This file should not be included directly,
> + * include rte_deque.h instead.
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <stdint.h>
> +#include <string.h>
> +#include <errno.h>
> +#include <rte_common.h>
> +#include <rte_config.h>
> +#include <rte_memory.h>
> +#include <rte_lcore.h>
> +#include <rte_atomic.h>
> +#include <rte_branch_prediction.h>
> +#include <rte_memzone.h>
> +#include <rte_pause.h>
> +#include <rte_debug.h>
> +
> +extern int rte_deque_log_type;
> +
> +#define RTE_DEQUE_MZ_PREFIX "DEQUE_"
> +/** The maximum length of a deque name. */
> +#define RTE_DEQUE_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
> +			   sizeof(RTE_DEQUE_MZ_PREFIX) + 1)
> +
> +/**
> + * Double ended queue (deque) structure.
> + *
> + * The producer and the consumer have a head and a tail index. These indices
> + * are not between 0 and size(deque)-1. These indices are between 0 and
> + * 2^32 -1. Their value is masked while accessing the objects in deque.
> + * These indices are unsigned 32bits. Hence the result of the subtraction is
> + * always a modulo of 2^32 and it is between 0 and capacity.
> + */
> +struct rte_deque {
> +	alignas(RTE_CACHE_LINE_SIZE) char name[RTE_DEQUE_NAMESIZE];
> +	/**< Name of the deque */
> +	int flags;
> +	/**< Flags supplied at creation. */
> +	const struct rte_memzone *memzone;
> +	/**< Memzone, if any, containing the rte_deque */
> +
> +	alignas(RTE_CACHE_LINE_SIZE) char pad0; /**< empty cache line */
> +
> +	uint32_t size;           /**< Size of deque. */
> +	uint32_t mask;           /**< Mask (size-1) of deque. */
> +	uint32_t capacity;       /**< Usable size of deque */
> +	/** Ring head and tail pointers. */
> +	volatile uint32_t head;
> +	volatile uint32_t tail;

Remove volatile.

> +};
> +
> +/**
> + * Deque is to hold exactly requested number of entries.
> + * Without this flag set, the deque size requested must be a power of 2, and the
> + * usable space will be that size - 1. With the flag, the requested size will
> + * be rounded up to the next power of two, but the usable space will be exactly
> + * that requested. Worst case, if a power-of-2 size is requested, half the
> + * deque space will be wasted.
> + */
> +#define RTE_DEQUE_F_EXACT_SZ 0x0004
> +#define RTE_DEQUE_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_DEQUE_CORE_H_ */
> diff --git a/lib/deque/rte_deque_pvt.h b/lib/deque/rte_deque_pvt.h
> new file mode 100644
> index 0000000000..931bbd4d19
> --- /dev/null
> +++ b/lib/deque/rte_deque_pvt.h
> @@ -0,0 +1,538 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2024 Arm Limited
> + */
> +
> +#ifndef _RTE_DEQUE_PVT_H_
> +#define _RTE_DEQUE_PVT_H_
> +
> +#define __RTE_DEQUE_COUNT(d) ((d->head - d->tail) & d->mask)
> +#define __RTE_DEQUE_FREE_SPACE(d) (d->capacity - __RTE_DEQUE_COUNT(d))
> +
> +static __rte_always_inline void
> +__rte_deque_enqueue_elems_head_32(struct rte_deque *d,
> +				const unsigned int size,
> +				uint32_t idx,
> +				const void *obj_table,
> +				unsigned int n)
> +{
> +	unsigned int i;
> +	uint32_t *deque = (uint32_t *)&d[1];
> +	const uint32_t *obj = (const uint32_t *)obj_table;
> +	if (likely(idx + n <= size)) {
> +		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
> +			deque[idx] = obj[i];
> +			deque[idx + 1] = obj[i + 1];
> +			deque[idx + 2] = obj[i + 2];
> +			deque[idx + 3] = obj[i + 3];
> +			deque[idx + 4] = obj[i + 4];
> +			deque[idx + 5] = obj[i + 5];
> +			deque[idx + 6] = obj[i + 6];
> +			deque[idx + 7] = obj[i + 7];
> +		}
> +		switch (n & 0x7) {
> +		case 7:
> +			deque[idx++] = obj[i++]; /* fallthrough */
> +		case 6:
> +			deque[idx++] = obj[i++]; /* fallthrough */
> +		case 5:
> +			deque[idx++] = obj[i++]; /* fallthrough */
> +		case 4:
> +			deque[idx++] = obj[i++]; /* fallthrough */
> +		case 3:
> +			deque[idx++] = obj[i++]; /* fallthrough */
> +		case 2:
> +			deque[idx++] = obj[i++]; /* fallthrough */
> +		case 1:
> +			deque[idx++] = obj[i++]; /* fallthrough */
> +		}
> +	} else {
> +		for (i = 0; idx < size; i++, idx++)
> +			deque[idx] = obj[i];
> +		/* Start at the beginning */
> +		for (idx = 0; i < n; i++, idx++)
> +			deque[idx] = obj[i];
> +	}
> +}
> +
> +static __rte_always_inline void
> +__rte_deque_enqueue_elems_head_64(struct rte_deque *d,
> +				const void *obj_table,
> +				unsigned int n)
> +{
> +	unsigned int i;
> +	const uint32_t size = d->size;
> +	uint32_t idx = (d->head & d->mask);
> +	uint64_t *deque = (uint64_t *)&d[1];
> +	const unaligned_uint64_t *obj = (const unaligned_uint64_t *)obj_table;
> +	if (likely(idx + n <= size)) {
> +		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> +			deque[idx] = obj[i];
> +			deque[idx + 1] = obj[i + 1];
> +			deque[idx + 2] = obj[i + 2];
> +			deque[idx + 3] = obj[i + 3];
> +		}
> +		switch (n & 0x3) {
> +		case 3:
> +			deque[idx++] = obj[i++]; /* fallthrough */
> +		case 2:
> +			deque[idx++] = obj[i++]; /* fallthrough */
> +		case 1:
> +			deque[idx++] = obj[i++]; /* fallthrough */
> +		}
> +	} else {
> +		for (i = 0; idx < size; i++, idx++)
> +			deque[idx] = obj[i];
> +		/* Start at the beginning */
> +		for (idx = 0; i < n; i++, idx++)
> +			deque[idx] = obj[i];
> +	}
> +}
> +
> +static __rte_always_inline void
> +__rte_deque_enqueue_elems_head_128(struct rte_deque *d,
> +				const void *obj_table,
> +				unsigned int n)
> +{
> +	unsigned int i;
> +	const uint32_t size = d->size;
> +	uint32_t idx = (d->head & d->mask);
> +	rte_int128_t *deque = (rte_int128_t *)&d[1];
> +	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
> +	if (likely(idx + n <= size)) {
> +		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
> +			memcpy((void *)(deque + idx),
> +				(const void *)(obj + i), 32);
> +		switch (n & 0x1) {
> +		case 1:
> +			memcpy((void *)(deque + idx),
> +				(const void *)(obj + i), 16);
> +		}
> +	} else {
> +		for (i = 0; idx < size; i++, idx++)
> +			memcpy((void *)(deque + idx),
> +				(const void *)(obj + i), 16);
> +		/* Start at the beginning */
> +		for (idx = 0; i < n; i++, idx++)
> +			memcpy((void *)(deque + idx),
> +				(const void *)(obj + i), 16);
> +	}
> +}
> +
> +static __rte_always_inline unsigned int
> +__rte_deque_enqueue_at_head(struct rte_deque *d,
> +			const void *obj_table,
> +			unsigned int esize,
> +			unsigned int n)
> +{
> +	/* 8B and 16B copies implemented individually because on some platforms
> +	 * there are 64 bit and 128 bit registers available for direct copying.
> +	 */
> +	if (esize == 8)
> +		__rte_deque_enqueue_elems_head_64(d, obj_table, n);
> +	else if (esize == 16)
> +		__rte_deque_enqueue_elems_head_128(d, obj_table, n);
> +	else {
> +		uint32_t idx, scale, nd_idx, nd_num, nd_size;
> +
> +		/* Normalize to uint32_t */
> +		scale = esize / sizeof(uint32_t);
> +		nd_num = n * scale;
> +		idx = d->head & d->mask;
> +		nd_idx = idx * scale;
> +		nd_size = d->size * scale;
> +		__rte_deque_enqueue_elems_head_32(d, nd_size, nd_idx,
> +						obj_table, nd_num);
> +	}
> +	d->head = (d->head + n) & d->mask;
> +	return n;
> +}
> +
> +static __rte_always_inline void
> +__rte_deque_enqueue_elems_tail_32(struct rte_deque *d,
> +				const unsigned int mask,
> +				uint32_t idx,
> +				const void *obj_table,
> +				unsigned int n,
> +				const unsigned int scale,
> +				const unsigned int elem_size)
> +{
> +	unsigned int i;
> +	uint32_t *deque = (uint32_t *)&d[1];
> +	const uint32_t *obj = (const uint32_t *)obj_table;
> +
> +	if (likely(idx >= n)) {
> +		for (i = 0; i < n; idx -= scale, i += scale)
> +			memcpy(&deque[idx], &obj[i], elem_size);
> +	} else {
> +		for (i = 0; (int32_t)idx >= 0; idx -= scale, i += scale)
> +			memcpy(&deque[idx], &obj[i], elem_size);
> +
> +		/* Start at the ending */
> +		idx = mask;
> +		for (; i < n; idx -= scale, i += scale)
> +			memcpy(&deque[idx], &obj[i], elem_size);
> +	}
> +}
> +
> +static __rte_always_inline void
> +__rte_deque_enqueue_elems_tail_64(struct rte_deque *d,
> +				const void *obj_table,
> +				unsigned int n)
> +{
> +	unsigned int i;
> +	uint32_t idx = (d->tail & d->mask);
> +	uint64_t *deque = (uint64_t *)&d[1];
> +	const unaligned_uint64_t *obj = (const unaligned_uint64_t *)obj_table;
> +	if (likely((int32_t)(idx - n) >= 0)) {
> +		for (i = 0; i < (n & ~0x3); i += 4, idx -= 4) {
> +			deque[idx] = obj[i];
> +			deque[idx - 1] = obj[i + 1];
> +			deque[idx - 2] = obj[i + 2];
> +			deque[idx - 3] = obj[i + 3];
> +		}
> +		switch (n & 0x3) {
> +		case 3:
> +			deque[idx--] = obj[i++]; /* fallthrough */
> +		case 2:
> +			deque[idx--] = obj[i++]; /* fallthrough */
> +		case 1:
> +			deque[idx--] = obj[i++]; /* fallthrough */
> +		}
> +	} else {
> +		for (i = 0; (int32_t)idx >= 0; i++, idx--)
> +			deque[idx] = obj[i];
> +		/* Start at the ending */
> +		for (idx = d->mask; i < n; i++, idx--)
> +			deque[idx] = obj[i];
> +	}
> +}
> +
> +static __rte_always_inline void
> +__rte_deque_enqueue_elems_tail_128(struct rte_deque *d,
> +				const void *obj_table,
> +				unsigned int n)
> +{
> +	unsigned int i;
> +	uint32_t idx = (d->tail & d->mask);
> +	rte_int128_t *deque = (rte_int128_t *)&d[1];
> +	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
> +	if (likely((int32_t)(idx - n) >= 0)) {
> +		for (i = 0; i < (n & ~0x1); i += 2, idx -= 2) {
> +			deque[idx] = obj[i];
> +			deque[idx - 1] = obj[i + 1];
> +		}
> +		switch (n & 0x1) {
> +		case 1:
> +			memcpy((void *)(deque + idx),
> +				(const void *)(obj + i), 16);
> +		}
> +	} else {
> +		for (i = 0; (int32_t)idx >= 0; i++, idx--)
> +			memcpy((void *)(deque + idx),
> +				(const void *)(obj + i), 16);
> +		/* Start at the ending */
> +		for (idx = d->mask; i < n; i++, idx--)
> +			memcpy((void *)(deque + idx),
> +				(const void *)(obj + i), 16);
> +	}
> +}
> +
> +static __rte_always_inline unsigned int
> +__rte_deque_enqueue_at_tail(struct rte_deque *d,
> +			const void *obj_table,
> +			unsigned int esize,
> +			unsigned int n)
> +{
> +	/* The tail pointer must point at an empty cell when enqueuing */
> +	d->tail--;
> +
> +	/* 8B and 16B copies implemented individually because on some platforms
> +	 * there are 64 bit and 128 bit registers available for direct copying.
> +	 */
> +	if (esize == 8)
> +		__rte_deque_enqueue_elems_tail_64(d, obj_table, n);
> +	else if (esize == 16)
> +		__rte_deque_enqueue_elems_tail_128(d, obj_table, n);
> +	else {
> +		uint32_t idx, scale, nd_idx, nd_num, nd_mask;
> +
> +		/* Normalize to uint32_t */
> +		scale = esize / sizeof(uint32_t);
> +		nd_num = n * scale;
> +		idx = d->tail & d->mask;
> +		nd_idx = idx * scale;
> +		nd_mask = d->mask * scale;
> +		__rte_deque_enqueue_elems_tail_32(d, nd_mask, nd_idx, obj_table,
> +						nd_num, scale, esize);
> +	}
> +
> +	/* The +1 is because the tail needs to point at a
> +	 * non-empty memory location after the enqueuing operation.
> +	 */
> +	d->tail = (d->tail - n + 1) & d->mask;
> +	return n;
> +}
> +
> +static __rte_always_inline void
> +__rte_deque_dequeue_elems_32(struct rte_deque *d,
> +			const unsigned int size,
> +			uint32_t idx,
> +			void *obj_table,
> +			unsigned int n)
> +{
> +	unsigned int i;
> +	const uint32_t *deque = (const uint32_t *)&d[1];
> +	uint32_t *obj = (uint32_t *)obj_table;
> +	if (likely(idx + n <= size)) {
> +		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
> +			obj[i] = deque[idx];
> +			obj[i + 1] = deque[idx + 1];
> +			obj[i + 2] = deque[idx + 2];
> +			obj[i + 3] = deque[idx + 3];
> +			obj[i + 4] = deque[idx + 4];
> +			obj[i + 5] = deque[idx + 5];
> +			obj[i + 6] = deque[idx + 6];
> +			obj[i + 7] = deque[idx + 7];
> +		}
> +		switch (n & 0x7) {
> +		case 7:
> +			obj[i++] = deque[idx++]; /* fallthrough */
> +		case 6:
> +			obj[i++] = deque[idx++]; /* fallthrough */
> +		case 5:
> +			obj[i++] = deque[idx++]; /* fallthrough */
> +		case 4:
> +			obj[i++] = deque[idx++]; /* fallthrough */
> +		case 3:
> +			obj[i++] = deque[idx++]; /* fallthrough */
> +		case 2:
> +			obj[i++] = deque[idx++]; /* fallthrough */
> +		case 1:
> +			obj[i++] = deque[idx++]; /* fallthrough */
> +		}
> +	} else {
> +		for (i = 0; idx < size; i++, idx++)
> +			obj[i] = deque[idx];
> +		/* Start at the beginning */
> +		for (idx = 0; i < n; i++, idx++)
> +			obj[i] = deque[idx];
> +	}
> +}
> +
> +static __rte_always_inline void
> +__rte_deque_dequeue_elems_64(struct rte_deque *d, void *obj_table,
> +			unsigned int n)
> +{
> +	unsigned int i;
> +	const uint32_t size = d->size;
> +	uint32_t idx = (d->tail & d->mask);
> +	const uint64_t *deque = (const uint64_t *)&d[1];
> +	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
> +	if (likely(idx + n <= size)) {
> +		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> +			obj[i] = deque[idx];
> +			obj[i + 1] = deque[idx + 1];
> +			obj[i + 2] = deque[idx + 2];
> +			obj[i + 3] = deque[idx + 3];
> +		}
> +		switch (n & 0x3) {
> +		case 3:
> +			obj[i++] = deque[idx++]; /* fallthrough */
> +		case 2:
> +			obj[i++] = deque[idx++]; /* fallthrough */
> +		case 1:
> +			obj[i++] = deque[idx++]; /* fallthrough */
> +		}
> +	} else {
> +		for (i = 0; idx < size; i++, idx++)
> +			obj[i] = deque[idx];
> +		/* Start at the beginning */
> +		for (idx = 0; i < n; i++, idx++)
> +			obj[i] = deque[idx];
> +	}
> +}
> +
> +static __rte_always_inline void
> +__rte_deque_dequeue_elems_128(struct rte_deque *d,
> +			void *obj_table,
> +			unsigned int n)
> +{
> +	unsigned int i;
> +	const uint32_t size = d->size;
> +	uint32_t idx = (d->tail & d->mask);
> +	const rte_int128_t *deque = (const rte_int128_t *)&d[1];
> +	rte_int128_t *obj = (rte_int128_t *)obj_table;
> +	if (likely(idx + n <= size)) {
> +		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
> +			memcpy((void *)(obj + i),
> +				(const void *)(deque + idx), 32);
> +		switch (n & 0x1) {
> +		case 1:
> +			memcpy((void *)(obj + i),
> +				(const void *)(deque + idx), 16);
> +		}
> +	} else {
> +		for (i = 0; idx < size; i++, idx++)
> +			memcpy((void *)(obj + i),
> +				(const void *)(deque + idx), 16);
> +		/* Start at the beginning */
> +		for (idx = 0; i < n; i++, idx++)
> +			memcpy((void *)(obj + i),
> +				(const void *)(deque + idx), 16);
> +	}
> +}
> +
> +static __rte_always_inline unsigned int
> +__rte_deque_dequeue_at_tail(struct rte_deque *d,
> +			void *obj_table,
> +			unsigned int esize,
> +			unsigned int n)
> +{
> +	/* 8B and 16B copies implemented individually because on some platforms
> +	 * there are 64 bit and 128 bit registers available for direct copying.
> +	 */
> +	if (esize == 8)
> +		__rte_deque_dequeue_elems_64(d, obj_table, n);
> +	else if (esize == 16)
> +		__rte_deque_dequeue_elems_128(d, obj_table, n);
> +	else {
> +		uint32_t idx, scale, nd_idx, nd_num, nd_size;
> +
> +		/* Normalize to uint32_t */
> +		scale = esize / sizeof(uint32_t);
> +		nd_num = n * scale;
> +		idx = d->tail & d->mask;
> +		nd_idx = idx * scale;
> +		nd_size = d->size * scale;
> +		__rte_deque_dequeue_elems_32(d, nd_size, nd_idx,
> +					obj_table, nd_num);
> +	}
> +	d->tail = (d->tail + n) & d->mask;
> +	return n;
> +}
> +
> +static __rte_always_inline void
> +__rte_deque_dequeue_elems_head_32(struct rte_deque *d,
> +				const unsigned int mask,
> +				uint32_t idx,
> +				void *obj_table,
> +				unsigned int n,
> +				const unsigned int scale,
> +				const unsigned int elem_size)
> +{
> +	unsigned int i;
> +	const uint32_t *deque = (uint32_t *)&d[1];
> +	uint32_t *obj = (uint32_t *)obj_table;
> +
> +	if (likely(idx >= n)) {
> +		for (i = 0; i < n; idx -= scale, i += scale)
> +			memcpy(&obj[i], &deque[idx], elem_size);
> +	} else {
> +		for (i = 0; (int32_t)idx >= 0; idx -= scale, i += scale)
> +			memcpy(&obj[i], &deque[idx], elem_size);
> +		/* Start at the ending */
> +		idx = mask;
> +		for (; i < n; idx -= scale, i += scale)
> +			memcpy(&obj[i], &deque[idx], elem_size);
> +	}
> +}
> +
> +static __rte_always_inline void
> +__rte_deque_dequeue_elems_head_64(struct rte_deque *d,
> +				void *obj_table,
> +				unsigned int n)
> +{
> +	unsigned int i;
> +	uint32_t idx = (d->head & d->mask);
> +	const uint64_t *deque = (uint64_t *)&d[1];
> +	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
> +	if (likely((int32_t)(idx - n) >= 0)) {
> +		for (i = 0; i < (n & ~0x3); i += 4, idx -= 4) {
> +			obj[i] = deque[idx];
> +			obj[i + 1] = deque[idx - 1];
> +			obj[i + 2] = deque[idx - 2];
> +			obj[i + 3] = deque[idx - 3];
> +		}
> +		switch (n & 0x3) {
> +		case 3:
> +			obj[i++] = deque[idx--];  /* fallthrough */
> +		case 2:
> +			obj[i++] = deque[idx--]; /* fallthrough */
> +		case 1:
> +			obj[i++] = deque[idx--]; /* fallthrough */
> +		}
> +	} else {
> +		for (i = 0; (int32_t)idx >= 0; i++, idx--)
> +			obj[i] = deque[idx];
> +		/* Start at the ending */
> +		for (idx = d->mask; i < n; i++, idx--)
> +			obj[i] = deque[idx];
> +	}
> +}
> +
> +static __rte_always_inline void
> +__rte_deque_dequeue_elems_head_128(struct rte_deque *d,
> +				void *obj_table,
> +				unsigned int n)
> +{
> +	unsigned int i;
> +	uint32_t idx = (d->head & d->mask);
> +	const rte_int128_t *deque = (rte_int128_t *)&d[1];
> +	rte_int128_t *obj = (rte_int128_t *)obj_table;
> +	if (likely((int32_t)(idx - n) >= 0)) {
> +		for (i = 0; i < (n & ~0x1); i += 2, idx -= 2) {
> +			obj[i] = deque[idx];
> +			obj[i + 1] = deque[idx - 1];
> +		}
> +		switch (n & 0x1) {
> +		case 1:
> +			memcpy((void *)(obj + i),
> +				(const void *)(deque + idx), 16);
> +		}
> +	} else {
> +		for (i = 0; (int32_t)idx >= 0; i++, idx--)
> +			memcpy((void *)(obj + i),
> +				(const void *)(deque + idx), 16);
> +		/* Start at the ending */
> +		for (idx = d->mask; i < n; i++, idx--)
> +			memcpy((void *)(obj + i),
> +				(const void *)(deque + idx), 16);
> +	}
> +}
> +
> +static __rte_always_inline unsigned int
> +__rte_deque_dequeue_at_head(struct rte_deque *d,
> +			void *obj_table,
> +			unsigned int esize,
> +			unsigned int n)
> +{
> +	/* The head must point at an empty cell when dequeueing */
> +	d->head--;
> +
> +	/* 8B and 16B copies implemented individually because on some platforms
> +	 * there are 64 bit and 128 bit registers available for direct copying.
> +	 */
> +	if (esize == 8)
> +		__rte_deque_dequeue_elems_head_64(d, obj_table, n);
> +	else if (esize == 16)
> +		__rte_deque_dequeue_elems_head_128(d, obj_table, n);
> +	else {
> +		uint32_t idx, scale, nd_idx, nd_num, nd_mask;
> +
> +		/* Normalize to uint32_t */
> +		scale = esize / sizeof(uint32_t);
> +		nd_num = n * scale;
> +		idx = d->head & d->mask;
> +		nd_idx = idx * scale;
> +		nd_mask = d->mask * scale;
> +		__rte_deque_dequeue_elems_head_32(d, nd_mask, nd_idx, obj_table,
> +						nd_num, scale, esize);
> +	}
> +
> +	/* The +1 is because the head needs to point at an
> +	 * empty memory location after the dequeueing operation.
> +	 */
> +	d->head = (d->head - n + 1) & d->mask;
> +	return n;
> +}
> +#endif /* _RTE_DEQUE_PVT_H_ */
> diff --git a/lib/deque/rte_deque_zc.h b/lib/deque/rte_deque_zc.h
> new file mode 100644
> index 0000000000..6d7167e158
> --- /dev/null
> +++ b/lib/deque/rte_deque_zc.h
> @@ -0,0 +1,430 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2024 Arm Limited
> + */
> +#ifndef _RTE_DEQUE_ZC_H_
> +#define _RTE_DEQUE_ZC_H_
> +
> +/**
> + * @file
> + * This file should not be included directly, include rte_deque.h instead.
> + *
> + * Deque Zero Copy APIs
> + * These APIs make it possible to split public enqueue/dequeue API
> + * into 3 parts:
> + * - enqueue/dequeue start
> + * - copy data to/from the deque
> + * - enqueue/dequeue finish
> + * These APIs provide the ability to avoid copying of the data to temporary area.
> + *
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * Deque zero-copy information structure.
> + *
> + * This structure contains the pointers and length of the space
> + * reserved on the Deque storage.
> + */
> +struct __rte_cache_aligned rte_deque_zc_data {
> +	/* Pointer to the first space in the deque */
> +	void *ptr1;
> +	/* Pointer to the second space in the deque if there is wrap-around.
> +	 * It contains valid value only if wrap-around happens.
> +	 */
> +	void *ptr2;
> +	/* Number of elements in the first pointer. If this is equal to
> +	 * the number of elements requested, then ptr2 is NULL.
> +	 * Otherwise, subtracting n1 from number of elements requested
> +	 * will give the number of elements available at ptr2.
> +	 */
> +	unsigned int n1;
> +};
> +
> +static __rte_always_inline void
> +__rte_deque_get_elem_addr(struct rte_deque *d, uint32_t pos,
> +	uint32_t esize, uint32_t num, void **dst1, uint32_t *n1, void **dst2,
> +	bool low_to_high)
> +{
> +	uint32_t idx, scale, nr_idx;
> +	uint32_t *deque_ptr = (uint32_t *)&d[1];
> +
> +	/* Normalize to uint32_t */
> +	scale = esize / sizeof(uint32_t);
> +	idx = pos & d->mask;
> +	nr_idx = idx * scale;
> +
> +	*dst1 = deque_ptr + nr_idx;
> +	*n1 = num;
> +
> +	if (low_to_high) {
> +		if (idx + num > d->size) {
> +			*n1 = d->size - idx;
> +			*dst2 = deque_ptr;
> +		} else
> +			*dst2 = NULL;
> +	} else {
> +		if ((int32_t)(idx - num) < 0) {
> +			*n1 = idx + 1;
> +			*dst2 = (void *)&deque_ptr[(-1 & d->mask) * scale];
> +		} else
> +			*dst2 = NULL;
> +	}
> +}
> +
> +/**
> + * Start to enqueue several objects on the deque.
> + * Note that no actual objects are put in the deque by this function,
> + * it just reserves space for the user on the deque.
> + * User has to copy objects into the deque using the returned pointers.
> + * User should call rte_deque_enqueue_zc_elem_finish to complete the
> + * enqueue operation.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @param esize
> + *   The size of deque element, in bytes. It must be a multiple of 4.
> + * @param n
> + *   The number of objects to add in the deque.
> + * @param zcd
> + *   Structure containing the pointers and length of the space
> + *   reserved on the deque storage.
> + * @param free_space
> + *   Returns the amount of space in the deque after the reservation operation
> + *   has finished.
> + * @return
> + *   The number of objects that can be enqueued, either 0 or n
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_deque_head_enqueue_zc_bulk_elem_start(struct rte_deque *d, unsigned int esize,
> +	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *free_space)
> +{
> +
> +	*free_space = __RTE_DEQUE_FREE_SPACE(d);
> +	if (unlikely(*free_space < n))
> +		return 0;
> +	__rte_deque_get_elem_addr(d, d->head, esize, n, &zcd->ptr1,
> +							&zcd->n1, &zcd->ptr2, true);
> +
> +	*free_space -= n;
> +	return n;
> +}
> +
> +/**
> + * Complete enqueuing several pointers to objects on the deque.
> + * Note that number of objects to enqueue should not exceed previous
> + * enqueue_start return value.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @param n
> + *   The number of pointers to objects to add to the deque.
> + */
> +__rte_experimental
> +static __rte_always_inline void
> +rte_deque_head_enqueue_zc_elem_finish(struct rte_deque *d, unsigned int n)
> +{
> +	d->head = (d->head + n) & d->mask;
> +}
> +
> +/**
> + * Start to enqueue several objects on the deque.
> + * Note that no actual objects are put in the queue by this function,
> + * it just reserves space for the user on the deque.
> + * User has to copy objects into the queue using the returned pointers.
> + * User should call rte_deque_enqueue_zc_elem_finish to complete the
> + * enqueue operation.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @param esize
> + *   The size of deque element, in bytes. It must be a multiple of 4.
> + * @param n
> + *   The number of objects to add in the deque.
> + * @param zcd
> + *   Structure containing the pointers and length of the space
> + *   reserved on the deque storage.
> + * @param free_space
> + *   Returns the amount of space in the deque after the reservation operation
> + *   has finished.
> + * @return
> + *   The number of objects that can be enqueued, either 0 or n
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_deque_head_enqueue_zc_burst_elem_start(struct rte_deque *d, unsigned int esize,
> +	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *free_space)
> +{
> +	*free_space = __RTE_DEQUE_FREE_SPACE(d);
> +	n = n > *free_space ? *free_space : n;
> +	return rte_deque_head_enqueue_zc_bulk_elem_start(d, esize, n, zcd, free_space);
> +}
> +
> +/**
> + * Start to enqueue several objects on the deque.
> + * Note that no actual objects are put in the deque by this function,
> + * it just reserves space for the user on the deque.
> + * User has to copy objects into the deque using the returned pointers.
> + * User should call rte_deque_enqueue_zc_elem_finish to complete the
> + * enqueue operation.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @param esize
> + *   The size of deque element, in bytes. It must be a multiple of 4.
> + * @param n
> + *   The number of objects to add in the deque.
> + * @param zcd
> + *   Structure containing the pointers and length of the space
> + *   reserved on the deque storage.
> + * @param free_space
> + *   Returns the amount of space in the deque after the reservation operation
> + *   has finished.
> + * @return
> + *   The number of objects that can be enqueued, either 0 or n
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_deque_tail_enqueue_zc_bulk_elem_start(struct rte_deque *d, unsigned int esize,
> +	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *free_space)
> +{
> +	*free_space = __RTE_DEQUE_FREE_SPACE(d);
> +	if (unlikely(*free_space < n))
> +		return 0;
> +	__rte_deque_get_elem_addr(d, d->tail - 1, esize, n, &zcd->ptr1,
> +							  &zcd->n1, &zcd->ptr2, false);
> +
> +	*free_space -= n;
> +	return n;
> +}
> +
> +/**
> + * Complete enqueuing several pointers to objects on the deque.
> + * Note that number of objects to enqueue should not exceed previous
> + * enqueue_start return value.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @param n
> + *   The number of pointers to objects to add to the deque.
> + */
> +__rte_experimental
> +static __rte_always_inline void
> +rte_deque_tail_enqueue_zc_elem_finish(struct rte_deque *d, unsigned int n)
> +{
> +	d->tail = (d->tail - n) & d->mask;
> +}
> +
> +/**
> + * Start to enqueue several objects on the deque.
> + * Note that no actual objects are put in the queue by this function,
> + * it just reserves space for the user on the deque.
> + * User has to copy objects into the queue using the returned pointers.
> + * User should call rte_deque_enqueue_zc_elem_finish to complete the
> + * enqueue operation.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @param esize
> + *   The size of deque element, in bytes. It must be a multiple of 4.
> + * @param n
> + *   The number of objects to add in the deque.
> + * @param zcd
> + *   Structure containing the pointers and length of the space
> + *   reserved on the deque storage.
> + * @param free_space
> + *   Returns the amount of space in the deque after the reservation operation
> + *   has finished.
> + * @return
> + *   The number of objects that can be enqueued, either 0 or n
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_deque_tail_enqueue_zc_burst_elem_start(struct rte_deque *d, unsigned int esize,
> +	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *free_space)
> +{
> +	*free_space = __RTE_DEQUE_FREE_SPACE(d);
> +	n = n > *free_space ? *free_space : n;
> +	return rte_deque_tail_enqueue_zc_bulk_elem_start(d, esize, n, zcd, free_space);
> +}
> +
> +/**
> + * Start to dequeue several objects from the deque.
> + * Note that no actual objects are copied from the queue by this function.
> + * User has to copy objects from the queue using the returned pointers.
> + * User should call rte_deque_dequeue_zc_elem_finish to complete the
> + * dequeue operation.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @param esize
> + *   The size of deque element, in bytes. It must be a multiple of 4.
> + * @param n
> + *   The number of objects to remove from the deque.
> + * @param zcd
> + *   Structure containing the pointers and length of the space
> + *   reserved on the deque storage.
> + * @param available
> + *   Returns the number of remaining deque entries after the dequeue has
> + *   finished.
> + * @return
> + *   The number of objects that can be dequeued, either 0 or n.
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_deque_tail_dequeue_zc_bulk_elem_start(struct rte_deque *d, unsigned int esize,
> +	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *available)
> +{
> +	*available = __RTE_DEQUE_COUNT(d);
> +	if (unlikely(*available < n))
> +		return 0;
> +	__rte_deque_get_elem_addr(d, d->tail, esize, n, &zcd->ptr1,
> +							&zcd->n1, &zcd->ptr2, true);
> +
> +	*available -= n;
> +	return n;
> +}
> +
> +/**
> + * Complete dequeuing several objects from the deque.
> + * Note that the number of objects to dequeue should not exceed the previous
> + * dequeue_start return value.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @param n
> + *   The number of objects to remove from the deque.
> + */
> +__rte_experimental
> +static __rte_always_inline void
> +rte_deque_tail_dequeue_zc_elem_finish(struct rte_deque *d, unsigned int n)
> +{
> +	d->tail = (d->tail + n) & d->mask;
> +}
> +
> +/**
> + * Start to dequeue several objects from the deque.
> + * Note that no actual objects are copied from the queue by this function.

Why do you even need to copy elements out from the queue, ever?

Wouldn't it be better to return a reference to the objects, rather than 
to copy objects around? Or at least have a zero-copy option.

"Peek" functions, either single-object or "burst". (The benefit of 
"bursts" is not going to be very great for this data structure, provided 
you remove "volatile" on head and tail.)

Say you have rte_event as the element (24 bytes, if I recall correctly). 
Then you don't want to needlessly copy those around to stack-allocated 
arrays.

Rather, one would like to do something like:

for (;;) {
	struct rte_event *events;
	unsigned int n = rte_deque_peek(deque, &events, 16);
	if (n == 0)
		break;
	process_events(events, n);
	rte_deque_pop(deque, n);
}

My overall impression is that you should forget about rte_ring, and both 
reduce rte_deque complexity and optimize for the non-MT-safe case.

I'm not even sure you need the "copy out" variant of the API.

> + * User has to copy objects from the queue using the returned pointers.
> + * User should call rte_deque_dequeue_zc_elem_finish to complete the
> + * dequeue operation.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @param esize
> + *   The size of deque element, in bytes. It must be a multiple of 4.
> + * @param n
> + *   The number of objects to remove from the deque.
> + * @param zcd
> + *   Structure containing the pointers and length of the space
> + *   reserved on the deque storage.
> + * @param available
> + *   Returns the number of remaining deque entries after the dequeue has
> + *   finished.
> + * @return
> + *   The number of objects that can be dequeued, either 0 or n.
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_deque_tail_dequeue_zc_burst_elem_start(struct rte_deque *d, unsigned int esize,
> +	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *available)
> +{
> +	*available = __RTE_DEQUE_COUNT(d);
> +	n = n > *available ? *available : n;
> +	return rte_deque_tail_dequeue_zc_bulk_elem_start(d, esize, n, zcd, available);
> +}
> +
> +/**
> + * Start to dequeue several objects from the deque.
> + * Note that no actual objects are copied from the queue by this function.
> + * User has to copy objects from the queue using the returned pointers.
> + * User should call rte_deque_dequeue_zc_elem_finish to complete the
> + * dequeue operation.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @param esize
> + *   The size of deque element, in bytes. It must be a multiple of 4.
> + * @param n
> + *   The number of objects to remove from the deque.
> + * @param zcd
> + *   Structure containing the pointers and length of the space
> + *   reserved on the deque storage.
> + * @param available
> + *   Returns the number of remaining deque entries after the dequeue has
> + *   finished.
> + * @return
> + *   The number of objects that can be dequeued, either 0 or n.
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_deque_head_dequeue_zc_bulk_elem_start(struct rte_deque *d, unsigned int esize,
> +	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *available)
> +{
> +	*available = __RTE_DEQUE_COUNT(d);
> +	if (unlikely(*available < n))
> +		return 0;
> +	__rte_deque_get_elem_addr(d, d->head - 1, esize, n, &zcd->ptr1,
> +							&zcd->n1, &zcd->ptr2, false);
> +
> +	*available -= n;
> +	return n;
> +}
> +
> +/**
> + * Complete dequeuing several objects from the deque.
> + * Note that the number of objects dequeued must not exceed the value
> + * previously returned by dequeue_start.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @param n
> + *   The number of objects to remove from the deque.
> + */
> +__rte_experimental
> +static __rte_always_inline void
> +rte_deque_head_dequeue_zc_elem_finish(struct rte_deque *d, unsigned int n)
> +{
> +	d->head = (d->head - n) & d->mask;
> +}
> +
> +/**
> + * Start to dequeue several objects from the deque.
> + * Note that no actual objects are copied from the deque by this function.
> + * The user has to copy the objects from the deque using the returned
> + * pointers, and should call rte_deque_head_dequeue_zc_elem_finish to
> + * complete the dequeue operation.
> + *
> + * @param d
> + *   A pointer to the deque structure.
> + * @param esize
> + *   The size of deque element, in bytes. It must be a multiple of 4.
> + * @param n
> + *   The number of objects to remove from the deque.
> + * @param zcd
> + *   Structure containing the pointers and length of the space
> + *   reserved on the deque storage.
> + * @param available
> + *   Returns the number of remaining deque entries after the dequeue has
> + *   finished.
> + * @return
> + *   The actual number of objects that can be dequeued, up to n.
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_deque_head_dequeue_zc_burst_elem_start(struct rte_deque *d, unsigned int esize,
> +	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *available)
> +{
> +	*available = __RTE_DEQUE_COUNT(d);
> +	n = n > *available ? *available : n;
> +	return rte_deque_head_dequeue_zc_bulk_elem_start(d, esize, n, zcd, available);
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_DEQUE_ZC_H_ */
> diff --git a/lib/deque/version.map b/lib/deque/version.map
> new file mode 100644
> index 0000000000..103fd3b512
> --- /dev/null
> +++ b/lib/deque/version.map
> @@ -0,0 +1,14 @@
> +EXPERIMENTAL {
> +	global:
> +
> +	# added in 24.07
> +	rte_deque_log_type;
> +	rte_deque_create;
> +	rte_deque_dump;
> +	rte_deque_free;
> +	rte_deque_get_memsize_elem;
> +	rte_deque_init;
> +	rte_deque_reset;
> +
> +	local: *;
> +};
> diff --git a/lib/meson.build b/lib/meson.build
> index 179a272932..82929b7a11 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -14,6 +14,7 @@ libraries = [
>           'argparse',
>           'telemetry', # basic info querying
>           'eal', # everything depends on eal
> +        'deque',
>           'ring',
>           'rcu', # rcu depends on ring
>           'mempool',
> @@ -74,6 +75,7 @@ if is_ms_compiler
>               'kvargs',
>               'telemetry',
>               'eal',
> +            'deque',
>               'ring',
>       ]
>   endif
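
[Editor's note] To make the zero-copy start/finish contract quoted above concrete, here is a minimal, self-contained sketch of the index bookkeeping only (the struct and names are illustrative, not the real rte_deque layout, and the real `_start` functions also return element pointers through `zcd`):

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the zero-copy dequeue bookkeeping (illustrative only):
 * "_start" clamps the request to what is available and reports the
 * remaining count; "_finish" commits by moving the head backwards
 * under the power-of-2 mask. */
struct zc_model {
	uint32_t head;
	uint32_t tail;
	uint32_t mask; /* size - 1, where size is a power of 2 */
};

static uint32_t zc_count(const struct zc_model *d)
{
	return (d->head - d->tail) & d->mask;
}

/* Burst-style start: reserve up to n elements for dequeue. */
static uint32_t zc_head_dequeue_start(struct zc_model *d, uint32_t n,
				      uint32_t *available)
{
	uint32_t count = zc_count(d);

	n = n > count ? count : n;
	*available = count - n;
	return n;
}

/* Finish: commit the dequeue by moving the head index backwards. */
static void zc_head_dequeue_finish(struct zc_model *d, uint32_t n)
{
	d->head = (d->head - n) & d->mask;
}
```

With 3 elements stored, a request for 10 is clamped to 3, and after the finish call the model reads as empty again.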

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: [PATCH v2 1/2] deque: add multi-thread unsafe double ended queue
  2024-04-24 17:21           ` Patrick Robb
@ 2024-04-25  7:43             ` Ali Alnubani
  0 siblings, 0 replies; 48+ messages in thread
From: Ali Alnubani @ 2024-04-25  7:43 UTC (permalink / raw)
  To: Patrick Robb
  Cc: Aditya Ambadipudi, dev, Jack Min, stephen, Matan Azrad,
	Slava Ovsiienko, roretzla, konstantin.ananyev, hofors,
	wathsala.vithanage, dhruv.tripathi, honnappa.nagarahalli, nd,
	Morten Brørup

> From: Patrick Robb <probb@iol.unh.edu>
> Sent: Wednesday, April 24, 2024 8:21 PM
> To: Ali Alnubani <alialnu@nvidia.com>
> Cc: Aditya Ambadipudi <aditya.ambadipudi@arm.com>; dev@dpdk.org; Jack
> Min <jackmin@nvidia.com>; stephen@networkplumber.org; Matan Azrad
> <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>;
> roretzla@linux.microsoft.com; konstantin.ananyev@huawei.com;
> hofors@lysator.liu.se; wathsala.vithanage@arm.com;
> dhruv.tripathi@arm.com; honnappa.nagarahalli@arm.com; nd@arm.com;
> Morten Brørup <mb@smartsharesystems.com>
> Subject: Re: [PATCH v2 1/2] deque: add multi-thread unsafe double ended
> queue
> 
> Hi Ali,
> 
> Wathsala reached out asking how the checkpatch CI check can be updated so
> that this series passes checkpatch.
> 
> If building the dictionary is a 1 time operation for you, you may have to apply
> this patch and re-run devtools/build-dict.sh so that the new dictionary is in
> place for a V3 of this series.
> 
> It looks like these dictionary exceptions are submitted quite rarely. But, if it
> becomes more common in the future you could look at adding a step to your
> automation which produces a new dictionary every time you run checkpatch,
> based on any additions to the exception list which came with the patch. But
> it's probably not worth the effort with the low volume of word exception
> additions.
> 

Hello,

Applied the change to the dictionary and reran the check.
Still failing because of the GERRIT CHANGE_ID though:
https://mails.dpdk.org/archives/test-report/2024-April/650688.html

Regards,
Ali
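
[Editor's note] For context, the dictionary filter this series adds (see the build-dict.sh hunk later in the thread) drops spell-check exception lines whose misspelled word starts with "deque". A quick, hedged sanity check of that sed expression on made-up input lines:

```shell
# The exception stream contains lines of the form "<word>-><suggestions>".
# The added filter deletes any line whose word starts with "deque",
# so only the unrelated line survives here.
printf 'deques->dequeues\nring->rings\n' | sed '/^deque.*->/d'
```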

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 0/2] deque: add multithread unsafe deque library
  2024-04-24 13:42       ` [PATCH v2 1/2] deque: add multi-thread unsafe double ended queue Aditya Ambadipudi
  2024-04-24 15:16         ` Morten Brørup
  2024-04-24 23:28         ` Mattias Rönnblom
@ 2024-05-02 20:19         ` Aditya Ambadipudi
  2024-05-02 20:19           ` [PATCH v3 1/2] deque: add multi-thread unsafe double ended queue Aditya Ambadipudi
                             ` (3 more replies)
  2 siblings, 4 replies; 48+ messages in thread
From: Aditya Ambadipudi @ 2024-05-02 20:19 UTC (permalink / raw)
  To: dev, jackmin, stephen, matan, viacheslavo, roretzla,
	konstantin.ananyev, mb, hofors, probb, alialnu
  Cc: wathsala.vithanage, dhruv.tripathi, honnappa.nagarahalli, nd,
	venamb01, Aditya Ambadipudi

As previously discussed in the mailing list [1], we are sending out this
patch, which provides the implementation and unit test cases for the
RTE_DEQUE library. This includes functions for creating an RTE_DEQUE
object and allocating memory for it, deleting that object and freeing
the memory associated with it, enqueue/dequeue functions, and functions
for the zero-copy API.
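
[Editor's note] The double-ended behaviour can be sketched with a minimal stand-alone model (illustrative only; the real library stores elements of any 4-byte-multiple size and reserves one slot, i.e. capacity = size - 1, to distinguish full from empty):

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the deque index arithmetic (not the rte_deque
 * implementation): enqueue at the head advances the head index,
 * enqueue at the tail moves the tail backwards, and either end can
 * dequeue. The size must be a power of 2 so "& mask" wraps indices. */
#define TOY_SIZE 8u

struct toy_deque {
	uint32_t head, tail;
	uint32_t mask; /* TOY_SIZE - 1 */
	int objs[TOY_SIZE];
};

static uint32_t toy_count(const struct toy_deque *d)
{
	return (d->head - d->tail) & d->mask;
}

static void toy_head_enqueue(struct toy_deque *d, int v)
{
	d->objs[d->head & d->mask] = v;
	d->head = (d->head + 1) & d->mask;
}

static void toy_tail_enqueue(struct toy_deque *d, int v)
{
	d->tail = (d->tail - 1) & d->mask;
	d->objs[d->tail & d->mask] = v;
}

static int toy_tail_dequeue(struct toy_deque *d)
{
	int v = d->objs[d->tail & d->mask];

	d->tail = (d->tail + 1) & d->mask;
	return v;
}

static int toy_head_dequeue(struct toy_deque *d)
{
	d->head = (d->head - 1) & d->mask;
	return d->objs[d->head & d->mask];
}
```

Enqueueing at one end and dequeuing at the other behaves as a FIFO; enqueueing and dequeuing at the same end behaves as a LIFO.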

Aditya Ambadipudi (1):
  deque: add unit tests for the deque library

Honnappa Nagarahalli (1):
  deque: add multi-thread unsafe double ended queue

 .mailmap                               |    1 +
 app/test/meson.build                   |    2 +
 app/test/test_deque_enqueue_dequeue.c  | 1228 ++++++++++++++++++++++++
 app/test/test_deque_helper_functions.c |  169 ++++
 devtools/build-dict.sh                 |    1 +
 lib/deque/meson.build                  |   11 +
 lib/deque/rte_deque.c                  |  193 ++++
 lib/deque/rte_deque.h                  |  533 ++++++++++
 lib/deque/rte_deque_core.h             |   81 ++
 lib/deque/rte_deque_pvt.h              |  538 +++++++++++
 lib/deque/rte_deque_zc.h               |  430 +++++++++
 lib/deque/version.map                  |   14 +
 lib/meson.build                        |    2 +
 13 files changed, 3203 insertions(+)
 create mode 100644 app/test/test_deque_enqueue_dequeue.c
 create mode 100644 app/test/test_deque_helper_functions.c
 create mode 100644 lib/deque/meson.build
 create mode 100644 lib/deque/rte_deque.c
 create mode 100644 lib/deque/rte_deque.h
 create mode 100644 lib/deque/rte_deque_core.h
 create mode 100644 lib/deque/rte_deque_pvt.h
 create mode 100644 lib/deque/rte_deque_zc.h
 create mode 100644 lib/deque/version.map

-- 
2.25.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 1/2] deque: add multi-thread unsafe double ended queue
  2024-05-02 20:19         ` [PATCH v3 0/2] deque: add multithread unsafe deque library Aditya Ambadipudi
@ 2024-05-02 20:19           ` Aditya Ambadipudi
  2024-05-02 20:19           ` [PATCH v3 2/2] deque: add unit tests for the deque library Aditya Ambadipudi
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 48+ messages in thread
From: Aditya Ambadipudi @ 2024-05-02 20:19 UTC (permalink / raw)
  To: dev, jackmin, stephen, matan, viacheslavo, roretzla,
	konstantin.ananyev, mb, hofors, probb, alialnu
  Cc: wathsala.vithanage, dhruv.tripathi, honnappa.nagarahalli, nd,
	venamb01, Aditya Ambadipudi

From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

Add a multi-thread unsafe double ended queue data structure. This
library provides a simple and efficient alternative to multi-thread
safe ring when multi-thread safety is not required.

Signed-off-by: Aditya Ambadipudi <aditya.ambadipudi@arm.com>
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
v3:
  * Removed stdio.h from two files where it is not needed.

v2:
  * Addressed the spell check warning issue with the word "Deque"
  * Tried to rename all objects that are named deque to avoid collision with
    std::deque
  * Added the deque library to msvc section in meson.build
  * Renamed API functions to explicitly state whether the function inserts at
    head or tail.
 .mailmap                   |   1 +
 devtools/build-dict.sh     |   1 +
 lib/deque/meson.build      |  11 +
 lib/deque/rte_deque.c      | 193 +++++++++++++
 lib/deque/rte_deque.h      | 533 ++++++++++++++++++++++++++++++++++++
 lib/deque/rte_deque_core.h |  81 ++++++
 lib/deque/rte_deque_pvt.h  | 538 +++++++++++++++++++++++++++++++++++++
 lib/deque/rte_deque_zc.h   | 430 +++++++++++++++++++++++++++++
 lib/deque/version.map      |  14 +
 lib/meson.build            |   2 +
 10 files changed, 1804 insertions(+)
 create mode 100644 lib/deque/meson.build
 create mode 100644 lib/deque/rte_deque.c
 create mode 100644 lib/deque/rte_deque.h
 create mode 100644 lib/deque/rte_deque_core.h
 create mode 100644 lib/deque/rte_deque_pvt.h
 create mode 100644 lib/deque/rte_deque_zc.h
 create mode 100644 lib/deque/version.map

diff --git a/.mailmap b/.mailmap
index 3843868716..8e705ab6ab 100644
--- a/.mailmap
+++ b/.mailmap
@@ -17,6 +17,7 @@ Adam Bynes <adambynes@outlook.com>
 Adam Dybkowski <adamx.dybkowski@intel.com>
 Adam Ludkiewicz <adam.ludkiewicz@intel.com>
 Adham Masarwah <adham@nvidia.com> <adham@mellanox.com>
+Aditya Ambadipudi <aditya.ambadipudi@arm.com>
 Adrian Moreno <amorenoz@redhat.com>
 Adrian Podlawski <adrian.podlawski@intel.com>
 Adrien Mazarguil <adrien.mazarguil@6wind.com>
diff --git a/devtools/build-dict.sh b/devtools/build-dict.sh
index a8cac49029..595d8f9277 100755
--- a/devtools/build-dict.sh
+++ b/devtools/build-dict.sh
@@ -17,6 +17,7 @@ sed '/^..->/d' |
 sed '/^uint->/d' |
 sed "/^doesn'->/d" |
 sed '/^wasn->/d' |
+sed '/^deque.*->/d' |
 
 # print to stdout
 cat
diff --git a/lib/deque/meson.build b/lib/deque/meson.build
new file mode 100644
index 0000000000..1ff45fc39f
--- /dev/null
+++ b/lib/deque/meson.build
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 Arm Limited
+
+sources = files('rte_deque.c')
+headers = files('rte_deque.h')
+# most sub-headers are not for direct inclusion
+indirect_headers += files(
+        'rte_deque_core.h',
+        'rte_deque_pvt.h',
+        'rte_deque_zc.h'
+)
diff --git a/lib/deque/rte_deque.c b/lib/deque/rte_deque.c
new file mode 100644
index 0000000000..b83a6c43c4
--- /dev/null
+++ b/lib/deque/rte_deque.c
@@ -0,0 +1,193 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#include <stdalign.h>
+#include <string.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <sys/queue.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_eal_memconfig.h>
+#include <rte_errno.h>
+#include <rte_string_fns.h>
+
+#include "rte_deque.h"
+
+/* mask of all valid flag values to deque_create() */
+#define __RTE_DEQUE_F_MASK (RTE_DEQUE_F_EXACT_SZ)
+ssize_t
+rte_deque_get_memsize_elem(unsigned int esize, unsigned int count)
+{
+	ssize_t sz;
+
+	/* Check if element size is a multiple of 4B */
+	if (esize % 4 != 0) {
+		rte_log(RTE_LOG_ERR, rte_deque_log_type,
+			"%s(): element size is not a multiple of 4\n",
+			__func__);
+
+		return -EINVAL;
+	}
+
+	/* count must be a power of 2 */
+	if ((!RTE_IS_POWER_OF_2(count)) || (count > RTE_DEQUE_SZ_MASK)) {
+		rte_log(RTE_LOG_ERR, rte_deque_log_type,
+			"%s(): Requested number of elements is invalid, "
+			"must be a power of 2, and must not exceed %u\n",
+			__func__, RTE_DEQUE_SZ_MASK);
+
+		return -EINVAL;
+	}
+
+	sz = sizeof(struct rte_deque) + (ssize_t)count * esize;
+	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
+	return sz;
+}
+
+void
+rte_deque_reset(struct rte_deque *d)
+{
+	d->head = 0;
+	d->tail = 0;
+}
+
+int
+rte_deque_init(struct rte_deque *d, const char *name, unsigned int count,
+	unsigned int flags)
+{
+	int ret;
+
+	/* compilation-time checks */
+	RTE_BUILD_BUG_ON((sizeof(struct rte_deque) &
+			  RTE_CACHE_LINE_MASK) != 0);
+
+	/* future proof flags, only allow supported values */
+	if (flags & ~__RTE_DEQUE_F_MASK) {
+		rte_log(RTE_LOG_ERR, rte_deque_log_type,
+			"%s(): Unsupported flags requested %#x\n",
+			__func__, flags);
+		return -EINVAL;
+	}
+
+	/* init the deque structure */
+	memset(d, 0, sizeof(*d));
+	ret = strlcpy(d->name, name, sizeof(d->name));
+	if (ret < 0 || ret >= (int)sizeof(d->name))
+		return -ENAMETOOLONG;
+	d->flags = flags;
+
+	if (flags & RTE_DEQUE_F_EXACT_SZ) {
+		d->size = rte_align32pow2(count + 1);
+		d->mask = d->size - 1;
+		d->capacity = count;
+	} else {
+		if ((!RTE_IS_POWER_OF_2(count)) || (count > RTE_DEQUE_SZ_MASK)) {
+			rte_log(RTE_LOG_ERR, rte_deque_log_type,
+				"%s(): Requested size is invalid, must be power"
+				" of 2, and not exceed the size limit %u\n",
+				__func__, RTE_DEQUE_SZ_MASK);
+			return -EINVAL;
+		}
+		d->size = count;
+		d->mask = count - 1;
+		d->capacity = d->mask;
+	}
+
+	return 0;
+}
+
+/* create the deque for a given element size */
+struct rte_deque *
+rte_deque_create(const char *name, unsigned int esize, unsigned int count,
+		int socket_id, unsigned int flags)
+{
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	struct rte_deque *d;
+	const struct rte_memzone *mz;
+	ssize_t deque_size;
+	int mz_flags = 0;
+	const unsigned int requested_count = count;
+	int ret;
+
+	/* for an exact size deque, round up from count to a power of two */
+	if (flags & RTE_DEQUE_F_EXACT_SZ)
+		count = rte_align32pow2(count + 1);
+
+	deque_size = rte_deque_get_memsize_elem(esize, count);
+	if (deque_size < 0) {
+		rte_errno = -deque_size;
+		return NULL;
+	}
+
+	ret = snprintf(mz_name, sizeof(mz_name), "%s%s",
+		RTE_DEQUE_MZ_PREFIX, name);
+	if (ret < 0 || ret >= (int)sizeof(mz_name)) {
+		rte_errno = ENAMETOOLONG;
+		return NULL;
+	}
+
+	/* reserve a memory zone for this deque. If we can't get rte_config or
+	 * we are secondary process, the memzone_reserve function will set
+	 * rte_errno for us appropriately - hence no check in this function
+	 */
+	mz = rte_memzone_reserve_aligned(mz_name, deque_size, socket_id,
+					 mz_flags, alignof(struct rte_deque));
+	if (mz != NULL) {
+		d = mz->addr;
+		/* no need to check return value here, we already checked the
+		 * arguments above
+		 */
+		rte_deque_init(d, name, requested_count, flags);
+		d->memzone = mz;
+	} else {
+		d = NULL;
+		rte_log(RTE_LOG_ERR, rte_deque_log_type,
+			"%s(): Cannot reserve memory\n", __func__);
+	}
+	return d;
+}
+
+/* free the deque */
+void
+rte_deque_free(struct rte_deque *d)
+{
+	if (d == NULL)
+		return;
+
+	/*
+	 * Deque was not created with rte_deque_create,
+	 * therefore, there is no memzone to free.
+	 */
+	if (d->memzone == NULL) {
+		rte_log(RTE_LOG_ERR, rte_deque_log_type,
+			"%s(): Cannot free deque, not created "
+			"with rte_deque_create()\n", __func__);
+		return;
+	}
+
+	if (rte_memzone_free(d->memzone) != 0)
+		rte_log(RTE_LOG_ERR, rte_deque_log_type,
+			"%s(): Cannot free memory\n", __func__);
+}
+
+/* dump the status of the deque on the console */
+void
+rte_deque_dump(FILE *f, const struct rte_deque *d)
+{
+	fprintf(f, "deque <%s>@%p\n", d->name, d);
+	fprintf(f, "  flags=%x\n", d->flags);
+	fprintf(f, "  size=%"PRIu32"\n", d->size);
+	fprintf(f, "  capacity=%"PRIu32"\n", d->capacity);
+	fprintf(f, "  head=%"PRIu32"\n", d->head);
+	fprintf(f, "  tail=%"PRIu32"\n", d->tail);
+	fprintf(f, "  used=%u\n", rte_deque_count(d));
+	fprintf(f, "  avail=%u\n", rte_deque_free_count(d));
+}
+
+RTE_LOG_REGISTER_DEFAULT(rte_deque_log_type, ERR);
diff --git a/lib/deque/rte_deque.h b/lib/deque/rte_deque.h
new file mode 100644
index 0000000000..6633eab377
--- /dev/null
+++ b/lib/deque/rte_deque.h
@@ -0,0 +1,533 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#ifndef _RTE_DEQUE_H_
+#define _RTE_DEQUE_H_
+
+/**
+ * @file
+ * RTE double ended queue (Deque)
+ *
+ * This fixed-size queue does not provide concurrent access by
+ * multiple threads. If required, the application should use locks
+ * to protect the deque from concurrent access.
+ *
+ * - Double ended queue
+ * - Maximum size is fixed
+ * - Store objects of any size
+ * - Single/bulk/burst dequeue at tail or head
+ * - Single/bulk/burst enqueue at head or tail
+ *
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_deque_core.h>
+#include <rte_deque_pvt.h>
+#include <rte_deque_zc.h>
+
+/**
+ * Calculate the memory size needed for a deque
+ *
+ * This function returns the number of bytes needed for a deque, given
+ * the number of objects and the object size. This value is the sum of
+ * the size of the structure rte_deque and the size of the memory needed
+ * by the objects. The value is aligned to a cache line size.
+ *
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ * @param count
+ *   The number of objects in the deque (must be a power of 2).
+ * @return
+ *   - The memory size needed for the deque on success.
+ *   - -EINVAL if count is not a power of 2.
+ */
+__rte_experimental
+ssize_t rte_deque_get_memsize_elem(unsigned int esize, unsigned int count);
+
+/**
+ * Initialize a deque structure.
+ *
+ * Initialize a deque structure in the memory pointed to by "d". The size
+ * of the memory area must be large enough to store the deque structure
+ * and the object table. It is advised to use rte_deque_get_memsize_elem()
+ * to get the
+ * appropriate size.
+ *
+ * The deque size is set to *count*, which must be a power of two.
+ * The real usable deque size is *count-1* instead of *count* to
+ * differentiate a full deque from an empty deque.
+ *
+ * @param d
+ *   The pointer to the deque structure followed by the objects table.
+ * @param name
+ *   The name of the deque.
+ * @param count
+ *   The number of objects in the deque (must be a power of 2,
+ *   unless RTE_DEQUE_F_EXACT_SZ is set in flags).
+ * @param flags
+ *   - RTE_DEQUE_F_EXACT_SZ: If this flag is set, the deque will hold
+ *     exactly the requested number of objects, and the requested size
+ *     will be rounded up to the next power of two, but the usable space
+ *     will be exactly that requested. Worst case, if a power-of-2 size is
+ *     requested, half the deque space will be wasted.
+ *     Without this flag set, the deque size requested must be a power of 2,
+ *     and the usable space will be that size - 1.
+ * @return
+ *   0 on success, or a negative value on error.
+ */
+__rte_experimental
+int rte_deque_init(struct rte_deque *d, const char *name, unsigned int count,
+		unsigned int flags);
+
+/**
+ * Create a new deque named *name* in memory.
+ *
+ * This function uses ``memzone_reserve()`` to allocate memory. Then it
+ * calls rte_deque_init() to initialize an empty deque.
+ *
+ * The new deque size is set to *count*, which must be a power of two.
+ * The real usable deque size is *count-1* instead of *count* to
+ * differentiate a full deque from an empty deque.
+ *
+ * @param name
+ *   The name of the deque.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ * @param count
+ *   The size of the deque (must be a power of 2,
+ *   unless RTE_DEQUE_F_EXACT_SZ is set in flags).
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in case of
+ *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
+ *   constraint for the reserved zone.
+ * @param flags
+ *   - RTE_DEQUE_F_EXACT_SZ: If this flag is set, the deque will hold exactly the
+ *     requested number of entries, and the requested size will be rounded up
+ *     to the next power of two, but the usable space will be exactly that
+ *     requested. Worst case, if a power-of-2 size is requested, half the
+ *     deque space will be wasted.
+ *     Without this flag set, the deque size requested must be a power of 2,
+ *     and the usable space will be that size - 1.
+ * @return
+ *   On success, the pointer to the new allocated deque. NULL on error with
+ *    rte_errno set appropriately. Possible errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - EINVAL - count provided is not a power of 2
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+__rte_experimental
+struct rte_deque *rte_deque_create(const char *name, unsigned int esize,
+				unsigned int count, int socket_id,
+				unsigned int flags);
+
+/**
+ * De-allocate all memory used by the deque.
+ *
+ * @param d
+ *   Deque to free.
+ *   If NULL, the function does nothing.
+ */
+__rte_experimental
+void rte_deque_free(struct rte_deque *d);
+
+/**
+ * Dump the status of the deque to a file.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @param d
+ *   A pointer to the deque structure.
+ */
+__rte_experimental
+void rte_deque_dump(FILE *f, const struct rte_deque *d);
+
+/**
+ * Return the number of entries in a deque.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @return
+ *   The number of entries in the deque.
+ */
+static inline unsigned int
+rte_deque_count(const struct rte_deque *d)
+{
+	return (d->head - d->tail) & d->mask;
+}
+
+/**
+ * Return the number of free entries in a deque.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @return
+ *   The number of free entries in the deque.
+ */
+static inline unsigned int
+rte_deque_free_count(const struct rte_deque *d)
+{
+	return d->capacity - rte_deque_count(d);
+}
+
+/**
+ * Enqueue fixed number of objects on a deque at the head.
+ *
+ * This function copies the objects at the head of the deque and
+ * moves the head index.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the deque from the obj_table.
+ * @param free_space
+ *   Returns the amount of space in the deque after the enqueue operation
+ *   has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_head_enqueue_bulk_elem(struct rte_deque *d,
+			const void *obj_table,
+			unsigned int esize,
+			unsigned int n,
+			unsigned int *free_space)
+{
+	*free_space = rte_deque_free_count(d);
+	if (unlikely(n > *free_space))
+		return 0;
+	*free_space -= n;
+	return __rte_deque_enqueue_at_head(d, obj_table, esize, n);
+}
+
+/**
+ * Enqueue up to a maximum number of objects on a deque at the head.
+ *
+ * This function copies the objects at the head of the deque and
+ * moves the head index.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the deque from the obj_table.
+ * @param free_space
+ *   Returns the amount of space in the deque after the enqueue operation
+ *   has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_head_enqueue_burst_elem(struct rte_deque *d, const void *obj_table,
+			unsigned int esize, unsigned int n,
+			unsigned int *free_space)
+{
+	unsigned int avail_space = rte_deque_free_count(d);
+	unsigned int to_be_enqueued = (n <= avail_space ? n : avail_space);
+	*free_space = avail_space - to_be_enqueued;
+	return __rte_deque_enqueue_at_head(d, obj_table, esize, to_be_enqueued);
+}
+
+/**
+ * Enqueue fixed number of objects on a deque at the tail.
+ *
+ * This function copies the objects at the tail of the deque and
+ * moves the tail index (backwards).
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the deque from the obj_table.
+ * @param free_space
+ *   Returns the amount of space in the deque after the enqueue operation
+ *   has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_tail_enqueue_bulk_elem(struct rte_deque *d,
+				 const void *obj_table, unsigned int esize,
+				 unsigned int n, unsigned int *free_space)
+{
+	*free_space = rte_deque_free_count(d);
+	if (unlikely(n > *free_space))
+		return 0;
+	*free_space -= n;
+	return __rte_deque_enqueue_at_tail(d, obj_table, esize, n);
+}
+
+/**
+ * Enqueue up to a maximum number of objects on a deque at the tail.
+ *
+ * This function copies the objects at the tail of the deque and
+ * moves the tail index (backwards).
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the deque from the obj_table.
+ * @param free_space
+ *   Returns the amount of space in the deque after the enqueue operation
+ *   has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_tail_enqueue_burst_elem(struct rte_deque *d,
+				const void *obj_table, unsigned int esize,
+				unsigned int n, unsigned int *free_space)
+{
+	unsigned int avail_space = rte_deque_free_count(d);
+	unsigned int to_be_enqueued = (n <= avail_space ? n : avail_space);
+	*free_space = avail_space - to_be_enqueued;
+	return __rte_deque_enqueue_at_tail(d, obj_table, esize, to_be_enqueued);
+}
+
+/**
+ * Dequeue a fixed number of objects from a deque at tail.
+ *
+ * This function copies the objects from the tail of the deque and
+ * moves the tail index.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the deque to the obj_table.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue
+ *   has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_tail_dequeue_bulk_elem(struct rte_deque *d, void *obj_table,
+			unsigned int esize, unsigned int n,
+			unsigned int *available)
+{
+	*available = rte_deque_count(d);
+	if (unlikely(n > *available))
+		return 0;
+	*available -= n;
+	return __rte_deque_dequeue_at_tail(d, obj_table, esize, n);
+}
+
+/**
+ * Dequeue up to a maximum number of objects from a deque at tail.
+ *
+ * This function copies the objects from the tail of the deque and
+ * moves the tail index.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the deque to the obj_table.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue
+ *   has finished.
+ * @return
+ *   - Number of objects dequeued
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_tail_dequeue_burst_elem(struct rte_deque *d, void *obj_table,
+			unsigned int esize, unsigned int n,
+			unsigned int *available)
+{
+	unsigned int count = rte_deque_count(d);
+	unsigned int to_be_dequeued = (n <= count ? n : count);
+	*available = count - to_be_dequeued;
+	return __rte_deque_dequeue_at_tail(d, obj_table, esize, to_be_dequeued);
+}
+
+/**
+ * Dequeue a fixed number of objects from a deque from the head.
+ *
+ * This function copies the objects from the head of the deque and
+ * moves the head index (backwards).
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the deque to the obj_table.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue
+ *   has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_head_dequeue_bulk_elem(struct rte_deque *d, void *obj_table,
+			unsigned int esize, unsigned int n,
+			unsigned int *available)
+{
+	*available = rte_deque_count(d);
+	if (unlikely(n > *available))
+		return 0;
+	*available -= n;
+	return __rte_deque_dequeue_at_head(d, obj_table, esize, n);
+}
+
+/**
+ * Dequeue up to a maximum number of objects from a deque from the head.
+ *
+ * This function copies the objects from the head of the deque and
+ * moves the head index (backwards).
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of deque object, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the deque. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the deque to the obj_table.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue
+ *   has finished.
+ * @return
+ *   - Number of objects dequeued
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_head_dequeue_burst_elem(struct rte_deque *d, void *obj_table,
+			unsigned int esize, unsigned int n,
+			unsigned int *available)
+{
+	unsigned int count = rte_deque_count(d);
+	unsigned int to_be_dequeued = (n <= count ? n : count);
+	*available = count - to_be_dequeued;
+	return __rte_deque_dequeue_at_head(d, obj_table, esize, to_be_dequeued);
+}
+
+/**
+ * Flush a deque.
+ *
+ * This function flushes all the objects in a deque.
+ *
+ * @warning
+ * Make sure the deque is not in use while calling this function.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ */
+__rte_experimental
+void rte_deque_reset(struct rte_deque *d);
+
+/**
+ * Test if a deque is full.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @return
+ *   - 1: The deque is full.
+ *   - 0: The deque is not full.
+ */
+static inline int
+rte_deque_full(const struct rte_deque *d)
+{
+	return rte_deque_free_count(d) == 0;
+}
+
+/**
+ * Test if a deque is empty.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @return
+ *   - 1: The deque is empty.
+ *   - 0: The deque is not empty.
+ */
+static inline int
+rte_deque_empty(const struct rte_deque *d)
+{
+	return d->tail == d->head;
+}
+
+/**
+ * Return the size of the deque.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @return
+ *   The size of the data store used by the deque.
+ *   NOTE: this is not the same as the usable space in the deque. To query that
+ *   use ``rte_deque_get_capacity()``.
+ */
+static inline unsigned int
+rte_deque_get_size(const struct rte_deque *d)
+{
+	return d->size;
+}
+
+/**
+ * Return the number of objects which can be stored in the deque.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @return
+ *   The usable size of the deque.
+ */
+static inline unsigned int
+rte_deque_get_capacity(const struct rte_deque *d)
+{
+	return d->capacity;
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_DEQUE_H_ */
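The bulk and burst dequeue variants above differ only in their contract: bulk is all-or-nothing, while burst dequeues as many objects as are available. A minimal standalone sketch of that contract, modelling only the count arithmetic (not the actual deque storage; the function names here are hypothetical):

```c
#include <assert.h>

/* Bulk contract: dequeue exactly n objects or none at all. */
static unsigned int bulk_dequeue_count(unsigned int count, unsigned int n,
				       unsigned int *available)
{
	*available = count;
	if (n > *available)
		return 0;	/* not enough entries: dequeue nothing */
	*available -= n;
	return n;
}

/* Burst contract: dequeue up to n objects, as many as are present. */
static unsigned int burst_dequeue_count(unsigned int count, unsigned int n,
					unsigned int *available)
{
	unsigned int to_be_dequeued = (n <= count ? n : count);

	*available = count - to_be_dequeued;
	return to_be_dequeued;
}
```

With 3 entries present and 5 requested, the bulk variant returns 0 and leaves all 3 entries in place, while the burst variant returns 3 and empties the deque.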
diff --git a/lib/deque/rte_deque_core.h b/lib/deque/rte_deque_core.h
new file mode 100644
index 0000000000..0bb8695c8a
--- /dev/null
+++ b/lib/deque/rte_deque_core.h
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#ifndef _RTE_DEQUE_CORE_H_
+#define _RTE_DEQUE_CORE_H_
+
+/**
+ * @file
+ * This file contains the definition of the RTE deque structure, init flags and
+ * some related macros. This file should not be included directly,
+ * include rte_deque.h instead.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+#include <string.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+#include <rte_debug.h>
+
+extern int rte_deque_log_type;
+
+#define RTE_DEQUE_MZ_PREFIX "DEQUE_"
+/** The maximum length of a deque name. */
+#define RTE_DEQUE_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
+			   sizeof(RTE_DEQUE_MZ_PREFIX) + 1)
+
+/**
+ * Double ended queue (deque) structure.
+ *
+ * The deque maintains a head and a tail index. These indices are not
+ * restricted to the range 0 to size(deque)-1; they are unsigned 32-bit
+ * values in the range 0 to 2^32 - 1, and their value is masked when the
+ * objects in the deque are accessed. Because the indices are unsigned
+ * 32 bits, the result of a subtraction is always modulo 2^32 and lies
+ * between 0 and the capacity.
+ */
+struct rte_deque {
+	alignas(RTE_CACHE_LINE_SIZE) char name[RTE_DEQUE_NAMESIZE];
+	/**< Name of the deque */
+	int flags;
+	/**< Flags supplied at creation. */
+	const struct rte_memzone *memzone;
+	/**< Memzone, if any, containing the rte_deque */
+
+	alignas(RTE_CACHE_LINE_SIZE) char pad0; /**< empty cache line */
+
+	uint32_t size;           /**< Size of deque. */
+	uint32_t mask;           /**< Mask (size-1) of deque. */
+	uint32_t capacity;       /**< Usable size of deque */
+	/** Ring head and tail pointers. */
+	volatile uint32_t head;
+	volatile uint32_t tail;
+};
+
+/**
+ * The deque holds exactly the requested number of entries.
+ * Without this flag set, the deque size requested must be a power of 2, and the
+ * usable space will be that size - 1. With the flag set, the requested size is
+ * rounded up to the next power of two for the internal allocation, but the
+ * usable space will be exactly the size requested. Worst case, if a
+ * power-of-2 size is requested, half the deque space will be wasted.
+ */
+#define RTE_DEQUE_F_EXACT_SZ 0x0004
+#define RTE_DEQUE_SZ_MASK  (0x7fffffffU) /**< Deque size mask */
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_DEQUE_CORE_H_ */
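The index scheme described in the structure comment relies on unsigned 32-bit wrap-around: with a power-of-two size, subtracting the indices and masking with size-1 always yields the element count, even if the indices wrap past 2^32. A standalone sketch of that arithmetic (not the DPDK implementation; the helper names are assumptions):

```c
#include <stdint.h>

/* Element count for a power-of-two deque with 32-bit indices: the
 * unsigned subtraction is modulo 2^32, so masking with size-1 stays
 * correct even across index wrap-around. */
static uint32_t deque_count(uint32_t head, uint32_t tail, uint32_t mask)
{
	return (head - tail) & mask;
}

/* Free space is simply capacity minus the element count. */
static uint32_t deque_free_count(uint32_t head, uint32_t tail,
				 uint32_t mask, uint32_t capacity)
{
	return capacity - deque_count(head, tail, mask);
}
```

For example, with head = 2 and tail = 0xFFFFFFFF (the tail about to wrap), the subtraction still yields a count of 3 for a 16-slot deque.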
diff --git a/lib/deque/rte_deque_pvt.h b/lib/deque/rte_deque_pvt.h
new file mode 100644
index 0000000000..931bbd4d19
--- /dev/null
+++ b/lib/deque/rte_deque_pvt.h
@@ -0,0 +1,538 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#ifndef _RTE_DEQUE_PVT_H_
+#define _RTE_DEQUE_PVT_H_
+
+#define __RTE_DEQUE_COUNT(d) ((d->head - d->tail) & d->mask)
+#define __RTE_DEQUE_FREE_SPACE(d) (d->capacity - __RTE_DEQUE_COUNT(d))
+
+static __rte_always_inline void
+__rte_deque_enqueue_elems_head_32(struct rte_deque *d,
+				const unsigned int size,
+				uint32_t idx,
+				const void *obj_table,
+				unsigned int n)
+{
+	unsigned int i;
+	uint32_t *deque = (uint32_t *)&d[1];
+	const uint32_t *obj = (const uint32_t *)obj_table;
+	if (likely(idx + n <= size)) {
+		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
+			deque[idx] = obj[i];
+			deque[idx + 1] = obj[i + 1];
+			deque[idx + 2] = obj[i + 2];
+			deque[idx + 3] = obj[i + 3];
+			deque[idx + 4] = obj[i + 4];
+			deque[idx + 5] = obj[i + 5];
+			deque[idx + 6] = obj[i + 6];
+			deque[idx + 7] = obj[i + 7];
+		}
+		switch (n & 0x7) {
+		case 7:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 6:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 5:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 4:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 3:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 2:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 1:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			deque[idx] = obj[i];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			deque[idx] = obj[i];
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_enqueue_elems_head_64(struct rte_deque *d,
+				const void *obj_table,
+				unsigned int n)
+{
+	unsigned int i;
+	const uint32_t size = d->size;
+	uint32_t idx = (d->head & d->mask);
+	uint64_t *deque = (uint64_t *)&d[1];
+	const unaligned_uint64_t *obj = (const unaligned_uint64_t *)obj_table;
+	if (likely(idx + n <= size)) {
+		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
+			deque[idx] = obj[i];
+			deque[idx + 1] = obj[i + 1];
+			deque[idx + 2] = obj[i + 2];
+			deque[idx + 3] = obj[i + 3];
+		}
+		switch (n & 0x3) {
+		case 3:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 2:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		case 1:
+			deque[idx++] = obj[i++]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			deque[idx] = obj[i];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			deque[idx] = obj[i];
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_enqueue_elems_head_128(struct rte_deque *d,
+				const void *obj_table,
+				unsigned int n)
+{
+	unsigned int i;
+	const uint32_t size = d->size;
+	uint32_t idx = (d->head & d->mask);
+	rte_int128_t *deque = (rte_int128_t *)&d[1];
+	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
+	if (likely(idx + n <= size)) {
+		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
+			memcpy((void *)(deque + idx),
+				(const void *)(obj + i), 32);
+		switch (n & 0x1) {
+		case 1:
+			memcpy((void *)(deque + idx),
+				(const void *)(obj + i), 16);
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			memcpy((void *)(deque + idx),
+				(const void *)(obj + i), 16);
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			memcpy((void *)(deque + idx),
+				(const void *)(obj + i), 16);
+	}
+}
+
+static __rte_always_inline unsigned int
+__rte_deque_enqueue_at_head(struct rte_deque *d,
+			const void *obj_table,
+			unsigned int esize,
+			unsigned int n)
+{
+	/* 8B and 16B copies implemented individually because on some platforms
+	 * there are 64 bit and 128 bit registers available for direct copying.
+	 */
+	if (esize == 8)
+		__rte_deque_enqueue_elems_head_64(d, obj_table, n);
+	else if (esize == 16)
+		__rte_deque_enqueue_elems_head_128(d, obj_table, n);
+	else {
+		uint32_t idx, scale, nd_idx, nd_num, nd_size;
+
+		/* Normalize to uint32_t */
+		scale = esize / sizeof(uint32_t);
+		nd_num = n * scale;
+		idx = d->head & d->mask;
+		nd_idx = idx * scale;
+		nd_size = d->size * scale;
+		__rte_deque_enqueue_elems_head_32(d, nd_size, nd_idx,
+						obj_table, nd_num);
+	}
+	d->head = (d->head + n) & d->mask;
+	return n;
+}
+
+static __rte_always_inline void
+__rte_deque_enqueue_elems_tail_32(struct rte_deque *d,
+				const unsigned int mask,
+				uint32_t idx,
+				const void *obj_table,
+				unsigned int n,
+				const unsigned int scale,
+				const unsigned int elem_size)
+{
+	unsigned int i;
+	uint32_t *deque = (uint32_t *)&d[1];
+	const uint32_t *obj = (const uint32_t *)obj_table;
+
+	if (likely(idx >= n)) {
+		for (i = 0; i < n; idx -= scale, i += scale)
+			memcpy(&deque[idx], &obj[i], elem_size);
+	} else {
+		for (i = 0; (int32_t)idx >= 0; idx -= scale, i += scale)
+			memcpy(&deque[idx], &obj[i], elem_size);
+
+		/* Start at the ending */
+		idx = mask;
+		for (; i < n; idx -= scale, i += scale)
+			memcpy(&deque[idx], &obj[i], elem_size);
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_enqueue_elems_tail_64(struct rte_deque *d,
+				const void *obj_table,
+				unsigned int n)
+{
+	unsigned int i;
+	uint32_t idx = (d->tail & d->mask);
+	uint64_t *deque = (uint64_t *)&d[1];
+	const unaligned_uint64_t *obj = (const unaligned_uint64_t *)obj_table;
+	if (likely((int32_t)(idx - n) >= 0)) {
+		for (i = 0; i < (n & ~0x3); i += 4, idx -= 4) {
+			deque[idx] = obj[i];
+			deque[idx - 1] = obj[i + 1];
+			deque[idx - 2] = obj[i + 2];
+			deque[idx - 3] = obj[i + 3];
+		}
+		switch (n & 0x3) {
+		case 3:
+			deque[idx--] = obj[i++]; /* fallthrough */
+		case 2:
+			deque[idx--] = obj[i++]; /* fallthrough */
+		case 1:
+			deque[idx--] = obj[i++]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; (int32_t)idx >= 0; i++, idx--)
+			deque[idx] = obj[i];
+		/* Start at the ending */
+		for (idx = d->mask; i < n; i++, idx--)
+			deque[idx] = obj[i];
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_enqueue_elems_tail_128(struct rte_deque *d,
+				const void *obj_table,
+				unsigned int n)
+{
+	unsigned int i;
+	uint32_t idx = (d->tail & d->mask);
+	rte_int128_t *deque = (rte_int128_t *)&d[1];
+	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
+	if (likely((int32_t)(idx - n) >= 0)) {
+		for (i = 0; i < (n & ~0x1); i += 2, idx -= 2) {
+			deque[idx] = obj[i];
+			deque[idx - 1] = obj[i + 1];
+		}
+		switch (n & 0x1) {
+		case 1:
+			memcpy((void *)(deque + idx),
+				(const void *)(obj + i), 16);
+		}
+	} else {
+		for (i = 0; (int32_t)idx >= 0; i++, idx--)
+			memcpy((void *)(deque + idx),
+				(const void *)(obj + i), 16);
+		/* Start at the ending */
+		for (idx = d->mask; i < n; i++, idx--)
+			memcpy((void *)(deque + idx),
+				(const void *)(obj + i), 16);
+	}
+}
+
+static __rte_always_inline unsigned int
+__rte_deque_enqueue_at_tail(struct rte_deque *d,
+			const void *obj_table,
+			unsigned int esize,
+			unsigned int n)
+{
+	/* The tail pointer must point at an empty cell when enqueuing */
+	d->tail--;
+
+	/* 8B and 16B copies implemented individually because on some platforms
+	 * there are 64 bit and 128 bit registers available for direct copying.
+	 */
+	if (esize == 8)
+		__rte_deque_enqueue_elems_tail_64(d, obj_table, n);
+	else if (esize == 16)
+		__rte_deque_enqueue_elems_tail_128(d, obj_table, n);
+	else {
+		uint32_t idx, scale, nd_idx, nd_num, nd_mask;
+
+		/* Normalize to uint32_t */
+		scale = esize / sizeof(uint32_t);
+		nd_num = n * scale;
+		idx = d->tail & d->mask;
+		nd_idx = idx * scale;
+		nd_mask = d->mask * scale;
+		__rte_deque_enqueue_elems_tail_32(d, nd_mask, nd_idx, obj_table,
+						nd_num, scale, esize);
+	}
+
+	/* The +1 is because the tail needs to point at a
+	 * non-empty memory location after the enqueuing operation.
+	 */
+	d->tail = (d->tail - n + 1) & d->mask;
+	return n;
+}
+
+static __rte_always_inline void
+__rte_deque_dequeue_elems_32(struct rte_deque *d,
+			const unsigned int size,
+			uint32_t idx,
+			void *obj_table,
+			unsigned int n)
+{
+	unsigned int i;
+	const uint32_t *deque = (const uint32_t *)&d[1];
+	uint32_t *obj = (uint32_t *)obj_table;
+	if (likely(idx + n <= size)) {
+		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
+			obj[i] = deque[idx];
+			obj[i + 1] = deque[idx + 1];
+			obj[i + 2] = deque[idx + 2];
+			obj[i + 3] = deque[idx + 3];
+			obj[i + 4] = deque[idx + 4];
+			obj[i + 5] = deque[idx + 5];
+			obj[i + 6] = deque[idx + 6];
+			obj[i + 7] = deque[idx + 7];
+		}
+		switch (n & 0x7) {
+		case 7:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 6:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 5:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 4:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 3:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 2:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 1:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			obj[i] = deque[idx];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			obj[i] = deque[idx];
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_dequeue_elems_64(struct rte_deque *d, void *obj_table,
+			unsigned int n)
+{
+	unsigned int i;
+	const uint32_t size = d->size;
+	uint32_t idx = (d->tail & d->mask);
+	const uint64_t *deque = (const uint64_t *)&d[1];
+	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
+	if (likely(idx + n <= size)) {
+		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
+			obj[i] = deque[idx];
+			obj[i + 1] = deque[idx + 1];
+			obj[i + 2] = deque[idx + 2];
+			obj[i + 3] = deque[idx + 3];
+		}
+		switch (n & 0x3) {
+		case 3:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 2:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		case 1:
+			obj[i++] = deque[idx++]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			obj[i] = deque[idx];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			obj[i] = deque[idx];
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_dequeue_elems_128(struct rte_deque *d,
+			void *obj_table,
+			unsigned int n)
+{
+	unsigned int i;
+	const uint32_t size = d->size;
+	uint32_t idx = (d->tail & d->mask);
+	const rte_int128_t *deque = (const rte_int128_t *)&d[1];
+	rte_int128_t *obj = (rte_int128_t *)obj_table;
+	if (likely(idx + n <= size)) {
+		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
+			memcpy((void *)(obj + i),
+				(const void *)(deque + idx), 32);
+		switch (n & 0x1) {
+		case 1:
+			memcpy((void *)(obj + i),
+				(const void *)(deque + idx), 16);
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			memcpy((void *)(obj + i),
+				(const void *)(deque + idx), 16);
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			memcpy((void *)(obj + i),
+				(const void *)(deque + idx), 16);
+	}
+}
+
+static __rte_always_inline unsigned int
+__rte_deque_dequeue_at_tail(struct rte_deque *d,
+			void *obj_table,
+			unsigned int esize,
+			unsigned int n)
+{
+	/* 8B and 16B copies implemented individually because on some platforms
+	 * there are 64 bit and 128 bit registers available for direct copying.
+	 */
+	if (esize == 8)
+		__rte_deque_dequeue_elems_64(d, obj_table, n);
+	else if (esize == 16)
+		__rte_deque_dequeue_elems_128(d, obj_table, n);
+	else {
+		uint32_t idx, scale, nd_idx, nd_num, nd_size;
+
+		/* Normalize to uint32_t */
+		scale = esize / sizeof(uint32_t);
+		nd_num = n * scale;
+		idx = d->tail & d->mask;
+		nd_idx = idx * scale;
+		nd_size = d->size * scale;
+		__rte_deque_dequeue_elems_32(d, nd_size, nd_idx,
+					obj_table, nd_num);
+	}
+	d->tail = (d->tail + n) & d->mask;
+	return n;
+}
+
+static __rte_always_inline void
+__rte_deque_dequeue_elems_head_32(struct rte_deque *d,
+				const unsigned int mask,
+				uint32_t idx,
+				void *obj_table,
+				unsigned int n,
+				const unsigned int scale,
+				const unsigned int elem_size)
+{
+	unsigned int i;
+	const uint32_t *deque = (uint32_t *)&d[1];
+	uint32_t *obj = (uint32_t *)obj_table;
+
+	if (likely(idx >= n)) {
+		for (i = 0; i < n; idx -= scale, i += scale)
+			memcpy(&obj[i], &deque[idx], elem_size);
+	} else {
+		for (i = 0; (int32_t)idx >= 0; idx -= scale, i += scale)
+			memcpy(&obj[i], &deque[idx], elem_size);
+		/* Start at the ending */
+		idx = mask;
+		for (; i < n; idx -= scale, i += scale)
+			memcpy(&obj[i], &deque[idx], elem_size);
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_dequeue_elems_head_64(struct rte_deque *d,
+				void *obj_table,
+				unsigned int n)
+{
+	unsigned int i;
+	uint32_t idx = (d->head & d->mask);
+	const uint64_t *deque = (uint64_t *)&d[1];
+	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
+	if (likely((int32_t)(idx - n) >= 0)) {
+		for (i = 0; i < (n & ~0x3); i += 4, idx -= 4) {
+			obj[i] = deque[idx];
+			obj[i + 1] = deque[idx - 1];
+			obj[i + 2] = deque[idx - 2];
+			obj[i + 3] = deque[idx - 3];
+		}
+		switch (n & 0x3) {
+		case 3:
+			obj[i++] = deque[idx--];  /* fallthrough */
+		case 2:
+			obj[i++] = deque[idx--]; /* fallthrough */
+		case 1:
+			obj[i++] = deque[idx--]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; (int32_t)idx >= 0; i++, idx--)
+			obj[i] = deque[idx];
+		/* Start at the ending */
+		for (idx = d->mask; i < n; i++, idx--)
+			obj[i] = deque[idx];
+	}
+}
+
+static __rte_always_inline void
+__rte_deque_dequeue_elems_head_128(struct rte_deque *d,
+				void *obj_table,
+				unsigned int n)
+{
+	unsigned int i;
+	uint32_t idx = (d->head & d->mask);
+	const rte_int128_t *deque = (rte_int128_t *)&d[1];
+	rte_int128_t *obj = (rte_int128_t *)obj_table;
+	if (likely((int32_t)(idx - n) >= 0)) {
+		for (i = 0; i < (n & ~0x1); i += 2, idx -= 2) {
+			obj[i] = deque[idx];
+			obj[i + 1] = deque[idx - 1];
+		}
+		switch (n & 0x1) {
+		case 1:
+			memcpy((void *)(obj + i),
+				(const void *)(deque + idx), 16);
+		}
+	} else {
+		for (i = 0; (int32_t)idx >= 0; i++, idx--)
+			memcpy((void *)(obj + i),
+				(const void *)(deque + idx), 16);
+		/* Start at the ending */
+		for (idx = d->mask; i < n; i++, idx--)
+			memcpy((void *)(obj + i),
+				(const void *)(deque + idx), 16);
+	}
+}
+
+static __rte_always_inline unsigned int
+__rte_deque_dequeue_at_head(struct rte_deque *d,
+			void *obj_table,
+			unsigned int esize,
+			unsigned int n)
+{
+	/* The head points at an empty cell; move it back to the last
+	 * valid element before dequeuing.
+	 */
+	d->head--;
+
+	/* 8B and 16B copies implemented individually because on some platforms
+	 * there are 64 bit and 128 bit registers available for direct copying.
+	 */
+	if (esize == 8)
+		__rte_deque_dequeue_elems_head_64(d, obj_table, n);
+	else if (esize == 16)
+		__rte_deque_dequeue_elems_head_128(d, obj_table, n);
+	else {
+		uint32_t idx, scale, nd_idx, nd_num, nd_mask;
+
+		/* Normalize to uint32_t */
+		scale = esize / sizeof(uint32_t);
+		nd_num = n * scale;
+		idx = d->head & d->mask;
+		nd_idx = idx * scale;
+		nd_mask = d->mask * scale;
+		__rte_deque_dequeue_elems_head_32(d, nd_mask, nd_idx, obj_table,
+						nd_num, scale, esize);
+	}
+
+	/* The +1 is because the head needs to point at an
+	 * empty memory location after the dequeuing operation.
+	 */
+	d->head = (d->head - n + 1) & d->mask;
+	return n;
+}
+#endif /* _RTE_DEQUE_PVT_H_ */
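All of the copy helpers above share the same wrap-around pattern: copy until the end of the storage is reached, then continue from the other end. A simplified standalone version of the forward (enqueue-at-head) 32-bit case, with assumed names rather than the actual private helpers:

```c
#include <stdint.h>
#include <string.h>

/* Simplified forward wrap-around copy, mirroring the 32-bit enqueue
 * path: write n words starting at idx into a buffer of 'size' slots,
 * wrapping to slot 0 when the end of the buffer is reached. */
static void enqueue_words_wrap(uint32_t *deque, uint32_t size, uint32_t idx,
			       const uint32_t *obj, unsigned int n)
{
	unsigned int i;

	if (idx + n <= size) {
		/* No wrap: a single contiguous copy suffices. */
		memcpy(&deque[idx], obj, n * sizeof(*obj));
	} else {
		/* Copy up to the end of the buffer... */
		for (i = 0; idx < size; i++, idx++)
			deque[idx] = obj[i];
		/* ...then start again at the beginning. */
		for (idx = 0; i < n; i++, idx++)
			deque[idx] = obj[i];
	}
}
```

The unrolled 8-, 4- and 2-wide loops in the actual helpers are a performance refinement of this same split; the tail-side helpers apply it in the opposite direction, wrapping from slot 0 back to the mask.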
diff --git a/lib/deque/rte_deque_zc.h b/lib/deque/rte_deque_zc.h
new file mode 100644
index 0000000000..6d7167e158
--- /dev/null
+++ b/lib/deque/rte_deque_zc.h
@@ -0,0 +1,430 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+#ifndef _RTE_DEQUE_ZC_H_
+#define _RTE_DEQUE_ZC_H_
+
+/**
+ * @file
+ * This file should not be included directly, include rte_deque.h instead.
+ *
+ * Deque Zero Copy APIs
+ * These APIs make it possible to split public enqueue/dequeue API
+ * into 3 parts:
+ * - enqueue/dequeue start
+ * - copy data to/from the deque
+ * - enqueue/dequeue finish
+ * These APIs provide the ability to avoid copying of the data to temporary area.
+ *
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Deque zero-copy information structure.
+ *
+ * This structure contains the pointers and length of the space
+ * reserved on the Deque storage.
+ */
+struct __rte_cache_aligned rte_deque_zc_data {
+	/* Pointer to the first space in the deque */
+	void *ptr1;
+	/* Pointer to the second space in the deque if there is wrap-around.
+	 * It contains a valid value only if a wrap-around happens.
+	 */
+	void *ptr2;
+	/* Number of elements in the first pointer. If this is equal to
+	 * the number of elements requested, then ptr2 is NULL.
+	 * Otherwise, subtracting n1 from number of elements requested
+	 * will give the number of elements available at ptr2.
+	 */
+	unsigned int n1;
+};
+
+static __rte_always_inline void
+__rte_deque_get_elem_addr(struct rte_deque *d, uint32_t pos,
+	uint32_t esize, uint32_t num, void **dst1, uint32_t *n1, void **dst2,
+	bool low_to_high)
+{
+	uint32_t idx, scale, nr_idx;
+	uint32_t *deque_ptr = (uint32_t *)&d[1];
+
+	/* Normalize to uint32_t */
+	scale = esize / sizeof(uint32_t);
+	idx = pos & d->mask;
+	nr_idx = idx * scale;
+
+	*dst1 = deque_ptr + nr_idx;
+	*n1 = num;
+
+	if (low_to_high) {
+		if (idx + num > d->size) {
+			*n1 = d->size - idx;
+			*dst2 = deque_ptr;
+		} else
+			*dst2 = NULL;
+	} else {
+		if ((int32_t)(idx - num) < 0) {
+			*n1 = idx + 1;
+			*dst2 = (void *)&deque_ptr[(-1 & d->mask) * scale];
+		} else
+			*dst2 = NULL;
+	}
+}
+
+/**
+ * Start to enqueue several objects on the deque.
+ * Note that no actual objects are put in the deque by this function,
+ * it just reserves space for the user on the deque.
+ * User has to copy objects into the deque using the returned pointers.
+ * User should call rte_deque_head_enqueue_zc_elem_finish to complete the
+ * enqueue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to add in the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param free_space
+ *   Returns the amount of space in the deque after the reservation operation
+ *   has finished.
+ * @return
+ *   The number of objects that can be enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_head_enqueue_zc_bulk_elem_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *free_space)
+{
+	*free_space = __RTE_DEQUE_FREE_SPACE(d);
+	if (unlikely(*free_space < n))
+		return 0;
+	__rte_deque_get_elem_addr(d, d->head, esize, n, &zcd->ptr1,
+							&zcd->n1, &zcd->ptr2, true);
+
+	*free_space -= n;
+	return n;
+}
+
+/**
+ * Complete enqueuing several pointers to objects on the deque.
+ * Note that number of objects to enqueue should not exceed previous
+ * enqueue_start return value.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param n
+ *   The number of pointers to objects to add to the deque.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_deque_head_enqueue_zc_elem_finish(struct rte_deque *d, unsigned int n)
+{
+	d->head = (d->head + n) & d->mask;
+}
+
+/**
+ * Start to enqueue up to the requested number of objects on the deque.
+ * Note that no actual objects are put in the deque by this function,
+ * it just reserves space for the user on the deque.
+ * User has to copy objects into the deque using the returned pointers.
+ * User should call rte_deque_head_enqueue_zc_elem_finish to complete the
+ * enqueue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to add in the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param free_space
+ *   Returns the amount of space in the deque after the reservation operation
+ *   has finished.
+ * @return
+ *   The number of objects that can be enqueued, between 0 and n inclusive
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_head_enqueue_zc_burst_elem_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *free_space)
+{
+	*free_space = __RTE_DEQUE_FREE_SPACE(d);
+	n = n > *free_space ? *free_space : n;
+	return rte_deque_head_enqueue_zc_bulk_elem_start(d, esize, n, zcd, free_space);
+}
+
+/**
+ * Start to enqueue several objects on the deque.
+ * Note that no actual objects are put in the deque by this function,
+ * it just reserves space for the user on the deque.
+ * User has to copy objects into the deque using the returned pointers.
+ * User should call rte_deque_tail_enqueue_zc_elem_finish to complete the
+ * enqueue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to add in the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param free_space
+ *   Returns the amount of space in the deque after the reservation operation
+ *   has finished.
+ * @return
+ *   The number of objects that can be enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_tail_enqueue_zc_bulk_elem_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *free_space)
+{
+	*free_space = __RTE_DEQUE_FREE_SPACE(d);
+	if (unlikely(*free_space < n))
+		return 0;
+	__rte_deque_get_elem_addr(d, d->tail - 1, esize, n, &zcd->ptr1,
+							  &zcd->n1, &zcd->ptr2, false);
+
+	*free_space -= n;
+	return n;
+}
+
+/**
+ * Complete enqueuing several pointers to objects on the deque.
+ * Note that number of objects to enqueue should not exceed previous
+ * enqueue_start return value.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param n
+ *   The number of pointers to objects to add to the deque.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_deque_tail_enqueue_zc_elem_finish(struct rte_deque *d, unsigned int n)
+{
+	d->tail = (d->tail - n) & d->mask;
+}
+
+/**
+ * Start to enqueue up to the requested number of objects on the deque.
+ * Note that no actual objects are put in the deque by this function,
+ * it just reserves space for the user on the deque.
+ * User has to copy objects into the deque using the returned pointers.
+ * User should call rte_deque_tail_enqueue_zc_elem_finish to complete the
+ * enqueue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to add in the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param free_space
+ *   Returns the amount of space in the deque after the reservation operation
+ *   has finished.
+ * @return
+ *   The number of objects that can be enqueued, between 0 and n inclusive
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_tail_enqueue_zc_burst_elem_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *free_space)
+{
+	*free_space = __RTE_DEQUE_FREE_SPACE(d);
+	n = n > *free_space ? *free_space : n;
+	return rte_deque_tail_enqueue_zc_bulk_elem_start(d, esize, n, zcd, free_space);
+}
+
+/**
+ * Start to dequeue several objects from the deque.
+ * Note that no actual objects are copied from the deque by this function.
+ * User has to copy objects from the deque using the returned pointers.
+ * User should call rte_deque_tail_dequeue_zc_elem_finish to complete the
+ * dequeue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to remove from the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue has
+ *   finished.
+ * @return
+ *   The number of objects that can be dequeued, either 0 or n.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_tail_dequeue_zc_bulk_elem_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *available)
+{
+	*available = __RTE_DEQUE_COUNT(d);
+	if (unlikely(*available < n))
+		return 0;
+	__rte_deque_get_elem_addr(d, d->tail, esize, n, &zcd->ptr1,
+							&zcd->n1, &zcd->ptr2, true);
+
+	*available -= n;
+	return n;
+}
+
+/**
+ * Complete dequeuing several objects from the deque.
+ * Note that the number of objects to dequeue should not exceed the previous
+ * dequeue_start return value.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param n
+ *   The number of objects to remove from the deque.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_deque_tail_dequeue_zc_elem_finish(struct rte_deque *d, unsigned int n)
+{
+	d->tail = (d->tail + n) & d->mask;
+}
+
+/**
+ * Start to dequeue up to the requested number of objects from the deque.
+ * Note that no actual objects are copied from the deque by this function.
+ * User has to copy objects from the deque using the returned pointers.
+ * User should call rte_deque_tail_dequeue_zc_elem_finish to complete the
+ * dequeue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to remove from the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue has
+ *   finished.
+ * @return
+ *   The number of objects that can be dequeued, between 0 and n inclusive.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_tail_dequeue_zc_burst_elem_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *available)
+{
+	*available = __RTE_DEQUE_COUNT(d);
+	n = n > *available ? *available : n;
+	return rte_deque_tail_dequeue_zc_bulk_elem_start(d, esize, n, zcd, available);
+}
+
+/**
+ * Start to dequeue several objects from the deque.
+ * Note that no actual objects are copied from the deque by this function.
+ * User has to copy objects from the deque using the returned pointers.
+ * User should call rte_deque_head_dequeue_zc_elem_finish to complete the
+ * dequeue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to remove from the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue has
+ *   finished.
+ * @return
+ *   The number of objects that can be dequeued, either 0 or n.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_head_dequeue_zc_bulk_elem_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *available)
+{
+	*available = __RTE_DEQUE_COUNT(d);
+	if (unlikely(*available < n))
+		return 0;
+	__rte_deque_get_elem_addr(d, d->head - 1, esize, n, &zcd->ptr1,
+							&zcd->n1, &zcd->ptr2, false);
+
+	*available -= n;
+	return n;
+}
+
+/**
+ * Complete dequeuing several objects from the head of the deque.
+ * Note that the number of objects to dequeue should not exceed the
+ * return value of the previous dequeue_start call.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param n
+ *   The number of objects to remove from the deque.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_deque_head_dequeue_zc_elem_finish(struct rte_deque *d, unsigned int n)
+{
+	d->head = (d->head - n) & d->mask;
+}
+
+/**
+ * Start to dequeue several objects from the head of the deque.
+ * Note that no actual objects are copied from the deque by this function.
+ * The user has to copy the objects from the deque using the returned pointers.
+ * The user should call rte_deque_head_dequeue_zc_elem_finish to complete the
+ * dequeue operation.
+ *
+ * @param d
+ *   A pointer to the deque structure.
+ * @param esize
+ *   The size of a deque element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to remove from the deque.
+ * @param zcd
+ *   Structure containing the pointers and length of the space
+ *   reserved on the deque storage.
+ * @param available
+ *   Returns the number of remaining deque entries after the dequeue has
+ *   finished.
+ * @return
+ *   The number of objects that can be dequeued, which may be less than n.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_deque_head_dequeue_zc_burst_elem_start(struct rte_deque *d, unsigned int esize,
+	unsigned int n, struct rte_deque_zc_data *zcd, unsigned int *available)
+{
+	*available = __RTE_DEQUE_COUNT(d);
+	n = n > *available ? *available : n;
+	return rte_deque_head_dequeue_zc_bulk_elem_start(d, esize, n, zcd, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_DEQUE_ZC_H_ */
diff --git a/lib/deque/version.map b/lib/deque/version.map
new file mode 100644
index 0000000000..103fd3b512
--- /dev/null
+++ b/lib/deque/version.map
@@ -0,0 +1,14 @@
+EXPERIMENTAL {
+	global:
+
+	# added in 24.07
+	rte_deque_log_type;
+	rte_deque_create;
+	rte_deque_dump;
+	rte_deque_free;
+	rte_deque_get_memsize_elem;
+	rte_deque_init;
+	rte_deque_reset;
+
+	local: *;
+};
diff --git a/lib/meson.build b/lib/meson.build
index 179a272932..127e4dc68c 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -14,6 +14,7 @@ libraries = [
         'argparse',
         'telemetry', # basic info querying
         'eal', # everything depends on eal
+        'deque',
         'ring',
         'rcu', # rcu depends on ring
         'mempool',
@@ -74,6 +75,7 @@ if is_ms_compiler
             'kvargs',
             'telemetry',
             'eal',
+            'deque',
             'ring',
     ]
 endif
-- 
2.25.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

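The zero-copy start/finish APIs in the patch above all reduce to masked index arithmetic on a power-of-two ring: the head-dequeue finish simply does `d->head = (d->head - n) & d->mask`. As a standalone illustration (not the DPDK implementation — the `toy_*` names, the `int` payload, and the fixed size are all invented for this sketch), the following shows why `(head - n) & mask` wraps correctly for both FIFO dequeue at the tail and stack-style dequeue at the head:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy fixed-size deque mirroring the index arithmetic of the patch.
 * Not the DPDK API: names and payload type are illustrative only. */
#define TOY_SIZE 8u                 /* must be a power of two */
#define TOY_MASK (TOY_SIZE - 1u)

struct toy_deque {
	uint32_t head;              /* next free slot at the head end */
	uint32_t tail;              /* oldest element at the tail end */
	int objs[TOY_SIZE];
};

static void toy_init(struct toy_deque *d)
{
	memset(d, 0, sizeof(*d));
}

static uint32_t toy_count(const struct toy_deque *d)
{
	return (d->head - d->tail) & TOY_MASK;
}

/* Enqueue one object at the head; one slot is kept free so a full
 * deque can be distinguished from an empty one. */
static int toy_push_head(struct toy_deque *d, int obj)
{
	if (toy_count(d) == TOY_MASK)
		return 0;               /* full */
	d->objs[d->head] = obj;
	d->head = (d->head + 1) & TOY_MASK;
	return 1;
}

/* Dequeue from the tail: FIFO order relative to toy_push_head. */
static int toy_pop_tail(struct toy_deque *d, int *obj)
{
	if (toy_count(d) == 0)
		return 0;               /* empty */
	*obj = d->objs[d->tail];
	d->tail = (d->tail + 1) & TOY_MASK;
	return 1;
}

/* Dequeue from the head: LIFO (stack) order. The masked subtraction
 * wraps, e.g. (0 - 1) & 7 == 7, so crossing index 0 is safe. */
static int toy_pop_head(struct toy_deque *d, int *obj)
{
	if (toy_count(d) == 0)
		return 0;               /* empty */
	d->head = (d->head - 1) & TOY_MASK;
	*obj = d->objs[d->head];
	return 1;
}
```

The rte_deque code applies the same arithmetic to offsets scaled by `esize`, and the zero-copy variants hand out pointers into the ring storage (split across `ptr1`/`ptr2` at the wrap point) instead of copying.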
* [PATCH v3 2/2] deque: add unit tests for the deque library
  2024-05-02 20:19         ` [PATCH v3 0/2] deque: add multithread unsafe deque library Aditya Ambadipudi
  2024-05-02 20:19           ` [PATCH v3 1/2] deque: add multi-thread unsafe double ended queue Aditya Ambadipudi
@ 2024-05-02 20:19           ` Aditya Ambadipudi
  2024-05-02 20:29           ` [PATCH v3 0/2] deque: add multithread unsafe " Aditya Ambadipudi
  2024-06-27 15:03           ` Thomas Monjalon
  3 siblings, 0 replies; 48+ messages in thread
From: Aditya Ambadipudi @ 2024-05-02 20:19 UTC (permalink / raw)
  To: dev, jackmin, stephen, matan, viacheslavo, roretzla,
	konstantin.ananyev, mb, hofors, probb, alialnu
  Cc: wathsala.vithanage, dhruv.tripathi, honnappa.nagarahalli, nd,
	venamb01, Aditya Ambadipudi

Add unit test cases that test all of the enqueue/dequeue functions.
Both normal enqueue/dequeue functions and the zerocopy API functions.

Signed-off-by: Aditya Ambadipudi <aditya.ambadipudi@arm.com>
---
v3:
 * Fixed a few casts that were causing compiler warnings.

v2:
  * Addressed the spell check warning issue with the word "Deque"
  * Tried to rename all objects that are named deque to avoid collision with
    std::deque

 app/test/meson.build                   |    2 +
 app/test/test_deque_enqueue_dequeue.c  | 1228 ++++++++++++++++++++++++
 app/test/test_deque_helper_functions.c |  169 ++++
 3 files changed, 1399 insertions(+)
 create mode 100644 app/test/test_deque_enqueue_dequeue.c
 create mode 100644 app/test/test_deque_helper_functions.c

diff --git a/app/test/meson.build b/app/test/meson.build
index 7d909039ae..8913050c9b 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -60,6 +60,8 @@ source_file_deps = {
     'test_cryptodev_security_tls_record.c': ['cryptodev', 'security'],
     'test_cycles.c': [],
     'test_debug.c': [],
+    'test_deque_enqueue_dequeue.c': ['deque'],
+    'test_deque_helper_functions.c': ['deque'],
     'test_devargs.c': ['kvargs'],
     'test_dispatcher.c': ['dispatcher'],
     'test_distributor.c': ['distributor'],
diff --git a/app/test/test_deque_enqueue_dequeue.c b/app/test/test_deque_enqueue_dequeue.c
new file mode 100644
index 0000000000..8180677380
--- /dev/null
+++ b/app/test/test_deque_enqueue_dequeue.c
@@ -0,0 +1,1228 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+
+#include "test.h"
+
+#include <assert.h>
+#include <inttypes.h>
+#include <signal.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_deque.h>
+#include <rte_errno.h>
+#include <rte_malloc.h>
+#include <rte_random.h>
+
+static const int esize[] = {4, 8, 16, 20};
+#define DEQUE_SIZE 4096
+#define MAX_BULK 32
+#define TEST_DEQUE_FULL_EMPTY_ITER 8
+
+/*
+ * Validate the return value of test cases and print details of the
+ * deque if validation fails
+ *
+ * @param exp
+ *   Expression to validate return value.
+ * @param d
+ *   A pointer to the deque structure.
+ * @param errst
+ *   Statement executed when the check fails (e.g. goto fail).
+ */
+#define TEST_DEQUE_VERIFY(exp, d, errst) do {				\
+	if (!(exp)) {							\
+		printf("error at %s:%d\tcondition " #exp " failed\n",	\
+			__func__, __LINE__);				\
+		rte_deque_dump(stdout, (d));				\
+		errst;							\
+	}								\
+} while (0)
+
+static int
+test_deque_mem_cmp(void *src, void *dst, unsigned int size)
+{
+	int ret;
+
+	ret = memcmp(src, dst, size);
+	if (ret) {
+		rte_hexdump(stdout, "src", src, size);
+		rte_hexdump(stdout, "dst", dst, size);
+		printf("data after dequeue is not the same\n");
+	}
+
+	return ret;
+}
+
+static int
+test_deque_mem_cmp_rvs(void *src, void *dst,
+		unsigned int count, unsigned int esize)
+{
+	int ret = 0;
+	uint32_t *src32 = ((uint32_t *)src), *dst32 = ((uint32_t *)dst);
+	uint32_t scale = esize / sizeof(uint32_t);
+
+	/* Start at the end of the dst and compare from there.*/
+	dst32 += (count - 1) * scale;
+	for (unsigned int i = 0; i < count; i++) {
+		for (unsigned int j = 0; j < scale; j++) {
+			if (src32[j] != dst32[j]) {
+				ret = -1;
+				break;
+			}
+		}
+		if (ret)
+			break;
+		dst32 -= scale;
+		src32 += scale;
+	}
+	if (ret) {
+		rte_hexdump(stdout, "src", src, count * esize);
+		rte_hexdump(stdout, "dst", dst, count * esize);
+		printf("data after dequeue is not the same\n");
+	}
+
+	return ret;
+}
+
+static inline void *
+test_deque_calloc(unsigned int dsize, int esize)
+{
+	void *p;
+
+	p = rte_zmalloc(NULL, dsize * esize, RTE_CACHE_LINE_SIZE);
+	if (p == NULL)
+		printf("Failed to allocate memory\n");
+
+	return p;
+}
+
+static void
+test_deque_mem_init(void *obj, unsigned int count, int esize)
+{
+	for (unsigned int i = 0; i < (count * esize / sizeof(uint32_t)); i++)
+		((uint32_t *)obj)[i] = i;
+}
+
+static inline void *
+test_deque_inc_ptr(void *obj, int esize, unsigned int n)
+{
+	return (void *)((uint32_t *)obj + (n * esize / sizeof(uint32_t)));
+}
+
+/* Copy to the deque memory */
+static inline void
+test_deque_zc_copy_to_deque(struct rte_deque_zc_data *zcd, const void *src, int esize,
+	unsigned int num)
+{
+	memcpy(zcd->ptr1, src, esize * zcd->n1);
+	if (zcd->n1 != num) {
+		const void *inc_src = (const void *)((const char *)src +
+						(zcd->n1 * esize));
+		memcpy(zcd->ptr2, inc_src, esize * (num - zcd->n1));
+	}
+}
+
+static inline void
+test_deque_zc_copy_to_deque_rev(struct rte_deque_zc_data *zcd, const void *src,
+					int esize, unsigned int num)
+{
+	void *ptr1 = zcd->ptr1;
+	for (unsigned int i = 0; i < zcd->n1; i++) {
+		memcpy(ptr1, src, esize);
+		src = (const void *)((const char *)src + esize);
+		ptr1 = (void *)((char *)ptr1 - esize);
+	}
+	if (zcd->n1 != num) {
+		void *ptr2 = zcd->ptr2;
+		for (unsigned int i = 0; i < (num - zcd->n1); i++) {
+			memcpy(ptr2, src, esize);
+			src = (const void *)((const char *)src + esize);
+			ptr2 = (void *)((char *)ptr2 - esize);
+		}
+	}
+}
+
+/* Copy from the deque memory */
+static inline void
+test_deque_zc_copy_from_deque(struct rte_deque_zc_data *zcd, void *dst, int esize,
+	unsigned int num)
+{
+	memcpy(dst, zcd->ptr1, esize * zcd->n1);
+
+	if (zcd->n1 != num) {
+		dst = test_deque_inc_ptr(dst, esize, zcd->n1);
+		memcpy(dst, zcd->ptr2, esize * (num - zcd->n1));
+	}
+}
+
+static inline void
+test_deque_zc_copy_from_deque_rev(struct rte_deque_zc_data *zcd, void *dst, int esize,
+	unsigned int num)
+{
+	void *ptr1 = zcd->ptr1;
+	for (unsigned int i = 0; i < zcd->n1; i++) {
+		memcpy(dst, ptr1, esize);
+		dst = (void *)((char *)dst + esize);
+		ptr1 = (void *)((char *)ptr1 - esize);
+	}
+	if (zcd->n1 != num) {
+		void *ptr2 = zcd->ptr2;
+		for (unsigned int i = 0; i < (num - zcd->n1); i++) {
+			memcpy(dst, ptr2, esize);
+			dst = (void *)((char *)dst + esize);
+			ptr2 = (void *)((char *)ptr2 - esize);
+		}
+	}
+}
+
+/* Wrappers around the zero-copy APIs. The wrappers match
+ * the normal enqueue/dequeue API declarations.
+ */
+static unsigned int
+test_deque_head_enqueue_zc_bulk_elem(struct rte_deque *d, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	uint32_t ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_head_enqueue_zc_bulk_elem_start(d, esize, n,
+						&zcd, free_space);
+	if (ret != 0) {
+		/* Copy the data to the deque */
+		test_deque_zc_copy_to_deque(&zcd, obj_table, esize, ret);
+		rte_deque_head_enqueue_zc_elem_finish(d, ret);
+	}
+
+	return ret;
+}
+
+static unsigned int
+test_deque_tail_dequeue_zc_bulk_elem(struct rte_deque *d, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	unsigned int ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_tail_dequeue_zc_bulk_elem_start(d, esize, n,
+				&zcd, available);
+	if (ret != 0) {
+		/* Copy the data from the deque */
+		test_deque_zc_copy_from_deque(&zcd, obj_table, esize, ret);
+		rte_deque_tail_dequeue_zc_elem_finish(d, ret);
+	}
+
+	return ret;
+}
+
+static unsigned int
+test_deque_head_enqueue_zc_burst_elem(struct rte_deque *d, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	uint32_t ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_head_enqueue_zc_burst_elem_start(d, esize, n,
+						&zcd, free_space);
+	if (ret != 0) {
+		/* Copy the data to the deque */
+		test_deque_zc_copy_to_deque(&zcd, obj_table, esize, ret);
+		rte_deque_head_enqueue_zc_elem_finish(d, ret);
+	}
+
+	return ret;
+}
+
+static unsigned int
+test_deque_tail_dequeue_zc_burst_elem(struct rte_deque *d, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	unsigned int ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_tail_dequeue_zc_burst_elem_start(d, esize, n,
+				&zcd, available);
+	if (ret != 0) {
+		/* Copy the data from the deque */
+		test_deque_zc_copy_from_deque(&zcd, obj_table, esize, ret);
+		rte_deque_tail_dequeue_zc_elem_finish(d, ret);
+	}
+	return ret;
+}
+
+static unsigned int
+test_deque_tail_enqueue_zc_bulk_elem(struct rte_deque *d, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	uint32_t ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_tail_enqueue_zc_bulk_elem_start(d, esize, n,
+							&zcd, free_space);
+	if (ret != 0) {
+		/* Copy the data to the deque */
+		test_deque_zc_copy_to_deque_rev(&zcd, obj_table, esize, ret);
+		rte_deque_tail_enqueue_zc_elem_finish(d, ret);
+	}
+
+	return ret;
+}
+
+static unsigned int
+test_deque_head_dequeue_zc_bulk_elem(struct rte_deque *d, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	unsigned int ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_head_dequeue_zc_bulk_elem_start(d, esize, n,
+				&zcd, available);
+	if (ret != 0) {
+		/* Copy the data from the deque */
+		test_deque_zc_copy_from_deque_rev(&zcd, obj_table, esize, ret);
+		rte_deque_head_dequeue_zc_elem_finish(d, ret);
+	}
+	return ret;
+}
+
+static unsigned int
+test_deque_tail_enqueue_zc_burst_elem(struct rte_deque *d,
+	const void *obj_table, unsigned int esize, unsigned int n,
+	unsigned int *free_space)
+{
+	uint32_t ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_tail_enqueue_zc_burst_elem_start(d, esize, n,
+							&zcd, free_space);
+	if (ret != 0) {
+		/* Copy the data to the deque */
+		test_deque_zc_copy_to_deque_rev(&zcd, obj_table, esize, ret);
+		rte_deque_tail_enqueue_zc_elem_finish(d, ret);
+	}
+
+	return ret;
+}
+
+static unsigned int
+test_deque_head_dequeue_zc_burst_elem(struct rte_deque *d, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	unsigned int ret;
+	struct rte_deque_zc_data zcd;
+
+	ret = rte_deque_head_dequeue_zc_burst_elem_start(d, esize, n,
+				&zcd, available);
+	if (ret != 0) {
+		/* Copy the data from the deque */
+		test_deque_zc_copy_from_deque_rev(&zcd, obj_table, esize, ret);
+		rte_deque_head_dequeue_zc_elem_finish(d, ret);
+	}
+	return ret;
+}
+
+#define TEST_DEQUE_ELEM_BULK 8
+#define TEST_DEQUE_ELEM_BURST 16
+static const struct {
+	const char *desc;
+	const int api_flags;
+	unsigned int (*enq)(struct rte_deque *d, const void *obj_table,
+		unsigned int esize, unsigned int n,
+		unsigned int *free_space);
+	unsigned int (*deq)(struct rte_deque *d, void *obj_table,
+			unsigned int esize, unsigned int n,
+			unsigned int *available);
+	/* This dequeues in the opposite direction of enqueue.
+	 * It is used for testing stack behavior.
+	 */
+	unsigned int (*deq_opp)(struct rte_deque *d, void *obj_table,
+			unsigned int esize, unsigned int n,
+			unsigned int *available);
+} test_enqdeq_impl[] = {
+	{
+		.desc = "Deque forward direction bulkmode",
+		.api_flags = TEST_DEQUE_ELEM_BULK,
+		.enq = rte_deque_head_enqueue_bulk_elem,
+		.deq = rte_deque_tail_dequeue_bulk_elem,
+		.deq_opp = rte_deque_head_dequeue_bulk_elem,
+	},
+	{
+		.desc = "Deque forward direction burstmode",
+		.api_flags = TEST_DEQUE_ELEM_BURST,
+		.enq = rte_deque_head_enqueue_burst_elem,
+		.deq = rte_deque_tail_dequeue_burst_elem,
+		.deq_opp = rte_deque_head_dequeue_burst_elem,
+	},
+	{
+		.desc = "Deque reverse direction bulkmode",
+		.api_flags = TEST_DEQUE_ELEM_BULK,
+		.enq = rte_deque_tail_enqueue_bulk_elem,
+		.deq = rte_deque_head_dequeue_bulk_elem,
+		.deq_opp = rte_deque_tail_dequeue_bulk_elem,
+	},
+	{
+		.desc = "Deque reverse direction burstmode",
+		.api_flags = TEST_DEQUE_ELEM_BURST,
+		.enq = rte_deque_tail_enqueue_burst_elem,
+		.deq = rte_deque_head_dequeue_burst_elem,
+		.deq_opp = rte_deque_tail_dequeue_burst_elem,
+	},
+	{
+		.desc = "Deque forward direction bulkmode zero copy",
+		.api_flags = TEST_DEQUE_ELEM_BULK,
+		.enq = test_deque_head_enqueue_zc_bulk_elem,
+		.deq = test_deque_tail_dequeue_zc_bulk_elem,
+		.deq_opp = test_deque_head_dequeue_zc_bulk_elem,
+	},
+	{
+		.desc = "Deque forward direction burstmode zero copy",
+		.api_flags = TEST_DEQUE_ELEM_BURST,
+		.enq = test_deque_head_enqueue_zc_burst_elem,
+		.deq = test_deque_tail_dequeue_zc_burst_elem,
+		.deq_opp = test_deque_head_dequeue_zc_burst_elem,
+	},
+	{
+		.desc = "Deque reverse direction bulkmode zero copy",
+		.api_flags = TEST_DEQUE_ELEM_BULK,
+		.enq = test_deque_tail_enqueue_zc_bulk_elem,
+		.deq = test_deque_head_dequeue_zc_bulk_elem,
+		.deq_opp = test_deque_tail_dequeue_zc_bulk_elem,
+	},
+	{
+		.desc = "Deque reverse direction burstmode zero copy",
+		.api_flags = TEST_DEQUE_ELEM_BURST,
+		.enq = test_deque_tail_enqueue_zc_burst_elem,
+		.deq = test_deque_head_dequeue_zc_burst_elem,
+		.deq_opp = test_deque_tail_dequeue_zc_burst_elem,
+	},
+};
+
+/*
+ * Burst and bulk operations in regular mode and zero copy mode.
+ * Random number of elements are enqueued and dequeued.
+ */
+static int
+test_deque_burst_bulk_tests1(unsigned int test_idx)
+{
+	struct rte_deque *d;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	unsigned int ret;
+	unsigned int i, j, temp_sz, free_space, available;
+	const unsigned int dsz = DEQUE_SIZE - 1;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("\n%s, esize: %d\n", test_enqdeq_impl[test_idx].desc,
+			esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "Over the boundary deque.";
+		d = rte_deque_create(DEQUE_NAME, esize[i], DEQUE_SIZE, 0, 0);
+		if (d == NULL)
+			goto fail;
+
+		/* alloc dummy object pointers */
+		src = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+
+		test_deque_mem_init(src, DEQUE_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		printf("Random full/empty test\n");
+
+		for (j = 0; j != TEST_DEQUE_FULL_EMPTY_ITER; j++) {
+			/* random shift in the deque */
+			unsigned int rand = RTE_MAX(rte_rand() % DEQUE_SIZE, 1UL);
+			printf("%s: iteration %u, random shift: %u;\n",
+				__func__, j, rand);
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+							rand, &free_space);
+			TEST_DEQUE_VERIFY(ret == rand, d, goto fail);
+
+			ret = test_enqdeq_impl[test_idx].deq(d, cur_dst, esize[i],
+							rand, &available);
+			TEST_DEQUE_VERIFY(ret == rand, d, goto fail);
+
+			/* fill the deque */
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src,
+							esize[i], dsz,
+							&free_space);
+			TEST_DEQUE_VERIFY(ret == (unsigned int)dsz, d, goto fail);
+
+			TEST_DEQUE_VERIFY(rte_deque_free_count(d) == 0, d,
+					goto fail);
+			TEST_DEQUE_VERIFY(dsz == rte_deque_count(d), d,
+					goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_full(d), d, goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_empty(d) == 0, d, goto fail);
+
+			/* empty the deque */
+			ret = test_enqdeq_impl[test_idx].deq(d, cur_dst,
+							esize[i], dsz,
+							&available);
+			TEST_DEQUE_VERIFY(ret == (unsigned int)dsz, d, goto fail);
+
+			TEST_DEQUE_VERIFY(dsz == rte_deque_free_count(d), d,
+					goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_count(d) == 0, d, goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_full(d) == 0, d, goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_empty(d), d, goto fail);
+
+			/* check data */
+			temp_sz = dsz * esize[i];
+			TEST_DEQUE_VERIFY(test_deque_mem_cmp(src, dst, temp_sz) == 0,
+							d, goto fail);
+		}
+
+		/* Free memory before test completed */
+		rte_deque_free(d);
+		rte_free(src);
+		rte_free(dst);
+		d = NULL;
+		src = NULL;
+		dst = NULL;
+	}
+
+	return 0;
+fail:
+	rte_deque_free(d);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
+}
+
+/*
+ * Burst and bulk operations with regular & zero copy mode.
+ * Sequence of simple enqueues/dequeues and validate the enqueued and
+ * dequeued data.
+ */
+static int
+test_deque_burst_bulk_tests2(unsigned int test_idx)
+{
+	struct rte_deque *d;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	int ret;
+	unsigned int i, free_space, available;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("\n%s, esize: %d\n", test_enqdeq_impl[test_idx].desc,
+			esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "Multiple enqs, deqs.";
+		d = rte_deque_create(DEQUE_NAME, esize[i], DEQUE_SIZE, 0, 0);
+		if (d == NULL)
+			goto fail;
+
+		/* alloc dummy object pointers */
+		src = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+
+		test_deque_mem_init(src, DEQUE_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		printf("enqueue 1 obj\n");
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						1, &free_space);
+		TEST_DEQUE_VERIFY(ret == 1, d, goto fail);
+		cur_src = test_deque_inc_ptr(cur_src, esize[i], 1);
+
+		printf("enqueue 2 objs\n");
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						2, &free_space);
+		TEST_DEQUE_VERIFY(ret == 2, d, goto fail);
+		cur_src = test_deque_inc_ptr(cur_src, esize[i], 2);
+
+		printf("enqueue MAX_BULK objs\n");
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						MAX_BULK, &free_space);
+		TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+
+		printf("dequeue 1 obj\n");
+		ret = test_enqdeq_impl[test_idx].deq(d, cur_dst, esize[i],
+						1, &available);
+		TEST_DEQUE_VERIFY(ret == 1, d, goto fail);
+		cur_dst = test_deque_inc_ptr(cur_dst, esize[i], 1);
+
+		printf("dequeue 2 objs\n");
+		ret = test_enqdeq_impl[test_idx].deq(d, cur_dst, esize[i],
+						2, &available);
+		TEST_DEQUE_VERIFY(ret == 2, d, goto fail);
+		cur_dst = test_deque_inc_ptr(cur_dst, esize[i], 2);
+
+		printf("dequeue MAX_BULK objs\n");
+		ret = test_enqdeq_impl[test_idx].deq(d, cur_dst, esize[i],
+						MAX_BULK, &available);
+		TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+		cur_dst = test_deque_inc_ptr(cur_dst, esize[i], MAX_BULK);
+
+		/* check data */
+		TEST_DEQUE_VERIFY(test_deque_mem_cmp(src, dst,
+				RTE_PTR_DIFF(cur_dst, dst)) == 0,
+				d, goto fail);
+
+		/* Free memory before test completed */
+		rte_deque_free(d);
+		rte_free(src);
+		rte_free(dst);
+		d = NULL;
+		src = NULL;
+		dst = NULL;
+	}
+
+	return 0;
+fail:
+	rte_deque_free(d);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
+}
+
+/*
+ * Burst and bulk operations with normal mode & zero copy mode.
+ * Enqueue and dequeue to cover the entire deque length.
+ */
+static int
+test_deque_burst_bulk_tests3(unsigned int test_idx)
+{
+	struct rte_deque *d;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	int ret;
+	unsigned int i, j, free_space, available;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("\n%s, esize: %d\n", test_enqdeq_impl[test_idx].desc,
+			esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "Full deque length test";
+		d = rte_deque_create(DEQUE_NAME, esize[i], DEQUE_SIZE, 0, 0);
+		if (d == NULL)
+			goto fail;
+
+		/* alloc dummy object pointers */
+		src = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+		test_deque_mem_init(src, DEQUE_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		printf("fill and empty the deque\n");
+		for (j = 0; j < DEQUE_SIZE / MAX_BULK; j++) {
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src,
+							esize[i], MAX_BULK,
+							&free_space);
+			TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+			cur_src = test_deque_inc_ptr(cur_src, esize[i],
+								MAX_BULK);
+
+			ret = test_enqdeq_impl[test_idx].deq(d, cur_dst,
+							esize[i], MAX_BULK,
+							&available);
+			TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+			cur_dst = test_deque_inc_ptr(cur_dst, esize[i],
+								MAX_BULK);
+		}
+
+		/* check data */
+		TEST_DEQUE_VERIFY(test_deque_mem_cmp(src, dst,
+					RTE_PTR_DIFF(cur_dst, dst)) == 0,
+					d, goto fail);
+
+		/* Free memory before test completed */
+		rte_deque_free(d);
+		rte_free(src);
+		rte_free(dst);
+		d = NULL;
+		src = NULL;
+		dst = NULL;
+	}
+
+	return 0;
+fail:
+	rte_deque_free(d);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
+}
+
+/*
+ * Burst and bulk operations with normal mode & zero copy mode.
+ * Enqueue till the deque is full and dequeue till the deque becomes empty.
+ */
+static int
+test_deque_burst_bulk_tests4(unsigned int test_idx)
+{
+	struct rte_deque *d;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	int ret;
+	unsigned int i, j, available, free_space;
+	unsigned int num_elems, api_type;
+	api_type = test_enqdeq_impl[test_idx].api_flags;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("\n%s, esize: %d\n", test_enqdeq_impl[test_idx].desc,
+			esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "Full deque length test";
+		d = rte_deque_create(DEQUE_NAME, esize[i], DEQUE_SIZE, 0, 0);
+		if (d == NULL)
+			goto fail;
+
+		/* alloc dummy object pointers */
+		src = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+		test_deque_mem_init(src, DEQUE_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		printf("Test enqueue without enough memory space\n");
+		for (j = 0; j < (DEQUE_SIZE/MAX_BULK - 1); j++) {
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src,
+							esize[i], MAX_BULK,
+							&free_space);
+			TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+			cur_src = test_deque_inc_ptr(cur_src, esize[i],
+								MAX_BULK);
+		}
+
+		printf("Enqueue 2 objects, free entries = MAX_BULK - 2\n");
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						2, &free_space);
+		TEST_DEQUE_VERIFY(ret == 2, d, goto fail);
+		cur_src = test_deque_inc_ptr(cur_src, esize[i], 2);
+
+		printf("Enqueue the remaining entries = MAX_BULK - 3\n");
+		/* Bulk APIs enqueue exact number of elements */
+		if ((api_type & TEST_DEQUE_ELEM_BULK))
+			num_elems = MAX_BULK - 3;
+		else
+			num_elems = MAX_BULK;
+		/* Always one free entry left */
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						num_elems, &free_space);
+		TEST_DEQUE_VERIFY(ret == (MAX_BULK - 3), d, goto fail);
+		cur_src = test_deque_inc_ptr(cur_src, esize[i],
+							(MAX_BULK - 3));
+
+		printf("Test if deque is full\n");
+		TEST_DEQUE_VERIFY(rte_deque_full(d) == 1, d, goto fail);
+
+		printf("Test enqueue for a full entry\n");
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						1, &free_space);
+		TEST_DEQUE_VERIFY(ret == 0, d, goto fail);
+
+		printf("Test dequeue without enough objects\n");
+		for (j = 0; j < DEQUE_SIZE / MAX_BULK - 1; j++) {
+			ret = test_enqdeq_impl[test_idx].deq(d, cur_dst, esize[i],
+							MAX_BULK, &available);
+			TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+			cur_dst = test_deque_inc_ptr(cur_dst, esize[i],
+						MAX_BULK);
+		}
+
+		/* Available memory space for the exact MAX_BULK entries */
+		ret = test_enqdeq_impl[test_idx].deq(d, cur_dst, esize[i],
+						2, &available);
+		TEST_DEQUE_VERIFY(ret == 2, d, goto fail);
+		cur_dst = test_deque_inc_ptr(cur_dst, esize[i], 2);
+
+		/* Bulk APIs enqueue exact number of elements */
+		if ((api_type & TEST_DEQUE_ELEM_BULK))
+			num_elems = MAX_BULK - 3;
+		else
+			num_elems = MAX_BULK;
+		ret = test_enqdeq_impl[test_idx].deq(d, cur_dst, esize[i],
+						num_elems, &available);
+		TEST_DEQUE_VERIFY(ret == MAX_BULK - 3, d, goto fail);
+		cur_dst = test_deque_inc_ptr(cur_dst, esize[i], MAX_BULK - 3);
+
+		printf("Test if deque is empty\n");
+		/* Check if deque is empty */
+		TEST_DEQUE_VERIFY(rte_deque_empty(d) == 1, d, goto fail);
+
+		/* check data */
+		TEST_DEQUE_VERIFY(test_deque_mem_cmp(src, dst,
+					RTE_PTR_DIFF(cur_dst, dst)) == 0,
+					d, goto fail);
+
+		/* Free memory before test completed */
+		rte_deque_free(d);
+		rte_free(src);
+		rte_free(dst);
+		d = NULL;
+		src = NULL;
+		dst = NULL;
+	}
+
+	return 0;
+fail:
+	rte_deque_free(d);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
+}
+
+/*
+ * Basic test cases with exact size deque.
+ */
+static int
+test_deque_with_exact_size(void)
+{
+	struct rte_deque *std_d = NULL, *exact_sz_d = NULL;
+	void *src_orig = NULL, *dst_orig = NULL;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	const unsigned int deque_sz = 16;
+	unsigned int i, j, free_space, available;
+	int ret = -1;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("\nTest exact size deque. Esize: %d\n", esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "std sized deque";
+		std_d = rte_deque_create(DEQUE_NAME, esize[i], deque_sz, 0, 0);
+
+		if (std_d == NULL) {
+			printf("%s: error, can't create std deque\n", __func__);
+			goto test_fail;
+		}
+		static const char *DEQUE_NAME2 = "Exact sized deque";
+		exact_sz_d = rte_deque_create(DEQUE_NAME2, esize[i], deque_sz,
+					0, RTE_DEQUE_F_EXACT_SZ);
+		if (exact_sz_d == NULL) {
+			printf("%s: error, can't create exact size deque\n",
+					__func__);
+			goto test_fail;
+		}
+
+		/* alloc object pointers. Allocate one extra object
+		 * and create an unaligned address.
+		 */
+		src_orig = test_deque_calloc(17, esize[i]);
+		if (src_orig == NULL)
+			goto test_fail;
+		test_deque_mem_init(src_orig, 17, esize[i]);
+		src = (void *)((uintptr_t)src_orig + 1);
+		cur_src = src;
+
+		dst_orig = test_deque_calloc(17, esize[i]);
+		if (dst_orig == NULL)
+			goto test_fail;
+		dst = (void *)((uintptr_t)dst_orig + 1);
+		cur_dst = dst;
+
+		/*
+		 * Check that the exact size deque is at least as big as the
+		 * standard deque
+		 */
+		TEST_DEQUE_VERIFY(rte_deque_get_size(std_d) <=
+				rte_deque_get_size(exact_sz_d),
+				std_d, goto test_fail);
+
+		/*
+		 * check that the exact_sz_deque can hold one more element
+		 * than the standard deque. (16 vs 15 elements)
+		 */
+		for (j = 0; j < deque_sz - 1; j++) {
+			ret = test_enqdeq_impl[0].enq(std_d, cur_src, esize[i],
+						1, &free_space);
+			TEST_DEQUE_VERIFY(ret == 1, std_d, goto test_fail);
+			ret = test_enqdeq_impl[0].enq(exact_sz_d, cur_src,
+						esize[i], 1, &free_space);
+			TEST_DEQUE_VERIFY(ret == 1, exact_sz_d, goto test_fail);
+			cur_src = test_deque_inc_ptr(cur_src, esize[i], 1);
+		}
+		ret = test_enqdeq_impl[0].enq(std_d, cur_src, esize[i], 1,
+					&free_space);
+		TEST_DEQUE_VERIFY(ret == 0, std_d, goto test_fail);
+		ret = test_enqdeq_impl[0].enq(exact_sz_d, cur_src, esize[i], 1,
+					&free_space);
+		TEST_DEQUE_VERIFY(ret == 1, exact_sz_d, goto test_fail);
+
+		/* check that dequeue returns the expected number of elements */
+		ret = test_enqdeq_impl[0].deq(exact_sz_d, cur_dst, esize[i],
+					deque_sz, &available);
+		TEST_DEQUE_VERIFY(ret == (int)deque_sz, exact_sz_d,
+				goto test_fail);
+		cur_dst = test_deque_inc_ptr(cur_dst, esize[i], deque_sz);
+
+		/* check that the capacity function returns expected value */
+		TEST_DEQUE_VERIFY(rte_deque_get_capacity(exact_sz_d) == deque_sz,
+				exact_sz_d, goto test_fail);
+
+		/* check data */
+		TEST_DEQUE_VERIFY(test_deque_mem_cmp(src, dst,
+					RTE_PTR_DIFF(cur_dst, dst)) == 0,
+					exact_sz_d, goto test_fail);
+
+		rte_free(src_orig);
+		rte_free(dst_orig);
+		rte_deque_free(std_d);
+		rte_deque_free(exact_sz_d);
+		src_orig = NULL;
+		dst_orig = NULL;
+		std_d = NULL;
+		exact_sz_d = NULL;
+	}
+
+	return 0;
+
+test_fail:
+	rte_free(src_orig);
+	rte_free(dst_orig);
+	rte_deque_free(std_d);
+	rte_deque_free(exact_sz_d);
+	return -1;
+}
+
+/*
+ * Burst and bulk operations in regular mode and zero copy mode.
+ * Random number of elements are enqueued and dequeued first.
+ * Which would bring both head and tail to somewhere in the middle of
+ * the deque. From that point, stack behavior of the deque is tested.
+ */
+static int
+test_deque_stack_random_tests1(unsigned int test_idx)
+{
+	struct rte_deque *d;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	unsigned int ret;
+	unsigned int i, j, free_space, available;
+	const unsigned int dsz = DEQUE_SIZE - 1;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("Stackmode tests1.\n");
+		printf("\n%s, esize: %d\n", test_enqdeq_impl[test_idx].desc,
+			esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "Over the boundary deque.";
+		d = rte_deque_create(DEQUE_NAME, esize[i], DEQUE_SIZE, 0, 0);
+		if (d == NULL)
+			goto fail;
+
+		/* alloc dummy object pointers */
+		src = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+
+		test_deque_mem_init(src, DEQUE_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		printf("Random starting point stack test\n");
+
+		for (j = 0; j != TEST_DEQUE_FULL_EMPTY_ITER; j++) {
+			/* random shift in the deque */
+			unsigned int rand = RTE_MAX(rte_rand() % DEQUE_SIZE, 1UL);
+			printf("%s: iteration %u, random shift: %u;\n",
+				__func__, j, rand);
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src,
+							esize[i], rand,
+							&free_space);
+			TEST_DEQUE_VERIFY(ret != 0, d, goto fail);
+
+			ret = test_enqdeq_impl[test_idx].deq(d, cur_dst,
+							esize[i], rand,
+							&available);
+			TEST_DEQUE_VERIFY(ret == (unsigned int)rand, d,
+					goto fail);
+
+			/* fill the deque */
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+							dsz, &free_space);
+			TEST_DEQUE_VERIFY(ret != 0, d, goto fail);
+
+			TEST_DEQUE_VERIFY(rte_deque_free_count(d) == 0, d,
+					goto fail);
+			TEST_DEQUE_VERIFY(dsz == rte_deque_count(d), d,
+					goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_full(d), d,
+					goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_empty(d) == 0, d,
+					goto fail);
+
+			/* empty the deque */
+			ret = test_enqdeq_impl[test_idx].deq_opp(d, cur_dst,
+								esize[i], dsz,
+								&available);
+			TEST_DEQUE_VERIFY(ret == (unsigned int)dsz, d, goto fail);
+
+			TEST_DEQUE_VERIFY(dsz == rte_deque_free_count(d), d,
+					goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_count(d) == 0, d,
+					goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_full(d) == 0, d,
+					goto fail);
+			TEST_DEQUE_VERIFY(rte_deque_empty(d), d, goto fail);
+
+			/* check data */
+			TEST_DEQUE_VERIFY(test_deque_mem_cmp_rvs(src, dst,
+					dsz, esize[i]) == 0, d, goto fail);
+		}
+
+		/* Free memory before test completed */
+		rte_deque_free(d);
+		rte_free(src);
+		rte_free(dst);
+		d = NULL;
+		src = NULL;
+		dst = NULL;
+	}
+
+	return 0;
+fail:
+	rte_deque_free(d);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
+}
+
+/* Tests both standard mode and zero-copy mode.
+ * Keep enqueuing 1, 2, MAX_BULK elements till the deque is full.
+ * Then dequeue them all and make sure the data comes out in the
+ * reverse of the order in which it was enqueued.
+ */
+static int
+test_deque_stack_random_tests2(unsigned int test_idx)
+{
+	struct rte_deque *d;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	unsigned int ret;
+	unsigned int i, free_space, available;
+	const unsigned int dsz = DEQUE_SIZE - 1;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("Stackmode tests2.\n");
+		printf("\n%s, esize: %d\n", test_enqdeq_impl[test_idx].desc,
+			esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "Multiple enqs, deqs.";
+		d = rte_deque_create(DEQUE_NAME, esize[i], DEQUE_SIZE, 0, 0);
+		if (d == NULL)
+			goto fail;
+
+		/* alloc dummy object pointers */
+		src = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+
+		test_deque_mem_init(src, DEQUE_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+
+		printf("Enqueue objs till the deque is full.\n");
+		unsigned int count = 0;
+		const unsigned int per_iter_count = 1 + 2 + MAX_BULK;
+		while (count + per_iter_count < DEQUE_SIZE - 1) {
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+							1, &free_space);
+			TEST_DEQUE_VERIFY(ret == 1, d, goto fail);
+			cur_src = test_deque_inc_ptr(cur_src, esize[i], 1);
+
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+							2, &free_space);
+			TEST_DEQUE_VERIFY(ret == 2, d, goto fail);
+			cur_src = test_deque_inc_ptr(cur_src, esize[i], 2);
+
+			ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+							MAX_BULK, &free_space);
+			TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+			cur_src = test_deque_inc_ptr(cur_src, esize[i], MAX_BULK);
+			count += per_iter_count;
+		}
+		unsigned int left_over = DEQUE_SIZE - 1 - count;
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						left_over, &free_space);
+		TEST_DEQUE_VERIFY(ret == left_over, d, goto fail);
+		cur_src = test_deque_inc_ptr(cur_src, esize[i], left_over);
+
+		printf("Dequeue all the enqueued objs.\n");
+		count = 0;
+		while (count + per_iter_count < DEQUE_SIZE - 1) {
+			ret = test_enqdeq_impl[test_idx].deq_opp(d, cur_dst,
+							esize[i], 1, &available);
+			TEST_DEQUE_VERIFY(ret == 1, d, goto fail);
+			cur_dst = test_deque_inc_ptr(cur_dst, esize[i], 1);
+
+			ret = test_enqdeq_impl[test_idx].deq_opp(d, cur_dst,
+								esize[i], 2,
+								&available);
+			TEST_DEQUE_VERIFY(ret == 2, d, goto fail);
+			cur_dst = test_deque_inc_ptr(cur_dst, esize[i], 2);
+
+			ret = test_enqdeq_impl[test_idx].deq_opp(d, cur_dst,
+								esize[i],
+								MAX_BULK,
+								&available);
+			TEST_DEQUE_VERIFY(ret == MAX_BULK, d, goto fail);
+			cur_dst = test_deque_inc_ptr(cur_dst, esize[i], MAX_BULK);
+			count += per_iter_count;
+		}
+		left_over = DEQUE_SIZE - 1 - count;
+		ret = test_enqdeq_impl[test_idx].deq_opp(d, cur_dst, esize[i],
+							left_over, &available);
+		TEST_DEQUE_VERIFY(ret == left_over, d, goto fail);
+		cur_dst = test_deque_inc_ptr(cur_dst, esize[i], left_over);
+
+		/* check data */
+		TEST_DEQUE_VERIFY(test_deque_mem_cmp_rvs(src, dst,
+						dsz, esize[i]) == 0, d,
+						goto fail);
+
+		/* Free memory before test completed */
+		rte_deque_free(d);
+		rte_free(src);
+		rte_free(dst);
+		d = NULL;
+		src = NULL;
+		dst = NULL;
+	}
+
+	return 0;
+fail:
+	rte_deque_free(d);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
+}
+
+/*
+ * Tests both normal mode and zero-copy mode.
+ * Fill up the whole deque, and drain the deque.
+ * Make sure the data matches in reverse order.
+ */
+static int
+test_deque_stack_random_tests3(unsigned int test_idx)
+{
+	struct rte_deque *d;
+	void *src = NULL, *cur_src = NULL, *dst = NULL, *cur_dst = NULL;
+	int ret;
+	unsigned int i, available, free_space;
+	const unsigned int dsz = DEQUE_SIZE - 1;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		printf("Stackmode tests3.\n");
+		printf("\n%s, esize: %d\n", test_enqdeq_impl[test_idx].desc,
+			esize[i]);
+
+		/* Create the deque */
+		static const char *DEQUE_NAME = "Full deque length test";
+		d = rte_deque_create(DEQUE_NAME, esize[i], DEQUE_SIZE, 0, 0);
+		if (d == NULL)
+			goto fail;
+
+		/* alloc dummy object pointers */
+		src = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+		test_deque_mem_init(src, DEQUE_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_deque_calloc(DEQUE_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		/* fill the deque */
+		printf("Fill the whole deque using a single enqueue operation.\n");
+		ret = test_enqdeq_impl[test_idx].enq(d, cur_src, esize[i],
+						dsz, &free_space);
+		TEST_DEQUE_VERIFY(ret == (int)dsz, d, goto fail);
+
+		TEST_DEQUE_VERIFY(rte_deque_free_count(d) == 0, d, goto fail);
+		TEST_DEQUE_VERIFY(dsz == rte_deque_count(d), d, goto fail);
+		TEST_DEQUE_VERIFY(rte_deque_full(d), d, goto fail);
+		TEST_DEQUE_VERIFY(rte_deque_empty(d) == 0, d, goto fail);
+
+		/* empty the deque */
+		printf("Empty the whole deque.\n");
+		ret = test_enqdeq_impl[test_idx].deq_opp(d, cur_dst, esize[i],
+							dsz, &available);
+		TEST_DEQUE_VERIFY(ret == (int)dsz, d, goto fail);
+
+		TEST_DEQUE_VERIFY(dsz == rte_deque_free_count(d), d, goto fail);
+		TEST_DEQUE_VERIFY(rte_deque_count(d) == 0, d, goto fail);
+		TEST_DEQUE_VERIFY(rte_deque_full(d) == 0, d, goto fail);
+		TEST_DEQUE_VERIFY(rte_deque_empty(d), d, goto fail);
+
+		/* check data */
+		TEST_DEQUE_VERIFY(test_deque_mem_cmp_rvs(src, dst,
+					dsz, esize[i]) == 0, d, goto fail);
+
+		/* Free memory before test completed */
+		rte_deque_free(d);
+		rte_free(src);
+		rte_free(dst);
+		d = NULL;
+		src = NULL;
+		dst = NULL;
+	}
+
+	return 0;
+fail:
+	rte_deque_free(d);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
+}
+
+static int
+deque_enqueue_dequeue_autotest_fn(void)
+{
+	if (test_deque_with_exact_size() != 0)
+		goto fail;
+	int (*test_fns[])(unsigned int test_fn_idx) = {
+		test_deque_burst_bulk_tests1,
+		test_deque_burst_bulk_tests2,
+		test_deque_burst_bulk_tests3,
+		test_deque_burst_bulk_tests4,
+		test_deque_stack_random_tests1,
+		test_deque_stack_random_tests2,
+		test_deque_stack_random_tests3
+	};
+	for (unsigned int test_impl_idx = 0;
+		test_impl_idx < RTE_DIM(test_enqdeq_impl); test_impl_idx++) {
+		for (unsigned int test_fn_idx = 0;
+			test_fn_idx < RTE_DIM(test_fns); test_fn_idx++) {
+			if (test_fns[test_fn_idx](test_impl_idx) != 0)
+				goto fail;
+		}
+	}
+	return 0;
+fail:
+	return -1;
+}
+
+REGISTER_FAST_TEST(deque_enqueue_dequeue_autotest, true, true,
+		deque_enqueue_dequeue_autotest_fn);
diff --git a/app/test/test_deque_helper_functions.c b/app/test/test_deque_helper_functions.c
new file mode 100644
index 0000000000..0e47db7fcb
--- /dev/null
+++ b/app/test/test_deque_helper_functions.c
@@ -0,0 +1,169 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Limited
+ */
+
+#include "test.h"
+
+#include <assert.h>
+#include <inttypes.h>
+#include <signal.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_deque.h>
+#include <rte_errno.h>
+#include <rte_malloc.h>
+#include <rte_tailq.h>
+
+static int
+test_deque_get_memsize(void)
+{
+	const ssize_t RTE_DEQUE_SZ = sizeof(struct rte_deque);
+	/* (1) Should return EINVAL when the supplied size of deque is not a
+	 * power of 2.
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_get_memsize_elem(4, 9), -EINVAL,
+					  "Get memsize function failed.");
+
+	/* (2) Should return EINVAL when the supplied size of deque is not a
+	 * multiple of 4.
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_get_memsize_elem(5, 8), -EINVAL,
+					  "Get memsize function failed.");
+
+	/* (3) Requested size of the deque should be less than or equal to
+	 * RTE_DEQUE_SZ_MASK
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_get_memsize_elem(4, RTE_DEQUE_SZ_MASK), -EINVAL,
+					  "Get memsize function failed.");
+
+	/* (4) A deque of count 1, where the element size is 0, should not allocate
+	 * any more memory than necessary to hold the deque structure.
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_get_memsize_elem(0, 1), RTE_DEQUE_SZ,
+					  "Get memsize function failed.");
+
+	/* (5) Make sure the function is calculating the size correctly.
+	 * Size of the deque structure: 128. Size for two elements, each of
+	 * size esize (4): 8. Total: 128 + 8 = 136.
+	 * Cache-aligned size = 192.
+	 */
+	const ssize_t calculated_sz = RTE_ALIGN(RTE_DEQUE_SZ + 8, RTE_CACHE_LINE_SIZE);
+	TEST_ASSERT_EQUAL(rte_deque_get_memsize_elem(4, 2), calculated_sz,
+					  "Get memsize function failed.");
+	return 0;
+}
+
+/* Define a Test macro that will allow us to correctly free all the rte_deque
+ * objects that were created as a part of the test in case of a failure.
+ */
+
+#define TEST_DEQUE_MEMSAFE(exp, msg, stmt) do { \
+	if (!(exp)) { \
+		printf("error at %s:%d\tcondition " #exp " failed. Msg: %s\n",	\
+			__func__, __LINE__, msg); \
+		stmt; \
+	 } \
+} while (0)
+
+static int
+test_deque_init(void)
+{
+	{
+	/* (1) Make sure init fails when the flags are not correctly passed in. */
+	struct rte_deque deque;
+
+	/* Calling init with undefined flags should fail. */
+	TEST_ASSERT_EQUAL(rte_deque_init(&deque, "Deque", 10, 0x8),
+					  -EINVAL, "Init failed.");
+
+	/* Calling init with a count that is not a power of 2,
+	 * without setting the RTE_DEQUE_F_EXACT_SZ flag,
+	 * should fail.
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_init(&deque, "Deque", 10, 0),
+					  -EINVAL, "Init failed.");
+
+	/* Calling init with a count that is not a power of 2
+	 * Should succeed only if the RTE_DEQUE_F_EXACT_SZ flag is set.
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_init(&deque, "Deque", 10, RTE_DEQUE_F_EXACT_SZ),
+					  0, "Init failed.");
+	}
+
+	{
+	/* Make sure all the fields are being correctly set when creating a
+	 * Deque of a size that is not a power of 2.
+	 */
+	struct rte_deque deque;
+	static const char NAME[] = "Deque";
+
+	/* Calling init with a count that is not a power of 2
+	 * But with RTE_DEQUE_F_EXACT_SZ should succeed.
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_init(&deque, NAME, 10, RTE_DEQUE_F_EXACT_SZ),
+					  0, "Init failed.");
+
+	TEST_ASSERT_BUFFERS_ARE_EQUAL(deque.name, NAME, sizeof(NAME), "Init failed.");
+	TEST_ASSERT_EQUAL(deque.flags, RTE_DEQUE_F_EXACT_SZ, "Init failed.");
+	TEST_ASSERT_EQUAL(deque.size, 16, "Init failed.");
+	TEST_ASSERT_EQUAL(deque.mask, 15, "Init failed.");
+	TEST_ASSERT_EQUAL(deque.capacity, 10, "Init failed.");
+	}
+
+	{
+	/* Make sure all the fields are being correctly set when creating a
+	 * Deque of a size that is a power of 2.
+	 */
+	struct rte_deque deque;
+	static const char NAME[] = "Deque";
+
+	/* Calling init with a count that is a power of 2
+	 * and no flags should succeed.
+	 */
+	TEST_ASSERT_EQUAL(rte_deque_init(&deque, NAME, 16, 0), 0, "Init failed.");
+
+	TEST_ASSERT_EQUAL(deque.size, 16, "Init failed.");
+	TEST_ASSERT_EQUAL(deque.mask, 15, "Init failed.");
+	TEST_ASSERT_EQUAL(deque.capacity, 15, "Init failed.");
+	}
+	return 0;
+}
+
+static int
+test_deque_create(void)
+{
+	struct rte_deque *deque;
+	const char *NAME = "Deque";
+	deque = rte_deque_create(NAME, 4, 16, 0, 0);
+
+	/* Make sure the deque creation is successful. */
+	TEST_DEQUE_MEMSAFE(deque != NULL, "Deque creation failed.", goto fail);
+	TEST_DEQUE_MEMSAFE(deque->memzone != NULL, "Deque creation failed.", goto fail);
+	return 0;
+fail:
+	rte_deque_free(deque);
+	return -1;
+}
+
+#undef TEST_DEQUE_MEMSAFE
+
+static struct unit_test_suite deque_helper_functions_testsuite = {
+	.suite_name = "Deque library helper functions test suite",
+	.unit_test_cases = {
+		TEST_CASE(test_deque_get_memsize),
+		TEST_CASE(test_deque_init),
+		TEST_CASE(test_deque_create),
+		TEST_CASES_END(), /**< NULL terminate unit test array */
+	},
+};
+
+static int
+deque_helper_functions_autotest_fn(void)
+{
+	return unit_test_suite_runner(&deque_helper_functions_testsuite);
+}
+
+REGISTER_FAST_TEST(deque_helper_functions_autotest, true, true,
+		deque_helper_functions_autotest_fn);
-- 
2.25.1


* Re: [PATCH v3 0/2] deque: add multithread unsafe deque library
  2024-05-02 20:19         ` [PATCH v3 0/2] deque: add multithread unsafe deque library Aditya Ambadipudi
  2024-05-02 20:19           ` [PATCH v3 1/2] deque: add multi-thread unsafe double ended queue Aditya Ambadipudi
  2024-05-02 20:19           ` [PATCH v3 2/2] deque: add unit tests for the deque library Aditya Ambadipudi
@ 2024-05-02 20:29           ` Aditya Ambadipudi
  2024-06-27 15:03           ` Thomas Monjalon
  3 siblings, 0 replies; 48+ messages in thread
From: Aditya Ambadipudi @ 2024-05-02 20:29 UTC (permalink / raw)
  To: dev, jackmin, stephen, matan, viacheslavo, roretzla,
	konstantin.ananyev, mb, hofors, probb, alialnu
  Cc: Wathsala Wathawana Vithanage, Dhruv Tripathi, Honnappa Nagarahalli, nd

Hello Ali & Patrick.

Please use v3 of this patch to see if it fixes the "deque" spell-check issue that you folks were helping me & Wathsala with. I have removed the Gerrit Change-Id from this patch.

Thank you,
Aditya Ambadipudi
________________________________
From: Aditya Ambadipudi <aditya.ambadipudi@arm.com>
Sent: Thursday, May 2, 2024 3:19 PM
To: dev@dpdk.org <dev@dpdk.org>; jackmin@nvidia.com <jackmin@nvidia.com>; stephen@networkplumber.org <stephen@networkplumber.org>; matan@nvidia.com <matan@nvidia.com>; viacheslavo@nvidia.com <viacheslavo@nvidia.com>; roretzla@linux.microsoft.com <roretzla@linux.microsoft.com>; konstantin.ananyev@huawei.com <konstantin.ananyev@huawei.com>; mb@smartsharesystems.com <mb@smartsharesystems.com>; hofors@lysator.liu.se <hofors@lysator.liu.se>; probb@iol.unh.edu <probb@iol.unh.edu>; alialnu@nvidia.com <alialnu@nvidia.com>
Cc: Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>; Dhruv Tripathi <Dhruv.Tripathi@arm.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; Aditya Ambadipudi <Aditya.Ambadipudi@arm.com>; Aditya Ambadipudi <Aditya.Ambadipudi@arm.com>
Subject: [PATCH v3 0/2] deque: add multithread unsafe deque library

As previously discussed in the mailing list [1] we are sending out this
patch that provides the implementation and unit test cases for the
RTE_DEQUE library. This includes functions for creating a RTE_DEQUE
object. Allocating memory to it. Deleting that object and free'ing the
memory associated with it. Enqueue/Dequeue functions. Functions for
zero-copy API.
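The cover letter above describes enqueue/dequeue at both ends of the queue. As a rough illustration of the double-ended semantics the unit tests exercise (FIFO across opposite ends, LIFO at one end), here is a minimal single-thread deque model. It is a self-contained toy under assumed names (`toy_*`); it is not the rte_deque implementation or its API.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a fixed-size, single-thread deque backed by a power-of-2
 * array of pointers. head and tail are free-running 32-bit indices; only
 * the low log2(size) bits address the array. One slot is kept free so
 * that full and empty states stay distinguishable. */
#define TOY_DEQUE_SIZE 8u	/* must be a power of 2 */
#define TOY_DEQUE_MASK (TOY_DEQUE_SIZE - 1)

struct toy_deque {
	uint32_t head;	/* objects are pushed/popped here at the head end */
	uint32_t tail;	/* objects are popped here at the tail end */
	void *objs[TOY_DEQUE_SIZE];
};

static void toy_deque_init(struct toy_deque *d)
{
	d->head = 0;
	d->tail = 0;
}

static uint32_t toy_deque_count(const struct toy_deque *d)
{
	return d->head - d->tail;	/* wraparound-safe unsigned difference */
}

static int toy_deque_enq_head(struct toy_deque *d, void *obj)
{
	if (toy_deque_count(d) >= TOY_DEQUE_SIZE - 1)
		return -1;	/* full: usable capacity is size - 1 */
	d->objs[d->head & TOY_DEQUE_MASK] = obj;
	d->head++;
	return 0;
}

static int toy_deque_deq_head(struct toy_deque *d, void **obj)
{
	if (toy_deque_count(d) == 0)
		return -1;	/* empty */
	d->head--;
	*obj = d->objs[d->head & TOY_DEQUE_MASK];
	return 0;
}

static int toy_deque_deq_tail(struct toy_deque *d, void **obj)
{
	if (toy_deque_count(d) == 0)
		return -1;	/* empty */
	*obj = d->objs[d->tail & TOY_DEQUE_MASK];
	d->tail++;
	return 0;
}
```

Enqueuing at the head and dequeuing at the tail yields FIFO order; enqueuing and dequeuing at the same end yields the LIFO ("stack mode") order that the opposite-end (`deq_opp`) paths in the unit tests verify.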

Aditya Ambadipudi (1):
  deque: add unit tests for the deque library

Honnappa Nagarahalli (1):
  deque: add multi-thread unsafe double ended queue

 .mailmap                               |    1 +
 app/test/meson.build                   |    2 +
 app/test/test_deque_enqueue_dequeue.c  | 1228 ++++++++++++++++++++++++
 app/test/test_deque_helper_functions.c |  169 ++++
 devtools/build-dict.sh                 |    1 +
 lib/deque/meson.build                  |   11 +
 lib/deque/rte_deque.c                  |  193 ++++
 lib/deque/rte_deque.h                  |  533 ++++++++++
 lib/deque/rte_deque_core.h             |   81 ++
 lib/deque/rte_deque_pvt.h              |  538 +++++++++++
 lib/deque/rte_deque_zc.h               |  430 +++++++++
 lib/deque/version.map                  |   14 +
 lib/meson.build                        |    2 +
 13 files changed, 3203 insertions(+)
 create mode 100644 app/test/test_deque_enqueue_dequeue.c
 create mode 100644 app/test/test_deque_helper_functions.c
 create mode 100644 lib/deque/meson.build
 create mode 100644 lib/deque/rte_deque.c
 create mode 100644 lib/deque/rte_deque.h
 create mode 100644 lib/deque/rte_deque_core.h
 create mode 100644 lib/deque/rte_deque_pvt.h
 create mode 100644 lib/deque/rte_deque_zc.h
 create mode 100644 lib/deque/version.map

--
2.25.1


* Re: [PATCH v3 0/2] deque: add multithread unsafe deque library
  2024-05-02 20:19         ` [PATCH v3 0/2] deque: add multithread unsafe deque library Aditya Ambadipudi
                             ` (2 preceding siblings ...)
  2024-05-02 20:29           ` [PATCH v3 0/2] deque: add multithread unsafe " Aditya Ambadipudi
@ 2024-06-27 15:03           ` Thomas Monjalon
  2024-06-28 20:05             ` Wathsala Wathawana Vithanage
  3 siblings, 1 reply; 48+ messages in thread
From: Thomas Monjalon @ 2024-06-27 15:03 UTC (permalink / raw)
  To: honnappa.nagarahalli, Aditya Ambadipudi
  Cc: dev, jackmin, stephen, matan, viacheslavo, roretzla,
	konstantin.ananyev, mb, hofors, probb, alialnu,
	wathsala.vithanage, dhruv.tripathi, nd, venamb01

02/05/2024 22:19, Aditya Ambadipudi:
> As previously discussed in the mailing list [1] we are sending out this
> patch that provides the implementation and unit test cases for the
> RTE_DEQUE library. This includes functions for creating a RTE_DEQUE 
> object. Allocating memory to it. Deleting that object and free'ing the
> memory associated with it. Enqueue/Dequeue functions. Functions for 
> zero-copy API.
> 
> Aditya Ambadipudi (1):
>   deque: add unit tests for the deque library
> 
> Honnappa Nagarahalli (1):
>   deque: add multi-thread unsafe double ended queue

There were many comments on previous versions,
and no ack on the v3, so I'm not sure all comments are addressed.
We probably need a new round of reviews on this new library.

Also, in order to show its benefits, would it be a good idea
to replace some existing code with calls to this lib,
inside this patch series?



* RE: [PATCH v3 0/2] deque: add multithread unsafe deque library
  2024-06-27 15:03           ` Thomas Monjalon
@ 2024-06-28 20:05             ` Wathsala Wathawana Vithanage
  0 siblings, 0 replies; 48+ messages in thread
From: Wathsala Wathawana Vithanage @ 2024-06-28 20:05 UTC (permalink / raw)
  To: thomas, Honnappa Nagarahalli, Aditya Ambadipudi
  Cc: dev, jackmin, stephen, matan, viacheslavo, roretzla,
	konstantin.ananyev, mb, hofors, probb, alialnu, Dhruv Tripathi,
	nd, Aditya Ambadipudi, nd

Hi Thomas,

Aditya, the original author of this patch, is no longer at Arm.
One of my colleagues will take over this patch, so we will need some time to address these comments.

Thank you

> There were many comments on previous versions, and no ack on the v3, so I'm
> not sure all comments are addressed.
> We probably need a new round of reviews on this new library.
> 
> Also, in order to show its benefits, would it be a good idea to replace some
> existing code with calls to this lib, inside this patch series?
> 



end of thread, other threads:[~2024-06-28 20:05 UTC | newest]

Thread overview: 48+ messages
2023-08-21  6:04 [RFC] lib/st_ring: add single thread ring Honnappa Nagarahalli
2023-08-21  7:37 ` Morten Brørup
2023-08-22  5:47   ` Honnappa Nagarahalli
2023-08-24  8:05     ` Morten Brørup
2023-08-24 10:52       ` Mattias Rönnblom
2023-08-24 11:22         ` Morten Brørup
2023-08-26 23:34           ` Honnappa Nagarahalli
2023-08-21 21:14 ` Mattias Rönnblom
2023-08-22  5:43   ` Honnappa Nagarahalli
2023-08-22  8:04     ` Mattias Rönnblom
2023-08-22 16:28       ` Honnappa Nagarahalli
2023-09-04 10:13 ` Konstantin Ananyev
2023-09-04 18:10   ` Honnappa Nagarahalli
2023-09-05  8:19     ` Konstantin Ananyev
2024-04-01  1:37 ` [PATCH v1 0/2] deque: add multithread unsafe deque library Aditya Ambadipudi
2024-04-01  1:37   ` [PATCH v1 1/2] deque: add multi-thread unsafe double ended queue Aditya Ambadipudi
2024-04-06  9:35     ` Morten Brørup
2024-04-24 13:42     ` [PATCH v2 0/2] deque: add multithread unsafe deque library Aditya Ambadipudi
2024-04-24 13:42       ` [PATCH v2 1/2] deque: add multi-thread unsafe double ended queue Aditya Ambadipudi
2024-04-24 15:16         ` Morten Brørup
2024-04-24 17:21           ` Patrick Robb
2024-04-25  7:43             ` Ali Alnubani
2024-04-24 23:28         ` Mattias Rönnblom
2024-05-02 20:19         ` [PATCH v3 0/2] deque: add multithread unsafe deque library Aditya Ambadipudi
2024-05-02 20:19           ` [PATCH v3 1/2] deque: add multi-thread unsafe double ended queue Aditya Ambadipudi
2024-05-02 20:19           ` [PATCH v3 2/2] deque: add unit tests for the deque library Aditya Ambadipudi
2024-05-02 20:29           ` [PATCH v3 0/2] deque: add multithread unsafe " Aditya Ambadipudi
2024-06-27 15:03           ` Thomas Monjalon
2024-06-28 20:05             ` Wathsala Wathawana Vithanage
2024-04-24 13:42       ` [PATCH v2 2/2] deque: add unit tests for the " Aditya Ambadipudi
2024-04-01  1:37   ` [PATCH v1 " Aditya Ambadipudi
2024-04-01 14:05   ` [PATCH v1 0/2] deque: add multithread unsafe " Stephen Hemminger
2024-04-01 22:28     ` Aditya Ambadipudi
2024-04-02  0:05       ` Tyler Retzlaff
2024-04-02  0:47       ` Stephen Hemminger
2024-04-02  1:35         ` Honnappa Nagarahalli
2024-04-02  2:00           ` Stephen Hemminger
2024-04-02  2:14             ` Honnappa Nagarahalli
2024-04-02  2:53               ` Stephen Hemminger
     [not found]                 ` <PAVPR08MB9185DC373708CBD16A38EFA8EF3E2@PAVPR08MB9185.eurprd08.prod.outlook.com>
2024-04-02  4:20                   ` Tyler Retzlaff
2024-04-02 23:44                     ` Stephen Hemminger
2024-04-03  0:12                       ` Honnappa Nagarahalli
2024-04-03 23:52                         ` Variable name issues with codespell Stephen Hemminger
2024-04-02  4:20                 ` [PATCH v1 0/2] deque: add multithread unsafe deque library Tyler Retzlaff
2024-04-03 16:50                 ` Honnappa Nagarahalli
2024-04-03 17:46                   ` Tyler Retzlaff
2024-04-02  6:05         ` Mattias Rönnblom
2024-04-02 15:25           ` Stephen Hemminger
