DPDK patches and discussions
* [dpdk-dev] [PATCH v10 0/3] RCU integration with LPM library
                     ` (4 preceding siblings ...)
  2020-07-09 15:42  4% ` [dpdk-dev] [PATCH v9 0/3] RCU integration with LPM library Ruifeng Wang
@ 2020-07-10  2:22  4% ` Ruifeng Wang
  5 siblings, 0 replies; 200+ results
From: Ruifeng Wang @ 2020-07-10  2:22 UTC (permalink / raw)
  Cc: dev, mdr, konstantin.ananyev, honnappa.nagarahalli, nd, Ruifeng Wang

This patchset integrates RCU QSBR support with LPM library.

The resource reclamation implementation was split from the original
series and is already part of the RCU library. This series is reworked
to base the LPM integration on the RCU reclamation APIs.

A new API, rte_lpm_rcu_qsbr_add, is introduced for the application to
register an RCU variable that the LPM library will use. This gives the
user a handle to enable the RCU mechanism integrated in the LPM library.

Functional tests and performance tests are added to cover the
integration with RCU.
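
As a quick usage illustration (not part of this patchset; the QSBR
allocation/init calls below follow the existing rte_rcu_qsbr API and the
config fields match patch 1/3 — treat this as a sketch, not the final API):

#include <rte_lpm.h>
#include <rte_malloc.h>
#include <rte_rcu_qsbr.h>

/* Hypothetical control-path helper: attach an RCU QSBR variable to an
 * already created LPM object so freed tbl8 groups are reclaimed safely. */
static struct rte_rcu_qsbr *
lpm_attach_rcu(struct rte_lpm *lpm, uint32_t max_reader_threads)
{
	struct rte_lpm_rcu_config cfg = {0};
	struct rte_rcu_qsbr *v;
	size_t sz = rte_rcu_qsbr_get_memsize(max_reader_threads);

	v = rte_zmalloc(NULL, sz, RTE_CACHE_LINE_SIZE);
	if (v == NULL || rte_rcu_qsbr_init(v, max_reader_threads) != 0) {
		rte_free(v);
		return NULL;
	}

	cfg.v = v;
	cfg.mode = RTE_LPM_QSBR_MODE_DQ;	/* reclaim via defer queue */
	/* dq_size/reclaim_thd/reclaim_max left as 0 to use the defaults. */

	if (rte_lpm_rcu_qsbr_add(lpm, &cfg, NULL) != 0) {
		rte_free(v);
		return NULL;
	}
	return v;
}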

---
v10:
Added missing Acked-by tags.

v9:
Cleared lpm when allocation failed. (David)

v8:
Fixed ABI issue by adding internal LPM control structure. (David)
Changed to use RFC5737 address in unit test. (Vladimir)

v7:
Fixed typos in document.

v6:
Remove ALLOW_EXPERIMENTAL_API from rte_lpm.c.

v5:
No default value for reclaim_thd. This allows reclamation to be triggered on every call.
Pass LPM pointer instead of tbl8 as argument of reclaim callback free function.
Updated group_idx check at tbl8 allocation.
Use enums instead of defines for different reclamation modes.
RCU QSBR integrated path is inside ALLOW_EXPERIMENTAL_API to avoid ABI change.

v4:
Allow user to configure defer queue: size, reclaim threshold, max entries.
Return defer queue handle so user can manually trigger reclamation.
Add blocking mode support. Defer queue will not be created.


Honnappa Nagarahalli (1):
  test/lpm: add RCU integration performance tests

Ruifeng Wang (2):
  lib/lpm: integrate RCU QSBR
  test/lpm: add LPM RCU integration functional tests

 app/test/test_lpm.c                | 291 ++++++++++++++++-
 app/test/test_lpm_perf.c           | 492 ++++++++++++++++++++++++++++-
 doc/guides/prog_guide/lpm_lib.rst  |  32 ++
 lib/librte_lpm/Makefile            |   2 +-
 lib/librte_lpm/meson.build         |   1 +
 lib/librte_lpm/rte_lpm.c           | 165 ++++++++--
 lib/librte_lpm/rte_lpm.h           |  53 ++++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 8 files changed, 1016 insertions(+), 26 deletions(-)

-- 
2.17.1



* Re: [dpdk-dev] [PATCH v6 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-09 12:36  2% ` [dpdk-dev] [PATCH v6 " Viacheslav Ovsiienko
@ 2020-07-09 23:47  0%   ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2020-07-09 23:47 UTC (permalink / raw)
  To: Viacheslav Ovsiienko, dev
  Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas,
	Andrew Rybchenko

On 7/9/2020 1:36 PM, Viacheslav Ovsiienko wrote:
> Some networks require precise traffic timing management. The ability to
> send (and, generally speaking, receive) packets at a precisely specified
> moment in time makes it possible to support Time Division Multiplexing
> connections with a contemporary general-purpose NIC, without auxiliary
> hardware. For example, support for the O-RAN Fronthaul interface is one
> promising use of precise time management for egress packets.

Is this a HW support, or is the scheduling planned to be done in the driver?

> 
> The main objective of this RFC is to specify how applications

It is no longer an RFC.

> can provide the moment of time at which packet transmission must
> start, and to give a preliminary description of this feature's support
> on the mlx5 PMD side.

I was about to ask this: will there be a PMD counterpart implementation of the
feature? It would be better to have it as part of this set.
What is the plan for the PMD implementation?

> 
> A new dynamic timestamp field is proposed. It provides some timing
> information; the units and time references (initial phase) are not
> explicitly defined but are always kept the same for a given port.
> Some devices allow querying rte_eth_read_clock(), which will return
> the current device timestamp. The dynamic timestamp flag tells whether
> the field contains an actual timestamp value. For packets being sent,
> this value can be used by the PMD to schedule packet sending.
> 
> The device clock is an opaque entity; the units and frequency are
> vendor specific and might depend on hardware capabilities and
> configurations. It might (or might not) be synchronized with real time
> via PTP, and might (or might not) be synchronous with the CPU clock (for
> example, if the NIC and CPU share the same clock source there might be
> no drift between the NIC and CPU clocks), etc.
> 
> Once the PKT_RX_TIMESTAMP flag and the fixed timestamp field are
> deprecated and obsoleted, this dynamic flag and field will be used to
> manage timestamps on the receive datapath as well. Having dedicated
> flags for Rx/Tx timestamps allows applications not to perform explicit
> flag resets on forwarding and not to promote received timestamps
> to the transmit datapath by default. The static PKT_RX_TIMESTAMP
> is considered a candidate to become the dynamic flag.

Is there a deprecation notice for 'PKT_RX_TIMESTAMP'? Is this decided?

> 
> When the PMD sees "rte_dynfield_timestamp" set on a packet being sent,
> it tries to synchronize the time the packet appears on the wire with
> the specified packet timestamp. If the specified time is in the past it
> should be ignored; if it is in the distant future it should be capped
> to some reasonable value (in the range of seconds). These specific cases
> ("too late" and "distant future") can optionally be reported via
> device xstats to assist applications in detecting time-related
> problems.
> 
> No packet reordering according to timestamps is assumed, neither
> within a packet burst nor between packets; it is entirely the
> application's responsibility to generate packets and their timestamps
> in the desired order. The timestamps can be put only in the first packet
> of the burst, providing scheduling for the entire burst.
> 
> The PMD reports the ability to synchronize packet sending on a
> timestamp with a new offload flag:
> 
> This is a palliative and is going to be replaced with a new eth_dev API
> for reporting/managing the supported dynamic flags and their related
> features. That API would break ABI compatibility and can't be introduced
> at the moment, so it is postponed to 20.11.

Good to hear that there will be a generic API to get supported dynamic flags. I
was concerned about adding the 'DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP' flag, since I
am not sure any other PMD will want to use it.
The trouble is that it is hard to remove a public macro after it is introduced.
In this release I think only a single PMD (mlx) will support this feature, and
in the next release the plan is to remove the macro. In that case, what do you
think about not introducing the flag at all?

> 
> For testing purposes it is proposed to update the testpmd "txonly"
> forwarding mode routine. With this update, the testpmd application
> generates packets and sets the dynamic timestamps according to the
> specified time pattern if it sees that "rte_dynfield_timestamp" is
> registered.
> 
> The new testpmd command is proposed to configure sending pattern:
> 
> set tx_times <burst_gap>,<intra_gap>
> 
> <intra_gap> - the delay between the packets within the burst
>               specified in the device clock units. The number
>               of packets in the burst is defined by txburst parameter
> 
> <burst_gap> - the delay between the bursts in the device clock units
> 
> As a result, the bursts of packets will be transmitted with specific
> delays between the packets within a burst and a specific delay between
> the bursts. The rte_eth_get_clock is supposed to be engaged to get the

'rte_eth_read_clock()'?

> current device clock value and provide the reference for the timestamps.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> Acked-by: Olivier Matz <olivier.matz@6wind.com>
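
(Illustration only, not part of the patch: a minimal sketch of how an
application could consume the proposed dynamic field. The field name
"rte_dynfield_timestamp" is taken from the commit message above; the Tx
dynamic flag name used below is a placeholder/assumption.)

#include <rte_ethdev.h>
#include <rte_mbuf_dyn.h>

static int ts_off;       /* dynamic timestamp field offset */
static uint64_t ts_flag; /* dynamic "send on timestamp" flag mask */

static int
tx_sched_init(void)
{
	int bit;

	ts_off = rte_mbuf_dynfield_lookup("rte_dynfield_timestamp", NULL);
	bit = rte_mbuf_dynflag_lookup("rte_dynflag_tx_timestamp", NULL);
	if (ts_off < 0 || bit < 0)
		return -1; /* field/flag not registered by the PMD */
	ts_flag = 1ULL << bit;
	return 0;
}

static void
send_scheduled_burst(uint16_t port, struct rte_mbuf **pkts, uint16_t n,
		uint64_t delay)
{
	uint64_t now;

	if (n > 0 && rte_eth_read_clock(port, &now) == 0) {
		/* Timestamp on the first packet schedules the whole burst. */
		*RTE_MBUF_DYNFIELD(pkts[0], ts_off, uint64_t *) = now + delay;
		pkts[0]->ol_flags |= ts_flag;
	}
	rte_eth_tx_burst(port, 0, pkts, n);
}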
> 
> ---
>   v1->v4:
>      - dedicated dynamic Tx timestamp flag instead of shared with Rx
>   v4->v5:
>      - elaborated commit message
>      - more words about device clocks added,
>      - note about dedicated Rx/Tx timestamp flags added
>   v5->v6:
>      - release notes are updated
> ---
>  doc/guides/rel_notes/release_20_08.rst |  6 ++++++
>  lib/librte_ethdev/rte_ethdev.c         |  1 +
>  lib/librte_ethdev/rte_ethdev.h         |  4 ++++
>  lib/librte_mbuf/rte_mbuf_dyn.h         | 31 +++++++++++++++++++++++++++++++
>  4 files changed, 42 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
> index 988474c..5527bab 100644
> --- a/doc/guides/rel_notes/release_20_08.rst
> +++ b/doc/guides/rel_notes/release_20_08.rst
> @@ -200,6 +200,12 @@ New Features
>    See the :doc:`../sample_app_ug/l2_forward_real_virtual` for more
>    details of this parameter usage.
>  
> +* **Introduced send packet scheduling on the timestamps.**
> +
> +  Added the new mbuf dynamic field and flag to provide timestamp on what packet
> +  transmitting can be synchronized. The device Tx offload flag is added to
> +  indicate the PMD supports send scheduling.
> +

This is a core library change, so it can go up in the section; please check the
section comment for the ordering details.

>  
>  Removed Items
>  -------------
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 7022bd7..c48ca2a 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -160,6 +160,7 @@ struct rte_eth_xstats_name_off {
>  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
>  };
>  
>  #undef RTE_TX_OFFLOAD_BIT2STR
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index 631b146..97313a0 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
>  /** Device supports outer UDP checksum */
>  #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
>  
> +/** Device supports send on timestamp */
> +#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000

Please cc the ethdev maintainers.

As mentioned above, my concern is whether this is generic enough or whether we
are adding a flag for a specific PMD. And since the commit log says this is a
temporary solution for just this release, I repeat my question: can we remove
the flag completely?



* [dpdk-dev] [PATCH v4 1/2] mbuf: use C11 atomic built-ins for refcnt operations
  2020-07-09 10:10  4%   ` [dpdk-dev] [PATCH v3] mbuf: use C11 atomic built-ins " Phil Yang
  2020-07-09 11:03  3%     ` Olivier Matz
@ 2020-07-09 15:58  4%     ` Phil Yang
  1 sibling, 0 replies; 200+ results
From: Phil Yang @ 2020-07-09 15:58 UTC (permalink / raw)
  To: olivier.matz, dev
  Cc: stephen, david.marchand, drc, Honnappa.Nagarahalli, Ruifeng.Wang, nd

Use C11 atomic built-ins with explicit memory ordering instead of rte_atomic
ops, which enforce unnecessary barriers on aarch64.
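
(A condensed sketch of the pattern applied in the diff below — relaxed
ordering for plain refcnt reads/writes, acquire-release only for the
read-modify-write that can drop the last reference; shown standalone,
assuming GCC/Clang __atomic built-ins:)

#include <stdint.h>

struct obj { uint16_t refcnt; };

static inline uint16_t
obj_refcnt_read(const struct obj *o)
{
	/* No ordering needed just to observe the current value. */
	return __atomic_load_n(&o->refcnt, __ATOMIC_RELAXED);
}

static inline uint16_t
obj_refcnt_update(struct obj *o, int16_t value)
{
	/* ACQ_REL so the thread that drops the last reference sees all
	 * prior writes to the object before it is freed. */
	return __atomic_add_fetch(&o->refcnt, (uint16_t)value,
			__ATOMIC_ACQ_REL);
}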

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
v4:
1. Add union for refcnt_atomic and refcnt in rte_mbuf_ext_shared_info
to avoid ABI breakage. (Olivier)
2. Add notice of refcnt_atomic deprecation. (Honnappa)

v3:
1.Fix ABI breakage.
2.Simplify data type cast.

v2:
Fix ABI issue: revert the rte_mbuf_ext_shared_info struct refcnt field
to refcnt_atomic.

 lib/librte_mbuf/rte_mbuf.c      |  1 -
 lib/librte_mbuf/rte_mbuf.h      | 19 ++++++++++---------
 lib/librte_mbuf/rte_mbuf_core.h |  6 +++++-
 3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index ae91ae2..8a456e5 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -22,7 +22,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index f8e492e..7259575 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -37,7 +37,6 @@
 #include <rte_config.h>
 #include <rte_mempool.h>
 #include <rte_memory.h>
-#include <rte_atomic.h>
 #include <rte_prefetch.h>
 #include <rte_branch_prediction.h>
 #include <rte_byteorder.h>
@@ -365,7 +364,7 @@ rte_pktmbuf_priv_flags(struct rte_mempool *mp)
 static inline uint16_t
 rte_mbuf_refcnt_read(const struct rte_mbuf *m)
 {
-	return (uint16_t)(rte_atomic16_read(&m->refcnt_atomic));
+	return __atomic_load_n(&m->refcnt, __ATOMIC_RELAXED);
 }
 
 /**
@@ -378,14 +377,15 @@ rte_mbuf_refcnt_read(const struct rte_mbuf *m)
 static inline void
 rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
 {
-	rte_atomic16_set(&m->refcnt_atomic, (int16_t)new_value);
+	__atomic_store_n(&m->refcnt, new_value, __ATOMIC_RELAXED);
 }
 
 /* internal */
 static inline uint16_t
 __rte_mbuf_refcnt_update(struct rte_mbuf *m, int16_t value)
 {
-	return (uint16_t)(rte_atomic16_add_return(&m->refcnt_atomic, value));
+	return __atomic_add_fetch(&m->refcnt, (uint16_t)value,
+				 __ATOMIC_ACQ_REL);
 }
 
 /**
@@ -466,7 +466,7 @@ rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
 static inline uint16_t
 rte_mbuf_ext_refcnt_read(const struct rte_mbuf_ext_shared_info *shinfo)
 {
-	return (uint16_t)(rte_atomic16_read(&shinfo->refcnt_atomic));
+	return __atomic_load_n(&shinfo->refcnt, __ATOMIC_RELAXED);
 }
 
 /**
@@ -481,7 +481,7 @@ static inline void
 rte_mbuf_ext_refcnt_set(struct rte_mbuf_ext_shared_info *shinfo,
 	uint16_t new_value)
 {
-	rte_atomic16_set(&shinfo->refcnt_atomic, (int16_t)new_value);
+	__atomic_store_n(&shinfo->refcnt, new_value, __ATOMIC_RELAXED);
 }
 
 /**
@@ -505,7 +505,8 @@ rte_mbuf_ext_refcnt_update(struct rte_mbuf_ext_shared_info *shinfo,
 		return (uint16_t)value;
 	}
 
-	return (uint16_t)rte_atomic16_add_return(&shinfo->refcnt_atomic, value);
+	return __atomic_add_fetch(&shinfo->refcnt, (uint16_t)value,
+				 __ATOMIC_ACQ_REL);
 }
 
 /** Mbuf prefetch */
@@ -1304,8 +1305,8 @@ static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
 	 * Direct usage of add primitive to avoid
 	 * duplication of comparing with one.
 	 */
-	if (likely(rte_atomic16_add_return
-			(&shinfo->refcnt_atomic, -1)))
+	if (likely(__atomic_add_fetch(&shinfo->refcnt, (uint16_t)-1,
+				     __ATOMIC_ACQ_REL)))
 		return 1;
 
 	/* Reinitialize counter before mbuf freeing. */
diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
index 16600f1..8cd7137 100644
--- a/lib/librte_mbuf/rte_mbuf_core.h
+++ b/lib/librte_mbuf/rte_mbuf_core.h
@@ -679,7 +679,11 @@ typedef void (*rte_mbuf_extbuf_free_callback_t)(void *addr, void *opaque);
 struct rte_mbuf_ext_shared_info {
 	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback function */
 	void *fcb_opaque;                        /**< Free callback argument */
-	rte_atomic16_t refcnt_atomic;        /**< Atomically accessed refcnt */
+	RTE_STD_C11
+	union {
+		rte_atomic16_t refcnt_atomic; /**< Atomically accessed refcnt */
+		uint16_t refcnt;
+	};
 };
 
 /**< Maximum number of nb_segs allowed. */
-- 
2.7.4



* Re: [dpdk-dev] [PATCH] devtools: give some hints for ABI errors
  2020-07-08 10:22 25% [dpdk-dev] [PATCH] devtools: give some hints for ABI errors David Marchand
  2020-07-08 13:09  7% ` Kinsella, Ray
@ 2020-07-09 15:52  4% ` Dodji Seketeli
  1 sibling, 0 replies; 200+ results
From: Dodji Seketeli @ 2020-07-09 15:52 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, thomas, Ray Kinsella, Neil Horman

Hello,

David Marchand <david.marchand@redhat.com> writes:

> abidiff can provide some more information about the ABI difference it
> detected.
> In all cases, a discussion on the mailing list must happen, but we can
> give some hints to know whether this is a problem with the script calling
> abidiff, a potential ABI breakage or an unambiguous ABI breakage.
>
> Signed-off-by: David Marchand <david.marchand@redhat.com>

For what it's worth, the change looks good to me, at least from an
abidiff perspective.

Thanks.

Cheers.

-- 
		Dodji



* [dpdk-dev] [PATCH v9 1/3] lib/lpm: integrate RCU QSBR
  2020-07-09 15:42  4% ` [dpdk-dev] [PATCH v9 0/3] RCU integration with LPM library Ruifeng Wang
@ 2020-07-09 15:42  2%   ` Ruifeng Wang
  0 siblings, 0 replies; 200+ results
From: Ruifeng Wang @ 2020-07-09 15:42 UTC (permalink / raw)
  To: Bruce Richardson, Vladimir Medvedkin, John McNamara,
	Marko Kovacevic, Ray Kinsella, Neil Horman
  Cc: dev, konstantin.ananyev, honnappa.nagarahalli, nd, Ruifeng Wang

Currently, the tbl8 group is freed even though the readers might be
using the tbl8 group entries. The freed tbl8 group can be reallocated
quickly. This results in incorrect lookup results.

The RCU QSBR process is integrated for safe tbl8 group reclamation.
Refer to the RCU documentation to understand various aspects of
integrating the RCU library into other libraries.

To avoid ABI breakage, a struct __rte_lpm is created for LPM library
internal use. This struct wraps the exposed rte_lpm and also includes
members that don't need to be exposed, such as the RCU-related
config.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 doc/guides/prog_guide/lpm_lib.rst  |  32 ++++++
 lib/librte_lpm/Makefile            |   2 +-
 lib/librte_lpm/meson.build         |   1 +
 lib/librte_lpm/rte_lpm.c           | 165 +++++++++++++++++++++++++----
 lib/librte_lpm/rte_lpm.h           |  53 +++++++++
 lib/librte_lpm/rte_lpm_version.map |   6 ++
 6 files changed, 237 insertions(+), 22 deletions(-)

diff --git a/doc/guides/prog_guide/lpm_lib.rst b/doc/guides/prog_guide/lpm_lib.rst
index 1609a57d0..03945904b 100644
--- a/doc/guides/prog_guide/lpm_lib.rst
+++ b/doc/guides/prog_guide/lpm_lib.rst
@@ -145,6 +145,38 @@ depending on whether we need to move to the next table or not.
 Prefix expansion is one of the keys of this algorithm,
 since it improves the speed dramatically by adding redundancy.
 
+Deletion
+~~~~~~~~
+
+When deleting a rule, a replacement rule is searched for. The replacement rule is an existing rule that has
+the longest prefix match with the rule to be deleted, but a shorter prefix.
+
+If a replacement rule is found, target tbl24 and tbl8 entries are updated to have the same depth and next hop
+value as the replacement rule.
+
+If no replacement rule can be found, target tbl24 and tbl8 entries will be cleared.
+
+Prefix expansion is performed if the rule's depth is not exactly 24 bits or 32 bits.
+
+After deleting a rule, a group of tbl8s that belongs to the same tbl24 entry are freed in following cases:
+
+*   All tbl8s in the group are empty.
+
+*   All tbl8s in the group have the same values and a depth no greater than 24.
+
+Freeing of tbl8s has different behaviors:
+
+*   If RCU is not used, tbl8s are cleared and reclaimed immediately.
+
+*   If RCU is used, tbl8s are reclaimed when readers are in quiescent state.
+
+When the LPM is not using RCU, a tbl8 group can be freed immediately even though readers might be using
+the tbl8 group entries. This might result in incorrect lookup results.
+
+The RCU QSBR process is integrated for safe tbl8 group reclamation. The application has certain responsibilities
+while using this feature. Please refer to resource reclamation framework of :ref:`RCU library <RCU_Library>`
+for more details.
+
 Lookup
 ~~~~~~
 
diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
index d682785b6..6f06c5c03 100644
--- a/lib/librte_lpm/Makefile
+++ b/lib/librte_lpm/Makefile
@@ -8,7 +8,7 @@ LIB = librte_lpm.a
 
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
-LDLIBS += -lrte_eal -lrte_hash
+LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
 
 EXPORT_MAP := rte_lpm_version.map
 
diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
index 021ac6d8d..6cfc083c5 100644
--- a/lib/librte_lpm/meson.build
+++ b/lib/librte_lpm/meson.build
@@ -7,3 +7,4 @@ headers = files('rte_lpm.h', 'rte_lpm6.h')
 # without worrying about which architecture we actually need
 headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
 deps += ['hash']
+deps += ['rcu']
diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 38ab512a4..2d687c372 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2020 Arm Limited
  */
 
 #include <string.h>
@@ -39,6 +40,17 @@ enum valid_flag {
 	VALID
 };
 
+/** @internal LPM structure. */
+struct __rte_lpm {
+	/* LPM metadata. */
+	struct rte_lpm lpm;
+
+	/* RCU config. */
+	struct rte_rcu_qsbr *v;		/* RCU QSBR variable. */
+	enum rte_lpm_qsbr_mode rcu_mode;/* Blocking, defer queue. */
+	struct rte_rcu_qsbr_dq *dq;	/* RCU QSBR defer queue. */
+};
+
 /* Macro to enable/disable run-time checks. */
 #if defined(RTE_LIBRTE_LPM_DEBUG)
 #include <rte_debug.h>
@@ -122,6 +134,7 @@ rte_lpm_create(const char *name, int socket_id,
 		const struct rte_lpm_config *config)
 {
 	char mem_name[RTE_LPM_NAMESIZE];
+	struct __rte_lpm *internal_lpm;
 	struct rte_lpm *lpm = NULL;
 	struct rte_tailq_entry *te;
 	uint32_t mem_size, rules_size, tbl8s_size;
@@ -140,12 +153,6 @@ rte_lpm_create(const char *name, int socket_id,
 
 	snprintf(mem_name, sizeof(mem_name), "LPM_%s", name);
 
-	/* Determine the amount of memory to allocate. */
-	mem_size = sizeof(*lpm);
-	rules_size = sizeof(struct rte_lpm_rule) * config->max_rules;
-	tbl8s_size = (sizeof(struct rte_lpm_tbl_entry) *
-			RTE_LPM_TBL8_GROUP_NUM_ENTRIES * config->number_tbl8s);
-
 	rte_mcfg_tailq_write_lock();
 
 	/* guarantee there's no existing */
@@ -161,6 +168,12 @@ rte_lpm_create(const char *name, int socket_id,
 		goto exit;
 	}
 
+	/* Determine the amount of memory to allocate. */
+	mem_size = sizeof(*internal_lpm);
+	rules_size = sizeof(struct rte_lpm_rule) * config->max_rules;
+	tbl8s_size = (sizeof(struct rte_lpm_tbl_entry) *
+			RTE_LPM_TBL8_GROUP_NUM_ENTRIES * config->number_tbl8s);
+
 	/* allocate tailq entry */
 	te = rte_zmalloc("LPM_TAILQ_ENTRY", sizeof(*te), 0);
 	if (te == NULL) {
@@ -170,21 +183,23 @@ rte_lpm_create(const char *name, int socket_id,
 	}
 
 	/* Allocate memory to store the LPM data structures. */
-	lpm = rte_zmalloc_socket(mem_name, mem_size,
+	internal_lpm = rte_zmalloc_socket(mem_name, mem_size,
 			RTE_CACHE_LINE_SIZE, socket_id);
-	if (lpm == NULL) {
+	if (internal_lpm == NULL) {
 		RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
 		rte_free(te);
 		rte_errno = ENOMEM;
 		goto exit;
 	}
 
+	lpm = &internal_lpm->lpm;
 	lpm->rules_tbl = rte_zmalloc_socket(NULL,
 			(size_t)rules_size, RTE_CACHE_LINE_SIZE, socket_id);
 
 	if (lpm->rules_tbl == NULL) {
 		RTE_LOG(ERR, LPM, "LPM rules_tbl memory allocation failed\n");
-		rte_free(lpm);
+		rte_free(internal_lpm);
+		internal_lpm = NULL;
 		lpm = NULL;
 		rte_free(te);
 		rte_errno = ENOMEM;
@@ -197,7 +212,8 @@ rte_lpm_create(const char *name, int socket_id,
 	if (lpm->tbl8 == NULL) {
 		RTE_LOG(ERR, LPM, "LPM tbl8 memory allocation failed\n");
 		rte_free(lpm->rules_tbl);
-		rte_free(lpm);
+		rte_free(internal_lpm);
+		internal_lpm = NULL;
 		lpm = NULL;
 		rte_free(te);
 		rte_errno = ENOMEM;
@@ -225,6 +241,7 @@ rte_lpm_create(const char *name, int socket_id,
 void
 rte_lpm_free(struct rte_lpm *lpm)
 {
+	struct __rte_lpm *internal_lpm;
 	struct rte_lpm_list *lpm_list;
 	struct rte_tailq_entry *te;
 
@@ -246,12 +263,84 @@ rte_lpm_free(struct rte_lpm *lpm)
 
 	rte_mcfg_tailq_write_unlock();
 
+	internal_lpm = container_of(lpm, struct __rte_lpm, lpm);
+	if (internal_lpm->dq)
+		rte_rcu_qsbr_dq_delete(internal_lpm->dq);
 	rte_free(lpm->tbl8);
 	rte_free(lpm->rules_tbl);
 	rte_free(lpm);
 	rte_free(te);
 }
 
+static void
+__lpm_rcu_qsbr_free_resource(void *p, void *data, unsigned int n)
+{
+	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
+	uint32_t tbl8_group_index = *(uint32_t *)data;
+	struct rte_lpm_tbl_entry *tbl8 = ((struct rte_lpm *)p)->tbl8;
+
+	RTE_SET_USED(n);
+	/* Set tbl8 group invalid */
+	__atomic_store(&tbl8[tbl8_group_index], &zero_tbl8_entry,
+		__ATOMIC_RELAXED);
+}
+
+/* Associate QSBR variable with an LPM object.
+ */
+int
+rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_lpm_rcu_config *cfg,
+	struct rte_rcu_qsbr_dq **dq)
+{
+	struct __rte_lpm *internal_lpm;
+	char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
+	struct rte_rcu_qsbr_dq_parameters params = {0};
+
+	if (lpm == NULL || cfg == NULL) {
+		rte_errno = EINVAL;
+		return 1;
+	}
+
+	internal_lpm = container_of(lpm, struct __rte_lpm, lpm);
+	if (internal_lpm->v != NULL) {
+		rte_errno = EEXIST;
+		return 1;
+	}
+
+	if (cfg->mode == RTE_LPM_QSBR_MODE_SYNC) {
+		/* No other things to do. */
+	} else if (cfg->mode == RTE_LPM_QSBR_MODE_DQ) {
+		/* Init QSBR defer queue. */
+		snprintf(rcu_dq_name, sizeof(rcu_dq_name),
+				"LPM_RCU_%s", lpm->name);
+		params.name = rcu_dq_name;
+		params.size = cfg->dq_size;
+		if (params.size == 0)
+			params.size = lpm->number_tbl8s;
+		params.trigger_reclaim_limit = cfg->reclaim_thd;
+		params.max_reclaim_size = cfg->reclaim_max;
+		if (params.max_reclaim_size == 0)
+			params.max_reclaim_size = RTE_LPM_RCU_DQ_RECLAIM_MAX;
+		params.esize = sizeof(uint32_t);	/* tbl8 group index */
+		params.free_fn = __lpm_rcu_qsbr_free_resource;
+		params.p = lpm;
+		params.v = cfg->v;
+		internal_lpm->dq = rte_rcu_qsbr_dq_create(&params);
+		if (internal_lpm->dq == NULL) {
+			RTE_LOG(ERR, LPM, "LPM defer queue creation failed\n");
+			return 1;
+		}
+		if (dq)
+			*dq = internal_lpm->dq;
+	} else {
+		rte_errno = EINVAL;
+		return 1;
+	}
+	internal_lpm->rcu_mode = cfg->mode;
+	internal_lpm->v = cfg->v;
+
+	return 0;
+}
+
 /*
  * Adds a rule to the rule table.
  *
@@ -394,14 +483,15 @@ rule_find(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth)
  * Find, clean and allocate a tbl8.
  */
 static int32_t
-tbl8_alloc(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
+_tbl8_alloc(struct rte_lpm *lpm)
 {
 	uint32_t group_idx; /* tbl8 group index. */
 	struct rte_lpm_tbl_entry *tbl8_entry;
 
 	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
-	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
-		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
+	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
+		tbl8_entry = &lpm->tbl8[group_idx *
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
 		/* If a free tbl8 group is found clean it and set as VALID. */
 		if (!tbl8_entry->valid_group) {
 			struct rte_lpm_tbl_entry new_tbl8_entry = {
@@ -427,14 +517,47 @@ tbl8_alloc(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
 	return -ENOSPC;
 }
 
+static int32_t
+tbl8_alloc(struct rte_lpm *lpm)
+{
+	struct __rte_lpm *internal_lpm = container_of(lpm,
+						struct __rte_lpm, lpm);
+	int32_t group_idx; /* tbl8 group index. */
+
+	group_idx = _tbl8_alloc(lpm);
+	if (group_idx == -ENOSPC && internal_lpm->dq != NULL) {
+		/* If there are no tbl8 groups try to reclaim one. */
+		if (rte_rcu_qsbr_dq_reclaim(internal_lpm->dq, 1,
+				NULL, NULL, NULL) == 0)
+			group_idx = _tbl8_alloc(lpm);
+	}
+
+	return group_idx;
+}
+
 static void
-tbl8_free(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
+tbl8_free(struct rte_lpm *lpm, uint32_t tbl8_group_start)
 {
-	/* Set tbl8 group invalid*/
+	struct __rte_lpm *internal_lpm = container_of(lpm,
+						struct __rte_lpm, lpm);
 	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
 
-	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
-			__ATOMIC_RELAXED);
+	if (internal_lpm->v == NULL) {
+		/* Set tbl8 group invalid*/
+		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
+				__ATOMIC_RELAXED);
+	} else if (internal_lpm->rcu_mode == RTE_LPM_QSBR_MODE_SYNC) {
+		/* Wait for quiescent state change. */
+		rte_rcu_qsbr_synchronize(internal_lpm->v,
+			RTE_QSBR_THRID_INVALID);
+		/* Set tbl8 group invalid*/
+		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
+				__ATOMIC_RELAXED);
+	} else if (internal_lpm->rcu_mode == RTE_LPM_QSBR_MODE_DQ) {
+		/* Push into QSBR defer queue. */
+		rte_rcu_qsbr_dq_enqueue(internal_lpm->dq,
+				(void *)&tbl8_group_start);
+	}
 }
 
 static __rte_noinline int32_t
@@ -523,7 +646,7 @@ add_depth_big(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 
 	if (!lpm->tbl24[tbl24_index].valid) {
 		/* Search for a free tbl8 group. */
-		tbl8_group_index = tbl8_alloc(lpm->tbl8, lpm->number_tbl8s);
+		tbl8_group_index = tbl8_alloc(lpm);
 
 		/* Check tbl8 allocation was successful. */
 		if (tbl8_group_index < 0) {
@@ -569,7 +692,7 @@ add_depth_big(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 	} /* If valid entry but not extended calculate the index into Table8. */
 	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
 		/* Search for free tbl8 group. */
-		tbl8_group_index = tbl8_alloc(lpm->tbl8, lpm->number_tbl8s);
+		tbl8_group_index = tbl8_alloc(lpm);
 
 		if (tbl8_group_index < 0) {
 			return tbl8_group_index;
@@ -977,7 +1100,7 @@ delete_depth_big(struct rte_lpm *lpm, uint32_t ip_masked,
 		 */
 		lpm->tbl24[tbl24_index].valid = 0;
 		__atomic_thread_fence(__ATOMIC_RELEASE);
-		tbl8_free(lpm->tbl8, tbl8_group_start);
+		tbl8_free(lpm, tbl8_group_start);
 	} else if (tbl8_recycle_index > -1) {
 		/* Update tbl24 entry. */
 		struct rte_lpm_tbl_entry new_tbl24_entry = {
@@ -993,7 +1116,7 @@ delete_depth_big(struct rte_lpm *lpm, uint32_t ip_masked,
 		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
 				__ATOMIC_RELAXED);
 		__atomic_thread_fence(__ATOMIC_RELEASE);
-		tbl8_free(lpm->tbl8, tbl8_group_start);
+		tbl8_free(lpm, tbl8_group_start);
 	}
 #undef group_idx
 	return 0;
diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
index b9d49ac87..a9568fcdd 100644
--- a/lib/librte_lpm/rte_lpm.h
+++ b/lib/librte_lpm/rte_lpm.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2020 Arm Limited
  */
 
 #ifndef _RTE_LPM_H_
@@ -20,6 +21,7 @@
 #include <rte_memory.h>
 #include <rte_common.h>
 #include <rte_vect.h>
+#include <rte_rcu_qsbr.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -62,6 +64,17 @@ extern "C" {
 /** Bitmask used to indicate successful lookup */
 #define RTE_LPM_LOOKUP_SUCCESS          0x01000000
 
+/** @internal Default RCU defer queue entries to reclaim in one go. */
+#define RTE_LPM_RCU_DQ_RECLAIM_MAX	16
+
+/** RCU reclamation modes */
+enum rte_lpm_qsbr_mode {
+	/** Create defer queue for reclaim. */
+	RTE_LPM_QSBR_MODE_DQ = 0,
+	/** Use blocking mode reclaim. No defer queue created. */
+	RTE_LPM_QSBR_MODE_SYNC
+};
+
 #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
 /** @internal Tbl24 entry structure. */
 __extension__
@@ -132,6 +145,22 @@ struct rte_lpm {
 	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
 };
 
+/** LPM RCU QSBR configuration structure. */
+struct rte_lpm_rcu_config {
+	struct rte_rcu_qsbr *v;	/* RCU QSBR variable. */
+	/* Mode of RCU QSBR. RTE_LPM_QSBR_MODE_xxx
+	 * '0' for default: create defer queue for reclaim.
+	 */
+	enum rte_lpm_qsbr_mode mode;
+	uint32_t dq_size;	/* RCU defer queue size.
+				 * default: lpm->number_tbl8s.
+				 */
+	uint32_t reclaim_thd;	/* Threshold to trigger auto reclaim. */
+	uint32_t reclaim_max;	/* Max entries to reclaim in one go.
+				 * default: RTE_LPM_RCU_DQ_RECLAIM_MAX.
+				 */
+};
+
 /**
  * Create an LPM object.
  *
@@ -179,6 +208,30 @@ rte_lpm_find_existing(const char *name);
 void
 rte_lpm_free(struct rte_lpm *lpm);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Associate RCU QSBR variable with an LPM object.
+ *
+ * @param lpm
+ *   the lpm object to add RCU QSBR
+ * @param cfg
+ *   RCU QSBR configuration
+ * @param dq
+ *   handler of created RCU QSBR defer queue
+ * @return
+ *   On success - 0
+ *   On error - 1 with error code set in rte_errno.
+ *   Possible rte_errno codes are:
+ *   - EINVAL - invalid pointer
+ *   - EEXIST - already added QSBR
+ *   - ENOMEM - memory allocation failure
+ */
+__rte_experimental
+int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_lpm_rcu_config *cfg,
+	struct rte_rcu_qsbr_dq **dq);
+
 /**
  * Add a rule to the LPM table.
  *
diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
index 500f58b80..bfccd7eac 100644
--- a/lib/librte_lpm/rte_lpm_version.map
+++ b/lib/librte_lpm/rte_lpm_version.map
@@ -21,3 +21,9 @@ DPDK_20.0 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_lpm_rcu_qsbr_add;
+};
-- 
2.17.1



* [dpdk-dev] [PATCH v9 0/3] RCU integration with LPM library
                     ` (3 preceding siblings ...)
  2020-07-09  8:02  4% ` [dpdk-dev] [PATCH v8 0/3] RCU integration with LPM library Ruifeng Wang
@ 2020-07-09 15:42  4% ` Ruifeng Wang
  2020-07-09 15:42  2%   ` [dpdk-dev] [PATCH v9 1/3] lib/lpm: integrate RCU QSBR Ruifeng Wang
  2020-07-10  2:22  4% ` [dpdk-dev] [PATCH v10 0/3] RCU integration with LPM library Ruifeng Wang
  5 siblings, 1 reply; 200+ results
From: Ruifeng Wang @ 2020-07-09 15:42 UTC (permalink / raw)
  Cc: dev, mdr, konstantin.ananyev, honnappa.nagarahalli, nd, Ruifeng Wang

This patchset integrates RCU QSBR support with LPM library.

The resource reclamation implementation was split from the original
series and is already part of the RCU library. This series is reworked
to base the LPM integration on the RCU reclamation APIs.

A new API, rte_lpm_rcu_qsbr_add, is introduced for the application to
register an RCU variable that the LPM library will use. This gives the
user a handle to enable the RCU mechanism integrated in the LPM library.

Functional tests and performance tests are added to cover the
integration with RCU.

---
v9:
Cleared lpm when allocation failed. (David)

v8:
Fixed ABI issue by adding internal LPM control structure. (David)
Changed to use RFC5737 address in unit test. (Vladimir)

v7:
Fixed typos in document.

v6:
Remove ALLOW_EXPERIMENTAL_API from rte_lpm.c.

v5:
No default value for reclaim_thd. This allows reclamation to be triggered on every call.
Pass LPM pointer instead of tbl8 as argument of reclaim callback free function.
Updated group_idx check at tbl8 allocation.
Use enums instead of defines for different reclamation modes.
RCU QSBR integrated path is inside ALLOW_EXPERIMENTAL_API to avoid ABI change.

v4:
Allow user to configure defer queue: size, reclaim threshold, max entries.
Return defer queue handle so user can manually trigger reclamation.
Add blocking mode support. Defer queue will not be created.


Honnappa Nagarahalli (1):
  test/lpm: add RCU integration performance tests

Ruifeng Wang (2):
  lib/lpm: integrate RCU QSBR
  test/lpm: add LPM RCU integration functional tests

 app/test/test_lpm.c                | 291 ++++++++++++++++-
 app/test/test_lpm_perf.c           | 492 ++++++++++++++++++++++++++++-
 doc/guides/prog_guide/lpm_lib.rst  |  32 ++
 lib/librte_lpm/Makefile            |   2 +-
 lib/librte_lpm/meson.build         |   1 +
 lib/librte_lpm/rte_lpm.c           | 165 ++++++++--
 lib/librte_lpm/rte_lpm.h           |  53 ++++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 8 files changed, 1016 insertions(+), 26 deletions(-)

-- 
2.17.1



* [dpdk-dev] [PATCH 20.11 4/5] rawdev: add private data length parameter to queue fns
  2020-07-09 15:20  4% [dpdk-dev] [PATCH 20.11 0/5] Enhance rawdev APIs Bruce Richardson
  2020-07-09 15:20  3% ` [dpdk-dev] [PATCH 20.11 1/5] rawdev: add private data length parameter to info fn Bruce Richardson
  2020-07-09 15:20  3% ` [dpdk-dev] [PATCH 20.11 3/5] rawdev: add private data length parameter to config fn Bruce Richardson
@ 2020-07-09 15:20  3% ` Bruce Richardson
  2 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2020-07-09 15:20 UTC (permalink / raw)
  To: Nipun Gupta, Hemant Agrawal
  Cc: dev, Rosen Xu, Tianfei zhang, Xiaoyun Li, Jingjing Wu, Satha Rao,
	Mahipal Challa, Jerin Jacob, Bruce Richardson

The queue setup and queue defaults query functions take a void * parameter
as configuration data, preventing any compile-time checking of the
parameters and limiting runtime checks. Adding in the length of the
expected structure provides a measure of typechecking, and can also be used
for ABI compatibility in future, since ABI changes involving structs almost
always involve a change in size.
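
(Illustration only, not part of this patch: for callers, the change means
passing the size of the driver-specific structure along with the pointer.
The struct name and field below are placeholders, not a real driver's API.)

#include <rte_rawdev.h>

struct mydrv_queue_conf { uint16_t nb_desc; }; /* placeholder type */

static int
setup_queue(uint16_t dev_id, uint16_t queue_id)
{
	struct mydrv_queue_conf qconf;

	/* The driver fills in defaults and can now verify sizeof(qconf). */
	if (rte_rawdev_queue_conf_get(dev_id, queue_id,
			&qconf, sizeof(qconf)) != 0)
		return -1;

	qconf.nb_desc = 512; /* hypothetical tweak */
	return rte_rawdev_queue_setup(dev_id, queue_id,
			&qconf, sizeof(qconf));
}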

Signed-off-by:  Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/raw/ntb/ntb.c                       | 25 ++++++++++++++++-----
 drivers/raw/skeleton/skeleton_rawdev.c      | 10 +++++----
 drivers/raw/skeleton/skeleton_rawdev_test.c |  8 +++----
 examples/ntb/ntb_fwd.c                      |  3 ++-
 lib/librte_rawdev/rte_rawdev.c              | 10 +++++----
 lib/librte_rawdev/rte_rawdev.h              | 10 +++++++--
 lib/librte_rawdev/rte_rawdev_pmd.h          |  6 +++--
 7 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index c181094d5..7c15e204c 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -249,11 +249,15 @@ ntb_dev_intr_handler(void *param)
 static void
 ntb_queue_conf_get(struct rte_rawdev *dev,
 		   uint16_t queue_id,
-		   rte_rawdev_obj_t queue_conf)
+		   rte_rawdev_obj_t queue_conf,
+		   size_t conf_size)
 {
 	struct ntb_queue_conf *q_conf = queue_conf;
 	struct ntb_hw *hw = dev->dev_private;
 
+	if (conf_size != sizeof(*q_conf))
+		return;
+
 	q_conf->tx_free_thresh = hw->tx_queues[queue_id]->tx_free_thresh;
 	q_conf->nb_desc = hw->rx_queues[queue_id]->nb_rx_desc;
 	q_conf->rx_mp = hw->rx_queues[queue_id]->mpool;
@@ -294,12 +298,16 @@ ntb_rxq_release(struct ntb_rx_queue *rxq)
 static int
 ntb_rxq_setup(struct rte_rawdev *dev,
 	      uint16_t qp_id,
-	      rte_rawdev_obj_t queue_conf)
+	      rte_rawdev_obj_t queue_conf,
+	      size_t conf_size)
 {
 	struct ntb_queue_conf *rxq_conf = queue_conf;
 	struct ntb_hw *hw = dev->dev_private;
 	struct ntb_rx_queue *rxq;
 
+	if (conf_size != sizeof(*rxq_conf))
+		return -EINVAL;
+
 	/* Allocate the rx queue data structure */
 	rxq = rte_zmalloc_socket("ntb rx queue",
 				 sizeof(struct ntb_rx_queue),
@@ -375,13 +383,17 @@ ntb_txq_release(struct ntb_tx_queue *txq)
 static int
 ntb_txq_setup(struct rte_rawdev *dev,
 	      uint16_t qp_id,
-	      rte_rawdev_obj_t queue_conf)
+	      rte_rawdev_obj_t queue_conf,
+	      size_t conf_size)
 {
 	struct ntb_queue_conf *txq_conf = queue_conf;
 	struct ntb_hw *hw = dev->dev_private;
 	struct ntb_tx_queue *txq;
 	uint16_t i, prev;
 
+	if (conf_size != sizeof(*txq_conf))
+		return -EINVAL;
+
 	/* Allocate the TX queue data structure. */
 	txq = rte_zmalloc_socket("ntb tx queue",
 				  sizeof(struct ntb_tx_queue),
@@ -439,7 +451,8 @@ ntb_txq_setup(struct rte_rawdev *dev,
 static int
 ntb_queue_setup(struct rte_rawdev *dev,
 		uint16_t queue_id,
-		rte_rawdev_obj_t queue_conf)
+		rte_rawdev_obj_t queue_conf,
+		size_t conf_size)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	int ret;
@@ -447,11 +460,11 @@ ntb_queue_setup(struct rte_rawdev *dev,
 	if (queue_id >= hw->queue_pairs)
 		return -EINVAL;
 
-	ret = ntb_txq_setup(dev, queue_id, queue_conf);
+	ret = ntb_txq_setup(dev, queue_id, queue_conf, conf_size);
 	if (ret < 0)
 		return ret;
 
-	ret = ntb_rxq_setup(dev, queue_id, queue_conf);
+	ret = ntb_rxq_setup(dev, queue_id, queue_conf, conf_size);
 
 	return ret;
 }
diff --git a/drivers/raw/skeleton/skeleton_rawdev.c b/drivers/raw/skeleton/skeleton_rawdev.c
index 531d0450c..f109e4d2c 100644
--- a/drivers/raw/skeleton/skeleton_rawdev.c
+++ b/drivers/raw/skeleton/skeleton_rawdev.c
@@ -222,14 +222,15 @@ static int skeleton_rawdev_reset(struct rte_rawdev *dev)
 
 static void skeleton_rawdev_queue_def_conf(struct rte_rawdev *dev,
 					   uint16_t queue_id,
-					   rte_rawdev_obj_t queue_conf)
+					   rte_rawdev_obj_t queue_conf,
+					   size_t conf_size)
 {
 	struct skeleton_rawdev *skeldev;
 	struct skeleton_rawdev_queue *skelq;
 
 	SKELETON_PMD_FUNC_TRACE();
 
-	if (!dev || !queue_conf)
+	if (!dev || !queue_conf || conf_size != sizeof(struct skeleton_rawdev_queue))
 		return;
 
 	skeldev = skeleton_rawdev_get_priv(dev);
@@ -252,7 +253,8 @@ clear_queue_bufs(int queue_id)
 
 static int skeleton_rawdev_queue_setup(struct rte_rawdev *dev,
 				       uint16_t queue_id,
-				       rte_rawdev_obj_t queue_conf)
+				       rte_rawdev_obj_t queue_conf,
+				       size_t conf_size)
 {
 	int ret = 0;
 	struct skeleton_rawdev *skeldev;
@@ -260,7 +262,7 @@ static int skeleton_rawdev_queue_setup(struct rte_rawdev *dev,
 
 	SKELETON_PMD_FUNC_TRACE();
 
-	if (!dev || !queue_conf)
+	if (!dev || !queue_conf || conf_size != sizeof(struct skeleton_rawdev_queue))
 		return -EINVAL;
 
 	skeldev = skeleton_rawdev_get_priv(dev);
diff --git a/drivers/raw/skeleton/skeleton_rawdev_test.c b/drivers/raw/skeleton/skeleton_rawdev_test.c
index 7dc7c7684..bb4b6efe4 100644
--- a/drivers/raw/skeleton/skeleton_rawdev_test.c
+++ b/drivers/raw/skeleton/skeleton_rawdev_test.c
@@ -185,7 +185,7 @@ test_rawdev_queue_default_conf_get(void)
 	 * depth = DEF_DEPTH
 	 */
 	for (i = 0; i < rdev_conf_get.num_queues; i++) {
-		rte_rawdev_queue_conf_get(test_dev_id, i, &q);
+		rte_rawdev_queue_conf_get(test_dev_id, i, &q, sizeof(q));
 		RTE_TEST_ASSERT_EQUAL(q.depth, SKELETON_QUEUE_DEF_DEPTH,
 				      "Invalid default depth of queue (%d)",
 				      q.depth);
@@ -235,11 +235,11 @@ test_rawdev_queue_setup(void)
 	/* Modify the queue depth for Queue 0 and attach it */
 	qset.depth = 15;
 	qset.state = SKELETON_QUEUE_ATTACH;
-	ret = rte_rawdev_queue_setup(test_dev_id, 0, &qset);
+	ret = rte_rawdev_queue_setup(test_dev_id, 0, &qset, sizeof(qset));
 	RTE_TEST_ASSERT_SUCCESS(ret, "Failed to setup queue (%d)", ret);
 
 	/* Now, fetching the queue 0 should show depth as 15 */
-	ret = rte_rawdev_queue_conf_get(test_dev_id, 0, &qget);
+	ret = rte_rawdev_queue_conf_get(test_dev_id, 0, &qget, sizeof(qget));
 	RTE_TEST_ASSERT_SUCCESS(ret, "Failed to get queue config (%d)", ret);
 
 	RTE_TEST_ASSERT_EQUAL(qset.depth, qget.depth,
@@ -263,7 +263,7 @@ test_rawdev_queue_release(void)
 	RTE_TEST_ASSERT_SUCCESS(ret, "Failed to release queue 0; (%d)", ret);
 
 	/* Now, fetching the queue 0 should show depth as default */
-	ret = rte_rawdev_queue_conf_get(test_dev_id, 0, &qget);
+	ret = rte_rawdev_queue_conf_get(test_dev_id, 0, &qget, sizeof(qget));
 	RTE_TEST_ASSERT_SUCCESS(ret, "Failed to get queue config (%d)", ret);
 
 	RTE_TEST_ASSERT_EQUAL(qget.depth, SKELETON_QUEUE_DEF_DEPTH,
diff --git a/examples/ntb/ntb_fwd.c b/examples/ntb/ntb_fwd.c
index 656f73659..5a8439b8d 100644
--- a/examples/ntb/ntb_fwd.c
+++ b/examples/ntb/ntb_fwd.c
@@ -1411,7 +1411,8 @@ main(int argc, char **argv)
 	ntb_q_conf.rx_mp = mbuf_pool;
 	for (i = 0; i < num_queues; i++) {
 		/* Setup rawdev queue */
-		ret = rte_rawdev_queue_setup(dev_id, i, &ntb_q_conf);
+		ret = rte_rawdev_queue_setup(dev_id, i, &ntb_q_conf,
+				sizeof(ntb_q_conf));
 		if (ret < 0)
 			rte_exit(EXIT_FAILURE,
 				"Failed to setup ntb queue %u.\n", i);
diff --git a/lib/librte_rawdev/rte_rawdev.c b/lib/librte_rawdev/rte_rawdev.c
index 6c4d783cc..8965f2ce3 100644
--- a/lib/librte_rawdev/rte_rawdev.c
+++ b/lib/librte_rawdev/rte_rawdev.c
@@ -137,7 +137,8 @@ rte_rawdev_configure(uint16_t dev_id, struct rte_rawdev_info *dev_conf,
 int
 rte_rawdev_queue_conf_get(uint16_t dev_id,
 			  uint16_t queue_id,
-			  rte_rawdev_obj_t queue_conf)
+			  rte_rawdev_obj_t queue_conf,
+			  size_t queue_conf_size)
 {
 	struct rte_rawdev *dev;
 
@@ -145,14 +146,15 @@ rte_rawdev_queue_conf_get(uint16_t dev_id,
 	dev = &rte_rawdevs[dev_id];
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->queue_def_conf, -ENOTSUP);
-	(*dev->dev_ops->queue_def_conf)(dev, queue_id, queue_conf);
+	(*dev->dev_ops->queue_def_conf)(dev, queue_id, queue_conf, queue_conf_size);
 	return 0;
 }
 
 int
 rte_rawdev_queue_setup(uint16_t dev_id,
 		       uint16_t queue_id,
-		       rte_rawdev_obj_t queue_conf)
+		       rte_rawdev_obj_t queue_conf,
+		       size_t queue_conf_size)
 {
 	struct rte_rawdev *dev;
 
@@ -160,7 +162,7 @@ rte_rawdev_queue_setup(uint16_t dev_id,
 	dev = &rte_rawdevs[dev_id];
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->queue_setup, -ENOTSUP);
-	return (*dev->dev_ops->queue_setup)(dev, queue_id, queue_conf);
+	return (*dev->dev_ops->queue_setup)(dev, queue_id, queue_conf, queue_conf_size);
 }
 
 int
diff --git a/lib/librte_rawdev/rte_rawdev.h b/lib/librte_rawdev/rte_rawdev.h
index 73e3bd5ae..bbd63913a 100644
--- a/lib/librte_rawdev/rte_rawdev.h
+++ b/lib/librte_rawdev/rte_rawdev.h
@@ -146,6 +146,8 @@ rte_rawdev_configure(uint16_t dev_id, struct rte_rawdev_info *dev_conf,
  *   previously supplied to rte_rawdev_configure().
  * @param[out] queue_conf
  *   The pointer to the default raw queue configuration data.
+ * @param queue_conf_size
+ *   The size of the structure pointed to by queue_conf
  * @return
  *   - 0: Success, driver updates the default raw queue configuration data.
  *   - <0: Error code returned by the driver info get function.
@@ -156,7 +158,8 @@ rte_rawdev_configure(uint16_t dev_id, struct rte_rawdev_info *dev_conf,
 int
 rte_rawdev_queue_conf_get(uint16_t dev_id,
 			  uint16_t queue_id,
-			  rte_rawdev_obj_t queue_conf);
+			  rte_rawdev_obj_t queue_conf,
+			  size_t queue_conf_size);
 
 /**
  * Allocate and set up a raw queue for a raw device.
@@ -169,6 +172,8 @@ rte_rawdev_queue_conf_get(uint16_t dev_id,
  * @param queue_conf
  *   The pointer to the configuration data to be used for the raw queue.
  *   NULL value is allowed, in which case default configuration	used.
+ * @param queue_conf_size
+ *   The size of the structure pointed to by queue_conf
  *
  * @see rte_rawdev_queue_conf_get()
  *
@@ -179,7 +184,8 @@ rte_rawdev_queue_conf_get(uint16_t dev_id,
 int
 rte_rawdev_queue_setup(uint16_t dev_id,
 		       uint16_t queue_id,
-		       rte_rawdev_obj_t queue_conf);
+		       rte_rawdev_obj_t queue_conf,
+		       size_t queue_conf_size);
 
 /**
  * Release and deallocate a raw queue from a raw device.
diff --git a/lib/librte_rawdev/rte_rawdev_pmd.h b/lib/librte_rawdev/rte_rawdev_pmd.h
index 050f8b029..34eb667f6 100644
--- a/lib/librte_rawdev/rte_rawdev_pmd.h
+++ b/lib/librte_rawdev/rte_rawdev_pmd.h
@@ -218,7 +218,8 @@ typedef int (*rawdev_reset_t)(struct rte_rawdev *dev);
  */
 typedef void (*rawdev_queue_conf_get_t)(struct rte_rawdev *dev,
 					uint16_t queue_id,
-					rte_rawdev_obj_t queue_conf);
+					rte_rawdev_obj_t queue_conf,
+					size_t queue_conf_size);
 
 /**
  * Setup an raw queue.
@@ -235,7 +236,8 @@ typedef void (*rawdev_queue_conf_get_t)(struct rte_rawdev *dev,
  */
 typedef int (*rawdev_queue_setup_t)(struct rte_rawdev *dev,
 				    uint16_t queue_id,
-				    rte_rawdev_obj_t queue_conf);
+				    rte_rawdev_obj_t queue_conf,
+				    size_t queue_conf_size);
 
 /**
  * Release resources allocated by given raw queue.
-- 
2.25.1



* [dpdk-dev] [PATCH 20.11 3/5] rawdev: add private data length parameter to config fn
  2020-07-09 15:20  4% [dpdk-dev] [PATCH 20.11 0/5] Enhance rawdev APIs Bruce Richardson
  2020-07-09 15:20  3% ` [dpdk-dev] [PATCH 20.11 1/5] rawdev: add private data length parameter to info fn Bruce Richardson
@ 2020-07-09 15:20  3% ` Bruce Richardson
  2020-07-09 15:20  3% ` [dpdk-dev] [PATCH 20.11 4/5] rawdev: add private data length parameter to queue fns Bruce Richardson
  2 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2020-07-09 15:20 UTC (permalink / raw)
  To: Nipun Gupta, Hemant Agrawal
  Cc: dev, Rosen Xu, Tianfei zhang, Xiaoyun Li, Jingjing Wu, Satha Rao,
	Mahipal Challa, Jerin Jacob, Bruce Richardson

Currently, with the rawdev API there is no way to check that the structure
passed in via the dev_private pointer of the structure given to the configure
API is of the correct type - it is only checked to be non-NULL. Adding
in the length of the expected structure provides a measure of type checking,
and can also be used for ABI compatibility in the future, since ABI changes
involving structs almost always involve a change in size.

Signed-off-by:  Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/raw/ifpga/ifpga_rawdev.c            | 3 ++-
 drivers/raw/ioat/ioat_rawdev.c              | 5 +++--
 drivers/raw/ioat/ioat_rawdev_test.c         | 2 +-
 drivers/raw/ntb/ntb.c                       | 6 +++++-
 drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c | 7 ++++---
 drivers/raw/octeontx2_dma/otx2_dpi_test.c   | 3 ++-
 drivers/raw/octeontx2_ep/otx2_ep_rawdev.c   | 7 ++++---
 drivers/raw/octeontx2_ep/otx2_ep_test.c     | 2 +-
 drivers/raw/skeleton/skeleton_rawdev.c      | 5 +++--
 drivers/raw/skeleton/skeleton_rawdev_test.c | 5 +++--
 examples/ioat/ioatfwd.c                     | 2 +-
 examples/ntb/ntb_fwd.c                      | 2 +-
 lib/librte_rawdev/rte_rawdev.c              | 6 ++++--
 lib/librte_rawdev/rte_rawdev.h              | 8 +++++++-
 lib/librte_rawdev/rte_rawdev_pmd.h          | 3 ++-
 15 files changed, 43 insertions(+), 23 deletions(-)

diff --git a/drivers/raw/ifpga/ifpga_rawdev.c b/drivers/raw/ifpga/ifpga_rawdev.c
index 32a2b96c9..a50173264 100644
--- a/drivers/raw/ifpga/ifpga_rawdev.c
+++ b/drivers/raw/ifpga/ifpga_rawdev.c
@@ -684,7 +684,8 @@ ifpga_rawdev_info_get(struct rte_rawdev *dev,
 
 static int
 ifpga_rawdev_configure(const struct rte_rawdev *dev,
-		rte_rawdev_obj_t config)
+		rte_rawdev_obj_t config,
+		size_t config_size __rte_unused)
 {
 	IFPGA_RAWDEV_PMD_FUNC_TRACE();
 
diff --git a/drivers/raw/ioat/ioat_rawdev.c b/drivers/raw/ioat/ioat_rawdev.c
index 6a336795d..b29ff983f 100644
--- a/drivers/raw/ioat/ioat_rawdev.c
+++ b/drivers/raw/ioat/ioat_rawdev.c
@@ -41,7 +41,8 @@ RTE_LOG_REGISTER(ioat_pmd_logtype, rawdev.ioat, INFO);
 #define COMPLETION_SZ sizeof(__m128i)
 
 static int
-ioat_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
+ioat_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config,
+		size_t config_size)
 {
 	struct rte_ioat_rawdev_config *params = config;
 	struct rte_ioat_rawdev *ioat = dev->dev_private;
@@ -51,7 +52,7 @@ ioat_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 	if (dev->started)
 		return -EBUSY;
 
-	if (params == NULL)
+	if (params == NULL || config_size != sizeof(*params))
 		return -EINVAL;
 
 	if (params->ring_size > 4096 || params->ring_size < 64 ||
diff --git a/drivers/raw/ioat/ioat_rawdev_test.c b/drivers/raw/ioat/ioat_rawdev_test.c
index 90f5974cd..e5b50ae9f 100644
--- a/drivers/raw/ioat/ioat_rawdev_test.c
+++ b/drivers/raw/ioat/ioat_rawdev_test.c
@@ -165,7 +165,7 @@ ioat_rawdev_test(uint16_t dev_id)
 	}
 
 	p.ring_size = IOAT_TEST_RINGSIZE;
-	if (rte_rawdev_configure(dev_id, &info) != 0) {
+	if (rte_rawdev_configure(dev_id, &info, sizeof(p)) != 0) {
 		printf("Error with rte_rawdev_configure()\n");
 		return -1;
 	}
diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index eaeb67b74..c181094d5 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -837,13 +837,17 @@ ntb_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info,
 }
 
 static int
-ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
+ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config,
+		size_t config_size)
 {
 	struct ntb_dev_config *conf = config;
 	struct ntb_hw *hw = dev->dev_private;
 	uint32_t xstats_num;
 	int ret;
 
+	if (conf == NULL || config_size != sizeof(*conf))
+		return -EINVAL;
+
 	hw->queue_pairs	= conf->num_queues;
 	hw->queue_size = conf->queue_size;
 	hw->used_mw_num = conf->mz_num;
diff --git a/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c b/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c
index e398abb75..5b496446c 100644
--- a/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c
+++ b/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c
@@ -294,7 +294,8 @@ otx2_dpi_rawdev_reset(struct rte_rawdev *dev)
 }
 
 static int
-otx2_dpi_rawdev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
+otx2_dpi_rawdev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config,
+		size_t config_size)
 {
 	struct dpi_rawdev_conf_s *conf = config;
 	struct dpi_vf_s *dpivf = NULL;
@@ -302,8 +303,8 @@ otx2_dpi_rawdev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 	uintptr_t pool;
 	uint32_t gaura;
 
-	if (conf == NULL) {
-		otx2_dpi_dbg("NULL configuration");
+	if (conf == NULL || config_size != sizeof(*conf)) {
+		otx2_dpi_dbg("NULL or invalid configuration");
 		return -EINVAL;
 	}
 	dpivf = (struct dpi_vf_s *)dev->dev_private;
diff --git a/drivers/raw/octeontx2_dma/otx2_dpi_test.c b/drivers/raw/octeontx2_dma/otx2_dpi_test.c
index 276658af0..cec6ca91b 100644
--- a/drivers/raw/octeontx2_dma/otx2_dpi_test.c
+++ b/drivers/raw/octeontx2_dma/otx2_dpi_test.c
@@ -182,7 +182,8 @@ test_otx2_dma_rawdev(uint16_t val)
 	/* Configure rawdev ports */
 	conf.chunk_pool = dpi_create_mempool();
 	rdev_info.dev_private = &conf;
-	ret = rte_rawdev_configure(i, (rte_rawdev_obj_t)&rdev_info);
+	ret = rte_rawdev_configure(i, (rte_rawdev_obj_t)&rdev_info,
+			sizeof(conf));
 	if (ret) {
 		otx2_dpi_dbg("Unable to configure DPIVF %d", i);
 		return -ENODEV;
diff --git a/drivers/raw/octeontx2_ep/otx2_ep_rawdev.c b/drivers/raw/octeontx2_ep/otx2_ep_rawdev.c
index 0778603d5..2b78a7941 100644
--- a/drivers/raw/octeontx2_ep/otx2_ep_rawdev.c
+++ b/drivers/raw/octeontx2_ep/otx2_ep_rawdev.c
@@ -224,13 +224,14 @@ sdp_rawdev_close(struct rte_rawdev *dev)
 }
 
 static int
-sdp_rawdev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
+sdp_rawdev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config,
+		size_t config_size)
 {
 	struct sdp_rawdev_info *app_info = (struct sdp_rawdev_info *)config;
 	struct sdp_device *sdpvf;
 
-	if (app_info == NULL) {
-		otx2_err("Application config info [NULL]");
+	if (app_info == NULL || config_size != sizeof(*app_info)) {
+		otx2_err("Application config info [NULL] or incorrect size");
 		return -EINVAL;
 	}
 
diff --git a/drivers/raw/octeontx2_ep/otx2_ep_test.c b/drivers/raw/octeontx2_ep/otx2_ep_test.c
index 091f1827c..b876275f7 100644
--- a/drivers/raw/octeontx2_ep/otx2_ep_test.c
+++ b/drivers/raw/octeontx2_ep/otx2_ep_test.c
@@ -108,7 +108,7 @@ sdp_rawdev_selftest(uint16_t dev_id)
 
 	dev_info.dev_private = &app_info;
 
-	ret = rte_rawdev_configure(dev_id, &dev_info);
+	ret = rte_rawdev_configure(dev_id, &dev_info, sizeof(app_info));
 	if (ret) {
 		otx2_err("Unable to configure SDP_VF %d", dev_id);
 		rte_mempool_free(ioq_mpool);
diff --git a/drivers/raw/skeleton/skeleton_rawdev.c b/drivers/raw/skeleton/skeleton_rawdev.c
index dce300c35..531d0450c 100644
--- a/drivers/raw/skeleton/skeleton_rawdev.c
+++ b/drivers/raw/skeleton/skeleton_rawdev.c
@@ -68,7 +68,8 @@ static int skeleton_rawdev_info_get(struct rte_rawdev *dev,
 }
 
 static int skeleton_rawdev_configure(const struct rte_rawdev *dev,
-				     rte_rawdev_obj_t config)
+				     rte_rawdev_obj_t config,
+				     size_t config_size)
 {
 	struct skeleton_rawdev *skeldev;
 	struct skeleton_rawdev_conf *skeldev_conf;
@@ -77,7 +78,7 @@ static int skeleton_rawdev_configure(const struct rte_rawdev *dev,
 
 	RTE_FUNC_PTR_OR_ERR_RET(dev, -EINVAL);
 
-	if (!config) {
+	if (config == NULL || config_size != sizeof(*skeldev_conf)) {
 		SKELETON_PMD_ERR("Invalid configuration");
 		return -EINVAL;
 	}
diff --git a/drivers/raw/skeleton/skeleton_rawdev_test.c b/drivers/raw/skeleton/skeleton_rawdev_test.c
index 9b8390dfb..7dc7c7684 100644
--- a/drivers/raw/skeleton/skeleton_rawdev_test.c
+++ b/drivers/raw/skeleton/skeleton_rawdev_test.c
@@ -126,7 +126,7 @@ test_rawdev_configure(void)
 	struct skeleton_rawdev_conf rdev_conf_get = {0};
 
 	/* Check invalid configuration */
-	ret = rte_rawdev_configure(test_dev_id, NULL);
+	ret = rte_rawdev_configure(test_dev_id, NULL, 0);
 	RTE_TEST_ASSERT(ret == -EINVAL,
 			"Null configure; Expected -EINVAL, got %d", ret);
 
@@ -137,7 +137,8 @@ test_rawdev_configure(void)
 
 	rdev_info.dev_private = &rdev_conf_set;
 	ret = rte_rawdev_configure(test_dev_id,
-				   (rte_rawdev_obj_t)&rdev_info);
+				   (rte_rawdev_obj_t)&rdev_info,
+				   sizeof(rdev_conf_set));
 	RTE_TEST_ASSERT_SUCCESS(ret, "Failed to configure rawdev (%d)", ret);
 
 	rdev_info.dev_private = &rdev_conf_get;
diff --git a/examples/ioat/ioatfwd.c b/examples/ioat/ioatfwd.c
index 5c631da1b..8e9513e44 100644
--- a/examples/ioat/ioatfwd.c
+++ b/examples/ioat/ioatfwd.c
@@ -734,7 +734,7 @@ configure_rawdev_queue(uint32_t dev_id)
 	struct rte_ioat_rawdev_config dev_config = { .ring_size = ring_size };
 	struct rte_rawdev_info info = { .dev_private = &dev_config };
 
-	if (rte_rawdev_configure(dev_id, &info) != 0) {
+	if (rte_rawdev_configure(dev_id, &info, sizeof(dev_config)) != 0) {
 		rte_exit(EXIT_FAILURE,
 			"Error with rte_rawdev_configure()\n");
 	}
diff --git a/examples/ntb/ntb_fwd.c b/examples/ntb/ntb_fwd.c
index 11e224451..656f73659 100644
--- a/examples/ntb/ntb_fwd.c
+++ b/examples/ntb/ntb_fwd.c
@@ -1401,7 +1401,7 @@ main(int argc, char **argv)
 	ntb_conf.num_queues = num_queues;
 	ntb_conf.queue_size = nb_desc;
 	ntb_rawdev_conf.dev_private = (rte_rawdev_obj_t)(&ntb_conf);
-	ret = rte_rawdev_configure(dev_id, &ntb_rawdev_conf);
+	ret = rte_rawdev_configure(dev_id, &ntb_rawdev_conf, sizeof(ntb_conf));
 	if (ret)
 		rte_exit(EXIT_FAILURE, "Can't config ntb dev: err=%d, "
 			"port=%u\n", ret, dev_id);
diff --git a/lib/librte_rawdev/rte_rawdev.c b/lib/librte_rawdev/rte_rawdev.c
index bde33763e..6c4d783cc 100644
--- a/lib/librte_rawdev/rte_rawdev.c
+++ b/lib/librte_rawdev/rte_rawdev.c
@@ -104,7 +104,8 @@ rte_rawdev_info_get(uint16_t dev_id, struct rte_rawdev_info *dev_info,
 }
 
 int
-rte_rawdev_configure(uint16_t dev_id, struct rte_rawdev_info *dev_conf)
+rte_rawdev_configure(uint16_t dev_id, struct rte_rawdev_info *dev_conf,
+		size_t dev_private_size)
 {
 	struct rte_rawdev *dev;
 	int diag;
@@ -123,7 +124,8 @@ rte_rawdev_configure(uint16_t dev_id, struct rte_rawdev_info *dev_conf)
 	}
 
 	/* Configure the device */
-	diag = (*dev->dev_ops->dev_configure)(dev, dev_conf->dev_private);
+	diag = (*dev->dev_ops->dev_configure)(dev, dev_conf->dev_private,
+			dev_private_size);
 	if (diag != 0)
 		RTE_RDEV_ERR("dev%d dev_configure = %d", dev_id, diag);
 	else
diff --git a/lib/librte_rawdev/rte_rawdev.h b/lib/librte_rawdev/rte_rawdev.h
index cf6acfd26..73e3bd5ae 100644
--- a/lib/librte_rawdev/rte_rawdev.h
+++ b/lib/librte_rawdev/rte_rawdev.h
@@ -116,13 +116,19 @@ rte_rawdev_info_get(uint16_t dev_id, struct rte_rawdev_info *dev_info,
  *   driver/implementation can use to configure the device. It is also assumed
  *   that once the configuration is done, a `queue_id` type field can be used
  *   to refer to some arbitrary internal representation of a queue.
+ * @param dev_private_size
+ *   The length of the memory space pointed to by dev_private in dev_info.
+ *   This should be set to the size of the expected private structure to be
+ *   used by the driver, and may be checked by drivers to ensure the expected
+ *   struct type is provided.
  *
  * @return
  *   - 0: Success, device configured.
  *   - <0: Error code returned by the driver configuration function.
  */
 int
-rte_rawdev_configure(uint16_t dev_id, struct rte_rawdev_info *dev_conf);
+rte_rawdev_configure(uint16_t dev_id, struct rte_rawdev_info *dev_conf,
+		size_t dev_private_size);
 
 
 /**
diff --git a/lib/librte_rawdev/rte_rawdev_pmd.h b/lib/librte_rawdev/rte_rawdev_pmd.h
index 89e46412a..050f8b029 100644
--- a/lib/librte_rawdev/rte_rawdev_pmd.h
+++ b/lib/librte_rawdev/rte_rawdev_pmd.h
@@ -160,7 +160,8 @@ typedef int (*rawdev_info_get_t)(struct rte_rawdev *dev,
  *   Returns 0 on success
  */
 typedef int (*rawdev_configure_t)(const struct rte_rawdev *dev,
-				  rte_rawdev_obj_t config);
+				  rte_rawdev_obj_t config,
+				  size_t config_size);
 
 /**
  * Start a configured device.
-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH 20.11 1/5] rawdev: add private data length parameter to info fn
  2020-07-09 15:20  4% [dpdk-dev] [PATCH 20.11 0/5] Enhance rawdev APIs Bruce Richardson
@ 2020-07-09 15:20  3% ` Bruce Richardson
  2020-07-09 15:20  3% ` [dpdk-dev] [PATCH 20.11 3/5] rawdev: add private data length parameter to config fn Bruce Richardson
  2020-07-09 15:20  3% ` [dpdk-dev] [PATCH 20.11 4/5] rawdev: add private data length parameter to queue fns Bruce Richardson
  2 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2020-07-09 15:20 UTC (permalink / raw)
  To: Nipun Gupta, Hemant Agrawal
  Cc: dev, Rosen Xu, Tianfei zhang, Xiaoyun Li, Jingjing Wu, Satha Rao,
	Mahipal Challa, Jerin Jacob, Bruce Richardson

Currently with the rawdev API there is no way to check that the structure
passed in via the dev_private pointer in the dev_info structure is of the
correct type - it is only checked that the pointer is non-NULL. Adding the
length of the expected structure provides a measure of type checking, and it
can also be used for ABI compatibility in the future, since ABI changes
involving structs almost always involve a change in size.
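
As a quick illustration (not part of the patch; "dev_id" and the use of the
ioat private config struct are just assumptions for the example), a caller
now passes the size of the private struct it hands in, which the driver can
verify:

    struct rte_ioat_rawdev_config ioat_cfg;
    struct rte_rawdev_info info = { .dev_private = &ioat_cfg };

    /* The third argument lets the driver check that dev_private really
     * points to a struct rte_ioat_rawdev_config of the expected size.
     */
    if (rte_rawdev_info_get(dev_id, &info, sizeof(ioat_cfg)) < 0)
        return -1; /* size/type mismatch or other error */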

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/bus/ifpga/ifpga_bus.c               |  2 +-
 drivers/raw/ifpga/ifpga_rawdev.c            |  5 +++--
 drivers/raw/ioat/ioat_rawdev.c              |  5 +++--
 drivers/raw/ioat/ioat_rawdev_test.c         |  4 ++--
 drivers/raw/ntb/ntb.c                       |  8 +++++++-
 drivers/raw/skeleton/skeleton_rawdev.c      |  5 +++--
 drivers/raw/skeleton/skeleton_rawdev_test.c | 19 ++++++++++++-------
 examples/ioat/ioatfwd.c                     |  2 +-
 examples/ntb/ntb_fwd.c                      |  2 +-
 lib/librte_rawdev/rte_rawdev.c              |  6 ++++--
 lib/librte_rawdev/rte_rawdev.h              |  9 ++++++++-
 lib/librte_rawdev/rte_rawdev_pmd.h          |  5 ++++-
 12 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/drivers/bus/ifpga/ifpga_bus.c b/drivers/bus/ifpga/ifpga_bus.c
index 6b16a20bb..bb8b3dcfb 100644
--- a/drivers/bus/ifpga/ifpga_bus.c
+++ b/drivers/bus/ifpga/ifpga_bus.c
@@ -162,7 +162,7 @@ ifpga_scan_one(struct rte_rawdev *rawdev,
 	afu_dev->id.port      = afu_pr_conf.afu_id.port;
 
 	if (rawdev->dev_ops && rawdev->dev_ops->dev_info_get)
-		rawdev->dev_ops->dev_info_get(rawdev, afu_dev);
+		rawdev->dev_ops->dev_info_get(rawdev, afu_dev, sizeof(*afu_dev));
 
 	if (rawdev->dev_ops &&
 		rawdev->dev_ops->dev_start &&
diff --git a/drivers/raw/ifpga/ifpga_rawdev.c b/drivers/raw/ifpga/ifpga_rawdev.c
index cc25c662b..47cfa3877 100644
--- a/drivers/raw/ifpga/ifpga_rawdev.c
+++ b/drivers/raw/ifpga/ifpga_rawdev.c
@@ -605,7 +605,8 @@ ifpga_fill_afu_dev(struct opae_accelerator *acc,
 
 static void
 ifpga_rawdev_info_get(struct rte_rawdev *dev,
-				     rte_rawdev_obj_t dev_info)
+		      rte_rawdev_obj_t dev_info,
+		      size_t dev_info_size)
 {
 	struct opae_adapter *adapter;
 	struct opae_accelerator *acc;
@@ -617,7 +618,7 @@ ifpga_rawdev_info_get(struct rte_rawdev *dev,
 
 	IFPGA_RAWDEV_PMD_FUNC_TRACE();
 
-	if (!dev_info) {
+	if (!dev_info || dev_info_size != sizeof(*afu_dev)) {
 		IFPGA_RAWDEV_PMD_ERR("Invalid request");
 		return;
 	}
diff --git a/drivers/raw/ioat/ioat_rawdev.c b/drivers/raw/ioat/ioat_rawdev.c
index f876ffc3f..8dd856c55 100644
--- a/drivers/raw/ioat/ioat_rawdev.c
+++ b/drivers/raw/ioat/ioat_rawdev.c
@@ -113,12 +113,13 @@ ioat_dev_stop(struct rte_rawdev *dev)
 }
 
 static void
-ioat_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info)
+ioat_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info,
+		size_t dev_info_size)
 {
 	struct rte_ioat_rawdev_config *cfg = dev_info;
 	struct rte_ioat_rawdev *ioat = dev->dev_private;
 
-	if (cfg != NULL)
+	if (cfg != NULL && dev_info_size == sizeof(*cfg))
 		cfg->ring_size = ioat->ring_size;
 }
 
diff --git a/drivers/raw/ioat/ioat_rawdev_test.c b/drivers/raw/ioat/ioat_rawdev_test.c
index d99f1bd6b..90f5974cd 100644
--- a/drivers/raw/ioat/ioat_rawdev_test.c
+++ b/drivers/raw/ioat/ioat_rawdev_test.c
@@ -157,7 +157,7 @@ ioat_rawdev_test(uint16_t dev_id)
 		return TEST_SKIPPED;
 	}
 
-	rte_rawdev_info_get(dev_id, &info);
+	rte_rawdev_info_get(dev_id, &info, sizeof(p));
 	if (p.ring_size != expected_ring_size[dev_id]) {
 		printf("Error, initial ring size is not as expected (Actual: %d, Expected: %d)\n",
 				(int)p.ring_size, expected_ring_size[dev_id]);
@@ -169,7 +169,7 @@ ioat_rawdev_test(uint16_t dev_id)
 		printf("Error with rte_rawdev_configure()\n");
 		return -1;
 	}
-	rte_rawdev_info_get(dev_id, &info);
+	rte_rawdev_info_get(dev_id, &info, sizeof(p));
 	if (p.ring_size != IOAT_TEST_RINGSIZE) {
 		printf("Error, ring size is not %d (%d)\n",
 				IOAT_TEST_RINGSIZE, (int)p.ring_size);
diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index e40412bb7..4676c6f8f 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -801,11 +801,17 @@ ntb_dequeue_bufs(struct rte_rawdev *dev,
 }
 
 static void
-ntb_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info)
+ntb_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info,
+		size_t dev_info_size)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	struct ntb_dev_info *info = dev_info;
 
+	if (dev_info_size != sizeof(*info)){
+		NTB_LOG(ERR, "Invalid size parameter to %s", __func__);
+		return;
+	}
+
 	info->mw_cnt = hw->mw_cnt;
 	info->mw_size = hw->mw_size;
 
diff --git a/drivers/raw/skeleton/skeleton_rawdev.c b/drivers/raw/skeleton/skeleton_rawdev.c
index 72ece887a..dc05f3ecf 100644
--- a/drivers/raw/skeleton/skeleton_rawdev.c
+++ b/drivers/raw/skeleton/skeleton_rawdev.c
@@ -42,14 +42,15 @@ static struct queue_buffers queue_buf[SKELETON_MAX_QUEUES] = {};
 static void clear_queue_bufs(int queue_id);
 
 static void skeleton_rawdev_info_get(struct rte_rawdev *dev,
-				     rte_rawdev_obj_t dev_info)
+				     rte_rawdev_obj_t dev_info,
+				     size_t dev_info_size)
 {
 	struct skeleton_rawdev *skeldev;
 	struct skeleton_rawdev_conf *skeldev_conf;
 
 	SKELETON_PMD_FUNC_TRACE();
 
-	if (!dev_info) {
+	if (!dev_info || dev_info_size != sizeof(*skeldev_conf)) {
 		SKELETON_PMD_ERR("Invalid request");
 		return;
 	}
diff --git a/drivers/raw/skeleton/skeleton_rawdev_test.c b/drivers/raw/skeleton/skeleton_rawdev_test.c
index 9ecfdee81..9b8390dfb 100644
--- a/drivers/raw/skeleton/skeleton_rawdev_test.c
+++ b/drivers/raw/skeleton/skeleton_rawdev_test.c
@@ -106,12 +106,12 @@ test_rawdev_info_get(void)
 	struct rte_rawdev_info rdev_info = {0};
 	struct skeleton_rawdev_conf skel_conf = {0};
 
-	ret = rte_rawdev_info_get(test_dev_id, NULL);
+	ret = rte_rawdev_info_get(test_dev_id, NULL, 0);
 	RTE_TEST_ASSERT(ret == -EINVAL, "Expected -EINVAL, %d", ret);
 
 	rdev_info.dev_private = &skel_conf;
 
-	ret = rte_rawdev_info_get(test_dev_id, &rdev_info);
+	ret = rte_rawdev_info_get(test_dev_id, &rdev_info, sizeof(skel_conf));
 	RTE_TEST_ASSERT_SUCCESS(ret, "Failed to get raw dev info");
 
 	return TEST_SUCCESS;
@@ -142,7 +142,8 @@ test_rawdev_configure(void)
 
 	rdev_info.dev_private = &rdev_conf_get;
 	ret = rte_rawdev_info_get(test_dev_id,
-				  (rte_rawdev_obj_t)&rdev_info);
+				  (rte_rawdev_obj_t)&rdev_info,
+				  sizeof(rdev_conf_get));
 	RTE_TEST_ASSERT_SUCCESS(ret,
 				"Failed to obtain rawdev configuration (%d)",
 				ret);
@@ -170,7 +171,8 @@ test_rawdev_queue_default_conf_get(void)
 	/* Get the current configuration */
 	rdev_info.dev_private = &rdev_conf_get;
 	ret = rte_rawdev_info_get(test_dev_id,
-				  (rte_rawdev_obj_t)&rdev_info);
+				  (rte_rawdev_obj_t)&rdev_info,
+				  sizeof(rdev_conf_get));
 	RTE_TEST_ASSERT_SUCCESS(ret, "Failed to obtain rawdev configuration (%d)",
 				ret);
 
@@ -218,7 +220,8 @@ test_rawdev_queue_setup(void)
 	/* Get the current configuration */
 	rdev_info.dev_private = &rdev_conf_get;
 	ret = rte_rawdev_info_get(test_dev_id,
-				  (rte_rawdev_obj_t)&rdev_info);
+				  (rte_rawdev_obj_t)&rdev_info,
+				  sizeof(rdev_conf_get));
 	RTE_TEST_ASSERT_SUCCESS(ret,
 				"Failed to obtain rawdev configuration (%d)",
 				ret);
@@ -327,7 +330,8 @@ test_rawdev_start_stop(void)
 	dummy_firmware = NULL;
 
 	rte_rawdev_start(test_dev_id);
-	ret = rte_rawdev_info_get(test_dev_id, (rte_rawdev_obj_t)&rdev_info);
+	ret = rte_rawdev_info_get(test_dev_id, (rte_rawdev_obj_t)&rdev_info,
+			sizeof(rdev_conf_get));
 	RTE_TEST_ASSERT_SUCCESS(ret,
 				"Failed to obtain rawdev configuration (%d)",
 				ret);
@@ -336,7 +340,8 @@ test_rawdev_start_stop(void)
 			      rdev_conf_get.device_state);
 
 	rte_rawdev_stop(test_dev_id);
-	ret = rte_rawdev_info_get(test_dev_id, (rte_rawdev_obj_t)&rdev_info);
+	ret = rte_rawdev_info_get(test_dev_id, (rte_rawdev_obj_t)&rdev_info,
+			sizeof(rdev_conf_get));
 	RTE_TEST_ASSERT_SUCCESS(ret,
 				"Failed to obtain rawdev configuration (%d)",
 				ret);
diff --git a/examples/ioat/ioatfwd.c b/examples/ioat/ioatfwd.c
index b66ee73bc..5c631da1b 100644
--- a/examples/ioat/ioatfwd.c
+++ b/examples/ioat/ioatfwd.c
@@ -757,7 +757,7 @@ assign_rawdevs(void)
 			do {
 				if (rdev_id == rte_rawdev_count())
 					goto end;
-				rte_rawdev_info_get(rdev_id++, &rdev_info);
+				rte_rawdev_info_get(rdev_id++, &rdev_info, 0);
 			} while (rdev_info.driver_name == NULL ||
 					strcmp(rdev_info.driver_name,
 						IOAT_PMD_RAWDEV_NAME_STR) != 0);
diff --git a/examples/ntb/ntb_fwd.c b/examples/ntb/ntb_fwd.c
index eba8ebf9f..11e224451 100644
--- a/examples/ntb/ntb_fwd.c
+++ b/examples/ntb/ntb_fwd.c
@@ -1389,7 +1389,7 @@ main(int argc, char **argv)
 	rte_rawdev_set_attr(dev_id, NTB_QUEUE_NUM_NAME, num_queues);
 	printf("Set queue number as %u.\n", num_queues);
 	ntb_rawdev_info.dev_private = (rte_rawdev_obj_t)(&ntb_info);
-	rte_rawdev_info_get(dev_id, &ntb_rawdev_info);
+	rte_rawdev_info_get(dev_id, &ntb_rawdev_info, sizeof(ntb_info));
 
 	nb_mbuf = nb_desc * num_queues * 2 * 2 + rte_lcore_count() *
 		  MEMPOOL_CACHE_SIZE;
diff --git a/lib/librte_rawdev/rte_rawdev.c b/lib/librte_rawdev/rte_rawdev.c
index 8f84d0b22..a57689035 100644
--- a/lib/librte_rawdev/rte_rawdev.c
+++ b/lib/librte_rawdev/rte_rawdev.c
@@ -78,7 +78,8 @@ rte_rawdev_socket_id(uint16_t dev_id)
 }
 
 int
-rte_rawdev_info_get(uint16_t dev_id, struct rte_rawdev_info *dev_info)
+rte_rawdev_info_get(uint16_t dev_id, struct rte_rawdev_info *dev_info,
+		size_t dev_private_size)
 {
 	struct rte_rawdev *rawdev;
 
@@ -89,7 +90,8 @@ rte_rawdev_info_get(uint16_t dev_id, struct rte_rawdev_info *dev_info)
 
 	if (dev_info->dev_private != NULL) {
 		RTE_FUNC_PTR_OR_ERR_RET(*rawdev->dev_ops->dev_info_get, -ENOTSUP);
-		(*rawdev->dev_ops->dev_info_get)(rawdev, dev_info->dev_private);
+		(*rawdev->dev_ops->dev_info_get)(rawdev, dev_info->dev_private,
+				dev_private_size);
 	}
 
 	dev_info->driver_name = rawdev->driver_name;
diff --git a/lib/librte_rawdev/rte_rawdev.h b/lib/librte_rawdev/rte_rawdev.h
index 32f6b8bb0..cf6acfd26 100644
--- a/lib/librte_rawdev/rte_rawdev.h
+++ b/lib/librte_rawdev/rte_rawdev.h
@@ -82,13 +82,20 @@ struct rte_rawdev_info;
  *   will be returned. This can be used to safely query the type of a rawdev
  *   instance without needing to know the size of the private data to return.
  *
+ * @param dev_private_size
+ *   The length of the memory space pointed to by dev_private in dev_info.
+ *   This should be set to the size of the expected private structure to be
+ *   returned, and may be checked by drivers to ensure the expected struct
+ *   type is provided.
+ *
  * @return
  *   - 0: Success, driver updates the contextual information of the raw device
  *   - <0: Error code returned by the driver info get function.
  *
  */
 int
-rte_rawdev_info_get(uint16_t dev_id, struct rte_rawdev_info *dev_info);
+rte_rawdev_info_get(uint16_t dev_id, struct rte_rawdev_info *dev_info,
+		size_t dev_private_size);
 
 /**
  * Configure a raw device.
diff --git a/lib/librte_rawdev/rte_rawdev_pmd.h b/lib/librte_rawdev/rte_rawdev_pmd.h
index 4395a2182..0e72a9205 100644
--- a/lib/librte_rawdev/rte_rawdev_pmd.h
+++ b/lib/librte_rawdev/rte_rawdev_pmd.h
@@ -138,12 +138,15 @@ rte_rawdev_pmd_is_valid_dev(uint8_t dev_id)
  *   Raw device pointer
  * @param dev_info
  *   Raw device information structure
+ * @param dev_private_size
+ *   The size of the structure pointed to by dev_info->dev_private
  *
  * @return
  *   Returns 0 on success
  */
 typedef void (*rawdev_info_get_t)(struct rte_rawdev *dev,
-				  rte_rawdev_obj_t dev_info);
+				  rte_rawdev_obj_t dev_info,
+				  size_t dev_private_size);
 
 /**
  * Configure a device.
-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH 20.11 0/5] Enhance rawdev APIs
@ 2020-07-09 15:20  4% Bruce Richardson
  2020-07-09 15:20  3% ` [dpdk-dev] [PATCH 20.11 1/5] rawdev: add private data length parameter to info fn Bruce Richardson
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Bruce Richardson @ 2020-07-09 15:20 UTC (permalink / raw)
  To: Nipun Gupta, Hemant Agrawal
  Cc: dev, Rosen Xu, Tianfei zhang, Xiaoyun Li, Jingjing Wu, Satha Rao,
	Mahipal Challa, Jerin Jacob, Bruce Richardson

This patchset proposes some internal and externally-visible changes to the
rawdev API. If consensus is in favour, I will submit a deprecation notice
for the changes for the 20.08 release, so that these ABI/API-breaking
changes can be merged in 20.11

The changes are in two areas:
* For any APIs which take a void * parameter for driver-specific structs,
  add an additional parameter to provide the struct length. This allows
  some runtime type-checking, as well as possible ABI-compatibility support
  in the future, as structure changes generally involve a change in the size
  of the structure.
* Ensure all APIs which can return error values have int type, rather than
  void. Since functions like info_get and queue_default_get can now do some
  typechecking, they need to be modified to allow them to return error
  codes on failure (a short usage sketch follows after this list).
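
A rough sketch of what this means for a driver callback (illustrative only,
not taken from any of the patches; the "mydrv" names are made up): the
callback receives the size of the private struct and reports a mismatch
through its return value.

    static int
    mydrv_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info,
                   size_t dev_info_size)
    {
        struct mydrv_config *cfg = dev_info;

        if (cfg == NULL || dev_info_size != sizeof(*cfg))
            return -EINVAL; /* wrong private struct passed by the caller */

        /* ... fill *cfg from dev->dev_private ... */
        return 0;
    }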

Bruce Richardson (5):
  rawdev: add private data length parameter to info fn
  rawdev: allow drivers to return error from info function
  rawdev: add private data length parameter to config fn
  rawdev: add private data length parameter to queue fns
  rawdev: allow queue default config query to return error

 drivers/bus/ifpga/ifpga_bus.c               |  2 +-
 drivers/raw/ifpga/ifpga_rawdev.c            | 23 +++++-----
 drivers/raw/ioat/ioat_rawdev.c              | 17 ++++---
 drivers/raw/ioat/ioat_rawdev_test.c         |  6 +--
 drivers/raw/ntb/ntb.c                       | 49 ++++++++++++++++-----
 drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c |  7 +--
 drivers/raw/octeontx2_dma/otx2_dpi_test.c   |  3 +-
 drivers/raw/octeontx2_ep/otx2_ep_rawdev.c   |  7 +--
 drivers/raw/octeontx2_ep/otx2_ep_test.c     |  2 +-
 drivers/raw/skeleton/skeleton_rawdev.c      | 34 ++++++++------
 drivers/raw/skeleton/skeleton_rawdev_test.c | 32 ++++++++------
 examples/ioat/ioatfwd.c                     |  4 +-
 examples/ntb/ntb_fwd.c                      |  7 +--
 lib/librte_rawdev/rte_rawdev.c              | 27 +++++++-----
 lib/librte_rawdev/rte_rawdev.h              | 27 ++++++++++--
 lib/librte_rawdev/rte_rawdev_pmd.h          | 22 ++++++---
 16 files changed, 178 insertions(+), 91 deletions(-)

-- 
2.25.1


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v3] mbuf: use C11 atomic built-ins for refcnt operations
  2020-07-09 13:31  0%         ` Honnappa Nagarahalli
@ 2020-07-09 14:10  0%           ` Phil Yang
  0 siblings, 0 replies; 200+ results
From: Phil Yang @ 2020-07-09 14:10 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Olivier Matz
  Cc: dev, stephen, david.marchand, drc, Ruifeng Wang, nd, nd

 <snip>

> 
> > >
> > > Hi Phil,
> > >
> > > On Thu, Jul 09, 2020 at 06:10:42PM +0800, Phil Yang wrote:
> > > > Use C11 atomic built-ins with explicit ordering instead of
> > > > rte_atomic ops which enforce unnecessary barriers on aarch64.
> > > >
> > > > Signed-off-by: Phil Yang <phil.yang@arm.com>
> > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > ---
> > > > v3:
> > > > 1.Fix ABI breakage.
> > > > 2.Simplify data type cast.
> > > >
> > > > v2:
> > > > Fix ABI issue: revert the rte_mbuf_ext_shared_info struct refcnt
> > > > field to refcnt_atomic.
> > > >
> > > >  lib/librte_mbuf/rte_mbuf.c      |  1 -
> > > >  lib/librte_mbuf/rte_mbuf.h      | 19 ++++++++++---------
> > > >  lib/librte_mbuf/rte_mbuf_core.h |  2 +-
> > > >  3 files changed, 11 insertions(+), 11 deletions(-)
> > > >
> > <snip>
> > > >
> > > >  /* Reinitialize counter before mbuf freeing. */ diff --git
> > > > a/lib/librte_mbuf/rte_mbuf_core.h
> > > b/lib/librte_mbuf/rte_mbuf_core.h
> > > > index 16600f1..d65d1c8 100644
> > > > --- a/lib/librte_mbuf/rte_mbuf_core.h
> > > > +++ b/lib/librte_mbuf/rte_mbuf_core.h
> > > > @@ -679,7 +679,7 @@ typedef void
> > > (*rte_mbuf_extbuf_free_callback_t)(void *addr, void *opaque);
> > > >  struct rte_mbuf_ext_shared_info {
> > > >  rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback
> > > function */
> > > >  void *fcb_opaque;                        /**< Free callback argument */
> > > > -rte_atomic16_t refcnt_atomic;        /**< Atomically accessed refcnt */
> > > > +uint16_t refcnt_atomic;              /**< Atomically accessed refcnt */
> > > >  };
> > >
> > > To avoid an API breakage (i.e. currently, an application that accesses
> > > to refcnt_atomic expects that its type is rte_atomic16_t), I suggest
> > > to do the same than in the mbuf struct:
> > >
> > > union {
> > > rte_atomic16_t refcnt_atomic;
> > > uint16_t refcnt;
> > > };
> > >
> > > I hope the ABI checker won't complain.
> > >
> > > It will also be better for 20.11 when the deprecated fields will be
> > > renamed: the remaining one will be called 'refcnt' in both mbuf and
> > > mbuf_ext_shared_info.
> Does this need a deprecation notice in 20.08?

Yes. We'd better do that.
I will add a notice for it in this patch. Thanks.

> 
> >
> > Got it. I agree with you.
> > It should work. In my local test machine, the ABI checker happy with this
> > approach.
> > Once the test is done, I will upstream the new patch.
> >
> > Appreciate your comments.
> >
> > Thanks,
> > Phil


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3] mbuf: use C11 atomic built-ins for refcnt operations
  2020-07-09 13:00  3%       ` Phil Yang
@ 2020-07-09 13:31  0%         ` Honnappa Nagarahalli
  2020-07-09 14:10  0%           ` Phil Yang
  0 siblings, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2020-07-09 13:31 UTC (permalink / raw)
  To: Phil Yang, Olivier Matz
  Cc: dev, stephen, david.marchand, drc, Ruifeng Wang, nd,
	Honnappa Nagarahalli, nd

<snip>

> >
> > Hi Phil,
> >
> > On Thu, Jul 09, 2020 at 06:10:42PM +0800, Phil Yang wrote:
> > > Use C11 atomic built-ins with explicit ordering instead of
> > > rte_atomic ops which enforce unnecessary barriers on aarch64.
> > >
> > > Signed-off-by: Phil Yang <phil.yang@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > ---
> > > v3:
> > > 1.Fix ABI breakage.
> > > 2.Simplify data type cast.
> > >
> > > v2:
> > > Fix ABI issue: revert the rte_mbuf_ext_shared_info struct refcnt
> > > field to refcnt_atomic.
> > >
> > >  lib/librte_mbuf/rte_mbuf.c      |  1 -
> > >  lib/librte_mbuf/rte_mbuf.h      | 19 ++++++++++---------
> > >  lib/librte_mbuf/rte_mbuf_core.h |  2 +-
> > >  3 files changed, 11 insertions(+), 11 deletions(-)
> > >
> <snip>
> > >
> > >  /* Reinitialize counter before mbuf freeing. */ diff --git
> > > a/lib/librte_mbuf/rte_mbuf_core.h
> > b/lib/librte_mbuf/rte_mbuf_core.h
> > > index 16600f1..d65d1c8 100644
> > > --- a/lib/librte_mbuf/rte_mbuf_core.h
> > > +++ b/lib/librte_mbuf/rte_mbuf_core.h
> > > @@ -679,7 +679,7 @@ typedef void
> > (*rte_mbuf_extbuf_free_callback_t)(void *addr, void *opaque);
> > >  struct rte_mbuf_ext_shared_info {
> > >  rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback
> > function */
> > >  void *fcb_opaque;                        /**< Free callback argument */
> > > -rte_atomic16_t refcnt_atomic;        /**< Atomically accessed refcnt */
> > > +uint16_t refcnt_atomic;              /**< Atomically accessed refcnt */
> > >  };
> >
> > To avoid an API breakage (i.e. currently, an application that accesses
> > to refcnt_atomic expects that its type is rte_atomic16_t), I suggest
> > to do the same than in the mbuf struct:
> >
> > union {
> > rte_atomic16_t refcnt_atomic;
> > uint16_t refcnt;
> > };
> >
> > I hope the ABI checker won't complain.
> >
> > It will also be better for 20.11 when the deprecated fields will be
> > renamed: the remaining one will be called 'refcnt' in both mbuf and
> > mbuf_ext_shared_info.
Does this need a deprecation notice in 20.08?

> 
> Got it. I agree with you.
> It should work. In my local test machine, the ABI checker happy with this
> approach.
> Once the test is done, I will upstream the new patch.
> 
> Appreciate your comments.
> 
> Thanks,
> Phil


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3] mbuf: use C11 atomic built-ins for refcnt operations
  2020-07-09 11:03  3%     ` Olivier Matz
@ 2020-07-09 13:00  3%       ` Phil Yang
  2020-07-09 13:31  0%         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 200+ results
From: Phil Yang @ 2020-07-09 13:00 UTC (permalink / raw)
  To: Olivier Matz
  Cc: dev, stephen, david.marchand, drc, Honnappa Nagarahalli,
	Ruifeng Wang, nd

Hi Oliver,

> -----Original Message-----
> From: Olivier Matz <olivier.matz@6wind.com>
> Sent: Thursday, July 9, 2020 7:04 PM
> To: Phil Yang <Phil.Yang@arm.com>
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> david.marchand@redhat.com; drc@linux.vnet.ibm.com; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH v3] mbuf: use C11 atomic built-ins for refcnt operations
> 
> Hi Phil,
> 
> On Thu, Jul 09, 2020 at 06:10:42PM +0800, Phil Yang wrote:
> > Use C11 atomic built-ins with explicit ordering instead of rte_atomic
> > ops which enforce unnecessary barriers on aarch64.
> >
> > Signed-off-by: Phil Yang <phil.yang@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> > v3:
> > 1.Fix ABI breakage.
> > 2.Simplify data type cast.
> >
> > v2:
> > Fix ABI issue: revert the rte_mbuf_ext_shared_info struct refcnt field
> > to refcnt_atomic.
> >
> >  lib/librte_mbuf/rte_mbuf.c      |  1 -
> >  lib/librte_mbuf/rte_mbuf.h      | 19 ++++++++++---------
> >  lib/librte_mbuf/rte_mbuf_core.h |  2 +-
> >  3 files changed, 11 insertions(+), 11 deletions(-)
> >
<snip>
> >
> >  	/* Reinitialize counter before mbuf freeing. */
> > diff --git a/lib/librte_mbuf/rte_mbuf_core.h
> b/lib/librte_mbuf/rte_mbuf_core.h
> > index 16600f1..d65d1c8 100644
> > --- a/lib/librte_mbuf/rte_mbuf_core.h
> > +++ b/lib/librte_mbuf/rte_mbuf_core.h
> > @@ -679,7 +679,7 @@ typedef void
> (*rte_mbuf_extbuf_free_callback_t)(void *addr, void *opaque);
> >  struct rte_mbuf_ext_shared_info {
> >  	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback
> function */
> >  	void *fcb_opaque;                        /**< Free callback argument */
> > -	rte_atomic16_t refcnt_atomic;        /**< Atomically accessed refcnt */
> > +	uint16_t refcnt_atomic;              /**< Atomically accessed refcnt */
> >  };
> 
> To avoid an API breakage (i.e. currently, an application that accesses
> to refcnt_atomic expects that its type is rte_atomic16_t), I suggest to
> do the same than in the mbuf struct:
> 
> 	union {
> 		rte_atomic16_t refcnt_atomic;
> 		uint16_t refcnt;
> 	};
> 
> I hope the ABI checker won't complain.
> 
> It will also be better for 20.11 when the deprecated fields will be
> renamed: the remaining one will be called 'refcnt' in both mbuf and
> mbuf_ext_shared_info.

Got it. I agree with you.
It should work. On my local test machine, the ABI checker is happy with this approach.
Once the test is done, I will upstream the new patch.

Appreciate your comments.

Thanks,
Phil

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v6 1/2] mbuf: introduce accurate packet Tx scheduling
                     ` (5 preceding siblings ...)
  2020-07-08 15:47  2% ` [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
@ 2020-07-09 12:36  2% ` Viacheslav Ovsiienko
  2020-07-09 23:47  0%   ` Ferruh Yigit
  6 siblings, 1 reply; 200+ results
From: Viacheslav Ovsiienko @ 2020-07-09 12:36 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

Some networks require precise traffic timing management. The ability to
send (and, generally speaking, receive) packets at a precisely specified
moment of time provides the opportunity to support connections with Time
Division Multiplexing using a contemporary general purpose NIC without
involving auxiliary hardware. For example, support for the O-RAN Fronthaul
interface is one of the promising features where precise time management
of egress packets can be applied.

The main objective of this RFC is to specify the way applications can
provide the moment of time at which packet transmission must be started,
and to give a preliminary description of how this feature is supported
from the mlx5 PMD side.

A new dynamic timestamp field is proposed. It provides some timing
information; the units and time references (initial phase) are not
explicitly defined but are always kept the same for a given port. Some
devices allow querying rte_eth_read_clock(), which returns the current
device timestamp. The dynamic timestamp flag tells whether the field
contains an actual timestamp value. For the packets being sent, this
value can be used by the PMD to schedule packet sending.

The device clock is an opaque entity; the units and frequency are
vendor specific and might depend on hardware capabilities and
configuration. It might (or might not) be synchronized with real time
via PTP, and might (or might not) be synchronous with the CPU clock
(for example, if the NIC and CPU share the same clock source there
might be no drift between the NIC and CPU clocks), etc.

After the PKT_RX_TIMESTAMP flag and the fixed timestamp field are
deprecated and obsoleted, this dynamic flag and field will be used to
manage the timestamps on the receive datapath as well. Having dedicated
flags for Rx/Tx timestamps allows applications not to perform an explicit
flag reset on forwarding and not to promote received timestamps to the
transmit datapath by default. The static PKT_RX_TIMESTAMP is considered
a candidate to become the dynamic flag.

When the PMD sees "rte_dynfield_timestamp" set on a packet being sent,
it tries to synchronize the time the packet appears on the wire with the
specified packet timestamp. If the specified timestamp is in the past it
should be ignored; if it is in the distant future it should be capped to
some reasonable value (in the range of seconds). These specific cases
("too late" and "distant future") can optionally be reported via device
xstats to help applications detect time-related problems.

No packet reordering according to timestamps is assumed, neither within
a packet burst, nor between packets; it is entirely the application's
responsibility to generate packets and their timestamps in the desired
order. The timestamp can be put only in the first packet of the burst,
providing scheduling for the entire burst.

The PMD reports the ability to synchronize packet sending on a timestamp
with the new offload flag DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP.

This is a palliative solution and is going to be replaced with a new
eth_dev API for reporting/managing the supported dynamic flags and their
related features. That API would break ABI compatibility and can't be
introduced at the moment, so it is postponed to 20.11.

For testing purposes it is proposed to update the testpmd "txonly"
forwarding mode routine. With this update, the testpmd application
generates packets and sets the dynamic timestamps according to the
specified time pattern if it sees that "rte_dynfield_timestamp" is
registered.

A new testpmd command is proposed to configure the sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst,
              specified in device clock units. The number of packets
              in the burst is defined by the txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As a result, the bursts of packets will be transmitted with specific
delays between the packets within the burst and a specific delay between
the bursts. rte_eth_read_clock() is supposed to be used to get the
current device clock value and provide the reference for the timestamps.
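
For reference, a minimal application-side sketch (illustration only, not
part of this patch; error handling is omitted, and "port_id", "txq",
"pkts", "nb_pkts" and "delay" are assumed to exist) of how the dynamic
field and flag could be used to schedule a burst:

    /* Look up the dynamic timestamp field/flag registered by the PMD. */
    int ts_off = rte_mbuf_dynfield_lookup(
            RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL);
    int ts_flag = rte_mbuf_dynflag_lookup(
            RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME, NULL);
    uint64_t now;

    /* Use the device clock as the time reference. */
    rte_eth_read_clock(port_id, &now);

    /* Timestamp only the first packet - it schedules the whole burst. */
    *RTE_MBUF_DYNFIELD(pkts[0], ts_off, uint64_t *) = now + delay;
    pkts[0]->ol_flags |= 1ULL << ts_flag;

    rte_eth_tx_burst(port_id, txq, pkts, nb_pkts);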

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

---
  v1->v4:
     - dedicated dynamic Tx timestamp flag instead of shared with Rx
  v4->v5:
     - elaborated commit message
     - more words about device clocks added,
     - note about dedicated Rx/Tx timestamp flags added
  v5->v6:
     - release notes are updated
---
 doc/guides/rel_notes/release_20_08.rst |  6 ++++++
 lib/librte_ethdev/rte_ethdev.c         |  1 +
 lib/librte_ethdev/rte_ethdev.h         |  4 ++++
 lib/librte_mbuf/rte_mbuf_dyn.h         | 31 +++++++++++++++++++++++++++++++
 4 files changed, 42 insertions(+)

diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
index 988474c..5527bab 100644
--- a/doc/guides/rel_notes/release_20_08.rst
+++ b/doc/guides/rel_notes/release_20_08.rst
@@ -200,6 +200,12 @@ New Features
   See the :doc:`../sample_app_ug/l2_forward_real_virtual` for more
   details of this parameter usage.
 
+* **Introduced send packet scheduling based on timestamps.**
+
+  Added a new mbuf dynamic field and flag providing a timestamp with which
+  packet transmission can be synchronized. A device Tx offload flag is added
+  to indicate that the PMD supports send scheduling.
+
 
 Removed Items
 -------------
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 7022bd7..c48ca2a 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -160,6 +160,7 @@ struct rte_eth_xstats_name_off {
 	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
+	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
 };
 
 #undef RTE_TX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 631b146..97313a0 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1178,6 +1178,10 @@ struct rte_eth_conf {
 /** Device supports outer UDP checksum */
 #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
 
+/** Device supports send on timestamp */
+#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
+
+
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /**< Device supports Rx queue setup after device started*/
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
index 96c3631..8407230 100644
--- a/lib/librte_mbuf/rte_mbuf_dyn.h
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
 #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
 #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
 
+/**
+ * The timestamp dynamic field provides some timing information; the
+ * units and time references (initial phase) are not explicitly defined
+ * but are always kept the same for a given port. Some devices allow
+ * querying rte_eth_read_clock(), which returns the current device
+ * timestamp. The dynamic Tx timestamp flag tells whether the field contains
+ * an actual timestamp value for the packets being sent; this value can be
+ * used by the PMD to schedule packet sending.
+ *
+ * After the PKT_RX_TIMESTAMP flag and the fixed timestamp field are
+ * deprecated and obsoleted, a dedicated Rx timestamp flag is supposed to
+ * be introduced and the shared dynamic timestamp field will be used
+ * to handle timestamps on the receive datapath as well.
+ */
+#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
+
+/**
+ * When the PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on a
+ * packet being sent, it tries to synchronize the time the packet appears
+ * on the wire with the specified packet timestamp. If the specified one
+ * is in the past it should be ignored; if it is in the distant future
+ * it should be capped to some reasonable value (in the range of seconds).
+ *
+ * No packet reordering according to timestamps is assumed, neither for
+ * packets within a burst, nor for whole bursts; it is entirely the
+ * application's responsibility to generate packets and their timestamps
+ * in the desired order. The timestamp might be put only in the first
+ * packet of the burst, providing scheduling for the entire burst.
+ */
+#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
+
 #endif
-- 
1.8.3.1


^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v3] mbuf: use C11 atomic built-ins for refcnt operations
  2020-07-09 10:10  4%   ` [dpdk-dev] [PATCH v3] mbuf: use C11 atomic built-ins " Phil Yang
@ 2020-07-09 11:03  3%     ` Olivier Matz
  2020-07-09 13:00  3%       ` Phil Yang
  2020-07-09 15:58  4%     ` [dpdk-dev] [PATCH v4 1/2] " Phil Yang
  1 sibling, 1 reply; 200+ results
From: Olivier Matz @ 2020-07-09 11:03 UTC (permalink / raw)
  To: Phil Yang
  Cc: dev, stephen, david.marchand, drc, Honnappa.Nagarahalli,
	Ruifeng.Wang, nd

Hi Phil,

On Thu, Jul 09, 2020 at 06:10:42PM +0800, Phil Yang wrote:
> Use C11 atomic built-ins with explicit ordering instead of rte_atomic
> ops which enforce unnecessary barriers on aarch64.
> 
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
> v3:
> 1.Fix ABI breakage.
> 2.Simplify data type cast.
> 
> v2:
> Fix ABI issue: revert the rte_mbuf_ext_shared_info struct refcnt field
> to refcnt_atomic.
> 
>  lib/librte_mbuf/rte_mbuf.c      |  1 -
>  lib/librte_mbuf/rte_mbuf.h      | 19 ++++++++++---------
>  lib/librte_mbuf/rte_mbuf_core.h |  2 +-
>  3 files changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index ae91ae2..8a456e5 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -22,7 +22,6 @@
>  #include <rte_eal.h>
>  #include <rte_per_lcore.h>
>  #include <rte_lcore.h>
> -#include <rte_atomic.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_mempool.h>
>  #include <rte_mbuf.h>
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index f8e492e..c1c0956 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -37,7 +37,6 @@
>  #include <rte_config.h>
>  #include <rte_mempool.h>
>  #include <rte_memory.h>
> -#include <rte_atomic.h>
>  #include <rte_prefetch.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_byteorder.h>
> @@ -365,7 +364,7 @@ rte_pktmbuf_priv_flags(struct rte_mempool *mp)
>  static inline uint16_t
>  rte_mbuf_refcnt_read(const struct rte_mbuf *m)
>  {
> -	return (uint16_t)(rte_atomic16_read(&m->refcnt_atomic));
> +	return __atomic_load_n(&m->refcnt, __ATOMIC_RELAXED);
>  }
>  
>  /**
> @@ -378,14 +377,15 @@ rte_mbuf_refcnt_read(const struct rte_mbuf *m)
>  static inline void
>  rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
>  {
> -	rte_atomic16_set(&m->refcnt_atomic, (int16_t)new_value);
> +	__atomic_store_n(&m->refcnt, new_value, __ATOMIC_RELAXED);
>  }
>  
>  /* internal */
>  static inline uint16_t
>  __rte_mbuf_refcnt_update(struct rte_mbuf *m, int16_t value)
>  {
> -	return (uint16_t)(rte_atomic16_add_return(&m->refcnt_atomic, value));
> +	return __atomic_add_fetch(&m->refcnt, (uint16_t)value,
> +				 __ATOMIC_ACQ_REL);
>  }
>  
>  /**
> @@ -466,7 +466,7 @@ rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
>  static inline uint16_t
>  rte_mbuf_ext_refcnt_read(const struct rte_mbuf_ext_shared_info *shinfo)
>  {
> -	return (uint16_t)(rte_atomic16_read(&shinfo->refcnt_atomic));
> +	return __atomic_load_n(&shinfo->refcnt_atomic, __ATOMIC_RELAXED);
>  }
>  
>  /**
> @@ -481,7 +481,7 @@ static inline void
>  rte_mbuf_ext_refcnt_set(struct rte_mbuf_ext_shared_info *shinfo,
>  	uint16_t new_value)
>  {
> -	rte_atomic16_set(&shinfo->refcnt_atomic, (int16_t)new_value);
> +	__atomic_store_n(&shinfo->refcnt_atomic, new_value, __ATOMIC_RELAXED);
>  }
>  
>  /**
> @@ -505,7 +505,8 @@ rte_mbuf_ext_refcnt_update(struct rte_mbuf_ext_shared_info *shinfo,
>  		return (uint16_t)value;
>  	}
>  
> -	return (uint16_t)rte_atomic16_add_return(&shinfo->refcnt_atomic, value);
> +	return __atomic_add_fetch(&shinfo->refcnt_atomic, (uint16_t)value,
> +				 __ATOMIC_ACQ_REL);
>  }
>  
>  /** Mbuf prefetch */
> @@ -1304,8 +1305,8 @@ static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
>  	 * Direct usage of add primitive to avoid
>  	 * duplication of comparing with one.
>  	 */
> -	if (likely(rte_atomic16_add_return
> -			(&shinfo->refcnt_atomic, -1)))
> +	if (likely(__atomic_add_fetch(&shinfo->refcnt_atomic, (uint16_t)-1,
> +				     __ATOMIC_ACQ_REL)))
>  		return 1;
>  
>  	/* Reinitialize counter before mbuf freeing. */
> diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
> index 16600f1..d65d1c8 100644
> --- a/lib/librte_mbuf/rte_mbuf_core.h
> +++ b/lib/librte_mbuf/rte_mbuf_core.h
> @@ -679,7 +679,7 @@ typedef void (*rte_mbuf_extbuf_free_callback_t)(void *addr, void *opaque);
>  struct rte_mbuf_ext_shared_info {
>  	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback function */
>  	void *fcb_opaque;                        /**< Free callback argument */
> -	rte_atomic16_t refcnt_atomic;        /**< Atomically accessed refcnt */
> +	uint16_t refcnt_atomic;              /**< Atomically accessed refcnt */
>  };

To avoid an API breakage (i.e. currently, an application that accesses
to refcnt_atomic expects that its type is rte_atomic16_t), I suggest to
do the same than in the mbuf struct:

	union {
		rte_atomic16_t refcnt_atomic;
		uint16_t refcnt;
	};

I hope the ABI checker won't complain.

It will also be better for 20.11 when the deprecated fields will be
renamed: the remaining one will be called 'refcnt' in both mbuf and
mbuf_ext_shared_info.


Olivier

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3] eal: use c11 atomic built-ins for interrupt status
  2020-07-09  8:34  2%   ` [dpdk-dev] [PATCH v3] " Phil Yang
@ 2020-07-09 10:30  0%     ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2020-07-09 10:30 UTC (permalink / raw)
  To: Phil Yang, Ray Kinsella, Harman Kalra
  Cc: dev, stefan.puiu, Aaron Conole, David Christensen,
	Honnappa Nagarahalli, Ruifeng Wang (Arm Technology China),
	nd, Dodji Seketeli, Neil Horman

On Thu, Jul 9, 2020 at 10:35 AM Phil Yang <phil.yang@arm.com> wrote:
>
> The event status is defined as a volatile variable and shared between
> threads. Use c11 atomic built-ins with explicit ordering instead of
> rte_atomic ops which enforce unnecessary barriers on aarch64.
>
> The event status has been cleaned up by the compare-and-swap operation
> when we free the event data, so there is no need to set it to invalid
> after that.
>
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Harman Kalra <hkalra@marvell.com>
> ---
> v3:
> Fixed typo.
>
> v2:
> 1. Fixed typo.
> 2. Updated libabigail.abignore to pass ABI check.
> 3. Merged v1 two patches into one patch.
>
>  devtools/libabigail.abignore                |  4 +++
>  lib/librte_eal/include/rte_eal_interrupts.h |  2 +-
>  lib/librte_eal/linux/eal_interrupts.c       | 48 ++++++++++++++++++++---------
>  3 files changed, 38 insertions(+), 16 deletions(-)
>
> diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
> index 0133f75..daa4631 100644
> --- a/devtools/libabigail.abignore
> +++ b/devtools/libabigail.abignore
> @@ -48,6 +48,10 @@
>          changed_enumerators = RTE_CRYPTO_AEAD_LIST_END
>  [suppress_variable]
>          name = rte_crypto_aead_algorithm_strings
> +; Ignore updates of epoll event
> +[suppress_type]
> +        type_kind = struct
> +        name = rte_epoll_event

In general, ignoring all changes on a structure is risky.
But the risk is acceptable as long as we remember this for the rest of
the 20.08 release (and we will start from scratch for 20.11).


Without any comment from others, I'll merge this by the end of (my) day.

Thanks.

-- 
David Marchand


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v3] mbuf: use C11 atomic built-ins for refcnt operations
  2020-07-07 10:10  3% ` [dpdk-dev] [PATCH v2] mbuf: use C11 " Phil Yang
  2020-07-08  5:11  3%   ` Phil Yang
  2020-07-08 11:44  0%   ` Olivier Matz
@ 2020-07-09 10:10  4%   ` Phil Yang
  2020-07-09 11:03  3%     ` Olivier Matz
  2020-07-09 15:58  4%     ` [dpdk-dev] [PATCH v4 1/2] " Phil Yang
  2 siblings, 2 replies; 200+ results
From: Phil Yang @ 2020-07-09 10:10 UTC (permalink / raw)
  To: olivier.matz, dev
  Cc: stephen, david.marchand, drc, Honnappa.Nagarahalli, Ruifeng.Wang, nd

Use C11 atomic built-ins with explicit ordering instead of rte_atomic
ops which enforce unnecessary barriers on aarch64.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
v3:
1.Fix ABI breakage.
2.Simplify data type cast.

v2:
Fix ABI issue: revert the rte_mbuf_ext_shared_info struct refcnt field
to refcnt_atomic.

 lib/librte_mbuf/rte_mbuf.c      |  1 -
 lib/librte_mbuf/rte_mbuf.h      | 19 ++++++++++---------
 lib/librte_mbuf/rte_mbuf_core.h |  2 +-
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index ae91ae2..8a456e5 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -22,7 +22,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index f8e492e..c1c0956 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -37,7 +37,6 @@
 #include <rte_config.h>
 #include <rte_mempool.h>
 #include <rte_memory.h>
-#include <rte_atomic.h>
 #include <rte_prefetch.h>
 #include <rte_branch_prediction.h>
 #include <rte_byteorder.h>
@@ -365,7 +364,7 @@ rte_pktmbuf_priv_flags(struct rte_mempool *mp)
 static inline uint16_t
 rte_mbuf_refcnt_read(const struct rte_mbuf *m)
 {
-	return (uint16_t)(rte_atomic16_read(&m->refcnt_atomic));
+	return __atomic_load_n(&m->refcnt, __ATOMIC_RELAXED);
 }
 
 /**
@@ -378,14 +377,15 @@ rte_mbuf_refcnt_read(const struct rte_mbuf *m)
 static inline void
 rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
 {
-	rte_atomic16_set(&m->refcnt_atomic, (int16_t)new_value);
+	__atomic_store_n(&m->refcnt, new_value, __ATOMIC_RELAXED);
 }
 
 /* internal */
 static inline uint16_t
 __rte_mbuf_refcnt_update(struct rte_mbuf *m, int16_t value)
 {
-	return (uint16_t)(rte_atomic16_add_return(&m->refcnt_atomic, value));
+	return __atomic_add_fetch(&m->refcnt, (uint16_t)value,
+				 __ATOMIC_ACQ_REL);
 }
 
 /**
@@ -466,7 +466,7 @@ rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
 static inline uint16_t
 rte_mbuf_ext_refcnt_read(const struct rte_mbuf_ext_shared_info *shinfo)
 {
-	return (uint16_t)(rte_atomic16_read(&shinfo->refcnt_atomic));
+	return __atomic_load_n(&shinfo->refcnt_atomic, __ATOMIC_RELAXED);
 }
 
 /**
@@ -481,7 +481,7 @@ static inline void
 rte_mbuf_ext_refcnt_set(struct rte_mbuf_ext_shared_info *shinfo,
 	uint16_t new_value)
 {
-	rte_atomic16_set(&shinfo->refcnt_atomic, (int16_t)new_value);
+	__atomic_store_n(&shinfo->refcnt_atomic, new_value, __ATOMIC_RELAXED);
 }
 
 /**
@@ -505,7 +505,8 @@ rte_mbuf_ext_refcnt_update(struct rte_mbuf_ext_shared_info *shinfo,
 		return (uint16_t)value;
 	}
 
-	return (uint16_t)rte_atomic16_add_return(&shinfo->refcnt_atomic, value);
+	return __atomic_add_fetch(&shinfo->refcnt_atomic, (uint16_t)value,
+				 __ATOMIC_ACQ_REL);
 }
 
 /** Mbuf prefetch */
@@ -1304,8 +1305,8 @@ static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
 	 * Direct usage of add primitive to avoid
 	 * duplication of comparing with one.
 	 */
-	if (likely(rte_atomic16_add_return
-			(&shinfo->refcnt_atomic, -1)))
+	if (likely(__atomic_add_fetch(&shinfo->refcnt_atomic, (uint16_t)-1,
+				     __ATOMIC_ACQ_REL)))
 		return 1;
 
 	/* Reinitialize counter before mbuf freeing. */
diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
index 16600f1..d65d1c8 100644
--- a/lib/librte_mbuf/rte_mbuf_core.h
+++ b/lib/librte_mbuf/rte_mbuf_core.h
@@ -679,7 +679,7 @@ typedef void (*rte_mbuf_extbuf_free_callback_t)(void *addr, void *opaque);
 struct rte_mbuf_ext_shared_info {
 	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback function */
 	void *fcb_opaque;                        /**< Free callback argument */
-	rte_atomic16_t refcnt_atomic;        /**< Atomically accessed refcnt */
+	uint16_t refcnt_atomic;              /**< Atomically accessed refcnt */
 };
 
 /**< Maximum number of nb_segs allowed. */
-- 
2.7.4


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] mbuf: use C11 atomics for refcnt operations
  2020-07-08 11:44  0%   ` Olivier Matz
@ 2020-07-09 10:00  3%     ` Phil Yang
  0 siblings, 0 replies; 200+ results
From: Phil Yang @ 2020-07-09 10:00 UTC (permalink / raw)
  To: Olivier Matz
  Cc: david.marchand, dev, drc, Honnappa Nagarahalli, Ruifeng Wang, nd

> -----Original Message-----
> From: Olivier Matz <olivier.matz@6wind.com>
> Sent: Wednesday, July 8, 2020 7:44 PM
> To: Phil Yang <Phil.Yang@arm.com>
> Cc: david.marchand@redhat.com; dev@dpdk.org; drc@linux.vnet.ibm.com;
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH v2] mbuf: use C11 atomics for refcnt operations
> 
> Hi,
> 
> On Tue, Jul 07, 2020 at 06:10:33PM +0800, Phil Yang wrote:
> > Use C11 atomics with explicit ordering instead of rte_atomic ops which
> > enforce unnecessary barriers on aarch64.
> >
> > Signed-off-by: Phil Yang <phil.yang@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> > v2:
> > Fix ABI issue: revert the rte_mbuf_ext_shared_info struct refcnt field
> > to refcnt_atomic.
> >
> >  lib/librte_mbuf/rte_mbuf.c      |  1 -
> >  lib/librte_mbuf/rte_mbuf.h      | 19 ++++++++++---------
> >  lib/librte_mbuf/rte_mbuf_core.h | 11 +++--------
> >  3 files changed, 13 insertions(+), 18 deletions(-)
> >

<snip>

> 
> It seems this patch does 2 things:
> - remove refcnt_atomic
> - use C11 atomics
> 
> The first change is an API break. I think it should be announced in a
> deprecation
> notice. The one about atomic does not talk about it.
> 
> So I suggest to keep refcnt_atomic until next version.

Agreed.
I did a local test; this approach does not cause any ABI breakage.
I will update in the next version. 

Thanks,
Phil

> 
> 
> >  	uint16_t nb_segs;         /**< Number of segments. */
> >
> >  	/** Input port (16 bits to support more than 256 virtual ports).
> > @@ -679,7 +674,7 @@ typedef void
> (*rte_mbuf_extbuf_free_callback_t)(void *addr, void *opaque);
> >  struct rte_mbuf_ext_shared_info {
> >  	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback
> function */
> >  	void *fcb_opaque;                        /**< Free callback argument */
> > -	rte_atomic16_t refcnt_atomic;        /**< Atomically accessed refcnt */
> > +	uint16_t refcnt_atomic;              /**< Atomically accessed refcnt */
> >  };
> >
> >  /**< Maximum number of nb_segs allowed. */
> > --
> > 2.7.4
> >

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v3] eal: use c11 atomic built-ins for interrupt status
  2020-07-09  6:46  3% ` [dpdk-dev] [PATCH v2] eal: use c11 atomic built-ins " Phil Yang
  2020-07-09  8:02  0%   ` Stefan Puiu
@ 2020-07-09  8:34  2%   ` Phil Yang
  2020-07-09 10:30  0%     ` David Marchand
  1 sibling, 1 reply; 200+ results
From: Phil Yang @ 2020-07-09  8:34 UTC (permalink / raw)
  To: david.marchand, dev
  Cc: stefan.puiu, mdr, aconole, drc, Honnappa.Nagarahalli,
	Ruifeng.Wang, nd, dodji, nhorman, hkalra

The event status is defined as a volatile variable and shared between
threads. Use c11 atomic built-ins with explicit ordering instead of
rte_atomic ops which enforce unnecessary barriers on aarch64.

The event status has been cleaned up by the compare-and-swap operation
when we free the event data, so there is no need to set it to invalid
after that.
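
For context, the synchronization pattern used below can be sketched in
isolation as follows (generic illustration, not the actual EAL code; "ev"
stands for a struct rte_epoll_event pointer): an acquiring compare-and-swap
takes "ownership" of the event, and the releasing store publishes the
updated event data to other threads.

    uint32_t expected = RTE_EPOLL_VALID;

    /* ACQUIRE pairs with the RELEASE store below, acting as a lock. */
    if (__atomic_compare_exchange_n(&ev->status, &expected, RTE_EPOLL_EXEC,
            0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
        /* ... safely read/update the other fields of *ev ... */
        __atomic_store_n(&ev->status, RTE_EPOLL_VALID, __ATOMIC_RELEASE);
    }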

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Harman Kalra <hkalra@marvell.com>
---
v3:
Fixed typo.

v2:
1. Fixed typo.
2. Updated libabigail.abignore to pass ABI check.
3. Merged v1 two patches into one patch.

 devtools/libabigail.abignore                |  4 +++
 lib/librte_eal/include/rte_eal_interrupts.h |  2 +-
 lib/librte_eal/linux/eal_interrupts.c       | 48 ++++++++++++++++++++---------
 3 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 0133f75..daa4631 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -48,6 +48,10 @@
         changed_enumerators = RTE_CRYPTO_AEAD_LIST_END
 [suppress_variable]
         name = rte_crypto_aead_algorithm_strings
+; Ignore updates of epoll event
+[suppress_type]
+        type_kind = struct
+        name = rte_epoll_event
 
 ;;;;;;;;;;;;;;;;;;;;;;
 ; Temporary exceptions till DPDK 20.11
diff --git a/lib/librte_eal/include/rte_eal_interrupts.h b/lib/librte_eal/include/rte_eal_interrupts.h
index 773a34a..b1e8a29 100644
--- a/lib/librte_eal/include/rte_eal_interrupts.h
+++ b/lib/librte_eal/include/rte_eal_interrupts.h
@@ -59,7 +59,7 @@ enum {
 
 /** interrupt epoll event obj, taken by epoll_event.ptr */
 struct rte_epoll_event {
-	volatile uint32_t status;  /**< OUT: event status */
+	uint32_t status;           /**< OUT: event status */
 	int fd;                    /**< OUT: event fd */
 	int epfd;       /**< OUT: epoll instance the ev associated with */
 	struct rte_epoll_data epdata;
diff --git a/lib/librte_eal/linux/eal_interrupts.c b/lib/librte_eal/linux/eal_interrupts.c
index 84eeaa1..ad09049 100644
--- a/lib/librte_eal/linux/eal_interrupts.c
+++ b/lib/librte_eal/linux/eal_interrupts.c
@@ -26,7 +26,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_debug.h>
 #include <rte_log.h>
@@ -1221,11 +1220,18 @@ eal_epoll_process_event(struct epoll_event *evs, unsigned int n,
 {
 	unsigned int i, count = 0;
 	struct rte_epoll_event *rev;
+	uint32_t valid_status;
 
 	for (i = 0; i < n; i++) {
 		rev = evs[i].data.ptr;
-		if (!rev || !rte_atomic32_cmpset(&rev->status, RTE_EPOLL_VALID,
-						 RTE_EPOLL_EXEC))
+		valid_status =  RTE_EPOLL_VALID;
+		/* ACQUIRE memory ordering here pairs with RELEASE
+		 * ordering below acting as a lock to synchronize
+		 * the event data updating.
+		 */
+		if (!rev || !__atomic_compare_exchange_n(&rev->status,
+				    &valid_status, RTE_EPOLL_EXEC, 0,
+				    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
 			continue;
 
 		events[count].status        = RTE_EPOLL_VALID;
@@ -1237,8 +1243,11 @@ eal_epoll_process_event(struct epoll_event *evs, unsigned int n,
 			rev->epdata.cb_fun(rev->fd,
 					   rev->epdata.cb_arg);
 
-		rte_compiler_barrier();
-		rev->status = RTE_EPOLL_VALID;
+		/* the status update should be observed after
+		 * the other fields change.
+		 */
+		__atomic_store_n(&rev->status, RTE_EPOLL_VALID,
+				__ATOMIC_RELEASE);
 		count++;
 	}
 	return count;
@@ -1308,10 +1317,14 @@ rte_epoll_wait(int epfd, struct rte_epoll_event *events,
 static inline void
 eal_epoll_data_safe_free(struct rte_epoll_event *ev)
 {
-	while (!rte_atomic32_cmpset(&ev->status, RTE_EPOLL_VALID,
-				    RTE_EPOLL_INVALID))
-		while (ev->status != RTE_EPOLL_VALID)
+	uint32_t valid_status = RTE_EPOLL_VALID;
+	while (!__atomic_compare_exchange_n(&ev->status, &valid_status,
+		    RTE_EPOLL_INVALID, 0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+		while (__atomic_load_n(&ev->status,
+				__ATOMIC_RELAXED) != RTE_EPOLL_VALID)
 			rte_pause();
+		valid_status = RTE_EPOLL_VALID;
+	}
 	memset(&ev->epdata, 0, sizeof(ev->epdata));
 	ev->fd = -1;
 	ev->epfd = -1;
@@ -1333,7 +1346,8 @@ rte_epoll_ctl(int epfd, int op, int fd,
 		epfd = rte_intr_tls_epfd();
 
 	if (op == EPOLL_CTL_ADD) {
-		event->status = RTE_EPOLL_VALID;
+		__atomic_store_n(&event->status, RTE_EPOLL_VALID,
+				__ATOMIC_RELAXED);
 		event->fd = fd;  /* ignore fd in event */
 		event->epfd = epfd;
 		ev.data.ptr = (void *)event;
@@ -1345,11 +1359,13 @@ rte_epoll_ctl(int epfd, int op, int fd,
 			op, fd, strerror(errno));
 		if (op == EPOLL_CTL_ADD)
 			/* rollback status when CTL_ADD fail */
-			event->status = RTE_EPOLL_INVALID;
+			__atomic_store_n(&event->status, RTE_EPOLL_INVALID,
+					__ATOMIC_RELAXED);
 		return -1;
 	}
 
-	if (op == EPOLL_CTL_DEL && event->status != RTE_EPOLL_INVALID)
+	if (op == EPOLL_CTL_DEL && __atomic_load_n(&event->status,
+			__ATOMIC_RELAXED) != RTE_EPOLL_INVALID)
 		eal_epoll_data_safe_free(event);
 
 	return 0;
@@ -1378,7 +1394,8 @@ rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd,
 	case RTE_INTR_EVENT_ADD:
 		epfd_op = EPOLL_CTL_ADD;
 		rev = &intr_handle->elist[efd_idx];
-		if (rev->status != RTE_EPOLL_INVALID) {
+		if (__atomic_load_n(&rev->status,
+				__ATOMIC_RELAXED) != RTE_EPOLL_INVALID) {
 			RTE_LOG(INFO, EAL, "Event already been added.\n");
 			return -EEXIST;
 		}
@@ -1401,7 +1418,8 @@ rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd,
 	case RTE_INTR_EVENT_DEL:
 		epfd_op = EPOLL_CTL_DEL;
 		rev = &intr_handle->elist[efd_idx];
-		if (rev->status == RTE_EPOLL_INVALID) {
+		if (__atomic_load_n(&rev->status,
+				__ATOMIC_RELAXED) == RTE_EPOLL_INVALID) {
 			RTE_LOG(INFO, EAL, "Event does not exist.\n");
 			return -EPERM;
 		}
@@ -1426,12 +1444,12 @@ rte_intr_free_epoll_fd(struct rte_intr_handle *intr_handle)
 
 	for (i = 0; i < intr_handle->nb_efd; i++) {
 		rev = &intr_handle->elist[i];
-		if (rev->status == RTE_EPOLL_INVALID)
+		if (__atomic_load_n(&rev->status,
+				__ATOMIC_RELAXED) == RTE_EPOLL_INVALID)
 			continue;
 		if (rte_epoll_ctl(rev->epfd, EPOLL_CTL_DEL, rev->fd, rev)) {
 			/* force free if the entry valid */
 			eal_epoll_data_safe_free(rev);
-			rev->status = RTE_EPOLL_INVALID;
 		}
 	}
 }
-- 
2.7.4


^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v8 1/3] lib/lpm: integrate RCU QSBR
  2020-07-09  8:02  4% ` [dpdk-dev] [PATCH v8 0/3] RCU integration with LPM library Ruifeng Wang
@ 2020-07-09  8:02  2%   ` Ruifeng Wang
  0 siblings, 0 replies; 200+ results
From: Ruifeng Wang @ 2020-07-09  8:02 UTC (permalink / raw)
  To: Bruce Richardson, Vladimir Medvedkin, John McNamara,
	Marko Kovacevic, Ray Kinsella, Neil Horman
  Cc: dev, konstantin.ananyev, honnappa.nagarahalli, nd, Ruifeng Wang

Currently, the tbl8 group is freed even though the readers might be
using the tbl8 group entries. The freed tbl8 group can be reallocated
quickly. This results in incorrect lookup results.

RCU QSBR process is integrated for safe tbl8 group reclaim.
Refer to RCU documentation to understand various aspects of
integrating RCU library into other libraries.

To avoid ABI breakage, a struct __rte_lpm is created for LPM library
internal use. This struct wraps the already exposed rte_lpm and also
includes members that do not need to be exposed, such as the RCU
related config.
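
The following is a minimal usage sketch of the new API (illustrative only,
not part of the patch; the helper name lpm_attach_rcu and the error handling
are assumptions, and 'lpm' and 'v' are expected to be created by the
application beforehand):

    #include <rte_errno.h>
    #include <rte_lpm.h>
    #include <rte_rcu_qsbr.h>

    /* Attach an RCU QSBR variable to an existing LPM table using the
     * default defer-queue reclamation mode.
     */
    static int
    lpm_attach_rcu(struct rte_lpm *lpm, struct rte_rcu_qsbr *v,
                   struct rte_rcu_qsbr_dq **dq)
    {
        struct rte_lpm_rcu_config rcu_cfg = { 0 };

        rcu_cfg.v = v;
        rcu_cfg.mode = RTE_LPM_QSBR_MODE_DQ;    /* default: defer queue */

        /* 'dq' returns the defer queue handle, usable for manual reclaim. */
        if (rte_lpm_rcu_qsbr_add(lpm, &rcu_cfg, dq) != 0)
            return -rte_errno;
        return 0;
    }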

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 doc/guides/prog_guide/lpm_lib.rst  |  32 ++++++
 lib/librte_lpm/Makefile            |   2 +-
 lib/librte_lpm/meson.build         |   1 +
 lib/librte_lpm/rte_lpm.c           | 167 +++++++++++++++++++++++++----
 lib/librte_lpm/rte_lpm.h           |  53 +++++++++
 lib/librte_lpm/rte_lpm_version.map |   6 ++
 6 files changed, 237 insertions(+), 24 deletions(-)

diff --git a/doc/guides/prog_guide/lpm_lib.rst b/doc/guides/prog_guide/lpm_lib.rst
index 1609a57d0..03945904b 100644
--- a/doc/guides/prog_guide/lpm_lib.rst
+++ b/doc/guides/prog_guide/lpm_lib.rst
@@ -145,6 +145,38 @@ depending on whether we need to move to the next table or not.
 Prefix expansion is one of the keys of this algorithm,
 since it improves the speed dramatically by adding redundancy.
 
+Deletion
+~~~~~~~~
+
+When deleting a rule, a replacement rule is searched for. The replacement rule is an existing rule that has
+the longest prefix match with the rule to be deleted, but has a shorter prefix.
+
+If a replacement rule is found, target tbl24 and tbl8 entries are updated to have the same depth and next hop
+value as the replacement rule.
+
+If no replacement rule can be found, target tbl24 and tbl8 entries will be cleared.
+
+Prefix expansion is performed if the rule's depth is not exactly 24 bits or 32 bits.
+
+After deleting a rule, a group of tbl8s that belongs to the same tbl24 entry is freed in the following cases:
+
+*   All tbl8s in the group are empty.
+
+*   All tbl8s in the group have the same values and a depth no greater than 24.
+
+Freeing of tbl8s has different behaviors:
+
+*   If RCU is not used, tbl8s are cleared and reclaimed immediately.
+
+*   If RCU is used, tbl8s are reclaimed when readers are in quiescent state.
+
+When the LPM is not using RCU, a tbl8 group can be freed immediately even though the readers might be using
+the tbl8 group entries. This might result in incorrect lookup results.
+
+RCU QSBR process is integrated for safe tbl8 group reclamation. Application has certain responsibilities
+while using this feature. Please refer to resource reclamation framework of :ref:`RCU library <RCU_Library>`
+for more details.
+
 Lookup
 ~~~~~~
 
diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
index d682785b6..6f06c5c03 100644
--- a/lib/librte_lpm/Makefile
+++ b/lib/librte_lpm/Makefile
@@ -8,7 +8,7 @@ LIB = librte_lpm.a
 
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
-LDLIBS += -lrte_eal -lrte_hash
+LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
 
 EXPORT_MAP := rte_lpm_version.map
 
diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
index 021ac6d8d..6cfc083c5 100644
--- a/lib/librte_lpm/meson.build
+++ b/lib/librte_lpm/meson.build
@@ -7,3 +7,4 @@ headers = files('rte_lpm.h', 'rte_lpm6.h')
 # without worrying about which architecture we actually need
 headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
 deps += ['hash']
+deps += ['rcu']
diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 38ab512a4..4fbf5b6df 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2020 Arm Limited
  */
 
 #include <string.h>
@@ -39,6 +40,17 @@ enum valid_flag {
 	VALID
 };
 
+/** @internal LPM structure. */
+struct __rte_lpm {
+	/* LPM metadata. */
+	struct rte_lpm lpm;
+
+	/* RCU config. */
+	struct rte_rcu_qsbr *v;		/* RCU QSBR variable. */
+	enum rte_lpm_qsbr_mode rcu_mode;/* Blocking, defer queue. */
+	struct rte_rcu_qsbr_dq *dq;	/* RCU QSBR defer queue. */
+};
+
 /* Macro to enable/disable run-time checks. */
 #if defined(RTE_LIBRTE_LPM_DEBUG)
 #include <rte_debug.h>
@@ -122,6 +134,7 @@ rte_lpm_create(const char *name, int socket_id,
 		const struct rte_lpm_config *config)
 {
 	char mem_name[RTE_LPM_NAMESIZE];
+	struct __rte_lpm *internal_lpm = NULL;
 	struct rte_lpm *lpm = NULL;
 	struct rte_tailq_entry *te;
 	uint32_t mem_size, rules_size, tbl8s_size;
@@ -140,12 +153,6 @@ rte_lpm_create(const char *name, int socket_id,
 
 	snprintf(mem_name, sizeof(mem_name), "LPM_%s", name);
 
-	/* Determine the amount of memory to allocate. */
-	mem_size = sizeof(*lpm);
-	rules_size = sizeof(struct rte_lpm_rule) * config->max_rules;
-	tbl8s_size = (sizeof(struct rte_lpm_tbl_entry) *
-			RTE_LPM_TBL8_GROUP_NUM_ENTRIES * config->number_tbl8s);
-
 	rte_mcfg_tailq_write_lock();
 
 	/* guarantee there's no existing */
@@ -161,6 +168,12 @@ rte_lpm_create(const char *name, int socket_id,
 		goto exit;
 	}
 
+	/* Determine the amount of memory to allocate. */
+	mem_size = sizeof(*internal_lpm);
+	rules_size = sizeof(struct rte_lpm_rule) * config->max_rules;
+	tbl8s_size = (sizeof(struct rte_lpm_tbl_entry) *
+			RTE_LPM_TBL8_GROUP_NUM_ENTRIES * config->number_tbl8s);
+
 	/* allocate tailq entry */
 	te = rte_zmalloc("LPM_TAILQ_ENTRY", sizeof(*te), 0);
 	if (te == NULL) {
@@ -170,22 +183,23 @@ rte_lpm_create(const char *name, int socket_id,
 	}
 
 	/* Allocate memory to store the LPM data structures. */
-	lpm = rte_zmalloc_socket(mem_name, mem_size,
+	internal_lpm = rte_zmalloc_socket(mem_name, mem_size,
 			RTE_CACHE_LINE_SIZE, socket_id);
-	if (lpm == NULL) {
+	if (internal_lpm == NULL) {
 		RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
 		rte_free(te);
 		rte_errno = ENOMEM;
 		goto exit;
 	}
 
+	lpm = &internal_lpm->lpm;
 	lpm->rules_tbl = rte_zmalloc_socket(NULL,
 			(size_t)rules_size, RTE_CACHE_LINE_SIZE, socket_id);
 
 	if (lpm->rules_tbl == NULL) {
 		RTE_LOG(ERR, LPM, "LPM rules_tbl memory allocation failed\n");
-		rte_free(lpm);
-		lpm = NULL;
+		rte_free(internal_lpm);
+		internal_lpm = NULL;
 		rte_free(te);
 		rte_errno = ENOMEM;
 		goto exit;
@@ -197,8 +211,8 @@ rte_lpm_create(const char *name, int socket_id,
 	if (lpm->tbl8 == NULL) {
 		RTE_LOG(ERR, LPM, "LPM tbl8 memory allocation failed\n");
 		rte_free(lpm->rules_tbl);
-		rte_free(lpm);
-		lpm = NULL;
+		rte_free(internal_lpm);
+		internal_lpm = NULL;
 		rte_free(te);
 		rte_errno = ENOMEM;
 		goto exit;
@@ -225,6 +239,7 @@ rte_lpm_create(const char *name, int socket_id,
 void
 rte_lpm_free(struct rte_lpm *lpm)
 {
+	struct __rte_lpm *internal_lpm;
 	struct rte_lpm_list *lpm_list;
 	struct rte_tailq_entry *te;
 
@@ -246,12 +261,84 @@ rte_lpm_free(struct rte_lpm *lpm)
 
 	rte_mcfg_tailq_write_unlock();
 
+	internal_lpm = container_of(lpm, struct __rte_lpm, lpm);
+	if (internal_lpm->dq)
+		rte_rcu_qsbr_dq_delete(internal_lpm->dq);
 	rte_free(lpm->tbl8);
 	rte_free(lpm->rules_tbl);
 	rte_free(lpm);
 	rte_free(te);
 }
 
+static void
+__lpm_rcu_qsbr_free_resource(void *p, void *data, unsigned int n)
+{
+	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
+	uint32_t tbl8_group_index = *(uint32_t *)data;
+	struct rte_lpm_tbl_entry *tbl8 = ((struct rte_lpm *)p)->tbl8;
+
+	RTE_SET_USED(n);
+	/* Set tbl8 group invalid */
+	__atomic_store(&tbl8[tbl8_group_index], &zero_tbl8_entry,
+		__ATOMIC_RELAXED);
+}
+
+/* Associate QSBR variable with an LPM object.
+ */
+int
+rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_lpm_rcu_config *cfg,
+	struct rte_rcu_qsbr_dq **dq)
+{
+	struct __rte_lpm *internal_lpm;
+	char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
+	struct rte_rcu_qsbr_dq_parameters params = {0};
+
+	if (lpm == NULL || cfg == NULL) {
+		rte_errno = EINVAL;
+		return 1;
+	}
+
+	internal_lpm = container_of(lpm, struct __rte_lpm, lpm);
+	if (internal_lpm->v != NULL) {
+		rte_errno = EEXIST;
+		return 1;
+	}
+
+	if (cfg->mode == RTE_LPM_QSBR_MODE_SYNC) {
+		/* No other things to do. */
+	} else if (cfg->mode == RTE_LPM_QSBR_MODE_DQ) {
+		/* Init QSBR defer queue. */
+		snprintf(rcu_dq_name, sizeof(rcu_dq_name),
+				"LPM_RCU_%s", lpm->name);
+		params.name = rcu_dq_name;
+		params.size = cfg->dq_size;
+		if (params.size == 0)
+			params.size = lpm->number_tbl8s;
+		params.trigger_reclaim_limit = cfg->reclaim_thd;
+		params.max_reclaim_size = cfg->reclaim_max;
+		if (params.max_reclaim_size == 0)
+			params.max_reclaim_size = RTE_LPM_RCU_DQ_RECLAIM_MAX;
+		params.esize = sizeof(uint32_t);	/* tbl8 group index */
+		params.free_fn = __lpm_rcu_qsbr_free_resource;
+		params.p = lpm;
+		params.v = cfg->v;
+		internal_lpm->dq = rte_rcu_qsbr_dq_create(&params);
+		if (internal_lpm->dq == NULL) {
+			RTE_LOG(ERR, LPM, "LPM defer queue creation failed\n");
+			return 1;
+		}
+		if (dq)
+			*dq = internal_lpm->dq;
+	} else {
+		rte_errno = EINVAL;
+		return 1;
+	}
+	internal_lpm->rcu_mode = cfg->mode;
+	internal_lpm->v = cfg->v;
+
+	return 0;
+}
+
 /*
  * Adds a rule to the rule table.
  *
@@ -394,14 +481,15 @@ rule_find(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth)
  * Find, clean and allocate a tbl8.
  */
 static int32_t
-tbl8_alloc(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
+_tbl8_alloc(struct rte_lpm *lpm)
 {
 	uint32_t group_idx; /* tbl8 group index. */
 	struct rte_lpm_tbl_entry *tbl8_entry;
 
 	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
-	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
-		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
+	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
+		tbl8_entry = &lpm->tbl8[group_idx *
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
 		/* If a free tbl8 group is found clean it and set as VALID. */
 		if (!tbl8_entry->valid_group) {
 			struct rte_lpm_tbl_entry new_tbl8_entry = {
@@ -427,14 +515,47 @@ tbl8_alloc(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
 	return -ENOSPC;
 }
 
+static int32_t
+tbl8_alloc(struct rte_lpm *lpm)
+{
+	struct __rte_lpm *internal_lpm = container_of(lpm,
+						struct __rte_lpm, lpm);
+	int32_t group_idx; /* tbl8 group index. */
+
+	group_idx = _tbl8_alloc(lpm);
+	if (group_idx == -ENOSPC && internal_lpm->dq != NULL) {
+		/* If there are no tbl8 groups try to reclaim one. */
+		if (rte_rcu_qsbr_dq_reclaim(internal_lpm->dq, 1,
+				NULL, NULL, NULL) == 0)
+			group_idx = _tbl8_alloc(lpm);
+	}
+
+	return group_idx;
+}
+
 static void
-tbl8_free(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
+tbl8_free(struct rte_lpm *lpm, uint32_t tbl8_group_start)
 {
-	/* Set tbl8 group invalid*/
+	struct __rte_lpm *internal_lpm = container_of(lpm,
+						struct __rte_lpm, lpm);
 	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
 
-	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
-			__ATOMIC_RELAXED);
+	if (internal_lpm->v == NULL) {
+		/* Set tbl8 group invalid*/
+		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
+				__ATOMIC_RELAXED);
+	} else if (internal_lpm->rcu_mode == RTE_LPM_QSBR_MODE_SYNC) {
+		/* Wait for quiescent state change. */
+		rte_rcu_qsbr_synchronize(internal_lpm->v,
+			RTE_QSBR_THRID_INVALID);
+		/* Set tbl8 group invalid*/
+		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
+				__ATOMIC_RELAXED);
+	} else if (internal_lpm->rcu_mode == RTE_LPM_QSBR_MODE_DQ) {
+		/* Push into QSBR defer queue. */
+		rte_rcu_qsbr_dq_enqueue(internal_lpm->dq,
+				(void *)&tbl8_group_start);
+	}
 }
 
 static __rte_noinline int32_t
@@ -523,7 +644,7 @@ add_depth_big(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 
 	if (!lpm->tbl24[tbl24_index].valid) {
 		/* Search for a free tbl8 group. */
-		tbl8_group_index = tbl8_alloc(lpm->tbl8, lpm->number_tbl8s);
+		tbl8_group_index = tbl8_alloc(lpm);
 
 		/* Check tbl8 allocation was successful. */
 		if (tbl8_group_index < 0) {
@@ -569,7 +690,7 @@ add_depth_big(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 	} /* If valid entry but not extended calculate the index into Table8. */
 	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
 		/* Search for free tbl8 group. */
-		tbl8_group_index = tbl8_alloc(lpm->tbl8, lpm->number_tbl8s);
+		tbl8_group_index = tbl8_alloc(lpm);
 
 		if (tbl8_group_index < 0) {
 			return tbl8_group_index;
@@ -977,7 +1098,7 @@ delete_depth_big(struct rte_lpm *lpm, uint32_t ip_masked,
 		 */
 		lpm->tbl24[tbl24_index].valid = 0;
 		__atomic_thread_fence(__ATOMIC_RELEASE);
-		tbl8_free(lpm->tbl8, tbl8_group_start);
+		tbl8_free(lpm, tbl8_group_start);
 	} else if (tbl8_recycle_index > -1) {
 		/* Update tbl24 entry. */
 		struct rte_lpm_tbl_entry new_tbl24_entry = {
@@ -993,7 +1114,7 @@ delete_depth_big(struct rte_lpm *lpm, uint32_t ip_masked,
 		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
 				__ATOMIC_RELAXED);
 		__atomic_thread_fence(__ATOMIC_RELEASE);
-		tbl8_free(lpm->tbl8, tbl8_group_start);
+		tbl8_free(lpm, tbl8_group_start);
 	}
 #undef group_idx
 	return 0;
diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
index b9d49ac87..a9568fcdd 100644
--- a/lib/librte_lpm/rte_lpm.h
+++ b/lib/librte_lpm/rte_lpm.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2020 Arm Limited
  */
 
 #ifndef _RTE_LPM_H_
@@ -20,6 +21,7 @@
 #include <rte_memory.h>
 #include <rte_common.h>
 #include <rte_vect.h>
+#include <rte_rcu_qsbr.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -62,6 +64,17 @@ extern "C" {
 /** Bitmask used to indicate successful lookup */
 #define RTE_LPM_LOOKUP_SUCCESS          0x01000000
 
+/** @internal Default RCU defer queue entries to reclaim in one go. */
+#define RTE_LPM_RCU_DQ_RECLAIM_MAX	16
+
+/** RCU reclamation modes */
+enum rte_lpm_qsbr_mode {
+	/** Create defer queue for reclaim. */
+	RTE_LPM_QSBR_MODE_DQ = 0,
+	/** Use blocking mode reclaim. No defer queue created. */
+	RTE_LPM_QSBR_MODE_SYNC
+};
+
 #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
 /** @internal Tbl24 entry structure. */
 __extension__
@@ -132,6 +145,22 @@ struct rte_lpm {
 	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
 };
 
+/** LPM RCU QSBR configuration structure. */
+struct rte_lpm_rcu_config {
+	struct rte_rcu_qsbr *v;	/* RCU QSBR variable. */
+	/* Mode of RCU QSBR. RTE_LPM_QSBR_MODE_xxx
+	 * '0' for default: create defer queue for reclaim.
+	 */
+	enum rte_lpm_qsbr_mode mode;
+	uint32_t dq_size;	/* RCU defer queue size.
+				 * default: lpm->number_tbl8s.
+				 */
+	uint32_t reclaim_thd;	/* Threshold to trigger auto reclaim. */
+	uint32_t reclaim_max;	/* Max entries to reclaim in one go.
+				 * default: RTE_LPM_RCU_DQ_RECLAIM_MAX.
+				 */
+};
+
 /**
  * Create an LPM object.
  *
@@ -179,6 +208,30 @@ rte_lpm_find_existing(const char *name);
 void
 rte_lpm_free(struct rte_lpm *lpm);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Associate RCU QSBR variable with an LPM object.
+ *
+ * @param lpm
+ *   the lpm object to add RCU QSBR
+ * @param cfg
+ *   RCU QSBR configuration
+ * @param dq
+ *   handler of created RCU QSBR defer queue
+ * @return
+ *   On success - 0
+ *   On error - 1 with error code set in rte_errno.
+ *   Possible rte_errno codes are:
+ *   - EINVAL - invalid pointer
+ *   - EEXIST - already added QSBR
+ *   - ENOMEM - memory allocation failure
+ */
+__rte_experimental
+int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_lpm_rcu_config *cfg,
+	struct rte_rcu_qsbr_dq **dq);
+
 /**
  * Add a rule to the LPM table.
  *
diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
index 500f58b80..bfccd7eac 100644
--- a/lib/librte_lpm/rte_lpm_version.map
+++ b/lib/librte_lpm/rte_lpm_version.map
@@ -21,3 +21,9 @@ DPDK_20.0 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_lpm_rcu_qsbr_add;
+};
-- 
2.17.1


^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v8 0/3] RCU integration with LPM library
                     ` (2 preceding siblings ...)
  2020-07-07 15:15  3% ` [dpdk-dev] [PATCH v7 " Ruifeng Wang
@ 2020-07-09  8:02  4% ` Ruifeng Wang
  2020-07-09  8:02  2%   ` [dpdk-dev] [PATCH v8 1/3] lib/lpm: integrate RCU QSBR Ruifeng Wang
  2020-07-09 15:42  4% ` [dpdk-dev] [PATCH v9 0/3] RCU integration with LPM library Ruifeng Wang
  2020-07-10  2:22  4% ` [dpdk-dev] [PATCH v10 0/3] RCU integration with LPM library Ruifeng Wang
  5 siblings, 1 reply; 200+ results
From: Ruifeng Wang @ 2020-07-09  8:02 UTC (permalink / raw)
  Cc: dev, mdr, konstantin.ananyev, honnappa.nagarahalli, nd, Ruifeng Wang

This patchset integrates RCU QSBR support with LPM library.

Resource reclaimation implementation was splitted from the original
series, and has already been part of RCU library. Rework the series
to base LPM integration on RCU reclaimation APIs.

New API rte_lpm_rcu_qsbr_add is introduced for application to
register a RCU variable that LPM library will use. This provides
user the handle to enable RCU that integrated in LPM library.

Functional tests and performance tests are added to cover the
integration with RCU.
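
To illustrate the reader-side responsibilities that come with the
integration, below is a hedged sketch (not part of the series; the helper
name lpm_reader_loop, the iteration count and the looked-up address are
assumptions) of a lookup thread reporting quiescence on the QSBR variable
that was registered through rte_lpm_rcu_qsbr_add:

    #include <rte_ip.h>
    #include <rte_lpm.h>
    #include <rte_rcu_qsbr.h>

    static void
    lpm_reader_loop(struct rte_lpm *lpm, struct rte_rcu_qsbr *v,
                    unsigned int thread_id)
    {
        uint32_t next_hop;
        unsigned int i;

        rte_rcu_qsbr_thread_register(v, thread_id);
        rte_rcu_qsbr_thread_online(v, thread_id);

        for (i = 0; i < 1000000; i++) {
            /* RFC 5737 address used only as an example key. */
            (void)rte_lpm_lookup(lpm, RTE_IPV4(192, 0, 2, 1), &next_hop);
            /* Report quiescent state so freed tbl8 groups can be
             * reclaimed by the writer or the defer queue.
             */
            rte_rcu_qsbr_quiescent(v, thread_id);
        }

        rte_rcu_qsbr_thread_offline(v, thread_id);
        rte_rcu_qsbr_thread_unregister(v, thread_id);
    }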

---
v8:
Fixed ABI issue by adding internal LPM control structure. (David)
Changed to use RFC5737 address in unit test. (Vladimir)

v7:
Fixed typos in document.

v6:
Remove ALLOW_EXPERIMENTAL_API from rte_lpm.c.

v5:
No default value for reclaim_thd. This allows reclamation triggering with every call.
Pass LPM pointer instead of tbl8 as argument of reclaim callback free function.
Updated group_idx check at tbl8 allocation.
Use enums instead of defines for different reclamation modes.
RCU QSBR integrated path is inside ALLOW_EXPERIMENTAL_API to avoid ABI change.

v4:
Allow user to configure defer queue: size, reclaim threshold, max entries.
Return defer queue handler so user can manually trigger reclaimation.
Add blocking mode support. Defer queue will not be created.


Honnappa Nagarahalli (1):
  test/lpm: add RCU integration performance tests

Ruifeng Wang (2):
  lib/lpm: integrate RCU QSBR
  test/lpm: add LPM RCU integration functional tests

 app/test/test_lpm.c                | 291 ++++++++++++++++-
 app/test/test_lpm_perf.c           | 492 ++++++++++++++++++++++++++++-
 doc/guides/prog_guide/lpm_lib.rst  |  32 ++
 lib/librte_lpm/Makefile            |   2 +-
 lib/librte_lpm/meson.build         |   1 +
 lib/librte_lpm/rte_lpm.c           | 167 ++++++++--
 lib/librte_lpm/rte_lpm.h           |  53 ++++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 8 files changed, 1016 insertions(+), 28 deletions(-)

-- 
2.17.1


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] eal: use c11 atomic built-ins for interrupt status
  2020-07-09  6:46  3% ` [dpdk-dev] [PATCH v2] eal: use c11 atomic built-ins " Phil Yang
@ 2020-07-09  8:02  0%   ` Stefan Puiu
  2020-07-09  8:34  2%   ` [dpdk-dev] [PATCH v3] " Phil Yang
  1 sibling, 0 replies; 200+ results
From: Stefan Puiu @ 2020-07-09  8:02 UTC (permalink / raw)
  To: Phil Yang
  Cc: david.marchand, dev, mdr, aconole, drc, Honnappa.Nagarahalli,
	Ruifeng.Wang, nd, dodji, Neil Horman, hkalra

Hi,

Noticed 2 typos:

On Thu, Jul 9, 2020 at 9:46 AM Phil Yang <phil.yang@arm.com> wrote:
>
> The event status is defined as a volatile variable and shared between
> threads. Use c11 atomic built-ins with explicit ordering instead of
> rte_atomic ops which enforce unnecessary barriers on aarch64.
>
> The event status has been cleaned up by the compare-and-swap operation
> when we free the event data, so there is no need to set it to invalid
> after that.
>
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Harman Kalra <hkalra@marvell.com>
> ---
> v2:
> 1. Fixed typo.
> 2. Updated libabigail.abignore to pass ABI check.
> 3. Merged v1 two patches into one patch.
>
>  devtools/libabigail.abignore                |  4 +++
>  lib/librte_eal/include/rte_eal_interrupts.h |  2 +-
>  lib/librte_eal/linux/eal_interrupts.c       | 48 ++++++++++++++++++++---------
>  3 files changed, 38 insertions(+), 16 deletions(-)
>
> diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
> index 0133f75..daa4631 100644
> --- a/devtools/libabigail.abignore
> +++ b/devtools/libabigail.abignore
> @@ -48,6 +48,10 @@
>          changed_enumerators = RTE_CRYPTO_AEAD_LIST_END
>  [suppress_variable]
>          name = rte_crypto_aead_algorithm_strings
> +; Ignore updates of epoll event
> +[suppress_type]
> +        type_kind = struct
> +        name = rte_epoll_event
>
>  ;;;;;;;;;;;;;;;;;;;;;;
>  ; Temporary exceptions till DPDK 20.11
> diff --git a/lib/librte_eal/include/rte_eal_interrupts.h b/lib/librte_eal/include/rte_eal_interrupts.h
> index 773a34a..b1e8a29 100644
> --- a/lib/librte_eal/include/rte_eal_interrupts.h
> +++ b/lib/librte_eal/include/rte_eal_interrupts.h
> @@ -59,7 +59,7 @@ enum {
>
>  /** interrupt epoll event obj, taken by epoll_event.ptr */
>  struct rte_epoll_event {
> -       volatile uint32_t status;  /**< OUT: event status */
> +       uint32_t status;           /**< OUT: event status */
>         int fd;                    /**< OUT: event fd */
>         int epfd;       /**< OUT: epoll instance the ev associated with */
>         struct rte_epoll_data epdata;
> diff --git a/lib/librte_eal/linux/eal_interrupts.c b/lib/librte_eal/linux/eal_interrupts.c
> index 84eeaa1..7a50869 100644
> --- a/lib/librte_eal/linux/eal_interrupts.c
> +++ b/lib/librte_eal/linux/eal_interrupts.c
> @@ -26,7 +26,6 @@
>  #include <rte_eal.h>
>  #include <rte_per_lcore.h>
>  #include <rte_lcore.h>
> -#include <rte_atomic.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_debug.h>
>  #include <rte_log.h>
> @@ -1221,11 +1220,18 @@ eal_epoll_process_event(struct epoll_event *evs, unsigned int n,
>  {
>         unsigned int i, count = 0;
>         struct rte_epoll_event *rev;
> +       uint32_t valid_status;
>
>         for (i = 0; i < n; i++) {
>                 rev = evs[i].data.ptr;
> -               if (!rev || !rte_atomic32_cmpset(&rev->status, RTE_EPOLL_VALID,
> -                                                RTE_EPOLL_EXEC))
> +               valid_status =  RTE_EPOLL_VALID;
> +               /* ACQUIRE memory ordering here pairs with RELEASE
> +                * ordering bellow acting as a lock to synchronize
s/bellow/below

> +                * the event data updating.
> +                */
> +               if (!rev || !__atomic_compare_exchange_n(&rev->status,
> +                                   &valid_status, RTE_EPOLL_EXEC, 0,
> +                                   __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
>                         continue;
>
>                 events[count].status        = RTE_EPOLL_VALID;
> @@ -1237,8 +1243,11 @@ eal_epoll_process_event(struct epoll_event *evs, unsigned int n,
>                         rev->epdata.cb_fun(rev->fd,
>                                            rev->epdata.cb_arg);
>
> -               rte_compiler_barrier();
> -               rev->status = RTE_EPOLL_VALID;
> +               /* the status update should be observed after
> +                * the other fields changes.
s/fields changes/fields change/

Thanks,
Stefan.

> +                */
> +               __atomic_store_n(&rev->status, RTE_EPOLL_VALID,
> +                               __ATOMIC_RELEASE);
>                 count++;
>         }
>         return count;
> @@ -1308,10 +1317,14 @@ rte_epoll_wait(int epfd, struct rte_epoll_event *events,
>  static inline void
>  eal_epoll_data_safe_free(struct rte_epoll_event *ev)
>  {
> -       while (!rte_atomic32_cmpset(&ev->status, RTE_EPOLL_VALID,
> -                                   RTE_EPOLL_INVALID))
> -               while (ev->status != RTE_EPOLL_VALID)
> +       uint32_t valid_status = RTE_EPOLL_VALID;
> +       while (!__atomic_compare_exchange_n(&ev->status, &valid_status,
> +                   RTE_EPOLL_INVALID, 0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
> +               while (__atomic_load_n(&ev->status,
> +                               __ATOMIC_RELAXED) != RTE_EPOLL_VALID)
>                         rte_pause();
> +               valid_status = RTE_EPOLL_VALID;
> +       }
>         memset(&ev->epdata, 0, sizeof(ev->epdata));
>         ev->fd = -1;
>         ev->epfd = -1;
> @@ -1333,7 +1346,8 @@ rte_epoll_ctl(int epfd, int op, int fd,
>                 epfd = rte_intr_tls_epfd();
>
>         if (op == EPOLL_CTL_ADD) {
> -               event->status = RTE_EPOLL_VALID;
> +               __atomic_store_n(&event->status, RTE_EPOLL_VALID,
> +                               __ATOMIC_RELAXED);
>                 event->fd = fd;  /* ignore fd in event */
>                 event->epfd = epfd;
>                 ev.data.ptr = (void *)event;
> @@ -1345,11 +1359,13 @@ rte_epoll_ctl(int epfd, int op, int fd,
>                         op, fd, strerror(errno));
>                 if (op == EPOLL_CTL_ADD)
>                         /* rollback status when CTL_ADD fail */
> -                       event->status = RTE_EPOLL_INVALID;
> +                       __atomic_store_n(&event->status, RTE_EPOLL_INVALID,
> +                                       __ATOMIC_RELAXED);
>                 return -1;
>         }
>
> -       if (op == EPOLL_CTL_DEL && event->status != RTE_EPOLL_INVALID)
> +       if (op == EPOLL_CTL_DEL && __atomic_load_n(&event->status,
> +                       __ATOMIC_RELAXED) != RTE_EPOLL_INVALID)
>                 eal_epoll_data_safe_free(event);
>
>         return 0;
> @@ -1378,7 +1394,8 @@ rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd,
>         case RTE_INTR_EVENT_ADD:
>                 epfd_op = EPOLL_CTL_ADD;
>                 rev = &intr_handle->elist[efd_idx];
> -               if (rev->status != RTE_EPOLL_INVALID) {
> +               if (__atomic_load_n(&rev->status,
> +                               __ATOMIC_RELAXED) != RTE_EPOLL_INVALID) {
>                         RTE_LOG(INFO, EAL, "Event already been added.\n");
>                         return -EEXIST;
>                 }
> @@ -1401,7 +1418,8 @@ rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd,
>         case RTE_INTR_EVENT_DEL:
>                 epfd_op = EPOLL_CTL_DEL;
>                 rev = &intr_handle->elist[efd_idx];
> -               if (rev->status == RTE_EPOLL_INVALID) {
> +               if (__atomic_load_n(&rev->status,
> +                               __ATOMIC_RELAXED) == RTE_EPOLL_INVALID) {
>                         RTE_LOG(INFO, EAL, "Event does not exist.\n");
>                         return -EPERM;
>                 }
> @@ -1426,12 +1444,12 @@ rte_intr_free_epoll_fd(struct rte_intr_handle *intr_handle)
>
>         for (i = 0; i < intr_handle->nb_efd; i++) {
>                 rev = &intr_handle->elist[i];
> -               if (rev->status == RTE_EPOLL_INVALID)
> +               if (__atomic_load_n(&rev->status,
> +                               __ATOMIC_RELAXED) == RTE_EPOLL_INVALID)
>                         continue;
>                 if (rte_epoll_ctl(rev->epfd, EPOLL_CTL_DEL, rev->fd, rev)) {
>                         /* force free if the entry valid */
>                         eal_epoll_data_safe_free(rev);
> -                       rev->status = RTE_EPOLL_INVALID;
>                 }
>         }
>  }
> --
> 2.7.4
>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] devtools: fix ninja break under default DESTDIR path
@ 2020-07-09  6:53  4% Phil Yang
  0 siblings, 0 replies; 200+ results
From: Phil Yang @ 2020-07-09  6:53 UTC (permalink / raw)
  To: david.marchand, dev; +Cc: Honnappa.Nagarahalli, Ruifeng.Wang, nd

If DPDK_ABI_REF_DIR is not set, the default DESTDIR is a relative path.
This will break ninja in the ABI check test.

Fixes: 777014e56d07 ("devtools: add ABI checks")

Signed-off-by: Phil Yang <phil.yang@arm.com>
---
 devtools/test-meson-builds.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/devtools/test-meson-builds.sh b/devtools/test-meson-builds.sh
index a87de63..2bfcaca 100755
--- a/devtools/test-meson-builds.sh
+++ b/devtools/test-meson-builds.sh
@@ -143,7 +143,7 @@ build () # <directory> <target compiler | cross file> <meson options>
 	config $srcdir $builds_dir/$targetdir $cross --werror $*
 	compile $builds_dir/$targetdir
 	if [ -n "$DPDK_ABI_REF_VERSION" ]; then
-		abirefdir=${DPDK_ABI_REF_DIR:-reference}/$DPDK_ABI_REF_VERSION
+		abirefdir=${DPDK_ABI_REF_DIR:-$(pwd)/reference}/$DPDK_ABI_REF_VERSION
 		if [ ! -d $abirefdir/$targetdir ]; then
 			# clone current sources
 			if [ ! -d $abirefdir/src ]; then
-- 
2.7.4


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2] eal: use c11 atomic built-ins for interrupt status
    @ 2020-07-09  6:46  3% ` Phil Yang
  2020-07-09  8:02  0%   ` Stefan Puiu
  2020-07-09  8:34  2%   ` [dpdk-dev] [PATCH v3] " Phil Yang
  1 sibling, 2 replies; 200+ results
From: Phil Yang @ 2020-07-09  6:46 UTC (permalink / raw)
  To: david.marchand, dev
  Cc: mdr, aconole, drc, Honnappa.Nagarahalli, Ruifeng.Wang, nd, dodji,
	nhorman, hkalra

The event status is defined as a volatile variable and shared between
threads. Use c11 atomic built-ins with explicit ordering instead of
rte_atomic ops which enforce unnecessary barriers on aarch64.

The event status has been cleaned up by the compare-and-swap operation
when we free the event data, so there is no need to set it to invalid
after that.
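
For clarity, a hedged sketch of the resulting idiom (the helper name
process_one_event is an assumption; the types and constants are the existing
ones from rte_interrupts.h / eal_interrupts.c):

    #include <rte_interrupts.h>

    /* Claim the event with an ACQUIRE compare-exchange, update the event
     * data, then release it by restoring RTE_EPOLL_VALID with RELEASE
     * ordering so the data updates are observed first.
     */
    static void
    process_one_event(struct rte_epoll_event *rev)
    {
        uint32_t expected = RTE_EPOLL_VALID;

        if (!__atomic_compare_exchange_n(&rev->status, &expected,
                RTE_EPOLL_EXEC, 0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
            return;    /* event is not in a valid state, skip it */

        /* ... read/update rev->fd, rev->epfd, rev->epdata here ... */

        __atomic_store_n(&rev->status, RTE_EPOLL_VALID, __ATOMIC_RELEASE);
    }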

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Harman Kalra <hkalra@marvell.com>
---
v2:
1. Fixed typo.
2. Updated libabigail.abignore to pass ABI check.
3. Merged v1 two patches into one patch.

 devtools/libabigail.abignore                |  4 +++
 lib/librte_eal/include/rte_eal_interrupts.h |  2 +-
 lib/librte_eal/linux/eal_interrupts.c       | 48 ++++++++++++++++++++---------
 3 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 0133f75..daa4631 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -48,6 +48,10 @@
         changed_enumerators = RTE_CRYPTO_AEAD_LIST_END
 [suppress_variable]
         name = rte_crypto_aead_algorithm_strings
+; Ignore updates of epoll event
+[suppress_type]
+        type_kind = struct
+        name = rte_epoll_event
 
 ;;;;;;;;;;;;;;;;;;;;;;
 ; Temporary exceptions till DPDK 20.11
diff --git a/lib/librte_eal/include/rte_eal_interrupts.h b/lib/librte_eal/include/rte_eal_interrupts.h
index 773a34a..b1e8a29 100644
--- a/lib/librte_eal/include/rte_eal_interrupts.h
+++ b/lib/librte_eal/include/rte_eal_interrupts.h
@@ -59,7 +59,7 @@ enum {
 
 /** interrupt epoll event obj, taken by epoll_event.ptr */
 struct rte_epoll_event {
-	volatile uint32_t status;  /**< OUT: event status */
+	uint32_t status;           /**< OUT: event status */
 	int fd;                    /**< OUT: event fd */
 	int epfd;       /**< OUT: epoll instance the ev associated with */
 	struct rte_epoll_data epdata;
diff --git a/lib/librte_eal/linux/eal_interrupts.c b/lib/librte_eal/linux/eal_interrupts.c
index 84eeaa1..7a50869 100644
--- a/lib/librte_eal/linux/eal_interrupts.c
+++ b/lib/librte_eal/linux/eal_interrupts.c
@@ -26,7 +26,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_debug.h>
 #include <rte_log.h>
@@ -1221,11 +1220,18 @@ eal_epoll_process_event(struct epoll_event *evs, unsigned int n,
 {
 	unsigned int i, count = 0;
 	struct rte_epoll_event *rev;
+	uint32_t valid_status;
 
 	for (i = 0; i < n; i++) {
 		rev = evs[i].data.ptr;
-		if (!rev || !rte_atomic32_cmpset(&rev->status, RTE_EPOLL_VALID,
-						 RTE_EPOLL_EXEC))
+		valid_status =  RTE_EPOLL_VALID;
+		/* ACQUIRE memory ordering here pairs with RELEASE
+		 * ordering bellow acting as a lock to synchronize
+		 * the event data updating.
+		 */
+		if (!rev || !__atomic_compare_exchange_n(&rev->status,
+				    &valid_status, RTE_EPOLL_EXEC, 0,
+				    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
 			continue;
 
 		events[count].status        = RTE_EPOLL_VALID;
@@ -1237,8 +1243,11 @@ eal_epoll_process_event(struct epoll_event *evs, unsigned int n,
 			rev->epdata.cb_fun(rev->fd,
 					   rev->epdata.cb_arg);
 
-		rte_compiler_barrier();
-		rev->status = RTE_EPOLL_VALID;
+		/* the status update should be observed after
+		 * the other fields changes.
+		 */
+		__atomic_store_n(&rev->status, RTE_EPOLL_VALID,
+				__ATOMIC_RELEASE);
 		count++;
 	}
 	return count;
@@ -1308,10 +1317,14 @@ rte_epoll_wait(int epfd, struct rte_epoll_event *events,
 static inline void
 eal_epoll_data_safe_free(struct rte_epoll_event *ev)
 {
-	while (!rte_atomic32_cmpset(&ev->status, RTE_EPOLL_VALID,
-				    RTE_EPOLL_INVALID))
-		while (ev->status != RTE_EPOLL_VALID)
+	uint32_t valid_status = RTE_EPOLL_VALID;
+	while (!__atomic_compare_exchange_n(&ev->status, &valid_status,
+		    RTE_EPOLL_INVALID, 0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+		while (__atomic_load_n(&ev->status,
+				__ATOMIC_RELAXED) != RTE_EPOLL_VALID)
 			rte_pause();
+		valid_status = RTE_EPOLL_VALID;
+	}
 	memset(&ev->epdata, 0, sizeof(ev->epdata));
 	ev->fd = -1;
 	ev->epfd = -1;
@@ -1333,7 +1346,8 @@ rte_epoll_ctl(int epfd, int op, int fd,
 		epfd = rte_intr_tls_epfd();
 
 	if (op == EPOLL_CTL_ADD) {
-		event->status = RTE_EPOLL_VALID;
+		__atomic_store_n(&event->status, RTE_EPOLL_VALID,
+				__ATOMIC_RELAXED);
 		event->fd = fd;  /* ignore fd in event */
 		event->epfd = epfd;
 		ev.data.ptr = (void *)event;
@@ -1345,11 +1359,13 @@ rte_epoll_ctl(int epfd, int op, int fd,
 			op, fd, strerror(errno));
 		if (op == EPOLL_CTL_ADD)
 			/* rollback status when CTL_ADD fail */
-			event->status = RTE_EPOLL_INVALID;
+			__atomic_store_n(&event->status, RTE_EPOLL_INVALID,
+					__ATOMIC_RELAXED);
 		return -1;
 	}
 
-	if (op == EPOLL_CTL_DEL && event->status != RTE_EPOLL_INVALID)
+	if (op == EPOLL_CTL_DEL && __atomic_load_n(&event->status,
+			__ATOMIC_RELAXED) != RTE_EPOLL_INVALID)
 		eal_epoll_data_safe_free(event);
 
 	return 0;
@@ -1378,7 +1394,8 @@ rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd,
 	case RTE_INTR_EVENT_ADD:
 		epfd_op = EPOLL_CTL_ADD;
 		rev = &intr_handle->elist[efd_idx];
-		if (rev->status != RTE_EPOLL_INVALID) {
+		if (__atomic_load_n(&rev->status,
+				__ATOMIC_RELAXED) != RTE_EPOLL_INVALID) {
 			RTE_LOG(INFO, EAL, "Event already been added.\n");
 			return -EEXIST;
 		}
@@ -1401,7 +1418,8 @@ rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd,
 	case RTE_INTR_EVENT_DEL:
 		epfd_op = EPOLL_CTL_DEL;
 		rev = &intr_handle->elist[efd_idx];
-		if (rev->status == RTE_EPOLL_INVALID) {
+		if (__atomic_load_n(&rev->status,
+				__ATOMIC_RELAXED) == RTE_EPOLL_INVALID) {
 			RTE_LOG(INFO, EAL, "Event does not exist.\n");
 			return -EPERM;
 		}
@@ -1426,12 +1444,12 @@ rte_intr_free_epoll_fd(struct rte_intr_handle *intr_handle)
 
 	for (i = 0; i < intr_handle->nb_efd; i++) {
 		rev = &intr_handle->elist[i];
-		if (rev->status == RTE_EPOLL_INVALID)
+		if (__atomic_load_n(&rev->status,
+				__ATOMIC_RELAXED) == RTE_EPOLL_INVALID)
 			continue;
 		if (rte_epoll_ctl(rev->epfd, EPOLL_CTL_DEL, rev->fd, rev)) {
 			/* force free if the entry valid */
 			eal_epoll_data_safe_free(rev);
-			rev->status = RTE_EPOLL_INVALID;
 		}
 	}
 }
-- 
2.7.4


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 2/3] ring: remove experimental tag for ring element APIs
    2020-07-09  6:12  3%   ` [dpdk-dev] [PATCH v2 1/3] ring: remove experimental tag for ring reset API Feifei Wang
@ 2020-07-09  6:12  3%   ` Feifei Wang
  1 sibling, 0 replies; 200+ results
From: Feifei Wang @ 2020-07-09  6:12 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Konstantin Ananyev, Ray Kinsella, Neil Horman
  Cc: dev, nd, Ruifeng.wang, Feifei Wang

Remove the experimental tag for rte_ring_xxx_elem APIs that have been
around for 2 releases.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
v2:
1. add the changed API into DPDK_21 ABI in the map file. (Ray)

 lib/librte_ring/rte_ring.h           |  5 +----
 lib/librte_ring/rte_ring_elem.h      |  8 --------
 lib/librte_ring/rte_ring_version.map | 10 ++--------
 3 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 7181c33b4..35f3f8c42 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -40,6 +40,7 @@ extern "C" {
 #endif
 
 #include <rte_ring_core.h>
+#include <rte_ring_elem.h>
 
 /**
  * Calculate the memory size needed for a ring
@@ -401,10 +402,6 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			RTE_RING_SYNC_ST, free_space);
 }
 
-#ifdef ALLOW_EXPERIMENTAL_API
-#include <rte_ring_elem.h>
-#endif
-
 /**
  * Enqueue several objects on a ring.
  *
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 9e5192ae6..69dc51746 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -23,9 +23,6 @@ extern "C" {
 #include <rte_ring_core.h>
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Calculate the memory size needed for a ring with given element size
  *
  * This function returns the number of bytes needed for a ring, given
@@ -43,13 +40,9 @@ extern "C" {
  *   - -EINVAL - esize is not a multiple of 4 or count provided is not a
  *		 power of 2.
  */
-__rte_experimental
 ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Create a new ring named *name* that stores elements with given size.
  *
  * This function uses ``memzone_reserve()`` to allocate memory. Then it
@@ -109,7 +102,6 @@ ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
  *    - EEXIST - a memzone with the same name already exists
  *    - ENOMEM - no appropriate memory area found in which to create memzone
  */
-__rte_experimental
 struct rte_ring *rte_ring_create_elem(const char *name, unsigned int esize,
 			unsigned int count, int socket_id, unsigned int flags);
 
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index 9a6ce4d32..ac392f3ca 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -15,13 +15,7 @@ DPDK_20.0 {
 DPDK_21 {
 	global:
 
-	rte_ring_reset;
-} DPDK_20.0;
-
-EXPERIMENTAL {
-	global:
-
-	# added in 20.02
 	rte_ring_create_elem;
 	rte_ring_get_memsize_elem;
-};
+	rte_ring_reset;
+} DPDK_20.0;
-- 
2.17.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 1/3] ring: remove experimental tag for ring reset API
  @ 2020-07-09  6:12  3%   ` Feifei Wang
  2020-07-09  6:12  3%   ` [dpdk-dev] [PATCH v2 2/3] ring: remove experimental tag for ring element APIs Feifei Wang
  1 sibling, 0 replies; 200+ results
From: Feifei Wang @ 2020-07-09  6:12 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Konstantin Ananyev, Ray Kinsella, Neil Horman
  Cc: dev, nd, Ruifeng.wang, Feifei Wang

Remove the experimental tag for the rte_ring_reset API that has been around
for 4 releases.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
v2:
1. add the changed API into DPDK_21 ABI in the map file. (Ray)

 lib/librte_ring/rte_ring.h           | 3 ---
 lib/librte_ring/rte_ring_version.map | 7 +++++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index f67141482..7181c33b4 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -663,15 +663,12 @@ rte_ring_dequeue(struct rte_ring *r, void **obj_p)
  *
  * This function flush all the elements in a ring
  *
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * @warning
  * Make sure the ring is not in use while calling this function.
  *
  * @param r
  *   A pointer to the ring structure.
  */
-__rte_experimental
 void
 rte_ring_reset(struct rte_ring *r);
 
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index e88c143cf..9a6ce4d32 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -12,11 +12,14 @@ DPDK_20.0 {
 	local: *;
 };
 
-EXPERIMENTAL {
+DPDK_21 {
 	global:
 
-	# added in 19.08
 	rte_ring_reset;
+} DPDK_20.0;
+
+EXPERIMENTAL {
+	global:
 
 	# added in 20.02
 	rte_ring_create_elem;
-- 
2.17.1


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 2/2] eal: use c11 atomics for interrupt status
  2020-07-08 15:04  0%     ` Kinsella, Ray
@ 2020-07-09  5:21  0%       ` Phil Yang
  0 siblings, 0 replies; 200+ results
From: Phil Yang @ 2020-07-09  5:21 UTC (permalink / raw)
  To: Kinsella, Ray, David Marchand, Aaron Conole
  Cc: dev, David Christensen, Honnappa Nagarahalli, Ruifeng Wang, nd,
	Dodji Seketeli, Neil Horman, Harman Kalra

> -----Original Message-----
> From: Kinsella, Ray <mdr@ashroe.eu>
> Sent: Wednesday, July 8, 2020 11:05 PM
> To: David Marchand <david.marchand@redhat.com>; Phil Yang
> <Phil.Yang@arm.com>; Aaron Conole <aconole@redhat.com>
> Cc: dev <dev@dpdk.org>; David Christensen <drc@linux.vnet.ibm.com>;
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>; Dodji Seketeli
> <dodji@redhat.com>; Neil Horman <nhorman@tuxdriver.com>; Harman
> Kalra <hkalra@marvell.com>
> Subject: Re: [dpdk-dev] [PATCH 2/2] eal: use c11 atomics for interrupt status
> 
> 
> 
> On 08/07/2020 13:29, David Marchand wrote:
> > On Thu, Jun 11, 2020 at 12:25 PM Phil Yang <phil.yang@arm.com> wrote:
> >>
> >> The event status is defined as a volatile variable and shared
> >> between threads. Use c11 atomics with explicit ordering instead
> >> of rte_atomic ops which enforce unnecessary barriers on aarch64.
> >>
> >> Signed-off-by: Phil Yang <phil.yang@arm.com>
> >> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >> ---
> >>  lib/librte_eal/include/rte_eal_interrupts.h |  2 +-
> >>  lib/librte_eal/linux/eal_interrupts.c       | 47 ++++++++++++++++++++----
> -----
> >>  2 files changed, 34 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/lib/librte_eal/include/rte_eal_interrupts.h
> b/lib/librte_eal/include/rte_eal_interrupts.h
> >> index 773a34a..b1e8a29 100644
> >> --- a/lib/librte_eal/include/rte_eal_interrupts.h
> >> +++ b/lib/librte_eal/include/rte_eal_interrupts.h
> >> @@ -59,7 +59,7 @@ enum {
> >>
> >>  /** interrupt epoll event obj, taken by epoll_event.ptr */
> >>  struct rte_epoll_event {
> >> -       volatile uint32_t status;  /**< OUT: event status */
> >> +       uint32_t status;           /**< OUT: event status */
> >>         int fd;                    /**< OUT: event fd */
> >>         int epfd;       /**< OUT: epoll instance the ev associated with */
> >>         struct rte_epoll_data epdata;
> >
> > I got a reject from the ABI check in my env.
> >
> > 1 function with some indirect sub-type change:
> >
> >   [C]'function int rte_pci_ioport_map(rte_pci_device*, int,
> > rte_pci_ioport*)' at pci.c:756:1 has some indirect sub-type changes:
> >     parameter 1 of type 'rte_pci_device*' has sub-type changes:
> >       in pointed to type 'struct rte_pci_device' at rte_bus_pci.h:57:1:
> >         type size hasn't changed
> >         1 data member changes (2 filtered):
> >          type of 'rte_intr_handle rte_pci_device::intr_handle' changed:
> >            type size hasn't changed
> >            1 data member change:
> >             type of 'rte_epoll_event rte_intr_handle::elist[512]' changed:
> >               array element type 'struct rte_epoll_event' changed:
> >                 type size hasn't changed
> >                 1 data member change:
> >                  type of 'volatile uint32_t rte_epoll_event::status' changed:
> >                    entity changed from 'volatile uint32_t' to 'typedef
> > uint32_t' at stdint-uintn.h:26:1
> >                    type size hasn't changed
> >
> >               type size hasn't changed
> >
> >
> > This is probably harmless in our case (going from volatile to non
> > volatile), but it won't pass the check in the CI without an exception
> > rule.
> >
> > Note: checking on the test-report ml, I saw nothing, but ovsrobot did
> > catch the issue with this change too, Aaron?
> >
> >
> Agreed, probably harmless and requires something in libagigail.ignore.

OK. Will update libagigail.ignore in the next version.

Thanks,
Phil


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-08 15:47  2% ` [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
@ 2020-07-08 16:05  0%   ` Slava Ovsiienko
  0 siblings, 0 replies; 200+ results
From: Slava Ovsiienko @ 2020-07-08 16:05 UTC (permalink / raw)
  To: Slava Ovsiienko, dev
  Cc: Matan Azrad, Raslan Darawsheh, olivier.matz, bernard.iremonger,
	thomas, mb

> promote Acked-by from previous patch version to maintain patchwork status accordingly

Acked-by: Olivier Matz <olivier.matz@6wind.com>

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Viacheslav Ovsiienko
> Sent: Wednesday, July 8, 2020 18:47
> To: dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; olivier.matz@6wind.com;
> bernard.iremonger@intel.com; thomas@monjalon.com;
> mb@smartsharesystems.com
> Subject: [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx
> scheduling
> 
> There is the requirement on some networks for precise traffic timing
> management. The ability to send (and, generally speaking, receive) the
> packets at the very precisely specified moment of time provides the
> opportunity to support the connections with Time Division Multiplexing using
> the contemporary general purpose NIC without involving an auxiliary
> hardware. For example, the supporting of O-RAN Fronthaul interface is one
> of the promising features for potentially usage of the precise time
> management for the egress packets.
> 
> The main objective of this RFC is to specify the way how applications can
> provide the moment of time at what the packet transmission must be started
> and to describe in preliminary the supporting this feature from
> mlx5 PMD side.
> 
> The new dynamic timestamp field is proposed, it provides some timing
> information, the units and time references (initial phase) are not explicitly
> defined but are maintained always the same for a given port.
> Some devices allow to query rte_eth_read_clock() that will return the current
> device timestamp. The dynamic timestamp flag tells whether the field
> contains actual timestamp value. For the packets being sent this value can be
> used by PMD to schedule packet sending.
> 
> The device clock is opaque entity, the units and frequency are vendor specific
> and might depend on hardware capabilities and configurations. If might (or
> not) be synchronized with real time via PTP, might (or not) be synchronous
> with CPU clock (for example if NIC and CPU share the same clock source
> there might be no any drift between the NIC and CPU clocks), etc.
> 
> After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation and
> obsoleting, these dynamic flag and field will be used to manage the
> timestamps on receiving datapath as well. Having the dedicated flags for
> Rx/Tx timestamps allows applications not to perform explicit flags reset on
> forwarding and not to promote received timestamps to the transmitting
> datapath by default. The static PKT_RX_TIMESTAMP is considered as
> candidate to become the dynamic flag.
> 
> When PMD sees the "rte_dynfield_timestamp" set on the packet being sent it
> tries to synchronize the time of packet appearing on the wire with the
> specified packet timestamp. If the specified one is in the past it should be
> ignored, if one is in the distant future it should be capped with some
> reasonable value (in range of seconds). These specific cases ("too late" and
> "distant future") can be optionally reported via device xstats to assist
> applications to detect the time-related problems.
> 
> There is no any packet reordering according timestamps is supposed, neither
> within packet burst, nor between packets, it is an entirely application
> responsibility to generate packets and its timestamps in desired order. The
> timestamps can be put only in the first packet in the burst providing the
> entire burst scheduling.
> 
> PMD reports the ability to synchronize packet sending on timestamp with
> new offload flag:
> 
> This is palliative and is going to be replaced with new eth_dev API about
> reporting/managing the supported dynamic flags and its related features.
> This API would break ABI compatibility and can't be introduced at the
> moment, so is postponed to 20.11.
> 
> For testing purposes it is proposed to update testpmd "txonly"
> forwarding mode routine. With this update testpmd application generates
> the packets and sets the dynamic timestamps according to specified time
> pattern if it sees the "rte_dynfield_timestamp" is registered.
> 
> The new testpmd command is proposed to configure sending pattern:
> 
> set tx_times <burst_gap>,<intra_gap>
> 
> <intra_gap> - the delay between the packets within the burst
>               specified in the device clock units. The number
>               of packets in the burst is defined by txburst parameter
> 
> <burst_gap> - the delay between the bursts in the device clock units
> 
> As the result the bursts of packet will be transmitted with specific delays
> between the packets within the burst and specific delay between the bursts.
> The rte_eth_get_clock is supposed to be engaged to get the current device
> clock value and provide the reference for the timestamps.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> 
> ---
>   v1->v4:
>      - dedicated dynamic Tx timestamp flag instead of shared with Rx
>   v4->v5:
>      - elaborated commit message
>      - more words about device clocks added,
>      - note about dedicated Rx/Tx timestamp flags added
> 
> ---
>  lib/librte_ethdev/rte_ethdev.c |  1 +
>  lib/librte_ethdev/rte_ethdev.h |  4 ++++  lib/librte_mbuf/rte_mbuf_dyn.h |
> 31 +++++++++++++++++++++++++++++++
>  3 files changed, 36 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 8e10a6f..02157d5 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
>  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
>  };
> 
>  #undef RTE_TX_OFFLOAD_BIT2STR
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index a49242b..6f6454c 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
>  /** Device supports outer UDP checksum */  #define
> DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
> 
> +/** Device supports send on timestamp */ #define
> +DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> +
> +
>  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
> /**< Device supports Rx queue setup after device started*/  #define
> RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002 diff --git
> a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h index
> 96c3631..8407230 100644
> --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> @@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
> #define RTE_MBUF_DYNFIELD_METADATA_NAME
> "rte_flow_dynfield_metadata"
>  #define RTE_MBUF_DYNFLAG_METADATA_NAME
> "rte_flow_dynflag_metadata"
> 
> +/**
> + * The timestamp dynamic field provides some timing information, the
> + * units and time references (initial phase) are not explicitly defined
> + * but are maintained always the same for a given port. Some devices
> +allow
> + * to query rte_eth_read_clock() that will return the current device
> + * timestamp. The dynamic Tx timestamp flag tells whether the field
> +contains
> + * actual timestamp value for the packets being sent, this value can be
> + * used by PMD to schedule packet sending.
> + *
> + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> + * and obsoleting, the dedicated Rx timestamp flag is supposed to be
> + * introduced and the shared dynamic timestamp field will be used
> + * to handle the timestamps on receiving datapath as well.
> + */
> +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> "rte_dynfield_timestamp"
> +
> +/**
> + * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag
> set on the
> + * packet being sent it tries to synchronize the time of packet
> +appearing
> + * on the wire with the specified packet timestamp. If the specified
> +one
> + * is in the past it should be ignored, if one is in the distant future
> + * it should be capped with some reasonable value (in range of seconds).
> + *
> + * There is no any packet reordering according to timestamps is
> +supposed,
> + * neither for packet within the burst, nor for the whole bursts, it is
> + * an entirely application responsibility to generate packets and its
> + * timestamps in desired order. The timestamps might be put only in
> + * the first packet in the burst providing the entire burst scheduling.
> + */
> +#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME
> "rte_dynflag_tx_timestamp"
> +
>  #endif
> --
> 1.8.3.1


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Txscheduling
  2020-07-08 15:27  0%       ` Morten Brørup
@ 2020-07-08 15:51  0%         ` Slava Ovsiienko
  0 siblings, 0 replies; 200+ results
From: Slava Ovsiienko @ 2020-07-08 15:51 UTC (permalink / raw)
  To: Morten Brørup, dev
  Cc: Matan Azrad, Raslan Darawsheh, olivier.matz, bernard.iremonger, thomas

Hi, Morten

Addressed most of your comments in the v5 commit message.
The header file comments are close to becoming too wordy,
and I did not dare to elaborate on them further.

With best regards, Slava

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Wednesday, July 8, 2020 18:27
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; olivier.matz@6wind.com;
> bernard.iremonger@intel.com; thomas@mellanox.net
> Subject: RE: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet
> Txscheduling
> 
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Slava Ovsiienko
> > Sent: Wednesday, July 8, 2020 4:54 PM
> >
> > Hi, Morten
> >
> > Thank you for the comments. Please, see below.
> >
> > > -----Original Message-----
> > > From: Morten Brørup <mb@smartsharesystems.com>
> > > Sent: Wednesday, July 8, 2020 17:16
> > > To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> > > Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> > > <rasland@mellanox.com>; olivier.matz@6wind.com;
> > > bernard.iremonger@intel.com; thomas@mellanox.net
> > > Subject: RE: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate
> > packet
> > > Txscheduling
> > >
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Viacheslav
> > > > Ovsiienko
> > > > Sent: Tuesday, July 7, 2020 4:57 PM
> > > >
> > > > There is the requirement on some networks for precise traffic
> > timing
> > > > management. The ability to send (and, generally speaking, receive)
> > the
> > > > packets at the very precisely specified moment of time provides
> > > > the opportunity to support the connections with Time Division
> > Multiplexing
> > > > using the contemporary general purpose NIC without involving an
> > > > auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> > > > interface is one of the promising features for potentially usage
> > > > of the precise time management for the egress packets.
> > > >
> > > > The main objective of this RFC is to specify the way how
> > applications
> > > > can provide the moment of time at what the packet transmission
> > > > must
> > be
> > > > started and to describe in preliminary the supporting this feature
> > > > from
> > > > mlx5 PMD side.
> > > >
> > > > The new dynamic timestamp field is proposed, it provides some
> > timing
> > > > information, the units and time references (initial phase) are not
> > > > explicitly defined but are maintained always the same for a given
> > port.
> > > > Some devices allow to query rte_eth_read_clock() that will return
> > the
> > > > current device timestamp. The dynamic timestamp flag tells whether
> > the
> > > > field contains actual timestamp value. For the packets being sent
> > this
> > > > value can be used by PMD to schedule packet sending.
> > > >
> > > > After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> > and
> > > > obsoleting, these dynamic flag and field will be used to manage
> > > > the timestamps on receiving datapath as well.
> > > >
> > > > When PMD sees the "rte_dynfield_timestamp" set on the packet being
> > > > sent it tries to synchronize the time of packet appearing on the
> > wire
> > > > with the specified packet timestamp. If the specified one is in
> > > > the past it should be ignored, if one is in the distant future it
> > should
> > > > be capped with some reasonable value (in range of seconds). These
> > > > specific cases ("too late" and "distant future") can be optionally
> > > > reported via device xstats to assist applications to detect the
> > > > time-related problems.
> > > >
> > > > There is no any packet reordering according timestamps is
> > > > supposed, neither within packet burst, nor between packets, it is
> > > > an entirely application responsibility to generate packets and its
> > > > timestamps
> > in
> > > > desired order. The timestamps can be put only in the first packet
> > in
> > > > the burst providing the entire burst scheduling.
> > > >
> > > > PMD reports the ability to synchronize packet sending on timestamp
> > > > with new offload flag:
> > > >
> > > > This is palliative and is going to be replaced with new eth_dev
> > > > API about reporting/managing the supported dynamic flags and its
> > related
> > > > features. This API would break ABI compatibility and can't be
> > > > introduced at the moment, so is postponed to 20.11.
> > > >
> > > > For testing purposes it is proposed to update testpmd "txonly"
> > > > forwarding mode routine. With this update testpmd application
> > > > generates the packets and sets the dynamic timestamps according to
> > > > specified time pattern if it sees the "rte_dynfield_timestamp" is
> > registered.
> > > >
> > > > The new testpmd command is proposed to configure sending pattern:
> > > >
> > > > set tx_times <burst_gap>,<intra_gap>
> > > >
> > > > <intra_gap> - the delay between the packets within the burst
> > > >               specified in the device clock units. The number
> > > >               of packets in the burst is defined by txburst
> > parameter
> > > >
> > > > <burst_gap> - the delay between the bursts in the device clock
> > units
> > > >
> > > > As the result the bursts of packet will be transmitted with
> > specific
> > > > delays between the packets within the burst and specific delay
> > between
> > > > the bursts. The rte_eth_get_clock is supposed to be engaged to get
> > the
> > > > current device clock value and provide the reference for the
> > > > timestamps.
> > > >
> > > > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > > > ---
> > > >  v1->v4:
> > > >     - dedicated dynamic Tx timestamp flag instead of shared with
> > > > Rx
> > >
> > > The detailed description above should be updated to reflect that it
> > is now
> > > two flags.
> > OK
> >
> > >
> > > >     - Doxygen-style comment
> > > >     - comments update
> > > >
> > > > ---
> > > >  lib/librte_ethdev/rte_ethdev.c |  1 +
> > lib/librte_ethdev/rte_ethdev.h
> > > > |  4 ++++  lib/librte_mbuf/rte_mbuf_dyn.h | 31
> > > > +++++++++++++++++++++++++++++++
> > > >  3 files changed, 36 insertions(+)
> > > >
> > > > diff --git a/lib/librte_ethdev/rte_ethdev.c
> > > > b/lib/librte_ethdev/rte_ethdev.c index 8e10a6f..02157d5 100644
> > > > --- a/lib/librte_ethdev/rte_ethdev.c
> > > > +++ b/lib/librte_ethdev/rte_ethdev.c
> > > > @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
> > > >  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
> > > >  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
> > > >  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> > > > +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
> > > >  };
> > > >
> > > >  #undef RTE_TX_OFFLOAD_BIT2STR
> > > > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > > > b/lib/librte_ethdev/rte_ethdev.h index a49242b..6f6454c 100644
> > > > --- a/lib/librte_ethdev/rte_ethdev.h
> > > > +++ b/lib/librte_ethdev/rte_ethdev.h
> > > > @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
> > > >  /** Device supports outer UDP checksum */  #define
> > > > DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
> > > >
> > > > +/** Device supports send on timestamp */ #define
> > > > +DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> > > > +
> > > > +
> > > >  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP
> 0x00000001
> > > /**<
> > > > Device supports Rx queue setup after device started*/  #define
> > > > RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002 diff --
> git
> > > > a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> > > > index 96c3631..7e9f7d2 100644
> > > > --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> > > > +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> > > > @@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char
> *name,
> > > > #define RTE_MBUF_DYNFIELD_METADATA_NAME
> > > "rte_flow_dynfield_metadata"
> > > >  #define RTE_MBUF_DYNFLAG_METADATA_NAME
> > > "rte_flow_dynflag_metadata"
> > > >
> > > > +/**
> > > > + * The timestamp dynamic field provides some timing information,
> > the
> > > > + * units and time references (initial phase) are not explicitly
> > > > defined
> > > > + * but are maintained always the same for a given port. Some
> > devices
> > > > allow4
> > > > + * to query rte_eth_read_clock() that will return the current
> > device
> > > > + * timestamp. The dynamic Tx timestamp flag tells whether the
> > field
> > > > contains
> > > > + * actual timestamp value. For the packets being sent this value
> > can
> > > > be
> > > > + * used by PMD to schedule packet sending.
> > > > + *
> > > > + * After PKT_RX_TIMESTAMP flag and fixed timestamp field
> > deprecation
> > > > + * and obsoleting, the dedicated Rx timestamp flag is supposed to
> > be
> > > > + * introduced and the shared dynamic timestamp field will be used
> > > > + * to handle the timestamps on receiving datapath as well.
> > > > + */
> > > > +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> > > "rte_dynfield_timestamp"
> > >
> > > The description above should not say anything about the dynamic TX
> > > timestamp flag.
> > It does not. Or do you mean RX?
> > Not sure, field and flag are tightly coupled, it is nice to mention
> > this relation for better understanding.
> > And mentioning the RX explains why it is not like this:
> > RTE_MBUF_DYNFIELD_[TX]_TIMESTAMP_NAME
> 
> Sorry. I misunderstood its purpose!
> It's the name of the field, and the field will not only be used for TX, but in the
> future also for RX.
> (I thought it was the name of the RX flag, reserved for future use.)
> 
> >
> > >
> > > Please elaborate "some timing information", e.g. add "... about when
> > the
> > > packet was received".
> >
> > Sorry, I do not follow,  currently the dynamic field is not "about
> > when the packet was received". Now it is introduced for Tx only and
> > just the opportunity to be shared with Rx one in coming releases is
> > mentioned. "Some" means - not specified (herein) exactly.
> > And it is elaborated what Is not specified and how it is supposed to
> > use Tx timestamp.
> 
> It should be described when it is valid, and how it is being used, e.g. by
> adding a reference to the "rte_dynflag_tx_timestamp" flag.
> 
> > >
> > > > +
> > > > +/**
> > > > + * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME
> flag
> > > set on
> > > > the
> > > > + * packet being sent it tries to synchronize the time of packet
> > > > appearing
> > > > + * on the wire with the specified packet timestamp. If the
> > specified
> > > > one
> > > > + * is in the past it should be ignored, if one is in the distant
> > > > future
> > > > + * it should be capped with some reasonable value (in range of
> > > > seconds).
> > > > + *
> > > > + * There is no any packet reordering according to timestamps is
> > > > supposed,
> > > > + * neither for packet within the burst, nor for the whole bursts,
> > it
> > > > is
> > > > + * an entirely application responsibility to generate packets and
> > its
> > > > + * timestamps in desired order. The timestamps might be put only
> > in
> > > > + * the first packet in the burst providing the entire burst
> > > > scheduling.
> > > > + */
> > > > +#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME
> > > "rte_dynflag_tx_timestamp"
> > > > +
> > > >  #endif
> > > > --
> > > > 1.8.3.1
> > > >
> > >
> > > It may be worth adding some documentation about how the clocks of
> > > the NICs are out of sync with the clock of the CPU, and are all
> > > drifting
> > relatively.
> > >
> > > And those clocks are also out of sync with the actual time (NTP
> > clock).
> >
> > IMO, It is out of scope of this very generic patch.  As for mlx NICs -
> > the internal device clock might be (or might be not) synchronized with
> > PTP, it can provide timestamps in real nanoseconds in various formats
> > or just some free running counter.
> 
> Cool!
> 
> > On some systems the NIC and CPU might share the same clock source (for
> > their PLL inputs for example) and there will be no any drifts. As we
> > can see - it is a wide and interesting opic to discuss, but, IMO,  the
> > comment in header file might be not the most relevant place to do. As
> > for mlx5 devices clock specifics - it will be documented in PMD
> > chapter.
> >
> > OK, will add few generic words, the few ones - in order not to make
> > comment wordy, just point the direction for further thinking.
> 
> I agree - we don't want cookbooks in the header files. Only enough
> description to avoid the worst misunderstandings.
> 
> >
> > >
> > > Preferably, some sort of cookbook for handling this should be
> > provided.
> > > PCAP could be used as an example.
> > >
> > testpmd example is included in series, mlx5 PMD patch is prepared and
> > coming soon.
> 
> Great.
> 
> And I suppose that the more detailed cookbook/example - regarding offset
> and drift of various clocks - is probably more relevant for the RX side (for
> various PCAP applications), and thus completely unrelated to this patch.
> 
> >
> > With best regards, Slava


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling
                     ` (4 preceding siblings ...)
  2020-07-07 14:57  2% ` [dpdk-dev] [PATCH v4 " Viacheslav Ovsiienko
@ 2020-07-08 15:47  2% ` Viacheslav Ovsiienko
  2020-07-08 16:05  0%   ` Slava Ovsiienko
  2020-07-09 12:36  2% ` [dpdk-dev] [PATCH v6 " Viacheslav Ovsiienko
  6 siblings, 1 reply; 200+ results
From: Viacheslav Ovsiienko @ 2020-07-08 15:47 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas, mb

There is the requirement on some networks for precise traffic timing
management. The ability to send (and, generally speaking, receive)
the packets at the very precisely specified moment of time provides
the opportunity to support the connections with Time Division
Multiplexing using the contemporary general purpose NIC without involving
an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
interface is one of the promising features for potentially usage of the
precise time management for the egress packets.

The main objective of this RFC is to specify the way applications
can provide the moment of time at which the packet transmission must be
started and to give a preliminary description of how this feature is
supported on the mlx5 PMD side.

The new dynamic timestamp field is proposed, it provides some timing
information, the units and time references (initial phase) are not
explicitly defined but are maintained always the same for a given port.
Some devices allow to query rte_eth_read_clock() that will return
the current device timestamp. The dynamic timestamp flag tells whether
the field contains actual timestamp value. For the packets being sent
this value can be used by PMD to schedule packet sending.

The device clock is an opaque entity, the units and frequency are
vendor specific and might depend on hardware capabilities and
configurations. It might (or might not) be synchronized with real time
via PTP, and might (or might not) be synchronous with the CPU clock
(for example, if the NIC and CPU share the same clock source there
might be no drift between the NIC and CPU clocks), etc.

After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
and obsoleting, these dynamic flag and field will be used to manage
the timestamps on receiving datapath as well. Having the dedicated
flags for Rx/Tx timestamps allows applications not to perform explicit
flags reset on forwarding and not to promote received timestamps
to the transmitting datapath by default. The static PKT_RX_TIMESTAMP
is considered a candidate to become the dynamic flag.

When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
it tries to synchronize the time of packet appearing on the wire with
the specified packet timestamp. If the specified one is in the past it
should be ignored, if one is in the distant future it should be capped
with some reasonable value (in range of seconds). These specific cases
("too late" and "distant future") can be optionally reported via
device xstats to assist applications to detect the time-related
problems.

No packet reordering according to timestamps is supposed, neither
within a packet burst, nor between packets; it is entirely the
application's responsibility to generate packets and their timestamps
in the desired order. The timestamps can be put only in the first packet
in the burst, providing the entire burst scheduling.

PMD reports the ability to synchronize packet sending on timestamp
with new offload flag:

This is palliative and is going to be replaced with new eth_dev API
about reporting/managing the supported dynamic flags and its related
features. This API would break ABI compatibility and can't be introduced
at the moment, so is postponed to 20.11.

For testing purposes it is proposed to update testpmd "txonly"
forwarding mode routine. With this update testpmd application generates
the packets and sets the dynamic timestamps according to specified time
pattern if it sees the "rte_dynfield_timestamp" is registered.

The new testpmd command is proposed to configure sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst
              specified in the device clock units. The number
              of packets in the burst is defined by txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As a result the bursts of packets will be transmitted with specific
delays between the packets within the burst and a specific delay between
the bursts. The rte_eth_read_clock is supposed to be engaged to get the
current device clock value and provide the reference for the timestamps.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
  v1->v4:
     - dedicated dynamic Tx timestamp flag instead of shared with Rx
  v4->v5:
     - elaborated commit message
     - more words about device clocks added,
     - note about dedicated Rx/Tx timestamp flags added

---
 lib/librte_ethdev/rte_ethdev.c |  1 +
 lib/librte_ethdev/rte_ethdev.h |  4 ++++
 lib/librte_mbuf/rte_mbuf_dyn.h | 31 +++++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 8e10a6f..02157d5 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
 	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
+	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
 };
 
 #undef RTE_TX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index a49242b..6f6454c 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1178,6 +1178,10 @@ struct rte_eth_conf {
 /** Device supports outer UDP checksum */
 #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
 
+/** Device supports send on timestamp */
+#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
+
+
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /**< Device supports Rx queue setup after device started*/
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
index 96c3631..8407230 100644
--- a/lib/librte_mbuf/rte_mbuf_dyn.h
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
 #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
 #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
 
+/**
+ * The timestamp dynamic field provides some timing information, the
+ * units and time references (initial phase) are not explicitly defined
+ * but are maintained always the same for a given port. Some devices allow
+ * to query rte_eth_read_clock() that will return the current device
+ * timestamp. The dynamic Tx timestamp flag tells whether the field contains
+ * actual timestamp value for the packets being sent, this value can be
+ * used by PMD to schedule packet sending.
+ *
+ * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
+ * and obsoleting, the dedicated Rx timestamp flag is supposed to be
+ * introduced and the shared dynamic timestamp field will be used
+ * to handle the timestamps on receiving datapath as well.
+ */
+#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
+
+/**
+ * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on the
+ * packet being sent it tries to synchronize the time of packet appearing
+ * on the wire with the specified packet timestamp. If the specified one
+ * is in the past it should be ignored, if one is in the distant future
+ * it should be capped with some reasonable value (in range of seconds).
+ *
+ * There is no any packet reordering according to timestamps is supposed,
+ * neither for packet within the burst, nor for the whole bursts, it is
+ * an entirely application responsibility to generate packets and its
+ * timestamps in desired order. The timestamps might be put only in
+ * the first packet in the burst providing the entire burst scheduling.
+ */
+#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
+
 #endif
-- 
1.8.3.1


^ permalink raw reply	[relevance 2%]
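
The testpmd pattern described above ("set tx_times <burst_gap>,<intra_gap>")
can be expressed in a few lines of application code. The sketch below is
illustrative only: it reuses the tx_timestamp_set() helper shown earlier in
this thread and assumes the port supports rte_eth_read_clock(); the burst
gap is applied relative to the current device time, which is a slight
simplification of the testpmd behaviour.

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Send one burst with intra_gap device ticks between packets and the
 * first packet scheduled burst_gap ticks after the current device time. */
static void
send_scheduled_burst(uint16_t port, uint16_t queue, struct rte_mbuf **pkts,
		     uint16_t n, uint64_t burst_gap, uint64_t intra_gap)
{
	uint64_t now = 0;
	uint64_t when;
	uint16_t i;

	if (rte_eth_read_clock(port, &now) != 0)
		return; /* device clock not readable, cannot schedule */

	when = now + burst_gap;
	for (i = 0; i < n; i++) {
		tx_timestamp_set(pkts[i], when);
		when += intra_gap;
	}
	rte_eth_tx_burst(port, queue, pkts, n);
}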

* Re: [dpdk-dev] [PATCH v7 1/3] lib/lpm: integrate RCU QSBR
  2020-07-08 14:30  2%     ` David Marchand
@ 2020-07-08 15:34  5%       ` Ruifeng Wang
  0 siblings, 0 replies; 200+ results
From: Ruifeng Wang @ 2020-07-08 15:34 UTC (permalink / raw)
  To: David Marchand
  Cc: Bruce Richardson, Vladimir Medvedkin, John McNamara,
	Marko Kovacevic, Ray Kinsella, Neil Horman, dev, Ananyev,
	Konstantin, Honnappa Nagarahalli, nd, nd


> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Wednesday, July 8, 2020 10:30 PM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>
> Cc: Bruce Richardson <bruce.richardson@intel.com>; Vladimir Medvedkin
> <vladimir.medvedkin@intel.com>; John McNamara
> <john.mcnamara@intel.com>; Marko Kovacevic
> <marko.kovacevic@intel.com>; Ray Kinsella <mdr@ashroe.eu>; Neil Horman
> <nhorman@tuxdriver.com>; dev <dev@dpdk.org>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v7 1/3] lib/lpm: integrate RCU QSBR
> 
> On Tue, Jul 7, 2020 at 5:16 PM Ruifeng Wang <ruifeng.wang@arm.com>
> wrote:
> > diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h index
> > b9d49ac87..7889f21b3 100644
> > --- a/lib/librte_lpm/rte_lpm.h
> > +++ b/lib/librte_lpm/rte_lpm.h
> > @@ -1,5 +1,6 @@
> >  /* SPDX-License-Identifier: BSD-3-Clause
> >   * Copyright(c) 2010-2014 Intel Corporation
> > + * Copyright(c) 2020 Arm Limited
> >   */
> >
> >  #ifndef _RTE_LPM_H_
> > @@ -20,6 +21,7 @@
> >  #include <rte_memory.h>
> >  #include <rte_common.h>
> >  #include <rte_vect.h>
> > +#include <rte_rcu_qsbr.h>
> >
> >  #ifdef __cplusplus
> >  extern "C" {
> > @@ -62,6 +64,17 @@ extern "C" {
> >  /** Bitmask used to indicate successful lookup */
> >  #define RTE_LPM_LOOKUP_SUCCESS          0x01000000
> >
> > +/** @internal Default RCU defer queue entries to reclaim in one go. */
> > +#define RTE_LPM_RCU_DQ_RECLAIM_MAX     16
> > +
> > +/** RCU reclamation modes */
> > +enum rte_lpm_qsbr_mode {
> > +       /** Create defer queue for reclaim. */
> > +       RTE_LPM_QSBR_MODE_DQ = 0,
> > +       /** Use blocking mode reclaim. No defer queue created. */
> > +       RTE_LPM_QSBR_MODE_SYNC
> > +};
> > +
> >  #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> >  /** @internal Tbl24 entry structure. */  __extension__ @@ -130,6
> > +143,28 @@ struct rte_lpm {
> >                         __rte_cache_aligned; /**< LPM tbl24 table. */
> >         struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
> >         struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> > +#ifdef ALLOW_EXPERIMENTAL_API
> > +       /* RCU config. */
> > +       struct rte_rcu_qsbr *v;         /* RCU QSBR variable. */
> > +       enum rte_lpm_qsbr_mode rcu_mode;/* Blocking, defer queue. */
> > +       struct rte_rcu_qsbr_dq *dq;     /* RCU QSBR defer queue. */
> > +#endif
> > +};
> 
> I can see failures in travis reports for v7 and v6.
> I reproduced them in my env.
> 
> 1 function with some indirect sub-type change:
> 
>   [C]'function int rte_lpm_add(rte_lpm*, uint32_t, uint8_t, uint32_t)'
> at rte_lpm.c:764:1 has some indirect sub-type changes:
>     parameter 1 of type 'rte_lpm*' has sub-type changes:
>       in pointed to type 'struct rte_lpm' at rte_lpm.h:134:1:
>         type size hasn't changed
>         3 data member insertions:
>           'rte_rcu_qsbr* rte_lpm::v', at offset 536873600 (in bits) at
> rte_lpm.h:148:1
>           'rte_lpm_qsbr_mode rte_lpm::rcu_mode', at offset 536873664 (in bits)
> at rte_lpm.h:149:1
>           'rte_rcu_qsbr_dq* rte_lpm::dq', at offset 536873728 (in
> bits) at rte_lpm.h:150:1
> 
Sorry, I thought that if ALLOW_EXPERIMENTAL_API was added, the ABI would be kept when experimental API was not allowed by the user.
ABI and ALLOW_EXPERIMENTAL_API are two different things.

> 
> Going back to my proposal of hiding what does not need to be seen.
> 
> Disclaimer, *this is quick & dirty* but it builds and passes ABI check:
> 
> $ git diff
> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c index
> d498ba761..7109aef6a 100644
> --- a/lib/librte_lpm/rte_lpm.c
> +++ b/lib/librte_lpm/rte_lpm.c
I understand your proposal in v5 now. A new data structure encloses rte_lpm and the new members needed for RCU.
In this way, the rte_lpm ABI is kept. And we can move out other members of rte_lpm that do not need to be exposed in the 20.11 release.
I will fix the ABI issue in the next version.

> @@ -115,6 +115,15 @@ rte_lpm_find_existing(const char *name)
>         return l;
>  }
> 
> +struct internal_lpm {
> +       /* Public object */
> +       struct rte_lpm lpm;
> +       /* RCU config. */
> +       struct rte_rcu_qsbr *v;         /* RCU QSBR variable. */
> +       enum rte_lpm_qsbr_mode rcu_mode;/* Blocking, defer queue. */
> +       struct rte_rcu_qsbr_dq *dq;     /* RCU QSBR defer queue. */
> +};
> +
>  /*
>   * Allocates memory for LPM object
>   */
> @@ -123,6 +132,7 @@ rte_lpm_create(const char *name, int socket_id,
>                 const struct rte_lpm_config *config)  {
>         char mem_name[RTE_LPM_NAMESIZE];
> +       struct internal_lpm *internal = NULL;
>         struct rte_lpm *lpm = NULL;
>         struct rte_tailq_entry *te;
>         uint32_t mem_size, rules_size, tbl8s_size; @@ -141,12 +151,6 @@
> rte_lpm_create(const char *name, int socket_id,
> 
>         snprintf(mem_name, sizeof(mem_name), "LPM_%s", name);
> 
> -       /* Determine the amount of memory to allocate. */
> -       mem_size = sizeof(*lpm);
> -       rules_size = sizeof(struct rte_lpm_rule) * config->max_rules;
> -       tbl8s_size = (sizeof(struct rte_lpm_tbl_entry) *
> -                       RTE_LPM_TBL8_GROUP_NUM_ENTRIES * config-
> >number_tbl8s);
> -
>         rte_mcfg_tailq_write_lock();
> 
>         /* guarantee there's no existing */ @@ -170,16 +174,23 @@
> rte_lpm_create(const char *name, int socket_id,
>                 goto exit;
>         }
> 
> +       /* Determine the amount of memory to allocate. */
> +       mem_size = sizeof(*internal);
> +       rules_size = sizeof(struct rte_lpm_rule) * config->max_rules;
> +       tbl8s_size = (sizeof(struct rte_lpm_tbl_entry) *
> +                       RTE_LPM_TBL8_GROUP_NUM_ENTRIES *
> + config->number_tbl8s);
> +
>         /* Allocate memory to store the LPM data structures. */
> -       lpm = rte_zmalloc_socket(mem_name, mem_size,
> +       internal = rte_zmalloc_socket(mem_name, mem_size,
>                         RTE_CACHE_LINE_SIZE, socket_id);
> -       if (lpm == NULL) {
> +       if (internal == NULL) {
>                 RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
>                 rte_free(te);
>                 rte_errno = ENOMEM;
>                 goto exit;
>         }
> 
> +       lpm = &internal->lpm;
>         lpm->rules_tbl = rte_zmalloc_socket(NULL,
>                         (size_t)rules_size, RTE_CACHE_LINE_SIZE, socket_id);
> 
> @@ -226,6 +237,7 @@ rte_lpm_create(const char *name, int socket_id,
> void  rte_lpm_free(struct rte_lpm *lpm)  {
> +       struct internal_lpm *internal;
>         struct rte_lpm_list *lpm_list;
>         struct rte_tailq_entry *te;
> 
> @@ -247,8 +259,9 @@ rte_lpm_free(struct rte_lpm *lpm)
> 
>         rte_mcfg_tailq_write_unlock();
> 
> -       if (lpm->dq)
> -               rte_rcu_qsbr_dq_delete(lpm->dq);
> +       internal = container_of(lpm, struct internal_lpm, lpm);
> +       if (internal->dq != NULL)
> +               rte_rcu_qsbr_dq_delete(internal->dq);
>         rte_free(lpm->tbl8);
>         rte_free(lpm->rules_tbl);
>         rte_free(lpm);
> @@ -276,13 +289,15 @@ rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct
> rte_lpm_rcu_config *cfg,  {
>         char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
>         struct rte_rcu_qsbr_dq_parameters params = {0};
> +       struct internal_lpm *internal;
> 
> -       if ((lpm == NULL) || (cfg == NULL)) {
> +       if (lpm == NULL || cfg == NULL) {
>                 rte_errno = EINVAL;
>                 return 1;
>         }
> 
> -       if (lpm->v) {
> +       internal = container_of(lpm, struct internal_lpm, lpm);
> +       if (internal->v != NULL) {
>                 rte_errno = EEXIST;
>                 return 1;
>         }
> @@ -305,20 +320,19 @@ rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct
> rte_lpm_rcu_config *cfg,
>                 params.free_fn = __lpm_rcu_qsbr_free_resource;
>                 params.p = lpm;
>                 params.v = cfg->v;
> -               lpm->dq = rte_rcu_qsbr_dq_create(&params);
> -               if (lpm->dq == NULL) {
> -                       RTE_LOG(ERR, LPM,
> -                                       "LPM QS defer queue creation failed\n");
> +               internal->dq = rte_rcu_qsbr_dq_create(&params);
> +               if (internal->dq == NULL) {
> +                       RTE_LOG(ERR, LPM, "LPM QS defer queue creation
> failed\n");
>                         return 1;
>                 }
>                 if (dq)
> -                       *dq = lpm->dq;
> +                       *dq = internal->dq;
>         } else {
>                 rte_errno = EINVAL;
>                 return 1;
>         }
> -       lpm->rcu_mode = cfg->mode;
> -       lpm->v = cfg->v;
> +       internal->rcu_mode = cfg->mode;
> +       internal->v = cfg->v;
> 
>         return 0;
>  }
> @@ -502,12 +516,13 @@ _tbl8_alloc(struct rte_lpm *lpm)  static int32_t
> tbl8_alloc(struct rte_lpm *lpm)  {
> +       struct internal_lpm *internal = container_of(lpm, struct
> internal_lpm, lpm);
>         int32_t group_idx; /* tbl8 group index. */
> 
>         group_idx = _tbl8_alloc(lpm);
> -       if ((group_idx == -ENOSPC) && (lpm->dq != NULL)) {
> +       if (group_idx == -ENOSPC && internal->dq != NULL) {
>                 /* If there are no tbl8 groups try to reclaim one. */
> -               if (rte_rcu_qsbr_dq_reclaim(lpm->dq, 1, NULL, NULL, NULL) == 0)
> +               if (rte_rcu_qsbr_dq_reclaim(internal->dq, 1, NULL,
> NULL, NULL) == 0)
>                         group_idx = _tbl8_alloc(lpm);
>         }
> 
> @@ -518,20 +533,21 @@ static void
>  tbl8_free(struct rte_lpm *lpm, uint32_t tbl8_group_start)  {
>         struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> +       struct internal_lpm *internal = container_of(lpm, struct
> internal_lpm, lpm);
> 
> -       if (!lpm->v) {
> +       if (internal->v == NULL) {
>                 /* Set tbl8 group invalid*/
>                 __atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
>                                 __ATOMIC_RELAXED);
> -       } else if (lpm->rcu_mode == RTE_LPM_QSBR_MODE_SYNC) {
> +       } else if (internal->rcu_mode == RTE_LPM_QSBR_MODE_SYNC) {
>                 /* Wait for quiescent state change. */
> -               rte_rcu_qsbr_synchronize(lpm->v, RTE_QSBR_THRID_INVALID);
> +               rte_rcu_qsbr_synchronize(internal->v,
> + RTE_QSBR_THRID_INVALID);
>                 /* Set tbl8 group invalid*/
>                 __atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
>                                 __ATOMIC_RELAXED);
> -       } else if (lpm->rcu_mode == RTE_LPM_QSBR_MODE_DQ) {
> +       } else if (internal->rcu_mode == RTE_LPM_QSBR_MODE_DQ) {
>                 /* Push into QSBR defer queue. */
> -               rte_rcu_qsbr_dq_enqueue(lpm->dq, (void *)&tbl8_group_start);
> +               rte_rcu_qsbr_dq_enqueue(internal->dq, (void
> *)&tbl8_group_start);
>         }
>  }
> 
> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h index
> 7889f21b3..a9568fcdd 100644
> --- a/lib/librte_lpm/rte_lpm.h
> +++ b/lib/librte_lpm/rte_lpm.h
> @@ -143,12 +143,6 @@ struct rte_lpm {
>                         __rte_cache_aligned; /**< LPM tbl24 table. */
>         struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
>         struct rte_lpm_rule *rules_tbl; /**< LPM rules. */ -#ifdef
> ALLOW_EXPERIMENTAL_API
> -       /* RCU config. */
> -       struct rte_rcu_qsbr *v;         /* RCU QSBR variable. */
> -       enum rte_lpm_qsbr_mode rcu_mode;/* Blocking, defer queue. */
> -       struct rte_rcu_qsbr_dq *dq;     /* RCU QSBR defer queue. */
> -#endif
>  };
> 
>  /** LPM RCU QSBR configuration structure. */
> 
> 
> 
> 
> --
> David Marchand


^ permalink raw reply	[relevance 5%]
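
For context, a sketch of how an application would attach an RCU QSBR
variable to an LPM table with the API from this series. Field and parameter
names follow the v7 patch quoted above (cfg->v, cfg->mode, optional defer
queue out-parameter); the final merged signature may differ, so treat this
as an assumption rather than a reference.

#include <rte_lpm.h>
#include <rte_rcu_qsbr.h>
#include <rte_malloc.h>

/* Attach an RCU QSBR variable to an existing LPM table so that freed
 * tbl8 groups are reclaimed safely through the defer queue. */
static struct rte_rcu_qsbr *
lpm_enable_rcu(struct rte_lpm *lpm, uint32_t max_threads)
{
	size_t sz = rte_rcu_qsbr_get_memsize(max_threads);
	struct rte_rcu_qsbr *v = rte_zmalloc(NULL, sz, RTE_CACHE_LINE_SIZE);

	if (v == NULL || rte_rcu_qsbr_init(v, max_threads) != 0) {
		rte_free(v);
		return NULL;
	}

	struct rte_lpm_rcu_config cfg = {
		.v = v,
		.mode = RTE_LPM_QSBR_MODE_DQ, /* reclaim through a defer queue */
	};

	if (rte_lpm_rcu_qsbr_add(lpm, &cfg, NULL) != 0) {
		rte_free(v);
		return NULL;
	}
	return v;
}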

* Re: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Txscheduling
  2020-07-08 14:54  0%     ` Slava Ovsiienko
@ 2020-07-08 15:27  0%       ` Morten Brørup
  2020-07-08 15:51  0%         ` Slava Ovsiienko
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2020-07-08 15:27 UTC (permalink / raw)
  To: Slava Ovsiienko, dev
  Cc: Matan Azrad, Raslan Darawsheh, olivier.matz, bernard.iremonger, thomas

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Slava Ovsiienko
> Sent: Wednesday, July 8, 2020 4:54 PM
> 
> Hi, Morten
> 
> Thank you for the comments. Please, see below.
> 
> > -----Original Message-----
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: Wednesday, July 8, 2020 17:16
> > To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> > Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> > <rasland@mellanox.com>; olivier.matz@6wind.com;
> > bernard.iremonger@intel.com; thomas@mellanox.net
> > Subject: RE: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate
> packet
> > Txscheduling
> >
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Viacheslav
> > > Ovsiienko
> > > Sent: Tuesday, July 7, 2020 4:57 PM
> > >
> > > There is the requirement on some networks for precise traffic
> timing
> > > management. The ability to send (and, generally speaking, receive)
> the
> > > packets at the very precisely specified moment of time provides the
> > > opportunity to support the connections with Time Division
> Multiplexing
> > > using the contemporary general purpose NIC without involving an
> > > auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> > > interface is one of the promising features for potentially usage of
> > > the precise time management for the egress packets.
> > >
> > > The main objective of this RFC is to specify the way how
> applications
> > > can provide the moment of time at what the packet transmission must
> be
> > > started and to describe in preliminary the supporting this feature
> > > from
> > > mlx5 PMD side.
> > >
> > > The new dynamic timestamp field is proposed, it provides some
> timing
> > > information, the units and time references (initial phase) are not
> > > explicitly defined but are maintained always the same for a given
> port.
> > > Some devices allow to query rte_eth_read_clock() that will return
> the
> > > current device timestamp. The dynamic timestamp flag tells whether
> the
> > > field contains actual timestamp value. For the packets being sent
> this
> > > value can be used by PMD to schedule packet sending.
> > >
> > > After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> and
> > > obsoleting, these dynamic flag and field will be used to manage the
> > > timestamps on receiving datapath as well.
> > >
> > > When PMD sees the "rte_dynfield_timestamp" set on the packet being
> > > sent it tries to synchronize the time of packet appearing on the
> wire
> > > with the specified packet timestamp. If the specified one is in the
> > > past it should be ignored, if one is in the distant future it
> should
> > > be capped with some reasonable value (in range of seconds). These
> > > specific cases ("too late" and "distant future") can be optionally
> > > reported via device xstats to assist applications to detect the
> > > time-related problems.
> > >
> > > There is no any packet reordering according timestamps is supposed,
> > > neither within packet burst, nor between packets, it is an entirely
> > > application responsibility to generate packets and its timestamps
> in
> > > desired order. The timestamps can be put only in the first packet
> in
> > > the burst providing the entire burst scheduling.
> > >
> > > PMD reports the ability to synchronize packet sending on timestamp
> > > with new offload flag:
> > >
> > > This is palliative and is going to be replaced with new eth_dev API
> > > about reporting/managing the supported dynamic flags and its
> related
> > > features. This API would break ABI compatibility and can't be
> > > introduced at the moment, so is postponed to 20.11.
> > >
> > > For testing purposes it is proposed to update testpmd "txonly"
> > > forwarding mode routine. With this update testpmd application
> > > generates the packets and sets the dynamic timestamps according to
> > > specified time pattern if it sees the "rte_dynfield_timestamp" is
> registered.
> > >
> > > The new testpmd command is proposed to configure sending pattern:
> > >
> > > set tx_times <burst_gap>,<intra_gap>
> > >
> > > <intra_gap> - the delay between the packets within the burst
> > >               specified in the device clock units. The number
> > >               of packets in the burst is defined by txburst
> parameter
> > >
> > > <burst_gap> - the delay between the bursts in the device clock
> units
> > >
> > > As the result the bursts of packet will be transmitted with
> specific
> > > delays between the packets within the burst and specific delay
> between
> > > the bursts. The rte_eth_get_clock is supposed to be engaged to get
> the
> > > current device clock value and provide the reference for the
> > > timestamps.
> > >
> > > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > > ---
> > >  v1->v4:
> > >     - dedicated dynamic Tx timestamp flag instead of shared with Rx
> >
> > The detailed description above should be updated to reflect that it
> is now
> > two flags.
> OK
> 
> >
> > >     - Doxygen-style comment
> > >     - comments update
> > >
> > > ---
> > >  lib/librte_ethdev/rte_ethdev.c |  1 +
> lib/librte_ethdev/rte_ethdev.h
> > > |  4 ++++  lib/librte_mbuf/rte_mbuf_dyn.h | 31
> > > +++++++++++++++++++++++++++++++
> > >  3 files changed, 36 insertions(+)
> > >
> > > diff --git a/lib/librte_ethdev/rte_ethdev.c
> > > b/lib/librte_ethdev/rte_ethdev.c index 8e10a6f..02157d5 100644
> > > --- a/lib/librte_ethdev/rte_ethdev.c
> > > +++ b/lib/librte_ethdev/rte_ethdev.c
> > > @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
> > >  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
> > >  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
> > >  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> > > +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
> > >  };
> > >
> > >  #undef RTE_TX_OFFLOAD_BIT2STR
> > > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > > b/lib/librte_ethdev/rte_ethdev.h index a49242b..6f6454c 100644
> > > --- a/lib/librte_ethdev/rte_ethdev.h
> > > +++ b/lib/librte_ethdev/rte_ethdev.h
> > > @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
> > >  /** Device supports outer UDP checksum */  #define
> > > DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
> > >
> > > +/** Device supports send on timestamp */ #define
> > > +DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> > > +
> > > +
> > >  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
> > /**<
> > > Device supports Rx queue setup after device started*/  #define
> > > RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002 diff --git
> > > a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> > > index 96c3631..7e9f7d2 100644
> > > --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> > > +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> > > @@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
> > > #define RTE_MBUF_DYNFIELD_METADATA_NAME
> > "rte_flow_dynfield_metadata"
> > >  #define RTE_MBUF_DYNFLAG_METADATA_NAME
> > "rte_flow_dynflag_metadata"
> > >
> > > +/**
> > > + * The timestamp dynamic field provides some timing information,
> the
> > > + * units and time references (initial phase) are not explicitly
> > > defined
> > > + * but are maintained always the same for a given port. Some
> devices
> > > allow4
> > > + * to query rte_eth_read_clock() that will return the current
> device
> > > + * timestamp. The dynamic Tx timestamp flag tells whether the
> field
> > > contains
> > > + * actual timestamp value. For the packets being sent this value
> can
> > > be
> > > + * used by PMD to schedule packet sending.
> > > + *
> > > + * After PKT_RX_TIMESTAMP flag and fixed timestamp field
> deprecation
> > > + * and obsoleting, the dedicated Rx timestamp flag is supposed to
> be
> > > + * introduced and the shared dynamic timestamp field will be used
> > > + * to handle the timestamps on receiving datapath as well.
> > > + */
> > > +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> > "rte_dynfield_timestamp"
> >
> > The description above should not say anything about the dynamic TX
> > timestamp flag.
> It does not. Or do you mean RX?
> Not sure, field and flag are tightly coupled,
> it is nice to mention this relation for better understanding.
> And mentioning the RX explains why it is not like this:
> RTE_MBUF_DYNFIELD_[TX]_TIMESTAMP_NAME

Sorry. I misunderstood its purpose!
It's the name of the field, and the field will not only be used for TX, but in the future also for RX.
(I thought it was the name of the RX flag, reserved for future use.)

> 
> >
> > Please elaborate "some timing information", e.g. add "... about when
> the
> > packet was received".
> 
> Sorry, I do not follow,  currently the dynamic field is not
> "about when the packet was received". Now it is introduced for Tx
> only and just the opportunity to be shared with Rx one in coming
> releases
> is mentioned. "Some" means - not specified (herein) exactly.
> And it is elaborated what Is not specified and how it is supposed
> to use Tx timestamp.

It should be described when it is valid, and how it is being used, e.g. by adding a reference to the "rte_dynflag_tx_timestamp" flag.

> >
> > > +
> > > +/**
> > > + * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag
> > set on
> > > the
> > > + * packet being sent it tries to synchronize the time of packet
> > > appearing
> > > + * on the wire with the specified packet timestamp. If the
> specified
> > > one
> > > + * is in the past it should be ignored, if one is in the distant
> > > future
> > > + * it should be capped with some reasonable value (in range of
> > > seconds).
> > > + *
> > > + * There is no any packet reordering according to timestamps is
> > > supposed,
> > > + * neither for packet within the burst, nor for the whole bursts,
> it
> > > is
> > > + * an entirely application responsibility to generate packets and
> its
> > > + * timestamps in desired order. The timestamps might be put only
> in
> > > + * the first packet in the burst providing the entire burst
> > > scheduling.
> > > + */
> > > +#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME
> > "rte_dynflag_tx_timestamp"
> > > +
> > >  #endif
> > > --
> > > 1.8.3.1
> > >
> >
> > It may be worth adding some documentation about how the clocks of the
> > NICs are out of sync with the clock of the CPU, and are all drifting
> relatively.
> >
> > And those clocks are also out of sync with the actual time (NTP
> clock).
> 
> IMO, It is out of scope of this very generic patch.  As for mlx NICs -
> the internal device
> clock might be (or might be not) synchronized with PTP, it can provide
> timestamps
> in real nanoseconds in various formats or just some free running
> counter.

Cool!

> On some systems the NIC and CPU might share the same clock source (for
> their PLL inputs
> for example) and there will be no any drifts. As we can see - it is a
> wide and interesting
> opic to discuss, but, IMO,  the comment in header file might be not the
> most relevant
> place to do. As for mlx5 devices clock specifics - it will be
> documented in PMD chapter.
> 
> OK, will add few generic words, the few ones - in order not to make
> comment wordy, just
> point the direction for further thinking.

I agree - we don't want cookbooks in the header files. Only enough description to avoid the worst misunderstandings.

> 
> >
> > Preferably, some sort of cookbook for handling this should be
> provided.
> > PCAP could be used as an example.
> >
> testpmd example is included in series, mlx5 PMD patch is prepared and
> coming soon.

Great.

And I suppose that the more detailed cookbook/example - regarding offset and drift of various clocks - is probably more relevant for the RX side (for various PCAP applications), and thus completely unrelated to this patch.

> 
> With best regards, Slava


^ permalink raw reply	[relevance 0%]
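
On the clock relationship discussed above, a rough application-level sketch
(illustrative only) of correlating the opaque device clock with the CPU TSC
by sampling both over a short window; a real application would resample
periodically to track drift and would use a longer window.

#include <rte_cycles.h>
#include <rte_ethdev.h>

/* Estimate how many device clock ticks elapse per TSC cycle. */
static double
estimate_dev_ticks_per_tsc(uint16_t port)
{
	uint64_t dev0 = 0, dev1 = 0, tsc0, tsc1;

	if (rte_eth_read_clock(port, &dev0) != 0)
		return 0.0; /* device clock not available */
	tsc0 = rte_rdtsc();

	rte_delay_ms(100); /* measurement window */

	rte_eth_read_clock(port, &dev1);
	tsc1 = rte_rdtsc();

	return (double)(dev1 - dev0) / (double)(tsc1 - tsc0);
}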

* Re: [dpdk-dev] [PATCH 2/2] eal: use c11 atomics for interrupt status
  2020-07-08 12:29  3%   ` David Marchand
  2020-07-08 13:43  0%     ` Aaron Conole
@ 2020-07-08 15:04  0%     ` Kinsella, Ray
  2020-07-09  5:21  0%       ` Phil Yang
  1 sibling, 1 reply; 200+ results
From: Kinsella, Ray @ 2020-07-08 15:04 UTC (permalink / raw)
  To: David Marchand, Phil Yang, Aaron Conole
  Cc: dev, David Christensen, Honnappa Nagarahalli,
	Ruifeng Wang (Arm Technology China),
	nd, Dodji Seketeli, Neil Horman, Harman Kalra



On 08/07/2020 13:29, David Marchand wrote:
> On Thu, Jun 11, 2020 at 12:25 PM Phil Yang <phil.yang@arm.com> wrote:
>>
>> The event status is defined as a volatile variable and shared
>> between threads. Use c11 atomics with explicit ordering instead
>> of rte_atomic ops which enforce unnecessary barriers on aarch64.
>>
>> Signed-off-by: Phil Yang <phil.yang@arm.com>
>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>> ---
>>  lib/librte_eal/include/rte_eal_interrupts.h |  2 +-
>>  lib/librte_eal/linux/eal_interrupts.c       | 47 ++++++++++++++++++++---------
>>  2 files changed, 34 insertions(+), 15 deletions(-)
>>
>> diff --git a/lib/librte_eal/include/rte_eal_interrupts.h b/lib/librte_eal/include/rte_eal_interrupts.h
>> index 773a34a..b1e8a29 100644
>> --- a/lib/librte_eal/include/rte_eal_interrupts.h
>> +++ b/lib/librte_eal/include/rte_eal_interrupts.h
>> @@ -59,7 +59,7 @@ enum {
>>
>>  /** interrupt epoll event obj, taken by epoll_event.ptr */
>>  struct rte_epoll_event {
>> -       volatile uint32_t status;  /**< OUT: event status */
>> +       uint32_t status;           /**< OUT: event status */
>>         int fd;                    /**< OUT: event fd */
>>         int epfd;       /**< OUT: epoll instance the ev associated with */
>>         struct rte_epoll_data epdata;
> 
> I got a reject from the ABI check in my env.
> 
> 1 function with some indirect sub-type change:
> 
>   [C]'function int rte_pci_ioport_map(rte_pci_device*, int,
> rte_pci_ioport*)' at pci.c:756:1 has some indirect sub-type changes:
>     parameter 1 of type 'rte_pci_device*' has sub-type changes:
>       in pointed to type 'struct rte_pci_device' at rte_bus_pci.h:57:1:
>         type size hasn't changed
>         1 data member changes (2 filtered):
>          type of 'rte_intr_handle rte_pci_device::intr_handle' changed:
>            type size hasn't changed
>            1 data member change:
>             type of 'rte_epoll_event rte_intr_handle::elist[512]' changed:
>               array element type 'struct rte_epoll_event' changed:
>                 type size hasn't changed
>                 1 data member change:
>                  type of 'volatile uint32_t rte_epoll_event::status' changed:
>                    entity changed from 'volatile uint32_t' to 'typedef
> uint32_t' at stdint-uintn.h:26:1
>                    type size hasn't changed
> 
>               type size hasn't changed
> 
> 
> This is probably harmless in our case (going from volatile to non
> volatile), but it won't pass the check in the CI without an exception
> rule.
> 
> Note: checking on the test-report ml, I saw nothing, but ovsrobot did
> catch the issue with this change too, Aaron?
> 
> 
Agreed, probably harmless and requires something in libabigail.ignore.

^ permalink raw reply	[relevance 0%]
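
To illustrate the kind of change discussed here (a volatile status word and
rte_atomic ops replaced by GCC C11 built-ins with explicit ordering), a
schematic sketch; the state names and helpers are placeholders, not the
actual eal_interrupts.c code.

#include <stdint.h>

#define EV_INVALID 0
#define EV_VALID   1
#define EV_EXEC    2

/* Try to take ownership of an event for execution. Acquire ordering makes
 * sure reads of the event data happen after ownership is obtained. */
static inline int
event_try_exec(uint32_t *status)
{
	uint32_t expected = EV_VALID;

	return __atomic_compare_exchange_n(status, &expected, EV_EXEC,
			0 /* strong */, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

/* Publish an event. Release ordering makes prior writes to the event data
 * visible to the thread that later acquires the status. */
static inline void
event_mark_valid(uint32_t *status)
{
	__atomic_store_n(status, EV_VALID, __ATOMIC_RELEASE);
}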

* Re: [dpdk-dev] [PATCH v4 1/4] eventdev: fix race condition on timer list counter
  2020-07-08 13:30  4%       ` [dpdk-dev] [PATCH v4 1/4] eventdev: fix race condition on timer list counter Jerin Jacob
@ 2020-07-08 15:01  0%         ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2020-07-08 15:01 UTC (permalink / raw)
  To: Phil Yang, Jerin Jacob
  Cc: dev, Erik Gabriel Carrillo, Honnappa Nagarahalli,
	David Christensen, Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, nd, David Marchand, Ray Kinsella, Neil Horman,
	dodji, dpdk stable, Jerin Jacob

08/07/2020 15:30, Jerin Jacob:
> On Tue, Jul 7, 2020 at 9:25 PM Phil Yang <phil.yang@arm.com> wrote:
> >
> > The n_poll_lcores counter and poll_lcore array are shared between lcores
> > and the update of these variables are out of the protection of spinlock
> > on each lcore timer list. The read-modify-write operations of the counter
> > are not atomic, so it has the potential of race condition between lcores.
> >
> > Use c11 atomics with RELAXED ordering to prevent confliction.
> >
> > Fixes: cc7b73ea9e3b ("eventdev: add new software timer adapter")
> > Cc: erik.g.carrillo@intel.com
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Phil Yang <phil.yang@arm.com>
> > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
> 
> Hi Thomas,
> 
> The latest version does not have ABI breakage issue.
> 
> I have added the ABI verifier in my local patch verification setup.
> 
> Series applied to dpdk-next-eventdev/master.
> 
> Please pull this series from dpdk-next-eventdev/master. Thanks.
> 
> I am marking this patch series as "Awaiting Upstream" in patchwork
> status to reflect the actual status.

OK, pulled and marked as Accepted in patchwork.



^ permalink raw reply	[relevance 0%]
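
The fix referenced above boils down to making the shared counter update a
single atomic read-modify-write; a schematic example (names are
illustrative, not the exact adapter variables), where RELAXED ordering is
sufficient because the per-lcore data the index points at is protected
separately by the timer list locks.

#include <stdint.h>

static uint16_t n_poll_lcores;

/* Atomically reserve the next slot in the poll_lcores array. */
static inline uint16_t
reserve_poll_lcore_slot(void)
{
	return __atomic_fetch_add(&n_poll_lcores, 1, __ATOMIC_RELAXED);
}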

* Re: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Txscheduling
  2020-07-08 14:16  0%   ` [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Txscheduling Morten Brørup
@ 2020-07-08 14:54  0%     ` Slava Ovsiienko
  2020-07-08 15:27  0%       ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Slava Ovsiienko @ 2020-07-08 14:54 UTC (permalink / raw)
  To: Morten Brørup, dev
  Cc: Matan Azrad, Raslan Darawsheh, olivier.matz, bernard.iremonger, thomas

Hi, Morten

Thank you for the comments. Please, see below.

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Wednesday, July 8, 2020 17:16
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; olivier.matz@6wind.com;
> bernard.iremonger@intel.com; thomas@mellanox.net
> Subject: RE: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet
> Txscheduling
> 
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Viacheslav
> > Ovsiienko
> > Sent: Tuesday, July 7, 2020 4:57 PM
> >
> > There is the requirement on some networks for precise traffic timing
> > management. The ability to send (and, generally speaking, receive) the
> > packets at the very precisely specified moment of time provides the
> > opportunity to support the connections with Time Division Multiplexing
> > using the contemporary general purpose NIC without involving an
> > auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> > interface is one of the promising features for potentially usage of
> > the precise time management for the egress packets.
> >
> > The main objective of this RFC is to specify the way how applications
> > can provide the moment of time at what the packet transmission must be
> > started and to describe in preliminary the supporting this feature
> > from
> > mlx5 PMD side.
> >
> > The new dynamic timestamp field is proposed, it provides some timing
> > information, the units and time references (initial phase) are not
> > explicitly defined but are maintained always the same for a given port.
> > Some devices allow to query rte_eth_read_clock() that will return the
> > current device timestamp. The dynamic timestamp flag tells whether the
> > field contains actual timestamp value. For the packets being sent this
> > value can be used by PMD to schedule packet sending.
> >
> > After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation and
> > obsoleting, these dynamic flag and field will be used to manage the
> > timestamps on receiving datapath as well.
> >
> > When PMD sees the "rte_dynfield_timestamp" set on the packet being
> > sent it tries to synchronize the time of packet appearing on the wire
> > with the specified packet timestamp. If the specified one is in the
> > past it should be ignored, if one is in the distant future it should
> > be capped with some reasonable value (in range of seconds). These
> > specific cases ("too late" and "distant future") can be optionally
> > reported via device xstats to assist applications to detect the
> > time-related problems.
> >
> > There is no any packet reordering according timestamps is supposed,
> > neither within packet burst, nor between packets, it is an entirely
> > application responsibility to generate packets and its timestamps in
> > desired order. The timestamps can be put only in the first packet in
> > the burst providing the entire burst scheduling.
> >
> > PMD reports the ability to synchronize packet sending on timestamp
> > with new offload flag:
> >
> > This is palliative and is going to be replaced with new eth_dev API
> > about reporting/managing the supported dynamic flags and its related
> > features. This API would break ABI compatibility and can't be
> > introduced at the moment, so is postponed to 20.11.
> >
> > For testing purposes it is proposed to update testpmd "txonly"
> > forwarding mode routine. With this update testpmd application
> > generates the packets and sets the dynamic timestamps according to
> > specified time pattern if it sees the "rte_dynfield_timestamp" is registered.
> >
> > The new testpmd command is proposed to configure sending pattern:
> >
> > set tx_times <burst_gap>,<intra_gap>
> >
> > <intra_gap> - the delay between the packets within the burst
> >               specified in the device clock units. The number
> >               of packets in the burst is defined by txburst parameter
> >
> > <burst_gap> - the delay between the bursts in the device clock units
> >
> > As the result the bursts of packet will be transmitted with specific
> > delays between the packets within the burst and specific delay between
> > the bursts. The rte_eth_get_clock is supposed to be engaged to get the
> > current device clock value and provide the reference for the
> > timestamps.
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > ---
> >  v1->v4:
> >     - dedicated dynamic Tx timestamp flag instead of shared with Rx
> 
> The detailed description above should be updated to reflect that it is now
> two flags.
OK

> 
> >     - Doxygen-style comment
> >     - comments update
> >
> > ---
> >  lib/librte_ethdev/rte_ethdev.c |  1 +  lib/librte_ethdev/rte_ethdev.h
> > |  4 ++++  lib/librte_mbuf/rte_mbuf_dyn.h | 31
> > +++++++++++++++++++++++++++++++
> >  3 files changed, 36 insertions(+)
> >
> > diff --git a/lib/librte_ethdev/rte_ethdev.c
> > b/lib/librte_ethdev/rte_ethdev.c index 8e10a6f..02157d5 100644
> > --- a/lib/librte_ethdev/rte_ethdev.c
> > +++ b/lib/librte_ethdev/rte_ethdev.c
> > @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
> >  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
> >  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
> >  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> > +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
> >  };
> >
> >  #undef RTE_TX_OFFLOAD_BIT2STR
> > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > b/lib/librte_ethdev/rte_ethdev.h index a49242b..6f6454c 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> > @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
> >  /** Device supports outer UDP checksum */
> >  #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
> >
> > +/** Device supports send on timestamp */
> > +#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> > +
> > +
> >  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
> >  /**< Device supports Rx queue setup after device started*/
> >  #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> > diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> > index 96c3631..7e9f7d2 100644
> > --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> > +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> > @@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
> >  #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
> >  #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
> >
> > +/**
> > + * The timestamp dynamic field provides some timing information, the
> > + * units and time references (initial phase) are not explicitly
> > defined
> > + * but are maintained always the same for a given port. Some devices
> > allow4
> > + * to query rte_eth_read_clock() that will return the current device
> > + * timestamp. The dynamic Tx timestamp flag tells whether the field
> > contains
> > + * actual timestamp value. For the packets being sent this value can
> > be
> > + * used by PMD to schedule packet sending.
> > + *
> > + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> > + * and obsoleting, the dedicated Rx timestamp flag is supposed to be
> > + * introduced and the shared dynamic timestamp field will be used
> > + * to handle the timestamps on receiving datapath as well.
> > + */
> > +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
> 
> The description above should not say anything about the dynamic TX
> timestamp flag.
It does not. Or do you mean Rx?
Not sure - the field and flag are tightly coupled,
so it is nice to mention this relation for better understanding.
And mentioning Rx explains why the name is not
RTE_MBUF_DYNFIELD_[TX]_TIMESTAMP_NAME.

> 
> Please elaborate "some timing information", e.g. add "... about when the
> packet was received".

Sorry, I do not follow - currently the dynamic field is not
"about when the packet was received". For now it is introduced for Tx
only, and only the opportunity to share it with the Rx one in coming
releases is mentioned. "Some" means: not specified exactly herein.
The comment does elaborate on what is not specified and on how the Tx
timestamp is supposed to be used.

> 
> > +
> > +/**
> > + * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag
> set on
> > the
> > + * packet being sent it tries to synchronize the time of packet
> > appearing
> > + * on the wire with the specified packet timestamp. If the specified
> > one
> > + * is in the past it should be ignored, if one is in the distant
> > future
> > + * it should be capped with some reasonable value (in range of
> > seconds).
> > + *
> > + * There is no any packet reordering according to timestamps is
> > supposed,
> > + * neither for packet within the burst, nor for the whole bursts, it
> > is
> > + * an entirely application responsibility to generate packets and its
> > + * timestamps in desired order. The timestamps might be put only in
> > + * the first packet in the burst providing the entire burst
> > scheduling.
> > + */
> > +#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
> > +
> >  #endif
> > --
> > 1.8.3.1
> >
> 
> It may be worth adding some documentation about how the clocks of the
> NICs are out of sync with the clock of the CPU, and are all drifting relatively.
> 
> And those clocks are also out of sync with the actual time (NTP clock).

IMO, it is out of scope of this very generic patch. As for mlx NICs - the internal device
clock might (or might not) be synchronized with PTP; it can provide timestamps
in real nanoseconds in various formats, or just a free-running counter.
On some systems the NIC and CPU might share the same clock source (for their PLL inputs,
for example) and there will be no drift at all. As we can see, it is a wide and interesting
topic to discuss, but, IMO, the comment in the header file might not be the most relevant
place to do so. As for the mlx5 device clock specifics - they will be documented in the PMD chapter.

OK, I will add a few generic words - just a few, in order not to make the comment wordy, just
pointing the direction for further thinking.

> 
> Preferably, some sort of cookbook for handling this should be provided.
> PCAP could be used as an example.
> 
The testpmd example is included in the series; the mlx5 PMD patch is prepared and coming soon.
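
To illustrate the intended usage, a rough application-side sketch could look
like this (purely illustrative, error handling and Tx retries omitted; it only
relies on the dynamic field/flag lookup API discussed above and on
rte_eth_read_clock()):

#include <rte_ethdev.h>
#include <rte_mbuf_dyn.h>

static int ts_off;        /* dynamic timestamp field offset */
static uint64_t ts_flag;  /* dynamic Tx timestamp flag mask */

static int
tx_timestamp_init(void)
{
	int bit;

	/* The lookups succeed only if the PMD registered the field/flag. */
	ts_off = rte_mbuf_dynfield_lookup(RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL);
	bit = rte_mbuf_dynflag_lookup(RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME, NULL);
	if (ts_off < 0 || bit < 0)
		return -1; /* Tx scheduling not available on this setup */
	ts_flag = 1ULL << bit;
	return 0;
}

static void
send_scheduled_burst(uint16_t port, struct rte_mbuf **pkts, uint16_t n,
		     uint64_t delay)
{
	uint64_t now;

	rte_eth_read_clock(port, &now);
	/* The timestamp in the first packet schedules the whole burst. */
	*RTE_MBUF_DYNFIELD(pkts[0], ts_off, uint64_t *) = now + delay;
	pkts[0]->ol_flags |= ts_flag;
	rte_eth_tx_burst(port, 0, pkts, n);
}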

With best regards, Slava


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v7 1/3] lib/lpm: integrate RCU QSBR
  @ 2020-07-08 14:30  2%     ` David Marchand
  2020-07-08 15:34  5%       ` Ruifeng Wang
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2020-07-08 14:30 UTC (permalink / raw)
  To: Ruifeng Wang
  Cc: Bruce Richardson, Vladimir Medvedkin, John McNamara,
	Marko Kovacevic, Ray Kinsella, Neil Horman, dev, Ananyev,
	Konstantin, Honnappa Nagarahalli, nd

On Tue, Jul 7, 2020 at 5:16 PM Ruifeng Wang <ruifeng.wang@arm.com> wrote:
> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> index b9d49ac87..7889f21b3 100644
> --- a/lib/librte_lpm/rte_lpm.h
> +++ b/lib/librte_lpm/rte_lpm.h
> @@ -1,5 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2020 Arm Limited
>   */
>
>  #ifndef _RTE_LPM_H_
> @@ -20,6 +21,7 @@
>  #include <rte_memory.h>
>  #include <rte_common.h>
>  #include <rte_vect.h>
> +#include <rte_rcu_qsbr.h>
>
>  #ifdef __cplusplus
>  extern "C" {
> @@ -62,6 +64,17 @@ extern "C" {
>  /** Bitmask used to indicate successful lookup */
>  #define RTE_LPM_LOOKUP_SUCCESS          0x01000000
>
> +/** @internal Default RCU defer queue entries to reclaim in one go. */
> +#define RTE_LPM_RCU_DQ_RECLAIM_MAX     16
> +
> +/** RCU reclamation modes */
> +enum rte_lpm_qsbr_mode {
> +       /** Create defer queue for reclaim. */
> +       RTE_LPM_QSBR_MODE_DQ = 0,
> +       /** Use blocking mode reclaim. No defer queue created. */
> +       RTE_LPM_QSBR_MODE_SYNC
> +};
> +
>  #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
>  /** @internal Tbl24 entry structure. */
>  __extension__
> @@ -130,6 +143,28 @@ struct rte_lpm {
>                         __rte_cache_aligned; /**< LPM tbl24 table. */
>         struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
>         struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> +#ifdef ALLOW_EXPERIMENTAL_API
> +       /* RCU config. */
> +       struct rte_rcu_qsbr *v;         /* RCU QSBR variable. */
> +       enum rte_lpm_qsbr_mode rcu_mode;/* Blocking, defer queue. */
> +       struct rte_rcu_qsbr_dq *dq;     /* RCU QSBR defer queue. */
> +#endif
> +};

I can see failures in travis reports for v7 and v6.
I reproduced them in my env.

1 function with some indirect sub-type change:

  [C]'function int rte_lpm_add(rte_lpm*, uint32_t, uint8_t, uint32_t)'
at rte_lpm.c:764:1 has some indirect sub-type changes:
    parameter 1 of type 'rte_lpm*' has sub-type changes:
      in pointed to type 'struct rte_lpm' at rte_lpm.h:134:1:
        type size hasn't changed
        3 data member insertions:
          'rte_rcu_qsbr* rte_lpm::v', at offset 536873600 (in bits) at
rte_lpm.h:148:1
          'rte_lpm_qsbr_mode rte_lpm::rcu_mode', at offset 536873664
(in bits) at rte_lpm.h:149:1
          'rte_rcu_qsbr_dq* rte_lpm::dq', at offset 536873728 (in
bits) at rte_lpm.h:150:1


Going back to my proposal of hiding what does not need to be seen.

Disclaimer, *this is quick & dirty* but it builds and passes ABI check:

$ git diff
diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index d498ba761..7109aef6a 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -115,6 +115,15 @@ rte_lpm_find_existing(const char *name)
        return l;
 }

+struct internal_lpm {
+       /* Public object */
+       struct rte_lpm lpm;
+       /* RCU config. */
+       struct rte_rcu_qsbr *v;         /* RCU QSBR variable. */
+       enum rte_lpm_qsbr_mode rcu_mode;/* Blocking, defer queue. */
+       struct rte_rcu_qsbr_dq *dq;     /* RCU QSBR defer queue. */
+};
+
 /*
  * Allocates memory for LPM object
  */
@@ -123,6 +132,7 @@ rte_lpm_create(const char *name, int socket_id,
                const struct rte_lpm_config *config)
 {
        char mem_name[RTE_LPM_NAMESIZE];
+       struct internal_lpm *internal = NULL;
        struct rte_lpm *lpm = NULL;
        struct rte_tailq_entry *te;
        uint32_t mem_size, rules_size, tbl8s_size;
@@ -141,12 +151,6 @@ rte_lpm_create(const char *name, int socket_id,

        snprintf(mem_name, sizeof(mem_name), "LPM_%s", name);

-       /* Determine the amount of memory to allocate. */
-       mem_size = sizeof(*lpm);
-       rules_size = sizeof(struct rte_lpm_rule) * config->max_rules;
-       tbl8s_size = (sizeof(struct rte_lpm_tbl_entry) *
-                       RTE_LPM_TBL8_GROUP_NUM_ENTRIES * config->number_tbl8s);
-
        rte_mcfg_tailq_write_lock();

        /* guarantee there's no existing */
@@ -170,16 +174,23 @@ rte_lpm_create(const char *name, int socket_id,
                goto exit;
        }

+       /* Determine the amount of memory to allocate. */
+       mem_size = sizeof(*internal);
+       rules_size = sizeof(struct rte_lpm_rule) * config->max_rules;
+       tbl8s_size = (sizeof(struct rte_lpm_tbl_entry) *
+                       RTE_LPM_TBL8_GROUP_NUM_ENTRIES * config->number_tbl8s);
+
        /* Allocate memory to store the LPM data structures. */
-       lpm = rte_zmalloc_socket(mem_name, mem_size,
+       internal = rte_zmalloc_socket(mem_name, mem_size,
                        RTE_CACHE_LINE_SIZE, socket_id);
-       if (lpm == NULL) {
+       if (internal == NULL) {
                RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
                rte_free(te);
                rte_errno = ENOMEM;
                goto exit;
        }

+       lpm = &internal->lpm;
        lpm->rules_tbl = rte_zmalloc_socket(NULL,
                        (size_t)rules_size, RTE_CACHE_LINE_SIZE, socket_id);

@@ -226,6 +237,7 @@ rte_lpm_create(const char *name, int socket_id,
 void
 rte_lpm_free(struct rte_lpm *lpm)
 {
+       struct internal_lpm *internal;
        struct rte_lpm_list *lpm_list;
        struct rte_tailq_entry *te;

@@ -247,8 +259,9 @@ rte_lpm_free(struct rte_lpm *lpm)

        rte_mcfg_tailq_write_unlock();

-       if (lpm->dq)
-               rte_rcu_qsbr_dq_delete(lpm->dq);
+       internal = container_of(lpm, struct internal_lpm, lpm);
+       if (internal->dq != NULL)
+               rte_rcu_qsbr_dq_delete(internal->dq);
        rte_free(lpm->tbl8);
        rte_free(lpm->rules_tbl);
        rte_free(lpm);
@@ -276,13 +289,15 @@ rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_lpm_rcu_config *cfg,
 {
        char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
        struct rte_rcu_qsbr_dq_parameters params = {0};
+       struct internal_lpm *internal;

-       if ((lpm == NULL) || (cfg == NULL)) {
+       if (lpm == NULL || cfg == NULL) {
                rte_errno = EINVAL;
                return 1;
        }

-       if (lpm->v) {
+       internal = container_of(lpm, struct internal_lpm, lpm);
+       if (internal->v != NULL) {
                rte_errno = EEXIST;
                return 1;
        }
@@ -305,20 +320,19 @@ rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_lpm_rcu_config *cfg,
                params.free_fn = __lpm_rcu_qsbr_free_resource;
                params.p = lpm;
                params.v = cfg->v;
-               lpm->dq = rte_rcu_qsbr_dq_create(&params);
-               if (lpm->dq == NULL) {
-                       RTE_LOG(ERR, LPM,
-                                       "LPM QS defer queue creation failed\n");
+               internal->dq = rte_rcu_qsbr_dq_create(&params);
+               if (internal->dq == NULL) {
+                       RTE_LOG(ERR, LPM, "LPM QS defer queue creation failed\n");
                        return 1;
                }
                if (dq)
-                       *dq = lpm->dq;
+                       *dq = internal->dq;
        } else {
                rte_errno = EINVAL;
                return 1;
        }
-       lpm->rcu_mode = cfg->mode;
-       lpm->v = cfg->v;
+       internal->rcu_mode = cfg->mode;
+       internal->v = cfg->v;

        return 0;
 }
@@ -502,12 +516,13 @@ _tbl8_alloc(struct rte_lpm *lpm)
 static int32_t
 tbl8_alloc(struct rte_lpm *lpm)
 {
+       struct internal_lpm *internal = container_of(lpm, struct internal_lpm, lpm);
        int32_t group_idx; /* tbl8 group index. */

        group_idx = _tbl8_alloc(lpm);
-       if ((group_idx == -ENOSPC) && (lpm->dq != NULL)) {
+       if (group_idx == -ENOSPC && internal->dq != NULL) {
                /* If there are no tbl8 groups try to reclaim one. */
-               if (rte_rcu_qsbr_dq_reclaim(lpm->dq, 1, NULL, NULL, NULL) == 0)
+               if (rte_rcu_qsbr_dq_reclaim(internal->dq, 1, NULL, NULL, NULL) == 0)
                        group_idx = _tbl8_alloc(lpm);
        }

@@ -518,20 +533,21 @@ static void
 tbl8_free(struct rte_lpm *lpm, uint32_t tbl8_group_start)
 {
        struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
+       struct internal_lpm *internal = container_of(lpm, struct internal_lpm, lpm);

-       if (!lpm->v) {
+       if (internal->v == NULL) {
                /* Set tbl8 group invalid*/
                __atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
                                __ATOMIC_RELAXED);
-       } else if (lpm->rcu_mode == RTE_LPM_QSBR_MODE_SYNC) {
+       } else if (internal->rcu_mode == RTE_LPM_QSBR_MODE_SYNC) {
                /* Wait for quiescent state change. */
-               rte_rcu_qsbr_synchronize(lpm->v, RTE_QSBR_THRID_INVALID);
+               rte_rcu_qsbr_synchronize(internal->v, RTE_QSBR_THRID_INVALID);
                /* Set tbl8 group invalid*/
                __atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
                                __ATOMIC_RELAXED);
-       } else if (lpm->rcu_mode == RTE_LPM_QSBR_MODE_DQ) {
+       } else if (internal->rcu_mode == RTE_LPM_QSBR_MODE_DQ) {
                /* Push into QSBR defer queue. */
-               rte_rcu_qsbr_dq_enqueue(lpm->dq, (void *)&tbl8_group_start);
+               rte_rcu_qsbr_dq_enqueue(internal->dq, (void *)&tbl8_group_start);
        }
 }

diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
index 7889f21b3..a9568fcdd 100644
--- a/lib/librte_lpm/rte_lpm.h
+++ b/lib/librte_lpm/rte_lpm.h
@@ -143,12 +143,6 @@ struct rte_lpm {
                        __rte_cache_aligned; /**< LPM tbl24 table. */
        struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
        struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
-#ifdef ALLOW_EXPERIMENTAL_API
-       /* RCU config. */
-       struct rte_rcu_qsbr *v;         /* RCU QSBR variable. */
-       enum rte_lpm_qsbr_mode rcu_mode;/* Blocking, defer queue. */
-       struct rte_rcu_qsbr_dq *dq;     /* RCU QSBR defer queue. */
-#endif
 };

 /** LPM RCU QSBR configuration structure. */




-- 
David Marchand


^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Txscheduling
  2020-07-07 14:57  2% ` [dpdk-dev] [PATCH v4 " Viacheslav Ovsiienko
  2020-07-07 15:23  0%   ` Olivier Matz
@ 2020-07-08 14:16  0%   ` Morten Brørup
  2020-07-08 14:54  0%     ` Slava Ovsiienko
  1 sibling, 1 reply; 200+ results
From: Morten Brørup @ 2020-07-08 14:16 UTC (permalink / raw)
  To: Viacheslav Ovsiienko, dev
  Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Viacheslav
> Ovsiienko
> Sent: Tuesday, July 7, 2020 4:57 PM
> 
> There is the requirement on some networks for precise traffic timing
> management. The ability to send (and, generally speaking, receive)
> the packets at the very precisely specified moment of time provides
> the opportunity to support the connections with Time Division
> Multiplexing using the contemporary general purpose NIC without
> involving
> an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> interface is one of the promising features for potentially usage of the
> precise time management for the egress packets.
> 
> The main objective of this RFC is to specify the way how applications
> can provide the moment of time at what the packet transmission must be
> started and to describe in preliminary the supporting this feature from
> mlx5 PMD side.
> 
> The new dynamic timestamp field is proposed, it provides some timing
> information, the units and time references (initial phase) are not
> explicitly defined but are maintained always the same for a given port.
> Some devices allow to query rte_eth_read_clock() that will return
> the current device timestamp. The dynamic timestamp flag tells whether
> the field contains actual timestamp value. For the packets being sent
> this value can be used by PMD to schedule packet sending.
> 
> After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> and obsoleting, these dynamic flag and field will be used to manage
> the timestamps on receiving datapath as well.
> 
> When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
> it tries to synchronize the time of packet appearing on the wire with
> the specified packet timestamp. If the specified one is in the past it
> should be ignored, if one is in the distant future it should be capped
> with some reasonable value (in range of seconds). These specific cases
> ("too late" and "distant future") can be optionally reported via
> device xstats to assist applications to detect the time-related
> problems.
> 
> No packet reordering according to timestamps is assumed, neither within
> a packet burst nor between packets; it is entirely the application's
> responsibility to generate packets and their timestamps in the desired
> order. The timestamps can be put only in the first packet in the burst,
> providing the entire burst scheduling.
> 
> PMD reports the ability to synchronize packet sending on timestamp
> with new offload flag:
> 
> This is palliative and is going to be replaced with new eth_dev API
> about reporting/managing the supported dynamic flags and its related
> features. This API would break ABI compatibility and can't be
> introduced
> at the moment, so is postponed to 20.11.
> 
> For testing purposes it is proposed to update testpmd "txonly"
> forwarding mode routine. With this update testpmd application generates
> the packets and sets the dynamic timestamps according to specified time
> pattern if it sees the "rte_dynfield_timestamp" is registered.
> 
> The new testpmd command is proposed to configure sending pattern:
> 
> set tx_times <burst_gap>,<intra_gap>
> 
> <intra_gap> - the delay between the packets within the burst
>               specified in the device clock units. The number
>               of packets in the burst is defined by txburst parameter
> 
> <burst_gap> - the delay between the bursts in the device clock units
> 
> As the result the bursts of packet will be transmitted with specific
> delays between the packets within the burst and specific delay between
> the bursts. The rte_eth_get_clock is supposed to be engaged to get the
> current device clock value and provide the reference for the
> timestamps.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  v1->v4:
>     - dedicated dynamic Tx timestamp flag instead of shared with Rx

The detailed description above should be updated to reflect that it is now two flags.

>     - Doxygen-style comment
>     - comments update
> 
> ---
>  lib/librte_ethdev/rte_ethdev.c |  1 +
>  lib/librte_ethdev/rte_ethdev.h |  4 ++++
>  lib/librte_mbuf/rte_mbuf_dyn.h | 31 +++++++++++++++++++++++++++++++
>  3 files changed, 36 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.c
> b/lib/librte_ethdev/rte_ethdev.c
> index 8e10a6f..02157d5 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
>  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
>  };
> 
>  #undef RTE_TX_OFFLOAD_BIT2STR
> diff --git a/lib/librte_ethdev/rte_ethdev.h
> b/lib/librte_ethdev/rte_ethdev.h
> index a49242b..6f6454c 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
>  /** Device supports outer UDP checksum */
>  #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
> 
> +/** Device supports send on timestamp */
> +#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> +
> +
>  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
>  /**< Device supports Rx queue setup after device started*/
>  #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h
> b/lib/librte_mbuf/rte_mbuf_dyn.h
> index 96c3631..7e9f7d2 100644
> --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> @@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
>  #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
>  #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
> 
> +/**
> + * The timestamp dynamic field provides some timing information, the
> + * units and time references (initial phase) are not explicitly
> defined
> + * but are maintained always the same for a given port. Some devices
> allow4
> + * to query rte_eth_read_clock() that will return the current device
> + * timestamp. The dynamic Tx timestamp flag tells whether the field
> contains
> + * actual timestamp value. For the packets being sent this value can
> be
> + * used by PMD to schedule packet sending.
> + *
> + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> + * and obsoleting, the dedicated Rx timestamp flag is supposed to be
> + * introduced and the shared dynamic timestamp field will be used
> + * to handle the timestamps on receiving datapath as well.
> + */
> +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"

The description above should not say anything about the dynamic TX timestamp flag.

Please elaborate "some timing information", e.g. add "... about when the packet was received".

> +
> +/**
> + * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on
> the
> + * packet being sent it tries to synchronize the time of packet
> appearing
> + * on the wire with the specified packet timestamp. If the specified
> one
> + * is in the past it should be ignored, if one is in the distant
> future
> + * it should be capped with some reasonable value (in range of
> seconds).
> + *
> + * There is no any packet reordering according to timestamps is
> supposed,
> + * neither for packet within the burst, nor for the whole bursts, it
> is
> + * an entirely application responsibility to generate packets and its
> + * timestamps in desired order. The timestamps might be put only in
> + * the first packet in the burst providing the entire burst
> scheduling.
> + */
> +#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
> +
>  #endif
> --
> 1.8.3.1
> 

It may be worth adding some documentation about how the clocks of the NICs are out of sync with the clock of the CPU, and are all drifting relatively.

And those clocks are also out of sync with the actual time (NTP clock).

Preferably, some sort of cookbook for handling this should be provided. PCAP could be used as an example.
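
As a purely illustrative sketch of what such a cookbook entry could cover, an
application might periodically correlate the device clock with CLOCK_REALTIME
and refresh the mapping to follow the drift, e.g. to convert device timestamps
for PCAP or to compute "send at wall time T" values (the helper names and the
100 ms calibration delay below are arbitrary assumptions, not part of the patch):

#include <time.h>
#include <rte_cycles.h>
#include <rte_ethdev.h>

struct clk_map {
	uint64_t dev_base;   /* device clock at calibration */
	uint64_t ns_base;    /* CLOCK_REALTIME at calibration, in ns */
	double ns_per_tick;  /* estimated device tick period */
};

static uint64_t
host_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

/* Re-run periodically: both clocks drift, so the mapping ages. */
static void
clk_map_update(uint16_t port, struct clk_map *m)
{
	uint64_t dev0, dev1, ns0, ns1;

	rte_eth_read_clock(port, &dev0);
	ns0 = host_ns();
	rte_delay_ms(100);
	rte_eth_read_clock(port, &dev1);
	ns1 = host_ns();

	m->ns_per_tick = (double)(ns1 - ns0) / (double)(dev1 - dev0);
	m->dev_base = dev1;
	m->ns_base = ns1;
}

static uint64_t
dev_clock_to_wall_ns(const struct clk_map *m, uint64_t dev)
{
	return m->ns_base + (uint64_t)((dev - m->dev_base) * m->ns_per_tick);
}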



^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] devtools: give some hints for ABI errors
  2020-07-08 13:45  7%   ` Aaron Conole
@ 2020-07-08 14:01  4%     ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2020-07-08 14:01 UTC (permalink / raw)
  To: Aaron Conole; +Cc: David Marchand, dev, thomas, dodji, Neil Horman



On 08/07/2020 14:45, Aaron Conole wrote:
> "Kinsella, Ray" <mdr@ashroe.eu> writes:
> 
>> + Aaron
>>
>> On 08/07/2020 11:22, David Marchand wrote:
>>> abidiff can provide some more information about the ABI difference it
>>> detected.
>>> In all cases, a discussion on the mailing list must happen, but we can give
>>> some hints to know if this is a problem with the script calling abidiff,
>>> a potential ABI breakage or an unambiguous ABI breakage.
>>>
>>> Signed-off-by: David Marchand <david.marchand@redhat.com>
>>> ---
>>>  devtools/check-abi.sh | 16 ++++++++++++++--
>>>  1 file changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/devtools/check-abi.sh b/devtools/check-abi.sh
>>> index e17fedbd9f..521e2cce7c 100755
>>> --- a/devtools/check-abi.sh
>>> +++ b/devtools/check-abi.sh
>>> @@ -50,10 +50,22 @@ for dump in $(find $refdir -name "*.dump"); do
>>>  		error=1
>>>  		continue
>>>  	fi
>>> -	if ! abidiff $ABIDIFF_OPTIONS $dump $dump2; then
>>> +	abidiff $ABIDIFF_OPTIONS $dump $dump2 || {
>>> +		abiret=$?
>>>  		echo "Error: ABI issue reported for 'abidiff $ABIDIFF_OPTIONS $dump $dump2'"
>>>  		error=1
>>> -	fi
>>> +		echo
>>> +		if [ $(($abiret & 3)) != 0 ]; then
>>> +			echo "ABIDIFF_ERROR|ABIDIFF_USAGE_ERROR, please report this to dev@dpdk.org."
>>> +		fi
>>> +		if [ $(($abiret & 4)) != 0 ]; then
>>> +			echo "ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged this as a potential issue)."
>>> +		fi
>>> +		if [ $(($abiret & 8)) != 0 ]; then
>>> +			echo "ABIDIFF_ABI_INCOMPATIBLE_CHANGE, this change breaks the ABI."
>>> +		fi
>>> +		echo
>>> +	}
>>>  done
>>>  
>>>  [ -z "$error" ] || [ -n "$warnonly" ]
>>>
>>
>> This looks good to me; my only thought was whether we can do anything to help the ABI checks play nice with Travis.
>> At the moment it takes time to find the failure reason in the Travis log.
> 
> That's a problem even for non-ABI failures.  I was considering pulling
> the travis log for each failed build and attaching it, but even that
> isn't a great solution (very large emails aren't much easier to search).
> 
> I'm open to suggestions.

For me the problem arises when you log on to the Travis interface,
you need to search for ERROR etc ... there must be a better way.

> 
>> Ray K
> 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] devtools: give some hints for ABI errors
  2020-07-08 13:09  7% ` Kinsella, Ray
  2020-07-08 13:15  4%   ` David Marchand
@ 2020-07-08 13:45  7%   ` Aaron Conole
  2020-07-08 14:01  4%     ` Kinsella, Ray
  1 sibling, 1 reply; 200+ results
From: Aaron Conole @ 2020-07-08 13:45 UTC (permalink / raw)
  To: Kinsella, Ray; +Cc: David Marchand, dev, thomas, dodji, Neil Horman

"Kinsella, Ray" <mdr@ashroe.eu> writes:

> + Aaron
>
> On 08/07/2020 11:22, David Marchand wrote:
>> abidiff can provide some more information about the ABI difference it
>> detected.
>> In all cases, a discussion on the mailing list must happen, but we can give
>> some hints to know if this is a problem with the script calling abidiff,
>> a potential ABI breakage or an unambiguous ABI breakage.
>> 
>> Signed-off-by: David Marchand <david.marchand@redhat.com>
>> ---
>>  devtools/check-abi.sh | 16 ++++++++++++++--
>>  1 file changed, 14 insertions(+), 2 deletions(-)
>> 
>> diff --git a/devtools/check-abi.sh b/devtools/check-abi.sh
>> index e17fedbd9f..521e2cce7c 100755
>> --- a/devtools/check-abi.sh
>> +++ b/devtools/check-abi.sh
>> @@ -50,10 +50,22 @@ for dump in $(find $refdir -name "*.dump"); do
>>  		error=1
>>  		continue
>>  	fi
>> -	if ! abidiff $ABIDIFF_OPTIONS $dump $dump2; then
>> +	abidiff $ABIDIFF_OPTIONS $dump $dump2 || {
>> +		abiret=$?
>>  		echo "Error: ABI issue reported for 'abidiff $ABIDIFF_OPTIONS $dump $dump2'"
>>  		error=1
>> -	fi
>> +		echo
>> +		if [ $(($abiret & 3)) != 0 ]; then
>> +			echo "ABIDIFF_ERROR|ABIDIFF_USAGE_ERROR, please report this to dev@dpdk.org."
>> +		fi
>> +		if [ $(($abiret & 4)) != 0 ]; then
>> +			echo "ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged this as a potential issue)."
>> +		fi
>> +		if [ $(($abiret & 8)) != 0 ]; then
>> +			echo "ABIDIFF_ABI_INCOMPATIBLE_CHANGE, this change breaks the ABI."
>> +		fi
>> +		echo
>> +	}
>>  done
>>  
>>  [ -z "$error" ] || [ -n "$warnonly" ]
>> 
>
> This looks good to me; my only thought was whether we can do anything to help the ABI checks play nice with Travis.
> At the moment it takes time to find the failure reason in the Travis log.

That's a problem even for non-ABI failures.  I was considering pulling
the travis log for each failed build and attaching it, but even that
isn't a great solution (very large emails aren't much easier to search).

I'm open to suggestions.

> Ray K


^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [PATCH 2/2] eal: use c11 atomics for interrupt status
  2020-07-08 12:29  3%   ` David Marchand
@ 2020-07-08 13:43  0%     ` Aaron Conole
  2020-07-08 15:04  0%     ` Kinsella, Ray
  1 sibling, 0 replies; 200+ results
From: Aaron Conole @ 2020-07-08 13:43 UTC (permalink / raw)
  To: David Marchand
  Cc: Phil Yang, dev, David Christensen, Honnappa Nagarahalli,
	Ruifeng Wang (Arm Technology China),
	nd, Dodji Seketeli, Neil Horman, Ray Kinsella, Harman Kalra

David Marchand <david.marchand@redhat.com> writes:

> On Thu, Jun 11, 2020 at 12:25 PM Phil Yang <phil.yang@arm.com> wrote:
>>
>> The event status is defined as a volatile variable and shared
>> between threads. Use c11 atomics with explicit ordering instead
>> of rte_atomic ops which enforce unnecessary barriers on aarch64.
>>
>> Signed-off-by: Phil Yang <phil.yang@arm.com>
>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>> ---
>>  lib/librte_eal/include/rte_eal_interrupts.h |  2 +-
>>  lib/librte_eal/linux/eal_interrupts.c       | 47 ++++++++++++++++++++---------
>>  2 files changed, 34 insertions(+), 15 deletions(-)
>>
>> diff --git a/lib/librte_eal/include/rte_eal_interrupts.h b/lib/librte_eal/include/rte_eal_interrupts.h
>> index 773a34a..b1e8a29 100644
>> --- a/lib/librte_eal/include/rte_eal_interrupts.h
>> +++ b/lib/librte_eal/include/rte_eal_interrupts.h
>> @@ -59,7 +59,7 @@ enum {
>>
>>  /** interrupt epoll event obj, taken by epoll_event.ptr */
>>  struct rte_epoll_event {
>> -       volatile uint32_t status;  /**< OUT: event status */
>> +       uint32_t status;           /**< OUT: event status */
>>         int fd;                    /**< OUT: event fd */
>>         int epfd;       /**< OUT: epoll instance the ev associated with */
>>         struct rte_epoll_data epdata;
>
> I got a reject from the ABI check in my env.
>
> 1 function with some indirect sub-type change:
>
>   [C]'function int rte_pci_ioport_map(rte_pci_device*, int,
> rte_pci_ioport*)' at pci.c:756:1 has some indirect sub-type changes:
>     parameter 1 of type 'rte_pci_device*' has sub-type changes:
>       in pointed to type 'struct rte_pci_device' at rte_bus_pci.h:57:1:
>         type size hasn't changed
>         1 data member changes (2 filtered):
>          type of 'rte_intr_handle rte_pci_device::intr_handle' changed:
>            type size hasn't changed
>            1 data member change:
>             type of 'rte_epoll_event rte_intr_handle::elist[512]' changed:
>               array element type 'struct rte_epoll_event' changed:
>                 type size hasn't changed
>                 1 data member change:
>                  type of 'volatile uint32_t rte_epoll_event::status' changed:
>                    entity changed from 'volatile uint32_t' to 'typedef
> uint32_t' at stdint-uintn.h:26:1
>                    type size hasn't changed
>
>               type size hasn't changed
>
>
> This is probably harmless in our case (going from volatile to non
> volatile), but it won't pass the check in the CI without an exception
> rule.
>
> Note: checking on the test-report ml, I saw nothing, but ovsrobot did
> catch the issue with this change too, Aaron?

I don't have archives back to Jun 11 on the robot server.  I think it
doesn't preserve forever (and the archives seem to go back only until
Jul 03).  I will update it.

I do see that we have a failed travis job:

https://travis-ci.org/github/ovsrobot/dpdk/builds/697180855

I'm surprised this didn't go out.  Have we seen other failures to report
of the ovs robot recently?  I can double check the job config.


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4 1/4] eventdev: fix race condition on timer list counter
    2020-07-07 15:54  4%       ` [dpdk-dev] [PATCH v4 4/4] eventdev: relax smp barriers with C11 atomics Phil Yang
@ 2020-07-08 13:30  4%       ` Jerin Jacob
  2020-07-08 15:01  0%         ` Thomas Monjalon
  1 sibling, 1 reply; 200+ results
From: Jerin Jacob @ 2020-07-08 13:30 UTC (permalink / raw)
  To: Phil Yang
  Cc: Jerin Jacob, dpdk-dev, Thomas Monjalon, Erik Gabriel Carrillo,
	Honnappa Nagarahalli, David Christensen,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, nd, David Marchand, Ray Kinsella, Neil Horman,
	dodji, dpdk stable

On Tue, Jul 7, 2020 at 9:25 PM Phil Yang <phil.yang@arm.com> wrote:
>
> The n_poll_lcores counter and poll_lcore array are shared between lcores,
> and the updates of these variables are outside the protection of the spinlock
> on each lcore timer list. The read-modify-write operations on the counter
> are not atomic, so there is a potential race condition between lcores.
>
> Use C11 atomics with RELAXED ordering to prevent such conflicts.
>
> Fixes: cc7b73ea9e3b ("eventdev: add new software timer adapter")
> Cc: erik.g.carrillo@intel.com
> Cc: stable@dpdk.org
>
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>

Hi Thomas,

The latest version does not have the ABI breakage issue.

I have added the ABI verifier in my local patch verification setup.

Series applied to dpdk-next-eventdev/master.

Please pull this series from dpdk-next-eventdev/master. Thanks.

I am marking this patch series as "Awaiting Upstream" in patchwork
status to reflect the actual status.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] devtools: give some hints for ABI errors
  2020-07-08 13:15  4%   ` David Marchand
@ 2020-07-08 13:22  4%     ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2020-07-08 13:22 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, Thomas Monjalon, Dodji Seketeli, Neil Horman, Aaron Conole



On 08/07/2020 14:15, David Marchand wrote:
> On Wed, Jul 8, 2020 at 3:09 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>>
>> + Aaron
>>
>> On 08/07/2020 11:22, David Marchand wrote:
>>> abidiff can provide some more information about the ABI difference it
>>> detected.
>>> In all cases, a discussion on the mailing must happen but we can give
>>> some hints to know if this is a problem with the script calling abidiff,
>>> a potential ABI breakage or an unambiguous ABI breakage.
>>>
>>> Signed-off-by: David Marchand <david.marchand@redhat.com>
>>> ---
>>>  devtools/check-abi.sh | 16 ++++++++++++++--
>>>  1 file changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/devtools/check-abi.sh b/devtools/check-abi.sh
>>> index e17fedbd9f..521e2cce7c 100755
>>> --- a/devtools/check-abi.sh
>>> +++ b/devtools/check-abi.sh
>>> @@ -50,10 +50,22 @@ for dump in $(find $refdir -name "*.dump"); do
>>>               error=1
>>>               continue
>>>       fi
>>> -     if ! abidiff $ABIDIFF_OPTIONS $dump $dump2; then
>>> +     abidiff $ABIDIFF_OPTIONS $dump $dump2 || {
>>> +             abiret=$?
>>>               echo "Error: ABI issue reported for 'abidiff $ABIDIFF_OPTIONS $dump $dump2'"
>>>               error=1
>>> -     fi
>>> +             echo
>>> +             if [ $(($abiret & 3)) != 0 ]; then
>>> +                     echo "ABIDIFF_ERROR|ABIDIFF_USAGE_ERROR, please report this to dev@dpdk.org."
> 
> Forgot to --amend.
> Hopefully yes, this will be reported to dev@dpdk.org... I wanted to
> highlight this could be a script or env issue.
> 
> 
>>> +             fi
>>> +             if [ $(($abiret & 4)) != 0 ]; then
>>> +                     echo "ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged this as a potential issue)."
>>> +             fi
>>> +             if [ $(($abiret & 8)) != 0 ]; then
>>> +                     echo "ABIDIFF_ABI_INCOMPATIBLE_CHANGE, this change breaks the ABI."
>>> +             fi
>>> +             echo
>>> +     }
>>>  done
>>>
>>>  [ -z "$error" ] || [ -n "$warnonly" ]
>>>
>>
>> This look good to me, my only thought was can we do anything to help the ABI checks play nice with Travis.
>> At the moment it takes time to find the failure reason in the Travis log.
> 
> I usually look for "FILES_TO" to get to the last error.
> 
Right, but there is hopefully a better way to give Travis some clues ...
 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] devtools: give some hints for ABI errors
  2020-07-08 13:09  7% ` Kinsella, Ray
@ 2020-07-08 13:15  4%   ` David Marchand
  2020-07-08 13:22  4%     ` Kinsella, Ray
  2020-07-08 13:45  7%   ` Aaron Conole
  1 sibling, 1 reply; 200+ results
From: David Marchand @ 2020-07-08 13:15 UTC (permalink / raw)
  To: Kinsella, Ray
  Cc: dev, Thomas Monjalon, Dodji Seketeli, Neil Horman, Aaron Conole

On Wed, Jul 8, 2020 at 3:09 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>
> + Aaron
>
> On 08/07/2020 11:22, David Marchand wrote:
> > abidiff can provide some more information about the ABI difference it
> > detected.
> > In all cases, a discussion on the mailing list must happen, but we can give
> > some hints to know if this is a problem with the script calling abidiff,
> > a potential ABI breakage or an unambiguous ABI breakage.
> >
> > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > ---
> >  devtools/check-abi.sh | 16 ++++++++++++++--
> >  1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/devtools/check-abi.sh b/devtools/check-abi.sh
> > index e17fedbd9f..521e2cce7c 100755
> > --- a/devtools/check-abi.sh
> > +++ b/devtools/check-abi.sh
> > @@ -50,10 +50,22 @@ for dump in $(find $refdir -name "*.dump"); do
> >               error=1
> >               continue
> >       fi
> > -     if ! abidiff $ABIDIFF_OPTIONS $dump $dump2; then
> > +     abidiff $ABIDIFF_OPTIONS $dump $dump2 || {
> > +             abiret=$?
> >               echo "Error: ABI issue reported for 'abidiff $ABIDIFF_OPTIONS $dump $dump2'"
> >               error=1
> > -     fi
> > +             echo
> > +             if [ $(($abiret & 3)) != 0 ]; then
> > +                     echo "ABIDIFF_ERROR|ABIDIFF_USAGE_ERROR, please report this to dev@dpdk.org."

Forgot to --amend.
Hopefully yes, this will be reported to dev@dpdk.org... I wanted to
highlight this could be a script or env issue.


> > +             fi
> > +             if [ $(($abiret & 4)) != 0 ]; then
> > +                     echo "ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged this as a potential issue)."
> > +             fi
> > +             if [ $(($abiret & 8)) != 0 ]; then
> > +                     echo "ABIDIFF_ABI_INCOMPATIBLE_CHANGE, this change breaks the ABI."
> > +             fi
> > +             echo
> > +     }
> >  done
> >
> >  [ -z "$error" ] || [ -n "$warnonly" ]
> >
>
> This looks good to me; my only thought was whether we can do anything to help the ABI checks play nice with Travis.
> At the moment it takes time to find the failure reason in the Travis log.

I usually look for "FILES_TO" to get to the last error.


-- 
David Marchand


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] devtools: give some hints for ABI errors
  2020-07-08 10:22 25% [dpdk-dev] [PATCH] devtools: give some hints for ABI errors David Marchand
@ 2020-07-08 13:09  7% ` Kinsella, Ray
  2020-07-08 13:15  4%   ` David Marchand
  2020-07-08 13:45  7%   ` Aaron Conole
  2020-07-09 15:52  4% ` Dodji Seketeli
  1 sibling, 2 replies; 200+ results
From: Kinsella, Ray @ 2020-07-08 13:09 UTC (permalink / raw)
  To: David Marchand, dev; +Cc: thomas, dodji, Neil Horman, Aaron Conole

+ Aaron

On 08/07/2020 11:22, David Marchand wrote:
> abidiff can provide some more information about the ABI difference it
> detected.
> In all cases, a discussion on the mailing list must happen, but we can give
> some hints to know if this is a problem with the script calling abidiff,
> a potential ABI breakage or an unambiguous ABI breakage.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
>  devtools/check-abi.sh | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/devtools/check-abi.sh b/devtools/check-abi.sh
> index e17fedbd9f..521e2cce7c 100755
> --- a/devtools/check-abi.sh
> +++ b/devtools/check-abi.sh
> @@ -50,10 +50,22 @@ for dump in $(find $refdir -name "*.dump"); do
>  		error=1
>  		continue
>  	fi
> -	if ! abidiff $ABIDIFF_OPTIONS $dump $dump2; then
> +	abidiff $ABIDIFF_OPTIONS $dump $dump2 || {
> +		abiret=$?
>  		echo "Error: ABI issue reported for 'abidiff $ABIDIFF_OPTIONS $dump $dump2'"
>  		error=1
> -	fi
> +		echo
> +		if [ $(($abiret & 3)) != 0 ]; then
> +			echo "ABIDIFF_ERROR|ABIDIFF_USAGE_ERROR, please report this to dev@dpdk.org."
> +		fi
> +		if [ $(($abiret & 4)) != 0 ]; then
> +			echo "ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged this as a potential issue)."
> +		fi
> +		if [ $(($abiret & 8)) != 0 ]; then
> +			echo "ABIDIFF_ABI_INCOMPATIBLE_CHANGE, this change breaks the ABI."
> +		fi
> +		echo
> +	}
>  done
>  
>  [ -z "$error" ] || [ -n "$warnonly" ]
> 

This looks good to me; my only thought was whether we can do anything to help the ABI checks play nice with Travis.
At the moment it takes time to find the failure reason in the Travis log.

Ray K

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [PATCH 2/2] eal: use c11 atomics for interrupt status
  @ 2020-07-08 12:29  3%   ` David Marchand
  2020-07-08 13:43  0%     ` Aaron Conole
  2020-07-08 15:04  0%     ` Kinsella, Ray
  0 siblings, 2 replies; 200+ results
From: David Marchand @ 2020-07-08 12:29 UTC (permalink / raw)
  To: Phil Yang, Aaron Conole
  Cc: dev, David Christensen, Honnappa Nagarahalli,
	Ruifeng Wang (Arm Technology China),
	nd, Dodji Seketeli, Neil Horman, Ray Kinsella, Harman Kalra

On Thu, Jun 11, 2020 at 12:25 PM Phil Yang <phil.yang@arm.com> wrote:
>
> The event status is defined as a volatile variable and shared
> between threads. Use c11 atomics with explicit ordering instead
> of rte_atomic ops which enforce unnecessary barriers on aarch64.
>
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  lib/librte_eal/include/rte_eal_interrupts.h |  2 +-
>  lib/librte_eal/linux/eal_interrupts.c       | 47 ++++++++++++++++++++---------
>  2 files changed, 34 insertions(+), 15 deletions(-)
>
> diff --git a/lib/librte_eal/include/rte_eal_interrupts.h b/lib/librte_eal/include/rte_eal_interrupts.h
> index 773a34a..b1e8a29 100644
> --- a/lib/librte_eal/include/rte_eal_interrupts.h
> +++ b/lib/librte_eal/include/rte_eal_interrupts.h
> @@ -59,7 +59,7 @@ enum {
>
>  /** interrupt epoll event obj, taken by epoll_event.ptr */
>  struct rte_epoll_event {
> -       volatile uint32_t status;  /**< OUT: event status */
> +       uint32_t status;           /**< OUT: event status */
>         int fd;                    /**< OUT: event fd */
>         int epfd;       /**< OUT: epoll instance the ev associated with */
>         struct rte_epoll_data epdata;

I got a reject from the ABI check in my env.

1 function with some indirect sub-type change:

  [C]'function int rte_pci_ioport_map(rte_pci_device*, int,
rte_pci_ioport*)' at pci.c:756:1 has some indirect sub-type changes:
    parameter 1 of type 'rte_pci_device*' has sub-type changes:
      in pointed to type 'struct rte_pci_device' at rte_bus_pci.h:57:1:
        type size hasn't changed
        1 data member changes (2 filtered):
         type of 'rte_intr_handle rte_pci_device::intr_handle' changed:
           type size hasn't changed
           1 data member change:
            type of 'rte_epoll_event rte_intr_handle::elist[512]' changed:
              array element type 'struct rte_epoll_event' changed:
                type size hasn't changed
                1 data member change:
                 type of 'volatile uint32_t rte_epoll_event::status' changed:
                   entity changed from 'volatile uint32_t' to 'typedef
uint32_t' at stdint-uintn.h:26:1
                   type size hasn't changed

              type size hasn't changed


This is probably harmless in our case (going from volatile to non
volatile), but it won't pass the check in the CI without an exception
rule.
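
For reference, if such a change is judged acceptable after review, the usual
escape hatch is a suppression rule in devtools/libabigail.abignore, roughly
along these lines (illustrative only - the exact rule would need discussion on
the list):

    ; Ignore all changes to the rte_epoll_event struct (volatile dropped)
    [suppress_type]
            type_kind = struct
            name = rte_epoll_event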

Note: checking on the test-report ml, I saw nothing, but ovsrobot did
catch the issue with this change too, Aaron?


-- 
David Marchand


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes
  2020-07-08 10:32  7%   ` [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes Thomas Monjalon
@ 2020-07-08 12:02  4%     ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2020-07-08 12:02 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, fady, Honnappa.Nagarahalli, Neil Horman, John McNamara,
	Marko Kovacevic, Harini Ramakrishnan, Omar Cardona,
	Pallavi Kadam, Ranjit Menon



On 08/07/2020 11:32, Thomas Monjalon wrote:
> 07/07/2020 19:50, Ray Kinsella:
>> Few documentation fixexs, clarifing the Windows ABI policy and aliases to
>> experimental mode.
>>
>> Ray Kinsella (2):
>>   doc: reword abi policy for windows
>>   doc: clarify alias to experimental period
>>
>> v2:
>>   Addressed feedback from Thomas Monjalon.
> 
> One more sentence needs to start on its line,
> avoiding to split a link on two lines.

ah yes, missed that one sorry.
> 
> Reworded titles with uppercases as well:
> 	doc: reword ABI policy for Windows
> 	doc: clarify period of alias to experimental symbol
> 
> Applied with above changes, thanks
> 
> 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] mbuf: use C11 atomics for refcnt operations
  2020-07-07 10:10  3% ` [dpdk-dev] [PATCH v2] mbuf: use C11 " Phil Yang
  2020-07-08  5:11  3%   ` Phil Yang
@ 2020-07-08 11:44  0%   ` Olivier Matz
  2020-07-09 10:00  3%     ` Phil Yang
  2020-07-09 10:10  4%   ` [dpdk-dev] [PATCH v3] mbuf: use C11 atomic built-ins " Phil Yang
  2 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2020-07-08 11:44 UTC (permalink / raw)
  To: Phil Yang
  Cc: david.marchand, dev, drc, Honnappa.Nagarahalli, ruifeng.wang, nd

Hi,

On Tue, Jul 07, 2020 at 06:10:33PM +0800, Phil Yang wrote:
> Use C11 atomics with explicit ordering instead of rte_atomic ops which
> enforce unnecessary barriers on aarch64.
> 
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
> v2:
> Fix ABI issue: revert the rte_mbuf_ext_shared_info struct refcnt field
> to refcnt_atomic.
> 
>  lib/librte_mbuf/rte_mbuf.c      |  1 -
>  lib/librte_mbuf/rte_mbuf.h      | 19 ++++++++++---------
>  lib/librte_mbuf/rte_mbuf_core.h | 11 +++--------
>  3 files changed, 13 insertions(+), 18 deletions(-)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index ae91ae2..8a456e5 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -22,7 +22,6 @@
>  #include <rte_eal.h>
>  #include <rte_per_lcore.h>
>  #include <rte_lcore.h>
> -#include <rte_atomic.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_mempool.h>
>  #include <rte_mbuf.h>
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index f8e492e..4a7a98c 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -37,7 +37,6 @@
>  #include <rte_config.h>
>  #include <rte_mempool.h>
>  #include <rte_memory.h>
> -#include <rte_atomic.h>
>  #include <rte_prefetch.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_byteorder.h>
> @@ -365,7 +364,7 @@ rte_pktmbuf_priv_flags(struct rte_mempool *mp)
>  static inline uint16_t
>  rte_mbuf_refcnt_read(const struct rte_mbuf *m)
>  {
> -	return (uint16_t)(rte_atomic16_read(&m->refcnt_atomic));
> +	return __atomic_load_n(&m->refcnt, __ATOMIC_RELAXED);
>  }
>  
>  /**
> @@ -378,14 +377,15 @@ rte_mbuf_refcnt_read(const struct rte_mbuf *m)
>  static inline void
>  rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
>  {
> -	rte_atomic16_set(&m->refcnt_atomic, (int16_t)new_value);
> +	__atomic_store_n(&m->refcnt, new_value, __ATOMIC_RELAXED);
>  }
>  
>  /* internal */
>  static inline uint16_t
>  __rte_mbuf_refcnt_update(struct rte_mbuf *m, int16_t value)
>  {
> -	return (uint16_t)(rte_atomic16_add_return(&m->refcnt_atomic, value));
> +	return (uint16_t)(__atomic_add_fetch((int16_t *)&m->refcnt, value,
> +					__ATOMIC_ACQ_REL));
>  }
>  
>  /**
> @@ -466,7 +466,7 @@ rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
>  static inline uint16_t
>  rte_mbuf_ext_refcnt_read(const struct rte_mbuf_ext_shared_info *shinfo)
>  {
> -	return (uint16_t)(rte_atomic16_read(&shinfo->refcnt_atomic));
> +	return __atomic_load_n(&shinfo->refcnt_atomic, __ATOMIC_RELAXED);
>  }
>  
>  /**
> @@ -481,7 +481,7 @@ static inline void
>  rte_mbuf_ext_refcnt_set(struct rte_mbuf_ext_shared_info *shinfo,
>  	uint16_t new_value)
>  {
> -	rte_atomic16_set(&shinfo->refcnt_atomic, (int16_t)new_value);
> +	__atomic_store_n(&shinfo->refcnt_atomic, new_value, __ATOMIC_RELAXED);
>  }
>  
>  /**
> @@ -505,7 +505,8 @@ rte_mbuf_ext_refcnt_update(struct rte_mbuf_ext_shared_info *shinfo,
>  		return (uint16_t)value;
>  	}
>  
> -	return (uint16_t)rte_atomic16_add_return(&shinfo->refcnt_atomic, value);
> +	return (uint16_t)(__atomic_add_fetch((int16_t *)&shinfo->refcnt_atomic,
> +					    value, __ATOMIC_ACQ_REL));
>  }
>  
>  /** Mbuf prefetch */
> @@ -1304,8 +1305,8 @@ static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
>  	 * Direct usage of add primitive to avoid
>  	 * duplication of comparing with one.
>  	 */
> -	if (likely(rte_atomic16_add_return
> -			(&shinfo->refcnt_atomic, -1)))
> +	if (likely(__atomic_add_fetch((int *)&shinfo->refcnt_atomic, -1,
> +				     __ATOMIC_ACQ_REL)))
>  		return 1;
>  
>  	/* Reinitialize counter before mbuf freeing. */
> diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
> index 16600f1..806313a 100644
> --- a/lib/librte_mbuf/rte_mbuf_core.h
> +++ b/lib/librte_mbuf/rte_mbuf_core.h
> @@ -18,7 +18,6 @@
>  
>  #include <stdint.h>
>  #include <rte_compat.h>
> -#include <generic/rte_atomic.h>
>  
>  #ifdef __cplusplus
>  extern "C" {
> @@ -495,12 +494,8 @@ struct rte_mbuf {
>  	 * or non-atomic) is controlled by the CONFIG_RTE_MBUF_REFCNT_ATOMIC
>  	 * config option.
>  	 */
> -	RTE_STD_C11
> -	union {
> -		rte_atomic16_t refcnt_atomic; /**< Atomically accessed refcnt */
> -		/** Non-atomically accessed refcnt */
> -		uint16_t refcnt;
> -	};
> +	uint16_t refcnt;
> +

It seems this patch does 2 things:
- remove refcnt_atomic
- use C11 atomics

The first change is an API break. I think it should be announced in a deprecation
notice. The one about atomic does not talk about it.

So I suggest keeping refcnt_atomic until the next version.
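
One possible middle ground (sketch only, not a final proposal) would be to keep
the old field name as a plain alias inside the union while the C11 conversion
goes in, and drop it later with a proper deprecation notice:

	/* rte_mbuf_core.h, keeping both names for one more release: */
	RTE_STD_C11
	union {
		uint16_t refcnt;        /* accessed with __atomic_* built-ins */
		uint16_t refcnt_atomic; /* deprecated alias kept for API compatibility */
	};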


>  	uint16_t nb_segs;         /**< Number of segments. */
>  
>  	/** Input port (16 bits to support more than 256 virtual ports).
> @@ -679,7 +674,7 @@ typedef void (*rte_mbuf_extbuf_free_callback_t)(void *addr, void *opaque);
>  struct rte_mbuf_ext_shared_info {
>  	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback function */
>  	void *fcb_opaque;                        /**< Free callback argument */
> -	rte_atomic16_t refcnt_atomic;        /**< Atomically accessed refcnt */
> +	uint16_t refcnt_atomic;              /**< Atomically accessed refcnt */
>  };
>  
>  /**< Maximum number of nb_segs allowed. */
> -- 
> 2.7.4
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes
  2020-07-07 17:50  8% ` [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes Ray Kinsella
  2020-07-07 17:51 24%   ` [dpdk-dev] [PATCH v2 1/2] doc: reword abi policy for windows Ray Kinsella
  2020-07-07 17:51 12%   ` [dpdk-dev] [PATCH v2 2/2] doc: clarify alias to experimental period Ray Kinsella
@ 2020-07-08 10:32  7%   ` Thomas Monjalon
  2020-07-08 12:02  4%     ` Kinsella, Ray
  2 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2020-07-08 10:32 UTC (permalink / raw)
  To: Ray Kinsella
  Cc: dev, fady, Honnappa.Nagarahalli, Neil Horman, John McNamara,
	Marko Kovacevic, Harini Ramakrishnan, Omar Cardona,
	Pallavi Kadam, Ranjit Menon

07/07/2020 19:50, Ray Kinsella:
> Few documentation fixexs, clarifing the Windows ABI policy and aliases to
> experimental mode.
> 
> Ray Kinsella (2):
>   doc: reword abi policy for windows
>   doc: clarify alias to experimental period
> 
> v2:
>   Addressed feedback from Thomas Monjalon.

One more sentence needs to start on its line,
avoiding to split a link on two lines.

Reworded titles with uppercases as well:
	doc: reword ABI policy for Windows
	doc: clarify period of alias to experimental symbol

Applied with above changes, thanks



^ permalink raw reply	[relevance 7%]

* [dpdk-dev] [PATCH] devtools: give some hints for ABI errors
@ 2020-07-08 10:22 25% David Marchand
  2020-07-08 13:09  7% ` Kinsella, Ray
  2020-07-09 15:52  4% ` Dodji Seketeli
  0 siblings, 2 replies; 200+ results
From: David Marchand @ 2020-07-08 10:22 UTC (permalink / raw)
  To: dev; +Cc: thomas, dodji, Ray Kinsella, Neil Horman

abidiff can provide some more information about the ABI difference it
detected.
In all cases, a discussion on the mailing list must happen, but we can give
some hints to know if this is a problem with the script calling abidiff,
a potential ABI breakage or an unambiguous ABI breakage.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 devtools/check-abi.sh | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/devtools/check-abi.sh b/devtools/check-abi.sh
index e17fedbd9f..521e2cce7c 100755
--- a/devtools/check-abi.sh
+++ b/devtools/check-abi.sh
@@ -50,10 +50,22 @@ for dump in $(find $refdir -name "*.dump"); do
 		error=1
 		continue
 	fi
-	if ! abidiff $ABIDIFF_OPTIONS $dump $dump2; then
+	abidiff $ABIDIFF_OPTIONS $dump $dump2 || {
+		abiret=$?
 		echo "Error: ABI issue reported for 'abidiff $ABIDIFF_OPTIONS $dump $dump2'"
 		error=1
-	fi
+		echo
+		if [ $(($abiret & 3)) != 0 ]; then
+			echo "ABIDIFF_ERROR|ABIDIFF_USAGE_ERROR, please report this to dev@dpdk.org."
+		fi
+		if [ $(($abiret & 4)) != 0 ]; then
+			echo "ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged this as a potential issue)."
+		fi
+		if [ $(($abiret & 8)) != 0 ]; then
+			echo "ABIDIFF_ABI_INCOMPATIBLE_CHANGE, this change breaks the ABI."
+		fi
+		echo
+	}
 done
 
 [ -z "$error" ] || [ -n "$warnonly" ]
-- 
2.23.0


^ permalink raw reply	[relevance 25%]

* Re: [dpdk-dev] [PATCH v2] mbuf: use C11 atomics for refcnt operations
  2020-07-07 10:10  3% ` [dpdk-dev] [PATCH v2] mbuf: use C11 " Phil Yang
@ 2020-07-08  5:11  3%   ` Phil Yang
  2020-07-08 11:44  0%   ` Olivier Matz
  2020-07-09 10:10  4%   ` [dpdk-dev] [PATCH v3] mbuf: use C11 atomic built-ins " Phil Yang
  2 siblings, 0 replies; 200+ results
From: Phil Yang @ 2020-07-08  5:11 UTC (permalink / raw)
  To: Phil Yang, david.marchand, dev
  Cc: drc, Honnappa Nagarahalli, olivier.matz, Ruifeng Wang, nd

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Phil Yang
> Sent: Tuesday, July 7, 2020 6:11 PM
> To: david.marchand@redhat.com; dev@dpdk.org
> Cc: drc@linux.vnet.ibm.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; olivier.matz@6wind.com; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> Subject: [dpdk-dev] [PATCH v2] mbuf: use C11 atomics for refcnt operations
> 
> Use C11 atomics with explicit ordering instead of rte_atomic ops which
> enforce unnecessary barriers on aarch64.
> 
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
> v2:
> Fix ABI issue: revert the rte_mbuf_ext_shared_info struct refcnt field
> to refcnt_atomic.
> 

<snip>

> diff --git a/lib/librte_mbuf/rte_mbuf_core.h
> b/lib/librte_mbuf/rte_mbuf_core.h
> index 16600f1..806313a 100644
> --- a/lib/librte_mbuf/rte_mbuf_core.h
> +++ b/lib/librte_mbuf/rte_mbuf_core.h
> @@ -18,7 +18,6 @@
> 
>  #include <stdint.h>
>  #include <rte_compat.h>
> -#include <generic/rte_atomic.h>
> 
>  #ifdef __cplusplus
>  extern "C" {
> @@ -495,12 +494,8 @@ struct rte_mbuf {
>  	 * or non-atomic) is controlled by the
> CONFIG_RTE_MBUF_REFCNT_ATOMIC
>  	 * config option.
>  	 */
> -	RTE_STD_C11
> -	union {
> -		rte_atomic16_t refcnt_atomic; /**< Atomically accessed
> refcnt */
> -		/** Non-atomically accessed refcnt */
> -		uint16_t refcnt;
> -	};
> +	uint16_t refcnt;
> +
>  	uint16_t nb_segs;         /**< Number of segments. */
> 
>  	/** Input port (16 bits to support more than 256 virtual ports).
> @@ -679,7 +674,7 @@ typedef void
> (*rte_mbuf_extbuf_free_callback_t)(void *addr, void *opaque);
>  struct rte_mbuf_ext_shared_info {
>  	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback
> function */
>  	void *fcb_opaque;                        /**< Free callback argument */
> -	rte_atomic16_t refcnt_atomic;        /**< Atomically accessed refcnt */
> +	uint16_t refcnt_atomic;              /**< Atomically accessed refcnt */

It still causes an ABI check failure in Travis CI due to this type change.
I think we need an exception in libabigail.abignore for this.
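
(For illustration only, not a proposed final rule: a minimal suppression
entry in libabigail's syntax could look like the snippet below. Note that
such a rule hides any change to that struct, so the exact wording would
need discussion.)

    ; assumed example for devtools/libabigail.abignore
    [suppress_type]
            name = rte_mbuf_ext_shared_info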

Thanks,
Phil
>  };
> 
>  /**< Maximum number of nb_segs allowed. */
> --
> 2.7.4


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 6/7] cmdline: support Windows
  2020-06-29 23:56  0%           ` Dmitry Kozlyuk
@ 2020-07-08  1:09  0%             ` Dmitry Kozlyuk
  0 siblings, 0 replies; 200+ results
From: Dmitry Kozlyuk @ 2020-07-08  1:09 UTC (permalink / raw)
  To: Tal Shnaiderman
  Cc: Ranjit Menon, Fady Bader, dev, Dmitry Malloy,
	Narcisa Ana Maria Vasile, Thomas Monjalon, Olivier Matz

On Tue, 30 Jun 2020 02:56:20 +0300, Dmitry Kozlyuk wrote:
> On Mon, 29 Jun 2020 08:12:51 +0000, Tal Shnaiderman wrote:
> > > From: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> > > Subject: Re: [dpdk-dev] [PATCH 6/7] cmdline: support Windows
> > > 
> > > On Sun, 28 Jun 2020 23:23:11 -0700, Ranjit Menon wrote:    
[snip]
> > > > The issue is that UINT8, UINT16, INT32, INT64 etc. are reserved types
> > > > in Windows headers for integer types. We found that it is easier to
> > > > change the enum in cmdline_parse_num.h than try to play with the
> > > > include order of headers. AFAIK, the enums were only used to determine
> > > > the type in a series of switch() statements in librte_cmdline, so we
> > > > simply renamed the enums. Not sure, if that will be acceptable here.    
> > > 
> > > +1 for renaming enum values. It's not a problem of librte_cmdline itself
> > > but a problem of its consumption on Windows, however renaming enum values
> > > doesn't break ABI and will make librte_cmdline API "namespaced".
> > > 
[snip]
> > 
> > test_pmd redefines BOOLEAN and PATTERN in the index enum; I'm not sure how many more conflicts we will face because of this huge include.
> >
> > Also, DPDK applications will inherit it unknowingly, not sure if this is common for windows libraries.  
> 
> I never hit these particular conflicts, but you're right that there will be
> more, e.g. I remember particularly nasty clashes in failsafe PMD, unrelated
> to cmdline token names.

Still, I'd go for renaming, with or without additional steps to hide
<windows.h>. Although I wouldn't include it in this series: renaming will
touch numerous places and require many more reviewers.
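
To make the clash concrete: <basetsd.h> typedefs UINT8, INT64 and friends,
so the same names cannot also be cmdline_numtype enumerators. Purely as a
hypothetical sketch (the prefix is of course up for discussion), the rename
could look like:

    enum cmdline_numtype {
            RTE_UINT8 = 0,  /* was UINT8 */
            RTE_UINT16,     /* was UINT16 */
            RTE_UINT32,
            RTE_UINT64,
            RTE_INT8,
            RTE_INT16,
            RTE_INT32,
            RTE_INT64,
    };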

> We could take the same approach as with networking headers: copy required
> declarations instead of including them from SDK. Here's a list of what
> pthread.h uses:

While this will resolve the issue for DPDK code, applications using DPDK
headers can easily hit it by including <windows.h> on their own. On the other
hand, they can always split translation units and I don't know how practical
it is to use system and DPDK networking headers at the same time.

-- 
Dmitry Kozlyuk

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 2/2] doc: clarify alias to experimental period
  2020-07-07 17:51 12%   ` [dpdk-dev] [PATCH v2 2/2] doc: clarify alias to experimental period Ray Kinsella
@ 2020-07-07 18:44  0%     ` Honnappa Nagarahalli
  0 siblings, 0 replies; 200+ results
From: Honnappa Nagarahalli @ 2020-07-07 18:44 UTC (permalink / raw)
  To: Ray Kinsella, dev
  Cc: fady, thomas, Neil Horman, John McNamara, Marko Kovacevic,
	Harini Ramakrishnan, Omar Cardona, Pallavi Kadam, Ranjit Menon,
	Honnappa Nagarahalli, nd, nd

<snip>

> Subject: [PATCH v2 2/2] doc: clarify alias to experimental period
> 
> Clarify retention period for aliases to experimental.
> 
> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

> ---
>  doc/guides/contributing/abi_versioning.rst | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/doc/guides/contributing/abi_versioning.rst
> b/doc/guides/contributing/abi_versioning.rst
> index 31a9205..b1d09c7 100644
> --- a/doc/guides/contributing/abi_versioning.rst
> +++ b/doc/guides/contributing/abi_versioning.rst
> @@ -158,7 +158,7 @@ The macros exported are:
>  * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version table
> entry
>    binding versioned symbol ``b@EXPERIMENTAL`` to the internal function
> ``be``.
>    The macro is used when a symbol matures to become part of the stable ABI,
> to
> -  provide an alias to experimental for some time.
> +  provide an alias to experimental until the next major ABI version.
> 
>  .. _example_abi_macro_usage:
> 
> @@ -428,8 +428,9 @@ _____________________________
> 
>  In situations in which an ``experimental`` symbol has been stable for some
> time,  and it becomes a candidate for promotion to the stable ABI. At this
> time, when -promoting the symbol, maintainer may choose to provide an
> alias to the
> +promoting the symbol, the maintainer may choose to provide an alias to
> +the
>  ``experimental`` symbol version, so as not to break consuming applications.
> +This alias is then dropped in the next major ABI version.
> 
>  The process to provide an alias to ``experimental`` is similar to that,
> of  :ref:`symbol versioning <example_abi_macro_usage>` described above.
> --
> 2.7.4


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2 2/2] doc: clarify alias to experimental period
  2020-07-07 17:50  8% ` [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes Ray Kinsella
  2020-07-07 17:51 24%   ` [dpdk-dev] [PATCH v2 1/2] doc: reword abi policy for windows Ray Kinsella
@ 2020-07-07 17:51 12%   ` Ray Kinsella
  2020-07-07 18:44  0%     ` Honnappa Nagarahalli
  2020-07-08 10:32  7%   ` [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes Thomas Monjalon
  2 siblings, 1 reply; 200+ results
From: Ray Kinsella @ 2020-07-07 17:51 UTC (permalink / raw)
  To: dev
  Cc: fady, thomas, Honnappa.Nagarahalli, Ray Kinsella, Neil Horman,
	John McNamara, Marko Kovacevic, Harini Ramakrishnan,
	Omar Cardona, Pallavi Kadam, Ranjit Menon

Clarify retention period for aliases to experimental.

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
 doc/guides/contributing/abi_versioning.rst | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/doc/guides/contributing/abi_versioning.rst b/doc/guides/contributing/abi_versioning.rst
index 31a9205..b1d09c7 100644
--- a/doc/guides/contributing/abi_versioning.rst
+++ b/doc/guides/contributing/abi_versioning.rst
@@ -158,7 +158,7 @@ The macros exported are:
 * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version table entry
   binding versioned symbol ``b@EXPERIMENTAL`` to the internal function ``be``.
   The macro is used when a symbol matures to become part of the stable ABI, to
-  provide an alias to experimental for some time.
+  provide an alias to experimental until the next major ABI version.
 
 .. _example_abi_macro_usage:
 
@@ -428,8 +428,9 @@ _____________________________
 
 In situations in which an ``experimental`` symbol has been stable for some time,
 and it becomes a candidate for promotion to the stable ABI. At this time, when
-promoting the symbol, maintainer may choose to provide an alias to the
+promoting the symbol, the maintainer may choose to provide an alias to the
 ``experimental`` symbol version, so as not to break consuming applications.
+This alias is then dropped in the next major ABI version.
 
 The process to provide an alias to ``experimental`` is similar to that, of
 :ref:`symbol versioning <example_abi_macro_usage>` described above.
-- 
2.7.4


^ permalink raw reply	[relevance 12%]

* [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes
  2020-07-07 14:45  8% [dpdk-dev] [PATCH v1 0/2] doc: minor abi policy fixes Ray Kinsella
  2020-07-07 14:45 24% ` [dpdk-dev] [PATCH v1 1/2] doc: reword abi policy for windows Ray Kinsella
  2020-07-07 14:45 12% ` [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period Ray Kinsella
@ 2020-07-07 17:50  8% ` Ray Kinsella
  2020-07-07 17:51 24%   ` [dpdk-dev] [PATCH v2 1/2] doc: reword abi policy for windows Ray Kinsella
                     ` (2 more replies)
  2 siblings, 3 replies; 200+ results
From: Ray Kinsella @ 2020-07-07 17:50 UTC (permalink / raw)
  To: dev
  Cc: fady, thomas, Honnappa.Nagarahalli, Ray Kinsella, Neil Horman,
	John McNamara, Marko Kovacevic, Harini Ramakrishnan,
	Omar Cardona, Pallavi Kadam, Ranjit Menon

A few documentation fixes, clarifying the Windows ABI policy and aliases to
experimental.

Ray Kinsella (2):
  doc: reword abi policy for windows
  doc: clarify alias to experimental period

v2:
  Addressed feedback from Thomas Monjalon.

 doc/guides/contributing/abi_policy.rst     | 4 +++-
 doc/guides/contributing/abi_versioning.rst | 5 +++--
 doc/guides/windows_gsg/intro.rst           | 6 +++---
 3 files changed, 9 insertions(+), 6 deletions(-)

--
2.7.4

^ permalink raw reply	[relevance 8%]

* [dpdk-dev] [PATCH v2 1/2] doc: reword abi policy for windows
  2020-07-07 17:50  8% ` [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes Ray Kinsella
@ 2020-07-07 17:51 24%   ` Ray Kinsella
  2020-07-07 17:51 12%   ` [dpdk-dev] [PATCH v2 2/2] doc: clarify alias to experimental period Ray Kinsella
  2020-07-08 10:32  7%   ` [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes Thomas Monjalon
  2 siblings, 0 replies; 200+ results
From: Ray Kinsella @ 2020-07-07 17:51 UTC (permalink / raw)
  To: dev
  Cc: fady, thomas, Honnappa.Nagarahalli, Ray Kinsella, Neil Horman,
	John McNamara, Marko Kovacevic, Harini Ramakrishnan,
	Omar Cardona, Pallavi Kadam, Ranjit Menon

Minor changes to the abi policy for windows.

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
 doc/guides/contributing/abi_policy.rst | 4 +++-
 doc/guides/windows_gsg/intro.rst       | 6 +++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
index d0affa9..4452362 100644
--- a/doc/guides/contributing/abi_policy.rst
+++ b/doc/guides/contributing/abi_policy.rst
@@ -40,7 +40,9 @@ General Guidelines
    maintaining ABI stability through one year of DPDK releases starting from
    DPDK 19.11. This policy will be reviewed in 2020, with intention of
    lengthening the stability period. Additional implementation detail can be
-   found in the :ref:`release notes <20_02_abi_changes>`.
+   found in the :ref:`release notes <20_02_abi_changes>`. Please note that this
+   policy does not currently apply to the :doc:`Windows build
+   <../windows_gsg/intro>`.
 
 What is an ABI?
 ~~~~~~~~~~~~~~~
diff --git a/doc/guides/windows_gsg/intro.rst b/doc/guides/windows_gsg/intro.rst
index 58c6246..4ac7f97 100644
--- a/doc/guides/windows_gsg/intro.rst
+++ b/doc/guides/windows_gsg/intro.rst
@@ -19,6 +19,6 @@ compile. Support is being added in pieces so as to limit the overall scope
 of any individual patch series. The goal is to be able to run any DPDK
 application natively on Windows.
 
-The :doc:`../contributing/abi_policy` cannot be respected for Windows.
-Minor ABI versions may be incompatible
-because function versioning is not supported on Windows.
+The :doc:`../contributing/abi_policy` does not apply to the Windows build,
+as function versioning is not supported on Windows,
+therefore minor ABI versions may be incompatible.
-- 
2.7.4


^ permalink raw reply	[relevance 24%]

* Re: [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period
  2020-07-07 17:01  4%             ` Kinsella, Ray
@ 2020-07-07 17:08  0%               ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2020-07-07 17:08 UTC (permalink / raw)
  To: Kinsella, Ray
  Cc: dev, fady, Honnappa.Nagarahalli, Neil Horman, John McNamara,
	Marko Kovacevic, Harini Ramakrishnan, Omar Cardona,
	Pallavi Kadam, Ranjit Menon, david.marchand, bruce.richardson

07/07/2020 19:01, Kinsella, Ray:
> On 07/07/2020 17:57, Thomas Monjalon wrote:
> > 07/07/2020 18:37, Kinsella, Ray:
> >> On 07/07/2020 17:36, Thomas Monjalon wrote:
> >>> 07/07/2020 18:35, Kinsella, Ray:
> >>>> On 07/07/2020 16:26, Thomas Monjalon wrote:
> >>>>> 07/07/2020 16:45, Ray Kinsella:
> >>>>>> Clarify retention period for aliases to experimental.
> >>>>>>
> >>>>>> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
> >>>>>> ---
> >>>>>> --- a/doc/guides/contributing/abi_versioning.rst
> >>>>>> +++ b/doc/guides/contributing/abi_versioning.rst
> >>>>>> @@ -158,7 +158,7 @@ The macros exported are:
> >>>>>>  * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version table entry
> >>>>>>    binding versioned symbol ``b@EXPERIMENTAL`` to the internal function ``be``.
> >>>>>>    The macro is used when a symbol matures to become part of the stable ABI, to
> >>>>>> -  provide an alias to experimental for some time.
> >>>>>> +  provide an alias to experimental until the next major ABI version.
> >>>>>
> >>>>> Why limiting the period for experimental status?
> >>>>> Some API want to remain experimental longer.
> >>>>>
> >>>>> [...]
> >>>>>> +alias will then typically be dropped in the next major ABI version.
> >>>>>
> >>>>> I don't see the need for the time estimation.
> >>>>
> >>>> Will reword to ...
> >>>>
> >>>> "This alias will then be dropped in the next major ABI version."
> >>>
> >>> It is not addressing my first comment. Please see above.
> >>
> >> Thank you, I don't necessarily agree with the first comment :-)
> > 
> > You don't have to agree. But in this case we must discuss :-)
> > 
> >> We need to say when the alias should be dropped no?
> > 
> > I don't think so.
> > Until now, it is let to the appreciation of the maintainer.
> > If we want to change the rule, especially for experimental period,
> > it must be said clearly and debated.
> 
> It doesn't make _any_ sense to maintain an alias after the new ABI.
> 
> The alias is there to maintain ABI compatibility, 
> there is no reason to maintain compatibility in the new ABI - so it should be dropped

Yes I was wrong, sorry.



^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period
  2020-07-07 17:00  0%             ` Thomas Monjalon
@ 2020-07-07 17:01  0%               ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2020-07-07 17:01 UTC (permalink / raw)
  To: Thomas Monjalon, Honnappa Nagarahalli
  Cc: dev, fady, Neil Horman, John McNamara, Marko Kovacevic,
	Harini Ramakrishnan, Omar Cardona, Pallavi Kadam, Ranjit Menon,
	david.marchand, bruce.richardson, nd



On 07/07/2020 18:00, Thomas Monjalon wrote:
> 07/07/2020 18:55, Honnappa Nagarahalli:
>>> On 07/07/2020 17:36, Thomas Monjalon wrote:
>>>> 07/07/2020 18:35, Kinsella, Ray:
>>>>> On 07/07/2020 16:26, Thomas Monjalon wrote:
>>>>>> 07/07/2020 16:45, Ray Kinsella:
>>>>>>> Clarify retention period for aliases to experimental.
>>>>>>>
>>>>>>> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
>>>>>>> ---
>>>>>>> --- a/doc/guides/contributing/abi_versioning.rst
>>>>>>> +++ b/doc/guides/contributing/abi_versioning.rst
>>>>>>> @@ -158,7 +158,7 @@ The macros exported are:
>>>>>>>  * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version
>>> table entry
>>>>>>>    binding versioned symbol ``b@EXPERIMENTAL`` to the internal
>>> function ``be``.
>>>>>>>    The macro is used when a symbol matures to become part of the
>>>>>>> stable ABI, to
>>>>>>> -  provide an alias to experimental for some time.
>>>>>>> +  provide an alias to experimental until the next major ABI version.
>>>>>>
>>>>>> Why limiting the period for experimental status?
>>>>>> Some API want to remain experimental longer.
>>
>> This is not limiting the period.
>> This is about how long VERSION_SYMBOL_EXPERIMENTAL should be in place
>> for a symbol after the experimental tag is removed for the symbol.
> 
> Oh wait, I was wrong. It is only about the alias which is set
> AFTER the experimental period.
> 
>>>>>> [...]
>>>>>>>  In situations in which an ``experimental`` symbol has been stable
>>>>>>> for some time,  and it becomes a candidate for promotion to the
>>>>>>> stable ABI. At this time, when -promoting the symbol, maintainer
>>>>>>> may choose to provide an alias to the -``experimental`` symbol version,
>>> so as not to break consuming applications.
>>>>>>> +promoting the symbol, the maintainer may choose to provide an
>>>>>>> +alias to the ``experimental`` symbol version, so as not to break
>>>>>>> +consuming applications. This
>>>>>>
>>>>>> Please start a sentence on a new line.
>>>>>
>>>>> ACK
>>>>>
>>>>>>
>>>>>>> +alias will then typically be dropped in the next major ABI version.
>>>>>>
>>>>>> I don't see the need for the time estimation.
>>
>> I prefer this wording as it clarifies what should be done while creating a patch.
> 
> Yes, after a second read, I am OK.
> 
perfect, I will sort out the other bits. 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period
  2020-07-07 16:57  0%           ` Thomas Monjalon
@ 2020-07-07 17:01  4%             ` Kinsella, Ray
  2020-07-07 17:08  0%               ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Kinsella, Ray @ 2020-07-07 17:01 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, fady, Honnappa.Nagarahalli, Neil Horman, John McNamara,
	Marko Kovacevic, Harini Ramakrishnan, Omar Cardona,
	Pallavi Kadam, Ranjit Menon, david.marchand, bruce.richardson



On 07/07/2020 17:57, Thomas Monjalon wrote:
> 07/07/2020 18:37, Kinsella, Ray:
>>
>> On 07/07/2020 17:36, Thomas Monjalon wrote:
>>> 07/07/2020 18:35, Kinsella, Ray:
>>>> On 07/07/2020 16:26, Thomas Monjalon wrote:
>>>>> 07/07/2020 16:45, Ray Kinsella:
>>>>>> Clarify retention period for aliases to experimental.
>>>>>>
>>>>>> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
>>>>>> ---
>>>>>> --- a/doc/guides/contributing/abi_versioning.rst
>>>>>> +++ b/doc/guides/contributing/abi_versioning.rst
>>>>>> @@ -158,7 +158,7 @@ The macros exported are:
>>>>>>  * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version table entry
>>>>>>    binding versioned symbol ``b@EXPERIMENTAL`` to the internal function ``be``.
>>>>>>    The macro is used when a symbol matures to become part of the stable ABI, to
>>>>>> -  provide an alias to experimental for some time.
>>>>>> +  provide an alias to experimental until the next major ABI version.
>>>>>
>>>>> Why limiting the period for experimental status?
>>>>> Some API want to remain experimental longer.
>>>>>
>>>>> [...]
>>>>>> +alias will then typically be dropped in the next major ABI version.
>>>>>
>>>>> I don't see the need for the time estimation.
>>>>
>>>> Will reword to ...
>>>>
>>>> "This alias will then be dropped in the next major ABI version."
>>>
>>> It is not addressing my first comment. Please see above.
>>
>> Thank you, I don't necessarily agree with the first comment :-)
> 
> You don't have to agree. But in this case we must discuss :-)
> 
>> We need to say when the alias should be dropped no?
> 
> I don't think so.
> Until now, it is let to the appreciation of the maintainer.
> If we want to change the rule, especially for experimental period,
> it must be said clearly and debated.

It doesn't make _any_ sense to maintain an alias after the new ABI.

The alias is there to maintain ABI compatibility, 
there is no reason to maintain compatibility in the new ABI - so it should be dropped
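
To make the mechanics concrete, a simplified sketch (rte_acl_create is used
purely as an example name; attributes, includes and the version-map update
are omitted, and this only applies to shared builds with function versioning
enabled): once the symbol itself is promoted to the stable ABI, the alias is
just a thin wrapper plus one macro line:

    int
    rte_acl_create_e(const struct rte_acl_param *param)
    {
            /* forward to the now-stable symbol */
            return rte_acl_create(param);
    }
    VERSION_SYMBOL_EXPERIMENTAL(rte_acl_create, _e);

Dropping the alias in the next major ABI version then simply means deleting
these lines and the matching EXPERIMENTAL entry in the version map.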

 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period
  2020-07-07 16:55  0%           ` Honnappa Nagarahalli
@ 2020-07-07 17:00  0%             ` Thomas Monjalon
  2020-07-07 17:01  0%               ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2020-07-07 17:00 UTC (permalink / raw)
  To: Kinsella, Ray, Honnappa Nagarahalli
  Cc: dev, fady, Neil Horman, John McNamara, Marko Kovacevic,
	Harini Ramakrishnan, Omar Cardona, Pallavi Kadam, Ranjit Menon,
	david.marchand, bruce.richardson, nd

07/07/2020 18:55, Honnappa Nagarahalli:
> > On 07/07/2020 17:36, Thomas Monjalon wrote:
> > > 07/07/2020 18:35, Kinsella, Ray:
> > >> On 07/07/2020 16:26, Thomas Monjalon wrote:
> > >>> 07/07/2020 16:45, Ray Kinsella:
> > >>>> Clarify retention period for aliases to experimental.
> > >>>>
> > >>>> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
> > >>>> ---
> > >>>> --- a/doc/guides/contributing/abi_versioning.rst
> > >>>> +++ b/doc/guides/contributing/abi_versioning.rst
> > >>>> @@ -158,7 +158,7 @@ The macros exported are:
> > >>>>  * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version
> > table entry
> > >>>>    binding versioned symbol ``b@EXPERIMENTAL`` to the internal
> > function ``be``.
> > >>>>    The macro is used when a symbol matures to become part of the
> > >>>> stable ABI, to
> > >>>> -  provide an alias to experimental for some time.
> > >>>> +  provide an alias to experimental until the next major ABI version.
> > >>>
> > >>> Why limiting the period for experimental status?
> > >>> Some API want to remain experimental longer.
> 
> This is not limiting the period.
> This is about how long VERSION_SYMBOL_EXPERIMENTAL should be in place
> for a symbol after the experimental tag is removed for the symbol.

Oh wait, I was wrong. It is only about the alias which is set
AFTER the experimental period.

> > >>> [...]
> > >>>>  In situations in which an ``experimental`` symbol has been stable
> > >>>> for some time,  and it becomes a candidate for promotion to the
> > >>>> stable ABI. At this time, when -promoting the symbol, maintainer
> > >>>> may choose to provide an alias to the -``experimental`` symbol version,
> > so as not to break consuming applications.
> > >>>> +promoting the symbol, the maintainer may choose to provide an
> > >>>> +alias to the ``experimental`` symbol version, so as not to break
> > >>>> +consuming applications. This
> > >>>
> > >>> Please start a sentence on a new line.
> > >>
> > >> ACK
> > >>
> > >>>
> > >>>> +alias will then typically be dropped in the next major ABI version.
> > >>>
> > >>> I don't see the need for the time estimation.
> 
> I prefer this wording as it clarifies what should be done while creating a patch.

Yes, after a second read, I am OK.




^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period
  2020-07-07 16:37  0%         ` Kinsella, Ray
  2020-07-07 16:55  0%           ` Honnappa Nagarahalli
@ 2020-07-07 16:57  0%           ` Thomas Monjalon
  2020-07-07 17:01  4%             ` Kinsella, Ray
  1 sibling, 1 reply; 200+ results
From: Thomas Monjalon @ 2020-07-07 16:57 UTC (permalink / raw)
  To: Kinsella, Ray
  Cc: dev, fady, Honnappa.Nagarahalli, Neil Horman, John McNamara,
	Marko Kovacevic, Harini Ramakrishnan, Omar Cardona,
	Pallavi Kadam, Ranjit Menon, david.marchand, bruce.richardson

07/07/2020 18:37, Kinsella, Ray:
> 
> On 07/07/2020 17:36, Thomas Monjalon wrote:
> > 07/07/2020 18:35, Kinsella, Ray:
> >> On 07/07/2020 16:26, Thomas Monjalon wrote:
> >>> 07/07/2020 16:45, Ray Kinsella:
> >>>> Clarify retention period for aliases to experimental.
> >>>>
> >>>> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
> >>>> ---
> >>>> --- a/doc/guides/contributing/abi_versioning.rst
> >>>> +++ b/doc/guides/contributing/abi_versioning.rst
> >>>> @@ -158,7 +158,7 @@ The macros exported are:
> >>>>  * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version table entry
> >>>>    binding versioned symbol ``b@EXPERIMENTAL`` to the internal function ``be``.
> >>>>    The macro is used when a symbol matures to become part of the stable ABI, to
> >>>> -  provide an alias to experimental for some time.
> >>>> +  provide an alias to experimental until the next major ABI version.
> >>>
> >>> Why limiting the period for experimental status?
> >>> Some API want to remain experimental longer.
> >>>
> >>> [...]
> >>>> +alias will then typically be dropped in the next major ABI version.
> >>>
> >>> I don't see the need for the time estimation.
> >>
> >> Will reword to ...
> >>
> >> "This alias will then be dropped in the next major ABI version."
> > 
> > It is not addressing my first comment. Please see above.
> 
> Thank you, I don't necessarily agree with the first comment :-)

You don't have to agree. But in this case we must discuss :-)

> We need to say when the alias should be dropped no?

I don't think so.
Until now, it is let to the appreciation of the maintainer.
If we want to change the rule, especially for experimental period,
it must be said clearly and debated.



^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period
  2020-07-07 16:37  0%         ` Kinsella, Ray
@ 2020-07-07 16:55  0%           ` Honnappa Nagarahalli
  2020-07-07 17:00  0%             ` Thomas Monjalon
  2020-07-07 16:57  0%           ` Thomas Monjalon
  1 sibling, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2020-07-07 16:55 UTC (permalink / raw)
  To: Kinsella, Ray, thomas
  Cc: dev, fady, Neil Horman, John McNamara, Marko Kovacevic,
	Harini Ramakrishnan, Omar Cardona, Pallavi Kadam, Ranjit Menon,
	david.marchand, bruce.richardson, Honnappa Nagarahalli, nd, nd

<snip>

> Subject: Re: [PATCH v1 2/2] doc: clarify alias to experimental period
> 
> 
> 
> On 07/07/2020 17:36, Thomas Monjalon wrote:
> > 07/07/2020 18:35, Kinsella, Ray:
> >> On 07/07/2020 16:26, Thomas Monjalon wrote:
> >>> 07/07/2020 16:45, Ray Kinsella:
> >>>> Clarify retention period for aliases to experimental.
> >>>>
> >>>> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
> >>>> ---
> >>>> --- a/doc/guides/contributing/abi_versioning.rst
> >>>> +++ b/doc/guides/contributing/abi_versioning.rst
> >>>> @@ -158,7 +158,7 @@ The macros exported are:
> >>>>  * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version
> table entry
> >>>>    binding versioned symbol ``b@EXPERIMENTAL`` to the internal
> function ``be``.
> >>>>    The macro is used when a symbol matures to become part of the
> >>>> stable ABI, to
> >>>> -  provide an alias to experimental for some time.
> >>>> +  provide an alias to experimental until the next major ABI version.
> >>>
> >>> Why limiting the period for experimental status?
> >>> Some API want to remain experimental longer.
This is not limiting the period. This is about how long VERSION_SYMBOL_EXPERIMENTAL should be in place for a symbol after the experimental tag is removed for the symbol.

> >>>
> >>> [...]
> >>>>  In situations in which an ``experimental`` symbol has been stable
> >>>> for some time,  and it becomes a candidate for promotion to the
> >>>> stable ABI. At this time, when -promoting the symbol, maintainer
> >>>> may choose to provide an alias to the -``experimental`` symbol version,
> so as not to break consuming applications.
> >>>> +promoting the symbol, the maintainer may choose to provide an
> >>>> +alias to the ``experimental`` symbol version, so as not to break
> >>>> +consuming applications. This
> >>>
> >>> Please start a sentence on a new line.
> >>
> >> ACK
> >>
> >>>
> >>>> +alias will then typically be dropped in the next major ABI version.
> >>>
> >>> I don't see the need for the time estimation.
I prefer this wording as it clarifies what should be done while creating a patch.

> >>>
> >>>
> >>
> >> Will reword to ...
> >>
> >> "This alias will then be dropped in the next major ABI version."
> >
> > It is not addressing my first comment. Please see above.
> >
> 
> Thank you, I don't necessarily agree with the first comment :-) We need to say
> when the alias should be dropped no?

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period
  2020-07-07 16:36  0%       ` Thomas Monjalon
@ 2020-07-07 16:37  0%         ` Kinsella, Ray
  2020-07-07 16:55  0%           ` Honnappa Nagarahalli
  2020-07-07 16:57  0%           ` Thomas Monjalon
  0 siblings, 2 replies; 200+ results
From: Kinsella, Ray @ 2020-07-07 16:37 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, fady, Honnappa.Nagarahalli, Neil Horman, John McNamara,
	Marko Kovacevic, Harini Ramakrishnan, Omar Cardona,
	Pallavi Kadam, Ranjit Menon, david.marchand, bruce.richardson



On 07/07/2020 17:36, Thomas Monjalon wrote:
> 07/07/2020 18:35, Kinsella, Ray:
>> On 07/07/2020 16:26, Thomas Monjalon wrote:
>>> 07/07/2020 16:45, Ray Kinsella:
>>>> Clarify retention period for aliases to experimental.
>>>>
>>>> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
>>>> ---
>>>> --- a/doc/guides/contributing/abi_versioning.rst
>>>> +++ b/doc/guides/contributing/abi_versioning.rst
>>>> @@ -158,7 +158,7 @@ The macros exported are:
>>>>  * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version table entry
>>>>    binding versioned symbol ``b@EXPERIMENTAL`` to the internal function ``be``.
>>>>    The macro is used when a symbol matures to become part of the stable ABI, to
>>>> -  provide an alias to experimental for some time.
>>>> +  provide an alias to experimental until the next major ABI version.
>>>
>>> Why limiting the period for experimental status?
>>> Some API want to remain experimental longer.
>>>
>>> [...]
>>>>  In situations in which an ``experimental`` symbol has been stable for some time,
>>>>  and it becomes a candidate for promotion to the stable ABI. At this time, when
>>>> -promoting the symbol, maintainer may choose to provide an alias to the
>>>> -``experimental`` symbol version, so as not to break consuming applications.
>>>> +promoting the symbol, the maintainer may choose to provide an alias to the
>>>> +``experimental`` symbol version, so as not to break consuming applications. This
>>>
>>> Please start a sentence on a new line.
>>
>> ACK
>>
>>>
>>>> +alias will then typically be dropped in the next major ABI version.
>>>
>>> I don't see the need for the time estimation.
>>>
>>>
>>
>> Will reword to ...
>>
>> "This alias will then be dropped in the next major ABI version."
> 
> It is not addressing my first comment. Please see above.
> 

Thank you, I don't necessarily agree with the first comment :-)
We need to say when the alias should be dropped no?

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period
  2020-07-07 16:35  3%     ` Kinsella, Ray
@ 2020-07-07 16:36  0%       ` Thomas Monjalon
  2020-07-07 16:37  0%         ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2020-07-07 16:36 UTC (permalink / raw)
  To: Kinsella, Ray
  Cc: dev, fady, Honnappa.Nagarahalli, Neil Horman, John McNamara,
	Marko Kovacevic, Harini Ramakrishnan, Omar Cardona,
	Pallavi Kadam, Ranjit Menon, david.marchand, bruce.richardson

07/07/2020 18:35, Kinsella, Ray:
> On 07/07/2020 16:26, Thomas Monjalon wrote:
> > 07/07/2020 16:45, Ray Kinsella:
> >> Clarify retention period for aliases to experimental.
> >>
> >> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
> >> ---
> >> --- a/doc/guides/contributing/abi_versioning.rst
> >> +++ b/doc/guides/contributing/abi_versioning.rst
> >> @@ -158,7 +158,7 @@ The macros exported are:
> >>  * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version table entry
> >>    binding versioned symbol ``b@EXPERIMENTAL`` to the internal function ``be``.
> >>    The macro is used when a symbol matures to become part of the stable ABI, to
> >> -  provide an alias to experimental for some time.
> >> +  provide an alias to experimental until the next major ABI version.
> > 
> > Why limiting the period for experimental status?
> > Some API want to remain experimental longer.
> > 
> > [...]
> >>  In situations in which an ``experimental`` symbol has been stable for some time,
> >>  and it becomes a candidate for promotion to the stable ABI. At this time, when
> >> -promoting the symbol, maintainer may choose to provide an alias to the
> >> -``experimental`` symbol version, so as not to break consuming applications.
> >> +promoting the symbol, the maintainer may choose to provide an alias to the
> >> +``experimental`` symbol version, so as not to break consuming applications. This
> > 
> > Please start a sentence on a new line.
> 
> ACK
> 
> > 
> >> +alias will then typically be dropped in the next major ABI version.
> > 
> > I don't see the need for the time estimation.
> > 
> > 
> 
> Will reword to ...
> 
> "This alias will then be dropped in the next major ABI version."

It is not addressing my first comment. Please see above.




^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period
  2020-07-07 15:26  0%   ` Thomas Monjalon
@ 2020-07-07 16:35  3%     ` Kinsella, Ray
  2020-07-07 16:36  0%       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Kinsella, Ray @ 2020-07-07 16:35 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, fady, Honnappa.Nagarahalli, Neil Horman, John McNamara,
	Marko Kovacevic, Harini Ramakrishnan, Omar Cardona,
	Pallavi Kadam, Ranjit Menon, david.marchand, bruce.richardson



On 07/07/2020 16:26, Thomas Monjalon wrote:
> 07/07/2020 16:45, Ray Kinsella:
>> Clarify retention period for aliases to experimental.
>>
>> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
>> ---
>> --- a/doc/guides/contributing/abi_versioning.rst
>> +++ b/doc/guides/contributing/abi_versioning.rst
>> @@ -158,7 +158,7 @@ The macros exported are:
>>  * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version table entry
>>    binding versioned symbol ``b@EXPERIMENTAL`` to the internal function ``be``.
>>    The macro is used when a symbol matures to become part of the stable ABI, to
>> -  provide an alias to experimental for some time.
>> +  provide an alias to experimental until the next major ABI version.
> 
> Why limiting the period for experimental status?
> Some API want to remain experimental longer.
> 
> [...]
>>  In situations in which an ``experimental`` symbol has been stable for some time,
>>  and it becomes a candidate for promotion to the stable ABI. At this time, when
>> -promoting the symbol, maintainer may choose to provide an alias to the
>> -``experimental`` symbol version, so as not to break consuming applications.
>> +promoting the symbol, the maintainer may choose to provide an alias to the
>> +``experimental`` symbol version, so as not to break consuming applications. This
> 
> Please start a sentence on a new line.

ACK

> 
>> +alias will then typically be dropped in the next major ABI version.
> 
> I don't see the need for the time estimation.
> 
> 

Will reword to ...

"This alias will then be dropped in the next major ABI version."

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v1 1/2] doc: reword abi policy for windows
  2020-07-07 15:23  7%   ` Thomas Monjalon
@ 2020-07-07 16:33  4%     ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2020-07-07 16:33 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, fady, Honnappa.Nagarahalli, Neil Horman, John McNamara,
	Marko Kovacevic, Harini Ramakrishnan, Omar Cardona,
	Pallavi Kadam, Ranjit Menon, talshn



On 07/07/2020 16:23, Thomas Monjalon wrote:
> 07/07/2020 16:45, Ray Kinsella:
>> Minor changes to the abi policy for windows.
> 
> It looks like you were not fast enough to comment
> in the original thread :)
> Please add a Fixes line to reference the original commit.
> 
>> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
>> ---
>>  doc/guides/contributing/abi_policy.rst | 4 +++-
>>  doc/guides/windows_gsg/intro.rst       | 6 +++---
>>  2 files changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
>> index d0affa9..8e70b45 100644
>> --- a/doc/guides/contributing/abi_policy.rst
>> +++ b/doc/guides/contributing/abi_policy.rst
>> @@ -40,7 +40,9 @@ General Guidelines
>>     maintaining ABI stability through one year of DPDK releases starting from
>>     DPDK 19.11. This policy will be reviewed in 2020, with intention of
>>     lengthening the stability period. Additional implementation detail can be
>> -   found in the :ref:`release notes <20_02_abi_changes>`.
>> +   found in the :ref:`release notes <20_02_abi_changes>`. Please note that this
>> +   policy does not currently apply to the :doc:`Window build
> 
> Window -> Windows

ACK

> 
>> +   <../windows_gsg/intro>`.
>>  
>>  What is an ABI?
>>  ~~~~~~~~~~~~~~~
>> diff --git a/doc/guides/windows_gsg/intro.rst b/doc/guides/windows_gsg/intro.rst
>> index 58c6246..707afd3 100644
>> --- a/doc/guides/windows_gsg/intro.rst
>> +++ b/doc/guides/windows_gsg/intro.rst
>> @@ -19,6 +19,6 @@ compile. Support is being added in pieces so as to limit the overall scope
>>  of any individual patch series. The goal is to be able to run any DPDK
>>  application natively on Windows.
>>  
>> -The :doc:`../contributing/abi_policy` cannot be respected for Windows.
>> -Minor ABI versions may be incompatible
>> -because function versioning is not supported on Windows.
>> +The :doc:`../contributing/abi_policy` does not apply to the Windows build, as
>> +function versioning is not supported on Windows, therefore minor ABI versions
>> +may be incompatible.
> 
> Please I really prefer we split lines logically rather than filling the space:
> The :doc:`../contributing/abi_policy` does not apply to the Windows build,
> as function versioning is not supported on Windows,
> therefore minor ABI versions may be incompatible.
> 
That is a single line though :-)
 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v3 4/4] eventdev: relax smp barriers with C11 atomics
  2020-07-07 14:29  0%       ` Jerin Jacob
@ 2020-07-07 15:56  0%         ` Phil Yang
  0 siblings, 0 replies; 200+ results
From: Phil Yang @ 2020-07-07 15:56 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: thomas, Erik Gabriel Carrillo, dpdk-dev, jerinj,
	Honnappa Nagarahalli, David Christensen, Ruifeng Wang,
	Dharmik Thakkar, nd, David Marchand, Ray Kinsella, Neil Horman,
	dodji

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Tuesday, July 7, 2020 10:30 PM
> To: Phil Yang <Phil.Yang@arm.com>
> Cc: thomas@monjalon.net; Erik Gabriel Carrillo <erik.g.carrillo@intel.com>;
> dpdk-dev <dev@dpdk.org>; jerinj@marvell.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; David Christensen
> <drc@linux.vnet.ibm.com>; Ruifeng Wang <Ruifeng.Wang@arm.com>;
> Dharmik Thakkar <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>; David
> Marchand <david.marchand@redhat.com>; Ray Kinsella <mdr@ashroe.eu>;
> Neil Horman <nhorman@tuxdriver.com>; dodji@redhat.com
> Subject: Re: [dpdk-dev] [PATCH v3 4/4] eventdev: relax smp barriers with
> C11 atomics
> 
> On Tue, Jul 7, 2020 at 4:45 PM Phil Yang <phil.yang@arm.com> wrote:
> >
> > The impl_opaque field is shared between the timer arm and cancel
> > operations. Meanwhile, the state flag acts as a guard variable to
> > make sure the update of impl_opaque is synchronized. The original
> > code uses rte_smp barriers to achieve that. This patch uses C11
> > atomics with an explicit one-way memory barrier instead of full
> > barriers rte_smp_w/rmb() to avoid the unnecessary barrier on aarch64.
> >
> > Since compilers can generate the same instructions for volatile and
> > non-volatile variable in C11 __atomics built-ins, so remain the volatile
> > keyword in front of state enum to avoid the ABI break issue.
> >
> > Signed-off-by: Phil Yang <phil.yang@arm.com>
> > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
> 
> 
> Could you fix the following:
> 
> WARNING:TYPO_SPELLING: 'opague' may be misspelled - perhaps 'opaque'?
> #184: FILE: lib/librte_eventdev/rte_event_timer_adapter.c:1161:
> + * specific opague data under the correct state.
Done. 

Thanks,
Phil

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v4 4/4] eventdev: relax smp barriers with C11 atomics
  @ 2020-07-07 15:54  4%       ` Phil Yang
  2020-07-08 13:30  4%       ` [dpdk-dev] [PATCH v4 1/4] eventdev: fix race condition on timer list counter Jerin Jacob
  1 sibling, 0 replies; 200+ results
From: Phil Yang @ 2020-07-07 15:54 UTC (permalink / raw)
  To: jerinj, dev
  Cc: thomas, erik.g.carrillo, Honnappa.Nagarahalli, drc, Ruifeng.Wang,
	Dharmik.Thakkar, nd, david.marchand, mdr, nhorman, dodji, stable

The impl_opaque field is shared between the timer arm and cancel
operations. Meanwhile, the state flag acts as a guard variable to
make sure the update of impl_opaque is synchronized. The original
code uses rte_smp barriers to achieve that. This patch uses C11
atomics with an explicit one-way memory barrier instead of full
barriers rte_smp_w/rmb() to avoid the unnecessary barrier on aarch64.

Since compilers generate the same instructions for volatile and
non-volatile variables with the C11 __atomic built-ins, the volatile
keyword in front of the state enum is kept to avoid an ABI break.
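
A condensed toy model of the pattern (this is not the adapter code itself,
just an illustration of the release/acquire pairing used below):

    #include <stdint.h>

    struct timer_slot {
            uintptr_t impl_opaque;  /* payload guarded by 'state'      */
            uint16_t state;         /* 0 = not armed, 1 = armed        */
    };

    static void arm(struct timer_slot *s, uintptr_t payload)
    {
            s->impl_opaque = payload;       /* write the payload first */
            __atomic_store_n(&s->state, 1, __ATOMIC_RELEASE);
    }

    static uintptr_t cancel(struct timer_slot *s)
    {
            if (__atomic_load_n(&s->state, __ATOMIC_ACQUIRE) != 1)
                    return 0;
            return s->impl_opaque;  /* ordered after the acquire load  */
    }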

Cc: stable@dpdk.org

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
v4:
1. Fix typo.
2. Cc to stable release. (Honnappa)

v3:
Fix ABI issue: revert to 'volatile enum rte_event_timer_state type state'.

v2:
1. Removed implementation-specific opaque data cleanup code.
2. Replaced thread fence with atomic ACQUIRE/RELEASE ordering on state access.

 lib/librte_eventdev/rte_event_timer_adapter.c | 55 ++++++++++++++++++---------
 1 file changed, 37 insertions(+), 18 deletions(-)

diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
index aa01b4d..4c5e49e 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter.c
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -629,7 +629,8 @@ swtim_callback(struct rte_timer *tim)
 		sw->expired_timers[sw->n_expired_timers++] = tim;
 		sw->stats.evtim_exp_count++;
 
-		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
+		__atomic_store_n(&evtim->state, RTE_EVENT_TIMER_NOT_ARMED,
+				__ATOMIC_RELEASE);
 	}
 
 	if (event_buffer_batch_ready(&sw->buffer)) {
@@ -1020,6 +1021,7 @@ __swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
 	int n_lcores;
 	/* Timer list for this lcore is not in use. */
 	uint16_t exp_state = 0;
+	enum rte_event_timer_state n_state;
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1060,30 +1062,36 @@ __swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
 	}
 
 	for (i = 0; i < nb_evtims; i++) {
-		/* Don't modify the event timer state in these cases */
-		if (evtims[i]->state == RTE_EVENT_TIMER_ARMED) {
+		n_state = __atomic_load_n(&evtims[i]->state, __ATOMIC_ACQUIRE);
+		if (n_state == RTE_EVENT_TIMER_ARMED) {
 			rte_errno = EALREADY;
 			break;
-		} else if (!(evtims[i]->state == RTE_EVENT_TIMER_NOT_ARMED ||
-			     evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
+		} else if (!(n_state == RTE_EVENT_TIMER_NOT_ARMED ||
+			     n_state == RTE_EVENT_TIMER_CANCELED)) {
 			rte_errno = EINVAL;
 			break;
 		}
 
 		ret = check_timeout(evtims[i], adapter);
 		if (unlikely(ret == -1)) {
-			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOLATE;
+			__atomic_store_n(&evtims[i]->state,
+					RTE_EVENT_TIMER_ERROR_TOOLATE,
+					__ATOMIC_RELAXED);
 			rte_errno = EINVAL;
 			break;
 		} else if (unlikely(ret == -2)) {
-			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOEARLY;
+			__atomic_store_n(&evtims[i]->state,
+					RTE_EVENT_TIMER_ERROR_TOOEARLY,
+					__ATOMIC_RELAXED);
 			rte_errno = EINVAL;
 			break;
 		}
 
 		if (unlikely(check_destination_event_queue(evtims[i],
 							   adapter) < 0)) {
-			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
+			__atomic_store_n(&evtims[i]->state,
+					RTE_EVENT_TIMER_ERROR,
+					__ATOMIC_RELAXED);
 			rte_errno = EINVAL;
 			break;
 		}
@@ -1099,13 +1107,18 @@ __swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
 					  SINGLE, lcore_id, NULL, evtims[i]);
 		if (ret < 0) {
 			/* tim was in RUNNING or CONFIG state */
-			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
+			__atomic_store_n(&evtims[i]->state,
+					RTE_EVENT_TIMER_ERROR,
+					__ATOMIC_RELEASE);
 			break;
 		}
 
-		rte_smp_wmb();
 		EVTIM_LOG_DBG("armed an event timer");
-		evtims[i]->state = RTE_EVENT_TIMER_ARMED;
+		/* RELEASE ordering guarantees the adapter specific value
+		 * changes observed before the update of state.
+		 */
+		__atomic_store_n(&evtims[i]->state, RTE_EVENT_TIMER_ARMED,
+				__ATOMIC_RELEASE);
 	}
 
 	if (i < nb_evtims)
@@ -1132,6 +1145,7 @@ swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
 	struct rte_timer *timp;
 	uint64_t opaque;
 	struct swtim *sw = swtim_pmd_priv(adapter);
+	enum rte_event_timer_state n_state;
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1143,16 +1157,18 @@ swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
 
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
-		if (evtims[i]->state == RTE_EVENT_TIMER_CANCELED) {
+		/* ACQUIRE ordering guarantees the access of implementation
+		 * specific opaque data under the correct state.
+		 */
+		n_state = __atomic_load_n(&evtims[i]->state, __ATOMIC_ACQUIRE);
+		if (n_state == RTE_EVENT_TIMER_CANCELED) {
 			rte_errno = EALREADY;
 			break;
-		} else if (evtims[i]->state != RTE_EVENT_TIMER_ARMED) {
+		} else if (n_state != RTE_EVENT_TIMER_ARMED) {
 			rte_errno = EINVAL;
 			break;
 		}
 
-		rte_smp_rmb();
-
 		opaque = evtims[i]->impl_opaque[0];
 		timp = (struct rte_timer *)(uintptr_t)opaque;
 		RTE_ASSERT(timp != NULL);
@@ -1166,9 +1182,12 @@ swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
 
 		rte_mempool_put(sw->tim_pool, (void **)timp);
 
-		evtims[i]->state = RTE_EVENT_TIMER_CANCELED;
-
-		rte_smp_wmb();
+		/* The RELEASE ordering here pairs with atomic ordering
+		 * to make sure the state update data observed between
+		 * threads.
+		 */
+		__atomic_store_n(&evtims[i]->state, RTE_EVENT_TIMER_CANCELED,
+				__ATOMIC_RELEASE);
 	}
 
 	return i;
-- 
2.7.4


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period
  2020-07-07 14:45 12% ` [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period Ray Kinsella
@ 2020-07-07 15:26  0%   ` Thomas Monjalon
  2020-07-07 16:35  3%     ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2020-07-07 15:26 UTC (permalink / raw)
  To: Ray Kinsella
  Cc: dev, fady, Honnappa.Nagarahalli, Ray Kinsella, Neil Horman,
	John McNamara, Marko Kovacevic, Harini Ramakrishnan,
	Omar Cardona, Pallavi Kadam, Ranjit Menon, david.marchand,
	nhorman, bruce.richardson

07/07/2020 16:45, Ray Kinsella:
> Clarify retention period for aliases to experimental.
> 
> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
> ---
> --- a/doc/guides/contributing/abi_versioning.rst
> +++ b/doc/guides/contributing/abi_versioning.rst
> @@ -158,7 +158,7 @@ The macros exported are:
>  * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version table entry
>    binding versioned symbol ``b@EXPERIMENTAL`` to the internal function ``be``.
>    The macro is used when a symbol matures to become part of the stable ABI, to
> -  provide an alias to experimental for some time.
> +  provide an alias to experimental until the next major ABI version.

Why limit the period for experimental status?
Some APIs want to remain experimental longer.

[...]
>  In situations in which an ``experimental`` symbol has been stable for some time,
>  and it becomes a candidate for promotion to the stable ABI. At this time, when
> -promoting the symbol, maintainer may choose to provide an alias to the
> -``experimental`` symbol version, so as not to break consuming applications.
> +promoting the symbol, the maintainer may choose to provide an alias to the
> +``experimental`` symbol version, so as not to break consuming applications. This

Please start a sentence on a new line.

> +alias will then typically be dropped in the next major ABI version.

I don't see the need for the time estimation.



^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 1/2] doc: reword abi policy for windows
  2020-07-07 14:45 24% ` [dpdk-dev] [PATCH v1 1/2] doc: reword abi policy for windows Ray Kinsella
@ 2020-07-07 15:23  7%   ` Thomas Monjalon
  2020-07-07 16:33  4%     ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2020-07-07 15:23 UTC (permalink / raw)
  To: Ray Kinsella
  Cc: dev, fady, Honnappa.Nagarahalli, Ray Kinsella, Neil Horman,
	John McNamara, Marko Kovacevic, Harini Ramakrishnan,
	Omar Cardona, Pallavi Kadam, Ranjit Menon, talshn

07/07/2020 16:45, Ray Kinsella:
> Minor changes to the abi policy for windows.

It looks like you were not fast enough to comment
in the original thread :)
Please add a Fixes line to reference the original commit.

> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
> ---
>  doc/guides/contributing/abi_policy.rst | 4 +++-
>  doc/guides/windows_gsg/intro.rst       | 6 +++---
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
> index d0affa9..8e70b45 100644
> --- a/doc/guides/contributing/abi_policy.rst
> +++ b/doc/guides/contributing/abi_policy.rst
> @@ -40,7 +40,9 @@ General Guidelines
>     maintaining ABI stability through one year of DPDK releases starting from
>     DPDK 19.11. This policy will be reviewed in 2020, with intention of
>     lengthening the stability period. Additional implementation detail can be
> -   found in the :ref:`release notes <20_02_abi_changes>`.
> +   found in the :ref:`release notes <20_02_abi_changes>`. Please note that this
> +   policy does not currently apply to the :doc:`Window build

Window -> Windows

> +   <../windows_gsg/intro>`.
>  
>  What is an ABI?
>  ~~~~~~~~~~~~~~~
> diff --git a/doc/guides/windows_gsg/intro.rst b/doc/guides/windows_gsg/intro.rst
> index 58c6246..707afd3 100644
> --- a/doc/guides/windows_gsg/intro.rst
> +++ b/doc/guides/windows_gsg/intro.rst
> @@ -19,6 +19,6 @@ compile. Support is being added in pieces so as to limit the overall scope
>  of any individual patch series. The goal is to be able to run any DPDK
>  application natively on Windows.
>  
> -The :doc:`../contributing/abi_policy` cannot be respected for Windows.
> -Minor ABI versions may be incompatible
> -because function versioning is not supported on Windows.
> +The :doc:`../contributing/abi_policy` does not apply to the Windows build, as
> +function versioning is not supported on Windows, therefore minor ABI versions
> +may be incompatible.

Please I really prefer we split lines logically rather than filling the space:
The :doc:`../contributing/abi_policy` does not apply to the Windows build,
as function versioning is not supported on Windows,
therefore minor ABI versions may be incompatible.




^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-07 14:57  2% ` [dpdk-dev] [PATCH v4 " Viacheslav Ovsiienko
@ 2020-07-07 15:23  0%   ` Olivier Matz
  2020-07-08 14:16  0%   ` [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Txscheduling Morten Brørup
  1 sibling, 0 replies; 200+ results
From: Olivier Matz @ 2020-07-07 15:23 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, matan, rasland, bernard.iremonger, thomas

On Tue, Jul 07, 2020 at 02:57:11PM +0000, Viacheslav Ovsiienko wrote:
> There is the requirement on some networks for precise traffic timing
> management. The ability to send (and, generally speaking, receive)
> the packets at the very precisely specified moment of time provides
> the opportunity to support the connections with Time Division
> Multiplexing using the contemporary general purpose NIC without involving
> an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> interface is one of the promising features for potentially usage of the
> precise time management for the egress packets.
> 
> The main objective of this RFC is to specify the way how applications
> can provide the moment of time at what the packet transmission must be
> started and to describe in preliminary the supporting this feature from
> mlx5 PMD side.
> 
> The new dynamic timestamp field is proposed, it provides some timing
> information, the units and time references (initial phase) are not
> explicitly defined but are maintained always the same for a given port.
> Some devices allow to query rte_eth_read_clock() that will return
> the current device timestamp. The dynamic timestamp flag tells whether
> the field contains actual timestamp value. For the packets being sent
> this value can be used by PMD to schedule packet sending.
> 
> After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> and obsoleting, these dynamic flag and field will be used to manage
> the timestamps on receiving datapath as well.
> 
> When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
> it tries to synchronize the time of packet appearing on the wire with
> the specified packet timestamp. If the specified one is in the past it
> should be ignored, if one is in the distant future it should be capped
> with some reasonable value (in range of seconds). These specific cases
> ("too late" and "distant future") can be optionally reported via
> device xstats to assist applications to detect the time-related
> problems.
> 
> No packet reordering according to timestamps is assumed, neither within
> a packet burst, nor between packets; it is entirely the application's
> responsibility to generate packets and their timestamps in the desired
> order. The timestamps can be put only in the first packet in the burst,
> providing scheduling for the entire burst.
> 
> PMD reports the ability to synchronize packet sending on timestamp
> with new offload flag:
> 
> This is palliative and is going to be replaced with new eth_dev API
> about reporting/managing the supported dynamic flags and its related
> features. This API would break ABI compatibility and can't be introduced
> at the moment, so is postponed to 20.11.
> 
> For testing purposes it is proposed to update testpmd "txonly"
> forwarding mode routine. With this update testpmd application generates
> the packets and sets the dynamic timestamps according to specified time
> pattern if it sees the "rte_dynfield_timestamp" is registered.
> 
> The new testpmd command is proposed to configure sending pattern:
> 
> set tx_times <burst_gap>,<intra_gap>
> 
> <intra_gap> - the delay between the packets within the burst
>               specified in the device clock units. The number
>               of packets in the burst is defined by txburst parameter
> 
> <burst_gap> - the delay between the bursts in the device clock units
> 
> As the result the bursts of packet will be transmitted with specific
> delays between the packets within the burst and specific delay between
> the bursts. The rte_eth_get_clock is supposed to be engaged to get the
> current device clock value and provide the reference for the timestamps.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  v1->v4:
>     - dedicated dynamic Tx timestamp flag instead of shared with Rx
>     - Doxygen-style comment
>     - comments update
> 
> ---
>  lib/librte_ethdev/rte_ethdev.c |  1 +
>  lib/librte_ethdev/rte_ethdev.h |  4 ++++
>  lib/librte_mbuf/rte_mbuf_dyn.h | 31 +++++++++++++++++++++++++++++++
>  3 files changed, 36 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 8e10a6f..02157d5 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
>  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
>  };
>  
>  #undef RTE_TX_OFFLOAD_BIT2STR
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index a49242b..6f6454c 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
>  /** Device supports outer UDP checksum */
>  #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
>  
> +/** Device supports send on timestamp */
> +#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> +
> +
>  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
>  /**< Device supports Rx queue setup after device started*/
>  #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> index 96c3631..7e9f7d2 100644
> --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> @@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
>  #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
>  #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
>  
> +/**
> + * The timestamp dynamic field provides some timing information, the
> + * units and time references (initial phase) are not explicitly defined
> + * but are maintained always the same for a given port. Some devices allow4

allow4 -> allow

> + * to query rte_eth_read_clock() that will return the current device
> + * timestamp. The dynamic Tx timestamp flag tells whether the field contains
> + * actual timestamp value. For the packets being sent this value can be
> + * used by PMD to schedule packet sending.
> + *
> + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> + * and obsoleting, the dedicated Rx timestamp flag is supposed to be
> + * introduced and the shared dynamic timestamp field will be used
> + * to handle the timestamps on receiving datapath as well.
> + */
> +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
> +
> +/**
> + * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on the
> + * packet being sent it tries to synchronize the time of packet appearing
> + * on the wire with the specified packet timestamp. If the specified one
> + * is in the past it should be ignored, if one is in the distant future
> + * it should be capped with some reasonable value (in range of seconds).
> + *
> + * There is no any packet reordering according to timestamps is supposed,
> + * neither for packet within the burst, nor for the whole bursts, it is
> + * an entirely application responsibility to generate packets and its
> + * timestamps in desired order. The timestamps might be put only in
> + * the first packet in the burst providing the entire burst scheduling.
> + */
> +#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
> +

Acked-by: Olivier Matz <olivier.matz@6wind.com>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v7 0/3] RCU integration with LPM library
    2020-06-29  8:02  3% ` [dpdk-dev] [PATCH v5 0/3] " Ruifeng Wang
  2020-07-07 14:40  3% ` [dpdk-dev] [PATCH v6 0/3] RCU integration with LPM library Ruifeng Wang
@ 2020-07-07 15:15  3% ` Ruifeng Wang
    2020-07-09  8:02  4% ` [dpdk-dev] [PATCH v8 0/3] RCU integration with LPM library Ruifeng Wang
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 200+ results
From: Ruifeng Wang @ 2020-07-07 15:15 UTC (permalink / raw)
  Cc: dev, mdr, konstantin.ananyev, honnappa.nagarahalli, nd, Ruifeng Wang

This patchset integrates RCU QSBR support with LPM library.

Resource reclamation implementation was split from the original
series, and has already become part of the RCU library. The series is
reworked to base the LPM integration on the RCU reclamation APIs.

A new API, rte_lpm_rcu_qsbr_add, is introduced for the application to
register an RCU variable that the LPM library will use. This provides
the user with a handle to enable the RCU mechanism integrated in the
LPM library.
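
For illustration, a rough registration sketch in C (the helper name is
hypothetical, and the exact rte_lpm_rcu_qsbr_add() signature and the
fields of struct rte_lpm_rcu_config are assumptions based on this
series that may differ between revisions):

	#include <string.h>
	#include <rte_malloc.h>
	#include <rte_rcu_qsbr.h>
	#include <rte_lpm.h>

	/* Attach an RCU QSBR variable to an existing LPM table so that
	 * freed tbl8 groups are reclaimed only after all reader threads
	 * have reported a quiescent state. */
	static int
	lpm_attach_rcu(struct rte_lpm *lpm)
	{
		size_t sz = rte_rcu_qsbr_get_memsize(RTE_MAX_LCORE);
		struct rte_rcu_qsbr *qsv;
		struct rte_lpm_rcu_config rcu_cfg;

		qsv = rte_zmalloc("lpm_rcu", sz, RTE_CACHE_LINE_SIZE);
		if (qsv == NULL)
			return -1;
		rte_rcu_qsbr_init(qsv, RTE_MAX_LCORE);

		memset(&rcu_cfg, 0, sizeof(rcu_cfg));
		rcu_cfg.v = qsv;	/* QSBR variable used by the LPM library */
		/* reclamation mode / defer queue sizing: revision specific */

		return rte_lpm_rcu_qsbr_add(lpm, &rcu_cfg);
	}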

Functional tests and performance tests are added to cover the
integration with RCU.

---
v7:
Fixed typos in document.

v6:
Remove ALLOW_EXPERIMENTAL_API from rte_lpm.c.

v5:
No default value for reclaim_thd. This allows reclamation triggering with every call.
Pass LPM pointer instead of tbl8 as argument of reclaim callback free function.
Updated group_idx check at tbl8 allocation.
Use enums instead of defines for different reclamation modes.
RCU QSBR integrated path is inside ALLOW_EXPERIMENTAL_API to avoid ABI change.

v4:
Allow user to configure defer queue: size, reclaim threshold, max entries.
Return defer queue handler so user can manually trigger reclamation.
Add blocking mode support. Defer queue will not be created.


Honnappa Nagarahalli (1):
  test/lpm: add RCU integration performance tests

Ruifeng Wang (2):
  lib/lpm: integrate RCU QSBR
  test/lpm: add LPM RCU integration functional tests

 app/test/test_lpm.c                | 291 ++++++++++++++++-
 app/test/test_lpm_perf.c           | 492 ++++++++++++++++++++++++++++-
 doc/guides/prog_guide/lpm_lib.rst  |  32 ++
 lib/librte_lpm/Makefile            |   2 +-
 lib/librte_lpm/meson.build         |   1 +
 lib/librte_lpm/rte_lpm.c           | 120 ++++++-
 lib/librte_lpm/rte_lpm.h           |  59 ++++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 8 files changed, 987 insertions(+), 16 deletions(-)

-- 
2.17.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Tx scheduling
                     ` (3 preceding siblings ...)
  2020-07-07 13:08  2% ` [dpdk-dev] [PATCH v3 " Viacheslav Ovsiienko
@ 2020-07-07 14:57  2% ` Viacheslav Ovsiienko
  2020-07-07 15:23  0%   ` Olivier Matz
  2020-07-08 14:16  0%   ` [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Txscheduling Morten Brørup
  2020-07-08 15:47  2% ` [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
  2020-07-09 12:36  2% ` [dpdk-dev] [PATCH v6 " Viacheslav Ovsiienko
  6 siblings, 2 replies; 200+ results
From: Viacheslav Ovsiienko @ 2020-07-07 14:57 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

There is a requirement on some networks for precise traffic timing
management. The ability to send (and, generally speaking, receive)
packets at a very precisely specified moment of time provides
the opportunity to support connections with Time Division
Multiplexing using a contemporary general purpose NIC without involving
auxiliary hardware. For example, support for the O-RAN Fronthaul
interface is one of the promising use cases for precise time
management of the egress packets.

The main objective of this RFC is to specify the way applications
can provide the moment of time at which the packet transmission must be
started, and to give a preliminary description of how this feature is
supported from the mlx5 PMD side.

The new dynamic timestamp field is proposed, it provides some timing
information, the units and time references (initial phase) are not
explicitly defined but are maintained always the same for a given port.
Some devices allow to query rte_eth_read_clock() that will return
the current device timestamp. The dynamic timestamp flag tells whether
the field contains actual timestamp value. For the packets being sent
this value can be used by PMD to schedule packet sending.

After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
and obsoleting, these dynamic flag and field will be used to manage
the timestamps on receiving datapath as well.

When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
it tries to synchronize the time of packet appearing on the wire with
the specified packet timestamp. If the specified one is in the past it
should be ignored, if one is in the distant future it should be capped
with some reasonable value (in range of seconds). These specific cases
("too late" and "distant future") can be optionally reported via
device xstats to assist applications to detect the time-related
problems.

No packet reordering according to timestamps is supposed to be done,
neither within a packet burst, nor between packets; it is entirely the
application's responsibility to generate packets and their timestamps
in the desired order. The timestamps can be put only in the first packet
in the burst, providing the entire burst scheduling.
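
For illustration, the application side could look roughly like this
(a sketch only: the helper name is hypothetical, error handling is
trimmed, and it assumes the PMD has registered the dynamic field/flag
once the corresponding Tx offload is enabled):

	#include <rte_ethdev.h>
	#include <rte_mbuf_dyn.h>

	/* Stamp the first packet of the burst with a transmit time
	 * "delay" device ticks in the future, then send the burst. */
	static uint16_t
	send_scheduled_burst(uint16_t port, uint16_t queue,
			     struct rte_mbuf **pkts, uint16_t n, uint64_t delay)
	{
		static int ts_off = -1;
		static uint64_t ts_flag;
		uint64_t now;

		if (ts_off < 0) {
			int bit;

			ts_off = rte_mbuf_dynfield_lookup(
					RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL);
			bit = rte_mbuf_dynflag_lookup(
					RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME, NULL);
			if (ts_off < 0 || bit < 0) /* scheduling not available */
				return rte_eth_tx_burst(port, queue, pkts, n);
			ts_flag = 1ULL << bit;
		}
		rte_eth_read_clock(port, &now);	/* device clock reference */
		*RTE_MBUF_DYNFIELD(pkts[0], ts_off, uint64_t *) = now + delay;
		pkts[0]->ol_flags |= ts_flag; /* first packet schedules the burst */
		return rte_eth_tx_burst(port, queue, pkts, n);
	}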

The PMD reports the ability to synchronize packet sending on timestamp
with the new offload flag DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP.

This is a palliative and is going to be replaced with a new eth_dev API
for reporting/managing the supported dynamic flags and their related
features. That API would break ABI compatibility and can't be introduced
at the moment, so it is postponed to 20.11.
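
For example, an application would probe and request the capability in
the usual way (sketch; "port_id" and the rte_eth_conf passed to
rte_eth_dev_configure() are assumed to exist, error handling omitted):

	struct rte_eth_dev_info dev_info;

	rte_eth_dev_info_get(port_id, &dev_info);
	if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP)
		port_conf.txmode.offloads |= DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP;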

For testing purposes it is proposed to update the testpmd "txonly"
forwarding mode routine. With this update the testpmd application
generates the packets and sets the dynamic timestamps according to the
specified time pattern if it sees that "rte_dynfield_timestamp" is
registered.

The new testpmd command is proposed to configure sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst
              specified in the device clock units. The number
              of packets in the burst is defined by txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As a result, the bursts of packets will be transmitted with specific
delays between the packets within the burst and a specific delay between
the bursts. rte_eth_read_clock() is supposed to be engaged to get the
current device clock value and provide the reference for the timestamps.
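
For example, to transmit bursts spaced 1,000,000 device clock ticks
apart, with 1,000 ticks between the packets inside each burst (the
values are arbitrary; the burst size itself comes from the usual
testpmd burst setting):

	testpmd> set tx_times 1000000,1000
	testpmd> set fwd txonly
	testpmd> start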

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 v1->v4:
    - dedicated dynamic Tx timestamp flag instead of shared with Rx
    - Doxygen-style comment
    - comments update

---
 lib/librte_ethdev/rte_ethdev.c |  1 +
 lib/librte_ethdev/rte_ethdev.h |  4 ++++
 lib/librte_mbuf/rte_mbuf_dyn.h | 31 +++++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 8e10a6f..02157d5 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
 	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
+	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
 };
 
 #undef RTE_TX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index a49242b..6f6454c 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1178,6 +1178,10 @@ struct rte_eth_conf {
 /** Device supports outer UDP checksum */
 #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
 
+/** Device supports send on timestamp */
+#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
+
+
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /**< Device supports Rx queue setup after device started*/
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
index 96c3631..7e9f7d2 100644
--- a/lib/librte_mbuf/rte_mbuf_dyn.h
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
 #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
 #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
 
+/**
+ * The timestamp dynamic field provides some timing information, the
+ * units and time references (initial phase) are not explicitly defined
+ * but are maintained always the same for a given port. Some devices allow4
+ * to query rte_eth_read_clock() that will return the current device
+ * timestamp. The dynamic Tx timestamp flag tells whether the field contains
+ * actual timestamp value. For the packets being sent this value can be
+ * used by PMD to schedule packet sending.
+ *
+ * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
+ * and obsoleting, the dedicated Rx timestamp flag is supposed to be
+ * introduced and the shared dynamic timestamp field will be used
+ * to handle the timestamps on receiving datapath as well.
+ */
+#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
+
+/**
+ * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on the
+ * packet being sent it tries to synchronize the time of packet appearing
+ * on the wire with the specified packet timestamp. If the specified one
+ * is in the past it should be ignored, if one is in the distant future
+ * it should be capped with some reasonable value (in range of seconds).
+ *
+ * There is no any packet reordering according to timestamps is supposed,
+ * neither for packet within the burst, nor for the whole bursts, it is
+ * an entirely application responsibility to generate packets and its
+ * timestamps in desired order. The timestamps might be put only in
+ * the first packet in the burst providing the entire burst scheduling.
+ */
+#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
+
 #endif
-- 
1.8.3.1


^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period
  2020-07-07 14:45  8% [dpdk-dev] [PATCH v1 0/2] doc: minor abi policy fixes Ray Kinsella
  2020-07-07 14:45 24% ` [dpdk-dev] [PATCH v1 1/2] doc: reword abi policy for windows Ray Kinsella
@ 2020-07-07 14:45 12% ` Ray Kinsella
  2020-07-07 15:26  0%   ` Thomas Monjalon
  2020-07-07 17:50  8% ` [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes Ray Kinsella
  2 siblings, 1 reply; 200+ results
From: Ray Kinsella @ 2020-07-07 14:45 UTC (permalink / raw)
  To: dev
  Cc: fady, thomas, Honnappa.Nagarahalli, Ray Kinsella, Neil Horman,
	John McNamara, Marko Kovacevic, Harini Ramakrishnan,
	Omar Cardona, Pallavi Kadam, Ranjit Menon

Clarify retention period for aliases to experimental.

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
 doc/guides/contributing/abi_versioning.rst | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/doc/guides/contributing/abi_versioning.rst b/doc/guides/contributing/abi_versioning.rst
index 31a9205..e00dfa8 100644
--- a/doc/guides/contributing/abi_versioning.rst
+++ b/doc/guides/contributing/abi_versioning.rst
@@ -158,7 +158,7 @@ The macros exported are:
 * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version table entry
   binding versioned symbol ``b@EXPERIMENTAL`` to the internal function ``be``.
   The macro is used when a symbol matures to become part of the stable ABI, to
-  provide an alias to experimental for some time.
+  provide an alias to experimental until the next major ABI version.
 
 .. _example_abi_macro_usage:
 
@@ -428,8 +428,9 @@ _____________________________
 
 In situations in which an ``experimental`` symbol has been stable for some time,
 and it becomes a candidate for promotion to the stable ABI. At this time, when
-promoting the symbol, maintainer may choose to provide an alias to the
-``experimental`` symbol version, so as not to break consuming applications.
+promoting the symbol, the maintainer may choose to provide an alias to the
+``experimental`` symbol version, so as not to break consuming applications. This
+alias will then typically be dropped in the next major ABI version.
 
 The process to provide an alias to ``experimental`` is similar to that, of
 :ref:`symbol versioning <example_abi_macro_usage>` described above.
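
As an illustration of the alias mechanism referred to above, the C side
roughly looks as follows (a sketch only; the __vsym annotation, build
options and version map entries required by function versioning are
omitted, and the function name is hypothetical):

	#include <rte_function_versioning.h>

	/* rte_foo() graduates from experimental to the stable v21 ABI */
	int rte_foo_v21(int x) { return x; }

	/* make the stable version the default symbol ... */
	BIND_DEFAULT_SYMBOL(rte_foo, _v21, 21);
	/* ... and keep an alias so binaries linked against
	 * rte_foo@EXPERIMENTAL keep working until the next major ABI */
	VERSION_SYMBOL_EXPERIMENTAL(rte_foo, _v21);
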
-- 
2.7.4


^ permalink raw reply	[relevance 12%]

* [dpdk-dev] [PATCH v1 1/2] doc: reword abi policy for windows
  2020-07-07 14:45  8% [dpdk-dev] [PATCH v1 0/2] doc: minor abi policy fixes Ray Kinsella
@ 2020-07-07 14:45 24% ` Ray Kinsella
  2020-07-07 15:23  7%   ` Thomas Monjalon
  2020-07-07 14:45 12% ` [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period Ray Kinsella
  2020-07-07 17:50  8% ` [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes Ray Kinsella
  2 siblings, 1 reply; 200+ results
From: Ray Kinsella @ 2020-07-07 14:45 UTC (permalink / raw)
  To: dev
  Cc: fady, thomas, Honnappa.Nagarahalli, Ray Kinsella, Neil Horman,
	John McNamara, Marko Kovacevic, Harini Ramakrishnan,
	Omar Cardona, Pallavi Kadam, Ranjit Menon

Minor changes to the ABI policy for Windows.

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
 doc/guides/contributing/abi_policy.rst | 4 +++-
 doc/guides/windows_gsg/intro.rst       | 6 +++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
index d0affa9..8e70b45 100644
--- a/doc/guides/contributing/abi_policy.rst
+++ b/doc/guides/contributing/abi_policy.rst
@@ -40,7 +40,9 @@ General Guidelines
    maintaining ABI stability through one year of DPDK releases starting from
    DPDK 19.11. This policy will be reviewed in 2020, with intention of
    lengthening the stability period. Additional implementation detail can be
-   found in the :ref:`release notes <20_02_abi_changes>`.
+   found in the :ref:`release notes <20_02_abi_changes>`. Please note that this
+   policy does not currently apply to the :doc:`Window build
+   <../windows_gsg/intro>`.
 
 What is an ABI?
 ~~~~~~~~~~~~~~~
diff --git a/doc/guides/windows_gsg/intro.rst b/doc/guides/windows_gsg/intro.rst
index 58c6246..707afd3 100644
--- a/doc/guides/windows_gsg/intro.rst
+++ b/doc/guides/windows_gsg/intro.rst
@@ -19,6 +19,6 @@ compile. Support is being added in pieces so as to limit the overall scope
 of any individual patch series. The goal is to be able to run any DPDK
 application natively on Windows.
 
-The :doc:`../contributing/abi_policy` cannot be respected for Windows.
-Minor ABI versions may be incompatible
-because function versioning is not supported on Windows.
+The :doc:`../contributing/abi_policy` does not apply to the Windows build, as
+function versioning is not supported on Windows, therefore minor ABI versions
+may be incompatible.
-- 
2.7.4


^ permalink raw reply	[relevance 24%]

* [dpdk-dev] [PATCH v1 0/2] doc: minor abi policy fixes
@ 2020-07-07 14:45  8% Ray Kinsella
  2020-07-07 14:45 24% ` [dpdk-dev] [PATCH v1 1/2] doc: reword abi policy for windows Ray Kinsella
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Ray Kinsella @ 2020-07-07 14:45 UTC (permalink / raw)
  To: dev
  Cc: fady, thomas, Honnappa.Nagarahalli, Ray Kinsella, Neil Horman,
	John McNamara, Marko Kovacevic, Harini Ramakrishnan,
	Omar Cardona, Pallavi Kadam, Ranjit Menon

A few documentation fixes, clarifying the Windows ABI policy and aliases
to experimental mode.

Ray Kinsella (2):
  doc: reword abi policy for windows
  doc: clarify alias to experimental period

 doc/guides/contributing/abi_policy.rst     | 4 +++-
 doc/guides/contributing/abi_versioning.rst | 7 ++++---
 doc/guides/windows_gsg/intro.rst           | 6 +++---
 3 files changed, 10 insertions(+), 7 deletions(-)

--
2.7.4

^ permalink raw reply	[relevance 8%]

* [dpdk-dev] [PATCH v6 0/3] RCU integration with LPM library
    2020-06-29  8:02  3% ` [dpdk-dev] [PATCH v5 0/3] " Ruifeng Wang
@ 2020-07-07 14:40  3% ` Ruifeng Wang
  2020-07-07 15:15  3% ` [dpdk-dev] [PATCH v7 " Ruifeng Wang
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 200+ results
From: Ruifeng Wang @ 2020-07-07 14:40 UTC (permalink / raw)
  Cc: dev, mdr, konstantin.ananyev, honnappa.nagarahalli, nd, Ruifeng Wang

This patchset integrates RCU QSBR support with LPM library.

Resource reclamation implementation was split from the original
series, and has already become part of the RCU library. The series is
reworked to base the LPM integration on the RCU reclamation APIs.

A new API, rte_lpm_rcu_qsbr_add, is introduced for the application to
register an RCU variable that the LPM library will use. This provides
the user with a handle to enable the RCU mechanism integrated in the
LPM library.

Functional tests and performance tests are added to cover the
integration with RCU.
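
On the reader side the usual QSBR pattern from the RCU library applies
(sketch; thread_id bookkeeping is application specific, and "qsv" is
the QSBR variable registered with the LPM table):

	/* each lookup thread, once at startup */
	rte_rcu_qsbr_thread_register(qsv, thread_id);
	rte_rcu_qsbr_thread_online(qsv, thread_id);

	/* periodically, after a batch of rte_lpm_lookup() calls */
	rte_rcu_qsbr_quiescent(qsv, thread_id);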

---
v6:
Remove ALLOW_EXPERIMENTAL_API from rte_lpm.c.

v5:
No default value for reclaim_thd. This allows reclamation triggering with every call.
Pass LPM pointer instead of tbl8 as argument of reclaim callback free function.
Updated group_idx check at tbl8 allocation.
Use enums instead of defines for different reclamation modes.
RCU QSBR integrated path is inside ALLOW_EXPERIMENTAL_API to avoid ABI change.

v4:
Allow user to configure defer queue: size, reclaim threshold, max entries.
Return defer queue handler so user can manually trigger reclamation.
Add blocking mode support. Defer queue will not be created.


Honnappa Nagarahalli (1):
  test/lpm: add RCU integration performance tests

Ruifeng Wang (2):
  lib/lpm: integrate RCU QSBR
  test/lpm: add LPM RCU integration functional tests

 app/test/test_lpm.c                | 291 ++++++++++++++++-
 app/test/test_lpm_perf.c           | 492 ++++++++++++++++++++++++++++-
 doc/guides/prog_guide/lpm_lib.rst  |  32 ++
 lib/librte_lpm/Makefile            |   2 +-
 lib/librte_lpm/meson.build         |   1 +
 lib/librte_lpm/rte_lpm.c           | 120 ++++++-
 lib/librte_lpm/rte_lpm.h           |  59 ++++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 8 files changed, 987 insertions(+), 16 deletions(-)

-- 
2.17.1


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-07 13:08  2% ` [dpdk-dev] [PATCH v3 " Viacheslav Ovsiienko
@ 2020-07-07 14:32  0%   ` Olivier Matz
  0 siblings, 0 replies; 200+ results
From: Olivier Matz @ 2020-07-07 14:32 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, matan, rasland, bernard.iremonger, thomas

On Tue, Jul 07, 2020 at 01:08:02PM +0000, Viacheslav Ovsiienko wrote:
> There is the requirement on some networks for precise traffic timing
> management. The ability to send (and, generally speaking, receive)
> the packets at the very precisely specified moment of time provides
> the opportunity to support the connections with Time Division
> Multiplexing using the contemporary general purpose NIC without involving
> an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> interface is one of the promising features for potentially usage of the
> precise time management for the egress packets.
> 
> The main objective of this RFC is to specify the way how applications
> can provide the moment of time at what the packet transmission must be
> started and to describe in preliminary the supporting this feature from
> mlx5 PMD side.
> 
> The new dynamic timestamp field is proposed, it provides some timing
> information, the units and time references (initial phase) are not
> explicitly defined but are maintained always the same for a given port.
> Some devices allow to query rte_eth_read_clock() that will return
> the current device timestamp. The dynamic timestamp flag tells whether
> the field contains actual timestamp value. For the packets being sent
> this value can be used by PMD to schedule packet sending.
> 
> After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> and obsoleting, these dynamic flag and field will be used to manage
> the timestamps on receiving datapath as well.
> 
> When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
> it tries to synchronize the time of packet appearing on the wire with
> the specified packet timestamp. If the specified one is in the past it
> should be ignored, if one is in the distant future it should be capped
> with some reasonable value (in range of seconds). These specific cases
> ("too late" and "distant future") can be optionally reported via
> device xstats to assist applications to detect the time-related
> problems.
> 
> There is no any packet reordering according timestamps is supposed,
> neither within packet burst, nor between packets, it is an entirely
> application responsibility to generate packets and its timestamps
> in desired order. The timestamps can be put only in the first packet
> in the burst providing the entire burst scheduling.
> 
> PMD reports the ability to synchronize packet sending on timestamp
> with new offload flag:
> 
> This is palliative and is going to be replaced with new eth_dev API
> about reporting/managing the supported dynamic flags and its related
> features. This API would break ABI compatibility and can't be introduced
> at the moment, so is postponed to 20.11.
> 
> For testing purposes it is proposed to update testpmd "txonly"
> forwarding mode routine. With this update testpmd application generates
> the packets and sets the dynamic timestamps according to specified time
> pattern if it sees the "rte_dynfield_timestamp" is registered.
> 
> The new testpmd command is proposed to configure sending pattern:
> 
> set tx_times <burst_gap>,<intra_gap>
> 
> <intra_gap> - the delay between the packets within the burst
>               specified in the device clock units. The number
>               of packets in the burst is defined by txburst parameter
> 
> <burst_gap> - the delay between the bursts in the device clock units
> 
> As the result the bursts of packet will be transmitted with specific
> delays between the packets within the burst and specific delay between
> the bursts. The rte_eth_get_clock is supposed to be engaged to get the
> current device clock value and provide the reference for the timestamps.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  lib/librte_ethdev/rte_ethdev.c |  1 +
>  lib/librte_ethdev/rte_ethdev.h |  4 ++++
>  lib/librte_mbuf/rte_mbuf_dyn.h | 32 ++++++++++++++++++++++++++++++++
>  3 files changed, 37 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 8e10a6f..02157d5 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
>  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
>  };
>  
>  #undef RTE_TX_OFFLOAD_BIT2STR
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index a49242b..6f6454c 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
>  /** Device supports outer UDP checksum */
>  #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
>  
> +/** Device supports send on timestamp */
> +#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> +
> +
>  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
>  /**< Device supports Rx queue setup after device started*/
>  #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> index 96c3631..5b2f3da 100644
> --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> @@ -250,4 +250,36 @@ int rte_mbuf_dynflag_lookup(const char *name,
>  #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
>  #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
>  
> +/**
> + * The timestamp dynamic field provides some timing information, the
> + * units and time references (initial phase) are not explicitly defined
> + * but are maintained always the same for a given port. Some devices allow
> + * to query rte_eth_read_clock() that will return the current device
> + * timestamp. The dynamic Tx timestamp flag tells whether the field contains
> + * actual timestamp value. For the packets being sent this value can be
> + * used by PMD to schedule packet sending.
> + *
> + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> + * and obsoleting, the dedicated Rx timestamp flag is supposed to be
> + * introduced and the shared timestamp field will be used to handle the
> + * timestamps on receiving datapath as well. Having the dedicated flags
> + * for Rx/Tx timstamps allows applications not to perform explicit flags
> + * reset on forwarding and not to promote received timestamps to the
> + * transmitting datapath by default.
> + *
> + * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on the
> + * packet being sent it tries to synchronize the time of packet appearing
> + * on the wire with the specified packet timestamp. If the specified one
> + * is in the past it should be ignored, if one is in the distant future
> + * it should be capped with some reasonable value (in range of seconds).
> + *
> + * There is no any packet reordering according timestamps is supposed,

I think there is a typo here

> + * neither within packet burst, nor between packets, it is an entirely
> + * application responsibility to generate packets and its timestamps in
> + * desired order. The timestamps might be put only in the first packet in
> + * the burst providing the entire burst scheduling.
> + */
> +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
> +#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"

Is it possible to split the comment, to document both
RTE_MBUF_DYNFIELD_TIMESTAMP_NAME and RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME ?  I
didn't try to generate the documentation, but I think, like this, that
RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME will be undocumented.

Apart from that, it looks good to me.


> +
>  #endif
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3 4/4] eventdev: relax smp barriers with C11 atomics
  2020-07-07 11:13  4%     ` [dpdk-dev] [PATCH v3 4/4] eventdev: relax smp barriers with C11 atomics Phil Yang
@ 2020-07-07 14:29  0%       ` Jerin Jacob
  2020-07-07 15:56  0%         ` Phil Yang
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2020-07-07 14:29 UTC (permalink / raw)
  To: Phil Yang
  Cc: Thomas Monjalon, Erik Gabriel Carrillo, dpdk-dev, Jerin Jacob,
	Honnappa Nagarahalli, David Christensen,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, nd, David Marchand, Ray Kinsella, Neil Horman,
	dodji

On Tue, Jul 7, 2020 at 4:45 PM Phil Yang <phil.yang@arm.com> wrote:
>
> The impl_opaque field is shared between the timer arm and cancel
> operations. Meanwhile, the state flag acts as a guard variable to
> make sure the update of impl_opaque is synchronized. The original
> code uses rte_smp barriers to achieve that. This patch uses C11
> atomics with an explicit one-way memory barrier instead of full
> barriers rte_smp_w/rmb() to avoid the unnecessary barrier on aarch64.
>
> Since compilers can generate the same instructions for volatile and
> non-volatile variable in C11 __atomics built-ins, so remain the volatile
> keyword in front of state enum to avoid the ABI break issue.
>
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>


Could you fix the following:

WARNING:TYPO_SPELLING: 'opague' may be misspelled - perhaps 'opaque'?
#184: FILE: lib/librte_eventdev/rte_event_timer_adapter.c:1161:
+ * specific opague data under the correct state.

total: 0 errors, 1 warnings, 124 lines checked


> ---
> v3:
> Fix ABI issue: revert to 'volatile enum rte_event_timer_state type state'.
>
> v2:
> 1. Removed implementation-specific opaque data cleanup code.
> 2. Replaced thread fence with atomic ACQURE/RELEASE ordering on state access.
>
>  lib/librte_eventdev/rte_event_timer_adapter.c | 55 ++++++++++++++++++---------
>  1 file changed, 37 insertions(+), 18 deletions(-)
>
> diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
> index d75415c..eb2c93a 100644
> --- a/lib/librte_eventdev/rte_event_timer_adapter.c
> +++ b/lib/librte_eventdev/rte_event_timer_adapter.c
> @@ -629,7 +629,8 @@ swtim_callback(struct rte_timer *tim)
>                 sw->expired_timers[sw->n_expired_timers++] = tim;
>                 sw->stats.evtim_exp_count++;
>
> -               evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
> +               __atomic_store_n(&evtim->state, RTE_EVENT_TIMER_NOT_ARMED,
> +                               __ATOMIC_RELEASE);
>         }
>
>         if (event_buffer_batch_ready(&sw->buffer)) {
> @@ -1020,6 +1021,7 @@ __swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
>         int n_lcores;
>         /* Timer list for this lcore is not in use. */
>         uint16_t exp_state = 0;
> +       enum rte_event_timer_state n_state;
>
>  #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
>         /* Check that the service is running. */
> @@ -1060,30 +1062,36 @@ __swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
>         }
>
>         for (i = 0; i < nb_evtims; i++) {
> -               /* Don't modify the event timer state in these cases */
> -               if (evtims[i]->state == RTE_EVENT_TIMER_ARMED) {
> +               n_state = __atomic_load_n(&evtims[i]->state, __ATOMIC_ACQUIRE);
> +               if (n_state == RTE_EVENT_TIMER_ARMED) {
>                         rte_errno = EALREADY;
>                         break;
> -               } else if (!(evtims[i]->state == RTE_EVENT_TIMER_NOT_ARMED ||
> -                            evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
> +               } else if (!(n_state == RTE_EVENT_TIMER_NOT_ARMED ||
> +                            n_state == RTE_EVENT_TIMER_CANCELED)) {
>                         rte_errno = EINVAL;
>                         break;
>                 }
>
>                 ret = check_timeout(evtims[i], adapter);
>                 if (unlikely(ret == -1)) {
> -                       evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOLATE;
> +                       __atomic_store_n(&evtims[i]->state,
> +                                       RTE_EVENT_TIMER_ERROR_TOOLATE,
> +                                       __ATOMIC_RELAXED);
>                         rte_errno = EINVAL;
>                         break;
>                 } else if (unlikely(ret == -2)) {
> -                       evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOEARLY;
> +                       __atomic_store_n(&evtims[i]->state,
> +                                       RTE_EVENT_TIMER_ERROR_TOOEARLY,
> +                                       __ATOMIC_RELAXED);
>                         rte_errno = EINVAL;
>                         break;
>                 }
>
>                 if (unlikely(check_destination_event_queue(evtims[i],
>                                                            adapter) < 0)) {
> -                       evtims[i]->state = RTE_EVENT_TIMER_ERROR;
> +                       __atomic_store_n(&evtims[i]->state,
> +                                       RTE_EVENT_TIMER_ERROR,
> +                                       __ATOMIC_RELAXED);
>                         rte_errno = EINVAL;
>                         break;
>                 }
> @@ -1099,13 +1107,18 @@ __swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
>                                           SINGLE, lcore_id, NULL, evtims[i]);
>                 if (ret < 0) {
>                         /* tim was in RUNNING or CONFIG state */
> -                       evtims[i]->state = RTE_EVENT_TIMER_ERROR;
> +                       __atomic_store_n(&evtims[i]->state,
> +                                       RTE_EVENT_TIMER_ERROR,
> +                                       __ATOMIC_RELEASE);
>                         break;
>                 }
>
> -               rte_smp_wmb();
>                 EVTIM_LOG_DBG("armed an event timer");
> -               evtims[i]->state = RTE_EVENT_TIMER_ARMED;
> +               /* RELEASE ordering guarantees the adapter specific value
> +                * changes observed before the update of state.
> +                */
> +               __atomic_store_n(&evtims[i]->state, RTE_EVENT_TIMER_ARMED,
> +                               __ATOMIC_RELEASE);
>         }
>
>         if (i < nb_evtims)
> @@ -1132,6 +1145,7 @@ swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
>         struct rte_timer *timp;
>         uint64_t opaque;
>         struct swtim *sw = swtim_pmd_priv(adapter);
> +       enum rte_event_timer_state n_state;
>
>  #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
>         /* Check that the service is running. */
> @@ -1143,16 +1157,18 @@ swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
>
>         for (i = 0; i < nb_evtims; i++) {
>                 /* Don't modify the event timer state in these cases */
> -               if (evtims[i]->state == RTE_EVENT_TIMER_CANCELED) {
> +               /* ACQUIRE ordering guarantees the access of implementation
> +                * specific opague data under the correct state.
> +                */
> +               n_state = __atomic_load_n(&evtims[i]->state, __ATOMIC_ACQUIRE);
> +               if (n_state == RTE_EVENT_TIMER_CANCELED) {
>                         rte_errno = EALREADY;
>                         break;
> -               } else if (evtims[i]->state != RTE_EVENT_TIMER_ARMED) {
> +               } else if (n_state != RTE_EVENT_TIMER_ARMED) {
>                         rte_errno = EINVAL;
>                         break;
>                 }
>
> -               rte_smp_rmb();
> -
>                 opaque = evtims[i]->impl_opaque[0];
>                 timp = (struct rte_timer *)(uintptr_t)opaque;
>                 RTE_ASSERT(timp != NULL);
> @@ -1166,9 +1182,12 @@ swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
>
>                 rte_mempool_put(sw->tim_pool, (void **)timp);
>
> -               evtims[i]->state = RTE_EVENT_TIMER_CANCELED;
> -
> -               rte_smp_wmb();
> +               /* The RELEASE ordering here pairs with atomic ordering
> +                * to make sure the state update data observed between
> +                * threads.
> +                */
> +               __atomic_store_n(&evtims[i]->state, RTE_EVENT_TIMER_CANCELED,
> +                               __ATOMIC_RELEASE);
>         }
>
>         return i;
> --
> 2.7.4
>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v3 1/2] mbuf: introduce accurate packet Tx scheduling
                     ` (2 preceding siblings ...)
  2020-07-07 12:59  2% ` [dpdk-dev] [PATCH v2 " Viacheslav Ovsiienko
@ 2020-07-07 13:08  2% ` Viacheslav Ovsiienko
  2020-07-07 14:32  0%   ` Olivier Matz
  2020-07-07 14:57  2% ` [dpdk-dev] [PATCH v4 " Viacheslav Ovsiienko
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 200+ results
From: Viacheslav Ovsiienko @ 2020-07-07 13:08 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

There is a requirement on some networks for precise traffic timing
management. The ability to send (and, generally speaking, receive)
packets at a very precisely specified moment of time provides
the opportunity to support connections with Time Division
Multiplexing using a contemporary general purpose NIC without involving
auxiliary hardware. For example, support for the O-RAN Fronthaul
interface is one of the promising use cases for precise time
management of the egress packets.

The main objective of this RFC is to specify the way applications
can provide the moment of time at which the packet transmission must be
started, and to give a preliminary description of how this feature is
supported from the mlx5 PMD side.

The new dynamic timestamp field is proposed, it provides some timing
information, the units and time references (initial phase) are not
explicitly defined but are maintained always the same for a given port.
Some devices allow to query rte_eth_read_clock() that will return
the current device timestamp. The dynamic timestamp flag tells whether
the field contains actual timestamp value. For the packets being sent
this value can be used by PMD to schedule packet sending.

After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
and obsoleting, these dynamic flag and field will be used to manage
the timestamps on receiving datapath as well.

When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
it tries to synchronize the time of packet appearing on the wire with
the specified packet timestamp. If the specified one is in the past it
should be ignored, if one is in the distant future it should be capped
with some reasonable value (in range of seconds). These specific cases
("too late" and "distant future") can be optionally reported via
device xstats to assist applications to detect the time-related
problems.

No packet reordering according to timestamps is supposed to be done,
neither within a packet burst, nor between packets; it is entirely the
application's responsibility to generate packets and their timestamps
in the desired order. The timestamps can be put only in the first packet
in the burst, providing the entire burst scheduling.

PMD reports the ability to synchronize packet sending on timestamp
with new offload flag:

This is palliative and is going to be replaced with new eth_dev API
about reporting/managing the supported dynamic flags and its related
features. This API would break ABI compatibility and can't be introduced
at the moment, so is postponed to 20.11.

For testing purposes it is proposed to update testpmd "txonly"
forwarding mode routine. With this update testpmd application generates
the packets and sets the dynamic timestamps according to specified time
pattern if it sees the "rte_dynfield_timestamp" is registered.

The new testpmd command is proposed to configure sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst
              specified in the device clock units. The number
              of packets in the burst is defined by txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As a result, the bursts of packets will be transmitted with specific
delays between the packets within the burst and a specific delay between
the bursts. rte_eth_read_clock() is supposed to be engaged to get the
current device clock value and provide the reference for the timestamps.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 lib/librte_ethdev/rte_ethdev.c |  1 +
 lib/librte_ethdev/rte_ethdev.h |  4 ++++
 lib/librte_mbuf/rte_mbuf_dyn.h | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 37 insertions(+)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 8e10a6f..02157d5 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
 	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
+	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
 };
 
 #undef RTE_TX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index a49242b..6f6454c 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1178,6 +1178,10 @@ struct rte_eth_conf {
 /** Device supports outer UDP checksum */
 #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
 
+/** Device supports send on timestamp */
+#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
+
+
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /**< Device supports Rx queue setup after device started*/
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
index 96c3631..5b2f3da 100644
--- a/lib/librte_mbuf/rte_mbuf_dyn.h
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -250,4 +250,36 @@ int rte_mbuf_dynflag_lookup(const char *name,
 #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
 #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
 
+/**
+ * The timestamp dynamic field provides some timing information, the
+ * units and time references (initial phase) are not explicitly defined
+ * but are maintained always the same for a given port. Some devices allow
+ * to query rte_eth_read_clock() that will return the current device
+ * timestamp. The dynamic Tx timestamp flag tells whether the field contains
+ * actual timestamp value. For the packets being sent this value can be
+ * used by PMD to schedule packet sending.
+ *
+ * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
+ * and obsoleting, the dedicated Rx timestamp flag is supposed to be
+ * introduced and the shared timestamp field will be used to handle the
+ * timestamps on receiving datapath as well. Having the dedicated flags
+ * for Rx/Tx timstamps allows applications not to perform explicit flags
+ * reset on forwarding and not to promote received timestamps to the
+ * transmitting datapath by default.
+ *
+ * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on the
+ * packet being sent it tries to synchronize the time of packet appearing
+ * on the wire with the specified packet timestamp. If the specified one
+ * is in the past it should be ignored, if one is in the distant future
+ * it should be capped with some reasonable value (in range of seconds).
+ *
+ * There is no any packet reordering according timestamps is supposed,
+ * neither within packet burst, nor between packets, it is an entirely
+ * application responsibility to generate packets and its timestamps in
+ * desired order. The timestamps might be put only in the first packet in
+ * the burst providing the entire burst scheduling.
+ */
+#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
+#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
+
 #endif
-- 
1.8.3.1


^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v2 1/2] mbuf: introduce accurate packet Tx scheduling
      2020-07-01 15:36  2% ` [dpdk-dev] [PATCH 1/2] mbuf: introduce " Viacheslav Ovsiienko
@ 2020-07-07 12:59  2% ` Viacheslav Ovsiienko
  2020-07-07 13:08  2% ` [dpdk-dev] [PATCH v3 " Viacheslav Ovsiienko
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 200+ results
From: Viacheslav Ovsiienko @ 2020-07-07 12:59 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

There is a requirement on some networks for precise traffic timing
management. The ability to send (and, generally speaking, receive)
packets at a very precisely specified moment of time provides
the opportunity to support connections with Time Division
Multiplexing using a contemporary general purpose NIC without involving
auxiliary hardware. For example, support for the O-RAN Fronthaul
interface is one of the promising use cases for precise time
management of the egress packets.

The main objective of this RFC is to specify the way applications
can provide the moment of time at which the packet transmission must be
started, and to give a preliminary description of how this feature is
supported from the mlx5 PMD side.

The new dynamic timestamp field is proposed, it provides some timing
information, the units and time references (initial phase) are not
explicitly defined but are maintained always the same for a given port.
Some devices allow to query rte_eth_read_clock() that will return
the current device timestamp. The dynamic timestamp flag tells whether
the field contains actual timestamp value. For the packets being sent
this value can be used by PMD to schedule packet sending.

After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
and obsoleting, these dynamic flag and field will be used to manage
the timestamps on receiving datapath as well.

When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
it tries to synchronize the time of packet appearing on the wire with
the specified packet timestamp. If the specified one is in the past it
should be ignored, if one is in the distant future it should be capped
with some reasonable value (in range of seconds). These specific cases
("too late" and "distant future") can be optionally reported via
device xstats to assist applications to detect the time-related
problems.

No packet reordering according to timestamps is supposed to be done,
neither within a packet burst, nor between packets; it is entirely the
application's responsibility to generate packets and their timestamps
in the desired order. The timestamps can be put only in the first packet
in the burst, providing the entire burst scheduling.

PMD reports the ability to synchronize packet sending on timestamp
with new offload flag:

This is palliative and is going to be replaced with new eth_dev API
about reporting/managing the supported dynamic flags and its related
features. This API would break ABI compatibility and can't be introduced
at the moment, so is postponed to 20.11.

For testing purposes it is proposed to update testpmd "txonly"
forwarding mode routine. With this update testpmd application generates
the packets and sets the dynamic timestamps according to specified time
pattern if it sees the "rte_dynfield_timestamp" is registered.

The new testpmd command is proposed to configure sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst
              specified in the device clock units. The number
              of packets in the burst is defined by txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As a result, the bursts of packets will be transmitted with specific
delays between the packets within the burst and a specific delay between
the bursts. rte_eth_read_clock() is supposed to be engaged to get the
current device clock value and provide the reference for the timestamps.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 lib/librte_ethdev/rte_ethdev.c |  1 +
 lib/librte_ethdev/rte_ethdev.h |  4 ++++
 lib/librte_mbuf/rte_mbuf_dyn.h | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 37 insertions(+)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 8e10a6f..02157d5 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
 	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
+	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
 };
 
 #undef RTE_TX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index a49242b..6f6454c 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1178,6 +1178,10 @@ struct rte_eth_conf {
 /** Device supports outer UDP checksum */
 #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
 
+/** Device supports send on timestamp */
+#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
+
+
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /**< Device supports Rx queue setup after device started*/
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
index 96c3631..834acdd 100644
--- a/lib/librte_mbuf/rte_mbuf_dyn.h
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -250,4 +250,36 @@ int rte_mbuf_dynflag_lookup(const char *name,
 #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
 #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
 
+/**
+ * The timestamp dynamic field provides some timing information, the
+ * units and time references (initial phase) are not explicitly defined
+ * but are maintained always the same for a given port. Some devices allow
+ * to query rte_eth_read_clock() that will return the current device
+ * timestamp. The dynamic Tx timestamp flag tells whether the field contains
+ * actual timestamp value. For the packets being sent this value can be
+ * used by PMD to schedule packet sending.
+ *
+ * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
+ * and obsoleting, the dedicated Rx timestamp flag is supposed to be
+ * introduced and the shared timestamp field will be used to handle the
+ * timestamps on receiving datapath as well. Having the dedicated flags
+ * for Rx/Tx timstamps allows applications not to perform explicit flags
+ * reset on forwaring and not to promote received timestamps to the
+ * transmitting datapath by default.
+ *
+ * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on the
+ * packet being sent it tries to synchronize the time of packet appearing
+ * on the wire with the specified packet timestamp. If the specified one
+ * is in the past it should be ignored, if one is in the distant future
+ * it should be capped with some reasonable value (in range of seconds).
+ *
+ * There is no any packet reordering according timestamps is supposed,
+ * neither within packet burst, nor between packets, it is an entirely
+ * application responsibility to generate packets and its timestamps in
+ * desired order. The timestamps might be put only in the first packet in
+ * the burst providing the entire burst scheduling.
+ */
+#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
+#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
+
 #endif
-- 
1.8.3.1


^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-07 11:50  0%   ` Olivier Matz
@ 2020-07-07 12:46  0%     ` Slava Ovsiienko
  0 siblings, 0 replies; 200+ results
From: Slava Ovsiienko @ 2020-07-07 12:46 UTC (permalink / raw)
  To: Olivier Matz
  Cc: dev, Matan Azrad, Raslan Darawsheh, bernard.iremonger, thomas

Hi, Olivier

Thanks a lot for the review.

> -----Original Message-----
> From: Olivier Matz <olivier.matz@6wind.com>
> Sent: Tuesday, July 7, 2020 14:51
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Matan Azrad <matan@mellanox.com>; Raslan
> Darawsheh <rasland@mellanox.com>; bernard.iremonger@intel.com;
> thomas@mellanox.net
> Subject: Re: [PATCH 1/2] mbuf: introduce accurate packet Tx scheduling
> 
> Hi Slava,
> 
> Few question/comments below.
> 
> On Wed, Jul 01, 2020 at 03:36:26PM +0000, Viacheslav Ovsiienko wrote:
> > There is the requirement on some networks for precise traffic timing
> > management. The ability to send (and, generally speaking, receive) the
> > packets at the very precisely specified moment of time provides the
> > opportunity to support the connections with Time Division Multiplexing
> > using the contemporary general purpose NIC without involving an
> > auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> > interface is one of the promising features for potentially usage of
> > the precise time management for the egress packets.
> >
> > The main objective of this RFC is to specify the way how applications
> > can provide the moment of time at what the packet transmission must be
> > started and to describe in preliminary the supporting this feature
> > from
> > mlx5 PMD side.
> >
> > The new dynamic timestamp field is proposed, it provides some timing
> > information, the units and time references (initial phase) are not
> > explicitly defined but are maintained always the same for a given port.
> > Some devices allow to query rte_eth_read_clock() that will return the
> > current device timestamp. The dynamic timestamp flag tells whether the
> > field contains actual timestamp value. For the packets being sent this
> > value can be used by PMD to schedule packet sending.
> >
> > After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation and
> > obsoleting, these dynamic flag and field will be used to manage the
> > timestamps on receiving datapath as well.
> 
> Do you mean the same flag will be used for both Rx and Tx?  I wonder if it's a
> good idea: if you enable the timestamp on Rx, the packets will be flagged
> and it will impact Tx, except if the application explicitly resets the flag in all
> mbufs. Wouldn't it be safer to have an Rx flag and a Tx flag?

It is a little bit difficult to say unambiguously; I thought about it and did not make a strong decision.
We have the flag sharing for the Rx/Tx metadata and just followed the same approach.
OK, I will promote them to:
RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME
RTE_MBUF_DYNFIELD_TX_TIMESTAMP_NAME

And, possibly, we should reconsider the metadata dynamic flags.
> 
> > When PMD sees the "rte_dynfield_timestamp" set on the packet being
> > sent it tries to synchronize the time of packet appearing on the wire
> > with the specified packet timestamp. It the specified one is in the
> > past it should be ignored, if one is in the distant future it should
> > be capped with some reasonable value (in range of seconds). These
> > specific cases ("too late" and "distant future") can be optionally
> > reported via device xstats to assist applications to detect the
> > time-related problems.
> 
> I think what to do with packets to be send in the "past" could be configurable
> through an ethdev API in the future (drop or send).
Yes, currently there is no complete understanding of how to handle packets that fall outside the time slot.
In 20.11 we are going to introduce a time-based rte_flow API to manage out-of-window packets.
 
> 
> > There is no any packet reordering according timestamps is supposed,
> > neither within packet burst, nor between packets, it is an entirely
> > application responsibility to generate packets and its timestamps in
> > desired order. The timestamps can be put only in the first packet in
> > the burst providing the entire burst scheduling.

"can" should be replaced with "might". Current mlx5 implementation
checks each packet in the burst for the timestamp presence.
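
So with per-packet timestamps an application can spread a burst by itself,
e.g. (sketch only, the helper and its parameters are illustrative; ts_offset
and ts_flag come from the dynamic field/flag lookup as in the snippet above):

#include <rte_mbuf.h>
#include <rte_mbuf_dyn.h>

static void
stamp_burst(struct rte_mbuf **pkts, uint16_t nb, int ts_offset,
	    uint64_t ts_flag, uint64_t start, uint64_t intra_gap)
{
	uint16_t i;

	/* The i-th packet is scheduled at start + i * intra_gap,
	 * mirroring the proposed testpmd "set tx_times" pattern. */
	for (i = 0; i < nb; i++) {
		*RTE_MBUF_DYNFIELD(pkts[i], ts_offset, uint64_t *) =
			start + (uint64_t)i * intra_gap;
		pkts[i]->ol_flags |= ts_flag;
	}
}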

> 
> This constraint makes sense. At first glance, it looks it is imposed by a PMD or
> hw limitation, but thinking more about it, I think it is the correct behavior to
> have. Packets are ordered within a PMD queue, and the ability to set the
> timestamp for one packet to delay the subsequent ones looks useful.
> 
> Should this behavior be documented somewhere? Maybe in the API
> comment documenting the dynamic flag?

It is documented in mlx5.rst (coming soon); do you think it should be
documented in a more common place? OK, will update.

> 
> > PMD reports the ability to synchronize packet sending on timestamp
> > with new offload flag:
> >
> > This is palliative and is going to be replaced with new eth_dev API
> > about reporting/managing the supported dynamic flags and its related
> > features. This API would break ABI compatibility and can't be
> > introduced at the moment, so is postponed to 20.11.
> >
> > For testing purposes it is proposed to update testpmd "txonly"
> > forwarding mode routine. With this update testpmd application
> > generates the packets and sets the dynamic timestamps according to
> > specified time pattern if it sees the "rte_dynfield_timestamp" is registered.
> >
> > The new testpmd command is proposed to configure sending pattern:
> >
> > set tx_times <burst_gap>,<intra_gap>
> >
> > <intra_gap> - the delay between the packets within the burst
> >               specified in the device clock units. The number
> >               of packets in the burst is defined by txburst parameter
> >
> > <burst_gap> - the delay between the bursts in the device clock units
> >
> > As the result the bursts of packet will be transmitted with specific
> > delays between the packets within the burst and specific delay between
> > the bursts. The rte_eth_get_clock is supposed to be engaged to get the
> > current device clock value and provide the reference for the timestamps.
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > ---
> >  lib/librte_ethdev/rte_ethdev.c |  1 +  lib/librte_ethdev/rte_ethdev.h
> > |  4 ++++  lib/librte_mbuf/rte_mbuf_dyn.h | 16 ++++++++++++++++
> >  3 files changed, 21 insertions(+)
> >
> > diff --git a/lib/librte_ethdev/rte_ethdev.c
> > b/lib/librte_ethdev/rte_ethdev.c index 8e10a6f..02157d5 100644
> > --- a/lib/librte_ethdev/rte_ethdev.c
> > +++ b/lib/librte_ethdev/rte_ethdev.c
> > @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
> >  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
> >  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
> >  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> > +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
> >  };
> >
> >  #undef RTE_TX_OFFLOAD_BIT2STR
> > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > b/lib/librte_ethdev/rte_ethdev.h index a49242b..6f6454c 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> > @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
> >  /** Device supports outer UDP checksum */  #define
> > DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
> >
> > +/** Device supports send on timestamp */ #define
> > +DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> > +
> > +
> >  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
> /**<
> > Device supports Rx queue setup after device started*/  #define
> > RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002 diff --git
> > a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> > index 96c3631..fb5477c 100644
> > --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> > +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> > @@ -250,4 +250,20 @@ int rte_mbuf_dynflag_lookup(const char *name,
> > #define RTE_MBUF_DYNFIELD_METADATA_NAME
> "rte_flow_dynfield_metadata"
> >  #define RTE_MBUF_DYNFLAG_METADATA_NAME
> "rte_flow_dynflag_metadata"
> >
> > +/*
> > + * The timestamp dynamic field provides some timing information, the
> > + * units and time references (initial phase) are not explicitly
> > +defined
> > + * but are maintained always the same for a given port. Some devices
> > +allow
> > + * to query rte_eth_read_clock() that will return the current device
> > + * timestamp. The dynamic timestamp flag tells whether the field
> > +contains
> > + * actual timestamp value. For the packets being sent this value can
> > +be
> > + * used by PMD to schedule packet sending.
> > + *
> > + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> > + * and obsoleting, these dynamic flag and field will be used to
> > +manage
> > + * the timestamps on receiving datapath as well.
> > + */
> > +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> "rte_dynfield_timestamp"
> > +#define RTE_MBUF_DYNFLAG_TIMESTAMP_NAME
> "rte_dynflag_timestamp"
> > +
> 
> I realize that's not the case for rte_flow_dynfield_metadata, but I think it
> would be good to have a doxygen-like comment.
OK, I will extend the comment with the expected PMD behavior and replace "/*" with "/**".
> 
> 
> 
> Regards,
> Olivier

With best regards, Slava


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-01 15:36  2% ` [dpdk-dev] [PATCH 1/2] mbuf: introduce " Viacheslav Ovsiienko
@ 2020-07-07 11:50  0%   ` Olivier Matz
  2020-07-07 12:46  0%     ` Slava Ovsiienko
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2020-07-07 11:50 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, matan, rasland, bernard.iremonger, thomas

Hi Slava,

Few question/comments below.

On Wed, Jul 01, 2020 at 03:36:26PM +0000, Viacheslav Ovsiienko wrote:
> There is the requirement on some networks for precise traffic timing
> management. The ability to send (and, generally speaking, receive)
> the packets at the very precisely specified moment of time provides
> the opportunity to support the connections with Time Division
> Multiplexing using the contemporary general purpose NIC without involving
> an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> interface is one of the promising features for potentially usage of the
> precise time management for the egress packets.
> 
> The main objective of this RFC is to specify the way how applications
> can provide the moment of time at what the packet transmission must be
> started and to describe in preliminary the supporting this feature from
> mlx5 PMD side.
> 
> The new dynamic timestamp field is proposed, it provides some timing
> information, the units and time references (initial phase) are not
> explicitly defined but are maintained always the same for a given port.
> Some devices allow to query rte_eth_read_clock() that will return
> the current device timestamp. The dynamic timestamp flag tells whether
> the field contains actual timestamp value. For the packets being sent
> this value can be used by PMD to schedule packet sending.
> 
> After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> and obsoleting, these dynamic flag and field will be used to manage
> the timestamps on receiving datapath as well.

Do you mean the same flag will be used for both Rx and Tx?  I wonder if
it's a good idea: if you enable the timestamp on Rx, the packets will be
flagged and it will impact Tx, except if the application explicitly
resets the flag in all mbufs. Wouldn't it be safer to have an Rx flag
and a Tx flag?

> When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
> it tries to synchronize the time of packet appearing on the wire with
> the specified packet timestamp. It the specified one is in the past it
> should be ignored, if one is in the distant future it should be capped
> with some reasonable value (in range of seconds). These specific cases
> ("too late" and "distant future") can be optionally reported via
> device xstats to assist applications to detect the time-related
> problems.

I think what to do with packets to be sent in the "past" could be
configurable through an ethdev API in the future (drop or send).

> There is no any packet reordering according timestamps is supposed,
> neither within packet burst, nor between packets, it is an entirely
> application responsibility to generate packets and its timestamps
> in desired order. The timestamps can be put only in the first packet
> in the burst providing the entire burst scheduling.

This constraint makes sense. At first glance, it looks like it is imposed by
a PMD or hw limitation, but thinking more about it, I think it is the
correct behavior to have. Packets are ordered within a PMD queue, and
the ability to set the timestamp for one packet to delay the subsequent
ones looks useful.

Should this behavior be documented somewhere? Maybe in the API comment
documenting the dynamic flag?

> PMD reports the ability to synchronize packet sending on timestamp
> with new offload flag:
> 
> This is palliative and is going to be replaced with new eth_dev API
> about reporting/managing the supported dynamic flags and its related
> features. This API would break ABI compatibility and can't be introduced
> at the moment, so is postponed to 20.11.
> 
> For testing purposes it is proposed to update testpmd "txonly"
> forwarding mode routine. With this update testpmd application generates
> the packets and sets the dynamic timestamps according to specified time
> pattern if it sees the "rte_dynfield_timestamp" is registered.
> 
> The new testpmd command is proposed to configure sending pattern:
> 
> set tx_times <burst_gap>,<intra_gap>
> 
> <intra_gap> - the delay between the packets within the burst
>               specified in the device clock units. The number
>               of packets in the burst is defined by txburst parameter
> 
> <burst_gap> - the delay between the bursts in the device clock units
> 
> As the result the bursts of packet will be transmitted with specific
> delays between the packets within the burst and specific delay between
> the bursts. The rte_eth_get_clock is supposed to be engaged to get the
> current device clock value and provide the reference for the timestamps.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  lib/librte_ethdev/rte_ethdev.c |  1 +
>  lib/librte_ethdev/rte_ethdev.h |  4 ++++
>  lib/librte_mbuf/rte_mbuf_dyn.h | 16 ++++++++++++++++
>  3 files changed, 21 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 8e10a6f..02157d5 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
>  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
>  };
>  
>  #undef RTE_TX_OFFLOAD_BIT2STR
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index a49242b..6f6454c 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
>  /** Device supports outer UDP checksum */
>  #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
>  
> +/** Device supports send on timestamp */
> +#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> +
> +
>  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
>  /**< Device supports Rx queue setup after device started*/
>  #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> index 96c3631..fb5477c 100644
> --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> @@ -250,4 +250,20 @@ int rte_mbuf_dynflag_lookup(const char *name,
>  #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
>  #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
>  
> +/*
> + * The timestamp dynamic field provides some timing information, the
> + * units and time references (initial phase) are not explicitly defined
> + * but are maintained always the same for a given port. Some devices allow
> + * to query rte_eth_read_clock() that will return the current device
> + * timestamp. The dynamic timestamp flag tells whether the field contains
> + * actual timestamp value. For the packets being sent this value can be
> + * used by PMD to schedule packet sending.
> + *
> + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> + * and obsoleting, these dynamic flag and field will be used to manage
> + * the timestamps on receiving datapath as well.
> + */
> +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
> +#define RTE_MBUF_DYNFLAG_TIMESTAMP_NAME "rte_dynflag_timestamp"
> +

I realize that's not the case for rte_flow_dynfield_metadata, but
I think it would be good to have a doxygen-like comment.
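
Something like this, for instance (only a sketch of the expected shape, the
wording can of course be refined):

/**
 * Timestamp dynamic field and flag names.
 *
 * The field holds a timestamp whose unit and reference (initial phase)
 * are not defined here but stay consistent for a given port;
 * rte_eth_read_clock() can provide the reference. The flag tells
 * whether the field contains a valid value. On transmit, a PMD
 * supporting DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP may use the value to
 * schedule the packet transmission.
 */
#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
#define RTE_MBUF_DYNFLAG_TIMESTAMP_NAME "rte_dynflag_timestamp"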



Regards,
Olivier

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v3 4/4] eventdev: relax smp barriers with C11 atomics
  @ 2020-07-07 11:13  4%     ` Phil Yang
  2020-07-07 14:29  0%       ` Jerin Jacob
    1 sibling, 1 reply; 200+ results
From: Phil Yang @ 2020-07-07 11:13 UTC (permalink / raw)
  To: thomas, erik.g.carrillo, dev
  Cc: jerinj, Honnappa.Nagarahalli, drc, Ruifeng.Wang, Dharmik.Thakkar,
	nd, david.marchand, mdr, nhorman, dodji

The impl_opaque field is shared between the timer arm and cancel
operations. Meanwhile, the state flag acts as a guard variable to
make sure the update of impl_opaque is synchronized. The original
code uses rte_smp barriers to achieve that. This patch uses C11
atomics with an explicit one-way memory barrier instead of full
barriers rte_smp_w/rmb() to avoid the unnecessary barrier on aarch64.

Since compilers generate the same instructions for volatile and
non-volatile variables with the C11 __atomic built-ins, the volatile
keyword is kept in front of the state enum to avoid an ABI break.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
v3:
Fix ABI issue: revert to 'volatile enum rte_event_timer_state type state'.

v2:
1. Removed implementation-specific opaque data cleanup code.
2. Replaced thread fence with atomic ACQURE/RELEASE ordering on state access.

 lib/librte_eventdev/rte_event_timer_adapter.c | 55 ++++++++++++++++++---------
 1 file changed, 37 insertions(+), 18 deletions(-)

diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
index d75415c..eb2c93a 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter.c
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -629,7 +629,8 @@ swtim_callback(struct rte_timer *tim)
 		sw->expired_timers[sw->n_expired_timers++] = tim;
 		sw->stats.evtim_exp_count++;
 
-		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
+		__atomic_store_n(&evtim->state, RTE_EVENT_TIMER_NOT_ARMED,
+				__ATOMIC_RELEASE);
 	}
 
 	if (event_buffer_batch_ready(&sw->buffer)) {
@@ -1020,6 +1021,7 @@ __swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
 	int n_lcores;
 	/* Timer list for this lcore is not in use. */
 	uint16_t exp_state = 0;
+	enum rte_event_timer_state n_state;
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1060,30 +1062,36 @@ __swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
 	}
 
 	for (i = 0; i < nb_evtims; i++) {
-		/* Don't modify the event timer state in these cases */
-		if (evtims[i]->state == RTE_EVENT_TIMER_ARMED) {
+		n_state = __atomic_load_n(&evtims[i]->state, __ATOMIC_ACQUIRE);
+		if (n_state == RTE_EVENT_TIMER_ARMED) {
 			rte_errno = EALREADY;
 			break;
-		} else if (!(evtims[i]->state == RTE_EVENT_TIMER_NOT_ARMED ||
-			     evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
+		} else if (!(n_state == RTE_EVENT_TIMER_NOT_ARMED ||
+			     n_state == RTE_EVENT_TIMER_CANCELED)) {
 			rte_errno = EINVAL;
 			break;
 		}
 
 		ret = check_timeout(evtims[i], adapter);
 		if (unlikely(ret == -1)) {
-			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOLATE;
+			__atomic_store_n(&evtims[i]->state,
+					RTE_EVENT_TIMER_ERROR_TOOLATE,
+					__ATOMIC_RELAXED);
 			rte_errno = EINVAL;
 			break;
 		} else if (unlikely(ret == -2)) {
-			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOEARLY;
+			__atomic_store_n(&evtims[i]->state,
+					RTE_EVENT_TIMER_ERROR_TOOEARLY,
+					__ATOMIC_RELAXED);
 			rte_errno = EINVAL;
 			break;
 		}
 
 		if (unlikely(check_destination_event_queue(evtims[i],
 							   adapter) < 0)) {
-			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
+			__atomic_store_n(&evtims[i]->state,
+					RTE_EVENT_TIMER_ERROR,
+					__ATOMIC_RELAXED);
 			rte_errno = EINVAL;
 			break;
 		}
@@ -1099,13 +1107,18 @@ __swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
 					  SINGLE, lcore_id, NULL, evtims[i]);
 		if (ret < 0) {
 			/* tim was in RUNNING or CONFIG state */
-			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
+			__atomic_store_n(&evtims[i]->state,
+					RTE_EVENT_TIMER_ERROR,
+					__ATOMIC_RELEASE);
 			break;
 		}
 
-		rte_smp_wmb();
 		EVTIM_LOG_DBG("armed an event timer");
-		evtims[i]->state = RTE_EVENT_TIMER_ARMED;
+		/* RELEASE ordering guarantees the adapter specific value
+		 * changes observed before the update of state.
+		 */
+		__atomic_store_n(&evtims[i]->state, RTE_EVENT_TIMER_ARMED,
+				__ATOMIC_RELEASE);
 	}
 
 	if (i < nb_evtims)
@@ -1132,6 +1145,7 @@ swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
 	struct rte_timer *timp;
 	uint64_t opaque;
 	struct swtim *sw = swtim_pmd_priv(adapter);
+	enum rte_event_timer_state n_state;
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1143,16 +1157,18 @@ swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
 
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
-		if (evtims[i]->state == RTE_EVENT_TIMER_CANCELED) {
+		/* ACQUIRE ordering guarantees the access of implementation
+		 * specific opague data under the correct state.
+		 */
+		n_state = __atomic_load_n(&evtims[i]->state, __ATOMIC_ACQUIRE);
+		if (n_state == RTE_EVENT_TIMER_CANCELED) {
 			rte_errno = EALREADY;
 			break;
-		} else if (evtims[i]->state != RTE_EVENT_TIMER_ARMED) {
+		} else if (n_state != RTE_EVENT_TIMER_ARMED) {
 			rte_errno = EINVAL;
 			break;
 		}
 
-		rte_smp_rmb();
-
 		opaque = evtims[i]->impl_opaque[0];
 		timp = (struct rte_timer *)(uintptr_t)opaque;
 		RTE_ASSERT(timp != NULL);
@@ -1166,9 +1182,12 @@ swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
 
 		rte_mempool_put(sw->tim_pool, (void **)timp);
 
-		evtims[i]->state = RTE_EVENT_TIMER_CANCELED;
-
-		rte_smp_wmb();
+		/* The RELEASE ordering here pairs with atomic ordering
+		 * to make sure the state update data observed between
+		 * threads.
+		 */
+		__atomic_store_n(&evtims[i]->state, RTE_EVENT_TIMER_CANCELED,
+				__ATOMIC_RELEASE);
 	}
 
 	return i;
-- 
2.7.4


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2] mbuf: use C11 atomics for refcnt operations
    2020-07-03 15:38  3% ` David Marchand
@ 2020-07-07 10:10  3% ` Phil Yang
  2020-07-08  5:11  3%   ` Phil Yang
                     ` (2 more replies)
  1 sibling, 3 replies; 200+ results
From: Phil Yang @ 2020-07-07 10:10 UTC (permalink / raw)
  To: david.marchand, dev
  Cc: drc, Honnappa.Nagarahalli, olivier.matz, ruifeng.wang, nd

Use C11 atomics with explicit ordering instead of rte_atomic ops which
enforce unnecessary barriers on aarch64.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
v2:
Fix ABI issue: revert the rte_mbuf_ext_shared_info struct refcnt field
to refcnt_atomic.

 lib/librte_mbuf/rte_mbuf.c      |  1 -
 lib/librte_mbuf/rte_mbuf.h      | 19 ++++++++++---------
 lib/librte_mbuf/rte_mbuf_core.h | 11 +++--------
 3 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index ae91ae2..8a456e5 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -22,7 +22,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index f8e492e..4a7a98c 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -37,7 +37,6 @@
 #include <rte_config.h>
 #include <rte_mempool.h>
 #include <rte_memory.h>
-#include <rte_atomic.h>
 #include <rte_prefetch.h>
 #include <rte_branch_prediction.h>
 #include <rte_byteorder.h>
@@ -365,7 +364,7 @@ rte_pktmbuf_priv_flags(struct rte_mempool *mp)
 static inline uint16_t
 rte_mbuf_refcnt_read(const struct rte_mbuf *m)
 {
-	return (uint16_t)(rte_atomic16_read(&m->refcnt_atomic));
+	return __atomic_load_n(&m->refcnt, __ATOMIC_RELAXED);
 }
 
 /**
@@ -378,14 +377,15 @@ rte_mbuf_refcnt_read(const struct rte_mbuf *m)
 static inline void
 rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
 {
-	rte_atomic16_set(&m->refcnt_atomic, (int16_t)new_value);
+	__atomic_store_n(&m->refcnt, new_value, __ATOMIC_RELAXED);
 }
 
 /* internal */
 static inline uint16_t
 __rte_mbuf_refcnt_update(struct rte_mbuf *m, int16_t value)
 {
-	return (uint16_t)(rte_atomic16_add_return(&m->refcnt_atomic, value));
+	return (uint16_t)(__atomic_add_fetch((int16_t *)&m->refcnt, value,
+					__ATOMIC_ACQ_REL));
 }
 
 /**
@@ -466,7 +466,7 @@ rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
 static inline uint16_t
 rte_mbuf_ext_refcnt_read(const struct rte_mbuf_ext_shared_info *shinfo)
 {
-	return (uint16_t)(rte_atomic16_read(&shinfo->refcnt_atomic));
+	return __atomic_load_n(&shinfo->refcnt_atomic, __ATOMIC_RELAXED);
 }
 
 /**
@@ -481,7 +481,7 @@ static inline void
 rte_mbuf_ext_refcnt_set(struct rte_mbuf_ext_shared_info *shinfo,
 	uint16_t new_value)
 {
-	rte_atomic16_set(&shinfo->refcnt_atomic, (int16_t)new_value);
+	__atomic_store_n(&shinfo->refcnt_atomic, new_value, __ATOMIC_RELAXED);
 }
 
 /**
@@ -505,7 +505,8 @@ rte_mbuf_ext_refcnt_update(struct rte_mbuf_ext_shared_info *shinfo,
 		return (uint16_t)value;
 	}
 
-	return (uint16_t)rte_atomic16_add_return(&shinfo->refcnt_atomic, value);
+	return (uint16_t)(__atomic_add_fetch((int16_t *)&shinfo->refcnt_atomic,
+					    value, __ATOMIC_ACQ_REL));
 }
 
 /** Mbuf prefetch */
@@ -1304,8 +1305,8 @@ static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
 	 * Direct usage of add primitive to avoid
 	 * duplication of comparing with one.
 	 */
-	if (likely(rte_atomic16_add_return
-			(&shinfo->refcnt_atomic, -1)))
+	if (likely(__atomic_add_fetch((int *)&shinfo->refcnt_atomic, -1,
+				     __ATOMIC_ACQ_REL)))
 		return 1;
 
 	/* Reinitialize counter before mbuf freeing. */
diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
index 16600f1..806313a 100644
--- a/lib/librte_mbuf/rte_mbuf_core.h
+++ b/lib/librte_mbuf/rte_mbuf_core.h
@@ -18,7 +18,6 @@
 
 #include <stdint.h>
 #include <rte_compat.h>
-#include <generic/rte_atomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -495,12 +494,8 @@ struct rte_mbuf {
 	 * or non-atomic) is controlled by the CONFIG_RTE_MBUF_REFCNT_ATOMIC
 	 * config option.
 	 */
-	RTE_STD_C11
-	union {
-		rte_atomic16_t refcnt_atomic; /**< Atomically accessed refcnt */
-		/** Non-atomically accessed refcnt */
-		uint16_t refcnt;
-	};
+	uint16_t refcnt;
+
 	uint16_t nb_segs;         /**< Number of segments. */
 
 	/** Input port (16 bits to support more than 256 virtual ports).
@@ -679,7 +674,7 @@ typedef void (*rte_mbuf_extbuf_free_callback_t)(void *addr, void *opaque);
 struct rte_mbuf_ext_shared_info {
 	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback function */
 	void *fcb_opaque;                        /**< Free callback argument */
-	rte_atomic16_t refcnt_atomic;        /**< Atomically accessed refcnt */
+	uint16_t refcnt_atomic;              /**< Atomically accessed refcnt */
 };
 
 /**< Maximum number of nb_segs allowed. */
-- 
2.7.4


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API
  2020-07-07  3:19  3%         ` Feifei Wang
@ 2020-07-07  7:40  0%           ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2020-07-07  7:40 UTC (permalink / raw)
  To: Feifei Wang, Honnappa Nagarahalli, Konstantin Ananyev, Neil Horman
  Cc: dev, nd



On 07/07/2020 04:19, Feifei Wang wrote:
> 
> 
>> -----Original Message-----
>> From: Kinsella, Ray <mdr@ashroe.eu>
> Sent: 6 July 2020 14:23
>> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Feifei Wang
>> <Feifei.Wang2@arm.com>; Konstantin Ananyev
>> <konstantin.ananyev@intel.com>; Neil Horman <nhorman@tuxdriver.com>
>> Cc: dev@dpdk.org; nd <nd@arm.com>
>> Subject: Re: [PATCH 1/3] ring: remove experimental tag for ring reset API
>>
>>
>>
>> On 03/07/2020 19:46, Honnappa Nagarahalli wrote:
>>> <snip>
>>>
>>>>
>>>> On 03/07/2020 11:26, Feifei Wang wrote:
>>>>> Remove the experimental tag for rte_ring_reset API that have been
>>>>> around for 4 releases.
>>>>>
>>>>> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
>>>>> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
>>>>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>>>> ---
>>>>>  lib/librte_ring/rte_ring.h           | 3 ---
>>>>>  lib/librte_ring/rte_ring_version.map | 4 +---
>>>>>  2 files changed, 1 insertion(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
>>>>> index f67141482..7181c33b4 100644
>>>>> --- a/lib/librte_ring/rte_ring.h
>>>>> +++ b/lib/librte_ring/rte_ring.h
>>>>> @@ -663,15 +663,12 @@ rte_ring_dequeue(struct rte_ring *r, void
>> **obj_p)
>>>>>   *
>>>>>   * This function flush all the elements in a ring
>>>>>   *
>>>>> - * @b EXPERIMENTAL: this API may change without prior notice
>>>>> - *
>>>>>   * @warning
>>>>>   * Make sure the ring is not in use while calling this function.
>>>>>   *
>>>>>   * @param r
>>>>>   *   A pointer to the ring structure.
>>>>>   */
>>>>> -__rte_experimental
>>>>>  void
>>>>>  rte_ring_reset(struct rte_ring *r);
>>>>>
>>>>> diff --git a/lib/librte_ring/rte_ring_version.map
>>>>> b/lib/librte_ring/rte_ring_version.map
>>>>> index e88c143cf..aec6f3820 100644
>>>>> --- a/lib/librte_ring/rte_ring_version.map
>>>>> +++ b/lib/librte_ring/rte_ring_version.map
>>>>> @@ -8,6 +8,7 @@ DPDK_20.0 {
>>>>>  	rte_ring_init;
>>>>>  	rte_ring_list_dump;
>>>>>  	rte_ring_lookup;
>>>>> +	rte_ring_reset;
>>>>>
>>>>>  	local: *;
>>>>>  };
>>>>> @@ -15,9 +16,6 @@ DPDK_20.0 {
>>>>>  EXPERIMENTAL {
>>>>>  	global:
>>>>>
>>>>> -	# added in 19.08
>>>>> -	rte_ring_reset;
>>>>> -
>>>>>  	# added in 20.02
>>>>>  	rte_ring_create_elem;
>>>>>  	rte_ring_get_memsize_elem;
>>>>
>>>> So strictly speaking, rte_ring_reset is part of the DPDK_21 ABI, not
>>>> the v20.0 ABI.
>>> Thanks Ray for clarifying this.
>>>
> Thanks very much for pointing this.
>>>>
>>>> The way to solve is to add it the DPDK_21 ABI in the map file.
>>>> And then use the VERSION_SYMBOL_EXPERIMENTAL to alias to
>> experimental
>>>> if necessary.
>>> Is using VERSION_SYMBOL_EXPERIMENTAL a must?
>>
>> Purely at the discretion of the contributor and maintainer.
>> If it has been around for a while, applications are using it and changing the
>> symbol will break them.
>>
>> You may choose to provide the alias or not.
> Ok, in the new patch version, I will add it into the DPDK_21 ABI but the 
> VERSION_SYMBOL_EXPERIMENTAL will not be added, because if it is added in this
> version, it is still needed to be removed in the near future.
> 
> Thanks very much for your review.

That is 100%

>>
>>> The documentation also seems to be vague. It says " The macro is used
>> when a symbol matures to become part of the stable ABI, to provide an alias
>> to experimental for some time". What does 'some time' mean?
>>
>> "Some time" is a bit vague alright, should be "until the next major ABI
>> version" - I will fix.
>>
>>>
>>>>
>>>> https://doc.dpdk.org/guides/contributing/abi_versioning.html#versioni
>>>> ng-
>>>> macros

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API
  2020-07-06  6:23  3%       ` Kinsella, Ray
@ 2020-07-07  3:19  3%         ` Feifei Wang
  2020-07-07  7:40  0%           ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Feifei Wang @ 2020-07-07  3:19 UTC (permalink / raw)
  To: Kinsella, Ray, Honnappa Nagarahalli, Konstantin Ananyev, Neil Horman
  Cc: dev, nd, nd



> -----Original Message-----
> From: Kinsella, Ray <mdr@ashroe.eu>
> Sent: 6 July 2020 14:23
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Feifei Wang
> <Feifei.Wang2@arm.com>; Konstantin Ananyev
> <konstantin.ananyev@intel.com>; Neil Horman <nhorman@tuxdriver.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>
> Subject: Re: [PATCH 1/3] ring: remove experimental tag for ring reset API
> 
> 
> 
> On 03/07/2020 19:46, Honnappa Nagarahalli wrote:
> > <snip>
> >
> >>
> >> On 03/07/2020 11:26, Feifei Wang wrote:
> >>> Remove the experimental tag for rte_ring_reset API that have been
> >>> around for 4 releases.
> >>>
> >>> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> >>> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> >>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >>> ---
> >>>  lib/librte_ring/rte_ring.h           | 3 ---
> >>>  lib/librte_ring/rte_ring_version.map | 4 +---
> >>>  2 files changed, 1 insertion(+), 6 deletions(-)
> >>>
> >>> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> >>> index f67141482..7181c33b4 100644
> >>> --- a/lib/librte_ring/rte_ring.h
> >>> +++ b/lib/librte_ring/rte_ring.h
> >>> @@ -663,15 +663,12 @@ rte_ring_dequeue(struct rte_ring *r, void
> **obj_p)
> >>>   *
> >>>   * This function flush all the elements in a ring
> >>>   *
> >>> - * @b EXPERIMENTAL: this API may change without prior notice
> >>> - *
> >>>   * @warning
> >>>   * Make sure the ring is not in use while calling this function.
> >>>   *
> >>>   * @param r
> >>>   *   A pointer to the ring structure.
> >>>   */
> >>> -__rte_experimental
> >>>  void
> >>>  rte_ring_reset(struct rte_ring *r);
> >>>
> >>> diff --git a/lib/librte_ring/rte_ring_version.map
> >>> b/lib/librte_ring/rte_ring_version.map
> >>> index e88c143cf..aec6f3820 100644
> >>> --- a/lib/librte_ring/rte_ring_version.map
> >>> +++ b/lib/librte_ring/rte_ring_version.map
> >>> @@ -8,6 +8,7 @@ DPDK_20.0 {
> >>>  	rte_ring_init;
> >>>  	rte_ring_list_dump;
> >>>  	rte_ring_lookup;
> >>> +	rte_ring_reset;
> >>>
> >>>  	local: *;
> >>>  };
> >>> @@ -15,9 +16,6 @@ DPDK_20.0 {
> >>>  EXPERIMENTAL {
> >>>  	global:
> >>>
> >>> -	# added in 19.08
> >>> -	rte_ring_reset;
> >>> -
> >>>  	# added in 20.02
> >>>  	rte_ring_create_elem;
> >>>  	rte_ring_get_memsize_elem;
> >>
> >> So strictly speaking, rte_ring_reset is part of the DPDK_21 ABI, not
> >> the v20.0 ABI.
> > Thanks Ray for clarifying this.
> >
Thanks very much for pointing this.
> >>
> >> The way to solve is to add it the DPDK_21 ABI in the map file.
> >> And then use the VERSION_SYMBOL_EXPERIMENTAL to alias to
> experimental
> >> if necessary.
> > Is using VERSION_SYMBOL_EXPERIMENTAL a must?
> 
> Purely at the discretion of the contributor and maintainer.
> If it has been around for a while, applications are using it and changing the
> symbol will break them.
> 
> You may choose to provide the alias or not.
Ok, in the new patch version, I will add it into the DPDK_21 ABI, but
VERSION_SYMBOL_EXPERIMENTAL will not be added, because if it were added in this
version, it would still need to be removed in the near future.

Thanks very much for your review.
> 
> > The documentation also seems to be vague. It says " The macro is used
> when a symbol matures to become part of the stable ABI, to provide an alias
> to experimental for some time". What does 'some time' mean?
> 
> "Some time" is a bit vague alright, should be "until the next major ABI
> version" - I will fix.
> 
> >
> >>
> >> https://doc.dpdk.org/guides/contributing/abi_versioning.html#versioni
> >> ng-
> >> macros

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v7 1/3] eal: disable function versioning on Windows
  2020-07-06 12:22  0%       ` Bruce Richardson
@ 2020-07-06 23:16  0%         ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2020-07-06 23:16 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Fady Bader, dev, tbashar, talshn, yohadt, dmitry.kozliuk,
	harini.ramakrishnan, ocardona, pallavi.kadam, ranjit.menon,
	olivier.matz, arybchenko, mdr, nhorman

06/07/2020 14:22, Bruce Richardson:
> On Mon, Jul 06, 2020 at 02:32:39PM +0300, Fady Bader wrote:
> > Function versioning implementation is not supported by Windows.
> > Function versioning is disabled on Windows.
> > 
> > Signed-off-by: Fady Bader <fady@mellanox.com>
> > ---
> >  doc/guides/windows_gsg/intro.rst | 4 ++++
> >  lib/meson.build                  | 6 +++++-
> >  2 files changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/doc/guides/windows_gsg/intro.rst b/doc/guides/windows_gsg/intro.rst
> > index a0285732df..58c6246404 100644
> > --- a/doc/guides/windows_gsg/intro.rst
> > +++ b/doc/guides/windows_gsg/intro.rst
> > @@ -18,3 +18,7 @@ DPDK for Windows is currently a work in progress. Not all DPDK source files
> >  compile. Support is being added in pieces so as to limit the overall scope
> >  of any individual patch series. The goal is to be able to run any DPDK
> >  application natively on Windows.
> > +
> > +The :doc:`../contributing/abi_policy` cannot be respected for Windows.
> > +Minor ABI versions may be incompatible
> > +because function versioning is not supported on Windows.
> > diff --git a/lib/meson.build b/lib/meson.build
> > index c1b9e1633f..dadf151f78 100644
> > --- a/lib/meson.build
> > +++ b/lib/meson.build
> > @@ -107,6 +107,10 @@ foreach l:libraries
> >  			shared_dep = declare_dependency(include_directories: includes)
> >  			static_dep = shared_dep
> >  		else
> > +			if is_windows and use_function_versioning
> > +				message('@0@: Function versioning is not supported by Windows.'
> > +				.format(name))
> > +			endif
> >  
> 
> This is ok here, but I think it might be better just moved to somewhere
> like config/meson.build, so that it is always just printed once for each
> build. I don't see an issue with having it printed even if there is no
> function versioning in the build itself.

Moving such a message to config/meson.build is the same
as moving it to the doc.
I prefer having a message each time library compatibility
is required but not possible.

> With or without the code move above, which is just a suggestion,
> 
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>

OK thanks, I'll merge as is.



^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v7] sched: make RED scaling configurable
  @ 2020-07-06 23:09  3%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2020-07-06 23:09 UTC (permalink / raw)
  To: Alan Dewar, Alan Dewar
  Cc: dev, Yigit, Ferruh, Kantecki, Tomasz, Stephen Hemminger, dev,
	Dumitrescu, Cristian, jasvinder.singh, david.marchand,
	bruce.richardson

08/04/2019 15:29, Dumitrescu, Cristian:
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > 08/04/2019 10:24, Alan Dewar:
> > > On Fri, Apr 5, 2019 at 4:36 PM Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> > > > On 1/16/2018 4:07 PM, alangordondewar@gmail.com wrote:
> > > > > From: Alan Dewar <alan.dewar@att.com>
> > > > >
> > > > > The RED code stores the weighted moving average in a 32-bit integer as
> > > > > a pseudo fixed-point floating number with 10 fractional bits.  Twelve
> > > > > other bits are used to encode the filter weight, leaving just 10 bits
> > > > > for the queue length.  This limits the maximum queue length supported
> > > > > by RED queues to 1024 packets.
> > > > >
> > > > > Introduce a new API to allow the RED scaling factor to be configured
> > > > > based upon maximum queue length.  If this API is not called, the RED
> > > > > scaling factor remains at its default value.
> > > > >
> > > > > Added some new RED scaling unit-tests to test with RED queue-lengths
> > > > > up to 8192 packets long.
> > > > >
> > > > > Signed-off-by: Alan Dewar <alan.dewar@att.com>
> > > >
> > > > Hi Cristian, Alan,
> > > >
> > > > The v7 of this patch is sting without any comment for more than a year.
> > > > What is the status of this patch? Is it still valid? What is blocking it?
> > > >
> > > > For reference patch:
> > > > https://patches.dpdk.org/patch/33837/
> > >
> > > We are still using this patch against DPDK 17.11 and 18.11 as part of
> > > the AT&T Vyatta NOS.   It is needed to make WRED queues longer than
> > > 1024 packets work correctly.  I'm afraid that I have no idea what is
> > > holding it up from being merged.
> > 
> > It will be in a release when it will be merged in the git tree
> > dpdk-next-qos, managed by Cristian.
> 
> I was hoping to get a review & ACK from Tomasz Kantecki, the author of the WRED code in DPDK, hence the lack of progress on this patch.

It seems nobody was able to provide any feedback after two years,
and it was never merged in the QoS git tree.
The handling of this patch is really a shame.

Alan, please rebase this patch.
If nothing is wrong in CI (including ABI check),
I will merge the next version.



^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v6 04/10] eal: introduce thread uninit helper
    2020-07-06 20:52  3%   ` [dpdk-dev] [PATCH v6 02/10] eal: fix multiple definition of per lcore thread id David Marchand
@ 2020-07-06 20:52  3%   ` David Marchand
  1 sibling, 0 replies; 200+ results
From: David Marchand @ 2020-07-06 20:52 UTC (permalink / raw)
  To: dev
  Cc: jerinjacobk, bruce.richardson, mdr, thomas, arybchenko, ktraynor,
	ian.stokes, i.maximets, olivier.matz, konstantin.ananyev,
	Jerin Jacob, Sunil Kumar Kori, Harini Ramakrishnan, Omar Cardona,
	Pallavi Kadam, Ranjit Menon

This is a preparation step for dynamically unregistering threads.

Since we explicitly allocate a per thread trace buffer in
__rte_thread_init, add an internal helper to free this buffer.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
Changes since v5:
- fixed windows build,

Changes since v4:
- renamed rte_thread_uninit and moved to eal_private.h,
- hid freeing helper,

Changes since v2:
- added missing stub for windows tracing support,
- moved free symbol to exported (experimental) ABI as a counterpart of
  the alloc symbol we already had,

Changes since v1:
- rebased on master, removed Windows workaround wrt traces support,

---
 lib/librte_eal/common/eal_common_thread.c |  9 +++++
 lib/librte_eal/common/eal_common_trace.c  | 49 +++++++++++++++++++----
 lib/librte_eal/common/eal_private.h       |  5 +++
 lib/librte_eal/common/eal_trace.h         |  1 +
 lib/librte_eal/windows/eal.c              |  7 +++-
 5 files changed, 63 insertions(+), 8 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c
index fb06f8f802..6d1c87b1c2 100644
--- a/lib/librte_eal/common/eal_common_thread.c
+++ b/lib/librte_eal/common/eal_common_thread.c
@@ -20,6 +20,7 @@
 #include "eal_internal_cfg.h"
 #include "eal_private.h"
 #include "eal_thread.h"
+#include "eal_trace.h"
 
 RTE_DEFINE_PER_LCORE(unsigned int, _lcore_id) = LCORE_ID_ANY;
 RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
@@ -161,6 +162,14 @@ __rte_thread_init(unsigned int lcore_id, rte_cpuset_t *cpuset)
 	__rte_trace_mem_per_thread_alloc();
 }
 
+void
+__rte_thread_uninit(void)
+{
+	trace_mem_per_thread_free();
+
+	RTE_PER_LCORE(_lcore_id) = LCORE_ID_ANY;
+}
+
 struct rte_thread_ctrl_params {
 	void *(*start_routine)(void *);
 	void *arg;
diff --git a/lib/librte_eal/common/eal_common_trace.c b/lib/librte_eal/common/eal_common_trace.c
index 875553d7e5..b6da5537fe 100644
--- a/lib/librte_eal/common/eal_common_trace.c
+++ b/lib/librte_eal/common/eal_common_trace.c
@@ -101,7 +101,7 @@ eal_trace_fini(void)
 {
 	if (!rte_trace_is_enabled())
 		return;
-	trace_mem_per_thread_free();
+	trace_mem_free();
 	trace_metadata_destroy();
 	eal_trace_args_free();
 }
@@ -370,24 +370,59 @@ __rte_trace_mem_per_thread_alloc(void)
 	rte_spinlock_unlock(&trace->lock);
 }
 
+static void
+trace_mem_per_thread_free_unlocked(struct thread_mem_meta *meta)
+{
+	if (meta->area == TRACE_AREA_HUGEPAGE)
+		eal_free_no_trace(meta->mem);
+	else if (meta->area == TRACE_AREA_HEAP)
+		free(meta->mem);
+}
+
 void
 trace_mem_per_thread_free(void)
+{
+	struct trace *trace = trace_obj_get();
+	struct __rte_trace_header *header;
+	uint32_t count;
+
+	header = RTE_PER_LCORE(trace_mem);
+	if (header == NULL)
+		return;
+
+	rte_spinlock_lock(&trace->lock);
+	for (count = 0; count < trace->nb_trace_mem_list; count++) {
+		if (trace->lcore_meta[count].mem == header)
+			break;
+	}
+	if (count != trace->nb_trace_mem_list) {
+		struct thread_mem_meta *meta = &trace->lcore_meta[count];
+
+		trace_mem_per_thread_free_unlocked(meta);
+		if (count != trace->nb_trace_mem_list - 1) {
+			memmove(meta, meta + 1,
+				sizeof(*meta) *
+				 (trace->nb_trace_mem_list - count - 1));
+		}
+		trace->nb_trace_mem_list--;
+	}
+	rte_spinlock_unlock(&trace->lock);
+}
+
+void
+trace_mem_free(void)
 {
 	struct trace *trace = trace_obj_get();
 	uint32_t count;
-	void *mem;
 
 	if (!rte_trace_is_enabled())
 		return;
 
 	rte_spinlock_lock(&trace->lock);
 	for (count = 0; count < trace->nb_trace_mem_list; count++) {
-		mem = trace->lcore_meta[count].mem;
-		if (trace->lcore_meta[count].area == TRACE_AREA_HUGEPAGE)
-			eal_free_no_trace(mem);
-		else if (trace->lcore_meta[count].area == TRACE_AREA_HEAP)
-			free(mem);
+		trace_mem_per_thread_free_unlocked(&trace->lcore_meta[count]);
 	}
+	trace->nb_trace_mem_list = 0;
 	rte_spinlock_unlock(&trace->lock);
 }
 
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 5d8b53882d..a77ac7a963 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -709,4 +709,9 @@ eal_get_application_usage_hook(void);
  */
 void __rte_thread_init(unsigned int lcore_id, rte_cpuset_t *cpuset);
 
+/**
+ * Uninitialize per-lcore info for current thread.
+ */
+void __rte_thread_uninit(void);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/eal_trace.h b/lib/librte_eal/common/eal_trace.h
index 8f60616156..92c5951c3a 100644
--- a/lib/librte_eal/common/eal_trace.h
+++ b/lib/librte_eal/common/eal_trace.h
@@ -106,6 +106,7 @@ int trace_metadata_create(void);
 void trace_metadata_destroy(void);
 int trace_mkdir(void);
 int trace_epoch_time_save(void);
+void trace_mem_free(void);
 void trace_mem_per_thread_free(void);
 
 /* EAL interface */
diff --git a/lib/librte_eal/windows/eal.c b/lib/librte_eal/windows/eal.c
index 9f5d019e64..addac62ae5 100644
--- a/lib/librte_eal/windows/eal.c
+++ b/lib/librte_eal/windows/eal.c
@@ -17,10 +17,10 @@
 #include <eal_filesystem.h>
 #include <eal_options.h>
 #include <eal_private.h>
-#include <rte_trace_point.h>
 #include <rte_vfio.h>
 
 #include "eal_hugepages.h"
+#include "eal_trace.h"
 #include "eal_windows.h"
 
 #define MEMSIZE_IF_NO_HUGE_PAGE (64ULL * 1024ULL * 1024ULL)
@@ -215,6 +215,11 @@ __rte_trace_mem_per_thread_alloc(void)
 {
 }
 
+void
+trace_mem_per_thread_free(void)
+{
+}
+
 void
 __rte_trace_point_emit_field(size_t sz, const char *field,
 	const char *type)
-- 
2.23.0


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v6 02/10] eal: fix multiple definition of per lcore thread id
  @ 2020-07-06 20:52  3%   ` David Marchand
  2020-07-06 20:52  3%   ` [dpdk-dev] [PATCH v6 04/10] eal: introduce thread uninit helper David Marchand
  1 sibling, 0 replies; 200+ results
From: David Marchand @ 2020-07-06 20:52 UTC (permalink / raw)
  To: dev
  Cc: jerinjacobk, bruce.richardson, mdr, thomas, arybchenko, ktraynor,
	ian.stokes, i.maximets, olivier.matz, konstantin.ananyev,
	Neil Horman, Cunming Liang

Because of the inline accessor + static declaration in rte_gettid(),
we end up with multiple symbols for RTE_PER_LCORE(_thread_id).
Each compilation unit will pay a cost when accessing this information
for the first time.

$ nm build/app/dpdk-testpmd | grep per_lcore__thread_id
0000000000000054 d per_lcore__thread_id.5037
0000000000000040 d per_lcore__thread_id.5103
0000000000000048 d per_lcore__thread_id.5259
000000000000004c d per_lcore__thread_id.5259
0000000000000044 d per_lcore__thread_id.5933
0000000000000058 d per_lcore__thread_id.6261
0000000000000050 d per_lcore__thread_id.7378
000000000000005c d per_lcore__thread_id.7496
000000000000000c d per_lcore__thread_id.8016
0000000000000010 d per_lcore__thread_id.8431

Make it global as part of the DPDK_21 stable ABI.

Fixes: ef76436c6834 ("eal: get unique thread id")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
---
 lib/librte_eal/common/eal_common_thread.c | 1 +
 lib/librte_eal/include/rte_eal.h          | 3 ++-
 lib/librte_eal/rte_eal_version.map        | 7 +++++++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c
index 7be80c292e..fd13453fee 100644
--- a/lib/librte_eal/common/eal_common_thread.c
+++ b/lib/librte_eal/common/eal_common_thread.c
@@ -22,6 +22,7 @@
 #include "eal_thread.h"
 
 RTE_DEFINE_PER_LCORE(unsigned int, _lcore_id) = LCORE_ID_ANY;
+RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
 static RTE_DEFINE_PER_LCORE(unsigned int, _socket_id) =
 	(unsigned int)SOCKET_ID_ANY;
 static RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);
diff --git a/lib/librte_eal/include/rte_eal.h b/lib/librte_eal/include/rte_eal.h
index 2f9ed298de..2edf8c6556 100644
--- a/lib/librte_eal/include/rte_eal.h
+++ b/lib/librte_eal/include/rte_eal.h
@@ -447,6 +447,8 @@ enum rte_intr_mode rte_eal_vfio_intr_mode(void);
  */
 int rte_sys_gettid(void);
 
+RTE_DECLARE_PER_LCORE(int, _thread_id);
+
 /**
  * Get system unique thread id.
  *
@@ -456,7 +458,6 @@ int rte_sys_gettid(void);
  */
 static inline int rte_gettid(void)
 {
-	static RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
 	if (RTE_PER_LCORE(_thread_id) == -1)
 		RTE_PER_LCORE(_thread_id) = rte_sys_gettid();
 	return RTE_PER_LCORE(_thread_id);
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 196eef5afa..0d42d44ce9 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -221,6 +221,13 @@ DPDK_20.0 {
 	local: *;
 };
 
+DPDK_21 {
+	global:
+
+	per_lcore__thread_id;
+
+} DPDK_20.0;
+
 EXPERIMENTAL {
 	global:
 
-- 
2.23.0


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3 0/3] Experimental/internal libraries cleanup
  2020-07-05 19:55  3%   ` [dpdk-dev] " Thomas Monjalon
  2020-07-06  8:02  3%     ` [dpdk-dev] [dpdk-techboard] " Bruce Richardson
@ 2020-07-06 16:57  0%     ` Medvedkin, Vladimir
  1 sibling, 0 replies; 200+ results
From: Medvedkin, Vladimir @ 2020-07-06 16:57 UTC (permalink / raw)
  To: Thomas Monjalon, David Marchand
  Cc: dev, honnappa.nagarahalli, techboard, Jiayu Hu, Yipeng Wang,
	Sameh Gobriel, Nipun Gupta, Hemant Agrawal


On 05/07/2020 20:55, Thomas Monjalon wrote:
> +Cc maintainers of the problematic libraries:
> 	- librte_fib
> 	- librte_rib
> 	- librte_gro
> 	- librte_member
> 	- librte_rawdev
>
> 26/06/2020 10:16, David Marchand:
>> Following discussions on the mailing list and the 05/20 TB meeting, here
>> is a series that drops the special versioning for non stable libraries.
>>
>> Two notes:
>>
>> - RIB/FIB library is not referenced in the API doxygen index, is this
>>    intentional?
> Vladimir please, could you fix the miss in the doxygen index?


Sure, I'll send a patch.


>
>> - I inspected MAINTAINERS: librte_gro, librte_member and librte_rawdev are
>>    announced as experimental while their functions are part of the 20
>>    stable ABI (in .map files + no __rte_experimental marking).
>>    Their fate must be discussed.
> I would suggest removing EXPERIMENTAL flag for gro, member and rawdev.
> They are probably already considered stable for a lot of users.
> Maintainers, are you OK to follow the ABI compatibility rules
> for these libraries? Do you feel these libraries are mature enough?
>
>
>
-- 
Regards,
Vladimir


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 4/4] eventdev: relax smp barriers with c11 atomics
  2020-07-06 15:32  0%       ` Phil Yang
@ 2020-07-06 15:40  0%         ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2020-07-06 15:40 UTC (permalink / raw)
  To: Phil Yang
  Cc: erik.g.carrillo, dev, jerinj, Honnappa Nagarahalli, drc,
	Ruifeng Wang, Dharmik Thakkar, nd, david.marchand, mdr,
	Neil Horman, Dodji Seketeli

06/07/2020 17:32, Phil Yang:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 02/07/2020 07:26, Phil Yang:
> > > --- a/lib/librte_eventdev/rte_event_timer_adapter.h
> > > +++ b/lib/librte_eventdev/rte_event_timer_adapter.h
> > > @@ -467,7 +467,7 @@ struct rte_event_timer {
> > >  	 *  - op: RTE_EVENT_OP_NEW
> > >  	 *  - event_type: RTE_EVENT_TYPE_TIMER
> > >  	 */
> > > -	volatile enum rte_event_timer_state state;
> > > +	enum rte_event_timer_state state;
> > >  	/**< State of the event timer. */
> > 
> > Why do you remove the volatile keyword?
> > It is not explained in the commit log.
> By using the C11 atomic operations, it will generate the same instructions for non-volatile and volatile version.
> Please check the sample code here: https://gcc.godbolt.org/z/8x5rWs
> 
> > This change is triggering a warning in the ABI check:
> > http://mails.dpdk.org/archives/test-report/2020-July/140440.html
> > Moving from volatile to non-volatile is probably not an issue.
> > I expect the code generated for the volatile case to work the same
> > in non-volatile case. Do you confirm?
> They generate the same instructions, so either way will work.
> Do I need to revert it to the volatile version?

Either you revert, or you add explanation in the commit log
+ exception in libabigail.abignore



^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 4/4] eventdev: relax smp barriers with c11 atomics
  2020-07-06 10:04  4%     ` Thomas Monjalon
@ 2020-07-06 15:32  0%       ` Phil Yang
  2020-07-06 15:40  0%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Phil Yang @ 2020-07-06 15:32 UTC (permalink / raw)
  To: thomas
  Cc: erik.g.carrillo, dev, jerinj, Honnappa Nagarahalli, drc,
	Ruifeng Wang, Dharmik Thakkar, nd, david.marchand, mdr,
	Neil Horman, Dodji Seketeli

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, July 6, 2020 6:04 PM
> To: Phil Yang <Phil.Yang@arm.com>
> Cc: erik.g.carrillo@intel.com; dev@dpdk.org; jerinj@marvell.com; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; drc@linux.vnet.ibm.com;
> Ruifeng Wang <Ruifeng.Wang@arm.com>; Dharmik Thakkar
> <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>;
> david.marchand@redhat.com; mdr@ashroe.eu; Neil Horman
> <nhorman@tuxdriver.com>; Dodji Seketeli <dodji@redhat.com>
> Subject: Re: [dpdk-dev] [PATCH v2 4/4] eventdev: relax smp barriers with c11
> atomics
> 
> 02/07/2020 07:26, Phil Yang:
> > The implementation-specific opaque data is shared between arm and
> cancel
> > operations. The state flag acts as a guard variable to make sure the
> > update of opaque data is synchronized. This patch uses c11 atomics with
> > explicit one way memory barrier instead of full barriers rte_smp_w/rmb()
> > to synchronize the opaque data between timer arm and cancel threads.
> 
> I think we should write C11 (uppercase).
Agreed. 
I will change it in the next version.

> 
> Please, in your explanations, try to be more specific.
> Naming fields may help to make things clear.
OK. Thanks.

> 
> [...]
> > --- a/lib/librte_eventdev/rte_event_timer_adapter.h
> > +++ b/lib/librte_eventdev/rte_event_timer_adapter.h
> > @@ -467,7 +467,7 @@ struct rte_event_timer {
> >  	 *  - op: RTE_EVENT_OP_NEW
> >  	 *  - event_type: RTE_EVENT_TYPE_TIMER
> >  	 */
> > -	volatile enum rte_event_timer_state state;
> > +	enum rte_event_timer_state state;
> >  	/**< State of the event timer. */
> 
> Why do you remove the volatile keyword?
> It is not explained in the commit log.
With the C11 atomic operations, the compiler generates the same instructions for the non-volatile and volatile versions.
Please check the sample code here: https://gcc.godbolt.org/z/8x5rWs
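
The gist of that example, reduced (assuming the GCC/Clang __atomic built-ins), is:

enum state { NOT_ARMED, ARMED };
struct tim_nv { enum state st; };          /* non-volatile */
struct tim_v  { volatile enum state st; }; /* volatile */

enum state load_nv(struct tim_nv *t)
{
	return __atomic_load_n(&t->st, __ATOMIC_ACQUIRE);
}

enum state load_v(struct tim_v *t)
{
	return __atomic_load_n(&t->st, __ATOMIC_ACQUIRE);
}

/* Both functions compile to the same single acquire load (e.g. ldar on
 * aarch64); the volatile qualifier makes no difference once the access
 * goes through the built-in. */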

> 
> This change is triggering a warning in the ABI check:
> http://mails.dpdk.org/archives/test-report/2020-July/140440.html
> Moving from volatile to non-volatile is probably not an issue.
> I expect the code generated for the volatile case to work the same
> in non-volatile case. Do you confirm?
They generate the same instructions, so either way will work.
Do I need to revert it to the volatile version?


Thanks,
Phil
> 
> In any case, we need an explanation and an ABI check exception.
> 


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v5 04/10] eal: introduce thread uninit helper
    2020-07-06 14:15  3%   ` [dpdk-dev] [PATCH v5 02/10] eal: fix multiple definition of per lcore thread id David Marchand
@ 2020-07-06 14:16  3%   ` David Marchand
  1 sibling, 0 replies; 200+ results
From: David Marchand @ 2020-07-06 14:16 UTC (permalink / raw)
  To: dev
  Cc: jerinjacobk, bruce.richardson, mdr, thomas, arybchenko, ktraynor,
	ian.stokes, i.maximets, olivier.matz, konstantin.ananyev,
	Jerin Jacob, Sunil Kumar Kori, Harini Ramakrishnan, Omar Cardona,
	Pallavi Kadam, Ranjit Menon

This is a preparation step for dynamically unregistering threads.

Since we explicitly allocate a per thread trace buffer in
__rte_thread_init, add an internal helper to free this buffer.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
Changes since v4:
- renamed rte_thread_uninit and moved to eal_private.h,
- hid freeing helper,

Changes since v2:
- added missing stub for windows tracing support,
- moved free symbol to exported (experimental) ABI as a counterpart of
  the alloc symbol we already had,

Changes since v1:
- rebased on master, removed Windows workaround wrt traces support,

---
 lib/librte_eal/common/eal_common_thread.c |  9 +++++
 lib/librte_eal/common/eal_common_trace.c  | 49 +++++++++++++++++++----
 lib/librte_eal/common/eal_private.h       |  5 +++
 lib/librte_eal/common/eal_trace.h         |  1 +
 lib/librte_eal/windows/eal.c              |  5 +++
 5 files changed, 62 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c
index fb06f8f802..6d1c87b1c2 100644
--- a/lib/librte_eal/common/eal_common_thread.c
+++ b/lib/librte_eal/common/eal_common_thread.c
@@ -20,6 +20,7 @@
 #include "eal_internal_cfg.h"
 #include "eal_private.h"
 #include "eal_thread.h"
+#include "eal_trace.h"
 
 RTE_DEFINE_PER_LCORE(unsigned int, _lcore_id) = LCORE_ID_ANY;
 RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
@@ -161,6 +162,14 @@ __rte_thread_init(unsigned int lcore_id, rte_cpuset_t *cpuset)
 	__rte_trace_mem_per_thread_alloc();
 }
 
+void
+__rte_thread_uninit(void)
+{
+	trace_mem_per_thread_free();
+
+	RTE_PER_LCORE(_lcore_id) = LCORE_ID_ANY;
+}
+
 struct rte_thread_ctrl_params {
 	void *(*start_routine)(void *);
 	void *arg;
diff --git a/lib/librte_eal/common/eal_common_trace.c b/lib/librte_eal/common/eal_common_trace.c
index 875553d7e5..b6da5537fe 100644
--- a/lib/librte_eal/common/eal_common_trace.c
+++ b/lib/librte_eal/common/eal_common_trace.c
@@ -101,7 +101,7 @@ eal_trace_fini(void)
 {
 	if (!rte_trace_is_enabled())
 		return;
-	trace_mem_per_thread_free();
+	trace_mem_free();
 	trace_metadata_destroy();
 	eal_trace_args_free();
 }
@@ -370,24 +370,59 @@ __rte_trace_mem_per_thread_alloc(void)
 	rte_spinlock_unlock(&trace->lock);
 }
 
+static void
+trace_mem_per_thread_free_unlocked(struct thread_mem_meta *meta)
+{
+	if (meta->area == TRACE_AREA_HUGEPAGE)
+		eal_free_no_trace(meta->mem);
+	else if (meta->area == TRACE_AREA_HEAP)
+		free(meta->mem);
+}
+
 void
 trace_mem_per_thread_free(void)
+{
+	struct trace *trace = trace_obj_get();
+	struct __rte_trace_header *header;
+	uint32_t count;
+
+	header = RTE_PER_LCORE(trace_mem);
+	if (header == NULL)
+		return;
+
+	rte_spinlock_lock(&trace->lock);
+	for (count = 0; count < trace->nb_trace_mem_list; count++) {
+		if (trace->lcore_meta[count].mem == header)
+			break;
+	}
+	if (count != trace->nb_trace_mem_list) {
+		struct thread_mem_meta *meta = &trace->lcore_meta[count];
+
+		trace_mem_per_thread_free_unlocked(meta);
+		if (count != trace->nb_trace_mem_list - 1) {
+			memmove(meta, meta + 1,
+				sizeof(*meta) *
+				 (trace->nb_trace_mem_list - count - 1));
+		}
+		trace->nb_trace_mem_list--;
+	}
+	rte_spinlock_unlock(&trace->lock);
+}
+
+void
+trace_mem_free(void)
 {
 	struct trace *trace = trace_obj_get();
 	uint32_t count;
-	void *mem;
 
 	if (!rte_trace_is_enabled())
 		return;
 
 	rte_spinlock_lock(&trace->lock);
 	for (count = 0; count < trace->nb_trace_mem_list; count++) {
-		mem = trace->lcore_meta[count].mem;
-		if (trace->lcore_meta[count].area == TRACE_AREA_HUGEPAGE)
-			eal_free_no_trace(mem);
-		else if (trace->lcore_meta[count].area == TRACE_AREA_HEAP)
-			free(mem);
+		trace_mem_per_thread_free_unlocked(&trace->lcore_meta[count]);
 	}
+	trace->nb_trace_mem_list = 0;
 	rte_spinlock_unlock(&trace->lock);
 }
 
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 5d8b53882d..a77ac7a963 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -709,4 +709,9 @@ eal_get_application_usage_hook(void);
  */
 void __rte_thread_init(unsigned int lcore_id, rte_cpuset_t *cpuset);
 
+/**
+ * Uninitialize per-lcore info for current thread.
+ */
+void __rte_thread_uninit(void);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/eal_trace.h b/lib/librte_eal/common/eal_trace.h
index 8f60616156..92c5951c3a 100644
--- a/lib/librte_eal/common/eal_trace.h
+++ b/lib/librte_eal/common/eal_trace.h
@@ -106,6 +106,7 @@ int trace_metadata_create(void);
 void trace_metadata_destroy(void);
 int trace_mkdir(void);
 int trace_epoch_time_save(void);
+void trace_mem_free(void);
 void trace_mem_per_thread_free(void);
 
 /* EAL interface */
diff --git a/lib/librte_eal/windows/eal.c b/lib/librte_eal/windows/eal.c
index 9f5d019e64..a11daee68b 100644
--- a/lib/librte_eal/windows/eal.c
+++ b/lib/librte_eal/windows/eal.c
@@ -215,6 +215,11 @@ __rte_trace_mem_per_thread_alloc(void)
 {
 }
 
+void
+trace_mem_per_thread_free(void)
+{
+}
+
 void
 __rte_trace_point_emit_field(size_t sz, const char *field,
 	const char *type)
-- 
2.23.0


^ permalink raw reply	[relevance 3%]
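
A minimal sketch of how the new helper is meant to pair with
__rte_thread_init() once threads can be unregistered dynamically; the
wrapper and do_work() are illustrative placeholders, not part of the
patch, and both helpers are EAL-internal (eal_private.h):

/* Sketch only: a dynamically registered thread pairing init/uninit. */
static void *
ctrl_thread_body(void *arg)
{
	rte_cpuset_t cpuset;

	pthread_getaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
	__rte_thread_init(LCORE_ID_ANY, &cpuset); /* allocates the per-thread trace buffer */

	do_work(arg); /* hypothetical workload */

	__rte_thread_uninit(); /* frees the trace buffer, resets _lcore_id */
	return NULL;
}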

* [dpdk-dev] [PATCH v5 02/10] eal: fix multiple definition of per lcore thread id
  @ 2020-07-06 14:15  3%   ` David Marchand
  2020-07-06 14:16  3%   ` [dpdk-dev] [PATCH v5 04/10] eal: introduce thread uninit helper David Marchand
  1 sibling, 0 replies; 200+ results
From: David Marchand @ 2020-07-06 14:15 UTC (permalink / raw)
  To: dev
  Cc: jerinjacobk, bruce.richardson, mdr, thomas, arybchenko, ktraynor,
	ian.stokes, i.maximets, olivier.matz, konstantin.ananyev,
	Neil Horman, Cunming Liang

Because of the inline accessor + static declaration in rte_gettid(),
we end up with multiple symbols for RTE_PER_LCORE(_thread_id).
Each compilation unit will pay a cost when accessing this information
for the first time.

$ nm build/app/dpdk-testpmd | grep per_lcore__thread_id
0000000000000054 d per_lcore__thread_id.5037
0000000000000040 d per_lcore__thread_id.5103
0000000000000048 d per_lcore__thread_id.5259
000000000000004c d per_lcore__thread_id.5259
0000000000000044 d per_lcore__thread_id.5933
0000000000000058 d per_lcore__thread_id.6261
0000000000000050 d per_lcore__thread_id.7378
000000000000005c d per_lcore__thread_id.7496
000000000000000c d per_lcore__thread_id.8016
0000000000000010 d per_lcore__thread_id.8431

Make it global as part of the DPDK_21 stable ABI.

Fixes: ef76436c6834 ("eal: get unique thread id")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
---
 lib/librte_eal/common/eal_common_thread.c | 1 +
 lib/librte_eal/include/rte_eal.h          | 3 ++-
 lib/librte_eal/rte_eal_version.map        | 7 +++++++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c
index 7be80c292e..fd13453fee 100644
--- a/lib/librte_eal/common/eal_common_thread.c
+++ b/lib/librte_eal/common/eal_common_thread.c
@@ -22,6 +22,7 @@
 #include "eal_thread.h"
 
 RTE_DEFINE_PER_LCORE(unsigned int, _lcore_id) = LCORE_ID_ANY;
+RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
 static RTE_DEFINE_PER_LCORE(unsigned int, _socket_id) =
 	(unsigned int)SOCKET_ID_ANY;
 static RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);
diff --git a/lib/librte_eal/include/rte_eal.h b/lib/librte_eal/include/rte_eal.h
index 2f9ed298de..2edf8c6556 100644
--- a/lib/librte_eal/include/rte_eal.h
+++ b/lib/librte_eal/include/rte_eal.h
@@ -447,6 +447,8 @@ enum rte_intr_mode rte_eal_vfio_intr_mode(void);
  */
 int rte_sys_gettid(void);
 
+RTE_DECLARE_PER_LCORE(int, _thread_id);
+
 /**
  * Get system unique thread id.
  *
@@ -456,7 +458,6 @@ int rte_sys_gettid(void);
  */
 static inline int rte_gettid(void)
 {
-	static RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
 	if (RTE_PER_LCORE(_thread_id) == -1)
 		RTE_PER_LCORE(_thread_id) = rte_sys_gettid();
 	return RTE_PER_LCORE(_thread_id);
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 196eef5afa..0d42d44ce9 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -221,6 +221,13 @@ DPDK_20.0 {
 	local: *;
 };
 
+DPDK_21 {
+	global:
+
+	per_lcore__thread_id;
+
+} DPDK_20.0;
+
 EXPERIMENTAL {
 	global:
 
-- 
2.23.0


^ permalink raw reply	[relevance 3%]
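
For context, the underlying C behaviour the patch works around, reduced
to a generic sketch (nothing DPDK-specific; the value 42 is a
placeholder): a static TLS variable defined inside a static inline
function in a header is instantiated once per compilation unit.

/* header.h -- sketch only */
static inline int get_value(void)
{
	/* 'static' inside a header inline function: every .c file that
	 * includes this header gets its own TLS slot, hence the multiple
	 * per_lcore__thread_id.NNNN symbols seen with nm above. */
	static __thread int cached = -1;

	if (cached == -1)
		cached = 42;
	return cached;
}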

* Re: [dpdk-dev] [PATCH v7 1/3] eal: disable function versioning on Windows
  2020-07-06 11:32  5%     ` [dpdk-dev] [PATCH v7 1/3] eal: disable function versioning " Fady Bader
@ 2020-07-06 12:22  0%       ` Bruce Richardson
  2020-07-06 23:16  0%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2020-07-06 12:22 UTC (permalink / raw)
  To: Fady Bader
  Cc: dev, thomas, tbashar, talshn, yohadt, dmitry.kozliuk,
	harini.ramakrishnan, ocardona, pallavi.kadam, ranjit.menon,
	olivier.matz, arybchenko, mdr, nhorman

On Mon, Jul 06, 2020 at 02:32:39PM +0300, Fady Bader wrote:
> Function versioning implementation is not supported by Windows.
> Function versioning is disabled on Windows.
> 
> Signed-off-by: Fady Bader <fady@mellanox.com>
> ---
>  doc/guides/windows_gsg/intro.rst | 4 ++++
>  lib/meson.build                  | 6 +++++-
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/guides/windows_gsg/intro.rst b/doc/guides/windows_gsg/intro.rst
> index a0285732df..58c6246404 100644
> --- a/doc/guides/windows_gsg/intro.rst
> +++ b/doc/guides/windows_gsg/intro.rst
> @@ -18,3 +18,7 @@ DPDK for Windows is currently a work in progress. Not all DPDK source files
>  compile. Support is being added in pieces so as to limit the overall scope
>  of any individual patch series. The goal is to be able to run any DPDK
>  application natively on Windows.
> +
> +The :doc:`../contributing/abi_policy` cannot be respected for Windows.
> +Minor ABI versions may be incompatible
> +because function versioning is not supported on Windows.
> diff --git a/lib/meson.build b/lib/meson.build
> index c1b9e1633f..dadf151f78 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -107,6 +107,10 @@ foreach l:libraries
>  			shared_dep = declare_dependency(include_directories: includes)
>  			static_dep = shared_dep
>  		else
> +			if is_windows and use_function_versioning
> +				message('@0@: Function versioning is not supported by Windows.'
> +				.format(name))
> +			endif
>  

This is ok here, but I think it might be better just moved to somewhere
like config/meson.build, so that it is always just printed once for each
build. I don't see an issue with having it printed even if there is no
function versioning in the build itself.

>  			if use_function_versioning
>  				cflags += '-DRTE_USE_FUNCTION_VERSIONING'
> @@ -138,7 +142,7 @@ foreach l:libraries
>  					include_directories: includes,
>  					dependencies: static_deps)
>  
> -			if not use_function_versioning
> +			if not use_function_versioning or is_windows
>  				# use pre-build objects to build shared lib
>  				sources = []
>  				objs += static_lib.extract_all_objects(recursive: false)
> -- 
> 2.16.1.windows.4
> 

With or without the code move above, which is just a suggestion,

Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v7 1/3] eal: disable function versioning on Windows
  @ 2020-07-06 11:32  5%     ` Fady Bader
  2020-07-06 12:22  0%       ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Fady Bader @ 2020-07-06 11:32 UTC (permalink / raw)
  To: dev
  Cc: thomas, tbashar, talshn, yohadt, dmitry.kozliuk,
	harini.ramakrishnan, ocardona, pallavi.kadam, ranjit.menon,
	olivier.matz, arybchenko, mdr, nhorman

Function versioning implementation is not supported by Windows.
Function versioning is disabled on Windows.

Signed-off-by: Fady Bader <fady@mellanox.com>
---
 doc/guides/windows_gsg/intro.rst | 4 ++++
 lib/meson.build                  | 6 +++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/doc/guides/windows_gsg/intro.rst b/doc/guides/windows_gsg/intro.rst
index a0285732df..58c6246404 100644
--- a/doc/guides/windows_gsg/intro.rst
+++ b/doc/guides/windows_gsg/intro.rst
@@ -18,3 +18,7 @@ DPDK for Windows is currently a work in progress. Not all DPDK source files
 compile. Support is being added in pieces so as to limit the overall scope
 of any individual patch series. The goal is to be able to run any DPDK
 application natively on Windows.
+
+The :doc:`../contributing/abi_policy` cannot be respected for Windows.
+Minor ABI versions may be incompatible
+because function versioning is not supported on Windows.
diff --git a/lib/meson.build b/lib/meson.build
index c1b9e1633f..dadf151f78 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -107,6 +107,10 @@ foreach l:libraries
 			shared_dep = declare_dependency(include_directories: includes)
 			static_dep = shared_dep
 		else
+			if is_windows and use_function_versioning
+				message('@0@: Function versioning is not supported by Windows.'
+				.format(name))
+			endif
 
 			if use_function_versioning
 				cflags += '-DRTE_USE_FUNCTION_VERSIONING'
@@ -138,7 +142,7 @@ foreach l:libraries
 					include_directories: includes,
 					dependencies: static_deps)
 
-			if not use_function_versioning
+			if not use_function_versioning or is_windows
 				# use pre-build objects to build shared lib
 				sources = []
 				objs += static_lib.extract_all_objects(recursive: false)
-- 
2.16.1.windows.4


^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v2 4/4] eventdev: relax smp barriers with c11 atomics
  @ 2020-07-06 10:04  4%     ` Thomas Monjalon
  2020-07-06 15:32  0%       ` Phil Yang
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2020-07-06 10:04 UTC (permalink / raw)
  To: Phil Yang
  Cc: erik.g.carrillo, dev, jerinj, Honnappa.Nagarahalli, drc,
	Ruifeng.Wang, Dharmik.Thakkar, nd, david.marchand, mdr,
	Neil Horman, Dodji Seketeli

02/07/2020 07:26, Phil Yang:
> The implementation-specific opaque data is shared between arm and cancel
> operations. The state flag acts as a guard variable to make sure the
> update of opaque data is synchronized. This patch uses c11 atomics with
> explicit one way memory barrier instead of full barriers rte_smp_w/rmb()
> to synchronize the opaque data between timer arm and cancel threads.

I think we should write C11 (uppercase).

Please, in your explanations, try to be more specific.
Naming fields may help to make things clear.

[...]
> --- a/lib/librte_eventdev/rte_event_timer_adapter.h
> +++ b/lib/librte_eventdev/rte_event_timer_adapter.h
> @@ -467,7 +467,7 @@ struct rte_event_timer {
>  	 *  - op: RTE_EVENT_OP_NEW
>  	 *  - event_type: RTE_EVENT_TYPE_TIMER
>  	 */
> -	volatile enum rte_event_timer_state state;
> +	enum rte_event_timer_state state;
>  	/**< State of the event timer. */

Why do you remove the volatile keyword?
It is not explained in the commit log.

This change is triggering a warning in the ABI check:
http://mails.dpdk.org/archives/test-report/2020-July/140440.html
Moving from volatile to non-volatile is probably not an issue.
I expect the code generated for the volatile case to work the same
in non-volatile case. Do you confirm?

In any case, we need an explanation and an ABI check exception.



^ permalink raw reply	[relevance 4%]
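
For reference, the shape of the change under review, reduced to a
hedged sketch (not the actual patch; opaque_value and its later use are
placeholders): the arm path publishes the implementation-specific data
with a one-way release store on the state flag, and the cancel path
pairs it with an acquire load, replacing the full
rte_smp_wmb()/rte_smp_rmb() barriers.

#include <rte_event_timer_adapter.h>

/* Sketch only. */
static void
arm_publish(struct rte_event_timer *evtim, uint64_t opaque_value)
{
	evtim->impl_opaque[0] = opaque_value;
	/* One-way barrier: the write above is visible before the state
	 * change; replaces rte_smp_wmb() followed by a plain store. */
	__atomic_store_n(&evtim->state, RTE_EVENT_TIMER_ARMED,
			 __ATOMIC_RELEASE);
}

static uint64_t
cancel_read(struct rte_event_timer *evtim)
{
	/* Pairs with the release store: the opaque read below cannot be
	 * reordered before the state check; replaces rte_smp_rmb(). */
	if (__atomic_load_n(&evtim->state, __ATOMIC_ACQUIRE) !=
			RTE_EVENT_TIMER_ARMED)
		return 0;
	return evtim->impl_opaque[0];
}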

* Re: [dpdk-dev] [pull-request] next-eventdev 20.08 RC1
  @ 2020-07-06  9:57  3% ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2020-07-06  9:57 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran; +Cc: dev, phil.yang

05/07/2020 05:41, Jerin Jacob Kollanukkaran:
>   http://dpdk.org/git/next/dpdk-next-eventdev
> 
> ----------------------------------------------------------------
> Harman Kalra (1):
>       event/octeontx: fix memory corruption
> 
> Harry van Haaren (1):
>       examples/eventdev_pipeline: fix 32-bit coremask logic
> 
> Pavan Nikhilesh (3):
>       event/octeontx2: fix device reconfigure
>       event/octeontx2: fix sub event type violation
>       event/octeontx2: improve datapath memory locality

Pulled patches above.

> Phil Yang (4):
>       eventdev: fix race condition on timer list counter
>       eventdev: use c11 atomics for lcore timer armed flag
>       eventdev: remove redundant code
>       eventdev: relax smp barriers with c11 atomics

I cannot merge this C11 series because of an ABI breakage:
	http://mails.dpdk.org/archives/test-report/2020-July/140440.html



^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [dpdk-techboard] [PATCH v3 0/3] Experimental/internal libraries cleanup
  2020-07-06  8:02  3%     ` [dpdk-dev] [dpdk-techboard] " Bruce Richardson
@ 2020-07-06  8:12  0%       ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2020-07-06  8:12 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: David Marchand, dev, honnappa.nagarahalli, techboard, Jiayu Hu,
	Yipeng Wang, Sameh Gobriel, Vladimir Medvedkin, Nipun Gupta,
	Hemant Agrawal

06/07/2020 10:02, Bruce Richardson:
> On Sun, Jul 05, 2020 at 09:55:41PM +0200, Thomas Monjalon wrote:
> > +Cc maintainers of the problematic libraries:
> > 	- librte_fib
> > 	- librte_rib
> > 	- librte_gro
> > 	- librte_member
> > 	- librte_rawdev
> > 
> > 26/06/2020 10:16, David Marchand:
> > > Following discussions on the mailing list and the 05/20 TB meeting, here
> > > is a series that drops the special versioning for non stable libraries.
> > > 
> > > Two notes:
> > > 
> > > - RIB/FIB library is not referenced in the API doxygen index, is this
> > >   intentional?
> > 
> > Vladimir please, could you fix the miss in the doxygen index?
> > 
> > > - I inspected MAINTAINERS: librte_gro, librte_member and librte_rawdev are
> > >   announced as experimental while their functions are part of the 20
> > >   stable ABI (in .map files + no __rte_experimental marking).
> > >   Their fate must be discussed.
> > 
> > I would suggest removing EXPERIMENTAL flag for gro, member and rawdev.
> > They are probably already considered stable for a lot of users.
> > Maintainers, are you OK to follow the ABI compatibility rules
> > for these libraries? Do you feel these libraries are mature enough?
> >
> 
> I think things being added to the official ABI is good. For these, I wonder
> if waiting till the 20.11 release is the best time to officially mark them
> as stable, rather than doing so now? 

They are already not marked as experimental symbols...
I think we should remove confusion in the MAINTAINERS file.



^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] mbuf: use c11 atomics for refcnt operations
  2020-07-03 15:38  3% ` David Marchand
@ 2020-07-06  8:03  3%   ` Phil Yang
  0 siblings, 0 replies; 200+ results
From: Phil Yang @ 2020-07-06  8:03 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, Olivier Matz, David Christensen, Honnappa Nagarahalli,
	Ruifeng Wang, nd

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Friday, July 3, 2020 11:39 PM
> To: Phil Yang <Phil.Yang@arm.com>
> Cc: dev <dev@dpdk.org>; Olivier Matz <olivier.matz@6wind.com>; David
> Christensen <drc@linux.vnet.ibm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> Subject: Re: [dpdk-dev] [PATCH] mbuf: use c11 atomics for refcnt operations
> 
> On Thu, Jun 11, 2020 at 12:26 PM Phil Yang <phil.yang@arm.com> wrote:
> >
> > Use c11 atomics with explicit ordering instead of rte_atomic ops which
> > enforce unnecessary barriers on aarch64.
> >
> > Signed-off-by: Phil Yang <phil.yang@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> 
> I did not look at the details, but this patch is refused by the ABI
> check in Travis.

Thanks, David.
The ABI issue is that the name 'rte_mbuf_ext_shared_info::refcnt_atomic' was changed to 'rte_mbuf_ext_shared_info::refcnt' in rte_mbuf_core.h.
I made this change just to simplify the name of the variable.

Reverting 'rte_mbuf_ext_shared_info::refcnt' back to refcnt_atomic will fix this issue.
I will update it in v2.

Thanks,
Phil

> 
> 
> --
> David Marchand


^ permalink raw reply	[relevance 3%]
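
A rough sketch of the direction of the v2 (keeping the exported field
name is the ABI-relevant part; the helper names and the memory orders
shown are assumptions for illustration, not the actual patch):

#include <stdint.h>

/* Sketch only: refcount updated through C11 builtins instead of
 * rte_atomic16 ops, avoiding full barriers on aarch64. */
static inline uint16_t
ext_refcnt_update(uint16_t *refcnt, int16_t value)
{
	return __atomic_add_fetch(refcnt, (uint16_t)value,
				  __ATOMIC_ACQ_REL);
}

static inline uint16_t
ext_refcnt_read(const uint16_t *refcnt)
{
	return __atomic_load_n(refcnt, __ATOMIC_RELAXED);
}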

* Re: [dpdk-dev] [dpdk-techboard] [PATCH v3 0/3] Experimental/internal libraries cleanup
  2020-07-05 19:55  3%   ` [dpdk-dev] " Thomas Monjalon
@ 2020-07-06  8:02  3%     ` Bruce Richardson
  2020-07-06  8:12  0%       ` Thomas Monjalon
  2020-07-06 16:57  0%     ` [dpdk-dev] " Medvedkin, Vladimir
  1 sibling, 1 reply; 200+ results
From: Bruce Richardson @ 2020-07-06  8:02 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: David Marchand, dev, honnappa.nagarahalli, techboard, Jiayu Hu,
	Yipeng Wang, Sameh Gobriel, Vladimir Medvedkin, Nipun Gupta,
	Hemant Agrawal

On Sun, Jul 05, 2020 at 09:55:41PM +0200, Thomas Monjalon wrote:
> +Cc maintainers of the problematic libraries:
> 	- librte_fib
> 	- librte_rib
> 	- librte_gro
> 	- librte_member
> 	- librte_rawdev
> 
> 26/06/2020 10:16, David Marchand:
> > Following discussions on the mailing list and the 05/20 TB meeting, here
> > is a series that drops the special versioning for non stable libraries.
> > 
> > Two notes:
> > 
> > - RIB/FIB library is not referenced in the API doxygen index, is this
> >   intentional?
> 
> Vladimir please, could you fix the miss in the doxygen index?
> 
> > - I inspected MAINTAINERS: librte_gro, librte_member and librte_rawdev are
> >   announced as experimental while their functions are part of the 20
> >   stable ABI (in .map files + no __rte_experimental marking).
> >   Their fate must be discussed.
> 
> I would suggest removing EXPERIMENTAL flag for gro, member and rawdev.
> They are probably already considered stable for a lot of users.
> Maintainers, are you OK to follow the ABI compatibility rules
> for these libraries? Do you feel these libraries are mature enough?
>

I think things being added to the official ABI is good. For these, I wonder
if waiting till the 20.11 release is the best time to officially mark them
as stable, rather than doing so now? 

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v6 1/3] eal: disable function versioning on Windows
  2020-07-05 20:23  4%     ` Thomas Monjalon
@ 2020-07-06  7:02  0%       ` Fady Bader
  0 siblings, 0 replies; 200+ results
From: Fady Bader @ 2020-07-06  7:02 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, Tasnim Bashar, Tal Shnaiderman, Yohad Tor, dmitry.kozliuk,
	harini.ramakrishnan, ocardona, pallavi.kadam, ranjit.menon,
	olivier.matz, arybchenko, mdr, nhorman



> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Sunday, July 5, 2020 11:24 PM
> To: Fady Bader <fady@mellanox.com>
> Cc: dev@dpdk.org; Tasnim Bashar <tbashar@mellanox.com>; Tal Shnaiderman
> <talshn@mellanox.com>; Yohad Tor <yohadt@mellanox.com>;
> dmitry.kozliuk@gmail.com; harini.ramakrishnan@microsoft.com;
> ocardona@microsoft.com; pallavi.kadam@intel.com; ranjit.menon@intel.com;
> olivier.matz@6wind.com; arybchenko@solarflare.com; mdr@ashroe.eu;
> nhorman@tuxdriver.com
> Subject: Re: [dpdk-dev] [PATCH v6 1/3] eal: disable function versioning on
> Windows
> 
> 05/07/2020 15:47, Fady Bader:
> > Function versioning implementation is not supported by Windows.
> > Function versioning was disabled on Windows.
> 
> was -> is
> 
> > Signed-off-by: Fady Bader <fady@mellanox.com>
> > ---
> >  lib/librte_eal/include/rte_function_versioning.h | 2 +-
> >  lib/meson.build                                  | 5 +++++
> >  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> As suggested by Ray, we should add a note in the documentation about the ABI
> compatibility. Because we have no function versioning, we cannot ensure ABI
> compatibility on Windows.
> 
> I recommend adding this text in doc/guides/windows_gsg/intro.rst under
> "Limitations":
> "
> The :doc:`../contributing/abi_policy` cannot be respected for Windows.
> Minor ABI versions may be incompatible
> because function versioning is not supported on Windows.
> "

Ok, I'll send a new patch with the changes soon.

> 


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API
  2020-07-03 18:46  3%     ` Honnappa Nagarahalli
@ 2020-07-06  6:23  3%       ` Kinsella, Ray
  2020-07-07  3:19  3%         ` Feifei Wang
  0 siblings, 1 reply; 200+ results
From: Kinsella, Ray @ 2020-07-06  6:23 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Feifei Wang, Konstantin Ananyev, Neil Horman
  Cc: dev, nd



On 03/07/2020 19:46, Honnappa Nagarahalli wrote:
> <snip>
> 
>>
>> On 03/07/2020 11:26, Feifei Wang wrote:
>>> Remove the experimental tag for rte_ring_reset API that have been
>>> around for 4 releases.
>>>
>>> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
>>> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
>>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>> ---
>>>  lib/librte_ring/rte_ring.h           | 3 ---
>>>  lib/librte_ring/rte_ring_version.map | 4 +---
>>>  2 files changed, 1 insertion(+), 6 deletions(-)
>>>
>>> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
>>> index f67141482..7181c33b4 100644
>>> --- a/lib/librte_ring/rte_ring.h
>>> +++ b/lib/librte_ring/rte_ring.h
>>> @@ -663,15 +663,12 @@ rte_ring_dequeue(struct rte_ring *r, void **obj_p)
>>>   *
>>>   * This function flush all the elements in a ring
>>>   *
>>> - * @b EXPERIMENTAL: this API may change without prior notice
>>> - *
>>>   * @warning
>>>   * Make sure the ring is not in use while calling this function.
>>>   *
>>>   * @param r
>>>   *   A pointer to the ring structure.
>>>   */
>>> -__rte_experimental
>>>  void
>>>  rte_ring_reset(struct rte_ring *r);
>>>
>>> diff --git a/lib/librte_ring/rte_ring_version.map
>>> b/lib/librte_ring/rte_ring_version.map
>>> index e88c143cf..aec6f3820 100644
>>> --- a/lib/librte_ring/rte_ring_version.map
>>> +++ b/lib/librte_ring/rte_ring_version.map
>>> @@ -8,6 +8,7 @@ DPDK_20.0 {
>>>  	rte_ring_init;
>>>  	rte_ring_list_dump;
>>>  	rte_ring_lookup;
>>> +	rte_ring_reset;
>>>
>>>  	local: *;
>>>  };
>>> @@ -15,9 +16,6 @@ DPDK_20.0 {
>>>  EXPERIMENTAL {
>>>  	global:
>>>
>>> -	# added in 19.08
>>> -	rte_ring_reset;
>>> -
>>>  	# added in 20.02
>>>  	rte_ring_create_elem;
>>>  	rte_ring_get_memsize_elem;
>>
>> So strictly speaking, rte_ring_reset is part of the DPDK_21 ABI, not the v20.0
>> ABI.
> Thanks Ray for clarifying this.
> 
>>
>> The way to solve is to add it the DPDK_21 ABI in the map file.
>> And then use the VERSION_SYMBOL_EXPERIMENTAL to alias to experimental
>> if necessary.
> Is using VERSION_SYMBOL_EXPERIMENTAL a must? 

Purely at the discretion of the contributor and maintainer. 
If it has been around for a while, applications are using it and changing the symbol will break them.

You may choose to provide the alias or not. 

> The documentation also seems to be vague. It says " The macro is used when a symbol matures to become part of the stable ABI, to provide an alias to experimental for some time". What does 'some time' mean?

"Some time" is a bit vague alright, should be "until the next major ABI version" - I will fix. 

> 
>>
>> https://doc.dpdk.org/guides/contributing/abi_versioning.html#versioning-
>> macros

^ permalink raw reply	[relevance 3%]
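
A hedged sketch of what the alias could look like for rte_ring_reset,
loosely following the versioning guide linked above; the exact macro
arguments, the __vsym wrapper and the duplicated map-file entry should
all be checked against rte_function_versioning.h and the guide:

/* rte_ring.c -- sketch only, not a proposed patch. */
void
rte_ring_reset(struct rte_ring *r)
{
	/* existing implementation, now exported in the DPDK_21 node */
}

/* Optional alias so binaries linked against rte_ring_reset@EXPERIMENTAL
 * keep working until the next major ABI version.  The map file would
 * then list rte_ring_reset under both DPDK_21 and EXPERIMENTAL. */
__vsym void
rte_ring_reset_e20(struct rte_ring *r)
{
	rte_ring_reset(r);
}
VERSION_SYMBOL_EXPERIMENTAL(rte_ring_reset, _e20);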

* Re: [dpdk-dev] [PATCH v6 1/3] eal: disable function versioning on Windows
  @ 2020-07-05 20:23  4%     ` Thomas Monjalon
  2020-07-06  7:02  0%       ` Fady Bader
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2020-07-05 20:23 UTC (permalink / raw)
  To: Fady Bader
  Cc: dev, tbashar, talshn, yohadt, dmitry.kozliuk,
	harini.ramakrishnan, ocardona, pallavi.kadam, ranjit.menon,
	olivier.matz, arybchenko, mdr, nhorman

05/07/2020 15:47, Fady Bader:
> Function versioning implementation is not supported by Windows.
> Function versioning was disabled on Windows.

was -> is

> Signed-off-by: Fady Bader <fady@mellanox.com>
> ---
>  lib/librte_eal/include/rte_function_versioning.h | 2 +-
>  lib/meson.build                                  | 5 +++++
>  2 files changed, 6 insertions(+), 1 deletion(-)

As suggested by Ray, we should add a note in the documentation
about the ABI compatibility. Because we have no function versioning,
we cannot ensure ABI compatibility on Windows.

I recommend adding this text in doc/guides/windows_gsg/intro.rst
under "Limitations":
"
The :doc:`../contributing/abi_policy` cannot be respected for Windows.
Minor ABI versions may be incompatible
because function versioning is not supported on Windows.
"



^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v3 0/3] Experimental/internal libraries cleanup
  2020-06-26  8:16  5% ` [dpdk-dev] [PATCH v3 0/3] Experimental/internal libraries cleanup David Marchand
                     ` (3 preceding siblings ...)
  2020-06-26  9:25  0%   ` [dpdk-dev] [dpdk-techboard] [PATCH v3 0/3] Experimental/internal libraries cleanup Bruce Richardson
@ 2020-07-05 19:55  3%   ` Thomas Monjalon
  2020-07-06  8:02  3%     ` [dpdk-dev] [dpdk-techboard] " Bruce Richardson
  2020-07-06 16:57  0%     ` [dpdk-dev] " Medvedkin, Vladimir
  4 siblings, 2 replies; 200+ results
From: Thomas Monjalon @ 2020-07-05 19:55 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, honnappa.nagarahalli, techboard, Jiayu Hu, Yipeng Wang,
	Sameh Gobriel, Vladimir Medvedkin, Nipun Gupta, Hemant Agrawal

+Cc maintainers of the problematic libraries:
	- librte_fib
	- librte_rib
	- librte_gro
	- librte_member
	- librte_rawdev

26/06/2020 10:16, David Marchand:
> Following discussions on the mailing list and the 05/20 TB meeting, here
> is a series that drops the special versioning for non stable libraries.
> 
> Two notes:
> 
> - RIB/FIB library is not referenced in the API doxygen index, is this
>   intentional?

Vladimir please, could you fix the miss in the doxygen index?

> - I inspected MAINTAINERS: librte_gro, librte_member and librte_rawdev are
>   announced as experimental while their functions are part of the 20
>   stable ABI (in .map files + no __rte_experimental marking).
>   Their fate must be discussed.

I would suggest removing EXPERIMENTAL flag for gro, member and rawdev.
They are probably already considered stable for a lot of users.
Maintainers, are you OK to follow the ABI compatibility rules
for these libraries? Do you feel these libraries are mature enough?




^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC] ethdev: add fragment attribute to IPv6 item
    @ 2020-07-05 13:13  0%       ` Andrew Rybchenko
  1 sibling, 0 replies; 200+ results
From: Andrew Rybchenko @ 2020-07-05 13:13 UTC (permalink / raw)
  To: Adrien Mazarguil, Ori Kam
  Cc: Dekel Peled, ferruh.yigit, john.mcnamara, marko.kovacevic,
	Asaf Penso, Matan Azrad, Eli Britstein, dev, Ivan Malov

On 6/2/20 10:04 PM, Adrien Mazarguil wrote:
> Hi Ori, Andrew, Delek,
> 
> (been a while eh?)
> 
> On Tue, Jun 02, 2020 at 06:28:41PM +0000, Ori Kam wrote:
>> Hi Andrew,
>>
>> PSB,
> [...]
>>>> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
>>>> index b0e4199..3bc8ce1 100644
>>>> --- a/lib/librte_ethdev/rte_flow.h
>>>> +++ b/lib/librte_ethdev/rte_flow.h
>>>> @@ -787,6 +787,8 @@ struct rte_flow_item_ipv4 {
>>>>   */
>>>>  struct rte_flow_item_ipv6 {
>>>>  	struct rte_ipv6_hdr hdr; /**< IPv6 header definition. */
>>>> +	uint32_t is_frag:1; /**< Is IPv6 packet fragmented/non-fragmented. */
>>>> +	uint32_t reserved:31; /**< Reserved, must be zero. */
>>>
>>> The solution is simple, but hardly generic and adds an
>>> example for the future extensions. I doubt that it is a
>>> right way to go.
>>>
>> I agree with you that this is not the most generic way possible,
>> but the IPV6 extensions are very unique. So the solution is also unique.
>> In general, I'm always in favor of finding the most generic way, but sometimes
>> it is better to keep things simple, and see how it goes.
> 
> Same feeling here, it doesn't look right.
> 
>>> May be we should add 256-bit string with one bit for each
>>> IP protocol number and apply it to extension headers only?
>>> If bit A is set in the mask:
>>>  - if bit A is set in spec as well, extension header with
>>>    IP protocol (1 << A) number must present
>>>  - if bit A is clear in spec, extension header with
>>>    IP protocol (1 << A) number must absent
>>> If bit is clear in the mask, corresponding extension header
>>> may present and may absent (i.e. don't care).
>>>
>> There are only 12 possible extension headers and currently none of them
>> are supported in rte_flow. So adding a logic to parse the 256 just to get a max of 12 
>> possible values is an overkill. Also, if we disregard the case of the extension, 
>> the application must select only one next proto. For example, the application
>> can't select udp + tcp. There is the option to add a flag for each of the
>> possible extensions, does it makes more sense to you?
> 
> Each of these extension headers has its own structure, we first need the
> ability to match them properly by adding the necessary pattern items.
> 
>>> The RFC indirectly touches IPv6 proto (next header) matching
>>> logic.
>>>
>>> If logic used in ETH+VLAN is applied on IPv6 as well, it would
>>> make pattern specification and handling complicated. E.g.:
>>>   eth / ipv6 / udp / end
>>> should match UDP over IPv6 without any extension headers only.
>>>
>> The issue with VLAN I agree is different since by definition VLAN is 
>> layer 2.5. We can add the same logic also to the VLAN case, maybe it will
>> be easier. 
>> In any case, in your example above and according to the RFC we will
>> get all ipv6 udp traffic with and without extensions.
>>
>>> And how to specify UPD over IPv6 regardless extension headers?
>>
>> Please see above the rule will be eth / ipv6 /udp.
>>
>>>   eth / ipv6 / ipv6_ext / udp / end
>>> with a convention that ipv6_ext is optional if spec and mask
>>> are NULL (or mask is empty).
>>>
>> I would guess that this flow should match all ipv6 that has one ext and the next 
>> proto is udp.
> 
> In my opinion RTE_FLOW_ITEM_TYPE_IPV6_EXT is a bit useless on its own. It's
> only for matching packets that contain some kind of extension header, not a
> specific one, more about that below.
> 
>>> I'm wondering if any driver treats it this way?
>>>
>> I'm not sure, we can support only the frag ext by default, but if required we can support other 
>> ext.
>>  
>>> I agree that the problem really comes when we'd like match
>>> IPv6 frags or even worse not fragments.
>>>
>>> Two patterns for fragments:
>>>   eth / ipv6 (proto=FRAGMENT) / end
>>>   eth / ipv6 / ipv6_ext (next_hdr=FRAGMENT) / end
>>>
>>> Any sensible solution for not-fragments with any other
>>> extension headers?
>>>
>> The one propose in this mail 😊 
>>
>>> INVERT exists, but hardly useful, since it simply says
>>> that patches which do not match pattern without INVERT
>>> matches the pattern with INVERT and
>>>   invert / eth / ipv6 (proto=FRAGMENT) / end
>>> will match ARP, IPv4, IPv6 with an extension header before
>>> fragment header and so on.
>>>
>> I agree with you, INVERT in this doesn’t help.
>> We were considering adding some kind of not mask / item per item.
>> some think around this line:
>> user request ipv6 unfragmented udp packets. The flow would look something
>> like this:
>> Eth / ipv6 / Not (Ipv6.proto = frag_proto) / udp
>> But it makes the rules much harder to use, and I don't think that there
>> is any HW that support not, and adding such feature to all items is overkill.
>>
>>  
>>> Bit string suggested above will allow to match:
>>>  - UDP over IPv6 with any extension headers:
>>>     eth / ipv6 (ext_hdrs mask empty) / udp / end
>>>  - UDP over IPv6 without any extension headers:
>>>     eth / ipv6 (ext_hdrs mask full, spec empty) / udp / end
>>>  - UDP over IPv6 without fragment header:
>>>     eth / ipv6 (ext.spec & ~FRAGMENT, ext.mask | FRAGMENT) / udp / end
>>>  - UDP over IPv6 with fragment header
>>>     eth / ipv6 (ext.spec | FRAGMENT, ext.mask | FRAGMENT) / udp / end
>>>
>>> where FRAGMENT is 1 << IPPROTO_FRAGMENT.
>>>
>> Please see my response regarding this above.
>>
>>> Above I intentionally keep 'proto' unspecified in ipv6
>>> since otherwise it would specify the next header after IPv6
>>> header.
>>>
>>> Extension headers mask should be empty by default.
> 
> This is a deliberate design choice/issue with rte_flow: an empty pattern
> matches everything; adding items only narrows the selection. As Andrew said
> there is currently no way to provide a specific item to reject, it can only
> be done globally on a pattern through INVERT that no PMD implements so far.
> 
> So we have two requirements here: the ability to specifically match IPv6
> fragment headers and the ability to reject them.
> 

I think one of the key requirements here is the ability to say
that an extension header may appear anywhere (or that no extension
header appears anywhere), since specifying it as a pattern item
implies a fixed order of items, while other extension headers could
come before the fragment header, or between it and the UDP header.

> To match IPv6 fragment headers, we need a dedicated pattern item. The
> generic RTE_FLOW_ITEM_TYPE_IPV6_EXT is useless for that on its own, it must
> be completed with RTE_FLOW_ITEM_TYPE_IPV6_EXT_FRAG and associated object
> to match individual fields if needed (like all the others
> protocols/headers).
> 

Yes, I agree, but it is strictly required only if we want to match
on fragment header content or pin it to an exact place in the chain
of next protocols.

> Then to reject a pattern item... My preference goes to a new "NOT" meta item
> affecting the meaning of the item coming immediately after in the pattern
> list. That would be ultra generic, wouldn't break any ABI/API and like
> INVERT, wouldn't even require a new object associated with it.
> 

Yes, that's true, but I'm not sure it is easy to do in HW.
Also, the *NOT* scope could in fact be per item field, not the whole
item. It sounds like it is getting more and more complicated.

> To match UDPv6 traffic when there is no fragment header, one could then do
> something like:
> 
>  eth / ipv6 / not / ipv6_ext_frag / udp
> 
> PMD support would be trivial to implement (I'm sure!)
> 

The problem is the interpretation of the above pattern.
Strictly speaking, only UDP packets with exactly one
non-fragment extension header match the pattern.
What about packets without any extension headers?
Or packets with two (or more) extension headers where the first
one is not a fragment header?

> We may later implement other kinds of "operator" items as Andrew suggested,
> for bit-wise stuff and so on. Let's keep adding features on a needed basis
> though.
> 

Thanks,
Andrew.

^ permalink raw reply	[relevance 0%]
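
To make the proposal concrete, a hedged sketch of how the discussed
pattern eth / ipv6 / not / ipv6_ext_frag / udp could be written;
RTE_FLOW_ITEM_TYPE_NOT and RTE_FLOW_ITEM_TYPE_IPV6_EXT_FRAG are the
items under discussion and do not exist in the current API:

/* Hypothetical: UDP over IPv6 with no fragment extension header.
 * The open question above (other extension headers, or none at all?)
 * applies to this pattern as well. */
struct rte_flow_item pattern[] = {
	{ .type = RTE_FLOW_ITEM_TYPE_ETH },
	{ .type = RTE_FLOW_ITEM_TYPE_IPV6 },
	{ .type = RTE_FLOW_ITEM_TYPE_NOT },           /* negates the next item only */
	{ .type = RTE_FLOW_ITEM_TYPE_IPV6_EXT_FRAG }, /* spec/mask NULL: any fragment header */
	{ .type = RTE_FLOW_ITEM_TYPE_UDP },
	{ .type = RTE_FLOW_ITEM_TYPE_END },
};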

* Re: [dpdk-dev] [PATCH v5 1/3] lib/lpm: integrate RCU QSBR
    @ 2020-07-04 17:00  3%       ` Ruifeng Wang
  1 sibling, 0 replies; 200+ results
From: Ruifeng Wang @ 2020-07-04 17:00 UTC (permalink / raw)
  To: David Marchand, Vladimir Medvedkin, Bruce Richardson
  Cc: John McNamara, Marko Kovacevic, Ray Kinsella, Neil Horman, dev,
	Ananyev, Konstantin, Honnappa Nagarahalli, nd, nd

Hi David,

Sorry, I lost track of this thread.

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Monday, June 29, 2020 7:56 PM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; Vladimir Medvedkin
> <vladimir.medvedkin@intel.com>; Bruce Richardson
> <bruce.richardson@intel.com>
> Cc: John McNamara <john.mcnamara@intel.com>; Marko Kovacevic
> <marko.kovacevic@intel.com>; Ray Kinsella <mdr@ashroe.eu>; Neil Horman
> <nhorman@tuxdriver.com>; dev <dev@dpdk.org>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v5 1/3] lib/lpm: integrate RCU QSBR
> 
> On Mon, Jun 29, 2020 at 10:03 AM Ruifeng Wang <ruifeng.wang@arm.com>
> wrote:
> >
> > Currently, the tbl8 group is freed even though the readers might be
> > using the tbl8 group entries. The freed tbl8 group can be reallocated
> > quickly. This results in incorrect lookup results.
> >
> > RCU QSBR process is integrated for safe tbl8 group reclaim.
> > Refer to RCU documentation to understand various aspects of
> > integrating RCU library into other libraries.
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > ---
> >  doc/guides/prog_guide/lpm_lib.rst  |  32 +++++++
> >  lib/librte_lpm/Makefile            |   2 +-
> >  lib/librte_lpm/meson.build         |   1 +
> >  lib/librte_lpm/rte_lpm.c           | 129 ++++++++++++++++++++++++++---
> >  lib/librte_lpm/rte_lpm.h           |  59 +++++++++++++
> >  lib/librte_lpm/rte_lpm_version.map |   6 ++
> >  6 files changed, 216 insertions(+), 13 deletions(-)
> >
> > diff --git a/doc/guides/prog_guide/lpm_lib.rst
> > b/doc/guides/prog_guide/lpm_lib.rst
> > index 1609a57d0..7cc99044a 100644
> > --- a/doc/guides/prog_guide/lpm_lib.rst
> > +++ b/doc/guides/prog_guide/lpm_lib.rst
> > @@ -145,6 +145,38 @@ depending on whether we need to move to the
> next table or not.
> >  Prefix expansion is one of the keys of this algorithm,  since it
> > improves the speed dramatically by adding redundancy.
> >
> > +Deletion
> > +~~~~~~~~
> > +
> > +When deleting a rule, a replacement rule is searched for. Replacement
> > +rule is an existing rule that has the longest prefix match with the rule to be
> deleted, but has smaller depth.
> > +
> > +If a replacement rule is found, target tbl24 and tbl8 entries are
> > +updated to have the same depth and next hop value with the
> replacement rule.
> > +
> > +If no replacement rule can be found, target tbl24 and tbl8 entries will be
> cleared.
> > +
> > +Prefix expansion is performed if the rule's depth is not exactly 24 bits or
> 32 bits.
> > +
> > +After deleting a rule, a group of tbl8s that belongs to the same tbl24 entry
> are freed in following cases:
> > +
> > +*   All tbl8s in the group are empty .
> > +
> > +*   All tbl8s in the group have the same values and with depth no greater
> than 24.
> > +
> > +Free of tbl8s have different behaviors:
> > +
> > +*   If RCU is not used, tbl8s are cleared and reclaimed immediately.
> > +
> > +*   If RCU is used, tbl8s are reclaimed when readers are in quiescent state.
> > +
> > +When the LPM is not using RCU, tbl8 group can be freed immediately
> > +even though the readers might be using the tbl8 group entries. This might
> result in incorrect lookup results.
> > +
> > +RCU QSBR process is integrated for safe tbl8 group reclaimation.
> > +Application has certain responsibilities while using this feature.
> > +Please refer to resource reclaimation framework of :ref:`RCU library
> <RCU_Library>` for more details.
> > +
> 
> Would the lpm6 library benefit from the same?
> Asking as I do not see much code shared between lpm and lpm6.
> 
I didn't look into lpm6. It may need a separate RCU integration since, as you mentioned, there is no shared code between lpm and lpm6.

> [...]
> 
> > diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c index
> > 38ab512a4..41e9c49b8 100644
> > --- a/lib/librte_lpm/rte_lpm.c
> > +++ b/lib/librte_lpm/rte_lpm.c
> > @@ -1,5 +1,6 @@
> >  /* SPDX-License-Identifier: BSD-3-Clause
> >   * Copyright(c) 2010-2014 Intel Corporation
> > + * Copyright(c) 2020 Arm Limited
> >   */
> >
> >  #include <string.h>
> > @@ -245,13 +246,84 @@ rte_lpm_free(struct rte_lpm *lpm)
> >                 TAILQ_REMOVE(lpm_list, te, next);
> >
> >         rte_mcfg_tailq_write_unlock();
> > -
> > +#ifdef ALLOW_EXPERIMENTAL_API
> > +       if (lpm->dq)
> > +               rte_rcu_qsbr_dq_delete(lpm->dq); #endif
> 
> All DPDK code under lib/ is compiled with the ALLOW_EXPERIMENTAL_API
> flag set.
> There is no need to protect against this flag in rte_lpm.c.
> 
OK, I see. So DPDK libraries will always be compiled with ALLOW_EXPERIMENTAL_API; it is the application's
choice whether to use experimental APIs.
I will update the next version to remove the ALLOW_EXPERIMENTAL_API flag from rte_lpm.c and only keep the one in rte_lpm.h.

> [...]
> 
> > diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h index
> > b9d49ac87..7889f21b3 100644
> > --- a/lib/librte_lpm/rte_lpm.h
> > +++ b/lib/librte_lpm/rte_lpm.h
> 
> > @@ -130,6 +143,28 @@ struct rte_lpm {
> >                         __rte_cache_aligned; /**< LPM tbl24 table. */
> >         struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
> >         struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> > +#ifdef ALLOW_EXPERIMENTAL_API
> > +       /* RCU config. */
> > +       struct rte_rcu_qsbr *v;         /* RCU QSBR variable. */
> > +       enum rte_lpm_qsbr_mode rcu_mode;/* Blocking, defer queue. */
> > +       struct rte_rcu_qsbr_dq *dq;     /* RCU QSBR defer queue. */
> > +#endif
> > +};
> 
> This is more a comment/question for the lpm maintainers.
> 
> Afaics, the rte_lpm structure is exported/public because of lookup which is
> inlined.
> But most of the structure can be hidden and stored in a private structure that
> would embed the exposed rte_lpm.
> The slowpath functions would only have to translate from publicly exposed
> to internal representation (via container_of).
> 
> This patch could do this and be the first step to hide the unneeded exposure
> of other fields (later/in 20.11 ?).
> 
Hiding most of the structure is reasonable.
Since it will break the ABI, I can do that in 20.11.

> Thoughts?
> 
> 
> --
> David Marchand


^ permalink raw reply	[relevance 3%]
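
A rough sketch of the split discussed above (names and layout are
illustrative only; the real rework is deferred to 20.11): only what the
inlined lookup touches stays public, while control-path state moves to
a private structure recovered with container_of() on the slow path.

/* Sketch only -- not the actual 20.11 patch. */
struct rte_lpm {			/* public: used by inlined lookup */
	struct rte_lpm_tbl_entry *tbl24;
	struct rte_lpm_tbl_entry *tbl8;
};

struct __rte_lpm {			/* private: hidden from the ABI */
	struct rte_lpm lpm;		/* embedded public part */
	struct rte_lpm_rule *rules_tbl;
	struct rte_rcu_qsbr *v;		/* RCU state lives only here */
	struct rte_rcu_qsbr_dq *dq;
};

static inline struct __rte_lpm *
lpm_to_internal(struct rte_lpm *lpm)
{
	return container_of(lpm, struct __rte_lpm, lpm);
}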

* Re: [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API
  2020-07-03 16:16  4%   ` Kinsella, Ray
@ 2020-07-03 18:46  3%     ` Honnappa Nagarahalli
  2020-07-06  6:23  3%       ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2020-07-03 18:46 UTC (permalink / raw)
  To: Kinsella, Ray, Feifei Wang, Konstantin Ananyev, Neil Horman
  Cc: dev, nd, Honnappa Nagarahalli, nd

<snip>

> 
> On 03/07/2020 11:26, Feifei Wang wrote:
> > Remove the experimental tag for rte_ring_reset API that have been
> > around for 4 releases.
> >
> > Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> >  lib/librte_ring/rte_ring.h           | 3 ---
> >  lib/librte_ring/rte_ring_version.map | 4 +---
> >  2 files changed, 1 insertion(+), 6 deletions(-)
> >
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > index f67141482..7181c33b4 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -663,15 +663,12 @@ rte_ring_dequeue(struct rte_ring *r, void **obj_p)
> >   *
> >   * This function flush all the elements in a ring
> >   *
> > - * @b EXPERIMENTAL: this API may change without prior notice
> > - *
> >   * @warning
> >   * Make sure the ring is not in use while calling this function.
> >   *
> >   * @param r
> >   *   A pointer to the ring structure.
> >   */
> > -__rte_experimental
> >  void
> >  rte_ring_reset(struct rte_ring *r);
> >
> > diff --git a/lib/librte_ring/rte_ring_version.map
> > b/lib/librte_ring/rte_ring_version.map
> > index e88c143cf..aec6f3820 100644
> > --- a/lib/librte_ring/rte_ring_version.map
> > +++ b/lib/librte_ring/rte_ring_version.map
> > @@ -8,6 +8,7 @@ DPDK_20.0 {
> >  	rte_ring_init;
> >  	rte_ring_list_dump;
> >  	rte_ring_lookup;
> > +	rte_ring_reset;
> >
> >  	local: *;
> >  };
> > @@ -15,9 +16,6 @@ DPDK_20.0 {
> >  EXPERIMENTAL {
> >  	global:
> >
> > -	# added in 19.08
> > -	rte_ring_reset;
> > -
> >  	# added in 20.02
> >  	rte_ring_create_elem;
> >  	rte_ring_get_memsize_elem;
> 
> So strictly speaking, rte_ring_reset is part of the DPDK_21 ABI, not the v20.0
> ABI.
Thanks Ray for clarifying this.

> 
> The way to solve is to add it the DPDK_21 ABI in the map file.
> And then use the VERSION_SYMBOL_EXPERIMENTAL to alias to experimental
> if necessary.
Is using VERSION_SYMBOL_EXPERIMENTAL a must? The documentation also seems to be vague. It says " The macro is used when a symbol matures to become part of the stable ABI, to provide an alias to experimental for some time". What does 'some time' mean?

> 
> https://doc.dpdk.org/guides/contributing/abi_versioning.html#versioning-
> macros

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH] doc: add sample for ABI checks in contribution guide
@ 2020-07-03 17:15  4% Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2020-07-03 17:15 UTC (permalink / raw)
  To: John McNamara, Marko Kovacevic; +Cc: dev, Ferruh Yigit

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 doc/guides/contributing/patches.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/contributing/patches.rst b/doc/guides/contributing/patches.rst
index 25d97b85b..39ec64ec8 100644
--- a/doc/guides/contributing/patches.rst
+++ b/doc/guides/contributing/patches.rst
@@ -550,6 +550,10 @@ results in a subfolder of the current working directory.
 The environment variable ``DPDK_ABI_REF_DIR`` can be set so that the results go
 to a different location.
 
+Sample::
+
+        DPDK_ABI_REF_VERSION=v19.11 DPDK_ABI_REF_DIR=/tmp ./devtools/test-meson-builds.sh
+
 
 Sending Patches
 ---------------
-- 
2.25.4


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 2/3] ring: remove experimental tag for ring element APIs
  @ 2020-07-03 16:17  3%   ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2020-07-03 16:17 UTC (permalink / raw)
  To: Feifei Wang, Honnappa Nagarahalli, Konstantin Ananyev, Neil Horman
  Cc: dev, nd



On 03/07/2020 11:26, Feifei Wang wrote:
> Remove the experimental tag for rte_ring_xxx_elem APIs that have been
> around for 2 releases.
> 
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  lib/librte_ring/rte_ring.h           | 5 +----
>  lib/librte_ring/rte_ring_elem.h      | 8 --------
>  lib/librte_ring/rte_ring_version.map | 9 ++-------
>  3 files changed, 3 insertions(+), 19 deletions(-)
> 
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index 7181c33b4..35f3f8c42 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -40,6 +40,7 @@ extern "C" {
>  #endif
>  
>  #include <rte_ring_core.h>
> +#include <rte_ring_elem.h>
>  
>  /**
>   * Calculate the memory size needed for a ring
> @@ -401,10 +402,6 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
>  			RTE_RING_SYNC_ST, free_space);
>  }
>  
> -#ifdef ALLOW_EXPERIMENTAL_API
> -#include <rte_ring_elem.h>
> -#endif
> -
>  /**
>   * Enqueue several objects on a ring.
>   *
> diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> index 9e5192ae6..69dc51746 100644
> --- a/lib/librte_ring/rte_ring_elem.h
> +++ b/lib/librte_ring/rte_ring_elem.h
> @@ -23,9 +23,6 @@ extern "C" {
>  #include <rte_ring_core.h>
>  
>  /**
> - * @warning
> - * @b EXPERIMENTAL: this API may change without prior notice
> - *
>   * Calculate the memory size needed for a ring with given element size
>   *
>   * This function returns the number of bytes needed for a ring, given
> @@ -43,13 +40,9 @@ extern "C" {
>   *   - -EINVAL - esize is not a multiple of 4 or count provided is not a
>   *		 power of 2.
>   */
> -__rte_experimental
>  ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
>  
>  /**
> - * @warning
> - * @b EXPERIMENTAL: this API may change without prior notice
> - *
>   * Create a new ring named *name* that stores elements with given size.
>   *
>   * This function uses ``memzone_reserve()`` to allocate memory. Then it
> @@ -109,7 +102,6 @@ ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
>   *    - EEXIST - a memzone with the same name already exists
>   *    - ENOMEM - no appropriate memory area found in which to create memzone
>   */
> -__rte_experimental
>  struct rte_ring *rte_ring_create_elem(const char *name, unsigned int esize,
>  			unsigned int count, int socket_id, unsigned int flags);
>  
> diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
> index aec6f3820..3030e8edb 100644
> --- a/lib/librte_ring/rte_ring_version.map
> +++ b/lib/librte_ring/rte_ring_version.map
> @@ -2,9 +2,11 @@ DPDK_20.0 {
>  	global:
>  
>  	rte_ring_create;
> +	rte_ring_create_elem;
>  	rte_ring_dump;
>  	rte_ring_free;
>  	rte_ring_get_memsize;
> +	rte_ring_get_memsize_elem;
>  	rte_ring_init;
>  	rte_ring_list_dump;
>  	rte_ring_lookup;
> @@ -13,10 +15,3 @@ DPDK_20.0 {
>  	local: *;
>  };
>  
> -EXPERIMENTAL {
> -	global:
> -
> -	# added in 20.02
> -	rte_ring_create_elem;
> -	rte_ring_get_memsize_elem;
> -};
> 

Same as the last comment.
Rte_ring_get_memsize_elem and rte_ring_create_elem are part of the DPDK_21 ABI.

Ray K

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API
  @ 2020-07-03 16:16  4%   ` Kinsella, Ray
  2020-07-03 18:46  3%     ` Honnappa Nagarahalli
  0 siblings, 1 reply; 200+ results
From: Kinsella, Ray @ 2020-07-03 16:16 UTC (permalink / raw)
  To: Feifei Wang, Honnappa Nagarahalli, Konstantin Ananyev, Neil Horman
  Cc: dev, nd



On 03/07/2020 11:26, Feifei Wang wrote:
> Remove the experimental tag for rte_ring_reset API that have been around
> for 4 releases.
> 
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  lib/librte_ring/rte_ring.h           | 3 ---
>  lib/librte_ring/rte_ring_version.map | 4 +---
>  2 files changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index f67141482..7181c33b4 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -663,15 +663,12 @@ rte_ring_dequeue(struct rte_ring *r, void **obj_p)
>   *
>   * This function flush all the elements in a ring
>   *
> - * @b EXPERIMENTAL: this API may change without prior notice
> - *
>   * @warning
>   * Make sure the ring is not in use while calling this function.
>   *
>   * @param r
>   *   A pointer to the ring structure.
>   */
> -__rte_experimental
>  void
>  rte_ring_reset(struct rte_ring *r);
>  
> diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
> index e88c143cf..aec6f3820 100644
> --- a/lib/librte_ring/rte_ring_version.map
> +++ b/lib/librte_ring/rte_ring_version.map
> @@ -8,6 +8,7 @@ DPDK_20.0 {
>  	rte_ring_init;
>  	rte_ring_list_dump;
>  	rte_ring_lookup;
> +	rte_ring_reset;
>  
>  	local: *;
>  };
> @@ -15,9 +16,6 @@ DPDK_20.0 {
>  EXPERIMENTAL {
>  	global:
>  
> -	# added in 19.08
> -	rte_ring_reset;
> -
>  	# added in 20.02
>  	rte_ring_create_elem;
>  	rte_ring_get_memsize_elem;

So strictly speaking, rte_ring_reset is part of the DPDK_21 ABI, not the v20.0 ABI.

The way to solve is to add it the DPDK_21 ABI in the map file.
And then use the VERSION_SYMBOL_EXPERIMENTAL to alias to experimental if necessary. 

https://doc.dpdk.org/guides/contributing/abi_versioning.html#versioning-macros

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] mbuf: use c11 atomics for refcnt operations
  @ 2020-07-03 15:38  3% ` David Marchand
  2020-07-06  8:03  3%   ` Phil Yang
  2020-07-07 10:10  3% ` [dpdk-dev] [PATCH v2] mbuf: use C11 " Phil Yang
  1 sibling, 1 reply; 200+ results
From: David Marchand @ 2020-07-03 15:38 UTC (permalink / raw)
  To: Phil Yang
  Cc: dev, Olivier Matz, David Christensen, Honnappa Nagarahalli,
	Ruifeng Wang (Arm Technology China),
	nd

On Thu, Jun 11, 2020 at 12:26 PM Phil Yang <phil.yang@arm.com> wrote:
>
> Use c11 atomics with explicit ordering instead of rte_atomic ops which
> enforce unnecessary barriers on aarch64.
>
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

I did not look at the details, but this patch is refused by the ABI
check in Travis.


-- 
David Marchand


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v17 0/2] support for VFIO-PCI VF token interface
    @ 2020-07-03 14:57  4% ` Haiyue Wang
  1 sibling, 0 replies; 200+ results
From: Haiyue Wang @ 2020-07-03 14:57 UTC (permalink / raw)
  To: dev, anatoly.burakov, thomas, jerinj, david.marchand, arybchenko
  Cc: Haiyue Wang

v17: Rebase for new EAL config API, update the commit message and doc.

v16: Rebase the patch for 20.08 release note.

v15: Add the missed EXPERIMENTAL warning for API doxgen.

v14: Rebase the patch for 20.08 release note.

v13: Rename the EAL get VF token function, and leave the freebsd type as empty.

v12: support to vfio devices with VF token and no token.

v11: Use the eal parameter to pass the VF token, then not every PCI
     device needs to be specified with this token. Also no ABI issue
     now.

v10: Use the __rte_internal to mark the internal API changing.

v9: Rewrite the document.

v8: Update the document.

v7: Add the Fixes tag in uuid, the release note and help
    document.

v6: Drop the Fixes tag in uuid, since the file has been
    moved to another place, not suitable to apply on stable.
    And this is not a bug, just some kind of enhancement.

v5: 1. Add the VF token parse error handling.
    2. Split into two patches for different logic module.
    3. Add more comments into the code for explaining the design.
    4. Drop the ABI change workaround, this patch set focuses on code review.

v4: 1. Ignore rte_vfio_setup_device ABI check since it is
       for Linux driver use.

v3: Fix the Travis build failed:
           (1). rte_uuid.h:97:55: error: unknown type name ‘size_t’
           (2). rte_uuid.h:58:2: error: implicit declaration of function ‘memcpy’

v2: Fix the FreeBSD build error.

v1: Update the commit message.

RFC v2:
         Based on Vamsi's RFC v1, and Alex's patch for Qemu
        [https://lore.kernel.org/lkml/20200204161737.34696b91@w520.home/]: 
       Use the devarg to pass-down the VF token.

RFC v1: https://patchwork.dpdk.org/patch/66281/ by Vamsi.

Haiyue Wang (2):
  eal: add uuid dependent header files explicitly
  eal: support for VFIO-PCI VF token

 doc/guides/linux_gsg/linux_drivers.rst        | 35 ++++++++++++++++++-
 doc/guides/linux_gsg/linux_eal_parameters.rst |  4 +++
 doc/guides/rel_notes/release_20_08.rst        |  6 ++++
 lib/librte_eal/common/eal_common_options.c    |  3 ++
 lib/librte_eal/common/eal_internal_cfg.h      |  2 ++
 lib/librte_eal/common/eal_options.h           |  2 ++
 lib/librte_eal/freebsd/eal.c                  |  5 +++
 lib/librte_eal/include/rte_eal.h              | 14 ++++++++
 lib/librte_eal/include/rte_uuid.h             |  2 ++
 lib/librte_eal/linux/eal.c                    | 33 +++++++++++++++++
 lib/librte_eal/linux/eal_vfio.c               | 19 ++++++++++
 lib/librte_eal/rte_eal_version.map            |  3 ++
 12 files changed, 127 insertions(+), 1 deletion(-)

-- 
2.27.0


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] devtools: remove useless files from ABI reference
  @ 2020-07-03  9:08  4%     ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2020-07-03  9:08 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Bruce Richardson

On Thu, May 28, 2020 at 3:16 PM David Marchand
<david.marchand@redhat.com> wrote:
> On Sun, May 24, 2020 at 7:43 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > When building an ABI reference with meson, some static libraries
> > are built and linked in apps. They are useless and take a lot of space.
> > Those binaries, and other useless files (examples and doc files)
> > in the share/ directory, are removed after being installed.
> >
> > In order to save time when building the ABI reference,
> > the examples (which are not installed anyway) are not compiled.
> >
> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> Acked-by: David Marchand <david.marchand@redhat.com>

Applied, thanks.


-- 
David Marchand


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [20.11, PATCH] bbdev: remove experimental tag from API
  2020-07-02 18:02  3%       ` Chautru, Nicolas
@ 2020-07-02 18:09  4%         ` Akhil Goyal
  0 siblings, 0 replies; 200+ results
From: Akhil Goyal @ 2020-07-02 18:09 UTC (permalink / raw)
  To: Chautru, Nicolas, David Marchand; +Cc: dev, Thomas Monjalon

> 
> > From: Akhil Goyal <akhil.goyal@nxp.com>
> > > > Hello Nicolas,
> > > >
> > > > On Sat, Jun 27, 2020 at 1:14 AM Nicolas Chautru
> > > > <nicolas.chautru@intel.com> wrote:
> > > > >
> > > > > Planning to move bbdev API to stable from 20.11 (ABI version 21)
> > > > > and remove experimental tag.
> > > > > Sending now to advertise and get any feedback.
> > > > > Some manual rebase will be required later on notably as the actual
> > > > > release note which is not there yet.
> > > >
> > > > Cool that we want to stabilize this API.
> > > > My concern is that we have drivers from a single vendor.
> > > > I would hate to see a new vendor unable to submit a driver (or
> > > > having to wait until the next ABI breakage window) because of the
> > > > current API/ABI.
> > > >
> > > >
> > >
> > > +1 from my side. I am not sure how much it is acceptable for all the
> > > vendors/customers.
> > > It is not reviewed by most of the vendors who may support in future.
> > > It is not good to remove experimental tag as we have a long 1 year
> > > cycle to break the API/ABI.
> > >
> > Moving the patch as deferred in patchworks.
> 
> That is fine and all good discussion.
> We know of another vendor who plans to release a bbdev driver, but probably
> after 20.11.
> There is one extra capability they will need exposed; we will aim to have the API
> updated prior to that.
> Assuming the API gets updated between now and 20.11, is there still room to
> remove the experimental tag in 20.11, or is the expectation to wait regardless for
> a full stable cycle and only intercept ABI v22 in 21.11?
> 
I think ABI v22 in 21.11 would be a good point to move this to stable, so that if there are changes
in the ABI when a new vendor PMD comes up, they can be incorporated.
And as the world is evolving towards 5G, there may be multiple vendors and the ABI may change.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [20.11, PATCH] bbdev: remove experimental tag from API
  2020-07-02 17:54  0%     ` Akhil Goyal
@ 2020-07-02 18:02  3%       ` Chautru, Nicolas
  2020-07-02 18:09  4%         ` Akhil Goyal
  0 siblings, 1 reply; 200+ results
From: Chautru, Nicolas @ 2020-07-02 18:02 UTC (permalink / raw)
  To: Akhil Goyal, David Marchand; +Cc: dev, Thomas Monjalon

> From: Akhil Goyal <akhil.goyal@nxp.com>
> > > Hello Nicolas,
> > >
> > > On Sat, Jun 27, 2020 at 1:14 AM Nicolas Chautru
> > > <nicolas.chautru@intel.com> wrote:
> > > >
> > > > Planning to move bbdev API to stable from 20.11 (ABI version 21)
> > > > and remove experimental tag.
> > > > Sending now to advertise and get any feedback.
> > > > Some manual rebase will be required later on notably as the actual
> > > > release note which is not there yet.
> > >
> > > Cool that we want to stabilize this API.
> > > My concern is that we have drivers from a single vendor.
> > > I would hate to see a new vendor unable to submit a driver (or
> > > having to wait until the next ABI breakage window) because of the
> > > current API/ABI.
> > >
> > >
> >
> > +1 from my side. I am not sure how much it is acceptable for all the
> > vendors/customers.
> > It is not reviewed by most of the vendors who may support in future.
> > It is not good to remove experimental tag as we have a long 1 year
> > cycle to break the API/ABI.
> >
> Moving the patch as deferred in patchworks.

That is fine and all good discussion.
We know of another vendor who plans to release a bbdev driver, but probably after 20.11.
There is one extra capability they will need exposed; we will aim to have the API updated prior to that.
Assuming the API gets updated between now and 20.11, is there still room to remove the experimental tag in 20.11, or is the expectation to wait regardless for a full stable cycle and only intercept ABI v22 in 21.11?

Thanks
Nic


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [20.11, PATCH] bbdev: remove experimental tag from API
  2020-06-30  7:35  3%   ` Akhil Goyal
@ 2020-07-02 17:54  0%     ` Akhil Goyal
  2020-07-02 18:02  3%       ` Chautru, Nicolas
  0 siblings, 1 reply; 200+ results
From: Akhil Goyal @ 2020-07-02 17:54 UTC (permalink / raw)
  To: David Marchand, Nicolas Chautru; +Cc: dev, Thomas Monjalon


> 
> >
> > Hello Nicolas,
> >
> > On Sat, Jun 27, 2020 at 1:14 AM Nicolas Chautru
> > <nicolas.chautru@intel.com> wrote:
> > >
> > > Planning to move bbdev API to stable from 20.11 (ABI version 21)
> > > and remove experimental tag.
> > > Sending now to advertise and get any feedback.
> > > Some manual rebase will be required later on notably as the
> > > actual release note which is not there yet.
> >
> > Cool that we want to stabilize this API.
> > My concern is that we have drivers from a single vendor.
> > I would hate to see a new vendor unable to submit a driver (or having
> > to wait until the next ABI breakage window) because of the current
> > API/ABI.
> >
> >
> 
> +1 from my side. I am not sure how much it is acceptable for all the
> vendors/customers.
> It is not reviewed by most of the vendors who may support in future.
> It is not good to remove experimental tag as we have a long 1 year cycle to
> break the API/ABI.
> 
Moving the patch as deferred in patchworks.


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-07-02 15:21  0%             ` Kinsella, Ray
@ 2020-07-02 16:35  3%               ` McDaniel, Timothy
  0 siblings, 0 replies; 200+ results
From: McDaniel, Timothy @ 2020-07-02 16:35 UTC (permalink / raw)
  To: Kinsella, Ray, Jerin Jacob
  Cc: Neil Horman, Jerin Jacob, Mattias Rönnblom, dpdk-dev, Eads,
	Gage, Van Haaren, Harry

>-----Original Message-----
>From: Kinsella, Ray <mdr@ashroe.eu>
>Sent: Thursday, July 2, 2020 10:21 AM
>To: Jerin Jacob <jerinjacobk@gmail.com>
>Cc: McDaniel, Timothy <timothy.mcdaniel@intel.com>; Neil Horman
><nhorman@tuxdriver.com>; Jerin Jacob <jerinj@marvell.com>; Mattias
>Rönnblom <mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads,
>Gage <gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
>Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
>
>
>
>On 30/06/2020 13:14, Jerin Jacob wrote:
>> On Tue, Jun 30, 2020 at 5:06 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>>>
>>>
>>>
>>> On 30/06/2020 12:30, Jerin Jacob wrote:
>>>> On Tue, Jun 30, 2020 at 4:52 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 27/06/2020 08:44, Jerin Jacob wrote:
>>>>>>> +
>>>>>>> +/** Event port configuration structure */
>>>>>>> +struct rte_event_port_conf_v20 {
>>>>>>> +       int32_t new_event_threshold;
>>>>>>> +       /**< A backpressure threshold for new event enqueues on this port.
>>>>>>> +        * Use for *closed system* event dev where event capacity is
>limited,
>>>>>>> +        * and cannot exceed the capacity of the event dev.
>>>>>>> +        * Configuring ports with different thresholds can make higher
>priority
>>>>>>> +        * traffic less likely to  be backpressured.
>>>>>>> +        * For example, a port used to inject NIC Rx packets into the event
>dev
>>>>>>> +        * can have a lower threshold so as not to overwhelm the device,
>>>>>>> +        * while ports used for worker pools can have a higher threshold.
>>>>>>> +        * This value cannot exceed the *nb_events_limit*
>>>>>>> +        * which was previously supplied to rte_event_dev_configure().
>>>>>>> +        * This should be set to '-1' for *open system*.
>>>>>>> +        */
>>>>>>> +       uint16_t dequeue_depth;
>>>>>>> +       /**< Configure number of bulk dequeues for this event port.
>>>>>>> +        * This value cannot exceed the *nb_event_port_dequeue_depth*
>>>>>>> +        * which previously supplied to rte_event_dev_configure().
>>>>>>> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE
>capable.
>>>>>>> +        */
>>>>>>> +       uint16_t enqueue_depth;
>>>>>>> +       /**< Configure number of bulk enqueues for this event port.
>>>>>>> +        * This value cannot exceed the *nb_event_port_enqueue_depth*
>>>>>>> +        * which previously supplied to rte_event_dev_configure().
>>>>>>> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE
>capable.
>>>>>>> +        */
>>>>>>>         uint8_t disable_implicit_release;
>>>>>>>         /**< Configure the port not to release outstanding events in
>>>>>>>          * rte_event_dev_dequeue_burst(). If true, all events received
>through
>>>>>>> @@ -733,6 +911,14 @@ struct rte_event_port_conf {
>>>>>>>  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
>>>>>>>                                 struct rte_event_port_conf *port_conf);
>>>>>>>
>>>>>>> +int
>>>>>>> +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
>>>>>>> +                               struct rte_event_port_conf_v20 *port_conf);
>>>>>>> +
>>>>>>> +int
>>>>>>> +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
>>>>>>> +                                     struct rte_event_port_conf *port_conf);
>>>>>>
>>>>>> Hi Timothy,
>>>>>>
>>>>>> + ABI Maintainers (Ray, Neil)
>>>>>>
>>>>>> # As per my understanding, the structures can not be versioned, only
>>>>>> function can be versioned.
>>>>>> i.e we can not make any change to " struct rte_event_port_conf"
>>>>>
>>>>> So the answer is (as always): depends
>>>>>
>>>>> If the structure is being use in inline functions is when you run into trouble
>>>>> - as knowledge of the structure is embedded in the linked application.
>>>>>
>>>>> However if the structure is _strictly_ being used as a non-inlined function
>parameter,
>>>>> It can be safe to version in this way.
>>>>
>>>> But based on the optimization applied when building the consumer code
>>>> matters. Right?
>>>> i.e compiler can "inline" it, based on the optimization even the
>>>> source code explicitly mentions it.
>>>
>>> Well a compiler will typically only inline within the confines of a given object
>file, or
>>> binary, if LTO is enabled.
>>
>>>
>>> If a function symbol is exported from library however, it won't be inlined in a
>linked application.
>>
>> Yes, With respect to that function.
>> But the application can use struct rte_event_port_conf in their code
>> and it can be part of other structures.
>> Right?
>
>Tim, it looks like you might be inadvertently breaking other symbols also.
>For example ...
>
>int
>rte_event_crypto_adapter_create(uint8_t id, uint8_t dev_id,
>                                struct rte_event_port_conf *port_config,
>                                enum rte_event_crypto_adapter_mode mode);
>
>int
>rte_event_port_setup(uint8_t dev_id, uint8_t port_id,
>                     const struct rte_event_port_conf *port_conf);
>
>These and other symbols are also using rte_event_port_conf and would need to
>be updated to use the v20 struct,
>if we aren't to break them.
>

Yes, we just discovered that after successfully running the ABI checker. I will address those in the v3
patch set.  Thanks.
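
(For reference, a minimal sketch of how one of the affected symbols could be
tied to both ABI versions with the macros from rte_function_versioning.h.
This is illustrative only: it assumes the library is built with
use_function_versioning enabled, that struct rte_event_port_conf_v20 is the
renamed structure from this patch set, and the actual v3 patches may structure
things differently. Function definitions are omitted.)

#include <rte_function_versioning.h>
#include <rte_eventdev.h>

/* Old binaries keep resolving rte_event_port_setup to the 20.0 layout. */
int
rte_event_port_setup_v20(uint8_t dev_id, uint8_t port_id,
			 const struct rte_event_port_conf_v20 *port_conf);
VERSION_SYMBOL(rte_event_port_setup, _v20, 20.0);

/* Newly linked applications get the extended structure by default;
 * static builds would additionally use MAP_STATIC_SYMBOL. */
int
rte_event_port_setup_v21(uint8_t dev_id, uint8_t port_id,
			 const struct rte_event_port_conf *port_conf);
BIND_DEFAULT_SYMBOL(rte_event_port_setup, _v21, 21);

/* The rte_eventdev_version.map file also needs a DPDK_21 node listing
 * rte_event_port_setup as the new default version. */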

>>
>>
>>> The compiler doesn't have enough information to inline it.
>>> All the compiler will know about it is it's offset in memory, and it's signature.
>>>
>>>>
>>>>
>>>>>
>>>>> So just to be clear, it is still the function that is actually being versioned
>here.
>>>>>
>>>>>>
>>>>>> # We have a similar case with ethdev and it deferred to next release v20.11
>>>>>> http://patches.dpdk.org/patch/69113/
>>>>>
>>>>> Yes - I spent a why looking at this one, but I am struggling to recall,
>>>>> why when I looked it we didn't suggest function versioning as a potential
>solution in this case.
>>>>>
>>>>> Looking back at it now, looks like it would have been ok.
>>>>
>>>> Ok.
>>>>
>>>>>
>>>>>>
>>>>>> Regarding the API changes:
>>>>>> # The slow path changes general looks good to me. I will review the
>>>>>> next level in the coming days
>>>>>> # The following fast path changes bothers to me. Could you share more
>>>>>> details on below change?
>>>>>>
>>>>>> diff --git a/app/test-eventdev/test_order_atq.c
>>>>>> b/app/test-eventdev/test_order_atq.c
>>>>>> index 3366cfc..8246b96 100644
>>>>>> --- a/app/test-eventdev/test_order_atq.c
>>>>>> +++ b/app/test-eventdev/test_order_atq.c
>>>>>> @@ -34,6 +34,8 @@
>>>>>>                         continue;
>>>>>>                 }
>>>>>>
>>>>>> +               ev.flow_id = ev.mbuf->udata64;
>>>>>> +
>>>>>> # Since RC1 is near, I am not sure how to accommodate the API changes
>>>>>> now and sort out ABI stuffs.
>>>>>> # Other concern is eventdev spec get bloated with versioning files
>>>>>> just for ONE release as 20.11 will be OK to change the ABI.
>>>>>> # While we discuss the API change, Please send deprecation notice for
>>>>>> ABI change for 20.11,
>>>>>> so that there is no ambiguity of this patch for the 20.11 release.
>>>>>>

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-30 12:14  0%           ` Jerin Jacob
@ 2020-07-02 15:21  0%             ` Kinsella, Ray
  2020-07-02 16:35  3%               ` McDaniel, Timothy
  0 siblings, 1 reply; 200+ results
From: Kinsella, Ray @ 2020-07-02 15:21 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Tim McDaniel, Neil Horman, Jerin Jacob, Mattias Rönnblom,
	dpdk-dev, Gage Eads, Van Haaren, Harry



On 30/06/2020 13:14, Jerin Jacob wrote:
> On Tue, Jun 30, 2020 at 5:06 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>>
>>
>>
>> On 30/06/2020 12:30, Jerin Jacob wrote:
>>> On Tue, Jun 30, 2020 at 4:52 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>>>>
>>>>
>>>>
>>>> On 27/06/2020 08:44, Jerin Jacob wrote:
>>>>>> +
>>>>>> +/** Event port configuration structure */
>>>>>> +struct rte_event_port_conf_v20 {
>>>>>> +       int32_t new_event_threshold;
>>>>>> +       /**< A backpressure threshold for new event enqueues on this port.
>>>>>> +        * Use for *closed system* event dev where event capacity is limited,
>>>>>> +        * and cannot exceed the capacity of the event dev.
>>>>>> +        * Configuring ports with different thresholds can make higher priority
>>>>>> +        * traffic less likely to  be backpressured.
>>>>>> +        * For example, a port used to inject NIC Rx packets into the event dev
>>>>>> +        * can have a lower threshold so as not to overwhelm the device,
>>>>>> +        * while ports used for worker pools can have a higher threshold.
>>>>>> +        * This value cannot exceed the *nb_events_limit*
>>>>>> +        * which was previously supplied to rte_event_dev_configure().
>>>>>> +        * This should be set to '-1' for *open system*.
>>>>>> +        */
>>>>>> +       uint16_t dequeue_depth;
>>>>>> +       /**< Configure number of bulk dequeues for this event port.
>>>>>> +        * This value cannot exceed the *nb_event_port_dequeue_depth*
>>>>>> +        * which previously supplied to rte_event_dev_configure().
>>>>>> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
>>>>>> +        */
>>>>>> +       uint16_t enqueue_depth;
>>>>>> +       /**< Configure number of bulk enqueues for this event port.
>>>>>> +        * This value cannot exceed the *nb_event_port_enqueue_depth*
>>>>>> +        * which previously supplied to rte_event_dev_configure().
>>>>>> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
>>>>>> +        */
>>>>>>         uint8_t disable_implicit_release;
>>>>>>         /**< Configure the port not to release outstanding events in
>>>>>>          * rte_event_dev_dequeue_burst(). If true, all events received through
>>>>>> @@ -733,6 +911,14 @@ struct rte_event_port_conf {
>>>>>>  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
>>>>>>                                 struct rte_event_port_conf *port_conf);
>>>>>>
>>>>>> +int
>>>>>> +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
>>>>>> +                               struct rte_event_port_conf_v20 *port_conf);
>>>>>> +
>>>>>> +int
>>>>>> +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
>>>>>> +                                     struct rte_event_port_conf *port_conf);
>>>>>
>>>>> Hi Timothy,
>>>>>
>>>>> + ABI Maintainers (Ray, Neil)
>>>>>
>>>>> # As per my understanding, the structures can not be versioned, only
>>>>> function can be versioned.
>>>>> i.e we can not make any change to " struct rte_event_port_conf"
>>>>
>>>> So the answer is (as always): depends
>>>>
>>>> If the structure is being use in inline functions is when you run into trouble
>>>> - as knowledge of the structure is embedded in the linked application.
>>>>
>>>> However if the structure is _strictly_ being used as a non-inlined function parameter,
>>>> It can be safe to version in this way.
>>>
>>> But based on the optimization applied when building the consumer code
>>> matters. Right?
>>> i.e compiler can "inline" it, based on the optimization even the
>>> source code explicitly mentions it.
>>
>> Well a compiler will typically only inline within the confines of a given object file, or
>> binary, if LTO is enabled.
> 
>>
>> If a function symbol is exported from library however, it won't be inlined in a linked application.
> 
> Yes, With respect to that function.
> But the application can use struct rte_event_port_conf in their code
> and it can be part of other structures.
> Right?

Tim, it looks like you might be inadvertently breaking other symbols also.
For example ... 

int
rte_event_crypto_adapter_create(uint8_t id, uint8_t dev_id,
                                struct rte_event_port_conf *port_config,
                                enum rte_event_crypto_adapter_mode mode);

int
rte_event_port_setup(uint8_t dev_id, uint8_t port_id,
                     const struct rte_event_port_conf *port_conf);

These and other symbols are also using rte_event_port_conf and would need to be updated to use the v20 struct,
if we aren't to break them.

> 
> 
>> The compiler doesn't have enough information to inline it.
>> All the compiler will know about it is it's offset in memory, and it's signature.
>>
>>>
>>>
>>>>
>>>> So just to be clear, it is still the function that is actually being versioned here.
>>>>
>>>>>
>>>>> # We have a similar case with ethdev and it deferred to next release v20.11
>>>>> http://patches.dpdk.org/patch/69113/
>>>>
>>>> Yes - I spent a why looking at this one, but I am struggling to recall,
>>>> why when I looked it we didn't suggest function versioning as a potential solution in this case.
>>>>
>>>> Looking back at it now, looks like it would have been ok.
>>>
>>> Ok.
>>>
>>>>
>>>>>
>>>>> Regarding the API changes:
>>>>> # The slow path changes general looks good to me. I will review the
>>>>> next level in the coming days
>>>>> # The following fast path changes bothers to me. Could you share more
>>>>> details on below change?
>>>>>
>>>>> diff --git a/app/test-eventdev/test_order_atq.c
>>>>> b/app/test-eventdev/test_order_atq.c
>>>>> index 3366cfc..8246b96 100644
>>>>> --- a/app/test-eventdev/test_order_atq.c
>>>>> +++ b/app/test-eventdev/test_order_atq.c
>>>>> @@ -34,6 +34,8 @@
>>>>>                         continue;
>>>>>                 }
>>>>>
>>>>> +               ev.flow_id = ev.mbuf->udata64;
>>>>> +
>>>>> # Since RC1 is near, I am not sure how to accommodate the API changes
>>>>> now and sort out ABI stuffs.
>>>>> # Other concern is eventdev spec get bloated with versioning files
>>>>> just for ONE release as 20.11 will be OK to change the ABI.
>>>>> # While we discuss the API change, Please send deprecation notice for
>>>>> ABI change for 20.11,
>>>>> so that there is no ambiguity of this patch for the 20.11 release.
>>>>>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] DPDK Release Status Meeting 2/07/2020
@ 2020-07-02 14:58  4% Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2020-07-02 14:58 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

Minutes 2 July 2020
-------------------

Agenda:
* Release Dates
* Highlights
* Subtrees
* LTS


Participants:
* Arm
* Debian/Microsoft
* Intel
* Marvell
* Nvidia
* NXP
* Red Hat


Release Dates
-------------

* v20.08 dates:
  * -rc1:           Wednesday, 8 July   2020
  * -rc2:           Monday,   20 July   2020
  * Release:        Tuesday,   4 August 2020

* v20.11 proposal dates, please comment:
  * Proposal/V1:    Wednesday, 2 September 2020
  * -rc1:           Wednesday, 30 September 2020
  * -rc2:           Friday, 16 October 2020
  * Release:        Friday, 6 November 2020


Highlights
----------

* We are close to -rc1 but there are still lots of patches in the backlog waiting for review.
  *Please help with code reviews*; missing code reviews may lead to some features
  missing the release.
  Please check "call for reviews" email for the list of patches to review:
  https://mails.dpdk.org/archives/announce/2020-June/000329.html

* Please subscribe to patchwork to be able to update the status of your patches;
  not updating them adds overhead for the maintainers.
  * We are observing an issue at Intel where patchwork registration and
    lost-password emails are not being received.
    * If there is anyone else outside Intel having the same problem, please reach
      out to help analyze the problem.
    * Within Intel, please reach out to Ferruh if there are patches whose status
      needs updating in patchwork and you don't have access.


Subtrees
--------

* main
  * Started to merge ring and vfio patches
  * Would like to close following
    * non-EAL threads as lcore from David
    * rte_log registration usage improvement from Jerin
    * if-proxy
      * Stephen reviewed the patch
    * regex
      * Waiting for PMD implementations. How many PMDs are required for merge?
      * One HW and two SW PMDs were planned
  * There is concern that ethdev doesn't have enough review
    * Jerin reviewed some of the rte_flow ones

* next-net
  * Pulled from vendor sub-trees
  * Some big base update patches from Intel and bnxt merged
  * Need to get ethdev patches before -rc1, requires more review

* next-crypto
  * Reviewed half of the backlog
  * Will be good for -rc1
  * cryptodev patches has been reviewed

* next-eventdev
  * Almost ready for -rc1
  * Intel DLB PMD new version still has ABI breakage
    * Postponed to next release because of the ABI
    * No controversial issues otherwise

* next-virtio
  * Maxime did a pull request for majority of the patches
  * Maxime sent a status for remaining ones
    * 2 patches for async datapath, looks good
    * 2 patches for vhost-user protocol features
      * Has a dependency on QEMU
      * Adrian from Red Hat will take over the patches
    * Performance optimization (loops vectorization)
      * Waiting for new version
      * Not critical for this release, may be postponed if needed
  * Chenbo is managing the virtio patches during Maxime's absence

* next-net-intel
  * Qi is actively merging patches
  * Some base code updates already merged
  * DCF datapath merged

* next-net-mlx
  * Some patches already merged
  * Expecting more but not many

* next-net-mrvl
  * A few patches merged
  * Two more patches for -rc1
  * Changes requested for the qede patches; they can be merged when ready


LTS
---

* v18.11.9-rc2 is out, please test
  * https://mails.dpdk.org/archives/dev/2020-June/171690.html
  * OvS testing reported an issue
    * A workaround can exist for it
  * Nvidia reported an error
    * Which is not a regression in the 18.11.9 release
  * The release is planned for the end of this week or early next week



DPDK Release Status Meetings
============================

The DPDK Release Status Meeting is intended for DPDK Committers to discuss the
status of the master tree and sub-trees, and for project managers to track
progress or milestone dates.

The meeting occurs every Thursday at 8:30 UTC on https://meet.jit.si/DPDK

If you wish to attend just send an email to
"John McNamara <john.mcnamara@intel.com>" for the invite.

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3 8/9] devtools: support python3 only
  @ 2020-07-02 10:37  4%   ` Louise Kilheeney
  0 siblings, 0 replies; 200+ results
From: Louise Kilheeney @ 2020-07-02 10:37 UTC (permalink / raw)
  To: dev
  Cc: robin.jarry, anatoly.burakov, bruce.richardson, Louise Kilheeney,
	Neil Horman, Ray Kinsella

Changed the script to explicitly use python3 only, to avoid
maintaining Python 2 support.

Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Ray Kinsella <mdr@ashroe.eu>

Signed-off-by: Louise Kilheeney <louise.kilheeney@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
---
 devtools/update_version_map_abi.py | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/devtools/update_version_map_abi.py b/devtools/update_version_map_abi.py
index e2104e61e..830e6c58c 100755
--- a/devtools/update_version_map_abi.py
+++ b/devtools/update_version_map_abi.py
@@ -1,4 +1,4 @@
-#!/usr/bin/env python
+#!/usr/bin/env python3
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2019 Intel Corporation
 
@@ -9,7 +9,6 @@
 from the devtools/update-abi.sh utility.
 """
 
-from __future__ import print_function
 import argparse
 import sys
 import re
-- 
2.17.1


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH (v20.11) 2/2] eventdev: reserve space in timer structs for extension
  2020-07-02  6:19  4% [dpdk-dev] [PATCH (v20.11) 1/2] eventdev: reserve space in config structs for extension pbhagavatula
@ 2020-07-02  6:19  4% ` pbhagavatula
  0 siblings, 0 replies; 200+ results
From: pbhagavatula @ 2020-07-02  6:19 UTC (permalink / raw)
  To: jerinj, Erik Gabriel Carrillo; +Cc: dev, Pavan Nikhilesh

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

The structs rte_event_timer_adapter and rte_event_timer_adapter_data are
supposed to be used internally only, but there is a chance that
increasing their size would break ABI for some applications.
In order to allow smooth addition of features without breaking
ABI compatibility, reserve some space.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 lib/librte_eventdev/rte_event_timer_adapter.h     | 5 +++++
 lib/librte_eventdev/rte_event_timer_adapter_pmd.h | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/lib/librte_eventdev/rte_event_timer_adapter.h b/lib/librte_eventdev/rte_event_timer_adapter.h
index f83d85f4d..ce57a990a 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter.h
+++ b/lib/librte_eventdev/rte_event_timer_adapter.h
@@ -529,6 +529,11 @@ struct rte_event_timer_adapter {
 	RTE_STD_C11
 	uint8_t allocated : 1;
 	/**< Flag to indicate that this adapter has been allocated */
+
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields */
 } __rte_cache_aligned;
 
 #define ADAPTER_VALID_OR_ERR_RET(adapter, retval) do {		\
diff --git a/lib/librte_eventdev/rte_event_timer_adapter_pmd.h b/lib/librte_eventdev/rte_event_timer_adapter_pmd.h
index cf3509dc6..0a6682833 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter_pmd.h
+++ b/lib/librte_eventdev/rte_event_timer_adapter_pmd.h
@@ -105,6 +105,11 @@ struct rte_event_timer_adapter_data {
 	RTE_STD_C11
 	uint8_t started : 1;
 	/**< Flag to indicate adapter started. */
+
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields */
 } __rte_cache_aligned;
 
 #ifdef __cplusplus
-- 
2.17.1


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH (v20.11) 1/2] eventdev: reserve space in config structs for extension
@ 2020-07-02  6:19  4% pbhagavatula
  2020-07-02  6:19  4% ` [dpdk-dev] [PATCH (v20.11) 2/2] eventdev: reserve space in timer " pbhagavatula
  0 siblings, 1 reply; 200+ results
From: pbhagavatula @ 2020-07-02  6:19 UTC (permalink / raw)
  To: jerinj, Abhinandan Gujjar, Nikhil Rao, Erik Gabriel Carrillo
  Cc: dev, Pavan Nikhilesh

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Reserve space in event device configuration structures as increasing their
size would break ABI for some applications.
In order to allow smooth addition of features without breaking
ABI compatibility, reserve some space.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 lib/librte_eventdev/rte_event_crypto_adapter.h |  4 ++++
 lib/librte_eventdev/rte_event_eth_rx_adapter.h |  8 ++++++++
 lib/librte_eventdev/rte_event_eth_tx_adapter.h |  4 ++++
 lib/librte_eventdev/rte_event_timer_adapter.h  |  8 ++++++++
 lib/librte_eventdev/rte_eventdev.h             | 16 ++++++++++++++++
 5 files changed, 40 insertions(+)

diff --git a/lib/librte_eventdev/rte_event_crypto_adapter.h b/lib/librte_eventdev/rte_event_crypto_adapter.h
index 60630ef66..c471ece79 100644
--- a/lib/librte_eventdev/rte_event_crypto_adapter.h
+++ b/lib/librte_eventdev/rte_event_crypto_adapter.h
@@ -250,6 +250,10 @@ struct rte_event_crypto_adapter_conf {
 	 * max_nb crypto ops. This isn't treated as a requirement; batching
 	 * may cause the adapter to process more than max_nb crypto ops.
 	 */
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields. */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields. */
 };
 
 /**
diff --git a/lib/librte_eventdev/rte_event_eth_rx_adapter.h b/lib/librte_eventdev/rte_event_eth_rx_adapter.h
index 2dd259c27..d10f632e9 100644
--- a/lib/librte_eventdev/rte_event_eth_rx_adapter.h
+++ b/lib/librte_eventdev/rte_event_eth_rx_adapter.h
@@ -112,6 +112,10 @@ struct rte_event_eth_rx_adapter_conf {
 	 * max_nb_rx mbufs. This isn't treated as a requirement; batching may
 	 * cause the adapter to process more than max_nb_rx mbufs.
 	 */
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields. */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields. */
 };
 
 /**
@@ -171,6 +175,10 @@ struct rte_event_eth_rx_adapter_queue_conf {
 	 * The event adapter sets ev.event_type to RTE_EVENT_TYPE_ETHDEV in the
 	 * enqueued event.
 	 */
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields. */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields. */
 };
 
 /**
diff --git a/lib/librte_eventdev/rte_event_eth_tx_adapter.h b/lib/librte_eventdev/rte_event_eth_tx_adapter.h
index 8c5954716..442e54da4 100644
--- a/lib/librte_eventdev/rte_event_eth_tx_adapter.h
+++ b/lib/librte_eventdev/rte_event_eth_tx_adapter.h
@@ -97,6 +97,10 @@ struct rte_event_eth_tx_adapter_conf {
 	 * max_nb_tx mbufs. This isn't treated as a requirement; batching may
 	 * cause the adapter to process more than max_nb_tx mbufs.
 	 */
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields. */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields. */
 };
 
 /**
diff --git a/lib/librte_eventdev/rte_event_timer_adapter.h b/lib/librte_eventdev/rte_event_timer_adapter.h
index d2ebcb090..f83d85f4d 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter.h
+++ b/lib/librte_eventdev/rte_event_timer_adapter.h
@@ -171,6 +171,10 @@ struct rte_event_timer_adapter_conf {
 	/**< Total number of timers per adapter */
 	uint64_t flags;
 	/**< Timer adapter config flags (RTE_EVENT_TIMER_ADAPTER_F_*) */
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields. */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields. */
 };
 
 /**
@@ -268,6 +272,10 @@ struct rte_event_timer_adapter_info {
 	/**< Event timer adapter capabilities */
 	int16_t event_dev_port_id;
 	/**< Event device port ID, if applicable */
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields. */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields. */
 };
 
 /**
diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
index 7dc832353..1effeed58 100644
--- a/lib/librte_eventdev/rte_eventdev.h
+++ b/lib/librte_eventdev/rte_eventdev.h
@@ -387,6 +387,10 @@ struct rte_event_dev_info {
 	 */
 	uint32_t event_dev_cap;
 	/**< Event device capabilities(RTE_EVENT_DEV_CAP_)*/
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields. */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields. */
 };
 
 /**
@@ -494,6 +498,10 @@ struct rte_event_dev_config {
 	 */
 	uint32_t event_dev_cfg;
 	/**< Event device config flags(RTE_EVENT_DEV_CFG_)*/
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields. */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields. */
 };
 
 /**
@@ -574,6 +582,10 @@ struct rte_event_queue_conf {
 	 * event device supported priority value.
 	 * Valid when the device has RTE_EVENT_DEV_CAP_QUEUE_QOS capability
 	 */
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields. */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields. */
 };
 
 /**
@@ -705,6 +717,10 @@ struct rte_event_port_conf {
 	 * RTE_EVENT_OP_FORWARD. Must be false when the device is not
 	 * RTE_EVENT_DEV_CAP_IMPLICIT_RELEASE_DISABLE capable.
 	 */
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields. */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields. */
 };
 
 /**
-- 
2.17.1
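
(Side note on why the reserved trailing fields preserve compatibility: a later
release can carve a new member out of the reserved area without changing the
structure size or the offsets of existing members, so already-built
applications keep working. A purely hypothetical sketch follows; the struct
and field names below are invented for illustration and are not part of this
patch.)

#include <stdint.h>

/* Hypothetical future revision of an extended config structure: the new
 * 64-bit member takes the place of reserved_64s[0], so sizeof() and the
 * offsets of all pre-existing members are unchanged. */
struct example_port_conf {
	int32_t new_event_threshold;
	uint16_t dequeue_depth;
	uint16_t enqueue_depth;
	uint8_t disable_implicit_release;
	uint64_t new_option;          /* carved out of reserved_64s[0] */
	uint64_t reserved_64s[3];     /* remaining reserved space */
	void *reserved_ptrs[4];       /* still reserved */
};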


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v4 22/27] doc: update references to master/slave lcore in documentation
  @ 2020-07-01 20:23  1%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2020-07-01 20:23 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

New terms are initial and worker lcores.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 doc/guides/contributing/coding_style.rst            |  2 +-
 doc/guides/faq/faq.rst                              |  6 +++---
 doc/guides/howto/debug_troubleshoot.rst             |  2 +-
 doc/guides/linux_gsg/eal_args.include.rst           |  4 ++--
 doc/guides/nics/bnxt.rst                            |  2 +-
 doc/guides/nics/fail_safe.rst                       |  3 ---
 doc/guides/prog_guide/env_abstraction_layer.rst     |  6 +++---
 doc/guides/prog_guide/event_ethernet_rx_adapter.rst |  2 +-
 doc/guides/prog_guide/glossary.rst                  |  8 ++++----
 doc/guides/rel_notes/release_20_08.rst              |  7 ++++++-
 doc/guides/sample_app_ug/bbdev_app.rst              |  2 +-
 doc/guides/sample_app_ug/ethtool.rst                |  4 ++--
 doc/guides/sample_app_ug/hello_world.rst            |  8 ++++----
 doc/guides/sample_app_ug/ioat.rst                   | 12 ++++++------
 doc/guides/sample_app_ug/ip_pipeline.rst            |  4 ++--
 doc/guides/sample_app_ug/keep_alive.rst             |  2 +-
 doc/guides/sample_app_ug/l2_forward_event.rst       |  4 ++--
 .../sample_app_ug/l2_forward_real_virtual.rst       |  4 ++--
 doc/guides/sample_app_ug/l3_forward_graph.rst       |  6 +++---
 doc/guides/sample_app_ug/l3_forward_power_man.rst   |  2 +-
 doc/guides/sample_app_ug/link_status_intr.rst       |  4 ++--
 doc/guides/sample_app_ug/multi_process.rst          |  6 +++---
 doc/guides/sample_app_ug/packet_ordering.rst        |  8 ++++----
 doc/guides/sample_app_ug/performance_thread.rst     |  6 +++---
 doc/guides/sample_app_ug/qos_scheduler.rst          |  4 ++--
 doc/guides/sample_app_ug/timer.rst                  | 13 +++++++------
 doc/guides/testpmd_app_ug/run_app.rst               |  2 +-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst         |  2 +-
 28 files changed, 69 insertions(+), 66 deletions(-)

diff --git a/doc/guides/contributing/coding_style.rst b/doc/guides/contributing/coding_style.rst
index 4efde93f6af0..321d54438f7d 100644
--- a/doc/guides/contributing/coding_style.rst
+++ b/doc/guides/contributing/coding_style.rst
@@ -334,7 +334,7 @@ For example:
 	typedef int (lcore_function_t)(void *);
 
 	/* launch a function of lcore_function_t type */
-	int rte_eal_remote_launch(lcore_function_t *f, void *arg, unsigned slave_id);
+	int rte_eal_remote_launch(lcore_function_t *f, void *arg, unsigned id);
 
 
 C Indentation
diff --git a/doc/guides/faq/faq.rst b/doc/guides/faq/faq.rst
index f19c1389b6af..cb5f35923d64 100644
--- a/doc/guides/faq/faq.rst
+++ b/doc/guides/faq/faq.rst
@@ -42,13 +42,13 @@ I am running a 32-bit DPDK application on a NUMA system, and sometimes the appli
 If your system has a lot (>1 GB size) of hugepage memory, not all of it will be allocated.
 Due to hugepages typically being allocated on a local NUMA node, the hugepages allocation the application gets during the initialization depends on which
 NUMA node it is running on (the EAL does not affinitize cores until much later in the initialization process).
-Sometimes, the Linux OS runs the DPDK application on a core that is located on a different NUMA node from DPDK master core and
+Sometimes, the Linux OS runs the DPDK application on a core that is located on a different NUMA node from DPDK initial core and
 therefore all the hugepages are allocated on the wrong socket.
 
 To avoid this scenario, either lower the amount of hugepage memory available to 1 GB size (or less), or run the application with taskset
-affinitizing the application to a would-be master core.
+affinitizing the application to a would-be initial core.
 
-For example, if your EAL coremask is 0xff0, the master core will usually be the first core in the coremask (0x10); this is what you have to supply to taskset::
+For example, if your EAL coremask is 0xff0, the initial core will usually be the first core in the coremask (0x10); this is what you have to supply to taskset::
 
    taskset 0x10 ./l2fwd -l 4-11 -n 2
 
diff --git a/doc/guides/howto/debug_troubleshoot.rst b/doc/guides/howto/debug_troubleshoot.rst
index cef016b2fef4..fdeaabe62206 100644
--- a/doc/guides/howto/debug_troubleshoot.rst
+++ b/doc/guides/howto/debug_troubleshoot.rst
@@ -311,7 +311,7 @@ Custom worker function :numref:`dtg_distributor_worker`.
      SERVICE. Check performance functions are mapped to run on the cores.
 
    * For high-performance execution logic ensure running it on correct NUMA
-     and non-master core.
+     and worker core.
 
    * Analyze run logic with ``rte_dump_stack``, ``rte_dump_registers`` and
      ``rte_memdump`` for more insights.
diff --git a/doc/guides/linux_gsg/eal_args.include.rst b/doc/guides/linux_gsg/eal_args.include.rst
index 0fe44579689b..ca7508fb423e 100644
--- a/doc/guides/linux_gsg/eal_args.include.rst
+++ b/doc/guides/linux_gsg/eal_args.include.rst
@@ -33,9 +33,9 @@ Lcore-related options
     At a given instance only one core option ``--lcores``, ``-l`` or ``-c`` can
     be used.
 
-*   ``--master-lcore <core ID>``
+*   ``--initial-lcore <core ID>``
 
-    Core ID that is used as master.
+    Core ID that is used as initial lcore.
 
 *   ``-s <service core mask>``
 
diff --git a/doc/guides/nics/bnxt.rst b/doc/guides/nics/bnxt.rst
index a53cdad21d34..6a7314a91627 100644
--- a/doc/guides/nics/bnxt.rst
+++ b/doc/guides/nics/bnxt.rst
@@ -385,7 +385,7 @@ The application enables multiple TX and RX queues when it is started.
 
 .. code-block:: console
 
-    testpmd -l 1,3,5 --master-lcore 1 --txq=2 –rxq=2 --nb-cores=2
+    testpmd -l 1,3,5 --initial-lcore 1 --txq=2 –rxq=2 --nb-cores=2
 
 **TSS**
 
diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst
index b4a92f663b17..3b15d6f0743d 100644
--- a/doc/guides/nics/fail_safe.rst
+++ b/doc/guides/nics/fail_safe.rst
@@ -236,9 +236,6 @@ Upkeep round
     (brought down or up accordingly). Additionally, any sub-device marked for
     removal is cleaned-up.
 
-Slave
-    In the context of the fail-safe PMD, synonymous to sub-device.
-
 Sub-device
     A device being utilized by the fail-safe PMD.
     This is another PMD running underneath the fail-safe PMD.
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 48a2fec066db..463245463c52 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -64,7 +64,7 @@ It consist of calls to the pthread library (more specifically, pthread_self(), p
 .. note::
 
     Initialization of objects, such as memory zones, rings, memory pools, lpm tables and hash tables,
-    should be done as part of the overall application initialization on the master lcore.
+    should be done as part of the overall application initialization on the initial lcore.
     The creation and initialization functions for these objects are not multi-thread safe.
     However, once initialized, the objects themselves can safely be used in multiple threads simultaneously.
 
@@ -186,7 +186,7 @@ very dependent on the memory allocation patterns of the application.
 
 Additional restrictions are present when running in 32-bit mode. In dynamic
 memory mode, by default maximum of 2 gigabytes of VA space will be preallocated,
-and all of it will be on master lcore NUMA node unless ``--socket-mem`` flag is
+and all of it will be on initial lcore NUMA node unless ``--socket-mem`` flag is
 used.
 
 In legacy mode, VA space will only be preallocated for segments that were
@@ -603,7 +603,7 @@ controlled with tools like taskset (Linux) or cpuset (FreeBSD),
 - with affinity restricted to 2-4, the Control Threads will end up on
   CPU 4.
 - with affinity restricted to 2-3, the Control Threads will end up on
-  CPU 2 (master lcore, which is the default when no CPU is available).
+  CPU 2 (initial lcore, which is the default when no CPU is available).
 
 .. _known_issue_label:
 
diff --git a/doc/guides/prog_guide/event_ethernet_rx_adapter.rst b/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
index c7dda92215ea..5d015fa2d678 100644
--- a/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
+++ b/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
@@ -172,7 +172,7 @@ converts the received packets to events in the same manner as packets
 received on a polled Rx queue. The interrupt thread is affinitized to the same
 CPUs as the lcores of the Rx adapter service function, if the Rx adapter
 service function has not been mapped to any lcores, the interrupt thread
-is mapped to the master lcore.
+is mapped to the initial lcore.
 
 Rx Callback for SW Rx Adapter
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/doc/guides/prog_guide/glossary.rst b/doc/guides/prog_guide/glossary.rst
index 21063a414729..3716efd13da2 100644
--- a/doc/guides/prog_guide/glossary.rst
+++ b/doc/guides/prog_guide/glossary.rst
@@ -124,9 +124,9 @@ LAN
 LPM
    Longest Prefix Match
 
-master lcore
+initial lcore
    The execution unit that executes the main() function and that launches
-   other lcores.
+   other lcores. Described in older versions as master lcore.
 
 mbuf
    An mbuf is a data structure used internally to carry messages (mainly
@@ -184,8 +184,8 @@ RTE
 Rx
    Reception
 
-Slave lcore
-   Any *lcore* that is not the *master lcore*.
+Worker lcore
+   Any *lcore* that is not the *initial lcore*.
 
 Socket
    A physical CPU, that includes several *cores*.
diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
index 5cbc4ce14446..ecbceb0d05e3 100644
--- a/doc/guides/rel_notes/release_20_08.rst
+++ b/doc/guides/rel_notes/release_20_08.rst
@@ -107,6 +107,9 @@ New Features
   * Dump ``rte_flow`` memory consumption.
   * Measure packet per second forwarding.
 
+* **Renamed master lcore to initial lcore.**
+
+  The name given to the first thread in DPDK is changed from master lcore to initial lcore.
 
 Removed Items
 -------------
@@ -122,7 +125,6 @@ Removed Items
 
 * Removed ``RTE_KDRV_NONE`` based PCI device driver probing.
 
-
 API Changes
 -----------
 
@@ -143,6 +145,9 @@ API Changes
 * vhost: The API of ``rte_vhost_host_notifier_ctrl`` was changed to be per
   queue and not per device, a qid parameter was added to the arguments list.
 
+* ``rte_get_master_lcore`` was renamed to ``rte_get_initial_lcore``
+  The old function is deprecated and will be removed in future release.
+
 
 ABI Changes
 -----------
diff --git a/doc/guides/sample_app_ug/bbdev_app.rst b/doc/guides/sample_app_ug/bbdev_app.rst
index 405e706a46e4..5917d52ca199 100644
--- a/doc/guides/sample_app_ug/bbdev_app.rst
+++ b/doc/guides/sample_app_ug/bbdev_app.rst
@@ -94,7 +94,7 @@ device gets linked to a corresponding ethernet port as whitelisted by
 the parameter -w.
 3 cores are allocated to the application, and assigned as:
 
- - core 3 is the master and used to print the stats live on screen,
+ - core 3 is the initial and used to print the stats live on screen,
 
  - core 4 is the encoding lcore performing Rx and Turbo Encode operations
 
diff --git a/doc/guides/sample_app_ug/ethtool.rst b/doc/guides/sample_app_ug/ethtool.rst
index 8f7fc6ca66c0..a4b92255c266 100644
--- a/doc/guides/sample_app_ug/ethtool.rst
+++ b/doc/guides/sample_app_ug/ethtool.rst
@@ -64,8 +64,8 @@ Explanation
 -----------
 
 The sample program has two parts: A background `packet reflector`_
-that runs on a slave core, and a foreground `Ethtool Shell`_ that
-runs on the master core. These are described below.
+that runs on a worker core, and a foreground `Ethtool Shell`_ that
+runs on the initial core. These are described below.
 
 Packet Reflector
 ~~~~~~~~~~~~~~~~
diff --git a/doc/guides/sample_app_ug/hello_world.rst b/doc/guides/sample_app_ug/hello_world.rst
index 46f997a7dce3..f6740b10e385 100644
--- a/doc/guides/sample_app_ug/hello_world.rst
+++ b/doc/guides/sample_app_ug/hello_world.rst
@@ -75,13 +75,13 @@ The code that launches the function on each lcore is as follows:
 
 .. code-block:: c
 
-    /* call lcore_hello() on every slave lcore */
+    /* call lcore_hello() on every worker lcore */
 
-    RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+    RTE_LCORE_FOREACH_WORKER(lcore_id) {
        rte_eal_remote_launch(lcore_hello, NULL, lcore_id);
     }
 
-    /* call it on master lcore too */
+    /* call it on initial lcore too */
 
     lcore_hello(NULL);
 
@@ -89,6 +89,6 @@ The following code is equivalent and simpler:
 
 .. code-block:: c
 
-    rte_eal_mp_remote_launch(lcore_hello, NULL, CALL_MASTER);
+    rte_eal_mp_remote_launch(lcore_hello, NULL, CALL_INITIAL);
 
 Refer to the *DPDK API Reference* for detailed information on the rte_eal_mp_remote_launch() function.
diff --git a/doc/guides/sample_app_ug/ioat.rst b/doc/guides/sample_app_ug/ioat.rst
index bab7654b8d4d..c75b91bfa989 100644
--- a/doc/guides/sample_app_ug/ioat.rst
+++ b/doc/guides/sample_app_ug/ioat.rst
@@ -69,13 +69,13 @@ provided parameters. The app can use up to 2 lcores: one of them receives
 incoming traffic and makes a copy of each packet. The second lcore then
 updates MAC address and sends the copy. If one lcore per port is used,
 both operations are done sequentially. For each configuration an additional
-lcore is needed since the master lcore does not handle traffic but is
+lcore is needed since the initial lcore does not handle traffic but is
 responsible for configuration, statistics printing and safe shutdown of
 all ports and devices.
 
 The application can use a maximum of 8 ports.
 
-To run the application in a Linux environment with 3 lcores (the master lcore,
+To run the application in a Linux environment with 3 lcores (the initial lcore,
 plus two forwarding cores), a single port (port 0), software copying and MAC
 updating issue the command:
 
@@ -83,7 +83,7 @@ updating issue the command:
 
     $ ./build/ioatfwd -l 0-2 -n 2 -- -p 0x1 --mac-updating -c sw
 
-To run the application in a Linux environment with 2 lcores (the master lcore,
+To run the application in a Linux environment with 2 lcores (the initial lcore,
 plus one forwarding core), 2 ports (ports 0 and 1), hardware copying and no MAC
 updating issue the command:
 
@@ -208,7 +208,7 @@ After that each port application assigns resources needed.
     cfg.nb_lcores = rte_lcore_count() - 1;
     if (cfg.nb_lcores < 1)
         rte_exit(EXIT_FAILURE,
-            "There should be at least one slave lcore.\n");
+            "There should be at least one worker lcore.\n");
 
     ret = 0;
 
@@ -310,8 +310,8 @@ If initialization is successful, memory for hardware device
 statistics is allocated.
 
 Finally ``main()`` function starts all packet handling lcores and starts
-printing stats in a loop on the master lcore. The application can be
-interrupted and closed using ``Ctrl-C``. The master lcore waits for
+printing stats in a loop on the initial lcore. The application can be
+interrupted and closed using ``Ctrl-C``. The initial lcore waits for
 all slave processes to finish, deallocates resources and exits.
 
 The processing lcores launching function are described below.
diff --git a/doc/guides/sample_app_ug/ip_pipeline.rst b/doc/guides/sample_app_ug/ip_pipeline.rst
index 56014be17458..f395027b3498 100644
--- a/doc/guides/sample_app_ug/ip_pipeline.rst
+++ b/doc/guides/sample_app_ug/ip_pipeline.rst
@@ -122,7 +122,7 @@ is displayed and the application is terminated.
 Run-time
 ~~~~~~~~
 
-The master thread is creating and managing all the application objects based on CLI input.
+The initial thread is creating and managing all the application objects based on CLI input.
 
 Each data plane thread runs one or several pipelines previously assigned to it in round-robin order. Each data plane thread
 executes two tasks in time-sharing mode:
@@ -130,7 +130,7 @@ executes two tasks in time-sharing mode:
 1. *Packet processing task*: Process bursts of input packets read from the pipeline input ports.
 
 2. *Message handling task*: Periodically, the data plane thread pauses the packet processing task and polls for request
-   messages send by the master thread. Examples: add/remove pipeline to/from current data plane thread, add/delete rules
+   messages send by the initial thread. Examples: add/remove pipeline to/from current data plane thread, add/delete rules
    to/from given table of a specific pipeline owned by the current data plane thread, read statistics, etc.
 
 Examples
diff --git a/doc/guides/sample_app_ug/keep_alive.rst b/doc/guides/sample_app_ug/keep_alive.rst
index 865ba69e5c47..bca5df8ba934 100644
--- a/doc/guides/sample_app_ug/keep_alive.rst
+++ b/doc/guides/sample_app_ug/keep_alive.rst
@@ -16,7 +16,7 @@ Overview
 --------
 
 The application demonstrates how to protect against 'silent outages'
-on packet processing cores. A Keep Alive Monitor Agent Core (master)
+on packet processing cores. A Keep Alive Monitor Agent Core (initial)
 monitors the state of packet processing cores (worker cores) by
 dispatching pings at a regular time interval (default is 5ms) and
 monitoring the state of the cores. Cores states are: Alive, MIA, Dead
diff --git a/doc/guides/sample_app_ug/l2_forward_event.rst b/doc/guides/sample_app_ug/l2_forward_event.rst
index d536eee819d0..f384420cf1f0 100644
--- a/doc/guides/sample_app_ug/l2_forward_event.rst
+++ b/doc/guides/sample_app_ug/l2_forward_event.rst
@@ -630,8 +630,8 @@ not many packets to send, however it improves performance:
 
                         /* if timer has reached its timeout */
                         if (unlikely(timer_tsc >= timer_period)) {
-                                /* do this only on master core */
-                                if (lcore_id == rte_get_master_lcore()) {
+                                /* do this only on initial core */
+                                if (lcore_id == rte_get_initial_lcore()) {
                                         print_stats();
                                         /* reset the timer */
                                         timer_tsc = 0;
diff --git a/doc/guides/sample_app_ug/l2_forward_real_virtual.rst b/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
index 671d0c7c19d4..615a55c36db9 100644
--- a/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
+++ b/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
@@ -440,9 +440,9 @@ however it improves performance:
             /* if timer has reached its timeout */
 
             if (unlikely(timer_tsc >= (uint64_t) timer_period)) {
-                /* do this only on master core */
+                /* do this only on initial core */
 
-                if (lcore_id == rte_get_master_lcore()) {
+                if (lcore_id == rte_get_initial_lcore()) {
                     print_stats();
 
                     /* reset the timer */
diff --git a/doc/guides/sample_app_ug/l3_forward_graph.rst b/doc/guides/sample_app_ug/l3_forward_graph.rst
index df50827bab86..4ac96fc0c2f7 100644
--- a/doc/guides/sample_app_ug/l3_forward_graph.rst
+++ b/doc/guides/sample_app_ug/l3_forward_graph.rst
@@ -22,7 +22,7 @@ Run-time path is main thing that differs from L3 forwarding sample application.
 Difference is that forwarding logic starting from Rx, followed by LPM lookup,
 TTL update and finally Tx is implemented inside graph nodes. These nodes are
 interconnected in graph framework. Application main loop needs to walk over
-graph using ``rte_graph_walk()`` with graph objects created one per slave lcore.
+graph using ``rte_graph_walk()`` with graph objects created one per worker lcore.
 
 The lookup method is as per implementation of ``ip4_lookup`` graph node.
 The ID of the output interface for the input packet is the next hop returned by
@@ -265,7 +265,7 @@ headers will be provided run-time using ``rte_node_ip4_route_add()`` and
     Since currently ``ip4_lookup`` and ``ip4_rewrite`` nodes don't support
     lock-less mechanisms(RCU, etc) to add run-time forwarding data like route and
     rewrite data, forwarding data is added before packet processing loop is
-    launched on slave lcore.
+    launched on worker lcore.
 
 .. code-block:: c
 
@@ -297,7 +297,7 @@ Packet Forwarding using Graph Walk
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Now that all the device configurations are done, graph creations are done and
-forwarding data is updated with nodes, slave lcores will be launched with graph
+forwarding data is updated with nodes, worker lcores will be launched with graph
 main loop. Graph main loop is very simple in the sense that it needs to
 continuously call a non-blocking API ``rte_graph_walk()`` with it's lcore
 specific graph object that was already created.
diff --git a/doc/guides/sample_app_ug/l3_forward_power_man.rst b/doc/guides/sample_app_ug/l3_forward_power_man.rst
index 0cc6f2e62e75..f20502c41a37 100644
--- a/doc/guides/sample_app_ug/l3_forward_power_man.rst
+++ b/doc/guides/sample_app_ug/l3_forward_power_man.rst
@@ -441,7 +441,7 @@ The telemetry mode support for ``l3fwd-power`` is a standalone mode, in this mod
 ``l3fwd-power`` does simple l3fwding along with calculating empty polls, full polls,
 and busy percentage for each forwarding core. The aggregation of these
 values of all cores is reported as application level telemetry to metric
-library for every 500ms from the master core.
+library for every 500ms from the initial core.
 
 The busy percentage is calculated by recording the poll_count
 and when the count reaches a defined value the total
diff --git a/doc/guides/sample_app_ug/link_status_intr.rst b/doc/guides/sample_app_ug/link_status_intr.rst
index 04c40f28540d..e31fd2cc7368 100644
--- a/doc/guides/sample_app_ug/link_status_intr.rst
+++ b/doc/guides/sample_app_ug/link_status_intr.rst
@@ -401,9 +401,9 @@ However, it improves performance:
             /* if timer has reached its timeout */
 
             if (unlikely(timer_tsc >= (uint64_t) timer_period)) {
-                /* do this only on master core */
+                /* do this only on initial core */
 
-                if (lcore_id == rte_get_master_lcore()) {
+                if (lcore_id == rte_get_initial_lcore()) {
                     print_stats();
 
                     /* reset the timer */
diff --git a/doc/guides/sample_app_ug/multi_process.rst b/doc/guides/sample_app_ug/multi_process.rst
index f2a79a639763..51b8db5cf75a 100644
--- a/doc/guides/sample_app_ug/multi_process.rst
+++ b/doc/guides/sample_app_ug/multi_process.rst
@@ -66,7 +66,7 @@ The process should start successfully and display a command prompt as follows:
 
     EAL: check igb_uio module
     EAL: check module finished
-    EAL: Master core 0 is ready (tid=54e41820)
+    EAL: Initial core 0 is ready (tid=54e41820)
     EAL: Core 1 is ready (tid=53b32700)
 
     Starting core 1
@@ -92,7 +92,7 @@ At any stage, either process can be terminated using the quit command.
 
 .. code-block:: console
 
-   EAL: Master core 10 is ready (tid=b5f89820)           EAL: Master core 8 is ready (tid=864a3820)
+   EAL: Initial core 10 is ready (tid=b5f89820)           EAL: Initial core 8 is ready (tid=864a3820)
    EAL: Core 11 is ready (tid=84ffe700)                  EAL: Core 9 is ready (tid=85995700)
    Starting core 11                                      Starting core 9
    simple_mp > send hello_secondary                      simple_mp > core 9: Received 'hello_secondary'
@@ -273,7 +273,7 @@ In addition to the EAL parameters, the application- specific parameters are:
 
 .. note::
 
-    In the server process, a single thread, the master thread, that is, the lowest numbered lcore in the coremask/corelist, performs all packet I/O.
+    In the server process, a single thread, the initial thread, that is, the lowest numbered lcore in the coremask/corelist, performs all packet I/O.
     If a coremask/corelist is specified with more than a single lcore bit set in it,
     an additional lcore will be used for a thread to periodically print packet count statistics.
 
diff --git a/doc/guides/sample_app_ug/packet_ordering.rst b/doc/guides/sample_app_ug/packet_ordering.rst
index 1c8ee5d04071..e82938bd7c9c 100644
--- a/doc/guides/sample_app_ug/packet_ordering.rst
+++ b/doc/guides/sample_app_ug/packet_ordering.rst
@@ -12,14 +12,14 @@ Overview
 
 The application uses at least three CPU cores:
 
-* RX core (maser core) receives traffic from the NIC ports and feeds Worker
+* RX core (initial core) receives traffic from the NIC ports and feeds Worker
   cores with traffic through SW queues.
 
-* Worker core (slave core) basically do some light work on the packet.
+* Worker cores basically do some light work on the packet.
   Currently it modifies the output port of the packet for configurations with
   more than one port enabled.
 
-* TX Core (slave core) receives traffic from Worker cores through software queues,
+* TX Core receives traffic from Worker cores through software queues,
   inserts out-of-order packets into reorder buffer, extracts ordered packets
   from the reorder buffer and sends them to the NIC ports for transmission.
 
@@ -46,7 +46,7 @@ The application execution command line is:
     ./packet_ordering [EAL options] -- -p PORTMASK [--disable-reorder] [--insight-worker]
 
 The -c EAL CPU_COREMASK option has to contain at least 3 CPU cores.
-The first CPU core in the core mask is the master core and would be assigned to
+The first CPU core in the core mask is the initial core and would be assigned to
 RX core, the last to TX core and the rest to Worker cores.
 
 The PORTMASK parameter must contain either 1 or even enabled port numbers.
diff --git a/doc/guides/sample_app_ug/performance_thread.rst b/doc/guides/sample_app_ug/performance_thread.rst
index b04d0ba444af..29105f9708eb 100644
--- a/doc/guides/sample_app_ug/performance_thread.rst
+++ b/doc/guides/sample_app_ug/performance_thread.rst
@@ -280,8 +280,8 @@ functionality into different threads, and the pairs of RX and TX threads are
 interconnected via software rings.
 
 On initialization an L-thread scheduler is started on every EAL thread. On all
-but the master EAL thread only a dummy L-thread is initially started.
-The L-thread started on the master EAL thread then spawns other L-threads on
+but the initial EAL thread only a dummy L-thread is initially started.
+The L-thread started on the initial EAL thread then spawns other L-threads on
 different L-thread schedulers according the command line parameters.
 
 The RX threads poll the network interface queues and post received packets
@@ -1217,5 +1217,5 @@ Setting ``LTHREAD_DIAG`` also enables counting of statistics about cache and
 queue usage, and these statistics can be displayed by calling the function
 ``lthread_diag_stats_display()``. This function also performs a consistency
 check on the caches and queues. The function should only be called from the
-master EAL thread after all slave threads have stopped and returned to the C
+initial EAL thread after all worker threads have stopped and returned to the C
 main program, otherwise the consistency check will fail.
diff --git a/doc/guides/sample_app_ug/qos_scheduler.rst b/doc/guides/sample_app_ug/qos_scheduler.rst
index b5010657a7d8..345ecbb5905d 100644
--- a/doc/guides/sample_app_ug/qos_scheduler.rst
+++ b/doc/guides/sample_app_ug/qos_scheduler.rst
@@ -71,7 +71,7 @@ Optional application parameters include:
     In this mode, the application shows a command line that can be used for obtaining statistics while
     scheduling is taking place (see interactive mode below for more information).
 
-*   --mst n: Master core index (the default value is 1).
+*   --mst n: Initial core index (the default value is 1).
 
 *   --rsz "A, B, C": Ring sizes:
 
@@ -329,7 +329,7 @@ Another example with 2 packet flow configurations using different ports but shar
 Note that independent cores for the packet flow configurations for each of the RX, WT and TX thread are also supported,
 providing flexibility to balance the work.
 
-The EAL coremask/corelist is constrained to contain the default mastercore 1 and the RX, WT and TX cores only.
+The EAL coremask/corelist is constrained to contain the default initial lcore 1 and the RX, WT and TX cores only.
 
 Explanation
 -----------
diff --git a/doc/guides/sample_app_ug/timer.rst b/doc/guides/sample_app_ug/timer.rst
index 98d762d2388c..59a8ab11e9b6 100644
--- a/doc/guides/sample_app_ug/timer.rst
+++ b/doc/guides/sample_app_ug/timer.rst
@@ -49,17 +49,18 @@ In addition to EAL initialization, the timer subsystem must be initialized, by c
     rte_timer_subsystem_init();
 
 After timer creation (see the next paragraph),
-the main loop is executed on each slave lcore using the well-known rte_eal_remote_launch() and also on the master.
+the main loop is executed on each worker lcore using the well-known rte_eal_remote_launch() and
+also on the initial lcore.
 
 .. code-block:: c
 
-    /* call lcore_mainloop() on every slave lcore  */
+    /* call lcore_mainloop() on every worker lcore  */
 
-    RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+    RTE_LCORE_FOREACH_WORKER(lcore_id) {
         rte_eal_remote_launch(lcore_mainloop, NULL, lcore_id);
     }
 
-    /* call it on master lcore too */
+    /* call it on initial lcore too */
 
     (void) lcore_mainloop(NULL);
 
@@ -105,7 +106,7 @@ This call to rte_timer_init() is necessary before doing any other operation on t
 
 Then, the two timers are configured:
 
-*   The first timer (timer0) is loaded on the master lcore and expires every second.
+*   The first timer (timer0) is loaded on the initial lcore and expires every second.
     Since the PERIODICAL flag is provided, the timer is reloaded automatically by the timer subsystem.
     The callback function is timer0_cb().
 
@@ -115,7 +116,7 @@ Then, the two timers are configured:
 
 .. code-block:: c
 
-    /* load timer0, every second, on master lcore, reloaded automatically */
+    /* load timer0, every second, on initial lcore, reloaded automatically */
 
     hz = rte_get_hpet_hz();
 
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index f169604752b8..7d6b81de7f46 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -71,7 +71,7 @@ The command line options are:
 *   ``--coremask=0xXX``
 
     Set the hexadecimal bitmask of the cores running the packet forwarding test.
-    The master lcore is reserved for command line parsing only and cannot be masked on for packet forwarding.
+    The initial lcore is reserved for command line parsing only and cannot be masked on for packet forwarding.
 
 *   ``--portmask=0xXX``
 
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index a808b6a308f2..7d4db1140092 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -692,7 +692,7 @@ This is equivalent to the ``--coremask`` command-line option.
 
 .. note::
 
-   The master lcore is reserved for command line parsing only and cannot be masked on for packet forwarding.
+   The initial lcore is reserved for command line parsing only and cannot be masked on for packet forwarding.
 
 set portmask
 ~~~~~~~~~~~~
-- 
2.26.2


^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v3 22/27] doc: update references to master/slave lcore in documentation
  @ 2020-07-01 19:46  1%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2020-07-01 19:46 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

New terms are initial and worker lcores.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 doc/guides/contributing/coding_style.rst            |  2 +-
 doc/guides/faq/faq.rst                              |  6 +++---
 doc/guides/howto/debug_troubleshoot.rst             |  2 +-
 doc/guides/linux_gsg/eal_args.include.rst           |  4 ++--
 doc/guides/nics/bnxt.rst                            |  2 +-
 doc/guides/nics/fail_safe.rst                       |  3 ---
 doc/guides/prog_guide/env_abstraction_layer.rst     |  6 +++---
 doc/guides/prog_guide/event_ethernet_rx_adapter.rst |  2 +-
 doc/guides/prog_guide/glossary.rst                  |  8 ++++----
 doc/guides/rel_notes/release_20_08.rst              |  7 ++++++-
 doc/guides/sample_app_ug/bbdev_app.rst              |  2 +-
 doc/guides/sample_app_ug/ethtool.rst                |  4 ++--
 doc/guides/sample_app_ug/hello_world.rst            |  8 ++++----
 doc/guides/sample_app_ug/ioat.rst                   | 12 ++++++------
 doc/guides/sample_app_ug/ip_pipeline.rst            |  4 ++--
 doc/guides/sample_app_ug/keep_alive.rst             |  2 +-
 doc/guides/sample_app_ug/l2_forward_event.rst       |  4 ++--
 .../sample_app_ug/l2_forward_real_virtual.rst       |  4 ++--
 doc/guides/sample_app_ug/l3_forward_graph.rst       |  6 +++---
 doc/guides/sample_app_ug/l3_forward_power_man.rst   |  2 +-
 doc/guides/sample_app_ug/link_status_intr.rst       |  4 ++--
 doc/guides/sample_app_ug/multi_process.rst          |  6 +++---
 doc/guides/sample_app_ug/packet_ordering.rst        |  8 ++++----
 doc/guides/sample_app_ug/performance_thread.rst     |  6 +++---
 doc/guides/sample_app_ug/qos_scheduler.rst          |  4 ++--
 doc/guides/sample_app_ug/timer.rst                  | 13 +++++++------
 doc/guides/testpmd_app_ug/run_app.rst               |  2 +-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst         |  2 +-
 28 files changed, 69 insertions(+), 66 deletions(-)

diff --git a/doc/guides/contributing/coding_style.rst b/doc/guides/contributing/coding_style.rst
index 4efde93f6af0..321d54438f7d 100644
--- a/doc/guides/contributing/coding_style.rst
+++ b/doc/guides/contributing/coding_style.rst
@@ -334,7 +334,7 @@ For example:
 	typedef int (lcore_function_t)(void *);
 
 	/* launch a function of lcore_function_t type */
-	int rte_eal_remote_launch(lcore_function_t *f, void *arg, unsigned slave_id);
+	int rte_eal_remote_launch(lcore_function_t *f, void *arg, unsigned id);
 
 
 C Indentation
diff --git a/doc/guides/faq/faq.rst b/doc/guides/faq/faq.rst
index f19c1389b6af..cb5f35923d64 100644
--- a/doc/guides/faq/faq.rst
+++ b/doc/guides/faq/faq.rst
@@ -42,13 +42,13 @@ I am running a 32-bit DPDK application on a NUMA system, and sometimes the appli
 If your system has a lot (>1 GB size) of hugepage memory, not all of it will be allocated.
 Due to hugepages typically being allocated on a local NUMA node, the hugepages allocation the application gets during the initialization depends on which
 NUMA node it is running on (the EAL does not affinitize cores until much later in the initialization process).
-Sometimes, the Linux OS runs the DPDK application on a core that is located on a different NUMA node from DPDK master core and
+Sometimes, the Linux OS runs the DPDK application on a core that is located on a different NUMA node from DPDK initial core and
 therefore all the hugepages are allocated on the wrong socket.
 
 To avoid this scenario, either lower the amount of hugepage memory available to 1 GB size (or less), or run the application with taskset
-affinitizing the application to a would-be master core.
+affinitizing the application to a would-be initial core.
 
-For example, if your EAL coremask is 0xff0, the master core will usually be the first core in the coremask (0x10); this is what you have to supply to taskset::
+For example, if your EAL coremask is 0xff0, the initial core will usually be the first core in the coremask (0x10); this is what you have to supply to taskset::
 
    taskset 0x10 ./l2fwd -l 4-11 -n 2
 
diff --git a/doc/guides/howto/debug_troubleshoot.rst b/doc/guides/howto/debug_troubleshoot.rst
index cef016b2fef4..fdeaabe62206 100644
--- a/doc/guides/howto/debug_troubleshoot.rst
+++ b/doc/guides/howto/debug_troubleshoot.rst
@@ -311,7 +311,7 @@ Custom worker function :numref:`dtg_distributor_worker`.
      SERVICE. Check performance functions are mapped to run on the cores.
 
    * For high-performance execution logic ensure running it on correct NUMA
-     and non-master core.
+     and worker core.
 
    * Analyze run logic with ``rte_dump_stack``, ``rte_dump_registers`` and
      ``rte_memdump`` for more insights.
diff --git a/doc/guides/linux_gsg/eal_args.include.rst b/doc/guides/linux_gsg/eal_args.include.rst
index 0fe44579689b..ca7508fb423e 100644
--- a/doc/guides/linux_gsg/eal_args.include.rst
+++ b/doc/guides/linux_gsg/eal_args.include.rst
@@ -33,9 +33,9 @@ Lcore-related options
     At a given instance only one core option ``--lcores``, ``-l`` or ``-c`` can
     be used.
 
-*   ``--master-lcore <core ID>``
+*   ``--initial-lcore <core ID>``
 
-    Core ID that is used as master.
+    Core ID that is used as initial lcore.
 
 *   ``-s <service core mask>``
 
diff --git a/doc/guides/nics/bnxt.rst b/doc/guides/nics/bnxt.rst
index a53cdad21d34..6a7314a91627 100644
--- a/doc/guides/nics/bnxt.rst
+++ b/doc/guides/nics/bnxt.rst
@@ -385,7 +385,7 @@ The application enables multiple TX and RX queues when it is started.
 
 .. code-block:: console
 
-    testpmd -l 1,3,5 --master-lcore 1 --txq=2 –rxq=2 --nb-cores=2
+    testpmd -l 1,3,5 --initial-lcore 1 --txq=2 –rxq=2 --nb-cores=2
 
 **TSS**
 
diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst
index b4a92f663b17..3b15d6f0743d 100644
--- a/doc/guides/nics/fail_safe.rst
+++ b/doc/guides/nics/fail_safe.rst
@@ -236,9 +236,6 @@ Upkeep round
     (brought down or up accordingly). Additionally, any sub-device marked for
     removal is cleaned-up.
 
-Slave
-    In the context of the fail-safe PMD, synonymous to sub-device.
-
 Sub-device
     A device being utilized by the fail-safe PMD.
     This is another PMD running underneath the fail-safe PMD.
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 48a2fec066db..463245463c52 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -64,7 +64,7 @@ It consist of calls to the pthread library (more specifically, pthread_self(), p
 .. note::
 
     Initialization of objects, such as memory zones, rings, memory pools, lpm tables and hash tables,
-    should be done as part of the overall application initialization on the master lcore.
+    should be done as part of the overall application initialization on the initial lcore.
     The creation and initialization functions for these objects are not multi-thread safe.
     However, once initialized, the objects themselves can safely be used in multiple threads simultaneously.
 
@@ -186,7 +186,7 @@ very dependent on the memory allocation patterns of the application.
 
 Additional restrictions are present when running in 32-bit mode. In dynamic
 memory mode, by default maximum of 2 gigabytes of VA space will be preallocated,
-and all of it will be on master lcore NUMA node unless ``--socket-mem`` flag is
+and all of it will be on initial lcore NUMA node unless ``--socket-mem`` flag is
 used.
 
 In legacy mode, VA space will only be preallocated for segments that were
@@ -603,7 +603,7 @@ controlled with tools like taskset (Linux) or cpuset (FreeBSD),
 - with affinity restricted to 2-4, the Control Threads will end up on
   CPU 4.
 - with affinity restricted to 2-3, the Control Threads will end up on
-  CPU 2 (master lcore, which is the default when no CPU is available).
+  CPU 2 (initial lcore, which is the default when no CPU is available).
 
 .. _known_issue_label:
 
diff --git a/doc/guides/prog_guide/event_ethernet_rx_adapter.rst b/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
index c7dda92215ea..5d015fa2d678 100644
--- a/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
+++ b/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
@@ -172,7 +172,7 @@ converts the received packets to events in the same manner as packets
 received on a polled Rx queue. The interrupt thread is affinitized to the same
 CPUs as the lcores of the Rx adapter service function, if the Rx adapter
 service function has not been mapped to any lcores, the interrupt thread
-is mapped to the master lcore.
+is mapped to the initial lcore.
 
 Rx Callback for SW Rx Adapter
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/doc/guides/prog_guide/glossary.rst b/doc/guides/prog_guide/glossary.rst
index 21063a414729..3716efd13da2 100644
--- a/doc/guides/prog_guide/glossary.rst
+++ b/doc/guides/prog_guide/glossary.rst
@@ -124,9 +124,9 @@ LAN
 LPM
    Longest Prefix Match
 
-master lcore
+initial lcore
    The execution unit that executes the main() function and that launches
-   other lcores.
+   other lcores. Described in older versions as master lcore.
 
 mbuf
    An mbuf is a data structure used internally to carry messages (mainly
@@ -184,8 +184,8 @@ RTE
 Rx
    Reception
 
-Slave lcore
-   Any *lcore* that is not the *master lcore*.
+Worker lcore
+   Any *lcore* that is not the *initial lcore*.
 
 Socket
    A physical CPU, that includes several *cores*.
diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
index 5cbc4ce14446..ecbceb0d05e3 100644
--- a/doc/guides/rel_notes/release_20_08.rst
+++ b/doc/guides/rel_notes/release_20_08.rst
@@ -107,6 +107,9 @@ New Features
   * Dump ``rte_flow`` memory consumption.
   * Measure packet per second forwarding.
 
+* **Renamed master lcore to initial lcore.**
+
+  The name given to the first thread in DPDK is changed from master lcore to initial lcore.
 
 Removed Items
 -------------
@@ -122,7 +125,6 @@ Removed Items
 
 * Removed ``RTE_KDRV_NONE`` based PCI device driver probing.
 
-
 API Changes
 -----------
 
@@ -143,6 +145,9 @@ API Changes
 * vhost: The API of ``rte_vhost_host_notifier_ctrl`` was changed to be per
   queue and not per device, a qid parameter was added to the arguments list.
 
+* ``rte_get_master_lcore`` was renamed to ``rte_get_initial_lcore``
+  The old function is deprecated and will be removed in future release.
+
 
 ABI Changes
 -----------
diff --git a/doc/guides/sample_app_ug/bbdev_app.rst b/doc/guides/sample_app_ug/bbdev_app.rst
index 405e706a46e4..5917d52ca199 100644
--- a/doc/guides/sample_app_ug/bbdev_app.rst
+++ b/doc/guides/sample_app_ug/bbdev_app.rst
@@ -94,7 +94,7 @@ device gets linked to a corresponding ethernet port as whitelisted by
 the parameter -w.
 3 cores are allocated to the application, and assigned as:
 
- - core 3 is the master and used to print the stats live on screen,
+ - core 3 is the initial and used to print the stats live on screen,
 
  - core 4 is the encoding lcore performing Rx and Turbo Encode operations
 
diff --git a/doc/guides/sample_app_ug/ethtool.rst b/doc/guides/sample_app_ug/ethtool.rst
index 8f7fc6ca66c0..a4b92255c266 100644
--- a/doc/guides/sample_app_ug/ethtool.rst
+++ b/doc/guides/sample_app_ug/ethtool.rst
@@ -64,8 +64,8 @@ Explanation
 -----------
 
 The sample program has two parts: A background `packet reflector`_
-that runs on a slave core, and a foreground `Ethtool Shell`_ that
-runs on the master core. These are described below.
+that runs on a worker core, and a foreground `Ethtool Shell`_ that
+runs on the initial core. These are described below.
 
 Packet Reflector
 ~~~~~~~~~~~~~~~~
diff --git a/doc/guides/sample_app_ug/hello_world.rst b/doc/guides/sample_app_ug/hello_world.rst
index 46f997a7dce3..f6740b10e385 100644
--- a/doc/guides/sample_app_ug/hello_world.rst
+++ b/doc/guides/sample_app_ug/hello_world.rst
@@ -75,13 +75,13 @@ The code that launches the function on each lcore is as follows:
 
 .. code-block:: c
 
-    /* call lcore_hello() on every slave lcore */
+    /* call lcore_hello() on every worker lcore */
 
-    RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+    RTE_LCORE_FOREACH_WORKER(lcore_id) {
        rte_eal_remote_launch(lcore_hello, NULL, lcore_id);
     }
 
-    /* call it on master lcore too */
+    /* call it on initial lcore too */
 
     lcore_hello(NULL);
 
@@ -89,6 +89,6 @@ The following code is equivalent and simpler:
 
 .. code-block:: c
 
-    rte_eal_mp_remote_launch(lcore_hello, NULL, CALL_MASTER);
+    rte_eal_mp_remote_launch(lcore_hello, NULL, CALL_INITIAL);
 
 Refer to the *DPDK API Reference* for detailed information on the rte_eal_mp_remote_launch() function.
diff --git a/doc/guides/sample_app_ug/ioat.rst b/doc/guides/sample_app_ug/ioat.rst
index bab7654b8d4d..c75b91bfa989 100644
--- a/doc/guides/sample_app_ug/ioat.rst
+++ b/doc/guides/sample_app_ug/ioat.rst
@@ -69,13 +69,13 @@ provided parameters. The app can use up to 2 lcores: one of them receives
 incoming traffic and makes a copy of each packet. The second lcore then
 updates MAC address and sends the copy. If one lcore per port is used,
 both operations are done sequentially. For each configuration an additional
-lcore is needed since the master lcore does not handle traffic but is
+lcore is needed since the initial lcore does not handle traffic but is
 responsible for configuration, statistics printing and safe shutdown of
 all ports and devices.
 
 The application can use a maximum of 8 ports.
 
-To run the application in a Linux environment with 3 lcores (the master lcore,
+To run the application in a Linux environment with 3 lcores (the initial lcore,
 plus two forwarding cores), a single port (port 0), software copying and MAC
 updating issue the command:
 
@@ -83,7 +83,7 @@ updating issue the command:
 
     $ ./build/ioatfwd -l 0-2 -n 2 -- -p 0x1 --mac-updating -c sw
 
-To run the application in a Linux environment with 2 lcores (the master lcore,
+To run the application in a Linux environment with 2 lcores (the initial lcore,
 plus one forwarding core), 2 ports (ports 0 and 1), hardware copying and no MAC
 updating issue the command:
 
@@ -208,7 +208,7 @@ After that each port application assigns resources needed.
     cfg.nb_lcores = rte_lcore_count() - 1;
     if (cfg.nb_lcores < 1)
         rte_exit(EXIT_FAILURE,
-            "There should be at least one slave lcore.\n");
+            "There should be at least one worker lcore.\n");
 
     ret = 0;
 
@@ -310,8 +310,8 @@ If initialization is successful, memory for hardware device
 statistics is allocated.
 
 Finally ``main()`` function starts all packet handling lcores and starts
-printing stats in a loop on the master lcore. The application can be
-interrupted and closed using ``Ctrl-C``. The master lcore waits for
+printing stats in a loop on the initial lcore. The application can be
+interrupted and closed using ``Ctrl-C``. The initial lcore waits for
 all slave processes to finish, deallocates resources and exits.
 
 The processing lcores launching function are described below.
diff --git a/doc/guides/sample_app_ug/ip_pipeline.rst b/doc/guides/sample_app_ug/ip_pipeline.rst
index 56014be17458..f395027b3498 100644
--- a/doc/guides/sample_app_ug/ip_pipeline.rst
+++ b/doc/guides/sample_app_ug/ip_pipeline.rst
@@ -122,7 +122,7 @@ is displayed and the application is terminated.
 Run-time
 ~~~~~~~~
 
-The master thread is creating and managing all the application objects based on CLI input.
+The initial thread is creating and managing all the application objects based on CLI input.
 
 Each data plane thread runs one or several pipelines previously assigned to it in round-robin order. Each data plane thread
 executes two tasks in time-sharing mode:
@@ -130,7 +130,7 @@ executes two tasks in time-sharing mode:
 1. *Packet processing task*: Process bursts of input packets read from the pipeline input ports.
 
 2. *Message handling task*: Periodically, the data plane thread pauses the packet processing task and polls for request
-   messages send by the master thread. Examples: add/remove pipeline to/from current data plane thread, add/delete rules
+   messages send by the initial thread. Examples: add/remove pipeline to/from current data plane thread, add/delete rules
    to/from given table of a specific pipeline owned by the current data plane thread, read statistics, etc.
 
 Examples
diff --git a/doc/guides/sample_app_ug/keep_alive.rst b/doc/guides/sample_app_ug/keep_alive.rst
index 865ba69e5c47..bca5df8ba934 100644
--- a/doc/guides/sample_app_ug/keep_alive.rst
+++ b/doc/guides/sample_app_ug/keep_alive.rst
@@ -16,7 +16,7 @@ Overview
 --------
 
 The application demonstrates how to protect against 'silent outages'
-on packet processing cores. A Keep Alive Monitor Agent Core (master)
+on packet processing cores. A Keep Alive Monitor Agent Core (initial)
 monitors the state of packet processing cores (worker cores) by
 dispatching pings at a regular time interval (default is 5ms) and
 monitoring the state of the cores. Cores states are: Alive, MIA, Dead
diff --git a/doc/guides/sample_app_ug/l2_forward_event.rst b/doc/guides/sample_app_ug/l2_forward_event.rst
index d536eee819d0..f384420cf1f0 100644
--- a/doc/guides/sample_app_ug/l2_forward_event.rst
+++ b/doc/guides/sample_app_ug/l2_forward_event.rst
@@ -630,8 +630,8 @@ not many packets to send, however it improves performance:
 
                         /* if timer has reached its timeout */
                         if (unlikely(timer_tsc >= timer_period)) {
-                                /* do this only on master core */
-                                if (lcore_id == rte_get_master_lcore()) {
+                                /* do this only on initial core */
+                                if (lcore_id == rte_get_initial_lcore()) {
                                         print_stats();
                                         /* reset the timer */
                                         timer_tsc = 0;
diff --git a/doc/guides/sample_app_ug/l2_forward_real_virtual.rst b/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
index 671d0c7c19d4..615a55c36db9 100644
--- a/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
+++ b/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
@@ -440,9 +440,9 @@ however it improves performance:
             /* if timer has reached its timeout */
 
             if (unlikely(timer_tsc >= (uint64_t) timer_period)) {
-                /* do this only on master core */
+                /* do this only on initial core */
 
-                if (lcore_id == rte_get_master_lcore()) {
+                if (lcore_id == rte_get_initial_lcore()) {
                     print_stats();
 
                     /* reset the timer */
diff --git a/doc/guides/sample_app_ug/l3_forward_graph.rst b/doc/guides/sample_app_ug/l3_forward_graph.rst
index df50827bab86..4ac96fc0c2f7 100644
--- a/doc/guides/sample_app_ug/l3_forward_graph.rst
+++ b/doc/guides/sample_app_ug/l3_forward_graph.rst
@@ -22,7 +22,7 @@ Run-time path is main thing that differs from L3 forwarding sample application.
 Difference is that forwarding logic starting from Rx, followed by LPM lookup,
 TTL update and finally Tx is implemented inside graph nodes. These nodes are
 interconnected in graph framework. Application main loop needs to walk over
-graph using ``rte_graph_walk()`` with graph objects created one per slave lcore.
+graph using ``rte_graph_walk()`` with graph objects created one per worker lcore.
 
 The lookup method is as per implementation of ``ip4_lookup`` graph node.
 The ID of the output interface for the input packet is the next hop returned by
@@ -265,7 +265,7 @@ headers will be provided run-time using ``rte_node_ip4_route_add()`` and
     Since currently ``ip4_lookup`` and ``ip4_rewrite`` nodes don't support
     lock-less mechanisms(RCU, etc) to add run-time forwarding data like route and
     rewrite data, forwarding data is added before packet processing loop is
-    launched on slave lcore.
+    launched on worker lcore.
 
 .. code-block:: c
 
@@ -297,7 +297,7 @@ Packet Forwarding using Graph Walk
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Now that all the device configurations are done, graph creations are done and
-forwarding data is updated with nodes, slave lcores will be launched with graph
+forwarding data is updated with nodes, worker lcores will be launched with graph
 main loop. Graph main loop is very simple in the sense that it needs to
 continuously call a non-blocking API ``rte_graph_walk()`` with it's lcore
 specific graph object that was already created.
diff --git a/doc/guides/sample_app_ug/l3_forward_power_man.rst b/doc/guides/sample_app_ug/l3_forward_power_man.rst
index 0cc6f2e62e75..f20502c41a37 100644
--- a/doc/guides/sample_app_ug/l3_forward_power_man.rst
+++ b/doc/guides/sample_app_ug/l3_forward_power_man.rst
@@ -441,7 +441,7 @@ The telemetry mode support for ``l3fwd-power`` is a standalone mode, in this mod
 ``l3fwd-power`` does simple l3fwding along with calculating empty polls, full polls,
 and busy percentage for each forwarding core. The aggregation of these
 values of all cores is reported as application level telemetry to metric
-library for every 500ms from the master core.
+library for every 500ms from the initial core.
 
 The busy percentage is calculated by recording the poll_count
 and when the count reaches a defined value the total
diff --git a/doc/guides/sample_app_ug/link_status_intr.rst b/doc/guides/sample_app_ug/link_status_intr.rst
index 04c40f28540d..e31fd2cc7368 100644
--- a/doc/guides/sample_app_ug/link_status_intr.rst
+++ b/doc/guides/sample_app_ug/link_status_intr.rst
@@ -401,9 +401,9 @@ However, it improves performance:
             /* if timer has reached its timeout */
 
             if (unlikely(timer_tsc >= (uint64_t) timer_period)) {
-                /* do this only on master core */
+                /* do this only on initial core */
 
-                if (lcore_id == rte_get_master_lcore()) {
+                if (lcore_id == rte_get_initial_lcore()) {
                     print_stats();
 
                     /* reset the timer */
diff --git a/doc/guides/sample_app_ug/multi_process.rst b/doc/guides/sample_app_ug/multi_process.rst
index f2a79a639763..51b8db5cf75a 100644
--- a/doc/guides/sample_app_ug/multi_process.rst
+++ b/doc/guides/sample_app_ug/multi_process.rst
@@ -66,7 +66,7 @@ The process should start successfully and display a command prompt as follows:
 
     EAL: check igb_uio module
     EAL: check module finished
-    EAL: Master core 0 is ready (tid=54e41820)
+    EAL: Initial core 0 is ready (tid=54e41820)
     EAL: Core 1 is ready (tid=53b32700)
 
     Starting core 1
@@ -92,7 +92,7 @@ At any stage, either process can be terminated using the quit command.
 
 .. code-block:: console
 
-   EAL: Master core 10 is ready (tid=b5f89820)           EAL: Master core 8 is ready (tid=864a3820)
+   EAL: Initial core 10 is ready (tid=b5f89820)           EAL: Initial core 8 is ready (tid=864a3820)
    EAL: Core 11 is ready (tid=84ffe700)                  EAL: Core 9 is ready (tid=85995700)
    Starting core 11                                      Starting core 9
    simple_mp > send hello_secondary                      simple_mp > core 9: Received 'hello_secondary'
@@ -273,7 +273,7 @@ In addition to the EAL parameters, the application- specific parameters are:
 
 .. note::
 
-    In the server process, a single thread, the master thread, that is, the lowest numbered lcore in the coremask/corelist, performs all packet I/O.
+    In the server process, a single thread, the initial thread, that is, the lowest numbered lcore in the coremask/corelist, performs all packet I/O.
     If a coremask/corelist is specified with more than a single lcore bit set in it,
     an additional lcore will be used for a thread to periodically print packet count statistics.
 
diff --git a/doc/guides/sample_app_ug/packet_ordering.rst b/doc/guides/sample_app_ug/packet_ordering.rst
index 1c8ee5d04071..e82938bd7c9c 100644
--- a/doc/guides/sample_app_ug/packet_ordering.rst
+++ b/doc/guides/sample_app_ug/packet_ordering.rst
@@ -12,14 +12,14 @@ Overview
 
 The application uses at least three CPU cores:
 
-* RX core (maser core) receives traffic from the NIC ports and feeds Worker
+* RX core (initial core) receives traffic from the NIC ports and feeds Worker
   cores with traffic through SW queues.
 
-* Worker core (slave core) basically do some light work on the packet.
+* Worker cores basically do some light work on the packet.
   Currently it modifies the output port of the packet for configurations with
   more than one port enabled.
 
-* TX Core (slave core) receives traffic from Worker cores through software queues,
+* TX Core receives traffic from Worker cores through software queues,
   inserts out-of-order packets into reorder buffer, extracts ordered packets
   from the reorder buffer and sends them to the NIC ports for transmission.
 
@@ -46,7 +46,7 @@ The application execution command line is:
     ./packet_ordering [EAL options] -- -p PORTMASK [--disable-reorder] [--insight-worker]
 
 The -c EAL CPU_COREMASK option has to contain at least 3 CPU cores.
-The first CPU core in the core mask is the master core and would be assigned to
+The first CPU core in the core mask is the initial core and would be assigned to
 RX core, the last to TX core and the rest to Worker cores.
 
 The PORTMASK parameter must contain either 1 or even enabled port numbers.
diff --git a/doc/guides/sample_app_ug/performance_thread.rst b/doc/guides/sample_app_ug/performance_thread.rst
index b04d0ba444af..29105f9708eb 100644
--- a/doc/guides/sample_app_ug/performance_thread.rst
+++ b/doc/guides/sample_app_ug/performance_thread.rst
@@ -280,8 +280,8 @@ functionality into different threads, and the pairs of RX and TX threads are
 interconnected via software rings.
 
 On initialization an L-thread scheduler is started on every EAL thread. On all
-but the master EAL thread only a dummy L-thread is initially started.
-The L-thread started on the master EAL thread then spawns other L-threads on
+but the initial EAL thread only a dummy L-thread is initially started.
+The L-thread started on the initial EAL thread then spawns other L-threads on
 different L-thread schedulers according the command line parameters.
 
 The RX threads poll the network interface queues and post received packets
@@ -1217,5 +1217,5 @@ Setting ``LTHREAD_DIAG`` also enables counting of statistics about cache and
 queue usage, and these statistics can be displayed by calling the function
 ``lthread_diag_stats_display()``. This function also performs a consistency
 check on the caches and queues. The function should only be called from the
-master EAL thread after all slave threads have stopped and returned to the C
+initial EAL thread after all worker threads have stopped and returned to the C
 main program, otherwise the consistency check will fail.
diff --git a/doc/guides/sample_app_ug/qos_scheduler.rst b/doc/guides/sample_app_ug/qos_scheduler.rst
index b5010657a7d8..345ecbb5905d 100644
--- a/doc/guides/sample_app_ug/qos_scheduler.rst
+++ b/doc/guides/sample_app_ug/qos_scheduler.rst
@@ -71,7 +71,7 @@ Optional application parameters include:
     In this mode, the application shows a command line that can be used for obtaining statistics while
     scheduling is taking place (see interactive mode below for more information).
 
-*   --mst n: Master core index (the default value is 1).
+*   --mst n: Initial core index (the default value is 1).
 
 *   --rsz "A, B, C": Ring sizes:
 
@@ -329,7 +329,7 @@ Another example with 2 packet flow configurations using different ports but shar
 Note that independent cores for the packet flow configurations for each of the RX, WT and TX thread are also supported,
 providing flexibility to balance the work.
 
-The EAL coremask/corelist is constrained to contain the default mastercore 1 and the RX, WT and TX cores only.
+The EAL coremask/corelist is constrained to contain the default initial lcore 1 and the RX, WT and TX cores only.
 
 Explanation
 -----------
diff --git a/doc/guides/sample_app_ug/timer.rst b/doc/guides/sample_app_ug/timer.rst
index 98d762d2388c..59a8ab11e9b6 100644
--- a/doc/guides/sample_app_ug/timer.rst
+++ b/doc/guides/sample_app_ug/timer.rst
@@ -49,17 +49,18 @@ In addition to EAL initialization, the timer subsystem must be initialized, by c
     rte_timer_subsystem_init();
 
 After timer creation (see the next paragraph),
-the main loop is executed on each slave lcore using the well-known rte_eal_remote_launch() and also on the master.
+the main loop is executed on each worker lcore using the well-known rte_eal_remote_launch() and
+also on the initial lcore.
 
 .. code-block:: c
 
-    /* call lcore_mainloop() on every slave lcore  */
+    /* call lcore_mainloop() on every worker lcore  */
 
-    RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+    RTE_LCORE_FOREACH_WORKER(lcore_id) {
         rte_eal_remote_launch(lcore_mainloop, NULL, lcore_id);
     }
 
-    /* call it on master lcore too */
+    /* call it on initial lcore too */
 
     (void) lcore_mainloop(NULL);
 
@@ -105,7 +106,7 @@ This call to rte_timer_init() is necessary before doing any other operation on t
 
 Then, the two timers are configured:
 
-*   The first timer (timer0) is loaded on the master lcore and expires every second.
+*   The first timer (timer0) is loaded on the initial lcore and expires every second.
     Since the PERIODICAL flag is provided, the timer is reloaded automatically by the timer subsystem.
     The callback function is timer0_cb().
 
@@ -115,7 +116,7 @@ Then, the two timers are configured:
 
 .. code-block:: c
 
-    /* load timer0, every second, on master lcore, reloaded automatically */
+    /* load timer0, every second, on initial lcore, reloaded automatically */
 
     hz = rte_get_hpet_hz();
 
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index f169604752b8..7d6b81de7f46 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -71,7 +71,7 @@ The command line options are:
 *   ``--coremask=0xXX``
 
     Set the hexadecimal bitmask of the cores running the packet forwarding test.
-    The master lcore is reserved for command line parsing only and cannot be masked on for packet forwarding.
+    The initial lcore is reserved for command line parsing only and cannot be masked on for packet forwarding.
 
 *   ``--portmask=0xXX``
 
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index a808b6a308f2..7d4db1140092 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -692,7 +692,7 @@ This is equivalent to the ``--coremask`` command-line option.
 
 .. note::
 
-   The master lcore is reserved for command line parsing only and cannot be masked on for packet forwarding.
+   The initial lcore is reserved for command line parsing only and cannot be masked on for packet forwarding.
 
 set portmask
 ~~~~~~~~~~~~
-- 
2.26.2


^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-07-01  4:50  3%               ` Jerin Jacob
@ 2020-07-01 16:48  0%                 ` McDaniel, Timothy
  0 siblings, 0 replies; 200+ results
From: McDaniel, Timothy @ 2020-07-01 16:48 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Ray Kinsella, Neil Horman, Jerin Jacob, Mattias Rönnblom,
	dpdk-dev, Eads, Gage, Van Haaren, Harry

>-----Original Message-----
>From: Jerin Jacob <jerinjacobk@gmail.com>
>Sent: Tuesday, June 30, 2020 11:50 PM
>To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
>Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>;
>Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
>Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
>
>On Wed, Jul 1, 2020 at 12:57 AM McDaniel, Timothy
><timothy.mcdaniel@intel.com> wrote:
>>
>> >-----Original Message-----
>> >From: Jerin Jacob <jerinjacobk@gmail.com>
>> >Sent: Tuesday, June 30, 2020 10:58 AM
>> >To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
>> >Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>;
>> >Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
>> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
>> ><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
>> >Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
>> >
>> >On Tue, Jun 30, 2020 at 9:12 PM McDaniel, Timothy
>> ><timothy.mcdaniel@intel.com> wrote:
>> >>
>> >> >-----Original Message-----
>> >> >From: Jerin Jacob <jerinjacobk@gmail.com>
>> >> >Sent: Monday, June 29, 2020 11:21 PM
>> >> >To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
>> >> >Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman
><nhorman@tuxdriver.com>;
>> >> >Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
>> >> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads,
>Gage
>> >> ><gage.eads@intel.com>; Van Haaren, Harry
><harry.van.haaren@intel.com>
>> >> >Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream
>prerequisites
>> >> >
>> >> >On Tue, Jun 30, 2020 at 1:01 AM McDaniel, Timothy
>> >> ><timothy.mcdaniel@intel.com> wrote:
>> >> >>
>> >> >> -----Original Message-----
>> >> >> From: Jerin Jacob <jerinjacobk@gmail.com>
>> >> >> Sent: Saturday, June 27, 2020 2:45 AM
>> >> >> To: McDaniel, Timothy <timothy.mcdaniel@intel.com>; Ray Kinsella
>> >> ><mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>
>> >> >> Cc: Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
>> >> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads,
>Gage
>> >> ><gage.eads@intel.com>; Van Haaren, Harry
><harry.van.haaren@intel.com>
>> >> >> Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream
>prerequisites
>> >> >>
>> >> >> > +
>> >> >> > +/** Event port configuration structure */
>> >> >> > +struct rte_event_port_conf_v20 {
>> >> >> > +       int32_t new_event_threshold;
>> >> >> > +       /**< A backpressure threshold for new event enqueues on this
>port.
>> >> >> > +        * Use for *closed system* event dev where event capacity is
>limited,
>> >> >> > +        * and cannot exceed the capacity of the event dev.
>> >> >> > +        * Configuring ports with different thresholds can make higher
>priority
>> >> >> > +        * traffic less likely to  be backpressured.
>> >> >> > +        * For example, a port used to inject NIC Rx packets into the event
>dev
>> >> >> > +        * can have a lower threshold so as not to overwhelm the device,
>> >> >> > +        * while ports used for worker pools can have a higher threshold.
>> >> >> > +        * This value cannot exceed the *nb_events_limit*
>> >> >> > +        * which was previously supplied to rte_event_dev_configure().
>> >> >> > +        * This should be set to '-1' for *open system*.
>> >> >> > +        */
>> >> >> > +       uint16_t dequeue_depth;
>> >> >> > +       /**< Configure number of bulk dequeues for this event port.
>> >> >> > +        * This value cannot exceed the *nb_event_port_dequeue_depth*
>> >> >> > +        * which previously supplied to rte_event_dev_configure().
>> >> >> > +        * Ignored when device is not
>RTE_EVENT_DEV_CAP_BURST_MODE
>> >> >capable.
>> >> >> > +        */
>> >> >> > +       uint16_t enqueue_depth;
>> >> >> > +       /**< Configure number of bulk enqueues for this event port.
>> >> >> > +        * This value cannot exceed the *nb_event_port_enqueue_depth*
>> >> >> > +        * which previously supplied to rte_event_dev_configure().
>> >> >> > +        * Ignored when device is not
>RTE_EVENT_DEV_CAP_BURST_MODE
>> >> >capable.
>> >> >> > +        */
>> >> >> >         uint8_t disable_implicit_release;
>> >> >> >         /**< Configure the port not to release outstanding events in
>> >> >> >          * rte_event_dev_dequeue_burst(). If true, all events received
>through
>> >> >> > @@ -733,6 +911,14 @@ struct rte_event_port_conf {
>> >> >> >  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
>> >> >> >                                 struct rte_event_port_conf *port_conf);
>> >> >> >
>> >> >> > +int
>> >> >> > +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
>> >> >> > +                               struct rte_event_port_conf_v20 *port_conf);
>> >> >> > +
>> >> >> > +int
>> >> >> > +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
>> >> >> > +                                     struct rte_event_port_conf *port_conf);
>> >> >>
>> >> >> Hi Timothy,
>> >> >>
>> >> >> + ABI Maintainers (Ray, Neil)
>> >> >>
>> >> >> # As per my understanding, the structures can not be versioned, only
>> >> >> function can be versioned.
>> >> >> i.e we can not make any change to " struct rte_event_port_conf"
>> >> >>
>> >> >> # We have a similar case with ethdev and it deferred to next release
>v20.11
>> >> >> http://patches.dpdk.org/patch/69113/
>> >> >>
>> >> >> Regarding the API changes:
>> >> >> # The slow path changes general looks good to me. I will review the
>> >> >> next level in the coming days
>> >> >> # The following fast path changes bothers to me. Could you share more
>> >> >> details on below change?
>> >> >>
>> >> >> diff --git a/app/test-eventdev/test_order_atq.c
>> >> >> b/app/test-eventdev/test_order_atq.c
>> >> >> index 3366cfc..8246b96 100644
>> >> >> --- a/app/test-eventdev/test_order_atq.c
>> >> >> +++ b/app/test-eventdev/test_order_atq.c
>> >> >> @@ -34,6 +34,8 @@
>> >> >>                         continue;
>> >> >>                 }
>> >> >>
>> >> >> +               ev.flow_id = ev.mbuf->udata64;
>> >> >> +
>> >> >> # Since RC1 is near, I am not sure how to accommodate the API changes
>> >> >> now and sort out ABI stuffs.
>> >> >> # Other concern is eventdev spec get bloated with versioning files
>> >> >> just for ONE release as 20.11 will be OK to change the ABI.
>> >> >> # While we discuss the API change, Please send deprecation notice for
>> >> >> ABI change for 20.11,
>> >> >> so that there is no ambiguity of this patch for the 20.11 release.
>> >> >>
>> >> >> Hello Jerin,
>> >> >>
>> >> >> Thank you for the review comments.
>> >> >>
>> >> >> With regard to your comments regarding the fast path flow_id change,
>the
>> >Intel
>> >> >DLB hardware
>> >> >> is not capable of transferring the flow_id as part of the event itself. We
>> >> >therefore require a mechanism
>> >> >> to accomplish this. What we have done to work around this is to require
>the
>> >> >application to embed the flow_id
>> >> >> within the data payload. The new flag, #define
>> >> >RTE_EVENT_DEV_CAP_CARRY_FLOW_ID (1ULL << 9), can be used
>> >> >> by applications to determine if they need to embed the flow_id, or if its
>> >> >automatically propagated and present in the
>> >> >> received event.
>> >> >>
>> >> >> What we should have done is to wrap the assignment with a conditional.
>> >> >>
>> >> >> if (!(device_capability_flags & RTE_EVENT_DEV_CAP_CARRY_FLOW_ID))
>> >> >>         ev.flow_id = ev.mbuf->udata64;
>> >> >
>> >> >Two problems with this approach,
>> >> >1) we are assuming mbuf udata64 field is available for DLB driver
>> >> >2) It won't work with another adapter, eventdev has no dependency with
>mbuf
>> >> >
>> >>
>> >> This snippet is not intended to suggest that udata64 always be used to store
>the
>> >flow ID, but as an example of how an application could do it. Some
>applications
>> >won’t need to carry the flow ID through; others can select an unused field in
>the
>> >event data (e.g. hash.rss or udata64 if using mbufs), or (worst-case) re-
>generate
>> >the flow ID in pipeline stages that require it.
>> >
>> >OK.
>> >>
>> >> >Question:
>> >> >1) In the case of DLB hardware, on dequeue(),  what HW returns? is it
>> >> >only event pointer and not have any other metadata like schedule_type
>> >> >etc.
>> >> >
>> >>
>> >> The DLB device provides a 16B “queue entry” that consists of:
>> >>
>> >> *       8B event data
>> >> *       Queue ID
>> >> *       Priority
>> >> *       Scheduling type
>> >> *       19 bits of carried-through data
>> >> *       Assorted error/debug/reserved bits that are set by the device (not
>carried-
>> >through)
>> >>
>> >>  For the carried-through 19b, we use 12b for event_type and
>sub_event_type.
>> >
>> >I can only think of TWO options to help
>> >1) Since event pointer always cache aligned, You could grab LSB
>> >6bits(2^6 = 64B ) and 7 bits from (19b - 12b) carried through
>> >structure
>> >2) Have separate mempool driver using existing drivers, ie "event
>> >pointer" + or - some offset have any amount of custom data.
>> >
>>
>> We can't guarantee that the event will contain a pointer -- it's possible that 8B
>is inline data (i.e. struct rte_event's u64 field).
>>
>> It's really an application decision -- for example an app could allocate space in
>the 'mbuf private data' to store the flow ID, if the event device lacks that carry-
>flow-ID capability and the other mbuf fields can't be used for whatever reason.
>> We modified the tests, sample apps to show how this might be done, not
>necessarily how it must be done.
>
>
>Yeah. If HW has limitation we can't do much. It is OK to change
>eventdev spec to support new HW limitations. aka,
>RTE_EVENT_DEV_CAP_CARRY_FLOW_ID is OK.
>Please update existing drivers has this
>RTE_EVENT_DEV_CAP_CARRY_FLOW_ID capability which is missing in the
>patch(I believe)
>
>>
>> >
>> >>
>> >> >
>> >> >>
>> >> >> This would minimize/eliminate any performance impact due to the
>> >processor's
>> >> >branch prediction logic.
>> >
>> >I think, If we need to change common fastpath, better we need to make
>> >it template to create code for compile-time to have absolute zero
>> >overhead
>> >and use runtime.
>> >See app/test-eventdev/test_order_atq.c: function: worker_wrapper()
>> >_create_ worker at compile time based on runtime capability.
>> >
>>
>> Yes, that would be perfect.  Thanks for the example!
>
>Where ever you are making fastpath change, Please follow this scheme
>and send the next version.
>In order to have clean and reusable code, you could have template
>function and with "if" and it can opt-out in _compile_ time.
>i.e
>
>no_inline generic_worker(..., _const_ uint64_t flags)
>{
>..
>..
>
>if (! flags & CAP_CARRY_FLOW_ID)
>    ....
>
>}
>
>worker_with_out_carry_flow_id()
>{
>          generic_worker(.., CAP_CARRY_FLOW_ID)
>}
>
>normal_worker()
>{
>          generic_worker(.., 0)
>}
>
>No other controversial top-level comments with this patch series.
>Once we sorted out the ABI issues then I can review and merge.
>

Thanks Jerin. I'll get these changes into the v3 patch set.
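
Just to make that concrete for myself, a minimal sketch of the
compile-time specialization could look like the following (the
RTE_EVENT_DEV_CAP_CARRY_FLOW_ID capability is the one proposed in this
series, and using mbuf->udata64 as the flow-id storage is only an
example, as discussed above):

#include <stdbool.h>
#include <rte_common.h>
#include <rte_eventdev.h>
#include <rte_mbuf.h>

static volatile bool done;

/* "flags" is a compile-time constant in each wrapper below, so the
 * branch is resolved by the compiler and costs nothing at runtime. */
static __rte_always_inline int
generic_worker(uint8_t dev_id, uint8_t port_id, const uint64_t flags)
{
	struct rte_event ev;

	while (!done) {
		if (!rte_event_dequeue_burst(dev_id, port_id, &ev, 1, 0))
			continue;
		if (!(flags & RTE_EVENT_DEV_CAP_CARRY_FLOW_ID))
			ev.flow_id = ev.mbuf->udata64; /* example storage only */
		/* ... stage-specific processing and enqueue ... */
	}
	return 0;
}

static int
worker_without_carry_flow_id(void *arg __rte_unused)
{
	return generic_worker(0, 0, 0);
}

static int
worker_with_carry_flow_id(void *arg __rte_unused)
{
	return generic_worker(0, 0, RTE_EVENT_DEV_CAP_CARRY_FLOW_ID);
}

At startup the capability would be queried once and the matching
specialized worker launched, the same way test-eventdev selects a
worker in worker_wrapper().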

>
>>
>> >
>> >
>> >> >> The assignment then becomes in essence a NOOP for all event devices
>that
>> >are
>> >> >capable of carrying the flow_id as part of the event payload itself.
>> >> >>
>> >> >> Thanks,
>> >> >> Tim
>> >> >>
>> >> >>
>> >> >>
>> >> >> Thanks,
>> >> >> Tim

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [EXT] RE: [RFC] mbuf: accurate packet Tx scheduling
  @ 2020-07-01 15:46  0%       ` Slava Ovsiienko
  0 siblings, 0 replies; 200+ results
From: Slava Ovsiienko @ 2020-07-01 15:46 UTC (permalink / raw)
  To: Harman Kalra
  Cc: dev, Thomas Monjalon, Matan Azrad, Raslan Darawsheh, Ori Kam,
	olivier.matz, Shahaf Shuler

> -----Original Message-----
> From: Harman Kalra <hkalra@marvell.com>
> Sent: Wednesday, June 17, 2020 18:58
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Thomas Monjalon <thomas@monjalon.net>; Matan
> Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; Ori Kam <orika@mellanox.com>;
> olivier.matz@6wind.com; Shahaf Shuler <shahafs@mellanox.com>
> Subject: Re: [EXT] RE: [dpdk-dev] [RFC] mbuf: accurate packet Tx scheduling
> 
> On Wed, Jun 10, 2020 at 03:16:12PM +0000, Slava Ovsiienko wrote:
> 
> > External Email
> >
> > ----------------------------------------------------------------------
Hi, Harman

Sorry for the delay - I missed your reply.

[..skip..]
> > Should we waste CPU cycles to wait the desired moment of time? Can we
> > guarantee stable interrupt latency if we choose to schedule on interrupts
> approach?
> >
> > This RFC splits the responsibility - application should prepare the
> > data and specify when it desires to send, the rest is on PMD.
> 
> I agree with the fact that we cannot guarantee the delay between tx burst
> call and data on wire, hence PMD should take care of it.
> Even if PMD is holding, it is wastage of CPU cycles or if we setup an alarm
> then also interrupt latency might be a concern to achieve precise timming. So
> how are you planning to address both of above issue in PMD.

It is offloaded to the HW. A special WAIT descriptor carrying the timestamp
is pushed to the queue and the hardware just waits for the appropriate moment.
This is exactly the task of the PMD - convert the timestamp into hardware-related
entities and program the requested operation on the hardware. Thus we should not
wait on the CPU in any way - no loops, no interrupts, etc. Let the NIC do it for us.
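
Roughly, on the application side this is expected to reduce to the
sketch below - register the dynamic field/flag once and use the device
clock as the time reference (the field name is the proposed
"rte_dynfield_timestamp", the flag name here is only assumed for
illustration, error handling omitted):

#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_mbuf_dyn.h>

static int ts_offset;       /* offset of the dynamic timestamp field    */
static uint64_t ts_flag;    /* "timestamp is valid" bit in ol_flags     */
static uint64_t clock_ref;  /* reference point for computing timestamps */

static void
setup_tx_scheduling(uint16_t port)
{
	static const struct rte_mbuf_dynfield field_desc = {
		.name = "rte_dynfield_timestamp",
		.size = sizeof(uint64_t),
		.align = __alignof__(uint64_t),
	};
	static const struct rte_mbuf_dynflag flag_desc = {
		.name = "rte_dynflag_tx_timestamp", /* assumed name */
	};

	ts_offset = rte_mbuf_dynfield_register(&field_desc);
	ts_flag = 1ULL << rte_mbuf_dynflag_register(&flag_desc);

	/* The timestamps are expressed in the device clock domain. */
	rte_eth_read_clock(port, &clock_ref);
}

The packet timestamps are then written into the dynamic field relative
to that clock reference, and the PMD/NIC takes care of the waiting.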
 
> >
> > > >
> > > > PMD reports the ability to synchronize packet sending on timestamp
> > > > with new offload flag:
> > > >
> > > > This is palliative and is going to be replaced with new eth_dev
> > > > API about reporting/managing the supported dynamic flags and its
> > > > related features. This API would break ABI compatibility and can't
> > > > be introduced at the moment, so is postponed to 20.11.
> > > >
> > > > For testing purposes it is proposed to update testpmd "txonly"
> > > > forwarding mode routine. With this update testpmd application
> > > > generates the packets and sets the dynamic timestamps according to
> > > > specified time pattern if it sees the "rte_dynfield_timestamp" is
> registered.
> > >
> > > So what I am understanding here is "rte_dynfield_timestamp" will
> > > provide information about three parameters:
> > > - timestamp at which TX should start
> > > - intra packet gap
> > > - intra burst gap.
> > >
> > > If its about "intra packet gap" then PMD can take care, but if it is
> > > about intra burst gap, application can take care of it.
> >
> > Not sure - the intra-burst gap might be pretty small.
> > It is supposed to handle intra-burst in the same way - by specifying
> > the timestamps. Waiting is supposed to be implemented on tx_burst() retry.
> > Prepare the packets with timestamps, tx_burst - if not all packets are
> > sent - it means queue is waiting for the schedult, retry with the remaining
> packets.
> > As option - we can implement intra-burst wait basing rte_eth_read_clock().
> 
> Yeah, I think app can make use of rte_eth_read_clock() to implement intra-
> burst gap.
> But my actual doubt was, what all information will app provide as part of
> "rte_dynfield_timestamp" - one I understand will be timestamp at which
> packets should be sent out. What else? intra-packet gap ?
The intra-packet gap is just a testpmd parameter to provide some
preliminary feature testing. If the intra-packet gap is too small, even the
hardware might not be able to honor it. In mlx5 we are going to support
scheduling with a minimal granularity of 250 nanoseconds, so the minimal
supported gap is 250 ns; at 100 Gbps line speed that is about 3125 bytes
on the wire, i.e. at least two 1500-byte packets.

With best regards, Slava

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH 1/2] mbuf: introduce accurate packet Tx scheduling
    @ 2020-07-01 15:36  2% ` Viacheslav Ovsiienko
  2020-07-07 11:50  0%   ` Olivier Matz
  2020-07-07 12:59  2% ` [dpdk-dev] [PATCH v2 " Viacheslav Ovsiienko
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 200+ results
From: Viacheslav Ovsiienko @ 2020-07-01 15:36 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

There is the requirement on some networks for precise traffic timing
management. The ability to send (and, generally speaking, receive)
the packets at the very precisely specified moment of time provides
the opportunity to support the connections with Time Division
Multiplexing using the contemporary general purpose NIC without involving
an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
interface is one of the promising features for potentially usage of the
precise time management for the egress packets.

The main objective of this RFC is to specify how applications can provide
the moment of time at which the packet transmission must be started, and
to describe in a preliminary way how this feature is supported on the
mlx5 PMD side.

The new dynamic timestamp field is proposed, it provides some timing
information, the units and time references (initial phase) are not
explicitly defined but are maintained always the same for a given port.
Some devices allow querying rte_eth_read_clock(), which returns
the current device timestamp. The dynamic timestamp flag tells whether
the field contains actual timestamp value. For the packets being sent
this value can be used by PMD to schedule packet sending.

After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
and obsoleting, these dynamic flag and field will be used to manage
the timestamps on receiving datapath as well.

When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
it tries to synchronize the time of packet appearing on the wire with
the specified packet timestamp. If the specified one is in the past it
should be ignored, if one is in the distant future it should be capped
with some reasonable value (in the range of seconds). These specific cases
("too late" and "distant future") can be optionally reported via
device xstats to assist applications to detect the time-related
problems.

No packet reordering according to the timestamps is assumed, neither
within a packet burst nor between packets; it is entirely the application's
responsibility to generate the packets and their timestamps in the desired
order. The timestamp can be put only in the first packet of the burst,
providing the scheduling for the entire burst.
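
For illustration only (not part of the patch), an application could register
and use the dynamic field/flag roughly as below; error handling is trimmed and
set_tx_timestamp() is just a hypothetical helper:

#include <rte_mbuf_dyn.h>

static int ts_offset = -1;	/* offset of the dynamic timestamp field */
static uint64_t ts_mask;	/* mask of the dynamic "send on time" flag */

static int
setup_tx_timestamping(void)
{
	static const struct rte_mbuf_dynfield field_desc = {
		.name = RTE_MBUF_DYNFIELD_TIMESTAMP_NAME,
		.size = sizeof(uint64_t),
		.align = __alignof__(uint64_t),
	};
	static const struct rte_mbuf_dynflag flag_desc = {
		.name = RTE_MBUF_DYNFLAG_TIMESTAMP_NAME,
	};
	int flag_bit;

	ts_offset = rte_mbuf_dynfield_register(&field_desc);
	flag_bit = rte_mbuf_dynflag_register(&flag_desc);
	if (ts_offset < 0 || flag_bit < 0)
		return -1;
	ts_mask = 1ULL << flag_bit;
	return 0;
}

/* Request that the packet hits the wire not earlier than 'when'
 * (expressed in device clock units). */
static void
set_tx_timestamp(struct rte_mbuf *m, uint64_t when)
{
	*RTE_MBUF_DYNFIELD(m, ts_offset, uint64_t *) = when;
	m->ol_flags |= ts_mask;
}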

PMD reports the ability to synchronize packet sending on timestamp
with new offload flag:

This is palliative and is going to be replaced with new eth_dev API
about reporting/managing the supported dynamic flags and its related
features. This API would break ABI compatibility and can't be introduced
at the moment, so is postponed to 20.11.

For testing purposes it is proposed to update testpmd "txonly"
forwarding mode routine. With this update testpmd application generates
the packets and sets the dynamic timestamps according to specified time
pattern if it sees the "rte_dynfield_timestamp" is registered.

The new testpmd command is proposed to configure sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst
              specified in the device clock units. The number
              of packets in the burst is defined by txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As a result, the bursts of packets will be transmitted with the specified
delays between the packets within a burst and the specified delay between
the bursts. rte_eth_read_clock() is supposed to be used to get the
current device clock value and provide the reference for the timestamps.
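
A rough sketch of the resulting application-side sending loop is shown below
(illustrative only; MAX_BURST and generate_burst() are placeholders, and
set_tx_timestamp() is the hypothetical helper sketched earlier):

static void
tx_times_loop(uint16_t port_id, uint16_t queue_id,
	      uint64_t intra_gap, uint64_t burst_gap)
{
	struct rte_mbuf *tx_pkts[MAX_BURST];
	uint64_t clk;
	uint16_t i, sent, nb_pkts;

	/* Take the device clock as the time reference for all timestamps. */
	rte_eth_read_clock(port_id, &clk);

	for (;;) {
		nb_pkts = generate_burst(tx_pkts); /* hypothetical generator */

		/* <intra_gap> between packets of a burst,
		 * <burst_gap> between the bursts. */
		for (i = 0; i < nb_pkts; i++)
			set_tx_timestamp(tx_pkts[i],
					 clk + (uint64_t)i * intra_gap);
		clk += burst_gap;

		/* If the queue is still waiting for a scheduled send time,
		 * tx_burst() accepts fewer packets than requested - just
		 * retry with the remaining ones. */
		sent = 0;
		while (sent < nb_pkts)
			sent += rte_eth_tx_burst(port_id, queue_id,
						 tx_pkts + sent,
						 nb_pkts - sent);
	}
}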

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 lib/librte_ethdev/rte_ethdev.c |  1 +
 lib/librte_ethdev/rte_ethdev.h |  4 ++++
 lib/librte_mbuf/rte_mbuf_dyn.h | 16 ++++++++++++++++
 3 files changed, 21 insertions(+)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 8e10a6f..02157d5 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
 	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
+	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
 };
 
 #undef RTE_TX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index a49242b..6f6454c 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1178,6 +1178,10 @@ struct rte_eth_conf {
 /** Device supports outer UDP checksum */
 #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
 
+/** Device supports send on timestamp */
+#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
+
+
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /**< Device supports Rx queue setup after device started*/
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
index 96c3631..fb5477c 100644
--- a/lib/librte_mbuf/rte_mbuf_dyn.h
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -250,4 +250,20 @@ int rte_mbuf_dynflag_lookup(const char *name,
 #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
 #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
 
+/*
+ * The timestamp dynamic field provides some timing information, the
+ * units and time references (initial phase) are not explicitly defined
+ * but are maintained always the same for a given port. Some devices allow
+ * to query rte_eth_read_clock() that will return the current device
+ * timestamp. The dynamic timestamp flag tells whether the field contains
+ * actual timestamp value. For the packets being sent this value can be
+ * used by PMD to schedule packet sending.
+ *
+ * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
+ * and obsoleting, these dynamic flag and field will be used to manage
+ * the timestamps on receiving datapath as well.
+ */
+#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
+#define RTE_MBUF_DYNFLAG_TIMESTAMP_NAME "rte_dynflag_timestamp"
+
 #endif
-- 
1.8.3.1


^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [RFC] ring: make ring implementation non-inlined
  2020-07-01 12:21  0%           ` Ananyev, Konstantin
@ 2020-07-01 14:11  0%             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 200+ results
From: Honnappa Nagarahalli @ 2020-07-01 14:11 UTC (permalink / raw)
  To: Ananyev, Konstantin, Morten Brørup, thomas, Jerin Jacob, jerinj
  Cc: dev, Olivier Matz, David Christensen, Stephen Hemminger, nd,
	Honnappa Nagarahalli, nd

<snip>
> > >
> > > > Subject: Re: [dpdk-dev] [RFC] ring: make ring implementation non-
> > > inlined
> > > >
> > > > 26/03/2020 09:04, Morten Brørup:
> > > > > From: Jerin Jacob
> > > > > > On Fri, Mar 20, 2020 Konstantin Ananyev wrote:
> > > > > > >
> > > > > > > As was discussed here:
> > > > > > > http://mails.dpdk.org/archives/dev/2020-February/158586.html
> > > > > > > this RFC aimed to hide ring internals into .c and make all
> > > > > > > ring functions non-inlined. In theory that might help to
> > > > > > > maintain
> > > ABI
> > > > > > > stability in future.
> > > > > > > This is just a POC to measure the impact of proposed idea,
> > > proper
> > > > > > > implementation would definetly need some extra effort.
> > > > > > > On IA box (SKX) ring_perf_autotest shows ~20-30 cycles extra
> > > for
> > > > > > > enqueue+dequeue pair. On some more realistic code, I suspect
> > > > > > > the impact it might be a bit higher.
> > > > > > > For MP/MC bulk transfers degradation seems quite small,
> > > > > > > though
> > > for
> > > > > > > SP/SC and/or small transfers it is more then noticable (see
> > > exact
> > > > > > > numbers below).
> > > > > > > From my perspective we'd probably keep it inlined for now to
> > > avoid
> > > > > > > any non-anticipated perfomance degradations.
> > > > > > > Though intersted to see perf results and opinions from other
> > > > > > > interested parties.
> > > > > >
> > > > > > +1
> > > >
> > > > Konstantin, thank you for doing some measures
> > > >
> > > >
> > > > > > My reasoning is a bit different, DPDK is using in embedded
> > > > > > boxes
> > > too
> > > > > > where performance has more weight than ABI stuff.
> > > > >
> > > > > As a network appliance vendor I can confirm that we certainly
> > > > > care more about performance than ABI stability.
> > > > > ABI stability is irrelevant for us; and API instability is a
> > > > > non-recurring engineering cost each time
> > > we
> > > > > choose to switch to a new DPDK version, which we only do if we
> > > cannot
> > > > > avoid it, e.g. due to new drivers, security fixes or new
> > > > > features
> > > that
> > > > > we want to use.
> > > > >
> > > > > For us, the trend pointed in the wrong direction when DPDK
> > > > > switched the preference towards runtime configurability and
> > > > > deprecated
> > > compile
> > > > > time configurability. I do understand the reasoning behind it,
> > > > > and
> > > the
> > > > > impact is minimal, so we accept it.
> > > >
> > > > The code can be optimized by removing some instructions with #ifdef.
> > > > But the complexity of managing #ifdef enabling/disabling,
> > > > depending
> > > on the
> > > > platform and the use case, would be huge.
> > > > We try to have a reasonable code "always enabled" which performs
> > > > well
> > > in all
> > > > cases. This is a design choice which makes DPDK a library, not a
> > > > pool
> > > of code
> > > > to cherry-pick.
> > > >
> > > > > However, if DPDK starts sacrificing performance of the core
> > > libraries
> > > > > for the benefits of the GNU/Linux distributors, network
> > > > > appliance vendors may put more effort into sticking with old
> > > > > DPDK versions instead of updating.
> > > >
> > > > The initial choice regarding ABI compatibility was "do not care".
> > > > Recently, the decision was done to care about ABI compatibility as
> > > priority
> > > > number 2. The priority number 1 remains the performance.
> > > > That's a reason for allowing some ABI breakages in some specific
> > > releases
> > > > announced in advance.
> > > >
> > > > > > I think we need to focus first on slow path APIs ABI stuff.
> > > >
> > > > Yes we should not degrade fast path performance for the sake of
> > > avoiding
> > > > uncertain future ABI issues.
> > > >
> > > > Morten, Jerin, thank you for the feedback.
> > > I think we have a consensus here not to make any changes to inline
> > > functions for now.
> > > Should we mark this as 'Deferred or Rejected'?
> >
> > Rejected.
> >
> > There is no need for this modification now, and no actual use cases
> > for it in the road map. In other words: This modification has no use cases; it
> is purely academic. Many other suggestions have been rejected for the reason
> that they have no current use cases.
> >
> > As Thomas mentioned, DPDK has transitioned towards being a library,
> > rather than a pool of code to cherry-pick from. I have learned to live with
> this.
> >
> > Being a library doesn't mean that functions cannot be exposed as
> > inline code in the library header files. DPDK is mainly a high
> > performance library with a tradition of exposing many of its internals in its
> API, and we should keep it this way. We certainly don't want an opaque API
> hiding all of its internals, passing around void pointers.
> >
> > However, it was still an interesting experiment to investigate the
> performance cost.
> 
> Yes, please reject it.
I just tried to mark it as rejected in patchwork, but I do not have the permissions (probably you are the owner of the patch). Can you please mark it?

> Konstantin
> 


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC] ring: make ring implementation non-inlined
  2020-07-01  7:27  0%         ` Morten Brørup
@ 2020-07-01 12:21  0%           ` Ananyev, Konstantin
  2020-07-01 14:11  0%             ` Honnappa Nagarahalli
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2020-07-01 12:21 UTC (permalink / raw)
  To: Morten Brørup, Honnappa Nagarahalli, thomas, Jerin Jacob, jerinj
  Cc: dev, Olivier Matz, David Christensen, Stephen Hemminger, nd, nd

> 
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Honnappa
> > Nagarahalli
> > Sent: Wednesday, July 1, 2020 1:16 AM
> >
> > <snip>
> >
> > > Subject: Re: [dpdk-dev] [RFC] ring: make ring implementation non-
> > inlined
> > >
> > > 26/03/2020 09:04, Morten Brørup:
> > > > From: Jerin Jacob
> > > > > On Fri, Mar 20, 2020 Konstantin Ananyev wrote:
> > > > > >
> > > > > > As was discussed here:
> > > > > > http://mails.dpdk.org/archives/dev/2020-February/158586.html
> > > > > > this RFC aimed to hide ring internals into .c and make all ring
> > > > > > functions non-inlined. In theory that might help to maintain
> > ABI
> > > > > > stability in future.
> > > > > > This is just a POC to measure the impact of proposed idea,
> > proper
> > > > > > implementation would definetly need some extra effort.
> > > > > > On IA box (SKX) ring_perf_autotest shows ~20-30 cycles extra
> > for
> > > > > > enqueue+dequeue pair. On some more realistic code, I suspect
> > > > > > the impact it might be a bit higher.
> > > > > > For MP/MC bulk transfers degradation seems quite small, though
> > for
> > > > > > SP/SC and/or small transfers it is more then noticable (see
> > exact
> > > > > > numbers below).
> > > > > > From my perspective we'd probably keep it inlined for now to
> > avoid
> > > > > > any non-anticipated perfomance degradations.
> > > > > > Though intersted to see perf results and opinions from other
> > > > > > interested parties.
> > > > >
> > > > > +1
> > >
> > > Konstantin, thank you for doing some measures
> > >
> > >
> > > > > My reasoning is a bit different, DPDK is using in embedded boxes
> > too
> > > > > where performance has more weight than ABI stuff.
> > > >
> > > > As a network appliance vendor I can confirm that we certainly care
> > > > more about performance than ABI stability.
> > > > ABI stability is irrelevant for us;
> > > > and API instability is a non-recurring engineering cost each time
> > we
> > > > choose to switch to a new DPDK version, which we only do if we
> > cannot
> > > > avoid it, e.g. due to new drivers, security fixes or new features
> > that
> > > > we want to use.
> > > >
> > > > For us, the trend pointed in the wrong direction when DPDK switched
> > > > the preference towards runtime configurability and deprecated
> > compile
> > > > time configurability. I do understand the reasoning behind it, and
> > the
> > > > impact is minimal, so we accept it.
> > >
> > > The code can be optimized by removing some instructions with #ifdef.
> > > But the complexity of managing #ifdef enabling/disabling, depending
> > on the
> > > platform and the use case, would be huge.
> > > We try to have a reasonable code "always enabled" which performs well
> > in all
> > > cases. This is a design choice which makes DPDK a library, not a pool
> > of code
> > > to cherry-pick.
> > >
> > > > However, if DPDK starts sacrificing performance of the core
> > libraries
> > > > for the benefits of the GNU/Linux distributors, network appliance
> > > > vendors may put more effort into sticking with old DPDK versions
> > > > instead of updating.
> > >
> > > The initial choice regarding ABI compatibility was "do not care".
> > > Recently, the decision was done to care about ABI compatibility as
> > priority
> > > number 2. The priority number 1 remains the performance.
> > > That's a reason for allowing some ABI breakages in some specific
> > releases
> > > announced in advance.
> > >
> > > > > I think we need to focus first on slow path APIs ABI stuff.
> > >
> > > Yes we should not degrade fast path performance for the sake of
> > avoiding
> > > uncertain future ABI issues.
> > >
> > > Morten, Jerin, thank you for the feedback.
> > I think we have a consensus here not to make any changes to inline
> > functions for now.
> > Should we mark this as 'Deferred or Rejected'?
> 
> Rejected.
> 
> There is no need for this modification now, and no actual use cases for it in the road map. In other words: This modification has no use
> cases; it is purely academic. Many other suggestions have been rejected for the reason that they have no current use cases.
> 
> As Thomas mentioned, DPDK has transitioned towards being a library, rather than a pool of code to cherry-pick from. I have learned to live
> with this.
> 
> Being a library doesn't mean that functions cannot be exposed as inline code in the library header files. DPDK is mainly a high performance
> library with a tradition of exposing many of its internals in its API, and we should keep it this way. We certainly don't want an opaque API
> hiding all of its internals, passing around void pointers.
> 
> However, it was still an interesting experiment to investigate the performance cost.

Yes, please reject it.
Konstantin



^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC] ring: make ring implementation non-inlined
  2020-06-30 23:15  0%       ` Honnappa Nagarahalli
@ 2020-07-01  7:27  0%         ` Morten Brørup
  2020-07-01 12:21  0%           ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2020-07-01  7:27 UTC (permalink / raw)
  To: Honnappa Nagarahalli, thomas, Jerin Jacob, Konstantin Ananyev, jerinj
  Cc: dev, Olivier Matz, David Christensen, Stephen Hemminger, nd, nd

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Honnappa
> Nagarahalli
> Sent: Wednesday, July 1, 2020 1:16 AM
> 
> <snip>
> 
> > Subject: Re: [dpdk-dev] [RFC] ring: make ring implementation non-
> inlined
> >
> > 26/03/2020 09:04, Morten Brørup:
> > > From: Jerin Jacob
> > > > On Fri, Mar 20, 2020 Konstantin Ananyev wrote:
> > > > >
> > > > > As was discussed here:
> > > > > http://mails.dpdk.org/archives/dev/2020-February/158586.html
> > > > > this RFC aimed to hide ring internals into .c and make all ring
> > > > > functions non-inlined. In theory that might help to maintain
> ABI
> > > > > stability in future.
> > > > > This is just a POC to measure the impact of proposed idea,
> proper
> > > > > implementation would definetly need some extra effort.
> > > > > On IA box (SKX) ring_perf_autotest shows ~20-30 cycles extra
> for
> > > > > enqueue+dequeue pair. On some more realistic code, I suspect
> > > > > the impact it might be a bit higher.
> > > > > For MP/MC bulk transfers degradation seems quite small, though
> for
> > > > > SP/SC and/or small transfers it is more then noticable (see
> exact
> > > > > numbers below).
> > > > > From my perspective we'd probably keep it inlined for now to
> avoid
> > > > > any non-anticipated perfomance degradations.
> > > > > Though intersted to see perf results and opinions from other
> > > > > interested parties.
> > > >
> > > > +1
> >
> > Konstantin, thank you for doing some measures
> >
> >
> > > > My reasoning is a bit different, DPDK is using in embedded boxes
> too
> > > > where performance has more weight than ABI stuff.
> > >
> > > As a network appliance vendor I can confirm that we certainly care
> > > more about performance than ABI stability.
> > > ABI stability is irrelevant for us;
> > > and API instability is a non-recurring engineering cost each time
> we
> > > choose to switch to a new DPDK version, which we only do if we
> cannot
> > > avoid it, e.g. due to new drivers, security fixes or new features
> that
> > > we want to use.
> > >
> > > For us, the trend pointed in the wrong direction when DPDK switched
> > > the preference towards runtime configurability and deprecated
> compile
> > > time configurability. I do understand the reasoning behind it, and
> the
> > > impact is minimal, so we accept it.
> >
> > The code can be optimized by removing some instructions with #ifdef.
> > But the complexity of managing #ifdef enabling/disabling, depending
> on the
> > platform and the use case, would be huge.
> > We try to have a reasonable code "always enabled" which performs well
> in all
> > cases. This is a design choice which makes DPDK a library, not a pool
> of code
> > to cherry-pick.
> >
> > > However, if DPDK starts sacrificing performance of the core
> libraries
> > > for the benefits of the GNU/Linux distributors, network appliance
> > > vendors may put more effort into sticking with old DPDK versions
> > > instead of updating.
> >
> > The initial choice regarding ABI compatibility was "do not care".
> > Recently, the decision was done to care about ABI compatibility as
> priority
> > number 2. The priority number 1 remains the performance.
> > That's a reason for allowing some ABI breakages in some specific
> releases
> > announced in advance.
> >
> > > > I think we need to focus first on slow path APIs ABI stuff.
> >
> > Yes we should not degrade fast path performance for the sake of
> avoiding
> > uncertain future ABI issues.
> >
> > Morten, Jerin, thank you for the feedback.
> I think we have a consensus here not to make any changes to inline
> functions for now.
> Should we mark this as 'Deferred or Rejected'?

Rejected.

There is no need for this modification now, and no actual use cases for it in the road map. In other words: This modification has no use cases; it is purely academic. Many other suggestions have been rejected for the reason that they have no current use cases.

As Thomas mentioned, DPDK has transitioned towards being a library, rather than a pool of code to cherry-pick from. I have learned to live with this.

Being a library doesn't mean that functions cannot be exposed as inline code in the library header files. DPDK is mainly a high performance library with a tradition of exposing many of its internals in its API, and we should keep it this way. We certainly don't want an opaque API hiding all of its internals, passing around void pointers.

However, it was still an interesting experiment to investigate the performance cost.


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-30 19:26  0%             ` McDaniel, Timothy
  2020-06-30 20:40  0%               ` Pavan Nikhilesh Bhagavatula
@ 2020-07-01  4:50  3%               ` Jerin Jacob
  2020-07-01 16:48  0%                 ` McDaniel, Timothy
  1 sibling, 1 reply; 200+ results
From: Jerin Jacob @ 2020-07-01  4:50 UTC (permalink / raw)
  To: McDaniel, Timothy
  Cc: Ray Kinsella, Neil Horman, Jerin Jacob, Mattias Rönnblom,
	dpdk-dev, Eads, Gage, Van Haaren, Harry

On Wed, Jul 1, 2020 at 12:57 AM McDaniel, Timothy
<timothy.mcdaniel@intel.com> wrote:
>
> >-----Original Message-----
> >From: Jerin Jacob <jerinjacobk@gmail.com>
> >Sent: Tuesday, June 30, 2020 10:58 AM
> >To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
> >Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>;
> >Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
> ><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
> >Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
> >
> >On Tue, Jun 30, 2020 at 9:12 PM McDaniel, Timothy
> ><timothy.mcdaniel@intel.com> wrote:
> >>
> >> >-----Original Message-----
> >> >From: Jerin Jacob <jerinjacobk@gmail.com>
> >> >Sent: Monday, June 29, 2020 11:21 PM
> >> >To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
> >> >Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>;
> >> >Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
> >> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
> >> ><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
> >> >Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
> >> >
> >> >On Tue, Jun 30, 2020 at 1:01 AM McDaniel, Timothy
> >> ><timothy.mcdaniel@intel.com> wrote:
> >> >>
> >> >> -----Original Message-----
> >> >> From: Jerin Jacob <jerinjacobk@gmail.com>
> >> >> Sent: Saturday, June 27, 2020 2:45 AM
> >> >> To: McDaniel, Timothy <timothy.mcdaniel@intel.com>; Ray Kinsella
> >> ><mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>
> >> >> Cc: Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
> >> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
> >> ><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
> >> >> Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
> >> >>
> >> >> > +
> >> >> > +/** Event port configuration structure */
> >> >> > +struct rte_event_port_conf_v20 {
> >> >> > +       int32_t new_event_threshold;
> >> >> > +       /**< A backpressure threshold for new event enqueues on this port.
> >> >> > +        * Use for *closed system* event dev where event capacity is limited,
> >> >> > +        * and cannot exceed the capacity of the event dev.
> >> >> > +        * Configuring ports with different thresholds can make higher priority
> >> >> > +        * traffic less likely to  be backpressured.
> >> >> > +        * For example, a port used to inject NIC Rx packets into the event dev
> >> >> > +        * can have a lower threshold so as not to overwhelm the device,
> >> >> > +        * while ports used for worker pools can have a higher threshold.
> >> >> > +        * This value cannot exceed the *nb_events_limit*
> >> >> > +        * which was previously supplied to rte_event_dev_configure().
> >> >> > +        * This should be set to '-1' for *open system*.
> >> >> > +        */
> >> >> > +       uint16_t dequeue_depth;
> >> >> > +       /**< Configure number of bulk dequeues for this event port.
> >> >> > +        * This value cannot exceed the *nb_event_port_dequeue_depth*
> >> >> > +        * which previously supplied to rte_event_dev_configure().
> >> >> > +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE
> >> >capable.
> >> >> > +        */
> >> >> > +       uint16_t enqueue_depth;
> >> >> > +       /**< Configure number of bulk enqueues for this event port.
> >> >> > +        * This value cannot exceed the *nb_event_port_enqueue_depth*
> >> >> > +        * which previously supplied to rte_event_dev_configure().
> >> >> > +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE
> >> >capable.
> >> >> > +        */
> >> >> >         uint8_t disable_implicit_release;
> >> >> >         /**< Configure the port not to release outstanding events in
> >> >> >          * rte_event_dev_dequeue_burst(). If true, all events received through
> >> >> > @@ -733,6 +911,14 @@ struct rte_event_port_conf {
> >> >> >  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
> >> >> >                                 struct rte_event_port_conf *port_conf);
> >> >> >
> >> >> > +int
> >> >> > +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
> >> >> > +                               struct rte_event_port_conf_v20 *port_conf);
> >> >> > +
> >> >> > +int
> >> >> > +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
> >> >> > +                                     struct rte_event_port_conf *port_conf);
> >> >>
> >> >> Hi Timothy,
> >> >>
> >> >> + ABI Maintainers (Ray, Neil)
> >> >>
> >> >> # As per my understanding, the structures can not be versioned, only
> >> >> function can be versioned.
> >> >> i.e we can not make any change to " struct rte_event_port_conf"
> >> >>
> >> >> # We have a similar case with ethdev and it deferred to next release v20.11
> >> >> http://patches.dpdk.org/patch/69113/
> >> >>
> >> >> Regarding the API changes:
> >> >> # The slow path changes general looks good to me. I will review the
> >> >> next level in the coming days
> >> >> # The following fast path changes bothers to me. Could you share more
> >> >> details on below change?
> >> >>
> >> >> diff --git a/app/test-eventdev/test_order_atq.c
> >> >> b/app/test-eventdev/test_order_atq.c
> >> >> index 3366cfc..8246b96 100644
> >> >> --- a/app/test-eventdev/test_order_atq.c
> >> >> +++ b/app/test-eventdev/test_order_atq.c
> >> >> @@ -34,6 +34,8 @@
> >> >>                         continue;
> >> >>                 }
> >> >>
> >> >> +               ev.flow_id = ev.mbuf->udata64;
> >> >> +
> >> >> # Since RC1 is near, I am not sure how to accommodate the API changes
> >> >> now and sort out ABI stuffs.
> >> >> # Other concern is eventdev spec get bloated with versioning files
> >> >> just for ONE release as 20.11 will be OK to change the ABI.
> >> >> # While we discuss the API change, Please send deprecation notice for
> >> >> ABI change for 20.11,
> >> >> so that there is no ambiguity of this patch for the 20.11 release.
> >> >>
> >> >> Hello Jerin,
> >> >>
> >> >> Thank you for the review comments.
> >> >>
> >> >> With regard to your comments regarding the fast path flow_id change, the
> >Intel
> >> >DLB hardware
> >> >> is not capable of transferring the flow_id as part of the event itself. We
> >> >therefore require a mechanism
> >> >> to accomplish this. What we have done to work around this is to require the
> >> >application to embed the flow_id
> >> >> within the data payload. The new flag, #define
> >> >RTE_EVENT_DEV_CAP_CARRY_FLOW_ID (1ULL << 9), can be used
> >> >> by applications to determine if they need to embed the flow_id, or if its
> >> >automatically propagated and present in the
> >> >> received event.
> >> >>
> >> >> What we should have done is to wrap the assignment with a conditional.
> >> >>
> >> >> if (!(device_capability_flags & RTE_EVENT_DEV_CAP_CARRY_FLOW_ID))
> >> >>         ev.flow_id = ev.mbuf->udata64;
> >> >
> >> >Two problems with this approach,
> >> >1) we are assuming mbuf udata64 field is available for DLB driver
> >> >2) It won't work with another adapter, eventdev has no dependency with mbuf
> >> >
> >>
> >> This snippet is not intended to suggest that udata64 always be used to store the
> >flow ID, but as an example of how an application could do it. Some applications
> >won’t need to carry the flow ID through; others can select an unused field in the
> >event data (e.g. hash.rss or udata64 if using mbufs), or (worst-case) re-generate
> >the flow ID in pipeline stages that require it.
> >
> >OK.
> >>
> >> >Question:
> >> >1) In the case of DLB hardware, on dequeue(),  what HW returns? is it
> >> >only event pointer and not have any other metadata like schedule_type
> >> >etc.
> >> >
> >>
> >> The DLB device provides a 16B “queue entry” that consists of:
> >>
> >> *       8B event data
> >> *       Queue ID
> >> *       Priority
> >> *       Scheduling type
> >> *       19 bits of carried-through data
> >> *       Assorted error/debug/reserved bits that are set by the device (not carried-
> >through)
> >>
> >>  For the carried-through 19b, we use 12b for event_type and sub_event_type.
> >
> >I can only think of TWO options to help
> >1) Since event pointer always cache aligned, You could grab LSB
> >6bits(2^6 = 64B ) and 7 bits from (19b - 12b) carried through
> >structure
> >2) Have separate mempool driver using existing drivers, ie "event
> >pointer" + or - some offset have any amount of custom data.
> >
>
> We can't guarantee that the event will contain a pointer -- it's possible that 8B is inline data (i.e. struct rte_event's u64 field).
>
> It's really an application decision -- for example an app could allocate space in the 'mbuf private data' to store the flow ID, if the event device lacks that carry-flow-ID capability and the other mbuf fields can't be used for whatever reason.
> We modified the tests, sample apps to show how this might be done, not necessarily how it must be done.


Yeah. If HW has limitation we can't do much. It is OK to change
eventdev spec to support new HW limitations. aka,
RTE_EVENT_DEV_CAP_CARRY_FLOW_ID is OK.
Please update the existing drivers to advertise this
RTE_EVENT_DEV_CAP_CARRY_FLOW_ID capability, which is missing in the
patch (I believe).
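
For reference, a minimal sketch (assuming the new capability bit from this
series) of how an application could decide at init time whether it has to
carry the flow_id itself:

#include <stdbool.h>
#include <rte_eventdev.h>

static bool
must_restore_flow_id(uint8_t dev_id)
{
	struct rte_event_dev_info info;

	rte_event_dev_info_get(dev_id, &info);
	/* Without the capability the application must stash/restore flow_id
	 * itself (mbuf private data, an unused field, or regeneration in
	 * the pipeline stage that needs it). */
	return !(info.event_dev_cap & RTE_EVENT_DEV_CAP_CARRY_FLOW_ID);
}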

>
> >
> >>
> >> >
> >> >>
> >> >> This would minimize/eliminate any performance impact due to the
> >processor's
> >> >branch prediction logic.
> >
> >I think, If we need to change common fastpath, better we need to make
> >it template to create code for compile-time to have absolute zero
> >overhead
> >and use runtime.
> >See app/test-eventdev/test_order_atq.c: function: worker_wrapper()
> >_create_ worker at compile time based on runtime capability.
> >
>
> Yes, that would be perfect.  Thanks for the example!

Wherever you are making a fastpath change, please follow this scheme
and send the next version.
In order to have clean and reusable code, you could have a template
function with an "if" that can be opted out at _compile_ time,
i.e

no_inline generic_worker(..., _const_ uint64_t flags)
{
..
..

if (! flags & CAP_CARRY_FLOW_ID)
    ....

}

worker_with_out_carry_flow_id()
{
          generic_worker(.., CAP_CARRY_FLOW_ID)
}

normal_worker()
{
          generic_worker(.., 0)
}

No other controversial top-level comments on this patch series.
Once we have sorted out the ABI issues I can review and merge.


>
> >
> >
> >> >> The assignment then becomes in essence a NOOP for all event devices that
> >are
> >> >capable of carrying the flow_id as part of the event payload itself.
> >> >>
> >> >> Thanks,
> >> >> Tim
> >> >>
> >> >>
> >> >>
> >> >> Thanks,
> >> >> Tim

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC] ring: make ring implementation non-inlined
  @ 2020-06-30 23:15  0%       ` Honnappa Nagarahalli
  2020-07-01  7:27  0%         ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2020-06-30 23:15 UTC (permalink / raw)
  To: thomas, Jerin Jacob, Konstantin Ananyev, jerinj, Morten Brørup
  Cc: dev, Olivier Matz, David Christensen, Stephen Hemminger,
	Honnappa Nagarahalli, nd, nd

<snip>

> Subject: Re: [dpdk-dev] [RFC] ring: make ring implementation non-inlined
> 
> 26/03/2020 09:04, Morten Brørup:
> > From: Jerin Jacob
> > > On Fri, Mar 20, 2020 Konstantin Ananyev wrote:
> > > >
> > > > As was discussed here:
> > > > http://mails.dpdk.org/archives/dev/2020-February/158586.html
> > > > this RFC aimed to hide ring internals into .c and make all ring
> > > > functions non-inlined. In theory that might help to maintain ABI
> > > > stability in future.
> > > > This is just a POC to measure the impact of proposed idea, proper
> > > > implementation would definetly need some extra effort.
> > > > On IA box (SKX) ring_perf_autotest shows ~20-30 cycles extra for
> > > > enqueue+dequeue pair. On some more realistic code, I suspect
> > > > the impact it might be a bit higher.
> > > > For MP/MC bulk transfers degradation seems quite small, though for
> > > > SP/SC and/or small transfers it is more then noticable (see exact
> > > > numbers below).
> > > > From my perspective we'd probably keep it inlined for now to avoid
> > > > any non-anticipated perfomance degradations.
> > > > Though intersted to see perf results and opinions from other
> > > > interested parties.
> > >
> > > +1
> 
> Konstantin, thank you for doing some measures
> 
> 
> > > My reasoning is a bit different, DPDK is using in embedded boxes too
> > > where performance has more weight than ABI stuff.
> >
> > As a network appliance vendor I can confirm that we certainly care
> > more about performance than ABI stability.
> > ABI stability is irrelevant for us;
> > and API instability is a non-recurring engineering cost each time we
> > choose to switch to a new DPDK version, which we only do if we cannot
> > avoid it, e.g. due to new drivers, security fixes or new features that
> > we want to use.
> >
> > For us, the trend pointed in the wrong direction when DPDK switched
> > the preference towards runtime configurability and deprecated compile
> > time configurability. I do understand the reasoning behind it, and the
> > impact is minimal, so we accept it.
> 
> The code can be optimized by removing some instructions with #ifdef.
> But the complexity of managing #ifdef enabling/disabling, depending on the
> platform and the use case, would be huge.
> We try to have a reasonable code "always enabled" which performs well in all
> cases. This is a design choice which makes DPDK a library, not a pool of code
> to cherry-pick.
> 
> > However, if DPDK starts sacrificing performance of the core libraries
> > for the benefits of the GNU/Linux distributors, network appliance
> > vendors may put more effort into sticking with old DPDK versions
> > instead of updating.
> 
> The initial choice regarding ABI compatibility was "do not care".
> Recently, the decision was done to care about ABI compatibility as priority
> number 2. The priority number 1 remains the performance.
> That's a reason for allowing some ABI breakages in some specific releases
> announced in advance.
> 
> > > I think we need to focus first on slow path APIs ABI stuff.
> 
> Yes we should not degrade fast path performance for the sake of avoiding
> uncertain future ABI issues.
> 
> Morten, Jerin, thank you for the feedback.
I think we have a consensus here not to make any changes to inline functions for now.
Should we mark this as 'Deferred or Rejected'?

> 


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-30 20:40  0%               ` Pavan Nikhilesh Bhagavatula
@ 2020-06-30 21:07  0%                 ` McDaniel, Timothy
  0 siblings, 0 replies; 200+ results
From: McDaniel, Timothy @ 2020-06-30 21:07 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, Jerin Jacob
  Cc: Ray Kinsella, Neil Horman, Jerin Jacob Kollanukkaran,
	Mattias Rönnblom, dpdk-dev, Eads, Gage, Van Haaren, Harry

>-----Original Message-----
>From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
>Sent: Tuesday, June 30, 2020 3:40 PM
>To: McDaniel, Timothy <timothy.mcdaniel@intel.com>; Jerin Jacob
><jerinjacobk@gmail.com>
>Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>;
>Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Mattias Rönnblom
><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
>Subject: RE: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
>
>
>
>>-----Original Message-----
>>From: dev <dev-bounces@dpdk.org> On Behalf Of McDaniel, Timothy
>>Sent: Wednesday, July 1, 2020 12:57 AM
>>To: Jerin Jacob <jerinjacobk@gmail.com>
>>Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman
>><nhorman@tuxdriver.com>; Jerin Jacob Kollanukkaran
>><jerinj@marvell.com>; Mattias Rönnblom
>><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads,
>>Gage <gage.eads@intel.com>; Van Haaren, Harry
>><harry.van.haaren@intel.com>
>>Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream
>>prerequisites
>>
>>>-----Original Message-----
>>>From: Jerin Jacob <jerinjacobk@gmail.com>
>>>Sent: Tuesday, June 30, 2020 10:58 AM
>>>To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
>>>Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman
>><nhorman@tuxdriver.com>;
>>>Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
>>><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads,
>>Gage
>>><gage.eads@intel.com>; Van Haaren, Harry
>><harry.van.haaren@intel.com>
>>>Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream
>>prerequisites
>>>
>>>On Tue, Jun 30, 2020 at 9:12 PM McDaniel, Timothy
>>><timothy.mcdaniel@intel.com> wrote:
>>>>
>>>> >-----Original Message-----
>>>> >From: Jerin Jacob <jerinjacobk@gmail.com>
>>>> >Sent: Monday, June 29, 2020 11:21 PM
>>>> >To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
>>>> >Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman
>><nhorman@tuxdriver.com>;
>>>> >Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
>>>> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>;
>>Eads, Gage
>>>> ><gage.eads@intel.com>; Van Haaren, Harry
>><harry.van.haaren@intel.com>
>>>> >Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream
>>prerequisites
>>>> >
>>>> >On Tue, Jun 30, 2020 at 1:01 AM McDaniel, Timothy
>>>> ><timothy.mcdaniel@intel.com> wrote:
>>>> >>
>>>> >> -----Original Message-----
>>>> >> From: Jerin Jacob <jerinjacobk@gmail.com>
>>>> >> Sent: Saturday, June 27, 2020 2:45 AM
>>>> >> To: McDaniel, Timothy <timothy.mcdaniel@intel.com>; Ray
>>Kinsella
>>>> ><mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>
>>>> >> Cc: Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
>>>> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>;
>>Eads, Gage
>>>> ><gage.eads@intel.com>; Van Haaren, Harry
>><harry.van.haaren@intel.com>
>>>> >> Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream
>>prerequisites
>>>> >>
>>>> >> > +
>>>> >> > +/** Event port configuration structure */
>>>> >> > +struct rte_event_port_conf_v20 {
>>>> >> > +       int32_t new_event_threshold;
>>>> >> > +       /**< A backpressure threshold for new event enqueues on
>>this port.
>>>> >> > +        * Use for *closed system* event dev where event capacity
>>is limited,
>>>> >> > +        * and cannot exceed the capacity of the event dev.
>>>> >> > +        * Configuring ports with different thresholds can make
>>higher priority
>>>> >> > +        * traffic less likely to  be backpressured.
>>>> >> > +        * For example, a port used to inject NIC Rx packets into
>>the event dev
>>>> >> > +        * can have a lower threshold so as not to overwhelm the
>>device,
>>>> >> > +        * while ports used for worker pools can have a higher
>>threshold.
>>>> >> > +        * This value cannot exceed the *nb_events_limit*
>>>> >> > +        * which was previously supplied to
>>rte_event_dev_configure().
>>>> >> > +        * This should be set to '-1' for *open system*.
>>>> >> > +        */
>>>> >> > +       uint16_t dequeue_depth;
>>>> >> > +       /**< Configure number of bulk dequeues for this event
>>port.
>>>> >> > +        * This value cannot exceed the
>>*nb_event_port_dequeue_depth*
>>>> >> > +        * which previously supplied to rte_event_dev_configure().
>>>> >> > +        * Ignored when device is not
>>RTE_EVENT_DEV_CAP_BURST_MODE
>>>> >capable.
>>>> >> > +        */
>>>> >> > +       uint16_t enqueue_depth;
>>>> >> > +       /**< Configure number of bulk enqueues for this event
>>port.
>>>> >> > +        * This value cannot exceed the
>>*nb_event_port_enqueue_depth*
>>>> >> > +        * which previously supplied to rte_event_dev_configure().
>>>> >> > +        * Ignored when device is not
>>RTE_EVENT_DEV_CAP_BURST_MODE
>>>> >capable.
>>>> >> > +        */
>>>> >> >         uint8_t disable_implicit_release;
>>>> >> >         /**< Configure the port not to release outstanding events
>>in
>>>> >> >          * rte_event_dev_dequeue_burst(). If true, all events
>>received through
>>>> >> > @@ -733,6 +911,14 @@ struct rte_event_port_conf {
>>>> >> >  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t
>>port_id,
>>>> >> >                                 struct rte_event_port_conf *port_conf);
>>>> >> >
>>>> >> > +int
>>>> >> > +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t
>>port_id,
>>>> >> > +                               struct rte_event_port_conf_v20 *port_conf);
>>>> >> > +
>>>> >> > +int
>>>> >> > +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t
>>port_id,
>>>> >> > +                                     struct rte_event_port_conf *port_conf);
>>>> >>
>>>> >> Hi Timothy,
>>>> >>
>>>> >> + ABI Maintainers (Ray, Neil)
>>>> >>
>>>> >> # As per my understanding, the structures can not be versioned,
>>only
>>>> >> function can be versioned.
>>>> >> i.e we can not make any change to " struct rte_event_port_conf"
>>>> >>
>>>> >> # We have a similar case with ethdev and it deferred to next
>>release v20.11
>>>> >> https://urldefense.proofpoint.com/v2/url?u=http-
>>3A__patches.dpdk.org_patch_69113_&d=DwIGaQ&c=nKjWec2b6R0mO
>>yPaz7xtfQ&r=1cjuAHrGh745jHNmj2fD85sUMIJ2IPIDsIJzo6FN6Z0&m=lL7
>>dDlN7ICIpENvIB7_El27UclXA2tJLdOwbsirg1Dw&s=CNmSXDDn28U-
>>OjEAaZgJI_A2fDmMKM6zb12sIE9L-Io&e=
>>>> >>
>>>> >> Regarding the API changes:
>>>> >> # The slow path changes general looks good to me. I will review
>>the
>>>> >> next level in the coming days
>>>> >> # The following fast path changes bothers to me. Could you share
>>more
>>>> >> details on below change?
>>>> >>
>>>> >> diff --git a/app/test-eventdev/test_order_atq.c
>>>> >> b/app/test-eventdev/test_order_atq.c
>>>> >> index 3366cfc..8246b96 100644
>>>> >> --- a/app/test-eventdev/test_order_atq.c
>>>> >> +++ b/app/test-eventdev/test_order_atq.c
>>>> >> @@ -34,6 +34,8 @@
>>>> >>                         continue;
>>>> >>                 }
>>>> >>
>>>> >> +               ev.flow_id = ev.mbuf->udata64;
>>>> >> +
>>>> >> # Since RC1 is near, I am not sure how to accommodate the API
>>changes
>>>> >> now and sort out ABI stuffs.
>>>> >> # Other concern is eventdev spec get bloated with versioning files
>>>> >> just for ONE release as 20.11 will be OK to change the ABI.
>>>> >> # While we discuss the API change, Please send deprecation
>>notice for
>>>> >> ABI change for 20.11,
>>>> >> so that there is no ambiguity of this patch for the 20.11 release.
>>>> >>
>>>> >> Hello Jerin,
>>>> >>
>>>> >> Thank you for the review comments.
>>>> >>
>>>> >> With regard to your comments regarding the fast path flow_id
>>change, the
>>>Intel
>>>> >DLB hardware
>>>> >> is not capable of transferring the flow_id as part of the event
>>itself. We
>>>> >therefore require a mechanism
>>>> >> to accomplish this. What we have done to work around this is to
>>require the
>>>> >application to embed the flow_id
>>>> >> within the data payload. The new flag, #define
>>>> >RTE_EVENT_DEV_CAP_CARRY_FLOW_ID (1ULL << 9), can be used
>>>> >> by applications to determine if they need to embed the flow_id,
>>or if its
>>>> >automatically propagated and present in the
>>>> >> received event.
>>>> >>
>>>> >> What we should have done is to wrap the assignment with a
>>conditional.
>>>> >>
>>>> >> if (!(device_capability_flags &
>>RTE_EVENT_DEV_CAP_CARRY_FLOW_ID))
>>>> >>         ev.flow_id = ev.mbuf->udata64;
>>>> >
>>>> >Two problems with this approach,
>>>> >1) we are assuming mbuf udata64 field is available for DLB driver
>>>> >2) It won't work with another adapter, eventdev has no
>>dependency with mbuf
>>>> >
>>>>
>>>> This snippet is not intended to suggest that udata64 always be used
>>to store the
>>>flow ID, but as an example of how an application could do it. Some
>>applications
>>>won’t need to carry the flow ID through; others can select an unused
>>field in the
>>>event data (e.g. hash.rss or udata64 if using mbufs), or (worst-case)
>>re-generate
>>>the flow ID in pipeline stages that require it.
>>>
>>>OK.
>>>>
>>>> >Question:
>>>> >1) In the case of DLB hardware, on dequeue(),  what HW returns? is
>>it
>>>> >only event pointer and not have any other metadata like
>>schedule_type
>>>> >etc.
>>>> >
>>>>
>>>> The DLB device provides a 16B “queue entry” that consists of:
>>>>
>>>> *       8B event data
>>>> *       Queue ID
>>>> *       Priority
>>>> *       Scheduling type
>>>> *       19 bits of carried-through data
>>>> *       Assorted error/debug/reserved bits that are set by the device
>>(not carried-
>>>through)
>>>>
>>>>  For the carried-through 19b, we use 12b for event_type and
>>sub_event_type.
>>>
>>>I can only think of TWO options to help
>>>1) Since event pointer always cache aligned, You could grab LSB
>>>6bits(2^6 = 64B ) and 7 bits from (19b - 12b) carried through
>>>structure
>>>2) Have separate mempool driver using existing drivers, ie "event
>>>pointer" + or - some offset have any amount of custom data.
>>>
>>
>>We can't guarantee that the event will contain a pointer -- it's possible
>>that 8B is inline data (i.e. struct rte_event's u64 field).
>>
>>It's really an application decision -- for example an app could allocate
>>space in the 'mbuf private data' to store the flow ID, if the event device
>>lacks that carry-flow-ID capability and the other mbuf fields can't be
>>used for whatever reason.
>>We modified the tests, sample apps to show how this might be done,
>>not necessarily how it must be done.
>>
>>>
>>>>
>>>> >
>>>> >>
>>>> >> This would minimize/eliminate any performance impact due to
>>the
>>>processor's
>>>> >branch prediction logic.
>>>
>>>I think, If we need to change common fastpath, better we need to
>>make
>>>it template to create code for compile-time to have absolute zero
>>>overhead
>>>and use runtime.
>>>See app/test-eventdev/test_order_atq.c: function: worker_wrapper()
>>>_create_ worker at compile time based on runtime capability.
>>>
>>
>>Yes, that would be perfect.  Thanks for the example!
>
>Just to  add instead of having if and else using a jumptbl would be much cleaner
>Ex.
>	const pipeline_atq_worker_t pipeline_atq_worker_single_stage[2][2][2]
>= {
>		[0][0] = pipeline_atq_worker_single_stage_fwd,
>		[0][1] = pipeline_atq_worker_single_stage_tx,
>		[1][0] = pipeline_atq_worker_single_stage_burst_fwd,
>		[1][1] = pipeline_atq_worker_single_stage_burst_tx,
>	};
>
>		return
>(pipeline_atq_worker_single_stage[burst][internal_port])(arg);
>


Thank you for the suggestion.


>>
>>>
>>>
>>>> >> The assignment then becomes in essence a NOOP for all event
>>devices that
>>>are
>>>> >capable of carrying the flow_id as part of the event payload itself.
>>>> >>
>>>> >> Thanks,
>>>> >> Tim
>>>> >>
>>>> >>
>>>> >>
>>>> >> Thanks,
>>>> >> Tim

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-30 19:26  0%             ` McDaniel, Timothy
@ 2020-06-30 20:40  0%               ` Pavan Nikhilesh Bhagavatula
  2020-06-30 21:07  0%                 ` McDaniel, Timothy
  2020-07-01  4:50  3%               ` Jerin Jacob
  1 sibling, 1 reply; 200+ results
From: Pavan Nikhilesh Bhagavatula @ 2020-06-30 20:40 UTC (permalink / raw)
  To: McDaniel, Timothy, Jerin Jacob
  Cc: Ray Kinsella, Neil Horman, Jerin Jacob Kollanukkaran,
	Mattias Rönnblom, dpdk-dev, Eads, Gage, Van Haaren, Harry



>-----Original Message-----
>From: dev <dev-bounces@dpdk.org> On Behalf Of McDaniel, Timothy
>Sent: Wednesday, July 1, 2020 12:57 AM
>To: Jerin Jacob <jerinjacobk@gmail.com>
>Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman
><nhorman@tuxdriver.com>; Jerin Jacob Kollanukkaran
><jerinj@marvell.com>; Mattias Rönnblom
><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads,
>Gage <gage.eads@intel.com>; Van Haaren, Harry
><harry.van.haaren@intel.com>
>Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream
>prerequisites
>
>>-----Original Message-----
>>From: Jerin Jacob <jerinjacobk@gmail.com>
>>Sent: Tuesday, June 30, 2020 10:58 AM
>>To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
>>Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman
><nhorman@tuxdriver.com>;
>>Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
>><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads,
>Gage
>><gage.eads@intel.com>; Van Haaren, Harry
><harry.van.haaren@intel.com>
>>Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream
>prerequisites
>>
>>On Tue, Jun 30, 2020 at 9:12 PM McDaniel, Timothy
>><timothy.mcdaniel@intel.com> wrote:
>>>
>>> >-----Original Message-----
>>> >From: Jerin Jacob <jerinjacobk@gmail.com>
>>> >Sent: Monday, June 29, 2020 11:21 PM
>>> >To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
>>> >Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman
><nhorman@tuxdriver.com>;
>>> >Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
>>> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>;
>Eads, Gage
>>> ><gage.eads@intel.com>; Van Haaren, Harry
><harry.van.haaren@intel.com>
>>> >Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream
>prerequisites
>>> >
>>> >On Tue, Jun 30, 2020 at 1:01 AM McDaniel, Timothy
>>> ><timothy.mcdaniel@intel.com> wrote:
>>> >>
>>> >> -----Original Message-----
>>> >> From: Jerin Jacob <jerinjacobk@gmail.com>
>>> >> Sent: Saturday, June 27, 2020 2:45 AM
>>> >> To: McDaniel, Timothy <timothy.mcdaniel@intel.com>; Ray
>Kinsella
>>> ><mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>
>>> >> Cc: Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
>>> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>;
>Eads, Gage
>>> ><gage.eads@intel.com>; Van Haaren, Harry
><harry.van.haaren@intel.com>
>>> >> Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream
>prerequisites
>>> >>
>>> >> > +
>>> >> > +/** Event port configuration structure */
>>> >> > +struct rte_event_port_conf_v20 {
>>> >> > +       int32_t new_event_threshold;
>>> >> > +       /**< A backpressure threshold for new event enqueues on
>this port.
>>> >> > +        * Use for *closed system* event dev where event capacity
>is limited,
>>> >> > +        * and cannot exceed the capacity of the event dev.
>>> >> > +        * Configuring ports with different thresholds can make
>higher priority
>>> >> > +        * traffic less likely to  be backpressured.
>>> >> > +        * For example, a port used to inject NIC Rx packets into
>the event dev
>>> >> > +        * can have a lower threshold so as not to overwhelm the
>device,
>>> >> > +        * while ports used for worker pools can have a higher
>threshold.
>>> >> > +        * This value cannot exceed the *nb_events_limit*
>>> >> > +        * which was previously supplied to
>rte_event_dev_configure().
>>> >> > +        * This should be set to '-1' for *open system*.
>>> >> > +        */
>>> >> > +       uint16_t dequeue_depth;
>>> >> > +       /**< Configure number of bulk dequeues for this event
>port.
>>> >> > +        * This value cannot exceed the
>*nb_event_port_dequeue_depth*
>>> >> > +        * which previously supplied to rte_event_dev_configure().
>>> >> > +        * Ignored when device is not
>RTE_EVENT_DEV_CAP_BURST_MODE
>>> >capable.
>>> >> > +        */
>>> >> > +       uint16_t enqueue_depth;
>>> >> > +       /**< Configure number of bulk enqueues for this event
>port.
>>> >> > +        * This value cannot exceed the
>*nb_event_port_enqueue_depth*
>>> >> > +        * which previously supplied to rte_event_dev_configure().
>>> >> > +        * Ignored when device is not
>RTE_EVENT_DEV_CAP_BURST_MODE
>>> >capable.
>>> >> > +        */
>>> >> >         uint8_t disable_implicit_release;
>>> >> >         /**< Configure the port not to release outstanding events
>in
>>> >> >          * rte_event_dev_dequeue_burst(). If true, all events
>received through
>>> >> > @@ -733,6 +911,14 @@ struct rte_event_port_conf {
>>> >> >  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t
>port_id,
>>> >> >                                 struct rte_event_port_conf *port_conf);
>>> >> >
>>> >> > +int
>>> >> > +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t
>port_id,
>>> >> > +                               struct rte_event_port_conf_v20 *port_conf);
>>> >> > +
>>> >> > +int
>>> >> > +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t
>port_id,
>>> >> > +                                     struct rte_event_port_conf *port_conf);
>>> >>
>>> >> Hi Timothy,
>>> >>
>>> >> + ABI Maintainers (Ray, Neil)
>>> >>
>>> >> # As per my understanding, the structures can not be versioned,
>only
>>> >> function can be versioned.
>>> >> i.e we can not make any change to " struct rte_event_port_conf"
>>> >>
>>> >> # We have a similar case with ethdev and it deferred to next
>release v20.11
>>> >> https://urldefense.proofpoint.com/v2/url?u=http-
>3A__patches.dpdk.org_patch_69113_&d=DwIGaQ&c=nKjWec2b6R0mO
>yPaz7xtfQ&r=1cjuAHrGh745jHNmj2fD85sUMIJ2IPIDsIJzo6FN6Z0&m=lL7
>dDlN7ICIpENvIB7_El27UclXA2tJLdOwbsirg1Dw&s=CNmSXDDn28U-
>OjEAaZgJI_A2fDmMKM6zb12sIE9L-Io&e=
>>> >>
>>> >> Regarding the API changes:
>>> >> # The slow path changes general looks good to me. I will review
>the
>>> >> next level in the coming days
>>> >> # The following fast path changes bothers to me. Could you share
>more
>>> >> details on below change?
>>> >>
>>> >> diff --git a/app/test-eventdev/test_order_atq.c
>>> >> b/app/test-eventdev/test_order_atq.c
>>> >> index 3366cfc..8246b96 100644
>>> >> --- a/app/test-eventdev/test_order_atq.c
>>> >> +++ b/app/test-eventdev/test_order_atq.c
>>> >> @@ -34,6 +34,8 @@
>>> >>                         continue;
>>> >>                 }
>>> >>
>>> >> +               ev.flow_id = ev.mbuf->udata64;
>>> >> +
>>> >> # Since RC1 is near, I am not sure how to accommodate the API
>changes
>>> >> now and sort out ABI stuffs.
>>> >> # Other concern is eventdev spec get bloated with versioning files
>>> >> just for ONE release as 20.11 will be OK to change the ABI.
>>> >> # While we discuss the API change, Please send deprecation
>notice for
>>> >> ABI change for 20.11,
>>> >> so that there is no ambiguity of this patch for the 20.11 release.
>>> >>
>>> >> Hello Jerin,
>>> >>
>>> >> Thank you for the review comments.
>>> >>
>>> >> With regard to your comments regarding the fast path flow_id
>change, the
>>Intel
>>> >DLB hardware
>>> >> is not capable of transferring the flow_id as part of the event
>itself. We
>>> >therefore require a mechanism
>>> >> to accomplish this. What we have done to work around this is to
>require the
>>> >application to embed the flow_id
>>> >> within the data payload. The new flag, #define
>>> >RTE_EVENT_DEV_CAP_CARRY_FLOW_ID (1ULL << 9), can be used
>>> >> by applications to determine if they need to embed the flow_id,
>or if its
>>> >automatically propagated and present in the
>>> >> received event.
>>> >>
>>> >> What we should have done is to wrap the assignment with a
>conditional.
>>> >>
>>> >> if (!(device_capability_flags &
>RTE_EVENT_DEV_CAP_CARRY_FLOW_ID))
>>> >>         ev.flow_id = ev.mbuf->udata64;
>>> >
>>> >Two problems with this approach,
>>> >1) we are assuming mbuf udata64 field is available for DLB driver
>>> >2) It won't work with another adapter, eventdev has no
>dependency with mbuf
>>> >
>>>
>>> This snippet is not intended to suggest that udata64 always be used
>to store the
>>flow ID, but as an example of how an application could do it. Some
>applications
>>won’t need to carry the flow ID through; others can select an unused
>field in the
>>event data (e.g. hash.rss or udata64 if using mbufs), or (worst-case)
>re-generate
>>the flow ID in pipeline stages that require it.
>>
>>OK.
>>>
>>> >Question:
>>> >1) In the case of DLB hardware, on dequeue(),  what HW returns? is
>it
>>> >only event pointer and not have any other metadata like
>schedule_type
>>> >etc.
>>> >
>>>
>>> The DLB device provides a 16B “queue entry” that consists of:
>>>
>>> *       8B event data
>>> *       Queue ID
>>> *       Priority
>>> *       Scheduling type
>>> *       19 bits of carried-through data
>>> *       Assorted error/debug/reserved bits that are set by the device
>(not carried-
>>through)
>>>
>>>  For the carried-through 19b, we use 12b for event_type and
>sub_event_type.
>>
>>I can only think of TWO options to help
>>1) Since event pointer always cache aligned, You could grab LSB
>>6bits(2^6 = 64B ) and 7 bits from (19b - 12b) carried through
>>structure
>>2) Have separate mempool driver using existing drivers, ie "event
>>pointer" + or - some offset have any amount of custom data.
>>
>
>We can't guarantee that the event will contain a pointer -- it's possible
>that 8B is inline data (i.e. struct rte_event's u64 field).
>
>It's really an application decision -- for example an app could allocate
>space in the 'mbuf private data' to store the flow ID, if the event device
>lacks that carry-flow-ID capability and the other mbuf fields can't be
>used for whatever reason.
>We modified the tests, sample apps to show how this might be done,
>not necessarily how it must be done.
>
>>
>>>
>>> >
>>> >>
>>> >> This would minimize/eliminate any performance impact due to
>the
>>processor's
>>> >branch prediction logic.
>>
>>I think, If we need to change common fastpath, better we need to
>make
>>it template to create code for compile-time to have absolute zero
>>overhead
>>and use runtime.
>>See app/test-eventdev/test_order_atq.c: function: worker_wrapper()
>>_create_ worker at compile time based on runtime capability.
>>
>
>Yes, that would be perfect.  Thanks for the example!

Just to add, instead of having if and else, using a jump table would be much cleaner
Ex.
	const pipeline_atq_worker_t pipeline_atq_worker_single_stage[2][2] = {
		[0][0] = pipeline_atq_worker_single_stage_fwd,
		[0][1] = pipeline_atq_worker_single_stage_tx,
		[1][0] = pipeline_atq_worker_single_stage_burst_fwd,
		[1][1] = pipeline_atq_worker_single_stage_burst_tx,
	};

		return (pipeline_atq_worker_single_stage[burst][internal_port])(arg);

>
>>
>>
>>> >> The assignment then becomes in essence a NOOP for all event
>devices that
>>are
>>> >capable of carrying the flow_id as part of the event payload itself.
>>> >>
>>> >> Thanks,
>>> >> Tim
>>> >>
>>> >>
>>> >>
>>> >> Thanks,
>>> >> Tim

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-30 15:57  0%           ` Jerin Jacob
@ 2020-06-30 19:26  0%             ` McDaniel, Timothy
  2020-06-30 20:40  0%               ` Pavan Nikhilesh Bhagavatula
  2020-07-01  4:50  3%               ` Jerin Jacob
  0 siblings, 2 replies; 200+ results
From: McDaniel, Timothy @ 2020-06-30 19:26 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Ray Kinsella, Neil Horman, Jerin Jacob, Mattias Rönnblom,
	dpdk-dev, Eads, Gage, Van Haaren, Harry

>-----Original Message-----
>From: Jerin Jacob <jerinjacobk@gmail.com>
>Sent: Tuesday, June 30, 2020 10:58 AM
>To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
>Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>;
>Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
>Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
>
>On Tue, Jun 30, 2020 at 9:12 PM McDaniel, Timothy
><timothy.mcdaniel@intel.com> wrote:
>>
>> >-----Original Message-----
>> >From: Jerin Jacob <jerinjacobk@gmail.com>
>> >Sent: Monday, June 29, 2020 11:21 PM
>> >To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
>> >Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>;
>> >Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
>> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
>> ><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
>> >Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
>> >
>> >On Tue, Jun 30, 2020 at 1:01 AM McDaniel, Timothy
>> ><timothy.mcdaniel@intel.com> wrote:
>> >>
>> >> -----Original Message-----
>> >> From: Jerin Jacob <jerinjacobk@gmail.com>
>> >> Sent: Saturday, June 27, 2020 2:45 AM
>> >> To: McDaniel, Timothy <timothy.mcdaniel@intel.com>; Ray Kinsella
>> ><mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>
>> >> Cc: Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
>> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
>> ><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
>> >> Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
>> >>
>> >> > +
>> >> > +/** Event port configuration structure */
>> >> > +struct rte_event_port_conf_v20 {
>> >> > +       int32_t new_event_threshold;
>> >> > +       /**< A backpressure threshold for new event enqueues on this port.
>> >> > +        * Use for *closed system* event dev where event capacity is limited,
>> >> > +        * and cannot exceed the capacity of the event dev.
>> >> > +        * Configuring ports with different thresholds can make higher priority
>> >> > +        * traffic less likely to  be backpressured.
>> >> > +        * For example, a port used to inject NIC Rx packets into the event dev
>> >> > +        * can have a lower threshold so as not to overwhelm the device,
>> >> > +        * while ports used for worker pools can have a higher threshold.
>> >> > +        * This value cannot exceed the *nb_events_limit*
>> >> > +        * which was previously supplied to rte_event_dev_configure().
>> >> > +        * This should be set to '-1' for *open system*.
>> >> > +        */
>> >> > +       uint16_t dequeue_depth;
>> >> > +       /**< Configure number of bulk dequeues for this event port.
>> >> > +        * This value cannot exceed the *nb_event_port_dequeue_depth*
>> >> > +        * which previously supplied to rte_event_dev_configure().
>> >> > +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE
>> >capable.
>> >> > +        */
>> >> > +       uint16_t enqueue_depth;
>> >> > +       /**< Configure number of bulk enqueues for this event port.
>> >> > +        * This value cannot exceed the *nb_event_port_enqueue_depth*
>> >> > +        * which previously supplied to rte_event_dev_configure().
>> >> > +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE
>> >capable.
>> >> > +        */
>> >> >         uint8_t disable_implicit_release;
>> >> >         /**< Configure the port not to release outstanding events in
>> >> >          * rte_event_dev_dequeue_burst(). If true, all events received through
>> >> > @@ -733,6 +911,14 @@ struct rte_event_port_conf {
>> >> >  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
>> >> >                                 struct rte_event_port_conf *port_conf);
>> >> >
>> >> > +int
>> >> > +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
>> >> > +                               struct rte_event_port_conf_v20 *port_conf);
>> >> > +
>> >> > +int
>> >> > +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
>> >> > +                                     struct rte_event_port_conf *port_conf);
>> >>
>> >> Hi Timothy,
>> >>
>> >> + ABI Maintainers (Ray, Neil)
>> >>
>> >> # As per my understanding, the structures can not be versioned, only
>> >> function can be versioned.
>> >> i.e we can not make any change to " struct rte_event_port_conf"
>> >>
>> >> # We have a similar case with ethdev and it deferred to next release v20.11
>> >> http://patches.dpdk.org/patch/69113/
>> >>
>> >> Regarding the API changes:
>> >> # The slow path changes general looks good to me. I will review the
>> >> next level in the coming days
>> >> # The following fast path changes bothers to me. Could you share more
>> >> details on below change?
>> >>
>> >> diff --git a/app/test-eventdev/test_order_atq.c
>> >> b/app/test-eventdev/test_order_atq.c
>> >> index 3366cfc..8246b96 100644
>> >> --- a/app/test-eventdev/test_order_atq.c
>> >> +++ b/app/test-eventdev/test_order_atq.c
>> >> @@ -34,6 +34,8 @@
>> >>                         continue;
>> >>                 }
>> >>
>> >> +               ev.flow_id = ev.mbuf->udata64;
>> >> +
>> >> # Since RC1 is near, I am not sure how to accommodate the API changes
>> >> now and sort out ABI stuffs.
>> >> # Other concern is eventdev spec get bloated with versioning files
>> >> just for ONE release as 20.11 will be OK to change the ABI.
>> >> # While we discuss the API change, Please send deprecation notice for
>> >> ABI change for 20.11,
>> >> so that there is no ambiguity of this patch for the 20.11 release.
>> >>
>> >> Hello Jerin,
>> >>
>> >> Thank you for the review comments.
>> >>
>> >> With regard to your comments regarding the fast path flow_id change, the
>Intel
>> >DLB hardware
>> >> is not capable of transferring the flow_id as part of the event itself. We
>> >therefore require a mechanism
>> >> to accomplish this. What we have done to work around this is to require the
>> >application to embed the flow_id
>> >> within the data payload. The new flag, #define
>> >RTE_EVENT_DEV_CAP_CARRY_FLOW_ID (1ULL << 9), can be used
>> >> by applications to determine if they need to embed the flow_id, or if its
>> >automatically propagated and present in the
>> >> received event.
>> >>
>> >> What we should have done is to wrap the assignment with a conditional.
>> >>
>> >> if (!(device_capability_flags & RTE_EVENT_DEV_CAP_CARRY_FLOW_ID))
>> >>         ev.flow_id = ev.mbuf->udata64;
>> >
>> >Two problems with this approach,
>> >1) we are assuming mbuf udata64 field is available for DLB driver
>> >2) It won't work with another adapter, eventdev has no dependency with mbuf
>> >
>>
>> This snippet is not intended to suggest that udata64 always be used to store the
>flow ID, but as an example of how an application could do it. Some applications
>won’t need to carry the flow ID through; others can select an unused field in the
>event data (e.g. hash.rss or udata64 if using mbufs), or (worst-case) re-generate
>the flow ID in pipeline stages that require it.
>
>OK.
>>
>> >Question:
>> >1) In the case of DLB hardware, on dequeue(),  what HW returns? is it
>> >only event pointer and not have any other metadata like schedule_type
>> >etc.
>> >
>>
>> The DLB device provides a 16B “queue entry” that consists of:
>>
>> *       8B event data
>> *       Queue ID
>> *       Priority
>> *       Scheduling type
>> *       19 bits of carried-through data
>> *       Assorted error/debug/reserved bits that are set by the device (not carried-
>through)
>>
>>  For the carried-through 19b, we use 12b for event_type and sub_event_type.
>
>I can only think of TWO options to help
>1) Since event pointer always cache aligned, You could grab LSB
>6bits(2^6 = 64B ) and 7 bits from (19b - 12b) carried through
>structure
>2) Have separate mempool driver using existing drivers, ie "event
>pointer" + or - some offset have any amount of custom data.
>

We can't guarantee that the event will contain a pointer -- it's possible that 8B is inline data (i.e. struct rte_event's u64 field).

It's really an application decision -- for example an app could allocate space in the 'mbuf private data' to store the flow ID, if the event device lacks that carry-flow-ID capability and the other mbuf fields can't be used for whatever reason.
We modified the tests, sample apps to show how this might be done, not necessarily how it must be done.
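
To make that concrete, a rough sketch of what an application could do
(illustrative only: it assumes the proposed RTE_EVENT_DEV_CAP_CARRY_FLOW_ID
flag, and udata64 is just one possible place to stash the value, as noted
above):

#include <rte_eventdev.h>
#include <rte_mbuf.h>
#include <rte_pause.h>

static uint8_t carry_flow_id; /* resolved once at init, not per event */

static void
probe_flow_id_cap(uint8_t dev_id)
{
	struct rte_event_dev_info info;

	rte_event_dev_info_get(dev_id, &info);
	carry_flow_id = !!(info.event_dev_cap & RTE_EVENT_DEV_CAP_CARRY_FLOW_ID);
}

static inline void
stage_enqueue(uint8_t dev_id, uint8_t port, struct rte_event *ev)
{
	if (!carry_flow_id)
		ev->mbuf->udata64 = ev->flow_id;	/* stash before enqueue */
	rte_event_enqueue_burst(dev_id, port, ev, 1);
}

static inline void
stage_dequeue(uint8_t dev_id, uint8_t port, struct rte_event *ev)
{
	while (rte_event_dequeue_burst(dev_id, port, ev, 1, 0) == 0)
		rte_pause();
	if (!carry_flow_id)
		ev->flow_id = ev->mbuf->udata64;	/* restore after dequeue */
}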

>
>>
>> >
>> >>
>> >> This would minimize/eliminate any performance impact due to the
>processor's
>> >branch prediction logic.
>
>I think, If we need to change common fastpath, better we need to make
>it template to create code for compile-time to have absolute zero
>overhead
>and use runtime.
>See app/test-eventdev/test_order_atq.c: function: worker_wrapper()
>_create_ worker at compile time based on runtime capability.
>

Yes, that would be perfect.  Thanks for the example!

>
>
>> >> The assignment then becomes in essence a NOOP for all event devices that
>are
>> >capable of carrying the flow_id as part of the event payload itself.
>> >>
>> >> Thanks,
>> >> Tim
>> >>
>> >>
>> >>
>> >> Thanks,
>> >> Tim

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-30 15:37  0%         ` McDaniel, Timothy
@ 2020-06-30 15:57  0%           ` Jerin Jacob
  2020-06-30 19:26  0%             ` McDaniel, Timothy
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2020-06-30 15:57 UTC (permalink / raw)
  To: McDaniel, Timothy
  Cc: Ray Kinsella, Neil Horman, Jerin Jacob, Mattias Rönnblom,
	dpdk-dev, Eads, Gage, Van Haaren, Harry

On Tue, Jun 30, 2020 at 9:12 PM McDaniel, Timothy
<timothy.mcdaniel@intel.com> wrote:
>
> >-----Original Message-----
> >From: Jerin Jacob <jerinjacobk@gmail.com>
> >Sent: Monday, June 29, 2020 11:21 PM
> >To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
> >Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>;
> >Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
> ><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
> >Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
> >
> >On Tue, Jun 30, 2020 at 1:01 AM McDaniel, Timothy
> ><timothy.mcdaniel@intel.com> wrote:
> >>
> >> -----Original Message-----
> >> From: Jerin Jacob <jerinjacobk@gmail.com>
> >> Sent: Saturday, June 27, 2020 2:45 AM
> >> To: McDaniel, Timothy <timothy.mcdaniel@intel.com>; Ray Kinsella
> ><mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>
> >> Cc: Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
> ><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
> >> Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
> >>
> >> > +
> >> > +/** Event port configuration structure */
> >> > +struct rte_event_port_conf_v20 {
> >> > +       int32_t new_event_threshold;
> >> > +       /**< A backpressure threshold for new event enqueues on this port.
> >> > +        * Use for *closed system* event dev where event capacity is limited,
> >> > +        * and cannot exceed the capacity of the event dev.
> >> > +        * Configuring ports with different thresholds can make higher priority
> >> > +        * traffic less likely to  be backpressured.
> >> > +        * For example, a port used to inject NIC Rx packets into the event dev
> >> > +        * can have a lower threshold so as not to overwhelm the device,
> >> > +        * while ports used for worker pools can have a higher threshold.
> >> > +        * This value cannot exceed the *nb_events_limit*
> >> > +        * which was previously supplied to rte_event_dev_configure().
> >> > +        * This should be set to '-1' for *open system*.
> >> > +        */
> >> > +       uint16_t dequeue_depth;
> >> > +       /**< Configure number of bulk dequeues for this event port.
> >> > +        * This value cannot exceed the *nb_event_port_dequeue_depth*
> >> > +        * which previously supplied to rte_event_dev_configure().
> >> > +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE
> >capable.
> >> > +        */
> >> > +       uint16_t enqueue_depth;
> >> > +       /**< Configure number of bulk enqueues for this event port.
> >> > +        * This value cannot exceed the *nb_event_port_enqueue_depth*
> >> > +        * which previously supplied to rte_event_dev_configure().
> >> > +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE
> >capable.
> >> > +        */
> >> >         uint8_t disable_implicit_release;
> >> >         /**< Configure the port not to release outstanding events in
> >> >          * rte_event_dev_dequeue_burst(). If true, all events received through
> >> > @@ -733,6 +911,14 @@ struct rte_event_port_conf {
> >> >  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
> >> >                                 struct rte_event_port_conf *port_conf);
> >> >
> >> > +int
> >> > +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
> >> > +                               struct rte_event_port_conf_v20 *port_conf);
> >> > +
> >> > +int
> >> > +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
> >> > +                                     struct rte_event_port_conf *port_conf);
> >>
> >> Hi Timothy,
> >>
> >> + ABI Maintainers (Ray, Neil)
> >>
> >> # As per my understanding, the structures can not be versioned, only
> >> function can be versioned.
> >> i.e we can not make any change to " struct rte_event_port_conf"
> >>
> >> # We have a similar case with ethdev and it deferred to next release v20.11
> >> http://patches.dpdk.org/patch/69113/
> >>
> >> Regarding the API changes:
> >> # The slow path changes general looks good to me. I will review the
> >> next level in the coming days
> >> # The following fast path changes bothers to me. Could you share more
> >> details on below change?
> >>
> >> diff --git a/app/test-eventdev/test_order_atq.c
> >> b/app/test-eventdev/test_order_atq.c
> >> index 3366cfc..8246b96 100644
> >> --- a/app/test-eventdev/test_order_atq.c
> >> +++ b/app/test-eventdev/test_order_atq.c
> >> @@ -34,6 +34,8 @@
> >>                         continue;
> >>                 }
> >>
> >> +               ev.flow_id = ev.mbuf->udata64;
> >> +
> >> # Since RC1 is near, I am not sure how to accommodate the API changes
> >> now and sort out ABI stuffs.
> >> # Other concern is eventdev spec get bloated with versioning files
> >> just for ONE release as 20.11 will be OK to change the ABI.
> >> # While we discuss the API change, Please send deprecation notice for
> >> ABI change for 20.11,
> >> so that there is no ambiguity of this patch for the 20.11 release.
> >>
> >> Hello Jerin,
> >>
> >> Thank you for the review comments.
> >>
> >> With regard to your comments regarding the fast path flow_id change, the Intel
> >DLB hardware
> >> is not capable of transferring the flow_id as part of the event itself. We
> >therefore require a mechanism
> >> to accomplish this. What we have done to work around this is to require the
> >application to embed the flow_id
> >> within the data payload. The new flag, #define
> >RTE_EVENT_DEV_CAP_CARRY_FLOW_ID (1ULL << 9), can be used
> >> by applications to determine if they need to embed the flow_id, or if its
> >automatically propagated and present in the
> >> received event.
> >>
> >> What we should have done is to wrap the assignment with a conditional.
> >>
> >> if (!(device_capability_flags & RTE_EVENT_DEV_CAP_CARRY_FLOW_ID))
> >>         ev.flow_id = ev.mbuf->udata64;
> >
> >Two problems with this approach,
> >1) we are assuming mbuf udata64 field is available for DLB driver
> >2) It won't work with another adapter, eventdev has no dependency with mbuf
> >
>
> This snippet is not intended to suggest that udata64 always be used to store the flow ID, but as an example of how an application could do it. Some applications won’t need to carry the flow ID through; others can select an unused field in the event data (e.g. hash.rss or udata64 if using mbufs), or (worst-case) re-generate the flow ID in pipeline stages that require it.

OK.
>
> >Question:
> >1) In the case of DLB hardware, on dequeue(),  what HW returns? is it
> >only event pointer and not have any other metadata like schedule_type
> >etc.
> >
>
> The DLB device provides a 16B “queue entry” that consists of:
>
> *       8B event data
> *       Queue ID
> *       Priority
> *       Scheduling type
> *       19 bits of carried-through data
> *       Assorted error/debug/reserved bits that are set by the device (not carried-through)
>
>  For the carried-through 19b, we use 12b for event_type and sub_event_type.

I can only think of TWO options to help
1) Since the event pointer is always cache aligned, you could grab its 6
LSBs (2^6 = 64B) plus the 7 spare bits (19b - 12b) of the carried-through
structure (rough sketch below).
2) Have a separate mempool driver wrapping the existing drivers, i.e.
"event pointer" + or - some offset can hold any amount of custom data.


>
> >
> >>
> >> This would minimize/eliminate any performance impact due to the processor's
> >branch prediction logic.

I think, if we need to change the common fastpath, we had better make
it a template that creates the code at compile time, to have absolutely
zero overhead, and then select the variant at runtime.
See app/test-eventdev/test_order_atq.c, function worker_wrapper():
it _creates_ the worker at compile time and picks it based on runtime
capability.
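
Something along these lines (a sketch only, not the existing code; the
capability flag is the one proposed in this series):

#include <stdbool.h>
#include <rte_common.h>
#include <rte_pause.h>
#include <rte_eventdev.h>
#include <rte_mbuf.h>

static __rte_always_inline int
worker_tmpl(uint8_t dev_id, uint8_t port, const bool flow_id_cap)
{
	struct rte_event ev;

	/* dequeue loop trimmed to a single event for brevity */
	while (rte_event_dequeue_burst(dev_id, port, &ev, 1, 0) == 0)
		rte_pause();

	if (!flow_id_cap)		/* folded away per specialisation */
		ev.flow_id = ev.mbuf->udata64;

	/* ... rest of the worker: process/forward the event ... */
	return 0;
}

static int worker_hw_flow(uint8_t d, uint8_t p) { return worker_tmpl(d, p, true); }
static int worker_sw_flow(uint8_t d, uint8_t p) { return worker_tmpl(d, p, false); }

/* wrapper: query the capability once, then run the matching variant */
static int
worker_wrapper(uint8_t dev_id, uint8_t port)
{
	struct rte_event_dev_info info;

	rte_event_dev_info_get(dev_id, &info);
	return (info.event_dev_cap & RTE_EVENT_DEV_CAP_CARRY_FLOW_ID) ?
		worker_hw_flow(dev_id, port) : worker_sw_flow(dev_id, port);
}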



> >> The assignment then becomes in essence a NOOP for all event devices that are
> >capable of carrying the flow_id as part of the event payload itself.
> >>
> >> Thanks,
> >> Tim
> >>
> >>
> >>
> >> Thanks,
> >> Tim

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-30  4:21  0%       ` Jerin Jacob
@ 2020-06-30 15:37  0%         ` McDaniel, Timothy
  2020-06-30 15:57  0%           ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: McDaniel, Timothy @ 2020-06-30 15:37 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Ray Kinsella, Neil Horman, Jerin Jacob, Mattias Rönnblom,
	dpdk-dev, Eads, Gage, Van Haaren, Harry

>-----Original Message-----
>From: Jerin Jacob <jerinjacobk@gmail.com>
>Sent: Monday, June 29, 2020 11:21 PM
>To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
>Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>;
>Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
>Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
>
>On Tue, Jun 30, 2020 at 1:01 AM McDaniel, Timothy
><timothy.mcdaniel@intel.com> wrote:
>>
>> -----Original Message-----
>> From: Jerin Jacob <jerinjacobk@gmail.com>
>> Sent: Saturday, June 27, 2020 2:45 AM
>> To: McDaniel, Timothy <timothy.mcdaniel@intel.com>; Ray Kinsella
><mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>
>> Cc: Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
>> Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
>>
>> > +
>> > +/** Event port configuration structure */
>> > +struct rte_event_port_conf_v20 {
>> > +       int32_t new_event_threshold;
>> > +       /**< A backpressure threshold for new event enqueues on this port.
>> > +        * Use for *closed system* event dev where event capacity is limited,
>> > +        * and cannot exceed the capacity of the event dev.
>> > +        * Configuring ports with different thresholds can make higher priority
>> > +        * traffic less likely to  be backpressured.
>> > +        * For example, a port used to inject NIC Rx packets into the event dev
>> > +        * can have a lower threshold so as not to overwhelm the device,
>> > +        * while ports used for worker pools can have a higher threshold.
>> > +        * This value cannot exceed the *nb_events_limit*
>> > +        * which was previously supplied to rte_event_dev_configure().
>> > +        * This should be set to '-1' for *open system*.
>> > +        */
>> > +       uint16_t dequeue_depth;
>> > +       /**< Configure number of bulk dequeues for this event port.
>> > +        * This value cannot exceed the *nb_event_port_dequeue_depth*
>> > +        * which previously supplied to rte_event_dev_configure().
>> > +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE
>capable.
>> > +        */
>> > +       uint16_t enqueue_depth;
>> > +       /**< Configure number of bulk enqueues for this event port.
>> > +        * This value cannot exceed the *nb_event_port_enqueue_depth*
>> > +        * which previously supplied to rte_event_dev_configure().
>> > +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE
>capable.
>> > +        */
>> >         uint8_t disable_implicit_release;
>> >         /**< Configure the port not to release outstanding events in
>> >          * rte_event_dev_dequeue_burst(). If true, all events received through
>> > @@ -733,6 +911,14 @@ struct rte_event_port_conf {
>> >  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
>> >                                 struct rte_event_port_conf *port_conf);
>> >
>> > +int
>> > +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
>> > +                               struct rte_event_port_conf_v20 *port_conf);
>> > +
>> > +int
>> > +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
>> > +                                     struct rte_event_port_conf *port_conf);
>>
>> Hi Timothy,
>>
>> + ABI Maintainers (Ray, Neil)
>>
>> # As per my understanding, the structures can not be versioned, only
>> function can be versioned.
>> i.e we can not make any change to " struct rte_event_port_conf"
>>
>> # We have a similar case with ethdev and it deferred to next release v20.11
>> http://patches.dpdk.org/patch/69113/
>>
>> Regarding the API changes:
>> # The slow path changes general looks good to me. I will review the
>> next level in the coming days
>> # The following fast path changes bothers to me. Could you share more
>> details on below change?
>>
>> diff --git a/app/test-eventdev/test_order_atq.c
>> b/app/test-eventdev/test_order_atq.c
>> index 3366cfc..8246b96 100644
>> --- a/app/test-eventdev/test_order_atq.c
>> +++ b/app/test-eventdev/test_order_atq.c
>> @@ -34,6 +34,8 @@
>>                         continue;
>>                 }
>>
>> +               ev.flow_id = ev.mbuf->udata64;
>> +
>> # Since RC1 is near, I am not sure how to accommodate the API changes
>> now and sort out ABI stuffs.
>> # Other concern is eventdev spec get bloated with versioning files
>> just for ONE release as 20.11 will be OK to change the ABI.
>> # While we discuss the API change, Please send deprecation notice for
>> ABI change for 20.11,
>> so that there is no ambiguity of this patch for the 20.11 release.
>>
>> Hello Jerin,
>>
>> Thank you for the review comments.
>>
>> With regard to your comments regarding the fast path flow_id change, the Intel
>DLB hardware
>> is not capable of transferring the flow_id as part of the event itself. We
>therefore require a mechanism
>> to accomplish this. What we have done to work around this is to require the
>application to embed the flow_id
>> within the data payload. The new flag, #define
>RTE_EVENT_DEV_CAP_CARRY_FLOW_ID (1ULL << 9), can be used
>> by applications to determine if they need to embed the flow_id, or if its
>automatically propagated and present in the
>> received event.
>>
>> What we should have done is to wrap the assignment with a conditional.
>>
>> if (!(device_capability_flags & RTE_EVENT_DEV_CAP_CARRY_FLOW_ID))
>>         ev.flow_id = ev.mbuf->udata64;
>
>Two problems with this approach,
>1) we are assuming mbuf udata64 field is available for DLB driver
>2) It won't work with another adapter, eventdev has no dependency with mbuf
>

This snippet is not intended to suggest that udata64 always be used to store the flow ID, but as an example of how an application could do it. Some applications won’t need to carry the flow ID through; others can select an unused field in the event data (e.g. hash.rss or udata64 if using mbufs), or (worst-case) re-generate the flow ID in pipeline stages that require it.

>Question:
>1) In the case of DLB hardware, on dequeue(),  what HW returns? is it
>only event pointer and not have any other metadata like schedule_type
>etc.
>

The DLB device provides a 16B “queue entry” that consists of:

*	8B event data
*	Queue ID
*	Priority
*	Scheduling type
*	19 bits of carried-through data
*	Assorted error/debug/reserved bits that are set by the device (not carried-through)

 For the carried-through 19b, we use 12b for event_type and sub_event_type.
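
As a rough picture only (a made-up C struct to show the 16B budget; the
real ordering and widths of the non-data bits are not what is shown here):

struct dlb_qe_sketch {
	uint64_t data;			/* 8B event data */
	uint64_t qid        : 8;	/* queue ID (width illustrative) */
	uint64_t priority   : 3;	/* width illustrative */
	uint64_t sched_type : 2;
	uint64_t carried    : 19;	/* 12b event/sub_event type + 7b spare */
	uint64_t hw_bits    : 32;	/* error/debug/reserved set by the device */
};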

>
>>
>> This would minimize/eliminate any performance impact due to the processor's
>branch prediction logic.
>> The assignment then becomes in essence a NOOP for all event devices that are
>capable of carrying the flow_id as part of the event payload itself.
>>
>> Thanks,
>> Tim
>>
>>
>>
>> Thanks,
>> Tim

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-30 11:36  0%         ` Kinsella, Ray
@ 2020-06-30 12:14  0%           ` Jerin Jacob
  2020-07-02 15:21  0%             ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2020-06-30 12:14 UTC (permalink / raw)
  To: Kinsella, Ray
  Cc: Tim McDaniel, Neil Horman, Jerin Jacob, Mattias Rönnblom,
	dpdk-dev, Gage Eads, Van Haaren, Harry

On Tue, Jun 30, 2020 at 5:06 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>
>
>
> On 30/06/2020 12:30, Jerin Jacob wrote:
> > On Tue, Jun 30, 2020 at 4:52 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
> >>
> >>
> >>
> >> On 27/06/2020 08:44, Jerin Jacob wrote:
> >>>> +
> >>>> +/** Event port configuration structure */
> >>>> +struct rte_event_port_conf_v20 {
> >>>> +       int32_t new_event_threshold;
> >>>> +       /**< A backpressure threshold for new event enqueues on this port.
> >>>> +        * Use for *closed system* event dev where event capacity is limited,
> >>>> +        * and cannot exceed the capacity of the event dev.
> >>>> +        * Configuring ports with different thresholds can make higher priority
> >>>> +        * traffic less likely to  be backpressured.
> >>>> +        * For example, a port used to inject NIC Rx packets into the event dev
> >>>> +        * can have a lower threshold so as not to overwhelm the device,
> >>>> +        * while ports used for worker pools can have a higher threshold.
> >>>> +        * This value cannot exceed the *nb_events_limit*
> >>>> +        * which was previously supplied to rte_event_dev_configure().
> >>>> +        * This should be set to '-1' for *open system*.
> >>>> +        */
> >>>> +       uint16_t dequeue_depth;
> >>>> +       /**< Configure number of bulk dequeues for this event port.
> >>>> +        * This value cannot exceed the *nb_event_port_dequeue_depth*
> >>>> +        * which previously supplied to rte_event_dev_configure().
> >>>> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
> >>>> +        */
> >>>> +       uint16_t enqueue_depth;
> >>>> +       /**< Configure number of bulk enqueues for this event port.
> >>>> +        * This value cannot exceed the *nb_event_port_enqueue_depth*
> >>>> +        * which previously supplied to rte_event_dev_configure().
> >>>> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
> >>>> +        */
> >>>>         uint8_t disable_implicit_release;
> >>>>         /**< Configure the port not to release outstanding events in
> >>>>          * rte_event_dev_dequeue_burst(). If true, all events received through
> >>>> @@ -733,6 +911,14 @@ struct rte_event_port_conf {
> >>>>  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
> >>>>                                 struct rte_event_port_conf *port_conf);
> >>>>
> >>>> +int
> >>>> +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
> >>>> +                               struct rte_event_port_conf_v20 *port_conf);
> >>>> +
> >>>> +int
> >>>> +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
> >>>> +                                     struct rte_event_port_conf *port_conf);
> >>>
> >>> Hi Timothy,
> >>>
> >>> + ABI Maintainers (Ray, Neil)
> >>>
> >>> # As per my understanding, the structures can not be versioned, only
> >>> function can be versioned.
> >>> i.e we can not make any change to " struct rte_event_port_conf"
> >>
> >> So the answer is (as always): depends
> >>
> >> If the structure is being use in inline functions is when you run into trouble
> >> - as knowledge of the structure is embedded in the linked application.
> >>
> >> However if the structure is _strictly_ being used as a non-inlined function parameter,
> >> It can be safe to version in this way.
> >
> > But based on the optimization applied when building the consumer code
> > matters. Right?
> > i.e compiler can "inline" it, based on the optimization even the
> > source code explicitly mentions it.
>
> Well a compiler will typically only inline within the confines of a given object file, or
> binary, if LTO is enabled.

>
> If a function symbol is exported from library however, it won't be inlined in a linked application.

Yes, with respect to that function.
But the application can use struct rte_event_port_conf in its own code
and it can be part of other structures.
Right?


> The compiler doesn't have enough information to inline it.
> All the compiler will know about it is it's offset in memory, and it's signature.
>
> >
> >
> >>
> >> So just to be clear, it is still the function that is actually being versioned here.
> >>
> >>>
> >>> # We have a similar case with ethdev and it deferred to next release v20.11
> >>> http://patches.dpdk.org/patch/69113/
> >>
> >> Yes - I spent a why looking at this one, but I am struggling to recall,
> >> why when I looked it we didn't suggest function versioning as a potential solution in this case.
> >>
> >> Looking back at it now, looks like it would have been ok.
> >
> > Ok.
> >
> >>
> >>>
> >>> Regarding the API changes:
> >>> # The slow path changes general looks good to me. I will review the
> >>> next level in the coming days
> >>> # The following fast path changes bothers to me. Could you share more
> >>> details on below change?
> >>>
> >>> diff --git a/app/test-eventdev/test_order_atq.c
> >>> b/app/test-eventdev/test_order_atq.c
> >>> index 3366cfc..8246b96 100644
> >>> --- a/app/test-eventdev/test_order_atq.c
> >>> +++ b/app/test-eventdev/test_order_atq.c
> >>> @@ -34,6 +34,8 @@
> >>>                         continue;
> >>>                 }
> >>>
> >>> +               ev.flow_id = ev.mbuf->udata64;
> >>> +
> >>> # Since RC1 is near, I am not sure how to accommodate the API changes
> >>> now and sort out ABI stuffs.
> >>> # Other concern is eventdev spec get bloated with versioning files
> >>> just for ONE release as 20.11 will be OK to change the ABI.
> >>> # While we discuss the API change, Please send deprecation notice for
> >>> ABI change for 20.11,
> >>> so that there is no ambiguity of this patch for the 20.11 release.
> >>>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-30 11:30  0%       ` Jerin Jacob
@ 2020-06-30 11:36  0%         ` Kinsella, Ray
  2020-06-30 12:14  0%           ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Kinsella, Ray @ 2020-06-30 11:36 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Tim McDaniel, Neil Horman, Jerin Jacob, Mattias Rönnblom,
	dpdk-dev, Gage Eads, Van Haaren, Harry



On 30/06/2020 12:30, Jerin Jacob wrote:
> On Tue, Jun 30, 2020 at 4:52 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>>
>>
>>
>> On 27/06/2020 08:44, Jerin Jacob wrote:
>>>> +
>>>> +/** Event port configuration structure */
>>>> +struct rte_event_port_conf_v20 {
>>>> +       int32_t new_event_threshold;
>>>> +       /**< A backpressure threshold for new event enqueues on this port.
>>>> +        * Use for *closed system* event dev where event capacity is limited,
>>>> +        * and cannot exceed the capacity of the event dev.
>>>> +        * Configuring ports with different thresholds can make higher priority
>>>> +        * traffic less likely to  be backpressured.
>>>> +        * For example, a port used to inject NIC Rx packets into the event dev
>>>> +        * can have a lower threshold so as not to overwhelm the device,
>>>> +        * while ports used for worker pools can have a higher threshold.
>>>> +        * This value cannot exceed the *nb_events_limit*
>>>> +        * which was previously supplied to rte_event_dev_configure().
>>>> +        * This should be set to '-1' for *open system*.
>>>> +        */
>>>> +       uint16_t dequeue_depth;
>>>> +       /**< Configure number of bulk dequeues for this event port.
>>>> +        * This value cannot exceed the *nb_event_port_dequeue_depth*
>>>> +        * which previously supplied to rte_event_dev_configure().
>>>> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
>>>> +        */
>>>> +       uint16_t enqueue_depth;
>>>> +       /**< Configure number of bulk enqueues for this event port.
>>>> +        * This value cannot exceed the *nb_event_port_enqueue_depth*
>>>> +        * which previously supplied to rte_event_dev_configure().
>>>> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
>>>> +        */
>>>>         uint8_t disable_implicit_release;
>>>>         /**< Configure the port not to release outstanding events in
>>>>          * rte_event_dev_dequeue_burst(). If true, all events received through
>>>> @@ -733,6 +911,14 @@ struct rte_event_port_conf {
>>>>  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
>>>>                                 struct rte_event_port_conf *port_conf);
>>>>
>>>> +int
>>>> +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
>>>> +                               struct rte_event_port_conf_v20 *port_conf);
>>>> +
>>>> +int
>>>> +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
>>>> +                                     struct rte_event_port_conf *port_conf);
>>>
>>> Hi Timothy,
>>>
>>> + ABI Maintainers (Ray, Neil)
>>>
>>> # As per my understanding, the structures can not be versioned, only
>>> function can be versioned.
>>> i.e we can not make any change to " struct rte_event_port_conf"
>>
>> So the answer is (as always): depends
>>
>> If the structure is being use in inline functions is when you run into trouble
>> - as knowledge of the structure is embedded in the linked application.
>>
>> However if the structure is _strictly_ being used as a non-inlined function parameter,
>> It can be safe to version in this way.
> 
> But based on the optimization applied when building the consumer code
> matters. Right?
> i.e compiler can "inline" it, based on the optimization even the
> source code explicitly mentions it.

Well, a compiler will typically only inline within the confines of a given object file, or
across the whole binary if LTO is enabled.

If a function symbol is exported from a library, however, it won't be inlined in a linked application.
The compiler doesn't have enough information to inline it.
All the compiler will know about it is its offset in memory and its signature.

> 
> 
>>
>> So just to be clear, it is still the function that is actually being versioned here.
>>
>>>
>>> # We have a similar case with ethdev and it deferred to next release v20.11
>>> http://patches.dpdk.org/patch/69113/
>>
>> Yes - I spent a why looking at this one, but I am struggling to recall,
>> why when I looked it we didn't suggest function versioning as a potential solution in this case.
>>
>> Looking back at it now, looks like it would have been ok.
> 
> Ok.
> 
>>
>>>
>>> Regarding the API changes:
>>> # The slow path changes general looks good to me. I will review the
>>> next level in the coming days
>>> # The following fast path changes bothers to me. Could you share more
>>> details on below change?
>>>
>>> diff --git a/app/test-eventdev/test_order_atq.c
>>> b/app/test-eventdev/test_order_atq.c
>>> index 3366cfc..8246b96 100644
>>> --- a/app/test-eventdev/test_order_atq.c
>>> +++ b/app/test-eventdev/test_order_atq.c
>>> @@ -34,6 +34,8 @@
>>>                         continue;
>>>                 }
>>>
>>> +               ev.flow_id = ev.mbuf->udata64;
>>> +
>>> # Since RC1 is near, I am not sure how to accommodate the API changes
>>> now and sort out ABI stuffs.
>>> # Other concern is eventdev spec get bloated with versioning files
>>> just for ONE release as 20.11 will be OK to change the ABI.
>>> # While we discuss the API change, Please send deprecation notice for
>>> ABI change for 20.11,
>>> so that there is no ambiguity of this patch for the 20.11 release.
>>>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-30 11:22  0%     ` Kinsella, Ray
@ 2020-06-30 11:30  0%       ` Jerin Jacob
  2020-06-30 11:36  0%         ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2020-06-30 11:30 UTC (permalink / raw)
  To: Kinsella, Ray
  Cc: Tim McDaniel, Neil Horman, Jerin Jacob, Mattias Rönnblom,
	dpdk-dev, Gage Eads, Van Haaren, Harry

On Tue, Jun 30, 2020 at 4:52 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>
>
>
> On 27/06/2020 08:44, Jerin Jacob wrote:
> >> +
> >> +/** Event port configuration structure */
> >> +struct rte_event_port_conf_v20 {
> >> +       int32_t new_event_threshold;
> >> +       /**< A backpressure threshold for new event enqueues on this port.
> >> +        * Use for *closed system* event dev where event capacity is limited,
> >> +        * and cannot exceed the capacity of the event dev.
> >> +        * Configuring ports with different thresholds can make higher priority
> >> +        * traffic less likely to  be backpressured.
> >> +        * For example, a port used to inject NIC Rx packets into the event dev
> >> +        * can have a lower threshold so as not to overwhelm the device,
> >> +        * while ports used for worker pools can have a higher threshold.
> >> +        * This value cannot exceed the *nb_events_limit*
> >> +        * which was previously supplied to rte_event_dev_configure().
> >> +        * This should be set to '-1' for *open system*.
> >> +        */
> >> +       uint16_t dequeue_depth;
> >> +       /**< Configure number of bulk dequeues for this event port.
> >> +        * This value cannot exceed the *nb_event_port_dequeue_depth*
> >> +        * which previously supplied to rte_event_dev_configure().
> >> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
> >> +        */
> >> +       uint16_t enqueue_depth;
> >> +       /**< Configure number of bulk enqueues for this event port.
> >> +        * This value cannot exceed the *nb_event_port_enqueue_depth*
> >> +        * which previously supplied to rte_event_dev_configure().
> >> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
> >> +        */
> >>         uint8_t disable_implicit_release;
> >>         /**< Configure the port not to release outstanding events in
> >>          * rte_event_dev_dequeue_burst(). If true, all events received through
> >> @@ -733,6 +911,14 @@ struct rte_event_port_conf {
> >>  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
> >>                                 struct rte_event_port_conf *port_conf);
> >>
> >> +int
> >> +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
> >> +                               struct rte_event_port_conf_v20 *port_conf);
> >> +
> >> +int
> >> +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
> >> +                                     struct rte_event_port_conf *port_conf);
> >
> > Hi Timothy,
> >
> > + ABI Maintainers (Ray, Neil)
> >
> > # As per my understanding, the structures can not be versioned, only
> > function can be versioned.
> > i.e we can not make any change to " struct rte_event_port_conf"
>
> So the answer is (as always): depends
>
> If the structure is being use in inline functions is when you run into trouble
> - as knowledge of the structure is embedded in the linked application.
>
> However if the structure is _strictly_ being used as a non-inlined function parameter,
> It can be safe to version in this way.

But it depends on the optimization applied when building the consumer
code, right?
i.e. the compiler can "inline" it depending on the optimization level,
regardless of what the source code explicitly says.


>
> So just to be clear, it is still the function that is actually being versioned here.
>
> >
> > # We have a similar case with ethdev and it deferred to next release v20.11
> > http://patches.dpdk.org/patch/69113/
>
> Yes - I spent a why looking at this one, but I am struggling to recall,
> why when I looked it we didn't suggest function versioning as a potential solution in this case.
>
> Looking back at it now, looks like it would have been ok.

Ok.

>
> >
> > Regarding the API changes:
> > # The slow path changes general looks good to me. I will review the
> > next level in the coming days
> > # The following fast path changes bothers to me. Could you share more
> > details on below change?
> >
> > diff --git a/app/test-eventdev/test_order_atq.c
> > b/app/test-eventdev/test_order_atq.c
> > index 3366cfc..8246b96 100644
> > --- a/app/test-eventdev/test_order_atq.c
> > +++ b/app/test-eventdev/test_order_atq.c
> > @@ -34,6 +34,8 @@
> >                         continue;
> >                 }
> >
> > +               ev.flow_id = ev.mbuf->udata64;
> > +
> > # Since RC1 is near, I am not sure how to accommodate the API changes
> > now and sort out ABI stuffs.
> > # Other concern is eventdev spec get bloated with versioning files
> > just for ONE release as 20.11 will be OK to change the ABI.
> > # While we discuss the API change, Please send deprecation notice for
> > ABI change for 20.11,
> > so that there is no ambiguity of this patch for the 20.11 release.
> >

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-27  7:44  5%   ` Jerin Jacob
  2020-06-29 19:30  4%     ` McDaniel, Timothy
@ 2020-06-30 11:22  0%     ` Kinsella, Ray
  2020-06-30 11:30  0%       ` Jerin Jacob
  1 sibling, 1 reply; 200+ results
From: Kinsella, Ray @ 2020-06-30 11:22 UTC (permalink / raw)
  To: Jerin Jacob, Tim McDaniel, Neil Horman
  Cc: Jerin Jacob, Mattias Rönnblom, dpdk-dev, Gage Eads,
	Van Haaren, Harry



On 27/06/2020 08:44, Jerin Jacob wrote:
>> +
>> +/** Event port configuration structure */
>> +struct rte_event_port_conf_v20 {
>> +       int32_t new_event_threshold;
>> +       /**< A backpressure threshold for new event enqueues on this port.
>> +        * Use for *closed system* event dev where event capacity is limited,
>> +        * and cannot exceed the capacity of the event dev.
>> +        * Configuring ports with different thresholds can make higher priority
>> +        * traffic less likely to  be backpressured.
>> +        * For example, a port used to inject NIC Rx packets into the event dev
>> +        * can have a lower threshold so as not to overwhelm the device,
>> +        * while ports used for worker pools can have a higher threshold.
>> +        * This value cannot exceed the *nb_events_limit*
>> +        * which was previously supplied to rte_event_dev_configure().
>> +        * This should be set to '-1' for *open system*.
>> +        */
>> +       uint16_t dequeue_depth;
>> +       /**< Configure number of bulk dequeues for this event port.
>> +        * This value cannot exceed the *nb_event_port_dequeue_depth*
>> +        * which previously supplied to rte_event_dev_configure().
>> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
>> +        */
>> +       uint16_t enqueue_depth;
>> +       /**< Configure number of bulk enqueues for this event port.
>> +        * This value cannot exceed the *nb_event_port_enqueue_depth*
>> +        * which previously supplied to rte_event_dev_configure().
>> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
>> +        */
>>         uint8_t disable_implicit_release;
>>         /**< Configure the port not to release outstanding events in
>>          * rte_event_dev_dequeue_burst(). If true, all events received through
>> @@ -733,6 +911,14 @@ struct rte_event_port_conf {
>>  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
>>                                 struct rte_event_port_conf *port_conf);
>>
>> +int
>> +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
>> +                               struct rte_event_port_conf_v20 *port_conf);
>> +
>> +int
>> +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
>> +                                     struct rte_event_port_conf *port_conf);
> 
> Hi Timothy,
> 
> + ABI Maintainers (Ray, Neil)
> 
> # As per my understanding, the structures can not be versioned, only
> function can be versioned.
> i.e we can not make any change to " struct rte_event_port_conf"

So the answer is (as always): depends

If the structure is being used in inline functions, that is when you run into trouble
- as knowledge of the structure is embedded in the linked application.

However, if the structure is _strictly_ being used as a non-inlined function parameter,
it can be safe to version in this way.

So just to be clear, it is still the function that is actually being versioned here.
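
FWIW the wiring for that usually looks something like the below (sketch
only - the version node names here are assumptions, and the library has
to set use_function_versioning = true in its meson.build):

#include <rte_function_versioning.h>

/* keep serving binaries linked against the old ABI with the old layout */
VERSION_SYMBOL(rte_event_port_default_conf_get, _v20, 20.0);

/* make the new-layout implementation the default for newly linked binaries */
BIND_DEFAULT_SYMBOL(rte_event_port_default_conf_get, _v21, 21);
MAP_STATIC_SYMBOL(int rte_event_port_default_conf_get(uint8_t dev_id,
		uint8_t port_id, struct rte_event_port_conf *port_conf),
		rte_event_port_default_conf_get_v21);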

> 
> # We have a similar case with ethdev and it deferred to next release v20.11
> http://patches.dpdk.org/patch/69113/

Yes - I spent a while looking at this one, but I am struggling to recall
why, when I looked at it, we didn't suggest function versioning as a potential solution in this case.

Looking back at it now, looks like it would have been ok. 

> 
> Regarding the API changes:
> # The slow path changes general looks good to me. I will review the
> next level in the coming days
> # The following fast path changes bothers to me. Could you share more
> details on below change?
> 
> diff --git a/app/test-eventdev/test_order_atq.c
> b/app/test-eventdev/test_order_atq.c
> index 3366cfc..8246b96 100644
> --- a/app/test-eventdev/test_order_atq.c
> +++ b/app/test-eventdev/test_order_atq.c
> @@ -34,6 +34,8 @@
>                         continue;
>                 }
> 
> +               ev.flow_id = ev.mbuf->udata64;
> +
> # Since RC1 is near, I am not sure how to accommodate the API changes
> now and sort out ABI stuffs.
> # Other concern is eventdev spec get bloated with versioning files
> just for ONE release as 20.11 will be OK to change the ABI.
> # While we discuss the API change, Please send deprecation notice for
> ABI change for 20.11,
> so that there is no ambiguity of this patch for the 20.11 release.
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v5 1/3] lib/lpm: integrate RCU QSBR
  @ 2020-06-30 10:35  3%         ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2020-06-30 10:35 UTC (permalink / raw)
  To: Bruce Richardson, David Marchand
  Cc: Ruifeng Wang, Vladimir Medvedkin, John McNamara, Marko Kovacevic,
	Neil Horman, dev, Ananyev, Konstantin, Honnappa Nagarahalli, nd



On 29/06/2020 13:55, Bruce Richardson wrote:
> On Mon, Jun 29, 2020 at 01:56:07PM +0200, David Marchand wrote:
>> On Mon, Jun 29, 2020 at 10:03 AM Ruifeng Wang <ruifeng.wang@arm.com> wrote:
>>>
>>> Currently, the tbl8 group is freed even though the readers might be
>>> using the tbl8 group entries. The freed tbl8 group can be reallocated
>>> quickly. This results in incorrect lookup results.
>>>
>>> RCU QSBR process is integrated for safe tbl8 group reclaim.
>>> Refer to RCU documentation to understand various aspects of
>>> integrating RCU library into other libraries.
>>>
>>> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>>> ---
>>>  doc/guides/prog_guide/lpm_lib.rst  |  32 +++++++
>>>  lib/librte_lpm/Makefile            |   2 +-
>>>  lib/librte_lpm/meson.build         |   1 +
>>>  lib/librte_lpm/rte_lpm.c           | 129 ++++++++++++++++++++++++++---
>>>  lib/librte_lpm/rte_lpm.h           |  59 +++++++++++++
>>>  lib/librte_lpm/rte_lpm_version.map |   6 ++
>>>  6 files changed, 216 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/doc/guides/prog_guide/lpm_lib.rst b/doc/guides/prog_guide/lpm_lib.rst
>>> index 1609a57d0..7cc99044a 100644
>>> --- a/doc/guides/prog_guide/lpm_lib.rst
>>> +++ b/doc/guides/prog_guide/lpm_lib.rst
>>> @@ -145,6 +145,38 @@ depending on whether we need to move to the next table or not.
>>>  Prefix expansion is one of the keys of this algorithm,
>>>  since it improves the speed dramatically by adding redundancy.
>>>
>>> +Deletion
>>> +~~~~~~~~
>>> +
>>> +When deleting a rule, a replacement rule is searched for. Replacement rule is an existing rule that has
>>> +the longest prefix match with the rule to be deleted, but has smaller depth.
>>> +
>>> +If a replacement rule is found, target tbl24 and tbl8 entries are updated to have the same depth and next hop
>>> +value with the replacement rule.
>>> +
>>> +If no replacement rule can be found, target tbl24 and tbl8 entries will be cleared.
>>> +
>>> +Prefix expansion is performed if the rule's depth is not exactly 24 bits or 32 bits.
>>> +
>>> +After deleting a rule, a group of tbl8s that belongs to the same tbl24 entry are freed in following cases:
>>> +
>>> +*   All tbl8s in the group are empty .
>>> +
>>> +*   All tbl8s in the group have the same values and with depth no greater than 24.
>>> +
>>> +Free of tbl8s have different behaviors:
>>> +
>>> +*   If RCU is not used, tbl8s are cleared and reclaimed immediately.
>>> +
>>> +*   If RCU is used, tbl8s are reclaimed when readers are in quiescent state.
>>> +
>>> +When the LPM is not using RCU, tbl8 group can be freed immediately even though the readers might be using
>>> +the tbl8 group entries. This might result in incorrect lookup results.
>>> +
>>> +RCU QSBR process is integrated for safe tbl8 group reclaimation. Application has certain responsibilities
>>> +while using this feature. Please refer to resource reclaimation framework of :ref:`RCU library <RCU_Library>`
>>> +for more details.
>>> +
>>
>> Would the lpm6 library benefit from the same?
>> Asking as I do not see much code shared between lpm and lpm6.
>>
>> [...]
>>
>>> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
>>> index 38ab512a4..41e9c49b8 100644
>>> --- a/lib/librte_lpm/rte_lpm.c
>>> +++ b/lib/librte_lpm/rte_lpm.c
>>> @@ -1,5 +1,6 @@
>>>  /* SPDX-License-Identifier: BSD-3-Clause
>>>   * Copyright(c) 2010-2014 Intel Corporation
>>> + * Copyright(c) 2020 Arm Limited
>>>   */
>>>
>>>  #include <string.h>
>>> @@ -245,13 +246,84 @@ rte_lpm_free(struct rte_lpm *lpm)
>>>                 TAILQ_REMOVE(lpm_list, te, next);
>>>
>>>         rte_mcfg_tailq_write_unlock();
>>> -
>>> +#ifdef ALLOW_EXPERIMENTAL_API
>>> +       if (lpm->dq)
>>> +               rte_rcu_qsbr_dq_delete(lpm->dq);
>>> +#endif
>>
>> All DPDK code under lib/ is compiled with the ALLOW_EXPERIMENTAL_API flag set.
>> There is no need to protect against this flag in rte_lpm.c.
>>
>> [...]
>>
>>> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
>>> index b9d49ac87..7889f21b3 100644
>>> --- a/lib/librte_lpm/rte_lpm.h
>>> +++ b/lib/librte_lpm/rte_lpm.h
>>
>>> @@ -130,6 +143,28 @@ struct rte_lpm {
>>>                         __rte_cache_aligned; /**< LPM tbl24 table. */
>>>         struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
>>>         struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
>>> +#ifdef ALLOW_EXPERIMENTAL_API
>>> +       /* RCU config. */
>>> +       struct rte_rcu_qsbr *v;         /* RCU QSBR variable. */
>>> +       enum rte_lpm_qsbr_mode rcu_mode;/* Blocking, defer queue. */
>>> +       struct rte_rcu_qsbr_dq *dq;     /* RCU QSBR defer queue. */
>>> +#endif
>>> +};
>>
>> This is more a comment/question for the lpm maintainers.
>>
>> Afaics, the rte_lpm structure is exported/public because of lookup
>> which is inlined.
>> But most of the structure can be hidden and stored in a private
>> structure that would embed the exposed rte_lpm.
>> The slowpath functions would only have to translate from publicly
>> exposed to internal representation (via container_of).
>>
>> This patch could do this and be the first step to hide the unneeded
>> exposure of other fields (later/in 20.11 ?).
>>
>> Thoughts?
>>
> Hiding as much of the structures as possible is always a good idea, so if
> that is possible in this patchset I would support such a move.
> 
> /Bruce
> 

Agreed - I acked the change as it doesn't break ABI compatibility.
Bruce and David's comments still hold for 20.11+. 

Ray K

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 20.11] eal: simplify exit functions
  2020-06-24  9:36  3% [dpdk-dev] [PATCH 20.11] eal: simplify exit functions Thomas Monjalon
@ 2020-06-30 10:26  0% ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2020-06-30 10:26 UTC (permalink / raw)
  To: Thomas Monjalon, dev
  Cc: david.marchand, bruce.richardson, John McNamara, Marko Kovacevic,
	Neil Horman


On 24/06/2020 10:36, Thomas Monjalon wrote:
> The option RTE_EAL_ALWAYS_PANIC_ON_ERROR was off by default,
> and not customizable with meson. It is completely removed.
> 
> The function rte_dump_registers is a trace of the bare metal support
> era, and was not supported in userland. It is completely removed.
> 
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---
> Because the empty function rte_dump_registers is part of the ABI,
> this change is planned for DPDK 20.11.
> ---
>  app/test/test_debug.c                    |  3 ---
>  config/common_base                       |  1 -
>  doc/guides/howto/debug_troubleshoot.rst  |  2 +-
>  lib/librte_eal/common/eal_common_debug.c | 17 +----------------
>  lib/librte_eal/include/rte_debug.h       |  7 -------
>  lib/librte_eal/rte_eal_version.map       |  1 -
>  6 files changed, 2 insertions(+), 29 deletions(-)
> 
> diff --git a/app/test/test_debug.c b/app/test/test_debug.c
> index 25eab97e2a..834a7386f5 100644
> --- a/app/test/test_debug.c
> +++ b/app/test/test_debug.c
> @@ -66,13 +66,11 @@ test_exit_val(int exit_val)
>  	}
>  	wait(&status);
>  	printf("Child process status: %d\n", status);
> -#ifndef RTE_EAL_ALWAYS_PANIC_ON_ERROR
>  	if(!WIFEXITED(status) || WEXITSTATUS(status) != (uint8_t)exit_val){
>  		printf("Child process terminated with incorrect status (expected = %d)!\n",
>  				exit_val);
>  		return -1;
>  	}
> -#endif
>  	return 0;
>  }
>  
> @@ -113,7 +111,6 @@ static int
>  test_debug(void)
>  {
>  	rte_dump_stack();
> -	rte_dump_registers();
>  	if (test_panic() < 0)
>  		return -1;
>  	if (test_exit() < 0)
> diff --git a/config/common_base b/config/common_base
> index c7d5c73215..42ad399b17 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -103,7 +103,6 @@ CONFIG_RTE_ENABLE_TRACE_FP=n
>  CONFIG_RTE_LOG_HISTORY=256
>  CONFIG_RTE_BACKTRACE=y
>  CONFIG_RTE_LIBEAL_USE_HPET=n
> -CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
>  CONFIG_RTE_EAL_IGB_UIO=n
>  CONFIG_RTE_EAL_VFIO=n
>  CONFIG_RTE_MAX_VFIO_GROUPS=64
> diff --git a/doc/guides/howto/debug_troubleshoot.rst b/doc/guides/howto/debug_troubleshoot.rst
> index cef016b2fe..1ed8be5a04 100644
> --- a/doc/guides/howto/debug_troubleshoot.rst
> +++ b/doc/guides/howto/debug_troubleshoot.rst
> @@ -313,7 +313,7 @@ Custom worker function :numref:`dtg_distributor_worker`.
>     * For high-performance execution logic ensure running it on correct NUMA
>       and non-master core.
>  
> -   * Analyze run logic with ``rte_dump_stack``, ``rte_dump_registers`` and
> +   * Analyze run logic with ``rte_dump_stack`` and
>       ``rte_memdump`` for more insights.
>  
>     * Make use of objdump to ensure opcode is matching to the desired state.
> diff --git a/lib/librte_eal/common/eal_common_debug.c b/lib/librte_eal/common/eal_common_debug.c
> index 722468754d..15418e957f 100644
> --- a/lib/librte_eal/common/eal_common_debug.c
> +++ b/lib/librte_eal/common/eal_common_debug.c
> @@ -7,14 +7,6 @@
>  #include <rte_log.h>
>  #include <rte_debug.h>
>  
> -/* not implemented */
> -void
> -rte_dump_registers(void)
> -{
> -	return;
> -}
> -
> -/* call abort(), it will generate a coredump if enabled */
>  void
>  __rte_panic(const char *funcname, const char *format, ...)
>  {
> @@ -25,8 +17,7 @@ __rte_panic(const char *funcname, const char *format, ...)
>  	rte_vlog(RTE_LOG_CRIT, RTE_LOGTYPE_EAL, format, ap);
>  	va_end(ap);
>  	rte_dump_stack();
> -	rte_dump_registers();
> -	abort();
> +	abort(); /* generate a coredump if enabled */
>  }
>  
>  /*
> @@ -46,14 +37,8 @@ rte_exit(int exit_code, const char *format, ...)
>  	rte_vlog(RTE_LOG_CRIT, RTE_LOGTYPE_EAL, format, ap);
>  	va_end(ap);
>  
> -#ifndef RTE_EAL_ALWAYS_PANIC_ON_ERROR
>  	if (rte_eal_cleanup() != 0)
>  		RTE_LOG(CRIT, EAL,
>  			"EAL could not release all resources\n");
>  	exit(exit_code);
> -#else
> -	rte_dump_stack();
> -	rte_dump_registers();
> -	abort();
> -#endif
>  }
> diff --git a/lib/librte_eal/include/rte_debug.h b/lib/librte_eal/include/rte_debug.h
> index 50052c5a90..c4bc71ce28 100644
> --- a/lib/librte_eal/include/rte_debug.h
> +++ b/lib/librte_eal/include/rte_debug.h
> @@ -26,13 +26,6 @@ extern "C" {
>   */
>  void rte_dump_stack(void);
>  
> -/**
> - * Dump the registers of the calling core to the console.
> - *
> - * Note: Not implemented in a userapp environment; use gdb instead.
> - */
> -void rte_dump_registers(void);
> -
>  /**
>   * Provide notification of a critical non-recoverable error and terminate
>   * execution abnormally.
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index 196eef5afa..3f36e46b3b 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -37,7 +37,6 @@ DPDK_20.0 {
>  	rte_devargs_remove;
>  	rte_devargs_type_count;
>  	rte_dump_physmem_layout;
> -	rte_dump_registers;
>  	rte_dump_stack;
>  	rte_dump_tailq;
>  	rte_eal_alarm_cancel;
> 

Acked-by: Ray Kinsella <mdr@ashroe.eu>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4 4/9] eal: introduce thread uninit helper
  2020-06-26 14:47  3%   ` [dpdk-dev] [PATCH v4 4/9] eal: introduce thread uninit helper David Marchand
  2020-06-26 15:00  0%     ` Jerin Jacob
  2020-06-29  8:59  0%     ` [dpdk-dev] [EXT] " Sunil Kumar Kori
@ 2020-06-30  9:42  0%     ` Olivier Matz
  2 siblings, 0 replies; 200+ results
From: Olivier Matz @ 2020-06-30  9:42 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, jerinjacobk, bruce.richardson, mdr, thomas, arybchenko,
	ktraynor, ian.stokes, i.maximets, Jerin Jacob, Sunil Kumar Kori,
	Neil Horman, Harini Ramakrishnan, Omar Cardona, Pallavi Kadam,
	Ranjit Menon

On Fri, Jun 26, 2020 at 04:47:31PM +0200, David Marchand wrote:
> This is a preparation step for dynamically unregistering threads.
> 
> Since we explicitly allocate a per thread trace buffer in
> rte_thread_init, add an internal helper to free this buffer.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
> Note: I preferred renaming the current internal function to free all
> threads trace buffers (new name trace_mem_free()) and reuse the previous
> name (trace_mem_per_thread_free()) when freeing this buffer for a given
> thread.
> 
> Changes since v2:
> - added missing stub for windows tracing support,
> - moved free symbol to exported (experimental) ABI as a counterpart of
>   the alloc symbol we already had,
> 
> Changes since v1:
> - rebased on master, removed Windows workaround wrt traces support,
> 
> ---
>  lib/librte_eal/common/eal_common_thread.c |  9 ++++
>  lib/librte_eal/common/eal_common_trace.c  | 51 +++++++++++++++++++----
>  lib/librte_eal/common/eal_thread.h        |  5 +++
>  lib/librte_eal/common/eal_trace.h         |  2 +-
>  lib/librte_eal/include/rte_trace_point.h  |  9 ++++
>  lib/librte_eal/rte_eal_version.map        |  3 ++
>  lib/librte_eal/windows/eal.c              |  5 +++
>  7 files changed, 75 insertions(+), 9 deletions(-)

[...]

> diff --git a/lib/librte_eal/common/eal_common_trace.c b/lib/librte_eal/common/eal_common_trace.c
> index 875553d7e5..3e620d76ed 100644
> --- a/lib/librte_eal/common/eal_common_trace.c
> +++ b/lib/librte_eal/common/eal_common_trace.c
> @@ -101,7 +101,7 @@ eal_trace_fini(void)
>  {
>  	if (!rte_trace_is_enabled())
>  		return;
> -	trace_mem_per_thread_free();
> +	trace_mem_free();
>  	trace_metadata_destroy();
>  	eal_trace_args_free();
>  }
> @@ -370,24 +370,59 @@ __rte_trace_mem_per_thread_alloc(void)
>  	rte_spinlock_unlock(&trace->lock);
>  }
>  
> +static void
> +trace_mem_per_thread_free_unlocked(struct thread_mem_meta *meta)
> +{
> +	if (meta->area == TRACE_AREA_HUGEPAGE)
> +		eal_free_no_trace(meta->mem);
> +	else if (meta->area == TRACE_AREA_HEAP)
> +		free(meta->mem);
> +}
> +
> +void
> +__rte_trace_mem_per_thread_free(void)
> +{
> +	struct trace *trace = trace_obj_get();
> +	struct __rte_trace_header *header;
> +	uint32_t count;
> +
> +	if (RTE_PER_LCORE(trace_mem) == NULL)
> +		return;
> +
> +	header = RTE_PER_LCORE(trace_mem);

nit:

	header = RTE_PER_LCORE(trace_mem);
	if (header == NULL)
		return;

[...]

> diff --git a/lib/librte_eal/include/rte_trace_point.h b/lib/librte_eal/include/rte_trace_point.h
> index 377c2414aa..686b86fdb1 100644
> --- a/lib/librte_eal/include/rte_trace_point.h
> +++ b/lib/librte_eal/include/rte_trace_point.h
> @@ -230,6 +230,15 @@ __rte_trace_point_fp_is_enabled(void)
>  __rte_experimental
>  void __rte_trace_mem_per_thread_alloc(void);
>  
> +/**
> + * @internal
> + *
> + * Free trace memory buffer per thread.
> + *
> + */
> +__rte_experimental
> +void __rte_trace_mem_per_thread_free(void);

Maybe the doc comment could be reworded a bit
(and the empty line can be removed by the way).

> +
>  /**
>   * @internal
>   *
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index 0d42d44ce9..5831eea4b0 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -393,6 +393,9 @@ EXPERIMENTAL {
>  	rte_trace_point_lookup;
>  	rte_trace_regexp;
>  	rte_trace_save;
> +
> +	# added in 20.08
> +	__rte_trace_mem_per_thread_free;

Is it really needed to export this function?


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4 2/9] eal: fix multiple definition of per lcore thread id
  2020-06-26 14:47  3%   ` [dpdk-dev] [PATCH v4 2/9] eal: fix multiple definition of per lcore thread id David Marchand
@ 2020-06-30  9:34  0%     ` Olivier Matz
  0 siblings, 0 replies; 200+ results
From: Olivier Matz @ 2020-06-30  9:34 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, jerinjacobk, bruce.richardson, mdr, thomas, arybchenko,
	ktraynor, ian.stokes, i.maximets, Neil Horman, Cunming Liang,
	Konstantin Ananyev

On Fri, Jun 26, 2020 at 04:47:29PM +0200, David Marchand wrote:
> Because of the inline accessor + static declaration in rte_gettid(),
> we end up with multiple symbols for RTE_PER_LCORE(_thread_id).
> Each compilation unit will pay a cost when accessing this information
> for the first time.
> 
> $ nm build/app/dpdk-testpmd | grep per_lcore__thread_id
> 0000000000000054 d per_lcore__thread_id.5037
> 0000000000000040 d per_lcore__thread_id.5103
> 0000000000000048 d per_lcore__thread_id.5259
> 000000000000004c d per_lcore__thread_id.5259
> 0000000000000044 d per_lcore__thread_id.5933
> 0000000000000058 d per_lcore__thread_id.6261
> 0000000000000050 d per_lcore__thread_id.7378
> 000000000000005c d per_lcore__thread_id.7496
> 000000000000000c d per_lcore__thread_id.8016
> 0000000000000010 d per_lcore__thread_id.8431
> 
> Make it global as part of the DPDK_21 stable ABI.
> 
> Fixes: ef76436c6834 ("eal: get unique thread id")
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> Acked-by: Ray Kinsella <mdr@ashroe.eu>

Reviewed-by: Olivier Matz <olivier.matz@6wind.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [20.11, PATCH] bbdev: remove experimental tag from API
  2020-06-30  7:30  4% ` David Marchand
@ 2020-06-30  7:35  3%   ` Akhil Goyal
  2020-07-02 17:54  0%     ` Akhil Goyal
  0 siblings, 1 reply; 200+ results
From: Akhil Goyal @ 2020-06-30  7:35 UTC (permalink / raw)
  To: David Marchand, Nicolas Chautru; +Cc: dev, Thomas Monjalon


> 
> Hello Nicolas,
> 
> On Sat, Jun 27, 2020 at 1:14 AM Nicolas Chautru
> <nicolas.chautru@intel.com> wrote:
> >
> > Planning to move bbdev API to stable from 20.11 (ABI version 21)
> > and remove experimental tag.
> > Sending now to advertise and get any feedback.
> > Some manual rebase will be required later on notably as the
> > actual release note which is not there yet.
> 
> Cool that we want to stabilize this API.
> My concern is that we have drivers from a single vendor.
> I would hate to see a new vendor unable to submit a driver (or having
> to wait until the next ABI breakage window) because of the current
> API/ABI.
> 
> 

+1 from my side. I am not sure how acceptable it is for all the vendors/customers.
It has not been reviewed by most of the vendors who may support it in the future.
It is not good to remove the experimental tag as we have a long 1-year cycle to break the API/ABI.

Regards,
Akhil

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [dpdk-announce] DPDK Userspace CFP now open; help celebrate 10 years of DPDK
@ 2020-06-29 22:36  3% Jill Lovato
  0 siblings, 0 replies; 200+ results
From: Jill Lovato @ 2020-06-29 22:36 UTC (permalink / raw)
  To: announce

DPDK Community,

We are moving forward with a virtual experience for DPDK Userspace
<https://events.linuxfoundation.org/dpdk-userspace-summit/program/cfp/#overview>
this year, happening September 22-23. The Call for Proposals is now open
<https://events.linuxfoundation.org/dpdk-userspace-summit/program/cfp/>
and will
close on July 12. Please plan to join us and get your submissions in
quickly. As usual, we are looking for presentations that showcase the
following:

-- Enhancements and additions to DPDK libraries, functional or
performance-wise
-- New networking technologies and their applicability to DPDK
-- Hardware NIC capabilities and offloads
-- Hardware datapath accelerators (compression, crypto, baseband, GPU,
regex, etc)
-- Virtualization and container networking
-- Debug tooling (tracing, dumps, metrics, telemetry, monitoring)
-- DPDK consumability (API/ABI compatibility, OS integration, packaging)
-- Project infrastructure, security, testing and workflow
-- Developer stories, technical challenges when integrating or developing
with DPDK
-- Feedback from usage and deployment of DPDK applications (OSS or
proprietary)

Separately, DPDK celebrates 10 years as a project during 2020! We are
working to create a virtual yearbook and would love to hear about your
favorite moments from DPDK over the years. Please take a few moments to
share your thoughts via this brief Google form (folks in China, please
email pr@dpdk.org and we will send you a Word version):

Please also check out the latest blog post, which includes updates from the
Governing and Tech Boards:


Many thanks,
Jill



*Jill Lovato*
Senior PR Manager
The Linux Foundation
jlovato@linuxfoundation.org
Phone: +1.503.703.8268

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [20.11, PATCH] bbdev: remove experimental tag from API
  2020-06-26 23:14  3% [dpdk-dev] [20.11, PATCH] bbdev: remove experimental tag from API Nicolas Chautru
  2020-06-26 23:14  3% ` Nicolas Chautru
@ 2020-06-30  7:30  4% ` David Marchand
  2020-06-30  7:35  3%   ` Akhil Goyal
  1 sibling, 1 reply; 200+ results
From: David Marchand @ 2020-06-30  7:30 UTC (permalink / raw)
  To: Nicolas Chautru; +Cc: dev, Thomas Monjalon, Akhil Goyal

Hello Nicolas,

On Sat, Jun 27, 2020 at 1:14 AM Nicolas Chautru
<nicolas.chautru@intel.com> wrote:
>
> Planning to move bbdev API to stable from 20.11 (ABI version 21)
> and remove experimental tag.
> Sending now to advertise and get any feedback.
> Some manual rebase will be required later on notably as the
> actual release note which is not there yet.

Cool that we want to stabilize this API.
My concern is that we have drivers from a single vendor.
I would hate to see a new vendor unable to submit a driver (or having
to wait until the next ABI breakage window) because of the current
API/ABI.


-- 
David Marchand


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-29 19:30  4%     ` McDaniel, Timothy
@ 2020-06-30  4:21  0%       ` Jerin Jacob
  2020-06-30 15:37  0%         ` McDaniel, Timothy
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2020-06-30  4:21 UTC (permalink / raw)
  To: McDaniel, Timothy
  Cc: Ray Kinsella, Neil Horman, Jerin Jacob, Mattias Rönnblom,
	dpdk-dev, Eads, Gage, Van Haaren, Harry

On Tue, Jun 30, 2020 at 1:01 AM McDaniel, Timothy
<timothy.mcdaniel@intel.com> wrote:
>
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Saturday, June 27, 2020 2:45 AM
> To: McDaniel, Timothy <timothy.mcdaniel@intel.com>; Ray Kinsella <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>
> Cc: Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom <mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage <gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
> Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
>
> > +
> > +/** Event port configuration structure */
> > +struct rte_event_port_conf_v20 {
> > +       int32_t new_event_threshold;
> > +       /**< A backpressure threshold for new event enqueues on this port.
> > +        * Use for *closed system* event dev where event capacity is limited,
> > +        * and cannot exceed the capacity of the event dev.
> > +        * Configuring ports with different thresholds can make higher priority
> > +        * traffic less likely to  be backpressured.
> > +        * For example, a port used to inject NIC Rx packets into the event dev
> > +        * can have a lower threshold so as not to overwhelm the device,
> > +        * while ports used for worker pools can have a higher threshold.
> > +        * This value cannot exceed the *nb_events_limit*
> > +        * which was previously supplied to rte_event_dev_configure().
> > +        * This should be set to '-1' for *open system*.
> > +        */
> > +       uint16_t dequeue_depth;
> > +       /**< Configure number of bulk dequeues for this event port.
> > +        * This value cannot exceed the *nb_event_port_dequeue_depth*
> > +        * which previously supplied to rte_event_dev_configure().
> > +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
> > +        */
> > +       uint16_t enqueue_depth;
> > +       /**< Configure number of bulk enqueues for this event port.
> > +        * This value cannot exceed the *nb_event_port_enqueue_depth*
> > +        * which previously supplied to rte_event_dev_configure().
> > +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
> > +        */
> >         uint8_t disable_implicit_release;
> >         /**< Configure the port not to release outstanding events in
> >          * rte_event_dev_dequeue_burst(). If true, all events received through
> > @@ -733,6 +911,14 @@ struct rte_event_port_conf {
> >  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
> >                                 struct rte_event_port_conf *port_conf);
> >
> > +int
> > +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
> > +                               struct rte_event_port_conf_v20 *port_conf);
> > +
> > +int
> > +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
> > +                                     struct rte_event_port_conf *port_conf);
>
> Hi Timothy,
>
> + ABI Maintainers (Ray, Neil)
>
> # As per my understanding, the structures can not be versioned, only
> function can be versioned.
> i.e we can not make any change to " struct rte_event_port_conf"
>
> # We have a similar case with ethdev and it deferred to next release v20.11
> http://patches.dpdk.org/patch/69113/
>
> Regarding the API changes:
> # The slow path changes general looks good to me. I will review the
> next level in the coming days
> # The following fast path changes bothers to me. Could you share more
> details on below change?
>
> diff --git a/app/test-eventdev/test_order_atq.c
> b/app/test-eventdev/test_order_atq.c
> index 3366cfc..8246b96 100644
> --- a/app/test-eventdev/test_order_atq.c
> +++ b/app/test-eventdev/test_order_atq.c
> @@ -34,6 +34,8 @@
>                         continue;
>                 }
>
> +               ev.flow_id = ev.mbuf->udata64;
> +
> # Since RC1 is near, I am not sure how to accommodate the API changes
> now and sort out ABI stuffs.
> # Other concern is eventdev spec get bloated with versioning files
> just for ONE release as 20.11 will be OK to change the ABI.
> # While we discuss the API change, Please send deprecation notice for
> ABI change for 20.11,
> so that there is no ambiguity of this patch for the 20.11 release.
>
> Hello Jerin,
>
> Thank you for the review comments.
>
> With regard to your comments regarding the fast path flow_id change, the Intel DLB hardware
> is not capable of transferring the flow_id as part of the event itself. We therefore require a mechanism
> to accomplish this. What we have done to work around this is to require the application to embed the flow_id
> within the data payload. The new flag, #define RTE_EVENT_DEV_CAP_CARRY_FLOW_ID (1ULL << 9), can be used
> by applications to determine if they need to embed the flow_id, or if its automatically propagated and present in the
> received event.
>
> What we should have done is to wrap the assignment with a conditional.
>
> if (!(device_capability_flags & RTE_EVENT_DEV_CAP_CARRY_FLOW_ID))
>         ev.flow_id = ev.mbuf->udata64;

Two problems with this approach,
1) we are assuming mbuf udata64 field is available for DLB driver
2) It won't work with another adapter, eventdev has no dependency with mbuf

Question:
1) In the case of DLB hardware, on dequeue(), what does the HW return? Is it
only the event pointer, without any other metadata like schedule_type,
etc.?


>
> This would minimize/eliminate any performance impact due to the processor's branch prediction logic.
> The assignment then becomes in essence a NOOP for all event devices that are capable of carrying the flow_id as part of the event payload itself.
>
> Thanks,
> Tim
>
>
>
> Thanks,
> Tim

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 6/7] cmdline: support Windows
  2020-06-29  8:12  0%         ` Tal Shnaiderman
@ 2020-06-29 23:56  0%           ` Dmitry Kozlyuk
  2020-07-08  1:09  0%             ` Dmitry Kozlyuk
  0 siblings, 1 reply; 200+ results
From: Dmitry Kozlyuk @ 2020-06-29 23:56 UTC (permalink / raw)
  To: Tal Shnaiderman
  Cc: Ranjit Menon, Fady Bader, dev, Dmitry Malloy,
	Narcisa Ana Maria Vasile, Thomas Monjalon, Olivier Matz

On Mon, 29 Jun 2020 08:12:51 +0000, Tal Shnaiderman wrote:
> > From: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> > Subject: Re: [dpdk-dev] [PATCH 6/7] cmdline: support Windows
> > 
> > On Sun, 28 Jun 2020 23:23:11 -0700, Ranjit Menon wrote:  
> > > On 6/28/2020 7:20 AM, Fady Bader wrote:  
> > > > Hi Dmitry,
> > > > I'm trying to run test-pmd on Windows and I ran into this error with  
> > cmdline.  
> > > >
> > > > The error log message is :
> > > > In file included from ../app/test-pmd/cmdline_flow.c:23:
> > > > ..\lib\librte_cmdline/cmdline_parse_num.h:24:2: error: 'INT64'  
> > redeclared as different kind of symbol  
> > > >    INT64
> > > >
> > > > In file included from C:/mingw-w64/x86_64/mingw64/x86_64-w64-  
> > mingw32/include/winnt.h:150,  
> > > >                   from C:/mingw-w64/x86_64/mingw64/x86_64-w64-  
> > mingw32/include/minwindef.h:163,  
> > > >                   from C:/mingw-w64/x86_64/mingw64/x86_64-w64-  
> > mingw32/include/windef.h:8,  
> > > >                   from C:/mingw-w64/x86_64/mingw64/x86_64-w64-  
> > mingw32/include/windows.h:69,  
> > > >                   from ..\lib/librte_eal/windows/include/rte_windows.h:22,
> > > >                   from ..\lib/librte_eal/windows/include/pthread.h:20,
> > > >                   from ..\lib/librte_eal/include/rte_per_lcore.h:25,
> > > >                   from ..\lib/librte_eal/include/rte_errno.h:18,
> > > >                   from ..\lib\librte_ethdev/rte_ethdev.h:156,
> > > >                   from ../app/test-pmd/cmdline_flow.c:18:
> > > > C:/mingw-w64/x86_64/mingw64/x86_64-w64-  
> > mingw32/include/basetsd.h:32:44: note: previous declaration of 'INT64' was
> > here  
> > > >     __MINGW_EXTENSION typedef signed __int64 INT64,*PINT64;
> > > >
> > > > The same error is for the other types defined in cmdline_numtype.
> > > >
> > > > This problem with windows.h is popping in many places and some of
> > > > them are cmdline and test-pmd and librte_net.
> > > > We should find a way to exclude windows.h from the unneeded places,
> > > > is there any suggestions on how it can be done ?  
> > >
> > > We ran into this same issue when working with the code that is on the
> > > draft repo.
> > >
> > > The issue is that UINT8, UINT16, INT32, INT64 etc. are reserved types
> > > in Windows headers for integer types. We found that it is easier to
> > > change the enum in cmdline_parse_num.h than try to play with the
> > > include order of headers. AFAIK, the enums were only used to determine
> > > the type in a series of switch() statements in librte_cmdline, so we
> > > simply renamed the enums. Not sure, if that will be acceptable here.  
> > 
> > +1 for renaming enum values. It's not a problem of librte_cmdline itself
> > but a problem of its consumption on Windows, however renaming enum values
> > doesn't break ABI and will make librte_cmdline API "namespaced".
> > 
> > I don't see a clean way not to expose windows.h, because pthread.h
> > depends on it, and if we hide implementation, librte_eal would have to
> > export pthread symbols on Windows, which is a hack (or is it?).  
> 
> test_pmd redefine BOOLEAN and PATTERN in the index enum, I'm not sure how many more conflicts we will face because of this huge include.
>
> Also, DPDK applications will inherit it unknowingly, not sure if this is common for windows libraries.

I never hit these particular conflicts, but you're right that there will be
more, e.g. I remember particularly nasty clashes in failsafe PMD, unrelated
to cmdline token names.


We could take the same approach as with networking headers: copy required
declarations instead of including them from SDK. Here's a list of what
pthread.h uses:

CloseHandle
CreateThread
DeleteSynchronizationBarrier
EnterSynchronizationBarrier
GetThreadAffinityMask
InitializeSynchronizationBarrier
OpenThread
SetPriorityClass
SetThreadAffinityMask
SetThreadPriority
TerminateThread

Windows has strict compatibility policy, so prototypes are unlikely to ever
change. None of the used functions takes string parameters, thus not affected
by A/W macros. Looks a bit messy, but it's limited in scope at least.
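
As a rough illustration of that approach, a sketch of what such a header
fragment could look like (an assumption on my side, not the actual patch;
only prototypes I am sure of are shown, the real header would need every
function in the list above):

/* hypothetical eal_windows "sysdecl" fragment: mirror the few kernel32
 * prototypes pthread.h needs, so rte_windows.h/windows.h never leak into
 * cmdline or testpmd translation units.
 */
typedef void *HANDLE;
typedef unsigned long DWORD;

__declspec(dllimport) int __stdcall CloseHandle(HANDLE object);
__declspec(dllimport) int __stdcall TerminateThread(HANDLE thread, DWORD exit_code);
__declspec(dllimport) HANDLE __stdcall OpenThread(DWORD access, int inherit, DWORD thread_id);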


An external pthread library would solve the problem, but as I've reported
earlier, I failed to find a good one: [1] and [3] are tied to MinGW, although
of high quality, [2] seems outdated.

[1]: Wnpthreads:
https://sourceforge.net/p/mingw-w64/mingw-w64/ci/master/tree/mingw-w64-libraries/winpthreads/
[2] pthreads-win32: https://sourceware.org/pthreads-win32/
[3] mcfgthread: https://github.com/lhmouse/mcfgthread

-- 
Dmitry Kozlyuk

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 0/4] add PPC and Windows cross-compilation to meson test
  @ 2020-06-29 23:15  0%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2020-06-29 23:15 UTC (permalink / raw)
  To: dev; +Cc: david.marchand, bruce.richardson, drc, dmitry.kozliuk

16/06/2020 00:22, Thomas Monjalon:
> In order to better support PPC and Windows,
> their compilation is tested on Linux with Meson
> with the script test-meson-builds.sh,
> supposed to be called in every CI labs.
> 
> 
> Thomas Monjalon (4):
>   devtools: shrink cross-compilation test definition
>   devtools: allow non-standard toolchain in meson test
>   devtools: add ppc64 in meson build test
>   devtools: add Windows cross-build test with MinGW
> 
> 
> v2: update some explanations and fix ABI check


Applied



^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-27  7:44  5%   ` Jerin Jacob
@ 2020-06-29 19:30  4%     ` McDaniel, Timothy
  2020-06-30  4:21  0%       ` Jerin Jacob
  2020-06-30 11:22  0%     ` Kinsella, Ray
  1 sibling, 1 reply; 200+ results
From: McDaniel, Timothy @ 2020-06-29 19:30 UTC (permalink / raw)
  To: Jerin Jacob, Ray Kinsella, Neil Horman
  Cc: Jerin Jacob, Mattias Rönnblom, dpdk-dev, Eads, Gage,
	Van Haaren, Harry

-----Original Message-----
From: Jerin Jacob <jerinjacobk@gmail.com> 
Sent: Saturday, June 27, 2020 2:45 AM
To: McDaniel, Timothy <timothy.mcdaniel@intel.com>; Ray Kinsella <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>
Cc: Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom <mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage <gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites

> +
> +/** Event port configuration structure */
> +struct rte_event_port_conf_v20 {
> +       int32_t new_event_threshold;
> +       /**< A backpressure threshold for new event enqueues on this port.
> +        * Use for *closed system* event dev where event capacity is limited,
> +        * and cannot exceed the capacity of the event dev.
> +        * Configuring ports with different thresholds can make higher priority
> +        * traffic less likely to  be backpressured.
> +        * For example, a port used to inject NIC Rx packets into the event dev
> +        * can have a lower threshold so as not to overwhelm the device,
> +        * while ports used for worker pools can have a higher threshold.
> +        * This value cannot exceed the *nb_events_limit*
> +        * which was previously supplied to rte_event_dev_configure().
> +        * This should be set to '-1' for *open system*.
> +        */
> +       uint16_t dequeue_depth;
> +       /**< Configure number of bulk dequeues for this event port.
> +        * This value cannot exceed the *nb_event_port_dequeue_depth*
> +        * which previously supplied to rte_event_dev_configure().
> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
> +        */
> +       uint16_t enqueue_depth;
> +       /**< Configure number of bulk enqueues for this event port.
> +        * This value cannot exceed the *nb_event_port_enqueue_depth*
> +        * which previously supplied to rte_event_dev_configure().
> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
> +        */
>         uint8_t disable_implicit_release;
>         /**< Configure the port not to release outstanding events in
>          * rte_event_dev_dequeue_burst(). If true, all events received through
> @@ -733,6 +911,14 @@ struct rte_event_port_conf {
>  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
>                                 struct rte_event_port_conf *port_conf);
>
> +int
> +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
> +                               struct rte_event_port_conf_v20 *port_conf);
> +
> +int
> +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
> +                                     struct rte_event_port_conf *port_conf);

Hi Timothy,

+ ABI Maintainers (Ray, Neil)

# As per my understanding, the structures can not be versioned, only
function can be versioned.
i.e we can not make any change to " struct rte_event_port_conf"

# We have a similar case with ethdev and it deferred to next release v20.11
http://patches.dpdk.org/patch/69113/

Regarding the API changes:
# The slow path changes general looks good to me. I will review the
next level in the coming days
# The following fast path changes bothers to me. Could you share more
details on below change?

diff --git a/app/test-eventdev/test_order_atq.c
b/app/test-eventdev/test_order_atq.c
index 3366cfc..8246b96 100644
--- a/app/test-eventdev/test_order_atq.c
+++ b/app/test-eventdev/test_order_atq.c
@@ -34,6 +34,8 @@
                        continue;
                }

+               ev.flow_id = ev.mbuf->udata64;
+
# Since RC1 is near, I am not sure how to accommodate the API changes
now and sort out ABI stuffs.
# Other concern is eventdev spec get bloated with versioning files
just for ONE release as 20.11 will be OK to change the ABI.
# While we discuss the API change, Please send deprecation notice for
ABI change for 20.11,
so that there is no ambiguity of this patch for the 20.11 release.

Hello Jerin,

Thank you for the review comments.

With regard to your comments regarding the fast path flow_id change, the Intel DLB hardware
is not capable of transferring the flow_id as part of the event itself. We therefore require a mechanism
to accomplish this. What we have done to work around this is to require the application to embed the flow_id
within the data payload. The new flag, #define RTE_EVENT_DEV_CAP_CARRY_FLOW_ID (1ULL << 9), can be used
by applications to determine if they need to embed the flow_id, or if its automatically propagated and present in the
received event.

What we should have done is to wrap the assignment with a conditional.  

if (!(device_capability_flags & RTE_EVENT_DEV_CAP_CARRY_FLOW_ID))
	ev.flow_id = ev.mbuf->udata64;

This would minimize/eliminate any performance impact due to the processor's branch prediction logic.
The assignment then becomes in essence a NOOP for all event devices that are capable of carrying the flow_id as part of the event payload itself.
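
For illustration, a sketch of how a test app could hoist that check out of the
dequeue loop (RTE_EVENT_DEV_CAP_CARRY_FLOW_ID is the flag proposed in this
series, not an existing eventdev capability; dev_id, port, ev and t stand for
the existing test-local variables, and the loop body is only indicative):

	struct rte_event_dev_info dev_info;
	bool restore_flow_id;

	rte_event_dev_info_get(dev_id, &dev_info);
	restore_flow_id =
		!(dev_info.event_dev_cap & RTE_EVENT_DEV_CAP_CARRY_FLOW_ID);

	while (t->err == false) {
		if (!rte_event_dequeue_burst(dev_id, port, &ev, 1, 0))
			continue;
		if (restore_flow_id)
			ev.flow_id = ev.mbuf->udata64;
		/* ... existing ordering checks ... */
	}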

Thanks,
Tim



Thanks,
Tim

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2 4/6] bus/mlx5_pci: register a PCI driver
  @ 2020-06-29 15:49  2%     ` Gaëtan Rivet
  0 siblings, 0 replies; 200+ results
From: Gaëtan Rivet @ 2020-06-29 15:49 UTC (permalink / raw)
  To: Parav Pandit; +Cc: ferruh.yigit, thomas, dev, orika, matan

On 21/06/20 19:11 +0000, Parav Pandit wrote:
> Create a mlx5 bus driver framework for invoking drivers of
> multiple classes who have registered with the mlx5_pci bus
> driver.
> 
> Validate user class arguments for supported class combinations.
> 
> Signed-off-by: Parav Pandit <parav@mellanox.com>
> ---
> Changelog:
> v1->v2:
>  - Address comments from Thomas and Gaetan
>  - Enhanced driver to honor RTE_PCI_DRV_PROBE_AGAIN drv_flag
>  - Use anonymous structure for class search and code changes around it
>  - Define static for class comination array
>  - Use RTE_DIM to find array size
>  - Added OOM check for strdup()
>  - Renamed copy variable to nstr_orig
>  - Returning negagive error code
>  - Returning directly if match entry found
>  - Use compat condition check
>  - Avoided cutting error message string
>  - USe uint32_t datatype instead of enum mlx5_class
>  - Changed logic to parse device arguments only once during probe()
>  - Added check to fail driver probe if multiple classes register with
>    DMA ops
>  - Renamed function to parse_class_options
> ---
>  drivers/bus/mlx5_pci/Makefile           |   2 +
>  drivers/bus/mlx5_pci/meson.build        |   2 +-
>  drivers/bus/mlx5_pci/mlx5_pci_bus.c     | 290 ++++++++++++++++++++++++
>  drivers/bus/mlx5_pci/rte_bus_mlx5_pci.h |   1 +
>  4 files changed, 294 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/bus/mlx5_pci/Makefile b/drivers/bus/mlx5_pci/Makefile
> index 7db977ba8..e53ed8856 100644
> --- a/drivers/bus/mlx5_pci/Makefile
> +++ b/drivers/bus/mlx5_pci/Makefile
> @@ -13,7 +13,9 @@ CFLAGS += $(WERROR_FLAGS)
>  CFLAGS += -I$(RTE_SDK)/drivers/common/mlx5
>  CFLAGS += -I$(BUILDDIR)/drivers/common/mlx5
>  CFLAGS += -I$(RTE_SDK)/drivers/bus/pci
> +CFLAGS += -D_DEFAULT_SOURCE
>  LDLIBS += -lrte_eal
> +LDLIBS += -lrte_kvargs
>  LDLIBS += -lrte_common_mlx5
>  LDLIBS += -lrte_pci -lrte_bus_pci
>  
> diff --git a/drivers/bus/mlx5_pci/meson.build b/drivers/bus/mlx5_pci/meson.build
> index cc4a84e23..5111baa4e 100644
> --- a/drivers/bus/mlx5_pci/meson.build
> +++ b/drivers/bus/mlx5_pci/meson.build
> @@ -1,6 +1,6 @@
>  # SPDX-License-Identifier: BSD-3-Clause
>  # Copyright(c) 2020 Mellanox Technologies Ltd
>  
> -deps += ['pci', 'bus_pci', 'common_mlx5']
> +deps += ['pci', 'bus_pci', 'common_mlx5', 'kvargs']
>  install_headers('rte_bus_mlx5_pci.h')
>  sources = files('mlx5_pci_bus.c')
> diff --git a/drivers/bus/mlx5_pci/mlx5_pci_bus.c b/drivers/bus/mlx5_pci/mlx5_pci_bus.c
> index 66db3c7b0..e8f1649a3 100644
> --- a/drivers/bus/mlx5_pci/mlx5_pci_bus.c
> +++ b/drivers/bus/mlx5_pci/mlx5_pci_bus.c
> @@ -3,12 +3,302 @@
>   */
>  
>  #include "rte_bus_mlx5_pci.h"
> +#include <mlx5_common_utils.h>
>  
>  static TAILQ_HEAD(mlx5_pci_bus_drv_head, rte_mlx5_pci_driver) drv_list =
>  				TAILQ_HEAD_INITIALIZER(drv_list);
>  
> +static const struct {
> +	const char *name;
> +	unsigned int dev_class;

Let me quote you when you refused to follow my comment:

>> Yes, I acked to changed to define, but I forgot that I use the enum here.
>> So I am going to keep the enum as code reads more clear with enum.

You refused to use a fixed-width integer type as per my past comments,
for readability reasons, but changed the type to "unsigned int" instead.

I insisted in the previous commit on uint32_t for exposed ABI (even if
between internal libs). Here I accept some leeway given the
compilation-unit scope of the definition. But at this point, your choice
is certainly *NOT* to use a vague type instead.

> +} mlx5_classes[] = {

mlx5_class_names.

> +	{ .name = "vdpa", .dev_class = MLX5_CLASS_VDPA },
> +	{ .name = "net", .dev_class = MLX5_CLASS_NET },
> +};
> +
> +static const unsigned int mlx5_valid_class_combo[] = {
> +	MLX5_CLASS_NET,
> +	MLX5_CLASS_VDPA,
> +	/* New class combination should be added here */

This comment seems redundant, new class combo will be added wherever
appropriate, leave it to future dev.

> +};
> +
> +static int class_name_to_val(const char *class_name)

I think mlx5_class_from_name() is better.
(with mlx5_ namespace.)

> +{
> +	unsigned int i;

In general, size_t is the type of array iterators in C.

> +
> +	for (i = 0; i < RTE_DIM(mlx5_classes); i++) {
> +		if (strcmp(class_name, mlx5_classes[i].name) == 0)
> +			return mlx5_classes[i].dev_class;
> +
> +	}
> +	return -EINVAL;

You're mixing signed int and enum mlx5_class as return type.
Please find another way of signaling error that will make you keep the enum.

You have a sentinel value describing explicitly an invalid class, it seems perfectly
suited instead of -EINVAL. Use it instead.
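
Roughly, folding in the rename and size_t comments above and assuming the
existing MLX5_CLASS_INVALID sentinel (a sketch only):

static enum mlx5_class
mlx5_class_from_name(const char *class_name)
{
	size_t i;

	for (i = 0; i < RTE_DIM(mlx5_class_names); i++) {
		if (strcmp(class_name, mlx5_class_names[i].name) == 0)
			return mlx5_class_names[i].dev_class;
	}
	return MLX5_CLASS_INVALID;
}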

> +}
> +
> +static int
> +mlx5_bus_opt_handler(__rte_unused const char *key, const char *class_names,
> +		     void *opaque)
> +{
> +	int *ret = opaque;
> +	char *nstr_org;
> +	int class_val;
> +	char *found;
> +	char *nstr;
> +
> +	*ret = 0;
> +	nstr = strdup(class_names);
> +	if (!nstr) {

Please be explicit and use (nstr == NULL).

> +		*ret = -ENOMEM;
> +		return *ret;
> +	}
> +
> +	nstr_org = nstr;

nstr_orig is more readable.

> +	while (nstr) {
        while (nstr != NULL) {

> +		/* Extract each individual class name */
> +		found = strsep(&nstr, ":");
> +		if (!found)

        ditto

> +			continue;
> +
> +		/* Check if its a valid class */
> +		class_val = class_name_to_val(found);
> +		if (class_val < 0) {

if (class_val == MLX5_CLASS_INVALID),
with the proper API change.

> +			*ret = -EINVAL;
> +			goto err;
> +		}
> +
> +		*ret |= class_val;

Once again, mixing ints and enum mlx5_class.
You don't *have* to set *ret on error.

* Change your opaque out_arg to uint32_t, stop using variable width types for bitmaps.

* Do not set it on error, use a tmp u32 for parsing and only set it once everything is ok.

* rte_kvargs_process() will mask your error values anyway, so instead set rte_errno and return -1.
  On negative return, it will itself return -1. Check for < 0 in bus_options_valid()

> +	}
> +err:
> +	free(nstr_org);
> +	if (*ret < 0)
> +		DRV_LOG(ERR, "Invalid mlx5 class options %s. Maybe typo in device class argument setting?",

Find a way to give the exact source of error. If it is an invalid name, show which token failed to be parsed
(meaning move your error code before nstr_orig is freed). Remove the "Maybe" formulation.

By the way, Thomas' comment was correct instead of mine, you should just cut your format string after
the "%s.".

> +			class_names);
> +	return *ret;
> +}
> +
> +static int
> +parse_class_options(const struct rte_devargs *devargs)
> +{
> +	const char *key = MLX5_CLASS_ARG_NAME;
> +	struct rte_kvargs *kvlist;
> +	int ret = 0;
> +
> +	if (devargs == NULL)
> +		return 0;
> +	kvlist = rte_kvargs_parse(devargs->args, NULL);
> +	if (kvlist == NULL)
> +		return 0;
> +	if (rte_kvargs_count(kvlist, key))
> +		rte_kvargs_process(kvlist, key, mlx5_bus_opt_handler, &ret);

Set ret to rte_kvargs_process() return value instead, define a specific u32 for bitmap.
Find a way to output the bitmap *separately* from the error code, or
set MLX5_CLASS_INVALID in the bitmap before returning it as sole return value for this function.
(meaning having a proper bit value for MLX5_CLASS_INVALID, if you go this way.)

I already said it in previous review, I will reformulate: stop overloading your types,
relying on implicit casts between correct and incorrect values, and merging your returned values
and the error channel.

Please be proactive about cleaning up your APIs.

> +	rte_kvargs_free(kvlist);
> +	return ret;
> +}
> +
>  void
>  rte_mlx5_pci_driver_register(struct rte_mlx5_pci_driver *driver)
>  {
>  	TAILQ_INSERT_TAIL(&drv_list, driver, next);
>  }
> +
> +static bool
> +mlx5_bus_match(const struct rte_mlx5_pci_driver *drv,
> +	       const struct rte_pci_device *pci_dev)
> +{
> +	const struct rte_pci_id *id_table;
> +
> +	for (id_table = drv->pci_driver.id_table; id_table->vendor_id != 0;
> +	     id_table++) {
> +		/* check if device's ids match the class driver's ones */
> +		if (id_table->vendor_id != pci_dev->id.vendor_id &&
> +				id_table->vendor_id != PCI_ANY_ID)
> +			continue;
> +		if (id_table->device_id != pci_dev->id.device_id &&
> +				id_table->device_id != PCI_ANY_ID)
> +			continue;
> +		if (id_table->subsystem_vendor_id !=
> +		    pci_dev->id.subsystem_vendor_id &&
> +		    id_table->subsystem_vendor_id != PCI_ANY_ID)
> +			continue;
> +		if (id_table->subsystem_device_id !=
> +		    pci_dev->id.subsystem_device_id &&
> +		    id_table->subsystem_device_id != PCI_ANY_ID)
> +			continue;
> +		if (id_table->class_id != pci_dev->id.class_id &&
> +				id_table->class_id != RTE_CLASS_ANY_ID)
> +			continue;
> +
> +		return true;
> +	}
> +	return false;
> +}
> +
> +static int is_valid_class_combo(uint32_t user_classes)
> +{
> +	unsigned int i;

size_t

> +
> +	/* Verify if user specified valid supported combination */
                                    a valid combination.
> +	for (i = 0; i < RTE_DIM(mlx5_valid_class_combo); i++) {
> +		if (mlx5_valid_class_combo[i] == user_classes)

You simplified the scope of this function, which is good.
However, given the more limited scope, now it becomes a boolean
yes/no.

You are returning (0 | false) for yes, which is not ok.

reading if (is_valid_class_combo(combo)) { handle_error(combo); } is pretty
awkward.

While you're at it, you might want to use a proper bool instead.
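
Something like this (a sketch, needs <stdbool.h>):

static bool
mlx5_class_combo_is_valid(uint32_t user_classes)
{
	size_t i;

	for (i = 0; i < RTE_DIM(mlx5_valid_class_combo); i++) {
		if (mlx5_valid_class_combo[i] == user_classes)
			return true;
	}
	return false;
}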

> +			return 0;
> +	}
> +	/* Not found any valid class combination */
> +	return -EINVAL;
> +}
> +
> +static int validate_single_class_dma_ops(void)
> +{
> +	struct rte_mlx5_pci_driver *class;
> +	int dma_map_classes = 0;
> +
> +	TAILQ_FOREACH(class, &drv_list, next) {
> +		if (class->pci_driver.dma_map)
> +			dma_map_classes++;
> +	}
> +	if (dma_map_classes > 1) {
> +		DRV_LOG(ERR, "Multiple classes with DMA ops is unsupported");
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
> +/**
> + * DPDK callback to register to probe multiple PCI class devices.
> + *
> + * @param[in] pci_drv
> + *   PCI driver structure.
> + * @param[in] dev
> + *   PCI device information.
> + *
> + * @return
> + *   0 on success, 1 to skip this driver, a negative errno value otherwise
> + *   and rte_errno is set.
> + */
> +static int
> +mlx5_bus_pci_probe(struct rte_pci_driver *drv __rte_unused,

drv is not unused, you are passing it to sub-drivers below.

> +		   struct rte_pci_device *dev)
> +{
> +	struct rte_mlx5_pci_driver *class;

This compilation unit targets a C compiler. I think only
headers should ensure compat with C++, but this name is not great still.

driver seems more appropriate anyway to designate a driver.

> +	uint32_t user_classes = 0;
> +	int ret;
> +

Mixing ret and user_classes as you do afterward is the result of the above API issues
already outlined. I won't go over them again, please fix everything to have proper
type discipline.

> +	ret = validate_single_class_dma_ops();
> +	if (ret)
> +		return ret;
> +
> +	ret = parse_class_options(dev->device.devargs);
> +	if (ret < 0)
> +		return ret;
> +
> +	user_classes = ret;
> +	if (user_classes) {
> +		/* Validate combination here */
> +		ret = is_valid_class_combo(user_classes);
> +		if (ret) {
> +			DRV_LOG(ERR, "Unsupported mlx5 classes supplied");
> +			return ret;
> +		}
> +	}
> +
> +	/* Default to net class */
> +	if (user_classes == 0)
> +		user_classes = MLX5_CLASS_NET;
> +
> +	TAILQ_FOREACH(class, &drv_list, next) {
> +		if (!mlx5_bus_match(class, dev))
> +			continue;
> +
> +		if ((class->dev_class & user_classes) == 0)
> +			continue;
> +
> +		ret = -EINVAL;
> +		if (class->loaded) {
> +			/* If already loaded and class driver can handle
> +			 * reprobe, probe such class driver again.
> +			 */
> +			if (class->pci_driver.drv_flags & RTE_PCI_DRV_PROBE_AGAIN)
> +				ret = class->pci_driver.probe(drv, dev);

Using "drv" here instead of "class" means you are overriding the DRV_FLAG set by the
sub-driver.

Why not use "class" instead? dev->driver is setup by the upper layer, so will be correctly set
to drv instead of class.

> +		} else {
> +			ret = class->pci_driver.probe(drv, dev);
> +		}

You are ignoring probe() < 0 here, seems wrong.

> +		if (!ret)
> +			class->loaded = true;

loaded flag is not properly set.
You will set it on first successful probe, even on further errors.

Instead, use a u32 to mark each properly probed class, then set loaded outside of this loop,
only if this "probed" bitmap matches exactly the "user_classes" bitmap.

This means also not silently ignoring dev and class mismatch. If this is the behavior you
explicitly want, then you will need to unset the mismatched class in the user_classes, so that the
exact match on probed is correct. Otherwise, logging an error is more appropriate.
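
Roughly this shape (a sketch only, names illustrative; it also passes the
sub-driver's own pci_driver to probe() as suggested above, and leaves the
PROBE_AGAIN handling out):

	struct rte_mlx5_pci_driver *driver;
	uint32_t probed_classes = 0;

	TAILQ_FOREACH(driver, &drv_list, next) {
		if (!mlx5_bus_match(driver, dev) ||
		    (driver->dev_class & user_classes) == 0)
			continue;
		ret = driver->pci_driver.probe(&driver->pci_driver, dev);
		if (ret < 0)
			return ret;
		probed_classes |= driver->dev_class;
	}
	if (probed_classes != user_classes) {
		DRV_LOG(ERR, "Not all requested mlx5 classes could be probed.");
		return -ENODEV;
	}
	/* only now mark the probed sub-drivers as loaded */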

> +	}
> +	return 0;
> +}
> +
> +/**
> + * DPDK callback to remove one or more class devices for a PCI device.
> + *
> + * This function removes all class devices belong to a given PCI device.
> + *
> + * @param[in] pci_dev
> + *   Pointer to the PCI device.
> + *
> + * @return
> + *   0 on success, the function cannot fail.
> + */
> +static int
> +mlx5_bus_pci_remove(struct rte_pci_device *dev)
> +{
> +	struct rte_mlx5_pci_driver *class;
> +
> +	/* Remove each class driver in reverse order */
> +	TAILQ_FOREACH_REVERSE(class, &drv_list, mlx5_pci_bus_drv_head, next) {
> +		if (class->loaded)
> +			class->pci_driver.remove(dev);
> +	}
> +	return 0;
> +}
> +
> +static int
> +mlx5_bus_pci_dma_map(struct rte_pci_device *dev, void *addr,
> +		     uint64_t iova, size_t len)
> +{
> +	struct rte_mlx5_pci_driver *class;
> +	int ret = -EINVAL;
> +
> +	TAILQ_FOREACH(class, &drv_list, next) {
> +		if (!class->pci_driver.dma_map)
> +			continue;
> +
> +		return class->pci_driver.dma_map(dev, addr, iova, len);
> +	}
> +	return ret;
> +}
> +
> +static int
> +mlx5_bus_pci_dma_unmap(struct rte_pci_device *dev, void *addr,
> +		       uint64_t iova, size_t len)
> +{
> +	struct rte_mlx5_pci_driver *class;
> +	int ret = -EINVAL;
> +
> +	TAILQ_FOREACH_REVERSE(class, &drv_list, mlx5_pci_bus_drv_head, next) {
> +		if (!class->pci_driver.dma_unmap)
> +			continue;
> +

I see no additional logging about edge-cases that were discussed previously.
You can add them to the register function.

> +		return class->pci_driver.dma_unmap(dev, addr, iova, len);
> +	}
> +	return ret;
> +}
> +
> +static const struct rte_pci_id mlx5_bus_pci_id_map[] = {
> +	{
> +		.vendor_id = 0
> +	}
> +};
> +
> +static struct rte_pci_driver mlx5_bus_driver = {
> +	.driver = {
> +		.name = "mlx5_bus_pci",
> +	},
> +	.id_table = mlx5_bus_pci_id_map,
> +	.probe = mlx5_bus_pci_probe,
> +	.remove = mlx5_bus_pci_remove,
> +	.dma_map = mlx5_bus_pci_dma_map,
> +	.dma_unmap = mlx5_bus_pci_dma_unmap,
> +	.drv_flags = RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV |
> +		     RTE_PCI_DRV_PROBE_AGAIN,
> +};
> +
> +RTE_PMD_REGISTER_PCI(mlx5_bus, mlx5_bus_driver);
> +RTE_PMD_REGISTER_PCI_TABLE(mlx5_bus, mlx5_bus_pci_id_map);
> diff --git a/drivers/bus/mlx5_pci/rte_bus_mlx5_pci.h b/drivers/bus/mlx5_pci/rte_bus_mlx5_pci.h
> index 571f7dfd6..c8cd7187b 100644
> --- a/drivers/bus/mlx5_pci/rte_bus_mlx5_pci.h
> +++ b/drivers/bus/mlx5_pci/rte_bus_mlx5_pci.h
> @@ -55,6 +55,7 @@ struct rte_mlx5_pci_driver {
>  	enum mlx5_class dev_class;		/**< Class of this driver */
>  	struct rte_pci_driver pci_driver;	/**< Inherit core pci driver. */
>  	TAILQ_ENTRY(rte_mlx5_pci_driver) next;
> +	bool loaded;
>  };
>  
>  /**
> -- 
> 2.25.4
> 

-- 
Gaëtan

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v3 1/6] vhost: support host notifier queue configuration
  @ 2020-06-29 14:08  4%   ` Matan Azrad
  0 siblings, 0 replies; 200+ results
From: Matan Azrad @ 2020-06-29 14:08 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, Xiao Wang

As a preparation for per-queue operations in the vDPA device, the
following experimental API needs to be changed:

The API ``rte_vhost_host_notifier_ctrl`` was changed to be per queue
instead of per device.

A `qid` parameter was added to the API arguments list.

Setting the parameter to the value RTE_VHOST_QUEUE_ALL configures the
host notifier to all the device queues as done before this patch.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/rel_notes/release_20_08.rst |  3 +++
 drivers/vdpa/ifc/ifcvf_vdpa.c          |  6 +++---
 drivers/vdpa/mlx5/mlx5_vdpa.c          |  6 ++++--
 lib/librte_vhost/rte_vdpa.h            |  8 ++++++--
 lib/librte_vhost/rte_vhost.h           |  1 -
 lib/librte_vhost/vhost_user.c          | 18 ++++++++++++++----
 6 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
index 44383b8..2d5a3f7 100644
--- a/doc/guides/rel_notes/release_20_08.rst
+++ b/doc/guides/rel_notes/release_20_08.rst
@@ -125,6 +125,9 @@ API Changes
 
 * ``rte_page_sizes`` enumeration is replaced with ``RTE_PGSIZE_xxx`` defines.
 
+* vhost: The API of ``rte_vhost_host_notifier_ctrl`` was changed to be per
+  queue and not per device, a qid parameter was added to the arguments list.
+
 
 ABI Changes
 -----------
diff --git a/drivers/vdpa/ifc/ifcvf_vdpa.c b/drivers/vdpa/ifc/ifcvf_vdpa.c
index ec97178..6a2fed3 100644
--- a/drivers/vdpa/ifc/ifcvf_vdpa.c
+++ b/drivers/vdpa/ifc/ifcvf_vdpa.c
@@ -839,7 +839,7 @@ struct internal_list {
 	vdpa_ifcvf_stop(internal);
 	vdpa_disable_vfio_intr(internal);
 
-	ret = rte_vhost_host_notifier_ctrl(vid, false);
+	ret = rte_vhost_host_notifier_ctrl(vid, RTE_VHOST_QUEUE_ALL, false);
 	if (ret && ret != -ENOTSUP)
 		goto error;
 
@@ -858,7 +858,7 @@ struct internal_list {
 	if (ret)
 		goto stop_vf;
 
-	rte_vhost_host_notifier_ctrl(vid, true);
+	rte_vhost_host_notifier_ctrl(vid, RTE_VHOST_QUEUE_ALL, true);
 
 	internal->sw_fallback_running = true;
 
@@ -893,7 +893,7 @@ struct internal_list {
 	rte_atomic32_set(&internal->dev_attached, 1);
 	update_datapath(internal);
 
-	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
+	if (rte_vhost_host_notifier_ctrl(vid, RTE_VHOST_QUEUE_ALL, true) != 0)
 		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
 
 	return 0;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 159653f..97f87c5 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -146,7 +146,8 @@
 	int ret;
 
 	if (priv->direct_notifier) {
-		ret = rte_vhost_host_notifier_ctrl(priv->vid, false);
+		ret = rte_vhost_host_notifier_ctrl(priv->vid,
+						   RTE_VHOST_QUEUE_ALL, false);
 		if (ret != 0) {
 			DRV_LOG(INFO, "Direct HW notifier FD cannot be "
 				"destroyed for device %d: %d.", priv->vid, ret);
@@ -154,7 +155,8 @@
 		}
 		priv->direct_notifier = 0;
 	}
-	ret = rte_vhost_host_notifier_ctrl(priv->vid, true);
+	ret = rte_vhost_host_notifier_ctrl(priv->vid, RTE_VHOST_QUEUE_ALL,
+					   true);
 	if (ret != 0)
 		DRV_LOG(INFO, "Direct HW notifier FD cannot be configured for"
 			" device %d: %d.", priv->vid, ret);
diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
index ecb3d91..fd42085 100644
--- a/lib/librte_vhost/rte_vdpa.h
+++ b/lib/librte_vhost/rte_vdpa.h
@@ -202,22 +202,26 @@ struct rte_vdpa_device *
 int
 rte_vdpa_get_device_num(void);
 
+#define RTE_VHOST_QUEUE_ALL UINT16_MAX
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice
  *
- * Enable/Disable host notifier mapping for a vdpa port.
+ * Enable/Disable host notifier mapping for a vdpa queue.
  *
  * @param vid
  *  vhost device id
  * @param enable
  *  true for host notifier map, false for host notifier unmap
+ * @param qid
+ *  vhost queue id, RTE_VHOST_QUEUE_ALL to configure all the device queues
  * @return
  *  0 on success, -1 on failure
  */
 __rte_experimental
 int
-rte_vhost_host_notifier_ctrl(int vid, bool enable);
+rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable);
 
 /**
  * @warning
diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index 329ed8a..1ac7eaf 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -107,7 +107,6 @@
 #define VHOST_USER_F_PROTOCOL_FEATURES	30
 #endif
 
-
 /**
  * Information relating to memory regions including offsets to
  * addresses in QEMUs memory file.
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index ea9cd10..4e1af91 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -2951,13 +2951,13 @@ static int vhost_user_slave_set_vring_host_notifier(struct virtio_net *dev,
 	return process_slave_message_reply(dev, &msg);
 }
 
-int rte_vhost_host_notifier_ctrl(int vid, bool enable)
+int rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable)
 {
 	struct virtio_net *dev;
 	struct rte_vdpa_device *vdpa_dev;
 	int vfio_device_fd, did, ret = 0;
 	uint64_t offset, size;
-	unsigned int i;
+	unsigned int i, q_start, q_last;
 
 	dev = get_device(vid);
 	if (!dev)
@@ -2981,6 +2981,16 @@ int rte_vhost_host_notifier_ctrl(int vid, bool enable)
 	if (!vdpa_dev)
 		return -ENODEV;
 
+	if (qid == RTE_VHOST_QUEUE_ALL) {
+		q_start = 0;
+		q_last = dev->nr_vring - 1;
+	} else {
+		if (qid >= dev->nr_vring)
+			return -EINVAL;
+		q_start = qid;
+		q_last = qid;
+	}
+
 	RTE_FUNC_PTR_OR_ERR_RET(vdpa_dev->ops->get_vfio_device_fd, -ENOTSUP);
 	RTE_FUNC_PTR_OR_ERR_RET(vdpa_dev->ops->get_notify_area, -ENOTSUP);
 
@@ -2989,7 +2999,7 @@ int rte_vhost_host_notifier_ctrl(int vid, bool enable)
 		return -ENOTSUP;
 
 	if (enable) {
-		for (i = 0; i < dev->nr_vring; i++) {
+		for (i = q_start; i <= q_last; i++) {
 			if (vdpa_dev->ops->get_notify_area(vid, i, &offset,
 					&size) < 0) {
 				ret = -ENOTSUP;
@@ -3004,7 +3014,7 @@ int rte_vhost_host_notifier_ctrl(int vid, bool enable)
 		}
 	} else {
 disable:
-		for (i = 0; i < dev->nr_vring; i++) {
+		for (i = q_start; i <= q_last; i++) {
 			vhost_user_slave_set_vring_host_notifier(dev, i, -1,
 					0, 0);
 		}
-- 
1.8.3.1


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v10 10/10] build: generate version.map file for MinGW on Windows
  @ 2020-06-29 12:37  4%     ` talshn
  0 siblings, 0 replies; 200+ results
From: talshn @ 2020-06-29 12:37 UTC (permalink / raw)
  To: dev
  Cc: thomas, pallavi.kadam, dmitry.kozliuk, david.marchand, grive,
	ranjit.menon, navasile, harini.ramakrishnan, ocardona,
	anatoly.burakov, fady, bruce.richardson, Tal Shnaiderman

From: Tal Shnaiderman <talshn@mellanox.com>

The MinGW build for Windows has special cases where exported
functions contain an additional prefix:

__emutls_v.per_lcore__*

To avoid adding those prefixed functions to the version.map file,
the map_to_def.py script was modified to create a map file for MinGW
with the needed changes.

The file name was changed to map_to_win.py, and the lib/meson.build map
output was unified with the drivers/meson.build output.
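
For context, a small C illustration of where those symbols come from; the
variable name below is made up for the example:

#include <rte_per_lcore.h>

/* Defines a TLS variable whose exported symbol is per_lcore__counter.
 * With the emulated TLS used by the MinGW/GCC toolchain, the object that
 * actually gets exported is __emutls_v.per_lcore__counter, which is why the
 * map file handed to the linker needs the rewritten symbol names. */
RTE_DEFINE_PER_LCORE(int, _counter);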

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
---
 buildtools/{map_to_def.py => map_to_win.py} | 11 ++++++++++-
 buildtools/meson.build                      |  4 ++--
 drivers/meson.build                         | 12 +++++++++---
 lib/meson.build                             | 19 ++++++++++++++-----
 4 files changed, 35 insertions(+), 11 deletions(-)
 rename buildtools/{map_to_def.py => map_to_win.py} (69%)

diff --git a/buildtools/map_to_def.py b/buildtools/map_to_win.py
similarity index 69%
rename from buildtools/map_to_def.py
rename to buildtools/map_to_win.py
index 6775b54a9d..2990b58634 100644
--- a/buildtools/map_to_def.py
+++ b/buildtools/map_to_win.py
@@ -10,12 +10,21 @@
 def is_function_line(ln):
     return ln.startswith('\t') and ln.endswith(';\n') and ":" not in ln
 
+# MinGW keeps the original .map file but replaces per_lcore* with __emutls_v.per_lcore*
+def create_mingw_map_file(input_map, output_map):
+    with open(input_map) as f_in, open(output_map, 'w') as f_out:
+        f_out.writelines([lines.replace('per_lcore', '__emutls_v.per_lcore') for lines in f_in.readlines()])
 
 def main(args):
     if not args[1].endswith('version.map') or \
-            not args[2].endswith('exports.def'):
+            not args[2].endswith('exports.def') and \
+            not args[2].endswith('mingw.map'):
         return 1
 
+    if args[2].endswith('mingw.map'):
+        create_mingw_map_file(args[1], args[2])
+        return 0
+
 # special case, allow override if an def file already exists alongside map file
     override_file = join(dirname(args[1]), basename(args[2]))
     if exists(override_file):
diff --git a/buildtools/meson.build b/buildtools/meson.build
index d5f8291beb..f9d2fdf74b 100644
--- a/buildtools/meson.build
+++ b/buildtools/meson.build
@@ -9,14 +9,14 @@ list_dir_globs = find_program('list-dir-globs.py')
 check_symbols = find_program('check-symbols.sh')
 ldflags_ibverbs_static = find_program('options-ibverbs-static.sh')
 
-# set up map-to-def script using python, either built-in or external
+# set up map-to-win script using python, either built-in or external
 python3 = import('python').find_installation(required: false)
 if python3.found()
 	py3 = [python3]
 else
 	py3 = ['meson', 'runpython']
 endif
-map_to_def_cmd = py3 + files('map_to_def.py')
+map_to_win_cmd = py3 + files('map_to_win.py')
 sphinx_wrapper = py3 + files('call-sphinx-build.py')
 
 # stable ABI always starts with "DPDK_"
diff --git a/drivers/meson.build b/drivers/meson.build
index 646a7d5eb5..2cd8505d10 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -152,16 +152,22 @@ foreach class:dpdk_driver_classes
 			implib = 'lib' + lib_name + '.dll.a'
 
 			def_file = custom_target(lib_name + '_def',
-				command: [map_to_def_cmd, '@INPUT@', '@OUTPUT@'],
+				command: [map_to_win_cmd, '@INPUT@', '@OUTPUT@'],
 				input: version_map,
 				output: '@0@_exports.def'.format(lib_name))
-			lk_deps = [version_map, def_file]
+
+			mingw_map = custom_target(lib_name + '_mingw',
+				command: [map_to_win_cmd, '@INPUT@', '@OUTPUT@'],
+				input: version_map,
+				output: '@0@_mingw.map'.format(lib_name))
+
+			lk_deps = [version_map, def_file, mingw_map]
 			if is_windows
 				if is_ms_linker
 					lk_args = ['-Wl,/def:' + def_file.full_path(),
 						'-Wl,/implib:drivers\\' + implib]
 				else
-					lk_args = []
+					lk_args = ['-Wl,--version-script=' + mingw_map.full_path()]
 				endif
 			else
 				lk_args = ['-Wl,--version-script=' + version_map]
diff --git a/lib/meson.build b/lib/meson.build
index a8fd317a18..af66610fcb 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -149,19 +149,28 @@ foreach l:libraries
 					meson.current_source_dir(), dir_name, name)
 			implib = dir_name + '.dll.a'
 
-			def_file = custom_target(name + '_def',
-				command: [map_to_def_cmd, '@INPUT@', '@OUTPUT@'],
+			def_file = custom_target(libname + '_def',
+				command: [map_to_win_cmd, '@INPUT@', '@OUTPUT@'],
 				input: version_map,
-				output: 'rte_@0@_exports.def'.format(name))
+				output: '@0@_exports.def'.format(libname))
+
+			mingw_map = custom_target(libname + '_mingw',
+				command: [map_to_win_cmd, '@INPUT@', '@OUTPUT@'],
+				input: version_map,
+				output: '@0@_mingw.map'.format(libname))
 
 			if is_ms_linker
 				lk_args = ['-Wl,/def:' + def_file.full_path(),
 					'-Wl,/implib:lib\\' + implib]
 			else
-				lk_args = ['-Wl,--version-script=' + version_map]
+				if is_windows
+					lk_args = ['-Wl,--version-script=' + mingw_map.full_path()]
+				else
+					lk_args = ['-Wl,--version-script=' + version_map]
+				endif
 			endif
 
-			lk_deps = [version_map, def_file]
+			lk_deps = [version_map, def_file, mingw_map]
 			if not is_windows
 				# on unix systems check the output of the
 				# check-symbols.sh script, using it as a
-- 
2.16.1.windows.4


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v4 4/9] eal: introduce thread uninit helper
  2020-06-26 15:00  0%     ` Jerin Jacob
@ 2020-06-29  9:07  0%       ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2020-06-29  9:07 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dpdk-dev, Richardson, Bruce, Ray Kinsella, Thomas Monjalon,
	Andrew Rybchenko, Kevin Traynor, Ian Stokes, Ilya Maximets,
	Jerin Jacob, Sunil Kumar Kori, Neil Horman, Harini Ramakrishnan,
	Omar Cardona, Pallavi Kadam, Ranjit Menon

On Fri, Jun 26, 2020 at 5:00 PM Jerin Jacob <jerinjacobk@gmail.com> wrote:
>
> On Fri, Jun 26, 2020 at 8:18 PM David Marchand
> <david.marchand@redhat.com> wrote:
> >
> > This is a preparation step for dynamically unregistering threads.
> >
> > Since we explicitly allocate a per thread trace buffer in
> > rte_thread_init, add an internal helper to free this buffer.
> >
> > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > ---
> > Note: I preferred renaming the current internal function to free all
> > threads trace buffers (new name trace_mem_free()) and reuse the previous
> > name (trace_mem_per_thread_free()) when freeing this buffer for a given
> > thread.
> >
> > Changes since v2:
> > - added missing stub for windows tracing support,
> > - moved free symbol to exported (experimental) ABI as a counterpart of
> >   the alloc symbol we already had,
> >
> > Changes since v1:
> > - rebased on master, removed Windows workaround wrt traces support,
>
> > +/**
> > + * Uninitialize per-lcore info for current thread.
> > + */
> > +void rte_thread_uninit(void);
> > +
>
> Is it a public API? I guess not, as it is not added in the .map file.
> If it is a private API, isn't it better to name it eal_thread_ like
> other private APIs in eal_thread.h?

Before this series, we have:
- rte_thread_ public APIs for both EAL and non-EAL threads (declared
in rte_eal_interrupts.h and rte_lcore.h),
- eal_thread_ internal APIs that apply to EAL threads (declared in
eal_thread.h),

I guess __rte_thread_ could do the trick and I will move this to eal_private.h.


-- 
David Marchand


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [EXT] [PATCH v4 4/9] eal: introduce thread uninit helper
  2020-06-26 14:47  3%   ` [dpdk-dev] [PATCH v4 4/9] eal: introduce thread uninit helper David Marchand
  2020-06-26 15:00  0%     ` Jerin Jacob
@ 2020-06-29  8:59  0%     ` Sunil Kumar Kori
  2020-06-30  9:42  0%     ` [dpdk-dev] " Olivier Matz
  2 siblings, 0 replies; 200+ results
From: Sunil Kumar Kori @ 2020-06-29  8:59 UTC (permalink / raw)
  To: David Marchand, dev
  Cc: jerinjacobk, bruce.richardson, mdr, thomas, arybchenko, ktraynor,
	ian.stokes, i.maximets, Jerin Jacob Kollanukkaran, Neil Horman,
	Harini Ramakrishnan, Omar Cardona, Pallavi Kadam, Ranjit Menon

>-----Original Message-----
>From: David Marchand <david.marchand@redhat.com>
>Sent: Friday, June 26, 2020 8:18 PM
>To: dev@dpdk.org
>Cc: jerinjacobk@gmail.com; bruce.richardson@intel.com; mdr@ashroe.eu;
>thomas@monjalon.net; arybchenko@solarflare.com; ktraynor@redhat.com;
>ian.stokes@intel.com; i.maximets@ovn.org; Jerin Jacob Kollanukkaran
><jerinj@marvell.com>; Sunil Kumar Kori <skori@marvell.com>; Neil Horman
><nhorman@tuxdriver.com>; Harini Ramakrishnan
><harini.ramakrishnan@microsoft.com>; Omar Cardona
><ocardona@microsoft.com>; Pallavi Kadam <pallavi.kadam@intel.com>;
>Ranjit Menon <ranjit.menon@intel.com>
>Subject: [EXT] [PATCH v4 4/9] eal: introduce thread uninit helper
>
>External Email
>
>----------------------------------------------------------------------
>This is a preparation step for dynamically unregistering threads.
>
>Since we explicitly allocate a per thread trace buffer in rte_thread_init, add an
>internal helper to free this buffer.
>
>Signed-off-by: David Marchand <david.marchand@redhat.com>
>---
>Note: I preferred renaming the current internal function to free all threads
>trace buffers (new name trace_mem_free()) and reuse the previous name
>(trace_mem_per_thread_free()) when freeing this buffer for a given thread.
>
>Changes since v2:
>- added missing stub for windows tracing support,
>- moved free symbol to exported (experimental) ABI as a counterpart of
>  the alloc symbol we already had,
>
>Changes since v1:
>- rebased on master, removed Windows workaround wrt traces support,
>
>---
> lib/librte_eal/common/eal_common_thread.c |  9 ++++
>lib/librte_eal/common/eal_common_trace.c  | 51 +++++++++++++++++++----
> lib/librte_eal/common/eal_thread.h        |  5 +++
> lib/librte_eal/common/eal_trace.h         |  2 +-
> lib/librte_eal/include/rte_trace_point.h  |  9 ++++
> lib/librte_eal/rte_eal_version.map        |  3 ++
> lib/librte_eal/windows/eal.c              |  5 +++
> 7 files changed, 75 insertions(+), 9 deletions(-)
>
>diff --git a/lib/librte_eal/common/eal_common_thread.c
>b/lib/librte_eal/common/eal_common_thread.c
>index afb30236c5..3b30cc99d9 100644
>--- a/lib/librte_eal/common/eal_common_thread.c
>+++ b/lib/librte_eal/common/eal_common_thread.c
>@@ -20,6 +20,7 @@
> #include "eal_internal_cfg.h"
> #include "eal_private.h"
> #include "eal_thread.h"
>+#include "eal_trace.h"
>
> RTE_DEFINE_PER_LCORE(unsigned int, _lcore_id) = LCORE_ID_ANY;
>RTE_DEFINE_PER_LCORE(int, _thread_id) = -1; @@ -161,6 +162,14 @@
>rte_thread_init(unsigned int lcore_id, rte_cpuset_t *cpuset)
> 	__rte_trace_mem_per_thread_alloc();
> }
>
>+void
>+rte_thread_uninit(void)
>+{

Need to check whether trace is enabled or not, similar to trace_mem_free().
>+	__rte_trace_mem_per_thread_free();
>+
>+	RTE_PER_LCORE(_lcore_id) = LCORE_ID_ANY; }
>+
> struct rte_thread_ctrl_params {
> 	void *(*start_routine)(void *);
> 	void *arg;

[snipped]

>2.23.0


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 6/7] cmdline: support Windows
  2020-06-29  7:42  3%       ` Dmitry Kozlyuk
@ 2020-06-29  8:12  0%         ` Tal Shnaiderman
  2020-06-29 23:56  0%           ` Dmitry Kozlyuk
  0 siblings, 1 reply; 200+ results
From: Tal Shnaiderman @ 2020-06-29  8:12 UTC (permalink / raw)
  To: Dmitry Kozlyuk, Ranjit Menon
  Cc: Fady Bader, dev, Dmitry Malloy, Narcisa Ana Maria Vasile,
	Thomas Monjalon, Olivier Matz

> From: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> Subject: Re: [dpdk-dev] [PATCH 6/7] cmdline: support Windows
> 
> On Sun, 28 Jun 2020 23:23:11 -0700, Ranjit Menon wrote:
> > On 6/28/2020 7:20 AM, Fady Bader wrote:
> > > Hi Dmitry,
> > > I'm trying to run test-pmd on Windows and I ran into this error with
> cmdline.
> > >
> > > The error log message is :
> > > In file included from ../app/test-pmd/cmdline_flow.c:23:
> > > ..\lib\librte_cmdline/cmdline_parse_num.h:24:2: error: 'INT64'
> redeclared as different kind of symbol
> > >    INT64
> > >
> > > In file included from C:/mingw-w64/x86_64/mingw64/x86_64-w64-
> mingw32/include/winnt.h:150,
> > >                   from C:/mingw-w64/x86_64/mingw64/x86_64-w64-
> mingw32/include/minwindef.h:163,
> > >                   from C:/mingw-w64/x86_64/mingw64/x86_64-w64-
> mingw32/include/windef.h:8,
> > >                   from C:/mingw-w64/x86_64/mingw64/x86_64-w64-
> mingw32/include/windows.h:69,
> > >                   from ..\lib/librte_eal/windows/include/rte_windows.h:22,
> > >                   from ..\lib/librte_eal/windows/include/pthread.h:20,
> > >                   from ..\lib/librte_eal/include/rte_per_lcore.h:25,
> > >                   from ..\lib/librte_eal/include/rte_errno.h:18,
> > >                   from ..\lib\librte_ethdev/rte_ethdev.h:156,
> > >                   from ../app/test-pmd/cmdline_flow.c:18:
> > > C:/mingw-w64/x86_64/mingw64/x86_64-w64-
> mingw32/include/basetsd.h:32:44: note: previous declaration of 'INT64' was
> here
> > >     __MINGW_EXTENSION typedef signed __int64 INT64,*PINT64;
> > >
> > > The same error is for the other types defined in cmdline_numtype.
> > >
> > > This problem with windows.h is popping in many places and some of
> > > them are cmdline and test-pmd and librte_net.
> > > We should find a way to exclude windows.h from the unneeded places,
> > > is there any suggestions on how it can be done ?
> >
> > We ran into this same issue when working with the code that is on the
> > draft repo.
> >
> > The issue is that UINT8, UINT16, INT32, INT64 etc. are reserved types
> > in Windows headers for integer types. We found that it is easier to
> > change the enum in cmdline_parse_num.h than try to play with the
> > include order of headers. AFAIK, the enums were only used to determine
> > the type in a series of switch() statements in librte_cmdline, so we
> > simply renamed the enums. Not sure, if that will be acceptable here.
> 
> +1 for renaming enum values. It's not a problem of librte_cmdline itself but a
> problem of its consumption on Windows; however, renaming enum values
> doesn't break ABI and will make the librte_cmdline API "namespaced".
> 
> I don't see a clean way not to expose windows.h, because pthread.h
> depends on it, and if we hide implementation, librte_eal would have to
> export pthread symbols on Windows, which is a hack (or is it?).

test-pmd redefines BOOLEAN and PATTERN in the index enum; I'm not sure how many more conflicts we will face because of this huge include.

Also, DPDK applications will inherit it unknowingly; I'm not sure if this is common for Windows libraries.

> 
> --
> Dmitry Kozlyuk

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v5 0/3] RCU integration with LPM library
  @ 2020-06-29  8:02  3% ` Ruifeng Wang
    2020-07-07 14:40  3% ` [dpdk-dev] [PATCH v6 0/3] RCU integration with LPM library Ruifeng Wang
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 200+ results
From: Ruifeng Wang @ 2020-06-29  8:02 UTC (permalink / raw)
  Cc: dev, konstantin.ananyev, honnappa.nagarahalli, nd, Ruifeng Wang

This patchset integrates RCU QSBR support with LPM library.

Resource reclamation implementation was split from the original
series, and is already part of the RCU library. The series was reworked
to base the LPM integration on the RCU reclamation APIs.

New API rte_lpm_rcu_qsbr_add is introduced for the application to
register an RCU variable that the LPM library will use. This provides
the user a handle to enable the RCU mechanism integrated in the LPM library.

Functional tests and performance tests are added to cover the
integration with RCU.
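
Below is a minimal usage sketch of the new API for readers following this
thread. It is illustrative only: the rte_lpm_rcu_config field and enum names
(mode, dq_size, reclaim_thd, RTE_LPM_QSBR_MODE_DQ) follow the changelog
wording but are assumptions, not copied from the patch, and the exact
rte_lpm_rcu_qsbr_add() prototype (e.g. whether it also returns a defer queue
handle) should be taken from rte_lpm.h in patch 1/3.

#include <string.h>

#include <rte_lpm.h>
#include <rte_malloc.h>
#include <rte_rcu_qsbr.h>

/* Attach an RCU QSBR variable to an LPM table so that tbl8 groups freed by
 * rte_lpm_delete() are reclaimed only after all readers have quiesced. */
static int
lpm_enable_rcu(struct rte_lpm *lpm, uint32_t num_readers)
{
	struct rte_lpm_rcu_config rcu_cfg;
	struct rte_rcu_qsbr *qsv;
	size_t sz = rte_rcu_qsbr_get_memsize(num_readers);

	qsv = rte_zmalloc(NULL, sz, RTE_CACHE_LINE_SIZE);
	if (qsv == NULL)
		return -1;
	if (rte_rcu_qsbr_init(qsv, num_readers) != 0) {
		rte_free(qsv);
		return -1;
	}

	memset(&rcu_cfg, 0, sizeof(rcu_cfg));
	rcu_cfg.v = qsv;			/* QSBR variable shared with reader threads */
	rcu_cfg.mode = RTE_LPM_QSBR_MODE_DQ;	/* defer-queue (non-blocking) reclamation */
	rcu_cfg.dq_size = 64;			/* defer queue entries */
	rcu_cfg.reclaim_thd = 0;		/* no threshold: reclaim attempt on every call */

	/* Register the RCU variable with the LPM library. */
	return rte_lpm_rcu_qsbr_add(lpm, &rcu_cfg);
}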

---
v5:
No default value for reclaim_thd. This allows reclamation triggering with every call.
Pass LPM pointer instead of tbl8 as argument of reclaim callback free function.
Updated group_idx check at tbl8 allocation.
Use enums instead of defines for different reclamation modes.
RCU QSBR integrated path is inside ALLOW_EXPERIMENTAL_API to avoid ABI change.

v4:
Allow user to configure defer queue: size, reclaim threshold, max entries.
Return defer queue handler so user can manually trigger reclamation.
Add blocking mode support. Defer queue will not be created.

Honnappa Nagarahalli (1):
  test/lpm: add RCU integration performance tests

Ruifeng Wang (2):
  lib/lpm: integrate RCU QSBR
  test/lpm: add LPM RCU integration functional tests

 app/test/test_lpm.c                | 291 ++++++++++++++++-
 app/test/test_lpm_perf.c           | 492 ++++++++++++++++++++++++++++-
 doc/guides/prog_guide/lpm_lib.rst  |  32 ++
 lib/librte_lpm/Makefile            |   2 +-
 lib/librte_lpm/meson.build         |   1 +
 lib/librte_lpm/rte_lpm.c           | 129 +++++++-
 lib/librte_lpm/rte_lpm.h           |  59 ++++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 8 files changed, 995 insertions(+), 17 deletions(-)

-- 
2.17.1


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 6/7] cmdline: support Windows
  @ 2020-06-29  7:42  3%       ` Dmitry Kozlyuk
  2020-06-29  8:12  0%         ` Tal Shnaiderman
  0 siblings, 1 reply; 200+ results
From: Dmitry Kozlyuk @ 2020-06-29  7:42 UTC (permalink / raw)
  To: Ranjit Menon
  Cc: Fady Bader, dev, Dmitry Malloy, Narcisa Ana Maria Vasile,
	Tal Shnaiderman, Thomas Monjalon, Olivier Matz

On Sun, 28 Jun 2020 23:23:11 -0700, Ranjit Menon wrote:
> On 6/28/2020 7:20 AM, Fady Bader wrote:
> > Hi Dmitry,
> > I'm trying to run test-pmd on Windows and I ran into this error with cmdline.
> >
> > The error log message is :
> > In file included from ../app/test-pmd/cmdline_flow.c:23:
> > ..\lib\librte_cmdline/cmdline_parse_num.h:24:2: error: 'INT64' redeclared as different kind of symbol
> >    INT64
> >
> > In file included from C:/mingw-w64/x86_64/mingw64/x86_64-w64-mingw32/include/winnt.h:150,
> >                   from C:/mingw-w64/x86_64/mingw64/x86_64-w64-mingw32/include/minwindef.h:163,
> >                   from C:/mingw-w64/x86_64/mingw64/x86_64-w64-mingw32/include/windef.h:8,
> >                   from C:/mingw-w64/x86_64/mingw64/x86_64-w64-mingw32/include/windows.h:69,
> >                   from ..\lib/librte_eal/windows/include/rte_windows.h:22,
> >                   from ..\lib/librte_eal/windows/include/pthread.h:20,
> >                   from ..\lib/librte_eal/include/rte_per_lcore.h:25,
> >                   from ..\lib/librte_eal/include/rte_errno.h:18,
> >                   from ..\lib\librte_ethdev/rte_ethdev.h:156,
> >                   from ../app/test-pmd/cmdline_flow.c:18:
> > C:/mingw-w64/x86_64/mingw64/x86_64-w64-mingw32/include/basetsd.h:32:44: note: previous declaration of 'INT64' was here
> >     __MINGW_EXTENSION typedef signed __int64 INT64,*PINT64;
> >
> > The same error is for the other types defined in cmdline_numtype.
> >
> > This problem with windows.h is popping in many places and some of them are
> > cmdline and test-pmd and librte_net.
> > We should find a way to exclude windows.h from the unneeded places, is there any
> > suggestions on how it can be done ?  
> 
> We ran into this same issue when working with the code that is on the 
> draft repo.
> 
> The issue is that UINT8, UINT16, INT32, INT64 etc. are reserved types in 
> Windows headers for integer types. We found that it is easier to change 
> the enum in cmdline_parse_num.h than try to play with the include order 
> of headers. AFAIK, the enums were only used to determine the type in a 
> series of switch() statements in librte_cmdline, so we simply renamed 
> the enums. Not sure, if that will be acceptable here.

+1 for renaming enum values. It's not a problem of librte_cmdline itself but a
problem of its consumption on Windows; however, renaming enum values doesn't
break ABI and will make the librte_cmdline API "namespaced".
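
As an illustration only (the final prefix is an open question), a namespaced
spelling of the enum could look like the sketch below; the RTE_ prefix is an
assumption, and the switch() statements in librte_cmdline would simply be
updated to the new names:

/* Sketch: today's enumerators (UINT8 ... INT64) collide with the integer
 * typedefs in Windows' basetsd.h. Prefixing them keeps the enum's size and
 * values, so the librte_cmdline ABI is unchanged. RTE_ is an assumed prefix. */
enum cmdline_numtype {
	RTE_UINT8 = 0,
	RTE_UINT16,
	RTE_UINT32,
	RTE_UINT64,
	RTE_INT8,
	RTE_INT16,
	RTE_INT32,
	RTE_INT64,
};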

I don't see a clean way not to expose windows.h, because pthread.h depends on
it, and if we hide implementation, librte_eal would have to export pthread
symbols on Windows, which is a hack (or is it?).

-- 
Dmitry Kozlyuk

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC] ethdev: add fragment attribute to IPv6 item
  @ 2020-06-28 14:52  0%             ` Dekel Peled
  0 siblings, 0 replies; 200+ results
From: Dekel Peled @ 2020-06-28 14:52 UTC (permalink / raw)
  To: Adrien Mazarguil, Ori Kam, Andrew Rybchenko
  Cc: ferruh.yigit, john.mcnamara, marko.kovacevic, Asaf Penso,
	Matan Azrad, Eli Britstein, dev, Ivan Malov

Hi,

This change is proposed for 20.11.
It is suggested after internal discussions, in which multiple options were considered, some of them similar to the ones suggested below.
Continuing the earlier correspondence in this thread, please send any other comments/suggestions you have.

Regards,
Dekel

> -----Original Message-----
> From: Dekel Peled <dekelp@mellanox.com>
> Sent: Thursday, June 18, 2020 9:59 AM
> To: Adrien Mazarguil <adrien.mazarguil@6wind.com>; Ori Kam
> <orika@mellanox.com>; Andrew Rybchenko <arybchenko@solarflare.com>
> Cc: ferruh.yigit@intel.com; john.mcnamara@intel.com;
> marko.kovacevic@intel.com; Asaf Penso <asafp@mellanox.com>; Matan
> Azrad <matan@mellanox.com>; Eli Britstein <elibr@mellanox.com>;
> dev@dpdk.org; Ivan Malov <Ivan.Malov@oktetlabs.ru>
> Subject: RE: [RFC] ethdev: add fragment attribute to IPv6 item
> 
> Hi,
> 
> Kind reminder, please respond on the recent correspondence so we can
> conclude this issue.
> 
> Regards,
> Dekel
> 
> > -----Original Message-----
> > From: Dekel Peled <dekelp@mellanox.com>
> > Sent: Wednesday, June 3, 2020 3:11 PM
> > To: Ori Kam <orika@mellanox.com>; Adrien Mazarguil
> > <adrien.mazarguil@6wind.com>
> > Cc: Andrew Rybchenko <arybchenko@solarflare.com>;
> > ferruh.yigit@intel.com; john.mcnamara@intel.com;
> > marko.kovacevic@intel.com; Asaf Penso <asafp@mellanox.com>; Matan
> > Azrad <matan@mellanox.com>; Eli Britstein <elibr@mellanox.com>;
> > dev@dpdk.org; Ivan Malov <Ivan.Malov@oktetlabs.ru>
> > Subject: RE: [RFC] ethdev: add fragment attribute to IPv6 item
> >
> > Hi, PSB.
> >
> > > -----Original Message-----
> > > From: Ori Kam <orika@mellanox.com>
> > > Sent: Wednesday, June 3, 2020 11:16 AM
> > > To: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> > > Cc: Andrew Rybchenko <arybchenko@solarflare.com>; Dekel Peled
> > > <dekelp@mellanox.com>; ferruh.yigit@intel.com;
> > > john.mcnamara@intel.com; marko.kovacevic@intel.com; Asaf Penso
> > > <asafp@mellanox.com>; Matan Azrad <matan@mellanox.com>; Eli
> > Britstein
> > > <elibr@mellanox.com>; dev@dpdk.org; Ivan Malov
> > > <Ivan.Malov@oktetlabs.ru>
> > > Subject: RE: [RFC] ethdev: add fragment attribute to IPv6 item
> > >
> > > Hi Adrien,
> > >
> > > Great to hear from you again.
> > >
> > > > -----Original Message-----
> > > > From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> > > > Sent: Tuesday, June 2, 2020 10:04 PM
> > > > To: Ori Kam <orika@mellanox.com>
> > > > Cc: Andrew Rybchenko <arybchenko@solarflare.com>; Dekel Peled
> > > > <dekelp@mellanox.com>; ferruh.yigit@intel.com;
> > > > john.mcnamara@intel.com; marko.kovacevic@intel.com; Asaf Penso
> > > > <asafp@mellanox.com>; Matan Azrad <matan@mellanox.com>; Eli
> > > Britstein
> > > > <elibr@mellanox.com>; dev@dpdk.org; Ivan Malov
> > > > <Ivan.Malov@oktetlabs.ru>
> > > > Subject: Re: [RFC] ethdev: add fragment attribute to IPv6 item
> > > >
> > > > Hi Ori, Andrew, Delek,
> >
> > It's Dekel, not Delek ;-)
> >
> > > >
> > > > (been a while eh?)
> > > >
> > > > On Tue, Jun 02, 2020 at 06:28:41PM +0000, Ori Kam wrote:
> > > > > Hi Andrew,
> > > > >
> > > > > PSB,
> > > > [...]
> > > > > > > diff --git a/lib/librte_ethdev/rte_flow.h
> > > > > > > b/lib/librte_ethdev/rte_flow.h index b0e4199..3bc8ce1 100644
> > > > > > > --- a/lib/librte_ethdev/rte_flow.h
> > > > > > > +++ b/lib/librte_ethdev/rte_flow.h
> > > > > > > @@ -787,6 +787,8 @@ struct rte_flow_item_ipv4 {
> > > > > > >   */
> > > > > > >  struct rte_flow_item_ipv6 {
> > > > > > >  	struct rte_ipv6_hdr hdr; /**< IPv6 header definition. */
> > > > > > > +	uint32_t is_frag:1; /**< Is IPv6 packet fragmented/non-
> > > fragmented. */
> > > > > > > +	uint32_t reserved:31; /**< Reserved, must be zero. */
> > > > > >
> > > > > > The solution is simple, but hardly generic and adds an example
> > > > > > for the future extensions. I doubt that it is a right way to go.
> > > > > >
> > > > > I agree with you that this is not the most generic way possible,
> > > > > but the IPV6 extensions are very unique. So the solution is also
> unique.
> > > > > In general, I'm always in favor of finding the most generic way,
> > > > > but
> > > > sometimes
> > > > > it is better to keep things simple, and see how it goes.
> > > >
> > > > Same feeling here, it doesn't look right.
> > > >
> > > > > > May be we should add 256-bit string with one bit for each IP
> > > > > > protocol number and apply it to extension headers only?
> > > > > > If bit A is set in the mask:
> > > > > >  - if bit A is set in spec as well, extension header with
> > > > > >    IP protocol (1 << A) number must present
> > > > > >  - if bit A is clear in spec, extension header with
> > > > > >    IP protocol (1 << A) number must absent If bit is clear in
> > > > > > the mask, corresponding extension header may present and may
> > > > > > absent (i.e. don't care).
> > > > > >
> > > > > There are only 12 possible extension headers and currently none
> > > > > of them are supported in rte_flow. So adding a logic to parse
> > > > > the 256 just to get a max
> > > > of 12
> > > > > possible values is an overkill. Also, if we disregard the case
> > > > > of the extension, the application must select only one next proto.
> > > > > For example, the application can't select udp + tcp. There is
> > > > > the option to add a flag for each of the possible extensions,
> > > > > does it makes more
> > > sense to you?
> > > >
> > > > Each of these extension headers has its own structure, we first
> > > > need the ability to match them properly by adding the necessary
> pattern items.
> > > >
> > > > > > The RFC indirectly touches IPv6 proto (next header) matching
> > > > > > logic.
> > > > > >
> > > > > > If logic used in ETH+VLAN is applied on IPv6 as well, it would
> > > > > > make pattern specification and handling complicated. E.g.:
> > > > > >   eth / ipv6 / udp / end
> > > > > > should match UDP over IPv6 without any extension headers only.
> > > > > >
> > > > > The issue with VLAN I agree is different since by definition
> > > > > VLAN is layer 2.5. We can add the same logic also to the VLAN
> > > > > case, maybe it will be easier.
> > > > > In any case, in your example above and according to the RFC we
> > > > > will get all ipv6 udp traffic with and without extensions.
> > > > >
> > > > > > And how to specify UPD over IPv6 regardless extension headers?
> > > > >
> > > > > Please see above the rule will be eth / ipv6 /udp.
> > > > >
> > > > > >   eth / ipv6 / ipv6_ext / udp / end with a convention that
> > > > > > ipv6_ext is optional if spec and mask are NULL (or mask is empty).
> > > > > >
> > > > > I would guess that this flow should match all ipv6 that has one
> > > > > ext and the
> > > > next
> > > > > proto is udp.
> > > >
> > > > In my opinion RTE_FLOW_ITEM_TYPE_IPV6_EXT is a bit useless on its
> > own.
> > > > It's only for matching packets that contain some kind of extension
> > > > header, not a specific one, more about that below.
> > > >
> > > > > > I'm wondering if any driver treats it this way?
> > > > > >
> > > > > I'm not sure, we can support only the frag ext by default, but
> > > > > if required we
> > > > can support other
> > > > > ext.
> > > > >
> > > > > > I agree that the problem really comes when we'd like match
> > > > > > IPv6 frags or even worse not fragments.
> > > > > >
> > > > > > Two patterns for fragments:
> > > > > >   eth / ipv6 (proto=FRAGMENT) / end
> > > > > >   eth / ipv6 / ipv6_ext (next_hdr=FRAGMENT) / end
> > > > > >
> > > > > > Any sensible solution for not-fragments with any other
> > > > > > extension headers?
> > > > > >
> > > > > The one propose in this mail 😊
> > > > >
> > > > > > INVERT exists, but hardly useful, since it simply says that
> > > > > > patches which do not match pattern without INVERT matches the
> > > > > > pattern with INVERT and
> > > > > >   invert / eth / ipv6 (proto=FRAGMENT) / end will match ARP,
> > > > > > IPv4,
> > > > > > IPv6 with an extension header before fragment header and so on.
> > > > > >
> > > > > I agree with you, INVERT in this doesn’t help.
> > > > > We were considering adding some kind of not mask / item per item.
> > > > > some think around this line:
> > > > > user request ipv6 unfragmented udp packets. The flow would look
> > > > > something like this:
> > > > > Eth / ipv6 / Not (Ipv6.proto = frag_proto) / udp But it makes
> > > > > the rules much harder to use, and I don't think that there is
> > > > > any HW that support not, and adding such feature to all items is
> overkill.
> > > > >
> > > > >
> > > > > > Bit string suggested above will allow to match:
> > > > > >  - UDP over IPv6 with any extension headers:
> > > > > >     eth / ipv6 (ext_hdrs mask empty) / udp / end
> > > > > >  - UDP over IPv6 without any extension headers:
> > > > > >     eth / ipv6 (ext_hdrs mask full, spec empty) / udp / end
> > > > > >  - UDP over IPv6 without fragment header:
> > > > > >     eth / ipv6 (ext.spec & ~FRAGMENT, ext.mask | FRAGMENT) /
> > > > > > udp / end
> > > > > >  - UDP over IPv6 with fragment header
> > > > > >     eth / ipv6 (ext.spec | FRAGMENT, ext.mask | FRAGMENT) /
> > > > > > udp / end
> > > > > >
> > > > > > where FRAGMENT is 1 << IPPROTO_FRAGMENT.
> > > > > >
> > > > > Please see my response regarding this above.
> > > > >
> > > > > > Above I intentionally keep 'proto' unspecified in ipv6 since
> > > > > > otherwise it would specify the next header after IPv6 header.
> > > > > >
> > > > > > Extension headers mask should be empty by default.
> > > >
> > > > This is a deliberate design choice/issue with rte_flow: an empty
> > > > pattern matches everything; adding items only narrows the selection.
> > > > As Andrew said there is currently no way to provide a specific
> > > > item to reject, it can only be done globally on a pattern through
> > > > INVERT that no
> > > PMD implements so far.
> > > >
> > > > So we have two requirements here: the ability to specifically
> > > > match
> > > > IPv6 fragment headers and the ability to reject them.
> > > >
> > > > To match IPv6 fragment headers, we need a dedicated pattern item.
> > > > The generic RTE_FLOW_ITEM_TYPE_IPV6_EXT is useless for that on its
> > > > own, it must be completed with
> RTE_FLOW_ITEM_TYPE_IPV6_EXT_FRAG
> > and
> > > associated
> > > > object
> > >
> > > Yes, we must add EXT_FRAG to be able to match on the FRAG bits.
> > >
> >
> > Please see previous RFC I sent.
> > [RFC] ethdev: add IPv6 fragment extension header item
> > http://mails.dpdk.org/archives/dev/2020-March/160255.html
> > It is complemented by this RFC.
> >
> > > > to match individual fields if needed (like all the others
> > > > protocols/headers).
> > > >
> > > > Then to reject a pattern item... My preference goes to a new "NOT"
> > > > meta item affecting the meaning of the item coming immediately
> > > > after in the pattern list. That would be ultra generic, wouldn't
> > > > break any ABI/API and like INVERT, wouldn't even require a new
> > > > object associated
> > > with it.
> > > >
> > > > To match UDPv6 traffic when there is no fragment header, one could
> > > > then do something like:
> > > >
> > > >  eth / ipv6 / not / ipv6_ext_frag / udp
> > > >
> > > > PMD support would be trivial to implement (I'm sure!)
> > > >
> > > I agree with you as I said above. The issue is not PMD, the issues are:
> > > 1. think about the rule you stated above from logic point there is
> > > some contradiction, you are saying ipv6 next proto udp but you also
> > > say not frag, this is logic only for IPV6 ext.
> > > 2. HW issue, I don't know of HW that knows how to support not on an
> item.
> > > So adding something for all items for only one case is overkill.
> > >
> > >
> > >
> > > > We may later implement other kinds of "operator" items as Andrew
> > > > suggested, for bit-wise stuff and so on. Let's keep adding
> > > > features on a needed basis though.
> > > >
> > > > --
> > > > Adrien Mazarguil
> > > > 6WIND
> > >
> > > Best,
> > > Ori

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  @ 2020-06-27  7:44  5%   ` Jerin Jacob
  2020-06-29 19:30  4%     ` McDaniel, Timothy
  2020-06-30 11:22  0%     ` Kinsella, Ray
  0 siblings, 2 replies; 200+ results
From: Jerin Jacob @ 2020-06-27  7:44 UTC (permalink / raw)
  To: Tim McDaniel, Ray Kinsella, Neil Horman
  Cc: Jerin Jacob, Mattias Rönnblom, dpdk-dev, Gage Eads,
	Van Haaren, Harry

> +
> +/** Event port configuration structure */
> +struct rte_event_port_conf_v20 {
> +       int32_t new_event_threshold;
> +       /**< A backpressure threshold for new event enqueues on this port.
> +        * Use for *closed system* event dev where event capacity is limited,
> +        * and cannot exceed the capacity of the event dev.
> +        * Configuring ports with different thresholds can make higher priority
> +        * traffic less likely to  be backpressured.
> +        * For example, a port used to inject NIC Rx packets into the event dev
> +        * can have a lower threshold so as not to overwhelm the device,
> +        * while ports used for worker pools can have a higher threshold.
> +        * This value cannot exceed the *nb_events_limit*
> +        * which was previously supplied to rte_event_dev_configure().
> +        * This should be set to '-1' for *open system*.
> +        */
> +       uint16_t dequeue_depth;
> +       /**< Configure number of bulk dequeues for this event port.
> +        * This value cannot exceed the *nb_event_port_dequeue_depth*
> +        * which previously supplied to rte_event_dev_configure().
> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
> +        */
> +       uint16_t enqueue_depth;
> +       /**< Configure number of bulk enqueues for this event port.
> +        * This value cannot exceed the *nb_event_port_enqueue_depth*
> +        * which previously supplied to rte_event_dev_configure().
> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
> +        */
>         uint8_t disable_implicit_release;
>         /**< Configure the port not to release outstanding events in
>          * rte_event_dev_dequeue_burst(). If true, all events received through
> @@ -733,6 +911,14 @@ struct rte_event_port_conf {
>  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
>                                 struct rte_event_port_conf *port_conf);
>
> +int
> +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
> +                               struct rte_event_port_conf_v20 *port_conf);
> +
> +int
> +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
> +                                     struct rte_event_port_conf *port_conf);

Hi Timothy,

+ ABI Maintainers (Ray, Neil)

# As per my understanding, structures cannot be versioned, only
functions can be versioned,
i.e. we cannot make any change to "struct rte_event_port_conf"
(see the versioning sketch at the end of this mail).

# We have a similar case with ethdev and it was deferred to the next release, v20.11:
http://patches.dpdk.org/patch/69113/

Regarding the API changes:
# The slow path changes in general look good to me. I will review the
next level in the coming days.
# The following fast path change bothers me. Could you share more
details on the change below?

diff --git a/app/test-eventdev/test_order_atq.c
b/app/test-eventdev/test_order_atq.c
index 3366cfc..8246b96 100644
--- a/app/test-eventdev/test_order_atq.c
+++ b/app/test-eventdev/test_order_atq.c
@@ -34,6 +34,8 @@
                        continue;
                }

+               ev.flow_id = ev.mbuf->udata64;
+
# Since RC1 is near, I am not sure how to accommodate the API changes
now and sort out the ABI stuff.
# Another concern is that the eventdev spec gets bloated with versioning files
just for ONE release, as 20.11 will be OK to change the ABI.
# While we discuss the API change, please send a deprecation notice for
the ABI change for 20.11,
so that there is no ambiguity about this patch for the 20.11 release.
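
For reference, a rough sketch of how such a _v20/_v21 pair is usually tied
together with the macros from rte_function_versioning.h; the version node
names (20.0, 21) are assumptions, the stub bodies stand in for the real
implementations, and none of this addresses the core issue that the structure
layout itself cannot be versioned:

#include <rte_function_versioning.h>

/* Applications linked against the old ABI keep resolving to the v20 symbol. */
int
rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
				    struct rte_event_port_conf_v20 *port_conf)
{
	/* ... fill the legacy layout ... */
	return 0;
}
VERSION_SYMBOL(rte_event_port_default_conf_get, _v20, 20.0);

/* Newly built applications bind to the v21 symbol by default. */
int
rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
				    struct rte_event_port_conf *port_conf)
{
	/* ... fill the new layout ... */
	return 0;
}
BIND_DEFAULT_SYMBOL(rte_event_port_default_conf_get, _v21, 21);
MAP_STATIC_SYMBOL(int rte_event_port_default_conf_get(uint8_t dev_id,
		uint8_t port_id, struct rte_event_port_conf *port_conf),
	rte_event_port_default_conf_get_v21);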

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [PATCH 08/27] event/dlb: add definitions shared with LKM or shared code
  2020-06-27  4:37  2% [dpdk-dev] [PATCH 00/27] event/dlb Intel DLB PMD Tim McDaniel
    2020-06-27  4:37  1% ` [dpdk-dev] [PATCH 03/27] event/dlb: add shared code version 10.7.9 Tim McDaniel
@ 2020-06-27  4:37  1% ` Tim McDaniel
  2 siblings, 0 replies; 200+ results
From: Tim McDaniel @ 2020-06-27  4:37 UTC (permalink / raw)
  To: jerinj
  Cc: mattias.ronnblom, dev, gage.eads, harry.van.haaren, McDaniel, Timothy

From: "McDaniel, Timothy" <timothy.mcdaniel@intel.com>

Signed-off-by: McDaniel, Timothy <timothy.mcdaniel@intel.com>
---
 drivers/event/dlb/dlb_user.h | 1083 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1083 insertions(+)
 create mode 100644 drivers/event/dlb/dlb_user.h

diff --git a/drivers/event/dlb/dlb_user.h b/drivers/event/dlb/dlb_user.h
new file mode 100644
index 0000000..73b601b
--- /dev/null
+++ b/drivers/event/dlb/dlb_user.h
@@ -0,0 +1,1083 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
+ * Copyright(c) 2016-2020 Intel Corporation
+ */
+
+#ifndef __DLB_USER_H
+#define __DLB_USER_H
+
+#define DLB_MAX_NAME_LEN 64
+
+#include <linux/types.h>
+
+enum dlb_error {
+	DLB_ST_SUCCESS = 0,
+	DLB_ST_NAME_EXISTS,
+	DLB_ST_DOMAIN_UNAVAILABLE,
+	DLB_ST_LDB_PORTS_UNAVAILABLE,
+	DLB_ST_DIR_PORTS_UNAVAILABLE,
+	DLB_ST_LDB_QUEUES_UNAVAILABLE,
+	DLB_ST_LDB_CREDITS_UNAVAILABLE,
+	DLB_ST_DIR_CREDITS_UNAVAILABLE,
+	DLB_ST_LDB_CREDIT_POOLS_UNAVAILABLE,
+	DLB_ST_DIR_CREDIT_POOLS_UNAVAILABLE,
+	DLB_ST_SEQUENCE_NUMBERS_UNAVAILABLE,
+	DLB_ST_INVALID_DOMAIN_ID,
+	DLB_ST_INVALID_QID_INFLIGHT_ALLOCATION,
+	DLB_ST_ATOMIC_INFLIGHTS_UNAVAILABLE,
+	DLB_ST_HIST_LIST_ENTRIES_UNAVAILABLE,
+	DLB_ST_INVALID_LDB_CREDIT_POOL_ID,
+	DLB_ST_INVALID_DIR_CREDIT_POOL_ID,
+	DLB_ST_INVALID_POP_COUNT_VIRT_ADDR,
+	DLB_ST_INVALID_LDB_QUEUE_ID,
+	DLB_ST_INVALID_CQ_DEPTH,
+	DLB_ST_INVALID_CQ_VIRT_ADDR,
+	DLB_ST_INVALID_PORT_ID,
+	DLB_ST_INVALID_QID,
+	DLB_ST_INVALID_PRIORITY,
+	DLB_ST_NO_QID_SLOTS_AVAILABLE,
+	DLB_ST_QED_FREELIST_ENTRIES_UNAVAILABLE,
+	DLB_ST_DQED_FREELIST_ENTRIES_UNAVAILABLE,
+	DLB_ST_INVALID_DIR_QUEUE_ID,
+	DLB_ST_DIR_QUEUES_UNAVAILABLE,
+	DLB_ST_INVALID_LDB_CREDIT_LOW_WATERMARK,
+	DLB_ST_INVALID_LDB_CREDIT_QUANTUM,
+	DLB_ST_INVALID_DIR_CREDIT_LOW_WATERMARK,
+	DLB_ST_INVALID_DIR_CREDIT_QUANTUM,
+	DLB_ST_DOMAIN_NOT_CONFIGURED,
+	DLB_ST_PID_ALREADY_ATTACHED,
+	DLB_ST_PID_NOT_ATTACHED,
+	DLB_ST_INTERNAL_ERROR,
+	DLB_ST_DOMAIN_IN_USE,
+	DLB_ST_IOMMU_MAPPING_ERROR,
+	DLB_ST_FAIL_TO_PIN_MEMORY_PAGE,
+	DLB_ST_UNABLE_TO_PIN_POPCOUNT_PAGES,
+	DLB_ST_UNABLE_TO_PIN_CQ_PAGES,
+	DLB_ST_DISCONTIGUOUS_CQ_MEMORY,
+	DLB_ST_DISCONTIGUOUS_POP_COUNT_MEMORY,
+	DLB_ST_DOMAIN_STARTED,
+	DLB_ST_LARGE_POOL_NOT_SPECIFIED,
+	DLB_ST_SMALL_POOL_NOT_SPECIFIED,
+	DLB_ST_NEITHER_POOL_SPECIFIED,
+	DLB_ST_DOMAIN_NOT_STARTED,
+	DLB_ST_INVALID_MEASUREMENT_DURATION,
+	DLB_ST_INVALID_PERF_METRIC_GROUP_ID,
+	DLB_ST_LDB_PORT_REQUIRED_FOR_LDB_QUEUES,
+	DLB_ST_DOMAIN_RESET_FAILED,
+	DLB_ST_MBOX_ERROR,
+	DLB_ST_INVALID_HIST_LIST_DEPTH,
+	DLB_ST_NO_MEMORY,
+};
+
+static const char dlb_error_strings[][128] = {
+	"DLB_ST_SUCCESS",
+	"DLB_ST_NAME_EXISTS",
+	"DLB_ST_DOMAIN_UNAVAILABLE",
+	"DLB_ST_LDB_PORTS_UNAVAILABLE",
+	"DLB_ST_DIR_PORTS_UNAVAILABLE",
+	"DLB_ST_LDB_QUEUES_UNAVAILABLE",
+	"DLB_ST_LDB_CREDITS_UNAVAILABLE",
+	"DLB_ST_DIR_CREDITS_UNAVAILABLE",
+	"DLB_ST_LDB_CREDIT_POOLS_UNAVAILABLE",
+	"DLB_ST_DIR_CREDIT_POOLS_UNAVAILABLE",
+	"DLB_ST_SEQUENCE_NUMBERS_UNAVAILABLE",
+	"DLB_ST_INVALID_DOMAIN_ID",
+	"DLB_ST_INVALID_QID_INFLIGHT_ALLOCATION",
+	"DLB_ST_ATOMIC_INFLIGHTS_UNAVAILABLE",
+	"DLB_ST_HIST_LIST_ENTRIES_UNAVAILABLE",
+	"DLB_ST_INVALID_LDB_CREDIT_POOL_ID",
+	"DLB_ST_INVALID_DIR_CREDIT_POOL_ID",
+	"DLB_ST_INVALID_POP_COUNT_VIRT_ADDR",
+	"DLB_ST_INVALID_LDB_QUEUE_ID",
+	"DLB_ST_INVALID_CQ_DEPTH",
+	"DLB_ST_INVALID_CQ_VIRT_ADDR",
+	"DLB_ST_INVALID_PORT_ID",
+	"DLB_ST_INVALID_QID",
+	"DLB_ST_INVALID_PRIORITY",
+	"DLB_ST_NO_QID_SLOTS_AVAILABLE",
+	"DLB_ST_QED_FREELIST_ENTRIES_UNAVAILABLE",
+	"DLB_ST_DQED_FREELIST_ENTRIES_UNAVAILABLE",
+	"DLB_ST_INVALID_DIR_QUEUE_ID",
+	"DLB_ST_DIR_QUEUES_UNAVAILABLE",
+	"DLB_ST_INVALID_LDB_CREDIT_LOW_WATERMARK",
+	"DLB_ST_INVALID_LDB_CREDIT_QUANTUM",
+	"DLB_ST_INVALID_DIR_CREDIT_LOW_WATERMARK",
+	"DLB_ST_INVALID_DIR_CREDIT_QUANTUM",
+	"DLB_ST_DOMAIN_NOT_CONFIGURED",
+	"DLB_ST_PID_ALREADY_ATTACHED",
+	"DLB_ST_PID_NOT_ATTACHED",
+	"DLB_ST_INTERNAL_ERROR",
+	"DLB_ST_DOMAIN_IN_USE",
+	"DLB_ST_IOMMU_MAPPING_ERROR",
+	"DLB_ST_FAIL_TO_PIN_MEMORY_PAGE",
+	"DLB_ST_UNABLE_TO_PIN_POPCOUNT_PAGES",
+	"DLB_ST_UNABLE_TO_PIN_CQ_PAGES",
+	"DLB_ST_DISCONTIGUOUS_CQ_MEMORY",
+	"DLB_ST_DISCONTIGUOUS_POP_COUNT_MEMORY",
+	"DLB_ST_DOMAIN_STARTED",
+	"DLB_ST_LARGE_POOL_NOT_SPECIFIED",
+	"DLB_ST_SMALL_POOL_NOT_SPECIFIED",
+	"DLB_ST_NEITHER_POOL_SPECIFIED",
+	"DLB_ST_DOMAIN_NOT_STARTED",
+	"DLB_ST_INVALID_MEASUREMENT_DURATION",
+	"DLB_ST_INVALID_PERF_METRIC_GROUP_ID",
+	"DLB_ST_LDB_PORT_REQUIRED_FOR_LDB_QUEUES",
+	"DLB_ST_DOMAIN_RESET_FAILED",
+	"DLB_ST_MBOX_ERROR",
+	"DLB_ST_INVALID_HIST_LIST_DEPTH",
+	"DLB_ST_NO_MEMORY",
+};
+
+struct dlb_cmd_response {
+	__u32 status; /* Interpret using enum dlb_error */
+	__u32 id;
+};
+
+/******************************/
+/* 'dlb' device file commands */
+/******************************/
+
+#define DLB_DEVICE_VERSION(x) (((x) >> 8) & 0xFF)
+#define DLB_DEVICE_REVISION(x) ((x) & 0xFF)
+
+enum dlb_revisions {
+	DLB_REV_A0 = 0,
+	DLB_REV_A1 = 1,
+	DLB_REV_A2 = 2,
+	DLB_REV_A3 = 3,
+	DLB_REV_B0 = 4,
+};
+
+/*
+ * DLB_CMD_GET_DEVICE_VERSION: Query the DLB device version.
+ *
+ *	This ioctl interface is the same in all driver versions and is always
+ *	the first ioctl.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id[7:0]: Device revision.
+ *	response.id[15:8]: Device version.
+ */
+
+struct dlb_get_device_version_args {
+	/* Output parameters */
+	__u64 response;
+};
+
+#define DLB_VERSION_MAJOR_NUMBER 10
+#define DLB_VERSION_MINOR_NUMBER 7
+#define DLB_VERSION_REVISION_NUMBER 9
+#define DLB_VERSION (DLB_VERSION_MAJOR_NUMBER << 24 | \
+		     DLB_VERSION_MINOR_NUMBER << 16 | \
+		     DLB_VERSION_REVISION_NUMBER)
+
+#define DLB_VERSION_GET_MAJOR_NUMBER(x) (((x) >> 24) & 0xFF)
+#define DLB_VERSION_GET_MINOR_NUMBER(x) (((x) >> 16) & 0xFF)
+#define DLB_VERSION_GET_REVISION_NUMBER(x) ((x) & 0xFFFF)
+
+static inline __u8 dlb_version_incompatible(__u32 version)
+{
+	__u8 inc;
+
+	inc = DLB_VERSION_GET_MAJOR_NUMBER(version) != DLB_VERSION_MAJOR_NUMBER;
+	inc |= (int)DLB_VERSION_GET_MINOR_NUMBER(version) <
+		DLB_VERSION_MINOR_NUMBER;
+
+	return inc;
+}
+
+/*
+ * DLB_CMD_GET_DRIVER_VERSION: Query the DLB driver version. The major number
+ *	is changed when there is an ABI-breaking change, the minor number is
+ *	changed if the API is changed in a backwards-compatible way, and the
+ *	revision number is changed for fixes that don't affect the API.
+ *
+ *	If the kernel driver's API version major number and the header's
+ *	DLB_VERSION_MAJOR_NUMBER differ, the two are incompatible, or if the
+ *	major numbers match but the kernel driver's minor number is less than
+ *	the header file's, they are incompatible. The dlb_version_incompatible()
+ *	helper should be used to check for compatibility.
+ *
+ *	This ioctl interface is the same in all driver versions. Applications
+ *	should check the driver version before performing any other ioctl
+ *	operations.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: Driver API version. Use the DLB_VERSION_GET_MAJOR_NUMBER,
+ *		DLB_VERSION_GET_MINOR_NUMBER, and
+ *		DLB_VERSION_GET_REVISION_NUMBER macros to interpret the field.
+ */
+
+struct dlb_get_driver_version_args {
+	/* Output parameters */
+	__u64 response;
+};
+
+/*
+ * DLB_CMD_CREATE_SCHED_DOMAIN: Create a DLB scheduling domain and reserve the
+ *	resources (queues, ports, etc.) that it contains.
+ *
+ * Input parameters:
+ * - num_ldb_queues: Number of load-balanced queues.
+ * - num_ldb_ports: Number of load-balanced ports.
+ * - num_dir_ports: Number of directed ports. A directed port has one directed
+ *	queue, so no num_dir_queues argument is necessary.
+ * - num_atomic_inflights: This specifies the amount of temporary atomic QE
+ *	storage for the domain. This storage is divided among the domain's
+ *	load-balanced queues that are configured for atomic scheduling.
+ * - num_hist_list_entries: Amount of history list storage. This is divided
+ *	among the domain's CQs.
+ * - num_ldb_credits: Amount of load-balanced QE storage (QED). QEs occupy this
+ *	space until they are scheduled to a load-balanced CQ. One credit
+ *	represents the storage for one QE.
+ * - num_dir_credits: Amount of directed QE storage (DQED). QEs occupy this
+ *	space until they are scheduled to a directed CQ. One credit represents
+ *	the storage for one QE.
+ * - num_ldb_credit_pools: Number of pools into which the load-balanced credits
+ *	are placed.
+ * - num_dir_credit_pools: Number of pools into which the directed credits are
+ *	placed.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: domain ID.
+ */
+struct dlb_create_sched_domain_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 num_ldb_queues;
+	__u32 num_ldb_ports;
+	__u32 num_dir_ports;
+	__u32 num_atomic_inflights;
+	__u32 num_hist_list_entries;
+	__u32 num_ldb_credits;
+	__u32 num_dir_credits;
+	__u32 num_ldb_credit_pools;
+	__u32 num_dir_credit_pools;
+};
+
+/*
+ * DLB_CMD_GET_NUM_RESOURCES: Return the number of available resources
+ *	(queues, ports, etc.) that this device owns.
+ *
+ * Output parameters:
+ * - num_domains: Number of available scheduling domains.
+ * - num_ldb_queues: Number of available load-balanced queues.
+ * - num_ldb_ports: Number of available load-balanced ports.
+ * - num_dir_ports: Number of available directed ports. There is one directed
+ *	queue for every directed port.
+ * - num_atomic_inflights: Amount of available temporary atomic QE storage.
+ * - max_contiguous_atomic_inflights: When a domain is created, the temporary
+ *	atomic QE storage is allocated in a contiguous chunk. This return value
+ *	is the longest available contiguous range of atomic QE storage.
+ * - num_hist_list_entries: Amount of history list storage.
+ * - max_contiguous_hist_list_entries: History list storage is allocated in
+ *	a contiguous chunk, and this return value is the longest available
+ *	contiguous range of history list entries.
+ * - num_ldb_credits: Amount of available load-balanced QE storage.
+ * - max_contiguous_ldb_credits: QED storage is allocated in a contiguous
+ *	chunk, and this return value is the longest available contiguous range
+ *	of load-balanced credit storage.
+ * - num_dir_credits: Amount of available directed QE storage.
+ * - max_contiguous_dir_credits: DQED storage is allocated in a contiguous
+ *	chunk, and this return value is the longest available contiguous range
+ *	of directed credit storage.
+ * - num_ldb_credit_pools: Number of available load-balanced credit pools.
+ * - num_dir_credit_pools: Number of available directed credit pools.
+ * - padding0: Reserved for future use.
+ */
+struct dlb_get_num_resources_args {
+	/* Output parameters */
+	__u32 num_sched_domains;
+	__u32 num_ldb_queues;
+	__u32 num_ldb_ports;
+	__u32 num_dir_ports;
+	__u32 num_atomic_inflights;
+	__u32 max_contiguous_atomic_inflights;
+	__u32 num_hist_list_entries;
+	__u32 max_contiguous_hist_list_entries;
+	__u32 num_ldb_credits;
+	__u32 max_contiguous_ldb_credits;
+	__u32 num_dir_credits;
+	__u32 max_contiguous_dir_credits;
+	__u32 num_ldb_credit_pools;
+	__u32 num_dir_credit_pools;
+	__u32 padding0;
+};
+
+/*
+ * DLB_CMD_SET_SN_ALLOCATION: Configure a sequence number group
+ *
+ * Input parameters:
+ * - group: Sequence number group ID.
+ * - num: Number of sequence numbers per queue.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ */
+struct dlb_set_sn_allocation_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 group;
+	__u32 num;
+};
+
+/*
+ * DLB_CMD_GET_SN_ALLOCATION: Get a sequence number group's configuration
+ *
+ * Input parameters:
+ * - group: Sequence number group ID.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: Specified group's number of sequence numbers per queue.
+ */
+struct dlb_get_sn_allocation_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 group;
+	__u32 padding0;
+};
+
+enum dlb_cq_poll_modes {
+	DLB_CQ_POLL_MODE_STD,
+	DLB_CQ_POLL_MODE_SPARSE,
+
+	/* NUM_DLB_CQ_POLL_MODE must be last */
+	NUM_DLB_CQ_POLL_MODE,
+};
+
+/*
+ * DLB_CMD_QUERY_CQ_POLL_MODE: Query the CQ poll mode the kernel driver is using
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: CQ poll mode (see enum dlb_cq_poll_modes).
+ */
+struct dlb_query_cq_poll_mode_args {
+	/* Output parameters */
+	__u64 response;
+};
+
+/*
+ * DLB_CMD_GET_SN_OCCUPANCY: Get a sequence number group's occupancy
+ *
+ * Each sequence number group has one or more slots, depending on its
+ * configuration. I.e.:
+ * - If configured for 1024 sequence numbers per queue, the group has 1 slot
+ * - If configured for 512 sequence numbers per queue, the group has 2 slots
+ *   ...
+ * - If configured for 32 sequence numbers per queue, the group has 32 slots
+ *
+ * This ioctl returns the group's number of in-use slots. If its occupancy is
+ * 0, the group's sequence number allocation can be reconfigured.
+ *
+ * Input parameters:
+ * - group: Sequence number group ID.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: Specified group's number of used slots.
+ */
+struct dlb_get_sn_occupancy_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 group;
+	__u32 padding0;
+};
+
+enum dlb_user_interface_commands {
+	DLB_CMD_GET_DEVICE_VERSION,
+	DLB_CMD_CREATE_SCHED_DOMAIN,
+	DLB_CMD_GET_NUM_RESOURCES,
+	DLB_CMD_GET_DRIVER_VERSION,
+	DLB_CMD_SAMPLE_PERF_COUNTERS,
+	DLB_CMD_SET_SN_ALLOCATION,
+	DLB_CMD_GET_SN_ALLOCATION,
+	DLB_CMD_MEASURE_SCHED_COUNTS,
+	DLB_CMD_QUERY_CQ_POLL_MODE,
+	DLB_CMD_GET_SN_OCCUPANCY,
+
+	/* NUM_DLB_CMD must be last */
+	NUM_DLB_CMD,
+};
+
+/*******************************/
+/* 'domain' device file alerts */
+/*******************************/
+
+/* Scheduling domain device files can be read to receive domain-specific
+ * notifications, for alerts such as hardware errors.
+ *
+ * Each alert is encoded in a 16B message. The first 8B contains the alert ID,
+ * and the second 8B is optional and contains additional information.
+ * Applications should cast read data to a struct dlb_domain_alert, and
+ * interpret the struct's alert_id according to dlb_domain_alert_id. The read
+ * length must be 16B, or the function will return -EINVAL.
+ *
+ * Reads are destructive, and in the case of multiple file descriptors for the
+ * same domain device file, an alert will be read by only one of the file
+ * descriptors.
+ *
+ * The driver stores alerts in a fixed-size alert ring until they are read. If
+ * the alert ring fills completely, subsequent alerts will be dropped. It is
+ * recommended that DLB applications dedicate a thread to perform blocking
+ * reads on the device file.
+ */
+enum dlb_domain_alert_id {
+	/* A destination domain queue that this domain connected to has
+	 * unregistered, and can no longer be sent to. The aux alert data
+	 * contains the queue ID.
+	 */
+	DLB_DOMAIN_ALERT_REMOTE_QUEUE_UNREGISTER,
+	/* A producer port in this domain attempted to send a QE without a
+	 * credit. aux_alert_data[7:0] contains the port ID, and
+	 * aux_alert_data[15:8] contains a flag indicating whether the port is
+	 * load-balanced (1) or directed (0).
+	 */
+	DLB_DOMAIN_ALERT_PP_OUT_OF_CREDITS,
+	/* Software issued an illegal enqueue for a port in this domain. An
+	 * illegal enqueue could be:
+	 * - Illegal (excess) completion
+	 * - Illegal fragment
+	 * - Illegal enqueue command
+	 * aux_alert_data[7:0] contains the port ID, and aux_alert_data[15:8]
+	 * contains a flag indicating whether the port is load-balanced (1) or
+	 * directed (0).
+	 */
+	DLB_DOMAIN_ALERT_PP_ILLEGAL_ENQ,
+	/* Software issued excess CQ token pops for a port in this domain.
+	 * aux_alert_data[7:0] contains the port ID, and aux_alert_data[15:8]
+	 * contains a flag indicating whether the port is load-balanced (1) or
+	 * directed (0).
+	 */
+	DLB_DOMAIN_ALERT_PP_EXCESS_TOKEN_POPS,
+	/* An enqueue contained either an invalid command encoding or a REL,
+	 * REL_T, RLS, FWD, FWD_T, FRAG, or FRAG_T from a directed port.
+	 *
+	 * aux_alert_data[7:0] contains the port ID, and aux_alert_data[15:8]
+	 * contains a flag indicating whether the port is load-balanced (1) or
+	 * directed (0).
+	 */
+	DLB_DOMAIN_ALERT_ILLEGAL_HCW,
+	/* The QID must be valid and less than 128.
+	 *
+	 * aux_alert_data[7:0] contains the port ID, and aux_alert_data[15:8]
+	 * contains a flag indicating whether the port is load-balanced (1) or
+	 * directed (0).
+	 */
+	DLB_DOMAIN_ALERT_ILLEGAL_QID,
+	/* An enqueue went to a disabled QID.
+	 *
+	 * aux_alert_data[7:0] contains the port ID, and aux_alert_data[15:8]
+	 * contains a flag indicating whether the port is load-balanced (1) or
+	 * directed (0).
+	 */
+	DLB_DOMAIN_ALERT_DISABLED_QID,
+	/* The device containing this domain was reset. All applications using
+	 * the device need to exit for the driver to complete the reset
+	 * procedure.
+	 *
+	 * aux_alert_data doesn't contain any information for this alert.
+	 */
+	DLB_DOMAIN_ALERT_DEVICE_RESET,
+	/* User-space has enqueued an alert.
+	 *
+	 * aux_alert_data contains user-provided data.
+	 */
+	DLB_DOMAIN_ALERT_USER,
+
+	/* Number of DLB domain alerts */
+	NUM_DLB_DOMAIN_ALERTS
+};
+
+static const char dlb_domain_alert_strings[][128] = {
+	"DLB_DOMAIN_ALERT_REMOTE_QUEUE_UNREGISTER",
+	"DLB_DOMAIN_ALERT_PP_OUT_OF_CREDITS",
+	"DLB_DOMAIN_ALERT_PP_ILLEGAL_ENQ",
+	"DLB_DOMAIN_ALERT_PP_EXCESS_TOKEN_POPS",
+	"DLB_DOMAIN_ALERT_ILLEGAL_HCW",
+	"DLB_DOMAIN_ALERT_ILLEGAL_QID",
+	"DLB_DOMAIN_ALERT_DISABLED_QID",
+	"DLB_DOMAIN_ALERT_DEVICE_RESET",
+	"DLB_DOMAIN_ALERT_USER",
+};
+
+struct dlb_domain_alert {
+	__u64 alert_id;
+	__u64 aux_alert_data;
+};
+
+/*********************************/
+/* 'domain' device file commands */
+/*********************************/
+
+/*
+ * DLB_DOMAIN_CMD_CREATE_LDB_POOL: Configure a load-balanced credit pool.
+ * Input parameters:
+ * - num_ldb_credits: Number of load-balanced credits (QED space) for this
+ *	pool.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: pool ID.
+ */
+struct dlb_create_ldb_pool_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 num_ldb_credits;
+	__u32 padding0;
+};
+
+/*
+ * DLB_DOMAIN_CMD_CREATE_DIR_POOL: Configure a directed credit pool.
+ * Input parameters:
+ * - num_dir_credits: Number of directed credits (DQED space) for this pool.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: Pool ID.
+ */
+struct dlb_create_dir_pool_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 num_dir_credits;
+	__u32 padding0;
+};
+
+/*
+ * DLB_DOMAIN_CMD_CREATE_LDB_QUEUE: Configure a load-balanced queue.
+ * Input parameters:
+ * - num_atomic_inflights: This specifies the amount of temporary atomic QE
+ *	storage for this queue. If zero, the queue will not support atomic
+ *	scheduling.
+ * - num_sequence_numbers: This specifies the number of sequence numbers used
+ *	by this queue. If zero, the queue will not support ordered scheduling.
+ *	If non-zero, the queue will not support unordered scheduling.
+ * - num_qid_inflights: The maximum number of QEs that can be inflight
+ *	(scheduled to a CQ but not completed) at any time. If
+ *	num_sequence_numbers is non-zero, num_qid_inflights must be set equal
+ *	to num_sequence_numbers.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: Queue ID.
+ */
+struct dlb_create_ldb_queue_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 num_sequence_numbers;
+	__u32 num_qid_inflights;
+	__u32 num_atomic_inflights;
+	__u32 padding0;
+};
+
+/*
+ * DLB_DOMAIN_CMD_CREATE_DIR_QUEUE: Configure a directed queue.
+ * Input parameters:
+ * - port_id: Port ID. If the corresponding directed port is already created,
+ *	specify its ID here. Else this argument must be 0xFFFFFFFF to indicate
+ *	that the queue is being created before the port.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: Queue ID.
+ */
+struct dlb_create_dir_queue_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__s32 port_id;
+	__u32 padding0;
+};
+
+/*
+ * DLB_DOMAIN_CMD_CREATE_LDB_PORT: Configure a load-balanced port.
+ * Input parameters:
+ * - ldb_credit_pool_id: Load-balanced credit pool this port will belong to.
+ * - dir_credit_pool_id: Directed credit pool this port will belong to.
+ * - ldb_credit_high_watermark: Number of load-balanced credits from the pool
+ *	that this port will own.
+ *
+ *	If this port's scheduling domain doesn't have any load-balanced queues,
+ *	this argument is ignored and the port is given no load-balanced
+ *	credits.
+ * - dir_credit_high_watermark: Number of directed credits from the pool that
+ *	this port will own.
+ *
+ *	If this port's scheduling domain doesn't have any directed queues,
+ *	this argument is ignored and the port is given no directed credits.
+ * - ldb_credit_low_watermark: Load-balanced credit low watermark. When the
+ *	port's credits reach this watermark, they become eligible to be
+ *	refilled by the DLB as credits until the high watermark
+ *	(num_ldb_credits) is reached.
+ *
+ *	If this port's scheduling domain doesn't have any load-balanced queues,
+ *	this argument is ignored and the port is given no load-balanced
+ *	credits.
+ * - dir_credit_low_watermark: Directed credit low watermark. When the port's
+ *	credits reach this watermark, they become eligible to be refilled by
+ *	the DLB as credits until the high watermark (num_dir_credits) is
+ *	reached.
+ *
+ *	If this port's scheduling domain doesn't have any directed queues,
+ *	this argument is ignored and the port is given no directed credits.
+ * - ldb_credit_quantum: Number of load-balanced credits for the DLB to refill
+ *	per refill operation.
+ *
+ *	If this port's scheduling domain doesn't have any load-balanced queues,
+ *	this argument is ignored and the port is given no load-balanced
+ *	credits.
+ * - dir_credit_quantum: Number of directed credits for the DLB to refill per
+ *	refill operation.
+ *
+ *	If this port's scheduling domain doesn't have any directed queues,
+ *	this argument is ignored and the port is given no directed credits.
+ * - padding0: Reserved for future use.
+ * - cq_depth: Depth of the port's CQ. Must be a power-of-two between 8 and
+ *	1024, inclusive.
+ * - cq_depth_threshold: CQ depth interrupt threshold. A value of N means that
+ *	the CQ interrupt won't fire until there are N or more outstanding CQ
+ *	tokens.
+ * - cq_history_list_size: Number of history list entries. This must be greater
+ *	than or equal to cq_depth.
+ * - padding1: Reserved for future use.
+ * - padding2: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: port ID.
+ */
+struct dlb_create_ldb_port_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 ldb_credit_pool_id;
+	__u32 dir_credit_pool_id;
+	__u16 ldb_credit_high_watermark;
+	__u16 ldb_credit_low_watermark;
+	__u16 ldb_credit_quantum;
+	__u16 dir_credit_high_watermark;
+	__u16 dir_credit_low_watermark;
+	__u16 dir_credit_quantum;
+	__u16 padding0;
+	__u16 cq_depth;
+	__u16 cq_depth_threshold;
+	__u16 cq_history_list_size;
+	__u32 padding1;
+};
+
+/*
+ * DLB_DOMAIN_CMD_CREATE_DIR_PORT: Configure a directed port.
+ * Input parameters:
+ * - ldb_credit_pool_id: Load-balanced credit pool this port will belong to.
+ * - dir_credit_pool_id: Directed credit pool this port will belong to.
+ * - ldb_credit_high_watermark: Number of load-balanced credits from the pool
+ *	that this port will own.
+ *
+ *	If this port's scheduling domain doesn't have any load-balanced queues,
+ *	this argument is ignored and the port is given no load-balanced
+ *	credits.
+ * - dir_credit_high_watermark: Number of directed credits from the pool that
+ *	this port will own.
+ * - ldb_credit_low_watermark: Load-balanced credit low watermark. When the
+ *	port's credits reach this watermark, they become eligible to be
+ *	refilled by the DLB as credits until the high watermark
+ *	(num_ldb_credits) is reached.
+ *
+ *	If this port's scheduling domain doesn't have any load-balanced queues,
+ *	this argument is ignored and the port is given no load-balanced
+ *	credits.
+ * - dir_credit_low_watermark: Directed credit low watermark. When the port's
+ *	credits reach this watermark, they become eligible to be refilled by
+ *	the DLB as credits until the high watermark (num_dir_credits) is
+ *	reached.
+ * - ldb_credit_quantum: Number of load-balanced credits for the DLB to refill
+ *	per refill operation.
+ *
+ *	If this port's scheduling domain doesn't have any load-balanced queues,
+ *	this argument is ignored and the port is given no load-balanced
+ *	credits.
+ * - dir_credit_quantum: Number of directed credits for the DLB to refill per
+ *	refill operation.
+ * - cq_depth: Depth of the port's CQ. Must be a power-of-two between 8 and
+ *	1024, inclusive.
+ * - cq_depth_threshold: CQ depth interrupt threshold. A value of N means that
+ *	the CQ interrupt won't fire until there are N or more outstanding CQ
+ *	tokens.
+ * - qid: Queue ID. If the corresponding directed queue is already created,
+ *	specify its ID here. Else this argument must be 0xFFFFFFFF to indicate
+ *	that the port is being created before the queue.
+ * - padding1: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: Port ID.
+ */
+struct dlb_create_dir_port_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 ldb_credit_pool_id;
+	__u32 dir_credit_pool_id;
+	__u16 ldb_credit_high_watermark;
+	__u16 ldb_credit_low_watermark;
+	__u16 ldb_credit_quantum;
+	__u16 dir_credit_high_watermark;
+	__u16 dir_credit_low_watermark;
+	__u16 dir_credit_quantum;
+	__u16 cq_depth;
+	__u16 cq_depth_threshold;
+	__s32 queue_id;
+	__u32 padding1;
+};
+
+/*
+ * DLB_DOMAIN_CMD_START_DOMAIN: Mark the end of the domain configuration. This
+ *	must be called before passing QEs into the device, and no configuration
+ *	ioctls can be issued once the domain has started. Sending QEs into the
+ *	device before calling this ioctl will result in undefined behavior.
+ * Input parameters:
+ * - (None)
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ */
+struct dlb_start_domain_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+};
+
+/*
+ * DLB_DOMAIN_CMD_MAP_QID: Map a load-balanced queue to a load-balanced port.
+ * Input parameters:
+ * - port_id: Load-balanced port ID.
+ * - qid: Load-balanced queue ID.
+ * - priority: Queue->port service priority.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ */
+struct dlb_map_qid_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 port_id;
+	__u32 qid;
+	__u32 priority;
+	__u32 padding0;
+};
+
+/*
+ * DLB_DOMAIN_CMD_UNMAP_QID: Unmap a load-balanced queue to a load-balanced
+ *	port.
+ * Input parameters:
+ * - port_id: Load-balanced port ID.
+ * - qid: Load-balanced queue ID.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ */
+struct dlb_unmap_qid_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 port_id;
+	__u32 qid;
+};
+
+/*
+ * DLB_DOMAIN_CMD_ENABLE_LDB_PORT: Enable scheduling to a load-balanced port.
+ * Input parameters:
+ * - port_id: Load-balanced port ID.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ */
+struct dlb_enable_ldb_port_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 port_id;
+	__u32 padding0;
+};
+
+/*
+ * DLB_DOMAIN_CMD_ENABLE_DIR_PORT: Enable scheduling to a directed port.
+ * Input parameters:
+ * - port_id: Directed port ID.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ */
+struct dlb_enable_dir_port_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 port_id;
+};
+
+/*
+ * DLB_DOMAIN_CMD_DISABLE_LDB_PORT: Disable scheduling to a load-balanced port.
+ * Input parameters:
+ * - port_id: Load-balanced port ID.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ */
+struct dlb_disable_ldb_port_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 port_id;
+	__u32 padding0;
+};
+
+/*
+ * DLB_DOMAIN_CMD_DISABLE_DIR_PORT: Disable scheduling to a directed port.
+ * Input parameters:
+ * - port_id: Directed port ID.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ */
+struct dlb_disable_dir_port_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 port_id;
+	__u32 padding0;
+};
+
+/*
+ * DLB_DOMAIN_CMD_BLOCK_ON_CQ_INTERRUPT: Block on a CQ interrupt until a QE
+ *	arrives for the specified port. If a QE is already present, the ioctl
+ *	will immediately return.
+ *
+ *	Note: Only one thread can block on a CQ's interrupt at a time. Doing
+ *	otherwise can result in hung threads.
+ *
+ * Input parameters:
+ * - port_id: Port ID.
+ * - is_ldb: True if the port is load-balanced, false otherwise.
+ * - arm: Tell the driver to arm the interrupt.
+ * - cq_gen: Current CQ generation bit.
+ * - padding0: Reserved for future use.
+ * - cq_va: VA of the CQ entry where the next QE will be placed.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ */
+struct dlb_block_on_cq_interrupt_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 port_id;
+	__u8 is_ldb;
+	__u8 arm;
+	__u8 cq_gen;
+	__u8 padding0;
+	__u64 cq_va;
+};
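
Putting the fields together, a waiter typically points cq_va at the next
CQ slot it expects to be written, passes the current generation bit, and
asks the driver to arm the interrupt; the ioctl returns as soon as a QE
is (or already was) available. A rough sketch, with
DLB_IOC_BLOCK_ON_CQ_INTERRUPT as a hypothetical ioctl number and the CQ
bookkeeping (cq_base, cq_idx, cq_gen, port_id, domain_fd) supplied by
the caller:

struct dlb_cmd_response resp = {0};
struct dlb_block_on_cq_interrupt_args args = {0};

args.response = (uintptr_t)&resp;
args.port_id = port_id;
args.is_ldb = 1;			/* load-balanced port */
args.arm = 1;				/* re-arm the CQ interrupt */
args.cq_gen = cq_gen;			/* current generation bit */
args.cq_va = (uintptr_t)&cq_base[cq_idx];	/* next expected CQ entry */

/* Blocks until a QE arrives (returns immediately if one is present) */
ioctl(domain_fd, DLB_IOC_BLOCK_ON_CQ_INTERRUPT, &args);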
+
+/*
+ * DLB_DOMAIN_CMD_ENQUEUE_DOMAIN_ALERT: Enqueue a domain alert that will be
+ *	read by one reader thread.
+ *
+ * Input parameters:
+ * - aux_alert_data: user-defined auxiliary data.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ */
+struct dlb_enqueue_domain_alert_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u64 aux_alert_data;
+};
+
+/*
+ * DLB_DOMAIN_CMD_GET_LDB_QUEUE_DEPTH: Get a load-balanced queue's depth.
+ * Input parameters:
+ * - queue_id: The load-balanced queue ID.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: queue depth.
+ */
+struct dlb_get_ldb_queue_depth_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 queue_id;
+	__u32 padding0;
+};
+
+/*
+ * DLB_DOMAIN_CMD_GET_DIR_QUEUE_DEPTH: Get a directed queue's depth.
+ * Input parameters:
+ * - queue_id: The directed queue ID.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: queue depth.
+ */
+struct dlb_get_dir_queue_depth_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 queue_id;
+	__u32 padding0;
+};
+
+/*
+ * DLB_DOMAIN_CMD_PENDING_PORT_UNMAPS: Get number of queue unmap operations in
+ *	progress for a load-balanced port.
+ *
+ *	Note: This is a snapshot; the number of unmap operations in progress
+ *	is subject to change at any time.
+ *
+ * Input parameters:
+ * - port_id: Load-balanced port ID.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: number of unmaps in progress.
+ */
+struct dlb_pending_port_unmaps_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 port_id;
+	__u32 padding0;
+};
+
+enum dlb_domain_user_interface_commands {
+	DLB_DOMAIN_CMD_CREATE_LDB_POOL,
+	DLB_DOMAIN_CMD_CREATE_DIR_POOL,
+	DLB_DOMAIN_CMD_CREATE_LDB_QUEUE,
+	DLB_DOMAIN_CMD_CREATE_DIR_QUEUE,
+	DLB_DOMAIN_CMD_CREATE_LDB_PORT,
+	DLB_DOMAIN_CMD_CREATE_DIR_PORT,
+	DLB_DOMAIN_CMD_START_DOMAIN,
+	DLB_DOMAIN_CMD_MAP_QID,
+	DLB_DOMAIN_CMD_UNMAP_QID,
+	DLB_DOMAIN_CMD_ENABLE_LDB_PORT,
+	DLB_DOMAIN_CMD_ENABLE_DIR_PORT,
+	DLB_DOMAIN_CMD_DISABLE_LDB_PORT,
+	DLB_DOMAIN_CMD_DISABLE_DIR_PORT,
+	DLB_DOMAIN_CMD_BLOCK_ON_CQ_INTERRUPT,
+	DLB_DOMAIN_CMD_ENQUEUE_DOMAIN_ALERT,
+	DLB_DOMAIN_CMD_GET_LDB_QUEUE_DEPTH,
+	DLB_DOMAIN_CMD_GET_DIR_QUEUE_DEPTH,
+	DLB_DOMAIN_CMD_PENDING_PORT_UNMAPS,
+
+	/* NUM_DLB_DOMAIN_CMD must be last */
+	NUM_DLB_DOMAIN_CMD,
+};
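
NUM_DLB_DOMAIN_CMD doubles as the size of any per-command table kept for
this interface. A sketch of the kind of bounds-checked dispatch this
enables (the handler type and table are hypothetical, assumes <errno.h>
and <stddef.h>):

typedef int (*dlb_domain_cmd_fn)(void *dev, void *args);

static dlb_domain_cmd_fn domain_cmd_handlers[NUM_DLB_DOMAIN_CMD];

static int dispatch_domain_cmd(void *dev, unsigned int cmd, void *args)
{
	/* Reject unknown or unimplemented commands up front */
	if (cmd >= NUM_DLB_DOMAIN_CMD || domain_cmd_handlers[cmd] == NULL)
		return -EINVAL;

	return domain_cmd_handlers[cmd](dev, args);
}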
+
+/*
+ * Base addresses for memory mapping the consumer queue (CQ) and popcount (PC)
+ * memory space, and producer port (PP) MMIO space. The CQ, PC, and PP
+ * addresses are per-port. Every address is page-separated (e.g. LDB PP 0 is at
+ * 0x2100000 and LDB PP 1 is at 0x2101000).
+ */
+#define DLB_LDB_CQ_BASE 0x3000000
+#define DLB_LDB_CQ_MAX_SIZE 65536
+#define DLB_LDB_CQ_OFFS(id) (DLB_LDB_CQ_BASE + (id) * DLB_LDB_CQ_MAX_SIZE)
+
+#define DLB_DIR_CQ_BASE 0x3800000
+#define DLB_DIR_CQ_MAX_SIZE 65536
+#define DLB_DIR_CQ_OFFS(id) (DLB_DIR_CQ_BASE + (id) * DLB_DIR_CQ_MAX_SIZE)
+
+#define DLB_LDB_PC_BASE 0x2300000
+#define DLB_LDB_PC_MAX_SIZE 4096
+#define DLB_LDB_PC_OFFS(id) (DLB_LDB_PC_BASE + (id) * DLB_LDB_PC_MAX_SIZE)
+
+#define DLB_DIR_PC_BASE 0x2200000
+#define DLB_DIR_PC_MAX_SIZE 4096
+#define DLB_DIR_PC_OFFS(id) (DLB_DIR_PC_BASE + (id) * DLB_DIR_PC_MAX_SIZE)
+
+#define DLB_LDB_PP_BASE 0x2100000
+#define DLB_LDB_PP_MAX_SIZE 4096
+#define DLB_LDB_PP_OFFS(id) (DLB_LDB_PP_BASE + (id) * DLB_LDB_PP_MAX_SIZE)
+
+#define DLB_DIR_PP_BASE 0x2000000
+#define DLB_DIR_PP_MAX_SIZE 4096
+#define DLB_DIR_PP_OFFS(id) (DLB_DIR_PP_BASE + (id) * DLB_DIR_PP_MAX_SIZE)
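
Each region is laid out at a fixed per-port stride, so the *_OFFS()
macros reduce to simple arithmetic; for example DLB_LDB_PP_OFFS(1) is
0x2100000 + 1 * 4096 = 0x2101000, matching the example in the comment
above. A couple of compile-time checks of that layout (C11):

_Static_assert(DLB_LDB_PP_OFFS(0) == 0x2100000, "LDB PP 0 offset");
_Static_assert(DLB_LDB_PP_OFFS(1) == 0x2101000, "LDB PP 1 offset");
_Static_assert(DLB_DIR_CQ_OFFS(2) == 0x3820000, "DIR CQ stride");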
+
+#endif /* __DLB_USER_H */
-- 
1.7.10


^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH 03/27] event/dlb: add shared code version 10.7.9
  2020-06-27  4:37  2% [dpdk-dev] [PATCH 00/27] event/dlb Intel DLB PMD Tim McDaniel
  @ 2020-06-27  4:37  1% ` Tim McDaniel
  2020-06-27  4:37  1% ` [dpdk-dev] [PATCH 08/27] event/dlb: add definitions shared with LKM or shared code Tim McDaniel
  2 siblings, 0 replies; 200+ results
From: Tim McDaniel @ 2020-06-27  4:37 UTC (permalink / raw)
  To: jerinj
  Cc: mattias.ronnblom, dev, gage.eads, harry.van.haaren, McDaniel, Timothy

From: "McDaniel, Timothy" <timothy.mcdaniel@intel.com>

The DLB shared code is auto-generated by Intel and is being committed
here so that it can be built in the DPDK environment. The shared code
should not be modified. The shared code must be present in order to
successfully build the DLB PMD.

Changes since v1 patch series
1) convert C99 comment to standard C
2) remove TODO and FIXME comments

Signed-off-by: McDaniel, Timothy <timothy.mcdaniel@intel.com>
---
 drivers/event/dlb/pf/base/dlb_hw_types.h     |  360 +
 drivers/event/dlb/pf/base/dlb_mbox.h         |  645 ++
 drivers/event/dlb/pf/base/dlb_osdep.h        |  348 +
 drivers/event/dlb/pf/base/dlb_osdep_bitmap.h |  442 ++
 drivers/event/dlb/pf/base/dlb_osdep_list.h   |  131 +
 drivers/event/dlb/pf/base/dlb_osdep_types.h  |   31 +
 drivers/event/dlb/pf/base/dlb_regs.h         | 2646 +++++++
 drivers/event/dlb/pf/base/dlb_resource.c     | 9700 ++++++++++++++++++++++++++
 drivers/event/dlb/pf/base/dlb_resource.h     | 1625 +++++
 drivers/event/dlb/pf/base/dlb_user.h         | 1084 +++
 10 files changed, 17012 insertions(+)
 create mode 100644 drivers/event/dlb/pf/base/dlb_hw_types.h
 create mode 100644 drivers/event/dlb/pf/base/dlb_mbox.h
 create mode 100644 drivers/event/dlb/pf/base/dlb_osdep.h
 create mode 100644 drivers/event/dlb/pf/base/dlb_osdep_bitmap.h
 create mode 100644 drivers/event/dlb/pf/base/dlb_osdep_list.h
 create mode 100644 drivers/event/dlb/pf/base/dlb_osdep_types.h
 create mode 100644 drivers/event/dlb/pf/base/dlb_regs.h
 create mode 100644 drivers/event/dlb/pf/base/dlb_resource.c
 create mode 100644 drivers/event/dlb/pf/base/dlb_resource.h
 create mode 100644 drivers/event/dlb/pf/base/dlb_user.h

diff --git a/drivers/event/dlb/pf/base/dlb_hw_types.h b/drivers/event/dlb/pf/base/dlb_hw_types.h
new file mode 100644
index 0000000..d56590e
--- /dev/null
+++ b/drivers/event/dlb/pf/base/dlb_hw_types.h
@@ -0,0 +1,360 @@
+/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause)
+ * Copyright(c) 2016-2020 Intel Corporation
+ */
+
+#ifndef __DLB_HW_TYPES_H
+#define __DLB_HW_TYPES_H
+
+#include "dlb_user.h"
+#include "dlb_osdep_types.h"
+#include "dlb_osdep_list.h"
+
+#define DLB_MAX_NUM_VFS 16
+#define DLB_MAX_NUM_DOMAINS 32
+#define DLB_MAX_NUM_LDB_QUEUES 128
+#define DLB_MAX_NUM_LDB_PORTS 64
+#define DLB_MAX_NUM_DIR_PORTS 128
+#define DLB_MAX_NUM_LDB_CREDITS 16384
+#define DLB_MAX_NUM_DIR_CREDITS 4096
+#define DLB_MAX_NUM_LDB_CREDIT_POOLS 64
+#define DLB_MAX_NUM_DIR_CREDIT_POOLS 64
+#define DLB_MAX_NUM_HIST_LIST_ENTRIES 5120
+#define DLB_MAX_NUM_AQOS_ENTRIES 2048
+#define DLB_MAX_NUM_TOTAL_OUTSTANDING_COMPLETIONS 4096
+#define DLB_MAX_NUM_QIDS_PER_LDB_CQ 8
+#define DLB_MAX_NUM_SEQUENCE_NUMBER_GROUPS 4
+#define DLB_MAX_NUM_SEQUENCE_NUMBER_MODES 6
+#define DLB_QID_PRIORITIES 8
+#define DLB_NUM_ARB_WEIGHTS 8
+#define DLB_MAX_WEIGHT 255
+#define DLB_MAX_PORT_CREDIT_QUANTUM 1023
+#define DLB_MAX_CQ_COMP_CHECK_LOOPS 409600
+#define DLB_MAX_QID_EMPTY_CHECK_LOOPS (32 * 64 * 1024 * (800 / 30))
+#define DLB_HZ 800000000
+
+/* Used for DLB A-stepping workaround for hardware write buffer lock up issue */
+#define DLB_A_STEP_MAX_PORTS 128
+
+#define DLB_PF_DEV_ID 0x270B
+#define DLB_VF_DEV_ID 0x270C
+
+/* Interrupt related macros */
+#define DLB_PF_NUM_NON_CQ_INTERRUPT_VECTORS 8
+#define DLB_PF_NUM_CQ_INTERRUPT_VECTORS	 64
+#define DLB_PF_TOTAL_NUM_INTERRUPT_VECTORS \
+	(DLB_PF_NUM_NON_CQ_INTERRUPT_VECTORS + \
+	 DLB_PF_NUM_CQ_INTERRUPT_VECTORS)
+#define DLB_PF_NUM_COMPRESSED_MODE_VECTORS \
+	(DLB_PF_NUM_NON_CQ_INTERRUPT_VECTORS + 1)
+#define DLB_PF_NUM_PACKED_MODE_VECTORS	 DLB_PF_TOTAL_NUM_INTERRUPT_VECTORS
+#define DLB_PF_COMPRESSED_MODE_CQ_VECTOR_ID DLB_PF_NUM_NON_CQ_INTERRUPT_VECTORS
+
+#define DLB_VF_NUM_NON_CQ_INTERRUPT_VECTORS 1
+#define DLB_VF_NUM_CQ_INTERRUPT_VECTORS 31
+#define DLB_VF_BASE_CQ_VECTOR_ID 0
+#define DLB_VF_LAST_CQ_VECTOR_ID 30
+#define DLB_VF_MBOX_VECTOR_ID 31
+#define DLB_VF_TOTAL_NUM_INTERRUPT_VECTORS \
+	(DLB_VF_NUM_NON_CQ_INTERRUPT_VECTORS + \
+	 DLB_VF_NUM_CQ_INTERRUPT_VECTORS)
+
+#define DLB_PF_NUM_ALARM_INTERRUPT_VECTORS 4
+/* DLB ALARM interrupts */
+#define DLB_INT_ALARM 0
+/* VF to PF Mailbox Service Request */
+#define DLB_INT_VF_TO_PF_MBOX 1
+/* HCW Ingress Errors */
+#define DLB_INT_INGRESS_ERROR 3
+
+#define DLB_ALARM_HW_SOURCE_SYS 0
+#define DLB_ALARM_HW_SOURCE_DLB 1
+
+#define DLB_ALARM_HW_UNIT_CHP 1
+#define DLB_ALARM_HW_UNIT_LSP 3
+
+#define DLB_ALARM_HW_CHP_AID_OUT_OF_CREDITS 6
+#define DLB_ALARM_HW_CHP_AID_ILLEGAL_ENQ 7
+#define DLB_ALARM_HW_LSP_AID_EXCESS_TOKEN_POPS 15
+#define DLB_ALARM_SYS_AID_ILLEGAL_HCW 0
+#define DLB_ALARM_SYS_AID_ILLEGAL_QID 3
+#define DLB_ALARM_SYS_AID_DISABLED_QID 4
+#define DLB_ALARM_SYS_AID_ILLEGAL_CQID 6
+
+/* Hardware-defined base addresses */
+#define DLB_LDB_PP_BASE 0x2100000
+#define DLB_LDB_PP_STRIDE 0x1000
+#define DLB_LDB_PP_BOUND \
+	(DLB_LDB_PP_BASE + DLB_LDB_PP_STRIDE * DLB_MAX_NUM_LDB_PORTS)
+#define DLB_DIR_PP_BASE 0x2000000
+#define DLB_DIR_PP_STRIDE 0x1000
+#define DLB_DIR_PP_BOUND \
+	(DLB_DIR_PP_BASE + DLB_DIR_PP_STRIDE * DLB_MAX_NUM_DIR_PORTS)
+
+struct dlb_resource_id {
+	u32 phys_id;
+	u32 virt_id;
+	u8 vf_owned;
+	u8 vf_id;
+};
+
+struct dlb_freelist {
+	u32 base;
+	u32 bound;
+	u32 offset;
+};
+
+static inline u32 dlb_freelist_count(struct dlb_freelist *list)
+{
+	return (list->bound - list->base) - list->offset;
+}
+
+struct dlb_hcw {
+	u64 data;
+	/* Word 3 */
+	u16 opaque;
+	u8 qid;
+	u8 sched_type:2;
+	u8 priority:3;
+	u8 msg_type:3;
+	/* Word 4 */
+	u16 lock_id;
+	u8 meas_lat:1;
+	u8 rsvd1:2;
+	u8 no_dec:1;
+	u8 cmp_id:4;
+	u8 cq_token:1;
+	u8 qe_comp:1;
+	u8 qe_frag:1;
+	u8 qe_valid:1;
+	u8 int_arm:1;
+	u8 error:1;
+	u8 rsvd:2;
+};
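
The bit-fields above pack into 16 bytes per HCW: 8 bytes of data plus
two 4-byte control words ("Word 3" and "Word 4"). That is consistent
with os_enqueue_four_hcws() later in this patch taking a 64B-aligned
block of four HCWs. A hedged compile-time check, assuming the compiler
packs the bit-fields into the two control words as laid out:

_Static_assert(sizeof(struct dlb_hcw) == 16, "HCW is 16 bytes");
_Static_assert(4 * sizeof(struct dlb_hcw) == 64, "four HCWs per 64B line");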
+
+struct dlb_ldb_queue {
+	struct dlb_list_entry domain_list;
+	struct dlb_list_entry func_list;
+	struct dlb_resource_id id;
+	struct dlb_resource_id domain_id;
+	u32 num_qid_inflights;
+	struct dlb_freelist aqed_freelist;
+	u8 sn_cfg_valid;
+	u32 sn_group;
+	u32 sn_slot;
+	u32 num_mappings;
+	u8 num_pending_additions;
+	u8 owned;
+	u8 configured;
+};
+
+/* Directed ports and queues are paired by nature, so the driver tracks them
+ * with a single data structure.
+ */
+struct dlb_dir_pq_pair {
+	struct dlb_list_entry domain_list;
+	struct dlb_list_entry func_list;
+	struct dlb_resource_id id;
+	struct dlb_resource_id domain_id;
+	u8 ldb_pool_used;
+	u8 dir_pool_used;
+	u8 queue_configured;
+	u8 port_configured;
+	u8 owned;
+	u8 enabled;
+	u32 ref_cnt;
+};
+
+enum dlb_qid_map_state {
+	/* The slot doesn't contain a valid queue mapping */
+	DLB_QUEUE_UNMAPPED,
+	/* The slot contains a valid queue mapping */
+	DLB_QUEUE_MAPPED,
+	/* The driver is mapping a queue into this slot */
+	DLB_QUEUE_MAP_IN_PROGRESS,
+	/* The driver is unmapping a queue from this slot */
+	DLB_QUEUE_UNMAP_IN_PROGRESS,
+	/* The driver is unmapping a queue from this slot, and once complete
+	 * will replace it with another mapping.
+	 */
+	DLB_QUEUE_UNMAP_IN_PROGRESS_PENDING_MAP,
+};
+
+struct dlb_ldb_port_qid_map {
+	u16 qid;
+	u8 priority;
+	u16 pending_qid;
+	u8 pending_priority;
+	enum dlb_qid_map_state state;
+};
+
+struct dlb_ldb_port {
+	struct dlb_list_entry domain_list;
+	struct dlb_list_entry func_list;
+	struct dlb_resource_id id;
+	struct dlb_resource_id domain_id;
+	u8 ldb_pool_used;
+	u8 dir_pool_used;
+	u8 init_tkn_cnt;
+	u32 hist_list_entry_base;
+	u32 hist_list_entry_limit;
+	/* The qid_map represents the hardware QID mapping state. */
+	struct dlb_ldb_port_qid_map qid_map[DLB_MAX_NUM_QIDS_PER_LDB_CQ];
+	u32 ref_cnt;
+	u8 num_pending_removals;
+	u8 num_mappings;
+	u8 owned;
+	u8 enabled;
+	u8 configured;
+};
+
+struct dlb_credit_pool {
+	struct dlb_list_entry domain_list;
+	struct dlb_list_entry func_list;
+	struct dlb_resource_id id;
+	struct dlb_resource_id domain_id;
+	u32 total_credits;
+	u32 avail_credits;
+	u8 owned;
+	u8 configured;
+};
+
+struct dlb_sn_group {
+	u32 mode;
+	u32 sequence_numbers_per_queue;
+	u32 slot_use_bitmap;
+	u32 id;
+};
+
+static inline bool dlb_sn_group_full(struct dlb_sn_group *group)
+{
+	u32 mask[6] = {
+		0xffffffff,  /* 32 SNs per queue */
+		0x0000ffff,  /* 64 SNs per queue */
+		0x000000ff,  /* 128 SNs per queue */
+		0x0000000f,  /* 256 SNs per queue */
+		0x00000003,  /* 512 SNs per queue */
+		0x00000001}; /* 1024 SNs per queue */
+
+	return group->slot_use_bitmap == mask[group->mode];
+}
+
+static inline int dlb_sn_group_alloc_slot(struct dlb_sn_group *group)
+{
+	int bound[6] = {32, 16, 8, 4, 2, 1};
+	int i;
+
+	for (i = 0; i < bound[group->mode]; i++) {
+		if (!(group->slot_use_bitmap & (1 << i))) {
+			group->slot_use_bitmap |= 1 << i;
+			return i;
+		}
+	}
+
+	return -1;
+}
+
+static inline void dlb_sn_group_free_slot(struct dlb_sn_group *group, int slot)
+{
+	group->slot_use_bitmap &= ~(1 << slot);
+}
+
+static inline int dlb_sn_group_used_slots(struct dlb_sn_group *group)
+{
+	int i, cnt = 0;
+
+	for (i = 0; i < 32; i++)
+		cnt += !!(group->slot_use_bitmap & (1 << i));
+
+	return cnt;
+}
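
The mask and bound tables line up so that every mode covers 1024
sequence numbers per group: mode 0 splits the group into 32 slots of 32
SNs, mode 2 into 8 slots of 128 SNs, and mode 5 leaves a single 1024-SN
slot. A small usage sketch of the slot allocator (the caller and the
queue assignment are hypothetical):

struct dlb_sn_group grp = { .mode = 2 };	/* 8 slots of 128 SNs each */
int slot;

while (!dlb_sn_group_full(&grp)) {
	slot = dlb_sn_group_alloc_slot(&grp);
	/* assign sequence-number slot 'slot' to a newly configured queue */
}

/* All 8 slots (8 * 128 = 1024 SNs) are now in use, so
 * dlb_sn_group_used_slots(&grp) returns 8.
 */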
+
+struct dlb_domain {
+	struct dlb_function_resources *parent_func;
+	struct dlb_list_entry func_list;
+	struct dlb_list_head used_ldb_queues;
+	struct dlb_list_head used_ldb_ports;
+	struct dlb_list_head used_dir_pq_pairs;
+	struct dlb_list_head used_ldb_credit_pools;
+	struct dlb_list_head used_dir_credit_pools;
+	struct dlb_list_head avail_ldb_queues;
+	struct dlb_list_head avail_ldb_ports;
+	struct dlb_list_head avail_dir_pq_pairs;
+	struct dlb_list_head avail_ldb_credit_pools;
+	struct dlb_list_head avail_dir_credit_pools;
+	u32 total_hist_list_entries;
+	u32 avail_hist_list_entries;
+	u32 hist_list_entry_base;
+	u32 hist_list_entry_offset;
+	struct dlb_freelist qed_freelist;
+	struct dlb_freelist dqed_freelist;
+	struct dlb_freelist aqed_freelist;
+	struct dlb_resource_id id;
+	int num_pending_removals;
+	int num_pending_additions;
+	u8 configured;
+	u8 started;
+};
+
+struct dlb_bitmap;
+
+struct dlb_function_resources {
+	u32 num_avail_domains;
+	struct dlb_list_head avail_domains;
+	struct dlb_list_head used_domains;
+	u32 num_avail_ldb_queues;
+	struct dlb_list_head avail_ldb_queues;
+	u32 num_avail_ldb_ports;
+	struct dlb_list_head avail_ldb_ports;
+	u32 num_avail_dir_pq_pairs;
+	struct dlb_list_head avail_dir_pq_pairs;
+	struct dlb_bitmap *avail_hist_list_entries;
+	struct dlb_bitmap *avail_qed_freelist_entries;
+	struct dlb_bitmap *avail_dqed_freelist_entries;
+	struct dlb_bitmap *avail_aqed_freelist_entries;
+	u32 num_avail_ldb_credit_pools;
+	struct dlb_list_head avail_ldb_credit_pools;
+	u32 num_avail_dir_credit_pools;
+	struct dlb_list_head avail_dir_credit_pools;
+	u32 num_enabled_ldb_ports; /* (PF only) */
+	u8 locked; /* (VF only) */
+};
+
+/* After initialization, each resource in dlb_hw_resources is located in one of
+ * the following lists:
+ * -- The PF's available resources list. These are unconfigured resources owned
+ *	by the PF and not allocated to a DLB scheduling domain.
+ * -- A VF's available resources list. These are VF-owned unconfigured
+ *	resources not allocated to a DLB scheduling domain.
+ * -- A domain's available resources list. These are domain-owned unconfigured
+ *	resources.
+ * -- A domain's used resources list. These are domain-owned configured
+ *	resources.
+ *
+ * A resource moves to a new list when a VF or domain is created or destroyed,
+ * or when the resource is configured.
+ */
+struct dlb_hw_resources {
+	struct dlb_ldb_queue ldb_queues[DLB_MAX_NUM_LDB_QUEUES];
+	struct dlb_ldb_port ldb_ports[DLB_MAX_NUM_LDB_PORTS];
+	struct dlb_dir_pq_pair dir_pq_pairs[DLB_MAX_NUM_DIR_PORTS];
+	struct dlb_credit_pool ldb_credit_pools[DLB_MAX_NUM_LDB_CREDIT_POOLS];
+	struct dlb_credit_pool dir_credit_pools[DLB_MAX_NUM_DIR_CREDIT_POOLS];
+	struct dlb_sn_group sn_groups[DLB_MAX_NUM_SEQUENCE_NUMBER_GROUPS];
+};
+
+struct dlb_hw {
+	/* BAR 0 address */
+	void  *csr_kva;
+	unsigned long csr_phys_addr;
+	/* BAR 2 address */
+	void  *func_kva;
+	unsigned long func_phys_addr;
+
+	/* Resource tracking */
+	struct dlb_hw_resources rsrcs;
+	struct dlb_function_resources pf;
+	struct dlb_function_resources vf[DLB_MAX_NUM_VFS];
+	struct dlb_domain domains[DLB_MAX_NUM_DOMAINS];
+};
+
+#endif /* __DLB_HW_TYPES_H */
diff --git a/drivers/event/dlb/pf/base/dlb_mbox.h b/drivers/event/dlb/pf/base/dlb_mbox.h
new file mode 100644
index 0000000..e195526
--- /dev/null
+++ b/drivers/event/dlb/pf/base/dlb_mbox.h
@@ -0,0 +1,645 @@
+/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause)
+ * Copyright(c) 2016-2020 Intel Corporation
+ */
+
+#ifndef __DLB_BASE_DLB_MBOX_H
+#define __DLB_BASE_DLB_MBOX_H
+
+#include "dlb_regs.h"
+#include "dlb_osdep_types.h"
+
+#define DLB_MBOX_INTERFACE_VERSION 1
+
+/* The PF uses its PF->VF mailbox to send responses to VF requests, as well as
+ * to send requests of its own (e.g. notifying a VF of an impending FLR).
+ * To avoid communication race conditions, e.g. the PF sends a response and then
+ * sends a request before the VF reads the response, the PF->VF mailbox is
+ * divided into two sections:
+ * - Bytes 0-47: PF responses
+ * - Bytes 48-63: PF requests
+ *
+ * Partitioning the PF->VF mailbox allows responses and requests to occupy the
+ * mailbox simultaneously.
+ */
+#define DLB_PF2VF_RESP_BYTES 48
+#define DLB_PF2VF_RESP_BASE 0
+#define DLB_PF2VF_RESP_BASE_WORD (DLB_PF2VF_RESP_BASE / 4)
+
+#define DLB_PF2VF_REQ_BYTES \
+	(DLB_FUNC_PF_PF2VF_MAILBOX_BYTES - DLB_PF2VF_RESP_BYTES)
+#define DLB_PF2VF_REQ_BASE DLB_PF2VF_RESP_BYTES
+#define DLB_PF2VF_REQ_BASE_WORD (DLB_PF2VF_REQ_BASE / 4)
+
+/* Similarly, the VF->PF mailbox is divided into two sections:
+ * - Bytes 0-239: VF requests
+ * - Bytes 240-255: VF responses
+ */
+#define DLB_VF2PF_REQ_BYTES 240
+#define DLB_VF2PF_REQ_BASE 0
+#define DLB_VF2PF_REQ_BASE_WORD (DLB_VF2PF_REQ_BASE / 4)
+
+#define DLB_VF2PF_RESP_BYTES \
+	(DLB_FUNC_VF_VF2PF_MAILBOX_BYTES - DLB_VF2PF_REQ_BYTES)
+#define DLB_VF2PF_RESP_BASE DLB_VF2PF_REQ_BYTES
+#define DLB_VF2PF_RESP_BASE_WORD (DLB_VF2PF_RESP_BASE / 4)
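
In word terms the partitioning works out to: PF->VF responses in words
0-11 with requests starting at word 12 (48 / 4), and VF->PF requests in
words 0-59 with responses starting at word 60 (240 / 4). Two quick
compile-time checks of that arithmetic (C11):

_Static_assert(DLB_PF2VF_REQ_BASE_WORD == 12, "PF->VF requests at word 12");
_Static_assert(DLB_VF2PF_RESP_BASE_WORD == 60, "VF->PF responses at word 60");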
+
+/* VF-initiated commands */
+enum dlb_mbox_cmd_type {
+	DLB_MBOX_CMD_REGISTER,
+	DLB_MBOX_CMD_UNREGISTER,
+	DLB_MBOX_CMD_GET_NUM_RESOURCES,
+	DLB_MBOX_CMD_CREATE_SCHED_DOMAIN,
+	DLB_MBOX_CMD_RESET_SCHED_DOMAIN,
+	DLB_MBOX_CMD_CREATE_LDB_POOL,
+	DLB_MBOX_CMD_CREATE_DIR_POOL,
+	DLB_MBOX_CMD_CREATE_LDB_QUEUE,
+	DLB_MBOX_CMD_CREATE_DIR_QUEUE,
+	DLB_MBOX_CMD_CREATE_LDB_PORT,
+	DLB_MBOX_CMD_CREATE_DIR_PORT,
+	DLB_MBOX_CMD_ENABLE_LDB_PORT,
+	DLB_MBOX_CMD_DISABLE_LDB_PORT,
+	DLB_MBOX_CMD_ENABLE_DIR_PORT,
+	DLB_MBOX_CMD_DISABLE_DIR_PORT,
+	DLB_MBOX_CMD_LDB_PORT_OWNED_BY_DOMAIN,
+	DLB_MBOX_CMD_DIR_PORT_OWNED_BY_DOMAIN,
+	DLB_MBOX_CMD_MAP_QID,
+	DLB_MBOX_CMD_UNMAP_QID,
+	DLB_MBOX_CMD_START_DOMAIN,
+	DLB_MBOX_CMD_ENABLE_LDB_PORT_INTR,
+	DLB_MBOX_CMD_ENABLE_DIR_PORT_INTR,
+	DLB_MBOX_CMD_ARM_CQ_INTR,
+	DLB_MBOX_CMD_GET_NUM_USED_RESOURCES,
+	DLB_MBOX_CMD_INIT_CQ_SCHED_COUNT,
+	DLB_MBOX_CMD_COLLECT_CQ_SCHED_COUNT,
+	DLB_MBOX_CMD_ACK_VF_FLR_DONE,
+	DLB_MBOX_CMD_GET_SN_ALLOCATION,
+	DLB_MBOX_CMD_GET_LDB_QUEUE_DEPTH,
+	DLB_MBOX_CMD_GET_DIR_QUEUE_DEPTH,
+	DLB_MBOX_CMD_PENDING_PORT_UNMAPS,
+	DLB_MBOX_CMD_QUERY_CQ_POLL_MODE,
+	DLB_MBOX_CMD_GET_SN_OCCUPANCY,
+
+	/* NUM_QE_CMD_TYPES must be last */
+	NUM_DLB_MBOX_CMD_TYPES,
+};
+
+static const char dlb_mbox_cmd_type_strings[][128] = {
+	"DLB_MBOX_CMD_REGISTER",
+	"DLB_MBOX_CMD_UNREGISTER",
+	"DLB_MBOX_CMD_GET_NUM_RESOURCES",
+	"DLB_MBOX_CMD_CREATE_SCHED_DOMAIN",
+	"DLB_MBOX_CMD_RESET_SCHED_DOMAIN",
+	"DLB_MBOX_CMD_CREATE_LDB_POOL",
+	"DLB_MBOX_CMD_CREATE_DIR_POOL",
+	"DLB_MBOX_CMD_CREATE_LDB_QUEUE",
+	"DLB_MBOX_CMD_CREATE_DIR_QUEUE",
+	"DLB_MBOX_CMD_CREATE_LDB_PORT",
+	"DLB_MBOX_CMD_CREATE_DIR_PORT",
+	"DLB_MBOX_CMD_ENABLE_LDB_PORT",
+	"DLB_MBOX_CMD_DISABLE_LDB_PORT",
+	"DLB_MBOX_CMD_ENABLE_DIR_PORT",
+	"DLB_MBOX_CMD_DISABLE_DIR_PORT",
+	"DLB_MBOX_CMD_LDB_PORT_OWNED_BY_DOMAIN",
+	"DLB_MBOX_CMD_DIR_PORT_OWNED_BY_DOMAIN",
+	"DLB_MBOX_CMD_MAP_QID",
+	"DLB_MBOX_CMD_UNMAP_QID",
+	"DLB_MBOX_CMD_START_DOMAIN",
+	"DLB_MBOX_CMD_ENABLE_LDB_PORT_INTR",
+	"DLB_MBOX_CMD_ENABLE_DIR_PORT_INTR",
+	"DLB_MBOX_CMD_ARM_CQ_INTR",
+	"DLB_MBOX_CMD_GET_NUM_USED_RESOURCES",
+	"DLB_MBOX_CMD_INIT_CQ_SCHED_COUNT",
+	"DLB_MBOX_CMD_COLLECT_CQ_SCHED_COUNT",
+	"DLB_MBOX_CMD_ACK_VF_FLR_DONE",
+	"DLB_MBOX_CMD_GET_SN_ALLOCATION",
+	"DLB_MBOX_CMD_GET_LDB_QUEUE_DEPTH",
+	"DLB_MBOX_CMD_GET_DIR_QUEUE_DEPTH",
+	"DLB_MBOX_CMD_PENDING_PORT_UNMAPS",
+	"DLB_MBOX_CMD_QUERY_CQ_POLL_MODE",
+	"DLB_MBOX_CMD_GET_SN_OCCUPANCY",
+};
+
+/* PF-initiated commands */
+enum dlb_mbox_vf_cmd_type {
+	DLB_MBOX_VF_CMD_DOMAIN_ALERT,
+	DLB_MBOX_VF_CMD_NOTIFICATION,
+	DLB_MBOX_VF_CMD_IN_USE,
+
+	/* NUM_DLB_MBOX_VF_CMD_TYPES must be last */
+	NUM_DLB_MBOX_VF_CMD_TYPES,
+};
+
+static const char dlb_mbox_vf_cmd_type_strings[][128] = {
+	"DLB_MBOX_VF_CMD_DOMAIN_ALERT",
+	"DLB_MBOX_VF_CMD_NOTIFICATION",
+	"DLB_MBOX_VF_CMD_IN_USE",
+};
+
+#define DLB_MBOX_CMD_TYPE(hdr) \
+	(((struct dlb_mbox_req_hdr *)hdr)->type)
+#define DLB_MBOX_CMD_STRING(hdr) \
+	dlb_mbox_cmd_type_strings[DLB_MBOX_CMD_TYPE(hdr)]
+
+enum dlb_mbox_status_type {
+	DLB_MBOX_ST_SUCCESS,
+	DLB_MBOX_ST_INVALID_CMD_TYPE,
+	DLB_MBOX_ST_VERSION_MISMATCH,
+	DLB_MBOX_ST_EXPECTED_PHASE_ONE,
+	DLB_MBOX_ST_EXPECTED_PHASE_TWO,
+	DLB_MBOX_ST_INVALID_OWNER_VF,
+};
+
+static const char dlb_mbox_status_type_strings[][128] = {
+	"DLB_MBOX_ST_SUCCESS",
+	"DLB_MBOX_ST_INVALID_CMD_TYPE",
+	"DLB_MBOX_ST_VERSION_MISMATCH",
+	"DLB_MBOX_ST_EXPECTED_PHASE_ONE",
+	"DLB_MBOX_ST_EXPECTED_PHASE_TWO",
+	"DLB_MBOX_ST_INVALID_OWNER_VF",
+};
+
+#define DLB_MBOX_ST_TYPE(hdr) \
+	(((struct dlb_mbox_resp_hdr *)hdr)->status)
+#define DLB_MBOX_ST_STRING(hdr) \
+	dlb_mbox_status_type_strings[DLB_MBOX_ST_TYPE(hdr)]
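
A sketch of how the string tables and accessor macros might be used when
tracing mailbox traffic (trace_mbox is a hypothetical helper, assumes
<stdio.h>):

static void trace_mbox(struct dlb_mbox_req_hdr *req,
		       struct dlb_mbox_resp_hdr *resp)
{
	if (req->type >= NUM_DLB_MBOX_CMD_TYPES ||
	    resp->status > DLB_MBOX_ST_INVALID_OWNER_VF)
		return;

	printf("mbox cmd %s -> %s\n",
	       DLB_MBOX_CMD_STRING(req), DLB_MBOX_ST_STRING(resp));
}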
+
+/* This structure is always the first field in a request structure */
+struct dlb_mbox_req_hdr {
+	u32 type;
+};
+
+/* This structure is always the first field in a response structure */
+struct dlb_mbox_resp_hdr {
+	u32 status;
+};
+
+struct dlb_mbox_register_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u16 min_interface_version;
+	u16 max_interface_version;
+};
+
+struct dlb_mbox_register_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 interface_version;
+	u8 pf_id;
+	u8 vf_id;
+	u8 is_auxiliary_vf;
+	u8 primary_vf_id;
+	u32 padding;
+};
+
+struct dlb_mbox_unregister_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 padding;
+};
+
+struct dlb_mbox_unregister_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 padding;
+};
+
+struct dlb_mbox_get_num_resources_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 padding;
+};
+
+struct dlb_mbox_get_num_resources_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+	u16 num_sched_domains;
+	u16 num_ldb_queues;
+	u16 num_ldb_ports;
+	u16 num_dir_ports;
+	u16 padding0;
+	u8 num_ldb_credit_pools;
+	u8 num_dir_credit_pools;
+	u32 num_atomic_inflights;
+	u32 max_contiguous_atomic_inflights;
+	u32 num_hist_list_entries;
+	u32 max_contiguous_hist_list_entries;
+	u16 num_ldb_credits;
+	u16 max_contiguous_ldb_credits;
+	u16 num_dir_credits;
+	u16 max_contiguous_dir_credits;
+	u32 padding1;
+};
+
+struct dlb_mbox_create_sched_domain_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 num_ldb_queues;
+	u32 num_ldb_ports;
+	u32 num_dir_ports;
+	u32 num_atomic_inflights;
+	u32 num_hist_list_entries;
+	u32 num_ldb_credits;
+	u32 num_dir_credits;
+	u32 num_ldb_credit_pools;
+	u32 num_dir_credit_pools;
+};
+
+struct dlb_mbox_create_sched_domain_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+	u32 status;
+	u32 id;
+};
+
+struct dlb_mbox_reset_sched_domain_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 id;
+};
+
+struct dlb_mbox_reset_sched_domain_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+};
+
+struct dlb_mbox_create_credit_pool_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 domain_id;
+	u32 num_credits;
+	u32 padding;
+};
+
+struct dlb_mbox_create_credit_pool_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+	u32 status;
+	u32 id;
+};
+
+struct dlb_mbox_create_ldb_queue_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 domain_id;
+	u32 num_sequence_numbers;
+	u32 num_qid_inflights;
+	u32 num_atomic_inflights;
+	u32 padding;
+};
+
+struct dlb_mbox_create_ldb_queue_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+	u32 status;
+	u32 id;
+};
+
+struct dlb_mbox_create_dir_queue_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 domain_id;
+	u32 port_id;
+	u32 padding0;
+};
+
+struct dlb_mbox_create_dir_queue_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+	u32 status;
+	u32 id;
+};
+
+struct dlb_mbox_create_ldb_port_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 domain_id;
+	u32 ldb_credit_pool_id;
+	u32 dir_credit_pool_id;
+	u64 pop_count_address;
+	u16 ldb_credit_high_watermark;
+	u16 ldb_credit_low_watermark;
+	u16 ldb_credit_quantum;
+	u16 dir_credit_high_watermark;
+	u16 dir_credit_low_watermark;
+	u16 dir_credit_quantum;
+	u32 padding0;
+	u16 cq_depth;
+	u16 cq_history_list_size;
+	u32 padding1;
+	u64 cq_base_address;
+	u64 nq_base_address;
+	u32 nq_size;
+	u32 padding2;
+};
+
+struct dlb_mbox_create_ldb_port_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+	u32 status;
+	u32 id;
+};
+
+struct dlb_mbox_create_dir_port_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 domain_id;
+	u32 ldb_credit_pool_id;
+	u32 dir_credit_pool_id;
+	u64 pop_count_address;
+	u16 ldb_credit_high_watermark;
+	u16 ldb_credit_low_watermark;
+	u16 ldb_credit_quantum;
+	u16 dir_credit_high_watermark;
+	u16 dir_credit_low_watermark;
+	u16 dir_credit_quantum;
+	u16 cq_depth;
+	u16 padding0;
+	u64 cq_base_address;
+	s32 queue_id;
+	u32 padding1;
+};
+
+struct dlb_mbox_create_dir_port_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+	u32 status;
+	u32 id;
+};
+
+struct dlb_mbox_enable_ldb_port_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 domain_id;
+	u32 port_id;
+	u32 padding;
+};
+
+struct dlb_mbox_enable_ldb_port_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+	u32 status;
+	u32 padding;
+};
+
+struct dlb_mbox_disable_ldb_port_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 domain_id;
+	u32 port_id;
+	u32 padding;
+};
+
+struct dlb_mbox_disable_ldb_port_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+	u32 status;
+	u32 padding;
+};
+
+struct dlb_mbox_enable_dir_port_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 domain_id;
+	u32 port_id;
+	u32 padding;
+};
+
+struct dlb_mbox_enable_dir_port_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+	u32 status;
+	u32 padding;
+};
+
+struct dlb_mbox_disable_dir_port_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 domain_id;
+	u32 port_id;
+	u32 padding;
+};
+
+struct dlb_mbox_disable_dir_port_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+	u32 status;
+	u32 padding;
+};
+
+struct dlb_mbox_ldb_port_owned_by_domain_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 domain_id;
+	u32 port_id;
+	u32 padding;
+};
+
+struct dlb_mbox_ldb_port_owned_by_domain_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	s32 owned;
+};
+
+struct dlb_mbox_dir_port_owned_by_domain_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 domain_id;
+	u32 port_id;
+	u32 padding;
+};
+
+struct dlb_mbox_dir_port_owned_by_domain_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	s32 owned;
+};
+
+struct dlb_mbox_map_qid_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 domain_id;
+	u32 port_id;
+	u32 qid;
+	u32 priority;
+	u32 padding0;
+};
+
+struct dlb_mbox_map_qid_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+	u32 status;
+	u32 id;
+};
+
+struct dlb_mbox_unmap_qid_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 domain_id;
+	u32 port_id;
+	u32 qid;
+};
+
+struct dlb_mbox_unmap_qid_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+	u32 status;
+	u32 padding;
+};
+
+struct dlb_mbox_start_domain_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 domain_id;
+};
+
+struct dlb_mbox_start_domain_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+	u32 status;
+	u32 padding;
+};
+
+struct dlb_mbox_enable_ldb_port_intr_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u16 port_id;
+	u16 thresh;
+	u16 vector;
+	u16 owner_vf;
+	u16 reserved[2];
+};
+
+struct dlb_mbox_enable_ldb_port_intr_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+	u32 status;
+	u32 padding0;
+};
+
+struct dlb_mbox_enable_dir_port_intr_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u16 port_id;
+	u16 thresh;
+	u16 vector;
+	u16 owner_vf;
+	u16 reserved[2];
+};
+
+struct dlb_mbox_enable_dir_port_intr_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+	u32 status;
+	u32 padding0;
+};
+
+struct dlb_mbox_arm_cq_intr_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 domain_id;
+	u32 port_id;
+	u32 is_ldb;
+};
+
+struct dlb_mbox_arm_cq_intr_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+	u32 status;
+	u32 padding0;
+};
+
+/* The alert_id and aux_alert_data follows the format of the alerts defined in
+ * dlb_types.h. The alert id contains an enum dlb_domain_alert_id value, and
+ * the aux_alert_data value varies depending on the alert.
+ */
+struct dlb_mbox_vf_alert_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 domain_id;
+	u32 alert_id;
+	u32 aux_alert_data;
+};
+
+enum dlb_mbox_vf_notification_type {
+	DLB_MBOX_VF_NOTIFICATION_PRE_RESET,
+	DLB_MBOX_VF_NOTIFICATION_POST_RESET,
+
+	/* NUM_DLB_MBOX_VF_NOTIFICATION_TYPES must be last */
+	NUM_DLB_MBOX_VF_NOTIFICATION_TYPES,
+};
+
+struct dlb_mbox_vf_notification_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 notification;
+};
+
+struct dlb_mbox_vf_in_use_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 padding;
+};
+
+struct dlb_mbox_vf_in_use_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 in_use;
+};
+
+struct dlb_mbox_ack_vf_flr_done_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 padding;
+};
+
+struct dlb_mbox_ack_vf_flr_done_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+	u32 status;
+	u32 padding;
+};
+
+struct dlb_mbox_get_sn_allocation_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 group_id;
+};
+
+struct dlb_mbox_get_sn_allocation_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 num;
+};
+
+struct dlb_mbox_get_ldb_queue_depth_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 domain_id;
+	u32 queue_id;
+	u32 padding;
+};
+
+struct dlb_mbox_get_ldb_queue_depth_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+	u32 status;
+	u32 depth;
+};
+
+struct dlb_mbox_get_dir_queue_depth_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 domain_id;
+	u32 queue_id;
+	u32 padding;
+};
+
+struct dlb_mbox_get_dir_queue_depth_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+	u32 status;
+	u32 depth;
+};
+
+struct dlb_mbox_pending_port_unmaps_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 domain_id;
+	u32 port_id;
+	u32 padding;
+};
+
+struct dlb_mbox_pending_port_unmaps_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+	u32 status;
+	u32 num;
+};
+
+struct dlb_mbox_query_cq_poll_mode_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 padding;
+};
+
+struct dlb_mbox_query_cq_poll_mode_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 error_code;
+	u32 status;
+	u32 mode;
+};
+
+struct dlb_mbox_get_sn_occupancy_cmd_req {
+	struct dlb_mbox_req_hdr hdr;
+	u32 group_id;
+};
+
+struct dlb_mbox_get_sn_occupancy_cmd_resp {
+	struct dlb_mbox_resp_hdr hdr;
+	u32 num;
+};
+
+#endif /* __DLB_BASE_DLB_MBOX_H */
diff --git a/drivers/event/dlb/pf/base/dlb_osdep.h b/drivers/event/dlb/pf/base/dlb_osdep.h
new file mode 100644
index 0000000..9286f3d
--- /dev/null
+++ b/drivers/event/dlb/pf/base/dlb_osdep.h
@@ -0,0 +1,348 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016-2020 Intel Corporation
+ */
+
+#ifndef __DLB_OSDEP_H__
+#define __DLB_OSDEP_H__
+
+#include <string.h>
+#include <time.h>
+#include <unistd.h>
+#include <pthread.h>
+#include <rte_string_fns.h>
+#include <rte_cycles.h>
+#include <rte_io.h>
+#include <rte_log.h>
+#include <rte_spinlock.h>
+#include "../dlb_main.h"
+#include "dlb_resource.h"
+#include "../../dlb_user.h"
+
+
+#define DLB_PCI_REG_READ(reg)        rte_read32((void *)reg)
+#define DLB_PCI_REG_WRITE(reg, val)   rte_write32(val, (void *)reg)
+
+#define DLB_CSR_REG_ADDR(a, reg) ((void *)((uintptr_t)(a)->csr_kva + (reg)))
+#define DLB_CSR_RD(hw, reg) \
+	DLB_PCI_REG_READ(DLB_CSR_REG_ADDR((hw), (reg)))
+#define DLB_CSR_WR(hw, reg, val) \
+	DLB_PCI_REG_WRITE(DLB_CSR_REG_ADDR((hw), (reg)), (val))
+
+#define DLB_FUNC_REG_ADDR(a, reg) ((void *)((uintptr_t)(a)->func_kva + (reg)))
+#define DLB_FUNC_RD(hw, reg) \
+	DLB_PCI_REG_READ(DLB_FUNC_REG_ADDR((hw), (reg)))
+#define DLB_FUNC_WR(hw, reg, val) \
+	DLB_PCI_REG_WRITE(DLB_FUNC_REG_ADDR((hw), (reg)), (val))
+
+#define READ_ONCE(x) (x)
+#define WRITE_ONCE(x, y) ((x) = (y))
+
+#define OS_READ_ONCE(x) READ_ONCE(x)
+#define OS_WRITE_ONCE(x, y) WRITE_ONCE(x, y)
+
+
+extern unsigned int dlb_unregister_timeout_s;
+/**
+ * os_queue_unregister_timeout_s() - timeout (in seconds) to wait for queue
+ *                                   unregister acknowledgments.
+ */
+static inline unsigned int os_queue_unregister_timeout_s(void)
+{
+	return dlb_unregister_timeout_s;
+}
+
+static inline size_t os_strlcpy(char *dst, const char *src, size_t sz)
+{
+	return rte_strlcpy(dst, src, sz);
+}
+
+/**
+ * os_udelay() - busy-wait for a number of microseconds
+ * @usecs: delay duration.
+ */
+static inline void os_udelay(int usecs)
+{
+	rte_delay_us(usecs);
+}
+
+/**
+ * os_msleep() - sleep for a number of milliseconds
+ * @msecs: delay duration.
+ */
+
+static inline void os_msleep(int msecs)
+{
+	rte_delay_ms(msecs);
+}
+
+/**
+ * os_curtime_s() - get the current time (in seconds)
+ */
+static inline unsigned long os_curtime_s(void)
+{
+	struct timespec tv;
+
+	clock_gettime(CLOCK_MONOTONIC, &tv);
+
+	return (unsigned long)tv.tv_sec;
+}
+
+#define DLB_PP_BASE(__is_ldb) ((__is_ldb) ? DLB_LDB_PP_BASE : DLB_DIR_PP_BASE)
+/**
+ * os_map_producer_port() - map a producer port into the caller's address space
+ * @hw: dlb_hw handle for a particular device.
+ * @port_id: port ID
+ * @is_ldb: true for load-balanced port, false for a directed port
+ *
+ * This function maps the requested producer port memory into the caller's
+ * address space.
+ *
+ * Return:
+ * Returns the base address at which the PP memory was mapped, else NULL.
+ */
+static inline void *os_map_producer_port(struct dlb_hw *hw,
+					 u8 port_id,
+					 bool is_ldb)
+{
+	uint64_t addr;
+	uint64_t pp_dma_base;
+
+
+	pp_dma_base = (uintptr_t)hw->func_kva + DLB_PP_BASE(is_ldb);
+	addr = (pp_dma_base + (PAGE_SIZE * port_id));
+
+	return (void *)(uintptr_t)addr;
+
+}
+/**
+ * os_unmap_producer_port() - unmap a producer port
+ * @hw: dlb_hw handle for a particular device.
+ * @addr: mapped producer port address
+ *
+ * This function undoes os_map_producer_port() by unmapping the producer port
+ * memory from the caller's address space.
+ */
+
+/* PFPMD - Nothing to do here, since memory was not actually mapped by us */
+static inline void os_unmap_producer_port(struct dlb_hw *hw, void *addr)
+{
+	RTE_SET_USED(hw);
+	RTE_SET_USED(addr);
+}
+/**
+ * os_enqueue_four_hcws() - enqueue four HCWs to DLB
+ * @hw: dlb_hw handle for a particular device.
+ * @hcw: pointer to the 64B-aligned contiguous HCW memory
+ * @addr: producer port address
+ */
+static inline void os_enqueue_four_hcws(struct dlb_hw *hw,
+					struct dlb_hcw *hcw,
+					void *addr)
+{
+	struct dlb_dev *dlb_dev;
+
+	dlb_dev = container_of(hw, struct dlb_dev, hw);
+
+	dlb_dev->enqueue_four(hcw, addr);
+}
+
+/**
+ * os_fence_hcw() - fence an HCW to ensure it arrives at the device
+ * @hw: dlb_hw handle for a particular device.
+ * @pp_addr: producer port address
+ */
+static inline void os_fence_hcw(struct dlb_hw *hw, u64 *pp_addr)
+{
+	RTE_SET_USED(hw);
+
+	/* To ensure outstanding HCWs reach the device, read the PP address. IA
+	 * memory ordering prevents reads from passing older writes, and the
+	 * mfence also ensures this.
+	 */
+	rte_mb();
+
+	*(volatile u64 *)pp_addr;
+}
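
The intended pairing is one or more os_enqueue_four_hcws() calls to the
same producer port followed by a single os_fence_hcw(), so the PP
read-back happens once per burst rather than per write. A rough sketch
(send_burst is a hypothetical caller; hcws must point to four
contiguous, 64B-aligned HCWs built by the caller):

static void send_burst(struct dlb_hw *hw, struct dlb_hcw *hcws, u64 *pp_addr)
{
	os_enqueue_four_hcws(hw, hcws, pp_addr);

	/* Make sure the writes have reached the device before returning */
	os_fence_hcw(hw, pp_addr);
}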
+
+#define DLB_ERR(dev, fmt, args...) \
+	RTE_LOG(ERR, PMD, "%s() line %u: " fmt "\n",  \
+			__func__, __LINE__, ## args)
+
+#define DLB_INFO(dev, fmt, args...) \
+	RTE_LOG(INFO, PMD, "%s() line %u: " fmt "\n", \
+			__func__, __LINE__, ## args)
+
+#define DLB_DEBUG(dev, fmt, args...) \
+	RTE_LOG(DEBUG, PMD, "%s() line %u: " fmt "\n", \
+			__func__, __LINE__, ## args)
+
+/**
+ * DLB_HW_ERR() - log an error message
+ * @dlb: dlb_hw handle for a particular device.
+ * @...: variable string args.
+ */
+#define DLB_HW_ERR(dlb, ...) do {	\
+	RTE_SET_USED(dlb);		\
+	DLB_ERR(dlb, __VA_ARGS__);	\
+} while (0)
+
+/**
+ * DLB_HW_INFO() - log an info message
+ * @dlb: dlb_hw handle for a particular device.
+ * @...: variable string args.
+ */
+#define DLB_HW_INFO(dlb, ...) do {	\
+	RTE_SET_USED(dlb);		\
+	DLB_INFO(dlb, __VA_ARGS__);	\
+} while (0)
+
+/*** scheduling functions ***/
+
+/* The callback runs until it completes all outstanding QID->CQ
+ * map and unmap requests. To prevent deadlock, this function gives other
+ * threads a chance to grab the resource mutex and configure hardware.
+ */
+static void *dlb_complete_queue_map_unmap(void *__args)
+{
+	struct dlb_dev *dlb_dev = (struct dlb_dev *)__args;
+	int ret;
+
+	while (1) {
+		rte_spinlock_lock(&dlb_dev->resource_mutex);
+
+		ret = dlb_finish_unmap_qid_procedures(&dlb_dev->hw);
+		ret += dlb_finish_map_qid_procedures(&dlb_dev->hw);
+
+		if (ret != 0) {
+			rte_spinlock_unlock(&dlb_dev->resource_mutex);
+			/* Relinquish the CPU so the application can process
+			 * its CQs, so this function doesn't deadlock.
+			 */
+			sched_yield();
+		} else
+			break;
+	}
+
+	dlb_dev->worker_launched = false;
+
+	rte_spinlock_unlock(&dlb_dev->resource_mutex);
+
+	return NULL;
+}
+
+
+/**
+ * os_schedule_work() - launch a thread to process pending map and unmap work
+ * @hw: dlb_hw handle for a particular device.
+ *
+ * This function launches a thread that will run until all pending
+ * map and unmap procedures are complete.
+ */
+static inline void os_schedule_work(struct dlb_hw *hw)
+{
+	struct dlb_dev *dlb_dev;
+	pthread_t complete_queue_map_unmap_thread;
+	int ret;
+
+	dlb_dev = container_of(hw, struct dlb_dev, hw);
+
+	ret = pthread_create(&complete_queue_map_unmap_thread,
+			     NULL,
+			     dlb_complete_queue_map_unmap,
+			     dlb_dev);
+	if (ret)
+		DLB_ERR(dlb_dev,
+		"Could not create queue complete map /unmap thread, err=%d\n",
+			  ret);
+	else
+		dlb_dev->worker_launched = true;
+}
+
+/**
+ * os_worker_active() - query whether the map/unmap worker thread is active
+ * @hw: dlb_hw handle for a particular device.
+ *
+ * This function returns a boolean indicating whether a thread (launched by
+ * os_schedule_work()) is active. This function is used to determine
+ * whether or not to launch a worker thread.
+ */
+static inline bool os_worker_active(struct dlb_hw *hw)
+{
+	struct dlb_dev *dlb_dev;
+
+	dlb_dev = container_of(hw, struct dlb_dev, hw);
+
+	return dlb_dev->worker_launched;
+}
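
The two helpers are meant to be used together by the resource code: a
new worker is only launched when none is already running. Roughly, the
pattern looks like this (called with the resource mutex held, after
queueing map/unmap work):

	if (!os_worker_active(hw))
		os_schedule_work(hw);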
+
+/**
+ * os_notify_user_space() - notify user space
+ * @hw: dlb_hw handle for a particular device.
+ * @domain_id: ID of domain to notify.
+ * @alert_id: alert ID.
+ * @aux_alert_data: additional alert data.
+ *
+ * This function notifies user space of an alert (such as a remote queue
+ * unregister or hardware alarm).
+ *
+ * Return:
+ * Returns 0 upon success, <0 otherwise.
+ */
+static inline int os_notify_user_space(struct dlb_hw *hw,
+				       u32 domain_id,
+				       u64 alert_id,
+				       u64 aux_alert_data)
+{
+	RTE_SET_USED(hw);
+	RTE_SET_USED(domain_id);
+	RTE_SET_USED(alert_id);
+	RTE_SET_USED(aux_alert_data);
+
+	rte_panic("internal_error: %s should never be called for DLB PF PMD\n",
+		  __func__);
+	return -1;
+}
+
+enum dlb_dev_revision {
+	DLB_A0,
+	DLB_A1,
+	DLB_A2,
+	DLB_A3,
+	DLB_B0,
+};
+
+#include <cpuid.h>
+
+/**
+ * os_get_dev_revision() - query the device_revision
+ * @hw: dlb_hw handle for a particular device.
+ */
+static inline enum dlb_dev_revision os_get_dev_revision(struct dlb_hw *hw)
+{
+	uint32_t a, b, c, d, stepping;
+
+	RTE_SET_USED(hw);
+
+	__cpuid(0x1, a, b, c, d);
+
+	stepping = a & 0xf;
+
+	switch (stepping) {
+	case 0:
+		return DLB_A0;
+	case 1:
+		return DLB_A1;
+	case 2:
+		return DLB_A2;
+	case 3:
+		return DLB_A3;
+	default:
+		/* Treat all revisions >= 4 as B0 */
+		return DLB_B0;
+	}
+}
+
+#endif /*  __DLB_OSDEP_H__ */
diff --git a/drivers/event/dlb/pf/base/dlb_osdep_bitmap.h b/drivers/event/dlb/pf/base/dlb_osdep_bitmap.h
new file mode 100644
index 0000000..8df1d59
--- /dev/null
+++ b/drivers/event/dlb/pf/base/dlb_osdep_bitmap.h
@@ -0,0 +1,442 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016-2020 Intel Corporation
+ */
+
+#ifndef __DLB_OSDEP_BITMAP_H__
+#define __DLB_OSDEP_BITMAP_H__
+
+#include <stdint.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <rte_bitmap.h>
+#include <rte_string_fns.h>
+#include <rte_malloc.h>
+#include <rte_errno.h>
+#include "../dlb_main.h"
+
+
+/*************************/
+/*** Bitmap operations ***/
+/*************************/
+struct dlb_bitmap {
+	struct rte_bitmap *map;
+	unsigned int len;
+	struct dlb_hw *hw;
+};
+
+/**
+ * dlb_bitmap_alloc() - alloc a bitmap data structure
+ * @bitmap: pointer to dlb_bitmap structure pointer.
+ * @len: number of entries in the bitmap.
+ *
+ * This function allocates a bitmap and initializes it with length @len. All
+ * entries are initially zero.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise.
+ *
+ * Errors:
+ * EINVAL - bitmap is NULL or len is 0.
+ * ENOMEM - could not allocate memory for the bitmap data structure.
+ */
+static inline int dlb_bitmap_alloc(struct dlb_hw *hw,
+				   struct dlb_bitmap **bitmap,
+				   unsigned int len)
+{
+	struct dlb_bitmap *bm;
+	void *mem;
+	uint32_t alloc_size;
+	uint32_t nbits = (uint32_t) len;
+	RTE_SET_USED(hw);
+
+	if (!bitmap || nbits == 0)
+		return -EINVAL;
+
+	/* Allocate DLB bitmap control struct */
+	bm = rte_malloc("DLB_PF",
+		sizeof(struct dlb_bitmap),
+		RTE_CACHE_LINE_SIZE);
+
+	if (!bm)
+		return -ENOMEM;
+
+	/* Allocate bitmap memory */
+	alloc_size = rte_bitmap_get_memory_footprint(nbits);
+	mem = rte_malloc("DLB_PF_BITMAP", alloc_size, RTE_CACHE_LINE_SIZE);
+	if (!mem) {
+		rte_free(bm);
+		return -ENOMEM;
+	}
+
+	bm->map = rte_bitmap_init(len, mem, alloc_size);
+	if (!bm->map) {
+		rte_free(mem);
+		rte_free(bm);
+		return -ENOMEM;
+	}
+
+	bm->len = len;
+
+	*bitmap = bm;
+
+	return 0;
+}
+
+/**
+ * dlb_bitmap_free() - free a previously allocated bitmap data structure
+ * @bitmap: pointer to dlb_bitmap structure.
+ *
+ * This function frees a bitmap that was allocated with dlb_bitmap_alloc().
+ */
+static inline void dlb_bitmap_free(struct dlb_bitmap *bitmap)
+{
+	if (!bitmap)
+		rte_panic("NULL dlb_bitmap in %s\n", __func__);
+
+	rte_free(bitmap->map);
+	rte_free(bitmap);
+}
+
+/**
+ * dlb_bitmap_fill() - fill a bitmap with all 1s
+ * @bitmap: pointer to dlb_bitmap structure.
+ *
+ * This function sets all bitmap values to 1.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise.
+ *
+ * Errors:
+ * EINVAL - bitmap is NULL or is uninitialized.
+ */
+static inline int dlb_bitmap_fill(struct dlb_bitmap *bitmap)
+{
+	unsigned int i;
+
+	if (!bitmap || !bitmap->map)
+		return -EINVAL;
+
+	for (i = 0; i != bitmap->len; i++)
+		rte_bitmap_set(bitmap->map, i);
+
+	return 0;
+}
+
+/**
+ * dlb_bitmap_zero() - fill a bitmap with all 0s
+ * @bitmap: pointer to dlb_bitmap structure.
+ *
+ * This function sets all bitmap values to 0.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise.
+ *
+ * Errors:
+ * EINVAL - bitmap is NULL or is uninitialized.
+ */
+static inline int dlb_bitmap_zero(struct dlb_bitmap *bitmap)
+{
+	if (!bitmap || !bitmap->map)
+		return -EINVAL;
+
+	rte_bitmap_reset(bitmap->map);
+
+	return 0;
+}
+
+/**
+ * dlb_bitmap_set() - set a bitmap entry
+ * @bitmap: pointer to dlb_bitmap structure.
+ * @bit: bit index.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise.
+ *
+ * Errors:
+ * EINVAL - bitmap is NULL or is uninitialized, or bit is larger than the
+ *	    bitmap length.
+ */
+static inline int dlb_bitmap_set(struct dlb_bitmap *bitmap,
+				 unsigned int bit)
+{
+	if (!bitmap || !bitmap->map)
+		return -EINVAL;
+
+	if (bitmap->len <= bit)
+		return -EINVAL;
+
+	rte_bitmap_set(bitmap->map, bit);
+
+	return 0;
+}
+
+/**
+ * dlb_bitmap_set_range() - set a range of bitmap entries
+ * @bitmap: pointer to dlb_bitmap structure.
+ * @bit: starting bit index.
+ * @len: length of the range.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise.
+ *
+ * Errors:
+ * EINVAL - bitmap is NULL or is uninitialized, or the range exceeds the bitmap
+ *	    length.
+ */
+static inline int dlb_bitmap_set_range(struct dlb_bitmap *bitmap,
+				       unsigned int bit,
+				       unsigned int len)
+{
+	unsigned int i;
+
+	if (!bitmap || !bitmap->map)
+		return -EINVAL;
+
+	if (bitmap->len <= bit)
+		return -EINVAL;
+
+	for (i = 0; i != len; i++)
+		rte_bitmap_set(bitmap->map, bit + i);
+
+	return 0;
+}
+
+/**
+ * dlb_bitmap_clear() - clear a bitmap entry
+ * @bitmap: pointer to dlb_bitmap structure.
+ * @bit: bit index.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise.
+ *
+ * Errors:
+ * EINVAL - bitmap is NULL or is uninitialized, or bit is larger than the
+ *	    bitmap length.
+ */
+static inline int dlb_bitmap_clear(struct dlb_bitmap *bitmap,
+				   unsigned int bit)
+{
+	if (!bitmap || !bitmap->map)
+		return -EINVAL;
+
+	if (bitmap->len <= bit)
+		return -EINVAL;
+
+	rte_bitmap_clear(bitmap->map, bit);
+
+	return 0;
+}
+
+/**
+ * dlb_bitmap_clear_range() - clear a range of bitmap entries
+ * @bitmap: pointer to dlb_bitmap structure.
+ * @bit: starting bit index.
+ * @len: length of the range.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise.
+ *
+ * Errors:
+ * EINVAL - bitmap is NULL or is uninitialized, or the range exceeds the bitmap
+ *	    length.
+ */
+static inline int dlb_bitmap_clear_range(struct dlb_bitmap *bitmap,
+					 unsigned int bit,
+					 unsigned int len)
+{
+	unsigned int i;
+
+	if (!bitmap || !bitmap->map)
+		return -EINVAL;
+
+	if (bitmap->len <= bit)
+		return -EINVAL;
+
+	for (i = 0; i != len; i++)
+		rte_bitmap_clear(bitmap->map, bit + i);
+
+	return 0;
+}
+
+/**
+ * dlb_bitmap_find_set_bit_range() - find a range of set bits
+ * @bitmap: pointer to dlb_bitmap structure.
+ * @len: length of the range.
+ *
+ * This function looks for a range of set bits of length @len.
+ *
+ * Return:
+ * Returns the base bit index upon success, < 0 otherwise.
+ *
+ * Errors:
+ * ENOENT - unable to find a length *len* range of set bits.
+ * EINVAL - bitmap is NULL or is uninitialized, or len is invalid.
+ */
+static inline int dlb_bitmap_find_set_bit_range(struct dlb_bitmap *bitmap,
+						unsigned int len)
+{
+	unsigned int i, j = 0;
+
+	if (!bitmap || !bitmap->map || len == 0)
+		return -EINVAL;
+
+	if (bitmap->len < len)
+		return -ENOENT;
+
+	for (i = 0; i != bitmap->len; i++) {
+		if  (rte_bitmap_get(bitmap->map, i)) {
+			if (++j == len)
+				return i - j + 1;
+		} else
+			j = 0;
+	}
+
+	/* No set bit range of length len? */
+	return -ENOENT;
+}
+
+/**
+ * dlb_bitmap_find_set_bit() - find the first set bit
+ * @bitmap: pointer to dlb_bitmap structure.
+ *
+ * This function looks for a single set bit.
+ *
+ * Return:
+ * Returns the base bit index upon success, < 0 otherwise.
+ *
+ * Errors:
+ * ENOENT - the bitmap contains no set bits.
+ * EINVAL - bitmap is NULL or is uninitialized.
+ */
+static inline int dlb_bitmap_find_set_bit(struct dlb_bitmap *bitmap)
+{
+	unsigned int i;
+
+	if (!bitmap)
+		return -EINVAL;
+
+	if (!bitmap->map)
+		return -EINVAL;
+
+	for (i = 0; i != bitmap->len; i++) {
+		if  (rte_bitmap_get(bitmap->map, i))
+			return i;
+	}
+
+	return -ENOENT;
+}
+
+/**
+ * dlb_bitmap_count() - returns the number of set bits
+ * @bitmap: pointer to dlb_bitmap structure.
+ *
+ * This function counts the number of set bits in the bitmap.
+ *
+ * Return:
+ * Returns the number of set bits upon success, <0 otherwise.
+ *
+ * Errors:
+ * EINVAL - bitmap is NULL or is uninitialized.
+ */
+static inline int dlb_bitmap_count(struct dlb_bitmap *bitmap)
+{
+	int weight = 0;
+	unsigned int i;
+
+	if (!bitmap)
+		return -EINVAL;
+
+	if (!bitmap->map)
+		return -EINVAL;
+
+	for (i = 0; i != bitmap->len; i++) {
+		if  (rte_bitmap_get(bitmap->map, i))
+			weight++;
+	}
+	return weight;
+}
+
+/**
+ * dlb_bitmap_longest_set_range() - returns longest contiguous range of set bits
+ * @bitmap: pointer to dlb_bitmap structure.
+ *
+ * Return:
+ * Returns the bitmap's longest contiguous range of set bits upon success,
+ * <0 otherwise.
+ *
+ * Errors:
+ * EINVAL - bitmap is NULL or is uninitialized.
+ */
+static inline int dlb_bitmap_longest_set_range(struct dlb_bitmap *bitmap)
+{
+	int max_len = 0, len = 0;
+	unsigned int i;
+
+	if (!bitmap)
+		return -EINVAL;
+
+	if (!bitmap->map)
+		return -EINVAL;
+
+	for (i = 0; i != bitmap->len; i++) {
+		if  (rte_bitmap_get(bitmap->map, i)) {
+			len++;
+		} else {
+			if (len > max_len)
+				max_len = len;
+			len = 0;
+		}
+	}
+
+	if (len > max_len)
+		max_len = len;
+
+	return max_len;
+}
+
+/**
+ * dlb_bitmap_or() - store the logical 'or' of two bitmaps into a third
+ * @dest: pointer to dlb_bitmap structure, which will contain the results of
+ *	  the 'or' of src1 and src2.
+ * @src1: pointer to dlb_bitmap structure, will be 'or'ed with src2.
+ * @src2: pointer to dlb_bitmap structure, will be 'or'ed with src1.
+ *
+ * This function 'or's two bitmaps together and stores the result in a third
+ * bitmap. The source and destination bitmaps can be the same.
+ *
+ * Return:
+ * Returns the number of set bits upon success, <0 otherwise.
+ *
+ * Errors:
+ * EINVAL - One of the bitmaps is NULL or is uninitialized.
+ */
+static inline int dlb_bitmap_or(struct dlb_bitmap *dest,
+				struct dlb_bitmap *src1,
+				struct dlb_bitmap *src2)
+{
+	unsigned int i, min;
+	int numset = 0;
+
+	if (!dest || !dest->map ||
+	    !src1 || !src1->map ||
+	    !src2 || !src2->map)
+		return -EINVAL;
+
+	min = dest->len;
+	min = (min > src1->len) ? src1->len : min;
+	min = (min > src2->len) ? src2->len : min;
+
+	for (i = 0; i != min; i++) {
+		if  (rte_bitmap_get(src1->map, i) ||
+				rte_bitmap_get(src2->map, i)) {
+			rte_bitmap_set(dest->map, i);
+			numset++;
+		} else
+			rte_bitmap_clear(dest->map, i);
+	}
+
+	return numset;
+}
+
+#endif /*  __DLB_OSDEP_BITMAP_H__ */
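
For illustration only (not part of this patch): a minimal sketch of how the
bitmap wrappers above might be used to track a pool of resource slots. The
function name and the pool size are invented for the example.

static int example_reserve_slots(struct dlb_hw *hw, unsigned int num)
{
	struct dlb_bitmap *bm;
	int base, ret;

	ret = dlb_bitmap_alloc(hw, &bm, 64);
	if (ret)
		return ret;

	/* All slots start out free (set) */
	dlb_bitmap_fill(bm);

	/* Find a contiguous run of free slots and mark them in use */
	base = dlb_bitmap_find_set_bit_range(bm, num);
	if (base >= 0)
		dlb_bitmap_clear_range(bm, base, num);

	dlb_bitmap_free(bm);
	return base;
}
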
diff --git a/drivers/event/dlb/pf/base/dlb_osdep_list.h b/drivers/event/dlb/pf/base/dlb_osdep_list.h
new file mode 100644
index 0000000..a53b362
--- /dev/null
+++ b/drivers/event/dlb/pf/base/dlb_osdep_list.h
@@ -0,0 +1,131 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016-2020 Intel Corporation
+ */
+
+#ifndef __DLB_OSDEP_LIST_H__
+#define __DLB_OSDEP_LIST_H__
+
+#include <rte_tailq.h>
+
+struct dlb_list_entry {
+	TAILQ_ENTRY(dlb_list_entry) node;
+};
+
+/* Dummy - just a struct definition */
+TAILQ_HEAD(dlb_list_head, dlb_list_entry);
+
+/* =================
+ * TAILQ Supplements
+ * =================
+ */
+
+#ifndef TAILQ_FOREACH_ENTRY
+#define TAILQ_FOREACH_ENTRY(ptr, head, name, iter)		\
+	for ((iter) = TAILQ_FIRST(&head);			\
+	    (iter)						\
+		&& (ptr = container_of(iter, typeof(*(ptr)), name)); \
+	    (iter) = TAILQ_NEXT((iter), node))
+#endif
+
+#ifndef TAILQ_FOREACH_ENTRY_SAFE
+#define TAILQ_FOREACH_ENTRY_SAFE(ptr, head, name, iter, tvar)	\
+	for ((iter) = TAILQ_FIRST(&head);			\
+	    (iter) &&						\
+		(ptr = container_of(iter, typeof(*(ptr)), name)) &&\
+		((tvar) = TAILQ_NEXT((iter), node), 1);	\
+	    (iter) = (tvar))
+#endif
+
+/* =========
+ * DLB Lists
+ * =========
+ */
+
+/**
+ * dlb_list_init_head() - initialize the head of a list
+ * @head: list head
+ */
+static inline void dlb_list_init_head(struct dlb_list_head *head)
+{
+	TAILQ_INIT(head);
+}
+
+/**
+ * dlb_list_add() - add an entry to a list
+ * @head: new entry will be added after this list header
+ * @entry: new list entry to be added
+ */
+static inline void dlb_list_add(struct dlb_list_head *head,
+				struct dlb_list_entry *entry)
+{
+	TAILQ_INSERT_TAIL(head, entry, node);
+}
+
+/**
+ * dlb_list_del() - delete an entry from a list
+ * @head: list head
+ * @entry: list entry to be deleted
+ */
+static inline void dlb_list_del(struct dlb_list_head *head,
+				struct dlb_list_entry *entry)
+{
+	TAILQ_REMOVE(head, entry, node);
+}
+
+/**
+ * dlb_list_empty() - check if a list is empty
+ * @head: list head
+ *
+ * Return:
+ * Returns 1 if empty, 0 if not.
+ */
+static inline bool dlb_list_empty(struct dlb_list_head *head)
+{
+	return TAILQ_EMPTY(head);
+}
+
+/**
+ * dlb_list_splice() - splice the entries of one list onto another
+ * @src_head: list to be added
+ * @head: where src_head will be inserted
+ */
+static inline void dlb_list_splice(struct dlb_list_head *src_head,
+				   struct dlb_list_head *head)
+{
+	TAILQ_CONCAT(head, src_head, node);
+}
+
+/**
+ * DLB_LIST_HEAD() - retrieve the head of the list
+ * @head: list head
+ * @type: type of the list variable
+ * @name: name of the dlb_list within the struct
+ */
+#define DLB_LIST_HEAD(head, type, name)				\
+	(TAILQ_FIRST(&head) ?					\
+		container_of(TAILQ_FIRST(&head), type, name) :	\
+		NULL)
+
+/**
+ * DLB_LIST_FOR_EACH() - iterate over a list
+ * @head: list head
+ * @ptr: pointer to struct containing a struct dlb_list_entry
+ * @name: name of the dlb_list_entry field within the containing struct
+ * @tmp_iter: iterator variable
+ */
+#define DLB_LIST_FOR_EACH(head, ptr, name, tmp_iter) \
+	TAILQ_FOREACH_ENTRY(ptr, head, name, tmp_iter)
+
+/**
+ * DLB_LIST_FOR_EACH_SAFE() - iterate over a list. This loop works even if
+ * an element is removed from the list while processing it.
+ * @ptr: pointer to struct containing a struct dlb_list_entry
+ * @ptr_tmp: pointer to struct containing a struct dlb_list_entry (temporary)
+ * @head: list head
+ * @name: name of the dlb_list_entry field within the containing struct
+ * @iter: iterator variable
+ * @iter_tmp: iterator variable (temporary)
+ */
+#define DLB_LIST_FOR_EACH_SAFE(head, ptr, ptr_tmp, name, tmp_iter, saf_iter) \
+	TAILQ_FOREACH_ENTRY_SAFE(ptr, head, name, tmp_iter, saf_iter)
+
+#endif /*  __DLB_OSDEP_LIST_H__ */
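
For illustration only (not part of this patch): the list helpers above expect
a dlb_list_entry embedded in a containing struct, and DLB_LIST_FOR_EACH hands
back pointers to the containing objects via container_of. The struct and
function below are invented for the example.

struct example_port {
	int id;
	struct dlb_list_entry func_list;
};

static struct example_port *example_find_port(struct dlb_list_head *head,
					      int id)
{
	struct dlb_list_entry *iter;
	struct example_port *port;

	DLB_LIST_FOR_EACH(*head, port, func_list, iter) {
		if (port->id == id)
			return port;
	}

	return NULL;
}
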
diff --git a/drivers/event/dlb/pf/base/dlb_osdep_types.h b/drivers/event/dlb/pf/base/dlb_osdep_types.h
new file mode 100644
index 0000000..2e9d7d8
--- /dev/null
+++ b/drivers/event/dlb/pf/base/dlb_osdep_types.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016-2020 Intel Corporation
+ */
+
+#ifndef __DLB_OSDEP_TYPES_H
+#define __DLB_OSDEP_TYPES_H
+
+#include <linux/types.h>
+
+#include <inttypes.h>
+#include <ctype.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <string.h>
+#include <unistd.h>
+#include <errno.h>
+
+/* Types for user mode PF PMD */
+typedef uint8_t         u8;
+typedef int8_t          s8;
+typedef uint16_t        u16;
+typedef int16_t         s16;
+typedef uint32_t        u32;
+typedef int32_t         s32;
+typedef uint64_t        u64;
+
+#define __iomem
+
+/* END types for user mode PF PMD */
+
+#endif /* __DLB_OSDEP_TYPES_H */
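
For illustration only (not part of this patch): the register header that
follows pairs an address macro with a union of named bit-fields for each CSR,
so a typical read-modify-write overlays the union on the raw 32-bit value.
The DLB_CSR_RD()/DLB_CSR_WR() helpers assumed below are placeholders for the
driver's register accessors and are not defined in this excerpt.

static inline void example_set_cq_infl_limit(struct dlb_hw *hw, int cq,
					     u32 limit)
{
	union dlb_lsp_cq_ldb_infl_lim r;

	/* Read the raw register, update one field, write it back */
	r.val = DLB_CSR_RD(hw, DLB_LSP_CQ_LDB_INFL_LIM(cq));
	r.field.limit = limit;
	DLB_CSR_WR(hw, DLB_LSP_CQ_LDB_INFL_LIM(cq), r.val);
}
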
diff --git a/drivers/event/dlb/pf/base/dlb_regs.h b/drivers/event/dlb/pf/base/dlb_regs.h
new file mode 100644
index 0000000..3b0be23
--- /dev/null
+++ b/drivers/event/dlb/pf/base/dlb_regs.h
@@ -0,0 +1,2646 @@
+/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause)
+ * Copyright(c) 2016-2020 Intel Corporation
+ */
+
+#ifndef __DLB_REGS_H
+#define __DLB_REGS_H
+
+#include "dlb_osdep_types.h"
+
+#define DLB_FUNC_PF_VF2PF_MAILBOX_BYTES 256
+#define DLB_FUNC_PF_VF2PF_MAILBOX(vf_id, x) \
+	(0x1000 + 0x4 * (x) + (vf_id) * 0x10000)
+#define DLB_FUNC_PF_VF2PF_MAILBOX_RST 0x0
+union dlb_func_pf_vf2pf_mailbox {
+	struct {
+		u32 msg : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_FUNC_PF_VF2PF_MAILBOX_ISR(vf_id) \
+	(0x1f00 + (vf_id) * 0x10000)
+#define DLB_FUNC_PF_VF2PF_MAILBOX_ISR_RST 0x0
+union dlb_func_pf_vf2pf_mailbox_isr {
+	struct {
+		u32 vf_isr : 16;
+		u32 rsvd0 : 16;
+	} field;
+	u32 val;
+};
+
+#define DLB_FUNC_PF_VF2PF_FLR_ISR(vf_id) \
+	(0x1f04 + (vf_id) * 0x10000)
+#define DLB_FUNC_PF_VF2PF_FLR_ISR_RST 0x0
+union dlb_func_pf_vf2pf_flr_isr {
+	struct {
+		u32 vf_isr : 16;
+		u32 rsvd0 : 16;
+	} field;
+	u32 val;
+};
+
+#define DLB_FUNC_PF_VF2PF_ISR_PEND(vf_id) \
+	(0x1f10 + (vf_id) * 0x10000)
+#define DLB_FUNC_PF_VF2PF_ISR_PEND_RST 0x0
+union dlb_func_pf_vf2pf_isr_pend {
+	struct {
+		u32 isr_pend : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_FUNC_PF_PF2VF_MAILBOX_BYTES 64
+#define DLB_FUNC_PF_PF2VF_MAILBOX(vf_id, x) \
+	(0x2000 + 0x4 * (x) + (vf_id) * 0x10000)
+#define DLB_FUNC_PF_PF2VF_MAILBOX_RST 0x0
+union dlb_func_pf_pf2vf_mailbox {
+	struct {
+		u32 msg : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_FUNC_PF_PF2VF_MAILBOX_ISR(vf_id) \
+	(0x2f00 + (vf_id) * 0x10000)
+#define DLB_FUNC_PF_PF2VF_MAILBOX_ISR_RST 0x0
+union dlb_func_pf_pf2vf_mailbox_isr {
+	struct {
+		u32 isr : 16;
+		u32 rsvd0 : 16;
+	} field;
+	u32 val;
+};
+
+#define DLB_FUNC_PF_VF_RESET_IN_PROGRESS(vf_id) \
+	(0x3000 + (vf_id) * 0x10000)
+#define DLB_FUNC_PF_VF_RESET_IN_PROGRESS_RST 0xffff
+union dlb_func_pf_vf_reset_in_progress {
+	struct {
+		u32 reset_in_progress : 16;
+		u32 rsvd0 : 16;
+	} field;
+	u32 val;
+};
+
+#define DLB_MSIX_MEM_VECTOR_CTRL(x) \
+	(0x100000c + (x) * 0x10)
+#define DLB_MSIX_MEM_VECTOR_CTRL_RST 0x1
+union dlb_msix_mem_vector_ctrl {
+	struct {
+		u32 vec_mask : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_TOTAL_VAS 0x124
+#define DLB_SYS_TOTAL_VAS_RST 0x20
+union dlb_sys_total_vas {
+	struct {
+		u32 total_vas : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_ALARM_PF_SYND2 0x508
+#define DLB_SYS_ALARM_PF_SYND2_RST 0x0
+union dlb_sys_alarm_pf_synd2 {
+	struct {
+		u32 lock_id : 16;
+		u32 meas : 1;
+		u32 debug : 7;
+		u32 cq_pop : 1;
+		u32 qe_uhl : 1;
+		u32 qe_orsp : 1;
+		u32 qe_valid : 1;
+		u32 cq_int_rearm : 1;
+		u32 dsi_error : 1;
+		u32 rsvd0 : 2;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_ALARM_PF_SYND1 0x504
+#define DLB_SYS_ALARM_PF_SYND1_RST 0x0
+union dlb_sys_alarm_pf_synd1 {
+	struct {
+		u32 dsi : 16;
+		u32 qid : 8;
+		u32 qtype : 2;
+		u32 qpri : 3;
+		u32 msg_type : 3;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_ALARM_PF_SYND0 0x500
+#define DLB_SYS_ALARM_PF_SYND0_RST 0x0
+union dlb_sys_alarm_pf_synd0 {
+	struct {
+		u32 syndrome : 8;
+		u32 rtype : 2;
+		u32 rsvd0 : 2;
+		u32 from_dmv : 1;
+		u32 is_ldb : 1;
+		u32 cls : 2;
+		u32 aid : 6;
+		u32 unit : 4;
+		u32 source : 4;
+		u32 more : 1;
+		u32 valid : 1;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_VF_LDB_VPP_V(x) \
+	(0xf00 + (x) * 0x1000)
+#define DLB_SYS_VF_LDB_VPP_V_RST 0x0
+union dlb_sys_vf_ldb_vpp_v {
+	struct {
+		u32 vpp_v : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_VF_LDB_VPP2PP(x) \
+	(0xf08 + (x) * 0x1000)
+#define DLB_SYS_VF_LDB_VPP2PP_RST 0x0
+union dlb_sys_vf_ldb_vpp2pp {
+	struct {
+		u32 pp : 6;
+		u32 rsvd0 : 26;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_VF_DIR_VPP_V(x) \
+	(0xf10 + (x) * 0x1000)
+#define DLB_SYS_VF_DIR_VPP_V_RST 0x0
+union dlb_sys_vf_dir_vpp_v {
+	struct {
+		u32 vpp_v : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_VF_DIR_VPP2PP(x) \
+	(0xf18 + (x) * 0x1000)
+#define DLB_SYS_VF_DIR_VPP2PP_RST 0x0
+union dlb_sys_vf_dir_vpp2pp {
+	struct {
+		u32 pp : 7;
+		u32 rsvd0 : 25;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_VF_LDB_VQID_V(x) \
+	(0xf20 + (x) * 0x1000)
+#define DLB_SYS_VF_LDB_VQID_V_RST 0x0
+union dlb_sys_vf_ldb_vqid_v {
+	struct {
+		u32 vqid_v : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_VF_LDB_VQID2QID(x) \
+	(0xf28 + (x) * 0x1000)
+#define DLB_SYS_VF_LDB_VQID2QID_RST 0x0
+union dlb_sys_vf_ldb_vqid2qid {
+	struct {
+		u32 qid : 7;
+		u32 rsvd0 : 25;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_LDB_QID2VQID(x) \
+	(0xf2c + (x) * 0x1000)
+#define DLB_SYS_LDB_QID2VQID_RST 0x0
+union dlb_sys_ldb_qid2vqid {
+	struct {
+		u32 vqid : 7;
+		u32 rsvd0 : 25;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_VF_DIR_VQID_V(x) \
+	(0xf30 + (x) * 0x1000)
+#define DLB_SYS_VF_DIR_VQID_V_RST 0x0
+union dlb_sys_vf_dir_vqid_v {
+	struct {
+		u32 vqid_v : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_VF_DIR_VQID2QID(x) \
+	(0xf38 + (x) * 0x1000)
+#define DLB_SYS_VF_DIR_VQID2QID_RST 0x0
+union dlb_sys_vf_dir_vqid2qid {
+	struct {
+		u32 qid : 7;
+		u32 rsvd0 : 25;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_LDB_VASQID_V(x) \
+	(0xf60 + (x) * 0x1000)
+#define DLB_SYS_LDB_VASQID_V_RST 0x0
+union dlb_sys_ldb_vasqid_v {
+	struct {
+		u32 vasqid_v : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_DIR_VASQID_V(x) \
+	(0xf68 + (x) * 0x1000)
+#define DLB_SYS_DIR_VASQID_V_RST 0x0
+union dlb_sys_dir_vasqid_v {
+	struct {
+		u32 vasqid_v : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_WBUF_DIR_FLAGS(x) \
+	(0xf70 + (x) * 0x1000)
+#define DLB_SYS_WBUF_DIR_FLAGS_RST 0x0
+union dlb_sys_wbuf_dir_flags {
+	struct {
+		u32 wb_v : 4;
+		u32 cl : 1;
+		u32 busy : 1;
+		u32 opt : 1;
+		u32 rsvd0 : 25;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_WBUF_LDB_FLAGS(x) \
+	(0xf78 + (x) * 0x1000)
+#define DLB_SYS_WBUF_LDB_FLAGS_RST 0x0
+union dlb_sys_wbuf_ldb_flags {
+	struct {
+		u32 wb_v : 4;
+		u32 cl : 1;
+		u32 busy : 1;
+		u32 rsvd0 : 26;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_ALARM_VF_SYND2(x) \
+	(0x8000018 + (x) * 0x1000)
+#define DLB_SYS_ALARM_VF_SYND2_RST 0x0
+union dlb_sys_alarm_vf_synd2 {
+	struct {
+		u32 lock_id : 16;
+		u32 meas : 1;
+		u32 debug : 7;
+		u32 cq_pop : 1;
+		u32 qe_uhl : 1;
+		u32 qe_orsp : 1;
+		u32 qe_valid : 1;
+		u32 cq_int_rearm : 1;
+		u32 dsi_error : 1;
+		u32 rsvd0 : 2;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_ALARM_VF_SYND1(x) \
+	(0x8000014 + (x) * 0x1000)
+#define DLB_SYS_ALARM_VF_SYND1_RST 0x0
+union dlb_sys_alarm_vf_synd1 {
+	struct {
+		u32 dsi : 16;
+		u32 qid : 8;
+		u32 qtype : 2;
+		u32 qpri : 3;
+		u32 msg_type : 3;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_ALARM_VF_SYND0(x) \
+	(0x8000010 + (x) * 0x1000)
+#define DLB_SYS_ALARM_VF_SYND0_RST 0x0
+union dlb_sys_alarm_vf_synd0 {
+	struct {
+		u32 syndrome : 8;
+		u32 rtype : 2;
+		u32 rsvd0 : 2;
+		u32 from_dmv : 1;
+		u32 is_ldb : 1;
+		u32 cls : 2;
+		u32 aid : 6;
+		u32 unit : 4;
+		u32 source : 4;
+		u32 more : 1;
+		u32 valid : 1;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_LDB_QID_V(x) \
+	(0x8000034 + (x) * 0x1000)
+#define DLB_SYS_LDB_QID_V_RST 0x0
+union dlb_sys_ldb_qid_v {
+	struct {
+		u32 qid_v : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_LDB_QID_CFG_V(x) \
+	(0x8000030 + (x) * 0x1000)
+#define DLB_SYS_LDB_QID_CFG_V_RST 0x0
+union dlb_sys_ldb_qid_cfg_v {
+	struct {
+		u32 sn_cfg_v : 1;
+		u32 fid_cfg_v : 1;
+		u32 rsvd0 : 30;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_DIR_QID_V(x) \
+	(0x8000040 + (x) * 0x1000)
+#define DLB_SYS_DIR_QID_V_RST 0x0
+union dlb_sys_dir_qid_v {
+	struct {
+		u32 qid_v : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_LDB_POOL_ENBLD(x) \
+	(0x8000070 + (x) * 0x1000)
+#define DLB_SYS_LDB_POOL_ENBLD_RST 0x0
+union dlb_sys_ldb_pool_enbld {
+	struct {
+		u32 pool_enabled : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_DIR_POOL_ENBLD(x) \
+	(0x8000080 + (x) * 0x1000)
+#define DLB_SYS_DIR_POOL_ENBLD_RST 0x0
+union dlb_sys_dir_pool_enbld {
+	struct {
+		u32 pool_enabled : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_LDB_PP2VPP(x) \
+	(0x8000090 + (x) * 0x1000)
+#define DLB_SYS_LDB_PP2VPP_RST 0x0
+union dlb_sys_ldb_pp2vpp {
+	struct {
+		u32 vpp : 6;
+		u32 rsvd0 : 26;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_DIR_PP2VPP(x) \
+	(0x8000094 + (x) * 0x1000)
+#define DLB_SYS_DIR_PP2VPP_RST 0x0
+union dlb_sys_dir_pp2vpp {
+	struct {
+		u32 vpp : 7;
+		u32 rsvd0 : 25;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_LDB_PP_V(x) \
+	(0x8000128 + (x) * 0x1000)
+#define DLB_SYS_LDB_PP_V_RST 0x0
+union dlb_sys_ldb_pp_v {
+	struct {
+		u32 pp_v : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_LDB_CQ_ISR(x) \
+	(0x8000124 + (x) * 0x1000)
+#define DLB_SYS_LDB_CQ_ISR_RST 0x0
+/* CQ Interrupt Modes */
+#define DLB_CQ_ISR_MODE_DIS  0
+#define DLB_CQ_ISR_MODE_MSI  1
+#define DLB_CQ_ISR_MODE_MSIX 2
+union dlb_sys_ldb_cq_isr {
+	struct {
+		u32 vector : 6;
+		u32 vf : 4;
+		u32 en_code : 2;
+		u32 rsvd0 : 20;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_LDB_CQ2VF_PF(x) \
+	(0x8000120 + (x) * 0x1000)
+#define DLB_SYS_LDB_CQ2VF_PF_RST 0x0
+union dlb_sys_ldb_cq2vf_pf {
+	struct {
+		u32 vf : 4;
+		u32 is_pf : 1;
+		u32 rsvd0 : 27;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_LDB_PP2VAS(x) \
+	(0x800011c + (x) * 0x1000)
+#define DLB_SYS_LDB_PP2VAS_RST 0x0
+union dlb_sys_ldb_pp2vas {
+	struct {
+		u32 vas : 5;
+		u32 rsvd0 : 27;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_LDB_PP2LDBPOOL(x) \
+	(0x8000118 + (x) * 0x1000)
+#define DLB_SYS_LDB_PP2LDBPOOL_RST 0x0
+union dlb_sys_ldb_pp2ldbpool {
+	struct {
+		u32 ldbpool : 6;
+		u32 rsvd0 : 26;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_LDB_PP2DIRPOOL(x) \
+	(0x8000114 + (x) * 0x1000)
+#define DLB_SYS_LDB_PP2DIRPOOL_RST 0x0
+union dlb_sys_ldb_pp2dirpool {
+	struct {
+		u32 dirpool : 6;
+		u32 rsvd0 : 26;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_LDB_PP2VF_PF(x) \
+	(0x8000110 + (x) * 0x1000)
+#define DLB_SYS_LDB_PP2VF_PF_RST 0x0
+union dlb_sys_ldb_pp2vf_pf {
+	struct {
+		u32 vf : 4;
+		u32 is_pf : 1;
+		u32 rsvd0 : 27;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_LDB_PP_ADDR_U(x) \
+	(0x800010c + (x) * 0x1000)
+#define DLB_SYS_LDB_PP_ADDR_U_RST 0x0
+union dlb_sys_ldb_pp_addr_u {
+	struct {
+		u32 addr_u : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_LDB_PP_ADDR_L(x) \
+	(0x8000108 + (x) * 0x1000)
+#define DLB_SYS_LDB_PP_ADDR_L_RST 0x0
+union dlb_sys_ldb_pp_addr_l {
+	struct {
+		u32 rsvd0 : 7;
+		u32 addr_l : 25;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_LDB_CQ_ADDR_U(x) \
+	(0x8000104 + (x) * 0x1000)
+#define DLB_SYS_LDB_CQ_ADDR_U_RST 0x0
+union dlb_sys_ldb_cq_addr_u {
+	struct {
+		u32 addr_u : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_LDB_CQ_ADDR_L(x) \
+	(0x8000100 + (x) * 0x1000)
+#define DLB_SYS_LDB_CQ_ADDR_L_RST 0x0
+union dlb_sys_ldb_cq_addr_l {
+	struct {
+		u32 rsvd0 : 6;
+		u32 addr_l : 26;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_DIR_PP_V(x) \
+	(0x8000228 + (x) * 0x1000)
+#define DLB_SYS_DIR_PP_V_RST 0x0
+union dlb_sys_dir_pp_v {
+	struct {
+		u32 pp_v : 1;
+		u32 mb_dm : 1;
+		u32 rsvd0 : 30;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_DIR_CQ_ISR(x) \
+	(0x8000224 + (x) * 0x1000)
+#define DLB_SYS_DIR_CQ_ISR_RST 0x0
+union dlb_sys_dir_cq_isr {
+	struct {
+		u32 vector : 6;
+		u32 vf : 4;
+		u32 en_code : 2;
+		u32 rsvd0 : 20;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_DIR_CQ2VF_PF(x) \
+	(0x8000220 + (x) * 0x1000)
+#define DLB_SYS_DIR_CQ2VF_PF_RST 0x0
+union dlb_sys_dir_cq2vf_pf {
+	struct {
+		u32 vf : 4;
+		u32 is_pf : 1;
+		u32 rsvd0 : 27;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_DIR_PP2VAS(x) \
+	(0x800021c + (x) * 0x1000)
+#define DLB_SYS_DIR_PP2VAS_RST 0x0
+union dlb_sys_dir_pp2vas {
+	struct {
+		u32 vas : 5;
+		u32 rsvd0 : 27;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_DIR_PP2LDBPOOL(x) \
+	(0x8000218 + (x) * 0x1000)
+#define DLB_SYS_DIR_PP2LDBPOOL_RST 0x0
+union dlb_sys_dir_pp2ldbpool {
+	struct {
+		u32 ldbpool : 6;
+		u32 rsvd0 : 26;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_DIR_PP2DIRPOOL(x) \
+	(0x8000214 + (x) * 0x1000)
+#define DLB_SYS_DIR_PP2DIRPOOL_RST 0x0
+union dlb_sys_dir_pp2dirpool {
+	struct {
+		u32 dirpool : 6;
+		u32 rsvd0 : 26;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_DIR_PP2VF_PF(x) \
+	(0x8000210 + (x) * 0x1000)
+#define DLB_SYS_DIR_PP2VF_PF_RST 0x0
+union dlb_sys_dir_pp2vf_pf {
+	struct {
+		u32 vf : 4;
+		u32 is_pf : 1;
+		u32 is_hw_dsi : 1;
+		u32 rsvd0 : 26;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_DIR_PP_ADDR_U(x) \
+	(0x800020c + (x) * 0x1000)
+#define DLB_SYS_DIR_PP_ADDR_U_RST 0x0
+union dlb_sys_dir_pp_addr_u {
+	struct {
+		u32 addr_u : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_DIR_PP_ADDR_L(x) \
+	(0x8000208 + (x) * 0x1000)
+#define DLB_SYS_DIR_PP_ADDR_L_RST 0x0
+union dlb_sys_dir_pp_addr_l {
+	struct {
+		u32 rsvd0 : 7;
+		u32 addr_l : 25;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_DIR_CQ_ADDR_U(x) \
+	(0x8000204 + (x) * 0x1000)
+#define DLB_SYS_DIR_CQ_ADDR_U_RST 0x0
+union dlb_sys_dir_cq_addr_u {
+	struct {
+		u32 addr_u : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_DIR_CQ_ADDR_L(x) \
+	(0x8000200 + (x) * 0x1000)
+#define DLB_SYS_DIR_CQ_ADDR_L_RST 0x0
+union dlb_sys_dir_cq_addr_l {
+	struct {
+		u32 rsvd0 : 6;
+		u32 addr_l : 26;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_INGRESS_ALARM_ENBL 0x300
+#define DLB_SYS_INGRESS_ALARM_ENBL_RST 0x0
+union dlb_sys_ingress_alarm_enbl {
+	struct {
+		u32 illegal_hcw : 1;
+		u32 illegal_pp : 1;
+		u32 disabled_pp : 1;
+		u32 illegal_qid : 1;
+		u32 disabled_qid : 1;
+		u32 illegal_ldb_qid_cfg : 1;
+		u32 illegal_cqid : 1;
+		u32 rsvd0 : 25;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_CQ_MODE 0x30c
+#define DLB_SYS_CQ_MODE_RST 0x0
+union dlb_sys_cq_mode {
+	struct {
+		u32 ldb_cq64 : 1;
+		u32 dir_cq64 : 1;
+		u32 rsvd0 : 30;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_FUNC_VF_BAR_DSBL(x) \
+	(0x310 + (x) * 0x4)
+#define DLB_SYS_FUNC_VF_BAR_DSBL_RST 0x0
+union dlb_sys_func_vf_bar_dsbl {
+	struct {
+		u32 func_vf_bar_dis : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_MSIX_ACK 0x400
+#define DLB_SYS_MSIX_ACK_RST 0x0
+union dlb_sys_msix_ack {
+	struct {
+		u32 msix_0_ack : 1;
+		u32 msix_1_ack : 1;
+		u32 msix_2_ack : 1;
+		u32 msix_3_ack : 1;
+		u32 msix_4_ack : 1;
+		u32 msix_5_ack : 1;
+		u32 msix_6_ack : 1;
+		u32 msix_7_ack : 1;
+		u32 msix_8_ack : 1;
+		u32 rsvd0 : 23;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_MSIX_PASSTHRU 0x404
+#define DLB_SYS_MSIX_PASSTHRU_RST 0x0
+union dlb_sys_msix_passthru {
+	struct {
+		u32 msix_0_passthru : 1;
+		u32 msix_1_passthru : 1;
+		u32 msix_2_passthru : 1;
+		u32 msix_3_passthru : 1;
+		u32 msix_4_passthru : 1;
+		u32 msix_5_passthru : 1;
+		u32 msix_6_passthru : 1;
+		u32 msix_7_passthru : 1;
+		u32 msix_8_passthru : 1;
+		u32 rsvd0 : 23;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_MSIX_MODE 0x408
+#define DLB_SYS_MSIX_MODE_RST 0x0
+/* MSI-X Modes */
+#define DLB_MSIX_MODE_PACKED     0
+#define DLB_MSIX_MODE_COMPRESSED 1
+union dlb_sys_msix_mode {
+	struct {
+		u32 mode : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_DIR_CQ_31_0_OCC_INT_STS 0x440
+#define DLB_SYS_DIR_CQ_31_0_OCC_INT_STS_RST 0x0
+union dlb_sys_dir_cq_31_0_occ_int_sts {
+	struct {
+		u32 cq_0_occ_int : 1;
+		u32 cq_1_occ_int : 1;
+		u32 cq_2_occ_int : 1;
+		u32 cq_3_occ_int : 1;
+		u32 cq_4_occ_int : 1;
+		u32 cq_5_occ_int : 1;
+		u32 cq_6_occ_int : 1;
+		u32 cq_7_occ_int : 1;
+		u32 cq_8_occ_int : 1;
+		u32 cq_9_occ_int : 1;
+		u32 cq_10_occ_int : 1;
+		u32 cq_11_occ_int : 1;
+		u32 cq_12_occ_int : 1;
+		u32 cq_13_occ_int : 1;
+		u32 cq_14_occ_int : 1;
+		u32 cq_15_occ_int : 1;
+		u32 cq_16_occ_int : 1;
+		u32 cq_17_occ_int : 1;
+		u32 cq_18_occ_int : 1;
+		u32 cq_19_occ_int : 1;
+		u32 cq_20_occ_int : 1;
+		u32 cq_21_occ_int : 1;
+		u32 cq_22_occ_int : 1;
+		u32 cq_23_occ_int : 1;
+		u32 cq_24_occ_int : 1;
+		u32 cq_25_occ_int : 1;
+		u32 cq_26_occ_int : 1;
+		u32 cq_27_occ_int : 1;
+		u32 cq_28_occ_int : 1;
+		u32 cq_29_occ_int : 1;
+		u32 cq_30_occ_int : 1;
+		u32 cq_31_occ_int : 1;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_DIR_CQ_63_32_OCC_INT_STS 0x444
+#define DLB_SYS_DIR_CQ_63_32_OCC_INT_STS_RST 0x0
+union dlb_sys_dir_cq_63_32_occ_int_sts {
+	struct {
+		u32 cq_32_occ_int : 1;
+		u32 cq_33_occ_int : 1;
+		u32 cq_34_occ_int : 1;
+		u32 cq_35_occ_int : 1;
+		u32 cq_36_occ_int : 1;
+		u32 cq_37_occ_int : 1;
+		u32 cq_38_occ_int : 1;
+		u32 cq_39_occ_int : 1;
+		u32 cq_40_occ_int : 1;
+		u32 cq_41_occ_int : 1;
+		u32 cq_42_occ_int : 1;
+		u32 cq_43_occ_int : 1;
+		u32 cq_44_occ_int : 1;
+		u32 cq_45_occ_int : 1;
+		u32 cq_46_occ_int : 1;
+		u32 cq_47_occ_int : 1;
+		u32 cq_48_occ_int : 1;
+		u32 cq_49_occ_int : 1;
+		u32 cq_50_occ_int : 1;
+		u32 cq_51_occ_int : 1;
+		u32 cq_52_occ_int : 1;
+		u32 cq_53_occ_int : 1;
+		u32 cq_54_occ_int : 1;
+		u32 cq_55_occ_int : 1;
+		u32 cq_56_occ_int : 1;
+		u32 cq_57_occ_int : 1;
+		u32 cq_58_occ_int : 1;
+		u32 cq_59_occ_int : 1;
+		u32 cq_60_occ_int : 1;
+		u32 cq_61_occ_int : 1;
+		u32 cq_62_occ_int : 1;
+		u32 cq_63_occ_int : 1;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_DIR_CQ_95_64_OCC_INT_STS 0x448
+#define DLB_SYS_DIR_CQ_95_64_OCC_INT_STS_RST 0x0
+union dlb_sys_dir_cq_95_64_occ_int_sts {
+	struct {
+		u32 cq_64_occ_int : 1;
+		u32 cq_65_occ_int : 1;
+		u32 cq_66_occ_int : 1;
+		u32 cq_67_occ_int : 1;
+		u32 cq_68_occ_int : 1;
+		u32 cq_69_occ_int : 1;
+		u32 cq_70_occ_int : 1;
+		u32 cq_71_occ_int : 1;
+		u32 cq_72_occ_int : 1;
+		u32 cq_73_occ_int : 1;
+		u32 cq_74_occ_int : 1;
+		u32 cq_75_occ_int : 1;
+		u32 cq_76_occ_int : 1;
+		u32 cq_77_occ_int : 1;
+		u32 cq_78_occ_int : 1;
+		u32 cq_79_occ_int : 1;
+		u32 cq_80_occ_int : 1;
+		u32 cq_81_occ_int : 1;
+		u32 cq_82_occ_int : 1;
+		u32 cq_83_occ_int : 1;
+		u32 cq_84_occ_int : 1;
+		u32 cq_85_occ_int : 1;
+		u32 cq_86_occ_int : 1;
+		u32 cq_87_occ_int : 1;
+		u32 cq_88_occ_int : 1;
+		u32 cq_89_occ_int : 1;
+		u32 cq_90_occ_int : 1;
+		u32 cq_91_occ_int : 1;
+		u32 cq_92_occ_int : 1;
+		u32 cq_93_occ_int : 1;
+		u32 cq_94_occ_int : 1;
+		u32 cq_95_occ_int : 1;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_DIR_CQ_127_96_OCC_INT_STS 0x44c
+#define DLB_SYS_DIR_CQ_127_96_OCC_INT_STS_RST 0x0
+union dlb_sys_dir_cq_127_96_occ_int_sts {
+	struct {
+		u32 cq_96_occ_int : 1;
+		u32 cq_97_occ_int : 1;
+		u32 cq_98_occ_int : 1;
+		u32 cq_99_occ_int : 1;
+		u32 cq_100_occ_int : 1;
+		u32 cq_101_occ_int : 1;
+		u32 cq_102_occ_int : 1;
+		u32 cq_103_occ_int : 1;
+		u32 cq_104_occ_int : 1;
+		u32 cq_105_occ_int : 1;
+		u32 cq_106_occ_int : 1;
+		u32 cq_107_occ_int : 1;
+		u32 cq_108_occ_int : 1;
+		u32 cq_109_occ_int : 1;
+		u32 cq_110_occ_int : 1;
+		u32 cq_111_occ_int : 1;
+		u32 cq_112_occ_int : 1;
+		u32 cq_113_occ_int : 1;
+		u32 cq_114_occ_int : 1;
+		u32 cq_115_occ_int : 1;
+		u32 cq_116_occ_int : 1;
+		u32 cq_117_occ_int : 1;
+		u32 cq_118_occ_int : 1;
+		u32 cq_119_occ_int : 1;
+		u32 cq_120_occ_int : 1;
+		u32 cq_121_occ_int : 1;
+		u32 cq_122_occ_int : 1;
+		u32 cq_123_occ_int : 1;
+		u32 cq_124_occ_int : 1;
+		u32 cq_125_occ_int : 1;
+		u32 cq_126_occ_int : 1;
+		u32 cq_127_occ_int : 1;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_LDB_CQ_31_0_OCC_INT_STS 0x460
+#define DLB_SYS_LDB_CQ_31_0_OCC_INT_STS_RST 0x0
+union dlb_sys_ldb_cq_31_0_occ_int_sts {
+	struct {
+		u32 cq_0_occ_int : 1;
+		u32 cq_1_occ_int : 1;
+		u32 cq_2_occ_int : 1;
+		u32 cq_3_occ_int : 1;
+		u32 cq_4_occ_int : 1;
+		u32 cq_5_occ_int : 1;
+		u32 cq_6_occ_int : 1;
+		u32 cq_7_occ_int : 1;
+		u32 cq_8_occ_int : 1;
+		u32 cq_9_occ_int : 1;
+		u32 cq_10_occ_int : 1;
+		u32 cq_11_occ_int : 1;
+		u32 cq_12_occ_int : 1;
+		u32 cq_13_occ_int : 1;
+		u32 cq_14_occ_int : 1;
+		u32 cq_15_occ_int : 1;
+		u32 cq_16_occ_int : 1;
+		u32 cq_17_occ_int : 1;
+		u32 cq_18_occ_int : 1;
+		u32 cq_19_occ_int : 1;
+		u32 cq_20_occ_int : 1;
+		u32 cq_21_occ_int : 1;
+		u32 cq_22_occ_int : 1;
+		u32 cq_23_occ_int : 1;
+		u32 cq_24_occ_int : 1;
+		u32 cq_25_occ_int : 1;
+		u32 cq_26_occ_int : 1;
+		u32 cq_27_occ_int : 1;
+		u32 cq_28_occ_int : 1;
+		u32 cq_29_occ_int : 1;
+		u32 cq_30_occ_int : 1;
+		u32 cq_31_occ_int : 1;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_LDB_CQ_63_32_OCC_INT_STS 0x464
+#define DLB_SYS_LDB_CQ_63_32_OCC_INT_STS_RST 0x0
+union dlb_sys_ldb_cq_63_32_occ_int_sts {
+	struct {
+		u32 cq_32_occ_int : 1;
+		u32 cq_33_occ_int : 1;
+		u32 cq_34_occ_int : 1;
+		u32 cq_35_occ_int : 1;
+		u32 cq_36_occ_int : 1;
+		u32 cq_37_occ_int : 1;
+		u32 cq_38_occ_int : 1;
+		u32 cq_39_occ_int : 1;
+		u32 cq_40_occ_int : 1;
+		u32 cq_41_occ_int : 1;
+		u32 cq_42_occ_int : 1;
+		u32 cq_43_occ_int : 1;
+		u32 cq_44_occ_int : 1;
+		u32 cq_45_occ_int : 1;
+		u32 cq_46_occ_int : 1;
+		u32 cq_47_occ_int : 1;
+		u32 cq_48_occ_int : 1;
+		u32 cq_49_occ_int : 1;
+		u32 cq_50_occ_int : 1;
+		u32 cq_51_occ_int : 1;
+		u32 cq_52_occ_int : 1;
+		u32 cq_53_occ_int : 1;
+		u32 cq_54_occ_int : 1;
+		u32 cq_55_occ_int : 1;
+		u32 cq_56_occ_int : 1;
+		u32 cq_57_occ_int : 1;
+		u32 cq_58_occ_int : 1;
+		u32 cq_59_occ_int : 1;
+		u32 cq_60_occ_int : 1;
+		u32 cq_61_occ_int : 1;
+		u32 cq_62_occ_int : 1;
+		u32 cq_63_occ_int : 1;
+	} field;
+	u32 val;
+};
+
+#define DLB_SYS_ALARM_HW_SYND 0x50c
+#define DLB_SYS_ALARM_HW_SYND_RST 0x0
+union dlb_sys_alarm_hw_synd {
+	struct {
+		u32 syndrome : 8;
+		u32 rtype : 2;
+		u32 rsvd0 : 2;
+		u32 from_dmv : 1;
+		u32 is_ldb : 1;
+		u32 cls : 2;
+		u32 aid : 6;
+		u32 unit : 4;
+		u32 source : 4;
+		u32 more : 1;
+		u32 valid : 1;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_CQ_LDB_TOT_SCH_CNT_CTRL(x) \
+	(0x20000000 + (x) * 0x1000)
+#define DLB_LSP_CQ_LDB_TOT_SCH_CNT_CTRL_RST 0x0
+union dlb_lsp_cq_ldb_tot_sch_cnt_ctrl {
+	struct {
+		u32 count : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_CQ_LDB_DSBL(x) \
+	(0x20000124 + (x) * 0x1000)
+#define DLB_LSP_CQ_LDB_DSBL_RST 0x1
+union dlb_lsp_cq_ldb_dsbl {
+	struct {
+		u32 disabled : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_CQ_LDB_TOT_SCH_CNTH(x) \
+	(0x20000120 + (x) * 0x1000)
+#define DLB_LSP_CQ_LDB_TOT_SCH_CNTH_RST 0x0
+union dlb_lsp_cq_ldb_tot_sch_cnth {
+	struct {
+		u32 count : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_CQ_LDB_TOT_SCH_CNTL(x) \
+	(0x2000011c + (x) * 0x1000)
+#define DLB_LSP_CQ_LDB_TOT_SCH_CNTL_RST 0x0
+union dlb_lsp_cq_ldb_tot_sch_cntl {
+	struct {
+		u32 count : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_CQ_LDB_TKN_DEPTH_SEL(x) \
+	(0x20000118 + (x) * 0x1000)
+#define DLB_LSP_CQ_LDB_TKN_DEPTH_SEL_RST 0x0
+union dlb_lsp_cq_ldb_tkn_depth_sel {
+	struct {
+		u32 token_depth_select : 4;
+		u32 ignore_depth : 1;
+		u32 enab_shallow_cq : 1;
+		u32 rsvd0 : 26;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_CQ_LDB_TKN_CNT(x) \
+	(0x20000114 + (x) * 0x1000)
+#define DLB_LSP_CQ_LDB_TKN_CNT_RST 0x0
+union dlb_lsp_cq_ldb_tkn_cnt {
+	struct {
+		u32 token_count : 11;
+		u32 rsvd0 : 21;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_CQ_LDB_INFL_LIM(x) \
+	(0x20000110 + (x) * 0x1000)
+#define DLB_LSP_CQ_LDB_INFL_LIM_RST 0x0
+union dlb_lsp_cq_ldb_infl_lim {
+	struct {
+		u32 limit : 13;
+		u32 rsvd0 : 19;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_CQ_LDB_INFL_CNT(x) \
+	(0x2000010c + (x) * 0x1000)
+#define DLB_LSP_CQ_LDB_INFL_CNT_RST 0x0
+union dlb_lsp_cq_ldb_infl_cnt {
+	struct {
+		u32 count : 13;
+		u32 rsvd0 : 19;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_CQ2QID(x, y) \
+	(0x20000104 + (x) * 0x1000 + (y) * 0x4)
+#define DLB_LSP_CQ2QID_RST 0x0
+union dlb_lsp_cq2qid {
+	struct {
+		u32 qid_p0 : 7;
+		u32 rsvd3 : 1;
+		u32 qid_p1 : 7;
+		u32 rsvd2 : 1;
+		u32 qid_p2 : 7;
+		u32 rsvd1 : 1;
+		u32 qid_p3 : 7;
+		u32 rsvd0 : 1;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_CQ2PRIOV(x) \
+	(0x20000100 + (x) * 0x1000)
+#define DLB_LSP_CQ2PRIOV_RST 0x0
+union dlb_lsp_cq2priov {
+	struct {
+		u32 prio : 24;
+		u32 v : 8;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_CQ_DIR_DSBL(x) \
+	(0x20000310 + (x) * 0x1000)
+#define DLB_LSP_CQ_DIR_DSBL_RST 0x1
+union dlb_lsp_cq_dir_dsbl {
+	struct {
+		u32 disabled : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_CQ_DIR_TKN_DEPTH_SEL_DSI(x) \
+	(0x2000030c + (x) * 0x1000)
+#define DLB_LSP_CQ_DIR_TKN_DEPTH_SEL_DSI_RST 0x0
+union dlb_lsp_cq_dir_tkn_depth_sel_dsi {
+	struct {
+		u32 token_depth_select : 4;
+		u32 disable_wb_opt : 1;
+		u32 ignore_depth : 1;
+		u32 rsvd0 : 26;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_CQ_DIR_TOT_SCH_CNTH(x) \
+	(0x20000308 + (x) * 0x1000)
+#define DLB_LSP_CQ_DIR_TOT_SCH_CNTH_RST 0x0
+union dlb_lsp_cq_dir_tot_sch_cnth {
+	struct {
+		u32 count : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_CQ_DIR_TOT_SCH_CNTL(x) \
+	(0x20000304 + (x) * 0x1000)
+#define DLB_LSP_CQ_DIR_TOT_SCH_CNTL_RST 0x0
+union dlb_lsp_cq_dir_tot_sch_cntl {
+	struct {
+		u32 count : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_CQ_DIR_TKN_CNT(x) \
+	(0x20000300 + (x) * 0x1000)
+#define DLB_LSP_CQ_DIR_TKN_CNT_RST 0x0
+union dlb_lsp_cq_dir_tkn_cnt {
+	struct {
+		u32 count : 11;
+		u32 rsvd0 : 21;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_QID_LDB_QID2CQIDX(x, y) \
+	(0x20000400 + (x) * 0x1000 + (y) * 0x4)
+#define DLB_LSP_QID_LDB_QID2CQIDX_RST 0x0
+union dlb_lsp_qid_ldb_qid2cqidx {
+	struct {
+		u32 cq_p0 : 8;
+		u32 cq_p1 : 8;
+		u32 cq_p2 : 8;
+		u32 cq_p3 : 8;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_QID_LDB_QID2CQIDX2(x, y) \
+	(0x20000500 + (x) * 0x1000 + (y) * 0x4)
+#define DLB_LSP_QID_LDB_QID2CQIDX2_RST 0x0
+union dlb_lsp_qid_ldb_qid2cqidx2 {
+	struct {
+		u32 cq_p0 : 8;
+		u32 cq_p1 : 8;
+		u32 cq_p2 : 8;
+		u32 cq_p3 : 8;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_QID_ATQ_ENQUEUE_CNT(x) \
+	(0x2000066c + (x) * 0x1000)
+#define DLB_LSP_QID_ATQ_ENQUEUE_CNT_RST 0x0
+union dlb_lsp_qid_atq_enqueue_cnt {
+	struct {
+		u32 count : 15;
+		u32 rsvd0 : 17;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_QID_LDB_INFL_LIM(x) \
+	(0x2000064c + (x) * 0x1000)
+#define DLB_LSP_QID_LDB_INFL_LIM_RST 0x0
+union dlb_lsp_qid_ldb_infl_lim {
+	struct {
+		u32 limit : 13;
+		u32 rsvd0 : 19;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_QID_LDB_INFL_CNT(x) \
+	(0x2000062c + (x) * 0x1000)
+#define DLB_LSP_QID_LDB_INFL_CNT_RST 0x0
+union dlb_lsp_qid_ldb_infl_cnt {
+	struct {
+		u32 count : 13;
+		u32 rsvd0 : 19;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_QID_AQED_ACTIVE_LIM(x) \
+	(0x20000628 + (x) * 0x1000)
+#define DLB_LSP_QID_AQED_ACTIVE_LIM_RST 0x0
+union dlb_lsp_qid_aqed_active_lim {
+	struct {
+		u32 limit : 12;
+		u32 rsvd0 : 20;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_QID_AQED_ACTIVE_CNT(x) \
+	(0x20000624 + (x) * 0x1000)
+#define DLB_LSP_QID_AQED_ACTIVE_CNT_RST 0x0
+union dlb_lsp_qid_aqed_active_cnt {
+	struct {
+		u32 count : 12;
+		u32 rsvd0 : 20;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_QID_LDB_ENQUEUE_CNT(x) \
+	(0x20000604 + (x) * 0x1000)
+#define DLB_LSP_QID_LDB_ENQUEUE_CNT_RST 0x0
+union dlb_lsp_qid_ldb_enqueue_cnt {
+	struct {
+		u32 count : 15;
+		u32 rsvd0 : 17;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_QID_LDB_REPLAY_CNT(x) \
+	(0x20000600 + (x) * 0x1000)
+#define DLB_LSP_QID_LDB_REPLAY_CNT_RST 0x0
+union dlb_lsp_qid_ldb_replay_cnt {
+	struct {
+		u32 count : 15;
+		u32 rsvd0 : 17;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_QID_DIR_ENQUEUE_CNT(x) \
+	(0x20000700 + (x) * 0x1000)
+#define DLB_LSP_QID_DIR_ENQUEUE_CNT_RST 0x0
+union dlb_lsp_qid_dir_enqueue_cnt {
+	struct {
+		u32 count : 13;
+		u32 rsvd0 : 19;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_CTRL_CONFIG_0 0x2800002c
+#define DLB_LSP_CTRL_CONFIG_0_RST 0x12cc
+union dlb_lsp_ctrl_config_0 {
+	struct {
+		u32 atm_cq_qid_priority_prot : 1;
+		u32 ldb_arb_ignore_empty : 1;
+		u32 ldb_arb_mode : 2;
+		u32 ldb_arb_threshold : 18;
+		u32 cfg_cq_sla_upd_always : 1;
+		u32 cfg_cq_wcn_upd_always : 1;
+		u32 spare : 8;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_CFG_ARB_WEIGHT_ATM_NALB_QID_1 0x28000028
+#define DLB_LSP_CFG_ARB_WEIGHT_ATM_NALB_QID_1_RST 0x0
+union dlb_lsp_cfg_arb_weight_atm_nalb_qid_1 {
+	struct {
+		u32 slot4_weight : 8;
+		u32 slot5_weight : 8;
+		u32 slot6_weight : 8;
+		u32 slot7_weight : 8;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_CFG_ARB_WEIGHT_ATM_NALB_QID_0 0x28000024
+#define DLB_LSP_CFG_ARB_WEIGHT_ATM_NALB_QID_0_RST 0x0
+union dlb_lsp_cfg_arb_weight_atm_nalb_qid_0 {
+	struct {
+		u32 slot0_weight : 8;
+		u32 slot1_weight : 8;
+		u32 slot2_weight : 8;
+		u32 slot3_weight : 8;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_CFG_ARB_WEIGHT_LDB_QID_1 0x28000020
+#define DLB_LSP_CFG_ARB_WEIGHT_LDB_QID_1_RST 0x0
+union dlb_lsp_cfg_arb_weight_ldb_qid_1 {
+	struct {
+		u32 slot4_weight : 8;
+		u32 slot5_weight : 8;
+		u32 slot6_weight : 8;
+		u32 slot7_weight : 8;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_CFG_ARB_WEIGHT_LDB_QID_0 0x2800001c
+#define DLB_LSP_CFG_ARB_WEIGHT_LDB_QID_0_RST 0x0
+union dlb_lsp_cfg_arb_weight_ldb_qid_0 {
+	struct {
+		u32 slot0_weight : 8;
+		u32 slot1_weight : 8;
+		u32 slot2_weight : 8;
+		u32 slot3_weight : 8;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_LDB_SCHED_CTRL 0x28100000
+#define DLB_LSP_LDB_SCHED_CTRL_RST 0x0
+union dlb_lsp_ldb_sched_ctrl {
+	struct {
+		u32 cq : 8;
+		u32 qidix : 3;
+		u32 value : 1;
+		u32 nalb_haswork_v : 1;
+		u32 rlist_haswork_v : 1;
+		u32 slist_haswork_v : 1;
+		u32 inflight_ok_v : 1;
+		u32 aqed_nfull_v : 1;
+		u32 spare0 : 15;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_DIR_SCH_CNT_H 0x2820000c
+#define DLB_LSP_DIR_SCH_CNT_H_RST 0x0
+union dlb_lsp_dir_sch_cnt_h {
+	struct {
+		u32 count : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_DIR_SCH_CNT_L 0x28200008
+#define DLB_LSP_DIR_SCH_CNT_L_RST 0x0
+union dlb_lsp_dir_sch_cnt_l {
+	struct {
+		u32 count : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_LDB_SCH_CNT_H 0x28200004
+#define DLB_LSP_LDB_SCH_CNT_H_RST 0x0
+union dlb_lsp_ldb_sch_cnt_h {
+	struct {
+		u32 count : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_LSP_LDB_SCH_CNT_L 0x28200000
+#define DLB_LSP_LDB_SCH_CNT_L_RST 0x0
+union dlb_lsp_ldb_sch_cnt_l {
+	struct {
+		u32 count : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_DP_DIR_CSR_CTRL 0x38000018
+#define DLB_DP_DIR_CSR_CTRL_RST 0xc0000000
+union dlb_dp_dir_csr_ctrl {
+	struct {
+		u32 cfg_int_dis : 1;
+		u32 cfg_int_dis_sbe : 1;
+		u32 cfg_int_dis_mbe : 1;
+		u32 spare0 : 27;
+		u32 cfg_vasr_dis : 1;
+		u32 cfg_int_dis_synd : 1;
+	} field;
+	u32 val;
+};
+
+#define DLB_DP_CFG_CTRL_ARB_WEIGHTS_TQPRI_DIR_1 0x38000014
+#define DLB_DP_CFG_CTRL_ARB_WEIGHTS_TQPRI_DIR_1_RST 0xfffefdfc
+union dlb_dp_cfg_ctrl_arb_weights_tqpri_dir_1 {
+	struct {
+		u32 pri4 : 8;
+		u32 pri5 : 8;
+		u32 pri6 : 8;
+		u32 pri7 : 8;
+	} field;
+	u32 val;
+};
+
+#define DLB_DP_CFG_CTRL_ARB_WEIGHTS_TQPRI_DIR_0 0x38000010
+#define DLB_DP_CFG_CTRL_ARB_WEIGHTS_TQPRI_DIR_0_RST 0xfbfaf9f8
+union dlb_dp_cfg_ctrl_arb_weights_tqpri_dir_0 {
+	struct {
+		u32 pri0 : 8;
+		u32 pri1 : 8;
+		u32 pri2 : 8;
+		u32 pri3 : 8;
+	} field;
+	u32 val;
+};
+
+#define DLB_DP_CFG_CTRL_ARB_WEIGHTS_TQPRI_REPLAY_1 0x3800000c
+#define DLB_DP_CFG_CTRL_ARB_WEIGHTS_TQPRI_REPLAY_1_RST 0xfffefdfc
+union dlb_dp_cfg_ctrl_arb_weights_tqpri_replay_1 {
+	struct {
+		u32 pri4 : 8;
+		u32 pri5 : 8;
+		u32 pri6 : 8;
+		u32 pri7 : 8;
+	} field;
+	u32 val;
+};
+
+#define DLB_DP_CFG_CTRL_ARB_WEIGHTS_TQPRI_REPLAY_0 0x38000008
+#define DLB_DP_CFG_CTRL_ARB_WEIGHTS_TQPRI_REPLAY_0_RST 0xfbfaf9f8
+union dlb_dp_cfg_ctrl_arb_weights_tqpri_replay_0 {
+	struct {
+		u32 pri0 : 8;
+		u32 pri1 : 8;
+		u32 pri2 : 8;
+		u32 pri3 : 8;
+	} field;
+	u32 val;
+};
+
+#define DLB_NALB_PIPE_CTRL_ARB_WEIGHTS_TQPRI_NALB_1 0x6800001c
+#define DLB_NALB_PIPE_CTRL_ARB_WEIGHTS_TQPRI_NALB_1_RST 0xfffefdfc
+union dlb_nalb_pipe_ctrl_arb_weights_tqpri_nalb_1 {
+	struct {
+		u32 pri4 : 8;
+		u32 pri5 : 8;
+		u32 pri6 : 8;
+		u32 pri7 : 8;
+	} field;
+	u32 val;
+};
+
+#define DLB_NALB_PIPE_CTRL_ARB_WEIGHTS_TQPRI_NALB_0 0x68000018
+#define DLB_NALB_PIPE_CTRL_ARB_WEIGHTS_TQPRI_NALB_0_RST 0xfbfaf9f8
+union dlb_nalb_pipe_ctrl_arb_weights_tqpri_nalb_0 {
+	struct {
+		u32 pri0 : 8;
+		u32 pri1 : 8;
+		u32 pri2 : 8;
+		u32 pri3 : 8;
+	} field;
+	u32 val;
+};
+
+#define DLB_NALB_PIPE_CFG_CTRL_ARB_WEIGHTS_TQPRI_ATQ_1 0x68000014
+#define DLB_NALB_PIPE_CFG_CTRL_ARB_WEIGHTS_TQPRI_ATQ_1_RST 0xfffefdfc
+union dlb_nalb_pipe_cfg_ctrl_arb_weights_tqpri_atq_1 {
+	struct {
+		u32 pri4 : 8;
+		u32 pri5 : 8;
+		u32 pri6 : 8;
+		u32 pri7 : 8;
+	} field;
+	u32 val;
+};
+
+#define DLB_NALB_PIPE_CFG_CTRL_ARB_WEIGHTS_TQPRI_ATQ_0 0x68000010
+#define DLB_NALB_PIPE_CFG_CTRL_ARB_WEIGHTS_TQPRI_ATQ_0_RST 0xfbfaf9f8
+union dlb_nalb_pipe_cfg_ctrl_arb_weights_tqpri_atq_0 {
+	struct {
+		u32 pri0 : 8;
+		u32 pri1 : 8;
+		u32 pri2 : 8;
+		u32 pri3 : 8;
+	} field;
+	u32 val;
+};
+
+#define DLB_NALB_PIPE_CFG_CTRL_ARB_WEIGHTS_TQPRI_REPLAY_1 0x6800000c
+#define DLB_NALB_PIPE_CFG_CTRL_ARB_WEIGHTS_TQPRI_REPLAY_1_RST 0xfffefdfc
+union dlb_nalb_pipe_cfg_ctrl_arb_weights_tqpri_replay_1 {
+	struct {
+		u32 pri4 : 8;
+		u32 pri5 : 8;
+		u32 pri6 : 8;
+		u32 pri7 : 8;
+	} field;
+	u32 val;
+};
+
+#define DLB_NALB_PIPE_CFG_CTRL_ARB_WEIGHTS_TQPRI_REPLAY_0 0x68000008
+#define DLB_NALB_PIPE_CFG_CTRL_ARB_WEIGHTS_TQPRI_REPLAY_0_RST 0xfbfaf9f8
+union dlb_nalb_pipe_cfg_ctrl_arb_weights_tqpri_replay_0 {
+	struct {
+		u32 pri0 : 8;
+		u32 pri1 : 8;
+		u32 pri2 : 8;
+		u32 pri3 : 8;
+	} field;
+	u32 val;
+};
+
+#define DLB_ATM_PIPE_QID_LDB_QID2CQIDX(x, y) \
+	(0x70000000 + (x) * 0x1000 + (y) * 0x4)
+#define DLB_ATM_PIPE_QID_LDB_QID2CQIDX_RST 0x0
+union dlb_atm_pipe_qid_ldb_qid2cqidx {
+	struct {
+		u32 cq_p0 : 8;
+		u32 cq_p1 : 8;
+		u32 cq_p2 : 8;
+		u32 cq_p3 : 8;
+	} field;
+	u32 val;
+};
+
+#define DLB_ATM_PIPE_CFG_CTRL_ARB_WEIGHTS_SCHED_BIN 0x7800000c
+#define DLB_ATM_PIPE_CFG_CTRL_ARB_WEIGHTS_SCHED_BIN_RST 0xfffefdfc
+union dlb_atm_pipe_cfg_ctrl_arb_weights_sched_bin {
+	struct {
+		u32 bin0 : 8;
+		u32 bin1 : 8;
+		u32 bin2 : 8;
+		u32 bin3 : 8;
+	} field;
+	u32 val;
+};
+
+#define DLB_ATM_PIPE_CTRL_ARB_WEIGHTS_RDY_BIN 0x78000008
+#define DLB_ATM_PIPE_CTRL_ARB_WEIGHTS_RDY_BIN_RST 0xfffefdfc
+union dlb_atm_pipe_ctrl_arb_weights_rdy_bin {
+	struct {
+		u32 bin0 : 8;
+		u32 bin1 : 8;
+		u32 bin2 : 8;
+		u32 bin3 : 8;
+	} field;
+	u32 val;
+};
+
+#define DLB_AQED_PIPE_QID_FID_LIM(x) \
+	(0x80000014 + (x) * 0x1000)
+#define DLB_AQED_PIPE_QID_FID_LIM_RST 0x7ff
+union dlb_aqed_pipe_qid_fid_lim {
+	struct {
+		u32 qid_fid_limit : 13;
+		u32 rsvd0 : 19;
+	} field;
+	u32 val;
+};
+
+#define DLB_AQED_PIPE_FL_POP_PTR(x) \
+	(0x80000010 + (x) * 0x1000)
+#define DLB_AQED_PIPE_FL_POP_PTR_RST 0x0
+union dlb_aqed_pipe_fl_pop_ptr {
+	struct {
+		u32 pop_ptr : 11;
+		u32 generation : 1;
+		u32 rsvd0 : 20;
+	} field;
+	u32 val;
+};
+
+#define DLB_AQED_PIPE_FL_PUSH_PTR(x) \
+	(0x8000000c + (x) * 0x1000)
+#define DLB_AQED_PIPE_FL_PUSH_PTR_RST 0x0
+union dlb_aqed_pipe_fl_push_ptr {
+	struct {
+		u32 push_ptr : 11;
+		u32 generation : 1;
+		u32 rsvd0 : 20;
+	} field;
+	u32 val;
+};
+
+#define DLB_AQED_PIPE_FL_BASE(x) \
+	(0x80000008 + (x) * 0x1000)
+#define DLB_AQED_PIPE_FL_BASE_RST 0x0
+union dlb_aqed_pipe_fl_base {
+	struct {
+		u32 base : 11;
+		u32 rsvd0 : 21;
+	} field;
+	u32 val;
+};
+
+#define DLB_AQED_PIPE_FL_LIM(x) \
+	(0x80000004 + (x) * 0x1000)
+#define DLB_AQED_PIPE_FL_LIM_RST 0x800
+union dlb_aqed_pipe_fl_lim {
+	struct {
+		u32 limit : 11;
+		u32 freelist_disable : 1;
+		u32 rsvd0 : 20;
+	} field;
+	u32 val;
+};
+
+#define DLB_AQED_PIPE_CFG_CTRL_ARB_WEIGHTS_TQPRI_ATM_0 0x88000008
+#define DLB_AQED_PIPE_CFG_CTRL_ARB_WEIGHTS_TQPRI_ATM_0_RST 0xfffe
+union dlb_aqed_pipe_cfg_ctrl_arb_weights_tqpri_atm_0 {
+	struct {
+		u32 pri0 : 8;
+		u32 pri1 : 8;
+		u32 pri2 : 8;
+		u32 pri3 : 8;
+	} field;
+	u32 val;
+};
+
+#define DLB_RO_PIPE_QID2GRPSLT(x) \
+	(0x90000000 + (x) * 0x1000)
+#define DLB_RO_PIPE_QID2GRPSLT_RST 0x0
+union dlb_ro_pipe_qid2grpslt {
+	struct {
+		u32 slot : 5;
+		u32 rsvd1 : 3;
+		u32 group : 2;
+		u32 rsvd0 : 22;
+	} field;
+	u32 val;
+};
+
+#define DLB_RO_PIPE_GRP_SN_MODE 0x98000008
+#define DLB_RO_PIPE_GRP_SN_MODE_RST 0x0
+union dlb_ro_pipe_grp_sn_mode {
+	struct {
+		u32 sn_mode_0 : 3;
+		u32 reserved0 : 5;
+		u32 sn_mode_1 : 3;
+		u32 reserved1 : 5;
+		u32 sn_mode_2 : 3;
+		u32 reserved2 : 5;
+		u32 sn_mode_3 : 3;
+		u32 reserved3 : 5;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_CFG_DIR_PP_SW_ALARM_EN(x) \
+	(0xa000003c + (x) * 0x1000)
+#define DLB_CHP_CFG_DIR_PP_SW_ALARM_EN_RST 0x1
+union dlb_chp_cfg_dir_pp_sw_alarm_en {
+	struct {
+		u32 alarm_enable : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_CQ_WD_ENB(x) \
+	(0xa0000038 + (x) * 0x1000)
+#define DLB_CHP_DIR_CQ_WD_ENB_RST 0x0
+union dlb_chp_dir_cq_wd_enb {
+	struct {
+		u32 wd_enable : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_LDB_PP2POOL(x) \
+	(0xa0000034 + (x) * 0x1000)
+#define DLB_CHP_DIR_LDB_PP2POOL_RST 0x0
+union dlb_chp_dir_ldb_pp2pool {
+	struct {
+		u32 pool : 6;
+		u32 rsvd0 : 26;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_DIR_PP2POOL(x) \
+	(0xa0000030 + (x) * 0x1000)
+#define DLB_CHP_DIR_DIR_PP2POOL_RST 0x0
+union dlb_chp_dir_dir_pp2pool {
+	struct {
+		u32 pool : 6;
+		u32 rsvd0 : 26;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_PP_LDB_CRD_CNT(x) \
+	(0xa000002c + (x) * 0x1000)
+#define DLB_CHP_DIR_PP_LDB_CRD_CNT_RST 0x0
+union dlb_chp_dir_pp_ldb_crd_cnt {
+	struct {
+		u32 count : 16;
+		u32 rsvd0 : 16;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_PP_DIR_CRD_CNT(x) \
+	(0xa0000028 + (x) * 0x1000)
+#define DLB_CHP_DIR_PP_DIR_CRD_CNT_RST 0x0
+union dlb_chp_dir_pp_dir_crd_cnt {
+	struct {
+		u32 count : 14;
+		u32 rsvd0 : 18;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_CQ_TMR_THRESHOLD(x) \
+	(0xa0000024 + (x) * 0x1000)
+#define DLB_CHP_DIR_CQ_TMR_THRESHOLD_RST 0x0
+union dlb_chp_dir_cq_tmr_threshold {
+	struct {
+		u32 timer_thrsh : 14;
+		u32 rsvd0 : 18;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_CQ_INT_ENB(x) \
+	(0xa0000020 + (x) * 0x1000)
+#define DLB_CHP_DIR_CQ_INT_ENB_RST 0x0
+union dlb_chp_dir_cq_int_enb {
+	struct {
+		u32 en_tim : 1;
+		u32 en_depth : 1;
+		u32 rsvd0 : 30;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_CQ_INT_DEPTH_THRSH(x) \
+	(0xa000001c + (x) * 0x1000)
+#define DLB_CHP_DIR_CQ_INT_DEPTH_THRSH_RST 0x0
+union dlb_chp_dir_cq_int_depth_thrsh {
+	struct {
+		u32 depth_threshold : 12;
+		u32 rsvd0 : 20;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_CQ_TKN_DEPTH_SEL(x) \
+	(0xa0000018 + (x) * 0x1000)
+#define DLB_CHP_DIR_CQ_TKN_DEPTH_SEL_RST 0x0
+union dlb_chp_dir_cq_tkn_depth_sel {
+	struct {
+		u32 token_depth_select : 4;
+		u32 rsvd0 : 28;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_PP_LDB_MIN_CRD_QNT(x) \
+	(0xa0000014 + (x) * 0x1000)
+#define DLB_CHP_DIR_PP_LDB_MIN_CRD_QNT_RST 0x1
+union dlb_chp_dir_pp_ldb_min_crd_qnt {
+	struct {
+		u32 quanta : 10;
+		u32 rsvd0 : 22;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_PP_DIR_MIN_CRD_QNT(x) \
+	(0xa0000010 + (x) * 0x1000)
+#define DLB_CHP_DIR_PP_DIR_MIN_CRD_QNT_RST 0x1
+union dlb_chp_dir_pp_dir_min_crd_qnt {
+	struct {
+		u32 quanta : 10;
+		u32 rsvd0 : 22;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_PP_LDB_CRD_LWM(x) \
+	(0xa000000c + (x) * 0x1000)
+#define DLB_CHP_DIR_PP_LDB_CRD_LWM_RST 0x0
+union dlb_chp_dir_pp_ldb_crd_lwm {
+	struct {
+		u32 lwm : 16;
+		u32 rsvd0 : 16;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_PP_LDB_CRD_HWM(x) \
+	(0xa0000008 + (x) * 0x1000)
+#define DLB_CHP_DIR_PP_LDB_CRD_HWM_RST 0x0
+union dlb_chp_dir_pp_ldb_crd_hwm {
+	struct {
+		u32 hwm : 16;
+		u32 rsvd0 : 16;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_PP_DIR_CRD_LWM(x) \
+	(0xa0000004 + (x) * 0x1000)
+#define DLB_CHP_DIR_PP_DIR_CRD_LWM_RST 0x0
+union dlb_chp_dir_pp_dir_crd_lwm {
+	struct {
+		u32 lwm : 14;
+		u32 rsvd0 : 18;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_PP_DIR_CRD_HWM(x) \
+	(0xa0000000 + (x) * 0x1000)
+#define DLB_CHP_DIR_PP_DIR_CRD_HWM_RST 0x0
+union dlb_chp_dir_pp_dir_crd_hwm {
+	struct {
+		u32 hwm : 14;
+		u32 rsvd0 : 18;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_CFG_LDB_PP_SW_ALARM_EN(x) \
+	(0xa0000148 + (x) * 0x1000)
+#define DLB_CHP_CFG_LDB_PP_SW_ALARM_EN_RST 0x1
+union dlb_chp_cfg_ldb_pp_sw_alarm_en {
+	struct {
+		u32 alarm_enable : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_CQ_WD_ENB(x) \
+	(0xa0000144 + (x) * 0x1000)
+#define DLB_CHP_LDB_CQ_WD_ENB_RST 0x0
+union dlb_chp_ldb_cq_wd_enb {
+	struct {
+		u32 wd_enable : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_SN_CHK_ENBL(x) \
+	(0xa0000140 + (x) * 0x1000)
+#define DLB_CHP_SN_CHK_ENBL_RST 0x0
+union dlb_chp_sn_chk_enbl {
+	struct {
+		u32 en : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_HIST_LIST_BASE(x) \
+	(0xa000013c + (x) * 0x1000)
+#define DLB_CHP_HIST_LIST_BASE_RST 0x0
+union dlb_chp_hist_list_base {
+	struct {
+		u32 base : 13;
+		u32 rsvd0 : 19;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_HIST_LIST_LIM(x) \
+	(0xa0000138 + (x) * 0x1000)
+#define DLB_CHP_HIST_LIST_LIM_RST 0x0
+union dlb_chp_hist_list_lim {
+	struct {
+		u32 limit : 13;
+		u32 rsvd0 : 19;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_LDB_PP2POOL(x) \
+	(0xa0000134 + (x) * 0x1000)
+#define DLB_CHP_LDB_LDB_PP2POOL_RST 0x0
+union dlb_chp_ldb_ldb_pp2pool {
+	struct {
+		u32 pool : 6;
+		u32 rsvd0 : 26;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_DIR_PP2POOL(x) \
+	(0xa0000130 + (x) * 0x1000)
+#define DLB_CHP_LDB_DIR_PP2POOL_RST 0x0
+union dlb_chp_ldb_dir_pp2pool {
+	struct {
+		u32 pool : 6;
+		u32 rsvd0 : 26;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_PP_LDB_CRD_CNT(x) \
+	(0xa000012c + (x) * 0x1000)
+#define DLB_CHP_LDB_PP_LDB_CRD_CNT_RST 0x0
+union dlb_chp_ldb_pp_ldb_crd_cnt {
+	struct {
+		u32 count : 16;
+		u32 rsvd0 : 16;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_PP_DIR_CRD_CNT(x) \
+	(0xa0000128 + (x) * 0x1000)
+#define DLB_CHP_LDB_PP_DIR_CRD_CNT_RST 0x0
+union dlb_chp_ldb_pp_dir_crd_cnt {
+	struct {
+		u32 count : 14;
+		u32 rsvd0 : 18;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_CQ_TMR_THRESHOLD(x) \
+	(0xa0000124 + (x) * 0x1000)
+#define DLB_CHP_LDB_CQ_TMR_THRESHOLD_RST 0x0
+union dlb_chp_ldb_cq_tmr_threshold {
+	struct {
+		u32 thrsh : 14;
+		u32 rsvd0 : 18;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_CQ_INT_ENB(x) \
+	(0xa0000120 + (x) * 0x1000)
+#define DLB_CHP_LDB_CQ_INT_ENB_RST 0x0
+union dlb_chp_ldb_cq_int_enb {
+	struct {
+		u32 en_tim : 1;
+		u32 en_depth : 1;
+		u32 rsvd0 : 30;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_CQ_INT_DEPTH_THRSH(x) \
+	(0xa000011c + (x) * 0x1000)
+#define DLB_CHP_LDB_CQ_INT_DEPTH_THRSH_RST 0x0
+union dlb_chp_ldb_cq_int_depth_thrsh {
+	struct {
+		u32 depth_threshold : 12;
+		u32 rsvd0 : 20;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_CQ_TKN_DEPTH_SEL(x) \
+	(0xa0000118 + (x) * 0x1000)
+#define DLB_CHP_LDB_CQ_TKN_DEPTH_SEL_RST 0x0
+union dlb_chp_ldb_cq_tkn_depth_sel {
+	struct {
+		u32 token_depth_select : 4;
+		u32 rsvd0 : 28;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_PP_LDB_MIN_CRD_QNT(x) \
+	(0xa0000114 + (x) * 0x1000)
+#define DLB_CHP_LDB_PP_LDB_MIN_CRD_QNT_RST 0x1
+union dlb_chp_ldb_pp_ldb_min_crd_qnt {
+	struct {
+		u32 quanta : 10;
+		u32 rsvd0 : 22;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_PP_DIR_MIN_CRD_QNT(x) \
+	(0xa0000110 + (x) * 0x1000)
+#define DLB_CHP_LDB_PP_DIR_MIN_CRD_QNT_RST 0x1
+union dlb_chp_ldb_pp_dir_min_crd_qnt {
+	struct {
+		u32 quanta : 10;
+		u32 rsvd0 : 22;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_PP_LDB_CRD_LWM(x) \
+	(0xa000010c + (x) * 0x1000)
+#define DLB_CHP_LDB_PP_LDB_CRD_LWM_RST 0x0
+union dlb_chp_ldb_pp_ldb_crd_lwm {
+	struct {
+		u32 lwm : 16;
+		u32 rsvd0 : 16;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_PP_LDB_CRD_HWM(x) \
+	(0xa0000108 + (x) * 0x1000)
+#define DLB_CHP_LDB_PP_LDB_CRD_HWM_RST 0x0
+union dlb_chp_ldb_pp_ldb_crd_hwm {
+	struct {
+		u32 hwm : 16;
+		u32 rsvd0 : 16;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_PP_DIR_CRD_LWM(x) \
+	(0xa0000104 + (x) * 0x1000)
+#define DLB_CHP_LDB_PP_DIR_CRD_LWM_RST 0x0
+union dlb_chp_ldb_pp_dir_crd_lwm {
+	struct {
+		u32 lwm : 14;
+		u32 rsvd0 : 18;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_PP_DIR_CRD_HWM(x) \
+	(0xa0000100 + (x) * 0x1000)
+#define DLB_CHP_LDB_PP_DIR_CRD_HWM_RST 0x0
+union dlb_chp_ldb_pp_dir_crd_hwm {
+	struct {
+		u32 hwm : 14;
+		u32 rsvd0 : 18;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_CQ_DEPTH(x) \
+	(0xa0000218 + (x) * 0x1000)
+#define DLB_CHP_DIR_CQ_DEPTH_RST 0x0
+union dlb_chp_dir_cq_depth {
+	struct {
+		u32 cq_depth : 11;
+		u32 rsvd0 : 21;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_CQ_WPTR(x) \
+	(0xa0000214 + (x) * 0x1000)
+#define DLB_CHP_DIR_CQ_WPTR_RST 0x0
+union dlb_chp_dir_cq_wptr {
+	struct {
+		u32 write_pointer : 10;
+		u32 rsvd0 : 22;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_PP_LDB_PUSH_PTR(x) \
+	(0xa0000210 + (x) * 0x1000)
+#define DLB_CHP_DIR_PP_LDB_PUSH_PTR_RST 0x0
+union dlb_chp_dir_pp_ldb_push_ptr {
+	struct {
+		u32 push_pointer : 16;
+		u32 rsvd0 : 16;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_PP_DIR_PUSH_PTR(x) \
+	(0xa000020c + (x) * 0x1000)
+#define DLB_CHP_DIR_PP_DIR_PUSH_PTR_RST 0x0
+union dlb_chp_dir_pp_dir_push_ptr {
+	struct {
+		u32 push_pointer : 16;
+		u32 rsvd0 : 16;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_PP_STATE_RESET(x) \
+	(0xa0000204 + (x) * 0x1000)
+#define DLB_CHP_DIR_PP_STATE_RESET_RST 0x0
+union dlb_chp_dir_pp_state_reset {
+	struct {
+		u32 rsvd1 : 7;
+		u32 dir_type : 1;
+		u32 rsvd0 : 23;
+		u32 reset_pp_state : 1;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_PP_CRD_REQ_STATE(x) \
+	(0xa0000200 + (x) * 0x1000)
+#define DLB_CHP_DIR_PP_CRD_REQ_STATE_RST 0x0
+union dlb_chp_dir_pp_crd_req_state {
+	struct {
+		u32 dir_crd_req_active_valid : 1;
+		u32 dir_crd_req_active_check : 1;
+		u32 dir_crd_req_active_busy : 1;
+		u32 rsvd1 : 1;
+		u32 ldb_crd_req_active_valid : 1;
+		u32 ldb_crd_req_active_check : 1;
+		u32 ldb_crd_req_active_busy : 1;
+		u32 rsvd0 : 1;
+		u32 no_pp_credit_update : 1;
+		u32 crd_req_state : 23;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_CQ_DEPTH(x) \
+	(0xa0000320 + (x) * 0x1000)
+#define DLB_CHP_LDB_CQ_DEPTH_RST 0x0
+union dlb_chp_ldb_cq_depth {
+	struct {
+		u32 depth : 11;
+		u32 reserved : 2;
+		u32 rsvd0 : 19;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_CQ_WPTR(x) \
+	(0xa000031c + (x) * 0x1000)
+#define DLB_CHP_LDB_CQ_WPTR_RST 0x0
+union dlb_chp_ldb_cq_wptr {
+	struct {
+		u32 write_pointer : 10;
+		u32 rsvd0 : 22;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_PP_LDB_PUSH_PTR(x) \
+	(0xa0000318 + (x) * 0x1000)
+#define DLB_CHP_LDB_PP_LDB_PUSH_PTR_RST 0x0
+union dlb_chp_ldb_pp_ldb_push_ptr {
+	struct {
+		u32 push_pointer : 16;
+		u32 rsvd0 : 16;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_PP_DIR_PUSH_PTR(x) \
+	(0xa0000314 + (x) * 0x1000)
+#define DLB_CHP_LDB_PP_DIR_PUSH_PTR_RST 0x0
+union dlb_chp_ldb_pp_dir_push_ptr {
+	struct {
+		u32 push_pointer : 16;
+		u32 rsvd0 : 16;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_HIST_LIST_POP_PTR(x) \
+	(0xa000030c + (x) * 0x1000)
+#define DLB_CHP_HIST_LIST_POP_PTR_RST 0x0
+union dlb_chp_hist_list_pop_ptr {
+	struct {
+		u32 pop_ptr : 13;
+		u32 generation : 1;
+		u32 rsvd0 : 18;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_HIST_LIST_PUSH_PTR(x) \
+	(0xa0000308 + (x) * 0x1000)
+#define DLB_CHP_HIST_LIST_PUSH_PTR_RST 0x0
+union dlb_chp_hist_list_push_ptr {
+	struct {
+		u32 push_ptr : 13;
+		u32 generation : 1;
+		u32 rsvd0 : 18;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_PP_STATE_RESET(x) \
+	(0xa0000304 + (x) * 0x1000)
+#define DLB_CHP_LDB_PP_STATE_RESET_RST 0x0
+union dlb_chp_ldb_pp_state_reset {
+	struct {
+		u32 rsvd1 : 7;
+		u32 dir_type : 1;
+		u32 rsvd0 : 23;
+		u32 reset_pp_state : 1;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_PP_CRD_REQ_STATE(x) \
+	(0xa0000300 + (x) * 0x1000)
+#define DLB_CHP_LDB_PP_CRD_REQ_STATE_RST 0x0
+union dlb_chp_ldb_pp_crd_req_state {
+	struct {
+		u32 dir_crd_req_active_valid : 1;
+		u32 dir_crd_req_active_check : 1;
+		u32 dir_crd_req_active_busy : 1;
+		u32 rsvd1 : 1;
+		u32 ldb_crd_req_active_valid : 1;
+		u32 ldb_crd_req_active_check : 1;
+		u32 ldb_crd_req_active_busy : 1;
+		u32 rsvd0 : 1;
+		u32 no_pp_credit_update : 1;
+		u32 crd_req_state : 23;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_ORD_QID_SN(x) \
+	(0xa0000408 + (x) * 0x1000)
+#define DLB_CHP_ORD_QID_SN_RST 0x0
+union dlb_chp_ord_qid_sn {
+	struct {
+		u32 sn : 12;
+		u32 rsvd0 : 20;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_ORD_QID_SN_MAP(x) \
+	(0xa0000404 + (x) * 0x1000)
+#define DLB_CHP_ORD_QID_SN_MAP_RST 0x0
+union dlb_chp_ord_qid_sn_map {
+	struct {
+		u32 mode : 3;
+		u32 slot : 5;
+		u32 grp : 2;
+		u32 rsvd0 : 22;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_POOL_CRD_CNT(x) \
+	(0xa000050c + (x) * 0x1000)
+#define DLB_CHP_LDB_POOL_CRD_CNT_RST 0x0
+union dlb_chp_ldb_pool_crd_cnt {
+	struct {
+		u32 count : 16;
+		u32 rsvd0 : 16;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_QED_FL_BASE(x) \
+	(0xa0000508 + (x) * 0x1000)
+#define DLB_CHP_QED_FL_BASE_RST 0x0
+union dlb_chp_qed_fl_base {
+	struct {
+		u32 base : 14;
+		u32 rsvd0 : 18;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_QED_FL_LIM(x) \
+	(0xa0000504 + (x) * 0x1000)
+#define DLB_CHP_QED_FL_LIM_RST 0x8000
+union dlb_chp_qed_fl_lim {
+	struct {
+		u32 limit : 14;
+		u32 rsvd1 : 1;
+		u32 freelist_disable : 1;
+		u32 rsvd0 : 16;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_POOL_CRD_LIM(x) \
+	(0xa0000500 + (x) * 0x1000)
+#define DLB_CHP_LDB_POOL_CRD_LIM_RST 0x0
+union dlb_chp_ldb_pool_crd_lim {
+	struct {
+		u32 limit : 16;
+		u32 rsvd0 : 16;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_QED_FL_POP_PTR(x) \
+	(0xa0000604 + (x) * 0x1000)
+#define DLB_CHP_QED_FL_POP_PTR_RST 0x0
+union dlb_chp_qed_fl_pop_ptr {
+	struct {
+		u32 pop_ptr : 14;
+		u32 reserved0 : 1;
+		u32 generation : 1;
+		u32 rsvd0 : 16;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_QED_FL_PUSH_PTR(x) \
+	(0xa0000600 + (x) * 0x1000)
+#define DLB_CHP_QED_FL_PUSH_PTR_RST 0x0
+union dlb_chp_qed_fl_push_ptr {
+	struct {
+		u32 push_ptr : 14;
+		u32 reserved0 : 1;
+		u32 generation : 1;
+		u32 rsvd0 : 16;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_POOL_CRD_CNT(x) \
+	(0xa000070c + (x) * 0x1000)
+#define DLB_CHP_DIR_POOL_CRD_CNT_RST 0x0
+union dlb_chp_dir_pool_crd_cnt {
+	struct {
+		u32 count : 14;
+		u32 rsvd0 : 18;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DQED_FL_BASE(x) \
+	(0xa0000708 + (x) * 0x1000)
+#define DLB_CHP_DQED_FL_BASE_RST 0x0
+union dlb_chp_dqed_fl_base {
+	struct {
+		u32 base : 12;
+		u32 rsvd0 : 20;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DQED_FL_LIM(x) \
+	(0xa0000704 + (x) * 0x1000)
+#define DLB_CHP_DQED_FL_LIM_RST 0x2000
+union dlb_chp_dqed_fl_lim {
+	struct {
+		u32 limit : 12;
+		u32 rsvd1 : 1;
+		u32 freelist_disable : 1;
+		u32 rsvd0 : 18;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_POOL_CRD_LIM(x) \
+	(0xa0000700 + (x) * 0x1000)
+#define DLB_CHP_DIR_POOL_CRD_LIM_RST 0x0
+union dlb_chp_dir_pool_crd_lim {
+	struct {
+		u32 limit : 14;
+		u32 rsvd0 : 18;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DQED_FL_POP_PTR(x) \
+	(0xa0000804 + (x) * 0x1000)
+#define DLB_CHP_DQED_FL_POP_PTR_RST 0x0
+union dlb_chp_dqed_fl_pop_ptr {
+	struct {
+		u32 pop_ptr : 12;
+		u32 reserved0 : 1;
+		u32 generation : 1;
+		u32 rsvd0 : 18;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DQED_FL_PUSH_PTR(x) \
+	(0xa0000800 + (x) * 0x1000)
+#define DLB_CHP_DQED_FL_PUSH_PTR_RST 0x0
+union dlb_chp_dqed_fl_push_ptr {
+	struct {
+		u32 push_ptr : 12;
+		u32 reserved0 : 1;
+		u32 generation : 1;
+		u32 rsvd0 : 18;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_CTRL_DIAG_02 0xa8000154
+#define DLB_CHP_CTRL_DIAG_02_RST 0x0
+union dlb_chp_ctrl_diag_02 {
+	struct {
+		u32 control : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_CFG_CHP_CSR_CTRL 0xa8000130
+#define DLB_CHP_CFG_CHP_CSR_CTRL_RST 0xc0003fff
+#define DLB_CHP_CFG_EXCESS_TOKENS_SHIFT 12
+union dlb_chp_cfg_chp_csr_ctrl {
+	struct {
+		u32 int_inf_alarm_enable_0 : 1;
+		u32 int_inf_alarm_enable_1 : 1;
+		u32 int_inf_alarm_enable_2 : 1;
+		u32 int_inf_alarm_enable_3 : 1;
+		u32 int_inf_alarm_enable_4 : 1;
+		u32 int_inf_alarm_enable_5 : 1;
+		u32 int_inf_alarm_enable_6 : 1;
+		u32 int_inf_alarm_enable_7 : 1;
+		u32 int_inf_alarm_enable_8 : 1;
+		u32 int_inf_alarm_enable_9 : 1;
+		u32 int_inf_alarm_enable_10 : 1;
+		u32 int_inf_alarm_enable_11 : 1;
+		u32 int_inf_alarm_enable_12 : 1;
+		u32 int_cor_alarm_enable : 1;
+		u32 csr_control_spare : 14;
+		u32 cfg_vasr_dis : 1;
+		u32 counter_clear : 1;
+		u32 blk_cor_report : 1;
+		u32 blk_cor_synd : 1;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_CQ_INTR_ARMED1 0xa8000068
+#define DLB_CHP_LDB_CQ_INTR_ARMED1_RST 0x0
+union dlb_chp_ldb_cq_intr_armed1 {
+	struct {
+		u32 armed : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_LDB_CQ_INTR_ARMED0 0xa8000064
+#define DLB_CHP_LDB_CQ_INTR_ARMED0_RST 0x0
+union dlb_chp_ldb_cq_intr_armed0 {
+	struct {
+		u32 armed : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_CQ_INTR_ARMED3 0xa8000024
+#define DLB_CHP_DIR_CQ_INTR_ARMED3_RST 0x0
+union dlb_chp_dir_cq_intr_armed3 {
+	struct {
+		u32 armed : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_CQ_INTR_ARMED2 0xa8000020
+#define DLB_CHP_DIR_CQ_INTR_ARMED2_RST 0x0
+union dlb_chp_dir_cq_intr_armed2 {
+	struct {
+		u32 armed : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_CQ_INTR_ARMED1 0xa800001c
+#define DLB_CHP_DIR_CQ_INTR_ARMED1_RST 0x0
+union dlb_chp_dir_cq_intr_armed1 {
+	struct {
+		u32 armed : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_CHP_DIR_CQ_INTR_ARMED0 0xa8000018
+#define DLB_CHP_DIR_CQ_INTR_ARMED0_RST 0x0
+union dlb_chp_dir_cq_intr_armed0 {
+	struct {
+		u32 armed : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_CFG_MSTR_DIAG_RESET_STS 0xb8000004
+#define DLB_CFG_MSTR_DIAG_RESET_STS_RST 0x1ff
+union dlb_cfg_mstr_diag_reset_sts {
+	struct {
+		u32 chp_pf_reset_done : 1;
+		u32 rop_pf_reset_done : 1;
+		u32 lsp_pf_reset_done : 1;
+		u32 nalb_pf_reset_done : 1;
+		u32 ap_pf_reset_done : 1;
+		u32 dp_pf_reset_done : 1;
+		u32 qed_pf_reset_done : 1;
+		u32 dqed_pf_reset_done : 1;
+		u32 aqed_pf_reset_done : 1;
+		u32 rsvd1 : 6;
+		u32 pf_reset_active : 1;
+		u32 chp_vf_reset_done : 1;
+		u32 rop_vf_reset_done : 1;
+		u32 lsp_vf_reset_done : 1;
+		u32 nalb_vf_reset_done : 1;
+		u32 ap_vf_reset_done : 1;
+		u32 dp_vf_reset_done : 1;
+		u32 qed_vf_reset_done : 1;
+		u32 dqed_vf_reset_done : 1;
+		u32 aqed_vf_reset_done : 1;
+		u32 rsvd0 : 6;
+		u32 vf_reset_active : 1;
+	} field;
+	u32 val;
+};
+
+#define DLB_CFG_MSTR_BCAST_RESET_VF_START 0xc8100000
+#define DLB_CFG_MSTR_BCAST_RESET_VF_START_RST 0x0
+/* HW Reset Types */
+#define VF_RST_TYPE_CQ_LDB   0
+#define VF_RST_TYPE_QID_LDB  1
+#define VF_RST_TYPE_POOL_LDB 2
+#define VF_RST_TYPE_CQ_DIR   8
+#define VF_RST_TYPE_QID_DIR  9
+#define VF_RST_TYPE_POOL_DIR 10
+union dlb_cfg_mstr_bcast_reset_vf_start {
+	struct {
+		u32 vf_reset_start : 1;
+		u32 reserved : 3;
+		u32 vf_reset_type : 4;
+		u32 vf_reset_id : 24;
+	} field;
+	u32 val;
+};
+
+#define DLB_FUNC_VF_VF2PF_MAILBOX_BYTES 256
+#define DLB_FUNC_VF_VF2PF_MAILBOX(x) \
+	(0x1000 + (x) * 0x4)
+#define DLB_FUNC_VF_VF2PF_MAILBOX_RST 0x0
+union dlb_func_vf_vf2pf_mailbox {
+	struct {
+		u32 msg : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_FUNC_VF_VF2PF_MAILBOX_ISR 0x1f00
+#define DLB_FUNC_VF_VF2PF_MAILBOX_ISR_RST 0x0
+union dlb_func_vf_vf2pf_mailbox_isr {
+	struct {
+		u32 isr : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_FUNC_VF_PF2VF_MAILBOX_BYTES 64
+#define DLB_FUNC_VF_PF2VF_MAILBOX(x) \
+	(0x2000 + (x) * 0x4)
+#define DLB_FUNC_VF_PF2VF_MAILBOX_RST 0x0
+union dlb_func_vf_pf2vf_mailbox {
+	struct {
+		u32 msg : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_FUNC_VF_PF2VF_MAILBOX_ISR 0x2f00
+#define DLB_FUNC_VF_PF2VF_MAILBOX_ISR_RST 0x0
+union dlb_func_vf_pf2vf_mailbox_isr {
+	struct {
+		u32 pf_isr : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_FUNC_VF_VF_MSI_ISR_PEND 0x2f10
+#define DLB_FUNC_VF_VF_MSI_ISR_PEND_RST 0x0
+union dlb_func_vf_vf_msi_isr_pend {
+	struct {
+		u32 isr_pend : 32;
+	} field;
+	u32 val;
+};
+
+#define DLB_FUNC_VF_VF_RESET_IN_PROGRESS 0x3000
+#define DLB_FUNC_VF_VF_RESET_IN_PROGRESS_RST 0x1
+union dlb_func_vf_vf_reset_in_progress {
+	struct {
+		u32 reset_in_progress : 1;
+		u32 rsvd0 : 31;
+	} field;
+	u32 val;
+};
+
+#define DLB_FUNC_VF_VF_MSI_ISR 0x4000
+#define DLB_FUNC_VF_VF_MSI_ISR_RST 0x0
+union dlb_func_vf_vf_msi_isr {
+	struct {
+		u32 vf_msi_isr : 32;
+	} field;
+	u32 val;
+};
+
+#endif /* __DLB_REGS_H */
diff --git a/drivers/event/dlb/pf/base/dlb_resource.c b/drivers/event/dlb/pf/base/dlb_resource.c
new file mode 100644
index 0000000..4d67eb8
--- /dev/null
+++ b/drivers/event/dlb/pf/base/dlb_resource.c
@@ -0,0 +1,9700 @@
+/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause)
+ * Copyright(c) 2016-2020 Intel Corporation
+ */
+
+#include "dlb_hw_types.h"
+#include "dlb_user.h"
+#include "dlb_resource.h"
+#include "dlb_osdep.h"
+#include "dlb_osdep_bitmap.h"
+#include "dlb_osdep_types.h"
+#include "dlb_regs.h"
+#include "dlb_mbox.h"
+
+#define DLB_DOM_LIST_HEAD(head, type) \
+	DLB_LIST_HEAD((head), type, domain_list)
+
+#define DLB_FUNC_LIST_HEAD(head, type) \
+	DLB_LIST_HEAD((head), type, func_list)
+
+#define DLB_DOM_LIST_FOR(head, ptr, iter) \
+	DLB_LIST_FOR_EACH(head, ptr, domain_list, iter)
+
+#define DLB_FUNC_LIST_FOR(head, ptr, iter) \
+	DLB_LIST_FOR_EACH(head, ptr, func_list, iter)
+
+#define DLB_DOM_LIST_FOR_SAFE(head, ptr, ptr_tmp, it, it_tmp) \
+	DLB_LIST_FOR_EACH_SAFE((head), ptr, ptr_tmp, domain_list, it, it_tmp)
+
+#define DLB_FUNC_LIST_FOR_SAFE(head, ptr, ptr_tmp, it, it_tmp) \
+	DLB_LIST_FOR_EACH_SAFE((head), ptr, ptr_tmp, func_list, it, it_tmp)
+
+/* The PF driver cannot assume that a register write will affect subsequent HCW
+ * writes. To ensure a write completes, the driver must read back a CSR. This
+ * function need only be called for configuration that can occur after the
+ * domain has started; prior to starting, applications can't send HCWs.
+ */
+static inline void dlb_flush_csr(struct dlb_hw *hw)
+{
+	DLB_CSR_RD(hw, DLB_SYS_TOTAL_VAS);
+}
+
+static void dlb_init_fn_rsrc_lists(struct dlb_function_resources *rsrc)
+{
+	dlb_list_init_head(&rsrc->avail_domains);
+	dlb_list_init_head(&rsrc->used_domains);
+	dlb_list_init_head(&rsrc->avail_ldb_queues);
+	dlb_list_init_head(&rsrc->avail_ldb_ports);
+	dlb_list_init_head(&rsrc->avail_dir_pq_pairs);
+	dlb_list_init_head(&rsrc->avail_ldb_credit_pools);
+	dlb_list_init_head(&rsrc->avail_dir_credit_pools);
+}
+
+static void dlb_init_domain_rsrc_lists(struct dlb_domain *domain)
+{
+	dlb_list_init_head(&domain->used_ldb_queues);
+	dlb_list_init_head(&domain->used_ldb_ports);
+	dlb_list_init_head(&domain->used_dir_pq_pairs);
+	dlb_list_init_head(&domain->used_ldb_credit_pools);
+	dlb_list_init_head(&domain->used_dir_credit_pools);
+	dlb_list_init_head(&domain->avail_ldb_queues);
+	dlb_list_init_head(&domain->avail_ldb_ports);
+	dlb_list_init_head(&domain->avail_dir_pq_pairs);
+	dlb_list_init_head(&domain->avail_ldb_credit_pools);
+	dlb_list_init_head(&domain->avail_dir_credit_pools);
+}
+
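+/* Assign every hardware resource (domains, queues, ports, credit pools, and
+ * the freelist/history-list bitmaps) to the PF, leave the per-VF bitmaps
+ * zeroed, and seed the physical resource IDs and sequence-number groups.
+ */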
+int dlb_resource_init(struct dlb_hw *hw)
+{
+	struct dlb_list_entry *list;
+	unsigned int i;
+
+	/* For optimal load-balancing, ports that map to one or more QIDs in
+	 * common should not be in numerical sequence. Which ports share QIDs
+	 * is application-dependent, but the driver interleaves port IDs as
+	 * much as possible to reduce the likelihood of sequential IDs mapping
+	 * to the same queues. This initial allocation maximizes the average
+	 * distance between an ID and its immediate neighbors (i.e. the
+	 * distance from 1 to 0 and to 2, the distance from 2 to 1 and to 3,
+	 * etc.).
+	 */
+	u32 init_ldb_port_allocation[DLB_MAX_NUM_LDB_PORTS] = {
+		0,  31, 62, 29, 60, 27, 58, 25, 56, 23, 54, 21, 52, 19, 50, 17,
+		48, 15, 46, 13, 44, 11, 42,  9, 40,  7, 38,  5, 36,  3, 34, 1,
+		32, 63, 30, 61, 28, 59, 26, 57, 24, 55, 22, 53, 20, 51, 18, 49,
+		16, 47, 14, 45, 12, 43, 10, 41,  8, 39,  6, 37,  4, 35,  2, 33
+	};
+
+	/* Zero-out resource tracking data structures */
+	memset(&hw->rsrcs, 0, sizeof(hw->rsrcs));
+	memset(&hw->pf, 0, sizeof(hw->pf));
+
+	dlb_init_fn_rsrc_lists(&hw->pf);
+
+	for (i = 0; i < DLB_MAX_NUM_VFS; i++) {
+		memset(&hw->vf[i], 0, sizeof(hw->vf[i]));
+		dlb_init_fn_rsrc_lists(&hw->vf[i]);
+	}
+
+	for (i = 0; i < DLB_MAX_NUM_DOMAINS; i++) {
+		memset(&hw->domains[i], 0, sizeof(hw->domains[i]));
+		dlb_init_domain_rsrc_lists(&hw->domains[i]);
+		hw->domains[i].parent_func = &hw->pf;
+	}
+
+	/* Give all resources to the PF driver */
+	hw->pf.num_avail_domains = DLB_MAX_NUM_DOMAINS;
+	for (i = 0; i < hw->pf.num_avail_domains; i++) {
+		list = &hw->domains[i].func_list;
+
+		dlb_list_add(&hw->pf.avail_domains, list);
+	}
+
+	hw->pf.num_avail_ldb_queues = DLB_MAX_NUM_LDB_QUEUES;
+	for (i = 0; i < hw->pf.num_avail_ldb_queues; i++) {
+		list = &hw->rsrcs.ldb_queues[i].func_list;
+
+		dlb_list_add(&hw->pf.avail_ldb_queues, list);
+	}
+
+	hw->pf.num_avail_ldb_ports = DLB_MAX_NUM_LDB_PORTS;
+	for (i = 0; i < hw->pf.num_avail_ldb_ports; i++) {
+		struct dlb_ldb_port *port;
+
+		port = &hw->rsrcs.ldb_ports[init_ldb_port_allocation[i]];
+
+		dlb_list_add(&hw->pf.avail_ldb_ports, &port->func_list);
+	}
+
+	hw->pf.num_avail_dir_pq_pairs = DLB_MAX_NUM_DIR_PORTS;
+	for (i = 0; i < hw->pf.num_avail_dir_pq_pairs; i++) {
+		list = &hw->rsrcs.dir_pq_pairs[i].func_list;
+
+		dlb_list_add(&hw->pf.avail_dir_pq_pairs, list);
+	}
+
+	hw->pf.num_avail_ldb_credit_pools = DLB_MAX_NUM_LDB_CREDIT_POOLS;
+	for (i = 0; i < hw->pf.num_avail_ldb_credit_pools; i++) {
+		list = &hw->rsrcs.ldb_credit_pools[i].func_list;
+
+		dlb_list_add(&hw->pf.avail_ldb_credit_pools, list);
+	}
+
+	hw->pf.num_avail_dir_credit_pools = DLB_MAX_NUM_DIR_CREDIT_POOLS;
+	for (i = 0; i < hw->pf.num_avail_dir_credit_pools; i++) {
+		list = &hw->rsrcs.dir_credit_pools[i].func_list;
+
+		dlb_list_add(&hw->pf.avail_dir_credit_pools, list);
+	}
+
+	/* There are 5120 history list entries, which allows us to overprovision
+	 * the inflight limit (4096) by 1k.
+	 */
+	if (dlb_bitmap_alloc(hw,
+			     &hw->pf.avail_hist_list_entries,
+			     DLB_MAX_NUM_HIST_LIST_ENTRIES))
+		return -1;
+
+	if (dlb_bitmap_fill(hw->pf.avail_hist_list_entries))
+		return -1;
+
+	if (dlb_bitmap_alloc(hw,
+			     &hw->pf.avail_qed_freelist_entries,
+			     DLB_MAX_NUM_LDB_CREDITS))
+		return -1;
+
+	if (dlb_bitmap_fill(hw->pf.avail_qed_freelist_entries))
+		return -1;
+
+	if (dlb_bitmap_alloc(hw,
+			     &hw->pf.avail_dqed_freelist_entries,
+			     DLB_MAX_NUM_DIR_CREDITS))
+		return -1;
+
+	if (dlb_bitmap_fill(hw->pf.avail_dqed_freelist_entries))
+		return -1;
+
+	if (dlb_bitmap_alloc(hw,
+			     &hw->pf.avail_aqed_freelist_entries,
+			     DLB_MAX_NUM_AQOS_ENTRIES))
+		return -1;
+
+	if (dlb_bitmap_fill(hw->pf.avail_aqed_freelist_entries))
+		return -1;
+
+	for (i = 0; i < DLB_MAX_NUM_VFS; i++) {
+		if (dlb_bitmap_alloc(hw,
+				     &hw->vf[i].avail_hist_list_entries,
+				     DLB_MAX_NUM_HIST_LIST_ENTRIES))
+			return -1;
+		if (dlb_bitmap_alloc(hw,
+				     &hw->vf[i].avail_qed_freelist_entries,
+				     DLB_MAX_NUM_LDB_CREDITS))
+			return -1;
+		if (dlb_bitmap_alloc(hw,
+				     &hw->vf[i].avail_dqed_freelist_entries,
+				     DLB_MAX_NUM_DIR_CREDITS))
+			return -1;
+		if (dlb_bitmap_alloc(hw,
+				     &hw->vf[i].avail_aqed_freelist_entries,
+				     DLB_MAX_NUM_AQOS_ENTRIES))
+			return -1;
+
+		if (dlb_bitmap_zero(hw->vf[i].avail_hist_list_entries))
+			return -1;
+
+		if (dlb_bitmap_zero(hw->vf[i].avail_qed_freelist_entries))
+			return -1;
+
+		if (dlb_bitmap_zero(hw->vf[i].avail_dqed_freelist_entries))
+			return -1;
+
+		if (dlb_bitmap_zero(hw->vf[i].avail_aqed_freelist_entries))
+			return -1;
+	}
+
+	/* Initialize the hardware resource IDs */
+	for (i = 0; i < DLB_MAX_NUM_DOMAINS; i++) {
+		hw->domains[i].id.phys_id = i;
+		hw->domains[i].id.vf_owned = false;
+	}
+
+	for (i = 0; i < DLB_MAX_NUM_LDB_QUEUES; i++) {
+		hw->rsrcs.ldb_queues[i].id.phys_id = i;
+		hw->rsrcs.ldb_queues[i].id.vf_owned = false;
+	}
+
+	for (i = 0; i < DLB_MAX_NUM_LDB_PORTS; i++) {
+		hw->rsrcs.ldb_ports[i].id.phys_id = i;
+		hw->rsrcs.ldb_ports[i].id.vf_owned = false;
+	}
+
+	for (i = 0; i < DLB_MAX_NUM_DIR_PORTS; i++) {
+		hw->rsrcs.dir_pq_pairs[i].id.phys_id = i;
+		hw->rsrcs.dir_pq_pairs[i].id.vf_owned = false;
+	}
+
+	for (i = 0; i < DLB_MAX_NUM_LDB_CREDIT_POOLS; i++) {
+		hw->rsrcs.ldb_credit_pools[i].id.phys_id = i;
+		hw->rsrcs.ldb_credit_pools[i].id.vf_owned = false;
+	}
+
+	for (i = 0; i < DLB_MAX_NUM_DIR_CREDIT_POOLS; i++) {
+		hw->rsrcs.dir_credit_pools[i].id.phys_id = i;
+		hw->rsrcs.dir_credit_pools[i].id.vf_owned = false;
+	}
+
+	for (i = 0; i < DLB_MAX_NUM_SEQUENCE_NUMBER_GROUPS; i++) {
+		hw->rsrcs.sn_groups[i].id = i;
+		/* Default mode (0) is 32 sequence numbers per queue */
+		hw->rsrcs.sn_groups[i].mode = 0;
+		hw->rsrcs.sn_groups[i].sequence_numbers_per_queue = 32;
+		hw->rsrcs.sn_groups[i].slot_use_bitmap = 0;
+	}
+
+	return 0;
+}
+
+void dlb_resource_free(struct dlb_hw *hw)
+{
+	int i;
+
+	dlb_bitmap_free(hw->pf.avail_hist_list_entries);
+
+	dlb_bitmap_free(hw->pf.avail_qed_freelist_entries);
+
+	dlb_bitmap_free(hw->pf.avail_dqed_freelist_entries);
+
+	dlb_bitmap_free(hw->pf.avail_aqed_freelist_entries);
+
+	for (i = 0; i < DLB_MAX_NUM_VFS; i++) {
+		dlb_bitmap_free(hw->vf[i].avail_hist_list_entries);
+		dlb_bitmap_free(hw->vf[i].avail_qed_freelist_entries);
+		dlb_bitmap_free(hw->vf[i].avail_dqed_freelist_entries);
+		dlb_bitmap_free(hw->vf[i].avail_aqed_freelist_entries);
+	}
+}
+
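+/* Look up a domain by ID. PF requests use the ID as a direct index into the
+ * physical domain array; VF requests carry a virtual ID, so the VF's list of
+ * in-use domains is searched for a matching virt_id instead.
+ */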
+static struct dlb_domain *dlb_get_domain_from_id(struct dlb_hw *hw,
+						 u32 id,
+						 bool vf_request,
+						 unsigned int vf_id)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_function_resources *rsrcs;
+	struct dlb_domain *domain;
+
+	if (id >= DLB_MAX_NUM_DOMAINS)
+		return NULL;
+
+	if (!vf_request)
+		return &hw->domains[id];
+
+	rsrcs = &hw->vf[vf_id];
+
+	DLB_FUNC_LIST_FOR(rsrcs->used_domains, domain, iter)
+		if (domain->id.virt_id == id)
+			return domain;
+
+	return NULL;
+}
+
+static struct dlb_credit_pool *
+dlb_get_domain_ldb_pool(u32 id,
+			bool vf_request,
+			struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_credit_pool *pool;
+
+	if (id >= DLB_MAX_NUM_LDB_CREDIT_POOLS)
+		return NULL;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_credit_pools, pool, iter)
+		if ((!vf_request && pool->id.phys_id == id) ||
+		    (vf_request && pool->id.virt_id == id))
+			return pool;
+
+	return NULL;
+}
+
+static struct dlb_credit_pool *
+dlb_get_domain_dir_pool(u32 id,
+			bool vf_request,
+			struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_credit_pool *pool;
+
+	if (id >= DLB_MAX_NUM_DIR_CREDIT_POOLS)
+		return NULL;
+
+	DLB_DOM_LIST_FOR(domain->used_dir_credit_pools, pool, iter)
+		if ((!vf_request && pool->id.phys_id == id) ||
+		    (vf_request && pool->id.virt_id == id))
+			return pool;
+
+	return NULL;
+}
+
+static struct dlb_ldb_port *dlb_get_ldb_port_from_id(struct dlb_hw *hw,
+						     u32 id,
+						     bool vf_request,
+						     unsigned int vf_id)
+{
+	struct dlb_list_entry *iter1 __attribute__((unused));
+	struct dlb_list_entry *iter2 __attribute__((unused));
+	struct dlb_function_resources *rsrcs;
+	struct dlb_ldb_port *port;
+	struct dlb_domain *domain;
+
+	if (id >= DLB_MAX_NUM_LDB_PORTS)
+		return NULL;
+
+	rsrcs = (vf_request) ? &hw->vf[vf_id] : &hw->pf;
+
+	if (!vf_request)
+		return &hw->rsrcs.ldb_ports[id];
+
+	DLB_FUNC_LIST_FOR(rsrcs->used_domains, domain, iter1) {
+		DLB_DOM_LIST_FOR(domain->used_ldb_ports, port, iter2)
+			if (port->id.virt_id == id)
+				return port;
+	}
+
+	DLB_FUNC_LIST_FOR(rsrcs->avail_ldb_ports, port, iter1)
+		if (port->id.virt_id == id)
+			return port;
+
+	return NULL;
+}
+
+static struct dlb_ldb_port *
+dlb_get_domain_used_ldb_port(u32 id,
+			     bool vf_request,
+			     struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_ldb_port *port;
+
+	if (id >= DLB_MAX_NUM_LDB_PORTS)
+		return NULL;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_ports, port, iter)
+		if ((!vf_request && port->id.phys_id == id) ||
+		    (vf_request && port->id.virt_id == id))
+			return port;
+
+	DLB_DOM_LIST_FOR(domain->avail_ldb_ports, port, iter)
+		if ((!vf_request && port->id.phys_id == id) ||
+		    (vf_request && port->id.virt_id == id))
+			return port;
+
+	return NULL;
+}
+
+static struct dlb_ldb_port *dlb_get_domain_ldb_port(u32 id,
+						    bool vf_request,
+						    struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_ldb_port *port;
+
+	if (id >= DLB_MAX_NUM_LDB_PORTS)
+		return NULL;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_ports, port, iter)
+		if ((!vf_request && port->id.phys_id == id) ||
+		    (vf_request && port->id.virt_id == id))
+			return port;
+
+	DLB_DOM_LIST_FOR(domain->avail_ldb_ports, port, iter)
+		if ((!vf_request && port->id.phys_id == id) ||
+		    (vf_request && port->id.virt_id == id))
+			return port;
+
+	return NULL;
+}
+
+static struct dlb_dir_pq_pair *dlb_get_dir_pq_from_id(struct dlb_hw *hw,
+						      u32 id,
+						      bool vf_request,
+						      unsigned int vf_id)
+{
+	struct dlb_list_entry *iter1 __attribute__((unused));
+	struct dlb_list_entry *iter2 __attribute__((unused));
+	struct dlb_function_resources *rsrcs;
+	struct dlb_dir_pq_pair *port;
+	struct dlb_domain *domain;
+
+	if (id >= DLB_MAX_NUM_DIR_PORTS)
+		return NULL;
+
+	rsrcs = (vf_request) ? &hw->vf[vf_id] : &hw->pf;
+
+	if (!vf_request)
+		return &hw->rsrcs.dir_pq_pairs[id];
+
+	DLB_FUNC_LIST_FOR(rsrcs->used_domains, domain, iter1) {
+		DLB_DOM_LIST_FOR(domain->used_dir_pq_pairs, port, iter2)
+			if (port->id.virt_id == id)
+				return port;
+	}
+
+	DLB_FUNC_LIST_FOR(rsrcs->avail_dir_pq_pairs, port, iter1)
+		if (port->id.virt_id == id)
+			return port;
+
+	return NULL;
+}
+
+static struct dlb_dir_pq_pair *
+dlb_get_domain_used_dir_pq(u32 id,
+			   bool vf_request,
+			   struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_dir_pq_pair *port;
+
+	if (id >= DLB_MAX_NUM_DIR_PORTS)
+		return NULL;
+
+	DLB_DOM_LIST_FOR(domain->used_dir_pq_pairs, port, iter)
+		if ((!vf_request && port->id.phys_id == id) ||
+		    (vf_request && port->id.virt_id == id))
+			return port;
+
+	return NULL;
+}
+
+static struct dlb_dir_pq_pair *dlb_get_domain_dir_pq(u32 id,
+						     bool vf_request,
+						     struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_dir_pq_pair *port;
+
+	if (id >= DLB_MAX_NUM_DIR_PORTS)
+		return NULL;
+
+	DLB_DOM_LIST_FOR(domain->used_dir_pq_pairs, port, iter)
+		if ((!vf_request && port->id.phys_id == id) ||
+		    (vf_request && port->id.virt_id == id))
+			return port;
+
+	DLB_DOM_LIST_FOR(domain->avail_dir_pq_pairs, port, iter)
+		if ((!vf_request && port->id.phys_id == id) ||
+		    (vf_request && port->id.virt_id == id))
+			return port;
+
+	return NULL;
+}
+
+static struct dlb_ldb_queue *dlb_get_ldb_queue_from_id(struct dlb_hw *hw,
+						       u32 id,
+						       bool vf_request,
+						       unsigned int vf_id)
+{
+	struct dlb_list_entry *iter1 __attribute__((unused));
+	struct dlb_list_entry *iter2 __attribute__((unused));
+	struct dlb_function_resources *rsrcs;
+	struct dlb_ldb_queue *queue;
+	struct dlb_domain *domain;
+
+	if (id >= DLB_MAX_NUM_LDB_QUEUES)
+		return NULL;
+
+	rsrcs = (vf_request) ? &hw->vf[vf_id] : &hw->pf;
+
+	if (!vf_request)
+		return &hw->rsrcs.ldb_queues[id];
+
+	DLB_FUNC_LIST_FOR(rsrcs->used_domains, domain, iter1) {
+		DLB_DOM_LIST_FOR(domain->used_ldb_queues, queue, iter2)
+			if (queue->id.virt_id == id)
+				return queue;
+	}
+
+	DLB_FUNC_LIST_FOR(rsrcs->avail_ldb_queues, queue, iter1)
+		if (queue->id.virt_id == id)
+			return queue;
+
+	return NULL;
+}
+
+static struct dlb_ldb_queue *dlb_get_domain_ldb_queue(u32 id,
+						      bool vf_request,
+						      struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_ldb_queue *queue;
+
+	if (id >= DLB_MAX_NUM_LDB_QUEUES)
+		return NULL;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_queues, queue, iter)
+		if ((!vf_request && queue->id.phys_id == id) ||
+		    (vf_request && queue->id.virt_id == id))
+			return queue;
+
+	return NULL;
+}
+
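+/* Move up to 'num' resources of the given type from the source function's
+ * available list to the destination's, updating both availability counts.
+ * Used below to rebalance list-tracked resources between the PF and a VF.
+ */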
+#define DLB_XFER_LL_RSRC(dst, src, num, type_t, name) ({		    \
+	struct dlb_list_entry *it1 __attribute__((unused));		    \
+	struct dlb_list_entry *it2 __attribute__((unused));		    \
+	struct dlb_function_resources *_src = src;			    \
+	struct dlb_function_resources *_dst = dst;			    \
+	type_t *ptr, *tmp __attribute__((unused));			    \
+	unsigned int i = 0;						    \
+									    \
+	DLB_FUNC_LIST_FOR_SAFE(_src->avail_##name##s, ptr, tmp, it1, it2) { \
+		if (i++ == (num))					    \
+			break;						    \
+									    \
+		dlb_list_del(&_src->avail_##name##s, &ptr->func_list);	    \
+		dlb_list_add(&_dst->avail_##name##s,  &ptr->func_list);     \
+		_src->num_avail_##name##s--;				    \
+		_dst->num_avail_##name##s++;				    \
+	}								    \
+})
+
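+/* Clear the vf_owned flag on every resource in the given list so the entries
+ * are no longer marked as VF-owned.
+ */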
+#define DLB_VF_ID_CLEAR(head, type_t) ({   \
+	struct dlb_list_entry *iter __attribute__((unused)); \
+	type_t *var;					     \
+							     \
+	DLB_FUNC_LIST_FOR(head, var, iter)		     \
+		var->id.vf_owned = false;		     \
+})
+
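+/* The dlb_update_vf_*() functions below reassign a resource type between the
+ * PF and a VF: the VF's current allocation is first returned to the PF, then
+ * 'num' entries are moved from the PF to the VF, or the original count is
+ * restored (and -EINVAL returned) if the PF does not have enough. Changes are
+ * rejected with -EPERM while the VF is locked.
+ */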
+int dlb_update_vf_sched_domains(struct dlb_hw *hw, u32 vf_id, u32 num)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_function_resources *src, *dst;
+	struct dlb_domain *domain;
+	unsigned int orig;
+	int ret;
+
+	if (vf_id >= DLB_MAX_NUM_VFS)
+		return -EINVAL;
+
+	src = &hw->pf;
+	dst = &hw->vf[vf_id];
+
+	/* If the VF is locked, its resource assignment can't be changed */
+	if (dlb_vf_is_locked(hw, vf_id))
+		return -EPERM;
+
+	orig = dst->num_avail_domains;
+
+	/* Detach the destination VF's current resources before checking if
+	 * enough are available, and set their IDs accordingly.
+	 */
+	DLB_VF_ID_CLEAR(dst->avail_domains, struct dlb_domain);
+
+	DLB_XFER_LL_RSRC(src, dst, orig, struct dlb_domain, domain);
+
+	/* Are there enough available resources to satisfy the request? */
+	if (num > src->num_avail_domains) {
+		num = orig;
+		ret = -EINVAL;
+	} else {
+		ret = 0;
+	}
+
+	DLB_XFER_LL_RSRC(dst, src, num, struct dlb_domain, domain);
+
+	/* Set the domains' VF backpointer */
+	DLB_FUNC_LIST_FOR(dst->avail_domains, domain, iter)
+		domain->parent_func = dst;
+
+	return ret;
+}
+
+int dlb_update_vf_ldb_queues(struct dlb_hw *hw, u32 vf_id, u32 num)
+{
+	struct dlb_function_resources *src, *dst;
+	unsigned int orig;
+	int ret;
+
+	if (vf_id >= DLB_MAX_NUM_VFS)
+		return -EINVAL;
+
+	src = &hw->pf;
+	dst = &hw->vf[vf_id];
+
+	/* If the VF is locked, its resource assignment can't be changed */
+	if (dlb_vf_is_locked(hw, vf_id))
+		return -EPERM;
+
+	orig = dst->num_avail_ldb_queues;
+
+	/* Detach the destination VF's current resources before checking if
+	 * enough are available, and set their IDs accordingly.
+	 */
+	DLB_VF_ID_CLEAR(dst->avail_ldb_queues, struct dlb_ldb_queue);
+
+	DLB_XFER_LL_RSRC(src, dst, orig, struct dlb_ldb_queue, ldb_queue);
+
+	/* Are there enough available resources to satisfy the request? */
+	if (num > src->num_avail_ldb_queues) {
+		num = orig;
+		ret = -EINVAL;
+	} else {
+		ret = 0;
+	}
+
+	DLB_XFER_LL_RSRC(dst, src, num, struct dlb_ldb_queue, ldb_queue);
+
+	return ret;
+}
+
+int dlb_update_vf_ldb_ports(struct dlb_hw *hw, u32 vf_id, u32 num)
+{
+	struct dlb_function_resources *src, *dst;
+	unsigned int orig;
+	int ret;
+
+	if (vf_id >= DLB_MAX_NUM_VFS)
+		return -EINVAL;
+
+	src = &hw->pf;
+	dst = &hw->vf[vf_id];
+
+	/* If the VF is locked, its resource assignment can't be changed */
+	if (dlb_vf_is_locked(hw, vf_id))
+		return -EPERM;
+
+	orig = dst->num_avail_ldb_ports;
+
+	/* Detach the destination VF's current resources before checking if
+	 * enough are available, and set their IDs accordingly.
+	 */
+	DLB_VF_ID_CLEAR(dst->avail_ldb_ports, struct dlb_ldb_port);
+
+	DLB_XFER_LL_RSRC(src, dst, orig, struct dlb_ldb_port, ldb_port);
+
+	/* Are there enough available resources to satisfy the request? */
+	if (num > src->num_avail_ldb_ports) {
+		num = orig;
+		ret = -EINVAL;
+	} else {
+		ret = 0;
+	}
+
+	DLB_XFER_LL_RSRC(dst, src, num, struct dlb_ldb_port, ldb_port);
+
+	return ret;
+}
+
+int dlb_update_vf_dir_ports(struct dlb_hw *hw, u32 vf_id, u32 num)
+{
+	struct dlb_function_resources *src, *dst;
+	unsigned int orig;
+	int ret;
+
+	if (vf_id >= DLB_MAX_NUM_VFS)
+		return -EINVAL;
+
+	src = &hw->pf;
+	dst = &hw->vf[vf_id];
+
+	/* If the VF is locked, its resource assignment can't be changed */
+	if (dlb_vf_is_locked(hw, vf_id))
+		return -EPERM;
+
+	orig = dst->num_avail_dir_pq_pairs;
+
+	/* Detach the destination VF's current resources before checking if
+	 * enough are available, and set their IDs accordingly.
+	 */
+	DLB_VF_ID_CLEAR(dst->avail_dir_pq_pairs, struct dlb_dir_pq_pair);
+
+	DLB_XFER_LL_RSRC(src, dst, orig, struct dlb_dir_pq_pair, dir_pq_pair);
+
+	/* Are there enough available resources to satisfy the request? */
+	if (num > src->num_avail_dir_pq_pairs) {
+		num = orig;
+		ret = -EINVAL;
+	} else {
+		ret = 0;
+	}
+
+	DLB_XFER_LL_RSRC(dst, src, num, struct dlb_dir_pq_pair, dir_pq_pair);
+
+	return ret;
+}
+
+int dlb_update_vf_ldb_credit_pools(struct dlb_hw *hw,
+				   u32 vf_id,
+				   u32 num)
+{
+	struct dlb_function_resources *src, *dst;
+	unsigned int orig;
+	int ret;
+
+	if (vf_id >= DLB_MAX_NUM_VFS)
+		return -EINVAL;
+
+	src = &hw->pf;
+	dst = &hw->vf[vf_id];
+
+	/* If the VF is locked, its resource assignment can't be changed */
+	if (dlb_vf_is_locked(hw, vf_id))
+		return -EPERM;
+
+	orig = dst->num_avail_ldb_credit_pools;
+
+	/* Detach the destination VF's current resources before checking if
+	 * enough are available, and set their IDs accordingly.
+	 */
+	DLB_VF_ID_CLEAR(dst->avail_ldb_credit_pools, struct dlb_credit_pool);
+
+	DLB_XFER_LL_RSRC(src,
+			 dst,
+			 orig,
+			 struct dlb_credit_pool,
+			 ldb_credit_pool);
+
+	/* Are there enough available resources to satisfy the request? */
+	if (num > src->num_avail_ldb_credit_pools) {
+		num = orig;
+		ret = -EINVAL;
+	} else {
+		ret = 0;
+	}
+
+	DLB_XFER_LL_RSRC(dst,
+			 src,
+			 num,
+			 struct dlb_credit_pool,
+			 ldb_credit_pool);
+
+	return ret;
+}
+
+int dlb_update_vf_dir_credit_pools(struct dlb_hw *hw,
+				   u32 vf_id,
+				   u32 num)
+{
+	struct dlb_function_resources *src, *dst;
+	unsigned int orig;
+	int ret;
+
+	if (vf_id >= DLB_MAX_NUM_VFS)
+		return -EINVAL;
+
+	src = &hw->pf;
+	dst = &hw->vf[vf_id];
+
+	/* If the VF is locked, its resource assignment can't be changed */
+	if (dlb_vf_is_locked(hw, vf_id))
+		return -EPERM;
+
+	orig = dst->num_avail_dir_credit_pools;
+
+	/* Detach the VF's current resources before checking if enough are
+	 * available, and set their IDs accordingly.
+	 */
+	DLB_VF_ID_CLEAR(dst->avail_dir_credit_pools, struct dlb_credit_pool);
+
+	DLB_XFER_LL_RSRC(src,
+			 dst,
+			 orig,
+			 struct dlb_credit_pool,
+			 dir_credit_pool);
+
+	/* Are there enough available resources to satisfy the request? */
+	if (num > src->num_avail_dir_credit_pools) {
+		num = orig;
+		ret = -EINVAL;
+	} else {
+		ret = 0;
+	}
+
+	DLB_XFER_LL_RSRC(dst,
+			 src,
+			 num,
+			 struct dlb_credit_pool,
+			 dir_credit_pool);
+
+	return ret;
+}
+
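+/* Bitmap-tracked resources (credits, history-list and atomic-inflight
+ * entries) are reassigned in contiguous ranges: the destination's entries are
+ * merged back into the source, then a contiguous range of 'num' bits is
+ * carved out for the destination. If no large enough range exists, the
+ * original count is restored and -EINVAL is returned.
+ */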
+static int dlb_transfer_bitmap_resources(struct dlb_bitmap *src,
+					 struct dlb_bitmap *dst,
+					 u32 num)
+{
+	int orig, ret, base;
+
+	/* Validate bitmaps before use */
+	if (dlb_bitmap_count(dst) < 0 || dlb_bitmap_count(src) < 0)
+		return -EINVAL;
+
+	/* Reassign the dest's bitmap entries to the source's before checking
+	 * if a contiguous chunk of size 'num' is available. The reassignment
+	 * may be necessary to create a sufficiently large contiguous chunk.
+	 */
+	orig = dlb_bitmap_count(dst);
+
+	dlb_bitmap_or(src, src, dst);
+
+	dlb_bitmap_zero(dst);
+
+	/* Are there enough available resources to satisfy the request? */
+	base = dlb_bitmap_find_set_bit_range(src, num);
+
+	if (base == -ENOENT) {
+		num = orig;
+		base = dlb_bitmap_find_set_bit_range(src, num);
+		ret = -EINVAL;
+	} else {
+		ret = 0;
+	}
+
+	dlb_bitmap_set_range(dst, base, num);
+
+	dlb_bitmap_clear_range(src, base, num);
+
+	return ret;
+}
+
+int dlb_update_vf_ldb_credits(struct dlb_hw *hw, u32 vf_id, u32 num)
+{
+	struct dlb_function_resources *src, *dst;
+
+	if (vf_id >= DLB_MAX_NUM_VFS)
+		return -EINVAL;
+
+	src = &hw->pf;
+	dst = &hw->vf[vf_id];
+
+	/* If the VF is locked, its resource assignment can't be changed */
+	if (dlb_vf_is_locked(hw, vf_id))
+		return -EPERM;
+
+	return dlb_transfer_bitmap_resources(src->avail_qed_freelist_entries,
+					     dst->avail_qed_freelist_entries,
+					     num);
+}
+
+int dlb_update_vf_dir_credits(struct dlb_hw *hw, u32 vf_id, u32 num)
+{
+	struct dlb_function_resources *src, *dst;
+
+	if (vf_id >= DLB_MAX_NUM_VFS)
+		return -EINVAL;
+
+	src = &hw->pf;
+	dst = &hw->vf[vf_id];
+
+	/* If the VF is locked, its resource assignment can't be changed */
+	if (dlb_vf_is_locked(hw, vf_id))
+		return -EPERM;
+
+	return dlb_transfer_bitmap_resources(src->avail_dqed_freelist_entries,
+					     dst->avail_dqed_freelist_entries,
+					     num);
+}
+
+int dlb_update_vf_hist_list_entries(struct dlb_hw *hw,
+				    u32 vf_id,
+				    u32 num)
+{
+	struct dlb_function_resources *src, *dst;
+
+	if (vf_id >= DLB_MAX_NUM_VFS)
+		return -EINVAL;
+
+	src = &hw->pf;
+	dst = &hw->vf[vf_id];
+
+	/* If the VF is locked, its resource assignment can't be changed */
+	if (dlb_vf_is_locked(hw, vf_id))
+		return -EPERM;
+
+	return dlb_transfer_bitmap_resources(src->avail_hist_list_entries,
+					     dst->avail_hist_list_entries,
+					     num);
+}
+
+int dlb_update_vf_atomic_inflights(struct dlb_hw *hw,
+				   u32 vf_id,
+				   u32 num)
+{
+	struct dlb_function_resources *src, *dst;
+
+	if (vf_id >= DLB_MAX_NUM_VFS)
+		return -EINVAL;
+
+	src = &hw->pf;
+	dst = &hw->vf[vf_id];
+
+	/* If the VF is locked, its resource assignment can't be changed */
+	if (dlb_vf_is_locked(hw, vf_id))
+		return -EPERM;
+
+	return dlb_transfer_bitmap_resources(src->avail_aqed_freelist_entries,
+					     dst->avail_aqed_freelist_entries,
+					     num);
+}
+
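+/* The dlb_attach_*() helpers below move resources from a function (PF or VF)
+ * into a scheduling domain. If an internal error occurs partway through, the
+ * resources assigned so far are returned to the function's available list.
+ */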
+static int dlb_attach_ldb_queues(struct dlb_hw *hw,
+				 struct dlb_function_resources *rsrcs,
+				 struct dlb_domain *domain,
+				 u32 num_queues,
+				 struct dlb_cmd_response *resp)
+{
+	unsigned int i, j;
+
+	if (rsrcs->num_avail_ldb_queues < num_queues) {
+		resp->status = DLB_ST_LDB_QUEUES_UNAVAILABLE;
+		return -1;
+	}
+
+	for (i = 0; i < num_queues; i++) {
+		struct dlb_ldb_queue *queue;
+
+		queue = DLB_FUNC_LIST_HEAD(rsrcs->avail_ldb_queues,
+					   typeof(*queue));
+		if (!queue) {
+			DLB_HW_ERR(hw,
+				   "[%s()] Internal error: domain validation failed\n",
+				   __func__);
+			goto cleanup;
+		}
+
+		dlb_list_del(&rsrcs->avail_ldb_queues, &queue->func_list);
+
+		queue->domain_id = domain->id;
+		queue->owned = true;
+
+		dlb_list_add(&domain->avail_ldb_queues, &queue->domain_list);
+	}
+
+	rsrcs->num_avail_ldb_queues -= num_queues;
+
+	return 0;
+
+cleanup:
+
+	/* Return the assigned queues */
+	for (j = 0; j < i; j++) {
+		struct dlb_ldb_queue *queue;
+
+		queue = DLB_FUNC_LIST_HEAD(domain->avail_ldb_queues,
+					   typeof(*queue));
+		/* Unrecoverable internal error */
+		if (!queue)
+			break;
+
+		queue->owned = false;
+
+		dlb_list_del(&domain->avail_ldb_queues, &queue->domain_list);
+
+		dlb_list_add(&rsrcs->avail_ldb_queues, &queue->func_list);
+	}
+
+	return -EFAULT;
+}
+
+static struct dlb_ldb_port *
+dlb_get_next_ldb_port(struct dlb_hw *hw,
+		      struct dlb_function_resources *rsrcs,
+		      u32 domain_id)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_ldb_port *port;
+
+	/* To reduce the odds of consecutive load-balanced ports mapping to the
+	 * same queue(s), the driver attempts to allocate ports whose neighbors
+	 * are owned by a different domain.
+	 */
+	DLB_FUNC_LIST_FOR(rsrcs->avail_ldb_ports, port, iter) {
+		u32 next, prev;
+		u32 phys_id;
+
+		phys_id = port->id.phys_id;
+		next = phys_id + 1;
+		prev = phys_id - 1;
+
+		if (phys_id == DLB_MAX_NUM_LDB_PORTS - 1)
+			next = 0;
+		if (phys_id == 0)
+			prev = DLB_MAX_NUM_LDB_PORTS - 1;
+
+		if (!hw->rsrcs.ldb_ports[next].owned ||
+		    hw->rsrcs.ldb_ports[next].domain_id.phys_id == domain_id)
+			continue;
+
+		if (!hw->rsrcs.ldb_ports[prev].owned ||
+		    hw->rsrcs.ldb_ports[prev].domain_id.phys_id == domain_id)
+			continue;
+
+		return port;
+	}
+
+	/* Failing that, the driver looks for a port with one neighbor owned by
+	 * a different domain and the other unallocated.
+	 */
+	DLB_FUNC_LIST_FOR(rsrcs->avail_ldb_ports, port, iter) {
+		u32 next, prev;
+		u32 phys_id;
+
+		phys_id = port->id.phys_id;
+		next = phys_id + 1;
+		prev = phys_id - 1;
+
+		if (phys_id == DLB_MAX_NUM_LDB_PORTS - 1)
+			next = 0;
+		if (phys_id == 0)
+			prev = DLB_MAX_NUM_LDB_PORTS - 1;
+
+		if (!hw->rsrcs.ldb_ports[prev].owned &&
+		    hw->rsrcs.ldb_ports[next].owned &&
+		    hw->rsrcs.ldb_ports[next].domain_id.phys_id != domain_id)
+			return port;
+
+		if (!hw->rsrcs.ldb_ports[next].owned &&
+		    hw->rsrcs.ldb_ports[prev].owned &&
+		    hw->rsrcs.ldb_ports[prev].domain_id.phys_id != domain_id)
+			return port;
+	}
+
+	/* Failing that, the driver looks for a port with both neighbors
+	 * unallocated.
+	 */
+	DLB_FUNC_LIST_FOR(rsrcs->avail_ldb_ports, port, iter) {
+		u32 next, prev;
+		u32 phys_id;
+
+		phys_id = port->id.phys_id;
+		next = phys_id + 1;
+		prev = phys_id - 1;
+
+		if (phys_id == DLB_MAX_NUM_LDB_PORTS - 1)
+			next = 0;
+		if (phys_id == 0)
+			prev = DLB_MAX_NUM_LDB_PORTS - 1;
+
+		if (!hw->rsrcs.ldb_ports[prev].owned &&
+		    !hw->rsrcs.ldb_ports[next].owned)
+			return port;
+	}
+
+	/* If all else fails, the driver returns the next available port. */
+	return DLB_FUNC_LIST_HEAD(rsrcs->avail_ldb_ports, typeof(*port));
+}
+
+static int dlb_attach_ldb_ports(struct dlb_hw *hw,
+				struct dlb_function_resources *rsrcs,
+				struct dlb_domain *domain,
+				u32 num_ports,
+				struct dlb_cmd_response *resp)
+{
+	unsigned int i, j;
+
+	if (rsrcs->num_avail_ldb_ports < num_ports) {
+		resp->status = DLB_ST_LDB_PORTS_UNAVAILABLE;
+		return -1;
+	}
+
+	for (i = 0; i < num_ports; i++) {
+		struct dlb_ldb_port *port;
+
+		port = dlb_get_next_ldb_port(hw, rsrcs, domain->id.phys_id);
+
+		if (!port) {
+			DLB_HW_ERR(hw,
+				   "[%s()] Internal error: domain validation failed\n",
+				   __func__);
+			goto cleanup;
+		}
+
+		dlb_list_del(&rsrcs->avail_ldb_ports, &port->func_list);
+
+		port->domain_id = domain->id;
+		port->owned = true;
+
+		dlb_list_add(&domain->avail_ldb_ports, &port->domain_list);
+	}
+
+	rsrcs->num_avail_ldb_ports -= num_ports;
+
+	return 0;
+
+cleanup:
+
+	/* Return the assigned ports */
+	for (j = 0; j < i; j++) {
+		struct dlb_ldb_port *port;
+
+		port = DLB_FUNC_LIST_HEAD(domain->avail_ldb_ports,
+					  typeof(*port));
+		/* Unrecoverable internal error */
+		if (!port)
+			break;
+
+		port->owned = false;
+
+		dlb_list_del(&domain->avail_ldb_ports, &port->domain_list);
+
+		dlb_list_add(&rsrcs->avail_ldb_ports, &port->func_list);
+	}
+
+	return -EFAULT;
+}
+
+static int dlb_attach_dir_ports(struct dlb_hw *hw,
+				struct dlb_function_resources *rsrcs,
+				struct dlb_domain *domain,
+				u32 num_ports,
+				struct dlb_cmd_response *resp)
+{
+	unsigned int i, j;
+
+	if (rsrcs->num_avail_dir_pq_pairs < num_ports) {
+		resp->status = DLB_ST_DIR_PORTS_UNAVAILABLE;
+		return -1;
+	}
+
+	for (i = 0; i < num_ports; i++) {
+		struct dlb_dir_pq_pair *port;
+
+		port = DLB_FUNC_LIST_HEAD(rsrcs->avail_dir_pq_pairs,
+					  typeof(*port));
+		if (!port) {
+			DLB_HW_ERR(hw,
+				   "[%s()] Internal error: domain validation failed\n",
+				   __func__);
+			goto cleanup;
+		}
+
+		dlb_list_del(&rsrcs->avail_dir_pq_pairs, &port->func_list);
+
+		port->domain_id = domain->id;
+		port->owned = true;
+
+		dlb_list_add(&domain->avail_dir_pq_pairs, &port->domain_list);
+	}
+
+	rsrcs->num_avail_dir_pq_pairs -= num_ports;
+
+	return 0;
+
+cleanup:
+
+	/* Return the assigned ports */
+	for (j = 0; j < i; j++) {
+		struct dlb_dir_pq_pair *port;
+
+		port = DLB_FUNC_LIST_HEAD(domain->avail_dir_pq_pairs,
+					  typeof(*port));
+		/* Unrecoverable internal error */
+		if (!port)
+			break;
+
+		port->owned = false;
+
+		dlb_list_del(&domain->avail_dir_pq_pairs, &port->domain_list);
+
+		dlb_list_add(&rsrcs->avail_dir_pq_pairs, &port->func_list);
+	}
+
+	return -EFAULT;
+}
+
+static int dlb_attach_ldb_credits(struct dlb_function_resources *rsrcs,
+				  struct dlb_domain *domain,
+				  u32 num_credits,
+				  struct dlb_cmd_response *resp)
+{
+	struct dlb_bitmap *bitmap = rsrcs->avail_qed_freelist_entries;
+
+	if (dlb_bitmap_count(bitmap) < (int)num_credits) {
+		resp->status = DLB_ST_LDB_CREDITS_UNAVAILABLE;
+		return -1;
+	}
+
+	if (num_credits) {
+		int base;
+
+		base = dlb_bitmap_find_set_bit_range(bitmap, num_credits);
+		if (base < 0)
+			goto error;
+
+		domain->qed_freelist.base = base;
+		domain->qed_freelist.bound = base + num_credits;
+		domain->qed_freelist.offset = 0;
+
+		dlb_bitmap_clear_range(bitmap, base, num_credits);
+	}
+
+	return 0;
+
+error:
+	resp->status = DLB_ST_QED_FREELIST_ENTRIES_UNAVAILABLE;
+	return -1;
+}
+
+static int dlb_attach_dir_credits(struct dlb_function_resources *rsrcs,
+				  struct dlb_domain *domain,
+				  u32 num_credits,
+				  struct dlb_cmd_response *resp)
+{
+	struct dlb_bitmap *bitmap = rsrcs->avail_dqed_freelist_entries;
+
+	if (dlb_bitmap_count(bitmap) < (int)num_credits) {
+		resp->status = DLB_ST_DIR_CREDITS_UNAVAILABLE;
+		return -1;
+	}
+
+	if (num_credits) {
+		int base;
+
+		base = dlb_bitmap_find_set_bit_range(bitmap, num_credits);
+		if (base < 0)
+			goto error;
+
+		domain->dqed_freelist.base = base;
+		domain->dqed_freelist.bound = base + num_credits;
+		domain->dqed_freelist.offset = 0;
+
+		dlb_bitmap_clear_range(bitmap, base, num_credits);
+	}
+
+	return 0;
+
+error:
+	resp->status = DLB_ST_DQED_FREELIST_ENTRIES_UNAVAILABLE;
+	return -1;
+}
+
+static int dlb_attach_ldb_credit_pools(struct dlb_hw *hw,
+				       struct dlb_function_resources *rsrcs,
+				       struct dlb_domain *domain,
+				       u32 num_credit_pools,
+				       struct dlb_cmd_response *resp)
+{
+	unsigned int i, j;
+
+	if (rsrcs->num_avail_ldb_credit_pools < num_credit_pools) {
+		resp->status = DLB_ST_LDB_CREDIT_POOLS_UNAVAILABLE;
+		return -1;
+	}
+
+	for (i = 0; i < num_credit_pools; i++) {
+		struct dlb_credit_pool *pool;
+
+		pool = DLB_FUNC_LIST_HEAD(rsrcs->avail_ldb_credit_pools,
+					  typeof(*pool));
+		if (!pool) {
+			DLB_HW_ERR(hw,
+				   "[%s()] Internal error: domain validation failed\n",
+				   __func__);
+			goto cleanup;
+		}
+
+		dlb_list_del(&rsrcs->avail_ldb_credit_pools,
+			     &pool->func_list);
+
+		pool->domain_id = domain->id;
+		pool->owned = true;
+
+		dlb_list_add(&domain->avail_ldb_credit_pools,
+			     &pool->domain_list);
+	}
+
+	rsrcs->num_avail_ldb_credit_pools -= num_credit_pools;
+
+	return 0;
+
+cleanup:
+
+	/* Return the assigned credit pools */
+	for (j = 0; j < i; j++) {
+		struct dlb_credit_pool *pool;
+
+		pool = DLB_FUNC_LIST_HEAD(domain->avail_ldb_credit_pools,
+					  typeof(*pool));
+		/* Unrecoverable internal error */
+		if (!pool)
+			break;
+
+		pool->owned = false;
+
+		dlb_list_del(&domain->avail_ldb_credit_pools,
+			     &pool->domain_list);
+
+		dlb_list_add(&rsrcs->avail_ldb_credit_pools,
+			     &pool->func_list);
+	}
+
+	return -EFAULT;
+}
+
+static int dlb_attach_dir_credit_pools(struct dlb_hw *hw,
+				       struct dlb_function_resources *rsrcs,
+				       struct dlb_domain *domain,
+				       u32 num_credit_pools,
+				       struct dlb_cmd_response *resp)
+{
+	unsigned int i, j;
+
+	if (rsrcs->num_avail_dir_credit_pools < num_credit_pools) {
+		resp->status = DLB_ST_DIR_CREDIT_POOLS_UNAVAILABLE;
+		return -1;
+	}
+
+	for (i = 0; i < num_credit_pools; i++) {
+		struct dlb_credit_pool *pool;
+
+		pool = DLB_FUNC_LIST_HEAD(rsrcs->avail_dir_credit_pools,
+					  typeof(*pool));
+		if (!pool) {
+			DLB_HW_ERR(hw,
+				   "[%s()] Internal error: domain validation failed\n",
+				   __func__);
+			goto cleanup;
+		}
+
+		dlb_list_del(&rsrcs->avail_dir_credit_pools,
+			     &pool->func_list);
+
+		pool->domain_id = domain->id;
+		pool->owned = true;
+
+		dlb_list_add(&domain->avail_dir_credit_pools,
+			     &pool->domain_list);
+	}
+
+	rsrcs->num_avail_dir_credit_pools -= num_credit_pools;
+
+	return 0;
+
+cleanup:
+
+	/* Return the assigned credit pools */
+	for (j = 0; j < i; j++) {
+		struct dlb_credit_pool *pool;
+
+		pool = DLB_FUNC_LIST_HEAD(domain->avail_dir_credit_pools,
+					  typeof(*pool));
+		/* Unrecoverable internal error */
+		if (!pool)
+			break;
+
+		pool->owned = false;
+
+		dlb_list_del(&domain->avail_dir_credit_pools,
+			     &pool->domain_list);
+
+		dlb_list_add(&rsrcs->avail_dir_credit_pools,
+			     &pool->func_list);
+	}
+
+	return -EFAULT;
+}
+
+static int dlb_attach_atomic_inflights(struct dlb_function_resources *rsrcs,
+				       struct dlb_domain *domain,
+				       u32 num_atomic_inflights,
+				       struct dlb_cmd_response *resp)
+{
+	if (num_atomic_inflights) {
+		struct dlb_bitmap *bitmap =
+			rsrcs->avail_aqed_freelist_entries;
+		int base;
+
+		base = dlb_bitmap_find_set_bit_range(bitmap,
+						     num_atomic_inflights);
+		if (base < 0)
+			goto error;
+
+		domain->aqed_freelist.base = base;
+		domain->aqed_freelist.bound = base + num_atomic_inflights;
+		domain->aqed_freelist.offset = 0;
+
+		dlb_bitmap_clear_range(bitmap, base, num_atomic_inflights);
+	}
+
+	return 0;
+
+error:
+	resp->status = DLB_ST_ATOMIC_INFLIGHTS_UNAVAILABLE;
+	return -1;
+}
+
+static int
+dlb_attach_domain_hist_list_entries(struct dlb_function_resources *rsrcs,
+				    struct dlb_domain *domain,
+				    u32 num_hist_list_entries,
+				    struct dlb_cmd_response *resp)
+{
+	struct dlb_bitmap *bitmap;
+	int base;
+
+	if (num_hist_list_entries) {
+		bitmap = rsrcs->avail_hist_list_entries;
+
+		base = dlb_bitmap_find_set_bit_range(bitmap,
+						     num_hist_list_entries);
+		if (base < 0)
+			goto error;
+
+		domain->total_hist_list_entries = num_hist_list_entries;
+		domain->avail_hist_list_entries = num_hist_list_entries;
+		domain->hist_list_entry_base = base;
+		domain->hist_list_entry_offset = 0;
+
+		dlb_bitmap_clear_range(bitmap, base, num_hist_list_entries);
+	}
+	return 0;
+
+error:
+	resp->status = DLB_ST_HIST_LIST_ENTRIES_UNAVAILABLE;
+	return -1;
+}
+
+static unsigned int
+dlb_get_num_ports_in_use(struct dlb_hw *hw)
+{
+	unsigned int i, n = 0;
+
+	for (i = 0; i < DLB_MAX_NUM_LDB_PORTS; i++)
+		if (hw->rsrcs.ldb_ports[i].owned)
+			n++;
+
+	for (i = 0; i < DLB_MAX_NUM_DIR_PORTS; i++)
+		if (hw->rsrcs.dir_pq_pairs[i].owned)
+			n++;
+
+	return n;
+}
+
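+/* Validate a create-scheduling-domain request against the function's
+ * remaining resources, including the largest contiguous ranges available in
+ * the bitmap-tracked freelists. Sets resp->status and returns -1 if any
+ * requirement cannot be satisfied.
+ */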
+static int
+dlb_verify_create_sched_domain_args(struct dlb_hw *hw,
+				    struct dlb_function_resources *rsrcs,
+				    struct dlb_create_sched_domain_args *args,
+				    struct dlb_cmd_response *resp)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_bitmap *ldb_credit_freelist;
+	struct dlb_bitmap *dir_credit_freelist;
+	unsigned int ldb_credit_freelist_count;
+	unsigned int dir_credit_freelist_count;
+	unsigned int max_contig_aqed_entries;
+	unsigned int max_contig_dqed_entries;
+	unsigned int max_contig_qed_entries;
+	unsigned int max_contig_hl_entries;
+	struct dlb_bitmap *aqed_freelist;
+	enum dlb_dev_revision revision;
+
+	ldb_credit_freelist = rsrcs->avail_qed_freelist_entries;
+	dir_credit_freelist = rsrcs->avail_dqed_freelist_entries;
+	aqed_freelist = rsrcs->avail_aqed_freelist_entries;
+
+	ldb_credit_freelist_count = dlb_bitmap_count(ldb_credit_freelist);
+	dir_credit_freelist_count = dlb_bitmap_count(dir_credit_freelist);
+
+	max_contig_hl_entries =
+		dlb_bitmap_longest_set_range(rsrcs->avail_hist_list_entries);
+	max_contig_aqed_entries =
+		dlb_bitmap_longest_set_range(aqed_freelist);
+	max_contig_qed_entries =
+		dlb_bitmap_longest_set_range(ldb_credit_freelist);
+	max_contig_dqed_entries =
+		dlb_bitmap_longest_set_range(dir_credit_freelist);
+
+	if (rsrcs->num_avail_domains < 1)
+		resp->status = DLB_ST_DOMAIN_UNAVAILABLE;
+	else if (rsrcs->num_avail_ldb_queues < args->num_ldb_queues)
+		resp->status = DLB_ST_LDB_QUEUES_UNAVAILABLE;
+	else if (rsrcs->num_avail_ldb_ports < args->num_ldb_ports)
+		resp->status = DLB_ST_LDB_PORTS_UNAVAILABLE;
+	else if (args->num_ldb_queues > 0 && args->num_ldb_ports == 0)
+		resp->status = DLB_ST_LDB_PORT_REQUIRED_FOR_LDB_QUEUES;
+	else if (rsrcs->num_avail_dir_pq_pairs < args->num_dir_ports)
+		resp->status = DLB_ST_DIR_PORTS_UNAVAILABLE;
+	else if (ldb_credit_freelist_count < args->num_ldb_credits)
+		resp->status = DLB_ST_LDB_CREDITS_UNAVAILABLE;
+	else if (dir_credit_freelist_count < args->num_dir_credits)
+		resp->status = DLB_ST_DIR_CREDITS_UNAVAILABLE;
+	else if (rsrcs->num_avail_ldb_credit_pools < args->num_ldb_credit_pools)
+		resp->status = DLB_ST_LDB_CREDIT_POOLS_UNAVAILABLE;
+	else if (rsrcs->num_avail_dir_credit_pools < args->num_dir_credit_pools)
+		resp->status = DLB_ST_DIR_CREDIT_POOLS_UNAVAILABLE;
+	else if (max_contig_hl_entries < args->num_hist_list_entries)
+		resp->status = DLB_ST_HIST_LIST_ENTRIES_UNAVAILABLE;
+	else if (max_contig_aqed_entries < args->num_atomic_inflights)
+		resp->status = DLB_ST_ATOMIC_INFLIGHTS_UNAVAILABLE;
+	else if (max_contig_qed_entries < args->num_ldb_credits)
+		resp->status = DLB_ST_QED_FREELIST_ENTRIES_UNAVAILABLE;
+	else if (max_contig_dqed_entries < args->num_dir_credits)
+		resp->status = DLB_ST_DQED_FREELIST_ENTRIES_UNAVAILABLE;
+
+	/* DLB A-stepping workaround for a hardware write-buffer lock-up issue:
+	 * limit the maximum number of configured ports to less than 128 and
+	 * disable CQ occupancy interrupts.
+	 */
+	revision = os_get_dev_revision(hw);
+
+	if (revision < DLB_B0) {
+		u32 n = dlb_get_num_ports_in_use(hw);
+
+		n += args->num_ldb_ports + args->num_dir_ports;
+
+		if (n >= DLB_A_STEP_MAX_PORTS)
+			resp->status = args->num_ldb_ports ?
+				DLB_ST_LDB_PORTS_UNAVAILABLE :
+				DLB_ST_DIR_PORTS_UNAVAILABLE;
+	}
+
+	if (resp->status)
+		return -1;
+
+	return 0;
+}
+
+static int
+dlb_verify_create_ldb_pool_args(struct dlb_hw *hw,
+				u32 domain_id,
+				struct dlb_create_ldb_pool_args *args,
+				struct dlb_cmd_response *resp,
+				bool vf_request,
+				unsigned int vf_id)
+{
+	struct dlb_freelist *qed_freelist;
+	struct dlb_domain *domain;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+
+	if (!domain) {
+		resp->status = DLB_ST_INVALID_DOMAIN_ID;
+		return -1;
+	}
+
+	if (!domain->configured) {
+		resp->status = DLB_ST_DOMAIN_NOT_CONFIGURED;
+		return -1;
+	}
+
+	qed_freelist = &domain->qed_freelist;
+
+	if (dlb_freelist_count(qed_freelist) < args->num_ldb_credits) {
+		resp->status = DLB_ST_LDB_CREDITS_UNAVAILABLE;
+		return -1;
+	}
+
+	if (dlb_list_empty(&domain->avail_ldb_credit_pools)) {
+		resp->status = DLB_ST_LDB_CREDIT_POOLS_UNAVAILABLE;
+		return -1;
+	}
+
+	if (domain->started) {
+		resp->status = DLB_ST_DOMAIN_STARTED;
+		return -1;
+	}
+
+	return 0;
+}
+
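+/* Program the CHP credit limit/count and QED freelist base, limit, and
+ * push/pop pointer CSRs for a load-balanced credit pool, then enable the pool
+ * and carve its credits out of the domain's QED freelist.
+ */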
+static void
+dlb_configure_ldb_credit_pool(struct dlb_hw *hw,
+			      struct dlb_domain *domain,
+			      struct dlb_create_ldb_pool_args *args,
+			      struct dlb_credit_pool *pool)
+{
+	union dlb_sys_ldb_pool_enbld r0 = { {0} };
+	union dlb_chp_ldb_pool_crd_lim r1 = { {0} };
+	union dlb_chp_ldb_pool_crd_cnt r2 = { {0} };
+	union dlb_chp_qed_fl_base  r3 = { {0} };
+	union dlb_chp_qed_fl_lim r4 = { {0} };
+	union dlb_chp_qed_fl_push_ptr r5 = { {0} };
+	union dlb_chp_qed_fl_pop_ptr  r6 = { {0} };
+
+	r1.field.limit = args->num_ldb_credits;
+
+	DLB_CSR_WR(hw, DLB_CHP_LDB_POOL_CRD_LIM(pool->id.phys_id), r1.val);
+
+	r2.field.count = args->num_ldb_credits;
+
+	DLB_CSR_WR(hw, DLB_CHP_LDB_POOL_CRD_CNT(pool->id.phys_id), r2.val);
+
+	r3.field.base = domain->qed_freelist.base + domain->qed_freelist.offset;
+
+	DLB_CSR_WR(hw, DLB_CHP_QED_FL_BASE(pool->id.phys_id), r3.val);
+
+	r4.field.freelist_disable = 0;
+	r4.field.limit = r3.field.base + args->num_ldb_credits - 1;
+
+	DLB_CSR_WR(hw, DLB_CHP_QED_FL_LIM(pool->id.phys_id), r4.val);
+
+	r5.field.push_ptr = r3.field.base;
+	r5.field.generation = 1;
+
+	DLB_CSR_WR(hw, DLB_CHP_QED_FL_PUSH_PTR(pool->id.phys_id), r5.val);
+
+	r6.field.pop_ptr = r3.field.base;
+	r6.field.generation = 0;
+
+	DLB_CSR_WR(hw, DLB_CHP_QED_FL_POP_PTR(pool->id.phys_id), r6.val);
+
+	r0.field.pool_enabled = 1;
+
+	DLB_CSR_WR(hw, DLB_SYS_LDB_POOL_ENBLD(pool->id.phys_id), r0.val);
+
+	pool->avail_credits = args->num_ldb_credits;
+	pool->total_credits = args->num_ldb_credits;
+	domain->qed_freelist.offset += args->num_ldb_credits;
+
+	pool->configured = true;
+}
+
+static int
+dlb_verify_create_dir_pool_args(struct dlb_hw *hw,
+				u32 domain_id,
+				struct dlb_create_dir_pool_args *args,
+				struct dlb_cmd_response *resp,
+				bool vf_request,
+				unsigned int vf_id)
+{
+	struct dlb_freelist *dqed_freelist;
+	struct dlb_domain *domain;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+
+	if (!domain) {
+		resp->status = DLB_ST_INVALID_DOMAIN_ID;
+		return -1;
+	}
+
+	if (!domain->configured) {
+		resp->status = DLB_ST_DOMAIN_NOT_CONFIGURED;
+		return -1;
+	}
+
+	dqed_freelist = &domain->dqed_freelist;
+
+	if (dlb_freelist_count(dqed_freelist) < args->num_dir_credits) {
+		resp->status = DLB_ST_DIR_CREDITS_UNAVAILABLE;
+		return -1;
+	}
+
+	if (dlb_list_empty(&domain->avail_dir_credit_pools)) {
+		resp->status = DLB_ST_DIR_CREDIT_POOLS_UNAVAILABLE;
+		return -1;
+	}
+
+	if (domain->started) {
+		resp->status = DLB_ST_DOMAIN_STARTED;
+		return -1;
+	}
+
+	return 0;
+}
+
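+/* Directed-credit counterpart of dlb_configure_ldb_credit_pool(), carving the
+ * pool out of the domain's DQED freelist.
+ */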
+static void
+dlb_configure_dir_credit_pool(struct dlb_hw *hw,
+			      struct dlb_domain *domain,
+			      struct dlb_create_dir_pool_args *args,
+			      struct dlb_credit_pool *pool)
+{
+	union dlb_sys_dir_pool_enbld r0 = { {0} };
+	union dlb_chp_dir_pool_crd_lim r1 = { {0} };
+	union dlb_chp_dir_pool_crd_cnt r2 = { {0} };
+	union dlb_chp_dqed_fl_base  r3 = { {0} };
+	union dlb_chp_dqed_fl_lim r4 = { {0} };
+	union dlb_chp_dqed_fl_push_ptr r5 = { {0} };
+	union dlb_chp_dqed_fl_pop_ptr  r6 = { {0} };
+
+	r1.field.limit = args->num_dir_credits;
+
+	DLB_CSR_WR(hw, DLB_CHP_DIR_POOL_CRD_LIM(pool->id.phys_id), r1.val);
+
+	r2.field.count = args->num_dir_credits;
+
+	DLB_CSR_WR(hw, DLB_CHP_DIR_POOL_CRD_CNT(pool->id.phys_id), r2.val);
+
+	r3.field.base = domain->dqed_freelist.base +
+			domain->dqed_freelist.offset;
+
+	DLB_CSR_WR(hw, DLB_CHP_DQED_FL_BASE(pool->id.phys_id), r3.val);
+
+	r4.field.freelist_disable = 0;
+	r4.field.limit = r3.field.base + args->num_dir_credits - 1;
+
+	DLB_CSR_WR(hw, DLB_CHP_DQED_FL_LIM(pool->id.phys_id), r4.val);
+
+	r5.field.push_ptr = r3.field.base;
+	r5.field.generation = 1;
+
+	DLB_CSR_WR(hw, DLB_CHP_DQED_FL_PUSH_PTR(pool->id.phys_id), r5.val);
+
+	r6.field.pop_ptr = r3.field.base;
+	r6.field.generation = 0;
+
+	DLB_CSR_WR(hw, DLB_CHP_DQED_FL_POP_PTR(pool->id.phys_id), r6.val);
+
+	r0.field.pool_enabled = 1;
+
+	DLB_CSR_WR(hw, DLB_SYS_DIR_POOL_ENBLD(pool->id.phys_id), r0.val);
+
+	pool->avail_credits = args->num_dir_credits;
+	pool->total_credits = args->num_dir_credits;
+	domain->dqed_freelist.offset += args->num_dir_credits;
+
+	pool->configured = true;
+}
+
+static int
+dlb_verify_create_ldb_queue_args(struct dlb_hw *hw,
+				 u32 domain_id,
+				 struct dlb_create_ldb_queue_args *args,
+				 struct dlb_cmd_response *resp,
+				 bool vf_request,
+				 unsigned int vf_id)
+{
+	struct dlb_freelist *aqed_freelist;
+	struct dlb_domain *domain;
+	int i;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+
+	if (!domain) {
+		resp->status = DLB_ST_INVALID_DOMAIN_ID;
+		return -1;
+	}
+
+	if (!domain->configured) {
+		resp->status = DLB_ST_DOMAIN_NOT_CONFIGURED;
+		return -1;
+	}
+
+	if (domain->started) {
+		resp->status = DLB_ST_DOMAIN_STARTED;
+		return -1;
+	}
+
+	if (dlb_list_empty(&domain->avail_ldb_queues)) {
+		resp->status = DLB_ST_LDB_QUEUES_UNAVAILABLE;
+		return -1;
+	}
+
+	if (args->num_sequence_numbers) {
+		for (i = 0; i < DLB_MAX_NUM_SEQUENCE_NUMBER_GROUPS; i++) {
+			struct dlb_sn_group *group = &hw->rsrcs.sn_groups[i];
+
+			if (group->sequence_numbers_per_queue ==
+			    args->num_sequence_numbers &&
+			    !dlb_sn_group_full(group))
+				break;
+		}
+
+		if (i == DLB_MAX_NUM_SEQUENCE_NUMBER_GROUPS) {
+			resp->status = DLB_ST_SEQUENCE_NUMBERS_UNAVAILABLE;
+			return -1;
+		}
+	}
+
+	if (args->num_qid_inflights > 4096) {
+		resp->status = DLB_ST_INVALID_QID_INFLIGHT_ALLOCATION;
+		return -1;
+	}
+
+	/* Inflights must be <= number of sequence numbers if ordered */
+	if (args->num_sequence_numbers != 0 &&
+	    args->num_qid_inflights > args->num_sequence_numbers) {
+		resp->status = DLB_ST_INVALID_QID_INFLIGHT_ALLOCATION;
+		return -1;
+	}
+
+	aqed_freelist = &domain->aqed_freelist;
+
+	if (dlb_freelist_count(aqed_freelist) < args->num_atomic_inflights) {
+		resp->status = DLB_ST_ATOMIC_INFLIGHTS_UNAVAILABLE;
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+dlb_verify_create_dir_queue_args(struct dlb_hw *hw,
+				 u32 domain_id,
+				 struct dlb_create_dir_queue_args *args,
+				 struct dlb_cmd_response *resp,
+				 bool vf_request,
+				 unsigned int vf_id)
+{
+	struct dlb_domain *domain;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+
+	if (!domain) {
+		resp->status = DLB_ST_INVALID_DOMAIN_ID;
+		return -1;
+	}
+
+	if (!domain->configured) {
+		resp->status = DLB_ST_DOMAIN_NOT_CONFIGURED;
+		return -1;
+	}
+
+	if (domain->started) {
+		resp->status = DLB_ST_DOMAIN_STARTED;
+		return -1;
+	}
+
+	/* If the user claims the port is already configured, validate the port
+	 * ID, its domain, and whether the port is configured.
+	 */
+	if (args->port_id != -1) {
+		struct dlb_dir_pq_pair *port;
+
+		port = dlb_get_domain_used_dir_pq(args->port_id,
+						  vf_request,
+						  domain);
+
+		if (!port || port->domain_id.phys_id != domain->id.phys_id ||
+		    !port->port_configured) {
+			resp->status = DLB_ST_INVALID_PORT_ID;
+			return -1;
+		}
+	}
+
+	/* If the queue's port is not configured, validate that a free
+	 * port-queue pair is available.
+	 */
+	if (args->port_id == -1 &&
+	    dlb_list_empty(&domain->avail_dir_pq_pairs)) {
+		resp->status = DLB_ST_DIR_QUEUES_UNAVAILABLE;
+		return -1;
+	}
+
+	return 0;
+}
+
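+/* Program the queue's per-QID state: the QID valid bit in the domain's VAS
+ * (left clear until the domain is started), QID inflight limit, atomic (AQED)
+ * freelist window, sequence-number group/slot mapping and per-QID FID limit.
+ * For VF requests, also set up the virtual-to-physical QID translation.
+ * The QID valid bit is set last.
+ */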
+static void dlb_configure_ldb_queue(struct dlb_hw *hw,
+				    struct dlb_domain *domain,
+				    struct dlb_ldb_queue *queue,
+				    struct dlb_create_ldb_queue_args *args,
+				    bool vf_request,
+				    unsigned int vf_id)
+{
+	union dlb_sys_vf_ldb_vqid_v r0 = { {0} };
+	union dlb_sys_vf_ldb_vqid2qid r1 = { {0} };
+	union dlb_sys_ldb_qid2vqid r2 = { {0} };
+	union dlb_sys_ldb_vasqid_v r3 = { {0} };
+	union dlb_lsp_qid_ldb_infl_lim r4 = { {0} };
+	union dlb_lsp_qid_aqed_active_lim r5 = { {0} };
+	union dlb_aqed_pipe_fl_lim r6 = { {0} };
+	union dlb_aqed_pipe_fl_base r7 = { {0} };
+	union dlb_chp_ord_qid_sn_map r11 = { {0} };
+	union dlb_sys_ldb_qid_cfg_v r12 = { {0} };
+	union dlb_sys_ldb_qid_v r13 = { {0} };
+	union dlb_aqed_pipe_fl_push_ptr r14 = { {0} };
+	union dlb_aqed_pipe_fl_pop_ptr r15 = { {0} };
+	union dlb_aqed_pipe_qid_fid_lim r16 = { {0} };
+	union dlb_ro_pipe_qid2grpslt r17 = { {0} };
+	struct dlb_sn_group *sn_group;
+	unsigned int offs;
+
+	/* QID write permissions are turned on when the domain is started */
+	r3.field.vasqid_v = 0;
+
+	offs = domain->id.phys_id * DLB_MAX_NUM_LDB_QUEUES + queue->id.phys_id;
+
+	DLB_CSR_WR(hw, DLB_SYS_LDB_VASQID_V(offs), r3.val);
+
+	/* Unordered QIDs get 4K inflights, ordered get as many as the number
+	 * of sequence numbers.
+	 */
+	r4.field.limit = args->num_qid_inflights;
+
+	DLB_CSR_WR(hw, DLB_LSP_QID_LDB_INFL_LIM(queue->id.phys_id), r4.val);
+
+	r5.field.limit = queue->aqed_freelist.bound -
+			 queue->aqed_freelist.base;
+
+	if (r5.field.limit > DLB_MAX_NUM_AQOS_ENTRIES)
+		r5.field.limit = DLB_MAX_NUM_AQOS_ENTRIES;
+
+	/* AQOS */
+	DLB_CSR_WR(hw, DLB_LSP_QID_AQED_ACTIVE_LIM(queue->id.phys_id), r5.val);
+
+	r6.field.freelist_disable = 0;
+	r6.field.limit = queue->aqed_freelist.bound - 1;
+
+	DLB_CSR_WR(hw, DLB_AQED_PIPE_FL_LIM(queue->id.phys_id), r6.val);
+
+	r7.field.base = queue->aqed_freelist.base;
+
+	DLB_CSR_WR(hw, DLB_AQED_PIPE_FL_BASE(queue->id.phys_id), r7.val);
+
+	r14.field.push_ptr = r7.field.base;
+	r14.field.generation = 1;
+
+	DLB_CSR_WR(hw, DLB_AQED_PIPE_FL_PUSH_PTR(queue->id.phys_id), r14.val);
+
+	r15.field.pop_ptr = r7.field.base;
+	r15.field.generation = 0;
+
+	DLB_CSR_WR(hw, DLB_AQED_PIPE_FL_POP_PTR(queue->id.phys_id), r15.val);
+
+	/* Configure SNs */
+	sn_group = &hw->rsrcs.sn_groups[queue->sn_group];
+	r11.field.mode = sn_group->mode;
+	r11.field.slot = queue->sn_slot;
+	r11.field.grp  = sn_group->id;
+
+	DLB_CSR_WR(hw, DLB_CHP_ORD_QID_SN_MAP(queue->id.phys_id), r11.val);
+
+	/* This register limits the number of inflight flows a queue can have
+	 * at one time.  It has an upper bound of 2048, but can be
+	 * over-subscribed. 512 is chosen so that a single queue doesn't use
+	 * the entire atomic storage, but can use a substantial portion if
+	 * needed.
+	 */
+	r16.field.qid_fid_limit = 512;
+
+	DLB_CSR_WR(hw, DLB_AQED_PIPE_QID_FID_LIM(queue->id.phys_id), r16.val);
+
+	r17.field.group = sn_group->id;
+	r17.field.slot = queue->sn_slot;
+
+	DLB_CSR_WR(hw, DLB_RO_PIPE_QID2GRPSLT(queue->id.phys_id), r17.val);
+
+	r12.field.sn_cfg_v = (args->num_sequence_numbers != 0);
+	r12.field.fid_cfg_v = (args->num_atomic_inflights != 0);
+
+	DLB_CSR_WR(hw, DLB_SYS_LDB_QID_CFG_V(queue->id.phys_id), r12.val);
+
+	if (vf_request) {
+		unsigned int offs;
+
+		r0.field.vqid_v = 1;
+
+		offs = vf_id * DLB_MAX_NUM_LDB_QUEUES + queue->id.virt_id;
+
+		DLB_CSR_WR(hw, DLB_SYS_VF_LDB_VQID_V(offs), r0.val);
+
+		r1.field.qid = queue->id.phys_id;
+
+		DLB_CSR_WR(hw, DLB_SYS_VF_LDB_VQID2QID(offs), r1.val);
+
+		r2.field.vqid = queue->id.virt_id;
+
+		offs = vf_id * DLB_MAX_NUM_LDB_QUEUES + queue->id.phys_id;
+
+		DLB_CSR_WR(hw, DLB_SYS_LDB_QID2VQID(offs), r2.val);
+	}
+
+	r13.field.qid_v = 1;
+
+	DLB_CSR_WR(hw, DLB_SYS_LDB_QID_V(queue->id.phys_id), r13.val);
+}
+
+static void dlb_configure_dir_queue(struct dlb_hw *hw,
+				    struct dlb_domain *domain,
+				    struct dlb_dir_pq_pair *queue,
+				    bool vf_request,
+				    unsigned int vf_id)
+{
+	union dlb_sys_dir_vasqid_v r0 = { {0} };
+	unsigned int offs;
+
+	/* QID write permissions are turned on when the domain is started */
+	r0.field.vasqid_v = 0;
+
+	offs = (domain->id.phys_id * DLB_MAX_NUM_DIR_PORTS) + queue->id.phys_id;
+
+	DLB_CSR_WR(hw, DLB_SYS_DIR_VASQID_V(offs), r0.val);
+
+	if (vf_request) {
+		union dlb_sys_vf_dir_vqid_v   r1 = { {0} };
+		union dlb_sys_vf_dir_vqid2qid r2 = { {0} };
+
+		r1.field.vqid_v = 1;
+
+		offs = (vf_id * DLB_MAX_NUM_DIR_PORTS) + queue->id.virt_id;
+
+		DLB_CSR_WR(hw, DLB_SYS_VF_DIR_VQID_V(offs), r1.val);
+
+		r2.field.qid = queue->id.phys_id;
+
+		DLB_CSR_WR(hw, DLB_SYS_VF_DIR_VQID2QID(offs), r2.val);
+	} else {
+		union dlb_sys_dir_qid_v r3 = { {0} };
+
+		r3.field.qid_v = 1;
+
+		DLB_CSR_WR(hw, DLB_SYS_DIR_QID_V(queue->id.phys_id), r3.val);
+	}
+
+	queue->queue_configured = true;
+}
+
+static int
+dlb_verify_create_ldb_port_args(struct dlb_hw *hw,
+				u32 domain_id,
+				u64 pop_count_dma_base,
+				u64 cq_dma_base,
+				struct dlb_create_ldb_port_args *args,
+				struct dlb_cmd_response *resp,
+				bool vf_request,
+				unsigned int vf_id)
+{
+	struct dlb_domain *domain;
+	struct dlb_credit_pool *pool;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+
+	if (!domain) {
+		resp->status = DLB_ST_INVALID_DOMAIN_ID;
+		return -1;
+	}
+
+	if (!domain->configured) {
+		resp->status = DLB_ST_DOMAIN_NOT_CONFIGURED;
+		return -1;
+	}
+
+	if (domain->started) {
+		resp->status = DLB_ST_DOMAIN_STARTED;
+		return -1;
+	}
+
+	if (dlb_list_empty(&domain->avail_ldb_ports)) {
+		resp->status = DLB_ST_LDB_PORTS_UNAVAILABLE;
+		return -1;
+	}
+
+	/* If the scheduling domain has no LDB queues, we configure the
+	 * hardware to not supply the port with any LDB credits. In that
+	 * case, ignore the LDB credit arguments.
+	 */
+	if (!dlb_list_empty(&domain->used_ldb_queues) ||
+	    !dlb_list_empty(&domain->avail_ldb_queues)) {
+		pool = dlb_get_domain_ldb_pool(args->ldb_credit_pool_id,
+					       vf_request,
+					       domain);
+
+		if (!pool || !pool->configured ||
+		    pool->domain_id.phys_id != domain->id.phys_id) {
+			resp->status = DLB_ST_INVALID_LDB_CREDIT_POOL_ID;
+			return -1;
+		}
+
+		if (args->ldb_credit_high_watermark > pool->avail_credits) {
+			resp->status = DLB_ST_LDB_CREDITS_UNAVAILABLE;
+			return -1;
+		}
+
+		if (args->ldb_credit_low_watermark >=
+		    args->ldb_credit_high_watermark) {
+			resp->status = DLB_ST_INVALID_LDB_CREDIT_LOW_WATERMARK;
+			return -1;
+		}
+
+		if (args->ldb_credit_quantum >=
+		    args->ldb_credit_high_watermark) {
+			resp->status = DLB_ST_INVALID_LDB_CREDIT_QUANTUM;
+			return -1;
+		}
+
+		if (args->ldb_credit_quantum > DLB_MAX_PORT_CREDIT_QUANTUM) {
+			resp->status = DLB_ST_INVALID_LDB_CREDIT_QUANTUM;
+			return -1;
+		}
+	}
+
+	/* Likewise, if the scheduling domain has no DIR queues, we configure
+	 * the hardware to not supply the port with any DIR credits. In that
+	 * case, ignore the DIR credit arguments.
+	 */
+	if (!dlb_list_empty(&domain->used_dir_pq_pairs) ||
+	    !dlb_list_empty(&domain->avail_dir_pq_pairs)) {
+		pool = dlb_get_domain_dir_pool(args->dir_credit_pool_id,
+					       vf_request,
+					       domain);
+
+		if (!pool || !pool->configured ||
+		    pool->domain_id.phys_id != domain->id.phys_id) {
+			resp->status = DLB_ST_INVALID_DIR_CREDIT_POOL_ID;
+			return -1;
+		}
+
+		if (args->dir_credit_high_watermark > pool->avail_credits) {
+			resp->status = DLB_ST_DIR_CREDITS_UNAVAILABLE;
+			return -1;
+		}
+
+		if (args->dir_credit_low_watermark >=
+		    args->dir_credit_high_watermark) {
+			resp->status = DLB_ST_INVALID_DIR_CREDIT_LOW_WATERMARK;
+			return -1;
+		}
+
+		if (args->dir_credit_quantum >=
+		    args->dir_credit_high_watermark) {
+			resp->status = DLB_ST_INVALID_DIR_CREDIT_QUANTUM;
+			return -1;
+		}
+
+		if (args->dir_credit_quantum > DLB_MAX_PORT_CREDIT_QUANTUM) {
+			resp->status = DLB_ST_INVALID_DIR_CREDIT_QUANTUM;
+			return -1;
+		}
+	}
+
+	/* Check cache-line alignment */
+	if ((pop_count_dma_base & 0x3F) != 0) {
+		resp->status = DLB_ST_INVALID_POP_COUNT_VIRT_ADDR;
+		return -1;
+	}
+
+	if ((cq_dma_base & 0x3F) != 0) {
+		resp->status = DLB_ST_INVALID_CQ_VIRT_ADDR;
+		return -1;
+	}
+
+	if (args->cq_depth != 1 &&
+	    args->cq_depth != 2 &&
+	    args->cq_depth != 4 &&
+	    args->cq_depth != 8 &&
+	    args->cq_depth != 16 &&
+	    args->cq_depth != 32 &&
+	    args->cq_depth != 64 &&
+	    args->cq_depth != 128 &&
+	    args->cq_depth != 256 &&
+	    args->cq_depth != 512 &&
+	    args->cq_depth != 1024) {
+		resp->status = DLB_ST_INVALID_CQ_DEPTH;
+		return -1;
+	}
+
+	/* The history list size must be >= 1 */
+	if (!args->cq_history_list_size) {
+		resp->status = DLB_ST_INVALID_HIST_LIST_DEPTH;
+		return -1;
+	}
+
+	if (args->cq_history_list_size > domain->avail_hist_list_entries) {
+		resp->status = DLB_ST_HIST_LIST_ENTRIES_UNAVAILABLE;
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+dlb_verify_create_dir_port_args(struct dlb_hw *hw,
+				u32 domain_id,
+				u64 pop_count_dma_base,
+				u64 cq_dma_base,
+				struct dlb_create_dir_port_args *args,
+				struct dlb_cmd_response *resp,
+				bool vf_request,
+				unsigned int vf_id)
+{
+	struct dlb_domain *domain;
+	struct dlb_credit_pool *pool;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+
+	if (!domain) {
+		resp->status = DLB_ST_INVALID_DOMAIN_ID;
+		return -1;
+	}
+
+	if (!domain->configured) {
+		resp->status = DLB_ST_DOMAIN_NOT_CONFIGURED;
+		return -1;
+	}
+
+	if (domain->started) {
+		resp->status = DLB_ST_DOMAIN_STARTED;
+		return -1;
+	}
+
+	/* If the user claims the queue is already configured, validate
+	 * the queue ID, its domain, and whether the queue is configured.
+	 */
+	if (args->queue_id != -1) {
+		struct dlb_dir_pq_pair *queue;
+
+		queue = dlb_get_domain_used_dir_pq(args->queue_id,
+						   vf_request,
+						   domain);
+
+		if (!queue || queue->domain_id.phys_id != domain->id.phys_id ||
+		    !queue->queue_configured) {
+			resp->status = DLB_ST_INVALID_DIR_QUEUE_ID;
+			return -1;
+		}
+	}
+
+	/* If the port's queue is not configured, validate that a free
+	 * port-queue pair is available.
+	 */
+	if (args->queue_id == -1 &&
+	    dlb_list_empty(&domain->avail_dir_pq_pairs)) {
+		resp->status = DLB_ST_DIR_PORTS_UNAVAILABLE;
+		return -1;
+	}
+
+	/* If the scheduling domain has no LDB queues, we configure the
+	 * hardware to not supply the port with any LDB credits. In that
+	 * case, ignore the LDB credit arguments.
+	 */
+	if (!dlb_list_empty(&domain->used_ldb_queues) ||
+	    !dlb_list_empty(&domain->avail_ldb_queues)) {
+		pool = dlb_get_domain_ldb_pool(args->ldb_credit_pool_id,
+					       vf_request,
+					       domain);
+
+		if (!pool || !pool->configured ||
+		    pool->domain_id.phys_id != domain->id.phys_id) {
+			resp->status = DLB_ST_INVALID_LDB_CREDIT_POOL_ID;
+			return -1;
+		}
+
+		if (args->ldb_credit_high_watermark > pool->avail_credits) {
+			resp->status = DLB_ST_LDB_CREDITS_UNAVAILABLE;
+			return -1;
+		}
+
+		if (args->ldb_credit_low_watermark >=
+		    args->ldb_credit_high_watermark) {
+			resp->status = DLB_ST_INVALID_LDB_CREDIT_LOW_WATERMARK;
+			return -1;
+		}
+
+		if (args->ldb_credit_quantum >=
+		    args->ldb_credit_high_watermark) {
+			resp->status = DLB_ST_INVALID_LDB_CREDIT_QUANTUM;
+			return -1;
+		}
+
+		if (args->ldb_credit_quantum > DLB_MAX_PORT_CREDIT_QUANTUM) {
+			resp->status = DLB_ST_INVALID_LDB_CREDIT_QUANTUM;
+			return -1;
+		}
+	}
+
+	pool = dlb_get_domain_dir_pool(args->dir_credit_pool_id,
+				       vf_request,
+				       domain);
+
+	if (!pool || !pool->configured ||
+	    pool->domain_id.phys_id != domain->id.phys_id) {
+		resp->status = DLB_ST_INVALID_DIR_CREDIT_POOL_ID;
+		return -1;
+	}
+
+	if (args->dir_credit_high_watermark > pool->avail_credits) {
+		resp->status = DLB_ST_DIR_CREDITS_UNAVAILABLE;
+		return -1;
+	}
+
+	if (args->dir_credit_low_watermark >=
+	    args->dir_credit_high_watermark) {
+		resp->status = DLB_ST_INVALID_DIR_CREDIT_LOW_WATERMARK;
+		return -1;
+	}
+
+	if (args->dir_credit_quantum >= args->dir_credit_high_watermark) {
+		resp->status = DLB_ST_INVALID_DIR_CREDIT_QUANTUM;
+		return -1;
+	}
+
+	if (args->dir_credit_quantum > DLB_MAX_PORT_CREDIT_QUANTUM) {
+		resp->status = DLB_ST_INVALID_DIR_CREDIT_QUANTUM;
+		return -1;
+	}
+
+	/* Check cache-line alignment */
+	if ((pop_count_dma_base & 0x3F) != 0) {
+		resp->status = DLB_ST_INVALID_POP_COUNT_VIRT_ADDR;
+		return -1;
+	}
+
+	if ((cq_dma_base & 0x3F) != 0) {
+		resp->status = DLB_ST_INVALID_CQ_VIRT_ADDR;
+		return -1;
+	}
+
+	if (args->cq_depth != 8 &&
+	    args->cq_depth != 16 &&
+	    args->cq_depth != 32 &&
+	    args->cq_depth != 64 &&
+	    args->cq_depth != 128 &&
+	    args->cq_depth != 256 &&
+	    args->cq_depth != 512 &&
+	    args->cq_depth != 1024) {
+		resp->status = DLB_ST_INVALID_CQ_DEPTH;
+		return -1;
+	}
+
+	return 0;
+}
+
+static int dlb_verify_start_domain_args(struct dlb_hw *hw,
+					u32 domain_id,
+					struct dlb_cmd_response *resp,
+					bool vf_request,
+					unsigned int vf_id)
+{
+	struct dlb_domain *domain;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+
+	if (!domain) {
+		resp->status = DLB_ST_INVALID_DOMAIN_ID;
+		return -1;
+	}
+
+	if (!domain->configured) {
+		resp->status = DLB_ST_DOMAIN_NOT_CONFIGURED;
+		return -1;
+	}
+
+	if (domain->started) {
+		resp->status = DLB_ST_DOMAIN_STARTED;
+		return -1;
+	}
+
+	return 0;
+}
+
+static int dlb_verify_map_qid_args(struct dlb_hw *hw,
+				   u32 domain_id,
+				   struct dlb_map_qid_args *args,
+				   struct dlb_cmd_response *resp,
+				   bool vf_request,
+				   unsigned int vf_id)
+{
+	struct dlb_domain *domain;
+	struct dlb_ldb_port *port;
+	struct dlb_ldb_queue *queue;
+	int id;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+
+	if (!domain) {
+		resp->status = DLB_ST_INVALID_DOMAIN_ID;
+		return -1;
+	}
+
+	if (!domain->configured) {
+		resp->status = DLB_ST_DOMAIN_NOT_CONFIGURED;
+		return -1;
+	}
+
+	id = args->port_id;
+
+	port = dlb_get_domain_used_ldb_port(id, vf_request, domain);
+
+	if (!port || !port->configured) {
+		resp->status = DLB_ST_INVALID_PORT_ID;
+		return -1;
+	}
+
+	if (args->priority >= DLB_QID_PRIORITIES) {
+		resp->status = DLB_ST_INVALID_PRIORITY;
+		return -1;
+	}
+
+	queue = dlb_get_domain_ldb_queue(args->qid, vf_request, domain);
+
+	if (!queue || !queue->configured) {
+		resp->status = DLB_ST_INVALID_QID;
+		return -1;
+	}
+
+	if (queue->domain_id.phys_id != domain->id.phys_id) {
+		resp->status = DLB_ST_INVALID_QID;
+		return -1;
+	}
+
+	if (port->domain_id.phys_id != domain->id.phys_id) {
+		resp->status = DLB_ST_INVALID_PORT_ID;
+		return -1;
+	}
+
+	return 0;
+}
+
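+/* The following helpers scan the port's qid_map[] (one entry per CQ slot) for
+ * a slot in the requested state, optionally matching a particular queue.
+ * They return true and write the matching slot index to *slot on success.
+ */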
+static bool dlb_port_find_slot(struct dlb_ldb_port *port,
+			       enum dlb_qid_map_state state,
+			       int *slot)
+{
+	int i;
+
+	for (i = 0; i < DLB_MAX_NUM_QIDS_PER_LDB_CQ; i++) {
+		if (port->qid_map[i].state == state)
+			break;
+	}
+
+	*slot = i;
+
+	return (i < DLB_MAX_NUM_QIDS_PER_LDB_CQ);
+}
+
+static bool dlb_port_find_slot_queue(struct dlb_ldb_port *port,
+				     enum dlb_qid_map_state state,
+				     struct dlb_ldb_queue *queue,
+				     int *slot)
+{
+	int i;
+
+	for (i = 0; i < DLB_MAX_NUM_QIDS_PER_LDB_CQ; i++) {
+		if (port->qid_map[i].state == state &&
+		    port->qid_map[i].qid == queue->id.phys_id)
+			break;
+	}
+
+	*slot = i;
+
+	return (i < DLB_MAX_NUM_QIDS_PER_LDB_CQ);
+}
+
+static bool
+dlb_port_find_slot_with_pending_map_queue(struct dlb_ldb_port *port,
+					  struct dlb_ldb_queue *queue,
+					  int *slot)
+{
+	int i;
+
+	for (i = 0; i < DLB_MAX_NUM_QIDS_PER_LDB_CQ; i++) {
+		struct dlb_ldb_port_qid_map *map = &port->qid_map[i];
+
+		if (map->state == DLB_QUEUE_UNMAP_IN_PROGRESS_PENDING_MAP &&
+		    map->pending_qid == queue->id.phys_id)
+			break;
+	}
+
+	*slot = i;
+
+	return (i < DLB_MAX_NUM_QIDS_PER_LDB_CQ);
+}
+
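+/* Bookkeeping for a CQ slot moving between QID-map states: update the mapping
+ * counts on the queue and port, and the pending-addition (queue/domain) and
+ * pending-removal (port/domain) counters, then record the new state.
+ * Transitions not covered by the state machine below are internal errors and
+ * return -EFAULT.
+ */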
+static int dlb_port_slot_state_transition(struct dlb_hw *hw,
+					  struct dlb_ldb_port *port,
+					  struct dlb_ldb_queue *queue,
+					  int slot,
+					  enum dlb_qid_map_state new_state)
+{
+	enum dlb_qid_map_state curr_state = port->qid_map[slot].state;
+	struct dlb_domain *domain;
+
+	domain = dlb_get_domain_from_id(hw, port->domain_id.phys_id, false, 0);
+	if (!domain) {
+		DLB_HW_ERR(hw,
+			   "[%s()] Internal error: unable to find domain %d\n",
+			   __func__, port->domain_id.phys_id);
+		return -EFAULT;
+	}
+
+	switch (curr_state) {
+	case DLB_QUEUE_UNMAPPED:
+		switch (new_state) {
+		case DLB_QUEUE_MAPPED:
+			queue->num_mappings++;
+			port->num_mappings++;
+			break;
+		case DLB_QUEUE_MAP_IN_PROGRESS:
+			queue->num_pending_additions++;
+			domain->num_pending_additions++;
+			break;
+		default:
+			goto error;
+		}
+		break;
+	case DLB_QUEUE_MAPPED:
+		switch (new_state) {
+		case DLB_QUEUE_UNMAPPED:
+			queue->num_mappings--;
+			port->num_mappings--;
+			break;
+		case DLB_QUEUE_UNMAP_IN_PROGRESS:
+			port->num_pending_removals++;
+			domain->num_pending_removals++;
+			break;
+		case DLB_QUEUE_MAPPED:
+			/* Priority change, nothing to update */
+			break;
+		default:
+			goto error;
+		}
+		break;
+	case DLB_QUEUE_MAP_IN_PROGRESS:
+		switch (new_state) {
+		case DLB_QUEUE_UNMAPPED:
+			queue->num_pending_additions--;
+			domain->num_pending_additions--;
+			break;
+		case DLB_QUEUE_MAPPED:
+			queue->num_mappings++;
+			port->num_mappings++;
+			queue->num_pending_additions--;
+			domain->num_pending_additions--;
+			break;
+		default:
+			goto error;
+		}
+		break;
+	case DLB_QUEUE_UNMAP_IN_PROGRESS:
+		switch (new_state) {
+		case DLB_QUEUE_UNMAPPED:
+			port->num_pending_removals--;
+			domain->num_pending_removals--;
+			queue->num_mappings--;
+			port->num_mappings--;
+			break;
+		case DLB_QUEUE_MAPPED:
+			port->num_pending_removals--;
+			domain->num_pending_removals--;
+			break;
+		case DLB_QUEUE_UNMAP_IN_PROGRESS_PENDING_MAP:
+			/* Nothing to update */
+			break;
+		default:
+			goto error;
+		}
+		break;
+	case DLB_QUEUE_UNMAP_IN_PROGRESS_PENDING_MAP:
+		switch (new_state) {
+		case DLB_QUEUE_UNMAP_IN_PROGRESS:
+			/* Nothing to update */
+			break;
+		case DLB_QUEUE_UNMAPPED:
+			/* An UNMAP_IN_PROGRESS_PENDING_MAP slot briefly
+			 * becomes UNMAPPED before it transitions to
+			 * MAP_IN_PROGRESS.
+			 */
+			queue->num_mappings--;
+			port->num_mappings--;
+			port->num_pending_removals--;
+			domain->num_pending_removals--;
+			break;
+		default:
+			goto error;
+		}
+		break;
+	default:
+		goto error;
+	}
+
+	port->qid_map[slot].state = new_state;
+
+	DLB_HW_INFO(hw,
+		    "[%s()] queue %d -> port %d state transition (%d -> %d)\n",
+		    __func__, queue->id.phys_id, port->id.phys_id, curr_state,
+		    new_state);
+	return 0;
+
+error:
+	DLB_HW_ERR(hw,
+		   "[%s()] Internal error: invalid queue %d -> port %d state transition (%d -> %d)\n",
+		   __func__, queue->id.phys_id, port->id.phys_id, curr_state,
+		   new_state);
+	return -EFAULT;
+}
+
+static int dlb_verify_map_qid_slot_available(struct dlb_ldb_port *port,
+					     struct dlb_ldb_queue *queue,
+					     struct dlb_cmd_response *resp)
+{
+	enum dlb_qid_map_state state;
+	int i;
+
+	/* Unused slot available? */
+	if (port->num_mappings < DLB_MAX_NUM_QIDS_PER_LDB_CQ)
+		return 0;
+
+	/* If the queue is already mapped (from the application's perspective),
+	 * this is simply a priority update.
+	 */
+	state = DLB_QUEUE_MAPPED;
+	if (dlb_port_find_slot_queue(port, state, queue, &i))
+		return 0;
+
+	state = DLB_QUEUE_MAP_IN_PROGRESS;
+	if (dlb_port_find_slot_queue(port, state, queue, &i))
+		return 0;
+
+	if (dlb_port_find_slot_with_pending_map_queue(port, queue, &i))
+		return 0;
+
+	/* If the slot contains an unmap in progress, it's considered
+	 * available.
+	 */
+	state = DLB_QUEUE_UNMAP_IN_PROGRESS;
+	if (dlb_port_find_slot(port, state, &i))
+		return 0;
+
+	state = DLB_QUEUE_UNMAPPED;
+	if (dlb_port_find_slot(port, state, &i))
+		return 0;
+
+	resp->status = DLB_ST_NO_QID_SLOTS_AVAILABLE;
+	return -EINVAL;
+}
+
+static int dlb_verify_unmap_qid_args(struct dlb_hw *hw,
+				     u32 domain_id,
+				     struct dlb_unmap_qid_args *args,
+				     struct dlb_cmd_response *resp,
+				     bool vf_request,
+				     unsigned int vf_id)
+{
+	enum dlb_qid_map_state state;
+	struct dlb_domain *domain;
+	struct dlb_ldb_port *port;
+	struct dlb_ldb_queue *queue;
+	int slot;
+	int id;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+
+	if (!domain) {
+		resp->status = DLB_ST_INVALID_DOMAIN_ID;
+		return -1;
+	}
+
+	if (!domain->configured) {
+		resp->status = DLB_ST_DOMAIN_NOT_CONFIGURED;
+		return -1;
+	}
+
+	id = args->port_id;
+
+	port = dlb_get_domain_used_ldb_port(id, vf_request, domain);
+
+	if (!port || !port->configured) {
+		resp->status = DLB_ST_INVALID_PORT_ID;
+		return -1;
+	}
+
+	if (port->domain_id.phys_id != domain->id.phys_id) {
+		resp->status = DLB_ST_INVALID_PORT_ID;
+		return -1;
+	}
+
+	queue = dlb_get_domain_ldb_queue(args->qid, vf_request, domain);
+
+	if (!queue || !queue->configured) {
+		DLB_HW_ERR(hw, "[%s()] Can't unmap unconfigured queue %d\n",
+			   __func__, args->qid);
+		resp->status = DLB_ST_INVALID_QID;
+		return -1;
+	}
+
+	/* Verify that the port has the queue mapped. From the application's
+	 * perspective a queue is mapped if it is actually mapped, the map is
+	 * in progress, or the map is blocked pending an unmap.
+	 */
+	state = DLB_QUEUE_MAPPED;
+	if (dlb_port_find_slot_queue(port, state, queue, &slot))
+		return 0;
+
+	state = DLB_QUEUE_MAP_IN_PROGRESS;
+	if (dlb_port_find_slot_queue(port, state, queue, &slot))
+		return 0;
+
+	if (dlb_port_find_slot_with_pending_map_queue(port, queue, &slot))
+		return 0;
+
+	resp->status = DLB_ST_INVALID_QID;
+	return -1;
+}
+
+static int
+dlb_verify_enable_ldb_port_args(struct dlb_hw *hw,
+				u32 domain_id,
+				struct dlb_enable_ldb_port_args *args,
+				struct dlb_cmd_response *resp,
+				bool vf_request,
+				unsigned int vf_id)
+{
+	struct dlb_domain *domain;
+	struct dlb_ldb_port *port;
+	int id;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+
+	if (!domain) {
+		resp->status = DLB_ST_INVALID_DOMAIN_ID;
+		return -1;
+	}
+
+	if (!domain->configured) {
+		resp->status = DLB_ST_DOMAIN_NOT_CONFIGURED;
+		return -1;
+	}
+
+	id = args->port_id;
+
+	port = dlb_get_domain_used_ldb_port(id, vf_request, domain);
+
+	if (!port || !port->configured) {
+		resp->status = DLB_ST_INVALID_PORT_ID;
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+dlb_verify_enable_dir_port_args(struct dlb_hw *hw,
+				u32 domain_id,
+				struct dlb_enable_dir_port_args *args,
+				struct dlb_cmd_response *resp,
+				bool vf_request,
+				unsigned int vf_id)
+{
+	struct dlb_domain *domain;
+	struct dlb_dir_pq_pair *port;
+	int id;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+
+	if (!domain) {
+		resp->status = DLB_ST_INVALID_DOMAIN_ID;
+		return -1;
+	}
+
+	if (!domain->configured) {
+		resp->status = DLB_ST_DOMAIN_NOT_CONFIGURED;
+		return -1;
+	}
+
+	id = args->port_id;
+
+	port = dlb_get_domain_used_dir_pq(id, vf_request, domain);
+
+	if (!port || !port->port_configured) {
+		resp->status = DLB_ST_INVALID_PORT_ID;
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+dlb_verify_disable_ldb_port_args(struct dlb_hw *hw,
+				 u32 domain_id,
+				 struct dlb_disable_ldb_port_args *args,
+				 struct dlb_cmd_response *resp,
+				 bool vf_request,
+				 unsigned int vf_id)
+{
+	struct dlb_domain *domain;
+	struct dlb_ldb_port *port;
+	int id;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+
+	if (!domain) {
+		resp->status = DLB_ST_INVALID_DOMAIN_ID;
+		return -1;
+	}
+
+	if (!domain->configured) {
+		resp->status = DLB_ST_DOMAIN_NOT_CONFIGURED;
+		return -1;
+	}
+
+	id = args->port_id;
+
+	port = dlb_get_domain_used_ldb_port(id, vf_request, domain);
+
+	if (!port || !port->configured) {
+		resp->status = DLB_ST_INVALID_PORT_ID;
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+dlb_verify_disable_dir_port_args(struct dlb_hw *hw,
+				 u32 domain_id,
+				 struct dlb_disable_dir_port_args *args,
+				 struct dlb_cmd_response *resp,
+				 bool vf_request,
+				 unsigned int vf_id)
+{
+	struct dlb_domain *domain;
+	struct dlb_dir_pq_pair *port;
+	int id;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+
+	if (!domain) {
+		resp->status = DLB_ST_INVALID_DOMAIN_ID;
+		return -1;
+	}
+
+	if (!domain->configured) {
+		resp->status = DLB_ST_DOMAIN_NOT_CONFIGURED;
+		return -1;
+	}
+
+	id = args->port_id;
+
+	port = dlb_get_domain_used_dir_pq(id, vf_request, domain);
+
+	if (!port || !port->port_configured) {
+		resp->status = DLB_ST_INVALID_PORT_ID;
+		return -1;
+	}
+
+	return 0;
+}
+
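+/* Reserve each class of requested resource (queues, ports, credits, credit
+ * pools, history-list entries, atomic inflights) for the domain in turn; the
+ * first failing helper's error code is returned. On success the domain is
+ * marked configured.
+ */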
+static int
+dlb_domain_attach_resources(struct dlb_hw *hw,
+			    struct dlb_function_resources *rsrcs,
+			    struct dlb_domain *domain,
+			    struct dlb_create_sched_domain_args *args,
+			    struct dlb_cmd_response *resp)
+{
+	int ret;
+
+	ret = dlb_attach_ldb_queues(hw,
+				    rsrcs,
+				    domain,
+				    args->num_ldb_queues,
+				    resp);
+	if (ret < 0)
+		return ret;
+
+	ret = dlb_attach_ldb_ports(hw,
+				   rsrcs,
+				   domain,
+				   args->num_ldb_ports,
+				   resp);
+	if (ret < 0)
+		return ret;
+
+	ret = dlb_attach_dir_ports(hw,
+				   rsrcs,
+				   domain,
+				   args->num_dir_ports,
+				   resp);
+	if (ret < 0)
+		return ret;
+
+	ret = dlb_attach_ldb_credits(rsrcs,
+				     domain,
+				     args->num_ldb_credits,
+				     resp);
+	if (ret < 0)
+		return ret;
+
+	ret = dlb_attach_dir_credits(rsrcs,
+				     domain,
+				     args->num_dir_credits,
+				     resp);
+	if (ret < 0)
+		return ret;
+
+	ret = dlb_attach_ldb_credit_pools(hw,
+					  rsrcs,
+					  domain,
+					  args->num_ldb_credit_pools,
+					  resp);
+	if (ret < 0)
+		return ret;
+
+	ret = dlb_attach_dir_credit_pools(hw,
+					  rsrcs,
+					  domain,
+					  args->num_dir_credit_pools,
+					  resp);
+	if (ret < 0)
+		return ret;
+
+	ret = dlb_attach_domain_hist_list_entries(rsrcs,
+						  domain,
+						  args->num_hist_list_entries,
+						  resp);
+	if (ret < 0)
+		return ret;
+
+	ret = dlb_attach_atomic_inflights(rsrcs,
+					  domain,
+					  args->num_atomic_inflights,
+					  resp);
+	if (ret < 0)
+		return ret;
+
+	domain->configured = true;
+
+	domain->started = false;
+
+	rsrcs->num_avail_domains--;
+
+	return 0;
+}
+
+static int
+dlb_ldb_queue_attach_to_sn_group(struct dlb_hw *hw,
+				 struct dlb_ldb_queue *queue,
+				 struct dlb_create_ldb_queue_args *args)
+{
+	int slot = -1;
+	int i;
+
+	queue->sn_cfg_valid = false;
+
+	if (args->num_sequence_numbers == 0)
+		return 0;
+
+	for (i = 0; i < DLB_MAX_NUM_SEQUENCE_NUMBER_GROUPS; i++) {
+		struct dlb_sn_group *group = &hw->rsrcs.sn_groups[i];
+
+		if (group->sequence_numbers_per_queue ==
+		    args->num_sequence_numbers &&
+		    !dlb_sn_group_full(group)) {
+			slot = dlb_sn_group_alloc_slot(group);
+			if (slot >= 0)
+				break;
+		}
+	}
+
+	if (slot == -1) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: no sequence number slots available\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	queue->sn_cfg_valid = true;
+	queue->sn_group = i;
+	queue->sn_slot = slot;
+	return 0;
+}
+
+static int
+dlb_ldb_queue_attach_resources(struct dlb_hw *hw,
+			       struct dlb_domain *domain,
+			       struct dlb_ldb_queue *queue,
+			       struct dlb_create_ldb_queue_args *args)
+{
+	int ret;
+
+	ret = dlb_ldb_queue_attach_to_sn_group(hw, queue, args);
+	if (ret)
+		return ret;
+
+	/* Attach QID inflights */
+	queue->num_qid_inflights = args->num_qid_inflights;
+
+	/* Attach atomic inflights */
+	queue->aqed_freelist.base = domain->aqed_freelist.base +
+				    domain->aqed_freelist.offset;
+	queue->aqed_freelist.bound = queue->aqed_freelist.base +
+				     args->num_atomic_inflights;
+	domain->aqed_freelist.offset += args->num_atomic_inflights;
+
+	return 0;
+}
+
+static void dlb_ldb_port_cq_enable(struct dlb_hw *hw,
+				   struct dlb_ldb_port *port)
+{
+	union dlb_lsp_cq_ldb_dsbl reg;
+
+	/* Don't re-enable the port if a removal is pending. The caller should
+	 * mark this port as enabled (if it isn't already), and when the
+	 * removal completes the port will be enabled.
+	 */
+	if (port->num_pending_removals)
+		return;
+
+	reg.field.disabled = 0;
+
+	DLB_CSR_WR(hw, DLB_LSP_CQ_LDB_DSBL(port->id.phys_id), reg.val);
+
+	dlb_flush_csr(hw);
+}
+
+static void dlb_ldb_port_cq_disable(struct dlb_hw *hw,
+				    struct dlb_ldb_port *port)
+{
+	union dlb_lsp_cq_ldb_dsbl reg;
+
+	reg.field.disabled = 1;
+
+	DLB_CSR_WR(hw, DLB_LSP_CQ_LDB_DSBL(port->id.phys_id), reg.val);
+
+	dlb_flush_csr(hw);
+}
+
+static void dlb_dir_port_cq_enable(struct dlb_hw *hw,
+				   struct dlb_dir_pq_pair *port)
+{
+	union dlb_lsp_cq_dir_dsbl reg;
+
+	reg.field.disabled = 0;
+
+	DLB_CSR_WR(hw, DLB_LSP_CQ_DIR_DSBL(port->id.phys_id), reg.val);
+
+	dlb_flush_csr(hw);
+}
+
+static void dlb_dir_port_cq_disable(struct dlb_hw *hw,
+				    struct dlb_dir_pq_pair *port)
+{
+	union dlb_lsp_cq_dir_dsbl reg;
+
+	reg.field.disabled = 1;
+
+	DLB_CSR_WR(hw, DLB_LSP_CQ_DIR_DSBL(port->id.phys_id), reg.val);
+
+	dlb_flush_csr(hw);
+}
+
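+/* Program the load-balanced port's producer port (PP) registers: credit pool
+ * bindings, credit high/low watermarks, quanta and initial counts, the owning
+ * VAS and VF/PF ownership, plus the VF virtual-PP translation when the
+ * request came from a VF. The PP valid bit is set last.
+ */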
+static int dlb_ldb_port_configure_pp(struct dlb_hw *hw,
+				     struct dlb_domain *domain,
+				     struct dlb_ldb_port *port,
+				     struct dlb_create_ldb_port_args *args,
+				     bool vf_request,
+				     unsigned int vf_id)
+{
+	union dlb_sys_ldb_pp2ldbpool r0 = { {0} };
+	union dlb_sys_ldb_pp2dirpool r1 = { {0} };
+	union dlb_sys_ldb_pp2vf_pf r2 = { {0} };
+	union dlb_sys_ldb_pp2vas r3 = { {0} };
+	union dlb_sys_ldb_pp_v r4 = { {0} };
+	union dlb_sys_ldb_pp2vpp r5 = { {0} };
+	union dlb_chp_ldb_pp_ldb_crd_hwm r6 = { {0} };
+	union dlb_chp_ldb_pp_dir_crd_hwm r7 = { {0} };
+	union dlb_chp_ldb_pp_ldb_crd_lwm r8 = { {0} };
+	union dlb_chp_ldb_pp_dir_crd_lwm r9 = { {0} };
+	union dlb_chp_ldb_pp_ldb_min_crd_qnt r10 = { {0} };
+	union dlb_chp_ldb_pp_dir_min_crd_qnt r11 = { {0} };
+	union dlb_chp_ldb_pp_ldb_crd_cnt r12 = { {0} };
+	union dlb_chp_ldb_pp_dir_crd_cnt r13 = { {0} };
+	union dlb_chp_ldb_ldb_pp2pool r14 = { {0} };
+	union dlb_chp_ldb_dir_pp2pool r15 = { {0} };
+	union dlb_chp_ldb_pp_crd_req_state r16 = { {0} };
+	union dlb_chp_ldb_pp_ldb_push_ptr r17 = { {0} };
+	union dlb_chp_ldb_pp_dir_push_ptr r18 = { {0} };
+
+	struct dlb_credit_pool *ldb_pool = NULL;
+	struct dlb_credit_pool *dir_pool = NULL;
+	unsigned int offs;
+
+	if (port->ldb_pool_used) {
+		ldb_pool = dlb_get_domain_ldb_pool(args->ldb_credit_pool_id,
+						   vf_request,
+						   domain);
+		if (!ldb_pool) {
+			DLB_HW_ERR(hw,
+				   "[%s()] Internal error: port validation failed\n",
+				   __func__);
+			return -EFAULT;
+		}
+	}
+
+	if (port->dir_pool_used) {
+		dir_pool = dlb_get_domain_dir_pool(args->dir_credit_pool_id,
+						   vf_request,
+						   domain);
+		if (!dir_pool) {
+			DLB_HW_ERR(hw,
+				   "[%s()] Internal error: port validation failed\n",
+				   __func__);
+			return -EFAULT;
+		}
+	}
+
+	r0.field.ldbpool = (port->ldb_pool_used) ? ldb_pool->id.phys_id : 0;
+
+	DLB_CSR_WR(hw, DLB_SYS_LDB_PP2LDBPOOL(port->id.phys_id), r0.val);
+
+	r1.field.dirpool = (port->dir_pool_used) ? dir_pool->id.phys_id : 0;
+
+	DLB_CSR_WR(hw, DLB_SYS_LDB_PP2DIRPOOL(port->id.phys_id), r1.val);
+
+	r2.field.vf = vf_id;
+	r2.field.is_pf = !vf_request;
+
+	DLB_CSR_WR(hw, DLB_SYS_LDB_PP2VF_PF(port->id.phys_id), r2.val);
+
+	r3.field.vas = domain->id.phys_id;
+
+	DLB_CSR_WR(hw, DLB_SYS_LDB_PP2VAS(port->id.phys_id), r3.val);
+
+	r5.field.vpp = port->id.virt_id;
+
+	offs = (vf_id * DLB_MAX_NUM_LDB_PORTS) + port->id.phys_id;
+
+	DLB_CSR_WR(hw, DLB_SYS_LDB_PP2VPP(offs), r5.val);
+
+	r6.field.hwm = args->ldb_credit_high_watermark;
+
+	DLB_CSR_WR(hw, DLB_CHP_LDB_PP_LDB_CRD_HWM(port->id.phys_id), r6.val);
+
+	r7.field.hwm = args->dir_credit_high_watermark;
+
+	DLB_CSR_WR(hw, DLB_CHP_LDB_PP_DIR_CRD_HWM(port->id.phys_id), r7.val);
+
+	r8.field.lwm = args->ldb_credit_low_watermark;
+
+	DLB_CSR_WR(hw, DLB_CHP_LDB_PP_LDB_CRD_LWM(port->id.phys_id), r8.val);
+
+	r9.field.lwm = args->dir_credit_low_watermark;
+
+	DLB_CSR_WR(hw, DLB_CHP_LDB_PP_DIR_CRD_LWM(port->id.phys_id), r9.val);
+
+	r10.field.quanta = args->ldb_credit_quantum;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_PP_LDB_MIN_CRD_QNT(port->id.phys_id),
+		   r10.val);
+
+	r11.field.quanta = args->dir_credit_quantum;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_PP_DIR_MIN_CRD_QNT(port->id.phys_id),
+		   r11.val);
+
+	r12.field.count = args->ldb_credit_high_watermark;
+
+	DLB_CSR_WR(hw, DLB_CHP_LDB_PP_LDB_CRD_CNT(port->id.phys_id), r12.val);
+
+	r13.field.count = args->dir_credit_high_watermark;
+
+	DLB_CSR_WR(hw, DLB_CHP_LDB_PP_DIR_CRD_CNT(port->id.phys_id), r13.val);
+
+	r14.field.pool = (port->ldb_pool_used) ? ldb_pool->id.phys_id : 0;
+
+	DLB_CSR_WR(hw, DLB_CHP_LDB_LDB_PP2POOL(port->id.phys_id), r14.val);
+
+	r15.field.pool = (port->dir_pool_used) ? dir_pool->id.phys_id : 0;
+
+	DLB_CSR_WR(hw, DLB_CHP_LDB_DIR_PP2POOL(port->id.phys_id), r15.val);
+
+	r16.field.no_pp_credit_update = 0;
+
+	DLB_CSR_WR(hw, DLB_CHP_LDB_PP_CRD_REQ_STATE(port->id.phys_id), r16.val);
+
+	r17.field.push_pointer = 0;
+
+	DLB_CSR_WR(hw, DLB_CHP_LDB_PP_LDB_PUSH_PTR(port->id.phys_id), r17.val);
+
+	r18.field.push_pointer = 0;
+
+	DLB_CSR_WR(hw, DLB_CHP_LDB_PP_DIR_PUSH_PTR(port->id.phys_id), r18.val);
+
+	if (vf_request) {
+		union dlb_sys_vf_ldb_vpp2pp r16 = { {0} };
+		union dlb_sys_vf_ldb_vpp_v r17 = { {0} };
+
+		r16.field.pp = port->id.phys_id;
+
+		offs = vf_id * DLB_MAX_NUM_LDB_PORTS + port->id.virt_id;
+
+		DLB_CSR_WR(hw, DLB_SYS_VF_LDB_VPP2PP(offs), r16.val);
+
+		r17.field.vpp_v = 1;
+
+		DLB_CSR_WR(hw, DLB_SYS_VF_LDB_VPP_V(offs), r17.val);
+	}
+
+	r4.field.pp_v = 1;
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_LDB_PP_V(port->id.phys_id),
+		   r4.val);
+
+	return 0;
+}
+
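+/* Program the port's consumer queue: CQ base address (64B aligned), VF/PF
+ * ownership, token-depth select, history-list window and pointers, CQ
+ * inflight limit, and the pop-count base address. The QID mappings are left
+ * disabled and the software qid_map[] entries reset to unmapped.
+ */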
+static int dlb_ldb_port_configure_cq(struct dlb_hw *hw,
+				     struct dlb_ldb_port *port,
+				     u64 pop_count_dma_base,
+				     u64 cq_dma_base,
+				     struct dlb_create_ldb_port_args *args,
+				     bool vf_request,
+				     unsigned int vf_id)
+{
+	int i;
+
+	union dlb_sys_ldb_cq_addr_l r0 = { {0} };
+	union dlb_sys_ldb_cq_addr_u r1 = { {0} };
+	union dlb_sys_ldb_cq2vf_pf r2 = { {0} };
+	union dlb_chp_ldb_cq_tkn_depth_sel r3 = { {0} };
+	union dlb_chp_hist_list_lim r4 = { {0} };
+	union dlb_chp_hist_list_base r5 = { {0} };
+	union dlb_lsp_cq_ldb_infl_lim r6 = { {0} };
+	union dlb_lsp_cq2priov r7 = { {0} };
+	union dlb_chp_hist_list_push_ptr r8 = { {0} };
+	union dlb_chp_hist_list_pop_ptr r9 = { {0} };
+	union dlb_lsp_cq_ldb_tkn_depth_sel r10 = { {0} };
+	union dlb_sys_ldb_pp_addr_l r11 = { {0} };
+	union dlb_sys_ldb_pp_addr_u r12 = { {0} };
+
+	/* The CQ address is 64B-aligned, and the DLB only wants bits [63:6] */
+	r0.field.addr_l = cq_dma_base >> 6;
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_LDB_CQ_ADDR_L(port->id.phys_id),
+		   r0.val);
+
+	r1.field.addr_u = cq_dma_base >> 32;
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_LDB_CQ_ADDR_U(port->id.phys_id),
+		   r1.val);
+
+	r2.field.vf = vf_id;
+	r2.field.is_pf = !vf_request;
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_LDB_CQ2VF_PF(port->id.phys_id),
+		   r2.val);
+
+	if (args->cq_depth <= 8) {
+		r3.field.token_depth_select = 1;
+	} else if (args->cq_depth == 16) {
+		r3.field.token_depth_select = 2;
+	} else if (args->cq_depth == 32) {
+		r3.field.token_depth_select = 3;
+	} else if (args->cq_depth == 64) {
+		r3.field.token_depth_select = 4;
+	} else if (args->cq_depth == 128) {
+		r3.field.token_depth_select = 5;
+	} else if (args->cq_depth == 256) {
+		r3.field.token_depth_select = 6;
+	} else if (args->cq_depth == 512) {
+		r3.field.token_depth_select = 7;
+	} else if (args->cq_depth == 1024) {
+		r3.field.token_depth_select = 8;
+	} else {
+		DLB_HW_ERR(hw, "[%s():%d] Internal error: invalid CQ depth\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_CQ_TKN_DEPTH_SEL(port->id.phys_id),
+		   r3.val);
+
+	r10.field.token_depth_select = r3.field.token_depth_select;
+	r10.field.ignore_depth = 0;
+	/* TDT algorithm: DLB must be able to write CQs with depth < 4 */
+	r10.field.enab_shallow_cq = 1;
+
+	DLB_CSR_WR(hw,
+		   DLB_LSP_CQ_LDB_TKN_DEPTH_SEL(port->id.phys_id),
+		   r10.val);
+
+	/* To support CQs with depth less than 8, program the token count
+	 * register with a non-zero initial value. Operations such as domain
+	 * reset must take this initial value into account when quiescing the
+	 * CQ.
+	 */
+	port->init_tkn_cnt = 0;
+
+	if (args->cq_depth < 8) {
+		union dlb_lsp_cq_ldb_tkn_cnt r12 = { {0} };
+
+		port->init_tkn_cnt = 8 - args->cq_depth;
+
+		r12.field.token_count = port->init_tkn_cnt;
+
+		DLB_CSR_WR(hw,
+			   DLB_LSP_CQ_LDB_TKN_CNT(port->id.phys_id),
+			   r12.val);
+	}
+
+	r4.field.limit = port->hist_list_entry_limit - 1;
+
+	DLB_CSR_WR(hw, DLB_CHP_HIST_LIST_LIM(port->id.phys_id), r4.val);
+
+	r5.field.base = port->hist_list_entry_base;
+
+	DLB_CSR_WR(hw, DLB_CHP_HIST_LIST_BASE(port->id.phys_id), r5.val);
+
+	r8.field.push_ptr = r5.field.base;
+	r8.field.generation = 0;
+
+	DLB_CSR_WR(hw, DLB_CHP_HIST_LIST_PUSH_PTR(port->id.phys_id), r8.val);
+
+	r9.field.pop_ptr = r5.field.base;
+	r9.field.generation = 0;
+
+	DLB_CSR_WR(hw, DLB_CHP_HIST_LIST_POP_PTR(port->id.phys_id), r9.val);
+
+	/* The inflight limit sets a cap on the number of QEs for which this CQ
+	 * can owe completions at one time.
+	 */
+	r6.field.limit = args->cq_history_list_size;
+
+	DLB_CSR_WR(hw, DLB_LSP_CQ_LDB_INFL_LIM(port->id.phys_id), r6.val);
+
+	/* Disable the port's QID mappings */
+	r7.field.v = 0;
+
+	DLB_CSR_WR(hw, DLB_LSP_CQ2PRIOV(port->id.phys_id), r7.val);
+
+	/* Two cache lines (128B) are dedicated for the port's pop counts */
+	r11.field.addr_l = pop_count_dma_base >> 7;
+
+	DLB_CSR_WR(hw, DLB_SYS_LDB_PP_ADDR_L(port->id.phys_id), r11.val);
+
+	r12.field.addr_u = pop_count_dma_base >> 32;
+
+	DLB_CSR_WR(hw, DLB_SYS_LDB_PP_ADDR_U(port->id.phys_id), r12.val);
+
+	for (i = 0; i < DLB_MAX_NUM_QIDS_PER_LDB_CQ; i++)
+		port->qid_map[i].state = DLB_QUEUE_UNMAPPED;
+
+	return 0;
+}
+
+static void dlb_update_ldb_arb_threshold(struct dlb_hw *hw)
+{
+	union dlb_lsp_ctrl_config_0 r0 = { {0} };
+
+	/* From the hardware spec:
+	 * "The optimal value for ldb_arb_threshold is in the region of {8 *
+	 * #CQs}. It is expected therefore that the PF will change this value
+	 * dynamically as the number of active ports changes."
+	 */
+	r0.val = DLB_CSR_RD(hw, DLB_LSP_CTRL_CONFIG_0);
+
+	r0.field.ldb_arb_threshold = hw->pf.num_enabled_ldb_ports * 8;
+	r0.field.ldb_arb_ignore_empty = 1;
+	r0.field.ldb_arb_mode = 1;
+
+	DLB_CSR_WR(hw, DLB_LSP_CTRL_CONFIG_0, r0.val);
+
+	dlb_flush_csr(hw);
+}
+
+static void dlb_ldb_pool_update_credit_count(struct dlb_hw *hw,
+					     u32 pool_id,
+					     u32 count)
+{
+	hw->rsrcs.ldb_credit_pools[pool_id].avail_credits -= count;
+}
+
+static void dlb_dir_pool_update_credit_count(struct dlb_hw *hw,
+					     u32 pool_id,
+					     u32 count)
+{
+	hw->rsrcs.dir_credit_pools[pool_id].avail_credits -= count;
+}
+
+static void dlb_ldb_pool_write_credit_count_reg(struct dlb_hw *hw,
+						u32 pool_id)
+{
+	union dlb_chp_ldb_pool_crd_cnt r0 = { {0} };
+	struct dlb_credit_pool *pool;
+
+	pool = &hw->rsrcs.ldb_credit_pools[pool_id];
+
+	r0.field.count = pool->avail_credits;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_POOL_CRD_CNT(pool->id.phys_id),
+		   r0.val);
+}
+
+static void dlb_dir_pool_write_credit_count_reg(struct dlb_hw *hw,
+						u32 pool_id)
+{
+	union dlb_chp_dir_pool_crd_cnt r0 = { {0} };
+	struct dlb_credit_pool *pool;
+
+	pool = &hw->rsrcs.dir_credit_pools[pool_id];
+
+	r0.field.count = pool->avail_credits;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_POOL_CRD_CNT(pool->id.phys_id),
+		   r0.val);
+}
+
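+/* Top-level load-balanced port setup: carve the port's history-list window
+ * out of the domain, debit the selected credit pools by the high watermarks
+ * (or zero the credit arguments if the domain has no queues of that type),
+ * configure the CQ and PP registers, enable the CQ, and refresh the LDB
+ * arbitration threshold for the new number of enabled ports.
+ */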
+static int dlb_configure_ldb_port(struct dlb_hw *hw,
+				  struct dlb_domain *domain,
+				  struct dlb_ldb_port *port,
+				  u64 pop_count_dma_base,
+				  u64 cq_dma_base,
+				  struct dlb_create_ldb_port_args *args,
+				  bool vf_request,
+				  unsigned int vf_id)
+{
+	struct dlb_credit_pool *ldb_pool, *dir_pool;
+	int ret;
+
+	port->hist_list_entry_base = domain->hist_list_entry_base +
+				     domain->hist_list_entry_offset;
+	port->hist_list_entry_limit = port->hist_list_entry_base +
+				      args->cq_history_list_size;
+
+	domain->hist_list_entry_offset += args->cq_history_list_size;
+	domain->avail_hist_list_entries -= args->cq_history_list_size;
+
+	port->ldb_pool_used = !dlb_list_empty(&domain->used_ldb_queues) ||
+			      !dlb_list_empty(&domain->avail_ldb_queues);
+	port->dir_pool_used = !dlb_list_empty(&domain->used_dir_pq_pairs) ||
+			      !dlb_list_empty(&domain->avail_dir_pq_pairs);
+
+	if (port->ldb_pool_used) {
+		u32 cnt = args->ldb_credit_high_watermark;
+
+		ldb_pool = dlb_get_domain_ldb_pool(args->ldb_credit_pool_id,
+						   vf_request,
+						   domain);
+		if (!ldb_pool) {
+			DLB_HW_ERR(hw,
+				   "[%s()] Internal error: port validation failed\n",
+				   __func__);
+			return -EFAULT;
+		}
+
+		dlb_ldb_pool_update_credit_count(hw, ldb_pool->id.phys_id, cnt);
+	} else {
+		args->ldb_credit_high_watermark = 0;
+		args->ldb_credit_low_watermark = 0;
+		args->ldb_credit_quantum = 0;
+	}
+
+	if (port->dir_pool_used) {
+		u32 cnt = args->dir_credit_high_watermark;
+
+		dir_pool = dlb_get_domain_dir_pool(args->dir_credit_pool_id,
+						   vf_request,
+						   domain);
+		if (!dir_pool) {
+			DLB_HW_ERR(hw,
+				   "[%s()] Internal error: port validation failed\n",
+				   __func__);
+			return -EFAULT;
+		}
+
+		dlb_dir_pool_update_credit_count(hw, dir_pool->id.phys_id, cnt);
+	} else {
+		args->dir_credit_high_watermark = 0;
+		args->dir_credit_low_watermark = 0;
+		args->dir_credit_quantum = 0;
+	}
+
+	ret = dlb_ldb_port_configure_cq(hw,
+					port,
+					pop_count_dma_base,
+					cq_dma_base,
+					args,
+					vf_request,
+					vf_id);
+	if (ret < 0)
+		return ret;
+
+	ret = dlb_ldb_port_configure_pp(hw,
+					domain,
+					port,
+					args,
+					vf_request,
+					vf_id);
+	if (ret < 0)
+		return ret;
+
+	dlb_ldb_port_cq_enable(hw, port);
+
+	port->num_mappings = 0;
+
+	port->enabled = true;
+
+	hw->pf.num_enabled_ldb_ports++;
+
+	dlb_update_ldb_arb_threshold(hw);
+
+	port->configured = true;
+
+	return 0;
+}
+
+static int dlb_dir_port_configure_pp(struct dlb_hw *hw,
+				     struct dlb_domain *domain,
+				     struct dlb_dir_pq_pair *port,
+				     struct dlb_create_dir_port_args *args,
+				     bool vf_request,
+				     unsigned int vf_id)
+{
+	union dlb_sys_dir_pp2ldbpool r0 = { {0} };
+	union dlb_sys_dir_pp2dirpool r1 = { {0} };
+	union dlb_sys_dir_pp2vf_pf r2 = { {0} };
+	union dlb_sys_dir_pp2vas r3 = { {0} };
+	union dlb_sys_dir_pp_v r4 = { {0} };
+	union dlb_sys_dir_pp2vpp r5 = { {0} };
+	union dlb_chp_dir_pp_ldb_crd_hwm r6 = { {0} };
+	union dlb_chp_dir_pp_dir_crd_hwm r7 = { {0} };
+	union dlb_chp_dir_pp_ldb_crd_lwm r8 = { {0} };
+	union dlb_chp_dir_pp_dir_crd_lwm r9 = { {0} };
+	union dlb_chp_dir_pp_ldb_min_crd_qnt r10 = { {0} };
+	union dlb_chp_dir_pp_dir_min_crd_qnt r11 = { {0} };
+	union dlb_chp_dir_pp_ldb_crd_cnt r12 = { {0} };
+	union dlb_chp_dir_pp_dir_crd_cnt r13 = { {0} };
+	union dlb_chp_dir_ldb_pp2pool r14 = { {0} };
+	union dlb_chp_dir_dir_pp2pool r15 = { {0} };
+	union dlb_chp_dir_pp_crd_req_state r16 = { {0} };
+	union dlb_chp_dir_pp_ldb_push_ptr r17 = { {0} };
+	union dlb_chp_dir_pp_dir_push_ptr r18 = { {0} };
+
+	struct dlb_credit_pool *ldb_pool = NULL;
+	struct dlb_credit_pool *dir_pool = NULL;
+
+	if (port->ldb_pool_used) {
+		ldb_pool = dlb_get_domain_ldb_pool(args->ldb_credit_pool_id,
+						   vf_request,
+						   domain);
+		if (!ldb_pool) {
+			DLB_HW_ERR(hw,
+				   "[%s()] Internal error: port validation failed\n",
+				   __func__);
+			return -EFAULT;
+		}
+	}
+
+	if (port->dir_pool_used) {
+		dir_pool = dlb_get_domain_dir_pool(args->dir_credit_pool_id,
+						   vf_request,
+						   domain);
+		if (!dir_pool) {
+			DLB_HW_ERR(hw,
+				   "[%s()] Internal error: port validation failed\n",
+				   __func__);
+			return -EFAULT;
+		}
+	}
+
+	r0.field.ldbpool = (port->ldb_pool_used) ? ldb_pool->id.phys_id : 0;
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_DIR_PP2LDBPOOL(port->id.phys_id),
+		   r0.val);
+
+	r1.field.dirpool = (port->dir_pool_used) ? dir_pool->id.phys_id : 0;
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_DIR_PP2DIRPOOL(port->id.phys_id),
+		   r1.val);
+
+	r2.field.vf = vf_id;
+	r2.field.is_pf = !vf_request;
+	r2.field.is_hw_dsi = 0;
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_DIR_PP2VF_PF(port->id.phys_id),
+		   r2.val);
+
+	r3.field.vas = domain->id.phys_id;
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_DIR_PP2VAS(port->id.phys_id),
+		   r3.val);
+
+	r5.field.vpp = port->id.virt_id;
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_DIR_PP2VPP((vf_id * DLB_MAX_NUM_DIR_PORTS) +
+				      port->id.phys_id),
+		   r5.val);
+
+	r6.field.hwm = args->ldb_credit_high_watermark;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_PP_LDB_CRD_HWM(port->id.phys_id),
+		   r6.val);
+
+	r7.field.hwm = args->dir_credit_high_watermark;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_PP_DIR_CRD_HWM(port->id.phys_id),
+		   r7.val);
+
+	r8.field.lwm = args->ldb_credit_low_watermark;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_PP_LDB_CRD_LWM(port->id.phys_id),
+		   r8.val);
+
+	r9.field.lwm = args->dir_credit_low_watermark;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_PP_DIR_CRD_LWM(port->id.phys_id),
+		   r9.val);
+
+	r10.field.quanta = args->ldb_credit_quantum;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_PP_LDB_MIN_CRD_QNT(port->id.phys_id),
+		   r10.val);
+
+	r11.field.quanta = args->dir_credit_quantum;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_PP_DIR_MIN_CRD_QNT(port->id.phys_id),
+		   r11.val);
+
+	r12.field.count = args->ldb_credit_high_watermark;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_PP_LDB_CRD_CNT(port->id.phys_id),
+		   r12.val);
+
+	r13.field.count = args->dir_credit_high_watermark;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_PP_DIR_CRD_CNT(port->id.phys_id),
+		   r13.val);
+
+	r14.field.pool = (port->ldb_pool_used) ? ldb_pool->id.phys_id : 0;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_LDB_PP2POOL(port->id.phys_id),
+		   r14.val);
+
+	r15.field.pool = (port->dir_pool_used) ? dir_pool->id.phys_id : 0;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_DIR_PP2POOL(port->id.phys_id),
+		   r15.val);
+
+	r16.field.no_pp_credit_update = 0;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_PP_CRD_REQ_STATE(port->id.phys_id),
+		   r16.val);
+
+	r17.field.push_pointer = 0;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_PP_LDB_PUSH_PTR(port->id.phys_id),
+		   r17.val);
+
+	r18.field.push_pointer = 0;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_PP_DIR_PUSH_PTR(port->id.phys_id),
+		   r18.val);
+
+	if (vf_request) {
+		union dlb_sys_vf_dir_vpp2pp r16 = { {0} };
+		union dlb_sys_vf_dir_vpp_v r17 = { {0} };
+		unsigned int offs;
+
+		r16.field.pp = port->id.phys_id;
+
+		offs = vf_id * DLB_MAX_NUM_DIR_PORTS + port->id.virt_id;
+
+		DLB_CSR_WR(hw, DLB_SYS_VF_DIR_VPP2PP(offs), r16.val);
+
+		r17.field.vpp_v = 1;
+
+		DLB_CSR_WR(hw, DLB_SYS_VF_DIR_VPP_V(offs), r17.val);
+	}
+
+	r4.field.pp_v = 1;
+	r4.field.mb_dm = 0;
+
+	DLB_CSR_WR(hw, DLB_SYS_DIR_PP_V(port->id.phys_id), r4.val);
+
+	return 0;
+}
+
+static int dlb_dir_port_configure_cq(struct dlb_hw *hw,
+				     struct dlb_dir_pq_pair *port,
+				     u64 pop_count_dma_base,
+				     u64 cq_dma_base,
+				     struct dlb_create_dir_port_args *args,
+				     bool vf_request,
+				     unsigned int vf_id)
+{
+	union dlb_sys_dir_cq_addr_l r0 = { {0} };
+	union dlb_sys_dir_cq_addr_u r1 = { {0} };
+	union dlb_sys_dir_cq2vf_pf r2 = { {0} };
+	union dlb_chp_dir_cq_tkn_depth_sel r3 = { {0} };
+	union dlb_lsp_cq_dir_tkn_depth_sel_dsi r4 = { {0} };
+	union dlb_sys_dir_pp_addr_l r5 = { {0} };
+	union dlb_sys_dir_pp_addr_u r6 = { {0} };
+
+	/* The CQ address is 64B-aligned, and the DLB only wants bits [63:6] */
+	r0.field.addr_l = cq_dma_base >> 6;
+
+	DLB_CSR_WR(hw, DLB_SYS_DIR_CQ_ADDR_L(port->id.phys_id), r0.val);
+
+	r1.field.addr_u = cq_dma_base >> 32;
+
+	DLB_CSR_WR(hw, DLB_SYS_DIR_CQ_ADDR_U(port->id.phys_id), r1.val);
+
+	r2.field.vf = vf_id;
+	r2.field.is_pf = !vf_request;
+
+	DLB_CSR_WR(hw, DLB_SYS_DIR_CQ2VF_PF(port->id.phys_id), r2.val);
+
+	if (args->cq_depth == 8) {
+		r3.field.token_depth_select = 1;
+	} else if (args->cq_depth == 16) {
+		r3.field.token_depth_select = 2;
+	} else if (args->cq_depth == 32) {
+		r3.field.token_depth_select = 3;
+	} else if (args->cq_depth == 64) {
+		r3.field.token_depth_select = 4;
+	} else if (args->cq_depth == 128) {
+		r3.field.token_depth_select = 5;
+	} else if (args->cq_depth == 256) {
+		r3.field.token_depth_select = 6;
+	} else if (args->cq_depth == 512) {
+		r3.field.token_depth_select = 7;
+	} else if (args->cq_depth == 1024) {
+		r3.field.token_depth_select = 8;
+	} else {
+		DLB_HW_ERR(hw, "[%s():%d] Internal error: invalid CQ depth\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_CQ_TKN_DEPTH_SEL(port->id.phys_id),
+		   r3.val);
+
+	r4.field.token_depth_select = r3.field.token_depth_select;
+	r4.field.disable_wb_opt = 0;
+
+	DLB_CSR_WR(hw,
+		   DLB_LSP_CQ_DIR_TKN_DEPTH_SEL_DSI(port->id.phys_id),
+		   r4.val);
+
+	/* Two cache lines (128B) are dedicated for the port's pop counts */
+	r5.field.addr_l = pop_count_dma_base >> 7;
+
+	DLB_CSR_WR(hw, DLB_SYS_DIR_PP_ADDR_L(port->id.phys_id), r5.val);
+
+	r6.field.addr_u = pop_count_dma_base >> 32;
+
+	DLB_CSR_WR(hw, DLB_SYS_DIR_PP_ADDR_U(port->id.phys_id), r6.val);
+
+	return 0;
+}
+
+static int dlb_configure_dir_port(struct dlb_hw *hw,
+				  struct dlb_domain *domain,
+				  struct dlb_dir_pq_pair *port,
+				  u64 pop_count_dma_base,
+				  u64 cq_dma_base,
+				  struct dlb_create_dir_port_args *args,
+				  bool vf_request,
+				  unsigned int vf_id)
+{
+	struct dlb_credit_pool *ldb_pool, *dir_pool;
+	int ret;
+
+	port->ldb_pool_used = !dlb_list_empty(&domain->used_ldb_queues) ||
+			      !dlb_list_empty(&domain->avail_ldb_queues);
+
+	/* Each directed port has a directed queue, hence this port requires
+	 * directed credits.
+	 */
+	port->dir_pool_used = true;
+
+	if (port->ldb_pool_used) {
+		u32 cnt = args->ldb_credit_high_watermark;
+
+		ldb_pool = dlb_get_domain_ldb_pool(args->ldb_credit_pool_id,
+						   vf_request,
+						   domain);
+		if (!ldb_pool) {
+			DLB_HW_ERR(hw,
+				   "[%s()] Internal error: port validation failed\n",
+				   __func__);
+			return -EFAULT;
+		}
+
+		dlb_ldb_pool_update_credit_count(hw, ldb_pool->id.phys_id, cnt);
+	} else {
+		args->ldb_credit_high_watermark = 0;
+		args->ldb_credit_low_watermark = 0;
+		args->ldb_credit_quantum = 0;
+	}
+
+	dir_pool = dlb_get_domain_dir_pool(args->dir_credit_pool_id,
+					   vf_request,
+					   domain);
+	if (!dir_pool) {
+		DLB_HW_ERR(hw,
+			   "[%s()] Internal error: port validation failed\n",
+			   __func__);
+		return -EFAULT;
+	}
+
+	dlb_dir_pool_update_credit_count(hw,
+					 dir_pool->id.phys_id,
+					 args->dir_credit_high_watermark);
+
+	ret = dlb_dir_port_configure_cq(hw,
+					port,
+					pop_count_dma_base,
+					cq_dma_base,
+					args,
+					vf_request,
+					vf_id);
+
+	if (ret < 0)
+		return ret;
+
+	ret = dlb_dir_port_configure_pp(hw,
+					domain,
+					port,
+					args,
+					vf_request,
+					vf_id);
+	if (ret < 0)
+		return ret;
+
+	dlb_dir_port_cq_enable(hw, port);
+
+	port->enabled = true;
+
+	port->port_configured = true;
+
+	return 0;
+}
+
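+/* Map a queue to a free (or already reserved) CQ slot immediately: set the
+ * slot's valid bit and 3-bit priority in CQ2PRIOV, write the QID into the
+ * slot's CQ2QID field, set the CQ's bit in the per-queue QID2CQIDX bitmaps,
+ * and transition the slot to the MAPPED state.
+ */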
+static int dlb_ldb_port_map_qid_static(struct dlb_hw *hw,
+				       struct dlb_ldb_port *p,
+				       struct dlb_ldb_queue *q,
+				       u8 priority)
+{
+	union dlb_lsp_cq2priov r0;
+	union dlb_lsp_cq2qid r1;
+	union dlb_atm_pipe_qid_ldb_qid2cqidx r2;
+	union dlb_lsp_qid_ldb_qid2cqidx r3;
+	union dlb_lsp_qid_ldb_qid2cqidx2 r4;
+	enum dlb_qid_map_state state;
+	int i;
+
+	/* Look for a pending or already mapped slot, else an unused slot */
+	if (!dlb_port_find_slot_queue(p, DLB_QUEUE_MAP_IN_PROGRESS, q, &i) &&
+	    !dlb_port_find_slot_queue(p, DLB_QUEUE_MAPPED, q, &i) &&
+	    !dlb_port_find_slot(p, DLB_QUEUE_UNMAPPED, &i)) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: CQ has no available QID mapping slots\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	if (i >= DLB_MAX_NUM_QIDS_PER_LDB_CQ) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: port slot tracking failed\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	/* Read-modify-write the priority and valid bit register */
+	r0.val = DLB_CSR_RD(hw, DLB_LSP_CQ2PRIOV(p->id.phys_id));
+
+	r0.field.v |= 1 << i;
+	r0.field.prio |= (priority & 0x7) << i * 3;
+
+	DLB_CSR_WR(hw, DLB_LSP_CQ2PRIOV(p->id.phys_id), r0.val);
+
+	/* Read-modify-write the QID map register */
+	r1.val = DLB_CSR_RD(hw, DLB_LSP_CQ2QID(p->id.phys_id, i / 4));
+
+	if (i == 0 || i == 4)
+		r1.field.qid_p0 = q->id.phys_id;
+	if (i == 1 || i == 5)
+		r1.field.qid_p1 = q->id.phys_id;
+	if (i == 2 || i == 6)
+		r1.field.qid_p2 = q->id.phys_id;
+	if (i == 3 || i == 7)
+		r1.field.qid_p3 = q->id.phys_id;
+
+	DLB_CSR_WR(hw, DLB_LSP_CQ2QID(p->id.phys_id, i / 4), r1.val);
+
+	r2.val = DLB_CSR_RD(hw,
+			    DLB_ATM_PIPE_QID_LDB_QID2CQIDX(q->id.phys_id,
+							   p->id.phys_id / 4));
+
+	r3.val = DLB_CSR_RD(hw,
+			    DLB_LSP_QID_LDB_QID2CQIDX(q->id.phys_id,
+						      p->id.phys_id / 4));
+
+	r4.val = DLB_CSR_RD(hw,
+			    DLB_LSP_QID_LDB_QID2CQIDX2(q->id.phys_id,
+						       p->id.phys_id / 4));
+
+	switch (p->id.phys_id % 4) {
+	case 0:
+		r2.field.cq_p0 |= 1 << i;
+		r3.field.cq_p0 |= 1 << i;
+		r4.field.cq_p0 |= 1 << i;
+		break;
+
+	case 1:
+		r2.field.cq_p1 |= 1 << i;
+		r3.field.cq_p1 |= 1 << i;
+		r4.field.cq_p1 |= 1 << i;
+		break;
+
+	case 2:
+		r2.field.cq_p2 |= 1 << i;
+		r3.field.cq_p2 |= 1 << i;
+		r4.field.cq_p2 |= 1 << i;
+		break;
+
+	case 3:
+		r2.field.cq_p3 |= 1 << i;
+		r3.field.cq_p3 |= 1 << i;
+		r4.field.cq_p3 |= 1 << i;
+		break;
+	}
+
+	DLB_CSR_WR(hw,
+		   DLB_ATM_PIPE_QID_LDB_QID2CQIDX(q->id.phys_id,
+						  p->id.phys_id / 4),
+		   r2.val);
+
+	DLB_CSR_WR(hw,
+		   DLB_LSP_QID_LDB_QID2CQIDX(q->id.phys_id,
+					     p->id.phys_id / 4),
+		   r3.val);
+
+	DLB_CSR_WR(hw,
+		   DLB_LSP_QID_LDB_QID2CQIDX2(q->id.phys_id,
+					      p->id.phys_id / 4),
+		   r4.val);
+
+	dlb_flush_csr(hw);
+
+	p->qid_map[i].qid = q->id.phys_id;
+	p->qid_map[i].priority = priority;
+
+	state = DLB_QUEUE_MAPPED;
+
+	return dlb_port_slot_state_transition(hw, p, q, i, state);
+}
+
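For reference, the static mapping above read-modify-writes the CQ2PRIOV register, which carries one valid bit and one 3-bit priority per CQ slot. Below is a minimal standalone sketch of that per-slot packing; the bit positions are assumed for illustration and are not taken from the dlb_lsp_cq2priov register definition.

#include <stdint.h>
#include <stdio.h>

/* Illustrative packing only: the real field offsets live in the
 * dlb_lsp_cq2priov register definition, not here.
 */
#define SLOT_PRIO_BITS	3
#define NUM_SLOTS	8

static uint32_t cq2priov_set_slot(uint32_t reg, unsigned int slot, uint8_t prio)
{
	/* Valid bits assumed to sit above the 8 x 3-bit priority field. */
	reg |= 1u << (NUM_SLOTS * SLOT_PRIO_BITS + slot);
	reg |= (uint32_t)(prio & 0x7) << (slot * SLOT_PRIO_BITS);
	return reg;
}

int main(void)
{
	uint32_t reg = cq2priov_set_slot(0, 2, 5);

	printf("CQ2PRIOV after mapping slot 2 at priority 5: 0x%08x\n", reg);
	return 0;
}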
+static void dlb_ldb_port_change_qid_priority(struct dlb_hw *hw,
+					     struct dlb_ldb_port *port,
+					     int slot,
+					     struct dlb_map_qid_args *args)
+{
+	union dlb_lsp_cq2priov r0;
+
+	/* Read-modify-write the priority and valid bit register */
+	r0.val = DLB_CSR_RD(hw, DLB_LSP_CQ2PRIOV(port->id.phys_id));
+
+	r0.field.v |= 1 << slot;
+	r0.field.prio |= (args->priority & 0x7) << slot * 3;
+
+	DLB_CSR_WR(hw, DLB_LSP_CQ2PRIOV(port->id.phys_id), r0.val);
+
+	dlb_flush_csr(hw);
+
+	port->qid_map[slot].priority = args->priority;
+}
+
+static int dlb_ldb_port_set_has_work_bits(struct dlb_hw *hw,
+					  struct dlb_ldb_port *port,
+					  struct dlb_ldb_queue *queue,
+					  int slot)
+{
+	union dlb_lsp_qid_aqed_active_cnt r0;
+	union dlb_lsp_qid_ldb_enqueue_cnt r1;
+	union dlb_lsp_ldb_sched_ctrl r2 = { {0} };
+
+	/* Set the atomic scheduling haswork bit */
+	r0.val = DLB_CSR_RD(hw, DLB_LSP_QID_AQED_ACTIVE_CNT(queue->id.phys_id));
+
+	r2.field.cq = port->id.phys_id;
+	r2.field.qidix = slot;
+	r2.field.value = 1;
+	r2.field.rlist_haswork_v = r0.field.count > 0;
+
+	DLB_CSR_WR(hw, DLB_LSP_LDB_SCHED_CTRL, r2.val);
+
+	/* Set the non-atomic scheduling haswork bit */
+	r1.val = DLB_CSR_RD(hw, DLB_LSP_QID_LDB_ENQUEUE_CNT(queue->id.phys_id));
+
+	memset(&r2, 0, sizeof(r2));
+
+	r2.field.cq = port->id.phys_id;
+	r2.field.qidix = slot;
+	r2.field.value = 1;
+	r2.field.nalb_haswork_v = (r1.field.count > 0);
+
+	DLB_CSR_WR(hw, DLB_LSP_LDB_SCHED_CTRL, r2.val);
+
+	dlb_flush_csr(hw);
+
+	return 0;
+}
+
+static void dlb_ldb_port_clear_has_work_bits(struct dlb_hw *hw,
+					     struct dlb_ldb_port *port,
+					     u8 slot)
+{
+	union dlb_lsp_ldb_sched_ctrl r2 = { {0} };
+
+	r2.field.cq = port->id.phys_id;
+	r2.field.qidix = slot;
+	r2.field.value = 0;
+	r2.field.rlist_haswork_v = 1;
+
+	DLB_CSR_WR(hw, DLB_LSP_LDB_SCHED_CTRL, r2.val);
+
+	memset(&r2, 0, sizeof(r2));
+
+	r2.field.cq = port->id.phys_id;
+	r2.field.qidix = slot;
+	r2.field.value = 0;
+	r2.field.nalb_haswork_v = 1;
+
+	DLB_CSR_WR(hw, DLB_LSP_LDB_SCHED_CTRL, r2.val);
+
+	dlb_flush_csr(hw);
+}
+
+static void dlb_ldb_port_clear_queue_if_status(struct dlb_hw *hw,
+					       struct dlb_ldb_port *port,
+					       int slot)
+{
+	union dlb_lsp_ldb_sched_ctrl r0 = { {0} };
+
+	r0.field.cq = port->id.phys_id;
+	r0.field.qidix = slot;
+	r0.field.value = 0;
+	r0.field.inflight_ok_v = 1;
+
+	DLB_CSR_WR(hw, DLB_LSP_LDB_SCHED_CTRL, r0.val);
+
+	dlb_flush_csr(hw);
+}
+
+static void dlb_ldb_port_set_queue_if_status(struct dlb_hw *hw,
+					     struct dlb_ldb_port *port,
+					     int slot)
+{
+	union dlb_lsp_ldb_sched_ctrl r0 = { {0} };
+
+	r0.field.cq = port->id.phys_id;
+	r0.field.qidix = slot;
+	r0.field.value = 1;
+	r0.field.inflight_ok_v = 1;
+
+	DLB_CSR_WR(hw, DLB_LSP_LDB_SCHED_CTRL, r0.val);
+
+	dlb_flush_csr(hw);
+}
+
+static void dlb_ldb_queue_set_inflight_limit(struct dlb_hw *hw,
+					     struct dlb_ldb_queue *queue)
+{
+	union dlb_lsp_qid_ldb_infl_lim r0 = { {0} };
+
+	r0.field.limit = queue->num_qid_inflights;
+
+	DLB_CSR_WR(hw, DLB_LSP_QID_LDB_INFL_LIM(queue->id.phys_id), r0.val);
+}
+
+static void dlb_ldb_queue_clear_inflight_limit(struct dlb_hw *hw,
+					       struct dlb_ldb_queue *queue)
+{
+	DLB_CSR_WR(hw,
+		   DLB_LSP_QID_LDB_INFL_LIM(queue->id.phys_id),
+		   DLB_LSP_QID_LDB_INFL_LIM_RST);
+}
+
+/* dlb_ldb_queue_{enable, disable}_mapped_cqs() don't operate exactly as their
+ * function names imply, and should only be called by the dynamic CQ mapping
+ * code.
+ */
+static void dlb_ldb_queue_disable_mapped_cqs(struct dlb_hw *hw,
+					     struct dlb_domain *domain,
+					     struct dlb_ldb_queue *queue)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_ldb_port *port;
+	int slot;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_ports, port, iter) {
+		enum dlb_qid_map_state state = DLB_QUEUE_MAPPED;
+
+		if (!dlb_port_find_slot_queue(port, state, queue, &slot))
+			continue;
+
+		if (port->enabled)
+			dlb_ldb_port_cq_disable(hw, port);
+	}
+}
+
+static void dlb_ldb_queue_enable_mapped_cqs(struct dlb_hw *hw,
+					    struct dlb_domain *domain,
+					    struct dlb_ldb_queue *queue)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_ldb_port *port;
+	int slot;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_ports, port, iter) {
+		enum dlb_qid_map_state state = DLB_QUEUE_MAPPED;
+
+		if (!dlb_port_find_slot_queue(port, state, queue, &slot))
+			continue;
+
+		if (port->enabled)
+			dlb_ldb_port_cq_enable(hw, port);
+	}
+}
+
+static int dlb_ldb_port_finish_map_qid_dynamic(struct dlb_hw *hw,
+					       struct dlb_domain *domain,
+					       struct dlb_ldb_port *port,
+					       struct dlb_ldb_queue *queue)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	union dlb_lsp_qid_ldb_infl_cnt r0;
+	enum dlb_qid_map_state state;
+	int slot, ret;
+	u8 prio;
+
+	r0.val = DLB_CSR_RD(hw, DLB_LSP_QID_LDB_INFL_CNT(queue->id.phys_id));
+
+	if (r0.field.count) {
+		DLB_HW_ERR(hw,
+			   "[%s()] Internal error: non-zero QID inflight count\n",
+			   __func__);
+		return -EFAULT;
+	}
+
+	/* For each port with a pending mapping to this queue, perform the
+	 * static mapping and set the corresponding has_work bits.
+	 */
+	state = DLB_QUEUE_MAP_IN_PROGRESS;
+	if (!dlb_port_find_slot_queue(port, state, queue, &slot))
+		return -EINVAL;
+
+	if (slot >= DLB_MAX_NUM_QIDS_PER_LDB_CQ) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: port slot tracking failed\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	prio = port->qid_map[slot].priority;
+
+	/* Update the CQ2QID, CQ2PRIOV, and QID2CQIDX registers, and
+	 * the port's qid_map state.
+	 */
+	ret = dlb_ldb_port_map_qid_static(hw, port, queue, prio);
+	if (ret)
+		return ret;
+
+	ret = dlb_ldb_port_set_has_work_bits(hw, port, queue, slot);
+	if (ret)
+		return ret;
+
+	/* Ensure IF_status(cq,qid) is 0 before enabling the port to
+	 * prevent spurious schedules from causing the queue's inflight
+	 * count to increase.
+	 */
+	dlb_ldb_port_clear_queue_if_status(hw, port, slot);
+
+	/* Reset the queue's inflight status */
+	DLB_DOM_LIST_FOR(domain->used_ldb_ports, port, iter) {
+		state = DLB_QUEUE_MAPPED;
+		if (!dlb_port_find_slot_queue(port, state, queue, &slot))
+			continue;
+
+		dlb_ldb_port_set_queue_if_status(hw, port, slot);
+	}
+
+	dlb_ldb_queue_set_inflight_limit(hw, queue);
+
+	/* Re-enable CQs mapped to this queue */
+	dlb_ldb_queue_enable_mapped_cqs(hw, domain, queue);
+
+	/* If this queue has other mappings pending, clear its inflight limit */
+	if (queue->num_pending_additions > 0)
+		dlb_ldb_queue_clear_inflight_limit(hw, queue);
+
+	return 0;
+}
+
+/**
+ * dlb_ldb_port_map_qid_dynamic() - perform a "dynamic" QID->CQ mapping
+ * @hw: dlb_hw handle for a particular device.
+ * @port: load-balanced port
+ * @queue: load-balanced queue
+ * @priority: queue servicing priority
+ *
+ * Returns 0 if the queue was mapped, 1 if the mapping is scheduled to occur
+ * at a later point, and <0 if an error occurred.
+ */
+static int dlb_ldb_port_map_qid_dynamic(struct dlb_hw *hw,
+					struct dlb_ldb_port *port,
+					struct dlb_ldb_queue *queue,
+					u8 priority)
+{
+	union dlb_lsp_qid_ldb_infl_cnt r0 = { {0} };
+	enum dlb_qid_map_state state;
+	struct dlb_domain *domain;
+	int slot, ret;
+
+	domain = dlb_get_domain_from_id(hw, port->domain_id.phys_id, false, 0);
+	if (!domain) {
+		DLB_HW_ERR(hw,
+			   "[%s()] Internal error: unable to find domain %d\n",
+			   __func__, port->domain_id.phys_id);
+		return -EFAULT;
+	}
+
+	/* Set the QID inflight limit to 0 to prevent further scheduling of the
+	 * queue.
+	 */
+	DLB_CSR_WR(hw, DLB_LSP_QID_LDB_INFL_LIM(queue->id.phys_id), 0);
+
+	if (!dlb_port_find_slot(port, DLB_QUEUE_UNMAPPED, &slot)) {
+		DLB_HW_ERR(hw,
+			   "Internal error: No available unmapped slots\n");
+		return -EFAULT;
+	}
+
+	if (slot >= DLB_MAX_NUM_QIDS_PER_LDB_CQ) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: port slot tracking failed\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	port->qid_map[slot].qid = queue->id.phys_id;
+	port->qid_map[slot].priority = priority;
+
+	state = DLB_QUEUE_MAP_IN_PROGRESS;
+	ret = dlb_port_slot_state_transition(hw, port, queue, slot, state);
+	if (ret)
+		return ret;
+
+	r0.val = DLB_CSR_RD(hw, DLB_LSP_QID_LDB_INFL_CNT(queue->id.phys_id));
+
+	if (r0.field.count) {
+		/* The queue is owed completions so it's not safe to map it
+		 * yet. Schedule a kernel thread to complete the mapping later,
+		 * once software has completed all the queue's inflight events.
+		 */
+		if (!os_worker_active(hw))
+			os_schedule_work(hw);
+
+		return 1;
+	}
+
+	/* Disable the affected CQ, and the CQs already mapped to the QID,
+	 * before reading the QID's inflight count a second time. There is an
+	 * unlikely race in which the QID may schedule one more QE after we
+	 * read an inflight count of 0, and disabling the CQs guarantees that
+	 * the race will not occur after a re-read of the inflight count
+	 * register.
+	 */
+	if (port->enabled)
+		dlb_ldb_port_cq_disable(hw, port);
+
+	dlb_ldb_queue_disable_mapped_cqs(hw, domain, queue);
+
+	r0.val = DLB_CSR_RD(hw, DLB_LSP_QID_LDB_INFL_CNT(queue->id.phys_id));
+
+	if (r0.field.count) {
+		if (port->enabled)
+			dlb_ldb_port_cq_enable(hw, port);
+
+		dlb_ldb_queue_enable_mapped_cqs(hw, domain, queue);
+
+		/* The queue is owed completions so it's not safe to map it
+		 * yet. Schedule a kernel thread to complete the mapping later,
+		 * once software has completed all the queue's inflight events.
+		 */
+		if (!os_worker_active(hw))
+			os_schedule_work(hw);
+
+		return 1;
+	}
+
+	return dlb_ldb_port_finish_map_qid_dynamic(hw, domain, port, queue);
+}
+
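dlb_ldb_port_map_qid_dynamic() returns 0 when the mapping completed, 1 when it was deferred to the worker, and a negative errno on failure. A hedged sketch of a caller distinguishing the three outcomes follows; the helper below is only a stand-in with the same return convention, not the driver function itself.

#include <errno.h>
#include <stdio.h>

/* Stand-in with the same tri-state convention: 0 done, 1 deferred, <0 error. */
static int map_qid(int queue_has_inflights, int slot_available)
{
	if (!slot_available)
		return -EFAULT;
	if (queue_has_inflights)
		return 1;	/* worker completes the map once inflights drain */
	return 0;		/* mapped immediately */
}

int main(void)
{
	int ret = map_qid(1, 1);

	if (ret < 0)
		fprintf(stderr, "map failed: %d\n", ret);
	else if (ret == 1)
		printf("map deferred until the queue's inflights drain\n");
	else
		printf("map completed immediately\n");
	return 0;
}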
+static int dlb_ldb_port_map_qid(struct dlb_hw *hw,
+				struct dlb_domain *domain,
+				struct dlb_ldb_port *port,
+				struct dlb_ldb_queue *queue,
+				u8 prio)
+{
+	if (domain->started)
+		return dlb_ldb_port_map_qid_dynamic(hw, port, queue, prio);
+	else
+		return dlb_ldb_port_map_qid_static(hw, port, queue, prio);
+}
+
+static int dlb_ldb_port_unmap_qid(struct dlb_hw *hw,
+				  struct dlb_ldb_port *port,
+				  struct dlb_ldb_queue *queue)
+{
+	enum dlb_qid_map_state mapped, in_progress, pending_map, unmapped;
+	union dlb_lsp_cq2priov r0;
+	union dlb_atm_pipe_qid_ldb_qid2cqidx r1;
+	union dlb_lsp_qid_ldb_qid2cqidx r2;
+	union dlb_lsp_qid_ldb_qid2cqidx2 r3;
+	u32 queue_id;
+	u32 port_id;
+	int i;
+
+	/* Find the queue's slot */
+	mapped = DLB_QUEUE_MAPPED;
+	in_progress = DLB_QUEUE_UNMAP_IN_PROGRESS;
+	pending_map = DLB_QUEUE_UNMAP_IN_PROGRESS_PENDING_MAP;
+
+	if (!dlb_port_find_slot_queue(port, mapped, queue, &i) &&
+	    !dlb_port_find_slot_queue(port, in_progress, queue, &i) &&
+	    !dlb_port_find_slot_queue(port, pending_map, queue, &i)) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: QID %d isn't mapped\n",
+			   __func__, __LINE__, queue->id.phys_id);
+		return -EFAULT;
+	}
+
+	if (i >= DLB_MAX_NUM_QIDS_PER_LDB_CQ) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: port slot tracking failed\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	port_id = port->id.phys_id;
+	queue_id = queue->id.phys_id;
+
+	/* Read-modify-write the priority and valid bit register */
+	r0.val = DLB_CSR_RD(hw, DLB_LSP_CQ2PRIOV(port_id));
+
+	r0.field.v &= ~(1 << i);
+
+	DLB_CSR_WR(hw, DLB_LSP_CQ2PRIOV(port_id), r0.val);
+
+	r1.val = DLB_CSR_RD(hw,
+			    DLB_ATM_PIPE_QID_LDB_QID2CQIDX(queue_id,
+							   port_id / 4));
+
+	r2.val = DLB_CSR_RD(hw,
+			    DLB_LSP_QID_LDB_QID2CQIDX(queue_id,
+						      port_id / 4));
+
+	r3.val = DLB_CSR_RD(hw,
+			    DLB_LSP_QID_LDB_QID2CQIDX2(queue_id,
+						       port_id / 4));
+
+	switch (port_id % 4) {
+	case 0:
+		r1.field.cq_p0 &= ~(1 << i);
+		r2.field.cq_p0 &= ~(1 << i);
+		r3.field.cq_p0 &= ~(1 << i);
+		break;
+
+	case 1:
+		r1.field.cq_p1 &= ~(1 << i);
+		r2.field.cq_p1 &= ~(1 << i);
+		r3.field.cq_p1 &= ~(1 << i);
+		break;
+
+	case 2:
+		r1.field.cq_p2 &= ~(1 << i);
+		r2.field.cq_p2 &= ~(1 << i);
+		r3.field.cq_p2 &= ~(1 << i);
+		break;
+
+	case 3:
+		r1.field.cq_p3 &= ~(1 << i);
+		r2.field.cq_p3 &= ~(1 << i);
+		r3.field.cq_p3 &= ~(1 << i);
+		break;
+	}
+
+	DLB_CSR_WR(hw,
+		   DLB_ATM_PIPE_QID_LDB_QID2CQIDX(queue_id, port_id / 4),
+		   r1.val);
+
+	DLB_CSR_WR(hw,
+		   DLB_LSP_QID_LDB_QID2CQIDX(queue_id, port_id / 4),
+		   r2.val);
+
+	DLB_CSR_WR(hw,
+		   DLB_LSP_QID_LDB_QID2CQIDX2(queue_id, port_id / 4),
+		   r3.val);
+
+	dlb_flush_csr(hw);
+
+	unmapped = DLB_QUEUE_UNMAPPED;
+
+	return dlb_port_slot_state_transition(hw, port, queue, i, unmapped);
+}
+
+static void
+dlb_log_create_sched_domain_args(struct dlb_hw *hw,
+				 struct dlb_create_sched_domain_args *args,
+				 bool vf_request,
+				 unsigned int vf_id)
+{
+	DLB_HW_INFO(hw, "DLB create sched domain arguments:\n");
+	if (vf_request)
+		DLB_HW_INFO(hw, "(Request from VF %d)\n", vf_id);
+	DLB_HW_INFO(hw, "\tNumber of LDB queues:        %d\n",
+		    args->num_ldb_queues);
+	DLB_HW_INFO(hw, "\tNumber of LDB ports:         %d\n",
+		    args->num_ldb_ports);
+	DLB_HW_INFO(hw, "\tNumber of DIR ports:         %d\n",
+		    args->num_dir_ports);
+	DLB_HW_INFO(hw, "\tNumber of ATM inflights:     %d\n",
+		    args->num_atomic_inflights);
+	DLB_HW_INFO(hw, "\tNumber of hist list entries: %d\n",
+		    args->num_hist_list_entries);
+	DLB_HW_INFO(hw, "\tNumber of LDB credits:       %d\n",
+		    args->num_ldb_credits);
+	DLB_HW_INFO(hw, "\tNumber of DIR credits:       %d\n",
+		    args->num_dir_credits);
+	DLB_HW_INFO(hw, "\tNumber of LDB credit pools:  %d\n",
+		    args->num_ldb_credit_pools);
+	DLB_HW_INFO(hw, "\tNumber of DIR credit pools:  %d\n",
+		    args->num_dir_credit_pools);
+}
+
+/**
+ * dlb_hw_create_sched_domain() - Allocate and initialize a DLB scheduling
+ *	domain and its resources.
+ * @hw:	  Contains the current state of the DLB hardware.
+ * @args: User-provided arguments.
+ * @resp: Response to user.
+ * @vf_request: Request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * Return: returns < 0 on error, 0 otherwise. If the driver is unable to
+ * satisfy a request, resp->status will be set accordingly.
+ */
+int dlb_hw_create_sched_domain(struct dlb_hw *hw,
+			       struct dlb_create_sched_domain_args *args,
+			       struct dlb_cmd_response *resp,
+			       bool vf_request,
+			       unsigned int vf_id)
+{
+	struct dlb_domain *domain;
+	struct dlb_function_resources *rsrcs;
+	int ret;
+
+	rsrcs = (vf_request) ? &hw->vf[vf_id] : &hw->pf;
+
+	dlb_log_create_sched_domain_args(hw, args, vf_request, vf_id);
+
+	/* Verify that hardware resources are available before attempting to
+	 * satisfy the request. This simplifies the error unwinding code.
+	 */
+	if (dlb_verify_create_sched_domain_args(hw, rsrcs, args, resp))
+		return -EINVAL;
+
+	domain = DLB_FUNC_LIST_HEAD(rsrcs->avail_domains, typeof(*domain));
+
+	/* Verification should catch this. */
+	if (!domain) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: no available domains\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	if (domain->configured) {
+		DLB_HW_ERR(hw,
+			   "[%s()] Internal error: avail_domains contains configured domains.\n",
+			   __func__);
+		return -EFAULT;
+	}
+
+	dlb_init_domain_rsrc_lists(domain);
+
+	/* Verification should catch this too. */
+	ret = dlb_domain_attach_resources(hw, rsrcs, domain, args, resp);
+	if (ret < 0) {
+		DLB_HW_ERR(hw,
+			   "[%s()] Internal error: failed to verify args.\n",
+			   __func__);
+
+		return -EFAULT;
+	}
+
+	dlb_list_del(&rsrcs->avail_domains, &domain->func_list);
+
+	dlb_list_add(&rsrcs->used_domains, &domain->func_list);
+
+	resp->id = (vf_request) ? domain->id.virt_id : domain->id.phys_id;
+	resp->status = 0;
+
+	return 0;
+}
+
+static void
+dlb_log_create_ldb_pool_args(struct dlb_hw *hw,
+			     u32 domain_id,
+			     struct dlb_create_ldb_pool_args *args,
+			     bool vf_request,
+			     unsigned int vf_id)
+{
+	DLB_HW_INFO(hw, "DLB create load-balanced credit pool arguments:\n");
+	if (vf_request)
+		DLB_HW_INFO(hw, "(Request from VF %d)\n", vf_id);
+	DLB_HW_INFO(hw, "\tDomain ID:             %d\n", domain_id);
+	DLB_HW_INFO(hw, "\tNumber of LDB credits: %d\n",
+		    args->num_ldb_credits);
+}
+
+/**
+ * dlb_hw_create_ldb_pool() - Allocate and initialize a DLB credit pool.
+ * @hw:	  Contains the current state of the DLB hardware.
+ * @args: User-provided arguments.
+ * @resp: Response to user.
+ *
+ * Return: returns < 0 on error, 0 otherwise. If the driver is unable to
+ * satisfy a request, resp->status will be set accordingly.
+ */
+int dlb_hw_create_ldb_pool(struct dlb_hw *hw,
+			   u32 domain_id,
+			   struct dlb_create_ldb_pool_args *args,
+			   struct dlb_cmd_response *resp,
+			   bool vf_request,
+			   unsigned int vf_id)
+{
+	struct dlb_credit_pool *pool;
+	struct dlb_domain *domain;
+
+	dlb_log_create_ldb_pool_args(hw, domain_id, args, vf_request, vf_id);
+
+	/* Verify that hardware resources are available before attempting to
+	 * satisfy the request. This simplifies the error unwinding code.
+	 */
+	if (dlb_verify_create_ldb_pool_args(hw,
+					    domain_id,
+					    args,
+					    resp,
+					    vf_request,
+					    vf_id))
+		return -EINVAL;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+	if (!domain) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: domain not found\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	pool = DLB_DOM_LIST_HEAD(domain->avail_ldb_credit_pools, typeof(*pool));
+
+	/* Verification should catch this. */
+	if (!pool) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: no available ldb credit pools\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	dlb_configure_ldb_credit_pool(hw, domain, args, pool);
+
+	/* Configuration succeeded, so move the resource from the 'avail' to
+	 * the 'used' list.
+	 */
+	dlb_list_del(&domain->avail_ldb_credit_pools, &pool->domain_list);
+
+	dlb_list_add(&domain->used_ldb_credit_pools, &pool->domain_list);
+
+	resp->status = 0;
+	resp->id = (vf_request) ? pool->id.virt_id : pool->id.phys_id;
+
+	return 0;
+}
+
+static void
+dlb_log_create_dir_pool_args(struct dlb_hw *hw,
+			     u32 domain_id,
+			     struct dlb_create_dir_pool_args *args,
+			     bool vf_request,
+			     unsigned int vf_id)
+{
+	DLB_HW_INFO(hw, "DLB create directed credit pool arguments:\n");
+	if (vf_request)
+		DLB_HW_INFO(hw, "(Request from VF %d)\n", vf_id);
+	DLB_HW_INFO(hw, "\tDomain ID:             %d\n", domain_id);
+	DLB_HW_INFO(hw, "\tNumber of DIR credits: %d\n",
+		    args->num_dir_credits);
+}
+
+/**
+ * dlb_hw_create_dir_pool() - Allocate and initialize a DLB credit pool.
+ * @hw:	  Contains the current state of the DLB hardware.
+ * @args: User-provided arguments.
+ * @resp: Response to user.
+ *
+ * Return: returns < 0 on error, 0 otherwise. If the driver is unable to
+ * satisfy a request, resp->status will be set accordingly.
+ */
+int dlb_hw_create_dir_pool(struct dlb_hw *hw,
+			   u32 domain_id,
+			   struct dlb_create_dir_pool_args *args,
+			   struct dlb_cmd_response *resp,
+			   bool vf_request,
+			   unsigned int vf_id)
+{
+	struct dlb_credit_pool *pool;
+	struct dlb_domain *domain;
+
+	dlb_log_create_dir_pool_args(hw, domain_id, args, vf_request, vf_id);
+
+	/* Verify that hardware resources are available before attempting to
+	 * satisfy the request. This simplifies the error unwinding code.
+	 */
+	/* At least one available pool */
+	if (dlb_verify_create_dir_pool_args(hw,
+					    domain_id,
+					    args,
+					    resp,
+					    vf_request,
+					    vf_id))
+		return -EINVAL;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+	if (!domain) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: domain not found\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	pool = DLB_DOM_LIST_HEAD(domain->avail_dir_credit_pools, typeof(*pool));
+
+	/* Verification should catch this. */
+	if (!pool) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: no available dir credit pools\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	dlb_configure_dir_credit_pool(hw, domain, args, pool);
+
+	/* Configuration succeeded, so move the resource from the 'avail' to
+	 * the 'used' list.
+	 */
+	dlb_list_del(&domain->avail_dir_credit_pools, &pool->domain_list);
+
+	dlb_list_add(&domain->used_dir_credit_pools, &pool->domain_list);
+
+	resp->status = 0;
+	resp->id = (vf_request) ? pool->id.virt_id : pool->id.phys_id;
+
+	return 0;
+}
+
+static void
+dlb_log_create_ldb_queue_args(struct dlb_hw *hw,
+			      u32 domain_id,
+			      struct dlb_create_ldb_queue_args *args,
+			      bool vf_request,
+			      unsigned int vf_id)
+{
+	DLB_HW_INFO(hw, "DLB create load-balanced queue arguments:\n");
+	if (vf_request)
+		DLB_HW_INFO(hw, "(Request from VF %d)\n", vf_id);
+	DLB_HW_INFO(hw, "\tDomain ID:                  %d\n",
+		    domain_id);
+	DLB_HW_INFO(hw, "\tNumber of sequence numbers: %d\n",
+		    args->num_sequence_numbers);
+	DLB_HW_INFO(hw, "\tNumber of QID inflights:    %d\n",
+		    args->num_qid_inflights);
+	DLB_HW_INFO(hw, "\tNumber of ATM inflights:    %d\n",
+		    args->num_atomic_inflights);
+}
+
+/**
+ * dlb_hw_create_ldb_queue() - Allocate and initialize a DLB LDB queue.
+ * @hw:	  Contains the current state of the DLB hardware.
+ * @args: User-provided arguments.
+ * @resp: Response to user.
+ *
+ * Return: returns < 0 on error, 0 otherwise. If the driver is unable to
+ * satisfy a request, resp->status will be set accordingly.
+ */
+int dlb_hw_create_ldb_queue(struct dlb_hw *hw,
+			    u32 domain_id,
+			    struct dlb_create_ldb_queue_args *args,
+			    struct dlb_cmd_response *resp,
+			    bool vf_request,
+			    unsigned int vf_id)
+{
+	struct dlb_ldb_queue *queue;
+	struct dlb_domain *domain;
+	int ret;
+
+	dlb_log_create_ldb_queue_args(hw, domain_id, args, vf_request, vf_id);
+
+	/* Verify that hardware resources are available before attempting to
+	 * satisfy the request. This simplifies the error unwinding code.
+	 */
+	/* At least one available queue */
+	if (dlb_verify_create_ldb_queue_args(hw,
+					     domain_id,
+					     args,
+					     resp,
+					     vf_request,
+					     vf_id))
+		return -EINVAL;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+	if (!domain) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: domain not found\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	queue = DLB_DOM_LIST_HEAD(domain->avail_ldb_queues, typeof(*queue));
+
+	/* Verification should catch this. */
+	if (!queue) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: no available ldb queues\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	ret = dlb_ldb_queue_attach_resources(hw, domain, queue, args);
+	if (ret < 0) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: failed to attach the ldb queue resources\n",
+			   __func__, __LINE__);
+		return ret;
+	}
+
+	dlb_configure_ldb_queue(hw, domain, queue, args, vf_request, vf_id);
+
+	queue->num_mappings = 0;
+
+	queue->configured = true;
+
+	/* Configuration succeeded, so move the resource from the 'avail' to
+	 * the 'used' list.
+	 */
+	dlb_list_del(&domain->avail_ldb_queues, &queue->domain_list);
+
+	dlb_list_add(&domain->used_ldb_queues, &queue->domain_list);
+
+	resp->status = 0;
+	resp->id = (vf_request) ? queue->id.virt_id : queue->id.phys_id;
+
+	return 0;
+}
+
+static void
+dlb_log_create_dir_queue_args(struct dlb_hw *hw,
+			      u32 domain_id,
+			      struct dlb_create_dir_queue_args *args,
+			      bool vf_request,
+			      unsigned int vf_id)
+{
+	DLB_HW_INFO(hw, "DLB create directed queue arguments:\n");
+	if (vf_request)
+		DLB_HW_INFO(hw, "(Request from VF %d)\n", vf_id);
+	DLB_HW_INFO(hw, "\tDomain ID: %d\n", domain_id);
+	DLB_HW_INFO(hw, "\tPort ID:   %d\n", args->port_id);
+}
+
+/**
+ * dlb_hw_create_dir_queue() - Allocate and initialize a DLB DIR queue.
+ * @hw:	  Contains the current state of the DLB hardware.
+ * @args: User-provided arguments.
+ * @resp: Response to user.
+ *
+ * Return: returns < 0 on error, 0 otherwise. If the driver is unable to
+ * satisfy a request, resp->status will be set accordingly.
+ */
+int dlb_hw_create_dir_queue(struct dlb_hw *hw,
+			    u32 domain_id,
+			    struct dlb_create_dir_queue_args *args,
+			    struct dlb_cmd_response *resp,
+			    bool vf_request,
+			    unsigned int vf_id)
+{
+	struct dlb_dir_pq_pair *queue;
+	struct dlb_domain *domain;
+
+	dlb_log_create_dir_queue_args(hw, domain_id, args, vf_request, vf_id);
+
+	/* Verify that hardware resources are available before attempting to
+	 * satisfy the request. This simplifies the error unwinding code.
+	 */
+	if (dlb_verify_create_dir_queue_args(hw,
+					     domain_id,
+					     args,
+					     resp,
+					     vf_request,
+					     vf_id))
+		return -EINVAL;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+	if (!domain) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: domain not found\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	if (args->port_id != -1)
+		queue = dlb_get_domain_used_dir_pq(args->port_id,
+						   vf_request,
+						   domain);
+	else
+		queue = DLB_DOM_LIST_HEAD(domain->avail_dir_pq_pairs,
+					  typeof(*queue));
+
+	/* Verification should catch this. */
+	if (!queue) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: no available dir queues\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	dlb_configure_dir_queue(hw, domain, queue, vf_request, vf_id);
+
+	/* Configuration succeeded, so move the resource from the 'avail' to
+	 * the 'used' list (if it's not already there).
+	 */
+	if (args->port_id == -1) {
+		dlb_list_del(&domain->avail_dir_pq_pairs, &queue->domain_list);
+
+		dlb_list_add(&domain->used_dir_pq_pairs, &queue->domain_list);
+	}
+
+	resp->status = 0;
+
+	resp->id = (vf_request) ? queue->id.virt_id : queue->id.phys_id;
+
+	return 0;
+}
+
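Directed queues and ports come as pre-paired resources: passing port_id == -1 takes a fresh pair from the avail list (and moves it to the used list), while a non-negative ID attaches the queue to the already-configured port of that pair. A small sketch of that sentinel convention, with invented names standing in for the domain lists:

#include <stdio.h>

struct pq_pair {
	int id;
	int in_use;
};

/* Invented stand-in: pick an existing pair by ID, or the first free one. */
static struct pq_pair *select_pair(struct pq_pair *pairs, int n, int port_id)
{
	int i;

	if (port_id != -1)
		return (port_id < n) ? &pairs[port_id] : NULL;

	for (i = 0; i < n; i++)
		if (!pairs[i].in_use)
			return &pairs[i];
	return NULL;
}

int main(void)
{
	struct pq_pair pairs[4] = { {0, 1}, {1, 0}, {2, 0}, {3, 0} };
	struct pq_pair *p = select_pair(pairs, 4, -1);

	if (p) {
		p->in_use = 1;	/* analogous to the avail -> used list move */
		printf("allocated directed port/queue pair %d\n", p->id);
	}
	return 0;
}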
+static void dlb_log_create_ldb_port_args(struct dlb_hw *hw,
+					 u32 domain_id,
+					 u64 pop_count_dma_base,
+					 u64 cq_dma_base,
+					 struct dlb_create_ldb_port_args *args,
+					 bool vf_request,
+					 unsigned int vf_id)
+{
+	DLB_HW_INFO(hw, "DLB create load-balanced port arguments:\n");
+	if (vf_request)
+		DLB_HW_INFO(hw, "(Request from VF %d)\n", vf_id);
+	DLB_HW_INFO(hw, "\tDomain ID:                 %d\n",
+		    domain_id);
+	DLB_HW_INFO(hw, "\tLDB credit pool ID:        %d\n",
+		    args->ldb_credit_pool_id);
+	DLB_HW_INFO(hw, "\tLDB credit high watermark: %d\n",
+		    args->ldb_credit_high_watermark);
+	DLB_HW_INFO(hw, "\tLDB credit low watermark:  %d\n",
+		    args->ldb_credit_low_watermark);
+	DLB_HW_INFO(hw, "\tLDB credit quantum:        %d\n",
+		    args->ldb_credit_quantum);
+	DLB_HW_INFO(hw, "\tDIR credit pool ID:        %d\n",
+		    args->dir_credit_pool_id);
+	DLB_HW_INFO(hw, "\tDIR credit high watermark: %d\n",
+		    args->dir_credit_high_watermark);
+	DLB_HW_INFO(hw, "\tDIR credit low watermark:  %d\n",
+		    args->dir_credit_low_watermark);
+	DLB_HW_INFO(hw, "\tDIR credit quantum:        %d\n",
+		    args->dir_credit_quantum);
+	DLB_HW_INFO(hw, "\tpop_count_address:         0x%"PRIx64"\n",
+		    pop_count_dma_base);
+	DLB_HW_INFO(hw, "\tCQ depth:                  %d\n",
+		    args->cq_depth);
+	DLB_HW_INFO(hw, "\tCQ hist list size:         %d\n",
+		    args->cq_history_list_size);
+	DLB_HW_INFO(hw, "\tCQ base address:           0x%"PRIx64"\n",
+		    cq_dma_base);
+}
+
+/**
+ * dlb_hw_create_ldb_port() - Allocate and initialize a load-balanced port and
+ *	its resources.
+ * @hw:	  Contains the current state of the DLB hardware.
+ * @args: User-provided arguments.
+ * @resp: Response to user.
+ *
+ * Return: returns < 0 on error, 0 otherwise. If the driver is unable to
+ * satisfy a request, resp->status will be set accordingly.
+ */
+int dlb_hw_create_ldb_port(struct dlb_hw *hw,
+			   u32 domain_id,
+			   struct dlb_create_ldb_port_args *args,
+			   u64 pop_count_dma_base,
+			   u64 cq_dma_base,
+			   struct dlb_cmd_response *resp,
+			   bool vf_request,
+			   unsigned int vf_id)
+{
+	struct dlb_ldb_port *port;
+	struct dlb_domain *domain;
+	int ret;
+
+	dlb_log_create_ldb_port_args(hw,
+				     domain_id,
+				     pop_count_dma_base,
+				     cq_dma_base,
+				     args,
+				     vf_request,
+				     vf_id);
+
+	/* Verify that hardware resources are available before attempting to
+	 * satisfy the request. This simplifies the error unwinding code.
+	 */
+	if (dlb_verify_create_ldb_port_args(hw,
+					    domain_id,
+					    pop_count_dma_base,
+					    cq_dma_base,
+					    args,
+					    resp,
+					    vf_request,
+					    vf_id))
+		return -EINVAL;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+	if (!domain) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: domain not found\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	port = DLB_DOM_LIST_HEAD(domain->avail_ldb_ports, typeof(*port));
+
+	/* Verification should catch this. */
+	if (!port) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: no available ldb ports\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	if (port->configured) {
+		DLB_HW_ERR(hw,
+			   "[%s()] Internal error: avail_ldb_ports contains configured ports.\n",
+			   __func__);
+		return -EFAULT;
+	}
+
+	ret = dlb_configure_ldb_port(hw,
+				     domain,
+				     port,
+				     pop_count_dma_base,
+				     cq_dma_base,
+				     args,
+				     vf_request,
+				     vf_id);
+	if (ret < 0)
+		return ret;
+
+	/* Configuration succeeded, so move the resource from the 'avail' to
+	 * the 'used' list.
+	 */
+	dlb_list_del(&domain->avail_ldb_ports, &port->domain_list);
+
+	dlb_list_add(&domain->used_ldb_ports, &port->domain_list);
+
+	resp->status = 0;
+	resp->id = (vf_request) ? port->id.virt_id : port->id.phys_id;
+
+	return 0;
+}
+
+static void dlb_log_create_dir_port_args(struct dlb_hw *hw,
+					 u32 domain_id,
+					 u64 pop_count_dma_base,
+					 u64 cq_dma_base,
+					 struct dlb_create_dir_port_args *args,
+					 bool vf_request,
+					 unsigned int vf_id)
+{
+	DLB_HW_INFO(hw, "DLB create directed port arguments:\n");
+	if (vf_request)
+		DLB_HW_INFO(hw, "(Request from VF %d)\n", vf_id);
+	DLB_HW_INFO(hw, "\tDomain ID:                 %d\n",
+		    domain_id);
+	DLB_HW_INFO(hw, "\tLDB credit pool ID:        %d\n",
+		    args->ldb_credit_pool_id);
+	DLB_HW_INFO(hw, "\tLDB credit high watermark: %d\n",
+		    args->ldb_credit_high_watermark);
+	DLB_HW_INFO(hw, "\tLDB credit low watermark:  %d\n",
+		    args->ldb_credit_low_watermark);
+	DLB_HW_INFO(hw, "\tLDB credit quantum:        %d\n",
+		    args->ldb_credit_quantum);
+	DLB_HW_INFO(hw, "\tDIR credit pool ID:        %d\n",
+		    args->dir_credit_pool_id);
+	DLB_HW_INFO(hw, "\tDIR credit high watermark: %d\n",
+		    args->dir_credit_high_watermark);
+	DLB_HW_INFO(hw, "\tDIR credit low watermark:  %d\n",
+		    args->dir_credit_low_watermark);
+	DLB_HW_INFO(hw, "\tDIR credit quantum:        %d\n",
+		    args->dir_credit_quantum);
+	DLB_HW_INFO(hw, "\tpop_count_address:         0x%"PRIx64"\n",
+		    pop_count_dma_base);
+	DLB_HW_INFO(hw, "\tCQ depth:                  %d\n",
+		    args->cq_depth);
+	DLB_HW_INFO(hw, "\tCQ base address:           0x%"PRIx64"\n",
+		    cq_dma_base);
+}
+
+/**
+ * dlb_hw_create_dir_port() - Allocate and initialize a DLB directed port and
+ *	queue. The port/queue pair has the same ID and name.
+ * @hw:	  Contains the current state of the DLB hardware.
+ * @args: User-provided arguments.
+ * @resp: Response to user.
+ *
+ * Return: returns < 0 on error, 0 otherwise. If the driver is unable to
+ * satisfy a request, resp->status will be set accordingly.
+ */
+int dlb_hw_create_dir_port(struct dlb_hw *hw,
+			   u32 domain_id,
+			   struct dlb_create_dir_port_args *args,
+			   u64 pop_count_dma_base,
+			   u64 cq_dma_base,
+			   struct dlb_cmd_response *resp,
+			   bool vf_request,
+			   unsigned int vf_id)
+{
+	struct dlb_dir_pq_pair *port;
+	struct dlb_domain *domain;
+	int ret;
+
+	dlb_log_create_dir_port_args(hw,
+				     domain_id,
+				     pop_count_dma_base,
+				     cq_dma_base,
+				     args,
+				     vf_request,
+				     vf_id);
+
+	/* Verify that hardware resources are available before attempting to
+	 * satisfy the request. This simplifies the error unwinding code.
+	 */
+	if (dlb_verify_create_dir_port_args(hw,
+					    domain_id,
+					    pop_count_dma_base,
+					    cq_dma_base,
+					    args,
+					    resp,
+					    vf_request,
+					    vf_id))
+		return -EINVAL;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+	if (!domain) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: domain not found\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	if (args->queue_id != -1)
+		port = dlb_get_domain_used_dir_pq(args->queue_id,
+						  vf_request,
+						  domain);
+	else
+		port = DLB_DOM_LIST_HEAD(domain->avail_dir_pq_pairs,
+					 typeof(*port));
+
+	/* Verification should catch this. */
+	if (!port) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: no available dir ports\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	ret = dlb_configure_dir_port(hw,
+				     domain,
+				     port,
+				     pop_count_dma_base,
+				     cq_dma_base,
+				     args,
+				     vf_request,
+				     vf_id);
+	if (ret < 0)
+		return ret;
+
+	/* Configuration succeeded, so move the resource from the 'avail' to
+	 * the 'used' list (if it's not already there).
+	 */
+	if (args->queue_id == -1) {
+		dlb_list_del(&domain->avail_dir_pq_pairs, &port->domain_list);
+
+		dlb_list_add(&domain->used_dir_pq_pairs, &port->domain_list);
+	}
+
+	resp->status = 0;
+	resp->id = (vf_request) ? port->id.virt_id : port->id.phys_id;
+
+	return 0;
+}
+
+static void dlb_log_start_domain(struct dlb_hw *hw,
+				 u32 domain_id,
+				 bool vf_request,
+				 unsigned int vf_id)
+{
+	DLB_HW_INFO(hw, "DLB start domain arguments:\n");
+	if (vf_request)
+		DLB_HW_INFO(hw, "(Request from VF %d)\n", vf_id);
+	DLB_HW_INFO(hw, "\tDomain ID: %d\n", domain_id);
+}
+
+/**
+ * dlb_hw_start_domain() - Lock the domain configuration
+ * @hw:	  Contains the current state of the DLB hardware.
+ * @args: User-provided arguments.
+ * @resp: Response to user.
+ *
+ * Return: returns < 0 on error, 0 otherwise. If the driver is unable to
+ * satisfy a request, resp->status will be set accordingly.
+ */
+int dlb_hw_start_domain(struct dlb_hw *hw,
+			u32 domain_id,
+			__attribute__((unused)) struct dlb_start_domain_args *arg,
+			struct dlb_cmd_response *resp,
+			bool vf_request,
+			unsigned int vf_id)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_dir_pq_pair *dir_queue;
+	struct dlb_ldb_queue *ldb_queue;
+	struct dlb_credit_pool *pool;
+	struct dlb_domain *domain;
+
+	dlb_log_start_domain(hw, domain_id, vf_request, vf_id);
+
+	if (dlb_verify_start_domain_args(hw,
+					 domain_id,
+					 resp,
+					 vf_request,
+					 vf_id))
+		return -EINVAL;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+	if (!domain) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: domain not found\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	/* Write the domain's pool credit counts, which have been updated
+	 * during port configuration. The sum of the pool credit count plus
+	 * each producer port's credit count must equal the pool's credit
+	 * allocation *before* traffic is sent.
+	 */
+	DLB_DOM_LIST_FOR(domain->used_ldb_credit_pools, pool, iter)
+		dlb_ldb_pool_write_credit_count_reg(hw, pool->id.phys_id);
+
+	DLB_DOM_LIST_FOR(domain->used_dir_credit_pools, pool, iter)
+		dlb_dir_pool_write_credit_count_reg(hw, pool->id.phys_id);
+
+	/* Enable load-balanced and directed queue write permissions for the
+	 * queues this domain owns. Without this, the DLB will drop all
+	 * incoming traffic to those queues.
+	 */
+	DLB_DOM_LIST_FOR(domain->used_ldb_queues, ldb_queue, iter) {
+		union dlb_sys_ldb_vasqid_v r0 = { {0} };
+		unsigned int offs;
+
+		r0.field.vasqid_v = 1;
+
+		offs = domain->id.phys_id * DLB_MAX_NUM_LDB_QUEUES +
+			ldb_queue->id.phys_id;
+
+		DLB_CSR_WR(hw, DLB_SYS_LDB_VASQID_V(offs), r0.val);
+	}
+
+	DLB_DOM_LIST_FOR(domain->used_dir_pq_pairs, dir_queue, iter) {
+		union dlb_sys_dir_vasqid_v r0 = { {0} };
+		unsigned int offs;
+
+		r0.field.vasqid_v = 1;
+
+		offs = domain->id.phys_id * DLB_MAX_NUM_DIR_PORTS +
+			dir_queue->id.phys_id;
+
+		DLB_CSR_WR(hw, DLB_SYS_DIR_VASQID_V(offs), r0.val);
+	}
+
+	dlb_flush_csr(hw);
+
+	domain->started = true;
+
+	resp->status = 0;
+
+	return 0;
+}
+
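dlb_hw_start_domain() grants write permission per (domain, queue) pair by indexing a flat array of VASQID_V registers as domain_id * max_queues + queue_id. Here is a standalone sketch of that index arithmetic; the limit below is a placeholder for illustration, not the device's real queue count.

#include <stdio.h>

#define MAX_LDB_QUEUES	128	/* placeholder limit for illustration */

/* One write-permission register per (domain, LDB queue) pair. */
static unsigned int vasqid_offset(unsigned int domain_id, unsigned int queue_id)
{
	return domain_id * MAX_LDB_QUEUES + queue_id;
}

int main(void)
{
	printf("VASQID_V offset for domain 3, queue 17: %u\n",
	       vasqid_offset(3, 17));
	return 0;
}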
+static void dlb_domain_finish_unmap_port_slot(struct dlb_hw *hw,
+					      struct dlb_domain *domain,
+					      struct dlb_ldb_port *port,
+					      int slot)
+{
+	enum dlb_qid_map_state state;
+	struct dlb_ldb_queue *queue;
+
+	queue = &hw->rsrcs.ldb_queues[port->qid_map[slot].qid];
+
+	state = port->qid_map[slot].state;
+
+	/* Update the QID2CQIDX and CQ2QID vectors */
+	dlb_ldb_port_unmap_qid(hw, port, queue);
+
+	/* Ensure the QID will not be serviced by this {CQ, slot} by clearing
+	 * the has_work bits
+	 */
+	dlb_ldb_port_clear_has_work_bits(hw, port, slot);
+
+	/* Reset the {CQ, slot} to its default state */
+	dlb_ldb_port_set_queue_if_status(hw, port, slot);
+
+	/* Re-enable the CQ if it wasn't manually disabled by the user */
+	if (port->enabled)
+		dlb_ldb_port_cq_enable(hw, port);
+
+	/* If there is a mapping that is pending this slot's removal, perform
+	 * the mapping now.
+	 */
+	if (state == DLB_QUEUE_UNMAP_IN_PROGRESS_PENDING_MAP) {
+		struct dlb_ldb_port_qid_map *map;
+		struct dlb_ldb_queue *map_queue;
+		u8 prio;
+
+		map = &port->qid_map[slot];
+
+		map->qid = map->pending_qid;
+		map->priority = map->pending_priority;
+
+		map_queue = &hw->rsrcs.ldb_queues[map->qid];
+		prio = map->priority;
+
+		dlb_ldb_port_map_qid(hw, domain, port, map_queue, prio);
+	}
+}
+
+static bool dlb_domain_finish_unmap_port(struct dlb_hw *hw,
+					 struct dlb_domain *domain,
+					 struct dlb_ldb_port *port)
+{
+	union dlb_lsp_cq_ldb_infl_cnt r0;
+	int i;
+
+	if (port->num_pending_removals == 0)
+		return false;
+
+	/* The unmap requires all the CQ's outstanding inflights to be
+	 * completed.
+	 */
+	r0.val = DLB_CSR_RD(hw, DLB_LSP_CQ_LDB_INFL_CNT(port->id.phys_id));
+	if (r0.field.count > 0)
+		return false;
+
+	for (i = 0; i < DLB_MAX_NUM_QIDS_PER_LDB_CQ; i++) {
+		struct dlb_ldb_port_qid_map *map;
+
+		map = &port->qid_map[i];
+
+		if (map->state != DLB_QUEUE_UNMAP_IN_PROGRESS &&
+		    map->state != DLB_QUEUE_UNMAP_IN_PROGRESS_PENDING_MAP)
+			continue;
+
+		dlb_domain_finish_unmap_port_slot(hw, domain, port, i);
+	}
+
+	return true;
+}
+
+static unsigned int
+dlb_domain_finish_unmap_qid_procedures(struct dlb_hw *hw,
+				       struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_ldb_port *port;
+
+	if (!domain->configured || domain->num_pending_removals == 0)
+		return 0;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_ports, port, iter)
+		dlb_domain_finish_unmap_port(hw, domain, port);
+
+	return domain->num_pending_removals;
+}
+
+unsigned int dlb_finish_unmap_qid_procedures(struct dlb_hw *hw)
+{
+	int i, num = 0;
+
+	/* Finish queue unmap jobs for any domain that needs it */
+	for (i = 0; i < DLB_MAX_NUM_DOMAINS; i++) {
+		struct dlb_domain *domain = &hw->domains[i];
+
+		num += dlb_domain_finish_unmap_qid_procedures(hw, domain);
+	}
+
+	return num;
+}
+
+static void dlb_domain_finish_map_port(struct dlb_hw *hw,
+				       struct dlb_domain *domain,
+				       struct dlb_ldb_port *port)
+{
+	int i;
+
+	for (i = 0; i < DLB_MAX_NUM_QIDS_PER_LDB_CQ; i++) {
+		union dlb_lsp_qid_ldb_infl_cnt r0;
+		struct dlb_ldb_queue *queue;
+		int qid;
+
+		if (port->qid_map[i].state != DLB_QUEUE_MAP_IN_PROGRESS)
+			continue;
+
+		qid = port->qid_map[i].qid;
+
+		queue = dlb_get_ldb_queue_from_id(hw, qid, false, 0);
+
+		if (!queue) {
+			DLB_HW_ERR(hw,
+				   "[%s()] Internal error: unable to find queue %d\n",
+				   __func__, qid);
+			continue;
+		}
+
+		r0.val = DLB_CSR_RD(hw, DLB_LSP_QID_LDB_INFL_CNT(qid));
+
+		if (r0.field.count)
+			continue;
+
+		/* Disable the affected CQ, and the CQs already mapped to the
+		 * QID, before reading the QID's inflight count a second time.
+		 * There is an unlikely race in which the QID may schedule one
+		 * more QE after we read an inflight count of 0, and disabling
+		 * the CQs guarantees that the race will not occur after a
+		 * re-read of the inflight count register.
+		 */
+		if (port->enabled)
+			dlb_ldb_port_cq_disable(hw, port);
+
+		dlb_ldb_queue_disable_mapped_cqs(hw, domain, queue);
+
+		r0.val = DLB_CSR_RD(hw, DLB_LSP_QID_LDB_INFL_CNT(qid));
+
+		if (r0.field.count) {
+			if (port->enabled)
+				dlb_ldb_port_cq_enable(hw, port);
+
+			dlb_ldb_queue_enable_mapped_cqs(hw, domain, queue);
+
+			continue;
+		}
+
+		dlb_ldb_port_finish_map_qid_dynamic(hw, domain, port, queue);
+	}
+}
+
+static unsigned int
+dlb_domain_finish_map_qid_procedures(struct dlb_hw *hw,
+				     struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_ldb_port *port;
+
+	if (!domain->configured || domain->num_pending_additions == 0)
+		return 0;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_ports, port, iter)
+		dlb_domain_finish_map_port(hw, domain, port);
+
+	return domain->num_pending_additions;
+}
+
+unsigned int dlb_finish_map_qid_procedures(struct dlb_hw *hw)
+{
+	int i, num = 0;
+
+	/* Finish queue map jobs for any domain that needs it */
+	for (i = 0; i < DLB_MAX_NUM_DOMAINS; i++) {
+		struct dlb_domain *domain = &hw->domains[i];
+
+		num += dlb_domain_finish_map_qid_procedures(hw, domain);
+	}
+
+	return num;
+}
+
+static void dlb_log_map_qid(struct dlb_hw *hw,
+			    u32 domain_id,
+			    struct dlb_map_qid_args *args,
+			    bool vf_request,
+			    unsigned int vf_id)
+{
+	DLB_HW_INFO(hw, "DLB map QID arguments:\n");
+	if (vf_request)
+		DLB_HW_INFO(hw, "(Request from VF %d)\n", vf_id);
+	DLB_HW_INFO(hw, "\tDomain ID: %d\n",
+		    domain_id);
+	DLB_HW_INFO(hw, "\tPort ID:   %d\n",
+		    args->port_id);
+	DLB_HW_INFO(hw, "\tQueue ID:  %d\n",
+		    args->qid);
+	DLB_HW_INFO(hw, "\tPriority:  %d\n",
+		    args->priority);
+}
+
+int dlb_hw_map_qid(struct dlb_hw *hw,
+		   u32 domain_id,
+		   struct dlb_map_qid_args *args,
+		   struct dlb_cmd_response *resp,
+		   bool vf_request,
+		   unsigned int vf_id)
+{
+	enum dlb_qid_map_state state;
+	struct dlb_ldb_queue *queue;
+	struct dlb_ldb_port *port;
+	struct dlb_domain *domain;
+	int ret, i, id;
+	u8 prio;
+
+	dlb_log_map_qid(hw, domain_id, args, vf_request, vf_id);
+
+	/* Verify that hardware resources are available before attempting to
+	 * satisfy the request. This simplifies the error unwinding code.
+	 */
+	if (dlb_verify_map_qid_args(hw,
+				    domain_id,
+				    args,
+				    resp,
+				    vf_request,
+				    vf_id))
+		return -EINVAL;
+
+	prio = args->priority;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+	if (!domain) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: domain not found\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	id = args->port_id;
+
+	port = dlb_get_domain_used_ldb_port(id, vf_request, domain);
+	if (!port) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: port not found\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	queue = dlb_get_domain_ldb_queue(args->qid, vf_request, domain);
+	if (!queue) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: queue not found\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	/* If there are any outstanding detach operations for this port,
+	 * attempt to complete them. This may be necessary to free up a QID
+	 * slot for this requested mapping.
+	 */
+	if (port->num_pending_removals)
+		dlb_domain_finish_unmap_port(hw, domain, port);
+
+	ret = dlb_verify_map_qid_slot_available(port, queue, resp);
+	if (ret)
+		return ret;
+
+	/* Hardware requires disabling the CQ before mapping QIDs. */
+	if (port->enabled)
+		dlb_ldb_port_cq_disable(hw, port);
+
+	/* If this is only a priority change, don't perform the full QID->CQ
+	 * mapping procedure
+	 */
+	state = DLB_QUEUE_MAPPED;
+	if (dlb_port_find_slot_queue(port, state, queue, &i)) {
+		if (i >= DLB_MAX_NUM_QIDS_PER_LDB_CQ) {
+			DLB_HW_ERR(hw,
+				   "[%s():%d] Internal error: port slot tracking failed\n",
+				   __func__, __LINE__);
+			return -EFAULT;
+		}
+
+		if (prio != port->qid_map[i].priority) {
+			dlb_ldb_port_change_qid_priority(hw, port, i, args);
+			DLB_HW_INFO(hw, "DLB map: priority change only\n");
+		}
+
+		state = DLB_QUEUE_MAPPED;
+		ret = dlb_port_slot_state_transition(hw, port, queue, i, state);
+		if (ret)
+			return ret;
+
+		goto map_qid_done;
+	}
+
+	state = DLB_QUEUE_UNMAP_IN_PROGRESS;
+	if (dlb_port_find_slot_queue(port, state, queue, &i)) {
+		if (i >= DLB_MAX_NUM_QIDS_PER_LDB_CQ) {
+			DLB_HW_ERR(hw,
+				   "[%s():%d] Internal error: port slot tracking failed\n",
+				   __func__, __LINE__);
+			return -EFAULT;
+		}
+
+		if (prio != port->qid_map[i].priority) {
+			dlb_ldb_port_change_qid_priority(hw, port, i, args);
+			DLB_HW_INFO(hw, "DLB map: priority change only\n");
+		}
+
+		state = DLB_QUEUE_MAPPED;
+		ret = dlb_port_slot_state_transition(hw, port, queue, i, state);
+		if (ret)
+			return ret;
+
+		goto map_qid_done;
+	}
+
+	/* If this is a priority change on an in-progress mapping, don't
+	 * perform the full QID->CQ mapping procedure.
+	 */
+	state = DLB_QUEUE_MAP_IN_PROGRESS;
+	if (dlb_port_find_slot_queue(port, state, queue, &i)) {
+		if (i >= DLB_MAX_NUM_QIDS_PER_LDB_CQ) {
+			DLB_HW_ERR(hw,
+				   "[%s():%d] Internal error: port slot tracking failed\n",
+				   __func__, __LINE__);
+			return -EFAULT;
+		}
+
+		port->qid_map[i].priority = prio;
+
+		DLB_HW_INFO(hw, "DLB map: priority change only\n");
+
+		goto map_qid_done;
+	}
+
+	/* If this is a priority change on a pending mapping, update the
+	 * pending priority
+	 */
+	if (dlb_port_find_slot_with_pending_map_queue(port, queue, &i)) {
+		if (i >= DLB_MAX_NUM_QIDS_PER_LDB_CQ) {
+			DLB_HW_ERR(hw,
+				   "[%s():%d] Internal error: port slot tracking failed\n",
+				   __func__, __LINE__);
+			return -EFAULT;
+		}
+
+		port->qid_map[i].pending_priority = prio;
+
+		DLB_HW_INFO(hw, "DLB map: priority change only\n");
+
+		goto map_qid_done;
+	}
+
+	/* If all the CQ's slots are in use, then there's an unmap in progress
+	 * (guaranteed by dlb_verify_map_qid_slot_available()), so add this
+	 * mapping to pending_map and return. When the removal is completed for
+	 * the slot's current occupant, this mapping will be performed.
+	 */
+	if (!dlb_port_find_slot(port, DLB_QUEUE_UNMAPPED, &i)) {
+		if (dlb_port_find_slot(port, DLB_QUEUE_UNMAP_IN_PROGRESS, &i)) {
+			enum dlb_qid_map_state state;
+
+			if (i >= DLB_MAX_NUM_QIDS_PER_LDB_CQ) {
+				DLB_HW_ERR(hw,
+					   "[%s():%d] Internal error: port slot tracking failed\n",
+					   __func__, __LINE__);
+				return -EFAULT;
+			}
+
+			port->qid_map[i].pending_qid = queue->id.phys_id;
+			port->qid_map[i].pending_priority = prio;
+
+			state = DLB_QUEUE_UNMAP_IN_PROGRESS_PENDING_MAP;
+
+			ret = dlb_port_slot_state_transition(hw, port, queue,
+							     i, state);
+			if (ret)
+				return ret;
+
+			DLB_HW_INFO(hw, "DLB map: map pending removal\n");
+
+			goto map_qid_done;
+		}
+	}
+
+	/* If the domain has started, a special "dynamic" CQ->queue mapping
+	 * procedure is required in order to safely update the CQ<->QID tables.
+	 * The "static" procedure cannot be used when traffic is flowing,
+	 * because the CQ<->QID tables cannot be updated atomically and the
+	 * scheduler won't see the new mapping unless the queue's if_status
+	 * changes, which isn't guaranteed.
+	 */
+	ret = dlb_ldb_port_map_qid(hw, domain, port, queue, prio);
+
+	/* If ret is less than zero, it's due to an internal error */
+	if (ret < 0)
+		return ret;
+
+map_qid_done:
+	if (port->enabled)
+		dlb_ldb_port_cq_enable(hw, port);
+
+	resp->status = 0;
+
+	return 0;
+}
+
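Before falling back to the full static/dynamic procedure, dlb_hw_map_qid() checks whether the slot already tracks this queue and, if so, only adjusts the (pending) priority. A condensed restatement of that decision order follows; the enum mirrors the patch's states, but the switch is an illustrative sketch, not driver code.

#include <stdio.h>

enum qid_map_state {
	QUEUE_UNMAPPED,
	QUEUE_MAPPED,
	QUEUE_MAP_IN_PROGRESS,
	QUEUE_UNMAP_IN_PROGRESS,
	QUEUE_UNMAP_IN_PROGRESS_PENDING_MAP,
};

/* Sketch of the action taken when a slot already tracks the requested QID. */
static const char *map_action(enum qid_map_state state)
{
	switch (state) {
	case QUEUE_MAPPED:
	case QUEUE_UNMAP_IN_PROGRESS:
		return "priority change only, slot transitions to MAPPED";
	case QUEUE_MAP_IN_PROGRESS:
		return "update the in-progress mapping's priority";
	case QUEUE_UNMAP_IN_PROGRESS_PENDING_MAP:
		return "update the pending mapping's priority";
	default:
		return "run the full static/dynamic mapping procedure";
	}
}

int main(void)
{
	printf("%s\n", map_action(QUEUE_MAP_IN_PROGRESS));
	return 0;
}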
+static void dlb_log_unmap_qid(struct dlb_hw *hw,
+			      u32 domain_id,
+			      struct dlb_unmap_qid_args *args,
+			      bool vf_request,
+			      unsigned int vf_id)
+{
+	DLB_HW_INFO(hw, "DLB unmap QID arguments:\n");
+	if (vf_request)
+		DLB_HW_INFO(hw, "(Request from VF %d)\n", vf_id);
+	DLB_HW_INFO(hw, "\tDomain ID: %d\n",
+		    domain_id);
+	DLB_HW_INFO(hw, "\tPort ID:   %d\n",
+		    args->port_id);
+	DLB_HW_INFO(hw, "\tQueue ID:  %d\n",
+		    args->qid);
+	if (args->qid < DLB_MAX_NUM_LDB_QUEUES)
+		DLB_HW_INFO(hw, "\tQueue's num mappings:  %d\n",
+			    hw->rsrcs.ldb_queues[args->qid].num_mappings);
+}
+
+int dlb_hw_unmap_qid(struct dlb_hw *hw,
+		     u32 domain_id,
+		     struct dlb_unmap_qid_args *args,
+		     struct dlb_cmd_response *resp,
+		     bool vf_request,
+		     unsigned int vf_id)
+{
+	enum dlb_qid_map_state state;
+	struct dlb_ldb_queue *queue;
+	struct dlb_ldb_port *port;
+	struct dlb_domain *domain;
+	bool unmap_complete;
+	int i, ret, id;
+
+	dlb_log_unmap_qid(hw, domain_id, args, vf_request, vf_id);
+
+	/* Verify that hardware resources are available before attempting to
+	 * satisfy the request. This simplifies the error unwinding code.
+	 */
+	if (dlb_verify_unmap_qid_args(hw,
+				      domain_id,
+				      args,
+				      resp,
+				      vf_request,
+				      vf_id))
+		return -EINVAL;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+	if (!domain) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: domain not found\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	id = args->port_id;
+
+	port = dlb_get_domain_used_ldb_port(id, vf_request, domain);
+	if (!port) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: port not found\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	queue = dlb_get_domain_ldb_queue(args->qid, vf_request, domain);
+	if (!queue) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: queue not found\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	/* If the queue hasn't been mapped yet, we need to update the slot's
+	 * state and re-enable the queue's inflights.
+	 */
+	state = DLB_QUEUE_MAP_IN_PROGRESS;
+	if (dlb_port_find_slot_queue(port, state, queue, &i)) {
+		if (i >= DLB_MAX_NUM_QIDS_PER_LDB_CQ) {
+			DLB_HW_ERR(hw,
+				   "[%s():%d] Internal error: port slot tracking failed\n",
+				   __func__, __LINE__);
+			return -EFAULT;
+		}
+
+		/* Since the in-progress map was aborted, re-enable the QID's
+		 * inflights.
+		 */
+		if (queue->num_pending_additions == 0)
+			dlb_ldb_queue_set_inflight_limit(hw, queue);
+
+		state = DLB_QUEUE_UNMAPPED;
+		ret = dlb_port_slot_state_transition(hw, port, queue, i, state);
+		if (ret)
+			return ret;
+
+		goto unmap_qid_done;
+	}
+
+	/* If the queue mapping is on hold pending an unmap, we simply need to
+	 * update the slot's state.
+	 */
+	if (dlb_port_find_slot_with_pending_map_queue(port, queue, &i)) {
+		if (i >= DLB_MAX_NUM_QIDS_PER_LDB_CQ) {
+			DLB_HW_ERR(hw,
+				   "[%s():%d] Internal error: port slot tracking failed\n",
+				   __func__, __LINE__);
+			return -EFAULT;
+		}
+
+		state = DLB_QUEUE_UNMAP_IN_PROGRESS;
+		ret = dlb_port_slot_state_transition(hw, port, queue, i, state);
+		if (ret)
+			return ret;
+
+		goto unmap_qid_done;
+	}
+
+	state = DLB_QUEUE_MAPPED;
+	if (!dlb_port_find_slot_queue(port, state, queue, &i)) {
+		DLB_HW_ERR(hw,
+			   "[%s()] Internal error: no available CQ slots\n",
+			   __func__);
+		return -EFAULT;
+	}
+
+	if (i >= DLB_MAX_NUM_QIDS_PER_LDB_CQ) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: port slot tracking failed\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	/* QID->CQ mapping removal is an asynchronous procedure. It requires
+	 * stopping the DLB from scheduling this CQ, draining all inflights
+	 * from the CQ, then unmapping the queue from the CQ. This function
+	 * simply marks the port as needing the queue unmapped, and (if
+	 * necessary) starts the unmapping worker thread.
+	 */
+	dlb_ldb_port_cq_disable(hw, port);
+
+	state = DLB_QUEUE_UNMAP_IN_PROGRESS;
+	ret = dlb_port_slot_state_transition(hw, port, queue, i, state);
+	if (ret)
+		return ret;
+
+	/* Attempt to finish the unmapping now, in case the port has no
+	 * outstanding inflights. If that's not the case, this will fail and
+	 * the unmapping will be completed at a later time.
+	 */
+	unmap_complete = dlb_domain_finish_unmap_port(hw, domain, port);
+
+	/* If the unmapping couldn't complete immediately, launch the worker
+	 * thread (if it isn't already launched) to finish it later.
+	 */
+	if (!unmap_complete && !os_worker_active(hw))
+		os_schedule_work(hw);
+
+unmap_qid_done:
+	resp->status = 0;
+
+	return 0;
+}
+
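Unmap completion is asynchronous: dlb_hw_unmap_qid() may return success while the CQ is still draining, and dlb_finish_unmap_qid_procedures() reports how many removals remain. A hedged sketch of the resulting poll-until-zero pattern, using an invented counter in place of the real hardware state:

#include <stdio.h>

/* Invented stand-in for the number of CQs still draining inflights. */
static unsigned int pending_removals = 2;

static unsigned int finish_unmap_procedures(void)
{
	if (pending_removals > 0)
		pending_removals--;	/* pretend one CQ finished draining */
	return pending_removals;
}

int main(void)
{
	/* Poll until no deferred unmaps remain. */
	while (finish_unmap_procedures() != 0)
		printf("QID unmap still pending...\n");

	printf("all deferred unmaps complete\n");
	return 0;
}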
+static void dlb_log_enable_port(struct dlb_hw *hw,
+				u32 domain_id,
+				u32 port_id,
+				bool vf_request,
+				unsigned int vf_id)
+{
+	DLB_HW_INFO(hw, "DLB enable port arguments:\n");
+	if (vf_request)
+		DLB_HW_INFO(hw, "(Request from VF %d)\n", vf_id);
+	DLB_HW_INFO(hw, "\tDomain ID: %d\n",
+		    domain_id);
+	DLB_HW_INFO(hw, "\tPort ID:   %d\n",
+		    port_id);
+}
+
+int dlb_hw_enable_ldb_port(struct dlb_hw *hw,
+			   u32 domain_id,
+			   struct dlb_enable_ldb_port_args *args,
+			   struct dlb_cmd_response *resp,
+			   bool vf_request,
+			   unsigned int vf_id)
+{
+	struct dlb_ldb_port *port;
+	struct dlb_domain *domain;
+	int id;
+
+	dlb_log_enable_port(hw, domain_id, args->port_id, vf_request, vf_id);
+
+	/* Verify that hardware resources are available before attempting to
+	 * satisfy the request. This simplifies the error unwinding code.
+	 */
+	if (dlb_verify_enable_ldb_port_args(hw,
+					    domain_id,
+					    args,
+					    resp,
+					    vf_request,
+					    vf_id))
+		return -EINVAL;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+	if (!domain) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: domain not found\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	id = args->port_id;
+
+	port = dlb_get_domain_used_ldb_port(id, vf_request, domain);
+	if (!port) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: port not found\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	/* Enable the port's CQ if it isn't already enabled. */
+	if (!port->enabled) {
+		dlb_ldb_port_cq_enable(hw, port);
+		port->enabled = true;
+
+		hw->pf.num_enabled_ldb_ports++;
+		dlb_update_ldb_arb_threshold(hw);
+	}
+
+	resp->status = 0;
+
+	return 0;
+}
+
+static void dlb_log_disable_port(struct dlb_hw *hw,
+				 u32 domain_id,
+				 u32 port_id,
+				 bool vf_request,
+				 unsigned int vf_id)
+{
+	DLB_HW_INFO(hw, "DLB disable port arguments:\n");
+	if (vf_request)
+		DLB_HW_INFO(hw, "(Request from VF %d)\n", vf_id);
+	DLB_HW_INFO(hw, "\tDomain ID: %d\n",
+		    domain_id);
+	DLB_HW_INFO(hw, "\tPort ID:   %d\n",
+		    port_id);
+}
+
+int dlb_hw_disable_ldb_port(struct dlb_hw *hw,
+			    u32 domain_id,
+			    struct dlb_disable_ldb_port_args *args,
+			    struct dlb_cmd_response *resp,
+			    bool vf_request,
+			    unsigned int vf_id)
+{
+	struct dlb_ldb_port *port;
+	struct dlb_domain *domain;
+	int id;
+
+	dlb_log_disable_port(hw, domain_id, args->port_id, vf_request, vf_id);
+
+	/* Verify that hardware resources are available before attempting to
+	 * satisfy the request. This simplifies the error unwinding code.
+	 */
+	if (dlb_verify_disable_ldb_port_args(hw,
+					     domain_id,
+					     args,
+					     resp,
+					     vf_request,
+					     vf_id))
+		return -EINVAL;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+	if (!domain) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: domain not found\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	id = args->port_id;
+
+	port = dlb_get_domain_used_ldb_port(id, vf_request, domain);
+	if (!port) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: port not found\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	/* If the port's CQ is currently enabled, disable it and account for
+	 * the newly disabled load-balanced port.
+	 */
+	if (port->enabled) {
+		dlb_ldb_port_cq_disable(hw, port);
+		port->enabled = false;
+
+		hw->pf.num_enabled_ldb_ports--;
+		dlb_update_ldb_arb_threshold(hw);
+	}
+
+	resp->status = 0;
+
+	return 0;
+}
+
+int dlb_hw_enable_dir_port(struct dlb_hw *hw,
+			   u32 domain_id,
+			   struct dlb_enable_dir_port_args *args,
+			   struct dlb_cmd_response *resp,
+			   bool vf_request,
+			   unsigned int vf_id)
+{
+	struct dlb_dir_pq_pair *port;
+	struct dlb_domain *domain;
+	int id;
+
+	dlb_log_enable_port(hw, domain_id, args->port_id, vf_request, vf_id);
+
+	/* Verify that hardware resources are available before attempting to
+	 * satisfy the request. This simplifies the error unwinding code.
+	 */
+	if (dlb_verify_enable_dir_port_args(hw,
+					    domain_id,
+					    args,
+					    resp,
+					    vf_request,
+					    vf_id))
+		return -EINVAL;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+	if (!domain) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: domain not found\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	id = args->port_id;
+
+	port = dlb_get_domain_used_dir_pq(id, vf_request, domain);
+	if (!port) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: port not found\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	/* If the port's CQ is currently disabled, enable it. */
+	if (!port->enabled) {
+		dlb_dir_port_cq_enable(hw, port);
+		port->enabled = true;
+	}
+
+	resp->status = 0;
+
+	return 0;
+}
+
+int dlb_hw_disable_dir_port(struct dlb_hw *hw,
+			    u32 domain_id,
+			    struct dlb_disable_dir_port_args *args,
+			    struct dlb_cmd_response *resp,
+			    bool vf_request,
+			    unsigned int vf_id)
+{
+	struct dlb_dir_pq_pair *port;
+	struct dlb_domain *domain;
+	int id;
+
+	dlb_log_disable_port(hw, domain_id, args->port_id, vf_request, vf_id);
+
+	/* Verify that hardware resources are available before attempting to
+	 * satisfy the request. This simplifies the error unwinding code.
+	 */
+	if (dlb_verify_disable_dir_port_args(hw,
+					     domain_id,
+					     args,
+					     resp,
+					     vf_request,
+					     vf_id))
+		return -EINVAL;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+	if (!domain) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: domain not found\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	id = args->port_id;
+
+	port = dlb_get_domain_used_dir_pq(id, vf_request, domain);
+	if (!port) {
+		DLB_HW_ERR(hw,
+			   "[%s():%d] Internal error: port not found\n",
+			   __func__, __LINE__);
+		return -EFAULT;
+	}
+
+	/* If the port's CQ is currently enabled, disable it. */
+	if (port->enabled) {
+		dlb_dir_port_cq_disable(hw, port);
+		port->enabled = false;
+	}
+
+	resp->status = 0;
+
+	return 0;
+}
+
+int dlb_notify_vf(struct dlb_hw *hw,
+		  unsigned int vf_id,
+		  enum dlb_mbox_vf_notification_type notification)
+{
+	struct dlb_mbox_vf_notification_cmd_req req;
+	int retry_cnt;
+
+	req.hdr.type = DLB_MBOX_VF_CMD_NOTIFICATION;
+	req.notification = notification;
+
+	if (dlb_pf_write_vf_mbox_req(hw, vf_id, &req, sizeof(req)))
+		return -1;
+
+	dlb_send_async_pf_to_vf_msg(hw, vf_id);
+
+	/* Timeout after 1 second of inactivity */
+	retry_cnt = 0;
+	while (!dlb_pf_to_vf_complete(hw, vf_id)) {
+		os_msleep(1);
+		if (++retry_cnt >= 1000) {
+			DLB_HW_ERR(hw,
+				   "PF driver timed out waiting for mbox response\n");
+			return -1;
+		}
+	}
+
+	/* No response data expected for notifications. */
+
+	return 0;
+}
+
+int dlb_vf_in_use(struct dlb_hw *hw, unsigned int vf_id)
+{
+	struct dlb_mbox_vf_in_use_cmd_resp resp;
+	struct dlb_mbox_vf_in_use_cmd_req req;
+	int retry_cnt;
+
+	req.hdr.type = DLB_MBOX_VF_CMD_IN_USE;
+
+	if (dlb_pf_write_vf_mbox_req(hw, vf_id, &req, sizeof(req)))
+		return -1;
+
+	dlb_send_async_pf_to_vf_msg(hw, vf_id);
+
+	/* Timeout after 1 second of inactivity */
+	retry_cnt = 0;
+	while (!dlb_pf_to_vf_complete(hw, vf_id)) {
+		os_msleep(1);
+		if (++retry_cnt >= 1000) {
+			DLB_HW_ERR(hw,
+				   "PF driver timed out waiting for mbox response\n");
+			return -1;
+		}
+	}
+
+	if (dlb_pf_read_vf_mbox_resp(hw, vf_id, &resp, sizeof(resp)))
+		return -1;
+
+	if (resp.hdr.status != DLB_MBOX_ST_SUCCESS) {
+		DLB_HW_ERR(hw,
+			   "[%s()]: failed with mailbox error: %s\n",
+			   __func__,
+			   DLB_MBOX_ST_STRING(&resp));
+
+		return -1;
+	}
+
+	return resp.in_use;
+}
+
+static int dlb_vf_domain_alert(struct dlb_hw *hw,
+			       unsigned int vf_id,
+			       u32 domain_id,
+			       u32 alert_id,
+			       u32 aux_alert_data)
+{
+	struct dlb_mbox_vf_alert_cmd_req req;
+	int retry_cnt;
+
+	req.hdr.type = DLB_MBOX_VF_CMD_DOMAIN_ALERT;
+	req.domain_id = domain_id;
+	req.alert_id = alert_id;
+	req.aux_alert_data = aux_alert_data;
+
+	if (dlb_pf_write_vf_mbox_req(hw, vf_id, &req, sizeof(req)))
+		return -1;
+
+	dlb_send_async_pf_to_vf_msg(hw, vf_id);
+
+	/* Timeout after 1 second of inactivity */
+	retry_cnt = 0;
+	while (!dlb_pf_to_vf_complete(hw, vf_id)) {
+		os_msleep(1);
+		if (++retry_cnt >= 1000) {
+			DLB_HW_ERR(hw,
+				   "PF driver timed out waiting for mbox response\n");
+			return -1;
+		}
+	}
+
+	/* No response data expected for alarm notifications. */
+
+	return 0;
+}
+
+void dlb_set_msix_mode(struct dlb_hw *hw, int mode)
+{
+	union dlb_sys_msix_mode r0 = { {0} };
+
+	r0.field.mode = mode;
+
+	DLB_CSR_WR(hw, DLB_SYS_MSIX_MODE, r0.val);
+}
+
+int dlb_configure_ldb_cq_interrupt(struct dlb_hw *hw,
+				   int port_id,
+				   int vector,
+				   int mode,
+				   unsigned int vf,
+				   unsigned int owner_vf,
+				   u16 threshold)
+{
+	union dlb_chp_ldb_cq_int_depth_thrsh r0 = { {0} };
+	union dlb_chp_ldb_cq_int_enb r1 = { {0} };
+	union dlb_sys_ldb_cq_isr r2 = { {0} };
+	struct dlb_ldb_port *port;
+	bool vf_request;
+
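+	/* MSI mode interrupts are requested on behalf of a VF, so in that
+	 * case the port ID is looked up in the VF's ID space.
+	 */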
+	vf_request = (mode == DLB_CQ_ISR_MODE_MSI);
+
+	port = dlb_get_ldb_port_from_id(hw, port_id, vf_request, vf);
+	if (!port) {
+		DLB_HW_ERR(hw,
+			   "[%s()]: Internal error: failed to enable LDB CQ int\n\tport_id: %u, vf_req: %u, vf: %u\n",
+			   __func__, port_id, vf_request, vf);
+		return -EINVAL;
+	}
+
+	/* Trigger the interrupt when threshold or more QEs arrive in the CQ */
+	r0.field.depth_threshold = threshold - 1;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_CQ_INT_DEPTH_THRSH(port->id.phys_id),
+		   r0.val);
+
+	r1.field.en_depth = 1;
+
+	DLB_CSR_WR(hw, DLB_CHP_LDB_CQ_INT_ENB(port->id.phys_id), r1.val);
+
+	r2.field.vector = vector;
+	r2.field.vf = owner_vf;
+	r2.field.en_code = mode;
+
+	DLB_CSR_WR(hw, DLB_SYS_LDB_CQ_ISR(port->id.phys_id), r2.val);
+
+	return 0;
+}
+
+int dlb_configure_dir_cq_interrupt(struct dlb_hw *hw,
+				   int port_id,
+				   int vector,
+				   int mode,
+				   unsigned int vf,
+				   unsigned int owner_vf,
+				   u16 threshold)
+{
+	union dlb_chp_dir_cq_int_depth_thrsh r0 = { {0} };
+	union dlb_chp_dir_cq_int_enb r1 = { {0} };
+	union dlb_sys_dir_cq_isr r2 = { {0} };
+	struct dlb_dir_pq_pair *port;
+	bool vf_request;
+
+	vf_request = (mode == DLB_CQ_ISR_MODE_MSI);
+
+	port = dlb_get_dir_pq_from_id(hw, port_id, vf_request, vf);
+	if (!port) {
+		DLB_HW_ERR(hw,
+			   "[%s()]: Internal error: failed to enable DIR CQ int\n\tport_id: %u, vf_req: %u, vf: %u\n",
+			   __func__, port_id, vf_request, vf);
+		return -EINVAL;
+	}
+
+	/* Trigger the interrupt when threshold or more QEs arrive in the CQ */
+	r0.field.depth_threshold = threshold - 1;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_CQ_INT_DEPTH_THRSH(port->id.phys_id),
+		   r0.val);
+
+	r1.field.en_depth = 1;
+
+	DLB_CSR_WR(hw, DLB_CHP_DIR_CQ_INT_ENB(port->id.phys_id), r1.val);
+
+	r2.field.vector = vector;
+	r2.field.vf = owner_vf;
+	r2.field.en_code = mode;
+
+	DLB_CSR_WR(hw, DLB_SYS_DIR_CQ_ISR(port->id.phys_id), r2.val);
+
+	return 0;
+}
+
+int dlb_arm_cq_interrupt(struct dlb_hw *hw,
+			 int port_id,
+			 bool is_ldb,
+			 bool vf_request,
+			 unsigned int vf_id)
+{
+	u32 val;
+	u32 reg;
+
+	if (vf_request && is_ldb) {
+		struct dlb_ldb_port *ldb_port;
+
+		ldb_port = dlb_get_ldb_port_from_id(hw, port_id, true, vf_id);
+
+		if (!ldb_port || !ldb_port->configured)
+			return -EINVAL;
+
+		port_id = ldb_port->id.phys_id;
+	} else if (vf_request && !is_ldb) {
+		struct dlb_dir_pq_pair *dir_port;
+
+		dir_port = dlb_get_dir_pq_from_id(hw, port_id, true, vf_id);
+
+		if (!dir_port || !dir_port->port_configured)
+			return -EINVAL;
+
+		port_id = dir_port->id.phys_id;
+	}
+
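+	/* Each ARMED register covers 32 CQs: the bit within the register is
+	 * port_id modulo 32, and the register itself is selected below from
+	 * the port type and the port ID's upper bits.
+	 */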
+	val = 1 << (port_id % 32);
+
+	if (is_ldb && port_id < 32)
+		reg = DLB_CHP_LDB_CQ_INTR_ARMED0;
+	else if (is_ldb && port_id < 64)
+		reg = DLB_CHP_LDB_CQ_INTR_ARMED1;
+	else if (!is_ldb && port_id < 32)
+		reg = DLB_CHP_DIR_CQ_INTR_ARMED0;
+	else if (!is_ldb && port_id < 64)
+		reg = DLB_CHP_DIR_CQ_INTR_ARMED1;
+	else if (!is_ldb && port_id < 96)
+		reg = DLB_CHP_DIR_CQ_INTR_ARMED2;
+	else
+		reg = DLB_CHP_DIR_CQ_INTR_ARMED3;
+
+	DLB_CSR_WR(hw, reg, val);
+
+	dlb_flush_csr(hw);
+
+	return 0;
+}
+
+void dlb_read_compressed_cq_intr_status(struct dlb_hw *hw,
+					u32 *ldb_interrupts,
+					u32 *dir_interrupts)
+{
+	/* Read every CQ's interrupt status */
+
+	ldb_interrupts[0] = DLB_CSR_RD(hw, DLB_SYS_LDB_CQ_31_0_OCC_INT_STS);
+	ldb_interrupts[1] = DLB_CSR_RD(hw, DLB_SYS_LDB_CQ_63_32_OCC_INT_STS);
+
+	dir_interrupts[0] = DLB_CSR_RD(hw, DLB_SYS_DIR_CQ_31_0_OCC_INT_STS);
+	dir_interrupts[1] = DLB_CSR_RD(hw, DLB_SYS_DIR_CQ_63_32_OCC_INT_STS);
+	dir_interrupts[2] = DLB_CSR_RD(hw, DLB_SYS_DIR_CQ_95_64_OCC_INT_STS);
+	dir_interrupts[3] = DLB_CSR_RD(hw, DLB_SYS_DIR_CQ_127_96_OCC_INT_STS);
+}
+
+static void dlb_ack_msix_interrupt(struct dlb_hw *hw, int vector)
+{
+	union dlb_sys_msix_ack r0 = { {0} };
+
+	switch (vector) {
+	case 0:
+		r0.field.msix_0_ack = 1;
+		break;
+	case 1:
+		r0.field.msix_1_ack = 1;
+		break;
+	case 2:
+		r0.field.msix_2_ack = 1;
+		break;
+	case 3:
+		r0.field.msix_3_ack = 1;
+		break;
+	case 4:
+		r0.field.msix_4_ack = 1;
+		break;
+	case 5:
+		r0.field.msix_5_ack = 1;
+		break;
+	case 6:
+		r0.field.msix_6_ack = 1;
+		break;
+	case 7:
+		r0.field.msix_7_ack = 1;
+		break;
+	case 8:
+		r0.field.msix_8_ack = 1;
+		/*
+		 * CSSY-1650
+		 * workaround h/w bug for lost MSI-X interrupts
+		 *
+		 * The recommended workaround for acknowledging
		 * vector 8 interrupts is:
+		 *   1: set   MSI-X mask
+		 *   2: set   MSIX_PASSTHROUGH
+		 *   3: clear MSIX_ACK
+		 *   4: clear MSIX_PASSTHROUGH
+		 *   5: clear MSI-X mask
+		 *
+		 * The MSIX-ACK (step 3) is cleared for all vectors
+		 * below. We handle steps 1 & 2 for vector 8 here.
+		 *
+		 * The bitfields for MSIX_ACK and MSIX_PASSTHRU are
+		 * defined the same, so we just use the MSIX_ACK
+		 * value when writing to PASSTHRU.
+		 */
+
+		/* set MSI-X mask and passthrough for vector 8 */
+		DLB_FUNC_WR(hw, DLB_MSIX_MEM_VECTOR_CTRL(8), 1);
+		DLB_CSR_WR(hw, DLB_SYS_MSIX_PASSTHRU, r0.val);
+		break;
+	}
+
+	/* clear MSIX_ACK (write one to clear) */
+	DLB_CSR_WR(hw, DLB_SYS_MSIX_ACK, r0.val);
+
+	if (vector == 8) {
+		/*
+		 * finish up steps 4 & 5 of the workaround -
+		 * clear passthrough and mask
+		 */
+		DLB_CSR_WR(hw, DLB_SYS_MSIX_PASSTHRU, 0);
+		DLB_FUNC_WR(hw, DLB_MSIX_MEM_VECTOR_CTRL(8), 0);
+	}
+
+	dlb_flush_csr(hw);
+}
+
+void dlb_ack_compressed_cq_intr(struct dlb_hw *hw,
+				u32 *ldb_interrupts,
+				u32 *dir_interrupts)
+{
+	/* Write back the status regs to ack the interrupts */
+	if (ldb_interrupts[0])
+		DLB_CSR_WR(hw,
+			   DLB_SYS_LDB_CQ_31_0_OCC_INT_STS,
+			   ldb_interrupts[0]);
+	if (ldb_interrupts[1])
+		DLB_CSR_WR(hw,
+			   DLB_SYS_LDB_CQ_63_32_OCC_INT_STS,
+			   ldb_interrupts[1]);
+
+	if (dir_interrupts[0])
+		DLB_CSR_WR(hw,
+			   DLB_SYS_DIR_CQ_31_0_OCC_INT_STS,
+			   dir_interrupts[0]);
+	if (dir_interrupts[1])
+		DLB_CSR_WR(hw,
+			   DLB_SYS_DIR_CQ_63_32_OCC_INT_STS,
+			   dir_interrupts[1]);
+	if (dir_interrupts[2])
+		DLB_CSR_WR(hw,
+			   DLB_SYS_DIR_CQ_95_64_OCC_INT_STS,
+			   dir_interrupts[2]);
+	if (dir_interrupts[3])
+		DLB_CSR_WR(hw,
+			   DLB_SYS_DIR_CQ_127_96_OCC_INT_STS,
+			   dir_interrupts[3]);
+
+	dlb_ack_msix_interrupt(hw, DLB_PF_COMPRESSED_MODE_CQ_VECTOR_ID);
+}
+
+u32 dlb_read_vf_intr_status(struct dlb_hw *hw)
+{
+	return DLB_FUNC_RD(hw, DLB_FUNC_VF_VF_MSI_ISR);
+}
+
+void dlb_ack_vf_intr_status(struct dlb_hw *hw, u32 interrupts)
+{
+	DLB_FUNC_WR(hw, DLB_FUNC_VF_VF_MSI_ISR, interrupts);
+}
+
+void dlb_ack_vf_msi_intr(struct dlb_hw *hw, u32 interrupts)
+{
+	DLB_FUNC_WR(hw, DLB_FUNC_VF_VF_MSI_ISR_PEND, interrupts);
+}
+
+void dlb_ack_pf_mbox_int(struct dlb_hw *hw)
+{
+	union dlb_func_vf_pf2vf_mailbox_isr r0;
+
+	r0.field.pf_isr = 1;
+
+	DLB_FUNC_WR(hw, DLB_FUNC_VF_PF2VF_MAILBOX_ISR, r0.val);
+}
+
+u32 dlb_read_vf_to_pf_int_bitvec(struct dlb_hw *hw)
+{
+	/* The PF has one VF->PF MBOX ISR register per VF space, but they all
+	 * alias to the same physical register.
+	 */
+	return DLB_FUNC_RD(hw, DLB_FUNC_PF_VF2PF_MAILBOX_ISR(0));
+}
+
+void dlb_ack_vf_mbox_int(struct dlb_hw *hw, u32 bitvec)
+{
+	/* The PF has one VF->PF MBOX ISR register per VF space, but they all
+	 * alias to the same physical register.
+	 */
+	DLB_FUNC_WR(hw, DLB_FUNC_PF_VF2PF_MAILBOX_ISR(0), bitvec);
+}
+
+u32 dlb_read_vf_flr_int_bitvec(struct dlb_hw *hw)
+{
+	/* The PF has one VF->PF FLR ISR register per VF space, but they all
+	 * alias to the same physical register.
+	 */
+	return DLB_FUNC_RD(hw, DLB_FUNC_PF_VF2PF_FLR_ISR(0));
+}
+
+void dlb_set_vf_reset_in_progress(struct dlb_hw *hw, int vf)
+{
+	u32 bitvec = DLB_FUNC_RD(hw, DLB_FUNC_PF_VF_RESET_IN_PROGRESS(0));
+
+	bitvec |= (1 << vf);
+
+	DLB_FUNC_WR(hw, DLB_FUNC_PF_VF_RESET_IN_PROGRESS(0), bitvec);
+}
+
+void dlb_clr_vf_reset_in_progress(struct dlb_hw *hw, int vf)
+{
+	u32 bitvec = DLB_FUNC_RD(hw, DLB_FUNC_PF_VF_RESET_IN_PROGRESS(0));
+
+	bitvec &= ~(1 << vf);
+
+	DLB_FUNC_WR(hw, DLB_FUNC_PF_VF_RESET_IN_PROGRESS(0), bitvec);
+}
+
+void dlb_ack_vf_flr_int(struct dlb_hw *hw, u32 bitvec, bool a_stepping)
+{
+	union dlb_sys_func_vf_bar_dsbl r0 = { {0} };
+	u32 clear;
+	int i;
+
+	if (!bitvec)
+		return;
+
+	/* Re-enable access to the VF BAR */
+	r0.field.func_vf_bar_dis = 0;
+	for (i = 0; i < DLB_MAX_NUM_VFS; i++) {
+		if (!(bitvec & (1 << i)))
+			continue;
+
+		DLB_CSR_WR(hw, DLB_SYS_FUNC_VF_BAR_DSBL(i), r0.val);
+	}
+
+	/* Notify the VF driver that the reset has completed. This register is
+	 * RW in A-stepping devices, WOCLR otherwise.
+	 */
+	if (a_stepping) {
+		clear = DLB_FUNC_RD(hw, DLB_FUNC_PF_VF_RESET_IN_PROGRESS(0));
+		clear &= ~bitvec;
+	} else {
+		clear = bitvec;
+	}
+
+	DLB_FUNC_WR(hw, DLB_FUNC_PF_VF_RESET_IN_PROGRESS(0), clear);
+
+	/* Mark the FLR ISR as complete */
+	DLB_FUNC_WR(hw, DLB_FUNC_PF_VF2PF_FLR_ISR(0), bitvec);
+}
+
+void dlb_ack_vf_to_pf_int(struct dlb_hw *hw,
+			  u32 mbox_bitvec,
+			  u32 flr_bitvec)
+{
+	int i;
+
+	dlb_ack_msix_interrupt(hw, DLB_INT_VF_TO_PF_MBOX);
+
+	for (i = 0; i < DLB_MAX_NUM_VFS; i++) {
+		union dlb_func_pf_vf2pf_isr_pend r0 = { {0} };
+
+		if (!((mbox_bitvec & (1 << i)) || (flr_bitvec & (1 << i))))
+			continue;
+
+		/* Unset the VF's ISR pending bit */
+		r0.field.isr_pend = 1;
+		DLB_FUNC_WR(hw, DLB_FUNC_PF_VF2PF_ISR_PEND(i), r0.val);
+	}
+}
+
+void dlb_enable_alarm_interrupts(struct dlb_hw *hw)
+{
+	union dlb_sys_ingress_alarm_enbl r0;
+
+	r0.val = DLB_CSR_RD(hw, DLB_SYS_INGRESS_ALARM_ENBL);
+
+	r0.field.illegal_hcw = 1;
+	r0.field.illegal_pp = 1;
+	r0.field.disabled_pp = 1;
+	r0.field.illegal_qid = 1;
+	r0.field.disabled_qid = 1;
+	r0.field.illegal_ldb_qid_cfg = 1;
+	r0.field.illegal_cqid = 1;
+
+	DLB_CSR_WR(hw, DLB_SYS_INGRESS_ALARM_ENBL, r0.val);
+}
+
+void dlb_disable_alarm_interrupts(struct dlb_hw *hw)
+{
+	union dlb_sys_ingress_alarm_enbl r0;
+
+	r0.val = DLB_CSR_RD(hw, DLB_SYS_INGRESS_ALARM_ENBL);
+
+	r0.field.illegal_hcw = 0;
+	r0.field.illegal_pp = 0;
+	r0.field.disabled_pp = 0;
+	r0.field.illegal_qid = 0;
+	r0.field.disabled_qid = 0;
+	r0.field.illegal_ldb_qid_cfg = 0;
+	r0.field.illegal_cqid = 0;
+
+	DLB_CSR_WR(hw, DLB_SYS_INGRESS_ALARM_ENBL, r0.val);
+}
+
+static void dlb_log_alarm_syndrome(struct dlb_hw *hw,
+				   const char *str,
+				   union dlb_sys_alarm_hw_synd r0)
+{
+	DLB_HW_ERR(hw, "%s:\n", str);
+	DLB_HW_ERR(hw, "\tsyndrome: 0x%x\n", r0.field.syndrome);
+	DLB_HW_ERR(hw, "\trtype:    0x%x\n", r0.field.rtype);
+	DLB_HW_ERR(hw, "\tfrom_dmv: 0x%x\n", r0.field.from_dmv);
+	DLB_HW_ERR(hw, "\tis_ldb:   0x%x\n", r0.field.is_ldb);
+	DLB_HW_ERR(hw, "\tcls:      0x%x\n", r0.field.cls);
+	DLB_HW_ERR(hw, "\taid:      0x%x\n", r0.field.aid);
+	DLB_HW_ERR(hw, "\tunit:     0x%x\n", r0.field.unit);
+	DLB_HW_ERR(hw, "\tsource:   0x%x\n", r0.field.source);
+	DLB_HW_ERR(hw, "\tmore:     0x%x\n", r0.field.more);
+	DLB_HW_ERR(hw, "\tvalid:    0x%x\n", r0.field.valid);
+}
+
+/* Note: this array's contents must match dlb_alert_id() */
+static const char dlb_alert_strings[NUM_DLB_DOMAIN_ALERTS][128] = {
+	[DLB_DOMAIN_ALERT_PP_OUT_OF_CREDITS] = "Insufficient credits",
+	[DLB_DOMAIN_ALERT_PP_ILLEGAL_ENQ] = "Illegal enqueue",
+	[DLB_DOMAIN_ALERT_PP_EXCESS_TOKEN_POPS] = "Excess token pops",
+	[DLB_DOMAIN_ALERT_ILLEGAL_HCW] = "Illegal HCW",
+	[DLB_DOMAIN_ALERT_ILLEGAL_QID] = "Illegal QID",
+	[DLB_DOMAIN_ALERT_DISABLED_QID] = "Disabled QID",
+};
+
+static void dlb_log_pf_vf_syndrome(struct dlb_hw *hw,
+				   const char *str,
+				   union dlb_sys_alarm_pf_synd0 r0,
+				   union dlb_sys_alarm_pf_synd1 r1,
+				   union dlb_sys_alarm_pf_synd2 r2,
+				   u32 alert_id)
+{
+	DLB_HW_ERR(hw, "%s:\n", str);
+	if (alert_id < NUM_DLB_DOMAIN_ALERTS)
+		DLB_HW_ERR(hw, "Alert: %s\n", dlb_alert_strings[alert_id]);
+	DLB_HW_ERR(hw, "\tsyndrome:     0x%x\n", r0.field.syndrome);
+	DLB_HW_ERR(hw, "\trtype:        0x%x\n", r0.field.rtype);
+	DLB_HW_ERR(hw, "\tfrom_dmv:     0x%x\n", r0.field.from_dmv);
+	DLB_HW_ERR(hw, "\tis_ldb:       0x%x\n", r0.field.is_ldb);
+	DLB_HW_ERR(hw, "\tcls:          0x%x\n", r0.field.cls);
+	DLB_HW_ERR(hw, "\taid:          0x%x\n", r0.field.aid);
+	DLB_HW_ERR(hw, "\tunit:         0x%x\n", r0.field.unit);
+	DLB_HW_ERR(hw, "\tsource:       0x%x\n", r0.field.source);
+	DLB_HW_ERR(hw, "\tmore:         0x%x\n", r0.field.more);
+	DLB_HW_ERR(hw, "\tvalid:        0x%x\n", r0.field.valid);
+	DLB_HW_ERR(hw, "\tdsi:          0x%x\n", r1.field.dsi);
+	DLB_HW_ERR(hw, "\tqid:          0x%x\n", r1.field.qid);
+	DLB_HW_ERR(hw, "\tqtype:        0x%x\n", r1.field.qtype);
+	DLB_HW_ERR(hw, "\tqpri:         0x%x\n", r1.field.qpri);
+	DLB_HW_ERR(hw, "\tmsg_type:     0x%x\n", r1.field.msg_type);
+	DLB_HW_ERR(hw, "\tlock_id:      0x%x\n", r2.field.lock_id);
+	DLB_HW_ERR(hw, "\tmeas:         0x%x\n", r2.field.meas);
+	DLB_HW_ERR(hw, "\tdebug:        0x%x\n", r2.field.debug);
+	DLB_HW_ERR(hw, "\tcq_pop:       0x%x\n", r2.field.cq_pop);
+	DLB_HW_ERR(hw, "\tqe_uhl:       0x%x\n", r2.field.qe_uhl);
+	DLB_HW_ERR(hw, "\tqe_orsp:      0x%x\n", r2.field.qe_orsp);
+	DLB_HW_ERR(hw, "\tqe_valid:     0x%x\n", r2.field.qe_valid);
+	DLB_HW_ERR(hw, "\tcq_int_rearm: 0x%x\n", r2.field.cq_int_rearm);
+	DLB_HW_ERR(hw, "\tdsi_error:    0x%x\n", r2.field.dsi_error);
+}
+
+static void dlb_clear_syndrome_register(struct dlb_hw *hw, u32 offset)
+{
+	union dlb_sys_alarm_hw_synd r0 = { {0} };
+
+	r0.field.valid = 1;
+	r0.field.more = 1;
+
+	DLB_CSR_WR(hw, offset, r0.val);
+}
+
+void dlb_process_alarm_interrupt(struct dlb_hw *hw)
+{
+	union dlb_sys_alarm_hw_synd r0;
+
+	r0.val = DLB_CSR_RD(hw, DLB_SYS_ALARM_HW_SYND);
+
+	dlb_log_alarm_syndrome(hw, "HW alarm syndrome", r0);
+
+	dlb_clear_syndrome_register(hw, DLB_SYS_ALARM_HW_SYND);
+
+	dlb_ack_msix_interrupt(hw, DLB_INT_ALARM);
+}
+
+static void dlb_process_ingress_error(struct dlb_hw *hw,
+				      union dlb_sys_alarm_pf_synd0 r0,
+				      u32 alert_id,
+				      bool vf_error,
+				      unsigned int vf_id)
+{
+	struct dlb_domain *domain;
+	bool is_ldb;
+	u8 port_id;
+	int ret;
+
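+	/* The low 7 bits of the syndrome encode the offending port ID. For
+	 * alarms reported by the system unit the is_ldb field is valid;
+	 * otherwise bit 7 of the syndrome distinguishes load-balanced from
+	 * directed ports.
+	 */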
+	port_id = r0.field.syndrome & 0x7F;
+	if (r0.field.source == DLB_ALARM_HW_SOURCE_SYS)
+		is_ldb = r0.field.is_ldb;
+	else
+		is_ldb = (r0.field.syndrome & 0x80) != 0;
+
+	/* Get the domain ID and, if it's a VF domain, the virtual port ID */
+	if (is_ldb) {
+		struct dlb_ldb_port *port;
+
+		port = dlb_get_ldb_port_from_id(hw, port_id, vf_error, vf_id);
+
+		if (!port) {
+			DLB_HW_ERR(hw,
+				   "[%s()]: Internal error: unable to find LDB port\n\tport: %u, vf_error: %u, vf_id: %u\n",
+				   __func__, port_id, vf_error, vf_id);
+			return;
+		}
+
+		domain = &hw->domains[port->domain_id.phys_id];
+	} else {
+		struct dlb_dir_pq_pair *port;
+
+		port = dlb_get_dir_pq_from_id(hw, port_id, vf_error, vf_id);
+
+		if (!port) {
+			DLB_HW_ERR(hw,
+				   "[%s()]: Internal error: unable to find DIR port\n\tport: %u, vf_error: %u, vf_id: %u\n",
+				   __func__, port_id, vf_error, vf_id);
+			return;
+		}
+
+		domain = &hw->domains[port->domain_id.phys_id];
+	}
+
+	if (vf_error)
+		ret = dlb_vf_domain_alert(hw,
+					  vf_id,
+					  domain->id.virt_id,
+					  alert_id,
+					  (is_ldb << 8) | port_id);
+	else
+		ret = os_notify_user_space(hw,
+					   domain->id.phys_id,
+					   alert_id,
+					   (is_ldb << 8) | port_id);
+
+	if (ret)
+		DLB_HW_ERR(hw,
+			   "[%s()] Internal error: failed to notify\n",
+			   __func__);
+}
+
+static u32 dlb_alert_id(union dlb_sys_alarm_pf_synd0 r0)
+{
+	if (r0.field.unit == DLB_ALARM_HW_UNIT_CHP &&
+	    r0.field.aid == DLB_ALARM_HW_CHP_AID_OUT_OF_CREDITS)
+		return DLB_DOMAIN_ALERT_PP_OUT_OF_CREDITS;
+	else if (r0.field.unit == DLB_ALARM_HW_UNIT_CHP &&
+		 r0.field.aid == DLB_ALARM_HW_CHP_AID_ILLEGAL_ENQ)
+		return DLB_DOMAIN_ALERT_PP_ILLEGAL_ENQ;
+	else if (r0.field.unit == DLB_ALARM_HW_UNIT_LSP &&
+		 r0.field.aid == DLB_ALARM_HW_LSP_AID_EXCESS_TOKEN_POPS)
+		return DLB_DOMAIN_ALERT_PP_EXCESS_TOKEN_POPS;
+	else if (r0.field.source == DLB_ALARM_HW_SOURCE_SYS &&
+		 r0.field.aid == DLB_ALARM_SYS_AID_ILLEGAL_HCW)
+		return DLB_DOMAIN_ALERT_ILLEGAL_HCW;
+	else if (r0.field.source == DLB_ALARM_HW_SOURCE_SYS &&
+		 r0.field.aid == DLB_ALARM_SYS_AID_ILLEGAL_QID)
+		return DLB_DOMAIN_ALERT_ILLEGAL_QID;
+	else if (r0.field.source == DLB_ALARM_HW_SOURCE_SYS &&
+		 r0.field.aid == DLB_ALARM_SYS_AID_DISABLED_QID)
+		return DLB_DOMAIN_ALERT_DISABLED_QID;
+	else
+		return NUM_DLB_DOMAIN_ALERTS;
+}
+
+void dlb_process_ingress_error_interrupt(struct dlb_hw *hw)
+{
+	union dlb_sys_alarm_pf_synd0 r0;
+	union dlb_sys_alarm_pf_synd1 r1;
+	union dlb_sys_alarm_pf_synd2 r2;
+	u32 alert_id;
+	int i;
+
+	r0.val = DLB_CSR_RD(hw, DLB_SYS_ALARM_PF_SYND0);
+
+	if (r0.field.valid) {
+		r1.val = DLB_CSR_RD(hw, DLB_SYS_ALARM_PF_SYND1);
+		r2.val = DLB_CSR_RD(hw, DLB_SYS_ALARM_PF_SYND2);
+
+		alert_id = dlb_alert_id(r0);
+
+		dlb_log_pf_vf_syndrome(hw,
+				       "PF Ingress error alarm",
+				       r0, r1, r2, alert_id);
+
+		dlb_clear_syndrome_register(hw, DLB_SYS_ALARM_PF_SYND0);
+
+		dlb_process_ingress_error(hw, r0, alert_id, false, 0);
+	}
+
+	for (i = 0; i < DLB_MAX_NUM_VFS; i++) {
+		r0.val = DLB_CSR_RD(hw, DLB_SYS_ALARM_VF_SYND0(i));
+
+		if (!r0.field.valid)
+			continue;
+
+		r1.val = DLB_CSR_RD(hw, DLB_SYS_ALARM_VF_SYND1(i));
+		r2.val = DLB_CSR_RD(hw, DLB_SYS_ALARM_VF_SYND2(i));
+
+		alert_id = dlb_alert_id(r0);
+
+		dlb_log_pf_vf_syndrome(hw,
+				       "VF Ingress error alarm",
+				       r0, r1, r2, alert_id);
+
+		dlb_clear_syndrome_register(hw,
+					    DLB_SYS_ALARM_VF_SYND0(i));
+
+		dlb_process_ingress_error(hw, r0, alert_id, true, i);
+	}
+
+	dlb_ack_msix_interrupt(hw, DLB_INT_INGRESS_ERROR);
+}
+
+int dlb_get_group_sequence_numbers(struct dlb_hw *hw, unsigned int group_id)
+{
+	if (group_id >= DLB_MAX_NUM_SEQUENCE_NUMBER_GROUPS)
+		return -EINVAL;
+
+	return hw->rsrcs.sn_groups[group_id].sequence_numbers_per_queue;
+}
+
+int dlb_get_group_sequence_number_occupancy(struct dlb_hw *hw,
+					    unsigned int group_id)
+{
+	if (group_id >= DLB_MAX_NUM_SEQUENCE_NUMBER_GROUPS)
+		return -EINVAL;
+
+	return dlb_sn_group_used_slots(&hw->rsrcs.sn_groups[group_id]);
+}
+
+static void dlb_log_set_group_sequence_numbers(struct dlb_hw *hw,
+					       unsigned int group_id,
+					       unsigned long val)
+{
+	DLB_HW_INFO(hw, "DLB set group sequence numbers:\n");
+	DLB_HW_INFO(hw, "\tGroup ID: %u\n", group_id);
+	DLB_HW_INFO(hw, "\tValue:    %lu\n", val);
+}
+
+int dlb_set_group_sequence_numbers(struct dlb_hw *hw,
+				   unsigned int group_id,
+				   unsigned long val)
+{
+	u32 valid_allocations[6] = {32, 64, 128, 256, 512, 1024};
+	union dlb_ro_pipe_grp_sn_mode r0 = { {0} };
+	struct dlb_sn_group *group;
+	int mode;
+
+	if (group_id >= DLB_MAX_NUM_SEQUENCE_NUMBER_GROUPS)
+		return -EINVAL;
+
+	group = &hw->rsrcs.sn_groups[group_id];
+
+	/* Once the first load-balanced queue using an SN group is configured,
+	 * the group cannot be changed.
+	 */
+	if (group->slot_use_bitmap != 0)
+		return -EPERM;
+
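+	/* The hardware sequence number mode is the index of the requested
+	 * allocation in valid_allocations[], so the search below also
+	 * determines the mode to program.
+	 */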
+	for (mode = 0; mode < DLB_MAX_NUM_SEQUENCE_NUMBER_MODES; mode++)
+		if (val == valid_allocations[mode])
+			break;
+
+	if (mode == DLB_MAX_NUM_SEQUENCE_NUMBER_MODES)
+		return -EINVAL;
+
+	group->mode = mode;
+	group->sequence_numbers_per_queue = val;
+
+	r0.field.sn_mode_0 = hw->rsrcs.sn_groups[0].mode;
+	r0.field.sn_mode_1 = hw->rsrcs.sn_groups[1].mode;
+	r0.field.sn_mode_2 = hw->rsrcs.sn_groups[2].mode;
+	r0.field.sn_mode_3 = hw->rsrcs.sn_groups[3].mode;
+
+	DLB_CSR_WR(hw, DLB_RO_PIPE_GRP_SN_MODE, r0.val);
+
+	dlb_log_set_group_sequence_numbers(hw, group_id, val);
+
+	return 0;
+}
+
+void dlb_disable_dp_vasr_feature(struct dlb_hw *hw)
+{
+	union dlb_dp_dir_csr_ctrl r0;
+
+	r0.val = DLB_CSR_RD(hw, DLB_DP_DIR_CSR_CTRL);
+
+	r0.field.cfg_vasr_dis = 1;
+
+	DLB_CSR_WR(hw, DLB_DP_DIR_CSR_CTRL, r0.val);
+}
+
+void dlb_enable_excess_tokens_alarm(struct dlb_hw *hw)
+{
+	union dlb_chp_cfg_chp_csr_ctrl r0;
+
+	r0.val = DLB_CSR_RD(hw, DLB_CHP_CFG_CHP_CSR_CTRL);
+
+	r0.val |= 1 << DLB_CHP_CFG_EXCESS_TOKENS_SHIFT;
+
+	DLB_CSR_WR(hw, DLB_CHP_CFG_CHP_CSR_CTRL, r0.val);
+}
+
+void dlb_disable_excess_tokens_alarm(struct dlb_hw *hw)
+{
+	union dlb_chp_cfg_chp_csr_ctrl r0;
+
+	r0.val = DLB_CSR_RD(hw, DLB_CHP_CFG_CHP_CSR_CTRL);
+
+	r0.val &= ~(1 << DLB_CHP_CFG_EXCESS_TOKENS_SHIFT);
+
+	DLB_CSR_WR(hw, DLB_CHP_CFG_CHP_CSR_CTRL, r0.val);
+}
+
+static int dlb_reset_hw_resource(struct dlb_hw *hw, int type, int id)
+{
+	union dlb_cfg_mstr_diag_reset_sts r0 = { {0} };
+	union dlb_cfg_mstr_bcast_reset_vf_start r1 = { {0} };
+	int i;
+
+	r1.field.vf_reset_start = 1;
+
+	r1.field.vf_reset_type = type;
+	r1.field.vf_reset_id = id;
+
+	DLB_CSR_WR(hw, DLB_CFG_MSTR_BCAST_RESET_VF_START, r1.val);
+
+	/* Wait for hardware to complete. This is a finite-time operation, but
+	 * set a loop bound just in case.
+	 */
+	for (i = 0; i < 1024 * 1024; i++) {
+		r0.val = DLB_CSR_RD(hw, DLB_CFG_MSTR_DIAG_RESET_STS);
+
+		if (r0.field.chp_vf_reset_done &&
+		    r0.field.rop_vf_reset_done &&
+		    r0.field.lsp_vf_reset_done &&
+		    r0.field.nalb_vf_reset_done &&
+		    r0.field.ap_vf_reset_done &&
+		    r0.field.dp_vf_reset_done &&
+		    r0.field.qed_vf_reset_done &&
+		    r0.field.dqed_vf_reset_done &&
+		    r0.field.aqed_vf_reset_done)
+			return 0;
+
+		os_udelay(1);
+	}
+
+	return -ETIMEDOUT;
+}
+
+static int dlb_domain_reset_hw_resources(struct dlb_hw *hw,
+					 struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_dir_pq_pair *dir_port;
+	struct dlb_ldb_queue *ldb_queue;
+	struct dlb_ldb_port *ldb_port;
+	struct dlb_credit_pool *pool;
+	int ret;
+
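+	/* Issue a hardware reset for each of the domain's credit pools,
+	 * queues, and CQs, waiting for each reset to complete before moving
+	 * on to the next resource.
+	 */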
+	DLB_DOM_LIST_FOR(domain->used_ldb_credit_pools, pool, iter) {
+		ret = dlb_reset_hw_resource(hw,
+					    VF_RST_TYPE_POOL_LDB,
+					    pool->id.phys_id);
+		if (ret)
+			return ret;
+	}
+
+	DLB_DOM_LIST_FOR(domain->used_dir_credit_pools, pool, iter) {
+		ret = dlb_reset_hw_resource(hw,
+					    VF_RST_TYPE_POOL_DIR,
+					    pool->id.phys_id);
+		if (ret)
+			return ret;
+	}
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_queues, ldb_queue, iter) {
+		ret = dlb_reset_hw_resource(hw,
+					    VF_RST_TYPE_QID_LDB,
+					    ldb_queue->id.phys_id);
+		if (ret)
+			return ret;
+	}
+
+	DLB_DOM_LIST_FOR(domain->used_dir_pq_pairs, dir_port, iter) {
+		ret = dlb_reset_hw_resource(hw,
+					    VF_RST_TYPE_QID_DIR,
+					    dir_port->id.phys_id);
+		if (ret)
+			return ret;
+	}
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_ports, ldb_port, iter) {
+		ret = dlb_reset_hw_resource(hw,
+					    VF_RST_TYPE_CQ_LDB,
+					    ldb_port->id.phys_id);
+		if (ret)
+			return ret;
+	}
+
+	DLB_DOM_LIST_FOR(domain->used_dir_pq_pairs, dir_port, iter) {
+		ret = dlb_reset_hw_resource(hw,
+					    VF_RST_TYPE_CQ_DIR,
+					    dir_port->id.phys_id);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static u32 dlb_ldb_cq_inflight_count(struct dlb_hw *hw,
+				     struct dlb_ldb_port *port)
+{
+	union dlb_lsp_cq_ldb_infl_cnt r0;
+
+	r0.val = DLB_CSR_RD(hw, DLB_LSP_CQ_LDB_INFL_CNT(port->id.phys_id));
+
+	return r0.field.count;
+}
+
+static u32 dlb_ldb_cq_token_count(struct dlb_hw *hw,
+				  struct dlb_ldb_port *port)
+{
+	union dlb_lsp_cq_ldb_tkn_cnt r0;
+
+	r0.val = DLB_CSR_RD(hw, DLB_LSP_CQ_LDB_TKN_CNT(port->id.phys_id));
+
+	return r0.field.token_count;
+}
+
+static int dlb_drain_ldb_cq(struct dlb_hw *hw, struct dlb_ldb_port *port)
+{
+	u32 infl_cnt, tkn_cnt;
+	unsigned int i;
+
+	infl_cnt = dlb_ldb_cq_inflight_count(hw, port);
+
+	/* Account for the initial token count, which is used to provide a CQ
+	 * with depth less than 8.
+	 */
+	tkn_cnt = dlb_ldb_cq_token_count(hw, port) - port->init_tkn_cnt;
+
+	if (infl_cnt || tkn_cnt) {
+		struct dlb_hcw hcw_mem[8], *hcw;
+		void  *pp_addr;
+
+		pp_addr = os_map_producer_port(hw, port->id.phys_id, true);
+
+		/* Point hcw to a 64B-aligned location */
+		hcw = (struct dlb_hcw *)((uintptr_t)&hcw_mem[4] & ~0x3F);
+
+		/* Program the first HCW for a completion and token return, and
+		 * the remaining HCWs as NOOPs.
+		 */
+
+		memset(hcw, 0, 4 * sizeof(*hcw));
+		hcw->qe_comp = (infl_cnt > 0);
+		hcw->cq_token = (tkn_cnt > 0);
+		hcw->lock_id = tkn_cnt - 1;
+
+		/* Return tokens in the first HCW */
+		os_enqueue_four_hcws(hw, hcw, pp_addr);
+
+		hcw->cq_token = 0;
+
+		/* Issue remaining completions (if any) */
+		for (i = 1; i < infl_cnt; i++)
+			os_enqueue_four_hcws(hw, hcw, pp_addr);
+
+		os_fence_hcw(hw, pp_addr);
+
+		os_unmap_producer_port(hw, pp_addr);
+	}
+
+	return 0;
+}
+
+static int dlb_domain_wait_for_ldb_cqs_to_empty(struct dlb_hw *hw,
+						struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_ldb_port *port;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_ports, port, iter) {
+		int i;
+
+		for (i = 0; i < DLB_MAX_CQ_COMP_CHECK_LOOPS; i++) {
+			if (dlb_ldb_cq_inflight_count(hw, port) == 0)
+				break;
+		}
+
+		if (i == DLB_MAX_CQ_COMP_CHECK_LOOPS) {
+			DLB_HW_ERR(hw,
+				   "[%s()] Internal error: failed to flush load-balanced port %d's completions.\n",
+				   __func__, port->id.phys_id);
+			return -EFAULT;
+		}
+	}
+
+	return 0;
+}
+
+static int dlb_domain_reset_software_state(struct dlb_hw *hw,
+					   struct dlb_domain *domain)
+{
+	struct dlb_ldb_queue *tmp_ldb_queue __attribute__((unused));
+	struct dlb_dir_pq_pair *tmp_dir_port __attribute__((unused));
+	struct dlb_ldb_port *tmp_ldb_port __attribute__((unused));
+	struct dlb_credit_pool *tmp_pool __attribute__((unused));
+	struct dlb_list_entry *iter1 __attribute__((unused));
+	struct dlb_list_entry *iter2 __attribute__((unused));
+	struct dlb_ldb_queue *ldb_queue;
+	struct dlb_dir_pq_pair *dir_port;
+	struct dlb_ldb_port *ldb_port;
+	struct dlb_credit_pool *pool;
+
+	struct dlb_function_resources *rsrcs;
+	struct dlb_list_head *list;
+	int ret;
+
+	rsrcs = domain->parent_func;
+
+	/* Move the domain's ldb queues to the function's avail list */
+	list = &domain->used_ldb_queues;
+	DLB_DOM_LIST_FOR_SAFE(*list, ldb_queue, tmp_ldb_queue, iter1, iter2) {
+		if (ldb_queue->sn_cfg_valid) {
+			struct dlb_sn_group *grp;
+
+			grp = &hw->rsrcs.sn_groups[ldb_queue->sn_group];
+
+			dlb_sn_group_free_slot(grp, ldb_queue->sn_slot);
+			ldb_queue->sn_cfg_valid = false;
+		}
+
+		ldb_queue->owned = false;
+		ldb_queue->num_mappings = 0;
+		ldb_queue->num_pending_additions = 0;
+
+		dlb_list_del(&domain->used_ldb_queues, &ldb_queue->domain_list);
+		dlb_list_add(&rsrcs->avail_ldb_queues, &ldb_queue->func_list);
+		rsrcs->num_avail_ldb_queues++;
+	}
+
+	list = &domain->avail_ldb_queues;
+	DLB_DOM_LIST_FOR_SAFE(*list, ldb_queue, tmp_ldb_queue, iter1, iter2) {
+		ldb_queue->owned = false;
+
+		dlb_list_del(&domain->avail_ldb_queues,
+			     &ldb_queue->domain_list);
+		dlb_list_add(&rsrcs->avail_ldb_queues,
+			     &ldb_queue->func_list);
+		rsrcs->num_avail_ldb_queues++;
+	}
+
+	/* Move the domain's ldb ports to the function's avail list */
+	list = &domain->used_ldb_ports;
+	DLB_DOM_LIST_FOR_SAFE(*list, ldb_port, tmp_ldb_port, iter1, iter2) {
+		int i;
+
+		ldb_port->owned = false;
+		ldb_port->configured = false;
+		ldb_port->num_pending_removals = 0;
+		ldb_port->num_mappings = 0;
+		for (i = 0; i < DLB_MAX_NUM_QIDS_PER_LDB_CQ; i++)
+			ldb_port->qid_map[i].state = DLB_QUEUE_UNMAPPED;
+
+		dlb_list_del(&domain->used_ldb_ports, &ldb_port->domain_list);
+		dlb_list_add(&rsrcs->avail_ldb_ports, &ldb_port->func_list);
+		rsrcs->num_avail_ldb_ports++;
+	}
+
+	list = &domain->avail_ldb_ports;
+	DLB_DOM_LIST_FOR_SAFE(*list, ldb_port, tmp_ldb_port, iter1, iter2) {
+		ldb_port->owned = false;
+
+		dlb_list_del(&domain->avail_ldb_ports, &ldb_port->domain_list);
+		dlb_list_add(&rsrcs->avail_ldb_ports, &ldb_port->func_list);
+		rsrcs->num_avail_ldb_ports++;
+	}
+
+	/* Move the domain's dir ports to the function's avail list */
+	list = &domain->used_dir_pq_pairs;
+	DLB_DOM_LIST_FOR_SAFE(*list, dir_port, tmp_dir_port, iter1, iter2) {
+		dir_port->owned = false;
+		dir_port->port_configured = false;
+
+		dlb_list_del(&domain->used_dir_pq_pairs,
+			     &dir_port->domain_list);
+
+		dlb_list_add(&rsrcs->avail_dir_pq_pairs,
+			     &dir_port->func_list);
+		rsrcs->num_avail_dir_pq_pairs++;
+	}
+
+	list = &domain->avail_dir_pq_pairs;
+	DLB_DOM_LIST_FOR_SAFE(*list, dir_port, tmp_dir_port, iter1, iter2) {
+		dir_port->owned = false;
+
+		dlb_list_del(&domain->avail_dir_pq_pairs,
+			     &dir_port->domain_list);
+
+		dlb_list_add(&rsrcs->avail_dir_pq_pairs,
+			     &dir_port->func_list);
+		rsrcs->num_avail_dir_pq_pairs++;
+	}
+
+	/* Return hist list entries to the function */
+	ret = dlb_bitmap_set_range(rsrcs->avail_hist_list_entries,
+				   domain->hist_list_entry_base,
+				   domain->total_hist_list_entries);
+	if (ret) {
+		DLB_HW_ERR(hw,
+			   "[%s()] Internal error: domain hist list base doesn't match the function's bitmap.\n",
+			   __func__);
+		return -EFAULT;
+	}
+
+	domain->total_hist_list_entries = 0;
+	domain->avail_hist_list_entries = 0;
+	domain->hist_list_entry_base = 0;
+	domain->hist_list_entry_offset = 0;
+
+	/* Return QED entries to the function */
+	ret = dlb_bitmap_set_range(rsrcs->avail_qed_freelist_entries,
+				   domain->qed_freelist.base,
+				   (domain->qed_freelist.bound -
+					domain->qed_freelist.base));
+	if (ret) {
+		DLB_HW_ERR(hw,
+			   "[%s()] Internal error: domain QED base doesn't match the function's bitmap.\n",
+			   __func__);
+		return -EFAULT;
+	}
+
+	domain->qed_freelist.base = 0;
+	domain->qed_freelist.bound = 0;
+	domain->qed_freelist.offset = 0;
+
+	/* Return DQED entries back to the function */
+	ret = dlb_bitmap_set_range(rsrcs->avail_dqed_freelist_entries,
+				   domain->dqed_freelist.base,
+				   (domain->dqed_freelist.bound -
+					domain->dqed_freelist.base));
+	if (ret) {
+		DLB_HW_ERR(hw,
+			   "[%s()] Internal error: domain DQED base doesn't match the function's bitmap.\n",
+			   __func__);
+		return -EFAULT;
+	}
+
+	domain->dqed_freelist.base = 0;
+	domain->dqed_freelist.bound = 0;
+	domain->dqed_freelist.offset = 0;
+
+	/* Return AQED entries back to the function */
+	ret = dlb_bitmap_set_range(rsrcs->avail_aqed_freelist_entries,
+				   domain->aqed_freelist.base,
+				   (domain->aqed_freelist.bound -
+					domain->aqed_freelist.base));
+	if (ret) {
+		DLB_HW_ERR(hw,
+			   "[%s()] Internal error: domain AQED base doesn't match the function's bitmap.\n",
+			   __func__);
+		return -EFAULT;
+	}
+
+	domain->aqed_freelist.base = 0;
+	domain->aqed_freelist.bound = 0;
+	domain->aqed_freelist.offset = 0;
+
+	/* Return ldb credit pools back to the function's avail list */
+	list = &domain->used_ldb_credit_pools;
+	DLB_DOM_LIST_FOR_SAFE(*list, pool, tmp_pool, iter1, iter2) {
+		pool->owned = false;
+		pool->configured = false;
+
+		dlb_list_del(&domain->used_ldb_credit_pools,
+			     &pool->domain_list);
+		dlb_list_add(&rsrcs->avail_ldb_credit_pools,
+			     &pool->func_list);
+		rsrcs->num_avail_ldb_credit_pools++;
+	}
+
+	list = &domain->avail_ldb_credit_pools;
+	DLB_DOM_LIST_FOR_SAFE(*list, pool, tmp_pool, iter1, iter2) {
+		pool->owned = false;
+
+		dlb_list_del(&domain->avail_ldb_credit_pools,
+			     &pool->domain_list);
+		dlb_list_add(&rsrcs->avail_ldb_credit_pools,
+			     &pool->func_list);
+		rsrcs->num_avail_ldb_credit_pools++;
+	}
+
+	/* Move dir credit pools back to the function */
+	list = &domain->used_dir_credit_pools;
+	DLB_DOM_LIST_FOR_SAFE(*list, pool, tmp_pool, iter1, iter2) {
+		pool->owned = false;
+		pool->configured = false;
+
+		dlb_list_del(&domain->used_dir_credit_pools,
+			     &pool->domain_list);
+		dlb_list_add(&rsrcs->avail_dir_credit_pools,
+			     &pool->func_list);
+		rsrcs->num_avail_dir_credit_pools++;
+	}
+
+	list = &domain->avail_dir_credit_pools;
+	DLB_DOM_LIST_FOR_SAFE(*list, pool, tmp_pool, iter1, iter2) {
+		pool->owned = false;
+
+		dlb_list_del(&domain->avail_dir_credit_pools,
+			     &pool->domain_list);
+		dlb_list_add(&rsrcs->avail_dir_credit_pools,
+			     &pool->func_list);
+		rsrcs->num_avail_dir_credit_pools++;
+	}
+
+	domain->num_pending_removals = 0;
+	domain->num_pending_additions = 0;
+	domain->configured = false;
+	domain->started = false;
+
+	/* Move the domain out of the used_domains list and back to the
+	 * function's avail_domains list.
+	 */
+	dlb_list_del(&rsrcs->used_domains, &domain->func_list);
+	dlb_list_add(&rsrcs->avail_domains, &domain->func_list);
+	rsrcs->num_avail_domains++;
+
+	return 0;
+}
+
+void dlb_resource_reset(struct dlb_hw *hw)
+{
+	struct dlb_domain *domain, *next __attribute__((unused));
+	struct dlb_list_entry *iter1 __attribute__((unused));
+	struct dlb_list_entry *iter2 __attribute__((unused));
+	int i;
+
+	for (i = 0; i < DLB_MAX_NUM_VFS; i++) {
+		DLB_FUNC_LIST_FOR_SAFE(hw->vf[i].used_domains, domain,
+				       next, iter1, iter2)
+			dlb_domain_reset_software_state(hw, domain);
+	}
+
+	DLB_FUNC_LIST_FOR_SAFE(hw->pf.used_domains, domain, next, iter1, iter2)
+		dlb_domain_reset_software_state(hw, domain);
+}
+
+static u32 dlb_dir_queue_depth(struct dlb_hw *hw,
+			       struct dlb_dir_pq_pair *queue)
+{
+	union dlb_lsp_qid_dir_enqueue_cnt r0;
+
+	r0.val = DLB_CSR_RD(hw, DLB_LSP_QID_DIR_ENQUEUE_CNT(queue->id.phys_id));
+
+	return r0.field.count;
+}
+
+static bool dlb_dir_queue_is_empty(struct dlb_hw *hw,
+				   struct dlb_dir_pq_pair *queue)
+{
+	return dlb_dir_queue_depth(hw, queue) == 0;
+}
+
+static void dlb_log_get_dir_queue_depth(struct dlb_hw *hw,
+					u32 domain_id,
+					u32 queue_id,
+					bool vf_request,
+					unsigned int vf_id)
+{
+	DLB_HW_INFO(hw, "DLB get directed queue depth:\n");
+	if (vf_request)
+		DLB_HW_INFO(hw, "(Request from VF %d)\n", vf_id);
+	DLB_HW_INFO(hw, "\tDomain ID: %d\n", domain_id);
+	DLB_HW_INFO(hw, "\tQueue ID: %d\n", queue_id);
+}
+
+int dlb_hw_get_dir_queue_depth(struct dlb_hw *hw,
+			       u32 domain_id,
+			       struct dlb_get_dir_queue_depth_args *args,
+			       struct dlb_cmd_response *resp,
+			       bool vf_request,
+			       unsigned int vf_id)
+{
+	struct dlb_dir_pq_pair *queue;
+	struct dlb_domain *domain;
+	int id;
+
+	id = domain_id;
+
+	dlb_log_get_dir_queue_depth(hw, domain_id, args->queue_id,
+				    vf_request, vf_id);
+
+	domain = dlb_get_domain_from_id(hw, id, vf_request, vf_id);
+	if (!domain) {
+		resp->status = DLB_ST_INVALID_DOMAIN_ID;
+		return -EINVAL;
+	}
+
+	id = args->queue_id;
+
+	queue = dlb_get_domain_used_dir_pq(id, vf_request, domain);
+	if (!queue) {
+		resp->status = DLB_ST_INVALID_QID;
+		return -EINVAL;
+	}
+
+	resp->id = dlb_dir_queue_depth(hw, queue);
+
+	return 0;
+}
+
+static void
+dlb_log_pending_port_unmaps_args(struct dlb_hw *hw,
+				 struct dlb_pending_port_unmaps_args *args,
+				 bool vf_request,
+				 unsigned int vf_id)
+{
+	DLB_HW_INFO(hw, "DLB pending port unmaps arguments:\n");
+	if (vf_request)
+		DLB_HW_INFO(hw, "(Request from VF %d)\n", vf_id);
+	DLB_HW_INFO(hw, "\tPort ID: %d\n", args->port_id);
+}
+
+int dlb_hw_pending_port_unmaps(struct dlb_hw *hw,
+			       u32 domain_id,
+			       struct dlb_pending_port_unmaps_args *args,
+			       struct dlb_cmd_response *resp,
+			       bool vf_request,
+			       unsigned int vf_id)
+{
+	struct dlb_domain *domain;
+	struct dlb_ldb_port *port;
+
+	dlb_log_pending_port_unmaps_args(hw, args, vf_request, vf_id);
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+
+	if (!domain) {
+		resp->status = DLB_ST_INVALID_DOMAIN_ID;
+		return -EINVAL;
+	}
+
+	port = dlb_get_domain_used_ldb_port(args->port_id, vf_request, domain);
+	if (!port || !port->configured) {
+		resp->status = DLB_ST_INVALID_PORT_ID;
+		return -EINVAL;
+	}
+
+	resp->id = port->num_pending_removals;
+
+	return 0;
+}
+
+/* Returns whether the queue is empty, including its inflight and replay
+ * counts.
+ */
+static bool dlb_ldb_queue_is_empty(struct dlb_hw *hw,
+				   struct dlb_ldb_queue *queue)
+{
+	union dlb_lsp_qid_ldb_replay_cnt r0;
+	union dlb_lsp_qid_aqed_active_cnt r1;
+	union dlb_lsp_qid_atq_enqueue_cnt r2;
+	union dlb_lsp_qid_ldb_enqueue_cnt r3;
+	union dlb_lsp_qid_ldb_infl_cnt r4;
+
+	r0.val = DLB_CSR_RD(hw,
+			    DLB_LSP_QID_LDB_REPLAY_CNT(queue->id.phys_id));
+	if (r0.val)
+		return false;
+
+	r1.val = DLB_CSR_RD(hw,
+			    DLB_LSP_QID_AQED_ACTIVE_CNT(queue->id.phys_id));
+	if (r1.val)
+		return false;
+
+	r2.val = DLB_CSR_RD(hw,
+			    DLB_LSP_QID_ATQ_ENQUEUE_CNT(queue->id.phys_id));
+	if (r2.val)
+		return false;
+
+	r3.val = DLB_CSR_RD(hw,
+			    DLB_LSP_QID_LDB_ENQUEUE_CNT(queue->id.phys_id));
+	if (r3.val)
+		return false;
+
+	r4.val = DLB_CSR_RD(hw,
+			    DLB_LSP_QID_LDB_INFL_CNT(queue->id.phys_id));
+	if (r4.val)
+		return false;
+
+	return true;
+}
+
+static void dlb_log_get_ldb_queue_depth(struct dlb_hw *hw,
+					u32 domain_id,
+					u32 queue_id,
+					bool vf_request,
+					unsigned int vf_id)
+{
+	DLB_HW_INFO(hw, "DLB get load-balanced queue depth:\n");
+	if (vf_request)
+		DLB_HW_INFO(hw, "(Request from VF %d)\n", vf_id);
+	DLB_HW_INFO(hw, "\tDomain ID: %d\n", domain_id);
+	DLB_HW_INFO(hw, "\tQueue ID: %d\n", queue_id);
+}
+
+int dlb_hw_get_ldb_queue_depth(struct dlb_hw *hw,
+			       u32 domain_id,
+			       struct dlb_get_ldb_queue_depth_args *args,
+			       struct dlb_cmd_response *resp,
+			       bool vf_req,
+			       unsigned int vf_id)
+{
+	union dlb_lsp_qid_aqed_active_cnt r0;
+	union dlb_lsp_qid_atq_enqueue_cnt r1;
+	union dlb_lsp_qid_ldb_enqueue_cnt r2;
+	struct dlb_ldb_queue *queue;
+	struct dlb_domain *domain;
+
+	dlb_log_get_ldb_queue_depth(hw, domain_id, args->queue_id,
+				    vf_req, vf_id);
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_req, vf_id);
+	if (!domain) {
+		resp->status = DLB_ST_INVALID_DOMAIN_ID;
+		return -EINVAL;
+	}
+
+	queue = dlb_get_domain_ldb_queue(args->queue_id, vf_req, domain);
+	if (!queue) {
+		resp->status = DLB_ST_INVALID_QID;
+		return -EINVAL;
+	}
+
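+	/* The reported depth is the sum of the queue's AQED active, ATQ
+	 * enqueue, and LDB enqueue counts.
+	 */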
+	r0.val = DLB_CSR_RD(hw,
+			    DLB_LSP_QID_AQED_ACTIVE_CNT(queue->id.phys_id));
+
+	r1.val = DLB_CSR_RD(hw,
+			    DLB_LSP_QID_ATQ_ENQUEUE_CNT(queue->id.phys_id));
+
+	r2.val = DLB_CSR_RD(hw,
+			    DLB_LSP_QID_LDB_ENQUEUE_CNT(queue->id.phys_id));
+
+	resp->id = r0.val + r1.val + r2.val;
+
+	return 0;
+}
+
+static u32 dlb_dir_cq_token_count(struct dlb_hw *hw,
+				  struct dlb_dir_pq_pair *port)
+{
+	union dlb_lsp_cq_dir_tkn_cnt r0;
+
+	r0.val = DLB_CSR_RD(hw, DLB_LSP_CQ_DIR_TKN_CNT(port->id.phys_id));
+
+	return r0.field.count;
+}
+
+static int dlb_domain_verify_reset_success(struct dlb_hw *hw,
+					   struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_dir_pq_pair *dir_port;
+	struct dlb_ldb_port *ldb_port;
+	struct dlb_credit_pool *pool;
+	struct dlb_ldb_queue *queue;
+
+	/* Confirm that all credits are returned to the domain's credit pools */
+	DLB_DOM_LIST_FOR(domain->used_dir_credit_pools, pool, iter) {
+		union dlb_chp_dqed_fl_pop_ptr r0;
+		union dlb_chp_dqed_fl_push_ptr r1;
+
+		r0.val = DLB_CSR_RD(hw,
+				    DLB_CHP_DQED_FL_POP_PTR(pool->id.phys_id));
+
+		r1.val = DLB_CSR_RD(hw,
+				    DLB_CHP_DQED_FL_PUSH_PTR(pool->id.phys_id));
+
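+		/* The pool's free list is full (all credits returned) only if
+		 * the pop and push pointers match and their generation bits
+		 * differ.
+		 */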
+		if (r0.field.pop_ptr != r1.field.push_ptr ||
+		    r0.field.generation == r1.field.generation) {
+			DLB_HW_ERR(hw,
+				   "[%s()] Internal error: failed to refill directed pool %d's credits.\n",
+				   __func__, pool->id.phys_id);
+			return -EFAULT;
+		}
+	}
+
+	/* Confirm that all the domain's queues' inflight counts and AQED
+	 * active counts are 0.
+	 */
+	DLB_DOM_LIST_FOR(domain->used_ldb_queues, queue, iter) {
+		if (!dlb_ldb_queue_is_empty(hw, queue)) {
+			DLB_HW_ERR(hw,
+				   "[%s()] Internal error: failed to empty ldb queue %d\n",
+				   __func__, queue->id.phys_id);
+			return -EFAULT;
+		}
+	}
+
+	/* Confirm that all the domain's CQs' inflight and token counts are 0. */
+	DLB_DOM_LIST_FOR(domain->used_ldb_ports, ldb_port, iter) {
+		if (dlb_ldb_cq_inflight_count(hw, ldb_port) ||
+		    dlb_ldb_cq_token_count(hw, ldb_port)) {
+			DLB_HW_ERR(hw,
+				   "[%s()] Internal error: failed to empty ldb port %d\n",
+				   __func__, ldb_port->id.phys_id);
+			return -EFAULT;
+		}
+	}
+
+	DLB_DOM_LIST_FOR(domain->used_dir_pq_pairs, dir_port, iter) {
+		if (!dlb_dir_queue_is_empty(hw, dir_port)) {
+			DLB_HW_ERR(hw,
+				   "[%s()] Internal error: failed to empty dir queue %d\n",
+				   __func__, dir_port->id.phys_id);
+			return -EFAULT;
+		}
+
+		if (dlb_dir_cq_token_count(hw, dir_port)) {
+			DLB_HW_ERR(hw,
+				   "[%s()] Internal error: failed to empty dir port %d\n",
+				   __func__, dir_port->id.phys_id);
+			return -EFAULT;
+		}
+	}
+
+	return 0;
+}
+
+static void __dlb_domain_reset_ldb_port_registers(struct dlb_hw *hw,
+						  struct dlb_ldb_port *port)
+{
+	union dlb_chp_ldb_pp_state_reset r0 = { {0} };
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_PP_CRD_REQ_STATE(port->id.phys_id),
+		   DLB_CHP_LDB_PP_CRD_REQ_STATE_RST);
+
+	/* Reset the port's load-balanced and directed credit state */
+	r0.field.dir_type = 0;
+	r0.field.reset_pp_state = 1;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_PP_STATE_RESET(port->id.phys_id),
+		   r0.val);
+
+	r0.field.dir_type = 1;
+	r0.field.reset_pp_state = 1;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_PP_STATE_RESET(port->id.phys_id),
+		   r0.val);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_PP_DIR_PUSH_PTR(port->id.phys_id),
+		   DLB_CHP_LDB_PP_DIR_PUSH_PTR_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_PP_LDB_PUSH_PTR(port->id.phys_id),
+		   DLB_CHP_LDB_PP_LDB_PUSH_PTR_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_PP_LDB_MIN_CRD_QNT(port->id.phys_id),
+		   DLB_CHP_LDB_PP_LDB_MIN_CRD_QNT_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_PP_LDB_CRD_LWM(port->id.phys_id),
+		   DLB_CHP_LDB_PP_LDB_CRD_LWM_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_PP_LDB_CRD_HWM(port->id.phys_id),
+		   DLB_CHP_LDB_PP_LDB_CRD_HWM_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_LDB_PP2POOL(port->id.phys_id),
+		   DLB_CHP_LDB_LDB_PP2POOL_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_PP_DIR_MIN_CRD_QNT(port->id.phys_id),
+		   DLB_CHP_LDB_PP_DIR_MIN_CRD_QNT_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_PP_DIR_CRD_LWM(port->id.phys_id),
+		   DLB_CHP_LDB_PP_DIR_CRD_LWM_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_PP_DIR_CRD_HWM(port->id.phys_id),
+		   DLB_CHP_LDB_PP_DIR_CRD_HWM_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_DIR_PP2POOL(port->id.phys_id),
+		   DLB_CHP_LDB_DIR_PP2POOL_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_LDB_PP2LDBPOOL(port->id.phys_id),
+		   DLB_SYS_LDB_PP2LDBPOOL_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_LDB_PP2DIRPOOL(port->id.phys_id),
+		   DLB_SYS_LDB_PP2DIRPOOL_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_HIST_LIST_LIM(port->id.phys_id),
+		   DLB_CHP_HIST_LIST_LIM_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_HIST_LIST_BASE(port->id.phys_id),
+		   DLB_CHP_HIST_LIST_BASE_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_HIST_LIST_POP_PTR(port->id.phys_id),
+		   DLB_CHP_HIST_LIST_POP_PTR_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_HIST_LIST_PUSH_PTR(port->id.phys_id),
+		   DLB_CHP_HIST_LIST_PUSH_PTR_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_CQ_WPTR(port->id.phys_id),
+		   DLB_CHP_LDB_CQ_WPTR_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_CQ_INT_DEPTH_THRSH(port->id.phys_id),
+		   DLB_CHP_LDB_CQ_INT_DEPTH_THRSH_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_CQ_TMR_THRESHOLD(port->id.phys_id),
+		   DLB_CHP_LDB_CQ_TMR_THRESHOLD_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_CQ_INT_ENB(port->id.phys_id),
+		   DLB_CHP_LDB_CQ_INT_ENB_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_LSP_CQ_LDB_INFL_LIM(port->id.phys_id),
+		   DLB_LSP_CQ_LDB_INFL_LIM_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_LSP_CQ2PRIOV(port->id.phys_id),
+		   DLB_LSP_CQ2PRIOV_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_LSP_CQ_LDB_TOT_SCH_CNT_CTRL(port->id.phys_id),
+		   DLB_LSP_CQ_LDB_TOT_SCH_CNT_CTRL_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_LSP_CQ_LDB_TKN_DEPTH_SEL(port->id.phys_id),
+		   DLB_LSP_CQ_LDB_TKN_DEPTH_SEL_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_LDB_CQ_TKN_DEPTH_SEL(port->id.phys_id),
+		   DLB_CHP_LDB_CQ_TKN_DEPTH_SEL_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_LSP_CQ_LDB_DSBL(port->id.phys_id),
+		   DLB_LSP_CQ_LDB_DSBL_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_LDB_CQ2VF_PF(port->id.phys_id),
+		   DLB_SYS_LDB_CQ2VF_PF_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_LDB_PP2VF_PF(port->id.phys_id),
+		   DLB_SYS_LDB_PP2VF_PF_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_LDB_CQ_ADDR_L(port->id.phys_id),
+		   DLB_SYS_LDB_CQ_ADDR_L_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_LDB_CQ_ADDR_U(port->id.phys_id),
+		   DLB_SYS_LDB_CQ_ADDR_U_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_LDB_PP_ADDR_L(port->id.phys_id),
+		   DLB_SYS_LDB_PP_ADDR_L_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_LDB_PP_ADDR_U(port->id.phys_id),
+		   DLB_SYS_LDB_PP_ADDR_U_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_LDB_PP_V(port->id.phys_id),
+		   DLB_SYS_LDB_PP_V_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_LDB_PP2VAS(port->id.phys_id),
+		   DLB_SYS_LDB_PP2VAS_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_LDB_CQ_ISR(port->id.phys_id),
+		   DLB_SYS_LDB_CQ_ISR_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_WBUF_LDB_FLAGS(port->id.phys_id),
+		   DLB_SYS_WBUF_LDB_FLAGS_RST);
+}
+
+static void dlb_domain_reset_ldb_port_registers(struct dlb_hw *hw,
+						struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_ldb_port *port;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_ports, port, iter)
+		__dlb_domain_reset_ldb_port_registers(hw, port);
+}
+
+static void __dlb_domain_reset_dir_port_registers(struct dlb_hw *hw,
+						  struct dlb_dir_pq_pair *port)
+{
+	union dlb_chp_dir_pp_state_reset r0 = { {0} };
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_PP_CRD_REQ_STATE(port->id.phys_id),
+		   DLB_CHP_DIR_PP_CRD_REQ_STATE_RST);
+
+	/* Reset the port's load-balanced and directed credit state */
+	r0.field.dir_type = 0;
+	r0.field.reset_pp_state = 1;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_PP_STATE_RESET(port->id.phys_id),
+		   r0.val);
+
+	r0.field.dir_type = 1;
+	r0.field.reset_pp_state = 1;
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_PP_STATE_RESET(port->id.phys_id),
+		   r0.val);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_PP_DIR_PUSH_PTR(port->id.phys_id),
+		   DLB_CHP_DIR_PP_DIR_PUSH_PTR_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_PP_LDB_PUSH_PTR(port->id.phys_id),
+		   DLB_CHP_DIR_PP_LDB_PUSH_PTR_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_PP_LDB_MIN_CRD_QNT(port->id.phys_id),
+		   DLB_CHP_DIR_PP_LDB_MIN_CRD_QNT_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_PP_LDB_CRD_LWM(port->id.phys_id),
+		   DLB_CHP_DIR_PP_LDB_CRD_LWM_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_PP_LDB_CRD_HWM(port->id.phys_id),
+		   DLB_CHP_DIR_PP_LDB_CRD_HWM_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_LDB_PP2POOL(port->id.phys_id),
+		   DLB_CHP_DIR_LDB_PP2POOL_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_PP_DIR_MIN_CRD_QNT(port->id.phys_id),
+		   DLB_CHP_DIR_PP_DIR_MIN_CRD_QNT_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_PP_DIR_CRD_LWM(port->id.phys_id),
+		   DLB_CHP_DIR_PP_DIR_CRD_LWM_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_PP_DIR_CRD_HWM(port->id.phys_id),
+		   DLB_CHP_DIR_PP_DIR_CRD_HWM_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_DIR_PP2POOL(port->id.phys_id),
+		   DLB_CHP_DIR_DIR_PP2POOL_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_DIR_PP2LDBPOOL(port->id.phys_id),
+		   DLB_SYS_DIR_PP2LDBPOOL_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_DIR_PP2DIRPOOL(port->id.phys_id),
+		   DLB_SYS_DIR_PP2DIRPOOL_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_CQ_WPTR(port->id.phys_id),
+		   DLB_CHP_DIR_CQ_WPTR_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_LSP_CQ_DIR_TKN_DEPTH_SEL_DSI(port->id.phys_id),
+		   DLB_LSP_CQ_DIR_TKN_DEPTH_SEL_DSI_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_CQ_TKN_DEPTH_SEL(port->id.phys_id),
+		   DLB_CHP_DIR_CQ_TKN_DEPTH_SEL_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_LSP_CQ_DIR_DSBL(port->id.phys_id),
+		   DLB_LSP_CQ_DIR_DSBL_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_CQ_WPTR(port->id.phys_id),
+		   DLB_CHP_DIR_CQ_WPTR_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_CQ_INT_DEPTH_THRSH(port->id.phys_id),
+		   DLB_CHP_DIR_CQ_INT_DEPTH_THRSH_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_CQ_TMR_THRESHOLD(port->id.phys_id),
+		   DLB_CHP_DIR_CQ_TMR_THRESHOLD_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_CHP_DIR_CQ_INT_ENB(port->id.phys_id),
+		   DLB_CHP_DIR_CQ_INT_ENB_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_DIR_CQ2VF_PF(port->id.phys_id),
+		   DLB_SYS_DIR_CQ2VF_PF_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_DIR_PP2VF_PF(port->id.phys_id),
+		   DLB_SYS_DIR_PP2VF_PF_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_DIR_CQ_ADDR_L(port->id.phys_id),
+		   DLB_SYS_DIR_CQ_ADDR_L_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_DIR_CQ_ADDR_U(port->id.phys_id),
+		   DLB_SYS_DIR_CQ_ADDR_U_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_DIR_PP_ADDR_L(port->id.phys_id),
+		   DLB_SYS_DIR_PP_ADDR_L_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_DIR_PP_ADDR_U(port->id.phys_id),
+		   DLB_SYS_DIR_PP_ADDR_U_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_DIR_PP_V(port->id.phys_id),
+		   DLB_SYS_DIR_PP_V_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_DIR_PP2VAS(port->id.phys_id),
+		   DLB_SYS_DIR_PP2VAS_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_DIR_CQ_ISR(port->id.phys_id),
+		   DLB_SYS_DIR_CQ_ISR_RST);
+
+	DLB_CSR_WR(hw,
+		   DLB_SYS_WBUF_DIR_FLAGS(port->id.phys_id),
+		   DLB_SYS_WBUF_DIR_FLAGS_RST);
+}
+
+static void dlb_domain_reset_dir_port_registers(struct dlb_hw *hw,
+						struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_dir_pq_pair *port;
+
+	DLB_DOM_LIST_FOR(domain->used_dir_pq_pairs, port, iter)
+		__dlb_domain_reset_dir_port_registers(hw, port);
+}
+
+static void dlb_domain_reset_ldb_queue_registers(struct dlb_hw *hw,
+						 struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_ldb_queue *queue;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_queues, queue, iter) {
+		DLB_CSR_WR(hw,
+			   DLB_AQED_PIPE_FL_LIM(queue->id.phys_id),
+			   DLB_AQED_PIPE_FL_LIM_RST);
+
+		DLB_CSR_WR(hw,
+			   DLB_AQED_PIPE_FL_BASE(queue->id.phys_id),
+			   DLB_AQED_PIPE_FL_BASE_RST);
+
+		DLB_CSR_WR(hw,
+			   DLB_AQED_PIPE_FL_POP_PTR(queue->id.phys_id),
+			   DLB_AQED_PIPE_FL_POP_PTR_RST);
+
+		DLB_CSR_WR(hw,
+			   DLB_AQED_PIPE_FL_PUSH_PTR(queue->id.phys_id),
+			   DLB_AQED_PIPE_FL_PUSH_PTR_RST);
+
+		DLB_CSR_WR(hw,
+			   DLB_AQED_PIPE_QID_FID_LIM(queue->id.phys_id),
+			   DLB_AQED_PIPE_QID_FID_LIM_RST);
+
+		DLB_CSR_WR(hw,
+			   DLB_LSP_QID_AQED_ACTIVE_LIM(queue->id.phys_id),
+			   DLB_LSP_QID_AQED_ACTIVE_LIM_RST);
+
+		DLB_CSR_WR(hw,
+			   DLB_LSP_QID_LDB_INFL_LIM(queue->id.phys_id),
+			   DLB_LSP_QID_LDB_INFL_LIM_RST);
+
+		DLB_CSR_WR(hw,
+			   DLB_SYS_LDB_QID_V(queue->id.phys_id),
+			   DLB_SYS_LDB_QID_V_RST);
+
+		DLB_CSR_WR(hw,
+			   DLB_CHP_ORD_QID_SN(queue->id.phys_id),
+			   DLB_CHP_ORD_QID_SN_RST);
+
+		DLB_CSR_WR(hw,
+			   DLB_CHP_ORD_QID_SN_MAP(queue->id.phys_id),
+			   DLB_CHP_ORD_QID_SN_MAP_RST);
+
+		DLB_CSR_WR(hw,
+			   DLB_RO_PIPE_QID2GRPSLT(queue->id.phys_id),
+			   DLB_RO_PIPE_QID2GRPSLT_RST);
+	}
+}
+
+static void dlb_domain_reset_dir_queue_registers(struct dlb_hw *hw,
+						 struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_dir_pq_pair *queue;
+
+	DLB_DOM_LIST_FOR(domain->used_dir_pq_pairs, queue, iter) {
+		DLB_CSR_WR(hw,
+			   DLB_SYS_DIR_QID_V(queue->id.phys_id),
+			   DLB_SYS_DIR_QID_V_RST);
+	}
+}
+
+static void dlb_domain_reset_ldb_pool_registers(struct dlb_hw *hw,
+						struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_credit_pool *pool;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_credit_pools, pool, iter) {
+		DLB_CSR_WR(hw,
+			   DLB_CHP_LDB_POOL_CRD_LIM(pool->id.phys_id),
+			   DLB_CHP_LDB_POOL_CRD_LIM_RST);
+
+		DLB_CSR_WR(hw,
+			   DLB_CHP_LDB_POOL_CRD_CNT(pool->id.phys_id),
+			   DLB_CHP_LDB_POOL_CRD_CNT_RST);
+
+		DLB_CSR_WR(hw,
+			   DLB_CHP_QED_FL_BASE(pool->id.phys_id),
+			   DLB_CHP_QED_FL_BASE_RST);
+
+		DLB_CSR_WR(hw,
+			   DLB_CHP_QED_FL_LIM(pool->id.phys_id),
+			   DLB_CHP_QED_FL_LIM_RST);
+
+		DLB_CSR_WR(hw,
+			   DLB_CHP_QED_FL_PUSH_PTR(pool->id.phys_id),
+			   DLB_CHP_QED_FL_PUSH_PTR_RST);
+
+		DLB_CSR_WR(hw,
+			   DLB_CHP_QED_FL_POP_PTR(pool->id.phys_id),
+			   DLB_CHP_QED_FL_POP_PTR_RST);
+	}
+}
+
+static void dlb_domain_reset_dir_pool_registers(struct dlb_hw *hw,
+						struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_credit_pool *pool;
+
+	DLB_DOM_LIST_FOR(domain->used_dir_credit_pools, pool, iter) {
+		DLB_CSR_WR(hw,
+			   DLB_CHP_DIR_POOL_CRD_LIM(pool->id.phys_id),
+			   DLB_CHP_DIR_POOL_CRD_LIM_RST);
+
+		DLB_CSR_WR(hw,
+			   DLB_CHP_DIR_POOL_CRD_CNT(pool->id.phys_id),
+			   DLB_CHP_DIR_POOL_CRD_CNT_RST);
+
+		DLB_CSR_WR(hw,
+			   DLB_CHP_DQED_FL_BASE(pool->id.phys_id),
+			   DLB_CHP_DQED_FL_BASE_RST);
+
+		DLB_CSR_WR(hw,
+			   DLB_CHP_DQED_FL_LIM(pool->id.phys_id),
+			   DLB_CHP_DQED_FL_LIM_RST);
+
+		DLB_CSR_WR(hw,
+			   DLB_CHP_DQED_FL_PUSH_PTR(pool->id.phys_id),
+			   DLB_CHP_DQED_FL_PUSH_PTR_RST);
+
+		DLB_CSR_WR(hw,
+			   DLB_CHP_DQED_FL_POP_PTR(pool->id.phys_id),
+			   DLB_CHP_DQED_FL_POP_PTR_RST);
+	}
+}
+
+static void dlb_domain_reset_registers(struct dlb_hw *hw,
+				       struct dlb_domain *domain)
+{
+	dlb_domain_reset_ldb_port_registers(hw, domain);
+
+	dlb_domain_reset_dir_port_registers(hw, domain);
+
+	dlb_domain_reset_ldb_queue_registers(hw, domain);
+
+	dlb_domain_reset_dir_queue_registers(hw, domain);
+
+	dlb_domain_reset_ldb_pool_registers(hw, domain);
+
+	dlb_domain_reset_dir_pool_registers(hw, domain);
+}
+
+static int dlb_domain_drain_ldb_cqs(struct dlb_hw *hw,
+				    struct dlb_domain *domain,
+				    bool toggle_port)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_ldb_port *port;
+	int ret;
+
+	/* If the domain hasn't been started, there's no traffic to drain */
+	if (!domain->started)
+		return 0;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_ports, port, iter) {
+		if (toggle_port)
+			dlb_ldb_port_cq_disable(hw, port);
+
+		ret = dlb_drain_ldb_cq(hw, port);
+		if (ret < 0)
+			return ret;
+
+		if (toggle_port)
+			dlb_ldb_port_cq_enable(hw, port);
+	}
+
+	return 0;
+}
+
+static bool dlb_domain_mapped_queues_empty(struct dlb_hw *hw,
+					   struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_ldb_queue *queue;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_queues, queue, iter) {
+		if (queue->num_mappings == 0)
+			continue;
+
+		if (!dlb_ldb_queue_is_empty(hw, queue))
+			return false;
+	}
+
+	return true;
+}
+
+static int dlb_domain_drain_mapped_queues(struct dlb_hw *hw,
+					  struct dlb_domain *domain)
+{
+	int i, ret;
+
+	/* If the domain hasn't been started, there's no traffic to drain */
+	if (!domain->started)
+		return 0;
+
+	if (domain->num_pending_removals > 0) {
+		DLB_HW_ERR(hw,
+			   "[%s()] Internal error: failed to unmap domain queues\n",
+			   __func__);
+		return -EFAULT;
+	}
+
+	for (i = 0; i < DLB_MAX_QID_EMPTY_CHECK_LOOPS; i++) {
+		ret = dlb_domain_drain_ldb_cqs(hw, domain, true);
+		if (ret < 0)
+			return ret;
+
+		if (dlb_domain_mapped_queues_empty(hw, domain))
+			break;
+	}
+
+	if (i == DLB_MAX_QID_EMPTY_CHECK_LOOPS) {
+		DLB_HW_ERR(hw,
+			   "[%s()] Internal error: failed to empty queues\n",
+			   __func__);
+		return -EFAULT;
+	}
+
+	/* Drain the CQs one more time. For the queues to have gone empty,
+	 * they must have scheduled one or more QEs into the CQs, and those
+	 * final QEs must be drained as well.
+	 */
+	ret = dlb_domain_drain_ldb_cqs(hw, domain, true);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+static int dlb_domain_drain_unmapped_queue(struct dlb_hw *hw,
+					   struct dlb_domain *domain,
+					   struct dlb_ldb_queue *queue)
+{
+	struct dlb_ldb_port *port;
+	int ret;
+
+	/* If a domain has LDB queues, it must have LDB ports */
+	if (dlb_list_empty(&domain->used_ldb_ports)) {
+		DLB_HW_ERR(hw,
+			   "[%s()] Internal error: No configured LDB ports\n",
+			   __func__);
+		return -EFAULT;
+	}
+
+	port = DLB_DOM_LIST_HEAD(domain->used_ldb_ports, typeof(*port));
+
+	/* If necessary, free up a QID slot in this CQ */
+	if (port->num_mappings == DLB_MAX_NUM_QIDS_PER_LDB_CQ) {
+		struct dlb_ldb_queue *mapped_queue;
+
+		mapped_queue = &hw->rsrcs.ldb_queues[port->qid_map[0].qid];
+
+		ret = dlb_ldb_port_unmap_qid(hw, port, mapped_queue);
+		if (ret)
+			return ret;
+	}
+
+	ret = dlb_ldb_port_map_qid_dynamic(hw, port, queue, 0);
+	if (ret)
+		return ret;
+
+	return dlb_domain_drain_mapped_queues(hw, domain);
+}
+
+static int dlb_domain_drain_unmapped_queues(struct dlb_hw *hw,
+					    struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_ldb_queue *queue;
+	int ret;
+
+	/* If the domain hasn't been started, there's no traffic to drain */
+	if (!domain->started)
+		return 0;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_queues, queue, iter) {
+		if (queue->num_mappings != 0 ||
+		    dlb_ldb_queue_is_empty(hw, queue))
+			continue;
+
+		ret = dlb_domain_drain_unmapped_queue(hw, domain, queue);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static void dlb_drain_dir_cq(struct dlb_hw *hw, struct dlb_dir_pq_pair *port)
+{
+	unsigned int port_id = port->id.phys_id;
+	u32 cnt;
+
+	/* Return any outstanding tokens */
+	cnt = dlb_dir_cq_token_count(hw, port);
+
+	if (cnt != 0) {
+		struct dlb_hcw hcw_mem[8], *hcw;
+		void  *pp_addr;
+
+		pp_addr = os_map_producer_port(hw, port_id, false);
+
+		/* Point hcw to a 64B-aligned location */
+		hcw = (struct dlb_hcw *)((uintptr_t)&hcw_mem[4] & ~0x3F);
+
+		/* Program the first HCW for a batch token return and
+		 * the rest as NOOPS
+		 */
+		memset(hcw, 0, 4 * sizeof(*hcw));
+		hcw->cq_token = 1;
+		hcw->lock_id = cnt - 1;
+
+		os_enqueue_four_hcws(hw, hcw, pp_addr);
+
+		os_fence_hcw(hw, pp_addr);
+
+		os_unmap_producer_port(hw, pp_addr);
+	}
+}
+
+static int dlb_domain_drain_dir_cqs(struct dlb_hw *hw,
+				    struct dlb_domain *domain,
+				    bool toggle_port)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_dir_pq_pair *port;
+
+	DLB_DOM_LIST_FOR(domain->used_dir_pq_pairs, port, iter) {
+		/* Can't drain a port if it's not configured, and there's
+		 * nothing to drain if its queue is unconfigured.
+		 */
+		if (!port->port_configured || !port->queue_configured)
+			continue;
+
+		if (toggle_port)
+			dlb_dir_port_cq_disable(hw, port);
+
+		dlb_drain_dir_cq(hw, port);
+
+		if (toggle_port)
+			dlb_dir_port_cq_enable(hw, port);
+	}
+
+	return 0;
+}
+
+static bool dlb_domain_dir_queues_empty(struct dlb_hw *hw,
+					struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_dir_pq_pair *queue;
+
+	DLB_DOM_LIST_FOR(domain->used_dir_pq_pairs, queue, iter) {
+		if (!dlb_dir_queue_is_empty(hw, queue))
+			return false;
+	}
+
+	return true;
+}
+
+static int dlb_domain_drain_dir_queues(struct dlb_hw *hw,
+				       struct dlb_domain *domain)
+{
+	int i;
+
+	/* If the domain hasn't been started, there's no traffic to drain */
+	if (!domain->started)
+		return 0;
+
+	for (i = 0; i < DLB_MAX_QID_EMPTY_CHECK_LOOPS; i++) {
+		dlb_domain_drain_dir_cqs(hw, domain, true);
+
+		if (dlb_domain_dir_queues_empty(hw, domain))
+			break;
+	}
+
+	if (i == DLB_MAX_QID_EMPTY_CHECK_LOOPS) {
+		DLB_HW_ERR(hw,
+			   "[%s()] Internal error: failed to empty queues\n",
+			   __func__);
+		return -EFAULT;
+	}
+
+	/* Drain the CQs one more time. For the queues to have gone empty,
+	 * they must have scheduled one or more QEs into the CQs, and those
+	 * final QEs must be drained as well.
+	 */
+	dlb_domain_drain_dir_cqs(hw, domain, true);
+
+	return 0;
+}
+
+static void dlb_domain_disable_dir_producer_ports(struct dlb_hw *hw,
+						  struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_dir_pq_pair *port;
+	union dlb_sys_dir_pp_v r1;
+
+	r1.field.pp_v = 0;
+
+	DLB_DOM_LIST_FOR(domain->used_dir_pq_pairs, port, iter)
+		DLB_CSR_WR(hw,
+			   DLB_SYS_DIR_PP_V(port->id.phys_id),
+			   r1.val);
+}
+
+static void dlb_domain_disable_ldb_producer_ports(struct dlb_hw *hw,
+						  struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	union dlb_sys_ldb_pp_v r1;
+	struct dlb_ldb_port *port;
+
+	r1.field.pp_v = 0;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_ports, port, iter) {
+		DLB_CSR_WR(hw,
+			   DLB_SYS_LDB_PP_V(port->id.phys_id),
+			   r1.val);
+
+		hw->pf.num_enabled_ldb_ports--;
+	}
+}
+
+static void dlb_domain_disable_dir_vpps(struct dlb_hw *hw,
+					struct dlb_domain *domain,
+					unsigned int vf_id)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	union dlb_sys_vf_dir_vpp_v r1;
+	struct dlb_dir_pq_pair *port;
+
+	r1.field.vpp_v = 0;
+
+	DLB_DOM_LIST_FOR(domain->used_dir_pq_pairs, port, iter) {
+		unsigned int offs;
+
+		offs = vf_id * DLB_MAX_NUM_DIR_PORTS + port->id.virt_id;
+
+		DLB_CSR_WR(hw, DLB_SYS_VF_DIR_VPP_V(offs), r1.val);
+	}
+}
+
+static void dlb_domain_disable_ldb_vpps(struct dlb_hw *hw,
+					struct dlb_domain *domain,
+					unsigned int vf_id)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	union dlb_sys_vf_ldb_vpp_v r1;
+	struct dlb_ldb_port *port;
+
+	r1.field.vpp_v = 0;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_ports, port, iter) {
+		unsigned int offs;
+
+		offs = vf_id * DLB_MAX_NUM_LDB_PORTS + port->id.virt_id;
+
+		DLB_CSR_WR(hw, DLB_SYS_VF_LDB_VPP_V(offs), r1.val);
+	}
+}
+
+static void dlb_domain_disable_dir_pools(struct dlb_hw *hw,
+					 struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	union dlb_sys_dir_pool_enbld r0 = { {0} };
+	struct dlb_credit_pool *pool;
+
+	DLB_DOM_LIST_FOR(domain->used_dir_credit_pools, pool, iter)
+		DLB_CSR_WR(hw,
+			   DLB_SYS_DIR_POOL_ENBLD(pool->id.phys_id),
+			   r0.val);
+}
+
+static void dlb_domain_disable_ldb_pools(struct dlb_hw *hw,
+					 struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	union dlb_sys_ldb_pool_enbld r0 = { {0} };
+	struct dlb_credit_pool *pool;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_credit_pools, pool, iter)
+		DLB_CSR_WR(hw,
+			   DLB_SYS_LDB_POOL_ENBLD(pool->id.phys_id),
+			   r0.val);
+}
+
+static void dlb_domain_disable_ldb_seq_checks(struct dlb_hw *hw,
+					      struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	union dlb_chp_sn_chk_enbl r1;
+	struct dlb_ldb_port *port;
+
+	r1.field.en = 0;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_ports, port, iter)
+		DLB_CSR_WR(hw,
+			   DLB_CHP_SN_CHK_ENBL(port->id.phys_id),
+			   r1.val);
+}
+
+static void dlb_domain_disable_ldb_port_crd_updates(struct dlb_hw *hw,
+						    struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	union dlb_chp_ldb_pp_crd_req_state r0;
+	struct dlb_ldb_port *port;
+
+	r0.field.no_pp_credit_update = 1;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_ports, port, iter)
+		DLB_CSR_WR(hw,
+			   DLB_CHP_LDB_PP_CRD_REQ_STATE(port->id.phys_id),
+			   r0.val);
+}
+
+static void dlb_domain_disable_ldb_port_interrupts(struct dlb_hw *hw,
+						   struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	union dlb_chp_ldb_cq_int_enb r0 = { {0} };
+	union dlb_chp_ldb_cq_wd_enb r1 = { {0} };
+	struct dlb_ldb_port *port;
+
+	r0.field.en_tim = 0;
+	r0.field.en_depth = 0;
+
+	r1.field.wd_enable = 0;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_ports, port, iter) {
+		DLB_CSR_WR(hw,
+			   DLB_CHP_LDB_CQ_INT_ENB(port->id.phys_id),
+			   r0.val);
+
+		DLB_CSR_WR(hw,
+			   DLB_CHP_LDB_CQ_WD_ENB(port->id.phys_id),
+			   r1.val);
+	}
+}
+
+static void dlb_domain_disable_dir_port_interrupts(struct dlb_hw *hw,
+						   struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	union dlb_chp_dir_cq_int_enb r0 = { {0} };
+	union dlb_chp_dir_cq_wd_enb r1 = { {0} };
+	struct dlb_dir_pq_pair *port;
+
+	r0.field.en_tim = 0;
+	r0.field.en_depth = 0;
+
+	r1.field.wd_enable = 0;
+
+	DLB_DOM_LIST_FOR(domain->used_dir_pq_pairs, port, iter) {
+		DLB_CSR_WR(hw,
+			   DLB_CHP_DIR_CQ_INT_ENB(port->id.phys_id),
+			   r0.val);
+
+		DLB_CSR_WR(hw,
+			   DLB_CHP_DIR_CQ_WD_ENB(port->id.phys_id),
+			   r1.val);
+	}
+}
+
+static void dlb_domain_disable_dir_port_crd_updates(struct dlb_hw *hw,
+						    struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	union dlb_chp_dir_pp_crd_req_state r0;
+	struct dlb_dir_pq_pair *port;
+
+	r0.field.no_pp_credit_update = 1;
+
+	DLB_DOM_LIST_FOR(domain->used_dir_pq_pairs, port, iter)
+		DLB_CSR_WR(hw,
+			   DLB_CHP_DIR_PP_CRD_REQ_STATE(port->id.phys_id),
+			   r0.val);
+}
+
+static void dlb_domain_disable_ldb_queue_write_perms(struct dlb_hw *hw,
+						     struct dlb_domain *domain)
+{
+	int domain_offset = domain->id.phys_id * DLB_MAX_NUM_LDB_QUEUES;
+	struct dlb_list_entry *iter __attribute__((unused));
+	union dlb_sys_ldb_vasqid_v r0;
+	struct dlb_ldb_queue *queue;
+
+	r0.field.vasqid_v = 0;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_queues, queue, iter) {
+		int idx = domain_offset + queue->id.phys_id;
+
+		DLB_CSR_WR(hw, DLB_SYS_LDB_VASQID_V(idx), r0.val);
+	}
+}
+
+static void dlb_domain_disable_dir_queue_write_perms(struct dlb_hw *hw,
+						     struct dlb_domain *domain)
+{
+	int domain_offset = domain->id.phys_id * DLB_MAX_NUM_DIR_PORTS;
+	struct dlb_list_entry *iter __attribute__((unused));
+	union dlb_sys_dir_vasqid_v r0;
+	struct dlb_dir_pq_pair *port;
+
+	r0.field.vasqid_v = 0;
+
+	DLB_DOM_LIST_FOR(domain->used_dir_pq_pairs, port, iter) {
+		int idx = domain_offset + port->id.phys_id;
+
+		DLB_CSR_WR(hw, DLB_SYS_DIR_VASQID_V(idx), r0.val);
+	}
+}
+
+static void dlb_domain_disable_dir_cqs(struct dlb_hw *hw,
+				       struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_dir_pq_pair *port;
+
+	DLB_DOM_LIST_FOR(domain->used_dir_pq_pairs, port, iter) {
+		port->enabled = false;
+
+		dlb_dir_port_cq_disable(hw, port);
+	}
+}
+
+static void dlb_domain_disable_ldb_cqs(struct dlb_hw *hw,
+				       struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_ldb_port *port;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_ports, port, iter) {
+		port->enabled = false;
+
+		dlb_ldb_port_cq_disable(hw, port);
+	}
+}
+
+static void dlb_domain_enable_ldb_cqs(struct dlb_hw *hw,
+				      struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_ldb_port *port;
+
+	DLB_DOM_LIST_FOR(domain->used_ldb_ports, port, iter) {
+		port->enabled = true;
+
+		dlb_ldb_port_cq_enable(hw, port);
+	}
+}
+
+static int dlb_domain_wait_for_ldb_pool_refill(struct dlb_hw *hw,
+					       struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_credit_pool *pool;
+
+	/* Confirm that all credits are returned to the domain's credit pools */
+	DLB_DOM_LIST_FOR(domain->used_ldb_credit_pools, pool, iter) {
+		union dlb_chp_qed_fl_push_ptr r0;
+		union dlb_chp_qed_fl_pop_ptr r1;
+		unsigned long pop_offs, push_offs;
+		int i;
+
+		push_offs = DLB_CHP_QED_FL_PUSH_PTR(pool->id.phys_id);
+		pop_offs = DLB_CHP_QED_FL_POP_PTR(pool->id.phys_id);
+
+		for (i = 0; i < DLB_MAX_QID_EMPTY_CHECK_LOOPS; i++) {
+			r0.val = DLB_CSR_RD(hw, push_offs);
+
+			r1.val = DLB_CSR_RD(hw, pop_offs);
+
+			/* Break early if the freelist is replenished */
+			if (r1.field.pop_ptr == r0.field.push_ptr &&
+			    r1.field.generation != r0.field.generation) {
+				break;
+			}
+		}
+
+		/* Error if the freelist is not full */
+		if (r1.field.pop_ptr != r0.field.push_ptr ||
+		    r1.field.generation == r0.field.generation) {
+			return -EFAULT;
+		}
+	}
+
+	return 0;
+}
+
+static int dlb_domain_wait_for_dir_pool_refill(struct dlb_hw *hw,
+					       struct dlb_domain *domain)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_credit_pool *pool;
+
+	/* Confirm that all credits are returned to the domain's credit pools */
+	DLB_DOM_LIST_FOR(domain->used_dir_credit_pools, pool, iter) {
+		union dlb_chp_dqed_fl_push_ptr r0;
+		union dlb_chp_dqed_fl_pop_ptr r1;
+		unsigned long pop_offs, push_offs;
+		int i;
+
+		push_offs = DLB_CHP_DQED_FL_PUSH_PTR(pool->id.phys_id);
+		pop_offs = DLB_CHP_DQED_FL_POP_PTR(pool->id.phys_id);
+
+		for (i = 0; i < DLB_MAX_QID_EMPTY_CHECK_LOOPS; i++) {
+			r0.val = DLB_CSR_RD(hw, push_offs);
+
+			r1.val = DLB_CSR_RD(hw, pop_offs);
+
+			/* Break early if the freelist is replenished */
+			if (r1.field.pop_ptr == r0.field.push_ptr &&
+			    r1.field.generation != r0.field.generation) {
+				break;
+			}
+		}
+
+		/* Error if the freelist is not full */
+		if (r1.field.pop_ptr != r0.field.push_ptr ||
+		    r1.field.generation == r0.field.generation) {
+			return -EFAULT;
+		}
+	}
+
+	return 0;
+}
+
+static void dlb_log_reset_domain(struct dlb_hw *hw,
+				 u32 domain_id,
+				 bool vf_request,
+				 unsigned int vf_id)
+{
+	DLB_HW_INFO(hw, "DLB reset domain:\n");
+	if (vf_request)
+		DLB_HW_INFO(hw, "(Request from VF %d)\n", vf_id);
+	DLB_HW_INFO(hw, "\tDomain ID: %d\n", domain_id);
+}
+
+/**
+ * dlb_reset_domain() - Reset a DLB scheduling domain and its associated
+ *	hardware resources.
+ * @hw:	  Contains the current state of the DLB hardware.
+ * @domain_id: domain ID.
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * Note: User software *must* stop sending to this domain's producer ports
+ * before invoking this function, otherwise undefined behavior will result.
+ *
+ * Return: returns < 0 on error, 0 otherwise.
+ */
+int dlb_reset_domain(struct dlb_hw *hw,
+		     u32 domain_id,
+		     bool vf_request,
+		     unsigned int vf_id)
+{
+	struct dlb_domain *domain;
+	int ret;
+
+	dlb_log_reset_domain(hw, domain_id, vf_request, vf_id);
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+
+	if (!domain || !domain->configured)
+		return -EINVAL;
+
+	if (vf_request) {
+		dlb_domain_disable_dir_vpps(hw, domain, vf_id);
+
+		dlb_domain_disable_ldb_vpps(hw, domain, vf_id);
+	}
+
+	/* For each queue owned by this domain, disable its write permissions to
+	 * cause any traffic sent to it to be dropped. Well-behaved software
+	 * should not be sending QEs at this point.
+	 */
+	dlb_domain_disable_dir_queue_write_perms(hw, domain);
+
+	dlb_domain_disable_ldb_queue_write_perms(hw, domain);
+
+	/* Disable credit updates and turn off completion tracking on all the
+	 * domain's PPs.
+	 */
+	dlb_domain_disable_dir_port_crd_updates(hw, domain);
+
+	dlb_domain_disable_ldb_port_crd_updates(hw, domain);
+
+	dlb_domain_disable_dir_port_interrupts(hw, domain);
+
+	dlb_domain_disable_ldb_port_interrupts(hw, domain);
+
+	dlb_domain_disable_ldb_seq_checks(hw, domain);
+
+	/* Disable the LDB CQs and drain them in order to complete the map and
+	 * unmap procedures, which require zero CQ inflights and zero QID
+	 * inflights respectively.
+	 */
+	dlb_domain_disable_ldb_cqs(hw, domain);
+
+	ret = dlb_domain_drain_ldb_cqs(hw, domain, false);
+	if (ret < 0)
+		return ret;
+
+	ret = dlb_domain_wait_for_ldb_cqs_to_empty(hw, domain);
+	if (ret < 0)
+		return ret;
+
+	ret = dlb_domain_finish_unmap_qid_procedures(hw, domain);
+	if (ret < 0)
+		return ret;
+
+	ret = dlb_domain_finish_map_qid_procedures(hw, domain);
+	if (ret < 0)
+		return ret;
+
+	/* Re-enable the CQs in order to drain the mapped queues. */
+	dlb_domain_enable_ldb_cqs(hw, domain);
+
+	ret = dlb_domain_drain_mapped_queues(hw, domain);
+	if (ret < 0)
+		return ret;
+
+	ret = dlb_domain_drain_unmapped_queues(hw, domain);
+	if (ret < 0)
+		return ret;
+
+	ret = dlb_domain_wait_for_ldb_pool_refill(hw, domain);
+	if (ret) {
+		DLB_HW_ERR(hw,
+			   "[%s()] Internal error: LDB credits failed to refill\n",
+			   __func__);
+		return ret;
+	}
+
+	/* Done draining LDB QEs, so disable the CQs. */
+	dlb_domain_disable_ldb_cqs(hw, domain);
+
+	/* Directed queues are reset in dlb_domain_reset_hw_resources(), but
+	 * that process doesn't decrement the directed queue size counters used
+	 * by SMON for its average DQED depth measurement. So, we manually drain
+	 * the directed queues here.
+	 */
+	dlb_domain_drain_dir_queues(hw, domain);
+
+	ret = dlb_domain_wait_for_dir_pool_refill(hw, domain);
+	if (ret) {
+		DLB_HW_ERR(hw,
+			   "[%s()] Internal error: DIR credits failed to refill\n",
+			   __func__);
+		return ret;
+	}
+
+	/* Done draining DIR QEs, so disable the CQs. */
+	dlb_domain_disable_dir_cqs(hw, domain);
+
+	dlb_domain_disable_dir_producer_ports(hw, domain);
+
+	dlb_domain_disable_ldb_producer_ports(hw, domain);
+
+	dlb_domain_disable_dir_pools(hw, domain);
+
+	dlb_domain_disable_ldb_pools(hw, domain);
+
+	/* Reset the QID, credit pool, and CQ hardware.
+	 *
+	 * Note: DLB 1.0 A0 h/w does not disarm CQ interrupts during VAS reset.
+	 * A spurious interrupt can occur on subsequent use of a reset CQ.
+	 */
+	ret = dlb_domain_reset_hw_resources(hw, domain);
+	if (ret)
+		return ret;
+
+	ret = dlb_domain_verify_reset_success(hw, domain);
+	if (ret)
+		return ret;
+
+	dlb_domain_reset_registers(hw, domain);
+
+	/* Hardware reset complete. Reset the domain's software state */
+	ret = dlb_domain_reset_software_state(hw, domain);
+	if (ret)
+		return ret;
+
+	return 0;
+}
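+
+/* Illustrative usage sketch (assumptions: the caller has already quiesced all
+ * enqueues to the domain's producer ports, as required above, and error
+ * handling is abbreviated):
+ *
+ *	int ret = dlb_reset_domain(hw, domain_id, false, 0);
+ *
+ *	if (ret)
+ *		DLB_HW_ERR(hw, "domain %u reset failed: %d\n", domain_id, ret);
+ *
+ * dlb_reset_vf() below applies this per-domain reset to every domain owned by
+ * a VF.
+ */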
+
+int dlb_reset_vf(struct dlb_hw *hw, unsigned int vf_id)
+{
+	struct dlb_domain *domain, *next __attribute__((unused));
+	struct dlb_list_entry *it1 __attribute__((unused));
+	struct dlb_list_entry *it2 __attribute__((unused));
+	struct dlb_function_resources *rsrcs;
+
+	if (vf_id >= DLB_MAX_NUM_VFS) {
+		DLB_HW_ERR(hw, "[%s()] Internal error: invalid VF ID %d\n",
+			   __func__, vf_id);
+		return -EFAULT;
+	}
+
+	rsrcs = &hw->vf[vf_id];
+
+	DLB_FUNC_LIST_FOR_SAFE(rsrcs->used_domains, domain, next, it1, it2) {
+		int ret = dlb_reset_domain(hw,
+					   domain->id.virt_id,
+					   true,
+					   vf_id);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+int dlb_ldb_port_owned_by_domain(struct dlb_hw *hw,
+				 u32 domain_id,
+				 u32 port_id,
+				 bool vf_request,
+				 unsigned int vf_id)
+{
+	struct dlb_ldb_port *port;
+	struct dlb_domain *domain;
+
+	if (vf_request && vf_id >= DLB_MAX_NUM_VFS)
+		return -1;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+
+	if (!domain || !domain->configured)
+		return -EINVAL;
+
+	port = dlb_get_domain_ldb_port(port_id, vf_request, domain);
+
+	if (!port)
+		return -EINVAL;
+
+	return port->domain_id.phys_id == domain->id.phys_id;
+}
+
+int dlb_dir_port_owned_by_domain(struct dlb_hw *hw,
+				 u32 domain_id,
+				 u32 port_id,
+				 bool vf_request,
+				 unsigned int vf_id)
+{
+	struct dlb_dir_pq_pair *port;
+	struct dlb_domain *domain;
+
+	if (vf_request && vf_id >= DLB_MAX_NUM_VFS)
+		return -1;
+
+	domain = dlb_get_domain_from_id(hw, domain_id, vf_request, vf_id);
+
+	if (!domain || !domain->configured)
+		return -EINVAL;
+
+	port = dlb_get_domain_dir_pq(port_id, vf_request, domain);
+
+	if (!port)
+		return -EINVAL;
+
+	return port->domain_id.phys_id == domain->id.phys_id;
+}
+
+int dlb_hw_get_num_resources(struct dlb_hw *hw,
+			     struct dlb_get_num_resources_args *arg,
+			     bool vf_request,
+			     unsigned int vf_id)
+{
+	struct dlb_function_resources *rsrcs;
+	struct dlb_bitmap *map;
+
+	if (vf_request && vf_id >= DLB_MAX_NUM_VFS)
+		return -1;
+
+	if (vf_request)
+		rsrcs = &hw->vf[vf_id];
+	else
+		rsrcs = &hw->pf;
+
+	arg->num_sched_domains = rsrcs->num_avail_domains;
+
+	arg->num_ldb_queues = rsrcs->num_avail_ldb_queues;
+
+	arg->num_ldb_ports = rsrcs->num_avail_ldb_ports;
+
+	arg->num_dir_ports = rsrcs->num_avail_dir_pq_pairs;
+
+	map = rsrcs->avail_aqed_freelist_entries;
+
+	arg->num_atomic_inflights = dlb_bitmap_count(map);
+
+	arg->max_contiguous_atomic_inflights =
+		dlb_bitmap_longest_set_range(map);
+
+	map = rsrcs->avail_hist_list_entries;
+
+	arg->num_hist_list_entries = dlb_bitmap_count(map);
+
+	arg->max_contiguous_hist_list_entries =
+		dlb_bitmap_longest_set_range(map);
+
+	map = rsrcs->avail_qed_freelist_entries;
+
+	arg->num_ldb_credits = dlb_bitmap_count(map);
+
+	arg->max_contiguous_ldb_credits = dlb_bitmap_longest_set_range(map);
+
+	map = rsrcs->avail_dqed_freelist_entries;
+
+	arg->num_dir_credits = dlb_bitmap_count(map);
+
+	arg->max_contiguous_dir_credits = dlb_bitmap_longest_set_range(map);
+
+	arg->num_ldb_credit_pools = rsrcs->num_avail_ldb_credit_pools;
+
+	arg->num_dir_credit_pools = rsrcs->num_avail_dir_credit_pools;
+
+	return 0;
+}
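+
+/* Illustrative usage sketch (the local names 'avail' and 'needed_ldb_ports'
+ * are assumptions made for the example):
+ *
+ *	struct dlb_get_num_resources_args avail;
+ *
+ *	if (dlb_hw_get_num_resources(hw, &avail, false, 0))
+ *		return -EINVAL;
+ *	if (avail.num_ldb_ports < needed_ldb_ports)
+ *		return -ENOSPC;
+ *
+ * dlb_hw_get_num_used_resources() below reports the complementary in-use
+ * counts for the same PF or VF.
+ */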
+
+int dlb_hw_get_num_used_resources(struct dlb_hw *hw,
+				  struct dlb_get_num_resources_args *arg,
+				  bool vf_request,
+				  unsigned int vf_id)
+{
+	struct dlb_list_entry *iter1 __attribute__((unused));
+	struct dlb_list_entry *iter2 __attribute__((unused));
+	struct dlb_function_resources *rsrcs;
+	struct dlb_domain *domain;
+
+	if (vf_request && vf_id >= DLB_MAX_NUM_VFS)
+		return -1;
+
+	rsrcs = (vf_request) ? &hw->vf[vf_id] : &hw->pf;
+
+	memset(arg, 0, sizeof(*arg));
+
+	DLB_FUNC_LIST_FOR(rsrcs->used_domains, domain, iter1) {
+		struct dlb_dir_pq_pair *dir_port;
+		struct dlb_ldb_port *ldb_port;
+		struct dlb_credit_pool *pool;
+		struct dlb_ldb_queue *queue;
+
+		arg->num_sched_domains++;
+
+		arg->num_atomic_inflights +=
+			domain->aqed_freelist.bound -
+			domain->aqed_freelist.base;
+
+		DLB_DOM_LIST_FOR(domain->used_ldb_queues, queue, iter2)
+			arg->num_ldb_queues++;
+		DLB_DOM_LIST_FOR(domain->avail_ldb_queues, queue, iter2)
+			arg->num_ldb_queues++;
+
+		DLB_DOM_LIST_FOR(domain->used_ldb_ports, ldb_port, iter2)
+			arg->num_ldb_ports++;
+		DLB_DOM_LIST_FOR(domain->avail_ldb_ports, ldb_port, iter2)
+			arg->num_ldb_ports++;
+
+		DLB_DOM_LIST_FOR(domain->used_dir_pq_pairs, dir_port, iter2)
+			arg->num_dir_ports++;
+		DLB_DOM_LIST_FOR(domain->avail_dir_pq_pairs, dir_port, iter2)
+			arg->num_dir_ports++;
+
+		arg->num_ldb_credits +=
+			domain->qed_freelist.bound -
+			domain->qed_freelist.base;
+
+		DLB_DOM_LIST_FOR(domain->avail_ldb_credit_pools, pool, iter2)
+			arg->num_ldb_credit_pools++;
+		DLB_DOM_LIST_FOR(domain->used_ldb_credit_pools, pool, iter2) {
+			arg->num_ldb_credit_pools++;
+			arg->num_ldb_credits += pool->total_credits;
+		}
+
+		arg->num_dir_credits +=
+			domain->dqed_freelist.bound -
+			domain->dqed_freelist.base;
+
+		DLB_DOM_LIST_FOR(domain->avail_dir_credit_pools, pool, iter2)
+			arg->num_dir_credit_pools++;
+		DLB_DOM_LIST_FOR(domain->used_dir_credit_pools, pool, iter2) {
+			arg->num_dir_credit_pools++;
+			arg->num_dir_credits += pool->total_credits;
+		}
+
+		arg->num_hist_list_entries += domain->total_hist_list_entries;
+	}
+
+	return 0;
+}
+
+static inline bool dlb_ldb_port_owned_by_vf(struct dlb_hw *hw,
+					    u32 vf_id,
+					    u32 port_id)
+{
+	return (hw->rsrcs.ldb_ports[port_id].id.vf_owned &&
+		hw->rsrcs.ldb_ports[port_id].id.vf_id == vf_id);
+}
+
+static inline bool dlb_dir_port_owned_by_vf(struct dlb_hw *hw,
+					    u32 vf_id,
+					    u32 port_id)
+{
+	return (hw->rsrcs.dir_pq_pairs[port_id].id.vf_owned &&
+		hw->rsrcs.dir_pq_pairs[port_id].id.vf_id == vf_id);
+}
+
+void dlb_send_async_pf_to_vf_msg(struct dlb_hw *hw, unsigned int vf_id)
+{
+	union dlb_func_pf_pf2vf_mailbox_isr r0 = { {0} };
+
+	r0.field.isr = 1 << vf_id;
+
+	DLB_FUNC_WR(hw, DLB_FUNC_PF_PF2VF_MAILBOX_ISR(0), r0.val);
+}
+
+bool dlb_pf_to_vf_complete(struct dlb_hw *hw, unsigned int vf_id)
+{
+	union dlb_func_pf_pf2vf_mailbox_isr r0;
+
+	r0.val = DLB_FUNC_RD(hw, DLB_FUNC_PF_PF2VF_MAILBOX_ISR(vf_id));
+
+	return (r0.val & (1 << vf_id)) == 0;
+}
+
+void dlb_send_async_vf_to_pf_msg(struct dlb_hw *hw)
+{
+	union dlb_func_vf_vf2pf_mailbox_isr r0 = { {0} };
+
+	r0.field.isr = 1;
+	DLB_FUNC_WR(hw, DLB_FUNC_VF_VF2PF_MAILBOX_ISR, r0.val);
+}
+
+bool dlb_vf_to_pf_complete(struct dlb_hw *hw)
+{
+	union dlb_func_vf_vf2pf_mailbox_isr r0;
+
+	r0.val = DLB_FUNC_RD(hw, DLB_FUNC_VF_VF2PF_MAILBOX_ISR);
+
+	return (r0.field.isr == 0);
+}
+
+bool dlb_vf_flr_complete(struct dlb_hw *hw)
+{
+	union dlb_func_vf_vf_reset_in_progress r0;
+
+	r0.val = DLB_FUNC_RD(hw, DLB_FUNC_VF_VF_RESET_IN_PROGRESS);
+
+	return (r0.field.reset_in_progress == 0);
+}
+
+int dlb_pf_read_vf_mbox_req(struct dlb_hw *hw,
+			    unsigned int vf_id,
+			    void *data,
+			    int len)
+{
+	u32 buf[DLB_VF2PF_REQ_BYTES / 4];
+	int num_words;
+	int i;
+
+	if (len > DLB_VF2PF_REQ_BYTES) {
+		DLB_HW_ERR(hw, "[%s()] len (%d) > VF->PF mailbox req size\n",
+			   __func__, len);
+		return -EINVAL;
+	}
+
+	if (len == 0) {
+		DLB_HW_ERR(hw, "[%s()] invalid len (0)\n", __func__);
+		return -EINVAL;
+	}
+
+	/* Round up len to the nearest 4B boundary, since the mailbox registers
+	 * are 32b wide.
+	 */
+	num_words = len / 4;
+	if (len % 4 != 0)
+		num_words++;
+
+	for (i = 0; i < num_words; i++) {
+		u32 idx = i + DLB_VF2PF_REQ_BASE_WORD;
+
+		buf[i] = DLB_FUNC_RD(hw, DLB_FUNC_PF_VF2PF_MAILBOX(vf_id, idx));
+	}
+
+	memcpy(data, buf, len);
+
+	return 0;
+}
+
+int dlb_pf_read_vf_mbox_resp(struct dlb_hw *hw,
+			     unsigned int vf_id,
+			     void *data,
+			     int len)
+{
+	u32 buf[DLB_VF2PF_RESP_BYTES / 4];
+	int num_words;
+	int i;
+
+	if (len > DLB_VF2PF_RESP_BYTES) {
+		DLB_HW_ERR(hw, "[%s()] len (%d) > VF->PF mailbox resp size\n",
+			   __func__, len);
+		return -EINVAL;
+	}
+
+	/* Round up len to the nearest 4B boundary, since the mailbox registers
+	 * are 32b wide.
+	 */
+	num_words = len / 4;
+	if (len % 4 != 0)
+		num_words++;
+
+	for (i = 0; i < num_words; i++) {
+		u32 idx = i + DLB_VF2PF_RESP_BASE_WORD;
+
+		buf[i] = DLB_FUNC_RD(hw, DLB_FUNC_PF_VF2PF_MAILBOX(vf_id, idx));
+	}
+
+	memcpy(data, buf, len);
+
+	return 0;
+}
+
+int dlb_pf_write_vf_mbox_resp(struct dlb_hw *hw,
+			      unsigned int vf_id,
+			      void *data,
+			      int len)
+{
+	u32 buf[DLB_PF2VF_RESP_BYTES / 4];
+	int num_words;
+	int i;
+
+	if (len > DLB_PF2VF_RESP_BYTES) {
+		DLB_HW_ERR(hw, "[%s()] len (%d) > PF->VF mailbox resp size\n",
+			   __func__, len);
+		return -EINVAL;
+	}
+
+	memcpy(buf, data, len);
+
+	/* Round up len to the nearest 4B boundary, since the mailbox registers
+	 * are 32b wide.
+	 */
+	num_words = len / 4;
+	if (len % 4 != 0)
+		num_words++;
+
+	for (i = 0; i < num_words; i++) {
+		u32 idx = i + DLB_PF2VF_RESP_BASE_WORD;
+
+		DLB_FUNC_WR(hw, DLB_FUNC_PF_PF2VF_MAILBOX(vf_id, idx), buf[i]);
+	}
+
+	return 0;
+}
+
+int dlb_pf_write_vf_mbox_req(struct dlb_hw *hw,
+			     unsigned int vf_id,
+			     void *data,
+			     int len)
+{
+	u32 buf[DLB_PF2VF_REQ_BYTES / 4];
+	int num_words;
+	int i;
+
+	if (len > DLB_PF2VF_REQ_BYTES) {
+		DLB_HW_ERR(hw, "[%s()] len (%d) > PF->VF mailbox req size\n",
+			   __func__, len);
+		return -EINVAL;
+	}
+
+	memcpy(buf, data, len);
+
+	/* Round up len to the nearest 4B boundary, since the mailbox registers
+	 * are 32b wide.
+	 */
+	num_words = len / 4;
+	if (len % 4 != 0)
+		num_words++;
+
+	for (i = 0; i < num_words; i++) {
+		u32 idx = i + DLB_PF2VF_REQ_BASE_WORD;
+
+		DLB_FUNC_WR(hw, DLB_FUNC_PF_PF2VF_MAILBOX(vf_id, idx), buf[i]);
+	}
+
+	return 0;
+}
+
+int dlb_vf_read_pf_mbox_resp(struct dlb_hw *hw, void *data, int len)
+{
+	u32 buf[DLB_PF2VF_RESP_BYTES / 4];
+	int num_words;
+	int i;
+
+	if (len > DLB_PF2VF_RESP_BYTES) {
+		DLB_HW_ERR(hw, "[%s()] len (%d) > PF->VF mailbox resp size\n",
+			   __func__, len);
+		return -EINVAL;
+	}
+
+	if (len == 0) {
+		DLB_HW_ERR(hw, "[%s()] invalid len (0)\n", __func__);
+		return -EINVAL;
+	}
+
+	/* Round up len to the nearest 4B boundary, since the mailbox registers
+	 * are 32b wide.
+	 */
+	num_words = len / 4;
+	if (len % 4 != 0)
+		num_words++;
+
+	for (i = 0; i < num_words; i++) {
+		u32 idx = i + DLB_PF2VF_RESP_BASE_WORD;
+
+		buf[i] = DLB_FUNC_RD(hw, DLB_FUNC_VF_PF2VF_MAILBOX(idx));
+	}
+
+	memcpy(data, buf, len);
+
+	return 0;
+}
+
+int dlb_vf_read_pf_mbox_req(struct dlb_hw *hw, void *data, int len)
+{
+	u32 buf[DLB_PF2VF_REQ_BYTES / 4];
+	int num_words;
+	int i;
+
+	if (len > DLB_PF2VF_REQ_BYTES) {
+		DLB_HW_ERR(hw, "[%s()] len (%d) > PF->VF mailbox req size\n",
+			   __func__, len);
+		return -EINVAL;
+	}
+
+	/* Round up len to the nearest 4B boundary, since the mailbox registers
+	 * are 32b wide.
+	 */
+	num_words = len / 4;
+	if (len % 4 != 0)
+		num_words++;
+
+	for (i = 0; i < num_words; i++) {
+		u32 idx = i + DLB_PF2VF_REQ_BASE_WORD;
+
+		buf[i] = DLB_FUNC_RD(hw, DLB_FUNC_VF_PF2VF_MAILBOX(idx));
+	}
+
+	memcpy(data, buf, len);
+
+	return 0;
+}
+
+int dlb_vf_write_pf_mbox_req(struct dlb_hw *hw, void *data, int len)
+{
+	u32 buf[DLB_VF2PF_REQ_BYTES / 4];
+	int num_words;
+	int i;
+
+	if (len > DLB_VF2PF_REQ_BYTES) {
+		DLB_HW_ERR(hw, "[%s()] len (%d) > VF->PF mailbox req size\n",
+			   __func__, len);
+		return -EINVAL;
+	}
+
+	memcpy(buf, data, len);
+
+	/* Round up len to the nearest 4B boundary, since the mailbox registers
+	 * are 32b wide.
+	 */
+	num_words = len / 4;
+	if (len % 4 != 0)
+		num_words++;
+
+	for (i = 0; i < num_words; i++) {
+		u32 idx = i + DLB_VF2PF_REQ_BASE_WORD;
+
+		DLB_FUNC_WR(hw, DLB_FUNC_VF_VF2PF_MAILBOX(idx), buf[i]);
+	}
+
+	return 0;
+}
+
+int dlb_vf_write_pf_mbox_resp(struct dlb_hw *hw, void *data, int len)
+{
+	u32 buf[DLB_VF2PF_RESP_BYTES / 4];
+	int num_words;
+	int i;
+
+	if (len > DLB_VF2PF_RESP_BYTES) {
+		DLB_HW_ERR(hw, "[%s()] len (%d) > VF->PF mailbox resp size\n",
+			   __func__, len);
+		return -EINVAL;
+	}
+
+	memcpy(buf, data, len);
+
+	/* Round up len to the nearest 4B boundary, since the mailbox registers
+	 * are 32b wide.
+	 */
+	num_words = len / 4;
+	if (len % 4 != 0)
+		num_words++;
+
+	for (i = 0; i < num_words; i++) {
+		u32 idx = i + DLB_VF2PF_RESP_BASE_WORD;
+
+		DLB_FUNC_WR(hw, DLB_FUNC_VF_VF2PF_MAILBOX(idx), buf[i]);
+	}
+
+	return 0;
+}
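+
+/* Illustrative VF-to-PF mailbox round trip (a sketch only; 'req' and 'resp'
+ * stand for driver-defined mailbox structures, and a real caller would bound
+ * the completion poll with a timeout):
+ *
+ *	dlb_vf_write_pf_mbox_req(hw, &req, sizeof(req));
+ *	dlb_send_async_vf_to_pf_msg(hw);
+ *
+ *	while (!dlb_vf_to_pf_complete(hw))
+ *		;
+ *
+ *	dlb_vf_read_pf_mbox_resp(hw, &resp, sizeof(resp));
+ *
+ * The PF side of this exchange reads the request with
+ * dlb_pf_read_vf_mbox_req() and writes its reply with
+ * dlb_pf_write_vf_mbox_resp().
+ */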
+
+bool dlb_vf_is_locked(struct dlb_hw *hw, unsigned int vf_id)
+{
+	return hw->vf[vf_id].locked;
+}
+
+static void dlb_vf_set_rsrc_virt_ids(struct dlb_function_resources *rsrcs,
+				     unsigned int vf_id)
+{
+	struct dlb_list_entry *iter __attribute__((unused));
+	struct dlb_dir_pq_pair *dir_port;
+	struct dlb_ldb_queue *ldb_queue;
+	struct dlb_ldb_port *ldb_port;
+	struct dlb_credit_pool *pool;
+	struct dlb_domain *domain;
+	int i;
+
+	i = 0;
+	DLB_FUNC_LIST_FOR(rsrcs->avail_domains, domain, iter) {
+		domain->id.virt_id = i;
+		domain->id.vf_owned = true;
+		domain->id.vf_id = vf_id;
+		i++;
+	}
+
+	i = 0;
+	DLB_FUNC_LIST_FOR(rsrcs->avail_ldb_queues, ldb_queue, iter) {
+		ldb_queue->id.virt_id = i;
+		ldb_queue->id.vf_owned = true;
+		ldb_queue->id.vf_id = vf_id;
+		i++;
+	}
+
+	i = 0;
+	DLB_FUNC_LIST_FOR(rsrcs->avail_ldb_ports, ldb_port, iter) {
+		ldb_port->id.virt_id = i;
+		ldb_port->id.vf_owned = true;
+		ldb_port->id.vf_id = vf_id;
+		i++;
+	}
+
+	i = 0;
+	DLB_FUNC_LIST_FOR(rsrcs->avail_dir_pq_pairs, dir_port, iter) {
+		dir_port->id.virt_id = i;
+		dir_port->id.vf_owned = true;
+		dir_port->id.vf_id = vf_id;
+		i++;
+	}
+
+	i = 0;
+	DLB_FUNC_LIST_FOR(rsrcs->avail_ldb_credit_pools, pool, iter) {
+		pool->id.virt_id = i;
+		pool->id.vf_owned = true;
+		pool->id.vf_id = vf_id;
+		i++;
+	}
+
+	i = 0;
+	DLB_FUNC_LIST_FOR(rsrcs->avail_dir_credit_pools, pool, iter) {
+		pool->id.virt_id = i;
+		pool->id.vf_owned = true;
+		pool->id.vf_id = vf_id;
+		i++;
+	}
+}
+
+void dlb_lock_vf(struct dlb_hw *hw, unsigned int vf_id)
+{
+	struct dlb_function_resources *rsrcs = &hw->vf[vf_id];
+
+	rsrcs->locked = true;
+
+	dlb_vf_set_rsrc_virt_ids(rsrcs, vf_id);
+}
+
+void dlb_unlock_vf(struct dlb_hw *hw, unsigned int vf_id)
+{
+	hw->vf[vf_id].locked = false;
+}
+
+int dlb_reset_vf_resources(struct dlb_hw *hw, unsigned int vf_id)
+{
+	if (vf_id >= DLB_MAX_NUM_VFS)
+		return -EINVAL;
+
+	/* If the VF is locked, its resource assignment can't be changed */
+	if (dlb_vf_is_locked(hw, vf_id))
+		return -EPERM;
+
+	dlb_update_vf_sched_domains(hw, vf_id, 0);
+	dlb_update_vf_ldb_queues(hw, vf_id, 0);
+	dlb_update_vf_ldb_ports(hw, vf_id, 0);
+	dlb_update_vf_dir_ports(hw, vf_id, 0);
+	dlb_update_vf_ldb_credit_pools(hw, vf_id, 0);
+	dlb_update_vf_dir_credit_pools(hw, vf_id, 0);
+	dlb_update_vf_ldb_credits(hw, vf_id, 0);
+	dlb_update_vf_dir_credits(hw, vf_id, 0);
+	dlb_update_vf_hist_list_entries(hw, vf_id, 0);
+	dlb_update_vf_atomic_inflights(hw, vf_id, 0);
+
+	return 0;
+}
+
+void dlb_hw_enable_sparse_ldb_cq_mode(struct dlb_hw *hw)
+{
+	union dlb_sys_cq_mode r0;
+
+	r0.val = DLB_CSR_RD(hw, DLB_SYS_CQ_MODE);
+
+	r0.field.ldb_cq64 = 1;
+
+	DLB_CSR_WR(hw, DLB_SYS_CQ_MODE, r0.val);
+}
+
+void dlb_hw_enable_sparse_dir_cq_mode(struct dlb_hw *hw)
+{
+	union dlb_sys_cq_mode r0;
+
+	r0.val = DLB_CSR_RD(hw, DLB_SYS_CQ_MODE);
+
+	r0.field.dir_cq64 = 1;
+
+	DLB_CSR_WR(hw, DLB_SYS_CQ_MODE, r0.val);
+}
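+
+/* Illustrative sketch (an assumption of this example: sparse mode is selected
+ * globally before any CQs are configured). A PMD that wants each CQ entry
+ * written on its own cache line would enable both modes at init:
+ *
+ *	dlb_hw_enable_sparse_ldb_cq_mode(hw);
+ *	dlb_hw_enable_sparse_dir_cq_mode(hw);
+ */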
+
+void dlb_hw_set_qe_arbiter_weights(struct dlb_hw *hw, u8 weight[8])
+{
+	union dlb_atm_pipe_ctrl_arb_weights_rdy_bin r0 = { {0} };
+	union dlb_nalb_pipe_ctrl_arb_weights_tqpri_nalb_0 r1 = { {0} };
+	union dlb_nalb_pipe_ctrl_arb_weights_tqpri_nalb_1 r2 = { {0} };
+	union dlb_nalb_pipe_cfg_ctrl_arb_weights_tqpri_replay_0 r3 = { {0} };
+	union dlb_nalb_pipe_cfg_ctrl_arb_weights_tqpri_replay_1 r4 = { {0} };
+	union dlb_dp_cfg_ctrl_arb_weights_tqpri_replay_0 r5 = { {0} };
+	union dlb_dp_cfg_ctrl_arb_weights_tqpri_replay_1 r6 = { {0} };
+	union dlb_dp_cfg_ctrl_arb_weights_tqpri_dir_0 r7 = { {0} };
+	union dlb_dp_cfg_ctrl_arb_weights_tqpri_dir_1 r8 = { {0} };
+	union dlb_nalb_pipe_cfg_ctrl_arb_weights_tqpri_atq_0 r9 = { {0} };
+	union dlb_nalb_pipe_cfg_ctrl_arb_weights_tqpri_atq_1 r10 = { {0} };
+	union dlb_atm_pipe_cfg_ctrl_arb_weights_sched_bin r11 = { {0} };
+	union dlb_aqed_pipe_cfg_ctrl_arb_weights_tqpri_atm_0 r12 = { {0} };
+
+	r0.field.bin0 = weight[1];
+	r0.field.bin1 = weight[3];
+	r0.field.bin2 = weight[5];
+	r0.field.bin3 = weight[7];
+
+	r1.field.pri0 = weight[0];
+	r1.field.pri1 = weight[1];
+	r1.field.pri2 = weight[2];
+	r1.field.pri3 = weight[3];
+	r2.field.pri4 = weight[4];
+	r2.field.pri5 = weight[5];
+	r2.field.pri6 = weight[6];
+	r2.field.pri7 = weight[7];
+
+	r3.field.pri0 = weight[0];
+	r3.field.pri1 = weight[1];
+	r3.field.pri2 = weight[2];
+	r3.field.pri3 = weight[3];
+	r4.field.pri4 = weight[4];
+	r4.field.pri5 = weight[5];
+	r4.field.pri6 = weight[6];
+	r4.field.pri7 = weight[7];
+
+	r5.field.pri0 = weight[0];
+	r5.field.pri1 = weight[1];
+	r5.field.pri2 = weight[2];
+	r5.field.pri3 = weight[3];
+	r6.field.pri4 = weight[4];
+	r6.field.pri5 = weight[5];
+	r6.field.pri6 = weight[6];
+	r6.field.pri7 = weight[7];
+
+	r7.field.pri0 = weight[0];
+	r7.field.pri1 = weight[1];
+	r7.field.pri2 = weight[2];
+	r7.field.pri3 = weight[3];
+	r8.field.pri4 = weight[4];
+	r8.field.pri5 = weight[5];
+	r8.field.pri6 = weight[6];
+	r8.field.pri7 = weight[7];
+
+	r9.field.pri0 = weight[0];
+	r9.field.pri1 = weight[1];
+	r9.field.pri2 = weight[2];
+	r9.field.pri3 = weight[3];
+	r10.field.pri4 = weight[4];
+	r10.field.pri5 = weight[5];
+	r10.field.pri6 = weight[6];
+	r10.field.pri7 = weight[7];
+
+	r11.field.bin0 = weight[1];
+	r11.field.bin1 = weight[3];
+	r11.field.bin2 = weight[5];
+	r11.field.bin3 = weight[7];
+
+	r12.field.pri0 = weight[1];
+	r12.field.pri1 = weight[3];
+	r12.field.pri2 = weight[5];
+	r12.field.pri3 = weight[7];
+
+	DLB_CSR_WR(hw, DLB_ATM_PIPE_CTRL_ARB_WEIGHTS_RDY_BIN, r0.val);
+	DLB_CSR_WR(hw, DLB_NALB_PIPE_CTRL_ARB_WEIGHTS_TQPRI_NALB_0, r1.val);
+	DLB_CSR_WR(hw, DLB_NALB_PIPE_CTRL_ARB_WEIGHTS_TQPRI_NALB_1, r2.val);
+	DLB_CSR_WR(hw,
+		   DLB_NALB_PIPE_CFG_CTRL_ARB_WEIGHTS_TQPRI_REPLAY_0,
+		   r3.val);
+	DLB_CSR_WR(hw,
+		   DLB_NALB_PIPE_CFG_CTRL_ARB_WEIGHTS_TQPRI_REPLAY_1,
+		   r4.val);
+	DLB_CSR_WR(hw, DLB_DP_CFG_CTRL_ARB_WEIGHTS_TQPRI_REPLAY_0, r5.val);
+	DLB_CSR_WR(hw, DLB_DP_CFG_CTRL_ARB_WEIGHTS_TQPRI_REPLAY_1, r6.val);
+	DLB_CSR_WR(hw, DLB_DP_CFG_CTRL_ARB_WEIGHTS_TQPRI_DIR_0, r7.val);
+	DLB_CSR_WR(hw, DLB_DP_CFG_CTRL_ARB_WEIGHTS_TQPRI_DIR_1, r8.val);
+	DLB_CSR_WR(hw, DLB_NALB_PIPE_CFG_CTRL_ARB_WEIGHTS_TQPRI_ATQ_0, r9.val);
+	DLB_CSR_WR(hw, DLB_NALB_PIPE_CFG_CTRL_ARB_WEIGHTS_TQPRI_ATQ_1, r10.val);
+	DLB_CSR_WR(hw, DLB_ATM_PIPE_CFG_CTRL_ARB_WEIGHTS_SCHED_BIN, r11.val);
+	DLB_CSR_WR(hw, DLB_AQED_PIPE_CFG_CTRL_ARB_WEIGHTS_TQPRI_ATM_0, r12.val);
+}
+
+void dlb_hw_set_qid_arbiter_weights(struct dlb_hw *hw, u8 weight[8])
+{
+	union dlb_lsp_cfg_arb_weight_ldb_qid_0 r0 = { {0} };
+	union dlb_lsp_cfg_arb_weight_ldb_qid_1 r1 = { {0} };
+	union dlb_lsp_cfg_arb_weight_atm_nalb_qid_0 r2 = { {0} };
+	union dlb_lsp_cfg_arb_weight_atm_nalb_qid_1 r3 = { {0} };
+
+	r0.field.slot0_weight = weight[0];
+	r0.field.slot1_weight = weight[1];
+	r0.field.slot2_weight = weight[2];
+	r0.field.slot3_weight = weight[3];
+	r1.field.slot4_weight = weight[4];
+	r1.field.slot5_weight = weight[5];
+	r1.field.slot6_weight = weight[6];
+	r1.field.slot7_weight = weight[7];
+
+	r2.field.slot0_weight = weight[0];
+	r2.field.slot1_weight = weight[1];
+	r2.field.slot2_weight = weight[2];
+	r2.field.slot3_weight = weight[3];
+	r3.field.slot4_weight = weight[4];
+	r3.field.slot5_weight = weight[5];
+	r3.field.slot6_weight = weight[6];
+	r3.field.slot7_weight = weight[7];
+
+	DLB_CSR_WR(hw, DLB_LSP_CFG_ARB_WEIGHT_LDB_QID_0, r0.val);
+	DLB_CSR_WR(hw, DLB_LSP_CFG_ARB_WEIGHT_LDB_QID_1, r1.val);
+	DLB_CSR_WR(hw, DLB_LSP_CFG_ARB_WEIGHT_ATM_NALB_QID_0, r2.val);
+	DLB_CSR_WR(hw, DLB_LSP_CFG_ARB_WEIGHT_ATM_NALB_QID_1, r3.val);
+}
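+
+/* Illustrative sketch: programming flat (equal) arbitration across all eight
+ * priority levels, as a PMD might do at device initialization (the weight
+ * value of 1 is an assumption for the example):
+ *
+ *	u8 weights[8] = {1, 1, 1, 1, 1, 1, 1, 1};
+ *
+ *	dlb_hw_set_qe_arbiter_weights(hw, weights);
+ *	dlb_hw_set_qid_arbiter_weights(hw, weights);
+ */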
+
+void dlb_hw_enable_pp_sw_alarms(struct dlb_hw *hw)
+{
+	union dlb_chp_cfg_ldb_pp_sw_alarm_en r0 = { {0} };
+	union dlb_chp_cfg_dir_pp_sw_alarm_en r1 = { {0} };
+	int i;
+
+	r0.field.alarm_enable = 1;
+	r1.field.alarm_enable = 1;
+
+	for (i = 0; i < DLB_MAX_NUM_LDB_PORTS; i++)
+		DLB_CSR_WR(hw, DLB_CHP_CFG_LDB_PP_SW_ALARM_EN(i), r0.val);
+
+	for (i = 0; i < DLB_MAX_NUM_DIR_PORTS; i++)
+		DLB_CSR_WR(hw, DLB_CHP_CFG_DIR_PP_SW_ALARM_EN(i), r1.val);
+}
+
+void dlb_hw_disable_pp_sw_alarms(struct dlb_hw *hw)
+{
+	union dlb_chp_cfg_ldb_pp_sw_alarm_en r0 = { {0} };
+	union dlb_chp_cfg_dir_pp_sw_alarm_en r1 = { {0} };
+	int i;
+
+	r0.field.alarm_enable = 0;
+	r1.field.alarm_enable = 0;
+
+	for (i = 0; i < DLB_MAX_NUM_LDB_PORTS; i++)
+		DLB_CSR_WR(hw, DLB_CHP_CFG_LDB_PP_SW_ALARM_EN(i), r0.val);
+
+	for (i = 0; i < DLB_MAX_NUM_DIR_PORTS; i++)
+		DLB_CSR_WR(hw, DLB_CHP_CFG_DIR_PP_SW_ALARM_EN(i), r1.val);
+}
diff --git a/drivers/event/dlb/pf/base/dlb_resource.h b/drivers/event/dlb/pf/base/dlb_resource.h
new file mode 100644
index 0000000..5500f9b
--- /dev/null
+++ b/drivers/event/dlb/pf/base/dlb_resource.h
@@ -0,0 +1,1625 @@
+/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause)
+ * Copyright(c) 2016-2020 Intel Corporation
+ */
+
+#ifndef __DLB_RESOURCE_H
+#define __DLB_RESOURCE_H
+
+#include "dlb_hw_types.h"
+#include "dlb_osdep_types.h"
+#include "dlb_user.h"
+
+/**
+ * dlb_resource_init() - initialize the device
+ * @hw: pointer to struct dlb_hw.
+ *
+ * This function initializes the device's software state (pointed to by the hw
+ * argument) and programs global scheduling QoS registers. This function should
+ * be called during driver initialization.
+ *
+ * The dlb_hw struct must be unique per DLB device and persist until the device
+ * is reset.
+ *
+ * Return:
+ * Returns 0 upon success, -1 otherwise.
+ */
+int dlb_resource_init(struct dlb_hw *hw);
+
+/**
+ * dlb_resource_free() - free device state memory
+ * @hw: dlb_hw handle for a particular device.
+ *
+ * This function frees software state pointed to by dlb_hw. This function
+ * should be called when resetting the device or unloading the driver.
+ */
+void dlb_resource_free(struct dlb_hw *hw);
+
+/**
+ * dlb_resource_reset() - reset in-use resources to their initial state
+ * @hw: dlb_hw handle for a particular device.
+ *
+ * This function resets in-use resources, and makes them available for use.
+ * All resources go back to their owning function, whether a PF or a VF.
+ */
+void dlb_resource_reset(struct dlb_hw *hw);
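+
+/* Illustrative lifecycle sketch (error handling abbreviated):
+ * dlb_resource_init() is called once at driver probe, dlb_resource_reset()
+ * on a device-level reset to reclaim in-use resources, and
+ * dlb_resource_free() at driver unload:
+ *
+ *	if (dlb_resource_init(hw))
+ *		return -EINVAL;
+ *	...
+ *	dlb_resource_reset(hw);
+ *	dlb_resource_free(hw);
+ */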
+
+/**
+ * dlb_hw_create_sched_domain() - create a scheduling domain
+ * @hw: dlb_hw handle for a particular device.
+ * @args: scheduling domain creation arguments.
+ * @resp: response structure.
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * This function creates a scheduling domain containing the resources specified
+ * in args. The individual resources (queues, ports, credit pools) can be
+ * configured after creating a scheduling domain.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise. If an error occurs, resp->status is
+ * assigned a detailed error code from enum dlb_error. If successful, resp->id
+ * contains the domain ID.
+ *
+ * Note: resp->id contains a virtual ID if vf_request is true.
+ *
+ * Errors:
+ * EINVAL - A requested resource is unavailable, or the requested domain name
+ *	    is already in use.
+ * EFAULT - Internal error (resp->status not set).
+ */
+int dlb_hw_create_sched_domain(struct dlb_hw *hw,
+			       struct dlb_create_sched_domain_args *args,
+			       struct dlb_cmd_response *resp,
+			       bool vf_request,
+			       unsigned int vf_id);
+
+/**
+ * dlb_hw_create_ldb_pool() - create a load-balanced credit pool
+ * @hw: dlb_hw handle for a particular device.
+ * @domain_id: domain ID.
+ * @args: credit pool creation arguments.
+ * @resp: response structure.
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * This function creates a load-balanced credit pool containing the number of
+ * requested credits.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise. If an error occurs, resp->status is
+ * assigned a detailed error code from enum dlb_error. If successful, resp->id
+ * contains the pool ID.
+ *
+ * Note: resp->id contains a virtual ID if vf_request is true.
+ *
+ * Errors:
+ * EINVAL - A requested resource is unavailable, the domain is not configured,
+ *	    or the domain has already been started.
+ * EFAULT - Internal error (resp->status not set).
+ */
+int dlb_hw_create_ldb_pool(struct dlb_hw *hw,
+			   u32 domain_id,
+			   struct dlb_create_ldb_pool_args *args,
+			   struct dlb_cmd_response *resp,
+			   bool vf_request,
+			   unsigned int vf_id);
+
+/**
+ * dlb_hw_create_dir_pool() - create a directed credit pool
+ * @hw: dlb_hw handle for a particular device.
+ * @domain_id: domain ID.
+ * @args: credit pool creation arguments.
+ * @resp: response structure.
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * This function creates a directed credit pool containing the number of
+ * requested credits.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise. If an error occurs, resp->status is
+ * assigned a detailed error code from enum dlb_error. If successful, resp->id
+ * contains the pool ID.
+ *
+ * Note: resp->id contains a virtual ID if vf_request is true.
+ *
+ * Errors:
+ * EINVAL - A requested resource is unavailable, the domain is not configured,
+ *	    or the domain has already been started.
+ * EFAULT - Internal error (resp->status not set).
+ */
+int dlb_hw_create_dir_pool(struct dlb_hw *hw,
+			   u32 domain_id,
+			   struct dlb_create_dir_pool_args *args,
+			   struct dlb_cmd_response *resp,
+			   bool vf_request,
+			   unsigned int vf_id);
+
+/**
+ * dlb_hw_create_ldb_queue() - create a load-balanced queue
+ * @hw: dlb_hw handle for a particular device.
+ * @domain_id: domain ID.
+ * @args: queue creation arguments.
+ * @resp: response structure.
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * This function creates a load-balanced queue.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise. If an error occurs, resp->status is
+ * assigned a detailed error code from enum dlb_error. If successful, resp->id
+ * contains the queue ID.
+ *
+ * Note: resp->id contains a virtual ID if vf_request is true.
+ *
+ * Errors:
+ * EINVAL - A requested resource is unavailable, the domain is not configured,
+ *	    the domain has already been started, or the requested queue name is
+ *	    already in use.
+ * EFAULT - Internal error (resp->status not set).
+ */
+int dlb_hw_create_ldb_queue(struct dlb_hw *hw,
+			    u32 domain_id,
+			    struct dlb_create_ldb_queue_args *args,
+			    struct dlb_cmd_response *resp,
+			    bool vf_request,
+			    unsigned int vf_id);
+
+/**
+ * dlb_hw_create_dir_queue() - create a directed queue
+ * @hw: dlb_hw handle for a particular device.
+ * @domain_id: domain ID.
+ * @args: queue creation arguments.
+ * @resp: response structure.
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * This function creates a directed queue.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise. If an error occurs, resp->status is
+ * assigned a detailed error code from enum dlb_error. If successful, resp->id
+ * contains the queue ID.
+ *
+ * Note: resp->id contains a virtual ID if vf_request is true.
+ *
+ * Errors:
+ * EINVAL - A requested resource is unavailable, the domain is not configured,
+ *	    or the domain has already been started.
+ * EFAULT - Internal error (resp->status not set).
+ */
+int dlb_hw_create_dir_queue(struct dlb_hw *hw,
+			    u32 domain_id,
+			    struct dlb_create_dir_queue_args *args,
+			    struct dlb_cmd_response *resp,
+			    bool vf_request,
+			    unsigned int vf_id);
+
+/**
+ * dlb_hw_create_dir_port() - create a directed port
+ * @hw: dlb_hw handle for a particular device.
+ * @domain_id: domain ID.
+ * @args: port creation arguments.
+ * @pop_count_dma_base: base address of the pop count memory. This can be
+ *			a PA or an IOVA.
+ * @cq_dma_base: base address of the CQ memory. This can be a PA or an IOVA.
+ * @resp: response structure.
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * This function creates a directed port.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise. If an error occurs, resp->status is
+ * assigned a detailed error code from enum dlb_error. If successful, resp->id
+ * contains the port ID.
+ *
+ * Note: resp->id contains a virtual ID if vf_request is true.
+ *
+ * Errors:
+ * EINVAL - A requested resource is unavailable, a credit setting is invalid, a
+ *	    pool ID is invalid, a pointer address is not properly aligned, the
+ *	    domain is not configured, or the domain has already been started.
+ * EFAULT - Internal error (resp->status not set).
+ */
+int dlb_hw_create_dir_port(struct dlb_hw *hw,
+			   u32 domain_id,
+			   struct dlb_create_dir_port_args *args,
+			   u64 pop_count_dma_base,
+			   u64 cq_dma_base,
+			   struct dlb_cmd_response *resp,
+			   bool vf_request,
+			   unsigned int vf_id);
+
+/**
+ * dlb_hw_create_ldb_port() - create a load-balanced port
+ * @hw: dlb_hw handle for a particular device.
+ * @domain_id: domain ID.
+ * @args: port creation arguments.
+ * @pop_count_dma_base: base address of the pop count memory. This can be
+ *			a PA or an IOVA.
+ * @cq_dma_base: base address of the CQ memory. This can be a PA or an IOVA.
+ * @resp: response structure.
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * This function creates a load-balanced port.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise. If an error occurs, resp->status is
+ * assigned a detailed error code from enum dlb_error. If successful, resp->id
+ * contains the port ID.
+ *
+ * Note: resp->id contains a virtual ID if vf_request is true.
+ *
+ * Errors:
+ * EINVAL - A requested resource is unavailable, a credit setting is invalid, a
+ *	    pool ID is invalid, a pointer address is not properly aligned, the
+ *	    domain is not configured, or the domain has already been started.
+ * EFAULT - Internal error (resp->status not set).
+ */
+int dlb_hw_create_ldb_port(struct dlb_hw *hw,
+			   u32 domain_id,
+			   struct dlb_create_ldb_port_args *args,
+			   u64 pop_count_dma_base,
+			   u64 cq_dma_base,
+			   struct dlb_cmd_response *resp,
+			   bool vf_request,
+			   unsigned int vf_id);
+
+/**
+ * dlb_hw_start_domain() - start a scheduling domain
+ * @hw: dlb_hw handle for a particular device.
+ * @domain_id: domain ID.
+ * @args: start domain arguments.
+ * @resp: response structure.
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * This function starts a scheduling domain, which allows applications to send
+ * traffic through it. Once a domain is started, its resources can no longer be
+ * configured (besides QID remapping and port enable/disable).
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise. If an error occurs, resp->status is
+ * assigned a detailed error code from enum dlb_error.
+ *
+ * Errors:
+ * EINVAL - the domain is not configured, or the domain is already started.
+ */
+int dlb_hw_start_domain(struct dlb_hw *hw,
+			u32 domain_id,
+			struct dlb_start_domain_args *args,
+			struct dlb_cmd_response *resp,
+			bool vf_request,
+			unsigned int vf_id);
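+
+/* Illustrative configuration order (a sketch only: the argument structures
+ * are filled in by the caller, PF-originated requests are shown, and error
+ * handling is omitted). Resources must be configured before the domain is
+ * started:
+ *
+ *	dlb_hw_create_sched_domain(hw, &dom_args, &resp, false, 0);
+ *	dom_id = resp.id;
+ *	dlb_hw_create_ldb_pool(hw, dom_id, &pool_args, &resp, false, 0);
+ *	dlb_hw_create_ldb_queue(hw, dom_id, &queue_args, &resp, false, 0);
+ *	dlb_hw_create_ldb_port(hw, dom_id, &port_args, pop_base, cq_base,
+ *			       &resp, false, 0);
+ *	dlb_hw_start_domain(hw, dom_id, &start_args, &resp, false, 0);
+ */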
+
+/**
+ * dlb_hw_map_qid() - map a load-balanced queue to a load-balanced port
+ * @hw: dlb_hw handle for a particular device.
+ * @domain_id: domain ID.
+ * @args: map QID arguments.
+ * @resp: response structure.
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * This function configures the DLB to schedule QEs from the specified queue to
+ * the specified port. Each load-balanced port can be mapped to up to 8 queues;
+ * each load-balanced queue can potentially map to all the load-balanced ports.
+ *
+ * A successful return does not necessarily mean the mapping was configured. If
+ * this function is unable to immediately map the queue to the port, it will
+ * add the requested operation to a per-port list of pending map/unmap
+ * operations, and (if it's not already running) launch a kernel thread that
+ * periodically attempts to process all pending operations. In a sense, this is
+ * an asynchronous function.
+ *
+ * This asynchronicity creates two views of the state of hardware: the actual
+ * hardware state and the requested state (as if every request completed
+ * immediately). If there are any pending map/unmap operations, the requested
+ * state will differ from the actual state. All validation is performed with
+ * respect to the pending state; for instance, if there are 8 pending map
+ * operations for port X, a request for a 9th will fail because a load-balanced
+ * port can only map up to 8 queues.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise. If an error occurs, resp->status is
+ * assigned a detailed error code from enum dlb_error.
+ *
+ * Errors:
+ * EINVAL - A requested resource is unavailable, invalid port or queue ID, or
+ *	    the domain is not configured.
+ * EFAULT - Internal error (resp->status not set).
+ */
+int dlb_hw_map_qid(struct dlb_hw *hw,
+		   u32 domain_id,
+		   struct dlb_map_qid_args *args,
+		   struct dlb_cmd_response *resp,
+		   bool vf_request,
+		   unsigned int vf_id);
+
+/**
+ * dlb_hw_unmap_qid() - Unmap a load-balanced queue from a load-balanced port
+ * @hw: dlb_hw handle for a particular device.
+ * @domain_id: domain ID.
+ * @args: unmap QID arguments.
+ * @resp: response structure.
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * This function configures the DLB to stop scheduling QEs from the specified
+ * queue to the specified port.
+ *
+ * A successful return does not necessarily mean the mapping was removed. If
+ * this function is unable to immediately unmap the queue from the port, it
+ * will add the requested operation to a per-port list of pending map/unmap
+ * operations, and (if it's not already running) launch a kernel thread that
+ * periodically attempts to process all pending operations. See
+ * dlb_hw_map_qid() for more details.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise. If an error occurs, resp->status is
+ * assigned a detailed error code from enum dlb_error.
+ *
+ * Errors:
+ * EINVAL - A requested resource is unavailable, invalid port or queue ID, or
+ *	    the domain is not configured.
+ * EFAULT - Internal error (resp->status not set).
+ */
+int dlb_hw_unmap_qid(struct dlb_hw *hw,
+		     u32 domain_id,
+		     struct dlb_unmap_qid_args *args,
+		     struct dlb_cmd_response *resp,
+		     bool vf_request,
+		     unsigned int vf_id);
+
+/**
+ * dlb_finish_unmap_qid_procedures() - finish any pending unmap procedures
+ * @hw: dlb_hw handle for a particular device.
+ *
+ * This function attempts to finish any outstanding unmap procedures.
+ * This function should be called by the kernel thread responsible for
+ * finishing map/unmap procedures.
+ *
+ * Return:
+ * Returns the number of procedures that weren't completed.
+ */
+unsigned int dlb_finish_unmap_qid_procedures(struct dlb_hw *hw);
+
+/**
+ * dlb_finish_map_qid_procedures() - finish any pending map procedures
+ * @hw: dlb_hw handle for a particular device.
+ *
+ * This function attempts to finish any outstanding map procedures.
+ * This function should be called by the kernel thread responsible for
+ * finishing map/unmap procedures.
+ *
+ * Return:
+ * Returns the number of procedures that weren't completed.
+ */
+unsigned int dlb_finish_map_qid_procedures(struct dlb_hw *hw);
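+
+/*
+ * Illustrative sketch (not part of this patch): a helper that the driver's
+ * map/unmap kernel thread could call to drain the pending operations
+ * described above; it returns true when work remains and the thread should
+ * run again. The name dlb_drain_pending_ops() is hypothetical.
+ *
+ *	static bool dlb_drain_pending_ops(struct dlb_hw *hw)
+ *	{
+ *		unsigned int remaining;
+ *
+ *		remaining = dlb_finish_unmap_qid_procedures(hw);
+ *		remaining += dlb_finish_map_qid_procedures(hw);
+ *
+ *		return remaining != 0;
+ *	}
+ */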
+
+/**
+ * dlb_hw_enable_ldb_port() - enable a load-balanced port for scheduling
+ * @hw: dlb_hw handle for a particular device.
+ * @domain_id: domain ID.
+ * @args: port enable arguments.
+ * @resp: response structure.
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * This function configures the DLB to schedule QEs to a load-balanced port.
+ * Ports are enabled by default.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise. If an error occurs, resp->status is
+ * assigned a detailed error code from enum dlb_error.
+ *
+ * Errors:
+ * EINVAL - The port ID is invalid or the domain is not configured.
+ * EFAULT - Internal error (resp->status not set).
+ */
+int dlb_hw_enable_ldb_port(struct dlb_hw *hw,
+			   u32 domain_id,
+			   struct dlb_enable_ldb_port_args *args,
+			   struct dlb_cmd_response *resp,
+			   bool vf_request,
+			   unsigned int vf_id);
+
+/**
+ * dlb_hw_disable_ldb_port() - disable a load-balanced port for scheduling
+ * @hw: dlb_hw handle for a particular device.
+ * @domain_id: domain ID.
+ * @args: port disable arguments.
+ * @resp: response structure.
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * This function configures the DLB to stop scheduling QEs to a load-balanced
+ * port. Ports are enabled by default.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise. If an error occurs, resp->status is
+ * assigned a detailed error code from enum dlb_error.
+ *
+ * Errors:
+ * EINVAL - The port ID is invalid or the domain is not configured.
+ * EFAULT - Internal error (resp->status not set).
+ */
+int dlb_hw_disable_ldb_port(struct dlb_hw *hw,
+			    u32 domain_id,
+			    struct dlb_disable_ldb_port_args *args,
+			    struct dlb_cmd_response *resp,
+			    bool vf_request,
+			    unsigned int vf_id);
+
+/**
+ * dlb_hw_enable_dir_port() - enable a directed port for scheduling
+ * @hw: dlb_hw handle for a particular device.
+ * @domain_id: domain ID.
+ * @args: port enable arguments.
+ * @resp: response structure.
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * This function configures the DLB to schedule QEs to a directed port.
+ * Ports are enabled by default.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise. If an error occurs, resp->status is
+ * assigned a detailed error code from enum dlb_error.
+ *
+ * Errors:
+ * EINVAL - The port ID is invalid or the domain is not configured.
+ * EFAULT - Internal error (resp->status not set).
+ */
+int dlb_hw_enable_dir_port(struct dlb_hw *hw,
+			   u32 domain_id,
+			   struct dlb_enable_dir_port_args *args,
+			   struct dlb_cmd_response *resp,
+			   bool vf_request,
+			   unsigned int vf_id);
+
+/**
+ * dlb_hw_disable_dir_port() - disable a directed port for scheduling
+ * @hw: dlb_hw handle for a particular device.
+ * @domain_id: domain ID.
+ * @args: port disable arguments.
+ * @resp: response structure.
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * This function configures the DLB to stop scheduling QEs to a directed port.
+ * Ports are enabled by default.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise. If an error occurs, resp->status is
+ * assigned a detailed error code from enum dlb_error.
+ *
+ * Errors:
+ * EINVAL - The port ID is invalid or the domain is not configured.
+ * EFAULT - Internal error (resp->status not set).
+ */
+int dlb_hw_disable_dir_port(struct dlb_hw *hw,
+			    u32 domain_id,
+			    struct dlb_disable_dir_port_args *args,
+			    struct dlb_cmd_response *resp,
+			    bool vf_request,
+			    unsigned int vf_id);
+
+/**
+ * dlb_configure_ldb_cq_interrupt() - configure load-balanced CQ for interrupts
+ * @hw: dlb_hw handle for a particular device.
+ * @port_id: load-balanced port ID.
+ * @vector: interrupt vector ID. Should be 0 for MSI or compressed MSI-X mode,
+ *	    else a value up to 64.
+ * @mode: interrupt type (DLB_CQ_ISR_MODE_MSI or DLB_CQ_ISR_MODE_MSIX)
+ * @vf: If the port is VF-owned, the VF's ID. This is used for translating the
+ *	virtual port ID to a physical port ID. Ignored if mode is not MSI.
+ * @owner_vf: the VF to route the interrupt to. Ignored if mode is not MSI.
+ * @threshold: the minimum CQ depth at which the interrupt can fire. Must be
+ *	greater than 0.
+ *
+ * This function configures the DLB registers for a load-balanced CQ's interrupts.
+ * This doesn't enable the CQ's interrupt; that can be done with
+ * dlb_arm_cq_interrupt() or through an interrupt arm QE.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise.
+ *
+ * Errors:
+ * EINVAL - The port ID is invalid.
+ */
+int dlb_configure_ldb_cq_interrupt(struct dlb_hw *hw,
+				   int port_id,
+				   int vector,
+				   int mode,
+				   unsigned int vf,
+				   unsigned int owner_vf,
+				   u16 threshold);
+
+/**
+ * dlb_configure_dir_cq_interrupt() - configure directed CQ for interrupts
+ * @hw: dlb_hw handle for a particular device.
+ * @port_id: directed port ID.
+ * @vector: interrupt vector ID. Should be 0 for MSI or compressed MSI-X mode,
+ *	    else a value up to 64.
+ * @mode: interrupt type (DLB_CQ_ISR_MODE_MSI or DLB_CQ_ISR_MODE_MSIX)
+ * @vf: If the port is VF-owned, the VF's ID. This is used for translating the
+ *	virtual port ID to a physical port ID. Ignored if mode is not MSI.
+ * @owner_vf: the VF to route the interrupt to. Ignored if mode is not MSI.
+ * @threshold: the minimum CQ depth at which the interrupt can fire. Must be
+ *	greater than 0.
+ *
+ * This function configures the DLB registers for a directed CQ's interrupts.
+ * This doesn't enable the CQ's interrupt; that can be done with
+ * dlb_arm_cq_interrupt() or through an interrupt arm QE.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise.
+ *
+ * Errors:
+ * EINVAL - The port ID is invalid.
+ */
+int dlb_configure_dir_cq_interrupt(struct dlb_hw *hw,
+				   int port_id,
+				   int vector,
+				   int mode,
+				   unsigned int vf,
+				   unsigned int owner_vf,
+				   u16 threshold);
+
+/**
+ * dlb_enable_alarm_interrupts() - enable certain hardware alarm interrupts
+ * @hw: dlb_hw handle for a particular device.
+ *
+ * This function configures the ingress error alarm. (Other alarms are enabled
+ * by default.)
+ */
+void dlb_enable_alarm_interrupts(struct dlb_hw *hw);
+
+/**
+ * dlb_disable_alarm_interrupts() - disable certain hardware alarm interrupts
+ * @hw: dlb_hw handle for a particular device.
+ *
+ * This function configures the ingress error alarm. (Other alarms are disabled
+ * by default.)
+ */
+void dlb_disable_alarm_interrupts(struct dlb_hw *hw);
+
+/**
+ * dlb_set_msix_mode() - configure the device's MSI-X mode
+ * @hw: dlb_hw handle for a particular device.
+ * @mode: MSI-X mode (DLB_MSIX_MODE_PACKED or DLB_MSIX_MODE_COMPRESSED)
+ *
+ * This function configures the hardware to use either packed or compressed
+ * mode. This function should not be called if using MSI interrupts.
+ */
+void dlb_set_msix_mode(struct dlb_hw *hw, int mode);
+
+/**
+ * dlb_arm_cq_interrupt() - arm a CQ's interrupt
+ * @hw: dlb_hw handle for a particular device.
+ * @port_id: port ID
+ * @is_ldb: true for load-balanced port, false for a directed port
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * This function arms the CQ's interrupt. The CQ must be configured prior to
+ * calling this function.
+ *
+ * The function does no parameter validation; that is the caller's
+ * responsibility.
+ *
+ * Return: returns 0 upon success, <0 otherwise.
+ *
+ * EINVAL - Invalid port ID.
+ */
+int dlb_arm_cq_interrupt(struct dlb_hw *hw,
+			 int port_id,
+			 bool is_ldb,
+			 bool vf_request,
+			 unsigned int vf_id);
+
+/**
+ * dlb_read_compressed_cq_intr_status() - read compressed CQ interrupt status
+ * @hw: dlb_hw handle for a particular device.
+ * @ldb_interrupts: 2-entry array of u32 bitmaps
+ * @dir_interrupts: 4-entry array of u32 bitmaps
+ *
+ * This function can be called from a compressed CQ interrupt handler to
+ * determine which CQ interrupts have fired. The caller should take appropriate
+ * action (such as waking threads blocked on a CQ's interrupt), then ack the
+ * interrupts with dlb_ack_compressed_cq_intr().
+ */
+void dlb_read_compressed_cq_intr_status(struct dlb_hw *hw,
+					u32 *ldb_interrupts,
+					u32 *dir_interrupts);
+
+/**
+ * dlb_ack_compressed_cq_intr() - ack compressed CQ interrupts
+ * @hw: dlb_hw handle for a particular device.
+ * @ldb_interrupts: 2-entry array of u32 bitmaps
+ * @dir_interrupts: 4-entry array of u32 bitmaps
+ *
+ * This function ACKs compressed CQ interrupts. Its arguments should be the
+ * same ones passed to dlb_read_compressed_cq_intr_status().
+ */
+void dlb_ack_compressed_cq_intr(struct dlb_hw *hw,
+				u32 *ldb_interrupts,
+				u32 *dir_interrupts);
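+
+/*
+ * Illustrative sketch (not part of this patch) of how the two functions above
+ * pair up in a compressed-mode CQ interrupt handler. The wake-up step is
+ * driver specific; wake_blocked_cq_threads() is a hypothetical helper.
+ *
+ *	static void dlb_compressed_cq_isr(struct dlb_hw *hw)
+ *	{
+ *		u32 ldb_ints[2], dir_ints[4];
+ *
+ *		dlb_read_compressed_cq_intr_status(hw, ldb_ints, dir_ints);
+ *
+ *		wake_blocked_cq_threads(ldb_ints, dir_ints);
+ *
+ *		dlb_ack_compressed_cq_intr(hw, ldb_ints, dir_ints);
+ *	}
+ */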
+
+/**
+ * dlb_read_vf_intr_status() - read the VF interrupt status register
+ * @hw: dlb_hw handle for a particular device.
+ *
+ * This function can be called from a VF's interrupt handler to determine
+ * which interrupts have fired. The first 31 bits correspond to CQ interrupt
+ * vectors, and the final bit is for the PF->VF mailbox interrupt vector.
+ *
+ * Return:
+ * Returns a bit vector indicating which interrupt vectors are active.
+ */
+u32 dlb_read_vf_intr_status(struct dlb_hw *hw);
+
+/**
+ * dlb_ack_vf_intr_status() - ack VF interrupts
+ * @hw: dlb_hw handle for a particular device.
+ * @interrupts: 32-bit bitmap
+ *
+ * This function ACKs a VF's interrupts. Its interrupts argument should be the
+ * value returned by dlb_read_vf_intr_status().
+ */
+void dlb_ack_vf_intr_status(struct dlb_hw *hw, u32 interrupts);
+
+/**
+ * dlb_ack_vf_msi_intr() - ack VF MSI interrupt
+ * @hw: dlb_hw handle for a particular device.
+ * @interrupts: 32-bit bitmap
+ *
+ * This function clears the VF's MSI interrupt pending register. Its interrupts
+ * argument should contain the MSI vectors to ACK. For example, if the MSI MME
+ * is in mode 0, then only bit 0 should ever be set.
+ */
+void dlb_ack_vf_msi_intr(struct dlb_hw *hw, u32 interrupts);
+
+/**
+ * dlb_ack_pf_mbox_int() - ack PF->VF mailbox interrupt
+ * @hw: dlb_hw handle for a particular device.
+ *
+ * When done processing the PF mailbox request, this function unsets
+ * the PF's mailbox ISR register.
+ */
+void dlb_ack_pf_mbox_int(struct dlb_hw *hw);
+
+/**
+ * dlb_read_vf_to_pf_int_bitvec() - return a bit vector of all requesting VFs
+ * @hw: dlb_hw handle for a particular device.
+ *
+ * When the VF->PF ISR fires, this function can be called to determine which
+ * VF(s) are requesting service. This bitvector must be passed to
+ * dlb_ack_vf_to_pf_int() when processing is complete for all requesting VFs.
+ *
+ * Return:
+ * Returns a bit vector indicating which VFs (0-15) have requested service.
+ */
+u32 dlb_read_vf_to_pf_int_bitvec(struct dlb_hw *hw);
+
+/**
+ * dlb_ack_vf_mbox_int() - ack processed VF->PF mailbox interrupt
+ * @hw: dlb_hw handle for a particular device.
+ * @bitvec: bit vector returned by dlb_read_vf_to_pf_int_bitvec()
+ *
+ * When done processing all VF mailbox requests, this function unsets the VF's
+ * mailbox ISR register.
+ */
+void dlb_ack_vf_mbox_int(struct dlb_hw *hw, u32 bitvec);
+
+/**
+ * dlb_read_vf_flr_int_bitvec() - return a bit vector of all VFs requesting FLR
+ * @hw: dlb_hw handle for a particular device.
+ *
+ * When the VF FLR ISR fires, this function can be called to determine which
+ * VF(s) are requesting FLRs. This bitvector must be passed to
+ * dlb_ack_vf_flr_int() when processing is complete for all requesting VFs.
+ *
+ * Return:
+ * Returns a bit vector indicating which VFs (0-15) have requested FLRs.
+ */
+u32 dlb_read_vf_flr_int_bitvec(struct dlb_hw *hw);
+
+/**
+ * dlb_ack_vf_flr_int() - ack processed VF FLR interrupt(s)
+ * @hw: dlb_hw handle for a particular device.
+ * @bitvec: bit vector returned by dlb_read_vf_flr_int_bitvec()
+ * @a_stepping: device is A-stepping
+ *
+ * When done processing all VF FLR requests, this function unsets the VF's FLR
+ * ISR register.
+ *
+ * Note: The caller must ensure dlb_set_vf_reset_in_progress(),
+ * dlb_clr_vf_reset_in_progress(), and dlb_ack_vf_flr_int() are not executed in
+ * parallel, because the reset-in-progress register does not support atomic
+ * updates on A-stepping devices.
+ */
+void dlb_ack_vf_flr_int(struct dlb_hw *hw, u32 bitvec, bool a_stepping);
+
+/**
+ * dlb_ack_vf_to_pf_int() - ack processed VF mbox and FLR interrupt(s)
+ * @hw: dlb_hw handle for a particular device.
+ * @mbox_bitvec: bit vector returned by dlb_read_vf_to_pf_int_bitvec()
+ * @flr_bitvec: bit vector returned by dlb_read_vf_flr_int_bitvec()
+ *
+ * When done processing all VF requests, this function communicates to the
+ * hardware that processing is complete. When this function completes, hardware
+ * can immediately generate another VF mbox or FLR interrupt.
+ */
+void dlb_ack_vf_to_pf_int(struct dlb_hw *hw,
+			  u32 mbox_bitvec,
+			  u32 flr_bitvec);
+
+/**
+ * dlb_process_alarm_interrupt() - process an alarm interrupt
+ * @hw: dlb_hw handle for a particular device.
+ *
+ * This function reads the alarm syndrome, logs it, and acks the interrupt.
+ * This function should be called from the alarm interrupt handler when
+ * interrupt vector DLB_INT_ALARM fires.
+ */
+void dlb_process_alarm_interrupt(struct dlb_hw *hw);
+
+/**
+ * dlb_process_ingress_error_interrupt() - process ingress error interrupts
+ * @hw: dlb_hw handle for a particular device.
+ *
+ * This function reads the alarm syndrome, logs it, notifies user-space, and
+ * acks the interrupt. This function should be called from the alarm interrupt
+ * handler when interrupt vector DLB_INT_INGRESS_ERROR fires.
+ */
+void dlb_process_ingress_error_interrupt(struct dlb_hw *hw);
+
+/**
+ * dlb_get_group_sequence_numbers() - return a group's number of SNs per queue
+ * @hw: dlb_hw handle for a particular device.
+ * @group_id: sequence number group ID.
+ *
+ * This function returns the configured number of sequence numbers per queue
+ * for the specified group.
+ *
+ * Return:
+ * Returns -EINVAL if group_id is invalid, else the group's SNs per queue.
+ */
+int dlb_get_group_sequence_numbers(struct dlb_hw *hw, unsigned int group_id);
+
+/**
+ * dlb_get_group_sequence_number_occupancy() - return a group's in-use slots
+ * @hw: dlb_hw handle for a particular device.
+ * @group_id: sequence number group ID.
+ *
+ * This function returns the group's number of in-use slots (i.e. load-balanced
+ * queues using the specified group).
+ *
+ * Return:
+ * Returns -EINVAL if group_id is invalid, else the group's occupancy.
+ */
+int dlb_get_group_sequence_number_occupancy(struct dlb_hw *hw,
+					    unsigned int group_id);
+
+/**
+ * dlb_set_group_sequence_numbers() - assign a group's number of SNs per queue
+ * @hw: dlb_hw handle for a particular device.
+ * @group_id: sequence number group ID.
+ * @val: requested amount of sequence numbers per queue.
+ *
+ * This function configures the group's number of sequence numbers per queue.
+ * val can be a power-of-two between 32 and 1024, inclusive. This setting can
+ * be configured until the first ordered load-balanced queue is configured, at
+ * which point the configuration is locked.
+ *
+ * Return:
+ * Returns 0 upon success; -EINVAL if group_id or val is invalid, -EPERM if an
+ * ordered queue is configured.
+ */
+int dlb_set_group_sequence_numbers(struct dlb_hw *hw,
+				   unsigned int group_id,
+				   unsigned long val);
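+
+/*
+ * Illustrative call (not part of this patch): configure sequence number
+ * group 0 for 512 sequence numbers per queue before any ordered
+ * load-balanced queue is configured.
+ *
+ *	if (dlb_set_group_sequence_numbers(hw, 0, 512))
+ *		return -1;
+ */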
+
+/**
+ * dlb_reset_domain() - reset a scheduling domain
+ * @hw: dlb_hw handle for a particular device.
+ * @domain_id: domain ID.
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * This function resets and frees a DLB scheduling domain and its associated
+ * resources.
+ *
+ * Pre-condition: the driver must ensure software has stopped sending QEs
+ * through this domain's producer ports before invoking this function, or
+ * undefined behavior will result.
+ *
+ * Return:
+ * Returns 0 upon success, -1 otherwise.
+ *
+ * EINVAL - Invalid domain ID, or the domain is not configured.
+ * EFAULT - Internal error. (Possibly caused if the pre-condition described
+ *	    above is not met.)
+ * ETIMEDOUT - Hardware component didn't reset in the expected time.
+ */
+int dlb_reset_domain(struct dlb_hw *hw,
+		     u32 domain_id,
+		     bool vf_request,
+		     unsigned int vf_id);
+
+/**
+ * dlb_ldb_port_owned_by_domain() - query whether a port is owned by a domain
+ * @hw: dlb_hw handle for a particular device.
+ * @domain_id: domain ID.
+ * @port_id: load-balanced port ID.
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * This function returns whether a load-balanced port is owned by a specified
+ * domain.
+ *
+ * Return:
+ * Returns 0 if false, 1 if true, <0 otherwise.
+ *
+ * EINVAL - Invalid domain or port ID, or the domain is not configured.
+ */
+int dlb_ldb_port_owned_by_domain(struct dlb_hw *hw,
+				 u32 domain_id,
+				 u32 port_id,
+				 bool vf_request,
+				 unsigned int vf_id);
+
+/**
+ * dlb_dir_port_owned_by_domain() - query whether a port is owned by a domain
+ * @hw: dlb_hw handle for a particular device.
+ * @domain_id: domain ID.
+ * @port_id: directed port ID.
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * This function returns whether a directed port is owned by a specified
+ * domain.
+ *
+ * Return:
+ * Returns 0 if false, 1 if true, <0 otherwise.
+ *
+ * EINVAL - Invalid domain or port ID, or the domain is not configured.
+ */
+int dlb_dir_port_owned_by_domain(struct dlb_hw *hw,
+				 u32 domain_id,
+				 u32 port_id,
+				 bool vf_request,
+				 unsigned int vf_id);
+
+/**
+ * dlb_hw_get_num_resources() - query the PCI function's available resources
+ * @arg: pointer to resource counts.
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * This function returns the number of available resources for the PF or for a
+ * VF.
+ *
+ * Return:
+ * Returns 0 upon success, -1 if vf_request is true and vf_id is invalid.
+ */
+int dlb_hw_get_num_resources(struct dlb_hw *hw,
+			     struct dlb_get_num_resources_args *arg,
+			     bool vf_request,
+			     unsigned int vf_id);
+
+/**
+ * dlb_hw_get_num_used_resources() - query the PCI function's used resources
+ * @arg: pointer to resource counts.
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * This function returns the number of resources in use by the PF or a VF. It
+ * fills in the fields that arg points to, except the following:
+ * - max_contiguous_atomic_inflights
+ * - max_contiguous_hist_list_entries
+ * - max_contiguous_ldb_credits
+ * - max_contiguous_dir_credits
+ *
+ * Return:
+ * Returns 0 upon success, -1 if vf_request is true and vf_id is invalid.
+ */
+int dlb_hw_get_num_used_resources(struct dlb_hw *hw,
+				  struct dlb_get_num_resources_args *arg,
+				  bool vf_request,
+				  unsigned int vf_id);
+
+/**
+ * dlb_send_async_vf_to_pf_msg() - (VF only) send a mailbox message to the PF
+ * @hw: dlb_hw handle for a particular device.
+ *
+ * This function sends a VF->PF mailbox message. It is asynchronous, so it
+ * returns once the message is sent but potentially before the PF has processed
+ * the message. The caller must call dlb_vf_to_pf_complete() to determine when
+ * the PF has finished processing the request.
+ */
+void dlb_send_async_vf_to_pf_msg(struct dlb_hw *hw);
+
+/**
+ * dlb_vf_to_pf_complete() - check the status of an asynchronous mailbox request
+ * @hw: dlb_hw handle for a particular device.
+ *
+ * This function returns a boolean indicating whether the PF has finished
+ * processing a VF->PF mailbox request. It should only be called after sending
+ * an asynchronous request with dlb_send_async_vf_to_pf_msg().
+ */
+bool dlb_vf_to_pf_complete(struct dlb_hw *hw);
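+
+/*
+ * Illustrative sketch (not part of this patch) of the asynchronous VF->PF
+ * mailbox flow described above: write the request, send it, poll for
+ * completion, then read the response. The request/response layouts live in
+ * dlb_mbox.h; the 'req' and 'resp' buffers and their sizes below are purely
+ * illustrative, and a real caller would sleep or yield between polls.
+ *
+ *	u8 req[16], resp[16];
+ *	int ret;
+ *
+ *	ret = dlb_vf_write_pf_mbox_req(hw, req, sizeof(req));
+ *	if (ret)
+ *		return ret;
+ *
+ *	dlb_send_async_vf_to_pf_msg(hw);
+ *
+ *	while (!dlb_vf_to_pf_complete(hw))
+ *		;
+ *
+ *	ret = dlb_vf_read_pf_mbox_resp(hw, resp, sizeof(resp));
+ */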
+
+/**
+ * dlb_vf_flr_complete() - check the status of a VF FLR
+ * @hw: dlb_hw handle for a particular device.
+ *
+ * This function returns a boolean indicating whether the PF has finished
+ * executing the VF FLR. It should only be called after setting the VF's FLR
+ * bit.
+ */
+bool dlb_vf_flr_complete(struct dlb_hw *hw);
+
+/**
+ * dlb_set_vf_reset_in_progress() - set a VF's reset in progress bit
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID.
+ *
+ * Note: This function is only supported on A-stepping devices.
+ *
+ * Note: The caller must ensure dlb_set_vf_reset_in_progress(),
+ * dlb_clr_vf_reset_in_progress(), and dlb_ack_vf_flr_int() are not executed in
+ * parallel, because the reset-in-progress register does not support atomic
+ * updates on A-stepping devices.
+ */
+void dlb_set_vf_reset_in_progress(struct dlb_hw *hw, int vf_id);
+
+/**
+ * dlb_clr_vf_reset_in_progress() - clear a VF's reset in progress bit
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID.
+ *
+ * Note: This function is only supported on A-stepping devices.
+ *
+ * Note: The caller must ensure dlb_set_vf_reset_in_progress(),
+ * dlb_clr_vf_reset_in_progress(), and dlb_ack_vf_flr_int() are not executed in
+ * parallel, because the reset-in-progress register does not support atomic
+ * updates on A-stepping devices.
+ */
+void dlb_clr_vf_reset_in_progress(struct dlb_hw *hw, int vf_id);
+
+/**
+ * dlb_send_async_pf_to_vf_msg() - (PF only) send a mailbox message to the VF
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID.
+ *
+ * This function sends a PF->VF mailbox message. It is asynchronous, so it
+ * returns once the message is sent but potentially before the VF has processed
+ * the message. The caller must call dlb_pf_to_vf_complete() to determine when
+ * the VF has finished processing the request.
+ */
+void dlb_send_async_pf_to_vf_msg(struct dlb_hw *hw, unsigned int vf_id);
+
+/**
+ * dlb_pf_to_vf_complete() - check the status of an asynchronous mailbox request
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID.
+ *
+ * This function returns a boolean indicating whether the VF has finished
+ * processing a PF->VF mailbox request. It should only be called after sending
+ * an asynchronous request with dlb_send_async_pf_to_vf_msg().
+ */
+bool dlb_pf_to_vf_complete(struct dlb_hw *hw, unsigned int vf_id);
+
+/**
+ * dlb_pf_read_vf_mbox_req() - (PF only) read a VF->PF mailbox request
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID.
+ * @data: pointer to message data.
+ * @len: size, in bytes, of the data array.
+ *
+ * This function copies one of the PF's VF->PF mailboxes into the array pointed
+ * to by data.
+ *
+ * Return:
+ * Returns 0 upon success, <0 otherwise.
+ *
+ * EINVAL - len >= DLB_VF2PF_REQ_BYTES.
+ */
+int dlb_pf_read_vf_mbox_req(struct dlb_hw *hw,
+			    unsigned int vf_id,
+			    void *data,
+			    int len);
+
+/**
+ * dlb_pf_read_vf_mbox_resp() - (PF only) read a VF->PF mailbox response
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID.
+ * @data: pointer to message data.
+ * @len: size, in bytes, of the data array.
+ *
+ * This function copies one of the PF's VF->PF mailboxes into the array pointed
+ * to by data.
+ *
+ * Return:
+ * Returns 0 upon success, <0 otherwise.
+ *
+ * EINVAL - len >= DLB_VF2PF_RESP_BYTES.
+ */
+int dlb_pf_read_vf_mbox_resp(struct dlb_hw *hw,
+			     unsigned int vf_id,
+			     void *data,
+			     int len);
+
+/**
+ * dlb_pf_write_vf_mbox_resp() - (PF only) write a PF->VF mailbox response
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID.
+ * @data: pointer to message data.
+ * @len: size, in bytes, of the data array.
+ *
+ * This function copies the user-provided message data into one of the PF's
+ * VF->PF mailboxes.
+ *
+ * Return:
+ * Returns 0 upon success, <0 otherwise.
+ *
+ * EINVAL - len >= DLB_PF2VF_RESP_BYTES.
+ */
+int dlb_pf_write_vf_mbox_resp(struct dlb_hw *hw,
+			      unsigned int vf_id,
+			      void *data,
+			      int len);
+
+/**
+ * dlb_pf_write_vf_mbox_req() - (PF only) write a PF->VF mailbox request
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID.
+ * @data: pointer to message data.
+ * @len: size, in bytes, of the data array.
+ *
+ * This function copies the user-provided message data into one of the PF's
+ * VF->PF mailboxes.
+ *
+ * Return:
+ * Returns 0 upon success, <0 otherwise.
+ *
+ * EINVAL - len >= DLB_PF2VF_REQ_BYTES.
+ */
+int dlb_pf_write_vf_mbox_req(struct dlb_hw *hw,
+			     unsigned int vf_id,
+			     void *data,
+			     int len);
+
+/**
+ * dlb_vf_read_pf_mbox_resp() - (VF only) read a PF->VF mailbox response
+ * @hw: dlb_hw handle for a particular device.
+ * @data: pointer to message data.
+ * @len: size, in bytes, of the data array.
+ *
+ * This function copies the VF's PF->VF mailbox into the array pointed to by
+ * data.
+ *
+ * Return:
+ * Returns 0 upon success, <0 otherwise.
+ *
+ * EINVAL - len >= DLB_PF2VF_RESP_BYTES.
+ */
+int dlb_vf_read_pf_mbox_resp(struct dlb_hw *hw, void *data, int len);
+
+/**
+ * dlb_vf_read_pf_mbox_req() - (VF only) read a PF->VF mailbox request
+ * @hw: dlb_hw handle for a particular device.
+ * @data: pointer to message data.
+ * @len: size, in bytes, of the data array.
+ *
+ * This function copies the VF's PF->VF mailbox into the array pointed to by
+ * data.
+ *
+ * Return:
+ * Returns 0 upon success, <0 otherwise.
+ *
+ * EINVAL - len >= DLB_PF2VF_REQ_BYTES.
+ */
+int dlb_vf_read_pf_mbox_req(struct dlb_hw *hw, void *data, int len);
+
+/**
+ * dlb_vf_write_pf_mbox_req() - (VF only) write a VF->PF mailbox request
+ * @hw: dlb_hw handle for a particular device.
+ * @data: pointer to message data.
+ * @len: size, in bytes, of the data array.
+ *
+ * This function copies the user-provided message data into one of the VF's
+ * PF->VF mailboxes.
+ *
+ * Return:
+ * Returns 0 upon success, <0 otherwise.
+ *
+ * EINVAL - len >= DLB_VF2PF_REQ_BYTES.
+ */
+int dlb_vf_write_pf_mbox_req(struct dlb_hw *hw, void *data, int len);
+
+/**
+ * dlb_vf_write_pf_mbox_resp() - (VF only) write a VF->PF mailbox response
+ * @hw: dlb_hw handle for a particular device.
+ * @data: pointer to message data.
+ * @len: size, in bytes, of the data array.
+ *
+ * This function copies the user-provided message data into one of the VF's
+ * PF->VF mailboxes.
+ *
+ * Return:
+ * Returns 0 upon success, <0 otherwise.
+ *
+ * EINVAL - len >= DLB_VF2PF_RESP_BYTES.
+ */
+int dlb_vf_write_pf_mbox_resp(struct dlb_hw *hw, void *data, int len);
+
+/**
+ * dlb_reset_vf() - reset the hardware owned by a VF
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID
+ *
+ * This function resets the hardware owned by a VF (if any), by resetting the
+ * VF's domains one by one.
+ */
+int dlb_reset_vf(struct dlb_hw *hw, unsigned int vf_id);
+
+/**
+ * dlb_vf_is_locked() - check whether the VF's resources are locked
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID
+ *
+ * This function returns whether or not the VF's resource assignments are
+ * locked. If locked, no resources can be added to or subtracted from the
+ * group.
+ */
+bool dlb_vf_is_locked(struct dlb_hw *hw, unsigned int vf_id);
+
+/**
+ * dlb_lock_vf() - lock the VF's resources
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID
+ *
+ * This function sets a flag indicating that the VF is using its resources.
+ * When VF is locked, its resource assignment cannot be changed.
+ */
+void dlb_lock_vf(struct dlb_hw *hw, unsigned int vf_id);
+
+/**
+ * dlb_unlock_vf() - unlock the VF's resources
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID
+ *
+ * This function unlocks the VF's resource assignment, allowing it to be
+ * modified.
+ */
+void dlb_unlock_vf(struct dlb_hw *hw, unsigned int vf_id);
+
+/**
+ * dlb_update_vf_sched_domains() - update the domains assigned to a VF
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID
+ * @num: number of scheduling domains to assign to this VF
+ *
+ * This function assigns num scheduling domains to the specified VF. If the VF
+ * already has domains assigned, this existing assignment is adjusted
+ * accordingly.
+ *
+ * Return:
+ * Returns 0 upon success, <0 otherwise.
+ *
+ * Errors:
+ * EINVAL - vf_id is invalid, or the requested number of resources are
+ *	    unavailable.
+ * EPERM  - The VF's resource assignment is locked and cannot be changed.
+ */
+int dlb_update_vf_sched_domains(struct dlb_hw *hw,
+				u32 vf_id,
+				u32 num);
+
+/**
+ * dlb_update_vf_ldb_queues() - update the LDB queues assigned to a VF
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID
+ * @num: number of LDB queues to assign to this VF
+ *
+ * This function assigns num LDB queues to the specified VF. If the VF already
+ * has LDB queues assigned, this existing assignment is adjusted
+ * accordingly.
+ *
+ * Return:
+ * Returns 0 upon success, <0 otherwise.
+ *
+ * Errors:
+ * EINVAL - vf_id is invalid, or the requested number of resources are
+ *	    unavailable.
+ * EPERM  - The VF's resource assignment is locked and cannot be changed.
+ */
+int dlb_update_vf_ldb_queues(struct dlb_hw *hw, u32 vf_id, u32 num);
+
+/**
+ * dlb_update_vf_ldb_ports() - update the LDB ports assigned to a VF
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID
+ * @num: number of LDB ports to assign to this VF
+ *
+ * This function assigns num LDB ports to the specified VF. If the VF already
+ * has LDB ports assigned, this existing assignment is adjusted accordingly.
+ *
+ * Return:
+ * Returns 0 upon success, <0 otherwise.
+ *
+ * Errors:
+ * EINVAL - vf_id is invalid, or the requested number of resources are
+ *	    unavailable.
+ * EPERM  - The VF's resource assignment is locked and cannot be changed.
+ */
+int dlb_update_vf_ldb_ports(struct dlb_hw *hw, u32 vf_id, u32 num);
+
+/**
+ * dlb_update_vf_dir_ports() - update the DIR ports assigned to a VF
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID
+ * @num: number of DIR ports to assign to this VF
+ *
+ * This function assigns num DIR ports to the specified VF. If the VF already
+ * has DIR ports assigned, this existing assignment is adjusted accordingly.
+ *
+ * Return:
+ * Returns 0 upon success, <0 otherwise.
+ *
+ * Errors:
+ * EINVAL - vf_id is invalid, or the requested number of resources are
+ *	    unavailable.
+ * EPERM  - The VF's resource assignment is locked and cannot be changed.
+ */
+int dlb_update_vf_dir_ports(struct dlb_hw *hw, u32 vf_id, u32 num);
+
+/**
+ * dlb_update_vf_ldb_credit_pools() - update the VF's assigned LDB pools
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID
+ * @num: number of LDB credit pools to assign to this VF
+ *
+ * This function assigns num LDB credit pools to the specified VF. If the VF
+ * already has LDB credit pools assigned, this existing assignment is adjusted
+ * accordingly.
+ *
+ * Return:
+ * Returns 0 upon success, <0 otherwise.
+ *
+ * Errors:
+ * EINVAL - vf_id is invalid, or the requested number of resources are
+ *	    unavailable.
+ * EPERM  - The VF's resource assignment is locked and cannot be changed.
+ */
+int dlb_update_vf_ldb_credit_pools(struct dlb_hw *hw,
+				   u32 vf_id,
+				   u32 num);
+
+/**
+ * dlb_update_vf_dir_credit_pools() - update the VF's assigned DIR pools
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID
+ * @num: number of DIR credit pools to assign to this VF
+ *
+ * This function assigns num DIR credit pools to the specified VF. If the VF
+ * already has DIR credit pools assigned, this existing assignment is adjusted
+ * accordingly.
+ *
+ * Return:
+ * Returns 0 upon success, <0 otherwise.
+ *
+ * Errors:
+ * EINVAL - vf_id is invalid, or the requested number of resources are
+ *	    unavailable.
+ * EPERM  - The VF's resource assignment is locked and cannot be changed.
+ */
+int dlb_update_vf_dir_credit_pools(struct dlb_hw *hw,
+				   u32 vf_id,
+				   u32 num);
+
+/**
+ * dlb_update_vf_ldb_credits() - update the VF's assigned LDB credits
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID
+ * @num: number of LDB credits to assign to this VF
+ *
+ * This function assigns num LDB credits to the specified VF. If the VF already
+ * has LDB credits assigned, this existing assignment is adjusted accordingly.
+ * VFs are assigned a contiguous chunk of credits, so this function may fail
+ * if a sufficiently large contiguous chunk is not available.
+ *
+ * Return:
+ * Returns 0 upon success, <0 otherwise.
+ *
+ * Errors:
+ * EINVAL - vf_id is invalid, or the requested number of resources are
+ *	    unavailable.
+ * EPERM  - The VF's resource assignment is locked and cannot be changed.
+ */
+int dlb_update_vf_ldb_credits(struct dlb_hw *hw, u32 vf_id, u32 num);
+
+/**
+ * dlb_update_vf_dir_credits() - update the VF's assigned DIR credits
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID
+ * @num: number of DIR credits to assign to this VF
+ *
+ * This function assigns num DIR credits to the specified VF. If the VF already
+ * has DIR credits assigned, this existing assignment is adjusted accordingly.
+ * VFs are assigned a contiguous chunk of credits, so this function may fail
+ * if a sufficiently large contiguous chunk is not available.
+ *
+ * Return:
+ * Returns 0 upon success, <0 otherwise.
+ *
+ * Errors:
+ * EINVAL - vf_id is invalid, or the requested number of resources are
+ *	    unavailable.
+ * EPERM  - The VF's resource assignment is locked and cannot be changed.
+ */
+int dlb_update_vf_dir_credits(struct dlb_hw *hw, u32 vf_id, u32 num);
+
+/**
+ * dlb_update_vf_hist_list_entries() - update the VF's assigned HL entries
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID
+ * @num: number of history list entries to assign to this VF
+ *
+ * This function assigns num history list entries to the specified VF. If the
+ * VF already has history list entries assigned, this existing assignment is
+ * adjusted accordingly. VFs are assigned a contiguous chunk of entries, so
+ * this function may fail if a sufficiently large contiguous chunk is not
+ * available.
+ *
+ * Return:
+ * Returns 0 upon success, <0 otherwise.
+ *
+ * Errors:
+ * EINVAL - vf_id is invalid, or the requested number of resources are
+ *	    unavailable.
+ * EPERM  - The VF's resource assignment is locked and cannot be changed.
+ */
+int dlb_update_vf_hist_list_entries(struct dlb_hw *hw,
+				    u32 vf_id,
+				    u32 num);
+
+/**
+ * dlb_update_vf_atomic_inflights() - update the VF's atomic inflights
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID
+ * @num: number of atomic inflights to assign to this VF
+ *
+ * This function assigns num atomic inflights to the specified VF. If the VF
+ * already has atomic inflights assigned, this existing assignment is adjusted
+ * accordingly. VFs are assigned a contiguous chunk of entries, so this
+ * function may fail if a sufficiently large contiguous chunk is not available.
+ *
+ * Return:
+ * Returns 0 upon success, <0 otherwise.
+ *
+ * Errors:
+ * EINVAL - vf_id is invalid, or the requested number of resources are
+ *	    unavailable.
+ * EPERM  - The VF's resource assignment is locked and cannot be changed.
+ */
+int dlb_update_vf_atomic_inflights(struct dlb_hw *hw,
+				   u32 vf_id,
+				   u32 num);
+
+/**
+ * dlb_reset_vf_resources() - reassign the VF's resources to the PF
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID
+ *
+ * This function takes any resources currently assigned to the VF and reassigns
+ * them to the PF.
+ *
+ * Return:
+ * Returns 0 upon success, <0 otherwise.
+ *
+ * Errors:
+ * EINVAL - vf_id is invalid
+ * EPERM  - The VF's resource assignment is locked and cannot be changed.
+ */
+int dlb_reset_vf_resources(struct dlb_hw *hw, unsigned int vf_id);
+
+/**
+ * dlb_notify_vf() - send a notification to a VF
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID
+ * @notification: notification
+ *
+ * This function sends a notification (as defined in dlb_mbox.h) to a VF.
+ *
+ * Return:
+ * Returns 0 upon success, -1 if the VF doesn't ACK the PF->VF interrupt.
+ */
+int dlb_notify_vf(struct dlb_hw *hw,
+		  unsigned int vf_id,
+		  u32 notification);
+
+/**
+ * dlb_vf_in_use() - query whether a VF is in use
+ * @hw: dlb_hw handle for a particular device.
+ * @vf_id: VF ID
+ *
+ * This function sends a mailbox request to the VF to query whether the VF is in
+ * use.
+ *
+ * Return:
+ * Returns 0 for false, 1 for true, and -1 if the mailbox request times out or
+ * an internal error occurs.
+ */
+int dlb_vf_in_use(struct dlb_hw *hw, unsigned int vf_id);
+
+/**
+ * dlb_disable_dp_vasr_feature() - disable directed pipe VAS reset hardware
+ * @hw: dlb_hw handle for a particular device.
+ *
+ * This function disables certain hardware in the directed pipe,
+ * necessary to workaround a DLB VAS reset issue.
+ */
+void dlb_disable_dp_vasr_feature(struct dlb_hw *hw);
+
+/**
+ * dlb_enable_excess_tokens_alarm() - enable interrupts for the excess token
+ * pop alarm
+ * @hw: dlb_hw handle for a particular device.
+ *
+ * This function enables the PF ingress error alarm interrupt to fire when an
+ * excess token pop occurs.
+ */
+void dlb_enable_excess_tokens_alarm(struct dlb_hw *hw);
+
+/**
+ * dlb_disable_excess_tokens_alarm() - disable interrupts for the excess token
+ * pop alarm
+ * @hw: dlb_hw handle for a particular device.
+ *
+ * This function disables the PF ingress error alarm interrupt from firing when
+ * an excess token pop occurs.
+ */
+void dlb_disable_excess_tokens_alarm(struct dlb_hw *hw);
+
+/**
+ * dlb_hw_get_ldb_queue_depth() - returns the depth of a load-balanced queue
+ * @hw: dlb_hw handle for a particular device.
+ * @domain_id: domain ID.
+ * @args: queue depth args
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * This function returns the depth of a load-balanced queue.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise. If an error occurs, resp->status is
+ * assigned a detailed error code from enum dlb_error. If successful, resp->id
+ * contains the depth.
+ *
+ * Errors:
+ * EINVAL - Invalid domain ID or queue ID.
+ */
+int dlb_hw_get_ldb_queue_depth(struct dlb_hw *hw,
+			       u32 domain_id,
+			       struct dlb_get_ldb_queue_depth_args *args,
+			       struct dlb_cmd_response *resp,
+			       bool vf_request,
+			       unsigned int vf_id);
+
+/**
+ * dlb_hw_get_dir_queue_depth() - returns the depth of a directed queue
+ * @hw: dlb_hw handle for a particular device.
+ * @domain_id: domain ID.
+ * @args: queue depth args
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * This function returns the depth of a directed queue.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise. If an error occurs, resp->status is
+ * assigned a detailed error code from enum dlb_error. If successful, resp->id
+ * contains the depth.
+ *
+ * Errors:
+ * EINVAL - Invalid domain ID or queue ID.
+ */
+int dlb_hw_get_dir_queue_depth(struct dlb_hw *hw,
+			       u32 domain_id,
+			       struct dlb_get_dir_queue_depth_args *args,
+			       struct dlb_cmd_response *resp,
+			       bool vf_request,
+			       unsigned int vf_id);
+
+/**
+ * dlb_hw_pending_port_unmaps() - returns the number of unmap operations in
+ *	progress for a load-balanced port.
+ * @hw: dlb_hw handle for a particular device.
+ * @domain_id: domain ID.
+ * @args: number of unmaps in progress args
+ * @vf_request: indicates whether this request came from a VF.
+ * @vf_id: If vf_request is true, this contains the VF's ID.
+ *
+ * Return:
+ * Returns 0 upon success, < 0 otherwise. If an error occurs, resp->status is
+ * assigned a detailed error code from enum dlb_error. If successful, resp->id
+ * contains the number of unmaps in progress.
+ *
+ * Errors:
+ * EINVAL - Invalid port ID.
+ */
+int dlb_hw_pending_port_unmaps(struct dlb_hw *hw,
+			       u32 domain_id,
+			       struct dlb_pending_port_unmaps_args *args,
+			       struct dlb_cmd_response *resp,
+			       bool vf_request,
+			       unsigned int vf_id);
+
+/**
+ * dlb_hw_enable_sparse_ldb_cq_mode() - enable sparse mode for load-balanced
+ *	ports.
+ * @hw: dlb_hw handle for a particular device.
+ *
+ * This function must be called prior to configuring scheduling domains.
+ */
+void dlb_hw_enable_sparse_ldb_cq_mode(struct dlb_hw *hw);
+
+/**
+ * dlb_hw_enable_sparse_dir_cq_mode() - enable sparse mode for directed ports
+ * @hw: dlb_hw handle for a particular device.
+ *
+ * This function must be called prior to configuring scheduling domains.
+ */
+void dlb_hw_enable_sparse_dir_cq_mode(struct dlb_hw *hw);
+
+/**
+ * dlb_hw_set_qe_arbiter_weights() - program QE arbiter weights
+ * @hw: dlb_hw handle for a particular device.
+ * @weight: 8-entry array of arbiter weights.
+ *
+ * weight[N] programs priority N's weight. In cases where the 8 priorities are
+ * reduced to 4 bins, the mapping is:
+ * - weight[1] programs bin 0
+ * - weight[3] programs bin 1
+ * - weight[5] programs bin 2
+ * - weight[7] programs bin 3
+ */
+void dlb_hw_set_qe_arbiter_weights(struct dlb_hw *hw, u8 weight[8]);
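+
+/*
+ * Illustrative call (not part of this patch): with 8 priorities reduced to 4
+ * bins, the array below gives bin 3 twice the weight of bins 0-2, following
+ * the weight[1]/weight[3]/weight[5]/weight[7] mapping described above.
+ *
+ *	u8 weights[8] = {0, 1, 0, 1, 0, 1, 0, 2};
+ *
+ *	dlb_hw_set_qe_arbiter_weights(hw, weights);
+ */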
+
+/**
+ * dlb_hw_set_qid_arbiter_weights() - program QID arbiter weights
+ * @hw: dlb_hw handle for a particular device.
+ * @weight: 8-entry array of arbiter weights.
+ *
+ * weight[N] programs priority N's weight. In cases where the 8 priorities are
+ * reduced to 4 bins, the mapping is:
+ * - weight[1] programs bin 0
+ * - weight[3] programs bin 1
+ * - weight[5] programs bin 2
+ * - weight[7] programs bin 3
+ */
+void dlb_hw_set_qid_arbiter_weights(struct dlb_hw *hw, u8 weight[8]);
+
+/**
+ * dlb_hw_enable_pp_sw_alarms() - enable out-of-credit alarm for all producer
+ * ports
+ * @hw: dlb_hw handle for a particular device.
+ */
+void dlb_hw_enable_pp_sw_alarms(struct dlb_hw *hw);
+
+/**
+ * dlb_hw_disable_pp_sw_alarms() - disable out-of-credit alarm for all producer
+ * ports
+ * @hw: dlb_hw handle for a particular device.
+ */
+void dlb_hw_disable_pp_sw_alarms(struct dlb_hw *hw);
+
+#endif /* __DLB_RESOURCE_H */
diff --git a/drivers/event/dlb/pf/base/dlb_user.h b/drivers/event/dlb/pf/base/dlb_user.h
new file mode 100644
index 0000000..6e7ee2e
--- /dev/null
+++ b/drivers/event/dlb/pf/base/dlb_user.h
@@ -0,0 +1,1084 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
+ * Copyright(c) 2016-2020 Intel Corporation
+ */
+
+#ifndef __DLB_USER_H
+#define __DLB_USER_H
+
+#define DLB_MAX_NAME_LEN 64
+
+#include "dlb_osdep_types.h"
+
+enum dlb_error {
+	DLB_ST_SUCCESS = 0,
+	DLB_ST_NAME_EXISTS,
+	DLB_ST_DOMAIN_UNAVAILABLE,
+	DLB_ST_LDB_PORTS_UNAVAILABLE,
+	DLB_ST_DIR_PORTS_UNAVAILABLE,
+	DLB_ST_LDB_QUEUES_UNAVAILABLE,
+	DLB_ST_LDB_CREDITS_UNAVAILABLE,
+	DLB_ST_DIR_CREDITS_UNAVAILABLE,
+	DLB_ST_LDB_CREDIT_POOLS_UNAVAILABLE,
+	DLB_ST_DIR_CREDIT_POOLS_UNAVAILABLE,
+	DLB_ST_SEQUENCE_NUMBERS_UNAVAILABLE,
+	DLB_ST_INVALID_DOMAIN_ID,
+	DLB_ST_INVALID_QID_INFLIGHT_ALLOCATION,
+	DLB_ST_ATOMIC_INFLIGHTS_UNAVAILABLE,
+	DLB_ST_HIST_LIST_ENTRIES_UNAVAILABLE,
+	DLB_ST_INVALID_LDB_CREDIT_POOL_ID,
+	DLB_ST_INVALID_DIR_CREDIT_POOL_ID,
+	DLB_ST_INVALID_POP_COUNT_VIRT_ADDR,
+	DLB_ST_INVALID_LDB_QUEUE_ID,
+	DLB_ST_INVALID_CQ_DEPTH,
+	DLB_ST_INVALID_CQ_VIRT_ADDR,
+	DLB_ST_INVALID_PORT_ID,
+	DLB_ST_INVALID_QID,
+	DLB_ST_INVALID_PRIORITY,
+	DLB_ST_NO_QID_SLOTS_AVAILABLE,
+	DLB_ST_QED_FREELIST_ENTRIES_UNAVAILABLE,
+	DLB_ST_DQED_FREELIST_ENTRIES_UNAVAILABLE,
+	DLB_ST_INVALID_DIR_QUEUE_ID,
+	DLB_ST_DIR_QUEUES_UNAVAILABLE,
+	DLB_ST_INVALID_LDB_CREDIT_LOW_WATERMARK,
+	DLB_ST_INVALID_LDB_CREDIT_QUANTUM,
+	DLB_ST_INVALID_DIR_CREDIT_LOW_WATERMARK,
+	DLB_ST_INVALID_DIR_CREDIT_QUANTUM,
+	DLB_ST_DOMAIN_NOT_CONFIGURED,
+	DLB_ST_PID_ALREADY_ATTACHED,
+	DLB_ST_PID_NOT_ATTACHED,
+	DLB_ST_INTERNAL_ERROR,
+	DLB_ST_DOMAIN_IN_USE,
+	DLB_ST_IOMMU_MAPPING_ERROR,
+	DLB_ST_FAIL_TO_PIN_MEMORY_PAGE,
+	DLB_ST_UNABLE_TO_PIN_POPCOUNT_PAGES,
+	DLB_ST_UNABLE_TO_PIN_CQ_PAGES,
+	DLB_ST_DISCONTIGUOUS_CQ_MEMORY,
+	DLB_ST_DISCONTIGUOUS_POP_COUNT_MEMORY,
+	DLB_ST_DOMAIN_STARTED,
+	DLB_ST_LARGE_POOL_NOT_SPECIFIED,
+	DLB_ST_SMALL_POOL_NOT_SPECIFIED,
+	DLB_ST_NEITHER_POOL_SPECIFIED,
+	DLB_ST_DOMAIN_NOT_STARTED,
+	DLB_ST_INVALID_MEASUREMENT_DURATION,
+	DLB_ST_INVALID_PERF_METRIC_GROUP_ID,
+	DLB_ST_LDB_PORT_REQUIRED_FOR_LDB_QUEUES,
+	DLB_ST_DOMAIN_RESET_FAILED,
+	DLB_ST_MBOX_ERROR,
+	DLB_ST_INVALID_HIST_LIST_DEPTH,
+	DLB_ST_NO_MEMORY,
+};
+
+static const char dlb_error_strings[][128] = {
+	"DLB_ST_SUCCESS",
+	"DLB_ST_NAME_EXISTS",
+	"DLB_ST_DOMAIN_UNAVAILABLE",
+	"DLB_ST_LDB_PORTS_UNAVAILABLE",
+	"DLB_ST_DIR_PORTS_UNAVAILABLE",
+	"DLB_ST_LDB_QUEUES_UNAVAILABLE",
+	"DLB_ST_LDB_CREDITS_UNAVAILABLE",
+	"DLB_ST_DIR_CREDITS_UNAVAILABLE",
+	"DLB_ST_LDB_CREDIT_POOLS_UNAVAILABLE",
+	"DLB_ST_DIR_CREDIT_POOLS_UNAVAILABLE",
+	"DLB_ST_SEQUENCE_NUMBERS_UNAVAILABLE",
+	"DLB_ST_INVALID_DOMAIN_ID",
+	"DLB_ST_INVALID_QID_INFLIGHT_ALLOCATION",
+	"DLB_ST_ATOMIC_INFLIGHTS_UNAVAILABLE",
+	"DLB_ST_HIST_LIST_ENTRIES_UNAVAILABLE",
+	"DLB_ST_INVALID_LDB_CREDIT_POOL_ID",
+	"DLB_ST_INVALID_DIR_CREDIT_POOL_ID",
+	"DLB_ST_INVALID_POP_COUNT_VIRT_ADDR",
+	"DLB_ST_INVALID_LDB_QUEUE_ID",
+	"DLB_ST_INVALID_CQ_DEPTH",
+	"DLB_ST_INVALID_CQ_VIRT_ADDR",
+	"DLB_ST_INVALID_PORT_ID",
+	"DLB_ST_INVALID_QID",
+	"DLB_ST_INVALID_PRIORITY",
+	"DLB_ST_NO_QID_SLOTS_AVAILABLE",
+	"DLB_ST_QED_FREELIST_ENTRIES_UNAVAILABLE",
+	"DLB_ST_DQED_FREELIST_ENTRIES_UNAVAILABLE",
+	"DLB_ST_INVALID_DIR_QUEUE_ID",
+	"DLB_ST_DIR_QUEUES_UNAVAILABLE",
+	"DLB_ST_INVALID_LDB_CREDIT_LOW_WATERMARK",
+	"DLB_ST_INVALID_LDB_CREDIT_QUANTUM",
+	"DLB_ST_INVALID_DIR_CREDIT_LOW_WATERMARK",
+	"DLB_ST_INVALID_DIR_CREDIT_QUANTUM",
+	"DLB_ST_DOMAIN_NOT_CONFIGURED",
+	"DLB_ST_PID_ALREADY_ATTACHED",
+	"DLB_ST_PID_NOT_ATTACHED",
+	"DLB_ST_INTERNAL_ERROR",
+	"DLB_ST_DOMAIN_IN_USE",
+	"DLB_ST_IOMMU_MAPPING_ERROR",
+	"DLB_ST_FAIL_TO_PIN_MEMORY_PAGE",
+	"DLB_ST_UNABLE_TO_PIN_POPCOUNT_PAGES",
+	"DLB_ST_UNABLE_TO_PIN_CQ_PAGES",
+	"DLB_ST_DISCONTIGUOUS_CQ_MEMORY",
+	"DLB_ST_DISCONTIGUOUS_POP_COUNT_MEMORY",
+	"DLB_ST_DOMAIN_STARTED",
+	"DLB_ST_LARGE_POOL_NOT_SPECIFIED",
+	"DLB_ST_SMALL_POOL_NOT_SPECIFIED",
+	"DLB_ST_NEITHER_POOL_SPECIFIED",
+	"DLB_ST_DOMAIN_NOT_STARTED",
+	"DLB_ST_INVALID_MEASUREMENT_DURATION",
+	"DLB_ST_INVALID_PERF_METRIC_GROUP_ID",
+	"DLB_ST_LDB_PORT_REQUIRED_FOR_LDB_QUEUES",
+	"DLB_ST_DOMAIN_RESET_FAILED",
+	"DLB_ST_MBOX_ERROR",
+	"DLB_ST_INVALID_HIST_LIST_DEPTH",
+	"DLB_ST_NO_MEMORY",
+};
+
+struct dlb_cmd_response {
+	__u32 status; /* Interpret using enum dlb_error */
+	__u32 id;
+};
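+
+/*
+ * Illustrative sketch (not part of this patch): translating a command
+ * response status into a printable string. DLB_LOG_ERR() stands in for
+ * whatever logging facility the caller uses.
+ *
+ *	struct dlb_cmd_response response = {0};
+ *
+ *	... issue a command that fills in 'response' ...
+ *
+ *	if (response.status != DLB_ST_SUCCESS)
+ *		DLB_LOG_ERR("dlb: %s\n", dlb_error_strings[response.status]);
+ */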
+
+/******************************/
+/* 'dlb' device file commands */
+/******************************/
+
+#define DLB_DEVICE_VERSION(x) (((x) >> 8) & 0xFF)
+#define DLB_DEVICE_REVISION(x) ((x) & 0xFF)
+
+enum dlb_revisions {
+	DLB_REV_A0 = 0,
+	DLB_REV_A1 = 1,
+	DLB_REV_A2 = 2,
+	DLB_REV_A3 = 3,
+	DLB_REV_B0 = 4,
+};
+
+/*
+ * DLB_CMD_GET_DEVICE_VERSION: Query the DLB device version.
+ *
+ *	This ioctl interface is the same in all driver versions and is always
+ *	the first ioctl.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id[7:0]: Device revision.
+ *	response.id[15:8]: Device version.
+ */
+
+struct dlb_get_device_version_args {
+	/* Output parameters */
+	__u64 response;
+};
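+
+/*
+ * Illustrative decode of a DLB_CMD_GET_DEVICE_VERSION response (not part of
+ * this patch), assuming 'response' was filled in by the command:
+ *
+ *	__u8 version = DLB_DEVICE_VERSION(response.id);
+ *	__u8 revision = DLB_DEVICE_REVISION(response.id);
+ *
+ * where 'revision' can be compared against enum dlb_revisions (e.g.
+ * DLB_REV_B0).
+ */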
+
+#define DLB_VERSION_MAJOR_NUMBER 10
+#define DLB_VERSION_MINOR_NUMBER 7
+#define DLB_VERSION_REVISION_NUMBER 9
+#define DLB_VERSION (DLB_VERSION_MAJOR_NUMBER << 24 | \
+		     DLB_VERSION_MINOR_NUMBER << 16 | \
+		     DLB_VERSION_REVISION_NUMBER)
+
+#define DLB_VERSION_GET_MAJOR_NUMBER(x) (((x) >> 24) & 0xFF)
+#define DLB_VERSION_GET_MINOR_NUMBER(x) (((x) >> 16) & 0xFF)
+#define DLB_VERSION_GET_REVISION_NUMBER(x) ((x) & 0xFFFF)
+
+static inline __u8 dlb_version_incompatible(__u32 version)
+{
+	__u8 inc;
+
+	inc = DLB_VERSION_GET_MAJOR_NUMBER(version) != DLB_VERSION_MAJOR_NUMBER;
+	inc |= (int)DLB_VERSION_GET_MINOR_NUMBER(version) <
+		DLB_VERSION_MINOR_NUMBER;
+
+	return inc;
+}
+
+/*
+ * DLB_CMD_GET_DRIVER_VERSION: Query the DLB driver version. The major number
+ *	is changed when there is an ABI-breaking change, the minor number is
+ *	changed if the API is changed in a backwards-compatible way, and the
+ *	revision number is changed for fixes that don't affect the API.
+ *
+ *	If the kernel driver's API major version number and the header's
+ *	DLB_VERSION_MAJOR_NUMBER differ, the two are incompatible. They are
+ *	also incompatible if the major numbers match but the kernel driver's
+ *	minor number is less than the header file's. The
+ *	dlb_version_incompatible() helper should be used to check for
+ *	compatibility.
+ *
+ *	This ioctl interface is the same in all driver versions. Applications
+ *	should check the driver version before performing any other ioctl
+ *	operations.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: Driver API version. Use the DLB_VERSION_GET_MAJOR_NUMBER,
+ *		DLB_VERSION_GET_MINOR_NUMBER, and
+ *		DLB_VERSION_GET_REVISION_NUMBER macros to interpret the field.
+ */
+
+struct dlb_get_driver_version_args {
+	/* Output parameters */
+	__u64 response;
+};
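+
+/*
+ * Illustrative compatibility check from a user-space caller (not part of this
+ * patch), assuming 'response.id' holds the driver version returned by
+ * DLB_CMD_GET_DRIVER_VERSION and stdio is available:
+ *
+ *	__u32 ver = response.id;
+ *
+ *	if (dlb_version_incompatible(ver)) {
+ *		fprintf(stderr,
+ *			"DLB driver %u.%u.%u incompatible with header %u.%u.%u\n",
+ *			DLB_VERSION_GET_MAJOR_NUMBER(ver),
+ *			DLB_VERSION_GET_MINOR_NUMBER(ver),
+ *			DLB_VERSION_GET_REVISION_NUMBER(ver),
+ *			DLB_VERSION_MAJOR_NUMBER,
+ *			DLB_VERSION_MINOR_NUMBER,
+ *			DLB_VERSION_REVISION_NUMBER);
+ *		return -1;
+ *	}
+ */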
+
+/*
+ * DLB_CMD_CREATE_SCHED_DOMAIN: Create a DLB scheduling domain and reserve the
+ *	resources (queues, ports, etc.) that it contains.
+ *
+ * Input parameters:
+ * - num_ldb_queues: Number of load-balanced queues.
+ * - num_ldb_ports: Number of load-balanced ports.
+ * - num_dir_ports: Number of directed ports. A directed port has one directed
+ *	queue, so no num_dir_queues argument is necessary.
+ * - num_atomic_inflights: This specifies the amount of temporary atomic QE
+ *	storage for the domain. This storage is divided among the domain's
+ *	load-balanced queues that are configured for atomic scheduling.
+ * - num_hist_list_entries: Amount of history list storage. This is divided
+ *	among the domain's CQs.
+ * - num_ldb_credits: Amount of load-balanced QE storage (QED). QEs occupy this
+ *	space until they are scheduled to a load-balanced CQ. One credit
+ *	represents the storage for one QE.
+ * - num_dir_credits: Amount of directed QE storage (DQED). QEs occupy this
+ *	space until they are scheduled to a directed CQ. One credit represents
+ *	the storage for one QE.
+ * - num_ldb_credit_pools: Number of pools into which the load-balanced credits
+ *	are placed.
+ * - num_dir_credit_pools: Number of pools into which the directed credits are
+ *	placed.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: domain ID.
+ */
+struct dlb_create_sched_domain_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 num_ldb_queues;
+	__u32 num_ldb_ports;
+	__u32 num_dir_ports;
+	__u32 num_atomic_inflights;
+	__u32 num_hist_list_entries;
+	__u32 num_ldb_credits;
+	__u32 num_dir_credits;
+	__u32 num_ldb_credit_pools;
+	__u32 num_dir_credit_pools;
+};
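+
+/*
+ * Illustrative fill of these arguments from a user-space caller (not part of
+ * this patch). The resource counts are arbitrary; 'response' must point to a
+ * struct dlb_cmd_response that the driver can write back to.
+ *
+ *	struct dlb_cmd_response response = {0};
+ *	struct dlb_create_sched_domain_args args = {
+ *		.response = (__u64)(uintptr_t)&response,
+ *		.num_ldb_queues = 2,
+ *		.num_ldb_ports = 4,
+ *		.num_dir_ports = 1,
+ *		.num_atomic_inflights = 2048,
+ *		.num_hist_list_entries = 256,
+ *		.num_ldb_credits = 2048,
+ *		.num_dir_credits = 1024,
+ *		.num_ldb_credit_pools = 1,
+ *		.num_dir_credit_pools = 1,
+ *	};
+ */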
+
+/*
+ * DLB_CMD_GET_NUM_RESOURCES: Return the number of available resources
+ *	(queues, ports, etc.) that this device owns.
+ *
+ * Output parameters:
+ * - num_domains: Number of available scheduling domains.
+ * - num_ldb_queues: Number of available load-balanced queues.
+ * - num_ldb_ports: Number of available load-balanced ports.
+ * - num_dir_ports: Number of available directed ports. There is one directed
+ *	queue for every directed port.
+ * - num_atomic_inflights: Amount of available temporary atomic QE storage.
+ * - max_contiguous_atomic_inflights: When a domain is created, the temporary
+ *	atomic QE storage is allocated in a contiguous chunk. This return value
+ *	is the longest available contiguous range of atomic QE storage.
+ * - num_hist_list_entries: Amount of history list storage.
+ * - max_contiguous_hist_list_entries: History list storage is allocated in
+ *	a contiguous chunk, and this return value is the longest available
+ *	contiguous range of history list entries.
+ * - num_ldb_credits: Amount of available load-balanced QE storage.
+ * - max_contiguous_ldb_credits: QED storage is allocated in a contiguous
+ *	chunk, and this return value is the longest available contiguous range
+ *	of load-balanced credit storage.
+ * - num_dir_credits: Amount of available directed QE storage.
+ * - max_contiguous_dir_credits: DQED storage is allocated in a contiguous
+ *	chunk, and this return value is the longest available contiguous range
+ *	of directed credit storage.
+ * - num_ldb_credit_pools: Number of available load-balanced credit pools.
+ * - num_dir_credit_pools: Number of available directed credit pools.
+ * - padding0: Reserved for future use.
+ */
+struct dlb_get_num_resources_args {
+	/* Output parameters */
+	__u32 num_sched_domains;
+	__u32 num_ldb_queues;
+	__u32 num_ldb_ports;
+	__u32 num_dir_ports;
+	__u32 num_atomic_inflights;
+	__u32 max_contiguous_atomic_inflights;
+	__u32 num_hist_list_entries;
+	__u32 max_contiguous_hist_list_entries;
+	__u32 num_ldb_credits;
+	__u32 max_contiguous_ldb_credits;
+	__u32 num_dir_credits;
+	__u32 max_contiguous_dir_credits;
+	__u32 num_ldb_credit_pools;
+	__u32 num_dir_credit_pools;
+	__u32 padding0;
+};
+
+/*
+ * DLB_CMD_SET_SN_ALLOCATION: Configure a sequence number group
+ *
+ * Input parameters:
+ * - group: Sequence number group ID.
+ * - num: Number of sequence numbers per queue.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ */
+struct dlb_set_sn_allocation_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 group;
+	__u32 num;
+};
+
+/*
+ * DLB_CMD_GET_SN_ALLOCATION: Get a sequence number group's configuration
+ *
+ * Input parameters:
+ * - group: Sequence number group ID.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: Specified group's number of sequence numbers per queue.
+ */
+struct dlb_get_sn_allocation_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 group;
+	__u32 padding0;
+};
+
+/*
+ * DLB_CMD_QUERY_CQ_POLL_MODE: Query the CQ poll mode the kernel driver is using
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: CQ poll mode (see enum dlb_cq_poll_modes).
+ */
+struct dlb_query_cq_poll_mode_args {
+	/* Output parameters */
+	__u64 response;
+};
+
+enum dlb_cq_poll_modes {
+	DLB_CQ_POLL_MODE_STD,
+	DLB_CQ_POLL_MODE_SPARSE,
+
+	/* NUM_DLB_CQ_POLL_MODE must be last */
+	NUM_DLB_CQ_POLL_MODE,
+};
+
+/*
+ * DLB_CMD_GET_SN_OCCUPANCY: Get a sequence number group's occupancy
+ *
+ * Each sequence number group has one or more slots, depending on its
+ * configuration. I.e.:
+ * - If configured for 1024 sequence numbers per queue, the group has 1 slot
+ * - If configured for 512 sequence numbers per queue, the group has 2 slots
+ *   ...
+ * - If configured for 32 sequence numbers per queue, the group has 32 slots
+ *
+ * This ioctl returns the group's number of in-use slots. If its occupancy is
+ * 0, the group's sequence number allocation can be reconfigured.
+ *
+ * Input parameters:
+ * - group: Sequence number group ID.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: Specified group's number of used slots.
+ */
+struct dlb_get_sn_occupancy_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 group;
+	__u32 padding0;
+};
+
+enum dlb_user_interface_commands {
+	DLB_CMD_GET_DEVICE_VERSION,
+	DLB_CMD_CREATE_SCHED_DOMAIN,
+	DLB_CMD_GET_NUM_RESOURCES,
+	DLB_CMD_GET_DRIVER_VERSION,
+	DLB_CMD_SAMPLE_PERF_COUNTERS,
+	DLB_CMD_SET_SN_ALLOCATION,
+	DLB_CMD_GET_SN_ALLOCATION,
+	DLB_CMD_MEASURE_SCHED_COUNTS,
+	DLB_CMD_QUERY_CQ_POLL_MODE,
+	DLB_CMD_GET_SN_OCCUPANCY,
+
+	/* NUM_DLB_CMD must be last */
+	NUM_DLB_CMD,
+};
+
+/*******************************/
+/* 'domain' device file alerts */
+/*******************************/
+
+/* Scheduling domain device files can be read to receive domain-specific
+ * notifications, for alerts such as hardware errors.
+ *
+ * Each alert is encoded in a 16B message. The first 8B contains the alert ID,
+ * and the second 8B is optional and contains additional information.
+ * Applications should cast read data to a struct dlb_domain_alert, and
+ * interpret the struct's alert_id according to dlb_domain_alert_id. The read
+ * length must be 16B, or the function will return -EINVAL.
+ *
+ * Reads are destructive, and in the case of multiple file descriptors for the
+ * same domain device file, an alert will be read by only one of the file
+ * descriptors.
+ *
+ * The driver stores alerts in a fixed-size alert ring until they are read. If
+ * the alert ring fills completely, subsequent alerts will be dropped. It is
+ * recommended that DLB applications dedicate a thread to perform blocking
+ * reads on the device file.
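+ *
+ * A minimal, hypothetical reader loop (error handling omitted; "domain_fd"
+ * and "handle_alert()" are placeholders, not part of this interface):
+ *
+ *	struct dlb_domain_alert alert;
+ *
+ *	while (read(domain_fd, &alert, sizeof(alert)) == sizeof(alert)) {
+ *		if (alert.alert_id < NUM_DLB_DOMAIN_ALERTS)
+ *			handle_alert(&alert);
+ *	}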
+ */
+enum dlb_domain_alert_id {
+	/* A destination domain queue that this domain connected to has
+	 * unregistered, and can no longer be sent to. The aux alert data
+	 * contains the queue ID.
+	 */
+	DLB_DOMAIN_ALERT_REMOTE_QUEUE_UNREGISTER,
+	/* A producer port in this domain attempted to send a QE without a
+	 * credit. aux_alert_data[7:0] contains the port ID, and
+	 * aux_alert_data[15:8] contains a flag indicating whether the port is
+	 * load-balanced (1) or directed (0).
+	 */
+	DLB_DOMAIN_ALERT_PP_OUT_OF_CREDITS,
+	/* Software issued an illegal enqueue for a port in this domain. An
+	 * illegal enqueue could be:
+	 * - Illegal (excess) completion
+	 * - Illegal fragment
+	 * - Illegal enqueue command
+	 * aux_alert_data[7:0] contains the port ID, and aux_alert_data[15:8]
+	 * contains a flag indicating whether the port is load-balanced (1) or
+	 * directed (0).
+	 */
+	DLB_DOMAIN_ALERT_PP_ILLEGAL_ENQ,
+	/* Software issued excess CQ token pops for a port in this domain.
+	 * aux_alert_data[7:0] contains the port ID, and aux_alert_data[15:8]
+	 * contains a flag indicating whether the port is load-balanced (1) or
+	 * directed (0).
+	 */
+	DLB_DOMAIN_ALERT_PP_EXCESS_TOKEN_POPS,
+	/* An enqueue contained either an invalid command encoding or a REL,
+	 * REL_T, RLS, FWD, FWD_T, FRAG, or FRAG_T from a directed port.
+	 *
+	 * aux_alert_data[7:0] contains the port ID, and aux_alert_data[15:8]
+	 * contains a flag indicating whether the port is load-balanced (1) or
+	 * directed (0).
+	 */
+	DLB_DOMAIN_ALERT_ILLEGAL_HCW,
+	/* The QID must be valid and less than 128.
+	 *
+	 * aux_alert_data[7:0] contains the port ID, and aux_alert_data[15:8]
+	 * contains a flag indicating whether the port is load-balanced (1) or
+	 * directed (0).
+	 */
+	DLB_DOMAIN_ALERT_ILLEGAL_QID,
+	/* An enqueue went to a disabled QID.
+	 *
+	 * aux_alert_data[7:0] contains the port ID, and aux_alert_data[15:8]
+	 * contains a flag indicating whether the port is load-balanced (1) or
+	 * directed (0).
+	 */
+	DLB_DOMAIN_ALERT_DISABLED_QID,
+	/* The device containing this domain was reset. All applications using
+	 * the device need to exit for the driver to complete the reset
+	 * procedure.
+	 *
+	 * aux_alert_data doesn't contain any information for this alert.
+	 */
+	DLB_DOMAIN_ALERT_DEVICE_RESET,
+	/* User-space has enqueued an alert.
+	 *
+	 * aux_alert_data contains user-provided data.
+	 */
+	DLB_DOMAIN_ALERT_USER,
+
+	/* Number of DLB domain alerts */
+	NUM_DLB_DOMAIN_ALERTS
+};
+
+static const char dlb_domain_alert_strings[][128] = {
+	"DLB_DOMAIN_ALERT_REMOTE_QUEUE_UNREGISTER",
+	"DLB_DOMAIN_ALERT_PP_OUT_OF_CREDITS",
+	"DLB_DOMAIN_ALERT_PP_ILLEGAL_ENQ",
+	"DLB_DOMAIN_ALERT_PP_EXCESS_TOKEN_POPS",
+	"DLB_DOMAIN_ALERT_ILLEGAL_HCW",
+	"DLB_DOMAIN_ALERT_ILLEGAL_QID",
+	"DLB_DOMAIN_ALERT_DISABLED_QID",
+	"DLB_DOMAIN_ALERT_DEVICE_RESET",
+	"DLB_DOMAIN_ALERT_USER",
+};
+
+struct dlb_domain_alert {
+	__u64 alert_id;
+	__u64 aux_alert_data;
+};
+
+/*********************************/
+/* 'domain' device file commands */
+/*********************************/
+
+/*
+ * DLB_DOMAIN_CMD_CREATE_LDB_POOL: Configure a load-balanced credit pool.
+ * Input parameters:
+ * - num_ldb_credits: Number of load-balanced credits (QED space) for this
+ *	pool.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: pool ID.
+ */
+struct dlb_create_ldb_pool_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 num_ldb_credits;
+	__u32 padding0;
+};
+
+/*
+ * DLB_DOMAIN_CMD_CREATE_DIR_POOL: Configure a directed credit pool.
+ * Input parameters:
+ * - num_dir_credits: Number of directed credits (DQED space) for this pool.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: Pool ID.
+ */
+struct dlb_create_dir_pool_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 num_dir_credits;
+	__u32 padding0;
+};
+
+/*
+ * DLB_DOMAIN_CMD_CREATE_LDB_QUEUE: Configure a load-balanced queue.
+ * Input parameters:
+ * - num_atomic_inflights: This specifies the amount of temporary atomic QE
+ *	storage for this queue. If zero, the queue will not support atomic
+ *	scheduling.
+ * - num_sequence_numbers: This specifies the number of sequence numbers used
+ *	by this queue. If zero, the queue will not support ordered scheduling.
+ *	If non-zero, the queue will not support unordered scheduling.
+ * - num_qid_inflights: The maximum number of QEs that can be inflight
+ *	(scheduled to a CQ but not completed) at any time. If
+ *	num_sequence_numbers is non-zero, num_qid_inflights must be set equal
+ *	to num_sequence_numbers.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: Queue ID.
+ */
+struct dlb_create_ldb_queue_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 num_sequence_numbers;
+	__u32 num_qid_inflights;
+	__u32 num_atomic_inflights;
+	__u32 padding0;
+};
+
+/*
+ * DLB_DOMAIN_CMD_CREATE_DIR_QUEUE: Configure a directed queue.
+ * Input parameters:
+ * - port_id: Port ID. If the corresponding directed port is already created,
+ *	specify its ID here. Else this argument must be 0xFFFFFFFF to indicate
+ *	that the queue is being created before the port.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: Queue ID.
+ */
+struct dlb_create_dir_queue_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__s32 port_id;
+	__u32 padding0;
+};
+
+/*
+ * DLB_DOMAIN_CMD_CREATE_LDB_PORT: Configure a load-balanced port.
+ * Input parameters:
+ * - ldb_credit_pool_id: Load-balanced credit pool this port will belong to.
+ * - dir_credit_pool_id: Directed credit pool this port will belong to.
+ * - ldb_credit_high_watermark: Number of load-balanced credits from the pool
+ *	that this port will own.
+ *
+ *	If this port's scheduling domain doesn't have any load-balanced queues,
+ *	this argument is ignored and the port is given no load-balanced
+ *	credits.
+ * - dir_credit_high_watermark: Number of directed credits from the pool that
+ *	this port will own.
+ *
+ *	If this port's scheduling domain doesn't have any directed queues,
+ *	this argument is ignored and the port is given no directed credits.
+ * - ldb_credit_low_watermark: Load-balanced credit low watermark. When the
+ *	port's credits reach this watermark, they become eligible to be
+ *	refilled by the DLB as credits until the high watermark
+ *	(num_ldb_credits) is reached.
+ *
+ *	If this port's scheduling domain doesn't have any load-balanced queues,
+ *	this argument is ignored and the port is given no load-balanced
+ *	credits.
+ * - dir_credit_low_watermark: Directed credit low watermark. When the port's
+ *	credits reach this watermark, they become eligible to be refilled by
+ *	the DLB as credits until the high watermark (num_dir_credits) is
+ *	reached.
+ *
+ *	If this port's scheduling domain doesn't have any directed queues,
+ *	this argument is ignored and the port is given no directed credits.
+ * - ldb_credit_quantum: Number of load-balanced credits for the DLB to refill
+ *	per refill operation.
+ *
+ *	If this port's scheduling domain doesn't have any load-balanced queues,
+ *	this argument is ignored and the port is given no load-balanced
+ *	credits.
+ * - dir_credit_quantum: Number of directed credits for the DLB to refill per
+ *	refill operation.
+ *
+ *	If this port's scheduling domain doesn't have any directed queues,
+ *	this argument is ignored and the port is given no directed credits.
+ * - padding0: Reserved for future use.
+ * - cq_depth: Depth of the port's CQ. Must be a power-of-two between 8 and
+ *	1024, inclusive.
+ * - cq_depth_threshold: CQ depth interrupt threshold. A value of N means that
+ *	the CQ interrupt won't fire until there are N or more outstanding CQ
+ *	tokens.
+ * - cq_history_list_size: Number of history list entries. This must be greater
+ *	than or equal to cq_depth.
+ * - padding1: Reserved for future use.
+ * - padding2: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: port ID.
+ */
+struct dlb_create_ldb_port_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 ldb_credit_pool_id;
+	__u32 dir_credit_pool_id;
+	__u16 ldb_credit_high_watermark;
+	__u16 ldb_credit_low_watermark;
+	__u16 ldb_credit_quantum;
+	__u16 dir_credit_high_watermark;
+	__u16 dir_credit_low_watermark;
+	__u16 dir_credit_quantum;
+	__u16 padding0;
+	__u16 cq_depth;
+	__u16 cq_depth_threshold;
+	__u16 cq_history_list_size;
+	__u32 padding1;
+};
+
+/*
+ * DLB_DOMAIN_CMD_CREATE_DIR_PORT: Configure a directed port.
+ * Input parameters:
+ * - ldb_credit_pool_id: Load-balanced credit pool this port will belong to.
+ * - dir_credit_pool_id: Directed credit pool this port will belong to.
+ * - ldb_credit_high_watermark: Number of load-balanced credits from the pool
+ *	that this port will own.
+ *
+ *	If this port's scheduling domain doesn't have any load-balanced queues,
+ *	this argument is ignored and the port is given no load-balanced
+ *	credits.
+ * - dir_credit_high_watermark: Number of directed credits from the pool that
+ *	this port will own.
+ * - ldb_credit_low_watermark: Load-balanced credit low watermark. When the
+ *	port's credits reach this watermark, they become eligible to be
+ *	refilled by the DLB as credits until the high watermark
+ *	(num_ldb_credits) is reached.
+ *
+ *	If this port's scheduling domain doesn't have any load-balanced queues,
+ *	this argument is ignored and the port is given no load-balanced
+ *	credits.
+ * - dir_credit_low_watermark: Directed credit low watermark. When the port's
+ *	credits reach this watermark, they become eligible to be refilled by
+ *	the DLB as credits until the high watermark (num_dir_credits) is
+ *	reached.
+ * - ldb_credit_quantum: Number of load-balanced credits for the DLB to refill
+ *	per refill operation.
+ *
+ *	If this port's scheduling domain doesn't have any load-balanced queues,
+ *	this argument is ignored and the port is given no load-balanced
+ *	credits.
+ * - dir_credit_quantum: Number of directed credits for the DLB to refill per
+ *	refill operation.
+ * - cq_depth: Depth of the port's CQ. Must be a power-of-two between 8 and
+ *	1024, inclusive.
+ * - cq_depth_threshold: CQ depth interrupt threshold. A value of N means that
+ *	the CQ interrupt won't fire until there are N or more outstanding CQ
+ *	tokens.
+ * - qid: Queue ID. If the corresponding directed queue is already created,
+ *	specify its ID here. Else this argument must be 0xFFFFFFFF to indicate
+ *	that the port is being created before the queue.
+ * - padding1: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: Port ID.
+ */
+struct dlb_create_dir_port_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 ldb_credit_pool_id;
+	__u32 dir_credit_pool_id;
+	__u16 ldb_credit_high_watermark;
+	__u16 ldb_credit_low_watermark;
+	__u16 ldb_credit_quantum;
+	__u16 dir_credit_high_watermark;
+	__u16 dir_credit_low_watermark;
+	__u16 dir_credit_quantum;
+	__u16 cq_depth;
+	__u16 cq_depth_threshold;
+	__s32 queue_id;
+	__u32 padding1;
+};
+
+/*
+ * DLB_DOMAIN_CMD_START_DOMAIN: Mark the end of the domain configuration. This
+ *	must be called before passing QEs into the device, and no configuration
+ *	ioctls can be issued once the domain has started. Sending QEs into the
+ *	device before calling this ioctl will result in undefined behavior.
+ * Input parameters:
+ * - (None)
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ */
+struct dlb_start_domain_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+};
+
+/*
+ * DLB_DOMAIN_CMD_MAP_QID: Map a load-balanced queue to a load-balanced port.
+ * Input parameters:
+ * - port_id: Load-balanced port ID.
+ * - qid: Load-balanced queue ID.
+ * - priority: Queue->port service priority.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ */
+struct dlb_map_qid_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 port_id;
+	__u32 qid;
+	__u32 priority;
+	__u32 padding0;
+};
+
+/*
+ * DLB_DOMAIN_CMD_UNMAP_QID: Unmap a load-balanced queue from a load-balanced
+ *	port.
+ * Input parameters:
+ * - port_id: Load-balanced port ID.
+ * - qid: Load-balanced queue ID.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ */
+struct dlb_unmap_qid_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 port_id;
+	__u32 qid;
+};
+
+/*
+ * DLB_DOMAIN_CMD_ENABLE_LDB_PORT: Enable scheduling to a load-balanced port.
+ * Input parameters:
+ * - port_id: Load-balanced port ID.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ */
+struct dlb_enable_ldb_port_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 port_id;
+	__u32 padding0;
+};
+
+/*
+ * DLB_DOMAIN_CMD_ENABLE_DIR_PORT: Enable scheduling to a directed port.
+ * Input parameters:
+ * - port_id: Directed port ID.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ */
+struct dlb_enable_dir_port_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 port_id;
+};
+
+/*
+ * DLB_DOMAIN_CMD_DISABLE_LDB_PORT: Disable scheduling to a load-balanced port.
+ * Input parameters:
+ * - port_id: Load-balanced port ID.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ */
+struct dlb_disable_ldb_port_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 port_id;
+	__u32 padding0;
+};
+
+/*
+ * DLB_DOMAIN_CMD_DISABLE_DIR_PORT: Disable scheduling to a directed port.
+ * Input parameters:
+ * - port_id: Directed port ID.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ */
+struct dlb_disable_dir_port_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 port_id;
+	__u32 padding0;
+};
+
+/*
+ * DLB_DOMAIN_CMD_BLOCK_ON_CQ_INTERRUPT: Block on a CQ interrupt until a QE
+ *	arrives for the specified port. If a QE is already present, the ioctl
+ *	will immediately return.
+ *
+ *	Note: Only one thread can block on a CQ's interrupt at a time. Doing
+ *	otherwise can result in hung threads.
+ *
+ * Input parameters:
+ * - port_id: Port ID.
+ * - is_ldb: True if the port is load-balanced, false otherwise.
+ * - arm: Tell the driver to arm the interrupt.
+ * - cq_gen: Current CQ generation bit.
+ * - padding0: Reserved for future use.
+ * - cq_va: VA of the CQ entry where the next QE will be placed.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ */
+struct dlb_block_on_cq_interrupt_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 port_id;
+	__u8 is_ldb;
+	__u8 arm;
+	__u8 cq_gen;
+	__u8 padding0;
+	__u64 cq_va;
+};
+
+/*
+ * DLB_DOMAIN_CMD_ENQUEUE_DOMAIN_ALERT: Enqueue a domain alert that will be
+ *	read by one reader thread.
+ *
+ * Input parameters:
+ * - aux_alert_data: user-defined auxiliary data.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ */
+struct dlb_enqueue_domain_alert_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u64 aux_alert_data;
+};
+
+/*
+ * DLB_DOMAIN_CMD_GET_LDB_QUEUE_DEPTH: Get a load-balanced queue's depth.
+ * Input parameters:
+ * - queue_id: The load-balanced queue ID.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: queue depth.
+ */
+struct dlb_get_ldb_queue_depth_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 queue_id;
+	__u32 padding0;
+};
+
+/*
+ * DLB_DOMAIN_CMD_GET_DIR_QUEUE_DEPTH: Get a directed queue's depth.
+ * Input parameters:
+ * - queue_id: The directed queue ID.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: queue depth.
+ */
+struct dlb_get_dir_queue_depth_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 queue_id;
+	__u32 padding0;
+};
+
+/*
+ * DLB_DOMAIN_CMD_PENDING_PORT_UNMAPS: Get number of queue unmap operations in
+ *	progress for a load-balanced port.
+ *
+ *	Note: This is a snapshot; the number of unmap operations in progress
+ *	is subject to change at any time.
+ *
+ * Input parameters:
+ * - port_id: Load-balanced port ID.
+ * - padding0: Reserved for future use.
+ *
+ * Output parameters:
+ * - response: pointer to a struct dlb_cmd_response.
+ *	response.status: Detailed error code. In certain cases, such as if the
+ *		response pointer is invalid, the driver won't set status.
+ *	response.id: number of unmaps in progress.
+ */
+struct dlb_pending_port_unmaps_args {
+	/* Output parameters */
+	__u64 response;
+	/* Input parameters */
+	__u32 port_id;
+	__u32 padding0;
+};
+
+enum dlb_domain_user_interface_commands {
+	DLB_DOMAIN_CMD_CREATE_LDB_POOL,
+	DLB_DOMAIN_CMD_CREATE_DIR_POOL,
+	DLB_DOMAIN_CMD_CREATE_LDB_QUEUE,
+	DLB_DOMAIN_CMD_CREATE_DIR_QUEUE,
+	DLB_DOMAIN_CMD_CREATE_LDB_PORT,
+	DLB_DOMAIN_CMD_CREATE_DIR_PORT,
+	DLB_DOMAIN_CMD_START_DOMAIN,
+	DLB_DOMAIN_CMD_MAP_QID,
+	DLB_DOMAIN_CMD_UNMAP_QID,
+	DLB_DOMAIN_CMD_ENABLE_LDB_PORT,
+	DLB_DOMAIN_CMD_ENABLE_DIR_PORT,
+	DLB_DOMAIN_CMD_DISABLE_LDB_PORT,
+	DLB_DOMAIN_CMD_DISABLE_DIR_PORT,
+	DLB_DOMAIN_CMD_BLOCK_ON_CQ_INTERRUPT,
+	DLB_DOMAIN_CMD_ENQUEUE_DOMAIN_ALERT,
+	DLB_DOMAIN_CMD_GET_LDB_QUEUE_DEPTH,
+	DLB_DOMAIN_CMD_GET_DIR_QUEUE_DEPTH,
+	DLB_DOMAIN_CMD_PENDING_PORT_UNMAPS,
+
+	/* NUM_DLB_DOMAIN_CMD must be last */
+	NUM_DLB_DOMAIN_CMD,
+};
+
+/*
+ * Base addresses for memory mapping the consumer queue (CQ) and popcount (PC)
+ * memory space, and producer port (PP) MMIO space. The CQ, PC, and PP
+ * addresses are per-port. Every address is page-separated (e.g. LDB PP 0 is at
+ * 0x2100000 and LDB PP 1 is at 0x2101000).
+ */
+#define DLB_LDB_CQ_BASE 0x3000000
+#define DLB_LDB_CQ_MAX_SIZE 65536
+#define DLB_LDB_CQ_OFFS(id) (DLB_LDB_CQ_BASE + (id) * DLB_LDB_CQ_MAX_SIZE)
+
+#define DLB_DIR_CQ_BASE 0x3800000
+#define DLB_DIR_CQ_MAX_SIZE 65536
+#define DLB_DIR_CQ_OFFS(id) (DLB_DIR_CQ_BASE + (id) * DLB_DIR_CQ_MAX_SIZE)
+
+#define DLB_LDB_PC_BASE 0x2300000
+#define DLB_LDB_PC_MAX_SIZE 4096
+#define DLB_LDB_PC_OFFS(id) (DLB_LDB_PC_BASE + (id) * DLB_LDB_PC_MAX_SIZE)
+
+#define DLB_DIR_PC_BASE 0x2200000
+#define DLB_DIR_PC_MAX_SIZE 4096
+#define DLB_DIR_PC_OFFS(id) (DLB_DIR_PC_BASE + (id) * DLB_DIR_PC_MAX_SIZE)
+
+#define DLB_LDB_PP_BASE 0x2100000
+#define DLB_LDB_PP_MAX_SIZE 4096
+#define DLB_LDB_PP_OFFS(id) (DLB_LDB_PP_BASE + (id) * DLB_LDB_PP_MAX_SIZE)
+
+#define DLB_DIR_PP_BASE 0x2000000
+#define DLB_DIR_PP_MAX_SIZE 4096
+#define DLB_DIR_PP_OFFS(id) (DLB_DIR_PP_BASE + (id) * DLB_DIR_PP_MAX_SIZE)
+
+#endif /* __DLB_USER_H */
-- 
1.7.10


^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH 00/27] event/dlb Intel DLB PMD
@ 2020-06-27  4:37  2% Tim McDaniel
                     ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Tim McDaniel @ 2020-06-27  4:37 UTC (permalink / raw)
  To: jerinj; +Cc: mattias.ronnblom, dev, gage.eads, harry.van.haaren

Hello Jerin and the DPDK community.

    The following patchset adds a new eventdev PMD for the Intel Dynamic
    Load Balancer (DLB) hardware.
    The DLB is a PCIe device that provides load-balanced, prioritized
    scheduling of core-to-core communication. The device consists of
    queues and arbiters that connect producer and consumer cores, and
    implements load-balanced queueing features including:
    - Lock-free multi-producer/multi-consumer operation.
    - Multiple priority levels for varying traffic types.
    - 'Direct' traffic (i.e. multi-producer/single-consumer).
    - Simple unordered load-balanced distribution.
    - Atomic lock-free load balancing across multiple consumers.
    - Queue element reordering feature allowing ordered load-balanced
      distribution.

    The DLB hardware supports both load balanced and directed ports and
    queues. Unlike other eventdev devices already in the repo, not all
    DLB ports and queues are equally capable. In particular, directed
    ports are limited to a single link, and must be connected to a directed
    queue.
    Additionally, even though LDB ports may link multiple queues, the
    number of queues that may be linked is limited by hardware. Another
    difference is that DLB does not have a straightforward way of carrying
    the flow_id in the queue elements (QE) that the hardware operates on.

    Due to the above restrictions, we would like to extend the eventdev
    interface in a manner that allows our PMD to take full advantage of
    the DLB hardware, while also adding useful functionality for non-Intel
    PMDs. Our proposed changes are contained in the first two patches
    in this patch set.
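
    As background for the restrictions above, here is a minimal
    configuration sketch using only the existing public rte_eventdev API
    (illustrative only: it does not use the extensions proposed in this
    series, the device/queue/port IDs and sizing values are placeholders,
    and error checking is omitted):

        #include <rte_eventdev.h>

        uint8_t dev_id = 0, queue_id = 0, port_id = 0, prio = 0;
        struct rte_event_dev_config cfg = {
            .nb_event_queues = 1,
            .nb_event_ports = 1,
            .nb_events_limit = 4096,
            .nb_event_queue_flows = 1024,
            .nb_event_port_dequeue_depth = 32,
            .nb_event_port_enqueue_depth = 32,
        };

        rte_event_dev_configure(dev_id, &cfg);
        rte_event_queue_setup(dev_id, queue_id, NULL); /* default conf */
        rte_event_port_setup(dev_id, port_id, NULL);   /* default conf */
        /* A DLB directed port accepts a single link only; an LDB port may
         * link several queues, up to a hardware-specific limit. */
        rte_event_port_link(dev_id, port_id, &queue_id, &prio, 1);
        rte_event_dev_start(dev_id);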

    While reviewing the code, please be aware that this PMD has full
    control over the DLB hardware. Intel will be extending the DLB PMD
    in the future (not as part of this first series) with a mode that we
    refer to as the bifurcated PMD. The bifurcated PMD communicates with a
    kernel driver to configure the device, ports, and queues, and memory
    maps device MMIO so data path operations occur purely in user-space.

    The framework to support both the PF PMD and bifurcated PMD exists in
    this patch set, and is why the iface.[ch] layer is present.

    New for V2
    =========

    	1) Correct ABI break that was present in V1.
    	2) Address some of the review comments received from Mattias.
    	   I will address the remaining items identified by Mattias in the next
    	   patch delivery.
    	3) General code cleanup based on internal code reviews.

    Known Issues:
        1) Some core dlb enqueue functions appear prematurely in an unrelated
           patch. This will be addressed in the next update to this
           patch series, after the first round of reviews has been
           completed.


    Thank You and Best Regards,

    Tim McDaniel

McDaniel, Timothy (27):
  eventdev: dlb upstream prerequisites
  eventdev: do not pass disable_implicit_release bit to trace macro
  event/dlb: add shared code version 10.7.9
  event/dlb: add make and meson build infrastructure
  event/dlb: add DLB documentation
  event/dlb: add dynamic logging
  event/dlb: add private data structures and constants
  event/dlb: add definitions shared with LKM or shared code
  event/dlb: add inline functions used in multiple files
  event/dlb: add PFPMD-specific interface layer to shared code
  event/dlb: add flexible PMD to device interfaces
  event/dlb: add the PMD's public interfaces
  event/dlb: add xstats support
  event/dlb: add PMD self-tests
  event/dlb: add probe
  event/dlb: add infos_get and configure
  event/dlb: add queue_def_conf and port_def_conf
  event/dlb: add queue setup
  event/dlb: add port_setup
  event/dlb: add port_link
  event/dlb: add queue_release and port_release
  event/dlb: add port_unlink and port_unlinks_in_progress
  event/dlb: add eventdev_start
  event/dlb: add timeout_ticks, dump, xstats, and selftest
  event/dlb: add enqueue and its burst variants
  event/dlb: add dequeue, dequeue_burst, and variants
  event/dlb: add eventdev_stop and eventdev_close

 app/test-eventdev/evt_common.h                     |    1 +
 app/test-eventdev/test_order_atq.c                 |    4 +
 app/test-eventdev/test_order_common.c              |    6 +-
 app/test-eventdev/test_order_queue.c               |    4 +
 app/test-eventdev/test_perf_atq.c                  |    1 +
 app/test-eventdev/test_perf_queue.c                |    1 +
 app/test-eventdev/test_pipeline_atq.c              |    1 +
 app/test-eventdev/test_pipeline_queue.c            |    1 +
 app/test/test_eventdev.c                           |    4 +-
 config/common_base                                 |   19 +-
 config/rte_config.h                                |    8 +-
 doc/guides/eventdevs/dlb.rst                       |  497 +
 drivers/event/Makefile                             |    7 +
 drivers/event/dlb/Makefile                         |   35 +
 drivers/event/dlb/dlb.c                            | 4232 +++++++++
 drivers/event/dlb/dlb_iface.c                      |  105 +
 drivers/event/dlb/dlb_iface.h                      |   92 +
 drivers/event/dlb/dlb_inline_fns.h                 |   80 +
 drivers/event/dlb/dlb_log.h                        |   24 +
 drivers/event/dlb/dlb_priv.h                       |  564 ++
 drivers/event/dlb/dlb_selftest.c                   | 1564 ++++
 drivers/event/dlb/dlb_user.h                       | 1083 +++
 drivers/event/dlb/dlb_xstats.c                     | 1249 +++
 drivers/event/dlb/meson.build                      |   15 +
 drivers/event/dlb/pf/base/dlb_hw_types.h           |  360 +
 drivers/event/dlb/pf/base/dlb_mbox.h               |  645 ++
 drivers/event/dlb/pf/base/dlb_osdep.h              |  348 +
 drivers/event/dlb/pf/base/dlb_osdep_bitmap.h       |  442 +
 drivers/event/dlb/pf/base/dlb_osdep_list.h         |  131 +
 drivers/event/dlb/pf/base/dlb_osdep_types.h        |   31 +
 drivers/event/dlb/pf/base/dlb_regs.h               | 2646 ++++++
 drivers/event/dlb/pf/base/dlb_resource.c           | 9700 ++++++++++++++++++++
 drivers/event/dlb/pf/base/dlb_resource.h           | 1625 ++++
 drivers/event/dlb/pf/base/dlb_user.h               | 1084 +++
 drivers/event/dlb/pf/dlb_main.c                    |  609 ++
 drivers/event/dlb/pf/dlb_main.h                    |   54 +
 drivers/event/dlb/pf/dlb_pf.c                      |  776 ++
 drivers/event/dlb/rte_pmd_dlb.c                    |   38 +
 drivers/event/dlb/rte_pmd_dlb.h                    |   69 +
 drivers/event/dlb/rte_pmd_dlb_event_version.map    |    6 +
 drivers/event/dpaa2/dpaa2_eventdev.c               |    2 +-
 drivers/event/meson.build                          |    4 +
 drivers/event/octeontx/ssovf_evdev.c               |    2 +-
 drivers/event/skeleton/skeleton_eventdev.c         |    2 +-
 drivers/event/sw/sw_evdev.c                        |    5 +-
 drivers/event/sw/sw_evdev_selftest.c               |    9 +-
 .../eventdev_pipeline/pipeline_worker_generic.c    |    8 +-
 examples/eventdev_pipeline/pipeline_worker_tx.c    |    3 +
 examples/l2fwd-event/l2fwd_event_generic.c         |    5 +-
 examples/l2fwd-event/l2fwd_event_internal_port.c   |    5 +-
 examples/l3fwd/l3fwd_event_generic.c               |    5 +-
 examples/l3fwd/l3fwd_event_internal_port.c         |    5 +-
 lib/librte_eal/x86/include/rte_cpuflags.h          |    1 +
 lib/librte_eal/x86/rte_cpuflags.c                  |    1 +
 lib/librte_eventdev/meson.build                    |    1 +
 lib/librte_eventdev/rte_event_crypto_adapter.c     |    2 +-
 lib/librte_eventdev/rte_event_eth_tx_adapter.c     |    7 +-
 lib/librte_eventdev/rte_eventdev.c                 |  201 +-
 lib/librte_eventdev/rte_eventdev.h                 |  198 +
 lib/librte_eventdev/rte_eventdev_pmd_pci.h         |   54 +
 lib/librte_eventdev/rte_eventdev_trace.h           |    9 +-
 lib/librte_eventdev/rte_eventdev_version.map       |   13 +-
 mk/rte.app.mk                                      |    1 +
 63 files changed, 28658 insertions(+), 46 deletions(-)
 create mode 100644 doc/guides/eventdevs/dlb.rst
 create mode 100644 drivers/event/dlb/Makefile
 create mode 100644 drivers/event/dlb/dlb.c
 create mode 100644 drivers/event/dlb/dlb_iface.c
 create mode 100644 drivers/event/dlb/dlb_iface.h
 create mode 100644 drivers/event/dlb/dlb_inline_fns.h
 create mode 100644 drivers/event/dlb/dlb_log.h
 create mode 100644 drivers/event/dlb/dlb_priv.h
 create mode 100644 drivers/event/dlb/dlb_selftest.c
 create mode 100644 drivers/event/dlb/dlb_user.h
 create mode 100644 drivers/event/dlb/dlb_xstats.c
 create mode 100644 drivers/event/dlb/meson.build
 create mode 100644 drivers/event/dlb/pf/base/dlb_hw_types.h
 create mode 100644 drivers/event/dlb/pf/base/dlb_mbox.h
 create mode 100644 drivers/event/dlb/pf/base/dlb_osdep.h
 create mode 100644 drivers/event/dlb/pf/base/dlb_osdep_bitmap.h
 create mode 100644 drivers/event/dlb/pf/base/dlb_osdep_list.h
 create mode 100644 drivers/event/dlb/pf/base/dlb_osdep_types.h
 create mode 100644 drivers/event/dlb/pf/base/dlb_regs.h
 create mode 100644 drivers/event/dlb/pf/base/dlb_resource.c
 create mode 100644 drivers/event/dlb/pf/base/dlb_resource.h
 create mode 100644 drivers/event/dlb/pf/base/dlb_user.h
 create mode 100644 drivers/event/dlb/pf/dlb_main.c
 create mode 100644 drivers/event/dlb/pf/dlb_main.h
 create mode 100644 drivers/event/dlb/pf/dlb_pf.c
 create mode 100644 drivers/event/dlb/rte_pmd_dlb.c
 create mode 100644 drivers/event/dlb/rte_pmd_dlb.h
 create mode 100644 drivers/event/dlb/rte_pmd_dlb_event_version.map

-- 
1.7.10


^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [20.11, PATCH] bbdev: remove experimental tag from API
  2020-06-26 23:14  3% [dpdk-dev] [20.11, PATCH] bbdev: remove experimental tag from API Nicolas Chautru
@ 2020-06-26 23:14  3% ` Nicolas Chautru
  2020-06-30  7:30  4% ` David Marchand
  1 sibling, 0 replies; 200+ results
From: Nicolas Chautru @ 2020-06-26 23:14 UTC (permalink / raw)
  To: dev, thomas, akhil.goyal; +Cc: Nicolas Chautru

This commit promotes the full bbdev interface to stable,
starting with the 20.11 major version (ABI 21).
This is overdue, as the bbdev interface has been stable for some time.

Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
---
 doc/guides/rel_notes/release_20_08.rst |  2 ++
 lib/librte_bbdev/rte_bbdev.h           | 31 -------------------------------
 lib/librte_bbdev/rte_bbdev_op.h        |  9 ---------
 lib/librte_bbdev/rte_bbdev_pmd.h       |  7 -------
 lib/librte_bbdev/rte_bbdev_version.map |  2 +-
 5 files changed, 3 insertions(+), 48 deletions(-)

diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
index 39064af..ce9f3d5 100644
--- a/doc/guides/rel_notes/release_20_08.rst
+++ b/doc/guides/rel_notes/release_20_08.rst
@@ -84,6 +84,8 @@ API Changes
    This section is a comment. Do not overwrite or remove it.
    Also, make sure to start the actual text at the margin.
    =========================================================
+* bbdev: the experimental tag is dropped from the bbdev library, and its
+  interfaces are considered stable as of DPDK 20.11.
 
 
 ABI Changes
diff --git a/lib/librte_bbdev/rte_bbdev.h b/lib/librte_bbdev/rte_bbdev.h
index ecd95a8..79a6fb4 100644
--- a/lib/librte_bbdev/rte_bbdev.h
+++ b/lib/librte_bbdev/rte_bbdev.h
@@ -10,9 +10,6 @@
  *
  * Wireless base band device abstraction APIs.
  *
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * This API allows an application to discover, configure and use a device to
  * process operations. An asynchronous API (enqueue, followed by later dequeue)
  * is used for processing operations.
@@ -55,7 +52,6 @@ enum rte_bbdev_state {
  * @return
  *   The total number of usable devices.
  */
-__rte_experimental
 uint16_t
 rte_bbdev_count(void);
 
@@ -68,7 +64,6 @@ enum rte_bbdev_state {
  * @return
  *   true if device ID is valid and device is attached, false otherwise.
  */
-__rte_experimental
 bool
 rte_bbdev_is_valid(uint16_t dev_id);
 
@@ -82,7 +77,6 @@ enum rte_bbdev_state {
  *   - The next device, or
  *   - RTE_BBDEV_MAX_DEVS if none found
  */
-__rte_experimental
 uint16_t
 rte_bbdev_find_next(uint16_t dev_id);
 
@@ -112,7 +106,6 @@ enum rte_bbdev_state {
  *   - -EBUSY if the identified device has already started
  *   - -ENOMEM if unable to allocate memory
  */
-__rte_experimental
 int
 rte_bbdev_setup_queues(uint16_t dev_id, uint16_t num_queues, int socket_id);
 
@@ -130,7 +123,6 @@ enum rte_bbdev_state {
  *   - -EBUSY if the identified device has already started
  *   - -ENOTSUP if the interrupts are not supported by the device
  */
-__rte_experimental
 int
 rte_bbdev_intr_enable(uint16_t dev_id);
 
@@ -160,7 +152,6 @@ struct rte_bbdev_queue_conf {
  *   - EINVAL if the identified queue size or priority are invalid
  *   - EBUSY if the identified queue or its device have already started
  */
-__rte_experimental
 int
 rte_bbdev_queue_configure(uint16_t dev_id, uint16_t queue_id,
 		const struct rte_bbdev_queue_conf *conf);
@@ -176,7 +167,6 @@ struct rte_bbdev_queue_conf {
  *   - 0 on success
  *   - negative value on failure - as returned from PMD driver
  */
-__rte_experimental
 int
 rte_bbdev_start(uint16_t dev_id);
 
@@ -190,7 +180,6 @@ struct rte_bbdev_queue_conf {
  * @return
  *   - 0 on success
  */
-__rte_experimental
 int
 rte_bbdev_stop(uint16_t dev_id);
 
@@ -204,7 +193,6 @@ struct rte_bbdev_queue_conf {
  * @return
  *   - 0 on success
  */
-__rte_experimental
 int
 rte_bbdev_close(uint16_t dev_id);
 
@@ -222,7 +210,6 @@ struct rte_bbdev_queue_conf {
  *   - 0 on success
  *   - negative value on failure - as returned from PMD driver
  */
-__rte_experimental
 int
 rte_bbdev_queue_start(uint16_t dev_id, uint16_t queue_id);
 
@@ -238,7 +225,6 @@ struct rte_bbdev_queue_conf {
  *   - 0 on success
  *   - negative value on failure - as returned from PMD driver
  */
-__rte_experimental
 int
 rte_bbdev_queue_stop(uint16_t dev_id, uint16_t queue_id);
 
@@ -272,7 +258,6 @@ struct rte_bbdev_stats {
  *   - 0 on success
  *   - EINVAL if invalid parameter pointer is provided
  */
-__rte_experimental
 int
 rte_bbdev_stats_get(uint16_t dev_id, struct rte_bbdev_stats *stats);
 
@@ -284,7 +269,6 @@ struct rte_bbdev_stats {
  * @return
  *   - 0 on success
  */
-__rte_experimental
 int
 rte_bbdev_stats_reset(uint16_t dev_id);
 
@@ -347,7 +331,6 @@ struct rte_bbdev_info {
  *   - 0 on success
  *   - EINVAL if invalid parameter pointer is provided
  */
-__rte_experimental
 int
 rte_bbdev_info_get(uint16_t dev_id, struct rte_bbdev_info *dev_info);
 
@@ -374,7 +357,6 @@ struct rte_bbdev_queue_info {
  *   - 0 on success
  *   - EINVAL if invalid parameter pointer is provided
  */
-__rte_experimental
 int
 rte_bbdev_queue_info_get(uint16_t dev_id, uint16_t queue_id,
 		struct rte_bbdev_queue_info *queue_info);
@@ -491,7 +473,6 @@ struct __rte_cache_aligned rte_bbdev {
  *   The number of operations actually enqueued (this is the number of processed
  *   entries in the @p ops array).
  */
-__rte_experimental
 static inline uint16_t
 rte_bbdev_enqueue_enc_ops(uint16_t dev_id, uint16_t queue_id,
 		struct rte_bbdev_enc_op **ops, uint16_t num_ops)
@@ -522,7 +503,6 @@ struct __rte_cache_aligned rte_bbdev {
  *   The number of operations actually enqueued (this is the number of processed
  *   entries in the @p ops array).
  */
-__rte_experimental
 static inline uint16_t
 rte_bbdev_enqueue_dec_ops(uint16_t dev_id, uint16_t queue_id,
 		struct rte_bbdev_dec_op **ops, uint16_t num_ops)
@@ -553,7 +533,6 @@ struct __rte_cache_aligned rte_bbdev {
  *   The number of operations actually enqueued (this is the number of processed
  *   entries in the @p ops array).
  */
-__rte_experimental
 static inline uint16_t
 rte_bbdev_enqueue_ldpc_enc_ops(uint16_t dev_id, uint16_t queue_id,
 		struct rte_bbdev_enc_op **ops, uint16_t num_ops)
@@ -584,7 +563,6 @@ struct __rte_cache_aligned rte_bbdev {
  *   The number of operations actually enqueued (this is the number of processed
  *   entries in the @p ops array).
  */
-__rte_experimental
 static inline uint16_t
 rte_bbdev_enqueue_ldpc_dec_ops(uint16_t dev_id, uint16_t queue_id,
 		struct rte_bbdev_dec_op **ops, uint16_t num_ops)
@@ -617,7 +595,6 @@ struct __rte_cache_aligned rte_bbdev {
  *   The number of operations actually dequeued (this is the number of entries
  *   copied into the @p ops array).
  */
-__rte_experimental
 static inline uint16_t
 rte_bbdev_dequeue_enc_ops(uint16_t dev_id, uint16_t queue_id,
 		struct rte_bbdev_enc_op **ops, uint16_t num_ops)
@@ -650,7 +627,6 @@ struct __rte_cache_aligned rte_bbdev {
  *   copied into the @p ops array).
  */
 
-__rte_experimental
 static inline uint16_t
 rte_bbdev_dequeue_dec_ops(uint16_t dev_id, uint16_t queue_id,
 		struct rte_bbdev_dec_op **ops, uint16_t num_ops)
@@ -682,7 +658,6 @@ struct __rte_cache_aligned rte_bbdev {
  *   The number of operations actually dequeued (this is the number of entries
  *   copied into the @p ops array).
  */
-__rte_experimental
 static inline uint16_t
 rte_bbdev_dequeue_ldpc_enc_ops(uint16_t dev_id, uint16_t queue_id,
 		struct rte_bbdev_enc_op **ops, uint16_t num_ops)
@@ -713,7 +688,6 @@ struct __rte_cache_aligned rte_bbdev {
  *   The number of operations actually dequeued (this is the number of entries
  *   copied into the @p ops array).
  */
-__rte_experimental
 static inline uint16_t
 rte_bbdev_dequeue_ldpc_dec_ops(uint16_t dev_id, uint16_t queue_id,
 		struct rte_bbdev_dec_op **ops, uint16_t num_ops)
@@ -765,7 +739,6 @@ typedef void (*rte_bbdev_cb_fn)(uint16_t dev_id,
  * @return
  *   Zero on success, negative value on failure.
  */
-__rte_experimental
 int
 rte_bbdev_callback_register(uint16_t dev_id, enum rte_bbdev_event_type event,
 		rte_bbdev_cb_fn cb_fn, void *cb_arg);
@@ -789,7 +762,6 @@ typedef void (*rte_bbdev_cb_fn)(uint16_t dev_id,
  *   - EINVAL if invalid parameter pointer is provided
  *   - EAGAIN if the provided callback pointer does not exist
  */
-__rte_experimental
 int
 rte_bbdev_callback_unregister(uint16_t dev_id, enum rte_bbdev_event_type event,
 		rte_bbdev_cb_fn cb_fn, void *cb_arg);
@@ -810,7 +782,6 @@ typedef void (*rte_bbdev_cb_fn)(uint16_t dev_id,
  *   - 0 on success
  *   - negative value on failure - as returned from PMD driver
  */
-__rte_experimental
 int
 rte_bbdev_queue_intr_enable(uint16_t dev_id, uint16_t queue_id);
 
@@ -827,7 +798,6 @@ typedef void (*rte_bbdev_cb_fn)(uint16_t dev_id,
  *   - 0 on success
  *   - negative value on failure - as returned from PMD driver
  */
-__rte_experimental
 int
 rte_bbdev_queue_intr_disable(uint16_t dev_id, uint16_t queue_id);
 
@@ -855,7 +825,6 @@ typedef void (*rte_bbdev_cb_fn)(uint16_t dev_id,
  *   - ENOTSUP if interrupts are not supported by the identified device
  *   - negative value on failure - as returned from PMD driver
  */
-__rte_experimental
 int
 rte_bbdev_queue_intr_ctl(uint16_t dev_id, uint16_t queue_id, int epfd, int op,
 		void *data);
diff --git a/lib/librte_bbdev/rte_bbdev_op.h b/lib/librte_bbdev/rte_bbdev_op.h
index f726d73..45a4ead 100644
--- a/lib/librte_bbdev/rte_bbdev_op.h
+++ b/lib/librte_bbdev/rte_bbdev_op.h
@@ -9,9 +9,6 @@
  * @file rte_bbdev_op.h
  *
  * Defines wireless base band layer 1 operations and capabilities
- *
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
  */
 
 #ifdef __cplusplus
@@ -807,7 +804,6 @@ struct rte_bbdev_op_pool_private {
  *   Operation type as string or NULL if op_type is invalid
  *
  */
-__rte_experimental
 const char*
 rte_bbdev_op_type_str(enum rte_bbdev_op_type op_type);
 
@@ -831,7 +827,6 @@ struct rte_bbdev_op_pool_private {
  *   - Pointer to a mempool on success,
  *   - NULL pointer on failure.
  */
-__rte_experimental
 struct rte_mempool *
 rte_bbdev_op_pool_create(const char *name, enum rte_bbdev_op_type type,
 		unsigned int num_elements, unsigned int cache_size,
@@ -851,7 +846,6 @@ struct rte_mempool *
  *   - 0 on success
  *   - EINVAL if invalid mempool is provided
  */
-__rte_experimental
 static inline int
 rte_bbdev_enc_op_alloc_bulk(struct rte_mempool *mempool,
 		struct rte_bbdev_enc_op **ops, uint16_t num_ops)
@@ -888,7 +882,6 @@ struct rte_mempool *
  *   - 0 on success
  *   - EINVAL if invalid mempool is provided
  */
-__rte_experimental
 static inline int
 rte_bbdev_dec_op_alloc_bulk(struct rte_mempool *mempool,
 		struct rte_bbdev_dec_op **ops, uint16_t num_ops)
@@ -921,7 +914,6 @@ struct rte_mempool *
  * @param num_ops
  *   Number of structures
  */
-__rte_experimental
 static inline void
 rte_bbdev_dec_op_free_bulk(struct rte_bbdev_dec_op **ops, unsigned int num_ops)
 {
@@ -939,7 +931,6 @@ struct rte_mempool *
  * @param num_ops
  *   Number of structures
  */
-__rte_experimental
 static inline void
 rte_bbdev_enc_op_free_bulk(struct rte_bbdev_enc_op **ops, unsigned int num_ops)
 {
diff --git a/lib/librte_bbdev/rte_bbdev_pmd.h b/lib/librte_bbdev/rte_bbdev_pmd.h
index 237e336..dd0e359 100644
--- a/lib/librte_bbdev/rte_bbdev_pmd.h
+++ b/lib/librte_bbdev/rte_bbdev_pmd.h
@@ -10,9 +10,6 @@
  *
  * Wireless base band driver-facing APIs.
  *
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * This API provides the mechanism for device drivers to register with the
  * bbdev interface. User applications should not use this API.
  */
@@ -43,7 +40,6 @@
  * @return
  *   - Slot in the rte_bbdev array for a new device;
  */
-__rte_experimental
 struct rte_bbdev *
 rte_bbdev_allocate(const char *name);
 
@@ -56,7 +52,6 @@ struct rte_bbdev *
  * @return
  *   - 0 on success, negative on error
  */
-__rte_experimental
 int
 rte_bbdev_release(struct rte_bbdev *bbdev);
 
@@ -71,7 +66,6 @@ struct rte_bbdev *
  *   - NULL otherwise
  *
  */
-__rte_experimental
 struct rte_bbdev *
 rte_bbdev_get_named_dev(const char *name);
 
@@ -190,7 +184,6 @@ struct rte_bbdev_ops {
  * @param ret_param
  *   To pass data back to user application.
  */
-__rte_experimental
 void
 rte_bbdev_pmd_callback_process(struct rte_bbdev *dev,
 	enum rte_bbdev_event_type event, void *ret_param);
diff --git a/lib/librte_bbdev/rte_bbdev_version.map b/lib/librte_bbdev/rte_bbdev_version.map
index 3624eb1..9e79be7 100644
--- a/lib/librte_bbdev/rte_bbdev_version.map
+++ b/lib/librte_bbdev/rte_bbdev_version.map
@@ -1,4 +1,4 @@
-EXPERIMENTAL {
+DPDK_21 {
 	global:
 
 	rte_bbdev_allocate;
-- 
1.8.3.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [20.11, PATCH] bbdev: remove experimental tag from API
@ 2020-06-26 23:14  3% Nicolas Chautru
  2020-06-26 23:14  3% ` Nicolas Chautru
  2020-06-30  7:30  4% ` David Marchand
  0 siblings, 2 replies; 200+ results
From: Nicolas Chautru @ 2020-06-26 23:14 UTC (permalink / raw)
  To: dev, thomas, akhil.goyal; +Cc: Nicolas Chautru

Planning to move the bbdev API to stable from 20.11 (ABI version 21)
and remove the experimental tag.
Sending now to advertise the change and get early feedback.
Some manual rebase will be required later on, notably because the
20.11 release notes file does not exist yet.
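
The practical effect for applications is that the experimental API opt-in is
no longer required to use bbdev. A hypothetical call site (illustrative
only):

	/* previously built with -DALLOW_EXPERIMENTAL_API */
	#include <rte_bbdev.h>

	uint16_t nb_devs = rte_bbdev_count();	/* stable as of 20.11 */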

Nicolas Chautru (1):
  bbdev: remove experimental tag from API

 doc/guides/rel_notes/release_20_08.rst |  2 ++
 lib/librte_bbdev/rte_bbdev.h           | 31 -------------------------------
 lib/librte_bbdev/rte_bbdev_op.h        |  9 ---------
 lib/librte_bbdev/rte_bbdev_pmd.h       |  7 -------
 lib/librte_bbdev/rte_bbdev_version.map |  2 +-
 5 files changed, 3 insertions(+), 48 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v4 4/9] eal: introduce thread uninit helper
  2020-06-26 14:47  3%   ` [dpdk-dev] [PATCH v4 4/9] eal: introduce thread uninit helper David Marchand
@ 2020-06-26 15:00  0%     ` Jerin Jacob
  2020-06-29  9:07  0%       ` David Marchand
  2020-06-29  8:59  0%     ` [dpdk-dev] [EXT] " Sunil Kumar Kori
  2020-06-30  9:42  0%     ` [dpdk-dev] " Olivier Matz
  2 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2020-06-26 15:00 UTC (permalink / raw)
  To: David Marchand
  Cc: dpdk-dev, Richardson, Bruce, Ray Kinsella, Thomas Monjalon,
	Andrew Rybchenko, Kevin Traynor, ian.stokes, i.maximets,
	Jerin Jacob, Sunil Kumar Kori, Neil Horman, Harini Ramakrishnan,
	Omar Cardona, Pallavi Kadam, Ranjit Menon

On Fri, Jun 26, 2020 at 8:18 PM David Marchand
<david.marchand@redhat.com> wrote:
>
> This is a preparation step for dynamically unregistering threads.
>
> Since we explicitly allocate a per thread trace buffer in
> rte_thread_init, add an internal helper to free this buffer.
>
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
> Note: I preferred renaming the current internal function to free all
> threads trace buffers (new name trace_mem_free()) and reuse the previous
> name (trace_mem_per_thread_free()) when freeing this buffer for a given
> thread.
>
> Changes since v2:
> - added missing stub for windows tracing support,
> - moved free symbol to exported (experimental) ABI as a counterpart of
>   the alloc symbol we already had,
>
> Changes since v1:
> - rebased on master, removed Windows workaround wrt traces support,

> +/**
> + * Uninitialize per-lcore info for current thread.
> + */
> +void rte_thread_uninit(void);
> +

Is it a public API? I guess not, as it is not added to the .map file.
If it is a private API, isn't it better to rename it with an eal_thread_
prefix, like the other private APIs in eal_thread.h?

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v4 4/9] eal: introduce thread uninit helper
    2020-06-26 14:47  3%   ` [dpdk-dev] [PATCH v4 2/9] eal: fix multiple definition of per lcore thread id David Marchand
@ 2020-06-26 14:47  3%   ` David Marchand
  2020-06-26 15:00  0%     ` Jerin Jacob
                       ` (2 more replies)
  1 sibling, 3 replies; 200+ results
From: David Marchand @ 2020-06-26 14:47 UTC (permalink / raw)
  To: dev
  Cc: jerinjacobk, bruce.richardson, mdr, thomas, arybchenko, ktraynor,
	ian.stokes, i.maximets, Jerin Jacob, Sunil Kumar Kori,
	Neil Horman, Harini Ramakrishnan, Omar Cardona, Pallavi Kadam,
	Ranjit Menon

This is a preparation step for dynamically unregistering threads.

Since we explicitly allocate a per thread trace buffer in
rte_thread_init, add an internal helper to free this buffer.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
Note: I preferred renaming the current internal function to free all
threads trace buffers (new name trace_mem_free()) and reuse the previous
name (trace_mem_per_thread_free()) when freeing this buffer for a given
thread.

Changes since v2:
- added missing stub for windows tracing support,
- moved free symbol to exported (experimental) ABI as a counterpart of
  the alloc symbol we already had,

Changes since v1:
- rebased on master, removed Windows workaround wrt traces support,
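
A minimal sketch of the intended pairing for an externally created thread
follows (illustrative only: the thread body and affinity are placeholders,
and error handling is omitted):

	static void *worker_fn(void *arg)
	{
		rte_cpuset_t cpuset;

		CPU_ZERO(&cpuset);
		CPU_SET(0, &cpuset);
		/* allocates the per-thread trace buffer, among other init */
		rte_thread_init(LCORE_ID_ANY, &cpuset);

		/* ... work that may emit trace points ... */

		/* frees only this thread's trace buffer */
		rte_thread_uninit();
		return NULL;
	}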

---
 lib/librte_eal/common/eal_common_thread.c |  9 ++++
 lib/librte_eal/common/eal_common_trace.c  | 51 +++++++++++++++++++----
 lib/librte_eal/common/eal_thread.h        |  5 +++
 lib/librte_eal/common/eal_trace.h         |  2 +-
 lib/librte_eal/include/rte_trace_point.h  |  9 ++++
 lib/librte_eal/rte_eal_version.map        |  3 ++
 lib/librte_eal/windows/eal.c              |  5 +++
 7 files changed, 75 insertions(+), 9 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c
index afb30236c5..3b30cc99d9 100644
--- a/lib/librte_eal/common/eal_common_thread.c
+++ b/lib/librte_eal/common/eal_common_thread.c
@@ -20,6 +20,7 @@
 #include "eal_internal_cfg.h"
 #include "eal_private.h"
 #include "eal_thread.h"
+#include "eal_trace.h"
 
 RTE_DEFINE_PER_LCORE(unsigned int, _lcore_id) = LCORE_ID_ANY;
 RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
@@ -161,6 +162,14 @@ rte_thread_init(unsigned int lcore_id, rte_cpuset_t *cpuset)
 	__rte_trace_mem_per_thread_alloc();
 }
 
+void
+rte_thread_uninit(void)
+{
+	__rte_trace_mem_per_thread_free();
+
+	RTE_PER_LCORE(_lcore_id) = LCORE_ID_ANY;
+}
+
 struct rte_thread_ctrl_params {
 	void *(*start_routine)(void *);
 	void *arg;
diff --git a/lib/librte_eal/common/eal_common_trace.c b/lib/librte_eal/common/eal_common_trace.c
index 875553d7e5..3e620d76ed 100644
--- a/lib/librte_eal/common/eal_common_trace.c
+++ b/lib/librte_eal/common/eal_common_trace.c
@@ -101,7 +101,7 @@ eal_trace_fini(void)
 {
 	if (!rte_trace_is_enabled())
 		return;
-	trace_mem_per_thread_free();
+	trace_mem_free();
 	trace_metadata_destroy();
 	eal_trace_args_free();
 }
@@ -370,24 +370,59 @@ __rte_trace_mem_per_thread_alloc(void)
 	rte_spinlock_unlock(&trace->lock);
 }
 
+static void
+trace_mem_per_thread_free_unlocked(struct thread_mem_meta *meta)
+{
+	if (meta->area == TRACE_AREA_HUGEPAGE)
+		eal_free_no_trace(meta->mem);
+	else if (meta->area == TRACE_AREA_HEAP)
+		free(meta->mem);
+}
+
+void
+__rte_trace_mem_per_thread_free(void)
+{
+	struct trace *trace = trace_obj_get();
+	struct __rte_trace_header *header;
+	uint32_t count;
+
+	if (RTE_PER_LCORE(trace_mem) == NULL)
+		return;
+
+	header = RTE_PER_LCORE(trace_mem);
+	rte_spinlock_lock(&trace->lock);
+	for (count = 0; count < trace->nb_trace_mem_list; count++) {
+		if (trace->lcore_meta[count].mem == header)
+			break;
+	}
+	if (count != trace->nb_trace_mem_list) {
+		struct thread_mem_meta *meta = &trace->lcore_meta[count];
+
+		trace_mem_per_thread_free_unlocked(meta);
+		if (count != trace->nb_trace_mem_list - 1) {
+			memmove(meta, meta + 1,
+				sizeof(*meta) *
+				 (trace->nb_trace_mem_list - count - 1));
+		}
+		trace->nb_trace_mem_list--;
+	}
+	rte_spinlock_unlock(&trace->lock);
+}
+
 void
-trace_mem_per_thread_free(void)
+trace_mem_free(void)
 {
 	struct trace *trace = trace_obj_get();
 	uint32_t count;
-	void *mem;
 
 	if (!rte_trace_is_enabled())
 		return;
 
 	rte_spinlock_lock(&trace->lock);
 	for (count = 0; count < trace->nb_trace_mem_list; count++) {
-		mem = trace->lcore_meta[count].mem;
-		if (trace->lcore_meta[count].area == TRACE_AREA_HUGEPAGE)
-			eal_free_no_trace(mem);
-		else if (trace->lcore_meta[count].area == TRACE_AREA_HEAP)
-			free(mem);
+		trace_mem_per_thread_free_unlocked(&trace->lcore_meta[count]);
 	}
+	trace->nb_trace_mem_list = 0;
 	rte_spinlock_unlock(&trace->lock);
 }
 
diff --git a/lib/librte_eal/common/eal_thread.h b/lib/librte_eal/common/eal_thread.h
index da5e7c93ba..4ecd8fd53a 100644
--- a/lib/librte_eal/common/eal_thread.h
+++ b/lib/librte_eal/common/eal_thread.h
@@ -25,6 +25,11 @@ __rte_noreturn void *eal_thread_loop(void *arg);
  */
 void rte_thread_init(unsigned int lcore_id, rte_cpuset_t *cpuset);
 
+/**
+ * Uninitialize per-lcore info for current thread.
+ */
+void rte_thread_uninit(void);
+
 /**
  * Get the NUMA socket id from cpu id.
  * This function is private to EAL.
diff --git a/lib/librte_eal/common/eal_trace.h b/lib/librte_eal/common/eal_trace.h
index 8f60616156..bbb6e1645c 100644
--- a/lib/librte_eal/common/eal_trace.h
+++ b/lib/librte_eal/common/eal_trace.h
@@ -106,7 +106,7 @@ int trace_metadata_create(void);
 void trace_metadata_destroy(void);
 int trace_mkdir(void);
 int trace_epoch_time_save(void);
-void trace_mem_per_thread_free(void);
+void trace_mem_free(void);
 
 /* EAL interface */
 int eal_trace_init(void);
diff --git a/lib/librte_eal/include/rte_trace_point.h b/lib/librte_eal/include/rte_trace_point.h
index 377c2414aa..686b86fdb1 100644
--- a/lib/librte_eal/include/rte_trace_point.h
+++ b/lib/librte_eal/include/rte_trace_point.h
@@ -230,6 +230,15 @@ __rte_trace_point_fp_is_enabled(void)
 __rte_experimental
 void __rte_trace_mem_per_thread_alloc(void);
 
+/**
+ * @internal
+ *
+ * Free trace memory buffer per thread.
+ *
+ */
+__rte_experimental
+void __rte_trace_mem_per_thread_free(void);
+
 /**
  * @internal
  *
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 0d42d44ce9..5831eea4b0 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -393,6 +393,9 @@ EXPERIMENTAL {
 	rte_trace_point_lookup;
 	rte_trace_regexp;
 	rte_trace_save;
+
+	# added in 20.08
+	__rte_trace_mem_per_thread_free;
 };
 
 INTERNAL {
diff --git a/lib/librte_eal/windows/eal.c b/lib/librte_eal/windows/eal.c
index adfaa00275..27a44c49ff 100644
--- a/lib/librte_eal/windows/eal.c
+++ b/lib/librte_eal/windows/eal.c
@@ -255,6 +255,11 @@ __rte_trace_mem_per_thread_alloc(void)
 {
 }
 
+void
+__rte_trace_mem_per_thread_free(void)
+{
+}
+
 void
 __rte_trace_point_emit_field(size_t sz, const char *field,
 	const char *type)
-- 
2.23.0


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v4 2/9] eal: fix multiple definition of per lcore thread id
  @ 2020-06-26 14:47  3%   ` David Marchand
  2020-06-30  9:34  0%     ` Olivier Matz
  2020-06-26 14:47  3%   ` [dpdk-dev] [PATCH v4 4/9] eal: introduce thread uninit helper David Marchand
  1 sibling, 1 reply; 200+ results
From: David Marchand @ 2020-06-26 14:47 UTC (permalink / raw)
  To: dev
  Cc: jerinjacobk, bruce.richardson, mdr, thomas, arybchenko, ktraynor,
	ian.stokes, i.maximets, Neil Horman, Cunming Liang,
	Konstantin Ananyev, Olivier Matz

Because of the inline accessor + static declaration in rte_gettid(),
we end up with multiple symbols for RTE_PER_LCORE(_thread_id).
Each compilation unit will pay a cost when accessing this information
for the first time.

$ nm build/app/dpdk-testpmd | grep per_lcore__thread_id
0000000000000054 d per_lcore__thread_id.5037
0000000000000040 d per_lcore__thread_id.5103
0000000000000048 d per_lcore__thread_id.5259
000000000000004c d per_lcore__thread_id.5259
0000000000000044 d per_lcore__thread_id.5933
0000000000000058 d per_lcore__thread_id.6261
0000000000000050 d per_lcore__thread_id.7378
000000000000005c d per_lcore__thread_id.7496
000000000000000c d per_lcore__thread_id.8016
0000000000000010 d per_lcore__thread_id.8431

Make it global as part of the DPDK_21 stable ABI.
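
For context, a minimal sketch (not the actual DPDK sources) of why the
header-defined pattern duplicates the thread-local slot per compilation unit;
the helper and variable names below are illustrative only:

	#include <rte_per_lcore.h>

	int rte_sys_gettid(void);	/* system-specific tid lookup */

	static inline int
	gettid_cached(void)
	{
		/* One independent TLS slot per .c file using this header. */
		static RTE_DEFINE_PER_LCORE(int, cached_tid) = -1;

		if (RTE_PER_LCORE(cached_tid) == -1)
			RTE_PER_LCORE(cached_tid) = rte_sys_gettid();
		return RTE_PER_LCORE(cached_tid);
	}

Declaring the variable once with RTE_DECLARE_PER_LCORE() in the header and
defining it in a single .c file, as done in the diff below, lets every
compilation unit share the same cached value.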

Fixes: ef76436c6834 ("eal: get unique thread id")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
---
 lib/librte_eal/common/eal_common_thread.c | 1 +
 lib/librte_eal/include/rte_eal.h          | 3 ++-
 lib/librte_eal/rte_eal_version.map        | 7 +++++++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c
index a5f67d811c..280c64bb76 100644
--- a/lib/librte_eal/common/eal_common_thread.c
+++ b/lib/librte_eal/common/eal_common_thread.c
@@ -22,6 +22,7 @@
 #include "eal_thread.h"
 
 RTE_DEFINE_PER_LCORE(unsigned int, _lcore_id) = LCORE_ID_ANY;
+RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
 static RTE_DEFINE_PER_LCORE(unsigned int, _socket_id) =
 	(unsigned int)SOCKET_ID_ANY;
 static RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);
diff --git a/lib/librte_eal/include/rte_eal.h b/lib/librte_eal/include/rte_eal.h
index 2f9ed298de..2edf8c6556 100644
--- a/lib/librte_eal/include/rte_eal.h
+++ b/lib/librte_eal/include/rte_eal.h
@@ -447,6 +447,8 @@ enum rte_intr_mode rte_eal_vfio_intr_mode(void);
  */
 int rte_sys_gettid(void);
 
+RTE_DECLARE_PER_LCORE(int, _thread_id);
+
 /**
  * Get system unique thread id.
  *
@@ -456,7 +458,6 @@ int rte_sys_gettid(void);
  */
 static inline int rte_gettid(void)
 {
-	static RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
 	if (RTE_PER_LCORE(_thread_id) == -1)
 		RTE_PER_LCORE(_thread_id) = rte_sys_gettid();
 	return RTE_PER_LCORE(_thread_id);
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 196eef5afa..0d42d44ce9 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -221,6 +221,13 @@ DPDK_20.0 {
 	local: *;
 };
 
+DPDK_21 {
+	global:
+
+	per_lcore__thread_id;
+
+} DPDK_20.0;
+
 EXPERIMENTAL {
 	global:
 
-- 
2.23.0


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [dpdk-techboard] [PATCH v3 0/3] Experimental/internal libraries cleanup
  2020-06-26  8:16  5% ` [dpdk-dev] [PATCH v3 0/3] Experimental/internal libraries cleanup David Marchand
                     ` (2 preceding siblings ...)
  2020-06-26  8:16 16%   ` [dpdk-dev] [PATCH v3 3/3] lib: remind experimental status in library headers David Marchand
@ 2020-06-26  9:25  0%   ` Bruce Richardson
  2020-07-05 19:55  3%   ` [dpdk-dev] " Thomas Monjalon
  4 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2020-06-26  9:25 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, thomas, honnappa.nagarahalli, techboard

On Fri, Jun 26, 2020 at 10:16:35AM +0200, David Marchand wrote:
> Following discussions on the mailing list and the 05/20 TB meeting, here
> is a series that drops the special versioning for non stable libraries.
> 
> Two notes:
> 
> - RIB/FIB library is not referenced in the API doxygen index, is this
>   intentional?
> - I inspected MAINTAINERS: librte_gro, librte_member and librte_rawdev are
>   announced as experimental while their functions are part of the 20
>   stable ABI (in .map files + no __rte_experimental marking).
>   Their fate must be discussed.
> 
> Changes since v2:
> - added librte_graph and librte_node missed when rebasing to 20.05,
> 
> Changes since v1:
> - rebased on master,
> - removed mention of 0 version in abi docs,
> - updated wording in experimental banner and abi docs following Honnappa
>   comment,
> 
> 
> -- 
> David Marchand
> 
> David Marchand (3):
>   build: remove special versioning for non stable libraries
>   drivers: drop workaround for internal libraries
>   lib: remind experimental status in library headers
> 
The build changes, and patchset as a whole, look ok to me and good to see
the code simplified by this.

Series-acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: mark internal symbols in ethdev
  2020-06-23 13:49  9% [dpdk-dev] [PATCH] doc: mark internal symbols in ethdev Ferruh Yigit
@ 2020-06-26  8:49  0% ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2020-06-26  8:49 UTC (permalink / raw)
  To: Ferruh Yigit, Neil Horman, John McNamara, Marko Kovacevic
  Cc: dev, David Marchand, Thomas Monjalon, Andrew Rybchenko



On 23/06/2020 14:49, Ferruh Yigit wrote:
> The APIs are marked in the doxygen comment but better to mark the
> symbols too. This is planned for v20.11 release.
> 
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 0bee92425..0b0f75720 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -98,6 +98,12 @@ Deprecation Notices
>    Existing ``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status``
>    APIs can be used as replacement.
>  
> +* ethdev: Some internal APIs for driver usage are exported in the .map file.
> +  Now DPDK has the ``__rte_internal`` marker, so we can mark internal APIs and move
> +  them to the INTERNAL block in .map. Although these APIs are internal, moving them
> +  will break the ABI checks, which is why the change is planned for 20.11.
> +  The internal APIs are mainly the ones listed in ``rte_ethdev_driver.h``.
> +

Acked-by: Ray Kinsella <mdr@ashroe.eu>

A bunch of other folks have already annotated "internal" APIs, and added entries to 
libabigail.abignore to suppress warnings. If you are 100% certain these are never used 
by end applications, you could do likewise.

That said, a deprecation notice and completing the change in 20.11 is definitely the better approach.
See https://git.dpdk.org/dpdk/tree/devtools/libabigail.abignore#n53
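
For illustration, a minimal hedged sketch of such an annotation - the function
name below is hypothetical; only the __rte_internal marker and the INTERNAL
version.map section mirror the mechanism being discussed:

	/* rte_example_driver.h - hypothetical driver-facing header (sketch).
	 * The matching symbol would sit in the INTERNAL block of the
	 * library's version.map instead of a DPDK_xx stable block. */
	#include <rte_compat.h>		/* provides __rte_internal */

	/**
	 * @internal
	 * Driver-only helper; not part of the application-facing ABI.
	 */
	__rte_internal
	int rte_example_driver_register(void *dev);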



>  * traffic manager: All traffic manager API's in ``rte_tm.h`` were mistakenly made
>    ABI stable in the v19.11 release. The TM maintainer and other contributors have
>    agreed to keep the TM APIs as experimental in expectation of additional spec
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3 1/3] build: remove special versioning for non stable libraries
  2020-06-26  8:16 24%   ` [dpdk-dev] [PATCH v3 1/3] build: remove special versioning for non stable libraries David Marchand
@ 2020-06-26  8:38  0%     ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2020-06-26  8:38 UTC (permalink / raw)
  To: David Marchand, dev
  Cc: thomas, honnappa.nagarahalli, techboard, Neil Horman,
	John McNamara, Marko Kovacevic



On 26/06/2020 09:16, David Marchand wrote:
> Having a special versioning for experimental/internal libraries puts an
> additional maintenance cost on the project, while this status is already
> announced in MAINTAINERS and the library headers/documentation.
> Following discussions and vote at 05/20 TB meeting [1], use a single
> versioning for all libraries in DPDK.
> 
> Note: for the ABI check, an exception [2] had been added when tweaking
> this special versioning [3].
> Prefer explicit libabigail rules (which will be dropped in 20.11).
> 
> 1: https://mails.dpdk.org/archives/dev/2020-May/168450.html
> 2: https://git.dpdk.org/dpdk/commit/?id=23d7ad5db41c
> 3: https://git.dpdk.org/dpdk/commit/?id=ec2b8cd7ed69
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
> Changes since v2:
> - added exceptions for librte_graph and librte_node missed post 20.05,
> 
> Changes since v1:
> - removed mention of special handling in ABI docs,
> 
> ---
>  buildtools/meson.build                     |  3 ---
>  config/meson.build                         | 16 +++++-------
>  devtools/check-abi.sh                      |  5 ----
>  devtools/libabigail.abignore               | 29 ++++++++++++++++++++--
>  doc/guides/contributing/abi_policy.rst     |  6 +----
>  doc/guides/contributing/abi_versioning.rst |  3 +--
>  drivers/meson.build                        | 17 +------------
>  lib/meson.build                            | 16 +-----------
>  mk/rte.lib.mk                              |  5 ----
>  9 files changed, 37 insertions(+), 63 deletions(-)
> 
> diff --git a/buildtools/meson.build b/buildtools/meson.build
> index d5f8291beb..79703b6f93 100644
> --- a/buildtools/meson.build
> +++ b/buildtools/meson.build
> @@ -18,6 +18,3 @@ else
>  endif
>  map_to_def_cmd = py3 + files('map_to_def.py')
>  sphinx_wrapper = py3 + files('call-sphinx-build.py')
> -
> -# stable ABI always starts with "DPDK_"
> -is_stable_cmd = [find_program('grep', 'findstr'), '^DPDK_']
> diff --git a/config/meson.build b/config/meson.build
> index 351e268c1f..d6d3f5271d 100644
> --- a/config/meson.build
> +++ b/config/meson.build
> @@ -25,18 +25,14 @@ major_version = '@0@.@1@'.format(pver.get(0), pver.get(1))
>  abi_version = run_command(find_program('cat', 'more'),
>  	abi_version_file).stdout().strip()
>  
> -# Regular libraries have the abi_version as the filename extension
> +# Libraries have the abi_version as the filename extension
>  # and have the soname be all but the final part of the abi_version.
> -# Experimental libraries have soname with '0.major'
> -# and the filename suffix as 0.majorminor versions,
> -# e.g. v20.1 => librte_stable.so.20.1, librte_experimental.so.0.201
> -#    sonames => librte_stable.so.20, librte_experimental.so.0.20
> -# e.g. v20.0.1 => librte_stable.so.20.0.1, librte_experimental.so.0.2001
> -#      sonames => librte_stable.so.20.0, librte_experimental.so.0.200
> +# e.g. v20.1 => librte_foo.so.20.1
> +#    sonames => librte_foo.so.20
> +# e.g. v20.0.1 => librte_foo.so.20.0.1
> +#      sonames => librte_foo.so.20.0
>  abi_va = abi_version.split('.')
> -stable_so_version = abi_va.length() == 2 ? abi_va[0] : abi_va[0] + '.' + abi_va[1]
> -experimental_abi_version = '0.' + abi_va[0] + abi_va[1] + '.' + abi_va[2]
> -experimental_so_version = experimental_abi_version
> +so_version = abi_va.length() == 2 ? abi_va[0] : abi_va[0] + '.' + abi_va[1]
>  
>  # extract all version information into the build configuration
>  dpdk_conf.set('RTE_VER_YEAR', pver.get(0).to_int())
> diff --git a/devtools/check-abi.sh b/devtools/check-abi.sh
> index dd9120e69e..e17fedbd9f 100755
> --- a/devtools/check-abi.sh
> +++ b/devtools/check-abi.sh
> @@ -44,11 +44,6 @@ for dump in $(find $refdir -name "*.dump"); do
>  		echo "Skipped glue library $name."
>  		continue
>  	fi
> -	# skip experimental libraries, with a sover starting with 0.
> -	if grep -qE "\<soname='[^']*\.so\.0\.[^']*'" $dump; then
> -		echo "Skipped experimental library $name."
> -		continue
> -	fi
>  	dump2=$(find $newdir -name $name)
>  	if [ -z "$dump2" ] || [ ! -e "$dump2" ]; then
>  		echo "Error: can't find $name in $newdir"
> diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
> index becbf842a5..0133f757d0 100644
> --- a/devtools/libabigail.abignore
> +++ b/devtools/libabigail.abignore
> @@ -50,9 +50,9 @@
>          name = rte_crypto_aead_algorithm_strings
>  
>  ;;;;;;;;;;;;;;;;;;;;;;
> -; Temporary exceptions for new __rte_internal marking till DPDK 20.11
> +; Temporary exceptions till DPDK 20.11
>  ;;;;;;;;;;;;;;;;;;;;;;
> -; Ignore moving OCTEONTX2 stable functions to INTERNAL tag
> +; Ignore moving OCTEONTX2 stable functions to INTERNAL
>  [suppress_file]
>  	file_name_regexp = ^librte_common_octeontx2\.
>  [suppress_file]
> @@ -77,3 +77,28 @@
>          name = rte_dpaa2_mbuf_alloc_bulk
>  [suppress_function]
>          name_regexp = ^dpaa2?_.*tach$
> +; Ignore soname changes for experimental libraries
> +[suppress_file]
> +	file_name_regexp = ^librte_bbdev\.
> +[suppress_file]
> +	file_name_regexp = ^librte_bpf\.
> +[suppress_file]
> +	file_name_regexp = ^librte_compressdev\.
> +[suppress_file]
> +	file_name_regexp = ^librte_fib\.
> +[suppress_file]
> +	file_name_regexp = ^librte_flow_classify\.
> +[suppress_file]
> +	file_name_regexp = ^librte_graph\.
> +[suppress_file]
> +	file_name_regexp = ^librte_ipsec\.
> +[suppress_file]
> +	file_name_regexp = ^librte_node\.
> +[suppress_file]
> +	file_name_regexp = ^librte_rcu\.
> +[suppress_file]
> +	file_name_regexp = ^librte_rib\.
> +[suppress_file]
> +	file_name_regexp = ^librte_telemetry\.
> +[suppress_file]
> +	file_name_regexp = ^librte_stack\.

Can't help feeling that we are swapping one headache for another here.
We are adding the cost of maintaining a bloated libabigail.abignore.

I still maintain that if we marked libraries as "experimental" in the filename,
it would serve as an obvious reminder and simplify this maintenance - no bloated libabigail.abignore.

I agree that there is no need for special versioning. 

> diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
> index ee17ccb200..1b2fa27865 100644
> --- a/doc/guides/contributing/abi_policy.rst
> +++ b/doc/guides/contributing/abi_policy.rst
> @@ -28,7 +28,6 @@ General Guidelines
>     once approved these will form part of the next ABI version.
>  #. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may
>     change without constraint, as they are not considered part of an ABI version.
> -   Experimental libraries have the major ABI version ``0``.
>  #. Updates to the :ref:`minimum hardware requirements <hw_rqmts>`, which drop
>     support for hardware which was previously supported, should be treated as an
>     ABI change.
> @@ -331,7 +330,4 @@ Libraries
>  ~~~~~~~~~
>  
>  Libraries marked as ``experimental`` are entirely not considered part of an ABI
> -version, and may change without warning at any time. Experimental libraries
> -always have a major ABI version of ``0`` to indicate they exist outside of
> -:ref:`abi_versioning` , with the minor version incremented with each ABI change
> -to library.
> +version, and may change without warning at any time.
> diff --git a/doc/guides/contributing/abi_versioning.rst b/doc/guides/contributing/abi_versioning.rst
> index e96fde340f..31a9205572 100644
> --- a/doc/guides/contributing/abi_versioning.rst
> +++ b/doc/guides/contributing/abi_versioning.rst
> @@ -112,8 +112,7 @@ how this may be done.
>  
>  At the same time, the major ABI version is changed atomically across all
>  libraries by incrementing the major version in the ABI_VERSION file. This is
> -done globally for all libraries that declare a stable ABI. For libraries marked
> -as EXPERIMENTAL, their major ABI version is always set to 0.
> +done globally for all libraries.

Documentation changes are 100% OK. 
Acked-by: Ray Kinsella <mdr@ashroe.eu>

>  Minor ABI versions
>  ~~~~~~~~~~~~~~~~~~
> diff --git a/drivers/meson.build b/drivers/meson.build
> index cfb6a833c9..d1b59a4bac 100644
> --- a/drivers/meson.build
> +++ b/drivers/meson.build
> @@ -124,21 +124,6 @@ foreach class:dpdk_driver_classes
>  					output: out_filename,
>  					depends: [pmdinfogen, tmp_lib])
>  
> -			version_map = '@0@/@1@/@2@_version.map'.format(
> -					meson.current_source_dir(),
> -					drv_path, lib_name)
> -
> -			is_stable = run_command(is_stable_cmd,
> -				files(version_map)).returncode() == 0
> -
> -			if is_stable
> -				lib_version = abi_version
> -				so_version = stable_so_version
> -			else
> -				lib_version = experimental_abi_version
> -				so_version = experimental_so_version
> -			endif
> -
>  			# now build the static driver
>  			static_lib = static_library(lib_name,
>  				sources,
> @@ -183,7 +168,7 @@ foreach class:dpdk_driver_classes
>  				c_args: cflags,
>  				link_args: lk_args,
>  				link_depends: lk_deps,
> -				version: lib_version,
> +				version: abi_version,
>  				soversion: so_version,
>  				install: true,
>  				install_dir: driver_install_path)
> diff --git a/lib/meson.build b/lib/meson.build
> index d190d84eff..d646f33e07 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -107,20 +107,6 @@ foreach l:libraries
>  				cflags += '-DRTE_USE_FUNCTION_VERSIONING'
>  			endif
>  
> -			version_map = '@0@/@1@/rte_@2@_version.map'.format(
> -					meson.current_source_dir(), dir_name, name)
> -
> -			is_stable = run_command(is_stable_cmd,
> -					files(version_map)).returncode() == 0
> -
> -			if is_stable
> -				lib_version = abi_version
> -				so_version = stable_so_version
> -			else
> -				lib_version = experimental_abi_version
> -				so_version = experimental_so_version
> -			endif
> -
>  			# first build static lib
>  			static_lib = static_library(libname,
>  					sources,
> @@ -179,7 +165,7 @@ foreach l:libraries
>  					include_directories: includes,
>  					link_args: lk_args,
>  					link_depends: lk_deps,
> -					version: lib_version,
> +					version: abi_version,
>  					soversion: so_version,
>  					install: true)
>  			shared_dep = declare_dependency(link_with: shared_lib,
> diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk
> index 682b590dba..229ae16814 100644
> --- a/mk/rte.lib.mk
> +++ b/mk/rte.lib.mk
> @@ -13,11 +13,6 @@ VPATH += $(SRCDIR)
>  
>  LIBABIVER ?= $(shell cat $(RTE_SRCDIR)/ABI_VERSION)
>  SOVER := $(basename $(LIBABIVER))
> -ifeq ($(shell grep -s "^DPDK_" $(SRCDIR)/$(EXPORT_MAP)),)
> -# EXPERIMENTAL ABI is versioned as 0.major+minor, e.g. 0.201 for 20.1 ABI
> -LIBABIVER := 0.$(shell echo $(LIBABIVER) | awk 'BEGIN { FS="." }; { print $$1$$2"."$$3 }')
> -SOVER := $(LIBABIVER)
> -endif
>  
>  ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
>  SONAME := $(patsubst %.a,%.so.$(SOVER),$(LIB))
> 

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v3 3/3] lib: remind experimental status in library headers
  2020-06-26  8:16  5% ` [dpdk-dev] [PATCH v3 0/3] Experimental/internal libraries cleanup David Marchand
  2020-06-26  8:16 24%   ` [dpdk-dev] [PATCH v3 1/3] build: remove special versioning for non stable libraries David Marchand
  2020-06-26  8:16  3%   ` [dpdk-dev] [PATCH v3 2/3] drivers: drop workaround for internal libraries David Marchand
@ 2020-06-26  8:16 16%   ` David Marchand
  2020-06-26  9:25  0%   ` [dpdk-dev] [dpdk-techboard] [PATCH v3 0/3] Experimental/internal libraries cleanup Bruce Richardson
  2020-07-05 19:55  3%   ` [dpdk-dev] " Thomas Monjalon
  4 siblings, 0 replies; 200+ results
From: David Marchand @ 2020-06-26  8:16 UTC (permalink / raw)
  To: dev
  Cc: thomas, honnappa.nagarahalli, techboard, stable, Ray Kinsella,
	Neil Horman, John McNamara, Marko Kovacevic, Nicolas Chautru,
	Konstantin Ananyev, Fiona Trahe, Ashish Gupta,
	Vladimir Medvedkin, Bernard Iremonger, Jerin Jacob,
	Kiran Kumar K, Nithin Dabilpuram, Pavan Nikhilesh, Gage Eads,
	Olivier Matz, Kevin Laatz

The following libraries are experimental; all of their functions can
be changed or removed:

- librte_bbdev
- librte_bpf
- librte_compressdev
- librte_fib
- librte_flow_classify
- librte_graph
- librte_ipsec
- librte_node
- librte_rcu
- librte_rib
- librte_stack
- librte_telemetry

Their status is properly announced in MAINTAINERS.
Remind this status in their headers in a common fashion (aligned to ABI
docs).

Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
Changes since v2:
- updated librte_graph and librte_node

Changes since v1:
- updated wording following Honnappa comment

---
 doc/guides/contributing/abi_policy.rst       |  8 +++++---
 lib/librte_bbdev/rte_bbdev.h                 |  3 ++-
 lib/librte_bpf/rte_bpf.h                     |  6 +++++-
 lib/librte_compressdev/rte_compressdev.h     |  6 +++++-
 lib/librte_fib/rte_fib.h                     |  7 +++++++
 lib/librte_fib/rte_fib6.h                    |  7 +++++++
 lib/librte_flow_classify/rte_flow_classify.h |  6 ++++--
 lib/librte_graph/rte_graph.h                 |  3 ++-
 lib/librte_graph/rte_graph_worker.h          |  3 ++-
 lib/librte_ipsec/rte_ipsec.h                 |  6 +++++-
 lib/librte_node/rte_node_eth_api.h           |  3 ++-
 lib/librte_node/rte_node_ip4_api.h           |  3 ++-
 lib/librte_rcu/rte_rcu_qsbr.h                |  7 ++++++-
 lib/librte_rib/rte_rib.h                     |  7 +++++++
 lib/librte_rib/rte_rib6.h                    |  7 +++++++
 lib/librte_stack/rte_stack.h                 |  7 +++++--
 lib/librte_telemetry/rte_telemetry.h         | 10 ++++++----
 17 files changed, 79 insertions(+), 20 deletions(-)

diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
index 1b2fa27865..d0affa9e60 100644
--- a/doc/guides/contributing/abi_policy.rst
+++ b/doc/guides/contributing/abi_policy.rst
@@ -27,7 +27,8 @@ General Guidelines
 #. The removal of symbols is considered an :ref:`ABI breakage <abi_breakages>`,
    once approved these will form part of the next ABI version.
 #. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may
-   change without constraint, as they are not considered part of an ABI version.
+   be changed or removed without prior notice, as they are not considered part
+   of an ABI version.
 #. Updates to the :ref:`minimum hardware requirements <hw_rqmts>`, which drop
    support for hardware which was previously supported, should be treated as an
    ABI change.
@@ -294,7 +295,7 @@ APIs
 ~~~~
 
 APIs marked as ``experimental`` are not considered part of an ABI version and
-may change without warning at any time. Since changes to APIs are most likely
+may be changed or removed without prior notice. Since changes to APIs are most likely
 immediately after their introduction, as users begin to take advantage of those
 new APIs and start finding issues with them, new DPDK APIs will be automatically
 marked as ``experimental`` to allow for a period of stabilization before they
@@ -330,4 +331,5 @@ Libraries
 ~~~~~~~~~
 
 Libraries marked as ``experimental`` are entirely not considered part of an ABI
-version, and may change without warning at any time.
+version.
+All functions in such libraries may be changed or removed without prior notice.
diff --git a/lib/librte_bbdev/rte_bbdev.h b/lib/librte_bbdev/rte_bbdev.h
index ecd95a823d..57291373fa 100644
--- a/lib/librte_bbdev/rte_bbdev.h
+++ b/lib/librte_bbdev/rte_bbdev.h
@@ -11,7 +11,8 @@
  * Wireless base band device abstraction APIs.
  *
  * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
  *
  * This API allows an application to discover, configure and use a device to
  * process operations. An asynchronous API (enqueue, followed by later dequeue)
diff --git a/lib/librte_bpf/rte_bpf.h b/lib/librte_bpf/rte_bpf.h
index cbf1cddaca..e2d419b4ef 100644
--- a/lib/librte_bpf/rte_bpf.h
+++ b/lib/librte_bpf/rte_bpf.h
@@ -7,9 +7,13 @@
 
 /**
  * @file rte_bpf.h
- * @b EXPERIMENTAL: this API may change without prior notice
  *
  * RTE BPF support.
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
  * librte_bpf provides a framework to load and execute eBPF bytecode
  * inside user-space dpdk based applications.
  * It supports basic set of features from eBPF spec
diff --git a/lib/librte_compressdev/rte_compressdev.h b/lib/librte_compressdev/rte_compressdev.h
index 8052efe675..2840c27c6c 100644
--- a/lib/librte_compressdev/rte_compressdev.h
+++ b/lib/librte_compressdev/rte_compressdev.h
@@ -8,7 +8,11 @@
 /**
  * @file rte_compressdev.h
  *
- * RTE Compression Device APIs
+ * RTE Compression Device APIs.
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
  *
  * Defines comp device APIs for the provisioning of compression operations.
  */
diff --git a/lib/librte_fib/rte_fib.h b/lib/librte_fib/rte_fib.h
index af3bbf07ee..84ee774d2d 100644
--- a/lib/librte_fib/rte_fib.h
+++ b/lib/librte_fib/rte_fib.h
@@ -8,6 +8,13 @@
 
 /**
  * @file
+ *
+ * RTE FIB library.
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
  * FIB (Forwarding information base) implementation
  * for IPv4 Longest Prefix Match
  */
diff --git a/lib/librte_fib/rte_fib6.h b/lib/librte_fib/rte_fib6.h
index 66c71c84c9..bbfcf23a85 100644
--- a/lib/librte_fib/rte_fib6.h
+++ b/lib/librte_fib/rte_fib6.h
@@ -8,6 +8,13 @@
 
 /**
  * @file
+ *
+ * RTE FIB6 library.
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
  * FIB (Forwarding information base) implementation
  * for IPv6 Longest Prefix Match
  */
diff --git a/lib/librte_flow_classify/rte_flow_classify.h b/lib/librte_flow_classify/rte_flow_classify.h
index 74d1ecaf50..82ea92b6a6 100644
--- a/lib/librte_flow_classify/rte_flow_classify.h
+++ b/lib/librte_flow_classify/rte_flow_classify.h
@@ -8,9 +8,11 @@
 /**
  * @file
  *
- * RTE Flow Classify Library
+ * RTE Flow Classify Library.
  *
- * @b EXPERIMENTAL: this API may change without prior notice
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
  *
  * This library provides flow record information with some measured properties.
  *
diff --git a/lib/librte_graph/rte_graph.h b/lib/librte_graph/rte_graph.h
index 9a26ffc185..b32c4bc217 100644
--- a/lib/librte_graph/rte_graph.h
+++ b/lib/librte_graph/rte_graph.h
@@ -9,7 +9,8 @@
  * @file rte_graph.h
  *
  * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
  *
  * Graph architecture abstracts the data processing functions as
  * "node" and "link" them together to create a complex "graph" to enable
diff --git a/lib/librte_graph/rte_graph_worker.h b/lib/librte_graph/rte_graph_worker.h
index 4c3ddcbdeb..eef77f732a 100644
--- a/lib/librte_graph/rte_graph_worker.h
+++ b/lib/librte_graph/rte_graph_worker.h
@@ -9,7 +9,8 @@
  * @file rte_graph_worker.h
  *
  * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
  *
  * This API allows a worker thread to walk over a graph and nodes to create,
  * process, enqueue and move streams of objects to the next nodes.
diff --git a/lib/librte_ipsec/rte_ipsec.h b/lib/librte_ipsec/rte_ipsec.h
index 6666cf7619..de05f4e932 100644
--- a/lib/librte_ipsec/rte_ipsec.h
+++ b/lib/librte_ipsec/rte_ipsec.h
@@ -7,9 +7,13 @@
 
 /**
  * @file rte_ipsec.h
- * @b EXPERIMENTAL: this API may change without prior notice
  *
  * RTE IPsec support.
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
  * librte_ipsec provides a framework for data-path IPsec protocol
  * processing (ESP/AH).
  */
diff --git a/lib/librte_node/rte_node_eth_api.h b/lib/librte_node/rte_node_eth_api.h
index e9a53afe5d..4e28f86d77 100644
--- a/lib/librte_node/rte_node_eth_api.h
+++ b/lib/librte_node/rte_node_eth_api.h
@@ -9,7 +9,8 @@
  * @file rte_node_eth_api.h
  *
  * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
  *
  * This API allows to setup ethdev_rx and ethdev_tx nodes
  * and its queue associations.
diff --git a/lib/librte_node/rte_node_ip4_api.h b/lib/librte_node/rte_node_ip4_api.h
index 31a752b00b..eb9ebd5f89 100644
--- a/lib/librte_node/rte_node_ip4_api.h
+++ b/lib/librte_node/rte_node_ip4_api.h
@@ -9,7 +9,8 @@
  * @file rte_node_ip4_api.h
  *
  * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
  *
  * This API allows to do control path functions of ip4_* nodes
  * like ip4_lookup, ip4_rewrite.
diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
index fd4eb52b7f..a98e8f0f82 100644
--- a/lib/librte_rcu/rte_rcu_qsbr.h
+++ b/lib/librte_rcu/rte_rcu_qsbr.h
@@ -7,7 +7,12 @@
 
 /**
  * @file
- * RTE Quiescent State Based Reclamation (QSBR)
+ *
+ * RTE Quiescent State Based Reclamation (QSBR).
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
  *
  * Quiescent State (QS) is any point in the thread execution
  * where the thread does not hold a reference to a data structure
diff --git a/lib/librte_rib/rte_rib.h b/lib/librte_rib/rte_rib.h
index 6b70de980a..da558c417e 100644
--- a/lib/librte_rib/rte_rib.h
+++ b/lib/librte_rib/rte_rib.h
@@ -8,6 +8,13 @@
 
 /**
  * @file
+ *
+ * RTE RIB library.
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
  * Level compressed tree implementation for IPv4 Longest Prefix Match
  */
 
diff --git a/lib/librte_rib/rte_rib6.h b/lib/librte_rib/rte_rib6.h
index 871457138d..4b284c913c 100644
--- a/lib/librte_rib/rte_rib6.h
+++ b/lib/librte_rib/rte_rib6.h
@@ -8,6 +8,13 @@
 
 /**
  * @file
+ *
+ * RTE rib6 library.
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
  * Level compressed tree implementation for IPv6 Longest Prefix Match
  */
 
diff --git a/lib/librte_stack/rte_stack.h b/lib/librte_stack/rte_stack.h
index 27ddb199e5..abf6420766 100644
--- a/lib/librte_stack/rte_stack.h
+++ b/lib/librte_stack/rte_stack.h
@@ -4,9 +4,12 @@
 
 /**
  * @file rte_stack.h
- * @b EXPERIMENTAL: this API may change without prior notice
  *
- * RTE Stack
+ * RTE Stack.
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
  *
  * librte_stack provides an API for configuration and use of a bounded stack of
  * pointers. Push and pop operations are MT-safe, allowing concurrent access,
diff --git a/lib/librte_telemetry/rte_telemetry.h b/lib/librte_telemetry/rte_telemetry.h
index eb7f2c917c..d13010b8fb 100644
--- a/lib/librte_telemetry/rte_telemetry.h
+++ b/lib/librte_telemetry/rte_telemetry.h
@@ -20,11 +20,13 @@
 #define RTE_TEL_MAX_ARRAY_ENTRIES 512
 
 /**
- * @warning
- * @b EXPERIMENTAL: all functions in this file may change without prior notice
- *
  * @file
- * RTE Telemetry
+ *
+ * RTE Telemetry.
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
  *
  * The telemetry library provides a method to retrieve statistics from
  * DPDK by sending a request message over a socket. DPDK will send
-- 
2.23.0


^ permalink raw reply	[relevance 16%]

* [dpdk-dev] [PATCH v3 2/3] drivers: drop workaround for internal libraries
  2020-06-26  8:16  5% ` [dpdk-dev] [PATCH v3 0/3] Experimental/internal libraries cleanup David Marchand
  2020-06-26  8:16 24%   ` [dpdk-dev] [PATCH v3 1/3] build: remove special versioning for non stable libraries David Marchand
@ 2020-06-26  8:16  3%   ` David Marchand
  2020-06-26  8:16 16%   ` [dpdk-dev] [PATCH v3 3/3] lib: remind experimental status in library headers David Marchand
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 200+ results
From: David Marchand @ 2020-06-26  8:16 UTC (permalink / raw)
  To: dev
  Cc: thomas, honnappa.nagarahalli, techboard, Ray Kinsella,
	Neil Horman, Hemant Agrawal, Sachin Saxena, Jerin Jacob,
	Nithin Dabilpuram, Akhil Goyal

Now that all libraries have a single version, we can drop the empty
stable blocks that had been added when moving symbols from stable to
internal ABI.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 drivers/bus/dpaa/rte_bus_dpaa_version.map                   | 6 ++----
 drivers/bus/fslmc/rte_bus_fslmc_version.map                 | 6 ++----
 drivers/common/dpaax/rte_common_dpaax_version.map           | 6 ++----
 drivers/common/octeontx2/rte_common_octeontx2_version.map   | 6 ++----
 drivers/crypto/dpaa2_sec/rte_pmd_dpaa2_sec_version.map      | 6 ++----
 drivers/crypto/dpaa_sec/rte_pmd_dpaa_sec_version.map        | 6 ++----
 drivers/mempool/dpaa/rte_mempool_dpaa_version.map           | 6 ++----
 drivers/mempool/octeontx2/rte_mempool_octeontx2_version.map | 6 ++----
 drivers/net/dpaa2/rte_pmd_dpaa2_version.map                 | 6 ++----
 9 files changed, 18 insertions(+), 36 deletions(-)

diff --git a/drivers/bus/dpaa/rte_bus_dpaa_version.map b/drivers/bus/dpaa/rte_bus_dpaa_version.map
index 46d42f7d64..491c507119 100644
--- a/drivers/bus/dpaa/rte_bus_dpaa_version.map
+++ b/drivers/bus/dpaa/rte_bus_dpaa_version.map
@@ -1,7 +1,3 @@
-DPDK_20.0 {
-	local: *;
-};
-
 INTERNAL {
 	global:
 
@@ -90,4 +86,6 @@ INTERNAL {
 	rte_dpaa_portal_fq_close;
 	rte_dpaa_portal_fq_init;
 	rte_dpaa_portal_init;
+
+	local: *;
 };
diff --git a/drivers/bus/fslmc/rte_bus_fslmc_version.map b/drivers/bus/fslmc/rte_bus_fslmc_version.map
index 69e7dc6ad9..0a9947a454 100644
--- a/drivers/bus/fslmc/rte_bus_fslmc_version.map
+++ b/drivers/bus/fslmc/rte_bus_fslmc_version.map
@@ -1,7 +1,3 @@
-DPDK_20.0 {
-	local: *;
-};
-
 EXPERIMENTAL {
 	global:
 
@@ -111,4 +107,6 @@ INTERNAL {
 	rte_fslmc_get_device_count;
 	rte_fslmc_object_register;
 	rte_global_active_dqs_list;
+
+	local: *;
 };
diff --git a/drivers/common/dpaax/rte_common_dpaax_version.map b/drivers/common/dpaax/rte_common_dpaax_version.map
index 49c775c072..ee1ca6801c 100644
--- a/drivers/common/dpaax/rte_common_dpaax_version.map
+++ b/drivers/common/dpaax/rte_common_dpaax_version.map
@@ -1,7 +1,3 @@
-DPDK_20.0 {
-	local: *;
-};
-
 INTERNAL {
 	global:
 
@@ -23,4 +19,6 @@ INTERNAL {
 	of_n_addr_cells;
 	of_translate_address;
 	rta_sec_era;
+
+	local: *;
 };
diff --git a/drivers/common/octeontx2/rte_common_octeontx2_version.map b/drivers/common/octeontx2/rte_common_octeontx2_version.map
index d26bd71172..9a9969613b 100644
--- a/drivers/common/octeontx2/rte_common_octeontx2_version.map
+++ b/drivers/common/octeontx2/rte_common_octeontx2_version.map
@@ -1,7 +1,3 @@
-DPDK_20.0 {
-	local: *;
-};
-
 INTERNAL {
 	global:
 
@@ -42,4 +38,6 @@ INTERNAL {
 	otx2_sso_pf_func_get;
 	otx2_sso_pf_func_set;
 	otx2_unregister_irq;
+
+	local: *;
 };
diff --git a/drivers/crypto/dpaa2_sec/rte_pmd_dpaa2_sec_version.map b/drivers/crypto/dpaa2_sec/rte_pmd_dpaa2_sec_version.map
index 3d863aff4d..1352f576e5 100644
--- a/drivers/crypto/dpaa2_sec/rte_pmd_dpaa2_sec_version.map
+++ b/drivers/crypto/dpaa2_sec/rte_pmd_dpaa2_sec_version.map
@@ -1,10 +1,8 @@
-DPDK_20.0 {
-	local: *;
-};
-
 INTERNAL {
 	global:
 
 	dpaa2_sec_eventq_attach;
 	dpaa2_sec_eventq_detach;
+
+	local: *;
 };
diff --git a/drivers/crypto/dpaa_sec/rte_pmd_dpaa_sec_version.map b/drivers/crypto/dpaa_sec/rte_pmd_dpaa_sec_version.map
index 023e120516..731ea593ad 100644
--- a/drivers/crypto/dpaa_sec/rte_pmd_dpaa_sec_version.map
+++ b/drivers/crypto/dpaa_sec/rte_pmd_dpaa_sec_version.map
@@ -1,10 +1,8 @@
-DPDK_20.0 {
-	local: *;
-};
-
 INTERNAL {
 	global:
 
 	dpaa_sec_eventq_attach;
 	dpaa_sec_eventq_detach;
+
+	local: *;
 };
diff --git a/drivers/mempool/dpaa/rte_mempool_dpaa_version.map b/drivers/mempool/dpaa/rte_mempool_dpaa_version.map
index 89d7cf4957..142547ee38 100644
--- a/drivers/mempool/dpaa/rte_mempool_dpaa_version.map
+++ b/drivers/mempool/dpaa/rte_mempool_dpaa_version.map
@@ -1,10 +1,8 @@
-DPDK_20.0 {
-	local: *;
-};
-
 INTERNAL {
 	global:
 
 	rte_dpaa_bpid_info;
 	rte_dpaa_memsegs;
+
+	local: *;
 };
diff --git a/drivers/mempool/octeontx2/rte_mempool_octeontx2_version.map b/drivers/mempool/octeontx2/rte_mempool_octeontx2_version.map
index 8691efdfd8..e6887ceb8f 100644
--- a/drivers/mempool/octeontx2/rte_mempool_octeontx2_version.map
+++ b/drivers/mempool/octeontx2/rte_mempool_octeontx2_version.map
@@ -1,10 +1,8 @@
-DPDK_20.0 {
-	local: *;
-};
-
 INTERNAL {
 	global:
 
 	otx2_npa_lf_fini;
 	otx2_npa_lf_init;
+
+	local: *;
 };
diff --git a/drivers/net/dpaa2/rte_pmd_dpaa2_version.map b/drivers/net/dpaa2/rte_pmd_dpaa2_version.map
index b633fdc2a8..c3a457d2b9 100644
--- a/drivers/net/dpaa2/rte_pmd_dpaa2_version.map
+++ b/drivers/net/dpaa2/rte_pmd_dpaa2_version.map
@@ -1,7 +1,3 @@
-DPDK_20.0 {
-	local: *;
-};
-
 EXPERIMENTAL {
 	global:
 
@@ -15,4 +11,6 @@ INTERNAL {
 
 	dpaa2_eth_eventq_attach;
 	dpaa2_eth_eventq_detach;
+
+	local: *;
 };
-- 
2.23.0


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v3 1/3] build: remove special versioning for non stable libraries
  2020-06-26  8:16  5% ` [dpdk-dev] [PATCH v3 0/3] Experimental/internal libraries cleanup David Marchand
@ 2020-06-26  8:16 24%   ` David Marchand
  2020-06-26  8:38  0%     ` Kinsella, Ray
  2020-06-26  8:16  3%   ` [dpdk-dev] [PATCH v3 2/3] drivers: drop workaround for internal libraries David Marchand
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 200+ results
From: David Marchand @ 2020-06-26  8:16 UTC (permalink / raw)
  To: dev
  Cc: thomas, honnappa.nagarahalli, techboard, Ray Kinsella,
	Neil Horman, John McNamara, Marko Kovacevic

Having a special versioning for experimental/internal libraries puts an
additional maintenance cost on the project, while this status is already
announced in MAINTAINERS and the library headers/documentation.
Following discussions and vote at 05/20 TB meeting [1], use a single
versioning for all libraries in DPDK.

Note: for the ABI check, an exception [2] had been added when tweaking
this special versioning [3].
Prefer explicit libabigail rules (which will be dropped in 20.11).

1: https://mails.dpdk.org/archives/dev/2020-May/168450.html
2: https://git.dpdk.org/dpdk/commit/?id=23d7ad5db41c
3: https://git.dpdk.org/dpdk/commit/?id=ec2b8cd7ed69

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
Changes since v2:
- added exceptions for librte_graph and librte_node missed post 20.05,

Changes since v1:
- removed mention of special handling in ABI docs,

---
 buildtools/meson.build                     |  3 ---
 config/meson.build                         | 16 +++++-------
 devtools/check-abi.sh                      |  5 ----
 devtools/libabigail.abignore               | 29 ++++++++++++++++++++--
 doc/guides/contributing/abi_policy.rst     |  6 +----
 doc/guides/contributing/abi_versioning.rst |  3 +--
 drivers/meson.build                        | 17 +------------
 lib/meson.build                            | 16 +-----------
 mk/rte.lib.mk                              |  5 ----
 9 files changed, 37 insertions(+), 63 deletions(-)

diff --git a/buildtools/meson.build b/buildtools/meson.build
index d5f8291beb..79703b6f93 100644
--- a/buildtools/meson.build
+++ b/buildtools/meson.build
@@ -18,6 +18,3 @@ else
 endif
 map_to_def_cmd = py3 + files('map_to_def.py')
 sphinx_wrapper = py3 + files('call-sphinx-build.py')
-
-# stable ABI always starts with "DPDK_"
-is_stable_cmd = [find_program('grep', 'findstr'), '^DPDK_']
diff --git a/config/meson.build b/config/meson.build
index 351e268c1f..d6d3f5271d 100644
--- a/config/meson.build
+++ b/config/meson.build
@@ -25,18 +25,14 @@ major_version = '@0@.@1@'.format(pver.get(0), pver.get(1))
 abi_version = run_command(find_program('cat', 'more'),
 	abi_version_file).stdout().strip()
 
-# Regular libraries have the abi_version as the filename extension
+# Libraries have the abi_version as the filename extension
 # and have the soname be all but the final part of the abi_version.
-# Experimental libraries have soname with '0.major'
-# and the filename suffix as 0.majorminor versions,
-# e.g. v20.1 => librte_stable.so.20.1, librte_experimental.so.0.201
-#    sonames => librte_stable.so.20, librte_experimental.so.0.20
-# e.g. v20.0.1 => librte_stable.so.20.0.1, librte_experimental.so.0.2001
-#      sonames => librte_stable.so.20.0, librte_experimental.so.0.200
+# e.g. v20.1 => librte_foo.so.20.1
+#    sonames => librte_foo.so.20
+# e.g. v20.0.1 => librte_foo.so.20.0.1
+#      sonames => librte_foo.so.20.0
 abi_va = abi_version.split('.')
-stable_so_version = abi_va.length() == 2 ? abi_va[0] : abi_va[0] + '.' + abi_va[1]
-experimental_abi_version = '0.' + abi_va[0] + abi_va[1] + '.' + abi_va[2]
-experimental_so_version = experimental_abi_version
+so_version = abi_va.length() == 2 ? abi_va[0] : abi_va[0] + '.' + abi_va[1]
 
 # extract all version information into the build configuration
 dpdk_conf.set('RTE_VER_YEAR', pver.get(0).to_int())
diff --git a/devtools/check-abi.sh b/devtools/check-abi.sh
index dd9120e69e..e17fedbd9f 100755
--- a/devtools/check-abi.sh
+++ b/devtools/check-abi.sh
@@ -44,11 +44,6 @@ for dump in $(find $refdir -name "*.dump"); do
 		echo "Skipped glue library $name."
 		continue
 	fi
-	# skip experimental libraries, with a sover starting with 0.
-	if grep -qE "\<soname='[^']*\.so\.0\.[^']*'" $dump; then
-		echo "Skipped experimental library $name."
-		continue
-	fi
 	dump2=$(find $newdir -name $name)
 	if [ -z "$dump2" ] || [ ! -e "$dump2" ]; then
 		echo "Error: can't find $name in $newdir"
diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index becbf842a5..0133f757d0 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -50,9 +50,9 @@
         name = rte_crypto_aead_algorithm_strings
 
 ;;;;;;;;;;;;;;;;;;;;;;
-; Temporary exceptions for new __rte_internal marking till DPDK 20.11
+; Temporary exceptions till DPDK 20.11
 ;;;;;;;;;;;;;;;;;;;;;;
-; Ignore moving OCTEONTX2 stable functions to INTERNAL tag
+; Ignore moving OCTEONTX2 stable functions to INTERNAL
 [suppress_file]
 	file_name_regexp = ^librte_common_octeontx2\.
 [suppress_file]
@@ -77,3 +77,28 @@
         name = rte_dpaa2_mbuf_alloc_bulk
 [suppress_function]
         name_regexp = ^dpaa2?_.*tach$
+; Ignore soname changes for experimental libraries
+[suppress_file]
+	file_name_regexp = ^librte_bbdev\.
+[suppress_file]
+	file_name_regexp = ^librte_bpf\.
+[suppress_file]
+	file_name_regexp = ^librte_compressdev\.
+[suppress_file]
+	file_name_regexp = ^librte_fib\.
+[suppress_file]
+	file_name_regexp = ^librte_flow_classify\.
+[suppress_file]
+	file_name_regexp = ^librte_graph\.
+[suppress_file]
+	file_name_regexp = ^librte_ipsec\.
+[suppress_file]
+	file_name_regexp = ^librte_node\.
+[suppress_file]
+	file_name_regexp = ^librte_rcu\.
+[suppress_file]
+	file_name_regexp = ^librte_rib\.
+[suppress_file]
+	file_name_regexp = ^librte_telemetry\.
+[suppress_file]
+	file_name_regexp = ^librte_stack\.
diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
index ee17ccb200..1b2fa27865 100644
--- a/doc/guides/contributing/abi_policy.rst
+++ b/doc/guides/contributing/abi_policy.rst
@@ -28,7 +28,6 @@ General Guidelines
    once approved these will form part of the next ABI version.
 #. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may
    change without constraint, as they are not considered part of an ABI version.
-   Experimental libraries have the major ABI version ``0``.
 #. Updates to the :ref:`minimum hardware requirements <hw_rqmts>`, which drop
    support for hardware which was previously supported, should be treated as an
    ABI change.
@@ -331,7 +330,4 @@ Libraries
 ~~~~~~~~~
 
 Libraries marked as ``experimental`` are entirely not considered part of an ABI
-version, and may change without warning at any time. Experimental libraries
-always have a major ABI version of ``0`` to indicate they exist outside of
-:ref:`abi_versioning` , with the minor version incremented with each ABI change
-to library.
+version, and may change without warning at any time.
diff --git a/doc/guides/contributing/abi_versioning.rst b/doc/guides/contributing/abi_versioning.rst
index e96fde340f..31a9205572 100644
--- a/doc/guides/contributing/abi_versioning.rst
+++ b/doc/guides/contributing/abi_versioning.rst
@@ -112,8 +112,7 @@ how this may be done.
 
 At the same time, the major ABI version is changed atomically across all
 libraries by incrementing the major version in the ABI_VERSION file. This is
-done globally for all libraries that declare a stable ABI. For libraries marked
-as EXPERIMENTAL, their major ABI version is always set to 0.
+done globally for all libraries.
 
 Minor ABI versions
 ~~~~~~~~~~~~~~~~~~
diff --git a/drivers/meson.build b/drivers/meson.build
index cfb6a833c9..d1b59a4bac 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -124,21 +124,6 @@ foreach class:dpdk_driver_classes
 					output: out_filename,
 					depends: [pmdinfogen, tmp_lib])
 
-			version_map = '@0@/@1@/@2@_version.map'.format(
-					meson.current_source_dir(),
-					drv_path, lib_name)
-
-			is_stable = run_command(is_stable_cmd,
-				files(version_map)).returncode() == 0
-
-			if is_stable
-				lib_version = abi_version
-				so_version = stable_so_version
-			else
-				lib_version = experimental_abi_version
-				so_version = experimental_so_version
-			endif
-
 			# now build the static driver
 			static_lib = static_library(lib_name,
 				sources,
@@ -183,7 +168,7 @@ foreach class:dpdk_driver_classes
 				c_args: cflags,
 				link_args: lk_args,
 				link_depends: lk_deps,
-				version: lib_version,
+				version: abi_version,
 				soversion: so_version,
 				install: true,
 				install_dir: driver_install_path)
diff --git a/lib/meson.build b/lib/meson.build
index d190d84eff..d646f33e07 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -107,20 +107,6 @@ foreach l:libraries
 				cflags += '-DRTE_USE_FUNCTION_VERSIONING'
 			endif
 
-			version_map = '@0@/@1@/rte_@2@_version.map'.format(
-					meson.current_source_dir(), dir_name, name)
-
-			is_stable = run_command(is_stable_cmd,
-					files(version_map)).returncode() == 0
-
-			if is_stable
-				lib_version = abi_version
-				so_version = stable_so_version
-			else
-				lib_version = experimental_abi_version
-				so_version = experimental_so_version
-			endif
-
 			# first build static lib
 			static_lib = static_library(libname,
 					sources,
@@ -179,7 +165,7 @@ foreach l:libraries
 					include_directories: includes,
 					link_args: lk_args,
 					link_depends: lk_deps,
-					version: lib_version,
+					version: abi_version,
 					soversion: so_version,
 					install: true)
 			shared_dep = declare_dependency(link_with: shared_lib,
diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk
index 682b590dba..229ae16814 100644
--- a/mk/rte.lib.mk
+++ b/mk/rte.lib.mk
@@ -13,11 +13,6 @@ VPATH += $(SRCDIR)
 
 LIBABIVER ?= $(shell cat $(RTE_SRCDIR)/ABI_VERSION)
 SOVER := $(basename $(LIBABIVER))
-ifeq ($(shell grep -s "^DPDK_" $(SRCDIR)/$(EXPORT_MAP)),)
-# EXPERIMENTAL ABI is versioned as 0.major+minor, e.g. 0.201 for 20.1 ABI
-LIBABIVER := 0.$(shell echo $(LIBABIVER) | awk 'BEGIN { FS="." }; { print $$1$$2"."$$3 }')
-SOVER := $(LIBABIVER)
-endif
 
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
 SONAME := $(patsubst %.a,%.so.$(SOVER),$(LIB))
-- 
2.23.0


^ permalink raw reply	[relevance 24%]

* [dpdk-dev] [PATCH v3 0/3] Experimental/internal libraries cleanup
    2020-06-25  7:21  5% ` [dpdk-dev] [PATCH v2 " David Marchand
@ 2020-06-26  8:16  5% ` David Marchand
  2020-06-26  8:16 24%   ` [dpdk-dev] [PATCH v3 1/3] build: remove special versioning for non stable libraries David Marchand
                     ` (4 more replies)
  1 sibling, 5 replies; 200+ results
From: David Marchand @ 2020-06-26  8:16 UTC (permalink / raw)
  To: dev; +Cc: thomas, honnappa.nagarahalli, techboard

Following discussions on the mailing list and the 05/20 TB meeting, here
is a series that drops the special versioning for non stable libraries.

Two notes:

- RIB/FIB library is not referenced in the API doxygen index, is this
  intentional?
- I inspected MAINTAINERS: librte_gro, librte_member and librte_rawdev are
  announced as experimental while their functions are part of the 20
  stable ABI (in .map files + no __rte_experimental marking).
  Their fate must be discussed.

Changes since v2:
- added librte_graph and librte_node missed when rebasing to 20.05,

Changes since v1:
- rebased on master,
- removed mention of 0 version in abi docs,
- updated wording in experimental banner and abi docs following Honnappa
  comment,


-- 
David Marchand

David Marchand (3):
  build: remove special versioning for non stable libraries
  drivers: drop workaround for internal libraries
  lib: remind experimental status in library headers

 buildtools/meson.build                        |  3 --
 config/meson.build                            | 16 ++++------
 devtools/check-abi.sh                         |  5 ----
 devtools/libabigail.abignore                  | 29 +++++++++++++++++--
 doc/guides/contributing/abi_policy.rst        | 12 ++++----
 doc/guides/contributing/abi_versioning.rst    |  3 +-
 drivers/bus/dpaa/rte_bus_dpaa_version.map     |  6 ++--
 drivers/bus/fslmc/rte_bus_fslmc_version.map   |  6 ++--
 .../common/dpaax/rte_common_dpaax_version.map |  6 ++--
 .../rte_common_octeontx2_version.map          |  6 ++--
 .../dpaa2_sec/rte_pmd_dpaa2_sec_version.map   |  6 ++--
 .../dpaa_sec/rte_pmd_dpaa_sec_version.map     |  6 ++--
 .../mempool/dpaa/rte_mempool_dpaa_version.map |  6 ++--
 .../rte_mempool_octeontx2_version.map         |  6 ++--
 drivers/meson.build                           | 17 +----------
 drivers/net/dpaa2/rte_pmd_dpaa2_version.map   |  6 ++--
 lib/librte_bbdev/rte_bbdev.h                  |  3 +-
 lib/librte_bpf/rte_bpf.h                      |  6 +++-
 lib/librte_compressdev/rte_compressdev.h      |  6 +++-
 lib/librte_fib/rte_fib.h                      |  7 +++++
 lib/librte_fib/rte_fib6.h                     |  7 +++++
 lib/librte_flow_classify/rte_flow_classify.h  |  6 ++--
 lib/librte_graph/rte_graph.h                  |  3 +-
 lib/librte_graph/rte_graph_worker.h           |  3 +-
 lib/librte_ipsec/rte_ipsec.h                  |  6 +++-
 lib/librte_node/rte_node_eth_api.h            |  3 +-
 lib/librte_node/rte_node_ip4_api.h            |  3 +-
 lib/librte_rcu/rte_rcu_qsbr.h                 |  7 ++++-
 lib/librte_rib/rte_rib.h                      |  7 +++++
 lib/librte_rib/rte_rib6.h                     |  7 +++++
 lib/librte_stack/rte_stack.h                  |  7 +++--
 lib/librte_telemetry/rte_telemetry.h          | 10 ++++---
 lib/meson.build                               | 16 +---------
 mk/rte.lib.mk                                 |  5 ----
 34 files changed, 133 insertions(+), 118 deletions(-)

-- 
2.23.0


^ permalink raw reply	[relevance 5%]

* [dpdk-dev] DPDK Release Status Meeting 25/06/2020
@ 2020-06-25 11:49  3% Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2020-06-25 11:49 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

Minutes 25 June 2020
--------------------

Agenda:
* Release Dates
* Subtrees
* LTS
* OvS
* Opens

Participants:
* Debian/Microsoft
* Intel
* Marvell
* Mellanox
* NXP
* Red Hat


Release Dates
-------------

* v20.08 dates:
  * -rc1:           Wednesday, 8 July   2020
  * -rc2:           Monday,   20 July   2020
  * Release:        Tuesday,   4 August 2020

* v20.11 proposal dates, please comment:
  * Proposal/V1:    Wednesday, 2 September 2020
  * -rc1:           Wednesday, 30 September 2020
  * -rc2:           Friday, 16 October 2020
  * Release:        Friday, 6 November 2020


Subtrees
--------

* main
  * Windows porting is progressing
    * Core library support, mempool, mbuf & ethdev planned for this release
      * A PMD for Windows planned for 20.11
  * -rc1 deadline looks short considering roadmap features
    * Need more reviews
    * Some features may be dropped if not reviewed
  * Jerin will review Mellanox multicast patch
    * https://mails.dpdk.org/archives/dev/2020-June/169943.html

* next-net
  * Only some patches merged this week, pulled from vendor sub-trees
  * There are still some ethdev patches
  * Big shared code updates from Intel waiting is a concern

* next-crypto
  * Started reviews
  * There are DOCSIS patches under review
  * Will try to merge some this week
  * If compressdev maintenance takes too much effort we can consider separating
    it
    * Not much activity as of now, Mellanox may send some patches
    * bbdev is also in Akhil's bucket; we may consider separating that based
      on workload

* next-eventdev
  * Waiting a new version of Intel DLB PMD
    * If there are API changes, having it late may cause missing the release
  * Will merge some patches next week

* next-virtio
  * Discussion ongoing on the vhost ready-state improvement patchset
  * vDPA API and framework rework, rebased and send a new version
  * Chenbo/Ferruh will cover during absence of Maxime


LTS
---

* v19.11.3 released
  * https://mails.dpdk.org/archives/dev/2020-June/170820.html

* v18.11.9-rc1 is out, please test
  * https://mails.dpdk.org/archives/dev/2020-June/170888.html
  * The planned date for the final release is 3rd July.
  * Microsoft testing it, will send the report


OvS
---
* 2.14 Release feature deadline (soft deadline) is approaching
  (beginning of July)


Opens
-----

* We may check ABI documentation if it requires clarification

* Testing
  * There is a new hire in the community lab
    * Will extend DTS for testing
    * Will start with basic ethdev testing

* A change in recent kernels is causing a crash with vfio
  * The kernel side commit:
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=abafbc551fddede3e0a08dee1dcde08fc0eb8476
    * The kernel side commit seems merged for v5.8-rc1
  * Fix in DPDK:
    http://inbox.dpdk.org/dev/20200625035046.19820-1-haiyue.wang@intel.com/

* Defects
  * Ajit prepared a Bugzilla report

* Meeting on "meet.jit.si/DPDK" going fine, will continue with it



DPDK Release Status Meetings
============================

The DPDK Release Status Meeting is intended for DPDK Committers to discuss the
status of the master tree and sub-trees, and for project managers to track
progress or milestone dates.

The meeting occurs every Thursday at 8:30 UTC on https://meet.jit.si/DPDK

If you wish to attend just send an email to
"John McNamara <john.mcnamara@intel.com>" for the invite.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v16 2/2] eal: support for VFIO-PCI VF token
  @ 2020-06-25 10:49  3%         ` Wang, Haiyue
  0 siblings, 0 replies; 200+ results
From: Wang, Haiyue @ 2020-06-25 10:49 UTC (permalink / raw)
  To: David Marchand, Harman Kalra, Jerin Jacob Kollanukkaran
  Cc: dev, Burakov, Anatoly, Thomas Monjalon, Andrew Rybchenko,
	Maxime Coquelin

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Thursday, June 25, 2020 15:33
> To: Harman Kalra <hkalra@marvell.com>; Jerin Jacob Kollanukkaran <jerinj@marvell.com>
> Cc: Wang, Haiyue <haiyue.wang@intel.com>; dev <dev@dpdk.org>; Burakov, Anatoly
> <anatoly.burakov@intel.com>; Thomas Monjalon <thomas@monjalon.net>; Andrew Rybchenko
> <arybchenko@solarflare.com>; Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: Re: [dpdk-dev] [PATCH v16 2/2] eal: support for VFIO-PCI VF token
> 
> On Mon, Jun 22, 2020 at 10:40 PM Harman Kalra <hkalra@marvell.com> wrote:
> >
> > On Wed, Jun 17, 2020 at 02:33:21PM +0800, Haiyue Wang wrote:
> > > The kernel module vfio-pci introduces the VF token to enable SR-IOV
> > > support since 5.7.
> > >
> > > The VF token can be set by a vfio-pci based PF driver and must be known
> > > by the vfio-pci based VF driver in order to gain access to the device.
> > >
> > > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > > Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > > Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
> > > ---
> >
> > Tested-by: Harman Kalra <hkalra@marvell.com>
> 
> Thanks for the test Harman.
> 
> I can see no complaint on using a single token for all devices, which

Yeah, not the best, but it may meet the most common need: creating VFs with vfio-pci ;-)

Since devargs was a private option for ALL kinds of devices, and we would have had to
break the ABI policy to implement that design, I dropped that revision and chose to
use it as a global option, as the "vfio-intr" option does.
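
For example, usage would look roughly like below (illustrative only, assuming the
global option keeps the "--vfio-vf-token" name from this series; the UUID is just an
arbitrary example):

    # PF and VF applications must pass the same token to vfio-pci
    testpmd --vfio-vf-token=14d63f20-8445-11ea-8900-1f9ce7d5650d \
        -w 0000:86:02.0 -- -i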

> is the only concern I would have with the last revision.
> If everyone is ok with this choice, I will take this for -rc1.
> 
> 
> --
> David Marchand


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2 1/3] build: remove special versioning for non stable libraries
  2020-06-25  7:21 25%   ` [dpdk-dev] [PATCH v2 1/3] build: remove special versioning for non stable libraries David Marchand
@ 2020-06-25  9:25  0%     ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2020-06-25  9:25 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, Honnappa Nagarahalli, techboard, Ray Kinsella,
	Neil Horman, John McNamara, Marko Kovacevic

On Thu, Jun 25, 2020 at 9:22 AM David Marchand
<david.marchand@redhat.com> wrote:
>
> Having a special versioning for experimental/internal libraries put a
> additional maintenance cost while this status is already announced in
> MAINTAINERS and the library headers/documentation.
> Following discussions and vote at 05/20 TB meeting [1], use a single
> versioning for all libraries in DPDK.
>
> Note: for the ABI check, an exception [2] had been added when tweaking
> this special versioning [3].
> Prefer explicit libabigail rules (which will be dropped in 20.11).

I have missed the new experimental libraries from 20.05.
Expect a v3 later today.

-- 
David Marchand


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2 3/3] lib: remind experimental status in library headers
  2020-06-25  7:21  5% ` [dpdk-dev] [PATCH v2 " David Marchand
  2020-06-25  7:21 25%   ` [dpdk-dev] [PATCH v2 1/3] build: remove special versioning for non stable libraries David Marchand
  2020-06-25  7:21  3%   ` [dpdk-dev] [PATCH v2 2/3] drivers: drop workaround for internal libraries David Marchand
@ 2020-06-25  7:21 17%   ` David Marchand
  2 siblings, 0 replies; 200+ results
From: David Marchand @ 2020-06-25  7:21 UTC (permalink / raw)
  To: dev
  Cc: thomas, honnappa.nagarahalli, techboard, stable, Ray Kinsella,
	Neil Horman, John McNamara, Marko Kovacevic, Nicolas Chautru,
	Konstantin Ananyev, Fiona Trahe, Ashish Gupta,
	Vladimir Medvedkin, Bernard Iremonger, Gage Eads, Olivier Matz,
	Kevin Laatz

The following libraries are experimental; all of their functions can
be changed or removed:

- librte_bbdev
- librte_bpf
- librte_compressdev
- librte_fib
- librte_flow_classify
- librte_ipsec
- librte_rcu
- librte_rib
- librte_stack
- librte_telemetry

Their status is properly announced in MAINTAINERS.
Remind this status in their headers in a common fashion (aligned to ABI
docs).

Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
Changes since v1:
- updated wording following Honnappa comment

---
 doc/guides/contributing/abi_policy.rst       |  8 +++++---
 lib/librte_bbdev/rte_bbdev.h                 |  3 ++-
 lib/librte_bpf/rte_bpf.h                     |  6 +++++-
 lib/librte_compressdev/rte_compressdev.h     |  6 +++++-
 lib/librte_fib/rte_fib.h                     |  7 +++++++
 lib/librte_fib/rte_fib6.h                    |  7 +++++++
 lib/librte_flow_classify/rte_flow_classify.h |  6 ++++--
 lib/librte_ipsec/rte_ipsec.h                 |  6 +++++-
 lib/librte_rcu/rte_rcu_qsbr.h                |  7 ++++++-
 lib/librte_rib/rte_rib.h                     |  7 +++++++
 lib/librte_rib/rte_rib6.h                    |  7 +++++++
 lib/librte_stack/rte_stack.h                 |  7 +++++--
 lib/librte_telemetry/rte_telemetry.h         | 10 ++++++----
 13 files changed, 71 insertions(+), 16 deletions(-)

diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
index 1b2fa27865..d0affa9e60 100644
--- a/doc/guides/contributing/abi_policy.rst
+++ b/doc/guides/contributing/abi_policy.rst
@@ -27,7 +27,8 @@ General Guidelines
 #. The removal of symbols is considered an :ref:`ABI breakage <abi_breakages>`,
    once approved these will form part of the next ABI version.
 #. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may
-   change without constraint, as they are not considered part of an ABI version.
+   be changed or removed without prior notice, as they are not considered part
+   of an ABI version.
 #. Updates to the :ref:`minimum hardware requirements <hw_rqmts>`, which drop
    support for hardware which was previously supported, should be treated as an
    ABI change.
@@ -294,7 +295,7 @@ APIs
 ~~~~
 
 APIs marked as ``experimental`` are not considered part of an ABI version and
-may change without warning at any time. Since changes to APIs are most likely
+may be changed or removed without prior notice. Since changes to APIs are most likely
 immediately after their introduction, as users begin to take advantage of those
 new APIs and start finding issues with them, new DPDK APIs will be automatically
 marked as ``experimental`` to allow for a period of stabilization before they
@@ -330,4 +331,5 @@ Libraries
 ~~~~~~~~~
 
 Libraries marked as ``experimental`` are entirely not considered part of an ABI
-version, and may change without warning at any time.
+version.
+All functions in such libraries may be changed or removed without prior notice.
diff --git a/lib/librte_bbdev/rte_bbdev.h b/lib/librte_bbdev/rte_bbdev.h
index ecd95a823d..57291373fa 100644
--- a/lib/librte_bbdev/rte_bbdev.h
+++ b/lib/librte_bbdev/rte_bbdev.h
@@ -11,7 +11,8 @@
  * Wireless base band device abstraction APIs.
  *
  * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
  *
  * This API allows an application to discover, configure and use a device to
  * process operations. An asynchronous API (enqueue, followed by later dequeue)
diff --git a/lib/librte_bpf/rte_bpf.h b/lib/librte_bpf/rte_bpf.h
index cbf1cddaca..e2d419b4ef 100644
--- a/lib/librte_bpf/rte_bpf.h
+++ b/lib/librte_bpf/rte_bpf.h
@@ -7,9 +7,13 @@
 
 /**
  * @file rte_bpf.h
- * @b EXPERIMENTAL: this API may change without prior notice
  *
  * RTE BPF support.
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
  * librte_bpf provides a framework to load and execute eBPF bytecode
  * inside user-space dpdk based applications.
  * It supports basic set of features from eBPF spec
diff --git a/lib/librte_compressdev/rte_compressdev.h b/lib/librte_compressdev/rte_compressdev.h
index 8052efe675..2840c27c6c 100644
--- a/lib/librte_compressdev/rte_compressdev.h
+++ b/lib/librte_compressdev/rte_compressdev.h
@@ -8,7 +8,11 @@
 /**
  * @file rte_compressdev.h
  *
- * RTE Compression Device APIs
+ * RTE Compression Device APIs.
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
  *
  * Defines comp device APIs for the provisioning of compression operations.
  */
diff --git a/lib/librte_fib/rte_fib.h b/lib/librte_fib/rte_fib.h
index af3bbf07ee..84ee774d2d 100644
--- a/lib/librte_fib/rte_fib.h
+++ b/lib/librte_fib/rte_fib.h
@@ -8,6 +8,13 @@
 
 /**
  * @file
+ *
+ * RTE FIB library.
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
  * FIB (Forwarding information base) implementation
  * for IPv4 Longest Prefix Match
  */
diff --git a/lib/librte_fib/rte_fib6.h b/lib/librte_fib/rte_fib6.h
index 66c71c84c9..bbfcf23a85 100644
--- a/lib/librte_fib/rte_fib6.h
+++ b/lib/librte_fib/rte_fib6.h
@@ -8,6 +8,13 @@
 
 /**
  * @file
+ *
+ * RTE FIB6 library.
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
  * FIB (Forwarding information base) implementation
  * for IPv6 Longest Prefix Match
  */
diff --git a/lib/librte_flow_classify/rte_flow_classify.h b/lib/librte_flow_classify/rte_flow_classify.h
index 74d1ecaf50..82ea92b6a6 100644
--- a/lib/librte_flow_classify/rte_flow_classify.h
+++ b/lib/librte_flow_classify/rte_flow_classify.h
@@ -8,9 +8,11 @@
 /**
  * @file
  *
- * RTE Flow Classify Library
+ * RTE Flow Classify Library.
  *
- * @b EXPERIMENTAL: this API may change without prior notice
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
  *
  * This library provides flow record information with some measured properties.
  *
diff --git a/lib/librte_ipsec/rte_ipsec.h b/lib/librte_ipsec/rte_ipsec.h
index 6666cf7619..de05f4e932 100644
--- a/lib/librte_ipsec/rte_ipsec.h
+++ b/lib/librte_ipsec/rte_ipsec.h
@@ -7,9 +7,13 @@
 
 /**
  * @file rte_ipsec.h
- * @b EXPERIMENTAL: this API may change without prior notice
  *
  * RTE IPsec support.
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
  * librte_ipsec provides a framework for data-path IPsec protocol
  * processing (ESP/AH).
  */
diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
index fd4eb52b7f..a98e8f0f82 100644
--- a/lib/librte_rcu/rte_rcu_qsbr.h
+++ b/lib/librte_rcu/rte_rcu_qsbr.h
@@ -7,7 +7,12 @@
 
 /**
  * @file
- * RTE Quiescent State Based Reclamation (QSBR)
+ *
+ * RTE Quiescent State Based Reclamation (QSBR).
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
  *
  * Quiescent State (QS) is any point in the thread execution
  * where the thread does not hold a reference to a data structure
diff --git a/lib/librte_rib/rte_rib.h b/lib/librte_rib/rte_rib.h
index 6b70de980a..da558c417e 100644
--- a/lib/librte_rib/rte_rib.h
+++ b/lib/librte_rib/rte_rib.h
@@ -8,6 +8,13 @@
 
 /**
  * @file
+ *
+ * RTE RIB library.
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
  * Level compressed tree implementation for IPv4 Longest Prefix Match
  */
 
diff --git a/lib/librte_rib/rte_rib6.h b/lib/librte_rib/rte_rib6.h
index 871457138d..4b284c913c 100644
--- a/lib/librte_rib/rte_rib6.h
+++ b/lib/librte_rib/rte_rib6.h
@@ -8,6 +8,13 @@
 
 /**
  * @file
+ *
+ * RTE rib6 library.
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
  * Level compressed tree implementation for IPv6 Longest Prefix Match
  */
 
diff --git a/lib/librte_stack/rte_stack.h b/lib/librte_stack/rte_stack.h
index 27ddb199e5..abf6420766 100644
--- a/lib/librte_stack/rte_stack.h
+++ b/lib/librte_stack/rte_stack.h
@@ -4,9 +4,12 @@
 
 /**
  * @file rte_stack.h
- * @b EXPERIMENTAL: this API may change without prior notice
  *
- * RTE Stack
+ * RTE Stack.
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
  *
  * librte_stack provides an API for configuration and use of a bounded stack of
  * pointers. Push and pop operations are MT-safe, allowing concurrent access,
diff --git a/lib/librte_telemetry/rte_telemetry.h b/lib/librte_telemetry/rte_telemetry.h
index eb7f2c917c..d13010b8fb 100644
--- a/lib/librte_telemetry/rte_telemetry.h
+++ b/lib/librte_telemetry/rte_telemetry.h
@@ -20,11 +20,13 @@
 #define RTE_TEL_MAX_ARRAY_ENTRIES 512
 
 /**
- * @warning
- * @b EXPERIMENTAL: all functions in this file may change without prior notice
- *
  * @file
- * RTE Telemetry
+ *
+ * RTE Telemetry.
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
  *
  * The telemetry library provides a method to retrieve statistics from
  * DPDK by sending a request message over a socket. DPDK will send
-- 
2.23.0


^ permalink raw reply	[relevance 17%]

* [dpdk-dev] [PATCH v2 1/3] build: remove special versioning for non stable libraries
  2020-06-25  7:21  5% ` [dpdk-dev] [PATCH v2 " David Marchand
@ 2020-06-25  7:21 25%   ` David Marchand
  2020-06-25  9:25  0%     ` David Marchand
  2020-06-25  7:21  3%   ` [dpdk-dev] [PATCH v2 2/3] drivers: drop workaround for internal libraries David Marchand
  2020-06-25  7:21 17%   ` [dpdk-dev] [PATCH v2 3/3] lib: remind experimental status in library headers David Marchand
  2 siblings, 1 reply; 200+ results
From: David Marchand @ 2020-06-25  7:21 UTC (permalink / raw)
  To: dev
  Cc: thomas, honnappa.nagarahalli, techboard, Ray Kinsella,
	Neil Horman, John McNamara, Marko Kovacevic

Having a special versioning for experimental/internal libraries puts an
additional maintenance cost while this status is already announced in
MAINTAINERS and the library headers/documentation.
Following discussions and vote at 05/20 TB meeting [1], use a single
versioning for all libraries in DPDK.

Note: for the ABI check, an exception [2] had been added when tweaking
this special versioning [3].
Prefer explicit libabigail rules (which will be dropped in 20.11).

1: https://mails.dpdk.org/archives/dev/2020-May/168450.html
2: https://git.dpdk.org/dpdk/commit/?id=23d7ad5db41c
3: https://git.dpdk.org/dpdk/commit/?id=ec2b8cd7ed69

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
Changes since v1:
- removed mention of special handling in ABI docs,

---
 buildtools/meson.build                     |  3 ---
 config/meson.build                         | 16 ++++++--------
 devtools/check-abi.sh                      |  5 -----
 devtools/libabigail.abignore               | 25 ++++++++++++++++++++--
 doc/guides/contributing/abi_policy.rst     |  6 +-----
 doc/guides/contributing/abi_versioning.rst |  3 +--
 drivers/meson.build                        | 17 +--------------
 lib/meson.build                            | 16 +-------------
 mk/rte.lib.mk                              |  5 -----
 9 files changed, 33 insertions(+), 63 deletions(-)

diff --git a/buildtools/meson.build b/buildtools/meson.build
index d5f8291beb..79703b6f93 100644
--- a/buildtools/meson.build
+++ b/buildtools/meson.build
@@ -18,6 +18,3 @@ else
 endif
 map_to_def_cmd = py3 + files('map_to_def.py')
 sphinx_wrapper = py3 + files('call-sphinx-build.py')
-
-# stable ABI always starts with "DPDK_"
-is_stable_cmd = [find_program('grep', 'findstr'), '^DPDK_']
diff --git a/config/meson.build b/config/meson.build
index 351e268c1f..d6d3f5271d 100644
--- a/config/meson.build
+++ b/config/meson.build
@@ -25,18 +25,14 @@ major_version = '@0@.@1@'.format(pver.get(0), pver.get(1))
 abi_version = run_command(find_program('cat', 'more'),
 	abi_version_file).stdout().strip()
 
-# Regular libraries have the abi_version as the filename extension
+# Libraries have the abi_version as the filename extension
 # and have the soname be all but the final part of the abi_version.
-# Experimental libraries have soname with '0.major'
-# and the filename suffix as 0.majorminor versions,
-# e.g. v20.1 => librte_stable.so.20.1, librte_experimental.so.0.201
-#    sonames => librte_stable.so.20, librte_experimental.so.0.20
-# e.g. v20.0.1 => librte_stable.so.20.0.1, librte_experimental.so.0.2001
-#      sonames => librte_stable.so.20.0, librte_experimental.so.0.200
+# e.g. v20.1 => librte_foo.so.20.1
+#    sonames => librte_foo.so.20
+# e.g. v20.0.1 => librte_foo.so.20.0.1
+#      sonames => librte_foo.so.20.0
 abi_va = abi_version.split('.')
-stable_so_version = abi_va.length() == 2 ? abi_va[0] : abi_va[0] + '.' + abi_va[1]
-experimental_abi_version = '0.' + abi_va[0] + abi_va[1] + '.' + abi_va[2]
-experimental_so_version = experimental_abi_version
+so_version = abi_va.length() == 2 ? abi_va[0] : abi_va[0] + '.' + abi_va[1]
 
 # extract all version information into the build configuration
 dpdk_conf.set('RTE_VER_YEAR', pver.get(0).to_int())
diff --git a/devtools/check-abi.sh b/devtools/check-abi.sh
index dd9120e69e..e17fedbd9f 100755
--- a/devtools/check-abi.sh
+++ b/devtools/check-abi.sh
@@ -44,11 +44,6 @@ for dump in $(find $refdir -name "*.dump"); do
 		echo "Skipped glue library $name."
 		continue
 	fi
-	# skip experimental libraries, with a sover starting with 0.
-	if grep -qE "\<soname='[^']*\.so\.0\.[^']*'" $dump; then
-		echo "Skipped experimental library $name."
-		continue
-	fi
 	dump2=$(find $newdir -name $name)
 	if [ -z "$dump2" ] || [ ! -e "$dump2" ]; then
 		echo "Error: can't find $name in $newdir"
diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index becbf842a5..97899b926e 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -50,9 +50,9 @@
         name = rte_crypto_aead_algorithm_strings
 
 ;;;;;;;;;;;;;;;;;;;;;;
-; Temporary exceptions for new __rte_internal marking till DPDK 20.11
+; Temporary exceptions till DPDK 20.11
 ;;;;;;;;;;;;;;;;;;;;;;
-; Ignore moving OCTEONTX2 stable functions to INTERNAL tag
+; Ignore moving OCTEONTX2 stable functions to INTERNAL
 [suppress_file]
 	file_name_regexp = ^librte_common_octeontx2\.
 [suppress_file]
@@ -77,3 +77,24 @@
         name = rte_dpaa2_mbuf_alloc_bulk
 [suppress_function]
         name_regexp = ^dpaa2?_.*tach$
+; Ignore soname changes for experimental libraries
+[suppress_file]
+	file_name_regexp = ^librte_bbdev\.
+[suppress_file]
+	file_name_regexp = ^librte_bpf\.
+[suppress_file]
+	file_name_regexp = ^librte_compressdev\.
+[suppress_file]
+	file_name_regexp = ^librte_fib\.
+[suppress_file]
+	file_name_regexp = ^librte_flow_classify\.
+[suppress_file]
+	file_name_regexp = ^librte_ipsec\.
+[suppress_file]
+	file_name_regexp = ^librte_rcu\.
+[suppress_file]
+	file_name_regexp = ^librte_rib\.
+[suppress_file]
+	file_name_regexp = ^librte_telemetry\.
+[suppress_file]
+	file_name_regexp = ^librte_stack\.
diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
index ee17ccb200..1b2fa27865 100644
--- a/doc/guides/contributing/abi_policy.rst
+++ b/doc/guides/contributing/abi_policy.rst
@@ -28,7 +28,6 @@ General Guidelines
    once approved these will form part of the next ABI version.
 #. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may
    change without constraint, as they are not considered part of an ABI version.
-   Experimental libraries have the major ABI version ``0``.
 #. Updates to the :ref:`minimum hardware requirements <hw_rqmts>`, which drop
    support for hardware which was previously supported, should be treated as an
    ABI change.
@@ -331,7 +330,4 @@ Libraries
 ~~~~~~~~~
 
 Libraries marked as ``experimental`` are entirely not considered part of an ABI
-version, and may change without warning at any time. Experimental libraries
-always have a major ABI version of ``0`` to indicate they exist outside of
-:ref:`abi_versioning` , with the minor version incremented with each ABI change
-to library.
+version, and may change without warning at any time.
diff --git a/doc/guides/contributing/abi_versioning.rst b/doc/guides/contributing/abi_versioning.rst
index e96fde340f..31a9205572 100644
--- a/doc/guides/contributing/abi_versioning.rst
+++ b/doc/guides/contributing/abi_versioning.rst
@@ -112,8 +112,7 @@ how this may be done.
 
 At the same time, the major ABI version is changed atomically across all
 libraries by incrementing the major version in the ABI_VERSION file. This is
-done globally for all libraries that declare a stable ABI. For libraries marked
-as EXPERIMENTAL, their major ABI version is always set to 0.
+done globally for all libraries.
 
 Minor ABI versions
 ~~~~~~~~~~~~~~~~~~
diff --git a/drivers/meson.build b/drivers/meson.build
index cfb6a833c9..d1b59a4bac 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -124,21 +124,6 @@ foreach class:dpdk_driver_classes
 					output: out_filename,
 					depends: [pmdinfogen, tmp_lib])
 
-			version_map = '@0@/@1@/@2@_version.map'.format(
-					meson.current_source_dir(),
-					drv_path, lib_name)
-
-			is_stable = run_command(is_stable_cmd,
-				files(version_map)).returncode() == 0
-
-			if is_stable
-				lib_version = abi_version
-				so_version = stable_so_version
-			else
-				lib_version = experimental_abi_version
-				so_version = experimental_so_version
-			endif
-
 			# now build the static driver
 			static_lib = static_library(lib_name,
 				sources,
@@ -183,7 +168,7 @@ foreach class:dpdk_driver_classes
 				c_args: cflags,
 				link_args: lk_args,
 				link_depends: lk_deps,
-				version: lib_version,
+				version: abi_version,
 				soversion: so_version,
 				install: true,
 				install_dir: driver_install_path)
diff --git a/lib/meson.build b/lib/meson.build
index d190d84eff..d646f33e07 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -107,20 +107,6 @@ foreach l:libraries
 				cflags += '-DRTE_USE_FUNCTION_VERSIONING'
 			endif
 
-			version_map = '@0@/@1@/rte_@2@_version.map'.format(
-					meson.current_source_dir(), dir_name, name)
-
-			is_stable = run_command(is_stable_cmd,
-					files(version_map)).returncode() == 0
-
-			if is_stable
-				lib_version = abi_version
-				so_version = stable_so_version
-			else
-				lib_version = experimental_abi_version
-				so_version = experimental_so_version
-			endif
-
 			# first build static lib
 			static_lib = static_library(libname,
 					sources,
@@ -179,7 +165,7 @@ foreach l:libraries
 					include_directories: includes,
 					link_args: lk_args,
 					link_depends: lk_deps,
-					version: lib_version,
+					version: abi_version,
 					soversion: so_version,
 					install: true)
 			shared_dep = declare_dependency(link_with: shared_lib,
diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk
index 682b590dba..229ae16814 100644
--- a/mk/rte.lib.mk
+++ b/mk/rte.lib.mk
@@ -13,11 +13,6 @@ VPATH += $(SRCDIR)
 
 LIBABIVER ?= $(shell cat $(RTE_SRCDIR)/ABI_VERSION)
 SOVER := $(basename $(LIBABIVER))
-ifeq ($(shell grep -s "^DPDK_" $(SRCDIR)/$(EXPORT_MAP)),)
-# EXPERIMENTAL ABI is versioned as 0.major+minor, e.g. 0.201 for 20.1 ABI
-LIBABIVER := 0.$(shell echo $(LIBABIVER) | awk 'BEGIN { FS="." }; { print $$1$$2"."$$3 }')
-SOVER := $(LIBABIVER)
-endif
 
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
 SONAME := $(patsubst %.a,%.so.$(SOVER),$(LIB))
-- 
2.23.0


^ permalink raw reply	[relevance 25%]

* [dpdk-dev] [PATCH v2 2/3] drivers: drop workaround for internal libraries
  2020-06-25  7:21  5% ` [dpdk-dev] [PATCH v2 " David Marchand
  2020-06-25  7:21 25%   ` [dpdk-dev] [PATCH v2 1/3] build: remove special versioning for non stable libraries David Marchand
@ 2020-06-25  7:21  3%   ` David Marchand
  2020-06-25  7:21 17%   ` [dpdk-dev] [PATCH v2 3/3] lib: remind experimental status in library headers David Marchand
  2 siblings, 0 replies; 200+ results
From: David Marchand @ 2020-06-25  7:21 UTC (permalink / raw)
  To: dev
  Cc: thomas, honnappa.nagarahalli, techboard, Ray Kinsella,
	Neil Horman, Hemant Agrawal, Sachin Saxena, Jerin Jacob,
	Nithin Dabilpuram, Akhil Goyal

Now that all libraries have a single version, we can drop the empty
stable blocks that had been added when moving symbols from stable to
internal ABI.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 drivers/bus/dpaa/rte_bus_dpaa_version.map                   | 6 ++----
 drivers/bus/fslmc/rte_bus_fslmc_version.map                 | 6 ++----
 drivers/common/dpaax/rte_common_dpaax_version.map           | 6 ++----
 drivers/common/octeontx2/rte_common_octeontx2_version.map   | 6 ++----
 drivers/crypto/dpaa2_sec/rte_pmd_dpaa2_sec_version.map      | 6 ++----
 drivers/crypto/dpaa_sec/rte_pmd_dpaa_sec_version.map        | 6 ++----
 drivers/mempool/dpaa/rte_mempool_dpaa_version.map           | 6 ++----
 drivers/mempool/octeontx2/rte_mempool_octeontx2_version.map | 6 ++----
 drivers/net/dpaa2/rte_pmd_dpaa2_version.map                 | 6 ++----
 9 files changed, 18 insertions(+), 36 deletions(-)

diff --git a/drivers/bus/dpaa/rte_bus_dpaa_version.map b/drivers/bus/dpaa/rte_bus_dpaa_version.map
index 46d42f7d64..491c507119 100644
--- a/drivers/bus/dpaa/rte_bus_dpaa_version.map
+++ b/drivers/bus/dpaa/rte_bus_dpaa_version.map
@@ -1,7 +1,3 @@
-DPDK_20.0 {
-	local: *;
-};
-
 INTERNAL {
 	global:
 
@@ -90,4 +86,6 @@ INTERNAL {
 	rte_dpaa_portal_fq_close;
 	rte_dpaa_portal_fq_init;
 	rte_dpaa_portal_init;
+
+	local: *;
 };
diff --git a/drivers/bus/fslmc/rte_bus_fslmc_version.map b/drivers/bus/fslmc/rte_bus_fslmc_version.map
index 69e7dc6ad9..0a9947a454 100644
--- a/drivers/bus/fslmc/rte_bus_fslmc_version.map
+++ b/drivers/bus/fslmc/rte_bus_fslmc_version.map
@@ -1,7 +1,3 @@
-DPDK_20.0 {
-	local: *;
-};
-
 EXPERIMENTAL {
 	global:
 
@@ -111,4 +107,6 @@ INTERNAL {
 	rte_fslmc_get_device_count;
 	rte_fslmc_object_register;
 	rte_global_active_dqs_list;
+
+	local: *;
 };
diff --git a/drivers/common/dpaax/rte_common_dpaax_version.map b/drivers/common/dpaax/rte_common_dpaax_version.map
index 49c775c072..ee1ca6801c 100644
--- a/drivers/common/dpaax/rte_common_dpaax_version.map
+++ b/drivers/common/dpaax/rte_common_dpaax_version.map
@@ -1,7 +1,3 @@
-DPDK_20.0 {
-	local: *;
-};
-
 INTERNAL {
 	global:
 
@@ -23,4 +19,6 @@ INTERNAL {
 	of_n_addr_cells;
 	of_translate_address;
 	rta_sec_era;
+
+	local: *;
 };
diff --git a/drivers/common/octeontx2/rte_common_octeontx2_version.map b/drivers/common/octeontx2/rte_common_octeontx2_version.map
index d26bd71172..9a9969613b 100644
--- a/drivers/common/octeontx2/rte_common_octeontx2_version.map
+++ b/drivers/common/octeontx2/rte_common_octeontx2_version.map
@@ -1,7 +1,3 @@
-DPDK_20.0 {
-	local: *;
-};
-
 INTERNAL {
 	global:
 
@@ -42,4 +38,6 @@ INTERNAL {
 	otx2_sso_pf_func_get;
 	otx2_sso_pf_func_set;
 	otx2_unregister_irq;
+
+	local: *;
 };
diff --git a/drivers/crypto/dpaa2_sec/rte_pmd_dpaa2_sec_version.map b/drivers/crypto/dpaa2_sec/rte_pmd_dpaa2_sec_version.map
index 3d863aff4d..1352f576e5 100644
--- a/drivers/crypto/dpaa2_sec/rte_pmd_dpaa2_sec_version.map
+++ b/drivers/crypto/dpaa2_sec/rte_pmd_dpaa2_sec_version.map
@@ -1,10 +1,8 @@
-DPDK_20.0 {
-	local: *;
-};
-
 INTERNAL {
 	global:
 
 	dpaa2_sec_eventq_attach;
 	dpaa2_sec_eventq_detach;
+
+	local: *;
 };
diff --git a/drivers/crypto/dpaa_sec/rte_pmd_dpaa_sec_version.map b/drivers/crypto/dpaa_sec/rte_pmd_dpaa_sec_version.map
index 023e120516..731ea593ad 100644
--- a/drivers/crypto/dpaa_sec/rte_pmd_dpaa_sec_version.map
+++ b/drivers/crypto/dpaa_sec/rte_pmd_dpaa_sec_version.map
@@ -1,10 +1,8 @@
-DPDK_20.0 {
-	local: *;
-};
-
 INTERNAL {
 	global:
 
 	dpaa_sec_eventq_attach;
 	dpaa_sec_eventq_detach;
+
+	local: *;
 };
diff --git a/drivers/mempool/dpaa/rte_mempool_dpaa_version.map b/drivers/mempool/dpaa/rte_mempool_dpaa_version.map
index 89d7cf4957..142547ee38 100644
--- a/drivers/mempool/dpaa/rte_mempool_dpaa_version.map
+++ b/drivers/mempool/dpaa/rte_mempool_dpaa_version.map
@@ -1,10 +1,8 @@
-DPDK_20.0 {
-	local: *;
-};
-
 INTERNAL {
 	global:
 
 	rte_dpaa_bpid_info;
 	rte_dpaa_memsegs;
+
+	local: *;
 };
diff --git a/drivers/mempool/octeontx2/rte_mempool_octeontx2_version.map b/drivers/mempool/octeontx2/rte_mempool_octeontx2_version.map
index 8691efdfd8..e6887ceb8f 100644
--- a/drivers/mempool/octeontx2/rte_mempool_octeontx2_version.map
+++ b/drivers/mempool/octeontx2/rte_mempool_octeontx2_version.map
@@ -1,10 +1,8 @@
-DPDK_20.0 {
-	local: *;
-};
-
 INTERNAL {
 	global:
 
 	otx2_npa_lf_fini;
 	otx2_npa_lf_init;
+
+	local: *;
 };
diff --git a/drivers/net/dpaa2/rte_pmd_dpaa2_version.map b/drivers/net/dpaa2/rte_pmd_dpaa2_version.map
index b633fdc2a8..c3a457d2b9 100644
--- a/drivers/net/dpaa2/rte_pmd_dpaa2_version.map
+++ b/drivers/net/dpaa2/rte_pmd_dpaa2_version.map
@@ -1,7 +1,3 @@
-DPDK_20.0 {
-	local: *;
-};
-
 EXPERIMENTAL {
 	global:
 
@@ -15,4 +11,6 @@ INTERNAL {
 
 	dpaa2_eth_eventq_attach;
 	dpaa2_eth_eventq_detach;
+
+	local: *;
 };
-- 
2.23.0


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 0/3] Experimental/internal libraries cleanup
  @ 2020-06-25  7:21  5% ` David Marchand
  2020-06-25  7:21 25%   ` [dpdk-dev] [PATCH v2 1/3] build: remove special versioning for non stable libraries David Marchand
                     ` (2 more replies)
  2020-06-26  8:16  5% ` [dpdk-dev] [PATCH v3 0/3] Experimental/internal libraries cleanup David Marchand
  1 sibling, 3 replies; 200+ results
From: David Marchand @ 2020-06-25  7:21 UTC (permalink / raw)
  To: dev; +Cc: thomas, honnappa.nagarahalli, techboard

Following discussions on the mailing list and the 05/20 TB meeting, here
is a series that drops the special versioning for non stable libraries.

Two notes:

- The RIB/FIB library is not referenced in the API doxygen index; is this
  intentional?
- I inspected MAINTAINERS: librte_gro, librte_member and librte_rawdev are
  announced as experimental while their functions are part of the 20
  stable ABI (in .map files + no __rte_experimental marking).
  Their fate must be discussed.

Changes since v1:
- rebased on master,
- removed mention of 0 version in abi docs,
- updated wording in experimental banner and abi docs following Honnappa
  comment,


-- 
David Marchand

David Marchand (3):
  build: remove special versioning for non stable libraries
  drivers: drop workaround for internal libraries
  lib: remind experimental status in library headers

 buildtools/meson.build                        |  3 ---
 config/meson.build                            | 16 +++++-------
 devtools/check-abi.sh                         |  5 ----
 devtools/libabigail.abignore                  | 25 +++++++++++++++++--
 doc/guides/contributing/abi_policy.rst        | 12 ++++-----
 doc/guides/contributing/abi_versioning.rst    |  3 +--
 drivers/bus/dpaa/rte_bus_dpaa_version.map     |  6 ++---
 drivers/bus/fslmc/rte_bus_fslmc_version.map   |  6 ++---
 .../common/dpaax/rte_common_dpaax_version.map |  6 ++---
 .../rte_common_octeontx2_version.map          |  6 ++---
 .../dpaa2_sec/rte_pmd_dpaa2_sec_version.map   |  6 ++---
 .../dpaa_sec/rte_pmd_dpaa_sec_version.map     |  6 ++---
 .../mempool/dpaa/rte_mempool_dpaa_version.map |  6 ++---
 .../rte_mempool_octeontx2_version.map         |  6 ++---
 drivers/meson.build                           | 17 +------------
 drivers/net/dpaa2/rte_pmd_dpaa2_version.map   |  6 ++---
 lib/librte_bbdev/rte_bbdev.h                  |  3 ++-
 lib/librte_bpf/rte_bpf.h                      |  6 ++++-
 lib/librte_compressdev/rte_compressdev.h      |  6 ++++-
 lib/librte_fib/rte_fib.h                      |  7 ++++++
 lib/librte_fib/rte_fib6.h                     |  7 ++++++
 lib/librte_flow_classify/rte_flow_classify.h  |  6 +++--
 lib/librte_ipsec/rte_ipsec.h                  |  6 ++++-
 lib/librte_rcu/rte_rcu_qsbr.h                 |  7 +++++-
 lib/librte_rib/rte_rib.h                      |  7 ++++++
 lib/librte_rib/rte_rib6.h                     |  7 ++++++
 lib/librte_stack/rte_stack.h                  |  7 ++++--
 lib/librte_telemetry/rte_telemetry.h          | 10 +++++---
 lib/meson.build                               | 16 +-----------
 mk/rte.lib.mk                                 |  5 ----
 30 files changed, 121 insertions(+), 114 deletions(-)

-- 
2.23.0


^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v4 1/4] hash: add kv hash library
  @ 2020-06-25  4:27  2%     ` Honnappa Nagarahalli
  0 siblings, 0 replies; 200+ results
From: Honnappa Nagarahalli @ 2020-06-25  4:27 UTC (permalink / raw)
  To: Vladimir Medvedkin, dev
  Cc: konstantin.ananyev, yipeng1.wang, sameh.gobriel,
	bruce.richardson, nd, Honnappa Nagarahalli, nd

Hi Vladimir,
	Few comments inline.

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Vladimir Medvedkin
> Sent: Friday, May 8, 2020 2:59 PM
> To: dev@dpdk.org
> Cc: konstantin.ananyev@intel.com; yipeng1.wang@intel.com;
> sameh.gobriel@intel.com; bruce.richardson@intel.com
> Subject: [dpdk-dev] [PATCH v4 1/4] hash: add kv hash library
> 
> KV hash is a special optimized key-value storage for fixed key and value sizes.
> At the moment it supports 32 bit keys and 64 bit values. This table is hash
> function agnostic so user must provide precalculated hash signature for
> add/delete/lookup operations.
> 
> Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
> ---
>  lib/Makefile                           |   2 +-
>  lib/librte_hash/Makefile               |  14 +-
>  lib/librte_hash/k32v64_hash.c          | 277
> +++++++++++++++++++++++++++++++++
>  lib/librte_hash/k32v64_hash.h          |  98 ++++++++++++
>  lib/librte_hash/k32v64_hash_avx512vl.c |  59 +++++++
>  lib/librte_hash/meson.build            |  17 +-
>  lib/librte_hash/rte_hash_version.map   |   6 +-
>  lib/librte_hash/rte_kv_hash.c          | 184 ++++++++++++++++++++++
>  lib/librte_hash/rte_kv_hash.h          | 169 ++++++++++++++++++++
>  9 files changed, 821 insertions(+), 5 deletions(-)  create mode 100644
> lib/librte_hash/k32v64_hash.c  create mode 100644
> lib/librte_hash/k32v64_hash.h  create mode 100644
> lib/librte_hash/k32v64_hash_avx512vl.c
>  create mode 100644 lib/librte_hash/rte_kv_hash.c  create mode 100644
> lib/librte_hash/rte_kv_hash.h
> 
> diff --git a/lib/Makefile b/lib/Makefile index 9d24609..42769e9 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -48,7 +48,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_VHOST) += librte_vhost
> DEPDIRS-librte_vhost := librte_eal librte_mempool librte_mbuf librte_ethdev \
>  			librte_net librte_hash librte_cryptodev
>  DIRS-$(CONFIG_RTE_LIBRTE_HASH) += librte_hash -DEPDIRS-librte_hash :=
> librte_eal librte_ring
> +DEPDIRS-librte_hash := librte_eal librte_ring librte_mempool
>  DIRS-$(CONFIG_RTE_LIBRTE_EFD) += librte_efd  DEPDIRS-librte_efd :=
> librte_eal librte_ring librte_hash
>  DIRS-$(CONFIG_RTE_LIBRTE_RIB) += librte_rib diff --git
> a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile index ec9f864..a0cdee9
> 100644
> --- a/lib/librte_hash/Makefile
> +++ b/lib/librte_hash/Makefile
> @@ -8,13 +8,15 @@ LIB = librte_hash.a
> 
>  CFLAGS += -O3
>  CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> -LDLIBS += -lrte_eal -lrte_ring
> +LDLIBS += -lrte_eal -lrte_ring -lrte_mempool
> 
>  EXPORT_MAP := rte_hash_version.map
> 
>  # all source are stored in SRCS-y
>  SRCS-$(CONFIG_RTE_LIBRTE_HASH) := rte_cuckoo_hash.c
>  SRCS-$(CONFIG_RTE_LIBRTE_HASH) += rte_fbk_hash.c
> +SRCS-$(CONFIG_RTE_LIBRTE_HASH) += rte_kv_hash.c
> +SRCS-$(CONFIG_RTE_LIBRTE_HASH) += k32v64_hash.c
> 
>  # install this header file
>  SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include := rte_hash.h @@ -27,5
> +29,15 @@ endif  SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include +=
> rte_jhash.h  SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_thash.h
> SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_fbk_hash.h
> +SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_kv_hash.h
> +
> +CC_AVX512VL_SUPPORT=$(shell $(CC) -mavx512vl -dM -E - </dev/null 2>&1 |
> +\ grep -q __AVX512VL__ && echo 1)
> +
> +ifeq ($(CC_AVX512VL_SUPPORT), 1)
> +	SRCS-$(CONFIG_RTE_LIBRTE_HASH) += k32v64_hash_avx512vl.c
> +	CFLAGS_k32v64_hash_avx512vl.o += -mavx512vl
> +	CFLAGS_k32v64_hash.o += -DCC_AVX512VL_SUPPORT endif
> 
>  include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_hash/k32v64_hash.c b/lib/librte_hash/k32v64_hash.c
> new file mode 100644 index 0000000..24cd63a
> --- /dev/null
> +++ b/lib/librte_hash/k32v64_hash.c
> @@ -0,0 +1,277 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +#include <string.h>
> +
> +#include <rte_errno.h>
> +#include <rte_malloc.h>
> +#include <rte_memory.h>
> +
> +#include "k32v64_hash.h"
> +
> +static inline int
> +k32v64_hash_lookup(struct k32v64_hash_table *table, uint32_t key,
> +	uint32_t hash, uint64_t *value)
> +{
> +	return __k32v64_hash_lookup(table, key, hash, value,
> __kv_cmp_keys); }
Since this is an inline function, would it be better to move it to the header file?


> +
> +static int
> +k32v64_hash_bulk_lookup(struct rte_kv_hash_table *ht, void *keys_p,
> +	uint32_t *hashes, void *values_p, unsigned int n) {
> +	struct k32v64_hash_table *table = (struct k32v64_hash_table *)ht;
> +	uint32_t *keys = keys_p;
> +	uint64_t *values = values_p;
> +	int ret, cnt = 0;
> +	unsigned int i;
> +
> +	if (unlikely((table == NULL) || (keys == NULL) || (hashes == NULL) ||
> +			(values == NULL)))
> +		return -EINVAL;
> +
> +	for (i = 0; i < n; i++) {
> +		ret = k32v64_hash_lookup(table, keys[i], hashes[i],
> +			&values[i]);
> +		if (ret == 0)
> +			cnt++;
> +	}
> +	return cnt;
> +}
> +
> +#ifdef CC_AVX512VL_SUPPORT
> +int
> +k32v64_hash_bulk_lookup_avx512vl(struct rte_kv_hash_table *ht,
> +	void *keys_p, uint32_t *hashes, void *values_p, unsigned int n);
> +#endif
> +
> +static rte_kv_hash_bulk_lookup_t
> +get_lookup_bulk_fn(void)
> +{
> +#ifdef CC_AVX512VL_SUPPORT
> +	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512VL))
> +		return k32v64_hash_bulk_lookup_avx512vl; #endif
> +	return k32v64_hash_bulk_lookup;
> +}
> +
> +static int
> +k32v64_hash_add(struct k32v64_hash_table *table, uint32_t key,
> +	uint32_t hash, uint64_t value, uint64_t *old_value, int *found) {
> +	uint32_t bucket;
> +	int i, idx, ret;
> +	uint8_t msk;
> +	struct k32v64_ext_ent *tmp, *ent, *prev = NULL;
> +
> +	if (table == NULL)
> +		return -EINVAL;
> +
> +	bucket = hash & table->bucket_msk;
> +	/* Search key in table. Update value if exists */
> +	for (i = 0; i < K32V64_KEYS_PER_BUCKET; i++) {
> +		if ((key == table->t[bucket].key[i]) &&
> +				(table->t[bucket].key_mask & (1 << i))) {
> +			*old_value = table->t[bucket].val[i];
> +			*found = 1;
> +			__atomic_fetch_add(&table->t[bucket].cnt, 1,
> +				__ATOMIC_ACQUIRE);
> +			table->t[bucket].val[i] = value;
Suggest using a C11 builtin to store the value. As far as I know, all the platforms supported by DPDK have 64b atomic stores even in 32b mode.
With this we can avoid incrementing the counter; the reader will get either the old or the new value.
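Something along these lines (untested sketch; with this the counter is not needed on this path):

	/* atomically publish the new 64-bit value: a concurrent reader
	 * sees either the old or the new value, never a torn one */
	__atomic_store_n(&table->t[bucket].val[i], value, __ATOMIC_RELAXED);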

> +			__atomic_fetch_add(&table->t[bucket].cnt, 1,
> +				__ATOMIC_RELEASE);
> +			return 0;
> +		}
> +	}
> +
> +	if (!SLIST_EMPTY(&table->t[bucket].head)) {
> +		SLIST_FOREACH(ent, &table->t[bucket].head, next) {
> +			if (ent->key == key) {
> +				*old_value = ent->val;
> +				*found = 1;
> +				__atomic_fetch_add(&table->t[bucket].cnt, 1,
> +					__ATOMIC_ACQUIRE);
> +				ent->val = value;
Same here. __atomic_store(&ent->val, value, __ATOMIC_RELAXED).

> +				__atomic_fetch_add(&table->t[bucket].cnt, 1,
> +					__ATOMIC_RELEASE);
> +				return 0;
> +			}
> +		}
> +	}
> +
> +	msk = ~table->t[bucket].key_mask & VALID_KEY_MSK;
> +	if (msk) {
> +		idx = __builtin_ctz(msk);
> +		table->t[bucket].key[idx] = key;
> +		table->t[bucket].val[idx] = value;
> +		__atomic_or_fetch(&table->t[bucket].key_mask, 1 << idx,
> +			__ATOMIC_RELEASE);
> +		table->nb_ent++;
Looks like bucket counter logic is needed for this case too. Please see the comments below in k32v64_hash_delete.

> +		*found = 0;
> +		return 0;
> +	}
> +
> +	ret = rte_mempool_get(table->ext_ent_pool, (void **)&ent);
> +	if (ret < 0)
> +		return ret;
> +
> +	SLIST_NEXT(ent, next) = NULL;
> +	ent->key = key;
> +	ent->val = value;
> +	rte_smp_wmb();
We need to avoid using rte_smp_* barriers as we are adopting C11 built-in atomics. See the below comment.

> +	SLIST_FOREACH(tmp, &table->t[bucket].head, next)
> +		prev = tmp;
> +
> +	if (prev == NULL)
> +		SLIST_INSERT_HEAD(&table->t[bucket].head, ent, next);
> +	else
> +		SLIST_INSERT_AFTER(prev, ent, next);
Both inserts need to use release order when 'ent' is inserted. I am not sure where the SLIST implementation is picked up from, but looking at the one in 'lib/librte_eal/windows/include/sys/queue.h', it is not implemented using C11. I think we could move queue.h from the windows directory to a common directory and change the SLIST implementation.

> +
> +	table->nb_ent++;
> +	table->nb_ext_ent++;
> +	*found = 0;
> +	return 0;
> +}
> +
> +static int
> +k32v64_hash_delete(struct k32v64_hash_table *table, uint32_t key,
> +	uint32_t hash, uint64_t *old_value)
> +{
> +	uint32_t bucket;
> +	int i;
> +	struct k32v64_ext_ent *ent;
> +
> +	if (table == NULL)
> +		return -EINVAL;
> +
> +	bucket = hash & table->bucket_msk;
> +
> +	for (i = 0; i < K32V64_KEYS_PER_BUCKET; i++) {
> +		if ((key == table->t[bucket].key[i]) &&
> +				(table->t[bucket].key_mask & (1 << i))) {
> +			*old_value = table->t[bucket].val[i];
> +			ent = SLIST_FIRST(&table->t[bucket].head);
> +			if (ent) {
> +				__atomic_fetch_add(&table->t[bucket].cnt, 1,
> +					__ATOMIC_ACQUIRE);
> +				table->t[bucket].key[i] = ent->key;
> +				table->t[bucket].val[i] = ent->val;
> +				SLIST_REMOVE_HEAD(&table->t[bucket].head,
> next);
> +				__atomic_fetch_add(&table->t[bucket].cnt, 1,
> +					__ATOMIC_RELEASE);
> +				table->nb_ext_ent--;
> +			} else
> +				__atomic_and_fetch(&table-
> >t[bucket].key_mask,
> +					~(1 << i), __ATOMIC_RELEASE);
(taking note from your responses to my comments in v3)
It is possible that the reader might match the old key but get the new value.
1) Reader: successful key match
2) Writer: k32v64_hash_delete followed by k32v64_hash_add
3) Reader: reads the value

IMO, there are 2 ways to solve this issue:
1) Use the bucket count logic while adding an entry in the non-extended bucket (I marked the spot in k32v64_hash_add above; see the sketch below).
2) Do not use the entry in the bucket till the readers indicate that they have stopped using the entry
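
For option 1, a rough sketch (untested) of what the non-extended bucket add path could look like:

	/* wrap the slot update with the same odd/even counter used in
	 * delete, so a concurrent reader retries instead of pairing an
	 * old key with a new value */
	__atomic_fetch_add(&table->t[bucket].cnt, 1, __ATOMIC_ACQUIRE);
	table->t[bucket].key[idx] = key;
	table->t[bucket].val[idx] = value;
	__atomic_or_fetch(&table->t[bucket].key_mask, 1 << idx,
		__ATOMIC_RELEASE);
	__atomic_fetch_add(&table->t[bucket].cnt, 1, __ATOMIC_RELEASE);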

> +			if (ent)
> +				rte_mempool_put(table->ext_ent_pool, ent);
> +			table->nb_ent--;
> +			return 0;
> +		}
> +	}
> +
> +	SLIST_FOREACH(ent, &table->t[bucket].head, next)
> +		if (ent->key == key)
> +			break;
> +
> +	if (ent == NULL)
> +		return -ENOENT;
> +
> +	*old_value = ent->val;
> +
> +	__atomic_fetch_add(&table->t[bucket].cnt, 1, __ATOMIC_ACQUIRE);
> +	SLIST_REMOVE(&table->t[bucket].head, ent, k32v64_ext_ent, next);
> +	__atomic_fetch_add(&table->t[bucket].cnt, 1, __ATOMIC_RELEASE);
> +	rte_mempool_put(table->ext_ent_pool, ent);
> +
> +	table->nb_ext_ent--;
> +	table->nb_ent--;
> +
> +	return 0;
> +}
> +
> +static int
> +k32v64_modify(struct rte_kv_hash_table *table, void *key_p, uint32_t hash,
> +	enum rte_kv_modify_op op, void *value_p, int *found) {
> +	struct k32v64_hash_table *ht = (struct k32v64_hash_table *)table;
> +	uint32_t *key = key_p;
> +	uint64_t value;
> +
> +	if ((ht == NULL) || (key == NULL) || (value_p == NULL) ||
> +			(found == NULL) || (op >= RTE_KV_MODIFY_OP_MAX))
> {
> +		return -EINVAL;
> +	}
Suggest doing this check in rte_kv_hash_add/rte_kv_hash_delete, so that every implementation does not have to repeat it.

> +
> +	value = *(uint64_t *)value_p;
In the API 'rte_kv_hash_add', value_p is 'void *', which does not convey that it is a pointer to 64b data. What happens when running on 32b systems?

> +	switch (op) {
> +	case RTE_KV_MODIFY_ADD:
> +		return k32v64_hash_add(ht, *key, hash, value, value_p,
> found);
> +	case RTE_KV_MODIFY_DEL:
> +		return k32v64_hash_delete(ht, *key, hash, value_p);
> +	default:
> +		break;
> +	}
> +
> +	return -EINVAL;
> +}
> +
> +struct rte_kv_hash_table *
> +k32v64_hash_create(const struct rte_kv_hash_params *params) {
This is a private symbol; I think it needs a '__rte' prefix?

> +	char hash_name[RTE_KV_HASH_NAMESIZE];
> +	struct k32v64_hash_table *ht = NULL;
> +	uint32_t mem_size, nb_buckets, max_ent;
> +	int ret;
> +	struct rte_mempool *mp;
> +
> +	if ((params == NULL) || (params->name == NULL) ||
> +			(params->entries == 0)) {
> +		rte_errno = EINVAL;
> +		return NULL;
> +	}
Nit: these checks were already done in 'rte_kv_hash_create', so they can be skipped here.

> +
> +	ret = snprintf(hash_name, sizeof(hash_name), "KV_%s", params-
> >name);
> +	if (ret < 0 || ret >= RTE_KV_HASH_NAMESIZE) {
> +		rte_errno = ENAMETOOLONG;
> +		return NULL;
> +	}
Same here, this is checked in the calling function.

> +
> +	max_ent = rte_align32pow2(params->entries);
> +	nb_buckets = max_ent / K32V64_KEYS_PER_BUCKET;
> +	mem_size = sizeof(struct k32v64_hash_table) +
> +		sizeof(struct k32v64_hash_bucket) * nb_buckets;
> +
> +	mp = rte_mempool_create(hash_name, max_ent,
> +		sizeof(struct k32v64_ext_ent), 0, 0, NULL, NULL, NULL, NULL,
> +		params->socket_id, 0);
> +
> +	if (mp == NULL)
> +		return NULL;
> +
> +	ht = rte_zmalloc_socket(hash_name, mem_size,
> +		RTE_CACHE_LINE_SIZE, params->socket_id);
> +	if (ht == NULL) {
> +		rte_mempool_free(mp);
> +		return NULL;
> +	}
> +
> +	memcpy(ht->pub.name, hash_name, sizeof(ht->pub.name));
> +	ht->max_ent = max_ent;
> +	ht->bucket_msk = nb_buckets - 1;
> +	ht->ext_ent_pool = mp;
> +	ht->pub.lookup = get_lookup_bulk_fn();
> +	ht->pub.modify = k32v64_modify;
> +
> +	return (struct rte_kv_hash_table *)ht; }
> +
> +void
> +k32v64_hash_free(struct rte_kv_hash_table *ht) {
This is a private symbol; I think it needs a '__rte' prefix?

> +	if (ht == NULL)
> +		return;
> +
> +	rte_mempool_free(((struct k32v64_hash_table *)ht)->ext_ent_pool);
> +	rte_free(ht);
> +}
> diff --git a/lib/librte_hash/k32v64_hash.h b/lib/librte_hash/k32v64_hash.h
> new file mode 100644 index 0000000..10061a5
> --- /dev/null
> +++ b/lib/librte_hash/k32v64_hash.h
> @@ -0,0 +1,98 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +#include <rte_kv_hash.h>
> +
> +#define K32V64_KEYS_PER_BUCKET		4
> +#define K32V64_WRITE_IN_PROGRESS	1
> +#define VALID_KEY_MSK           ((1 << K32V64_KEYS_PER_BUCKET) - 1)
> +
> +struct k32v64_ext_ent {
> +	SLIST_ENTRY(k32v64_ext_ent) next;
> +	uint32_t	key;
> +	uint64_t	val;
> +};
> +
> +struct k32v64_hash_bucket {
> +	uint32_t	key[K32V64_KEYS_PER_BUCKET];
> +	uint64_t	val[K32V64_KEYS_PER_BUCKET];
> +	uint8_t		key_mask;
> +	uint32_t	cnt;
> +	SLIST_HEAD(k32v64_list_head, k32v64_ext_ent) head; }
> +__rte_cache_aligned;
> +
> +struct k32v64_hash_table {
> +	struct rte_kv_hash_table pub;	/**< Public part */
> +	uint32_t	nb_ent;		/**< Number of entities in the table*/
> +	uint32_t	nb_ext_ent;	/**< Number of extended entities */
> +	uint32_t	max_ent;	/**< Maximum number of entities */
> +	uint32_t	bucket_msk;
> +	struct rte_mempool	*ext_ent_pool;
> +	__extension__ struct k32v64_hash_bucket	t[0];
> +};
> +
> +typedef int (*k32v64_cmp_fn_t)
> +(struct k32v64_hash_bucket *bucket, uint32_t key, uint64_t *val);
> +
> +static inline int
> +__kv_cmp_keys(struct k32v64_hash_bucket *bucket, uint32_t key,
> +	uint64_t *val)
Changing __kv_cmp_keys to __k32v64_cmp_keys would be more consistent.

> +{
> +	int i;
> +
> +	for (i = 0; i < K32V64_KEYS_PER_BUCKET; i++) {
> +		if ((key == bucket->key[i]) &&
> +				(bucket->key_mask & (1 << i))) {
> +			*val = bucket->val[i];
> +			return 1;
> +		}
> +	}
You have to load-acquire 'key_mask' (corresponding to store-release on 'key_mask' in add). Suggest changing this as follows:

__atomic_load(&bucket->key_mask, &key_mask, __ATOMIC_ACQUIRE);
for (i = 0; i < K32V64_KEYS_PER_BUCKET; i++) {
	if ((key == bucket->key[i]) && (key_mask & (1 << i))) {
		*val = bucket->val[i];
		return 1;
	}
}

> +
> +	return 0;
> +}
> +
> +static inline int
> +__k32v64_hash_lookup(struct k32v64_hash_table *table, uint32_t key,
> +	uint32_t hash, uint64_t *value, k32v64_cmp_fn_t cmp_f) {
> +	uint64_t	val = 0;
> +	struct k32v64_ext_ent *ent;
> +	uint32_t	cnt;
> +	int found = 0;
> +	uint32_t bucket = hash & table->bucket_msk;
> +
> +	do {
> +
> +		do {
> +			cnt = __atomic_load_n(&table->t[bucket].cnt,
> +				__ATOMIC_ACQUIRE);
> +		} while (unlikely(cnt & K32V64_WRITE_IN_PROGRESS));
Agree that it is a small section, but the issue can still happen, and it makes the algorithm unacceptable in many use cases. IMHO, since we have identified the issue, we should fix it.
The issue is mainly due to not following the reader-writer concurrency design principles, i.e. the data that we want to communicate from writer to reader (the key and value in this case) is not communicated atomically. For example, in the rte_hash/cuckoo hash library, the key and pData are communicated atomically using the key store index.

I might be wrong, but I do not think we can make this block-free (readers make progress even when the writer is not scheduled) using a bucket counter.

This problem does not exist in the existing rte_hash library. It might not perform as well as this one, but it is block-free.

> +
> +		found = cmp_f(&table->t[bucket], key, &val);
> +		if (unlikely((found == 0) &&
> +				(!SLIST_EMPTY(&table->t[bucket].head)))) {
> +			SLIST_FOREACH(ent, &table->t[bucket].head, next) {
> +				if (ent->key == key) {
> +					val = ent->val;
> +					found = 1;
> +					break;
> +				}
> +			}
> +		}
> +		__atomic_thread_fence(__ATOMIC_RELEASE);
> +	} while (unlikely(cnt != __atomic_load_n(&table->t[bucket].cnt,
> +			 __ATOMIC_RELAXED)));
> +
> +	if (found == 1) {
> +		*value = val;
> +		return 0;
> +	} else
> +		return -ENOENT;
> +}
> +
> +struct rte_kv_hash_table *
> +k32v64_hash_create(const struct rte_kv_hash_params *params);
> +
> +void
> +k32v64_hash_free(struct rte_kv_hash_table *ht);
> diff --git a/lib/librte_hash/k32v64_hash_avx512vl.c
> b/lib/librte_hash/k32v64_hash_avx512vl.c
> new file mode 100644
> index 0000000..75cede5
> --- /dev/null
> +++ b/lib/librte_hash/k32v64_hash_avx512vl.c
> @@ -0,0 +1,59 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +#include "k32v64_hash.h"
> +
> +int
> +k32v64_hash_bulk_lookup_avx512vl(struct rte_kv_hash_table *ht, void
> *keys_p,
> +	uint32_t *hashes, void *values_p, unsigned int n);
> +
> +static inline int
> +k32v64_cmp_keys_avx512vl(struct k32v64_hash_bucket *bucket, uint32_t
> key,
> +	uint64_t *val)
> +{
> +	__m128i keys, srch_key;
> +	__mmask8 msk;
> +
> +	keys = _mm_load_si128((void *)bucket);
> +	srch_key = _mm_set1_epi32(key);
> +
> +	msk = _mm_mask_cmpeq_epi32_mask(bucket->key_mask, keys,
> srch_key);
> +	if (msk) {
> +		*val = bucket->val[__builtin_ctz(msk)];
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +static inline int
> +k32v64_hash_lookup_avx512vl(struct k32v64_hash_table *table, uint32_t
> key,
> +	uint32_t hash, uint64_t *value)
> +{
> +	return __k32v64_hash_lookup(table, key, hash, value,
> +		k32v64_cmp_keys_avx512vl);
> +}
> +
> +int
> +k32v64_hash_bulk_lookup_avx512vl(struct rte_kv_hash_table *ht, void
> *keys_p,
> +	uint32_t *hashes, void *values_p, unsigned int n) {
> +	struct k32v64_hash_table *table = (struct k32v64_hash_table *)ht;
> +	uint32_t *keys = keys_p;
> +	uint64_t *values = values_p;
> +	int ret, cnt = 0;
> +	unsigned int i;
> +
> +	if (unlikely((table == NULL) || (keys == NULL) || (hashes == NULL) ||
> +			(values == NULL)))
> +		return -EINVAL;
> +
> +	for (i = 0; i < n; i++) {
> +		ret = k32v64_hash_lookup_avx512vl(table, keys[i], hashes[i],
> +			&values[i]);
> +		if (ret == 0)
> +			cnt++;
> +	}
> +	return cnt;
> +}
> diff --git a/lib/librte_hash/meson.build b/lib/librte_hash/meson.build index
> 6ab46ae..0d014ea 100644
> --- a/lib/librte_hash/meson.build
> +++ b/lib/librte_hash/meson.build
> @@ -3,10 +3,23 @@
> 
>  headers = files('rte_crc_arm64.h',
>  	'rte_fbk_hash.h',
> +	'rte_kv_hash.h',
>  	'rte_hash_crc.h',
>  	'rte_hash.h',
>  	'rte_jhash.h',
>  	'rte_thash.h')
> 
> -sources = files('rte_cuckoo_hash.c', 'rte_fbk_hash.c') -deps += ['ring']
> +sources = files('rte_cuckoo_hash.c', 'rte_fbk_hash.c', 'rte_kv_hash.c',
> +'k32v64_hash.c') deps += ['ring', 'mempool']
> +
> +if dpdk_conf.has('RTE_ARCH_X86')
> +	if cc.has_argument('-mavx512vl')
> +		avx512_tmplib = static_library('avx512_tmp',
> +                                'k32v64_hash_avx512vl.c',
> +                                dependencies: static_rte_mempool,
> +                                c_args: cflags + ['-mavx512vl'])
> +                objs += avx512_tmplib.extract_objects('k32v64_hash_avx512vl.c')
> +                cflags += '-DCC_AVX512VL_SUPPORT'
> +
> +	endif
> +endif
> diff --git a/lib/librte_hash/rte_hash_version.map
> b/lib/librte_hash/rte_hash_version.map
> index c2a9094..614e0a5 100644
> --- a/lib/librte_hash/rte_hash_version.map
> +++ b/lib/librte_hash/rte_hash_version.map
> @@ -36,5 +36,9 @@ EXPERIMENTAL {
>  	rte_hash_lookup_with_hash_bulk;
>  	rte_hash_lookup_with_hash_bulk_data;
>  	rte_hash_max_key_id;
> -
> +	rte_kv_hash_create;
> +	rte_kv_hash_find_existing;
> +	rte_kv_hash_free;
> +	rte_kv_hash_add;
> +	rte_kv_hash_delete;
>  };
> diff --git a/lib/librte_hash/rte_kv_hash.c b/lib/librte_hash/rte_kv_hash.c new
> file mode 100644 index 0000000..03df8db
> --- /dev/null
> +++ b/lib/librte_hash/rte_kv_hash.c
> @@ -0,0 +1,184 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +#include <string.h>
> +
> +#include <rte_eal_memconfig.h>
> +#include <rte_errno.h>
> +#include <rte_malloc.h>
> +#include <rte_memory.h>
> +#include <rte_tailq.h>
> +
> +#include <rte_kv_hash.h>
> +#include "k32v64_hash.h"
> +
> +TAILQ_HEAD(rte_kv_hash_list, rte_tailq_entry);
> +
> +static struct rte_tailq_elem rte_kv_hash_tailq = {
> +	.name = "RTE_KV_HASH",
> +};
> +
> +EAL_REGISTER_TAILQ(rte_kv_hash_tailq);
> +
> +int
> +rte_kv_hash_add(struct rte_kv_hash_table *table, void *key,
> +	uint32_t hash, void *value, int *found) {
> +	if (table == NULL)
> +		return -EINVAL;
> +
> +	return table->modify(table, key, hash, RTE_KV_MODIFY_ADD,
> +		value, found);
> +}
> +
> +int
> +rte_kv_hash_delete(struct rte_kv_hash_table *table, void *key,
> +	uint32_t hash, void *value)
> +{
> +	int found;
> +
> +	if (table == NULL)
> +		return -EINVAL;
> +
> +	return table->modify(table, key, hash, RTE_KV_MODIFY_DEL,
> +		value, &found);
> +}
> +
> +struct rte_kv_hash_table *
> +rte_kv_hash_find_existing(const char *name) {
I did not see a test case for this. Please add a test case for 'rte_kv_hash_find_existing'.

> +	struct rte_kv_hash_table *h = NULL;
> +	struct rte_tailq_entry *te;
> +	struct rte_kv_hash_list *kv_hash_list;
> +
> +	kv_hash_list = RTE_TAILQ_CAST(rte_kv_hash_tailq.head,
> +			rte_kv_hash_list);
> +
> +	rte_mcfg_tailq_read_lock();
> +	TAILQ_FOREACH(te, kv_hash_list, next) {
> +		h = (struct rte_kv_hash_table *) te->data;
> +		if (strncmp(name, h->name, RTE_KV_HASH_NAMESIZE) == 0)
> +			break;
> +	}
> +	rte_mcfg_tailq_read_unlock();
> +	if (te == NULL) {
> +		rte_errno = ENOENT;
> +		return NULL;
> +	}
> +	return h;
> +}
> +
> +struct rte_kv_hash_table *
> +rte_kv_hash_create(const struct rte_kv_hash_params *params) {
> +	char hash_name[RTE_KV_HASH_NAMESIZE];
> +	struct rte_kv_hash_table *ht, *tmp_ht = NULL;
> +	struct rte_tailq_entry *te;
> +	struct rte_kv_hash_list *kv_hash_list;
> +	int ret;
> +
> +	if ((params == NULL) || (params->name == NULL) ||
> +			(params->entries == 0) ||
> +			(params->type >= RTE_KV_HASH_MAX)) {
> +		rte_errno = EINVAL;
> +		return NULL;
> +	}
> +
> +	kv_hash_list = RTE_TAILQ_CAST(rte_kv_hash_tailq.head,
> +		rte_kv_hash_list);
> +
> +	ret = snprintf(hash_name, sizeof(hash_name), "KV_%s", params-
> >name);
RTE_KV_HASH_NAMESIZE is a public #define, and it is natural for the user to use it to size the hash table name. Now we are taking 3 characters out of that space. I think it is better to either increase the size of 'hash_name' by 3 characters or skip adding the 'KV_' prefix to the name.

This also has an impact on 'rte_kv_hash_find_existing', where the user-passed string is used as is without the 'KV_' prefix being added. Suggest adding a test case for 'rte_kv_hash_find_existing'.
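Something as simple as the following (untested sketch against the API in this patch) would cover it:

	struct rte_kv_hash_params params = {
		.name = "find_existing_test",
		.entries = 1024,
		.socket_id = SOCKET_ID_ANY,
		.type = RTE_KV_HASH_K32V64,
	};
	struct rte_kv_hash_table *ht, *tmp;

	ht = rte_kv_hash_create(&params);
	RTE_TEST_ASSERT(ht != NULL, "table creation failed");
	/* the name passed by the user must find the same table */
	tmp = rte_kv_hash_find_existing(params.name);
	RTE_TEST_ASSERT(tmp == ht, "find_existing returned a different table");
	rte_kv_hash_free(ht);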

> +	if (ret < 0 || ret >= RTE_KV_HASH_NAMESIZE) {
> +		rte_errno = ENAMETOOLONG;
> +		return NULL;
> +	}
> +
> +	switch (params->type) {
> +	case RTE_KV_HASH_K32V64:
> +		ht = k32v64_hash_create(params);
> +		break;
> +	default:
> +		rte_errno = EINVAL;
> +		return NULL;
> +	}
> +	if (ht == NULL)
> +		return ht;
> +
> +	rte_mcfg_tailq_write_lock();
> +	TAILQ_FOREACH(te, kv_hash_list, next) {
> +		tmp_ht = (struct rte_kv_hash_table *) te->data;
> +		if (strncmp(params->name, tmp_ht->name,
> +				RTE_KV_HASH_NAMESIZE) == 0)
> +			break;
> +	}
> +	if (te != NULL) {
> +		rte_errno = EEXIST;
> +		goto exit;
> +	}
> +
> +	te = rte_zmalloc("KV_HASH_TAILQ_ENTRY", sizeof(*te), 0);
> +	if (te == NULL) {
> +		RTE_LOG(ERR, HASH, "Failed to allocate tailq entry\n");
> +		goto exit;
> +	}
> +
> +	ht->type = params->type;
> +	te->data = (void *)ht;
> +	TAILQ_INSERT_TAIL(kv_hash_list, te, next);
> +
> +	rte_mcfg_tailq_write_unlock();
> +
> +	return ht;
> +
> +exit:
> +	rte_mcfg_tailq_write_unlock();
> +	switch (params->type) {
> +	case RTE_KV_HASH_K32V64:
> +		k32v64_hash_free(ht);
> +		break;
> +	default:
> +		break;
> +	}
> +	return NULL;
> +}
> +
> +void
> +rte_kv_hash_free(struct rte_kv_hash_table *ht) {
> +	struct rte_tailq_entry *te;
> +	struct rte_kv_hash_list *kv_hash_list;
> +
> +	if (ht == NULL)
> +		return;
> +
> +	kv_hash_list = RTE_TAILQ_CAST(rte_kv_hash_tailq.head,
> +				rte_kv_hash_list);
> +
> +	rte_mcfg_tailq_write_lock();
> +
> +	/* find out tailq entry */
> +	TAILQ_FOREACH(te, kv_hash_list, next) {
> +		if (te->data == (void *) ht)
> +			break;
> +	}
> +
> +
> +	if (te == NULL) {
> +		rte_mcfg_tailq_write_unlock();
> +		return;
> +	}
> +
> +	TAILQ_REMOVE(kv_hash_list, te, next);
> +
> +	rte_mcfg_tailq_write_unlock();
I understand that the free is not thread safe. But, it might be safer if unlock is after the call to 'k32v64_hash_free'.

> +
> +	switch (ht->type) {
> +	case RTE_KV_HASH_K32V64:
> +		k32v64_hash_free(ht);
> +		break;
> +	default:
> +		break;
> +	}
> +	rte_free(te);
> +}
> diff --git a/lib/librte_hash/rte_kv_hash.h b/lib/librte_hash/rte_kv_hash.h new
> file mode 100644 index 0000000..c0375d1
> --- /dev/null
> +++ b/lib/librte_hash/rte_kv_hash.h
> @@ -0,0 +1,169 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +#ifndef _RTE_KV_HASH_H_
> +#define _RTE_KV_HASH_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <rte_compat.h>
> +#include <rte_atomic.h>
> +#include <rte_mempool.h>
> +
> +#define RTE_KV_HASH_NAMESIZE		32
> +
> +enum rte_kv_hash_type {
> +	RTE_KV_HASH_K32V64,
> +	RTE_KV_HASH_MAX
> +};
> +
> +enum rte_kv_modify_op {
> +	RTE_KV_MODIFY_ADD,
> +	RTE_KV_MODIFY_DEL,
> +	RTE_KV_MODIFY_OP_MAX
> +};
This could be in a private header file.

> +
> +struct rte_kv_hash_params {
> +	const char *name;
> +	uint32_t entries;
> +	int socket_id;
> +	enum rte_kv_hash_type type;
> +};
Since this is a public structure, suggest adding some reserved or flags field which are ignored now but could be used in the future for enhancements.

> +
> +struct rte_kv_hash_table;
> +
> +typedef int (*rte_kv_hash_bulk_lookup_t) (struct rte_kv_hash_table
> +*table, void *keys, uint32_t *hashes,
> +	void *values, unsigned int n);
> +
> +typedef int (*rte_kv_hash_modify_t)
> +(struct rte_kv_hash_table *table, void *key, uint32_t hash,
> +	enum rte_kv_modify_op op, void *value, int *found);
> +
> +struct rte_kv_hash_table {
> +	char name[RTE_KV_HASH_NAMESIZE];	/**< Name of the hash. */
> +	rte_kv_hash_bulk_lookup_t	lookup;
> +	rte_kv_hash_modify_t		modify;
There are separate APIs provided for add and delete. Is there any advantage in combining add/delete into a single function pointer in the backend?
If we keep 2 separate pointers, we can get rid of 'enum rte_kv_modify_op', and it is simpler to understand as well.
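I.e. something along these lines (sketch only):

typedef int (*rte_kv_hash_add_t)(struct rte_kv_hash_table *table,
	void *key, uint32_t hash, void *value, int *found);
typedef int (*rte_kv_hash_del_t)(struct rte_kv_hash_table *table,
	void *key, uint32_t hash, void *value);

struct rte_kv_hash_table {
	char name[RTE_KV_HASH_NAMESIZE];	/**< Name of the hash. */
	rte_kv_hash_bulk_lookup_t	lookup;
	rte_kv_hash_add_t		add;	/* backend add */
	rte_kv_hash_del_t		del;	/* backend delete */
	enum rte_kv_hash_type		type;
};

Then rte_kv_hash_add()/rte_kv_hash_delete() would simply call table->add()
and table->del() and the op enum is no longer needed.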

> +	enum rte_kv_hash_type		type;
> +};
> +
> +/**
> + * Lookup bulk of keys.
> + * This function is multi-thread safe with regarding to other lookup threads.
It is safe with respect to the writer as well (reader-writer concurrency), please capture this as well.

> + *
> + * @param table
> + *   Hash table to add the key to.
nit, 'Hash table to lookup the keys'

> + * @param keys
> + *   Pointer to array of keys
> + * @param hashes
> + *   Pointer to array of hash values associated with keys.
> + * @param values
> + *   Pointer to array of value corresponded to keys
> + *   If the key was not found the corresponding value remains intact.
> + * @param n
> + *   Number of keys to lookup in batch.
> + * @return
> + *   -EINVAL if there's an error, otherwise number of successful lookups.
> + */
> +static inline int
> +rte_kv_hash_bulk_lookup(struct rte_kv_hash_table *table,
> +	void *keys, uint32_t *hashes, void *values, unsigned int n) {
> +	return table->lookup(table, keys, hashes, values, n); }
Consider making this a non-inline function. This is a bulk lookup and I think cost of calling a function should get amortized well.
This will also allow for hiding the 'struct rte_kv_hash_table' from the user which is better for ABI. You can move the definition of the 'struct rte_kv_hash_table' and function pointer declarations to a private header file.
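E.g. (sketch): keep only an opaque declaration in rte_kv_hash.h and move the
full definition plus a thin non-inline wrapper into the library:

/* rte_kv_hash.h - public header, opaque handle only */
struct rte_kv_hash_table;

__rte_experimental
int
rte_kv_hash_bulk_lookup(struct rte_kv_hash_table *table,
	void *keys, uint32_t *hashes, void *values, unsigned int n);

/* rte_kv_hash.c - struct definition comes from a private header */
int
rte_kv_hash_bulk_lookup(struct rte_kv_hash_table *table,
	void *keys, uint32_t *hashes, void *values, unsigned int n)
{
	if (table == NULL)
		return -EINVAL;

	return table->lookup(table, keys, hashes, values, n);
}

(the symbol would have to be added to the version map as well)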

> +
> +/**
> + * Add a key to an existing hash table with hash value.
> + * This operation is not multi-thread safe regarding to add/delete
> +functions
> + * and should only be called from one thread.
> + * However it is safe to call it along with lookup.
> + *
> + * @param table
> + *   Hash table to add the key to.
> + * @param key
> + *   Key to add to the hash table.
> + * @param value
> + *   Value to associate with key.
I think it needs to be called out here that the data is of size 64b (even in a 32b system) because of the implementation in function 'k32v64_modify'. Why not make 'value' of type 'uint64_t *'?

> + * @param hash
> + *   Hash value associated with key.
> + * @found
> + *   0 if no previously added key was found
> + *   1 previously added key was found, old value associated with the key
> + *   was written to *value
> + * @return
> + *   0 if ok, or negative value on error.
> + */
> +__rte_experimental
> +int
> +rte_kv_hash_add(struct rte_kv_hash_table *table, void *key,
> +	uint32_t hash, void *value, int *found);
> +
> +/**
> + * Remove a key with a given hash value from an existing hash table.
> + * This operation is not multi-thread safe regarding to add/delete
> +functions
> + * and should only be called from one thread.
> + * However it is safe to call it along with lookup.
> + *
> + * @param table
> + *   Hash table to remove the key from.
> + * @param key
> + *   Key to remove from the hash table.
> + * @param hash
> + *   hash value associated with key.
> + * @param value
> + *   pointer to memory where the old value will be written to on success
> + * @return
> + *   0 if ok, or negative value on error.
> + */
> +__rte_experimental
> +int
> +rte_kv_hash_delete(struct rte_kv_hash_table *table, void *key,
> +	uint32_t hash, void *value);
> +
> +/**
> + * Performs a lookup for an existing hash table, and returns a pointer
> +to
> + * the table if found.
> + *
> + * @param name
> + *   Name of the hash table to find
> + *
> + * @return
> + *   pointer to hash table structure or NULL on error with rte_errno
> + *   set appropriately.
> + */
> +__rte_experimental
> +struct rte_kv_hash_table *
> +rte_kv_hash_find_existing(const char *name);
> +
> +/**
> + * Create a new hash table for use with four byte keys.
> + *
> + * @param params
> + *   Parameters used in creation of hash table.
> + *
> + * @return
> + *   Pointer to hash table structure that is used in future hash table
> + *   operations, or NULL on error with rte_errno set appropriately.
> + */
> +__rte_experimental
> +struct rte_kv_hash_table *
> +rte_kv_hash_create(const struct rte_kv_hash_params *params);
> +
> +/**
> + * Free all memory used by a hash table.
> + *
> + * @param table
> + *   Hash table to deallocate.
> + */
> +__rte_experimental
> +void
> +rte_kv_hash_free(struct rte_kv_hash_table *table);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_KV_HASH_H_ */
> --
> 2.7.4


^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v3 6/9] eal: register non-EAL threads as lcores
  2020-06-24 10:48  0%               ` David Marchand
@ 2020-06-24 11:59  0%                 ` Ananyev, Konstantin
  0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2020-06-24 11:59 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, jerinjacobk, Richardson, Bruce, mdr, ktraynor, Stokes, Ian,
	i.maximets, Thomas Monjalon, Mcnamara, John, Kovacevic, Marko,
	Burakov, Anatoly, Olivier Matz, Andrew Rybchenko, Neil Horman


> 
> On Wed, Jun 24, 2020 at 12:40 PM Ananyev, Konstantin
> <konstantin.ananyev@intel.com> wrote:
> > > Supporting lcore allocation in MP requires exchanges between
> > > primary/secondary processes like what we have for memory allocations.
> > > It will be quite a beast to get to work fine, while not even knowing
> > > if people actually want to use both.
> >
> > I don't think we need to re-implement RPC as we did for memory subsystem.
> > One relatively simple approach - move lcore_role[] and related lock into
> > shared memory (separate memzone or so).
> > I think it should help a lot and will solve majority of the problems.
> > One limitation - init/fini callbacks can be static only.
> > As the drawback, it will introduce change in current behaviour:
> > secondary process with lcore-mask that intersects with master lcore-mask
> > will fail to start.
> > Second approach - make lcore_id local process entity:
> > prohibit indexing by lcore_id in shared data structures.
> > Let say for mempool - make cache local (per process).
> > While that approach is probably more elegant and consistent,
> > it would require more work and will cause ABI (maybe API also) breakage.
> 
> In all scenarii, this is quite some work.
> 
> 
> >
> > > For v4, I added a check to exclude MP and the new API.
> >
> > Do you mean - make this new dynamic-lcore API return an error if called
> > from a secondary process?
> >
> 
> Yes, and prohibiting from attaching a secondary process if dynamic
> lcore API has been used in primary.
> I intend to squash in patch 6:
> https://github.com/david-marchand/dpdk/commit/e5861ee734bfe2e4dc23d9b919b0db2a32a58aee

But secondary process can attach before lcore_register, so we'll have some sort of inconsistency in behaviour.
If we really want to go ahead with such a workaround -
probably better to introduce an explicit EAL flag (--single-process or so),
as Thomas and Bruce suggested, if I understood them properly.




^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3 6/9] eal: register non-EAL threads as lcores
  2020-06-24 10:39  3%             ` Ananyev, Konstantin
@ 2020-06-24 10:48  0%               ` David Marchand
  2020-06-24 11:59  0%                 ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2020-06-24 10:48 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: dev, jerinjacobk, Richardson, Bruce, mdr, ktraynor, Stokes, Ian,
	i.maximets, Thomas Monjalon, Mcnamara, John, Kovacevic, Marko,
	Burakov, Anatoly, Olivier Matz, Andrew Rybchenko, Neil Horman

On Wed, Jun 24, 2020 at 12:40 PM Ananyev, Konstantin
<konstantin.ananyev@intel.com> wrote:
> > Supporting lcore allocation in MP requires exchanges between
> > primary/secondary processes like what we have for memory allocations.
> > It will be quite a beast to get to work fine, while not even knowing
> > if people actually want to use both.
>
> I don't think we need to re-implement RPC as we did for memory subsystem.
> One relatively simple approach - move lcore_role[] and related lock into
> shared memory (separate memzone or so).
> I think it should help a lot and will solve majority of the problems.
> One limitation - init/fini callbacks can be static only.
> As the drawback, it will introduce change in current behaviour:
> secondary process with lcore-mask that intersects with master lcore-mask
> will fail to start.
> Second approach - make lcore_id local process entity:
> prohibit indexing by lcore_id in shared data structures.
> Let say for mempool - make cache local (per process).
> While that approach is probably more elegant and consistent,
> it would require more work and will cause ABI (maybe API also) breakage.

In all scenarii, this is quite some work.


>
> > For v4, I added a check to exclude MP and the new API.
>
> Do you mean - make this new dynamic-lcore API return an error if called
> from a secondary process?
>

Yes, and prohibiting from attaching a secondary process if dynamic
lcore API has been used in primary.
I intend to squash in patch 6:
https://github.com/david-marchand/dpdk/commit/e5861ee734bfe2e4dc23d9b919b0db2a32a58aee


-- 
David Marchand


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3 6/9] eal: register non-EAL threads as lcores
  @ 2020-06-24 10:39  3%             ` Ananyev, Konstantin
  2020-06-24 10:48  0%               ` David Marchand
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2020-06-24 10:39 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, jerinjacobk, Richardson, Bruce, mdr, ktraynor, Stokes, Ian,
	i.maximets, Thomas Monjalon, Mcnamara, John, Kovacevic, Marko,
	Burakov, Anatoly, Olivier Matz, Andrew Rybchenko, Neil Horman



> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Wednesday, June 24, 2020 10:24 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org; jerinjacobk@gmail.com; Richardson, Bruce <bruce.richardson@intel.com>; mdr@ashroe.eu; ktraynor@redhat.com;
> Stokes, Ian <ian.stokes@intel.com>; i.maximets@ovn.org; Thomas Monjalon <thomas@monjalon.net>; Mcnamara, John
> <john.mcnamara@intel.com>; Kovacevic, Marko <marko.kovacevic@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Olivier
> Matz <olivier.matz@6wind.com>; Andrew Rybchenko <arybchenko@solarflare.com>; Neil Horman <nhorman@tuxdriver.com>
> Subject: Re: [dpdk-dev] [PATCH v3 6/9] eal: register non-EAL threads as lcores
> 
> On Tue, Jun 23, 2020 at 3:16 PM Ananyev, Konstantin
> <konstantin.ananyev@intel.com> wrote:
> > > Even before this series, MP has no protection on lcore placing between
> > > primary and secondary processes.
> >
> > Agree, it is not a new problem, it has been there for a while.
> > Though making lcore assignment dynamic will make it more noticeable and harder to avoid.
> > With static only lcore distribution it is much easier to control things.
> >
> > > Personally, I have no use for DPDK MP and marking MP as not supporting
> > > this new feature is tempting for a first phase.
> > > If this is a strong requirement, I can look at it in a second phase.
> > > What do you think?
> >
> > In theory it is possible to mark this new API as not supported for MP.
> > At least for now. Though I think it is sort of temporal solution.
> > AFAIK, MP is used by customers, so sooner or later someone will hit that problem.
> 
> I understand this argument.
> But then we don't see those customers giving feedback.
> 
> 
> > Let say, we do have pdump app/library in our mainline.
> > As I can see - it will be affected when users will start using this new dynamic lcore API
> > inside their apps.
> 
> Supporting lcore allocation in MP requires exchanges between
> primary/secondary processes like what we have for memory allocations.
> It will be quite a beast to get to work fine, while not even knowing
> if people actually want to use both.

I don't think we need to re-implement RPC as we did for memory subsystem.
One relatively simple approach - move lcore_role[] and related lock into
shared memory (separate memzone or so).
I think it should help a lot and will solve majority of the problems.
One limitation - init/fini callbacks can be static only.
As the drawback, it will introduce change in current behaviour:
secondary process with lcore-mask that intersects with master lcore-mask
will fail to start.
Second approach - make lcore_id local process entity:
prohibit indexing by lcore_id in shared data structures.
Let say for mempool - make cache local (per process).
While that approach is probably more elegant and consistent,
it would require more work and will cause ABI (maybe API also) breakage.
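Just to illustrate the first approach (rough sketch, not tested, init
ordering and callback handling left out):

#include <rte_eal.h>
#include <rte_lcore.h>
#include <rte_memzone.h>
#include <rte_spinlock.h>

struct lcore_role_mem {
	rte_spinlock_t lock;
	uint8_t role[RTE_MAX_LCORE];
};

static struct lcore_role_mem *lcore_roles;

static int
lcore_role_mem_init(void)
{
	const struct rte_memzone *mz;

	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
		/* primary owns the shared role table */
		mz = rte_memzone_reserve("lcore_roles",
			sizeof(*lcore_roles), SOCKET_ID_ANY, 0);
		if (mz == NULL)
			return -1;
		lcore_roles = mz->addr;
		rte_spinlock_init(&lcore_roles->lock);
	} else {
		/* secondary just attaches to it */
		mz = rte_memzone_lookup("lcore_roles");
		if (mz == NULL)
			return -1;
		lcore_roles = mz->addr;
	}
	return 0;
}

Then the register/unregister helpers from this series would take
lcore_roles->lock and claim/release a slot in the shared role[] array
instead of the per-process one.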
 
> For v4, I added a check to exclude MP and the new API.

Do you mean - make this new dynamic-lcore API return an error if called
from a secondary process?

> I am still willing to help if people do care about using both features together.



^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH 20.11] eal: simplify exit functions
@ 2020-06-24  9:36  3% Thomas Monjalon
  2020-06-30 10:26  0% ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2020-06-24  9:36 UTC (permalink / raw)
  To: dev
  Cc: david.marchand, bruce.richardson, John McNamara, Marko Kovacevic,
	Ray Kinsella, Neil Horman

The option RTE_EAL_ALWAYS_PANIC_ON_ERROR was off by default,
and not customizable with meson. It is completely removed.

The function rte_dump_registers is a trace of the bare metal support
era, and was not supported in userland. It is completely removed.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
Because the empty function rte_dump_registers is part of the ABI,
this change is planned for DPDK 20.11.
---
 app/test/test_debug.c                    |  3 ---
 config/common_base                       |  1 -
 doc/guides/howto/debug_troubleshoot.rst  |  2 +-
 lib/librte_eal/common/eal_common_debug.c | 17 +----------------
 lib/librte_eal/include/rte_debug.h       |  7 -------
 lib/librte_eal/rte_eal_version.map       |  1 -
 6 files changed, 2 insertions(+), 29 deletions(-)

diff --git a/app/test/test_debug.c b/app/test/test_debug.c
index 25eab97e2a..834a7386f5 100644
--- a/app/test/test_debug.c
+++ b/app/test/test_debug.c
@@ -66,13 +66,11 @@ test_exit_val(int exit_val)
 	}
 	wait(&status);
 	printf("Child process status: %d\n", status);
-#ifndef RTE_EAL_ALWAYS_PANIC_ON_ERROR
 	if(!WIFEXITED(status) || WEXITSTATUS(status) != (uint8_t)exit_val){
 		printf("Child process terminated with incorrect status (expected = %d)!\n",
 				exit_val);
 		return -1;
 	}
-#endif
 	return 0;
 }
 
@@ -113,7 +111,6 @@ static int
 test_debug(void)
 {
 	rte_dump_stack();
-	rte_dump_registers();
 	if (test_panic() < 0)
 		return -1;
 	if (test_exit() < 0)
diff --git a/config/common_base b/config/common_base
index c7d5c73215..42ad399b17 100644
--- a/config/common_base
+++ b/config/common_base
@@ -103,7 +103,6 @@ CONFIG_RTE_ENABLE_TRACE_FP=n
 CONFIG_RTE_LOG_HISTORY=256
 CONFIG_RTE_BACKTRACE=y
 CONFIG_RTE_LIBEAL_USE_HPET=n
-CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=n
 CONFIG_RTE_EAL_VFIO=n
 CONFIG_RTE_MAX_VFIO_GROUPS=64
diff --git a/doc/guides/howto/debug_troubleshoot.rst b/doc/guides/howto/debug_troubleshoot.rst
index cef016b2fe..1ed8be5a04 100644
--- a/doc/guides/howto/debug_troubleshoot.rst
+++ b/doc/guides/howto/debug_troubleshoot.rst
@@ -313,7 +313,7 @@ Custom worker function :numref:`dtg_distributor_worker`.
    * For high-performance execution logic ensure running it on correct NUMA
      and non-master core.
 
-   * Analyze run logic with ``rte_dump_stack``, ``rte_dump_registers`` and
+   * Analyze run logic with ``rte_dump_stack`` and
      ``rte_memdump`` for more insights.
 
    * Make use of objdump to ensure opcode is matching to the desired state.
diff --git a/lib/librte_eal/common/eal_common_debug.c b/lib/librte_eal/common/eal_common_debug.c
index 722468754d..15418e957f 100644
--- a/lib/librte_eal/common/eal_common_debug.c
+++ b/lib/librte_eal/common/eal_common_debug.c
@@ -7,14 +7,6 @@
 #include <rte_log.h>
 #include <rte_debug.h>
 
-/* not implemented */
-void
-rte_dump_registers(void)
-{
-	return;
-}
-
-/* call abort(), it will generate a coredump if enabled */
 void
 __rte_panic(const char *funcname, const char *format, ...)
 {
@@ -25,8 +17,7 @@ __rte_panic(const char *funcname, const char *format, ...)
 	rte_vlog(RTE_LOG_CRIT, RTE_LOGTYPE_EAL, format, ap);
 	va_end(ap);
 	rte_dump_stack();
-	rte_dump_registers();
-	abort();
+	abort(); /* generate a coredump if enabled */
 }
 
 /*
@@ -46,14 +37,8 @@ rte_exit(int exit_code, const char *format, ...)
 	rte_vlog(RTE_LOG_CRIT, RTE_LOGTYPE_EAL, format, ap);
 	va_end(ap);
 
-#ifndef RTE_EAL_ALWAYS_PANIC_ON_ERROR
 	if (rte_eal_cleanup() != 0)
 		RTE_LOG(CRIT, EAL,
 			"EAL could not release all resources\n");
 	exit(exit_code);
-#else
-	rte_dump_stack();
-	rte_dump_registers();
-	abort();
-#endif
 }
diff --git a/lib/librte_eal/include/rte_debug.h b/lib/librte_eal/include/rte_debug.h
index 50052c5a90..c4bc71ce28 100644
--- a/lib/librte_eal/include/rte_debug.h
+++ b/lib/librte_eal/include/rte_debug.h
@@ -26,13 +26,6 @@ extern "C" {
  */
 void rte_dump_stack(void);
 
-/**
- * Dump the registers of the calling core to the console.
- *
- * Note: Not implemented in a userapp environment; use gdb instead.
- */
-void rte_dump_registers(void);
-
 /**
  * Provide notification of a critical non-recoverable error and terminate
  * execution abnormally.
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 196eef5afa..3f36e46b3b 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -37,7 +37,6 @@ DPDK_20.0 {
 	rte_devargs_remove;
 	rte_devargs_type_count;
 	rte_dump_physmem_layout;
-	rte_dump_registers;
 	rte_dump_stack;
 	rte_dump_tailq;
 	rte_eal_alarm_cancel;
-- 
2.26.2


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v9 10/10] build: generate version.map file for MinGW on Windows
  @ 2020-06-24  8:28  4%   ` talshn
    1 sibling, 0 replies; 200+ results
From: talshn @ 2020-06-24  8:28 UTC (permalink / raw)
  To: dev
  Cc: thomas, pallavi.kadam, dmitry.kozliuk, david.marchand, grive,
	ranjit.menon, navasile, harini.ramakrishnan, ocardona,
	anatoly.burakov, fady, bruce.richardson, Tal Shnaiderman

From: Tal Shnaiderman <talshn@mellanox.com>

The MinGW build for Windows has special cases where exported
functions contain an additional prefix:

__emutls_v.per_lcore__*

To avoid adding those prefixed functions to the version.map file
the map_to_def.py script was modified to create a map file for MinGW
with the needed changes.

The file name was changed to map_to_win.py and the lib/meson.build map output
was unified with the drivers/meson.build output.

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
---
 buildtools/{map_to_def.py => map_to_win.py} | 11 ++++++++++-
 buildtools/meson.build                      |  4 ++--
 drivers/meson.build                         | 12 +++++++++---
 lib/meson.build                             | 19 ++++++++++++++-----
 4 files changed, 35 insertions(+), 11 deletions(-)
 rename buildtools/{map_to_def.py => map_to_win.py} (69%)

diff --git a/buildtools/map_to_def.py b/buildtools/map_to_win.py
similarity index 69%
rename from buildtools/map_to_def.py
rename to buildtools/map_to_win.py
index 6775b54a9d..2990b58634 100644
--- a/buildtools/map_to_def.py
+++ b/buildtools/map_to_win.py
@@ -10,12 +10,21 @@
 def is_function_line(ln):
     return ln.startswith('\t') and ln.endswith(';\n') and ":" not in ln
 
+# MinGW keeps the original .map file but replaces per_lcore* to __emutls_v.per_lcore*
+def create_mingw_map_file(input_map, output_map):
+    with open(input_map) as f_in, open(output_map, 'w') as f_out:
+        f_out.writelines([lines.replace('per_lcore', '__emutls_v.per_lcore') for lines in f_in.readlines()])
 
 def main(args):
     if not args[1].endswith('version.map') or \
-            not args[2].endswith('exports.def'):
+            not args[2].endswith('exports.def') and \
+            not args[2].endswith('mingw.map'):
         return 1
 
+    if args[2].endswith('mingw.map'):
+        create_mingw_map_file(args[1], args[2])
+        return 0
+
 # special case, allow override if an def file already exists alongside map file
     override_file = join(dirname(args[1]), basename(args[2]))
     if exists(override_file):
diff --git a/buildtools/meson.build b/buildtools/meson.build
index d5f8291beb..f9d2fdf74b 100644
--- a/buildtools/meson.build
+++ b/buildtools/meson.build
@@ -9,14 +9,14 @@ list_dir_globs = find_program('list-dir-globs.py')
 check_symbols = find_program('check-symbols.sh')
 ldflags_ibverbs_static = find_program('options-ibverbs-static.sh')
 
-# set up map-to-def script using python, either built-in or external
+# set up map-to-win script using python, either built-in or external
 python3 = import('python').find_installation(required: false)
 if python3.found()
 	py3 = [python3]
 else
 	py3 = ['meson', 'runpython']
 endif
-map_to_def_cmd = py3 + files('map_to_def.py')
+map_to_win_cmd = py3 + files('map_to_win.py')
 sphinx_wrapper = py3 + files('call-sphinx-build.py')
 
 # stable ABI always starts with "DPDK_"
diff --git a/drivers/meson.build b/drivers/meson.build
index 646a7d5eb5..2cd8505d10 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -152,16 +152,22 @@ foreach class:dpdk_driver_classes
 			implib = 'lib' + lib_name + '.dll.a'
 
 			def_file = custom_target(lib_name + '_def',
-				command: [map_to_def_cmd, '@INPUT@', '@OUTPUT@'],
+				command: [map_to_win_cmd, '@INPUT@', '@OUTPUT@'],
 				input: version_map,
 				output: '@0@_exports.def'.format(lib_name))
-			lk_deps = [version_map, def_file]
+
+			mingw_map = custom_target(lib_name + '_mingw',
+				command: [map_to_win_cmd, '@INPUT@', '@OUTPUT@'],
+				input: version_map,
+				output: '@0@_mingw.map'.format(lib_name))
+
+			lk_deps = [version_map, def_file, mingw_map]
 			if is_windows
 				if is_ms_linker
 					lk_args = ['-Wl,/def:' + def_file.full_path(),
 						'-Wl,/implib:drivers\\' + implib]
 				else
-					lk_args = []
+					lk_args = ['-Wl,--version-script=' + mingw_map.full_path()]
 				endif
 			else
 				lk_args = ['-Wl,--version-script=' + version_map]
diff --git a/lib/meson.build b/lib/meson.build
index a8fd317a18..af66610fcb 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -149,19 +149,28 @@ foreach l:libraries
 					meson.current_source_dir(), dir_name, name)
 			implib = dir_name + '.dll.a'
 
-			def_file = custom_target(name + '_def',
-				command: [map_to_def_cmd, '@INPUT@', '@OUTPUT@'],
+			def_file = custom_target(libname + '_def',
+				command: [map_to_win_cmd, '@INPUT@', '@OUTPUT@'],
 				input: version_map,
-				output: 'rte_@0@_exports.def'.format(name))
+				output: '@0@_exports.def'.format(libname))
+
+			mingw_map = custom_target(libname + '_mingw',
+				command: [map_to_win_cmd, '@INPUT@', '@OUTPUT@'],
+				input: version_map,
+				output: '@0@_mingw.map'.format(libname))
 
 			if is_ms_linker
 				lk_args = ['-Wl,/def:' + def_file.full_path(),
 					'-Wl,/implib:lib\\' + implib]
 			else
-				lk_args = ['-Wl,--version-script=' + version_map]
+				if is_windows
+					lk_args = ['-Wl,--version-script=' + mingw_map.full_path()]
+				else
+					lk_args = ['-Wl,--version-script=' + version_map]
+				endif
 			endif
 
-			lk_deps = [version_map, def_file]
+			lk_deps = [version_map, def_file, mingw_map]
 			if not is_windows
 				# on unix systems check the output of the
 				# check-symbols.sh script, using it as a
-- 
2.16.1.windows.4


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [RFC] ethdev: add a field for rte_eth_rxq_info
  2020-06-23 14:48  3% ` Stephen Hemminger
@ 2020-06-23 15:22  0%   ` Andrew Rybchenko
  0 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2020-06-23 15:22 UTC (permalink / raw)
  To: Stephen Hemminger, Chengchang Tang; +Cc: dev, linuxarm, thomas, ferruh.yigit

On 6/23/20 5:48 PM, Stephen Hemminger wrote:
> On Tue, 23 Jun 2020 14:48:54 +0800
> Chengchang Tang <tangchengchang@huawei.com> wrote:
> 
>> In common practice, PMD configure the rx_buf_size according to the data
>> room size of the object in mempool. But in fact the final value is related
>> to the specifications of hw, and its values will affect the number of
>> fragments in receiving pkts.
>>
>> At present, we seem to have no way to expose relevant information to upper
>> layer users.
>>
>> Add a field named rx_bufsize in rte_eth_rxq_info to indicate the buffer
>> size used in receiving pkts for hw.
>>
>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
>> ---
>>  lib/librte_ethdev/rte_ethdev.h | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
>> index 0f6d053..82b7e98 100644
>> --- a/lib/librte_ethdev/rte_ethdev.h
>> +++ b/lib/librte_ethdev/rte_ethdev.h
>> @@ -1306,6 +1306,7 @@ struct rte_eth_rxq_info {
>>  	struct rte_eth_rxconf conf; /**< queue config parameters. */
>>  	uint8_t scattered_rx;       /**< scattered packets RX supported. */
>>  	uint16_t nb_desc;           /**< configured number of RXDs. */
>> +	uint16_t rx_bufsize;        /**< size of RX buffer. */
>>  } __rte_cache_min_aligned;
>>
>>  /**
>> --
>> 2.7.4
>>
> 
> Will have to wait until 20.11 as it is an ABI change.
> 

I thought about it.
If I'm not mistaken it does not change the size of the structure.
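For what it is worth, an application would pick the value up through the
existing query API, e.g. (sketch, assuming the field is added as proposed):

#include <rte_ethdev.h>

static uint16_t
get_rx_bufsize(uint16_t port_id, uint16_t queue_id)
{
	struct rte_eth_rxq_info qinfo;

	if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) != 0)
		return 0;

	/* rx_bufsize is the new field proposed in this RFC */
	return qinfo.rx_bufsize;
}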

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC][PATCH v2 2/3] ethdev: add API to convert raw timestamps to nsec
  @ 2020-06-23 15:04  3%   ` Slava Ovsiienko
  0 siblings, 0 replies; 200+ results
From: Slava Ovsiienko @ 2020-06-23 15:04 UTC (permalink / raw)
  To: Patrick Keroulas, dev

Hi, Patrick

>  /**< @internal Function used to get the current value of the device clock. */
> @@ -730,6 +734,7 @@ struct eth_dev_ops {
>  	eth_timesync_read_time     timesync_read_time; /** Get the device
> clock time. */
>  	eth_timesync_write_time    timesync_write_time; /** Set the device
> clock time. */
> 
> +	eth_convert_ts_to_ns       convert_ts_to_ns;
>  	eth_read_clock             read_clock;

I have a concern about this - it adds the new field into struct eth_dev_ops and breaks ABI,
should we postpone the patch to 20.11?

With best regards, Slava

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Patrick Keroulas
> Sent: Thursday, June 11, 2020 18:16
> To: dev@dpdk.org
> Cc: Patrick Keroulas <patrick.keroulas@radio-canada.ca>
> Subject: [dpdk-dev] [RFC][PATCH v2 2/3] ethdev: add API to convert raw
> timestamps to nsec
> 
> Existing ethdev functions can read/write time from/to device but they're all
> related to timesync and none of them can translate a raw counter in real
> time unit, which is useful in a pdump application.
> 
> A new API is required because the conversion is derived from dev clock info.
> 
> Signed-off-by: Patrick Keroulas <patrick.keroulas@radio-canada.ca>
> ---
>  lib/librte_ethdev/rte_ethdev.c           | 12 ++++++++++++
>  lib/librte_ethdev/rte_ethdev.h           | 17 +++++++++++++++++
>  lib/librte_ethdev/rte_ethdev_core.h      |  5 +++++
>  lib/librte_ethdev/rte_ethdev_version.map |  2 ++
>  lib/librte_mbuf/rte_mbuf_core.h          |  3 ++-
>  lib/librte_pdump/rte_pdump.c             | 14 +++++++++++++-
>  6 files changed, 51 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 8e10a6fc3..822fa6d5a 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -4810,6 +4810,18 @@ rte_eth_timesync_write_time(uint16_t port_id,
> const struct timespec *timestamp)
>  								timestamp));
>  }
> 
> +int
> +rte_eth_convert_ts_to_ns(uint16_t port_id, uint64_t *timestamp) {
> +	struct rte_eth_dev *dev;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +	dev = &rte_eth_devices[port_id];
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->convert_ts_to_ns, -
> ENOTSUP);
> +	return eth_err(port_id, (*dev->dev_ops->convert_ts_to_ns)(dev,
> +timestamp)); }
> +
>  int
>  rte_eth_read_clock(uint16_t port_id, uint64_t *clock)  { diff --git
> a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h index
> a49242bcd..2d4d0bc7d 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -4103,6 +4103,23 @@ int rte_eth_timesync_read_time(uint16_t
> port_id, struct timespec *time);
>   */
>  int rte_eth_timesync_write_time(uint16_t port_id, const struct timespec
> *time);
> 
> +/**
> + * Convert a raw clock counter to nanoseconds from device clock
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in&out] timestamp
> + *   Pointer to the timestamp to be converted.
> + *
> + * @return
> + *   - 0: Success.
> + *   - -ENODEV: The port ID is invalid.
> + *   - -ENOTSUP: The function is not supported by the Ethernet driver.
> + */
> +__rte_experimental
> +int
> +rte_eth_convert_ts_to_ns(uint16_t port_id, uint64_t *timestamp);
> +
>  /**
>   * @warning
>   * @b EXPERIMENTAL: this API may change without prior notice.
> diff --git a/lib/librte_ethdev/rte_ethdev_core.h
> b/lib/librte_ethdev/rte_ethdev_core.h
> index 32407dd41..255b41b67 100644
> --- a/lib/librte_ethdev/rte_ethdev_core.h
> +++ b/lib/librte_ethdev/rte_ethdev_core.h
> @@ -464,6 +464,10 @@ typedef int (*eth_timesync_write_time)(struct
> rte_eth_dev *dev,
>  				       const struct timespec *timestamp);  /**<
> @internal Function used to get time from the device clock */
> 
> +typedef int (*eth_convert_ts_to_ns)(struct rte_eth_dev *dev,
> +				      uint64_t *timestamp);
> +/**< @internal Function used to convert timestamp from device clock */
> +
>  typedef int (*eth_read_clock)(struct rte_eth_dev *dev,
>  				      uint64_t *timestamp);
>  /**< @internal Function used to get the current value of the device clock. */
> @@ -730,6 +734,7 @@ struct eth_dev_ops {
>  	eth_timesync_read_time     timesync_read_time; /** Get the device
> clock time. */
>  	eth_timesync_write_time    timesync_write_time; /** Set the device
> clock time. */
> 
> +	eth_convert_ts_to_ns       convert_ts_to_ns;
>  	eth_read_clock             read_clock;
> 
>  	eth_xstats_get_by_id_t     xstats_get_by_id;
> diff --git a/lib/librte_ethdev/rte_ethdev_version.map
> b/lib/librte_ethdev/rte_ethdev_version.map
> index 715505604..754c05630 100644
> --- a/lib/librte_ethdev/rte_ethdev_version.map
> +++ b/lib/librte_ethdev/rte_ethdev_version.map
> @@ -241,4 +241,6 @@ EXPERIMENTAL {
>  	__rte_ethdev_trace_rx_burst;
>  	__rte_ethdev_trace_tx_burst;
>  	rte_flow_get_aged_flows;
> +
> +	rte_eth_convert_ts_to_ns;
>  };
> diff --git a/lib/librte_mbuf/rte_mbuf_core.h
> b/lib/librte_mbuf/rte_mbuf_core.h index b9a59c879..7f51f9157 100644
> --- a/lib/librte_mbuf/rte_mbuf_core.h
> +++ b/lib/librte_mbuf/rte_mbuf_core.h
> @@ -592,7 +592,8 @@ struct rte_mbuf {
>  	/** Valid if PKT_RX_TIMESTAMP is set. The unit and time reference
>  	 * are not normalized but are always the same for a given port.
>  	 * Some devices allow to query rte_eth_read_clock that will return
> the
> -	 * current device timestamp.
> +	 * current device timestamp or rte_eth_ts_to_ns that will convert
> raw
> +	 * counter to nanoseconds.
>  	 */
>  	uint64_t timestamp;
> 
> diff --git a/lib/librte_pdump/rte_pdump.c b/lib/librte_pdump/rte_pdump.c
> index f96709f95..03d9ba484 100644
> --- a/lib/librte_pdump/rte_pdump.c
> +++ b/lib/librte_pdump/rte_pdump.c
> @@ -100,12 +100,24 @@ pdump_copy(struct rte_mbuf **pkts, uint16_t
> nb_pkts, void *user_params)
>  	}
>  }
> 
> +static inline void
> +pdump_ts_to_ns(uint16_t port_id, struct rte_mbuf **pkts, uint16_t
> +nb_pkts) {
> +	unsigned int i;
> +
> +	for (i = 0; i < nb_pkts; i++) {
> +		if (pkts[i]->ol_flags & PKT_RX_TIMESTAMP)
> +			rte_eth_convert_ts_to_ns(port_id, &pkts[i]-
> >timestamp);
> +	}
> +}
> +
>  static uint16_t
> -pdump_rx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
> +pdump_rx(uint16_t port_id, uint16_t qidx __rte_unused,
>  	struct rte_mbuf **pkts, uint16_t nb_pkts,
>  	uint16_t max_pkts __rte_unused,
>  	void *user_params)
>  {
> +	pdump_ts_to_ns(port_id, pkts, nb_pkts);
>  	pdump_copy(pkts, nb_pkts, user_params);
>  	return nb_pkts;
>  }
> --
> 2.17.1


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC] ethdev: add a field for rte_eth_rxq_info
  @ 2020-06-23 14:48  3% ` Stephen Hemminger
  2020-06-23 15:22  0%   ` Andrew Rybchenko
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2020-06-23 14:48 UTC (permalink / raw)
  To: Chengchang Tang; +Cc: dev, linuxarm, thomas, ferruh.yigit, arybchenko

On Tue, 23 Jun 2020 14:48:54 +0800
Chengchang Tang <tangchengchang@huawei.com> wrote:

> In common practice, PMD configure the rx_buf_size according to the data
> room size of the object in mempool. But in fact the final value is related
> to the specifications of hw, and its values will affect the number of
> fragments in receiving pkts.
> 
> At present, we seem to have no way to expose relevant information to upper
> layer users.
> 
> Add a field named rx_bufsize in rte_eth_rxq_info to indicate the buffer
> size used in receiving pkts for hw.
> 
> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
> ---
>  lib/librte_ethdev/rte_ethdev.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index 0f6d053..82b7e98 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1306,6 +1306,7 @@ struct rte_eth_rxq_info {
>  	struct rte_eth_rxconf conf; /**< queue config parameters. */
>  	uint8_t scattered_rx;       /**< scattered packets RX supported. */
>  	uint16_t nb_desc;           /**< configured number of RXDs. */
> +	uint16_t rx_bufsize;        /**< size of RX buffer. */
>  } __rte_cache_min_aligned;
> 
>  /**
> --
> 2.7.4
> 

Will have to wait until 20.11 as it is an ABI change.

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH] doc: mark internal symbols in ethdev
@ 2020-06-23 13:49  9% Ferruh Yigit
  2020-06-26  8:49  0% ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2020-06-23 13:49 UTC (permalink / raw)
  To: Ray Kinsella, Neil Horman, John McNamara, Marko Kovacevic
  Cc: dev, Ferruh Yigit, David Marchand, Thomas Monjalon, Andrew Rybchenko

The APIs are marked in the doxygen comment but better to mark the
symbols too. This is planned for v20.11 release.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 doc/guides/rel_notes/deprecation.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 0bee92425..0b0f75720 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -98,6 +98,12 @@ Deprecation Notices
   Existing ``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status``
   APIs can be used as replacement.
 
+* ethdev: Some internal APIs for driver usage are exported in the .map file.
+  Now DPDK has ``__rte_internal`` marker so we can mark internal APIs and move
+  them to the INTERNAL block in .map. Although these APIs are internal it will
+  break the ABI checks, that is why change is planned for 20.11.
+  The list of internal APIs are mainly ones listed in ``rte_ethdev_driver.h``.
+
 * traffic manager: All traffic manager API's in ``rte_tm.h`` were mistakenly made
   ABI stable in the v19.11 release. The TM maintainer and other contributors have
   agreed to keep the TM APIs as experimental in expectation of additional spec
-- 
2.25.4


^ permalink raw reply	[relevance 9%]

* [dpdk-dev] [PATCH v3 4/9] eal: introduce thread uninit helper
    2020-06-22 13:25  3%   ` [dpdk-dev] [PATCH v3 2/9] eal: fix multiple definition of per lcore thread id David Marchand
@ 2020-06-22 13:25  3%   ` David Marchand
    2 siblings, 0 replies; 200+ results
From: David Marchand @ 2020-06-22 13:25 UTC (permalink / raw)
  To: dev
  Cc: jerinjacobk, bruce.richardson, mdr, ktraynor, ian.stokes,
	i.maximets, Jerin Jacob, Sunil Kumar Kori, Neil Horman,
	Harini Ramakrishnan, Omar Cardona, Pallavi Kadam, Ranjit Menon

This is a preparation step for dynamically unregistering threads.

Since we explicitly allocate a per thread trace buffer in
rte_thread_init, add an internal helper to free this buffer.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
Note: I preferred renaming the current internal function to free all
threads trace buffers (new name trace_mem_free()) and reuse the previous
name (trace_mem_per_thread_free()) when freeing this buffer for a given
thread.

Changes since v2:
- added missing stub for windows tracing support,
- moved free symbol to exported (experimental) ABI as a counterpart of
  the alloc symbol we already had,

Changes since v1:
- rebased on master, removed Windows workaround wrt traces support,

---
 lib/librte_eal/common/eal_common_thread.c |  9 ++++
 lib/librte_eal/common/eal_common_trace.c  | 51 +++++++++++++++++++----
 lib/librte_eal/common/eal_thread.h        |  5 +++
 lib/librte_eal/common/eal_trace.h         |  2 +-
 lib/librte_eal/include/rte_trace_point.h  |  9 ++++
 lib/librte_eal/rte_eal_version.map        |  3 ++
 lib/librte_eal/windows/eal.c              |  5 +++
 7 files changed, 75 insertions(+), 9 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c
index afb30236c5..3b30cc99d9 100644
--- a/lib/librte_eal/common/eal_common_thread.c
+++ b/lib/librte_eal/common/eal_common_thread.c
@@ -20,6 +20,7 @@
 #include "eal_internal_cfg.h"
 #include "eal_private.h"
 #include "eal_thread.h"
+#include "eal_trace.h"
 
 RTE_DEFINE_PER_LCORE(unsigned int, _lcore_id) = LCORE_ID_ANY;
 RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
@@ -161,6 +162,14 @@ rte_thread_init(unsigned int lcore_id, rte_cpuset_t *cpuset)
 	__rte_trace_mem_per_thread_alloc();
 }
 
+void
+rte_thread_uninit(void)
+{
+	__rte_trace_mem_per_thread_free();
+
+	RTE_PER_LCORE(_lcore_id) = LCORE_ID_ANY;
+}
+
 struct rte_thread_ctrl_params {
 	void *(*start_routine)(void *);
 	void *arg;
diff --git a/lib/librte_eal/common/eal_common_trace.c b/lib/librte_eal/common/eal_common_trace.c
index 875553d7e5..3e620d76ed 100644
--- a/lib/librte_eal/common/eal_common_trace.c
+++ b/lib/librte_eal/common/eal_common_trace.c
@@ -101,7 +101,7 @@ eal_trace_fini(void)
 {
 	if (!rte_trace_is_enabled())
 		return;
-	trace_mem_per_thread_free();
+	trace_mem_free();
 	trace_metadata_destroy();
 	eal_trace_args_free();
 }
@@ -370,24 +370,59 @@ __rte_trace_mem_per_thread_alloc(void)
 	rte_spinlock_unlock(&trace->lock);
 }
 
+static void
+trace_mem_per_thread_free_unlocked(struct thread_mem_meta *meta)
+{
+	if (meta->area == TRACE_AREA_HUGEPAGE)
+		eal_free_no_trace(meta->mem);
+	else if (meta->area == TRACE_AREA_HEAP)
+		free(meta->mem);
+}
+
+void
+__rte_trace_mem_per_thread_free(void)
+{
+	struct trace *trace = trace_obj_get();
+	struct __rte_trace_header *header;
+	uint32_t count;
+
+	if (RTE_PER_LCORE(trace_mem) == NULL)
+		return;
+
+	header = RTE_PER_LCORE(trace_mem);
+	rte_spinlock_lock(&trace->lock);
+	for (count = 0; count < trace->nb_trace_mem_list; count++) {
+		if (trace->lcore_meta[count].mem == header)
+			break;
+	}
+	if (count != trace->nb_trace_mem_list) {
+		struct thread_mem_meta *meta = &trace->lcore_meta[count];
+
+		trace_mem_per_thread_free_unlocked(meta);
+		if (count != trace->nb_trace_mem_list - 1) {
+			memmove(meta, meta + 1,
+				sizeof(*meta) *
+				 (trace->nb_trace_mem_list - count - 1));
+		}
+		trace->nb_trace_mem_list--;
+	}
+	rte_spinlock_unlock(&trace->lock);
+}
+
 void
-trace_mem_per_thread_free(void)
+trace_mem_free(void)
 {
 	struct trace *trace = trace_obj_get();
 	uint32_t count;
-	void *mem;
 
 	if (!rte_trace_is_enabled())
 		return;
 
 	rte_spinlock_lock(&trace->lock);
 	for (count = 0; count < trace->nb_trace_mem_list; count++) {
-		mem = trace->lcore_meta[count].mem;
-		if (trace->lcore_meta[count].area == TRACE_AREA_HUGEPAGE)
-			eal_free_no_trace(mem);
-		else if (trace->lcore_meta[count].area == TRACE_AREA_HEAP)
-			free(mem);
+		trace_mem_per_thread_free_unlocked(&trace->lcore_meta[count]);
 	}
+	trace->nb_trace_mem_list = 0;
 	rte_spinlock_unlock(&trace->lock);
 }
 
diff --git a/lib/librte_eal/common/eal_thread.h b/lib/librte_eal/common/eal_thread.h
index da5e7c93ba..4ecd8fd53a 100644
--- a/lib/librte_eal/common/eal_thread.h
+++ b/lib/librte_eal/common/eal_thread.h
@@ -25,6 +25,11 @@ __rte_noreturn void *eal_thread_loop(void *arg);
  */
 void rte_thread_init(unsigned int lcore_id, rte_cpuset_t *cpuset);
 
+/**
+ * Uninitialize per-lcore info for current thread.
+ */
+void rte_thread_uninit(void);
+
 /**
  * Get the NUMA socket id from cpu id.
  * This function is private to EAL.
diff --git a/lib/librte_eal/common/eal_trace.h b/lib/librte_eal/common/eal_trace.h
index 8f60616156..bbb6e1645c 100644
--- a/lib/librte_eal/common/eal_trace.h
+++ b/lib/librte_eal/common/eal_trace.h
@@ -106,7 +106,7 @@ int trace_metadata_create(void);
 void trace_metadata_destroy(void);
 int trace_mkdir(void);
 int trace_epoch_time_save(void);
-void trace_mem_per_thread_free(void);
+void trace_mem_free(void);
 
 /* EAL interface */
 int eal_trace_init(void);
diff --git a/lib/librte_eal/include/rte_trace_point.h b/lib/librte_eal/include/rte_trace_point.h
index 377c2414aa..686b86fdb1 100644
--- a/lib/librte_eal/include/rte_trace_point.h
+++ b/lib/librte_eal/include/rte_trace_point.h
@@ -230,6 +230,15 @@ __rte_trace_point_fp_is_enabled(void)
 __rte_experimental
 void __rte_trace_mem_per_thread_alloc(void);
 
+/**
+ * @internal
+ *
+ * Free trace memory buffer per thread.
+ *
+ */
+__rte_experimental
+void __rte_trace_mem_per_thread_free(void);
+
 /**
  * @internal
  *
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 0d42d44ce9..5831eea4b0 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -393,6 +393,9 @@ EXPERIMENTAL {
 	rte_trace_point_lookup;
 	rte_trace_regexp;
 	rte_trace_save;
+
+	# added in 20.08
+	__rte_trace_mem_per_thread_free;
 };
 
 INTERNAL {
diff --git a/lib/librte_eal/windows/eal.c b/lib/librte_eal/windows/eal.c
index 23de12ab43..09cc1aef63 100644
--- a/lib/librte_eal/windows/eal.c
+++ b/lib/librte_eal/windows/eal.c
@@ -255,6 +255,11 @@ __rte_trace_mem_per_thread_alloc(void)
 {
 }
 
+void
+__rte_trace_mem_per_thread_free(void)
+{
+}
+
 void
 __rte_trace_point_emit_field(size_t sz, const char *field,
 	const char *type)
-- 
2.23.0


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v3 2/9] eal: fix multiple definition of per lcore thread id
  @ 2020-06-22 13:25  3%   ` David Marchand
  2020-06-22 13:25  3%   ` [dpdk-dev] [PATCH v3 4/9] eal: introduce thread uninit helper David Marchand
    2 siblings, 0 replies; 200+ results
From: David Marchand @ 2020-06-22 13:25 UTC (permalink / raw)
  To: dev
  Cc: jerinjacobk, bruce.richardson, mdr, ktraynor, ian.stokes,
	i.maximets, Neil Horman, Cunming Liang, Konstantin Ananyev,
	Olivier Matz

Because of the inline accessor + static declaration in rte_gettid(),
we end up with multiple symbols for RTE_PER_LCORE(_thread_id).
Each compilation unit will pay a cost when accessing this information
for the first time.

$ nm build/app/dpdk-testpmd | grep per_lcore__thread_id
0000000000000054 d per_lcore__thread_id.5037
0000000000000040 d per_lcore__thread_id.5103
0000000000000048 d per_lcore__thread_id.5259
000000000000004c d per_lcore__thread_id.5259
0000000000000044 d per_lcore__thread_id.5933
0000000000000058 d per_lcore__thread_id.6261
0000000000000050 d per_lcore__thread_id.7378
000000000000005c d per_lcore__thread_id.7496
000000000000000c d per_lcore__thread_id.8016
0000000000000010 d per_lcore__thread_id.8431

Make it global as part of the DPDK_21 stable ABI.

Fixes: ef76436c6834 ("eal: get unique thread id")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
---
 lib/librte_eal/common/eal_common_thread.c | 1 +
 lib/librte_eal/include/rte_eal.h          | 3 ++-
 lib/librte_eal/rte_eal_version.map        | 7 +++++++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c
index a5f67d811c..280c64bb76 100644
--- a/lib/librte_eal/common/eal_common_thread.c
+++ b/lib/librte_eal/common/eal_common_thread.c
@@ -22,6 +22,7 @@
 #include "eal_thread.h"
 
 RTE_DEFINE_PER_LCORE(unsigned int, _lcore_id) = LCORE_ID_ANY;
+RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
 static RTE_DEFINE_PER_LCORE(unsigned int, _socket_id) =
 	(unsigned int)SOCKET_ID_ANY;
 static RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);
diff --git a/lib/librte_eal/include/rte_eal.h b/lib/librte_eal/include/rte_eal.h
index 2f9ed298de..2edf8c6556 100644
--- a/lib/librte_eal/include/rte_eal.h
+++ b/lib/librte_eal/include/rte_eal.h
@@ -447,6 +447,8 @@ enum rte_intr_mode rte_eal_vfio_intr_mode(void);
  */
 int rte_sys_gettid(void);
 
+RTE_DECLARE_PER_LCORE(int, _thread_id);
+
 /**
  * Get system unique thread id.
  *
@@ -456,7 +458,6 @@ int rte_sys_gettid(void);
  */
 static inline int rte_gettid(void)
 {
-	static RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
 	if (RTE_PER_LCORE(_thread_id) == -1)
 		RTE_PER_LCORE(_thread_id) = rte_sys_gettid();
 	return RTE_PER_LCORE(_thread_id);
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 196eef5afa..0d42d44ce9 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -221,6 +221,13 @@ DPDK_20.0 {
 	local: *;
 };
 
+DPDK_21 {
+	global:
+
+	per_lcore__thread_id;
+
+} DPDK_20.0;
+
 EXPERIMENTAL {
 	global:
 
-- 
2.23.0


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v4 0/4] Introduce IF proxy library
  @ 2020-06-22  9:21  2% ` Andrzej Ostruszka
  0 siblings, 0 replies; 200+ results
From: Andrzej Ostruszka @ 2020-06-22  9:21 UTC (permalink / raw)
  To: dev

All

Please find in this patch set an updated version of the IF Proxy library.

This is just a rebase of version 3 on top of master and, as announced in
the Marvell roadmap, is meant to be merged into 20.08.

As previously, note that this version does not change the notification
scheme yet, since the discussion about a general DPDK messaging/notification
scheme has not started.  Once this framework crystallizes we are willing
to rebase on top of it and, depending on the outcome of that rebase, we
might incorporate some performance improvements into the example
application or merge the example with the regular l3fwd application.

Changes in V4
=============
- Rebased on master

Changes in V3
=============
- Changed callback registration scheme to make the ABI more robust
- Added new platform callback to provide mask with events available
- All library data access is guarded with a lock
- When port is unbound and proxy has no more ports then it is
  automatically released

Changes in V2
=============
- Cleaned up checkpatch warnings
- Removed dead/unused code and added gateway clearing in l3fwd-ifpx

What is this useful for
=======================

Usually, when an ethernet port is assigned to DPDK it vanishes from the
system and the user loses the ability to control it via normal configuration
utilities (e.g. those from the iproute2 package).  Moreover, by default a DPDK
application is not aware of the network configuration of the system.

To address both of these issues the application needs to:
- add some command line interface (or other mechanism) allowing for
  control of the port and its configuration
- query the status of network configuration and monitor its changes

The purpose of this library is to help with both of these tasks (as long
as they remain in the domain of configuration available to the system).  In
other words, if a DPDK application has some special needs that cannot be
addressed by the normal system configuration utilities, then they need
to be solved by the application itself.

The connection between DPDK and the system is based on the existence of
ports that are visible to both DPDK and the system (like Tap, KNI and
possibly some other drivers).  These ports serve as interface
proxies.

Let's visualize the action of the library by the following example:

              Linux             |            DPDK
==============================================================
                                |
                                |   +-------+       +-------+
                                |   | Port1 |       | Port2 |
"ip link set dev tap1 mtu 1600" |   +-------+       +-------+
                          |     |       ^              ^ ^
                          |  +------+   | mtu_change   | |
                          `->| Tap1 |---' callback     | |
                             +------+                  | |
"ip addr add 198.51.100.14 \    |                      | |
                  dev tap2"     |                      | |
                          |  +------+                  | |
                          +->| Tap2 |------------------' |
                          |  +------+  addr_add callback |
"ip route add 198.0.2.0/24 \    |  |                     |
                  dev tap2"     |  | route_add callback  |
                                |  `---------------------'

So we have two ports, Port1 and Port2, that are not visible to the system.
We create two proxy interfaces (here based on the Tap driver) and bind the
ports to their proxies.  When the user issues a command changing the MTU of
the Tap1 interface the library notes this and calls the "mtu_change" callback
for Port1.  Similarly, when the user adds an IPv4 address to the Tap2
interface the "addr_add" callback is called for Port2, and the same
happens for the configuration of a routing rule pointing to Tap2.  Apart from
callbacks, this library can notify about changes by adding events to
notification queues.  See below for more information about that and
a complete list of available callbacks.

Please note that nothing has been said about forwarding of the
packets between the system and DPDK.  Since the proxies are normal DPDK
ports you can receive from/send to them via the usual RX/TX burst API.
However, since the library is not aware of the packet processing structure
used by the application, it cannot automatically forward the packets - it
is the responsibility of the application to include the proxy ports in its
packet processing engine (a rough sketch of such glue follows below).
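For illustration only, the glue could look more or less like this (it is not
part of the library; queue ids, batching and error handling are simplified):

#include <rte_common.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* packets the kernel has written to the proxy go out on the real port */
static void
proxy_poll(uint16_t port_id, uint16_t proxy_id)
{
	struct rte_mbuf *pkts[32];
	uint16_t nb, sent;

	nb = rte_eth_rx_burst(proxy_id, 0, pkts, RTE_DIM(pkts));
	sent = rte_eth_tx_burst(port_id, 0, pkts, nb);
	while (sent < nb)
		rte_pktmbuf_free(pkts[sent++]);
}

/* and in the RX path of 'port_id': punt what the application does not
 * want to handle itself (e.g. ARP) to the kernel via the proxy */
static void
punt_to_kernel(uint16_t proxy_id, struct rte_mbuf *m)
{
	if (rte_eth_tx_burst(proxy_id, 0, &m, 1) != 1)
		rte_pktmbuf_free(m);
}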

As mentioned above the intention of the library is to:
- provide information about network configuration that would allow
  application to decide what to do with the packets received on DPDK
  ports,
- allow for control of the ports via standard configuration utilities

Although the library only helps you to identify the proxy for a given port
(and vice versa) and calls the appropriate callbacks, it does open some
interesting possibilities.  For example you can use the proxy ports to
forward packets for protocols that you do not wish to handle in DPDK
application to the system protocol stack and just listen to the
configuration changes - so that way you can "offload" handling of those
protocols to the system.

How to use it
=============

Usage of this library is rather simple.  You have to:
1. Create a proxy (if you don't have a port suitable for being a proxy or you
  have one but do not wish to use it as a proxy).
2. Bind port to proxy.
3. Register callbacks and/or event queues.
4. Start listening to the network configuration.

The only mandatory requirement for a DPDK port to be able to act as
a proxy is that it is visible in the system - this is checked during
port-to-proxy binding by calling rte_eth_dev_info_get() on the proxy port
and inspecting the 'if_index' field (it has to be non-zero).
One can create such a port in the application by calling:

  proxy_id = rte_ifpx_create(RTE_IFPX_DEFAULT);

Upon success this returns the id of the created DPDK proxy port
(RTE_MAX_ETHPORTS on failure).  The argument selects the type of proxy port
to create (currently Tap/KNI only).  This function actually is just
a wrapper around:

  uint16_t rte_ifpx_create_by_devarg(const char *devarg);

creating a valid 'devarg' string for the chosen type of proxy.  If you have
another driver capable of acting as a proxy you can call
rte_ifpx_create_by_devarg() directly, passing an appropriate argument.

Once you have the ids of both port and proxy you can bind the two via:

  rte_ifpx_port_bind(port_id, proxy_id);

This creates a logical binding - as mentioned above there is no automatic
packet forwarding.  With this binding, whenever the user changes the state of
the proxy interface in the system (link up/down, change of mac/mtu, add/remove
of IPv4/IPv6 addresses) you get an appropriate notification for the bound port.

So far we've mentioned several times that the library calls callbacks.
They are grouped in 'struct rte_ifpx_callbacks' and the user provides them
to the library via:

  rte_ifpx_callbacks_register(len, cbs);

It is worth mentioning that the context (lcore/thread) in which these
callbacks are called is implementation defined.  It might differ between
different platforms, so the application needs to assume that some kind
of inter lcore/thread synchronization/communication is required.

Apart from notification via callbacks, this library also supports
notifying about the changes by adding events to the configured
notification queues.  The queues are registered via:

  int rte_ifpx_queue_add(struct rte_ring *r);

and the actual logic used is: if there is a callback registered then it
is called; if it returns non-zero then the event is considered
completed, otherwise the event is added to each configured notification
queue.  That way the application can update data structures that are
safe to be modified by a single writer from within the callback, or do
the common preprocessing steps (if any are needed) in the callback
while replicated data is updated during handling of the queued events.
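
A rough sketch of the queued path, assuming the application drains the
events in its own control loop (the concrete event type carried on the
ring is not described in this cover letter, so it is kept as an opaque
pointer here and the ownership/release rules have to be taken from
rte_if_proxy.h):

  struct rte_ring *r = rte_ring_create("ifpx_events", 1024,
                                       SOCKET_ID_ANY,
                                       RING_F_SP_ENQ | RING_F_SC_DEQ);
  if (r == NULL)
      rte_exit(EXIT_FAILURE, "Cannot create notification ring\n");
  rte_ifpx_queue_add(r);

  /* Later, in the application's control loop: */
  void *ev;
  while (rte_ring_dequeue(r, &ev) == 0) {
      /* Cast 'ev' to the library's event type and dispatch it,
       * then release it as required by the library. */
  }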

Once we have the bindings in place and notifications configured, the
only essential part that remains is to get the current network
configuration and start listening for its changes.  This is
accomplished via a call to:

  rte_ifpx_listen();
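
Putting the above together, a minimal start-up sequence might look like
the sketch below (error handling trimmed, 'cbs' as registered above;
apart from RTE_MAX_ETHPORTS on create failure the return value
conventions are assumptions):

  uint16_t proxy_id = rte_ifpx_create(RTE_IFPX_DEFAULT);

  if (proxy_id == RTE_MAX_ETHPORTS)
      rte_exit(EXIT_FAILURE, "Cannot create proxy port\n");

  if (rte_ifpx_port_bind(port_id, proxy_id) != 0)
      rte_exit(EXIT_FAILURE, "Cannot bind port %u to proxy %u\n",
               port_id, proxy_id);

  rte_ifpx_callbacks_register(sizeof(cbs), &cbs);

  /* Query the current system configuration and start listening
   * for changes. */
  rte_ifpx_listen();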

And basically this is all one needs to understand how to use this
library.  Other less essential parts include:
- ability to query what events are available for given platform
- getting mapping between proxy and port
- unbinding the ports from proxy
- destroying proxy port
- closing the listening service
- getting basic information about proxy


Currently available features and implementation
===============================================

The library's API is system independent but it obviously needs some
system-dependent parts.  We provide an exemplary Linux implementation
(based on netlink sockets).  A very similar implementation is possible
for FreeBSD (using PF_ROUTE sockets).  A Windows implementation would
need to differ considerably (probably the IP Helper library would be of
some help).

Here is the list of currently implemented callbacks:

  int (*mac_change)(const struct rte_ifpx_mac_change *event);
  int (*mtu_change)(const struct rte_ifpx_mtu_change *event);
  int (*link_change)(const struct rte_ifpx_link_change *event);
  int (*addr_add)(const struct rte_ifpx_addr_change *event);
  int (*addr_del)(const struct rte_ifpx_addr_change *event);
  int (*addr6_add)(const struct rte_ifpx_addr6_change *event);
  int (*addr6_del)(const struct rte_ifpx_addr6_change *event);
  int (*route_add)(const struct rte_ifpx_route_change *event);
  int (*route_del)(const struct rte_ifpx_route_change *event);
  int (*route6_add)(const struct rte_ifpx_route6_change *event);
  int (*route6_del)(const struct rte_ifpx_route6_change *event);
  int (*neigh_add)(const struct rte_ifpx_neigh_change *event);
  int (*neigh_del)(const struct rte_ifpx_neigh_change *event);
  int (*neigh6_add)(const struct rte_ifpx_neigh6_change *event);
  int (*neigh6_del)(const struct rte_ifpx_neigh6_change *event);
  int (*cfg_done)(void);

They are all rather self-descriptive with the exception of the last
one.  When the user calls rte_ifpx_listen() the library first queries
the system for its current configuration.  That might require several
request/reply exchanges between DPDK and the system, and once this is
finished the cfg_done callback is called to let the application know
that all the info has been gathered.

It is also worth mentioning that while the typical case would be
a 1-to-1 mapping between port and proxy, a 1-to-many mapping is also
supported.  In that case the related callbacks will be called for each
port bound to a given proxy interface - it is the application's
responsibility to define the semantics of such a mapping (e.g. all
changes apply to all ports, or link changes apply to all ports while
others are accepted in "round robin" fashion, or some other logic).

As mentioned above, the Linux implementation is based on a netlink
socket.  This socket is registered as a file descriptor in EAL
interrupts (similarly to how EAL alarms are implemented).

With regards
Andrzej Ostruszka


Andrzej Ostruszka (4):
  lib: introduce IF Proxy library
  if_proxy: add library documentation
  if_proxy: add simple functionality test
  if_proxy: add example application

 MAINTAINERS                                  |    6 +
 app/test/Makefile                            |    5 +
 app/test/meson.build                         |    4 +
 app/test/test_if_proxy.c                     |  707 +++++++++++
 config/common_base                           |    5 +
 config/common_linux                          |    1 +
 doc/guides/prog_guide/if_proxy_lib.rst       |  142 +++
 doc/guides/prog_guide/index.rst              |    1 +
 examples/Makefile                            |    1 +
 examples/l3fwd-ifpx/Makefile                 |   60 +
 examples/l3fwd-ifpx/l3fwd.c                  | 1131 ++++++++++++++++++
 examples/l3fwd-ifpx/l3fwd.h                  |   98 ++
 examples/l3fwd-ifpx/main.c                   |  740 ++++++++++++
 examples/l3fwd-ifpx/meson.build              |   11 +
 examples/meson.build                         |    4 +-
 lib/Makefile                                 |    3 +
 lib/librte_eal/include/rte_eal_interrupts.h  |    2 +
 lib/librte_eal/linux/eal_interrupts.c        |   14 +-
 lib/librte_if_proxy/Makefile                 |   29 +
 lib/librte_if_proxy/if_proxy_common.c        |  494 ++++++++
 lib/librte_if_proxy/if_proxy_priv.h          |   97 ++
 lib/librte_if_proxy/linux/Makefile           |    4 +
 lib/librte_if_proxy/linux/if_proxy.c         |  550 +++++++++
 lib/librte_if_proxy/meson.build              |   19 +
 lib/librte_if_proxy/rte_if_proxy.h           |  561 +++++++++
 lib/librte_if_proxy/rte_if_proxy_version.map |   19 +
 lib/meson.build                              |    2 +-
 27 files changed, 4703 insertions(+), 7 deletions(-)
 create mode 100644 app/test/test_if_proxy.c
 create mode 100644 doc/guides/prog_guide/if_proxy_lib.rst
 create mode 100644 examples/l3fwd-ifpx/Makefile
 create mode 100644 examples/l3fwd-ifpx/l3fwd.c
 create mode 100644 examples/l3fwd-ifpx/l3fwd.h
 create mode 100644 examples/l3fwd-ifpx/main.c
 create mode 100644 examples/l3fwd-ifpx/meson.build
 create mode 100644 lib/librte_if_proxy/Makefile
 create mode 100644 lib/librte_if_proxy/if_proxy_common.c
 create mode 100644 lib/librte_if_proxy/if_proxy_priv.h
 create mode 100644 lib/librte_if_proxy/linux/Makefile
 create mode 100644 lib/librte_if_proxy/linux/if_proxy.c
 create mode 100644 lib/librte_if_proxy/meson.build
 create mode 100644 lib/librte_if_proxy/rte_if_proxy.h
 create mode 100644 lib/librte_if_proxy/rte_if_proxy_version.map

-- 
2.17.1


^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v1 1/4] vhost: support host notifier queue configuration
  @ 2020-06-22  8:06  0%           ` Maxime Coquelin
  0 siblings, 0 replies; 200+ results
From: Maxime Coquelin @ 2020-06-22  8:06 UTC (permalink / raw)
  To: Matan Azrad, Xiao Wang; +Cc: dev



On 6/21/20 8:26 AM, Matan Azrad wrote:
> Hi Maxime
> 
> From: Maxime Coquelin:
>> On 6/19/20 3:28 PM, Matan Azrad wrote:
>>>
>>>
>>> From: Maxime Coquelin:
>>>> On 6/18/20 6:28 PM, Matan Azrad wrote:
>>>>> As an arrangement to per queue operations in the vDPA device it is
>>>>> needed to change the next experimental API:
>>>>>
>>>>> The API ``rte_vhost_host_notifier_ctrl`` was changed to be per queue
>>>>> instead of per device.
>>>>>
>>>>> A `qid` parameter was added to the API arguments list.
>>>>>
>>>>> Setting the parameter to the value VHOST_QUEUE_ALL will configure
>>>>> the host notifier to all the device queues as done before this patch.
>>>>>
>>>>> Signed-off-by: Matan Azrad <matan@mellanox.com>
>>>>> ---
>>>>>  doc/guides/rel_notes/release_20_08.rst |  2 ++
>>>>>  drivers/vdpa/ifc/ifcvf_vdpa.c          |  6 +++---
>>>>>  drivers/vdpa/mlx5/mlx5_vdpa.c          |  5 +++--
>>>>>  lib/librte_vhost/rte_vdpa.h            |  8 ++++++--
>>>>>  lib/librte_vhost/rte_vhost.h           |  2 ++
>>>>>  lib/librte_vhost/vhost.h               |  3 ---
>>>>>  lib/librte_vhost/vhost_user.c          | 18 ++++++++++++++----
>>>>>  7 files changed, 30 insertions(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/doc/guides/rel_notes/release_20_08.rst
>>>>> b/doc/guides/rel_notes/release_20_08.rst
>>>>> index ba16d3b..9732959 100644
>>>>> --- a/doc/guides/rel_notes/release_20_08.rst
>>>>> +++ b/doc/guides/rel_notes/release_20_08.rst
>>>>> @@ -111,6 +111,8 @@ API Changes
>>>>>     Also, make sure to start the actual text at the margin.
>>>>>
>>>>
>> =========================================================
>>>>>
>>>>> +* vhost: The API of ``rte_vhost_host_notifier_ctrl`` was changed to
>>>>> +be per
>>>>> +  queue and not per device, a qid parameter was added to the
>>>>> +arguments
>>>> list.
>>>>>
>>>>>  ABI Changes
>>>>>  -----------
>>>>> diff --git a/drivers/vdpa/ifc/ifcvf_vdpa.c
>>>>> b/drivers/vdpa/ifc/ifcvf_vdpa.c index ec97178..336837a 100644
>>>>> --- a/drivers/vdpa/ifc/ifcvf_vdpa.c
>>>>> +++ b/drivers/vdpa/ifc/ifcvf_vdpa.c
>>>>> @@ -839,7 +839,7 @@ struct internal_list {
>>>>>  	vdpa_ifcvf_stop(internal);
>>>>>  	vdpa_disable_vfio_intr(internal);
>>>>>
>>>>> -	ret = rte_vhost_host_notifier_ctrl(vid, false);
>>>>> +	ret = rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, false);
>>>>>  	if (ret && ret != -ENOTSUP)
>>>>>  		goto error;
>>>>>
>>>>> @@ -858,7 +858,7 @@ struct internal_list {
>>>>>  	if (ret)
>>>>>  		goto stop_vf;
>>>>>
>>>>> -	rte_vhost_host_notifier_ctrl(vid, true);
>>>>> +	rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, true);
>>>>>
>>>>>  	internal->sw_fallback_running = true;
>>>>>
>>>>> @@ -893,7 +893,7 @@ struct internal_list {
>>>>>  	rte_atomic32_set(&internal->dev_attached, 1);
>>>>>  	update_datapath(internal);
>>>>>
>>>>> -	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
>>>>> +	if (rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, true) != 0)
>>>>>  		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
>>>>>
>>>>>  	return 0;
>>>>> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c
>>>>> b/drivers/vdpa/mlx5/mlx5_vdpa.c index 9e758b6..8ea1300 100644
>>>>> --- a/drivers/vdpa/mlx5/mlx5_vdpa.c
>>>>> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
>>>>> @@ -147,7 +147,8 @@
>>>>>  	int ret;
>>>>>
>>>>>  	if (priv->direct_notifier) {
>>>>> -		ret = rte_vhost_host_notifier_ctrl(priv->vid, false);
>>>>> +		ret = rte_vhost_host_notifier_ctrl(priv->vid,
>>>> VHOST_QUEUE_ALL,
>>>>> +						   false);
>>>>>  		if (ret != 0) {
>>>>>  			DRV_LOG(INFO, "Direct HW notifier FD cannot be "
>>>>>  				"destroyed for device %d: %d.", priv->vid,
>>>> ret); @@ -155,7 +156,7
>>>>> @@
>>>>>  		}
>>>>>  		priv->direct_notifier = 0;
>>>>>  	}
>>>>> -	ret = rte_vhost_host_notifier_ctrl(priv->vid, true);
>>>>> +	ret = rte_vhost_host_notifier_ctrl(priv->vid, VHOST_QUEUE_ALL,
>>>>> +true);
>>>>>  	if (ret != 0)
>>>>>  		DRV_LOG(INFO, "Direct HW notifier FD cannot be configured
>>>> for"
>>>>>  			" device %d: %d.", priv->vid, ret); diff --git
>>>>> a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h index
>>>>> ecb3d91..2db536c 100644
>>>>> --- a/lib/librte_vhost/rte_vdpa.h
>>>>> +++ b/lib/librte_vhost/rte_vdpa.h
>>>>> @@ -202,22 +202,26 @@ struct rte_vdpa_device *  int
>>>>> rte_vdpa_get_device_num(void);
>>>>>
>>>>> +#define VHOST_QUEUE_ALL VHOST_MAX_VRING
>>>>> +
>>>>>  /**
>>>>>   * @warning
>>>>>   * @b EXPERIMENTAL: this API may change without prior notice
>>>>>   *
>>>>> - * Enable/Disable host notifier mapping for a vdpa port.
>>>>> + * Enable/Disable host notifier mapping for a vdpa queue.
>>>>>   *
>>>>>   * @param vid
>>>>>   *  vhost device id
>>>>>   * @param enable
>>>>>   *  true for host notifier map, false for host notifier unmap
>>>>> + * @param qid
>>>>> + *  vhost queue id, VHOST_QUEUE_ALL to configure all the device
>>>>> + queues
>>>> I would prefer two APIs that passing a special ID that means all queues:
>>>>
>>>> rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable);
>>>> rte_vhost_host_notifier_ctrl_all(int vid, bool enable);
>>>>
>>>> I think it is clearer for the user of the API.
>>>> Or if you think an extra API is overkill, just let the driver loop on
>>>> all the queues.
>>>
>>> We have a lot of options here with pros and cons.
>>> I took the rte_eth_dev_callback_register style.
>>
>> Ok, I didn't looked at this code.
>>
>>> It is less intrusive with minimum code change.
>>>
>>> I'm not sure what is the clearest option but the current suggestion is
>>> well defined and allows to configure all the queues too.
>>>
>>> Let me know what you prefer....
>>
>> I personally don't like the style, but I can live with it if you prefer doing it like
>> that.
>>
>> If you do it that way, you will have to rename VHOST_QUEUE_ALL to
>> RTE_VHOST_QUEUE_ALL, VHOST_MAX_VRING  to RTE_VHOST_MAX_VRING
>> and VHOST_MAX_QUEUE_PAIRS to RTE_VHOST_MAX_QUEUE_PAIRS as it
>> will become part of the ABI.
>>
>> Note that it also means that we won't be able to increase the maximum
>> number of rings without breaking the ABI.
> 
> What's about defining RTE_VHOST_QUEUE_ALL as UINT16_MAX?

I am not a fan, but it is better than basing it on VHOST_MAX_QUEUE_PAIRS.

>>>>>   * @return
>>>>>   *  0 on success, -1 on failure
>>>>>   */
>>>>>  __rte_experimental
>>>>>  int
>>>>> -rte_vhost_host_notifier_ctrl(int vid, bool enable);
>>>>> +rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable);
>>>>>
>>>>>  /**
>>>
> 


^ permalink raw reply	[relevance 0%]

-- links below jump to the message on this page --
2018-01-15 16:16     [dpdk-dev] [PATCH v6] sched: make RED scaling configurable alangordondewar
2019-04-08  8:53     ` [dpdk-dev] [PATCH v7] " Thomas Monjalon
2019-04-08 13:29       ` Dumitrescu, Cristian
2020-07-06 23:09  3%     ` Thomas Monjalon
2019-09-06  9:45     [dpdk-dev] [PATCH v2 0/6] RCU integration with LPM library Ruifeng Wang
2020-06-29  8:02  3% ` [dpdk-dev] [PATCH v5 0/3] " Ruifeng Wang
2020-06-29  8:02       ` [dpdk-dev] [PATCH v5 1/3] lib/lpm: integrate RCU QSBR Ruifeng Wang
2020-06-29 11:56         ` David Marchand
2020-06-29 12:55           ` Bruce Richardson
2020-06-30 10:35  3%         ` Kinsella, Ray
2020-07-04 17:00  3%       ` Ruifeng Wang
2020-07-07 14:40  3% ` [dpdk-dev] [PATCH v6 0/3] RCU integration with LPM library Ruifeng Wang
2020-07-07 15:15  3% ` [dpdk-dev] [PATCH v7 " Ruifeng Wang
2020-07-07 15:15       ` [dpdk-dev] [PATCH v7 1/3] lib/lpm: integrate RCU QSBR Ruifeng Wang
2020-07-08 14:30  2%     ` David Marchand
2020-07-08 15:34  5%       ` Ruifeng Wang
2020-07-09  8:02  4% ` [dpdk-dev] [PATCH v8 0/3] RCU integration with LPM library Ruifeng Wang
2020-07-09  8:02  2%   ` [dpdk-dev] [PATCH v8 1/3] lib/lpm: integrate RCU QSBR Ruifeng Wang
2020-07-09 15:42  4% ` [dpdk-dev] [PATCH v9 0/3] RCU integration with LPM library Ruifeng Wang
2020-07-09 15:42  2%   ` [dpdk-dev] [PATCH v9 1/3] lib/lpm: integrate RCU QSBR Ruifeng Wang
2020-07-10  2:22  4% ` [dpdk-dev] [PATCH v10 0/3] RCU integration with LPM library Ruifeng Wang
2020-03-05  4:33     [dpdk-dev] [RFC v1 1/1] vfio: set vf token and gain vf device access vattunuru
2020-06-17  6:33     ` [dpdk-dev] [PATCH v16 0/2] support for VFIO-PCI VF token interface Haiyue Wang
2020-06-17  6:33       ` [dpdk-dev] [PATCH v16 2/2] eal: support for VFIO-PCI VF token Haiyue Wang
2020-06-22 20:39         ` Harman Kalra
2020-06-25  7:33           ` David Marchand
2020-06-25 10:49  3%         ` Wang, Haiyue
2020-07-03 14:57  4% ` [dpdk-dev] [PATCH v17 0/2] support for VFIO-PCI VF token interface Haiyue Wang
2020-03-06 16:41     [dpdk-dev] [PATCH 0/4] Introduce IF proxy library Andrzej Ostruszka
2020-06-22  9:21  2% ` [dpdk-dev] [PATCH v4 " Andrzej Ostruszka
2020-03-20 16:41     [dpdk-dev] [RFC] ring: make ring implementation non-inlined Konstantin Ananyev
2020-03-25 21:09     ` Jerin Jacob
2020-03-26  8:04       ` Morten Brørup
2020-03-31 23:25         ` Thomas Monjalon
2020-06-30 23:15  0%       ` Honnappa Nagarahalli
2020-07-01  7:27  0%         ` Morten Brørup
2020-07-01 12:21  0%           ` Ananyev, Konstantin
2020-07-01 14:11  0%             ` Honnappa Nagarahalli
2020-04-15 18:17     [dpdk-dev] [PATCH v3 0/4] add new k32v64 hash table Vladimir Medvedkin
2020-05-08 19:58     ` [dpdk-dev] [PATCH v4 0/4] add new kv " Vladimir Medvedkin
2020-05-08 19:58       ` [dpdk-dev] [PATCH v4 1/4] hash: add kv hash library Vladimir Medvedkin
2020-06-25  4:27  2%     ` Honnappa Nagarahalli
2020-04-21  2:04     [dpdk-dev] [PATCH] devtools: remove useless files from ABI reference Thomas Monjalon
2020-05-24 17:43     ` [dpdk-dev] [PATCH v2] " Thomas Monjalon
2020-05-28 13:16       ` David Marchand
2020-07-03  9:08  4%     ` David Marchand
2020-05-22  6:58     [dpdk-dev] [PATCH 0/3] Experimental/internal libraries cleanup David Marchand
2020-06-25  7:21  5% ` [dpdk-dev] [PATCH v2 " David Marchand
2020-06-25  7:21 25%   ` [dpdk-dev] [PATCH v2 1/3] build: remove special versioning for non stable libraries David Marchand
2020-06-25  9:25  0%     ` David Marchand
2020-06-25  7:21  3%   ` [dpdk-dev] [PATCH v2 2/3] drivers: drop workaround for internal libraries David Marchand
2020-06-25  7:21 17%   ` [dpdk-dev] [PATCH v2 3/3] lib: remind experimental status in library headers David Marchand
2020-06-26  8:16  5% ` [dpdk-dev] [PATCH v3 0/3] Experimental/internal libraries cleanup David Marchand
2020-06-26  8:16 24%   ` [dpdk-dev] [PATCH v3 1/3] build: remove special versioning for non stable libraries David Marchand
2020-06-26  8:38  0%     ` Kinsella, Ray
2020-06-26  8:16  3%   ` [dpdk-dev] [PATCH v3 2/3] drivers: drop workaround for internal libraries David Marchand
2020-06-26  8:16 16%   ` [dpdk-dev] [PATCH v3 3/3] lib: remind experimental status in library headers David Marchand
2020-06-26  9:25  0%   ` [dpdk-dev] [dpdk-techboard] [PATCH v3 0/3] Experimental/internal libraries cleanup Bruce Richardson
2020-07-05 19:55  3%   ` [dpdk-dev] " Thomas Monjalon
2020-07-06  8:02  3%     ` [dpdk-dev] [dpdk-techboard] " Bruce Richardson
2020-07-06  8:12  0%       ` Thomas Monjalon
2020-07-06 16:57  0%     ` [dpdk-dev] " Medvedkin, Vladimir
2020-05-22 13:23     [dpdk-dev] [PATCH 20.08 0/9] adding support for python 3 only Louise Kilheeney
2020-07-02 10:37     ` [dpdk-dev] [PATCH v3 0/9] dding " Louise Kilheeney
2020-07-02 10:37  4%   ` [dpdk-dev] [PATCH v3 8/9] devtools: support python3 only Louise Kilheeney
2020-05-31 14:43     [dpdk-dev] [RFC] ethdev: add fragment attribute to IPv6 item Dekel Peled
2020-06-02 14:32     ` Andrew Rybchenko
2020-06-02 18:28       ` Ori Kam
2020-06-02 19:04         ` Adrien Mazarguil
2020-06-03  8:16           ` Ori Kam
2020-06-03 12:10             ` Dekel Peled
2020-06-18  6:58               ` Dekel Peled
2020-06-28 14:52  0%             ` Dekel Peled
2020-07-05 13:13  0%       ` Andrew Rybchenko
2020-06-04 21:02     [dpdk-dev] [RFC] doc: change to diverse and inclusive language Stephen Hemminger
2020-07-01 19:46     ` [dpdk-dev] [PATCH v3 00/27] Replace references to master and slave Stephen Hemminger
2020-07-01 19:46  1%   ` [dpdk-dev] [PATCH v3 22/27] doc: update references to master/slave lcore in documentation Stephen Hemminger
2020-07-01 20:23     ` [dpdk-dev] [PATCH v4 00/27] Replace references to master and slave Stephen Hemminger
2020-07-01 20:23  1%   ` [dpdk-dev] [PATCH v4 22/27] doc: update references to master/slave lcore in documentation Stephen Hemminger
2020-06-10  6:38     [dpdk-dev] [RFC] mbuf: accurate packet Tx scheduling Viacheslav Ovsiienko
2020-06-10 13:33     ` Harman Kalra
2020-06-10 15:16       ` Slava Ovsiienko
2020-06-17 15:57         ` [dpdk-dev] [EXT] " Harman Kalra
2020-07-01 15:46  0%       ` Slava Ovsiienko
2020-07-01 15:36  2% ` [dpdk-dev] [PATCH 1/2] mbuf: introduce " Viacheslav Ovsiienko
2020-07-07 11:50  0%   ` Olivier Matz
2020-07-07 12:46  0%     ` Slava Ovsiienko
2020-07-07 12:59  2% ` [dpdk-dev] [PATCH v2 " Viacheslav Ovsiienko
2020-07-07 13:08  2% ` [dpdk-dev] [PATCH v3 " Viacheslav Ovsiienko
2020-07-07 14:32  0%   ` Olivier Matz
2020-07-07 14:57  2% ` [dpdk-dev] [PATCH v4 " Viacheslav Ovsiienko
2020-07-07 15:23  0%   ` Olivier Matz
2020-07-08 14:16  0%   ` [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Txscheduling Morten Brørup
2020-07-08 14:54  0%     ` Slava Ovsiienko
2020-07-08 15:27  0%       ` Morten Brørup
2020-07-08 15:51  0%         ` Slava Ovsiienko
2020-07-08 15:47  2% ` [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
2020-07-08 16:05  0%   ` Slava Ovsiienko
2020-07-09 12:36  2% ` [dpdk-dev] [PATCH v6 " Viacheslav Ovsiienko
2020-07-09 23:47  0%   ` Ferruh Yigit
2020-06-10 14:44     [dpdk-dev] [PATCH 0/7] Register external threads as lcore David Marchand
2020-06-22 13:25     ` [dpdk-dev] [PATCH v3 0/9] Register non-EAL " David Marchand
2020-06-22 13:25  3%   ` [dpdk-dev] [PATCH v3 2/9] eal: fix multiple definition of per lcore thread id David Marchand
2020-06-22 13:25  3%   ` [dpdk-dev] [PATCH v3 4/9] eal: introduce thread uninit helper David Marchand
2020-06-22 13:25       ` [dpdk-dev] [PATCH v3 6/9] eal: register non-EAL threads as lcores David Marchand
2020-06-22 15:49         ` Ananyev, Konstantin
2020-06-23  7:49           ` David Marchand
2020-06-23 13:15             ` Ananyev, Konstantin
2020-06-24  9:23               ` David Marchand
2020-06-24 10:39  3%             ` Ananyev, Konstantin
2020-06-24 10:48  0%               ` David Marchand
2020-06-24 11:59  0%                 ` Ananyev, Konstantin
2020-06-26 14:47     ` [dpdk-dev] [PATCH v4 0/9] Register non-EAL threads as lcore David Marchand
2020-06-26 14:47  3%   ` [dpdk-dev] [PATCH v4 2/9] eal: fix multiple definition of per lcore thread id David Marchand
2020-06-30  9:34  0%     ` Olivier Matz
2020-06-26 14:47  3%   ` [dpdk-dev] [PATCH v4 4/9] eal: introduce thread uninit helper David Marchand
2020-06-26 15:00  0%     ` Jerin Jacob
2020-06-29  9:07  0%       ` David Marchand
2020-06-29  8:59  0%     ` [dpdk-dev] [EXT] " Sunil Kumar Kori
2020-06-30  9:42  0%     ` [dpdk-dev] " Olivier Matz
2020-07-06 14:15     ` [dpdk-dev] [PATCH v5 00/10] Register non-EAL threads as lcore David Marchand
2020-07-06 14:15  3%   ` [dpdk-dev] [PATCH v5 02/10] eal: fix multiple definition of per lcore thread id David Marchand
2020-07-06 14:16  3%   ` [dpdk-dev] [PATCH v5 04/10] eal: introduce thread uninit helper David Marchand
2020-07-06 20:52     ` [dpdk-dev] [PATCH v6 00/10] Register non-EAL threads as lcore David Marchand
2020-07-06 20:52  3%   ` [dpdk-dev] [PATCH v6 02/10] eal: fix multiple definition of per lcore thread id David Marchand
2020-07-06 20:52  3%   ` [dpdk-dev] [PATCH v6 04/10] eal: introduce thread uninit helper David Marchand
2020-06-10 17:17     [dpdk-dev] [RFC PATCH 1/6] eal: introduce macros for getting value for bit Parav Pandit
2020-06-21 19:11     ` [dpdk-dev] [PATCH v2 0/6] Improve mlx5 PMD common driver framework for multiple classes Parav Pandit
2020-06-21 19:11       ` [dpdk-dev] [PATCH v2 4/6] bus/mlx5_pci: register a PCI driver Parav Pandit
2020-06-29 15:49  2%     ` Gaëtan Rivet
2020-06-11 10:24     [dpdk-dev] [PATCH 1/2] eal: remove redundant code Phil Yang
2020-06-11 10:24     ` [dpdk-dev] [PATCH 2/2] eal: use c11 atomics for interrupt status Phil Yang
2020-07-08 12:29  3%   ` David Marchand
2020-07-08 13:43  0%     ` Aaron Conole
2020-07-08 15:04  0%     ` Kinsella, Ray
2020-07-09  5:21  0%       ` Phil Yang
2020-07-09  6:46  3% ` [dpdk-dev] [PATCH v2] eal: use c11 atomic built-ins " Phil Yang
2020-07-09  8:02  0%   ` Stefan Puiu
2020-07-09  8:34  2%   ` [dpdk-dev] [PATCH v3] " Phil Yang
2020-07-09 10:30  0%     ` David Marchand
2020-06-11 10:26     [dpdk-dev] [PATCH] mbuf: use c11 atomics for refcnt operations Phil Yang
2020-07-03 15:38  3% ` David Marchand
2020-07-06  8:03  3%   ` Phil Yang
2020-07-07 10:10  3% ` [dpdk-dev] [PATCH v2] mbuf: use C11 " Phil Yang
2020-07-08  5:11  3%   ` Phil Yang
2020-07-08 11:44  0%   ` Olivier Matz
2020-07-09 10:00  3%     ` Phil Yang
2020-07-09 10:10  4%   ` [dpdk-dev] [PATCH v3] mbuf: use C11 atomic built-ins " Phil Yang
2020-07-09 11:03  3%     ` Olivier Matz
2020-07-09 13:00  3%       ` Phil Yang
2020-07-09 13:31  0%         ` Honnappa Nagarahalli
2020-07-09 14:10  0%           ` Phil Yang
2020-07-09 15:58  4%     ` [dpdk-dev] [PATCH v4 1/2] " Phil Yang
2020-06-11 15:16     [dpdk-dev] [RFC][PATCH v2 0/3] pdump HW timestamps for mlx5 Patrick Keroulas
2020-06-11 15:16     ` [dpdk-dev] [RFC][PATCH v2 2/3] ethdev: add API to convert raw timestamps to nsec Patrick Keroulas
2020-06-23 15:04  3%   ` Slava Ovsiienko
2020-06-12 11:19     [dpdk-dev] [PATCH 1/3] eventdev: fix race condition on timer list counter Phil Yang
2020-07-02  5:26     ` [dpdk-dev] [PATCH v2 1/4] " Phil Yang
2020-07-02  5:26       ` [dpdk-dev] [PATCH v2 4/4] eventdev: relax smp barriers with c11 atomics Phil Yang
2020-07-06 10:04  4%     ` Thomas Monjalon
2020-07-06 15:32  0%       ` Phil Yang
2020-07-06 15:40  0%         ` Thomas Monjalon
2020-07-07 11:13       ` [dpdk-dev] [PATCH v3 1/4] eventdev: fix race condition on timer list counter Phil Yang
2020-07-07 11:13  4%     ` [dpdk-dev] [PATCH v3 4/4] eventdev: relax smp barriers with C11 atomics Phil Yang
2020-07-07 14:29  0%       ` Jerin Jacob
2020-07-07 15:56  0%         ` Phil Yang
2020-07-07 15:54         ` [dpdk-dev] [PATCH v4 1/4] eventdev: fix race condition on timer list counter Phil Yang
2020-07-07 15:54  4%       ` [dpdk-dev] [PATCH v4 4/4] eventdev: relax smp barriers with C11 atomics Phil Yang
2020-07-08 13:30  4%       ` [dpdk-dev] [PATCH v4 1/4] eventdev: fix race condition on timer list counter Jerin Jacob
2020-07-08 15:01  0%         ` Thomas Monjalon
2020-06-14 22:57     [dpdk-dev] [PATCH 0/4] add PPC and Windows to meson test Thomas Monjalon
2020-06-15 22:22     ` [dpdk-dev] [PATCH v2 0/4] add PPC and Windows cross-compilation " Thomas Monjalon
2020-06-29 23:15  0%   ` Thomas Monjalon
2020-06-18 16:28     [dpdk-dev] [PATCH v1 0/4] vhost: improve ready state Matan Azrad
2020-06-18 16:28     ` [dpdk-dev] [PATCH v1 1/4] vhost: support host notifier queue configuration Matan Azrad
2020-06-19  6:44       ` Maxime Coquelin
2020-06-19 13:28         ` Matan Azrad
2020-06-19 14:01           ` Maxime Coquelin
2020-06-21  6:26             ` Matan Azrad
2020-06-22  8:06  0%           ` Maxime Coquelin
2020-06-20 21:05     [dpdk-dev] [PATCH 0/7] cmdline: support Windows Dmitry Kozlyuk
2020-06-20 21:05     ` [dpdk-dev] [PATCH 6/7] " Dmitry Kozlyuk
2020-06-28 14:20       ` Fady Bader
2020-06-29  6:23         ` Ranjit Menon
2020-06-29  7:42  3%       ` Dmitry Kozlyuk
2020-06-29  8:12  0%         ` Tal Shnaiderman
2020-06-29 23:56  0%           ` Dmitry Kozlyuk
2020-07-08  1:09  0%             ` Dmitry Kozlyuk
2020-06-22  7:55     [dpdk-dev] [PATCH v8 1/9] eal: move OS common functions to single file talshn
2020-06-24  8:28     ` [dpdk-dev] [PATCH v9 00/10] Windows bus/pci support talshn
2020-06-24  8:28  4%   ` [dpdk-dev] [PATCH v9 10/10] build: generate version.map file for MinGW on Windows talshn
2020-06-29 12:37       ` [dpdk-dev] [PATCH v10 00/10] Windows bus/pci support talshn
2020-06-29 12:37  4%     ` [dpdk-dev] [PATCH v10 10/10] build: generate version.map file for MinGW on Windows talshn
2020-06-23  6:48     [dpdk-dev] [RFC] ethdev: add a field for rte_eth_rxq_info Chengchang Tang
2020-06-23 14:48  3% ` Stephen Hemminger
2020-06-23 15:22  0%   ` Andrew Rybchenko
2020-06-23 13:49  9% [dpdk-dev] [PATCH] doc: mark internal symbols in ethdev Ferruh Yigit
2020-06-26  8:49  0% ` Kinsella, Ray
2020-06-24  9:36  3% [dpdk-dev] [PATCH 20.11] eal: simplify exit functions Thomas Monjalon
2020-06-30 10:26  0% ` Kinsella, Ray
2020-06-25 11:49  3% [dpdk-dev] DPDK Release Status Meeting 25/06/2020 Ferruh Yigit
2020-06-25 13:38     [dpdk-dev] [PATCH v2 0/5] vhost: improve ready state Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 0/6] " Matan Azrad
2020-06-29 14:08  4%   ` [dpdk-dev] [PATCH v3 1/6] vhost: support host notifier queue configuration Matan Azrad
2020-06-26 23:14  3% [dpdk-dev] [20.11, PATCH] bbdev: remove experimental tag from API Nicolas Chautru
2020-06-26 23:14  3% ` Nicolas Chautru
2020-06-30  7:30  4% ` David Marchand
2020-06-30  7:35  3%   ` Akhil Goyal
2020-07-02 17:54  0%     ` Akhil Goyal
2020-07-02 18:02  3%       ` Chautru, Nicolas
2020-07-02 18:09  4%         ` Akhil Goyal
2020-06-27  4:37  2% [dpdk-dev] [PATCH 00/27] event/dlb Intel DLB PMD Tim McDaniel
2020-06-27  4:37     ` [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites Tim McDaniel
2020-06-27  7:44  5%   ` Jerin Jacob
2020-06-29 19:30  4%     ` McDaniel, Timothy
2020-06-30  4:21  0%       ` Jerin Jacob
2020-06-30 15:37  0%         ` McDaniel, Timothy
2020-06-30 15:57  0%           ` Jerin Jacob
2020-06-30 19:26  0%             ` McDaniel, Timothy
2020-06-30 20:40  0%               ` Pavan Nikhilesh Bhagavatula
2020-06-30 21:07  0%                 ` McDaniel, Timothy
2020-07-01  4:50  3%               ` Jerin Jacob
2020-07-01 16:48  0%                 ` McDaniel, Timothy
2020-06-30 11:22  0%     ` Kinsella, Ray
2020-06-30 11:30  0%       ` Jerin Jacob
2020-06-30 11:36  0%         ` Kinsella, Ray
2020-06-30 12:14  0%           ` Jerin Jacob
2020-07-02 15:21  0%             ` Kinsella, Ray
2020-07-02 16:35  3%               ` McDaniel, Timothy
2020-06-27  4:37  1% ` [dpdk-dev] [PATCH 03/27] event/dlb: add shared code version 10.7.9 Tim McDaniel
2020-06-27  4:37  1% ` [dpdk-dev] [PATCH 08/27] event/dlb: add definitions shared with LKM or shared code Tim McDaniel
2020-06-29 22:36  3% [dpdk-dev] [dpdk-announce] DPDK Userspace CFP now open; help celebrate 10 years of DPDK Jill Lovato
2020-07-02  6:19  4% [dpdk-dev] [PATCH (v20.11) 1/2] eventdev: reserve space in config structs for extension pbhagavatula
2020-07-02  6:19  4% ` [dpdk-dev] [PATCH (v20.11) 2/2] eventdev: reserve space in timer " pbhagavatula
2020-07-02 14:58  4% [dpdk-dev] DPDK Release Status Meeting 2/07/2020 Ferruh Yigit
2020-07-03 10:26     [dpdk-dev] [PATCH 0/3] ring clean up Feifei Wang
2020-07-03 10:26     ` [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API Feifei Wang
2020-07-03 16:16  4%   ` Kinsella, Ray
2020-07-03 18:46  3%     ` Honnappa Nagarahalli
2020-07-06  6:23  3%       ` Kinsella, Ray
2020-07-07  3:19  3%         ` Feifei Wang
2020-07-07  7:40  0%           ` Kinsella, Ray
2020-07-03 10:26     ` [dpdk-dev] [PATCH 2/3] ring: remove experimental tag for ring element APIs Feifei Wang
2020-07-03 16:17  3%   ` Kinsella, Ray
2020-07-03 17:15  4% [dpdk-dev] [PATCH] doc: add sample for ABI checks in contribution guide Ferruh Yigit
2020-07-05  3:41     [dpdk-dev] [pull-request] next-eventdev 20.08 RC1 Jerin Jacob Kollanukkaran
2020-07-06  9:57  3% ` Thomas Monjalon
2020-07-05 11:46     [dpdk-dev] [PATCH v5 0/3] build mempool on Windows Fady Bader
2020-07-05 13:47     ` [dpdk-dev] [PATCH v6 " Fady Bader
2020-07-05 13:47       ` [dpdk-dev] [PATCH v6 1/3] eal: disable function versioning " Fady Bader
2020-07-05 20:23  4%     ` Thomas Monjalon
2020-07-06  7:02  0%       ` Fady Bader
2020-07-06 11:32       ` [dpdk-dev] [PATCH v7 0/3] build mempool " Fady Bader
2020-07-06 11:32  5%     ` [dpdk-dev] [PATCH v7 1/3] eal: disable function versioning " Fady Bader
2020-07-06 12:22  0%       ` Bruce Richardson
2020-07-06 23:16  0%         ` Thomas Monjalon
2020-07-07 14:45  8% [dpdk-dev] [PATCH v1 0/2] doc: minor abi policy fixes Ray Kinsella
2020-07-07 14:45 24% ` [dpdk-dev] [PATCH v1 1/2] doc: reword abi policy for windows Ray Kinsella
2020-07-07 15:23  7%   ` Thomas Monjalon
2020-07-07 16:33  4%     ` Kinsella, Ray
2020-07-07 14:45 12% ` [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period Ray Kinsella
2020-07-07 15:26  0%   ` Thomas Monjalon
2020-07-07 16:35  3%     ` Kinsella, Ray
2020-07-07 16:36  0%       ` Thomas Monjalon
2020-07-07 16:37  0%         ` Kinsella, Ray
2020-07-07 16:55  0%           ` Honnappa Nagarahalli
2020-07-07 17:00  0%             ` Thomas Monjalon
2020-07-07 17:01  0%               ` Kinsella, Ray
2020-07-07 16:57  0%           ` Thomas Monjalon
2020-07-07 17:01  4%             ` Kinsella, Ray
2020-07-07 17:08  0%               ` Thomas Monjalon
2020-07-07 17:50  8% ` [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes Ray Kinsella
2020-07-07 17:51 24%   ` [dpdk-dev] [PATCH v2 1/2] doc: reword abi policy for windows Ray Kinsella
2020-07-07 17:51 12%   ` [dpdk-dev] [PATCH v2 2/2] doc: clarify alias to experimental period Ray Kinsella
2020-07-07 18:44  0%     ` Honnappa Nagarahalli
2020-07-08 10:32  7%   ` [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes Thomas Monjalon
2020-07-08 12:02  4%     ` Kinsella, Ray
2020-07-08 10:22 25% [dpdk-dev] [PATCH] devtools: give some hints for ABI errors David Marchand
2020-07-08 13:09  7% ` Kinsella, Ray
2020-07-08 13:15  4%   ` David Marchand
2020-07-08 13:22  4%     ` Kinsella, Ray
2020-07-08 13:45  7%   ` Aaron Conole
2020-07-08 14:01  4%     ` Kinsella, Ray
2020-07-09 15:52  4% ` Dodji Seketeli
     [not found]     <20200703102651.8918-1>
2020-07-09  6:12     ` [dpdk-dev] [PATCH v2 0/3] ring clean up Feifei Wang
2020-07-09  6:12  3%   ` [dpdk-dev] [PATCH v2 1/3] ring: remove experimental tag for ring reset API Feifei Wang
2020-07-09  6:12  3%   ` [dpdk-dev] [PATCH v2 2/3] ring: remove experimental tag for ring element APIs Feifei Wang
2020-07-09  6:53  4% [dpdk-dev] [PATCH] devtools: fix ninja break under default DESTDIR path Phil Yang
2020-07-09 15:20  4% [dpdk-dev] [PATCH 20.11 0/5] Enhance rawdev APIs Bruce Richardson
2020-07-09 15:20  3% ` [dpdk-dev] [PATCH 20.11 1/5] rawdev: add private data length parameter to info fn Bruce Richardson
2020-07-09 15:20  3% ` [dpdk-dev] [PATCH 20.11 3/5] rawdev: add private data length parameter to config fn Bruce Richardson
2020-07-09 15:20  3% ` [dpdk-dev] [PATCH 20.11 4/5] rawdev: add private data length parameter to queue fns Bruce Richardson
