* [dpdk-dev] [PATCH v7] eal: remove sys/queue.h from public headers.
From: William Tu @ 2021-08-18 23:26 UTC (permalink / raw)
  To: dev; +Cc: Dmitry.Kozliuk, nick.connolly

Some public headers currently include 'sys/queue.h', which is not part
of POSIX but is usually provided by the Linux/BSD system library.
(Not in POSIX.1, POSIX.1-2001, or POSIX.1-2008. Present on the BSDs.)
The file is missing on Windows. During the Windows build, DPDK uses a
bundled copy, so building a DPDK library works fine. But when OVS or
another application uses DPDK as a library on Windows, the build fails
with a missing-file error, because some DPDK public headers include
'sys/queue.h'.

One solution is to install 'lib/eal/windows/include/sys/queue.h' into
the Windows environment, as in [1]. However, this means DPDK exports the
functionality of 'sys/queue.h' into the environment, which might cause
symbol, macro, or header clashes with other applications.

The patch fixes this by removing "#include <sys/queue.h>" from
DPDK public headers, so programs including DPDK headers don't depend
on the system to provide 'sys/queue.h'. Where these public headers use
macros such as TAILQ_xxx, we replace them with RTE_-prefixed versions.
For Windows, we copy the definitions from <sys/queue.h> to rte_os.h
in Windows EAL. Note that these RTE_ macros are compatible with
<sys/queue.h>, both at the level of API (to use with <sys/queue.h>
macros in C files) and ABI (to avoid breaking it).
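
To illustrate the compatibility claim, here is a minimal sketch, not the
patch's actual rte_os.h content. The SKETCH_ macros and the sketch_dev
type are hypothetical names invented for this example. The "header" part
declares list types without <sys/queue.h>; the "C file" part then operates
on them with the system's TAILQ_ macros, which works because the member
names and layout match.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h> /* included by the C file only, not by the header part */

/* ---- What a public header could contain (no <sys/queue.h> needed) ---- */
/* Hypothetical macros: same member names and layout as the BSD TAILQ ones. */
#define SKETCH_TAILQ_HEAD(name, type) \
struct name { \
	struct type *tqh_first; /* first element */ \
	struct type **tqh_last; /* address of last "next" pointer */ \
}
#define SKETCH_TAILQ_ENTRY(type) \
struct { \
	struct type *tqe_next;  /* next element */ \
	struct type **tqe_prev; /* address of previous "next" pointer */ \
}

struct sketch_dev {
	SKETCH_TAILQ_ENTRY(sketch_dev) next;
	int id;
};
SKETCH_TAILQ_HEAD(sketch_dev_list, sketch_dev);

/* ---- C file side: <sys/queue.h> macros applied to those types ---- */
/* Link the n devices into a list and return the sum of their ids. */
int
sketch_sum_ids(struct sketch_dev *devs, size_t n)
{
	struct sketch_dev_list list;
	struct sketch_dev *d;
	int sum = 0;
	size_t i;

	assert(devs != NULL || n == 0);
	TAILQ_INIT(&list);
	for (i = 0; i < n; i++)
		TAILQ_INSERT_TAIL(&list, &devs[i], next);
	TAILQ_FOREACH(d, &list, next)
		sum += d->id;
	return sum;
}
```

Because the struct members are identical, the header types also keep the
same size and field offsets as ones declared with TAILQ_HEAD/TAILQ_ENTRY,
which is the ABI side of the argument.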

Additionally, TAILQ_FOREACH_SAFE is not provided by every <sys/queue.h>
implementation (glibc's lacks it), so the patch replaces it with
RTE_TAILQ_FOREACH_SAFE.
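
For readers unfamiliar with the "safe" variant, a minimal sketch follows.
SKETCH_TAILQ_FOREACH_SAFE and sketch_flow are hypothetical names, and the
macro body is one common way to write such an iterator, not necessarily the
definition used in the patch. It saves the successor before the loop body
runs, so the body may remove and free the current element.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical removal-safe iterator: tvar holds the next element. */
#define SKETCH_TAILQ_FOREACH_SAFE(var, head, field, tvar) \
	for ((var) = TAILQ_FIRST(head); \
	     (var) != NULL && ((tvar) = TAILQ_NEXT(var, field), 1); \
	     (var) = (tvar))

struct sketch_flow {
	TAILQ_ENTRY(sketch_flow) next;
	int id;
};
TAILQ_HEAD(sketch_flow_list, sketch_flow);

/* Build a list of up to n flows, then flush it; returns the number freed. */
int
sketch_flush(int n)
{
	struct sketch_flow_list list = TAILQ_HEAD_INITIALIZER(list);
	struct sketch_flow *flow, *tmp;
	int i, freed = 0;

	for (i = 0; i < n; i++) {
		flow = malloc(sizeof(*flow));
		if (flow == NULL)
			break; /* flush whatever was inserted so far */
		flow->id = i;
		TAILQ_INSERT_TAIL(&list, flow, next);
	}

	/*
	 * A plain TAILQ_FOREACH would read flow->next after free(flow);
	 * the safe variant advances through tmp instead.
	 */
	SKETCH_TAILQ_FOREACH_SAFE(flow, &list, next, tmp) {
		TAILQ_REMOVE(&list, flow, next);
		free(flow);
		freed++;
	}

	assert(TAILQ_EMPTY(&list));
	return freed;
}
```

This mirrors the flush-style loops touched throughout the diff, where each
iteration removes the current element from the list.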

[1] http://mails.dpdk.org/archives/dev/2021-August/216304.html

Suggested-by: Nick Connolly <nick.connolly@mayadata.io>
Suggested-by: Dmitry Kozliuk <Dmitry.Kozliuk@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
---
v6-v7:
* remove some redundant "#include <sys/queue.h>"
* remove extra newline, add comment at rte_os.h for windows
  use of bundled sys/queue

v5-v6:
* fix tab/indent issue, fix typo and spelling
* fix duplicate RTE_TAILQ_FOREACH_SAFE
* fix build error due to drivers/net/mlx5/mlx5_flow_meter.c
---
 drivers/bus/auxiliary/private.h            |  1 +
 drivers/bus/auxiliary/rte_bus_auxiliary.h  |  5 ++--
 drivers/bus/dpaa/dpaa_bus.c                |  4 +--
 drivers/bus/fslmc/fslmc_bus.c              |  4 +--
 drivers/bus/fslmc/fslmc_vfio.c             |  9 ++++---
 drivers/bus/ifpga/rte_bus_ifpga.h          |  8 +++---
 drivers/bus/pci/pci_params.c               |  2 ++
 drivers/bus/pci/rte_bus_pci.h              | 13 +++++----
 drivers/bus/pci/windows/pci.c              |  3 +++
 drivers/bus/pci/windows/pci_netuio.c       |  2 ++
 drivers/bus/vdev/rte_bus_vdev.h            |  7 +++--
 drivers/bus/vdev/vdev.c                    |  3 ++-
 drivers/bus/vmbus/rte_bus_vmbus.h          | 13 +++++----
 drivers/net/bnxt/tf_ulp/bnxt_ulp.c         |  2 +-
 drivers/net/bonding/rte_eth_bond_flow.c    |  2 +-
 drivers/net/failsafe/failsafe_flow.c       |  2 +-
 drivers/net/i40e/i40e_ethdev.c             |  9 ++++---
 drivers/net/i40e/i40e_ethdev.h             |  1 +
 drivers/net/i40e/i40e_flow.c               |  6 ++---
 drivers/net/i40e/i40e_hash.c               |  2 +-
 drivers/net/i40e/rte_pmd_i40e.c            |  6 ++---
 drivers/net/iavf/iavf_generic_flow.c       | 14 +++++-----
 drivers/net/ice/ice_dcf_ethdev.c           |  1 +
 drivers/net/ice/ice_ethdev.c               |  4 +--
 drivers/net/ice/ice_generic_flow.c         | 14 +++++-----
 drivers/net/ipn3ke/ipn3ke_flow.c           |  2 +-
 drivers/net/mlx5/mlx5_flow_dv.c            |  2 +-
 drivers/net/mlx5/mlx5_flow_meter.c         |  2 +-
 drivers/net/softnic/rte_eth_softnic_flow.c |  3 ++-
 drivers/net/softnic/rte_eth_softnic_swq.c  |  2 +-
 drivers/raw/dpaa2_qdma/dpaa2_qdma.c        |  2 +-
 lib/bbdev/rte_bbdev.h                      |  2 +-
 lib/cryptodev/rte_cryptodev.h              |  2 +-
 lib/cryptodev/rte_cryptodev_pmd.h          |  2 +-
 lib/eal/common/eal_common_devargs.c        |  4 +--
 lib/eal/common/eal_common_log.c            |  1 +
 lib/eal/common/eal_common_options.c        |  2 +-
 lib/eal/common/eal_private.h               |  1 +
 lib/eal/freebsd/include/rte_os.h           | 15 +++++++++++
 lib/eal/include/rte_bus.h                  |  5 ++--
 lib/eal/include/rte_class.h                |  6 ++---
 lib/eal/include/rte_dev.h                  |  5 ++--
 lib/eal/include/rte_devargs.h              |  3 +--
 lib/eal/include/rte_log.h                  |  1 -
 lib/eal/include/rte_service.h              |  1 -
 lib/eal/include/rte_tailq.h                | 15 +++--------
 lib/eal/linux/include/rte_os.h             | 15 +++++++++++
 lib/eal/windows/eal_alarm.c                |  1 +
 lib/eal/windows/include/rte_os.h           | 31 ++++++++++++++++++++++
 lib/efd/rte_efd.c                          |  2 +-
 lib/ethdev/rte_ethdev_core.h               |  2 +-
 lib/hash/rte_fbk_hash.h                    |  1 -
 lib/hash/rte_thash.c                       |  2 ++
 lib/ip_frag/rte_ip_frag.h                  |  4 +--
 lib/mempool/rte_mempool.c                  |  2 +-
 lib/mempool/rte_mempool.h                  |  9 +++----
 lib/pci/rte_pci.h                          |  1 -
 lib/ring/rte_ring_core.h                   |  1 -
 lib/table/rte_swx_table.h                  |  7 ++---
 lib/table/rte_swx_table_selector.h         |  5 ++--
 lib/vhost/iotlb.c                          | 11 ++++----
 lib/vhost/rte_vdpa_dev.h                   |  2 +-
 lib/vhost/vdpa.c                           |  2 +-
 63 files changed, 186 insertions(+), 127 deletions(-)

diff --git a/drivers/bus/auxiliary/private.h b/drivers/bus/auxiliary/private.h
index 9987e8b501..d22e83cf7a 100644
--- a/drivers/bus/auxiliary/private.h
+++ b/drivers/bus/auxiliary/private.h
@@ -7,6 +7,7 @@
 
 #include <stdbool.h>
 #include <stdio.h>
+#include <sys/queue.h>
 
 #include "rte_bus_auxiliary.h"
 
diff --git a/drivers/bus/auxiliary/rte_bus_auxiliary.h b/drivers/bus/auxiliary/rte_bus_auxiliary.h
index 2462bad2ba..b1f5610404 100644
--- a/drivers/bus/auxiliary/rte_bus_auxiliary.h
+++ b/drivers/bus/auxiliary/rte_bus_auxiliary.h
@@ -19,7 +19,6 @@ extern "C" {
 #include <stdlib.h>
 #include <limits.h>
 #include <errno.h>
-#include <sys/queue.h>
 #include <stdint.h>
 #include <inttypes.h>
 
@@ -113,7 +112,7 @@ typedef int (rte_auxiliary_dma_unmap_t)(struct rte_auxiliary_device *dev,
  * A structure describing an auxiliary device.
  */
 struct rte_auxiliary_device {
-	TAILQ_ENTRY(rte_auxiliary_device) next;   /**< Next probed device. */
+	RTE_TAILQ_ENTRY(rte_auxiliary_device) next; /**< Next probed device. */
 	struct rte_device device;                 /**< Inherit core device */
 	char name[RTE_DEV_NAME_MAX_LEN + 1];      /**< ASCII device name */
 	struct rte_intr_handle intr_handle;       /**< Interrupt handle */
@@ -124,7 +123,7 @@ struct rte_auxiliary_device {
  * A structure describing an auxiliary driver.
  */
 struct rte_auxiliary_driver {
-	TAILQ_ENTRY(rte_auxiliary_driver) next; /**< Next in list. */
+	RTE_TAILQ_ENTRY(rte_auxiliary_driver) next; /**< Next in list. */
 	struct rte_driver driver;             /**< Inherit core driver. */
 	struct rte_auxiliary_bus *bus;        /**< Auxiliary bus reference. */
 	rte_auxiliary_match_t *match;         /**< Device match function. */
diff --git a/drivers/bus/dpaa/dpaa_bus.c b/drivers/bus/dpaa/dpaa_bus.c
index e499305d85..6cab2ae760 100644
--- a/drivers/bus/dpaa/dpaa_bus.c
+++ b/drivers/bus/dpaa/dpaa_bus.c
@@ -105,7 +105,7 @@ dpaa_add_to_device_list(struct rte_dpaa_device *newdev)
 	struct rte_dpaa_device *dev = NULL;
 	struct rte_dpaa_device *tdev = NULL;
 
-	TAILQ_FOREACH_SAFE(dev, &rte_dpaa_bus.device_list, next, tdev) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_dpaa_bus.device_list, next, tdev) {
 		comp = compare_dpaa_devices(newdev, dev);
 		if (comp < 0) {
 			TAILQ_INSERT_BEFORE(dev, newdev, next);
@@ -245,7 +245,7 @@ dpaa_clean_device_list(void)
 	struct rte_dpaa_device *dev = NULL;
 	struct rte_dpaa_device *tdev = NULL;
 
-	TAILQ_FOREACH_SAFE(dev, &rte_dpaa_bus.device_list, next, tdev) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_dpaa_bus.device_list, next, tdev) {
 		TAILQ_REMOVE(&rte_dpaa_bus.device_list, dev, next);
 		free(dev);
 		dev = NULL;
diff --git a/drivers/bus/fslmc/fslmc_bus.c b/drivers/bus/fslmc/fslmc_bus.c
index becc455f6b..8c8f8a298d 100644
--- a/drivers/bus/fslmc/fslmc_bus.c
+++ b/drivers/bus/fslmc/fslmc_bus.c
@@ -45,7 +45,7 @@ cleanup_fslmc_device_list(void)
 	struct rte_dpaa2_device *dev;
 	struct rte_dpaa2_device *t_dev;
 
-	TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, t_dev) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, t_dev) {
 		TAILQ_REMOVE(&rte_fslmc_bus.device_list, dev, next);
 		free(dev);
 		dev = NULL;
@@ -82,7 +82,7 @@ insert_in_device_list(struct rte_dpaa2_device *newdev)
 	struct rte_dpaa2_device *dev = NULL;
 	struct rte_dpaa2_device *tdev = NULL;
 
-	TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, tdev) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, tdev) {
 		comp = compare_dpaa2_devname(newdev, dev);
 		if (comp < 0) {
 			TAILQ_INSERT_BEFORE(dev, newdev, next);
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index c8373e627a..852fcfc4dd 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -808,7 +808,8 @@ fslmc_vfio_process_group(void)
 	bool is_dpmcp_in_blocklist = false, is_dpio_in_blocklist = false;
 	int dpmcp_count = 0, dpio_count = 0, current_device;
 
-	TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, dev_temp) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next,
+		dev_temp) {
 		if (dev->dev_type == DPAA2_MPORTAL) {
 			dpmcp_count++;
 			if (dev->device.devargs &&
@@ -825,7 +826,8 @@ fslmc_vfio_process_group(void)
 
 	/* Search the MCP as that should be initialized first. */
 	current_device = 0;
-	TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, dev_temp) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next,
+		dev_temp) {
 		if (dev->dev_type == DPAA2_MPORTAL) {
 			current_device++;
 			if (dev->device.devargs &&
@@ -872,7 +874,8 @@ fslmc_vfio_process_group(void)
 	}
 
 	current_device = 0;
-	TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, dev_temp) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next,
+		dev_temp) {
 		if (dev->dev_type == DPAA2_IO)
 			current_device++;
 		if (dev->device.devargs &&
diff --git a/drivers/bus/ifpga/rte_bus_ifpga.h b/drivers/bus/ifpga/rte_bus_ifpga.h
index b43084155a..a85e90d384 100644
--- a/drivers/bus/ifpga/rte_bus_ifpga.h
+++ b/drivers/bus/ifpga/rte_bus_ifpga.h
@@ -28,9 +28,9 @@ struct rte_afu_device;
 struct rte_afu_driver;
 
 /** Double linked list of Intel FPGA AFU device. */
-TAILQ_HEAD(ifpga_afu_dev_list, rte_afu_device);
+RTE_TAILQ_HEAD(ifpga_afu_dev_list, rte_afu_device);
 /** Double linked list of Intel FPGA AFU device drivers. */
-TAILQ_HEAD(ifpga_afu_drv_list, rte_afu_driver);
+RTE_TAILQ_HEAD(ifpga_afu_drv_list, rte_afu_driver);
 
 #define IFPGA_BUS_BITSTREAM_PATH_MAX_LEN 256
 
@@ -71,7 +71,7 @@ struct rte_afu_shared {
  * A structure describing a AFU device.
  */
 struct rte_afu_device {
-	TAILQ_ENTRY(rte_afu_device) next;       /**< Next in device list. */
+	RTE_TAILQ_ENTRY(rte_afu_device) next;       /**< Next in device list. */
 	struct rte_device device;               /**< Inherit core device */
 	struct rte_rawdev *rawdev;    /**< Point Rawdev */
 	struct rte_afu_id id;                   /**< AFU id within FPGA. */
@@ -105,7 +105,7 @@ typedef int (afu_remove_t)(struct rte_afu_device *);
  * A structure describing a AFU device.
  */
 struct rte_afu_driver {
-	TAILQ_ENTRY(rte_afu_driver) next;       /**< Next afu driver. */
+	RTE_TAILQ_ENTRY(rte_afu_driver) next;   /**< Next afu driver. */
 	struct rte_driver driver;               /**< Inherit core driver. */
 	afu_probe_t *probe;                     /**< Device Probe function. */
 	afu_remove_t *remove;                   /**< Device Remove function. */
diff --git a/drivers/bus/pci/pci_params.c b/drivers/bus/pci/pci_params.c
index 3192e9c967..717388753d 100644
--- a/drivers/bus/pci/pci_params.c
+++ b/drivers/bus/pci/pci_params.c
@@ -2,6 +2,8 @@
  * Copyright 2018 Gaëtan Rivet
  */
 
+#include <sys/queue.h>
+
 #include <rte_bus.h>
 #include <rte_bus_pci.h>
 #include <rte_dev.h>
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index 583470e831..673a2850c1 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -19,7 +19,6 @@ extern "C" {
 #include <stdlib.h>
 #include <limits.h>
 #include <errno.h>
-#include <sys/queue.h>
 #include <stdint.h>
 #include <inttypes.h>
 
@@ -37,16 +36,16 @@ struct rte_pci_device;
 struct rte_pci_driver;
 
 /** List of PCI devices */
-TAILQ_HEAD(rte_pci_device_list, rte_pci_device);
+RTE_TAILQ_HEAD(rte_pci_device_list, rte_pci_device);
 /** List of PCI drivers */
-TAILQ_HEAD(rte_pci_driver_list, rte_pci_driver);
+RTE_TAILQ_HEAD(rte_pci_driver_list, rte_pci_driver);
 
 /* PCI Bus iterators */
 #define FOREACH_DEVICE_ON_PCIBUS(p)	\
-		TAILQ_FOREACH(p, &(rte_pci_bus.device_list), next)
+		RTE_TAILQ_FOREACH(p, &(rte_pci_bus.device_list), next)
 
 #define FOREACH_DRIVER_ON_PCIBUS(p)	\
-		TAILQ_FOREACH(p, &(rte_pci_bus.driver_list), next)
+		RTE_TAILQ_FOREACH(p, &(rte_pci_bus.driver_list), next)
 
 struct rte_devargs;
 
@@ -64,7 +63,7 @@ enum rte_pci_kernel_driver {
  * A structure describing a PCI device.
  */
 struct rte_pci_device {
-	TAILQ_ENTRY(rte_pci_device) next;   /**< Next probed PCI device. */
+	RTE_TAILQ_ENTRY(rte_pci_device) next;   /**< Next probed PCI device. */
 	struct rte_device device;           /**< Inherit core device */
 	struct rte_pci_addr addr;           /**< PCI location. */
 	struct rte_pci_id id;               /**< PCI ID. */
@@ -160,7 +159,7 @@ typedef int (pci_dma_unmap_t)(struct rte_pci_device *dev, void *addr,
  * A structure describing a PCI driver.
  */
 struct rte_pci_driver {
-	TAILQ_ENTRY(rte_pci_driver) next;  /**< Next in list. */
+	RTE_TAILQ_ENTRY(rte_pci_driver) next;  /**< Next in list. */
 	struct rte_driver driver;          /**< Inherit core driver. */
 	struct rte_pci_bus *bus;           /**< PCI bus reference. */
 	rte_pci_probe_t *probe;            /**< Device probe function. */
diff --git a/drivers/bus/pci/windows/pci.c b/drivers/bus/pci/windows/pci.c
index d39a7748b8..d7bd5d6e80 100644
--- a/drivers/bus/pci/windows/pci.c
+++ b/drivers/bus/pci/windows/pci.c
@@ -1,6 +1,9 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright 2020 Mellanox Technologies, Ltd
  */
+
+#include <sys/queue.h>
+
 #include <rte_windows.h>
 #include <rte_errno.h>
 #include <rte_log.h>
diff --git a/drivers/bus/pci/windows/pci_netuio.c b/drivers/bus/pci/windows/pci_netuio.c
index 1bf9133f71..a0b175a8fc 100644
--- a/drivers/bus/pci/windows/pci_netuio.c
+++ b/drivers/bus/pci/windows/pci_netuio.c
@@ -2,6 +2,8 @@
  * Copyright(c) 2020 Intel Corporation.
  */
 
+#include <sys/queue.h>
+
 #include <rte_windows.h>
 #include <rte_errno.h>
 #include <rte_log.h>
diff --git a/drivers/bus/vdev/rte_bus_vdev.h b/drivers/bus/vdev/rte_bus_vdev.h
index fc315d10fa..2856799953 100644
--- a/drivers/bus/vdev/rte_bus_vdev.h
+++ b/drivers/bus/vdev/rte_bus_vdev.h
@@ -15,12 +15,11 @@
 extern "C" {
 #endif
 
-#include <sys/queue.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
 
 struct rte_vdev_device {
-	TAILQ_ENTRY(rte_vdev_device) next;      /**< Next attached vdev */
+	RTE_TAILQ_ENTRY(rte_vdev_device) next;      /**< Next attached vdev */
 	struct rte_device device;               /**< Inherit core device */
 };
 
@@ -53,7 +52,7 @@ rte_vdev_device_args(const struct rte_vdev_device *dev)
 }
 
 /** Double linked list of virtual device drivers. */
-TAILQ_HEAD(vdev_driver_list, rte_vdev_driver);
+RTE_TAILQ_HEAD(vdev_driver_list, rte_vdev_driver);
 
 /**
  * Probe function called for each virtual device driver once.
@@ -107,7 +106,7 @@ typedef int (rte_vdev_dma_unmap_t)(struct rte_vdev_device *dev, void *addr,
  * A virtual device driver abstraction.
  */
 struct rte_vdev_driver {
-	TAILQ_ENTRY(rte_vdev_driver) next; /**< Next in list. */
+	RTE_TAILQ_ENTRY(rte_vdev_driver) next; /**< Next in list. */
 	struct rte_driver driver;        /**< Inherited general driver. */
 	rte_vdev_probe_t *probe;         /**< Virtual device probe function. */
 	rte_vdev_remove_t *remove;       /**< Virtual device remove function. */
diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
index 281a2c34e8..a8d8b2327e 100644
--- a/drivers/bus/vdev/vdev.c
+++ b/drivers/bus/vdev/vdev.c
@@ -100,7 +100,8 @@ rte_vdev_remove_custom_scan(rte_vdev_scan_callback callback, void *user_arg)
 	struct vdev_custom_scan *custom_scan, *tmp_scan;
 
 	rte_spinlock_lock(&vdev_custom_scan_lock);
-	TAILQ_FOREACH_SAFE(custom_scan, &vdev_custom_scans, next, tmp_scan) {
+	RTE_TAILQ_FOREACH_SAFE(custom_scan, &vdev_custom_scans, next,
+				tmp_scan) {
 		if (custom_scan->callback != callback ||
 				(custom_scan->user_arg != (void *)-1 &&
 				custom_scan->user_arg != user_arg))
diff --git a/drivers/bus/vmbus/rte_bus_vmbus.h b/drivers/bus/vmbus/rte_bus_vmbus.h
index 4cf73ce815..6bcff66468 100644
--- a/drivers/bus/vmbus/rte_bus_vmbus.h
+++ b/drivers/bus/vmbus/rte_bus_vmbus.h
@@ -20,7 +20,6 @@ extern "C" {
 #include <limits.h>
 #include <stdbool.h>
 #include <errno.h>
-#include <sys/queue.h>
 #include <stdint.h>
 #include <inttypes.h>
 
@@ -38,15 +37,15 @@ struct rte_vmbus_bus;
 struct vmbus_channel;
 struct vmbus_mon_page;
 
-TAILQ_HEAD(rte_vmbus_device_list, rte_vmbus_device);
-TAILQ_HEAD(rte_vmbus_driver_list, rte_vmbus_driver);
+RTE_TAILQ_HEAD(rte_vmbus_device_list, rte_vmbus_device);
+RTE_TAILQ_HEAD(rte_vmbus_driver_list, rte_vmbus_driver);
 
 /* VMBus iterators */
 #define FOREACH_DEVICE_ON_VMBUS(p)	\
-	TAILQ_FOREACH(p, &(rte_vmbus_bus.device_list), next)
+	RTE_TAILQ_FOREACH(p, &(rte_vmbus_bus.device_list), next)
 
 #define FOREACH_DRIVER_ON_VMBUS(p)	\
-	TAILQ_FOREACH(p, &(rte_vmbus_bus.driver_list), next)
+	RTE_TAILQ_FOREACH(p, &(rte_vmbus_bus.driver_list), next)
 
 /** Maximum number of VMBUS resources. */
 enum hv_uio_map {
@@ -62,7 +61,7 @@ enum hv_uio_map {
  * A structure describing a VMBUS device.
  */
 struct rte_vmbus_device {
-	TAILQ_ENTRY(rte_vmbus_device) next;    /**< Next probed VMBUS device */
+	RTE_TAILQ_ENTRY(rte_vmbus_device) next; /**< Next probed VMBUS device */
 	const struct rte_vmbus_driver *driver; /**< Associated driver */
 	struct rte_device device;              /**< Inherit core device */
 	rte_uuid_t device_id;		       /**< VMBUS device id */
@@ -93,7 +92,7 @@ typedef int (vmbus_remove_t)(struct rte_vmbus_device *);
  * A structure describing a VMBUS driver.
  */
 struct rte_vmbus_driver {
-	TAILQ_ENTRY(rte_vmbus_driver) next; /**< Next in list. */
+	RTE_TAILQ_ENTRY(rte_vmbus_driver) next; /**< Next in list. */
 	struct rte_driver driver;
 	struct rte_vmbus_bus *bus;          /**< VM bus reference. */
 	vmbus_probe_t *probe;               /**< Device Probe function. */
diff --git a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
index dbf85e4eda..ac86b70caf 100644
--- a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
+++ b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
@@ -2018,7 +2018,7 @@ bnxt_ulp_cntxt_list_del(struct bnxt_ulp_context *ulp_ctx)
 	struct ulp_context_list_entry	*entry, *temp;
 
 	rte_spinlock_lock(&bnxt_ulp_ctxt_lock);
-	TAILQ_FOREACH_SAFE(entry, &ulp_cntx_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(entry, &ulp_cntx_list, next, temp) {
 		if (entry->ulp_ctx == ulp_ctx) {
 			TAILQ_REMOVE(&ulp_cntx_list, entry, next);
 			rte_free(entry);
diff --git a/drivers/net/bonding/rte_eth_bond_flow.c b/drivers/net/bonding/rte_eth_bond_flow.c
index 417f76bf60..65b77faae7 100644
--- a/drivers/net/bonding/rte_eth_bond_flow.c
+++ b/drivers/net/bonding/rte_eth_bond_flow.c
@@ -157,7 +157,7 @@ bond_flow_flush(struct rte_eth_dev *dev, struct rte_flow_error *err)
 	/* Destroy all bond flows from its slaves instead of flushing them to
 	 * keep the LACP flow or any other external flows.
 	 */
-	TAILQ_FOREACH_SAFE(flow, &internals->flow_list, next, tmp) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &internals->flow_list, next, tmp) {
 		lret = bond_flow_destroy(dev, flow, err);
 		if (unlikely(lret != 0))
 			ret = lret;
diff --git a/drivers/net/failsafe/failsafe_flow.c b/drivers/net/failsafe/failsafe_flow.c
index 5e2b5f7c67..354f9fec20 100644
--- a/drivers/net/failsafe/failsafe_flow.c
+++ b/drivers/net/failsafe/failsafe_flow.c
@@ -180,7 +180,7 @@ fs_flow_flush(struct rte_eth_dev *dev,
 			return ret;
 		}
 	}
-	TAILQ_FOREACH_SAFE(flow, &PRIV(dev)->flow_list, next, tmp) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &PRIV(dev)->flow_list, next, tmp) {
 		TAILQ_REMOVE(&PRIV(dev)->flow_list, flow, next);
 		fs_flow_release(&flow);
 	}
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 7b230e2ed1..6590363556 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -5436,7 +5436,7 @@ i40e_vsi_release(struct i40e_vsi *vsi)
 
 	/* VSI has child to attach, release child first */
 	if (vsi->veb) {
-		TAILQ_FOREACH_SAFE(vsi_list, &vsi->veb->head, list, temp) {
+		RTE_TAILQ_FOREACH_SAFE(vsi_list, &vsi->veb->head, list, temp) {
 			if (i40e_vsi_release(vsi_list->vsi) != I40E_SUCCESS)
 				return -1;
 		}
@@ -5444,7 +5444,8 @@ i40e_vsi_release(struct i40e_vsi *vsi)
 	}
 
 	if (vsi->floating_veb) {
-		TAILQ_FOREACH_SAFE(vsi_list, &vsi->floating_veb->head, list, temp) {
+		RTE_TAILQ_FOREACH_SAFE(vsi_list, &vsi->floating_veb->head,
+			list, temp) {
 			if (i40e_vsi_release(vsi_list->vsi) != I40E_SUCCESS)
 				return -1;
 		}
@@ -5452,7 +5453,7 @@ i40e_vsi_release(struct i40e_vsi *vsi)
 
 	/* Remove all macvlan filters of the VSI */
 	i40e_vsi_remove_all_macvlan_filter(vsi);
-	TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp)
+	RTE_TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp)
 		rte_free(f);
 
 	if (vsi->type != I40E_VSI_MAIN &&
@@ -6055,7 +6056,7 @@ i40e_vsi_config_vlan_filter(struct i40e_vsi *vsi, bool on)
 	i = 0;
 
 	/* Remove all existing mac */
-	TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
 		mac_filter[i] = f->mac_info;
 		ret = i40e_vsi_delete_mac(vsi, &f->mac_info.mac_addr);
 		if (ret) {
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index cd6deabd60..374b73e4a7 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -6,6 +6,7 @@
 #define _I40E_ETHDEV_H_
 
 #include <stdint.h>
+#include <sys/queue.h>
 
 #include <rte_time.h>
 #include <rte_kvargs.h>
diff --git a/drivers/net/i40e/i40e_flow.c b/drivers/net/i40e/i40e_flow.c
index 3c1570bd9c..e41a84f1d7 100644
--- a/drivers/net/i40e/i40e_flow.c
+++ b/drivers/net/i40e/i40e_flow.c
@@ -4917,7 +4917,7 @@ i40e_flow_flush_fdir_filter(struct i40e_pf *pf)
 		}
 
 		/* Delete FDIR flows in flow list. */
-		TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
+		RTE_TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
 			if (flow->filter_type == RTE_ETH_FILTER_FDIR) {
 				TAILQ_REMOVE(&pf->flow_list, flow, node);
 			}
@@ -4972,7 +4972,7 @@ i40e_flow_flush_ethertype_filter(struct i40e_pf *pf)
 	}
 
 	/* Delete ethertype flows in flow list. */
-	TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
 		if (flow->filter_type == RTE_ETH_FILTER_ETHERTYPE) {
 			TAILQ_REMOVE(&pf->flow_list, flow, node);
 			rte_free(flow);
@@ -5000,7 +5000,7 @@ i40e_flow_flush_tunnel_filter(struct i40e_pf *pf)
 	}
 
 	/* Delete tunnel flows in flow list. */
-	TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
 		if (flow->filter_type == RTE_ETH_FILTER_TUNNEL) {
 			TAILQ_REMOVE(&pf->flow_list, flow, node);
 			rte_free(flow);
diff --git a/drivers/net/i40e/i40e_hash.c b/drivers/net/i40e/i40e_hash.c
index 1fb8c9abfc..6579b1a00b 100644
--- a/drivers/net/i40e/i40e_hash.c
+++ b/drivers/net/i40e/i40e_hash.c
@@ -1366,7 +1366,7 @@ i40e_hash_filter_flush(struct i40e_pf *pf)
 {
 	struct rte_flow *flow, *next;
 
-	TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, next) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, next) {
 		if (flow->filter_type != RTE_ETH_FILTER_HASH)
 			continue;
 
diff --git a/drivers/net/i40e/rte_pmd_i40e.c b/drivers/net/i40e/rte_pmd_i40e.c
index 2e34140c5b..ec24046440 100644
--- a/drivers/net/i40e/rte_pmd_i40e.c
+++ b/drivers/net/i40e/rte_pmd_i40e.c
@@ -216,7 +216,7 @@ i40e_vsi_rm_mac_filter(struct i40e_vsi *vsi)
 	void *temp;
 
 	/* remove all the MACs */
-	TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
 		vlan_num = vsi->vlan_num;
 		filter_type = f->mac_info.filter_type;
 		if (filter_type == I40E_MACVLAN_PERFECT_MATCH ||
@@ -274,7 +274,7 @@ i40e_vsi_restore_mac_filter(struct i40e_vsi *vsi)
 	void *temp;
 
 	/* restore all the MACs */
-	TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
 		if (f->mac_info.filter_type == I40E_MACVLAN_PERFECT_MATCH ||
 		    f->mac_info.filter_type == I40E_MACVLAN_HASH_MATCH) {
 			/**
@@ -563,7 +563,7 @@ rte_pmd_i40e_set_vf_mac_addr(uint16_t port, uint16_t vf_id,
 	rte_ether_addr_copy(mac_addr, &vf->mac_addr);
 
 	/* Remove all existing mac */
-	TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp)
+	RTE_TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp)
 		if (i40e_vsi_delete_mac(vsi, &f->mac_info.mac_addr)
 				!= I40E_SUCCESS)
 			PMD_DRV_LOG(WARNING, "Delete MAC failed");
diff --git a/drivers/net/iavf/iavf_generic_flow.c b/drivers/net/iavf/iavf_generic_flow.c
index 1fe270fb22..b86d99e57d 100644
--- a/drivers/net/iavf/iavf_generic_flow.c
+++ b/drivers/net/iavf/iavf_generic_flow.c
@@ -1637,7 +1637,7 @@ iavf_flow_init(struct iavf_adapter *ad)
 	TAILQ_INIT(&vf->dist_parser_list);
 	rte_spinlock_init(&vf->flow_ops_lock);
 
-	TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
 		if (engine->init == NULL) {
 			PMD_INIT_LOG(ERR, "Invalid engine type (%d)",
 				     engine->type);
@@ -1663,7 +1663,7 @@ iavf_flow_uninit(struct iavf_adapter *ad)
 	struct iavf_flow_parser_node *p_parser;
 	void *temp;
 
-	TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
 		if (engine->uninit)
 			engine->uninit(ad);
 	}
@@ -1733,7 +1733,7 @@ iavf_unregister_parser(struct iavf_flow_parser *parser,
 	if (list == NULL)
 		return;
 
-	TAILQ_FOREACH_SAFE(p_parser, list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(p_parser, list, node, temp) {
 		if (p_parser->parser->engine->type == parser->engine->type) {
 			TAILQ_REMOVE(list, p_parser, node);
 			rte_free(p_parser);
@@ -1917,7 +1917,7 @@ iavf_parse_engine_create(struct iavf_adapter *ad,
 	void *temp;
 	void *meta = NULL;
 
-	TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
 		if (parser_node->parser->parse_pattern_action(ad,
 				parser_node->parser->array,
 				parser_node->parser->array_len,
@@ -1946,7 +1946,7 @@ iavf_parse_engine_validate(struct iavf_adapter *ad,
 	void *temp;
 	void *meta = NULL;
 
-	TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
 		if (parser_node->parser->parse_pattern_action(ad,
 				parser_node->parser->array,
 				parser_node->parser->array_len,
@@ -2089,7 +2089,7 @@ iavf_flow_is_valid(struct rte_flow *flow)
 	void *temp;
 
 	if (flow && flow->engine) {
-		TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
+		RTE_TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
 			if (engine == flow->engine)
 				return true;
 		}
@@ -2142,7 +2142,7 @@ iavf_flow_flush(struct rte_eth_dev *dev,
 	void *temp;
 	int ret = 0;
 
-	TAILQ_FOREACH_SAFE(p_flow, &vf->flow_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(p_flow, &vf->flow_list, node, temp) {
 		ret = iavf_flow_destroy(dev, p_flow, error);
 		if (ret) {
 			PMD_DRV_LOG(ERR, "Failed to flush flows");
diff --git a/drivers/net/ice/ice_dcf_ethdev.c b/drivers/net/ice/ice_dcf_ethdev.c
index cab7c4da87..629e88980d 100644
--- a/drivers/net/ice/ice_dcf_ethdev.c
+++ b/drivers/net/ice/ice_dcf_ethdev.c
@@ -4,6 +4,7 @@
 
 #include <errno.h>
 #include <stdbool.h>
+#include <sys/queue.h>
 #include <sys/types.h>
 #include <unistd.h>
 
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index a4cd39c954..fadd5f2e5a 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -1104,7 +1104,7 @@ ice_remove_all_mac_vlan_filters(struct ice_vsi *vsi)
 	if (!vsi || !vsi->mac_num)
 		return -EINVAL;
 
-	TAILQ_FOREACH_SAFE(m_f, &vsi->mac_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(m_f, &vsi->mac_list, next, temp) {
 		ret = ice_remove_mac_filter(vsi, &m_f->mac_info.mac_addr);
 		if (ret != ICE_SUCCESS) {
 			ret = -EINVAL;
@@ -1115,7 +1115,7 @@ ice_remove_all_mac_vlan_filters(struct ice_vsi *vsi)
 	if (vsi->vlan_num == 0)
 		return 0;
 
-	TAILQ_FOREACH_SAFE(v_f, &vsi->vlan_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(v_f, &vsi->vlan_list, next, temp) {
 		ret = ice_remove_vlan_filter(vsi, &v_f->vlan_info.vlan);
 		if (ret != ICE_SUCCESS) {
 			ret = -EINVAL;
diff --git a/drivers/net/ice/ice_generic_flow.c b/drivers/net/ice/ice_generic_flow.c
index 66b5743abf..3e557efe0c 100644
--- a/drivers/net/ice/ice_generic_flow.c
+++ b/drivers/net/ice/ice_generic_flow.c
@@ -1820,7 +1820,7 @@ ice_flow_init(struct ice_adapter *ad)
 	TAILQ_INIT(&pf->dist_parser_list);
 	rte_spinlock_init(&pf->flow_ops_lock);
 
-	TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
 		if (engine->init == NULL) {
 			PMD_INIT_LOG(ERR, "Invalid engine type (%d)",
 					engine->type);
@@ -1846,7 +1846,7 @@ ice_flow_uninit(struct ice_adapter *ad)
 	struct ice_flow_parser_node *p_parser;
 	void *temp;
 
-	TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
 		if (engine->uninit)
 			engine->uninit(ad);
 	}
@@ -1946,7 +1946,7 @@ ice_unregister_parser(struct ice_flow_parser *parser,
 	if (list == NULL)
 		return;
 
-	TAILQ_FOREACH_SAFE(p_parser, list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(p_parser, list, node, temp) {
 		if (p_parser->parser->engine->type == parser->engine->type) {
 			TAILQ_REMOVE(list, p_parser, node);
 			rte_free(p_parser);
@@ -2272,7 +2272,7 @@ ice_parse_engine_create(struct ice_adapter *ad,
 	void *meta = NULL;
 	void *temp;
 
-	TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
 		int ret;
 
 		if (parser_node->parser->parse_pattern_action(ad,
@@ -2305,7 +2305,7 @@ ice_parse_engine_validate(struct ice_adapter *ad,
 	struct ice_flow_parser_node *parser_node;
 	void *temp;
 
-	TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
 		if (parser_node->parser->parse_pattern_action(ad,
 				parser_node->parser->array,
 				parser_node->parser->array_len,
@@ -2477,7 +2477,7 @@ ice_flow_flush(struct rte_eth_dev *dev,
 	void *temp;
 	int ret = 0;
 
-	TAILQ_FOREACH_SAFE(p_flow, &pf->flow_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(p_flow, &pf->flow_list, node, temp) {
 		ret = ice_flow_destroy(dev, p_flow, error);
 		if (ret) {
 			PMD_DRV_LOG(ERR, "Failed to flush flows");
@@ -2541,7 +2541,7 @@ ice_flow_redirect(struct ice_adapter *ad,
 
 	rte_spinlock_lock(&pf->flow_ops_lock);
 
-	TAILQ_FOREACH_SAFE(p_flow, &pf->flow_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(p_flow, &pf->flow_list, node, temp) {
 		if (!p_flow->engine->redirect)
 			continue;
 		ret = p_flow->engine->redirect(ad, p_flow, rd);
diff --git a/drivers/net/ipn3ke/ipn3ke_flow.c b/drivers/net/ipn3ke/ipn3ke_flow.c
index c702e19ea5..f5867ca055 100644
--- a/drivers/net/ipn3ke/ipn3ke_flow.c
+++ b/drivers/net/ipn3ke/ipn3ke_flow.c
@@ -1231,7 +1231,7 @@ ipn3ke_flow_flush(struct rte_eth_dev *dev,
 	struct ipn3ke_hw *hw = IPN3KE_DEV_PRIVATE_TO_HW(dev);
 	struct rte_flow *flow, *temp;
 
-	TAILQ_FOREACH_SAFE(flow, &hw->flow_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &hw->flow_list, next, temp) {
 		TAILQ_REMOVE(&hw->flow_list, flow, next);
 		rte_free(flow);
 	}
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 31d857030f..ba2bf4de37 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -15099,7 +15099,7 @@ __flow_dv_destroy_sub_policy_rules(struct rte_eth_dev *dev,
 		    policy->act_cnt[i].fate_action == MLX5_FLOW_FATE_MTR)
 			next_fm = mlx5_flow_meter_find(priv,
 					policy->act_cnt[i].next_mtr_id, NULL);
-		TAILQ_FOREACH_SAFE(color_rule, &sub_policy->color_rules[i],
+		RTE_TAILQ_FOREACH_SAFE(color_rule, &sub_policy->color_rules[i],
 				   next_port, tmp) {
 			claim_zero(mlx5_flow_os_destroy_flow(color_rule->rule));
 			tbl = container_of(color_rule->matcher->tbl,
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index a24bd9c7ae..ba4e9fca17 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -2168,7 +2168,7 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 			priv->mtr_idx_tbl = NULL;
 		}
 	} else {
-		TAILQ_FOREACH_SAFE(legacy_fm, fms, next, tmp) {
+		RTE_TAILQ_FOREACH_SAFE(legacy_fm, fms, next, tmp) {
 			fm = &legacy_fm->fm;
 			if (mlx5_flow_meter_params_flush(dev, fm, 0))
 				return -rte_mtr_error_set(error, EINVAL,
diff --git a/drivers/net/softnic/rte_eth_softnic_flow.c b/drivers/net/softnic/rte_eth_softnic_flow.c
index 27eaf380cd..7d054c38d2 100644
--- a/drivers/net/softnic/rte_eth_softnic_flow.c
+++ b/drivers/net/softnic/rte_eth_softnic_flow.c
@@ -2207,7 +2207,8 @@ pmd_flow_flush(struct rte_eth_dev *dev,
 			void *temp;
 			int status;
 
-			TAILQ_FOREACH_SAFE(flow, &table->flows, node, temp) {
+			RTE_TAILQ_FOREACH_SAFE(flow, &table->flows, node,
+				temp) {
 				/* Rule delete. */
 				status = softnic_pipeline_table_rule_delete
 						(softnic,
diff --git a/drivers/net/softnic/rte_eth_softnic_swq.c b/drivers/net/softnic/rte_eth_softnic_swq.c
index 2083d0a976..afe6f05e29 100644
--- a/drivers/net/softnic/rte_eth_softnic_swq.c
+++ b/drivers/net/softnic/rte_eth_softnic_swq.c
@@ -39,7 +39,7 @@ softnic_softnic_swq_free_keep_rxq_txq(struct pmd_internals *p)
 {
 	struct softnic_swq *swq, *tswq;
 
-	TAILQ_FOREACH_SAFE(swq, &p->swq_list, node, tswq) {
+	RTE_TAILQ_FOREACH_SAFE(swq, &p->swq_list, node, tswq) {
 		if ((strncmp(swq->name, "RXQ", strlen("RXQ")) == 0) ||
 			(strncmp(swq->name, "TXQ", strlen("TXQ")) == 0))
 			continue;
diff --git a/drivers/raw/dpaa2_qdma/dpaa2_qdma.c b/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
index c961e18d67..7b80370b36 100644
--- a/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
+++ b/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
@@ -1606,7 +1606,7 @@ remove_hw_queues_from_list(struct dpaa2_dpdmai_dev *dpdmai_dev)
 
 	DPAA2_QDMA_FUNC_TRACE();
 
-	TAILQ_FOREACH_SAFE(queue, &qdma_queue_list, next, tqueue) {
+	RTE_TAILQ_FOREACH_SAFE(queue, &qdma_queue_list, next, tqueue) {
 		if (queue->dpdmai_dev == dpdmai_dev) {
 			TAILQ_REMOVE(&qdma_queue_list, queue, next);
 			rte_free(queue);
diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
index 7017124414..3ebf62e697 100644
--- a/lib/bbdev/rte_bbdev.h
+++ b/lib/bbdev/rte_bbdev.h
@@ -434,7 +434,7 @@ struct rte_bbdev_callback;
 struct rte_intr_handle;
 
 /** Structure to keep track of registered callbacks */
-TAILQ_HEAD(rte_bbdev_cb_list, rte_bbdev_callback);
+RTE_TAILQ_HEAD(rte_bbdev_cb_list, rte_bbdev_callback);
 
 /**
  * @internal The data structure associated with a device. Drivers can access
diff --git a/lib/cryptodev/rte_cryptodev.h b/lib/cryptodev/rte_cryptodev.h
index 11f4e6fdbf..f86bf2260b 100644
--- a/lib/cryptodev/rte_cryptodev.h
+++ b/lib/cryptodev/rte_cryptodev.h
@@ -879,7 +879,7 @@ typedef uint16_t (*enqueue_pkt_burst_t)(void *qp,
 struct rte_cryptodev_callback;
 
 /** Structure to keep track of registered callbacks */
-TAILQ_HEAD(rte_cryptodev_cb_list, rte_cryptodev_callback);
+RTE_TAILQ_HEAD(rte_cryptodev_cb_list, rte_cryptodev_callback);
 
 /**
  * Structure used to hold information about the callbacks to be called for a
diff --git a/lib/cryptodev/rte_cryptodev_pmd.h b/lib/cryptodev/rte_cryptodev_pmd.h
index 1274436870..9542cbf263 100644
--- a/lib/cryptodev/rte_cryptodev_pmd.h
+++ b/lib/cryptodev/rte_cryptodev_pmd.h
@@ -66,7 +66,7 @@ struct rte_cryptodev_global {
 
 /* Cryptodev driver, containing the driver ID */
 struct cryptodev_driver {
-	TAILQ_ENTRY(cryptodev_driver) next; /**< Next in list. */
+	RTE_TAILQ_ENTRY(cryptodev_driver) next; /**< Next in list. */
 	const struct rte_driver *driver;
 	uint8_t id;
 };
diff --git a/lib/eal/common/eal_common_devargs.c b/lib/eal/common/eal_common_devargs.c
index 23aaf8b7e4..2e2f35c47e 100644
--- a/lib/eal/common/eal_common_devargs.c
+++ b/lib/eal/common/eal_common_devargs.c
@@ -291,7 +291,7 @@ rte_devargs_insert(struct rte_devargs **da)
 	if (*da == NULL || (*da)->bus == NULL)
 		return -1;
 
-	TAILQ_FOREACH_SAFE(listed_da, &devargs_list, next, tmp) {
+	RTE_TAILQ_FOREACH_SAFE(listed_da, &devargs_list, next, tmp) {
 		if (listed_da == *da)
 			/* devargs already in the list */
 			return 0;
@@ -358,7 +358,7 @@ rte_devargs_remove(struct rte_devargs *devargs)
 	if (devargs == NULL || devargs->bus == NULL)
 		return -1;
 
-	TAILQ_FOREACH_SAFE(d, &devargs_list, next, tmp) {
+	RTE_TAILQ_FOREACH_SAFE(d, &devargs_list, next, tmp) {
 		if (strcmp(d->bus->name, devargs->bus->name) == 0 &&
 		    strcmp(d->name, devargs->name) == 0) {
 			TAILQ_REMOVE(&devargs_list, d, next);
diff --git a/lib/eal/common/eal_common_log.c b/lib/eal/common/eal_common_log.c
index ec8fe23a7f..1be35f5397 100644
--- a/lib/eal/common/eal_common_log.c
+++ b/lib/eal/common/eal_common_log.c
@@ -10,6 +10,7 @@
 #include <errno.h>
 #include <regex.h>
 #include <fnmatch.h>
+#include <sys/queue.h>
 
 #include <rte_eal.h>
 #include <rte_log.h>
diff --git a/lib/eal/common/eal_common_options.c b/lib/eal/common/eal_common_options.c
index ff5861b5f3..24f5ceaab0 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -283,7 +283,7 @@ eal_option_device_parse(void)
 	void *tmp;
 	int ret = 0;
 
-	TAILQ_FOREACH_SAFE(devopt, &devopt_list, next, tmp) {
+	RTE_TAILQ_FOREACH_SAFE(devopt, &devopt_list, next, tmp) {
 		if (ret == 0) {
 			ret = rte_devargs_add(devopt->type, devopt->arg);
 			if (ret)
diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
index 64cf4e81c8..86dab1f057 100644
--- a/lib/eal/common/eal_private.h
+++ b/lib/eal/common/eal_private.h
@@ -8,6 +8,7 @@
 #include <stdbool.h>
 #include <stdint.h>
 #include <stdio.h>
+#include <sys/queue.h>
 
 #include <rte_dev.h>
 #include <rte_lcore.h>
diff --git a/lib/eal/freebsd/include/rte_os.h b/lib/eal/freebsd/include/rte_os.h
index 627f0483ab..06f30ce238 100644
--- a/lib/eal/freebsd/include/rte_os.h
+++ b/lib/eal/freebsd/include/rte_os.h
@@ -11,6 +11,21 @@
  */
 
 #include <pthread_np.h>
+#include <sys/queue.h>
+
+/* These macros are compatible with system's sys/queue.h. */
+#define RTE_TAILQ_HEAD(name, type) TAILQ_HEAD(name, type)
+#define RTE_TAILQ_ENTRY(type) TAILQ_ENTRY(type)
+#define RTE_TAILQ_FOREACH(var, head, field) TAILQ_FOREACH(var, head, field)
+#define RTE_TAILQ_FOREACH_SAFE(var, head, field, tvar) \
+	for ((var) = TAILQ_FIRST((head)); \
+	    (var) && ((tvar) = TAILQ_NEXT((var), field), 1); \
+	    (var) = (tvar))
+#define RTE_TAILQ_FIRST(head) TAILQ_FIRST(head)
+#define RTE_TAILQ_NEXT(elem, field) TAILQ_NEXT(elem, field)
+#define RTE_STAILQ_HEAD(name, type) STAILQ_HEAD(name, type)
+#define RTE_STAILQ_ENTRY(type) STAILQ_ENTRY(type)
+
 
 typedef cpuset_t rte_cpuset_t;
 #define RTE_HAS_CPUSET
diff --git a/lib/eal/include/rte_bus.h b/lib/eal/include/rte_bus.h
index 80b154fb98..84d364df3f 100644
--- a/lib/eal/include/rte_bus.h
+++ b/lib/eal/include/rte_bus.h
@@ -19,13 +19,12 @@ extern "C" {
 #endif
 
 #include <stdio.h>
-#include <sys/queue.h>
 
 #include <rte_log.h>
 #include <rte_dev.h>
 
 /** Double linked list of buses */
-TAILQ_HEAD(rte_bus_list, rte_bus);
+RTE_TAILQ_HEAD(rte_bus_list, rte_bus);
 
 
 /**
@@ -250,7 +249,7 @@ typedef enum rte_iova_mode (*rte_bus_get_iommu_class_t)(void);
  * A structure describing a generic bus.
  */
 struct rte_bus {
-	TAILQ_ENTRY(rte_bus) next;   /**< Next bus object in linked list */
+	RTE_TAILQ_ENTRY(rte_bus) next;   /**< Next bus object in linked list */
 	const char *name;            /**< Name of the bus */
 	rte_bus_scan_t scan;         /**< Scan for devices attached to bus */
 	rte_bus_probe_t probe;       /**< Probe devices on bus */
diff --git a/lib/eal/include/rte_class.h b/lib/eal/include/rte_class.h
index 856d09b22d..d560339652 100644
--- a/lib/eal/include/rte_class.h
+++ b/lib/eal/include/rte_class.h
@@ -22,18 +22,16 @@
 extern "C" {
 #endif
 
-#include <sys/queue.h>
-
 #include <rte_dev.h>
 
 /** Double linked list of classes */
-TAILQ_HEAD(rte_class_list, rte_class);
+RTE_TAILQ_HEAD(rte_class_list, rte_class);
 
 /**
  * A structure describing a generic device class.
  */
 struct rte_class {
-	TAILQ_ENTRY(rte_class) next; /**< Next device class in linked list */
+	RTE_TAILQ_ENTRY(rte_class) next; /**< Next device class in linked list */
 	const char *name; /**< Name of the class */
 	rte_dev_iterate_t dev_iterate; /**< Device iterator. */
 };
diff --git a/lib/eal/include/rte_dev.h b/lib/eal/include/rte_dev.h
index 6dd72c11a1..f6efe0c94e 100644
--- a/lib/eal/include/rte_dev.h
+++ b/lib/eal/include/rte_dev.h
@@ -18,7 +18,6 @@ extern "C" {
 #endif
 
 #include <stdio.h>
-#include <sys/queue.h>
 
 #include <rte_config.h>
 #include <rte_compat.h>
@@ -75,7 +74,7 @@ struct rte_mem_resource {
  * A structure describing a device driver.
  */
 struct rte_driver {
-	TAILQ_ENTRY(rte_driver) next;  /**< Next in list. */
+	RTE_TAILQ_ENTRY(rte_driver) next;  /**< Next in list. */
 	const char *name;                   /**< Driver name. */
 	const char *alias;              /**< Driver alias. */
 };
@@ -90,7 +89,7 @@ struct rte_driver {
  * A structure describing a generic device.
  */
 struct rte_device {
-	TAILQ_ENTRY(rte_device) next; /**< Next device */
+	RTE_TAILQ_ENTRY(rte_device) next; /**< Next device */
 	const char *name;             /**< Device name */
 	const struct rte_driver *driver; /**< Driver assigned after probing */
 	const struct rte_bus *bus;    /**< Bus handle assigned on scan */
diff --git a/lib/eal/include/rte_devargs.h b/lib/eal/include/rte_devargs.h
index cd90944fe8..957477b398 100644
--- a/lib/eal/include/rte_devargs.h
+++ b/lib/eal/include/rte_devargs.h
@@ -21,7 +21,6 @@ extern "C" {
 #endif
 
 #include <stdio.h>
-#include <sys/queue.h>
 #include <rte_compat.h>
 #include <rte_bus.h>
 
@@ -76,7 +75,7 @@ enum rte_devtype {
  */
 struct rte_devargs {
 	/** Next in list. */
-	TAILQ_ENTRY(rte_devargs) next;
+	RTE_TAILQ_ENTRY(rte_devargs) next;
 	/** Type of device. */
 	enum rte_devtype type;
 	/** Device policy. */
diff --git a/lib/eal/include/rte_log.h b/lib/eal/include/rte_log.h
index b706bb8710..bb3523467b 100644
--- a/lib/eal/include/rte_log.h
+++ b/lib/eal/include/rte_log.h
@@ -21,7 +21,6 @@ extern "C" {
 #include <stdio.h>
 #include <stdarg.h>
 #include <stdbool.h>
-#include <sys/queue.h>
 
 #include <rte_common.h>
 #include <rte_config.h>
diff --git a/lib/eal/include/rte_service.h b/lib/eal/include/rte_service.h
index c7d037d862..1c9275c32a 100644
--- a/lib/eal/include/rte_service.h
+++ b/lib/eal/include/rte_service.h
@@ -29,7 +29,6 @@ extern "C" {
 
 #include<stdio.h>
 #include <stdint.h>
-#include <sys/queue.h>
 
 #include <rte_config.h>
 #include <rte_lcore.h>
diff --git a/lib/eal/include/rte_tailq.h b/lib/eal/include/rte_tailq.h
index b6fe4e5f78..b32033ad66 100644
--- a/lib/eal/include/rte_tailq.h
+++ b/lib/eal/include/rte_tailq.h
@@ -15,17 +15,16 @@
 extern "C" {
 #endif
 
-#include <sys/queue.h>
 #include <stdio.h>
 #include <rte_debug.h>
 
 /** dummy structure type used by the rte_tailq APIs */
 struct rte_tailq_entry {
-	TAILQ_ENTRY(rte_tailq_entry) next; /**< Pointer entries for a tailq list */
+	RTE_TAILQ_ENTRY(rte_tailq_entry) next; /**< Pointer entries for a tailq list */
 	void *data; /**< Pointer to the data referenced by this tailq entry */
 };
 /** dummy */
-TAILQ_HEAD(rte_tailq_entry_head, rte_tailq_entry);
+RTE_TAILQ_HEAD(rte_tailq_entry_head, rte_tailq_entry);
 
 #define RTE_TAILQ_NAMESIZE 32
 
@@ -48,7 +47,7 @@ struct rte_tailq_elem {
 	 * rte_eal_tailqs_init()
 	 */
 	struct rte_tailq_head *head;
-	TAILQ_ENTRY(rte_tailq_elem) next;
+	RTE_TAILQ_ENTRY(rte_tailq_elem) next;
 	const char name[RTE_TAILQ_NAMESIZE];
 };
 
@@ -125,14 +124,6 @@ RTE_INIT(tailqinitfn_ ##t) \
 		rte_panic("Cannot initialize tailq: %s\n", t.name); \
 }
 
-/* This macro permits both remove and free var within the loop safely.*/
-#ifndef TAILQ_FOREACH_SAFE
-#define TAILQ_FOREACH_SAFE(var, head, field, tvar)		\
-	for ((var) = TAILQ_FIRST((head));			\
-	    (var) && ((tvar) = TAILQ_NEXT((var), field), 1);	\
-	    (var) = (tvar))
-#endif
-
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/linux/include/rte_os.h b/lib/eal/linux/include/rte_os.h
index 1618b4df22..ce5b0aed52 100644
--- a/lib/eal/linux/include/rte_os.h
+++ b/lib/eal/linux/include/rte_os.h
@@ -11,6 +11,21 @@
  */
 
 #include <sched.h>
+#include <sys/queue.h>
+
+/* These macros are compatible with system's sys/queue.h. */
+#define RTE_TAILQ_HEAD(name, type) TAILQ_HEAD(name, type)
+#define RTE_TAILQ_ENTRY(type) TAILQ_ENTRY(type)
+#define RTE_TAILQ_FOREACH(var, head, field) TAILQ_FOREACH(var, head, field)
+#define RTE_TAILQ_FOREACH_SAFE(var, head, field, tvar) \
+	for ((var) = TAILQ_FIRST((head)); \
+	    (var) && ((tvar) = TAILQ_NEXT((var), field), 1); \
+	    (var) = (tvar))
+#define RTE_TAILQ_FIRST(head) TAILQ_FIRST(head)
+#define RTE_TAILQ_NEXT(elem, field) TAILQ_NEXT(elem, field)
+#define RTE_STAILQ_HEAD(name, type) STAILQ_HEAD(name, type)
+#define RTE_STAILQ_ENTRY(type) STAILQ_ENTRY(type)
+
 
 #ifdef CPU_SETSIZE /* may require _GNU_SOURCE */
 typedef cpu_set_t rte_cpuset_t;
diff --git a/lib/eal/windows/eal_alarm.c b/lib/eal/windows/eal_alarm.c
index e5dc54efb8..103c1f909d 100644
--- a/lib/eal/windows/eal_alarm.c
+++ b/lib/eal/windows/eal_alarm.c
@@ -4,6 +4,7 @@
 
 #include <stdatomic.h>
 #include <stdbool.h>
+#include <sys/queue.h>
 
 #include <rte_alarm.h>
 #include <rte_spinlock.h>
diff --git a/lib/eal/windows/include/rte_os.h b/lib/eal/windows/include/rte_os.h
index 66c711d458..0cbe1dbc1e 100644
--- a/lib/eal/windows/include/rte_os.h
+++ b/lib/eal/windows/include/rte_os.h
@@ -18,6 +18,37 @@
 extern "C" {
 #endif
 
+/* These macros are compatible with bundled sys/queue.h. */
+#define RTE_TAILQ_HEAD(name, type) \
+struct name { \
+	struct type *tqh_first; /* first element */ \
+	struct type **tqh_last; /* addr of last next element */ \
+}
+#define RTE_TAILQ_ENTRY(type) \
+struct { \
+	struct type *tqe_next; /* next element */ \
+	struct type **tqe_prev; /* address of previous next element */ \
+}
+#define RTE_TAILQ_FOREACH(var, head, field) \
+	for ((var) = RTE_TAILQ_FIRST((head)); \
+	    (var); \
+	    (var) = RTE_TAILQ_NEXT((var), field))
+#define RTE_TAILQ_FOREACH_SAFE(var, head, field, tvar) \
+	for ((var) = RTE_TAILQ_FIRST((head)); \
+	    (var) && ((tvar) = RTE_TAILQ_NEXT((var), field), 1); \
+	    (var) = (tvar))
+#define RTE_TAILQ_FIRST(head) ((head)->tqh_first)
+#define RTE_TAILQ_NEXT(elm, field) ((elm)->field.tqe_next)
+#define RTE_STAILQ_HEAD(name, type) \
+struct name { \
+	struct type *stqh_first;/* first element */ \
+	struct type **stqh_last;/* addr of last next element */ \
+}
+#define RTE_STAILQ_ENTRY(type) \
+struct { \
+	struct type *stqe_next; /* next element */ \
+}
+
 /* cpu_set macros implementation */
 #define RTE_CPU_AND(dst, src1, src2) CPU_AND(dst, src1, src2)
 #define RTE_CPU_OR(dst, src1, src2) CPU_OR(dst, src1, src2)
diff --git a/lib/efd/rte_efd.c b/lib/efd/rte_efd.c
index 77f46809f8..5bf517fee9 100644
--- a/lib/efd/rte_efd.c
+++ b/lib/efd/rte_efd.c
@@ -759,7 +759,7 @@ rte_efd_free(struct rte_efd_table *table)
 	efd_list = RTE_TAILQ_CAST(rte_efd_tailq.head, rte_efd_list);
 	rte_mcfg_tailq_write_lock();
 
-	TAILQ_FOREACH_SAFE(te, efd_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(te, efd_list, next, temp) {
 		if (te->data == (void *) table) {
 			TAILQ_REMOVE(efd_list, te, next);
 			rte_free(te);
diff --git a/lib/ethdev/rte_ethdev_core.h b/lib/ethdev/rte_ethdev_core.h
index edf96de2dc..d2c9ec42c7 100644
--- a/lib/ethdev/rte_ethdev_core.h
+++ b/lib/ethdev/rte_ethdev_core.h
@@ -21,7 +21,7 @@
 
 struct rte_eth_dev_callback;
 /** @internal Structure to keep track of registered callbacks */
-TAILQ_HEAD(rte_eth_dev_cb_list, rte_eth_dev_callback);
+RTE_TAILQ_HEAD(rte_eth_dev_cb_list, rte_eth_dev_callback);
 
 struct rte_eth_dev;
 
diff --git a/lib/hash/rte_fbk_hash.h b/lib/hash/rte_fbk_hash.h
index c4d6976d2b..9c3a61c1d6 100644
--- a/lib/hash/rte_fbk_hash.h
+++ b/lib/hash/rte_fbk_hash.h
@@ -17,7 +17,6 @@
 
 #include <stdint.h>
 #include <errno.h>
-#include <sys/queue.h>
 
 #ifdef __cplusplus
 extern "C" {
diff --git a/lib/hash/rte_thash.c b/lib/hash/rte_thash.c
index d5a95a6e00..696a1121e2 100644
--- a/lib/hash/rte_thash.c
+++ b/lib/hash/rte_thash.c
@@ -2,6 +2,8 @@
  * Copyright(c) 2021 Intel Corporation
  */
 
+#include <sys/queue.h>
+
 #include <rte_thash.h>
 #include <rte_tailq.h>
 #include <rte_random.h>
diff --git a/lib/ip_frag/rte_ip_frag.h b/lib/ip_frag/rte_ip_frag.h
index 0bfe64b14e..80f931c32a 100644
--- a/lib/ip_frag/rte_ip_frag.h
+++ b/lib/ip_frag/rte_ip_frag.h
@@ -62,7 +62,7 @@ struct ip_frag_key {
  * First two entries in the frags[] array are for the last and first fragments.
  */
 struct ip_frag_pkt {
-	TAILQ_ENTRY(ip_frag_pkt) lru;   /**< LRU list */
+	RTE_TAILQ_ENTRY(ip_frag_pkt) lru;   /**< LRU list */
 	struct ip_frag_key key;           /**< fragmentation key */
 	uint64_t             start;       /**< creation timestamp */
 	uint32_t             total_size;  /**< expected reassembled size */
@@ -83,7 +83,7 @@ struct rte_ip_frag_death_row {
 	/**< mbufs to be freed */
 };
 
-TAILQ_HEAD(ip_pkt_list, ip_frag_pkt); /**< @internal fragments tailq */
+RTE_TAILQ_HEAD(ip_pkt_list, ip_frag_pkt); /**< @internal fragments tailq */
 
 /** fragmentation table statistics */
 struct ip_frag_tbl_stat {
diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
index 59a588425b..c5f859ae71 100644
--- a/lib/mempool/rte_mempool.c
+++ b/lib/mempool/rte_mempool.c
@@ -1337,7 +1337,7 @@ void rte_mempool_walk(void (*func)(struct rte_mempool *, void *),
 
 	rte_mcfg_mempool_read_lock();
 
-	TAILQ_FOREACH_SAFE(te, mempool_list, next, tmp_te) {
+	RTE_TAILQ_FOREACH_SAFE(te, mempool_list, next, tmp_te) {
 		(*func)((struct rte_mempool *) te->data, arg);
 	}
 
diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 4235d6f0bf..f57ecbd6fc 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -38,7 +38,6 @@
 #include <stdint.h>
 #include <errno.h>
 #include <inttypes.h>
-#include <sys/queue.h>
 
 #include <rte_config.h>
 #include <rte_spinlock.h>
@@ -141,7 +140,7 @@ struct rte_mempool_objsz {
  * double-frees.
  */
 struct rte_mempool_objhdr {
-	STAILQ_ENTRY(rte_mempool_objhdr) next; /**< Next in list. */
+	RTE_STAILQ_ENTRY(rte_mempool_objhdr) next; /**< Next in list. */
 	struct rte_mempool *mp;          /**< The mempool owning the object. */
 	rte_iova_t iova;                 /**< IO address of the object. */
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
@@ -152,7 +151,7 @@ struct rte_mempool_objhdr {
 /**
  * A list of object headers type
  */
-STAILQ_HEAD(rte_mempool_objhdr_list, rte_mempool_objhdr);
+RTE_STAILQ_HEAD(rte_mempool_objhdr_list, rte_mempool_objhdr);
 
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
 
@@ -171,7 +170,7 @@ struct rte_mempool_objtlr {
 /**
  * A list of memory where objects are stored
  */
-STAILQ_HEAD(rte_mempool_memhdr_list, rte_mempool_memhdr);
+RTE_STAILQ_HEAD(rte_mempool_memhdr_list, rte_mempool_memhdr);
 
 /**
  * Callback used to free a memory chunk
@@ -186,7 +185,7 @@ typedef void (rte_mempool_memchunk_free_cb_t)(struct rte_mempool_memhdr *memhdr,
  * and physically contiguous.
  */
 struct rte_mempool_memhdr {
-	STAILQ_ENTRY(rte_mempool_memhdr) next; /**< Next in list. */
+	RTE_STAILQ_ENTRY(rte_mempool_memhdr) next; /**< Next in list. */
 	struct rte_mempool *mp;  /**< The mempool owning the chunk */
 	void *addr;              /**< Virtual address of the chunk */
 	rte_iova_t iova;         /**< IO address of the chunk */
diff --git a/lib/pci/rte_pci.h b/lib/pci/rte_pci.h
index 1f33d687f4..71cbd441c7 100644
--- a/lib/pci/rte_pci.h
+++ b/lib/pci/rte_pci.h
@@ -18,7 +18,6 @@ extern "C" {
 
 #include <stdio.h>
 #include <limits.h>
-#include <sys/queue.h>
 #include <inttypes.h>
 #include <sys/types.h>
 
diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
index 16718ca7f1..43ce1a29d4 100644
--- a/lib/ring/rte_ring_core.h
+++ b/lib/ring/rte_ring_core.h
@@ -26,7 +26,6 @@ extern "C" {
 #include <stdio.h>
 #include <stdint.h>
 #include <string.h>
-#include <sys/queue.h>
 #include <errno.h>
 #include <rte_common.h>
 #include <rte_config.h>
diff --git a/lib/table/rte_swx_table.h b/lib/table/rte_swx_table.h
index e23f2304c6..f93e5f3f95 100644
--- a/lib/table/rte_swx_table.h
+++ b/lib/table/rte_swx_table.h
@@ -16,7 +16,8 @@ extern "C" {
  */
 
 #include <stdint.h>
-#include <sys/queue.h>
+
+#include <rte_os.h>
 
 /** Match type. */
 enum rte_swx_table_match_type {
@@ -68,7 +69,7 @@ struct rte_swx_table_entry {
 	/** Used to facilitate the membership of this table entry to a
 	 * linked list.
 	 */
-	TAILQ_ENTRY(rte_swx_table_entry) node;
+	RTE_TAILQ_ENTRY(rte_swx_table_entry) node;
 
 	/** Key value for the current entry. Array of *key_size* bytes or NULL
 	 * if the *key_size* for the current table is 0.
@@ -111,7 +112,7 @@ struct rte_swx_table_entry {
 };
 
 /** List of table entries. */
-TAILQ_HEAD(rte_swx_table_entry_list, rte_swx_table_entry);
+RTE_TAILQ_HEAD(rte_swx_table_entry_list, rte_swx_table_entry);
 
 /**
  * Table memory footprint get
diff --git a/lib/table/rte_swx_table_selector.h b/lib/table/rte_swx_table_selector.h
index 71b6a74810..62988d2856 100644
--- a/lib/table/rte_swx_table_selector.h
+++ b/lib/table/rte_swx_table_selector.h
@@ -16,7 +16,6 @@ extern "C" {
  */
 
 #include <stdint.h>
-#include <sys/queue.h>
 
 #include <rte_compat.h>
 
@@ -56,7 +55,7 @@ struct rte_swx_table_selector_params {
 /** Group member parameters. */
 struct rte_swx_table_selector_member {
 	/** Linked list connectivity. */
-	TAILQ_ENTRY(rte_swx_table_selector_member) node;
+	RTE_TAILQ_ENTRY(rte_swx_table_selector_member) node;
 
 	/** Member ID. */
 	uint32_t member_id;
@@ -66,7 +65,7 @@ struct rte_swx_table_selector_member {
 };
 
 /** List of group members. */
-TAILQ_HEAD(rte_swx_table_selector_member_list, rte_swx_table_selector_member);
+RTE_TAILQ_HEAD(rte_swx_table_selector_member_list, rte_swx_table_selector_member);
 
 /** Group parameters. */
 struct rte_swx_table_selector_group {
diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index e0b67721b6..e4a445e709 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -32,7 +32,7 @@ vhost_user_iotlb_pending_remove_all(struct vhost_virtqueue *vq)
 
 	rte_rwlock_write_lock(&vq->iotlb_pending_lock);
 
-	TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next, temp_node) {
 		TAILQ_REMOVE(&vq->iotlb_pending_list, node, next);
 		rte_mempool_put(vq->iotlb_pool, node);
 	}
@@ -100,7 +100,8 @@ vhost_user_iotlb_pending_remove(struct vhost_virtqueue *vq,
 
 	rte_rwlock_write_lock(&vq->iotlb_pending_lock);
 
-	TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next,
+				temp_node) {
 		if (node->iova < iova)
 			continue;
 		if (node->iova >= iova + size)
@@ -121,7 +122,7 @@ vhost_user_iotlb_cache_remove_all(struct vhost_virtqueue *vq)
 
 	rte_rwlock_write_lock(&vq->iotlb_lock);
 
-	TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
 		TAILQ_REMOVE(&vq->iotlb_list, node, next);
 		rte_mempool_put(vq->iotlb_pool, node);
 	}
@@ -141,7 +142,7 @@ vhost_user_iotlb_cache_random_evict(struct vhost_virtqueue *vq)
 
 	entry_idx = rte_rand() % vq->iotlb_cache_nr;
 
-	TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
 		if (!entry_idx) {
 			TAILQ_REMOVE(&vq->iotlb_list, node, next);
 			rte_mempool_put(vq->iotlb_pool, node);
@@ -218,7 +219,7 @@ vhost_user_iotlb_cache_remove(struct vhost_virtqueue *vq,
 
 	rte_rwlock_write_lock(&vq->iotlb_lock);
 
-	TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
 		/* Sorted list */
 		if (unlikely(iova + size < node->iova))
 			break;
diff --git a/lib/vhost/rte_vdpa_dev.h b/lib/vhost/rte_vdpa_dev.h
index bfada387b0..b0f494815f 100644
--- a/lib/vhost/rte_vdpa_dev.h
+++ b/lib/vhost/rte_vdpa_dev.h
@@ -71,7 +71,7 @@ struct rte_vdpa_dev_ops {
  * vdpa device structure includes device address and device operations.
  */
 struct rte_vdpa_device {
-	TAILQ_ENTRY(rte_vdpa_device) next;
+	RTE_TAILQ_ENTRY(rte_vdpa_device) next;
 	/** Generic device information */
 	struct rte_device *device;
 	/** vdpa device operations */
diff --git a/lib/vhost/vdpa.c b/lib/vhost/vdpa.c
index 99a926a772..6dd91859ac 100644
--- a/lib/vhost/vdpa.c
+++ b/lib/vhost/vdpa.c
@@ -115,7 +115,7 @@ rte_vdpa_unregister_device(struct rte_vdpa_device *dev)
 	int ret = -1;
 
 	rte_spinlock_lock(&vdpa_device_list_lock);
-	TAILQ_FOREACH_SAFE(cur_dev, &vdpa_device_list, next, tmp_dev) {
+	RTE_TAILQ_FOREACH_SAFE(cur_dev, &vdpa_device_list, next, tmp_dev) {
 		if (dev != cur_dev)
 			continue;
 
-- 
2.30.2



* [dpdk-dev] [PATCH v4 2/6] eal: add function for control thread creation
  2021-08-18 21:19  4% ` [dpdk-dev] [PATCH v4 0/6] Enable the internal EAL thread API Narcisa Ana Maria Vasile
@ 2021-08-18 21:19  4%   ` Narcisa Ana Maria Vasile
  0 siblings, 0 replies; 200+ results
From: Narcisa Ana Maria Vasile @ 2021-08-18 21:19 UTC (permalink / raw)
  To: dev, thomas, dmitry.kozliuk, khot, navasile, dmitrym, roretzla,
	talshn, ocardona
  Cc: bruce.richardson, david.marchand, pallavi.kadam

From: Narcisa Vasile <navasile@microsoft.com>

The existing rte_ctrl_thread_create() function will be replaced
with rte_thread_ctrl_thread_create() that uses the internal
EAL thread API.

This patch only introduces the new control thread creation
function. Replacing the old function must follow the ABI change
procedure to avoid an ABI break.

Signed-off-by: Narcisa Vasile <navasile@microsoft.com>
---
 lib/eal/common/eal_common_thread.c | 81 ++++++++++++++++++++++++++++++
 lib/eal/include/rte_thread.h       | 27 ++++++++++
 lib/eal/version.map                |  1 +
 3 files changed, 109 insertions(+)
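
The wrapper-plus-context pattern used in the diff below can be sketched
with plain pthreads. This is an illustration only: it does not use the
rte_thread_* API, it skips the affinity, priority and name handling, and
all demo_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the thread function type used in the patch. */
typedef void *(*demo_thread_func)(void *);

/* Heap-allocated context handed from the creator to the new thread. */
struct demo_ctrl_ctx {
	demo_thread_func start_routine;
	void *arg;
};

/*
 * Runs in the new thread: copy what is needed out of the context, free
 * it, then call the user routine and pass its result through.
 */
static void *
demo_ctrl_wrapper(void *arg)
{
	struct demo_ctrl_ctx *ctx = arg;
	demo_thread_func start_routine = ctx->start_routine;
	void *routine_arg = ctx->arg;

	free(ctx);
	return start_routine(routine_arg);
}

/* Return 0 on success or a positive errno-style value on failure. */
static int
demo_ctrl_thread_create(pthread_t *thread, demo_thread_func start_routine,
		void *arg)
{
	struct demo_ctrl_ctx *ctx;
	int ret;

	if (thread == NULL || start_routine == NULL)
		return EINVAL;

	ctx = malloc(sizeof(*ctx));
	if (ctx == NULL)
		return ENOMEM;

	ctx->start_routine = start_routine;
	ctx->arg = arg;

	ret = pthread_create(thread, NULL, demo_ctrl_wrapper, ctx);
	if (ret != 0)
		free(ctx); /* the wrapper never ran, so the creator still owns ctx */
	return ret;
}

/* Example routine: doubles the integer it is given and returns its address. */
static void *
demo_double(void *arg)
{
	int *value = arg;

	*value *= 2;
	return value;
}
```

Ownership of the context moves to the new thread only once thread creation
succeeds. On every earlier failure the creator frees it, which matches the
cleanup label in the patch.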

diff --git a/lib/eal/common/eal_common_thread.c b/lib/eal/common/eal_common_thread.c
index 1a52f42a2b..79545c67d9 100644
--- a/lib/eal/common/eal_common_thread.c
+++ b/lib/eal/common/eal_common_thread.c
@@ -259,6 +259,87 @@ rte_ctrl_thread_create(pthread_t *thread, const char *name,
 	return -ret;
 }
 
+struct rte_thread_ctrl_ctx {
+	rte_thread_func start_routine;
+	void *arg;
+	const char *name;
+};
+
+static void *ctrl_thread_wrapper(void *arg)
+{
+	struct internal_config *conf = eal_get_internal_configuration();
+	rte_cpuset_t *cpuset = &conf->ctrl_cpuset;
+	struct rte_thread_ctrl_ctx *ctx = arg;
+	rte_thread_func start_routine = ctx->start_routine;
+	void *routine_arg = ctx->arg;
+
+	__rte_thread_init(rte_lcore_id(), cpuset);
+
+	if (ctx->name != NULL) {
+		if (rte_thread_name_set(rte_thread_self(), ctx->name) < 0)
+			RTE_LOG(DEBUG, EAL, "Cannot set name for ctrl thread\n");
+	}
+
+	free(arg);
+
+	return start_routine(routine_arg);
+}
+
+int
+rte_thread_ctrl_thread_create(rte_thread_t *thread, const char *name,
+		rte_thread_func start_routine, void *arg)
+{
+	int ret;
+	rte_thread_attr_t attr;
+	struct internal_config *conf = eal_get_internal_configuration();
+	rte_cpuset_t *cpuset = &conf->ctrl_cpuset;
+	struct rte_thread_ctrl_ctx *ctx = NULL;
+
+	if (start_routine == NULL) {
+		ret = EINVAL;
+		goto cleanup;
+	}
+
+	ctx = malloc(sizeof(*ctx));
+	if (ctx == NULL) {
+		ret = ENOMEM;
+		goto cleanup;
+	}
+
+	ctx->start_routine = start_routine;
+	ctx->arg = arg;
+	ctx->name = name;
+
+	ret = rte_thread_attr_init(&attr);
+	if (ret != 0) {
+		RTE_LOG(DEBUG, EAL, "Cannot init ctrl thread attributes\n");
+		goto cleanup;
+	}
+
+	ret = rte_thread_attr_set_affinity(&attr, cpuset);
+	if (ret != 0) {
+		RTE_LOG(DEBUG, EAL, "Cannot set affinity attribute for ctrl thread\n");
+		goto cleanup;
+	}
+	ret = rte_thread_attr_set_priority(&attr, RTE_THREAD_PRIORITY_NORMAL);
+	if (ret != 0) {
+		RTE_LOG(DEBUG, EAL, "Cannot set priority attribute for ctrl thread\n");
+		goto cleanup;
+	}
+
+	ret = rte_thread_create(thread, &attr, ctrl_thread_wrapper, ctx);
+	if (ret != 0) {
+		RTE_LOG(DEBUG, EAL, "Cannot create ctrl thread\n");
+		goto cleanup;
+	}
+
+	return 0;
+
+cleanup:
+	free(ctx);
+	return ret;
+}
+
 int
 rte_thread_register(void)
 {
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 2f6258e336..e34101cc98 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -455,6 +455,33 @@ int rte_thread_barrier_destroy(rte_thread_barrier *barrier);
 __rte_experimental
 int rte_thread_name_set(rte_thread_t thread_id, const char *name);
 
+/**
+ * Create a control thread.
+ *
+ * Set affinity and thread name. The affinity of the new thread is based
+ * on the CPU affinity retrieved at the time rte_eal_init() was called,
+ * with the dataplane and service lcores excluded.
+ *
+ * @param thread
+ *   Filled with the thread id of the newly created thread.
+ *
+ * @param name
+ *   The name of the control thread (max 16 characters including '\0').
+ *
+ * @param start_routine
+ *   Function to be executed by the new thread.
+ *
+ * @param arg
+ *   Argument passed to start_routine.
+ *
+ * @return
+ *   On success, return 0;
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_ctrl_thread_create(rte_thread_t *thread, const char *name,
+		rte_thread_func start_routine, void *arg);
+
 /**
  * Create a TLS data key visible to all threads in the process.
  * the created key is later used to get/set a value.
diff --git a/lib/eal/version.map b/lib/eal/version.map
index 7ce8dcea07..67569b1bf9 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -447,6 +447,7 @@ EXPERIMENTAL {
 	rte_thread_barrier_wait;
 	rte_thread_barrier_destroy;
 	rte_thread_name_set;
+	rte_thread_ctrl_thread_create;
 };
 
 INTERNAL {
-- 
2.31.0.vfs.0.1



* [dpdk-dev] [PATCH v4 0/6] Enable the internal EAL thread API
  @ 2021-08-18 21:19  4% ` Narcisa Ana Maria Vasile
  2021-08-18 21:19  4%   ` [dpdk-dev] [PATCH v4 2/6] eal: add function for control thread creation Narcisa Ana Maria Vasile
  0 siblings, 1 reply; 200+ results
From: Narcisa Ana Maria Vasile @ 2021-08-18 21:19 UTC (permalink / raw)
  To: dev, thomas, dmitry.kozliuk, khot, navasile, dmitrym, roretzla,
	talshn, ocardona
  Cc: bruce.richardson, david.marchand, pallavi.kadam

From: Narcisa Vasile <navasile@microsoft.com>

This patchset enables the new EAL thread API.
The newly defined thread attributes, priority and affinity,
are used in eal/windows when creating threads. Similarly,
eal/linux/eal.c and eal/freebsd/eal.c have been changed
to initialize priority to a default value and set thread attributes.

The user can choose between the rte_thread_* API and
a 3rd party thread library through a meson flag
called "use_external_thread_lib".
By default, this flag is set to FALSE, which means Windows libraries
and applications will use the EAL rte_thread_* API
defined in windows/rte_thread.c for managing threads.
When the flag is set to TRUE, the common/rte_thread.c file is compiled
and an external thread library is used.

This patchset adds a new function for creating control threads that
uses the new thread API.
It enables the usage of the new function in Windows code and common code.
The old function is kept to avoid an ABI break; however, its definition
is commented out on Windows, since the pthread_t and pthread_attr_t
arguments that it receives have been replaced with the new API on Windows.
This allows testing the "eal: Add EAL API for threading" series that this
patchset depends on.

The ethdev lib also contains some changes that break the ABI.
Enabling the new EAL thread API will probably require going through
the proper ABI change process.

Depends-on: series-18172 ("eal: Add EAL API for threading")

v4:
- Free resources on error path
- Use RTE_FINI to unload kernel32.dll

v3:
- use RTE_INIT to only load kernel32.dll once and get function
  pointer to SetThreadDescription()
- minor fixes

v2:
- fix typo in SetThreadDescription_type function pointer
- add Depends-on on all patches to fix apply errors.
- modify cover letter

Narcisa Vasile (6):
  eal: add function that sets thread name
  eal: add function for control thread creation
  Enable the new EAL thread API in app, drivers and examples
  lib: enable the new EAL thread API
  eal: set affinity and priority attributes
  Allow choice between internal EAL thread API and external lib

 app/test/process.h                            |   8 +-
 app/test/test_lcores.c                        |  18 +-
 app/test/test_link_bonding.c                  |  14 +-
 app/test/test_lpm_perf.c                      |  12 +-
 config/meson.build                            |   1 -
 drivers/bus/dpaa/base/qbman/bman_driver.c     |   5 +-
 drivers/bus/dpaa/base/qbman/dpaa_sys.c        |  14 +-
 drivers/bus/dpaa/base/qbman/process.c         |   6 +-
 drivers/bus/dpaa/dpaa_bus.c                   |  14 +-
 drivers/bus/fslmc/portal/dpaa2_hw_dpio.c      |  19 +-
 drivers/common/dpaax/compat.h                 |   2 +-
 drivers/common/mlx5/windows/mlx5_common_os.h  |   1 +
 drivers/compress/mlx5/mlx5_compress.c         |  14 +-
 drivers/event/dlb2/dlb2.c                     |   2 +-
 drivers/event/dlb2/pf/base/dlb2_osdep.h       |   7 +-
 drivers/mempool/dpaa/dpaa_mempool.c           |   2 +-
 drivers/net/af_xdp/rte_eth_af_xdp.c           |  18 +-
 drivers/net/ark/ark_ethdev.c                  |   4 +-
 drivers/net/ark/ark_pktgen.c                  |   4 +-
 drivers/net/atlantic/atl_ethdev.c             |   4 +-
 drivers/net/atlantic/atl_types.h              |   4 +-
 .../net/atlantic/hw_atl/hw_atl_utils_fw2x.c   |  26 +--
 drivers/net/axgbe/axgbe_common.h              |   2 +-
 drivers/net/axgbe/axgbe_dev.c                 |   8 +-
 drivers/net/axgbe/axgbe_ethdev.c              |   8 +-
 drivers/net/axgbe/axgbe_ethdev.h              |   8 +-
 drivers/net/axgbe/axgbe_i2c.c                 |   4 +-
 drivers/net/axgbe/axgbe_mdio.c                |   8 +-
 drivers/net/axgbe/axgbe_phy_impl.c            |   6 +-
 drivers/net/bnxt/bnxt.h                       |  16 +-
 drivers/net/bnxt/bnxt_cpr.c                   |   4 +-
 drivers/net/bnxt/bnxt_ethdev.c                |  54 ++---
 drivers/net/bnxt/bnxt_irq.c                   |   8 +-
 drivers/net/bnxt/bnxt_reps.c                  |  10 +-
 drivers/net/bnxt/tf_ulp/bnxt_ulp.c            |  34 ++--
 drivers/net/bnxt/tf_ulp/bnxt_ulp.h            |   4 +-
 drivers/net/bnxt/tf_ulp/ulp_fc_mgr.c          |  28 +--
 drivers/net/bnxt/tf_ulp/ulp_fc_mgr.h          |   8 +-
 drivers/net/bnxt/tf_ulp/ulp_ha_mgr.c          |   4 +-
 drivers/net/bnxt/tf_ulp/ulp_ha_mgr.h          |   2 +-
 drivers/net/dpaa/dpaa_ethdev.c                |   2 +-
 drivers/net/dpaa/dpaa_rxtx.c                  |   2 +-
 drivers/net/ena/base/ena_plat_dpdk.h          |  15 +-
 drivers/net/enic/enic.h                       |   2 +-
 drivers/net/ice/ice_dcf_parent.c              |   8 +-
 drivers/net/ixgbe/ixgbe_ethdev.c              |   6 +-
 drivers/net/ixgbe/ixgbe_ethdev.h              |   2 +-
 drivers/net/mlx5/linux/mlx5_os.c              |   2 +-
 drivers/net/mlx5/mlx5.c                       |  20 +-
 drivers/net/mlx5/mlx5.h                       |   2 +-
 drivers/net/mlx5/mlx5_txpp.c                  |   8 +-
 drivers/net/mlx5/windows/mlx5_flow_os.c       |  10 +-
 drivers/net/mlx5/windows/mlx5_os.c            |   2 +-
 drivers/net/qede/base/bcm_osal.h              |   8 +-
 drivers/net/vhost/rte_eth_vhost.c             |  24 +--
 .../net/virtio/virtio_user/virtio_user_dev.c  |  30 +--
 .../net/virtio/virtio_user/virtio_user_dev.h  |   2 +-
 drivers/vdpa/ifc/ifcvf_vdpa.c                 |  49 +++--
 drivers/vdpa/mlx5/mlx5_vdpa.c                 |  24 +--
 drivers/vdpa/mlx5/mlx5_vdpa.h                 |   4 +-
 drivers/vdpa/mlx5/mlx5_vdpa_event.c           |  51 ++---
 examples/kni/main.c                           |   1 +
 .../pthread_shim/pthread_shim.h               |   1 +
 lib/eal/common/eal_common_options.c           |   6 +-
 lib/eal/common/eal_common_thread.c            | 105 +++++++++-
 lib/eal/common/eal_common_trace.c             |   1 +
 lib/eal/common/eal_private.h                  |   2 +-
 lib/eal/common/eal_thread.h                   |   6 +
 lib/eal/common/malloc_mp.c                    |   2 +
 lib/eal/common/rte_thread.c                   |  17 ++
 lib/eal/freebsd/eal.c                         |  53 +++--
 lib/eal/freebsd/eal_alarm.c                   |  12 +-
 lib/eal/freebsd/eal_interrupts.c              |   6 +-
 lib/eal/freebsd/eal_thread.c                  |  10 +-
 lib/eal/include/rte_lcore.h                   |   6 +
 lib/eal/include/rte_per_lcore.h               |   2 +-
 lib/eal/include/rte_thread.h                  |  43 ++++
 lib/eal/linux/eal.c                           |  55 +++--
 lib/eal/linux/eal_alarm.c                     |  10 +-
 lib/eal/linux/eal_interrupts.c                |   8 +-
 lib/eal/linux/eal_thread.c                    |  11 +-
 lib/eal/linux/eal_timer.c                     |   6 +-
 lib/eal/version.map                           |   6 +-
 lib/eal/windows/eal.c                         |  44 +++-
 lib/eal/windows/eal_interrupts.c              |   8 +-
 lib/eal/windows/eal_thread.c                  |  35 +---
 lib/eal/windows/eal_windows.h                 |  10 -
 lib/eal/windows/include/pthread.h             | 192 ------------------
 lib/eal/windows/include/rte_windows.h         |   1 +
 lib/eal/windows/meson.build                   |   7 +-
 lib/eal/windows/rte_thread.c                  |  76 +++++++
 lib/ethdev/rte_ethdev.c                       |   4 +-
 lib/ethdev/rte_ethdev_core.h                  |   4 +-
 lib/ethdev/rte_flow.c                         |   4 +-
 lib/eventdev/rte_event_eth_rx_adapter.c       |   1 +
 lib/vhost/vhost.c                             |   1 +
 meson_options.txt                             |   2 +
 97 files changed, 785 insertions(+), 661 deletions(-)
 delete mode 100644 lib/eal/windows/include/pthread.h

-- 
2.31.0.vfs.0.1



* [dpdk-dev] [PATCH v2] ethdev: fix representor port ID search by name
  2021-07-12 16:17  3% [dpdk-dev] [PATCH] ethdev: fix representor port ID search by name Andrew Rybchenko
  2021-07-19  6:58  0% ` Xueming(Steven) Li
  2021-07-29  4:20  0% ` Xueming(Steven) Li
@ 2021-08-18 14:00  3% ` Andrew Rybchenko
  2 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2021-08-18 14:00 UTC (permalink / raw)
  To: Ajit Khaparde, Somnath Kotur, John Daley, Hyong Youb Kim,
	Beilei Xing, Qiming Yang, Qi Zhang, Haiyue Wang, Matan Azrad,
	Shahaf Shuler, Viacheslav Ovsiienko, Thomas Monjalon,
	Ferruh Yigit
  Cc: dev, Viacheslav Galaktionov, Xueming Li

From: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>

Getting a list of representors from a representor does not make sense.
Instead, a parent device should be used.

To this end, extend the rte_eth_dev_data structure to include the port ID
of the parent device for representors.
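
A minimal sketch of the lookup order this enables, assuming a simplified
stand-in for the ethdev structures: the device's own port ID is tried first
and the parent port ID is used only when the first call reports -ENOTSUP.
All sketch_* names are hypothetical and are not part of ethdev; the stub
pretends that only port 0 can translate a VF number.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of rte_eth_dev_data. */
struct sketch_dev_data {
	uint16_t port_id;
	uint16_t parent_port_id; /* meaningful only for representors */
};

/*
 * Hypothetical stand-in for the representor ID lookup: pretend that only
 * port 0 (the parent) can translate a VF number into a representor ID.
 */
static int sketch_repr_id_get(uint16_t port_id, int vf, uint16_t *repr_id)
{
	if (port_id != 0)
		return -ENOTSUP;
	if (vf < 0)
		return -EINVAL;
	*repr_id = (uint16_t)vf;
	return 0;
}

/* Try the device's own port first, then fall back to its parent. */
static int sketch_lookup(const struct sketch_dev_data *data, int vf,
		uint16_t *repr_id)
{
	int ret = sketch_repr_id_get(data->port_id, vf, repr_id);

	if (ret == -ENOTSUP)
		ret = sketch_repr_id_get(data->parent_port_id, vf, repr_id);
	return ret;
}

/* Usage: a representor on port 3 whose parent is port 0 resolves VF 5. */
static uint16_t sketch_lookup_demo(void)
{
	struct sketch_dev_data repr = { .port_id = 3, .parent_port_id = 0 };
	uint16_t id = UINT16_MAX;
	int ret = sketch_lookup(&repr, 5, &id);

	assert(ret == 0);
	(void)ret;
	return id;
}
```

Only -ENOTSUP triggers the fallback, so any other error from the first
call is returned to the caller unchanged.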

Signed-off-by: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
---
The new field is added into the hole in rte_eth_dev_data structure.
The patch does not change ABI, but extra care is required since ABI
check is disabled for the structure because of the libabigail bug [1].

Potentially it is bad for out-of-tree drivers which implement
representors but do not fill in the new parent_port_id field in
rte_eth_dev_data structure. Do we care?

Maybe the patch should add lines to release notes, but I'd like
to get initial feedback first.

mlx5 changes should be reviewed by maintainers very carefully, since
we are not sure that we patched it correctly.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=28060

 drivers/net/bnxt/bnxt_reps.c             |  1 +
 drivers/net/enic/enic_vf_representor.c   |  1 +
 drivers/net/i40e/i40e_vf_representor.c   |  1 +
 drivers/net/ice/ice_dcf_vf_representor.c |  1 +
 drivers/net/ixgbe/ixgbe_vf_representor.c |  1 +
 drivers/net/mlx5/linux/mlx5_os.c         | 17 +++++++++++++++++
 drivers/net/mlx5/windows/mlx5_os.c       | 17 +++++++++++++++++
 lib/ethdev/ethdev_driver.h               |  6 +++---
 lib/ethdev/rte_class_eth.c               | 22 ++++++++++++++++++++--
 lib/ethdev/rte_ethdev.c                  |  8 ++++----
 lib/ethdev/rte_ethdev_core.h             |  4 ++++
 11 files changed, 70 insertions(+), 9 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_reps.c b/drivers/net/bnxt/bnxt_reps.c
index bdbad53b7d..902591cd39 100644
--- a/drivers/net/bnxt/bnxt_reps.c
+++ b/drivers/net/bnxt/bnxt_reps.c
@@ -187,6 +187,7 @@ int bnxt_representor_init(struct rte_eth_dev *eth_dev, void *params)
 	eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR |
 					RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
 	eth_dev->data->representor_id = rep_params->vf_id;
+	eth_dev->data->parent_port_id = rep_params->parent_dev->data->port_id;
 
 	rte_eth_random_addr(vf_rep_bp->dflt_mac_addr);
 	memcpy(vf_rep_bp->mac_addr, vf_rep_bp->dflt_mac_addr,
diff --git a/drivers/net/enic/enic_vf_representor.c b/drivers/net/enic/enic_vf_representor.c
index 79dd6e5640..6ee7967ce9 100644
--- a/drivers/net/enic/enic_vf_representor.c
+++ b/drivers/net/enic/enic_vf_representor.c
@@ -662,6 +662,7 @@ int enic_vf_representor_init(struct rte_eth_dev *eth_dev, void *init_params)
 	eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR |
 					RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
 	eth_dev->data->representor_id = vf->vf_id;
+	eth_dev->data->parent_port_id = pf->port_id;
 	eth_dev->data->mac_addrs = rte_zmalloc("enic_mac_addr_vf",
 		sizeof(struct rte_ether_addr) *
 		ENIC_UNICAST_PERFECT_FILTERS, 0);
diff --git a/drivers/net/i40e/i40e_vf_representor.c b/drivers/net/i40e/i40e_vf_representor.c
index 0481b55381..865b637585 100644
--- a/drivers/net/i40e/i40e_vf_representor.c
+++ b/drivers/net/i40e/i40e_vf_representor.c
@@ -514,6 +514,7 @@ i40e_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params)
 	ethdev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR |
 					RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
 	ethdev->data->representor_id = representor->vf_id;
+	ethdev->data->parent_port_id = pf->dev_data->port_id;
 
 	/* Setting the number queues allocated to the VF */
 	ethdev->data->nb_rx_queues = vf->vsi->nb_qps;
diff --git a/drivers/net/ice/ice_dcf_vf_representor.c b/drivers/net/ice/ice_dcf_vf_representor.c
index 970461f3e9..c7cd3fd290 100644
--- a/drivers/net/ice/ice_dcf_vf_representor.c
+++ b/drivers/net/ice/ice_dcf_vf_representor.c
@@ -418,6 +418,7 @@ ice_dcf_vf_repr_init(struct rte_eth_dev *vf_rep_eth_dev, void *init_param)
 
 	vf_rep_eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
 	vf_rep_eth_dev->data->representor_id = repr->vf_id;
+	vf_rep_eth_dev->data->parent_port_id = repr->dcf_eth_dev->data->port_id;
 
 	vf_rep_eth_dev->data->mac_addrs = &repr->mac_addr;
 
diff --git a/drivers/net/ixgbe/ixgbe_vf_representor.c b/drivers/net/ixgbe/ixgbe_vf_representor.c
index d5b636a194..7a2063849e 100644
--- a/drivers/net/ixgbe/ixgbe_vf_representor.c
+++ b/drivers/net/ixgbe/ixgbe_vf_representor.c
@@ -197,6 +197,7 @@ ixgbe_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params)
 
 	ethdev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
 	ethdev->data->representor_id = representor->vf_id;
+	ethdev->data->parent_port_id = representor->pf_ethdev->data->port_id;
 
 	/* Set representor device ops */
 	ethdev->dev_ops = &ixgbe_vf_representor_dev_ops;
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 5f8766aa48..a68fa7beb7 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1677,6 +1677,23 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	if (priv->representor) {
 		eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
 		eth_dev->data->representor_id = priv->representor_id;
+		MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
+			struct mlx5_priv *opriv =
+				rte_eth_devices[port_id].data->dev_private;
+			if (opriv &&
+			    opriv->master &&
+			    opriv->domain_id == priv->domain_id &&
+			    opriv->sh == priv->sh) {
+				eth_dev->data->parent_port_id =
+					rte_eth_devices[port_id].data->port_id;
+				break;
+			}
+		}
+		if (port_id >= RTE_MAX_ETHPORTS) {
+			DRV_LOG(ERR, "no master device for representor");
+			err = ENODEV;
+			goto error;
+		}
 	}
 	priv->mp_id.port_id = eth_dev->data->port_id;
 	strlcpy(priv->mp_id.name, MLX5_MP_NAME, RTE_MP_MAX_NAME_LEN);
diff --git a/drivers/net/mlx5/windows/mlx5_os.c b/drivers/net/mlx5/windows/mlx5_os.c
index 7e1df1c751..0c5a02bfcb 100644
--- a/drivers/net/mlx5/windows/mlx5_os.c
+++ b/drivers/net/mlx5/windows/mlx5_os.c
@@ -543,6 +543,23 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	if (priv->representor) {
 		eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
 		eth_dev->data->representor_id = priv->representor_id;
+		MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
+			struct mlx5_priv *opriv =
+				rte_eth_devices[port_id].data->dev_private;
+			if (opriv &&
+			    opriv->master &&
+			    opriv->domain_id == priv->domain_id &&
+			    opriv->sh == priv->sh) {
+				eth_dev->data->parent_port_id =
+					rte_eth_devices[port_id].data->port_id;
+				break;
+			}
+		}
+		if (port_id >= RTE_MAX_ETHPORTS) {
+			DRV_LOG(ERR, "no master device for representor");
+			err = ENODEV;
+			goto error;
+		}
 	}
 	/*
 	 * Store associated network device interface index. This index
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index fd5b7ca550..d1a1499538 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1287,8 +1287,8 @@ struct rte_eth_devargs {
  * For backward compatibility, if no representor info, direct
  * map legacy VF (no controller and pf).
  *
- * @param ethdev
- *  Handle of ethdev port.
+ * @param port_id
+ *  Port ID of the backing device.
  * @param type
  *  Representor type.
  * @param controller
@@ -1305,7 +1305,7 @@ struct rte_eth_devargs {
  */
 __rte_internal
 int
-rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
+rte_eth_representor_id_get(uint16_t port_id,
 			   enum rte_eth_representor_type type,
 			   int controller, int pf, int representor_port,
 			   uint16_t *repr_id);
diff --git a/lib/ethdev/rte_class_eth.c b/lib/ethdev/rte_class_eth.c
index 1fe5fa1f36..167d2d798c 100644
--- a/lib/ethdev/rte_class_eth.c
+++ b/lib/ethdev/rte_class_eth.c
@@ -95,14 +95,32 @@ eth_representor_cmp(const char *key __rte_unused,
 		c = i / (np * nf);
 		p = (i / nf) % np;
 		f = i % nf;
-		if (rte_eth_representor_id_get(edev,
+		/*
+		 * rte_eth_representor_id_get expects to receive port ID of
+		 * the master device, but in order to maintain compatibility
+		 * with mlx5's hardware bonding and legacy representor
+		 * specification using just VF numbers, the representor's port
+		 * ID is tried first.
+		 */
+		ret = rte_eth_representor_id_get(edev->data->port_id,
 			eth_da.type,
 			eth_da.nb_mh_controllers == 0 ? -1 :
 					eth_da.mh_controllers[c],
 			eth_da.nb_ports == 0 ? -1 : eth_da.ports[p],
 			eth_da.nb_representor_ports == 0 ? -1 :
 					eth_da.representor_ports[f],
-			&id) < 0)
+			&id);
+		if (ret == -ENOTSUP)
+			ret = rte_eth_representor_id_get(
+				edev->data->parent_port_id,
+				eth_da.type,
+				eth_da.nb_mh_controllers == 0 ? -1 :
+						eth_da.mh_controllers[c],
+				eth_da.nb_ports == 0 ? -1 : eth_da.ports[p],
+				eth_da.nb_representor_ports == 0 ? -1 :
+						eth_da.representor_ports[f],
+				&id);
+		if (ret < 0)
 			continue;
 		if (data->representor_id == id)
 			return 0;
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 9d95cd11e1..228ef7bf23 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -5997,7 +5997,7 @@ rte_eth_devargs_parse(const char *dargs, struct rte_eth_devargs *eth_da)
 }
 
 int
-rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
+rte_eth_representor_id_get(uint16_t port_id,
 			   enum rte_eth_representor_type type,
 			   int controller, int pf, int representor_port,
 			   uint16_t *repr_id)
@@ -6013,7 +6013,7 @@ rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
 		return -EINVAL;
 
 	/* Get PMD representor range info. */
-	ret = rte_eth_representor_info_get(ethdev->data->port_id, NULL);
+	ret = rte_eth_representor_info_get(port_id, NULL);
 	if (ret == -ENOTSUP && type == RTE_ETH_REPRESENTOR_VF &&
 	    controller == -1 && pf == -1) {
 		/* Direct mapping for legacy VF representor. */
@@ -6028,7 +6028,7 @@ rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
 	if (info == NULL)
 		return -ENOMEM;
 	info->nb_ranges_alloc = n;
-	ret = rte_eth_representor_info_get(ethdev->data->port_id, info);
+	ret = rte_eth_representor_info_get(port_id, info);
 	if (ret < 0)
 		goto out;
 
@@ -6047,7 +6047,7 @@ rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
 			continue;
 		if (info->ranges[i].id_end < info->ranges[i].id_base) {
 			RTE_LOG(WARNING, EAL, "Port %hu invalid representor ID Range %u - %u, entry %d\n",
-				ethdev->data->port_id, info->ranges[i].id_base,
+				port_id, info->ranges[i].id_base,
 				info->ranges[i].id_end, i);
 			continue;
 
diff --git a/lib/ethdev/rte_ethdev_core.h b/lib/ethdev/rte_ethdev_core.h
index edf96de2dc..13cb84b52f 100644
--- a/lib/ethdev/rte_ethdev_core.h
+++ b/lib/ethdev/rte_ethdev_core.h
@@ -185,6 +185,10 @@ struct rte_eth_dev_data {
 			/**< Switch-specific identifier.
 			 *   Valid if RTE_ETH_DEV_REPRESENTOR in dev_flags.
 			 */
+	uint16_t parent_port_id;
+			/**< Port ID of the backing device.
+			 *   Valid if RTE_ETH_DEV_REPRESENTOR in dev_flags.
+			 */
 
 	pthread_mutex_t flow_ops_mutex; /**< rte_flow ops mutex. */
 	uint64_t reserved_64s[4]; /**< Reserved for future fields */
-- 
2.30.2



* [dpdk-dev] [PATCH v3 2/6] eal: add function for control thread creation
  2021-08-18 13:44  4% ` [dpdk-dev] [PATCH v3 " Narcisa Ana Maria Vasile
@ 2021-08-18 13:44  4%   ` Narcisa Ana Maria Vasile
  0 siblings, 0 replies; 200+ results
From: Narcisa Ana Maria Vasile @ 2021-08-18 13:44 UTC (permalink / raw)
  To: dev, thomas, dmitry.kozliuk, khot, navasile, dmitrym, roretzla,
	talshn, ocardona
  Cc: bruce.richardson, david.marchand, pallavi.kadam

From: Narcisa Vasile <navasile@microsoft.com>

The existing rte_ctrl_thread_create() function will be replaced
with rte_thread_ctrl_thread_create() that uses the internal
EAL thread API.

This patch only introduces the new control thread creation
function. Replacing the old function needs to be done according
to the ABI change procedures, to avoid an ABI break.
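
A minimal sketch of the context hand-off pattern used by the new function,
written against plain pthreads so that it builds without DPDK. The creator
allocates a small context, the wrapper running in the new thread copies what
it needs and frees the context, and the creator frees it only if the thread
was never started. All sketch_* names are hypothetical; affinity, priority
and thread naming are left out.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

typedef void *(*sketch_thread_func)(void *);

/* Context handed from the creating thread to the new thread. */
struct sketch_ctrl_ctx {
	sketch_thread_func start_routine;
	void *arg;
};

/* Runs in the new thread: copy the fields, free the context, then call
 * the user routine. */
static void *sketch_ctrl_wrapper(void *arg)
{
	struct sketch_ctrl_ctx *ctx = arg;
	sketch_thread_func start_routine = ctx->start_routine;
	void *routine_arg = ctx->arg;

	free(ctx);
	return start_routine(routine_arg);
}

/* Returns 0 on success or a positive errno-style value on failure. */
static int sketch_ctrl_thread_create(pthread_t *thread,
		sketch_thread_func start_routine, void *arg)
{
	struct sketch_ctrl_ctx *ctx;
	int ret;

	if (thread == NULL || start_routine == NULL)
		return EINVAL;

	ctx = malloc(sizeof(*ctx));
	if (ctx == NULL)
		return ENOMEM;

	ctx->start_routine = start_routine;
	ctx->arg = arg;

	ret = pthread_create(thread, NULL, sketch_ctrl_wrapper, ctx);
	if (ret != 0)
		free(ctx); /* the wrapper never ran, so the creator frees */
	return ret;
}

/* Example routine: increments the int it is given. */
static void *sketch_bump(void *arg)
{
	int *counter = arg;

	(*counter)++;
	return arg;
}

/* Usage: start one thread, wait for it, return the updated counter. */
static int sketch_demo(void)
{
	pthread_t thread;
	int counter = 41;
	int ret;

	ret = sketch_ctrl_thread_create(&thread, sketch_bump, &counter);
	assert(ret == 0);
	ret = pthread_join(thread, NULL);
	assert(ret == 0);
	(void)ret;
	return counter;
}
```

Ownership of the context passes to the new thread once creation succeeds,
which is why the error path is the only place the creator frees it.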

Signed-off-by: Narcisa Vasile <navasile@microsoft.com>
---
 lib/eal/common/eal_common_thread.c | 81 ++++++++++++++++++++++++++++++
 lib/eal/include/rte_thread.h       | 27 ++++++++++
 lib/eal/version.map                |  1 +
 3 files changed, 109 insertions(+)

diff --git a/lib/eal/common/eal_common_thread.c b/lib/eal/common/eal_common_thread.c
index 1a52f42a2b..79545c67d9 100644
--- a/lib/eal/common/eal_common_thread.c
+++ b/lib/eal/common/eal_common_thread.c
@@ -259,6 +259,87 @@ rte_ctrl_thread_create(pthread_t *thread, const char *name,
 	return -ret;
 }
 
+struct rte_thread_ctrl_ctx {
+	rte_thread_func start_routine;
+	void *arg;
+	const char *name;
+};
+
+static void *ctrl_thread_wrapper(void *arg)
+{
+	struct internal_config *conf = eal_get_internal_configuration();
+	rte_cpuset_t *cpuset = &conf->ctrl_cpuset;
+	struct rte_thread_ctrl_ctx *ctx = arg;
+	rte_thread_func start_routine = ctx->start_routine;
+	void *routine_arg = ctx->arg;
+
+	__rte_thread_init(rte_lcore_id(), cpuset);
+
+	if (ctx->name != NULL) {
+		if (rte_thread_name_set(rte_thread_self(), ctx->name) < 0)
+			RTE_LOG(DEBUG, EAL, "Cannot set name for ctrl thread\n");
+	}
+
+	free(arg);
+
+	return start_routine(routine_arg);
+}
+
+int
+rte_thread_ctrl_thread_create(rte_thread_t *thread, const char *name,
+		rte_thread_func start_routine, void *arg)
+{
+	int ret;
+	rte_thread_attr_t attr;
+	struct internal_config *conf = eal_get_internal_configuration();
+	rte_cpuset_t *cpuset = &conf->ctrl_cpuset;
+	struct rte_thread_ctrl_ctx *ctx = NULL;
+
+	if (start_routine == NULL) {
+		ret = EINVAL;
+		goto cleanup;
+	}
+
+	ctx = malloc(sizeof(*ctx));
+	if (ctx == NULL) {
+		ret = ENOMEM;
+		goto cleanup;
+	}
+
+	ctx->start_routine = start_routine;
+	ctx->arg = arg;
+	ctx->name = name;
+
+	ret = rte_thread_attr_init(&attr);
+	if (ret != 0) {
+		RTE_LOG(DEBUG, EAL, "Cannot init ctrl thread attributes\n");
+		goto cleanup;
+	}
+
+	ret = rte_thread_attr_set_affinity(&attr, cpuset);
+	if (ret != 0) {
+		RTE_LOG(DEBUG, EAL, "Cannot set affinity attribute for ctrl thread\n");
+		goto cleanup;
+	}
+	ret = rte_thread_attr_set_priority(&attr, RTE_THREAD_PRIORITY_NORMAL);
+	if (ret != 0) {
+		RTE_LOG(DEBUG, EAL, "Cannot set priority attribute for ctrl thread\n");
+		goto cleanup;
+	}
+
+	ret = rte_thread_create(thread, &attr, ctrl_thread_wrapper, ctx);
+	if (ret != 0) {
+		RTE_LOG(DEBUG, EAL, "Cannot create ctrl thread\n");
+		goto cleanup;
+	}
+
+	return 0;
+
+cleanup:
+	free(ctx);
+	return ret;
+}
+
 int
 rte_thread_register(void)
 {
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 2f6258e336..e34101cc98 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -455,6 +455,33 @@ int rte_thread_barrier_destroy(rte_thread_barrier *barrier);
 __rte_experimental
 int rte_thread_name_set(rte_thread_t thread_id, const char *name);
 
+/**
+ * Create a control thread.
+ *
+ * Set affinity and thread name. The affinity of the new thread is based
+ * on the CPU affinity retrieved at the time rte_eal_init() was called,
+ * with the dataplane and service lcores then excluded.
+ *
+ * @param thread
+ *   Filled with the thread ID of the newly created thread.
+ *
+ * @param name
+ *   The name of the control thread (max 16 characters including '\0').
+ *
+ * @param start_routine
+ *   Function to be executed by the new thread.
+ *
+ * @param arg
+ *   Argument passed to start_routine.
+ *
+ * @return
+ *   On success, return 0;
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_ctrl_thread_create(rte_thread_t *thread, const char *name,
+		rte_thread_func start_routine, void *arg);
+
 /**
  * Create a TLS data key visible to all threads in the process.
  * the created key is later used to get/set a value.
diff --git a/lib/eal/version.map b/lib/eal/version.map
index 7ce8dcea07..67569b1bf9 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -447,6 +447,7 @@ EXPERIMENTAL {
 	rte_thread_barrier_wait;
 	rte_thread_barrier_destroy;
 	rte_thread_name_set;
+	rte_thread_ctrl_thread_create;
 };
 
 INTERNAL {
-- 
2.31.0.vfs.0.1



* [dpdk-dev] [PATCH v3 0/6] Enable the internal EAL thread API
  @ 2021-08-18 13:44  4% ` Narcisa Ana Maria Vasile
  2021-08-18 13:44  4%   ` [dpdk-dev] [PATCH v3 2/6] eal: add function for control thread creation Narcisa Ana Maria Vasile
  0 siblings, 1 reply; 200+ results
From: Narcisa Ana Maria Vasile @ 2021-08-18 13:44 UTC (permalink / raw)
  To: dev, thomas, dmitry.kozliuk, khot, navasile, dmitrym, roretzla,
	talshn, ocardona
  Cc: bruce.richardson, david.marchand, pallavi.kadam

From: Narcisa Vasile <navasile@microsoft.com>

This patchset enables the new EAL thread API.
The newly defined thread attributes, priority and affinity,
are used in eal/windows when creating threads. Similarly,
eal/linux/eal.c and eal/freebsd/eal.c have been changed
to initialize priority to a default value and set thread attributes.

The user can choose between the rte_thread_* API and
a 3rd party thread library through a meson flag
called "use_external_thread_lib".
By default, this flag is set to FALSE, which means Windows libraries
and applications will use the EAL rte_thread_* API
defined in windows/rte_thread.c for managing threads.
When the flag is set to TRUE, the common/rte_thread.c file is compiled
and an external thread library is used.

This patchset adds a new function for creating control threads that
uses the new thread API.
It enables the usage of the new function in Windows code and common code.
The old function is kept to avoid an ABI break; however, its definition
is commented out on Windows, since the pthread_t and pthread_attr_t
arguments that it receives have been replaced with the new API on Windows.
This allows testing the "eal: Add EAL API for threading" series that this
patchset depends on.

The ethdev lib also contains some changes that break the ABI.
Enabling the new EAL thread API will probably require going through
the proper ABI change process.

Depends-on: series-18172 ("eal: Add EAL API for threading")

v3:
- use RTE_INIT to only load kernel32.dll once and get function
  pointer to SetThreadDescription()
- minor fixes

v2:
- fix typo in SetThreadDescription_type function pointer
- add Depends-on on all patches to fix apply errors.
- modify cover letter

Narcisa Vasile (6):
  eal: add function that sets thread name
  eal: add function for control thread creation
  Enable the new EAL thread API in app, drivers and examples
  lib: enable the new EAL thread API
  eal: set affinity and priority attributes
  Allow choice between internal EAL thread API and external lib

 app/test/process.h                            |   8 +-
 app/test/test_lcores.c                        |  18 +-
 app/test/test_link_bonding.c                  |  14 +-
 app/test/test_lpm_perf.c                      |  12 +-
 config/meson.build                            |   1 -
 drivers/bus/dpaa/base/qbman/bman_driver.c     |   5 +-
 drivers/bus/dpaa/base/qbman/dpaa_sys.c        |  14 +-
 drivers/bus/dpaa/base/qbman/process.c         |   6 +-
 drivers/bus/dpaa/dpaa_bus.c                   |  14 +-
 drivers/bus/fslmc/portal/dpaa2_hw_dpio.c      |  19 +-
 drivers/common/dpaax/compat.h                 |   2 +-
 drivers/common/mlx5/windows/mlx5_common_os.h  |   1 +
 drivers/compress/mlx5/mlx5_compress.c         |  14 +-
 drivers/event/dlb2/dlb2.c                     |   2 +-
 drivers/event/dlb2/pf/base/dlb2_osdep.h       |   7 +-
 drivers/mempool/dpaa/dpaa_mempool.c           |   2 +-
 drivers/net/af_xdp/rte_eth_af_xdp.c           |  18 +-
 drivers/net/ark/ark_ethdev.c                  |   4 +-
 drivers/net/ark/ark_pktgen.c                  |   4 +-
 drivers/net/atlantic/atl_ethdev.c             |   4 +-
 drivers/net/atlantic/atl_types.h              |   4 +-
 .../net/atlantic/hw_atl/hw_atl_utils_fw2x.c   |  26 +--
 drivers/net/axgbe/axgbe_common.h              |   2 +-
 drivers/net/axgbe/axgbe_dev.c                 |   8 +-
 drivers/net/axgbe/axgbe_ethdev.c              |   8 +-
 drivers/net/axgbe/axgbe_ethdev.h              |   8 +-
 drivers/net/axgbe/axgbe_i2c.c                 |   4 +-
 drivers/net/axgbe/axgbe_mdio.c                |   8 +-
 drivers/net/axgbe/axgbe_phy_impl.c            |   6 +-
 drivers/net/bnxt/bnxt.h                       |  16 +-
 drivers/net/bnxt/bnxt_cpr.c                   |   4 +-
 drivers/net/bnxt/bnxt_ethdev.c                |  54 ++---
 drivers/net/bnxt/bnxt_irq.c                   |   8 +-
 drivers/net/bnxt/bnxt_reps.c                  |  10 +-
 drivers/net/bnxt/tf_ulp/bnxt_ulp.c            |  34 ++--
 drivers/net/bnxt/tf_ulp/bnxt_ulp.h            |   4 +-
 drivers/net/bnxt/tf_ulp/ulp_fc_mgr.c          |  28 +--
 drivers/net/bnxt/tf_ulp/ulp_fc_mgr.h          |   8 +-
 drivers/net/bnxt/tf_ulp/ulp_ha_mgr.c          |   4 +-
 drivers/net/bnxt/tf_ulp/ulp_ha_mgr.h          |   2 +-
 drivers/net/dpaa/dpaa_ethdev.c                |   2 +-
 drivers/net/dpaa/dpaa_rxtx.c                  |   2 +-
 drivers/net/ena/base/ena_plat_dpdk.h          |  15 +-
 drivers/net/enic/enic.h                       |   2 +-
 drivers/net/ice/ice_dcf_parent.c              |   8 +-
 drivers/net/ixgbe/ixgbe_ethdev.c              |   6 +-
 drivers/net/ixgbe/ixgbe_ethdev.h              |   2 +-
 drivers/net/mlx5/linux/mlx5_os.c              |   2 +-
 drivers/net/mlx5/mlx5.c                       |  20 +-
 drivers/net/mlx5/mlx5.h                       |   2 +-
 drivers/net/mlx5/mlx5_txpp.c                  |   8 +-
 drivers/net/mlx5/windows/mlx5_flow_os.c       |  10 +-
 drivers/net/mlx5/windows/mlx5_os.c            |   2 +-
 drivers/net/qede/base/bcm_osal.h              |   8 +-
 drivers/net/vhost/rte_eth_vhost.c             |  24 +--
 .../net/virtio/virtio_user/virtio_user_dev.c  |  30 +--
 .../net/virtio/virtio_user/virtio_user_dev.h  |   2 +-
 drivers/vdpa/ifc/ifcvf_vdpa.c                 |  49 +++--
 drivers/vdpa/mlx5/mlx5_vdpa.c                 |  24 +--
 drivers/vdpa/mlx5/mlx5_vdpa.h                 |   4 +-
 drivers/vdpa/mlx5/mlx5_vdpa_event.c           |  51 ++---
 examples/kni/main.c                           |   1 +
 .../pthread_shim/pthread_shim.h               |   1 +
 lib/eal/common/eal_common_options.c           |   6 +-
 lib/eal/common/eal_common_thread.c            | 105 +++++++++-
 lib/eal/common/eal_common_trace.c             |   1 +
 lib/eal/common/eal_private.h                  |   2 +-
 lib/eal/common/eal_thread.h                   |   6 +
 lib/eal/common/malloc_mp.c                    |   2 +
 lib/eal/common/rte_thread.c                   |  17 ++
 lib/eal/freebsd/eal.c                         |  53 +++--
 lib/eal/freebsd/eal_alarm.c                   |  12 +-
 lib/eal/freebsd/eal_interrupts.c              |   6 +-
 lib/eal/freebsd/eal_thread.c                  |  10 +-
 lib/eal/include/rte_lcore.h                   |   6 +
 lib/eal/include/rte_per_lcore.h               |   2 +-
 lib/eal/include/rte_thread.h                  |  43 ++++
 lib/eal/linux/eal.c                           |  55 +++--
 lib/eal/linux/eal_alarm.c                     |  10 +-
 lib/eal/linux/eal_interrupts.c                |   8 +-
 lib/eal/linux/eal_thread.c                    |  11 +-
 lib/eal/linux/eal_timer.c                     |   6 +-
 lib/eal/version.map                           |   6 +-
 lib/eal/windows/eal.c                         |  44 +++-
 lib/eal/windows/eal_interrupts.c              |   8 +-
 lib/eal/windows/eal_thread.c                  |  35 +---
 lib/eal/windows/eal_windows.h                 |  10 -
 lib/eal/windows/include/pthread.h             | 192 ------------------
 lib/eal/windows/include/rte_windows.h         |   1 +
 lib/eal/windows/meson.build                   |   7 +-
 lib/eal/windows/rte_thread.c                  |  68 +++++++
 lib/ethdev/rte_ethdev.c                       |   4 +-
 lib/ethdev/rte_ethdev_core.h                  |   4 +-
 lib/ethdev/rte_flow.c                         |   4 +-
 lib/eventdev/rte_event_eth_rx_adapter.c       |   1 +
 lib/vhost/vhost.c                             |   1 +
 meson_options.txt                             |   2 +
 97 files changed, 777 insertions(+), 661 deletions(-)
 delete mode 100644 lib/eal/windows/include/pthread.h

-- 
2.31.0.vfs.0.1



* [dpdk-dev] [PATCH 2/4] mempool: add non-IO flag
  @ 2021-08-18  9:07  4% ` Dmitry Kozlyuk
  0 siblings, 0 replies; 200+ results
From: Dmitry Kozlyuk @ 2021-08-18  9:07 UTC (permalink / raw)
  To: dev; +Cc: Matan Azrad, Olivier Matz, Andrew Rybchenko

A mempool is a generic allocator that is not necessarily used for device
IO operations, and its memory is not necessarily used for DMA. Add the
MEMPOOL_F_NON_IO flag to mark such mempools.
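
As an illustration only (not part of this patch), a driver could consult
the flag roughly as sketched below. The flag values are copied from the
rte_mempool.h hunk in this patch; struct ex_mempool and
ex_pool_needs_dma_mapping() are hypothetical stand-ins, not DPDK API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as they appear in the rte_mempool.h hunk of this patch. */
#define MEMPOOL_F_NO_IOVA_CONTIG 0x0020
#define MEMPOOL_F_NON_IO         0x0040

/* Hypothetical stand-in for struct rte_mempool; only the flags matter here. */
struct ex_mempool {
	unsigned int flags;
};

/*
 * Hypothetical driver-side helper: a pool marked MEMPOOL_F_NON_IO is a hint
 * that its objects never reach the device, so DMA mapping can be skipped.
 */
static bool
ex_pool_needs_dma_mapping(const struct ex_mempool *mp)
{
	assert(mp != NULL);
	return (mp->flags & MEMPOOL_F_NON_IO) == 0;
}
```

With this sketch, a pool whose flags include MEMPOOL_F_NON_IO makes the
helper return false, while any other pool is treated as before.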

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 doc/guides/rel_notes/release_21_11.rst | 3 +++
 lib/mempool/rte_mempool.h              | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index d707a554ef..dc9b98b862 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -84,6 +84,9 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* mempool: Added ``MEMPOOL_F_NON_IO`` flag to give a hint to DPDK components
+  that objects from this pool will not be used for device IO (e.g. DMA).
+
 
 ABI Changes
 -----------
diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 1e9b8f0229..7f0657ab16 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -263,6 +263,7 @@ struct rte_mempool {
 #define MEMPOOL_F_SC_GET         0x0008 /**< Default get is "single-consumer".*/
 #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
 #define MEMPOOL_F_NO_IOVA_CONTIG 0x0020 /**< Don't need IOVA contiguous objs. */
+#define MEMPOOL_F_NON_IO         0x0040 /**< Not used for device IO (DMA). */
 
 /**
  * @internal When debug is enabled, store some statistics.
@@ -992,6 +993,9 @@ typedef void (rte_mempool_ctor_t)(struct rte_mempool *, void *);
  *     "single-consumer". Otherwise, it is "multi-consumers".
  *   - MEMPOOL_F_NO_IOVA_CONTIG: If set, allocated objects won't
  *     necessarily be contiguous in IO memory.
+ *   - MEMPOOL_F_NON_IO: If set, the mempool is considered to be
+ *     never used for device IO, i.e. DMA operations,
+ *     which may affect some PMD behavior.
  * @return
  *   The pointer to the new allocated mempool, on success. NULL on error
  *   with rte_errno set appropriately. Possible rte_errno values include:
-- 
2.25.1



* Re: [dpdk-dev] [dpdk-ci] [PATCH] version: 21.11-rc0
  2021-08-17 15:19  0%     ` David Marchand
@ 2021-08-17 16:02  0%       ` Ali Alnubani
  0 siblings, 0 replies; 200+ results
From: Ali Alnubani @ 2021-08-17 16:02 UTC (permalink / raw)
  To: David Marchand, Lincoln Lavoie, NBU-Contact-Thomas Monjalon
  Cc: ci, dev, Ray Kinsella

Hi,

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Tuesday, August 17, 2021 6:20 PM
> To: Lincoln Lavoie <lylavoie@iol.unh.edu>
> Cc: NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; ci@dpdk.org;
> dev <dev@dpdk.org>; Ray Kinsella <mdr@ashroe.eu>; Ali Alnubani
> <alialnu@nvidia.com>
> Subject: Re: [dpdk-ci] [PATCH] version: 21.11-rc0
> 
> On Tue, Aug 17, 2021 at 2:04 PM Lincoln Lavoie <lylavoie@iol.unh.edu>
> wrote:
> >
> > Hi David,
> >
> > ABI testing was disabled / stopped on Friday in the Community CI lab.
> > Patches from before that for 21.11 would have still had the test run and
> > could have failures listed. I'm not sure if there is a way to "remove"
> > those failure marks from patchworks.  But, for all new patches since then,
> > ABI hasn't been run.
> 
> I don't think we can easily clean those reports in patchwork.
> Copying Ali, in case he has an idea but otherwise we can live with this.
> 

We can override each check by another one with "success" as the status and "skipped" as the description maybe?

> Thanks Lincoln.
> 
> 
> --
> David Marchand



* Re: [dpdk-dev] [dpdk-ci] [PATCH] version: 21.11-rc0
  2021-08-17 12:04  4%   ` [dpdk-dev] [dpdk-ci] " Lincoln Lavoie
@ 2021-08-17 15:19  0%     ` David Marchand
  2021-08-17 16:02  0%       ` Ali Alnubani
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2021-08-17 15:19 UTC (permalink / raw)
  To: Lincoln Lavoie; +Cc: Thomas Monjalon, ci, dev, Ray Kinsella, Ali Alnubani

On Tue, Aug 17, 2021 at 2:04 PM Lincoln Lavoie <lylavoie@iol.unh.edu> wrote:
>
> Hi David,
>
> ABI testing was disabled / stopped on Friday in the Community CI lab.  Patches from before that for 21.11 would have still had the test run and could have failures listed. I'm not sure if there is a way to "remove" those failure marks from patchworks.  But, for all new patches since then, ABI hasn't been run.

I don't think we can easily clean those reports in patchwork.
Copying Ali, in case he has an idea but otherwise we can live with this.

Thanks Lincoln.


-- 
David Marchand



* Re: [dpdk-dev] [dpdk-ci] [PATCH] version: 21.11-rc0
  2021-08-17  6:34  4% ` [dpdk-dev] " David Marchand
@ 2021-08-17 12:04  4%   ` Lincoln Lavoie
  2021-08-17 15:19  0%     ` David Marchand
  0 siblings, 1 reply; 200+ results
From: Lincoln Lavoie @ 2021-08-17 12:04 UTC (permalink / raw)
  To: David Marchand; +Cc: Thomas Monjalon, ci, dev, Ray Kinsella

Hi David,

ABI testing was disabled / stopped on Friday in the Community CI lab.
Patches from before that for 21.11 would have still had the test run and
could have failures listed. I'm not sure if there is a way to "remove"
those failure marks from patchworks.  But, for all new patches since then,
ABI hasn't been run.

Cheers,
Lincoln

On Tue, Aug 17, 2021 at 2:34 AM David Marchand <david.marchand@redhat.com>
wrote:

> On Sun, Aug 8, 2021 at 9:27 PM Thomas Monjalon <thomas@monjalon.net>
> wrote:
> > diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
> > new file mode 100644
> > index 0000000000..d707a554ef
> > --- /dev/null
> > +++ b/doc/guides/rel_notes/release_21_11.rst
> > @@ -0,0 +1,136 @@
>
> [snip]
>
> > +Known Issues
> > +------------
> > +
> > +.. This section should contain new known issues in this release. Sample format:
> > +
> > +   * **Add title in present tense with full stop.**
> > +
> > +     Add a short 1-2 sentence description of the known issue
> > +     in the present tense. Add information on any known workarounds.
> > +
> > +   This section is a comment. Do not overwrite or remove it.
> > +   Also, make sure to start the actual text at the margin.
> > +   =======================================================
> > +
> > +
>
> The known issue "**Last mbuf segment not implicitly reset.**" added in
> 21.08 release notes still applies to 21.11.
> But this can be fixed later, patches are starting to accumulate and
> some CI failures are due to patches being applied to 21.08.
>
> The rest lgtm, so:
> Acked-by: David Marchand <david.marchand@redhat.com>
>
> Applied, thanks.
>
>
> On this last subject, this mail is a ping to CI labs owners.
> 21.11 release won't preserve ABI compat with previous releases, so
> please disable ABI checks until 22.02.
>
>
> --
> David Marchand
>
>

-- 
*Lincoln Lavoie*
Principal Engineer, Broadband Technologies
21 Madbury Rd., Ste. 100, Durham, NH 03824
lylavoie@iol.unh.edu
https://www.iol.unh.edu
+1-603-674-2755 (m)


* Re: [dpdk-dev] [PATCH 21.11 v2 0/3] octeontx build only on 64-bit Linux
  @ 2021-08-17  8:46  0%   ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2021-08-17  8:46 UTC (permalink / raw)
  To: Pavan Nikhilesh; +Cc: dev, Thomas Monjalon, Jerin Jacob Kollanukkaran

On Thu, Mar 25, 2021 at 3:52 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> This is a reorg of the patches from Pavan.
> It has been discussed that it should wait for DPDK 21.11
> for ABI compatibility reason.
>
> Pavan Nikhilesh (3):
>   net/thunderx: enable build only on 64-bit Linux
>   common/octeontx: enable build only on 64-bit Linux
>   common/octeontx2: enable build only on 64-bit Linux
>
>  drivers/common/octeontx/meson.build   |  6 ++++++
>  drivers/common/octeontx2/meson.build  |  4 ++--
>  drivers/compress/octeontx/meson.build |  6 ++++++
>  drivers/crypto/octeontx/meson.build   |  7 +++++--
>  drivers/event/octeontx/meson.build    |  6 ++++++
>  drivers/event/octeontx2/meson.build   |  4 ++--
>  drivers/mempool/octeontx/meson.build  |  5 +++--
>  drivers/mempool/octeontx2/meson.build |  9 ++-------
>  drivers/net/octeontx/meson.build      |  4 ++--
>  drivers/net/octeontx2/meson.build     | 10 ++--------
>  drivers/net/thunderx/meson.build      |  4 ++--
>  drivers/raw/octeontx2_dma/meson.build | 10 ++++++----
>  12 files changed, 44 insertions(+), 31 deletions(-)

There were a couple of cleanups (indent etc..) and changes in meson files.
This series does not apply cleanly on the main branch.
Could you rebase it?

I noticed that the net/cnxk driver does not have this check, but it is
disabled anyway since it depends on common/cnxk.
Is it worth adding the check for consistency?


-- 
David Marchand



* Re: [dpdk-dev] [PATCH] version: 21.11-rc0
  2021-08-08 19:26 11% [dpdk-dev] [PATCH] version: 21.11-rc0 Thomas Monjalon
  2021-08-12 14:36  0% ` Ferruh Yigit
@ 2021-08-17  6:34  4% ` David Marchand
  2021-08-17 12:04  4%   ` [dpdk-dev] [dpdk-ci] " Lincoln Lavoie
  1 sibling, 1 reply; 200+ results
From: David Marchand @ 2021-08-17  6:34 UTC (permalink / raw)
  To: Thomas Monjalon, ci; +Cc: dev, Ray Kinsella

On Sun, Aug 8, 2021 at 9:27 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
> new file mode 100644
> index 0000000000..d707a554ef
> --- /dev/null
> +++ b/doc/guides/rel_notes/release_21_11.rst
> @@ -0,0 +1,136 @@

[snip]

> +Known Issues
> +------------
> +
> +.. This section should contain new known issues in this release. Sample format:
> +
> +   * **Add title in present tense with full stop.**
> +
> +     Add a short 1-2 sentence description of the known issue
> +     in the present tense. Add information on any known workarounds.
> +
> +   This section is a comment. Do not overwrite or remove it.
> +   Also, make sure to start the actual text at the margin.
> +   =======================================================
> +
> +

The known issue "**Last mbuf segment not implicitly reset.**" added in
21.08 release notes still applies to 21.11.
But this can be fixed later, patches are starting to accumulate and
some CI failures are due to patches being applied to 21.08.

The rest lgtm, so:
Acked-by: David Marchand <david.marchand@redhat.com>

Applied, thanks.


On this last subject, this mail is a ping to CI labs owners.
21.11 release won't preserve ABI compat with previous releases, so
please disable ABI checks until 22.02.


-- 
David Marchand



* [dpdk-dev] Minutes of Technical Board Meeting, 2021-08-11
       [not found]     <e600e472-2b39-7f07-d20e-9d6fe8e6d515@intel.com>
@ 2021-08-16  9:34  3% ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2021-08-16  9:34 UTC (permalink / raw)
  To: techboard; +Cc: dev

Minutes of Technical Board Meeting, 2021-08-11

Members Attending: 8/12
   - Aaron Conole
   - Ferruh Yigit (Chair)
   - Hemant Agrawal
   - Honnappa Nagarahalli
   - Jerin Jacob
   - Kevin Traynor
   - Konstantin Ananyev
   - Stephen Hemminger

NOTE: The Technical Board meetings take place every second Wednesday
on https://meet.jit.si/DPDK at 3 pm UTC.
Meetings are public, and DPDK community members are welcome to attend.
Agenda and minutes can be found at http://core.dpdk.org/techboard/minutes

NOTE: Next meeting will be on Wednesday 2021-08-25 @3pm UTC,
and will be chaired by Hemant.


#1 Extending stable ABI / API to two years
    * No decision yet; deferred to the next meeting.
    * Can continue executing the tasks listed in the Excel sheet during v21.11.

#2 Documenting criteria on adding/removing members to technical board
    * Document needs further reviews, please review.
    * Will set a deadline for the document review in next meeting.

#3 Atomic API
    * Atomic built-ins are used because of old compilers.
    * If we can drop old compiler support, we can switch to atomic APIs.
    * Discussion to drop RHEL7 is going on in the mailing list.

#4 Exception path sample app
    * No objection to having the sample app in principle.
    * Details and design can be discussed more when patches are available.

#5 github repo access for extending CI for Arm support
    * Honnappa and Aaron will figure out the details on exactly what is required.
    * Later we can create a policy around it; right now it is for
      Thomas/Aaron/Honnappa to manage.
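
For item #3, a rough sketch of the two styles, assuming that "atomic APIs"
refers to C11 <stdatomic.h> (the minutes do not spell this out). The ex_
helper names are hypothetical, and __atomic_fetch_add is a GCC/Clang
built-in rather than standard C.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang __atomic built-in: operates on a plain integer object. */
static uint32_t
ex_bump_builtin(uint32_t *counter)
{
	assert(counter != NULL);
	return __atomic_fetch_add(counter, 1, __ATOMIC_RELAXED) + 1;
}

/* C11 <stdatomic.h>: the object itself must have an atomic type. */
static uint32_t
ex_bump_c11(atomic_uint_least32_t *counter)
{
	assert(counter != NULL);
	return (uint32_t)atomic_fetch_add_explicit(counter, 1,
			memory_order_relaxed) + 1;
}
```

Both helpers return the incremented value. The built-in form also compiles
with toolchains that lack <stdatomic.h>, which is the constraint the
minutes mention.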


* [dpdk-dev] [PATCH v6] eal: remove sys/queue.h from public headers.
  2021-08-13  3:36  1%     ` [dpdk-dev] [PATCHv5] " William Tu
  2021-08-13 18:59  0%       ` Dmitry Kozlyuk
@ 2021-08-14  2:51  1%       ` William Tu
  2021-08-18 23:26  1%         ` [dpdk-dev] [PATCH v7] " William Tu
  1 sibling, 1 reply; 200+ results
From: William Tu @ 2021-08-14  2:51 UTC (permalink / raw)
  To: dev; +Cc: Dmitry.Kozliuk, nick.connolly

Currently there are some public headers that include 'sys/queue.h', which
is not POSIX, but usually provided by the Linux/BSD system library.
(Not in POSIX.1, POSIX.1-2001, or POSIX.1-2008. Present on the BSDs.)
The file is missing on Windows. During the Windows build, DPDK uses a
bundled copy, so building a DPDK library works fine.  But when OVS or other
applications use DPDK as a library on Windows, the build fails with a
missing-file error, because some DPDK public headers include
'sys/queue.h'.

One solution is to install 'lib/eal/windows/include/sys/queue.h' into
the Windows environment, as in [1]. However, this means DPDK exports the
functionality of 'sys/queue.h' into the environment, which might cause
symbol, macro, and header clashes with other applications.

The patch fixes it by removing the "#include <sys/queue.h>" from
DPDK public headers, so programs including DPDK headers don't depend
on the system to provide 'sys/queue.h'. Where these public headers use
macros such as TAILQ_xxx, we replace them with RTE_-prefixed ones.
For Windows, we copy the definitions from <sys/queue.h> to rte_os.h
in Windows EAL. Note that these RTE_ macros are compatible with
<sys/queue.h>, both at the level of API (to use with <sys/queue.h>
macros in C files) and ABI (to avoid breaking it).

Additionally, TAILQ_FOREACH_SAFE is not part of <sys/queue.h>, so
the patch replaces it with RTE_TAILQ_FOREACH_SAFE.
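
A minimal sketch of the wrapper idea, assuming a platform that does ship
<sys/queue.h> (Linux/BSD). The EX_-prefixed macros, struct ex_device and
the two helpers are hypothetical; they are modelled on the description
above and are not the definitions from the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Thin wrappers: same layout and behaviour as the system macros. */
#define EX_TAILQ_HEAD(name, type) TAILQ_HEAD(name, type)
#define EX_TAILQ_ENTRY(type) TAILQ_ENTRY(type)
#define EX_TAILQ_FIRST(head) TAILQ_FIRST(head)
#define EX_TAILQ_NEXT(elem, field) TAILQ_NEXT(elem, field)

/* Removal-safe iteration: the successor is saved before the body runs. */
#define EX_TAILQ_FOREACH_SAFE(var, head, field, tvar) \
	for ((var) = EX_TAILQ_FIRST(head); \
	     (var) != NULL && ((tvar) = EX_TAILQ_NEXT(var, field), 1); \
	     (var) = (tvar))

struct ex_device {
	EX_TAILQ_ENTRY(ex_device) next; /* list linkage */
	int id;
};

EX_TAILQ_HEAD(ex_device_list, ex_device);

/* Append up to n devices; plain TAILQ_* macros work on the wrapped types. */
static int
ex_list_fill(struct ex_device_list *list, int n)
{
	int added = 0;

	assert(list != NULL);
	for (int i = 0; i < n; i++) {
		struct ex_device *dev = malloc(sizeof(*dev));

		if (dev == NULL)
			break;
		dev->id = i;
		TAILQ_INSERT_TAIL(list, dev, next);
		added++;
	}
	return added;
}

/* Free every element while iterating; returns the number freed. */
static int
ex_list_clear(struct ex_device_list *list)
{
	struct ex_device *dev, *tdev;
	int freed = 0;

	assert(list != NULL);
	EX_TAILQ_FOREACH_SAFE(dev, list, next, tdev) {
		TAILQ_REMOVE(list, dev, next);
		free(dev);
		freed++;
	}
	return freed;
}
```

Because the wrappers expand to the system macros, a C file can mix them
with TAILQ_INSERT_TAIL and TAILQ_REMOVE, which is the API compatibility
described above.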

[1] http://mails.dpdk.org/archives/dev/2021-August/216304.html

Suggested-by: Nick Connolly <nick.connolly@mayadata.io>
Suggested-by: Dmitry Kozliuk <Dmitry.Kozliuk@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
---
v5-v6:
* fix tab/indent issue, fix typo and spelling
* fix duplicate RTE_TAILQ_FOREACH_SAFE
* fix build error due to drivers/net/mlx5/mlx5_flow_meter.c
---
 drivers/bus/auxiliary/private.h            |  1 +
 drivers/bus/auxiliary/rte_bus_auxiliary.h  |  5 ++--
 drivers/bus/dpaa/dpaa_bus.c                |  4 +--
 drivers/bus/fslmc/fslmc_bus.c              |  4 +--
 drivers/bus/fslmc/fslmc_vfio.c             |  9 ++++---
 drivers/bus/ifpga/rte_bus_ifpga.h          |  8 +++---
 drivers/bus/pci/pci_params.c               |  2 ++
 drivers/bus/pci/rte_bus_pci.h              | 13 +++++----
 drivers/bus/pci/windows/pci.c              |  3 +++
 drivers/bus/pci/windows/pci_netuio.c       |  2 ++
 drivers/bus/vdev/rte_bus_vdev.h            |  7 +++--
 drivers/bus/vdev/vdev.c                    |  3 ++-
 drivers/bus/vmbus/rte_bus_vmbus.h          | 13 +++++----
 drivers/net/bnxt/tf_ulp/bnxt_ulp.c         |  2 +-
 drivers/net/bonding/rte_eth_bond_flow.c    |  2 +-
 drivers/net/failsafe/failsafe_flow.c       |  2 +-
 drivers/net/i40e/i40e_ethdev.c             |  9 ++++---
 drivers/net/i40e/i40e_ethdev.h             |  1 +
 drivers/net/i40e/i40e_flow.c               |  6 ++---
 drivers/net/i40e/i40e_hash.c               |  2 +-
 drivers/net/i40e/rte_pmd_i40e.c            |  6 ++---
 drivers/net/iavf/iavf_generic_flow.c       | 14 +++++-----
 drivers/net/ice/ice_dcf_ethdev.c           |  1 +
 drivers/net/ice/ice_ethdev.c               |  4 +--
 drivers/net/ice/ice_generic_flow.c         | 14 +++++-----
 drivers/net/ipn3ke/ipn3ke_flow.c           |  2 +-
 drivers/net/mlx5/mlx5_flow_dv.c            |  2 +-
 drivers/net/mlx5/mlx5_flow_meter.c         |  2 +-
 drivers/net/softnic/rte_eth_softnic_flow.c |  3 ++-
 drivers/net/softnic/rte_eth_softnic_swq.c  |  2 +-
 drivers/raw/dpaa2_qdma/dpaa2_qdma.c        |  2 +-
 lib/bbdev/rte_bbdev.h                      |  2 +-
 lib/cryptodev/rte_cryptodev.h              |  2 +-
 lib/cryptodev/rte_cryptodev_pmd.h          |  2 +-
 lib/eal/common/eal_common_devargs.c        |  6 +++--
 lib/eal/common/eal_common_fbarray.c        |  1 +
 lib/eal/common/eal_common_log.c            |  1 +
 lib/eal/common/eal_common_memalloc.c       |  1 +
 lib/eal/common/eal_common_options.c        |  3 ++-
 lib/eal/common/eal_trace.h                 |  2 ++
 lib/eal/freebsd/include/rte_os.h           | 15 +++++++++++
 lib/eal/include/rte_bus.h                  |  5 ++--
 lib/eal/include/rte_class.h                |  6 ++---
 lib/eal/include/rte_dev.h                  |  5 ++--
 lib/eal/include/rte_devargs.h              |  3 +--
 lib/eal/include/rte_log.h                  |  1 -
 lib/eal/include/rte_service.h              |  1 -
 lib/eal/include/rte_tailq.h                | 15 +++--------
 lib/eal/linux/include/rte_os.h             | 15 +++++++++++
 lib/eal/windows/eal_alarm.c                |  1 +
 lib/eal/windows/include/rte_os.h           | 31 ++++++++++++++++++++++
 lib/efd/rte_efd.c                          |  2 +-
 lib/ethdev/rte_ethdev_core.h               |  2 +-
 lib/hash/rte_fbk_hash.h                    |  1 -
 lib/hash/rte_thash.c                       |  2 ++
 lib/ip_frag/rte_ip_frag.h                  |  4 +--
 lib/mempool/rte_mempool.c                  |  2 +-
 lib/mempool/rte_mempool.h                  |  9 +++----
 lib/pci/rte_pci.h                          |  1 -
 lib/ring/rte_ring_core.h                   |  1 -
 lib/table/rte_swx_table.h                  |  7 ++---
 lib/table/rte_swx_table_selector.h         |  5 ++--
 lib/vhost/iotlb.c                          | 11 ++++----
 lib/vhost/rte_vdpa_dev.h                   |  2 +-
 lib/vhost/vdpa.c                           |  2 +-
 65 files changed, 192 insertions(+), 127 deletions(-)

diff --git a/drivers/bus/auxiliary/private.h b/drivers/bus/auxiliary/private.h
index 9987e8b501..d22e83cf7a 100644
--- a/drivers/bus/auxiliary/private.h
+++ b/drivers/bus/auxiliary/private.h
@@ -7,6 +7,7 @@
 
 #include <stdbool.h>
 #include <stdio.h>
+#include <sys/queue.h>
 
 #include "rte_bus_auxiliary.h"
 
diff --git a/drivers/bus/auxiliary/rte_bus_auxiliary.h b/drivers/bus/auxiliary/rte_bus_auxiliary.h
index 2462bad2ba..b1f5610404 100644
--- a/drivers/bus/auxiliary/rte_bus_auxiliary.h
+++ b/drivers/bus/auxiliary/rte_bus_auxiliary.h
@@ -19,7 +19,6 @@ extern "C" {
 #include <stdlib.h>
 #include <limits.h>
 #include <errno.h>
-#include <sys/queue.h>
 #include <stdint.h>
 #include <inttypes.h>
 
@@ -113,7 +112,7 @@ typedef int (rte_auxiliary_dma_unmap_t)(struct rte_auxiliary_device *dev,
  * A structure describing an auxiliary device.
  */
 struct rte_auxiliary_device {
-	TAILQ_ENTRY(rte_auxiliary_device) next;   /**< Next probed device. */
+	RTE_TAILQ_ENTRY(rte_auxiliary_device) next; /**< Next probed device. */
 	struct rte_device device;                 /**< Inherit core device */
 	char name[RTE_DEV_NAME_MAX_LEN + 1];      /**< ASCII device name */
 	struct rte_intr_handle intr_handle;       /**< Interrupt handle */
@@ -124,7 +123,7 @@ struct rte_auxiliary_device {
  * A structure describing an auxiliary driver.
  */
 struct rte_auxiliary_driver {
-	TAILQ_ENTRY(rte_auxiliary_driver) next; /**< Next in list. */
+	RTE_TAILQ_ENTRY(rte_auxiliary_driver) next; /**< Next in list. */
 	struct rte_driver driver;             /**< Inherit core driver. */
 	struct rte_auxiliary_bus *bus;        /**< Auxiliary bus reference. */
 	rte_auxiliary_match_t *match;         /**< Device match function. */
diff --git a/drivers/bus/dpaa/dpaa_bus.c b/drivers/bus/dpaa/dpaa_bus.c
index e499305d85..6cab2ae760 100644
--- a/drivers/bus/dpaa/dpaa_bus.c
+++ b/drivers/bus/dpaa/dpaa_bus.c
@@ -105,7 +105,7 @@ dpaa_add_to_device_list(struct rte_dpaa_device *newdev)
 	struct rte_dpaa_device *dev = NULL;
 	struct rte_dpaa_device *tdev = NULL;
 
-	TAILQ_FOREACH_SAFE(dev, &rte_dpaa_bus.device_list, next, tdev) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_dpaa_bus.device_list, next, tdev) {
 		comp = compare_dpaa_devices(newdev, dev);
 		if (comp < 0) {
 			TAILQ_INSERT_BEFORE(dev, newdev, next);
@@ -245,7 +245,7 @@ dpaa_clean_device_list(void)
 	struct rte_dpaa_device *dev = NULL;
 	struct rte_dpaa_device *tdev = NULL;
 
-	TAILQ_FOREACH_SAFE(dev, &rte_dpaa_bus.device_list, next, tdev) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_dpaa_bus.device_list, next, tdev) {
 		TAILQ_REMOVE(&rte_dpaa_bus.device_list, dev, next);
 		free(dev);
 		dev = NULL;
diff --git a/drivers/bus/fslmc/fslmc_bus.c b/drivers/bus/fslmc/fslmc_bus.c
index becc455f6b..8c8f8a298d 100644
--- a/drivers/bus/fslmc/fslmc_bus.c
+++ b/drivers/bus/fslmc/fslmc_bus.c
@@ -45,7 +45,7 @@ cleanup_fslmc_device_list(void)
 	struct rte_dpaa2_device *dev;
 	struct rte_dpaa2_device *t_dev;
 
-	TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, t_dev) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, t_dev) {
 		TAILQ_REMOVE(&rte_fslmc_bus.device_list, dev, next);
 		free(dev);
 		dev = NULL;
@@ -82,7 +82,7 @@ insert_in_device_list(struct rte_dpaa2_device *newdev)
 	struct rte_dpaa2_device *dev = NULL;
 	struct rte_dpaa2_device *tdev = NULL;
 
-	TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, tdev) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, tdev) {
 		comp = compare_dpaa2_devname(newdev, dev);
 		if (comp < 0) {
 			TAILQ_INSERT_BEFORE(dev, newdev, next);
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index c8373e627a..852fcfc4dd 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -808,7 +808,8 @@ fslmc_vfio_process_group(void)
 	bool is_dpmcp_in_blocklist = false, is_dpio_in_blocklist = false;
 	int dpmcp_count = 0, dpio_count = 0, current_device;
 
-	TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, dev_temp) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next,
+		dev_temp) {
 		if (dev->dev_type == DPAA2_MPORTAL) {
 			dpmcp_count++;
 			if (dev->device.devargs &&
@@ -825,7 +826,8 @@ fslmc_vfio_process_group(void)
 
 	/* Search the MCP as that should be initialized first. */
 	current_device = 0;
-	TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, dev_temp) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next,
+		dev_temp) {
 		if (dev->dev_type == DPAA2_MPORTAL) {
 			current_device++;
 			if (dev->device.devargs &&
@@ -872,7 +874,8 @@ fslmc_vfio_process_group(void)
 	}
 
 	current_device = 0;
-	TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, dev_temp) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next,
+		dev_temp) {
 		if (dev->dev_type == DPAA2_IO)
 			current_device++;
 		if (dev->device.devargs &&
diff --git a/drivers/bus/ifpga/rte_bus_ifpga.h b/drivers/bus/ifpga/rte_bus_ifpga.h
index b43084155a..a85e90d384 100644
--- a/drivers/bus/ifpga/rte_bus_ifpga.h
+++ b/drivers/bus/ifpga/rte_bus_ifpga.h
@@ -28,9 +28,9 @@ struct rte_afu_device;
 struct rte_afu_driver;
 
 /** Double linked list of Intel FPGA AFU device. */
-TAILQ_HEAD(ifpga_afu_dev_list, rte_afu_device);
+RTE_TAILQ_HEAD(ifpga_afu_dev_list, rte_afu_device);
 /** Double linked list of Intel FPGA AFU device drivers. */
-TAILQ_HEAD(ifpga_afu_drv_list, rte_afu_driver);
+RTE_TAILQ_HEAD(ifpga_afu_drv_list, rte_afu_driver);
 
 #define IFPGA_BUS_BITSTREAM_PATH_MAX_LEN 256
 
@@ -71,7 +71,7 @@ struct rte_afu_shared {
  * A structure describing a AFU device.
  */
 struct rte_afu_device {
-	TAILQ_ENTRY(rte_afu_device) next;       /**< Next in device list. */
+	RTE_TAILQ_ENTRY(rte_afu_device) next;       /**< Next in device list. */
 	struct rte_device device;               /**< Inherit core device */
 	struct rte_rawdev *rawdev;    /**< Point Rawdev */
 	struct rte_afu_id id;                   /**< AFU id within FPGA. */
@@ -105,7 +105,7 @@ typedef int (afu_remove_t)(struct rte_afu_device *);
  * A structure describing a AFU device.
  */
 struct rte_afu_driver {
-	TAILQ_ENTRY(rte_afu_driver) next;       /**< Next afu driver. */
+	RTE_TAILQ_ENTRY(rte_afu_driver) next;   /**< Next afu driver. */
 	struct rte_driver driver;               /**< Inherit core driver. */
 	afu_probe_t *probe;                     /**< Device Probe function. */
 	afu_remove_t *remove;                   /**< Device Remove function. */
diff --git a/drivers/bus/pci/pci_params.c b/drivers/bus/pci/pci_params.c
index 3192e9c967..717388753d 100644
--- a/drivers/bus/pci/pci_params.c
+++ b/drivers/bus/pci/pci_params.c
@@ -2,6 +2,8 @@
  * Copyright 2018 Gaëtan Rivet
  */
 
+#include <sys/queue.h>
+
 #include <rte_bus.h>
 #include <rte_bus_pci.h>
 #include <rte_dev.h>
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index 583470e831..673a2850c1 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -19,7 +19,6 @@ extern "C" {
 #include <stdlib.h>
 #include <limits.h>
 #include <errno.h>
-#include <sys/queue.h>
 #include <stdint.h>
 #include <inttypes.h>
 
@@ -37,16 +36,16 @@ struct rte_pci_device;
 struct rte_pci_driver;
 
 /** List of PCI devices */
-TAILQ_HEAD(rte_pci_device_list, rte_pci_device);
+RTE_TAILQ_HEAD(rte_pci_device_list, rte_pci_device);
 /** List of PCI drivers */
-TAILQ_HEAD(rte_pci_driver_list, rte_pci_driver);
+RTE_TAILQ_HEAD(rte_pci_driver_list, rte_pci_driver);
 
 /* PCI Bus iterators */
 #define FOREACH_DEVICE_ON_PCIBUS(p)	\
-		TAILQ_FOREACH(p, &(rte_pci_bus.device_list), next)
+		RTE_TAILQ_FOREACH(p, &(rte_pci_bus.device_list), next)
 
 #define FOREACH_DRIVER_ON_PCIBUS(p)	\
-		TAILQ_FOREACH(p, &(rte_pci_bus.driver_list), next)
+		RTE_TAILQ_FOREACH(p, &(rte_pci_bus.driver_list), next)
 
 struct rte_devargs;
 
@@ -64,7 +63,7 @@ enum rte_pci_kernel_driver {
  * A structure describing a PCI device.
  */
 struct rte_pci_device {
-	TAILQ_ENTRY(rte_pci_device) next;   /**< Next probed PCI device. */
+	RTE_TAILQ_ENTRY(rte_pci_device) next;   /**< Next probed PCI device. */
 	struct rte_device device;           /**< Inherit core device */
 	struct rte_pci_addr addr;           /**< PCI location. */
 	struct rte_pci_id id;               /**< PCI ID. */
@@ -160,7 +159,7 @@ typedef int (pci_dma_unmap_t)(struct rte_pci_device *dev, void *addr,
  * A structure describing a PCI driver.
  */
 struct rte_pci_driver {
-	TAILQ_ENTRY(rte_pci_driver) next;  /**< Next in list. */
+	RTE_TAILQ_ENTRY(rte_pci_driver) next;  /**< Next in list. */
 	struct rte_driver driver;          /**< Inherit core driver. */
 	struct rte_pci_bus *bus;           /**< PCI bus reference. */
 	rte_pci_probe_t *probe;            /**< Device probe function. */
diff --git a/drivers/bus/pci/windows/pci.c b/drivers/bus/pci/windows/pci.c
index d39a7748b8..d7bd5d6e80 100644
--- a/drivers/bus/pci/windows/pci.c
+++ b/drivers/bus/pci/windows/pci.c
@@ -1,6 +1,9 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright 2020 Mellanox Technologies, Ltd
  */
+
+#include <sys/queue.h>
+
 #include <rte_windows.h>
 #include <rte_errno.h>
 #include <rte_log.h>
diff --git a/drivers/bus/pci/windows/pci_netuio.c b/drivers/bus/pci/windows/pci_netuio.c
index 1bf9133f71..a0b175a8fc 100644
--- a/drivers/bus/pci/windows/pci_netuio.c
+++ b/drivers/bus/pci/windows/pci_netuio.c
@@ -2,6 +2,8 @@
  * Copyright(c) 2020 Intel Corporation.
  */
 
+#include <sys/queue.h>
+
 #include <rte_windows.h>
 #include <rte_errno.h>
 #include <rte_log.h>
diff --git a/drivers/bus/vdev/rte_bus_vdev.h b/drivers/bus/vdev/rte_bus_vdev.h
index fc315d10fa..2856799953 100644
--- a/drivers/bus/vdev/rte_bus_vdev.h
+++ b/drivers/bus/vdev/rte_bus_vdev.h
@@ -15,12 +15,11 @@
 extern "C" {
 #endif
 
-#include <sys/queue.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
 
 struct rte_vdev_device {
-	TAILQ_ENTRY(rte_vdev_device) next;      /**< Next attached vdev */
+	RTE_TAILQ_ENTRY(rte_vdev_device) next;      /**< Next attached vdev */
 	struct rte_device device;               /**< Inherit core device */
 };
 
@@ -53,7 +52,7 @@ rte_vdev_device_args(const struct rte_vdev_device *dev)
 }
 
 /** Double linked list of virtual device drivers. */
-TAILQ_HEAD(vdev_driver_list, rte_vdev_driver);
+RTE_TAILQ_HEAD(vdev_driver_list, rte_vdev_driver);
 
 /**
  * Probe function called for each virtual device driver once.
@@ -107,7 +106,7 @@ typedef int (rte_vdev_dma_unmap_t)(struct rte_vdev_device *dev, void *addr,
  * A virtual device driver abstraction.
  */
 struct rte_vdev_driver {
-	TAILQ_ENTRY(rte_vdev_driver) next; /**< Next in list. */
+	RTE_TAILQ_ENTRY(rte_vdev_driver) next; /**< Next in list. */
 	struct rte_driver driver;        /**< Inherited general driver. */
 	rte_vdev_probe_t *probe;         /**< Virtual device probe function. */
 	rte_vdev_remove_t *remove;       /**< Virtual device remove function. */
diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
index 281a2c34e8..a8d8b2327e 100644
--- a/drivers/bus/vdev/vdev.c
+++ b/drivers/bus/vdev/vdev.c
@@ -100,7 +100,8 @@ rte_vdev_remove_custom_scan(rte_vdev_scan_callback callback, void *user_arg)
 	struct vdev_custom_scan *custom_scan, *tmp_scan;
 
 	rte_spinlock_lock(&vdev_custom_scan_lock);
-	TAILQ_FOREACH_SAFE(custom_scan, &vdev_custom_scans, next, tmp_scan) {
+	RTE_TAILQ_FOREACH_SAFE(custom_scan, &vdev_custom_scans, next,
+				tmp_scan) {
 		if (custom_scan->callback != callback ||
 				(custom_scan->user_arg != (void *)-1 &&
 				custom_scan->user_arg != user_arg))
diff --git a/drivers/bus/vmbus/rte_bus_vmbus.h b/drivers/bus/vmbus/rte_bus_vmbus.h
index 4cf73ce815..6bcff66468 100644
--- a/drivers/bus/vmbus/rte_bus_vmbus.h
+++ b/drivers/bus/vmbus/rte_bus_vmbus.h
@@ -20,7 +20,6 @@ extern "C" {
 #include <limits.h>
 #include <stdbool.h>
 #include <errno.h>
-#include <sys/queue.h>
 #include <stdint.h>
 #include <inttypes.h>
 
@@ -38,15 +37,15 @@ struct rte_vmbus_bus;
 struct vmbus_channel;
 struct vmbus_mon_page;
 
-TAILQ_HEAD(rte_vmbus_device_list, rte_vmbus_device);
-TAILQ_HEAD(rte_vmbus_driver_list, rte_vmbus_driver);
+RTE_TAILQ_HEAD(rte_vmbus_device_list, rte_vmbus_device);
+RTE_TAILQ_HEAD(rte_vmbus_driver_list, rte_vmbus_driver);
 
 /* VMBus iterators */
 #define FOREACH_DEVICE_ON_VMBUS(p)	\
-	TAILQ_FOREACH(p, &(rte_vmbus_bus.device_list), next)
+	RTE_TAILQ_FOREACH(p, &(rte_vmbus_bus.device_list), next)
 
 #define FOREACH_DRIVER_ON_VMBUS(p)	\
-	TAILQ_FOREACH(p, &(rte_vmbus_bus.driver_list), next)
+	RTE_TAILQ_FOREACH(p, &(rte_vmbus_bus.driver_list), next)
 
 /** Maximum number of VMBUS resources. */
 enum hv_uio_map {
@@ -62,7 +61,7 @@ enum hv_uio_map {
  * A structure describing a VMBUS device.
  */
 struct rte_vmbus_device {
-	TAILQ_ENTRY(rte_vmbus_device) next;    /**< Next probed VMBUS device */
+	RTE_TAILQ_ENTRY(rte_vmbus_device) next; /**< Next probed VMBUS device */
 	const struct rte_vmbus_driver *driver; /**< Associated driver */
 	struct rte_device device;              /**< Inherit core device */
 	rte_uuid_t device_id;		       /**< VMBUS device id */
@@ -93,7 +92,7 @@ typedef int (vmbus_remove_t)(struct rte_vmbus_device *);
  * A structure describing a VMBUS driver.
  */
 struct rte_vmbus_driver {
-	TAILQ_ENTRY(rte_vmbus_driver) next; /**< Next in list. */
+	RTE_TAILQ_ENTRY(rte_vmbus_driver) next; /**< Next in list. */
 	struct rte_driver driver;
 	struct rte_vmbus_bus *bus;          /**< VM bus reference. */
 	vmbus_probe_t *probe;               /**< Device Probe function. */
diff --git a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
index dbf85e4eda..ac86b70caf 100644
--- a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
+++ b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
@@ -2018,7 +2018,7 @@ bnxt_ulp_cntxt_list_del(struct bnxt_ulp_context *ulp_ctx)
 	struct ulp_context_list_entry	*entry, *temp;
 
 	rte_spinlock_lock(&bnxt_ulp_ctxt_lock);
-	TAILQ_FOREACH_SAFE(entry, &ulp_cntx_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(entry, &ulp_cntx_list, next, temp) {
 		if (entry->ulp_ctx == ulp_ctx) {
 			TAILQ_REMOVE(&ulp_cntx_list, entry, next);
 			rte_free(entry);
diff --git a/drivers/net/bonding/rte_eth_bond_flow.c b/drivers/net/bonding/rte_eth_bond_flow.c
index 417f76bf60..65b77faae7 100644
--- a/drivers/net/bonding/rte_eth_bond_flow.c
+++ b/drivers/net/bonding/rte_eth_bond_flow.c
@@ -157,7 +157,7 @@ bond_flow_flush(struct rte_eth_dev *dev, struct rte_flow_error *err)
 	/* Destroy all bond flows from its slaves instead of flushing them to
 	 * keep the LACP flow or any other external flows.
 	 */
-	TAILQ_FOREACH_SAFE(flow, &internals->flow_list, next, tmp) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &internals->flow_list, next, tmp) {
 		lret = bond_flow_destroy(dev, flow, err);
 		if (unlikely(lret != 0))
 			ret = lret;
diff --git a/drivers/net/failsafe/failsafe_flow.c b/drivers/net/failsafe/failsafe_flow.c
index 5e2b5f7c67..354f9fec20 100644
--- a/drivers/net/failsafe/failsafe_flow.c
+++ b/drivers/net/failsafe/failsafe_flow.c
@@ -180,7 +180,7 @@ fs_flow_flush(struct rte_eth_dev *dev,
 			return ret;
 		}
 	}
-	TAILQ_FOREACH_SAFE(flow, &PRIV(dev)->flow_list, next, tmp) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &PRIV(dev)->flow_list, next, tmp) {
 		TAILQ_REMOVE(&PRIV(dev)->flow_list, flow, next);
 		fs_flow_release(&flow);
 	}
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 7b230e2ed1..6590363556 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -5436,7 +5436,7 @@ i40e_vsi_release(struct i40e_vsi *vsi)
 
 	/* VSI has child to attach, release child first */
 	if (vsi->veb) {
-		TAILQ_FOREACH_SAFE(vsi_list, &vsi->veb->head, list, temp) {
+		RTE_TAILQ_FOREACH_SAFE(vsi_list, &vsi->veb->head, list, temp) {
 			if (i40e_vsi_release(vsi_list->vsi) != I40E_SUCCESS)
 				return -1;
 		}
@@ -5444,7 +5444,8 @@ i40e_vsi_release(struct i40e_vsi *vsi)
 	}
 
 	if (vsi->floating_veb) {
-		TAILQ_FOREACH_SAFE(vsi_list, &vsi->floating_veb->head, list, temp) {
+		RTE_TAILQ_FOREACH_SAFE(vsi_list, &vsi->floating_veb->head,
+			list, temp) {
 			if (i40e_vsi_release(vsi_list->vsi) != I40E_SUCCESS)
 				return -1;
 		}
@@ -5452,7 +5453,7 @@ i40e_vsi_release(struct i40e_vsi *vsi)
 
 	/* Remove all macvlan filters of the VSI */
 	i40e_vsi_remove_all_macvlan_filter(vsi);
-	TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp)
+	RTE_TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp)
 		rte_free(f);
 
 	if (vsi->type != I40E_VSI_MAIN &&
@@ -6055,7 +6056,7 @@ i40e_vsi_config_vlan_filter(struct i40e_vsi *vsi, bool on)
 	i = 0;
 
 	/* Remove all existing mac */
-	TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
 		mac_filter[i] = f->mac_info;
 		ret = i40e_vsi_delete_mac(vsi, &f->mac_info.mac_addr);
 		if (ret) {
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index cd6deabd60..374b73e4a7 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -6,6 +6,7 @@
 #define _I40E_ETHDEV_H_
 
 #include <stdint.h>
+#include <sys/queue.h>
 
 #include <rte_time.h>
 #include <rte_kvargs.h>
diff --git a/drivers/net/i40e/i40e_flow.c b/drivers/net/i40e/i40e_flow.c
index 3c1570bd9c..e41a84f1d7 100644
--- a/drivers/net/i40e/i40e_flow.c
+++ b/drivers/net/i40e/i40e_flow.c
@@ -4917,7 +4917,7 @@ i40e_flow_flush_fdir_filter(struct i40e_pf *pf)
 		}
 
 		/* Delete FDIR flows in flow list. */
-		TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
+		RTE_TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
 			if (flow->filter_type == RTE_ETH_FILTER_FDIR) {
 				TAILQ_REMOVE(&pf->flow_list, flow, node);
 			}
@@ -4972,7 +4972,7 @@ i40e_flow_flush_ethertype_filter(struct i40e_pf *pf)
 	}
 
 	/* Delete ethertype flows in flow list. */
-	TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
 		if (flow->filter_type == RTE_ETH_FILTER_ETHERTYPE) {
 			TAILQ_REMOVE(&pf->flow_list, flow, node);
 			rte_free(flow);
@@ -5000,7 +5000,7 @@ i40e_flow_flush_tunnel_filter(struct i40e_pf *pf)
 	}
 
 	/* Delete tunnel flows in flow list. */
-	TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
 		if (flow->filter_type == RTE_ETH_FILTER_TUNNEL) {
 			TAILQ_REMOVE(&pf->flow_list, flow, node);
 			rte_free(flow);
diff --git a/drivers/net/i40e/i40e_hash.c b/drivers/net/i40e/i40e_hash.c
index 1fb8c9abfc..6579b1a00b 100644
--- a/drivers/net/i40e/i40e_hash.c
+++ b/drivers/net/i40e/i40e_hash.c
@@ -1366,7 +1366,7 @@ i40e_hash_filter_flush(struct i40e_pf *pf)
 {
 	struct rte_flow *flow, *next;
 
-	TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, next) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, next) {
 		if (flow->filter_type != RTE_ETH_FILTER_HASH)
 			continue;
 
diff --git a/drivers/net/i40e/rte_pmd_i40e.c b/drivers/net/i40e/rte_pmd_i40e.c
index 2e34140c5b..ec24046440 100644
--- a/drivers/net/i40e/rte_pmd_i40e.c
+++ b/drivers/net/i40e/rte_pmd_i40e.c
@@ -216,7 +216,7 @@ i40e_vsi_rm_mac_filter(struct i40e_vsi *vsi)
 	void *temp;
 
 	/* remove all the MACs */
-	TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
 		vlan_num = vsi->vlan_num;
 		filter_type = f->mac_info.filter_type;
 		if (filter_type == I40E_MACVLAN_PERFECT_MATCH ||
@@ -274,7 +274,7 @@ i40e_vsi_restore_mac_filter(struct i40e_vsi *vsi)
 	void *temp;
 
 	/* restore all the MACs */
-	TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
 		if (f->mac_info.filter_type == I40E_MACVLAN_PERFECT_MATCH ||
 		    f->mac_info.filter_type == I40E_MACVLAN_HASH_MATCH) {
 			/**
@@ -563,7 +563,7 @@ rte_pmd_i40e_set_vf_mac_addr(uint16_t port, uint16_t vf_id,
 	rte_ether_addr_copy(mac_addr, &vf->mac_addr);
 
 	/* Remove all existing mac */
-	TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp)
+	RTE_TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp)
 		if (i40e_vsi_delete_mac(vsi, &f->mac_info.mac_addr)
 				!= I40E_SUCCESS)
 			PMD_DRV_LOG(WARNING, "Delete MAC failed");
diff --git a/drivers/net/iavf/iavf_generic_flow.c b/drivers/net/iavf/iavf_generic_flow.c
index 1fe270fb22..b86d99e57d 100644
--- a/drivers/net/iavf/iavf_generic_flow.c
+++ b/drivers/net/iavf/iavf_generic_flow.c
@@ -1637,7 +1637,7 @@ iavf_flow_init(struct iavf_adapter *ad)
 	TAILQ_INIT(&vf->dist_parser_list);
 	rte_spinlock_init(&vf->flow_ops_lock);
 
-	TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
 		if (engine->init == NULL) {
 			PMD_INIT_LOG(ERR, "Invalid engine type (%d)",
 				     engine->type);
@@ -1663,7 +1663,7 @@ iavf_flow_uninit(struct iavf_adapter *ad)
 	struct iavf_flow_parser_node *p_parser;
 	void *temp;
 
-	TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
 		if (engine->uninit)
 			engine->uninit(ad);
 	}
@@ -1733,7 +1733,7 @@ iavf_unregister_parser(struct iavf_flow_parser *parser,
 	if (list == NULL)
 		return;
 
-	TAILQ_FOREACH_SAFE(p_parser, list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(p_parser, list, node, temp) {
 		if (p_parser->parser->engine->type == parser->engine->type) {
 			TAILQ_REMOVE(list, p_parser, node);
 			rte_free(p_parser);
@@ -1917,7 +1917,7 @@ iavf_parse_engine_create(struct iavf_adapter *ad,
 	void *temp;
 	void *meta = NULL;
 
-	TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
 		if (parser_node->parser->parse_pattern_action(ad,
 				parser_node->parser->array,
 				parser_node->parser->array_len,
@@ -1946,7 +1946,7 @@ iavf_parse_engine_validate(struct iavf_adapter *ad,
 	void *temp;
 	void *meta = NULL;
 
-	TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
 		if (parser_node->parser->parse_pattern_action(ad,
 				parser_node->parser->array,
 				parser_node->parser->array_len,
@@ -2089,7 +2089,7 @@ iavf_flow_is_valid(struct rte_flow *flow)
 	void *temp;
 
 	if (flow && flow->engine) {
-		TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
+		RTE_TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
 			if (engine == flow->engine)
 				return true;
 		}
@@ -2142,7 +2142,7 @@ iavf_flow_flush(struct rte_eth_dev *dev,
 	void *temp;
 	int ret = 0;
 
-	TAILQ_FOREACH_SAFE(p_flow, &vf->flow_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(p_flow, &vf->flow_list, node, temp) {
 		ret = iavf_flow_destroy(dev, p_flow, error);
 		if (ret) {
 			PMD_DRV_LOG(ERR, "Failed to flush flows");
diff --git a/drivers/net/ice/ice_dcf_ethdev.c b/drivers/net/ice/ice_dcf_ethdev.c
index cab7c4da87..629e88980d 100644
--- a/drivers/net/ice/ice_dcf_ethdev.c
+++ b/drivers/net/ice/ice_dcf_ethdev.c
@@ -4,6 +4,7 @@
 
 #include <errno.h>
 #include <stdbool.h>
+#include <sys/queue.h>
 #include <sys/types.h>
 #include <unistd.h>
 
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index a4cd39c954..fadd5f2e5a 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -1104,7 +1104,7 @@ ice_remove_all_mac_vlan_filters(struct ice_vsi *vsi)
 	if (!vsi || !vsi->mac_num)
 		return -EINVAL;
 
-	TAILQ_FOREACH_SAFE(m_f, &vsi->mac_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(m_f, &vsi->mac_list, next, temp) {
 		ret = ice_remove_mac_filter(vsi, &m_f->mac_info.mac_addr);
 		if (ret != ICE_SUCCESS) {
 			ret = -EINVAL;
@@ -1115,7 +1115,7 @@ ice_remove_all_mac_vlan_filters(struct ice_vsi *vsi)
 	if (vsi->vlan_num == 0)
 		return 0;
 
-	TAILQ_FOREACH_SAFE(v_f, &vsi->vlan_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(v_f, &vsi->vlan_list, next, temp) {
 		ret = ice_remove_vlan_filter(vsi, &v_f->vlan_info.vlan);
 		if (ret != ICE_SUCCESS) {
 			ret = -EINVAL;
diff --git a/drivers/net/ice/ice_generic_flow.c b/drivers/net/ice/ice_generic_flow.c
index 66b5743abf..3e557efe0c 100644
--- a/drivers/net/ice/ice_generic_flow.c
+++ b/drivers/net/ice/ice_generic_flow.c
@@ -1820,7 +1820,7 @@ ice_flow_init(struct ice_adapter *ad)
 	TAILQ_INIT(&pf->dist_parser_list);
 	rte_spinlock_init(&pf->flow_ops_lock);
 
-	TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
 		if (engine->init == NULL) {
 			PMD_INIT_LOG(ERR, "Invalid engine type (%d)",
 					engine->type);
@@ -1846,7 +1846,7 @@ ice_flow_uninit(struct ice_adapter *ad)
 	struct ice_flow_parser_node *p_parser;
 	void *temp;
 
-	TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
 		if (engine->uninit)
 			engine->uninit(ad);
 	}
@@ -1946,7 +1946,7 @@ ice_unregister_parser(struct ice_flow_parser *parser,
 	if (list == NULL)
 		return;
 
-	TAILQ_FOREACH_SAFE(p_parser, list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(p_parser, list, node, temp) {
 		if (p_parser->parser->engine->type == parser->engine->type) {
 			TAILQ_REMOVE(list, p_parser, node);
 			rte_free(p_parser);
@@ -2272,7 +2272,7 @@ ice_parse_engine_create(struct ice_adapter *ad,
 	void *meta = NULL;
 	void *temp;
 
-	TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
 		int ret;
 
 		if (parser_node->parser->parse_pattern_action(ad,
@@ -2305,7 +2305,7 @@ ice_parse_engine_validate(struct ice_adapter *ad,
 	struct ice_flow_parser_node *parser_node;
 	void *temp;
 
-	TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
 		if (parser_node->parser->parse_pattern_action(ad,
 				parser_node->parser->array,
 				parser_node->parser->array_len,
@@ -2477,7 +2477,7 @@ ice_flow_flush(struct rte_eth_dev *dev,
 	void *temp;
 	int ret = 0;
 
-	TAILQ_FOREACH_SAFE(p_flow, &pf->flow_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(p_flow, &pf->flow_list, node, temp) {
 		ret = ice_flow_destroy(dev, p_flow, error);
 		if (ret) {
 			PMD_DRV_LOG(ERR, "Failed to flush flows");
@@ -2541,7 +2541,7 @@ ice_flow_redirect(struct ice_adapter *ad,
 
 	rte_spinlock_lock(&pf->flow_ops_lock);
 
-	TAILQ_FOREACH_SAFE(p_flow, &pf->flow_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(p_flow, &pf->flow_list, node, temp) {
 		if (!p_flow->engine->redirect)
 			continue;
 		ret = p_flow->engine->redirect(ad, p_flow, rd);
diff --git a/drivers/net/ipn3ke/ipn3ke_flow.c b/drivers/net/ipn3ke/ipn3ke_flow.c
index c702e19ea5..f5867ca055 100644
--- a/drivers/net/ipn3ke/ipn3ke_flow.c
+++ b/drivers/net/ipn3ke/ipn3ke_flow.c
@@ -1231,7 +1231,7 @@ ipn3ke_flow_flush(struct rte_eth_dev *dev,
 	struct ipn3ke_hw *hw = IPN3KE_DEV_PRIVATE_TO_HW(dev);
 	struct rte_flow *flow, *temp;
 
-	TAILQ_FOREACH_SAFE(flow, &hw->flow_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &hw->flow_list, next, temp) {
 		TAILQ_REMOVE(&hw->flow_list, flow, next);
 		rte_free(flow);
 	}
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 31d857030f..ba2bf4de37 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -15099,7 +15099,7 @@ __flow_dv_destroy_sub_policy_rules(struct rte_eth_dev *dev,
 		    policy->act_cnt[i].fate_action == MLX5_FLOW_FATE_MTR)
 			next_fm = mlx5_flow_meter_find(priv,
 					policy->act_cnt[i].next_mtr_id, NULL);
-		TAILQ_FOREACH_SAFE(color_rule, &sub_policy->color_rules[i],
+		RTE_TAILQ_FOREACH_SAFE(color_rule, &sub_policy->color_rules[i],
 				   next_port, tmp) {
 			claim_zero(mlx5_flow_os_destroy_flow(color_rule->rule));
 			tbl = container_of(color_rule->matcher->tbl,
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index a24bd9c7ae..ba4e9fca17 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -2168,7 +2168,7 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 			priv->mtr_idx_tbl = NULL;
 		}
 	} else {
-		TAILQ_FOREACH_SAFE(legacy_fm, fms, next, tmp) {
+		RTE_TAILQ_FOREACH_SAFE(legacy_fm, fms, next, tmp) {
 			fm = &legacy_fm->fm;
 			if (mlx5_flow_meter_params_flush(dev, fm, 0))
 				return -rte_mtr_error_set(error, EINVAL,
diff --git a/drivers/net/softnic/rte_eth_softnic_flow.c b/drivers/net/softnic/rte_eth_softnic_flow.c
index 27eaf380cd..7d054c38d2 100644
--- a/drivers/net/softnic/rte_eth_softnic_flow.c
+++ b/drivers/net/softnic/rte_eth_softnic_flow.c
@@ -2207,7 +2207,8 @@ pmd_flow_flush(struct rte_eth_dev *dev,
 			void *temp;
 			int status;
 
-			TAILQ_FOREACH_SAFE(flow, &table->flows, node, temp) {
+			RTE_TAILQ_FOREACH_SAFE(flow, &table->flows, node,
+				temp) {
 				/* Rule delete. */
 				status = softnic_pipeline_table_rule_delete
 						(softnic,
diff --git a/drivers/net/softnic/rte_eth_softnic_swq.c b/drivers/net/softnic/rte_eth_softnic_swq.c
index 2083d0a976..afe6f05e29 100644
--- a/drivers/net/softnic/rte_eth_softnic_swq.c
+++ b/drivers/net/softnic/rte_eth_softnic_swq.c
@@ -39,7 +39,7 @@ softnic_softnic_swq_free_keep_rxq_txq(struct pmd_internals *p)
 {
 	struct softnic_swq *swq, *tswq;
 
-	TAILQ_FOREACH_SAFE(swq, &p->swq_list, node, tswq) {
+	RTE_TAILQ_FOREACH_SAFE(swq, &p->swq_list, node, tswq) {
 		if ((strncmp(swq->name, "RXQ", strlen("RXQ")) == 0) ||
 			(strncmp(swq->name, "TXQ", strlen("TXQ")) == 0))
 			continue;
diff --git a/drivers/raw/dpaa2_qdma/dpaa2_qdma.c b/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
index c961e18d67..7b80370b36 100644
--- a/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
+++ b/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
@@ -1606,7 +1606,7 @@ remove_hw_queues_from_list(struct dpaa2_dpdmai_dev *dpdmai_dev)
 
 	DPAA2_QDMA_FUNC_TRACE();
 
-	TAILQ_FOREACH_SAFE(queue, &qdma_queue_list, next, tqueue) {
+	RTE_TAILQ_FOREACH_SAFE(queue, &qdma_queue_list, next, tqueue) {
 		if (queue->dpdmai_dev == dpdmai_dev) {
 			TAILQ_REMOVE(&qdma_queue_list, queue, next);
 			rte_free(queue);
diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
index 7017124414..3ebf62e697 100644
--- a/lib/bbdev/rte_bbdev.h
+++ b/lib/bbdev/rte_bbdev.h
@@ -434,7 +434,7 @@ struct rte_bbdev_callback;
 struct rte_intr_handle;
 
 /** Structure to keep track of registered callbacks */
-TAILQ_HEAD(rte_bbdev_cb_list, rte_bbdev_callback);
+RTE_TAILQ_HEAD(rte_bbdev_cb_list, rte_bbdev_callback);
 
 /**
  * @internal The data structure associated with a device. Drivers can access
diff --git a/lib/cryptodev/rte_cryptodev.h b/lib/cryptodev/rte_cryptodev.h
index 11f4e6fdbf..f86bf2260b 100644
--- a/lib/cryptodev/rte_cryptodev.h
+++ b/lib/cryptodev/rte_cryptodev.h
@@ -879,7 +879,7 @@ typedef uint16_t (*enqueue_pkt_burst_t)(void *qp,
 struct rte_cryptodev_callback;
 
 /** Structure to keep track of registered callbacks */
-TAILQ_HEAD(rte_cryptodev_cb_list, rte_cryptodev_callback);
+RTE_TAILQ_HEAD(rte_cryptodev_cb_list, rte_cryptodev_callback);
 
 /**
  * Structure used to hold information about the callbacks to be called for a
diff --git a/lib/cryptodev/rte_cryptodev_pmd.h b/lib/cryptodev/rte_cryptodev_pmd.h
index 1274436870..9542cbf263 100644
--- a/lib/cryptodev/rte_cryptodev_pmd.h
+++ b/lib/cryptodev/rte_cryptodev_pmd.h
@@ -66,7 +66,7 @@ struct rte_cryptodev_global {
 
 /* Cryptodev driver, containing the driver ID */
 struct cryptodev_driver {
-	TAILQ_ENTRY(cryptodev_driver) next; /**< Next in list. */
+	RTE_TAILQ_ENTRY(cryptodev_driver) next; /**< Next in list. */
 	const struct rte_driver *driver;
 	uint8_t id;
 };
diff --git a/lib/eal/common/eal_common_devargs.c b/lib/eal/common/eal_common_devargs.c
index 23aaf8b7e4..7edc6798fe 100644
--- a/lib/eal/common/eal_common_devargs.c
+++ b/lib/eal/common/eal_common_devargs.c
@@ -9,6 +9,7 @@
 #include <stdio.h>
 #include <string.h>
 #include <stdarg.h>
+#include <sys/queue.h>
 
 #include <rte_bus.h>
 #include <rte_class.h>
@@ -18,6 +19,7 @@
 #include <rte_errno.h>
 #include <rte_kvargs.h>
 #include <rte_log.h>
+#include <rte_os.h>
 #include <rte_tailq.h>
 #include "eal_private.h"
 
@@ -291,7 +293,7 @@ rte_devargs_insert(struct rte_devargs **da)
 	if (*da == NULL || (*da)->bus == NULL)
 		return -1;
 
-	TAILQ_FOREACH_SAFE(listed_da, &devargs_list, next, tmp) {
+	RTE_TAILQ_FOREACH_SAFE(listed_da, &devargs_list, next, tmp) {
 		if (listed_da == *da)
 			/* devargs already in the list */
 			return 0;
@@ -358,7 +360,7 @@ rte_devargs_remove(struct rte_devargs *devargs)
 	if (devargs == NULL || devargs->bus == NULL)
 		return -1;
 
-	TAILQ_FOREACH_SAFE(d, &devargs_list, next, tmp) {
+	RTE_TAILQ_FOREACH_SAFE(d, &devargs_list, next, tmp) {
 		if (strcmp(d->bus->name, devargs->bus->name) == 0 &&
 		    strcmp(d->name, devargs->name) == 0) {
 			TAILQ_REMOVE(&devargs_list, d, next);
diff --git a/lib/eal/common/eal_common_fbarray.c b/lib/eal/common/eal_common_fbarray.c
index 3a28a53247..75168ca552 100644
--- a/lib/eal/common/eal_common_fbarray.c
+++ b/lib/eal/common/eal_common_fbarray.c
@@ -9,6 +9,7 @@
 #include <errno.h>
 #include <string.h>
 #include <unistd.h>
+#include <sys/queue.h>
 
 #include <rte_common.h>
 #include <rte_eal_paging.h>
diff --git a/lib/eal/common/eal_common_log.c b/lib/eal/common/eal_common_log.c
index ec8fe23a7f..1be35f5397 100644
--- a/lib/eal/common/eal_common_log.c
+++ b/lib/eal/common/eal_common_log.c
@@ -10,6 +10,7 @@
 #include <errno.h>
 #include <regex.h>
 #include <fnmatch.h>
+#include <sys/queue.h>
 
 #include <rte_eal.h>
 #include <rte_log.h>
diff --git a/lib/eal/common/eal_common_memalloc.c b/lib/eal/common/eal_common_memalloc.c
index e872c6533b..aefdf8de3f 100644
--- a/lib/eal/common/eal_common_memalloc.c
+++ b/lib/eal/common/eal_common_memalloc.c
@@ -3,6 +3,7 @@
  */
 
 #include <string.h>
+#include <sys/queue.h>
 
 #include <rte_errno.h>
 #include <rte_lcore.h>
diff --git a/lib/eal/common/eal_common_options.c b/lib/eal/common/eal_common_options.c
index ff5861b5f3..2cc74b4472 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -6,6 +6,7 @@
 #include <stdlib.h>
 #include <unistd.h>
 #include <string.h>
+#include <sys/queue.h>
 #ifndef RTE_EXEC_ENV_WINDOWS
 #include <syslog.h>
 #endif
@@ -283,7 +284,7 @@ eal_option_device_parse(void)
 	void *tmp;
 	int ret = 0;
 
-	TAILQ_FOREACH_SAFE(devopt, &devopt_list, next, tmp) {
+	RTE_TAILQ_FOREACH_SAFE(devopt, &devopt_list, next, tmp) {
 		if (ret == 0) {
 			ret = rte_devargs_add(devopt->type, devopt->arg);
 			if (ret)
diff --git a/lib/eal/common/eal_trace.h b/lib/eal/common/eal_trace.h
index 06751eb23a..76fbcd86b0 100644
--- a/lib/eal/common/eal_trace.h
+++ b/lib/eal/common/eal_trace.h
@@ -5,6 +5,8 @@
 #ifndef __EAL_TRACE_H
 #define __EAL_TRACE_H
 
+#include <sys/queue.h>
+
 #include <rte_cycles.h>
 #include <rte_log.h>
 #include <rte_malloc.h>
diff --git a/lib/eal/freebsd/include/rte_os.h b/lib/eal/freebsd/include/rte_os.h
index 627f0483ab..06f30ce238 100644
--- a/lib/eal/freebsd/include/rte_os.h
+++ b/lib/eal/freebsd/include/rte_os.h
@@ -11,6 +11,21 @@
  */
 
 #include <pthread_np.h>
+#include <sys/queue.h>
+
+/* These macros are compatible with system's sys/queue.h. */
+#define RTE_TAILQ_HEAD(name, type) TAILQ_HEAD(name, type)
+#define RTE_TAILQ_ENTRY(type) TAILQ_ENTRY(type)
+#define RTE_TAILQ_FOREACH(var, head, field) TAILQ_FOREACH(var, head, field)
+#define RTE_TAILQ_FOREACH_SAFE(var, head, field, tvar) \
+	for ((var) = TAILQ_FIRST((head)); \
+	    (var) && ((tvar) = TAILQ_NEXT((var), field), 1); \
+	    (var) = (tvar))
+#define RTE_TAILQ_FIRST(head) TAILQ_FIRST(head)
+#define RTE_TAILQ_NEXT(elem, field) TAILQ_NEXT(elem, field)
+#define RTE_STAILQ_HEAD(name, type) STAILQ_HEAD(name, type)
+#define RTE_STAILQ_ENTRY(type) STAILQ_ENTRY(type)
+
 
 typedef cpuset_t rte_cpuset_t;
 #define RTE_HAS_CPUSET
diff --git a/lib/eal/include/rte_bus.h b/lib/eal/include/rte_bus.h
index 80b154fb98..84d364df3f 100644
--- a/lib/eal/include/rte_bus.h
+++ b/lib/eal/include/rte_bus.h
@@ -19,13 +19,12 @@ extern "C" {
 #endif
 
 #include <stdio.h>
-#include <sys/queue.h>
 
 #include <rte_log.h>
 #include <rte_dev.h>
 
 /** Double linked list of buses */
-TAILQ_HEAD(rte_bus_list, rte_bus);
+RTE_TAILQ_HEAD(rte_bus_list, rte_bus);
 
 
 /**
@@ -250,7 +249,7 @@ typedef enum rte_iova_mode (*rte_bus_get_iommu_class_t)(void);
  * A structure describing a generic bus.
  */
 struct rte_bus {
-	TAILQ_ENTRY(rte_bus) next;   /**< Next bus object in linked list */
+	RTE_TAILQ_ENTRY(rte_bus) next;   /**< Next bus object in linked list */
 	const char *name;            /**< Name of the bus */
 	rte_bus_scan_t scan;         /**< Scan for devices attached to bus */
 	rte_bus_probe_t probe;       /**< Probe devices on bus */
diff --git a/lib/eal/include/rte_class.h b/lib/eal/include/rte_class.h
index 856d09b22d..d560339652 100644
--- a/lib/eal/include/rte_class.h
+++ b/lib/eal/include/rte_class.h
@@ -22,18 +22,16 @@
 extern "C" {
 #endif
 
-#include <sys/queue.h>
-
 #include <rte_dev.h>
 
 /** Double linked list of classes */
-TAILQ_HEAD(rte_class_list, rte_class);
+RTE_TAILQ_HEAD(rte_class_list, rte_class);
 
 /**
  * A structure describing a generic device class.
  */
 struct rte_class {
-	TAILQ_ENTRY(rte_class) next; /**< Next device class in linked list */
+	RTE_TAILQ_ENTRY(rte_class) next; /**< Next device class in linked list */
 	const char *name; /**< Name of the class */
 	rte_dev_iterate_t dev_iterate; /**< Device iterator. */
 };
diff --git a/lib/eal/include/rte_dev.h b/lib/eal/include/rte_dev.h
index 6dd72c11a1..f6efe0c94e 100644
--- a/lib/eal/include/rte_dev.h
+++ b/lib/eal/include/rte_dev.h
@@ -18,7 +18,6 @@ extern "C" {
 #endif
 
 #include <stdio.h>
-#include <sys/queue.h>
 
 #include <rte_config.h>
 #include <rte_compat.h>
@@ -75,7 +74,7 @@ struct rte_mem_resource {
  * A structure describing a device driver.
  */
 struct rte_driver {
-	TAILQ_ENTRY(rte_driver) next;  /**< Next in list. */
+	RTE_TAILQ_ENTRY(rte_driver) next;  /**< Next in list. */
 	const char *name;                   /**< Driver name. */
 	const char *alias;              /**< Driver alias. */
 };
@@ -90,7 +89,7 @@ struct rte_driver {
  * A structure describing a generic device.
  */
 struct rte_device {
-	TAILQ_ENTRY(rte_device) next; /**< Next device */
+	RTE_TAILQ_ENTRY(rte_device) next; /**< Next device */
 	const char *name;             /**< Device name */
 	const struct rte_driver *driver; /**< Driver assigned after probing */
 	const struct rte_bus *bus;    /**< Bus handle assigned on scan */
diff --git a/lib/eal/include/rte_devargs.h b/lib/eal/include/rte_devargs.h
index cd90944fe8..957477b398 100644
--- a/lib/eal/include/rte_devargs.h
+++ b/lib/eal/include/rte_devargs.h
@@ -21,7 +21,6 @@ extern "C" {
 #endif
 
 #include <stdio.h>
-#include <sys/queue.h>
 #include <rte_compat.h>
 #include <rte_bus.h>
 
@@ -76,7 +75,7 @@ enum rte_devtype {
  */
 struct rte_devargs {
 	/** Next in list. */
-	TAILQ_ENTRY(rte_devargs) next;
+	RTE_TAILQ_ENTRY(rte_devargs) next;
 	/** Type of device. */
 	enum rte_devtype type;
 	/** Device policy. */
diff --git a/lib/eal/include/rte_log.h b/lib/eal/include/rte_log.h
index b706bb8710..bb3523467b 100644
--- a/lib/eal/include/rte_log.h
+++ b/lib/eal/include/rte_log.h
@@ -21,7 +21,6 @@ extern "C" {
 #include <stdio.h>
 #include <stdarg.h>
 #include <stdbool.h>
-#include <sys/queue.h>
 
 #include <rte_common.h>
 #include <rte_config.h>
diff --git a/lib/eal/include/rte_service.h b/lib/eal/include/rte_service.h
index c7d037d862..1c9275c32a 100644
--- a/lib/eal/include/rte_service.h
+++ b/lib/eal/include/rte_service.h
@@ -29,7 +29,6 @@ extern "C" {
 
 #include<stdio.h>
 #include <stdint.h>
-#include <sys/queue.h>
 
 #include <rte_config.h>
 #include <rte_lcore.h>
diff --git a/lib/eal/include/rte_tailq.h b/lib/eal/include/rte_tailq.h
index b6fe4e5f78..b32033ad66 100644
--- a/lib/eal/include/rte_tailq.h
+++ b/lib/eal/include/rte_tailq.h
@@ -15,17 +15,16 @@
 extern "C" {
 #endif
 
-#include <sys/queue.h>
 #include <stdio.h>
 #include <rte_debug.h>
 
 /** dummy structure type used by the rte_tailq APIs */
 struct rte_tailq_entry {
-	TAILQ_ENTRY(rte_tailq_entry) next; /**< Pointer entries for a tailq list */
+	RTE_TAILQ_ENTRY(rte_tailq_entry) next; /**< Pointer entries for a tailq list */
 	void *data; /**< Pointer to the data referenced by this tailq entry */
 };
 /** dummy */
-TAILQ_HEAD(rte_tailq_entry_head, rte_tailq_entry);
+RTE_TAILQ_HEAD(rte_tailq_entry_head, rte_tailq_entry);
 
 #define RTE_TAILQ_NAMESIZE 32
 
@@ -48,7 +47,7 @@ struct rte_tailq_elem {
 	 * rte_eal_tailqs_init()
 	 */
 	struct rte_tailq_head *head;
-	TAILQ_ENTRY(rte_tailq_elem) next;
+	RTE_TAILQ_ENTRY(rte_tailq_elem) next;
 	const char name[RTE_TAILQ_NAMESIZE];
 };
 
@@ -125,14 +124,6 @@ RTE_INIT(tailqinitfn_ ##t) \
 		rte_panic("Cannot initialize tailq: %s\n", t.name); \
 }
 
-/* This macro permits both remove and free var within the loop safely.*/
-#ifndef TAILQ_FOREACH_SAFE
-#define TAILQ_FOREACH_SAFE(var, head, field, tvar)		\
-	for ((var) = TAILQ_FIRST((head));			\
-	    (var) && ((tvar) = TAILQ_NEXT((var), field), 1);	\
-	    (var) = (tvar))
-#endif
-
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/linux/include/rte_os.h b/lib/eal/linux/include/rte_os.h
index 1618b4df22..ce5b0aed52 100644
--- a/lib/eal/linux/include/rte_os.h
+++ b/lib/eal/linux/include/rte_os.h
@@ -11,6 +11,21 @@
  */
 
 #include <sched.h>
+#include <sys/queue.h>
+
+/* These macros are compatible with system's sys/queue.h. */
+#define RTE_TAILQ_HEAD(name, type) TAILQ_HEAD(name, type)
+#define RTE_TAILQ_ENTRY(type) TAILQ_ENTRY(type)
+#define RTE_TAILQ_FOREACH(var, head, field) TAILQ_FOREACH(var, head, field)
+#define RTE_TAILQ_FOREACH_SAFE(var, head, field, tvar) \
+	for ((var) = TAILQ_FIRST((head)); \
+	    (var) && ((tvar) = TAILQ_NEXT((var), field), 1); \
+	    (var) = (tvar))
+#define RTE_TAILQ_FIRST(head) TAILQ_FIRST(head)
+#define RTE_TAILQ_NEXT(elem, field) TAILQ_NEXT(elem, field)
+#define RTE_STAILQ_HEAD(name, type) STAILQ_HEAD(name, type)
+#define RTE_STAILQ_ENTRY(type) STAILQ_ENTRY(type)
+
 
 #ifdef CPU_SETSIZE /* may require _GNU_SOURCE */
 typedef cpu_set_t rte_cpuset_t;
diff --git a/lib/eal/windows/eal_alarm.c b/lib/eal/windows/eal_alarm.c
index e5dc54efb8..103c1f909d 100644
--- a/lib/eal/windows/eal_alarm.c
+++ b/lib/eal/windows/eal_alarm.c
@@ -4,6 +4,7 @@
 
 #include <stdatomic.h>
 #include <stdbool.h>
+#include <sys/queue.h>
 
 #include <rte_alarm.h>
 #include <rte_spinlock.h>
diff --git a/lib/eal/windows/include/rte_os.h b/lib/eal/windows/include/rte_os.h
index 66c711d458..54892ab89c 100644
--- a/lib/eal/windows/include/rte_os.h
+++ b/lib/eal/windows/include/rte_os.h
@@ -18,6 +18,37 @@
 extern "C" {
 #endif
 
+#define RTE_TAILQ_HEAD(name, type) \
+struct name { \
+	struct type *tqh_first; /* first element */ \
+	struct type **tqh_last; /* addr of last next element */ \
+}
+#define RTE_TAILQ_ENTRY(type) \
+struct { \
+	struct type *tqe_next; /* next element */ \
+	struct type **tqe_prev; /* address of previous next element */ \
+}
+#define RTE_TAILQ_FOREACH(var, head, field) \
+	for ((var) = RTE_TAILQ_FIRST((head)); \
+	    (var); \
+	    (var) = RTE_TAILQ_NEXT((var), field))
+#define RTE_TAILQ_FOREACH_SAFE(var, head, field, tvar) \
+	for ((var) = RTE_TAILQ_FIRST((head)); \
+	    (var) && ((tvar) = RTE_TAILQ_NEXT((var), field), 1); \
+	    (var) = (tvar))
+#define RTE_TAILQ_FIRST(head) ((head)->tqh_first)
+#define RTE_TAILQ_NEXT(elm, field) ((elm)->field.tqe_next)
+#define RTE_STAILQ_HEAD(name, type) \
+struct name { \
+	struct type *stqh_first;/* first element */ \
+	struct type **stqh_last;/* addr of last next element */ \
+}
+#define RTE_STAILQ_ENTRY(type) \
+struct { \
+	struct type *stqe_next; /* next element */ \
+}
+
+
 /* cpu_set macros implementation */
 #define RTE_CPU_AND(dst, src1, src2) CPU_AND(dst, src1, src2)
 #define RTE_CPU_OR(dst, src1, src2) CPU_OR(dst, src1, src2)
diff --git a/lib/efd/rte_efd.c b/lib/efd/rte_efd.c
index 77f46809f8..5bf517fee9 100644
--- a/lib/efd/rte_efd.c
+++ b/lib/efd/rte_efd.c
@@ -759,7 +759,7 @@ rte_efd_free(struct rte_efd_table *table)
 	efd_list = RTE_TAILQ_CAST(rte_efd_tailq.head, rte_efd_list);
 	rte_mcfg_tailq_write_lock();
 
-	TAILQ_FOREACH_SAFE(te, efd_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(te, efd_list, next, temp) {
 		if (te->data == (void *) table) {
 			TAILQ_REMOVE(efd_list, te, next);
 			rte_free(te);
diff --git a/lib/ethdev/rte_ethdev_core.h b/lib/ethdev/rte_ethdev_core.h
index edf96de2dc..d2c9ec42c7 100644
--- a/lib/ethdev/rte_ethdev_core.h
+++ b/lib/ethdev/rte_ethdev_core.h
@@ -21,7 +21,7 @@
 
 struct rte_eth_dev_callback;
 /** @internal Structure to keep track of registered callbacks */
-TAILQ_HEAD(rte_eth_dev_cb_list, rte_eth_dev_callback);
+RTE_TAILQ_HEAD(rte_eth_dev_cb_list, rte_eth_dev_callback);
 
 struct rte_eth_dev;
 
diff --git a/lib/hash/rte_fbk_hash.h b/lib/hash/rte_fbk_hash.h
index c4d6976d2b..9c3a61c1d6 100644
--- a/lib/hash/rte_fbk_hash.h
+++ b/lib/hash/rte_fbk_hash.h
@@ -17,7 +17,6 @@
 
 #include <stdint.h>
 #include <errno.h>
-#include <sys/queue.h>
 
 #ifdef __cplusplus
 extern "C" {
diff --git a/lib/hash/rte_thash.c b/lib/hash/rte_thash.c
index d5a95a6e00..696a1121e2 100644
--- a/lib/hash/rte_thash.c
+++ b/lib/hash/rte_thash.c
@@ -2,6 +2,8 @@
  * Copyright(c) 2021 Intel Corporation
  */
 
+#include <sys/queue.h>
+
 #include <rte_thash.h>
 #include <rte_tailq.h>
 #include <rte_random.h>
diff --git a/lib/ip_frag/rte_ip_frag.h b/lib/ip_frag/rte_ip_frag.h
index 0bfe64b14e..80f931c32a 100644
--- a/lib/ip_frag/rte_ip_frag.h
+++ b/lib/ip_frag/rte_ip_frag.h
@@ -62,7 +62,7 @@ struct ip_frag_key {
  * First two entries in the frags[] array are for the last and first fragments.
  */
 struct ip_frag_pkt {
-	TAILQ_ENTRY(ip_frag_pkt) lru;   /**< LRU list */
+	RTE_TAILQ_ENTRY(ip_frag_pkt) lru;   /**< LRU list */
 	struct ip_frag_key key;           /**< fragmentation key */
 	uint64_t             start;       /**< creation timestamp */
 	uint32_t             total_size;  /**< expected reassembled size */
@@ -83,7 +83,7 @@ struct rte_ip_frag_death_row {
 	/**< mbufs to be freed */
 };
 
-TAILQ_HEAD(ip_pkt_list, ip_frag_pkt); /**< @internal fragments tailq */
+RTE_TAILQ_HEAD(ip_pkt_list, ip_frag_pkt); /**< @internal fragments tailq */
 
 /** fragmentation table statistics */
 struct ip_frag_tbl_stat {
diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
index 59a588425b..c5f859ae71 100644
--- a/lib/mempool/rte_mempool.c
+++ b/lib/mempool/rte_mempool.c
@@ -1337,7 +1337,7 @@ void rte_mempool_walk(void (*func)(struct rte_mempool *, void *),
 
 	rte_mcfg_mempool_read_lock();
 
-	TAILQ_FOREACH_SAFE(te, mempool_list, next, tmp_te) {
+	RTE_TAILQ_FOREACH_SAFE(te, mempool_list, next, tmp_te) {
 		(*func)((struct rte_mempool *) te->data, arg);
 	}
 
diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 4235d6f0bf..f57ecbd6fc 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -38,7 +38,6 @@
 #include <stdint.h>
 #include <errno.h>
 #include <inttypes.h>
-#include <sys/queue.h>
 
 #include <rte_config.h>
 #include <rte_spinlock.h>
@@ -141,7 +140,7 @@ struct rte_mempool_objsz {
  * double-frees.
  */
 struct rte_mempool_objhdr {
-	STAILQ_ENTRY(rte_mempool_objhdr) next; /**< Next in list. */
+	RTE_STAILQ_ENTRY(rte_mempool_objhdr) next; /**< Next in list. */
 	struct rte_mempool *mp;          /**< The mempool owning the object. */
 	rte_iova_t iova;                 /**< IO address of the object. */
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
@@ -152,7 +151,7 @@ struct rte_mempool_objhdr {
 /**
  * A list of object headers type
  */
-STAILQ_HEAD(rte_mempool_objhdr_list, rte_mempool_objhdr);
+RTE_STAILQ_HEAD(rte_mempool_objhdr_list, rte_mempool_objhdr);
 
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
 
@@ -171,7 +170,7 @@ struct rte_mempool_objtlr {
 /**
  * A list of memory where objects are stored
  */
-STAILQ_HEAD(rte_mempool_memhdr_list, rte_mempool_memhdr);
+RTE_STAILQ_HEAD(rte_mempool_memhdr_list, rte_mempool_memhdr);
 
 /**
  * Callback used to free a memory chunk
@@ -186,7 +185,7 @@ typedef void (rte_mempool_memchunk_free_cb_t)(struct rte_mempool_memhdr *memhdr,
  * and physically contiguous.
  */
 struct rte_mempool_memhdr {
-	STAILQ_ENTRY(rte_mempool_memhdr) next; /**< Next in list. */
+	RTE_STAILQ_ENTRY(rte_mempool_memhdr) next; /**< Next in list. */
 	struct rte_mempool *mp;  /**< The mempool owning the chunk */
 	void *addr;              /**< Virtual address of the chunk */
 	rte_iova_t iova;         /**< IO address of the chunk */
diff --git a/lib/pci/rte_pci.h b/lib/pci/rte_pci.h
index 1f33d687f4..71cbd441c7 100644
--- a/lib/pci/rte_pci.h
+++ b/lib/pci/rte_pci.h
@@ -18,7 +18,6 @@ extern "C" {
 
 #include <stdio.h>
 #include <limits.h>
-#include <sys/queue.h>
 #include <inttypes.h>
 #include <sys/types.h>
 
diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
index 16718ca7f1..43ce1a29d4 100644
--- a/lib/ring/rte_ring_core.h
+++ b/lib/ring/rte_ring_core.h
@@ -26,7 +26,6 @@ extern "C" {
 #include <stdio.h>
 #include <stdint.h>
 #include <string.h>
-#include <sys/queue.h>
 #include <errno.h>
 #include <rte_common.h>
 #include <rte_config.h>
diff --git a/lib/table/rte_swx_table.h b/lib/table/rte_swx_table.h
index e23f2304c6..f93e5f3f95 100644
--- a/lib/table/rte_swx_table.h
+++ b/lib/table/rte_swx_table.h
@@ -16,7 +16,8 @@ extern "C" {
  */
 
 #include <stdint.h>
-#include <sys/queue.h>
+
+#include <rte_os.h>
 
 /** Match type. */
 enum rte_swx_table_match_type {
@@ -68,7 +69,7 @@ struct rte_swx_table_entry {
 	/** Used to facilitate the membership of this table entry to a
 	 * linked list.
 	 */
-	TAILQ_ENTRY(rte_swx_table_entry) node;
+	RTE_TAILQ_ENTRY(rte_swx_table_entry) node;
 
 	/** Key value for the current entry. Array of *key_size* bytes or NULL
 	 * if the *key_size* for the current table is 0.
@@ -111,7 +112,7 @@ struct rte_swx_table_entry {
 };
 
 /** List of table entries. */
-TAILQ_HEAD(rte_swx_table_entry_list, rte_swx_table_entry);
+RTE_TAILQ_HEAD(rte_swx_table_entry_list, rte_swx_table_entry);
 
 /**
  * Table memory footprint get
diff --git a/lib/table/rte_swx_table_selector.h b/lib/table/rte_swx_table_selector.h
index 71b6a74810..62988d2856 100644
--- a/lib/table/rte_swx_table_selector.h
+++ b/lib/table/rte_swx_table_selector.h
@@ -16,7 +16,6 @@ extern "C" {
  */
 
 #include <stdint.h>
-#include <sys/queue.h>
 
 #include <rte_compat.h>
 
@@ -56,7 +55,7 @@ struct rte_swx_table_selector_params {
 /** Group member parameters. */
 struct rte_swx_table_selector_member {
 	/** Linked list connectivity. */
-	TAILQ_ENTRY(rte_swx_table_selector_member) node;
+	RTE_TAILQ_ENTRY(rte_swx_table_selector_member) node;
 
 	/** Member ID. */
 	uint32_t member_id;
@@ -66,7 +65,7 @@ struct rte_swx_table_selector_member {
 };
 
 /** List of group members. */
-TAILQ_HEAD(rte_swx_table_selector_member_list, rte_swx_table_selector_member);
+RTE_TAILQ_HEAD(rte_swx_table_selector_member_list, rte_swx_table_selector_member);
 
 /** Group parameters. */
 struct rte_swx_table_selector_group {
diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index e0b67721b6..e4a445e709 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -32,7 +32,7 @@ vhost_user_iotlb_pending_remove_all(struct vhost_virtqueue *vq)
 
 	rte_rwlock_write_lock(&vq->iotlb_pending_lock);
 
-	TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next, temp_node) {
 		TAILQ_REMOVE(&vq->iotlb_pending_list, node, next);
 		rte_mempool_put(vq->iotlb_pool, node);
 	}
@@ -100,7 +100,8 @@ vhost_user_iotlb_pending_remove(struct vhost_virtqueue *vq,
 
 	rte_rwlock_write_lock(&vq->iotlb_pending_lock);
 
-	TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next,
+				temp_node) {
 		if (node->iova < iova)
 			continue;
 		if (node->iova >= iova + size)
@@ -121,7 +122,7 @@ vhost_user_iotlb_cache_remove_all(struct vhost_virtqueue *vq)
 
 	rte_rwlock_write_lock(&vq->iotlb_lock);
 
-	TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
 		TAILQ_REMOVE(&vq->iotlb_list, node, next);
 		rte_mempool_put(vq->iotlb_pool, node);
 	}
@@ -141,7 +142,7 @@ vhost_user_iotlb_cache_random_evict(struct vhost_virtqueue *vq)
 
 	entry_idx = rte_rand() % vq->iotlb_cache_nr;
 
-	TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
 		if (!entry_idx) {
 			TAILQ_REMOVE(&vq->iotlb_list, node, next);
 			rte_mempool_put(vq->iotlb_pool, node);
@@ -218,7 +219,7 @@ vhost_user_iotlb_cache_remove(struct vhost_virtqueue *vq,
 
 	rte_rwlock_write_lock(&vq->iotlb_lock);
 
-	TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
 		/* Sorted list */
 		if (unlikely(iova + size < node->iova))
 			break;
diff --git a/lib/vhost/rte_vdpa_dev.h b/lib/vhost/rte_vdpa_dev.h
index bfada387b0..b0f494815f 100644
--- a/lib/vhost/rte_vdpa_dev.h
+++ b/lib/vhost/rte_vdpa_dev.h
@@ -71,7 +71,7 @@ struct rte_vdpa_dev_ops {
  * vdpa device structure includes device address and device operations.
  */
 struct rte_vdpa_device {
-	TAILQ_ENTRY(rte_vdpa_device) next;
+	RTE_TAILQ_ENTRY(rte_vdpa_device) next;
 	/** Generic device information */
 	struct rte_device *device;
 	/** vdpa device operations */
diff --git a/lib/vhost/vdpa.c b/lib/vhost/vdpa.c
index 99a926a772..6dd91859ac 100644
--- a/lib/vhost/vdpa.c
+++ b/lib/vhost/vdpa.c
@@ -115,7 +115,7 @@ rte_vdpa_unregister_device(struct rte_vdpa_device *dev)
 	int ret = -1;
 
 	rte_spinlock_lock(&vdpa_device_list_lock);
-	TAILQ_FOREACH_SAFE(cur_dev, &vdpa_device_list, next, tmp_dev) {
+	RTE_TAILQ_FOREACH_SAFE(cur_dev, &vdpa_device_list, next, tmp_dev) {
 		if (dev != cur_dev)
 			continue;
 
-- 
2.30.2
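
The ABI compatibility claimed in the commit message can be checked
mechanically. The sketch below is not part of the patch. It assumes a
Linux or BSD host that provides <sys/queue.h>, mirrors the Windows
definitions of RTE_TAILQ_HEAD and RTE_TAILQ_ENTRY, and uses hypothetical
type names (node_rte, node_sys, list_rte, list_sys) to compare sizes and
member offsets against the system macros.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Mirrors the Windows rte_os.h definitions (illustration only). */
#define RTE_TAILQ_HEAD(name, type) \
struct name { \
	struct type *tqh_first; /* first element */ \
	struct type **tqh_last; /* addr of last next element */ \
}
#define RTE_TAILQ_ENTRY(type) \
struct { \
	struct type *tqe_next; /* next element */ \
	struct type **tqe_prev; /* address of previous next element */ \
}

/* Hypothetical element types, one per macro family. */
struct node_rte {
	int value;
	RTE_TAILQ_ENTRY(node_rte) link;
};
struct node_sys {
	int value;
	TAILQ_ENTRY(node_sys) link;
};

RTE_TAILQ_HEAD(list_rte, node_rte);
TAILQ_HEAD(list_sys, node_sys);

/* Compile-time checks: the build fails if the layouts ever diverge. */
static_assert(sizeof(struct list_rte) == sizeof(struct list_sys),
	"list head size differs");
static_assert(offsetof(struct node_rte, link.tqe_prev) ==
	offsetof(struct node_sys, link.tqe_prev),
	"entry member offset differs");

/* Runtime variant of the same comparison; returns 1 when layouts agree. */
int
tailq_layouts_match(void)
{
	return sizeof(struct list_rte) == sizeof(struct list_sys) &&
	    offsetof(struct list_rte, tqh_last) ==
	    offsetof(struct list_sys, tqh_last) &&
	    sizeof(struct node_rte) == sizeof(struct node_sys) &&
	    offsetof(struct node_rte, link) == offsetof(struct node_sys, link);
}
```

Identical layout is what lets a .c file include <sys/queue.h> and apply
TAILQ_INSERT_TAIL or TAILQ_REMOVE to a list that a public header
declared with the RTE_ macros.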



* Re: [dpdk-dev] [PATCHv5] eal: remove sys/queue.h from public headers.
  2021-08-13  3:36  1%     ` [dpdk-dev] [PATCHv5] " William Tu
@ 2021-08-13 18:59  0%       ` Dmitry Kozlyuk
  2021-08-14  2:51  1%       ` [dpdk-dev] [PATCH v6] " William Tu
  1 sibling, 0 replies; 200+ results
From: Dmitry Kozlyuk @ 2021-08-13 18:59 UTC (permalink / raw)
  To: William Tu; +Cc: dev, nick.connolly, stephen

2021-08-13 03:36 (UTC+0000), William Tu:
> Currently there are some public headers that include 'sys/queue.h', which
> is not POSIX, but usually provided by the Linux/BSD system library.
> (Not in POSIX.1, POSIX.1-2001, or POSIX.1-2008. Present on the BSDs.)
> The file is missing on Windows. During the windows build, DPDK uses a

Typo: "Windows".

> bundled copy, so building a DPDK library works fine.  But when OVS or other
> applications use DPDK as a library, because some DPDK public headers
> include 'sys/queue.h', on Windows, it triggers an error due to no such file.
> 
> One solution is to install the 'lib/eal/windows/include/sys/queue.h' into
> Windows environment, such as [1]. However, this means DPDK exports the
> functionalities of 'sys/queue.h' into the environment, which might cause
> symbols, macros, headers clashing with other applications.
> 
> The patch fixes it by removing the "#include <sys/queue.h>" from
> DPDK public headers, so programs including DPDK headers don't depend
> on the system to provide 'sys/queue.h'. When these public headers use
> macros such as TAILQ_xxx, we replace it with RTE_ prefix.

"replace it by _the ones_ with RTE_ prefix"?

> For Windows, we copy the definitions from <sys/queue.h> to rte_os.h
> under windows. Note that these RTE_ macros are compatible with

"under windows" -> "in Windows EAL"

> <sys/queue.h>, only at the level of API (to use with <sys/queue.h>

"only" -> "both"

> macros in C files) and ABI (to avoid breaking it).
> 
> Additionally, the TAILQ_FOREACH_SAFE is not part of <sys/queue.h>,
> the patch replaces it with RTE_TAILQ_FOREACH_SAFE.

> With this patch, all the public headers no longer have
> "#include <sys/queue.h>" or "TAILQ_xxx" macros.

This is a repetition of what is stated in the previous paragraph.

> 
> [1] http://mails.dpdk.org/archives/dev/2021-August/216304.html
> 
> Suggested-by: Nick Connolly <nick.connolly@mayadata.io>
> Suggested-by: Dmitry Kozliuk <Dmitry.Kozliuk@gmail.com>
> Signed-off-by: William Tu <u9012063@gmail.com>
> ---
> v4-v5
> * fix compile error due to drivers/net/ipn3ke/ipn3ke_flow.c:1234
> * run spell check

1. Please register at http://patchwork.dpdk.org with the email used for the
patches and update the state of all previous versions to "Superseded".
It is not currently done automatically and only you and a few maintainers
can change the state.

Patchwork also shows CI build failures with v5; they need to be fixed.

2. Are you using `git format-patch -v5 ...` to create patches?
The subject of your patches is missing a space ("PATCH v5" vs "PATCHv5").
Not sure if tools like patchwork will properly process it.

[...]
>  struct rte_afu_driver {
> -	TAILQ_ENTRY(rte_afu_driver) next;       /**< Next afu driver. */
> +	RTE_TAILQ_ENTRY(rte_afu_driver) next;       /**< Next afu driver. */
>  	struct rte_driver driver;               /**< Inherit core driver. */
>  	afu_probe_t *probe;                     /**< Device Probe function. */
>  	afu_remove_t *remove;                   /**< Device Remove function. */

Re: loss of comment alignment here and in other places.
It's definitely not a big deal. The current patch is good because it only
changes relevant lines; re-aligning all the comments would be worse IMO.
However, in cases like this one, where keeping the alignment doesn't require
changing neighboring lines, it could be kept. Just a nit.

[...]
>  /* This macro permits both remove and free var within the loop safely.*/
> -#ifndef TAILQ_FOREACH_SAFE
> -#define TAILQ_FOREACH_SAFE(var, head, field, tvar)		\
> -	for ((var) = TAILQ_FIRST((head));			\
> -	    (var) && ((tvar) = TAILQ_NEXT((var), field), 1);	\
> +#ifndef RTE_TAILQ_FOREACH_SAFE
> +#define RTE_TAILQ_FOREACH_SAFE(var, head, field, tvar)		\
> +	for ((var) = RTE_TAILQ_FIRST((head));			\
> +	    (var) && ((tvar) = RTE_TAILQ_NEXT((var), field), 1);	\
>  	    (var) = (tvar))
>  #endif

Why duplicate this in rte_os.h (the documentation comment is lost, BTW) and
add #ifndef? RTE_TAILQ_FOREACH_SAFE is not needed in headers, it can be left
here.
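
For reference, the _SAFE variant caches the successor before the loop
body runs, so the body may unlink and free the current element. A
minimal sketch, assuming a host with <sys/queue.h>; the macro bodies are
copied from the hunk quoted above, while struct item, item_list and
count_odd_after_removal() are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

#define RTE_TAILQ_FIRST(head) ((head)->tqh_first)
#define RTE_TAILQ_NEXT(elm, field) ((elm)->field.tqe_next)
#define RTE_TAILQ_FOREACH_SAFE(var, head, field, tvar) \
	for ((var) = RTE_TAILQ_FIRST((head)); \
	    (var) && ((tvar) = RTE_TAILQ_NEXT((var), field), 1); \
	    (var) = (tvar))

struct item {
	int id;
	TAILQ_ENTRY(item) next;
};
TAILQ_HEAD(item_list, item);

/*
 * Build a list with ids 0..n-1, drop the even ids while iterating,
 * then free the remaining elements and return how many were left.
 */
int
count_odd_after_removal(int n)
{
	struct item_list list = TAILQ_HEAD_INITIALIZER(list);
	struct item *it, *tmp;
	int i, left = 0;

	for (i = 0; i < n; i++) {
		it = malloc(sizeof(*it));
		if (it == NULL)
			break;
		it->id = i;
		TAILQ_INSERT_TAIL(&list, it, next);
	}

	/* Freeing 'it' is fine here: 'tmp' already holds its successor. */
	RTE_TAILQ_FOREACH_SAFE(it, &list, next, tmp) {
		if (it->id % 2 == 0) {
			TAILQ_REMOVE(&list, it, next);
			free(it);
		}
	}

	/* Count and release the survivors. */
	RTE_TAILQ_FOREACH_SAFE(it, &list, next, tmp) {
		left++;
		TAILQ_REMOVE(&list, it, next);
		free(it);
	}
	return left;
}
```

With a plain TAILQ_FOREACH the first loop would read it->next after
free(it), which is the failure the extra tvar argument avoids.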

>  
> diff --git a/lib/eal/linux/include/rte_os.h b/lib/eal/linux/include/rte_os.h
> index 1618b4df22..1a6e5b789f 100644
> --- a/lib/eal/linux/include/rte_os.h
> +++ b/lib/eal/linux/include/rte_os.h
> @@ -11,6 +11,21 @@
>   */
>  
>  #include <sched.h>
> +#include <sys/queue.h>
> +
> +/* These macros are compatible with system's sys/queue.h. */
> +#define RTE_TAILQ_HEAD(name, type) TAILQ_HEAD(name, type)
> +#define RTE_TAILQ_ENTRY(type) TAILQ_ENTRY(type)
> +#define RTE_TAILQ_FOREACH(var, head, field) TAILQ_FOREACH(var, head, field)
> +#define	RTE_TAILQ_FOREACH_SAFE(var, head, field, tvar) \

Stray TAB here and in rte_os.h for other platforms.
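
To illustrate the split the patch relies on: where <sys/queue.h> exists,
the RTE_ macros can be thin aliases; where it does not, they are defined
from scratch. The sketch below is modeled on the rte_os.h hunks but is
not taken from them verbatim. The _WIN32 test, struct port, port_list
and sum_port_ids() are assumptions made for the example, and the list is
linked by hand only so that the sketch needs no insertion macros.

```c
#include <assert.h>
#include <stddef.h>

#if defined(_WIN32)
/* No system <sys/queue.h>: define only what public headers need. */
#define RTE_TAILQ_HEAD(name, type) \
struct name { \
	struct type *tqh_first; \
	struct type **tqh_last; \
}
#define RTE_TAILQ_ENTRY(type) \
struct { \
	struct type *tqe_next; \
	struct type **tqe_prev; \
}
#define RTE_TAILQ_FIRST(head) ((head)->tqh_first)
#define RTE_TAILQ_NEXT(elm, field) ((elm)->field.tqe_next)
#define RTE_TAILQ_FOREACH(var, head, field) \
	for ((var) = RTE_TAILQ_FIRST((head)); \
	    (var); \
	    (var) = RTE_TAILQ_NEXT((var), field))
#else
/* System header available: alias the RTE_ names to it. */
#include <sys/queue.h>
#define RTE_TAILQ_HEAD(name, type) TAILQ_HEAD(name, type)
#define RTE_TAILQ_ENTRY(type) TAILQ_ENTRY(type)
#define RTE_TAILQ_FOREACH(var, head, field) TAILQ_FOREACH(var, head, field)
#endif

struct port {
	int id;
	RTE_TAILQ_ENTRY(port) next;
};
RTE_TAILQ_HEAD(port_list, port);

/* Walk a three-element list and return the sum of the ids. */
int
sum_port_ids(void)
{
	struct port a = { .id = 1 }, b = { .id = 2 }, c = { .id = 4 };
	struct port_list list;
	struct port *p;
	int sum = 0;

	/* Hand-built links, equivalent to three tail insertions. */
	list.tqh_first = &a;
	a.next.tqe_next = &b;
	a.next.tqe_prev = &list.tqh_first;
	b.next.tqe_next = &c;
	b.next.tqe_prev = &a.next.tqe_next;
	c.next.tqe_next = NULL;
	c.next.tqe_prev = &b.next.tqe_next;
	list.tqh_last = &c.next.tqe_next;

	RTE_TAILQ_FOREACH(p, &list, next)
		sum += p->id;
	return sum;
}
```

Note the single space after each #define, which also avoids the stray
TAB mentioned above.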


* [dpdk-dev] [PATCH v1 1/6] bbdev: add capability for CRC16 check
  @ 2021-08-13 16:51  4% ` Nicolas Chautru
  0 siblings, 0 replies; 200+ results
From: Nicolas Chautru @ 2021-08-13 16:51 UTC (permalink / raw)
  To: dev, gakhil; +Cc: thomas, trix, hemant.agrawal, mingshan.zhang, Nicolas Chautru

Add the missing operation flag for the case where
CRC16 is used for the TB CRC check.

Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
---
 app/test-bbdev/test_bbdev_vector.c     |  2 ++
 doc/guides/prog_guide/bbdev.rst        |  3 +++
 doc/guides/rel_notes/release_21_11.rst |  1 +
 lib/bbdev/rte_bbdev_op.h               | 34 ++++++++++++++++++----------------
 4 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/app/test-bbdev/test_bbdev_vector.c b/app/test-bbdev/test_bbdev_vector.c
index 614dbd1..8d796b1 100644
--- a/app/test-bbdev/test_bbdev_vector.c
+++ b/app/test-bbdev/test_bbdev_vector.c
@@ -167,6 +167,8 @@
 		*op_flag_value = RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK;
 	else if (!strcmp(token, "RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP"))
 		*op_flag_value = RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP;
+	else if (!strcmp(token, "RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK"))
+		*op_flag_value = RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK;
 	else if (!strcmp(token, "RTE_BBDEV_LDPC_DEINTERLEAVER_BYPASS"))
 		*op_flag_value = RTE_BBDEV_LDPC_DEINTERLEAVER_BYPASS;
 	else if (!strcmp(token, "RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE"))
diff --git a/doc/guides/prog_guide/bbdev.rst b/doc/guides/prog_guide/bbdev.rst
index 9619280..8bd7cba 100644
--- a/doc/guides/prog_guide/bbdev.rst
+++ b/doc/guides/prog_guide/bbdev.rst
@@ -891,6 +891,9 @@ given below.
 |RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP                                    |
 | Set to drop the last CRC bits decoding output                      |
 +--------------------------------------------------------------------+
+|RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK                                    |
+| Set for transport block CRC-16 checking                            |
++--------------------------------------------------------------------+
 |RTE_BBDEV_LDPC_DEINTERLEAVER_BYPASS                                 |
 | Set for bit-level de-interleaver bypass on input stream            |
 +--------------------------------------------------------------------+
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index d707a55..69dd518 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -84,6 +84,7 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* bbdev: Added capability related to more comprehensive CRC options.
 
 ABI Changes
 -----------
diff --git a/lib/bbdev/rte_bbdev_op.h b/lib/bbdev/rte_bbdev_op.h
index f946842..7c44ddd 100644
--- a/lib/bbdev/rte_bbdev_op.h
+++ b/lib/bbdev/rte_bbdev_op.h
@@ -142,51 +142,53 @@ enum rte_bbdev_op_ldpcdec_flag_bitmasks {
 	RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK = (1ULL << 1),
 	/** Set to drop the last CRC bits decoding output */
 	RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP = (1ULL << 2),
+	/** Set for transport block CRC-16 checking */
+	RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK = (1ULL << 3),
 	/** Set for bit-level de-interleaver bypass on Rx stream. */
-	RTE_BBDEV_LDPC_DEINTERLEAVER_BYPASS = (1ULL << 3),
+	RTE_BBDEV_LDPC_DEINTERLEAVER_BYPASS = (1ULL << 4),
 	/** Set for HARQ combined input stream enable. */
-	RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE = (1ULL << 4),
+	RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE = (1ULL << 5),
 	/** Set for HARQ combined output stream enable. */
-	RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE = (1ULL << 5),
+	RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE = (1ULL << 6),
 	/** Set for LDPC decoder bypass.
 	 *  RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE must be set.
 	 */
-	RTE_BBDEV_LDPC_DECODE_BYPASS = (1ULL << 6),
+	RTE_BBDEV_LDPC_DECODE_BYPASS = (1ULL << 7),
 	/** Set for soft-output stream enable */
-	RTE_BBDEV_LDPC_SOFT_OUT_ENABLE = (1ULL << 7),
+	RTE_BBDEV_LDPC_SOFT_OUT_ENABLE = (1ULL << 8),
 	/** Set for Rate-Matching bypass on soft-out stream. */
-	RTE_BBDEV_LDPC_SOFT_OUT_RM_BYPASS = (1ULL << 8),
+	RTE_BBDEV_LDPC_SOFT_OUT_RM_BYPASS = (1ULL << 9),
 	/** Set for bit-level de-interleaver bypass on soft-output stream. */
-	RTE_BBDEV_LDPC_SOFT_OUT_DEINTERLEAVER_BYPASS = (1ULL << 9),
+	RTE_BBDEV_LDPC_SOFT_OUT_DEINTERLEAVER_BYPASS = (1ULL << 10),
 	/** Set for iteration stopping on successful decode condition
 	 *  i.e. a successful syndrome check.
 	 */
-	RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE = (1ULL << 10),
+	RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE = (1ULL << 11),
 	/** Set if a device supports decoder dequeue interrupts. */
-	RTE_BBDEV_LDPC_DEC_INTERRUPTS = (1ULL << 11),
+	RTE_BBDEV_LDPC_DEC_INTERRUPTS = (1ULL << 12),
 	/** Set if a device supports scatter-gather functionality. */
-	RTE_BBDEV_LDPC_DEC_SCATTER_GATHER = (1ULL << 12),
+	RTE_BBDEV_LDPC_DEC_SCATTER_GATHER = (1ULL << 13),
 	/** Set if a device supports input/output HARQ compression. */
-	RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION = (1ULL << 13),
+	RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION = (1ULL << 14),
 	/** Set if a device supports input LLR compression. */
-	RTE_BBDEV_LDPC_LLR_COMPRESSION = (1ULL << 14),
+	RTE_BBDEV_LDPC_LLR_COMPRESSION = (1ULL << 15),
 	/** Set if a device supports HARQ input from
 	 *  device's internal memory.
 	 */
-	RTE_BBDEV_LDPC_INTERNAL_HARQ_MEMORY_IN_ENABLE = (1ULL << 15),
+	RTE_BBDEV_LDPC_INTERNAL_HARQ_MEMORY_IN_ENABLE = (1ULL << 16),
 	/** Set if a device supports HARQ output to
 	 *  device's internal memory.
 	 */
-	RTE_BBDEV_LDPC_INTERNAL_HARQ_MEMORY_OUT_ENABLE = (1ULL << 16),
+	RTE_BBDEV_LDPC_INTERNAL_HARQ_MEMORY_OUT_ENABLE = (1ULL << 17),
 	/** Set if a device supports loop-back access to
 	 *  HARQ internal memory. Intended for troubleshooting.
 	 */
-	RTE_BBDEV_LDPC_INTERNAL_HARQ_MEMORY_LOOPBACK = (1ULL << 17),
+	RTE_BBDEV_LDPC_INTERNAL_HARQ_MEMORY_LOOPBACK = (1ULL << 18),
 	/** Set if a device includes LLR filler bits in the circular buffer
 	 *  for HARQ memory. If not set, it is assumed the filler bits are not
 	 *  in HARQ memory and handled directly by the LDPC decoder.
 	 */
-	RTE_BBDEV_LDPC_INTERNAL_HARQ_MEMORY_FILLERS = (1ULL << 18)
+	RTE_BBDEV_LDPC_INTERNAL_HARQ_MEMORY_FILLERS = (1ULL << 19)
 };
 
 /** Flags for LDPC encoder operation and capability structure */
-- 
1.8.3.1
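
A short sketch of how an application might test for the new flag. The
three enum values are copied from the patched rte_bbdev_op.h; the
standalone enum and the helper ldpc_dec_supports() are hypothetical, and
a real application would take the capability mask from the driver
rather than build it by hand.

```c
#include <assert.h>
#include <stdint.h>

/* Values as they appear after this patch (standalone copy, not the DPDK header). */
enum ldpc_dec_flag_sketch {
	RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP = (1ULL << 2),
	RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK = (1ULL << 3),
	/* Was (1ULL << 3) before the patch. */
	RTE_BBDEV_LDPC_DEINTERLEAVER_BYPASS = (1ULL << 4),
};

/* Return 1 when every bit in 'wanted' is set in 'capability_flags'. */
int
ldpc_dec_supports(uint32_t capability_flags, uint32_t wanted)
{
	return (capability_flags & wanted) == wanted;
}
```

Because the new flag is inserted in the middle of the enum, every later
flag moves up by one bit. A mask built against the old header with
DEINTERLEAVER_BYPASS at bit 3 would now be read as CRC_TYPE_16_CHECK, so
callers need to be rebuilt against the new header.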



* [dpdk-dev] [PATCHv5] eal: remove sys/queue.h from public headers.
  2021-08-13  1:02  1%   ` [dpdk-dev] [PATCHv4] eal: remove sys/queue.h from public headers William Tu
  2021-08-13  1:11  0%     ` Stephen Hemminger
@ 2021-08-13  3:36  1%     ` William Tu
  2021-08-13 18:59  0%       ` Dmitry Kozlyuk
  2021-08-14  2:51  1%       ` [dpdk-dev] [PATCH v6] " William Tu
  1 sibling, 2 replies; 200+ results
From: William Tu @ 2021-08-13  3:36 UTC (permalink / raw)
  To: dev; +Cc: Dmitry.Kozliuk, nick.connolly, stephen


Currently there are some public headers that include 'sys/queue.h', which
is not POSIX, but usually provided by the Linux/BSD system library.
(Not in POSIX.1, POSIX.1-2001, or POSIX.1-2008. Present on the BSDs.)
The file is missing on Windows. During the Windows build, DPDK uses a
bundled copy, so building a DPDK library works fine.  But when OVS or other
applications use DPDK as a library, because some DPDK public headers
include 'sys/queue.h', on Windows, it triggers an error due to no such file.

One solution is to install the 'lib/eal/windows/include/sys/queue.h' into
Windows environment, such as [1]. However, this means DPDK exports the
functionalities of 'sys/queue.h' into the environment, which might cause
symbols, macros, headers clashing with other applications.

The patch fixes it by removing the "#include <sys/queue.h>" from
DPDK public headers, so programs including DPDK headers don't depend
on the system to provide 'sys/queue.h'. When these public headers use
macros such as TAILQ_xxx, we replace them with the ones with RTE_ prefix.
For Windows, we copy the definitions from <sys/queue.h> to rte_os.h
in Windows EAL. Note that these RTE_ macros are compatible with
<sys/queue.h>, both at the level of API (to use with <sys/queue.h>
macros in C files) and ABI (to avoid breaking it).

Additionally, the TAILQ_FOREACH_SAFE is not part of <sys/queue.h>,
the patch replaces it with RTE_TAILQ_FOREACH_SAFE.
With this patch, all the public headers no longer have
"#include <sys/queue.h>" or "TAILQ_xxx" macros.

[1] http://mails.dpdk.org/archives/dev/2021-August/216304.html

Suggested-by: Nick Connolly <nick.connolly@mayadata.io>
Suggested-by: Dmitry Kozliuk <Dmitry.Kozliuk@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
---
v4-v5
* fix compile error due to drivers/net/ipn3ke/ipn3ke_flow.c:1234
* run spell check
---
 drivers/bus/auxiliary/private.h            |  1 +
 drivers/bus/auxiliary/rte_bus_auxiliary.h  |  5 ++--
 drivers/bus/dpaa/dpaa_bus.c                |  4 +--
 drivers/bus/fslmc/fslmc_bus.c              |  4 +--
 drivers/bus/fslmc/fslmc_vfio.c             |  9 ++++---
 drivers/bus/ifpga/rte_bus_ifpga.h          |  8 +++---
 drivers/bus/pci/pci_params.c               |  2 ++
 drivers/bus/pci/rte_bus_pci.h              | 13 +++++----
 drivers/bus/pci/windows/pci.c              |  3 +++
 drivers/bus/pci/windows/pci_netuio.c       |  2 ++
 drivers/bus/vdev/rte_bus_vdev.h            |  7 +++--
 drivers/bus/vdev/vdev.c                    |  3 ++-
 drivers/bus/vmbus/rte_bus_vmbus.h          | 13 +++++----
 drivers/net/bnxt/tf_ulp/bnxt_ulp.c         |  2 +-
 drivers/net/bonding/rte_eth_bond_flow.c    |  2 +-
 drivers/net/failsafe/failsafe_flow.c       |  2 +-
 drivers/net/i40e/i40e_ethdev.c             |  9 ++++---
 drivers/net/i40e/i40e_ethdev.h             |  1 +
 drivers/net/i40e/i40e_flow.c               |  6 ++---
 drivers/net/i40e/i40e_hash.c               |  2 +-
 drivers/net/i40e/rte_pmd_i40e.c            |  6 ++---
 drivers/net/iavf/iavf_generic_flow.c       | 14 +++++-----
 drivers/net/ice/ice_dcf_ethdev.c           |  1 +
 drivers/net/ice/ice_ethdev.c               |  4 +--
 drivers/net/ice/ice_generic_flow.c         | 14 +++++-----
 drivers/net/ipn3ke/ipn3ke_flow.c           |  2 +-
 drivers/net/softnic/rte_eth_softnic_flow.c |  3 ++-
 drivers/net/softnic/rte_eth_softnic_swq.c  |  2 +-
 drivers/raw/dpaa2_qdma/dpaa2_qdma.c        |  2 +-
 lib/bbdev/rte_bbdev.h                      |  2 +-
 lib/cryptodev/rte_cryptodev.h              |  2 +-
 lib/cryptodev/rte_cryptodev_pmd.h          |  2 +-
 lib/eal/common/eal_common_devargs.c        |  6 +++--
 lib/eal/common/eal_common_fbarray.c        |  1 +
 lib/eal/common/eal_common_log.c            |  1 +
 lib/eal/common/eal_common_memalloc.c       |  1 +
 lib/eal/common/eal_common_options.c        |  3 ++-
 lib/eal/common/eal_trace.h                 |  2 ++
 lib/eal/freebsd/include/rte_os.h           | 15 +++++++++++
 lib/eal/include/rte_bus.h                  |  5 ++--
 lib/eal/include/rte_class.h                |  6 ++---
 lib/eal/include/rte_dev.h                  |  5 ++--
 lib/eal/include/rte_devargs.h              |  3 +--
 lib/eal/include/rte_log.h                  |  1 -
 lib/eal/include/rte_service.h              |  1 -
 lib/eal/include/rte_tailq.h                | 15 +++++------
 lib/eal/linux/include/rte_os.h             | 15 +++++++++++
 lib/eal/windows/eal_alarm.c                |  1 +
 lib/eal/windows/include/rte_os.h           | 31 ++++++++++++++++++++++
 lib/efd/rte_efd.c                          |  2 +-
 lib/ethdev/rte_ethdev_core.h               |  2 +-
 lib/hash/rte_fbk_hash.h                    |  1 -
 lib/hash/rte_thash.c                       |  2 ++
 lib/ip_frag/rte_ip_frag.h                  |  4 +--
 lib/mempool/rte_mempool.c                  |  2 +-
 lib/mempool/rte_mempool.h                  |  9 +++----
 lib/pci/rte_pci.h                          |  1 -
 lib/ring/rte_ring_core.h                   |  1 -
 lib/table/rte_swx_table.h                  |  7 ++---
 lib/table/rte_swx_table_selector.h         |  5 ++--
 lib/vhost/iotlb.c                          | 11 ++++----
 lib/vhost/rte_vdpa_dev.h                   |  2 +-
 lib/vhost/vdpa.c                           |  2 +-
 63 files changed, 194 insertions(+), 121 deletions(-)

diff --git a/drivers/bus/auxiliary/private.h b/drivers/bus/auxiliary/private.h
index 9987e8b501..d22e83cf7a 100644
--- a/drivers/bus/auxiliary/private.h
+++ b/drivers/bus/auxiliary/private.h
@@ -7,6 +7,7 @@
 
 #include <stdbool.h>
 #include <stdio.h>
+#include <sys/queue.h>
 
 #include "rte_bus_auxiliary.h"
 
diff --git a/drivers/bus/auxiliary/rte_bus_auxiliary.h b/drivers/bus/auxiliary/rte_bus_auxiliary.h
index 2462bad2ba..b1f5610404 100644
--- a/drivers/bus/auxiliary/rte_bus_auxiliary.h
+++ b/drivers/bus/auxiliary/rte_bus_auxiliary.h
@@ -19,7 +19,6 @@ extern "C" {
 #include <stdlib.h>
 #include <limits.h>
 #include <errno.h>
-#include <sys/queue.h>
 #include <stdint.h>
 #include <inttypes.h>
 
@@ -113,7 +112,7 @@ typedef int (rte_auxiliary_dma_unmap_t)(struct rte_auxiliary_device *dev,
  * A structure describing an auxiliary device.
  */
 struct rte_auxiliary_device {
-	TAILQ_ENTRY(rte_auxiliary_device) next;   /**< Next probed device. */
+	RTE_TAILQ_ENTRY(rte_auxiliary_device) next; /**< Next probed device. */
 	struct rte_device device;                 /**< Inherit core device */
 	char name[RTE_DEV_NAME_MAX_LEN + 1];      /**< ASCII device name */
 	struct rte_intr_handle intr_handle;       /**< Interrupt handle */
@@ -124,7 +123,7 @@ struct rte_auxiliary_device {
  * A structure describing an auxiliary driver.
  */
 struct rte_auxiliary_driver {
-	TAILQ_ENTRY(rte_auxiliary_driver) next; /**< Next in list. */
+	RTE_TAILQ_ENTRY(rte_auxiliary_driver) next; /**< Next in list. */
 	struct rte_driver driver;             /**< Inherit core driver. */
 	struct rte_auxiliary_bus *bus;        /**< Auxiliary bus reference. */
 	rte_auxiliary_match_t *match;         /**< Device match function. */
diff --git a/drivers/bus/dpaa/dpaa_bus.c b/drivers/bus/dpaa/dpaa_bus.c
index e499305d85..6cab2ae760 100644
--- a/drivers/bus/dpaa/dpaa_bus.c
+++ b/drivers/bus/dpaa/dpaa_bus.c
@@ -105,7 +105,7 @@ dpaa_add_to_device_list(struct rte_dpaa_device *newdev)
 	struct rte_dpaa_device *dev = NULL;
 	struct rte_dpaa_device *tdev = NULL;
 
-	TAILQ_FOREACH_SAFE(dev, &rte_dpaa_bus.device_list, next, tdev) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_dpaa_bus.device_list, next, tdev) {
 		comp = compare_dpaa_devices(newdev, dev);
 		if (comp < 0) {
 			TAILQ_INSERT_BEFORE(dev, newdev, next);
@@ -245,7 +245,7 @@ dpaa_clean_device_list(void)
 	struct rte_dpaa_device *dev = NULL;
 	struct rte_dpaa_device *tdev = NULL;
 
-	TAILQ_FOREACH_SAFE(dev, &rte_dpaa_bus.device_list, next, tdev) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_dpaa_bus.device_list, next, tdev) {
 		TAILQ_REMOVE(&rte_dpaa_bus.device_list, dev, next);
 		free(dev);
 		dev = NULL;
diff --git a/drivers/bus/fslmc/fslmc_bus.c b/drivers/bus/fslmc/fslmc_bus.c
index becc455f6b..8c8f8a298d 100644
--- a/drivers/bus/fslmc/fslmc_bus.c
+++ b/drivers/bus/fslmc/fslmc_bus.c
@@ -45,7 +45,7 @@ cleanup_fslmc_device_list(void)
 	struct rte_dpaa2_device *dev;
 	struct rte_dpaa2_device *t_dev;
 
-	TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, t_dev) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, t_dev) {
 		TAILQ_REMOVE(&rte_fslmc_bus.device_list, dev, next);
 		free(dev);
 		dev = NULL;
@@ -82,7 +82,7 @@ insert_in_device_list(struct rte_dpaa2_device *newdev)
 	struct rte_dpaa2_device *dev = NULL;
 	struct rte_dpaa2_device *tdev = NULL;
 
-	TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, tdev) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, tdev) {
 		comp = compare_dpaa2_devname(newdev, dev);
 		if (comp < 0) {
 			TAILQ_INSERT_BEFORE(dev, newdev, next);
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index c8373e627a..852fcfc4dd 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -808,7 +808,8 @@ fslmc_vfio_process_group(void)
 	bool is_dpmcp_in_blocklist = false, is_dpio_in_blocklist = false;
 	int dpmcp_count = 0, dpio_count = 0, current_device;
 
-	TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, dev_temp) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next,
+		dev_temp) {
 		if (dev->dev_type == DPAA2_MPORTAL) {
 			dpmcp_count++;
 			if (dev->device.devargs &&
@@ -825,7 +826,8 @@ fslmc_vfio_process_group(void)
 
 	/* Search the MCP as that should be initialized first. */
 	current_device = 0;
-	TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, dev_temp) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next,
+		dev_temp) {
 		if (dev->dev_type == DPAA2_MPORTAL) {
 			current_device++;
 			if (dev->device.devargs &&
@@ -872,7 +874,8 @@ fslmc_vfio_process_group(void)
 	}
 
 	current_device = 0;
-	TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, dev_temp) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next,
+		dev_temp) {
 		if (dev->dev_type == DPAA2_IO)
 			current_device++;
 		if (dev->device.devargs &&
diff --git a/drivers/bus/ifpga/rte_bus_ifpga.h b/drivers/bus/ifpga/rte_bus_ifpga.h
index b43084155a..0186f5acde 100644
--- a/drivers/bus/ifpga/rte_bus_ifpga.h
+++ b/drivers/bus/ifpga/rte_bus_ifpga.h
@@ -28,9 +28,9 @@ struct rte_afu_device;
 struct rte_afu_driver;
 
 /** Double linked list of Intel FPGA AFU device. */
-TAILQ_HEAD(ifpga_afu_dev_list, rte_afu_device);
+RTE_TAILQ_HEAD(ifpga_afu_dev_list, rte_afu_device);
 /** Double linked list of Intel FPGA AFU device drivers. */
-TAILQ_HEAD(ifpga_afu_drv_list, rte_afu_driver);
+RTE_TAILQ_HEAD(ifpga_afu_drv_list, rte_afu_driver);
 
 #define IFPGA_BUS_BITSTREAM_PATH_MAX_LEN 256
 
@@ -71,7 +71,7 @@ struct rte_afu_shared {
  * A structure describing a AFU device.
  */
 struct rte_afu_device {
-	TAILQ_ENTRY(rte_afu_device) next;       /**< Next in device list. */
+	RTE_TAILQ_ENTRY(rte_afu_device) next;       /**< Next in device list. */
 	struct rte_device device;               /**< Inherit core device */
 	struct rte_rawdev *rawdev;    /**< Point Rawdev */
 	struct rte_afu_id id;                   /**< AFU id within FPGA. */
@@ -105,7 +105,7 @@ typedef int (afu_remove_t)(struct rte_afu_device *);
  * A structure describing a AFU device.
  */
 struct rte_afu_driver {
-	TAILQ_ENTRY(rte_afu_driver) next;       /**< Next afu driver. */
+	RTE_TAILQ_ENTRY(rte_afu_driver) next;       /**< Next afu driver. */
 	struct rte_driver driver;               /**< Inherit core driver. */
 	afu_probe_t *probe;                     /**< Device Probe function. */
 	afu_remove_t *remove;                   /**< Device Remove function. */
diff --git a/drivers/bus/pci/pci_params.c b/drivers/bus/pci/pci_params.c
index 3192e9c967..717388753d 100644
--- a/drivers/bus/pci/pci_params.c
+++ b/drivers/bus/pci/pci_params.c
@@ -2,6 +2,8 @@
  * Copyright 2018 Gaëtan Rivet
  */
 
+#include <sys/queue.h>
+
 #include <rte_bus.h>
 #include <rte_bus_pci.h>
 #include <rte_dev.h>
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index 583470e831..673a2850c1 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -19,7 +19,6 @@ extern "C" {
 #include <stdlib.h>
 #include <limits.h>
 #include <errno.h>
-#include <sys/queue.h>
 #include <stdint.h>
 #include <inttypes.h>
 
@@ -37,16 +36,16 @@ struct rte_pci_device;
 struct rte_pci_driver;
 
 /** List of PCI devices */
-TAILQ_HEAD(rte_pci_device_list, rte_pci_device);
+RTE_TAILQ_HEAD(rte_pci_device_list, rte_pci_device);
 /** List of PCI drivers */
-TAILQ_HEAD(rte_pci_driver_list, rte_pci_driver);
+RTE_TAILQ_HEAD(rte_pci_driver_list, rte_pci_driver);
 
 /* PCI Bus iterators */
 #define FOREACH_DEVICE_ON_PCIBUS(p)	\
-		TAILQ_FOREACH(p, &(rte_pci_bus.device_list), next)
+		RTE_TAILQ_FOREACH(p, &(rte_pci_bus.device_list), next)
 
 #define FOREACH_DRIVER_ON_PCIBUS(p)	\
-		TAILQ_FOREACH(p, &(rte_pci_bus.driver_list), next)
+		RTE_TAILQ_FOREACH(p, &(rte_pci_bus.driver_list), next)
 
 struct rte_devargs;
 
@@ -64,7 +63,7 @@ enum rte_pci_kernel_driver {
  * A structure describing a PCI device.
  */
 struct rte_pci_device {
-	TAILQ_ENTRY(rte_pci_device) next;   /**< Next probed PCI device. */
+	RTE_TAILQ_ENTRY(rte_pci_device) next;   /**< Next probed PCI device. */
 	struct rte_device device;           /**< Inherit core device */
 	struct rte_pci_addr addr;           /**< PCI location. */
 	struct rte_pci_id id;               /**< PCI ID. */
@@ -160,7 +159,7 @@ typedef int (pci_dma_unmap_t)(struct rte_pci_device *dev, void *addr,
  * A structure describing a PCI driver.
  */
 struct rte_pci_driver {
-	TAILQ_ENTRY(rte_pci_driver) next;  /**< Next in list. */
+	RTE_TAILQ_ENTRY(rte_pci_driver) next;  /**< Next in list. */
 	struct rte_driver driver;          /**< Inherit core driver. */
 	struct rte_pci_bus *bus;           /**< PCI bus reference. */
 	rte_pci_probe_t *probe;            /**< Device probe function. */
diff --git a/drivers/bus/pci/windows/pci.c b/drivers/bus/pci/windows/pci.c
index d39a7748b8..d7bd5d6e80 100644
--- a/drivers/bus/pci/windows/pci.c
+++ b/drivers/bus/pci/windows/pci.c
@@ -1,6 +1,9 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright 2020 Mellanox Technologies, Ltd
  */
+
+#include <sys/queue.h>
+
 #include <rte_windows.h>
 #include <rte_errno.h>
 #include <rte_log.h>
diff --git a/drivers/bus/pci/windows/pci_netuio.c b/drivers/bus/pci/windows/pci_netuio.c
index 1bf9133f71..a0b175a8fc 100644
--- a/drivers/bus/pci/windows/pci_netuio.c
+++ b/drivers/bus/pci/windows/pci_netuio.c
@@ -2,6 +2,8 @@
  * Copyright(c) 2020 Intel Corporation.
  */
 
+#include <sys/queue.h>
+
 #include <rte_windows.h>
 #include <rte_errno.h>
 #include <rte_log.h>
diff --git a/drivers/bus/vdev/rte_bus_vdev.h b/drivers/bus/vdev/rte_bus_vdev.h
index fc315d10fa..2856799953 100644
--- a/drivers/bus/vdev/rte_bus_vdev.h
+++ b/drivers/bus/vdev/rte_bus_vdev.h
@@ -15,12 +15,11 @@
 extern "C" {
 #endif
 
-#include <sys/queue.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
 
 struct rte_vdev_device {
-	TAILQ_ENTRY(rte_vdev_device) next;      /**< Next attached vdev */
+	RTE_TAILQ_ENTRY(rte_vdev_device) next;      /**< Next attached vdev */
 	struct rte_device device;               /**< Inherit core device */
 };
 
@@ -53,7 +52,7 @@ rte_vdev_device_args(const struct rte_vdev_device *dev)
 }
 
 /** Double linked list of virtual device drivers. */
-TAILQ_HEAD(vdev_driver_list, rte_vdev_driver);
+RTE_TAILQ_HEAD(vdev_driver_list, rte_vdev_driver);
 
 /**
  * Probe function called for each virtual device driver once.
@@ -107,7 +106,7 @@ typedef int (rte_vdev_dma_unmap_t)(struct rte_vdev_device *dev, void *addr,
  * A virtual device driver abstraction.
  */
 struct rte_vdev_driver {
-	TAILQ_ENTRY(rte_vdev_driver) next; /**< Next in list. */
+	RTE_TAILQ_ENTRY(rte_vdev_driver) next; /**< Next in list. */
 	struct rte_driver driver;        /**< Inherited general driver. */
 	rte_vdev_probe_t *probe;         /**< Virtual device probe function. */
 	rte_vdev_remove_t *remove;       /**< Virtual device remove function. */
diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
index 281a2c34e8..a8d8b2327e 100644
--- a/drivers/bus/vdev/vdev.c
+++ b/drivers/bus/vdev/vdev.c
@@ -100,7 +100,8 @@ rte_vdev_remove_custom_scan(rte_vdev_scan_callback callback, void *user_arg)
 	struct vdev_custom_scan *custom_scan, *tmp_scan;
 
 	rte_spinlock_lock(&vdev_custom_scan_lock);
-	TAILQ_FOREACH_SAFE(custom_scan, &vdev_custom_scans, next, tmp_scan) {
+	RTE_TAILQ_FOREACH_SAFE(custom_scan, &vdev_custom_scans, next,
+				tmp_scan) {
 		if (custom_scan->callback != callback ||
 				(custom_scan->user_arg != (void *)-1 &&
 				custom_scan->user_arg != user_arg))
diff --git a/drivers/bus/vmbus/rte_bus_vmbus.h b/drivers/bus/vmbus/rte_bus_vmbus.h
index 4cf73ce815..6bcff66468 100644
--- a/drivers/bus/vmbus/rte_bus_vmbus.h
+++ b/drivers/bus/vmbus/rte_bus_vmbus.h
@@ -20,7 +20,6 @@ extern "C" {
 #include <limits.h>
 #include <stdbool.h>
 #include <errno.h>
-#include <sys/queue.h>
 #include <stdint.h>
 #include <inttypes.h>
 
@@ -38,15 +37,15 @@ struct rte_vmbus_bus;
 struct vmbus_channel;
 struct vmbus_mon_page;
 
-TAILQ_HEAD(rte_vmbus_device_list, rte_vmbus_device);
-TAILQ_HEAD(rte_vmbus_driver_list, rte_vmbus_driver);
+RTE_TAILQ_HEAD(rte_vmbus_device_list, rte_vmbus_device);
+RTE_TAILQ_HEAD(rte_vmbus_driver_list, rte_vmbus_driver);
 
 /* VMBus iterators */
 #define FOREACH_DEVICE_ON_VMBUS(p)	\
-	TAILQ_FOREACH(p, &(rte_vmbus_bus.device_list), next)
+	RTE_TAILQ_FOREACH(p, &(rte_vmbus_bus.device_list), next)
 
 #define FOREACH_DRIVER_ON_VMBUS(p)	\
-	TAILQ_FOREACH(p, &(rte_vmbus_bus.driver_list), next)
+	RTE_TAILQ_FOREACH(p, &(rte_vmbus_bus.driver_list), next)
 
 /** Maximum number of VMBUS resources. */
 enum hv_uio_map {
@@ -62,7 +61,7 @@ enum hv_uio_map {
  * A structure describing a VMBUS device.
  */
 struct rte_vmbus_device {
-	TAILQ_ENTRY(rte_vmbus_device) next;    /**< Next probed VMBUS device */
+	RTE_TAILQ_ENTRY(rte_vmbus_device) next; /**< Next probed VMBUS device */
 	const struct rte_vmbus_driver *driver; /**< Associated driver */
 	struct rte_device device;              /**< Inherit core device */
 	rte_uuid_t device_id;		       /**< VMBUS device id */
@@ -93,7 +92,7 @@ typedef int (vmbus_remove_t)(struct rte_vmbus_device *);
  * A structure describing a VMBUS driver.
  */
 struct rte_vmbus_driver {
-	TAILQ_ENTRY(rte_vmbus_driver) next; /**< Next in list. */
+	RTE_TAILQ_ENTRY(rte_vmbus_driver) next; /**< Next in list. */
 	struct rte_driver driver;
 	struct rte_vmbus_bus *bus;          /**< VM bus reference. */
 	vmbus_probe_t *probe;               /**< Device Probe function. */
diff --git a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
index dbf85e4eda..ac86b70caf 100644
--- a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
+++ b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
@@ -2018,7 +2018,7 @@ bnxt_ulp_cntxt_list_del(struct bnxt_ulp_context *ulp_ctx)
 	struct ulp_context_list_entry	*entry, *temp;
 
 	rte_spinlock_lock(&bnxt_ulp_ctxt_lock);
-	TAILQ_FOREACH_SAFE(entry, &ulp_cntx_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(entry, &ulp_cntx_list, next, temp) {
 		if (entry->ulp_ctx == ulp_ctx) {
 			TAILQ_REMOVE(&ulp_cntx_list, entry, next);
 			rte_free(entry);
diff --git a/drivers/net/bonding/rte_eth_bond_flow.c b/drivers/net/bonding/rte_eth_bond_flow.c
index 417f76bf60..65b77faae7 100644
--- a/drivers/net/bonding/rte_eth_bond_flow.c
+++ b/drivers/net/bonding/rte_eth_bond_flow.c
@@ -157,7 +157,7 @@ bond_flow_flush(struct rte_eth_dev *dev, struct rte_flow_error *err)
 	/* Destroy all bond flows from its slaves instead of flushing them to
 	 * keep the LACP flow or any other external flows.
 	 */
-	TAILQ_FOREACH_SAFE(flow, &internals->flow_list, next, tmp) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &internals->flow_list, next, tmp) {
 		lret = bond_flow_destroy(dev, flow, err);
 		if (unlikely(lret != 0))
 			ret = lret;
diff --git a/drivers/net/failsafe/failsafe_flow.c b/drivers/net/failsafe/failsafe_flow.c
index 5e2b5f7c67..354f9fec20 100644
--- a/drivers/net/failsafe/failsafe_flow.c
+++ b/drivers/net/failsafe/failsafe_flow.c
@@ -180,7 +180,7 @@ fs_flow_flush(struct rte_eth_dev *dev,
 			return ret;
 		}
 	}
-	TAILQ_FOREACH_SAFE(flow, &PRIV(dev)->flow_list, next, tmp) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &PRIV(dev)->flow_list, next, tmp) {
 		TAILQ_REMOVE(&PRIV(dev)->flow_list, flow, next);
 		fs_flow_release(&flow);
 	}
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 7b230e2ed1..6590363556 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -5436,7 +5436,7 @@ i40e_vsi_release(struct i40e_vsi *vsi)
 
 	/* VSI has child to attach, release child first */
 	if (vsi->veb) {
-		TAILQ_FOREACH_SAFE(vsi_list, &vsi->veb->head, list, temp) {
+		RTE_TAILQ_FOREACH_SAFE(vsi_list, &vsi->veb->head, list, temp) {
 			if (i40e_vsi_release(vsi_list->vsi) != I40E_SUCCESS)
 				return -1;
 		}
@@ -5444,7 +5444,8 @@ i40e_vsi_release(struct i40e_vsi *vsi)
 	}
 
 	if (vsi->floating_veb) {
-		TAILQ_FOREACH_SAFE(vsi_list, &vsi->floating_veb->head, list, temp) {
+		RTE_TAILQ_FOREACH_SAFE(vsi_list, &vsi->floating_veb->head,
+			list, temp) {
 			if (i40e_vsi_release(vsi_list->vsi) != I40E_SUCCESS)
 				return -1;
 		}
@@ -5452,7 +5453,7 @@ i40e_vsi_release(struct i40e_vsi *vsi)
 
 	/* Remove all macvlan filters of the VSI */
 	i40e_vsi_remove_all_macvlan_filter(vsi);
-	TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp)
+	RTE_TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp)
 		rte_free(f);
 
 	if (vsi->type != I40E_VSI_MAIN &&
@@ -6055,7 +6056,7 @@ i40e_vsi_config_vlan_filter(struct i40e_vsi *vsi, bool on)
 	i = 0;
 
 	/* Remove all existing mac */
-	TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
 		mac_filter[i] = f->mac_info;
 		ret = i40e_vsi_delete_mac(vsi, &f->mac_info.mac_addr);
 		if (ret) {
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index cd6deabd60..374b73e4a7 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -6,6 +6,7 @@
 #define _I40E_ETHDEV_H_
 
 #include <stdint.h>
+#include <sys/queue.h>
 
 #include <rte_time.h>
 #include <rte_kvargs.h>
diff --git a/drivers/net/i40e/i40e_flow.c b/drivers/net/i40e/i40e_flow.c
index 3c1570bd9c..e41a84f1d7 100644
--- a/drivers/net/i40e/i40e_flow.c
+++ b/drivers/net/i40e/i40e_flow.c
@@ -4917,7 +4917,7 @@ i40e_flow_flush_fdir_filter(struct i40e_pf *pf)
 		}
 
 		/* Delete FDIR flows in flow list. */
-		TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
+		RTE_TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
 			if (flow->filter_type == RTE_ETH_FILTER_FDIR) {
 				TAILQ_REMOVE(&pf->flow_list, flow, node);
 			}
@@ -4972,7 +4972,7 @@ i40e_flow_flush_ethertype_filter(struct i40e_pf *pf)
 	}
 
 	/* Delete ethertype flows in flow list. */
-	TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
 		if (flow->filter_type == RTE_ETH_FILTER_ETHERTYPE) {
 			TAILQ_REMOVE(&pf->flow_list, flow, node);
 			rte_free(flow);
@@ -5000,7 +5000,7 @@ i40e_flow_flush_tunnel_filter(struct i40e_pf *pf)
 	}
 
 	/* Delete tunnel flows in flow list. */
-	TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
 		if (flow->filter_type == RTE_ETH_FILTER_TUNNEL) {
 			TAILQ_REMOVE(&pf->flow_list, flow, node);
 			rte_free(flow);
diff --git a/drivers/net/i40e/i40e_hash.c b/drivers/net/i40e/i40e_hash.c
index 1fb8c9abfc..6579b1a00b 100644
--- a/drivers/net/i40e/i40e_hash.c
+++ b/drivers/net/i40e/i40e_hash.c
@@ -1366,7 +1366,7 @@ i40e_hash_filter_flush(struct i40e_pf *pf)
 {
 	struct rte_flow *flow, *next;
 
-	TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, next) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, next) {
 		if (flow->filter_type != RTE_ETH_FILTER_HASH)
 			continue;
 
diff --git a/drivers/net/i40e/rte_pmd_i40e.c b/drivers/net/i40e/rte_pmd_i40e.c
index 2e34140c5b..ec24046440 100644
--- a/drivers/net/i40e/rte_pmd_i40e.c
+++ b/drivers/net/i40e/rte_pmd_i40e.c
@@ -216,7 +216,7 @@ i40e_vsi_rm_mac_filter(struct i40e_vsi *vsi)
 	void *temp;
 
 	/* remove all the MACs */
-	TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
 		vlan_num = vsi->vlan_num;
 		filter_type = f->mac_info.filter_type;
 		if (filter_type == I40E_MACVLAN_PERFECT_MATCH ||
@@ -274,7 +274,7 @@ i40e_vsi_restore_mac_filter(struct i40e_vsi *vsi)
 	void *temp;
 
 	/* restore all the MACs */
-	TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
 		if (f->mac_info.filter_type == I40E_MACVLAN_PERFECT_MATCH ||
 		    f->mac_info.filter_type == I40E_MACVLAN_HASH_MATCH) {
 			/**
@@ -563,7 +563,7 @@ rte_pmd_i40e_set_vf_mac_addr(uint16_t port, uint16_t vf_id,
 	rte_ether_addr_copy(mac_addr, &vf->mac_addr);
 
 	/* Remove all existing mac */
-	TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp)
+	RTE_TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp)
 		if (i40e_vsi_delete_mac(vsi, &f->mac_info.mac_addr)
 				!= I40E_SUCCESS)
 			PMD_DRV_LOG(WARNING, "Delete MAC failed");
diff --git a/drivers/net/iavf/iavf_generic_flow.c b/drivers/net/iavf/iavf_generic_flow.c
index 1fe270fb22..b86d99e57d 100644
--- a/drivers/net/iavf/iavf_generic_flow.c
+++ b/drivers/net/iavf/iavf_generic_flow.c
@@ -1637,7 +1637,7 @@ iavf_flow_init(struct iavf_adapter *ad)
 	TAILQ_INIT(&vf->dist_parser_list);
 	rte_spinlock_init(&vf->flow_ops_lock);
 
-	TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
 		if (engine->init == NULL) {
 			PMD_INIT_LOG(ERR, "Invalid engine type (%d)",
 				     engine->type);
@@ -1663,7 +1663,7 @@ iavf_flow_uninit(struct iavf_adapter *ad)
 	struct iavf_flow_parser_node *p_parser;
 	void *temp;
 
-	TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
 		if (engine->uninit)
 			engine->uninit(ad);
 	}
@@ -1733,7 +1733,7 @@ iavf_unregister_parser(struct iavf_flow_parser *parser,
 	if (list == NULL)
 		return;
 
-	TAILQ_FOREACH_SAFE(p_parser, list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(p_parser, list, node, temp) {
 		if (p_parser->parser->engine->type == parser->engine->type) {
 			TAILQ_REMOVE(list, p_parser, node);
 			rte_free(p_parser);
@@ -1917,7 +1917,7 @@ iavf_parse_engine_create(struct iavf_adapter *ad,
 	void *temp;
 	void *meta = NULL;
 
-	TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
 		if (parser_node->parser->parse_pattern_action(ad,
 				parser_node->parser->array,
 				parser_node->parser->array_len,
@@ -1946,7 +1946,7 @@ iavf_parse_engine_validate(struct iavf_adapter *ad,
 	void *temp;
 	void *meta = NULL;
 
-	TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
 		if (parser_node->parser->parse_pattern_action(ad,
 				parser_node->parser->array,
 				parser_node->parser->array_len,
@@ -2089,7 +2089,7 @@ iavf_flow_is_valid(struct rte_flow *flow)
 	void *temp;
 
 	if (flow && flow->engine) {
-		TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
+		RTE_TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
 			if (engine == flow->engine)
 				return true;
 		}
@@ -2142,7 +2142,7 @@ iavf_flow_flush(struct rte_eth_dev *dev,
 	void *temp;
 	int ret = 0;
 
-	TAILQ_FOREACH_SAFE(p_flow, &vf->flow_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(p_flow, &vf->flow_list, node, temp) {
 		ret = iavf_flow_destroy(dev, p_flow, error);
 		if (ret) {
 			PMD_DRV_LOG(ERR, "Failed to flush flows");
diff --git a/drivers/net/ice/ice_dcf_ethdev.c b/drivers/net/ice/ice_dcf_ethdev.c
index cab7c4da87..629e88980d 100644
--- a/drivers/net/ice/ice_dcf_ethdev.c
+++ b/drivers/net/ice/ice_dcf_ethdev.c
@@ -4,6 +4,7 @@
 
 #include <errno.h>
 #include <stdbool.h>
+#include <sys/queue.h>
 #include <sys/types.h>
 #include <unistd.h>
 
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index a4cd39c954..fadd5f2e5a 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -1104,7 +1104,7 @@ ice_remove_all_mac_vlan_filters(struct ice_vsi *vsi)
 	if (!vsi || !vsi->mac_num)
 		return -EINVAL;
 
-	TAILQ_FOREACH_SAFE(m_f, &vsi->mac_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(m_f, &vsi->mac_list, next, temp) {
 		ret = ice_remove_mac_filter(vsi, &m_f->mac_info.mac_addr);
 		if (ret != ICE_SUCCESS) {
 			ret = -EINVAL;
@@ -1115,7 +1115,7 @@ ice_remove_all_mac_vlan_filters(struct ice_vsi *vsi)
 	if (vsi->vlan_num == 0)
 		return 0;
 
-	TAILQ_FOREACH_SAFE(v_f, &vsi->vlan_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(v_f, &vsi->vlan_list, next, temp) {
 		ret = ice_remove_vlan_filter(vsi, &v_f->vlan_info.vlan);
 		if (ret != ICE_SUCCESS) {
 			ret = -EINVAL;
diff --git a/drivers/net/ice/ice_generic_flow.c b/drivers/net/ice/ice_generic_flow.c
index 66b5743abf..3e557efe0c 100644
--- a/drivers/net/ice/ice_generic_flow.c
+++ b/drivers/net/ice/ice_generic_flow.c
@@ -1820,7 +1820,7 @@ ice_flow_init(struct ice_adapter *ad)
 	TAILQ_INIT(&pf->dist_parser_list);
 	rte_spinlock_init(&pf->flow_ops_lock);
 
-	TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
 		if (engine->init == NULL) {
 			PMD_INIT_LOG(ERR, "Invalid engine type (%d)",
 					engine->type);
@@ -1846,7 +1846,7 @@ ice_flow_uninit(struct ice_adapter *ad)
 	struct ice_flow_parser_node *p_parser;
 	void *temp;
 
-	TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
 		if (engine->uninit)
 			engine->uninit(ad);
 	}
@@ -1946,7 +1946,7 @@ ice_unregister_parser(struct ice_flow_parser *parser,
 	if (list == NULL)
 		return;
 
-	TAILQ_FOREACH_SAFE(p_parser, list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(p_parser, list, node, temp) {
 		if (p_parser->parser->engine->type == parser->engine->type) {
 			TAILQ_REMOVE(list, p_parser, node);
 			rte_free(p_parser);
@@ -2272,7 +2272,7 @@ ice_parse_engine_create(struct ice_adapter *ad,
 	void *meta = NULL;
 	void *temp;
 
-	TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
 		int ret;
 
 		if (parser_node->parser->parse_pattern_action(ad,
@@ -2305,7 +2305,7 @@ ice_parse_engine_validate(struct ice_adapter *ad,
 	struct ice_flow_parser_node *parser_node;
 	void *temp;
 
-	TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
 		if (parser_node->parser->parse_pattern_action(ad,
 				parser_node->parser->array,
 				parser_node->parser->array_len,
@@ -2477,7 +2477,7 @@ ice_flow_flush(struct rte_eth_dev *dev,
 	void *temp;
 	int ret = 0;
 
-	TAILQ_FOREACH_SAFE(p_flow, &pf->flow_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(p_flow, &pf->flow_list, node, temp) {
 		ret = ice_flow_destroy(dev, p_flow, error);
 		if (ret) {
 			PMD_DRV_LOG(ERR, "Failed to flush flows");
@@ -2541,7 +2541,7 @@ ice_flow_redirect(struct ice_adapter *ad,
 
 	rte_spinlock_lock(&pf->flow_ops_lock);
 
-	TAILQ_FOREACH_SAFE(p_flow, &pf->flow_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(p_flow, &pf->flow_list, node, temp) {
 		if (!p_flow->engine->redirect)
 			continue;
 		ret = p_flow->engine->redirect(ad, p_flow, rd);
diff --git a/drivers/net/ipn3ke/ipn3ke_flow.c b/drivers/net/ipn3ke/ipn3ke_flow.c
index c702e19ea5..f5867ca055 100644
--- a/drivers/net/ipn3ke/ipn3ke_flow.c
+++ b/drivers/net/ipn3ke/ipn3ke_flow.c
@@ -1231,7 +1231,7 @@ ipn3ke_flow_flush(struct rte_eth_dev *dev,
 	struct ipn3ke_hw *hw = IPN3KE_DEV_PRIVATE_TO_HW(dev);
 	struct rte_flow *flow, *temp;
 
-	TAILQ_FOREACH_SAFE(flow, &hw->flow_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &hw->flow_list, next, temp) {
 		TAILQ_REMOVE(&hw->flow_list, flow, next);
 		rte_free(flow);
 	}
diff --git a/drivers/net/softnic/rte_eth_softnic_flow.c b/drivers/net/softnic/rte_eth_softnic_flow.c
index 27eaf380cd..7d054c38d2 100644
--- a/drivers/net/softnic/rte_eth_softnic_flow.c
+++ b/drivers/net/softnic/rte_eth_softnic_flow.c
@@ -2207,7 +2207,8 @@ pmd_flow_flush(struct rte_eth_dev *dev,
 			void *temp;
 			int status;
 
-			TAILQ_FOREACH_SAFE(flow, &table->flows, node, temp) {
+			RTE_TAILQ_FOREACH_SAFE(flow, &table->flows, node,
+				temp) {
 				/* Rule delete. */
 				status = softnic_pipeline_table_rule_delete
 						(softnic,
diff --git a/drivers/net/softnic/rte_eth_softnic_swq.c b/drivers/net/softnic/rte_eth_softnic_swq.c
index 2083d0a976..afe6f05e29 100644
--- a/drivers/net/softnic/rte_eth_softnic_swq.c
+++ b/drivers/net/softnic/rte_eth_softnic_swq.c
@@ -39,7 +39,7 @@ softnic_softnic_swq_free_keep_rxq_txq(struct pmd_internals *p)
 {
 	struct softnic_swq *swq, *tswq;
 
-	TAILQ_FOREACH_SAFE(swq, &p->swq_list, node, tswq) {
+	RTE_TAILQ_FOREACH_SAFE(swq, &p->swq_list, node, tswq) {
 		if ((strncmp(swq->name, "RXQ", strlen("RXQ")) == 0) ||
 			(strncmp(swq->name, "TXQ", strlen("TXQ")) == 0))
 			continue;
diff --git a/drivers/raw/dpaa2_qdma/dpaa2_qdma.c b/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
index c961e18d67..7b80370b36 100644
--- a/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
+++ b/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
@@ -1606,7 +1606,7 @@ remove_hw_queues_from_list(struct dpaa2_dpdmai_dev *dpdmai_dev)
 
 	DPAA2_QDMA_FUNC_TRACE();
 
-	TAILQ_FOREACH_SAFE(queue, &qdma_queue_list, next, tqueue) {
+	RTE_TAILQ_FOREACH_SAFE(queue, &qdma_queue_list, next, tqueue) {
 		if (queue->dpdmai_dev == dpdmai_dev) {
 			TAILQ_REMOVE(&qdma_queue_list, queue, next);
 			rte_free(queue);
diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
index 7017124414..3ebf62e697 100644
--- a/lib/bbdev/rte_bbdev.h
+++ b/lib/bbdev/rte_bbdev.h
@@ -434,7 +434,7 @@ struct rte_bbdev_callback;
 struct rte_intr_handle;
 
 /** Structure to keep track of registered callbacks */
-TAILQ_HEAD(rte_bbdev_cb_list, rte_bbdev_callback);
+RTE_TAILQ_HEAD(rte_bbdev_cb_list, rte_bbdev_callback);
 
 /**
  * @internal The data structure associated with a device. Drivers can access
diff --git a/lib/cryptodev/rte_cryptodev.h b/lib/cryptodev/rte_cryptodev.h
index 11f4e6fdbf..f86bf2260b 100644
--- a/lib/cryptodev/rte_cryptodev.h
+++ b/lib/cryptodev/rte_cryptodev.h
@@ -879,7 +879,7 @@ typedef uint16_t (*enqueue_pkt_burst_t)(void *qp,
 struct rte_cryptodev_callback;
 
 /** Structure to keep track of registered callbacks */
-TAILQ_HEAD(rte_cryptodev_cb_list, rte_cryptodev_callback);
+RTE_TAILQ_HEAD(rte_cryptodev_cb_list, rte_cryptodev_callback);
 
 /**
  * Structure used to hold information about the callbacks to be called for a
diff --git a/lib/cryptodev/rte_cryptodev_pmd.h b/lib/cryptodev/rte_cryptodev_pmd.h
index 1274436870..9542cbf263 100644
--- a/lib/cryptodev/rte_cryptodev_pmd.h
+++ b/lib/cryptodev/rte_cryptodev_pmd.h
@@ -66,7 +66,7 @@ struct rte_cryptodev_global {
 
 /* Cryptodev driver, containing the driver ID */
 struct cryptodev_driver {
-	TAILQ_ENTRY(cryptodev_driver) next; /**< Next in list. */
+	RTE_TAILQ_ENTRY(cryptodev_driver) next; /**< Next in list. */
 	const struct rte_driver *driver;
 	uint8_t id;
 };
diff --git a/lib/eal/common/eal_common_devargs.c b/lib/eal/common/eal_common_devargs.c
index 23aaf8b7e4..7edc6798fe 100644
--- a/lib/eal/common/eal_common_devargs.c
+++ b/lib/eal/common/eal_common_devargs.c
@@ -9,6 +9,7 @@
 #include <stdio.h>
 #include <string.h>
 #include <stdarg.h>
+#include <sys/queue.h>
 
 #include <rte_bus.h>
 #include <rte_class.h>
@@ -18,6 +19,7 @@
 #include <rte_errno.h>
 #include <rte_kvargs.h>
 #include <rte_log.h>
+#include <rte_os.h>
 #include <rte_tailq.h>
 #include "eal_private.h"
 
@@ -291,7 +293,7 @@ rte_devargs_insert(struct rte_devargs **da)
 	if (*da == NULL || (*da)->bus == NULL)
 		return -1;
 
-	TAILQ_FOREACH_SAFE(listed_da, &devargs_list, next, tmp) {
+	RTE_TAILQ_FOREACH_SAFE(listed_da, &devargs_list, next, tmp) {
 		if (listed_da == *da)
 			/* devargs already in the list */
 			return 0;
@@ -358,7 +360,7 @@ rte_devargs_remove(struct rte_devargs *devargs)
 	if (devargs == NULL || devargs->bus == NULL)
 		return -1;
 
-	TAILQ_FOREACH_SAFE(d, &devargs_list, next, tmp) {
+	RTE_TAILQ_FOREACH_SAFE(d, &devargs_list, next, tmp) {
 		if (strcmp(d->bus->name, devargs->bus->name) == 0 &&
 		    strcmp(d->name, devargs->name) == 0) {
 			TAILQ_REMOVE(&devargs_list, d, next);
diff --git a/lib/eal/common/eal_common_fbarray.c b/lib/eal/common/eal_common_fbarray.c
index 3a28a53247..75168ca552 100644
--- a/lib/eal/common/eal_common_fbarray.c
+++ b/lib/eal/common/eal_common_fbarray.c
@@ -9,6 +9,7 @@
 #include <errno.h>
 #include <string.h>
 #include <unistd.h>
+#include <sys/queue.h>
 
 #include <rte_common.h>
 #include <rte_eal_paging.h>
diff --git a/lib/eal/common/eal_common_log.c b/lib/eal/common/eal_common_log.c
index ec8fe23a7f..1be35f5397 100644
--- a/lib/eal/common/eal_common_log.c
+++ b/lib/eal/common/eal_common_log.c
@@ -10,6 +10,7 @@
 #include <errno.h>
 #include <regex.h>
 #include <fnmatch.h>
+#include <sys/queue.h>
 
 #include <rte_eal.h>
 #include <rte_log.h>
diff --git a/lib/eal/common/eal_common_memalloc.c b/lib/eal/common/eal_common_memalloc.c
index e872c6533b..aefdf8de3f 100644
--- a/lib/eal/common/eal_common_memalloc.c
+++ b/lib/eal/common/eal_common_memalloc.c
@@ -3,6 +3,7 @@
  */
 
 #include <string.h>
+#include <sys/queue.h>
 
 #include <rte_errno.h>
 #include <rte_lcore.h>
diff --git a/lib/eal/common/eal_common_options.c b/lib/eal/common/eal_common_options.c
index ff5861b5f3..2cc74b4472 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -6,6 +6,7 @@
 #include <stdlib.h>
 #include <unistd.h>
 #include <string.h>
+#include <sys/queue.h>
 #ifndef RTE_EXEC_ENV_WINDOWS
 #include <syslog.h>
 #endif
@@ -283,7 +284,7 @@ eal_option_device_parse(void)
 	void *tmp;
 	int ret = 0;
 
-	TAILQ_FOREACH_SAFE(devopt, &devopt_list, next, tmp) {
+	RTE_TAILQ_FOREACH_SAFE(devopt, &devopt_list, next, tmp) {
 		if (ret == 0) {
 			ret = rte_devargs_add(devopt->type, devopt->arg);
 			if (ret)
diff --git a/lib/eal/common/eal_trace.h b/lib/eal/common/eal_trace.h
index 06751eb23a..76fbcd86b0 100644
--- a/lib/eal/common/eal_trace.h
+++ b/lib/eal/common/eal_trace.h
@@ -5,6 +5,8 @@
 #ifndef __EAL_TRACE_H
 #define __EAL_TRACE_H
 
+#include <sys/queue.h>
+
 #include <rte_cycles.h>
 #include <rte_log.h>
 #include <rte_malloc.h>
diff --git a/lib/eal/freebsd/include/rte_os.h b/lib/eal/freebsd/include/rte_os.h
index 627f0483ab..099ad3f019 100644
--- a/lib/eal/freebsd/include/rte_os.h
+++ b/lib/eal/freebsd/include/rte_os.h
@@ -11,6 +11,21 @@
  */
 
 #include <pthread_np.h>
+#include <sys/queue.h>
+
+/* These macros are compatible with system's sys/queue.h. */
+#define RTE_TAILQ_HEAD(name, type) TAILQ_HEAD(name, type)
+#define RTE_TAILQ_ENTRY(type) TAILQ_ENTRY(type)
+#define RTE_TAILQ_FOREACH(var, head, field) TAILQ_FOREACH(var, head, field)
+#define	RTE_TAILQ_FOREACH_SAFE(var, head, field, tvar) \
+	for ((var) = TAILQ_FIRST((head)); \
+	    (var) && ((tvar) = TAILQ_NEXT((var), field), 1); \
+	    (var) = (tvar))
+#define RTE_TAILQ_FIRST(head) TAILQ_FIRST(head)
+#define RTE_TAILQ_NEXT(elem, field) TAILQ_NEXT(elem, field)
+#define RTE_STAILQ_HEAD(name, type) STAILQ_HEAD(name, type)
+#define RTE_STAILQ_ENTRY(type) STAILQ_ENTRY(type)
+
 
 typedef cpuset_t rte_cpuset_t;
 #define RTE_HAS_CPUSET
diff --git a/lib/eal/include/rte_bus.h b/lib/eal/include/rte_bus.h
index 80b154fb98..84d364df3f 100644
--- a/lib/eal/include/rte_bus.h
+++ b/lib/eal/include/rte_bus.h
@@ -19,13 +19,12 @@ extern "C" {
 #endif
 
 #include <stdio.h>
-#include <sys/queue.h>
 
 #include <rte_log.h>
 #include <rte_dev.h>
 
 /** Double linked list of buses */
-TAILQ_HEAD(rte_bus_list, rte_bus);
+RTE_TAILQ_HEAD(rte_bus_list, rte_bus);
 
 
 /**
@@ -250,7 +249,7 @@ typedef enum rte_iova_mode (*rte_bus_get_iommu_class_t)(void);
  * A structure describing a generic bus.
  */
 struct rte_bus {
-	TAILQ_ENTRY(rte_bus) next;   /**< Next bus object in linked list */
+	RTE_TAILQ_ENTRY(rte_bus) next;   /**< Next bus object in linked list */
 	const char *name;            /**< Name of the bus */
 	rte_bus_scan_t scan;         /**< Scan for devices attached to bus */
 	rte_bus_probe_t probe;       /**< Probe devices on bus */
diff --git a/lib/eal/include/rte_class.h b/lib/eal/include/rte_class.h
index 856d09b22d..d560339652 100644
--- a/lib/eal/include/rte_class.h
+++ b/lib/eal/include/rte_class.h
@@ -22,18 +22,16 @@
 extern "C" {
 #endif
 
-#include <sys/queue.h>
-
 #include <rte_dev.h>
 
 /** Double linked list of classes */
-TAILQ_HEAD(rte_class_list, rte_class);
+RTE_TAILQ_HEAD(rte_class_list, rte_class);
 
 /**
  * A structure describing a generic device class.
  */
 struct rte_class {
-	TAILQ_ENTRY(rte_class) next; /**< Next device class in linked list */
+	RTE_TAILQ_ENTRY(rte_class) next; /**< Next device class in linked list */
 	const char *name; /**< Name of the class */
 	rte_dev_iterate_t dev_iterate; /**< Device iterator. */
 };
diff --git a/lib/eal/include/rte_dev.h b/lib/eal/include/rte_dev.h
index 6dd72c11a1..f6efe0c94e 100644
--- a/lib/eal/include/rte_dev.h
+++ b/lib/eal/include/rte_dev.h
@@ -18,7 +18,6 @@ extern "C" {
 #endif
 
 #include <stdio.h>
-#include <sys/queue.h>
 
 #include <rte_config.h>
 #include <rte_compat.h>
@@ -75,7 +74,7 @@ struct rte_mem_resource {
  * A structure describing a device driver.
  */
 struct rte_driver {
-	TAILQ_ENTRY(rte_driver) next;  /**< Next in list. */
+	RTE_TAILQ_ENTRY(rte_driver) next;  /**< Next in list. */
 	const char *name;                   /**< Driver name. */
 	const char *alias;              /**< Driver alias. */
 };
@@ -90,7 +89,7 @@ struct rte_driver {
  * A structure describing a generic device.
  */
 struct rte_device {
-	TAILQ_ENTRY(rte_device) next; /**< Next device */
+	RTE_TAILQ_ENTRY(rte_device) next; /**< Next device */
 	const char *name;             /**< Device name */
 	const struct rte_driver *driver; /**< Driver assigned after probing */
 	const struct rte_bus *bus;    /**< Bus handle assigned on scan */
diff --git a/lib/eal/include/rte_devargs.h b/lib/eal/include/rte_devargs.h
index cd90944fe8..957477b398 100644
--- a/lib/eal/include/rte_devargs.h
+++ b/lib/eal/include/rte_devargs.h
@@ -21,7 +21,6 @@ extern "C" {
 #endif
 
 #include <stdio.h>
-#include <sys/queue.h>
 #include <rte_compat.h>
 #include <rte_bus.h>
 
@@ -76,7 +75,7 @@ enum rte_devtype {
  */
 struct rte_devargs {
 	/** Next in list. */
-	TAILQ_ENTRY(rte_devargs) next;
+	RTE_TAILQ_ENTRY(rte_devargs) next;
 	/** Type of device. */
 	enum rte_devtype type;
 	/** Device policy. */
diff --git a/lib/eal/include/rte_log.h b/lib/eal/include/rte_log.h
index b706bb8710..bb3523467b 100644
--- a/lib/eal/include/rte_log.h
+++ b/lib/eal/include/rte_log.h
@@ -21,7 +21,6 @@ extern "C" {
 #include <stdio.h>
 #include <stdarg.h>
 #include <stdbool.h>
-#include <sys/queue.h>
 
 #include <rte_common.h>
 #include <rte_config.h>
diff --git a/lib/eal/include/rte_service.h b/lib/eal/include/rte_service.h
index c7d037d862..1c9275c32a 100644
--- a/lib/eal/include/rte_service.h
+++ b/lib/eal/include/rte_service.h
@@ -29,7 +29,6 @@ extern "C" {
 
 #include<stdio.h>
 #include <stdint.h>
-#include <sys/queue.h>
 
 #include <rte_config.h>
 #include <rte_lcore.h>
diff --git a/lib/eal/include/rte_tailq.h b/lib/eal/include/rte_tailq.h
index b6fe4e5f78..28cd54ef3e 100644
--- a/lib/eal/include/rte_tailq.h
+++ b/lib/eal/include/rte_tailq.h
@@ -15,17 +15,16 @@
 extern "C" {
 #endif
 
-#include <sys/queue.h>
 #include <stdio.h>
 #include <rte_debug.h>
 
 /** dummy structure type used by the rte_tailq APIs */
 struct rte_tailq_entry {
-	TAILQ_ENTRY(rte_tailq_entry) next; /**< Pointer entries for a tailq list */
+	RTE_TAILQ_ENTRY(rte_tailq_entry) next; /**< Pointer entries for a tailq list */
 	void *data; /**< Pointer to the data referenced by this tailq entry */
 };
 /** dummy */
-TAILQ_HEAD(rte_tailq_entry_head, rte_tailq_entry);
+RTE_TAILQ_HEAD(rte_tailq_entry_head, rte_tailq_entry);
 
 #define RTE_TAILQ_NAMESIZE 32
 
@@ -48,7 +47,7 @@ struct rte_tailq_elem {
 	 * rte_eal_tailqs_init()
 	 */
 	struct rte_tailq_head *head;
-	TAILQ_ENTRY(rte_tailq_elem) next;
+	RTE_TAILQ_ENTRY(rte_tailq_elem) next;
 	const char name[RTE_TAILQ_NAMESIZE];
 };
 
@@ -126,10 +125,10 @@ RTE_INIT(tailqinitfn_ ##t) \
 }
 
 /* This macro permits both remove and free var within the loop safely.*/
-#ifndef TAILQ_FOREACH_SAFE
-#define TAILQ_FOREACH_SAFE(var, head, field, tvar)		\
-	for ((var) = TAILQ_FIRST((head));			\
-	    (var) && ((tvar) = TAILQ_NEXT((var), field), 1);	\
+#ifndef RTE_TAILQ_FOREACH_SAFE
+#define RTE_TAILQ_FOREACH_SAFE(var, head, field, tvar)		\
+	for ((var) = RTE_TAILQ_FIRST((head));			\
+	    (var) && ((tvar) = RTE_TAILQ_NEXT((var), field), 1);	\
 	    (var) = (tvar))
 #endif
 
diff --git a/lib/eal/linux/include/rte_os.h b/lib/eal/linux/include/rte_os.h
index 1618b4df22..1a6e5b789f 100644
--- a/lib/eal/linux/include/rte_os.h
+++ b/lib/eal/linux/include/rte_os.h
@@ -11,6 +11,21 @@
  */
 
 #include <sched.h>
+#include <sys/queue.h>
+
+/* These macros are compatible with system's sys/queue.h. */
+#define RTE_TAILQ_HEAD(name, type) TAILQ_HEAD(name, type)
+#define RTE_TAILQ_ENTRY(type) TAILQ_ENTRY(type)
+#define RTE_TAILQ_FOREACH(var, head, field) TAILQ_FOREACH(var, head, field)
+#define	RTE_TAILQ_FOREACH_SAFE(var, head, field, tvar) \
+	for ((var) = TAILQ_FIRST((head)); \
+	    (var) && ((tvar) = TAILQ_NEXT((var), field), 1); \
+	    (var) = (tvar))
+#define RTE_TAILQ_FIRST(head) TAILQ_FIRST(head)
+#define RTE_TAILQ_NEXT(elem, field) TAILQ_NEXT(elem, field)
+#define RTE_STAILQ_HEAD(name, type) STAILQ_HEAD(name, type)
+#define RTE_STAILQ_ENTRY(type) STAILQ_ENTRY(type)
+
 
 #ifdef CPU_SETSIZE /* may require _GNU_SOURCE */
 typedef cpu_set_t rte_cpuset_t;
diff --git a/lib/eal/windows/eal_alarm.c b/lib/eal/windows/eal_alarm.c
index e5dc54efb8..103c1f909d 100644
--- a/lib/eal/windows/eal_alarm.c
+++ b/lib/eal/windows/eal_alarm.c
@@ -4,6 +4,7 @@
 
 #include <stdatomic.h>
 #include <stdbool.h>
+#include <sys/queue.h>
 
 #include <rte_alarm.h>
 #include <rte_spinlock.h>
diff --git a/lib/eal/windows/include/rte_os.h b/lib/eal/windows/include/rte_os.h
index 66c711d458..ee7a8c7a08 100644
--- a/lib/eal/windows/include/rte_os.h
+++ b/lib/eal/windows/include/rte_os.h
@@ -18,6 +18,37 @@
 extern "C" {
 #endif
 
+#define	RTE_TAILQ_HEAD(name, type) \
+struct name { \
+	struct type *tqh_first;	/* first element */ \
+	struct type **tqh_last;	/* addr of last next element */	\
+}
+#define	RTE_TAILQ_ENTRY(type) \
+struct { \
+	struct type *tqe_next;	/* next element */ \
+	struct type **tqe_prev;	/* address of previous next element */ \
+}
+#define	RTE_TAILQ_FOREACH(var, head, field) \
+	for ((var) = RTE_TAILQ_FIRST((head)); \
+	    (var); \
+	    (var) = RTE_TAILQ_NEXT((var), field))
+#define	RTE_TAILQ_FOREACH_SAFE(var, head, field, tvar) \
+	for ((var) = RTE_TAILQ_FIRST((head)); \
+	    (var) && ((tvar) = RTE_TAILQ_NEXT((var), field), 1); \
+	    (var) = (tvar))
+#define	RTE_TAILQ_FIRST(head)	((head)->tqh_first)
+#define	RTE_TAILQ_NEXT(elm, field) ((elm)->field.tqe_next)
+#define	RTE_STAILQ_HEAD(name, type) \
+struct name { \
+	struct type *stqh_first;/* first element */ \
+	struct type **stqh_last;/* addr of last next element */ \
+}
+#define	RTE_STAILQ_ENTRY(type) \
+struct { \
+	struct type *stqe_next;	/* next element */ \
+}
+
+
 /* cpu_set macros implementation */
 #define RTE_CPU_AND(dst, src1, src2) CPU_AND(dst, src1, src2)
 #define RTE_CPU_OR(dst, src1, src2) CPU_OR(dst, src1, src2)
diff --git a/lib/efd/rte_efd.c b/lib/efd/rte_efd.c
index 77f46809f8..5bf517fee9 100644
--- a/lib/efd/rte_efd.c
+++ b/lib/efd/rte_efd.c
@@ -759,7 +759,7 @@ rte_efd_free(struct rte_efd_table *table)
 	efd_list = RTE_TAILQ_CAST(rte_efd_tailq.head, rte_efd_list);
 	rte_mcfg_tailq_write_lock();
 
-	TAILQ_FOREACH_SAFE(te, efd_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(te, efd_list, next, temp) {
 		if (te->data == (void *) table) {
 			TAILQ_REMOVE(efd_list, te, next);
 			rte_free(te);
diff --git a/lib/ethdev/rte_ethdev_core.h b/lib/ethdev/rte_ethdev_core.h
index edf96de2dc..d2c9ec42c7 100644
--- a/lib/ethdev/rte_ethdev_core.h
+++ b/lib/ethdev/rte_ethdev_core.h
@@ -21,7 +21,7 @@
 
 struct rte_eth_dev_callback;
 /** @internal Structure to keep track of registered callbacks */
-TAILQ_HEAD(rte_eth_dev_cb_list, rte_eth_dev_callback);
+RTE_TAILQ_HEAD(rte_eth_dev_cb_list, rte_eth_dev_callback);
 
 struct rte_eth_dev;
 
diff --git a/lib/hash/rte_fbk_hash.h b/lib/hash/rte_fbk_hash.h
index c4d6976d2b..9c3a61c1d6 100644
--- a/lib/hash/rte_fbk_hash.h
+++ b/lib/hash/rte_fbk_hash.h
@@ -17,7 +17,6 @@
 
 #include <stdint.h>
 #include <errno.h>
-#include <sys/queue.h>
 
 #ifdef __cplusplus
 extern "C" {
diff --git a/lib/hash/rte_thash.c b/lib/hash/rte_thash.c
index d5a95a6e00..696a1121e2 100644
--- a/lib/hash/rte_thash.c
+++ b/lib/hash/rte_thash.c
@@ -2,6 +2,8 @@
  * Copyright(c) 2021 Intel Corporation
  */
 
+#include <sys/queue.h>
+
 #include <rte_thash.h>
 #include <rte_tailq.h>
 #include <rte_random.h>
diff --git a/lib/ip_frag/rte_ip_frag.h b/lib/ip_frag/rte_ip_frag.h
index 0bfe64b14e..80f931c32a 100644
--- a/lib/ip_frag/rte_ip_frag.h
+++ b/lib/ip_frag/rte_ip_frag.h
@@ -62,7 +62,7 @@ struct ip_frag_key {
  * First two entries in the frags[] array are for the last and first fragments.
  */
 struct ip_frag_pkt {
-	TAILQ_ENTRY(ip_frag_pkt) lru;   /**< LRU list */
+	RTE_TAILQ_ENTRY(ip_frag_pkt) lru;   /**< LRU list */
 	struct ip_frag_key key;           /**< fragmentation key */
 	uint64_t             start;       /**< creation timestamp */
 	uint32_t             total_size;  /**< expected reassembled size */
@@ -83,7 +83,7 @@ struct rte_ip_frag_death_row {
 	/**< mbufs to be freed */
 };
 
-TAILQ_HEAD(ip_pkt_list, ip_frag_pkt); /**< @internal fragments tailq */
+RTE_TAILQ_HEAD(ip_pkt_list, ip_frag_pkt); /**< @internal fragments tailq */
 
 /** fragmentation table statistics */
 struct ip_frag_tbl_stat {
diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
index 59a588425b..c5f859ae71 100644
--- a/lib/mempool/rte_mempool.c
+++ b/lib/mempool/rte_mempool.c
@@ -1337,7 +1337,7 @@ void rte_mempool_walk(void (*func)(struct rte_mempool *, void *),
 
 	rte_mcfg_mempool_read_lock();
 
-	TAILQ_FOREACH_SAFE(te, mempool_list, next, tmp_te) {
+	RTE_TAILQ_FOREACH_SAFE(te, mempool_list, next, tmp_te) {
 		(*func)((struct rte_mempool *) te->data, arg);
 	}
 
diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 4235d6f0bf..f57ecbd6fc 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -38,7 +38,6 @@
 #include <stdint.h>
 #include <errno.h>
 #include <inttypes.h>
-#include <sys/queue.h>
 
 #include <rte_config.h>
 #include <rte_spinlock.h>
@@ -141,7 +140,7 @@ struct rte_mempool_objsz {
  * double-frees.
  */
 struct rte_mempool_objhdr {
-	STAILQ_ENTRY(rte_mempool_objhdr) next; /**< Next in list. */
+	RTE_STAILQ_ENTRY(rte_mempool_objhdr) next; /**< Next in list. */
 	struct rte_mempool *mp;          /**< The mempool owning the object. */
 	rte_iova_t iova;                 /**< IO address of the object. */
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
@@ -152,7 +151,7 @@ struct rte_mempool_objhdr {
 /**
  * A list of object headers type
  */
-STAILQ_HEAD(rte_mempool_objhdr_list, rte_mempool_objhdr);
+RTE_STAILQ_HEAD(rte_mempool_objhdr_list, rte_mempool_objhdr);
 
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
 
@@ -171,7 +170,7 @@ struct rte_mempool_objtlr {
 /**
  * A list of memory where objects are stored
  */
-STAILQ_HEAD(rte_mempool_memhdr_list, rte_mempool_memhdr);
+RTE_STAILQ_HEAD(rte_mempool_memhdr_list, rte_mempool_memhdr);
 
 /**
  * Callback used to free a memory chunk
@@ -186,7 +185,7 @@ typedef void (rte_mempool_memchunk_free_cb_t)(struct rte_mempool_memhdr *memhdr,
  * and physically contiguous.
  */
 struct rte_mempool_memhdr {
-	STAILQ_ENTRY(rte_mempool_memhdr) next; /**< Next in list. */
+	RTE_STAILQ_ENTRY(rte_mempool_memhdr) next; /**< Next in list. */
 	struct rte_mempool *mp;  /**< The mempool owning the chunk */
 	void *addr;              /**< Virtual address of the chunk */
 	rte_iova_t iova;         /**< IO address of the chunk */
diff --git a/lib/pci/rte_pci.h b/lib/pci/rte_pci.h
index 1f33d687f4..71cbd441c7 100644
--- a/lib/pci/rte_pci.h
+++ b/lib/pci/rte_pci.h
@@ -18,7 +18,6 @@ extern "C" {
 
 #include <stdio.h>
 #include <limits.h>
-#include <sys/queue.h>
 #include <inttypes.h>
 #include <sys/types.h>
 
diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
index 16718ca7f1..43ce1a29d4 100644
--- a/lib/ring/rte_ring_core.h
+++ b/lib/ring/rte_ring_core.h
@@ -26,7 +26,6 @@ extern "C" {
 #include <stdio.h>
 #include <stdint.h>
 #include <string.h>
-#include <sys/queue.h>
 #include <errno.h>
 #include <rte_common.h>
 #include <rte_config.h>
diff --git a/lib/table/rte_swx_table.h b/lib/table/rte_swx_table.h
index e23f2304c6..f93e5f3f95 100644
--- a/lib/table/rte_swx_table.h
+++ b/lib/table/rte_swx_table.h
@@ -16,7 +16,8 @@ extern "C" {
  */
 
 #include <stdint.h>
-#include <sys/queue.h>
+
+#include <rte_os.h>
 
 /** Match type. */
 enum rte_swx_table_match_type {
@@ -68,7 +69,7 @@ struct rte_swx_table_entry {
 	/** Used to facilitate the membership of this table entry to a
 	 * linked list.
 	 */
-	TAILQ_ENTRY(rte_swx_table_entry) node;
+	RTE_TAILQ_ENTRY(rte_swx_table_entry) node;
 
 	/** Key value for the current entry. Array of *key_size* bytes or NULL
 	 * if the *key_size* for the current table is 0.
@@ -111,7 +112,7 @@ struct rte_swx_table_entry {
 };
 
 /** List of table entries. */
-TAILQ_HEAD(rte_swx_table_entry_list, rte_swx_table_entry);
+RTE_TAILQ_HEAD(rte_swx_table_entry_list, rte_swx_table_entry);
 
 /**
  * Table memory footprint get
diff --git a/lib/table/rte_swx_table_selector.h b/lib/table/rte_swx_table_selector.h
index 71b6a74810..62988d2856 100644
--- a/lib/table/rte_swx_table_selector.h
+++ b/lib/table/rte_swx_table_selector.h
@@ -16,7 +16,6 @@ extern "C" {
  */
 
 #include <stdint.h>
-#include <sys/queue.h>
 
 #include <rte_compat.h>
 
@@ -56,7 +55,7 @@ struct rte_swx_table_selector_params {
 /** Group member parameters. */
 struct rte_swx_table_selector_member {
 	/** Linked list connectivity. */
-	TAILQ_ENTRY(rte_swx_table_selector_member) node;
+	RTE_TAILQ_ENTRY(rte_swx_table_selector_member) node;
 
 	/** Member ID. */
 	uint32_t member_id;
@@ -66,7 +65,7 @@ struct rte_swx_table_selector_member {
 };
 
 /** List of group members. */
-TAILQ_HEAD(rte_swx_table_selector_member_list, rte_swx_table_selector_member);
+RTE_TAILQ_HEAD(rte_swx_table_selector_member_list, rte_swx_table_selector_member);
 
 /** Group parameters. */
 struct rte_swx_table_selector_group {
diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index e0b67721b6..e4a445e709 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -32,7 +32,7 @@ vhost_user_iotlb_pending_remove_all(struct vhost_virtqueue *vq)
 
 	rte_rwlock_write_lock(&vq->iotlb_pending_lock);
 
-	TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next, temp_node) {
 		TAILQ_REMOVE(&vq->iotlb_pending_list, node, next);
 		rte_mempool_put(vq->iotlb_pool, node);
 	}
@@ -100,7 +100,8 @@ vhost_user_iotlb_pending_remove(struct vhost_virtqueue *vq,
 
 	rte_rwlock_write_lock(&vq->iotlb_pending_lock);
 
-	TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next,
+				temp_node) {
 		if (node->iova < iova)
 			continue;
 		if (node->iova >= iova + size)
@@ -121,7 +122,7 @@ vhost_user_iotlb_cache_remove_all(struct vhost_virtqueue *vq)
 
 	rte_rwlock_write_lock(&vq->iotlb_lock);
 
-	TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
 		TAILQ_REMOVE(&vq->iotlb_list, node, next);
 		rte_mempool_put(vq->iotlb_pool, node);
 	}
@@ -141,7 +142,7 @@ vhost_user_iotlb_cache_random_evict(struct vhost_virtqueue *vq)
 
 	entry_idx = rte_rand() % vq->iotlb_cache_nr;
 
-	TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
 		if (!entry_idx) {
 			TAILQ_REMOVE(&vq->iotlb_list, node, next);
 			rte_mempool_put(vq->iotlb_pool, node);
@@ -218,7 +219,7 @@ vhost_user_iotlb_cache_remove(struct vhost_virtqueue *vq,
 
 	rte_rwlock_write_lock(&vq->iotlb_lock);
 
-	TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
 		/* Sorted list */
 		if (unlikely(iova + size < node->iova))
 			break;
diff --git a/lib/vhost/rte_vdpa_dev.h b/lib/vhost/rte_vdpa_dev.h
index bfada387b0..b0f494815f 100644
--- a/lib/vhost/rte_vdpa_dev.h
+++ b/lib/vhost/rte_vdpa_dev.h
@@ -71,7 +71,7 @@ struct rte_vdpa_dev_ops {
  * vdpa device structure includes device address and device operations.
  */
 struct rte_vdpa_device {
-	TAILQ_ENTRY(rte_vdpa_device) next;
+	RTE_TAILQ_ENTRY(rte_vdpa_device) next;
 	/** Generic device information */
 	struct rte_device *device;
 	/** vdpa device operations */
diff --git a/lib/vhost/vdpa.c b/lib/vhost/vdpa.c
index 99a926a772..6dd91859ac 100644
--- a/lib/vhost/vdpa.c
+++ b/lib/vhost/vdpa.c
@@ -115,7 +115,7 @@ rte_vdpa_unregister_device(struct rte_vdpa_device *dev)
 	int ret = -1;
 
 	rte_spinlock_lock(&vdpa_device_list_lock);
-	TAILQ_FOREACH_SAFE(cur_dev, &vdpa_device_list, next, tmp_dev) {
+	RTE_TAILQ_FOREACH_SAFE(cur_dev, &vdpa_device_list, next, tmp_dev) {
 		if (dev != cur_dev)
 			continue;
 
-- 
2.30.2



* Re: [dpdk-dev] [PATCHv4] eal: remove sys/queue.h from public headers.
  2021-08-13  1:11  0%     ` Stephen Hemminger
@ 2021-08-13  1:36  0%       ` William Tu
  0 siblings, 0 replies; 200+ results
From: William Tu @ 2021-08-13  1:36 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dpdk-dev, Dmitry Kozliuk, Nick Connolly

On Thu, Aug 12, 2021 at 6:11 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Fri, 13 Aug 2021 01:02:50 +0000
> William Tu <u9012063@gmail.com> wrote:
>
> > Currently there are some public headers that include 'sys/queue.h', which
> > is not POSIX, but usually provided by Linux/BSD system library.
> > (Not in POSIX.1, POSIX.1-2001, or POSIX.1-2008. Present on the BSDs.)
> > The file is missing on Windows. During the windows build, DPDK uses a
> > bundled copy, so building DPDK library works fine.  But when OVS or other
> > applications use DPDK as a library, because some DPDK public headers
> > include 'sys/queue.h', on Windows, it triggers error due to no such file.
> >
> > One solution is to install the 'lib/eal/windows/include/sys/queue.h' into
> > Windows environment, such as [1]. However, this means DPDK exports the
> > functionalities of 'sys/queue.h' into the environment, which might cause
> > symbols, macros, headers clashing with other applications.
> >
> > The patch fixes it by removing the "#include <sys/queue.h>" from
> > DPDK public headers, so programs including DPDK headers don't depend
> > on system to provide 'sys/queue.h'. When these public headers use
> > macros such as TAILQ_xxx, we replace it with RTE_ prefix.
> > For Windows, we copy the definitions from <sys/queue.h> to rte_os.h
> > under windows. Note that these RTE_ macros are compatible with
> > <sys/queue.h>, only at the level of API (to use with <sys/queue.h>
> > macros in C files) and ABI (to avoid breaking it).
> >
> > Additionally, the TAILQ_FOREACH_SAFE is not part of <sys/queue.h>,
> > the patch replaces it with RTE_TAILQ_FOREACH_SAFE.
> > With this patch, all the public headers no longer have
> > "#include <sys/queue.h>" or "TAILQ_xxx" macros.
>
>
> Please run a spell checker on the commit message if you resubmit.

OK, will do it, thanks!
William


* Re: [dpdk-dev] [PATCHv4] eal: remove sys/queue.h from public headers.
  2021-08-13  1:02  1%   ` [dpdk-dev] [PATCHv4] eal: remove sys/queue.h from public headers William Tu
@ 2021-08-13  1:11  0%     ` Stephen Hemminger
  2021-08-13  1:36  0%       ` William Tu
  2021-08-13  3:36  1%     ` [dpdk-dev] [PATCHv5] " William Tu
  1 sibling, 1 reply; 200+ results
From: Stephen Hemminger @ 2021-08-13  1:11 UTC (permalink / raw)
  To: William Tu; +Cc: dev, Dmitry.Kozliuk, nick.connolly

On Fri, 13 Aug 2021 01:02:50 +0000
William Tu <u9012063@gmail.com> wrote:

> Currently there are some public headers that include 'sys/queue.h', which
> is not POSIX, but usually provided by Linux/BSD system library.
> (Not in POSIX.1, POSIX.1-2001, or POSIX.1-2008. Present on the BSDs.)
> The file is missing on Windows. During the windows build, DPDK uses a
> bundled copy, so building DPDK library works fine.  But when OVS or other
> applications use DPDK as a library, because some DPDK public headers
> include 'sys/queue.h', on Windows, it triggers error due to no such file.
> 
> One solution is to install the 'lib/eal/windows/include/sys/queue.h' into
> Windows environment, such as [1]. However, this means DPDK exports the
> functionalities of 'sys/queue.h' into the environment, which might cause
> symbols, macros, headers clashing with other applications.
> 
> The patch fixes it by removing the "#include <sys/queue.h>" from
> DPDK public headers, so programs including DPDK headers don't depend
> on system to provide 'sys/queue.h'. When these public headers use
> macros such as TAILQ_xxx, we replace it with RTE_ prefix.
> For Windows, we copy the definitions from <sys/queue.h> to rte_os.h
> under windows. Note that these RTE_ macros are compatible with
> <sys/queue.h>, only at the level of API (to use with <sys/queue.h>
> macros in C files) and ABI (to avoid breaking it).
> 
> Additionally, the TAILQ_FOREACH_SAFE is not part of <sys/queue.h>,
> the patch replaces it with RTE_TAILQ_FOREACH_SAFE.
> With this patch, all the public headers no longer have
> "#include <sys/queue.h>" or "TAILQ_xxx" macros.


Please run a spell checker on the commit message if you resubmit.


* [dpdk-dev] [PATCHv4] eal: remove sys/queue.h from public headers.
    2021-08-12 21:58  3%   ` Dmitry Kozlyuk
@ 2021-08-13  1:02  1%   ` William Tu
  2021-08-13  1:11  0%     ` Stephen Hemminger
  2021-08-13  3:36  1%     ` [dpdk-dev] [PATCHv5] " William Tu
  1 sibling, 2 replies; 200+ results
From: William Tu @ 2021-08-13  1:02 UTC (permalink / raw)
  To: dev; +Cc: Dmitry.Kozliuk, nick.connolly

Currently there are some public headers that include 'sys/queue.h', which
is not POSIX, but usually provided by Linux/BSD system library.
(Not in POSIX.1, POSIX.1-2001, or POSIX.1-2008. Present on the BSDs.)
The file is missing on Windows. During the windows build, DPDK uses a
bundled copy, so building DPDK library works fine.  But when OVS or other
applications use DPDK as a library, because some DPDK public headers
include 'sys/queue.h', on Windows, it triggers error due to no such file.

One solution is to install the 'lib/eal/windows/include/sys/queue.h' into
Windows environment, such as [1]. However, this means DPDK exports the
functionalities of 'sys/queue.h' into the environment, which might cause
symbols, macros, headers clashing with other applications.

The patch fixes it by removing the "#include <sys/queue.h>" from
DPDK public headers, so programs including DPDK headers don't depend
on system to provide 'sys/queue.h'. When these public headers use
macros such as TAILQ_xxx, we replace it with RTE_ prefix.
For Windows, we copy the definitions from <sys/queue.h> to rte_os.h
under windows. Note that these RTE_ macros are compatible with
<sys/queue.h>, only at the level of API (to use with <sys/queue.h>
macros in C files) and ABI (to avoid breaking it).
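
Editorial sketch of the API-level compatibility described above: a
structure declared with the RTE_ macros (here the Windows-style
definitions, copied from the patch) can still be manipulated with the
plain <sys/queue.h> macros in a C file, because the field names are the
same. The demo_* names are hypothetical and not part of DPDK; the sketch
assumes a system that provides <sys/queue.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Copies of the Windows rte_os.h definitions from the patch. */
#define RTE_TAILQ_HEAD(name, type) \
struct name { \
	struct type *tqh_first; \
	struct type **tqh_last; \
}
#define RTE_TAILQ_ENTRY(type) \
struct { \
	struct type *tqe_next; \
	struct type **tqe_prev; \
}

/* Hypothetical element type, declared the way a public header would. */
struct demo_dev {
	RTE_TAILQ_ENTRY(demo_dev) next;
	int id;
};
RTE_TAILQ_HEAD(demo_dev_list, demo_dev);

/*
 * Manipulate the list with the system macros only, then encode the
 * resulting order as a decimal number (one digit per element id).
 */
int demo_order(void)
{
	struct demo_dev_list list;
	struct demo_dev a = { .id = 1 }, b = { .id = 2 }, c = { .id = 3 };
	struct demo_dev *it;
	int order = 0;

	TAILQ_INIT(&list);
	TAILQ_INSERT_TAIL(&list, &a, next);   /* a */
	TAILQ_INSERT_TAIL(&list, &b, next);   /* a, b */
	TAILQ_INSERT_HEAD(&list, &c, next);   /* c, a, b */
	TAILQ_REMOVE(&list, &b, next);        /* c, a */
	assert(!TAILQ_EMPTY(&list));

	TAILQ_FOREACH(it, &list, next)
		order = order * 10 + it->id;
	return order;
}
```

Calling demo_order() walks the list as c, a and so yields 31, which is
the behaviour one would expect if the two macro families are
interchangeable at the source level.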

Additionally, the TAILQ_FOREACH_SAFE is not part of <sys/queue.h>,
the patch replaces it with RTE_TAILQ_FOREACH_SAFE.
With this patch, all the public headers no longer have
"#include <sys/queue.h>" or "TAILQ_xxx" macros.

[1] http://mails.dpdk.org/archives/dev/2021-August/216304.html
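
Editorial sketch of why the _SAFE variant mentioned above matters: it
reads the next pointer before the loop body runs, so the body may remove
and free the current element. The macro body follows the Linux rte_os.h
hunk of this patch; the demo_* names are hypothetical, and the sketch
assumes a system <sys/queue.h> (glibc's header, as far as I know, does
not ship TAILQ_FOREACH_SAFE itself).

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Same shape as the Linux rte_os.h definition in the patch. */
#define RTE_TAILQ_FOREACH_SAFE(var, head, field, tvar) \
	for ((var) = TAILQ_FIRST((head)); \
	    (var) && ((tvar) = TAILQ_NEXT((var), field), 1); \
	    (var) = (tvar))

/* Hypothetical list element. */
struct demo_node {
	TAILQ_ENTRY(demo_node) next;
	int value;
};
TAILQ_HEAD(demo_list, demo_node);

/*
 * Build a list holding 0..count-1, free every even-valued node while
 * iterating, and return how many nodes were left. The remaining nodes
 * are freed before returning.
 */
int demo_drop_even(int count)
{
	struct demo_list list = TAILQ_HEAD_INITIALIZER(list);
	struct demo_node *node, *tmp;
	int remaining = 0;
	int i;

	for (i = 0; i < count; i++) {
		node = malloc(sizeof(*node));
		if (node == NULL)
			break;
		node->value = i;
		TAILQ_INSERT_TAIL(&list, node, next);
	}

	/* Removing and freeing 'node' is fine: 'tmp' already holds the next one. */
	RTE_TAILQ_FOREACH_SAFE(node, &list, next, tmp) {
		if (node->value % 2 == 0) {
			TAILQ_REMOVE(&list, node, next);
			free(node);
		}
	}

	/* Count and release whatever is left. */
	RTE_TAILQ_FOREACH_SAFE(node, &list, next, tmp) {
		remaining++;
		TAILQ_REMOVE(&list, node, next);
		free(node);
	}
	assert(TAILQ_EMPTY(&list));
	return remaining;
}
```

With plain TAILQ_FOREACH the first loop would read node->next after
free(node), which is undefined behaviour.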

Suggested-by: Nick Connolly <nick.connolly@mayadata.io>
Suggested-by: Dmitry Kozliuk <Dmitry.Kozliuk@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
---
v3-v4:
* address comments from Dmitry
---
 drivers/bus/auxiliary/private.h            |  1 +
 drivers/bus/auxiliary/rte_bus_auxiliary.h  |  5 ++--
 drivers/bus/dpaa/dpaa_bus.c                |  4 +--
 drivers/bus/fslmc/fslmc_bus.c              |  4 +--
 drivers/bus/fslmc/fslmc_vfio.c             |  9 ++++---
 drivers/bus/ifpga/rte_bus_ifpga.h          |  8 +++---
 drivers/bus/pci/pci_params.c               |  2 ++
 drivers/bus/pci/rte_bus_pci.h              | 13 +++++----
 drivers/bus/pci/windows/pci.c              |  3 +++
 drivers/bus/pci/windows/pci_netuio.c       |  2 ++
 drivers/bus/vdev/rte_bus_vdev.h            |  7 +++--
 drivers/bus/vdev/vdev.c                    |  3 ++-
 drivers/bus/vmbus/rte_bus_vmbus.h          | 13 +++++----
 drivers/net/bnxt/tf_ulp/bnxt_ulp.c         |  2 +-
 drivers/net/bonding/rte_eth_bond_flow.c    |  2 +-
 drivers/net/failsafe/failsafe_flow.c       |  2 +-
 drivers/net/i40e/i40e_ethdev.c             |  9 ++++---
 drivers/net/i40e/i40e_ethdev.h             |  1 +
 drivers/net/i40e/i40e_flow.c               |  6 ++---
 drivers/net/i40e/i40e_hash.c               |  2 +-
 drivers/net/i40e/rte_pmd_i40e.c            |  6 ++---
 drivers/net/iavf/iavf_generic_flow.c       | 14 +++++-----
 drivers/net/ice/ice_dcf_ethdev.c           |  1 +
 drivers/net/ice/ice_ethdev.c               |  4 +--
 drivers/net/ice/ice_generic_flow.c         | 14 +++++-----
 drivers/net/softnic/rte_eth_softnic_flow.c |  3 ++-
 drivers/net/softnic/rte_eth_softnic_swq.c  |  2 +-
 drivers/raw/dpaa2_qdma/dpaa2_qdma.c        |  2 +-
 lib/bbdev/rte_bbdev.h                      |  2 +-
 lib/cryptodev/rte_cryptodev.h              |  2 +-
 lib/cryptodev/rte_cryptodev_pmd.h          |  2 +-
 lib/eal/common/eal_common_devargs.c        |  6 +++--
 lib/eal/common/eal_common_fbarray.c        |  1 +
 lib/eal/common/eal_common_log.c            |  1 +
 lib/eal/common/eal_common_memalloc.c       |  1 +
 lib/eal/common/eal_common_options.c        |  3 ++-
 lib/eal/common/eal_trace.h                 |  2 ++
 lib/eal/freebsd/include/rte_os.h           | 15 +++++++++++
 lib/eal/include/rte_bus.h                  |  5 ++--
 lib/eal/include/rte_class.h                |  6 ++---
 lib/eal/include/rte_dev.h                  |  5 ++--
 lib/eal/include/rte_devargs.h              |  3 +--
 lib/eal/include/rte_log.h                  |  1 -
 lib/eal/include/rte_service.h              |  1 -
 lib/eal/include/rte_tailq.h                | 15 +++++------
 lib/eal/linux/include/rte_os.h             | 15 +++++++++++
 lib/eal/windows/eal_alarm.c                |  1 +
 lib/eal/windows/include/rte_os.h           | 31 ++++++++++++++++++++++
 lib/efd/rte_efd.c                          |  2 +-
 lib/ethdev/rte_ethdev_core.h               |  2 +-
 lib/hash/rte_fbk_hash.h                    |  1 -
 lib/hash/rte_thash.c                       |  2 ++
 lib/ip_frag/rte_ip_frag.h                  |  4 +--
 lib/mempool/rte_mempool.c                  |  2 +-
 lib/mempool/rte_mempool.h                  |  9 +++----
 lib/pci/rte_pci.h                          |  1 -
 lib/ring/rte_ring_core.h                   |  1 -
 lib/table/rte_swx_table.h                  |  7 ++---
 lib/table/rte_swx_table_selector.h         |  5 ++--
 lib/vhost/iotlb.c                          | 11 ++++----
 lib/vhost/rte_vdpa_dev.h                   |  2 +-
 lib/vhost/vdpa.c                           |  2 +-
 62 files changed, 193 insertions(+), 120 deletions(-)

diff --git a/drivers/bus/auxiliary/private.h b/drivers/bus/auxiliary/private.h
index 9987e8b501..d22e83cf7a 100644
--- a/drivers/bus/auxiliary/private.h
+++ b/drivers/bus/auxiliary/private.h
@@ -7,6 +7,7 @@
 
 #include <stdbool.h>
 #include <stdio.h>
+#include <sys/queue.h>
 
 #include "rte_bus_auxiliary.h"
 
diff --git a/drivers/bus/auxiliary/rte_bus_auxiliary.h b/drivers/bus/auxiliary/rte_bus_auxiliary.h
index 2462bad2ba..b1f5610404 100644
--- a/drivers/bus/auxiliary/rte_bus_auxiliary.h
+++ b/drivers/bus/auxiliary/rte_bus_auxiliary.h
@@ -19,7 +19,6 @@ extern "C" {
 #include <stdlib.h>
 #include <limits.h>
 #include <errno.h>
-#include <sys/queue.h>
 #include <stdint.h>
 #include <inttypes.h>
 
@@ -113,7 +112,7 @@ typedef int (rte_auxiliary_dma_unmap_t)(struct rte_auxiliary_device *dev,
  * A structure describing an auxiliary device.
  */
 struct rte_auxiliary_device {
-	TAILQ_ENTRY(rte_auxiliary_device) next;   /**< Next probed device. */
+	RTE_TAILQ_ENTRY(rte_auxiliary_device) next; /**< Next probed device. */
 	struct rte_device device;                 /**< Inherit core device */
 	char name[RTE_DEV_NAME_MAX_LEN + 1];      /**< ASCII device name */
 	struct rte_intr_handle intr_handle;       /**< Interrupt handle */
@@ -124,7 +123,7 @@ struct rte_auxiliary_device {
  * A structure describing an auxiliary driver.
  */
 struct rte_auxiliary_driver {
-	TAILQ_ENTRY(rte_auxiliary_driver) next; /**< Next in list. */
+	RTE_TAILQ_ENTRY(rte_auxiliary_driver) next; /**< Next in list. */
 	struct rte_driver driver;             /**< Inherit core driver. */
 	struct rte_auxiliary_bus *bus;        /**< Auxiliary bus reference. */
 	rte_auxiliary_match_t *match;         /**< Device match function. */
diff --git a/drivers/bus/dpaa/dpaa_bus.c b/drivers/bus/dpaa/dpaa_bus.c
index e499305d85..6cab2ae760 100644
--- a/drivers/bus/dpaa/dpaa_bus.c
+++ b/drivers/bus/dpaa/dpaa_bus.c
@@ -105,7 +105,7 @@ dpaa_add_to_device_list(struct rte_dpaa_device *newdev)
 	struct rte_dpaa_device *dev = NULL;
 	struct rte_dpaa_device *tdev = NULL;
 
-	TAILQ_FOREACH_SAFE(dev, &rte_dpaa_bus.device_list, next, tdev) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_dpaa_bus.device_list, next, tdev) {
 		comp = compare_dpaa_devices(newdev, dev);
 		if (comp < 0) {
 			TAILQ_INSERT_BEFORE(dev, newdev, next);
@@ -245,7 +245,7 @@ dpaa_clean_device_list(void)
 	struct rte_dpaa_device *dev = NULL;
 	struct rte_dpaa_device *tdev = NULL;
 
-	TAILQ_FOREACH_SAFE(dev, &rte_dpaa_bus.device_list, next, tdev) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_dpaa_bus.device_list, next, tdev) {
 		TAILQ_REMOVE(&rte_dpaa_bus.device_list, dev, next);
 		free(dev);
 		dev = NULL;
diff --git a/drivers/bus/fslmc/fslmc_bus.c b/drivers/bus/fslmc/fslmc_bus.c
index becc455f6b..8c8f8a298d 100644
--- a/drivers/bus/fslmc/fslmc_bus.c
+++ b/drivers/bus/fslmc/fslmc_bus.c
@@ -45,7 +45,7 @@ cleanup_fslmc_device_list(void)
 	struct rte_dpaa2_device *dev;
 	struct rte_dpaa2_device *t_dev;
 
-	TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, t_dev) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, t_dev) {
 		TAILQ_REMOVE(&rte_fslmc_bus.device_list, dev, next);
 		free(dev);
 		dev = NULL;
@@ -82,7 +82,7 @@ insert_in_device_list(struct rte_dpaa2_device *newdev)
 	struct rte_dpaa2_device *dev = NULL;
 	struct rte_dpaa2_device *tdev = NULL;
 
-	TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, tdev) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, tdev) {
 		comp = compare_dpaa2_devname(newdev, dev);
 		if (comp < 0) {
 			TAILQ_INSERT_BEFORE(dev, newdev, next);
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index c8373e627a..852fcfc4dd 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -808,7 +808,8 @@ fslmc_vfio_process_group(void)
 	bool is_dpmcp_in_blocklist = false, is_dpio_in_blocklist = false;
 	int dpmcp_count = 0, dpio_count = 0, current_device;
 
-	TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, dev_temp) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next,
+		dev_temp) {
 		if (dev->dev_type == DPAA2_MPORTAL) {
 			dpmcp_count++;
 			if (dev->device.devargs &&
@@ -825,7 +826,8 @@ fslmc_vfio_process_group(void)
 
 	/* Search the MCP as that should be initialized first. */
 	current_device = 0;
-	TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, dev_temp) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next,
+		dev_temp) {
 		if (dev->dev_type == DPAA2_MPORTAL) {
 			current_device++;
 			if (dev->device.devargs &&
@@ -872,7 +874,8 @@ fslmc_vfio_process_group(void)
 	}
 
 	current_device = 0;
-	TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, dev_temp) {
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next,
+		dev_temp) {
 		if (dev->dev_type == DPAA2_IO)
 			current_device++;
 		if (dev->device.devargs &&
diff --git a/drivers/bus/ifpga/rte_bus_ifpga.h b/drivers/bus/ifpga/rte_bus_ifpga.h
index b43084155a..0186f5acde 100644
--- a/drivers/bus/ifpga/rte_bus_ifpga.h
+++ b/drivers/bus/ifpga/rte_bus_ifpga.h
@@ -28,9 +28,9 @@ struct rte_afu_device;
 struct rte_afu_driver;
 
 /** Double linked list of Intel FPGA AFU device. */
-TAILQ_HEAD(ifpga_afu_dev_list, rte_afu_device);
+RTE_TAILQ_HEAD(ifpga_afu_dev_list, rte_afu_device);
 /** Double linked list of Intel FPGA AFU device drivers. */
-TAILQ_HEAD(ifpga_afu_drv_list, rte_afu_driver);
+RTE_TAILQ_HEAD(ifpga_afu_drv_list, rte_afu_driver);
 
 #define IFPGA_BUS_BITSTREAM_PATH_MAX_LEN 256
 
@@ -71,7 +71,7 @@ struct rte_afu_shared {
  * A structure describing a AFU device.
  */
 struct rte_afu_device {
-	TAILQ_ENTRY(rte_afu_device) next;       /**< Next in device list. */
+	RTE_TAILQ_ENTRY(rte_afu_device) next;       /**< Next in device list. */
 	struct rte_device device;               /**< Inherit core device */
 	struct rte_rawdev *rawdev;    /**< Point Rawdev */
 	struct rte_afu_id id;                   /**< AFU id within FPGA. */
@@ -105,7 +105,7 @@ typedef int (afu_remove_t)(struct rte_afu_device *);
  * A structure describing a AFU device.
  */
 struct rte_afu_driver {
-	TAILQ_ENTRY(rte_afu_driver) next;       /**< Next afu driver. */
+	RTE_TAILQ_ENTRY(rte_afu_driver) next;       /**< Next afu driver. */
 	struct rte_driver driver;               /**< Inherit core driver. */
 	afu_probe_t *probe;                     /**< Device Probe function. */
 	afu_remove_t *remove;                   /**< Device Remove function. */
diff --git a/drivers/bus/pci/pci_params.c b/drivers/bus/pci/pci_params.c
index 3192e9c967..717388753d 100644
--- a/drivers/bus/pci/pci_params.c
+++ b/drivers/bus/pci/pci_params.c
@@ -2,6 +2,8 @@
  * Copyright 2018 Gaëtan Rivet
  */
 
+#include <sys/queue.h>
+
 #include <rte_bus.h>
 #include <rte_bus_pci.h>
 #include <rte_dev.h>
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index 583470e831..673a2850c1 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -19,7 +19,6 @@ extern "C" {
 #include <stdlib.h>
 #include <limits.h>
 #include <errno.h>
-#include <sys/queue.h>
 #include <stdint.h>
 #include <inttypes.h>
 
@@ -37,16 +36,16 @@ struct rte_pci_device;
 struct rte_pci_driver;
 
 /** List of PCI devices */
-TAILQ_HEAD(rte_pci_device_list, rte_pci_device);
+RTE_TAILQ_HEAD(rte_pci_device_list, rte_pci_device);
 /** List of PCI drivers */
-TAILQ_HEAD(rte_pci_driver_list, rte_pci_driver);
+RTE_TAILQ_HEAD(rte_pci_driver_list, rte_pci_driver);
 
 /* PCI Bus iterators */
 #define FOREACH_DEVICE_ON_PCIBUS(p)	\
-		TAILQ_FOREACH(p, &(rte_pci_bus.device_list), next)
+		RTE_TAILQ_FOREACH(p, &(rte_pci_bus.device_list), next)
 
 #define FOREACH_DRIVER_ON_PCIBUS(p)	\
-		TAILQ_FOREACH(p, &(rte_pci_bus.driver_list), next)
+		RTE_TAILQ_FOREACH(p, &(rte_pci_bus.driver_list), next)
 
 struct rte_devargs;
 
@@ -64,7 +63,7 @@ enum rte_pci_kernel_driver {
  * A structure describing a PCI device.
  */
 struct rte_pci_device {
-	TAILQ_ENTRY(rte_pci_device) next;   /**< Next probed PCI device. */
+	RTE_TAILQ_ENTRY(rte_pci_device) next;   /**< Next probed PCI device. */
 	struct rte_device device;           /**< Inherit core device */
 	struct rte_pci_addr addr;           /**< PCI location. */
 	struct rte_pci_id id;               /**< PCI ID. */
@@ -160,7 +159,7 @@ typedef int (pci_dma_unmap_t)(struct rte_pci_device *dev, void *addr,
  * A structure describing a PCI driver.
  */
 struct rte_pci_driver {
-	TAILQ_ENTRY(rte_pci_driver) next;  /**< Next in list. */
+	RTE_TAILQ_ENTRY(rte_pci_driver) next;  /**< Next in list. */
 	struct rte_driver driver;          /**< Inherit core driver. */
 	struct rte_pci_bus *bus;           /**< PCI bus reference. */
 	rte_pci_probe_t *probe;            /**< Device probe function. */
diff --git a/drivers/bus/pci/windows/pci.c b/drivers/bus/pci/windows/pci.c
index d39a7748b8..d7bd5d6e80 100644
--- a/drivers/bus/pci/windows/pci.c
+++ b/drivers/bus/pci/windows/pci.c
@@ -1,6 +1,9 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright 2020 Mellanox Technologies, Ltd
  */
+
+#include <sys/queue.h>
+
 #include <rte_windows.h>
 #include <rte_errno.h>
 #include <rte_log.h>
diff --git a/drivers/bus/pci/windows/pci_netuio.c b/drivers/bus/pci/windows/pci_netuio.c
index 1bf9133f71..a0b175a8fc 100644
--- a/drivers/bus/pci/windows/pci_netuio.c
+++ b/drivers/bus/pci/windows/pci_netuio.c
@@ -2,6 +2,8 @@
  * Copyright(c) 2020 Intel Corporation.
  */
 
+#include <sys/queue.h>
+
 #include <rte_windows.h>
 #include <rte_errno.h>
 #include <rte_log.h>
diff --git a/drivers/bus/vdev/rte_bus_vdev.h b/drivers/bus/vdev/rte_bus_vdev.h
index fc315d10fa..2856799953 100644
--- a/drivers/bus/vdev/rte_bus_vdev.h
+++ b/drivers/bus/vdev/rte_bus_vdev.h
@@ -15,12 +15,11 @@
 extern "C" {
 #endif
 
-#include <sys/queue.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
 
 struct rte_vdev_device {
-	TAILQ_ENTRY(rte_vdev_device) next;      /**< Next attached vdev */
+	RTE_TAILQ_ENTRY(rte_vdev_device) next;      /**< Next attached vdev */
 	struct rte_device device;               /**< Inherit core device */
 };
 
@@ -53,7 +52,7 @@ rte_vdev_device_args(const struct rte_vdev_device *dev)
 }
 
 /** Double linked list of virtual device drivers. */
-TAILQ_HEAD(vdev_driver_list, rte_vdev_driver);
+RTE_TAILQ_HEAD(vdev_driver_list, rte_vdev_driver);
 
 /**
  * Probe function called for each virtual device driver once.
@@ -107,7 +106,7 @@ typedef int (rte_vdev_dma_unmap_t)(struct rte_vdev_device *dev, void *addr,
  * A virtual device driver abstraction.
  */
 struct rte_vdev_driver {
-	TAILQ_ENTRY(rte_vdev_driver) next; /**< Next in list. */
+	RTE_TAILQ_ENTRY(rte_vdev_driver) next; /**< Next in list. */
 	struct rte_driver driver;        /**< Inherited general driver. */
 	rte_vdev_probe_t *probe;         /**< Virtual device probe function. */
 	rte_vdev_remove_t *remove;       /**< Virtual device remove function. */
diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
index 281a2c34e8..a8d8b2327e 100644
--- a/drivers/bus/vdev/vdev.c
+++ b/drivers/bus/vdev/vdev.c
@@ -100,7 +100,8 @@ rte_vdev_remove_custom_scan(rte_vdev_scan_callback callback, void *user_arg)
 	struct vdev_custom_scan *custom_scan, *tmp_scan;
 
 	rte_spinlock_lock(&vdev_custom_scan_lock);
-	TAILQ_FOREACH_SAFE(custom_scan, &vdev_custom_scans, next, tmp_scan) {
+	RTE_TAILQ_FOREACH_SAFE(custom_scan, &vdev_custom_scans, next,
+				tmp_scan) {
 		if (custom_scan->callback != callback ||
 				(custom_scan->user_arg != (void *)-1 &&
 				custom_scan->user_arg != user_arg))
diff --git a/drivers/bus/vmbus/rte_bus_vmbus.h b/drivers/bus/vmbus/rte_bus_vmbus.h
index 4cf73ce815..6bcff66468 100644
--- a/drivers/bus/vmbus/rte_bus_vmbus.h
+++ b/drivers/bus/vmbus/rte_bus_vmbus.h
@@ -20,7 +20,6 @@ extern "C" {
 #include <limits.h>
 #include <stdbool.h>
 #include <errno.h>
-#include <sys/queue.h>
 #include <stdint.h>
 #include <inttypes.h>
 
@@ -38,15 +37,15 @@ struct rte_vmbus_bus;
 struct vmbus_channel;
 struct vmbus_mon_page;
 
-TAILQ_HEAD(rte_vmbus_device_list, rte_vmbus_device);
-TAILQ_HEAD(rte_vmbus_driver_list, rte_vmbus_driver);
+RTE_TAILQ_HEAD(rte_vmbus_device_list, rte_vmbus_device);
+RTE_TAILQ_HEAD(rte_vmbus_driver_list, rte_vmbus_driver);
 
 /* VMBus iterators */
 #define FOREACH_DEVICE_ON_VMBUS(p)	\
-	TAILQ_FOREACH(p, &(rte_vmbus_bus.device_list), next)
+	RTE_TAILQ_FOREACH(p, &(rte_vmbus_bus.device_list), next)
 
 #define FOREACH_DRIVER_ON_VMBUS(p)	\
-	TAILQ_FOREACH(p, &(rte_vmbus_bus.driver_list), next)
+	RTE_TAILQ_FOREACH(p, &(rte_vmbus_bus.driver_list), next)
 
 /** Maximum number of VMBUS resources. */
 enum hv_uio_map {
@@ -62,7 +61,7 @@ enum hv_uio_map {
  * A structure describing a VMBUS device.
  */
 struct rte_vmbus_device {
-	TAILQ_ENTRY(rte_vmbus_device) next;    /**< Next probed VMBUS device */
+	RTE_TAILQ_ENTRY(rte_vmbus_device) next; /**< Next probed VMBUS device */
 	const struct rte_vmbus_driver *driver; /**< Associated driver */
 	struct rte_device device;              /**< Inherit core device */
 	rte_uuid_t device_id;		       /**< VMBUS device id */
@@ -93,7 +92,7 @@ typedef int (vmbus_remove_t)(struct rte_vmbus_device *);
  * A structure describing a VMBUS driver.
  */
 struct rte_vmbus_driver {
-	TAILQ_ENTRY(rte_vmbus_driver) next; /**< Next in list. */
+	RTE_TAILQ_ENTRY(rte_vmbus_driver) next; /**< Next in list. */
 	struct rte_driver driver;
 	struct rte_vmbus_bus *bus;          /**< VM bus reference. */
 	vmbus_probe_t *probe;               /**< Device Probe function. */
diff --git a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
index dbf85e4eda..ac86b70caf 100644
--- a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
+++ b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
@@ -2018,7 +2018,7 @@ bnxt_ulp_cntxt_list_del(struct bnxt_ulp_context *ulp_ctx)
 	struct ulp_context_list_entry	*entry, *temp;
 
 	rte_spinlock_lock(&bnxt_ulp_ctxt_lock);
-	TAILQ_FOREACH_SAFE(entry, &ulp_cntx_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(entry, &ulp_cntx_list, next, temp) {
 		if (entry->ulp_ctx == ulp_ctx) {
 			TAILQ_REMOVE(&ulp_cntx_list, entry, next);
 			rte_free(entry);
diff --git a/drivers/net/bonding/rte_eth_bond_flow.c b/drivers/net/bonding/rte_eth_bond_flow.c
index 417f76bf60..65b77faae7 100644
--- a/drivers/net/bonding/rte_eth_bond_flow.c
+++ b/drivers/net/bonding/rte_eth_bond_flow.c
@@ -157,7 +157,7 @@ bond_flow_flush(struct rte_eth_dev *dev, struct rte_flow_error *err)
 	/* Destroy all bond flows from its slaves instead of flushing them to
 	 * keep the LACP flow or any other external flows.
 	 */
-	TAILQ_FOREACH_SAFE(flow, &internals->flow_list, next, tmp) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &internals->flow_list, next, tmp) {
 		lret = bond_flow_destroy(dev, flow, err);
 		if (unlikely(lret != 0))
 			ret = lret;
diff --git a/drivers/net/failsafe/failsafe_flow.c b/drivers/net/failsafe/failsafe_flow.c
index 5e2b5f7c67..354f9fec20 100644
--- a/drivers/net/failsafe/failsafe_flow.c
+++ b/drivers/net/failsafe/failsafe_flow.c
@@ -180,7 +180,7 @@ fs_flow_flush(struct rte_eth_dev *dev,
 			return ret;
 		}
 	}
-	TAILQ_FOREACH_SAFE(flow, &PRIV(dev)->flow_list, next, tmp) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &PRIV(dev)->flow_list, next, tmp) {
 		TAILQ_REMOVE(&PRIV(dev)->flow_list, flow, next);
 		fs_flow_release(&flow);
 	}
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 7b230e2ed1..6590363556 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -5436,7 +5436,7 @@ i40e_vsi_release(struct i40e_vsi *vsi)
 
 	/* VSI has child to attach, release child first */
 	if (vsi->veb) {
-		TAILQ_FOREACH_SAFE(vsi_list, &vsi->veb->head, list, temp) {
+		RTE_TAILQ_FOREACH_SAFE(vsi_list, &vsi->veb->head, list, temp) {
 			if (i40e_vsi_release(vsi_list->vsi) != I40E_SUCCESS)
 				return -1;
 		}
@@ -5444,7 +5444,8 @@ i40e_vsi_release(struct i40e_vsi *vsi)
 	}
 
 	if (vsi->floating_veb) {
-		TAILQ_FOREACH_SAFE(vsi_list, &vsi->floating_veb->head, list, temp) {
+		RTE_TAILQ_FOREACH_SAFE(vsi_list, &vsi->floating_veb->head,
+			list, temp) {
 			if (i40e_vsi_release(vsi_list->vsi) != I40E_SUCCESS)
 				return -1;
 		}
@@ -5452,7 +5453,7 @@ i40e_vsi_release(struct i40e_vsi *vsi)
 
 	/* Remove all macvlan filters of the VSI */
 	i40e_vsi_remove_all_macvlan_filter(vsi);
-	TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp)
+	RTE_TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp)
 		rte_free(f);
 
 	if (vsi->type != I40E_VSI_MAIN &&
@@ -6055,7 +6056,7 @@ i40e_vsi_config_vlan_filter(struct i40e_vsi *vsi, bool on)
 	i = 0;
 
 	/* Remove all existing mac */
-	TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
 		mac_filter[i] = f->mac_info;
 		ret = i40e_vsi_delete_mac(vsi, &f->mac_info.mac_addr);
 		if (ret) {
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index cd6deabd60..374b73e4a7 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -6,6 +6,7 @@
 #define _I40E_ETHDEV_H_
 
 #include <stdint.h>
+#include <sys/queue.h>
 
 #include <rte_time.h>
 #include <rte_kvargs.h>
diff --git a/drivers/net/i40e/i40e_flow.c b/drivers/net/i40e/i40e_flow.c
index 3c1570bd9c..e41a84f1d7 100644
--- a/drivers/net/i40e/i40e_flow.c
+++ b/drivers/net/i40e/i40e_flow.c
@@ -4917,7 +4917,7 @@ i40e_flow_flush_fdir_filter(struct i40e_pf *pf)
 		}
 
 		/* Delete FDIR flows in flow list. */
-		TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
+		RTE_TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
 			if (flow->filter_type == RTE_ETH_FILTER_FDIR) {
 				TAILQ_REMOVE(&pf->flow_list, flow, node);
 			}
@@ -4972,7 +4972,7 @@ i40e_flow_flush_ethertype_filter(struct i40e_pf *pf)
 	}
 
 	/* Delete ethertype flows in flow list. */
-	TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
 		if (flow->filter_type == RTE_ETH_FILTER_ETHERTYPE) {
 			TAILQ_REMOVE(&pf->flow_list, flow, node);
 			rte_free(flow);
@@ -5000,7 +5000,7 @@ i40e_flow_flush_tunnel_filter(struct i40e_pf *pf)
 	}
 
 	/* Delete tunnel flows in flow list. */
-	TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
 		if (flow->filter_type == RTE_ETH_FILTER_TUNNEL) {
 			TAILQ_REMOVE(&pf->flow_list, flow, node);
 			rte_free(flow);
diff --git a/drivers/net/i40e/i40e_hash.c b/drivers/net/i40e/i40e_hash.c
index 1fb8c9abfc..6579b1a00b 100644
--- a/drivers/net/i40e/i40e_hash.c
+++ b/drivers/net/i40e/i40e_hash.c
@@ -1366,7 +1366,7 @@ i40e_hash_filter_flush(struct i40e_pf *pf)
 {
 	struct rte_flow *flow, *next;
 
-	TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, next) {
+	RTE_TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, next) {
 		if (flow->filter_type != RTE_ETH_FILTER_HASH)
 			continue;
 
diff --git a/drivers/net/i40e/rte_pmd_i40e.c b/drivers/net/i40e/rte_pmd_i40e.c
index 2e34140c5b..ec24046440 100644
--- a/drivers/net/i40e/rte_pmd_i40e.c
+++ b/drivers/net/i40e/rte_pmd_i40e.c
@@ -216,7 +216,7 @@ i40e_vsi_rm_mac_filter(struct i40e_vsi *vsi)
 	void *temp;
 
 	/* remove all the MACs */
-	TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
 		vlan_num = vsi->vlan_num;
 		filter_type = f->mac_info.filter_type;
 		if (filter_type == I40E_MACVLAN_PERFECT_MATCH ||
@@ -274,7 +274,7 @@ i40e_vsi_restore_mac_filter(struct i40e_vsi *vsi)
 	void *temp;
 
 	/* restore all the MACs */
-	TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp) {
 		if (f->mac_info.filter_type == I40E_MACVLAN_PERFECT_MATCH ||
 		    f->mac_info.filter_type == I40E_MACVLAN_HASH_MATCH) {
 			/**
@@ -563,7 +563,7 @@ rte_pmd_i40e_set_vf_mac_addr(uint16_t port, uint16_t vf_id,
 	rte_ether_addr_copy(mac_addr, &vf->mac_addr);
 
 	/* Remove all existing mac */
-	TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp)
+	RTE_TAILQ_FOREACH_SAFE(f, &vsi->mac_list, next, temp)
 		if (i40e_vsi_delete_mac(vsi, &f->mac_info.mac_addr)
 				!= I40E_SUCCESS)
 			PMD_DRV_LOG(WARNING, "Delete MAC failed");
diff --git a/drivers/net/iavf/iavf_generic_flow.c b/drivers/net/iavf/iavf_generic_flow.c
index 1fe270fb22..b86d99e57d 100644
--- a/drivers/net/iavf/iavf_generic_flow.c
+++ b/drivers/net/iavf/iavf_generic_flow.c
@@ -1637,7 +1637,7 @@ iavf_flow_init(struct iavf_adapter *ad)
 	TAILQ_INIT(&vf->dist_parser_list);
 	rte_spinlock_init(&vf->flow_ops_lock);
 
-	TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
 		if (engine->init == NULL) {
 			PMD_INIT_LOG(ERR, "Invalid engine type (%d)",
 				     engine->type);
@@ -1663,7 +1663,7 @@ iavf_flow_uninit(struct iavf_adapter *ad)
 	struct iavf_flow_parser_node *p_parser;
 	void *temp;
 
-	TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
 		if (engine->uninit)
 			engine->uninit(ad);
 	}
@@ -1733,7 +1733,7 @@ iavf_unregister_parser(struct iavf_flow_parser *parser,
 	if (list == NULL)
 		return;
 
-	TAILQ_FOREACH_SAFE(p_parser, list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(p_parser, list, node, temp) {
 		if (p_parser->parser->engine->type == parser->engine->type) {
 			TAILQ_REMOVE(list, p_parser, node);
 			rte_free(p_parser);
@@ -1917,7 +1917,7 @@ iavf_parse_engine_create(struct iavf_adapter *ad,
 	void *temp;
 	void *meta = NULL;
 
-	TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
 		if (parser_node->parser->parse_pattern_action(ad,
 				parser_node->parser->array,
 				parser_node->parser->array_len,
@@ -1946,7 +1946,7 @@ iavf_parse_engine_validate(struct iavf_adapter *ad,
 	void *temp;
 	void *meta = NULL;
 
-	TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
 		if (parser_node->parser->parse_pattern_action(ad,
 				parser_node->parser->array,
 				parser_node->parser->array_len,
@@ -2089,7 +2089,7 @@ iavf_flow_is_valid(struct rte_flow *flow)
 	void *temp;
 
 	if (flow && flow->engine) {
-		TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
+		RTE_TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
 			if (engine == flow->engine)
 				return true;
 		}
@@ -2142,7 +2142,7 @@ iavf_flow_flush(struct rte_eth_dev *dev,
 	void *temp;
 	int ret = 0;
 
-	TAILQ_FOREACH_SAFE(p_flow, &vf->flow_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(p_flow, &vf->flow_list, node, temp) {
 		ret = iavf_flow_destroy(dev, p_flow, error);
 		if (ret) {
 			PMD_DRV_LOG(ERR, "Failed to flush flows");
diff --git a/drivers/net/ice/ice_dcf_ethdev.c b/drivers/net/ice/ice_dcf_ethdev.c
index cab7c4da87..629e88980d 100644
--- a/drivers/net/ice/ice_dcf_ethdev.c
+++ b/drivers/net/ice/ice_dcf_ethdev.c
@@ -4,6 +4,7 @@
 
 #include <errno.h>
 #include <stdbool.h>
+#include <sys/queue.h>
 #include <sys/types.h>
 #include <unistd.h>
 
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index a4cd39c954..fadd5f2e5a 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -1104,7 +1104,7 @@ ice_remove_all_mac_vlan_filters(struct ice_vsi *vsi)
 	if (!vsi || !vsi->mac_num)
 		return -EINVAL;
 
-	TAILQ_FOREACH_SAFE(m_f, &vsi->mac_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(m_f, &vsi->mac_list, next, temp) {
 		ret = ice_remove_mac_filter(vsi, &m_f->mac_info.mac_addr);
 		if (ret != ICE_SUCCESS) {
 			ret = -EINVAL;
@@ -1115,7 +1115,7 @@ ice_remove_all_mac_vlan_filters(struct ice_vsi *vsi)
 	if (vsi->vlan_num == 0)
 		return 0;
 
-	TAILQ_FOREACH_SAFE(v_f, &vsi->vlan_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(v_f, &vsi->vlan_list, next, temp) {
 		ret = ice_remove_vlan_filter(vsi, &v_f->vlan_info.vlan);
 		if (ret != ICE_SUCCESS) {
 			ret = -EINVAL;
diff --git a/drivers/net/ice/ice_generic_flow.c b/drivers/net/ice/ice_generic_flow.c
index 66b5743abf..3e557efe0c 100644
--- a/drivers/net/ice/ice_generic_flow.c
+++ b/drivers/net/ice/ice_generic_flow.c
@@ -1820,7 +1820,7 @@ ice_flow_init(struct ice_adapter *ad)
 	TAILQ_INIT(&pf->dist_parser_list);
 	rte_spinlock_init(&pf->flow_ops_lock);
 
-	TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
 		if (engine->init == NULL) {
 			PMD_INIT_LOG(ERR, "Invalid engine type (%d)",
 					engine->type);
@@ -1846,7 +1846,7 @@ ice_flow_uninit(struct ice_adapter *ad)
 	struct ice_flow_parser_node *p_parser;
 	void *temp;
 
-	TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(engine, &engine_list, node, temp) {
 		if (engine->uninit)
 			engine->uninit(ad);
 	}
@@ -1946,7 +1946,7 @@ ice_unregister_parser(struct ice_flow_parser *parser,
 	if (list == NULL)
 		return;
 
-	TAILQ_FOREACH_SAFE(p_parser, list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(p_parser, list, node, temp) {
 		if (p_parser->parser->engine->type == parser->engine->type) {
 			TAILQ_REMOVE(list, p_parser, node);
 			rte_free(p_parser);
@@ -2272,7 +2272,7 @@ ice_parse_engine_create(struct ice_adapter *ad,
 	void *meta = NULL;
 	void *temp;
 
-	TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
 		int ret;
 
 		if (parser_node->parser->parse_pattern_action(ad,
@@ -2305,7 +2305,7 @@ ice_parse_engine_validate(struct ice_adapter *ad,
 	struct ice_flow_parser_node *parser_node;
 	void *temp;
 
-	TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(parser_node, parser_list, node, temp) {
 		if (parser_node->parser->parse_pattern_action(ad,
 				parser_node->parser->array,
 				parser_node->parser->array_len,
@@ -2477,7 +2477,7 @@ ice_flow_flush(struct rte_eth_dev *dev,
 	void *temp;
 	int ret = 0;
 
-	TAILQ_FOREACH_SAFE(p_flow, &pf->flow_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(p_flow, &pf->flow_list, node, temp) {
 		ret = ice_flow_destroy(dev, p_flow, error);
 		if (ret) {
 			PMD_DRV_LOG(ERR, "Failed to flush flows");
@@ -2541,7 +2541,7 @@ ice_flow_redirect(struct ice_adapter *ad,
 
 	rte_spinlock_lock(&pf->flow_ops_lock);
 
-	TAILQ_FOREACH_SAFE(p_flow, &pf->flow_list, node, temp) {
+	RTE_TAILQ_FOREACH_SAFE(p_flow, &pf->flow_list, node, temp) {
 		if (!p_flow->engine->redirect)
 			continue;
 		ret = p_flow->engine->redirect(ad, p_flow, rd);
diff --git a/drivers/net/softnic/rte_eth_softnic_flow.c b/drivers/net/softnic/rte_eth_softnic_flow.c
index 27eaf380cd..7d054c38d2 100644
--- a/drivers/net/softnic/rte_eth_softnic_flow.c
+++ b/drivers/net/softnic/rte_eth_softnic_flow.c
@@ -2207,7 +2207,8 @@ pmd_flow_flush(struct rte_eth_dev *dev,
 			void *temp;
 			int status;
 
-			TAILQ_FOREACH_SAFE(flow, &table->flows, node, temp) {
+			RTE_TAILQ_FOREACH_SAFE(flow, &table->flows, node,
+				temp) {
 				/* Rule delete. */
 				status = softnic_pipeline_table_rule_delete
 						(softnic,
diff --git a/drivers/net/softnic/rte_eth_softnic_swq.c b/drivers/net/softnic/rte_eth_softnic_swq.c
index 2083d0a976..afe6f05e29 100644
--- a/drivers/net/softnic/rte_eth_softnic_swq.c
+++ b/drivers/net/softnic/rte_eth_softnic_swq.c
@@ -39,7 +39,7 @@ softnic_softnic_swq_free_keep_rxq_txq(struct pmd_internals *p)
 {
 	struct softnic_swq *swq, *tswq;
 
-	TAILQ_FOREACH_SAFE(swq, &p->swq_list, node, tswq) {
+	RTE_TAILQ_FOREACH_SAFE(swq, &p->swq_list, node, tswq) {
 		if ((strncmp(swq->name, "RXQ", strlen("RXQ")) == 0) ||
 			(strncmp(swq->name, "TXQ", strlen("TXQ")) == 0))
 			continue;
diff --git a/drivers/raw/dpaa2_qdma/dpaa2_qdma.c b/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
index c961e18d67..7b80370b36 100644
--- a/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
+++ b/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
@@ -1606,7 +1606,7 @@ remove_hw_queues_from_list(struct dpaa2_dpdmai_dev *dpdmai_dev)
 
 	DPAA2_QDMA_FUNC_TRACE();
 
-	TAILQ_FOREACH_SAFE(queue, &qdma_queue_list, next, tqueue) {
+	RTE_TAILQ_FOREACH_SAFE(queue, &qdma_queue_list, next, tqueue) {
 		if (queue->dpdmai_dev == dpdmai_dev) {
 			TAILQ_REMOVE(&qdma_queue_list, queue, next);
 			rte_free(queue);
diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
index 7017124414..3ebf62e697 100644
--- a/lib/bbdev/rte_bbdev.h
+++ b/lib/bbdev/rte_bbdev.h
@@ -434,7 +434,7 @@ struct rte_bbdev_callback;
 struct rte_intr_handle;
 
 /** Structure to keep track of registered callbacks */
-TAILQ_HEAD(rte_bbdev_cb_list, rte_bbdev_callback);
+RTE_TAILQ_HEAD(rte_bbdev_cb_list, rte_bbdev_callback);
 
 /**
  * @internal The data structure associated with a device. Drivers can access
diff --git a/lib/cryptodev/rte_cryptodev.h b/lib/cryptodev/rte_cryptodev.h
index 11f4e6fdbf..f86bf2260b 100644
--- a/lib/cryptodev/rte_cryptodev.h
+++ b/lib/cryptodev/rte_cryptodev.h
@@ -879,7 +879,7 @@ typedef uint16_t (*enqueue_pkt_burst_t)(void *qp,
 struct rte_cryptodev_callback;
 
 /** Structure to keep track of registered callbacks */
-TAILQ_HEAD(rte_cryptodev_cb_list, rte_cryptodev_callback);
+RTE_TAILQ_HEAD(rte_cryptodev_cb_list, rte_cryptodev_callback);
 
 /**
  * Structure used to hold information about the callbacks to be called for a
diff --git a/lib/cryptodev/rte_cryptodev_pmd.h b/lib/cryptodev/rte_cryptodev_pmd.h
index 1274436870..9542cbf263 100644
--- a/lib/cryptodev/rte_cryptodev_pmd.h
+++ b/lib/cryptodev/rte_cryptodev_pmd.h
@@ -66,7 +66,7 @@ struct rte_cryptodev_global {
 
 /* Cryptodev driver, containing the driver ID */
 struct cryptodev_driver {
-	TAILQ_ENTRY(cryptodev_driver) next; /**< Next in list. */
+	RTE_TAILQ_ENTRY(cryptodev_driver) next; /**< Next in list. */
 	const struct rte_driver *driver;
 	uint8_t id;
 };
diff --git a/lib/eal/common/eal_common_devargs.c b/lib/eal/common/eal_common_devargs.c
index 23aaf8b7e4..7edc6798fe 100644
--- a/lib/eal/common/eal_common_devargs.c
+++ b/lib/eal/common/eal_common_devargs.c
@@ -9,6 +9,7 @@
 #include <stdio.h>
 #include <string.h>
 #include <stdarg.h>
+#include <sys/queue.h>
 
 #include <rte_bus.h>
 #include <rte_class.h>
@@ -18,6 +19,7 @@
 #include <rte_errno.h>
 #include <rte_kvargs.h>
 #include <rte_log.h>
+#include <rte_os.h>
 #include <rte_tailq.h>
 #include "eal_private.h"
 
@@ -291,7 +293,7 @@ rte_devargs_insert(struct rte_devargs **da)
 	if (*da == NULL || (*da)->bus == NULL)
 		return -1;
 
-	TAILQ_FOREACH_SAFE(listed_da, &devargs_list, next, tmp) {
+	RTE_TAILQ_FOREACH_SAFE(listed_da, &devargs_list, next, tmp) {
 		if (listed_da == *da)
 			/* devargs already in the list */
 			return 0;
@@ -358,7 +360,7 @@ rte_devargs_remove(struct rte_devargs *devargs)
 	if (devargs == NULL || devargs->bus == NULL)
 		return -1;
 
-	TAILQ_FOREACH_SAFE(d, &devargs_list, next, tmp) {
+	RTE_TAILQ_FOREACH_SAFE(d, &devargs_list, next, tmp) {
 		if (strcmp(d->bus->name, devargs->bus->name) == 0 &&
 		    strcmp(d->name, devargs->name) == 0) {
 			TAILQ_REMOVE(&devargs_list, d, next);
diff --git a/lib/eal/common/eal_common_fbarray.c b/lib/eal/common/eal_common_fbarray.c
index 3a28a53247..75168ca552 100644
--- a/lib/eal/common/eal_common_fbarray.c
+++ b/lib/eal/common/eal_common_fbarray.c
@@ -9,6 +9,7 @@
 #include <errno.h>
 #include <string.h>
 #include <unistd.h>
+#include <sys/queue.h>
 
 #include <rte_common.h>
 #include <rte_eal_paging.h>
diff --git a/lib/eal/common/eal_common_log.c b/lib/eal/common/eal_common_log.c
index ec8fe23a7f..1be35f5397 100644
--- a/lib/eal/common/eal_common_log.c
+++ b/lib/eal/common/eal_common_log.c
@@ -10,6 +10,7 @@
 #include <errno.h>
 #include <regex.h>
 #include <fnmatch.h>
+#include <sys/queue.h>
 
 #include <rte_eal.h>
 #include <rte_log.h>
diff --git a/lib/eal/common/eal_common_memalloc.c b/lib/eal/common/eal_common_memalloc.c
index e872c6533b..aefdf8de3f 100644
--- a/lib/eal/common/eal_common_memalloc.c
+++ b/lib/eal/common/eal_common_memalloc.c
@@ -3,6 +3,7 @@
  */
 
 #include <string.h>
+#include <sys/queue.h>
 
 #include <rte_errno.h>
 #include <rte_lcore.h>
diff --git a/lib/eal/common/eal_common_options.c b/lib/eal/common/eal_common_options.c
index ff5861b5f3..2cc74b4472 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -6,6 +6,7 @@
 #include <stdlib.h>
 #include <unistd.h>
 #include <string.h>
+#include <sys/queue.h>
 #ifndef RTE_EXEC_ENV_WINDOWS
 #include <syslog.h>
 #endif
@@ -283,7 +284,7 @@ eal_option_device_parse(void)
 	void *tmp;
 	int ret = 0;
 
-	TAILQ_FOREACH_SAFE(devopt, &devopt_list, next, tmp) {
+	RTE_TAILQ_FOREACH_SAFE(devopt, &devopt_list, next, tmp) {
 		if (ret == 0) {
 			ret = rte_devargs_add(devopt->type, devopt->arg);
 			if (ret)
diff --git a/lib/eal/common/eal_trace.h b/lib/eal/common/eal_trace.h
index 06751eb23a..76fbcd86b0 100644
--- a/lib/eal/common/eal_trace.h
+++ b/lib/eal/common/eal_trace.h
@@ -5,6 +5,8 @@
 #ifndef __EAL_TRACE_H
 #define __EAL_TRACE_H
 
+#include <sys/queue.h>
+
 #include <rte_cycles.h>
 #include <rte_log.h>
 #include <rte_malloc.h>
diff --git a/lib/eal/freebsd/include/rte_os.h b/lib/eal/freebsd/include/rte_os.h
index 627f0483ab..099ad3f019 100644
--- a/lib/eal/freebsd/include/rte_os.h
+++ b/lib/eal/freebsd/include/rte_os.h
@@ -11,6 +11,21 @@
  */
 
 #include <pthread_np.h>
+#include <sys/queue.h>
+
+/* These macros are compatible with the system's <sys/queue.h>. */
+#define RTE_TAILQ_HEAD(name, type) TAILQ_HEAD(name, type)
+#define RTE_TAILQ_ENTRY(type) TAILQ_ENTRY(type)
+#define RTE_TAILQ_FOREACH(var, head, field) TAILQ_FOREACH(var, head, field)
+#define	RTE_TAILQ_FOREACH_SAFE(var, head, field, tvar) \
+	for ((var) = TAILQ_FIRST((head)); \
+	    (var) && ((tvar) = TAILQ_NEXT((var), field), 1); \
+	    (var) = (tvar))
+#define RTE_TAILQ_FIRST(head) TAILQ_FIRST(head)
+#define RTE_TAILQ_NEXT(elem, field) TAILQ_NEXT(elem, field)
+#define RTE_STAILQ_HEAD(name, type) STAILQ_HEAD(name, type)
+#define RTE_STAILQ_ENTRY(type) STAILQ_ENTRY(type)
+
 
 typedef cpuset_t rte_cpuset_t;
 #define RTE_HAS_CPUSET
diff --git a/lib/eal/include/rte_bus.h b/lib/eal/include/rte_bus.h
index 80b154fb98..84d364df3f 100644
--- a/lib/eal/include/rte_bus.h
+++ b/lib/eal/include/rte_bus.h
@@ -19,13 +19,12 @@ extern "C" {
 #endif
 
 #include <stdio.h>
-#include <sys/queue.h>
 
 #include <rte_log.h>
 #include <rte_dev.h>
 
 /** Double linked list of buses */
-TAILQ_HEAD(rte_bus_list, rte_bus);
+RTE_TAILQ_HEAD(rte_bus_list, rte_bus);
 
 
 /**
@@ -250,7 +249,7 @@ typedef enum rte_iova_mode (*rte_bus_get_iommu_class_t)(void);
  * A structure describing a generic bus.
  */
 struct rte_bus {
-	TAILQ_ENTRY(rte_bus) next;   /**< Next bus object in linked list */
+	RTE_TAILQ_ENTRY(rte_bus) next;   /**< Next bus object in linked list */
 	const char *name;            /**< Name of the bus */
 	rte_bus_scan_t scan;         /**< Scan for devices attached to bus */
 	rte_bus_probe_t probe;       /**< Probe devices on bus */
diff --git a/lib/eal/include/rte_class.h b/lib/eal/include/rte_class.h
index 856d09b22d..d560339652 100644
--- a/lib/eal/include/rte_class.h
+++ b/lib/eal/include/rte_class.h
@@ -22,18 +22,16 @@
 extern "C" {
 #endif
 
-#include <sys/queue.h>
-
 #include <rte_dev.h>
 
 /** Double linked list of classes */
-TAILQ_HEAD(rte_class_list, rte_class);
+RTE_TAILQ_HEAD(rte_class_list, rte_class);
 
 /**
  * A structure describing a generic device class.
  */
 struct rte_class {
-	TAILQ_ENTRY(rte_class) next; /**< Next device class in linked list */
+	RTE_TAILQ_ENTRY(rte_class) next; /**< Next device class in linked list */
 	const char *name; /**< Name of the class */
 	rte_dev_iterate_t dev_iterate; /**< Device iterator. */
 };
diff --git a/lib/eal/include/rte_dev.h b/lib/eal/include/rte_dev.h
index 6dd72c11a1..f6efe0c94e 100644
--- a/lib/eal/include/rte_dev.h
+++ b/lib/eal/include/rte_dev.h
@@ -18,7 +18,6 @@ extern "C" {
 #endif
 
 #include <stdio.h>
-#include <sys/queue.h>
 
 #include <rte_config.h>
 #include <rte_compat.h>
@@ -75,7 +74,7 @@ struct rte_mem_resource {
  * A structure describing a device driver.
  */
 struct rte_driver {
-	TAILQ_ENTRY(rte_driver) next;  /**< Next in list. */
+	RTE_TAILQ_ENTRY(rte_driver) next;  /**< Next in list. */
 	const char *name;                   /**< Driver name. */
 	const char *alias;              /**< Driver alias. */
 };
@@ -90,7 +89,7 @@ struct rte_driver {
  * A structure describing a generic device.
  */
 struct rte_device {
-	TAILQ_ENTRY(rte_device) next; /**< Next device */
+	RTE_TAILQ_ENTRY(rte_device) next; /**< Next device */
 	const char *name;             /**< Device name */
 	const struct rte_driver *driver; /**< Driver assigned after probing */
 	const struct rte_bus *bus;    /**< Bus handle assigned on scan */
diff --git a/lib/eal/include/rte_devargs.h b/lib/eal/include/rte_devargs.h
index cd90944fe8..957477b398 100644
--- a/lib/eal/include/rte_devargs.h
+++ b/lib/eal/include/rte_devargs.h
@@ -21,7 +21,6 @@ extern "C" {
 #endif
 
 #include <stdio.h>
-#include <sys/queue.h>
 #include <rte_compat.h>
 #include <rte_bus.h>
 
@@ -76,7 +75,7 @@ enum rte_devtype {
  */
 struct rte_devargs {
 	/** Next in list. */
-	TAILQ_ENTRY(rte_devargs) next;
+	RTE_TAILQ_ENTRY(rte_devargs) next;
 	/** Type of device. */
 	enum rte_devtype type;
 	/** Device policy. */
diff --git a/lib/eal/include/rte_log.h b/lib/eal/include/rte_log.h
index b706bb8710..bb3523467b 100644
--- a/lib/eal/include/rte_log.h
+++ b/lib/eal/include/rte_log.h
@@ -21,7 +21,6 @@ extern "C" {
 #include <stdio.h>
 #include <stdarg.h>
 #include <stdbool.h>
-#include <sys/queue.h>
 
 #include <rte_common.h>
 #include <rte_config.h>
diff --git a/lib/eal/include/rte_service.h b/lib/eal/include/rte_service.h
index c7d037d862..1c9275c32a 100644
--- a/lib/eal/include/rte_service.h
+++ b/lib/eal/include/rte_service.h
@@ -29,7 +29,6 @@ extern "C" {
 
 #include<stdio.h>
 #include <stdint.h>
-#include <sys/queue.h>
 
 #include <rte_config.h>
 #include <rte_lcore.h>
diff --git a/lib/eal/include/rte_tailq.h b/lib/eal/include/rte_tailq.h
index b6fe4e5f78..28cd54ef3e 100644
--- a/lib/eal/include/rte_tailq.h
+++ b/lib/eal/include/rte_tailq.h
@@ -15,17 +15,16 @@
 extern "C" {
 #endif
 
-#include <sys/queue.h>
 #include <stdio.h>
 #include <rte_debug.h>
 
 /** dummy structure type used by the rte_tailq APIs */
 struct rte_tailq_entry {
-	TAILQ_ENTRY(rte_tailq_entry) next; /**< Pointer entries for a tailq list */
+	RTE_TAILQ_ENTRY(rte_tailq_entry) next; /**< Pointer entries for a tailq list */
 	void *data; /**< Pointer to the data referenced by this tailq entry */
 };
 /** dummy */
-TAILQ_HEAD(rte_tailq_entry_head, rte_tailq_entry);
+RTE_TAILQ_HEAD(rte_tailq_entry_head, rte_tailq_entry);
 
 #define RTE_TAILQ_NAMESIZE 32
 
@@ -48,7 +47,7 @@ struct rte_tailq_elem {
 	 * rte_eal_tailqs_init()
 	 */
 	struct rte_tailq_head *head;
-	TAILQ_ENTRY(rte_tailq_elem) next;
+	RTE_TAILQ_ENTRY(rte_tailq_elem) next;
 	const char name[RTE_TAILQ_NAMESIZE];
 };
 
@@ -126,10 +125,10 @@ RTE_INIT(tailqinitfn_ ##t) \
 }
 
 /* This macro permits both remove and free var within the loop safely.*/
-#ifndef TAILQ_FOREACH_SAFE
-#define TAILQ_FOREACH_SAFE(var, head, field, tvar)		\
-	for ((var) = TAILQ_FIRST((head));			\
-	    (var) && ((tvar) = TAILQ_NEXT((var), field), 1);	\
+#ifndef RTE_TAILQ_FOREACH_SAFE
+#define RTE_TAILQ_FOREACH_SAFE(var, head, field, tvar)		\
+	for ((var) = RTE_TAILQ_FIRST((head));			\
+	    (var) && ((tvar) = RTE_TAILQ_NEXT((var), field), 1);	\
 	    (var) = (tvar))
 #endif
 
diff --git a/lib/eal/linux/include/rte_os.h b/lib/eal/linux/include/rte_os.h
index 1618b4df22..1a6e5b789f 100644
--- a/lib/eal/linux/include/rte_os.h
+++ b/lib/eal/linux/include/rte_os.h
@@ -11,6 +11,21 @@
  */
 
 #include <sched.h>
+#include <sys/queue.h>
+
+/* These macros are compatible with system's sys/queue.h. */
+#define RTE_TAILQ_HEAD(name, type) TAILQ_HEAD(name, type)
+#define RTE_TAILQ_ENTRY(type) TAILQ_ENTRY(type)
+#define RTE_TAILQ_FOREACH(var, head, field) TAILQ_FOREACH(var, head, field)
+#define	RTE_TAILQ_FOREACH_SAFE(var, head, field, tvar) \
+	for ((var) = TAILQ_FIRST((head)); \
+	    (var) && ((tvar) = TAILQ_NEXT((var), field), 1); \
+	    (var) = (tvar))
+#define RTE_TAILQ_FIRST(head) TAILQ_FIRST(head)
+#define RTE_TAILQ_NEXT(elem, field) TAILQ_NEXT(elem, field)
+#define RTE_STAILQ_HEAD(name, type) STAILQ_HEAD(name, type)
+#define RTE_STAILQ_ENTRY(type) STAILQ_ENTRY(type)
+
 
 #ifdef CPU_SETSIZE /* may require _GNU_SOURCE */
 typedef cpu_set_t rte_cpuset_t;
diff --git a/lib/eal/windows/eal_alarm.c b/lib/eal/windows/eal_alarm.c
index e5dc54efb8..103c1f909d 100644
--- a/lib/eal/windows/eal_alarm.c
+++ b/lib/eal/windows/eal_alarm.c
@@ -4,6 +4,7 @@
 
 #include <stdatomic.h>
 #include <stdbool.h>
+#include <sys/queue.h>
 
 #include <rte_alarm.h>
 #include <rte_spinlock.h>
diff --git a/lib/eal/windows/include/rte_os.h b/lib/eal/windows/include/rte_os.h
index 66c711d458..ee7a8c7a08 100644
--- a/lib/eal/windows/include/rte_os.h
+++ b/lib/eal/windows/include/rte_os.h
@@ -18,6 +18,37 @@
 extern "C" {
 #endif
 
+#define	RTE_TAILQ_HEAD(name, type) \
+struct name { \
+	struct type *tqh_first;	/* first element */ \
+	struct type **tqh_last;	/* addr of last next element */	\
+}
+#define	RTE_TAILQ_ENTRY(type) \
+struct { \
+	struct type *tqe_next;	/* next element */ \
+	struct type **tqe_prev;	/* address of previous next element */ \
+}
+#define	RTE_TAILQ_FOREACH(var, head, field) \
+	for ((var) = RTE_TAILQ_FIRST((head)); \
+	    (var); \
+	    (var) = RTE_TAILQ_NEXT((var), field))
+#define	RTE_TAILQ_FOREACH_SAFE(var, head, field, tvar) \
+	for ((var) = RTE_TAILQ_FIRST((head)); \
+	    (var) && ((tvar) = RTE_TAILQ_NEXT((var), field), 1); \
+	    (var) = (tvar))
+#define	RTE_TAILQ_FIRST(head)	((head)->tqh_first)
+#define	RTE_TAILQ_NEXT(elm, field) ((elm)->field.tqe_next)
+#define	RTE_STAILQ_HEAD(name, type) \
+struct name { \
+	struct type *stqh_first;/* first element */ \
+	struct type **stqh_last;/* addr of last next element */ \
+}
+#define	RTE_STAILQ_ENTRY(type) \
+struct { \
+	struct type *stqe_next;	/* next element */ \
+}
+
+
 /* cpu_set macros implementation */
 #define RTE_CPU_AND(dst, src1, src2) CPU_AND(dst, src1, src2)
 #define RTE_CPU_OR(dst, src1, src2) CPU_OR(dst, src1, src2)
diff --git a/lib/efd/rte_efd.c b/lib/efd/rte_efd.c
index 77f46809f8..5bf517fee9 100644
--- a/lib/efd/rte_efd.c
+++ b/lib/efd/rte_efd.c
@@ -759,7 +759,7 @@ rte_efd_free(struct rte_efd_table *table)
 	efd_list = RTE_TAILQ_CAST(rte_efd_tailq.head, rte_efd_list);
 	rte_mcfg_tailq_write_lock();
 
-	TAILQ_FOREACH_SAFE(te, efd_list, next, temp) {
+	RTE_TAILQ_FOREACH_SAFE(te, efd_list, next, temp) {
 		if (te->data == (void *) table) {
 			TAILQ_REMOVE(efd_list, te, next);
 			rte_free(te);
diff --git a/lib/ethdev/rte_ethdev_core.h b/lib/ethdev/rte_ethdev_core.h
index edf96de2dc..d2c9ec42c7 100644
--- a/lib/ethdev/rte_ethdev_core.h
+++ b/lib/ethdev/rte_ethdev_core.h
@@ -21,7 +21,7 @@
 
 struct rte_eth_dev_callback;
 /** @internal Structure to keep track of registered callbacks */
-TAILQ_HEAD(rte_eth_dev_cb_list, rte_eth_dev_callback);
+RTE_TAILQ_HEAD(rte_eth_dev_cb_list, rte_eth_dev_callback);
 
 struct rte_eth_dev;
 
diff --git a/lib/hash/rte_fbk_hash.h b/lib/hash/rte_fbk_hash.h
index c4d6976d2b..9c3a61c1d6 100644
--- a/lib/hash/rte_fbk_hash.h
+++ b/lib/hash/rte_fbk_hash.h
@@ -17,7 +17,6 @@
 
 #include <stdint.h>
 #include <errno.h>
-#include <sys/queue.h>
 
 #ifdef __cplusplus
 extern "C" {
diff --git a/lib/hash/rte_thash.c b/lib/hash/rte_thash.c
index d5a95a6e00..696a1121e2 100644
--- a/lib/hash/rte_thash.c
+++ b/lib/hash/rte_thash.c
@@ -2,6 +2,8 @@
  * Copyright(c) 2021 Intel Corporation
  */
 
+#include <sys/queue.h>
+
 #include <rte_thash.h>
 #include <rte_tailq.h>
 #include <rte_random.h>
diff --git a/lib/ip_frag/rte_ip_frag.h b/lib/ip_frag/rte_ip_frag.h
index 0bfe64b14e..80f931c32a 100644
--- a/lib/ip_frag/rte_ip_frag.h
+++ b/lib/ip_frag/rte_ip_frag.h
@@ -62,7 +62,7 @@ struct ip_frag_key {
  * First two entries in the frags[] array are for the last and first fragments.
  */
 struct ip_frag_pkt {
-	TAILQ_ENTRY(ip_frag_pkt) lru;   /**< LRU list */
+	RTE_TAILQ_ENTRY(ip_frag_pkt) lru;   /**< LRU list */
 	struct ip_frag_key key;           /**< fragmentation key */
 	uint64_t             start;       /**< creation timestamp */
 	uint32_t             total_size;  /**< expected reassembled size */
@@ -83,7 +83,7 @@ struct rte_ip_frag_death_row {
 	/**< mbufs to be freed */
 };
 
-TAILQ_HEAD(ip_pkt_list, ip_frag_pkt); /**< @internal fragments tailq */
+RTE_TAILQ_HEAD(ip_pkt_list, ip_frag_pkt); /**< @internal fragments tailq */
 
 /** fragmentation table statistics */
 struct ip_frag_tbl_stat {
diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
index 59a588425b..c5f859ae71 100644
--- a/lib/mempool/rte_mempool.c
+++ b/lib/mempool/rte_mempool.c
@@ -1337,7 +1337,7 @@ void rte_mempool_walk(void (*func)(struct rte_mempool *, void *),
 
 	rte_mcfg_mempool_read_lock();
 
-	TAILQ_FOREACH_SAFE(te, mempool_list, next, tmp_te) {
+	RTE_TAILQ_FOREACH_SAFE(te, mempool_list, next, tmp_te) {
 		(*func)((struct rte_mempool *) te->data, arg);
 	}
 
diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 4235d6f0bf..f57ecbd6fc 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -38,7 +38,6 @@
 #include <stdint.h>
 #include <errno.h>
 #include <inttypes.h>
-#include <sys/queue.h>
 
 #include <rte_config.h>
 #include <rte_spinlock.h>
@@ -141,7 +140,7 @@ struct rte_mempool_objsz {
  * double-frees.
  */
 struct rte_mempool_objhdr {
-	STAILQ_ENTRY(rte_mempool_objhdr) next; /**< Next in list. */
+	RTE_STAILQ_ENTRY(rte_mempool_objhdr) next; /**< Next in list. */
 	struct rte_mempool *mp;          /**< The mempool owning the object. */
 	rte_iova_t iova;                 /**< IO address of the object. */
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
@@ -152,7 +151,7 @@ struct rte_mempool_objhdr {
 /**
  * A list of object headers type
  */
-STAILQ_HEAD(rte_mempool_objhdr_list, rte_mempool_objhdr);
+RTE_STAILQ_HEAD(rte_mempool_objhdr_list, rte_mempool_objhdr);
 
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
 
@@ -171,7 +170,7 @@ struct rte_mempool_objtlr {
 /**
  * A list of memory where objects are stored
  */
-STAILQ_HEAD(rte_mempool_memhdr_list, rte_mempool_memhdr);
+RTE_STAILQ_HEAD(rte_mempool_memhdr_list, rte_mempool_memhdr);
 
 /**
  * Callback used to free a memory chunk
@@ -186,7 +185,7 @@ typedef void (rte_mempool_memchunk_free_cb_t)(struct rte_mempool_memhdr *memhdr,
  * and physically contiguous.
  */
 struct rte_mempool_memhdr {
-	STAILQ_ENTRY(rte_mempool_memhdr) next; /**< Next in list. */
+	RTE_STAILQ_ENTRY(rte_mempool_memhdr) next; /**< Next in list. */
 	struct rte_mempool *mp;  /**< The mempool owning the chunk */
 	void *addr;              /**< Virtual address of the chunk */
 	rte_iova_t iova;         /**< IO address of the chunk */
diff --git a/lib/pci/rte_pci.h b/lib/pci/rte_pci.h
index 1f33d687f4..71cbd441c7 100644
--- a/lib/pci/rte_pci.h
+++ b/lib/pci/rte_pci.h
@@ -18,7 +18,6 @@ extern "C" {
 
 #include <stdio.h>
 #include <limits.h>
-#include <sys/queue.h>
 #include <inttypes.h>
 #include <sys/types.h>
 
diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
index 16718ca7f1..43ce1a29d4 100644
--- a/lib/ring/rte_ring_core.h
+++ b/lib/ring/rte_ring_core.h
@@ -26,7 +26,6 @@ extern "C" {
 #include <stdio.h>
 #include <stdint.h>
 #include <string.h>
-#include <sys/queue.h>
 #include <errno.h>
 #include <rte_common.h>
 #include <rte_config.h>
diff --git a/lib/table/rte_swx_table.h b/lib/table/rte_swx_table.h
index e23f2304c6..f93e5f3f95 100644
--- a/lib/table/rte_swx_table.h
+++ b/lib/table/rte_swx_table.h
@@ -16,7 +16,8 @@ extern "C" {
  */
 
 #include <stdint.h>
-#include <sys/queue.h>
+
+#include <rte_os.h>
 
 /** Match type. */
 enum rte_swx_table_match_type {
@@ -68,7 +69,7 @@ struct rte_swx_table_entry {
 	/** Used to facilitate the membership of this table entry to a
 	 * linked list.
 	 */
-	TAILQ_ENTRY(rte_swx_table_entry) node;
+	RTE_TAILQ_ENTRY(rte_swx_table_entry) node;
 
 	/** Key value for the current entry. Array of *key_size* bytes or NULL
 	 * if the *key_size* for the current table is 0.
@@ -111,7 +112,7 @@ struct rte_swx_table_entry {
 };
 
 /** List of table entries. */
-TAILQ_HEAD(rte_swx_table_entry_list, rte_swx_table_entry);
+RTE_TAILQ_HEAD(rte_swx_table_entry_list, rte_swx_table_entry);
 
 /**
  * Table memory footprint get
diff --git a/lib/table/rte_swx_table_selector.h b/lib/table/rte_swx_table_selector.h
index 71b6a74810..62988d2856 100644
--- a/lib/table/rte_swx_table_selector.h
+++ b/lib/table/rte_swx_table_selector.h
@@ -16,7 +16,6 @@ extern "C" {
  */
 
 #include <stdint.h>
-#include <sys/queue.h>
 
 #include <rte_compat.h>
 
@@ -56,7 +55,7 @@ struct rte_swx_table_selector_params {
 /** Group member parameters. */
 struct rte_swx_table_selector_member {
 	/** Linked list connectivity. */
-	TAILQ_ENTRY(rte_swx_table_selector_member) node;
+	RTE_TAILQ_ENTRY(rte_swx_table_selector_member) node;
 
 	/** Member ID. */
 	uint32_t member_id;
@@ -66,7 +65,7 @@ struct rte_swx_table_selector_member {
 };
 
 /** List of group members. */
-TAILQ_HEAD(rte_swx_table_selector_member_list, rte_swx_table_selector_member);
+RTE_TAILQ_HEAD(rte_swx_table_selector_member_list, rte_swx_table_selector_member);
 
 /** Group parameters. */
 struct rte_swx_table_selector_group {
diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index e0b67721b6..e4a445e709 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -32,7 +32,7 @@ vhost_user_iotlb_pending_remove_all(struct vhost_virtqueue *vq)
 
 	rte_rwlock_write_lock(&vq->iotlb_pending_lock);
 
-	TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next, temp_node) {
 		TAILQ_REMOVE(&vq->iotlb_pending_list, node, next);
 		rte_mempool_put(vq->iotlb_pool, node);
 	}
@@ -100,7 +100,8 @@ vhost_user_iotlb_pending_remove(struct vhost_virtqueue *vq,
 
 	rte_rwlock_write_lock(&vq->iotlb_pending_lock);
 
-	TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next,
+				temp_node) {
 		if (node->iova < iova)
 			continue;
 		if (node->iova >= iova + size)
@@ -121,7 +122,7 @@ vhost_user_iotlb_cache_remove_all(struct vhost_virtqueue *vq)
 
 	rte_rwlock_write_lock(&vq->iotlb_lock);
 
-	TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
 		TAILQ_REMOVE(&vq->iotlb_list, node, next);
 		rte_mempool_put(vq->iotlb_pool, node);
 	}
@@ -141,7 +142,7 @@ vhost_user_iotlb_cache_random_evict(struct vhost_virtqueue *vq)
 
 	entry_idx = rte_rand() % vq->iotlb_cache_nr;
 
-	TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
 		if (!entry_idx) {
 			TAILQ_REMOVE(&vq->iotlb_list, node, next);
 			rte_mempool_put(vq->iotlb_pool, node);
@@ -218,7 +219,7 @@ vhost_user_iotlb_cache_remove(struct vhost_virtqueue *vq,
 
 	rte_rwlock_write_lock(&vq->iotlb_lock);
 
-	TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
 		/* Sorted list */
 		if (unlikely(iova + size < node->iova))
 			break;
diff --git a/lib/vhost/rte_vdpa_dev.h b/lib/vhost/rte_vdpa_dev.h
index bfada387b0..b0f494815f 100644
--- a/lib/vhost/rte_vdpa_dev.h
+++ b/lib/vhost/rte_vdpa_dev.h
@@ -71,7 +71,7 @@ struct rte_vdpa_dev_ops {
  * vdpa device structure includes device address and device operations.
  */
 struct rte_vdpa_device {
-	TAILQ_ENTRY(rte_vdpa_device) next;
+	RTE_TAILQ_ENTRY(rte_vdpa_device) next;
 	/** Generic device information */
 	struct rte_device *device;
 	/** vdpa device operations */
diff --git a/lib/vhost/vdpa.c b/lib/vhost/vdpa.c
index 99a926a772..6dd91859ac 100644
--- a/lib/vhost/vdpa.c
+++ b/lib/vhost/vdpa.c
@@ -115,7 +115,7 @@ rte_vdpa_unregister_device(struct rte_vdpa_device *dev)
 	int ret = -1;
 
 	rte_spinlock_lock(&vdpa_device_list_lock);
-	TAILQ_FOREACH_SAFE(cur_dev, &vdpa_device_list, next, tmp_dev) {
+	RTE_TAILQ_FOREACH_SAFE(cur_dev, &vdpa_device_list, next, tmp_dev) {
 		if (dev != cur_dev)
 			continue;
 
-- 
2.30.2



* Re: [dpdk-dev] [PATCHv3] include: fix sys/queue.h.
  @ 2021-08-12 21:58  3%   ` Dmitry Kozlyuk
  2021-08-13  1:02  1%   ` [dpdk-dev] [PATCHv4] eal: remove sys/queue.h from public headers William Tu
  1 sibling, 0 replies; 200+ results
From: Dmitry Kozlyuk @ 2021-08-12 21:58 UTC (permalink / raw)
  To: William Tu; +Cc: dev, nick.connolly

2021-08-12 20:05 (UTC+0000), William Tu:
> Currently there are a couple of public header files include

Suggested subject: "eal: remove sys/queue.h from public headers".

1. The state before the patch should be described in the past tense.
2. Really ten times more than "a couple", suggesting "some" (nit).
3. "files _that_ include"?

> 'sys/queue.h', which is a POSIX functionality.

It's not POSIX, it's found on many Unix systems.

> When compiling DPDK with OVS on Windows, we encountered issues such as, found the missing
> header.

This sentence is a little hard to parse. Instead, suggesting:

	This file is missing on Windows. During the build, DPDK uses a
	bundled copy, but it cannot be installed because macros it exports
	may conflict with the ones from application code or environment.

> In file included from ../lib/dpdk.c:27:
> C:\temp\dpdk\include\rte_log.h:24:10: fatal error: 'sys/queue.h' file
> not found

An explanation is missing why <sys/queue.h> embedded in DPDK shouldn't be
installed (see above, maybe you can come up with something better).

> 
> The patch fixes it by removing the #include <sys/queue.h> from
> DPDK public headers, so programs including DPDK headers don't depend
> on POSIX sys/queue.h. For Linux/FreeBSD, DPDK public headers only need a
> handful of macros for list/tailq heads and links. Those macros should be
> provided by DPDK, with RTE_ prefix.

It is worth noting that RTE_ macros must be compatible with <sys/queue.h>
at the level of API (to use with <sys/queue.h> macros in C files) and ABI
(to avoid breaking it).
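
For illustration only, a minimal sketch of what this compatibility means in
practice, assuming a glibc or BSD <sys/queue.h> is available to the C file.
The head and entry are declared with RTE_ macros that spell out the struct
layout (as the Windows rte_os.h would), and the list is then operated on with
the plain TAILQ_* macros. The names demo_item, demo_list, demo_sys_list and
demo_sum are hypothetical and not part of DPDK:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Declaration-only macros with the same field names and layout as
 * TAILQ_HEAD/TAILQ_ENTRY, written out as a header without
 * <sys/queue.h> would have to do. */
#define RTE_TAILQ_HEAD(name, type) \
struct name { \
	struct type *tqh_first; /* first element */ \
	struct type **tqh_last; /* addr of last next element */ \
}
#define RTE_TAILQ_ENTRY(type) \
struct { \
	struct type *tqe_next; /* next element */ \
	struct type **tqe_prev; /* address of previous next element */ \
}

/* Hypothetical element type, as a public header would declare it. */
struct demo_item {
	RTE_TAILQ_ENTRY(demo_item) next;
	int value;
};
RTE_TAILQ_HEAD(demo_list, demo_item);

/* ABI side: the head must have the same size as the system one. */
TAILQ_HEAD(demo_sys_list, demo_item);
_Static_assert(sizeof(struct demo_list) == sizeof(struct demo_sys_list),
	       "RTE_TAILQ_HEAD layout differs from TAILQ_HEAD");

/* API side: a C file uses the <sys/queue.h> operations on the
 * RTE_-declared types. Inserts 1, 2, 4, removes 2, sums the rest. */
int
demo_sum(void)
{
	struct demo_list list;
	struct demo_item a = { .value = 1 };
	struct demo_item b = { .value = 2 };
	struct demo_item c = { .value = 4 };
	struct demo_item *it;
	int sum = 0;

	TAILQ_INIT(&list);
	TAILQ_INSERT_TAIL(&list, &a, next);
	TAILQ_INSERT_TAIL(&list, &b, next);
	TAILQ_INSERT_TAIL(&list, &c, next);
	assert(!TAILQ_EMPTY(&list));
	TAILQ_REMOVE(&list, &b, next);

	TAILQ_FOREACH(it, &list, next)
		sum += it->value;
	return sum;
}
```

The sketch only works because the field names (tqh_first, tqh_last, tqe_next,
tqe_prev) match the ones the system macros expect, which is the API half of
the requirement. The static assertion is a rough check of the ABI half.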

Nit: "Should" is not the right word for things done in the patch. Same below.

> For Linux and FreeBSD it will just be:
>     #include <sys/queue.h>
>     #define RTE_TAILQ_ENTRY(type) TAILQ_ENTRY(type)
>     /* ... */
> For Windows, we copy these definitions from <sys/queue.h> to rte_os.h.

No need to describe what's inside the patch, diff already does it :)

> With this patch, all the public headers should not have
> "#include <sys/queue.h>" or "TAILQ_xxx" macros.
> 
> Suggested-by: Nick Connolly <nick.connolly@mayadata.io>
> Suggested-by: Dmitry Kozliuk <Dmitry.Kozliuk@gmail.com>
> Signed-off-by: William Tu <u9012063@gmail.com>
> ---
> v2->v3:
>   * follow the suggestion by Dmitry
>   * run checkpatches, there are some errors but I think either
>     the original file has over 80-char line due to comments,
>     or some false positive about macro.
> v1->v2:
>   - follow the suggestion by Nick and Dmitry
>   - http://mails.dpdk.org/archives/dev/2021-August/216304.html
> 
> Signed-off-by: William Tu <u9012063@gmail.com>
> ---
[...]
> diff --git a/lib/eal/freebsd/include/rte_os.h b/lib/eal/freebsd/include/rte_os.h
> index 627f0483ab..dc889e5826 100644
> --- a/lib/eal/freebsd/include/rte_os.h
> +++ b/lib/eal/freebsd/include/rte_os.h
> @@ -11,6 +11,39 @@
>   */
>  
>  #include <pthread_np.h>
> +#include <sys/queue.h>
> +
> +/* These macros are compatible with system's sys/queue.h. */
> +#define RTE_TAILQ_INIT(head) TAILQ_INIT(head)
> +#define RTE_TAILQ_HEAD(name, type) TAILQ_HEAD(name, type)
> +#define RTE_TAILQ_LAST(head, headname) TAILQ_LAST(head, headname)
> +#define RTE_TAILQ_ENTRY(type) TAILQ_ENTRY(type)
> +#define RTE_TAILQ_FIRST(head) TAILQ_FIRST(head)
> +#define RTE_TAILQ_EMPTY(head) TAILQ_EMPTY(head)
> +#define RTE_TAILQ_NEXT(elem, field) TAILQ_NEXT(elem, field)
> +#define RTE_TAILQ_HEAD_INITIALIZER(head) TAILQ_HEAD_INITIALIZER(head)
> +#define RTE_TAILQ_FOREACH(var, head, field) TAILQ_FOREACH(var, head, field)
> +#define RTE_TAILQ_INSERT_TAIL(head, elm, field) \
> +	TAILQ_INSERT_TAIL(head, elm, field)
> +#define RTE_TAILQ_REMOVE(head, elm, field) TAILQ_REMOVE(head, elm, field)
> +#define RTE_TAILQ_INSERT_BEFORE(listelm, elm, field) \
> +	TAILQ_INSERT_BEFORE(listelm, elm, field)
> +#define RTE_TAILQ_INSERT_AFTER(head, listelm, elm, field) \
> +	TAILQ_INSERT_AFTER(head, listelm, elm, field)
> +#define RTE_TAILQ_INSERT_HEAD(head, elm, field) \
> +	TAILQ_INSERT_HEAD(head, elm, field)
> +
> +#define RTE_STAILQ_HEAD(name, type) STAILQ_HEAD(name, type)
> +#define RTE_STAILQ_HEAD_INITIALIZER(head) STAILQ_HEAD_INITIALIZER(head)
> +#define RTE_STAILQ_ENTRY(type) STAILQ_ENTRY(type)

Most of these macros are not used in public headers and are not needed.
The idea is that TAILQ_* macros from sys/queue.h can be used in C files
with variables declared with RTE_TAILQ_HEAD/ENTRY in public headers.
Needed macros:
	RTE_TAILQ_HEAD
	RTE_TAILQ_ENTRY
	RTE_TAILQ_FOREACH
	RTE_TAILQ_FIRST (for RTE_TAILQ_FOREACH_SAFE only)
	RTE_TAILQ_NEXT (ditto)
	RTE_STAILQ_HEAD
	RTE_STAILQ_ENTRY
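
As a rough sketch of why this short list is enough for headers, the code below
declares a list type and walks it using only RTE_-prefixed macros, without
including <sys/queue.h> at all. The macro bodies are assumptions modelled on
the usual BSD definitions. demo_node, demo_node_list, demo_count and
demo_count_two are hypothetical names, and the list is linked by hand only
because no insert macro is available in this sketch:

```c
#include <assert.h>
#include <stddef.h>

#define RTE_TAILQ_HEAD(name, type) \
struct name { \
	struct type *tqh_first; \
	struct type **tqh_last; \
}
#define RTE_TAILQ_ENTRY(type) \
struct { \
	struct type *tqe_next; \
	struct type **tqe_prev; \
}
#define RTE_TAILQ_FIRST(head) ((head)->tqh_first)
#define RTE_TAILQ_NEXT(elm, field) ((elm)->field.tqe_next)
#define RTE_TAILQ_FOREACH(var, head, field) \
	for ((var) = RTE_TAILQ_FIRST((head)); \
	    (var); \
	    (var) = RTE_TAILQ_NEXT((var), field))

/* Hypothetical element and list types. */
struct demo_node {
	RTE_TAILQ_ENTRY(demo_node) link;
	int id;
};
RTE_TAILQ_HEAD(demo_node_list, demo_node);

/* Count entries using only the RTE_-prefixed traversal macros. */
unsigned int
demo_count(const struct demo_node_list *list)
{
	const struct demo_node *n;
	unsigned int count = 0;

	assert(list != NULL);
	RTE_TAILQ_FOREACH(n, list, link)
		count++;
	return count;
}

/* Link two nodes by hand and count them. */
unsigned int
demo_count_two(void)
{
	struct demo_node a = { .id = 1 };
	struct demo_node b = { .id = 2 };
	struct demo_node_list list;

	list.tqh_first = &a;
	a.link.tqe_prev = &list.tqh_first;
	a.link.tqe_next = &b;
	b.link.tqe_prev = &a.link.tqe_next;
	b.link.tqe_next = NULL;
	list.tqh_last = &b.link.tqe_next;

	return demo_count(&list);
}
```

Insertion and removal stay in C files, where TAILQ_INSERT_TAIL() and
TAILQ_REMOVE() from <sys/queue.h> can be used directly.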

> +
> +/* This is not defined in sys/queue.h */
> +#ifndef TAILQ_FOREACH_SAFE
> +#define TAILQ_FOREACH_SAFE(var, head, field, tvar)		\
> +	for ((var) = RTE_TAILQ_FIRST((head));			\
> +	    (var) && ((tvar) = RTE_TAILQ_NEXT((var), field), 1);	\
> +	    (var) = (tvar))
> +#endif

Please simply change the three usages of TAILQ_FOREACH_SAFE to
RTE_TAILQ_FOREACH_SAFE and remove this one. It cannot be placed in rte_os.h,
because rte_os.h is public and it must not export non-RTE symbols.
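
A sketch of the intended usage, assuming a Linux or FreeBSD build where the
RTE_ macros are thin aliases over <sys/queue.h>. The loop removes and frees
entries while iterating, which is what the _SAFE variant exists for.
demo_entry, demo_entry_list and demo_drop_odd are hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Thin aliases, as on Linux/FreeBSD. */
#define RTE_TAILQ_HEAD(name, type) TAILQ_HEAD(name, type)
#define RTE_TAILQ_ENTRY(type) TAILQ_ENTRY(type)
#define RTE_TAILQ_FIRST(head) TAILQ_FIRST(head)
#define RTE_TAILQ_NEXT(elem, field) TAILQ_NEXT(elem, field)
/* The next pointer is saved in tvar before the body runs, so the body
 * may unlink and free var. */
#define RTE_TAILQ_FOREACH_SAFE(var, head, field, tvar) \
	for ((var) = RTE_TAILQ_FIRST((head)); \
	    (var) && ((tvar) = RTE_TAILQ_NEXT((var), field), 1); \
	    (var) = (tvar))

struct demo_entry {
	RTE_TAILQ_ENTRY(demo_entry) next;
	int key;
};
RTE_TAILQ_HEAD(demo_entry_list, demo_entry);

/* Build a list with keys 0..n-1, drop the odd keys, and return how
 * many entries remain. All entries are freed before returning. */
int
demo_drop_odd(int n)
{
	struct demo_entry_list list = TAILQ_HEAD_INITIALIZER(list);
	struct demo_entry *e, *tmp;
	int left = 0;
	int i;

	for (i = 0; i < n; i++) {
		e = malloc(sizeof(*e));
		assert(e != NULL);
		e->key = i;
		TAILQ_INSERT_TAIL(&list, e, next);
	}

	/* Remove and free while iterating. */
	RTE_TAILQ_FOREACH_SAFE(e, &list, next, tmp) {
		if (e->key % 2 != 0) {
			TAILQ_REMOVE(&list, e, next);
			free(e);
		}
	}

	/* Count the survivors and clean up. */
	RTE_TAILQ_FOREACH_SAFE(e, &list, next, tmp) {
		left++;
		TAILQ_REMOVE(&list, e, next);
		free(e);
	}
	return left;
}
```

A plain TAILQ_FOREACH would read e->next after free(e) in the same loop, so
the saved tmp pointer is what makes the removal safe.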

All comments to this file obviously apply to Linux version as well.

>  
>  typedef cpuset_t rte_cpuset_t;
>  #define RTE_HAS_CPUSET
[...]
> diff --git a/lib/eal/windows/include/rte_os.h b/lib/eal/windows/include/rte_os.h
> index 66c711d458..d0935c5003 100644
> --- a/lib/eal/windows/include/rte_os.h
> +++ b/lib/eal/windows/include/rte_os.h
> @@ -18,6 +18,144 @@
>  extern "C" {
>  #endif
>  
> +#ifdef QUEUE_MACRO_DEBUG_TRACE

IMO all these debugging macros should be removed from this header,
including their use in user-facing macros.
They are implementation detail for <sys/queue.h> developers.

> +/* Store the last 2 places the queue element or head was altered */
> +struct qm_trace {
> +	unsigned long	 lastline;
> +	unsigned long	 prevline;
> +	const char	*lastfile;
> +	const char	*prevfile;
> +};
> +
> +/**
> + * These macros are compatible with the sys/queue.h provided
> + * at DPDK source code.
> + */
[...]
> +
> +#define	QMD_TAILQ_CHECK_HEAD(head, field)
> +#define	QMD_TAILQ_CHECK_TAIL(head, headname)
> +#define	QMD_TAILQ_CHECK_NEXT(elm, field)
> +#define	QMD_TAILQ_CHECK_PREV(elm, field)

Redundant empty lines below.

> +
> +
> +#define	RTE_TAILQ_EMPTY(head)	((head)->tqh_first == NULL)
> +
> +#define	RTE_TAILQ_FIRST(head)	((head)->tqh_first)
> +
> +#define	RTE_TAILQ_INIT(head) do {					\

I suggest removing all spaces but one before the backslash
so that you don't need to manually align.
At least please keep the lines within 80 characters.

> +	RTE_TAILQ_FIRST((head)) = NULL;					\
> +	(head)->tqh_last = &RTE_TAILQ_FIRST((head));			\
> +	QMD_TRACE_HEAD(head);						\
> +} while (0)
[...]


* Re: [dpdk-dev] [EXT] Re:  [PATCH] version: 21.11-rc0
  2021-08-12 14:36  0% ` Ferruh Yigit
@ 2021-08-12 18:57  0%   ` Akhil Goyal
  0 siblings, 0 replies; 200+ results
From: Akhil Goyal @ 2021-08-12 18:57 UTC (permalink / raw)
  To: Ferruh Yigit, Thomas Monjalon, dev; +Cc: david.marchand, mdr

> On 8/8/2021 8:26 PM, Thomas Monjalon wrote:
> > Start a new release cycle with empty release notes.
> >
> > The ABI version becomes 22.0.
> > The map files are updated to the new ABI major number (22).
> > The ABI exceptions are dropped
> > and CI ABI checks are disabled
> > because compatibility is not preserved.
> >
> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> 
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> 
> 
> (Applied to dpdk-next-net/main until patch merged to main repo.)
Applied to dpdk-next-crypto as well.


* Re: [dpdk-dev] [PATCH] version: 21.11-rc0
  2021-08-08 19:26 11% [dpdk-dev] [PATCH] version: 21.11-rc0 Thomas Monjalon
@ 2021-08-12 14:36  0% ` Ferruh Yigit
  2021-08-12 18:57  0%   ` [dpdk-dev] [EXT] " Akhil Goyal
  2021-08-17  6:34  4% ` [dpdk-dev] " David Marchand
  1 sibling, 1 reply; 200+ results
From: Ferruh Yigit @ 2021-08-12 14:36 UTC (permalink / raw)
  To: Thomas Monjalon, dev; +Cc: david.marchand, mdr, Akhil Goyal

On 8/8/2021 8:26 PM, Thomas Monjalon wrote:
> Start a new release cycle with empty release notes.
> 
> The ABI version becomes 22.0.
> The map files are updated to the new ABI major number (22).
> The ABI exceptions are dropped
> and CI ABI checks are disabled
> because compatibility is not preserved.
> 
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>

Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>


(Applied to dpdk-next-net/main until patch merged to main repo.)



* Re: [dpdk-dev] [RFC] ethdev: change queue release callback
  2021-08-11 12:13  0%               ` Xueming(Steven) Li
@ 2021-08-12 14:29  0%                 ` Xueming(Steven) Li
  0 siblings, 0 replies; 200+ results
From: Xueming(Steven) Li @ 2021-08-12 14:29 UTC (permalink / raw)
  To: Xueming(Steven) Li, Ferruh Yigit, Singh, Aman Deep, Andrew Rybchenko
  Cc: dev, Slava Ovsiienko, NBU-Contact-Thomas Monjalon, jerinj



> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Xueming(Steven) Li
> Sent: Wednesday, August 11, 2021 8:13 PM
> To: Ferruh Yigit <ferruh.yigit@intel.com>; Singh, Aman Deep <aman.deep.singh@intel.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>;
> jerinj@marvell.com
> Subject: Re: [dpdk-dev] [RFC] ethdev: change queue release callback
> 
> 
> 
> > -----Original Message-----
> > From: Ferruh Yigit <ferruh.yigit@intel.com>
> > Sent: Wednesday, August 11, 2021 7:58 PM
> > To: Xueming(Steven) Li <xuemingl@nvidia.com>; Singh, Aman Deep
> > <aman.deep.singh@intel.com>; Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru>
> > Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>;
> > NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; jerinj@marvell.com
> > Subject: Re: [dpdk-dev] [RFC] ethdev: change queue release callback
> >
> > On 8/10/2021 10:07 AM, Xueming(Steven) Li wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: Ferruh Yigit <ferruh.yigit@intel.com>
> > >> Sent: Tuesday, August 10, 2021 4:54 PM
> > >> To: Xueming(Steven) Li <xuemingl@nvidia.com>; Singh, Aman Deep
> > >> <aman.deep.singh@intel.com>; Andrew Rybchenko
> > >> <andrew.rybchenko@oktetlabs.ru>
> > >> Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>;
> > >> NBU-Contact-Thomas Monjalon <thomas@monjalon.net>
> > >> Subject: Re: [dpdk-dev] [RFC] ethdev: change queue release callback
> > >>
> > >> On 8/10/2021 9:03 AM, Xueming(Steven) Li wrote:
> > >>> Hi Singh and Ferruh,
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
> > >>>> Sent: Monday, August 9, 2021 11:31 PM
> > >>>> To: Singh, Aman Deep <aman.deep.singh@intel.com>; Andrew
> > >>>> Rybchenko <andrew.rybchenko@oktetlabs.ru>; Xueming(Steven) Li
> > >>>> <xuemingl@nvidia.com>
> > >>>> Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>;
> > >>>> NBU-Contact-Thomas Monjalon <thomas@monjalon.net>
> > >>>> Subject: Re: [dpdk-dev] [RFC] ethdev: change queue release
> > >>>> callback
> > >>>>
> > >>>> On 8/9/2021 3:39 PM, Singh, Aman Deep wrote:
> > >>>>> Hi Xueming,
> > >>>>>
> > >>>>> On 7/28/2021 1:10 PM, Andrew Rybchenko wrote:
> > >>>>>> On 7/27/21 6:41 AM, Xueming Li wrote:
> > >>>>>>> To align with other eth device queue configuration callbacks,
> > >>>>>>> change RX and TX queue release callback API parameter from
> > >>>>>>> queue object to device and queue index.
> > >>>>>>>
> > >>>>>>> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> > >>>>>>
> > >>>>>> In fact, there is no strong reasons to do it, but I think it is
> > >>>>>> a nice cleanup to use (dev + queue index) on control path.
> > >>>>>>
> > >>>>>> Hopefully it will not result in any regressions.
> > >>>>>
> > >>>>> Combined there are 100+ API's for Rx/Tx queue_release that need
> > >>>>> to be modified for it.
> > >>>>>
> > >>>>> I believe all regression possibilities here will be caught, in
> > >>>>> compilation phase itself.
> > >>>>>
> > >>>>
> > >>>> Same here, it is a good cleanup but there is no strong reason for it.
> > >>>>
> > >>>> Since it is all internal, there is no ABI restriction on the
> > >>>> patch, and v21.11 will be full ABI break patches, to not cause conflicts with this change, what would you think to have it on
> v22.02?
> > >>>
> > >>> This patch is required by shared-rxq feature which ABI broken, target to 21.11.
> > >>
> > >> Why it is required?
> > >
> > > In rx burst function, rxq object is used in data path. For best data performance, it's shared-rxq object in case of shared rxq enabled.
> > > I think eth api defined rxq object for performance as well, specific on data plane.
> > > Hardware saves port info received packet descriptor for my case.
> > > Can't tell which device's queue with this shared rxq object, control
> > > path can't use this shared rxq anymore, have to be specific on
> > dev and queue id.
> > >
> >
> > I have seen shared Rx queue patch, but that just introduces the
> > offload and doesn't have the PMD implementation, so hard to see the dependency, can you please put the pseudocode for PMDs
> for shared-rxq?
> 
> The code is almost ready, I'll upload the PMD part soon.

There seem to be a lot of PMD conflicts to rebase, so I have to hold it due to other urgent tasks. Here is the overall data structure:
struct mlx5_rxq_ctrl {
	bool shared;
	LIST_HEAD(, mlx5_rxq_priv) owners; /* owner rxq(s) */
	/* datapath resources */
};
struct mlx5_rxq_priv { /* rx queue */
	uint16_t queue_index;
	LIST_ENTRY(mlx5_rxq_priv) owner_entry; /* membership in shared rxq */
	struct mlx5_rxq_ctrl *ctrl; /* save to dev->data->rx_queues[] */
	/* other per queue resources */
};
An rxq_ctrl maps 1:1 to an rxq_priv for a standard rxq, and 1:N for a shared rxq.
A shared rxq_ctrl is released only when its last owner rxq_priv is released.
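
To make the ownership rule concrete, here is a minimal sketch of the "release
on last owner" idea, assuming a glibc or BSD <sys/queue.h>. It is not the mlx5
implementation: every demo_* name is hypothetical, and the datapath and
per-queue resources are left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

struct demo_rxq_priv;

/* Shared part: datapath resources plus the list of owning queues. */
struct demo_rxq_ctrl {
	bool shared;
	LIST_HEAD(, demo_rxq_priv) owners;
};

/* Per-queue part: one instance per (device, queue index). */
struct demo_rxq_priv {
	uint16_t queue_index;
	LIST_ENTRY(demo_rxq_priv) owner_entry;
	struct demo_rxq_ctrl *ctrl;
};

struct demo_rxq_ctrl *
demo_rxq_ctrl_new(bool shared)
{
	struct demo_rxq_ctrl *ctrl = calloc(1, sizeof(*ctrl));

	if (ctrl == NULL)
		return NULL;
	ctrl->shared = shared;
	LIST_INIT(&ctrl->owners);
	return ctrl;
}

void
demo_rxq_attach(struct demo_rxq_ctrl *ctrl, struct demo_rxq_priv *priv,
		uint16_t queue_index)
{
	priv->queue_index = queue_index;
	priv->ctrl = ctrl;
	LIST_INSERT_HEAD(&ctrl->owners, priv, owner_entry);
}

/* Detach one owner and free the ctrl when no owner is left.
 * Returns true if the ctrl was freed. */
bool
demo_rxq_release(struct demo_rxq_priv *priv)
{
	struct demo_rxq_ctrl *ctrl = priv->ctrl;

	LIST_REMOVE(priv, owner_entry);
	priv->ctrl = NULL;
	if (!LIST_EMPTY(&ctrl->owners))
		return false;
	free(ctrl);
	return true;
}

/* Two queues share one ctrl. Bit 1 of the result is set if the first
 * release freed the ctrl, bit 0 if the second release did. */
int
demo_shared_rxq(void)
{
	struct demo_rxq_priv q0, q1;
	struct demo_rxq_ctrl *ctrl = demo_rxq_ctrl_new(true);
	bool first, second;

	assert(ctrl != NULL);
	demo_rxq_attach(ctrl, &q0, 0);
	demo_rxq_attach(ctrl, &q1, 1);
	first = demo_rxq_release(&q0);
	second = demo_rxq_release(&q1);
	return (first ? 2 : 0) | (second ? 1 : 0);
}
```

In this sketch the release path starts from the per-queue object, so it needs
to know which queue is being released. The shared ctrl alone cannot say which
owner is going away.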

BTW, v1 posted, please check.

> But firstly, I'll upload v1 patch for this RFC, the make PMD patches depends on this v1 patch.
> 
> > How a queue will know if it is shared or not, during release?
> 
> That's why this RFC want to change callback parameter to device and queue id.
> There is an offloading flag during rxq setup, either in device or in queue configuration.
> PMD driver saves the flag and operate accordingly.
> Ethdev api doesn't need to save this, unless a solid reason.
> 
> >
> > Btw, shared Rx doesn't mention from this dependency in the patch.
> 
> Agree, indeed a strong dependency, thanks!
> 
> >
> > >>
> > >>> I'll do it carefully, fortunately, the change is straightforward.
> > >>>
> > >



* Re: [dpdk-dev] [RFC] ethdev: change queue release callback
  2021-08-11 11:57  0%             ` Ferruh Yigit
@ 2021-08-11 12:13  0%               ` Xueming(Steven) Li
  2021-08-12 14:29  0%                 ` Xueming(Steven) Li
  0 siblings, 1 reply; 200+ results
From: Xueming(Steven) Li @ 2021-08-11 12:13 UTC (permalink / raw)
  To: Ferruh Yigit, Singh, Aman Deep, Andrew Rybchenko
  Cc: dev, Slava Ovsiienko, NBU-Contact-Thomas Monjalon, jerinj



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Wednesday, August 11, 2021 7:58 PM
> To: Xueming(Steven) Li <xuemingl@nvidia.com>; Singh, Aman Deep <aman.deep.singh@intel.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>;
> jerinj@marvell.com
> Subject: Re: [dpdk-dev] [RFC] ethdev: change queue release callback
> 
> On 8/10/2021 10:07 AM, Xueming(Steven) Li wrote:
> >
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@intel.com>
> >> Sent: Tuesday, August 10, 2021 4:54 PM
> >> To: Xueming(Steven) Li <xuemingl@nvidia.com>; Singh, Aman Deep
> >> <aman.deep.singh@intel.com>; Andrew Rybchenko
> >> <andrew.rybchenko@oktetlabs.ru>
> >> Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>;
> >> NBU-Contact-Thomas Monjalon <thomas@monjalon.net>
> >> Subject: Re: [dpdk-dev] [RFC] ethdev: change queue release callback
> >>
> >> On 8/10/2021 9:03 AM, Xueming(Steven) Li wrote:
> >>> Hi Singh and Ferruh,
> >>>
> >>>> -----Original Message-----
> >>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
> >>>> Sent: Monday, August 9, 2021 11:31 PM
> >>>> To: Singh, Aman Deep <aman.deep.singh@intel.com>; Andrew Rybchenko
> >>>> <andrew.rybchenko@oktetlabs.ru>; Xueming(Steven) Li
> >>>> <xuemingl@nvidia.com>
> >>>> Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>;
> >>>> NBU-Contact-Thomas Monjalon <thomas@monjalon.net>
> >>>> Subject: Re: [dpdk-dev] [RFC] ethdev: change queue release callback
> >>>>
> >>>> On 8/9/2021 3:39 PM, Singh, Aman Deep wrote:
> >>>>> Hi Xueming,
> >>>>>
> >>>>> On 7/28/2021 1:10 PM, Andrew Rybchenko wrote:
> >>>>>> On 7/27/21 6:41 AM, Xueming Li wrote:
> >>>>>>> To align with other eth device queue configuration callbacks,
> >>>>>>> change RX and TX queue release callback API parameter from queue
> >>>>>>> object to device and queue index.
> >>>>>>>
> >>>>>>> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> >>>>>>
> >>>>>> In fact, there is no strong reasons to do it, but I think it is a
> >>>>>> nice cleanup to use (dev + queue index) on control path.
> >>>>>>
> >>>>>> Hopefully it will not result in any regressions.
> >>>>>
> >>>>> Combined there are 100+ API's for Rx/Tx queue_release that need to
> >>>>> be modified for it.
> >>>>>
> >>>>> I believe all regression possibilities here will be caught, in
> >>>>> compilation phase itself.
> >>>>>
> >>>>
> >>>> Same here, it is a good cleanup but there is no strong reason for it.
> >>>>
> >>>> Since it is all internal, there is no ABI restriction on the patch,
> >>>> and v21.11 will be full ABI break patches, to not cause conflicts with this change, what would you think to have it on v22.02?
> >>>
> >>> This patch is required by shared-rxq feature which ABI broken, target to 21.11.
> >>
> >> Why it is required?
> >
> > In rx burst function, rxq object is used in data path. For best data performance, it's shared-rxq object in case of shared rxq enabled.
> > I think eth api defined rxq object for performance as well, specific on data plane.
> > Hardware saves port info received packet descriptor for my case.
> > Can't tell which device's queue with this shared rxq object, control path can't use this shared rxq anymore, have to be specific on
> dev and queue id.
> >
> 
> I have seen shared Rx queue patch, but that just introduces the offload and doesn't have the PMD implementation, so hard to see the
> dependency, can you please put the pseudocode for PMDs for shared-rxq?

The code is almost ready, I'll upload the PMD part soon.
But first I'll upload the v1 patch for this RFC, and then make the PMD patches depend on it.

> How a queue will know if it is shared or not, during release?

That's why this RFC wants to change the callback parameters to device and queue id.
There is an offload flag set during rxq setup, either in the device or in the queue configuration.
The PMD saves the flag and operates accordingly.
The ethdev API doesn't need to save it unless there is a solid reason.
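
A minimal sketch of the idea, for illustration only. All names here
(sketch_dev, sketch_rxq, sketch_rx_queue_release) and the reference count
are assumptions, not the real ethdev or PMD structures. It shows why a
release callback keyed by (device, queue id) can handle a queue object
shared by several ports, using a per-queue flag saved by the driver at setup:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the real ethdev/PMD types. */
struct sketch_rxq {
	int refcnt; /* ports still attached to this (possibly shared) queue */
};

struct sketch_dev {
	uint16_t nb_rx_queues;
	struct sketch_rxq **rx_queues; /* per-port view, indexed by queue id */
	bool *rxq_shared;              /* flag saved by the driver at setup */
};

/*
 * Release by (device, queue index): the driver knows which port detaches.
 * Returns true if the underlying queue object was actually freed.
 */
static bool
sketch_rx_queue_release(struct sketch_dev *dev, uint16_t queue_id)
{
	struct sketch_rxq *rxq;
	bool freed = false;

	assert(queue_id < dev->nb_rx_queues);
	rxq = dev->rx_queues[queue_id];
	if (rxq == NULL)
		return false;
	/* A shared queue is freed only when the last port detaches. */
	if (!dev->rxq_shared[queue_id] || --rxq->refcnt == 0) {
		free(rxq);
		freed = true;
	}
	/* Clear only this port's slot; other ports keep their own view. */
	dev->rx_queues[queue_id] = NULL;
	dev->rxq_shared[queue_id] = false;
	return freed;
}
```

With only the queue object as parameter, the callback could not tell which
port's slot to clear; with (dev, queue id) it can.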

> 
> Btw, shared Rx doesn't mention from this dependency in the patch.

Agreed, it is indeed a strong dependency, thanks!

> 
> >>
> >>> I'll do it carefully, fortunately, the change is straightforward.
> >>>
> >


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC] ethdev: change queue release callback
  2021-08-10  9:07  0%           ` Xueming(Steven) Li
@ 2021-08-11 11:57  0%             ` Ferruh Yigit
  2021-08-11 12:13  0%               ` Xueming(Steven) Li
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2021-08-11 11:57 UTC (permalink / raw)
  To: Xueming(Steven) Li, Singh, Aman Deep, Andrew Rybchenko
  Cc: dev, Slava Ovsiienko, NBU-Contact-Thomas Monjalon, jerinj

On 8/10/2021 10:07 AM, Xueming(Steven) Li wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>> Sent: Tuesday, August 10, 2021 4:54 PM
>> To: Xueming(Steven) Li <xuemingl@nvidia.com>; Singh, Aman Deep <aman.deep.singh@intel.com>; Andrew Rybchenko
>> <andrew.rybchenko@oktetlabs.ru>
>> Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>
>> Subject: Re: [dpdk-dev] [RFC] ethdev: change queue release callback
>>
>> On 8/10/2021 9:03 AM, Xueming(Steven) Li wrote:
>>> Hi Singh and Ferruh,
>>>
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>>>> Sent: Monday, August 9, 2021 11:31 PM
>>>> To: Singh, Aman Deep <aman.deep.singh@intel.com>; Andrew Rybchenko
>>>> <andrew.rybchenko@oktetlabs.ru>; Xueming(Steven) Li
>>>> <xuemingl@nvidia.com>
>>>> Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>;
>>>> NBU-Contact-Thomas Monjalon <thomas@monjalon.net>
>>>> Subject: Re: [dpdk-dev] [RFC] ethdev: change queue release callback
>>>>
>>>> On 8/9/2021 3:39 PM, Singh, Aman Deep wrote:
>>>>> Hi Xueming,
>>>>>
>>>>> On 7/28/2021 1:10 PM, Andrew Rybchenko wrote:
>>>>>> On 7/27/21 6:41 AM, Xueming Li wrote:
>>>>>>> To align with other eth device queue configuration callbacks,
>>>>>>> change RX and TX queue release callback API parameter from queue
>>>>>>> object to device and queue index.
>>>>>>>
>>>>>>> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
>>>>>>
>>>>>> In fact, there is no strong reasons to do it, but I think it is a
>>>>>> nice cleanup to use (dev + queue index) on control path.
>>>>>>
>>>>>> Hopefully it will not result in any regressions.
>>>>>
>>>>> Combined there are 100+ API's for Rx/Tx queue_release that need to
>>>>> be modified for it.
>>>>>
>>>>> I believe all regression possibilities here will be caught, in
>>>>> compilation phase itself.
>>>>>
>>>>
>>>> Same here, it is a good cleanup but there is no strong reason for it.
>>>>
>>>> Since it is all internal, there is no ABI restriction on the patch,
>>>> and v21.11 will be full ABI break patches, to not cause conflicts with this change, what would you think to have it on v22.02?
>>>
>>> This patch is required by shared-rxq feature which ABI broken, target to 21.11.
>>
>> Why it is required?
> 
> In rx burst function, rxq object is used in data path. For best data performance, it's shared-rxq object in case of shared rxq enabled.
> I think eth api defined rxq object for performance as well, specific on data plane. 
> Hardware saves port info received packet descriptor for my case.
> Can't tell which device's queue with this shared rxq object, control path can't use this shared rxq anymore, have to be specific on dev and queue id.
> 

I have seen the shared Rx queue patch, but it only introduces the offload and
doesn't include the PMD implementation, so the dependency is hard to see. Can you
please post pseudocode for the PMDs for shared-rxq?
How will a queue know whether it is shared or not during release?

Btw, the shared Rx patch doesn't mention this dependency.

>>
>>> I'll do it carefully, fortunately, the change is straightforward.
>>>
> 


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2 1/1] devtools: add relative path support for ABI compatibility check
  2021-08-11  6:17  8% ` [dpdk-dev] [PATCH v2 0/1] relative path support for ABI compatibility check Feifei Wang
@ 2021-08-11  6:17 17%   ` Feifei Wang
  0 siblings, 0 replies; 200+ results
From: Feifei Wang @ 2021-08-11  6:17 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: dev, nd, Phil Yang, Feifei Wang, Juraj Linkeš, Ruifeng Wang

From: Phil Yang <phil.yang@arm.com>

Because the DPDK guide does not forbid relative paths for the ABI
compatibility check, users may set 'DPDK_ABI_REF_DIR' to a relative
path:

~/dpdk/devtools$ DPDK_ABI_REF_VERSION=v19.11 DPDK_ABI_REF_DIR=build-gcc-shared
./test-meson-builds.sh

And if the DESTDIR is not an absolute path, ninja complains:
+ install_target build-gcc-shared/v19.11/build build-gcc-shared/v19.11/build-gcc-shared
+ rm -rf build-gcc-shared/v19.11/build-gcc-shared
+ echo 'DESTDIR=build-gcc-shared/v19.11/build-gcc-shared ninja -C build-gcc-shared/v19.11/build install'
+ DESTDIR=build-gcc-shared/v19.11/build-gcc-shared
+ ninja -C build-gcc-shared/v19.11/build install
...
ValueError: dst_dir must be absolute, got build-gcc-shared/v19.11/build-gcc-shared/usr/local/share/dpdk/
examples/bbdev_app
...
Error: install directory 'build-gcc-shared/v19.11/build-gcc-shared' does not exist.

To fix this, add relative path support using 'readlink -f'.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 devtools/test-meson-builds.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/devtools/test-meson-builds.sh b/devtools/test-meson-builds.sh
index 9ec8e2bc7e..8ddde95276 100755
--- a/devtools/test-meson-builds.sh
+++ b/devtools/test-meson-builds.sh
@@ -168,7 +168,8 @@ build () # <directory> <target cc | cross file> <ABI check> [meson options]
 	config $srcdir $builds_dir/$targetdir $cross --werror $*
 	compile $builds_dir/$targetdir
 	if [ -n "$DPDK_ABI_REF_VERSION" -a "$abicheck" = ABI ] ; then
-		abirefdir=${DPDK_ABI_REF_DIR:-reference}/$DPDK_ABI_REF_VERSION
+		abirefdir=$(readlink -f \
+			${DPDK_ABI_REF_DIR:-reference}/$DPDK_ABI_REF_VERSION)
 		if [ ! -d $abirefdir/$targetdir ]; then
 			# clone current sources
 			if [ ! -d $abirefdir/src ]; then
-- 
2.25.1
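
The patch itself relies on the shell utility 'readlink -f'. Purely as an
illustration, the same normalization (relative path in, absolute path out)
can be sketched in C with POSIX realpath(3). The helper name to_absolute is
hypothetical. One difference to keep in mind: realpath() fails if any
component of the path does not exist, whereas GNU 'readlink -f' tolerates a
missing final component.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: resolve 'path' (relative or absolute) into 'out',
 * which must hold at least PATH_MAX bytes.
 * Returns 0 on success, -1 if the path cannot be resolved.
 */
static int
to_absolute(const char *path, char out[PATH_MAX])
{
	assert(path != NULL && out != NULL);
	return realpath(path, out) != NULL ? 0 : -1;
}
```

A caller would resolve the reference directory once and use only the
absolute result afterwards, which is what the install step needs.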


^ permalink raw reply	[relevance 17%]

* [dpdk-dev] [PATCH v2 0/1] relative path support for ABI compatibility check
    @ 2021-08-11  6:17  8% ` Feifei Wang
  2021-08-11  6:17 17%   ` [dpdk-dev] [PATCH v2 1/1] devtools: add " Feifei Wang
  1 sibling, 1 reply; 200+ results
From: Feifei Wang @ 2021-08-11  6:17 UTC (permalink / raw)
  Cc: dev, nd, Feifei Wang

Add relative path support for ABI compatibility check.

v2: 
1. delete the code simplification patch due to negative effects (Thomas)

Phil Yang (1):
  devtools: add relative path support for ABI compatibility check

 devtools/test-meson-builds.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

-- 
2.25.1


^ permalink raw reply	[relevance 8%]

* [dpdk-dev] [Bug 788] i40e: 16BYTE_RX_DESC build broken on FreeBSD-13
@ 2021-08-10 18:27  5% bugzilla
  0 siblings, 0 replies; 200+ results
From: bugzilla @ 2021-08-10 18:27 UTC (permalink / raw)
  To: dev

https://bugs.dpdk.org/show_bug.cgi?id=788

            Bug ID: 788
           Summary: i40e: 16BYTE_RX_DESC build broken on FreeBSD-13
           Product: DPDK
           Version: 21.08
          Hardware: x86
                OS: FreeBSD
            Status: UNCONFIRMED
          Severity: normal
          Priority: Normal
         Component: ethdev
          Assignee: dev@dpdk.org
          Reporter: brian90013@gmail.com
  Target Milestone: ---

Hello,

I just tried compiling DPDK 21.08 and found my configuration no longer builds
on FreeBSD-13.0. With version 21.05, I defined RTE_LIBRTE_I40E_16BYTE_RX_DESC
in rte_config.h as described in section "Use 16 Bytes RX Descriptor Size" of
the current i40e PMD documentation. I also defined a similar variable
RTE_LIBRTE_ICE_16BYTE_RX_DESC in rte_config.h for the ice PMD.

This morning I brought in version 21.08 and watched it compile on FreeBSD-12.2
(clang version 10.0.1) running on an 'Intel(R) Xeon(R) CPU E5-2637 v3'. Then I
tried building it on FreeBSD-13.0 (clang version 11.0.1) on an 'AMD Ryzen
Threadripper 3990X 64-Core Processor' but the build died with a number of
compilation errors related to avx512f features enabled in functions compiled
without support for avx512f.

Below I have an edited build log from the FreeBSD-12.2 system that works
followed by the log from the FreeBSD-13.0 system that fails. Looking at the
12.2 log, there is a warning “Binutils error with AVX512 assembly, disabling
AVX512 support” that might be hiding this issue? Neither system has hardware
support for AVX-512 but it appears that the compiler does. Thank you for your
help!
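
For readers unfamiliar with this class of error: clang rejects AVX-512
intrinsics inside a function that is not compiled for 'avx512f'. A minimal
sketch of one general way to avoid it is to guard the intrinsics on the
compiler-defined __AVX512F__ macro, which is set only when the translation
unit actually targets AVX-512F. This is an assumption-laden illustration,
not the DPDK fix; the helper name add_headroom is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Hypothetical helper: add a fixed headroom to n 64-bit addresses. */
static void
add_headroom(uint64_t *addr, size_t n, uint64_t headroom)
{
	size_t i = 0;

	assert(addr != NULL || n == 0);
#ifdef __AVX512F__
	/* Compiled only when this translation unit targets AVX-512F. */
	const __m512i room = _mm512_set1_epi64((long long)headroom);

	for (; i + 8 <= n; i += 8) {
		__m512i v = _mm512_loadu_si512((const void *)&addr[i]);

		_mm512_storeu_si512((void *)&addr[i],
				    _mm512_add_epi64(v, room));
	}
#endif
	/* Scalar path: whole array without AVX-512F, the tail with it. */
	for (; i < n; i++)
		addr[i] += headroom;
}
```

A macro passed by the build system only says the compiler accepts
-mavx512f; it does not say the current file is being built with it, which
is the distinction this guard relies on.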



*** FreeBSD-12.2 build that works ***
The Meson build system
Version: 0.58.1
Build type: native build
Program cat found: YES (/bin/cat)
Project name: DPDK
Project version: 21.08.0
C compiler for the host machine: cc (clang 10.0.1 "FreeBSD clang version 10.0.1
(git@github.com:llvm/llvm-project.git llvmorg-10.0.1-0-gef32c611aa2)")
C linker for the host machine: cc ld.lld 10.0.1
Host machine cpu family: x86_64
Host machine cpu: x86_64

Compiler for C supports arguments -mno-avx512f: YES 
config/x86/meson.build:9: WARNING: Binutils error with AVX512 assembly,
disabling AVX512 support
Compiler for C supports arguments -mavx512f: YES 
Checking if "AVX512 checking" compiles: YES 
Fetching value of define "__SSE4_2__" : 1 
Fetching value of define "__AES__" : 1 
Fetching value of define "__AVX__" : 1 
Fetching value of define "__AVX2__" : 1 
Fetching value of define "__AVX512BW__" :  
Fetching value of define "__AVX512CD__" :  
Fetching value of define "__AVX512DQ__" :  
Fetching value of define "__AVX512F__" :  
Fetching value of define "__AVX512VL__" :  
Fetching value of define "__PCLMUL__" : 1 
Fetching value of define "__RDRND__" : 1 
Fetching value of define "__RDSEED__" :  
Fetching value of define "__VPCLMULQDQ__" :  

Compiler for C supports arguments -mpclmul: YES 
Compiler for C supports arguments -maes: YES 



*** FreeBSD-13.0 system that does not build ***
The Meson build system
Version: 0.58.1
Build type: native build
Program cat found: YES (/bin/cat)
Project name: DPDK
Project version: 21.08.0
C compiler for the host machine: cc (clang 11.0.1 "FreeBSD clang version 11.0.1
(git@github.com:llvm/llvm-project.git llvmorg-11.0.1-0-g43ff75f2c3fe)")
C linker for the host machine: cc ld.lld 11.0.1
Host machine cpu family: x86_64
Host machine cpu: x86_64

Compiler for C supports arguments -mavx512f: YES 
Checking if "AVX512 checking" compiles: YES 
Fetching value of define "__SSE4_2__" : 1 
Fetching value of define "__AES__" : 1 
Fetching value of define "__AVX__" : 1 
Fetching value of define "__AVX2__" : 1 
Fetching value of define "__AVX512BW__" :  
Fetching value of define "__AVX512CD__" :  
Fetching value of define "__AVX512DQ__" :  
Fetching value of define "__AVX512F__" :  
Fetching value of define "__AVX512VL__" :  
Fetching value of define "__PCLMUL__" : 1 
Fetching value of define "__RDRND__" : 1 
Fetching value of define "__RDSEED__" : 1 
Fetching value of define "__VPCLMULQDQ__" :  

Compiler for C supports arguments -mpclmul: YES 
Compiler for C supports arguments -maes: YES 
Compiler for C supports arguments -mavx512f: YES (cached)
Compiler for C supports arguments -mavx512bw: YES 
Compiler for C supports arguments -mavx512dq: YES 
Compiler for C supports arguments -mavx512vl: YES 
Compiler for C supports arguments -mvpclmulqdq: YES 
Compiler for C supports arguments -mavx2: YES 
Compiler for C supports arguments -mavx: YES 

Compiler for C supports arguments -mavx512f -mavx512vl -mavx512cd -mavx512bw:
YES 
Compiler for C supports arguments -mavx512f -mavx512dq: YES 



FAILED: drivers/libtmp_rte_net_i40e.a.p/net_i40e_i40e_rxtx_vec_avx2.c.o 
cc -Idrivers/libtmp_rte_net_i40e.a.p -Idrivers -I../drivers -Idrivers/net/i40e
-I../drivers/net/i40e -Idrivers/net/i40e/base -I../drivers/net/i40e/base
-Ilib/ethdev -I../lib/ethdev -I. -I.. -Iconfig -I../config -Ilib/eal/include
-I../lib/eal/include -Ilib/eal/freebsd/include -I../lib/eal/freebsd/include
-Ilib/eal/x86/include -I../lib/eal/x86/include -Ilib/eal/common
-I../lib/eal/common -Ilib/eal -I../lib/eal -Ilib/kvargs -I../lib/kvargs
-Ilib/metrics -I../lib/metrics -Ilib/telemetry -I../lib/telemetry -Ilib/net
-I../lib/net -Ilib/mbuf -I../lib/mbuf -Ilib/mempool -I../lib/mempool -Ilib/ring
-I../lib/ring -Ilib/meter -I../lib/meter -Idrivers/bus/pci -I../drivers/bus/pci
-I../drivers/bus/pci/bsd -Ilib/pci -I../lib/pci -Idrivers/bus/vdev
-I../drivers/bus/vdev -Ilib/hash -I../lib/hash -Ilib/rcu -I../lib/rcu
-fcolor-diagnostics -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O3 -include
rte_config.h -Wextra -Wcast-qual -Wdeprecated -Wformat -Wformat-nonliteral
-Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wnested-externs
-Wold-style-definition -Wpointer-arith -Wsign-compare -Wstrict-prototypes
-Wundef -Wwrite-strings -Wno-address-of-packed-member
-Wno-missing-field-initializers -D_GNU_SOURCE -D__BSD_VISIBLE -fPIC
-march=native -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -DPF_DRIVER
-DVF_DRIVER -DINTEGRATED_VF -DX722_A0_SUPPORT -DCC_AVX2_SUPPORT
-DCC_AVX512_SUPPORT -DRTE_LOG_DEFAULT_LOGTYPE=pmd.net.i40e -MD -MQ
drivers/libtmp_rte_net_i40e.a.p/net_i40e_i40e_rxtx_vec_avx2.c.o -MF
drivers/libtmp_rte_net_i40e.a.p/net_i40e_i40e_rxtx_vec_avx2.c.o.d -o
drivers/libtmp_rte_net_i40e.a.p/net_i40e_i40e_rxtx_vec_avx2.c.o -c
../drivers/net/i40e/i40e_rxtx_vec_avx2.c
In file included from ../drivers/net/i40e/i40e_rxtx_vec_avx2.c:13:
../drivers/net/i40e/i40e_rxtx_vec_common.h:337:22: error: always_inline
function '_mm512_set1_epi64' requires target feature 'avx512f', but would be
inlined into function 'i40e_rxq_rearm_common' that is compiled without support
for 'avx512f'
                __m512i hdr_room = _mm512_set1_epi64(RTE_PKTMBUF_HEADROOM);
                                   ^
../drivers/net/i40e/i40e_rxtx_vec_common.h:337:22: error: AVX vector return of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/i40e/i40e_rxtx_vec_common.h:385:5: error:
'__builtin_ia32_inserti64x4' needs target feature avx512f
                               
_mm512_inserti64x4(_mm512_castsi256_si512(vaddr0_1),
                                ^
/usr/lib/clang/11.0.1/include/avx512fintrin.h:7413:12: note: expanded from
macro '_mm512_inserti64x4'
  (__m512i)__builtin_ia32_inserti64x4((__v8di)(__m512i)(A), \
           ^
In file included from ../drivers/net/i40e/i40e_rxtx_vec_avx2.c:13:
../drivers/net/i40e/i40e_rxtx_vec_common.h:385:24: error: always_inline
function '_mm512_castsi256_si512' requires target feature 'avx512f', but would
be inlined into function 'i40e_rxq_rearm_common' that is compiled without
support for 'avx512f'
                               
_mm512_inserti64x4(_mm512_castsi256_si512(vaddr0_1),
                                                   ^
../drivers/net/i40e/i40e_rxtx_vec_common.h:385:24: error: AVX vector return of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/i40e/i40e_rxtx_vec_common.h:388:5: error:
'__builtin_ia32_inserti64x4' needs target feature avx512f
                               
_mm512_inserti64x4(_mm512_castsi256_si512(vaddr4_5),
                                ^
/usr/lib/clang/11.0.1/include/avx512fintrin.h:7413:12: note: expanded from
macro '_mm512_inserti64x4'
  (__m512i)__builtin_ia32_inserti64x4((__v8di)(__m512i)(A), \
           ^
In file included from ../drivers/net/i40e/i40e_rxtx_vec_avx2.c:13:
../drivers/net/i40e/i40e_rxtx_vec_common.h:388:24: error: always_inline
function '_mm512_castsi256_si512' requires target feature 'avx512f', but would
be inlined into function 'i40e_rxq_rearm_common' that is compiled without
support for 'avx512f'
                               
_mm512_inserti64x4(_mm512_castsi256_si512(vaddr4_5),
                                                   ^
../drivers/net/i40e/i40e_rxtx_vec_common.h:388:24: error: AVX vector return of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/i40e/i40e_rxtx_vec_common.h:392:18: error: always_inline
function '_mm512_unpackhi_epi64' requires target feature 'avx512f', but would
be inlined into function 'i40e_rxq_rearm_common' that is compiled without
support for 'avx512f'
                        dma_addr0_3 = _mm512_unpackhi_epi64(vaddr0_3,
vaddr0_3);
                                      ^
../drivers/net/i40e/i40e_rxtx_vec_common.h:392:18: error: AVX vector argument
of type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/i40e/i40e_rxtx_vec_common.h:393:18: error: always_inline
function '_mm512_unpackhi_epi64' requires target feature 'avx512f', but would
be inlined into function 'i40e_rxq_rearm_common' that is compiled without
support for 'avx512f'
                        dma_addr4_7 = _mm512_unpackhi_epi64(vaddr4_7,
vaddr4_7);
                                      ^
../drivers/net/i40e/i40e_rxtx_vec_common.h:393:18: error: AVX vector argument
of type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/i40e/i40e_rxtx_vec_common.h:396:18: error: always_inline
function '_mm512_add_epi64' requires target feature 'avx512f', but would be
inlined into function 'i40e_rxq_rearm_common' that is compiled without support
for 'avx512f'
                        dma_addr0_3 = _mm512_add_epi64(dma_addr0_3, hdr_room);
                                      ^
../drivers/net/i40e/i40e_rxtx_vec_common.h:396:18: error: AVX vector argument
of type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/i40e/i40e_rxtx_vec_common.h:397:18: error: always_inline
function '_mm512_add_epi64' requires target feature 'avx512f', but would be
inlined into function 'i40e_rxq_rearm_common' that is compiled without support
for 'avx512f'
                        dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room);
                                      ^
../drivers/net/i40e/i40e_rxtx_vec_common.h:397:18: error: AVX vector argument
of type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/i40e/i40e_rxtx_vec_common.h:400:4: error: always_inline function
'_mm512_store_si512' requires target feature 'avx512f', but would be inlined
into function 'i40e_rxq_rearm_common' that is compiled without support for
'avx512f'
                        _mm512_store_si512((__m512i *)&rxdp->read,
dma_addr0_3);
                        ^
../drivers/net/i40e/i40e_rxtx_vec_common.h:400:4: error: AVX vector argument of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/i40e/i40e_rxtx_vec_common.h:401:4: error: always_inline function
'_mm512_store_si512' requires target feature 'avx512f', but would be inlined
into function 'i40e_rxq_rearm_common' that is compiled without support for
'avx512f'
                        _mm512_store_si512((__m512i *)&(rxdp + 4)->read,
dma_addr4_7);
                        ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
[971/1893] Compiling C object
drivers/libtmp_rte_net_ice.a.p/net_ice_ice_rxtx_vec_avx2.c.o
FAILED: drivers/libtmp_rte_net_ice.a.p/net_ice_ice_rxtx_vec_avx2.c.o 
cc -Idrivers/libtmp_rte_net_ice.a.p -Idrivers -I../drivers -Idrivers/net/ice
-I../drivers/net/ice -Idrivers/net/ice/base -I../drivers/net/ice/base
-Idrivers/common/iavf -I../drivers/common/iavf -Ilib/ethdev -I../lib/ethdev -I.
-I.. -Iconfig -I../config -Ilib/eal/include -I../lib/eal/include
-Ilib/eal/freebsd/include -I../lib/eal/freebsd/include -Ilib/eal/x86/include
-I../lib/eal/x86/include -Ilib/eal/common -I../lib/eal/common -Ilib/eal
-I../lib/eal -Ilib/kvargs -I../lib/kvargs -Ilib/metrics -I../lib/metrics
-Ilib/telemetry -I../lib/telemetry -Ilib/net -I../lib/net -Ilib/mbuf
-I../lib/mbuf -Ilib/mempool -I../lib/mempool -Ilib/ring -I../lib/ring
-Ilib/meter -I../lib/meter -Idrivers/bus/pci -I../drivers/bus/pci
-I../drivers/bus/pci/bsd -Ilib/pci -I../lib/pci -Idrivers/bus/vdev
-I../drivers/bus/vdev -Ilib/hash -I../lib/hash -Ilib/rcu -I../lib/rcu
-fcolor-diagnostics -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O3 -include
rte_config.h -Wextra -Wcast-qual -Wdeprecated -Wformat -Wformat-nonliteral
-Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wnested-externs
-Wold-style-definition -Wpointer-arith -Wsign-compare -Wstrict-prototypes
-Wundef -Wwrite-strings -Wno-address-of-packed-member
-Wno-missing-field-initializers -D_GNU_SOURCE -D__BSD_VISIBLE -fPIC
-march=native -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -DCC_AVX2_SUPPORT
-DCC_AVX512_SUPPORT -DRTE_LOG_DEFAULT_LOGTYPE=pmd.net.ice -MD -MQ
drivers/libtmp_rte_net_ice.a.p/net_ice_ice_rxtx_vec_avx2.c.o -MF
drivers/libtmp_rte_net_ice.a.p/net_ice_ice_rxtx_vec_avx2.c.o.d -o
drivers/libtmp_rte_net_ice.a.p/net_ice_ice_rxtx_vec_avx2.c.o -c
../drivers/net/ice/ice_rxtx_vec_avx2.c
In file included from ../drivers/net/ice/ice_rxtx_vec_avx2.c:5:
../drivers/net/ice/ice_rxtx_vec_common.h:422:22: error: always_inline function
'_mm512_set1_epi64' requires target feature 'avx512f', but would be inlined
into function 'ice_rxq_rearm_common' that is compiled without support for
'avx512f'
                __m512i hdr_room = _mm512_set1_epi64(RTE_PKTMBUF_HEADROOM);
                                   ^
../drivers/net/ice/ice_rxtx_vec_common.h:422:22: error: AVX vector return of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/ice/ice_rxtx_vec_common.h:470:5: error:
'__builtin_ia32_inserti64x4' needs target feature avx512f
                               
_mm512_inserti64x4(_mm512_castsi256_si512(vaddr0_1),
                                ^
/usr/lib/clang/11.0.1/include/avx512fintrin.h:7413:12: note: expanded from
macro '_mm512_inserti64x4'
  (__m512i)__builtin_ia32_inserti64x4((__v8di)(__m512i)(A), \
           ^
In file included from ../drivers/net/ice/ice_rxtx_vec_avx2.c:5:
../drivers/net/ice/ice_rxtx_vec_common.h:470:24: error: always_inline function
'_mm512_castsi256_si512' requires target feature 'avx512f', but would be
inlined into function 'ice_rxq_rearm_common' that is compiled without support
for 'avx512f'
                               
_mm512_inserti64x4(_mm512_castsi256_si512(vaddr0_1),
                                                   ^
../drivers/net/ice/ice_rxtx_vec_common.h:470:24: error: AVX vector return of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/ice/ice_rxtx_vec_common.h:473:5: error:
'__builtin_ia32_inserti64x4' needs target feature avx512f
                               
_mm512_inserti64x4(_mm512_castsi256_si512(vaddr4_5),
                                ^
/usr/lib/clang/11.0.1/include/avx512fintrin.h:7413:12: note: expanded from
macro '_mm512_inserti64x4'
  (__m512i)__builtin_ia32_inserti64x4((__v8di)(__m512i)(A), \
           ^
In file included from ../drivers/net/ice/ice_rxtx_vec_avx2.c:5:
../drivers/net/ice/ice_rxtx_vec_common.h:473:24: error: always_inline function
'_mm512_castsi256_si512' requires target feature 'avx512f', but would be
inlined into function 'ice_rxq_rearm_common' that is compiled without support
for 'avx512f'
                               
_mm512_inserti64x4(_mm512_castsi256_si512(vaddr4_5),
                                                   ^
../drivers/net/ice/ice_rxtx_vec_common.h:473:24: error: AVX vector return of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/ice/ice_rxtx_vec_common.h:477:18: error: always_inline function
'_mm512_unpackhi_epi64' requires target feature 'avx512f', but would be inlined
into function 'ice_rxq_rearm_common' that is compiled without support for
'avx512f'
                        dma_addr0_3 = _mm512_unpackhi_epi64(vaddr0_3,
vaddr0_3);
                                      ^
../drivers/net/ice/ice_rxtx_vec_common.h:477:18: error: AVX vector argument of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/ice/ice_rxtx_vec_common.h:478:18: error: always_inline function
'_mm512_unpackhi_epi64' requires target feature 'avx512f', but would be inlined
into function 'ice_rxq_rearm_common' that is compiled without support for
'avx512f'
                        dma_addr4_7 = _mm512_unpackhi_epi64(vaddr4_7,
vaddr4_7);
                                      ^
../drivers/net/ice/ice_rxtx_vec_common.h:478:18: error: AVX vector argument of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/ice/ice_rxtx_vec_common.h:481:18: error: always_inline function
'_mm512_add_epi64' requires target feature 'avx512f', but would be inlined into
function 'ice_rxq_rearm_common' that is compiled without support for 'avx512f'
                        dma_addr0_3 = _mm512_add_epi64(dma_addr0_3, hdr_room);
                                      ^
../drivers/net/ice/ice_rxtx_vec_common.h:481:18: error: AVX vector argument of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/ice/ice_rxtx_vec_common.h:482:18: error: always_inline function
'_mm512_add_epi64' requires target feature 'avx512f', but would be inlined into
function 'ice_rxq_rearm_common' that is compiled without support for 'avx512f'
                        dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room);
                                      ^
../drivers/net/ice/ice_rxtx_vec_common.h:482:18: error: AVX vector argument of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/ice/ice_rxtx_vec_common.h:485:4: error: always_inline function
'_mm512_store_si512' requires target feature 'avx512f', but would be inlined
into function 'ice_rxq_rearm_common' that is compiled without support for
'avx512f'
                        _mm512_store_si512((__m512i *)&rxdp->read,
dma_addr0_3);
                        ^
../drivers/net/ice/ice_rxtx_vec_common.h:485:4: error: AVX vector argument of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/ice/ice_rxtx_vec_common.h:486:4: error: always_inline function
'_mm512_store_si512' requires target feature 'avx512f', but would be inlined
into function 'ice_rxq_rearm_common' that is compiled without support for
'avx512f'
                        _mm512_store_si512((__m512i *)&(rxdp + 4)->read,
dma_addr4_7);
                        ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
[998/1893] Compiling C object
drivers/libtmp_rte_net_ice.a.p/net_ice_ice_rxtx.c.o
../drivers/net/ice/ice_rxtx.c:129:60: warning: unused parameter 'rxq'
[-Wunused-parameter]
ice_rxd_to_pkt_fields_by_comms_aux_v1(struct ice_rx_queue *rxq,
                                                           ^
../drivers/net/ice/ice_rxtx.c:171:60: warning: unused parameter 'rxq'
[-Wunused-parameter]
ice_rxd_to_pkt_fields_by_comms_aux_v2(struct ice_rx_queue *rxq,
                                                           ^
2 warnings generated.
[1006/1893] Compiling C object
lib/librte_pipeline.a.p/pipeline_rte_table_action.c.o
ninja: build stopped: subcommand failed.
[109/890] Compiling C object
drivers/libtmp_rte_net_i40e.a.p/net_i40e_i40e_rxtx_vec_avx2.c.o
FAILED: drivers/libtmp_rte_net_i40e.a.p/net_i40e_i40e_rxtx_vec_avx2.c.o 
cc -Idrivers/libtmp_rte_net_i40e.a.p -Idrivers -I../drivers -Idrivers/net/i40e
-I../drivers/net/i40e -Idrivers/net/i40e/base -I../drivers/net/i40e/base
-Ilib/ethdev -I../lib/ethdev -I. -I.. -Iconfig -I../config -Ilib/eal/include
-I../lib/eal/include -Ilib/eal/freebsd/include -I../lib/eal/freebsd/include
-Ilib/eal/x86/include -I../lib/eal/x86/include -Ilib/eal/common
-I../lib/eal/common -Ilib/eal -I../lib/eal -Ilib/kvargs -I../lib/kvargs
-Ilib/metrics -I../lib/metrics -Ilib/telemetry -I../lib/telemetry -Ilib/net
-I../lib/net -Ilib/mbuf -I../lib/mbuf -Ilib/mempool -I../lib/mempool -Ilib/ring
-I../lib/ring -Ilib/meter -I../lib/meter -Idrivers/bus/pci -I../drivers/bus/pci
-I../drivers/bus/pci/bsd -Ilib/pci -I../lib/pci -Idrivers/bus/vdev
-I../drivers/bus/vdev -Ilib/hash -I../lib/hash -Ilib/rcu -I../lib/rcu
-fcolor-diagnostics -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O3 -include
rte_config.h -Wextra -Wcast-qual -Wdeprecated -Wformat -Wformat-nonliteral
-Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wnested-externs
-Wold-style-definition -Wpointer-arith -Wsign-compare -Wstrict-prototypes
-Wundef -Wwrite-strings -Wno-address-of-packed-member
-Wno-missing-field-initializers -D_GNU_SOURCE -D__BSD_VISIBLE -fPIC
-march=native -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -DPF_DRIVER
-DVF_DRIVER -DINTEGRATED_VF -DX722_A0_SUPPORT -DCC_AVX2_SUPPORT
-DCC_AVX512_SUPPORT -DRTE_LOG_DEFAULT_LOGTYPE=pmd.net.i40e -MD -MQ
drivers/libtmp_rte_net_i40e.a.p/net_i40e_i40e_rxtx_vec_avx2.c.o -MF
drivers/libtmp_rte_net_i40e.a.p/net_i40e_i40e_rxtx_vec_avx2.c.o.d -o
drivers/libtmp_rte_net_i40e.a.p/net_i40e_i40e_rxtx_vec_avx2.c.o -c
../drivers/net/i40e/i40e_rxtx_vec_avx2.c
In file included from ../drivers/net/i40e/i40e_rxtx_vec_avx2.c:13:
../drivers/net/i40e/i40e_rxtx_vec_common.h:337:22: error: always_inline
function '_mm512_set1_epi64' requires target feature 'avx512f', but would be
inlined into function 'i40e_rxq_rearm_common' that is compiled without support
for 'avx512f'
                __m512i hdr_room = _mm512_set1_epi64(RTE_PKTMBUF_HEADROOM);
                                   ^
../drivers/net/i40e/i40e_rxtx_vec_common.h:337:22: error: AVX vector return of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/i40e/i40e_rxtx_vec_common.h:385:5: error:
'__builtin_ia32_inserti64x4' needs target feature avx512f
                               
_mm512_inserti64x4(_mm512_castsi256_si512(vaddr0_1),
                                ^
/usr/lib/clang/11.0.1/include/avx512fintrin.h:7413:12: note: expanded from
macro '_mm512_inserti64x4'
  (__m512i)__builtin_ia32_inserti64x4((__v8di)(__m512i)(A), \
           ^
In file included from ../drivers/net/i40e/i40e_rxtx_vec_avx2.c:13:
../drivers/net/i40e/i40e_rxtx_vec_common.h:385:24: error: always_inline
function '_mm512_castsi256_si512' requires target feature 'avx512f', but would
be inlined into function 'i40e_rxq_rearm_common' that is compiled without
support for 'avx512f'
                               
_mm512_inserti64x4(_mm512_castsi256_si512(vaddr0_1),
                                                   ^
../drivers/net/i40e/i40e_rxtx_vec_common.h:385:24: error: AVX vector return of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/i40e/i40e_rxtx_vec_common.h:388:5: error:
'__builtin_ia32_inserti64x4' needs target feature avx512f
                               
_mm512_inserti64x4(_mm512_castsi256_si512(vaddr4_5),
                                ^
/usr/lib/clang/11.0.1/include/avx512fintrin.h:7413:12: note: expanded from
macro '_mm512_inserti64x4'
  (__m512i)__builtin_ia32_inserti64x4((__v8di)(__m512i)(A), \
           ^
In file included from ../drivers/net/i40e/i40e_rxtx_vec_avx2.c:13:
../drivers/net/i40e/i40e_rxtx_vec_common.h:388:24: error: always_inline
function '_mm512_castsi256_si512' requires target feature 'avx512f', but would
be inlined into function 'i40e_rxq_rearm_common' that is compiled without
support for 'avx512f'
                               
_mm512_inserti64x4(_mm512_castsi256_si512(vaddr4_5),
                                                   ^
../drivers/net/i40e/i40e_rxtx_vec_common.h:388:24: error: AVX vector return of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/i40e/i40e_rxtx_vec_common.h:392:18: error: always_inline
function '_mm512_unpackhi_epi64' requires target feature 'avx512f', but would
be inlined into function 'i40e_rxq_rearm_common' that is compiled without
support for 'avx512f'
                        dma_addr0_3 = _mm512_unpackhi_epi64(vaddr0_3,
vaddr0_3);
                                      ^
../drivers/net/i40e/i40e_rxtx_vec_common.h:392:18: error: AVX vector argument
of type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/i40e/i40e_rxtx_vec_common.h:393:18: error: always_inline
function '_mm512_unpackhi_epi64' requires target feature 'avx512f', but would
be inlined into function 'i40e_rxq_rearm_common' that is compiled without
support for 'avx512f'
                        dma_addr4_7 = _mm512_unpackhi_epi64(vaddr4_7,
vaddr4_7);
                                      ^
../drivers/net/i40e/i40e_rxtx_vec_common.h:393:18: error: AVX vector argument
of type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/i40e/i40e_rxtx_vec_common.h:396:18: error: always_inline
function '_mm512_add_epi64' requires target feature 'avx512f', but would be
inlined into function 'i40e_rxq_rearm_common' that is compiled without support
for 'avx512f'
                        dma_addr0_3 = _mm512_add_epi64(dma_addr0_3, hdr_room);
                                      ^
../drivers/net/i40e/i40e_rxtx_vec_common.h:396:18: error: AVX vector argument
of type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/i40e/i40e_rxtx_vec_common.h:397:18: error: always_inline
function '_mm512_add_epi64' requires target feature 'avx512f', but would be
inlined into function 'i40e_rxq_rearm_common' that is compiled without support
for 'avx512f'
                        dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room);
                                      ^
../drivers/net/i40e/i40e_rxtx_vec_common.h:397:18: error: AVX vector argument
of type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/i40e/i40e_rxtx_vec_common.h:400:4: error: always_inline function
'_mm512_store_si512' requires target feature 'avx512f', but would be inlined
into function 'i40e_rxq_rearm_common' that is compiled without support for
'avx512f'
                        _mm512_store_si512((__m512i *)&rxdp->read,
dma_addr0_3);
                        ^
../drivers/net/i40e/i40e_rxtx_vec_common.h:400:4: error: AVX vector argument of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/i40e/i40e_rxtx_vec_common.h:401:4: error: always_inline function
'_mm512_store_si512' requires target feature 'avx512f', but would be inlined
into function 'i40e_rxq_rearm_common' that is compiled without support for
'avx512f'
                        _mm512_store_si512((__m512i *)&(rxdp + 4)->read,
dma_addr4_7);
                        ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
[119/890] Compiling C object
drivers/libtmp_rte_net_ice.a.p/net_ice_ice_rxtx_vec_avx2.c.o
FAILED: drivers/libtmp_rte_net_ice.a.p/net_ice_ice_rxtx_vec_avx2.c.o 
cc -Idrivers/libtmp_rte_net_ice.a.p -Idrivers -I../drivers -Idrivers/net/ice
-I../drivers/net/ice -Idrivers/net/ice/base -I../drivers/net/ice/base
-Idrivers/common/iavf -I../drivers/common/iavf -Ilib/ethdev -I../lib/ethdev -I.
-I.. -Iconfig -I../config -Ilib/eal/include -I../lib/eal/include
-Ilib/eal/freebsd/include -I../lib/eal/freebsd/include -Ilib/eal/x86/include
-I../lib/eal/x86/include -Ilib/eal/common -I../lib/eal/common -Ilib/eal
-I../lib/eal -Ilib/kvargs -I../lib/kvargs -Ilib/metrics -I../lib/metrics
-Ilib/telemetry -I../lib/telemetry -Ilib/net -I../lib/net -Ilib/mbuf
-I../lib/mbuf -Ilib/mempool -I../lib/mempool -Ilib/ring -I../lib/ring
-Ilib/meter -I../lib/meter -Idrivers/bus/pci -I../drivers/bus/pci
-I../drivers/bus/pci/bsd -Ilib/pci -I../lib/pci -Idrivers/bus/vdev
-I../drivers/bus/vdev -Ilib/hash -I../lib/hash -Ilib/rcu -I../lib/rcu
-fcolor-diagnostics -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O3 -include
rte_config.h -Wextra -Wcast-qual -Wdeprecated -Wformat -Wformat-nonliteral
-Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wnested-externs
-Wold-style-definition -Wpointer-arith -Wsign-compare -Wstrict-prototypes
-Wundef -Wwrite-strings -Wno-address-of-packed-member
-Wno-missing-field-initializers -D_GNU_SOURCE -D__BSD_VISIBLE -fPIC
-march=native -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -DCC_AVX2_SUPPORT
-DCC_AVX512_SUPPORT -DRTE_LOG_DEFAULT_LOGTYPE=pmd.net.ice -MD -MQ
drivers/libtmp_rte_net_ice.a.p/net_ice_ice_rxtx_vec_avx2.c.o -MF
drivers/libtmp_rte_net_ice.a.p/net_ice_ice_rxtx_vec_avx2.c.o.d -o
drivers/libtmp_rte_net_ice.a.p/net_ice_ice_rxtx_vec_avx2.c.o -c
../drivers/net/ice/ice_rxtx_vec_avx2.c
In file included from ../drivers/net/ice/ice_rxtx_vec_avx2.c:5:
../drivers/net/ice/ice_rxtx_vec_common.h:422:22: error: always_inline function
'_mm512_set1_epi64' requires target feature 'avx512f', but would be inlined
into function 'ice_rxq_rearm_common' that is compiled without support for
'avx512f'
                __m512i hdr_room = _mm512_set1_epi64(RTE_PKTMBUF_HEADROOM);
                                   ^
../drivers/net/ice/ice_rxtx_vec_common.h:422:22: error: AVX vector return of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/ice/ice_rxtx_vec_common.h:470:5: error:
'__builtin_ia32_inserti64x4' needs target feature avx512f
                               
_mm512_inserti64x4(_mm512_castsi256_si512(vaddr0_1),
                                ^
/usr/lib/clang/11.0.1/include/avx512fintrin.h:7413:12: note: expanded from
macro '_mm512_inserti64x4'
  (__m512i)__builtin_ia32_inserti64x4((__v8di)(__m512i)(A), \
           ^
In file included from ../drivers/net/ice/ice_rxtx_vec_avx2.c:5:
../drivers/net/ice/ice_rxtx_vec_common.h:470:24: error: always_inline function
'_mm512_castsi256_si512' requires target feature 'avx512f', but would be
inlined into function 'ice_rxq_rearm_common' that is compiled without support
for 'avx512f'
                               
_mm512_inserti64x4(_mm512_castsi256_si512(vaddr0_1),
                                                   ^
../drivers/net/ice/ice_rxtx_vec_common.h:470:24: error: AVX vector return of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/ice/ice_rxtx_vec_common.h:473:5: error:
'__builtin_ia32_inserti64x4' needs target feature avx512f
                               
_mm512_inserti64x4(_mm512_castsi256_si512(vaddr4_5),
                                ^
/usr/lib/clang/11.0.1/include/avx512fintrin.h:7413:12: note: expanded from
macro '_mm512_inserti64x4'
  (__m512i)__builtin_ia32_inserti64x4((__v8di)(__m512i)(A), \
           ^
In file included from ../drivers/net/ice/ice_rxtx_vec_avx2.c:5:
../drivers/net/ice/ice_rxtx_vec_common.h:473:24: error: always_inline function
'_mm512_castsi256_si512' requires target feature 'avx512f', but would be
inlined into function 'ice_rxq_rearm_common' that is compiled without support
for 'avx512f'
                               
_mm512_inserti64x4(_mm512_castsi256_si512(vaddr4_5),
                                                   ^
../drivers/net/ice/ice_rxtx_vec_common.h:473:24: error: AVX vector return of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/ice/ice_rxtx_vec_common.h:477:18: error: always_inline function
'_mm512_unpackhi_epi64' requires target feature 'avx512f', but would be inlined
into function 'ice_rxq_rearm_common' that is compiled without support for
'avx512f'
                        dma_addr0_3 = _mm512_unpackhi_epi64(vaddr0_3,
vaddr0_3);
                                      ^
../drivers/net/ice/ice_rxtx_vec_common.h:477:18: error: AVX vector argument of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/ice/ice_rxtx_vec_common.h:478:18: error: always_inline function
'_mm512_unpackhi_epi64' requires target feature 'avx512f', but would be inlined
into function 'ice_rxq_rearm_common' that is compiled without support for
'avx512f'
                        dma_addr4_7 = _mm512_unpackhi_epi64(vaddr4_7,
vaddr4_7);
                                      ^
../drivers/net/ice/ice_rxtx_vec_common.h:478:18: error: AVX vector argument of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/ice/ice_rxtx_vec_common.h:481:18: error: always_inline function
'_mm512_add_epi64' requires target feature 'avx512f', but would be inlined into
function 'ice_rxq_rearm_common' that is compiled without support for 'avx512f'
                        dma_addr0_3 = _mm512_add_epi64(dma_addr0_3, hdr_room);
                                      ^
../drivers/net/ice/ice_rxtx_vec_common.h:481:18: error: AVX vector argument of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/ice/ice_rxtx_vec_common.h:482:18: error: always_inline function
'_mm512_add_epi64' requires target feature 'avx512f', but would be inlined into
function 'ice_rxq_rearm_common' that is compiled without support for 'avx512f'
                        dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room);
                                      ^
../drivers/net/ice/ice_rxtx_vec_common.h:482:18: error: AVX vector argument of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/ice/ice_rxtx_vec_common.h:485:4: error: always_inline function
'_mm512_store_si512' requires target feature 'avx512f', but would be inlined
into function 'ice_rxq_rearm_common' that is compiled without support for
'avx512f'
                        _mm512_store_si512((__m512i *)&rxdp->read,
dma_addr0_3);
                        ^
../drivers/net/ice/ice_rxtx_vec_common.h:485:4: error: AVX vector argument of
type '__m512i' (vector of 8 'long long' values) without 'avx512f' enabled
changes the ABI
../drivers/net/ice/ice_rxtx_vec_common.h:486:4: error: always_inline function
'_mm512_store_si512' requires target feature 'avx512f', but would be inlined
into function 'ice_rxq_rearm_common' that is compiled without support for
'avx512f'
                        _mm512_store_si512((__m512i *)&(rxdp + 4)->read,
dma_addr4_7);
                        ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
ninja: build stopped: subcommand failed.


* Re: [dpdk-dev] [RFC] ethdev: change queue release callback
  2021-08-10  8:54  0%         ` Ferruh Yigit
@ 2021-08-10  9:07  0%           ` Xueming(Steven) Li
  2021-08-11 11:57  0%             ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Xueming(Steven) Li @ 2021-08-10  9:07 UTC (permalink / raw)
  To: Ferruh Yigit, Singh, Aman Deep, Andrew Rybchenko
  Cc: dev, Slava Ovsiienko, NBU-Contact-Thomas Monjalon



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Tuesday, August 10, 2021 4:54 PM
> To: Xueming(Steven) Li <xuemingl@nvidia.com>; Singh, Aman Deep <aman.deep.singh@intel.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>
> Subject: Re: [dpdk-dev] [RFC] ethdev: change queue release callback
> 
> On 8/10/2021 9:03 AM, Xueming(Steven) Li wrote:
> > Hi Singh and Ferruh,
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@intel.com>
> >> Sent: Monday, August 9, 2021 11:31 PM
> >> To: Singh, Aman Deep <aman.deep.singh@intel.com>; Andrew Rybchenko
> >> <andrew.rybchenko@oktetlabs.ru>; Xueming(Steven) Li
> >> <xuemingl@nvidia.com>
> >> Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>;
> >> NBU-Contact-Thomas Monjalon <thomas@monjalon.net>
> >> Subject: Re: [dpdk-dev] [RFC] ethdev: change queue release callback
> >>
> >> On 8/9/2021 3:39 PM, Singh, Aman Deep wrote:
> >>> Hi Xueming,
> >>>
> >>> On 7/28/2021 1:10 PM, Andrew Rybchenko wrote:
> >>>> On 7/27/21 6:41 AM, Xueming Li wrote:
> >>>>> To align with other eth device queue configuration callbacks,
> >>>>> change RX and TX queue release callback API parameter from queue
> >>>>> object to device and queue index.
> >>>>>
> >>>>> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> >>>>
> >>>> In fact, there is no strong reasons to do it, but I think it is a
> >>>> nice cleanup to use (dev + queue index) on control path.
> >>>>
> >>>> Hopefully it will not result in any regressions.
> >>>
> >>> Combined there are 100+ API's for Rx/Tx queue_release that need to
> >>> be modified for it.
> >>>
> >>> I believe all regression possibilities here will be caught, in
> >>> compilation phase itself.
> >>>
> >>
> >> Same here, it is a good cleanup but there is no strong reason for it.
> >>
> >> Since it is all internal, there is no ABI restriction on the patch,
> >> and v21.11 will be full ABI break patches, to not cause conflicts with this change, what would you think to have it on v22.02?
> >
> > This patch is required by shared-rxq feature which ABI broken, target to 21.11.
> 
> Why it is required?

In the rx burst function, the rxq object is used in the data path. For best data-path performance, it is the shared-rxq object when shared rxq is enabled.
I think the eth API defines the rxq object for performance as well, specifically for the data plane.
In my case, the hardware saves the port info in the received packet descriptor.
With this shared rxq object, the control path can't tell which device's queue it belongs to, so it can't use the shared rxq anymore and has to be specific about dev and queue id.

> 
> > I'll do it carefully, fortunately, the change is straightforward.
> >



* Re: [dpdk-dev] [RFC] ethdev: change queue release callback
  2021-08-10  8:03  3%       ` Xueming(Steven) Li
@ 2021-08-10  8:54  0%         ` Ferruh Yigit
  2021-08-10  9:07  0%           ` Xueming(Steven) Li
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2021-08-10  8:54 UTC (permalink / raw)
  To: Xueming(Steven) Li, Singh, Aman Deep, Andrew Rybchenko
  Cc: dev, Slava Ovsiienko, NBU-Contact-Thomas Monjalon

On 8/10/2021 9:03 AM, Xueming(Steven) Li wrote:
> Hi Singh and Ferruh,
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>> Sent: Monday, August 9, 2021 11:31 PM
>> To: Singh, Aman Deep <aman.deep.singh@intel.com>; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; Xueming(Steven) Li
>> <xuemingl@nvidia.com>
>> Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>
>> Subject: Re: [dpdk-dev] [RFC] ethdev: change queue release callback
>>
>> On 8/9/2021 3:39 PM, Singh, Aman Deep wrote:
>>> Hi Xueming,
>>>
>>> On 7/28/2021 1:10 PM, Andrew Rybchenko wrote:
>>>> On 7/27/21 6:41 AM, Xueming Li wrote:
>>>>> To align with other eth device queue configuration callbacks, change
>>>>> RX and TX queue release callback API parameter from queue object to
>>>>> device and queue index.
>>>>>
>>>>> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
>>>>
>>>> In fact, there is no strong reasons to do it, but I think it is a
>>>> nice cleanup to use (dev + queue index) on control path.
>>>>
>>>> Hopefully it will not result in any regressions.
>>>
>>> Combined there are 100+ API's for Rx/Tx queue_release that need to be
>>> modified for it.
>>>
>>> I believe all regression possibilities here will be caught, in
>>> compilation phase itself.
>>>
>>
>> Same here, it is a good cleanup but there is no strong reason for it.
>>
>> Since it is all internal, there is no ABI restriction on the patch, and v21.11 will be full ABI break patches, to not cause conflicts with this
>> change, what would you think to have it on v22.02?
> 
> This patch is required by shared-rxq feature which ABI broken, target to 21.11.

Why is it required?

> I'll do it carefully, fortunately, the change is straightforward.
> 



* Re: [dpdk-dev] [RFC] ethdev: change queue release callback
  2021-08-09 15:31  4%     ` Ferruh Yigit
@ 2021-08-10  8:03  3%       ` Xueming(Steven) Li
  2021-08-10  8:54  0%         ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Xueming(Steven) Li @ 2021-08-10  8:03 UTC (permalink / raw)
  To: Ferruh Yigit, Singh, Aman Deep, Andrew Rybchenko
  Cc: dev, Slava Ovsiienko, NBU-Contact-Thomas Monjalon

Hi Singh and Ferruh,

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Monday, August 9, 2021 11:31 PM
> To: Singh, Aman Deep <aman.deep.singh@intel.com>; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; Xueming(Steven) Li
> <xuemingl@nvidia.com>
> Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>
> Subject: Re: [dpdk-dev] [RFC] ethdev: change queue release callback
> 
> On 8/9/2021 3:39 PM, Singh, Aman Deep wrote:
> > Hi Xueming,
> >
> > On 7/28/2021 1:10 PM, Andrew Rybchenko wrote:
> >> On 7/27/21 6:41 AM, Xueming Li wrote:
> >>> To align with other eth device queue configuration callbacks, change
> >>> RX and TX queue release callback API parameter from queue object to
> >>> device and queue index.
> >>>
> >>> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> >>
> >> In fact, there is no strong reasons to do it, but I think it is a
> >> nice cleanup to use (dev + queue index) on control path.
> >>
> >> Hopefully it will not result in any regressions.
> >
> > Combined there are 100+ API's for Rx/Tx queue_release that need to be
> > modified for it.
> >
> > I believe all regression possibilities here will be caught, in
> > compilation phase itself.
> >
> 
> Same here, it is a good cleanup but there is no strong reason for it.
> 
> Since it is all internal, there is no ABI restriction on the patch, and v21.11 will be full ABI break patches, to not cause conflicts with this
> change, what would you think to have it on v22.02?

This patch is required by the shared-rxq feature, which breaks the ABI and targets 21.11.
I'll do it carefully; fortunately, the change is straightforward.



* Re: [dpdk-dev] [RFC] ethdev: change queue release callback
  @ 2021-08-09 15:31  4%     ` Ferruh Yigit
  2021-08-10  8:03  3%       ` Xueming(Steven) Li
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2021-08-09 15:31 UTC (permalink / raw)
  To: Singh, Aman Deep, Andrew Rybchenko, Xueming Li
  Cc: dev, Viacheslav Ovsiienko, Thomas Monjalon

On 8/9/2021 3:39 PM, Singh, Aman Deep wrote:
> Hi Xueming,
> 
> On 7/28/2021 1:10 PM, Andrew Rybchenko wrote:
>> On 7/27/21 6:41 AM, Xueming Li wrote:
>>> To align with other eth device queue configuration callbacks, change RX
>>> and TX queue release callback API parameter from queue object to device
>>> and queue index.
>>>
>>> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
>>
>> In fact, there is no strong reasons to do it, but I think it is a nice
>> cleanup to use (dev + queue index) on control path.
>>
>> Hopefully it will not result in any regressions.
> 
> Combined there are 100+ API's for Rx/Tx queue_release that need to be modified
> for it.
> 
> I believe all regression possibilities here will be caught, in compilation phase
> itself.
> 

Same here, it is a good cleanup but there is no strong reason for it.

Since it is all internal, there is no ABI restriction on the patch. v21.11 will
be full of ABI-breaking patches, so to avoid conflicts with this change, what
would you think about having it in v22.02?


* [dpdk-dev] [PATCH v9 2/2] devtools: script to send notifications of expired symbols
    2021-08-09 12:53  5%   ` [dpdk-dev] [PATCH v9 1/2] devtools: script to track symbols over releases Ray Kinsella
@ 2021-08-09 12:53  5%   ` Ray Kinsella
  1 sibling, 0 replies; 200+ results
From: Ray Kinsella @ 2021-08-09 12:53 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, stephen, ferruh.yigit, thomas, ktraynor, mdr

Use this script with the output of the DPDK symbol tool, to notify
maintainers of expired symbols by email. You need to define the environment
variable DPDK_GETMAINTAINER_PATH for this tool to work.

Use terminal output to review the emails before sending.
e.g.
$ devtools/symbol-tool.py list-expired --format-output csv \
| DPDK_GETMAINTAINER_PATH=<somewhere>/get_maintainer.pl \
devtools/notify-symbol-maintainers.py --format-output terminal

Then use email output to send the emails to the maintainers.
e.g.
$ devtools/symbol-tool.py list-expired --format-output csv \
| DPDK_GETMAINTAINER_PATH=<somewhere>/get_maintainer.pl \
devtools/notify-symbol-maintainers.py --format-output email \
--smtp-server <server> --sender <someone@somewhere.com> \
--password <password>
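
To see what the email output amounts to without contacting an SMTP server,
here is a minimal sketch of assembling one notification with the standard
library's email.message.EmailMessage. The helper name build_email, the
addresses and the symbol name are hypothetical and not part of the patch;
nothing is sent.

```python
from email.message import EmailMessage


def build_email(sender, recipients, library, symbols, cc=()):
    """Assemble (but do not send) one notification for a library.

    Hypothetical helper for illustration only; the patch builds its
    messages in get_message() and OutputEmail.message().
    """
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = ', '.join(recipients)
    if cc:
        msg['Cc'] = ', '.join(cc)
    msg['Subject'] = 'Expired symbols in {}'.format(library)
    # One symbol per line, as in the body built by the patch.
    msg.set_content('\n'.join(symbols) + '\n')
    return msg


if __name__ == '__main__':
    example = build_email('bot@example.com',
                          ['maintainer@example.com'],
                          'lib/foo',
                          ['rte_foo_hypothetical_symbol'])
    print(example)
```

Printing the message, as above, is roughly the equivalent of reviewing
with --format-output terminal before switching to --format-output email.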

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
 devtools/notify-symbol-maintainers.py | 234 ++++++++++++++++++++++++++
 1 file changed, 234 insertions(+)
 create mode 100755 devtools/notify-symbol-maintainers.py

diff --git a/devtools/notify-symbol-maintainers.py b/devtools/notify-symbol-maintainers.py
new file mode 100755
index 0000000000..a6c27b067c
--- /dev/null
+++ b/devtools/notify-symbol-maintainers.py
@@ -0,0 +1,234 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2021 Intel Corporation
+'''Tool to notify maintainers of expired symbols'''
+import smtplib
+import ssl
+import sys
+import subprocess
+import argparse
+from argparse import RawTextHelpFormatter
+import time
+from email.message import EmailMessage
+
+DESCRIPTION = '''
+Use this script with the output of the DPDK symbol tool, to notify maintainers
+of expired symbols by email. You need to define the environment variable
+DPDK_GETMAINTAINER_PATH, for this tool to work.
+
+Use terminal output to review the emails before sending.
+e.g.
+$ devtools/symbol-tool.py list-expired --format-output csv \\
+| DPDK_GETMAINTAINER_PATH=<somewhere>/get_maintainer.pl \\
+{s} --format-output terminal
+
+Then use email output to send the emails to the maintainers.
+e.g.
+$ devtools/symbol-tool.py list-expired --format-output csv \\
+| DPDK_GETMAINTAINER_PATH=<somewhere>/get_maintainer.pl \\
+{s} --format-output email \\
+--smtp-server <server> --sender <someone@somewhere.com> --password <password>
+'''
+
+EMAIL_TEMPLATE = '''Hi there,
+
+Please note the symbols listed below have expired. In line with the DPDK ABI
+policy, they should be scheduled for removal, in the next DPDK release.
+
+For more information, please see the DPDK ABI Policy, section 3.5.3.
+https://doc.dpdk.org/guides/contributing/abi_policy.html
+
+Thanks,
+
+The DPDK Symbol Bot
+
+'''
+
+ABI_POLICY = 'doc/guides/contributing/abi_policy.rst'
+get_maintainer = ['devtools/get-maintainer.sh', \
+                  '--email', '-f']
+
+def _get_maintainers(libpath):
+    '''Get the maintainers for given library'''
+    try:
+        cmd = get_maintainer + [libpath]
+        result = subprocess.run(cmd, \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        return None
+
+    if result is None:
+        return None
+
+    email = result.stdout.decode('utf-8')
+    if email == '':
+        return None
+
+    email = list(filter(None,email.split('\n')))
+    return email
+
+default_maintainers = _get_maintainers(ABI_POLICY)
+
+def get_maintainers(libpath):
+    '''Get the maintainers for given library'''
+    maintainers=_get_maintainers(libpath)
+
+    if maintainers is None:
+        maintainers = default_maintainers
+
+    return maintainers
+
+def get_message(library, symbols):
+    '''Build email message from symbols, config and maintainers'''
+    message = {}
+    maintainers = get_maintainers(library)
+
+    message['To'] = maintainers
+    if maintainers != default_maintainers:
+        message['CC'] = default_maintainers
+
+    message['Subject'] = 'Expired symbols in {}\n'.format(library)
+
+    body = EMAIL_TEMPLATE
+    for sym in symbols:
+        body += ('{}\n'.format(sym))
+
+    message['Body'] = body
+
+    return message
+
+class OutputEmail():
+    '''Format the output for email'''
+    def __init__(self, config):
+        self.config = config
+
+        self.terminal = OutputTerminal(config)
+        context = ssl.create_default_context()
+
+        # Try to log in to server and send email
+        try:
+            self.server = smtplib.SMTP(config['smtp_server'], 587)
+            self.server.starttls(context=context) # Secure the connection
+            self.server.login(config['sender'], config['password'])
+        except Exception as exception:
+            print(exception)
+            raise exception
+
+    def message(self,message):
+        '''send email'''
+        self.terminal.message(message)
+
+        msg = EmailMessage()
+        msg.set_content(message.pop('Body'))
+
+        for key in message.keys():
+            msg[key] = message[key]
+
+        msg['From'] = self.config['sender']
+        msg['Reply-To'] = 'no-reply@dpdk.org'
+
+        self.server.send_message(msg)
+
+        time.sleep(1)
+
+    def __del__(self):
+        self.server.quit()
+
+class OutputTerminal(): # pylint: disable=too-few-public-methods
+    '''Format the output for the terminal'''
+    def __init__(self, config):
+        self.config = config
+
+    def message(self,message):
+        '''Print email to terminal'''
+        terminal = 'To:' + ', '.join(message['To']) + '\n'
+        if 'sender' in self.config.keys():
+            terminal += 'From:' + self.config['sender'] + '\n'
+
+        terminal += 'Reply-To:' + 'no-reply@dpdk.org' + '\n'
+        if 'CC' in message.keys():
+            terminal += 'CC:' + ', '.join(message['CC']) + '\n'
+
+        terminal += 'Subject:' + message['Subject'] + '\n'
+        terminal += 'Body:' + message['Body'] + '\n'
+
+        print(terminal)
+        print('-' * 80)
+
+def parse_config(args):
+    '''put the command line args in the right places'''
+    config = {}
+    error_msg = None
+
+    outputs = {
+        None : OutputTerminal,
+        'terminal' : OutputTerminal,
+        'email' : OutputEmail
+    }
+
+    if args.format_output == 'email':
+        if args.smtp_server is None:
+            error_msg = 'SMTP server'
+        else:
+            config['smtp_server'] = args.smtp_server
+
+        if args.sender is None:
+            error_msg = 'sender'
+        else:
+            config['sender'] = args.sender
+
+        if args.password is None:
+            error_msg = 'password'
+        else:
+            config['password'] = args.password
+
+    if error_msg is not None:
+        print('Please specify a {} for email output'.format(error_msg))
+        return None
+
+    config['output'] = outputs[args.format_output]
+    return config
+
+def main():
+    '''Main entry point'''
+    parser = argparse.ArgumentParser(description=DESCRIPTION.format(s=__file__), \
+                                     formatter_class=RawTextHelpFormatter)
+    parser.add_argument('--format-output', choices=['terminal','email'], \
+                        default='terminal')
+    parser.add_argument('--smtp-server')
+    parser.add_argument('--password')
+    parser.add_argument('--sender')
+
+    args = parser.parse_args()
+    config = parse_config(args)
+    if config is None:
+        return
+
+    symbols = []
+    lastlib = library = ''
+
+    output = config['output'](config)
+
+    for line in sys.stdin:
+        line = line.rstrip('\n')
+        library, symbol = [line[:line.find(',')], \
+                           line[line.find(',') + 1: len(line)]]
+        if library == 'mapfile':
+            continue
+
+        if lastlib and library != lastlib:
+            message = get_message(lastlib, symbols)
+            output.message(message)
+            symbols = []
+
+        lastlib = library
+        symbols = symbols + [symbol]
+
+    # Print the last library, if any symbols were read
+    if symbols:
+        output.message(get_message(lastlib, symbols))
+
+if __name__ == '__main__':
+    main()
-- 
2.26.2
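
The per-library batching done in main() can also be sketched with
itertools.groupby. This is an illustration under assumptions, not the
patch's code: the function name group_symbols and the sample rows are
hypothetical, and the input is assumed to be "library,symbol" lines that
are already ordered by library, with a header row whose first field is
"mapfile".

```python
import itertools


def group_symbols(lines):
    """Yield (library, [symbols]) for runs of 'library,symbol' lines."""
    rows = []
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue
        # Split on the first comma only; the rest is the symbol field.
        library, _, symbol = line.partition(',')
        if library == 'mapfile':  # header row, skipped as in the patch
            continue
        rows.append((library, symbol))

    # groupby batches consecutive rows that share the same library.
    for library, group in itertools.groupby(rows, key=lambda row: row[0]):
        yield library, [symbol for _, symbol in group]


if __name__ == '__main__':
    sample = [
        'mapfile,expired\n',           # hypothetical header row
        'lib/foo,rte_foo_symbol_a\n',  # hypothetical symbols
        'lib/foo,rte_foo_symbol_b\n',
        'lib/bar,rte_bar_symbol_a\n',
    ]
    for library, symbols in group_symbols(sample):
        print(library, symbols)
```

A group is only produced once at least one symbol row has been read, so
empty input yields no message at all.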



* [dpdk-dev] [PATCH v9 1/2] devtools: script to track symbols over releases
  @ 2021-08-09 12:53  5%   ` Ray Kinsella
  2021-08-09 12:53  5%   ` [dpdk-dev] [PATCH v9 2/2] devtools: script to send notifications of expired symbols Ray Kinsella
  1 sibling, 0 replies; 200+ results
From: Ray Kinsella @ 2021-08-09 12:53 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, stephen, ferruh.yigit, thomas, ktraynor, mdr

This script tracks the growth of stable and experimental symbols
over releases since v19.11. The script has the ability to
count the added symbols between two dpdk releases, and to
list experimental symbols present in two dpdk releases
(expired symbols).

example usages:

Count symbols added since v19.11
$ devtools/symbol-tool.py count-symbols

Count symbols added since v20.11
$ devtools/symbol-tool.py count-symbols --releases v20.11,v21.05

List experimental symbols present in v20.11 and v21.05
$ devtools/symbol-tool.py list-expired --releases v20.11,v21.05

List experimental symbols in libraries only, present since v19.11
$ devtools/symbol-tool.py list-expired --directory lib
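
As described above, a symbol counts as expired when it is experimental in
both of the releases being compared. A minimal sketch of that idea, using
plain set intersection, is shown below. It assumes the experimental symbols
of each release have already been collected per library; the function name
list_expired, the dictionary layout and the symbol names are hypothetical
and do not come from the script.

```python
def list_expired(older, newer):
    """Return {library: sorted symbols experimental in both releases}.

    'older' and 'newer' map a library path to an iterable of symbols
    that are marked EXPERIMENTAL in that release.
    """
    expired = {}
    for library, symbols in older.items():
        # A symbol is expired if it is still experimental in the newer release.
        common = set(symbols) & set(newer.get(library, ()))
        if common:
            expired[library] = sorted(common)
    return expired


if __name__ == '__main__':
    # Hypothetical data standing in for two parsed releases.
    older_release = {'lib/foo': ['rte_foo_a', 'rte_foo_b'],
                     'lib/bar': ['rte_bar_a']}
    newer_release = {'lib/foo': ['rte_foo_b', 'rte_foo_c'],
                     'lib/bar': []}
    print(list_expired(older_release, newer_release))
```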

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
 devtools/symbol-tool.py | 402 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 402 insertions(+)
 create mode 100755 devtools/symbol-tool.py

diff --git a/devtools/symbol-tool.py b/devtools/symbol-tool.py
new file mode 100755
index 0000000000..4a357579dc
--- /dev/null
+++ b/devtools/symbol-tool.py
@@ -0,0 +1,402 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2021 Intel Corporation
+'''Tool to count or list symbols in each DPDK release'''
+from pathlib import Path
+import sys
+import os
+import subprocess
+import argparse
+from argparse import RawTextHelpFormatter
+import re
+import datetime
+try:
+    from parsley import makeGrammar
+except ImportError:
+    print('This script uses the package Parsley to parse C Mapfiles.\n'
+          'This can be installed with \"pip install parsley".')
+    sys.exit()
+
+DESCRIPTION = '''
+This script tracks the growth of stable and experimental symbols
+over releases since v19.11. The script has the ability to
+count the added symbols between two dpdk releases, and to
+list experimental symbols present in two dpdk releases
+(expired symbols).
+
+example usages:
+
+Count symbols added since v19.11
+$ {s} count-symbols
+
+Count symbols added since v20.11
+$ {s} count-symbols --releases v20.11,v21.05
+
+List experimental symbols present in v20.11 and v21.05
+$ {s} list-expired --releases v20.11,v21.05
+
+List experimental symbols in libraries only, present since v19.11
+$ {s} list-expired --directory lib
+'''
+
+MAP_GRAMMAR = r"""
+
+ws = (' ' | '\r' | '\n' | '\t')*
+
+ABI_VER = ({})
+DPDK_VER = ('DPDK_' ABI_VER)
+ABI_NAME = ('INTERNAL' | 'EXPERIMENTAL' | DPDK_VER)
+comment = '#' (~'\n' anything)+ '\n'
+symbol = (~(';' | '}}' | '#') anything )+:c ';' -> ''.join(c)
+global = 'global:'
+local = 'local: *;'
+symbols = comment* symbol:s ws comment* -> s
+
+abi = (abi_section+):m -> dict(m)
+abi_section = (ws ABI_NAME:e ws '{{' ws global* (~local ws symbols)*:s ws local* ws '}}' ws DPDK_VER* ';' ws) -> (e,s)
+"""
+
+def get_abi_versions():
+    '''Returns a string of possible dpdk abi versions'''
+
+    year = datetime.date.today().year - 2000
+    tags = " |".join(['\'{}\''.format(i) \
+                     for i in reversed(range(21, year + 1)) ])
+    tags  = tags + ' | \'20.0.1\' | \'20.0\' | \'20\''
+
+    return tags
+
+def get_dpdk_releases():
+    '''Returns a list of dpdk release tag names since v19.11'''
+
+    year = datetime.date.today().year - 2000
+    year_range = "|".join("{}".format(i) for i in range(19,year + 1))
+    pattern = re.compile(r'^\"v(' +  year_range + r')\.\d{2}\"$')
+
+    cmd = ['git', 'for-each-ref', '--sort=taggerdate', '--format', '"%(tag)"']
+    try:
+        result = subprocess.run(cmd, \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        print("Failed to interrogate git for release tags")
+        sys.exit(1)
+
+
+    tags = result.stdout.decode('utf-8').split('\n')
+
+    # find the non-rcs between now and v19.11
+    tags = [ tag.replace('\"','') \
+             for tag in reversed(tags) \
+             if pattern.match(tag) ][:-3]
+
+    return tags
+
+def fix_directory_name(path):
+    '''Prepend librte to the source directory name'''
+    mapfilepath1 = str(path.parent.name)
+    mapfilepath2 = str(path.parents[1])
+    mapfilepath = mapfilepath2 + '/librte_' + mapfilepath1
+
+    return mapfilepath
+
+def directory_renamed(path, rel):
+    '''Fix removal of the librte_ from the directory names'''
+
+    mapfilepath = fix_directory_name(path)
+    tagfile = '{}:{}/{}'.format(rel, mapfilepath,  path.name)
+
+    try:
+        result = subprocess.run(['git', 'show', tagfile], \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        result = None
+
+    return result
+
+def mapfile_renamed(path, rel):
+    '''Fix renaming of the map file'''
+    newfile = None
+
+    result = subprocess.run(['git', 'ls-tree', \
+                             rel, str(path.parent) + '/'], \
+                            stdout=subprocess.PIPE, \
+                            stderr=subprocess.PIPE,
+                            check=True)
+    dentries = result.stdout.decode('utf-8')
+    dentries = dentries.split('\n')
+
+    # filter entries looking for the map file
+    dentries = [dentry for dentry in dentries if dentry.endswith('.map')]
+    if len(dentries) != 1:
+        return None
+
+    dparts = dentries[0].split('/')
+    newfile = dparts[len(dparts) - 1]
+
+    if newfile is not None:
+        tagfile = '{}:{}/{}'.format(rel, path.parent, newfile)
+
+        try:
+            result = subprocess.run(['git', 'show', tagfile], \
+                                    stdout=subprocess.PIPE, \
+                                    stderr=subprocess.PIPE,
+                                    check=True)
+        except subprocess.CalledProcessError:
+            result = None
+
+    else:
+        result = None
+
+    return result
+
+def mapfile_and_directory_renamed(path, rel):
+    '''Fix renaming of the map file & the source directory'''
+    mapfilepath = Path("{}/{}".format(fix_directory_name(path),path.name))
+
+    return mapfile_renamed(mapfilepath, rel)
+
+FIX_STRATEGIES = [directory_renamed, \
+                  mapfile_renamed, \
+                  mapfile_and_directory_renamed]
+
+def get_symbols(map_parser, release, mapfile_path):
+    '''Count the symbols for a given release and mapfile'''
+    abi_sections = {}
+
+    tagfile = '{}:{}'.format(release,mapfile_path)
+    try:
+        result = subprocess.run(['git', 'show', tagfile], \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        result = None
+
+    for fix_strategy in FIX_STRATEGIES:
+        if result is not None:
+            break
+        result = fix_strategy(mapfile_path, release)
+
+    if result is not None:
+        mapfile = result.stdout.decode('utf-8')
+        abi_sections = map_parser(mapfile).abi()
+
+    return abi_sections
+
+def get_terminal_rows():
+    '''Find the number of rows in the terminal'''
+
+    try:
+        return os.get_terminal_size().lines
+    except IOError:
+        return 0
+
+class SymbolCountOutput():
+    '''Format the output to supported formats'''
+    output_fmt = ""
+    column_fmt = ""
+
+    def __init__(self, format_output, dpdk_releases):
+        self.OUTPUT_FORMATS[format_output](self,dpdk_releases)
+        self.column_titles = ['mapfile'] +  dpdk_releases
+
+        self.terminal_rows = get_terminal_rows()
+        self.row = 0
+
+    def set_terminal_output(self,dpdk_rel):
+        '''Set the output format to fixed-width terminal columns'''
+
+        self.output_fmt = '{:<50}' + \
+            ''.join(['{:<6}{:<6}'] * (len(dpdk_rel)))
+        self.column_fmt = '{:50}' + \
+            ''.join(['{:<12}'] * (len(dpdk_rel)))
+
+    def set_csv_output(self,dpdk_rel):
+        '''Set the output format to Comma Separated Values'''
+
+        self.output_fmt = '{},' + \
+            ','.join(['{},{}'] * (len(dpdk_rel)))
+        self.column_fmt = '{},' + \
+            ','.join(['{},'] * (len(dpdk_rel)))
+
+    def print_columns(self):
+        '''Print column rows with release names'''
+        print(self.column_fmt.format(*self.column_titles))
+        self.row += 1
+
+    def print_row(self, mapfile, symbols):
+        '''Print row of symbol values'''
+        print(self.output_fmt.format(*([mapfile] + symbols)))
+        self.row += 1
+
+        if((self.terminal_rows>0) and ((self.row % self.terminal_rows) == 0)):
+            self.print_columns()
+
+    OUTPUT_FORMATS = { None: set_terminal_output, \
+                   'terminal': set_terminal_output, \
+                   'csv': set_csv_output }
+
+class ListExpiredOutput():
+    '''Format the output to supported formats'''
+    output_fmt = ""
+    column_fmt = ""
+
+    def __init__(self, format_output, dpdk_releases):
+        self.terminal = True
+        self.OUTPUT_FORMATS[format_output](self,dpdk_releases)
+        self.column_titles = ['mapfile'] +  \
+            ['expired (' + ','.join(dpdk_releases) + ')']
+
+    def set_terminal_output(self, _):
+        '''Set the output format to fixed-width terminal columns'''
+
+        self.output_fmt = '{:<50}{:<50}'
+        self.column_fmt = '{:50}{:50}'
+
+    def set_csv_output(self, _):
+        '''Set the output format to Comma Separated Values'''
+
+        self.output_fmt = '{},{}'
+        self.column_fmt = '{},{}'
+        self.terminal = False
+
+    def print_columns(self):
+        '''Print column rows with release names'''
+        print(self.column_fmt.format(*self.column_titles))
+
+    def print_row(self, mapfile, symbols):
+        '''Print row of symbol values'''
+
+        for symbol in symbols:
+            print(self.output_fmt.format(mapfile,symbol))
+            if self.terminal :
+                mapfile = ''
+
+    OUTPUT_FORMATS = { None: set_terminal_output, \
+                   'terminal': set_terminal_output, \
+                   'csv': set_csv_output }
+
+class CountSymbolsAction:
+    ''' Logic to count symbols added since a given release '''
+    IGNORE_SECTIONS = ['EXPERIMENTAL','INTERNAL']
+
+    def __init__(self, mapfile_path, mapfile_parser, format_output):
+        self.path = mapfile_path
+        self.parser = mapfile_parser
+        self.format_output = format_output
+        self.symbols_count = []
+
+    def add_mapfile(self, release):
+        ''' add a version mapfile '''
+        symbol_count = experimental_count = 0
+
+        symbols = get_symbols(self.parser, release, self.path)
+
+        # which versions are present, and we care about
+        abi_vers = [abi_ver \
+                    for abi_ver in symbols \
+                    if abi_ver not in self.IGNORE_SECTIONS]
+
+        for abi_ver in abi_vers:
+            symbol_count += len(symbols[abi_ver])
+
+        # count experimental symbols
+        if 'EXPERIMENTAL' in symbols.keys():
+            experimental_count = len(symbols['EXPERIMENTAL'])
+
+        self.symbols_count += [symbol_count, experimental_count]
+
+    def __del__(self):
+        self.format_output.print_row(self.path.parent, self.symbols_count)
+
+class ListExpiredAction:
+    ''' Logic to list expired symbols between two releases '''
+
+    def __init__(self, mapfile_path, mapfile_parser, format_output):
+        self.path = mapfile_path
+        self.parser = mapfile_parser
+        self.format_output = format_output
+        self.experimental_symbols = []
+
+    def add_mapfile(self, release):
+        ''' add a version mapfile '''
+        symbols = get_symbols(self.parser, release, self.path)
+        if 'EXPERIMENTAL' in symbols.keys():
+            self.experimental_symbols.append(symbols['EXPERIMENTAL'])
+
+    def __del__(self):
+        if len(self.experimental_symbols) != 2:
+            return
+
+        tmp = self.experimental_symbols
+        # find symbols present in both dpdk releases
+        intersect_syms = [sym for sym in tmp[0] if sym in tmp[1]]
+
+        # check for empty set
+        if intersect_syms == []:
+            return
+
+        self.format_output.print_row(self.path.parent, intersect_syms)
+
+SRC_DIRECTORIES = 'drivers,lib'
+
+ACTIONS = {None: CountSymbolsAction, \
+           'count-symbols': CountSymbolsAction, \
+           'list-expired': ListExpiredAction}
+
+ACTION_OUTPUT = {None: SymbolCountOutput, \
+                 'count-symbols': SymbolCountOutput, \
+                 'list-expired': ListExpiredOutput}
+
+def main():
+    '''Main entry point'''
+
+    dpdk_releases = get_dpdk_releases()
+
+    parser = argparse.ArgumentParser(description=DESCRIPTION.format(s=__file__), \
+                                     formatter_class=RawTextHelpFormatter
+                                     )
+    parser.add_argument('mode', choices=['count-symbols','list-expired'])
+    parser.add_argument('--format-output', choices=['terminal','csv'], \
+                        default='terminal')
+    parser.add_argument('--directory', choices=SRC_DIRECTORIES.split(','),
+                        default=SRC_DIRECTORIES)
+    parser.add_argument('--releases', \
+                        help='2 x comma separated release tags e.g. \'' \
+                        + ','.join([dpdk_releases[0],dpdk_releases[-1]]) \
+                        + '\'')
+    args = parser.parse_args()
+
+    if args.releases is not None:
+        dpdk_releases = args.releases.split(',')
+
+    if args.mode == 'list-expired':
+        if len(dpdk_releases) < 2:
+            sys.exit('Please specify two releases to compare ' \
+                     'in \'list-expired\' mode.')
+        dpdk_releases = [dpdk_releases[0], dpdk_releases[-1]]
+
+    action = ACTIONS[args.mode]
+    format_output = ACTION_OUTPUT[args.mode](args.format_output, dpdk_releases)
+
+    map_grammar = MAP_GRAMMAR.format(get_abi_versions())
+    map_parser = makeGrammar(map_grammar, {})
+
+    format_output.print_columns()
+
+    for src_dir in args.directory.split(','):
+        for path in Path(src_dir).rglob('*.map'):
+            release_action = action(path, map_parser, format_output)
+
+            for release in dpdk_releases:
+                release_action.add_mapfile(release)
+
+            # all the magic happens in the destructor
+            del release_action
+
+if __name__ == '__main__':
+    main()
-- 
2.26.2
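
Editor's note: the 'list-expired' mode above reports experimental symbols
that appear in the EXPERIMENTAL section of both releases being compared.
The sketch below illustrates that intersection step only. It is not the
patch's implementation: parse_map() is a hypothetical regex-based stand-in
for the Parsley grammar, it assumes simple version.map layouts (no braces
inside comments), and the map contents and symbol names are made up.

```python
import re

# Hypothetical, simplified reader for version.map text. This regex is an
# assumption for illustration; the patch itself uses a Parsley grammar.
SECTION_RE = re.compile(r'(\w[\w.]*)\s*\{(.*?)\}', re.DOTALL)


def parse_map(text):
    '''Return {section_name: [symbols]} for a version.map string.'''
    sections = {}
    for name, body in SECTION_RE.findall(text):
        symbols = []
        for line in body.splitlines():
            line = line.split('#', 1)[0].strip()  # drop trailing comments
            # skip blank lines, 'global:' labels and the 'local: *;' catch-all
            if not line or line.endswith(':') or line.startswith('local:'):
                continue
            symbols.append(line.rstrip(';').strip())
        sections[name] = symbols
    return sections


def expired_symbols(old_map, new_map):
    '''List EXPERIMENTAL symbols present in both map files, in old order.'''
    old = parse_map(old_map).get('EXPERIMENTAL', [])
    new = set(parse_map(new_map).get('EXPERIMENTAL', []))
    return [sym for sym in old if sym in new]


# Made-up map files: rte_c is promoted to stable in the newer release,
# rte_b stays experimental, rte_d is newly added as experimental.
OLD_MAP = '''
DPDK_21 {
    global:

    rte_a;

    local: *;
};

EXPERIMENTAL {
    global:

    # added in 20.11
    rte_b;
    rte_c;
};
'''

NEW_MAP = '''
DPDK_22 {
    global:

    rte_a;
    rte_c;

    local: *;
};

EXPERIMENTAL {
    global:

    rte_b;
    rte_d;
};
'''

print(expired_symbols(OLD_MAP, NEW_MAP))
```

Only rte_b is reported here, since it is the one symbol still marked
experimental in both made-up releases. Using a set for the newer release
keeps the membership test cheap while preserving the order of the older
map file in the output.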


^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] app/testpmd: fix testpmd doesn't show RSS hash offload
  2021-07-22 11:03  0%             ` Andrew Rybchenko
@ 2021-08-09  8:53  0%               ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2021-08-09  8:53 UTC (permalink / raw)
  To: Andrew Rybchenko, Wang, Jie1X, Li, Xiaoyun, dev; +Cc: stable

On 7/22/2021 12:03 PM, Andrew Rybchenko wrote:
> On 7/19/21 7:18 PM, Ferruh Yigit wrote:
>> On 7/19/2021 10:55 AM, Wang, Jie1X wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Yigit, Ferruh <ferruh.yigit@intel.com>
>>>> Sent: Friday, July 16, 2021 4:52 PM
>>>> To: Li, Xiaoyun <xiaoyun.li@intel.com>; Wang, Jie1X <jie1x.wang@intel.com>;
>>>> dev@dpdk.org
>>>> Cc: andrew.rybchenko@oktetlabs.ru; stable@dpdk.org
>>>> Subject: Re: [dpdk-stable] [PATCH v4] app/testpmd: fix testpmd doesn't show
>>>> RSS hash offload
>>>>
>>>> On 7/16/2021 9:30 AM, Li, Xiaoyun wrote:
>>>>>> -----Original Message-----
>>>>>> From: stable <stable-bounces@dpdk.org> On Behalf Of Li, Xiaoyun
>>>>>> Sent: Thursday, July 15, 2021 12:54
>>>>>> To: Wang, Jie1X <jie1x.wang@intel.com>; dev@dpdk.org
>>>>>> Cc: andrew.rybchenko@oktetlabs.ru; stable@dpdk.org
>>>>>> Subject: Re: [dpdk-stable] [PATCH v4] app/testpmd: fix testpmd
>>>>>> doesn't show RSS hash offload
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Wang, Jie1X <jie1x.wang@intel.com>
>>>>>>> Sent: Thursday, July 15, 2021 19:57
>>>>>>> To: dev@dpdk.org
>>>>>>> Cc: Li, Xiaoyun <xiaoyun.li@intel.com>;
>>>>>>> andrew.rybchenko@oktetlabs.ru; Wang, Jie1X <jie1x.wang@intel.com>;
>>>>>>> stable@dpdk.org
>>>>>>> Subject: [PATCH v4] app/testpmd: fix testpmd doesn't show RSS hash
>>>>>>> offload
>>>>>>>
>>>>>>> The driver may change offloads info into dev->data->dev_conf in
>>>>>>> dev_configure which may cause port->dev_conf and port->rx_conf
>>>>>>> contain
>>>>>> outdated values.
>>>>>>>
>>>>>>> This patch updates the offloads info if it changes to fix this issue.
>>>>>>>
>>>>>>> Fixes: ce8d561418d4 ("app/testpmd: add port configuration settings")
>>>>>>> Cc: stable@dpdk.org
>>>>>>>
>>>>>>> Signed-off-by: Jie Wang <jie1x.wang@intel.com>
>>>>>>> ---
>>>>>>> v4: delete the whitespace at the end of the line.
>>>>>>> v3:
>>>>>>>   - check and update the "offloads" of "port->dev_conf.rx/txmode".
>>>>>>>   - update the commit log.
>>>>>>> v2: copy "rx/txmode.offloads", instead of copying the entire struct
>>>>>>> "dev->data-
>>>>>>>> dev_conf.rx/txmode".
>>>>>>> ---
>>>>>>>   app/test-pmd/testpmd.c | 27 +++++++++++++++++++++++++++
>>>>>>>   1 file changed, 27 insertions(+)
>>>>>>
>>>>>> Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
>>>>>
>>>>> Although I gave my ack, app shouldn't touch rte_eth_devices which this patch
>>>> does. Usually, testpmd should only call function like
>>>> eth_dev_info_get_print_err().
>>>>> But dev_info doesn't contain the info dev->data->dev_conf which the driver
>>>> modifies.
>>>>>
>>>>> Probably we need a better fix.
>>>>>
>>>>
>>>> Agree, an application accessing directly to 'rte_eth_devices' is sign of
>>>> something
>>>> missing/wrong.
>>>>
>>>> In this case there is no way for application to know what is the configured
>>>> offload settings per port and queue. Which is missing part I think.
>>>>
>>>> As you said normally we get data from PMD mainly via 'rte_eth_dev_info_get()',
>>>> which is an overloaded function, it provides many different things, like driver
>>>> default values, limitations, current config/status, capabilities etc...
>>>>
>>>> So I think we can do a few things:
>>>> 1) Add current offload configuration to 'rte_eth_dev_info_get()', so
>>>> application
>>>> can get it and use it.
>>>> The advantage is this API already called many places, many times, so there is a
>>>> big chance that application already have this information when it needs.
>>>> Disadvantage is, as mentioned above the API already big and messy, making it
>>>> bigger makes more error prone and makes easier to break ABI.
>>>>
>>> I prefer to choose the 1st suggestion.
>>>
>>> Normally PMD gets data via 'rte_eth_dev_info_get()'. When we add offloads
>>> configuration
>>> to it, we can get offloads as same as getting other info.
>>>
>>
>> Most probably it is easier to implement 1), I see your point but as said before
>> I think 'rte_eth_dev_info_get()' is already messy and I am worried to make it
>> even bigger.
> 
> IMHO, (1) is not an option.
> 
>> I prefer option 2).
> 
> I'm not sure that API function for each config parameter is an option as
> well. We should find a balance. May be I'd add something like
> rte_eth_dev_get_conf(uint16_t port_id, const struct rte_eth_conf **conf)
> which returns a pointer to up-to-date configuration. I.e. option (3).
> 

That is option 3, that can work too.

> The tricky part here is to ensure that all specific API which modifies
> various bits of the configuration updates dev_conf.
> 

They have to, don't they? Otherwise there is nowhere to record the current
config for the PMD either.

>>
>> @Thomas, @Andrew, what do you think?
>>
>>
>>>> 2) Add a new API to get configured offload information, so a specific API
>>>> for it.
>>>>
>>>> 3) Get a more generic API to get configured config (dev_conf) which will cover
>>>> offloads too.
>>>> Disadvantage can be leaking out too many internal config to user
>>>> unintentionally.
> 
> I don't understand it. dev_conf is provided by user on
> rte_eth_dev_configure().

Yes, but the application doesn't provide all of the config; my concern was
whether some internal config should be hidden from applications (possibly via
some APIs).

Overall I am OK to go with option 3; I think it can simplify the application's
life. And later we can have some more updates on testpmd to benefit from the
new API.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] version: 21.11-rc0
@ 2021-08-08 19:26 11% Thomas Monjalon
  2021-08-12 14:36  0% ` Ferruh Yigit
  2021-08-17  6:34  4% ` [dpdk-dev] " David Marchand
  0 siblings, 2 replies; 200+ results
From: Thomas Monjalon @ 2021-08-08 19:26 UTC (permalink / raw)
  To: dev; +Cc: david.marchand, mdr

Start a new release cycle with empty release notes.

The ABI version becomes 22.0.
The map files are updated to the new ABI major number (22).
The ABI exceptions are dropped
and CI ABI checks are disabled
because compatibility is not preserved.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 .github/workflows/build.yml                |   4 +-
 .travis.yml                                |  21 +---
 ABI_VERSION                                |   2 +-
 VERSION                                    |   2 +-
 devtools/libabigail.abignore               |  41 -------
 doc/guides/rel_notes/index.rst             |   1 +
 doc/guides/rel_notes/release_21_11.rst     | 136 +++++++++++++++++++++
 drivers/baseband/acc100/version.map        |   2 +-
 drivers/baseband/fpga_5gnr_fec/version.map |   2 +-
 drivers/baseband/fpga_lte_fec/version.map  |   2 +-
 drivers/baseband/null/version.map          |   2 +-
 drivers/baseband/turbo_sw/version.map      |   2 +-
 drivers/bus/ifpga/version.map              |   2 +-
 drivers/bus/pci/version.map                |   2 +-
 drivers/bus/vdev/version.map               |   2 +-
 drivers/bus/vmbus/version.map              |   2 +-
 drivers/common/qat/version.map             |   2 +-
 drivers/compress/isal/version.map          |   2 +-
 drivers/compress/mlx5/version.map          |   2 +-
 drivers/compress/octeontx/version.map      |   2 +-
 drivers/compress/zlib/version.map          |   2 +-
 drivers/crypto/aesni_gcm/version.map       |   2 +-
 drivers/crypto/aesni_mb/version.map        |   2 +-
 drivers/crypto/armv8/version.map           |   2 +-
 drivers/crypto/bcmfs/version.map           |   2 +-
 drivers/crypto/caam_jr/version.map         |   2 +-
 drivers/crypto/ccp/version.map             |   2 +-
 drivers/crypto/kasumi/version.map          |   2 +-
 drivers/crypto/mlx5/version.map            |   2 +-
 drivers/crypto/mvsam/version.map           |   2 +-
 drivers/crypto/nitrox/version.map          |   2 +-
 drivers/crypto/null/version.map            |   2 +-
 drivers/crypto/octeontx/version.map        |   2 +-
 drivers/crypto/octeontx2/version.map       |   2 +-
 drivers/crypto/openssl/version.map         |   2 +-
 drivers/crypto/scheduler/version.map       |   2 +-
 drivers/crypto/snow3g/version.map          |   2 +-
 drivers/crypto/virtio/version.map          |   2 +-
 drivers/crypto/zuc/version.map             |   2 +-
 drivers/event/dlb2/version.map             |   2 +-
 drivers/event/dpaa/version.map             |   2 +-
 drivers/event/dpaa2/version.map            |   2 +-
 drivers/event/dsw/version.map              |   2 +-
 drivers/event/octeontx/version.map         |   2 +-
 drivers/event/octeontx2/version.map        |   2 +-
 drivers/event/opdl/version.map             |   2 +-
 drivers/event/skeleton/version.map         |   2 +-
 drivers/event/sw/version.map               |   2 +-
 drivers/mempool/bucket/version.map         |   2 +-
 drivers/mempool/dpaa2/version.map          |   2 +-
 drivers/mempool/octeontx/version.map       |   2 +-
 drivers/mempool/ring/version.map           |   2 +-
 drivers/mempool/stack/version.map          |   2 +-
 drivers/net/af_packet/version.map          |   2 +-
 drivers/net/af_xdp/version.map             |   2 +-
 drivers/net/ark/version.map                |   2 +-
 drivers/net/atlantic/version.map           |   2 +-
 drivers/net/avp/version.map                |   2 +-
 drivers/net/axgbe/version.map              |   2 +-
 drivers/net/bnx2x/version.map              |   2 +-
 drivers/net/bnxt/version.map               |   2 +-
 drivers/net/bonding/version.map            |   2 +-
 drivers/net/cnxk/version.map               |   2 +-
 drivers/net/cxgbe/version.map              |   2 +-
 drivers/net/dpaa/version.map               |   2 +-
 drivers/net/e1000/version.map              |   2 +-
 drivers/net/ena/version.map                |   2 +-
 drivers/net/enetc/version.map              |   2 +-
 drivers/net/enic/version.map               |   2 +-
 drivers/net/failsafe/version.map           |   2 +-
 drivers/net/fm10k/version.map              |   2 +-
 drivers/net/hinic/version.map              |   2 +-
 drivers/net/hns3/version.map               |   2 +-
 drivers/net/i40e/version.map               |   2 +-
 drivers/net/iavf/version.map               |   2 +-
 drivers/net/ice/version.map                |   2 +-
 drivers/net/igc/version.map                |   2 +-
 drivers/net/ionic/version.map              |   2 +-
 drivers/net/ipn3ke/version.map             |   2 +-
 drivers/net/ixgbe/version.map              |   2 +-
 drivers/net/kni/version.map                |   2 +-
 drivers/net/liquidio/version.map           |   2 +-
 drivers/net/memif/version.map              |   2 +-
 drivers/net/mlx4/version.map               |   2 +-
 drivers/net/mlx5/version.map               |   2 +-
 drivers/net/mvneta/version.map             |   2 +-
 drivers/net/mvpp2/version.map              |   2 +-
 drivers/net/netvsc/version.map             |   2 +-
 drivers/net/nfb/version.map                |   2 +-
 drivers/net/nfp/version.map                |   2 +-
 drivers/net/ngbe/version.map               |   2 +-
 drivers/net/null/version.map               |   2 +-
 drivers/net/octeontx/version.map           |   2 +-
 drivers/net/octeontx2/version.map          |   2 +-
 drivers/net/octeontx_ep/version.map        |   4 +-
 drivers/net/pcap/version.map               |   2 +-
 drivers/net/pfe/version.map                |   2 +-
 drivers/net/qede/version.map               |   2 +-
 drivers/net/ring/version.map               |   2 +-
 drivers/net/sfc/version.map                |   2 +-
 drivers/net/softnic/version.map            |   2 +-
 drivers/net/szedata2/version.map           |   2 +-
 drivers/net/tap/version.map                |   2 +-
 drivers/net/thunderx/version.map           |   2 +-
 drivers/net/txgbe/version.map              |   2 +-
 drivers/net/vdev_netvsc/version.map        |   2 +-
 drivers/net/vhost/version.map              |   2 +-
 drivers/net/virtio/version.map             |   2 +-
 drivers/net/vmxnet3/version.map            |   2 +-
 drivers/raw/cnxk_bphy/version.map          |   2 +-
 drivers/raw/dpaa2_cmdif/version.map        |   2 +-
 drivers/raw/dpaa2_qdma/version.map         |   2 +-
 drivers/raw/ifpga/version.map              |   2 +-
 drivers/raw/ioat/version.map               |   2 +-
 drivers/raw/ntb/version.map                |   2 +-
 drivers/raw/octeontx2_dma/version.map      |   2 +-
 drivers/raw/octeontx2_ep/version.map       |   2 +-
 drivers/raw/skeleton/version.map           |   2 +-
 drivers/regex/mlx5/version.map             |   2 +-
 drivers/regex/octeontx2/version.map        |   2 +-
 drivers/vdpa/ifc/version.map               |   2 +-
 drivers/vdpa/mlx5/version.map              |   2 +-
 lib/acl/version.map                        |   2 +-
 lib/bitratestats/version.map               |   2 +-
 lib/bpf/version.map                        |   2 +-
 lib/cfgfile/version.map                    |   2 +-
 lib/cmdline/version.map                    |   2 +-
 lib/cryptodev/version.map                  |   2 +-
 lib/distributor/version.map                |   2 +-
 lib/eal/version.map                        |   2 +-
 lib/efd/version.map                        |   2 +-
 lib/ethdev/version.map                     |   2 +-
 lib/eventdev/version.map                   |   2 +-
 lib/gro/version.map                        |   2 +-
 lib/gso/version.map                        |   2 +-
 lib/hash/version.map                       |   2 +-
 lib/ip_frag/version.map                    |   2 +-
 lib/ipsec/version.map                      |   2 +-
 lib/jobstats/version.map                   |   2 +-
 lib/kni/version.map                        |   2 +-
 lib/kvargs/version.map                     |   2 +-
 lib/latencystats/version.map               |   2 +-
 lib/lpm/version.map                        |   2 +-
 lib/mbuf/version.map                       |   2 +-
 lib/member/version.map                     |   2 +-
 lib/mempool/version.map                    |   2 +-
 lib/meter/version.map                      |   2 +-
 lib/metrics/version.map                    |   2 +-
 lib/net/version.map                        |   2 +-
 lib/pci/version.map                        |   2 +-
 lib/pdump/version.map                      |   2 +-
 lib/pipeline/version.map                   |   2 +-
 lib/port/version.map                       |   2 +-
 lib/power/version.map                      |   2 +-
 lib/rawdev/version.map                     |   2 +-
 lib/rcu/version.map                        |   2 +-
 lib/reorder/version.map                    |   2 +-
 lib/ring/version.map                       |   2 +-
 lib/sched/version.map                      |   2 +-
 lib/security/version.map                   |   2 +-
 lib/stack/version.map                      |   2 +-
 lib/table/version.map                      |   2 +-
 lib/timer/version.map                      |   2 +-
 lib/vhost/version.map                      |  40 +++---
 164 files changed, 319 insertions(+), 242 deletions(-)
 create mode 100644 doc/guides/rel_notes/release_21_11.rst

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 7dac20ddeb..151641e6fa 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -21,7 +21,7 @@ jobs:
       CC: ccache ${{ matrix.config.compiler }}
       DEF_LIB: ${{ matrix.config.library }}
       LIBABIGAIL_VERSION: libabigail-1.8
-      REF_GIT_TAG: v21.05
+      REF_GIT_TAG: none
       RUN_TESTS: ${{ contains(matrix.config.checks, 'tests') }}
 
     strategy:
@@ -34,7 +34,7 @@ jobs:
           - os: ubuntu-18.04
             compiler: gcc
             library: shared
-            checks: abi+doc+tests
+            checks: doc+tests
           - os: ubuntu-18.04
             compiler: clang
             library: static
diff --git a/.travis.yml b/.travis.yml
index 23067d9e3c..4bb5bf629e 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -42,7 +42,7 @@ script: ./.ci/${TRAVIS_OS_NAME}-build.sh
 env:
   global:
     - LIBABIGAIL_VERSION=libabigail-1.8
-    - REF_GIT_TAG=v21.05
+    - REF_GIT_TAG=none
 
 jobs:
   include:
@@ -61,14 +61,6 @@ jobs:
         packages:
           - *required_packages
           - *doc_packages
-  - env: DEF_LIB="shared" ABI_CHECKS=true
-    arch: amd64
-    compiler: gcc
-    addons:
-      apt:
-        packages:
-          - *required_packages
-          - *libabigail_build_packages
   # x86_64 clang jobs
   - env: DEF_LIB="static"
     arch: amd64
@@ -145,17 +137,6 @@ jobs:
         packages:
           - *required_packages
           - *doc_packages
-  - env: DEF_LIB="shared" ABI_CHECKS=true
-    dist: focal
-    arch: arm64-graviton2
-    virt: vm
-    group: edge
-    compiler: gcc
-    addons:
-      apt:
-        packages:
-          - *required_packages
-          - *libabigail_build_packages
   # aarch64 clang jobs
   - env: DEF_LIB="static"
     dist: focal
diff --git a/ABI_VERSION b/ABI_VERSION
index 8e5954eb6f..b090fe57f6 100644
--- a/ABI_VERSION
+++ b/ABI_VERSION
@@ -1 +1 @@
-21.3
+22.0
diff --git a/VERSION b/VERSION
index 6512890184..0931886fb0 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-21.08.0
+21.11.0-rc0
diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 93158405e0..4b676f317d 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -11,44 +11,3 @@
 ; Ignore generated PMD information strings
 [suppress_variable]
         name_regexp = _pmd_info$
-
-; Explicit ignore for driver-only ABI
-[suppress_function]
-        name_regexp = rte_vdev_(|un)register
-
-; Ignore fields inserted in cacheline boundary of rte_cryptodev
-[suppress_type]
-        name = rte_cryptodev
-        has_data_member_inserted_between = {offset_after(attached), end}
-
-; Ignore fields inserted in union boundary of rte_cryptodev_symmetric_capability
-[suppress_type]
-        name = rte_cryptodev_symmetric_capability
-        has_data_member_inserted_between = {offset_after(cipher.iv_size), end}
-
-; Ignore fields inserted in middle padding of rte_crypto_cipher_xform
-[suppress_type]
-        name = rte_crypto_cipher_xform
-        has_data_member_inserted_between = {offset_after(key), offset_of(iv)}
-
-; Ignore fields inserted in place of reserved fields of rte_eventdev
-[suppress_type]
-	name = rte_eventdev
-	has_data_member_inserted_between = {offset_after(attached), end}
-
-; Ignore fields inserted in alignment hole of rte_eth_rxq_info
-[suppress_type]
-	name = rte_eth_rxq_info
-	has_data_member_inserted_at = offset_after(scattered_rx)
-
-; Ignore fields inserted in cacheline boundary of rte_eth_txq_info
-[suppress_type]
-	name = rte_eth_txq_info
-	has_data_member_inserted_between = {offset_after(nb_desc), end}
-
-; Ignore all changes to rte_eth_dev_data
-; Note: we only cared about dev_configured bit addition, but libabigail
-; seems to wrongly compute bitfields offset.
-; https://sourceware.org/bugzilla/show_bug.cgi?id=28060
-[suppress_type]
-	name = rte_eth_dev_data
diff --git a/doc/guides/rel_notes/index.rst b/doc/guides/rel_notes/index.rst
index 9648ba60e1..78861ee57b 100644
--- a/doc/guides/rel_notes/index.rst
+++ b/doc/guides/rel_notes/index.rst
@@ -8,6 +8,7 @@ Release Notes
     :maxdepth: 1
     :numbered:
 
+    release_21_11
     release_21_08
     release_21_05
     release_21_02
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
new file mode 100644
index 0000000000..d707a554ef
--- /dev/null
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -0,0 +1,136 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright 2021 The DPDK contributors
+
+.. include:: <isonum.txt>
+
+DPDK Release 21.11
+==================
+
+.. **Read this first.**
+
+   The text in the sections below explains how to update the release notes.
+
+   Use proper spelling, capitalization and punctuation in all sections.
+
+   Variable and config names should be quoted as fixed width text:
+   ``LIKE_THIS``.
+
+   Build the docs and view the output file to ensure the changes are correct::
+
+      make doc-guides-html
+      xdg-open build/doc/html/guides/rel_notes/release_21_11.html
+
+
+New Features
+------------
+
+.. This section should contain new features added in this release.
+   Sample format:
+
+   * **Add a title in the past tense with a full stop.**
+
+     Add a short 1-2 sentence description in the past tense.
+     The description should be enough to allow someone scanning
+     the release notes to understand the new feature.
+
+     If the feature adds a lot of sub-features you can use a bullet list
+     like this:
+
+     * Added feature foo to do something.
+     * Enhanced feature bar to do something else.
+
+     Refer to the previous release notes for examples.
+
+     Suggested order in release notes items:
+     * Core libs (EAL, mempool, ring, mbuf, buses)
+     * Device abstraction libs and PMDs (ordered alphabetically by vendor name)
+       - ethdev (lib, PMDs)
+       - cryptodev (lib, PMDs)
+       - eventdev (lib, PMDs)
+       - etc
+     * Other libs
+     * Apps, Examples, Tools (if significant)
+
+     This section is a comment. Do not overwrite or remove it.
+     Also, make sure to start the actual text at the margin.
+     =======================================================
+
+
+Removed Items
+-------------
+
+.. This section should contain removed items in this release. Sample format:
+
+   * Add a short 1-2 sentence description of the removed item
+     in the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
+
+
+API Changes
+-----------
+
+.. This section should contain API changes. Sample format:
+
+   * sample: Add a short 1-2 sentence description of the API change
+     which was announced in the previous releases and made in this release.
+     Start with a scope label like "ethdev:".
+     Use fixed width quotes for ``function_names`` or ``struct_names``.
+     Use the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
+
+
+ABI Changes
+-----------
+
+.. This section should contain ABI changes. Sample format:
+
+   * sample: Add a short 1-2 sentence description of the ABI change
+     which was announced in the previous releases and made in this release.
+     Start with a scope label like "ethdev:".
+     Use fixed width quotes for ``function_names`` or ``struct_names``.
+     Use the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
+
+
+Known Issues
+------------
+
+.. This section should contain new known issues in this release. Sample format:
+
+   * **Add title in present tense with full stop.**
+
+     Add a short 1-2 sentence description of the known issue
+     in the present tense. Add information on any known workarounds.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
+
+
+Tested Platforms
+----------------
+
+.. This section should contain a list of platforms that were tested
+   with this release.
+
+   The format is:
+
+   * <vendor> platform with <vendor> <type of devices> combinations
+
+     * List of CPU
+     * List of OS
+     * List of devices
+     * Other relevant details...
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
diff --git a/drivers/baseband/acc100/version.map b/drivers/baseband/acc100/version.map
index 47a23b8dac..40604c73d2 100644
--- a/drivers/baseband/acc100/version.map
+++ b/drivers/baseband/acc100/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
 
diff --git a/drivers/baseband/fpga_5gnr_fec/version.map b/drivers/baseband/fpga_5gnr_fec/version.map
index db43cd8403..de4e5025bf 100644
--- a/drivers/baseband/fpga_5gnr_fec/version.map
+++ b/drivers/baseband/fpga_5gnr_fec/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
 
diff --git a/drivers/baseband/fpga_lte_fec/version.map b/drivers/baseband/fpga_lte_fec/version.map
index b95b7838e8..e3bfa6edb0 100644
--- a/drivers/baseband/fpga_lte_fec/version.map
+++ b/drivers/baseband/fpga_lte_fec/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
 
diff --git a/drivers/baseband/null/version.map b/drivers/baseband/null/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/baseband/null/version.map
+++ b/drivers/baseband/null/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/baseband/turbo_sw/version.map b/drivers/baseband/turbo_sw/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/baseband/turbo_sw/version.map
+++ b/drivers/baseband/turbo_sw/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/bus/ifpga/version.map b/drivers/bus/ifpga/version.map
index 6e8f85da3c..8ac3a4d258 100644
--- a/drivers/bus/ifpga/version.map
+++ b/drivers/bus/ifpga/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_ifpga_driver_register;
diff --git a/drivers/bus/pci/version.map b/drivers/bus/pci/version.map
index 00fac8864c..aa56439c2b 100644
--- a/drivers/bus/pci/version.map
+++ b/drivers/bus/pci/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_pci_dump;
diff --git a/drivers/bus/vdev/version.map b/drivers/bus/vdev/version.map
index 61b6cefcee..0d60b7e2bc 100644
--- a/drivers/bus/vdev/version.map
+++ b/drivers/bus/vdev/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_vdev_add_custom_scan;
diff --git a/drivers/bus/vmbus/version.map b/drivers/bus/vmbus/version.map
index fa8e91c282..3cadec7fae 100644
--- a/drivers/bus/vmbus/version.map
+++ b/drivers/bus/vmbus/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_vmbus_chan_close;
diff --git a/drivers/common/qat/version.map b/drivers/common/qat/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/common/qat/version.map
+++ b/drivers/common/qat/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/compress/isal/version.map b/drivers/compress/isal/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/compress/isal/version.map
+++ b/drivers/compress/isal/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/compress/mlx5/version.map b/drivers/compress/mlx5/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/compress/mlx5/version.map
+++ b/drivers/compress/mlx5/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/compress/octeontx/version.map b/drivers/compress/octeontx/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/compress/octeontx/version.map
+++ b/drivers/compress/octeontx/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/compress/zlib/version.map b/drivers/compress/zlib/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/compress/zlib/version.map
+++ b/drivers/compress/zlib/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/crypto/aesni_gcm/version.map b/drivers/crypto/aesni_gcm/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/crypto/aesni_gcm/version.map
+++ b/drivers/crypto/aesni_gcm/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/crypto/aesni_mb/version.map b/drivers/crypto/aesni_mb/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/crypto/aesni_mb/version.map
+++ b/drivers/crypto/aesni_mb/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/crypto/armv8/version.map b/drivers/crypto/armv8/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/crypto/armv8/version.map
+++ b/drivers/crypto/armv8/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/crypto/bcmfs/version.map b/drivers/crypto/bcmfs/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/crypto/bcmfs/version.map
+++ b/drivers/crypto/bcmfs/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/crypto/caam_jr/version.map b/drivers/crypto/caam_jr/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/crypto/caam_jr/version.map
+++ b/drivers/crypto/caam_jr/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/crypto/ccp/version.map b/drivers/crypto/ccp/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/crypto/ccp/version.map
+++ b/drivers/crypto/ccp/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/crypto/kasumi/version.map b/drivers/crypto/kasumi/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/crypto/kasumi/version.map
+++ b/drivers/crypto/kasumi/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/crypto/mlx5/version.map b/drivers/crypto/mlx5/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/crypto/mlx5/version.map
+++ b/drivers/crypto/mlx5/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/crypto/mvsam/version.map b/drivers/crypto/mvsam/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/crypto/mvsam/version.map
+++ b/drivers/crypto/mvsam/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/crypto/nitrox/version.map b/drivers/crypto/nitrox/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/crypto/nitrox/version.map
+++ b/drivers/crypto/nitrox/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/crypto/null/version.map b/drivers/crypto/null/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/crypto/null/version.map
+++ b/drivers/crypto/null/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/crypto/octeontx/version.map b/drivers/crypto/octeontx/version.map
index 41f33a4ecf..997a95ea33 100644
--- a/drivers/crypto/octeontx/version.map
+++ b/drivers/crypto/octeontx/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
 
diff --git a/drivers/crypto/octeontx2/version.map b/drivers/crypto/octeontx2/version.map
index 02684781b3..d36663132a 100644
--- a/drivers/crypto/octeontx2/version.map
+++ b/drivers/crypto/octeontx2/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
 
diff --git a/drivers/crypto/openssl/version.map b/drivers/crypto/openssl/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/crypto/openssl/version.map
+++ b/drivers/crypto/openssl/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/crypto/scheduler/version.map b/drivers/crypto/scheduler/version.map
index ab7d505629..47e4487b75 100644
--- a/drivers/crypto/scheduler/version.map
+++ b/drivers/crypto/scheduler/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_cryptodev_scheduler_load_user_scheduler;
diff --git a/drivers/crypto/snow3g/version.map b/drivers/crypto/snow3g/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/crypto/snow3g/version.map
+++ b/drivers/crypto/snow3g/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/crypto/virtio/version.map b/drivers/crypto/virtio/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/crypto/virtio/version.map
+++ b/drivers/crypto/virtio/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/crypto/zuc/version.map b/drivers/crypto/zuc/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/crypto/zuc/version.map
+++ b/drivers/crypto/zuc/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/event/dlb2/version.map b/drivers/event/dlb2/version.map
index b1e4dff0ff..c727207d1a 100644
--- a/drivers/event/dlb2/version.map
+++ b/drivers/event/dlb2/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
 
diff --git a/drivers/event/dpaa/version.map b/drivers/event/dpaa/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/event/dpaa/version.map
+++ b/drivers/event/dpaa/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/event/dpaa2/version.map b/drivers/event/dpaa2/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/event/dpaa2/version.map
+++ b/drivers/event/dpaa2/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/event/dsw/version.map b/drivers/event/dsw/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/event/dsw/version.map
+++ b/drivers/event/dsw/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/event/octeontx/version.map b/drivers/event/octeontx/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/event/octeontx/version.map
+++ b/drivers/event/octeontx/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/event/octeontx2/version.map b/drivers/event/octeontx2/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/event/octeontx2/version.map
+++ b/drivers/event/octeontx2/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/event/opdl/version.map b/drivers/event/opdl/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/event/opdl/version.map
+++ b/drivers/event/opdl/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/event/skeleton/version.map b/drivers/event/skeleton/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/event/skeleton/version.map
+++ b/drivers/event/skeleton/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/event/sw/version.map b/drivers/event/sw/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/event/sw/version.map
+++ b/drivers/event/sw/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/mempool/bucket/version.map b/drivers/mempool/bucket/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/mempool/bucket/version.map
+++ b/drivers/mempool/bucket/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/mempool/dpaa2/version.map b/drivers/mempool/dpaa2/version.map
index 473b8c90e8..49c460ec54 100644
--- a/drivers/mempool/dpaa2/version.map
+++ b/drivers/mempool/dpaa2/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_dpaa2_mbuf_from_buf_addr;
diff --git a/drivers/mempool/octeontx/version.map b/drivers/mempool/octeontx/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/mempool/octeontx/version.map
+++ b/drivers/mempool/octeontx/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/mempool/ring/version.map b/drivers/mempool/ring/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/mempool/ring/version.map
+++ b/drivers/mempool/ring/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/mempool/stack/version.map b/drivers/mempool/stack/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/mempool/stack/version.map
+++ b/drivers/mempool/stack/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/af_packet/version.map b/drivers/net/af_packet/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/af_packet/version.map
+++ b/drivers/net/af_packet/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/af_xdp/version.map b/drivers/net/af_xdp/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/af_xdp/version.map
+++ b/drivers/net/af_xdp/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/ark/version.map b/drivers/net/ark/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/ark/version.map
+++ b/drivers/net/ark/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/atlantic/version.map b/drivers/net/atlantic/version.map
index 6e17832684..d36fc61a84 100644
--- a/drivers/net/atlantic/version.map
+++ b/drivers/net/atlantic/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
 
diff --git a/drivers/net/avp/version.map b/drivers/net/avp/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/avp/version.map
+++ b/drivers/net/avp/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/axgbe/version.map b/drivers/net/axgbe/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/axgbe/version.map
+++ b/drivers/net/axgbe/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/bnx2x/version.map b/drivers/net/bnx2x/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/bnx2x/version.map
+++ b/drivers/net/bnx2x/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/bnxt/version.map b/drivers/net/bnxt/version.map
index a050d86ab7..2ba5ec5f6e 100644
--- a/drivers/net/bnxt/version.map
+++ b/drivers/net/bnxt/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_pmd_bnxt_get_vf_rx_status;
diff --git a/drivers/net/bonding/version.map b/drivers/net/bonding/version.map
index df81ee74c1..d7142c4f94 100644
--- a/drivers/net/bonding/version.map
+++ b/drivers/net/bonding/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_eth_bond_8023ad_agg_selection_get;
diff --git a/drivers/net/cnxk/version.map b/drivers/net/cnxk/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/cnxk/version.map
+++ b/drivers/net/cnxk/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/cxgbe/version.map b/drivers/net/cxgbe/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/cxgbe/version.map
+++ b/drivers/net/cxgbe/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/dpaa/version.map b/drivers/net/dpaa/version.map
index 87ce8f5b6c..338ea2d8b2 100644
--- a/drivers/net/dpaa/version.map
+++ b/drivers/net/dpaa/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_pmd_dpaa_set_tx_loopback;
diff --git a/drivers/net/e1000/version.map b/drivers/net/e1000/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/e1000/version.map
+++ b/drivers/net/e1000/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/ena/version.map b/drivers/net/ena/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/ena/version.map
+++ b/drivers/net/ena/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/enetc/version.map b/drivers/net/enetc/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/enetc/version.map
+++ b/drivers/net/enetc/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/enic/version.map b/drivers/net/enic/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/enic/version.map
+++ b/drivers/net/enic/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/failsafe/version.map b/drivers/net/failsafe/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/failsafe/version.map
+++ b/drivers/net/failsafe/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/fm10k/version.map b/drivers/net/fm10k/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/fm10k/version.map
+++ b/drivers/net/fm10k/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/hinic/version.map b/drivers/net/hinic/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/hinic/version.map
+++ b/drivers/net/hinic/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/hns3/version.map b/drivers/net/hns3/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/hns3/version.map
+++ b/drivers/net/hns3/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/i40e/version.map b/drivers/net/i40e/version.map
index 413c58cb21..5dd68158d3 100644
--- a/drivers/net/i40e/version.map
+++ b/drivers/net/i40e/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_pmd_i40e_add_vf_mac_addr;
diff --git a/drivers/net/iavf/version.map b/drivers/net/iavf/version.map
index 2a411da2e9..f3efe756cf 100644
--- a/drivers/net/iavf/version.map
+++ b/drivers/net/iavf/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
 
diff --git a/drivers/net/ice/version.map b/drivers/net/ice/version.map
index 632a296a0c..cc837f1c00 100644
--- a/drivers/net/ice/version.map
+++ b/drivers/net/ice/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
 
diff --git a/drivers/net/igc/version.map b/drivers/net/igc/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/igc/version.map
+++ b/drivers/net/igc/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/ionic/version.map b/drivers/net/ionic/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/ionic/version.map
+++ b/drivers/net/ionic/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/ipn3ke/version.map b/drivers/net/ipn3ke/version.map
index d8cc1026e0..568ce32e88 100644
--- a/drivers/net/ipn3ke/version.map
+++ b/drivers/net/ipn3ke/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
 
diff --git a/drivers/net/ixgbe/version.map b/drivers/net/ixgbe/version.map
index 9402802b04..bca5cc5826 100644
--- a/drivers/net/ixgbe/version.map
+++ b/drivers/net/ixgbe/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_pmd_ixgbe_bypass_event_show;
diff --git a/drivers/net/kni/version.map b/drivers/net/kni/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/kni/version.map
+++ b/drivers/net/kni/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/liquidio/version.map b/drivers/net/liquidio/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/liquidio/version.map
+++ b/drivers/net/liquidio/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/memif/version.map b/drivers/net/memif/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/memif/version.map
+++ b/drivers/net/memif/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/mlx4/version.map b/drivers/net/mlx4/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/mlx4/version.map
+++ b/drivers/net/mlx4/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/mlx5/version.map b/drivers/net/mlx5/version.map
index 82a32b53da..0af7a12488 100644
--- a/drivers/net/mlx5/version.map
+++ b/drivers/net/mlx5/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
 
diff --git a/drivers/net/mvneta/version.map b/drivers/net/mvneta/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/mvneta/version.map
+++ b/drivers/net/mvneta/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/mvpp2/version.map b/drivers/net/mvpp2/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/mvpp2/version.map
+++ b/drivers/net/mvpp2/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/netvsc/version.map b/drivers/net/netvsc/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/netvsc/version.map
+++ b/drivers/net/netvsc/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/nfb/version.map b/drivers/net/nfb/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/nfb/version.map
+++ b/drivers/net/nfb/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/nfp/version.map b/drivers/net/nfp/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/nfp/version.map
+++ b/drivers/net/nfp/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/ngbe/version.map b/drivers/net/ngbe/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/ngbe/version.map
+++ b/drivers/net/ngbe/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/null/version.map b/drivers/net/null/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/null/version.map
+++ b/drivers/net/null/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/octeontx/version.map b/drivers/net/octeontx/version.map
index 6dda72890c..d12156778e 100644
--- a/drivers/net/octeontx/version.map
+++ b/drivers/net/octeontx/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_octeontx_pchan_map;
diff --git a/drivers/net/octeontx2/version.map b/drivers/net/octeontx2/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/octeontx2/version.map
+++ b/drivers/net/octeontx2/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/octeontx_ep/version.map b/drivers/net/octeontx_ep/version.map
index 6e4fb220ac..c2e0723b4c 100644
--- a/drivers/net/octeontx_ep/version.map
+++ b/drivers/net/octeontx_ep/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
-        local: *;
+DPDK_22 {
+	local: *;
 };
diff --git a/drivers/net/pcap/version.map b/drivers/net/pcap/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/pcap/version.map
+++ b/drivers/net/pcap/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/pfe/version.map b/drivers/net/pfe/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/pfe/version.map
+++ b/drivers/net/pfe/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/qede/version.map b/drivers/net/qede/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/qede/version.map
+++ b/drivers/net/qede/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/ring/version.map b/drivers/net/ring/version.map
index 29770fe3e4..e43f5ea908 100644
--- a/drivers/net/ring/version.map
+++ b/drivers/net/ring/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_eth_from_ring;
diff --git a/drivers/net/sfc/version.map b/drivers/net/sfc/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/sfc/version.map
+++ b/drivers/net/sfc/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/softnic/version.map b/drivers/net/softnic/version.map
index 530d2e6b72..6784318f77 100644
--- a/drivers/net/softnic/version.map
+++ b/drivers/net/softnic/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_pmd_softnic_run;
diff --git a/drivers/net/szedata2/version.map b/drivers/net/szedata2/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/szedata2/version.map
+++ b/drivers/net/szedata2/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/tap/version.map b/drivers/net/tap/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/tap/version.map
+++ b/drivers/net/tap/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/thunderx/version.map b/drivers/net/thunderx/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/thunderx/version.map
+++ b/drivers/net/thunderx/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/txgbe/version.map b/drivers/net/txgbe/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/txgbe/version.map
+++ b/drivers/net/txgbe/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/vdev_netvsc/version.map b/drivers/net/vdev_netvsc/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/vdev_netvsc/version.map
+++ b/drivers/net/vdev_netvsc/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/vhost/version.map b/drivers/net/vhost/version.map
index 634255829e..1aa8abef75 100644
--- a/drivers/net/vhost/version.map
+++ b/drivers/net/vhost/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_eth_vhost_get_queue_event;
diff --git a/drivers/net/virtio/version.map b/drivers/net/virtio/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/virtio/version.map
+++ b/drivers/net/virtio/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/net/vmxnet3/version.map b/drivers/net/vmxnet3/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/net/vmxnet3/version.map
+++ b/drivers/net/vmxnet3/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/raw/cnxk_bphy/version.map b/drivers/raw/cnxk_bphy/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/raw/cnxk_bphy/version.map
+++ b/drivers/raw/cnxk_bphy/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/raw/dpaa2_cmdif/version.map b/drivers/raw/dpaa2_cmdif/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/raw/dpaa2_cmdif/version.map
+++ b/drivers/raw/dpaa2_cmdif/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/raw/dpaa2_qdma/version.map b/drivers/raw/dpaa2_qdma/version.map
index 9130383ab8..441918d55e 100644
--- a/drivers/raw/dpaa2_qdma/version.map
+++ b/drivers/raw/dpaa2_qdma/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_qdma_vq_stats;
diff --git a/drivers/raw/ifpga/version.map b/drivers/raw/ifpga/version.map
index 995c419a9b..a1a6be25a9 100644
--- a/drivers/raw/ifpga/version.map
+++ b/drivers/raw/ifpga/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
 
diff --git a/drivers/raw/ioat/version.map b/drivers/raw/ioat/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/raw/ioat/version.map
+++ b/drivers/raw/ioat/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/raw/ntb/version.map b/drivers/raw/ntb/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/raw/ntb/version.map
+++ b/drivers/raw/ntb/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/raw/octeontx2_dma/version.map b/drivers/raw/octeontx2_dma/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/raw/octeontx2_dma/version.map
+++ b/drivers/raw/octeontx2_dma/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/raw/octeontx2_ep/version.map b/drivers/raw/octeontx2_ep/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/raw/octeontx2_ep/version.map
+++ b/drivers/raw/octeontx2_ep/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/raw/skeleton/version.map b/drivers/raw/skeleton/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/raw/skeleton/version.map
+++ b/drivers/raw/skeleton/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/regex/mlx5/version.map b/drivers/regex/mlx5/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/regex/mlx5/version.map
+++ b/drivers/regex/mlx5/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/regex/octeontx2/version.map b/drivers/regex/octeontx2/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/regex/octeontx2/version.map
+++ b/drivers/regex/octeontx2/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/vdpa/ifc/version.map b/drivers/vdpa/ifc/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/vdpa/ifc/version.map
+++ b/drivers/vdpa/ifc/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/drivers/vdpa/mlx5/version.map b/drivers/vdpa/mlx5/version.map
index 4a76d1d52d..c2e0723b4c 100644
--- a/drivers/vdpa/mlx5/version.map
+++ b/drivers/vdpa/mlx5/version.map
@@ -1,3 +1,3 @@
-DPDK_21 {
+DPDK_22 {
 	local: *;
 };
diff --git a/lib/acl/version.map b/lib/acl/version.map
index d97f2927bf..2b18c21601 100644
--- a/lib/acl/version.map
+++ b/lib/acl/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_acl_add_rules;
diff --git a/lib/bitratestats/version.map b/lib/bitratestats/version.map
index 152730bb4e..c15e34d82c 100644
--- a/lib/bitratestats/version.map
+++ b/lib/bitratestats/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_stats_bitrate_calc;
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index b75a0034bc..0bf35f4876 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_bpf_destroy;
diff --git a/lib/cfgfile/version.map b/lib/cfgfile/version.map
index 180c42b717..02cbccb8ab 100644
--- a/lib/cfgfile/version.map
+++ b/lib/cfgfile/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_cfgfile_add_entry;
diff --git a/lib/cmdline/version.map b/lib/cmdline/version.map
index 9df0272152..980adb4f23 100644
--- a/lib/cmdline/version.map
+++ b/lib/cmdline/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	cirbuf_add_buf_head;
diff --git a/lib/cryptodev/version.map b/lib/cryptodev/version.map
index 9f04737aed..979d823a7c 100644
--- a/lib/cryptodev/version.map
+++ b/lib/cryptodev/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_crypto_aead_algorithm_strings;
diff --git a/lib/distributor/version.map b/lib/distributor/version.map
index 1ddcd01fe6..4d9ff07373 100644
--- a/lib/distributor/version.map
+++ b/lib/distributor/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_distributor_clear_returns;
diff --git a/lib/eal/version.map b/lib/eal/version.map
index 887012d02a..beeb986adc 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	__rte_panic;
diff --git a/lib/efd/version.map b/lib/efd/version.map
index 425c0a85a9..0226285245 100644
--- a/lib/efd/version.map
+++ b/lib/efd/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_efd_create;
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 44d30b05ae..3eece75b72 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_eth_add_first_rx_callback;
diff --git a/lib/eventdev/version.map b/lib/eventdev/version.map
index 7e264d3b8d..88625621ec 100644
--- a/lib/eventdev/version.map
+++ b/lib/eventdev/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_event_crypto_adapter_caps_get;
diff --git a/lib/gro/version.map b/lib/gro/version.map
index 19dc66b0d4..f8a32e221c 100644
--- a/lib/gro/version.map
+++ b/lib/gro/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_gro_ctx_create;
diff --git a/lib/gso/version.map b/lib/gso/version.map
index 60aa1b54e4..73767623b9 100644
--- a/lib/gso/version.map
+++ b/lib/gso/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_gso_segment;
diff --git a/lib/hash/version.map b/lib/hash/version.map
index 9b9519745c..ce4309aa07 100644
--- a/lib/hash/version.map
+++ b/lib/hash/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_fbk_hash_create;
diff --git a/lib/ip_frag/version.map b/lib/ip_frag/version.map
index 82b308ddb0..33f231fb31 100644
--- a/lib/ip_frag/version.map
+++ b/lib/ip_frag/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_ip_frag_free_death_row;
diff --git a/lib/ipsec/version.map b/lib/ipsec/version.map
index ad3e38b7c8..ba8753eac4 100644
--- a/lib/ipsec/version.map
+++ b/lib/ipsec/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_ipsec_pkt_crypto_group;
diff --git a/lib/jobstats/version.map b/lib/jobstats/version.map
index 3e166ad548..89faa02004 100644
--- a/lib/jobstats/version.map
+++ b/lib/jobstats/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_jobstats_abort;
diff --git a/lib/kni/version.map b/lib/kni/version.map
index bb810a7f2f..cc7790651a 100644
--- a/lib/kni/version.map
+++ b/lib/kni/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_kni_alloc;
diff --git a/lib/kvargs/version.map b/lib/kvargs/version.map
index ce8a9175dd..a07166b4d2 100644
--- a/lib/kvargs/version.map
+++ b/lib/kvargs/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_kvargs_count;
diff --git a/lib/latencystats/version.map b/lib/latencystats/version.map
index 0c4360ab43..be5b014cd7 100644
--- a/lib/latencystats/version.map
+++ b/lib/latencystats/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_latencystats_get;
diff --git a/lib/lpm/version.map b/lib/lpm/version.map
index b4d437cc75..0cdd04822e 100644
--- a/lib/lpm/version.map
+++ b/lib/lpm/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_lpm6_add;
diff --git a/lib/mbuf/version.map b/lib/mbuf/version.map
index b7d98e7eb1..29654330eb 100644
--- a/lib/mbuf/version.map
+++ b/lib/mbuf/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	__rte_pktmbuf_linearize;
diff --git a/lib/member/version.map b/lib/member/version.map
index b8c6322e73..f287aabc91 100644
--- a/lib/member/version.map
+++ b/lib/member/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_member_add;
diff --git a/lib/mempool/version.map b/lib/mempool/version.map
index 50b0602952..9f77da6fff 100644
--- a/lib/mempool/version.map
+++ b/lib/mempool/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_mempool_audit;
diff --git a/lib/meter/version.map b/lib/meter/version.map
index b67f860b15..befa3b7e32 100644
--- a/lib/meter/version.map
+++ b/lib/meter/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_meter_srtcm_config;
diff --git a/lib/metrics/version.map b/lib/metrics/version.map
index 20f99cd19a..c86e405971 100644
--- a/lib/metrics/version.map
+++ b/lib/metrics/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_metrics_get_names;
diff --git a/lib/net/version.map b/lib/net/version.map
index 621f237945..355b7c25b4 100644
--- a/lib/net/version.map
+++ b/lib/net/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_eth_random_addr;
diff --git a/lib/pci/version.map b/lib/pci/version.map
index 1db19a5122..3f38303749 100644
--- a/lib/pci/version.map
+++ b/lib/pci/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_pci_addr_cmp;
diff --git a/lib/pdump/version.map b/lib/pdump/version.map
index 2f9e952d0b..f0a9d12c9a 100644
--- a/lib/pdump/version.map
+++ b/lib/pdump/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_pdump_disable;
diff --git a/lib/pipeline/version.map b/lib/pipeline/version.map
index ff0974c2ee..2b68f584a4 100644
--- a/lib/pipeline/version.map
+++ b/lib/pipeline/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_pipeline_ah_packet_drop;
diff --git a/lib/port/version.map b/lib/port/version.map
index 70922e11ee..73d0825d2e 100644
--- a/lib/port/version.map
+++ b/lib/port/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_port_ethdev_reader_ops;
diff --git a/lib/power/version.map b/lib/power/version.map
index b004e3e4a9..6ec6d5d96d 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_power_exit;
diff --git a/lib/rawdev/version.map b/lib/rawdev/version.map
index eb29a3ac0d..4f56870761 100644
--- a/lib/rawdev/version.map
+++ b/lib/rawdev/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_rawdev_close;
diff --git a/lib/rcu/version.map b/lib/rcu/version.map
index 82e55c6329..b63c74f856 100644
--- a/lib/rcu/version.map
+++ b/lib/rcu/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_rcu_log_type;
diff --git a/lib/reorder/version.map b/lib/reorder/version.map
index d902a7fa12..250e6664f5 100644
--- a/lib/reorder/version.map
+++ b/lib/reorder/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_reorder_create;
diff --git a/lib/ring/version.map b/lib/ring/version.map
index e35d6b9712..3377293ee4 100644
--- a/lib/ring/version.map
+++ b/lib/ring/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_ring_create;
diff --git a/lib/sched/version.map b/lib/sched/version.map
index ace284b7de..53c337b143 100644
--- a/lib/sched/version.map
+++ b/lib/sched/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_approx;
diff --git a/lib/security/version.map b/lib/security/version.map
index 22775558c8..c44c7f5f60 100644
--- a/lib/security/version.map
+++ b/lib/security/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_security_attach_session;
diff --git a/lib/stack/version.map b/lib/stack/version.map
index 8c4ca0245d..e145e32451 100644
--- a/lib/stack/version.map
+++ b/lib/stack/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_stack_create;
diff --git a/lib/table/version.map b/lib/table/version.map
index 29301480cb..65f9645d25 100644
--- a/lib/table/version.map
+++ b/lib/table/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_table_acl_ops;
diff --git a/lib/timer/version.map b/lib/timer/version.map
index 8021ccf9cf..4b782456da 100644
--- a/lib/timer/version.map
+++ b/lib/timer/version.map
@@ -1,4 +1,4 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
 	rte_timer_alt_dump_stats;
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index e2504ba657..c92a9d4962 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -1,12 +1,26 @@
-DPDK_21 {
+DPDK_22 {
 	global:
 
+	rte_vdpa_find_device_by_name;
+	rte_vdpa_get_features;
+	rte_vdpa_get_protocol_features;
+	rte_vdpa_get_queue_num;
+	rte_vdpa_get_rte_device;
+	rte_vdpa_get_stats;
+	rte_vdpa_get_stats_names;
+	rte_vdpa_register_device;
+	rte_vdpa_relay_vring_used;
+	rte_vdpa_reset_stats;
+	rte_vdpa_unregister_device;
 	rte_vhost_avail_entries;
 	rte_vhost_dequeue_burst;
+	rte_vhost_driver_attach_vdpa_device;
 	rte_vhost_driver_callback_register;
+	rte_vhost_driver_detach_vdpa_device;
 	rte_vhost_driver_disable_features;
 	rte_vhost_driver_enable_features;
 	rte_vhost_driver_get_features;
+	rte_vhost_driver_get_vdpa_device;
 	rte_vhost_driver_register;
 	rte_vhost_driver_set_features;
 	rte_vhost_driver_start;
@@ -14,37 +28,23 @@ DPDK_21 {
 	rte_vhost_enable_guest_notification;
 	rte_vhost_enqueue_burst;
 	rte_vhost_get_ifname;
+	rte_vhost_get_log_base;
 	rte_vhost_get_mem_table;
 	rte_vhost_get_mtu;
 	rte_vhost_get_negotiated_features;
 	rte_vhost_get_numa_node;
 	rte_vhost_get_queue_num;
+	rte_vhost_get_vdpa_device;
 	rte_vhost_get_vhost_vring;
+	rte_vhost_get_vring_base;
 	rte_vhost_get_vring_num;
 	rte_vhost_gpa_to_vva;
+	rte_vhost_host_notifier_ctrl;
 	rte_vhost_log_used_vring;
 	rte_vhost_log_write;
 	rte_vhost_rx_queue_count;
-	rte_vhost_vring_call;
-	rte_vhost_get_log_base;
-	rte_vhost_get_vring_base;
 	rte_vhost_set_vring_base;
-	rte_vhost_host_notifier_ctrl;
-	rte_vdpa_register_device;
-	rte_vdpa_unregister_device;
-	rte_vdpa_get_stats_names;
-	rte_vdpa_get_stats;
-	rte_vdpa_reset_stats;
-	rte_vhost_driver_attach_vdpa_device;
-	rte_vhost_driver_detach_vdpa_device;
-	rte_vhost_driver_get_vdpa_device;
-	rte_vhost_get_vdpa_device;
-	rte_vdpa_find_device_by_name;
-	rte_vdpa_get_rte_device;
-	rte_vdpa_get_queue_num;
-	rte_vdpa_get_features;
-	rte_vdpa_get_protocol_features;
-	rte_vdpa_relay_vring_used;
+	rte_vhost_vring_call;
 
 	local: *;
 };
-- 
2.31.1



* Re: [dpdk-dev] [dpdk-announce] DPDK 21.08 released
  2021-08-08 17:46  3% [dpdk-dev] [dpdk-announce] DPDK 21.08 released Thomas Monjalon
@ 2021-08-08 17:50  0% ` St Leger, Jim
  0 siblings, 0 replies; 200+ results
From: St Leger, Jim @ 2021-08-08 17:50 UTC (permalink / raw)
  To: dev

Nice work by all! (This release should be called the Olympic Release, out just as the Tokyo 2020 games are concluding.)

Now go off and enjoy some well-earned summer holidays. 

Stay safe,
Jim


> On Aug 8, 2021, at 10:47, Thomas Monjalon <thomas@monjalon.net> wrote:
> 
> A new release is available:
>    https://fast.dpdk.org/rel/dpdk-21.08.tar.xz
> 
> Summer release numbers:
>    922 commits from 159 authors
>    1069 files changed, 150746 insertions(+), 85146 deletions(-)
> 
> It is not planned to start a maintenance branch for 21.08.
> This version is ABI-compatible with 20.11, 21.02 and 21.05.
> 
> Below are some new features:
>    - Linux auxiliary bus
>    - Aarch32 cross-compilation
>    - Arm CPPC power management
>    - Rx multi-queue monitoring for power management
>    - XZ compressed firmware read
>    - Marvell CNXK drivers for ethernet, crypto and baseband PHY
>    - Wangxun ngbe ethernet driver
>    - NVIDIA mlx5 crypto driver supporting AES-XTS
>    - ISAL compress support on Arm
> 
> More details in the release notes:
>    https://doc.dpdk.org/guides/rel_notes/release_21_08.html
> 
> 
> There are 30 new contributors (including authors, reviewers and testers).
> Welcome to Aakash Sasidharan, Aman Deep Singh, Cheng Liu, Chenglian Sun,
> Conor Fogarty, Douglas Flint, Gaoxiang Liu, Ghalem Boudour,
> Gordon Noonan, Heng Wang, Henry Nadeau, James Grant, Jeffrey Huang,
> Jochen Behrens, John Levon, Lior Margalit, Martin Havlik,
> Naga Harish K S V, Nathan Skrzypczak, Owen Hilyard, Paulis Gributs,
> Raja Zidane, Rebecca Troy, Rob Scheepens, Rongwei Liu, Shai Brandes,
> Srujana Challa, Tudor Cornea, Vanshika Shukla, and Yixue Wang.
> 
> Below is the number of commits per employer (with authors count):
>    222     Marvell (22)
>    183     NVIDIA (26)
>    168     Intel (44)
>    100     Broadcom (12)
>     45     OKTET Labs (5)
>     36     Huawei (7)
>     35     Arm (7)
>     29     Red Hat (5)
>     20     Trustnet (1)
>     17     6WIND (3)
>     13     Microsoft (2)
>      8     NXP (4)
>      7     Semihalf (1)
>      5     UNH (2)
>      5     PANTHEON.tech (1)
>      4     Chelsio (1)
>      3     IBM (1)
> 
> Based on Reviewed-by and Acked-by tags, the top non-PMD reviewers are:
>    45     Akhil Goyal <gakhil@marvell.com>
>    34     Jerin Jacob <jerinj@marvell.com>
>    21     Ruifeng Wang <ruifeng.wang@arm.com>
>    20     Ajit Khaparde <ajit.khaparde@broadcom.com>
>    19     Matan Azrad <matan@nvidia.com>
>    19     Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>    17     Konstantin Ananyev <konstantin.ananyev@intel.com>
>    15     Chenbo Xia <chenbo.xia@intel.com>
>    14     Maxime Coquelin <maxime.coquelin@redhat.com>
>    14     David Marchand <david.marchand@redhat.com>
>    13     Viacheslav Ovsiienko <viacheslavo@nvidia.com>
>    11     Thomas Monjalon <thomas@monjalon.net>
>     9     Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
>     8     Stephen Hemminger <stephen@networkplumber.org>
>     8     Bruce Richardson <bruce.richardson@intel.com>
> 
> 
> DPDK 21.11 will be a big and busy release.
> The new features for 21.11 can be submitted during one month:
>    http://core.dpdk.org/roadmap#dates
> Please share your features roadmap.
> 
> Thanks everyone
> 
> 


* [dpdk-dev] [dpdk-announce] DPDK 21.08 released
@ 2021-08-08 17:46  3% Thomas Monjalon
  2021-08-08 17:50  0% ` St Leger, Jim
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2021-08-08 17:46 UTC (permalink / raw)
  To: announce

A new release is available:
	https://fast.dpdk.org/rel/dpdk-21.08.tar.xz

Summer release numbers:
	922 commits from 159 authors
	1069 files changed, 150746 insertions(+), 85146 deletions(-)

It is not planned to start a maintenance branch for 21.08.
This version is ABI-compatible with 20.11, 21.02 and 21.05.

Below are some new features:
	- Linux auxiliary bus
	- Aarch32 cross-compilation
	- Arm CPPC power management
	- Rx multi-queue monitoring for power management
	- XZ compressed firmware read
	- Marvell CNXK drivers for ethernet, crypto and baseband PHY
	- Wangxun ngbe ethernet driver
	- NVIDIA mlx5 crypto driver supporting AES-XTS
	- ISAL compress support on Arm

More details in the release notes:
	https://doc.dpdk.org/guides/rel_notes/release_21_08.html


There are 30 new contributors (including authors, reviewers and testers).
Welcome to Aakash Sasidharan, Aman Deep Singh, Cheng Liu, Chenglian Sun,
Conor Fogarty, Douglas Flint, Gaoxiang Liu, Ghalem Boudour,
Gordon Noonan, Heng Wang, Henry Nadeau, James Grant, Jeffrey Huang,
Jochen Behrens, John Levon, Lior Margalit, Martin Havlik,
Naga Harish K S V, Nathan Skrzypczak, Owen Hilyard, Paulis Gributs,
Raja Zidane, Rebecca Troy, Rob Scheepens, Rongwei Liu, Shai Brandes,
Srujana Challa, Tudor Cornea, Vanshika Shukla, and Yixue Wang.

Below is the number of commits per employer (with authors count):
	222     Marvell (22)
	183     NVIDIA (26)
	168     Intel (44)
	100     Broadcom (12)
	 45     OKTET Labs (5)
	 36     Huawei (7)
	 35     Arm (7)
	 29     Red Hat (5)
	 20     Trustnet (1)
	 17     6WIND (3)
	 13     Microsoft (2)
	  8     NXP (4)
	  7     Semihalf (1)
	  5     UNH (2)
	  5     PANTHEON.tech (1)
	  4     Chelsio (1)
	  3     IBM (1)

Based on Reviewed-by and Acked-by tags, the top non-PMD reviewers are:
	45     Akhil Goyal <gakhil@marvell.com>
	34     Jerin Jacob <jerinj@marvell.com>
	21     Ruifeng Wang <ruifeng.wang@arm.com>
	20     Ajit Khaparde <ajit.khaparde@broadcom.com>
	19     Matan Azrad <matan@nvidia.com>
	19     Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
	17     Konstantin Ananyev <konstantin.ananyev@intel.com>
	15     Chenbo Xia <chenbo.xia@intel.com>
	14     Maxime Coquelin <maxime.coquelin@redhat.com>
	14     David Marchand <david.marchand@redhat.com>
	13     Viacheslav Ovsiienko <viacheslavo@nvidia.com>
	11     Thomas Monjalon <thomas@monjalon.net>
	 9     Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
	 8     Stephen Hemminger <stephen@networkplumber.org>
	 8     Bruce Richardson <bruce.richardson@intel.com>


DPDK 21.11 will be a big and busy release.
The new features for 21.11 can be submitted during one month:
	http://core.dpdk.org/roadmap#dates
Please share your features roadmap.

Thanks everyone




* [dpdk-dev] [PATCH v3 5/5] devtools: test different build types
  @ 2021-08-08 12:51 23%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2021-08-08 12:51 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, david.marchand, Andrew Rybchenko

All builds were of type debugoptimized.
It is kept only for builds having an ABI check.
Others will have the default build type (release),
except if specified differently as in the x86 generic build
which will be a test of the non-optimized debug build type.
Some static builds will test the minsize build type.
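
The selection rule described above can be sketched as follows. This is an
illustrative Python rendering only; the real logic lives in the POSIX shell
script devtools/test-meson-builds.sh, and the function name
meson_buildtype_args is hypothetical.

```python
def meson_buildtype_args(abicheck, extra_options=()):
    """Return meson options for one build, following the described policy."""
    options = list(extra_options)
    if abicheck == "ABI":
        # Builds with an ABI check keep the debugoptimized build type.
        options.append("--buildtype=debugoptimized")
    # Otherwise nothing is added here: the default build type applies
    # unless the caller passed an explicit --buildtype (minsize, debug).
    return options


# Shared build with ABI check, static build testing minsize.
print(meson_buildtype_args("ABI", ["--default-library=shared"]))
print(meson_buildtype_args("skipABI", ["--buildtype=minsize",
                                       "--default-library=static"]))
```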

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>

---

This patch cannot be merged now because it makes clang 11.1.0 crash.
---
 devtools/test-meson-builds.sh | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/devtools/test-meson-builds.sh b/devtools/test-meson-builds.sh
index 9ec8e2bc7e..7bd305a669 100755
--- a/devtools/test-meson-builds.sh
+++ b/devtools/test-meson-builds.sh
@@ -92,13 +92,16 @@ load_env () # <target compiler>
 	command -v $targetcc >/dev/null 2>&1 || return 1
 }
 
-config () # <dir> <builddir> <meson options>
+config () # <dir> <builddir> <ABI check> <meson options>
 {
 	dir=$1
 	shift
 	builddir=$1
 	shift
+	abicheck=$1
+	shift
 	if [ -f "$builddir/build.ninja" ] ; then
+		[ $abicheck = ABI ] || return 0
 		# for existing environments, switch to debugoptimized if unset
 		# so that ABI checks can run
 		if ! $MESON configure $builddir |
@@ -114,7 +117,9 @@ config () # <dir> <builddir> <meson options>
 	else
 		options="$options -Dexamples=l3fwd" # save disk space
 	fi
-	options="$options --buildtype=debugoptimized"
+	if [ $abicheck = ABI ] ; then
+		options="$options --buildtype=debugoptimized"
+	fi
 	for option in $DPDK_MESON_OPTIONS ; do
 		options="$options -D$option"
 	done
@@ -165,7 +170,7 @@ build () # <directory> <target cc | cross file> <ABI check> [meson options]
 		cross=
 	fi
 	load_env $targetcc || return 0
-	config $srcdir $builds_dir/$targetdir $cross --werror $*
+	config $srcdir $builds_dir/$targetdir $abicheck $cross --werror $*
 	compile $builds_dir/$targetdir
 	if [ -n "$DPDK_ABI_REF_VERSION" -a "$abicheck" = ABI ] ; then
 		abirefdir=${DPDK_ABI_REF_DIR:-reference}/$DPDK_ABI_REF_VERSION
@@ -179,7 +184,7 @@ build () # <directory> <target cc | cross file> <ABI check> [meson options]
 			fi
 
 			rm -rf $abirefdir/build
-			config $abirefdir/src $abirefdir/build $cross \
+			config $abirefdir/src $abirefdir/build $abicheck $cross \
 				-Dexamples= $*
 			compile $abirefdir/build
 			install_target $abirefdir/build $abirefdir/$targetdir
@@ -211,11 +216,13 @@ for c in gcc clang ; do
 	for s in static shared ; do
 		if [ $s = shared ] ; then
 			abicheck=ABI
+			buildtype=
 		else
 			abicheck=skipABI # save time and disk space
+			buildtype='--buildtype=minsize'
 		fi
 		export CC="$CCACHE $c"
-		build build-$c-$s $c $abicheck --default-library=$s
+		build build-$c-$s $c $abicheck $buildtype --default-library=$s
 		unset CC
 	done
 done
@@ -227,7 +234,7 @@ generic_isa='nehalem'
 if ! check_cc_flags "-march=$generic_isa" ; then
 	generic_isa='corei7'
 fi
-build build-x86-generic cc skipABI -Dcheck_includes=true \
+build build-x86-generic cc skipABI --buildtype=debug -Dcheck_includes=true \
 	-Dlibdir=lib -Dcpu_instruction_set=$generic_isa $use_shared
 
 # 32-bit with default compiler
-- 
2.31.1



* [dpdk-dev] [PATCH v8 2/2] devtools: script to send notifications of expired symbols
    2021-08-06 17:54  5%   ` [dpdk-dev] [PATCH v8 1/2] devtools: script to track map symbols Ray Kinsella
@ 2021-08-06 17:54  5%   ` Ray Kinsella
  1 sibling, 0 replies; 200+ results
From: Ray Kinsella @ 2021-08-06 17:54 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, stephen, ferruh.yigit, thomas, ktraynor, mdr

Use this script with the output of the DPDK symbol tool, to notify
maintainers of expired symbols by email. You need to define the environment
variable DPDK_GETMAINTAINER_PATH, for this tool to work.

Use terminal output to review the emails before sending.
e.g.
$ devtools/symbol-tool.py list-expired --format-output csv \
| DPDK_GETMAINTAINER_PATH=<somewhere>/get_maintainer.pl \
devtools/notify-symbol-maintainers.py --format-output terminal

Then use email output to send the emails to the maintainers.
e.g.
$ devtools/symbol-tool.py list-expired --format-output csv \
| DPDK_GETMAINTAINER_PATH=<somewhere>/get_maintainer.pl \
devtools/notify-symbol-maintainers.py --format-output email \
--smtp-server <server> --sender <someone@somewhere.com> \
--password <password>
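
The pipelines above feed "library,symbol" CSV rows into the notification
script, which builds one message per library. A minimal sketch of that
grouping step, assuming rows for the same library arrive consecutively and
that the header row starts with "mapfile". The function name
group_by_library and the sample rows are illustrative only, not part of the
patch.

```python
from itertools import groupby


def group_by_library(lines):
    """Yield (library, [symbols]) for consecutive 'library,symbol' rows."""
    def rows():
        for line in lines:
            # Split on the first comma only, as the symbol may contain more.
            library, _, symbol = line.rstrip("\n").partition(",")
            if library and library != "mapfile":  # skip blanks and header
                yield library, symbol

    for library, group in groupby(rows(), key=lambda row: row[0]):
        yield library, [symbol for _, symbol in group]


# Hypothetical input, shaped like the CSV output described above.
sample = [
    "mapfile,expired\n",
    "lib/foo,rte_foo_peek\n",
    "lib/foo,rte_foo_dump\n",
    "lib/bar,rte_bar_get\n",
]
for library, symbols in group_by_library(sample):
    print(library, symbols)
```

Grouping this way never emits a message for an empty library name, which a
hand-written "last library seen" loop has to guard against explicitly.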

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
 devtools/notify-symbol-maintainers.py | 224 ++++++++++++++++++++++++++
 1 file changed, 224 insertions(+)
 create mode 100755 devtools/notify-symbol-maintainers.py

diff --git a/devtools/notify-symbol-maintainers.py b/devtools/notify-symbol-maintainers.py
new file mode 100755
index 0000000000..447f88bb03
--- /dev/null
+++ b/devtools/notify-symbol-maintainers.py
@@ -0,0 +1,224 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2021 Intel Corporation
+'''Tool to notify maintainers of expired symbols'''
+import smtplib
+import ssl
+import sys
+import subprocess
+import argparse
+from argparse import RawTextHelpFormatter
+import time
+from email.message import EmailMessage
+
+DESCRIPTION = '''
+Use this script with the output of the DPDK symbol tool, to notify maintainers
+of expired symbols by email. You need to define the environment variable
+DPDK_GETMAINTAINER_PATH, for this tool to work.
+
+Use terminal output to review the emails before sending.
+e.g.
+$ devtools/symbol-tool.py list-expired --format-output csv \\
+| DPDK_GETMAINTAINER_PATH=<somewhere>/get_maintainer.pl \\
+devtools/notify-symbol-maintainers.py --format-output terminal
+
+Then use email output to send the emails to the maintainers.
+e.g.
+$ devtools/symbol-tool.py list-expired --format-output csv | \\
+DPDK_GETMAINTAINER_PATH=<somewhere>/get_maintainer.pl devtools/notify-symbol-maintainers.py \\
+--format-output email --smtp-server <server> --sender <someone@somewhere.com> --password <password>
+'''
+
+EMAIL_TEMPLATE = '''Hi there,
+
+Please note the symbols listed below have expired. In line with the DPDK ABI
+policy, they should be scheduled for removal, in the next DPDK release.
+
+For more information, please see the DPDK ABI Policy, section 3.5.3.
+https://doc.dpdk.org/guides/contributing/abi_policy.html
+
+Thanks,
+
+The DPDK Symbol Bot
+
+'''
+
+default_maintainers = ['Ray Kinsella <mdr@ashroe.eu>', \
+                       'Thomas Monjalon <thomas@monjalon.net>']
+get_maintainer = ['devtools/get-maintainer.sh', \
+                  '--email', '-f']
+
+def get_maintainers(libpath):
+    '''Get the maintainers for given library'''
+    try:
+        cmd = get_maintainer + [libpath]
+        result = subprocess.run(cmd, \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        result = None
+
+    if result is not None:
+        email = result.stdout.decode('utf-8')
+        if email == '':
+            email = default_maintainers
+        else:
+            email = list(filter(None,email.split('\n')))
+    else:
+        email = default_maintainers
+
+    return email
+
+def get_message(library, symbols):
+    '''Build email message from symbols, config and maintainers'''
+    message = {}
+    maintainers = get_maintainers(library)
+
+    message['To'] = maintainers
+    if maintainers != default_maintainers:
+        message['CC'] = default_maintainers
+
+    message['Subject'] = 'Expired symbols in {}\n'.format(library)
+
+    body = EMAIL_TEMPLATE
+    for sym in symbols:
+        body += ('{}\n'.format(sym))
+
+    message['Body'] = body
+
+    return message
+
+class OutputEmail():
+    '''Format the output for email'''
+    def __init__(self, config):
+        self.config = config
+
+        self.terminal = OutputTerminal(config)
+        context = ssl.create_default_context()
+
+        # Try to log in to server and send email
+        try:
+            self.server = smtplib.SMTP(config['smtp_server'], 587)
+            self.server.starttls(context=context) # Secure the connection
+            self.server.login(config['sender'], config['password'])
+        except Exception as exception:
+            print(exception)
+            raise exception
+
+    def message(self,message):
+        '''send email'''
+        self.terminal.message(message)
+
+        msg = EmailMessage()
+        msg.set_content(message.pop('Body'))
+
+        for key in message.keys():
+            msg[key] = message[key]
+
+        msg['From'] = self.config['sender']
+        msg['Reply-To'] = 'no-reply@dpdk.org'
+
+        self.server.send_message(msg)
+
+        time.sleep(1)
+
+    def __del__(self):
+        self.server.quit()
+
+class OutputTerminal(): # pylint: disable=too-few-public-methods
+    '''Format the output for the terminal'''
+    def __init__(self, config):
+        self.config = config
+
+    def message(self,message):
+        '''Print email to terminal'''
+        terminal = 'To:' + ', '.join(message['To']) + '\n'
+        if 'sender' in self.config.keys():
+            terminal += 'From:' + self.config['sender'] + '\n'
+
+        terminal += 'Reply-To:' + 'no-reply@dpdk.org' + '\n'
+        if 'CC' in message.keys():
+            terminal += 'CC:' + ', '.join(message['CC']) + '\n'
+
+        terminal += 'Subject:' + message['Subject'] + '\n'
+        terminal += 'Body:' + message['Body'] + '\n'
+
+        print(terminal)
+        print('-' * 80)
+
+def parse_config(args):
+    '''put the command line args in the right places'''
+    config = {}
+    error_msg = None
+
+    outputs = {
+        None : OutputTerminal,
+        'terminal' : OutputTerminal,
+        'email' : OutputEmail
+    }
+
+    if args.format_output == 'email':
+        if args.smtp_server is None:
+            error_msg = 'SMTP server'
+        else:
+            config['smtp_server'] = args.smtp_server
+
+        if args.sender is None:
+            error_msg = 'sender'
+        else:
+            config['sender'] = args.sender
+
+        if args.password is None:
+            error_msg = 'password'
+        else:
+            config['password'] = args.password
+
+    if error_msg is not None:
+        print('Please specify a {} for email output'.format(error_msg))
+        return None
+
+    config['output'] = outputs[args.format_output]
+    return config
+
+def main():
+    '''Main entry point'''
+    parser = argparse.ArgumentParser(description=DESCRIPTION, \
+                                     formatter_class=RawTextHelpFormatter)
+    parser.add_argument('--format-output', choices=['terminal','email'], \
+                        default='terminal')
+    parser.add_argument('--smtp-server')
+    parser.add_argument('--password')
+    parser.add_argument('--sender')
+
+    args = parser.parse_args()
+    config = parse_config(args)
+    if config is None:
+        return
+
+    symbols = []
+    lastlib = library = ''
+
+    output = config['output'](config)
+
+    for line in sys.stdin:
+        line = line.rstrip('\n')
+        library, symbol = [line[:line.find(',')], \
+                           line[line.find(',') + 1: len(line)]]
+        if library == 'mapfile':
+            continue
+
+        if lastlib and library != lastlib:
+            message = get_message(lastlib, symbols)
+            output.message(message)
+            symbols = []
+
+        lastlib = library
+        symbols = symbols + [symbol]
+
+    # send the message for the last library, if any symbols were read
+    if symbols:
+        output.message(get_message(lastlib, symbols))
+
+if __name__ == '__main__':
+    main()
-- 
2.26.2



* [dpdk-dev] [PATCH v8 1/2] devtools: script to track map symbols
  @ 2021-08-06 17:54  5%   ` Ray Kinsella
  2021-08-06 17:54  5%   ` [dpdk-dev] [PATCH v8 2/2] devtools: script to send notifications of expired symbols Ray Kinsella
  1 sibling, 0 replies; 200+ results
From: Ray Kinsella @ 2021-08-06 17:54 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, stephen, ferruh.yigit, thomas, ktraynor, mdr

This script tracks the growth of stable and experimental symbols
over releases since v19.11. It can count the symbols added
between two DPDK releases, and list the experimental symbols that
are present in both of two DPDK releases (expired symbols).

example usages:

Count symbols added since v19.11
$ devtools/symbol-tool.py count-symbols

Count symbols added since v20.11
$ devtools/symbol-tool.py count-symbols --releases v20.11,v21.05

List experimental symbols present in v20.11 and v21.05
$ devtools/symbol-tool.py list-expired --releases v20.11,v21.05

List experimental symbols in libraries only, present since v19.11
$ devtools/symbol-tool.py list-expired --directory lib
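
As a rough illustration of what "expired" means here: a symbol counts as
expired when it is still experimental in both releases being compared. A
minimal sketch, assuming each release's map file has already been parsed
into a dict from ABI section name to symbol list. The function name
expired_symbols and the sample symbol names are hypothetical, not taken
from the script.

```python
def expired_symbols(old_abi, new_abi):
    """Return the symbols that are experimental in both releases, sorted."""
    old_exp = set(old_abi.get("EXPERIMENTAL", []))
    new_exp = set(new_abi.get("EXPERIMENTAL", []))
    return sorted(old_exp & new_exp)


# Hypothetical parsed map files for two releases of one library.
v20_11 = {
    "DPDK_21": ["rte_foo_create"],
    "EXPERIMENTAL": ["rte_foo_peek", "rte_foo_dump"],
}
v21_05 = {
    "DPDK_21": ["rte_foo_create", "rte_foo_dump"],
    "EXPERIMENTAL": ["rte_foo_peek", "rte_foo_stats"],
}

print(expired_symbols(v20_11, v21_05))
```

With this sample data only rte_foo_peek is reported: rte_foo_dump was
promoted to stable and rte_foo_stats is new in the later release.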

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
 devtools/symbol-tool.py | 402 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 402 insertions(+)
 create mode 100755 devtools/symbol-tool.py

diff --git a/devtools/symbol-tool.py b/devtools/symbol-tool.py
new file mode 100755
index 0000000000..39727c9a32
--- /dev/null
+++ b/devtools/symbol-tool.py
@@ -0,0 +1,402 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2021 Intel Corporation
+'''Tool to count or list symbols in each DPDK release'''
+from pathlib import Path
+import sys
+import os
+import subprocess
+import argparse
+from argparse import RawTextHelpFormatter
+import re
+import datetime
+try:
+    from parsley import makeGrammar
+except ImportError:
+    print('This script uses the package Parsley to parse C Mapfiles.\n'
+          'This can be installed with "pip install parsley".')
+    sys.exit(1)
+
+DESCRIPTION = '''
+This script tracks the growth of stable and experimental symbols
+over releases since v19.11. The script can count the symbols
+added between two DPDK releases, and list the experimental
+symbols that are present in both of two DPDK releases
+(expired symbols).
+
+Example usage:
+
+Count symbols added since v19.11
+$ devtools/symbol-tool.py count-symbols
+
+Count symbols added since v20.11
+$ devtools/symbol-tool.py count-symbols --releases v20.11,v21.05
+
+List experimental symbols present in v20.11 and v21.05
+$ devtools/symbol-tool.py list-expired --releases v20.11,v21.05
+
+List experimental symbols in libraries only, present since v19.11
+$ devtools/symbol-tool.py list-expired --directory lib
+'''
+
+MAP_GRAMMAR = r"""
+
+ws = (' ' | '\r' | '\n' | '\t')*
+
+ABI_VER = ({})
+DPDK_VER = ('DPDK_' ABI_VER)
+ABI_NAME = ('INTERNAL' | 'EXPERIMENTAL' | DPDK_VER)
+comment = '#' (~'\n' anything)+ '\n'
+symbol = (~(';' | '}}' | '#') anything )+:c ';' -> ''.join(c)
+global = 'global:'
+local = 'local: *;'
+symbols = comment* symbol:s ws comment* -> s
+
+abi = (abi_section+):m -> dict(m)
+abi_section = (ws ABI_NAME:e ws '{{' ws global* (~local ws symbols)*:s ws local* ws '}}' ws DPDK_VER* ';' ws) -> (e,s)
+"""
+
+def get_abi_versions():
+    '''Returns a string of possible dpdk abi versions'''
+
+    year = datetime.date.today().year - 2000
+    tags = " |".join(['\'{}\''.format(i) \
+                     for i in reversed(range(21, year + 1)) ])
+    tags  = tags + ' | \'20.0.1\' | \'20.0\' | \'20\''
+
+    return tags
+
+def get_dpdk_releases():
+    '''Returns a list of DPDK release tag names since v19.11'''
+
+    year = datetime.date.today().year - 2000
+    year_range = "|".join("{}".format(i) for i in range(19,year + 1))
+    pattern = re.compile(r'^\"v(' +  year_range + r')\.\d{2}\"$')
+
+    cmd = ['git', 'for-each-ref', '--sort=taggerdate', '--format', '"%(tag)"']
+    try:
+        result = subprocess.run(cmd, \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        print("Failed to interrogate git for release tags")
+        sys.exit(1)
+
+
+    tags = result.stdout.decode('utf-8').split('\n')
+
+    # find the non-rcs between now and v19.11
+    tags = [ tag.replace('\"','') \
+             for tag in reversed(tags) \
+             if pattern.match(tag) ][:-3]
+
+    return tags
+
+def fix_directory_name(path):
+    '''Prepend librte_ to the source directory name'''
+    mapfilepath1 = str(path.parent.name)
+    mapfilepath2 = str(path.parents[1])
+    mapfilepath = mapfilepath2 + '/librte_' + mapfilepath1
+
+    return mapfilepath
+
+def directory_renamed(path, rel):
+    '''Fix removal of the librte_ from the directory names'''
+
+    mapfilepath = fix_directory_name(path)
+    tagfile = '{}:{}/{}'.format(rel, mapfilepath,  path.name)
+
+    try:
+        result = subprocess.run(['git', 'show', tagfile], \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        result = None
+
+    return result
+
+def mapfile_renamed(path, rel):
+    '''Fix renaming of the map file'''
+    newfile = None
+
+    result = subprocess.run(['git', 'ls-tree', \
+                             rel, str(path.parent) + '/'], \
+                            stdout=subprocess.PIPE, \
+                            stderr=subprocess.PIPE,
+                            check=True)
+    dentries = result.stdout.decode('utf-8')
+    dentries = dentries.split('\n')
+
+    # filter entries looking for the map file
+    dentries = [dentry for dentry in dentries if dentry.endswith('.map')]
+    if len(dentries) != 1:
+        return None
+
+    dparts = dentries[0].split('/')
+    newfile = dparts[-1]
+
+    if newfile is not None:
+        tagfile = '{}:{}/{}'.format(rel, path.parent, newfile)
+
+        try:
+            result = subprocess.run(['git', 'show', tagfile], \
+                                    stdout=subprocess.PIPE, \
+                                    stderr=subprocess.PIPE,
+                                    check=True)
+        except subprocess.CalledProcessError:
+            result = None
+
+    else:
+        result = None
+
+    return result
+
+def mapfile_and_directory_renamed(path, rel):
+    '''Fix renaming of the map file & the source directory'''
+    mapfilepath = Path("{}/{}".format(fix_directory_name(path),path.name))
+
+    return mapfile_renamed(mapfilepath, rel)
+
+FIX_STRATEGIES = [directory_renamed, \
+                  mapfile_renamed, \
+                  mapfile_and_directory_renamed]
+
+def get_symbols(map_parser, release, mapfile_path):
+    '''Get the symbols, keyed by ABI section, for a given release and mapfile'''
+    abi_sections = {}
+
+    tagfile = '{}:{}'.format(release,mapfile_path)
+    try:
+        result = subprocess.run(['git', 'show', tagfile], \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        result = None
+
+    for fix_strategy in FIX_STRATEGIES:
+        if result is not None:
+            break
+        result = fix_strategy(mapfile_path, release)
+
+    if result is not None:
+        mapfile = result.stdout.decode('utf-8')
+        abi_sections = map_parser(mapfile).abi()
+
+    return abi_sections
+
+def get_terminal_rows():
+    '''Find the number of rows in the terminal'''
+
+    try:
+        return os.get_terminal_size().lines
+    except IOError:
+        return 0
+
+class SymbolCountOutput():
+    '''Format the output to supported formats'''
+    output_fmt = ""
+    column_fmt = ""
+
+    def __init__(self, format_output, dpdk_releases):
+        self.OUTPUT_FORMATS[format_output](self,dpdk_releases)
+        self.column_titles = ['mapfile'] +  dpdk_releases
+
+        self.terminal_rows = get_terminal_rows()
+        self.row = 0
+
+    def set_terminal_output(self,dpdk_rel):
+        '''Set the output format to fixed-width terminal columns'''
+
+        self.output_fmt = '{:<50}' + \
+            ''.join(['{:<6}{:<6}'] * (len(dpdk_rel)))
+        self.column_fmt = '{:50}' + \
+            ''.join(['{:<12}'] * (len(dpdk_rel)))
+
+    def set_csv_output(self,dpdk_rel):
+        '''Set the output format to Comma Separated Values'''
+
+        self.output_fmt = '{},' + \
+            ','.join(['{},{}'] * (len(dpdk_rel)))
+        self.column_fmt = '{},' + \
+            ','.join(['{},'] * (len(dpdk_rel)))
+
+    def print_columns(self):
+        '''Print the column header row with release names'''
+        print(self.column_fmt.format(*self.column_titles))
+        self.row += 1
+
+    def print_row(self, mapfile, symbols):
+        '''Print row of symbol values'''
+        print(self.output_fmt.format(*([mapfile] + symbols)))
+        self.row += 1
+
+        if self.terminal_rows > 0 and self.row % self.terminal_rows == 0:
+            self.print_columns()
+
+    OUTPUT_FORMATS = { None: set_terminal_output, \
+                   'terminal': set_terminal_output, \
+                   'csv': set_csv_output }
+
+class ListExpiredOutput():
+    '''Format the output to supported formats'''
+    output_fmt = ""
+    column_fmt = ""
+
+    def __init__(self, format_output, dpdk_releases):
+        self.terminal = True
+        self.OUTPUT_FORMATS[format_output](self,dpdk_releases)
+        self.column_titles = ['mapfile'] +  \
+            ['expired (' + ','.join(dpdk_releases) + ')']
+
+    def set_terminal_output(self, _):
+        '''Set the output format to fixed-width terminal columns'''
+
+        self.output_fmt = '{:<50}{:<50}'
+        self.column_fmt = '{:50}{:50}'
+
+    def set_csv_output(self, _):
+        '''Set the output format to Comma Separated Values'''
+
+        self.output_fmt = '{},{}'
+        self.column_fmt = '{},{}'
+        self.terminal = False
+
+    def print_columns(self):
+        '''Print the column header row with release names'''
+        print(self.column_fmt.format(*self.column_titles))
+
+    def print_row(self, mapfile, symbols):
+        '''Print row of symbol values'''
+
+        for symbol in symbols:
+            print(self.output_fmt.format(mapfile,symbol))
+            if self.terminal :
+                mapfile = ''
+
+    OUTPUT_FORMATS = { None: set_terminal_output, \
+                   'terminal': set_terminal_output, \
+                   'csv': set_csv_output }
+
+class CountSymbolsAction:
+    ''' Logic to count symbols added since a given release '''
+    IGNORE_SECTIONS = ['EXPERIMENTAL','INTERNAL']
+
+    def __init__(self, mapfile_path, mapfile_parser, format_output):
+        self.path = mapfile_path
+        self.parser = mapfile_parser
+        self.format_output = format_output
+        self.symbols_count = []
+
+    def add_mapfile(self, release):
+        ''' add a version mapfile '''
+        symbol_count = experimental_count = 0
+
+        symbols = get_symbols(self.parser, release, self.path)
+
+        # which versions are present, and we care about
+        abi_vers = [abi_ver \
+                    for abi_ver in symbols \
+                    if abi_ver not in self.IGNORE_SECTIONS]
+
+        for abi_ver in abi_vers:
+            symbol_count += len(symbols[abi_ver])
+
+        # count experimental symbols
+        if 'EXPERIMENTAL' in symbols.keys():
+            experimental_count = len(symbols['EXPERIMENTAL'])
+
+        self.symbols_count += [symbol_count, experimental_count]
+
+    def __del__(self):
+        self.format_output.print_row(self.path.parent, self.symbols_count)
+
+class ListExpiredAction:
+    ''' Logic to list expired symbols between two releases '''
+
+    def __init__(self, mapfile_path, mapfile_parser, format_output):
+        self.path = mapfile_path
+        self.parser = mapfile_parser
+        self.format_output = format_output
+        self.experimental_symbols = []
+
+    def add_mapfile(self, release):
+        ''' add a version mapfile '''
+        symbols = get_symbols(self.parser, release, self.path)
+        if 'EXPERIMENTAL' in symbols.keys():
+            self.experimental_symbols.append(symbols['EXPERIMENTAL'])
+
+    def __del__(self):
+        if len(self.experimental_symbols) != 2:
+            return
+
+        tmp = self.experimental_symbols
+        # find symbols present in both dpdk releases
+        intersect_syms = [sym for sym in tmp[0] if sym in tmp[1]]
+
+        # check for empty set
+        if intersect_syms == []:
+            return
+
+        self.format_output.print_row(self.path.parent, intersect_syms)
+
+SRC_DIRECTORIES = 'drivers,lib'
+
+ACTIONS = {None: CountSymbolsAction, \
+           'count-symbols': CountSymbolsAction, \
+           'list-expired': ListExpiredAction}
+
+ACTION_OUTPUT = {None: SymbolCountOutput, \
+                 'count-symbols': SymbolCountOutput, \
+                 'list-expired': ListExpiredOutput}
+
+def main():
+    '''Main entry point'''
+
+    dpdk_releases = get_dpdk_releases()
+
+    parser = argparse.ArgumentParser(description=DESCRIPTION, \
+                                     formatter_class=RawTextHelpFormatter
+                                     )
+    parser.add_argument('mode', choices=['count-symbols','list-expired'])
+    parser.add_argument('--format-output', choices=['terminal','csv'], \
+                        default='terminal')
+    parser.add_argument('--directory', choices=SRC_DIRECTORIES.split(','),
+                        default=SRC_DIRECTORIES)
+    parser.add_argument('--releases', \
+                        help='2 x comma separated release tags e.g. \'' \
+                        + ','.join([dpdk_releases[0],dpdk_releases[-1]]) \
+                        + '\'')
+    args = parser.parse_args()
+
+    if args.releases is not None:
+        dpdk_releases = args.releases.split(',')
+
+    if args.mode == 'list-expired':
+        if len(dpdk_releases) < 2:
+            sys.exit('Please specify two releases to compare ' \
+                     'in \'list-expired\' mode.')
+        dpdk_releases = [dpdk_releases[0], dpdk_releases[-1]]
+
+    action = ACTIONS[args.mode]
+    format_output = ACTION_OUTPUT[args.mode](args.format_output, dpdk_releases)
+
+    map_grammar = MAP_GRAMMAR.format(get_abi_versions())
+    map_parser = makeGrammar(map_grammar, {})
+
+    format_output.print_columns()
+
+    for src_dir in args.directory.split(','):
+        for path in Path(src_dir).rglob('*.map'):
+            release_action = action(path, map_parser, format_output)
+
+            for release in dpdk_releases:
+                release_action.add_mapfile(release)
+
+            # all the magic happens in the destructor
+            del release_action
+
+if __name__ == '__main__':
+    main()
-- 
2.26.2



* [dpdk-dev] [PATCH v1] doc: update release notes for 21.08
@ 2021-08-05 21:57  7% John McNamara
  0 siblings, 0 replies; 200+ results
From: John McNamara @ 2021-08-05 21:57 UTC (permalink / raw)
  To: dev; +Cc: thomas, John McNamara

Fix grammar, spelling and formatting of DPDK 21.08 release notes.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
---
 doc/guides/rel_notes/release_21_08.rst | 78 ++++++++------------------
 1 file changed, 24 insertions(+), 54 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index d7559ec6bf..0a7b817d9f 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -57,20 +57,20 @@ New Features
 
 * **Added auxiliary bus support.**
 
-  Auxiliary bus provides a way to split function into child-devices
+  An auxiliary bus provides a way to split a function into child-devices
   representing sub-domains of functionality. Each auxiliary device
   represents a part of its parent functionality.
 
 * **Added XZ compressed firmware support.**
 
-  Using ``rte_firmware_read``, a driver can now handle XZ compressed firmware
-  in a transparent way, with EAL uncompressing using libarchive if this library
+  Using ``rte_firmware_read`` a driver can now handle XZ compressed firmware
+  in a transparent way, with EAL uncompressing using libarchive, if this library
   is available when building DPDK.
 
 * **Updated Amazon ENA PMD.**
 
-  The new driver version (v2.4.0) introduced bug fixes and improvements,
-  including:
+  Updated the Amazon ENA PMD. The new driver version (v2.4.0) introduced bug
+  fixes and improvements, including:
 
   * Added Rx interrupt support.
   * RSS hash function key reconfiguration support.
@@ -78,20 +78,20 @@ New Features
 * **Updated Intel iavf driver.**
 
   * Added Tx QoS VF queue TC mapping.
-  * Added FDIR and RSS for GTPoGRE, support filter based on GTPU TEID/QFI,
-    outer most L3 or inner most l3/l4. 
+  * Added FDIR and RSS for GTPoGRE, and support for filters based on GTPU TEID/QFI,
+    outermost L3 or innermost L3/L4.
 
 * **Updated Intel ice driver.**
 
-  * In AVX2 code, added the new RX and TX paths to use the HW offload
+  * Added new RX and TX paths in the AVX2 code to use HW offload
     features. When the HW offload features are configured to be used, the
     offload paths are chosen automatically. In parallel the support for HW
     offload features was removed from the legacy AVX2 paths.
   * Added Tx QoS TC bandwidth configuration in DCF.
 
-* **Added support for Marvell CN10K SoC ethernet device.**
+* **Added support for Marvell CN10K SoC Ethernet device.**
 
-  * Added net/cnxk driver which provides the support for the integrated ethernet
+  * Added net/cnxk driver which provides the support for the integrated Ethernet
     device.
 
 * **Updated Mellanox mlx5 driver.**
@@ -100,44 +100,44 @@ New Features
   * Added support for meter hierarchy.
   * Added support for metering policy actions of yellow color.
   * Added support for metering trTCM RFC2698 and RFC4115.
-  * Added devargs options ``allow_duplicate_pattern``.
+  * Added devargs option ``allow_duplicate_pattern``.
   * Added matching on IPv4 Internet Header Length (IHL).
   * Added support for matching on VXLAN header last 8-bits reserved field.
   * Optimized multi-thread flow rule insertion rate.
 
 * **Added Wangxun ngbe PMD.**
 
-  Added a new PMD driver for Wangxun 1 Gigabit Ethernet NICs.
+  Added a new PMD driver for Wangxun 1Gb Ethernet NICs.
   See the :doc:`../nics/ngbe` for more details.
 
 * **Updated Solarflare network PMD.**
 
   Updated the Solarflare ``sfc_efx`` driver with changes including:
 
-  * Added COUNT action support for SN1000 NICs
+  * Added COUNT action support for SN1000 NICs.
 
 * **Added inflight packets clear API in vhost library.**
 
-  Added an API which can clear the inflight packets submitted to DMA
-  engine in vhost async data path.
+  Added an API which can clear the inflight packets submitted to the DMA
+  engine in the vhost async data path.
 
 * **Updated Intel QuickAssist crypto PMD.**
 
   Added fourth generation of QuickAssist Technology(QAT) devices support.
-  Only symmetric crypto has been currently enabled, compression and asymmetric
+  Only symmetric crypto has been currently enabled. Compression and asymmetric
   crypto PMD will fail to create.
 
 * **Added support for Marvell CNXK crypto driver.**
 
   * Added cnxk crypto PMD which provides support for an integrated
     crypto driver for CN9K and CN10K series of SOCs. Support for
-    symmetric crypto algorithms is added to both the PMDs.
+    symmetric crypto algorithms was added to both the PMDs.
   * Added support for lookaside protocol (IPsec) offload in cn10k PMD.
   * Added support for asymmetric crypto operations in cn9k and cn10k PMD.
 
 * **Updated Marvell OCTEON TX crypto PMD.**
 
-  Added support for crypto adapter OP_FORWARD mode.
+  Added support for crypto adapter ``OP_FORWARD`` mode.
 
 * **Added support for Nvidia crypto device driver.**
 
@@ -150,14 +150,14 @@ New Features
 
 * **Added Baseband PHY CNXK PMD.**
 
-  Added Baseband PHY PMD which allows to configure BPHY hardware block
+  Added Baseband PHY PMD which allows configuration of the BPHY hardware block
   comprising accelerators and DSPs specifically tailored for 5G/LTE inline
   use cases. Configuration happens via standard rawdev enq/deq operations. See
   the :doc:`../rawdevs/cnxk_bphy` rawdev guide for more details on this driver.
 
 * **Added support for Marvell CN10K, CN9K, event Rx/Tx adapter.**
 
-  * Added Rx/Tx adapter support for event/cnxk when the ethernet device requested
+  * Added Rx/Tx adapter support for event/cnxk when the Ethernet device requested
     is net/cnxk.
   * Added support for event vectorization for Rx/Tx adapter.
 
@@ -165,29 +165,15 @@ New Features
 
   Added support for cppc_cpufreq driver which works on most arm64 platforms.
 
-* **Added multi-queue support to Ethernet PMD Power Management**
+* **Added multi-queue support to Ethernet PMD Power Management.**
 
   The experimental PMD power management API now supports managing
   multiple Ethernet Rx queues per lcore.
 
-* **Updated testpmd to log errors to stderr.**
-
-  Updated testpmd application to log errors and warnings to stderr
-  instead of stdout used before.
-
-
-Removed Items
--------------
-
-.. This section should contain removed items in this release. Sample format:
-
-   * Add a short 1-2 sentence description of the removed item
-     in the past tense.
-
-   This section is a comment. Do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =======================================================
+* **Updated testpmd to output log errors to stderr.**
 
+  Updated testpmd application to output log errors and warnings to stderr
+  instead of stdout.
 
 API Changes
 -----------
@@ -236,22 +222,6 @@ ABI Changes
 
 * No ABI change that would break compatibility with 20.11.
 
-
-Known Issues
-------------
-
-.. This section should contain new known issues in this release. Sample format:
-
-   * **Add title in present tense with full stop.**
-
-     Add a short 1-2 sentence description of the known issue
-     in the present tense. Add information on any known workarounds.
-
-   This section is a comment. Do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =======================================================
-
-
 Tested Platforms
 ----------------
 
-- 
2.25.1



* Re: [dpdk-dev] [PATCH v2] doc: announce restructuring of crypto session structs
  @ 2021-08-05 15:03  3%         ` Akhil Goyal
  0 siblings, 0 replies; 200+ results
From: Akhil Goyal @ 2021-08-05 15:03 UTC (permalink / raw)
  To: Zhang, Roy Fan, dev
  Cc: Anoob Joseph, Nicolau, Radu, Doherty, Declan, hemant.agrawal,
	matan, Ananyev, Konstantin, thomas, asomalap, ruifeng.wang,
	ajit.khaparde, De Lara Guarch, Pablo, Trahe, Fiona,
	Ankur Dwivedi, Michael Shamis, Nagadheeraj Rottela, jianjay.zhou

> Hi Akhil,
> 
> No problem. Glad to help. If you have code ready to share please let me
> know.
> 
I haven't started work on this yet. There are a few items in the ABI improvements;
if you could pick up some of them, it would be helpful.
I am currently working on the PMD interface.
- Security and crypto session structs are next in line.
If you can spend some time, you could work on splitting and
hiding rte_cryptodev and rte_cryptodev_data.


* Re: [dpdk-dev] [PATCH v2] doc: announce changes to eventdev library
  2021-08-03  4:12  3%   ` Jerin Jacob
                       ` (2 preceding siblings ...)
  2021-08-04  6:06  0%     ` Gujjar, Abhinandan S
@ 2021-08-05 14:22  0%     ` Thomas Monjalon
  3 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2021-08-05 14:22 UTC (permalink / raw)
  To: Pavan Nikhilesh, Jerin Jacob
  Cc: Gujjar, Abhinandan S, Erik Gabriel Carrillo, Van Haaren, Harry,
	Hemant Agrawal, McDaniel, Timothy, Liang Ma, Jayatheerthan, Jay,
	dev, Ray Kinsella, Mattias Rönnblom, Jerin Jacob

03/08/2021 06:12, Jerin Jacob:
> On Tue, Aug 3, 2021 at 2:46 AM <pbhagavatula@marvell.com> wrote:
> >
> > From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> >
> > Make driver layer as internal, remove unnecessary rte_ prefix for
> > structures and functions that are not a part of public API.
> > Promote experimental trace and vector APIs to stable.
> > Add reserved field to `rte_event_timer` structure.
> >
> > Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> 
> Acked-by: Jerin Jacob <jerinj@marvell.com>
> 
> 
> ++ Eventdev driver Maintainers.
> 
> This list is based on items identified for 21.11 ABI improvement at
> https://docs.google.com/spreadsheets/d/1betlC000ua5SsSiJIcC54mCCCJnW6voH5Dqv9UxeyfE/edit#gid=0

    Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
    Acked-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
    Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
    Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>


> > +* eventdev: The file ``rte_eventdev_pmd.h`` will be renamed to ``eventdev_driver.h``
> > +  to make the driver interface as internal and the structures ``rte_eventdev_data``,
> > +  ``rte_eventdev`` and ``rte_eventdevs`` will be moved to a new file named
> > +  ``rte_eventdev_core.h`` in DPDK 21.11.
> > +  The ``rte_`` prefix for internal structures and functions will be removed across the
> > +  library.

If a function is used outside of the library (in drivers),
it is better to keep the rte_ prefix to avoid possible clashes
with some driver dependencies.

> > +  The experimental eventdev trace APIs and ``rte_event_vector_pool_create``,
> > +  ``rte_event_eth_rx_adapter_vector_limits_get`` will be promoted to stable.
> > +  An 8byte reserved field will be added to the structure ``rte_event_timer`` to
> > +  support future extensions.

Applied, thanks.




* [dpdk-dev] [PATCH v7] devtools: script to track map symbols
    2021-08-04 16:23  5% ` [dpdk-dev] [PATCH v6] " Ray Kinsella
@ 2021-08-04 16:27  5% ` Ray Kinsella
      3 siblings, 0 replies; 200+ results
From: Ray Kinsella @ 2021-08-04 16:27 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, stephen, ferruh.yigit, thomas, ktraynor, mdr

This script tracks the growth of stable and experimental symbols
over releases since v19.11. The script can count the symbols
added between two DPDK releases, and list the experimental
symbols that are present in both of two DPDK releases
(expired symbols).

Example usage:

Count symbols added since v19.11
$ devtools/symbol_tool.py count-symbols

Count symbols added since v20.11
$ devtools/symbol_tool.py count-symbols --releases v20.11,v21.05

List experimental symbols present in v20.11 and v21.05
$ devtools/symbol_tool.py list-expired --releases v20.11,v21.05

List experimental symbols in libraries only, present since v19.11
$ devtools/symbol_tool.py list-expired --directory lib

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
v2: reworked to fix pylint errors
v3: sent with the correct in-reply-to
v4: fix typos picked up by the CI
v5: fix terminal_size & directory args
v6: added list-expired, to list expired experimental symbols
v7: fix typo in comments
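
For reference, list-expired treats an experimental symbol as expired when
it is listed in the EXPERIMENTAL section of both releases being compared.
A minimal sketch of that comparison follows. The helper name
expired_symbols and the symbol names are hypothetical and do not appear
in the script.

```python
def expired_symbols(old_experimental, new_experimental):
    """Return the symbols present in both lists, in the older list's order."""
    newer = set(new_experimental)
    return [sym for sym in old_experimental if sym in newer]


if __name__ == '__main__':
    # Hypothetical EXPERIMENTAL sections from two releases of one library.
    v20_11 = ['rte_foo_create', 'rte_foo_destroy', 'rte_bar_get']
    v21_05 = ['rte_foo_destroy', 'rte_bar_get', 'rte_baz_set']
    for sym in expired_symbols(v20_11, v21_05):
        print(sym)
```

A symbol promoted to stable or removed between the two releases drops out
of the newer list, so it is not reported.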

 devtools/symbol_tool.py | 377 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 377 insertions(+)
 create mode 100755 devtools/symbol_tool.py
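
The default release list is derived from git tags of the form vYY.MM,
skipping release candidates and stable point releases. A simplified
sketch of that filter follows. The function name filter_release_tags and
its parameters are assumptions for illustration; the script itself builds
the year range from today's date and reads the tags from git.

```python
import re


def filter_release_tags(tags, first_year=19, last_year=21):
    """Keep only tags that look like vYY.MM within the given year range."""
    years = '|'.join(str(year) for year in range(first_year, last_year + 1))
    pattern = re.compile(r'^v(' + years + r')\.\d{2}$')
    return [tag for tag in tags if pattern.match(tag)]


if __name__ == '__main__':
    # Sample tag names; -rc tags and point releases should be filtered out.
    sample = ['v18.11', 'v19.11', 'v19.11-rc1', 'v20.02', 'v20.11.1', 'v21.05']
    print(filter_release_tags(sample))
```

Anchoring the pattern at both ends is what rejects tags such as
v19.11-rc1 and v20.11.1.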

diff --git a/devtools/symbol_tool.py b/devtools/symbol_tool.py
new file mode 100755
index 0000000000..f2a2d43a15
--- /dev/null
+++ b/devtools/symbol_tool.py
@@ -0,0 +1,377 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2021 Intel Corporation
+'''Tool to count or list symbols in each DPDK release'''
+from pathlib import Path
+import sys
+import os
+import subprocess
+import argparse
+import re
+import datetime
+try:
+    from parsley import makeGrammar
+except ImportError:
+    print('This script uses the package Parsley to parse C Mapfiles.\n'
+          'This can be installed with "pip install parsley".')
+    sys.exit(1)
+
+MAP_GRAMMAR = r"""
+
+ws = (' ' | '\r' | '\n' | '\t')*
+
+ABI_VER = ({})
+DPDK_VER = ('DPDK_' ABI_VER)
+ABI_NAME = ('INTERNAL' | 'EXPERIMENTAL' | DPDK_VER)
+comment = '#' (~'\n' anything)+ '\n'
+symbol = (~(';' | '}}' | '#') anything )+:c ';' -> ''.join(c)
+global = 'global:'
+local = 'local: *;'
+symbols = comment* symbol:s ws comment* -> s
+
+abi = (abi_section+):m -> dict(m)
+abi_section = (ws ABI_NAME:e ws '{{' ws global* (~local ws symbols)*:s ws local* ws '}}' ws DPDK_VER* ';' ws) -> (e,s)
+"""
+
+def get_abi_versions():
+    '''Returns a string of possible dpdk abi versions'''
+
+    year = datetime.date.today().year - 2000
+    tags = " |".join(['\'{}\''.format(i) \
+                     for i in reversed(range(21, year + 1)) ])
+    tags  = tags + ' | \'20.0.1\' | \'20.0\' | \'20\''
+
+    return tags
+
+def get_dpdk_releases():
+    '''Returns a list of DPDK release tag names since v19.11'''
+
+    year = datetime.date.today().year - 2000
+    year_range = "|".join("{}".format(i) for i in range(19,year + 1))
+    pattern = re.compile(r'^\"v(' +  year_range + r')\.\d{2}\"$')
+
+    cmd = ['git', 'for-each-ref', '--sort=taggerdate', '--format', '"%(tag)"']
+    try:
+        result = subprocess.run(cmd, \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        print("Failed to interrogate git for release tags")
+        sys.exit(1)
+
+
+    tags = result.stdout.decode('utf-8').split('\n')
+
+    # find the non-rcs between now and v19.11
+    tags = [ tag.replace('\"','') \
+             for tag in reversed(tags) \
+             if pattern.match(tag) ][:-3]
+
+    return tags
+
+def fix_directory_name(path):
+    '''Prepend librte_ to the source directory name'''
+    mapfilepath1 = str(path.parent.name)
+    mapfilepath2 = str(path.parents[1])
+    mapfilepath = mapfilepath2 + '/librte_' + mapfilepath1
+
+    return mapfilepath
+
+def directory_renamed(path, rel):
+    '''Fix removal of the librte_ from the directory names'''
+
+    mapfilepath = fix_directory_name(path)
+    tagfile = '{}:{}/{}'.format(rel, mapfilepath,  path.name)
+
+    try:
+        result = subprocess.run(['git', 'show', tagfile], \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        result = None
+
+    return result
+
+def mapfile_renamed(path, rel):
+    '''Fix renaming of the map file'''
+    newfile = None
+
+    result = subprocess.run(['git', 'ls-tree', \
+                             rel, str(path.parent) + '/'], \
+                            stdout=subprocess.PIPE, \
+                            stderr=subprocess.PIPE,
+                            check=True)
+    dentries = result.stdout.decode('utf-8')
+    dentries = dentries.split('\n')
+
+    # filter entries looking for the map file
+    dentries = [dentry for dentry in dentries if dentry.endswith('.map')]
+    if len(dentries) != 1:
+        return None
+
+    dparts = dentries[0].split('/')
+    newfile = dparts[-1]
+
+    if newfile is not None:
+        tagfile = '{}:{}/{}'.format(rel, path.parent, newfile)
+
+        try:
+            result = subprocess.run(['git', 'show', tagfile], \
+                                    stdout=subprocess.PIPE, \
+                                    stderr=subprocess.PIPE,
+                                    check=True)
+        except subprocess.CalledProcessError:
+            result = None
+
+    else:
+        result = None
+
+    return result
+
+def mapfile_and_directory_renamed(path, rel):
+    '''Fix renaming of the map file & the source directory'''
+    mapfilepath = Path("{}/{}".format(fix_directory_name(path),path.name))
+
+    return mapfile_renamed(mapfilepath, rel)
+
+FIX_STRATEGIES = [directory_renamed, \
+                  mapfile_renamed, \
+                  mapfile_and_directory_renamed]
+
+def get_symbols(map_parser, release, mapfile_path):
+    '''Get the symbols, keyed by ABI section, for a given release and mapfile'''
+    abi_sections = {}
+
+    tagfile = '{}:{}'.format(release,mapfile_path)
+    try:
+        result = subprocess.run(['git', 'show', tagfile], \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        result = None
+
+    for fix_strategy in FIX_STRATEGIES:
+        if result is not None:
+            break
+        result = fix_strategy(mapfile_path, release)
+
+    if result is not None:
+        mapfile = result.stdout.decode('utf-8')
+        abi_sections = map_parser(mapfile).abi()
+
+    return abi_sections
+
+def get_terminal_rows():
+    '''Find the number of rows in the terminal'''
+
+    try:
+        return os.get_terminal_size().lines
+    except IOError:
+        return 0
+
+class SymbolCountOutput():
+    '''Format the output to supported formats'''
+    output_fmt = ""
+    column_fmt = ""
+
+    def __init__(self, format_output, dpdk_releases):
+        self.OUTPUT_FORMATS[format_output](self,dpdk_releases)
+        self.column_titles = ['mapfile'] +  dpdk_releases
+
+        self.terminal_rows = get_terminal_rows()
+        self.row = 0
+
+    def set_terminal_output(self,dpdk_rel):
+        '''Set the output format to fixed-width columns'''
+
+        self.output_fmt = '{:<50}' + \
+            ''.join(['{:<6}{:<6}'] * (len(dpdk_rel)))
+        self.column_fmt = '{:50}' + \
+            ''.join(['{:<12}'] * (len(dpdk_rel)))
+
+    def set_csv_output(self,dpdk_rel):
+        '''Set the output format to Comma Separated Values'''
+
+        self.output_fmt = '{},' + \
+            ','.join(['{},{}'] * (len(dpdk_rel)))
+        self.column_fmt = '{},' + \
+            ','.join(['{},'] * (len(dpdk_rel)))
+
+    def print_columns(self):
+        '''Print column rows with release names'''
+        print(self.column_fmt.format(*self.column_titles))
+        self.row += 1
+
+    def print_row(self, mapfile, symbols):
+        '''Print row of symbol values'''
+        print(self.output_fmt.format(*([mapfile] + symbols)))
+        self.row += 1
+
+        if self.terminal_rows > 0 and self.row % self.terminal_rows == 0:
+            self.print_columns()
+
+    OUTPUT_FORMATS = { None: set_terminal_output, \
+                   'terminal': set_terminal_output, \
+                   'csv': set_csv_output }
+
+class ListExpiredOutput():
+    '''Format the output to supported formats'''
+    output_fmt = ""
+    column_fmt = ""
+
+    def __init__(self, format_output, dpdk_releases):
+        self.terminal = True
+        self.OUTPUT_FORMATS[format_output](self,dpdk_releases)
+        self.column_titles = ['mapfile'] +  \
+            ['expired (' + ','.join(dpdk_releases) + ')']
+
+    def set_terminal_output(self, _):
+        '''Set the output format to fixed-width columns'''
+
+        self.output_fmt = '{:<50}{:<50}'
+        self.column_fmt = '{:50}{:50}'
+
+    def set_csv_output(self, _):
+        '''Set the output format to Comma Separated Values'''
+
+        self.output_fmt = '{},{}'
+        self.column_fmt = '{},{}'
+        self.terminal = False
+
+    def print_columns(self):
+        '''Print column rows with release names'''
+        print(self.column_fmt.format(*self.column_titles))
+
+    def print_row(self, mapfile, symbols):
+        '''Print row of symbol values'''
+
+        for symbol in symbols:
+            print(self.output_fmt.format(mapfile,symbol))
+            if self.terminal :
+                mapfile = ''
+
+    OUTPUT_FORMATS = { None: set_terminal_output, \
+                   'terminal': set_terminal_output, \
+                   'csv': set_csv_output }
+
+class CountSymbolsAction:
+    ''' Logic to count symbols added since a given release '''
+    IGNORE_SECTIONS = ['EXPERIMENTAL','INTERNAL']
+
+    def __init__(self, mapfile_path, mapfile_parser, format_output):
+        self.path = mapfile_path
+        self.parser = mapfile_parser
+        self.format_output = format_output
+        self.symbols_count = []
+
+    def add_mapfile(self, release):
+        ''' add a version mapfile '''
+        symbol_count = experimental_count = 0
+
+        symbols = get_symbols(self.parser, release, self.path)
+
+        # which versions are present, and we care about
+        abi_vers = [abi_ver \
+                    for abi_ver in symbols \
+                    if abi_ver not in self.IGNORE_SECTIONS]
+
+        for abi_ver in abi_vers:
+            symbol_count += len(symbols[abi_ver])
+
+        # count experimental symbols
+        if 'EXPERIMENTAL' in symbols.keys():
+            experimental_count = len(symbols['EXPERIMENTAL'])
+
+        self.symbols_count += [symbol_count, experimental_count]
+
+    def __del__(self):
+        self.format_output.print_row(self.path.parent.name, self.symbols_count)
+
+class ListExpiredAction:
+    ''' Logic to list expired symbols between two releases '''
+
+    def __init__(self, mapfile_path, mapfile_parser, format_output):
+        self.path = mapfile_path
+        self.parser = mapfile_parser
+        self.format_output = format_output
+        self.experimental_symbols = []
+
+    def add_mapfile(self, release):
+        ''' add a version mapfile '''
+        symbols = get_symbols(self.parser, release, self.path)
+        if 'EXPERIMENTAL' in symbols.keys():
+            self.experimental_symbols.append(symbols['EXPERIMENTAL'])
+
+    def __del__(self):
+        if len(self.experimental_symbols) != 2:
+            return
+
+        tmp = self.experimental_symbols
+        # find symbols present in both dpdk releases
+        intersect_syms = [sym for sym in tmp[0] if sym in tmp[1]]
+
+        # check for empty set
+        if intersect_syms == []:
+            return
+
+        self.format_output.print_row(self.path.parent.name, intersect_syms)
+
+SRC_DIRECTORIES = 'drivers,lib'
+
+ACTIONS = {None: CountSymbolsAction, \
+           'count-symbols': CountSymbolsAction, \
+           'list-expired': ListExpiredAction}
+
+ACTION_OUTPUT = {None: SymbolCountOutput, \
+                 'count-symbols': SymbolCountOutput, \
+                 'list-expired': ListExpiredOutput}
+
+def main():
+    '''Main entry point'''
+
+    dpdk_releases = get_dpdk_releases()
+
+    parser = argparse.ArgumentParser(description='Count symbols in DPDK Libs')
+    parser.add_argument('mode', choices=['count-symbols','list-expired'])
+    parser.add_argument('--format-output', choices=['terminal','csv'], \
+                        default='terminal')
+    parser.add_argument('--directory', choices=SRC_DIRECTORIES.split(','),
+                        default=SRC_DIRECTORIES)
+    parser.add_argument('--releases', \
+                        help='2 x comma separated release tags e.g. \'' \
+                        + ','.join([dpdk_releases[0],dpdk_releases[-1]]) \
+                        + '\'')
+    args = parser.parse_args()
+
+    if args.releases is not None:
+        dpdk_releases = args.releases.split(',')
+
+    if args.mode == 'list-expired':
+        if len(dpdk_releases) < 2:
+            sys.exit('Please specify two releases to compare ' \
+                     'in \'list-expired\' mode.')
+        dpdk_releases = [dpdk_releases[0], dpdk_releases[-1]]
+
+    action = ACTIONS[args.mode]
+    format_output = ACTION_OUTPUT[args.mode](args.format_output, dpdk_releases)
+
+    map_grammar = MAP_GRAMMAR.format(get_abi_versions())
+    map_parser = makeGrammar(map_grammar, {})
+
+    format_output.print_columns()
+
+    for src_dir in args.directory.split(','):
+        for path in Path(src_dir).rglob('*.map'):
+            release_action = action(path, map_parser, format_output)
+
+            for release in dpdk_releases:
+                release_action.add_mapfile(release)
+
+            # all the magic happens in the destructor
+            del release_action
+
+if __name__ == '__main__':
+    main()
-- 
2.26.2


^ permalink raw reply	[relevance 5%]
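
For readers following the FIX_STRATEGIES fallback in the script above, here is a minimal sketch of the path rewriting it relies on. It only builds the '<release>:<path>' arguments that would be handed to `git show`; it does not call git. The helper names legacy_directory and candidate_tagfiles are hypothetical and not part of the patch, and PurePosixPath is used so the result does not depend on the host OS.

```python
from pathlib import PurePosixPath


def legacy_directory(path):
    """Return the pre-rename directory of a map file.

    Mirrors fix_directory_name() in the patch: the parent directory name
    gets a 'librte_' prefix, e.g. lib/eal -> lib/librte_eal.
    """
    return "{}/librte_{}".format(path.parents[1], path.parent.name)


def candidate_tagfiles(release, path):
    """Yield '<release>:<path>' arguments to try with `git show`, in order."""
    # First choice: the path exactly as it exists in the current tree.
    yield "{}:{}".format(release, path)
    # Fallback: the same file under the older librte_-prefixed directory.
    yield "{}:{}/{}".format(release, legacy_directory(path), path.name)


candidates = list(candidate_tagfiles("v20.11",
                                     PurePosixPath("lib/eal/version.map")))
for tagfile in candidates:
    print(tagfile)
```

The script itself goes one step further and also looks for a renamed map file inside each candidate directory (via `git ls-tree`); that step is left out of this sketch.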

* [dpdk-dev] [PATCH v6] devtools: script to track map symbols
  @ 2021-08-04 16:23  5% ` Ray Kinsella
  2021-08-04 16:27  5% ` [dpdk-dev] [PATCH v7] " Ray Kinsella
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 200+ results
From: Ray Kinsella @ 2021-08-04 16:23 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, stephen, ferruh.yigit, thomas, ktraynor, mdr

This script tracks the growth of stable and experimental symbols
over releases since v19.11. The script has the ability to
count the added symbols between two dpdk releases, and to
list experimental symbols present in two dpdk releases
(expired symbols).

example usages:

Count symbols added since v19.11
$ devtools/symbol_tool.py count-symbols

Count symbols added since v20.11
$ devtools/symbol_tool.py count-symbols --releases v20.11,v21.05

List experimental symbols present in v20.11 and v21.05
$ devtools/symbol_tool.py list-expired --releases v20.11,v21.05

List experimental symbols in libraries only, present since v19.11
$ devtools/symbol_tool.py list-expired --directory lib

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
v2: reworked to fix pylint errors
v3: sent with the correct in-reply-to
v4: fix typos picked up by the CI
v5: fix terminal_size & directory args
v6: added list-expired, to list expired experimental symbols

 devtools/symbol_tool.py | 377 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 377 insertions(+)
 create mode 100755 devtools/symbol_tool.py

diff --git a/devtools/symbol_tool.py b/devtools/symbol_tool.py
new file mode 100755
index 0000000000..63969a131b
--- /dev/null
+++ b/devtools/symbol_tool.py
@@ -0,0 +1,377 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2021 Intel Corporation
+'''Tool to count or list symbols in each DPDK release'''
+from pathlib import Path
+import sys
+import os
+import subprocess
+import argparse
+import re
+import datetime
+try:
+    from parsley import makeGrammar
+except ImportError:
+    print('This script uses the package Parsley to parse C Mapfiles.\n'
+          'This can be installed with "pip install parsley".')
+    sys.exit(1)
+
+MAP_GRAMMAR = r"""
+
+ws = (' ' | '\r' | '\n' | '\t')*
+
+ABI_VER = ({})
+DPDK_VER = ('DPDK_' ABI_VER)
+ABI_NAME = ('INTERNAL' | 'EXPERIMENTAL' | DPDK_VER)
+comment = '#' (~'\n' anything)+ '\n'
+symbol = (~(';' | '}}' | '#') anything )+:c ';' -> ''.join(c)
+global = 'global:'
+local = 'local: *;'
+symbols = comment* symbol:s ws comment* -> s
+
+abi = (abi_section+):m -> dict(m)
+abi_section = (ws ABI_NAME:e ws '{{' ws global* (~local ws symbols)*:s ws local* ws '}}' ws DPDK_VER* ';' ws) -> (e,s)
+"""
+
+def get_abi_versions():
+    '''Returns a string of possible dpdk abi versions'''
+
+    year = datetime.date.today().year - 2000
+    tags = " |".join(['\'{}\''.format(i) \
+                     for i in reversed(range(21, year + 1)) ])
+    tags  = tags + ' | \'20.0.1\' | \'20.0\' | \'20\''
+
+    return tags
+
+def get_dpdk_releases():
+    '''Returns a list of dpdk release tag names since v19.11'''
+
+    year = datetime.date.today().year - 2000
+    year_range = "|".join("{}".format(i) for i in range(19,year + 1))
+    pattern = re.compile(r'^\"v(' +  year_range + r')\.\d{2}\"$')
+
+    cmd = ['git', 'for-each-ref', '--sort=taggerdate', '--format', '"%(tag)"']
+    try:
+        result = subprocess.run(cmd, \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        print("Failed to interrogate git for release tags")
+        sys.exit()
+
+
+    tags = result.stdout.decode('utf-8').split('\n')
+
+    # find the non-rcs between now and v19.11
+    tags = [ tag.replace('\"','') \
+             for tag in reversed(tags) \
+             if pattern.match(tag) ][:-3]
+
+    return tags
+
+def fix_directory_name(path):
+    '''Prepend librte to the source directory name'''
+    mapfilepath1 = str(path.parent.name)
+    mapfilepath2 = str(path.parents[1])
+    mapfilepath = mapfilepath2 + '/librte_' + mapfilepath1
+
+    return mapfilepath
+
+def directory_renamed(path, rel):
+    '''Fix removal of the librte_ from the directory names'''
+
+    mapfilepath = fix_directory_name(path)
+    tagfile = '{}:{}/{}'.format(rel, mapfilepath,  path.name)
+
+    try:
+        result = subprocess.run(['git', 'show', tagfile], \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        result = None
+
+    return result
+
+def mapfile_renamed(path, rel):
+    '''Fix renaming of the map file'''
+    newfile = None
+
+    result = subprocess.run(['git', 'ls-tree', \
+                             rel, str(path.parent) + '/'], \
+                            stdout=subprocess.PIPE, \
+                            stderr=subprocess.PIPE,
+                            check=True)
+    dentries = result.stdout.decode('utf-8')
+    dentries = dentries.split('\n')
+
+    # filter entries looking for the map file
+    dentries = [dentry for dentry in dentries if dentry.endswith('.map')]
+    if len(dentries) != 1:
+        return None
+
+    dparts = dentries[0].split('/')
+    newfile = dparts[-1]
+
+    if newfile is not None:
+        tagfile = '{}:{}/{}'.format(rel, path.parent, newfile)
+
+        try:
+            result = subprocess.run(['git', 'show', tagfile], \
+                                    stdout=subprocess.PIPE, \
+                                    stderr=subprocess.PIPE,
+                                    check=True)
+        except subprocess.CalledProcessError:
+            result = None
+
+    else:
+        result = None
+
+    return result
+
+def mapfile_and_directory_renamed(path, rel):
+    '''Fix renaming of the map file & the source directory'''
+    mapfilepath = Path("{}/{}".format(fix_directory_name(path),path.name))
+
+    return mapfile_renamed(mapfilepath, rel)
+
+FIX_STRATEGIES = [directory_renamed, \
+                  mapfile_renamed, \
+                  mapfile_and_directory_renamed]
+
+def get_symbols(map_parser, release, mapfile_path):
+    '''Count the symbols for a given release and mapfile'''
+    abi_sections = {}
+
+    tagfile = '{}:{}'.format(release,mapfile_path)
+    try:
+        result = subprocess.run(['git', 'show', tagfile], \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        result = None
+
+    for fix_strategy in FIX_STRATEGIES:
+        if result is not None:
+            break
+        result = fix_strategy(mapfile_path, release)
+
+    if result is not None:
+        mapfile = result.stdout.decode('utf-8')
+        abi_sections = map_parser(mapfile).abi()
+
+    return abi_sections
+
+def get_terminal_rows():
+    '''Find the number of rows in the terminal'''
+
+    try:
+        return os.get_terminal_size().lines
+    except IOError:
+        return 0
+
+class SymbolCountOutput():
+    '''Format the output to supported formats'''
+    output_fmt = ""
+    column_fmt = ""
+
+    def __init__(self, format_output, dpdk_releases):
+        self.OUTPUT_FORMATS[format_output](self,dpdk_releases)
+        self.column_titles = ['mapfile'] +  dpdk_releases
+
+        self.terminal_rows = get_terminal_rows()
+        self.row = 0
+
+    def set_terminal_output(self,dpdk_rel):
+        '''Set the output format to fixed-width columns'''
+
+        self.output_fmt = '{:<50}' + \
+            ''.join(['{:<6}{:<6}'] * (len(dpdk_rel)))
+        self.column_fmt = '{:50}' + \
+            ''.join(['{:<12}'] * (len(dpdk_rel)))
+
+    def set_csv_output(self,dpdk_rel):
+        '''Set the output format to Comma Separated Values'''
+
+        self.output_fmt = '{},' + \
+            ','.join(['{},{}'] * (len(dpdk_rel)))
+        self.column_fmt = '{},' + \
+            ','.join(['{},'] * (len(dpdk_rel)))
+
+    def print_columns(self):
+        '''Print column rows with release names'''
+        print(self.column_fmt.format(*self.column_titles))
+        self.row += 1
+
+    def print_row(self, mapfile, symbols):
+        '''Print row of symbol values'''
+        print(self.output_fmt.format(*([mapfile] + symbols)))
+        self.row += 1
+
+        if self.terminal_rows > 0 and self.row % self.terminal_rows == 0:
+            self.print_columns()
+
+    OUTPUT_FORMATS = { None: set_terminal_output, \
+                   'terminal': set_terminal_output, \
+                   'csv': set_csv_output }
+
+class ListExpiredOutput():
+    '''Format the output to supported formats'''
+    output_fmt = ""
+    column_fmt = ""
+
+    def __init__(self, format_output, dpdk_releases):
+        self.terminal = True
+        self.OUTPUT_FORMATS[format_output](self,dpdk_releases)
+        self.column_titles = ['mapfile'] +  \
+            ['expired (' + ','.join(dpdk_releases) + ')']
+
+    def set_terminal_output(self, _):
+        '''Set the output format to fixed-width columns'''
+
+        self.output_fmt = '{:<50}{:<50}'
+        self.column_fmt = '{:50}{:50}'
+
+    def set_csv_output(self, _):
+        '''Set the output format to Comma Separated Values'''
+
+        self.output_fmt = '{},{}'
+        self.column_fmt = '{},{}'
+        self.terminal = False
+
+    def print_columns(self):
+        '''Print column rows with release names'''
+        print(self.column_fmt.format(*self.column_titles))
+
+    def print_row(self, mapfile, symbols):
+        '''Print row of symbol values'''
+
+        for symbol in symbols:
+            print(self.output_fmt.format(mapfile,symbol))
+            if self.terminal :
+                mapfile = ''
+
+    OUTPUT_FORMATS = { None: set_terminal_output, \
+                   'terminal': set_terminal_output, \
+                   'csv': set_csv_output }
+
+class CountSymbolsAction:
+    ''' Logic to count symbols added since a given release '''
+    IGNORE_SECTIONS = ['EXPERIMENTAL','INTERNAL']
+
+    def __init__(self, mapfile_path, mapfile_parser, format_output):
+        self.path = mapfile_path
+        self.parser = mapfile_parser
+        self.format_output = format_output
+        self.symbols_count = []
+
+    def add_mapfile(self, release):
+        ''' add a version mapfile '''
+        symbol_count = experimental_count = 0
+
+        symbols = get_symbols(self.parser, release, self.path)
+
+        # which versions are present, and we care about
+        abi_vers = [abi_ver \
+                    for abi_ver in symbols \
+                    if abi_ver not in self.IGNORE_SECTIONS]
+
+        for abi_ver in abi_vers:
+            symbol_count += len(symbols[abi_ver])
+
+        # count experimental symbols
+        if 'EXPERIMENTAL' in symbols.keys():
+            experimental_count = len(symbols['EXPERIMENTAL'])
+
+        self.symbols_count += [symbol_count, experimental_count]
+
+    def __del__(self):
+        self.format_output.print_row(self.path.parent.name, self.symbols_count)
+
+class ListExpiredAction:
+    ''' Logic to list expired symbols between two releases '''
+
+    def __init__(self, mapfile_path, mapfile_parser, format_output):
+        self.path = mapfile_path
+        self.parser = mapfile_parser
+        self.format_output = format_output
+        self.experimental_symbols = []
+
+    def add_mapfile(self, release):
+        ''' add a version mapfile '''
+        symbols = get_symbols(self.parser, release, self.path)
+        if 'EXPERIMENTAL' in symbols.keys():
+            self.experimental_symbols.append(symbols['EXPERIMENTAL'])
+
+    def __del__(self):
+        if len(self.experimental_symbols) != 2:
+            return
+
+        tmp = self.experimental_symbols
+        # find symbols present in both dpdk releases
+        intersect_syms = [sym for sym in tmp[0] if sym in tmp[1]]
+
+        # check for empty set
+        if intersect_syms == []:
+            return
+
+        self.format_output.print_row(self.path.parent.name, intersect_syms)
+
+SRC_DIRECTORIES = 'drivers,lib'
+
+ACTIONS = {None: CountSymbolsAction, \
+           'count-symbols': CountSymbolsAction, \
+           'list-expired': ListExpiredAction}
+
+ACTION_OUTPUT = {None: SymbolCountOutput, \
+                 'count-symbols': SymbolCountOutput, \
+                 'list-expired': ListExpiredOutput}
+
+def main():
+    '''Main entry point'''
+
+    dpdk_releases = get_dpdk_releases()
+
+    parser = argparse.ArgumentParser(description='Count symbols in DPDK Libs')
+    parser.add_argument('mode', choices=['count-symbols','list-expired'])
+    parser.add_argument('--format-output', choices=['terminal','csv'], \
+                        default='terminal')
+    parser.add_argument('--directory', choices=SRC_DIRECTORIES.split(','),
+                        default=SRC_DIRECTORIES)
+    parser.add_argument('--releases', \
+                        help='2 x comma separated release tags e.g. \'' \
+                        + ','.join([dpdk_releases[0],dpdk_releases[-1]]) \
+                        + '\'')
+    args = parser.parse_args()
+
+    if args.releases is not None:
+        dpdk_releases = args.releases.split(',')
+
+    if args.mode == 'list-expired':
+        if len(dpdk_releases) < 2:
+            sys.exit('Please specify two releases to compare ' \
+                     'in \'list-expired\' mode.')
+        dpdk_releases = [dpdk_releases[0], dpdk_releases[-1]]
+
+    action = ACTIONS[args.mode]
+    format_output = ACTION_OUTPUT[args.mode](args.format_output, dpdk_releases)
+
+    map_grammar = MAP_GRAMMAR.format(get_abi_versions())
+    map_parser = makeGrammar(map_grammar, {})
+
+    format_output.print_columns()
+
+    for src_dir in args.directory.split(','):
+        for path in Path(src_dir).rglob('*.map'):
+            release_action = action(path, map_parser, format_output)
+
+            for release in dpdk_releases:
+                release_action.add_mapfile(release)
+
+            # all the magic happens in the destructor
+            del release_action
+
+if __name__ == '__main__':
+    main()
-- 
2.26.2


^ permalink raw reply	[relevance 5%]
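
As an illustration of what 'list-expired' reports, the sketch below reproduces the comparison done in ListExpiredAction: a symbol counts as expired when it appears in the EXPERIMENTAL section of both releases being compared. The symbol names are made up for the example, and the set lookup is a small variation on the list-in-list test used in the patch, not the author's code.

```python
def expired_symbols(older, newer):
    """Return symbols from `older` that are still listed in `newer`.

    The order of `older` is preserved, matching the list comprehension
    in ListExpiredAction.__del__.
    """
    still_experimental = set(newer)  # O(1) membership tests
    return [sym for sym in older if sym in still_experimental]


# Hypothetical EXPERIMENTAL sections of one map file in two releases.
old_release = ["rte_foo_create", "rte_foo_destroy", "rte_bar_probe"]
new_release = ["rte_bar_probe", "rte_baz_init", "rte_foo_create"]

result = expired_symbols(old_release, new_release)
print(result)
```

Here rte_foo_destroy drops out because it is no longer experimental in the newer release, and rte_baz_init is ignored because it was not present in the older one.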

* Re: [dpdk-dev] [PATCH] doc: announce cryptodev-PMD interface as internal
  2021-08-04  8:44  0%     ` Hemant Agrawal
@ 2021-08-04 14:35  0%       ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2021-08-04 14:35 UTC (permalink / raw)
  To: Akhil Goyal
  Cc: Matan Azrad, Ajit Khaparde, dev, anoobj, Radu Nicolau, Doherty,
	Declan, Ananyev, Konstantin, Zhang, Roy Fan,
	Somalapuram Amaranath, Ruifeng Wang, Pablo de Lara, Fiona Trahe,
	adwivedi, michaelsh, rnagadheeraj, Jay Zhou, Hemant Agrawal

> > > > The APIs which are internal to PMD and cryptodev library can be
> > > > marked as internal so that ABI checking do not shout for changes in
> > > > APIs which are internal to DPDK.
> > > >
> > > > Signed-off-by: Akhil Goyal <gakhil@marvell.com>
> > > Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
> > Acked-by: Matan Azrad <matan@nvidia.com>
> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>

Applied, thanks.



^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: announce: make rte intr handle internal
  2021-08-03  4:05  0%   ` Jerin Jacob
@ 2021-08-04 14:22  0%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2021-08-04 14:22 UTC (permalink / raw)
  To: Harman Kalra
  Cc: Xia, Chenbo, dev, jerinj, david.marchand, Ray Kinsella, Jerin Jacob

> > > Moving struct rte_intr_handle as an internal structure to
> > > avoid any ABI breakages in future. Since this structure defines
> > > some static arrays and changing respective macros breaks the ABI.
> > > Eg:
> > > Currently RTE_MAX_RXTX_INTR_VEC_ID imposes a limit of maximum 512
> > > MSI-X interrupts that can be defined for a PCI device, while PCI
> > > specification allows maximum 2048 MSI-X interrupts that can be used.
> > > If some PCI device requires more than 512 vectors, either change the
> > > RTE_MAX_RXTX_INTR_VEC_ID limit or dynamically allocate based on
> > > PCI device MSI-X size on probe time. Either way its an ABI breakage.
> > >
> > > Discussion thread:
> > > https://mails.dpdk.org/archives/dev/2021-March/202959.html
> > >
> > > Change already included in 21.11 ABI improvement spreadsheet (item 42):
> > > https://docs.google.com/spreadsheets/d/1betlC000ua5SsSiJIcC54mCCCJnW6voH5Dqv9UxeyfE/edit#gid=0
> > >
> > > Signed-off-by: Harman Kalra <hkalra@marvell.com>
> > > ---
> > > --- a/doc/guides/rel_notes/deprecation.rst
> > > +++ b/doc/guides/rel_notes/deprecation.rst
> > > +* eal: Making ``struct rte_intr_handle`` internal to avoid any ABI breakages
> > > +  in future.
> > > +
> >
> > Acked-by: Chenbo Xia <chenbo.xia@intel.com>
> 
> Acked-by: Jerin Jacob <jerinj@marvell.com>

Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>

Applied, thanks.



^ permalink raw reply	[relevance 0%]
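
To make the ABI argument in the notice above concrete, here is a small sketch using Python's ctypes. The structure is a deliberately simplified, hypothetical stand-in and not the real layout of struct rte_intr_handle; it only shows that when a public structure embeds an array sized by a compile-time macro, raising the macro changes both the structure size and the offset of every field after the array.

```python
import ctypes


def make_handle(max_vectors):
    """Build a hypothetical handle whose array length is a build-time constant."""
    class Handle(ctypes.Structure):
        _fields_ = [
            ("fd", ctypes.c_int),
            ("nb_efd", ctypes.c_uint32),
            # Static array sized by the constant, like an array whose length
            # comes from a macro such as RTE_MAX_RXTX_INTR_VEC_ID.
            ("efds", ctypes.c_int * max_vectors),
            # A field placed after the array: its offset depends on the constant.
            ("intr_vec", ctypes.c_void_p),
        ]
    return Handle


Handle512 = make_handle(512)
Handle2048 = make_handle(2048)

size_512 = ctypes.sizeof(Handle512)
size_2048 = ctypes.sizeof(Handle2048)
offset_512 = Handle512.intr_vec.offset
offset_2048 = Handle2048.intr_vec.offset

print("sizeof:", size_512, "->", size_2048)
print("offset of intr_vec:", offset_512, "->", offset_2048)
```

An application compiled against the 512-entry layout would read intr_vec at the old offset, which is why either option described in the notice is an ABI break unless the structure is hidden behind accessor functions.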

* Re: [dpdk-dev] [PATCH v6 2/2] bus/auxiliary: introduce auxiliary bus
  2021-08-04 13:53  0%                 ` Kinsella, Ray
@ 2021-08-04 14:13  4%                   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2021-08-04 14:13 UTC (permalink / raw)
  To: Xueming(Steven) Li, Kinsella, Ray; +Cc: dpdk-dev, techboard

04/08/2021 15:53, Kinsella, Ray:
> On 04/08/2021 14:12, Thomas Monjalon wrote:
> > 04/08/2021 15:00, Xueming(Steven) Li:
> >> From: Kinsella, Ray <mdr@ashroe.eu>
> >>> On 04/08/2021 13:11, Xueming(Steven) Li wrote:
> >>>> From: Kinsella, Ray <mdr@ashroe.eu>
> >>>>> It's not strictly a deprecation notice though; you are not breaking anything, right?
> >>>>> Since you are not breaking anything, I don't think the notice is required in the 21.11 timeframe.
> >>>>>
> >>>>> Now if you were doing it in 21.08, it would be an ABI change and that would be a different story.
> >>>>
> >>>> Thanks for looking at this!
> >>>> Yes, it targets to 21.11. The offloading flag is fine, but the shared_group does break ABI, detail:
> >>>> 	https://mails.dpdk.org/archives/dev/2021-July/215575.html
> >>>
> >>> Right ... it's a new field, not a deprecation as such.
> >>> What I mean by this is that no existing code is broken.
> >>>
> >>> 21.11 is a new ABI in any case and you are not deprecating anything, so no notice is required.
> >>
> >> Maybe it's a new process, confirmed with Thomas, it's expected:
> >> https://doc.dpdk.org/guides/contributing/abi_policy.html#abi-changes
> > 
> > I think what Ray means is that it breaks ABI but not API,
> > so he doesn't consider a notice is required.
> 
> > My understanding of the policy is that *any* ABI change requires a notice.
> > But if you want to make it lighter and allow any non-announced ABI change
> > in an ABI-breaking release, I think I would vote for.
> 
> Thanks for clarifying Thomas ... you are correct.

In the meantime, let's review and ack notices, even if ABI-only change:
https://patches.dpdk.org/bundle/tmonjalo/deprecation-notices/

We'll discuss later if we can accept more ABI change,
but we should try to be on the safe side for those already announced.



^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v6 2/2] bus/auxiliary: introduce auxiliary bus
  2021-08-04 13:12  5%               ` Thomas Monjalon
@ 2021-08-04 13:53  0%                 ` Kinsella, Ray
  2021-08-04 14:13  4%                   ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Kinsella, Ray @ 2021-08-04 13:53 UTC (permalink / raw)
  To: Thomas Monjalon, Xueming(Steven) Li; +Cc: dpdk-dev, techboard



On 04/08/2021 14:12, Thomas Monjalon wrote:
> 04/08/2021 15:00, Xueming(Steven) Li:
>> From: Kinsella, Ray <mdr@ashroe.eu>
>>> On 04/08/2021 13:11, Xueming(Steven) Li wrote:
>>>> From: Kinsella, Ray <mdr@ashroe.eu>
>>>>> It's not strictly a deprecation notice though; you are not breaking anything, right?
>>>>> Since you are not breaking anything, I don't think the notice is required in the 21.11 timeframe.
>>>>>
>>>>> Now if you were doing it in 21.08, it would be an ABI change and that would be a different story.
>>>>
>>>> Thanks for looking at this!
>>>> Yes, it targets to 21.11. The offloading flag is fine, but the shared_group does break ABI, detail:
>>>> 	https://mails.dpdk.org/archives/dev/2021-July/215575.html
>>>
>>> Right ... it's a new field, not a deprecation as such.
>>> What I mean by this is that no existing code is broken.
>>>
>>> 21.11 is a new ABI in any case and you are not deprecating anything, so no notice is required.
>>
>> Maybe it's a new process, confirmed with Thomas, it's expected:
>> https://doc.dpdk.org/guides/contributing/abi_policy.html#abi-changes
> 
> I think what Ray means is that it breaks ABI but not API,
> so he doesn't consider a notice is required.

> My understanding of the policy is that *any* ABI change requires a notice.
> But if you want to make it lighter and allow any non-announced ABI change
> in an ABI-breaking release, I think I would vote for.

Thanks for clarifying Thomas ... you are correct. 

> 
> Cc techboard@dpdk.org
> 
 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v6 2/2] bus/auxiliary: introduce auxiliary bus
  2021-08-04 13:00  3%             ` Xueming(Steven) Li
@ 2021-08-04 13:12  5%               ` Thomas Monjalon
  2021-08-04 13:53  0%                 ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2021-08-04 13:12 UTC (permalink / raw)
  To: Kinsella, Ray, Xueming(Steven) Li; +Cc: dpdk-dev, techboard

04/08/2021 15:00, Xueming(Steven) Li:
> From: Kinsella, Ray <mdr@ashroe.eu>
> > On 04/08/2021 13:11, Xueming(Steven) Li wrote:
> > > From: Kinsella, Ray <mdr@ashroe.eu>
> > >> It's not strictly a deprecation notice though; you are not breaking anything, right?
> > >> Since you are not breaking anything, I don't think the notice is required in the 21.11 timeframe.
> > >>
> > >> Now if you were doing it in 21.08, it would be an ABI change and that would be a different story.
> > >
> > > Thanks for looking at this!
> > > Yes, it targets to 21.11. The offloading flag is fine, but the shared_group does break ABI, detail:
> > > 	https://mails.dpdk.org/archives/dev/2021-July/215575.html
> > 
> > Right ... it's a new field, not a deprecation as such.
> > What I mean by this is that no existing code is broken.
> > 
> > 21.11 is a new ABI in any case and you are not deprecating anything, so no notice is required.
> 
> Maybe it's a new process, confirmed with Thomas, it's expected:
> https://doc.dpdk.org/guides/contributing/abi_policy.html#abi-changes

I think what Ray means is that it breaks ABI but not API,
so he doesn't consider a notice is required.
My understanding of the policy is that *any* ABI change requires a notice.
But if you want to make it lighter and allow any non-announced ABI change
in an ABI-breaking release, I think I would vote for.

Cc techboard@dpdk.org



^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v6 2/2] bus/auxiliary: introduce auxiliary bus
  2021-08-04 12:14  3%           ` Kinsella, Ray
@ 2021-08-04 13:00  3%             ` Xueming(Steven) Li
  2021-08-04 13:12  5%               ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Xueming(Steven) Li @ 2021-08-04 13:00 UTC (permalink / raw)
  To: Kinsella, Ray, dpdk-dev



> -----Original Message-----
> From: Kinsella, Ray <mdr@ashroe.eu>
> Sent: Wednesday, August 4, 2021 8:14 PM
> To: Xueming(Steven) Li <xuemingl@nvidia.com>; dpdk-dev <dev@dpdk.org>
> Subject: Re: [PATCH v6 2/2] bus/auxiliary: introduce auxiliary bus
> 
> 
> 
> On 04/08/2021 13:11, Xueming(Steven) Li wrote:
> >
> >
> >> -----Original Message-----
> >> From: Kinsella, Ray <mdr@ashroe.eu>
> >> Sent: Wednesday, August 4, 2021 7:46 PM
> >> To: Xueming(Steven) Li <xuemingl@nvidia.com>
> >> Subject: Re: [PATCH v6 2/2] bus/auxiliary: introduce auxiliary bus
> >>
> >>
> >>
> >> On 04/08/2021 12:21, Xueming(Steven) Li wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Kinsella, Ray <mdr@ashroe.eu>
> >>>> Sent: Wednesday, August 4, 2021 6:00 PM
> >>>> To: Xueming(Steven) Li <xuemingl@nvidia.com>
> >>>> Cc: dev@dpdk.org; Wang Haiyue <haiyue.wang@intel.com>;
> >>>> NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Neil Horman
> >>>> <nhorman@tuxdriver.com>
> >>>> Subject: Re: [PATCH v6 2/2] bus/auxiliary: introduce auxiliary bus
> >>>>
> >>>>
> >>>>
> >>>> On 25/06/2021 12:47, Xueming Li wrote:
> >>>>> Auxiliary bus [1] provides a way to split function into
> >>>>> child-devices representing sub-domains of functionality. Each
> >>>>> auxiliary device represents a part of its parent functionality.
> >>>>>
> >>>>> Auxiliary device is identified by unique device name, sysfs path:
> >>>>>   /sys/bus/auxiliary/devices/<name>
> >>>>>
> >>>>> Devargs legacy syntax ofauxiliary device:
> >>>>>   -a auxiliary:<name>[,args...]
> >>>>> Devargs generic syntax of auxiliary device:
> >>>>>   -a bus=auxiliary,name=<name>,,/class=<classs>,,/driver=<driver>,,
> >>>>>
> >>>>> [1] kernel auxiliary bus document:
> >>>>> https://www.kernel.org/doc/html/latest/driver-api/auxiliary_bus.html
> >>>>>
> >>>>> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> >>>>> Cc: Wang Haiyue <haiyue.wang@intel.com>
> >>>>> Cc: Thomas Monjalon <thomas@monjalon.net>
> >>>>> Cc: Kinsella Ray <mdr@ashroe.eu>
> >>>>> ---
> >>>>>  MAINTAINERS                               |   5 +
> >>>>>  doc/guides/rel_notes/release_21_08.rst    |   6 +
> >>>>>  drivers/bus/auxiliary/auxiliary_common.c  | 411 ++++++++++++++++++++++
> >>>>>  drivers/bus/auxiliary/auxiliary_params.c  |  59 ++++
> >>>>>  drivers/bus/auxiliary/linux/auxiliary.c   | 141 ++++++++
> >>>>>  drivers/bus/auxiliary/meson.build         |  16 +
> >>>>>  drivers/bus/auxiliary/private.h           |  74 ++++
> >>>>>  drivers/bus/auxiliary/rte_bus_auxiliary.h | 201 +++++++++++
> >>>>>  drivers/bus/auxiliary/version.map         |   7 +
> >>>>>  drivers/bus/meson.build                   |   1 +
> >>>>>  10 files changed, 921 insertions(+)
> >>>>>  create mode 100644 drivers/bus/auxiliary/auxiliary_common.c
> >>>>>  create mode 100644 drivers/bus/auxiliary/auxiliary_params.c
> >>>>>  create mode 100644 drivers/bus/auxiliary/linux/auxiliary.c
> >>>>>  create mode 100644 drivers/bus/auxiliary/meson.build
> >>>>>  create mode 100644 drivers/bus/auxiliary/private.h
> >>>>>  create mode 100644 drivers/bus/auxiliary/rte_bus_auxiliary.h
> >>>>>  create mode 100644 drivers/bus/auxiliary/version.map
> >>>>>
> >>>>
> >>>> Acked-by: Ray Kinsella <mdr@ashroe.eu>
> >>>
> >>> Thanks, but this patch is already integrated :)
> >>
> >> It appears that the order in which I am going through my email is
> >> incorrect. :-)
> >>
> >>>
> >>> Would you like to have a look at another deprecation notice? Andrew reviewed RFC:
> >>> https://mails.dpdk.org/archives/dev/2021-August/216007.html
> >>>
> >>
> >> It's not strictly a deprecation notice though; you are not breaking anything, right?
> >> Since you are not breaking anything, I don't think the notice is required in the 21.11 timeframe.
> >>
> >> Now if you were doing it in 21.08, it would be an ABI change and that would be a different story.
> >
> > Thanks for looking at this!
> > Yes, it targets 21.11. The offloading flag is fine, but the shared_group does break ABI, details:
> > 	https://mails.dpdk.org/archives/dev/2021-July/215575.html
> 
> Right ... it's a new field, not a deprecation as such.
> What I mean by this is that no existing code is broken.
> 
> 21.11 is a new ABI in any case and you are not deprecating anything, so no notice is required.

Maybe it is a new process; I confirmed with Thomas that the notice is expected:
https://doc.dpdk.org/guides/contributing/abi_policy.html#abi-changes

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v6 2/2] bus/auxiliary: introduce auxiliary bus
       [not found]             ` <DM4PR12MB53736410D2C07101F872363EA1F19@DM4PR12MB5373.namprd12.prod.outlook.com>
@ 2021-08-04 12:14  3%           ` Kinsella, Ray
  2021-08-04 13:00  3%             ` Xueming(Steven) Li
  0 siblings, 1 reply; 200+ results
From: Kinsella, Ray @ 2021-08-04 12:14 UTC (permalink / raw)
  To: Xueming(Steven) Li, dpdk-dev



On 04/08/2021 13:11, Xueming(Steven) Li wrote:
> 
> 
>> -----Original Message-----
>> From: Kinsella, Ray <mdr@ashroe.eu>
>> Sent: Wednesday, August 4, 2021 7:46 PM
>> To: Xueming(Steven) Li <xuemingl@nvidia.com>
>> Subject: Re: [PATCH v6 2/2] bus/auxiliary: introduce auxiliary bus
>>
>>
>>
>> On 04/08/2021 12:21, Xueming(Steven) Li wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Kinsella, Ray <mdr@ashroe.eu>
>>>> Sent: Wednesday, August 4, 2021 6:00 PM
>>>> To: Xueming(Steven) Li <xuemingl@nvidia.com>
>>>> Cc: dev@dpdk.org; Wang Haiyue <haiyue.wang@intel.com>;
>>>> NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Neil Horman
>>>> <nhorman@tuxdriver.com>
>>>> Subject: Re: [PATCH v6 2/2] bus/auxiliary: introduce auxiliary bus
>>>>
>>>>
>>>>
>>>> On 25/06/2021 12:47, Xueming Li wrote:
>>>>> Auxiliary bus [1] provides a way to split function into
>>>>> child-devices representing sub-domains of functionality. Each
>>>>> auxiliary device represents a part of its parent functionality.
>>>>>
>>>>> Auxiliary device is identified by unique device name, sysfs path:
>>>>>   /sys/bus/auxiliary/devices/<name>
>>>>>
>>>>> Devargs legacy syntax of auxiliary device:
>>>>>   -a auxiliary:<name>[,args...]
>>>>> Devargs generic syntax of auxiliary device:
>>>>>   -a bus=auxiliary,name=<name>,,/class=<classs>,,/driver=<driver>,,
>>>>>
>>>>> [1] kernel auxiliary bus document:
>>>>> https://www.kernel.org/doc/html/latest/driver-api/auxiliary_bus.html
>>>>>
>>>>> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
>>>>> Cc: Wang Haiyue <haiyue.wang@intel.com>
>>>>> Cc: Thomas Monjalon <thomas@monjalon.net>
>>>>> Cc: Kinsella Ray <mdr@ashroe.eu>
>>>>> ---
>>>>>  MAINTAINERS                               |   5 +
>>>>>  doc/guides/rel_notes/release_21_08.rst    |   6 +
>>>>>  drivers/bus/auxiliary/auxiliary_common.c  | 411 ++++++++++++++++++++++
>>>>>  drivers/bus/auxiliary/auxiliary_params.c  |  59 ++++
>>>>>  drivers/bus/auxiliary/linux/auxiliary.c   | 141 ++++++++
>>>>>  drivers/bus/auxiliary/meson.build         |  16 +
>>>>>  drivers/bus/auxiliary/private.h           |  74 ++++
>>>>>  drivers/bus/auxiliary/rte_bus_auxiliary.h | 201 +++++++++++
>>>>>  drivers/bus/auxiliary/version.map         |   7 +
>>>>>  drivers/bus/meson.build                   |   1 +
>>>>>  10 files changed, 921 insertions(+)
>>>>>  create mode 100644 drivers/bus/auxiliary/auxiliary_common.c
>>>>>  create mode 100644 drivers/bus/auxiliary/auxiliary_params.c
>>>>>  create mode 100644 drivers/bus/auxiliary/linux/auxiliary.c
>>>>>  create mode 100644 drivers/bus/auxiliary/meson.build
>>>>>  create mode 100644 drivers/bus/auxiliary/private.h
>>>>>  create mode 100644 drivers/bus/auxiliary/rte_bus_auxiliary.h
>>>>>  create mode 100644 drivers/bus/auxiliary/version.map
>>>>>
>>>>
>>>> Acked-by: Ray Kinsella <mdr@ashroe.eu>
>>>
>>> Thanks, but this patch is already integrated :)
>>
>> It appears that the order in which I am going through my email is incorrect. :-)
>>
>>>
>>> Would you like to have a look at another deprecation notice? Andrew reviewed RFC:
>>> https://mails.dpdk.org/archives/dev/2021-August/216007.html
>>>
>>
>> It's not strictly a deprecation notice though; you are not breaking anything, right?
>> Since you are not breaking anything, I don't think the notice is required in the 21.11 timeframe.
>>
>> Now if you were doing it in 21.08, it would be an ABI change and that would be a different story.
> 
> Thanks for looking at this!
> Yes, it targets 21.11. The offloading flag is fine, but the shared_group does break ABI, details:
> 	https://mails.dpdk.org/archives/dev/2021-July/215575.html

Right ... it's a new field, not a deprecation as such.
What I mean by this is that no existing code is broken.

21.11 is a new ABI in any case and you are not deprecating anything, so no notice is required.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v5] doc: policy on the promotion of experimental APIs
  2021-08-04 10:39  3%   ` Thomas Monjalon
@ 2021-08-04 11:49  0%     ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2021-08-04 11:49 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, bruce.richardson, john.mcnamara, roretzla, ferruh.yigit,
	david.marchand, stephen, jerinjacobk



On 04/08/2021 11:39, Thomas Monjalon wrote:
> 04/08/2021 11:34, Ray Kinsella:
>> Clarifying the ABI policy on the promotion of experimental APIS to stable.
>> We have a fair number of APIs that have been experimental for more than
>> 2 years. This policy amendment indicates that these APIs should be
>> promoted or removed, or should at least form a conservation between the
> 
> s/conservation/conversation/
> 
>> maintainer and original contributor.
>>
>> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
>> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
>> ---
>> +#. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may be
>> +   changed or removed without prior notice, as they are not considered part of
>> +   an ABI version. The :ref:`experimental <experimental_apis>` status of an API
>> +   is not an indefinite state.
> [...]
>> +Promotion to stable
>> +~~~~~~~~~~~~~~~~~~~
>> +
>> +An API's ``experimental`` status should be reviewed annually, by both the
>> +maintainer and/or the original contributor. Ordinarily APIs marked as
>> +``experimental`` will be promoted to the stable ABI once a maintainer has become
>> +satisfied that the API is mature and is unlikely to change.
>> +
>> +In exceptional circumstances, should an API still be classified as
>> +``experimental`` after two years and is without any prospect of becoming part of
>> +the stable API. The API will then become a candidate for removal, to avoid the
>> +accumulation of abandoned symbols.
>> +
>> +Should an API's Binary Interface change, usually due to a direct change to the
> 
> API's Binary Interface?
> I assume you mean ABI.
> 
>> +API's signature, it is reasonable for the review and expiry clocks to reset. The
>> +promotion or removal of symbols will typically form part of a conversation
>> +between the maintainer and the original contributor.
> 
> Acked-by: Thomas Monjalon <thomas@monjalon.net>
> 
> Applied with above changes, thanks.
> 

Thanks.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v5] doc: policy on the promotion of experimental APIs
  2021-08-04  9:34 23% ` [dpdk-dev] [PATCH v5] " Ray Kinsella
@ 2021-08-04 10:39  3%   ` Thomas Monjalon
  2021-08-04 11:49  0%     ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2021-08-04 10:39 UTC (permalink / raw)
  To: Ray Kinsella
  Cc: dev, bruce.richardson, john.mcnamara, roretzla, ferruh.yigit,
	david.marchand, stephen, jerinjacobk

04/08/2021 11:34, Ray Kinsella:
> Clarifying the ABI policy on the promotion of experimental APIS to stable.
> We have a fair number of APIs that have been experimental for more than
> 2 years. This policy amendment indicates that these APIs should be
> promoted or removed, or should at least form a conservation between the

s/conservation/conversation/

> maintainer and original contributor.
> 
> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
> +#. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may be
> +   changed or removed without prior notice, as they are not considered part of
> +   an ABI version. The :ref:`experimental <experimental_apis>` status of an API
> +   is not an indefinite state.
[...]
> +Promotion to stable
> +~~~~~~~~~~~~~~~~~~~
> +
> +An API's ``experimental`` status should be reviewed annually, by both the
> +maintainer and/or the original contributor. Ordinarily APIs marked as
> +``experimental`` will be promoted to the stable ABI once a maintainer has become
> +satisfied that the API is mature and is unlikely to change.
> +
> +In exceptional circumstances, should an API still be classified as
> +``experimental`` after two years and is without any prospect of becoming part of
> +the stable API. The API will then become a candidate for removal, to avoid the
> +accumulation of abandoned symbols.
> +
> +Should an API's Binary Interface change, usually due to a direct change to the

API's Binary Interface?
I assume you mean ABI.

> +API's signature, it is reasonable for the review and expiry clocks to reset. The
> +promotion or removal of symbols will typically form part of a conversation
> +between the maintainer and the original contributor.

Acked-by: Thomas Monjalon <thomas@monjalon.net>

Applied with above changes, thanks.



^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v5] doc: policy on the promotion of experimental APIs
      2021-08-03 16:44 23% ` [dpdk-dev] [PATCH v4] " Ray Kinsella
@ 2021-08-04  9:34 23% ` Ray Kinsella
  2021-08-04 10:39  3%   ` Thomas Monjalon
  2 siblings, 1 reply; 200+ results
From: Ray Kinsella @ 2021-08-04  9:34 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, john.mcnamara, roretzla, ferruh.yigit, thomas,
	david.marchand, stephen, jerinjacobk, Ray Kinsella

Clarifying the ABI policy on the promotion of experimental APIS to stable.
We have a fair number of APIs that have been experimental for more than
2 years. This policy amendment indicates that these APIs should be
promoted or removed, or should at least form a conservation between the
maintainer and original contributor.

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
v2: comments on abi expiry from Tyler Retzlaff.
v3: typos in the git commit message
v4: typos and comments by Jerin Jacob
v5: typos caught by the CI

 doc/guides/contributing/abi_policy.rst | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
index 4ad87dbfed..520763b63a 100644
--- a/doc/guides/contributing/abi_policy.rst
+++ b/doc/guides/contributing/abi_policy.rst
@@ -26,9 +26,10 @@ General Guidelines
    symbols is managed with :ref:`ABI Versioning <abi_versioning>`.
 #. The removal of symbols is considered an :ref:`ABI breakage <abi_breakages>`,
    once approved these will form part of the next ABI version.
-#. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may
-   be changed or removed without prior notice, as they are not considered part
-   of an ABI version.
+#. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may be
+   changed or removed without prior notice, as they are not considered part of
+   an ABI version. The :ref:`experimental <experimental_apis>` status of an API
+   is not an indefinite state.
 #. Updates to the :ref:`minimum hardware requirements <hw_rqmts>`, which drop
    support for hardware which was previously supported, should be treated as an
    ABI change.
@@ -358,3 +359,21 @@ Libraries
 Libraries marked as ``experimental`` are entirely not considered part of an ABI
 version.
 All functions in such libraries may be changed or removed without prior notice.
+
+Promotion to stable
+~~~~~~~~~~~~~~~~~~~
+
+An API's ``experimental`` status should be reviewed annually, by both the
+maintainer and/or the original contributor. Ordinarily APIs marked as
+``experimental`` will be promoted to the stable ABI once a maintainer has become
+satisfied that the API is mature and is unlikely to change.
+
+In exceptional circumstances, should an API still be classified as
+``experimental`` after two years and is without any prospect of becoming part of
+the stable API. The API will then become a candidate for removal, to avoid the
+accumulation of abandoned symbols.
+
+Should an API's Binary Interface change, usually due to a direct change to the
+API's signature, it is reasonable for the review and expiry clocks to reset. The
+promotion or removal of symbols will typically form part of a conversation
+between the maintainer and the original contributor.
-- 
2.26.2


^ permalink raw reply	[relevance 23%]

* Re: [dpdk-dev] [PATCH] doc: announce cryptodev-PMD interface as internal
  2021-08-04  6:44  0%   ` Matan Azrad
@ 2021-08-04  8:44  0%     ` Hemant Agrawal
  2021-08-04 14:35  0%       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Hemant Agrawal @ 2021-08-04  8:44 UTC (permalink / raw)
  To: Matan Azrad, Ajit Khaparde, Akhil Goyal
  Cc: dpdk-dev, anoobj, Radu Nicolau, Doherty, Declan, Ananyev,
	Konstantin, NBU-Contact-Thomas Monjalon, Zhang, Roy Fan,
	Somalapuram Amaranath, Ruifeng Wang, Pablo de Lara, Fiona Trahe,
	adwivedi, michaelsh, rnagadheeraj, Jay Zhou

> 
> From: Ajit Khaparde
> > > The APIs which are internal to PMD and cryptodev library can be
> > > marked as internal so that ABI checking do not shout for changes in
> > > APIs which are internal to DPDK.
> > >
> > > Signed-off-by: Akhil Goyal <gakhil@marvell.com>
> > Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
> Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: announce cryptodev-PMD interface as internal
  2021-08-03 19:25  0% ` Ajit Khaparde
@ 2021-08-04  6:44  0%   ` Matan Azrad
  2021-08-04  8:44  0%     ` Hemant Agrawal
  0 siblings, 1 reply; 200+ results
From: Matan Azrad @ 2021-08-04  6:44 UTC (permalink / raw)
  To: Ajit Khaparde, Akhil Goyal
  Cc: dpdk-dev, anoobj, Radu Nicolau, Doherty, Declan, Hemant Agrawal,
	Ananyev, Konstantin, NBU-Contact-Thomas Monjalon, Zhang, Roy Fan,
	Somalapuram Amaranath, Ruifeng Wang, Pablo de Lara, Fiona Trahe,
	adwivedi, michaelsh, rnagadheeraj, Jay Zhou



From: Ajit Khaparde
> > The APIs which are internal to PMD and cryptodev library
> > can be marked as internal so that ABI checking do not
> > shout for changes in APIs which are internal to DPDK.
> >
> > Signed-off-by: Akhil Goyal <gakhil@marvell.com>
> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Matan Azrad <matan@nvidia.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] doc: announce changes to eventdev library
  2021-08-03  4:12  3%   ` Jerin Jacob
  2021-08-03  8:32  0%     ` Mattias Rönnblom
  2021-08-04  5:57  0%     ` Jayatheerthan, Jay
@ 2021-08-04  6:06  0%     ` Gujjar, Abhinandan S
  2021-08-05 14:22  0%     ` Thomas Monjalon
  3 siblings, 0 replies; 200+ results
From: Gujjar, Abhinandan S @ 2021-08-04  6:06 UTC (permalink / raw)
  To: Jerin Jacob, Pavan Nikhilesh, Carrillo, Erik G, Van Haaren,
	Harry, Hemant Agrawal, McDaniel, Timothy, Liang Ma,
	Jayatheerthan, Jay
  Cc: Jerin Jacob, Ray Kinsella, dpdk-dev, mattias.ronnblom, Thomas Monjalon



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Tuesday, August 3, 2021 9:43 AM
> To: Pavan Nikhilesh <pbhagavatula@marvell.com>; Gujjar, Abhinandan S
> <abhinandan.gujjar@intel.com>; Carrillo, Erik G <erik.g.carrillo@intel.com>;
> Van Haaren, Harry <harry.van.haaren@intel.com>; Hemant Agrawal
> <hemant.agrawal@nxp.com>; McDaniel, Timothy
> <timothy.mcdaniel@intel.com>; Liang Ma <liang.j.ma@intel.com>;
> Jayatheerthan, Jay <jay.jayatheerthan@intel.com>
> Cc: Jerin Jacob <jerinj@marvell.com>; Ray Kinsella <mdr@ashroe.eu>; dpdk-
> dev <dev@dpdk.org>; mattias.ronnblom
> <mattias.ronnblom@ericsson.com>; Thomas Monjalon
> <thomas@monjalon.net>
> Subject: Re: [dpdk-dev] [PATCH v2] doc: announce changes to eventdev
> library
> 
> On Tue, Aug 3, 2021 at 2:46 AM <pbhagavatula@marvell.com> wrote:
> >
> > From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> >
> > Make driver layer as internal, remove unnecessary rte_ prefix for
> > structures and functions that are not a part of public API.
> > Promote experimental trace and vector APIs to stable.
> > Add reserved field to `rte_event_timer` structure.
> >
> > Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> 
> Acked-by: Jerin Jacob <jerinj@marvell.com>
> 
> 
> ++ Eventdev driver Maintainers.
> 
> This list is based on items identified for 21.11 ABI improvement at
> https://docs.google.com/spreadsheets/d/1betlC000ua5SsSiJIcC54mCCCJnW
> 6voH5Dqv9UxeyfE/edit#gid=0
> 
> 
> > ---
> >  v2 Changes:
> >  - Fix build issues.
> >
> >  doc/guides/rel_notes/deprecation.rst | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/doc/guides/rel_notes/deprecation.rst
> > b/doc/guides/rel_notes/deprecation.rst
> > index d9c0e65921..6ac321eb1e 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -158,3 +158,14 @@ Deprecation Notices
> >  * security: The functions ``rte_security_set_pkt_metadata`` and
> >    ``rte_security_get_userdata`` will be made inline functions and additional
> >    flags will be added in structure ``rte_security_ctx`` in DPDK 21.11.
> > +
> > +* eventdev: The file ``rte_eventdev_pmd.h`` will be renamed to
> > +``eventdev_driver.h``
> > +  to make the driver interface as internal and the structures
> > +``rte_eventdev_data``,
> > +  ``rte_eventdev`` and ``rte_eventdevs`` will be moved to a new file
> > +named
> > +  ``rte_eventdev_core.h`` in DPDK 21.11.
> > +  The ``rte_`` prefix for internal structures and functions will be
> > +removed across the
> > +  library.
> > +  The experimental eventdev trace APIs and
> > +``rte_event_vector_pool_create``,
> > +  ``rte_event_eth_rx_adapter_vector_limits_get`` will be promoted to
> stable.
> > +  An 8byte reserved field will be added to the structure
> > +``rte_event_timer`` to
> > +  support future extensions.
> > --
> > 2.17.1
> >
Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] doc: announce changes to eventdev library
  2021-08-03  4:12  3%   ` Jerin Jacob
  2021-08-03  8:32  0%     ` Mattias Rönnblom
@ 2021-08-04  5:57  0%     ` Jayatheerthan, Jay
  2021-08-04  6:06  0%     ` Gujjar, Abhinandan S
  2021-08-05 14:22  0%     ` Thomas Monjalon
  3 siblings, 0 replies; 200+ results
From: Jayatheerthan, Jay @ 2021-08-04  5:57 UTC (permalink / raw)
  To: Jerin Jacob, Pavan Nikhilesh, Gujjar, Abhinandan S, Carrillo,
	Erik G, Van Haaren, Harry, Hemant Agrawal, McDaniel, Timothy,
	Liang Ma
  Cc: Jerin Jacob, Ray Kinsella, dpdk-dev, mattias.ronnblom, Thomas Monjalon

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Tuesday, August 3, 2021 9:43 AM
> To: Pavan Nikhilesh <pbhagavatula@marvell.com>; Gujjar, Abhinandan S <abhinandan.gujjar@intel.com>; Carrillo, Erik G
> <erik.g.carrillo@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>; Hemant Agrawal <hemant.agrawal@nxp.com>;
> McDaniel, Timothy <timothy.mcdaniel@intel.com>; Liang Ma <liang.j.ma@intel.com>; Jayatheerthan, Jay
> <jay.jayatheerthan@intel.com>
> Cc: Jerin Jacob <jerinj@marvell.com>; Ray Kinsella <mdr@ashroe.eu>; dpdk-dev <dev@dpdk.org>; mattias.ronnblom
> <mattias.ronnblom@ericsson.com>; Thomas Monjalon <thomas@monjalon.net>
> Subject: Re: [dpdk-dev] [PATCH v2] doc: announce changes to eventdev library
> 
> On Tue, Aug 3, 2021 at 2:46 AM <pbhagavatula@marvell.com> wrote:
> >
> > From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> >
> > Make driver layer as internal, remove unnecessary rte_ prefix for
> > structures and functions that are not a part of public API.
> > Promote experimental trace and vector APIs to stable.
> > Add reserved field to `rte_event_timer` structure.
> >
> > Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> 
> Acked-by: Jerin Jacob <jerinj@marvell.com>
> 
> 
> ++ Eventdev driver Maintainers.
> 
> This list is based on items identified for 21.11 ABI improvement at
> https://docs.google.com/spreadsheets/d/1betlC000ua5SsSiJIcC54mCCCJnW6voH5Dqv9UxeyfE/edit#gid=0
> 
> 
> > ---
> >  v2 Changes:
> >  - Fix build issues.
> >
> >  doc/guides/rel_notes/deprecation.rst | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> > index d9c0e65921..6ac321eb1e 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -158,3 +158,14 @@ Deprecation Notices
> >  * security: The functions ``rte_security_set_pkt_metadata`` and
> >    ``rte_security_get_userdata`` will be made inline functions and additional
> >    flags will be added in structure ``rte_security_ctx`` in DPDK 21.11.
> > +
> > +* eventdev: The file ``rte_eventdev_pmd.h`` will be renamed to ``eventdev_driver.h``
> > +  to make the driver interface as internal and the structures ``rte_eventdev_data``,
> > +  ``rte_eventdev`` and ``rte_eventdevs`` will be moved to a new file named
> > +  ``rte_eventdev_core.h`` in DPDK 21.11.
> > +  The ``rte_`` prefix for internal structures and functions will be removed across the
> > +  library.
> > +  The experimental eventdev trace APIs and ``rte_event_vector_pool_create``,
> > +  ``rte_event_eth_rx_adapter_vector_limits_get`` will be promoted to stable.
> > +  An 8byte reserved field will be added to the structure ``rte_event_timer`` to
> > +  support future extensions.
> > --
> > 2.17.1
> >

Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: announce cryptodev-PMD interface as internal
  2021-08-03 11:44  3% [dpdk-dev] [PATCH] doc: announce cryptodev-PMD interface as internal Akhil Goyal
@ 2021-08-03 19:25  0% ` Ajit Khaparde
  2021-08-04  6:44  0%   ` Matan Azrad
  0 siblings, 1 reply; 200+ results
From: Ajit Khaparde @ 2021-08-03 19:25 UTC (permalink / raw)
  To: Akhil Goyal
  Cc: dpdk-dev, anoobj, Radu Nicolau, Doherty, Declan, Hemant Agrawal,
	Matan Azrad, Ananyev, Konstantin, Thomas Monjalon, Zhang,
	Roy Fan, Somalapuram Amaranath, Ruifeng Wang, Pablo de Lara,
	Fiona Trahe, adwivedi, michaelsh, rnagadheeraj, Jay Zhou

[-- Attachment #1: Type: text/plain, Size: 1184 bytes --]

On Tue, Aug 3, 2021 at 4:45 AM Akhil Goyal <gakhil@marvell.com> wrote:
>
> The APIs which are internal to PMD and cryptodev library
> can be marked as internal so that ABI checking do not
> shout for changes in APIs which are internal to DPDK.
>
> Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

> ---
>  doc/guides/rel_notes/deprecation.rst | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 6a35c7649a..f81bd87f10 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -148,6 +148,9 @@ Deprecation Notices
>    content. On Linux and FreeBSD, supported prior to DPDK 20.11,
>    original structure will be kept until DPDK 21.11.
>
> +* cryptodev: The APIs for interfacing between library and PMD will be marked
> +  as internal APIs in DPDK 21.11.
> +
>  * security: The functions ``rte_security_set_pkt_metadata`` and
>    ``rte_security_get_userdata`` will be made inline functions and additional
>    flags will be added in structure ``rte_security_ctx`` in DPDK 21.11.
> --
> 2.25.1
>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v13 00/10] eal: Add EAL API for threading
  2021-08-02 17:32  3%   ` [dpdk-dev] [PATCH v12 " Narcisa Ana Maria Vasile
@ 2021-08-03 19:01  3%     ` Narcisa Ana Maria Vasile
  0 siblings, 0 replies; 200+ results
From: Narcisa Ana Maria Vasile @ 2021-08-03 19:01 UTC (permalink / raw)
  To: dev, thomas, dmitry.kozliuk, khot, navasile, dmitrym, roretzla,
	talshn, ocardona
  Cc: bruce.richardson, david.marchand, pallavi.kadam

From: Narcisa Vasile <navasile@microsoft.com>

EAL thread API

**Problem Statement**
DPDK currently uses the pthread interface to create and manage threads.
Windows does not support the POSIX thread programming model, so it currently
relies on a header file that hides the Windows calls under pthread-matched
interfaces. Given that EAL should isolate the environment specifics from the
applications and libraries and mediate all communication with the operating
system, a new EAL interface is needed for thread management.

**Goals**
* Introduce a generic EAL API for threading support that will remove
  the current Windows pthread.h shim.
* Replace references to pthread_* across the DPDK codebase with the new
  RTE_THREAD_* API.
* Allow users to choose between using the RTE_THREAD_* API or a
  3rd party thread library through a configuration option.

**Design plan**
New API main files:
* rte_thread.h (librte_eal/include)
* rte_thread.c (librte_eal/windows)
* rte_thread.c (librte_eal/common)

**A schematic example of the design**
--------------------------------------------------
lib/librte_eal/include/rte_thread.h
int rte_thread_create();

lib/librte_eal/common/rte_thread.c
int rte_thread_create() 
{
	return pthread_create();
}

lib/librte_eal/windows/rte_thread.c
int rte_thread_create() 
{
	return CreateThread();
}
-----------------------------------------------------
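
The POSIX side of the schematic above could be fleshed out roughly as
follows. This is only a sketch: the sketch_* names, the opaque identifier
layout and the argument lists are assumptions made for illustration and are
not the signatures from this patch set.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical opaque thread identifier (not the patch set's actual type). */
typedef struct {
	uintptr_t opaque_id; /* holds the OS-specific thread identifier */
} sketch_thread_t;

typedef void *(*sketch_thread_func)(void *);

/* POSIX flavour: forward to pthread_create(); returns 0 or an errno value. */
int
sketch_thread_create(sketch_thread_t *thread_id, sketch_thread_func func,
		void *arg)
{
	pthread_t tid;
	int ret;

	if (thread_id == NULL || func == NULL)
		return EINVAL;

	ret = pthread_create(&tid, NULL, func, arg);
	if (ret != 0)
		return ret;

	thread_id->opaque_id = (uintptr_t)tid;
	return 0;
}

/* Wait for the thread to finish and collect its return value. */
int
sketch_thread_join(sketch_thread_t thread_id, void **value_ptr)
{
	return pthread_join((pthread_t)thread_id.opaque_id, value_ptr);
}

/* Example start routine: increments the int that arg points to. */
void *
sketch_increment(void *arg)
{
	int *counter = arg;

	(*counter)++;
	return arg;
}
```

A Windows flavour would keep the same prototypes and call CreateThread()
and WaitForSingleObject() internally, so callers never see which backend
is in use.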

**Thread attributes**

When or after a thread is created, specific characteristics of the thread
can be adjusted. Given that the thread characteristics that are of interest
for DPDK applications are affinity and priority, the following structure
that represents thread attributes has been defined:

typedef struct
{
	enum rte_thread_priority priority;
	rte_cpuset_t cpuset;
} rte_thread_attr_t;

The *rte_thread_create()* function can optionally receive an
rte_thread_attr_t object that will cause the thread to be created with the
affinity and priority described by the attributes object. If no
rte_thread_attr_t is passed (parameter is NULL), the default affinity and
priority are used. An rte_thread_attr_t object can also be set to the default
values by calling *rte_thread_attr_init()*.
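
To make the attribute handling concrete, here is a minimal sketch of an
attributes object with an init function and setters. The sketch_* names are
hypothetical, and the 64-bit mask standing in for rte_cpuset_t is an
assumption chosen to keep the example portable; the real type is wider.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_thread_priority {
	SKETCH_THREAD_PRIORITY_NORMAL,
	SKETCH_THREAD_PRIORITY_REALTIME_CRITICAL,
};

/* Stand-in for rte_cpuset_t: one bit per CPU, 64 CPUs at most (assumption). */
typedef struct {
	uint64_t bits;
} sketch_cpuset_t;

typedef struct {
	enum sketch_thread_priority priority;
	sketch_cpuset_t cpuset;
} sketch_thread_attr_t;

/* Reset an attributes object to defaults: normal priority, empty CPU set. */
int
sketch_thread_attr_init(sketch_thread_attr_t *attr)
{
	if (attr == NULL)
		return EINVAL;

	attr->priority = SKETCH_THREAD_PRIORITY_NORMAL;
	attr->cpuset.bits = 0; /* empty set means "no affinity requested" here */
	return 0;
}

/* Record the requested priority; nothing is applied until thread creation. */
int
sketch_thread_attr_set_priority(sketch_thread_attr_t *attr,
		enum sketch_thread_priority priority)
{
	if (attr == NULL)
		return EINVAL;

	attr->priority = priority;
	return 0;
}

/* Copy the requested CPU set into the attributes object. */
int
sketch_thread_attr_set_affinity(sketch_thread_attr_t *attr,
		const sketch_cpuset_t *cpuset)
{
	if (attr == NULL || cpuset == NULL)
		return EINVAL;

	attr->cpuset = *cpuset;
	return 0;
}

/* Read back the CPU set stored in the attributes object. */
int
sketch_thread_attr_get_affinity(const sketch_thread_attr_t *attr,
		sketch_cpuset_t *cpuset)
{
	if (attr == NULL || cpuset == NULL)
		return EINVAL;

	*cpuset = attr->cpuset;
	return 0;
}
```

Keeping the attributes in a plain structure means the same object can be
reused for several threads, and a NULL pointer at creation time can simply
be treated as "use the defaults".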

*Priority* is represented through an enum that currently advertises
two values for priority:
	- RTE_THREAD_PRIORITY_NORMAL
	- RTE_THREAD_PRIORITY_REALTIME_CRITICAL
The enum can be extended to allow for multiple priority levels.
rte_thread_set_priority      - sets the priority of a thread
rte_thread_attr_set_priority - updates an rte_thread_attr_t object
                               with a new value for priority
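
On a POSIX system, the portable priority values have to be translated into
a scheduling policy and an OS priority before they can be applied. A
possible mapping is sketched below; the choice of SCHED_OTHER for normal
and SCHED_RR at maximum priority for realtime is an assumption made for
this example, and the sketch_* names are hypothetical.

```c
#include <errno.h>
#include <sched.h>
#include <stddef.h>

enum sketch_thread_priority {
	SKETCH_THREAD_PRIORITY_NORMAL,
	SKETCH_THREAD_PRIORITY_REALTIME_CRITICAL,
};

/*
 * Translate a portable priority into a POSIX policy and priority value.
 * Assumed mapping: NORMAL -> SCHED_OTHER at its minimum priority,
 * REALTIME_CRITICAL -> SCHED_RR at its maximum priority.
 * Returns 0 on success or an errno value on failure.
 */
int
sketch_priority_to_posix(enum sketch_thread_priority prio, int *policy,
		int *os_prio)
{
	int pol, val;

	if (policy == NULL || os_prio == NULL)
		return EINVAL;

	switch (prio) {
	case SKETCH_THREAD_PRIORITY_NORMAL:
		pol = SCHED_OTHER;
		val = sched_get_priority_min(pol);
		break;
	case SKETCH_THREAD_PRIORITY_REALTIME_CRITICAL:
		pol = SCHED_RR;
		val = sched_get_priority_max(pol);
		break;
	default:
		return EINVAL;
	}

	/* sched_get_priority_min/max() return -1 and set errno on failure. */
	if (val == -1)
		return errno;

	*policy = pol;
	*os_prio = val;
	return 0;
}
```

The resulting pair could then be handed to pthread_setschedparam(); note
that switching to a realtime policy usually requires elevated privileges.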

The user can choose the thread priority through an EAL parameter
when starting an application. If the EAL parameter is not used,
the per-platform default value for thread priority is used.
Otherwise, the administrator can select one of the available options:
 --thread-prio normal
 --thread-prio realtime

Example:
./dpdk-l2fwd -l 0-3 -n 4 --thread-prio normal -- -q 8 -p ffff
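
Parsing the value of the option could look like the sketch below. It only
handles the two strings listed above; the function name and the -1 error
convention are assumptions for illustration and do not reflect the actual
code in eal_common_options.c.

```c
#include <stddef.h>
#include <string.h>

enum sketch_thread_priority {
	SKETCH_THREAD_PRIORITY_NORMAL,
	SKETCH_THREAD_PRIORITY_REALTIME_CRITICAL,
};

/*
 * Convert the argument of --thread-prio into a priority value.
 * Returns 0 on success; returns -1 and leaves *out untouched otherwise.
 */
int
sketch_parse_thread_prio(const char *value, enum sketch_thread_priority *out)
{
	if (value == NULL || out == NULL)
		return -1;

	if (strcmp(value, "normal") == 0) {
		*out = SKETCH_THREAD_PRIORITY_NORMAL;
		return 0;
	}
	if (strcmp(value, "realtime") == 0) {
		*out = SKETCH_THREAD_PRIORITY_REALTIME_CRITICAL;
		return 0;
	}

	return -1; /* unknown value: the caller should report a usage error */
}
```

Leaving the output untouched on failure lets the caller keep the
per-platform default when the option is malformed.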

*Affinity* is described by the already known "rte_cpuset_t" type.
rte_thread_attr_set/get_affinity - sets/gets the affinity field in a
                                   rte_thread_attr_t object
rte_thread_set/get_affinity      - sets/gets the affinity of a thread

**Errors**
A translation function that maps Windows error codes to errno-style
error codes is provided. 
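
As an illustration of what such a translation can look like, the sketch
below maps a handful of Win32 error codes to errno values. The numeric
codes are redefined locally so that the example also compiles outside
Windows; the selection of codes, the EINVAL fallback and the function name
are assumptions and are not taken from the patch set.

```c
#include <errno.h>

/* Win32 system error codes, redefined locally for portability of the sketch. */
#define SKETCH_ERROR_SUCCESS            0UL
#define SKETCH_ERROR_FILE_NOT_FOUND     2UL
#define SKETCH_ERROR_ACCESS_DENIED      5UL
#define SKETCH_ERROR_NOT_ENOUGH_MEMORY  8UL
#define SKETCH_ERROR_INVALID_PARAMETER  87UL

/* Map a Win32 error code (as returned by GetLastError()) to an errno value. */
int
sketch_win32_error_to_errno(unsigned long code)
{
	switch (code) {
	case SKETCH_ERROR_SUCCESS:
		return 0;
	case SKETCH_ERROR_FILE_NOT_FOUND:
		return ENOENT;
	case SKETCH_ERROR_ACCESS_DENIED:
		return EACCES;
	case SKETCH_ERROR_NOT_ENOUGH_MEMORY:
		return ENOMEM;
	case SKETCH_ERROR_INVALID_PARAMETER:
		return EINVAL;
	default:
		/* Fallback chosen for this sketch; unknown codes become EINVAL. */
		return EINVAL;
	}
}
```

With a helper of this kind, the Windows implementation can return the same
errno-style values as the POSIX one, so callers need a single error path.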

**Future work**
The long term plan is for EAL to provide full threading support:
* Add support for condition variables
* Add support for pthread_mutex_trylock
* Additional functionality offered by pthread_*
  (such as pthread_setname_np, etc.)

v13:
 - Fix syntax error in unit tests

v12:
 - Fix freebsd warning about initializer in unit tests

v11:
 - Add unit tests for thread API
 - Rebase

v10:
 - Remove patch no. 10. It will be broken down in subpatches 
   and sent as a different patchset that depends on this one.
   This is done due to the ABI breaks that would be caused by patch 10.
 - Replace unix/rte_thread.c with common/rte_thread.c
 - Remove initializations that may prevent compiler from issuing useful
   warnings.
 - Remove rte_thread_types.h and rte_windows_thread_types.h
 - Remove unneeded priority macros (EAL_THREAD_PRIORITY*)
 - Remove functions that retrieve thread handle from process handle
 - Remove rte_thread_cancel() until same behavior is obtained on
   all platforms.
 - Fix rte_thread_detach() function description,
   return value and remove empty line.
 - Reimplement mutex functions. Add compatible representation for mutex
   identifier. Add macro to replace static mutex initialization instances.
 - Fix commit messages (lines too long, remove unicode symbols)

v9:
- Sign patches

v8:
- Rebase
- Add rte_thread_detach() API
- Set default priority, when user did not specify a value

v7:
Based on DmitryK's review:
- Change thread id representation
- Change mutex id representation
- Implement static mutex initializer for Windows
- Change barrier identifier representation
- Improve commit messages
- Add missing doxygen comments
- Split error translation function
- Improve name for affinity function
- Remove cpuset_size parameter
- Fix eal_create_cpu_map function
- Map EAL priority values to OS specific values
- Add thread wrapper for start routine
- Do not export rte_thread_cancel() on Windows
- Cleanup, fix comments, fix typos.

v6:
- improve error-translation function
- call the error translation function in rte_thread_value_get()

v5:
- update cover letter with more details on the priority argument

v4:
- fix function description
- rebase

v3:
- rebase

v2:
- revert changes that break ABI 
- break up changes into smaller patches
- fix coding style issues
- fix issues with errors
- fix parameter type in examples/kni.c


Narcisa Vasile (10):
  eal: add basic threading functions
  eal: add thread attributes
  eal/windows: translate Windows errors to errno-style errors
  eal: implement functions for thread affinity management
  eal: implement thread priority management functions
  eal: add thread lifetime management
  eal: implement functions for mutex management
  eal: implement functions for thread barrier management
  eal: add EAL argument for setting thread priority
  Add unit tests for thread API

 app/test/meson.build                |   2 +
 app/test/test_threads.c             | 419 ++++++++++++++++++++
 lib/eal/common/eal_common_options.c |  28 +-
 lib/eal/common/eal_internal_cfg.h   |   2 +
 lib/eal/common/eal_options.h        |   2 +
 lib/eal/common/meson.build          |   1 +
 lib/eal/common/rte_thread.c         | 445 +++++++++++++++++++++
 lib/eal/include/rte_thread.h        | 406 ++++++++++++++++++-
 lib/eal/unix/meson.build            |   1 -
 lib/eal/unix/rte_thread.c           |  92 -----
 lib/eal/version.map                 |  20 +
 lib/eal/windows/eal_lcore.c         | 176 ++++++---
 lib/eal/windows/eal_windows.h       |  10 +
 lib/eal/windows/include/sched.h     |   2 +-
 lib/eal/windows/rte_thread.c        | 588 ++++++++++++++++++++++++++--
 15 files changed, 2020 insertions(+), 174 deletions(-)
 create mode 100644 app/test/test_threads.c
 create mode 100644 lib/eal/common/rte_thread.c
 delete mode 100644 lib/eal/unix/rte_thread.c

-- 
2.31.0.vfs.0.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v4] doc: policy on the promotion of experimental APIs
    @ 2021-08-03 16:44 23% ` Ray Kinsella
  2021-08-04  9:34 23% ` [dpdk-dev] [PATCH v5] " Ray Kinsella
  2 siblings, 0 replies; 200+ results
From: Ray Kinsella @ 2021-08-03 16:44 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, john.mcnamara, roretzla, ferruh.yigit, thomas,
	david.marchand, stephen, jerinjacobk, Ray Kinsella

Clarifying the ABI policy on the promotion of experimental APIs to stable.
We have a fair number of APIs that have been experimental for more than
2 years. This policy amendment indicates that these APIs should be
promoted or removed, or should at least form a conversation between the
maintainer and original contributor.

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
v2: addressing comments on abi expiry from Tyler Retzlaff.
v3: addressing typos in the git commit message
v4: addressing typos and comments by Jerin Jacob

 doc/guides/contributing/abi_policy.rst | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
index 4ad87dbfed..1acd12cbf4 100644
--- a/doc/guides/contributing/abi_policy.rst
+++ b/doc/guides/contributing/abi_policy.rst
@@ -26,9 +26,10 @@ General Guidelines
    symbols is managed with :ref:`ABI Versioning <abi_versioning>`.
 #. The removal of symbols is considered an :ref:`ABI breakage <abi_breakages>`,
    once approved these will form part of the next ABI version.
-#. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may
-   be changed or removed without prior notice, as they are not considered part
-   of an ABI version.
+#. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may be
+   changed or removed without prior notice, as they are not considered part of
+   an ABI version. The :ref:`experimental <experimental_apis>` status of an API
+   is not an indefinite state.
 #. Updates to the :ref:`minimum hardware requirements <hw_rqmts>`, which drop
    support for hardware which was previously supported, should be treated as an
    ABI change.
@@ -358,3 +359,21 @@ Libraries
 Libraries marked as ``experimental`` are entirely not considered part of an ABI
 version.
 All functions in such libraries may be changed or removed without prior notice.
+
+Promotion to stable
+~~~~~~~~~~~~~~~~~~~
+
+An API's ``experimental`` status should be reviewed annually by the maintainer
+and/or the original contributor. Ordinarily APIs marked as ``experimental``
+will be promoted to the stable ABI once a maintainer has become satisfied that
+the API is mature and is unlikely to change.
+
+In exceptional circumstances, should an API still be classified as
+``experimental`` after two years and be without any prospect of becoming part
+of the stable API, the API will then become a candidate for removal, to avoid
+the accumulation of abandoned symbols.
+
+Should an API's Binary Interface change, usually due to a direct change to the
+API's signature, it is reasonable for the review and expiry clocks to reset. The
+promotion or removal of symbols will typically form part of a conversation
+between the maintainer and the original contributor.
-- 
2.26.2


^ permalink raw reply	[relevance 23%]

* Re: [dpdk-dev] [PATCH v3] doc: policy on the promotion of experimental APIs
  2021-07-11  7:22  0%       ` Jerin Jacob
@ 2021-08-03 14:12  3%         ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2021-08-03 14:12 UTC (permalink / raw)
  To: Jerin Jacob, Tyler Retzlaff
  Cc: dpdk-dev, Richardson, Bruce, John McNamara, Ferruh Yigit,
	Thomas Monjalon, David Marchand, Stephen Hemminger



On 11/07/2021 08:22, Jerin Jacob wrote:
> On Sat, Jul 10, 2021 at 12:46 AM Tyler Retzlaff
> <roretzla@linux.microsoft.com> wrote:
>>
>> On Fri, Jul 09, 2021 at 11:46:54AM +0530, Jerin Jacob wrote:
>>>> +
>>>> +Promotion to stable
>>>> +~~~~~~~~~~~~~~~~~~~
>>>> +
>>>> +Ordinarily APIs marked as ``experimental`` will be promoted to the stable ABI
>>>> +once a maintainer and/or the original contributor is satisfied that the API is
>>>> +reasonably mature. In exceptional circumstances, should an API still be
>>>
>>> Is this line with git commit message?
>>> Why making an exceptional case? why not make it stable after two years
>>> or remove it.
>>> My worry is if we make an exception case, it will be difficult to
>>> enumerate the exception case.
>>
>> i think the intent here is to indicate that an api/abi doesn't just
>> automatically become stable after a period of time.  there also has to
>> be an evaluation by the maintainer / community before making it stable.
>>
>> so i guess the timer is something that should force that evaluation. as
>> a part of that evaluation one would imagine there is justification for
>> keeping the api as experimental for longer and if so a rationale as to
>> why.
> 
> I think, we need to have a deadline. Probably one year timer for evaluation and
> two year for max time for decision to make it as stable or remove.
> 

Tyler is correct here (sorry for the delay, I was out on vacation).
In my usage of the word exceptional, I was conveying that an API aging or timing out should be an exceptional event.
What I am hoping will happen in 90% of cases is conveyed in the previous line.

"Ordinarily APIs marked as ``experimental`` will be promoted to the stable ABI
once a maintainer and/or the original contributor is satisfied that the API is
reasonably mature."

i.e. that the symbol has to be proactively managed, with the maintainer and original author deciding when to promote.

I will add a line to indicate that experimental APIs should be reviewed after one year.

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH] doc: announce cryptodev-PMD interface as internal
@ 2021-08-03 11:44  3% Akhil Goyal
  2021-08-03 19:25  0% ` Ajit Khaparde
  0 siblings, 1 reply; 200+ results
From: Akhil Goyal @ 2021-08-03 11:44 UTC (permalink / raw)
  To: dev
  Cc: anoobj, radu.nicolau, declan.doherty, hemant.agrawal, matan,
	konstantin.ananyev, thomas, roy.fan.zhang, asomalap,
	ruifeng.wang, ajit.khaparde, pablo.de.lara.guarch, fiona.trahe,
	adwivedi, michaelsh, rnagadheeraj, jianjay.zhou, Akhil Goyal

The APIs which are internal to the PMDs and the cryptodev library
can be marked as internal so that the ABI checks do not
complain about changes in APIs which are internal to DPDK.

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
---
 doc/guides/rel_notes/deprecation.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 6a35c7649a..f81bd87f10 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -148,6 +148,9 @@ Deprecation Notices
   content. On Linux and FreeBSD, supported prior to DPDK 20.11,
   original structure will be kept until DPDK 21.11.
 
+* cryptodev: The APIs for interfacing between library and PMD will be marked
+  as internal APIs in DPDK 21.11.
+
 * security: The functions ``rte_security_set_pkt_metadata`` and
   ``rte_security_get_userdata`` will be made inline functions and additional
   flags will be added in structure ``rte_security_ctx`` in DPDK 21.11.
-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2] doc: announce changes to eventdev library
  2021-08-03  4:12  3%   ` Jerin Jacob
@ 2021-08-03  8:32  0%     ` Mattias Rönnblom
  2021-08-04  5:57  0%     ` Jayatheerthan, Jay
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 200+ results
From: Mattias Rönnblom @ 2021-08-03  8:32 UTC (permalink / raw)
  To: Jerin Jacob, Pavan Nikhilesh, Gujjar, Abhinandan S,
	Erik Gabriel Carrillo, Van Haaren, Harry, Hemant Agrawal,
	McDaniel, Timothy, Liang Ma, Jayatheerthan, Jay
  Cc: Jerin Jacob, Ray Kinsella, dpdk-dev, Thomas Monjalon

On 2021-08-03 06:12, Jerin Jacob wrote:
> On Tue, Aug 3, 2021 at 2:46 AM <pbhagavatula@marvell.com> wrote:
>> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>>
>> Make driver layer as internal, remove unnecessary rte_ prefix for
>> structures and functions that are not a part of public API.
>> Promote experimental trace and vector APIs to stable.
>> Add reserved field to `rte_event_timer` structure.
>>
>> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> Acked-by: Jerin Jacob <jerinj@marvell.com>
>
>
> ++ Eventdev driver Maintainers.
>
> This list is based on items identified for 21.11 ABI improvement at
> https://docs.google.com/spreadsheets/d/1betlC000ua5SsSiJIcC54mCCCJnW6voH5Dqv9UxeyfE/edit#gid=0
>

Acked-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>


>> ---
>>   v2 Changes:
>>   - Fix build issues.
>>
>>   doc/guides/rel_notes/deprecation.rst | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
>> index d9c0e65921..6ac321eb1e 100644
>> --- a/doc/guides/rel_notes/deprecation.rst
>> +++ b/doc/guides/rel_notes/deprecation.rst
>> @@ -158,3 +158,14 @@ Deprecation Notices
>>   * security: The functions ``rte_security_set_pkt_metadata`` and
>>     ``rte_security_get_userdata`` will be made inline functions and additional
>>     flags will be added in structure ``rte_security_ctx`` in DPDK 21.11.
>> +
>> +* eventdev: The file ``rte_eventdev_pmd.h`` will be renamed to ``eventdev_driver.h``
>> +  to make the driver interface as internal and the structures ``rte_eventdev_data``,
>> +  ``rte_eventdev`` and ``rte_eventdevs`` will be moved to a new file named
>> +  ``rte_eventdev_core.h`` in DPDK 21.11.
>> +  The ``rte_`` prefix for internal structures and functions will be removed across the
>> +  library.
>> +  The experimental eventdev trace APIs and ``rte_event_vector_pool_create``,
>> +  ``rte_event_eth_rx_adapter_vector_limits_get`` will be promoted to stable.
>> +  An 8byte reserved field will be added to the structure ``rte_event_timer`` to
>> +  support future extensions.
>> --
>> 2.17.1
>>


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: announce removal of ABIs in PCI bus driver
  2021-08-03  1:52  0%           ` Xia, Chenbo
@ 2021-08-03  8:19  0%             ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2021-08-03  8:19 UTC (permalink / raw)
  To: Xia, Chenbo
  Cc: dev, Yigit, Ferruh, dev, mdr, david.marchand, Richardson, Bruce,
	andrew.rybchenko, Ananyev, Konstantin

03/08/2021 03:52, Xia, Chenbo:
> Hi Thomas,
> 
> From: Thomas Monjalon <thomas@monjalon.net>
> > 27/07/2021 10:44, Bruce Richardson:
> > > On Mon, Jul 26, 2021 at 05:56:17AM +0000, Xia, Chenbo wrote:
> > > > From: Yigit, Ferruh <ferruh.yigit@intel.com>
> > > > > On 7/23/2021 8:39 AM, Xia, Chenbo wrote:
> > > > > > From: dev <dev-bounces@dpdk.org> On Behalf Of Chenbo Xia
> > > > > >> +* pci: To reduce unnecessary ABIs exposed by DPDK bus driver,
> > > > > "rte_bus_pci.h"
> > > > > >> +  will be made internal in 21.11 and macros/data
> > structures/functions
> > > > > defined
> > > > > >> +  in the header will not be considered as ABI anymore. This change
> > is
> > > > > >> inspired
> > > > > >> +  by the RFC
> > > > > https://patchwork.dpdk.org/project/dpdk/list/?series=17176.
> > > > > >
> > > > > > I see there's some ABI improvement work on-going and I think it could
> > be
> > > > > part of
> > > > > > the work. If it makes sense to you, I'd like some ACKs.
> > > > > >
> > > > >
> > > > > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> > > > >
> > > > > I am for reducing the public ABI as much as possible. How big will the
> > > > > change
> > > > > be? Is the 'rte_bus_pci.h' used other than './drivers/bus/pci/'?
> > > >
> > > > I don't see big change here. And I am not sure if I understand your second
> > > > question. The rte_bus_pci.h will still be used by drivers (maybe remove
> > the
> > > > rte prefix and change the file name).
> > > >
> > > The file itself will still be exported in some cases, where the end-user
> > > has their own drivers which need to be compiled, so I'd recommend keeping
> > > the rte_ prefix. However, I think making all bus APIs internal-only to DPDK
> > > is a good idea.
> > 
> > > I don't understand how it can be exported _and_ internal.
> 
> I think we can use the meson option 'enable_driver_sdk'. The first use case is in
> lib ethdev for exporting internal APIs for out-of-tree drivers. For pci bus, I
> think the use case is similar: users who want to build out-of-tree drivers can
> set the option true to export pci header but the structs/functions are marked
> internal. Make sense to you?

I understand the intent.
You are saying an out-of-tree driver is considered internal.
Let's see how it works for real.



^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] doc: announce changes to eventdev library
  @ 2021-08-03  4:12  3%   ` Jerin Jacob
  2021-08-03  8:32  0%     ` Mattias Rönnblom
                       ` (3 more replies)
  0 siblings, 4 replies; 200+ results
From: Jerin Jacob @ 2021-08-03  4:12 UTC (permalink / raw)
  To: Pavan Nikhilesh, Gujjar, Abhinandan S, Erik Gabriel Carrillo,
	Van Haaren, Harry, Hemant Agrawal, McDaniel, Timothy, Liang Ma,
	Jayatheerthan, Jay
  Cc: Jerin Jacob, Ray Kinsella, dpdk-dev, Mattias Rönnblom,
	Thomas Monjalon

On Tue, Aug 3, 2021 at 2:46 AM <pbhagavatula@marvell.com> wrote:
>
> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>
> Make driver layer as internal, remove unnecessary rte_ prefix for
> structures and functions that are not a part of public API.
> Promote experimental trace and vector APIs to stable.
> Add reserved field to `rte_event_timer` structure.
>
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>

Acked-by: Jerin Jacob <jerinj@marvell.com>


++ Eventdev driver Maintainers.

This list is based on items identified for 21.11 ABI improvement at
https://docs.google.com/spreadsheets/d/1betlC000ua5SsSiJIcC54mCCCJnW6voH5Dqv9UxeyfE/edit#gid=0


> ---
>  v2 Changes:
>  - Fix build issues.
>
>  doc/guides/rel_notes/deprecation.rst | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index d9c0e65921..6ac321eb1e 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -158,3 +158,14 @@ Deprecation Notices
>  * security: The functions ``rte_security_set_pkt_metadata`` and
>    ``rte_security_get_userdata`` will be made inline functions and additional
>    flags will be added in structure ``rte_security_ctx`` in DPDK 21.11.
> +
> +* eventdev: The file ``rte_eventdev_pmd.h`` will be renamed to ``eventdev_driver.h``
> +  to make the driver interface as internal and the structures ``rte_eventdev_data``,
> +  ``rte_eventdev`` and ``rte_eventdevs`` will be moved to a new file named
> +  ``rte_eventdev_core.h`` in DPDK 21.11.
> +  The ``rte_`` prefix for internal structures and functions will be removed across the
> +  library.
> +  The experimental eventdev trace APIs and ``rte_event_vector_pool_create``,
> +  ``rte_event_eth_rx_adapter_vector_limits_get`` will be promoted to stable.
> +  An 8byte reserved field will be added to the structure ``rte_event_timer`` to
> +  support future extensions.
> --
> 2.17.1
>

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] doc: announce: make rte intr handle internal
  2021-08-03  2:37  0% ` Xia, Chenbo
@ 2021-08-03  4:05  0%   ` Jerin Jacob
  2021-08-04 14:22  0%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2021-08-03  4:05 UTC (permalink / raw)
  To: Xia, Chenbo
  Cc: Harman Kalra, jerinj, david.marchand, thomas, Ray Kinsella, dev

On Tue, Aug 3, 2021 at 8:07 AM Xia, Chenbo <chenbo.xia@intel.com> wrote:
>
> > -----Original Message-----
> > From: dev <dev-bounces@dpdk.org> On Behalf Of Harman Kalra
> > Sent: Tuesday, August 3, 2021 12:04 AM
> > To: jerinj@marvell.com; david.marchand@redhat.com; thomas@monjalon.net; Ray
> > Kinsella <mdr@ashroe.eu>
> > Cc: dev@dpdk.org; Harman Kalra <hkalra@marvell.com>
> > Subject: [dpdk-dev] [PATCH] doc: announce: make rte intr handle internal
> >
> > Moving struct rte_intr_handle as an internal structure to
> > avoid any ABI breakages in future. Since this structure defines
> > some static arrays and changing respective macros breaks the ABI.
> > Eg:
> > Currently RTE_MAX_RXTX_INTR_VEC_ID imposes a limit of maximum 512
> > MSI-X interrupts that can be defined for a PCI device, while PCI
> > specification allows maximum 2048 MSI-X interrupts that can be used.
> > If some PCI device requires more than 512 vectors, either change the
> > RTE_MAX_RXTX_INTR_VEC_ID limit or dynamically allocate based on
> > PCI device MSI-X size at probe time. Either way it's an ABI breakage.
> >
> > Discussion thread:
> > https://mails.dpdk.org/archives/dev/2021-March/202959.html
> >
> > Change already included in 21.11 ABI improvement spreadsheet (item 42):
> > https://docs.google.com/spreadsheets/d/1betlC000ua5SsSiJIcC54mCCCJnW6voH5Dqv9U
> > xeyfE/edit#gid=0
> >
> > Signed-off-by: Harman Kalra <hkalra@marvell.com>
> > ---
> >  doc/guides/rel_notes/deprecation.rst | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/doc/guides/rel_notes/deprecation.rst
> > b/doc/guides/rel_notes/deprecation.rst
> > index d9c0e65921..e95574b1ec 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -17,6 +17,9 @@ Deprecation Notices
> >  * eal: The function ``rte_eal_remote_launch`` will return new error codes
> >    after read or write error on the pipe, instead of calling ``rte_panic``.
> >
> > +* eal: Making ``struct rte_intr_handle`` internal to avoid any ABI breakages
> > +  in future.
> > +
> >  * rte_atomicNN_xxx: These APIs do not take memory order parameter. This does
> >    not allow for writing optimized code for all the CPU architectures
> > supported
> >    in DPDK. DPDK has adopted the atomic operations from
> > --
> > 2.18.0
>
> Acked-by: Chenbo Xia <chenbo.xia@intel.com>

Acked-by: Jerin Jacob <jerinj@marvell.com>
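
To make the ABI argument quoted above concrete, here is a minimal sketch,
assuming a simplified layout. The struct and function names (sketch_handle_*)
are hypothetical and do not reproduce the real ``struct rte_intr_handle``;
only the two limits, 512 and 2048, come from the thread. It shows that raising
a compile-time array bound moves the size and member offsets of a public
struct, while a handle whose array is allocated at probe time and reached only
through accessors can grow without touching application-visible layout.

```c
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAX_VEC_OLD 512   /* current limit quoted in the thread */
#define SKETCH_MAX_VEC_NEW 2048  /* MSI-X maximum quoted in the thread */

/* Hypothetical public layout: the array bound is baked into the struct. */
struct sketch_handle_old {
	int fd;
	int efds[SKETCH_MAX_VEC_OLD];
	unsigned int nb_efd;
};

/* Same struct after raising the limit: size and member offsets move. */
struct sketch_handle_new {
	int fd;
	int efds[SKETCH_MAX_VEC_NEW];
	unsigned int nb_efd;
};

/* Internal layout, meant to be visible only to the library. */
struct sketch_handle {
	int fd;
	int *efds;           /* sized at probe time */
	unsigned int nb_efd;
};

/* Allocate a handle with room for nb_vec event fds (nb_vec > 0). */
static struct sketch_handle *
sketch_handle_alloc(unsigned int nb_vec)
{
	struct sketch_handle *h = calloc(1, sizeof(*h));

	if (h == NULL)
		return NULL;
	h->efds = calloc(nb_vec, sizeof(*h->efds));
	if (h->efds == NULL) {
		free(h);
		return NULL;
	}
	h->fd = -1;
	h->nb_efd = nb_vec;
	return h;
}

/* Accessor: applications never dereference the struct themselves. */
static unsigned int
sketch_handle_nb_efd(const struct sketch_handle *h)
{
	return h->nb_efd;
}

static void
sketch_handle_free(struct sketch_handle *h)
{
	if (h == NULL)
		return;
	free(h->efds);
	free(h);
}
```

An application compiled against the old public layout would read nb_efd at the
wrong offset once the library switched to the new one, which is the breakage
being described. With the accessor style, only the library needs rebuilding.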

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: announce: make rte intr handle internal
  2021-08-02 16:03 10% [dpdk-dev] [PATCH] doc: announce: make rte intr handle internal Harman Kalra
  2021-08-02 19:20  0% ` Andrew Rybchenko
@ 2021-08-03  2:37  0% ` Xia, Chenbo
  2021-08-03  4:05  0%   ` Jerin Jacob
  1 sibling, 1 reply; 200+ results
From: Xia, Chenbo @ 2021-08-03  2:37 UTC (permalink / raw)
  To: Harman Kalra, jerinj, david.marchand, thomas, Ray Kinsella; +Cc: dev

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Harman Kalra
> Sent: Tuesday, August 3, 2021 12:04 AM
> To: jerinj@marvell.com; david.marchand@redhat.com; thomas@monjalon.net; Ray
> Kinsella <mdr@ashroe.eu>
> Cc: dev@dpdk.org; Harman Kalra <hkalra@marvell.com>
> Subject: [dpdk-dev] [PATCH] doc: announce: make rte intr handle internal
> 
> Moving struct rte_intr_handle as an internal structure to
> avoid any ABI breakages in future. Since this structure defines
> some static arrays and changing respective macros breaks the ABI.
> Eg:
> Currently RTE_MAX_RXTX_INTR_VEC_ID imposes a limit of maximum 512
> MSI-X interrupts that can be defined for a PCI device, while PCI
> specification allows maximum 2048 MSI-X interrupts that can be used.
> If some PCI device requires more than 512 vectors, either change the
> RTE_MAX_RXTX_INTR_VEC_ID limit or dynamically allocate based on
> PCI device MSI-X size at probe time. Either way it's an ABI breakage.
> 
> Discussion thread:
> https://mails.dpdk.org/archives/dev/2021-March/202959.html
> 
> Change already included in 21.11 ABI improvement spreadsheet (item 42):
> https://docs.google.com/spreadsheets/d/1betlC000ua5SsSiJIcC54mCCCJnW6voH5Dqv9U
> xeyfE/edit#gid=0
> 
> Signed-off-by: Harman Kalra <hkalra@marvell.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index d9c0e65921..e95574b1ec 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -17,6 +17,9 @@ Deprecation Notices
>  * eal: The function ``rte_eal_remote_launch`` will return new error codes
>    after read or write error on the pipe, instead of calling ``rte_panic``.
> 
> +* eal: Making ``struct rte_intr_handle`` internal to avoid any ABI breakages
> +  in future.
> +
>  * rte_atomicNN_xxx: These APIs do not take memory order parameter. This does
>    not allow for writing optimized code for all the CPU architectures
> supported
>    in DPDK. DPDK has adopted the atomic operations from
> --
> 2.18.0

Acked-by: Chenbo Xia <chenbo.xia@intel.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: announce security API changes for Inline IPsec
  2021-07-30 22:16  3% ` Thomas Monjalon
@ 2021-08-03  2:11  3%   ` Nithin Dabilpuram
  0 siblings, 0 replies; 200+ results
From: Nithin Dabilpuram @ 2021-08-03  2:11 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: konstantin.ananyev, jerinj, gakhil, roy.fan.zhang,
	hemant.agrawal, matan, dev, ferruh.yigit, bruce.richardson, mdr,
	david.marchand

On Sat, Jul 31, 2021 at 12:16:12AM +0200, Thomas Monjalon wrote:
> 27/07/2021 19:36, Nithin Dabilpuram:
> > Announce changes to make rte_security_set_pkt_metadata() and
> > rte_security_get_userdata() inline instead of C functions and
> > also addition of another field in structure rte_security_ctx for
> > holding flags.
> 
> I guess there is a performance reason but the motivation
> is not explained. Also it is going in the opposite direction
> of what is discussed in the Technical Board meetings:
> we should avoid and reduce the number of inline functions
> to reduce the ABI surface.

Yes, it is a performance improvement. It is discussed in detail in
https://inbox.dpdk.org/dev/20210624102848.3878788-1-gakhil@marvell.com/T/#mc4ba3500c024f9911b7af7e5a6e95e23f6197fdd

To summarize, the two per-pkt fast path APIs rte_security_set_pkt_metadata()
and rte_security_get_userdata() were initially added in anticipation that PMDs
would have a lot of per-pkt processing to do for security offload packets,
unlike other ethdev Rx/Tx offloads.

Now that we have a few PMDs that implement inline IPsec support, it looks more
beneficial to have PMD-specific logic in tx_burst()/rx_burst() for
performance instead of doing a per-pkt function ptr jump to do the same in
rte_security_set_pkt_metadata() or rte_security_get_userdata().
In our PMD, rte_security_set_pkt_metadata() currently just copies the private
SA ptr from rte_security_session to the security mbuf dynamic field, and
rte_security_get_userdata() just copies the userdata ptr from the mbuf dynamic field.

Hence the above proposal provides an alternative for PMDs which want to avoid
the function ptr jump, by doing a simple metadata get/set on the mbuf security
dynamic field, in addition to the existing function ptr jump.

Also, in future, when no PMDs need the function ptr support
for the same operations, this new method can be made the only method and the rest
of the function pointer jump logic can probably be removed without breaking ABI.
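
A rough sketch of the idea, assuming simplified types: every name below
(sketch_ctx, sketch_mbuf, SKETCH_F_FAST_SET_MDATA, md_slot and so on) is
hypothetical and only stands in for the real rte_security/mbuf structures.
The inline wrapper checks a flag in the context and, when it is set, writes
the SA pointer straight into the mbuf dynamic field. Otherwise it falls back
to the existing per-PMD function pointer.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_F_FAST_SET_MDATA 0x1u /* hypothetical flag bit */

struct sketch_mbuf { uint64_t dynfield[4]; }; /* stand-in for an mbuf */
struct sketch_session { void *sa; };          /* PMD-private SA pointer */
struct sketch_ctx;

typedef int (*sketch_set_md_t)(struct sketch_ctx *ctx,
		struct sketch_session *sess, struct sketch_mbuf *m);

struct sketch_ctx {
	sketch_set_md_t set_pkt_metadata; /* existing function ptr path */
	uint32_t flags;                   /* the added flags field */
	unsigned int md_slot;             /* index of the security dynfield */
	unsigned int slow_calls;          /* counts callback use, demo only */
};

/* Example PMD callback, reached only through the function pointer. */
static int
sketch_pmd_set_md(struct sketch_ctx *ctx, struct sketch_session *sess,
		struct sketch_mbuf *m)
{
	ctx->slow_calls++;
	m->dynfield[ctx->md_slot] = (uint64_t)(uintptr_t)sess->sa;
	return 0;
}

/* Inline wrapper: plain store when the flag is set, indirect call otherwise. */
static inline int
sketch_set_pkt_metadata(struct sketch_ctx *ctx, struct sketch_session *sess,
		struct sketch_mbuf *m)
{
	if (ctx->flags & SKETCH_F_FAST_SET_MDATA) {
		m->dynfield[ctx->md_slot] = (uint64_t)(uintptr_t)sess->sa;
		return 0;
	}
	if (ctx->set_pkt_metadata == NULL)
		return -1;
	return ctx->set_pkt_metadata(ctx, sess, m);
}
```

A PMD that only needs the plain copy would set the flag once at context
creation; other PMDs leave it clear and keep their callback unchanged.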

> 
> 

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] doc: announce removal of ABIs in PCI bus driver
  2021-07-31 20:44  0%         ` Thomas Monjalon
@ 2021-08-03  1:52  0%           ` Xia, Chenbo
  2021-08-03  8:19  0%             ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Xia, Chenbo @ 2021-08-03  1:52 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, Yigit, Ferruh, dev, mdr, david.marchand, Richardson, Bruce,
	andrew.rybchenko, Ananyev, Konstantin

Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Sunday, August 1, 2021 4:44 AM
> To: Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; Yigit, Ferruh <ferruh.yigit@intel.com>; dev@dpdk.org;
> mdr@ashroe.eu; david.marchand@redhat.com; Richardson, Bruce
> <bruce.richardson@intel.com>; andrew.rybchenko@oktetlabs.ru; Ananyev,
> Konstantin <konstantin.ananyev@intel.com>
> Subject: Re: [dpdk-dev] [PATCH] doc: announce removal of ABIs in PCI bus
> driver
> 
> 27/07/2021 10:44, Bruce Richardson:
> > On Mon, Jul 26, 2021 at 05:56:17AM +0000, Xia, Chenbo wrote:
> > > From: Yigit, Ferruh <ferruh.yigit@intel.com>
> > > > On 7/23/2021 8:39 AM, Xia, Chenbo wrote:
> > > > > From: dev <dev-bounces@dpdk.org> On Behalf Of Chenbo Xia
> > > > >> +* pci: To reduce unnecessary ABIs exposed by DPDK bus driver,
> > > > "rte_bus_pci.h"
> > > > >> +  will be made internal in 21.11 and macros/data
> structures/functions
> > > > defined
> > > > >> +  in the header will not be considered as ABI anymore. This change
> is
> > > > >> inspired
> > > > >> +  by the RFC
> > > > https://patchwork.dpdk.org/project/dpdk/list/?series=17176.
> > > > >
> > > > > I see there's some ABI improvement work on-going and I think it could
> be
> > > > part of
> > > > > the work. If it makes sense to you, I'd like some ACKs.
> > > > >
> > > >
> > > > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> > > >
> > > > I am for reducing the public ABI as much as possible. How big will the
> > > > change
> > > > be? Is the 'rte_bus_pci.h' used other than './drivers/bus/pci/'?
> > >
> > > I don't see big change here. And I am not sure if I understand your second
> > > question. The rte_bus_pci.h will still be used by drivers (maybe remove
> the
> > > rte prefix and change the file name).
> > >
> > The file itself will still be exported in some cases, where the end-user
> > has their own drivers which need to be compiled, so I'd recommend keeping
> > the rte_ prefix. However, I think making all bus APIs internal-only to DPDK
> > is a good idea.
> 
> I don't understand how it can be exported _and_ internal.

I think we can use the meson option 'enable_driver_sdk'. The first use case is in
lib ethdev for exporting internal APIs for out-of-tree drivers. For pci bus, I
think the use case is similar: users who want to build out-of-tree drivers can
set the option true to export pci header but the structs/functions are marked
internal. Make sense to you?

Thanks,
Chenbo

> And about the rte_ prefix, it should be kept even if it is used only
> in internal drivers, because it prevents namespace clashes with other
> libraries included by the drivers.
> As a rule we should always have rte_ prefix for each symbol used outside
> of its own library.
> 
> That said I am OK with the direction of hiding PCI bus API.
> 
> Applied, thanks.
> 
> 


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2] devtools: test different build types
    @ 2021-08-02 22:45 23% ` Thomas Monjalon
    2 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2021-08-02 22:45 UTC (permalink / raw)
  To: dev; +Cc: Andrew Rybchenko, Bruce Richardson

All builds were of type debugoptimized.
It is kept only for builds having an ABI check.
Others will have the default build type (release),
except if specified differently as in the x86 generic build
which will be a test of the non-optimized debug build type.
Some static builds will test the minsize build type.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
---
v2: fix init of var buildtype
---
 devtools/test-meson-builds.sh | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/devtools/test-meson-builds.sh b/devtools/test-meson-builds.sh
index 9ec8e2bc7e..7bd305a669 100755
--- a/devtools/test-meson-builds.sh
+++ b/devtools/test-meson-builds.sh
@@ -92,13 +92,16 @@ load_env () # <target compiler>
 	command -v $targetcc >/dev/null 2>&1 || return 1
 }
 
-config () # <dir> <builddir> <meson options>
+config () # <dir> <builddir> <ABI check> <meson options>
 {
 	dir=$1
 	shift
 	builddir=$1
 	shift
+	abicheck=$1
+	shift
 	if [ -f "$builddir/build.ninja" ] ; then
+		[ $abicheck = ABI ] || return 0
 		# for existing environments, switch to debugoptimized if unset
 		# so that ABI checks can run
 		if ! $MESON configure $builddir |
@@ -114,7 +117,9 @@ config () # <dir> <builddir> <meson options>
 	else
 		options="$options -Dexamples=l3fwd" # save disk space
 	fi
-	options="$options --buildtype=debugoptimized"
+	if [ $abicheck = ABI ] ; then
+		options="$options --buildtype=debugoptimized"
+	fi
 	for option in $DPDK_MESON_OPTIONS ; do
 		options="$options -D$option"
 	done
@@ -165,7 +170,7 @@ build () # <directory> <target cc | cross file> <ABI check> [meson options]
 		cross=
 	fi
 	load_env $targetcc || return 0
-	config $srcdir $builds_dir/$targetdir $cross --werror $*
+	config $srcdir $builds_dir/$targetdir $abicheck $cross --werror $*
 	compile $builds_dir/$targetdir
 	if [ -n "$DPDK_ABI_REF_VERSION" -a "$abicheck" = ABI ] ; then
 		abirefdir=${DPDK_ABI_REF_DIR:-reference}/$DPDK_ABI_REF_VERSION
@@ -179,7 +184,7 @@ build () # <directory> <target cc | cross file> <ABI check> [meson options]
 			fi
 
 			rm -rf $abirefdir/build
-			config $abirefdir/src $abirefdir/build $cross \
+			config $abirefdir/src $abirefdir/build $abicheck $cross \
 				-Dexamples= $*
 			compile $abirefdir/build
 			install_target $abirefdir/build $abirefdir/$targetdir
@@ -211,11 +216,13 @@ for c in gcc clang ; do
 	for s in static shared ; do
 		if [ $s = shared ] ; then
 			abicheck=ABI
+			buildtype=
 		else
 			abicheck=skipABI # save time and disk space
+			buildtype='--buildtype=minsize'
 		fi
 		export CC="$CCACHE $c"
-		build build-$c-$s $c $abicheck --default-library=$s
+		build build-$c-$s $c $abicheck $buildtype --default-library=$s
 		unset CC
 	done
 done
@@ -227,7 +234,7 @@ generic_isa='nehalem'
 if ! check_cc_flags "-march=$generic_isa" ; then
 	generic_isa='corei7'
 fi
-build build-x86-generic cc skipABI -Dcheck_includes=true \
+build build-x86-generic cc skipABI --buildtype=debug -Dcheck_includes=true \
 	-Dlibdir=lib -Dcpu_instruction_set=$generic_isa $use_shared
 
 # 32-bit with default compiler
-- 
2.31.1


^ permalink raw reply	[relevance 23%]

* Re: [dpdk-dev] [PATCH] doc: announce: make rte intr handle internal
  2021-08-02 16:03 10% [dpdk-dev] [PATCH] doc: announce: make rte intr handle internal Harman Kalra
@ 2021-08-02 19:20  0% ` Andrew Rybchenko
  2021-08-03  2:37  0% ` Xia, Chenbo
  1 sibling, 0 replies; 200+ results
From: Andrew Rybchenko @ 2021-08-02 19:20 UTC (permalink / raw)
  To: Harman Kalra, jerinj, david.marchand, thomas, Ray Kinsella; +Cc: dev

On 8/2/21 7:03 PM, Harman Kalra wrote:
> Make struct rte_intr_handle an internal structure to avoid ABI
> breakages in the future. The structure defines some static arrays,
> and changing the macros that size them breaks the ABI.
> For example, RTE_MAX_RXTX_INTR_VEC_ID currently imposes a limit of
> 512 MSI-X interrupts per PCI device, while the PCI specification
> allows up to 2048. If a PCI device requires more than 512 vectors,
> either the RTE_MAX_RXTX_INTR_VEC_ID limit must be raised or the
> arrays must be allocated dynamically at probe time, based on the
> device's MSI-X table size. Either way, it is an ABI breakage.
> 
> Discussion thread:
> https://mails.dpdk.org/archives/dev/2021-March/202959.html
> 
> Change already included in 21.11 ABI improvement spreadsheet (item 42):
> https://docs.google.com/spreadsheets/d/1betlC000ua5SsSiJIcC54mCCCJnW6voH5Dqv9UxeyfE/edit#gid=0
> 
> Signed-off-by: Harman Kalra <hkalra@marvell.com>

Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [EXT] Re: [PATCH v4] doc: announce API changes for Windows compatibility
  2021-08-02 13:48  0%           ` Akhil Goyal
  2021-08-02 14:57  0%             ` Tal Shnaiderman
@ 2021-08-02 17:46  0%             ` Thomas Monjalon
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2021-08-02 17:46 UTC (permalink / raw)
  To: Dmitry Kozlyuk
  Cc: dev, Ferruh Yigit, Fiona Trahe, Khoa To, Ray Kinsella,
	andrew.rybchenko, olivier.matz, navasile, pallavi.kadam,
	ranjit.menon, bruce.richardson, stephen, Akhil Goyal

02/08/2021 15:48, Akhil Goyal:
> > 2021-08-02 12:45 (UTC+0000), Akhil Goyal:
> > > > 21/07/2021 21:55, Dmitry Kozlyuk:
> > > > > Windows headers define `s_addr`, `min`, and `max` as macros.
> > > > > If DPDK headers are included after Windows ones, DPDK structure
> > > > > definitions containing fields with these names get broken (example 1),
> > > > > as well as any usage of such fields (example 2). If DPDK headers
> > > > > undefined these macros, it could break consumer code (example 3).
> > > > > It is proposed to rename structure fields in DPDK, because Win32 headers
> > > > > are used more widely than DPDK, as a general-purpose platform compared
> > > > > to domain-specific kit, and are harder to fix because of that.
> > > > > Exact new names are left for further discussion.
> > > > >
> > > > > Example 1:
> > > > >
> > > > >     /* in DPDK public header included after windows.h */
> > > > >     struct rte_type {
> > > > >         int min;    /* ERROR: `min` is a macro */
> > > > >     };
> > > > >
> > > > > Example 2:
> > > > >
> > > > >     #include <rte_ether.h>
> > > > >     #include <winsock2.h>
> > > > >     struct rte_ether_hdr eh;
> > > > > >     eh.s_addr.addr_bytes[0] = 0;    /* ERROR: `s_addr` is a macro */
> > > > >
> > > > > Example 3:
> > > > >
> > > > >     #include <winsock2.h>
> > > > >     #include <rte_ether.h>
> > > > >     struct in_addr addr;
> > > > >     addr.s_addr = 0;      /* ERROR: there is no `s_addr` field,
> > > > >                              and `s_addr` macro is undefined by DPDK. */
> > > > >
> > > > > Commit 6c068dbd9fea ("net: work around s_addr macro on Windows")
> > > > > modified definition of `struct rte_ether_hdr` to avoid the issue.
> > > > > > However, the workaround assumes `#define s_addr S_un.S_addr`
> > > > > in Windows headers, which is not a part of official API.
> > > > > It also complicates the definition of `struct rte_ether_hdr`.
> > > > >
> > > > > Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> > > > > Acked-by: Khoa To <khot@microsoft.com>
[...]
> > > > Acked-by: Thomas Monjalon <thomas@monjalon.net>
> > > >
> > > Can we have a local variable named as min/max?
> > > If not, then I believe it is not a good idea.
> > 
> > Yes, except for inline functions in public headers.
> > The only problematic one I know of is this (rte_lru_x86.h):
> > 
> > static inline int
> > f_lru_pos(uint64_t lru_list)
> > {
> > 	__m128i lst = _mm_set_epi64x((uint64_t)-1, lru_list);
> > 	__m128i min = _mm_minpos_epu16(lst); /* <<< */
> > 	return _mm_extract_epi16(min, 1);
> > }
> > 
> > Fixing it breaks neither API nor ABI, thus no explicit deprecation notice.
> OK,
> Acked-by: Akhil Goyal <gakhil@marvell.com>

Applied, thanks.




^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v12 00/10] eal: Add EAL API for threading
  2021-07-30 22:31  3% ` [dpdk-dev] [PATCH v11 00/10] " Narcisa Ana Maria Vasile
@ 2021-08-02 17:32  3%   ` Narcisa Ana Maria Vasile
  2021-08-03 19:01  3%     ` [dpdk-dev] [PATCH v13 " Narcisa Ana Maria Vasile
  0 siblings, 1 reply; 200+ results
From: Narcisa Ana Maria Vasile @ 2021-08-02 17:32 UTC (permalink / raw)
  To: dev, thomas, dmitry.kozliuk, khot, navasile, dmitrym, roretzla,
	talshn, ocardona
  Cc: bruce.richardson, david.marchand, pallavi.kadam

From: Narcisa Vasile <navasile@microsoft.com>

EAL thread API

**Problem Statement**
DPDK currently uses the pthread interface to create and manage threads.
Windows does not support the POSIX thread programming model, so it
currently relies on a header file that hides the Windows calls behind
pthread-compatible interfaces. Given that EAL should isolate the
environment specifics from the applications and libraries and mediate
all the communication with the operating systems, a new EAL interface
is needed for thread management.

**Goals**
* Introduce a generic EAL API for threading support that will remove
  the current Windows pthread.h shim.
* Replace references to pthread_* across the DPDK codebase with the new
  RTE_THREAD_* API.
* Allow users to choose between using the RTE_THREAD_* API or a
  3rd party thread library through a configuration option.

**Design plan**
New API main files:
* rte_thread.h (lib/eal/include)
* rte_thread.c (lib/eal/windows)
* rte_thread.c (lib/eal/common)

**A schematic example of the design**
--------------------------------------------------
lib/eal/include/rte_thread.h
int rte_thread_create();

lib/eal/common/rte_thread.c
int rte_thread_create()
{
	return pthread_create();
}

lib/eal/windows/rte_thread.c
int rte_thread_create()
{
	return CreateThread();
}
-----------------------------------------------------

**Thread attributes**

When or after a thread is created, specific characteristics of the thread
can be adjusted. Given that the thread characteristics that are of interest
for DPDK applications are affinity and priority, the following structure
that represents thread attributes has been defined:

typedef struct
{
	enum rte_thread_priority priority;
	rte_cpuset_t cpuset;
} rte_thread_attr_t;

The *rte_thread_create()* function can optionally receive an
rte_thread_attr_t object that will cause the thread to be created with
the affinity and priority described by the attributes object. If no
rte_thread_attr_t is passed (parameter is NULL), the default affinity
and priority are used. An rte_thread_attr_t object can also be set to
the default values by calling *rte_thread_attr_init()*.
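
The NULL-means-defaults rule described above can be sketched as follows.
This is only an illustration: every sketch_* name is hypothetical, a
64-bit mask stands in for rte_cpuset_t, and "all CPUs allowed" as the
default affinity is an assumption, not something the patchset states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real types are defined in rte_thread.h. */
enum sketch_thread_priority {
	SKETCH_PRIORITY_NORMAL,
	SKETCH_PRIORITY_REALTIME_CRITICAL,
};

typedef struct {
	enum sketch_thread_priority priority;
	uint64_t cpuset; /* bit i set => CPU i allowed (assumed encoding) */
} sketch_thread_attr_t;

/* Reset an attributes object to the assumed defaults. */
static int
sketch_thread_attr_init(sketch_thread_attr_t *attr)
{
	if (attr == NULL)
		return EINVAL;
	attr->priority = SKETCH_PRIORITY_NORMAL;
	attr->cpuset = UINT64_MAX;
	return 0;
}

/* Resolve the attributes a new thread would get; NULL selects defaults. */
static int
sketch_thread_effective_attr(const sketch_thread_attr_t *attr,
		sketch_thread_attr_t *out)
{
	if (out == NULL)
		return EINVAL;
	if (attr == NULL)
		return sketch_thread_attr_init(out);
	if (attr->cpuset == 0)
		return EINVAL; /* a thread needs at least one CPU */
	*out = *attr;
	return 0;
}
```

A create function built on this would call sketch_thread_effective_attr()
first and only then hand the resolved values to the platform layer.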

*Priority* is represented through an enum that currently advertises
two values for priority:
	- RTE_THREAD_PRIORITY_NORMAL
	- RTE_THREAD_PRIORITY_REALTIME_CRITICAL
The enum can be extended to allow for multiple priority levels.
rte_thread_set_priority      - sets the priority of a thread
rte_thread_attr_set_priority - updates an rte_thread_attr_t object
                               with a new value for priority

The user can choose the thread priority through an EAL parameter
when starting an application. If the EAL parameter is not used,
the per-platform default value for thread priority is used.
Otherwise, one of the following values can be selected:
 --thread-prio normal
 --thread-prio realtime

Example:
./dpdk-l2fwd -l 0-3 -n 4 --thread-prio normal -- -q 8 -p ffff
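
A minimal sketch of how the option value could be mapped to a priority,
assuming only the two values listed above. The function and enum names
are hypothetical; the actual parsing is in eal_common_options.c and may
differ.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mirror of the two advertised priority levels. */
enum sketch_prio {
	SKETCH_PRIO_NORMAL,
	SKETCH_PRIO_REALTIME_CRITICAL,
};

/* Return 0 and store the priority on success, -1 on an unknown value. */
static int
sketch_parse_thread_prio(const char *arg, enum sketch_prio *prio)
{
	if (arg == NULL || prio == NULL)
		return -1;
	if (strcmp(arg, "normal") == 0) {
		*prio = SKETCH_PRIO_NORMAL;
		return 0;
	}
	if (strcmp(arg, "realtime") == 0) {
		*prio = SKETCH_PRIO_REALTIME_CRITICAL;
		return 0;
	}
	return -1; /* caller keeps the per-platform default */
}
```

Rejecting unknown strings rather than falling back silently keeps a typo
on the command line from being mistaken for the default.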

*Affinity* is described by the already known "rte_cpuset_t" type.
rte_thread_attr_set/get_affinity - sets/gets the affinity field in a
                                   rte_thread_attr_t object
rte_thread_set/get_affinity      - sets/gets the affinity of a thread

**Errors**
A translation function that maps Windows error codes to errno-style
error codes is provided. 
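
For illustration, such a translation could look like the sketch below.
The numeric values are the documented Win32 codes, repeated locally so
the sketch builds without <windows.h>; the choice of errno for each code
and the EINVAL fallback are assumptions, not the mapping used in the
patch.

```c
#include <assert.h>
#include <errno.h>

/* Documented Win32 error code values, under hypothetical local names. */
#define SKETCH_ERROR_SUCCESS            0UL
#define SKETCH_ERROR_ACCESS_DENIED      5UL
#define SKETCH_ERROR_INVALID_HANDLE     6UL
#define SKETCH_ERROR_NOT_ENOUGH_MEMORY  8UL
#define SKETCH_ERROR_INVALID_PARAMETER  87UL

/* Map a Win32 error code to an errno-style value (assumed mapping). */
static int
sketch_win32_error_to_errno(unsigned long code)
{
	switch (code) {
	case SKETCH_ERROR_SUCCESS:
		return 0;
	case SKETCH_ERROR_ACCESS_DENIED:
		return EACCES;
	case SKETCH_ERROR_INVALID_HANDLE:
		return EBADF;
	case SKETCH_ERROR_NOT_ENOUGH_MEMORY:
		return ENOMEM;
	case SKETCH_ERROR_INVALID_PARAMETER:
		return EINVAL;
	default:
		return EINVAL; /* assumed fallback for unmapped codes */
	}
}
```

On Windows the argument would typically come from GetLastError().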

**Future work**
The long term plan is for EAL to provide full threading support:
* Add support for condition variables
* Add support for pthread_mutex_trylock
* Additional functionality offered by pthread_*
  (such as pthread_setname_np, etc.)

v12:
 - Fix freebsd warning about initializer in unit tests

v11:
 - Add unit tests for thread API
 - Rebase

v10:
 - Remove patch no. 10. It will be broken down in subpatches 
   and sent as a different patchset that depends on this one.
   This is done due to the ABI breaks that would be caused by patch 10.
 - Replace unix/rte_thread.c with common/rte_thread.c
 - Remove initializations that may prevent compiler from issuing useful
   warnings.
 - Remove rte_thread_types.h and rte_windows_thread_types.h
 - Remove unneeded priority macros (EAL_THREAD_PRIORITY*)
 - Remove functions that retrieve thread handle from process handle
 - Remove rte_thread_cancel() until same behavior is obtained on
   all platforms.
 - Fix rte_thread_detach() function description,
   return value and remove empty line.
 - Reimplement mutex functions. Add compatible representation for mutex
   identifier. Add macro to replace static mutex initialization instances.
 - Fix commit messages (lines too long, remove unicode symbols)

v9:
- Sign patches

v8:
- Rebase
- Add rte_thread_detach() API
- Set default priority, when user did not specify a value

v7:
Based on DmitryK's review:
- Change thread id representation
- Change mutex id representation
- Implement static mutex initializer for Windows
- Change barrier identifier representation
- Improve commit messages
- Add missing doxygen comments
- Split error translation function
- Improve name for affinity function
- Remove cpuset_size parameter
- Fix eal_create_cpu_map function
- Map EAL priority values to OS specific values
- Add thread wrapper for start routine
- Do not export rte_thread_cancel() on Windows
- Cleanup, fix comments, fix typos.

v6:
- improve error-translation function
- call the error translation function in rte_thread_value_get()

v5:
- update cover letter with more details on the priority argument

v4:
- fix function description
- rebase

v3:
- rebase

v2:
- revert changes that break ABI 
- break up changes into smaller patches
- fix coding style issues
- fix issues with errors
- fix parameter type in examples/kni.c


Narcisa Vasile (10):
  eal: add basic threading functions
  eal: add thread attributes
  eal/windows: translate Windows errors to errno-style errors
  eal: implement functions for thread affinity management
  eal: implement thread priority management functions
  eal: add thread lifetime management
  eal: implement functions for mutex management
  eal: implement functions for thread barrier management
  eal: add EAL argument for setting thread priority
  Add unit tests for thread API

 app/test/meson.build                |   2 +
 app/test/test_threads.c             | 419 ++++++++++++++++++++
 lib/eal/common/eal_common_options.c |  28 +-
 lib/eal/common/eal_internal_cfg.h   |   2 +
 lib/eal/common/eal_options.h        |   2 +
 lib/eal/common/meson.build          |   1 +
 lib/eal/common/rte_thread.c         | 445 +++++++++++++++++++++
 lib/eal/include/rte_thread.h        | 406 ++++++++++++++++++-
 lib/eal/unix/meson.build            |   1 -
 lib/eal/unix/rte_thread.c           |  92 -----
 lib/eal/version.map                 |  20 +
 lib/eal/windows/eal_lcore.c         | 176 ++++++---
 lib/eal/windows/eal_windows.h       |  10 +
 lib/eal/windows/include/sched.h     |   2 +-
 lib/eal/windows/rte_thread.c        | 588 ++++++++++++++++++++++++++--
 15 files changed, 2020 insertions(+), 174 deletions(-)
 create mode 100644 app/test/test_threads.c
 create mode 100644 lib/eal/common/rte_thread.c
 delete mode 100644 lib/eal/unix/rte_thread.c

-- 
2.31.0.vfs.0.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH] doc: announce: make rte intr handle internal
@ 2021-08-02 16:03 10% Harman Kalra
  2021-08-02 19:20  0% ` Andrew Rybchenko
  2021-08-03  2:37  0% ` Xia, Chenbo
  0 siblings, 2 replies; 200+ results
From: Harman Kalra @ 2021-08-02 16:03 UTC (permalink / raw)
  To: jerinj, david.marchand, thomas, Ray Kinsella; +Cc: dev, Harman Kalra

Make struct rte_intr_handle an internal structure to avoid ABI
breakages in the future. The structure defines some static arrays,
and changing the macros that size them breaks the ABI.
For example, RTE_MAX_RXTX_INTR_VEC_ID currently imposes a limit of
512 MSI-X interrupts per PCI device, while the PCI specification
allows up to 2048. If a PCI device requires more than 512 vectors,
either the RTE_MAX_RXTX_INTR_VEC_ID limit must be raised or the
arrays must be allocated dynamically at probe time, based on the
device's MSI-X table size. Either way, it is an ABI breakage.
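
The reasoning can be sketched in C. The structures below are simplified,
hypothetical stand-ins, not the real rte_intr_handle layout: the first
two show how the array bound changes the size seen by every caller, the
last one shows the opaque style in which callers only hold a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Public-struct style: the array bound is compiled into every caller. */
struct sketch_handle_512 {
	int nb_efd;
	int efds[512];
};

struct sketch_handle_2048 {
	int nb_efd;
	int efds[2048];
};

/*
 * Opaque style: in a real library this definition would live in a .c
 * file, with only a forward declaration in the public header.
 */
struct sketch_intr_handle {
	size_t nb_vec;
	int *efds;
};

/* Allocate a handle sized for the device's vector count. */
static struct sketch_intr_handle *
sketch_intr_handle_alloc(size_t nb_vec)
{
	struct sketch_intr_handle *h;

	if (nb_vec == 0)
		return NULL;
	h = calloc(1, sizeof(*h));
	if (h == NULL)
		return NULL;
	h->efds = calloc(nb_vec, sizeof(*h->efds));
	if (h->efds == NULL) {
		free(h);
		return NULL;
	}
	h->nb_vec = nb_vec;
	return h;
}

/* Accessor: callers never touch the fields directly. */
static size_t
sketch_intr_handle_nb_vec(const struct sketch_intr_handle *h)
{
	return h->nb_vec;
}

static void
sketch_intr_handle_free(struct sketch_intr_handle *h)
{
	if (h == NULL)
		return;
	free(h->efds);
	free(h);
}
```

With the opaque style, raising the vector limit changes only the library
internals; applications built against the old header keep working.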

Discussion thread:
https://mails.dpdk.org/archives/dev/2021-March/202959.html

Change already included in 21.11 ABI improvement spreadsheet (item 42):
https://docs.google.com/spreadsheets/d/1betlC000ua5SsSiJIcC54mCCCJnW6voH5Dqv9UxeyfE/edit#gid=0

Signed-off-by: Harman Kalra <hkalra@marvell.com>
---
 doc/guides/rel_notes/deprecation.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index d9c0e65921..e95574b1ec 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -17,6 +17,9 @@ Deprecation Notices
 * eal: The function ``rte_eal_remote_launch`` will return new error codes
   after read or write error on the pipe, instead of calling ``rte_panic``.
 
+* eal: Making ``struct rte_intr_handle`` internal to avoid any ABI breakages
+  in future.
+
 * rte_atomicNN_xxx: These APIs do not take memory order parameter. This does
   not allow for writing optimized code for all the CPU architectures supported
   in DPDK. DPDK has adopted the atomic operations from
-- 
2.18.0


^ permalink raw reply	[relevance 10%]

* Re: [dpdk-dev] [EXT] Re: [PATCH v4] doc: announce API changes for Windows compatibility
  2021-08-02 13:48  0%           ` Akhil Goyal
@ 2021-08-02 14:57  0%             ` Tal Shnaiderman
  2021-08-02 17:46  0%             ` Thomas Monjalon
  1 sibling, 0 replies; 200+ results
From: Tal Shnaiderman @ 2021-08-02 14:57 UTC (permalink / raw)
  To: Akhil Goyal, Dmitry Kozlyuk
  Cc: NBU-Contact-Thomas Monjalon, dev, Ferruh Yigit, Fiona Trahe,
	Khoa To, Ray Kinsella, andrew.rybchenko, olivier.matz, navasile,
	pallavi.kadam, ranjit.menon, bruce.richardson, stephen

> Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v4] doc: announce API changes for
> Windows compatibility
> 
> > 2021-08-02 12:45 (UTC+0000), Akhil Goyal:
> > > > 21/07/2021 21:55, Dmitry Kozlyuk:
> > > > > Windows headers define `s_addr`, `min`, and `max` as macros.
> > > > > If DPDK headers are included after Windows ones, DPDK structure
> > > > > definitions containing fields with these names get broken
> > > > > (example 1), as well as any usage of such fields (example 2). If
> > > > > DPDK headers undefined these macros, it could break consumer
> > > > > code (example 3).
> > > > > It is proposed to rename structure fields in DPDK, because Win32 headers
> > > > > are used more widely than DPDK, as a general-purpose platform compared
> > > > > to domain-specific kit, and are harder to fix because of that.
> > > > > Exact new names are left for further discussion.
> > > > >
> > > > > Example 1:
> > > > >
> > > > >     /* in DPDK public header included after windows.h */
> > > > >     struct rte_type {
> > > > >         int min;    /* ERROR: `min` is a macro */
> > > > >     };
> > > > >
> > > > > Example 2:
> > > > >
> > > > >     #include <rte_ether.h>
> > > > >     #include <winsock2.h>
> > > > >     struct rte_ether_hdr eh;
> > > > > >     eh.s_addr.addr_bytes[0] = 0;    /* ERROR: `s_addr` is a macro */
> > > > >
> > > > > Example 3:
> > > > >
> > > > >     #include <winsock2.h>
> > > > >     #include <rte_ether.h>
> > > > >     struct in_addr addr;
> > > > > >     addr.s_addr = 0;      /* ERROR: there is no `s_addr` field,
> > > > > >                              and `s_addr` macro is undefined by DPDK. */
> > > > >
> > > > > > Commit 6c068dbd9fea ("net: work around s_addr macro on Windows")
> > > > > > modified definition of `struct rte_ether_hdr` to avoid the issue.
> > > > > > However, the workaround assumes `#define s_addr S_un.S_addr`
> > > > > > in Windows headers, which is not a part of official API.
> > > > > It also complicates the definition of `struct rte_ether_hdr`.
> > > > >
> > > > > Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> > > > > Acked-by: Khoa To <khot@microsoft.com>
> > > > > ---
> > > > > > +* net: ``s_addr`` and ``d_addr`` fields of ``rte_ether_hdr`` structure
> > > > > > +  will be renamed in DPDK 21.11 to avoid conflict with Windows Sockets headers.
> > > > > > +
> > > > > > +* compressdev: ``min`` and ``max`` fields of ``rte_param_log2_range`` structure
> > > > > > +  will be renamed in DPDK 21.11 to avoid conflict with Windows Sockets headers.
> > > >
> > > > The struct rte_param_log2_range should also be renamed to include
> > > > "compress" prefix.
> > > > But as we break the struct API, it is not an issue I guess.
> > > >
> > > > > > +* cryptodev: ``min`` and ``max`` fields of ``rte_crypto_param_range`` structure
> > > > > > +  will be renamed in DPDK 21.11 to avoid conflict with Windows Sockets headers.
> > > >
> > > > Acked-by: Thomas Monjalon <thomas@monjalon.net>
> > > >
> > > Can we have a local variable named as min/max?
> > > If not, then I believe it is not a good idea.
> >
> > Yes, except for inline functions in public headers.
> > The only problematic one I know of is this (rte_lru_x86.h):
> >
> > static inline int
> > f_lru_pos(uint64_t lru_list)
> > {
> >       __m128i lst = _mm_set_epi64x((uint64_t)-1, lru_list);
> >       __m128i min = _mm_minpos_epu16(lst); /* <<< */
> > >       return _mm_extract_epi16(min, 1);
> > > }
> >
> > Fixing it breaks neither API nor ABI, thus no explicit deprecation notice.
> OK,
> Acked-by: Akhil Goyal <gakhil@marvell.com>
> 
> I hope when windows compilation is enabled, it will be part of CI and it will
> run on each patch which goes to patchworks.

Windows compilation is already part of CI in ci/iol-testing and ci/Intel-compilation.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [EXT] Re: [PATCH v4] doc: announce API changes for Windows compatibility
  2021-08-02 13:00  3%         ` Dmitry Kozlyuk
@ 2021-08-02 13:48  0%           ` Akhil Goyal
  2021-08-02 14:57  0%             ` Tal Shnaiderman
  2021-08-02 17:46  0%             ` Thomas Monjalon
  0 siblings, 2 replies; 200+ results
From: Akhil Goyal @ 2021-08-02 13:48 UTC (permalink / raw)
  To: Dmitry Kozlyuk
  Cc: Thomas Monjalon, dev, Ferruh Yigit, Fiona Trahe, Khoa To,
	Ray Kinsella, andrew.rybchenko, olivier.matz, navasile,
	pallavi.kadam, ranjit.menon, bruce.richardson, stephen

> 2021-08-02 12:45 (UTC+0000), Akhil Goyal:
> > > 21/07/2021 21:55, Dmitry Kozlyuk:
> > > > Windows headers define `s_addr`, `min`, and `max` as macros.
> > > > If DPDK headers are included after Windows ones, DPDK structure
> > > > definitions containing fields with these names get broken (example 1),
> > > > as well as any usage of such fields (example 2). If DPDK headers
> > > > undefined these macros, it could break consumer code (example 3).
> > > > > It is proposed to rename structure fields in DPDK, because Win32 headers
> > > > > are used more widely than DPDK, as a general-purpose platform compared
> > > > > to domain-specific kit, and are harder to fix because of that.
> > > > Exact new names are left for further discussion.
> > > >
> > > > Example 1:
> > > >
> > > >     /* in DPDK public header included after windows.h */
> > > >     struct rte_type {
> > > >         int min;    /* ERROR: `min` is a macro */
> > > >     };
> > > >
> > > > Example 2:
> > > >
> > > >     #include <rte_ether.h>
> > > >     #include <winsock2.h>
> > > >     struct rte_ether_hdr eh;
> > > > >     eh.s_addr.addr_bytes[0] = 0;    /* ERROR: `s_addr` is a macro */
> > > >
> > > > Example 3:
> > > >
> > > >     #include <winsock2.h>
> > > >     #include <rte_ether.h>
> > > >     struct in_addr addr;
> > > >     addr.s_addr = 0;      /* ERROR: there is no `s_addr` field,
> > > >                              and `s_addr` macro is undefined by DPDK. */
> > > >
> > > > Commit 6c068dbd9fea ("net: work around s_addr macro on Windows")
> > > > modified definition of `struct rte_ether_hdr` to avoid the issue.
> > > > > However, the workaround assumes `#define s_addr S_un.S_addr`
> > > > in Windows headers, which is not a part of official API.
> > > > It also complicates the definition of `struct rte_ether_hdr`.
> > > >
> > > > Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> > > > Acked-by: Khoa To <khot@microsoft.com>
> > > > ---
> > > > > +* net: ``s_addr`` and ``d_addr`` fields of ``rte_ether_hdr`` structure
> > > > > +  will be renamed in DPDK 21.11 to avoid conflict with Windows Sockets headers.
> > > > > +
> > > > > +* compressdev: ``min`` and ``max`` fields of ``rte_param_log2_range`` structure
> > > > > +  will be renamed in DPDK 21.11 to avoid conflict with Windows Sockets headers.
> > >
> > > The struct rte_param_log2_range should also be renamed to include
> > > "compress" prefix.
> > > But as we break the struct API, it is not an issue I guess.
> > >
> > > > > +* cryptodev: ``min`` and ``max`` fields of ``rte_crypto_param_range`` structure
> > > > > +  will be renamed in DPDK 21.11 to avoid conflict with Windows Sockets headers.
> > >
> > > Acked-by: Thomas Monjalon <thomas@monjalon.net>
> > >
> > Can we have a local variable named as min/max?
> > If not, then I believe it is not a good idea.
> 
> Yes, except for inline functions in public headers.
> The only problematic one I know of is this (rte_lru_x86.h):
> 
> static inline int
> f_lru_pos(uint64_t lru_list)
> {
> 	__m128i lst = _mm_set_epi64x((uint64_t)-1, lru_list);
> 	__m128i min = _mm_minpos_epu16(lst); /* <<< */
> 	return _mm_extract_epi16(min, 1);
> }
> 
> Fixing it breaks neither API nor ABI, thus no explicit deprecation notice.
OK,
Acked-by: Akhil Goyal <gakhil@marvell.com>

I hope when windows compilation is enabled, it will be part of CI and it will run
on each patch which goes to patchworks.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [EXT] Re: [PATCH v4] doc: announce API changes for Windows compatibility
  @ 2021-08-02 13:00  3%         ` Dmitry Kozlyuk
  2021-08-02 13:48  0%           ` Akhil Goyal
  0 siblings, 1 reply; 200+ results
From: Dmitry Kozlyuk @ 2021-08-02 13:00 UTC (permalink / raw)
  To: Akhil Goyal
  Cc: Thomas Monjalon, dev, Ferruh Yigit, Fiona Trahe, Khoa To,
	Ray Kinsella, andrew.rybchenko, olivier.matz, navasile,
	pallavi.kadam, ranjit.menon, bruce.richardson, stephen

2021-08-02 12:45 (UTC+0000), Akhil Goyal:
> > 21/07/2021 21:55, Dmitry Kozlyuk:  
> > > Windows headers define `s_addr`, `min`, and `max` as macros.
> > > If DPDK headers are included after Windows ones, DPDK structure
> > > definitions containing fields with these names get broken (example 1),
> > > as well as any usage of such fields (example 2). If DPDK headers
> > > undefined these macros, it could break consumer code (example 3).
> > > It is proposed to rename structure fields in DPDK, because Win32 headers
> > > are used more widely than DPDK, as a general-purpose platform compared
> > > to domain-specific kit, and are harder to fix because of that.
> > > Exact new names are left for further discussion.
> > >
> > > Example 1:
> > >
> > >     /* in DPDK public header included after windows.h */
> > >     struct rte_type {
> > >         int min;    /* ERROR: `min` is a macro */
> > >     };
> > >
> > > Example 2:
> > >
> > >     #include <rte_ether.h>
> > >     #include <winsock2.h>
> > >     struct rte_ether_hdr eh;
> > > >     eh.s_addr.addr_bytes[0] = 0;    /* ERROR: `s_addr` is a macro */
> > >
> > > Example 3:
> > >
> > >     #include <winsock2.h>
> > >     #include <rte_ether.h>
> > >     struct in_addr addr;
> > >     addr.s_addr = 0;      /* ERROR: there is no `s_addr` field,
> > >                              and `s_addr` macro is undefined by DPDK. */
> > >
> > > Commit 6c068dbd9fea ("net: work around s_addr macro on Windows")
> > > modified definition of `struct rte_ether_hdr` to avoid the issue.
> > > > However, the workaround assumes `#define s_addr S_un.S_addr`
> > > in Windows headers, which is not a part of official API.
> > > It also complicates the definition of `struct rte_ether_hdr`.
> > >
> > > Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> > > Acked-by: Khoa To <khot@microsoft.com>
> > > ---
> > > > +* net: ``s_addr`` and ``d_addr`` fields of ``rte_ether_hdr`` structure
> > > > +  will be renamed in DPDK 21.11 to avoid conflict with Windows Sockets headers.
> > > > +
> > > > +* compressdev: ``min`` and ``max`` fields of ``rte_param_log2_range`` structure
> > > > +  will be renamed in DPDK 21.11 to avoid conflict with Windows Sockets headers.
> > 
> > The struct rte_param_log2_range should also be renamed to include
> > "compress" prefix.
> > But as we break the struct API, it is not an issue I guess.
> >   
> > > > +* cryptodev: ``min`` and ``max`` fields of ``rte_crypto_param_range`` structure
> > > > +  will be renamed in DPDK 21.11 to avoid conflict with Windows Sockets headers.
> > 
> > Acked-by: Thomas Monjalon <thomas@monjalon.net>
> >   
> Can we have a local variable named as min/max?
> If not, then I believe it is not a good idea.

Yes, except for inline functions in public headers.
The only problematic one I know of is this (rte_lru_x86.h):

static inline int
f_lru_pos(uint64_t lru_list)
{
	__m128i lst = _mm_set_epi64x((uint64_t)-1, lru_list);
	__m128i min = _mm_minpos_epu16(lst); /* <<< */
	return _mm_extract_epi16(min, 1);
}

Fixing it breaks neither API nor ABI, thus no explicit deprecation notice.
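
To illustrate the `s_addr` clash without Windows headers, here is a
self-contained sketch. `sketch_in_addr` only mimics the layout implied by
the Winsock macro, and the `src_addr`/`dst_addr` field names are
hypothetical placeholders, since the exact new names are still open.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the Winsock address type (not the real header). */
struct sketch_in_addr {
	union {
		uint32_t S_addr;
		uint8_t bytes[4];
	} S_un;
};

/* Object-like macro, as Windows Sockets headers define it. */
#define s_addr S_un.S_addr

/*
 * Declared after the macro. A field named s_addr here would expand to
 * "S_un.S_addr" and fail to compile; a renamed field is unaffected.
 */
struct sketch_ether_hdr {
	uint8_t dst_addr[6]; /* hypothetical replacement for d_addr */
	uint8_t src_addr[6]; /* hypothetical replacement for s_addr */
	uint16_t ether_type;
};

/* Consumer code keeps working: a->s_addr expands to a->S_un.S_addr. */
static uint32_t
sketch_set_ipv4(struct sketch_in_addr *a, uint32_t v)
{
	a->s_addr = v;
	return a->S_un.S_addr;
}

/* Access through the renamed field never involves the macro. */
static uint8_t
sketch_first_src_byte(const struct sketch_ether_hdr *eh)
{
	return eh->src_addr[0];
}
```

Because the macro stays defined, consumer code in the style of example 3
is not affected by the rename.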

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [dpdk-announce] URGENT: review of deprecation notices before closing 21.08
@ 2021-08-02 12:33  3% Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2021-08-02 12:33 UTC (permalink / raw)
  To: announce

The next release 21.11 will allow API/ABI breaking changes.
The process is to announce such changes in the previous release notes.
We are closing the release 21.08 this week so it becomes very urgent
to review all these notices and vote (ack) or reject them now.

For convenience, I am adding those patches in a bundle for easy review:
https://patches.dpdk.org/bundle/tmonjalo/deprecation-notices/

Thanks for participating



^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/2] ethdev: announce flow API action PORT_ID changes
  2021-08-02  7:28  0%                 ` Ori Kam
@ 2021-08-02 10:11  0%                   ` Andrew Rybchenko
  0 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2021-08-02 10:11 UTC (permalink / raw)
  To: Ori Kam, Eli Britstein, NBU-Contact-Thomas Monjalon, Ferruh Yigit
  Cc: dev, Ilya Maximets, Ajit Khaparde, Matan Azrad, Ivan Malov,
	Viacheslav Galaktionov

Hi Ori,

On 8/2/21 10:28 AM, Ori Kam wrote:
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>
>> On 8/1/21 7:13 PM, Ori Kam wrote:
>>> Hi  Andrew,
>>>
>>>> -----Original Message-----
>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>> Sent: Sunday, August 1, 2021 4:24 PM
>>>> Subject: Re: [PATCH 1/2] ethdev: announce flow API action PORT_ID
>>>> changes
>>>>
>>>> On 8/1/21 3:56 PM, Ori Kam wrote:
>>>>> Hi Andrew,
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>>>> Sent: Sunday, August 1, 2021 3:44 PM
>>>>>> Subject: Re: [PATCH 1/2] ethdev: announce flow API action PORT_ID
>>>>>> changes
>>>>>>
>>>>>> Hi Ori,
>>>>>>
>>>>>> On 8/1/21 3:23 PM, Ori Kam wrote:
>>>>>>> Hi Andrew,
>>>>>>>
>>>>>>> I think before we can change the API we must agree on the meaning
>>>>>>> of
>>>>>> representor.
>>>>>>
>>>>>> The question is not directly related to a representor definition.
>>>>>> Just indirectly. PORT_ID action makes sense for non-representor
>>>>>> ports as well.
>>>>>>
>>>>>>> PSB more comments
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>>>>>> Sent: Sunday, August 1, 2021 3:04 PM
>>>>>>>> To: Eli Britstein <elibr@nvidia.com>; NBU-Contact-Thomas Monjalon
>>>>>>>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>; Ori
>>>>>>>> Kam <orika@nvidia.com>
>>>>>>>> Cc: dev@dpdk.org; Ilya Maximets <i.maximets@ovn.org>; Ajit
>>>> Khaparde
>>>>>>>> <ajit.khaparde@broadcom.com>; Matan Azrad
>> <matan@nvidia.com>;
>>>>>> Ivan
>>>>>>>> Malov <ivan.malov@oktetlabs.ru>; Viacheslav Galaktionov
>>>>>>>> <viacheslav.galaktionov@oktetlabs.ru>
>>>>>>>> Subject: Re: [PATCH 1/2] ethdev: announce flow API action PORT_ID
>>>>>>>> changes
>>>>>>>>
>>>>>>>> On 8/1/21 1:57 PM, Eli Britstein wrote:
>>>>>>>>>
>>>>>>>>> On 8/1/2021 1:22 PM, Andrew Rybchenko wrote:
>>>>>>>>>>
>>>>>>>>>> By its very name, action PORT_ID means that packets hit an
>>>>>>>>>> ethdev with the given DPDK port ID. At least the current
>>>>>>>>>> comments don't state the opposite.
>>>>>>>>>> That said, since port representors had been adopted,
>>>>>>>>>> applications like OvS have been misusing the action. They
>>>>>>>>>> misread its purpose as sending packets to the opposite end of
>>>>>>>>>> the "wire" plugged to the given ethdev, for example,
>>>>>>>>>> redirecting packets to the VF itself rather than to its representor
>> ethdev.
>>>>>>>>>> Another example: OvS relies on this action with the admin PF's
>>>>>>>>>> ethdev port ID specified in it in order to send offloaded
>>>>>>>>>> packets to the physical port.
>>>>>>>>>>
>>>>>>>>>> Since there might be applications which use this action in its
>>>>>>>>>> valid sense, one can't just change the documentation to
>>>>>>>>>> greenlight the opposite meaning.
>>>>>>>>>>
>>>>>>>>>> The documentation must be clarified and rte_flow_action_port_id
>>>>>>>>>> structure should be extended to support both meanings.
>>>>>>>>>
>>>>>>>>> I think the only clarification needed is that PORT_ID acts as if
>>>>>>>>> rte_eth_tx_burst is called with the specified port-id.
>>>>>>>>
>>>>>>>> Sorry, but I still think that it is opposite meaning to the
>>>>>>>> current documentation which says "Directs matching traffic to a
>>>>>>>> given DPDK port
>>>>>> ID."
>>>>>>>> Since it happens on switching level (transfer rule) "to a given
>>>>>>>> DPDK
>>>> port"
>>>>>>>> means that it will be received on a given DPDK port.
>>>>>>>>
>>>>>>>> Anyway, the goal of the deprecation notice is to highlight that
>>>>>>>> it must be fixed and ensure that we can choose right decision
>>>>>>>> even if it
>>>>>> breaks API/ABI.
>>>>>>>>
>>>>>>> Agree, it is good that you created the announcement.
>>>>>>
>>>>>> Hopefully you agree that the area requires clarification and must
>>>>>> be improved. I think so hot discussions really prove it.
>>>>>>
>>>>> +1
>>>>>
>>>>>>> I think we should continue our discussion on what is a representor.
>>>>>>
>>>>>> Yes, but it is a hard topic. I'd like to unbind PORT_ID action from
>>>>>> the discussion, since the action makes sense for non-representors as
>> well.
>>>>>>
>>>>> If this can be done great, I'm for it, but I'm not sure it can be, but let's
>> try.
>>>>>
>>>>>>> I think for current implementation the doc should say "direct /
>>>>>>> matches traffic to / from the switch port which the selected DPDK
>>>>>>> representor port is connected to or to DPDK port if this port is
>>>>>>> not a
>>>>>> representor."
>>>>>>
>>>>>> IMHO it is better to keep the definition of the action simple and
>>>>>> do not have any representor specifics in it. Representor is an
>>>>>> ethdev port. If we direct traffic to an ethdev port, it should be
>>>>>> received on the ethdev port regardless if it is a representor or not.
>>>>>> It is better to avoid exceptions and special cases.
>>>>>>
>>>>>
>>>>> Lets see if I understand correctly, you suggest that port  action /
>>>>> item will be for DPDK port, unless they are marked with some bit
>>>>> which means that the traffic should be routed to the switch port
>>>>> which the DPDK port represent am I correct?
>>>>
>>>> Here I'm talking about PORT_ID action only. As for details, I've
>>>> tried to keep it out-of-scope of the deprecation notice.
>>>>
>>> +1 but we need to check if we need it at all or just change doc.
>>>
>>>> However, since we are going to break something here, it is better to
>>>> break hard to be sure that every since usage is updated. So, I tend
>>>> to to solution suggested by Ilya [1] which is similar to Linux kernel.
>>>> I.e. add an enum with invalid zero value and two members to specify
>>>> direction.
>>>>
>>>> [1]
>>>> https://patches.dpdk.org/project/dpdk/patch/20210601111420.5549-1-
>>>> ivan.malov@oktetlabs.ru/#133431
>>>>
>>>> as for PORT_ID pattern item, I think ingress/egress attributes define
>>>> direction. If it is an ingress flow rule, PORT_ID item should match
>>>> traffic coming from represented entity in the case of port
>>>> representor and associated network port in the case of ethdev port
>>>> associated with it. In egress case it otherwise matches traffic sent
>>>> using Tx burst via corresponding ethdev port.
>>>>
>>> I think that Ingress egress has only meaning when talking about NIC
>>> steering and not E-Switch steering.
>>
>> See [2]  12.2.2.4. Attribute: Transfer last paragraph.
>>
>> [2] https://doc.dpdk.org/guides/prog_guide/rte_flow.html#attributes
>>
>> In fact I was going to submit one more deprecation notice on the topic to
>> clarify it, but reread the documentation and now think that it is good enough.
>>
> 
> I think this needs to change,
> " When transferring flow rules, ingress and egress attributes (Attribute: Traffic direction) keep their original meaning,
> as if processing traffic emitted or received by the application."
> But if we route traffic between vports was is the app direction?
> For example if sending traffic from VF A to VF B (app is on PF)
> is it ingress or egress traffic? If the direction is reverse (B to A) does it change?

It is ingress, since it would go to the DPDK app if we didn't reroute it
to VF B directly. I think that egress is what is sent by the DPDK app;
everything else is ingress. I.e. egress rules are applied to traffic
which is generated by the DPDK application itself.

> what if we are sending traffic from VF A to wire or from wire to A what is ingress / egress?
> (Assuming that the VFs are connected to different application.)

See above. These are all ingress rules, since they apply to traffic which
is not generated by the DPDK application which inserts these rules.
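
A minimal sketch of that rule, seen from the application which inserts
the transfer rules. The enum and the helper are made up for illustration
and are not part of the rte_flow API; the sketch only encodes "egress is
what the app itself sends, everything else is ingress":

```c
#include <stdbool.h>

/*
 * Hypothetical traffic origins, from the point of view of the DPDK
 * application which inserts the rules. Not part of rte_flow.
 */
enum example_origin {
	EXAMPLE_ORIGIN_APP,  /* sent by the application itself (Tx burst) */
	EXAMPLE_ORIGIN_VF,   /* comes from a VF, e.g. VF A or VF B */
	EXAMPLE_ORIGIN_WIRE, /* comes from the physical port */
};

/* Egress is only what the application generates; the rest is ingress. */
static bool
example_is_egress(enum example_origin origin)
{
	return origin == EXAMPLE_ORIGIN_APP;
}
```

With this reading, VF A to VF B, VF B to VF A, VF A to wire and wire to
VF A are all ingress for the application on the PF, because none of that
traffic is generated by it.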

>>> I think that we can just use original bit to mark if we want to send
>>> traffic to DPDK port or to other port.
>>
>> As I say the problem of the solution is that a silent breakage.
>> It is typically bad since  old code can simply misuse it.
>>
> You have a point but then maybe we should also delete this bit.
> Also I don't like the idea to break almost all apps that are using DPDK.
> especially if it will not cause error on build.

We can rename a field to cause errors on build :)

> just adding more fields will break the app logic not the compilation which
> I think is the worst thing. (large number of application are based on
> the current logic)

The problem is that the logic could already differ between applications
because of the ambiguous definition.

>>> In any case I will be happy if we could have a meeting to discuss this
>>> approach before sending your patch.
>>
>> Please, let the deprecation notice in. In whatever direction we fix it, we'll
>> break something in any case and DPDK users must be warned in advance.
>> We either change definition of the action or change support of the action in
>> drivers (in different ways in different drivers) or do both.
> 
> O.K.

Great, many thanks.

Andrew.

>>> I think this can save a lot of time.
>>
>> It is a good idea, let's schedule to the end of August. I guess many of us have
>> vacations now or in the nearest time. It will be simply hard to find time in the
>> nearest 3 weeks which is good for all or at least majority of us.
>>
> 
> Sure.
> Best,
> Ori
>> Thanks,
>> Andrew.
>>
>>> Best,
>>> Ori
>>>
>>>
>>>>>>> If we go this way there is no need to change the API only the doc.
>>>>>>>
>>>>>>>>> Regarding representors, it's not different. When using TX on a
>>>>>>>>> representor port, the packets appear as RX on its represented port.
>>>>>>>>>
>>>>>>>>> Please elaborate if there is a use case for the PORT_ID~ in
>>>>>>>>> which the app can get the packets using rte_eth_rx_burst on the
>>>>>>>>> specified
>>>>>> port-id.
>>>>>>>>
>>>>>>>> Multi-home host with a NIC with two physical ports and two PFs
>>>>>>>> used by DPDK app with layer 3 (IP addresses). Different cores
>>>>>>>> used to handle traffic from different ports plus routing in DPDK
>>>>>>>> app. If traffic to port #0 IP address is received on phys port
>>>>>>>> #1, it is useful to redirect traffic to port ID 0 directly to
>>>>>>>> have these packets on correct CPU cores from the very beginning
>>>>>>>> to avoid SW
>>>>>> mechanisms to pass from port #1 CPU cores to port #0 CPU cores.
>>>>>>>>
>>>>>>> To make sure I understand you are talking about a DPDK application
>>>>>>> that is connected to number of ports and it is Eswitch manager,
>>>>>>> but it doesn't use representors but the actual ports, right?
>>>>>>> I think the definition I wrote above also works for this case.
>>>>>>
>>>>>> Other possible request is to direct traffic from phys port #0 to
>>>>>> phys port #1 directly and say it in terms of PORT_ID action.
>>>>>>
>>>>> But we are talking using the switch layer(transfer mode) right?
>>>>
>>>> Yes.
>>>>
>>>>> Best,
>>>>> Ori
>>>>>> Thanks,
>>>>>> Andrew.
>>>>>>
>>>>>>> Best,
>>>>>>> Ori
>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Andrew Rybchenko
>>>> <andrew.rybchenko@oktetlabs.ru>
>>>>>>>>>> ---
>>>>>>>>>>       doc/guides/rel_notes/deprecation.rst | 5 +++++
>>>>>>>>>>       1 file changed, 5 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/doc/guides/rel_notes/deprecation.rst
>>>>>>>>>> b/doc/guides/rel_notes/deprecation.rst
>>>>>>>>>> index d9c0e65921..6e6413c89f 100644
>>>>>>>>>> --- a/doc/guides/rel_notes/deprecation.rst
>>>>>>>>>> +++ b/doc/guides/rel_notes/deprecation.rst
>>>>>>>>>> @@ -158,3 +158,8 @@ Deprecation Notices
>>>>>>>>>>       * security: The functions ``rte_security_set_pkt_metadata`` and
>>>>>>>>>>         ``rte_security_get_userdata`` will be made inline
>>>>>>>>>> functions and additional
>>>>>>>>>>         flags will be added in structure ``rte_security_ctx`` in DPDK
>> 21.11.
>>>>>>>>>> +
>>>>>>>>>> +* ethdev: Definition of the flow API action PORT_ID is
>>>>>>>>>> +ambiguous and
>>>>>>>>>> needs
>>>>>>>>>> +  clarification. Structure rte_flow_action_port_id will be
>>>>>>>>>> +extended to
>>>>>>>>>> +  specify traffic direction to represented entity or ethdev
>>>>>>>>>> +port
>>>>>>>>>> itself in
>>>>>>>>>> +  DPDK 21.11.
>>>>>>>>>> --
>>>>>>>>>> 2.30.2
>>>>>>>>>>
>>>>>>>
>>>>>
>>>
> 


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/2] ethdev: announce flow API action PORT_ID changes
  2021-08-01 20:09  0%               ` Andrew Rybchenko
@ 2021-08-02  7:28  0%                 ` Ori Kam
  2021-08-02 10:11  0%                   ` Andrew Rybchenko
  0 siblings, 1 reply; 200+ results
From: Ori Kam @ 2021-08-02  7:28 UTC (permalink / raw)
  To: Andrew Rybchenko, Eli Britstein, NBU-Contact-Thomas Monjalon,
	Ferruh Yigit
  Cc: dev, Ilya Maximets, Ajit Khaparde, Matan Azrad, Ivan Malov,
	Viacheslav Galaktionov



> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> 
> On 8/1/21 7:13 PM, Ori Kam wrote:
> > Hi  Andrew,
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: Sunday, August 1, 2021 4:24 PM
> >> Subject: Re: [PATCH 1/2] ethdev: announce flow API action PORT_ID
> >> changes
> >>
> >> On 8/1/21 3:56 PM, Ori Kam wrote:
> >>> Hi Andrew,
> >>>
> >>>> -----Original Message-----
> >>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>> Sent: Sunday, August 1, 2021 3:44 PM
> >>>> Subject: Re: [PATCH 1/2] ethdev: announce flow API action PORT_ID
> >>>> changes
> >>>>
> >>>> Hi Ori,
> >>>>
> >>>> On 8/1/21 3:23 PM, Ori Kam wrote:
> >>>>> Hi Andrew,
> >>>>>
> >>>>> I think before we can change the API we must agree on the meaning
> >>>>> of
> >>>> representor.
> >>>>
> >>>> The question is not directly related to a representor definition.
> >>>> Just indirectly. PORT_ID action makes sense for non-representor
> >>>> ports as well.
> >>>>
> >>>>> PSB more comments
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>>>> Sent: Sunday, August 1, 2021 3:04 PM
> >>>>>> To: Eli Britstein <elibr@nvidia.com>; NBU-Contact-Thomas Monjalon
> >>>>>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>; Ori
> >>>>>> Kam <orika@nvidia.com>
> >>>>>> Cc: dev@dpdk.org; Ilya Maximets <i.maximets@ovn.org>; Ajit
> >> Khaparde
> >>>>>> <ajit.khaparde@broadcom.com>; Matan Azrad
> <matan@nvidia.com>;
> >>>> Ivan
> >>>>>> Malov <ivan.malov@oktetlabs.ru>; Viacheslav Galaktionov
> >>>>>> <viacheslav.galaktionov@oktetlabs.ru>
> >>>>>> Subject: Re: [PATCH 1/2] ethdev: announce flow API action PORT_ID
> >>>>>> changes
> >>>>>>
> >>>>>> On 8/1/21 1:57 PM, Eli Britstein wrote:
> >>>>>>>
> >>>>>>> On 8/1/2021 1:22 PM, Andrew Rybchenko wrote:
> >>>>>>>>
> >>>>>>>> By its very name, action PORT_ID means that packets hit an
> >>>>>>>> ethdev with the given DPDK port ID. At least the current
> >>>>>>>> comments don't state the opposite.
> >>>>>>>> That said, since port representors had been adopted,
> >>>>>>>> applications like OvS have been misusing the action. They
> >>>>>>>> misread its purpose as sending packets to the opposite end of
> >>>>>>>> the "wire" plugged to the given ethdev, for example,
> >>>>>>>> redirecting packets to the VF itself rather than to its representor
> ethdev.
> >>>>>>>> Another example: OvS relies on this action with the admin PF's
> >>>>>>>> ethdev port ID specified in it in order to send offloaded
> >>>>>>>> packets to the physical port.
> >>>>>>>>
> >>>>>>>> Since there might be applications which use this action in its
> >>>>>>>> valid sense, one can't just change the documentation to
> >>>>>>>> greenlight the opposite meaning.
> >>>>>>>>
> >>>>>>>> The documentation must be clarified and rte_flow_action_port_id
> >>>>>>>> structure should be extended to support both meanings.
> >>>>>>>
> >>>>>>> I think the only clarification needed is that PORT_ID acts as if
> >>>>>>> rte_eth_tx_burst is called with the specified port-id.
> >>>>>>
> >>>>>> Sorry, but I still think that it is opposite meaning to the
> >>>>>> current documentation which says "Directs matching traffic to a
> >>>>>> given DPDK port
> >>>> ID."
> >>>>>> Since it happens on switching level (transfer rule) "to a given
> >>>>>> DPDK
> >> port"
> >>>>>> means that it will be received on a given DPDK port.
> >>>>>>
> >>>>>> Anyway, the goal of the deprecation notice is to highlight that
> >>>>>> it must be fixed and ensure that we can choose right decision
> >>>>>> even if it
> >>>> breaks API/ABI.
> >>>>>>
> >>>>> Agree, it is good that you created the announcement.
> >>>>
> >>>> Hopefully you agree that the area requires clarification and must
> >>>> be improved. I think so hot discussions really prove it.
> >>>>
> >>> +1
> >>>
> >>>>> I think we should continue our discussion on what is a representor.
> >>>>
> >>>> Yes, but it is a hard topic. I'd like to unbind PORT_ID action from
> >>>> the discussion, since the action makes sense for non-representors as
> well.
> >>>>
> >>> If this can be done great, I'm for it, but I'm not sure it can be, but let's
> try.
> >>>
> >>>>> I think for current implementation the doc should say "direct /
> >>>>> matches traffic to / from the switch port which the selected DPDK
> >>>>> representor port is connected to or to DPDK port if this port is
> >>>>> not a
> >>>> representor."
> >>>>
> >>>> IMHO it is better to keep the definition of the action simple and
> >>>> do not have any representor specifics in it. Representor is an
> >>>> ethdev port. If we direct traffic to an ethdev port, it should be
> >>>> received on the ethdev port regardless if it is a representor or not.
> >>>> It is better to avoid exceptions and special cases.
> >>>>
> >>>
> >>> Lets see if I understand correctly, you suggest that port  action /
> >>> item will be for DPDK port, unless they are marked with some bit
> >>> which means that the traffic should be routed to the switch port
> >>> which the DPDK port represent am I correct?
> >>
> >> Here I'm talking about PORT_ID action only. As for details, I've
> >> tried to keep it out-of-scope of the deprecation notice.
> >>
> > +1 but we need to check if we need it at all or just change doc.
> >
> >> However, since we are going to break something here, it is better to
> >> break hard to be sure that every since usage is updated. So, I tend
> >> to to solution suggested by Ilya [1] which is similar to Linux kernel.
> >> I.e. add an enum with invalid zero value and two members to specify
> >> direction.
> >>
> >> [1]
> >> https://patches.dpdk.org/project/dpdk/patch/20210601111420.5549-1-
> >> ivan.malov@oktetlabs.ru/#133431
> >>
> >> as for PORT_ID pattern item, I think ingress/egress attributes define
> >> direction. If it is an ingress flow rule, PORT_ID item should match
> >> traffic coming from represented entity in the case of port
> >> representor and associated network port in the case of ethdev port
> >> associated with it. In egress case it otherwise matches traffic sent
> >> using Tx burst via corresponding ethdev port.
> >>
> > I think that Ingress egress has only meaning when talking about NIC
> > steering and not E-Switch steering.
> 
> See [2]  12.2.2.4. Attribute: Transfer last paragraph.
> 
> [2] https://doc.dpdk.org/guides/prog_guide/rte_flow.html#attributes
> 
> In fact I was going to submit one more deprecation notice on the topic to
> clarify it, but reread the documentation and now think that it is good enough.
> 

I think this needs to change:
"When transferring flow rules, ingress and egress attributes (Attribute: Traffic direction) keep their original meaning,
as if processing traffic emitted or received by the application."
But if we route traffic between vports, what is the app direction?
For example, if sending traffic from VF A to VF B (app is on PF),
is it ingress or egress traffic? If the direction is reversed (B to A), does it change?
What if we are sending traffic from VF A to the wire or from the wire to A; what is ingress / egress?
(Assuming that the VFs are connected to different applications.)



> > I think that we can just use original bit to mark if we want to send
> > traffic to DPDK port or to other port.
> 
> As I say the problem of the solution is that a silent breakage.
> It is typically bad since  old code can simply misuse it.
> 
You have a point, but then maybe we should also delete this bit.
Also, I don't like the idea of breaking almost all apps that are using DPDK,
especially if it will not cause an error on build.
Just adding more fields will break the app logic, not the compilation, which
I think is the worst thing. (A large number of applications are based on
the current logic.)

> > In any case I will be happy if we could have a meeting to discuss this
> > approach before sending your patch.
> 
> Please, let the deprecation notice in. In whatever direction we fix it, we'll
> break something in any case and DPDK users must be warned in advance.
> We either change definition of the action or change support of the action in
> drivers (in different ways in different drivers) or do both.

O.K.
> 
> > I think this can save a lot of time.
> 
> It is a good idea, let's schedule to the end of August. I guess many of us have
> vacations now or in the nearest time. It will be simply hard to find time in the
> nearest 3 weeks which is good for all or at least majority of us.
> 

Sure.
Best,
Ori
> Thanks,
> Andrew.
> 
> > Best,
> > Ori
> >
> >
> >>>>> If we go this way there is no need to change the API only the doc.
> >>>>>
> >>>>>>> Regarding representors, it's not different. When using TX on a
> >>>>>>> representor port, the packets appear as RX on its represented port.
> >>>>>>>
> >>>>>>> Please elaborate if there is a use case for the PORT_ID~ in
> >>>>>>> which the app can get the packets using rte_eth_rx_burst on the
> >>>>>>> specified
> >>>> port-id.
> >>>>>>
> >>>>>> Multi-home host with a NIC with two physical ports and two PFs
> >>>>>> used by DPDK app with layer 3 (IP addresses). Different cores
> >>>>>> used to handle traffic from different ports plus routing in DPDK
> >>>>>> app. If traffic to port #0 IP address is received on phys port
> >>>>>> #1, it is useful to redirect traffic to port ID 0 directly to
> >>>>>> have these packets on correct CPU cores from the very beginning
> >>>>>> to avoid SW
> >>>> mechanisms to pass from port #1 CPU cores to port #0 CPU cores.
> >>>>>>
> >>>>> To make sure I understand you are talking about a DPDK application
> >>>>> that is connected to number of ports and it is Eswitch manager,
> >>>>> but it doesn't use representors but the actual ports, right?
> >>>>> I think the definition I wrote above also works for this case.
> >>>>
> >>>> Other possible request is to direct traffic from phys port #0 to
> >>>> phys port #1 directly and say it in terms of PORT_ID action.
> >>>>
> >>> But we are talking using the switch layer(transfer mode) right?
> >>
> >> Yes.
> >>
> >>> Best,
> >>> Ori
> >>>> Thanks,
> >>>> Andrew.
> >>>>
> >>>>> Best,
> >>>>> Ori
> >>>>>
> >>>>>>>>
> >>>>>>>> Signed-off-by: Andrew Rybchenko
> >> <andrew.rybchenko@oktetlabs.ru>
> >>>>>>>> ---
> >>>>>>>>      doc/guides/rel_notes/deprecation.rst | 5 +++++
> >>>>>>>>      1 file changed, 5 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/doc/guides/rel_notes/deprecation.rst
> >>>>>>>> b/doc/guides/rel_notes/deprecation.rst
> >>>>>>>> index d9c0e65921..6e6413c89f 100644
> >>>>>>>> --- a/doc/guides/rel_notes/deprecation.rst
> >>>>>>>> +++ b/doc/guides/rel_notes/deprecation.rst
> >>>>>>>> @@ -158,3 +158,8 @@ Deprecation Notices
> >>>>>>>>      * security: The functions ``rte_security_set_pkt_metadata`` and
> >>>>>>>>        ``rte_security_get_userdata`` will be made inline
> >>>>>>>> functions and additional
> >>>>>>>>        flags will be added in structure ``rte_security_ctx`` in DPDK
> 21.11.
> >>>>>>>> +
> >>>>>>>> +* ethdev: Definition of the flow API action PORT_ID is
> >>>>>>>> +ambiguous and
> >>>>>>>> needs
> >>>>>>>> +  clarification. Structure rte_flow_action_port_id will be
> >>>>>>>> +extended to
> >>>>>>>> +  specify traffic direction to represented entity or ethdev
> >>>>>>>> +port
> >>>>>>>> itself in
> >>>>>>>> +  DPDK 21.11.
> >>>>>>>> --
> >>>>>>>> 2.30.2
> >>>>>>>>
> >>>>>
> >>>
> >


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/2] ethdev: announce flow API action PORT_ID changes
  2021-08-01 16:13  0%             ` Ori Kam
@ 2021-08-01 20:09  0%               ` Andrew Rybchenko
  2021-08-02  7:28  0%                 ` Ori Kam
  0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2021-08-01 20:09 UTC (permalink / raw)
  To: Ori Kam, Eli Britstein, NBU-Contact-Thomas Monjalon, Ferruh Yigit
  Cc: dev, Ilya Maximets, Ajit Khaparde, Matan Azrad, Ivan Malov,
	Viacheslav Galaktionov

On 8/1/21 7:13 PM, Ori Kam wrote:
> Hi  Andrew,
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Sunday, August 1, 2021 4:24 PM
>> Subject: Re: [PATCH 1/2] ethdev: announce flow API action PORT_ID changes
>>
>> On 8/1/21 3:56 PM, Ori Kam wrote:
>>> Hi Andrew,
>>>
>>>> -----Original Message-----
>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>> Sent: Sunday, August 1, 2021 3:44 PM
>>>> Subject: Re: [PATCH 1/2] ethdev: announce flow API action PORT_ID
>>>> changes
>>>>
>>>> Hi Ori,
>>>>
>>>> On 8/1/21 3:23 PM, Ori Kam wrote:
>>>>> Hi Andrew,
>>>>>
>>>>> I think before we can change the API we must agree on the meaning of
>>>> representor.
>>>>
>>>> The question is not directly related to a representor definition.
>>>> Just indirectly. PORT_ID action makes sense for non-representor ports
>>>> as well.
>>>>
>>>>> PSB more comments
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>>>> Sent: Sunday, August 1, 2021 3:04 PM
>>>>>> To: Eli Britstein <elibr@nvidia.com>; NBU-Contact-Thomas Monjalon
>>>>>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>; Ori
>>>>>> Kam <orika@nvidia.com>
>>>>>> Cc: dev@dpdk.org; Ilya Maximets <i.maximets@ovn.org>; Ajit
>> Khaparde
>>>>>> <ajit.khaparde@broadcom.com>; Matan Azrad <matan@nvidia.com>;
>>>> Ivan
>>>>>> Malov <ivan.malov@oktetlabs.ru>; Viacheslav Galaktionov
>>>>>> <viacheslav.galaktionov@oktetlabs.ru>
>>>>>> Subject: Re: [PATCH 1/2] ethdev: announce flow API action PORT_ID
>>>>>> changes
>>>>>>
>>>>>> On 8/1/21 1:57 PM, Eli Britstein wrote:
>>>>>>>
>>>>>>> On 8/1/2021 1:22 PM, Andrew Rybchenko wrote:
>>>>>>>>
>>>>>>>> By its very name, action PORT_ID means that packets hit an ethdev
>>>>>>>> with the given DPDK port ID. At least the current comments don't
>>>>>>>> state the opposite.
>>>>>>>> That said, since port representors had been adopted, applications
>>>>>>>> like OvS have been misusing the action. They misread its purpose
>>>>>>>> as sending packets to the opposite end of the "wire" plugged to
>>>>>>>> the given ethdev, for example, redirecting packets to the VF
>>>>>>>> itself rather than to its representor ethdev.
>>>>>>>> Another example: OvS relies on this action with the admin PF's
>>>>>>>> ethdev port ID specified in it in order to send offloaded packets
>>>>>>>> to the physical port.
>>>>>>>>
>>>>>>>> Since there might be applications which use this action in its
>>>>>>>> valid sense, one can't just change the documentation to
>>>>>>>> greenlight the opposite meaning.
>>>>>>>>
>>>>>>>> The documentation must be clarified and rte_flow_action_port_id
>>>>>>>> structure should be extended to support both meanings.
>>>>>>>
>>>>>>> I think the only clarification needed is that PORT_ID acts as if
>>>>>>> rte_eth_tx_burst is called with the specified port-id.
>>>>>>
>>>>>> Sorry, but I still think that it is opposite meaning to the current
>>>>>> documentation which says "Directs matching traffic to a given DPDK
>>>>>> port
>>>> ID."
>>>>>> Since it happens on switching level (transfer rule) "to a given DPDK
>> port"
>>>>>> means that it will be received on a given DPDK port.
>>>>>>
>>>>>> Anyway, the goal of the deprecation notice is to highlight that it
>>>>>> must be fixed and ensure that we can choose right decision even if
>>>>>> it
>>>> breaks API/ABI.
>>>>>>
>>>>> Agree, it is good that you created the announcement.
>>>>
>>>> Hopefully you agree that the area requires clarification and must be
>>>> improved. I think so hot discussions really prove it.
>>>>
>>> +1
>>>
>>>>> I think we should continue our discussion on what is a representor.
>>>>
>>>> Yes, but it is a hard topic. I'd like to unbind PORT_ID action from
>>>> the discussion, since the action makes sense for non-representors as well.
>>>>
>>> If this can be done great, I'm for it, but I'm not sure it can be, but let's try.
>>>
>>>>> I think for current implementation the doc should say "direct /
>>>>> matches traffic to / from the switch port which the selected DPDK
>>>>> representor port is connected to or to DPDK port if this port is not
>>>>> a
>>>> representor."
>>>>
>>>> IMHO it is better to keep the definition of the action simple and do
>>>> not have any representor specifics in it. Representor is an ethdev
>>>> port. If we direct traffic to an ethdev port, it should be received
>>>> on the ethdev port regardless if it is a representor or not.
>>>> It is better to avoid exceptions and special cases.
>>>>
>>>
>>> Lets see if I understand correctly, you suggest that port  action /
>>> item will be for DPDK port, unless they are marked with some bit which
>>> means that the traffic should be routed to the switch port which the
>>> DPDK port represent am I correct?
>>
>> Here I'm talking about PORT_ID action only. As for details, I've tried to keep it
>> out-of-scope of the deprecation notice.
>>
> +1 but we need to check if we need it at all or just change doc.
> 
>> However, since we are going to break something here, it is better to break
>> hard to be sure that every since usage is updated. So, I tend to to solution
>> suggested by Ilya [1] which is similar to Linux kernel.
>> I.e. add an enum with invalid zero value and two members to specify
>> direction.
>>
>> [1]
>> https://patches.dpdk.org/project/dpdk/patch/20210601111420.5549-1-
>> ivan.malov@oktetlabs.ru/#133431
>>
>> as for PORT_ID pattern item, I think ingress/egress attributes define
>> direction. If it is an ingress flow rule, PORT_ID item should match traffic
>> coming from represented entity in the case of port representor and
>> associated network port in the case of ethdev port associated with it. In
>> egress case it otherwise matches traffic sent using Tx burst via corresponding
>> ethdev port.
>>
> I think that Ingress egress has only meaning when talking about NIC steering
> and not E-Switch steering.

See the last paragraph of [2], 12.2.2.4. Attribute: Transfer.

[2] https://doc.dpdk.org/guides/prog_guide/rte_flow.html#attributes

In fact, I was going to submit one more deprecation notice on the topic
to clarify it, but I reread the documentation and now think that it is
good enough.

> I think that we can just use original bit to mark if we want to send traffic
> to DPDK port or to other port.

As I said, the problem with that solution is that it is a silent breakage.
That is typically bad, since old code can simply misuse it.
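
To make the difference concrete, here is a minimal sketch of the
enum-based alternative mentioned above (an enum with an invalid zero
value and two members for the direction). All names are hypothetical;
this is not the rte_flow_action_port_id layout and not a proposed
patch, just an illustration of why zero must be invalid:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical direction selector; zero is deliberately invalid. */
enum example_port_id_direction {
	EXAMPLE_PORT_ID_DIR_INVALID = 0,
	EXAMPLE_PORT_ID_DIR_ETHDEV,      /* deliver to the ethdev port itself */
	EXAMPLE_PORT_ID_DIR_REPRESENTED, /* deliver to the represented entity */
};

/* Hypothetical action configuration carrying an explicit direction. */
struct example_action_port_id {
	enum example_port_id_direction direction;
	uint32_t id; /* DPDK port ID */
};

/* Reject any configuration which does not choose a direction. */
static int
example_action_port_id_validate(const struct example_action_port_id *conf)
{
	switch (conf->direction) {
	case EXAMPLE_PORT_ID_DIR_ETHDEV:
	case EXAMPLE_PORT_ID_DIR_REPRESENTED:
		return 0;
	default:
		return -EINVAL;
	}
}
```

Old code which fills in only the port ID leaves the direction at zero
and is rejected with -EINVAL, so the breakage is loud. With a single
bit, the same zero-initialized code would be silently accepted with
one of the two meanings.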

> In any case I will be happy if we could have a meeting to discuss this
> approach before sending your patch.

Please let the deprecation notice in. Whichever direction we fix it in,
we'll break something, and DPDK users must be warned in advance. We
either change the definition of the action, or change support of the
action in drivers (in different ways in different drivers), or do
both.

> I think this can save a lot of time.

It is a good idea; let's schedule it for the end of August. I guess many
of us have vacations now or in the near future. It will simply be
hard to find a time in the next 3 weeks which is good for all or
at least the majority of us.

Thanks,
Andrew.

> Best,
> Ori
> 
> 
>>>>> If we go this way there is no need to change the API only the doc.
>>>>>
>>>>>>> Regarding representors, it's not different. When using TX on a
>>>>>>> representor port, the packets appear as RX on its represented port.
>>>>>>>
>>>>>>> Please elaborate if there is a use case for the PORT_ID~ in which
>>>>>>> the app can get the packets using rte_eth_rx_burst on the
>>>>>>> specified
>>>> port-id.
>>>>>>
>>>>>> Multi-home host with a NIC with two physical ports and two PFs used
>>>>>> by DPDK app with layer 3 (IP addresses). Different cores used to
>>>>>> handle traffic from different ports plus routing in DPDK app. If
>>>>>> traffic to port #0 IP address is received on phys port #1, it is
>>>>>> useful to redirect traffic to port ID 0 directly to have these
>>>>>> packets on correct CPU cores from the very beginning to avoid SW
>>>> mechanisms to pass from port #1 CPU cores to port #0 CPU cores.
>>>>>>
>>>>> To make sure I understand you are talking about a DPDK application
>>>>> that is connected to number of ports and it is Eswitch manager, but
>>>>> it doesn't use representors but the actual ports, right?
>>>>> I think the definition I wrote above also works for this case.
>>>>
>>>> Other possible request is to direct traffic from phys port #0 to phys
>>>> port #1 directly and say it in terms of PORT_ID action.
>>>>
>>> But we are talking using the switch layer(transfer mode) right?
>>
>> Yes.
>>
>>> Best,
>>> Ori
>>>> Thanks,
>>>> Andrew.
>>>>
>>>>> Best,
>>>>> Ori
>>>>>
>>>>>>>>
>>>>>>>> Signed-off-by: Andrew Rybchenko
>> <andrew.rybchenko@oktetlabs.ru>
>>>>>>>> ---
>>>>>>>>      doc/guides/rel_notes/deprecation.rst | 5 +++++
>>>>>>>>      1 file changed, 5 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/doc/guides/rel_notes/deprecation.rst
>>>>>>>> b/doc/guides/rel_notes/deprecation.rst
>>>>>>>> index d9c0e65921..6e6413c89f 100644
>>>>>>>> --- a/doc/guides/rel_notes/deprecation.rst
>>>>>>>> +++ b/doc/guides/rel_notes/deprecation.rst
>>>>>>>> @@ -158,3 +158,8 @@ Deprecation Notices
>>>>>>>>      * security: The functions ``rte_security_set_pkt_metadata`` and
>>>>>>>>        ``rte_security_get_userdata`` will be made inline functions
>>>>>>>> and additional
>>>>>>>>        flags will be added in structure ``rte_security_ctx`` in DPDK 21.11.
>>>>>>>> +
>>>>>>>> +* ethdev: Definition of the flow API action PORT_ID is ambiguous
>>>>>>>> +and
>>>>>>>> needs
>>>>>>>> +  clarification. Structure rte_flow_action_port_id will be
>>>>>>>> +extended to
>>>>>>>> +  specify traffic direction to represented entity or ethdev port
>>>>>>>> itself in
>>>>>>>> +  DPDK 21.11.
>>>>>>>> --
>>>>>>>> 2.30.2
>>>>>>>>
>>>>>
>>>
> 


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/2] ethdev: announce flow API action PORT_ID changes
  2021-08-01 13:23  0%           ` Andrew Rybchenko
@ 2021-08-01 16:13  0%             ` Ori Kam
  2021-08-01 20:09  0%               ` Andrew Rybchenko
  0 siblings, 1 reply; 200+ results
From: Ori Kam @ 2021-08-01 16:13 UTC (permalink / raw)
  To: Andrew Rybchenko, Eli Britstein, NBU-Contact-Thomas Monjalon,
	Ferruh Yigit
  Cc: dev, Ilya Maximets, Ajit Khaparde, Matan Azrad, Ivan Malov,
	Viacheslav Galaktionov

Hi  Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Sunday, August 1, 2021 4:24 PM
> Subject: Re: [PATCH 1/2] ethdev: announce flow API action PORT_ID changes
> 
> On 8/1/21 3:56 PM, Ori Kam wrote:
> > Hi Andrew,
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: Sunday, August 1, 2021 3:44 PM
> >> Subject: Re: [PATCH 1/2] ethdev: announce flow API action PORT_ID
> >> changes
> >>
> >> Hi Ori,
> >>
> >> On 8/1/21 3:23 PM, Ori Kam wrote:
> >>> Hi Andrew,
> >>>
> >>> I think before we can change the API we must agree on the meaning of
> >> representor.
> >>
> >> The question is not directly related to a representor definition.
> >> Just indirectly. PORT_ID action makes sense for non-representor ports
> >> as well.
> >>
> >>> PSB more comments
> >>>
> >>>> -----Original Message-----
> >>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>> Sent: Sunday, August 1, 2021 3:04 PM
> >>>> To: Eli Britstein <elibr@nvidia.com>; NBU-Contact-Thomas Monjalon
> >>>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>; Ori
> >>>> Kam <orika@nvidia.com>
> >>>> Cc: dev@dpdk.org; Ilya Maximets <i.maximets@ovn.org>; Ajit
> Khaparde
> >>>> <ajit.khaparde@broadcom.com>; Matan Azrad <matan@nvidia.com>;
> >> Ivan
> >>>> Malov <ivan.malov@oktetlabs.ru>; Viacheslav Galaktionov
> >>>> <viacheslav.galaktionov@oktetlabs.ru>
> >>>> Subject: Re: [PATCH 1/2] ethdev: announce flow API action PORT_ID
> >>>> changes
> >>>>
> >>>> On 8/1/21 1:57 PM, Eli Britstein wrote:
> >>>>>
> >>>>> On 8/1/2021 1:22 PM, Andrew Rybchenko wrote:
> >>>>>> External email: Use caution opening links or attachments
> >>>>>>
> >>>>>>
> >>>>>> By its very name, action PORT_ID means that packets hit an ethdev
> >>>>>> with the given DPDK port ID. At least the current comments don't
> >>>>>> state the opposite.
> >>>>>> That said, since port representors had been adopted, applications
> >>>>>> like OvS have been misusing the action. They misread its purpose
> >>>>>> as sending packets to the opposite end of the "wire" plugged to
> >>>>>> the given ethdev, for example, redirecting packets to the VF
> >>>>>> itself rather than to its representor ethdev.
> >>>>>> Another example: OvS relies on this action with the admin PF's
> >>>>>> ethdev port ID specified in it in order to send offloaded packets
> >>>>>> to the physical port.
> >>>>>>
> >>>>>> Since there might be applications which use this action in its
> >>>>>> valid sense, one can't just change the documentation to
> >>>>>> greenlight the opposite meaning.
> >>>>>>
> >>>>>> The documentation must be clarified and rte_flow_action_port_id
> >>>>>> structure should be extended to support both meanings.
> >>>>>
> >>>>> I think the only clarification needed is that PORT_ID acts as if
> >>>>> rte_eth_tx_burst is called with the specified port-id.
> >>>>
> >>>> Sorry, but I still think that it is opposite meaning to the current
> >>>> documentation which says "Directs matching traffic to a given DPDK
> >>>> port
> >> ID."
> >>>> Since it happens on switching level (transfer rule) "to a given DPDK
> port"
> >>>> means that it will be received on a given DPDK port.
> >>>>
> >>>> Anyway, the goal of the deprecation notice is to highlight that it
> >>>> must be fixed and ensure that we can choose right decision even if
> >>>> it
> >> breaks API/ABI.
> >>>>
> >>> Agree, it is good that you created the announcement.
> >>
> >> Hopefully you agree that the area requires clarification and must be
> >> improved. I think so hot discussions really prove it.
> >>
> > +1
> >
> >>> I think we should continue our discussion on what is a representor.
> >>
> >> Yes, but it is a hard topic. I'd like to unbind PORT_ID action from
> >> the discussion, since the action makes sense for non-representors as well.
> >>
> > If this can be done great, I'm for it, but I'm not sure it can be, but let's try.
> >
> >>> I think for current implementation the doc should say "direct /
> >>> matches traffic to / from the switch port which the selected DPDK
> >>> representor port is connected to or to DPDK port if this port is not
> >>> a
> >> representor."
> >>
> >> IMHO it is better to keep the definition of the action simple and do
> >> not have any representor specifics in it. Representor is an ethdev
> >> port. If we direct traffic to an ethdev port, it should be received
> >> on the ethdev port regardless if it is a representor or not.
> >> It is better to avoid exceptions and special cases.
> >>
> >
> > Lets see if I understand correctly, you suggest that port  action /
> > item will be for DPDK port, unless they are marked with some bit which
> > means that the traffic should be routed to the switch port which the
> > DPDK port represent am I correct?
> 
> Here I'm talking about PORT_ID action only. As for details, I've tried to keep it
> out-of-scope of the deprecation notice.
> 
+1, but we need to check whether we need it at all or can just change the doc.

> However, since we are going to break something here, it is better to break
> hard to be sure that every single usage is updated. So, I lean toward the
> solution suggested by Ilya [1], which is similar to the Linux kernel.
> I.e. add an enum with invalid zero value and two members to specify
> direction.
> 
> [1]
> https://patches.dpdk.org/project/dpdk/patch/20210601111420.5549-1-
> ivan.malov@oktetlabs.ru/#133431
> 
> as for PORT_ID pattern item, I think ingress/egress attributes define
> direction. If it is an ingress flow rule, PORT_ID item should match traffic
> coming from represented entity in the case of port representor and
> associated network port in the case of ethdev port associated with it. In
> egress case it otherwise matches traffic sent using Tx burst via corresponding
> ethdev port.
> 
I think that ingress/egress only has meaning when talking about NIC steering,
not E-Switch steering.
I think that we can just use the original bit to mark whether we want to send
traffic to the DPDK port or to the other port.
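
A minimal sketch of the "original bit" idea, for illustration only. The
struct below is a local mirror of the rte_flow_action_port_id layout as it is
understood to appear in rte_flow.h at this point; the helper name and the
mapping of the bit value to a destination are assumptions, not an agreed
semantic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of struct rte_flow_action_port_id (layout assumed). */
struct port_id_conf_sketch {
	uint32_t original:1;  /* existing flag bit */
	uint32_t reserved:31; /* reserved, must be zero */
	uint32_t id;          /* DPDK port ID */
};

/*
 * Hypothetical interpretation: bit set means the traffic is delivered to the
 * DPDK (ethdev) port itself; bit clear means it goes to the other end, i.e.
 * the switch port that the ethdev is attached to.
 */
static bool
port_id_targets_ethdev(const struct port_id_conf_sketch *conf)
{
	return conf->original != 0;
}
```

Reusing the bit avoids changing the structure layout, at the price of giving
an existing field a new meaning.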

In any case I will be happy if we could have a meeting to discuss this
approach before sending your patch. 
I think this can save a lot of time.

Best,
Ori


> >>> If we go this way there is no need to change the API only the doc.
> >>>
> >>>>> Regarding representors, it's not different. When using TX on a
> >>>>> representor port, the packets appear as RX on its represented port.
> >>>>>
> >>>>> Please elaborate if there is a use case for the PORT_ID~ in which
> >>>>> the app can get the packets using rte_eth_rx_burst on the
> >>>>> specified
> >> port-id.
> >>>>
> >>>> Multi-home host with a NIC with two physical ports and two PFs used
> >>>> by DPDK app with layer 3 (IP addresses). Different cores used to
> >>>> handle traffic from different ports plus routing in DPDK app. If
> >>>> traffic to port #0 IP address is received on phys port #1, it is
> >>>> useful to redirect traffic to port ID 0 directly to have these
> >>>> packets on correct CPU cores from the very beginning to avoid SW
> >> mechanisms to pass from port #1 CPU cores to port #0 CPU cores.
> >>>>
> >>> To make sure I understand you are talking about a DPDK application
> >>> that is connected to number of ports and it is Eswitch manager, but
> >>> it doesn't use representors but the actual ports, right?
> >>> I think the definition I wrote above also works for this case.
> >>
> >> Other possible request is to direct traffic from phys port #0 to phys
> >> port #1 directly and say it in terms of PORT_ID action.
> >>
> > But we are talking using the switch layer(transfer mode) right?
> 
> Yes.
> 
> > Best,
> > Ori
> >> Thanks,
> >> Andrew.
> >>
> >>> Best,
> >>> Ori
> >>>
> >>>>>>
> >>>>>> Signed-off-by: Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> >>>>>> ---
> >>>>>>     doc/guides/rel_notes/deprecation.rst | 5 +++++
> >>>>>>     1 file changed, 5 insertions(+)
> >>>>>>
> >>>>>> diff --git a/doc/guides/rel_notes/deprecation.rst
> >>>>>> b/doc/guides/rel_notes/deprecation.rst
> >>>>>> index d9c0e65921..6e6413c89f 100644
> >>>>>> --- a/doc/guides/rel_notes/deprecation.rst
> >>>>>> +++ b/doc/guides/rel_notes/deprecation.rst
> >>>>>> @@ -158,3 +158,8 @@ Deprecation Notices
> >>>>>>     * security: The functions ``rte_security_set_pkt_metadata`` and
> >>>>>>       ``rte_security_get_userdata`` will be made inline functions and additional
> >>>>>>       flags will be added in structure ``rte_security_ctx`` in DPDK 21.11.
> >>>>>> +
> >>>>>> +* ethdev: Definition of the flow API action PORT_ID is ambiguous and needs
> >>>>>> +  clarification. Structure rte_flow_action_port_id will be extended to
> >>>>>> +  specify traffic direction to represented entity or ethdev port itself in
> >>>>>> +  DPDK 21.11.
> >>>>>> --
> >>>>>> 2.30.2
> >>>>>>
> >>>
> >


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] ethdev: fix representor port ID search by name
  2021-08-01  8:40  0%               ` Andrew Rybchenko
@ 2021-08-01 14:25  0%                 ` Xueming(Steven) Li
  0 siblings, 0 replies; 200+ results
From: Xueming(Steven) Li @ 2021-08-01 14:25 UTC (permalink / raw)
  To: Andrew Rybchenko, Ajit Khaparde, Somnath Kotur, John Daley,
	Hyong Youb Kim, Beilei Xing, Qiming Yang, Qi Zhang, Haiyue Wang,
	Matan Azrad, Shahaf Shuler, Slava Ovsiienko,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit
  Cc: dev, Viacheslav Galaktionov, stable



> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Sunday, August 1, 2021 4:40 PM
> To: Xueming(Steven) Li <xuemingl@nvidia.com>; Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur
> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>; Hyong Youb Kim <hyonkim@cisco.com>; Beilei Xing
> <beilei.xing@intel.com>; Qiming Yang <qiming.yang@intel.com>; Qi Zhang <qi.z.zhang@intel.com>; Haiyue Wang
> <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org; Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
> Subject: Re: [PATCH] ethdev: fix representor port ID search by name
> 
> On 7/29/21 7:13 AM, Xueming(Steven) Li wrote:
> >
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: Tuesday, July 20, 2021 5:00 PM
> >> To: Xueming(Steven) Li <xuemingl@nvidia.com>; Ajit Khaparde
> >> <ajit.khaparde@broadcom.com>; Somnath Kotur
> >> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>; Hyong
> >> Youb Kim <hyonkim@cisco.com>; Beilei Xing <beilei.xing@intel.com>;
> >> Qiming Yang <qiming.yang@intel.com>; Qi Zhang <qi.z.zhang@intel.com>;
> >> Haiyue Wang <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>;
> >> Shahaf Shuler <shahafs@nvidia.com>; Slava Ovsiienko
> >> <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon
> >> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
> >> Cc: dev@dpdk.org; Viacheslav Galaktionov
> >> <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
> >> Subject: Re: [PATCH] ethdev: fix representor port ID search by name
> >>
> >> On 7/19/21 3:50 PM, Xueming(Steven) Li wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>> Sent: Monday, July 19, 2021 8:36 PM
> >>>> To: Xueming(Steven) Li <xuemingl@nvidia.com>; Ajit Khaparde
> >>>> <ajit.khaparde@broadcom.com>; Somnath Kotur
> >>>> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>;
> >>>> Hyong Youb Kim <hyonkim@cisco.com>; Beilei Xing
> >>>> <beilei.xing@intel.com>; Qiming Yang <qiming.yang@intel.com>; Qi
> >>>> Zhang <qi.z.zhang@intel.com>; Haiyue Wang <haiyue.wang@intel.com>;
> >>>> Matan Azrad <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>;
> >>>> Slava Ovsiienko <viacheslavo@nvidia.com>; NBU-Contact-Thomas
> >>>> Monjalon <thomas@monjalon.net>; Ferruh Yigit
> >>>> <ferruh.yigit@intel.com>
> >>>> Cc: dev@dpdk.org; Viacheslav Galaktionov
> >>>> <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
> >>>> Subject: Re: [PATCH] ethdev: fix representor port ID search by name
> >>>>
> >>>> On 7/19/21 2:54 PM, Xueming(Steven) Li wrote:
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>>>> Sent: Monday, July 19, 2021 4:46 PM
> >>>>>> To: Xueming(Steven) Li <xuemingl@nvidia.com>; Ajit Khaparde
> >>>>>> <ajit.khaparde@broadcom.com>; Somnath Kotur
> >>>>>> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>;
> >>>>>> Hyong Youb Kim <hyonkim@cisco.com>; Beilei Xing
> >>>>>> <beilei.xing@intel.com>; Qiming Yang <qiming.yang@intel.com>; Qi
> >>>>>> Zhang <qi.z.zhang@intel.com>; Haiyue Wang
> >>>>>> <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>; Shahaf
> >>>>>> Shuler <shahafs@nvidia.com>; Slava Ovsiienko
> >>>>>> <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon
> >>>>>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
> >>>>>> Cc: dev@dpdk.org; Viacheslav Galaktionov
> >>>>>> <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
> >>>>>> Subject: Re: [PATCH] ethdev: fix representor port ID search by
> >>>>>> name
> >>>>>>
> >>>>>> On 7/19/21 9:58 AM, Xueming(Steven) Li wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>>>>>> Sent: Tuesday, July 13, 2021 12:18 AM
> >>>>>>>> To: Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur
> >>>>>>>> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>;
> >>>>>>>> Hyong Youb Kim <hyonkim@cisco.com>; Beilei Xing
> >>>>>>>> <beilei.xing@intel.com>; Qiming Yang <qiming.yang@intel.com>;
> >>>>>>>> Qi Zhang <qi.z.zhang@intel.com>; Haiyue Wang
> >>>>>>>> <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>; Shahaf
> >>>>>>>> Shuler <shahafs@nvidia.com>; Slava Ovsiienko
> >>>>>>>> <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon
> >>>>>>>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>;
> >>>>>>>> Xueming(Steven) Li <xuemingl@nvidia.com>
> >>>>>>>> Cc: dev@dpdk.org; Viacheslav Galaktionov
> >>>>>>>> <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
> >>>>>>>> Subject: [PATCH] ethdev: fix representor port ID search by name
> >>>>>>>>
> >>>>>>>> From: Viacheslav Galaktionov
> >>>>>>>> <viacheslav.galaktionov@oktetlabs.ru>
> >>>>>>>>
> >>>>>>>> Fix representor port ID search by name if the representor
> >>>>>>>> itself does not provide representors info. Getting a list of
> >>>>>>>> representors from a representor does not make sense. Instead, a
> >>>>>>>> parent device
> >>>>>> should be used.
> >>>>>>>>
> >>>>>>>> To this end, extend the rte_eth_dev_data structure to include the port ID of the parent device for representors.
> >>>>>>>>
> >>>>>>>> Fixes: df7547a6a2cc ("ethdev: add helper function to get
> >>>>>>>> representor
> >>>>>>>> ID")
> >>>>>>>> Cc: stable@dpdk.org
> >>>>>>>>
> >>>>>>>> Signed-off-by: Viacheslav Galaktionov
> >>>>>>>> <viacheslav.galaktionov@oktetlabs.ru>
> >>>>>>>> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>>>>>> ---
> >>>>>>>> The new field is added into the hole in rte_eth_dev_data structure.
> >>>>>>>> The patch does not change ABI, but extra care is required since
> >>>>>>>> ABI check is disabled for the structure because of the
> >>>>>>>> libabigail bug
> >>>>>> [1].
> >>>>>>>>
> >>>>>>>> Potentially it is bad for out-of-tree drivers which implement
> >>>>>>>> representors but do not fill in the new parent_port_id field in the rte_eth_dev_data structure. Do we care?
> >>>>>>>>
> >>>>>>>> Maybe the patch should add lines to the release notes, but I'd like to get initial feedback first.
> >>>>>>>>
> >>>>>>>> mlx5 changes should be reviewed by maintainers very carefully, since we are not sure if we patch it correctly.
> >>>>>>>>
> >>>>>>>> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=28060
> >>>>>>
> >>>>>> [snip]
> >>>>>>
> >>>>>>>> --- a/lib/ethdev/ethdev_driver.h
> >>>>>>>> +++ b/lib/ethdev/ethdev_driver.h
> >>>>>>>> @@ -1248,8 +1248,8 @@ struct rte_eth_devargs {
> >>>>>>>>       * For backward compatibility, if no representor info, direct
> >>>>>>>>       * map legacy VF (no controller and pf).
> >>>>>>>>       *
> >>>>>>>> - * @param ethdev
> >>>>>>>> - *  Handle of ethdev port.
> >>>>>>>> + * @param parent_port_id
> >>>>>>>> + *  Port ID of the backing device.
> >>>>>>>>       * @param type
> >>>>>>>>       *  Representor type.
> >>>>>>>>       * @param controller
> >>>>>>>> @@ -1266,7 +1266,7 @@ struct rte_eth_devargs {
> >>>>>>>>       */
> >>>>>>>>      __rte_internal
> >>>>>>>>      int
> >>>>>>>> -rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
> >>>>>>>> +rte_eth_representor_id_get(uint16_t parent_port_id,
> >>>>>>>
> >>>>>>> It make more sense to get representor info from parent port.
> >>>>>>> Representor is a member of switch domain, PMD owns the
> >>>>>>> information of the representor owner port and info of
> >>>>>>> representors. This change looks better, but not sure whether it
> >>>>>>> valuable to introduce a new
> >>>>>> member to the EAL data structure.
> >>>>>>
> >>>>>> IMHO, it is simply incorrect to return representors info on a
> >>>>>> representor itself. Representor info is an information which representors may be populated using the device.
> >>>>>>
> >>>>>> If above statement is correct, we need a way to get parent device
> >>>>>> by representor to do name to representor ID mapping. I see two options to do it:
> >>>>>>      A. Dedicated field in rte_eth_dev_data as the patch does.
> >>>>>>      B. Dedicated ethdev op (since representor knows parent port ID anyway).
> >>>>>> We have chosen (A) because of simplicity.
> >>>>>
> >>>>> Just recalled that representor port could be probed w/o owner PF, is a force for parent port?
> >>>>
> >>>> I thought that it is impossible and parent port is absolutely
> >>>> required for a representor. Could you provide an example and explain how will it work?
> >>>
> >>> In case of bonding, PF0 and PF1 become one PF port `bond0`, PCI address is PF0.
> >>> 	-a <PF0>,representor=pf[0-1]vf[0-99] // this is the syntax we proposed.
> >>
> >> Is it net/bonding or vendor-specific bonding in HW?
> >> If I remember correctly in the case of net/bonding we have ethdev ports for bonded devices.
> >
> > Not net/bonding pmd, it's Linux bonding, supported by hw driver.
> 
> Got it.
> 
> >>
> >>>
> >>> To be backward compatible, also support the following 2 devargs:
> >>> 	-a <pf0>,representor=[0-99] // probe bond0 and representor on pf0
> >>> 	-a <pf1>,representor=[0-99] // probe representors on pf1.
> >>> If devargs start with PF1 devargs, no owner PF1 created as it
> >>> disabled in bonding. Can't create bond0(PF0) automatically here as device is located by PCI address(PF1) from devargs.
> >>
> >> So, I guess the problem is vendor-specific bonding in HW. Anyway
> >> legacy backward compatible representor spec should not require
> >> representors info since it worked before without it. So, it does not sound like a reason to have representors info on a representor
> itself.
> >
> > Legacy backward logic could be something like this: if PF owner port found, use it, fallback to current representor.
> > This won't break anything I guess, how do you think?
> 
> Logically even in legacy backward compatibility PF1 VFs representors have parent port ID - PF0 which is a bond of PF0 & PF1. So,
> parent_port_id should be filled in. In this case eth_representor_cmp() will do rte_eth_representor_id_get(PF0-bond-id, -1, -1, VF, &id)
> which will return PF0 VF representor ID. Most likely it will even match and everything works, but it is still incorrect.

PF0, being the bond of PF0 and PF1, will return representor info for the VFs/SFs under both PFs, so it should work.

> 
> In fact, I have another idea. Try to do:
> rte_eth_representor_id_get(representor-port-id, ...) first for the backward compatibility case and, if not supported, do it on parent
> port ID.

Looks good to me
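
A minimal sketch of that fallback order, under stated assumptions: the
per-port lookup below is a hypothetical stand-in for
rte_eth_representor_id_get(), and the use of -ENOTSUP to signal "this port
provides no representors info" is assumed for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the representor ID lookup on a single port. */
static int
sketch_repr_id_get(uint16_t port_id, uint16_t *repr_id)
{
	/* Assumption: only port 0 (the parent) provides representors info. */
	if (port_id != 0)
		return -ENOTSUP;
	*repr_id = 7; /* arbitrary ID used by the sketch */
	return 0;
}

/*
 * Try the representor port first (backward-compatible case); if it does
 * not support the lookup, retry on the parent port.
 */
static int
sketch_repr_id_get_fallback(uint16_t repr_port_id, uint16_t parent_port_id,
			    uint16_t *repr_id)
{
	int ret = sketch_repr_id_get(repr_port_id, repr_id);

	if (ret != -ENOTSUP)
		return ret;
	return sketch_repr_id_get(parent_port_id, repr_id);
}
```

Any error other than "not supported" from the first attempt is returned as
is, so the parent lookup does not mask a real failure on the representor.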

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] ethdev: fix representor port ID search by name
  2021-08-01  8:50  0%   ` Andrew Rybchenko
@ 2021-08-01 14:15  0%     ` Xueming(Steven) Li
  0 siblings, 0 replies; 200+ results
From: Xueming(Steven) Li @ 2021-08-01 14:15 UTC (permalink / raw)
  To: Andrew Rybchenko, Ajit Khaparde, Somnath Kotur, John Daley,
	Hyong Youb Kim, Beilei Xing, Qiming Yang, Qi Zhang, Haiyue Wang,
	Matan Azrad, Shahaf Shuler, Slava Ovsiienko,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit
  Cc: dev, Viacheslav Galaktionov, stable



> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Sunday, August 1, 2021 4:50 PM
> To: Xueming(Steven) Li <xuemingl@nvidia.com>; Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur
> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>; Hyong Youb Kim <hyonkim@cisco.com>; Beilei Xing
> <beilei.xing@intel.com>; Qiming Yang <qiming.yang@intel.com>; Qi Zhang <qi.z.zhang@intel.com>; Haiyue Wang
> <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org; Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
> Subject: Re: [PATCH] ethdev: fix representor port ID search by name
> 
> On 7/29/21 7:20 AM, Xueming(Steven) Li wrote:
> >
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: Tuesday, July 13, 2021 12:18 AM
> >> To: Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur
> >> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>; Hyong
> >> Youb Kim <hyonkim@cisco.com>; Beilei Xing <beilei.xing@intel.com>;
> >> Qiming Yang <qiming.yang@intel.com>; Qi Zhang <qi.z.zhang@intel.com>;
> >> Haiyue Wang <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>;
> >> Shahaf Shuler <shahafs@nvidia.com>; Slava Ovsiienko
> >> <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon
> >> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>;
> >> Xueming(Steven) Li <xuemingl@nvidia.com>
> >> Cc: dev@dpdk.org; Viacheslav Galaktionov
> >> <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
> >> Subject: [PATCH] ethdev: fix representor port ID search by name
> >>
> >> From: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>
> >>
> >> Fix representor port ID search by name if the representor itself does
> >> not provide representors info. Getting a list of representors from a representor does not make sense. Instead, a parent device
> should be used.
> >>
> >> To this end, extend the rte_eth_dev_data structure to include the port ID of the parent device for representors.
> >>
> >> Fixes: df7547a6a2cc ("ethdev: add helper function to get representor
> >> ID")
> >> Cc: stable@dpdk.org
> >>
> >> Signed-off-by: Viacheslav Galaktionov
> >> <viacheslav.galaktionov@oktetlabs.ru>
> >> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> ---
> >> The new field is added into the hole in rte_eth_dev_data structure.
> >> The patch does not change ABI, but extra care is required since ABI check is disabled for the structure because of the libabigail bug
> [1].
> >>
> >> Potentially it is bad for out-of-tree drivers which implement
> >> representors but do not fill in the new parent_port_id field in the rte_eth_dev_data structure. Do we care?
> >>
> >> Maybe the patch should add lines to the release notes, but I'd like to get initial feedback first.
> >>
> >> mlx5 changes should be reviewed by maintainers very carefully, since we are not sure if we patch it correctly.
> >>
> >> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=28060
> 
> [snip]
> 
> >> b/drivers/net/mlx5/linux/mlx5_os.c
> >> index be22d9cbd2..5550d30628 100644
> >> --- a/drivers/net/mlx5/linux/mlx5_os.c
> >> +++ b/drivers/net/mlx5/linux/mlx5_os.c
> >> @@ -1511,6 +1511,17 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
> >>   	if (priv->representor) {
> >>   		eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
> >>   		eth_dev->data->representor_id = priv->representor_id;
> >> +		MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
> >> +			const struct mlx5_priv *opriv =
> >> +				rte_eth_devices[port_id].data->dev_private;
> >> +
> >> +			if (!opriv ||
> >> +			    opriv->sh != priv->sh ||
> >> +			    opriv->representor)
> >> +				continue;
> >> +			eth_dev->data->parent_port_id = port_id;
> >> +			break;
> >> +		}
> >
> > At line 126, there is logic that locates priv->domain_id; the parent port_id could be found there.
> 
> Do you mean line 1260? The comment above says "Look for sibling devices in order to reuse their switch domain if any, otherwise
> allocate one.".
> So, it is not a parent. Is the comment misleading and parent matches the search criteria as well? But in any case, we should guarantee
> that it is a parent port, not a sibling port. So, we need extra criteria to match parent port only.

Yes, you are correct. How about mlx5_find_master_dev()? It locates the master port in the same switch domain.
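
A minimal sketch of the extra criterion discussed above: pick a port from the
same switch domain that is not itself a representor. The types and the
function name are hypothetical and do not reflect the mlx5 PMD internals.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a probed port. */
struct port_sketch {
	uint16_t port_id;
	uint16_t domain_id;  /* switch domain the port belongs to */
	bool representor;    /* true if the port is a representor */
};

/*
 * Return the port ID of the first non-representor port sharing the
 * representor's switch domain, or -1 if there is none. Sibling
 * representors are skipped, so only a parent candidate can match.
 */
static int
sketch_find_parent_port(const struct port_sketch *ports, size_t n,
			const struct port_sketch *repr)
{
	for (size_t i = 0; i < n; i++) {
		if (ports[i].domain_id != repr->domain_id ||
		    ports[i].representor)
			continue;
		return ports[i].port_id;
	}
	return -1;
}
```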

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/2] ethdev: announce flow API action PORT_ID changes
  2021-08-01 12:56  0%         ` Ori Kam
@ 2021-08-01 13:23  0%           ` Andrew Rybchenko
  2021-08-01 16:13  0%             ` Ori Kam
  0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2021-08-01 13:23 UTC (permalink / raw)
  To: Ori Kam, Eli Britstein, NBU-Contact-Thomas Monjalon, Ferruh Yigit
  Cc: dev, Ilya Maximets, Ajit Khaparde, Matan Azrad, Ivan Malov,
	Viacheslav Galaktionov

On 8/1/21 3:56 PM, Ori Kam wrote:
> Hi Andrew,
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Sunday, August 1, 2021 3:44 PM
>> Subject: Re: [PATCH 1/2] ethdev: announce flow API action PORT_ID changes
>>
>> Hi Ori,
>>
>> On 8/1/21 3:23 PM, Ori Kam wrote:
>>> Hi Andrew,
>>>
>>> I think before we can change the API we must agree on the meaning of
>> representor.
>>
>> The question is not directly related to a representor definition.
>> Just indirectly. PORT_ID action makes sense for non-representor ports as
>> well.
>>
>>> PSB more comments
>>>
>>>> -----Original Message-----
>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>> Sent: Sunday, August 1, 2021 3:04 PM
>>>> To: Eli Britstein <elibr@nvidia.com>; NBU-Contact-Thomas Monjalon
>>>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>; Ori Kam
>>>> <orika@nvidia.com>
>>>> Cc: dev@dpdk.org; Ilya Maximets <i.maximets@ovn.org>; Ajit Khaparde
>>>> <ajit.khaparde@broadcom.com>; Matan Azrad <matan@nvidia.com>;
>> Ivan
>>>> Malov <ivan.malov@oktetlabs.ru>; Viacheslav Galaktionov
>>>> <viacheslav.galaktionov@oktetlabs.ru>
>>>> Subject: Re: [PATCH 1/2] ethdev: announce flow API action PORT_ID
>>>> changes
>>>>
>>>> On 8/1/21 1:57 PM, Eli Britstein wrote:
>>>>>
>>>>> On 8/1/2021 1:22 PM, Andrew Rybchenko wrote:
>>>>>> External email: Use caution opening links or attachments
>>>>>>
>>>>>>
>>>>>> By its very name, action PORT_ID means that packets hit an ethdev
>>>>>> with the given DPDK port ID. At least the current comments don't
>>>>>> state the opposite.
>>>>>> That said, since port representors had been adopted, applications
>>>>>> like OvS have been misusing the action. They misread its purpose as
>>>>>> sending packets to the opposite end of the "wire" plugged to the
>>>>>> given ethdev, for example, redirecting packets to the VF itself
>>>>>> rather than to its representor ethdev.
>>>>>> Another example: OvS relies on this action with the admin PF's
>>>>>> ethdev port ID specified in it in order to send offloaded packets
>>>>>> to the physical port.
>>>>>>
>>>>>> Since there might be applications which use this action in its
>>>>>> valid sense, one can't just change the documentation to greenlight
>>>>>> the opposite meaning.
>>>>>>
>>>>>> The documentation must be clarified and rte_flow_action_port_id
>>>>>> structure should be extended to support both meanings.
>>>>>
>>>>> I think the only clarification needed is that PORT_ID acts as if
>>>>> rte_eth_tx_burst is called with the specified port-id.
>>>>
>>>> Sorry, but I still think that it is opposite meaning to the current
>>>> documentation which says "Directs matching traffic to a given DPDK port
>> ID."
>>>> Since it happens on switching level (transfer rule) "to a given DPDK port"
>>>> means that it will be received on a given DPDK port.
>>>>
>>>> Anyway, the goal of the deprecation notice is to highlight that it
>>>> must be fixed and ensure that we can choose right decision even if it
>> breaks API/ABI.
>>>>
>>> Agree, it is good that you created the announcement.
>>
>> Hopefully you agree that the area requires clarification and must be
>> improved. I think so hot discussions really prove it.
>>
> +1
> 
>>> I think we should continue our discussion on what is a representor.
>>
>> Yes, but it is a hard topic. I'd like to unbind PORT_ID action from the
>> discussion, since the action makes sense for non-representors as well.
>>
> If this can be done great, I'm for it, but I'm not sure it can be, but let's try.
> 
>>> I think for current implementation the doc should say "direct /
>>> matches traffic to / from the switch port which the selected DPDK
>>> representor port is connected to or to DPDK port if this port is not a
>> representor."
>>
>> IMHO it is better to keep the definition of the action simple and do not have
>> any representor specifics in it. Representor is an ethdev port. If we direct
>> traffic to an ethdev port, it should be received on the ethdev port regardless
>> if it is a representor or not.
>> It is better to avoid exceptions and special cases.
>>
> 
> Lets see if I understand correctly, you suggest that port  action / item will be
> for DPDK port, unless they are marked with some bit which means that
> the traffic should be routed to the switch port which the DPDK port represent
> am I correct?

Here I'm talking about PORT_ID action only. As for details, I've tried
to keep it out-of-scope of the deprecation notice.

However, since we are going to break something here, it is better to
break hard to be sure that every single usage is updated. So, I lean
toward the solution suggested by Ilya [1], which is similar to the Linux kernel.
I.e. add an enum with invalid zero value and two members to specify
direction.

[1] 
https://patches.dpdk.org/project/dpdk/patch/20210601111420.5549-1-ivan.malov@oktetlabs.ru/#133431
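
A minimal sketch of the enum idea, for illustration only. All names below are
hypothetical and are not part of the rte_flow API; the point is that a
zero-initialised configuration is rejected, so every existing user has to
pick a direction explicitly.

```c
#include <stdint.h>

/* Hypothetical direction selector; zero is deliberately invalid. */
enum port_id_direction_sketch {
	PORT_ID_DIR_INVALID = 0,
	PORT_ID_DIR_ETHDEV,      /* deliver to the ethdev port itself */
	PORT_ID_DIR_REPRESENTED, /* deliver to the represented entity */
};

/* Hypothetical extended action configuration. */
struct port_id_action_sketch {
	enum port_id_direction_sketch direction;
	uint32_t id; /* DPDK port ID */
};

/* Return 0 if a direction is named explicitly, -1 otherwise. */
static int
port_id_action_check(const struct port_id_action_sketch *conf)
{
	switch (conf->direction) {
	case PORT_ID_DIR_ETHDEV:
	case PORT_ID_DIR_REPRESENTED:
		return 0;
	default:
		return -1;
	}
}
```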

As for the PORT_ID pattern item, I think the ingress/egress attributes define
the direction. In an ingress flow rule, the PORT_ID item should match
traffic coming from the represented entity in the case of a port representor,
and from the associated network port in the case of an ethdev port associated
with it. In the egress case, it instead matches traffic sent using Tx burst via
the corresponding ethdev port.

>>> If we go this way there is no need to change the API only the doc.
>>>
>>>>> Regarding representors, it's not different. When using TX on a
>>>>> representor port, the packets appear as RX on its represented port.
>>>>>
>>>>> Please elaborate if there is a use case for the PORT_ID~ in which
>>>>> the app can get the packets using rte_eth_rx_burst on the specified
>> port-id.
>>>>
>>>> Multi-home host with a NIC with two physical ports and two PFs used
>>>> by DPDK app with layer 3 (IP addresses). Different cores used to
>>>> handle traffic from different ports plus routing in DPDK app. If
>>>> traffic to port #0 IP address is received on phys port #1, it is
>>>> useful to redirect traffic to port ID 0 directly to have these
>>>> packets on correct CPU cores from the very beginning to avoid SW
>> mechanisms to pass from port #1 CPU cores to port #0 CPU cores.
>>>>
>>> To make sure I understand you are talking about a DPDK application
>>> that is connected to number of ports and it is Eswitch manager, but it
>>> doesn't use representors but the actual ports, right?
>>> I think the definition I wrote above also works for this case.
>>
>> Other possible request is to direct traffic from phys port #0 to phys port #1
>> directly and say it in terms of PORT_ID action.
>>
> But we are talking using the switch layer(transfer mode) right?

Yes.

> Best,
> Ori
>> Thanks,
>> Andrew.
>>
>>> Best,
>>> Ori
>>>
>>>>>>
>>>>>> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>>>> ---
>>>>>>     doc/guides/rel_notes/deprecation.rst | 5 +++++
>>>>>>     1 file changed, 5 insertions(+)
>>>>>>
>>>>>> diff --git a/doc/guides/rel_notes/deprecation.rst
>>>>>> b/doc/guides/rel_notes/deprecation.rst
>>>>>> index d9c0e65921..6e6413c89f 100644
>>>>>> --- a/doc/guides/rel_notes/deprecation.rst
>>>>>> +++ b/doc/guides/rel_notes/deprecation.rst
>>>>>> @@ -158,3 +158,8 @@ Deprecation Notices
>>>>>>     * security: The functions ``rte_security_set_pkt_metadata`` and
>>>>>>       ``rte_security_get_userdata`` will be made inline functions
>>>>>> and additional
>>>>>>       flags will be added in structure ``rte_security_ctx`` in DPDK 21.11.
>>>>>> +
>>>>>> +* ethdev: Definition of the flow API action PORT_ID is ambiguous
>>>>>> +and
>>>>>> needs
>>>>>> +  clarification. Structure rte_flow_action_port_id will be
>>>>>> +extended to
>>>>>> +  specify traffic direction to represented entity or ethdev port
>>>>>> itself in
>>>>>> +  DPDK 21.11.
>>>>>> --
>>>>>> 2.30.2
>>>>>>
>>>
> 


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/2] ethdev: announce flow API action PORT_ID changes
  2021-08-01 12:43  0%       ` Andrew Rybchenko
@ 2021-08-01 12:56  0%         ` Ori Kam
  2021-08-01 13:23  0%           ` Andrew Rybchenko
  0 siblings, 1 reply; 200+ results
From: Ori Kam @ 2021-08-01 12:56 UTC (permalink / raw)
  To: Andrew Rybchenko, Eli Britstein, NBU-Contact-Thomas Monjalon,
	Ferruh Yigit
  Cc: dev, Ilya Maximets, Ajit Khaparde, Matan Azrad, Ivan Malov,
	Viacheslav Galaktionov

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Sunday, August 1, 2021 3:44 PM
> Subject: Re: [PATCH 1/2] ethdev: announce flow API action PORT_ID changes
> 
> Hi Ori,
> 
> On 8/1/21 3:23 PM, Ori Kam wrote:
> > Hi Andrew,
> >
> > I think before we can change the API we must agree on the meaning of
> representor.
> 
> The question is not directly related to a representor definition.
> Just indirectly. PORT_ID action makes sense for non-representor ports as
> well.
> 
> > PSB more comments
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: Sunday, August 1, 2021 3:04 PM
> >> To: Eli Britstein <elibr@nvidia.com>; NBU-Contact-Thomas Monjalon
> >> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>; Ori Kam
> >> <orika@nvidia.com>
> >> Cc: dev@dpdk.org; Ilya Maximets <i.maximets@ovn.org>; Ajit Khaparde
> >> <ajit.khaparde@broadcom.com>; Matan Azrad <matan@nvidia.com>;
> Ivan
> >> Malov <ivan.malov@oktetlabs.ru>; Viacheslav Galaktionov
> >> <viacheslav.galaktionov@oktetlabs.ru>
> >> Subject: Re: [PATCH 1/2] ethdev: announce flow API action PORT_ID
> >> changes
> >>
> >> On 8/1/21 1:57 PM, Eli Britstein wrote:
> >>>
> >>> On 8/1/2021 1:22 PM, Andrew Rybchenko wrote:
> >>>> External email: Use caution opening links or attachments
> >>>>
> >>>>
> >>>> By its very name, action PORT_ID means that packets hit an ethdev
> >>>> with the given DPDK port ID. At least the current comments don't
> >>>> state the opposite.
> >>>> That said, since port representors had been adopted, applications
> >>>> like OvS have been misusing the action. They misread its purpose as
> >>>> sending packets to the opposite end of the "wire" plugged to the
> >>>> given ethdev, for example, redirecting packets to the VF itself
> >>>> rather than to its representor ethdev.
> >>>> Another example: OvS relies on this action with the admin PF's
> >>>> ethdev port ID specified in it in order to send offloaded packets
> >>>> to the physical port.
> >>>>
> >>>> Since there might be applications which use this action in its
> >>>> valid sense, one can't just change the documentation to greenlight
> >>>> the opposite meaning.
> >>>>
> >>>> The documentation must be clarified and rte_flow_action_port_id
> >>>> structure should be extended to support both meanings.
> >>>
> >>> I think the only clarification needed is that PORT_ID acts as if
> >>> rte_eth_tx_burst is called with the specified port-id.
> >>
> >> Sorry, but I still think that it is opposite meaning to the current
> >> documentation which says "Directs matching traffic to a given DPDK port
> ID."
> >> Since it happens on switching level (transfer rule) "to a given DPDK port"
> >> means that it will be received on a given DPDK port.
> >>
> >> Anyway, the goal of the deprecation notice is to highlight that it
> >> must be fixed and ensure that we can choose right decision even if it
> breaks API/ABI.
> >>
> > Agree, it is good that you created the announcement.
> 
> Hopefully you agree that the area requires clarification and must be
> improved. I think so hot discussions really prove it.
> 
+1

> > I think we should continue our discussion on what is a representor.
> 
> Yes, but it is a hard topic. I'd like to unbind PORT_ID action from the
> discussion, since the action makes sense for non-representors as well.
> 
If this can be done, great, I'm for it. I'm not sure it can be, but let's try.

> > I think for current implementation the doc should say "direct /
> > matches traffic to / from the switch port which the selected DPDK
> > representor port is connected to or to DPDK port if this port is not a
> representor."
> 
> IMHO it is better to keep the definition of the action simple and do not have
> any representor specifics in it. Representor is an ethdev port. If we direct
> traffic to an ethdev port, it should be received on the ethdev port regardless
> if it is a representor or not.
> It is better to avoid exceptions and special cases.
> 

Let's see if I understand correctly: you suggest that the port action / item
will refer to the DPDK port itself, unless it is marked with some bit meaning
that the traffic should be routed to the switch port which the DPDK port
represents. Am I correct?

> > If we go this way there is no need to change the API only the doc.
> >
> >>> Regarding representors, it's not different. When using TX on a
> >>> representor port, the packets appear as RX on its represented port.
> >>>
> >>> Please elaborate if there is a use case for the PORT_ID~ in which
> >>> the app can get the packets using rte_eth_rx_burst on the specified
> port-id.
> >>
> >> Multi-home host with a NIC with two physical ports and two PFs used
> >> by DPDK app with layer 3 (IP addresses). Different cores used to
> >> handle traffic from different ports plus routing in DPDK app. If
> >> traffic to port #0 IP address is received on phys port #1, it is
> >> useful to redirect traffic to port ID 0 directly to have these
> >> packets on correct CPU cores from the very beginning to avoid SW
> mechanisms to pass from port #1 CPU cores to port #0 CPU cores.
> >>
> > To make sure I understand you are talking about a DPDK application
> > that is connected to number of ports and it is Eswitch manager, but it
> > doesn't use representors but the actual ports, right?
> > I think the definition I wrote above also works for this case.
> 
> Other possible request is to direct traffic from phys port #0 to phys port #1
> directly and say it in terms of PORT_ID action.
> 
But we are talking about using the switch layer (transfer mode), right?

Best,
Ori
> Thanks,
> Andrew.
> 
> > Best,
> > Ori
> >
> >>>>
> >>>> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>> ---
> >>>>    doc/guides/rel_notes/deprecation.rst | 5 +++++
> >>>>    1 file changed, 5 insertions(+)
> >>>>
> >>>> diff --git a/doc/guides/rel_notes/deprecation.rst
> >>>> b/doc/guides/rel_notes/deprecation.rst
> >>>> index d9c0e65921..6e6413c89f 100644
> >>>> --- a/doc/guides/rel_notes/deprecation.rst
> >>>> +++ b/doc/guides/rel_notes/deprecation.rst
> >>>> @@ -158,3 +158,8 @@ Deprecation Notices
> >>>>    * security: The functions ``rte_security_set_pkt_metadata`` and
> >>>>      ``rte_security_get_userdata`` will be made inline functions
> >>>> and additional
> >>>>      flags will be added in structure ``rte_security_ctx`` in DPDK 21.11.
> >>>> +
> >>>> +* ethdev: Definition of the flow API action PORT_ID is ambiguous
> >>>> +and
> >>>> needs
> >>>> +  clarification. Structure rte_flow_action_port_id will be
> >>>> +extended to
> >>>> +  specify traffic direction to represented entity or ethdev port
> >>>> itself in
> >>>> +  DPDK 21.11.
> >>>> --
> >>>> 2.30.2
> >>>>
> >


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/2] ethdev: announce flow API action PORT_ID changes
  2021-08-01 12:23  0%     ` Ori Kam
@ 2021-08-01 12:43  0%       ` Andrew Rybchenko
  2021-08-01 12:56  0%         ` Ori Kam
  0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2021-08-01 12:43 UTC (permalink / raw)
  To: Ori Kam, Eli Britstein, NBU-Contact-Thomas Monjalon, Ferruh Yigit
  Cc: dev, Ilya Maximets, Ajit Khaparde, Matan Azrad, Ivan Malov,
	Viacheslav Galaktionov

Hi Ori,

On 8/1/21 3:23 PM, Ori Kam wrote:
> Hi Andrew,
> 
> I think before we can change the API we must agree on the meaning of representor.

The question is not directly related to a representor definition.
Just indirectly. PORT_ID action makes sense for non-representor
ports as well.

> PSB more comments
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Sunday, August 1, 2021 3:04 PM
>> To: Eli Britstein <elibr@nvidia.com>; NBU-Contact-Thomas Monjalon
>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>; Ori Kam
>> <orika@nvidia.com>
>> Cc: dev@dpdk.org; Ilya Maximets <i.maximets@ovn.org>; Ajit Khaparde
>> <ajit.khaparde@broadcom.com>; Matan Azrad <matan@nvidia.com>; Ivan
>> Malov <ivan.malov@oktetlabs.ru>; Viacheslav Galaktionov
>> <viacheslav.galaktionov@oktetlabs.ru>
>> Subject: Re: [PATCH 1/2] ethdev: announce flow API action PORT_ID changes
>>
>> On 8/1/21 1:57 PM, Eli Britstein wrote:
>>>
>>> On 8/1/2021 1:22 PM, Andrew Rybchenko wrote:
>>>> By its very name, action PORT_ID means that packets hit an ethdev
>>>> with the given DPDK port ID. At least the current comments don't
>>>> state the opposite.
>>>> That said, since port representors had been adopted, applications
>>>> like OvS have been misusing the action. They misread its purpose as
>>>> sending packets to the opposite end of the "wire" plugged to the
>>>> given ethdev, for example, redirecting packets to the VF itself
>>>> rather than to its representor ethdev.
>>>> Another example: OvS relies on this action with the admin PF's ethdev
>>>> port ID specified in it in order to send offloaded packets to the
>>>> physical port.
>>>>
>>>> Since there might be applications which use this action in its valid
>>>> sense, one can't just change the documentation to greenlight the
>>>> opposite meaning.
>>>>
>>>> The documentation must be clarified and rte_flow_action_port_id
>>>> structure should be extended to support both meanings.
>>>
>>> I think the only clarification needed is that PORT_ID acts as if
>>> rte_eth_tx_burst is called with the specified port-id.
>>
>> Sorry, but I still think that it is opposite meaning to the current
>> documentation which says "Directs matching traffic to a given DPDK port ID."
>> Since it happens on switching level (transfer rule) "to a given DPDK port"
>> means that it will be received on a given DPDK port.
>>
>> Anyway, the goal of the deprecation notice is to highlight that it must be
>> fixed and ensure that we can choose right decision even if it breaks API/ABI.
>>
> Agree, it is good that you created the announcement.

Hopefully you agree that the area requires clarification and must
be improved. I think such heated discussions really prove it.

> I think we should continue our discussion on what is a representor.

Yes, but it is a hard topic. I'd like to unbind PORT_ID action from
the discussion, since the action makes sense for non-representors
as well.

> I think for current implementation the doc should say "direct / matches
> traffic to / from the switch port which the selected DPDK representor port
> is connected to or to DPDK port if this port is not a representor."

IMHO it is better to keep the definition of the action simple and
not to have any representor specifics in it. A representor is an ethdev
port. If we direct traffic to an ethdev port, it should be received
on that ethdev port regardless of whether it is a representor or not.
It is better to avoid exceptions and special cases.

> If we go this way there is no need to change the API only the doc.
> 
>>> Regarding representors, it's not different. When using TX on a
>>> representor port, the packets appear as RX on its represented port.
>>>
>>> Please elaborate if there is a use case for the PORT_ID~ in which the
>>> app can get the packets using rte_eth_rx_burst on the specified port-id.
>>
>> Multi-home host with a NIC with two physical ports and two PFs used by
>> DPDK app with layer 3 (IP addresses). Different cores used to handle traffic
>> from different ports plus routing in DPDK app. If traffic to port #0 IP address
>> is received on phys port #1, it is useful to redirect traffic to port ID 0 directly
>> to have these packets on correct CPU cores from the very beginning to avoid
>> SW mechanisms to pass from port #1 CPU cores to port #0 CPU cores.
>>
> To make sure I understand you are talking about a DPDK application that
> is connected to number of ports and it is Eswitch manager, but it doesn't use
> representors but the actual ports, right?
> I think the definition I wrote above also works for this case.

Another possible request is to direct traffic from phys port #0
to phys port #1 directly and express it in terms of the PORT_ID action.

Thanks,
Andrew.

> Best,
> Ori
> 
>>>>
>>>> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>> ---
>>>>    doc/guides/rel_notes/deprecation.rst | 5 +++++
>>>>    1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/doc/guides/rel_notes/deprecation.rst
>>>> b/doc/guides/rel_notes/deprecation.rst
>>>> index d9c0e65921..6e6413c89f 100644
>>>> --- a/doc/guides/rel_notes/deprecation.rst
>>>> +++ b/doc/guides/rel_notes/deprecation.rst
>>>> @@ -158,3 +158,8 @@ Deprecation Notices
>>>>    * security: The functions ``rte_security_set_pkt_metadata`` and
>>>>      ``rte_security_get_userdata`` will be made inline functions and
>>>> additional
>>>>      flags will be added in structure ``rte_security_ctx`` in DPDK 21.11.
>>>> +
>>>> +* ethdev: Definition of the flow API action PORT_ID is ambiguous and
>>>> needs
>>>> +  clarification. Structure rte_flow_action_port_id will be extended
>>>> +to
>>>> +  specify traffic direction to represented entity or ethdev port
>>>> itself in
>>>> +  DPDK 21.11.
>>>> --
>>>> 2.30.2
>>>>
> 


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/2] ethdev: announce flow API action PORT_ID changes
  2021-08-01 12:03  3%   ` Andrew Rybchenko
@ 2021-08-01 12:23  0%     ` Ori Kam
  2021-08-01 12:43  0%       ` Andrew Rybchenko
  0 siblings, 1 reply; 200+ results
From: Ori Kam @ 2021-08-01 12:23 UTC (permalink / raw)
  To: Andrew Rybchenko, Eli Britstein, NBU-Contact-Thomas Monjalon,
	Ferruh Yigit
  Cc: dev, Ilya Maximets, Ajit Khaparde, Matan Azrad, Ivan Malov,
	Viacheslav Galaktionov

Hi Andrew,

I think before we can change the API we must agree on the meaning of a representor.

Please see more comments below.

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Sunday, August 1, 2021 3:04 PM
> To: Eli Britstein <elibr@nvidia.com>; NBU-Contact-Thomas Monjalon
> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>; Ori Kam
> <orika@nvidia.com>
> Cc: dev@dpdk.org; Ilya Maximets <i.maximets@ovn.org>; Ajit Khaparde
> <ajit.khaparde@broadcom.com>; Matan Azrad <matan@nvidia.com>; Ivan
> Malov <ivan.malov@oktetlabs.ru>; Viacheslav Galaktionov
> <viacheslav.galaktionov@oktetlabs.ru>
> Subject: Re: [PATCH 1/2] ethdev: announce flow API action PORT_ID changes
> 
> On 8/1/21 1:57 PM, Eli Britstein wrote:
> >
> > On 8/1/2021 1:22 PM, Andrew Rybchenko wrote:
> >> By its very name, action PORT_ID means that packets hit an ethdev
> >> with the given DPDK port ID. At least the current comments don't
> >> state the opposite.
> >> That said, since port representors had been adopted, applications
> >> like OvS have been misusing the action. They misread its purpose as
> >> sending packets to the opposite end of the "wire" plugged to the
> >> given ethdev, for example, redirecting packets to the VF itself
> >> rather than to its representor ethdev.
> >> Another example: OvS relies on this action with the admin PF's ethdev
> >> port ID specified in it in order to send offloaded packets to the
> >> physical port.
> >>
> >> Since there might be applications which use this action in its valid
> >> sense, one can't just change the documentation to greenlight the
> >> opposite meaning.
> >>
> >> The documentation must be clarified and rte_flow_action_port_id
> >> structure should be extended to support both meanings.
> >
> > I think the only clarification needed is that PORT_ID acts as if
> > rte_eth_tx_burst is called with the specified port-id.
> 
> Sorry, but I still think that it is opposite meaning to the current
> documentation which says "Directs matching traffic to a given DPDK port ID."
> Since it happens on switching level (transfer rule) "to a given DPDK port"
> means that it will be received on a given DPDK port.
> 
> Anyway, the goal of the deprecation notice is to highlight that it must be
> fixed and ensure that we can choose right decision even if it breaks API/ABI.
> 
Agreed, it is good that you created the announcement.
I think we should continue our discussion on what a representor is.
I think for the current implementation the doc should say "direct / matches
traffic to / from the switch port which the selected DPDK representor port
is connected to, or to the DPDK port if this port is not a representor."
If we go this way, there is no need to change the API, only the doc.

> > Regarding representors, it's not different. When using TX on a
> > representor port, the packets appear as RX on its represented port.
> >
> > Please elaborate if there is a use case for the PORT_ID~ in which the
> > app can get the packets using rte_eth_rx_burst on the specified port-id.
> 
> Multi-home host with a NIC with two physical ports and two PFs used by
> DPDK app with layer 3 (IP addresses). Different cores used to handle traffic
> from different ports plus routing in DPDK app. If traffic to port #0 IP address
> is received on phys port #1, it is useful to redirect traffic to port ID 0 directly
> to have these packets on correct CPU cores from the very beginning to avoid
> SW mechanisms to pass from port #1 CPU cores to port #0 CPU cores.
> 
To make sure I understand: you are talking about a DPDK application that
is connected to a number of ports and is the Eswitch manager, but it doesn't
use representors, only the actual ports, right?
I think the definition I wrote above also works for this case.


Best,
Ori

> >>
> >> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> ---
> >>   doc/guides/rel_notes/deprecation.rst | 5 +++++
> >>   1 file changed, 5 insertions(+)
> >>
> >> diff --git a/doc/guides/rel_notes/deprecation.rst
> >> b/doc/guides/rel_notes/deprecation.rst
> >> index d9c0e65921..6e6413c89f 100644
> >> --- a/doc/guides/rel_notes/deprecation.rst
> >> +++ b/doc/guides/rel_notes/deprecation.rst
> >> @@ -158,3 +158,8 @@ Deprecation Notices
> >>   * security: The functions ``rte_security_set_pkt_metadata`` and
> >>     ``rte_security_get_userdata`` will be made inline functions and
> >> additional
> >>     flags will be added in structure ``rte_security_ctx`` in DPDK 21.11.
> >> +
> >> +* ethdev: Definition of the flow API action PORT_ID is ambiguous and
> >> needs
> >> +  clarification. Structure rte_flow_action_port_id will be extended
> >> +to
> >> +  specify traffic direction to represented entity or ethdev port
> >> itself in
> >> +  DPDK 21.11.
> >> --
> >> 2.30.2
> >>


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/2] ethdev: announce flow API action PORT_ID changes
  @ 2021-08-01 12:03  3%   ` Andrew Rybchenko
  2021-08-01 12:23  0%     ` Ori Kam
  0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2021-08-01 12:03 UTC (permalink / raw)
  To: Eli Britstein, Thomas Monjalon, Ferruh Yigit, Ori Kam
  Cc: dev, Ilya Maximets, Ajit Khaparde, Matan Azrad, Ivan Malov,
	Viacheslav Galaktionov

On 8/1/21 1:57 PM, Eli Britstein wrote:
> 
> On 8/1/2021 1:22 PM, Andrew Rybchenko wrote:
>> By its very name, action PORT_ID means that packets hit an ethdev with
>> the given DPDK port ID. At least the current comments don't state the
>> opposite.
>> That said, since port representors had been adopted, applications like
>> OvS have been misusing the action. They misread its purpose as sending
>> packets to the opposite end of the "wire" plugged to the given ethdev,
>> for example, redirecting packets to the VF itself rather than to its
>> representor ethdev.
>> Another example: OvS relies on this action with the admin PF's ethdev
>> port ID specified in it in order to send offloaded packets to the
>> physical port.
>>
>> Since there might be applications which use this action in its valid
>> sense, one can't just change the documentation to greenlight the
>> opposite meaning.
>>
>> The documentation must be clarified and rte_flow_action_port_id structure
>> should be extended to support both meanings.
> 
> I think the only clarification needed is that PORT_ID acts as if 
> rte_eth_tx_burst is called with the specified port-id.

Sorry, but I still think that it is the opposite meaning to the current
documentation, which says "Directs matching traffic to a given DPDK port
ID." Since it happens at the switching level (transfer rule), "to a given
DPDK port" means that it will be received on a given DPDK port.

Anyway, the goal of the deprecation notice is to highlight that it must
be fixed and to ensure that we can choose the right decision even if it
breaks API/ABI.

> Regarding representors, it's not different. When using TX on a 
> representor port, the packets appear as RX on its represented port.
> 
> Please elaborate if there is a use case for the PORT_ID~ in which the 
> app can get the packets using rte_eth_rx_burst on the specified port-id.

A multi-homed host with a NIC with two physical ports and two PFs used
by a DPDK app at layer 3 (IP addresses). Different cores are used to handle
traffic from different ports, plus routing in the DPDK app. If traffic to
the port #0 IP address is received on phys port #1, it is useful to redirect
it to port ID 0 directly, so that these packets are on the correct CPU cores
from the very beginning and no SW mechanism is needed to pass them from
port #1 CPU cores to port #0 CPU cores.
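
As a rough software model of that steering decision (in practice it would be
a transfer flow rule in the NIC, not code on the datapath), something like
the sketch below. The table layout and function name are hypothetical and
only illustrate "deliver to the port ID which owns the destination address".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: the IPv4 address owned by a DPDK port ID. */
struct example_port_addr {
	uint16_t port_id;
	uint32_t ipv4; /* host byte order, for simplicity */
};

/*
 * Return the port ID which should receive a packet with destination
 * address dst_ip that arrived on rx_port_id. If no port owns the
 * address, the packet stays on the port where it arrived.
 */
static uint16_t
example_steer_by_dst_ip(const struct example_port_addr *tbl, size_t n,
			uint16_t rx_port_id, uint32_t dst_ip)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tbl[i].ipv4 == dst_ip)
			return tbl[i].port_id;
	}
	return rx_port_id;
}
```

Here a packet for the port #0 address which arrives on phys port #1 is
handed to port ID 0, so it is polled by the port #0 cores from the start.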

>>
>> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> ---
>>   doc/guides/rel_notes/deprecation.rst | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/doc/guides/rel_notes/deprecation.rst 
>> b/doc/guides/rel_notes/deprecation.rst
>> index d9c0e65921..6e6413c89f 100644
>> --- a/doc/guides/rel_notes/deprecation.rst
>> +++ b/doc/guides/rel_notes/deprecation.rst
>> @@ -158,3 +158,8 @@ Deprecation Notices
>>   * security: The functions ``rte_security_set_pkt_metadata`` and
>>     ``rte_security_get_userdata`` will be made inline functions and 
>> additional
>>     flags will be added in structure ``rte_security_ctx`` in DPDK 21.11.
>> +
>> +* ethdev: Definition of the flow API action PORT_ID is ambiguous and 
>> needs
>> +  clarification. Structure rte_flow_action_port_id will be extended to
>> +  specify traffic direction to represented entity or ethdev port 
>> itself in
>> +  DPDK 21.11.
>> -- 
>> 2.30.2
>>


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] ethdev: fix representor port ID search by name
  2021-07-29  4:20  0% ` Xueming(Steven) Li
@ 2021-08-01  8:50  0%   ` Andrew Rybchenko
  2021-08-01 14:15  0%     ` Xueming(Steven) Li
  0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2021-08-01  8:50 UTC (permalink / raw)
  To: Xueming(Steven) Li, Ajit Khaparde, Somnath Kotur, John Daley,
	Hyong Youb Kim, Beilei Xing, Qiming Yang, Qi Zhang, Haiyue Wang,
	Matan Azrad, Shahaf Shuler, Slava Ovsiienko,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit
  Cc: dev, Viacheslav Galaktionov, stable

On 7/29/21 7:20 AM, Xueming(Steven) Li wrote:
> 
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Tuesday, July 13, 2021 12:18 AM
>> To: Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur <somnath.kotur@broadcom.com>; John Daley
>> <johndale@cisco.com>; Hyong Youb Kim <hyonkim@cisco.com>; Beilei Xing <beilei.xing@intel.com>; Qiming Yang
>> <qiming.yang@intel.com>; Qi Zhang <qi.z.zhang@intel.com>; Haiyue Wang <haiyue.wang@intel.com>; Matan Azrad
>> <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>; NBU-Contact-Thomas
>> Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>; Xueming(Steven) Li <xuemingl@nvidia.com>
>> Cc: dev@dpdk.org; Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
>> Subject: [PATCH] ethdev: fix representor port ID search by name
>>
>> From: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>
>>
>> Fix representor port ID search by name if the representor itself does not provide representors info. Getting a list of representors from
>> a representor does not make sense. Instead, a parent device should be used.
>>
>> To this end, extend the rte_eth_dev_data structure to include the port ID of the parent device for representors.
>>
>> Fixes: df7547a6a2cc ("ethdev: add helper function to get representor ID")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>
>> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> ---
>> The new field is added into the hole in rte_eth_dev_data structure.
>> The patch does not change ABI, but extra care is required since ABI check is disabled for the structure because of the libabigail bug [1].
>>
>> Potentially it is bad for out-of-tree drivers which implement representors but do not fill in a new parent_port_id field in
>> rte_eth_dev_data structure. Do we care?
>>
>> Maybe the patch should add lines to release notes, but I'd like to get initial feedback first.
>>
>> mlx5 changes should be reviewed by maintainers very carefully, since we are not sure if we patch it correctly.
>>
>> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=28060

[snip]

>> b/drivers/net/mlx5/linux/mlx5_os.c
>> index be22d9cbd2..5550d30628 100644
>> --- a/drivers/net/mlx5/linux/mlx5_os.c
>> +++ b/drivers/net/mlx5/linux/mlx5_os.c
>> @@ -1511,6 +1511,17 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
>>   	if (priv->representor) {
>>   		eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
>>   		eth_dev->data->representor_id = priv->representor_id;
>> +		MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
>> +			const struct mlx5_priv *opriv =
>> +				rte_eth_devices[port_id].data->dev_private;
>> +
>> +			if (!opriv ||
>> +			    opriv->sh != priv->sh ||
>> +			    opriv->representor)
>> +				continue;
>> +			eth_dev->data->parent_port_id = port_id;
>> +			break;
>> +		}
> 
> At line 126, there is a logic that locate priv->domain_id, parent port_id could be found there.

Do you mean line 1260? The comment above it says "Look for sibling devices
in order to reuse their switch domain if any, otherwise allocate one.".
So, it is not a parent. Is the comment misleading, and does the parent match
the search criteria as well? In any case, we should guarantee that
it is a parent port, not a sibling port. So, we need extra criteria to
match the parent port only.
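
For reference, the criteria used in the hunk above boil down to "same shared
context and not a representor". A standalone sketch of that predicate, with
hypothetical types standing in for mlx5_priv and the port array, could look
as follows. Whether these two checks are sufficient to tell a parent from a
sibling in mlx5 is exactly the open question.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_PORT_ID_NONE UINT16_MAX

/* Hypothetical stand-in for the driver private data of one port. */
struct example_priv {
	const void *sh;   /* shared device context */
	bool representor; /* true for representor ports */
};

/*
 * Find the first port which shares the context of priv and is not
 * itself a representor. Unused slots in ports[] are NULL.
 */
static uint16_t
example_find_parent_port(const struct example_priv *const *ports, uint16_t n,
			 const struct example_priv *priv)
{
	uint16_t port_id;

	for (port_id = 0; port_id < n; port_id++) {
		const struct example_priv *opriv = ports[port_id];

		if (opriv == NULL || opriv->sh != priv->sh ||
		    opriv->representor)
			continue;
		return port_id;
	}
	return EXAMPLE_PORT_ID_NONE;
}
```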

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] ethdev: fix representor port ID search by name
  2021-07-29  4:13  0%             ` Xueming(Steven) Li
@ 2021-08-01  8:40  0%               ` Andrew Rybchenko
  2021-08-01 14:25  0%                 ` Xueming(Steven) Li
  0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2021-08-01  8:40 UTC (permalink / raw)
  To: Xueming(Steven) Li, Ajit Khaparde, Somnath Kotur, John Daley,
	Hyong Youb Kim, Beilei Xing, Qiming Yang, Qi Zhang, Haiyue Wang,
	Matan Azrad, Shahaf Shuler, Slava Ovsiienko,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit
  Cc: dev, Viacheslav Galaktionov, stable

On 7/29/21 7:13 AM, Xueming(Steven) Li wrote:
> 
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Tuesday, July 20, 2021 5:00 PM
>> To: Xueming(Steven) Li <xuemingl@nvidia.com>; Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur
>> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>; Hyong Youb Kim <hyonkim@cisco.com>; Beilei Xing
>> <beilei.xing@intel.com>; Qiming Yang <qiming.yang@intel.com>; Qi Zhang <qi.z.zhang@intel.com>; Haiyue Wang
>> <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Slava Ovsiienko
>> <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
>> Cc: dev@dpdk.org; Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
>> Subject: Re: [PATCH] ethdev: fix representor port ID search by name
>>
>> On 7/19/21 3:50 PM, Xueming(Steven) Li wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>> Sent: Monday, July 19, 2021 8:36 PM
>>>> To: Xueming(Steven) Li <xuemingl@nvidia.com>; Ajit Khaparde
>>>> <ajit.khaparde@broadcom.com>; Somnath Kotur
>>>> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>; Hyong
>>>> Youb Kim <hyonkim@cisco.com>; Beilei Xing <beilei.xing@intel.com>;
>>>> Qiming Yang <qiming.yang@intel.com>; Qi Zhang <qi.z.zhang@intel.com>;
>>>> Haiyue Wang <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>;
>>>> Shahaf Shuler <shahafs@nvidia.com>; Slava Ovsiienko
>>>> <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon
>>>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
>>>> Cc: dev@dpdk.org; Viacheslav Galaktionov
>>>> <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
>>>> Subject: Re: [PATCH] ethdev: fix representor port ID search by name
>>>>
>>>> On 7/19/21 2:54 PM, Xueming(Steven) Li wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>>>> Sent: Monday, July 19, 2021 4:46 PM
>>>>>> To: Xueming(Steven) Li <xuemingl@nvidia.com>; Ajit Khaparde
>>>>>> <ajit.khaparde@broadcom.com>; Somnath Kotur
>>>>>> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>;
>>>>>> Hyong Youb Kim <hyonkim@cisco.com>; Beilei Xing
>>>>>> <beilei.xing@intel.com>; Qiming Yang <qiming.yang@intel.com>; Qi
>>>>>> Zhang <qi.z.zhang@intel.com>; Haiyue Wang <haiyue.wang@intel.com>;
>>>>>> Matan Azrad <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>;
>>>>>> Slava Ovsiienko <viacheslavo@nvidia.com>; NBU-Contact-Thomas
>>>>>> Monjalon <thomas@monjalon.net>; Ferruh Yigit
>>>>>> <ferruh.yigit@intel.com>
>>>>>> Cc: dev@dpdk.org; Viacheslav Galaktionov
>>>>>> <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
>>>>>> Subject: Re: [PATCH] ethdev: fix representor port ID search by name
>>>>>>
>>>>>> On 7/19/21 9:58 AM, Xueming(Steven) Li wrote:
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>>>>>> Sent: Tuesday, July 13, 2021 12:18 AM
>>>>>>>> To: Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur
>>>>>>>> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>;
>>>>>>>> Hyong Youb Kim <hyonkim@cisco.com>; Beilei Xing
>>>>>>>> <beilei.xing@intel.com>; Qiming Yang <qiming.yang@intel.com>; Qi
>>>>>>>> Zhang <qi.z.zhang@intel.com>; Haiyue Wang
>>>>>>>> <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>; Shahaf
>>>>>>>> Shuler <shahafs@nvidia.com>; Slava Ovsiienko
>>>>>>>> <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon
>>>>>>>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>;
>>>>>>>> Xueming(Steven) Li <xuemingl@nvidia.com>
>>>>>>>> Cc: dev@dpdk.org; Viacheslav Galaktionov
>>>>>>>> <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
>>>>>>>> Subject: [PATCH] ethdev: fix representor port ID search by name
>>>>>>>>
>>>>>>>> From: Viacheslav Galaktionov
>>>>>>>> <viacheslav.galaktionov@oktetlabs.ru>
>>>>>>>>
>>>>>>>> Fix representor port ID search by name if the representor itself
>>>>>>>> does not provide representors info. Getting a list of
>>>>>>>> representors from a representor does not make sense. Instead, a
>>>>>>>> parent device should be used.
>>>>>>>>
>>>>>>>> To this end, extend the rte_eth_dev_data structure to include the port ID of the parent device for representors.
>>>>>>>>
>>>>>>>> Fixes: df7547a6a2cc ("ethdev: add helper function to get representor ID")
>>>>>>>> Cc: stable@dpdk.org
>>>>>>>>
>>>>>>>> Signed-off-by: Viacheslav Galaktionov
>>>>>>>> <viacheslav.galaktionov@oktetlabs.ru>
>>>>>>>> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>>>>>> ---
>>>>>>>> The new field is added into the hole in rte_eth_dev_data structure.
>>>>>>>> The patch does not change ABI, but extra care is required since
>>>>>>>> ABI check is disabled for the structure because of the libabigail
>>>>>>>> bug [1].
>>>>>>>>
>>>>>>>> Potentially it is bad for out-of-tree drivers which implement
>>>>>>>> representors but do not fill in the new parent_port_id field in the rte_eth_dev_data structure. Do we care?
>>>>>>>>
>>>>>>>> Maybe the patch should add lines to release notes, but I'd like to get initial feedback first.
>>>>>>>>
>>>>>>>> mlx5 changes should be reviewed by maintainers very carefully, since we are not sure if we patched it correctly.
>>>>>>>>
>>>>>>>> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=28060
>>>>>>
>>>>>> [snip]
>>>>>>
>>>>>>>> --- a/lib/ethdev/ethdev_driver.h
>>>>>>>> +++ b/lib/ethdev/ethdev_driver.h
>>>>>>>> @@ -1248,8 +1248,8 @@ struct rte_eth_devargs {
>>>>>>>>       * For backward compatibility, if no representor info, direct
>>>>>>>>       * map legacy VF (no controller and pf).
>>>>>>>>       *
>>>>>>>> - * @param ethdev
>>>>>>>> - *  Handle of ethdev port.
>>>>>>>> + * @param parent_port_id
>>>>>>>> + *  Port ID of the backing device.
>>>>>>>>       * @param type
>>>>>>>>       *  Representor type.
>>>>>>>>       * @param controller
>>>>>>>> @@ -1266,7 +1266,7 @@ struct rte_eth_devargs {
>>>>>>>>       */
>>>>>>>>      __rte_internal
>>>>>>>>      int
>>>>>>>> -rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
>>>>>>>> +rte_eth_representor_id_get(uint16_t parent_port_id,
>>>>>>>
>>>>>>> It makes more sense to get representor info from the parent port.
>>>>>>> A representor is a member of a switch domain; the PMD owns the
>>>>>>> information about the representor owner port and the representors.
>>>>>>> This change looks better, but I am not sure whether it is valuable
>>>>>>> to introduce a new member to the EAL data structure.
>>>>>>
>>>>>> IMHO, it is simply incorrect to return representors info on a
>>>>>> representor itself. Representors info describes which representors may be populated using the device.
>>>>>>
>>>>>> If above statement is correct, we need a way to get parent device
>>>>>> by representor to do name to representor ID mapping. I see two options to do it:
>>>>>>      A. Dedicated field in rte_eth_dev_data as the patch does.
>>>>>>      B. Dedicated ethdev op (since representor knows parent port ID anyway).
>>>>>> We have chosen (A) because of simplicity.
>>>>>
>>>>> Just recalled that a representor port could be probed w/o the owner PF; is a parent port mandatory?
>>>>
>>>> I thought that it is impossible and parent port is absolutely
>>>> required for a representor. Could you provide an example and explain how will it work?
>>>
>>> In case of bonding, PF0 and PF1 become one PF port `bond0`, PCI address is PF0.
>>> 	-a <PF0>,representor=pf[0-1]vf[0-99] // this is the syntax we proposed.
>>
>> Is it net/bonding or vendor-specific bonding in HW?
>> If I remember correctly in the case of net/bonding we have ethdev ports for bonded devices.
> 
> Not net/bonding pmd, it's Linux bonding, supported by hw driver.

Got it.

>>
>>>
>>> To be backward compatible, also support the following 2 devargs:
>>> 	-a <pf0>,representor=[0-99] // probe bond0 and representor on pf0
>>> 	-a <pf1>,representor=[0-99] // probe representors on pf1.
>>> If devargs start with PF1 devargs, no owner PF1 created as it disabled
>>> in bonding. Can't create bond0(PF0) automatically here as device is located by PCI address(PF1) from devargs.
>>
>> So, I guess the problem is vendor-specific bonding in HW. Anyway legacy backward compatible representor spec should not require
>> representors info since it worked before without it. So, it does not sound like a reason to have representors info on a representor
>> itself.
> 
> Legacy backward-compatible logic could be something like this: if the PF owner port is found, use it; otherwise fall back to the current representor.
> This won't break anything I guess, what do you think?

Logically even in legacy backward compatibility PF1 VFs representors
have parent port ID - PF0 which is a bond of PF0 & PF1. So,
parent_port_id should be filled in. In this case eth_representor_cmp()
will do rte_eth_representor_id_get(PF0-bond-id, -1, -1, VF, &id) which
will return PF0 VF representor ID. Most likely it will even match and
everything works, but it is still incorrect.

In fact, I have another idea. Try to do:
rte_eth_representor_id_get(representor-port-id, ...) first
for the backward compatibility case and, if not supported, do
it on parent port ID.
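
A minimal sketch of that lookup order, only to make the idea concrete.
Everything prefixed demo_ is a hypothetical stand-in and not the ethdev
code: demo_get() plays the role of rte_eth_representor_id_get(), and the
sketch assumes that "not supported" is reported as -ENOTSUP.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PARENT_PORT 0 /* hypothetical port ID of the backing device */

/* Hypothetical getter type standing in for rte_eth_representor_id_get(). */
typedef int (*demo_repr_id_get_t)(uint16_t port_id, int controller, int pf,
				  int vf, uint16_t *repr_id);

/* Stub: only the parent port knows how to map a VF to a representor ID. */
static int
demo_get(uint16_t port_id, int controller, int pf, int vf, uint16_t *repr_id)
{
	(void)controller;
	(void)pf;
	if (port_id != DEMO_PARENT_PORT)
		return -ENOTSUP;
	*repr_id = (uint16_t)vf;
	return 0;
}

/*
 * Ask the representor port first (backward-compatible case) and go to the
 * parent port only when the representor reports "not supported".
 */
static int
demo_repr_id_lookup(demo_repr_id_get_t get, uint16_t repr_port_id,
		    uint16_t parent_port_id, int controller, int pf, int vf,
		    uint16_t *repr_id)
{
	int ret;

	assert(get != NULL && repr_id != NULL);
	ret = get(repr_port_id, controller, pf, vf, repr_id);
	if (ret != -ENOTSUP)
		return ret;
	return get(parent_port_id, controller, pf, vf, repr_id);
}
```

Any error other than "not supported" from the representor is returned
as is, so a real failure is not hidden by the fallback.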

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: announce removal of ABIs in PCI bus driver
  2021-07-27  8:44  0%       ` Bruce Richardson
  2021-07-28 15:32  0%         ` Andrew Rybchenko
@ 2021-07-31 20:44  0%         ` Thomas Monjalon
  2021-08-03  1:52  0%           ` Xia, Chenbo
  1 sibling, 1 reply; 200+ results
From: Thomas Monjalon @ 2021-07-31 20:44 UTC (permalink / raw)
  To: Xia, Chenbo
  Cc: dev, Yigit, Ferruh, dev, mdr, david.marchand, Bruce Richardson,
	andrew.rybchenko, konstantin.ananyev

27/07/2021 10:44, Bruce Richardson:
> On Mon, Jul 26, 2021 at 05:56:17AM +0000, Xia, Chenbo wrote:
> > From: Yigit, Ferruh <ferruh.yigit@intel.com>
> > > On 7/23/2021 8:39 AM, Xia, Chenbo wrote:
> > > > From: dev <dev-bounces@dpdk.org> On Behalf Of Chenbo Xia
> > > >> +* pci: To reduce unnecessary ABIs exposed by DPDK bus driver,
> > > "rte_bus_pci.h"
> > > >> +  will be made internal in 21.11 and macros/data structures/functions
> > > defined
> > > >> +  in the header will not be considered as ABI anymore. This change is
> > > >> inspired
> > > >> +  by the RFC
> > > https://patchwork.dpdk.org/project/dpdk/list/?series=17176.
> > > >
> > > > I see there's some ABI improvement work on-going and I think it could be
> > > part of
> > > > the work. If it makes sense to you, I'd like some ACKs.
> > > >
> > > 
> > > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> > > 
> > > I am for reducing the public ABI as much as possible. How big will the
> > > change
> > > be? Is the 'rte_bus_pci.h' used other than './drivers/bus/pci/'?
> > 
> > I don't see big change here. And I am not sure if I understand your second
> > question. The rte_bus_pci.h will still be used by drivers (maybe remove the
> > rte prefix and change the file name).
> > 
> The file itself will still be exported in some cases, where the end-user
> has their own drivers which need to be compiled, so I'd recommend keeping
> the rte_ prefix. However, I think making all bus APIs internal-only to DPDK
> is a good idea.

I don't understand how it can be exported _and_ internal.
And about the rte_ prefix, it should be kept even if it is used only
in internal drivers because it prevents namespace clashes with other
libraries included by the drivers.
As a rule we should always have rte_ prefix for each symbol used outside
of its own library.

That said I am OK with the direction of hiding PCI bus API.

Applied, thanks.




^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 0/4] cryptodev and security ABI improvements
  2021-07-31 18:13  8% [dpdk-dev] [PATCH 0/4] cryptodev and security ABI improvements Akhil Goyal
  2021-07-31 18:13  3% ` [dpdk-dev] [PATCH 1/4] cryptodev: remove LIST_END enumerators Akhil Goyal
  2021-07-31 18:13  3% ` [dpdk-dev] [PATCH 4/4] security: add reserved bitfields Akhil Goyal
@ 2021-07-31 18:17  4% ` Akhil Goyal
  2 siblings, 0 replies; 200+ results
From: Akhil Goyal @ 2021-07-31 18:17 UTC (permalink / raw)
  To: Akhil Goyal, dev
  Cc: thomas, david.marchand, hemant.agrawal, Anoob Joseph,
	pablo.de.lara.guarch, fiona.trahe, declan.doherty, matan,
	g.singh, roy.fan.zhang, jianjay.zhou, asomalap, ruifeng.wang

> Subject: [PATCH 0/4] cryptodev and security ABI improvements
> 
> This is a first series planned for ABI improvements
> in cryptodev and security library.
> 
> Other planned improvements under development.
> - cryptodev: export driver interface as internal
> - cryptodev: split and hide struct rte_cryptodev, struct
> rte_cryptodev_data
> - cryptodev: hide struct rte_cryptodev_sym_session,
> rte_cryptodev_asym_session
> - security: hide struct rte_security_session
> 
> Request everyone to review and contribute for the missing
> pieces to improve ABI stability.
> 
Forgot to mention, this is an RFC series for DPDK 21.11

> Akhil Goyal (4):
>   cryptodev: remove LIST_END enumerators
>   cryptodev: promote asym APIs to stable
>   security: hide internal API
>   security: add reserved bitfields
> 
>  app/test/test_cryptodev_asym.c     |  4 ++--
>  devtools/libabigail.abignore       |  4 ++++
>  drivers/crypto/qat/qat_asym.c      |  2 +-
>  lib/cryptodev/rte_crypto_asym.h    |  4 ----
>  lib/cryptodev/rte_cryptodev.h      | 10 ----------
>  lib/cryptodev/version.map          | 24 +++++++++++++-----------
>  lib/security/rte_security.h        |  6 ++++++
>  lib/security/rte_security_driver.h |  2 +-
>  lib/security/version.map           |  7 ++++++-
>  9 files changed, 33 insertions(+), 30 deletions(-)
> 
> --
> 2.25.1


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH 4/4] security: add reserved bitfields
  2021-07-31 18:13  8% [dpdk-dev] [PATCH 0/4] cryptodev and security ABI improvements Akhil Goyal
  2021-07-31 18:13  3% ` [dpdk-dev] [PATCH 1/4] cryptodev: remove LIST_END enumerators Akhil Goyal
@ 2021-07-31 18:13  3% ` Akhil Goyal
  2021-07-31 18:17  4% ` [dpdk-dev] [PATCH 0/4] cryptodev and security ABI improvements Akhil Goyal
  2 siblings, 0 replies; 200+ results
From: Akhil Goyal @ 2021-07-31 18:13 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, hemant.agrawal, anoobj,
	pablo.de.lara.guarch, fiona.trahe, declan.doherty, matan,
	g.singh, roy.fan.zhang, jianjay.zhou, asomalap, ruifeng.wang,
	Akhil Goyal

In struct rte_security_ipsec_sa_options, every new option added
causes an ABI breakage. To avoid this, a reserved_opts bitfield
is added for the remaining bits available in the structure.
Now for every new SA option, reserved_opts can be reduced
and the new option can be added. A corresponding exception is also
added in devtools/libabigail.abignore.

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
---
 devtools/libabigail.abignore | 4 ++++
 lib/security/rte_security.h  | 6 ++++++
 2 files changed, 10 insertions(+)

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 93158405e0..5d8da28e55 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -52,3 +52,7 @@
 ; https://sourceware.org/bugzilla/show_bug.cgi?id=28060
 [suppress_type]
 	name = rte_eth_dev_data
+
+; Ignore changes in reserved_opts bitfield of rte_security_ipsec_sa_options
+[suppress_variable]
+	name = reserved_opts
diff --git a/lib/security/rte_security.h b/lib/security/rte_security.h
index 88d31de0a6..4606425e8d 100644
--- a/lib/security/rte_security.h
+++ b/lib/security/rte_security.h
@@ -181,6 +181,12 @@ struct rte_security_ipsec_sa_options {
 	 * * 0: Disable per session security statistics collection for this SA.
 	 */
 	uint32_t stats : 1;
+
+	/** Reserved bit fields for future extension
+	 *
+	 * Note: reduce number of bits in reserved_opts for every new option
+	 */
+	uint32_t reserved_opts : 24;
 };
 
 /** IPSec security association direction */
-- 
2.25.1
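
A rough sketch of why carving a new option out of reserved bits keeps
the layout stable. The demo_ structs are hypothetical and much smaller
than rte_security_ipsec_sa_options, and the sketch assumes that both
sides are built with the same compiler and that applications zero the
reserved bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "old" layout: two options plus reserved bits. */
struct demo_opts_v1 {
	uint32_t esn : 1;
	uint32_t stats : 1;
	uint32_t reserved_opts : 30;
};

/* Hypothetical "new" layout: one bit taken from reserved_opts. */
struct demo_opts_v2 {
	uint32_t esn : 1;
	uint32_t stats : 1;
	uint32_t new_opt : 1;
	uint32_t reserved_opts : 29;
};

/*
 * Reinterpret options filled in by an application built against v1 the
 * way a library built against v2 would see them. Returns 1 when the old
 * options keep their values and the new option reads as disabled.
 */
static int
demo_old_app_view_is_preserved(void)
{
	struct demo_opts_v1 old_app = { .esn = 1, .stats = 1 };
	struct demo_opts_v2 new_lib;

	assert(sizeof(old_app) == sizeof(new_lib));
	memcpy(&new_lib, &old_app, sizeof(new_lib));
	return new_lib.esn == 1 && new_lib.stats == 1 && new_lib.new_opt == 0;
}
```

The size and the position of the existing bits do not move, which is
the property the reserved_opts field is meant to preserve.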


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH 1/4] cryptodev: remove LIST_END enumerators
  2021-07-31 18:13  8% [dpdk-dev] [PATCH 0/4] cryptodev and security ABI improvements Akhil Goyal
@ 2021-07-31 18:13  3% ` Akhil Goyal
  2021-07-31 18:13  3% ` [dpdk-dev] [PATCH 4/4] security: add reserved bitfields Akhil Goyal
  2021-07-31 18:17  4% ` [dpdk-dev] [PATCH 0/4] cryptodev and security ABI improvements Akhil Goyal
  2 siblings, 0 replies; 200+ results
From: Akhil Goyal @ 2021-07-31 18:13 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, hemant.agrawal, anoobj,
	pablo.de.lara.guarch, fiona.trahe, declan.doherty, matan,
	g.singh, roy.fan.zhang, jianjay.zhou, asomalap, ruifeng.wang,
	Akhil Goyal

Remove *_LIST_END enumerators from asymmetric crypto
lib to avoid ABI breakage for every new addition in
enums.

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
---
 app/test/test_cryptodev_asym.c  | 4 ++--
 drivers/crypto/qat/qat_asym.c   | 2 +-
 lib/cryptodev/rte_crypto_asym.h | 4 ----
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/app/test/test_cryptodev_asym.c b/app/test/test_cryptodev_asym.c
index 847b074a4f..afa0e91a45 100644
--- a/app/test/test_cryptodev_asym.c
+++ b/app/test/test_cryptodev_asym.c
@@ -542,7 +542,7 @@ test_one_case(const void *test_case, int sessionless)
 		printf("  %u) TestCase %s %s\n", test_index++,
 			tc.modex.description, test_msg);
 	} else {
-		for (i = 0; i < RTE_CRYPTO_ASYM_OP_LIST_END; i++) {
+		for (i = 0; i <= RTE_CRYPTO_ASYM_OP_SHARED_SECRET_COMPUTE; i++) {
 			if (tc.modex.xform_type == RTE_CRYPTO_ASYM_XFORM_RSA) {
 				if (tc.rsa_data.op_type_flags & (1 << i)) {
 					if (tc.rsa_data.key_exp) {
@@ -1028,7 +1028,7 @@ static inline void print_asym_capa(
 			rte_crypto_asym_xform_strings[capa->xform_type]);
 	printf("operation supported -");
 
-	for (i = 0; i < RTE_CRYPTO_ASYM_OP_LIST_END; i++) {
+	for (i = 0; i <= RTE_CRYPTO_ASYM_OP_SHARED_SECRET_COMPUTE; i++) {
 		/* check supported operations */
 		if (rte_cryptodev_asym_xform_capability_check_optype(capa, i))
 			printf(" %s",
diff --git a/drivers/crypto/qat/qat_asym.c b/drivers/crypto/qat/qat_asym.c
index 85973812a8..026625a4d2 100644
--- a/drivers/crypto/qat/qat_asym.c
+++ b/drivers/crypto/qat/qat_asym.c
@@ -742,7 +742,7 @@ qat_asym_session_configure(struct rte_cryptodev *dev,
 			err = -EINVAL;
 			goto error;
 		}
-	} else if (xform->xform_type >= RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
+	} else if (xform->xform_type > RTE_CRYPTO_ASYM_XFORM_ECPM
 			|| xform->xform_type <= RTE_CRYPTO_ASYM_XFORM_NONE) {
 		QAT_LOG(ERR, "Invalid asymmetric crypto xform");
 		err = -EINVAL;
diff --git a/lib/cryptodev/rte_crypto_asym.h b/lib/cryptodev/rte_crypto_asym.h
index 9c866f553f..5edf658572 100644
--- a/lib/cryptodev/rte_crypto_asym.h
+++ b/lib/cryptodev/rte_crypto_asym.h
@@ -94,8 +94,6 @@ enum rte_crypto_asym_xform_type {
 	 */
 	RTE_CRYPTO_ASYM_XFORM_ECPM,
 	/**< Elliptic Curve Point Multiplication */
-	RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
-	/**< End of list */
 };
 
 /**
@@ -116,7 +114,6 @@ enum rte_crypto_asym_op_type {
 	/**< DH Public Key generation operation */
 	RTE_CRYPTO_ASYM_OP_SHARED_SECRET_COMPUTE,
 	/**< DH Shared Secret compute operation */
-	RTE_CRYPTO_ASYM_OP_LIST_END
 };
 
 /**
@@ -133,7 +130,6 @@ enum rte_crypto_rsa_padding_type {
 	/**< RSA PKCS#1 OAEP padding scheme */
 	RTE_CRYPTO_RSA_PADDING_PSS,
 	/**< RSA PKCS#1 PSS padding scheme */
-	RTE_CRYPTO_RSA_PADDING_TYPE_LIST_END
 };
 
 /**
-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH 0/4] cryptodev and security ABI improvements
@ 2021-07-31 18:13  8% Akhil Goyal
  2021-07-31 18:13  3% ` [dpdk-dev] [PATCH 1/4] cryptodev: remove LIST_END enumerators Akhil Goyal
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Akhil Goyal @ 2021-07-31 18:13 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, hemant.agrawal, anoobj,
	pablo.de.lara.guarch, fiona.trahe, declan.doherty, matan,
	g.singh, roy.fan.zhang, jianjay.zhou, asomalap, ruifeng.wang,
	Akhil Goyal

This is a first series planned for ABI improvements
in cryptodev and security library.

Other planned improvements under development.
- cryptodev: export driver interface as internal
- cryptodev: split and hide struct rte_cryptodev, struct
rte_cryptodev_data
- cryptodev: hide struct rte_cryptodev_sym_session,
rte_cryptodev_asym_session
- security: hide struct rte_security_session

Request everyone to review and contribute for the missing
pieces to improve ABI stability. 

Akhil Goyal (4):
  cryptodev: remove LIST_END enumerators
  cryptodev: promote asym APIs to stable
  security: hide internal API
  security: add reserved bitfields

 app/test/test_cryptodev_asym.c     |  4 ++--
 devtools/libabigail.abignore       |  4 ++++
 drivers/crypto/qat/qat_asym.c      |  2 +-
 lib/cryptodev/rte_crypto_asym.h    |  4 ----
 lib/cryptodev/rte_cryptodev.h      | 10 ----------
 lib/cryptodev/version.map          | 24 +++++++++++++-----------
 lib/security/rte_security.h        |  6 ++++++
 lib/security/rte_security_driver.h |  2 +-
 lib/security/version.map           |  7 ++++++-
 9 files changed, 33 insertions(+), 30 deletions(-)

-- 
2.25.1


^ permalink raw reply	[relevance 8%]

* Re: [dpdk-dev] [EXT] [PATCH 01/10] security: add support for TSO on IPsec session
  2021-07-27 18:34  3%   ` [dpdk-dev] [EXT] " Akhil Goyal
  2021-07-29  8:37  0%     ` Nicolau, Radu
@ 2021-07-31 17:50  0%     ` Akhil Goyal
  1 sibling, 0 replies; 200+ results
From: Akhil Goyal @ 2021-07-31 17:50 UTC (permalink / raw)
  To: Radu Nicolau, Declan Doherty, Abhijit Sinha, Daniel Martin Buckley
  Cc: Anoob Joseph, dev, Ankur Dwivedi, Tejasree Kondoj

> > Allow user to provision a per security session maximum segment size
> > (MSS) for use when Transmit Segmentation Offload (TSO) is supported.
> > The MSS value will be used when PKT_TX_TCP_SEG or PKT_TX_UDP_SEG
> > ol_flags are specified in mbuf.
> >
> > Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> > Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
> > Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com>
> > Signed-off-by: Daniel Martin Buckley <daniel.m.buckley@intel.com>
> > ---
> Can we have deprecation notice for the changes introduced in this series.
> 
> Also there are 2 other features which modify same struct. Can we have a
> Single deprecation notice for all the changes in the
> rte_security_ipsec_sa_options?
> The notice can be something like:
> +* security: The IPsec SA config options structure ``struct rte_security_ipsec_sa_options``
> +  will be updated to support more features.
> And we may have a reserved bit fields for rest of the vacant bits so that ABI is
> not broken when a new bit field is added.
> 
> http://patches.dpdk.org/project/dpdk/patch/20210630112049.3747-1-marchana@marvell.com/
> http://patches.dpdk.org/project/dpdk/patch/20210705131335.21070-1-ktejasree@marvell.com/

I have sent the consolidated deprecation notice for all three features.
Can you guys Ack it?
https://mails.dpdk.org/archives/dev/2021-July/215906.html

Also, please send deprecation notice for changes in ipsec xform as well.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v11 00/10] eal: Add EAL API for threading
  @ 2021-07-30 22:31  3% ` Narcisa Ana Maria Vasile
  2021-08-02 17:32  3%   ` [dpdk-dev] [PATCH v12 " Narcisa Ana Maria Vasile
  0 siblings, 1 reply; 200+ results
From: Narcisa Ana Maria Vasile @ 2021-07-30 22:31 UTC (permalink / raw)
  To: dev, thomas, dmitry.kozliuk, khot, navasile, dmitrym, roretzla,
	talshn, ocardona
  Cc: bruce.richardson, david.marchand, pallavi.kadam

From: Narcisa Vasile <navasile@microsoft.com>

EAL thread API

**Problem Statement**
DPDK currently uses the pthread interface to create and manage threads.
Windows does not support the POSIX thread programming model, so it
currently relies on a header file that hides the Windows calls behind
pthread-compatible interfaces. Given that EAL should isolate the environment
specifics from the applications and libraries and mediate
all the communication with the operating systems, a new EAL interface
is needed for thread management.

**Goals**
* Introduce a generic EAL API for threading support that will remove
  the current Windows pthread.h shim.
* Replace references to pthread_* across the DPDK codebase with the new
  RTE_THREAD_* API.
* Allow users to choose between using the RTE_THREAD_* API or a
  3rd party thread library through a configuration option.

**Design plan**
New API main files:
* rte_thread.h (librte_eal/include)
* rte_thread.c (librte_eal/windows)
* rte_thread.c (librte_eal/common)

**A schematic example of the design**
--------------------------------------------------
lib/librte_eal/include/rte_thread.h
int rte_thread_create();

lib/librte_eal/common/rte_thread.c
int rte_thread_create() 
{
	return pthread_create();
}

lib/librte_eal/windows/rte_thread.c
int rte_thread_create() 
{
	return CreateThread();
}
-----------------------------------------------------

**Thread attributes**

When or after a thread is created, specific characteristics of the thread
can be adjusted. Given that the thread characteristics that are of interest
for DPDK applications are affinity and priority, the following structure
that represents thread attributes has been defined:

typedef struct
{
	enum rte_thread_priority priority;
	rte_cpuset_t cpuset;
} rte_thread_attr_t;

The *rte_thread_create()* function can optionally receive an
rte_thread_attr_t object that will cause the thread to be created with
the affinity and priority described by the attributes object. If no
rte_thread_attr_t is passed (parameter is NULL), the default affinity
and priority are used.
An rte_thread_attr_t object can also be set to the default values
by calling *rte_thread_attr_init()*.

*Priority* is represented through an enum that currently advertises
two values for priority:
	- RTE_THREAD_PRIORITY_NORMAL
	- RTE_THREAD_PRIORITY_REALTIME_CRITICAL
The enum can be extended to allow for multiple priority levels.
rte_thread_set_priority      - sets the priority of a thread
rte_thread_attr_set_priority - updates an rte_thread_attr_t object
                               with a new value for priority

The user can choose thread priority through an EAL parameter
when starting an application. If the EAL parameter is not used,
the per-platform default value for thread priority is used.
Otherwise, one of the following options can be set:
 --thread-prio normal
 --thread-prio realtime

Example:
./dpdk-l2fwd -l 0-3 -n 4 --thread-prio normal -- -q 8 -p ffff

*Affinity* is described by the already known "rte_cpuset_t" type.
rte_thread_attr_set/get_affinity - sets/gets the affinity field in a
                                   rte_thread_attr_t object
rte_thread_set/get_affinity      - sets/gets the affinity of a thread

**Errors**
A translation function that maps Windows error codes to errno-style
error codes is provided. 
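
A rough sketch of what such a translation can look like. The function
name and the selection of codes are hypothetical and not taken from the
patches. The numeric values are those of the corresponding Win32 error
codes, repeated here so that the sketch builds without windows.h.

```c
#include <assert.h>
#include <errno.h>

/* Values of a few Win32 error codes (hypothetical DEMO_ names). */
enum {
	DEMO_ERROR_SUCCESS = 0,
	DEMO_ERROR_ACCESS_DENIED = 5,
	DEMO_ERROR_INVALID_HANDLE = 6,
	DEMO_ERROR_NOT_ENOUGH_MEMORY = 8,
	DEMO_ERROR_INVALID_PARAMETER = 87
};

/* Map a Windows error code to an errno-style value. */
static int
demo_win32_to_errno(unsigned long code)
{
	switch (code) {
	case DEMO_ERROR_SUCCESS:
		return 0;
	case DEMO_ERROR_ACCESS_DENIED:
		return EACCES;
	case DEMO_ERROR_NOT_ENOUGH_MEMORY:
		return ENOMEM;
	case DEMO_ERROR_INVALID_HANDLE:
	case DEMO_ERROR_INVALID_PARAMETER:
		return EINVAL;
	default:
		/* Arbitrary choice for this sketch: unknown codes become EINVAL. */
		return EINVAL;
	}
}
```

With a helper of this kind, callers can compare return values against
errno constants on every platform.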

**Future work**
The long term plan is for EAL to provide full threading support:
* Add support for condition variables
* Add support for pthread_mutex_trylock
* Additional functionality offered by pthread_*
  (such as pthread_setname_np, etc.)

v11:
 - Add unit tests for thread API
 - Rebase

v10:
 - Remove patch no. 10. It will be broken down in subpatches 
   and sent as a different patchset that depends on this one.
   This is done due to the ABI breaks that would be caused by patch 10.
 - Replace unix/rte_thread.c with common/rte_thread.c
 - Remove initializations that may prevent compiler from issuing useful
   warnings.
 - Remove rte_thread_types.h and rte_windows_thread_types.h
 - Remove unneeded priority macros (EAL_THREAD_PRIORITY*)
 - Remove functions that retrieve thread handle from process handle
 - Remove rte_thread_cancel() until same behavior is obtained on
   all platforms.
 - Fix rte_thread_detach() function description,
   return value and remove empty line.
 - Reimplement mutex functions. Add compatible representation for mutex
   identifier. Add macro to replace static mutex initialization instances.
 - Fix commit messages (lines too long, remove unicode symbols)

v9:
- Sign patches

v8:
- Rebase
- Add rte_thread_detach() API
- Set default priority, when user did not specify a value

v7:
Based on DmitryK's review:
- Change thread id representation
- Change mutex id representation
- Implement static mutex initializer for Windows
- Change barrier identifier representation
- Improve commit messages
- Add missing doxygen comments
- Split error translation function
- Improve name for affinity function
- Remove cpuset_size parameter
- Fix eal_create_cpu_map function
- Map EAL priority values to OS specific values
- Add thread wrapper for start routine
- Do not export rte_thread_cancel() on Windows
- Cleanup, fix comments, fix typos.

v6:
- improve error-translation function
- call the error translation function in rte_thread_value_get()

v5:
- update cover letter with more details on the priority argument

v4:
- fix function description
- rebase

v3:
- rebase

v2:
- revert changes that break ABI 
- break up changes into smaller patches
- fix coding style issues
- fix issues with errors
- fix parameter type in examples/kni.c


Narcisa Vasile (10):
  eal: add basic threading functions
  eal: add thread attributes
  eal/windows: translate Windows errors to errno-style errors
  eal: implement functions for thread affinity management
  eal: implement thread priority management functions
  eal: add thread lifetime management
  eal: implement functions for mutex management
  eal: implement functions for thread barrier management
  eal: add EAL argument for setting thread priority
  Add unit tests for thread API

 app/test/meson.build                |   2 +
 app/test/test_threads.c             | 419 ++++++++++++++++++++
 lib/eal/common/eal_common_options.c |  28 +-
 lib/eal/common/eal_internal_cfg.h   |   2 +
 lib/eal/common/eal_options.h        |   2 +
 lib/eal/common/meson.build          |   1 +
 lib/eal/common/rte_thread.c         | 445 +++++++++++++++++++++
 lib/eal/include/rte_thread.h        | 406 ++++++++++++++++++-
 lib/eal/unix/meson.build            |   1 -
 lib/eal/unix/rte_thread.c           |  92 -----
 lib/eal/version.map                 |  20 +
 lib/eal/windows/eal_lcore.c         | 176 ++++++---
 lib/eal/windows/eal_windows.h       |  10 +
 lib/eal/windows/include/sched.h     |   2 +-
 lib/eal/windows/rte_thread.c        | 588 ++++++++++++++++++++++++++--
 15 files changed, 2020 insertions(+), 174 deletions(-)
 create mode 100644 app/test/test_threads.c
 create mode 100644 lib/eal/common/rte_thread.c
 delete mode 100644 lib/eal/unix/rte_thread.c

-- 
2.31.0.vfs.0.1


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] doc: announce security API changes for Inline IPsec
  @ 2021-07-30 22:16  3% ` Thomas Monjalon
  2021-08-03  2:11  3%   ` Nithin Dabilpuram
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2021-07-30 22:16 UTC (permalink / raw)
  To: konstantin.ananyev, jerinj, gakhil, roy.fan.zhang,
	hemant.agrawal, Nithin Dabilpuram
  Cc: matan, dev, ferruh.yigit, bruce.richardson, mdr, david.marchand

27/07/2021 19:36, Nithin Dabilpuram:
> Announce changes to make rte_security_set_pkt_metadata() and
> rte_security_get_userdata() inline instead of C functions and
> also addition of another field in structure rte_security_ctx for
> holding flags.

I guess there is a performance reason but the motivation
is not explained. Also it is going in the opposite direction
of what is discussed in the Technical Board meetings:
we should avoid and reduce the number of inline functions
to reduce the ABI surface.



^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] RFC: Enahancements to Rx adapter for DPDK 21.11
  2021-07-28  6:23  4%     ` Kundapura, Ganapati
@ 2021-07-30 11:17  0%       ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2021-07-30 11:17 UTC (permalink / raw)
  To: Kundapura, Ganapati; +Cc: dpdk-dev, Jayatheerthan, Jay

On Wed, Jul 28, 2021 at 11:53 AM Kundapura, Ganapati
<ganapati.kundapura@intel.com> wrote:
>
> Comments inlined

Please fix your email client for adding proper >

>
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: 28 July 2021 11:38
> To: Kundapura, Ganapati <ganapati.kundapura@intel.com>
> Cc: dpdk-dev <dev@dpdk.org>; Jayatheerthan, Jay <jay.jayatheerthan@intel.com>
> Subject: Re: RFC: Enahancements to Rx adapter for DPDK 21.11
>
> On Mon, Jul 26, 2021 at 6:37 PM Kundapura, Ganapati <ganapati.kundapura@intel.com> wrote:
> >
> > A gentle ping for comments.
> >
> > -----Original Message-----
> > From: dev <dev-bounces@dpdk.org> On Behalf Of Kundapura, Ganapati
> > Sent: 23 July 2021 12:33
> > To: dpdk-dev <dev@dpdk.org>; Jerin Jacob <jerinjacobk@gmail.com>;
> > Jayatheerthan, Jay <jay.jayatheerthan@intel.com>
> > Subject: [dpdk-dev] RFC: Enahancements to Rx adapter for DPDK 21.11
> >
> > Hi dpdk-dev,
> >
> > We would like to submit series of patches to Rx adapters that will enhance the configuration and performance.
> > Please find the details below.
> >
> > (1) Configure Rx event buffer at run time
> >     Add new api to configure the size of the Rx event buffer at run time.
> >     This api allows setting the size of the event buffer at adapter level.
>
> Since we can change the ABI for 21.11, I would prefer not to add a new API and instead add a param to the config structure.
> Please send the deprecation notice for ABI change.
>
> Config structure passed to rte_event_eth_rx_adapter_create() is of type rte_event_port_conf which
> comes from event framework(rte_eventdev.h).
> Does it make sense to pass adapter event buffer size in rte_event_port_conf structure?

I see. Then a new API to set the buffer size is OK.


>
> >
> > (2) Change packet enqueue buffer in Rx adapter to circular buffer
> >     Rx adapter uses memmove() to move unprocessed events to the beginning
> >     of the packet enqueue buffer, which consumes a good amount of CPU cycles.
>
> Looks good.
>
>
> >
> > (3) Add API to retrieve the Rx queue info
> >     Rx queue info containing flags for handling received packets,
> >     event queue identifier, scheduler type, event priority,
> >     polling frequency of the receive queue and flow identifier
>
> Looks good. Please implement it as adaptor ops so that it can be adapter specific to support HW implementations.
>
>
>
> >
> > (4) Add adapter_stats cli to retrieve Rx/Tx adapter stats and rxq info
> >     This cli displays Rx and Tx adapter stats containing received packet count,
> >     eventdev enqueue count, enqueue retry count, event buffer size, queue poll count,
> >     transmitted packet count, packet dropped count, transmit fail count etc and rx queue info.
>
> Generally, we don't entertain CLI in the library. You can add command-line arguments to app/test-eventdev to test this.
>
> Adapter_stats is standalone application not part of library and it'll be in app/adapter_stats.

No need for a new app. Please add stats as telemetry, then it can be
pull through
usertools/dpdk-telemetry.py



> >
> > (5) Update Rx timestamp in mbuf using mbuf dynamic field
> >     Add support to register timestamp dynamic field in mbuf
> >     Update the timestamp in mbuf for each packet before eventdev
> > enqueue
>
> Cool.
>
> >
> > We look forward to feedback on this proposal. Once we have initial feedback, patches will be submitted for review.
> >
> > Thanks,
> > Ganapati
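
For item (2) above, a minimal sketch of the circular-buffer idea, not the
actual Rx adapter code. The names (ev_ring, EVBUF_SIZE) and the use of a
plain uint32_t in place of struct rte_event are assumptions made for
illustration. Head and tail are free-running indices, so after a partial
flush the leftover events stay where they are and only the head advances;
no memmove() to the start of the buffer is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical size; a power of two so that masking wraps the index. */
#define EVBUF_SIZE 8u

struct ev_ring {
	uint32_t ev[EVBUF_SIZE]; /* stand-in for an array of struct rte_event */
	uint32_t head;           /* free-running index of the oldest event */
	uint32_t tail;           /* free-running index of the next free slot */
};

/* Number of buffered events; unsigned subtraction handles wraparound. */
static inline uint32_t
ev_ring_count(const struct ev_ring *r)
{
	return r->tail - r->head;
}

/* Buffer one event; returns 0 on success, -1 if the ring is full. */
static inline int
ev_ring_put(struct ev_ring *r, uint32_t ev)
{
	if (ev_ring_count(r) == EVBUF_SIZE)
		return -1;
	r->ev[r->tail & (EVBUF_SIZE - 1)] = ev;
	r->tail++;
	return 0;
}

/*
 * Drain up to n events into out[] and return how many were drained.
 * Events that are not drained keep their slots; only head moves.
 */
static inline uint32_t
ev_ring_get(struct ev_ring *r, uint32_t *out, uint32_t n)
{
	uint32_t avail = ev_ring_count(r);
	uint32_t i;

	if (n > avail)
		n = avail;
	for (i = 0; i < n; i++)
		out[i] = r->ev[(r->head + i) & (EVBUF_SIZE - 1)];
	r->head += n;
	return n;
}
```

A real implementation would likely hand the eventdev enqueue call one or two
contiguous spans of the ring instead of copying event by event; the copy
loop here only keeps the sketch short.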

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [EXT] [PATCH 01/10] security: add support for TSO on IPsec session
  2021-07-27 18:34  3%   ` [dpdk-dev] [EXT] " Akhil Goyal
@ 2021-07-29  8:37  0%     ` Nicolau, Radu
  2021-07-31 17:50  0%     ` Akhil Goyal
  1 sibling, 0 replies; 200+ results
From: Nicolau, Radu @ 2021-07-29  8:37 UTC (permalink / raw)
  To: Akhil Goyal, Tejasree Kondoj, Declan Doherty
  Cc: Anoob Joseph, dev, Abhijit Sinha, Daniel Martin Buckley, Ankur Dwivedi

Hi, thanks for reviewing. I'm OOO at the moment, I will send an updated 
patchset next week.

On 7/27/2021 9:34 PM, Akhil Goyal wrote:
>> Allow user to provision a per security session maximum segment size
>> (MSS) for use when Transmit Segmentation Offload (TSO) is supported.
>> The MSS value will be used when PKT_TX_TCP_SEG or PKT_TX_UDP_SEG
>> ol_flags are specified in mbuf.
>>
>> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
>> Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
>> Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com>
>> Signed-off-by: Daniel Martin Buckley <daniel.m.buckley@intel.com>
>> ---
> Can we have a deprecation notice for the changes introduced in this series?
>
> Also, there are 2 other features which modify the same struct. Can we have a
> single deprecation notice for all the changes in rte_security_ipsec_sa_options?
> The notice can be something like:
> +* security: The IPsec SA config options structure ``struct rte_security_ipsec_sa_options``
> +  will be updated to support more features.
> And we may have reserved bit fields for the rest of the vacant bits so that the ABI is not broken
> when a new bit field is added.
>
> http://patches.dpdk.org/project/dpdk/patch/20210630112049.3747-1-marchana@marvell.com/
> http://patches.dpdk.org/project/dpdk/patch/20210705131335.21070-1-ktejasree@marvell.com/
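
The reserved-bits suggestion above can be illustrated with a small sketch.
The struct and field names below (sa_options_v1, sa_options_v2,
new_feature) are hypothetical and do not reproduce the real
rte_security_ipsec_sa_options layout. The point is only that, once the
vacant bits of the storage unit are declared as a reserved field, a later
release can carve a new flag out of them without changing the structure
size or moving the existing flags.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "old" layout: three option bits, the rest reserved. */
struct sa_options_v1 {
	uint32_t esn : 1;
	uint32_t udp_encap : 1;
	uint32_t copy_dscp : 1;
	uint32_t reserved_opts : 29; /* callers must leave this zero */
};

/* Hypothetical "new" layout: one reserved bit becomes a new option. */
struct sa_options_v2 {
	uint32_t esn : 1;
	uint32_t udp_encap : 1;
	uint32_t copy_dscp : 1;
	uint32_t new_feature : 1;    /* taken from the reserved bits */
	uint32_t reserved_opts : 28;
};

/* Adding the option must not change the size seen by old binaries. */
_Static_assert(sizeof(struct sa_options_v1) == sizeof(struct sa_options_v2),
	       "new option must fit into the reserved bits");

/*
 * View an old-layout struct through the new layout, the way a newer
 * library would see a struct filled in by an application built earlier.
 */
static struct sa_options_v2
sa_options_upgrade(struct sa_options_v1 old)
{
	struct sa_options_v2 cur;

	memcpy(&cur, &old, sizeof(cur));
	return cur;
}
```

Because old applications zero the reserved bits, the new option reads as
"off" for them, which is why the reserved field has to be documented as
must-be-zero from the start.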

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] ethdev: fix representor port ID search by name
  2021-07-12 16:17  3% [dpdk-dev] [PATCH] ethdev: fix representor port ID search by name Andrew Rybchenko
  2021-07-19  6:58  0% ` Xueming(Steven) Li
@ 2021-07-29  4:20  0% ` Xueming(Steven) Li
  2021-08-01  8:50  0%   ` Andrew Rybchenko
  2021-08-18 14:00  3% ` [dpdk-dev] [PATCH v2] " Andrew Rybchenko
  2 siblings, 1 reply; 200+ results
From: Xueming(Steven) Li @ 2021-07-29  4:20 UTC (permalink / raw)
  To: Andrew Rybchenko, Ajit Khaparde, Somnath Kotur, John Daley,
	Hyong Youb Kim, Beilei Xing, Qiming Yang, Qi Zhang, Haiyue Wang,
	Matan Azrad, Shahaf Shuler, Slava Ovsiienko,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit
  Cc: dev, Viacheslav Galaktionov, stable



> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Tuesday, July 13, 2021 12:18 AM
> To: Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur <somnath.kotur@broadcom.com>; John Daley
> <johndale@cisco.com>; Hyong Youb Kim <hyonkim@cisco.com>; Beilei Xing <beilei.xing@intel.com>; Qiming Yang
> <qiming.yang@intel.com>; Qi Zhang <qi.z.zhang@intel.com>; Haiyue Wang <haiyue.wang@intel.com>; Matan Azrad
> <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>; NBU-Contact-Thomas
> Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>; Xueming(Steven) Li <xuemingl@nvidia.com>
> Cc: dev@dpdk.org; Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
> Subject: [PATCH] ethdev: fix representor port ID search by name
> 
> From: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>
> 
> Fix representor port ID search by name if the representor itself does not provide representors info. Getting a list of representors from
> a representor does not make sense. Instead, a parent device should be used.
> 
> To this end, extend the rte_eth_dev_data structure to include the port ID of the parent device for representors.
> 
> Fixes: df7547a6a2cc ("ethdev: add helper function to get representor ID")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>
> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> ---
> The new field is added into the hole in rte_eth_dev_data structure.
> The patch does not change ABI, but extra care is required since ABI check is disabled for the structure because of the libabigail bug [1].
> 
> Potentially it is bad for out-of-tree drivers which implement representors but do not fill in a new parent_port_id field in
> rte_eth_dev_data structure. Do we care?
> 
> Maybe the patch should add lines to release notes, but I'd like to get initial feedback first.
> 
> mlx5 changes should be reviewed by maintainers very carefully, since we are not sure if we patch it correctly.
> 
> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=28060
> 
>  drivers/net/bnxt/bnxt_reps.c             |  1 +
>  drivers/net/enic/enic_vf_representor.c   |  1 +
>  drivers/net/i40e/i40e_vf_representor.c   |  1 +
>  drivers/net/ice/ice_dcf_vf_representor.c |  1 +
>  drivers/net/ixgbe/ixgbe_vf_representor.c |  1 +
>  drivers/net/mlx5/linux/mlx5_os.c         | 11 +++++++++++
>  drivers/net/mlx5/windows/mlx5_os.c       | 11 +++++++++++
>  lib/ethdev/ethdev_driver.h               |  6 +++---
>  lib/ethdev/rte_class_eth.c               |  2 +-
>  lib/ethdev/rte_ethdev.c                  |  8 ++++----
>  lib/ethdev/rte_ethdev_core.h             |  4 ++++
>  11 files changed, 39 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/bnxt/bnxt_reps.c b/drivers/net/bnxt/bnxt_reps.c
> index bdbad53b7d..902591cd39 100644
> --- a/drivers/net/bnxt/bnxt_reps.c
> +++ b/drivers/net/bnxt/bnxt_reps.c
> @@ -187,6 +187,7 @@ int bnxt_representor_init(struct rte_eth_dev *eth_dev, void *params)
>  	eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR |
>  					RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
>  	eth_dev->data->representor_id = rep_params->vf_id;
> +	eth_dev->data->parent_port_id = rep_params->parent_dev->data->port_id;
> 
>  	rte_eth_random_addr(vf_rep_bp->dflt_mac_addr);
>  	memcpy(vf_rep_bp->mac_addr, vf_rep_bp->dflt_mac_addr,
> diff --git a/drivers/net/enic/enic_vf_representor.c b/drivers/net/enic/enic_vf_representor.c
> index 79dd6e5640..6ee7967ce9 100644
> --- a/drivers/net/enic/enic_vf_representor.c
> +++ b/drivers/net/enic/enic_vf_representor.c
> @@ -662,6 +662,7 @@ int enic_vf_representor_init(struct rte_eth_dev *eth_dev, void *init_params)
>  	eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR |
>  					RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
>  	eth_dev->data->representor_id = vf->vf_id;
> +	eth_dev->data->parent_port_id = pf->port_id;
>  	eth_dev->data->mac_addrs = rte_zmalloc("enic_mac_addr_vf",
>  		sizeof(struct rte_ether_addr) *
>  		ENIC_UNICAST_PERFECT_FILTERS, 0);
> diff --git a/drivers/net/i40e/i40e_vf_representor.c b/drivers/net/i40e/i40e_vf_representor.c
> index 0481b55381..865b637585 100644
> --- a/drivers/net/i40e/i40e_vf_representor.c
> +++ b/drivers/net/i40e/i40e_vf_representor.c
> @@ -514,6 +514,7 @@ i40e_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params)
>  	ethdev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR |
>  					RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
>  	ethdev->data->representor_id = representor->vf_id;
> +	ethdev->data->parent_port_id = pf->dev_data->parent_port_id;
> 
>  	/* Setting the number queues allocated to the VF */
>  	ethdev->data->nb_rx_queues = vf->vsi->nb_qps;
> diff --git a/drivers/net/ice/ice_dcf_vf_representor.c b/drivers/net/ice/ice_dcf_vf_representor.c
> index 970461f3e9..c7cd3fd290 100644
> --- a/drivers/net/ice/ice_dcf_vf_representor.c
> +++ b/drivers/net/ice/ice_dcf_vf_representor.c
> @@ -418,6 +418,7 @@ ice_dcf_vf_repr_init(struct rte_eth_dev *vf_rep_eth_dev, void *init_param)
> 
>  	vf_rep_eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
>  	vf_rep_eth_dev->data->representor_id = repr->vf_id;
> +	vf_rep_eth_dev->data->parent_port_id = repr->dcf_eth_dev->data->port_id;
> 
>  	vf_rep_eth_dev->data->mac_addrs = &repr->mac_addr;
> 
> diff --git a/drivers/net/ixgbe/ixgbe_vf_representor.c b/drivers/net/ixgbe/ixgbe_vf_representor.c
> index d5b636a194..7a2063849e 100644
> --- a/drivers/net/ixgbe/ixgbe_vf_representor.c
> +++ b/drivers/net/ixgbe/ixgbe_vf_representor.c
> @@ -197,6 +197,7 @@ ixgbe_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params)
> 
>  	ethdev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
>  	ethdev->data->representor_id = representor->vf_id;
> +	ethdev->data->parent_port_id = representor->pf_ethdev->data->port_id;
> 
>  	/* Set representor device ops */
>  	ethdev->dev_ops = &ixgbe_vf_representor_dev_ops;
> diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
> index be22d9cbd2..5550d30628 100644
> --- a/drivers/net/mlx5/linux/mlx5_os.c
> +++ b/drivers/net/mlx5/linux/mlx5_os.c
> @@ -1511,6 +1511,17 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
>  	if (priv->representor) {
>  		eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
>  		eth_dev->data->representor_id = priv->representor_id;
> +		MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
> +			const struct mlx5_priv *opriv =
> +				rte_eth_devices[port_id].data->dev_private;
> +
> +			if (!opriv ||
> +			    opriv->sh != priv->sh ||
> +			    opriv->representor)
> +				continue;
> +			eth_dev->data->parent_port_id = port_id;
> +			break;
> +		}

At line 126, there is logic that locates priv->domain_id; the parent port_id could be found there.

>  	}
>  	priv->mp_id.port_id = eth_dev->data->port_id;
>  	strlcpy(priv->mp_id.name, MLX5_MP_NAME, RTE_MP_MAX_NAME_LEN);
> diff --git a/drivers/net/mlx5/windows/mlx5_os.c b/drivers/net/mlx5/windows/mlx5_os.c
> index e30b682822..037c928dc1 100644
> --- a/drivers/net/mlx5/windows/mlx5_os.c
> +++ b/drivers/net/mlx5/windows/mlx5_os.c
> @@ -506,6 +506,17 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
>  	if (priv->representor) {
>  		eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
>  		eth_dev->data->representor_id = priv->representor_id;
> +		MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
> +			const struct mlx5_priv *opriv =
> +				rte_eth_devices[port_id].data->dev_private;
> +
> +			if (!opriv ||
> +			    opriv->sh != priv->sh ||
> +			    opriv->representor)
> +				continue;
> +			eth_dev->data->parent_port_id = port_id;
> +			break;
> +		}
>  	}
>  	/*
>  	 * Store associated network device interface index. This index
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index 40e474aa7e..07f6d1f9a4 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -1248,8 +1248,8 @@ struct rte_eth_devargs {
>   * For backward compatibility, if no representor info, direct
>   * map legacy VF (no controller and pf).
>   *
> - * @param ethdev
> - *  Handle of ethdev port.
> + * @param parent_port_id
> + *  Port ID of the backing device.
>   * @param type
>   *  Representor type.
>   * @param controller
> @@ -1266,7 +1266,7 @@ struct rte_eth_devargs {
>   */
>  __rte_internal
>  int
> -rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
> +rte_eth_representor_id_get(uint16_t parent_port_id,
>  			   enum rte_eth_representor_type type,
>  			   int controller, int pf, int representor_port,
>  			   uint16_t *repr_id);
> diff --git a/lib/ethdev/rte_class_eth.c b/lib/ethdev/rte_class_eth.c
> index 1fe5fa1f36..e3b7ab9728 100644
> --- a/lib/ethdev/rte_class_eth.c
> +++ b/lib/ethdev/rte_class_eth.c
> @@ -95,7 +95,7 @@ eth_representor_cmp(const char *key __rte_unused,
>  		c = i / (np * nf);
>  		p = (i / nf) % np;
>  		f = i % nf;
> -		if (rte_eth_representor_id_get(edev,
> +		if (rte_eth_representor_id_get(edev->data->parent_port_id,
>  			eth_da.type,
>  			eth_da.nb_mh_controllers == 0 ? -1 :
>  					eth_da.mh_controllers[c],
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 6ebf52b641..acda1d43fb 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -5997,7 +5997,7 @@ rte_eth_devargs_parse(const char *dargs, struct rte_eth_devargs *eth_da)
>  }
> 
>  int
> -rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
> +rte_eth_representor_id_get(uint16_t parent_port_id,
>  			   enum rte_eth_representor_type type,
>  			   int controller, int pf, int representor_port,
>  			   uint16_t *repr_id)
> @@ -6012,7 +6012,7 @@ rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
>  		return -EINVAL;
> 
>  	/* Get PMD representor range info. */
> -	ret = rte_eth_representor_info_get(ethdev->data->port_id, NULL);
> +	ret = rte_eth_representor_info_get(parent_port_id, NULL);
>  	if (ret == -ENOTSUP && type == RTE_ETH_REPRESENTOR_VF &&
>  	    controller == -1 && pf == -1) {
>  		/* Direct mapping for legacy VF representor. */
> @@ -6026,7 +6026,7 @@ rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
>  	info = calloc(1, size);
>  	if (info == NULL)
>  		return -ENOMEM;
> -	ret = rte_eth_representor_info_get(ethdev->data->port_id, info);
> +	ret = rte_eth_representor_info_get(parent_port_id, info);
>  	if (ret < 0)
>  		goto out;
> 
> @@ -6045,7 +6045,7 @@ rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
>  			continue;
>  		if (info->ranges[i].id_end < info->ranges[i].id_base) {
>  			RTE_LOG(WARNING, EAL, "Port %hu invalid representor ID Range %u - %u, entry %d\n",
> -				ethdev->data->port_id, info->ranges[i].id_base,
> +				parent_port_id, info->ranges[i].id_base,
>  				info->ranges[i].id_end, i);
>  			continue;
> 
> diff --git a/lib/ethdev/rte_ethdev_core.h b/lib/ethdev/rte_ethdev_core.h
> index edf96de2dc..13cb84b52f 100644
> --- a/lib/ethdev/rte_ethdev_core.h
> +++ b/lib/ethdev/rte_ethdev_core.h
> @@ -185,6 +185,10 @@ struct rte_eth_dev_data {
>  			/**< Switch-specific identifier.
>  			 *   Valid if RTE_ETH_DEV_REPRESENTOR in dev_flags.
>  			 */
> +	uint16_t parent_port_id;
> +			/**< Port ID of the backing device.
> +			 *   Valid if RTE_ETH_DEV_REPRESENTOR in dev_flags.
> +			 */
> 
>  	pthread_mutex_t flow_ops_mutex; /**< rte_flow ops mutex. */
>  	uint64_t reserved_64s[4]; /**< Reserved for future fields */
> --
> 2.30.2
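
The cover note says the new field is added into a hole in the
rte_eth_dev_data structure. A minimal sketch of what that means, using a
hypothetical miniature struct (dev_data_old, dev_data_new) and not the real
definition: a 16-bit member followed by a 64-bit-aligned member leaves
padding, and a second 16-bit member placed in that padding changes neither
the structure size nor the offsets of the members after it. This assumes
an ABI where uint64_t has an alignment of at least 4 bytes, which holds on
the usual x86 and Arm targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of the device data before the change. */
struct dev_data_old {
	uint64_t dev_flags;
	uint16_t representor_id;
	/* padding inserted by the compiler: this is the "hole" */
	uint64_t reserved_64s[4];
};

/* Same struct with the new field placed in the former padding. */
struct dev_data_new {
	uint64_t dev_flags;
	uint16_t representor_id;
	uint16_t parent_port_id;
	uint64_t reserved_64s[4];
};

/* Size and later offsets are unchanged, so the layout stays compatible. */
_Static_assert(sizeof(struct dev_data_old) == sizeof(struct dev_data_new),
	       "new field must fit into existing padding");
_Static_assert(offsetof(struct dev_data_old, reserved_64s) ==
	       offsetof(struct dev_data_new, reserved_64s),
	       "members after the hole must not move");

/* Number of padding bytes that follow representor_id in the old layout. */
static size_t
hole_after_representor_id(void)
{
	return offsetof(struct dev_data_old, reserved_64s) -
	       (offsetof(struct dev_data_old, representor_id) +
		sizeof(uint16_t));
}
```

The layout being unchanged does not settle the behavioural question raised
in the note: an out-of-tree driver that never writes the new field leaves
whatever the allocation put there, typically zero.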


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] ethdev: fix representor port ID search by name
  2021-07-20  8:59  0%           ` Andrew Rybchenko
@ 2021-07-29  4:13  0%             ` Xueming(Steven) Li
  2021-08-01  8:40  0%               ` Andrew Rybchenko
  0 siblings, 1 reply; 200+ results
From: Xueming(Steven) Li @ 2021-07-29  4:13 UTC (permalink / raw)
  To: Andrew Rybchenko, Ajit Khaparde, Somnath Kotur, John Daley,
	Hyong Youb Kim, Beilei Xing, Qiming Yang, Qi Zhang, Haiyue Wang,
	Matan Azrad, Shahaf Shuler, Slava Ovsiienko,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit
  Cc: dev, Viacheslav Galaktionov, stable



> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Tuesday, July 20, 2021 5:00 PM
> To: Xueming(Steven) Li <xuemingl@nvidia.com>; Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur
> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>; Hyong Youb Kim <hyonkim@cisco.com>; Beilei Xing
> <beilei.xing@intel.com>; Qiming Yang <qiming.yang@intel.com>; Qi Zhang <qi.z.zhang@intel.com>; Haiyue Wang
> <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org; Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
> Subject: Re: [PATCH] ethdev: fix representor port ID search by name
> 
> On 7/19/21 3:50 PM, Xueming(Steven) Li wrote:
> >
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: Monday, July 19, 2021 8:36 PM
> >> To: Xueming(Steven) Li <xuemingl@nvidia.com>; Ajit Khaparde
> >> <ajit.khaparde@broadcom.com>; Somnath Kotur
> >> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>; Hyong
> >> Youb Kim <hyonkim@cisco.com>; Beilei Xing <beilei.xing@intel.com>;
> >> Qiming Yang <qiming.yang@intel.com>; Qi Zhang <qi.z.zhang@intel.com>;
> >> Haiyue Wang <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>;
> >> Shahaf Shuler <shahafs@nvidia.com>; Slava Ovsiienko
> >> <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon
> >> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
> >> Cc: dev@dpdk.org; Viacheslav Galaktionov
> >> <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
> >> Subject: Re: [PATCH] ethdev: fix representor port ID search by name
> >>
> >> On 7/19/21 2:54 PM, Xueming(Steven) Li wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>> Sent: Monday, July 19, 2021 4:46 PM
> >>>> To: Xueming(Steven) Li <xuemingl@nvidia.com>; Ajit Khaparde
> >>>> <ajit.khaparde@broadcom.com>; Somnath Kotur
> >>>> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>;
> >>>> Hyong Youb Kim <hyonkim@cisco.com>; Beilei Xing
> >>>> <beilei.xing@intel.com>; Qiming Yang <qiming.yang@intel.com>; Qi
> >>>> Zhang <qi.z.zhang@intel.com>; Haiyue Wang <haiyue.wang@intel.com>;
> >>>> Matan Azrad <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>;
> >>>> Slava Ovsiienko <viacheslavo@nvidia.com>; NBU-Contact-Thomas
> >>>> Monjalon <thomas@monjalon.net>; Ferruh Yigit
> >>>> <ferruh.yigit@intel.com>
> >>>> Cc: dev@dpdk.org; Viacheslav Galaktionov
> >>>> <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
> >>>> Subject: Re: [PATCH] ethdev: fix representor port ID search by name
> >>>>
> >>>> On 7/19/21 9:58 AM, Xueming(Steven) Li wrote:
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>>>> Sent: Tuesday, July 13, 2021 12:18 AM
> >>>>>> To: Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur
> >>>>>> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>;
> >>>>>> Hyong Youb Kim <hyonkim@cisco.com>; Beilei Xing
> >>>>>> <beilei.xing@intel.com>; Qiming Yang <qiming.yang@intel.com>; Qi
> >>>>>> Zhang <qi.z.zhang@intel.com>; Haiyue Wang
> >>>>>> <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>; Shahaf
> >>>>>> Shuler <shahafs@nvidia.com>; Slava Ovsiienko
> >>>>>> <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon
> >>>>>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>;
> >>>>>> Xueming(Steven) Li <xuemingl@nvidia.com>
> >>>>>> Cc: dev@dpdk.org; Viacheslav Galaktionov
> >>>>>> <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
> >>>>>> Subject: [PATCH] ethdev: fix representor port ID search by name
> >>>>>>
> >>>>>> From: Viacheslav Galaktionov
> >>>>>> <viacheslav.galaktionov@oktetlabs.ru>
> >>>>>>
> >>>>>> Fix representor port ID search by name if the representor itself
> >>>>>> does not provide representors info. Getting a list of
> >>>>>> representors from a representor does not make sense. Instead, a
> >>>>>> parent device
> >>>> should be used.
> >>>>>>
> >>>>>> To this end, extend the rte_eth_dev_data structure to include the port ID of the parent device for representors.
> >>>>>>
> >>>>>> Fixes: df7547a6a2cc ("ethdev: add helper function to get
> >>>>>> representor
> >>>>>> ID")
> >>>>>> Cc: stable@dpdk.org
> >>>>>>
> >>>>>> Signed-off-by: Viacheslav Galaktionov
> >>>>>> <viacheslav.galaktionov@oktetlabs.ru>
> >>>>>> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>>>> ---
> >>>>>> The new field is added into the hole in rte_eth_dev_data structure.
> >>>>>> The patch does not change ABI, but extra care is required since
> >>>>>> ABI check is disabled for the structure because of the libabigail
> >>>>>> bug
> >>>> [1].
> >>>>>>
> >>>>>> Potentially it is bad for out-of-tree drivers which implement
> >>>>>> representors but do not fill in a new parert_port_id field in rte_eth_dev_data structure. Do we care?
> >>>>>>
> >>>>>> May be the patch should add lines to release notes, but I'd like to get initial feedback first.
> >>>>>>
> >>>>>> mlx5 changes should be reviwed by maintainers very carefully, since we are not sure if we patch it correctly.
> >>>>>>
> >>>>>> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=28060
> >>>>
> >>>> [snip]
> >>>>
> >>>>>> --- a/lib/ethdev/ethdev_driver.h
> >>>>>> +++ b/lib/ethdev/ethdev_driver.h
> >>>>>> @@ -1248,8 +1248,8 @@ struct rte_eth_devargs {
> >>>>>>      * For backward compatibility, if no representor info, direct
> >>>>>>      * map legacy VF (no controller and pf).
> >>>>>>      *
> >>>>>> - * @param ethdev
> >>>>>> - *  Handle of ethdev port.
> >>>>>> + * @param parent_port_id
> >>>>>> + *  Port ID of the backing device.
> >>>>>>      * @param type
> >>>>>>      *  Representor type.
> >>>>>>      * @param controller
> >>>>>> @@ -1266,7 +1266,7 @@ struct rte_eth_devargs {
> >>>>>>      */
> >>>>>>     __rte_internal
> >>>>>>     int
> >>>>>> -rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
> >>>>>> +rte_eth_representor_id_get(uint16_t parent_port_id,
> >>>>>
> >>>>> It make more sense to get representor info from parent port.
> >>>>> Representor is a member of switch domain, PMD owns the information
> >>>>> of the representor owner port and info of representors. This
> >>>>> change looks better, but not sure whether it valuable to introduce
> >>>>> a new
> >>>> member to the EAL data structure.
> >>>>
> >>>> IMHO, it is simply incorrect to return representors info on a
> >>>> representor itself. Representor info is an information which representors may be populated using the device.
> >>>>
> >>>> If above statement is correct, we need a way to get parent device
> >>>> by representor to do name to representor ID mapping. I see two options to do it:
> >>>>     A. Dedicated field in rte_eth_dev_data as the patch does.
> >>>>     B. Dedicated ethdev op (since representor knows parent port ID anyway).
> >>>> We have chosen (A) because of simplicity.
> >>>
> >>> Just recalled that representor port could be probed w/o owner PF, is a force for parent port?
> >>
> >> I thought that it is impossible and parent port is absolutely
> >> required for a representor. Could you provide an example and explain how will it work?
> >
> > In case of bonding, PF0 and PF1 become one PF port `bond0`, PCI address is PF0.
> > 	-a <PF0>,representor=pf[0-1]vf[0-99] // this is the syntax we proposed.
> 
> Is it net/bonding or vendor-specific bonding in HW?
> If I remember correctly in the case of net/bonding we have ethdev ports for bonded devices.

Not net/bonding pmd, it's Linux bonding, supported by hw driver.

> 
> >
> > To be backward compatible, also support the following 2 devargs:
> > 	-a <pf0>,representor=[0-99] // probe bond0 and representor on pf0
> > 	-a <pf1>,representor=[0-99] // probe representors on pf1.
> > If devargs start with PF1 devargs, no owner PF1 created as it disabled
> > in bonding. Can't create bond0(PF0) automatically here as device is located by PCI address(PF1) from devargs.
> 
> So, I guess the problem is vendor-specific bonding in HW. Anyway legacy backward compatible representor spec should not require
> representors info since it worked before without it. So, it does not sound like a reason to have representors info on a representor
> itself.

The legacy backward-compatible logic could be something like this: if the PF owner port is found, use it; otherwise fall back to the current representor.
This won't break anything, I guess. What do you think?
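
A minimal sketch of that fallback, under stated assumptions: the struct
port_view type, the PORT_ID_INVALID sentinel and the function name are
hypothetical, and a real version would operate on rte_eth_dev_data. It
only shows the selection rule: query the owner PF port for representor
info when one was probed, otherwise keep querying the representor itself
as the code does today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "no owner PF port was probed". */
#define PORT_ID_INVALID UINT16_MAX

/* Hypothetical, reduced view of an ethdev port. */
struct port_view {
	uint16_t port_id;
	bool is_representor;
	uint16_t parent_port_id; /* PORT_ID_INVALID if the owner is absent */
};

/*
 * Pick the port whose representor info should be queried:
 * the owner PF when it exists, else the port itself (legacy behaviour).
 */
static uint16_t
representor_info_port(const struct port_view *p)
{
	if (p->is_representor && p->parent_port_id != PORT_ID_INVALID)
		return p->parent_port_id;
	return p->port_id;
}
```

With this rule the bonding case described earlier (representors on PF1
probed while no PF1 owner port exists) keeps its current behaviour, and
only ports with a known owner change.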

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: announce removal of ABIs in PCI bus driver
  2021-07-27  8:44  0%       ` Bruce Richardson
@ 2021-07-28 15:32  0%         ` Andrew Rybchenko
  2021-07-31 20:44  0%         ` Thomas Monjalon
  1 sibling, 0 replies; 200+ results
From: Andrew Rybchenko @ 2021-07-28 15:32 UTC (permalink / raw)
  To: Bruce Richardson, Xia, Chenbo
  Cc: Yigit, Ferruh, dev, thomas, mdr, nhorman, david.marchand

On 7/27/21 11:44 AM, Bruce Richardson wrote:
> On Mon, Jul 26, 2021 at 05:56:17AM +0000, Xia, Chenbo wrote:
>> Hi, Ferruh
>>
>>> -----Original Message-----
>>> From: Yigit, Ferruh <ferruh.yigit@intel.com>
>>> Sent: Friday, July 23, 2021 8:47 PM
>>> To: Xia, Chenbo <chenbo.xia@intel.com>; dev@dpdk.org; thomas@monjalon.net
>>> Cc: mdr@ashroe.eu; nhorman@tuxdriver.com; david.marchand@redhat.com
>>> Subject: Re: [dpdk-dev] [PATCH] doc: announce removal of ABIs in PCI bus
>>> driver
>>>
>>> On 7/23/2021 8:39 AM, Xia, Chenbo wrote:
>>>> Hi,
>>>>
>>>> A gentle ping for comments..
>>>>
>>>>> -----Original Message-----
>>>>> From: dev <dev-bounces@dpdk.org> On Behalf Of Chenbo Xia
>>>>> Sent: Tuesday, June 1, 2021 4:42 PM
>>>>> To: dev@dpdk.org; thomas@monjalon.net
>>>>> Cc: mdr@ashroe.eu; nhorman@tuxdriver.com
>>>>> Subject: [dpdk-dev] [PATCH] doc: announce removal of ABIs in PCI bus
>>> driver
>>>>>
>>>>> All ABIs in PCI bus driver, which are defined in rte_bus_pci.h,
>>>>> will be removed and the header will be made internal.
>>>>>
>>>>> Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
>>>>> ---
>>>>>   doc/guides/rel_notes/deprecation.rst | 5 +++++
>>>>>   1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/doc/guides/rel_notes/deprecation.rst
>>>>> b/doc/guides/rel_notes/deprecation.rst
>>>>> index 9584d6bfd7..b01f46c62e 100644
>>>>> --- a/doc/guides/rel_notes/deprecation.rst
>>>>> +++ b/doc/guides/rel_notes/deprecation.rst
>>>>> @@ -147,3 +147,8 @@ Deprecation Notices
>>>>>   * cmdline: ``cmdline`` structure will be made opaque to hide platform-
>>>>> specific
>>>>>     content. On Linux and FreeBSD, supported prior to DPDK 20.11,
>>>>>     original structure will be kept until DPDK 21.11.
>>>>> +
>>>>> +* pci: To reduce unnecessary ABIs exposed by DPDK bus driver,
>>> "rte_bus_pci.h"
>>>>> +  will be made internal in 21.11 and macros/data structures/functions
>>> defined
>>>>> +  in the header will not be considered as ABI anymore. This change is
>>>>> inspired
>>>>> +  by the RFC
>>> https://patchwork.dpdk.org/project/dpdk/list/?series=17176.
>>>>
>>>> I see there's some ABI improvement work on-going and I think it could be
>>> part of
>>>> the work. If it makes sense to you, I'd like some ACKs.
>>>>
>>>
>>> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
>>>
>>> I am for reducing the public ABI as much as possible. How big will the
>>> change
>>> be? Is the 'rte_bus_pci.h' used other than './drivers/bus/pci/'?
>>
>> I don't see big change here. And I am not sure if I understand your second
>> question. The rte_bus_pci.h will still be used by drivers (maybe remove the
>> rte prefix and change the file name).
>>
> The file itself will still be exported in some cases, where the end-user
> has their own drivers which need to be compiled, so I'd recommend keeping
> the rte_ prefix. However, I think making all bus APIs internal-only to DPDK
> is a good idea.
> 
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> 

Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] Re: [PATCH v1 2/2] devtools: use absolute path for the build directory
  @ 2021-07-28  7:20  0%   ` Feifei Wang
  0 siblings, 0 replies; 200+ results
From: Feifei Wang @ 2021-07-28  7:20 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, nd, Juraj Linkeš, Ruifeng Wang, nd

Hi, Bruce

Sorry to disturb you again. Would you please help review the second patch
of this series? Thanks very much.

Best Regards
Feifei

> -----Original Message-----
> From: Feifei Wang <feifei.wang2@arm.com>
> Sent: Tuesday, June 1, 2021 9:57 AM
> To: Bruce Richardson <bruce.richardson@intel.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>; Phil Yang <Phil.Yang@arm.com>;
> Juraj Linkeš <juraj.linkes@pantheon.tech>; Feifei Wang
> <Feifei.Wang2@arm.com>; Ruifeng Wang <Ruifeng.Wang@arm.com>
> Subject: [PATCH v1 2/2] devtools: use absolute path for the build directory
> 
> From: Phil Yang <phil.yang@arm.com>
> 
> To make the code easier to maintain, use the absolute path for the default
> build_dir to avoid repeated calls to readlink.
> 
> Suggested-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Reviewed-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  devtools/test-meson-builds.sh | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/devtools/test-meson-builds.sh b/devtools/test-meson-builds.sh
> index 43b906598d..d6b0e7e059 100755
> --- a/devtools/test-meson-builds.sh
> +++ b/devtools/test-meson-builds.sh
> @@ -16,7 +16,7 @@ srcdir=$(dirname $(readlink -f $0))/..
> 
>  MESON=${MESON:-meson}
>  use_shared="--default-library=shared"
> -builds_dir=${DPDK_BUILD_TEST_DIR:-.}
> +builds_dir=$(readlink -f ${DPDK_BUILD_TEST_DIR:-.})
> 
>  if command -v gmake >/dev/null 2>&1 ; then
>  	MAKE=gmake
> @@ -193,16 +193,16 @@ build () # <directory> <target cc | cross file> <ABI check> [meson options]
>  		fi
> 
>  		install_target $builds_dir/$targetdir \
> -			$(readlink -f $builds_dir/$targetdir/install)
> +			$builds_dir/$targetdir/install
>  		echo "Checking ABI compatibility of $targetdir" >&$verbose
>  		echo $srcdir/devtools/gen-abi.sh \
> -			$(readlink -f $builds_dir/$targetdir/install) >&$veryverbose
> +			$builds_dir/$targetdir/install >&$veryverbose
>  		$srcdir/devtools/gen-abi.sh \
> -			$(readlink -f $builds_dir/$targetdir/install) >&$veryverbose
> +			$builds_dir/$targetdir/install >&$veryverbose
>  		echo $srcdir/devtools/check-abi.sh $abirefdir/$targetdir \
> -			$(readlink -f $builds_dir/$targetdir/install) >&$veryverbose
> +			$builds_dir/$targetdir/install >&$veryverbose
>  		$srcdir/devtools/check-abi.sh $abirefdir/$targetdir \
> -			$(readlink -f $builds_dir/$targetdir/install) >&$verbose
> +			$builds_dir/$targetdir/install >&$verbose
>  	fi
>  }
> 
> @@ -275,7 +275,7 @@ done
>  # Test installation of the x86-generic target, to be used for checking
>  # the sample apps build using the pkg-config file for cflags and libs
>  load_env cc
> -build_path=$(readlink -f $builds_dir/build-x86-generic)
> +build_path=$builds_dir/build-x86-generic
>  export DESTDIR=$build_path/install
>  install_target $build_path $DESTDIR
>  pc_file=$(find $DESTDIR -name libdpdk.pc)
> --
> 2.25.1


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] RFC: Enahancements to Rx adapter for DPDK 21.11
  2021-07-28  6:08  4%   ` Jerin Jacob
@ 2021-07-28  6:23  4%     ` Kundapura, Ganapati
  2021-07-30 11:17  0%       ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Kundapura, Ganapati @ 2021-07-28  6:23 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dpdk-dev, Jayatheerthan, Jay

Comments inlined

-----Original Message-----
From: Jerin Jacob <jerinjacobk@gmail.com> 
Sent: 28 July 2021 11:38
To: Kundapura, Ganapati <ganapati.kundapura@intel.com>
Cc: dpdk-dev <dev@dpdk.org>; Jayatheerthan, Jay <jay.jayatheerthan@intel.com>
Subject: Re: RFC: Enahancements to Rx adapter for DPDK 21.11

On Mon, Jul 26, 2021 at 6:37 PM Kundapura, Ganapati <ganapati.kundapura@intel.com> wrote:
>
> A gentle ping for comments.
>
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Kundapura, Ganapati
> Sent: 23 July 2021 12:33
> To: dpdk-dev <dev@dpdk.org>; Jerin Jacob <jerinjacobk@gmail.com>; 
> Jayatheerthan, Jay <jay.jayatheerthan@intel.com>
> Subject: [dpdk-dev] RFC: Enahancements to Rx adapter for DPDK 21.11
>
> Hi dpdk-dev,
>
> We would like to submit series of patches to Rx adapters that will enhance the configuration and performance.
> Please find the details below.
>
> (1) Configure Rx event buffer at run time
>     Add new api to configure the size of the Rx event buffer at run time.
>     This api allows setting the size of the event buffer at adapter level.

Since we can change the ABI for 21.11, I would prefer not to add a new API; instead, add a param to the config structure.
Please send the deprecation notice for ABI change.

The config structure passed to rte_event_eth_rx_adapter_create() is of type rte_event_port_conf, which
comes from the event framework (rte_eventdev.h).
Does it make sense to pass the adapter event buffer size in the rte_event_port_conf structure?

>
> (2) Change packet enqueue buffer in Rx adapter to circular buffer
>     Rx adapter uses memmove() to move unprocessed events to the beginning
>     of packet enqueue buffer which consumes a good amount of CPU cycles.

Looks good.


>
> (3) Add API to retrieve the Rx queue info
>     Rx queue info containing flags for handling received packets,
>     event queue identifier, scheduler type, event priority,
>     polling frequency of the receive queue and flow identifier

Looks good. Please implement it as adapter ops so that it can be adapter specific, to support HW implementations.



>
> (4) Add adapter_stats CLI to retrieve Rx/Tx adapter stats and rxq info
>     This CLI displays Rx and Tx adapter stats containing received packet count,
>     eventdev enqueue count, enqueue retry count, event buffer size, queue poll count,
>     transmitted packet count, packet dropped count, transmit fail count etc and rx queue info.

Generally, we don't entertain CLI in the library. You can add command-line arguments to app/test-eventdev to test this.

Adapter_stats is a standalone application, not part of the library, and it will be in app/adapter_stats.
>
> (5) Update Rx timestamp in mbuf using mbuf dynamic field
>     Add support to register timestamp dynamic field in mbuf
>     Update the timestamp in mbuf for each packet before eventdev enqueue

Cool.

>
> We look forward to feedback on this proposal. Once we have initial feedback, patches will be submitted for review.
>
> Thanks,
> Ganapati
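
Item (2) above can be illustrated with a minimal sketch, assuming a fixed power-of-two buffer. Everything here (struct evbuf, evbuf_put, evbuf_get, EVBUF_SIZE) is a hypothetical name chosen for illustration and is not taken from the Rx adapter code. The point is only that leftover entries stay in place and the head index advances, so no memmove() is required.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; this is not the Rx adapter's actual buffer. */
#define EVBUF_SIZE 8u /* power of two so the index can be masked */

struct evbuf {
	uint32_t ev[EVBUF_SIZE]; /* stand-in for struct rte_event */
	size_t head;             /* index of the oldest unprocessed entry */
	size_t count;            /* number of entries currently stored */
};

/* Append one entry; returns 0 on success, -1 if the buffer is full. */
static int
evbuf_put(struct evbuf *b, uint32_t e)
{
	if (b->count == EVBUF_SIZE)
		return -1;
	b->ev[(b->head + b->count) & (EVBUF_SIZE - 1)] = e;
	b->count++;
	return 0;
}

/*
 * Remove up to n entries into out[]; returns the number removed.
 * Leftover entries stay where they are: only head and count change,
 * so no memmove() is needed.
 */
static size_t
evbuf_get(struct evbuf *b, uint32_t *out, size_t n)
{
	size_t i;

	if (n > b->count)
		n = b->count;
	for (i = 0; i < n; i++)
		out[i] = b->ev[(b->head + i) & (EVBUF_SIZE - 1)];
	b->head = (b->head + n) & (EVBUF_SIZE - 1);
	b->count -= n;
	return n;
}
```

A partial drain followed by further puts simply wraps around the end of the array; the cost per drained entry is one masked index computation instead of a copy of the remaining tail.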

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] RFC: Enahancements to Rx adapter for DPDK 21.11
  @ 2021-07-28  6:08  4%   ` Jerin Jacob
  2021-07-28  6:23  4%     ` Kundapura, Ganapati
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2021-07-28  6:08 UTC (permalink / raw)
  To: Kundapura, Ganapati; +Cc: dpdk-dev, Jayatheerthan, Jay

On Mon, Jul 26, 2021 at 6:37 PM Kundapura, Ganapati
<ganapati.kundapura@intel.com> wrote:
>
> A gentle ping for comments.
>
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Kundapura, Ganapati
> Sent: 23 July 2021 12:33
> To: dpdk-dev <dev@dpdk.org>; Jerin Jacob <jerinjacobk@gmail.com>; Jayatheerthan, Jay <jay.jayatheerthan@intel.com>
> Subject: [dpdk-dev] RFC: Enahancements to Rx adapter for DPDK 21.11
>
> Hi dpdk-dev,
>
> We would like to submit a series of patches to Rx adapters that will enhance the configuration and performance.
> Please find the details below.
>
> (1) Configure Rx event buffer at run time
>     Add a new API to configure the size of the Rx event buffer at run time.
>     This API allows setting the size of the event buffer at adapter level.

Since we can change the ABI for 21.11, I would prefer not to add a new API;
instead, add a param to the config structure.
Please send a deprecation notice for the ABI change.

>
> (2) Change packet enqueue buffer in Rx adapter to circular buffer
>     Rx adapter uses memmove() to move unprocessed events to the beginning
>     of packet enqueue buffer which consumes a good amount of CPU cycles.

Looks good.


>
> (3) Add API to retrieve the Rx queue info
>     Rx queue info containing flags for handling received packets,
>     event queue identifier, scheduler type, event priority,
>     polling frequency of the receive queue and flow identifier

Looks good. Please implement it as adapter ops so that it can be
adapter specific, to support HW implementations.



>
> (4) Add adapter_stats CLI to retrieve Rx/Tx adapter stats and rxq info
>     This CLI displays Rx and Tx adapter stats containing received packet count,
>     eventdev enqueue count, enqueue retry count, event buffer size, queue poll count,
>     transmitted packet count, packet dropped count, transmit fail count etc and rx queue info.

Generally, we don't entertain CLI in the library. You can add
command-line arguments to app/test-eventdev
to test this.

>
> (5) Update Rx timestamp in mbuf using mbuf dynamic field
>     Add support to register timestamp dynamic field in mbuf
>     Update the timestamp in mbuf for each packet before eventdev enqueue

Cool.

>
> We look forward to feedback on this proposal. Once we have initial feedback, patches will be submitted for review.
>
> Thanks,
> Ganapati

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [EXT] [PATCH 01/10] security: add support for TSO on IPsec session
  @ 2021-07-27 18:34  3%   ` Akhil Goyal
  2021-07-29  8:37  0%     ` Nicolau, Radu
  2021-07-31 17:50  0%     ` Akhil Goyal
  0 siblings, 2 replies; 200+ results
From: Akhil Goyal @ 2021-07-27 18:34 UTC (permalink / raw)
  To: Radu Nicolau, Tejasree Kondoj, Declan Doherty
  Cc: Anoob Joseph, dev, Abhijit Sinha, Daniel Martin Buckley, Ankur Dwivedi

> Allow user to provision a per security session maximum segment size
> (MSS) for use when Transmit Segmentation Offload (TSO) is supported.
> The MSS value will be used when PKT_TX_TCP_SEG or PKT_TX_UDP_SEG
> ol_flags are specified in mbuf.
> 
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
> Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com>
> Signed-off-by: Daniel Martin Buckley <daniel.m.buckley@intel.com>
> ---
Can we have a deprecation notice for the changes introduced in this series?

Also, there are 2 other features which modify the same struct. Can we have a
single deprecation notice for all the changes in rte_security_ipsec_sa_options?
The notice can be something like:
+* security: The IPsec SA config options structure ``struct rte_security_ipsec_sa_options``
+  will be updated to support more features.
And we may add a reserved bit field for the rest of the vacant bits so that the ABI is not broken
when a new bit field is added.

http://patches.dpdk.org/project/dpdk/patch/20210630112049.3747-1-marchana@marvell.com/
http://patches.dpdk.org/project/dpdk/patch/20210705131335.21070-1-ktejasree@marvell.com/
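
The reserved-bits suggestion can be sketched as follows. The struct names and the `tso` and `reserved_opts` fields are hypothetical and do not reproduce the real rte_security_ipsec_sa_options layout. The sketch only shows that a new option taken out of an explicitly reserved bit field leaves the struct size unchanged (bit-field packing is implementation-defined, so the static assertion checks this for the compiler in use).

```c
#include <stdint.h>

/* Hypothetical layout, not the real rte_security_ipsec_sa_options. */
struct sa_options {
	uint32_t esn : 1;
	uint32_t udp_encap : 1;
	/* Unused bits, explicitly reserved; callers must leave them zero. */
	uint32_t reserved_opts : 30;
};

/* The same struct after a later release claims one reserved bit. */
struct sa_options_next {
	uint32_t esn : 1;
	uint32_t udp_encap : 1;
	uint32_t tso : 1; /* hypothetical new option carved out of the reserved bits */
	uint32_t reserved_opts : 29;
};

/* The size is unchanged, so structures that embed it keep their layout. */
_Static_assert(sizeof(struct sa_options) == sizeof(struct sa_options_next),
	       "adding a bit field must not change the struct size");
```

An application built against the older definition that zeroes the reserved bits would then present the new option as disabled to a newer library.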

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] doc: announce removal of ABIs in PCI bus driver
    2021-07-23  7:39  3% ` Xia, Chenbo
@ 2021-07-27 10:58  0% ` Ananyev, Konstantin
  1 sibling, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2021-07-27 10:58 UTC (permalink / raw)
  To: Xia, Chenbo, dev, thomas; +Cc: mdr, nhorman

> 
> All ABIs in PCI bus driver, which are defined in rte_bus_pci.h,
> will be removed and the header will be made internal.
> 
> Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 9584d6bfd7..b01f46c62e 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -147,3 +147,8 @@ Deprecation Notices
>  * cmdline: ``cmdline`` structure will be made opaque to hide platform-specific
>    content. On Linux and FreeBSD, supported prior to DPDK 20.11,
>    original structure will be kept until DPDK 21.11.
> +
> +* pci: To reduce unnecessary ABIs exposed by DPDK bus driver, "rte_bus_pci.h"
> +  will be made internal in 21.11 and macros/data structures/functions defined
> +  in the header will not be considered as ABI anymore. This change is inspired
> +  by the RFC https://patchwork.dpdk.org/project/dpdk/list/?series=17176.
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 2.17.1


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: announce removal of ABIs in PCI bus driver
  2021-07-26  5:56  0%     ` Xia, Chenbo
@ 2021-07-27  8:44  0%       ` Bruce Richardson
  2021-07-28 15:32  0%         ` Andrew Rybchenko
  2021-07-31 20:44  0%         ` Thomas Monjalon
  0 siblings, 2 replies; 200+ results
From: Bruce Richardson @ 2021-07-27  8:44 UTC (permalink / raw)
  To: Xia, Chenbo; +Cc: Yigit, Ferruh, dev, thomas, mdr, nhorman, david.marchand

On Mon, Jul 26, 2021 at 05:56:17AM +0000, Xia, Chenbo wrote:
> Hi, Ferruh
> 
> > -----Original Message-----
> > From: Yigit, Ferruh <ferruh.yigit@intel.com>
> > Sent: Friday, July 23, 2021 8:47 PM
> > To: Xia, Chenbo <chenbo.xia@intel.com>; dev@dpdk.org; thomas@monjalon.net
> > Cc: mdr@ashroe.eu; nhorman@tuxdriver.com; david.marchand@redhat.com
> > Subject: Re: [dpdk-dev] [PATCH] doc: announce removal of ABIs in PCI bus
> > driver
> > 
> > On 7/23/2021 8:39 AM, Xia, Chenbo wrote:
> > > Hi,
> > >
> > > A gentle ping for comments..
> > >
> > >> -----Original Message-----
> > >> From: dev <dev-bounces@dpdk.org> On Behalf Of Chenbo Xia
> > >> Sent: Tuesday, June 1, 2021 4:42 PM
> > >> To: dev@dpdk.org; thomas@monjalon.net
> > >> Cc: mdr@ashroe.eu; nhorman@tuxdriver.com
> > >> Subject: [dpdk-dev] [PATCH] doc: announce removal of ABIs in PCI bus
> > driver
> > >>
> > >> All ABIs in PCI bus driver, which are defined in rte_bus_pci.h,
> > >> will be removed and the header will be made internal.
> > >>
> > >> Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
> > >> ---
> > >>  doc/guides/rel_notes/deprecation.rst | 5 +++++
> > >>  1 file changed, 5 insertions(+)
> > >>
> > >> diff --git a/doc/guides/rel_notes/deprecation.rst
> > >> b/doc/guides/rel_notes/deprecation.rst
> > >> index 9584d6bfd7..b01f46c62e 100644
> > >> --- a/doc/guides/rel_notes/deprecation.rst
> > >> +++ b/doc/guides/rel_notes/deprecation.rst
> > >> @@ -147,3 +147,8 @@ Deprecation Notices
> > >>  * cmdline: ``cmdline`` structure will be made opaque to hide platform-
> > >> specific
> > >>    content. On Linux and FreeBSD, supported prior to DPDK 20.11,
> > >>    original structure will be kept until DPDK 21.11.
> > >> +
> > >> +* pci: To reduce unnecessary ABIs exposed by DPDK bus driver,
> > "rte_bus_pci.h"
> > >> +  will be made internal in 21.11 and macros/data structures/functions
> > defined
> > >> +  in the header will not be considered as ABI anymore. This change is
> > >> inspired
> > >> +  by the RFC
> > https://patchwork.dpdk.org/project/dpdk/list/?series=17176.
> > >
> > > I see there's some ABI improvement work on-going and I think it could be
> > part of
> > > the work. If it makes sense to you, I'd like some ACKs.
> > >
> > 
> > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> > 
> > I am for reducing the public ABI as much as possible. How big will the
> > change
> > be? Is the 'rte_bus_pci.h' used other than './drivers/bus/pci/'?
> 
> I don't expect a big change here. And I am not sure if I understand your second
> question. The rte_bus_pci.h will still be used by drivers (maybe remove the
> rte prefix and change the file name).
> 
The file itself will still be exported in some cases, where the end-user
has their own drivers which need to be compiled, so I'd recommend keeping
the rte_ prefix. However, I think making all bus APIs internal-only to DPDK
is a good idea.

Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: announce removal of ABIs in PCI bus driver
  2021-07-23 12:46  3%   ` Ferruh Yigit
@ 2021-07-26  5:56  0%     ` Xia, Chenbo
  2021-07-27  8:44  0%       ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Xia, Chenbo @ 2021-07-26  5:56 UTC (permalink / raw)
  To: Yigit, Ferruh, dev, thomas; +Cc: mdr, nhorman, david.marchand

Hi, Ferruh

> -----Original Message-----
> From: Yigit, Ferruh <ferruh.yigit@intel.com>
> Sent: Friday, July 23, 2021 8:47 PM
> To: Xia, Chenbo <chenbo.xia@intel.com>; dev@dpdk.org; thomas@monjalon.net
> Cc: mdr@ashroe.eu; nhorman@tuxdriver.com; david.marchand@redhat.com
> Subject: Re: [dpdk-dev] [PATCH] doc: announce removal of ABIs in PCI bus
> driver
> 
> On 7/23/2021 8:39 AM, Xia, Chenbo wrote:
> > Hi,
> >
> > A gentle ping for comments..
> >
> >> -----Original Message-----
> >> From: dev <dev-bounces@dpdk.org> On Behalf Of Chenbo Xia
> >> Sent: Tuesday, June 1, 2021 4:42 PM
> >> To: dev@dpdk.org; thomas@monjalon.net
> >> Cc: mdr@ashroe.eu; nhorman@tuxdriver.com
> >> Subject: [dpdk-dev] [PATCH] doc: announce removal of ABIs in PCI bus
> driver
> >>
> >> All ABIs in PCI bus driver, which are defined in rte_bus_pci.h,
> >> will be removed and the header will be made internal.
> >>
> >> Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
> >> ---
> >>  doc/guides/rel_notes/deprecation.rst | 5 +++++
> >>  1 file changed, 5 insertions(+)
> >>
> >> diff --git a/doc/guides/rel_notes/deprecation.rst
> >> b/doc/guides/rel_notes/deprecation.rst
> >> index 9584d6bfd7..b01f46c62e 100644
> >> --- a/doc/guides/rel_notes/deprecation.rst
> >> +++ b/doc/guides/rel_notes/deprecation.rst
> >> @@ -147,3 +147,8 @@ Deprecation Notices
> >>  * cmdline: ``cmdline`` structure will be made opaque to hide platform-
> >> specific
> >>    content. On Linux and FreeBSD, supported prior to DPDK 20.11,
> >>    original structure will be kept until DPDK 21.11.
> >> +
> >> +* pci: To reduce unnecessary ABIs exposed by DPDK bus driver,
> "rte_bus_pci.h"
> >> +  will be made internal in 21.11 and macros/data structures/functions
> defined
> >> +  in the header will not be considered as ABI anymore. This change is
> >> inspired
> >> +  by the RFC
> https://patchwork.dpdk.org/project/dpdk/list/?series=17176.
> >
> > I see there's some ABI improvement work on-going and I think it could be
> part of
> > the work. If it makes sense to you, I'd like some ACKs.
> >
> 
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> 
> I am for reducing the public ABI as much as possible. How big will the
> change
> be? Is the 'rte_bus_pci.h' used other than './drivers/bus/pci/'?

I don't expect a big change here. And I am not sure if I understand your second
question. The rte_bus_pci.h will still be used by drivers (maybe remove the
rte prefix and change the file name).

Thanks,
Chenbo

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3] eal: fix argument to rte_bsf32_safe
  2021-07-24  7:58  0%   ` Thomas Monjalon
@ 2021-07-24 23:50  0%     ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2021-07-24 23:50 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, anatoly.burakov, Tyler Retzlaff

On Sat, 24 Jul 2021 09:58:44 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:

> 23/07/2021 17:45, Stephen Hemminger:
> > The first argument to rte_bsf32_safe was incorrectly declared as
> > a 64 bit value. The code only works on 32 bit values and the underlying
> > function rte_bsf32 only accepts 32 bit values. This was a mistake
> > introduced when the safe version was added and probably caused
> > by copy/paste from the 64 bit version.
> > 
> > The bug passed silently under the radar until some other code was
> > built with -Wall and -Wextra in C++ and C++ complains about the
> > missing cast.
> > 
> > Yes, this is an API signature change, but the original code was wrong.
> > It is an inline so not an ABI change.
> > 
> > Fixes: 4e261f551986 ("eal: add 64-bit bsf and 32-bit safe bsf functions")
> > Cc: anatoly.burakov@intel.com
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>  
> 
> +Cc: stable@dpdk.org
> 
> Applied, thanks.
> 
> I think these functions lack a reference to the name Bit Scan Forward.
> 
> 
> 
> 

Tyler wanted to fix a bunch more stuff in these for 21.11 where it will
be a bigger API change.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3] eal: fix argument to rte_bsf32_safe
  2021-07-23 15:45  8% ` [dpdk-dev] [PATCH v3] " Stephen Hemminger
@ 2021-07-24  7:58  0%   ` Thomas Monjalon
  2021-07-24 23:50  0%     ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2021-07-24  7:58 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, anatoly.burakov, Tyler Retzlaff

23/07/2021 17:45, Stephen Hemminger:
> The first argument to rte_bsf32_safe was incorrectly declared as
> a 64 bit value. The code only works on 32 bit values and the underlying
> function rte_bsf32 only accepts 32 bit values. This was a mistake
> introduced when the safe version was added and probably caused
> by copy/paste from the 64 bit version.
> 
> The bug passed silently under the radar until some other code was
> built with -Wall and -Wextra in C++ and C++ complains about the
> missing cast.
> 
> Yes, this is an API signature change, but the original code was wrong.
> It is an inline so not an ABI change.
> 
> Fixes: 4e261f551986 ("eal: add 64-bit bsf and 32-bit safe bsf functions")
> Cc: anatoly.burakov@intel.com
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

+Cc: stable@dpdk.org

Applied, thanks.

I think these functions lack a reference to the name Bit Scan Forward.





^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] devtools: test different build types
  @ 2021-07-23 20:26  0%   ` Andrew Rybchenko
  0 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2021-07-23 20:26 UTC (permalink / raw)
  To: David Marchand, Thomas Monjalon; +Cc: dev, Bruce Richardson

On 5/21/21 6:03 PM, David Marchand wrote:
> On Mon, Apr 12, 2021 at 11:54 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>> @@ -213,9 +218,10 @@ for c in gcc clang ; do
>>                          abicheck=ABI
> 
> init of buildtype var is missing here.

+1

> Rest lgtm.
> 
>>                  else
>>                          abicheck=skipABI # save time and disk space
>> +                       buildtype='--buildtype=minsize'
>>                  fi
>>                  export CC="$CCACHE $c"
>> -               build build-$c-$s $c $abicheck --default-library=$s
>> +               build build-$c-$s $c $abicheck $buildtype --default-library=$s
>>                  unset CC
>>          done
>>   done
> 
> 

with review notes applied:

Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v3] eal: fix argument to rte_bsf32_safe
  2021-07-13 20:12  3% [dpdk-dev] [PATCH] eal: fix argument to rte_bsf32_safe Stephen Hemminger
  2021-07-19 17:15  0% ` Tyler Retzlaff
  2021-07-23  0:52  8% ` [dpdk-dev] [PATCH v2] " Stephen Hemminger
@ 2021-07-23 15:45  8% ` Stephen Hemminger
  2021-07-24  7:58  0%   ` Thomas Monjalon
  2 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2021-07-23 15:45 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, anatoly.burakov, Tyler Retzlaff

The first argument to rte_bsf32_safe was incorrectly declared as
a 64 bit value. The code only works on 32 bit values and the underlying
function rte_bsf32 only accepts 32 bit values. This was a mistake
introduced when the safe version was added and probably caused
by copy/paste from the 64 bit version.

The bug passed silently under the radar until some other code was
built with -Wall and -Wextra in C++ and C++ complains about the
missing cast.

Yes, this is an API signature change, but the original code was wrong.
It is an inline so not an ABI change.

Fixes: 4e261f551986 ("eal: add 64-bit bsf and 32-bit safe bsf functions")
Cc: anatoly.burakov@intel.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
v3 - reword commit description for checkpatch

 doc/guides/rel_notes/release_21_08.rst | 4 ++++
 lib/eal/include/rte_common.h           | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index e2c5ccbf7d90..148405891fcb 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -196,6 +196,10 @@ API Changes
   to be thread safe; all Rx queues affected by the API will now need to be
   stopped before making any changes to the power management scheme.
 
+* eal: ``rte_bsf32_safe`` now takes a 32 bit value for its first
+  argument. This fixes warnings about loss of precision when used
+  with some compilers settings.
+
 
 ABI Changes
 -----------
diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index d5a32c66a5fe..99eb5f1820ae 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -623,7 +623,7 @@ rte_bsf32(uint32_t v)
  *     Returns 0 if ``v`` was 0, otherwise returns 1.
  */
 static inline int
-rte_bsf32_safe(uint64_t v, uint32_t *pos)
+rte_bsf32_safe(uint32_t v, uint32_t *pos)
 {
 	if (v == 0)
 		return 0;
-- 
2.30.2
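
For readers without the tree at hand, the effect of the change can be reproduced with a standalone sketch. The sketch_* names are hypothetical stand-ins for rte_bsf32() and rte_bsf32_safe(), and the use of the GCC/Clang builtin `__builtin_ctz` is an assumption of this sketch.

```c
#include <stdint.h>

/* Stand-in for rte_bsf32(): index of the least significant set bit. */
static inline uint32_t
sketch_bsf32(uint32_t v)
{
	return (uint32_t)__builtin_ctz(v); /* undefined for v == 0 */
}

/* Same shape as the fixed rte_bsf32_safe(): the input is 32 bits wide. */
static inline int
sketch_bsf32_safe(uint32_t v, uint32_t *pos)
{
	if (v == 0)
		return 0;
	*pos = sketch_bsf32(v);
	return 1;
}
```

With the earlier uint64_t parameter, the call to the 32-bit helper narrowed the value implicitly, which is the conversion that stricter warning settings report.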


^ permalink raw reply	[relevance 8%]

* Re: [dpdk-dev] [PATCH] doc: announce removal of ABIs in PCI bus driver
  2021-07-23  7:39  3% ` Xia, Chenbo
@ 2021-07-23 12:46  3%   ` Ferruh Yigit
  2021-07-26  5:56  0%     ` Xia, Chenbo
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2021-07-23 12:46 UTC (permalink / raw)
  To: Xia, Chenbo, dev, thomas; +Cc: mdr, nhorman, david.marchand

On 7/23/2021 8:39 AM, Xia, Chenbo wrote:
> Hi,
> 
> A gentle ping for comments..
> 
>> -----Original Message-----
>> From: dev <dev-bounces@dpdk.org> On Behalf Of Chenbo Xia
>> Sent: Tuesday, June 1, 2021 4:42 PM
>> To: dev@dpdk.org; thomas@monjalon.net
>> Cc: mdr@ashroe.eu; nhorman@tuxdriver.com
>> Subject: [dpdk-dev] [PATCH] doc: announce removal of ABIs in PCI bus driver
>>
>> All ABIs in PCI bus driver, which are defined in rte_bus_pci.h,
>> will be removed and the header will be made internal.
>>
>> Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
>> ---
>>  doc/guides/rel_notes/deprecation.rst | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/doc/guides/rel_notes/deprecation.rst
>> b/doc/guides/rel_notes/deprecation.rst
>> index 9584d6bfd7..b01f46c62e 100644
>> --- a/doc/guides/rel_notes/deprecation.rst
>> +++ b/doc/guides/rel_notes/deprecation.rst
>> @@ -147,3 +147,8 @@ Deprecation Notices
>>  * cmdline: ``cmdline`` structure will be made opaque to hide platform-
>> specific
>>    content. On Linux and FreeBSD, supported prior to DPDK 20.11,
>>    original structure will be kept until DPDK 21.11.
>> +
>> +* pci: To reduce unnecessary ABIs exposed by DPDK bus driver, "rte_bus_pci.h"
>> +  will be made internal in 21.11 and macros/data structures/functions defined
>> +  in the header will not be considered as ABI anymore. This change is
>> inspired
>> +  by the RFC https://patchwork.dpdk.org/project/dpdk/list/?series=17176.
> 
> I see there's some ABI improvement work on-going and I think it could be part of
> the work. If it makes sense to you, I'd like some ACKs.
> 

Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

I am for reducing the public ABI as much as possible. How big will the change
be? Is the 'rte_bus_pci.h' used other than './drivers/bus/pci/'?

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2] doc: update atomic operation deprecation
  2021-07-12  8:02  4% [dpdk-dev] [PATCH v1] doc: update atomic operation deprecation Joyce Kong
  2021-07-17 18:47  0% ` Honnappa Nagarahalli
@ 2021-07-23  9:49  4% ` Joyce Kong
  1 sibling, 0 replies; 200+ results
From: Joyce Kong @ 2021-07-23  9:49 UTC (permalink / raw)
  To: thomas, stephen, honnappa.nagarahalli, ruifeng.wang, mdr; +Cc: dev, nd, stable

Update the incorrect description about atomic operations
with provided wrappers in deprecation doc[1].

[1]https://mails.dpdk.org/archives/dev/2021-July/213333.html

Fixes: 7518c5c4ae6a ("doc: announce adoption of C11 atomic operations semantics")
Cc: stable@dpdk.org

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 doc/guides/rel_notes/deprecation.rst | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 9584d6bfd7..a4f350fa09 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -19,16 +19,18 @@ Deprecation Notices
 
 * rte_atomicNN_xxx: These APIs do not take memory order parameter. This does
   not allow for writing optimized code for all the CPU architectures supported
-  in DPDK. DPDK will adopt C11 atomic operations semantics and provide wrappers
-  using C11 atomic built-ins. These wrappers must be used for patches that
-  need to be merged in 20.08 onwards. This change will not introduce any
-  performance degradation.
+  in DPDK. DPDK has adopted the atomic operations from
+  https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html. These
+  operations must be used for patches that need to be merged in 20.08 onwards.
+  This change will not introduce any performance degradation.
 
 * rte_smp_*mb: These APIs provide full barrier functionality. However, many
-  use cases do not require full barriers. To support such use cases, DPDK will
-  adopt C11 barrier semantics and provide wrappers using C11 atomic built-ins.
-  These wrappers must be used for patches that need to be merged in 20.08
-  onwards. This change will not introduce any performance degradation.
+  use cases do not require full barriers. To support such use cases, DPDK has
+  adopted atomic operations from
+  https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html. These
+  operations and a new wrapper ``rte_atomic_thread_fence`` instead of
+  ``__atomic_thread_fence`` must be used for patches that need to be merged in
+  20.08 onwards. This change will not introduce any performance degradation.
 
 * lib: will fix extending some enum/define breaking the ABI. There are multiple
   samples in DPDK that enum/define terminated with a ``.*MAX.*`` value which is
-- 
2.17.1
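
As a rough illustration of the style the notice asks for, the sketch below publishes a value with release ordering and reads it with acquire ordering using the GCC `__atomic` built-ins, instead of a full barrier. The variable and function names (payload, ready, publish, try_consume) are hypothetical and are not DPDK APIs.

```c
#include <stdint.h>

static uint32_t payload; /* data published by the producer */
static uint32_t ready;   /* flag guarding payload */

/* Producer: write the data, then publish it with release ordering. */
static void
publish(uint32_t value)
{
	payload = value;
	__atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
}

/* Consumer: returns 1 and fills *out once the flag has been observed. */
static int
try_consume(uint32_t *out)
{
	if (__atomic_load_n(&ready, __ATOMIC_ACQUIRE) == 0)
		return 0;
	*out = payload;
	return 1;
}
```

The release store pairs with the acquire load, so a consumer that sees the flag set also sees the payload written before it; no ordering stronger than that is requested.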


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce removal of ABIs in PCI bus driver
  @ 2021-07-23  7:39  3% ` Xia, Chenbo
  2021-07-23 12:46  3%   ` Ferruh Yigit
  2021-07-27 10:58  0% ` Ananyev, Konstantin
  1 sibling, 1 reply; 200+ results
From: Xia, Chenbo @ 2021-07-23  7:39 UTC (permalink / raw)
  To: dev, thomas; +Cc: mdr, nhorman, david.marchand, Yigit, Ferruh

Hi,

A gentle ping for comments..

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Chenbo Xia
> Sent: Tuesday, June 1, 2021 4:42 PM
> To: dev@dpdk.org; thomas@monjalon.net
> Cc: mdr@ashroe.eu; nhorman@tuxdriver.com
> Subject: [dpdk-dev] [PATCH] doc: announce removal of ABIs in PCI bus driver
> 
> All ABIs in PCI bus driver, which are defined in rte_bus_pci.h,
> will be removed and the header will be made internal.
> 
> Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index 9584d6bfd7..b01f46c62e 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -147,3 +147,8 @@ Deprecation Notices
>  * cmdline: ``cmdline`` structure will be made opaque to hide platform-
> specific
>    content. On Linux and FreeBSD, supported prior to DPDK 20.11,
>    original structure will be kept until DPDK 21.11.
> +
> +* pci: To reduce unnecessary ABIs exposed by DPDK bus driver, "rte_bus_pci.h"
> +  will be made internal in 21.11 and macros/data structures/functions defined
> +  in the header will not be considered as ABI anymore. This change is
> inspired
> +  by the RFC https://patchwork.dpdk.org/project/dpdk/list/?series=17176.

I see there's some ABI improvement work on-going and I think it could be part of
the work. If it makes sense to you, I'd like some ACKs.

Thanks,
Chenbo

> --
> 2.17.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2] eal: fix argument to rte_bsf32_safe
  2021-07-13 20:12  3% [dpdk-dev] [PATCH] eal: fix argument to rte_bsf32_safe Stephen Hemminger
  2021-07-19 17:15  0% ` Tyler Retzlaff
@ 2021-07-23  0:52  8% ` Stephen Hemminger
  2021-07-23 15:45  8% ` [dpdk-dev] [PATCH v3] " Stephen Hemminger
  2 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2021-07-23  0:52 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, anatoly.burakov, Tyler Retzlaff

The first argument to rte_bsf32_safe was incorrectly declared as
a 64 bit value. The code only works on 32 bit values and the underlying
function rte_bsf32 only accepts 32 bit values. This was a mistake
introduced when the safe version was added and probably caused
by copy/paste from the 64 bit version.

The bug passed silently under the radar until some other code was
built with -Wall and -Wextra in C++ and C++ complains about the
missing cast.

Yes, this is an API signature change, but the original code was wrong.
It is an inline so not an ABI change.

Fixes: 4e261f551986 ("eal: add 64-bit bsf and 32-bit safe bsf functions")
Cc: anatoly.burakov@intel.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-By: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
v2 - add suggested release note

 doc/guides/rel_notes/release_21_08.rst | 4 ++++
 lib/eal/include/rte_common.h           | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index e2c5ccbf7d90..148405891fcb 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -196,6 +196,10 @@ API Changes
   to be thread safe; all Rx queues affected by the API will now need to be
   stopped before making any changes to the power management scheme.
 
+* eal: ``rte_bsf32_safe`` now takes a 32 bit value for its first
+  argument. This fixes warnings about loss of precision when used
+  with some compilers settings.
+
 
 ABI Changes
 -----------
diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index d5a32c66a5fe..99eb5f1820ae 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -623,7 +623,7 @@ rte_bsf32(uint32_t v)
  *     Returns 0 if ``v`` was 0, otherwise returns 1.
  */
 static inline int
-rte_bsf32_safe(uint64_t v, uint32_t *pos)
+rte_bsf32_safe(uint32_t v, uint32_t *pos)
 {
 	if (v == 0)
 		return 0;
-- 
2.30.2


^ permalink raw reply	[relevance 8%]

* Re: [dpdk-dev] [PATCH] crypto/octeontx: enable build on non Linux OS
  2021-07-22 20:20  3%                         ` Brandon Lo
@ 2021-07-22 20:32  0%                           ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2021-07-22 20:32 UTC (permalink / raw)
  To: Brandon Lo
  Cc: dpdklab, lylavoie, Shijith Thotton, Akhil Goyal, david.marchand,
	dev, aconole, ci

22/07/2021 22:20, Brandon Lo:
> On Thu, Jul 22, 2021 at 3:08 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > 22/07/2021 21:06, Thomas Monjalon:
> > > 22/07/2021 11:17, Akhil Goyal:
> > > > > Enabled build of Octeontx crypto PMD on non linux OS. Other Octeontx
> > > > > PMDs are enabled already.
> > > > >
> > > > > This is to avoid ABI test failure on an OS once we add dependency
> > > > > between a driver which is built to another which is not.
> > > >
> > > > Fixes: 8dc6c2f12ecf ("crypto/octeontx: add crypto adapter framework")
> > > > >
> > > >
> > > > Reported-by: David Marchand <david.marchand@redhat.com>
> > > >
> > > > > Signed-off-by: Shijith Thotton <sthotton@marvell.com>
> > > >
> > > > Acked-by: Akhil Goyal <gakhil@marvell.com>
> > > >
> > > > Thomas/David: please pick this patch directly on main to fix build on CI for FreeBSD.
> > >
> > > Applied, thanks.
> >
> > Please could you re-test the ABI on FreeBSD
> > and re-enable in the CI if the test is passing?
> >
> > Thank you
> 
> I ran a couple test runs on FreeBSD 13 to ensure that the patch
> compiles successfully, and I enabled reporting.
> FreeBSD 13 should start to appear in the ABI test results of newer
> tarballs with the patch.

Thanks a lot Brandon, well managed.




^ permalink raw reply	[relevance 0%]

* [dpdk-dev] DPDK Release Status Meeting 22/07/2021
@ 2021-07-22 20:22  3% Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2021-07-22 20:22 UTC (permalink / raw)
  To: dev; +Cc: john.mcnamara, ferruh.yigit, david.marchand, Christian Ehrhardt

Release Dates
-------------

* v21.08
  - Proposal/V1:    Wednesday,  2 June (completed)
  - rc1:            Saturday,  10 July (completed)
  - rc2:            Friday,    23 July
  - rc3:            Thursday,  29 July
  - rc4:            Wednesday,  4 August
  - Release:        Friday,     6 August

Subtrees
--------

* next-net
  - Bug with libatomic in clang, fixed today.

* next-crypto
  - Pulled yesterday.
  - Only deprecation notices left for this release.
  - ABI check on FreeBSD: fixed today.

* next-eventdev
  - Few patches for -rc3.

* next-virtio
  - Pulled yesterday.
  - One more series to look at (was rejected later).
  - Change on async experimental code - candidate for -rc3

* next-net-brcm
  - No update.

* next-net-intel
  - No update.

* next-net-mlx
  - Integration in progress

* next-net-mrvl
  - Few patches for -rc3.

LTS
---

DPDK 19.11.9 released on Monday by Christian.

Call for help for 19.11.x to fix issues with new toolchains, kernels, etc.



^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] crypto/octeontx: enable build on non Linux OS
  2021-07-22 19:08  3%                       ` Thomas Monjalon
@ 2021-07-22 20:20  3%                         ` Brandon Lo
  2021-07-22 20:32  0%                           ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Brandon Lo @ 2021-07-22 20:20 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dpdklab, lylavoie, Shijith Thotton, Akhil Goyal, david.marchand,
	dev, aconole, ci

On Thu, Jul 22, 2021 at 3:08 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 22/07/2021 21:06, Thomas Monjalon:
> > 22/07/2021 11:17, Akhil Goyal:
> > > > Enabled build of Octeontx crypto PMD on non linux OS. Other Octeontx
> > > > PMDs are enabled already.
> > > >
> > > > This is to avoid ABI test failure on an OS once we add dependency
> > > > between a driver which is built to another which is not.
> > >
> > > Fixes: 8dc6c2f12ecf ("crypto/octeontx: add crypto adapter framework")
> > > >
> > >
> > > Reported-by: David Marchand <david.marchand@redhat.com>
> > >
> > > > Signed-off-by: Shijith Thotton <sthotton@marvell.com>
> > >
> > > Acked-by: Akhil Goyal <gakhil@marvell.com>
> > >
> > > Thomas/David: please pick this patch directly on main to fix build on CI for FreeBSD.
> >
> > Applied, thanks.
>
> Please could you re-test the ABI on FreeBSD
> and re-enable in the CI if the test is passing?
>
> Thank you

I ran a couple test runs on FreeBSD 13 to ensure that the patch
compiles successfully, and I enabled reporting.
FreeBSD 13 should start to appear in the ABI test results of newer
tarballs with the patch.

Thanks,
Brandon


--
Brandon Lo
UNH InterOperability Laboratory
21 Madbury Rd, Suite 100, Durham, NH 03824
blo@iol.unh.edu
www.iol.unh.edu

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] crypto/octeontx: enable build on non Linux OS
  2021-07-22 19:06  0%                     ` Thomas Monjalon
@ 2021-07-22 19:08  3%                       ` Thomas Monjalon
  2021-07-22 20:20  3%                         ` Brandon Lo
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2021-07-22 19:08 UTC (permalink / raw)
  To: dpdklab, lylavoie, Brandon Lo
  Cc: Shijith Thotton, Akhil Goyal, david.marchand, dev, aconole, ci

22/07/2021 21:06, Thomas Monjalon:
> 22/07/2021 11:17, Akhil Goyal:
> > > Enabled build of Octeontx crypto PMD on non linux OS. Other Octeontx
> > > PMDs are enabled already.
> > > 
> > > This is to avoid ABI test failure on an OS once we add dependency
> > > between a driver which is built to another which is not.
> > 
> > Fixes: 8dc6c2f12ecf ("crypto/octeontx: add crypto adapter framework")
> > > 
> > 
> > Reported-by: David Marchand <david.marchand@redhat.com>
> > 
> > > Signed-off-by: Shijith Thotton <sthotton@marvell.com>
> > 
> > Acked-by: Akhil Goyal <gakhil@marvell.com>
> > 
> > Thomas/David: please pick this patch directly on main to fix build on CI for FreeBSD.
> 
> Applied, thanks.

Please could you re-test the ABI on FreeBSD
and re-enable in the CI if the test is passing?

Thank you



^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] crypto/octeontx: enable build on non Linux OS
  2021-07-22  9:17  0%                   ` Akhil Goyal
@ 2021-07-22 19:06  0%                     ` Thomas Monjalon
  2021-07-22 19:08  3%                       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2021-07-22 19:06 UTC (permalink / raw)
  To: Shijith Thotton, Akhil Goyal
  Cc: david.marchand, dev, abhinandan.gujjar, aconole, Ankur Dwivedi,
	Anoob Joseph, dpdklab, Jerin Jacob Kollanukkaran, lylavoie, mdr,
	Pavan Nikhilesh Bhagavatula

22/07/2021 11:17, Akhil Goyal:
> > Enabled build of Octeontx crypto PMD on non linux OS. Other Octeontx
> > PMDs are enabled already.
> > 
> > This is to avoid ABI test failure on an OS once we add dependency
> > between a driver which is built to another which is not.
> 
> Fixes: 8dc6c2f12ecf ("crypto/octeontx: add crypto adapter framework")
> > 
> 
> Reported-by: David Marchand <david.marchand@redhat.com>
> 
> > Signed-off-by: Shijith Thotton <sthotton@marvell.com>
> 
> Acked-by: Akhil Goyal <gakhil@marvell.com>
> 
> Thomas/David: please pick this patch directly on main to fix build on CI for FreeBSD.

Applied, thanks.




^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] app/testpmd: fix testpmd doesn't show RSS hash offload
  2021-07-19 16:18  0%           ` Ferruh Yigit
@ 2021-07-22 11:03  0%             ` Andrew Rybchenko
  2021-08-09  8:53  0%               ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2021-07-22 11:03 UTC (permalink / raw)
  To: Ferruh Yigit, Wang, Jie1X, Li, Xiaoyun, dev; +Cc: stable

On 7/19/21 7:18 PM, Ferruh Yigit wrote:
> On 7/19/2021 10:55 AM, Wang, Jie1X wrote:
>>
>>
>>> -----Original Message-----
>>> From: Yigit, Ferruh <ferruh.yigit@intel.com>
>>> Sent: Friday, July 16, 2021 4:52 PM
>>> To: Li, Xiaoyun <xiaoyun.li@intel.com>; Wang, Jie1X <jie1x.wang@intel.com>;
>>> dev@dpdk.org
>>> Cc: andrew.rybchenko@oktetlabs.ru; stable@dpdk.org
>>> Subject: Re: [dpdk-stable] [PATCH v4] app/testpmd: fix testpmd doesn't show
>>> RSS hash offload
>>>
>>> On 7/16/2021 9:30 AM, Li, Xiaoyun wrote:
>>>>> -----Original Message-----
>>>>> From: stable <stable-bounces@dpdk.org> On Behalf Of Li, Xiaoyun
>>>>> Sent: Thursday, July 15, 2021 12:54
>>>>> To: Wang, Jie1X <jie1x.wang@intel.com>; dev@dpdk.org
>>>>> Cc: andrew.rybchenko@oktetlabs.ru; stable@dpdk.org
>>>>> Subject: Re: [dpdk-stable] [PATCH v4] app/testpmd: fix testpmd
>>>>> doesn't show RSS hash offload
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Wang, Jie1X <jie1x.wang@intel.com>
>>>>>> Sent: Thursday, July 15, 2021 19:57
>>>>>> To: dev@dpdk.org
>>>>>> Cc: Li, Xiaoyun <xiaoyun.li@intel.com>;
>>>>>> andrew.rybchenko@oktetlabs.ru; Wang, Jie1X <jie1x.wang@intel.com>;
>>>>>> stable@dpdk.org
>>>>>> Subject: [PATCH v4] app/testpmd: fix testpmd doesn't show RSS hash
>>>>>> offload
>>>>>>
>>>>>> The driver may change offloads info into dev->data->dev_conf in
>>>>>> dev_configure which may cause port->dev_conf and port->rx_conf
>>>>>> contain outdated values.
>>>>>>
>>>>>> This patch updates the offloads info if it changes to fix this issue.
>>>>>>
>>>>>> Fixes: ce8d561418d4 ("app/testpmd: add port configuration settings")
>>>>>> Cc: stable@dpdk.org
>>>>>>
>>>>>> Signed-off-by: Jie Wang <jie1x.wang@intel.com>
>>>>>> ---
>>>>>> v4: delete the whitespace at the end of the line.
>>>>>> v3:
>>>>>>   - check and update the "offloads" of "port->dev_conf.rx/txmode".
>>>>>>   - update the commit log.
>>>>>> v2: copy "rx/txmode.offloads", instead of copying the entire struct
>>>>>> "dev->data->dev_conf.rx/txmode".
>>>>>> ---
>>>>>>   app/test-pmd/testpmd.c | 27 +++++++++++++++++++++++++++
>>>>>>   1 file changed, 27 insertions(+)
>>>>>
>>>>> Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
>>>>
>>>> Although I gave my ack, app shouldn't touch rte_eth_devices which this patch
>>>> does. Usually, testpmd should only call function like
>>>> eth_dev_info_get_print_err().
>>>> But dev_info doesn't contain the info dev->data->dev_conf which the driver
>>>> modifies.
>>>>
>>>> Probably we need a better fix.
>>>>
>>>
>>> Agree, an application accessing directly to 'rte_eth_devices' is sign of something
>>> missing/wrong.
>>>
>>> In this case there is no way for application to know what is the configured
>>> offload settings per port and queue. Which is missing part I think.
>>>
>>> As you said normally we get data from PMD mainly via 'rte_eth_dev_info_get()',
>>> which is an overloaded function, it provides many different things, like driver
>>> default values, limitations, current config/status, capabilities etc...
>>>
>>> So I think we can do a few things:
>>> 1) Add current offload configuration to 'rte_eth_dev_info_get()', so application
>>> can get it and use it.
>>> The advantage is this API already called many places, many times, so there is a
>>> big chance that application already have this information when it needs.
>>> Disadvantage is, as mentioned above the API already big and messy, making it
>>> bigger makes more error prone and makes easier to break ABI.
>>>
>> I prefer to choose the 1st suggestion.
>>
>> Normally PMD gets data via 'rte_eth_dev_info_get()'. When we add offloads configuration
>> to it, we can get offloads as same as getting other info.
>>
> 
> Most probably it is easier to implement 1), I see your point but as said before
> I think 'rte_eth_dev_info_get()' is already messy and I am worried to make it
> even bigger.

IMHO, (1) is not an option.

> I prefer option 2).

I'm not sure that an API function for each config parameter is an option
either. We should find a balance. Maybe I'd add something like
rte_eth_dev_get_conf(uint16_t port_id, const struct rte_eth_conf **conf)
which returns a pointer to the up-to-date configuration, i.e. option (3).

The tricky part here is to ensure that every specific API which modifies
various bits of the configuration also updates dev_conf.
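
As a rough illustration of option (3) only: the sketch below uses made-up
stand-ins (demo_eth_conf, demo_eth_dev_configure(), demo_eth_dev_get_conf(),
DEMO_RX_OFFLOAD_RSS_HASH) rather than real ethdev structures or functions,
and it assumes the driver may adjust offloads during configure. The getter
hands back a const pointer to the stored, possibly driver-adjusted
configuration, so the application would not need to touch the device array
directly.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PORTS 4
/* Hypothetical offload flag, standing in for an RSS hash Rx offload bit. */
#define DEMO_RX_OFFLOAD_RSS_HASH (UINT64_C(1) << 0)

/* Simplified stand-in for a device configuration (assumption). */
struct demo_eth_conf {
	uint64_t rx_offloads;
	uint64_t tx_offloads;
};

/* Simplified stand-in for per-port device state (assumption). */
struct demo_eth_dev {
	int configured;
	struct demo_eth_conf conf;
};

static struct demo_eth_dev demo_devices[DEMO_MAX_PORTS];

/* Store the requested configuration, then let the "driver" adjust it. */
static int
demo_eth_dev_configure(uint16_t port_id, const struct demo_eth_conf *req)
{
	if (port_id >= DEMO_MAX_PORTS || req == NULL)
		return -EINVAL;

	demo_devices[port_id].conf = *req;
	/* Simulated driver behaviour: it turns on an offload by itself. */
	demo_devices[port_id].conf.rx_offloads |= DEMO_RX_OFFLOAD_RSS_HASH;
	demo_devices[port_id].configured = 1;
	return 0;
}

/* Return a read-only view of the current configuration of a port. */
static int
demo_eth_dev_get_conf(uint16_t port_id, const struct demo_eth_conf **conf)
{
	if (port_id >= DEMO_MAX_PORTS || conf == NULL)
		return -EINVAL;
	if (!demo_devices[port_id].configured)
		return -ENODEV;

	*conf = &demo_devices[port_id].conf;
	return 0;
}
```

In this sketch the application's own copy of the request stays untouched,
while the pointer returned by the getter reflects what the driver changed.
That is the part an application would otherwise miss.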

> 
> @Thomas, @Andrew, what do you think?
> 
> 
>>> 2) Add a new API to get configured offload information, so a specific API for it.
>>>
>>> 3) Get a more generic API to get configured config (dev_conf) which will cover
>>> offloads too.
>>> Disadvantage can be leaking out too many internal config to user unintentionally.

I don't understand it. dev_conf is provided by user on
rte_eth_dev_configure().

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] crypto/octeontx: enable build on non Linux OS
  2021-07-22  9:06  3%                 ` [dpdk-dev] [PATCH] crypto/octeontx: enable build on non Linux OS Shijith Thotton
@ 2021-07-22  9:17  0%                   ` Akhil Goyal
  2021-07-22 19:06  0%                     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Akhil Goyal @ 2021-07-22  9:17 UTC (permalink / raw)
  To: Shijith Thotton, thomas, david.marchand
  Cc: abhinandan.gujjar, aconole, Ankur Dwivedi, Anoob Joseph, dev,
	dpdklab, Jerin Jacob Kollanukkaran, lylavoie, mdr,
	Pavan Nikhilesh Bhagavatula, Shijith Thotton

> Enabled build of Octeontx crypto PMD on non linux OS. Other Octeontx
> PMDs are enabled already.
> 
> This is to avoid ABI test failure on an OS once we add dependency
> between a driver which is built to another which is not.

Fixes: 8dc6c2f12ecf ("crypto/octeontx: add crypto adapter framework")
> 

Reported-by: David Marchand <david.marchand@redhat.com>

> Signed-off-by: Shijith Thotton <sthotton@marvell.com>

Acked-by: Akhil Goyal <gakhil@marvell.com>

Thomas/David: please pick this patch directly on main to fix build on CI for FreeBSD.


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] crypto/octeontx: enable build on non Linux OS
  2021-07-22  7:45  0%               ` Akhil Goyal
@ 2021-07-22  9:06  3%                 ` Shijith Thotton
  2021-07-22  9:17  0%                   ` Akhil Goyal
  0 siblings, 1 reply; 200+ results
From: Shijith Thotton @ 2021-07-22  9:06 UTC (permalink / raw)
  To: gakhil, thomas
  Cc: abhinandan.gujjar, aconole, adwivedi, anoobj, david.marchand,
	dev, dpdklab, jerinj, lylavoie, mdr, pbhagavatula, sthotton

Enabled build of Octeontx crypto PMD on non linux OS. Other Octeontx
PMDs are enabled already.

This is to avoid ABI test failure on an OS once we add dependency
between a driver which is built to another which is not.

Signed-off-by: Shijith Thotton <sthotton@marvell.com>
---
 drivers/crypto/octeontx/meson.build | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/crypto/octeontx/meson.build b/drivers/crypto/octeontx/meson.build
index 3ae6729e8f..244b16230e 100644
--- a/drivers/crypto/octeontx/meson.build
+++ b/drivers/crypto/octeontx/meson.build
@@ -1,9 +1,5 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Cavium, Inc
-if not is_linux
-    build = false
-    reason = 'only supported on Linux'
-endif
 
 deps += ['bus_pci']
 deps += ['bus_vdev']
-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [EXT] Re: [PATCH v2 1/2] drivers: add octeontx crypto adapter framework
  2021-07-21  9:44  3%             ` Thomas Monjalon
  2021-07-21 15:11  4%               ` Brandon Lo
@ 2021-07-22  7:45  0%               ` Akhil Goyal
  2021-07-22  9:06  3%                 ` [dpdk-dev] [PATCH] crypto/octeontx: enable build on non Linux OS Shijith Thotton
  1 sibling, 1 reply; 200+ results
From: Akhil Goyal @ 2021-07-22  7:45 UTC (permalink / raw)
  To: Thomas Monjalon, David Marchand
  Cc: dev, Ray Kinsella, Pavan Nikhilesh Bhagavatula, Anoob Joseph,
	Abhinandan Gujjar, Ankur Dwivedi, Jerin Jacob Kollanukkaran,
	Aaron Conole, dpdklab, Lincoln Lavoie, Shijith Thotton

> 20/07/2021 14:14, David Marchand:
> > On Tue, Jul 20, 2021 at 1:59 PM Akhil Goyal <gakhil@marvell.com> wrote:
> > >
> > >  Hi David,
> > > >
> > > > > >  deps += ['common_octeontx', 'mempool_octeontx', 'bus_vdev',
> > > > > 'net_octeontx']
> > > > > > +deps += ['crypto_octeontx']
> > > > >
> > > > > This extra dependency resulted in disabling the event/octeontx driver
> > > > > in FreeBSD, since crypto/octeontx only builds on Linux.
> > > > > Removing hw support triggers a ABI failure for FreeBSD.
> > > > >
> > > > >
> > > > > - This had been reported by UNH CI:
> > > > > http://mails.dpdk.org/archives/test-report/2021-June/200637.html
> > > > > It seems the result has been ignored but it should have at least
> > > > > raised some discussion.
> > > > >
> > > > This was highlighted to CI ML
> > > > http://patches.dpdk.org/project/dpdk/patch/0686a7c3fb3a22e37378a8545b
> > > > c37bce04f4c391.1624481225.git.sthotton@marvell.com/
> > > >
> > > > but I think I missed to take the follow up with Brandon and applied the patch
> > > > as it did not look an issue to me as octeon drivers are not currently built on
> > > > FreeBSD.
> > > > Not sure why event driver is getting built there.
> > > >
> > > > >
> > > > > - I asked UNH to stop testing FreeBSD abi for now, waiting to get the
> > > > > main branch fixed.
> > > > >
> > > > > I don't have the time to look at this, please can you work on it?
> > > > >
> > > > > Several options:
> > > > > * crypto/octeontx is made so that it compiles on FreeBSD,
> > > > > * the abi check is extended to have exceptions per OS,
> > > > > * the FreeBSD abi reference is regenerated at UNH not to have those
> > > > > drivers in it (not sure it is doable),
> > > >
> > > > Thanks for the suggestions, we are working on it to resolve this as soon as
> > > > possible.
> > > > We may need to add exception in ABI checking so that it does not shout if a
> > > > PMD
> > > > is not compiled.
> > > Can we have below change? Will it work to disable compilation of
> > > event/octeontx2 for FreeBSD? I believe this was done by mistake earlier
> > > as all other octeontx2 drivers are compiled off on platforms other than Linux.
> > >
> > > diff --git a/drivers/event/octeontx2/meson.build b/drivers/event/octeontx2/meson.build
> > > index 96ebb1f2e7..1ebc51f73f 100644
> > > --- a/drivers/event/octeontx2/meson.build
> > > +++ b/drivers/event/octeontx2/meson.build
> > > @@ -2,7 +2,7 @@
> > >  # Copyright(C) 2019 Marvell International Ltd.
> > >  #
> > >
> > > -if not dpdk_conf.get('RTE_ARCH_64')
> > > +if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
> > >      build = false
> > >      reason = 'only supported on 64-bit'
> > >      subdir_done()
> >
> > I did not suggest this possibility.
> > That's the same as for other octeon drivers, such change has been
> > deferred to 21.11.
> > https://patches.dpdk.org/project/dpdk/list/?series=15885
> >
> > >
> > > Or of this does not work, then we would need to add exception in ABI checking.
> > > Any suggestions how to do this?
> >
> > Sorry, no good idea from me.
> 
> We would need to revert the change breaking the ABI test.
> But I don't understand why it seems passing in recent CI runs?
> 
It is passing because FreeBSD is currently skipped. Right David?
BTW, no need to revert, we would be sending a patch to enable compilation
of crypto/octeontx


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [EXT] Re: [PATCH v2 1/2] drivers: add octeontx crypto adapter framework
  2021-07-21  9:44  3%             ` Thomas Monjalon
@ 2021-07-21 15:11  4%               ` Brandon Lo
  2021-07-22  7:45  0%               ` Akhil Goyal
  1 sibling, 0 replies; 200+ results
From: Brandon Lo @ 2021-07-21 15:11 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Akhil Goyal, dev, Ray Kinsella, Pavan Nikhilesh Bhagavatula,
	Anoob Joseph, Abhinandan Gujjar, Ankur Dwivedi,
	Jerin Jacob Kollanukkaran, Aaron Conole, dpdklab, Lincoln Lavoie,
	Shijith Thotton, David Marchand

On Wed, Jul 21, 2021 at 5:44 AM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 20/07/2021 14:14, David Marchand:
> > On Tue, Jul 20, 2021 at 1:59 PM Akhil Goyal <gakhil@marvell.com> wrote:
> > >
> > >  Hi David,
> > > >
> > > > > >  deps += ['common_octeontx', 'mempool_octeontx', 'bus_vdev',
> > > > > 'net_octeontx']
> > > > > > +deps += ['crypto_octeontx']
> > > > >
> > > > > This extra dependency resulted in disabling the event/octeontx driver
> > > > > in FreeBSD, since crypto/octeontx only builds on Linux.
> > > > > Removing hw support triggers a ABI failure for FreeBSD.
> > > > >
> > > > >
> > > > > - This had been reported by UNH CI:
> > > > > http://mails.dpdk.org/archives/test-report/2021-June/200637.html
> > > > > It seems the result has been ignored but it should have at least
> > > > > raised some discussion.
> > > > >
> > > > This was highlighted to CI ML
> > > > http://patches.dpdk.org/project/dpdk/patch/0686a7c3fb3a22e37378a8545b
> > > > c37bce04f4c391.1624481225.git.sthotton@marvell.com/
> > > >
> > > > but I think I missed to take the follow up with Brandon and applied the patch
> > > > as it did not look an issue to me as octeon drivers are not currently built on
> > > > FreeBSD.
> > > > Not sure why event driver is getting built there.
> > > >
> > > > >
> > > > > - I asked UNH to stop testing FreeBSD abi for now, waiting to get the
> > > > > main branch fixed.
> > > > >
> > > > > I don't have the time to look at this, please can you work on it?
> > > > >
> > > > > Several options:
> > > > > * crypto/octeontx is made so that it compiles on FreeBSD,
> > > > > * the abi check is extended to have exceptions per OS,
> > > > > * the FreeBSD abi reference is regenerated at UNH not to have those
> > > > > drivers in it (not sure it is doable),
> > > >
> > > > Thanks for the suggestions, we are working on it to resolve this as soon as
> > > > possible.
> > > > We may need to add exception in ABI checking so that it does not shout if a
> > > > PMD
> > > > is not compiled.
> > > Can we have below change? Will it work to disable compilation of
> > > event/octeontx2 for FreeBSD? I believe this was done by mistake earlier
> > > as all other octeontx2 drivers are compiled off on platforms other than Linux.
> > >
> > > diff --git a/drivers/event/octeontx2/meson.build b/drivers/event/octeontx2/meson.build
> > > index 96ebb1f2e7..1ebc51f73f 100644
> > > --- a/drivers/event/octeontx2/meson.build
> > > +++ b/drivers/event/octeontx2/meson.build
> > > @@ -2,7 +2,7 @@
> > >  # Copyright(C) 2019 Marvell International Ltd.
> > >  #
> > >
> > > -if not dpdk_conf.get('RTE_ARCH_64')
> > > +if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
> > >      build = false
> > >      reason = 'only supported on 64-bit'
> > >      subdir_done()
> >
> > I did not suggest this possibility.
> > That's the same as for other octeon drivers, such change has been
> > deferred to 21.11.
> > https://patches.dpdk.org/project/dpdk/list/?series=15885
> >
> > >
> > > Or of this does not work, then we would need to add exception in ABI checking.
> > > Any suggestions how to do this?
> >
> > Sorry, no good idea from me.
>
> We would need to revert the change breaking the ABI test.
> But I don't understand why it seems passing in recent CI runs?

Hi Thomas,

For the UNH lab, FreeBSD 13 ABI tests have been disabled due to a request
made during the community CI meeting on July 15th.

The recent CI ABI runs will show up as passes, but the older runs with
FreeBSD 13 included will keep their recorded failures.

Thanks,
Brandon


--
Brandon Lo
UNH InterOperability Laboratory
21 Madbury Rd, Suite 100, Durham, NH 03824
blo@iol.unh.edu
www.iol.unh.edu

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [EXT] Re: [PATCH v2 1/2] drivers: add octeontx crypto adapter framework
  2021-07-20 12:14  0%           ` David Marchand
@ 2021-07-21  9:44  3%             ` Thomas Monjalon
  2021-07-21 15:11  4%               ` Brandon Lo
  2021-07-22  7:45  0%               ` Akhil Goyal
  0 siblings, 2 replies; 200+ results
From: Thomas Monjalon @ 2021-07-21  9:44 UTC (permalink / raw)
  To: Akhil Goyal
  Cc: dev, Ray Kinsella, Pavan Nikhilesh Bhagavatula, Anoob Joseph,
	Abhinandan Gujjar, Ankur Dwivedi, Jerin Jacob Kollanukkaran,
	Aaron Conole, dpdklab, Lincoln Lavoie, Shijith Thotton,
	David Marchand

20/07/2021 14:14, David Marchand:
> On Tue, Jul 20, 2021 at 1:59 PM Akhil Goyal <gakhil@marvell.com> wrote:
> >
> >  Hi David,
> > >
> > > > >  deps += ['common_octeontx', 'mempool_octeontx', 'bus_vdev',
> > > > 'net_octeontx']
> > > > > +deps += ['crypto_octeontx']
> > > >
> > > > This extra dependency resulted in disabling the event/octeontx driver
> > > > in FreeBSD, since crypto/octeontx only builds on Linux.
> > > > Removing hw support triggers a ABI failure for FreeBSD.
> > > >
> > > >
> > > > - This had been reported by UNH CI:
> > > > http://mails.dpdk.org/archives/test-report/2021-June/200637.html
> > > > It seems the result has been ignored but it should have at least
> > > > raised some discussion.
> > > >
> > > This was highlighted to CI ML
> > > http://patches.dpdk.org/project/dpdk/patch/0686a7c3fb3a22e37378a8545b
> > > c37bce04f4c391.1624481225.git.sthotton@marvell.com/
> > >
> > > but I think I missed to take the follow up with Brandon and applied the patch
> > > as it did not look an issue to me as octeon drivers are not currently built on
> > > FreeBSD.
> > > Not sure why event driver is getting built there.
> > >
> > > >
> > > > - I asked UNH to stop testing FreeBSD abi for now, waiting to get the
> > > > main branch fixed.
> > > >
> > > > I don't have the time to look at this, please can you work on it?
> > > >
> > > > Several options:
> > > > * crypto/octeontx is made so that it compiles on FreeBSD,
> > > > * the abi check is extended to have exceptions per OS,
> > > > * the FreeBSD abi reference is regenerated at UNH not to have those
> > > > drivers in it (not sure it is doable),
> > >
> > > Thanks for the suggestions, we are working on it to resolve this as soon as
> > > possible.
> > > We may need to add exception in ABI checking so that it does not shout if a
> > > PMD
> > > is not compiled.
> > Can we have below change? Will it work to disable compilation of
> > event/octeontx2 for FreeBSD? I believe this was done by mistake earlier
> > as all other octeontx2 drivers are compiled off on platforms other than Linux.
> >
> > diff --git a/drivers/event/octeontx2/meson.build b/drivers/event/octeontx2/meson.build
> > index 96ebb1f2e7..1ebc51f73f 100644
> > --- a/drivers/event/octeontx2/meson.build
> > +++ b/drivers/event/octeontx2/meson.build
> > @@ -2,7 +2,7 @@
> >  # Copyright(C) 2019 Marvell International Ltd.
> >  #
> >
> > -if not dpdk_conf.get('RTE_ARCH_64')
> > +if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
> >      build = false
> >      reason = 'only supported on 64-bit'
> >      subdir_done()
> 
> I did not suggest this possibility.
> That's the same as for other octeon drivers, such change has been
> deferred to 21.11.
> https://patches.dpdk.org/project/dpdk/list/?series=15885
> 
> >
> > Or of this does not work, then we would need to add exception in ABI checking.
> > Any suggestions how to do this?
> 
> Sorry, no good idea from me.

We would need to revert the change breaking the ABI test.
But I don't understand why it seems passing in recent CI runs?



^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] eal: fix argument to rte_bsf32_safe
  2021-07-19 22:00  3%   ` Stephen Hemminger
@ 2021-07-20 13:26  0%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2021-07-20 13:26 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Tyler Retzlaff, dev, anatoly.burakov, david.marchand

20/07/2021 00:00, Stephen Hemminger:
> On Mon, 19 Jul 2021 10:15:34 -0700
> Tyler Retzlaff <roretzla@linux.microsoft.com> wrote:
> 
> > On Tue, Jul 13, 2021 at 01:12:21PM -0700, Stephen Hemminger wrote:
> > > The first argument to rte_bsf32_safe was incorrectly declared as
> > > a 64 bit value. This function only correctly handles on 32 bit values
> > > and the underlying function rte_bsf32 only accepts 32 bit values.
> > > This was introduced when the safe version was added and probably cause
> > > by copy/paste from the 64 bit version.  
> > 
> > there are multiple errors in this family of functions [1] both in usage
> > and signatures. we previously discussed rolling all fixes up into a single
> > patch and announcing an api break.
> > 
> > a doc patch was submitted as per the process documented for breaking api
> > but received no replies [2]
> > 
> > i have a full patch that corrects the whole family if you would like to
> > take it instead. contact me offline if you are interested.
> > 
> > 1. http://mails.dpdk.org/archives/dev/2021-March/201590.html
> > 2. http://mails.dpdk.org/archives/dev/2021-March/201868.html
> > 
> > the change stand-alone is correct so
> > 
> > Acked-By: Tyler Retzlaff <roretzla@linux.microsoft.com>
> 
> Thanks, I think the larger set should go into 21.11 where API/ABI break
> would be ok. My bit was all about fixing the bug where current code
> breaks C++ users.

Shouldn't we have a note in the API changes section of the release notes?
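
For reference, a minimal sketch of why the 64-bit argument was misleading.
demo_bsf32() and the two demo_bsf32_safe_*() functions are hypothetical
stand-ins written for this illustration, not the DPDK implementations; the
portable loop replaces the compiler builtin so that a zero input stays well
defined in the demo.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a 32-bit "bit scan forward": index of the lowest set bit.
 * Returns 32 when no bit is set, so the demo never hits undefined behaviour. */
static inline uint32_t
demo_bsf32(uint32_t v)
{
	uint32_t i = 0;

	while (i < 32 && !(v & (UINT32_C(1) << i)))
		i++;
	return i;
}

/* Old-style prototype: accepts 64 bits but only the low 32 are scanned. */
static inline int
demo_bsf32_safe_old(uint64_t v, uint32_t *pos)
{
	assert(pos != NULL);
	if (v == 0)	/* zero check is done on the full 64-bit value */
		return 0;
	*pos = demo_bsf32((uint32_t)v);	/* upper 32 bits are dropped here */
	return 1;
}

/* Fixed prototype: the zero check and the scan see the same 32-bit value. */
static inline int
demo_bsf32_safe_new(uint32_t v, uint32_t *pos)
{
	assert(pos != NULL);
	if (v == 0)
		return 0;
	*pos = demo_bsf32(v);
	return 1;
}
```

With the old-style prototype in this sketch, a caller passing
UINT64_C(1) << 40 gets a success return together with a meaningless
position. With the 32-bit prototype, any narrowing happens visibly at the
call site and the zero check sees the value that is actually scanned.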




^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [EXT] Re: [PATCH v2 1/2] drivers: add octeontx crypto adapter framework
  2021-07-20 11:58  3%         ` Akhil Goyal
@ 2021-07-20 12:14  0%           ` David Marchand
  2021-07-21  9:44  3%             ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2021-07-20 12:14 UTC (permalink / raw)
  To: Akhil Goyal
  Cc: Thomas Monjalon, Ray Kinsella, dev, Pavan Nikhilesh Bhagavatula,
	Anoob Joseph, Abhinandan Gujjar, Ankur Dwivedi,
	Jerin Jacob Kollanukkaran, Aaron Conole, dpdklab, Lincoln Lavoie,
	Shijith Thotton

On Tue, Jul 20, 2021 at 1:59 PM Akhil Goyal <gakhil@marvell.com> wrote:
>
>  Hi David,
> >
> > > >  deps += ['common_octeontx', 'mempool_octeontx', 'bus_vdev',
> > > 'net_octeontx']
> > > > +deps += ['crypto_octeontx']
> > >
> > > This extra dependency resulted in disabling the event/octeontx driver
> > > in FreeBSD, since crypto/octeontx only builds on Linux.
> > > Removing hw support triggers a ABI failure for FreeBSD.
> > >
> > >
> > > - This had been reported by UNH CI:
> > > http://mails.dpdk.org/archives/test-report/2021-June/200637.html
> > > It seems the result has been ignored but it should have at least
> > > raised some discussion.
> > >
> > This was highlighted to CI ML
> > http://patches.dpdk.org/project/dpdk/patch/0686a7c3fb3a22e37378a8545b
> > c37bce04f4c391.1624481225.git.sthotton@marvell.com/
> >
> > but I think I missed to take the follow up with Brandon and applied the patch
> > as it did not look an issue to me as octeon drivers are not currently built on
> > FreeBSD.
> > Not sure why event driver is getting built there.
> >
> > >
> > > - I asked UNH to stop testing FreeBSD abi for now, waiting to get the
> > > main branch fixed.
> > >
> > > I don't have the time to look at this, please can you work on it?
> > >
> > > Several options:
> > > * crypto/octeontx is made so that it compiles on FreeBSD,
> > > * the abi check is extended to have exceptions per OS,
> > > * the FreeBSD abi reference is regenerated at UNH not to have those
> > > drivers in it (not sure it is doable),
> >
> > Thanks for the suggestions, we are working on it to resolve this as soon as
> > possible.
> > We may need to add exception in ABI checking so that it does not shout if a
> > PMD
> > is not compiled.
> Can we have below change? Will it work to disable compilation of
> event/octeontx2 for FreeBSD? I believe this was done by mistake earlier
> as all other octeontx2 drivers are compiled off on platforms other than Linux.
>
> diff --git a/drivers/event/octeontx2/meson.build b/drivers/event/octeontx2/meson.build
> index 96ebb1f2e7..1ebc51f73f 100644
> --- a/drivers/event/octeontx2/meson.build
> +++ b/drivers/event/octeontx2/meson.build
> @@ -2,7 +2,7 @@
>  # Copyright(C) 2019 Marvell International Ltd.
>  #
>
> -if not dpdk_conf.get('RTE_ARCH_64')
> +if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
>      build = false
>      reason = 'only supported on 64-bit'
>      subdir_done()

I did not suggest this possibility.
That's the same as for other octeon drivers, such change has been
deferred to 21.11.
https://patches.dpdk.org/project/dpdk/list/?series=15885



>
> Or of this does not work, then we would need to add exception in ABI checking.
> Any suggestions how to do this?

Sorry, no good idea from me.


-- 
David Marchand


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [EXT] Re: [PATCH v2 1/2] drivers: add octeontx crypto adapter framework
  2021-07-16  8:39  3%       ` [dpdk-dev] [EXT] " Akhil Goyal
@ 2021-07-20 11:58  3%         ` Akhil Goyal
  2021-07-20 12:14  0%           ` David Marchand
  0 siblings, 1 reply; 200+ results
From: Akhil Goyal @ 2021-07-20 11:58 UTC (permalink / raw)
  To: David Marchand, Thomas Monjalon, Ray Kinsella
  Cc: dev, Pavan Nikhilesh Bhagavatula, Anoob Joseph,
	Abhinandan Gujjar, Ankur Dwivedi, Jerin Jacob Kollanukkaran,
	Aaron Conole, dpdklab, Lincoln Lavoie, Shijith Thotton

 Hi David,
> 
> > >  deps += ['common_octeontx', 'mempool_octeontx', 'bus_vdev',
> > 'net_octeontx']
> > > +deps += ['crypto_octeontx']
> >
> > This extra dependency resulted in disabling the event/octeontx driver
> > in FreeBSD, since crypto/octeontx only builds on Linux.
> > Removing hw support triggers an ABI failure for FreeBSD.
> >
> >
> > - This had been reported by UNH CI:
> > http://mails.dpdk.org/archives/test-report/2021-June/200637.html
> > It seems the result has been ignored but it should have at least
> > raised some discussion.
> >
> This was highlighted to CI ML
> http://patches.dpdk.org/project/dpdk/patch/0686a7c3fb3a22e37378a8545b
> c37bce04f4c391.1624481225.git.sthotton@marvell.com/
> 
> but I think I missed following up with Brandon and applied the patch,
> as it did not look like an issue to me since octeon drivers are not
> currently built on FreeBSD.
> Not sure why the event driver is getting built there.
> 
> >
> > - I asked UNH to stop testing FreeBSD abi for now, waiting to get the
> > main branch fixed.
> >
> > I don't have the time to look at this, please can you work on it?
> >
> > Several options:
> > * crypto/octeontx is made so that it compiles on FreeBSD,
> > * the abi check is extended to have exceptions per OS,
> > * the FreeBSD abi reference is regenerated at UNH not to have those
> > drivers in it (not sure it is doable),
> 
> Thanks for the suggestions, we are working on it to resolve this as soon as
> possible.
> We may need to add an exception in ABI checking so that it does not complain
> if a PMD is not compiled.
Can we have the below change? Will it work to disable compilation of
event/octeontx2 for FreeBSD? I believe this was done by mistake earlier,
as all other octeontx2 drivers are compiled out on platforms other than Linux.

diff --git a/drivers/event/octeontx2/meson.build b/drivers/event/octeontx2/meson.build
index 96ebb1f2e7..1ebc51f73f 100644
--- a/drivers/event/octeontx2/meson.build
+++ b/drivers/event/octeontx2/meson.build
@@ -2,7 +2,7 @@
 # Copyright(C) 2019 Marvell International Ltd.
 #

-if not dpdk_conf.get('RTE_ARCH_64')
+if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
     build = false
     reason = 'only supported on 64-bit'
     subdir_done()

Or if this does not work, then we would need to add an exception in ABI checking.
Any suggestions how to do this?

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] ethdev: fix representor port ID search by name
  2021-07-19 12:50  0%         ` Xueming(Steven) Li
@ 2021-07-20  8:59  0%           ` Andrew Rybchenko
  2021-07-29  4:13  0%             ` Xueming(Steven) Li
  0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2021-07-20  8:59 UTC (permalink / raw)
  To: Xueming(Steven) Li, Ajit Khaparde, Somnath Kotur, John Daley,
	Hyong Youb Kim, Beilei Xing, Qiming Yang, Qi Zhang, Haiyue Wang,
	Matan Azrad, Shahaf Shuler, Slava Ovsiienko,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit
  Cc: dev, Viacheslav Galaktionov, stable

On 7/19/21 3:50 PM, Xueming(Steven) Li wrote:
> 
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Monday, July 19, 2021 8:36 PM
>> To: Xueming(Steven) Li <xuemingl@nvidia.com>; Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur
>> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>; Hyong Youb Kim <hyonkim@cisco.com>; Beilei Xing
>> <beilei.xing@intel.com>; Qiming Yang <qiming.yang@intel.com>; Qi Zhang <qi.z.zhang@intel.com>; Haiyue Wang
>> <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Slava Ovsiienko
>> <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
>> Cc: dev@dpdk.org; Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
>> Subject: Re: [PATCH] ethdev: fix representor port ID search by name
>>
>> On 7/19/21 2:54 PM, Xueming(Steven) Li wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>> Sent: Monday, July 19, 2021 4:46 PM
>>>> To: Xueming(Steven) Li <xuemingl@nvidia.com>; Ajit Khaparde
>>>> <ajit.khaparde@broadcom.com>; Somnath Kotur
>>>> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>; Hyong
>>>> Youb Kim <hyonkim@cisco.com>; Beilei Xing <beilei.xing@intel.com>;
>>>> Qiming Yang <qiming.yang@intel.com>; Qi Zhang <qi.z.zhang@intel.com>;
>>>> Haiyue Wang <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>;
>>>> Shahaf Shuler <shahafs@nvidia.com>; Slava Ovsiienko
>>>> <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon
>>>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
>>>> Cc: dev@dpdk.org; Viacheslav Galaktionov
>>>> <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
>>>> Subject: Re: [PATCH] ethdev: fix representor port ID search by name
>>>>
>>>> On 7/19/21 9:58 AM, Xueming(Steven) Li wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>>>> Sent: Tuesday, July 13, 2021 12:18 AM
>>>>>> To: Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur
>>>>>> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>;
>>>>>> Hyong Youb Kim <hyonkim@cisco.com>; Beilei Xing
>>>>>> <beilei.xing@intel.com>; Qiming Yang <qiming.yang@intel.com>; Qi
>>>>>> Zhang <qi.z.zhang@intel.com>; Haiyue Wang <haiyue.wang@intel.com>;
>>>>>> Matan Azrad <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>;
>>>>>> Slava Ovsiienko <viacheslavo@nvidia.com>; NBU-Contact-Thomas
>>>>>> Monjalon <thomas@monjalon.net>; Ferruh Yigit
>>>>>> <ferruh.yigit@intel.com>;
>>>>>> Xueming(Steven) Li <xuemingl@nvidia.com>
>>>>>> Cc: dev@dpdk.org; Viacheslav Galaktionov
>>>>>> <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
>>>>>> Subject: [PATCH] ethdev: fix representor port ID search by name
>>>>>>
>>>>>> From: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>
>>>>>>
>>>>>> Fix representor port ID search by name if the representor itself
>>>>>> does not provide representors info. Getting a list of representors
>>>>>> from a representor does not make sense. Instead, a parent device
>>>> should be used.
>>>>>>
>>>>>> To this end, extend the rte_eth_dev_data structure to include the port ID of the parent device for representors.
>>>>>>
>>>>>> Fixes: df7547a6a2cc ("ethdev: add helper function to get
>>>>>> representor
>>>>>> ID")
>>>>>> Cc: stable@dpdk.org
>>>>>>
>>>>>> Signed-off-by: Viacheslav Galaktionov
>>>>>> <viacheslav.galaktionov@oktetlabs.ru>
>>>>>> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>>>> ---
>>>>>> The new field is added into the hole in rte_eth_dev_data structure.
>>>>>> The patch does not change ABI, but extra care is required since ABI
>>>>>> check is disabled for the structure because of the libabigail bug
>>>> [1].
>>>>>>
>>>>>> Potentially it is bad for out-of-tree drivers which implement
>>>>>> representors but do not fill in a new parent_port_id field in rte_eth_dev_data structure. Do we care?
>>>>>>
>>>>>> Maybe the patch should add lines to release notes, but I'd like to get initial feedback first.
>>>>>>
>>>>>> mlx5 changes should be reviewed by maintainers very carefully, since we are not sure if we patch it correctly.
>>>>>>
>>>>>> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=28060
>>>>
>>>> [snip]
>>>>
>>>>>> --- a/lib/ethdev/ethdev_driver.h
>>>>>> +++ b/lib/ethdev/ethdev_driver.h
>>>>>> @@ -1248,8 +1248,8 @@ struct rte_eth_devargs {
>>>>>>      * For backward compatibility, if no representor info, direct
>>>>>>      * map legacy VF (no controller and pf).
>>>>>>      *
>>>>>> - * @param ethdev
>>>>>> - *  Handle of ethdev port.
>>>>>> + * @param parent_port_id
>>>>>> + *  Port ID of the backing device.
>>>>>>      * @param type
>>>>>>      *  Representor type.
>>>>>>      * @param controller
>>>>>> @@ -1266,7 +1266,7 @@ struct rte_eth_devargs {
>>>>>>      */
>>>>>>     __rte_internal
>>>>>>     int
>>>>>> -rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
>>>>>> +rte_eth_representor_id_get(uint16_t parent_port_id,
>>>>>
>>>>> It makes more sense to get representor info from the parent port.
>>>>> A representor is a member of a switch domain; the PMD owns the information
>>>>> of the representor owner port and info of representors. This change
>>>>> looks better, but not sure whether it is valuable to introduce a new
>>>> member to the EAL data structure.
>>>>
>>>> IMHO, it is simply incorrect to return representors info on a
>>>> representor itself. Representor info describes which representors may be populated using the device.
>>>>
>>>> If above statement is correct, we need a way to get parent device by
>>>> representor to do name to representor ID mapping. I see two options to do it:
>>>>     A. Dedicated field in rte_eth_dev_data as the patch does.
>>>>     B. Dedicated ethdev op (since representor knows parent port ID anyway).
>>>> We have chosen (A) because of simplicity.
>>>
>>> Just recalled that a representor port could be probed w/o the owner PF; is a parent port mandatory?
>>
>> I thought that it is impossible and parent port is absolutely required for a representor. Could you provide an example and explain how
>> will it work?
> 
> In case of bonding, PF0 and PF1 become one PF port `bond0`, PCI address is PF0.
> 	-a <PF0>,representor=pf[0-1]vf[0-99] // this is the syntax we proposed.

Is it net/bonding or vendor-specific bonding in HW?
If I remember correctly in the case of net/bonding we have ethdev ports
for bonded devices.

> 
> To be backward compatible, also support the following 2 devargs:
> 	-a <pf0>,representor=[0-99] // probe bond0 and representor on pf0
> 	-a <pf1>,representor=[0-99] // probe representors on pf1.
> If devargs start with PF1, no owner port is created for PF1 as it is disabled in bonding. bond0 (PF0) can't be created automatically here as
> the device is located by the PCI address (PF1) from devargs.

So, I guess the problem is vendor-specific bonding in HW. Anyway
legacy backward compatible representor spec should not require
representors info since it worked before without it. So, it does
not sound like a reason to have representors info on a representor
itself.
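
For illustration, option (A) from the quoted discussion can be sketched in plain C as below. This is only a sketch under stated assumptions: `sketch_dev_data`, `sketch_ports` and `sketch_info_port()` are hypothetical names that merely mimic the idea of a parent port ID stored in per-port data; they are not the ethdev structures or functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_PORTS 8

/* Hypothetical, much reduced stand-in for per-port device data. */
struct sketch_dev_data {
	uint16_t port_id;
	uint16_t parent_port_id; /* meaningful only when is_representor != 0 */
	int is_representor;
};

/* Hypothetical port table, indexed by port ID. */
static struct sketch_dev_data sketch_ports[SKETCH_MAX_PORTS];

/*
 * Return the port ID whose representor info should be queried:
 * the parent for a representor, the port itself otherwise.
 */
static uint16_t
sketch_info_port(uint16_t port_id)
{
	assert(port_id < SKETCH_MAX_PORTS);
	const struct sketch_dev_data *data = &sketch_ports[port_id];

	return data->is_representor ? data->parent_port_id : data->port_id;
}
```

With such a field, a name to representor ID lookup started on a representor would first be redirected to the backing port, which is the behaviour the patch description argues for.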

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH 0/2] Improvements to rte_security
@ 2021-07-20  5:46  3% Anoob Joseph
  0 siblings, 0 replies; 200+ results
From: Anoob Joseph @ 2021-07-20  5:46 UTC (permalink / raw)
  To: Akhil Goyal, Declan Doherty, Fan Zhang, Konstantin Ananyev
  Cc: Anoob Joseph, Jerin Jacob, Ankur Dwivedi, Tejasree Kondoj, dev

Add options for offloading
- IV generation
- SA lifetime

With lookaside protocol (IPsec) offloads, the application is expected to
provide the IV in rte_crypto_op. For cryptodevs which can generate true
random numbers, this operation can be offloaded.

SA lifetime is used in tracking SA expiries and initiating SA renegotiation.
For cryptodevs which can track expiries, this operation can be offloaded.

This patchset introduces ABI breakages and is intended for the 21.11 release.

Anoob Joseph (2):
  lib/security: add IV generation
  lib/security: add SA lifetime configuration

 examples/ipsec-secgw/ipsec.c |  2 +-
 examples/ipsec-secgw/ipsec.h |  2 +-
 lib/cryptodev/rte_crypto.h   |  7 +++++++
 lib/security/rte_security.h  | 42 ++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 49 insertions(+), 4 deletions(-)

-- 
2.7.4


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] eal: fix argument to rte_bsf32_safe
  2021-07-19 17:15  0% ` Tyler Retzlaff
@ 2021-07-19 22:00  3%   ` Stephen Hemminger
  2021-07-20 13:26  0%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2021-07-19 22:00 UTC (permalink / raw)
  To: Tyler Retzlaff; +Cc: dev, anatoly.burakov

On Mon, 19 Jul 2021 10:15:34 -0700
Tyler Retzlaff <roretzla@linux.microsoft.com> wrote:

> On Tue, Jul 13, 2021 at 01:12:21PM -0700, Stephen Hemminger wrote:
> > The first argument to rte_bsf32_safe was incorrectly declared as
> > a 64 bit value. This function only correctly handles 32 bit values
> > and the underlying function rte_bsf32 only accepts 32 bit values.
> > This was introduced when the safe version was added and was probably caused
> > by copy/paste from the 64 bit version.
> 
> there are multiple errors in this family of functions [1] both in usage
> and signatures. we previously discussed rolling all fixes up into a single
> patch and announcing an api break.
> 
> a doc patch was submitted as per the process documented for breaking api
> but received no replies [2]
> 
> i have a full patch that corrects the whole family if you would like to
> take it instead. contact me offline if you are interested.
> 
> 1. http://mails.dpdk.org/archives/dev/2021-March/201590.html
> 2. http://mails.dpdk.org/archives/dev/2021-March/201868.html
> 
> the change stand-alone is correct so
> 
> Acked-By: Tyler Retzlaff <roretzla@linux.microsoft.com>

Thanks, I think the larger set should go into 21.11 where API/ABI break
would be ok. My bit was all about fixing the bug where current code
breaks C++ users.
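
For illustration, a minimal standalone sketch of the corrected signature discussed above. The names `bsf32()` and `bsf32_safe()` are stand-ins for rte_bsf32() and rte_bsf32_safe(), and the use of the GCC/Clang builtin `__builtin_ctz` is an assumption about the compiler, not a quote from rte_common.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for rte_bsf32(): index of the least significant set bit.
 * Undefined for v == 0, like the builtin it wraps (GCC/Clang assumed).
 */
static inline uint32_t
bsf32(uint32_t v)
{
	return (uint32_t)__builtin_ctz(v);
}

/*
 * Stand-in for the fixed rte_bsf32_safe(): the argument is 32 bits wide,
 * matching what bsf32() accepts, so no implicit narrowing takes place.
 * Returns 0 if v was 0, otherwise stores the bit index in *pos and returns 1.
 */
static inline int
bsf32_safe(uint32_t v, uint32_t *pos)
{
	assert(pos != NULL);
	if (v == 0)
		return 0;

	*pos = bsf32(v);
	return 1;
}
```

With a uint64_t first parameter, the inner call would narrow the value implicitly, which is the kind of conversion a C++ build with -Wall and -Wextra reports.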


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] eal: fix argument to rte_bsf32_safe
  2021-07-13 20:12  3% [dpdk-dev] [PATCH] eal: fix argument to rte_bsf32_safe Stephen Hemminger
@ 2021-07-19 17:15  0% ` Tyler Retzlaff
  2021-07-19 22:00  3%   ` Stephen Hemminger
  2021-07-23  0:52  8% ` [dpdk-dev] [PATCH v2] " Stephen Hemminger
  2021-07-23 15:45  8% ` [dpdk-dev] [PATCH v3] " Stephen Hemminger
  2 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2021-07-19 17:15 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, anatoly.burakov

On Tue, Jul 13, 2021 at 01:12:21PM -0700, Stephen Hemminger wrote:
> The first argument to rte_bsf32_safe was incorrectly declared as
> a 64 bit value. This function only correctly handles 32 bit values
> and the underlying function rte_bsf32 only accepts 32 bit values.
> This was introduced when the safe version was added and was probably caused
> by copy/paste from the 64 bit version.

there are multiple errors in this family of functions [1] both in usage
and signatures. we previously discussed rolling all fixes up into a single
patch and announcing an api break.

a doc patch was submitted as per the process documented for breaking api
but received no replies [2]

i have a full patch that corrects the whole family if you would like to
take it instead. contact me offline if you are interested.

1. http://mails.dpdk.org/archives/dev/2021-March/201590.html
2. http://mails.dpdk.org/archives/dev/2021-March/201868.html

the change stand-alone is correct so

Acked-By: Tyler Retzlaff <roretzla@linux.microsoft.com>

> 
> The bug passed silently under the radar until some other code was
> built with -Wall and -Wextra in C++ and C++ complains about the
> missing cast.
> 
> Yes, this is an API signature change, but the original code was wrong.
> It is an inline so not an ABI change.
> 
> Fixes: 4e261f551986 ("eal: add 64-bit bsf and 32-bit safe bsf functions")
> Cc: anatoly.burakov@intel.com
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/eal/include/rte_common.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
> index d5a32c66a5fe..99eb5f1820ae 100644
> --- a/lib/eal/include/rte_common.h
> +++ b/lib/eal/include/rte_common.h
> @@ -623,7 +623,7 @@ rte_bsf32(uint32_t v)
>   *     Returns 0 if ``v`` was 0, otherwise returns 1.
>   */
>  static inline int
> -rte_bsf32_safe(uint64_t v, uint32_t *pos)
> +rte_bsf32_safe(uint32_t v, uint32_t *pos)
>  {
>  	if (v == 0)
>  		return 0;
> -- 
> 2.30.2

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] app/testpmd: fix testpmd doesn't show RSS hash offload
       [not found]             ` <DM8PR11MB5639C757A790F65CBFB647C2D1E19@DM8PR11MB5639.namprd11.prod.outlook.com>
@ 2021-07-19 16:18  0%           ` Ferruh Yigit
  2021-07-22 11:03  0%             ` Andrew Rybchenko
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2021-07-19 16:18 UTC (permalink / raw)
  To: Wang, Jie1X, Li, Xiaoyun, dev; +Cc: andrew.rybchenko, stable

On 7/19/2021 10:55 AM, Wang, Jie1X wrote:
> 
> 
>> -----Original Message-----
>> From: Yigit, Ferruh <ferruh.yigit@intel.com>
>> Sent: Friday, July 16, 2021 4:52 PM
>> To: Li, Xiaoyun <xiaoyun.li@intel.com>; Wang, Jie1X <jie1x.wang@intel.com>;
>> dev@dpdk.org
>> Cc: andrew.rybchenko@oktetlabs.ru; stable@dpdk.org
>> Subject: Re: [dpdk-stable] [PATCH v4] app/testpmd: fix testpmd doesn't show
>> RSS hash offload
>>
>> On 7/16/2021 9:30 AM, Li, Xiaoyun wrote:
>>>> -----Original Message-----
>>>> From: stable <stable-bounces@dpdk.org> On Behalf Of Li, Xiaoyun
>>>> Sent: Thursday, July 15, 2021 12:54
>>>> To: Wang, Jie1X <jie1x.wang@intel.com>; dev@dpdk.org
>>>> Cc: andrew.rybchenko@oktetlabs.ru; stable@dpdk.org
>>>> Subject: Re: [dpdk-stable] [PATCH v4] app/testpmd: fix testpmd
>>>> doesn't show RSS hash offload
>>>>
>>>>> -----Original Message-----
>>>>> From: Wang, Jie1X <jie1x.wang@intel.com>
>>>>> Sent: Thursday, July 15, 2021 19:57
>>>>> To: dev@dpdk.org
>>>>> Cc: Li, Xiaoyun <xiaoyun.li@intel.com>;
>>>>> andrew.rybchenko@oktetlabs.ru; Wang, Jie1X <jie1x.wang@intel.com>;
>>>>> stable@dpdk.org
>>>>> Subject: [PATCH v4] app/testpmd: fix testpmd doesn't show RSS hash
>>>>> offload
>>>>>
>>>>> The driver may change offloads info into dev->data->dev_conf in
>>>>> dev_configure which may cause port->dev_conf and port->rx_conf
>>>>> contain
>>>> outdated values.
>>>>>
>>>>> This patch updates the offloads info if it changes to fix this issue.
>>>>>
>>>>> Fixes: ce8d561418d4 ("app/testpmd: add port configuration settings")
>>>>> Cc: stable@dpdk.org
>>>>>
>>>>> Signed-off-by: Jie Wang <jie1x.wang@intel.com>
>>>>> ---
>>>>> v4: delete the whitespace at the end of the line.
>>>>> v3:
>>>>>  - check and update the "offloads" of "port->dev_conf.rx/txmode".
>>>>>  - update the commit log.
>>>>> v2: copy "rx/txmode.offloads", instead of copying the entire struct
>>>>> "dev->data-
>>>>>> dev_conf.rx/txmode".
>>>>> ---
>>>>>  app/test-pmd/testpmd.c | 27 +++++++++++++++++++++++++++
>>>>>  1 file changed, 27 insertions(+)
>>>>
>>>> Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
>>>
>>> Although I gave my ack, app shouldn't touch rte_eth_devices which this patch
>> does. Usually, testpmd should only call function like
>> eth_dev_info_get_print_err().
>>> But dev_info doesn't contain the info dev->data->dev_conf which the driver
>> modifies.
>>>
>>> Probably we need a better fix.
>>>
>>
>> Agree, an application directly accessing 'rte_eth_devices' is a sign of something
>> missing/wrong.
>>
>> In this case there is no way for the application to know what the configured
>> offload settings are per port and queue, which is the missing part I think.
>>
>> As you said normally we get data from PMD mainly via 'rte_eth_dev_info_get()',
>> which is an overloaded function, it provides many different things, like driver
>> default values, limitations, current config/status, capabilities etc...
>>
>> So I think we can do a few things:
>> 1) Add current offload configuration to 'rte_eth_dev_info_get()', so application
>> can get it and use it.
>> The advantage is that this API is already called in many places, many times, so there is a
>> big chance that the application already has this information when it needs it.
>> The disadvantage is, as mentioned above, that the API is already big and messy; making it
>> bigger makes it more error prone and makes it easier to break the ABI.
>>
> I prefer to choose the 1st suggestion. 
> 
> Normally PMD gets data via 'rte_eth_dev_info_get()'. When we add offloads configuration 
> to it, we can get offloads as same as getting other info.
> 

Most probably it is easier to implement 1). I see your point, but as said before
I think 'rte_eth_dev_info_get()' is already messy and I am worried about making it
even bigger.

I prefer option 2).

@Thomas, @Andrew, what do you think?


>> 2) Add a new API to get configured offload information, so a specific API for it.
>>
>> 3) Get a more generic API to get configured config (dev_conf) which will cover
>> offloads too.
>> The disadvantage can be leaking too much internal config to the user unintentionally.
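
For illustration, option 2) could look roughly like the sketch below: a narrow query that returns only the offloads the driver ended up configuring. Everything here is hypothetical (`sketch_port_conf`, `sketch_conf`, `sketch_offloads_get()`, and the choice of error codes); no such ethdev API is claimed to exist.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PORTS 4

/* Hypothetical per-port configuration as finalised by the driver. */
struct sketch_port_conf {
	int configured;       /* non-zero once the port has been configured */
	uint64_t rx_offloads; /* Rx offload bitmask after driver adjustments */
	uint64_t tx_offloads; /* Tx offload bitmask after driver adjustments */
};

/* Hypothetical port table, indexed by port ID. */
static struct sketch_port_conf sketch_conf[SKETCH_MAX_PORTS];

/*
 * Copy the configured offloads of one port to the caller.
 * Returns 0 on success, -EINVAL on bad arguments and
 * -ENODEV if the port has not been configured yet.
 */
static int
sketch_offloads_get(uint16_t port_id, uint64_t *rx, uint64_t *tx)
{
	if (port_id >= SKETCH_MAX_PORTS || rx == NULL || tx == NULL)
		return -EINVAL;
	if (!sketch_conf[port_id].configured)
		return -ENODEV;

	*rx = sketch_conf[port_id].rx_offloads;
	*tx = sketch_conf[port_id].tx_offloads;
	return 0;
}
```

An application such as testpmd could then refresh its cached values after configuring a port without reading the device array directly.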


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] eventdev: configure the Rx event buffer size
  2021-07-19 15:26  3%   ` Kundapura, Ganapati
@ 2021-07-19 16:13  3%     ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2021-07-19 16:13 UTC (permalink / raw)
  To: Kundapura, Ganapati; +Cc: Jayatheerthan, Jay, dpdk-dev

On Mon, Jul 19, 2021 at 8:57 PM Kundapura, Ganapati
<ganapati.kundapura@intel.com> wrote:
>
> Hi Jerin,

HI Ganapati

>    Please find my response inline.
>
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: 19 July 2021 12:14
> To: Kundapura, Ganapati <ganapati.kundapura@intel.com>
> Cc: Jayatheerthan, Jay <jay.jayatheerthan@intel.com>; dpdk-dev <dev@dpdk.org>
> Subject: Re: [dpdk-dev] [PATCH] eventdev: configure the Rx event buffer size
>
> On Fri, Jul 16, 2021 at 10:33 PM Ganapati Kundapura <ganapati.kundapura@intel.com> wrote:
> >
> > As of now Rx event buffer size is static and set to 128.
> >
> > This patch sets the Rx event buffer size to 192, configurable at
> > compile time and also errors out at run time if Rx event buffer size
> > is configured more than 16 bits.
> >
> > Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com>
> > ---
> >  config/rte_config.h                     |  1 +
> >  lib/eventdev/rte_event_eth_rx_adapter.c | 14 +++++++++++++-
> >  2 files changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/config/rte_config.h b/config/rte_config.h
> > index 590903c..3d938c8 100644
> > --- a/config/rte_config.h
> > +++ b/config/rte_config.h
> > @@ -77,6 +77,7 @@
> >  #define RTE_EVENT_ETH_INTR_RING_SIZE 1024
> >  #define RTE_EVENT_CRYPTO_ADAPTER_MAX_INSTANCE 32
> >  #define RTE_EVENT_ETH_TX_ADAPTER_MAX_INSTANCE 32
> > +#define RTE_EVENT_ETH_RX_ADAPTER_BUFFER_SIZE 128
>
> We are limiting the configuration options added to the rte_config.h file.
> Could you make it dynamic, with a default value that the application can override?
> [Ganapati]
> Making the Rx event buffer size dynamic seems to be a good idea, but in the case of the rx adapter,
> passing the event buffer size to the adapter create api requires an api signature change, which breaks ABI.
> Alternatively, adding the event buffer size to the port_config parameter, which comes from eventdev
> to the adapter create function, is not scalable, as the user can also call create_ext() with its own callback,
> and the parameter to the callback is a void * which is interpreted by the user space callback function.
>
> I think one way to make the event buffer size dynamic is to add a new api to set the event buffer size.
> If called, it'll set the event buffer size to the value passed; otherwise the rx adapter instance create api will use the
> default value.
>
> Let me know your opinion on this.

We can break the ABI in v21.11, so the create API config structure can change.
Please send a deprecation notice and submit the implementation for 21.11.

>

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] eventdev: configure the Rx event buffer size
  @ 2021-07-19 15:26  3%   ` Kundapura, Ganapati
  2021-07-19 16:13  3%     ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Kundapura, Ganapati @ 2021-07-19 15:26 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: Jayatheerthan, Jay, dpdk-dev

Hi Jerin,
   Please find my response inline.

-----Original Message-----
From: Jerin Jacob <jerinjacobk@gmail.com> 
Sent: 19 July 2021 12:14
To: Kundapura, Ganapati <ganapati.kundapura@intel.com>
Cc: Jayatheerthan, Jay <jay.jayatheerthan@intel.com>; dpdk-dev <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] eventdev: configure the Rx event buffer size

On Fri, Jul 16, 2021 at 10:33 PM Ganapati Kundapura <ganapati.kundapura@intel.com> wrote:
>
> As of now Rx event buffer size is static and set to 128.
>
> This patch sets the Rx event buffer size to 192, configurable at 
> compile time and also errors out at run time if Rx event buffer size 
> is configured more than 16 bits.
>
> Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com>
> ---
>  config/rte_config.h                     |  1 +
>  lib/eventdev/rte_event_eth_rx_adapter.c | 14 +++++++++++++-
>  2 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/config/rte_config.h b/config/rte_config.h
> index 590903c..3d938c8 100644
> --- a/config/rte_config.h
> +++ b/config/rte_config.h
> @@ -77,6 +77,7 @@
>  #define RTE_EVENT_ETH_INTR_RING_SIZE 1024
>  #define RTE_EVENT_CRYPTO_ADAPTER_MAX_INSTANCE 32
>  #define RTE_EVENT_ETH_TX_ADAPTER_MAX_INSTANCE 32
> +#define RTE_EVENT_ETH_RX_ADAPTER_BUFFER_SIZE 128

We are limiting the configuration options added to the rte_config.h file.
Could you make it dynamic, with a default value that the application can override?
[Ganapati]
Making the Rx event buffer size dynamic seems to be a good idea, but in the case of the rx adapter,
passing the event buffer size to the adapter create api requires an api signature change, which breaks ABI.
Alternatively, adding the event buffer size to the port_config parameter, which comes from eventdev
to the adapter create function, is not scalable, as the user can also call create_ext() with its own callback,
and the parameter to the callback is a void * which is interpreted by the user space callback function.

I think one way to make the event buffer size dynamic is to add a new api to set the event buffer size.
If called, it'll set the event buffer size to the value passed; otherwise the rx adapter instance create api will use the
default value.

Let me know your opinion on this.
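
For illustration, the "new api with a default value" idea above could be sketched as follows. All names (`sketch_rx_adapter`, `sketch_adapter_init()`, `sketch_adapter_set_buf_size()`) are hypothetical and are not part of the eventdev library; the default of 128 and the 16-bit limit are taken from the patch description.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DEFAULT_EVENT_BUF_SIZE 128u

/* Hypothetical adapter state; not the real Rx adapter structure. */
struct sketch_rx_adapter {
	uint16_t event_buf_size;
};

/* Create-time initialisation: start from the default buffer size. */
static void
sketch_adapter_init(struct sketch_rx_adapter *a)
{
	assert(a != NULL);
	a->event_buf_size = SKETCH_DEFAULT_EVENT_BUF_SIZE;
}

/*
 * Optional setter: override the default buffer size.
 * Rejects zero and any size that does not fit in 16 bits,
 * leaving the previous value untouched on error.
 */
static int
sketch_adapter_set_buf_size(struct sketch_rx_adapter *a, uint32_t size)
{
	assert(a != NULL);
	if (size == 0 || size > UINT16_MAX)
		return -EINVAL;

	a->event_buf_size = (uint16_t)size;
	return 0;
}
```

A separate setter keeps the existing create signature untouched, which is the ABI concern raised above; the cost is one more call for applications that want a non-default size.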


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] ethdev: fix representor port ID search by name
  2021-07-19 12:36  0%       ` Andrew Rybchenko
@ 2021-07-19 12:50  0%         ` Xueming(Steven) Li
  2021-07-20  8:59  0%           ` Andrew Rybchenko
  0 siblings, 1 reply; 200+ results
From: Xueming(Steven) Li @ 2021-07-19 12:50 UTC (permalink / raw)
  To: Andrew Rybchenko, Ajit Khaparde, Somnath Kotur, John Daley,
	Hyong Youb Kim, Beilei Xing, Qiming Yang, Qi Zhang, Haiyue Wang,
	Matan Azrad, Shahaf Shuler, Slava Ovsiienko,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit
  Cc: dev, Viacheslav Galaktionov, stable



> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Monday, July 19, 2021 8:36 PM
> To: Xueming(Steven) Li <xuemingl@nvidia.com>; Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur
> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>; Hyong Youb Kim <hyonkim@cisco.com>; Beilei Xing
> <beilei.xing@intel.com>; Qiming Yang <qiming.yang@intel.com>; Qi Zhang <qi.z.zhang@intel.com>; Haiyue Wang
> <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org; Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
> Subject: Re: [PATCH] ethdev: fix representor port ID search by name
> 
> On 7/19/21 2:54 PM, Xueming(Steven) Li wrote:
> >
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: Monday, July 19, 2021 4:46 PM
> >> To: Xueming(Steven) Li <xuemingl@nvidia.com>; Ajit Khaparde
> >> <ajit.khaparde@broadcom.com>; Somnath Kotur
> >> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>; Hyong
> >> Youb Kim <hyonkim@cisco.com>; Beilei Xing <beilei.xing@intel.com>;
> >> Qiming Yang <qiming.yang@intel.com>; Qi Zhang <qi.z.zhang@intel.com>;
> >> Haiyue Wang <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>;
> >> Shahaf Shuler <shahafs@nvidia.com>; Slava Ovsiienko
> >> <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon
> >> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
> >> Cc: dev@dpdk.org; Viacheslav Galaktionov
> >> <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
> >> Subject: Re: [PATCH] ethdev: fix representor port ID search by name
> >>
> >> On 7/19/21 9:58 AM, Xueming(Steven) Li wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>> Sent: Tuesday, July 13, 2021 12:18 AM
> >>>> To: Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur
> >>>> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>;
> >>>> Hyong Youb Kim <hyonkim@cisco.com>; Beilei Xing
> >>>> <beilei.xing@intel.com>; Qiming Yang <qiming.yang@intel.com>; Qi
> >>>> Zhang <qi.z.zhang@intel.com>; Haiyue Wang <haiyue.wang@intel.com>;
> >>>> Matan Azrad <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>;
> >>>> Slava Ovsiienko <viacheslavo@nvidia.com>; NBU-Contact-Thomas
> >>>> Monjalon <thomas@monjalon.net>; Ferruh Yigit
> >>>> <ferruh.yigit@intel.com>;
> >>>> Xueming(Steven) Li <xuemingl@nvidia.com>
> >>>> Cc: dev@dpdk.org; Viacheslav Galaktionov
> >>>> <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
> >>>> Subject: [PATCH] ethdev: fix representor port ID search by name
> >>>>
> >>>> From: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>
> >>>>
> >>>> Fix representor port ID search by name if the representor itself
> >>>> does not provide representors info. Getting a list of representors
> >>>> from a representor does not make sense. Instead, a parent device
> >> should be used.
> >>>>
> >>>> To this end, extend the rte_eth_dev_data structure to include the port ID of the parent device for representors.
> >>>>
> >>>> Fixes: df7547a6a2cc ("ethdev: add helper function to get
> >>>> representor
> >>>> ID")
> >>>> Cc: stable@dpdk.org
> >>>>
> >>>> Signed-off-by: Viacheslav Galaktionov
> >>>> <viacheslav.galaktionov@oktetlabs.ru>
> >>>> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>> ---
> >>>> The new field is added into the hole in rte_eth_dev_data structure.
> >>>> The patch does not change ABI, but extra care is required since ABI
> >>>> check is disabled for the structure because of the libabigail bug
> >> [1].
> >>>>
> >>>> Potentially it is bad for out-of-tree drivers which implement
> >>>> representors but do not fill in a new parent_port_id field in rte_eth_dev_data structure. Do we care?
> >>>>
> >>>> Maybe the patch should add lines to release notes, but I'd like to get initial feedback first.
> >>>>
> >>>> mlx5 changes should be reviewed by maintainers very carefully, since we are not sure if we patch it correctly.
> >>>>
> >>>> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=28060
> >>
> >> [snip]
> >>
> >>>> --- a/lib/ethdev/ethdev_driver.h
> >>>> +++ b/lib/ethdev/ethdev_driver.h
> >>>> @@ -1248,8 +1248,8 @@ struct rte_eth_devargs {
> >>>>     * For backward compatibility, if no representor info, direct
> >>>>     * map legacy VF (no controller and pf).
> >>>>     *
> >>>> - * @param ethdev
> >>>> - *  Handle of ethdev port.
> >>>> + * @param parent_port_id
> >>>> + *  Port ID of the backing device.
> >>>>     * @param type
> >>>>     *  Representor type.
> >>>>     * @param controller
> >>>> @@ -1266,7 +1266,7 @@ struct rte_eth_devargs {
> >>>>     */
> >>>>    __rte_internal
> >>>>    int
> >>>> -rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
> >>>> +rte_eth_representor_id_get(uint16_t parent_port_id,
> >>>
> >>> It makes more sense to get representor info from the parent port.
> >>> A representor is a member of a switch domain; the PMD owns the information
> >>> of the representor owner port and info of representors. This change
> >>> looks better, but not sure whether it is valuable to introduce a new
> >> member to the EAL data structure.
> >>
> >> IMHO, it is simply incorrect to return representors info on a
> >> representor itself. Representor info describes which representors may be populated using the device.
> >>
> >> If above statement is correct, we need a way to get parent device by
> >> representor to do name to representor ID mapping. I see two options to do it:
> >>    A. Dedicated field in rte_eth_dev_data as the patch does.
> >>    B. Dedicated ethdev op (since representor knows parent port ID anyway).
> >> We have chosen (A) because of simplicity.
> >
> > Just recalled that a representor port could be probed w/o the owner PF; is a parent port mandatory?
> 
> I thought that it is impossible and that a parent port is absolutely required for a representor. Could you provide an example and explain how
> it will work?

In the bonding case, PF0 and PF1 become a single PF port `bond0`, whose PCI address is that of PF0.
	-a <PF0>,representor=pf[0-1]vf[0-99] // this is the syntax we proposed.

For backward compatibility, the following two devargs are also supported:
	-a <pf0>,representor=[0-99] // probe bond0 and representors on pf0
	-a <pf1>,representor=[0-99] // probe representors on pf1.
If the devargs start with PF1, no owner port is created for PF1, as it is disabled in bonding. bond0 (PF0) cannot be
created automatically here, because the device is located by the PCI address (PF1) from the devargs.
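
The PF1 case above can be sketched roughly as follows. This is only an
illustration, loosely modelled on the owner search in the mlx5 hunk of the
patch; the types and names (`struct port`, `find_owner_port`, `NO_OWNER`,
`shared_ctx`) are hypothetical stand-ins, not driver code. The assumption is
that the owner is the non-representor port sharing the same device context.
With PF1-only devargs no such port exists, so the search finds nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker meaning "no parent port was found". */
#define NO_OWNER UINT16_MAX

/* Hypothetical, heavily simplified view of a probed port. */
struct port {
	int in_use;         /* port slot is allocated */
	int is_representor; /* port is a representor, not a PF */
	int shared_ctx;     /* stand-in for the shared device context */
};

/*
 * Return the index of the first non-representor port that shares the
 * given context, or NO_OWNER if there is no such port.
 */
static uint16_t
find_owner_port(const struct port *ports, size_t nb_ports, int shared_ctx)
{
	size_t i;

	for (i = 0; i < nb_ports; i++) {
		if (!ports[i].in_use ||
		    ports[i].is_representor ||
		    ports[i].shared_ctx != shared_ctx)
			continue;
		return (uint16_t)i;
	}
	return NO_OWNER;
}

/* Probed through PF0: port 0 is bond0 (the owner), ports 1-2 are representors. */
static const struct port probed_via_pf0[] = {
	{ .in_use = 1, .is_representor = 0, .shared_ctx = 7 },
	{ .in_use = 1, .is_representor = 1, .shared_ctx = 7 },
	{ .in_use = 1, .is_representor = 1, .shared_ctx = 7 },
};

/* Probed through PF1 only: representors exist, but no owner port does. */
static const struct port probed_via_pf1[] = {
	{ .in_use = 1, .is_representor = 1, .shared_ctx = 7 },
	{ .in_use = 1, .is_representor = 1, .shared_ctx = 7 },
};
```

In the second table the lookup returns NO_OWNER, so a representor probed this
way would have nothing valid to store as its parent port ID.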


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] ethdev: fix representor port ID search by name
  2021-07-19 11:54  0%     ` Xueming(Steven) Li
@ 2021-07-19 12:36  0%       ` Andrew Rybchenko
  2021-07-19 12:50  0%         ` Xueming(Steven) Li
  0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2021-07-19 12:36 UTC (permalink / raw)
  To: Xueming(Steven) Li, Ajit Khaparde, Somnath Kotur, John Daley,
	Hyong Youb Kim, Beilei Xing, Qiming Yang, Qi Zhang, Haiyue Wang,
	Matan Azrad, Shahaf Shuler, Slava Ovsiienko,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit
  Cc: dev, Viacheslav Galaktionov, stable

On 7/19/21 2:54 PM, Xueming(Steven) Li wrote:
> 
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Monday, July 19, 2021 4:46 PM
>> To: Xueming(Steven) Li <xuemingl@nvidia.com>; Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur
>> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>; Hyong Youb Kim <hyonkim@cisco.com>; Beilei Xing
>> <beilei.xing@intel.com>; Qiming Yang <qiming.yang@intel.com>; Qi Zhang <qi.z.zhang@intel.com>; Haiyue Wang
>> <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Slava Ovsiienko
>> <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
>> Cc: dev@dpdk.org; Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
>> Subject: Re: [PATCH] ethdev: fix representor port ID search by name
>>
>> On 7/19/21 9:58 AM, Xueming(Steven) Li wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>> Sent: Tuesday, July 13, 2021 12:18 AM
>>>> To: Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur
>>>> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>; Hyong
>>>> Youb Kim <hyonkim@cisco.com>; Beilei Xing <beilei.xing@intel.com>;
>>>> Qiming Yang <qiming.yang@intel.com>; Qi Zhang <qi.z.zhang@intel.com>;
>>>> Haiyue Wang <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>;
>>>> Shahaf Shuler <shahafs@nvidia.com>; Slava Ovsiienko
>>>> <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon
>>>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>;
>>>> Xueming(Steven) Li <xuemingl@nvidia.com>
>>>> Cc: dev@dpdk.org; Viacheslav Galaktionov
>>>> <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
>>>> Subject: [PATCH] ethdev: fix representor port ID search by name
>>>>
>>>> From: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>
>>>>
>>>> Fix representor port ID search by name if the representor itself does
>>>> not provide representors info. Getting a list of representors from a
>>>> representor does not make sense. Instead, a parent device should be used.
>>>>
>>>> To this end, extend the rte_eth_dev_data structure to include the port ID of the parent device for representors.
>>>>
>>>> Fixes: df7547a6a2cc ("ethdev: add helper function to get representor
>>>> ID")
>>>> Cc: stable@dpdk.org
>>>>
>>>> Signed-off-by: Viacheslav Galaktionov
>>>> <viacheslav.galaktionov@oktetlabs.ru>
>>>> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>> ---
>>>> The new field is added into the hole in rte_eth_dev_data structure.
>>>> The patch does not change ABI, but extra care is required since ABI check is disabled for the structure because of the libabigail bug [1].
>>>>
>>>> Potentially it is bad for out-of-tree drivers which implement
>>>> representors but do not fill in a new parent_port_id field in rte_eth_dev_data structure. Do we care?
>>>>
>>>> Maybe the patch should add lines to release notes, but I'd like to get initial feedback first.
>>>>
>>>> mlx5 changes should be reviewed by maintainers very carefully, since we are not sure if we patch it correctly.
>>>>
>>>> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=28060
>>
>> [snip]
>>
>>>> --- a/lib/ethdev/ethdev_driver.h
>>>> +++ b/lib/ethdev/ethdev_driver.h
>>>> @@ -1248,8 +1248,8 @@ struct rte_eth_devargs {
>>>>     * For backward compatibility, if no representor info, direct
>>>>     * map legacy VF (no controller and pf).
>>>>     *
>>>> - * @param ethdev
>>>> - *  Handle of ethdev port.
>>>> + * @param parent_port_id
>>>> + *  Port ID of the backing device.
>>>>     * @param type
>>>>     *  Representor type.
>>>>     * @param controller
>>>> @@ -1266,7 +1266,7 @@ struct rte_eth_devargs {
>>>>     */
>>>>    __rte_internal
>>>>    int
>>>> -rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
>>>> +rte_eth_representor_id_get(uint16_t parent_port_id,
>>>
>>> It makes more sense to get representor info from the parent port.
>>> A representor is a member of a switch domain; the PMD owns the information
>>> about the representor owner port and about the representors. This change
>>> looks better, but I am not sure whether it is valuable to introduce a new
>>> member to the EAL data structure.
>>
>> IMHO, it is simply incorrect to return representors info on a representor itself. Representor info describes which representors
>> may be populated using the device.
>>
>> If the above statement is correct, we need a way to get the parent device from a representor to do name to representor ID mapping.
>> I see two options:
>>    A. Dedicated field in rte_eth_dev_data as the patch does.
>>    B. Dedicated ethdev op (since representor knows parent port ID anyway).
>> We have chosen (A) because of simplicity.
> 
> I just recalled that a representor port can be probed without its owner PF. Is a parent port mandatory?

I thought that it is impossible and that a parent port is absolutely
required for a representor. Could you provide an example and explain how
it will work?


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] ethdev: fix representor port ID search by name
  2021-07-19  8:46  0%   ` Andrew Rybchenko
@ 2021-07-19 11:54  0%     ` Xueming(Steven) Li
  2021-07-19 12:36  0%       ` Andrew Rybchenko
  0 siblings, 1 reply; 200+ results
From: Xueming(Steven) Li @ 2021-07-19 11:54 UTC (permalink / raw)
  To: Andrew Rybchenko, Ajit Khaparde, Somnath Kotur, John Daley,
	Hyong Youb Kim, Beilei Xing, Qiming Yang, Qi Zhang, Haiyue Wang,
	Matan Azrad, Shahaf Shuler, Slava Ovsiienko,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit
  Cc: dev, Viacheslav Galaktionov, stable



> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Monday, July 19, 2021 4:46 PM
> To: Xueming(Steven) Li <xuemingl@nvidia.com>; Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur
> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>; Hyong Youb Kim <hyonkim@cisco.com>; Beilei Xing
> <beilei.xing@intel.com>; Qiming Yang <qiming.yang@intel.com>; Qi Zhang <qi.z.zhang@intel.com>; Haiyue Wang
> <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org; Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
> Subject: Re: [PATCH] ethdev: fix representor port ID search by name
> 
> On 7/19/21 9:58 AM, Xueming(Steven) Li wrote:
> >
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: Tuesday, July 13, 2021 12:18 AM
> >> To: Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur
> >> <somnath.kotur@broadcom.com>; John Daley <johndale@cisco.com>; Hyong
> >> Youb Kim <hyonkim@cisco.com>; Beilei Xing <beilei.xing@intel.com>;
> >> Qiming Yang <qiming.yang@intel.com>; Qi Zhang <qi.z.zhang@intel.com>;
> >> Haiyue Wang <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>;
> >> Shahaf Shuler <shahafs@nvidia.com>; Slava Ovsiienko
> >> <viacheslavo@nvidia.com>; NBU-Contact-Thomas Monjalon
> >> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>;
> >> Xueming(Steven) Li <xuemingl@nvidia.com>
> >> Cc: dev@dpdk.org; Viacheslav Galaktionov
> >> <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
> >> Subject: [PATCH] ethdev: fix representor port ID search by name
> >>
> >> From: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>
> >>
> >> Fix representor port ID search by name if the representor itself does
> >> not provide representors info. Getting a list of representors from a
> >> representor does not make sense. Instead, a parent device should be used.
> >>
> >> To this end, extend the rte_eth_dev_data structure to include the port ID of the parent device for representors.
> >>
> >> Fixes: df7547a6a2cc ("ethdev: add helper function to get representor
> >> ID")
> >> Cc: stable@dpdk.org
> >>
> >> Signed-off-by: Viacheslav Galaktionov
> >> <viacheslav.galaktionov@oktetlabs.ru>
> >> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> ---
> >> The new field is added into the hole in rte_eth_dev_data structure.
> >> The patch does not change ABI, but extra care is required since ABI check is disabled for the structure because of the libabigail bug [1].
> >>
> >> Potentially it is bad for out-of-tree drivers which implement
> >> representors but do not fill in a new parent_port_id field in rte_eth_dev_data structure. Do we care?
> >>
> >> Maybe the patch should add lines to release notes, but I'd like to get initial feedback first.
> >>
> >> mlx5 changes should be reviewed by maintainers very carefully, since we are not sure if we patch it correctly.
> >>
> >> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=28060
> 
> [snip]
> 
> >> --- a/lib/ethdev/ethdev_driver.h
> >> +++ b/lib/ethdev/ethdev_driver.h
> >> @@ -1248,8 +1248,8 @@ struct rte_eth_devargs {
> >>    * For backward compatibility, if no representor info, direct
> >>    * map legacy VF (no controller and pf).
> >>    *
> >> - * @param ethdev
> >> - *  Handle of ethdev port.
> >> + * @param parent_port_id
> >> + *  Port ID of the backing device.
> >>    * @param type
> >>    *  Representor type.
> >>    * @param controller
> >> @@ -1266,7 +1266,7 @@ struct rte_eth_devargs {
> >>    */
> >>   __rte_internal
> >>   int
> >> -rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
> >> +rte_eth_representor_id_get(uint16_t parent_port_id,
> >
> > It makes more sense to get representor info from the parent port.
> > A representor is a member of a switch domain; the PMD owns the information
> > about the representor owner port and about the representors. This change
> > looks better, but I am not sure whether it is valuable to introduce a new
> > member to the EAL data structure.
> 
> IMHO, it is simply incorrect to return representors info on a representor itself. Representor info describes which representors
> may be populated using the device.
> 
> If the above statement is correct, we need a way to get the parent device from a representor to do name to representor ID mapping.
> I see two options:
>   A. Dedicated field in rte_eth_dev_data as the patch does.
>   B. Dedicated ethdev op (since representor knows parent port ID anyway).
> We have chosen (A) because of simplicity.

I just recalled that a representor port can be probed without its owner PF. Is a parent port mandatory?

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] ethdev: fix representor port ID search by name
  2021-07-19  6:58  0% ` Xueming(Steven) Li
@ 2021-07-19  8:46  0%   ` Andrew Rybchenko
  2021-07-19 11:54  0%     ` Xueming(Steven) Li
  0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2021-07-19  8:46 UTC (permalink / raw)
  To: Xueming(Steven) Li, Ajit Khaparde, Somnath Kotur, John Daley,
	Hyong Youb Kim, Beilei Xing, Qiming Yang, Qi Zhang, Haiyue Wang,
	Matan Azrad, Shahaf Shuler, Slava Ovsiienko,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit
  Cc: dev, Viacheslav Galaktionov, stable

On 7/19/21 9:58 AM, Xueming(Steven) Li wrote:
> 
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Tuesday, July 13, 2021 12:18 AM
>> To: Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur <somnath.kotur@broadcom.com>; John Daley
>> <johndale@cisco.com>; Hyong Youb Kim <hyonkim@cisco.com>; Beilei Xing <beilei.xing@intel.com>; Qiming Yang
>> <qiming.yang@intel.com>; Qi Zhang <qi.z.zhang@intel.com>; Haiyue Wang <haiyue.wang@intel.com>; Matan Azrad
>> <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>; NBU-Contact-Thomas
>> Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>; Xueming(Steven) Li <xuemingl@nvidia.com>
>> Cc: dev@dpdk.org; Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
>> Subject: [PATCH] ethdev: fix representor port ID search by name
>>
>> From: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>
>>
>> Fix representor port ID search by name if the representor itself does not provide representors info. Getting a list of representors from
>> a representor does not make sense. Instead, a parent device should be used.
>>
>> To this end, extend the rte_eth_dev_data structure to include the port ID of the parent device for representors.
>>
>> Fixes: df7547a6a2cc ("ethdev: add helper function to get representor ID")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>
>> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> ---
>> The new field is added into the hole in rte_eth_dev_data structure.
>> The patch does not change ABI, but extra care is required since ABI check is disabled for the structure because of the libabigail bug [1].
>>
>> Potentially it is bad for out-of-tree drivers which implement representors but do not fill in a new parent_port_id field in
>> rte_eth_dev_data structure. Do we care?
>>
>> Maybe the patch should add lines to release notes, but I'd like to get initial feedback first.
>>
>> mlx5 changes should be reviewed by maintainers very carefully, since we are not sure if we patch it correctly.
>>
>> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=28060

[snip]

>> --- a/lib/ethdev/ethdev_driver.h
>> +++ b/lib/ethdev/ethdev_driver.h
>> @@ -1248,8 +1248,8 @@ struct rte_eth_devargs {
>>    * For backward compatibility, if no representor info, direct
>>    * map legacy VF (no controller and pf).
>>    *
>> - * @param ethdev
>> - *  Handle of ethdev port.
>> + * @param parent_port_id
>> + *  Port ID of the backing device.
>>    * @param type
>>    *  Representor type.
>>    * @param controller
>> @@ -1266,7 +1266,7 @@ struct rte_eth_devargs {
>>    */
>>   __rte_internal
>>   int
>> -rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
>> +rte_eth_representor_id_get(uint16_t parent_port_id,
> 
> It makes more sense to get representor info from the parent port. A representor is a member of a switch domain; the PMD owns
> the information about the representor owner port and about the representors. This change looks better, but I am not sure
> whether it is valuable to introduce a new member to the EAL data structure.

IMHO, it is simply incorrect to return representors info on a
representor itself. Representor info describes which representors
may be populated using the device.

If the above statement is correct, we need a way to get the parent
device from a representor to do name to representor ID mapping. I see
two options:
  A. Dedicated field in rte_eth_dev_data as the patch does.
  B. Dedicated ethdev op (since representor knows parent port ID anyway).
We have chosen (A) because of simplicity.
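
A minimal sketch of what (A) amounts to, using simplified stand-in types
rather than the real ethdev structures. All names here (`port_data`,
`repr_range`, `repr_id_get`, `example_lookup`) are hypothetical and serve
only to illustrate the idea: the representor stores its parent's port ID,
and the mapping is resolved against the ranges that only the parent provides.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one representor ID range of a parent port. */
struct repr_range {
	int pf;           /* PF index the range belongs to */
	int vf_first;     /* first VF index covered by the range */
	int vf_last;      /* last VF index covered by the range */
	uint16_t id_base; /* representor ID assigned to vf_first */
};

/* Hypothetical stand-in; not the real rte_eth_dev_data layout. */
struct port_data {
	uint16_t port_id;
	int is_representor;
	uint16_t parent_port_id;         /* valid only if is_representor */
	const struct repr_range *ranges; /* only the parent provides ranges */
	int nb_ranges;
};

/* Map (pf, vf) to a representor ID using the ranges of the parent port. */
static int
repr_id_get(const struct port_data *ports, uint16_t parent_port_id,
	    int pf, int vf, uint16_t *repr_id)
{
	const struct port_data *parent = &ports[parent_port_id];
	int i;

	for (i = 0; i < parent->nb_ranges; i++) {
		const struct repr_range *r = &parent->ranges[i];

		if (r->pf == pf && vf >= r->vf_first && vf <= r->vf_last) {
			*repr_id = (uint16_t)(r->id_base + (vf - r->vf_first));
			return 0;
		}
	}
	return -1; /* no matching range */
}

/* Start from a representor (port 1) and resolve "pf1vf5" via its parent. */
static int
example_lookup(uint16_t *repr_id)
{
	static const struct repr_range ranges[] = {
		{ .pf = 0, .vf_first = 0, .vf_last = 99, .id_base = 0 },
		{ .pf = 1, .vf_first = 0, .vf_last = 99, .id_base = 100 },
	};
	const struct port_data ports[] = {
		{ .port_id = 0, .ranges = ranges, .nb_ranges = 2 },
		{ .port_id = 1, .is_representor = 1, .parent_port_id = 0 },
	};

	return repr_id_get(ports, ports[1].parent_port_id, 1, 5, repr_id);
}
```

The representor itself (port 1) has no ranges, so querying it directly would
find nothing; going through parent_port_id is what makes the lookup work.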

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] ethdev: fix representor port ID search by name
  2021-07-12 16:17  3% [dpdk-dev] [PATCH] ethdev: fix representor port ID search by name Andrew Rybchenko
@ 2021-07-19  6:58  0% ` Xueming(Steven) Li
  2021-07-19  8:46  0%   ` Andrew Rybchenko
  2021-07-29  4:20  0% ` Xueming(Steven) Li
  2021-08-18 14:00  3% ` [dpdk-dev] [PATCH v2] " Andrew Rybchenko
  2 siblings, 1 reply; 200+ results
From: Xueming(Steven) Li @ 2021-07-19  6:58 UTC (permalink / raw)
  To: Andrew Rybchenko, Ajit Khaparde, Somnath Kotur, John Daley,
	Hyong Youb Kim, Beilei Xing, Qiming Yang, Qi Zhang, Haiyue Wang,
	Matan Azrad, Shahaf Shuler, Slava Ovsiienko,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit
  Cc: dev, Viacheslav Galaktionov, stable



> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Tuesday, July 13, 2021 12:18 AM
> To: Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur <somnath.kotur@broadcom.com>; John Daley
> <johndale@cisco.com>; Hyong Youb Kim <hyonkim@cisco.com>; Beilei Xing <beilei.xing@intel.com>; Qiming Yang
> <qiming.yang@intel.com>; Qi Zhang <qi.z.zhang@intel.com>; Haiyue Wang <haiyue.wang@intel.com>; Matan Azrad
> <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>; NBU-Contact-Thomas
> Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>; Xueming(Steven) Li <xuemingl@nvidia.com>
> Cc: dev@dpdk.org; Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>; stable@dpdk.org
> Subject: [PATCH] ethdev: fix representor port ID search by name
> 
> From: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>
> 
> Fix representor port ID search by name if the representor itself does not provide representors info. Getting a list of representors from
> a representor does not make sense. Instead, a parent device should be used.
> 
> To this end, extend the rte_eth_dev_data structure to include the port ID of the parent device for representors.
> 
> Fixes: df7547a6a2cc ("ethdev: add helper function to get representor ID")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>
> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> ---
> The new field is added into the hole in rte_eth_dev_data structure.
> The patch does not change ABI, but extra care is required since ABI check is disabled for the structure because of the libabigail bug [1].
> 
> Potentially it is bad for out-of-tree drivers which implement representors but do not fill in a new parent_port_id field in
> rte_eth_dev_data structure. Do we care?
> 
> Maybe the patch should add lines to release notes, but I'd like to get initial feedback first.
> 
> mlx5 changes should be reviewed by maintainers very carefully, since we are not sure if we patch it correctly.
> 
> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=28060
> 
>  drivers/net/bnxt/bnxt_reps.c             |  1 +
>  drivers/net/enic/enic_vf_representor.c   |  1 +
>  drivers/net/i40e/i40e_vf_representor.c   |  1 +
>  drivers/net/ice/ice_dcf_vf_representor.c |  1 +
>  drivers/net/ixgbe/ixgbe_vf_representor.c |  1 +
>  drivers/net/mlx5/linux/mlx5_os.c         | 11 +++++++++++
>  drivers/net/mlx5/windows/mlx5_os.c       | 11 +++++++++++
>  lib/ethdev/ethdev_driver.h               |  6 +++---
>  lib/ethdev/rte_class_eth.c               |  2 +-
>  lib/ethdev/rte_ethdev.c                  |  8 ++++----
>  lib/ethdev/rte_ethdev_core.h             |  4 ++++
>  11 files changed, 39 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/bnxt/bnxt_reps.c b/drivers/net/bnxt/bnxt_reps.c
> index bdbad53b7d..902591cd39 100644
> --- a/drivers/net/bnxt/bnxt_reps.c
> +++ b/drivers/net/bnxt/bnxt_reps.c
> @@ -187,6 +187,7 @@ int bnxt_representor_init(struct rte_eth_dev *eth_dev, void *params)
>  	eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR |
>  					RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
>  	eth_dev->data->representor_id = rep_params->vf_id;
> +	eth_dev->data->parent_port_id = rep_params->parent_dev->data->port_id;
> 
>  	rte_eth_random_addr(vf_rep_bp->dflt_mac_addr);
>  	memcpy(vf_rep_bp->mac_addr, vf_rep_bp->dflt_mac_addr,
> diff --git a/drivers/net/enic/enic_vf_representor.c b/drivers/net/enic/enic_vf_representor.c
> index 79dd6e5640..6ee7967ce9 100644
> --- a/drivers/net/enic/enic_vf_representor.c
> +++ b/drivers/net/enic/enic_vf_representor.c
> @@ -662,6 +662,7 @@ int enic_vf_representor_init(struct rte_eth_dev *eth_dev, void *init_params)
>  	eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR |
>  					RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
>  	eth_dev->data->representor_id = vf->vf_id;
> +	eth_dev->data->parent_port_id = pf->port_id;
>  	eth_dev->data->mac_addrs = rte_zmalloc("enic_mac_addr_vf",
>  		sizeof(struct rte_ether_addr) *
>  		ENIC_UNICAST_PERFECT_FILTERS, 0);
> diff --git a/drivers/net/i40e/i40e_vf_representor.c b/drivers/net/i40e/i40e_vf_representor.c
> index 0481b55381..865b637585 100644
> --- a/drivers/net/i40e/i40e_vf_representor.c
> +++ b/drivers/net/i40e/i40e_vf_representor.c
> @@ -514,6 +514,7 @@ i40e_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params)
>  	ethdev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR |
>  					RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
>  	ethdev->data->representor_id = representor->vf_id;
> +	ethdev->data->parent_port_id = pf->dev_data->port_id;
> 
>  	/* Setting the number queues allocated to the VF */
>  	ethdev->data->nb_rx_queues = vf->vsi->nb_qps;
> diff --git a/drivers/net/ice/ice_dcf_vf_representor.c b/drivers/net/ice/ice_dcf_vf_representor.c
> index 970461f3e9..c7cd3fd290 100644
> --- a/drivers/net/ice/ice_dcf_vf_representor.c
> +++ b/drivers/net/ice/ice_dcf_vf_representor.c
> @@ -418,6 +418,7 @@ ice_dcf_vf_repr_init(struct rte_eth_dev *vf_rep_eth_dev, void *init_param)
> 
>  	vf_rep_eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
>  	vf_rep_eth_dev->data->representor_id = repr->vf_id;
> +	vf_rep_eth_dev->data->parent_port_id = repr->dcf_eth_dev->data->port_id;
> 
>  	vf_rep_eth_dev->data->mac_addrs = &repr->mac_addr;
> 
> diff --git a/drivers/net/ixgbe/ixgbe_vf_representor.c b/drivers/net/ixgbe/ixgbe_vf_representor.c
> index d5b636a194..7a2063849e 100644
> --- a/drivers/net/ixgbe/ixgbe_vf_representor.c
> +++ b/drivers/net/ixgbe/ixgbe_vf_representor.c
> @@ -197,6 +197,7 @@ ixgbe_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params)
> 
>  	ethdev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
>  	ethdev->data->representor_id = representor->vf_id;
> +	ethdev->data->parent_port_id = representor->pf_ethdev->data->port_id;
> 
>  	/* Set representor device ops */
>  	ethdev->dev_ops = &ixgbe_vf_representor_dev_ops;
> diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
> index be22d9cbd2..5550d30628 100644
> --- a/drivers/net/mlx5/linux/mlx5_os.c
> +++ b/drivers/net/mlx5/linux/mlx5_os.c
> @@ -1511,6 +1511,17 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
>  	if (priv->representor) {
>  		eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
>  		eth_dev->data->representor_id = priv->representor_id;
> +		MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
> +			const struct mlx5_priv *opriv =
> +				rte_eth_devices[port_id].data->dev_private;
> +
> +			if (!opriv ||
> +			    opriv->sh != priv->sh ||
> +			    opriv->representor)
> +				continue;
> +			eth_dev->data->parent_port_id = port_id;
> +			break;
> +		}
>  	}
>  	priv->mp_id.port_id = eth_dev->data->port_id;
>  	strlcpy(priv->mp_id.name, MLX5_MP_NAME, RTE_MP_MAX_NAME_LEN);
> diff --git a/drivers/net/mlx5/windows/mlx5_os.c b/drivers/net/mlx5/windows/mlx5_os.c
> index e30b682822..037c928dc1 100644
> --- a/drivers/net/mlx5/windows/mlx5_os.c
> +++ b/drivers/net/mlx5/windows/mlx5_os.c
> @@ -506,6 +506,17 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
>  	if (priv->representor) {
>  		eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
>  		eth_dev->data->representor_id = priv->representor_id;
> +		MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
> +			const struct mlx5_priv *opriv =
> +				rte_eth_devices[port_id].data->dev_private;
> +
> +			if (!opriv ||
> +			    opriv->sh != priv->sh ||
> +			    opriv->representor)
> +				continue;
> +			eth_dev->data->parent_port_id = port_id;
> +			break;
> +		}
>  	}
>  	/*
>  	 * Store associated network device interface index. This index
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index 40e474aa7e..07f6d1f9a4 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -1248,8 +1248,8 @@ struct rte_eth_devargs {
>   * For backward compatibility, if no representor info, direct
>   * map legacy VF (no controller and pf).
>   *
> - * @param ethdev
> - *  Handle of ethdev port.
> + * @param parent_port_id
> + *  Port ID of the backing device.
>   * @param type
>   *  Representor type.
>   * @param controller
> @@ -1266,7 +1266,7 @@ struct rte_eth_devargs {
>   */
>  __rte_internal
>  int
> -rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
> +rte_eth_representor_id_get(uint16_t parent_port_id,

It makes more sense to get representor info from the parent port. A representor is a member of a switch domain; the PMD owns
the information about the representor owner port and about the representors. This change looks better, but I am not sure
whether it is valuable to introduce a new member to the EAL data structure.

>  			   enum rte_eth_representor_type type,
>  			   int controller, int pf, int representor_port,
>  			   uint16_t *repr_id);
> diff --git a/lib/ethdev/rte_class_eth.c b/lib/ethdev/rte_class_eth.c
> index 1fe5fa1f36..e3b7ab9728 100644
> --- a/lib/ethdev/rte_class_eth.c
> +++ b/lib/ethdev/rte_class_eth.c
> @@ -95,7 +95,7 @@ eth_representor_cmp(const char *key __rte_unused,
>  		c = i / (np * nf);
>  		p = (i / nf) % np;
>  		f = i % nf;
> -		if (rte_eth_representor_id_get(edev,
> +		if (rte_eth_representor_id_get(edev->data->parent_port_id,
>  			eth_da.type,
>  			eth_da.nb_mh_controllers == 0 ? -1 :
>  					eth_da.mh_controllers[c],
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 6ebf52b641..acda1d43fb 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -5997,7 +5997,7 @@ rte_eth_devargs_parse(const char *dargs, struct rte_eth_devargs *eth_da)
>  }
> 
>  int
> -rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
> +rte_eth_representor_id_get(uint16_t parent_port_id,
>  			   enum rte_eth_representor_type type,
>  			   int controller, int pf, int representor_port,
>  			   uint16_t *repr_id)
> @@ -6012,7 +6012,7 @@ rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
>  		return -EINVAL;
> 
>  	/* Get PMD representor range info. */
> -	ret = rte_eth_representor_info_get(ethdev->data->port_id, NULL);
> +	ret = rte_eth_representor_info_get(parent_port_id, NULL);
>  	if (ret == -ENOTSUP && type == RTE_ETH_REPRESENTOR_VF &&
>  	    controller == -1 && pf == -1) {
>  		/* Direct mapping for legacy VF representor. */
> @@ -6026,7 +6026,7 @@ rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
>  	info = calloc(1, size);
>  	if (info == NULL)
>  		return -ENOMEM;
> -	ret = rte_eth_representor_info_get(ethdev->data->port_id, info);
> +	ret = rte_eth_representor_info_get(parent_port_id, info);
>  	if (ret < 0)
>  		goto out;
> 
> @@ -6045,7 +6045,7 @@ rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
>  			continue;
>  		if (info->ranges[i].id_end < info->ranges[i].id_base) {
>  			RTE_LOG(WARNING, EAL, "Port %hu invalid representor ID Range %u - %u, entry %d\n",
> -				ethdev->data->port_id, info->ranges[i].id_base,
> +				parent_port_id, info->ranges[i].id_base,
>  				info->ranges[i].id_end, i);
>  			continue;
> 
> diff --git a/lib/ethdev/rte_ethdev_core.h b/lib/ethdev/rte_ethdev_core.h
> index edf96de2dc..13cb84b52f 100644
> --- a/lib/ethdev/rte_ethdev_core.h
> +++ b/lib/ethdev/rte_ethdev_core.h
> @@ -185,6 +185,10 @@ struct rte_eth_dev_data {
>  			/**< Switch-specific identifier.
>  			 *   Valid if RTE_ETH_DEV_REPRESENTOR in dev_flags.
>  			 */
> +	uint16_t parent_port_id;
> +			/**< Port ID of the backing device.
> +			 *   Valid if RTE_ETH_DEV_REPRESENTOR in dev_flags.
> +			 */
> 
>  	pthread_mutex_t flow_ops_mutex; /**< rte_flow ops mutex. */
>  	uint64_t reserved_64s[4]; /**< Reserved for future fields */
> --
> 2.30.2


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v6] dmadev: introduce DMA device library
  @ 2021-07-19  6:21  3%   ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2021-07-19  6:21 UTC (permalink / raw)
  To: Chengwen Feng
  Cc: Thomas Monjalon, Ferruh Yigit, Richardson, Bruce, Jerin Jacob,
	Andrew Rybchenko, dpdk-dev, Morten Brørup, Nipun Gupta,
	Hemant Agrawal, Maxime Coquelin, Honnappa Nagarahalli,
	David Marchand, Satananda Burla, Prasun Kapoor, Ananyev,
	Konstantin

On Mon, Jul 19, 2021 at 9:02 AM Chengwen Feng <fengchengwen@huawei.com> wrote:
>
> This patch introduces 'dmadevice', which is a generic type of DMA
> device.
>
> The APIs of the dmadev library expose some generic operations which can
> enable configuration and I/O with the DMA devices.
>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>


The API specification aspects look pretty good to me.

Some minor comments are below. You can add my Acked-by to the API header
file in a future version, where you will split the patch.


> diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
> new file mode 100644
> index 0000000..ecac281
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev.h
> @@ -0,0 +1,1025 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2021 HiSilicon Limited.
> + * Copyright(c) 2021 Intel Corporation.
> + * Copyright(c) 2021 Marvell International Ltd.
> + * Copyright(c) 2021 SmartShare Systems.
> + */
> +
> +#ifndef _RTE_DMADEV_H_
> +#define _RTE_DMADEV_H_
> +
> +/**
> + * @file rte_dmadev.h
> + *
> + * RTE DMA (Direct Memory Access) device APIs.
> + *
> + * The DMA framework is built on the following model:
> + *
> + *     ---------------   ---------------       ---------------
> + *     | virtual DMA |   | virtual DMA |       | virtual DMA |
> + *     | channel     |   | channel     |       | channel     |
> + *     ---------------   ---------------       ---------------
> + *            |                |                      |
> + *            ------------------                      |
> + *                     |                              |
> + *               ------------                    ------------
> + *               |  dmadev  |                    |  dmadev  |
> + *               ------------                    ------------
> + *                     |                              |
> + *            ------------------               ------------------
> + *            | HW-DMA-channel |               | HW-DMA-channel |
> + *            ------------------               ------------------
> + *                     |                              |
> + *                     --------------------------------
> + *                                     |
> + *                           ---------------------
> + *                           | HW-DMA-Controller |
> + *                           ---------------------
> + *
> + * The DMA controller could have multiple HW-DMA-channels (aka. HW-DMA-queues),
> + * each HW-DMA-channel should be represented by a dmadev.
> + *
> + * The dmadev could create multiple virtual DMA channels, each virtual DMA
> + * channel represents a different transfer context. The DMA operation request
> + * must be submitted to the virtual DMA channel. e.g. Application could create
> + * virtual DMA channel 0 for memory-to-memory transfer scenario, and create
> + * virtual DMA channel 1 for memory-to-device transfer scenario.
> + *
> + * The dmadev are dynamically allocated by rte_dmadev_pmd_allocate() during the
> + * PCI/SoC device probing phase performed at EAL initialization time. And could
> + * be released by rte_dmadev_pmd_release() during the PCI/SoC device removing
> + * phase.
> + *
> + * This framework uses 'uint16_t dev_id' as the device identifier of a dmadev,
> + * and 'uint16_t vchan' as the virtual DMA channel identifier in one dmadev.
> + *
> + * The functions exported by the dmadev API to setup a device designated by its
> + * device identifier must be invoked in the following order:
> + *     - rte_dmadev_configure()
> + *     - rte_dmadev_vchan_setup()
> + *     - rte_dmadev_start()
> + *
> + * Then, the application can invoke dataplane APIs to process jobs.
> + *
> + * If the application wants to change the configuration (i.e. invoke
> + * rte_dmadev_configure() or rte_dmadev_vchan_setup()), it must invoke
> + * rte_dmadev_stop() first to stop the device and then do the reconfiguration
> + * before invoking rte_dmadev_start() again. The dataplane APIs should not be
> + * invoked when the device is stopped.
> + *
> + * Finally, an application can close a dmadev by invoking the
> + * rte_dmadev_close() function.
> + *
> + * The dataplane APIs include two parts:
> + * The first part is the submission of operation requests:
> + *     - rte_dmadev_copy()
> + *     - rte_dmadev_copy_sg()
> + *     - rte_dmadev_fill()
> + *     - rte_dmadev_submit()
> + *
> + * These APIs could work with different virtual DMA channels which have
> + * different contexts.
> + *
> + * The first three APIs are used to submit the operation request to the virtual
> + * DMA channel, if the submission is successful, a uint16_t ring_idx is
> + * returned, otherwise a negative number is returned.
> + *
> + * The last API is used to issue the doorbell to hardware. Alternatively, the
> + * flags parameter (@see RTE_DMA_OP_FLAG_SUBMIT) of the first three APIs can do
> + * the same work.
> + *
> + * The second part is to obtain the result of requests:
> + *     - rte_dmadev_completed()
> + *         - return the number of operation requests completed successfully.
> + *     - rte_dmadev_completed_status()
> + *         - return the number of operation requests completed.
> + *
> + * @note If the dmadev works in silent mode, application does not invoke the

in silent mode (@see RTE_DMA_DEV_CAPA_SILENT)

> + * above two completed APIs.
> + *
> + * About the ring_idx which enqueue APIs (e.g. rte_dmadev_copy()
> + * rte_dmadev_fill()) returned, the rules are as follows:
> + *     - ring_idx for each virtual DMA channel are independent.
> + *     - For a virtual DMA channel, the ring_idx is monotonically incremented,
> + *       when it reach UINT16_MAX, it wraps back to zero.
> + *     - This ring_idx can be used by applications to track per-operation
> + *       metadata in an application-defined circular ring.
> + *     - The initial ring_idx of a virtual DMA channel is zero, after the
> + *       device is stopped, the ring_idx needs to be reset to zero.
> + *
> + * One example:
> + *     - step-1: start one dmadev
> + *     - step-2: enqueue a copy operation, the ring_idx return is 0
> + *     - step-3: enqueue a copy operation again, the ring_idx return is 1
> + *     - ...
> + *     - step-101: stop the dmadev
> + *     - step-102: start the dmadev
> + *     - step-103: enqueue a copy operation, the ring_idx return is 0
> + *     - ...
> + *     - step-x+0: enqueue a fill operation, the ring_idx return is 65535
> + *     - step-x+1: enqueue a copy operation, the ring_idx return is 0
> + *     - ...
> + *
> + * The DMA operation address used in enqueue APIs (i.e. rte_dmadev_copy(),
> + * rte_dmadev_copy_sg(), rte_dmadev_fill()) defined as rte_iova_t type. The
> + * dmadev supports two types of address: memory address and device address.
> + *
> + * - memory address: the source and destination address of the memory-to-memory
> + * transfer type, or the source address of the memory-to-device transfer type,
> + * or the destination address of the device-to-memory transfer type.
> + * @note If the device support SVA, the memory address can be any VA address,

If the device supports SVA (@see RTE_DMA_DEV_CAPA_SVA)

> + * otherwise it must be an IOVA address.
> + *
> + * - device address: the source and destination address of the device-to-device
> + * transfer type, or the source address of the device-to-memory transfer type,
> + * or the destination address of the memory-to-device transfer type.
> + *
> + * By default, all the functions of the dmadev API exported by a PMD are
> + * lock-free functions which assume to not be invoked in parallel on different
> + * logical cores to work on the same target dmadev object.
> + * @note Different virtual DMA channels on the same dmadev *DO NOT* support
> + * parallel invocation because there virtual DMA channels share the same

their?

> + * HW-DMA-channel.
> + *
> + */
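The ring_idx rules in the comment above (per-vchan, monotonically increasing, wrapping from UINT16_MAX to zero) can be sketched on the application side as below. This is only an illustration: the `app_*` names, the ring size, and the metadata struct are hypothetical and not part of the patch. It assumes the application ring size is a power of two, so that masking stays consistent across the 16-bit wrap.

```c
#include <stdint.h>

/* Hypothetical application-side ring. The size is a power of two so that
 * masking a wrapping 16-bit ring_idx maps consistently across the wrap. */
#define APP_RING_SIZE 1024u
#define APP_RING_MASK (APP_RING_SIZE - 1u)

struct app_job_meta {
	void *user_ctx; /* per-operation metadata chosen by the application */
};

static struct app_job_meta app_ring[APP_RING_SIZE];

/* Map a ring_idx returned by an enqueue call to a slot in app_ring. */
static inline uint16_t
app_slot(uint16_t ring_idx)
{
	return (uint16_t)(ring_idx & APP_RING_MASK);
}

/* Remember metadata for the operation that was assigned ring_idx. */
static inline void
app_record(uint16_t ring_idx, void *user_ctx)
{
	app_ring[app_slot(ring_idx)].user_ctx = user_ctx;
}

/* Fetch the metadata once the operation is reported as completed. */
static inline void *
app_lookup(uint16_t ring_idx)
{
	return app_ring[app_slot(ring_idx)].user_ctx;
}

/* The index that follows idx; 65535 wraps back to 0. */
static inline uint16_t
app_next_idx(uint16_t idx)
{
	return (uint16_t)(idx + 1u);
}

/* Number of operations in [first_idx, last_idx], tolerant of the wrap. */
static inline uint16_t
app_span(uint16_t first_idx, uint16_t last_idx)
{
	return (uint16_t)(last_idx - first_idx + 1u);
}
```

With this scheme no more than APP_RING_SIZE operations may be in flight on one virtual channel, otherwise slots would be overwritten before their completion is seen.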
> +
> +#include <rte_common.h>
> +#include <rte_compat.h>
> +#include <rte_dev.h>
> +#include <rte_errno.h>
> +#include <rte_memory.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#define RTE_DMADEV_NAME_MAX_LEN        RTE_DEV_NAME_MAX_LEN
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * @param dev_id
> + *   DMA device index.
> + *
> + * @return
> + *   - If the device index is valid (true) or not (false).
> + */
> +__rte_experimental
> +bool
> +rte_dmadev_is_valid_dev(uint16_t dev_id);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Get the total number of DMA devices that have been successfully
> + * initialised.
> + *
> + * @return
> + *   The total number of usable DMA devices.
> + */
> +__rte_experimental
> +uint16_t
> +rte_dmadev_count(void);
> +
> +/* Enumerates DMA device capabilities. */
> +#define RTE_DMA_DEV_CAPA_MEM_TO_MEM    (1ull << 0)
> +/**< DMA device support memory-to-memory transfer.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +
> +#define RTE_DMA_DEV_CAPA_MEM_TO_DEV    (1ull << 1)
> +/**< DMA device support memory-to-device transfer.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + * @see struct rte_dmadev_port_param::port_type
> + */
> +
> +#define RTE_DMA_DEV_CAPA_DEV_TO_MEM    (1ull << 2)
> +/**< DMA device support device-to-memory transfer.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + * @see struct rte_dmadev_port_param::port_type
> + */
> +
> +#define RTE_DMA_DEV_CAPA_DEV_TO_DEV    (1ull << 3)
> +/**< DMA device support device-to-device transfer.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + * @see struct rte_dmadev_port_param::port_type
> + */
> +
> +#define RTE_DMA_DEV_CAPA_SVA           (1ull << 4)
> +/**< DMA device support SVA which could use VA as DMA address.
> + * If device support SVA then application could pass any VA address like memory
> + * from rte_malloc(), rte_memzone(), malloc, stack memory.
> + * If device don't support SVA, then application should pass IOVA address which
> + * from rte_malloc(), rte_memzone().
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +
> +#define RTE_DMA_DEV_CAPA_SILENT                (1ull << 5)
> +/**< DMA device support work in silent mode.
> + * In this mode, application don't required to invoke rte_dmadev_completed*()
> + * API.
> + *
> + * @see struct rte_dmadev_conf::silent_mode
> + */
> +
> +#define RTE_DMA_DEV_CAPA_OPS_COPY      (1ull << 32)
> +/**< DMA device support copy ops.
> + * This capability start with index of 32, so that it could leave gap between
> + * normal capability and ops capability.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +
> +#define RTE_DMA_DEV_CAPA_OPS_COPY_SG   (1ull << 33)
> +/**< DMA device support scatter-list copy ops.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +
> +#define RTE_DMA_DEV_CAPA_OPS_FILL      (1ull << 34)
> +/**< DMA device support fill ops.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
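A minimal sketch of how an application might test these capability bits. The `CAPA_*` macros are local copies of the values quoted above so that the snippet builds without DPDK, and `capa_has_all()` is a hypothetical helper, not something the patch provides.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the bit values quoted in the patch (illustration only). */
#define CAPA_MEM_TO_MEM (1ull << 0)
#define CAPA_SVA        (1ull << 4)
#define CAPA_SILENT     (1ull << 5)
#define CAPA_OPS_COPY   (1ull << 32)
#define CAPA_OPS_FILL   (1ull << 34)

/* True only if every bit in 'required' is also set in 'dev_capa'. */
static inline bool
capa_has_all(uint64_t dev_capa, uint64_t required)
{
	return (dev_capa & required) == required;
}
```

The `ull` suffix matters for the ops bits: shifting a plain `int` by 32 or more is undefined behaviour in C, so the 64-bit constant is what makes bits 32 to 34 usable.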
> +
> +/**
> + * A structure used to retrieve the information of a DMA device.
> + */
> +struct rte_dmadev_info {
> +       struct rte_device *device; /**< Generic Device information. */
> +       uint64_t dev_capa; /**< Device capabilities (RTE_DMA_DEV_CAPA_*). */
> +       uint16_t max_vchans;
> +       /**< Maximum number of virtual DMA channels supported. */
> +       uint16_t max_desc;
> +       /**< Maximum allowed number of virtual DMA channel descriptors. */
> +       uint16_t min_desc;
> +       /**< Minimum allowed number of virtual DMA channel descriptors. */
> +       uint16_t nb_vchans; /**< Number of virtual DMA channel configured. */
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Retrieve information of a DMA device.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param[out] dev_info
> + *   A pointer to a structure of type *rte_dmadev_info* to be filled with the
> + *   information of the device.
> + *
> + * @return
> + *   - =0: Success, driver updates the information of the DMA device.
> + *   - <0: Error code returned by the driver info get function.
> + *
> + */
> +__rte_experimental
> +int
> +rte_dmadev_info_get(uint16_t dev_id, struct rte_dmadev_info *dev_info);
> +
> +/**
> + * A structure used to configure a DMA device.
> + */
> +struct rte_dmadev_conf {
> +       uint16_t max_vchans;
> +       /**< Maximum number of virtual DMA channel to use.
> +        * This value cannot be greater than the field 'max_vchans' of struct
> +        * rte_dmadev_info which get from rte_dmadev_info_get().
> +        */
> +       uint8_t silent_mode;

bool instead of uint8_t?

> +       /**< Indicates whether to work in silent mode.
> +        * 0-default mode, 1-silent mode.
> +        *
> +        * @see RTE_DMA_DEV_CAPA_SILENT
> +        */
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Configure a DMA device.
> + *
> + * This function must be invoked first before any other function in the
> + * API. This function can also be re-invoked when a device is in the
> + * stopped state.
> + *
> + * @param dev_id
> + *   The identifier of the device to configure.
> + * @param dev_conf
> + *   The DMA device configuration structure encapsulated into rte_dmadev_conf
> + *   object.
> + *
> + * @return
> + *   - =0: Success, device configured.
> + *   - <0: Error code returned by the driver configuration function.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_configure(uint16_t dev_id, const struct rte_dmadev_conf *dev_conf);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Start a DMA device.
> + *
> + * The device start step is the last one and consists of setting the DMA
> + * to start accepting jobs.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + *
> + * @return
> + *   - =0: Success, device started.
> + *   - <0: Error code returned by the driver start function.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_start(uint16_t dev_id);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Stop a DMA device.
> + *
> + * The device can be restarted with a call to rte_dmadev_start().
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + *
> + * @return
> + *   - =0: Success, device stopped.
> + *   - <0: Error code returned by the driver stop function.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_stop(uint16_t dev_id);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Close a DMA device.
> + *
> + * The device cannot be restarted after this call.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + *
> + * @return
> + *  - =0: Successfully close device
> + *  - <0: Failure to close device
> + */
> +__rte_experimental
> +int
> +rte_dmadev_close(uint16_t dev_id);
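The call ordering described in the file comment (configure, vchan_setup, start; stop before reconfiguring; close at the end) can be modelled as a small state machine. This is a sketch only: the `sketch_*` names, the states, and the specific error codes are assumptions made for illustration and are not taken from the patch or from any driver.

```c
#include <errno.h>

/* Hypothetical device states used only by this model. */
enum sketch_state { SK_UNCONFIGURED, SK_CONFIGURED, SK_STARTED, SK_CLOSED };

struct sketch_dev {
	enum sketch_state state;
	int nb_vchans; /* virtual channels set up since the last configure */
};

/* Configure is allowed at any time except while started or after close. */
static inline int
sketch_configure(struct sketch_dev *d)
{
	if (d->state == SK_STARTED)
		return -EBUSY;
	if (d->state == SK_CLOSED)
		return -EINVAL;
	d->state = SK_CONFIGURED;
	d->nb_vchans = 0; /* assumption: reconfiguring drops old channels */
	return 0;
}

/* Channel setup needs a configured, stopped device; returns the vchan id. */
static inline int
sketch_vchan_setup(struct sketch_dev *d)
{
	if (d->state == SK_STARTED)
		return -EBUSY;
	if (d->state != SK_CONFIGURED)
		return -EINVAL;
	return d->nb_vchans++;
}

/* Start is the last setup step and needs at least one channel. */
static inline int
sketch_start(struct sketch_dev *d)
{
	if (d->state != SK_CONFIGURED || d->nb_vchans == 0)
		return -EINVAL;
	d->state = SK_STARTED;
	return 0;
}

/* Stop returns the device to the configured state so it can be restarted. */
static inline int
sketch_stop(struct sketch_dev *d)
{
	if (d->state != SK_STARTED)
		return -EINVAL;
	d->state = SK_CONFIGURED;
	return 0;
}

/* Close is final; this model requires the device to be stopped first. */
static inline int
sketch_close(struct sketch_dev *d)
{
	if (d->state == SK_STARTED)
		return -EBUSY;
	d->state = SK_CLOSED;
	return 0;
}
```

In this model the dataplane calls would only be legal in SK_STARTED, which matches the note that they should not be invoked while the device is stopped.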
> +
> +/**
> + * rte_dma_direction - DMA transfer direction defines.
> + */
> +enum rte_dma_direction {
> +       RTE_DMA_DIR_MEM_TO_MEM = 0,

No need to give = 0 as it starts with 0.

> +       /**< DMA transfer direction - from memory to memory.
> +        *
> +        * @see struct rte_dmadev_vchan_conf::direction
> +        */
> +       RTE_DMA_DIR_MEM_TO_DEV = 1,

No need to give = 1.

> +       /**< DMA transfer direction - from memory to device.
> +        * In a typical scenario, ARM SoCs are installed on x86 servers as iNICs

We can remove ARM. It can be RISC-V too. ;-)


> +        * through the PCIE interface. In this case, the ARM SoCs works in

PCIe

> +        * EP(endpoint) mode, it could initiate a DMA move request from memory
> +        * (which is ARM memory) to device (which is x86 host memory).

to the device.

> +        *
> +        * @see struct rte_dmadev_vchan_conf::direction
> +        */
> +       RTE_DMA_DIR_DEV_TO_MEM = 2,
> +       /**< DMA transfer direction - from device to memory.
> +        * In a typical scenario, ARM SoCs are installed on x86 servers as iNICs
> +        * through the PCIE interface. In this case, the ARM SoCs works in
> +        * EP(endpoint) mode, it could initiate a DMA move request from device
> +        * (which is x86 host memory) to memory (which is ARM memory).
> +        *
> +        * @see struct rte_dmadev_vchan_conf::direction
> +        */
> +       RTE_DMA_DIR_DEV_TO_DEV = 3,
> +       /**< DMA transfer direction - from device to device.
> +        * In a typical scenario, ARM SoCs are installed on x86 servers as iNICs
> +        * through the PCIE interface. In this case, the ARM SoCs works in
> +        * EP(endpoint) mode, it could initiate a DMA move request from device
> +        * (which is x86 host memory) to device (which is another x86 host
> +        * memory).
> +        *
> +        * @see struct rte_dmadev_vchan_conf::direction
> +        */
> +       RTE_DMA_DIR_BUTT

# Doxygen comment is missing.
# Typically we use RTE_DMA_DIR_MAX.
# If there is no real need for this, please remove it, as it can break
  the ABI if we add more items.


> +};
> +
> +/**
> + * enum rte_dmadev_port_type - DMA access port type defines.
> + *
> + * @see struct rte_dmadev_port_param::port_type
> + */
> +enum rte_dmadev_port_type {
> +       RTE_DMADEV_PORT_NONE = 0,

No need for = 0

> +       RTE_DMADEV_PORT_PCIE, /**< The DMA access port is PCIE. */
> +       RTE_DMADEV_PORT_BUTT


Same as the above comment for RTE_DMA_DIR_BUTT

> +};
> +
> +/**
> + * A structure used to describe DMA access port parameters.
> + *
> + * @see struct rte_dmadev_vchan_conf::src_port
> + * @see struct rte_dmadev_vchan_conf::dst_port
> + */
> +struct rte_dmadev_port_param {
> +       enum rte_dmadev_port_type port_type;
> +       /**< The device access port type.
> +        * @see enum rte_dmadev_port_type
> +        */
> +       union {
> +               /** PCIE access port parameter.
> +                *
> +                * The following model shows SoC's PCIE module connects to
> +                * multiple PCIE hosts and multiple endpoints. The PCIE module
> +                * has an integrated DMA controller.
> +                * If the DMA wants to access the memory of host A, it can be
> +                * initiated by PF1 in core0, or by VF0 of PF0 in core0.
> +                *
> +                * System Bus
> +                *    |     ----------PCIE module----------
> +                *    |     Bus
> +                *    |     Interface
> +                *    |     -----        ------------------
> +                *    |     |   |        | PCIE Core0     |
> +                *    |     |   |        |                |        -----------
> +                *    |     |   |        |   PF-0 -- VF-0 |        | Host A  |
> +                *    |     |   |--------|        |- VF-1 |--------| Root    |
> +                *    |     |   |        |   PF-1         |        | Complex |
> +                *    |     |   |        |   PF-2         |        -----------
> +                *    |     |   |        ------------------
> +                *    |     |   |
> +                *    |     |   |        ------------------
> +                *    |     |   |        | PCIE Core1     |
> +                *    |     |   |        |                |        -----------
> +                *    |     |   |        |   PF-0 -- VF-0 |        | Host B  |
> +                *    |-----|   |--------|   PF-1 -- VF-0 |--------| Root    |
> +                *    |     |   |        |        |- VF-1 |        | Complex |
> +                *    |     |   |        |   PF-2         |        -----------
> +                *    |     |   |        ------------------
> +                *    |     |   |
> +                *    |     |   |        ------------------
> +                *    |     |DMA|        |                |        ------
> +                *    |     |   |        |                |--------| EP |
> +                *    |     |   |--------| PCIE Core2     |        ------
> +                *    |     |   |        |                |        ------
> +                *    |     |   |        |                |--------| EP |
> +                *    |     |   |        |                |        ------
> +                *    |     -----        ------------------


This diagram does not show correctly in doxygen. Please fix it.

> +                *
> +                * The following structure is used to describe the above access
> +                * port.
> +                *
> +                * @note If some fields can not be supported by the
> +                * hardware/driver, then the driver ignores those fields.
> +                * Please check driver-specific documentation for limitations
> +                * and capabilities.
> +                */
> +               struct {
> +                       uint64_t coreid : 4; /**< PCIE core id used. */
> +                       uint64_t pfid : 8; /**< PF id used. */
> +                       uint64_t vfen : 1; /**< VF enable bit. */
> +                       uint64_t vfid : 16; /**< VF id used. */
> +                       uint64_t pasid : 20;
> +                       /**< The pasid field in TLP packet. */
> +                       uint64_t attr : 3;
> +                       /**< The attributes field in TLP packet. */
> +                       uint64_t ph : 2;
> +                       /**< The processing hint field in TLP packet. */
> +                       uint64_t st : 16;
> +                       /**< The steering tag field in TLP packet. */
> +               } pcie;
> +       };
> +       uint64_t reserved[2]; /**< Reserved for future fields. */
> +};
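As a quick check on the bit-field widths above, the sketch below copies them into a standalone struct and sums them. `pcie_port_sketch` and `PCIE_PORT_BITS` are names made up for this illustration. Bit-field layout is implementation-defined (and `uint64_t` bit-fields are a common compiler extension), so the only claim made here is the arithmetic one: the declared widths add up to 70 bits, which cannot fit in a single 64-bit word.

```c
#include <stdint.h>

/* Same field widths as the 'pcie' member quoted above (illustration only). */
struct pcie_port_sketch {
	uint64_t coreid : 4;
	uint64_t pfid : 8;
	uint64_t vfen : 1;
	uint64_t vfid : 16;
	uint64_t pasid : 20;
	uint64_t attr : 3;
	uint64_t ph : 2;
	uint64_t st : 16;
};

/* Sum of the declared widths: 4 + 8 + 1 + 16 + 20 + 3 + 2 + 16. */
enum { PCIE_PORT_BITS = 4 + 8 + 1 + 16 + 20 + 3 + 2 + 16 };
```

If a fixed size for this union member matters for the ABI, it may be worth stating the intended total size explicitly in the header.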
> +
> +/**
> + * A structure used to configure a virtual DMA channel.
> + */
> +struct rte_dmadev_vchan_conf {
> +       enum rte_dma_direction direction;
> +       /**< Transfer direction
> +        * @see enum rte_dma_direction
> +        */
> +       uint16_t nb_desc;
> +       /**< Number of descriptor for the virtual DMA channel */
> +       struct rte_dmadev_port_param src_port;
> +       /**< 1) Used to describes the device access port parameter in the
> +        * device-to-memory transfer scenario.
> +        * 2) Used to describes the source device access port parameter in the
> +        * device-to-device transfer scenario.
> +        * @see struct rte_dmadev_port_param
> +        */
> +       struct rte_dmadev_port_param dst_port;
> +       /**< 1) Used to describes the device access port parameter in the
> +        * memory-to-device transfer scenario.
> +        * 2) Used to describes the destination device access port parameter in
> +        * the device-to-device transfer scenario.
> +        * @see struct rte_dmadev_port_param
> +        */
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Allocate and set up a virtual DMA channel.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param conf
> + *   The virtual DMA channel configuration structure encapsulated into
> + *   rte_dmadev_vchan_conf object.
> + *
> + * @return
> + *   - >=0: Allocate success, it is the virtual DMA channel id. This value must
> + *          be less than the field 'max_vchans' of struct rte_dmadev_conf
> + *          which configured by rte_dmadev_configure().
> + *   - <0: Error code returned by the driver virtual channel setup function.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_vchan_setup(uint16_t dev_id,
> +                      const struct rte_dmadev_vchan_conf *conf);
> +
> +/**
> + * rte_dmadev_stats - running statistics.
> + */
> +struct rte_dmadev_stats {
> +       uint64_t submitted_count;
> +       /**< Count of operations which were submitted to hardware. */
> +       uint64_t completed_fail_count;
> +       /**< Count of operations which failed to complete. */
> +       uint64_t completed_count;
> +       /**< Count of operations which successfully complete. */
> +};
> +
> +#define RTE_DMADEV_ALL_VCHAN   0xFFFFu
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Retrieve basic statistics of a or all virtual DMA channel(s).
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + *   If equal RTE_DMADEV_ALL_VCHAN means all channels.
> + * @param[out] stats
> + *   The basic statistics structure encapsulated into rte_dmadev_stats
> + *   object.
> + *
> + * @return
> + *   - =0: Successfully retrieve stats.
> + *   - <0: Failure to retrieve stats.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_stats_get(uint16_t dev_id, uint16_t vchan,
> +                    struct rte_dmadev_stats *stats);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Reset basic statistics of a or all virtual DMA channel(s).
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + *   If equal RTE_DMADEV_ALL_VCHAN means all channels.
> + *
> + * @return
> + *   - =0: Successfully reset stats.
> + *   - <0: Failure to reset stats.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_stats_reset(uint16_t dev_id, uint16_t vchan);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Dump DMA device info.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param f
> + *   The file to write the output to.
> + *
> + * @return
> + *   0 on success. Non-zero otherwise.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_dump(uint16_t dev_id, FILE *f);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Trigger the dmadev self test.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + *
> + * @return
> + *   - 0: Selftest successful.
> + *   - -ENOTSUP if the device doesn't support selftest
> + *   - other values < 0 on failure.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_selftest(uint16_t dev_id);
> +
> +/**
> + * rte_dma_status_code - DMA transfer result status code defines.
> + */
> +enum rte_dma_status_code {
> +       RTE_DMA_STATUS_SUCCESSFUL = 0,

No need for = 0

> +       /**< The operation completed successfully. */
> +       RTE_DMA_STATUS_USRER_ABORT,
> +       /**< The operation failed to complete due to abort by user.
> +        * This is mainly used when processing dev_stop, user could modify the
> +        * descriptors (e.g. change one bit to tell hardware abort this job),
> +        * it allows outstanding requests to be complete as much as possible,
> +        * so reduce the time to stop the device.
> +        */
> +       RTE_DMA_STATUS_NOT_ATTEMPTED,
> +       /**< The operation failed to complete due to following scenarios:
> +        * The jobs in a particular batch are not attempted because they
> +        * appeared after a fence where a previous job failed. In some HW
> +        * implementation it's possible for jobs from later batches would be
> +        * completed, though, so report the status from the not attempted jobs
> +        * before reporting those newer completed jobs.
> +        */
> +       RTE_DMA_STATUS_INVALID_SRC_ADDR,
> +       /**< The operation failed to complete due invalid source address. */
> +       RTE_DMA_STATUS_INVALID_DST_ADDR,
> +       /**< The operation failed to complete due invalid destination
> +        * address.
> +        */
> +       RTE_DMA_STATUS_INVALID_LENGTH,
> +       /**< The operation failed to complete due invalid length. */
> +       RTE_DMA_STATUS_INVALID_OPCODE,
> +       /**< The operation failed to complete due invalid opcode.
> +        * The DMA descriptor could have multiple format, which are
> +        * distinguished by the opcode field.
> +        */
> +       RTE_DMA_STATUS_BUS_ERROR,
> +       /**< The operation failed to complete due bus err. */
> +       RTE_DMA_STATUS_DATA_POISION,
> +       /**< The operation failed to complete due data poison. */
> +       RTE_DMA_STATUS_DESCRIPTOR_READ_ERROR,
> +       /**< The operation failed to complete due descriptor read error. */
> +       RTE_DMA_STATUS_DEV_LINK_ERROR,
> +       /**< The operation failed to complete due device link error.
> +        * Used to indicates that the link error in the memory-to-device/
> +        * device-to-memory/device-to-device transfer scenario.
> +        */
> +       RTE_DMA_STATUS_UNKNOWN = 0x100,
> +       /**< The operation failed to complete due unknown reason.
> +        * The initial value is 256, which reserves space for future errors.
> +        */
> +};
> +
> +/**
> + * rte_dmadev_sge - can hold scatter DMA operation request entry.
> + */
> +struct rte_dmadev_sge {
> +       rte_iova_t addr; /**< The DMA operation address. */
> +       uint32_t length; /**< The DMA operation length. */
> +};
> +
> +#include "rte_dmadev_core.h"
> +
> +/* DMA flags to augment operation preparation. */
> +#define RTE_DMA_OP_FLAG_FENCE  (1ull << 0)
> +/**< DMA fence flag.
> + * It means the operation with this flag must be processed only after all
> + * previous operations are completed.
> + * If the specify DMA HW works in-order (it means it has default fence between
> + * operations), this flag could be NOP.
> + *
> + * @see rte_dmadev_copy()
> + * @see rte_dmadev_copy_sg()
> + * @see rte_dmadev_fill()
> + */
> +
> +#define RTE_DMA_OP_FLAG_SUBMIT (1ull << 1)
> +/**< DMA submit flag.
> + * It means the operation with this flag must issue doorbell to hardware after
> + * enqueued jobs.
> + */
> +
> +#define RTE_DMA_OP_FLAG_LLC    (1ull << 2)
> +/**< DMA write data to low level cache hint.
> + * Used for performance optimization, this is just a hint, and there is no
> + * capability bit for this, driver should not return error if this flag was set.
> + */
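One way to use these flags when enqueuing a batch is to ring the doorbell only on the last operation, which avoids a separate submit call. A minimal sketch follows; the `OP_FLAG_*` macros are local copies of the values quoted above, and `batch_flags()` is a hypothetical helper rather than part of the proposed API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the flag values quoted in the patch (illustration only). */
#define OP_FLAG_FENCE  (1ull << 0)
#define OP_FLAG_SUBMIT (1ull << 1)

/*
 * Flags for operation 'i' of a batch of 'n' operations: only the last one
 * carries SUBMIT, and optionally FENCE so that it is processed only after
 * all earlier operations have completed.
 */
static inline uint64_t
batch_flags(uint16_t i, uint16_t n, bool fence_last)
{
	uint64_t flags = 0;

	if (n != 0 && i == (uint16_t)(n - 1)) {
		flags |= OP_FLAG_SUBMIT;
		if (fence_last)
			flags |= OP_FLAG_FENCE;
	}
	return flags;
}
```

The returned value would be passed as the 'flags' argument of each enqueue call in the batch.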
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enqueue a copy operation onto the virtual DMA channel.
> + *
> + * This queues up a copy operation to be performed by hardware, if the 'flags'
> + * parameter contains RTE_DMA_OP_FLAG_SUBMIT then trigger doorbell to begin
> + * this operation, otherwise do not trigger doorbell.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param src
> + *   The address of the source buffer.
> + * @param dst
> + *   The address of the destination buffer.
> + * @param length
> + *   The length of the data to be copied.
> + * @param flags
> + *   An flags for this operation.
> + *   @see RTE_DMA_OP_FLAG_*
> + *
> + * @return
> + *   - 0..UINT16_MAX: index of enqueued copy job.
> + *   - <0: Error code returned by the driver copy function.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_copy(uint16_t dev_id, uint16_t vchan, rte_iova_t src, rte_iova_t dst,
> +               uint32_t length, uint64_t flags)
> +{
> +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +
> +#ifdef RTE_DMADEV_DEBUG
> +       if (!rte_dmadev_is_valid_dev(dev_id) ||
> +           vchan >= dev->data->dev_conf.max_vchans)
> +               return -EINVAL;
> +       RTE_FUNC_PTR_OR_ERR_RET(*dev->copy, -ENOTSUP);
> +#endif
> +
> +       return (*dev->copy)(dev, vchan, src, dst, length, flags);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enqueue a scatter list copy operation onto the virtual DMA channel.
> + *
> + * This queues up a scatter list copy operation to be performed by hardware, if
> + * the 'flags' parameter contains RTE_DMA_OP_FLAG_SUBMIT then trigger doorbell
> + * to begin this operation, otherwise do not trigger doorbell.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param src
> + *   The pointer of source scatter entry array.
> + * @param dst
> + *   The pointer of destination scatter entry array.
> + * @param nb_src
> + *   The number of source scatter entry.
> + * @param nb_dst
> + *   The number of destination scatter entry.
> + * @param flags
> + *   An flags for this operation.
> + *   @see RTE_DMA_OP_FLAG_*
> + *
> + * @return
> + *   - 0..UINT16_MAX: index of enqueued copy scatterlist job.
> + *   - <0: Error code returned by the driver copy scatterlist function.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vchan, struct rte_dmadev_sge *src,
> +                  struct rte_dmadev_sge *dst, uint16_t nb_src, uint16_t nb_dst,
> +                  uint64_t flags)
> +{
> +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +
> +#ifdef RTE_DMADEV_DEBUG
> +       if (!rte_dmadev_is_valid_dev(dev_id) ||
> +           vchan >= dev->data->dev_conf.max_vchans ||
> +           src == NULL || dst == NULL || nb_src == 0 || nb_dst == 0)
> +               return -EINVAL;
> +       RTE_FUNC_PTR_OR_ERR_RET(*dev->copy_sg, -ENOTSUP);
> +#endif
> +
> +       return (*dev->copy_sg)(dev, vchan, src, dst, nb_src, nb_dst, flags);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enqueue a fill operation onto the virtual DMA channel.
> + *
> + * This queues up a fill operation to be performed by hardware, if the 'flags'
> + * parameter contains RTE_DMA_OP_FLAG_SUBMIT then trigger doorbell to begin
> + * this operation, otherwise do not trigger doorbell.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param pattern
> + *   The pattern to populate the destination buffer with.
> + * @param dst
> + *   The address of the destination buffer.
> + * @param length
> + *   The length of the destination buffer.
> + * @param flags
> + *   An flags for this operation.
> + *   @see RTE_DMA_OP_FLAG_*
> + *
> + * @return
> + *   - 0..UINT16_MAX: index of enqueued fill job.
> + *   - <0: Error code returned by the driver fill function.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_fill(uint16_t dev_id, uint16_t vchan, uint64_t pattern,
> +               rte_iova_t dst, uint32_t length, uint64_t flags)
> +{
> +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +
> +#ifdef RTE_DMADEV_DEBUG
> +       if (!rte_dmadev_is_valid_dev(dev_id) ||
> +           vchan >= dev->data->dev_conf.max_vchans)
> +               return -EINVAL;
> +       RTE_FUNC_PTR_OR_ERR_RET(*dev->fill, -ENOTSUP);
> +#endif
> +
> +       return (*dev->fill)(dev, vchan, pattern, dst, length, flags);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Trigger hardware to begin performing enqueued operations.
> + *
> + * This API is used to write the "doorbell" to the hardware to trigger it
> + * to begin the operations previously enqueued by rte_dmadev_copy/fill().
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + *
> + * @return
> + *   - =0: Successfully trigger hardware.
> + *   - <0: Failure to trigger hardware.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_submit(uint16_t dev_id, uint16_t vchan)
> +{
> +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +
> +#ifdef RTE_DMADEV_DEBUG
> +       if (!rte_dmadev_is_valid_dev(dev_id) ||
> +           vchan >= dev->data->dev_conf.max_vchans)
> +               return -EINVAL;
> +       RTE_FUNC_PTR_OR_ERR_RET(*dev->submit, -ENOTSUP);
> +#endif
> +
> +       return (*dev->submit)(dev, vchan);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Returns the number of operations that have been successfully completed.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param nb_cpls
> + *   The maximum number of completed operations that can be processed.
> + * @param[out] last_idx
> + *   The last completed operation's index.
> + *   If not required, NULL can be passed in.
> + * @param[out] has_error
> + *   Indicates whether there was a transfer error.
> + *   If not required, NULL can be passed in.
> + *
> + * @return
> + *   The number of operations that successfully completed. This return value
> + *   must be less than or equal to the value of nb_cpls.
> + */
> +__rte_experimental
> +static inline uint16_t
> +rte_dmadev_completed(uint16_t dev_id, uint16_t vchan, const uint16_t nb_cpls,
> +                    uint16_t *last_idx, bool *has_error)
> +{
> +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +       uint16_t idx;
> +       bool err;
> +
> +#ifdef RTE_DMADEV_DEBUG
> +       if (!rte_dmadev_is_valid_dev(dev_id) ||
> +           vchan >= dev->data->dev_conf.max_vchans ||
> +           nb_cpls == 0)
> +               return 0;
> +       RTE_FUNC_PTR_OR_ERR_RET(*dev->completed, 0);
> +#endif
> +
> +       /* Ensure the pointer values are non-null to simplify drivers.
> +        * In most cases these should be compile time evaluated, since this is
> +        * an inline function.
> +        * - If NULL is explicitly passed as parameter, then compiler knows the
> +        *   value is NULL
> +        * - If address of local variable is passed as parameter, then compiler
> +        *   can know it's non-NULL.
> +        */
> +       if (last_idx == NULL)
> +               last_idx = &idx;
> +       if (has_error == NULL)
> +               has_error = &err;
> +
> +       *has_error = false;
> +       return (*dev->completed)(dev, vchan, nb_cpls, last_idx, has_error);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Returns the number of operations that have been completed, and the
> + * operations result may succeed or fail.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param nb_cpls
> + *   Indicates the size of status array.
> + * @param[out] last_idx
> + *   The last completed operation's index.
> + *   If not required, NULL can be passed in.
> + * @param[out] status
> + *   The error code of operations that completed.
> + *   @see enum rte_dma_status_code
> + *
> + * @return
> + *   The number of operations that completed. This return value must be less
> + *   than or equal to the value of nb_cpls.
> + */
> +__rte_experimental
> +static inline uint16_t
> +rte_dmadev_completed_status(uint16_t dev_id, uint16_t vchan,
> +                           const uint16_t nb_cpls, uint16_t *last_idx,
> +                           enum rte_dma_status_code *status)
> +{
> +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +       uint16_t idx;
> +
> +#ifdef RTE_DMADEV_DEBUG
> +       if (!rte_dmadev_is_valid_dev(dev_id) ||
> +           vchan >= dev->data->dev_conf.max_vchans ||
> +           nb_cpls == 0 || status == NULL)
> +               return 0;
> +       RTE_FUNC_PTR_OR_ERR_RET(*dev->completed_status, 0);
> +#endif
> +
> +       if (last_idx == NULL)
> +               last_idx = &idx;
> +
> +       return (*dev->completed_status)(dev, vchan, nb_cpls, last_idx, status);
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v1] doc: update atomic operation deprecation
  2021-07-12  8:02  4% [dpdk-dev] [PATCH v1] doc: update atomic operation deprecation Joyce Kong
@ 2021-07-17 18:47  0% ` Honnappa Nagarahalli
  2021-07-23  9:49  4% ` [dpdk-dev] [PATCH v2] " Joyce Kong
  1 sibling, 0 replies; 200+ results
From: Honnappa Nagarahalli @ 2021-07-17 18:47 UTC (permalink / raw)
  To: Joyce Kong, thomas, stephen, Ruifeng Wang, mdr
  Cc: dev, nd, stable, Honnappa Nagarahalli, nd

<snip>

> 
> Update the incorrect description about atomic operations with provided
> wrappers in deprecation doc[1].
> 
> [1]https://mails.dpdk.org/archives/dev/2021-July/213333.html
> 
> Fixes: 7518c5c4ae6a ("doc: announce adoption of C11 atomic operations
> semantics")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index 9584d6bfd7..4142315842 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -19,16 +19,16 @@ Deprecation Notices
> 
>  * rte_atomicNN_xxx: These APIs do not take memory order parameter. This
> does
>    not allow for writing optimized code for all the CPU architectures supported
> -  in DPDK. DPDK will adopt C11 atomic operations semantics and provide
> wrappers
> -  using C11 atomic built-ins. These wrappers must be used for patches that
> -  need to be merged in 20.08 onwards. This change will not introduce any
> -  performance degradation.
> +  in DPDK. DPDK has adopted atomic operations semantics. GCC atomic
> + built-ins  must be used for patches that need to be merged in 20.08
> + onwards. This change  will not introduce any performance degradation.
Since there have been objections to the language used to refer to GCC C11 atomic built-ins, maybe we could add a reference to the GCC pages?

DPDK has adopted the atomic operations from https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html. These operations must be used for patches that need to be merged in 20.08 onwards. This change will not introduce any performance degradation.

> 
>  * rte_smp_*mb: These APIs provide full barrier functionality. However, many
> -  use cases do not require full barriers. To support such use cases, DPDK will
> -  adopt C11 barrier semantics and provide wrappers using C11 atomic built-
> ins.
> -  These wrappers must be used for patches that need to be merged in 20.08
> -  onwards. This change will not introduce any performance degradation.
> +  use cases do not require full barriers. To support such use cases,
> + DPDK has  adopted atomic barrier semantics. GCC atomic built-ins and a
> + new wrapper  ``rte_atomic_thread_fence`` instead of
> + ``__atomic_thread_fence`` must be  used for patches that need to be
> + merged in 20.08 onwards. This change will  not introduce any performance
> degradation.
Same here.
To support such use cases, DPDK has adopted atomic operations from https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html. A new wrapper ``rte_atomic_thread_fence`` instead of ``__atomic_thread_fence`` must be used for patches that need to be merged in 20.08 onwards. This change will not introduce any performance degradation.
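
To illustrate why a memory order parameter matters, here is a minimal sketch using only the GCC `__atomic` built-ins from the page linked above. The `mailbox` type and both helpers are hypothetical names made up for this example; in DPDK code a standalone fence would go through ``rte_atomic_thread_fence`` rather than calling ``__atomic_thread_fence`` directly, as the proposed text says.

```c
#include <stdint.h>

/* Hypothetical single-producer/single-consumer mailbox. */
struct mailbox {
	uint32_t payload;
	uint32_t ready;
};

/* Producer: write the payload, then publish it with a release store.
 * The release ordering keeps the payload write from being reordered
 * after the flag write; no full barrier is needed. */
static void
mailbox_publish(struct mailbox *mb, uint32_t value)
{
	mb->payload = value;
	__atomic_store_n(&mb->ready, 1, __ATOMIC_RELEASE);
}

/* Consumer: an acquire load of the flag pairs with the release store,
 * so the payload read below sees the published value. */
static int
mailbox_try_read(struct mailbox *mb, uint32_t *value)
{
	if (__atomic_load_n(&mb->ready, __ATOMIC_ACQUIRE) == 0)
		return 0;
	*value = mb->payload;
	return 1;
}
```

The older rte_atomicNN_xxx and rte_smp_*mb style cannot express this acquire/release pairing, because those APIs take no memory order argument.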

> 
>  * lib: will fix extending some enum/define breaking the ABI. There are
> multiple
>    samples in DPDK that enum/define terminated with a ``.*MAX.*`` value
> which is
> --
> 2.17.1


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] Minutes of Technical Board Meeting 2021-06-02
@ 2021-07-16 14:51  5% Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2021-07-16 14:51 UTC (permalink / raw)
  To: dev

Minutes of Technical Board Meeting, 2021-06-02
==============================================


NOTE: The technical board meets every second Wednesday at
https://meet.jit.si/DPDK at 3 pm UTC.
Meetings are public, and DPDK community members are welcome to attend.

NOTE: Next meeting will be on Wednesday 2021-06-09 @3pm UTC, and will be
chaired by Thomas.

1/ CI infrastructure
--------------------
The current CI infrastructure is failing. The root cause appears to be
the upgrade by UNH IOL causing test failures. The test failures impact the
patch approval process since patches marked as failing CI are normally not
allowed.

Proposal was to have another set of resources to test upgrade before
deploying.


2/ ABI stability period
-----------------------

When initially discussed, the stability period was going to be two
years, but in the final compromise a trial period of one year was agreed,
although the wording in the documentation allows for longer periods.
In the documentation (guides/contributing/abi_policy.rst)
 "Major ABI versions are declared no more frequently than yearly"

The proposal is to go to two year period but there are some open
concerns that need addressing:
  - several data structures and inline functions need to be hidden
    to reduce the exposed ABI.
  - many experimental features need to be moved to stable status.
  - deprecated functions and fields need to be removed.
If 21.11 is going to have two year ABI window, then cleanups are
needed.

Related discussions:

Should the scope of Long Term Stable (LTS) be expanded? Right now,
the scope is limited to bug fixes. Vendors and distros using LTS
would appreciate having new drivers (and PCI ids).
What about backporting standalone new libraries to LTS?
Conclusion: more discussion about requirements and risks
is needed before expanding LTS.

The current ABI policy has had indirect benefits. The ABI clamp
has acted to reduce wild/unstable changes and leads to better designs.
The downside is that there is less of a trial window for changes: if a new
feature requires an ABI change, it goes into the yearly release without getting
a longer period of review and testing.

What kind of upcoming features need ABI breakage?

Conclusion:
A taskforce will be set up to make a more concrete recommendation.
The taskforce will give status update in 2 weeks (next TAB)
and recommend action for 21.11 in one month.

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [RFC PATCH v4 0/3] Add PIE support for HQoS library
  @ 2021-07-16 12:46  0%   ` Dumitrescu, Cristian
  0 siblings, 0 replies; 200+ results
From: Dumitrescu, Cristian @ 2021-07-16 12:46 UTC (permalink / raw)
  To: Liguzinski, WojciechX, dev, Singh, Jasvinder
  Cc: Dharmappa, Savinay, Ajmera, Megha

Hi Wojciech,

Thank you for doing this work!

> -----Original Message-----
> From: Liguzinski, WojciechX <wojciechx.liguzinski@intel.com>
> Sent: Monday, July 5, 2021 9:04 AM
> To: dev@dpdk.org; Singh, Jasvinder <jasvinder.singh@intel.com>;
> Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Cc: Dharmappa, Savinay <savinay.dharmappa@intel.com>; Ajmera, Megha
> <megha.ajmera@intel.com>
> Subject: [RFC PATCH v4 0/3] Add PIE support for HQoS library
> 
> DPDK sched library is equipped with mechanism that secures it from the
> bufferbloat problem
> which is a situation when excess buffers in the network cause high latency
> and latency
> variation. Currently, it supports RED for active queue management (which is
> designed
> to control the queue length but it does not control latency directly and is now
> being
> obsoleted). However, more advanced queue management is required to
> address this problem
> and provide desirable quality of service to users.

As already mentioned by other reviewers, I don't think RED/WRED is getting obsoleted. This entire paragraph is a bit fuzzy and not really adding much value IMO, I propose to remove it.

> 
> This solution (RFC) proposes usage of new algorithm called "PIE"
> (Proportional Integral
> controller Enhanced) that can effectively and directly control queuing latency
> to address
> the bufferbloat problem.

Please add a link to the public RFC for PIE in this cover letter.

> 
> The implementation of mentioned functionality includes modification of
> existing and
> adding a new set of data structures to the library, adding PIE related APIs.
> This affects structures in public API/ABI. That is why deprecation notice is
> going
> to be prepared and sent.

I think you are stating the obvious here, how about removing this paragraph as well?

> 
> Liguzinski, WojciechX (3):
>   sched: add PIE based congestion management
>   example/qos_sched: add PIE support
>   example/ip_pipeline: add PIE support
> 
>  config/rte_config.h                      |   1 -
>  drivers/net/softnic/rte_eth_softnic_tm.c |   6 +-
>  examples/ip_pipeline/tmgr.c              |   6 +-
>  examples/qos_sched/app_thread.c          |   1 -
>  examples/qos_sched/cfg_file.c            |  82 ++++-
>  examples/qos_sched/init.c                |   7 +-
>  examples/qos_sched/profile.cfg           | 196 +++++++----
>  lib/sched/meson.build                    |  10 +-
>  lib/sched/rte_pie.c                      |  82 +++++
>  lib/sched/rte_pie.h                      | 393 +++++++++++++++++++++++
>  lib/sched/rte_sched.c                    | 229 +++++++++----
>  lib/sched/rte_sched.h                    |  53 ++-
>  lib/sched/version.map                    |   3 +
>  13 files changed, 888 insertions(+), 181 deletions(-)
>  create mode 100644 lib/sched/rte_pie.c
>  create mode 100644 lib/sched/rte_pie.h
> 
> --
> 2.17.1

Regards,
Cristian

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] RFC enabling dll/dso for dpdk on windows
  2021-07-09  1:03  2%   ` Tyler Retzlaff
@ 2021-07-16  9:40  4%     ` Dmitry Kozlyuk
  0 siblings, 0 replies; 200+ results
From: Dmitry Kozlyuk @ 2021-07-16  9:40 UTC (permalink / raw)
  To: Tyler Retzlaff; +Cc: dev, thomas

2021-07-08 18:03 (UTC-0700), Tyler Retzlaff:
> On Thu, Jul 08, 2021 at 11:49:53PM +0300, Dmitry Kozlyuk wrote:
> > Hi Tyler,
> > 
> > 2021-07-08 12:21 (UTC-0700), Tyler Retzlaff:  
> > > hi folks,
> > > 
> > > we would like to submit a patch series that makes dll/dso for dpdk
> > > work on windows. there are two differences in the windows platform that
> > > would need to be addressed through enhancements to dpdk.
> > > 
> > > (1) windows dynamic objects don't export sufficient information for
> > >     tls variables and the windows loader and runtime would need to be
> > >     enhanced in order to perform runtime linking. [1][2]  
> > 
> > When will the new loader be available?  
> 
> the solution i have prototyped does not directly export the tls variables
> and instead relies on exports of tls offsets within a module.  no loader
> change or new os is required.
> 
> > Will it be ported to Server 2019?  
> 
> not necessary (as per above)
> 
> > Will this functionality require compiler support  
> 
> the prototype was developed using windows clang, mingw code compiles but
> i did not try to run it. i suspect it is okay though haven't examine any
> side-effects when using emul tls like mingw does. anyway mingw dll's
> don't work now and it probably shouldn't block them being available with
> clang.

AFAIK it's the opposite. MinGW can handle TLS variable export from DLL,
although with "__emutls." prefix and some performance penalty.
Clang can't at all, despite compiling and linking without an issue.

No, it is not acceptable to add a generic feature supported by only one
compiler. (FWIW, I'm displeased even by mlx5 being tied to clang.)
Particularly, I don't understand how could MinGW and clang coexist
if they export different sets of symbols. Apps will need to know
if it's MinGW- or clang-compiled DPDK? Sounds messy.

> > (you mention that accessing such variables will be "non-trivial")?  
> 
> the solution involves exporting offsets that then allow explicit tls
> accesses relative to the gs segment. it's non-trivial in the sense that
> none of the normal explicit tls functions in windows are used and the
> compiler doesn't generate the code for implicit tls access. the overhead
> is relatively tolerable (one or two additional dereferences).

A thorough benchmark will be required. I'm afraid that inline assembly
(which %gs mention suggests) can impact optimization of the code nearby.
Ideally it should be a DPDK performance autotest.

> 
> >    
> > > (2) importing exported data symbols from a dll/dso on windows requires
> > >     that the symbol be decorated with dllimport. optionally loading
> > >     performance of dll/dso is also further improved by decorating
> > >     exported function symbols. [3]  
> > 
> > Does it affect ABI?  
> 
> the data symbols are already part of the abi for linux. this just allows
> them to be properly accessed when exported from dll on windows.
> surprisingly lld-link doesn't fail when building dll's now which it should
> in the absence of a __declspec(dllimport) ms link would.
> 
> on windows now the tls variables are exported but not useful with this
> change we would choose not to export them at all and each exported tls
> variable would be replaced with a new variable.
> 
> one nit (which we will get separate feedback on) is how to export
> symbols only on windows (and don't export them on linux) because similar
> to the tls variables linux has no use for my new variables.

There's already WINDOWS_NO_EXPORT mark in .map to generate .def,
likewise, .map for Linux/FreeBSD could be generated from a basic .map
with similar marks.

> > 
> > It is also a huge code change, although a mechanical one.
> > Is it required? All exported symbols are listed in .map/def, after all.  
> 
> if broad sweeping mechanical change is a sensitive issue we can limit
> the change to just the data symbols which are required. but keeping in
> mind there is a penalty on load time when the function symbols are not
> decorated. ultimately we would like them all properly decorated but we
> don't need to push it now since we're just trying to enable the
> functionality.

I was asking in connection with the previous question about ABI,
because 21.11 ABI freeze may be a two-year one. Since ABI is not affected
for Unix and for Windows we don't maintain it currently, there is no rush for
the change at least.
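
For reference, the data-symbol decoration discussed under point (2) is usually done with a macro along these lines. This is a generic sketch, not the macro from any proposed patch: `EXAMPLE_DATA`, `EXAMPLE_BUILDING_LIB`, `example_log_level` and `example_get_log_level` are hypothetical names, and the single translation unit below plays the role of the library itself.

```c
/* Hypothetical build flag: defined when compiling the library itself,
 * left undefined when compiling an application that links against it. */
#define EXAMPLE_BUILDING_LIB 1

#if defined(_WIN32)
#  ifdef EXAMPLE_BUILDING_LIB
#    define EXAMPLE_DATA __declspec(dllexport)
#  else
#    define EXAMPLE_DATA __declspec(dllimport)
#  endif
#else
/* On ELF platforms no import decoration is needed; default visibility
 * is enough for the symbol to be reachable from other objects. */
#  define EXAMPLE_DATA __attribute__((visibility("default")))
#endif

/* Header side: what applications see. On Windows, consumers get the
 * dllimport form and access the variable through the import table. */
extern EXAMPLE_DATA int example_log_level;

/* Library side: the actual definition of the exported data symbol. */
EXAMPLE_DATA int example_log_level = 3;

/* Trivial accessor used here only to exercise the symbol. */
static int
example_get_log_level(void)
{
	return example_log_level;
}
```

This only covers ordinary exported data. It does not address the TLS variables discussed above, which are the part that needs the offset-based scheme or another mechanism.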


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] app/testpmd: fix testpmd doesn't show RSS hash offload
  @ 2021-07-16  8:52  3%       ` Ferruh Yigit
       [not found]             ` <DM8PR11MB5639C757A790F65CBFB647C2D1E19@DM8PR11MB5639.namprd11.prod.outlook.com>
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2021-07-16  8:52 UTC (permalink / raw)
  To: Li, Xiaoyun, Wang, Jie1X, dev; +Cc: andrew.rybchenko, stable

On 7/16/2021 9:30 AM, Li, Xiaoyun wrote:
>> -----Original Message-----
>> From: stable <stable-bounces@dpdk.org> On Behalf Of Li, Xiaoyun
>> Sent: Thursday, July 15, 2021 12:54
>> To: Wang, Jie1X <jie1x.wang@intel.com>; dev@dpdk.org
>> Cc: andrew.rybchenko@oktetlabs.ru; stable@dpdk.org
>> Subject: Re: [dpdk-stable] [PATCH v4] app/testpmd: fix testpmd doesn't show
>> RSS hash offload
>>
>>> -----Original Message-----
>>> From: Wang, Jie1X <jie1x.wang@intel.com>
>>> Sent: Thursday, July 15, 2021 19:57
>>> To: dev@dpdk.org
>>> Cc: Li, Xiaoyun <xiaoyun.li@intel.com>; andrew.rybchenko@oktetlabs.ru;
>>> Wang, Jie1X <jie1x.wang@intel.com>; stable@dpdk.org
>>> Subject: [PATCH v4] app/testpmd: fix testpmd doesn't show RSS hash
>>> offload
>>>
>>> The driver may change offloads info into dev->data->dev_conf in
>>> dev_configure which may cause port->dev_conf and port->rx_conf contain
>> outdated values.
>>>
>>> This patch updates the offloads info if it changes to fix this issue.
>>>
>>> Fixes: ce8d561418d4 ("app/testpmd: add port configuration settings")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Jie Wang <jie1x.wang@intel.com>
>>> ---
>>> v4: delete the whitespace at the end of the line.
>>> v3:
>>>  - check and update the "offloads" of "port->dev_conf.rx/txmode".
>>>  - update the commit log.
>>> v2: copy "rx/txmode.offloads", instead of copying the entire struct
>>> "dev->data-
>>>> dev_conf.rx/txmode".
>>> ---
>>>  app/test-pmd/testpmd.c | 27 +++++++++++++++++++++++++++
>>>  1 file changed, 27 insertions(+)
>>
>> Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
> 
> Although I gave my ack, app shouldn't touch rte_eth_devices which this patch does. Usually, testpmd should only call function like eth_dev_info_get_print_err().
> But dev_info doesn't contain the info dev->data->dev_conf which the driver modifies.
> 
> Probably we need a better fix.
> 

Agree, an application directly accessing 'rte_eth_devices' is a sign of
something missing/wrong.

In this case there is no way for the application to know what the configured
offload settings are per port and queue. That is the missing part, I think.

As you said normally we get data from PMD mainly via 'rte_eth_dev_info_get()',
which is an overloaded function, it provides many different things, like driver
default values, limitations, current config/status, capabilities etc...

So I think we can do a few things:
1) Add current offload configuration to 'rte_eth_dev_info_get()', so the application
can get it and use it.
The advantage is this API is already called in many places, many times, so there is a
big chance that the application already has this information when it needs it.
The disadvantage is, as mentioned above, the API is already big and messy; making it
bigger makes it more error prone and makes it easier to break the ABI.

2) Add a new API to get configured offload information, so a specific API for it.

3) Add a more generic API to get the configured config (dev_conf), which will cover
offloads too.
The disadvantage can be leaking too much internal config to the user unintentionally.
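
As a rough illustration of option 2, a dedicated getter could copy out only the offload fields and nothing else. Everything below is hypothetical and self-contained: `example_port`, `example_offload_conf` and `example_offload_conf_get()` are invented for this sketch and are not existing ethdev types or functions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_MAX_PORTS 4

/* Hypothetical internal per-port state, standing in for the
 * driver-updated dev->data->dev_conf. */
struct example_port {
	int configured;
	uint64_t rx_offloads;
	uint64_t tx_offloads;
	uint32_t internal_only; /* must not leak to applications */
};

/* Hypothetical public view: only the configured offloads. */
struct example_offload_conf {
	uint64_t rx_offloads;
	uint64_t tx_offloads;
};

static struct example_port example_ports[EXAMPLE_MAX_PORTS];

/* Copy the currently configured offloads for a port into 'conf'.
 * Returns 0 on success, -EINVAL on bad arguments, -ENODEV if the
 * port has not been configured yet. */
static int
example_offload_conf_get(uint16_t port_id, struct example_offload_conf *conf)
{
	const struct example_port *port;

	if (port_id >= EXAMPLE_MAX_PORTS || conf == NULL)
		return -EINVAL;

	port = &example_ports[port_id];
	if (!port->configured)
		return -ENODEV;

	conf->rx_offloads = port->rx_offloads;
	conf->tx_offloads = port->tx_offloads;
	return 0;
}
```

An application such as testpmd would call the getter after configuring the port and refresh its cached copy, instead of reading the device array directly. The narrow output struct is what separates this from option 3.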

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [EXT] Re: [PATCH v2 1/2] drivers: add octeontx crypto adapter framework
  2021-07-15 14:21  5%     ` David Marchand
@ 2021-07-16  8:39  3%       ` Akhil Goyal
  2021-07-20 11:58  3%         ` Akhil Goyal
  0 siblings, 1 reply; 200+ results
From: Akhil Goyal @ 2021-07-16  8:39 UTC (permalink / raw)
  To: David Marchand, Shijith Thotton, Thomas Monjalon,
	Jerin Jacob Kollanukkaran
  Cc: dev, Pavan Nikhilesh Bhagavatula, Anoob Joseph,
	Abhinandan Gujjar, Ankur Dwivedi, Ray Kinsella, Aaron Conole,
	dpdklab, Lincoln Lavoie

Hi David,

> >  deps += ['common_octeontx', 'mempool_octeontx', 'bus_vdev',
> 'net_octeontx']
> > +deps += ['crypto_octeontx']
> 
> This extra dependency resulted in disabling the event/octeontx driver
> in FreeBSD, since crypto/octeontx only builds on Linux.
> Removing hw support triggers an ABI failure for FreeBSD.
> 
> 
> - This had been reported by UNH CI:
> http://mails.dpdk.org/archives/test-report/2021-June/200637.html 
> It seems the result has been ignored but it should have at least
> raised some discussion.
> 
This was highlighted to CI ML
http://patches.dpdk.org/project/dpdk/patch/0686a7c3fb3a22e37378a8545bc37bce04f4c391.1624481225.git.sthotton@marvell.com/

but I think I missed following up with Brandon and applied the patch,
as it did not look like an issue to me since octeon drivers are not currently built on FreeBSD.
Not sure why the event driver is getting built there.

> 
> - I asked UNH to stop testing FreeBSD abi for now, waiting to get the
> main branch fixed.
> 
> I don't have the time to look at this, please can you work on it?
> 
> Several options:
> * crypto/octeontx is made so that it compiles on FreeBSD,
> * the abi check is extended to have exceptions per OS,
> * the FreeBSD abi reference is regenerated at UNH not to have those
> drivers in it (not sure it is doable),

Thanks for the suggestions, we are working on it to resolve this as soon as possible.
We may need to add an exception in ABI checking so that it does not complain if a PMD
is not compiled.

Regards,
Akhil

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] DPDK Release Status Meeting 15/07/2021
@ 2021-07-15 22:28  4% Mcnamara, John
  0 siblings, 0 replies; 200+ results
From: Mcnamara, John @ 2021-07-15 22:28 UTC (permalink / raw)
  To: dev; +Cc: thomas, Yigit, Ferruh

Release status meeting minutes {Date}
=====================================
:Date: 15 July 2021
:toc:

.Agenda:
* Release Dates
* Subtrees
* Roadmaps
* LTS
* Defects
* Opens

.Participants:
* ARM
* Debian/Microsoft
* Intel
* Marvell
* Nvidia
* Red Hat


Release Dates
-------------

* `v21.08` dates
  - Proposal/V1:    Wednesday, 2 June  (completed)
  - rc1:            Saturday,  10 July (completed)
  - rc2:            Thursday,  22 July
  - rc3:            Thursday,  29 July
  - Release:        Tuesday,   3 August

* Note: We need to hold to the early August release date since
  several of the maintainers will be on holidays after that.

* `v21.11` dates (proposed and subject to discussion)
  - Proposal/V1:    Friday, 10 September
  - -rc1:           Friday, 15 October
  - Release:        Friday, 19 November

Subtrees
--------

* main
  - RC1 released.
  - RC2 targeted for Thursday 22 July.
  - Still waiting update on Solarflare patches.


* next-net
  - No update.

* next-crypto
  - 4 new PMDs in this release:
    ** CNXK - merged.
    ** MLX - on last review. Should be merged for RC2.
    ** Intel QAT - Should be merged for RC2.
    ** NXP baseband - will be deferred to next release.

* next-eventdev
  - PR for RC2 today or tomorrow.

* next-virtio
  - Some patches in PR today.
  - New DMA Dev work complicates merging of some of the vhost patches.
    An offline meeting will be held to view options and decide the
    best technical approach. Maxime to set up.

* next-net-brcm
  - No update.

* next-net-intel
  - No update.

* next-net-mlx
  - PR not pulled due to comments that need to be addressed.
  - New version sent today.

* next-net-mrvl
  - Almost ready for RC2.
  - 1 or 2 small series and bug fixes.


LTS
---

* `v19.11` (next version is `v19.11.9`)
  - RC4 tagged.
  - Target release date July 19.

* `v20.11` (next version is `v20.11.3`)
  - 20.11.2 released by Xueming Li on July 7.
  - https://git.dpdk.org/dpdk-stable/commit/?h=20.11&id=a86024748385423306aac45524d6fc8d22ea6703

* Distros
  - v20.11 in Debian 11
  - v20.11 in Ubuntu 21.04


Defects
-------

* Bugzilla links, 'Bugs',  added for hosted projects
  - https://www.dpdk.org/hosted-projects/


Opens
-----

* There is an ongoing initiative around ABI stability which was
  discussed in the Tech Board call. A workgroup has come up
  with a list of critical and major changes required to let us
  extend the ABI without as much disruption. For example:

  ** export driver interfaces as internal
  ** hide more structs (may require uninlining)
  ** split big structs + new feature-specific functions Major
  ** remove enum maximums
  ** reserved space initialized to 0
  ** reserved flags cleared

* We need to fill details and volunteers in this table:
  https://docs.google.com/spreadsheets/d/1betlC000ua5SsSiJIcC54mCCCJnW6voH5Dqv9UxeyfE/edit?usp=sharing




.DPDK Release Status Meetings
*****
The DPDK Release Status Meeting is intended for DPDK Committers to discuss the status of the master tree and sub-trees, and for project managers to track progress or milestone dates.

The meeting occurs every Thursday at 8:30 UTC on https://meet.jit.si/DPDK

If you wish to attend just send an email to "John McNamara <john.mcnamara@intel.com>" for the invite.
*****

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2 1/2] drivers: add octeontx crypto adapter framework
  @ 2021-07-15 14:21  5%     ` David Marchand
  2021-07-16  8:39  3%       ` [dpdk-dev] [EXT] " Akhil Goyal
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2021-07-15 14:21 UTC (permalink / raw)
  To: Shijith Thotton, Akhil Goyal
  Cc: dev, Pavan Nikhilesh, Anoob Joseph, Jerin Jacob Kollanukkaran,
	Abhinandan Gujjar, Ankur Dwivedi, Ray Kinsella, Aaron Conole,
	dpdklab, Lincoln Lavoie

Hello,

On Wed, Jun 23, 2021 at 10:54 PM Shijith Thotton <sthotton@marvell.com> wrote:
> diff --git a/drivers/event/octeontx/meson.build b/drivers/event/octeontx/meson.build
> index 3cb140b4de..0d9eec3f2e 100644
> --- a/drivers/event/octeontx/meson.build
> +++ b/drivers/event/octeontx/meson.build
> @@ -12,3 +12,4 @@ sources = files(
>  )
>
>  deps += ['common_octeontx', 'mempool_octeontx', 'bus_vdev', 'net_octeontx']
> +deps += ['crypto_octeontx']

This extra dependency resulted in disabling the event/octeontx driver
in FreeBSD, since crypto/octeontx only builds on Linux.
Removing hw support triggers an ABI failure for FreeBSD.


- This had been reported by UNH CI:
http://mails.dpdk.org/archives/test-report/2021-June/200637.html
It seems the result has been ignored but it should have at least
raised some discussion.


- I asked UNH to stop testing FreeBSD abi for now, waiting to get the
main branch fixed.

I don't have the time to look at this, please can you work on it?

Several options:
* crypto/octeontx is made so that it compiles on FreeBSD,
* the abi check is extended to have exceptions per OS,
* the FreeBSD abi reference is regenerated at UNH not to have those
drivers in it (not sure it is doable),


Thanks.

-- 
David Marchand


^ permalink raw reply	[relevance 5%]

* [dpdk-dev] Techboard - minutes of meeting 2021-07-14
@ 2021-07-15  9:29  5% Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2021-07-15  9:29 UTC (permalink / raw)
  To: techboard; +Cc: dev

Attendees
---------

* Aaron Conole
* Bruce Richardson
* Ferruh Yigit
* Hemant Agrawal
* Honnappa Nagarahalli
* Jerin Jacob
* Kevin Traynor
* Konstantin Ananyev
* Stephen Hemminger
* Thomas Monjalon

Minutes
-------

1. Readout on survey on next-net maintainership

* Ferruh provided a summary of results of a survey he carried out with
  driver maintainers and techboard for feedback on the next-net tree he
  maintains.
* Unfortunately response rate was very low
* Key feedback received in survey:
  * working with patches on mailing list is found to be difficult, with
    large volumes of mails
  * submitters found it awkward to have to do patchwork updates manually on
    sending new patch revisions
  * request for more user-friendly tooling workflow

* Techboard held a discussion on a possible trial of using other tools for
  development workflow in the future. Largely requires a tree maintainer to
  volunteer to run such a trial for a release period to investigate how it
  works and what issues are discovered.

2. ABI/API compatibility and expanded ABI-stability window

* Proposal has been sent out to TB and maintainers on increasing the ABI
  compatibility period to 2 years from 1 year.
  * Lack of general feedback on this
* Work is ongoing to identify and address ABI concerns within DPDK project.
  Maintainers are asked to help with identifying issues in their own areas
  of expertise.
* Discussion was held on changing to 2-year ABI period immediately for
  21.11 or to do so after a review next year. No clear consensus emerged.

* ACTION: Ferruh/Thomas to send out patch to DPDK mailing list on ABI:
  * To clarify 2-year proposal specifically
  * To expand discussion wider to the whole development community.

3. US DPDK Event

* the lower than expected attendance numbers were noted
* some discussion on selection criteria and avoidance of very
  vendor-specific content for future events
* general hope within TB for in-person events rather than virtual next
  year!

4. Techboard Membership

* Hemant led some initial discussion on latest document draft

* ACTION: All-TB-Members, (re)review latest document on Techboard
  governance.


^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v3] dmadev: introduce DMA device library
  2021-07-13 13:37  0%     ` Bruce Richardson
@ 2021-07-15  6:44  0%       ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2021-07-15  6:44 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: fengchengwen, Thomas Monjalon, Ferruh Yigit, Jerin Jacob,
	Andrew Rybchenko, dpdk-dev, Morten Brørup, Nipun Gupta,
	Hemant Agrawal, Maxime Coquelin, Honnappa Nagarahalli,
	David Marchand, Satananda Burla, Prasun Kapoor, Ananyev,
	Konstantin

On Tue, Jul 13, 2021 at 7:08 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Tue, Jul 13, 2021 at 09:06:39PM +0800, fengchengwen wrote:
> > Thank you for your valuable comments, and I think we've taken a big step forward.
> >
> > @andrew Could you provide the copyright line so that I can add it to relevant file.
> >
> > @burce, jerin  Some unmodified review comments are returned here:
>
> Thanks. Some further comments inline below. Most points you make I'm ok
> with, but I do disagree on a number of others.
>
> /Bruce
>
> >
> > 1.
> > COMMENT: We allow up to 100 characters per line for DPDK code, so these don't need
> > to be wrapped so aggressively.
> >
> > REPLY: Our CI still has 80 characters limit, and I review most framework still comply.
> >
> Ok.
>
> > 2.
> > COMMENT: > +#define RTE_DMA_MEM_TO_MEM     (1ull << 0)
> > RTE_DMA_DIRECTION_...
> >
> > REPLY: add the 'DIRECTION' may the macro too long, I prefer keep it simple.
> >
> DIRECTION could be shortened to DIR, but I think this is probably ok as is
> too.
>

I prefer to keep DIR so that it is easy to refer to in documentation, like
@see RTE_DMA_DIR_*


> > 3.
> > COMMENT: > +rte_dmadev_vchan_release(uint16_t dev_id, uint16_t vchan);
> > We are not making release as pubic API in other device class. See ethdev spec.
> > bbdev/eventdev/rawdev
> >
> > REPLY: because ethdev's queue is hard-queue, and here is the software defined channels,
> > I think release is OK, BTW: bbdev/eventdev also have release ops.

I don't see any API like rte_event_queue_release() in eventdev. It
has only setup.

Typical flow is
1) configure() the N vchan
2) for i..N setup() the chan
3) start()
4) stop()
5) configure again with M vchan
6) for i..M setup() the chan
7) start()

The above is documented at the beginning of the rte_dmadev.h header file.
I think the above sequence makes it easy for drivers. Just like in other
device classes, _release can be a PMD hook which is handled in the
configure() common code.
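
The flow above can be sketched as a small self-contained C model. This is
only an illustration under assumptions: every mock_ name is hypothetical and
not part of the dmadev API, and the real common code may look different. It
shows reconfiguration releasing old vchans through an internal PMD hook, so
no public release call is needed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_MAX_VCHAN 8

/* Hypothetical per-device state; not the real struct rte_dmadev. */
struct mock_dmadev {
	uint16_t nb_vchans;               /* vchans requested at configure() */
	bool vchan_ready[MOCK_MAX_VCHAN]; /* set by vchan_setup() */
	bool started;
	unsigned int pmd_release_calls;   /* how often the PMD hook ran */
};

/* PMD-internal hook: frees one vchan. Not exposed as a public API. */
static void
mock_pmd_vchan_release(struct mock_dmadev *dev, uint16_t vchan)
{
	dev->vchan_ready[vchan] = false;
	dev->pmd_release_calls++;
}

/* Common code: reconfiguring releases the old vchans through the hook. */
static int
mock_dma_configure(struct mock_dmadev *dev, uint16_t nb_vchans)
{
	uint16_t i;

	if (dev->started)
		return -EBUSY;
	if (nb_vchans == 0 || nb_vchans > MOCK_MAX_VCHAN)
		return -EINVAL;
	for (i = 0; i < dev->nb_vchans; i++)
		if (dev->vchan_ready[i])
			mock_pmd_vchan_release(dev, i);
	dev->nb_vchans = nb_vchans;
	return 0;
}

static int
mock_dma_vchan_setup(struct mock_dmadev *dev, uint16_t vchan)
{
	if (dev->started)
		return -EBUSY;
	if (vchan >= dev->nb_vchans)
		return -EINVAL;
	dev->vchan_ready[vchan] = true;
	return 0;
}

/* start() succeeds only when every configured vchan has been set up. */
static int
mock_dma_start(struct mock_dmadev *dev)
{
	uint16_t i;

	if (dev->nb_vchans == 0)
		return -EINVAL;
	for (i = 0; i < dev->nb_vchans; i++)
		if (!dev->vchan_ready[i])
			return -EINVAL;
	dev->started = true;
	return 0;
}

static void
mock_dma_stop(struct mock_dmadev *dev)
{
	dev->started = false;
}
```

In this model an application would call configure, set up each vchan, start,
stop, and then configure again with a different count; the second configure
is where the old channels get released.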



> >
> Ok


> > 4.  COMMENT:> +       uint64_t reserved[4]; /**< Reserved for future
> > fields */
> > > +};
> > Please add the capability for each counter in info structure as one
> > device may support all the counters.
> >
> > REPLY: This is a statistics function. If this function is not supported,
> > then do not need to implement the stats ops function. Also could to set
> > the unimplemented ones to zero.
> >
> +1
> The stats functions should be a minimum set that is supported by all
> drivers. Each of these stats can be easily tracked by software if HW
> support for it is not available, so I agree that we should not have each
> stat as a capability.

In our current HW, submitted_count and completed_count are offloaded to HW.
In addition to that, we have a provision for getting stats for bytes
copied. (We can make it an xstat if other drivers won't support it.)

Our plan is to use enqueued_count and completed_fail_count in SW under
conditional compilation flags or another scheme, as it is in the fastpath.

If we are not planning to add a capability, IMO, we need to update the
documentation to say that unimplemented counters will return zero. But
there is the question of how to differentiate between an unimplemented
counter and a genuine zero value. IMO, we can update the doc for this
case as well, or add a capability.
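
One way to tell an unimplemented counter from a genuine zero is a validity
mask carried alongside the counters. The sketch below is only an assumption
about how that could look; the struct, the flag names and mock_stat_read()
are hypothetical and are not part of the proposed dmadev API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags: one bit per counter the driver actually maintains. */
#define MOCK_STAT_SUBMITTED      (UINT64_C(1) << 0)
#define MOCK_STAT_COMPLETED      (UINT64_C(1) << 1)
#define MOCK_STAT_COMPLETED_FAIL (UINT64_C(1) << 2)

/* Hypothetical stats layout; not the real dmadev stats struct. */
struct mock_dma_stats {
	uint64_t valid_mask;     /* which of the fields below are implemented */
	uint64_t submitted;
	uint64_t completed;
	uint64_t completed_fail;
};

/*
 * Return true and store the counter in *val only if the driver implements
 * it; return false for an unimplemented or unknown counter.
 */
static bool
mock_stat_read(const struct mock_dma_stats *st, uint64_t which, uint64_t *val)
{
	if ((st->valid_mask & which) == 0)
		return false;

	switch (which) {
	case MOCK_STAT_SUBMITTED:
		*val = st->submitted;
		break;
	case MOCK_STAT_COMPLETED:
		*val = st->completed;
		break;
	case MOCK_STAT_COMPLETED_FAIL:
		*val = st->completed_fail;
		break;
	default:
		return false;
	}
	return true;
}
```

With such a mask, a counter that reads zero and has its bit set is a real
zero, while a cleared bit means the driver does not track it.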


>
> > 5.
> > COMMENT: > +#endif
> > > +       return (*dev->fill)(dev, vchan, pattern, dst, length, flags);
> > Instead of every driver set the NOP function, In the common code, If
> > the CAPA is not set,
> > common code can set NOP function for this with <0 return value.
> >
> > REPLY: I don't think it's a good idea to judge in IO path, it's application duty to ensure
> > don't call API which driver not supported (which could get from capabilities).
> >
> For datapath functions, +1.

OK. Probably add some NOP function (which returns an error) in pmd.h so
that all drivers can reuse it.
No strong opinion.
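
A shared stub of that kind could look roughly like the sketch below. It is
a self-contained model, not the real pmd.h: the mock_ types, the capability
bit and the finalize helper are all hypothetical. The point is that the
check runs once at registration time, so the datapath wrapper stays a plain
indirect call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_CAPA_OPS_FILL (UINT64_C(1) << 0) /* hypothetical capability bit */

struct mock_dmadev;
typedef int (*mock_fill_t)(struct mock_dmadev *dev, uint16_t vchan,
			   uint64_t pattern, void *dst, uint32_t length);

/* Hypothetical device struct holding one datapath function pointer. */
struct mock_dmadev {
	uint64_t capa;
	mock_fill_t fill;
};

/* Shared stub: drivers lacking the op need not write their own. */
static int
mock_fill_notsup(struct mock_dmadev *dev, uint16_t vchan,
		 uint64_t pattern, void *dst, uint32_t length)
{
	(void)dev;
	(void)vchan;
	(void)pattern;
	(void)dst;
	(void)length;
	return -ENOTSUP;
}

/* Common registration code: runs once, off the datapath. */
static void
mock_dmadev_finalize_ops(struct mock_dmadev *dev)
{
	if ((dev->capa & MOCK_CAPA_OPS_FILL) == 0 || dev->fill == NULL)
		dev->fill = mock_fill_notsup;
}

/* Datapath wrapper: no capability check, only the indirect call. */
static inline int
mock_dma_fill(struct mock_dmadev *dev, uint16_t vchan,
	      uint64_t pattern, void *dst, uint32_t length)
{
	return dev->fill(dev, vchan, pattern, dst, length);
}
```

An application that ignores the capability bits then gets -ENOTSUP back
instead of a NULL pointer dereference.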

>
> > 6.
> > COMMENT: > +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vchan,
> > > +                          const uint16_t nb_status, uint32_t *status,
> > uint32_t -> enum rte_dma_status_code
> >
> > REPLY:I'm still evaluating this. It takes a long time for the driver to perform error code
> > conversion in this API. Do we need to provide an error code conversion function alone ?
> >
> It's not that difficult a conversion to do, and so long as we have the
> regular "completed" function which doesn't do all the error manipulation we
> should be fine. Performance in the case of errors is not expected to be as
> good, since errors should be very rare.

+1

>
> > 7.
> > COMMENT: > +typedef int (*dmadev_info_get_t)(struct rte_dmadev *dev,
> > > +                                struct rte_dmadev_info *dev_info);
> > Please change to rte_dmadev_info_get_t to avoid conflict due to namespace issue
> > as this header is exported.
> >
> > REPLY: I prefer not add 'rte_' prefix, it make the define too long.
> >
> I disagree on this, they need the rte_ prefix, despite the fact it makes
> them longer. If length is a concern, these can be changed from "dmadev_" to
> "rte_dma_", which is only one character longer.
> In fact, I believe Morten already suggested we use "rte_dma" rather than
> "rte_dmadev" as a function prefix across the library.

+1

>
> > 8.
> > COMMENT: > + *        - rte_dmadev_completed_fails()
> > > + *            - return the number of operation requests failed to complete.
> > Please rename this to "completed_status" to allow the return of information
> > other than just errors. As I suggested before, I think this should also be
> > usable as a slower version of "completed" even in the case where there are
> > no errors, in that it returns status information for each and every job
> > rather than just returning as soon as it hits a failure.
> >
> > REPLY: well, I think it maybe confuse (current OK/FAIL API is easy to understand.),
> > and we can build the slow path function on the two API.
> >
> I still disagree on this too. We have a "completed" op where we get
> informed of what has completed and minimal error indication, and a
> "completed_status" operation which provides status information for each
> operation completed, at the cost of speed.

+1

>
> > 9.
> > COMMENT: > +#define RTE_DMA_DEV_CAPA_MEM_TO_MEM       (1ull << 0)
> > > +/**< DMA device support mem-to-mem transfer.
> > Do we need this? Can we assume that any device appearing as a dmadev can
> > do mem-to-mem copies, and drop the capability for mem-to-mem and the
> > capability for copying?
> > also for RTE_DMA_DEV_CAPA_OPS_COPY
> >
> > REPLY: yes, I insist on adding this for the sake of conceptual integrity.
> > For ioat driver just make a statement.
> >
>
> Ok. It seems a wasted bit to me, but I don't see us running out of them
> soon.
>
> > 10.
> > COMMENT: > +  uint16_t nb_vchans; /**< Number of virtual DMA channel configured */
> > > +};
> > Let's add rte_dmadev_conf struct into this to return the configuration
> > settings.
> >
> > REPLY: If we add rte_dmadev_conf in, it may break ABI when rte_dmadev_conf add fields.
> >
> Yes, that is true, but I fail to see why that is a major problem. It just
> means that if the conf structure changes we have two functions to version
> instead of one. The information is still useful.
>
> If you don't want the actual conf structure explicitly put into the info
> struct, we can instead put the fields in directly. I really think that the
> info_get function should provide back to the user the details of what way
> the device was configured previously.
>
> regards,
> /Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/2] security: enforce semantics for Tx inline processing
  2021-07-14 13:29  0%                             ` Nithin Dabilpuram
@ 2021-07-14 17:28  0%                               ` Ananyev, Konstantin
  0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2021-07-14 17:28 UTC (permalink / raw)
  To: Nithin Dabilpuram
  Cc: Akhil Goyal, dev, hemant.agrawal, thomas, g.singh, Yigit, Ferruh,
	Zhang, Roy Fan, olivier.matz, jerinj, Doherty, Declan, Nicolau,
	 Radu, jiawenwu, jianwang



> -----Original Message-----
> From: Nithin Dabilpuram <nithind1988@gmail.com>
> Sent: Wednesday, July 14, 2021 2:30 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: Akhil Goyal <gakhil@marvell.com>; dev@dpdk.org; hemant.agrawal@nxp.com; thomas@monjalon.net; g.singh@nxp.com; Yigit, Ferruh
> <ferruh.yigit@intel.com>; Zhang, Roy Fan <roy.fan.zhang@intel.com>; olivier.matz@6wind.com; jerinj@marvell.com; Doherty, Declan
> <declan.doherty@intel.com>; Nicolau, Radu <radu.nicolau@intel.com>; jiawenwu@trustnetic.com; jianwang@trustnetic.com
> Subject: Re: [dpdk-dev] [PATCH 1/2] security: enforce semantics for Tx inline processing
> 
> On Wed, Jul 14, 2021 at 11:09:08AM +0000, Ananyev, Konstantin wrote:
> > > > >
> > > > > Adding more rte_security and PMD maintainers into the loop.
> > > > >
> > > > > > > > > > > > > > > > For Tx inline processing, when RTE_SECURITY_TX_OLOAD_NEED_MDATA is
> > > > > > > > > > > > > > > > set, rte_security_set_pkt_metadata() needs to be called for pkts
> > > > > > > > > > > > > > > > to associate a Security session with a mbuf before submitting
> > > > > > > > > > > > > > > > to Ethdev Tx. This is apart from setting PKT_TX_SEC_OFFLOAD in
> > > > > > > > > > > > > > > > mbuf.ol_flags. rte_security_set_pkt_metadata() is also used to
> > > > > > > > > > > > > > > > set some opaque metadata in mbuf for PMD's use.
> > > > > > > > > > > > > > > > This patch updates documentation that rte_security_set_pkt_metadata()
> > > > > > > > > > > > > > > > should be called only with mbuf containing Layer 3 and above data.
> > > > > > > > > > > > > > > > This behaviour is consistent with existing PMD's such as ixgbe.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tx, not all net PMD's/HW can parse packet and identify
> > > > > > > > > > > > > > > > L2 header and L3 header locations on Tx. This is inline with other
> > > > > > > > > > > > > > > > Tx offloads requirements such as L3 checksum, L4 checksum offload,
> > > > > > > > > > > > > > > > etc, where mbuf.l2_len, mbuf.l3_len etc, needs to be set for
> > > > > > > > > > > > > > > > HW to be able to generate checksum. Since Inline IPSec is also
> > > > > > > > > > > > > > > > such a Tx offload, some PMD's at least need mbuf.l2_len to be
> > > > > > > > > > > > > > > > valid to find L3 header and perform Outbound IPSec processing.
> > > > > > > > > > > > > > > > Hence, this patch updates documentation to enforce setting
> > > > > > > > > > > > > > > > mbuf.l2_len while setting PKT_TX_SEC_OFFLOAD in mbuf.ol_flags
> > > > > > > > > > > > > > > > for Inline IPSec Crypto / Protocol offload processing to
> > > > > > > > > > > > > > > > work on Tx.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
> > > > > > > > > > > > > > > > Reviewed-by: Akhil Goyal <gakhil@marvell.com>
> > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > >  doc/guides/nics/features.rst           | 2 ++
> > > > > > > > > > > > > > > >  doc/guides/prog_guide/rte_security.rst | 6 +++++-
> > > > > > > > > > > > > > > >  lib/mbuf/rte_mbuf_core.h               | 2 ++
> > > > > > > > > > > > > > > >  3 files changed, 9 insertions(+), 1 deletion(-)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> > > > > > > > > > > > > > > > index 403c2b03a..414baf14f 100644
> > > > > > > > > > > > > > > > --- a/doc/guides/nics/features.rst
> > > > > > > > > > > > > > > > +++ b/doc/guides/nics/features.rst
> > > > > > > > > > > > > > > > @@ -430,6 +430,7 @@ of protocol operations. See Security library and PMD documentation for more deta
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``capabilities_get``.
> > > > > > > > > > > > > > > >  * **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > > > > > @@ -451,6 +452,7 @@ protocol operations. See security library and PMD documentation for more details
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``get_userdata``,
> > > > > > > > > > > > > > > >    ``capabilities_get``.
> > > > > > > > > > > > > > > > diff --git a/doc/guides/prog_guide/rte_security.rst b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > > > > index f72bc8a78..7b68c698d 100644
> > > > > > > > > > > > > > > > --- a/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > > > > +++ b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > > > > @@ -560,7 +560,11 @@ created by the application is attached to the security session by the API
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >  For Inline Crypto and Inline protocol offload, device specific defined metadata is
> > > > > > > > > > > > > > > >  updated in the mbuf using ``rte_security_set_pkt_metadata()`` if
> > > > > > > > > > > > > > > > -``DEV_TX_OFFLOAD_SEC_NEED_MDATA`` is set.
> > > > > > > > > > > > > > > > +``RTE_SECURITY_TX_OLOAD_NEED_MDATA`` is set. ``rte_security_set_pkt_metadata()``
> > > > > > > > > > > > > > > > +should be called on mbuf only with Layer 3 and above data present and
> > > > > > > > > > > > > > > > +``mbuf.data_off`` should be pointing to Layer 3 Header.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hmm... not sure why mbuf.data_off should point to L3 hdr.
> > > > > > > > > > > > > > > Who will add L2 hdr to the packet in that case?
> > > > > > > > > > > > > > > Or did you mean ``mbuf.data_off + mbuf.l2_len`` here?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > That is the semantics I was trying to define. I think below are the sequence of
> > > > > > > > > > > > > > operations to be done for ipsec processing,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. receive_pkt()
> > > > > > > > > > > > > > 2. strip_l2_hdr()
> > > > > > > > > > > > > > 3. Do policy lookup ()
> > > > > > > > > > > > > > 4. Call rte_security_set_pkt_metadata() if pkt needs to be encrypted with a
> > > > > > > > > > > > > > particular SA. Now pkt only has L3 and above data.
> > > > > > > > > > > > > > 5. Do route_lookup()
> > > > > > > > > > > > > > 6. add_l2hdr() which might be different from stripped l2hdr.
> > > > > > > > > > > > > > 7. Send packet out.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The above sequence is what I believe the current poll mode worker thread in
> > > > > > > > > > > > > > ipsec-secgw is following.
> > > > > > > > > > > > >
> > > > > > > > > > > > > That's just a sample app, it doesn't mean it has to be the only possible way.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > While in event mode, step 2 and step 6 are missing.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think this L2 hdr manipulation is totally optional.
> > > > > > > > > > > > > If your rte_security_set_pkt_metadata() implementation really needs to know L3 hdr offset (not sure why?),
> > > > > > > > > > > > Since rte_security_set_pkt_metadata() is PMD specific function ptr call, we are currently doing some pre-processing
> > > > > > > > > > > > here before submitting packet to inline IPSec via rte_eth_tx_burst(). This saves us cycles later in rte_eth_tx_burst().
> > > > > > > > > > > > If we cannot know for sure, the pkt content at the time of rte_security_set_pkt_metadata() call, then I think
> > > > > > > > > > > > having a PMD specific callback is not much of use except for saving SA priv data to rte_mbuf.
> > > > > > > > > > > >
> > > > > > > > > > > > > then I suppose we can add a requirement that l2_len has to be set properly before calling
> > > rte_security_set_pkt_metadata().
> > > > > > > > > > > >
> > > > > > > > > > > > This is also fine with us.
> > > > > > > > > > >
> > > > > > > > > > > Ok, so to make sure we are on the same page, you propose:
> > > > > > > > > > > 1. before calling rte_security_set_pkt_metadata() mbuf.l2_len should be properly set.
> > > > > > > > > > > 2. after rte_security_set_pkt_metadata() and before rte_eth_tx_burst() packet contents
> > > > > > > > > > >     at [mbuf.l2_len, mbuf.pkt_len) can't be modified?
> > > > > > > > > > Yes.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Is that correct understanding?
> > > > > > > > > > > If yes, I wonder how 2) will correlate with rte_eth_tx_prepare() concept?
> > > > > > > > > >
> > > > > > > > > > Since our PMD doesn't have a prepare function, I missed that but, since
> > > > > > > > > > rte_security_set_pkt_metadata() is only used for Inline Crypto/Protocol via
> > > > > > > > > > a rte_eth_dev, and both rte_security_set_pkt_metadata() and rte_eth_tx_prepare()
> > > > > > > > > > are callbacks from same PMD, do you see any issue ?
> > > > > > > > > >
> > > > > > > > > > The restriction is from user side, data is not supposed to be modified unless
> > > > > > > > > > rte_security_set_pkt_metadata() is called again.
> > > > > > > > >
> > > > > > > > > Yep, I do have a concern here.
> > > > > > > > > Right now it is perfectly valid to do something like that:
> > > > > > > > > rte_security_set_pkt_metadata(..., mb, ...);
> > > > > > > > > /* can modify contents of the packet */
> > > > > > > > > rte_eth_tx_prepare(..., &mb, 1);
> > > > > > > > > rte_eth_tx_burst(..., &mb, 1);
> > > > > > > > >
> > > > > > > > > With the new restrictions you are proposing it wouldn't be allowed any more.
> > > > > > > > You can still modify L2 header and IPSEC is only concerned about L3 and above.
> > > > > > > >
> > > > > > > > I think insisting that rte_security_set_pkt_metadata() be called after all L3
> > > > > > > > and above header modifications is no a problem. I guess existing ixgbe/txgbe
> > > > > > > > PMD which are the ones only implementing the call back are already expecting the
> > > > > > > > same ?
> > > > > > >
> > > > > > > AFAIK, no there are no such requirements for ixgbe or txgbe.
> > > > > > > All that ixgbe callback does - store session related data inside mbuf.
> > > > > > > It's only expectation to have ESP trailer at the proper place (after ICV):
> > > > > >
> > > > > > This implies rte_security_set_pkt_metadata() cannot be called when mbuf does't
> > > > > > have ESP trailer updated or when mbuf->pkt_len = 0
> > > > > >
> > > > > > >
> > > > > > > union ixgbe_crypto_tx_desc_md *mdata = (union ixgbe_crypto_tx_desc_md *)
> > > > > > >                                 rte_security_dynfield(m);
> > > > > > >   mdata->enc = 1;
> > > > > > >   mdata->sa_idx = ic_session->sa_index;
> > > > > > >   mdata->pad_len = ixgbe_crypto_compute_pad_len(m);
> > > > > > >
> > > > > > > Then this data will be used by tx_burst() function.
> > > > > > So it implies that after above rte_security_set_pkt_metadata() call, and before tx_burst(),
> > > > > > mbuf data / packet len cannot be modified right as if modified, then tx_burst()
> > > > > > will be using incorrect pad len ?
> > > > >
> > > > > No, pkt_len can be modified.
> > > > > Though ESP trailer pad_len can't.
> > > > >
> > > > > >
> > > > > > This patch is also trying to add similar restriction on when
> > > > > > rte_security_set_pkt_metadata() should be called and what cannot be done after
> > > > > > calling rte_security_set_pkt_metadata().
> > > > >
> > > > > No, I don't think it is really the same.
> > > > > Also, IMO, inside ixgbe set_pkt_metadata() implementaion we probably shouldn't silently imply
> > > > > that ESP packet is already formed and trailer contains valid data.
> > > > > In fact, I think this pad_len calculation can be moved to actual TX function.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > If your question is can't we do the preprocessing in rte_eth_tx_prepare() for
> > > > > > > > > > security,
> > > > > > > > >
> > > > > > > > > Yes, that was my thought.
> > > > > > > > >
> > > > > > > > > > my only argument was that since there is already a hit in
> > > > > > > > > > rte_security_set_pkt_metadata() to PMD specific callback and
> > > > > > > > > > struct rte_security_session is passed as an argument to it, it is more benefitial to
> > > > > > > > > > do security related pre-processing there.
> > > > > > > > >
> > > > > > > > > Yes, it would be extra callback call that way.
> > > > > > > > > Though tx_prepare() accepts burst of packets, so the overhead
> > > > > > > > > of function call will be spread around the whole burst, and I presume
> > > > > > > > > shouldn't be too high.
> > > > > > > > >
> > > > > > > > > > Also rte_eth_tx_prepare() if implemented will be called for both security and
> > > > > > > > > > non-security pkts.
> > > > > > > > >
> > > > > > > > > Yes, but tx_prepare() can distinguish (by ol_flags and/or other field contents) which
> > > > > > > > > modifications are required for the packet.
> > > > > > > >
> > > > > > > > But the major issues I see are
> > > > > > > >
> > > > > > > > 1. tx_prepare() doesn't take rte_security_session as argument though ol_flags has security flag.
> > > > > > > >    In our case, we need to know the security session details to do things.
> > > > > > >
> > > > > > > I suppose you can store pointer to session (or so) inside mbuf in rte_security_dynfield, no?
> > > > > >
> > > > > > We can do. But having to call PMD specific function call via rte_security_set_pkt_metadata()
> > > > > > just for storing session pointer in rte_security_dynfield consumes unnecessary
> > > > > > cycles per pkt.
> > > > >
> > > > > In fact there are two function calls: one for rte_security_set_pkt_metadata(),
> > > > > second for  instance->ops->set_pkt_metadata() callback.
> > > > > Which off-course way too expensive for such simple operation.
> > > > > Actually same thought for rte_security_get_userdata().
> > > > > Both of these functions belong to data-path and ideally have to be as fast as possible.
> > > > > Probably 21.11 is a right timeframe for that.
> > > > >
> > > > > > >
> > > > > > > > 2. AFAIU tx_prepare() is not mandatory as per spec and even by default disabled under compile time
> > > > > > > >    macro RTE_ETHDEV_TX_PREPARE_NOOP.
> > > > > > > > 3. Even if we do tx_prepare(), rte_security_set_pkt_mdata() is mandatory to associate
> > > > > > > >    struct rte_security_session to a pkt as unlike ol_flags, there is no direct space to do the same.
> > > > > > >
> > > > > > > Didn't get you here, obviously we do have rte_security_dynfield inside mbuf,
> > > > > > > specially for that - to store secuiryt related data inside the mbuf.
> > > > > > > Yes your PMD has to request it at initialization time, but I suppose it is not a big deal.
> > > > > > >
> > > > > > > > So I think instead of enforcing yet another callback tx_prepare() for inline security
> > > > > > > > processing, it can be done via security specific set_pkt_metadata().
> > > > > > >
> > > > > > > But what you proposing introduces new limitations and might existing functionality.
> > > > > > > BTW, if you don't like to use tx_prepare() - why doing these calculations inside tx_burst()
> > > > > > > itself is not an option?
> > > > > >
> > > > > > We can do things in tx_burst() but if we are doing it there, then we want to avoid having callback for
> > > > > > rte_security_set_pkt_metadata().
> > > > > >
> > > > > > Are you fine if we can update the spec that "When DEV_TX_OFFLOAD_SEC_NEED_MDATA is not
> > > > > > set, then, user needs to update struct rte_security_session's sess_private_data in a in
> > > > > > rte_security_dynfield like below ?
> > > > > >
> > > > > > <snip>
> > > > > >
> > > > > > static inline void
> > > > > > inline_outb_mbuf_prepare(const struct rte_ipsec_session *ss,
> > > > > >         struct rte_mbuf *mb[], uint16_t num)
> > > > > > {
> > > > > >         uint32_t i, ol_flags;
> > > > > >
> > > > > >         ol_flags = ss->security.ol_flags & RTE_SECURITY_TX_OLOAD_NEED_MDATA;
> > > > > >         for (i = 0; i != num; i++) {
> > > > > >
> > > > > >                 mb[i]->ol_flags |= PKT_TX_SEC_OFFLOAD;
> > > > > >
> > > > > >                 if (ol_flags != 0)
> > > > > >                         rte_security_set_pkt_metadata(ss->security.ctx,
> > > > > >                                 ss->security.ses, mb[i], NULL);
> > > > > > 		else
> > > > > >                 	*rte_security_dynfield(mb[i]) =
> > > > > >                                 (uint64_t)ss->security.ses->sess_private_data;
> > > > > >
> > > > > >
> > > > > > If the above can be done, then in our PMD, we will not have a callback for
> > > > > > set_pkt_metadata() and DEV_TX_OFFLOAD_SEC_NEED_MDATA will also be not set
> > > > > > in capabilities.
> > > > >
> > > > > That's an interesting idea, but what you propose is the change in current rte_security API behaviour.
> > > > > So all existing apps that use this API will have to be changed.
> > > > > We'd better avoid such changes unless there is really good reason for that.
> > > > > So, I'd suggest to tweak your idea a bit:
> > > > >
> > > > > 1) change rte_security_set_pkt_metadata():
> > > > > if ops->set_pkt_metadata != NULL, then call it (existing behaviour)
> > > > > otherwise just: rte_security_dynfield(m) = sess->session_private_data;
> > > > > (fast-path)
> > > > >
> > > > > 2) consider to make rte_security_set_pkt_metadata() inline function.
> > > > > We probably can have some special flag inside struct rte_security_ctx,
> > > > > or even store inside ctx a pointer to set_pkt_metadata() itself.
> > > >
> > > > After another thoughts some new flags might be better.
> > > > Then later, if we'll realize that set_pkt_metadata() and get_useradata()
> > > > are not really used by PMDs, it might be easier to deprecate these callbacks.
> > >
> > > Thanks, I agree with your thoughts. I'll submit a V2 with above change, new flags and
> > > set_pkt_metadata() and get_userdata() function pointers moved to rte_security_ctx for
> > > review so that it can be targeted for 21.11.
> > >
> > > Even with flags moving set_pkt_metadata() and get_userdata() function pointers is still needed
> > > as we need to make rte_security_set_pkt_metadata() API inline while struct rte_security_ops is not
> > > exposed to user. I think this is fine as it is inline with how fast path function pointers
> > > of rte_ethdev and rte_cryptodev are currently placed.
> >
> > My thought was we can get away with just flags only.
> > Something like that:
> > rte_security.h:
> >
> > ...
> >
> > enum {
> >         RTE_SEC_CTX_F_FAST_SET_MDATA = 0x1,
> >         RTE_SEC_CTX_F_FAST_GET_UDATA = 0x2,
> > };
> >
> > struct rte_security_ctx {
> >         void *device;
> >         /**< Crypto/ethernet device attached */
> >         const struct rte_security_ops *ops;
> >         /**< Pointer to security ops for the device */
> >         uint16_t sess_cnt;
> >         /**< Number of sessions attached to this context */
> >        uint32_t flags;
> > };
> >
> > extern int
> > __rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
> >                                struct rte_security_session *sess,
> >                                struct rte_mbuf *m, void *params);
> >
> > static inline int
> > rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
> >                               struct rte_security_session *sess,
> >                               struct rte_mbuf *m, void *params)
> > {
> >         /* fast-path */
> >         if (instance->flags & RTE_SEC_CTX_F_FAST_SET_MDATA) {
> >                 *rte_security_dynfield(m) =
> >                         (rte_security_dynfield_t)(sess->sess_private_data);
> >                 return 0;
> >         }
> >         /* slow path */
> >         return __rte_security_set_pkt_metadata(instance, sess, m, params);
> > }
> >
> > rte_security.c:
> >
> > ...
> > /* existing one, just renamed */
> > int
> > __rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
> >                               struct rte_security_session *sess,
> >                               struct rte_mbuf *m, void *params)
> > {
> > #ifdef RTE_DEBUG
> >         RTE_PTR_OR_ERR_RET(sess, -EINVAL);
> >         RTE_PTR_OR_ERR_RET(instance, -EINVAL);
> >         RTE_PTR_OR_ERR_RET(instance->ops, -EINVAL);
> > #endif
> >         RTE_FUNC_PTR_OR_ERR_RET(*instance->ops->set_pkt_metadata, -ENOTSUP);
> >         return instance->ops->set_pkt_metadata(instance->device,
> >                                                sess, m, params);
> > }
> >
> >
> > I think both ways are possible (flags vs actual func pointers) and both have
> > some pluses and minuses.
> > I suppose the main choice here what do we think should be the future of
> > set_pkt_metadata() and rte_security_get_userdata().
> > If we think that they will be useful for some future PMDs and we want to keep them,
> > then probably storing actual func pointers inside ctx is a better approach.
> > If not, then flags seems like a better one, as in that case we can eventually
> > deprecate and remove these callbacks.
> > From what I see right now, custom callbacks seems excessive,
> > and rte_security_dynfield is enough.
> > But might be there are some future plans that would require them?
> 
> Above method is also fine. Moving fn pointers to rte_security_ctx can be
> done later if other PMD's need it.

Yes, agree.
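
For reference, the flags-based variant can be modelled outside DPDK as
below. This is only a sketch: the mock_ structs stand in for rte_mbuf,
rte_security_session and rte_security_ctx, a plain 64-bit field stands in
for rte_security_dynfield, and the flag name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_SEC_CTX_F_FAST_SET_MDATA 0x1u /* hypothetical flag */

/* Stand-ins for the DPDK types, reduced to what the sketch needs. */
struct mock_mbuf {
	uint64_t sec_dynfield;
};

struct mock_session {
	void *sess_private_data;
};

struct mock_ops {
	int (*set_pkt_metadata)(void *device, struct mock_session *sess,
				struct mock_mbuf *m, void *params);
};

struct mock_ctx {
	void *device;
	const struct mock_ops *ops;
	uint32_t flags;
};

/* Slow path: out-of-line call into the PMD callback, if there is one. */
static int
mock_set_pkt_metadata_slow(struct mock_ctx *ctx, struct mock_session *sess,
			   struct mock_mbuf *m, void *params)
{
	if (ctx->ops == NULL || ctx->ops->set_pkt_metadata == NULL)
		return -ENOTSUP;
	return ctx->ops->set_pkt_metadata(ctx->device, sess, m, params);
}

/* Inline entry point: with the flag set, no function pointer is touched. */
static inline int
mock_set_pkt_metadata(struct mock_ctx *ctx, struct mock_session *sess,
		      struct mock_mbuf *m, void *params)
{
	if (ctx->flags & MOCK_SEC_CTX_F_FAST_SET_MDATA) {
		m->sec_dynfield = (uint64_t)(uintptr_t)sess->sess_private_data;
		return 0;
	}
	return mock_set_pkt_metadata_slow(ctx, sess, m, params);
}
```

A PMD that only needs the session private data in the mbuf would set the
flag at context creation and provide no callback at all.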

> 
> Atleast our HW PMD's doesn't plan to use set_pkt_metada()/get_user_data()
> fn pointers in future if above is implemented.
> 
> >
> > >
> > > >
> > > > >
> > > > > As a brief code snippet:
> > > > >
> > > > > struct rte_security_ctx {
> > > > >         void *device;
> > > > >         /**< Crypto/ethernet device attached */
> > > > >         const struct rte_security_ops *ops;
> > > > >         /**< Pointer to security ops for the device */
> > > > >         uint16_t sess_cnt;
> > > > >         /**< Number of sessions attached to this context */
> > > > > +     int (*set_pkt_mdata)(void *, struct rte_security_session *, struct rte_mbuf *,  void *);
> > > > > };
> > > > >
> > > > > static inline int
> > > > > rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
> > > > >                               struct rte_security_session *sess,
> > > > >                               struct rte_mbuf *m, void *params)
> > > > > {
> > > > >      /* fast-path */
> > > > >       if (instance->set_pkt_mdata == NULL) {
> > > > > >              *rte_security_dynfield(m) = (rte_security_dynfield_t)(sess->sess_private_data);
> > > > >              return 0;
> > > > >        /* slow path */
> > > > >        } else
> > > > >            return instance->set_pkt_mdata(instance->device, sess, m, params);
> > > > > }
> > > > >
> > > > > That probably would be an ABI breakage (new fileld in rte_security_ctx) and would require
> > > > > some trivial changes for all existing PMDs that use RTE_SECURITY_TX_OFLOAD_NEED_MDATA
> > > > > (ctx_create()), but hopefully will benefit everyone.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > I'm fine to
> > > > > > > > introduce a burst call for the same(I was thinking to propose it in future) to
> > > > > > > > compensate for the overhead.
> > > > > > > >
> > > > > > > > If rte_security_set_pkt_metadata() was not a PMD specific function ptr call and
> > > > > > > > rte_mbuf had space for struct rte_security_session pointer,
> > > > > > >
> > > > > > > But it does, see above.
> > > > > > > In fact it even more flexible - because it is driver specific, you are not limited to one 64-bit field.
> > > > > > > If your PMD requires more data to be associated with mbuf
> > > > > > > - you can request it via mbuf_dynfield and store there whatever is needed.
> > > > > > >
> > > > > > > > then then I guess it would have been better to do the way you proposed.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This patch is trying to enforce semantics as above so that
> > > > > > > > > > > > > > rte_security_set_pkt_metadata() can predict what comes in the pkt when he is
> > > > > > > > > > > > > > called.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I also think above sequence is what Linux kernel stack or other stacks follow.
> > > > > > > > > > > > > > Does it makes sense ?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Once called,
> > > > > > > > > > > > > > > > +Layer 3 and above data cannot be modified or moved around unless
> > > > > > > > > > > > > > > > +``rte_security_set_pkt_metadata()`` is called again.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >  For inline protocol offloaded ingress traffic, the application can register a
> > > > > > > > > > > > > > > >  pointer, ``userdata`` , in the security session. When the packet is received,
> > > > > > > > > > > > > > > > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > > > > index bb38d7f58..9d8e3ddc8 100644
> > > > > > > > > > > > > > > > --- a/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > > > > +++ b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > > > > @@ -228,6 +228,8 @@ extern "C" {
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >  /**
> > > > > > > > > > > > > > > >   * Request security offload processing on the TX packet.
> > > > > > > > > > > > > > > > + * To use Tx security offload, the user needs to fill l2_len in mbuf
> > > > > > > > > > > > > > > > + * indicating L2 header size and where L3 header starts.
> > > > > > > > > > > > > > > >   */
> > > > > > > > > > > > > > > >  #define PKT_TX_SEC_OFFLOAD	(1ULL << 43)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > > > > >

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] Minutes of Technical Board Meeting, 2021-06-16
@ 2021-07-14 15:15  4% Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2021-07-14 15:15 UTC (permalink / raw)
  To: dev; +Cc: techboard

Members Attending: 12/12
	- Aaron Conole
	- Bruce Richardson
	- Ferruh Yigit
	- Hemant Agrawal
	- Honnappa Nagarahalli
	- Jerin Jacob
	- Kevin Traynor
	- Konstantin Ananyev
	- Maxime Coquelin
	- Olivier Matz
	- Stephen Hemminger
	- Thomas Monjalon (Chair)

NOTE: The Technical Board meetings take place every second Wednesday
on https://meet.jit.si/DPDK at 3 pm UTC.
Meetings are public, and DPDK community members are welcome to attend.
Agenda and minutes can be found at http://core.dpdk.org/techboard/minutes

NOTE: Next meeting will be on Wednesday 2021-06-30 @3pm UTC,
and will be chaired by Aaron.


1/ DTS workgroup

A group is working on DTS (DPDK Test Suite) feedback
with the target of making DTS tests mandatory for new features,
starting with 22.05.

The tests are being listed in 2 categories: reviewed / non-reviewed
so it does not block DTS development while introducing some new policies.

There are many open questions: how to manage DPDK code and DTS tests
in separate repositories, what the scope of DTS is,
and how to manage limited HW availability.

Working document:
https://docs.google.com/document/d/1c5S0_mZzFvzZfYkqyORLT2-qNvUb-fBdjA6DGusy4yM
Emails:
https://inbox.dpdk.org/dev/?q=DTS+Workgroup


2/ UNH report

There is a document of Community Lab updates to read carefully:
https://docs.google.com/document/d/1v0VKtZdsMXg35WNDawdsnqj5J4Xl9Egu_4180ukKD2o

The report will be discussed during the next techboard meeting.


3/ IRC network

It seems freenode is not a trusted/working IRC network anymore.
We need to choose a new place for quick discussions.
OFTC is an old trusted network; Libera.Chat is the continuation of freenode.
Libera.Chat is chosen to be the network used by the DPDK community.
Our default channel is #DPDK.


4/ CVE

The vulnerabilities are better managed since Cheng Jiang joined the effort.
Thanks to him.


5/ techboard policies

There is a document in progress to better define the techboard policies:
https://docs.google.com/document/d/1Al9-DPJSn7kXgEF3nhbp-srb_IMUU_T_wEWqugw4vtA

We will try to reach an agreement in the mid-July meeting.


6/ ABI

Ray, Bruce, Ferruh and Thomas worked on a plan to improve the ABI stability
with the objective of extending the compatibility period to 2 years:
https://docs.google.com/document/d/1Kju9FxBj3zR_hezErzitaatUrtdBsgL0iAlE05QNpck

After discussing the status and the focus of next improvements,
it has been decided to share a spreadsheet for volunteering:
https://docs.google.com/spreadsheets/d/1betlC000ua5SsSiJIcC54mCCCJnW6voH5Dqv9UxeyfE

The objective should be discussed in detail during the next meeting.



^ permalink raw reply	[relevance 4%]

* [dpdk-dev] Minutes of Technical Board Meeting, 2021-06-30
@ 2021-07-14 15:11  4% Aaron Conole
  0 siblings, 0 replies; 200+ results
From: Aaron Conole @ 2021-07-14 15:11 UTC (permalink / raw)
  To: techboard, dev

Attendees
---------
* Aaron
* Bruce
* Ferruh
* Hemant
* Honnappa
* Jerin
* Kevin
* Konstantin
* Lincoln Lavoie (UNH representative)
* Maxime
* Olivier
* Stephen
* Thomas

NOTE: The technical board meets every second Wednesday at
https://meet.jit.si/DPDK at 3 pm UTC.
Meetings are public, and DPDK community members are welcome to attend.

NOTE: Additional follow up for ABI scheduled 2021-07-02

* intro
** updated agenda
*** added temp. gov board rep - aaron chosen (Thomas)
*** added next-net survey (Ferruh)
*** added security process (Maxime)

* Temp governing board membership
** Honnappa will be on PTO for the next Gov. Board meeting
** Decision to have Aaron present at the governing board

* Discussion about API for clang / gcc builtins (Honnappa)
** shemminger: MSFT doesn't have support for atomic builtins
** thomas: need to know what the effort looks like for compatibility
** thomas: 3 options to vote for atomics
*** OPT1 - continue using gcc builtins
*** OPT2 - a wrapper at compile time that can be done in future
*** OPT3 - do mass renames / wrapping now with internal implementations
*** OPT4 - create a wrapper that clones gcc built-ins instead
**** Not a good option because it could clash with external projects that link in
     stdatomic vs builtin
*** shemminger: only available in c++
** Tabled for more discussion - Honnappa to follow up via tech board mailing list

* IOL
** Ask governing board about coverity license for coverity desktop
** First cut tools, cppcheck, scan-build, flawfinder
** Daily sub-tree reporting for merging, for now dashboard
** Single release report as well - Lincoln to capture as a story
** DTS workgroup for virtio
** question for techboard - is compression testing a priority?

* API / ABI discussion
** Meeting set up for Fri, Jul 02, 2021


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 1/2] security: enforce semantics for Tx inline processing
  2021-07-14 11:09  0%                           ` Ananyev, Konstantin
@ 2021-07-14 13:29  0%                             ` Nithin Dabilpuram
  2021-07-14 17:28  0%                               ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Nithin Dabilpuram @ 2021-07-14 13:29 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Akhil Goyal, dev, hemant.agrawal, thomas, g.singh, Yigit, Ferruh,
	Zhang, Roy Fan, olivier.matz, jerinj, Doherty, Declan, Nicolau,
	Radu, jiawenwu, jianwang

On Wed, Jul 14, 2021 at 11:09:08AM +0000, Ananyev, Konstantin wrote:
> > > >
> > > > Adding more rte_security and PMD maintainers into the loop.
> > > >
> > > > > > > > > > > > > > > For Tx inline processing, when RTE_SECURITY_TX_OLOAD_NEED_MDATA is
> > > > > > > > > > > > > > > set, rte_security_set_pkt_metadata() needs to be called for pkts
> > > > > > > > > > > > > > > to associate a Security session with a mbuf before submitting
> > > > > > > > > > > > > > > to Ethdev Tx. This is apart from setting PKT_TX_SEC_OFFLOAD in
> > > > > > > > > > > > > > > mbuf.ol_flags. rte_security_set_pkt_metadata() is also used to
> > > > > > > > > > > > > > > set some opaque metadata in mbuf for PMD's use.
> > > > > > > > > > > > > > > This patch updates documentation that rte_security_set_pkt_metadata()
> > > > > > > > > > > > > > > should be called only with mbuf containing Layer 3 and above data.
> > > > > > > > > > > > > > > This behaviour is consistent with existing PMD's such as ixgbe.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tx, not all net PMD's/HW can parse packet and identify
> > > > > > > > > > > > > > > L2 header and L3 header locations on Tx. This is inline with other
> > > > > > > > > > > > > > > Tx offloads requirements such as L3 checksum, L4 checksum offload,
> > > > > > > > > > > > > > > etc, where mbuf.l2_len, mbuf.l3_len etc, needs to be set for
> > > > > > > > > > > > > > > HW to be able to generate checksum. Since Inline IPSec is also
> > > > > > > > > > > > > > > such a Tx offload, some PMD's at least need mbuf.l2_len to be
> > > > > > > > > > > > > > > valid to find L3 header and perform Outbound IPSec processing.
> > > > > > > > > > > > > > > Hence, this patch updates documentation to enforce setting
> > > > > > > > > > > > > > > mbuf.l2_len while setting PKT_TX_SEC_OFFLOAD in mbuf.ol_flags
> > > > > > > > > > > > > > > for Inline IPSec Crypto / Protocol offload processing to
> > > > > > > > > > > > > > > work on Tx.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
> > > > > > > > > > > > > > > Reviewed-by: Akhil Goyal <gakhil@marvell.com>
> > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > >  doc/guides/nics/features.rst           | 2 ++
> > > > > > > > > > > > > > >  doc/guides/prog_guide/rte_security.rst | 6 +++++-
> > > > > > > > > > > > > > >  lib/mbuf/rte_mbuf_core.h               | 2 ++
> > > > > > > > > > > > > > >  3 files changed, 9 insertions(+), 1 deletion(-)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> > > > > > > > > > > > > > > index 403c2b03a..414baf14f 100644
> > > > > > > > > > > > > > > --- a/doc/guides/nics/features.rst
> > > > > > > > > > > > > > > +++ b/doc/guides/nics/features.rst
> > > > > > > > > > > > > > > @@ -430,6 +430,7 @@ of protocol operations. See Security library and PMD documentation for more deta
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``capabilities_get``.
> > > > > > > > > > > > > > >  * **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > > > > @@ -451,6 +452,7 @@ protocol operations. See security library and PMD documentation for more details
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``get_userdata``,
> > > > > > > > > > > > > > >    ``capabilities_get``.
> > > > > > > > > > > > > > > diff --git a/doc/guides/prog_guide/rte_security.rst b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > > > index f72bc8a78..7b68c698d 100644
> > > > > > > > > > > > > > > --- a/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > > > +++ b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > > > @@ -560,7 +560,11 @@ created by the application is attached to the security session by the API
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >  For Inline Crypto and Inline protocol offload, device specific defined metadata is
> > > > > > > > > > > > > > >  updated in the mbuf using ``rte_security_set_pkt_metadata()`` if
> > > > > > > > > > > > > > > -``DEV_TX_OFFLOAD_SEC_NEED_MDATA`` is set.
> > > > > > > > > > > > > > > +``RTE_SECURITY_TX_OLOAD_NEED_MDATA`` is set. ``rte_security_set_pkt_metadata()``
> > > > > > > > > > > > > > > +should be called on mbuf only with Layer 3 and above data present and
> > > > > > > > > > > > > > > +``mbuf.data_off`` should be pointing to Layer 3 Header.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hmm... not sure why mbuf.data_off should point to L3 hdr.
> > > > > > > > > > > > > > Who will add L2 hdr to the packet in that case?
> > > > > > > > > > > > > > Or did you mean ``mbuf.data_off + mbuf.l2_len`` here?
> > > > > > > > > > > > >
> > > > > > > > > > > > > That is the semantics I was trying to define. I think below are the sequence of
> > > > > > > > > > > > > operations to be done for ipsec processing,
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. receive_pkt()
> > > > > > > > > > > > > 2. strip_l2_hdr()
> > > > > > > > > > > > > 3. Do policy lookup ()
> > > > > > > > > > > > > 4. Call rte_security_set_pkt_metadata() if pkt needs to be encrypted with a
> > > > > > > > > > > > > particular SA. Now pkt only has L3 and above data.
> > > > > > > > > > > > > 5. Do route_lookup()
> > > > > > > > > > > > > 6. add_l2hdr() which might be different from stripped l2hdr.
> > > > > > > > > > > > > 7. Send packet out.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The above sequence is what I believe the current poll mode worker thread in
> > > > > > > > > > > > > ipsec-secgw is following.
> > > > > > > > > > > >
> > > > > > > > > > > > That's just a sample app, it doesn't mean it has to be the only possible way.
> > > > > > > > > > > >
> > > > > > > > > > > > > While in event mode, step 2 and step 6 are missing.
> > > > > > > > > > > >
> > > > > > > > > > > > I think this L2 hdr manipulation is totally optional.
> > > > > > > > > > > > If your rte_security_set_pkt_metadata() implementation really needs to know L3 hdr offset (not sure why?),
> > > > > > > > > > > Since rte_security_set_pkt_metadata() is PMD specific function ptr call, we are currently doing some pre-processing
> > > > > > > > > > > here before submitting packet to inline IPSec via rte_eth_tx_burst(). This saves us cycles later in rte_eth_tx_burst().
> > > > > > > > > > > If we cannot know for sure, the pkt content at the time of rte_security_set_pkt_metadata() call, then I think
> > > > > > > > > > > having a PMD specific callback is not much of use except for saving SA priv data to rte_mbuf.
> > > > > > > > > > >
> > > > > > > > > > > > then I suppose we can add a requirement that l2_len has to be set properly before calling
> > rte_security_set_pkt_metadata().
> > > > > > > > > > >
> > > > > > > > > > > This is also fine with us.
> > > > > > > > > >
> > > > > > > > > > Ok, so to make sure we are on the same page, you propose:
> > > > > > > > > > 1. before calling rte_security_set_pkt_metadata() mbuf.l2_len should be properly set.
> > > > > > > > > > 2. after rte_security_set_pkt_metadata() and before rte_eth_tx_burst() packet contents
> > > > > > > > > >     at [mbuf.l2_len, mbuf.pkt_len) can't be modified?
> > > > > > > > > Yes.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Is that correct understanding?
> > > > > > > > > > If yes, I wonder how 2) will correlate with rte_eth_tx_prepare() concept?
> > > > > > > > >
> > > > > > > > > Since our PMD doesn't have a prepare function, I missed that but, since
> > > > > > > > > rte_security_set_pkt_metadata() is only used for Inline Crypto/Protocol via
> > > > > > > > > a rte_eth_dev, and both rte_security_set_pkt_metadata() and rte_eth_tx_prepare()
> > > > > > > > > are callbacks from same PMD, do you see any issue ?
> > > > > > > > >
> > > > > > > > > The restriction is from user side, data is not supposed to be modified unless
> > > > > > > > > rte_security_set_pkt_metadata() is called again.
> > > > > > > >
> > > > > > > > Yep, I do have a concern here.
> > > > > > > > Right now it is perfectly valid to do something like that:
> > > > > > > > rte_security_set_pkt_metadata(..., mb, ...);
> > > > > > > > /* can modify contents of the packet */
> > > > > > > > rte_eth_tx_prepare(..., &mb, 1);
> > > > > > > > rte_eth_tx_burst(..., &mb, 1);
> > > > > > > >
> > > > > > > > With the new restrictions you are proposing it wouldn't be allowed any more.
> > > > > > > You can still modify L2 header and IPSEC is only concerned about L3 and above.
> > > > > > >
> > > > > > > I think insisting that rte_security_set_pkt_metadata() be called after all L3
> > > > > > > and above header modifications is not a problem. I guess the existing ixgbe/txgbe
> > > > > > > PMDs, which are the only ones implementing the callback, are already expecting the
> > > > > > > same?
> > > > > >
> > > > > > AFAIK, no, there are no such requirements for ixgbe or txgbe.
> > > > > > All that the ixgbe callback does is store session related data inside the mbuf.
> > > > > > Its only expectation is to have the ESP trailer at the proper place (after ICV):
> > > > >
> > > > > This implies rte_security_set_pkt_metadata() cannot be called when mbuf doesn't
> > > > > have ESP trailer updated or when mbuf->pkt_len = 0
> > > > >
> > > > > >
> > > > > > union ixgbe_crypto_tx_desc_md *mdata = (union ixgbe_crypto_tx_desc_md *)
> > > > > >                                 rte_security_dynfield(m);
> > > > > >   mdata->enc = 1;
> > > > > >   mdata->sa_idx = ic_session->sa_index;
> > > > > >   mdata->pad_len = ixgbe_crypto_compute_pad_len(m);
> > > > > >
> > > > > > Then this data will be used by tx_burst() function.
> > > > > So it implies that after above rte_security_set_pkt_metadata() call, and before tx_burst(),
> > > > > mbuf data / packet len cannot be modified right as if modified, then tx_burst()
> > > > > will be using incorrect pad len ?
> > > >
> > > > No, pkt_len can be modified.
> > > > Though ESP trailer pad_len can't.
> > > >
> > > > >
> > > > > This patch is also trying to add similar restriction on when
> > > > > rte_security_set_pkt_metadata() should be called and what cannot be done after
> > > > > calling rte_security_set_pkt_metadata().
> > > >
> > > > No, I don't think it is really the same.
> > > > Also, IMO, inside ixgbe set_pkt_metadata() implementation we probably shouldn't silently imply
> > > > that ESP packet is already formed and trailer contains valid data.
> > > > In fact, I think this pad_len calculation can be moved to actual TX function.
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > If your question is can't we do the preprocessing in rte_eth_tx_prepare() for
> > > > > > > > > security,
> > > > > > > >
> > > > > > > > Yes, that was my thought.
> > > > > > > >
> > > > > > > > > my only argument was that since there is already a hit in
> > > > > > > > > rte_security_set_pkt_metadata() to PMD specific callback and
> > > > > > > > > struct rte_security_session is passed as an argument to it, it is more beneficial to
> > > > > > > > > do security related pre-processing there.
> > > > > > > >
> > > > > > > > Yes, it would be extra callback call that way.
> > > > > > > > Though tx_prepare() accepts burst of packets, so the overhead
> > > > > > > > of function call will be spread around the whole burst, and I presume
> > > > > > > > shouldn't be too high.
> > > > > > > >
> > > > > > > > > Also rte_eth_tx_prepare() if implemented will be called for both security and
> > > > > > > > > non-security pkts.
> > > > > > > >
> > > > > > > > Yes, but tx_prepare() can distinguish (by ol_flags and/or other field contents) which
> > > > > > > > modifications are required for the packet.
> > > > > > >
> > > > > > > But the major issues I see are
> > > > > > >
> > > > > > > 1. tx_prepare() doesn't take rte_security_session as argument though ol_flags has security flag.
> > > > > > >    In our case, we need to know the security session details to do things.
> > > > > >
> > > > > > I suppose you can store pointer to session (or so) inside mbuf in rte_security_dynfield, no?
> > > > >
> > > > > We can do. But having to call PMD specific function call via rte_security_set_pkt_metadata()
> > > > > just for storing session pointer in rte_security_dynfield consumes unnecessary
> > > > > cycles per pkt.
> > > >
> > > > In fact there are two function calls: one for rte_security_set_pkt_metadata(),
> > > > second for  instance->ops->set_pkt_metadata() callback.
> > > > Which is of course way too expensive for such a simple operation.
> > > > Actually same thought for rte_security_get_userdata().
> > > > Both of these functions belong to data-path and ideally have to be as fast as possible.
> > > > Probably 21.11 is a right timeframe for that.
> > > >
> > > > > >
> > > > > > > 2. AFAIU tx_prepare() is not mandatory as per spec and even by default disabled under compile time
> > > > > > >    macro RTE_ETHDEV_TX_PREPARE_NOOP.
> > > > > > > 3. Even if we do tx_prepare(), rte_security_set_pkt_mdata() is mandatory to associate
> > > > > > >    struct rte_security_session to a pkt as unlike ol_flags, there is no direct space to do the same.
> > > > > >
> > > > > > Didn't get you here, obviously we do have rte_security_dynfield inside mbuf,
> > > > > > specially for that - to store security related data inside the mbuf.
> > > > > > Yes your PMD has to request it at initialization time, but I suppose it is not a big deal.
> > > > > >
> > > > > > > So I think instead of enforcing yet another callback tx_prepare() for inline security
> > > > > > > processing, it can be done via security specific set_pkt_metadata().
> > > > > >
> > > > > > But what you are proposing introduces new limitations and might break existing functionality.
> > > > > > BTW, if you don't like to use tx_prepare() - why doing these calculations inside tx_burst()
> > > > > > itself is not an option?
> > > > >
> > > > > We can do things in tx_burst() but if we are doing it there, then we want to avoid having callback for
> > > > > rte_security_set_pkt_metadata().
> > > > >
> > > > > Are you fine if we update the spec to say: "When DEV_TX_OFFLOAD_SEC_NEED_MDATA is not
> > > > > set, the user needs to store struct rte_security_session's sess_private_data in
> > > > > rte_security_dynfield", like below?
> > > > >
> > > > > <snip>
> > > > >
> > > > > static inline void
> > > > > inline_outb_mbuf_prepare(const struct rte_ipsec_session *ss,
> > > > >         struct rte_mbuf *mb[], uint16_t num)
> > > > > {
> > > > >         uint32_t i, ol_flags;
> > > > >
> > > > >         ol_flags = ss->security.ol_flags & RTE_SECURITY_TX_OLOAD_NEED_MDATA;
> > > > >         for (i = 0; i != num; i++) {
> > > > >
> > > > >                 mb[i]->ol_flags |= PKT_TX_SEC_OFFLOAD;
> > > > >
> > > > >                 if (ol_flags != 0)
> > > > >                         rte_security_set_pkt_metadata(ss->security.ctx,
> > > > >                                 ss->security.ses, mb[i], NULL);
> > > > > 		else
> > > > >                 	*rte_security_dynfield(mb[i]) =
> > > > >                                 (uint64_t)ss->security.ses->sess_private_data;
> > > > >
> > > > >
> > > > > If the above can be done, then in our PMD, we will not have a callback for
> > > > > set_pkt_metadata() and DEV_TX_OFFLOAD_SEC_NEED_MDATA will also be not set
> > > > > in capabilities.
> > > >
> > > > That's an interesting idea, but what you propose is the change in current rte_security API behaviour.
> > > > So all existing apps that use this API will have to be changed.
> > > > We'd better avoid such changes unless there is really good reason for that.
> > > > So, I'd suggest to tweak your idea a bit:
> > > >
> > > > 1) change rte_security_set_pkt_metadata():
> > > > if ops->set_pkt_metadata != NULL, then call it (existing behaviour)
> > > > otherwise just: *rte_security_dynfield(m) = sess->sess_private_data;
> > > > (fast-path)
> > > >
> > > > 2) consider to make rte_security_set_pkt_metadata() inline function.
> > > > We probably can have some special flag inside struct rte_security_ctx,
> > > > or even store inside ctx a pointer to set_pkt_metadata() itself.
> > >
> > > On second thought, some new flags might be better.
> > > Then later, if we realize that set_pkt_metadata() and get_userdata()
> > > are not really used by PMDs, it might be easier to deprecate these callbacks.
> > 
> > Thanks, I agree with your thoughts. I'll submit a V2 with above change, new flags and
> > set_pkt_metadata() and get_userdata() function pointers moved to rte_security_ctx for
> > review so that it can be targeted for 21.11.
> > 
> > Even with flags moving set_pkt_metadata() and get_userdata() function pointers is still needed
> > as we need to make rte_security_set_pkt_metadata() API inline while struct rte_security_ops is not
> > exposed to user. I think this is fine as it is inline with how fast path function pointers
> > of rte_ethdev and rte_cryptodev are currently placed.
> 
> My thought was we can get away with just flags only.
> Something like that:
> rte_security.h:
> 
> ...
> 
> enum {
> 	RTE_SEC_CTX_F_FAST_SET_MDATA = 0x1,
> 	RTE_SEC_CTX_F_FAST_GET_UDATA = 0x2,
> };
> 
> struct rte_security_ctx {
>         void *device;
>         /**< Crypto/ethernet device attached */
>         const struct rte_security_ops *ops;
>         /**< Pointer to security ops for the device */
>         uint16_t sess_cnt;
>         /**< Number of sessions attached to this context */
>        uint32_t flags;
> };
> 
> extern int
> __rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
>                                struct rte_security_session *sess,
>                                struct rte_mbuf *m, void *params); 
> 
> static inline int
> rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
>                               struct rte_security_session *sess,
>                               struct rte_mbuf *m, void *params)
> {
>         /* fast-path */
>         if (instance->flags & RTE_SEC_CTX_F_FAST_SET_MDATA) {
>                 *rte_security_dynfield(m) =
>                         (rte_security_dynfield_t)(sess->sess_private_data);
>                 return 0;
>         }
>         /* slow path */
>         return __rte_security_set_pkt_metadata(instance, sess, m, params);
> }
> 
> rte_security.c: 
> 
> ...
> /* existing one, just renamed */
> int
> __rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
>                               struct rte_security_session *sess,
>                               struct rte_mbuf *m, void *params)
> {
> #ifdef RTE_DEBUG
>         RTE_PTR_OR_ERR_RET(sess, -EINVAL);
>         RTE_PTR_OR_ERR_RET(instance, -EINVAL);
>         RTE_PTR_OR_ERR_RET(instance->ops, -EINVAL);
> #endif
>         RTE_FUNC_PTR_OR_ERR_RET(*instance->ops->set_pkt_metadata, -ENOTSUP);
>         return instance->ops->set_pkt_metadata(instance->device,
>                                                sess, m, params);
> }
> 
> 
> I think both ways are possible (flags vs actual func pointers) and both have
> some pluses and minuses.
> I suppose the main choice here is what we think should be the future of
> set_pkt_metadata() and rte_security_get_userdata().
> If we think that they will be useful for some future PMDs and we want to keep them,
> then probably storing actual func pointers inside ctx is a better approach.
> If not, then flags seem like a better one, as in that case we can eventually
> deprecate and remove these callbacks.
> From what I see right now, custom callbacks seem excessive,
> and rte_security_dynfield is enough.
> But maybe there are some future plans that would require them?

The above method is also fine. Moving fn pointers to rte_security_ctx can be
done later if other PMDs need it.

At least our HW PMDs don't plan to use the set_pkt_metadata()/get_userdata()
fn pointers in future if the above is implemented.
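
To make the flag-based dispatch concrete, here is a minimal self-contained
sketch of the idea. It uses mock ex_ types in place of rte_mbuf,
rte_security_session and rte_security_ctx, so every name in it is
hypothetical and it builds without DPDK; the sec_dynfield member merely
stands in for the slot that rte_security_dynfield() would return.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Mock stand-ins for the real DPDK structures (assumptions, not DPDK API). */
struct ex_mbuf { uint64_t sec_dynfield; };
struct ex_session { void *sess_private_data; };

struct ex_ops {
	int (*set_pkt_metadata)(void *dev, struct ex_session *sess,
				struct ex_mbuf *m, void *params);
};

enum { EX_SEC_CTX_F_FAST_SET_MDATA = 0x1 };

struct ex_ctx {
	void *device;
	const struct ex_ops *ops;
	uint32_t flags;	/* set by the PMD when it creates the context */
};

/* Slow path: out-of-line call into the driver callback, if there is one. */
static int
ex_set_pkt_metadata_slow(struct ex_ctx *ctx, struct ex_session *sess,
			 struct ex_mbuf *m, void *params)
{
	if (ctx->ops == NULL || ctx->ops->set_pkt_metadata == NULL)
		return -ENOTSUP;
	return ctx->ops->set_pkt_metadata(ctx->device, sess, m, params);
}

/* Inline entry point: one flag test decides between store and callback. */
static inline int
ex_set_pkt_metadata(struct ex_ctx *ctx, struct ex_session *sess,
		    struct ex_mbuf *m, void *params)
{
	assert(ctx != NULL && sess != NULL && m != NULL);
	if (ctx->flags & EX_SEC_CTX_F_FAST_SET_MDATA) {
		/* fast path: only record the session private data in the mbuf */
		m->sec_dynfield = (uint64_t)(uintptr_t)sess->sess_private_data;
		return 0;
	}
	return ex_set_pkt_metadata_slow(ctx, sess, m, params);
}
```

A PMD that needs nothing beyond the session private data would set the flag
and provide no callback, so the per-packet cost is a flag test and a store.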

>  
> > 
> > >
> > > >
> > > > As a brief code snippet:
> > > >
> > > > struct rte_security_ctx {
> > > >         void *device;
> > > >         /**< Crypto/ethernet device attached */
> > > >         const struct rte_security_ops *ops;
> > > >         /**< Pointer to security ops for the device */
> > > >         uint16_t sess_cnt;
> > > >         /**< Number of sessions attached to this context */
> > > > +     int (*set_pkt_mdata)(void *, struct rte_security_session *, struct rte_mbuf *,  void *);
> > > > };
> > > >
> > > > static inline int
> > > > rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
> > > >                               struct rte_security_session *sess,
> > > >                               struct rte_mbuf *m, void *params)
> > > > {
> > > >      /* fast-path */
> > > >       if (instance->set_pkt_mdata == NULL) {
> > > > >              *rte_security_dynfield(m) = (rte_security_dynfield_t)(sess->sess_private_data);
> > > >              return 0;
> > > >        /* slow path */
> > > >        } else
> > > >            return instance->set_pkt_mdata(instance->device, sess, m, params);
> > > > }
> > > >
> > > > That probably would be an ABI breakage (new fileld in rte_security_ctx) and would require
> > > > some trivial changes for all existing PMDs that use RTE_SECURITY_TX_OFLOAD_NEED_MDATA
> > > > (ctx_create()), but hopefully will benefit everyone.
> > > >
> > > > >
> > > > > >
> > > > > > > I'm fine to
> > > > > > > introduce a burst call for the same(I was thinking to propose it in future) to
> > > > > > > compensate for the overhead.
> > > > > > >
> > > > > > > If rte_security_set_pkt_metadata() was not a PMD specific function ptr call and
> > > > > > > rte_mbuf had space for struct rte_security_session pointer,
> > > > > >
> > > > > > But it does, see above.
> > > > > > In fact it even more flexible - because it is driver specific, you are not limited to one 64-bit field.
> > > > > > If your PMD requires more data to be associated with mbuf
> > > > > > - you can request it via mbuf_dynfield and store there whatever is needed.
> > > > > >
> > > > > > > then I guess it would have been better to do it the way you proposed.
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > This patch is trying to enforce semantics as above so that
> > > > > > > > > > > > > rte_security_set_pkt_metadata() can predict what comes in the pkt when he is
> > > > > > > > > > > > > called.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I also think above sequence is what Linux kernel stack or other stacks follow.
> > > > > > > > > > > > > Does it makes sense ?
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Once called,
> > > > > > > > > > > > > > > +Layer 3 and above data cannot be modified or moved around unless
> > > > > > > > > > > > > > > +``rte_security_set_pkt_metadata()`` is called again.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >  For inline protocol offloaded ingress traffic, the application can register a
> > > > > > > > > > > > > > >  pointer, ``userdata`` , in the security session. When the packet is received,
> > > > > > > > > > > > > > > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > > > index bb38d7f58..9d8e3ddc8 100644
> > > > > > > > > > > > > > > --- a/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > > > +++ b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > > > @@ -228,6 +228,8 @@ extern "C" {
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >  /**
> > > > > > > > > > > > > > >   * Request security offload processing on the TX packet.
> > > > > > > > > > > > > > > + * To use Tx security offload, the user needs to fill l2_len in mbuf
> > > > > > > > > > > > > > > + * indicating L2 header size and where L3 header starts.
> > > > > > > > > > > > > > >   */
> > > > > > > > > > > > > > >  #define PKT_TX_SEC_OFFLOAD	(1ULL << 43)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > > > >

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/2] security: enforce semantics for Tx inline processing
  2021-07-13 15:58  0%                         ` Nithin Dabilpuram
@ 2021-07-14 11:09  0%                           ` Ananyev, Konstantin
  2021-07-14 13:29  0%                             ` Nithin Dabilpuram
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2021-07-14 11:09 UTC (permalink / raw)
  To: Nithin Dabilpuram
  Cc: Akhil Goyal, dev, hemant.agrawal, thomas, g.singh, Yigit, Ferruh,
	Zhang, Roy Fan, olivier.matz, jerinj, Doherty, Declan, Nicolau,
	 Radu, jiawenwu, jianwang

> > >
> > > Adding more rte_security and PMD maintainers into the loop.
> > >
> > > > > > > > > > > > > > For Tx inline processing, when RTE_SECURITY_TX_OLOAD_NEED_MDATA is
> > > > > > > > > > > > > > set, rte_security_set_pkt_metadata() needs to be called for pkts
> > > > > > > > > > > > > > to associate a Security session with a mbuf before submitting
> > > > > > > > > > > > > > to Ethdev Tx. This is apart from setting PKT_TX_SEC_OFFLOAD in
> > > > > > > > > > > > > > mbuf.ol_flags. rte_security_set_pkt_metadata() is also used to
> > > > > > > > > > > > > > set some opaque metadata in mbuf for PMD's use.
> > > > > > > > > > > > > > This patch updates documentation that rte_security_set_pkt_metadata()
> > > > > > > > > > > > > > should be called only with mbuf containing Layer 3 and above data.
> > > > > > > > > > > > > > This behaviour is consistent with existing PMD's such as ixgbe.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tx, not all net PMD's/HW can parse packet and identify
> > > > > > > > > > > > > > L2 header and L3 header locations on Tx. This is inline with other
> > > > > > > > > > > > > > Tx offloads requirements such as L3 checksum, L4 checksum offload,
> > > > > > > > > > > > > > etc, where mbuf.l2_len, mbuf.l3_len etc, needs to be set for
> > > > > > > > > > > > > > HW to be able to generate checksum. Since Inline IPSec is also
> > > > > > > > > > > > > > such a Tx offload, some PMD's at least need mbuf.l2_len to be
> > > > > > > > > > > > > > valid to find L3 header and perform Outbound IPSec processing.
> > > > > > > > > > > > > > Hence, this patch updates documentation to enforce setting
> > > > > > > > > > > > > > mbuf.l2_len while setting PKT_TX_SEC_OFFLOAD in mbuf.ol_flags
> > > > > > > > > > > > > > for Inline IPSec Crypto / Protocol offload processing to
> > > > > > > > > > > > > > work on Tx.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
> > > > > > > > > > > > > > Reviewed-by: Akhil Goyal <gakhil@marvell.com>
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > >  doc/guides/nics/features.rst           | 2 ++
> > > > > > > > > > > > > >  doc/guides/prog_guide/rte_security.rst | 6 +++++-
> > > > > > > > > > > > > >  lib/mbuf/rte_mbuf_core.h               | 2 ++
> > > > > > > > > > > > > >  3 files changed, 9 insertions(+), 1 deletion(-)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> > > > > > > > > > > > > > index 403c2b03a..414baf14f 100644
> > > > > > > > > > > > > > --- a/doc/guides/nics/features.rst
> > > > > > > > > > > > > > +++ b/doc/guides/nics/features.rst
> > > > > > > > > > > > > > @@ -430,6 +430,7 @@ of protocol operations. See Security library and PMD documentation for more deta
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``capabilities_get``.
> > > > > > > > > > > > > >  * **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > > > @@ -451,6 +452,7 @@ protocol operations. See security library and PMD documentation for more details
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``get_userdata``,
> > > > > > > > > > > > > >    ``capabilities_get``.
> > > > > > > > > > > > > > diff --git a/doc/guides/prog_guide/rte_security.rst b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > > index f72bc8a78..7b68c698d 100644
> > > > > > > > > > > > > > --- a/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > > +++ b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > > @@ -560,7 +560,11 @@ created by the application is attached to the security session by the API
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >  For Inline Crypto and Inline protocol offload, device specific defined metadata is
> > > > > > > > > > > > > >  updated in the mbuf using ``rte_security_set_pkt_metadata()`` if
> > > > > > > > > > > > > > -``DEV_TX_OFFLOAD_SEC_NEED_MDATA`` is set.
> > > > > > > > > > > > > > +``RTE_SECURITY_TX_OLOAD_NEED_MDATA`` is set. ``rte_security_set_pkt_metadata()``
> > > > > > > > > > > > > > +should be called on mbuf only with Layer 3 and above data present and
> > > > > > > > > > > > > > +``mbuf.data_off`` should be pointing to Layer 3 Header.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hmm... not sure why mbuf.data_off should point to L3 hdr.
> > > > > > > > > > > > > Who will add L2 hdr to the packet in that case?
> > > > > > > > > > > > > Or did you mean ``mbuf.data_off + mbuf.l2_len`` here?
> > > > > > > > > > > >
> > > > > > > > > > > > That is the semantics I was trying to define. I think below are the sequence of
> > > > > > > > > > > > operations to be done for ipsec processing,
> > > > > > > > > > > >
> > > > > > > > > > > > 1. receive_pkt()
> > > > > > > > > > > > 2. strip_l2_hdr()
> > > > > > > > > > > > 3. Do policy lookup ()
> > > > > > > > > > > > 4. Call rte_security_set_pkt_metadata() if pkt needs to be encrypted with a
> > > > > > > > > > > > particular SA. Now pkt only has L3 and above data.
> > > > > > > > > > > > 5. Do route_lookup()
> > > > > > > > > > > > 6. add_l2hdr() which might be different from stripped l2hdr.
> > > > > > > > > > > > 7. Send packet out.
> > > > > > > > > > > >
> > > > > > > > > > > > The above sequence is what I believe the current poll mode worker thread in
> > > > > > > > > > > > ipsec-secgw is following.
> > > > > > > > > > >
> > > > > > > > > > > That's just a sample app, it doesn't mean it has to be the only possible way.
> > > > > > > > > > >
> > > > > > > > > > > > While in event mode, step 2 and step 6 are missing.
> > > > > > > > > > >
> > > > > > > > > > > I think this L2 hdr manipulation is totally optional.
> > > > > > > > > > > If your rte_security_set_pkt_metadata() implementation really needs to know L3 hdr offset (not sure why?),
> > > > > > > > > > Since rte_security_set_pkt_metadata() is PMD specific function ptr call, we are currently doing some pre-processing
> > > > > > > > > > here before submitting packet to inline IPSec via rte_eth_tx_burst(). This saves us cycles later in rte_eth_tx_burst().
> > > > > > > > > > If we cannot know for sure, the pkt content at the time of rte_security_set_pkt_metadata() call, then I think
> > > > > > > > > > having a PMD specific callback is not much of use except for saving SA priv data to rte_mbuf.
> > > > > > > > > >
> > > > > > > > > > > then I suppose we can add a requirement that l2_len has to be set properly before calling rte_security_set_pkt_metadata().
> > > > > > > > > >
> > > > > > > > > > This is also fine with us.
> > > > > > > > >
> > > > > > > > > Ok, so to make sure we are on the same page, you propose:
> > > > > > > > > 1. before calling rte_security_set_pkt_metadata() mbuf.l2_len should be properly set.
> > > > > > > > > 2. after rte_security_set_pkt_metadata() and before rte_eth_tx_burst() packet contents
> > > > > > > > >     at [mbuf.l2_len, mbuf.pkt_len) can't be modified?
> > > > > > > > Yes.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Is that correct understanding?
> > > > > > > > > If yes, I wonder how 2) will correlate with rte_eth_tx_prepare() concept?
> > > > > > > >
> > > > > > > > Since our PMD doesn't have a prepare function, I missed that but, since
> > > > > > > > rte_security_set_pkt_metadata() is only used for Inline Crypto/Protocol via
> > > > > > > > a rte_eth_dev, and both rte_security_set_pkt_metadata() and rte_eth_tx_prepare()
> > > > > > > > are callbacks from same PMD, do you see any issue ?
> > > > > > > >
> > > > > > > > The restriction is from user side, data is not supposed to be modified unless
> > > > > > > > rte_security_set_pkt_metadata() is called again.
> > > > > > >
> > > > > > > Yep, I do have a concern here.
> > > > > > > Right now it is perfectly valid to do something like that:
> > > > > > > rte_security_set_pkt_metadata(..., mb, ...);
> > > > > > > /* can modify contents of the packet */
> > > > > > > rte_eth_tx_prepare(..., &mb, 1);
> > > > > > > rte_eth_tx_burst(..., &mb, 1);
> > > > > > >
> > > > > > > With the new restrictions you are proposing it wouldn't be allowed any more.
> > > > > > You can still modify L2 header and IPSEC is only concerned about L3 and above.
> > > > > >
> > > > > > I think insisting that rte_security_set_pkt_metadata() be called after all L3
> > > > > > and above header modifications is not a problem. I guess the existing ixgbe/txgbe
> > > > > > PMDs, which are the only ones implementing the callback, are already expecting the
> > > > > > same ?
> > > > >
> > > > > AFAIK, no, there are no such requirements for ixgbe or txgbe.
> > > > > All that the ixgbe callback does is store session related data inside mbuf.
> > > > > Its only expectation is to have the ESP trailer at the proper place (after ICV):
> > > >
> > > > This implies rte_security_set_pkt_metadata() cannot be called when mbuf doesn't
> > > > have ESP trailer updated or when mbuf->pkt_len = 0
> > > >
> > > > >
> > > > > union ixgbe_crypto_tx_desc_md *mdata = (union ixgbe_crypto_tx_desc_md *)
> > > > >                                 rte_security_dynfield(m);
> > > > >   mdata->enc = 1;
> > > > >   mdata->sa_idx = ic_session->sa_index;
> > > > >   mdata->pad_len = ixgbe_crypto_compute_pad_len(m);
> > > > >
> > > > > Then this data will be used by tx_burst() function.
> > > > So it implies that after above rte_security_set_pkt_metadata() call, and before tx_burst(),
> > > > mbuf data / packet len cannot be modified right as if modified, then tx_burst()
> > > > will be using incorrect pad len ?
> > >
> > > No, pkt_len can be modified.
> > > Though ESP trailer pad_len can't.
> > >
> > > >
> > > > This patch is also trying to add similar restriction on when
> > > > rte_security_set_pkt_metadata() should be called and what cannot be done after
> > > > calling rte_security_set_pkt_metadata().
> > >
> > > No, I don't think it is really the same.
> > > Also, IMO, inside ixgbe set_pkt_metadata() implementation we probably shouldn't silently imply
> > > that ESP packet is already formed and trailer contains valid data.
> > > In fact, I think this pad_len calculation can be moved to actual TX function.
> > >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > If your question is can't we do the preprocessing in rte_eth_tx_prepare() for
> > > > > > > > security,
> > > > > > >
> > > > > > > Yes, that was my thought.
> > > > > > >
> > > > > > > > my only argument was that since there is already a hit in
> > > > > > > > rte_security_set_pkt_metadata() to PMD specific callback and
> > > > > > > > struct rte_security_session is passed as an argument to it, it is more beneficial to
> > > > > > > > do security related pre-processing there.
> > > > > > >
> > > > > > > Yes, it would be extra callback call that way.
> > > > > > > Though tx_prepare() accepts burst of packets, so the overhead
> > > > > > > of function call will be spread around the whole burst, and I presume
> > > > > > > shouldn't be too high.
> > > > > > >
> > > > > > > > Also rte_eth_tx_prepare() if implemented will be called for both security and
> > > > > > > > non-security pkts.
> > > > > > >
> > > > > > > Yes, but tx_prepare() can distinguish (by ol_flags and/or other field contents) which
> > > > > > > modifications are required for the packet.
> > > > > >
> > > > > > But the major issues I see are
> > > > > >
> > > > > > 1. tx_prepare() doesn't take rte_security_session as argument though ol_flags has security flag.
> > > > > >    In our case, we need to know the security session details to do things.
> > > > >
> > > > > I suppose you can store pointer to session (or so) inside mbuf in rte_security_dynfield, no?
> > > >
> > > > We can do. But having to call PMD specific function call via rte_security_set_pkt_metadata()
> > > > just for storing session pointer in rte_security_dynfield consumes unnecessary
> > > > cycles per pkt.
> > >
> > > In fact there are two function calls: one for rte_security_set_pkt_metadata(),
> > > the second for the instance->ops->set_pkt_metadata() callback.
> > > Which is of course way too expensive for such a simple operation.
> > > Actually, the same thought applies to rte_security_get_userdata().
> > > Both of these functions belong to the data-path and ideally have to be as fast as possible.
> > > Probably 21.11 is the right timeframe for that.
> > >
> > > > >
> > > > > > 2. AFAIU tx_prepare() is not mandatory as per spec and even by default disabled under compile time
> > > > > >    macro RTE_ETHDEV_TX_PREPARE_NOOP.
> > > > > > 3. Even if we do tx_prepare(), rte_security_set_pkt_metadata() is mandatory to associate
> > > > > >    struct rte_security_session to a pkt as unlike ol_flags, there is no direct space to do the same.
> > > > >
> > > > > Didn't get you here, obviously we do have rte_security_dynfield inside mbuf,
> > > > > specially for that - to store security related data inside the mbuf.
> > > > > Yes your PMD has to request it at initialization time, but I suppose it is not a big deal.
> > > > >
> > > > > > So I think instead of enforcing yet another callback tx_prepare() for inline security
> > > > > > processing, it can be done via security specific set_pkt_metadata().
> > > > >
> > > > > But what you are proposing introduces new limitations and might break existing functionality.
> > > > > BTW, if you don't like to use tx_prepare() - why doing these calculations inside tx_burst()
> > > > > itself is not an option?
> > > >
> > > > We can do things in tx_burst() but if we are doing it there, then we want to avoid having callback for
> > > > rte_security_set_pkt_metadata().
> > > >
> > > > Are you fine if we update the spec to say that "when DEV_TX_OFFLOAD_SEC_NEED_MDATA is not
> > > > set, the user needs to store struct rte_security_session's sess_private_data in
> > > > rte_security_dynfield", like below?
> > > >
> > > > <snip>
> > > >
> > > > static inline void
> > > > inline_outb_mbuf_prepare(const struct rte_ipsec_session *ss,
> > > >         struct rte_mbuf *mb[], uint16_t num)
> > > > {
> > > >         uint32_t i, ol_flags;
> > > >
> > > >         ol_flags = ss->security.ol_flags & RTE_SECURITY_TX_OLOAD_NEED_MDATA;
> > > >         for (i = 0; i != num; i++) {
> > > >
> > > >                 mb[i]->ol_flags |= PKT_TX_SEC_OFFLOAD;
> > > >
> > > >                 if (ol_flags != 0)
> > > >                         rte_security_set_pkt_metadata(ss->security.ctx,
> > > >                                 ss->security.ses, mb[i], NULL);
> > > >                 else
> > > >                         *rte_security_dynfield(mb[i]) =
> > > >                                 (uint64_t)ss->security.ses->sess_private_data;
> > > >
> > > >
> > > > If the above can be done, then in our PMD, we will not have a callback for
> > > > set_pkt_metadata() and DEV_TX_OFFLOAD_SEC_NEED_MDATA will also be not set
> > > > in capabilities.
> > >
> > > That's an interesting idea, but what you propose is the change in current rte_security API behaviour.
> > > So all existing apps that use this API will have to be changed.
> > > We'd better avoid such changes unless there is really good reason for that.
> > > So, I'd suggest to tweak your idea a bit:
> > >
> > > 1) change rte_security_set_pkt_metadata():
> > > if ops->set_pkt_metadata != NULL, then call it (existing behaviour)
> > > otherwise just: *rte_security_dynfield(m) = sess->sess_private_data;
> > > (fast-path)
> > >
> > > 2) consider to make rte_security_set_pkt_metadata() inline function.
> > > We probably can have some special flag inside struct rte_security_ctx,
> > > or even store inside ctx a pointer to set_pkt_metadata() itself.
> >
> > On second thought, some new flags might be better.
> > Then later, if we realize that set_pkt_metadata() and get_userdata()
> > are not really used by PMDs, it might be easier to deprecate these callbacks.
> 
> Thanks, I agree with your thoughts. I'll submit a V2 with above change, new flags and
> set_pkt_metadata() and get_userdata() function pointers moved to rte_security_ctx for
> review so that it can be targeted for 21.11.
> 
> Even with flags, moving the set_pkt_metadata() and get_userdata() function pointers is still needed,
> as we need to make the rte_security_set_pkt_metadata() API inline while struct rte_security_ops is not
> exposed to the user. I think this is fine as it is in line with how the fast path function pointers
> of rte_ethdev and rte_cryptodev are currently placed.

My thought was that we can get away with flags only.
Something like this:
rte_security.h:

...

enum {
        RTE_SEC_CTX_F_FAST_SET_MDATA = 0x1,
        RTE_SEC_CTX_F_FAST_GET_UDATA = 0x2,
};

struct rte_security_ctx {
        void *device;
        /**< Crypto/ethernet device attached */
        const struct rte_security_ops *ops;
        /**< Pointer to security ops for the device */
        uint16_t sess_cnt;
        /**< Number of sessions attached to this context */
        uint32_t flags;
};

extern int
__rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
                               struct rte_security_session *sess,
                               struct rte_mbuf *m, void *params); 

static inline int
rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
                              struct rte_security_session *sess,
                              struct rte_mbuf *m, void *params)
{
        /* fast-path */
        if (instance->flags & RTE_SEC_CTX_F_FAST_SET_MDATA) {
                *rte_security_dynfield(m) =
                        (rte_security_dynfield_t)(sess->sess_private_data);
                return 0;
        }
        /* slow path */
        return __rte_security_set_pkt_metadata(instance, sess, m, params);
}

rte_security.c: 

...
/* existing one, just renamed */
int
__rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
                              struct rte_security_session *sess,
                              struct rte_mbuf *m, void *params)
{
#ifdef RTE_DEBUG
        RTE_PTR_OR_ERR_RET(sess, -EINVAL);
        RTE_PTR_OR_ERR_RET(instance, -EINVAL);
        RTE_PTR_OR_ERR_RET(instance->ops, -EINVAL);
#endif
        RTE_FUNC_PTR_OR_ERR_RET(*instance->ops->set_pkt_metadata, -ENOTSUP);
        return instance->ops->set_pkt_metadata(instance->device,
                                               sess, m, params);
}


I think both ways are possible (flags vs actual func pointers) and both have
some pluses and minuses.
I suppose the main choice here is what we think the future of
set_pkt_metadata() and rte_security_get_userdata() should be.
If we think that they will be useful for some future PMDs and we want to keep them,
then storing the actual func pointers inside ctx is probably the better approach.
If not, then flags seem like the better one, as in that case we can eventually
deprecate and remove these callbacks.
From what I see right now, custom callbacks seem excessive,
and rte_security_dynfield is enough.
But maybe there are some future plans that would require them?
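
For readers who want to try the flag-based dispatch outside of DPDK, here is a
minimal standalone sketch. Everything prefixed with mock_ or MOCK_ is a
hypothetical stand-in invented for illustration (it is not the rte_security
API), and the single uint64_t field only imitates what rte_security_dynfield()
would return. It is one possible shape of the idea, not a proposed patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the DPDK types; not the real definitions. */
enum { MOCK_SEC_CTX_F_FAST_SET_MDATA = 0x1 };

struct mock_session { void *sess_private_data; };
struct mock_mbuf { uint64_t sec_dynfield; /* imitates the security dynfield */ };

struct mock_ctx {
	void *device;
	uint32_t flags;
	/* driver callback, used only on the slow path */
	int (*set_pkt_metadata)(void *device, struct mock_session *sess,
			struct mock_mbuf *m, void *params);
};

/* Example driver callback: stores a sentinel so the path taken is visible. */
static int
mock_drv_set_pkt_metadata(void *device, struct mock_session *sess,
		struct mock_mbuf *m, void *params)
{
	(void)device; (void)sess; (void)params;
	m->sec_dynfield = 0xabcd;
	return 0;
}

/* Slow path: out-of-line call into the driver, with the usual checks. */
static int
mock_set_pkt_metadata_slow(struct mock_ctx *ctx, struct mock_session *sess,
		struct mock_mbuf *m, void *params)
{
	if (ctx->set_pkt_metadata == NULL)
		return -ENOTSUP;
	return ctx->set_pkt_metadata(ctx->device, sess, m, params);
}

/* Inline entry point: a flag test decides between the two paths. */
static inline int
mock_set_pkt_metadata(struct mock_ctx *ctx, struct mock_session *sess,
		struct mock_mbuf *m, void *params)
{
	assert(ctx != NULL && sess != NULL && m != NULL);
	if (ctx->flags & MOCK_SEC_CTX_F_FAST_SET_MDATA) {
		/* fast path: just store the session private data pointer */
		m->sec_dynfield = (uint64_t)(uintptr_t)sess->sess_private_data;
		return 0;
	}
	return mock_set_pkt_metadata_slow(ctx, sess, m, params);
}
```

With the flag set, the call reduces to one load, one test and one store in the
caller; without it, behaviour stays the same as a plain callback dispatch.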
 
> 
> >
> > >
> > > As a brief code snippet:
> > >
> > > struct rte_security_ctx {
> > >         void *device;
> > >         /**< Crypto/ethernet device attached */
> > >         const struct rte_security_ops *ops;
> > >         /**< Pointer to security ops for the device */
> > >         uint16_t sess_cnt;
> > >         /**< Number of sessions attached to this context */
> > > +     int (*set_pkt_mdata)(void *, struct rte_security_session *, struct rte_mbuf *,  void *);
> > > };
> > >
> > > static inline int
> > > rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
> > >                               struct rte_security_session *sess,
> > >                               struct rte_mbuf *m, void *params)
> > > {
> > >      /* fast-path */
> > >       if (instance->set_pkt_mdata == NULL) {
> > >              *rte_security_dynfield(m) = (rte_security_dynfield_t)(sess->sess_private_data);
> > >              return 0;
> > >        /* slow path */
> > >        } else
> > >            return instance->set_pkt_mdata(instance->device, sess, m, params);
> > > }
> > >
> > > That probably would be an ABI breakage (new field in rte_security_ctx) and would require
> > > some trivial changes for all existing PMDs that use RTE_SECURITY_TX_OLOAD_NEED_MDATA
> > > (ctx_create()), but hopefully will benefit everyone.
> > >
> > > >
> > > > >
> > > > > > I'm fine to
> > > > > > introduce a burst call for the same(I was thinking to propose it in future) to
> > > > > > compensate for the overhead.
> > > > > >
> > > > > > If rte_security_set_pkt_metadata() was not a PMD specific function ptr call and
> > > > > > rte_mbuf had space for struct rte_security_session pointer,
> > > > >
> > > > > But it does, see above.
> > > > > In fact it even more flexible - because it is driver specific, you are not limited to one 64-bit field.
> > > > > If your PMD requires more data to be associated with mbuf
> > > > > - you can request it via mbuf_dynfield and store there whatever is needed.
> > > > >
> > > > > > then I guess it would have been better to do it the way you proposed.
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > This patch is trying to enforce semantics as above so that
> > > > > > > > > > > > rte_security_set_pkt_metadata() can predict what comes in the pkt when he is
> > > > > > > > > > > > called.
> > > > > > > > > > > >
> > > > > > > > > > > > I also think above sequence is what Linux kernel stack or other stacks follow.
> > > > > > > > > > > > Does it makes sense ?
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Once called,
> > > > > > > > > > > > > > +Layer 3 and above data cannot be modified or moved around unless
> > > > > > > > > > > > > > +``rte_security_set_pkt_metadata()`` is called again.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >  For inline protocol offloaded ingress traffic, the application can register a
> > > > > > > > > > > > > >  pointer, ``userdata`` , in the security session. When the packet is received,
> > > > > > > > > > > > > > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > > index bb38d7f58..9d8e3ddc8 100644
> > > > > > > > > > > > > > --- a/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > > +++ b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > > @@ -228,6 +228,8 @@ extern "C" {
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >  /**
> > > > > > > > > > > > > >   * Request security offload processing on the TX packet.
> > > > > > > > > > > > > > + * To use Tx security offload, the user needs to fill l2_len in mbuf
> > > > > > > > > > > > > > + * indicating L2 header size and where L3 header starts.
> > > > > > > > > > > > > >   */
> > > > > > > > > > > > > >  #define PKT_TX_SEC_OFFLOAD	(1ULL << 43)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > > >

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] eal: fix argument to rte_bsf32_safe
@ 2021-07-13 20:12  3% Stephen Hemminger
  2021-07-19 17:15  0% ` Tyler Retzlaff
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Stephen Hemminger @ 2021-07-13 20:12 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, anatoly.burakov

The first argument to rte_bsf32_safe was incorrectly declared as
a 64 bit value. This function only handles 32 bit values correctly,
and the underlying function rte_bsf32 only accepts 32 bit values.
This was introduced when the safe version was added and was probably
caused by copy/paste from the 64 bit version.

The bug went unnoticed until some other code was
built with -Wall and -Wextra in C++, where the compiler complains about
the missing cast.

Yes, this is an API signature change, but the original code was wrong.
It is an inline function, so it is not an ABI change.

Fixes: 4e261f551986 ("eal: add 64-bit bsf and 32-bit safe bsf functions")
Cc: anatoly.burakov@intel.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/eal/include/rte_common.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index d5a32c66a5fe..99eb5f1820ae 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -623,7 +623,7 @@ rte_bsf32(uint32_t v)
  *     Returns 0 if ``v`` was 0, otherwise returns 1.
  */
 static inline int
-rte_bsf32_safe(uint64_t v, uint32_t *pos)
+rte_bsf32_safe(uint32_t v, uint32_t *pos)
 {
 	if (v == 0)
 		return 0;
-- 
2.30.2
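
To illustrate why the 64 bit parameter was a problem, here is a small
standalone sketch. The demo_ functions are local stand-ins written for this
example (they are not the rte_common.h definitions) and assume a GCC or Clang
compiler for __builtin_ctz(). With the old prototype, the upper 32 bits of the
argument are silently dropped before the bit scan.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 32 bit bit-scan-forward: index of the lowest set bit. */
static inline uint32_t
demo_bsf32(uint32_t v)
{
	assert(v != 0); /* __builtin_ctz(0) is undefined */
	return (uint32_t)__builtin_ctz(v);
}

/* Old shape: accepts 64 bits, but only the low 32 bits reach demo_bsf32(). */
static inline int
demo_bsf32_safe_old(uint64_t v, uint32_t *pos)
{
	if (v == 0)
		return 0;
	/* narrowing made explicit here; a value with only high bits set
	 * passes the v == 0 check and still arrives as 0 */
	*pos = demo_bsf32((uint32_t)v);
	return 1;
}

/* Fixed shape: the parameter type matches what the helper can handle. */
static inline int
demo_bsf32_safe_new(uint32_t v, uint32_t *pos)
{
	if (v == 0)
		return 0;
	*pos = demo_bsf32(v);
	return 1;
}
```

Calling demo_bsf32_safe_old() with a value such as (1ULL << 40) | 0x10 reports
bit 4 and ignores bit 40 entirely, which is the kind of silent truncation the
32 bit prototype rules out at the call site.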


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/2] security: enforce semantics for Tx inline processing
  2021-07-13 14:08  0%                       ` Ananyev, Konstantin
@ 2021-07-13 15:58  0%                         ` Nithin Dabilpuram
  2021-07-14 11:09  0%                           ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Nithin Dabilpuram @ 2021-07-13 15:58 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Akhil Goyal, dev, hemant.agrawal, thomas, g.singh, Yigit, Ferruh,
	Zhang, Roy Fan, olivier.matz, jerinj, Doherty, Declan, Nicolau,
	Radu, jiawenwu, jianwang

On Tue, Jul 13, 2021 at 02:08:18PM +0000, Ananyev, Konstantin wrote:
> 
> > 
> > Adding more rte_security and PMD maintainers into the loop.
> > 
> > > > > > > > > > > > > For Tx inline processing, when RTE_SECURITY_TX_OLOAD_NEED_MDATA is
> > > > > > > > > > > > > set, rte_security_set_pkt_metadata() needs to be called for pkts
> > > > > > > > > > > > > to associate a Security session with a mbuf before submitting
> > > > > > > > > > > > > to Ethdev Tx. This is apart from setting PKT_TX_SEC_OFFLOAD in
> > > > > > > > > > > > > mbuf.ol_flags. rte_security_set_pkt_metadata() is also used to
> > > > > > > > > > > > > set some opaque metadata in mbuf for PMD's use.
> > > > > > > > > > > > > This patch updates documentation that rte_security_set_pkt_metadata()
> > > > > > > > > > > > > should be called only with mbuf containing Layer 3 and above data.
> > > > > > > > > > > > > This behaviour is consistent with existing PMD's such as ixgbe.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tx, not all net PMD's/HW can parse packet and identify
> > > > > > > > > > > > > L2 header and L3 header locations on Tx. This is inline with other
> > > > > > > > > > > > > Tx offloads requirements such as L3 checksum, L4 checksum offload,
> > > > > > > > > > > > > etc, where mbuf.l2_len, mbuf.l3_len etc, needs to be set for
> > > > > > > > > > > > > HW to be able to generate checksum. Since Inline IPSec is also
> > > > > > > > > > > > > such a Tx offload, some PMD's at least need mbuf.l2_len to be
> > > > > > > > > > > > > valid to find L3 header and perform Outbound IPSec processing.
> > > > > > > > > > > > > Hence, this patch updates documentation to enforce setting
> > > > > > > > > > > > > mbuf.l2_len while setting PKT_TX_SEC_OFFLOAD in mbuf.ol_flags
> > > > > > > > > > > > > for Inline IPSec Crypto / Protocol offload processing to
> > > > > > > > > > > > > work on Tx.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
> > > > > > > > > > > > > Reviewed-by: Akhil Goyal <gakhil@marvell.com>
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >  doc/guides/nics/features.rst           | 2 ++
> > > > > > > > > > > > >  doc/guides/prog_guide/rte_security.rst | 6 +++++-
> > > > > > > > > > > > >  lib/mbuf/rte_mbuf_core.h               | 2 ++
> > > > > > > > > > > > >  3 files changed, 9 insertions(+), 1 deletion(-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> > > > > > > > > > > > > index 403c2b03a..414baf14f 100644
> > > > > > > > > > > > > --- a/doc/guides/nics/features.rst
> > > > > > > > > > > > > +++ b/doc/guides/nics/features.rst
> > > > > > > > > > > > > @@ -430,6 +430,7 @@ of protocol operations. See Security library and PMD documentation for more deta
> > > > > > > > > > > > >
> > > > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``capabilities_get``.
> > > > > > > > > > > > >  * **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > > @@ -451,6 +452,7 @@ protocol operations. See security library and PMD documentation for more details
> > > > > > > > > > > > >
> > > > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``get_userdata``,
> > > > > > > > > > > > >    ``capabilities_get``.
> > > > > > > > > > > > > diff --git a/doc/guides/prog_guide/rte_security.rst b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > index f72bc8a78..7b68c698d 100644
> > > > > > > > > > > > > --- a/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > +++ b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > @@ -560,7 +560,11 @@ created by the application is attached to the security session by the API
> > > > > > > > > > > > >
> > > > > > > > > > > > >  For Inline Crypto and Inline protocol offload, device specific defined metadata is
> > > > > > > > > > > > >  updated in the mbuf using ``rte_security_set_pkt_metadata()`` if
> > > > > > > > > > > > > -``DEV_TX_OFFLOAD_SEC_NEED_MDATA`` is set.
> > > > > > > > > > > > > +``RTE_SECURITY_TX_OLOAD_NEED_MDATA`` is set. ``rte_security_set_pkt_metadata()``
> > > > > > > > > > > > > +should be called on mbuf only with Layer 3 and above data present and
> > > > > > > > > > > > > +``mbuf.data_off`` should be pointing to Layer 3 Header.
> > > > > > > > > > > >
> > > > > > > > > > > > Hmm... not sure why mbuf.data_off should point to L3 hdr.
> > > > > > > > > > > > Who will add L2 hdr to the packet in that case?
> > > > > > > > > > > > Or did you mean ``mbuf.data_off + mbuf.l2_len`` here?
> > > > > > > > > > >
> > > > > > > > > > > That is the semantics I was trying to define. I think below are the sequence of
> > > > > > > > > > > operations to be done for ipsec processing,
> > > > > > > > > > >
> > > > > > > > > > > 1. receive_pkt()
> > > > > > > > > > > 2. strip_l2_hdr()
> > > > > > > > > > > 3. Do policy lookup ()
> > > > > > > > > > > 4. Call rte_security_set_pkt_metadata() if pkt needs to be encrypted with a
> > > > > > > > > > > particular SA. Now pkt only has L3 and above data.
> > > > > > > > > > > 5. Do route_lookup()
> > > > > > > > > > > 6. add_l2hdr() which might be different from stripped l2hdr.
> > > > > > > > > > > 7. Send packet out.
> > > > > > > > > > >
> > > > > > > > > > > The above sequence is what I believe the current poll mode worker thread in
> > > > > > > > > > > ipsec-secgw is following.
> > > > > > > > > >
> > > > > > > > > > That's just a sample app, it doesn't mean it has to be the only possible way.
> > > > > > > > > >
> > > > > > > > > > > While in event mode, step 2 and step 6 are missing.
> > > > > > > > > >
> > > > > > > > > > I think this L2 hdr manipulation is totally optional.
> > > > > > > > > > If your rte_security_set_pkt_metadata() implementation really needs to know L3 hdr offset (not sure why?),
> > > > > > > > > Since rte_security_set_pkt_metadata() is PMD specific function ptr call, we are currently doing some pre-processing
> > > > > > > > > here before submitting packet to inline IPSec via rte_eth_tx_burst(). This saves us cycles later in rte_eth_tx_burst().
> > > > > > > > > If we cannot know for sure, the pkt content at the time of rte_security_set_pkt_metadata() call, then I think
> > > > > > > > > having a PMD specific callback is not much of use except for saving SA priv data to rte_mbuf.
> > > > > > > > >
> > > > > > > > > > then I suppose we can add a requirement that l2_len has to be set properly before calling rte_security_set_pkt_metadata().
> > > > > > > > >
> > > > > > > > > This is also fine with us.
> > > > > > > >
> > > > > > > > Ok, so to make sure we are on the same page, you propose:
> > > > > > > > 1. before calling rte_security_set_pkt_metadata() mbuf.l2_len should be properly set.
> > > > > > > > 2. after rte_security_set_pkt_metadata() and before rte_eth_tx_burst() packet contents
> > > > > > > >     at [mbuf.l2_len, mbuf.pkt_len) can't be modified?
> > > > > > > Yes.
> > > > > > >
> > > > > > > >
> > > > > > > > Is that correct understanding?
> > > > > > > > If yes, I wonder how 2) will correlate with rte_eth_tx_prepare() concept?
> > > > > > >
> > > > > > > Since our PMD doesn't have a prepare function, I missed that but, since
> > > > > > > rte_security_set_pkt_metadata() is only used for Inline Crypto/Protocol via
> > > > > > > a rte_eth_dev, and both rte_security_set_pkt_metadata() and rte_eth_tx_prepare()
> > > > > > > are callbacks from same PMD, do you see any issue ?
> > > > > > >
> > > > > > > The restriction is from user side, data is not supposed to be modified unless
> > > > > > > rte_security_set_pkt_metadata() is called again.
> > > > > >
> > > > > > Yep, I do have a concern here.
> > > > > > Right now it is perfectly valid to do something like that:
> > > > > > rte_security_set_pkt_metadata(..., mb, ...);
> > > > > > /* can modify contents of the packet */
> > > > > > rte_eth_tx_prepare(..., &mb, 1);
> > > > > > rte_eth_tx_burst(..., &mb, 1);
> > > > > >
> > > > > > With the new restrictions you are proposing it wouldn't be allowed any more.
> > > > > You can still modify L2 header and IPSEC is only concerned about L3 and above.
> > > > >
> > > > > I think insisting that rte_security_set_pkt_metadata() be called after all L3
> > > > > and above header modifications is not a problem. I guess the existing ixgbe/txgbe
> > > > > PMDs, which are the only ones implementing the callback, already expect the
> > > > > same?
> > > >
> > > > AFAIK, no, there are no such requirements for ixgbe or txgbe.
> > > > All the ixgbe callback does is store session-related data inside the mbuf.
> > > > Its only expectation is to have the ESP trailer at the proper place (after ICV):
> > >
> > > This implies rte_security_set_pkt_metadata() cannot be called when the mbuf doesn't
> > > have the ESP trailer updated or when mbuf->pkt_len = 0
> > >
> > > >
> > > > union ixgbe_crypto_tx_desc_md *mdata = (union ixgbe_crypto_tx_desc_md *)
> > > >                                 rte_security_dynfield(m);
> > > >   mdata->enc = 1;
> > > >   mdata->sa_idx = ic_session->sa_index;
> > > >   mdata->pad_len = ixgbe_crypto_compute_pad_len(m);
> > > >
> > > > Then this data will be used by tx_burst() function.
> > > So it implies that after the above rte_security_set_pkt_metadata() call and before tx_burst(),
> > > mbuf data / packet len cannot be modified, right? If it is modified, then tx_burst()
> > > will be using an incorrect pad len.
> > 
> > No, pkt_len can be modified.
> > Though ESP trailer pad_len can't.
> > 
> > >
> > > This patch is also trying to add similar restriction on when
> > > rte_security_set_pkt_metadata() should be called and what cannot be done after
> > > calling rte_security_set_pkt_metadata().
> > 
> > No, I don't think it is really the same.
> > Also, IMO, inside the ixgbe set_pkt_metadata() implementation we probably shouldn't silently assume
> > that the ESP packet is already formed and the trailer contains valid data.
> > In fact, I think this pad_len calculation can be moved to the actual TX function.
> > 
> > >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > If your question is can't we do the preprocessing in rte_eth_tx_prepare() for
> > > > > > > security,
> > > > > >
> > > > > > Yes, that was my thought.
> > > > > >
> > > > > > > my only argument was that since there is already a hit in
> > > > > > > rte_security_set_pkt_metadata() to PMD specific callback and
> > > > > > > struct rte_security_session is passed as an argument to it, it is more beneficial to
> > > > > > > do security related pre-processing there.
> > > > > >
> > > > > > Yes, it would be extra callback call that way.
> > > > > > Though tx_prepare() accepts burst of packets, so the overhead
> > > > > > of function call will be spread around the whole burst, and I presume
> > > > > > shouldn't be too high.
> > > > > >
> > > > > > > Also rte_eth_tx_prepare() if implemented will be called for both security and
> > > > > > > non-security pkts.
> > > > > >
> > > > > > Yes, but tx_prepare() can distinguish (by ol_flags and/or other field contents) which
> > > > > > modifications are required for the packet.
> > > > >
> > > > > But the major issues I see are
> > > > >
> > > > > 1. tx_prepare() doesn't take rte_security_session as argument though ol_flags has security flag.
> > > > >    In our case, we need to know the security session details to do things.
> > > >
> > > > I suppose you can store pointer to session (or so) inside mbuf in rte_security_dynfield, no?
> > >
> > > We can do. But having to call PMD specific function call via rte_security_set_pkt_metadata()
> > > just for storing session pointer in rte_security_dynfield consumes unnecessary
> > > cycles per pkt.
> > 
> > In fact there are two function calls: one for rte_security_set_pkt_metadata(),
> > and a second for the instance->ops->set_pkt_metadata() callback,
> > which is of course way too expensive for such a simple operation.
> > Actually, the same applies to rte_security_get_userdata().
> > Both of these functions belong to the data path and ideally have to be as fast as possible.
> > 21.11 is probably the right timeframe for that.
> > 
> > > >
> > > > > 2. AFAIU tx_prepare() is not mandatory as per spec and even by default disabled under compile time
> > > > >    macro RTE_ETHDEV_TX_PREPARE_NOOP.
> > > > > 3. Even if we do tx_prepare(), rte_security_set_pkt_metadata() is mandatory to associate
> > > > >    struct rte_security_session with a pkt since, unlike ol_flags, there is no direct space to do the same.
> > > >
> > > > I didn't get you here; obviously we do have rte_security_dynfield inside the mbuf,
> > > > specifically for that: to store security-related data inside the mbuf.
> > > > Yes, your PMD has to request it at initialization time, but I suppose that is not a big deal.
> > > >
> > > > > So I think instead of enforcing yet another callback tx_prepare() for inline security
> > > > > processing, it can be done via security specific set_pkt_metadata().
> > > >
> > > > But what you are proposing introduces new limitations and might break existing functionality.
> > > > BTW, if you don't want to use tx_prepare(), why is doing these calculations inside tx_burst()
> > > > itself not an option?
> > >
> > > We can do things in tx_burst() but if we are doing it there, then we want to avoid having callback for
> > > rte_security_set_pkt_metadata().
> > >
> > > Are you fine if we update the spec to say: "When DEV_TX_OFFLOAD_SEC_NEED_MDATA is not
> > > set, the user needs to store struct rte_security_session's sess_private_data in
> > > rte_security_dynfield", like below?
> > >
> > > <snip>
> > >
> > > static inline void
> > > inline_outb_mbuf_prepare(const struct rte_ipsec_session *ss,
> > >         struct rte_mbuf *mb[], uint16_t num)
> > > {
> > >         uint32_t i, ol_flags;
> > >
> > >         ol_flags = ss->security.ol_flags & RTE_SECURITY_TX_OLOAD_NEED_MDATA;
> > >         for (i = 0; i != num; i++) {
> > >
> > >                 mb[i]->ol_flags |= PKT_TX_SEC_OFFLOAD;
> > >
> > >                 if (ol_flags != 0)
> > >                         rte_security_set_pkt_metadata(ss->security.ctx,
> > >                                 ss->security.ses, mb[i], NULL);
> > >                 else
> > >                         *rte_security_dynfield(mb[i]) =
> > >                                 (uint64_t)ss->security.ses->sess_private_data;
> > >         }
> > > }
> > >
> > > If the above can be done, then in our PMD, we will not have a callback for
> > > set_pkt_metadata() and DEV_TX_OFFLOAD_SEC_NEED_MDATA will also be not set
> > > in capabilities.
> > 
> > That's an interesting idea, but what you propose is the change in current rte_security API behaviour.
> > So all existing apps that use this API will have to be changed.
> > We'd better avoid such changes unless there is really good reason for that.
> > So, I'd suggest to tweak your idea a bit:
> > 
> > 1) change rte_security_set_pkt_metadata():
> > if ops->set_pkt_metadata != NULL, then call it (existing behaviour)
> > otherwise just: *rte_security_dynfield(m) = sess->sess_private_data;
> > (fast-path)
> > 
> > 2) consider to make rte_security_set_pkt_metadata() inline function.
> > We probably can have some special flag inside struct rte_security_ctx,
> > or even store inside ctx a pointer to set_pkt_metadata() itself.
> 
> On second thought, some new flags might be better.
> Then later, if we realize that set_pkt_metadata() and get_userdata()
> are not really used by PMDs, it might be easier to deprecate these callbacks.

Thanks, I agree with your thoughts. I'll submit a V2 with the above change, the new
flags, and the set_pkt_metadata() and get_userdata() function pointers moved to
rte_security_ctx for review, so that it can be targeted for 21.11.

Even with the flags, moving the set_pkt_metadata() and get_userdata() function pointers is
still needed, as we need to make the rte_security_set_pkt_metadata() API inline while
struct rte_security_ops is not exposed to the user. I think this is fine, as it is in line
with how the fast path function pointers of rte_ethdev and rte_cryptodev are currently placed.
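
A minimal, self-contained sketch of the inline dispatch described above. The
sketch_ types and functions are stand-ins invented for illustration (they are
NOT the DPDK definitions); the only point is that a NULL hook in the context
selects the fast path, so no ops table needs to be visible to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types for illustration only; not the real DPDK structures. */
struct sketch_mbuf { uint64_t sec_dynfield; };
struct sketch_session { void *sess_private_data; };

struct sketch_ctx {
	void *device;
	/* Optional PMD hook; NULL means no per-packet callback is needed. */
	int (*set_pkt_mdata)(void *device, struct sketch_session *sess,
			     struct sketch_mbuf *m, void *params);
};

static inline int
sketch_set_pkt_metadata(struct sketch_ctx *ctx, struct sketch_session *sess,
			struct sketch_mbuf *m, void *params)
{
	assert(ctx != NULL && sess != NULL && m != NULL);

	/* Fast path: store the session private data in the mbuf field. */
	if (ctx->set_pkt_mdata == NULL) {
		m->sec_dynfield = (uint64_t)(uintptr_t)sess->sess_private_data;
		return 0;
	}
	/* Slow path: the PMD fills in its own metadata. */
	return ctx->set_pkt_mdata(ctx->device, sess, m, params);
}

/* Hypothetical PMD callback, present only to exercise the slow path. */
static int
sketch_pmd_cb(void *device, struct sketch_session *sess,
	      struct sketch_mbuf *m, void *params)
{
	(void)device;
	(void)sess;
	(void)params;
	m->sec_dynfield = 0xabcdULL;
	return 0;
}
```

With set_pkt_mdata left NULL the call reduces to a single store; a PMD that
still needs per-packet work would set the hook when creating its context.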

> 
> > 
> > As a brief code snippet:
> > 
> > struct rte_security_ctx {
> >         void *device;
> >         /**< Crypto/ethernet device attached */
> >         const struct rte_security_ops *ops;
> >         /**< Pointer to security ops for the device */
> >         uint16_t sess_cnt;
> >         /**< Number of sessions attached to this context */
> > +     int (*set_pkt_mdata)(void *, struct rte_security_session *, struct rte_mbuf *,  void *);
> > };
> > 
> > static inline int
> > rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
> >                               struct rte_security_session *sess,
> >                               struct rte_mbuf *m, void *params)
> > {
> >         /* fast path: no PMD callback, store session private data directly */
> >         if (instance->set_pkt_mdata == NULL) {
> >                 *rte_security_dynfield(m) =
> >                         (rte_security_dynfield_t)(sess->sess_private_data);
> >                 return 0;
> >         }
> >         /* slow path: let the PMD fill in its own metadata */
> >         return instance->set_pkt_mdata(instance->device, sess, m, params);
> > }
> > 
> > That would probably be an ABI breakage (new field in rte_security_ctx) and would require
> > some trivial changes for all existing PMDs that use RTE_SECURITY_TX_OLOAD_NEED_MDATA
> > (ctx_create()), but hopefully will benefit everyone.
> > 
> > >
> > > >
> > > > > I'm fine to
> > > > > introduce a burst call for the same(I was thinking to propose it in future) to
> > > > > compensate for the overhead.
> > > > >
> > > > > If rte_security_set_pkt_metadata() was not a PMD specific function ptr call and
> > > > > rte_mbuf had space for struct rte_security_session pointer,
> > > >
> > > > But it does, see above.
> > > > In fact it even more flexible - because it is driver specific, you are not limited to one 64-bit field.
> > > > If your PMD requires more data to be associated with mbuf
> > > > - you can request it via mbuf_dynfield and store there whatever is needed.
> > > >
> > > > > then I guess it would have been better to do it the way you proposed.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > This patch is trying to enforce semantics as above so that
> > > > > > > > > > > rte_security_set_pkt_metadata() can predict what is in the pkt when it is
> > > > > > > > > > > called.
> > > > > > > > > > >
> > > > > > > > > > > I also think the above sequence is what the Linux kernel stack or other stacks follow.
> > > > > > > > > > > Does it make sense?
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > Once called,
> > > > > > > > > > > > > +Layer 3 and above data cannot be modified or moved around unless
> > > > > > > > > > > > > +``rte_security_set_pkt_metadata()`` is called again.
> > > > > > > > > > > > >
> > > > > > > > > > > > >  For inline protocol offloaded ingress traffic, the application can register a
> > > > > > > > > > > > >  pointer, ``userdata`` , in the security session. When the packet is received,
> > > > > > > > > > > > > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > index bb38d7f58..9d8e3ddc8 100644
> > > > > > > > > > > > > --- a/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > +++ b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > @@ -228,6 +228,8 @@ extern "C" {
> > > > > > > > > > > > >
> > > > > > > > > > > > >  /**
> > > > > > > > > > > > >   * Request security offload processing on the TX packet.
> > > > > > > > > > > > > + * To use Tx security offload, the user needs to fill l2_len in mbuf
> > > > > > > > > > > > > + * indicating L2 header size and where L3 header starts.
> > > > > > > > > > > > >   */
> > > > > > > > > > > > >  #define PKT_TX_SEC_OFFLOAD	(1ULL << 43)
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > >
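
A minimal sketch of what the l2_len requirement discussed in this thread
buys a driver: with l2_len filled in by the application, the L3 header can be
located without parsing the packet. The sketch_mbuf struct and sketch_l3_hdr()
helper are hypothetical stand-ins, not the real rte_mbuf layout or API.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the few mbuf fields involved; not the real rte_mbuf. */
struct sketch_mbuf {
	uint8_t *buf_addr;   /* start of the buffer */
	uint16_t data_off;   /* offset of packet data (the L2 header) */
	uint16_t l2_len;     /* L2 header length, set by the application */
	uint32_t pkt_len;    /* total packet length */
};

/* Return a pointer to the L3 header, trusting the caller-provided l2_len. */
static inline uint8_t *
sketch_l3_hdr(const struct sketch_mbuf *m)
{
	assert(m->l2_len <= m->pkt_len);
	return m->buf_addr + m->data_off + m->l2_len;
}
```

For a plain Ethernet frame the application would set l2_len to 14 before
calling the metadata hook, and the driver would then read the IP header at
data_off + 14.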

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] dmadev: introduce DMA device library
  2021-07-13 14:19  3%   ` Ananyev, Konstantin
@ 2021-07-13 14:28  0%     ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2021-07-13 14:28 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Chengwen Feng, thomas, Yigit, Ferruh, jerinj, jerinjacobk, dev,
	mb, nipun.gupta, hemant.agrawal, maxime.coquelin,
	honnappa.nagarahalli, david.marchand, sburla, pkapoor, liangma

On Tue, Jul 13, 2021 at 03:19:39PM +0100, Ananyev, Konstantin wrote:
> 
> > +#include "rte_dmadev_core.h"
> > +
> > +/**
> > + *  DMA flags to augment operation preparation.
> > + *  Used as the 'flags' parameter of rte_dmadev_copy/copy_sg/fill/fill_sg.
> > + */
> > +#define RTE_DMA_FLAG_FENCE   (1ull << 0)
> > +/**< DMA fence flag
> > + * It means the operation with this flag must be processed only after all
> > + * previous operations are completed.
> > + *
> > + * @see rte_dmadev_copy()
> > + * @see rte_dmadev_copy_sg()
> > + * @see rte_dmadev_fill()
> > + * @see rte_dmadev_fill_sg()
> > + */
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Enqueue a copy operation onto the virtual DMA channel.
> > + *
> > + * This queues up a copy operation to be performed by hardware, but does not
> > + * trigger hardware to begin that operation.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vchan
> > + *   The identifier of virtual DMA channel.
> > + * @param src
> > + *   The address of the source buffer.
> > + * @param dst
> > + *   The address of the destination buffer.
> > + * @param length
> > + *   The length of the data to be copied.
> > + * @param flags
> > + *   Flags for this operation.
> > + *
> > + * @return
> > + *   - 0..UINT16_MAX: index of enqueued copy job.
> > + *   - <0: Error code returned by the driver copy function.
> > + */
> > +__rte_experimental
> > +static inline int
> > +rte_dmadev_copy(uint16_t dev_id, uint16_t vchan, rte_iova_t src, rte_iova_t dst,
> > +             uint32_t length, uint64_t flags)
> > +{
> > +     struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> 
> One question I have - did you guys consider hiding the definitions of struct rte_dmadev
> and rte_dmadevices[] in a .c file straight from the start?
> There is probably no point in repeating our famous ethdev/cryptodev/... ABI pitfalls here.
> 
I considered it, but I found that even moving one operation (the doorbell one)
to be non-inline caused a small but noticeable perf drop. Until we get all the
drivers done and more testing in various scenarios, I'd rather err on the
side of getting the best performance.
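
A minimal sketch of the trade-off being weighed here, using invented sketch_
names rather than the dmadev API: an inline wrapper needs the device struct
and table visible to the caller (so their layout becomes ABI), while an
out-of-line function can keep both private at the cost of an extra call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_DEVS 4

/* Option A: struct and table are visible to callers, so the wrapper can be
 * inlined, but the struct layout becomes part of the ABI. */
struct sketch_dev {
	int (*copy)(struct sketch_dev *dev, uint64_t src, uint64_t dst,
		    uint32_t len);
	uint32_t enqueued;
};

static struct sketch_dev sketch_devices[SKETCH_MAX_DEVS];

static inline int
sketch_copy_inline(uint16_t dev_id, uint64_t src, uint64_t dst, uint32_t len)
{
	assert(dev_id < SKETCH_MAX_DEVS);
	struct sketch_dev *dev = &sketch_devices[dev_id];

	return dev->copy(dev, src, dst, len);
}

/* Option B: in a real library this body would live in a .c file with only
 * the prototype public, hiding the struct at the cost of one more call. */
int
sketch_copy_hidden(uint16_t dev_id, uint64_t src, uint64_t dst, uint32_t len)
{
	if (dev_id >= SKETCH_MAX_DEVS || sketch_devices[dev_id].copy == NULL)
		return -1;
	return sketch_devices[dev_id].copy(&sketch_devices[dev_id],
					   src, dst, len);
}

/* Hypothetical driver op: returns the index of the enqueued job. */
static int
sketch_drv_copy(struct sketch_dev *dev, uint64_t src, uint64_t dst,
		uint32_t len)
{
	(void)src;
	(void)dst;
	(void)len;
	return (int)dev->enqueued++;
}
```

Both variants reach the same driver op; the difference is only in what the
public header has to expose and whether the compiler can inline the wrapper.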

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] dmadev: introduce DMA device library
    2021-07-12 12:05  3%   ` Bruce Richardson
  2021-07-12 15:50  3%   ` Bruce Richardson
@ 2021-07-13 14:19  3%   ` Ananyev, Konstantin
  2021-07-13 14:28  0%     ` Bruce Richardson
  2 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2021-07-13 14:19 UTC (permalink / raw)
  To: Chengwen Feng, thomas, Yigit, Ferruh, Richardson,  Bruce, jerinj,
	jerinjacobk
  Cc: dev, mb, nipun.gupta, hemant.agrawal, maxime.coquelin,
	honnappa.nagarahalli, david.marchand, sburla, pkapoor, liangma


> +#include "rte_dmadev_core.h"
> +
> +/**
> + *  DMA flags to augment operation preparation.
> + *  Used as the 'flags' parameter of rte_dmadev_copy/copy_sg/fill/fill_sg.
> + */
> +#define RTE_DMA_FLAG_FENCE	(1ull << 0)
> +/**< DMA fence flag
> + * It means the operation with this flag must be processed only after all
> + * previous operations are completed.
> + *
> + * @see rte_dmadev_copy()
> + * @see rte_dmadev_copy_sg()
> + * @see rte_dmadev_fill()
> + * @see rte_dmadev_fill_sg()
> + */
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enqueue a copy operation onto the virtual DMA channel.
> + *
> + * This queues up a copy operation to be performed by hardware, but does not
> + * trigger hardware to begin that operation.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param src
> + *   The address of the source buffer.
> + * @param dst
> + *   The address of the destination buffer.
> + * @param length
> + *   The length of the data to be copied.
> + * @param flags
> + *   Flags for this operation.
> + *
> + * @return
> + *   - 0..UINT16_MAX: index of enqueued copy job.
> + *   - <0: Error code returned by the driver copy function.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_copy(uint16_t dev_id, uint16_t vchan, rte_iova_t src, rte_iova_t dst,
> +		uint32_t length, uint64_t flags)
> +{
> +	struct rte_dmadev *dev = &rte_dmadevices[dev_id];

One question I have - did you guys consider hiding the definitions of struct rte_dmadev
and rte_dmadevices[] in a .c file straight from the start?
There is probably no point in repeating our famous ethdev/cryptodev/... ABI pitfalls here.

> +#ifdef RTE_DMADEV_DEBUG
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->copy, -ENOTSUP);
> +	if (vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> +		return -EINVAL;
> +	}
> +#endif
> +	return (*dev->copy)(dev, vchan, src, dst, length, flags);
> +}
> +
> +/**

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/2] security: enforce semantics for Tx inline processing
  2021-07-13 12:33  3%                     ` Ananyev, Konstantin
@ 2021-07-13 14:08  0%                       ` Ananyev, Konstantin
  2021-07-13 15:58  0%                         ` Nithin Dabilpuram
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2021-07-13 14:08 UTC (permalink / raw)
  To: Ananyev, Konstantin, Nithin Dabilpuram
  Cc: Akhil Goyal, dev, hemant.agrawal, thomas, g.singh, Yigit, Ferruh,
	Zhang, Roy Fan, olivier.matz, jerinj, Doherty, Declan, Nicolau,
	 Radu, jiawenwu, jianwang


> 
> Adding more rte_security and PMD maintainers into the loop.
> 
> > > > > > > > > > > > For Tx inline processing, when RTE_SECURITY_TX_OLOAD_NEED_MDATA is
> > > > > > > > > > > > set, rte_security_set_pkt_metadata() needs to be called for pkts
> > > > > > > > > > > > to associate a Security session with a mbuf before submitting
> > > > > > > > > > > > to Ethdev Tx. This is apart from setting PKT_TX_SEC_OFFLOAD in
> > > > > > > > > > > > mbuf.ol_flags. rte_security_set_pkt_metadata() is also used to
> > > > > > > > > > > > set some opaque metadata in mbuf for PMD's use.
> > > > > > > > > > > > This patch updates documentation that rte_security_set_pkt_metadata()
> > > > > > > > > > > > should be called only with mbuf containing Layer 3 and above data.
> > > > > > > > > > > > This behaviour is consistent with existing PMD's such as ixgbe.
> > > > > > > > > > > >
> > > > > > > > > > > > Not all net PMDs/HW can parse a packet and identify the
> > > > > > > > > > > > L2 header and L3 header locations on Tx. This is in line with other
> > > > > > > > > > > > Tx offload requirements such as L3 checksum, L4 checksum offload,
> > > > > > > > > > > > etc., where mbuf.l2_len, mbuf.l3_len, etc. need to be set for
> > > > > > > > > > > > HW to be able to generate the checksum. Since Inline IPSec is also
> > > > > > > > > > > > such a Tx offload, some PMDs need at least mbuf.l2_len to be
> > > > > > > > > > > > valid to find the L3 header and perform Outbound IPSec processing.
> > > > > > > > > > > > Hence, this patch updates documentation to enforce setting
> > > > > > > > > > > > mbuf.l2_len while setting PKT_TX_SEC_OFFLOAD in mbuf.ol_flags
> > > > > > > > > > > > for Inline IPSec Crypto / Protocol offload processing to
> > > > > > > > > > > > work on Tx.
> > > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
> > > > > > > > > > > > Reviewed-by: Akhil Goyal <gakhil@marvell.com>
> > > > > > > > > > > > ---
> > > > > > > > > > > >  doc/guides/nics/features.rst           | 2 ++
> > > > > > > > > > > >  doc/guides/prog_guide/rte_security.rst | 6 +++++-
> > > > > > > > > > > >  lib/mbuf/rte_mbuf_core.h               | 2 ++
> > > > > > > > > > > >  3 files changed, 9 insertions(+), 1 deletion(-)
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> > > > > > > > > > > > index 403c2b03a..414baf14f 100644
> > > > > > > > > > > > --- a/doc/guides/nics/features.rst
> > > > > > > > > > > > +++ b/doc/guides/nics/features.rst
> > > > > > > > > > > > @@ -430,6 +430,7 @@ of protocol operations. See Security library and PMD documentation for more deta
> > > > > > > > > > > >
> > > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``capabilities_get``.
> > > > > > > > > > > >  * **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > @@ -451,6 +452,7 @@ protocol operations. See security library and PMD documentation for more details
> > > > > > > > > > > >
> > > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``get_userdata``,
> > > > > > > > > > > >    ``capabilities_get``.
> > > > > > > > > > > > diff --git a/doc/guides/prog_guide/rte_security.rst b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > index f72bc8a78..7b68c698d 100644
> > > > > > > > > > > > --- a/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > +++ b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > @@ -560,7 +560,11 @@ created by the application is attached to the security session by the API
> > > > > > > > > > > >
> > > > > > > > > > > >  For Inline Crypto and Inline protocol offload, device specific defined metadata is
> > > > > > > > > > > >  updated in the mbuf using ``rte_security_set_pkt_metadata()`` if
> > > > > > > > > > > > -``DEV_TX_OFFLOAD_SEC_NEED_MDATA`` is set.
> > > > > > > > > > > > +``RTE_SECURITY_TX_OLOAD_NEED_MDATA`` is set. ``rte_security_set_pkt_metadata()``
> > > > > > > > > > > > +should be called on mbuf only with Layer 3 and above data present and
> > > > > > > > > > > > +``mbuf.data_off`` should be pointing to Layer 3 Header.
> > > > > > > > > > >
> > > > > > > > > > > Hmm... not sure why mbuf.data_off should point to L3 hdr.
> > > > > > > > > > > Who will add L2 hdr to the packet in that case?
> > > > > > > > > > > Or did you mean ``mbuf.data_off + mbuf.l2_len`` here?
> > > > > > > > > >
> > > > > > > > > > That is the semantics I was trying to define. I think below are the sequence of
> > > > > > > > > > operations to be done for ipsec processing,
> > > > > > > > > >
> > > > > > > > > > 1. receive_pkt()
> > > > > > > > > > 2. strip_l2_hdr()
> > > > > > > > > > 3. Do policy lookup ()
> > > > > > > > > > 4. Call rte_security_set_pkt_metadata() if pkt needs to be encrypted with a
> > > > > > > > > > particular SA. Now pkt only has L3 and above data.
> > > > > > > > > > 5. Do route_lookup()
> > > > > > > > > > 6. add_l2hdr() which might be different from stripped l2hdr.
> > > > > > > > > > 7. Send packet out.
> > > > > > > > > >
> > > > > > > > > > The above sequence is what I believe the current poll mode worker thread in
> > > > > > > > > > ipsec-secgw is following.
> > > > > > > > >
> > > > > > > > > That's just a sample app, it doesn't mean it has to be the only possible way.
> > > > > > > > >
> > > > > > > > > > While in event mode, step 2 and step 6 are missing.
> > > > > > > > >
> > > > > > > > > I think this L2 hdr manipulation is totally optional.
> > > > > > > > > If your rte_security_set_pkt_metadata() implementation really needs to know L3 hdr offset (not sure why?),
> > > > > > > > Since rte_security_set_pkt_metadata() is PMD specific function ptr call, we are currently doing some pre-processing
> > > > > > > > here before submitting packet to inline IPSec via rte_eth_tx_burst(). This saves us cycles later in rte_eth_tx_burst().
> > > > > > > > If we cannot know for sure, the pkt content at the time of rte_security_set_pkt_metadata() call, then I think
> > > > > > > > having a PMD specific callback is not much of use except for saving SA priv data to rte_mbuf.
> > > > > > > >
> > > > > > > > > then I suppose we can add a requirement that l2_len has to be set properly before calling rte_security_set_pkt_metadata().
> > > > > > > >
> > > > > > > > This is also fine with us.
> > > > > > >
> > > > > > > Ok, so to make sure we are on the same page, you propose:
> > > > > > > 1. before calling rte_security_set_pkt_metadata() mbuf.l2_len should be properly set.
> > > > > > > 2. after rte_security_set_pkt_metadata() and before rte_eth_tx_burst() packet contents
> > > > > > >     at [mbuf.l2_len, mbuf.pkt_len) can't be modified?
> > > > > > Yes.
> > > > > >
> > > > > > >
> > > > > > > Is that correct understanding?
> > > > > > > If yes, I wonder how 2) will correlate with rte_eth_tx_prepare() concept?
> > > > > >
> > > > > > Since our PMD doesn't have a prepare function, I missed that but, since
> > > > > > rte_security_set_pkt_metadata() is only used for Inline Crypto/Protocol via
> > > > > > a rte_eth_dev, and both rte_security_set_pkt_metadata() and rte_eth_tx_prepare()
> > > > > > are callbacks from same PMD, do you see any issue ?
> > > > > >
> > > > > > The restriction is from user side, data is not supposed to be modified unless
> > > > > > rte_security_set_pkt_metadata() is called again.
> > > > >
> > > > > Yep, I do have a concern here.
> > > > > Right now it is perfectly valid to do something like that:
> > > > > rte_security_set_pkt_metadata(..., mb, ...);
> > > > > /* can modify contents of the packet */
> > > > > rte_eth_tx_prepare(..., &mb, 1);
> > > > > rte_eth_tx_burst(..., &mb, 1);
> > > > >
> > > > > With the new restrictions you are proposing it wouldn't be allowed any more.
> > > > You can still modify L2 header and IPSEC is only concerned about L3 and above.
> > > >
> > > > I think insisting that rte_security_set_pkt_metadata() be called after all L3
> > > > and above header modifications is not a problem. I guess the existing ixgbe/txgbe
> > > > PMDs, which are the only ones implementing the callback, already expect the
> > > > same?
> > >
> > > AFAIK, no there are no such requirements for ixgbe or txgbe.
> > > All that ixgbe callback does - store session related data inside mbuf.
> > > It's only expectation to have ESP trailer at the proper place (after ICV):
> >
> > This implies rte_security_set_pkt_metadata() cannot be called when the mbuf doesn't
> > have the ESP trailer updated or when mbuf->pkt_len = 0
> >
> > >
> > > union ixgbe_crypto_tx_desc_md *mdata = (union ixgbe_crypto_tx_desc_md *)
> > >                                 rte_security_dynfield(m);
> > >   mdata->enc = 1;
> > >   mdata->sa_idx = ic_session->sa_index;
> > >   mdata->pad_len = ixgbe_crypto_compute_pad_len(m);
> > >
> > > Then this data will be used by tx_burst() function.
> > So it implies that after the above rte_security_set_pkt_metadata() call, and before tx_burst(),
> > mbuf data / packet len cannot be modified, right? If modified, tx_burst()
> > will be using an incorrect pad len?
> 
> No, pkt_len can be modified.
> Though ESP trailer pad_len can't.
> 
> >
> > This patch is also trying to add similar restriction on when
> > rte_security_set_pkt_metadata() should be called and what cannot be done after
> > calling rte_security_set_pkt_metadata().
> 
> No, I don't think it is really the same.
> Also, IMO, inside the ixgbe set_pkt_metadata() implementation we probably shouldn't silently assume
> that ESP packet is already formed and trailer contains valid data.
> In fact, I think this pad_len calculation can be moved to actual TX function.
> 
> >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > If your question is can't we do the preprocessing in rte_eth_tx_prepare() for
> > > > > > security,
> > > > >
> > > > > Yes, that was my thought.
> > > > >
> > > > > > my only argument was that since there is already a hit in
> > > > > > rte_security_set_pkt_metadata() to PMD specific callback and
> > > > > > > struct rte_security_session is passed as an argument to it, it is more beneficial to
> > > > > > do security related pre-processing there.
> > > > >
> > > > > Yes, it would be extra callback call that way.
> > > > > Though tx_prepare() accepts burst of packets, so the overhead
> > > > > of function call will be spread around the whole burst, and I presume
> > > > > shouldn't be too high.
> > > > >
> > > > > > Also rte_eth_tx_prepare() if implemented will be called for both security and
> > > > > > non-security pkts.
> > > > >
> > > > > Yes, but tx_prepare() can distinguish (by ol_flags and/or other field contents) which
> > > > > modifications are required for the packet.
> > > >
> > > > But the major issues I see are
> > > >
> > > > 1. tx_prepare() doesn't take rte_security_session as argument though ol_flags has security flag.
> > > >    In our case, we need to know the security session details to do things.
> > >
> > > I suppose you can store pointer to session (or so) inside mbuf in rte_security_dynfield, no?
> >
> > We can do. But having to call PMD specific function call via rte_security_set_pkt_metadata()
> > just for storing session pointer in rte_security_dynfield consumes unnecessary
> > cycles per pkt.
> 
> In fact there are two function calls: one for rte_security_set_pkt_metadata(),
> and a second for the instance->ops->set_pkt_metadata() callback.
> That is, of course, way too expensive for such a simple operation.
> Actually, the same thought applies to rte_security_get_userdata().
> Both of these functions belong to the data path and ideally have to be as fast as possible.
> Probably 21.11 is the right timeframe for that.
> 
> > >
> > > > 2. AFAIU tx_prepare() is not mandatory as per spec and even by default disabled under compile time
> > > >    macro RTE_ETHDEV_TX_PREPARE_NOOP.
> > > > 3. Even if we do tx_prepare(), rte_security_set_pkt_metadata() is mandatory to associate
> > > >    struct rte_security_session to a pkt as unlike ol_flags, there is no direct space to do the same.
> > >
> > > Didn't get you here, obviously we do have rte_security_dynfield inside mbuf,
> > > specifically for that - to store security-related data inside the mbuf.
> > > Yes your PMD has to request it at initialization time, but I suppose it is not a big deal.
> > >
> > > > So I think instead of enforcing yet another callback tx_prepare() for inline security
> > > > processing, it can be done via security specific set_pkt_metadata().
> > >
> > > But what you are proposing introduces new limitations and might break existing functionality.
> > > BTW, if you don't like to use tx_prepare() - why doing these calculations inside tx_burst()
> > > itself is not an option?
> >
> > We can do things in tx_burst() but if we are doing it there, then we want to avoid having callback for
> > rte_security_set_pkt_metadata().
> >
> > Are you fine if we update the spec to say: "When DEV_TX_OFFLOAD_SEC_NEED_MDATA is not
> > set, the user needs to store struct rte_security_session's sess_private_data in
> > rte_security_dynfield", like below?
> >
> > <snip>
> >
> > static inline void
> > inline_outb_mbuf_prepare(const struct rte_ipsec_session *ss,
> >         struct rte_mbuf *mb[], uint16_t num)
> > {
> >         uint32_t i, ol_flags;
> >
> >         ol_flags = ss->security.ol_flags & RTE_SECURITY_TX_OLOAD_NEED_MDATA;
> >         for (i = 0; i != num; i++) {
> >
> >                 mb[i]->ol_flags |= PKT_TX_SEC_OFFLOAD;
> >
> >                 if (ol_flags != 0)
> >                         rte_security_set_pkt_metadata(ss->security.ctx,
> >                                 ss->security.ses, mb[i], NULL);
> >                 else
> >                         *rte_security_dynfield(mb[i]) =
> >                                 (uint64_t)ss->security.ses->sess_private_data;
> >
> >
> > If the above can be done, then in our PMD, we will not have a callback for
> > set_pkt_metadata() and DEV_TX_OFFLOAD_SEC_NEED_MDATA will also be not set
> > in capabilities.
> 
> That's an interesting idea, but what you propose is the change in current rte_security API behaviour.
> So all existing apps that use this API will have to be changed.
> We'd better avoid such changes unless there is really good reason for that.
> So, I'd suggest to tweak your idea a bit:
> 
> 1) change rte_security_set_pkt_metadata():
> if ops->set_pkt_metadata != NULL, then call it (existing behaviour)
> otherwise just: *rte_security_dynfield(m) = sess->sess_private_data;
> (fast-path)
> 
> 2) consider to make rte_security_set_pkt_metadata() inline function.
> We probably can have some special flag inside struct rte_security_ctx,
> or even store inside ctx a pointer to set_pkt_metadata() itself.

On second thought, some new flags might be better.
Then later, if we realize that set_pkt_metadata() and get_userdata()
are not really used by PMDs, it might be easier to deprecate these callbacks.
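
A minimal sketch of what such a flag-based variant could look like, for
illustration only. Everything prefixed mock_/MOCK_ is a hypothetical stand-in
and not a DPDK definition: the real types are struct rte_mbuf,
struct rte_security_session and struct rte_security_ctx, and the flag name
is an assumption, not an existing macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the DPDK types; NOT the real definitions. */
struct mock_mbuf {
	uint64_t sec_dynfield;     /* plays the role of rte_security_dynfield() */
	int pmd_preprocessed;      /* set only by the PMD callback below */
};

struct mock_session {
	void *sess_private_data;
};

/* Hypothetical flag: set by a PMD that really needs its callback invoked. */
#define MOCK_SEC_CTX_F_NEED_MDATA (1u << 0)

struct mock_ctx {
	void *device;
	uint32_t flags;
	int (*set_pkt_metadata)(void *device, struct mock_session *sess,
				struct mock_mbuf *m, void *params);
};

/* Hypothetical PMD callback that does extra per-packet pre-processing. */
static int
mock_pmd_set_pkt_metadata(void *device, struct mock_session *sess,
			  struct mock_mbuf *m, void *params)
{
	(void)device;
	(void)params;
	m->sec_dynfield = (uint64_t)(uintptr_t)sess->sess_private_data;
	m->pmd_preprocessed = 1;
	return 0;
}

static inline int
mock_set_pkt_metadata(struct mock_ctx *ctx, struct mock_session *sess,
		      struct mock_mbuf *m, void *params)
{
	assert(ctx != NULL && sess != NULL && m != NULL);

	/* fast path: flag not set, just store the session private data */
	if ((ctx->flags & MOCK_SEC_CTX_F_NEED_MDATA) == 0) {
		m->sec_dynfield = (uint64_t)(uintptr_t)sess->sess_private_data;
		return 0;
	}
	/* slow path: the PMD asked for its callback */
	return ctx->set_pkt_metadata(ctx->device, sess, m, params);
}
```

With a flag, the inline function only tests a bit on the fast path, and the
callback pointer can stay inside the ops structure, which would presumably
make it easier to deprecate later.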

> 
> As a brief code snippet:
> 
> struct rte_security_ctx {
>         void *device;
>         /**< Crypto/ethernet device attached */
>         const struct rte_security_ops *ops;
>         /**< Pointer to security ops for the device */
>         uint16_t sess_cnt;
>         /**< Number of sessions attached to this context */
> +     int (*set_pkt_mdata)(void *, struct rte_security_session *, struct rte_mbuf *,  void *);
> };
> 
> static inline int
> rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
>                               struct rte_security_session *sess,
>                               struct rte_mbuf *m, void *params)
> {
>         /* fast-path: no PMD callback, just store the session private data */
>         if (instance->set_pkt_mdata == NULL) {
>                 *rte_security_dynfield(m) = (rte_security_dynfield_t)(sess->sess_private_data);
>                 return 0;
>         }
>         /* slow path: call the PMD-specific callback */
>         return instance->set_pkt_mdata(instance->device, sess, m, params);
> }
> 
> That probably would be an ABI breakage (new field in rte_security_ctx) and would require
> some trivial changes for all existing PMDs that use RTE_SECURITY_TX_OLOAD_NEED_MDATA
> (ctx_create()), but hopefully will benefit everyone.
> 
> >
> > >
> > > > I'm fine to
> > > > introduce a burst call for the same(I was thinking to propose it in future) to
> > > > compensate for the overhead.
> > > >
> > > > If rte_security_set_pkt_metadata() was not a PMD specific function ptr call and
> > > > rte_mbuf had space for struct rte_security_session pointer,
> > >
> > > But it does, see above.
> > > In fact it even more flexible - because it is driver specific, you are not limited to one 64-bit field.
> > > If your PMD requires more data to be associated with mbuf
> > > - you can request it via mbuf_dynfield and store there whatever is needed.
> > >
> > > > then I guess it would have been better to do it the way you proposed.
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > This patch is trying to enforce semantics as above so that
> > > > > > > > > > > rte_security_set_pkt_metadata() can predict what is in the pkt when it is
> > > > > > > > > > > called.
> > > > > > > > > >
> > > > > > > > > > I also think above sequence is what Linux kernel stack or other stacks follow.
> > > > > > > > > > > Does it make sense?
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > Once called,
> > > > > > > > > > > > +Layer 3 and above data cannot be modified or moved around unless
> > > > > > > > > > > > +``rte_security_set_pkt_metadata()`` is called again.
> > > > > > > > > > > >
> > > > > > > > > > > >  For inline protocol offloaded ingress traffic, the application can register a
> > > > > > > > > > > >  pointer, ``userdata`` , in the security session. When the packet is received,
> > > > > > > > > > > > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > index bb38d7f58..9d8e3ddc8 100644
> > > > > > > > > > > > --- a/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > +++ b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > @@ -228,6 +228,8 @@ extern "C" {
> > > > > > > > > > > >
> > > > > > > > > > > >  /**
> > > > > > > > > > > >   * Request security offload processing on the TX packet.
> > > > > > > > > > > > + * To use Tx security offload, the user needs to fill l2_len in mbuf
> > > > > > > > > > > > + * indicating L2 header size and where L3 header starts.
> > > > > > > > > > > >   */
> > > > > > > > > > > >  #define PKT_TX_SEC_OFFLOAD	(1ULL << 43)
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > 2.25.1
> > > > > > > > > > >


* [dpdk-dev] [PATCH 00/10] new features for ipsec and security libraries
@ 2021-07-13 13:35  3% Radu Nicolau
    0 siblings, 1 reply; 200+ results
From: Radu Nicolau @ 2021-07-13 13:35 UTC (permalink / raw)
  Cc: dev, Radu Nicolau, Declan Doherty, Abhijit Sinha, Daniel Martin Buckley

Add support for:
TSO, NAT-T/UDP encapsulation, ESN
AES_CCM, CHACHA20_POLY1305 and AES_GMAC
SA telemetry
mbuf offload flags
Initial SQN value

This patchset introduces ABI breakages and it is intended for 21.11 release

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com>
Signed-off-by: Daniel Martin Buckley <daniel.m.buckley@intel.com>

Radu Nicolau (10):
  security: add support for TSO on IPsec session
  security: add UDP params for IPsec NAT-T
  security: add ESN field to ipsec_xform
  mbuf: add IPsec ESP tunnel type
  ipsec: add support for AEAD algorithms
  ipsec: add transmit segmentation offload support
  ipsec: add support for NAT-T
  ipsec: add support for SA telemetry
  ipsec: add support for initial SQN value
  ipsec: add ol_flags support

 lib/ipsec/crypto.h          | 137 ++++++++++++
 lib/ipsec/esp_inb.c         |  88 +++++++-
 lib/ipsec/esp_outb.c        | 262 +++++++++++++++++++----
 lib/ipsec/iph.h             |  23 +-
 lib/ipsec/meson.build       |   2 +-
 lib/ipsec/rte_ipsec.h       |  11 +
 lib/ipsec/rte_ipsec_sa.h    |  11 +-
 lib/ipsec/sa.c              | 406 ++++++++++++++++++++++++++++++++++--
 lib/ipsec/sa.h              |  43 ++++
 lib/ipsec/version.map       |   8 +
 lib/mbuf/rte_mbuf_core.h    |   1 +
 lib/security/rte_security.h |  31 +++
 12 files changed, 950 insertions(+), 73 deletions(-)

-- 
2.25.1



* Re: [dpdk-dev] [PATCH v3] dmadev: introduce DMA device library
  2021-07-13 13:06  3%   ` fengchengwen
@ 2021-07-13 13:37  0%     ` Bruce Richardson
  2021-07-15  6:44  0%       ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2021-07-13 13:37 UTC (permalink / raw)
  To: fengchengwen
  Cc: thomas, ferruh.yigit, jerinj, jerinjacobk, andrew.rybchenko, dev,
	mb, nipun.gupta, hemant.agrawal, maxime.coquelin,
	honnappa.nagarahalli, david.marchand, sburla, pkapoor,
	konstantin.ananyev

On Tue, Jul 13, 2021 at 09:06:39PM +0800, fengchengwen wrote:
> Thank you for your valuable comments, and I think we've taken a big step forward.
> 
> @andrew Could you provide the copyright line so that I can add it to the relevant file?
> 
> @bruce, jerin  Replies to the review comments that I did not adopt are below:

Thanks. Some further comments inline below. Most points you make I'm ok
with, but I do disagree on a number of others.

/Bruce

> 
> 1.
> COMMENT: We allow up to 100 characters per line for DPDK code, so these don't need
> to be wrapped so aggressively.
> 
> REPLY: Our CI still has an 80-character limit, and from my review most of the framework still complies with it.
> 
Ok.

> 2.
> COMMENT: > +#define RTE_DMA_MEM_TO_MEM     (1ull << 0)
> RTE_DMA_DIRECTION_...
> 
> REPLY: adding 'DIRECTION' would make the macro too long; I prefer to keep it simple.
> 
DIRECTION could be shortened to DIR, but I think this is probably ok as is
too.

> 3.
> COMMENT: > +rte_dmadev_vchan_release(uint16_t dev_id, uint16_t vchan);
> We are not making release a public API in other device classes. See the ethdev spec.
> bbdev/eventdev/rawdev
> 
> REPLY: because ethdev's queues are hardware queues, whereas here the channels are software-defined,
> I think release is OK. BTW: bbdev/eventdev also have release ops.
> 
Ok

> 4.
> COMMENT: > +       uint64_t reserved[4]; /**< Reserved for future fields */
> > +};
> Please add the capability for each counter in info structure as one
> device may support all the counters.
> 
> REPLY: This is a statistics function. If it is not supported,
> the driver does not need to implement the stats ops function. It could also set
> the unimplemented counters to zero.
> 
+1
The stats functions should be a minimum set that is supported by all
drivers. Each of these stats can be easily tracked by software if HW
support for it is not available, so I agree that we should not have each
stat as a capability.
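
To illustrate the point, a rough sketch of counters kept purely in software
by a driver whose hardware has none. The counter names and every mock_
identifier are assumptions made for this example; they are not taken from
the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal stats set; names are illustrative only. */
struct mock_dma_stats {
	uint64_t submitted;   /* jobs handed to the device */
	uint64_t completed;   /* jobs that finished successfully */
	uint64_t errors;      /* jobs that finished with an error */
};

/* Hypothetical per-channel driver state holding the software counters. */
struct mock_dma_vchan {
	struct mock_dma_stats stats;
};

/* Called by the driver's enqueue/submit path. */
static inline void
mock_on_submit(struct mock_dma_vchan *vc, uint16_t nb_jobs)
{
	assert(vc != NULL);
	vc->stats.submitted += nb_jobs;
}

/* Called by the driver's completion path. */
static inline void
mock_on_complete(struct mock_dma_vchan *vc, uint16_t nb_ok, uint16_t nb_failed)
{
	assert(vc != NULL);
	vc->stats.completed += nb_ok;
	vc->stats.errors += nb_failed;
}

/* Stats query: just a copy of the software counters. */
static inline void
mock_stats_get(const struct mock_dma_vchan *vc, struct mock_dma_stats *out)
{
	assert(vc != NULL && out != NULL);
	*out = vc->stats;
}
```

The cost is one or two additions per burst on paths the driver already runs,
which is why a per-counter capability flag looks unnecessary.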

> 5.
> COMMENT: > +#endif
> > +       return (*dev->fill)(dev, vchan, pattern, dst, length, flags);
> Instead of every driver setting a NOP function, if the CAPA is not set,
> the common code can install a NOP function that returns a value < 0.
> 
> REPLY: I don't think it's a good idea to check this in the I/O path; it is the application's duty
> not to call an API that the driver does not support (which it can learn from the capabilities).
> 
For datapath functions, +1.

> 6.
> COMMENT: > +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vchan,
> > +                          const uint16_t nb_status, uint32_t *status,
> uint32_t -> enum rte_dma_status_code
> 
> REPLY: I'm still evaluating this. Error code conversion in this API takes the driver a long time.
> Should we provide a separate error code conversion function?
> 
It's not that difficult a conversion to do, and so long as we have the
regular "completed" function which doesn't do all the error manipulation we
should be fine. Performance in the case of errors is not expected to be as
good, since errors should be very rare.

> 7.
> COMMENT: > +typedef int (*dmadev_info_get_t)(struct rte_dmadev *dev,
> > +                                struct rte_dmadev_info *dev_info);
> Please change to rte_dmadev_info_get_t to avoid conflict due to namespace issue
> as this header is exported.
> 
> REPLY: I prefer not to add the 'rte_' prefix; it makes the definition too long.
> 
I disagree on this, they need the rte_ prefix, despite the fact it makes
them longer. If length is a concern, these can be changed from "dmadev_" to
"rte_dma_", which is only one character longer.
In fact, I believe Morten already suggested we use "rte_dma" rather than
"rte_dmadev" as a function prefix across the library.

> 8.
> COMMENT: > + *        - rte_dmadev_completed_fails()
> > + *            - return the number of operation requests failed to complete.
> Please rename this to "completed_status" to allow the return of information
> other than just errors. As I suggested before, I think this should also be
> usable as a slower version of "completed" even in the case where there are
> no errors, in that it returns status information for each and every job
> rather than just returning as soon as it hits a failure.
> 
> REPLY: well, I think it may be confusing (the current OK/FAIL API is easy to understand),
> and we can build the slow-path function on top of the two APIs.
> 
I still disagree on this too. We have a "completed" op where we get
informed of what has completed and minimal error indication, and a
"completed_status" operation which provides status information for each
operation completed, at the cost of speed.
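
A small sketch of the split I have in mind, using a plain array of job
results in place of a real device ring. The mock_ names and signatures are
illustrative assumptions only, not the proposed dmadev prototypes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch. */
enum mock_dma_status { MOCK_DMA_OK = 0, MOCK_DMA_ERR = 1 };

/*
 * Fast path: count jobs completed successfully, stopping at the first
 * failure and reporting only that an error was hit.
 */
static uint16_t
mock_completed(const enum mock_dma_status *jobs, uint16_t nb_jobs,
	       uint16_t max, bool *has_error)
{
	uint16_t n = 0;

	assert(jobs != NULL && has_error != NULL);
	*has_error = false;
	while (n < nb_jobs && n < max) {
		if (jobs[n] != MOCK_DMA_OK) {
			*has_error = true;
			break;
		}
		n++;
	}
	return n;
}

/*
 * Slow path: return a status for each and every finished job,
 * whether it succeeded or failed, without stopping at errors.
 */
static uint16_t
mock_completed_status(const enum mock_dma_status *jobs, uint16_t nb_jobs,
		      uint16_t max, enum mock_dma_status *status)
{
	uint16_t n = 0;

	assert(jobs != NULL && status != NULL);
	while (n < nb_jobs && n < max) {
		status[n] = jobs[n];
		n++;
	}
	return n;
}
```

An application would normally poll with the first function and fall back to
the second only once an error has been flagged.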

> 9.
> COMMENT: > +#define RTE_DMA_DEV_CAPA_MEM_TO_MEM	(1ull << 0)
> > +/**< DMA device support mem-to-mem transfer.
> Do we need this? Can we assume that any device appearing as a dmadev can
> do mem-to-mem copies, and drop the capability for mem-to-mem and the
> capability for copying?
> also for RTE_DMA_DEV_CAPA_OPS_COPY
> 
> REPLY: yes, I insist on adding this for the sake of conceptual integrity.
> For the ioat driver, it is just a matter of declaring the capability.
> 

Ok. It seems a wasted bit to me, but I don't see us running out of them
soon.

> 10.
> COMMENT: > +	uint16_t nb_vchans; /**< Number of virtual DMA channel configured */
> > +};
> Let's add rte_dmadev_conf struct into this to return the configuration
> settings.
> 
> REPLY: If we add rte_dmadev_conf to it, it may break the ABI when fields are added to rte_dmadev_conf.
> 
Yes, that is true, but I fail to see why that is a major problem. It just
means that if the conf structure changes we have two functions to version
instead of one. The information is still useful.

If you don't want the actual conf structure explicitly put into the info
struct, we can instead put the fields in directly. I really think that the
info_get function should provide back to the user the details of how
the device was previously configured.
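
As a rough illustration of the first option, with the conf structure
embedded in the info structure. All mock_ types and the max_vchans field are
hypothetical and exist only for this sketch; they are not the structures
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical configuration structure. */
struct mock_dma_conf {
	uint16_t max_vchans;
};

/* Hypothetical info structure that also reports the last applied config. */
struct mock_dma_info {
	uint64_t dev_capa;
	uint16_t nb_vchans;
	bool configured;             /* false until configure() has been called */
	struct mock_dma_conf conf;   /* valid only when configured is true */
};

/* Hypothetical device state kept by the library. */
struct mock_dma_dev {
	uint64_t dev_capa;
	uint16_t nb_vchans;
	bool configured;
	struct mock_dma_conf conf;
};

static int
mock_dma_configure(struct mock_dma_dev *dev, const struct mock_dma_conf *conf)
{
	assert(dev != NULL && conf != NULL);
	dev->conf = *conf;           /* remember what the user asked for */
	dev->configured = true;
	return 0;
}

static int
mock_dma_info_get(const struct mock_dma_dev *dev, struct mock_dma_info *info)
{
	assert(dev != NULL && info != NULL);
	memset(info, 0, sizeof(*info));
	info->dev_capa = dev->dev_capa;
	info->nb_vchans = dev->nb_vchans;
	info->configured = dev->configured;
	if (dev->configured)
		info->conf = dev->conf;  /* hand the configuration back */
	return 0;
}
```

Putting the fields directly into the info structure instead would look the
same to the caller, minus the nested struct.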

regards,
/Bruce


* Re: [dpdk-dev] [PATCH v3] dmadev: introduce DMA device library
  @ 2021-07-13 13:06  3%   ` fengchengwen
  2021-07-13 13:37  0%     ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: fengchengwen @ 2021-07-13 13:06 UTC (permalink / raw)
  To: thomas, ferruh.yigit, bruce.richardson, jerinj, jerinjacobk,
	andrew.rybchenko
  Cc: dev, mb, nipun.gupta, hemant.agrawal, maxime.coquelin,
	honnappa.nagarahalli, david.marchand, sburla, pkapoor,
	konstantin.ananyev

Thank you for your valuable comments, and I think we've taken a big step forward.

@andrew Could you provide the copyright line so that I can add it to the relevant file?

@bruce, jerin  Replies to the review comments that I did not adopt are below:

1.
COMMENT: We allow up to 100 characters per line for DPDK code, so these don't need
to be wrapped so aggressively.

REPLY: Our CI still has an 80-character limit, and from my review most of the framework still complies with it.

2.
COMMENT: > +#define RTE_DMA_MEM_TO_MEM     (1ull << 0)
RTE_DMA_DIRECTION_...

REPLY: adding 'DIRECTION' would make the macro too long; I prefer to keep it simple.

3.
COMMENT: > +rte_dmadev_vchan_release(uint16_t dev_id, uint16_t vchan);
We are not making release a public API in other device classes. See the ethdev spec.
bbdev/eventdev/rawdev

REPLY: because ethdev's queues are hardware queues, whereas here the channels are software-defined,
I think release is OK. BTW: bbdev/eventdev also have release ops.

4.
COMMENT:> +       uint64_t reserved[4]; /**< Reserved for future fields */
> +};
Please add the capability for each counter in info structure as one
device may support all
the counters.

REPLY: This is a statistics function. If it is not supported, the driver does not need
to implement the stats ops function. It could also set the unimplemented counters to zero.

5.
COMMENT: > +#endif
> +       return (*dev->fill)(dev, vchan, pattern, dst, length, flags);
Instead of every driver setting a NOP function, if the CAPA is not set,
the common code can install a NOP function that returns a value < 0.

REPLY: I don't think it's a good idea to check this in the I/O path; it is the application's duty
not to call an API that the driver does not support (which it can learn from the capabilities).

6.
COMMENT: > +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vchan,
> +                          const uint16_t nb_status, uint32_t *status,
uint32_t -> enum rte_dma_status_code

REPLY: I'm still evaluating this. Error code conversion in this API takes the driver a long time.
Should we provide a separate error code conversion function?

7.
COMMENT: > +typedef int (*dmadev_info_get_t)(struct rte_dmadev *dev,
> +                                struct rte_dmadev_info *dev_info);
Please change to rte_dmadev_info_get_t to avoid conflict due to namespace issue
as this header is exported.

REPLY: I prefer not to add the 'rte_' prefix; it makes the definition too long.

8.
COMMENT: > + *        - rte_dmadev_completed_fails()
> + *            - return the number of operation requests failed to complete.
Please rename this to "completed_status" to allow the return of information
other than just errors. As I suggested before, I think this should also be
usable as a slower version of "completed" even in the case where there are
no errors, in that it returns status information for each and every job
rather than just returning as soon as it hits a failure.

REPLY: well, I think it may be confusing (the current OK/FAIL API is easy to understand),
and we can build the slow-path function on top of the two APIs.

9.
COMMENT: > +#define RTE_DMA_DEV_CAPA_MEM_TO_MEM	(1ull << 0)
> +/**< DMA device support mem-to-mem transfer.
Do we need this? Can we assume that any device appearing as a dmadev can
do mem-to-mem copies, and drop the capability for mem-to-mem and the
capability for copying?
also for RTE_DMA_DEV_CAPA_OPS_COPY

REPLY: yes, I insist on adding this for the sake of conceptual integrity.
For the ioat driver, it is just a matter of declaring the capability.

10.
COMMENT: > +	uint16_t nb_vchans; /**< Number of virtual DMA channel configured */
> +};
Let's add rte_dmadev_conf struct into this to return the configuration
settings.

REPLY: If we add rte_dmadev_conf to it, it may break the ABI when fields are added to rte_dmadev_conf.


[snip]

On 2021/7/13 20:27, Chengwen Feng wrote:
> This patch introduces 'dmadevice', which is a generic type of DMA
> device.
> 
> The APIs of the dmadev library expose some generic operations which
> enable configuration of and I/O with DMA devices.
> 
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> ---
> v3:
> * rm reset and fill_sg ops.
> * rm MT-safe capabilities.
> * add submit flag.
> * redefine rte_dma_sg to implement asymmetric copy.
> * delete some reserved field for future use.
> * rearrange rte_dmadev/rte_dmadev_data struct.
> * refresh rte_dmadev.h copyright.
> * update vchan setup parameter.
> * modified some inappropriate descriptions.
> * arrange version.map alphabetically.
> * other minor modifications from review comment.
> ---
>  MAINTAINERS                  |   4 +
>  config/rte_config.h          |   3 +
>  lib/dmadev/meson.build       |   7 +
>  lib/dmadev/rte_dmadev.c      | 561 +++++++++++++++++++++++++
>  lib/dmadev/rte_dmadev.h      | 968 +++++++++++++++++++++++++++++++++++++++++++
>  lib/dmadev/rte_dmadev_core.h | 161 +++++++
>  lib/dmadev/rte_dmadev_pmd.h  |  72 ++++
>  lib/dmadev/version.map       |  37 ++
>  lib/meson.build              |   1 +



* Re: [dpdk-dev] [PATCH 1/2] security: enforce semantics for Tx inline processing
  @ 2021-07-13 12:33  3%                     ` Ananyev, Konstantin
  2021-07-13 14:08  0%                       ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2021-07-13 12:33 UTC (permalink / raw)
  To: Nithin Dabilpuram
  Cc: Akhil Goyal, dev, hemant.agrawal, thomas, g.singh, Yigit, Ferruh,
	Zhang, Roy Fan, olivier.matz, jerinj, Doherty, Declan, Nicolau,
	 Radu, jiawenwu, jianwang


Adding more rte_security and PMD maintainers into the loop.

> > > > > > > > > > > For Tx inline processing, when RTE_SECURITY_TX_OLOAD_NEED_MDATA is
> > > > > > > > > > > set, rte_security_set_pkt_metadata() needs to be called for pkts
> > > > > > > > > > > to associate a Security session with a mbuf before submitting
> > > > > > > > > > > to Ethdev Tx. This is apart from setting PKT_TX_SEC_OFFLOAD in
> > > > > > > > > > > mbuf.ol_flags. rte_security_set_pkt_metadata() is also used to
> > > > > > > > > > > set some opaque metadata in mbuf for PMD's use.
> > > > > > > > > > > This patch updates documentation that rte_security_set_pkt_metadata()
> > > > > > > > > > > should be called only with mbuf containing Layer 3 and above data.
> > > > > > > > > > > This behaviour is consistent with existing PMD's such as ixgbe.
> > > > > > > > > > >
> > > > > > > > > > > On Tx, not all net PMDs/HW can parse a packet and identify
> > > > > > > > > > > L2 header and L3 header locations. This is in line with other
> > > > > > > > > > > Tx offloads requirements such as L3 checksum, L4 checksum offload,
> > > > > > > > > > > etc, where mbuf.l2_len, mbuf.l3_len etc, needs to be set for
> > > > > > > > > > > HW to be able to generate checksum. Since Inline IPSec is also
> > > > > > > > > > > such a Tx offload, some PMD's at least need mbuf.l2_len to be
> > > > > > > > > > > valid to find L3 header and perform Outbound IPSec processing.
> > > > > > > > > > > Hence, this patch updates documentation to enforce setting
> > > > > > > > > > > mbuf.l2_len while setting PKT_TX_SEC_OFFLOAD in mbuf.ol_flags
> > > > > > > > > > > for Inline IPSec Crypto / Protocol offload processing to
> > > > > > > > > > > work on Tx.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
> > > > > > > > > > > Reviewed-by: Akhil Goyal <gakhil@marvell.com>
> > > > > > > > > > > ---
> > > > > > > > > > >  doc/guides/nics/features.rst           | 2 ++
> > > > > > > > > > >  doc/guides/prog_guide/rte_security.rst | 6 +++++-
> > > > > > > > > > >  lib/mbuf/rte_mbuf_core.h               | 2 ++
> > > > > > > > > > >  3 files changed, 9 insertions(+), 1 deletion(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> > > > > > > > > > > index 403c2b03a..414baf14f 100644
> > > > > > > > > > > --- a/doc/guides/nics/features.rst
> > > > > > > > > > > +++ b/doc/guides/nics/features.rst
> > > > > > > > > > > @@ -430,6 +430,7 @@ of protocol operations. See Security library and PMD documentation for more deta
> > > > > > > > > > >
> > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``capabilities_get``.
> > > > > > > > > > >  * **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > @@ -451,6 +452,7 @@ protocol operations. See security library and PMD documentation for more details
> > > > > > > > > > >
> > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``get_userdata``,
> > > > > > > > > > >    ``capabilities_get``.
> > > > > > > > > > > diff --git a/doc/guides/prog_guide/rte_security.rst b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > index f72bc8a78..7b68c698d 100644
> > > > > > > > > > > --- a/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > +++ b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > @@ -560,7 +560,11 @@ created by the application is attached to the security session by the API
> > > > > > > > > > >
> > > > > > > > > > >  For Inline Crypto and Inline protocol offload, device specific defined metadata is
> > > > > > > > > > >  updated in the mbuf using ``rte_security_set_pkt_metadata()`` if
> > > > > > > > > > > -``DEV_TX_OFFLOAD_SEC_NEED_MDATA`` is set.
> > > > > > > > > > > +``RTE_SECURITY_TX_OLOAD_NEED_MDATA`` is set. ``rte_security_set_pkt_metadata()``
> > > > > > > > > > > +should be called on mbuf only with Layer 3 and above data present and
> > > > > > > > > > > +``mbuf.data_off`` should be pointing to Layer 3 Header.
> > > > > > > > > >
> > > > > > > > > > Hmm... not sure why mbuf.data_off should point to L3 hdr.
> > > > > > > > > > Who will add L2 hdr to the packet in that case?
> > > > > > > > > > Or did you mean ``mbuf.data_off + mbuf.l2_len`` here?
> > > > > > > > >
> > > > > > > > > Those are the semantics I was trying to define. I think the sequence of
> > > > > > > > > operations to be done for IPsec processing is as follows:
> > > > > > > > >
> > > > > > > > > 1. receive_pkt()
> > > > > > > > > 2. strip_l2_hdr()
> > > > > > > > > 3. Do policy lookup ()
> > > > > > > > > 4. Call rte_security_set_pkt_metadata() if pkt needs to be encrypted with a
> > > > > > > > > particular SA. Now pkt only has L3 and above data.
> > > > > > > > > 5. Do route_lookup()
> > > > > > > > > 6. add_l2hdr() which might be different from stripped l2hdr.
> > > > > > > > > 7. Send packet out.
> > > > > > > > >
> > > > > > > > > The above sequence is what I believe the current poll mode worker thread in
> > > > > > > > > ipsec-secgw is following.
> > > > > > > >
> > > > > > > > That's just a sample app, it doesn't mean it has to be the only possible way.
> > > > > > > >
> > > > > > > > > While in event mode, step 2 and step 6 are missing.
> > > > > > > >
> > > > > > > > I think this L2 hdr manipulation is totally optional.
> > > > > > > > If your rte_security_set_pkt_metadata() implementation really needs to know L3 hdr offset (not sure why?),
> > > > > > > Since rte_security_set_pkt_metadata() is PMD specific function ptr call, we are currently doing some pre-processing
> > > > > > > here before submitting packet to inline IPSec via rte_eth_tx_burst(). This saves us cycles later in rte_eth_tx_burst().
> > > > > > > If we cannot know for sure, the pkt content at the time of rte_security_set_pkt_metadata() call, then I think
> > > > > > > having a PMD specific callback is not much of use except for saving SA priv data to rte_mbuf.
> > > > > > >
> > > > > > > > then I suppose we can add a requirement that l2_len has to be set properly before calling rte_security_set_pkt_metadata().
> > > > > > >
> > > > > > > This is also fine with us.
> > > > > >
> > > > > > Ok, so to make sure we are on the same page, you propose:
> > > > > > 1. before calling rte_security_set_pkt_metadata() mbuf.l2_len should be properly set.
> > > > > > 2. after rte_security_set_pkt_metadata() and before rte_eth_tx_burst() packet contents
> > > > > >     at [mbuf.l2_len, mbuf.pkt_len) can't be modified?
> > > > > Yes.
> > > > >
> > > > > >
> > > > > > Is that correct understanding?
> > > > > > If yes, I wonder how 2) will correlate with rte_eth_tx_prepare() concept?
> > > > >
> > > > > Since our PMD doesn't have a prepare function, I missed that but, since
> > > > > rte_security_set_pkt_metadata() is only used for Inline Crypto/Protocol via
> > > > > a rte_eth_dev, and both rte_security_set_pkt_metadata() and rte_eth_tx_prepare()
> > > > > are callbacks from same PMD, do you see any issue ?
> > > > >
> > > > > The restriction is from user side, data is not supposed to be modified unless
> > > > > rte_security_set_pkt_metadata() is called again.
> > > >
> > > > Yep, I do have a concern here.
> > > > Right now it is perfectly valid to do something like that:
> > > > rte_security_set_pkt_metadata(..., mb, ...);
> > > > /* can modify contents of the packet */
> > > > rte_eth_tx_prepare(..., &mb, 1);
> > > > rte_eth_tx_burst(..., &mb, 1);
> > > >
> > > > With the new restrictions you are proposing it wouldn't be allowed any more.
> > > You can still modify L2 header and IPSEC is only concerned about L3 and above.
> > >
> > > I think insisting that rte_security_set_pkt_metadata() be called after all L3
> > > and above header modifications is not a problem. I guess the existing ixgbe/txgbe
> > > PMDs, which are the only ones implementing the callback, are already expecting the
> > > same?
> >
> > AFAIK, no, there are no such requirements for ixgbe or txgbe.
> > All that the ixgbe callback does is store session related data inside the mbuf.
> > Its only expectation is to have the ESP trailer at the proper place (after ICV):
> 
> This implies rte_security_set_pkt_metadata() cannot be called when the mbuf doesn't
> have the ESP trailer updated or when mbuf->pkt_len = 0
> 
> >
> > union ixgbe_crypto_tx_desc_md *mdata = (union ixgbe_crypto_tx_desc_md *)
> >                                 rte_security_dynfield(m);
> >   mdata->enc = 1;
> >   mdata->sa_idx = ic_session->sa_index;
> >   mdata->pad_len = ixgbe_crypto_compute_pad_len(m);
> >
> > Then this data will be used by tx_burst() function.
> So it implies that after the above rte_security_set_pkt_metadata() call, and before tx_burst(),
> mbuf data / packet len cannot be modified, right? If they are modified, then tx_burst()
> will be using an incorrect pad len?

No, pkt_len can be modified.
Though ESP trailer pad_len can't.

> 
> This patch is also trying to add similar restriction on when
> rte_security_set_pkt_metadata() should be called and what cannot be done after
> calling rte_security_set_pkt_metadata().

No, I don't think it is really the same.
Also, IMO, inside the ixgbe set_pkt_metadata() implementation we probably shouldn't silently assume
that the ESP packet is already formed and the trailer contains valid data.
In fact, I think this pad_len calculation can be moved to the actual TX function.

> 
> >
> > >
> > > >
> > > > >
> > > > > If your question is can't we do the preprocessing in rte_eth_tx_prepare() for
> > > > > security,
> > > >
> > > > Yes, that was my thought.
> > > >
> > > > > my only argument was that since there is already a hit in
> > > > > rte_security_set_pkt_metadata() to PMD specific callback and
> > > > > struct rte_security_session is passed as an argument to it, it is more beneficial to
> > > > > do security related pre-processing there.
> > > >
> > > > Yes, it would be extra callback call that way.
> > > > Though tx_prepare() accepts burst of packets, so the overhead
> > > > of function call will be spread around the whole burst, and I presume
> > > > shouldn't be too high.
> > > >
> > > > > Also rte_eth_tx_prepare() if implemented will be called for both security and
> > > > > non-security pkts.
> > > >
> > > > Yes, but tx_prepare() can distinguish (by ol_flags and/or other field contents) which
> > > > modifications are required for the packet.
> > >
> > > But the major issues I see are
> > >
> > > 1. tx_prepare() doesn't take rte_security_session as argument though ol_flags has security flag.
> > >    In our case, we need to know the security session details to do things.
> >
> > I suppose you can store pointer to session (or so) inside mbuf in rte_security_dynfield, no?
> 
> We can do. But having to call PMD specific function call via rte_security_set_pkt_metadata()
> just for storing session pointer in rte_security_dynfield consumes unnecessary
> cycles per pkt.

In fact there are two function calls: one for rte_security_set_pkt_metadata(),
and a second for the instance->ops->set_pkt_metadata() callback.
Which is of course way too expensive for such a simple operation.
Actually, the same thought applies to rte_security_get_userdata().
Both of these functions belong to the data path and ideally have to be as fast as possible.
Probably 21.11 is the right timeframe for that.
 
> >
> > > 2. AFAIU tx_prepare() is not mandatory as per spec and even by default disabled under compile time
> > >    macro RTE_ETHDEV_TX_PREPARE_NOOP.
> > > 3. Even if we do tx_prepare(), rte_security_set_pkt_mdata() is mandatory to associate
> > >    struct rte_security_session to a pkt as unlike ol_flags, there is no direct space to do the same.
> >
> > Didn't get you here, obviously we do have rte_security_dynfield inside mbuf,
> > specially for that - to store security related data inside the mbuf.
> > Yes your PMD has to request it at initialization time, but I suppose it is not a big deal.
> >
> > > So I think instead of enforcing yet another callback tx_prepare() for inline security
> > > processing, it can be done via security specific set_pkt_metadata().
> >
> > But what you are proposing introduces new limitations and might break existing functionality.
> > BTW, if you don't like to use tx_prepare() - why is doing these calculations inside tx_burst()
> > itself not an option?
> 
> We can do things in tx_burst() but if we are doing it there, then we want to avoid having callback for
> rte_security_set_pkt_metadata().
> 
> Are you fine if we update the spec to say that when DEV_TX_OFFLOAD_SEC_NEED_MDATA is not
> set, the user needs to store struct rte_security_session's sess_private_data in
> rte_security_dynfield like below?
> 
> <snip>
> 
> static inline void
> inline_outb_mbuf_prepare(const struct rte_ipsec_session *ss,
>         struct rte_mbuf *mb[], uint16_t num)
> {
>         uint32_t i, ol_flags;
> 
>         ol_flags = ss->security.ol_flags & RTE_SECURITY_TX_OLOAD_NEED_MDATA;
>         for (i = 0; i != num; i++) {
> 
>                 mb[i]->ol_flags |= PKT_TX_SEC_OFFLOAD;
> 
>                 if (ol_flags != 0)
>                         rte_security_set_pkt_metadata(ss->security.ctx,
>                                 ss->security.ses, mb[i], NULL);
>                 else
>                         *rte_security_dynfield(mb[i]) =
>                                 (uint64_t)ss->security.ses->sess_private_data;
> 
> 
> If the above can be done, then in our PMD, we will not have a callback for
> set_pkt_metadata() and DEV_TX_OFFLOAD_SEC_NEED_MDATA will also be not set
> in capabilities.

That's an interesting idea, but what you propose is a change in the current rte_security API behaviour.
So all existing apps that use this API will have to be changed.
We'd better avoid such changes unless there is a really good reason for that.
So, I'd suggest tweaking your idea a bit:

1) change rte_security_set_pkt_metadata():
if ops->set_pkt_metadata != NULL, then call it (existing behaviour)
otherwise just: *rte_security_dynfield(m) = sess->sess_private_data;
(fast-path)

2) consider making rte_security_set_pkt_metadata() an inline function.
We could probably have some special flag inside struct rte_security_ctx,
or even store inside ctx a pointer to set_pkt_metadata() itself.

As a brief code snippet:

struct rte_security_ctx {
        void *device;
        /**< Crypto/ethernet device attached */
        const struct rte_security_ops *ops;
        /**< Pointer to security ops for the device */
        uint16_t sess_cnt;
        /**< Number of sessions attached to this context */
+     int (*set_pkt_mdata)(void *, struct rte_security_session *, struct rte_mbuf *,  void *);   
}; 

static inline int
rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
                              struct rte_security_session *sess,
                              struct rte_mbuf *m, void *params)
{
      /* fast-path */
      if (instance->set_pkt_mdata == NULL) {
             *rte_security_dynfield(m) = (rte_security_dynfield_t)(sess->sess_private_data);
             return 0;
      }
      /* slow path */
      return instance->set_pkt_mdata(instance->device, sess, m, params);
}

That would probably be an ABI breakage (new field in rte_security_ctx) and would require
some trivial changes for all existing PMDs that use RTE_SECURITY_TX_OLOAD_NEED_MDATA
(ctx_create()), but hopefully will benefit everyone.
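
For illustration only, the dispatch above can be modelled outside DPDK with
stand-in types. Everything in this sketch (struct mock_mbuf, struct
mock_session, struct mock_ctx, mock_set_pkt_metadata(), mock_pmd_cb()) is
hypothetical and is not the rte_security API. It only shows the NULL-callback
fast path storing the session private pointer in a 64-bit per-packet field,
and the non-NULL case falling through to the PMD callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for rte_mbuf, rte_security_session, rte_security_ctx. */
struct mock_mbuf { uint64_t sec_dynfield; };
struct mock_session { void *sess_private_data; };
struct mock_ctx {
	void *device;
	/* NULL means the PMD needs no per-packet preprocessing. */
	int (*set_pkt_mdata)(void *dev, struct mock_session *s,
			     struct mock_mbuf *m, void *params);
};

static inline int
mock_set_pkt_metadata(struct mock_ctx *ctx, struct mock_session *sess,
		      struct mock_mbuf *m, void *params)
{
	/* fast path: only record the session private data in the mbuf */
	if (ctx->set_pkt_mdata == NULL) {
		m->sec_dynfield = (uint64_t)(uintptr_t)sess->sess_private_data;
		return 0;
	}
	/* slow path: let the PMD do its own preprocessing */
	return ctx->set_pkt_mdata(ctx->device, sess, m, params);
}

/* Example PMD callback: counts its invocations and stores a marker value. */
static int mock_cb_calls;

static int
mock_pmd_cb(void *dev, struct mock_session *s, struct mock_mbuf *m,
	    void *params)
{
	(void)dev; (void)s; (void)params;
	mock_cb_calls++;
	m->sec_dynfield = 0xABCD;
	return 0;
}
```

A PMD that registers no callback would then cost the application one inlined
compare and one store per packet, which is the saving being discussed.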

> 
> >
> > > I'm fine to
> > > introduce a burst call for the same (I was thinking to propose it in future) to
> > > compensate for the overhead.
> > >
> > > If rte_security_set_pkt_metadata() was not a PMD specific function ptr call and
> > > rte_mbuf had space for struct rte_security_session pointer,
> >
> > But it does, see above.
> > In fact it is even more flexible - because it is driver specific, you are not limited to one 64-bit field.
> > If your PMD requires more data to be associated with mbuf
> > - you can request it via mbuf_dynfield and store there whatever is needed.
> >
> > > then I guess it would have been better to do the way you proposed.
> > >
> > > >
> > > > >
> > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > This patch is trying to enforce semantics as above so that
> > > > > > > > > > rte_security_set_pkt_metadata() can predict what comes in the pkt when it is
> > > > > > > > > > called.
> > > > > > > > > >
> > > > > > > > > > I also think the above sequence is what the Linux kernel stack or other stacks follow.
> > > > > > > > > > Does it make sense?
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Once called,
> > > > > > > > > > > +Layer 3 and above data cannot be modified or moved around unless
> > > > > > > > > > > +``rte_security_set_pkt_metadata()`` is called again.
> > > > > > > > > > >
> > > > > > > > > > >  For inline protocol offloaded ingress traffic, the application can register a
> > > > > > > > > > >  pointer, ``userdata`` , in the security session. When the packet is received,
> > > > > > > > > > > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > index bb38d7f58..9d8e3ddc8 100644
> > > > > > > > > > > --- a/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > +++ b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > @@ -228,6 +228,8 @@ extern "C" {
> > > > > > > > > > >
> > > > > > > > > > >  /**
> > > > > > > > > > >   * Request security offload processing on the TX packet.
> > > > > > > > > > > + * To use Tx security offload, the user needs to fill l2_len in mbuf
> > > > > > > > > > > + * indicating L2 header size and where L3 header starts.
> > > > > > > > > > >   */
> > > > > > > > > > >  #define PKT_TX_SEC_OFFLOAD	(1ULL << 43)
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > 2.25.1
> > > > > > > > > >

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2] dmadev: introduce DMA device library
  2021-07-12 15:50  3%   ` Bruce Richardson
@ 2021-07-13  9:07  0%     ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2021-07-13  9:07 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Chengwen Feng, Thomas Monjalon, Ferruh Yigit, Jerin Jacob,
	dpdk-dev, Morten Brørup, Nipun Gupta, Hemant Agrawal,
	Maxime Coquelin, Honnappa Nagarahalli, David Marchand,
	Satananda Burla, Prasun Kapoor, Ananyev, Konstantin, liangma

On Mon, Jul 12, 2021 at 9:21 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Sun, Jul 11, 2021 at 05:25:56PM +0800, Chengwen Feng wrote:
> > This patch introduces 'dmadevice', which is a generic type of DMA
> > device.
> >
> > The APIs of the dmadev library expose some generic operations which can
> > enable configuration and I/O with the DMA devices.
> >
> > Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
>
> Hi again,
>
> some further review comments inline.
>
> /Bruce
>
> > ---
> >  MAINTAINERS                  |    4 +
> >  config/rte_config.h          |    3 +
> >  lib/dmadev/meson.build       |    6 +
> >  lib/dmadev/rte_dmadev.c      |  560 +++++++++++++++++++++++
> >  lib/dmadev/rte_dmadev.h      | 1030 ++++++++++++++++++++++++++++++++++++++++++
> >  lib/dmadev/rte_dmadev_core.h |  159 +++++++
> >  lib/dmadev/rte_dmadev_pmd.h  |   72 +++
> >  lib/dmadev/version.map       |   40 ++
> >  lib/meson.build              |    1 +
>
> <snip>
>
> > diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
> > new file mode 100644
> > index 0000000..8779512
> > --- /dev/null
> > +++ b/lib/dmadev/rte_dmadev.h
> > @@ -0,0 +1,1030 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2021 HiSilicon Limited.
> > + * Copyright(c) 2021 Intel Corporation.
> > + * Copyright(c) 2021 Marvell International Ltd.
> > + */
> > +
> > +#ifndef _RTE_DMADEV_H_
> > +#define _RTE_DMADEV_H_
> > +
> > +/**
> > + * @file rte_dmadev.h
> > + *
> > + * RTE DMA (Direct Memory Access) device APIs.
> > + *
> > + * The DMA framework is built on the following model:
> > + *
> > + *     ---------------   ---------------       ---------------
> > + *     | virtual DMA |   | virtual DMA |       | virtual DMA |
> > + *     | channel     |   | channel     |       | channel     |
> > + *     ---------------   ---------------       ---------------
> > + *            |                |                      |
> > + *            ------------------                      |
> > + *                     |                              |
> > + *               ------------                    ------------
> > + *               |  dmadev  |                    |  dmadev  |
> > + *               ------------                    ------------
> > + *                     |                              |
> > + *            ------------------               ------------------
> > + *            | HW-DMA-channel |               | HW-DMA-channel |
> > + *            ------------------               ------------------
> > + *                     |                              |
> > + *                     --------------------------------
> > + *                                     |
> > + *                           ---------------------
> > + *                           | HW-DMA-Controller |
> > + *                           ---------------------
> > + *
> > + * The DMA controller could have multiple HW-DMA-channels (aka. HW-DMA-queues),
> > + * each HW-DMA-channel should be represented by a dmadev.
> > + *
> > + * The dmadev could create multiple virtual DMA channels, each virtual DMA
> > + * channel represents a different transfer context. The DMA operation request
> > + * must be submitted to the virtual DMA channel.
> > + * E.G. Application could create virtual DMA channel 0 for mem-to-mem transfer
> > + *      scenario, and create virtual DMA channel 1 for mem-to-dev transfer
> > + *      scenario.
> > + *
> > + * The dmadev are dynamically allocated by rte_dmadev_pmd_allocate() during the
> > + * PCI/SoC device probing phase performed at EAL initialization time. And could
> > + * be released by rte_dmadev_pmd_release() during the PCI/SoC device removing
> > + * phase.
> > + *
> > + * We use 'uint16_t dev_id' as the device identifier of a dmadev, and
> > + * 'uint16_t vchan' as the virtual DMA channel identifier in one dmadev.
> > + *
> > + * The functions exported by the dmadev API to setup a device designated by its
> > + * device identifier must be invoked in the following order:
> > + *     - rte_dmadev_configure()
> > + *     - rte_dmadev_vchan_setup()
> > + *     - rte_dmadev_start()
> > + *
> > + * Then, the application can invoke dataplane APIs to process jobs.
> > + *
> > + * If the application wants to change the configuration (i.e. call
> > + * rte_dmadev_configure()), it must call rte_dmadev_stop() first to stop the
> > + * device and then do the reconfiguration before calling rte_dmadev_start()
> > + * again. The dataplane APIs should not be invoked when the device is stopped.
> > + *
> > + * Finally, an application can close a dmadev by invoking the
> > + * rte_dmadev_close() function.
> > + *
> > + * The dataplane APIs include two parts:
> > + *   a) The first part is the submission of operation requests:
> > + *        - rte_dmadev_copy()
> > + *        - rte_dmadev_copy_sg() - scatter-gather form of copy
> > + *        - rte_dmadev_fill()
> > + *        - rte_dmadev_fill_sg() - scatter-gather form of fill
> > + *        - rte_dmadev_perform() - issue doorbell to hardware
> > + *      These APIs could work with different virtual DMA channels which have
> > + *      different contexts.
> > + *      The first four APIs are used to submit the operation request to the
> > + *      virtual DMA channel, if the submission is successful, a uint16_t
> > + *      ring_idx is returned, otherwise a negative number is returned.
> > + *   b) The second part is to obtain the result of requests:
> > + *        - rte_dmadev_completed()
> > + *            - return the number of operation requests completed successfully.
> > + *        - rte_dmadev_completed_fails()
> > + *            - return the number of operation requests failed to complete.
>
> Please rename this to "completed_status" to allow the return of information
> other than just errors. As I suggested before, I think this should also be
> usable as a slower version of "completed" even in the case where there are
> no errors, in that it returns status information for each and every job
> rather than just returning as soon as it hits a failure.
>
> > + *
> > + * About the ring_idx which rte_dmadev_copy/copy_sg/fill/fill_sg() returned,
> > + * the rules are as follows:
> > + *   a) ring_idx for each virtual DMA channel are independent.
> > + *   b) For a virtual DMA channel, the ring_idx is monotonically incremented,
> > + *      when it reach UINT16_MAX, it wraps back to zero.
>
> Based on other feedback, I suggest we put in the detail here that: "This
> index can be used by applications to track per-job metadata in an
> application-defined circular ring, where the ring is a power-of-2 size, and
> the indexes are masked appropriately."
>
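
A minimal sketch of that usage, assuming the application keeps a power-of-2
sized array of per-job metadata and that the enqueue call returns a uint16_t
ring_idx as described in the patch. JOB_RING_SIZE, struct job_meta, job_slot()
and jobs_in_flight() are made-up names for illustration, not part of the
proposed dmadev API.

```c
#include <assert.h>
#include <stdint.h>

#define JOB_RING_SIZE 1024u                 /* must be a power of 2 */
#define JOB_RING_MASK (JOB_RING_SIZE - 1u)

/* Application-defined per-job metadata (hypothetical). */
struct job_meta { void *user_ctx; };
static struct job_meta job_ring[JOB_RING_SIZE];

/* Map a ring_idx returned by an enqueue call to an application slot. */
static inline struct job_meta *
job_slot(uint16_t ring_idx)
{
	return &job_ring[ring_idx & JOB_RING_MASK];
}

/*
 * Number of jobs enqueued but not yet completed, given the next index
 * to be handed out and the oldest outstanding index. The uint16_t cast
 * keeps the result correct across the wrap from UINT16_MAX to zero.
 */
static inline uint16_t
jobs_in_flight(uint16_t next_idx, uint16_t oldest_idx)
{
	return (uint16_t)(next_idx - oldest_idx);
}
```

Because 65536 is a multiple of every power-of-2 ring size up to 65536, the
masked slot sequence stays contiguous when ring_idx wraps back to zero.
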
> > + *   c) The initial ring_idx of a virtual DMA channel is zero, after the device
> > + *      is stopped or reset, the ring_idx needs to be reset to zero.
> > + *   Example:
> > + *      step-1: start one dmadev
> > + *      step-2: enqueue a copy operation, the ring_idx return is 0
> > + *      step-3: enqueue a copy operation again, the ring_idx return is 1
> > + *      ...
> > + *      step-101: stop the dmadev
> > + *      step-102: start the dmadev
> > + *      step-103: enqueue a copy operation, the ring_idx return is 0
> > + *      ...
> > + *      step-x+0: enqueue a fill operation, the ring_idx return is 65535
> > + *      step-x+1: enqueue a copy operation, the ring_idx return is 0
> > + *      ...
> > + *
> > + * By default, all the non-dataplane functions of the dmadev API exported by a
> > + * PMD are lock-free functions which are assumed not to be invoked in parallel
> > + * on different logical cores to work on the same target object.
> > + *
> > + * The dataplane functions of the dmadev API exported by a PMD can be MT-safe
> > + * only when supported by the driver. Generally, the driver will report two
> > + * capabilities:
> > + *   a) Whether to support MT-safe for the submit/completion API of the same
> > + *      virtual DMA channel.
> > + *      E.G. one thread does the submit operation, another thread does the
> > + *           completion operation.
> > + *      If the driver supports it, then declare RTE_DMA_DEV_CAPA_MT_VCHAN.
> > + *      If the driver doesn't support it, it's up to the application to
> > + *      guarantee MT-safety.
> > + *   b) Whether to support MT-safe for different virtual DMA channels.
> > + *      E.G. one thread does operations on virtual DMA channel 0, another
> > + *           thread does operations on virtual DMA channel 1.
> > + *      If the driver supports it, then declare RTE_DMA_DEV_CAPA_MT_MULTI_VCHAN.
> > + *      If the driver doesn't support it, it's up to the application to
> > + *      guarantee MT-safety.
> > + *
> > + */
>
> Just to check - do we have hardware that currently supports these
> capabilities? For Intel HW, we will only support one virtual channel per
> device without any MT-safety guarantees, so won't be setting either of
> these flags. If any of these flags are unused in all planned drivers, we
> should drop them from the spec until they prove necessary. Ideally,
> everything in the dmadev definition should be testable, and features unused
> by anyone obviously will be untested.
>
> > +
> > +#include <rte_common.h>
> > +#include <rte_compat.h>
> > +#include <rte_errno.h>
> > +#include <rte_memory.h>
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#define RTE_DMADEV_NAME_MAX_LEN      RTE_DEV_NAME_MAX_LEN
> > +
> > +extern int rte_dmadev_logtype;
> > +
> > +#define RTE_DMADEV_LOG(level, ...) \
> > +     rte_log(RTE_LOG_ ## level, rte_dmadev_logtype, "" __VA_ARGS__)
> > +
> > +/* Macros to check for valid port */
> > +#define RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, retval) do { \
> > +     if (!rte_dmadev_is_valid_dev(dev_id)) { \
> > +             RTE_DMADEV_LOG(ERR, "Invalid dev_id=%u\n", dev_id); \
> > +             return retval; \
> > +     } \
> > +} while (0)
> > +
> > +#define RTE_DMADEV_VALID_DEV_ID_OR_RET(dev_id) do { \
> > +     if (!rte_dmadev_is_valid_dev(dev_id)) { \
> > +             RTE_DMADEV_LOG(ERR, "Invalid dev_id=%u\n", dev_id); \
> > +             return; \
> > +     } \
> > +} while (0)
> > +
>
> Can we avoid using these in the inline functions in this file, and move
> them to the _pmd.h which is for internal PMD use only? It would mean we
> don't get logging from the key dataplane functions, but I would hope the
> return values would provide enough info.
>
> Alternatively, can we keep the logtype definition and first macro and move
> the other two to the _pmd.h file.
>
> > +/**
> > + * @internal
> > + * Validate if the DMA device index is a valid attached DMA device.
> > + *
> > + * @param dev_id
> > + *   DMA device index.
> > + *
> > + * @return
> > + *   - If the device index is valid (true) or not (false).
> > + */
> > +__rte_internal
> > +bool
> > +rte_dmadev_is_valid_dev(uint16_t dev_id);
> > +
> > +/**
> > + * rte_dma_sg - can hold scatter DMA operation request
> > + */
> > +struct rte_dma_sg {
> > +     rte_iova_t src;
> > +     rte_iova_t dst;
> > +     uint32_t length;
> > +};
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Get the total number of DMA devices that have been successfully
> > + * initialised.
> > + *
> > + * @return
> > + *   The total number of usable DMA devices.
> > + */
> > +__rte_experimental
> > +uint16_t
> > +rte_dmadev_count(void);
> > +
> > +/**
> > + * The capabilities of a DMA device
> > + */
> > +#define RTE_DMA_DEV_CAPA_MEM_TO_MEM  (1ull << 0)
> > +/**< DMA device support mem-to-mem transfer.
>
> Do we need this? Can we assume that any device appearing as a dmadev can
> do mem-to-mem copies, and drop the capability for mem-to-mem and the
> capability for copying?
>
> > + *
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
> > +#define RTE_DMA_DEV_CAPA_MEM_TO_DEV  (1ull << 1)
> > +/**< DMA device support slave mode & mem-to-dev transfer.
> > + *
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
> > +#define RTE_DMA_DEV_CAPA_DEV_TO_MEM  (1ull << 2)
> > +/**< DMA device support slave mode & dev-to-mem transfer.
> > + *
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
> > +#define RTE_DMA_DEV_CAPA_DEV_TO_DEV  (1ull << 3)
> > +/**< DMA device support slave mode & dev-to-dev transfer.
> > + *
>
> Just to confirm, are there devices currently planned for dmadev that

We are planning to use this support as our existing raw driver has this.

> supports only a subset of these flags? Thinking particularly of the
> dev-2-mem and mem-2-dev ones here - do any of the devices we are
> considering not support using device memory?
> [Again, just want to ensure we aren't adding too much stuff that we don't
> need yet]
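
To make the question concrete, here is a rough sketch of how an application
might test for a transfer direction before using it. The RTE_DMA_DEV_CAPA_*
values are copied from the patch under review and redefined locally, since the
header is not merged. dma_supports() is a hypothetical helper, and the dev_capa
value would in practice come from rte_dmadev_info_get() as proposed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as proposed in the v2 patch; they may change in later versions. */
#define RTE_DMA_DEV_CAPA_MEM_TO_MEM (1ull << 0)
#define RTE_DMA_DEV_CAPA_MEM_TO_DEV (1ull << 1)
#define RTE_DMA_DEV_CAPA_DEV_TO_MEM (1ull << 2)
#define RTE_DMA_DEV_CAPA_DEV_TO_DEV (1ull << 3)

/* Hypothetical helper: true only if every requested bit is advertised. */
static inline bool
dma_supports(uint64_t dev_capa, uint64_t wanted)
{
	return (dev_capa & wanted) == wanted;
}
```

If every planned driver ends up advertising the same subset, such a check
never takes the false branch, which is the argument for trimming the flags.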



>
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
> > +#define RTE_DMA_DEV_CAPA_OPS_COPY    (1ull << 4)
> > +/**< DMA device support copy ops.
> > + *
>
> Suggest dropping this and making it min for dmadev.
>
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
> > +#define RTE_DMA_DEV_CAPA_OPS_FILL    (1ull << 5)
> > +/**< DMA device support fill ops.
> > + *
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
> > +#define RTE_DMA_DEV_CAPA_OPS_SG              (1ull << 6)
> > +/**< DMA device support scatter-list ops.
> > + * If device support ops_copy and ops_sg, it means supporting copy_sg ops.
> > + * If device support ops_fill and ops_sg, it means supporting fill_sg ops.
> > + *
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
> > +#define RTE_DMA_DEV_CAPA_FENCE               (1ull << 7)
> > +/**< DMA device support fence.
> > + * If the device supports fence, then the application could set a fence flag when
> > + * enqueuing an operation via rte_dma_copy/copy_sg/fill/fill_sg.
> > + * If an operation has the fence flag set, it must be processed
> > + * only after all previous operations are completed.
> > + *
>
> Is this needed? As I understand it, the Marvell driver doesn't require
> fences so providing one is a no-op. Therefore, this flag is probably
> unnecessary.

+1

>
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
> > +#define RTE_DMA_DEV_CAPA_SVA         (1ull << 8)
> > +/**< DMA device support SVA which could use VA as DMA address.
> > + * If device support SVA then application could pass any VA address like memory
> > + * from rte_malloc(), rte_memzone(), malloc, stack memory.
> > + * If device don't support SVA, then application should pass IOVA address which
> > + * from rte_malloc(), rte_memzone().
> > + *
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
> > +#define RTE_DMA_DEV_CAPA_MT_VCHAN    (1ull << 9)
> > +/**< DMA device support MT-safe of a virtual DMA channel.
> > + *
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
> > +#define RTE_DMA_DEV_CAPA_MT_MULTI_VCHAN      (1ull << 10)
> > +/**< DMA device support MT-safe of different virtual DMA channels.
> > + *
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
>
> As with comments above - let's check that these will actually be used
> before we add them.
>
> > +
> > +/**
> > + * A structure used to retrieve the contextual information of
> > + * a DMA device
> > + */
> > +struct rte_dmadev_info {
> > +     struct rte_device *device; /**< Generic Device information */
> > +     uint64_t dev_capa; /**< Device capabilities (RTE_DMA_DEV_CAPA_) */
> > +     /** Maximum number of virtual DMA channels supported */
> > +     uint16_t max_vchans;
> > +     /** Maximum allowed number of virtual DMA channel descriptors */
> > +     uint16_t max_desc;
> > +     /** Minimum allowed number of virtual DMA channel descriptors */
> > +     uint16_t min_desc;
> > +     uint16_t nb_vchans; /**< Number of virtual DMA channel configured */
> > +};
>
> Let's add rte_dmadev_conf struct into this to return the configuration
> settings.
>
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Retrieve the contextual information of a DMA device.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param[out] dev_info
> > + *   A pointer to a structure of type *rte_dmadev_info* to be filled with the
> > + *   contextual information of the device.
> > + *
> > + * @return
> > + *   - =0: Success, driver updates the contextual information of the DMA device
> > + *   - <0: Error code returned by the driver info get function.
> > + *
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_info_get(uint16_t dev_id, struct rte_dmadev_info *dev_info);
> > +
>
> Should have "const" on second param.
>
> > +/**
> > + * A structure used to configure a DMA device.
> > + */
> > +struct rte_dmadev_conf {
> > +     /** Maximum number of virtual DMA channels to use.
> > +      * This value cannot be greater than the field 'max_vchans' of struct
> > +      * rte_dmadev_info, as obtained from rte_dmadev_info_get().
> > +      */
> > +     uint16_t max_vchans;
> > +     /** Enable bit for MT-safe of a virtual DMA channel.
> > +      * This bit can be enabled only when the device supports
> > +      * RTE_DMA_DEV_CAPA_MT_VCHAN.
> > +      * @see RTE_DMA_DEV_CAPA_MT_VCHAN
> > +      */
> > +     uint8_t enable_mt_vchan : 1;
> > +     /** Enable bit for MT-safe of different virtual DMA channels.
> > +      * This bit can be enabled only when the device supports
> > +      * RTE_DMA_DEV_CAPA_MT_MULTI_VCHAN.
> > +      * @see RTE_DMA_DEV_CAPA_MT_MULTI_VCHAN
> > +      */
> > +     uint8_t enable_mt_multi_vchan : 1;
> > +     uint64_t reserved[2]; /**< Reserved for future fields */
> > +};
>
> Drop the reserved fields. ABI versioning is a better way to deal with
> adding new fields.

+1

>
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Configure a DMA device.
> > + *
> > + * This function must be invoked first before any other function in the
> > + * API. This function can also be re-invoked when a device is in the
> > + * stopped state.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device to configure.
> > + * @param dev_conf
> > + *   The DMA device configuration structure encapsulated into rte_dmadev_conf
> > + *   object.
> > + *
> > + * @return
> > + *   - =0: Success, device configured.
> > + *   - <0: Error code returned by the driver configuration function.
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_configure(uint16_t dev_id, const struct rte_dmadev_conf *dev_conf);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Start a DMA device.
> > + *
> > + * The device start step is the last one and consists of setting the DMA
> > + * to start accepting jobs.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + *
> > + * @return
> > + *   - =0: Success, device started.
> > + *   - <0: Error code returned by the driver start function.
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_start(uint16_t dev_id);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Stop a DMA device.
> > + *
> > + * The device can be restarted with a call to rte_dmadev_start()
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + *
> > + * @return
> > + *   - =0: Success, device stopped.
> > + *   - <0: Error code returned by the driver stop function.
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_stop(uint16_t dev_id);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Close a DMA device.
> > + *
> > + * The device cannot be restarted after this call.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + *
> > + * @return
> > + *  - =0: Successfully close device
> > + *  - <0: Failure to close device
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_close(uint16_t dev_id);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Reset a DMA device.
> > + *
> > + * This differs from an rte_dmadev_start->rte_dmadev_stop cycle in that it is
> > + * similar to a hard or soft reset.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + *
> > + * @return
> > + *   - =0: Successfully reset device.
> > + *   - <0: Failure to reset device.
> > + *   - (-ENOTSUP): If the device doesn't support this function.
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_reset(uint16_t dev_id);
> > +
> > +/**
> > + * DMA transfer direction defines.
> > + */
> > +#define RTE_DMA_MEM_TO_MEM   (1ull << 0)
> > +/**< DMA transfer direction - from memory to memory.
> > + *
> > + * @see struct rte_dmadev_vchan_conf::direction
> > + */
> > +#define RTE_DMA_MEM_TO_DEV   (1ull << 1)
> > +/**< DMA transfer direction - slave mode & from memory to device.
> > + * In a typical scenario, ARM SoCs are installed on x86 servers as iNICs. In
> > + * this case, the ARM SoCs works in slave mode, it could initiate a DMA move
> > + * request from ARM memory to x86 host memory.
>
> For clarity, it would be good to specify in the scenario described which
> memory is the "mem" and which is the "dev" (I assume SoC memory is "mem"
> and x86 host memory is "dev"??)
>
> > + *
> > + * @see struct rte_dmadev_vchan_conf::direction
> > + */
> > +#define RTE_DMA_DEV_TO_MEM   (1ull << 2)
> > +/**< DMA transfer direction - slave mode & from device to memory.
> > + * In a typical scenario, ARM SoCs are installed on x86 servers as iNICs. In
> > + * this case, the ARM SoCs works in slave mode, it could initiate a DMA move
> > + * request from x86 host memory to ARM memory.
> > + *
> > + * @see struct rte_dmadev_vchan_conf::direction
> > + */
> > +#define RTE_DMA_DEV_TO_DEV   (1ull << 3)
> > +/**< DMA transfer direction - slave mode & from device to device.
> > + * In a typical scenario, ARM SoCs are installed on x86 servers as iNICs. In
> > + * this case, the ARM SoCs works in slave mode, it could initiate a DMA move
> > + * request from x86 host memory to another x86 host memory.
> > + *
> > + * @see struct rte_dmadev_vchan_conf::direction
> > + */
> > +#define RTE_DMA_TRANSFER_DIR_ALL     (RTE_DMA_MEM_TO_MEM | \
> > +                                      RTE_DMA_MEM_TO_DEV | \
> > +                                      RTE_DMA_DEV_TO_MEM | \
> > +                                      RTE_DMA_DEV_TO_DEV)
> > +
> > +/**
> > + * enum rte_dma_slave_port_type - slave mode type defines
> > + */
> > +enum rte_dma_slave_port_type {
> > +     /** The slave port is PCIE. */
> > +     RTE_DMA_SLAVE_PORT_PCIE = 1,
> > +};
> > +
>
> As previously mentioned, this needs to be updated to use other terms.
> For some suggested alternatives see:
> https://doc.dpdk.org/guides-21.05/contributing/coding_style.html#naming
>
> > +/**
> > + * A structure used to describe slave port parameters.
> > + */
> > +struct rte_dma_slave_port_parameters {
> > +     enum rte_dma_slave_port_type port_type;
> > +     union {
> > +             /** For PCIE port */
> > +             struct {
> > +                     /** The physical function number which to use */
> > +                     uint64_t pf_number : 6;
> > +                     /** Virtual function enable bit */
> > +                     uint64_t vf_enable : 1;
> > +                     /** The virtual function number which to use */
> > +                     uint64_t vf_number : 8;
> > +                     uint64_t pasid : 20;
> > +                     /** The attributes field in TLP packet */
> > +                     uint64_t tlp_attr : 3;
> > +             };
> > +     };
> > +};
> > +
> > +/**
> > + * A structure used to configure a virtual DMA channel.
> > + */
> > +struct rte_dmadev_vchan_conf {
> > +     uint8_t direction; /**< Set of supported transfer directions */
> > +     /** Number of descriptor for the virtual DMA channel */
> > +     uint16_t nb_desc;
> > +     /** 1) Used to describe the dev parameters in the mem-to-dev/dev-to-mem
> > +      * transfer scenario.
> > +      * 2) Used to describe the src dev parameters in the dev-to-dev
> > +      * transfer scenario.
> > +      */
> > +     struct rte_dma_slave_port_parameters port;
> > +     /** Used to describe the dst dev parameters in the dev-to-dev
> > +      * transfer scenario.
> > +      */
> > +     struct rte_dma_slave_port_parameters peer_port;
> > +     uint64_t reserved[2]; /**< Reserved for future fields */
> > +};
>
> Let's drop the reserved fields and use ABI versioning if necessary in
> future.
>
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Allocate and set up a virtual DMA channel.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param conf
> > + *   The virtual DMA channel configuration structure encapsulated into
> > + *   rte_dmadev_vchan_conf object.
> > + *
> > + * @return
> > + *   - >=0: Allocate success, it is the virtual DMA channel id. This value must
> > + *          be less than the field 'max_vchans' of struct rte_dmadev_conf
> > +         which configured by rte_dmadev_configure().
>
> nit: whitespace error here.
>
> > + *   - <0: Error code returned by the driver virtual channel setup function.
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_vchan_setup(uint16_t dev_id,
> > +                    const struct rte_dmadev_vchan_conf *conf);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Release a virtual DMA channel.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vchan
> > + *   The identifier of the virtual DMA channel returned by vchan setup.
> > + *
> > + * @return
> > + *   - =0: Successfully release the virtual DMA channel.
> > + *   - <0: Error code returned by the driver virtual channel release function.
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_vchan_release(uint16_t dev_id, uint16_t vchan);
> > +
> > +/**
> > + * rte_dmadev_stats - running statistics.
> > + */
> > +struct rte_dmadev_stats {
> > +     /** Count of operations which were successfully enqueued */
> > +     uint64_t enqueued_count;
> > +     /** Count of operations which were submitted to hardware */
> > +     uint64_t submitted_count;
> > +     /** Count of operations which failed to complete */
> > +     uint64_t completed_fail_count;
> > +     /** Count of operations which successfully complete */
> > +     uint64_t completed_count;
> > +     uint64_t reserved[4]; /**< Reserved for future fields */
> > +};
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Retrieve basic statistics of one or all virtual DMA channels.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vchan
> > + *   The identifier of virtual DMA channel, -1 means all channels.
> > + * @param[out] stats
> > + *   The basic statistics structure encapsulated into rte_dmadev_stats
> > + *   object.
> > + *
> > + * @return
> > + *   - =0: Successfully retrieve stats.
> > + *   - <0: Failure to retrieve stats.
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_stats_get(uint16_t dev_id, int vchan,
>
> vchan as uint16_t rather than int, I think. This would apply to all
> dataplane functions. There is no need for a signed vchan value.
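
One consequence of the switch: the stats calls currently document "-1 means all
channels", so an unsigned vchan needs an explicit sentinel. A minimal sketch of
how that could look, assuming a hypothetical DEMO_ALL_VCHANS value and mock
per-channel counters (none of these names are in the patch):

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical sentinel and channel count: not part of the patch under review. */
#define DEMO_ALL_VCHANS UINT16_MAX
#define DEMO_MAX_VCHANS 4

/* Stand-in per-channel counters; a real driver would own these. */
static uint64_t demo_completed[DEMO_MAX_VCHANS] = { 10, 20, 30, 40 };

/* Report one channel, or the sum of all when vchan == DEMO_ALL_VCHANS. */
static int
demo_stats_get(uint16_t vchan, uint64_t *completed)
{
	uint16_t i;

	if (vchan == DEMO_ALL_VCHANS) {
		*completed = 0;
		for (i = 0; i < DEMO_MAX_VCHANS; i++)
			*completed += demo_completed[i];
		return 0;
	}
	if (vchan >= DEMO_MAX_VCHANS)
		return -EINVAL; /* channel id out of range */
	*completed = demo_completed[vchan];
	return 0;
}
```

A caller that still passes -1 converts to the same value, since (uint16_t)-1 is
UINT16_MAX, but a named constant makes the intent visible.
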
>
> > +                  struct rte_dmadev_stats *stats);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Reset basic statistics of one or all virtual DMA channels.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vchan
> > + *   The identifier of virtual DMA channel, -1 means all channels.
> > + *
> > + * @return
> > + *   - =0: Successfully reset stats.
> > + *   - <0: Failure to reset stats.
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_stats_reset(uint16_t dev_id, int vchan);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Dump DMA device info.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param f
> > + *   The file to write the output to.
> > + *
> > + * @return
> > + *   0 on success. Non-zero otherwise.
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_dump(uint16_t dev_id, FILE *f);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Trigger the dmadev self test.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + *
> > + * @return
> > + *   - 0: Selftest successful.
> > + *   - -ENOTSUP if the device doesn't support selftest
> > + *   - other values < 0 on failure.
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_selftest(uint16_t dev_id);
>
> I don't think this needs to be in the public API, since it should only be
> for the autotest app to use. Maybe move the prototype to the _pmd.h (since
> we don't have a separate internal header), and then the autotest app can
> pick it up from there.
>
> > +
> > +#include "rte_dmadev_core.h"
> > +
> > +/**
> > + *  DMA flags to augment operation preparation.
> > + *  Used as the 'flags' parameter of rte_dmadev_copy/copy_sg/fill/fill_sg.
> > + */
> > +#define RTE_DMA_FLAG_FENCE   (1ull << 0)
> > +/**< DMA fence flag
> > + * It means the operation with this flag must be processed only after all
> > + * previous operations are completed.
> > + *
> > + * @see rte_dmadev_copy()
> > + * @see rte_dmadev_copy_sg()
> > + * @see rte_dmadev_fill()
> > + * @see rte_dmadev_fill_sg()
> > + */
>
> As a general comment, I think all these multi-line comments should go
> before the item they describe. Comments after should only be used in the
> case where the comment fits on the rest of the line after a value.
>
> We also should define the SUBMIT flag as suggested by Jerin, to allow apps
> to automatically submit jobs after enqueue.
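
To make the SUBMIT idea concrete, here is a rough mock of the intended
behaviour, assuming a hypothetical DEMO_FLAG_SUBMIT bit and a toy channel
structure (the bit position and every demo_* name are illustrative only, not
taken from the patch):

```c
#include <stdint.h>

#define DEMO_FLAG_FENCE  (1ull << 0)
#define DEMO_FLAG_SUBMIT (1ull << 1) /* hypothetical bit position */

/* Toy model of a virtual channel. */
struct demo_vchan {
	uint16_t next_idx;  /* index handed to the next enqueued job */
	uint16_t pending;   /* jobs enqueued but not yet submitted */
	uint32_t doorbells; /* number of times the doorbell was rung */
};

/* Ring the doorbell: everything pending is handed to "hardware". */
static int
demo_submit(struct demo_vchan *vc)
{
	vc->pending = 0;
	vc->doorbells++;
	return 0;
}

/* Enqueue a job; ring the doorbell as well if the SUBMIT flag is set. */
static int
demo_copy(struct demo_vchan *vc, uint64_t flags)
{
	int idx = vc->next_idx++;

	vc->pending++;
	if (flags & DEMO_FLAG_SUBMIT)
		demo_submit(vc);
	return idx;
}
```

With this shape an application batching N copies passes the flag only on the
last one, which saves a separate submit call per burst.
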
>
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Enqueue a copy operation onto the virtual DMA channel.
> > + *
> > + * This queues up a copy operation to be performed by hardware, but does not
> > + * trigger hardware to begin that operation.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vchan
> > + *   The identifier of virtual DMA channel.
> > + * @param src
> > + *   The address of the source buffer.
> > + * @param dst
> > + *   The address of the destination buffer.
> > + * @param length
> > + *   The length of the data to be copied.
> > + * @param flags
> > + *   Flags for this operation.
> > + *
> > + * @return
> > + *   - 0..UINT16_MAX: index of enqueued copy job.
> > + *   - <0: Error code returned by the driver copy function.
> > + */
> > +__rte_experimental
> > +static inline int
> > +rte_dmadev_copy(uint16_t dev_id, uint16_t vchan, rte_iova_t src, rte_iova_t dst,
> > +             uint32_t length, uint64_t flags)
> > +{
> > +     struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > +#ifdef RTE_DMADEV_DEBUG
> > +     RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> > +     RTE_FUNC_PTR_OR_ERR_RET(*dev->copy, -ENOTSUP);
> > +     if (vchan >= dev->data->dev_conf.max_vchans) {
> > +             RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> > +             return -EINVAL;
> > +     }
> > +#endif
> > +     return (*dev->copy)(dev, vchan, src, dst, length, flags);
> > +}
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Enqueue a scatter list copy operation onto the virtual DMA channel.
> > + *
> > + * This queues up a scatter list copy operation to be performed by hardware,
> > + * but does not trigger hardware to begin that operation.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vchan
> > + *   The identifier of virtual DMA channel.
> > + * @param sg
> > + *   The pointer of scatterlist.
> > + * @param sg_len
> > + *   The number of scatterlist elements.
> > + * @param flags
> > + *   Flags for this operation.
> > + *
> > + * @return
> > + *   - 0..UINT16_MAX: index of enqueued copy job.
> > + *   - <0: Error code returned by the driver copy function.
> > + */
> > +__rte_experimental
> > +static inline int
> > +rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vchan, const struct rte_dma_sg *sg,
> > +                uint32_t sg_len, uint64_t flags)
> > +{
> > +     struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > +#ifdef RTE_DMADEV_DEBUG
> > +     RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> > +     RTE_FUNC_PTR_OR_ERR_RET(sg, -EINVAL);
> > +     RTE_FUNC_PTR_OR_ERR_RET(*dev->copy_sg, -ENOTSUP);
> > +     if (vchan >= dev->data->dev_conf.max_vchans) {
> > +             RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> > +             return -EINVAL;
> > +     }
> > +#endif
> > +     return (*dev->copy_sg)(dev, vchan, sg, sg_len, flags);
> > +}
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Enqueue a fill operation onto the virtual DMA channel.
> > + *
> > + * This queues up a fill operation to be performed by hardware, but does not
> > + * trigger hardware to begin that operation.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vchan
> > + *   The identifier of virtual DMA channel.
> > + * @param pattern
> > + *   The pattern to populate the destination buffer with.
> > + * @param dst
> > + *   The address of the destination buffer.
> > + * @param length
> > + *   The length of the destination buffer.
> > + * @param flags
> > + *   Flags for this operation.
> > + *
> > + * @return
> > + *   - 0..UINT16_MAX: index of enqueued fill job.
> > + *   - <0: Error code returned by the driver fill function.
> > + */
> > +__rte_experimental
> > +static inline int
> > +rte_dmadev_fill(uint16_t dev_id, uint16_t vchan, uint64_t pattern,
> > +             rte_iova_t dst, uint32_t length, uint64_t flags)
> > +{
> > +     struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > +#ifdef RTE_DMADEV_DEBUG
> > +     RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> > +     RTE_FUNC_PTR_OR_ERR_RET(*dev->fill, -ENOTSUP);
> > +     if (vchan >= dev->data->dev_conf.max_vchans) {
> > +             RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> > +             return -EINVAL;
> > +     }
> > +#endif
> > +     return (*dev->fill)(dev, vchan, pattern, dst, length, flags);
> > +}
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Enqueue a scatter list fill operation onto the virtual DMA channel.
> > + *
> > + * This queues up a scatter list fill operation to be performed by hardware,
> > + * but does not trigger hardware to begin that operation.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vchan
> > + *   The identifier of virtual DMA channel.
> > + * @param pattern
> > + *   The pattern to populate the destination buffer with.
> > + * @param sg
> > + *   The pointer of scatterlist.
> > + * @param sg_len
> > + *   The number of scatterlist elements.
> > + * @param flags
> > + *   Flags for this operation.
> > + *
> > + * @return
> > + *   - 0..UINT16_MAX: index of enqueued fill job.
> > + *   - <0: Error code returned by the driver fill function.
> > + */
> > +__rte_experimental
> > +static inline int
> > +rte_dmadev_fill_sg(uint16_t dev_id, uint16_t vchan, uint64_t pattern,
> > +                const struct rte_dma_sg *sg, uint32_t sg_len,
> > +                uint64_t flags)
> > +{
> > +     struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > +#ifdef RTE_DMADEV_DEBUG
> > +     RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> > +     RTE_FUNC_PTR_OR_ERR_RET(sg, -EINVAL);
> > +     RTE_FUNC_PTR_OR_ERR_RET(*dev->fill_sg, -ENOTSUP);
> > +     if (vchan >= dev->data->dev_conf.max_vchans) {
> > +             RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> > +             return -EINVAL;
> > +     }
> > +#endif
> > +     return (*dev->fill_sg)(dev, vchan, pattern, sg, sg_len, flags);
> > +}
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Trigger hardware to begin performing enqueued operations.
> > + *
> > + * This API is used to write the "doorbell" to the hardware to trigger it
> > + * to begin the operations previously enqueued by rte_dmadev_copy/fill()
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vchan
> > + *   The identifier of virtual DMA channel.
> > + *
> > + * @return
> > + *   - =0: Successfully trigger hardware.
> > + *   - <0: Failure to trigger hardware.
> > + */
> > +__rte_experimental
> > +static inline int
> > +rte_dmadev_submit(uint16_t dev_id, uint16_t vchan)
> > +{
> > +     struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > +#ifdef RTE_DMADEV_DEBUG
> > +     RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> > +     RTE_FUNC_PTR_OR_ERR_RET(*dev->submit, -ENOTSUP);
> > +     if (vchan >= dev->data->dev_conf.max_vchans) {
> > +             RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> > +             return -EINVAL;
> > +     }
> > +#endif
> > +     return (*dev->submit)(dev, vchan);
> > +}
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Returns the number of operations that have been successfully completed.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vchan
> > + *   The identifier of virtual DMA channel.
> > + * @param nb_cpls
> > + *   The maximum number of completed operations that can be processed.
> > + * @param[out] last_idx
> > + *   The last completed operation's index.
> > + *   If not required, NULL can be passed in.
> > + * @param[out] has_error
> > + *   Indicates whether any transfer error occurred.
> > + *   If not required, NULL can be passed in.
> > + *
> > + * @return
> > + *   The number of operations that successfully completed.
> > + */
> > +__rte_experimental
> > +static inline uint16_t
> > +rte_dmadev_completed(uint16_t dev_id, uint16_t vchan, const uint16_t nb_cpls,
> > +                  uint16_t *last_idx, bool *has_error)
> > +{
> > +     struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > +     uint16_t idx;
> > +     bool err;
> > +
> > +#ifdef RTE_DMADEV_DEBUG
> > +     RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> > +     RTE_FUNC_PTR_OR_ERR_RET(*dev->completed, -ENOTSUP);
> > +     if (vchan >= dev->data->dev_conf.max_vchans) {
> > +             RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> > +             return -EINVAL;
> > +     }
> > +     if (nb_cpls == 0) {
> > +             RTE_DMADEV_LOG(ERR, "Invalid nb_cpls\n");
> > +             return -EINVAL;
> > +     }
> > +#endif
> > +
> > +     /* Ensure the pointer values are non-null to simplify drivers.
> > +      * In most cases these should be compile time evaluated, since this is
> > +      * an inline function.
> > +      * - If NULL is explicitly passed as parameter, then compiler knows the
> > +      *   value is NULL
> > +      * - If address of local variable is passed as parameter, then compiler
> > +      *   can know it's non-NULL.
> > +      */
> > +     if (last_idx == NULL)
> > +             last_idx = &idx;
> > +     if (has_error == NULL)
> > +             has_error = &err;
> > +
> > +     *has_error = false;
> > +     return (*dev->completed)(dev, vchan, nb_cpls, last_idx, has_error);
> > +}
> > +
> > +/**
> > + * DMA transfer status code defines
> > + */
> > +enum rte_dma_status_code {
> > +     /** The operation completed successfully */
> > +     RTE_DMA_STATUS_SUCCESSFUL = 0,
> > +     /** The operation failed to complete due to active drop.
> > +      * This is mainly used when processing dev_stop, to allow outstanding
> > +      * requests to be completed as much as possible.
> > +      */
> > +     RTE_DMA_STATUS_ACTIVE_DROP,
> > +     /** The operation failed to complete due to invalid source address */
> > +     RTE_DMA_STATUS_INVALID_SRC_ADDR,
> > +     /** The operation failed to complete due to invalid destination address */
> > +     RTE_DMA_STATUS_INVALID_DST_ADDR,
> > +     /** The operation failed to complete due to invalid length */
> > +     RTE_DMA_STATUS_INVALID_LENGTH,
> > +     /** The operation failed to complete due to invalid opcode.
> > +      * The DMA descriptor could have multiple formats, which are
> > +      * distinguished by the opcode field.
> > +      */
> > +     RTE_DMA_STATUS_INVALID_OPCODE,
> > +     /** The operation failed to complete due to bus error */
> > +     RTE_DMA_STATUS_BUS_ERROR,
> > +     /** The operation failed to complete due to data poison */
> > +     RTE_DMA_STATUS_DATA_POISION,
> > +     /** The operation failed to complete due to descriptor read error */
> > +     RTE_DMA_STATUS_DESCRIPTOR_READ_ERROR,
> > +     /** The operation failed to complete due to device link error.
> > +      * Used to indicate a link error in the mem-to-dev/dev-to-mem/
> > +      * dev-to-dev transfer scenario.
> > +      */
> > +      */
> > +     RTE_DMA_STATUS_DEV_LINK_ERROR,
> > +     /** Driver specific status code offset
> > +      * Start status code for the driver to define its own error code.
> > +      */
> > +     RTE_DMA_STATUS_DRV_SPECIFIC_OFFSET = 0x10000,
> > +};
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Returns the number of operations that failed to complete.
> > + * NOTE: This API is used when rte_dmadev_completed() has set has_error.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vchan
> > + *   The identifier of virtual DMA channel.
> > + * @param nb_status
> > + *   Indicates the size of status array.
> > + * @param[out] status
> > + *   The error code of operations that failed to complete.
> > + *   Some standard error codes are described in 'enum rte_dma_status_code'
> > + *   @see rte_dma_status_code
> > + * @param[out] last_idx
> > + *   The last failed completed operation's index.
> > + *
> > + * @return
> > + *   The number of operations that failed to complete.
> > + */
> > +__rte_experimental
> > +static inline uint16_t
> > +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vchan,
> > +                        const uint16_t nb_status, uint32_t *status,
> > +                        uint16_t *last_idx)
> > +{
> > +     struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > +#ifdef RTE_DMADEV_DEBUG
> > +     RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> > +     RTE_FUNC_PTR_OR_ERR_RET(status, -EINVAL);
> > +     RTE_FUNC_PTR_OR_ERR_RET(last_idx, -EINVAL);
> > +     RTE_FUNC_PTR_OR_ERR_RET(*dev->completed_fails, -ENOTSUP);
> > +     if (vchan >= dev->data->dev_conf.max_vchans) {
> > +             RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> > +             return -EINVAL;
> > +     }
> > +     if (nb_status == 0) {
> > +             RTE_DMADEV_LOG(ERR, "Invalid nb_status\n");
> > +             return -EINVAL;
> > +     }
> > +#endif
> > +     return (*dev->completed_fails)(dev, vchan, nb_status, status, last_idx);
> > +}
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_DMADEV_H_ */
> > diff --git a/lib/dmadev/rte_dmadev_core.h b/lib/dmadev/rte_dmadev_core.h
> > new file mode 100644
> > index 0000000..410faf0
> > --- /dev/null
> > +++ b/lib/dmadev/rte_dmadev_core.h
> > @@ -0,0 +1,159 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2021 HiSilicon Limited.
> > + * Copyright(c) 2021 Intel Corporation.
> > + */
> > +
> > +#ifndef _RTE_DMADEV_CORE_H_
> > +#define _RTE_DMADEV_CORE_H_
> > +
> > +/**
> > + * @file
> > + *
> > + * RTE DMA Device internal header.
> > + *
> > + * This header contains internal data types, that are used by the DMA devices
> > + * in order to expose their ops to the class.
> > + *
> > + * Applications should not use these APIs directly.
> > + *
> > + */
> > +
> > +struct rte_dmadev;
> > +
> > +/** @internal Used to get device information of a device. */
> > +typedef int (*dmadev_info_get_t)(struct rte_dmadev *dev,
> > +                              struct rte_dmadev_info *dev_info);
>
> First parameter can be "const"
>
> > +/** @internal Used to configure a device. */
> > +typedef int (*dmadev_configure_t)(struct rte_dmadev *dev,
> > +                               const struct rte_dmadev_conf *dev_conf);
> > +
> > +/** @internal Used to start a configured device. */
> > +typedef int (*dmadev_start_t)(struct rte_dmadev *dev);
> > +
> > +/** @internal Used to stop a configured device. */
> > +typedef int (*dmadev_stop_t)(struct rte_dmadev *dev);
> > +
> > +/** @internal Used to close a configured device. */
> > +typedef int (*dmadev_close_t)(struct rte_dmadev *dev);
> > +
> > +/** @internal Used to reset a configured device. */
> > +typedef int (*dmadev_reset_t)(struct rte_dmadev *dev);
> > +
> > +/** @internal Used to allocate and set up a virtual DMA channel. */
> > +typedef int (*dmadev_vchan_setup_t)(struct rte_dmadev *dev,
> > +                                 const struct rte_dmadev_vchan_conf *conf);
> > +
> > +/** @internal Used to release a virtual DMA channel. */
> > +typedef int (*dmadev_vchan_release_t)(struct rte_dmadev *dev, uint16_t vchan);
> > +
> > +/** @internal Used to retrieve basic statistics. */
> > +typedef int (*dmadev_stats_get_t)(struct rte_dmadev *dev, int vchan,
> > +                               struct rte_dmadev_stats *stats);
>
> First parameter can be "const"
>
> > +
> > +/** @internal Used to reset basic statistics. */
> > +typedef int (*dmadev_stats_reset_t)(struct rte_dmadev *dev, int vchan);
> > +
> > +/** @internal Used to dump internal information. */
> > +typedef int (*dmadev_dump_t)(struct rte_dmadev *dev, FILE *f);
> > +
>
> First param "const"
>
> > +/** @internal Used to start dmadev selftest. */
> > +typedef int (*dmadev_selftest_t)(uint16_t dev_id);
> > +
>
> This looks like an outlier in taking a dev_id. It should take a dmadev parameter.
> Most drivers should not need to implement this anyway, as the main unit
> tests should be in "test_dmadev.c" in the autotest app.
>
> > +/** @internal Used to enqueue a copy operation. */
> > +typedef int (*dmadev_copy_t)(struct rte_dmadev *dev, uint16_t vchan,
> > +                          rte_iova_t src, rte_iova_t dst,
> > +                          uint32_t length, uint64_t flags);
> > +
> > +/** @internal Used to enqueue a scatter list copy operation. */
> > +typedef int (*dmadev_copy_sg_t)(struct rte_dmadev *dev, uint16_t vchan,
> > +                             const struct rte_dma_sg *sg,
> > +                             uint32_t sg_len, uint64_t flags);
> > +
> > +/** @internal Used to enqueue a fill operation. */
> > +typedef int (*dmadev_fill_t)(struct rte_dmadev *dev, uint16_t vchan,
> > +                          uint64_t pattern, rte_iova_t dst,
> > +                          uint32_t length, uint64_t flags);
> > +
> > +/** @internal Used to enqueue a scatter list fill operation. */
> > +typedef int (*dmadev_fill_sg_t)(struct rte_dmadev *dev, uint16_t vchan,
> > +                     uint64_t pattern, const struct rte_dma_sg *sg,
> > +                     uint32_t sg_len, uint64_t flags);
> > +
> > +/** @internal Used to trigger hardware to begin working. */
> > +typedef int (*dmadev_submit_t)(struct rte_dmadev *dev, uint16_t vchan);
> > +
> > +/** @internal Used to return number of successful completed operations. */
> > +typedef uint16_t (*dmadev_completed_t)(struct rte_dmadev *dev, uint16_t vchan,
> > +                                    const uint16_t nb_cpls,
> > +                                    uint16_t *last_idx, bool *has_error);
> > +
> > +/** @internal Used to return number of failed completed operations. */
> > +typedef uint16_t (*dmadev_completed_fails_t)(struct rte_dmadev *dev,
> > +                     uint16_t vchan, const uint16_t nb_status,
> > +                     uint32_t *status, uint16_t *last_idx);
> > +
> > +/**
> > + * DMA device operations function pointer table
> > + */
> > +struct rte_dmadev_ops {
> > +     dmadev_info_get_t dev_info_get;
> > +     dmadev_configure_t dev_configure;
> > +     dmadev_start_t dev_start;
> > +     dmadev_stop_t dev_stop;
> > +     dmadev_close_t dev_close;
> > +     dmadev_reset_t dev_reset;
> > +     dmadev_vchan_setup_t vchan_setup;
> > +     dmadev_vchan_release_t vchan_release;
> > +     dmadev_stats_get_t stats_get;
> > +     dmadev_stats_reset_t stats_reset;
> > +     dmadev_dump_t dev_dump;
> > +     dmadev_selftest_t dev_selftest;
> > +};
> > +
> > +/**
> > + * @internal
> > + * The data part, with no function pointers, associated with each DMA device.
> > + *
> > + * This structure is safe to place in shared memory to be common among different
> > + * processes in a multi-process configuration.
> > + */
> > +struct rte_dmadev_data {
> > +     uint16_t dev_id; /**< Device [external] identifier. */
> > +     char dev_name[RTE_DMADEV_NAME_MAX_LEN]; /**< Unique identifier name */
> > +     void *dev_private; /**< PMD-specific private data. */
> > +     struct rte_dmadev_conf dev_conf; /**< DMA device configuration. */
> > +     uint8_t dev_started : 1; /**< Device state: STARTED(1)/STOPPED(0). */
> > +     uint64_t reserved[4]; /**< Reserved for future fields */
> > +} __rte_cache_aligned;
> > +
>
> While I generally don't like having reserved space, this is one place where
> it makes sense, so +1 for it here.
>
> > +/**
> > + * @internal
> > + * The generic data structure associated with each DMA device.
> > + *
> > + * The dataplane APIs are located at the beginning of the structure, along
> > + * with the pointer to where all the data elements for the particular device
> > + * are stored in shared memory. This split scheme allows the function pointer
> > + * and driver data to be per-process, while the actual configuration data for
> > + * the device is shared.
> > + */
> > +struct rte_dmadev {
> > +     dmadev_copy_t copy;
> > +     dmadev_copy_sg_t copy_sg;
> > +     dmadev_fill_t fill;
> > +     dmadev_fill_sg_t fill_sg;
> > +     dmadev_submit_t submit;
> > +     dmadev_completed_t completed;
> > +     dmadev_completed_fails_t completed_fails;
> > +     const struct rte_dmadev_ops *dev_ops; /**< Functions exported by PMD. */
> > +     /** Flag indicating the device is attached: ATTACHED(1)/DETACHED(0). */
> > +     uint8_t attached : 1;
>
> Since it's in the midst of a series of pointers, this 1-bit flag is
> actually using 8 bytes of space. Is it needed? Can we use dev_ops == NULL
> or data == NULL instead to indicate whether this is a valid entry?
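
The padding cost is easy to check with a reduced layout. A small sketch,
assuming stand-in structs that only mimic the pointer/flag/pointer ordering
(demo_* names are mine, not from the patch):

```c
#include <stddef.h>
#include <stdint.h>

/* Flag sandwiched between two pointers, as in the proposed struct. */
struct demo_with_flag {
	const void *dev_ops;
	uint8_t attached : 1;
	void *device;
};

/* Same fields without the flag. */
struct demo_without_flag {
	const void *dev_ops;
	void *device;
};

/* Bytes added by the 1-bit flag once alignment padding is counted. */
static size_t
demo_flag_cost(void)
{
	return sizeof(struct demo_with_flag) - sizeof(struct demo_without_flag);
}

/* Alternative suggested in the review: treat a NULL ops pointer as "free". */
static int
demo_is_attached(const struct demo_without_flag *d)
{
	return d->dev_ops != NULL;
}
```

On the usual 64-bit ABIs demo_flag_cost() comes out as one full pointer, because
the following pointer has to be realigned after the single-byte storage unit.
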
>
> > +     /** Device info supplied during device initialization. */
> > +     struct rte_device *device;
> > +     struct rte_dmadev_data *data; /**< Pointer to device data. */
>
> If we are to try and minimise cacheline access, we should put this data
> pointer - or even better a copy of data->private pointer - at the top of
> the structure on the same cacheline as datapath operations. For dataplane,
> I can't see any elements of data, except the private pointer being
> accessed, so we would probably get most benefit for having a copy put there
> on init of the dmadev struct.
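
A sketch of the layout being suggested, assuming a 64-byte cache line and a
reduced struct with the seven datapath pointers followed by a copy of the
private pointer (demo_* names and the 64-byte figure are assumptions for
illustration; the real struct would also need to stay cache aligned):

```c
#include <stddef.h>

#define DEMO_CACHE_LINE 64 /* assumed cache line size */

typedef int (*demo_fn_t)(void *dev_private);

struct demo_dmadev {
	/* Datapath function pointers, accessed on every enqueue/dequeue. */
	demo_fn_t copy;
	demo_fn_t copy_sg;
	demo_fn_t fill;
	demo_fn_t fill_sg;
	demo_fn_t submit;
	demo_fn_t completed;
	demo_fn_t completed_fails;
	/* Copy of the driver private pointer, filled in at init time. */
	void *dev_private;
	/* Control-path fields follow and may spill into later cache lines. */
	const void *dev_ops;
	void *device;
	void *data;
};

/* Non-zero when the private pointer ends within the first cache line. */
static int
demo_private_in_first_line(void)
{
	return offsetof(struct demo_dmadev, dev_private) + sizeof(void *)
		<= DEMO_CACHE_LINE;
}
```

With eight-byte pointers the seven ops plus the private pointer fill exactly
one 64-byte line, so a datapath call touches no second line of the struct.
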
>
> > +     uint64_t reserved[4]; /**< Reserved for future fields */
> > +} __rte_cache_aligned;
> > +
> > +extern struct rte_dmadev rte_dmadevices[];
> > +
> > +#endif /* _RTE_DMADEV_CORE_H_ */
> > diff --git a/lib/dmadev/rte_dmadev_pmd.h b/lib/dmadev/rte_dmadev_pmd.h
> > new file mode 100644
> > index 0000000..45141f9
> > --- /dev/null
> > +++ b/lib/dmadev/rte_dmadev_pmd.h
> > @@ -0,0 +1,72 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2021 HiSilicon Limited.
> > + */
> > +
> > +#ifndef _RTE_DMADEV_PMD_H_
> > +#define _RTE_DMADEV_PMD_H_
> > +
> > +/**
> > + * @file
> > + *
> > + * RTE DMA Device PMD APIs
> > + *
> > + * Driver facing APIs for a DMA device. These are not to be called directly by
> > + * any application.
> > + */
> > +
> > +#include "rte_dmadev.h"
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +/**
> > + * @internal
> > + * Allocates a new dmadev slot for an DMA device and returns the pointer
> > + * to that slot for the driver to use.
> > + *
> > + * @param name
> > + *   DMA device name.
> > + *
> > + * @return
> > + *   A pointer to the DMA device slot case of success,
> > + *   NULL otherwise.
> > + */
> > +__rte_internal
> > +struct rte_dmadev *
> > +rte_dmadev_pmd_allocate(const char *name);
> > +
> > +/**
> > + * @internal
> > + * Release the specified dmadev.
> > + *
> > + * @param dev
> > + *   Device to be released.
> > + *
> > + * @return
> > + *   - 0 on success, negative on error
> > + */
> > +__rte_internal
> > +int
> > +rte_dmadev_pmd_release(struct rte_dmadev *dev);
> > +
> > +/**
> > + * @internal
> > + * Return the DMA device based on the device name.
> > + *
> > + * @param name
> > + *   DMA device name.
> > + *
> > + * @return
> > + *   A pointer to the DMA device slot case of success,
> > + *   NULL otherwise.
> > + */
> > +__rte_internal
> > +struct rte_dmadev *
> > +rte_dmadev_get_device_by_name(const char *name);
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_DMADEV_PMD_H_ */
> > diff --git a/lib/dmadev/version.map b/lib/dmadev/version.map
> > new file mode 100644
> > index 0000000..0f099e7
> > --- /dev/null
> > +++ b/lib/dmadev/version.map
> > @@ -0,0 +1,40 @@
> > +EXPERIMENTAL {
> > +     global:
> > +
> > +     rte_dmadev_count;
> > +     rte_dmadev_info_get;
> > +     rte_dmadev_configure;
> > +     rte_dmadev_start;
> > +     rte_dmadev_stop;
> > +     rte_dmadev_close;
> > +     rte_dmadev_reset;
> > +     rte_dmadev_vchan_setup;
> > +     rte_dmadev_vchan_release;
> > +     rte_dmadev_stats_get;
> > +     rte_dmadev_stats_reset;
> > +     rte_dmadev_dump;
> > +     rte_dmadev_selftest;
> > +     rte_dmadev_copy;
> > +     rte_dmadev_copy_sg;
> > +     rte_dmadev_fill;
> > +     rte_dmadev_fill_sg;
> > +     rte_dmadev_submit;
> > +     rte_dmadev_completed;
> > +     rte_dmadev_completed_fails;
> > +
> > +     local: *;
> > +};
>
> The elements in the version.map file blocks should be sorted alphabetically.
>
> > +
> > +INTERNAL {
> > +        global:
> > +
> > +     rte_dmadevices;
> > +     rte_dmadev_pmd_allocate;
> > +     rte_dmadev_pmd_release;
> > +     rte_dmadev_get_device_by_name;
> > +
> > +     local:
> > +
> > +     rte_dmadev_is_valid_dev;
> > +};
> > +
> > diff --git a/lib/meson.build b/lib/meson.build
> > index 1673ca4..68d239f 100644
> > --- a/lib/meson.build
> > +++ b/lib/meson.build
> > @@ -60,6 +60,7 @@ libraries = [
> >          'bpf',
> >          'graph',
> >          'node',
> > +        'dmadev',
> >  ]
> >
> >  if is_windows
> > --
> > 2.8.1
> >

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] ethdev: fix representor port ID search by name
@ 2021-07-12 16:17  3% Andrew Rybchenko
  2021-07-19  6:58  0% ` Xueming(Steven) Li
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Andrew Rybchenko @ 2021-07-12 16:17 UTC (permalink / raw)
  To: Ajit Khaparde, Somnath Kotur, John Daley, Hyong Youb Kim,
	Beilei Xing, Qiming Yang, Qi Zhang, Haiyue Wang, Matan Azrad,
	Shahaf Shuler, Viacheslav Ovsiienko, Thomas Monjalon,
	Ferruh Yigit, Xueming Li
  Cc: dev, Viacheslav Galaktionov, stable

From: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>

Fix representor port ID search by name if the representor itself
does not provide representors info. Getting a list of representors
from a representor does not make sense. Instead, a parent device
should be used.

To this end, extend the rte_eth_dev_data structure to include the port ID
of the parent device for representors.

Fixes: df7547a6a2cc ("ethdev: add helper function to get representor ID")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
---
The new field is added into the hole in rte_eth_dev_data structure.
The patch does not change ABI, but extra care is required since ABI
check is disabled for the structure because of the libabigail bug [1].
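
To illustrate what "added into the hole" means, here is a minimal sketch.
The two structures are hypothetical, simplified stand-ins (they are not the
real rte_eth_dev_data layout), and the result assumes an ABI where uint64_t
is aligned to at least 4 bytes. A 16-bit field placed in existing padding
changes neither the size of the structure nor the offsets of later members:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: padding follows the 16-bit field. */
struct demo_before {
    uint16_t representor_id;
    /* compiler-inserted padding lives here */
    uint64_t reserved_64s[4];
};

/* Hypothetical "after" layout: the new field occupies former padding. */
struct demo_after {
    uint16_t representor_id;
    uint16_t parent_port_id;
    uint64_t reserved_64s[4];
};

/* Returns 1 when size and trailing-member offset are identical. */
static inline int
layout_unchanged(void)
{
    return sizeof(struct demo_before) == sizeof(struct demo_after) &&
           offsetof(struct demo_before, reserved_64s) ==
           offsetof(struct demo_after, reserved_64s);
}
```

A check of this kind only shows that the layout is stable; it says nothing
about whether out-of-tree drivers fill in the new field.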

Potentially it is bad for out-of-tree drivers which implement
representors but do not fill in a new parent_port_id field in
rte_eth_dev_data structure. Do we care?

Maybe the patch should add lines to the release notes, but I'd like
to get initial feedback first.

mlx5 changes should be reviewed by maintainers very carefully, since
we are not sure that we patched it correctly.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=28060

 drivers/net/bnxt/bnxt_reps.c             |  1 +
 drivers/net/enic/enic_vf_representor.c   |  1 +
 drivers/net/i40e/i40e_vf_representor.c   |  1 +
 drivers/net/ice/ice_dcf_vf_representor.c |  1 +
 drivers/net/ixgbe/ixgbe_vf_representor.c |  1 +
 drivers/net/mlx5/linux/mlx5_os.c         | 11 +++++++++++
 drivers/net/mlx5/windows/mlx5_os.c       | 11 +++++++++++
 lib/ethdev/ethdev_driver.h               |  6 +++---
 lib/ethdev/rte_class_eth.c               |  2 +-
 lib/ethdev/rte_ethdev.c                  |  8 ++++----
 lib/ethdev/rte_ethdev_core.h             |  4 ++++
 11 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_reps.c b/drivers/net/bnxt/bnxt_reps.c
index bdbad53b7d..902591cd39 100644
--- a/drivers/net/bnxt/bnxt_reps.c
+++ b/drivers/net/bnxt/bnxt_reps.c
@@ -187,6 +187,7 @@ int bnxt_representor_init(struct rte_eth_dev *eth_dev, void *params)
 	eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR |
 					RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
 	eth_dev->data->representor_id = rep_params->vf_id;
+	eth_dev->data->parent_port_id = rep_params->parent_dev->data->port_id;
 
 	rte_eth_random_addr(vf_rep_bp->dflt_mac_addr);
 	memcpy(vf_rep_bp->mac_addr, vf_rep_bp->dflt_mac_addr,
diff --git a/drivers/net/enic/enic_vf_representor.c b/drivers/net/enic/enic_vf_representor.c
index 79dd6e5640..6ee7967ce9 100644
--- a/drivers/net/enic/enic_vf_representor.c
+++ b/drivers/net/enic/enic_vf_representor.c
@@ -662,6 +662,7 @@ int enic_vf_representor_init(struct rte_eth_dev *eth_dev, void *init_params)
 	eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR |
 					RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
 	eth_dev->data->representor_id = vf->vf_id;
+	eth_dev->data->parent_port_id = pf->port_id;
 	eth_dev->data->mac_addrs = rte_zmalloc("enic_mac_addr_vf",
 		sizeof(struct rte_ether_addr) *
 		ENIC_UNICAST_PERFECT_FILTERS, 0);
diff --git a/drivers/net/i40e/i40e_vf_representor.c b/drivers/net/i40e/i40e_vf_representor.c
index 0481b55381..865b637585 100644
--- a/drivers/net/i40e/i40e_vf_representor.c
+++ b/drivers/net/i40e/i40e_vf_representor.c
@@ -514,6 +514,7 @@ i40e_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params)
 	ethdev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR |
 					RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
 	ethdev->data->representor_id = representor->vf_id;
+	ethdev->data->parent_port_id = pf->dev_data->port_id;
 
 	/* Setting the number queues allocated to the VF */
 	ethdev->data->nb_rx_queues = vf->vsi->nb_qps;
diff --git a/drivers/net/ice/ice_dcf_vf_representor.c b/drivers/net/ice/ice_dcf_vf_representor.c
index 970461f3e9..c7cd3fd290 100644
--- a/drivers/net/ice/ice_dcf_vf_representor.c
+++ b/drivers/net/ice/ice_dcf_vf_representor.c
@@ -418,6 +418,7 @@ ice_dcf_vf_repr_init(struct rte_eth_dev *vf_rep_eth_dev, void *init_param)
 
 	vf_rep_eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
 	vf_rep_eth_dev->data->representor_id = repr->vf_id;
+	vf_rep_eth_dev->data->parent_port_id = repr->dcf_eth_dev->data->port_id;
 
 	vf_rep_eth_dev->data->mac_addrs = &repr->mac_addr;
 
diff --git a/drivers/net/ixgbe/ixgbe_vf_representor.c b/drivers/net/ixgbe/ixgbe_vf_representor.c
index d5b636a194..7a2063849e 100644
--- a/drivers/net/ixgbe/ixgbe_vf_representor.c
+++ b/drivers/net/ixgbe/ixgbe_vf_representor.c
@@ -197,6 +197,7 @@ ixgbe_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params)
 
 	ethdev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
 	ethdev->data->representor_id = representor->vf_id;
+	ethdev->data->parent_port_id = representor->pf_ethdev->data->port_id;
 
 	/* Set representor device ops */
 	ethdev->dev_ops = &ixgbe_vf_representor_dev_ops;
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index be22d9cbd2..5550d30628 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1511,6 +1511,17 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	if (priv->representor) {
 		eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
 		eth_dev->data->representor_id = priv->representor_id;
+		MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
+			const struct mlx5_priv *opriv =
+				rte_eth_devices[port_id].data->dev_private;
+
+			if (!opriv ||
+			    opriv->sh != priv->sh ||
+			    opriv->representor)
+				continue;
+			eth_dev->data->parent_port_id = port_id;
+			break;
+		}
 	}
 	priv->mp_id.port_id = eth_dev->data->port_id;
 	strlcpy(priv->mp_id.name, MLX5_MP_NAME, RTE_MP_MAX_NAME_LEN);
diff --git a/drivers/net/mlx5/windows/mlx5_os.c b/drivers/net/mlx5/windows/mlx5_os.c
index e30b682822..037c928dc1 100644
--- a/drivers/net/mlx5/windows/mlx5_os.c
+++ b/drivers/net/mlx5/windows/mlx5_os.c
@@ -506,6 +506,17 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	if (priv->representor) {
 		eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
 		eth_dev->data->representor_id = priv->representor_id;
+		MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
+			const struct mlx5_priv *opriv =
+				rte_eth_devices[port_id].data->dev_private;
+
+			if (!opriv ||
+			    opriv->sh != priv->sh ||
+			    opriv->representor)
+				continue;
+			eth_dev->data->parent_port_id = port_id;
+			break;
+		}
 	}
 	/*
 	 * Store associated network device interface index. This index
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 40e474aa7e..07f6d1f9a4 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1248,8 +1248,8 @@ struct rte_eth_devargs {
  * For backward compatibility, if no representor info, direct
  * map legacy VF (no controller and pf).
  *
- * @param ethdev
- *  Handle of ethdev port.
+ * @param parent_port_id
+ *  Port ID of the backing device.
  * @param type
  *  Representor type.
  * @param controller
@@ -1266,7 +1266,7 @@ struct rte_eth_devargs {
  */
 __rte_internal
 int
-rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
+rte_eth_representor_id_get(uint16_t parent_port_id,
 			   enum rte_eth_representor_type type,
 			   int controller, int pf, int representor_port,
 			   uint16_t *repr_id);
diff --git a/lib/ethdev/rte_class_eth.c b/lib/ethdev/rte_class_eth.c
index 1fe5fa1f36..e3b7ab9728 100644
--- a/lib/ethdev/rte_class_eth.c
+++ b/lib/ethdev/rte_class_eth.c
@@ -95,7 +95,7 @@ eth_representor_cmp(const char *key __rte_unused,
 		c = i / (np * nf);
 		p = (i / nf) % np;
 		f = i % nf;
-		if (rte_eth_representor_id_get(edev,
+		if (rte_eth_representor_id_get(edev->data->parent_port_id,
 			eth_da.type,
 			eth_da.nb_mh_controllers == 0 ? -1 :
 					eth_da.mh_controllers[c],
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 6ebf52b641..acda1d43fb 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -5997,7 +5997,7 @@ rte_eth_devargs_parse(const char *dargs, struct rte_eth_devargs *eth_da)
 }
 
 int
-rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
+rte_eth_representor_id_get(uint16_t parent_port_id,
 			   enum rte_eth_representor_type type,
 			   int controller, int pf, int representor_port,
 			   uint16_t *repr_id)
@@ -6012,7 +6012,7 @@ rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
 		return -EINVAL;
 
 	/* Get PMD representor range info. */
-	ret = rte_eth_representor_info_get(ethdev->data->port_id, NULL);
+	ret = rte_eth_representor_info_get(parent_port_id, NULL);
 	if (ret == -ENOTSUP && type == RTE_ETH_REPRESENTOR_VF &&
 	    controller == -1 && pf == -1) {
 		/* Direct mapping for legacy VF representor. */
@@ -6026,7 +6026,7 @@ rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
 	info = calloc(1, size);
 	if (info == NULL)
 		return -ENOMEM;
-	ret = rte_eth_representor_info_get(ethdev->data->port_id, info);
+	ret = rte_eth_representor_info_get(parent_port_id, info);
 	if (ret < 0)
 		goto out;
 
@@ -6045,7 +6045,7 @@ rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
 			continue;
 		if (info->ranges[i].id_end < info->ranges[i].id_base) {
 			RTE_LOG(WARNING, EAL, "Port %hu invalid representor ID Range %u - %u, entry %d\n",
-				ethdev->data->port_id, info->ranges[i].id_base,
+				parent_port_id, info->ranges[i].id_base,
 				info->ranges[i].id_end, i);
 			continue;
 
diff --git a/lib/ethdev/rte_ethdev_core.h b/lib/ethdev/rte_ethdev_core.h
index edf96de2dc..13cb84b52f 100644
--- a/lib/ethdev/rte_ethdev_core.h
+++ b/lib/ethdev/rte_ethdev_core.h
@@ -185,6 +185,10 @@ struct rte_eth_dev_data {
 			/**< Switch-specific identifier.
 			 *   Valid if RTE_ETH_DEV_REPRESENTOR in dev_flags.
 			 */
+	uint16_t parent_port_id;
+			/**< Port ID of the backing device.
+			 *   Valid if RTE_ETH_DEV_REPRESENTOR in dev_flags.
+			 */
 
 	pthread_mutex_t flow_ops_mutex; /**< rte_flow ops mutex. */
 	uint64_t reserved_64s[4]; /**< Reserved for future fields */
-- 
2.30.2


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2] dmadev: introduce DMA device library
    2021-07-12 12:05  3%   ` Bruce Richardson
@ 2021-07-12 15:50  3%   ` Bruce Richardson
  2021-07-13  9:07  0%     ` Jerin Jacob
  2021-07-13 14:19  3%   ` Ananyev, Konstantin
  2 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2021-07-12 15:50 UTC (permalink / raw)
  To: Chengwen Feng
  Cc: thomas, ferruh.yigit, jerinj, jerinjacobk, dev, mb, nipun.gupta,
	hemant.agrawal, maxime.coquelin, honnappa.nagarahalli,
	david.marchand, sburla, pkapoor, konstantin.ananyev, liangma

On Sun, Jul 11, 2021 at 05:25:56PM +0800, Chengwen Feng wrote:
> This patch introduce 'dmadevice' which is a generic type of DMA
> device.
> 
> The APIs of dmadev library exposes some generic operations which can
> enable configuration and I/O with the DMA devices.
> 
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>

Hi again,

some further review comments inline.

/Bruce

> ---
>  MAINTAINERS                  |    4 +
>  config/rte_config.h          |    3 +
>  lib/dmadev/meson.build       |    6 +
>  lib/dmadev/rte_dmadev.c      |  560 +++++++++++++++++++++++
>  lib/dmadev/rte_dmadev.h      | 1030 ++++++++++++++++++++++++++++++++++++++++++
>  lib/dmadev/rte_dmadev_core.h |  159 +++++++
>  lib/dmadev/rte_dmadev_pmd.h  |   72 +++
>  lib/dmadev/version.map       |   40 ++
>  lib/meson.build              |    1 +

<snip>

> diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
> new file mode 100644
> index 0000000..8779512
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev.h
> @@ -0,0 +1,1030 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2021 HiSilicon Limited.
> + * Copyright(c) 2021 Intel Corporation.
> + * Copyright(c) 2021 Marvell International Ltd.
> + */
> +
> +#ifndef _RTE_DMADEV_H_
> +#define _RTE_DMADEV_H_
> +
> +/**
> + * @file rte_dmadev.h
> + *
> + * RTE DMA (Direct Memory Access) device APIs.
> + *
> + * The DMA framework is built on the following model:
> + *
> + *     ---------------   ---------------       ---------------
> + *     | virtual DMA |   | virtual DMA |       | virtual DMA |
> + *     | channel     |   | channel     |       | channel     |
> + *     ---------------   ---------------       ---------------
> + *            |                |                      |
> + *            ------------------                      |
> + *                     |                              |
> + *               ------------                    ------------
> + *               |  dmadev  |                    |  dmadev  |
> + *               ------------                    ------------
> + *                     |                              |
> + *            ------------------               ------------------
> + *            | HW-DMA-channel |               | HW-DMA-channel |
> + *            ------------------               ------------------
> + *                     |                              |
> + *                     --------------------------------
> + *                                     |
> + *                           ---------------------
> + *                           | HW-DMA-Controller |
> + *                           ---------------------
> + *
> + * The DMA controller could have multiple HW-DMA-channels (aka. HW-DMA-queues),
> + * each HW-DMA-channel should be represented by a dmadev.
> + *
> + * The dmadev could create multiple virtual DMA channel, each virtual DMA
> + * channel represents a different transfer context. The DMA operation request
> + * must be submitted to the virtual DMA channel.
> + * E.G. Application could create virtual DMA channel 0 for mem-to-mem transfer
> + *      scenario, and create virtual DMA channel 1 for mem-to-dev transfer
> + *      scenario.
> + *
> + * The dmadev are dynamically allocated by rte_dmadev_pmd_allocate() during the
> + * PCI/SoC device probing phase performed at EAL initialization time. And could
> + * be released by rte_dmadev_pmd_release() during the PCI/SoC device removing
> + * phase.
> + *
> + * We use 'uint16_t dev_id' as the device identifier of a dmadev, and
> + * 'uint16_t vchan' as the virtual DMA channel identifier in one dmadev.
> + *
> + * The functions exported by the dmadev API to setup a device designated by its
> + * device identifier must be invoked in the following order:
> + *     - rte_dmadev_configure()
> + *     - rte_dmadev_vchan_setup()
> + *     - rte_dmadev_start()
> + *
> + * Then, the application can invoke dataplane APIs to process jobs.
> + *
> + * If the application wants to change the configuration (i.e. call
> + * rte_dmadev_configure()), it must call rte_dmadev_stop() first to stop the
> + * device and then do the reconfiguration before calling rte_dmadev_start()
> + * again. The dataplane APIs should not be invoked when the device is stopped.
> + *
> + * Finally, an application can close a dmadev by invoking the
> + * rte_dmadev_close() function.
> + *
> + * The dataplane APIs include two parts:
> + *   a) The first part is the submission of operation requests:
> + *        - rte_dmadev_copy()
> + *        - rte_dmadev_copy_sg() - scatter-gather form of copy
> + *        - rte_dmadev_fill()
> + *        - rte_dmadev_fill_sg() - scatter-gather form of fill
> + *        - rte_dmadev_perform() - issue doorbell to hardware
> + *      These APIs could work with different virtual DMA channels which have
> + *      different contexts.
> + *      The first four APIs are used to submit the operation request to the
> + *      virtual DMA channel, if the submission is successful, a uint16_t
> + *      ring_idx is returned, otherwise a negative number is returned.
> + *   b) The second part is to obtain the result of requests:
> + *        - rte_dmadev_completed()
> + *            - return the number of operation requests completed successfully.
> + *        - rte_dmadev_completed_fails()
> + *            - return the number of operation requests failed to complete.

Please rename this to "completed_status" to allow the return of information
other than just errors. As I suggested before, I think this should also be
usable as a slower version of "completed" even in the case where there are
no errors, in that it returns status information for each and every job
rather than just returning as soon as it hits a failure.
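
As a rough illustration of the difference (a sketch only: the function
names, the status enum and the simulated "done" array are all hypothetical,
and none of this is the proposed API), the fast call stops counting at the
first failed job, while the status call reports on every finished job:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status codes, for illustration only. */
enum job_status { JOB_OK = 0, JOB_ERR = 1 };

/* "completed" style: count successful jobs, stop at the first failure. */
static inline uint16_t
sim_completed(const enum job_status *done, uint16_t nb_done,
              uint16_t nb_cpls, bool *has_error)
{
    uint16_t n = 0;

    *has_error = false;
    while (n < nb_done && n < nb_cpls) {
        if (done[n] != JOB_OK) {
            *has_error = true;
            break;
        }
        n++;
    }
    return n;
}

/* "completed_status" style: copy out the status of every finished job. */
static inline uint16_t
sim_completed_status(const enum job_status *done, uint16_t nb_done,
                     uint16_t nb_cpls, enum job_status *status)
{
    uint16_t n;

    for (n = 0; n < nb_done && n < nb_cpls; n++)
        status[n] = done[n];
    return n;
}
```

With four finished jobs where the third failed, the first function would
return 2 and flag an error, and the second would return 4 with one status
entry per job.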

> + *
> + * About the ring_idx which rte_dmadev_copy/copy_sg/fill/fill_sg() returned,
> + * the rules are as follows:
> + *   a) ring_idx for each virtual DMA channel are independent.
> + *   b) For a virtual DMA channel, the ring_idx is monotonically incremented,
> + *      when it reach UINT16_MAX, it wraps back to zero.

Based on other feedback, I suggest we put in the detail here that: "This
index can be used by applications to track per-job metadata in an
application-defined circular ring, where the ring is a power-of-2 size, and
the indexes are masked appropriately."
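
A minimal sketch of that usage, assuming an application-side ring of 64
entries (the size, the job_meta structure and the helper names are all
hypothetical). Because 65536 is a multiple of every power of 2 up to 65536,
the masked slot sequence continues without a jump when ring_idx wraps from
UINT16_MAX back to zero:

```c
#include <assert.h>
#include <stdint.h>

#define JOB_RING_SIZE 64u                 /* hypothetical; must be a power of 2 */
#define JOB_RING_MASK (JOB_RING_SIZE - 1u)

/* Hypothetical per-job metadata kept by the application. */
struct job_meta {
    void *user_ctx;
};

static struct job_meta job_ring[JOB_RING_SIZE];

/* Map a 16-bit ring_idx onto a slot of the application ring. */
static inline uint16_t
job_slot(uint16_t ring_idx)
{
    return (uint16_t)(ring_idx & JOB_RING_MASK);
}

/* Store metadata at enqueue time, keyed by the returned ring_idx. */
static inline void
job_meta_store(uint16_t ring_idx, void *user_ctx)
{
    job_ring[job_slot(ring_idx)].user_ctx = user_ctx;
}

/* Look the metadata up again once the job is reported as completed. */
static inline void *
job_meta_load(uint16_t ring_idx)
{
    return job_ring[job_slot(ring_idx)].user_ctx;
}
```

This only works if the application never has more than JOB_RING_SIZE jobs
in flight on one virtual channel, otherwise slots are overwritten.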

> + *   c) The initial ring_idx of a virtual DMA channel is zero, after the device
> + *      is stopped or reset, the ring_idx needs to be reset to zero.
> + *   Example:
> + *      step-1: start one dmadev
> + *      step-2: enqueue a copy operation, the ring_idx return is 0
> + *      step-3: enqueue a copy operation again, the ring_idx return is 1
> + *      ...
> + *      step-101: stop the dmadev
> + *      step-102: start the dmadev
> + *      step-103: enqueue a copy operation, the cookie return is 0
> + *      ...
> + *      step-x+0: enqueue a fill operation, the ring_idx return is 65535
> + *      step-x+1: enqueue a copy operation, the ring_idx return is 0
> + *      ...
> + *
> + * By default, all the non-dataplane functions of the dmadev API exported by a
> + * PMD are lock-free functions which assume to not be invoked in parallel on
> + * different logical cores to work on the same target object.
> + *
> + * The dataplane functions of the dmadev API exported by a PMD can be MT-safe
> + * only when supported by the driver, generally, the driver will reports two
> + * capabilities:
> + *   a) Whether to support MT-safe for the submit/completion API of the same
> + *      virtual DMA channel.
> + *      E.G. one thread do submit operation, another thread do completion
> + *           operation.
> + *      If driver support it, then declare RTE_DMA_DEV_CAPA_MT_VCHAN.
> + *      If driver don't support it, it's up to the application to guarantee
> + *      MT-safe.
> + *   b) Whether to support MT-safe for different virtual DMA channels.
> + *      E.G. one thread do operation on virtual DMA channel 0, another thread
> + *           do operation on virtual DMA channel 1.
> + *      If driver support it, then declare RTE_DMA_DEV_CAPA_MT_MULTI_VCHAN.
> + *      If driver don't support it, it's up to the application to guarantee
> + *      MT-safe.
> + *
> + */

Just to check - do we have hardware that currently supports these
capabilities? For Intel HW, we will only support one virtual channel per
device without any MT-safety guarantees, so won't be setting either of
these flags. If any of these flags are unused in all planned drivers, we
should drop them from the spec until they prove necessary. Ideally,
everything in the dmadev definition should be testable, and features unused
by anyone obviously will be untested.

> +
> +#include <rte_common.h>
> +#include <rte_compat.h>
> +#include <rte_errno.h>
> +#include <rte_memory.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#define RTE_DMADEV_NAME_MAX_LEN	RTE_DEV_NAME_MAX_LEN
> +
> +extern int rte_dmadev_logtype;
> +
> +#define RTE_DMADEV_LOG(level, ...) \
> +	rte_log(RTE_LOG_ ## level, rte_dmadev_logtype, "" __VA_ARGS__)
> +
> +/* Macros to check for valid port */
> +#define RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, retval) do { \
> +	if (!rte_dmadev_is_valid_dev(dev_id)) { \
> +		RTE_DMADEV_LOG(ERR, "Invalid dev_id=%u\n", dev_id); \
> +		return retval; \
> +	} \
> +} while (0)
> +
> +#define RTE_DMADEV_VALID_DEV_ID_OR_RET(dev_id) do { \
> +	if (!rte_dmadev_is_valid_dev(dev_id)) { \
> +		RTE_DMADEV_LOG(ERR, "Invalid dev_id=%u\n", dev_id); \
> +		return; \
> +	} \
> +} while (0)
> +

Can we avoid using these in the inline functions in this file, and move
them to the _pmd.h which is for internal PMD use only? It would mean we
don't get logging from the key dataplane functions, but I would hope the
return values would provide enough info.

Alternatively, can we keep the logtype definition and first macro and move
the other two to the _pmd.h file?

> +/**
> + * @internal
> + * Validate if the DMA device index is a valid attached DMA device.
> + *
> + * @param dev_id
> + *   DMA device index.
> + *
> + * @return
> + *   - If the device index is valid (true) or not (false).
> + */
> +__rte_internal
> +bool
> +rte_dmadev_is_valid_dev(uint16_t dev_id);
> +
> +/**
> + * rte_dma_sg - can hold scatter DMA operation request
> + */
> +struct rte_dma_sg {
> +	rte_iova_t src;
> +	rte_iova_t dst;
> +	uint32_t length;
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Get the total number of DMA devices that have been successfully
> + * initialised.
> + *
> + * @return
> + *   The total number of usable DMA devices.
> + */
> +__rte_experimental
> +uint16_t
> +rte_dmadev_count(void);
> +
> +/**
> + * The capabilities of a DMA device
> + */
> +#define RTE_DMA_DEV_CAPA_MEM_TO_MEM	(1ull << 0)
> +/**< DMA device support mem-to-mem transfer.

Do we need this? Can we assume that any device appearing as a dmadev can
do mem-to-mem copies, and drop the capability for mem-to-mem and the
capability for copying?

> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_MEM_TO_DEV	(1ull << 1)
> +/**< DMA device support slave mode & mem-to-dev transfer.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_DEV_TO_MEM	(1ull << 2)
> +/**< DMA device support slave mode & dev-to-mem transfer.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_DEV_TO_DEV	(1ull << 3)
> +/**< DMA device support slave mode & dev-to-dev transfer.
> + *

Just to confirm, are there devices currently planned for dmadev that
support only a subset of these flags? Thinking particularly of the
dev-2-mem and mem-2-dev ones here - do any of the devices we are
considering not support using device memory?
[Again, just want to ensure we aren't adding too much stuff that we don't
need yet]
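
For reference, this is roughly what every application would have to do if
the direction flags stay. It is a sketch only: the macro values are copied
from the quoted v2 patch and may change, and capa_has_all() is a
hypothetical helper, not part of the proposed API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as they appear in the quoted v2 patch; subject to change. */
#define RTE_DMA_DEV_CAPA_MEM_TO_MEM (1ull << 0)
#define RTE_DMA_DEV_CAPA_MEM_TO_DEV (1ull << 1)
#define RTE_DMA_DEV_CAPA_DEV_TO_MEM (1ull << 2)
#define RTE_DMA_DEV_CAPA_DEV_TO_DEV (1ull << 3)

/* Hypothetical helper: true only if every wanted bit is set in dev_capa. */
static inline bool
capa_has_all(uint64_t dev_capa, uint64_t wanted)
{
    return (dev_capa & wanted) == wanted;
}
```

An application needing both mem-to-dev and dev-to-mem transfers would pass
the OR of the two flags as "wanted" and fall back when the check fails.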

> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_OPS_COPY	(1ull << 4)
> +/**< DMA device support copy ops.
> + *

Suggest dropping this and making copy support the minimum requirement for
a dmadev.

> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_OPS_FILL	(1ull << 5)
> +/**< DMA device support fill ops.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_OPS_SG		(1ull << 6)
> +/**< DMA device support scatter-list ops.
> + * If device support ops_copy and ops_sg, it means supporting copy_sg ops.
> + * If device support ops_fill and ops_sg, it means supporting fill_sg ops.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_FENCE		(1ull << 7)
> +/**< DMA device support fence.
> + * If device support fence, then application could set a fence flags when
> + * enqueue operation by rte_dma_copy/copy_sg/fill/fill_sg.
> + * If a operation has a fence flags, it means the operation must be processed
> + * only after all previous operations are completed.
> + *

Is this needed? As I understand it, the Marvell driver doesn't require
fences so providing one is a no-op. Therefore, this flag is probably
unnecessary.

> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_SVA		(1ull << 8)
> +/**< DMA device support SVA which could use VA as DMA address.
> + * If device support SVA then application could pass any VA address like memory
> + * from rte_malloc(), rte_memzone(), malloc, stack memory.
> + * If device don't support SVA, then application should pass IOVA address which
> + * from rte_malloc(), rte_memzone().
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_MT_VCHAN	(1ull << 9)
> +/**< DMA device support MT-safe of a virtual DMA channel.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_MT_MULTI_VCHAN	(1ull << 10)
> +/**< DMA device support MT-safe of different virtual DMA channels.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */

As with comments above - let's check that these will actually be used
before we add them.

> +
> +/**
> + * A structure used to retrieve the contextual information of
> + * an DMA device
> + */
> +struct rte_dmadev_info {
> +	struct rte_device *device; /**< Generic Device information */
> +	uint64_t dev_capa; /**< Device capabilities (RTE_DMA_DEV_CAPA_) */
> +	/** Maximum number of virtual DMA channels supported */
> +	uint16_t max_vchans;
> +	/** Maximum allowed number of virtual DMA channel descriptors */
> +	uint16_t max_desc;
> +	/** Minimum allowed number of virtual DMA channel descriptors */
> +	uint16_t min_desc;
> +	uint16_t nb_vchans; /**< Number of virtual DMA channel configured */
> +};

Let's add rte_dmadev_conf struct into this to return the configuration
settings.

> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Retrieve the contextual information of a DMA device.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param[out] dev_info
> + *   A pointer to a structure of type *rte_dmadev_info* to be filled with the
> + *   contextual information of the device.
> + *
> + * @return
> + *   - =0: Success, driver updates the contextual information of the DMA device
> + *   - <0: Error code returned by the driver info get function.
> + *
> + */
> +__rte_experimental
> +int
> +rte_dmadev_info_get(uint16_t dev_id, struct rte_dmadev_info *dev_info);
> +

Should have "const" on second param.

> +/**
> + * A structure used to configure a DMA device.
> + */
> +struct rte_dmadev_conf {
> +	/** Maximum number of virtual DMA channel to use.
> +	 * This value cannot be greater than the field 'max_vchans' of struct
> +	 * rte_dmadev_info which get from rte_dmadev_info_get().
> +	 */
> +	uint16_t max_vchans;
> +	/** Enable bit for MT-safe of a virtual DMA channel.
> +	 * This bit can be enabled only when the device supports
> +	 * RTE_DMA_DEV_CAPA_MT_VCHAN.
> +	 * @see RTE_DMA_DEV_CAPA_MT_VCHAN
> +	 */
> +	uint8_t enable_mt_vchan : 1;
> +	/** Enable bit for MT-safe of different virtual DMA channels.
> +	 * This bit can be enabled only when the device supports
> +	 * RTE_DMA_DEV_CAPA_MT_MULTI_VCHAN.
> +	 * @see RTE_DMA_DEV_CAPA_MT_MULTI_VCHAN
> +	 */
> +	uint8_t enable_mt_multi_vchan : 1;
> +	uint64_t reserved[2]; /**< Reserved for future fields */
> +};

Drop the reserved fields. ABI versioning is a better way to deal with
adding new fields.

> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Configure a DMA device.
> + *
> + * This function must be invoked first before any other function in the
> + * API. This function can also be re-invoked when a device is in the
> + * stopped state.
> + *
> + * @param dev_id
> + *   The identifier of the device to configure.
> + * @param dev_conf
> + *   The DMA device configuration structure encapsulated into rte_dmadev_conf
> + *   object.
> + *
> + * @return
> + *   - =0: Success, device configured.
> + *   - <0: Error code returned by the driver configuration function.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_configure(uint16_t dev_id, const struct rte_dmadev_conf *dev_conf);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Start a DMA device.
> + *
> + * The device start step is the last one and consists of setting the DMA
> + * to start accepting jobs.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + *
> + * @return
> + *   - =0: Success, device started.
> + *   - <0: Error code returned by the driver start function.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_start(uint16_t dev_id);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Stop a DMA device.
> + *
> + * The device can be restarted with a call to rte_dmadev_start()
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + *
> + * @return
> + *   - =0: Success, device stopped.
> + *   - <0: Error code returned by the driver stop function.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_stop(uint16_t dev_id);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Close a DMA device.
> + *
> + * The device cannot be restarted after this call.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + *
> + * @return
> + *  - =0: Successfully close device
> + *  - <0: Failure to close device
> + */
> +__rte_experimental
> +int
> +rte_dmadev_close(uint16_t dev_id);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Reset a DMA device.
> + *
> + * This is different from cycle of rte_dmadev_start->rte_dmadev_stop in the
> + * sense similar to hard or soft reset.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + *
> + * @return
> + *   - =0: Successfully reset device.
> + *   - <0: Failure to reset device.
> + *   - (-ENOTSUP): If the device doesn't support this function.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_reset(uint16_t dev_id);
> +
> +/**
> + * DMA transfer direction defines.
> + */
> +#define RTE_DMA_MEM_TO_MEM	(1ull << 0)
> +/**< DMA transfer direction - from memory to memory.
> + *
> + * @see struct rte_dmadev_vchan_conf::direction
> + */
> +#define RTE_DMA_MEM_TO_DEV	(1ull << 1)
> +/**< DMA transfer direction - slave mode & from memory to device.
> + * In a typical scenario, ARM SoCs are installed on x86 servers as iNICs. In
> + * this case, the ARM SoCs works in slave mode, it could initiate a DMA move
> + * request from ARM memory to x86 host memory.

For clarity, it would be good to specify, for the scenario described, which
memory is the "mem" and which is the "dev". (I assume SoC memory is "mem"
and x86 host memory is "dev"?)

> + *
> + * @see struct rte_dmadev_vchan_conf::direction
> + */
> +#define RTE_DMA_DEV_TO_MEM	(1ull << 2)
> +/**< DMA transfer direction - slave mode & from device to memory.
> + * In a typical scenario, ARM SoCs are installed on x86 servers as iNICs. In
> + * this case, the ARM SoCs works in slave mode, it could initiate a DMA move
> + * request from x86 host memory to ARM memory.
> + *
> + * @see struct rte_dmadev_vchan_conf::direction
> + */
> +#define RTE_DMA_DEV_TO_DEV	(1ull << 3)
> +/**< DMA transfer direction - slave mode & from device to device.
> + * In a typical scenario, ARM SoCs are installed on x86 servers as iNICs. In
> + * this case, the ARM SoCs works in slave mode, it could initiate a DMA move
> + * request from x86 host memory to another x86 host memory.
> + *
> + * @see struct rte_dmadev_vchan_conf::direction
> + */
> +#define RTE_DMA_TRANSFER_DIR_ALL	(RTE_DMA_MEM_TO_MEM | \
> +					 RTE_DMA_MEM_TO_DEV | \
> +					 RTE_DMA_DEV_TO_MEM | \
> +					 RTE_DMA_DEV_TO_DEV)
> +
> +/**
> + * enum rte_dma_slave_port_type - slave mode type defines
> + */
> +enum rte_dma_slave_port_type {
> +	/** The slave port is PCIE. */
> +	RTE_DMA_SLAVE_PORT_PCIE = 1,
> +};
> +

As previously mentioned, this needs to be updated to use other terms.
For some suggested alternatives see:
https://doc.dpdk.org/guides-21.05/contributing/coding_style.html#naming

> +/**
> + * A structure used to descript slave port parameters.
> + */
> +struct rte_dma_slave_port_parameters {
> +	enum rte_dma_slave_port_type port_type;
> +	union {
> +		/** For PCIE port */
> +		struct {
> +			/** The physical function number which to use */
> +			uint64_t pf_number : 6;
> +			/** Virtual function enable bit */
> +			uint64_t vf_enable : 1;
> +			/** The virtual function number which to use */
> +			uint64_t vf_number : 8;
> +			uint64_t pasid : 20;
> +			/** The attributes filed in TLP packet */
> +			uint64_t tlp_attr : 3;
> +		};
> +	};
> +};
> +
> +/**
> + * A structure used to configure a virtual DMA channel.
> + */
> +struct rte_dmadev_vchan_conf {
> +	uint8_t direction; /**< Set of supported transfer directions */
> +	/** Number of descriptor for the virtual DMA channel */
> +	uint16_t nb_desc;
> +	/** 1) Used to describes the dev parameter in the mem-to-dev/dev-to-mem
> +	 * transfer scenario.
> +	 * 2) Used to describes the src dev parameter in the dev-to-dev
> +	 * transfer scenario.
> +	 */
> +	struct rte_dma_slave_port_parameters port;
> +	/** Used to describes the dst dev parameters in the dev-to-dev
> +	 * transfer scenario.
> +	 */
> +	struct rte_dma_slave_port_parameters peer_port;
> +	uint64_t reserved[2]; /**< Reserved for future fields */
> +};

Let's drop the reserved fields and use ABI versioning if necessary in
future.

> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Allocate and set up a virtual DMA channel.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param conf
> + *   The virtual DMA channel configuration structure encapsulated into
> + *   rte_dmadev_vchan_conf object.
> + *
> + * @return
> + *   - >=0: Allocate success, it is the virtual DMA channel id. This value must
> + *          be less than the field 'max_vchans' of struct rte_dmadev_conf
> +	    which configured by rte_dmadev_configure().

nit: whitespace error here.

> + *   - <0: Error code returned by the driver virtual channel setup function.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_vchan_setup(uint16_t dev_id,
> +		       const struct rte_dmadev_vchan_conf *conf);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Release a virtual DMA channel.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel which return by vchan setup.
> + *
> + * @return
> + *   - =0: Successfully release the virtual DMA channel.
> + *   - <0: Error code returned by the driver virtual channel release function.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_vchan_release(uint16_t dev_id, uint16_t vchan);
> +
> +/**
> + * rte_dmadev_stats - running statistics.
> + */
> +struct rte_dmadev_stats {
> +	/** Count of operations which were successfully enqueued */
> +	uint64_t enqueued_count;
> +	/** Count of operations which were submitted to hardware */
> +	uint64_t submitted_count;
> +	/** Count of operations which failed to complete */
> +	uint64_t completed_fail_count;
> +	/** Count of operations which successfully complete */
> +	uint64_t completed_count;
> +	uint64_t reserved[4]; /**< Reserved for future fields */
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Retrieve basic statistics of a or all virtual DMA channel(s).
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel, -1 means all channels.
> + * @param[out] stats
> + *   The basic statistics structure encapsulated into rte_dmadev_stats
> + *   object.
> + *
> + * @return
> + *   - =0: Successfully retrieve stats.
> + *   - <0: Failure to retrieve stats.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_stats_get(uint16_t dev_id, int vchan,

vchan as uint16_t rather than int, I think. This would apply to all
dataplane functions. There is no need for a signed vchan value.
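
A minimal sketch of how the "all channels" case could still be expressed
with an unsigned vchan. The sentinel name DEMO_ALL_VCHAN and the demo_*
types are made up for illustration and are not part of the patch; this only
shows that a reserved uint16_t value can replace the -1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel, not in the patch: selects every vchan. */
#define DEMO_ALL_VCHAN  UINT16_MAX
#define DEMO_MAX_VCHANS 4

struct demo_stats {
	uint64_t enqueued_count;
};

/* Stand-in for the per-vchan counters a driver would keep. */
static struct demo_stats demo_vchan_stats[DEMO_MAX_VCHANS] = {
	{ 10 }, { 20 }, { 30 }, { 40 },
};

/* Returns 0 on success, -1 if vchan is out of range. */
static int
demo_stats_get(uint16_t vchan, struct demo_stats *stats)
{
	uint16_t i;

	stats->enqueued_count = 0;
	if (vchan == DEMO_ALL_VCHAN) {
		/* Sum the counters of all channels. */
		for (i = 0; i < DEMO_MAX_VCHANS; i++)
			stats->enqueued_count +=
				demo_vchan_stats[i].enqueued_count;
		return 0;
	}
	if (vchan >= DEMO_MAX_VCHANS)
		return -1;
	stats->enqueued_count = demo_vchan_stats[vchan].enqueued_count;
	return 0;
}
```

The cost is that one vchan id becomes unusable, which should not matter
given max_vchans is itself a uint16_t.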

> +		     struct rte_dmadev_stats *stats);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Reset basic statistics of a or all virtual DMA channel(s).
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel, -1 means all channels.
> + *
> + * @return
> + *   - =0: Successfully reset stats.
> + *   - <0: Failure to reset stats.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_stats_reset(uint16_t dev_id, int vchan);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Dump DMA device info.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param f
> + *   The file to write the output to.
> + *
> + * @return
> + *   0 on success. Non-zero otherwise.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_dump(uint16_t dev_id, FILE *f);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Trigger the dmadev self test.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + *
> + * @return
> + *   - 0: Selftest successful.
> + *   - -ENOTSUP if the device doesn't support selftest
> + *   - other values < 0 on failure.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_selftest(uint16_t dev_id);

I don't think this needs to be in the public API, since it should only be
for the autotest app to use. Maybe move the prototype to the _pmd.h (since
we don't have a separate internal header), and then the autotest app can
pick it up from there.

> +
> +#include "rte_dmadev_core.h"
> +
> +/**
> + *  DMA flags to augment operation preparation.
> + *  Used as the 'flags' parameter of rte_dmadev_copy/copy_sg/fill/fill_sg.
> + */
> +#define RTE_DMA_FLAG_FENCE	(1ull << 0)
> +/**< DMA fence flag
> + * It means the operation with this flag must be processed only after all
> + * previous operations are completed.
> + *
> + * @see rte_dmadev_copy()
> + * @see rte_dmadev_copy_sg()
> + * @see rte_dmadev_fill()
> + * @see rte_dmadev_fill_sg()
> + */

As a general comment, I think all these multi-line comments should go
before the item they describe. Comments after should only be used in the
case where the comment fits on the rest of the line after a value.

We also should define the SUBMIT flag as suggested by Jerin, to allow apps
to automatically submit jobs after enqueue.
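
To illustrate both points, here is a rough sketch using stand-in names (the
DEMO_* flags and demo_* functions are hypothetical, and the flag value is
only an assumption). The multi-line comments come before the defines, the
short ones trail the struct members, and the submit flag makes the enqueue
ring the doorbell itself.

```c
#include <assert.h>
#include <stdint.h>

/**
 * Hypothetical fence flag.
 * The operation is processed only after all previous ones complete.
 */
#define DEMO_FLAG_FENCE  (1ull << 0)
/**
 * Hypothetical submit flag.
 * The enqueue also triggers hardware, so no separate submit call is needed.
 */
#define DEMO_FLAG_SUBMIT (1ull << 1)

struct demo_vchan {
	uint16_t enqueued;  /**< Jobs written to the ring. */
	uint16_t submitted; /**< Jobs made visible to hardware. */
};

/* Stand-in for the doorbell write. */
static void
demo_submit(struct demo_vchan *vc)
{
	vc->submitted = vc->enqueued;
}

/* Enqueue one job and return its index. */
static int
demo_copy(struct demo_vchan *vc, uint64_t flags)
{
	int idx = vc->enqueued++;

	if (flags & DEMO_FLAG_SUBMIT)
		demo_submit(vc);
	return idx;
}
```

An app doing single copies would pass the submit flag on every call, while
one batching jobs would set it only on the last enqueue of the burst.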

> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enqueue a copy operation onto the virtual DMA channel.
> + *
> + * This queues up a copy operation to be performed by hardware, but does not
> + * trigger hardware to begin that operation.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param src
> + *   The address of the source buffer.
> + * @param dst
> + *   The address of the destination buffer.
> + * @param length
> + *   The length of the data to be copied.
> + * @param flags
> + *   An flags for this operation.
> + *
> + * @return
> + *   - 0..UINT16_MAX: index of enqueued copy job.
> + *   - <0: Error code returned by the driver copy function.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_copy(uint16_t dev_id, uint16_t vchan, rte_iova_t src, rte_iova_t dst,
> +		uint32_t length, uint64_t flags)
> +{
> +	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +#ifdef RTE_DMADEV_DEBUG
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->copy, -ENOTSUP);
> +	if (vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> +		return -EINVAL;
> +	}
> +#endif
> +	return (*dev->copy)(dev, vchan, src, dst, length, flags);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enqueue a scatter list copy operation onto the virtual DMA channel.
> + *
> + * This queues up a scatter list copy operation to be performed by hardware,
> + * but does not trigger hardware to begin that operation.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param sg
> + *   The pointer of scatterlist.
> + * @param sg_len
> + *   The number of scatterlist elements.
> + * @param flags
> + *   An flags for this operation.
> + *
> + * @return
> + *   - 0..UINT16_MAX: index of enqueued copy job.
> + *   - <0: Error code returned by the driver copy function.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vchan, const struct rte_dma_sg *sg,
> +		   uint32_t sg_len, uint64_t flags)
> +{
> +	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +#ifdef RTE_DMADEV_DEBUG
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(sg, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->copy_sg, -ENOTSUP);
> +	if (vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> +		return -EINVAL;
> +	}
> +#endif
> +	return (*dev->copy_sg)(dev, vchan, sg, sg_len, flags);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enqueue a fill operation onto the virtual DMA channel.
> + *
> + * This queues up a fill operation to be performed by hardware, but does not
> + * trigger hardware to begin that operation.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param pattern
> + *   The pattern to populate the destination buffer with.
> + * @param dst
> + *   The address of the destination buffer.
> + * @param length
> + *   The length of the destination buffer.
> + * @param flags
> + *   An flags for this operation.
> + *
> + * @return
> + *   - 0..UINT16_MAX: index of enqueued copy job.
> + *   - <0: Error code returned by the driver copy function.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_fill(uint16_t dev_id, uint16_t vchan, uint64_t pattern,
> +		rte_iova_t dst, uint32_t length, uint64_t flags)
> +{
> +	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +#ifdef RTE_DMADEV_DEBUG
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->fill, -ENOTSUP);
> +	if (vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> +		return -EINVAL;
> +	}
> +#endif
> +	return (*dev->fill)(dev, vchan, pattern, dst, length, flags);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enqueue a scatter list fill operation onto the virtual DMA channel.
> + *
> + * This queues up a scatter list fill operation to be performed by hardware,
> + * but does not trigger hardware to begin that operation.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param pattern
> + *   The pattern to populate the destination buffer with.
> + * @param sg
> + *   The pointer of scatterlist.
> + * @param sg_len
> + *   The number of scatterlist elements.
> + * @param flags
> + *   An flags for this operation.
> + *
> + * @return
> + *   - 0..UINT16_MAX: index of enqueued copy job.
> + *   - <0: Error code returned by the driver copy function.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_fill_sg(uint16_t dev_id, uint16_t vchan, uint64_t pattern,
> +		   const struct rte_dma_sg *sg, uint32_t sg_len,
> +		   uint64_t flags)
> +{
> +	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +#ifdef RTE_DMADEV_DEBUG
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(sg, -ENOTSUP);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->fill, -ENOTSUP);
> +	if (vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> +		return -EINVAL;
> +	}
> +#endif
> +	return (*dev->fill_sg)(dev, vchan, pattern, sg, sg_len, flags);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Trigger hardware to begin performing enqueued operations.
> + *
> + * This API is used to write the "doorbell" to the hardware to trigger it
> + * to begin the operations previously enqueued by rte_dmadev_copy/fill()
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + *
> + * @return
> + *   - =0: Successfully trigger hardware.
> + *   - <0: Failure to trigger hardware.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_submit(uint16_t dev_id, uint16_t vchan)
> +{
> +	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +#ifdef RTE_DMADEV_DEBUG
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->submit, -ENOTSUP);
> +	if (vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> +		return -EINVAL;
> +	}
> +#endif
> +	return (*dev->submit)(dev, vchan);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Returns the number of operations that have been successfully completed.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param nb_cpls
> + *   The maximum number of completed operations that can be processed.
> + * @param[out] last_idx
> + *   The last completed operation's index.
> + *   If not required, NULL can be passed in.
> + * @param[out] has_error
> + *   Indicates if there are transfer error.
> + *   If not required, NULL can be passed in.
> + *
> + * @return
> + *   The number of operations that successfully completed.
> + */
> +__rte_experimental
> +static inline uint16_t
> +rte_dmadev_completed(uint16_t dev_id, uint16_t vchan, const uint16_t nb_cpls,
> +		     uint16_t *last_idx, bool *has_error)
> +{
> +	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +	uint16_t idx;
> +	bool err;
> +
> +#ifdef RTE_DMADEV_DEBUG
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->completed, -ENOTSUP);
> +	if (vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> +		return -EINVAL;
> +	}
> +	if (nb_cpls == 0) {
> +		RTE_DMADEV_LOG(ERR, "Invalid nb_cpls\n");
> +		return -EINVAL;
> +	}
> +#endif
> +
> +	/* Ensure the pointer values are non-null to simplify drivers.
> +	 * In most cases these should be compile time evaluated, since this is
> +	 * an inline function.
> +	 * - If NULL is explicitly passed as parameter, then compiler knows the
> +	 *   value is NULL
> +	 * - If address of local variable is passed as parameter, then compiler
> +	 *   can know it's non-NULL.
> +	 */
> +	if (last_idx == NULL)
> +		last_idx = &idx;
> +	if (has_error == NULL)
> +		has_error = &err;
> +
> +	*has_error = false;
> +	return (*dev->completed)(dev, vchan, nb_cpls, last_idx, has_error);
> +}
> +
> +/**
> + * DMA transfer status code defines
> + */
> +enum rte_dma_status_code {
> +	/** The operation completed successfully */
> +	RTE_DMA_STATUS_SUCCESSFUL = 0,
> +	/** The operation failed to complete due active drop
> +	 * This is mainly used when processing dev_stop, allow outstanding
> +	 * requests to be completed as much as possible.
> +	 */
> +	RTE_DMA_STATUS_ACTIVE_DROP,
> +	/** The operation failed to complete due invalid source address */
> +	RTE_DMA_STATUS_INVALID_SRC_ADDR,
> +	/** The operation failed to complete due invalid destination address */
> +	RTE_DMA_STATUS_INVALID_DST_ADDR,
> +	/** The operation failed to complete due invalid length */
> +	RTE_DMA_STATUS_INVALID_LENGTH,
> +	/** The operation failed to complete due invalid opcode
> +	 * The DMA descriptor could have multiple format, which are
> +	 * distinguished by the opcode field.
> +	 */
> +	RTE_DMA_STATUS_INVALID_OPCODE,
> +	/** The operation failed to complete due bus err */
> +	RTE_DMA_STATUS_BUS_ERROR,
> +	/** The operation failed to complete due data poison */
> +	RTE_DMA_STATUS_DATA_POISION,
> +	/** The operation failed to complete due descriptor read error */
> +	RTE_DMA_STATUS_DESCRIPTOR_READ_ERROR,
> +	/** The operation failed to complete due device link error
> +	 * Used to indicates that the link error in the mem-to-dev/dev-to-mem/
> +	 * dev-to-dev transfer scenario.
> +	 */
> +	RTE_DMA_STATUS_DEV_LINK_ERROR,
> +	/** Driver specific status code offset
> +	 * Start status code for the driver to define its own error code.
> +	 */
> +	RTE_DMA_STATUS_DRV_SPECIFIC_OFFSET = 0x10000,
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Returns the number of operations that failed to complete.
> + * NOTE: This API was used when rte_dmadev_completed has_error was set.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param nb_status
> + *   Indicates the size of status array.
> + * @param[out] status
> + *   The error code of operations that failed to complete.
> + *   Some standard error code are described in 'enum rte_dma_status_code'
> + *   @see rte_dma_status_code
> + * @param[out] last_idx
> + *   The last failed completed operation's index.
> + *
> + * @return
> + *   The number of operations that failed to complete.
> + */
> +__rte_experimental
> +static inline uint16_t
> +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vchan,
> +			   const uint16_t nb_status, uint32_t *status,
> +			   uint16_t *last_idx)
> +{
> +	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +#ifdef RTE_DMADEV_DEBUG
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(status, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(last_idx, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->completed_fails, -ENOTSUP);
> +	if (vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> +		return -EINVAL;
> +	}
> +	if (nb_status == 0) {
> +		RTE_DMADEV_LOG(ERR, "Invalid nb_status\n");
> +		return -EINVAL;
> +	}
> +#endif
> +	return (*dev->completed_fails)(dev, vchan, nb_status, status, last_idx);
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_DMADEV_H_ */
> diff --git a/lib/dmadev/rte_dmadev_core.h b/lib/dmadev/rte_dmadev_core.h
> new file mode 100644
> index 0000000..410faf0
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev_core.h
> @@ -0,0 +1,159 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2021 HiSilicon Limited.
> + * Copyright(c) 2021 Intel Corporation.
> + */
> +
> +#ifndef _RTE_DMADEV_CORE_H_
> +#define _RTE_DMADEV_CORE_H_
> +
> +/**
> + * @file
> + *
> + * RTE DMA Device internal header.
> + *
> + * This header contains internal data types, that are used by the DMA devices
> + * in order to expose their ops to the class.
> + *
> + * Applications should not use these API directly.
> + *
> + */
> +
> +struct rte_dmadev;
> +
> +/** @internal Used to get device information of a device. */
> +typedef int (*dmadev_info_get_t)(struct rte_dmadev *dev,
> +				 struct rte_dmadev_info *dev_info);

First parameter can be "const"

> +/** @internal Used to configure a device. */
> +typedef int (*dmadev_configure_t)(struct rte_dmadev *dev,
> +				  const struct rte_dmadev_conf *dev_conf);
> +
> +/** @internal Used to start a configured device. */
> +typedef int (*dmadev_start_t)(struct rte_dmadev *dev);
> +
> +/** @internal Used to stop a configured device. */
> +typedef int (*dmadev_stop_t)(struct rte_dmadev *dev);
> +
> +/** @internal Used to close a configured device. */
> +typedef int (*dmadev_close_t)(struct rte_dmadev *dev);
> +
> +/** @internal Used to reset a configured device. */
> +typedef int (*dmadev_reset_t)(struct rte_dmadev *dev);
> +
> +/** @internal Used to allocate and set up a virtual DMA channel. */
> +typedef int (*dmadev_vchan_setup_t)(struct rte_dmadev *dev,
> +				    const struct rte_dmadev_vchan_conf *conf);
> +
> +/** @internal Used to release a virtual DMA channel. */
> +typedef int (*dmadev_vchan_release_t)(struct rte_dmadev *dev, uint16_t vchan);
> +
> +/** @internal Used to retrieve basic statistics. */
> +typedef int (*dmadev_stats_get_t)(struct rte_dmadev *dev, int vchan,
> +				  struct rte_dmadev_stats *stats);

First parameter can be "const"

> +
> +/** @internal Used to reset basic statistics. */
> +typedef int (*dmadev_stats_reset_t)(struct rte_dmadev *dev, int vchan);
> +
> +/** @internal Used to dump internal information. */
> +typedef int (*dmadev_dump_t)(struct rte_dmadev *dev, FILE *f);
> +

First param "const"

> +/** @internal Used to start dmadev selftest. */
> +typedef int (*dmadev_selftest_t)(uint16_t dev_id);
> +

This looks like an outlier taking a dev_id. It should take a dmadev
parameter, like the other callbacks. Most drivers should not need to
implement this anyway, as the main unit tests should be in "test_dmadev.c"
in the autotest app.

> +/** @internal Used to enqueue a copy operation. */
> +typedef int (*dmadev_copy_t)(struct rte_dmadev *dev, uint16_t vchan,
> +			     rte_iova_t src, rte_iova_t dst,
> +			     uint32_t length, uint64_t flags);
> +
> +/** @internal Used to enqueue a scatter list copy operation. */
> +typedef int (*dmadev_copy_sg_t)(struct rte_dmadev *dev, uint16_t vchan,
> +				const struct rte_dma_sg *sg,
> +				uint32_t sg_len, uint64_t flags);
> +
> +/** @internal Used to enqueue a fill operation. */
> +typedef int (*dmadev_fill_t)(struct rte_dmadev *dev, uint16_t vchan,
> +			     uint64_t pattern, rte_iova_t dst,
> +			     uint32_t length, uint64_t flags);
> +
> +/** @internal Used to enqueue a scatter list fill operation. */
> +typedef int (*dmadev_fill_sg_t)(struct rte_dmadev *dev, uint16_t vchan,
> +			uint64_t pattern, const struct rte_dma_sg *sg,
> +			uint32_t sg_len, uint64_t flags);
> +
> +/** @internal Used to trigger hardware to begin working. */
> +typedef int (*dmadev_submit_t)(struct rte_dmadev *dev, uint16_t vchan);
> +
> +/** @internal Used to return number of successful completed operations. */
> +typedef uint16_t (*dmadev_completed_t)(struct rte_dmadev *dev, uint16_t vchan,
> +				       const uint16_t nb_cpls,
> +				       uint16_t *last_idx, bool *has_error);
> +
> +/** @internal Used to return number of failed completed operations. */
> +typedef uint16_t (*dmadev_completed_fails_t)(struct rte_dmadev *dev,
> +			uint16_t vchan, const uint16_t nb_status,
> +			uint32_t *status, uint16_t *last_idx);
> +
> +/**
> + * DMA device operations function pointer table
> + */
> +struct rte_dmadev_ops {
> +	dmadev_info_get_t dev_info_get;
> +	dmadev_configure_t dev_configure;
> +	dmadev_start_t dev_start;
> +	dmadev_stop_t dev_stop;
> +	dmadev_close_t dev_close;
> +	dmadev_reset_t dev_reset;
> +	dmadev_vchan_setup_t vchan_setup;
> +	dmadev_vchan_release_t vchan_release;
> +	dmadev_stats_get_t stats_get;
> +	dmadev_stats_reset_t stats_reset;
> +	dmadev_dump_t dev_dump;
> +	dmadev_selftest_t dev_selftest;
> +};
> +
> +/**
> + * @internal
> + * The data part, with no function pointers, associated with each DMA device.
> + *
> + * This structure is safe to place in shared memory to be common among different
> + * processes in a multi-process configuration.
> + */
> +struct rte_dmadev_data {
> +	uint16_t dev_id; /**< Device [external] identifier. */
> +	char dev_name[RTE_DMADEV_NAME_MAX_LEN]; /**< Unique identifier name */
> +	void *dev_private; /**< PMD-specific private data. */
> +	struct rte_dmadev_conf dev_conf; /**< DMA device configuration. */
> +	uint8_t dev_started : 1; /**< Device state: STARTED(1)/STOPPED(0). */
> +	uint64_t reserved[4]; /**< Reserved for future fields */
> +} __rte_cache_aligned;
> +

While I generally don't like having reserved space, this is one place where
it makes sense, so +1 for it here.

> +/**
> + * @internal
> + * The generic data structure associated with each DMA device.
> + *
> + * The dataplane APIs are located at the beginning of the structure, along
> + * with the pointer to where all the data elements for the particular device
> + * are stored in shared memory. This split scheme allows the function pointer
> + * and driver data to be per-process, while the actual configuration data for
> + * the device is shared.
> + */
> +struct rte_dmadev {
> +	dmadev_copy_t copy;
> +	dmadev_copy_sg_t copy_sg;
> +	dmadev_fill_t fill;
> +	dmadev_fill_sg_t fill_sg;
> +	dmadev_submit_t submit;
> +	dmadev_completed_t completed;
> +	dmadev_completed_fails_t completed_fails;
> +	const struct rte_dmadev_ops *dev_ops; /**< Functions exported by PMD. */
> +	/** Flag indicating the device is attached: ATTACHED(1)/DETACHED(0). */
> +	uint8_t attached : 1;

Since it's in the midst of a series of pointers, this 1-bit flag is
actually using 8 bytes of space. Is it needed? Can we use dev_ops == NULL
or data == NULL instead to indicate this is a valid entry?
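
A small sketch of the padding cost, using cut-down stand-in structs rather
than the real rte_dmadev (all demo_* names are illustrative). On the usual
ABIs the flag grows the struct by a full pointer's worth of space, and a
NULL check on an existing pointer can carry the same information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag sandwiched between two pointers, as in the patch. */
struct demo_with_flag {
	void *ops;
	uint8_t attached : 1;
	void *data;
};

/* Same layout with the flag dropped. */
struct demo_without_flag {
	void *ops;
	void *data;
};

/* Bytes the 1-bit flag really costs once alignment padding is counted. */
static size_t
demo_flag_cost(void)
{
	return sizeof(struct demo_with_flag) -
	       sizeof(struct demo_without_flag);
}

/* Alternative: treat a non-NULL ops pointer as "attached". */
static int
demo_is_attached(const struct demo_without_flag *dev)
{
	return dev->ops != NULL;
}
```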

> +	/** Device info which supplied during device initialization. */
> +	struct rte_device *device;
> +	struct rte_dmadev_data *data; /**< Pointer to device data. */

If we are to try and minimise cacheline access, we should put this data
pointer - or even better a copy of the data->dev_private pointer - at the
top of the structure, on the same cacheline as the datapath operations. For
dataplane, I can't see any elements of data being accessed except the
private pointer, so we would probably get most benefit from having a copy
put there on init of the dmadev struct.
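
Roughly what I have in mind, as a sketch with stand-in types (demo_dev and
its field order are an assumption, not a proposal for the exact layout).
With a 64-byte cacheline and 8-byte pointers, the private pointer copy plus
the seven datapath function pointers fill exactly the first line, provided
the struct itself is cacheline aligned.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_CACHE_LINE 64

typedef int (*demo_fn_t)(void *priv);

struct demo_dev {
	/* Hot: read on every datapath call. */
	void *dev_private; /* copy of data->dev_private, set at init */
	demo_fn_t copy;
	demo_fn_t copy_sg;
	demo_fn_t fill;
	demo_fn_t fill_sg;
	demo_fn_t submit;
	demo_fn_t completed;
	demo_fn_t completed_fails;
	/* Cold: control path only. */
	const void *dev_ops;
	void *device;
	void *data;
};

/* Returns 1 if the first and last hot fields end within the first line. */
static int
demo_hot_fields_in_first_line(void)
{
	size_t first_end = offsetof(struct demo_dev, dev_private) +
			   sizeof(void *);
	size_t last_end = offsetof(struct demo_dev, completed_fails) +
			  sizeof(demo_fn_t);

	return first_end <= DEMO_CACHE_LINE && last_end <= DEMO_CACHE_LINE;
}
```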

> +	uint64_t reserved[4]; /**< Reserved for future fields */
> +} __rte_cache_aligned;
> +
> +extern struct rte_dmadev rte_dmadevices[];
> +
> +#endif /* _RTE_DMADEV_CORE_H_ */
> diff --git a/lib/dmadev/rte_dmadev_pmd.h b/lib/dmadev/rte_dmadev_pmd.h
> new file mode 100644
> index 0000000..45141f9
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev_pmd.h
> @@ -0,0 +1,72 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2021 HiSilicon Limited.
> + */
> +
> +#ifndef _RTE_DMADEV_PMD_H_
> +#define _RTE_DMADEV_PMD_H_
> +
> +/**
> + * @file
> + *
> + * RTE DMA Device PMD APIs
> + *
> + * Driver facing APIs for a DMA device. These are not to be called directly by
> + * any application.
> + */
> +
> +#include "rte_dmadev.h"
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * @internal
> + * Allocates a new dmadev slot for an DMA device and returns the pointer
> + * to that slot for the driver to use.
> + *
> + * @param name
> + *   DMA device name.
> + *
> + * @return
> + *   A pointer to the DMA device slot case of success,
> + *   NULL otherwise.
> + */
> +__rte_internal
> +struct rte_dmadev *
> +rte_dmadev_pmd_allocate(const char *name);
> +
> +/**
> + * @internal
> + * Release the specified dmadev.
> + *
> + * @param dev
> + *   Device to be released.
> + *
> + * @return
> + *   - 0 on success, negative on error
> + */
> +__rte_internal
> +int
> +rte_dmadev_pmd_release(struct rte_dmadev *dev);
> +
> +/**
> + * @internal
> + * Return the DMA device based on the device name.
> + *
> + * @param name
> + *   DMA device name.
> + *
> + * @return
> + *   A pointer to the DMA device slot case of success,
> + *   NULL otherwise.
> + */
> +__rte_internal
> +struct rte_dmadev *
> +rte_dmadev_get_device_by_name(const char *name);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_DMADEV_PMD_H_ */
> diff --git a/lib/dmadev/version.map b/lib/dmadev/version.map
> new file mode 100644
> index 0000000..0f099e7
> --- /dev/null
> +++ b/lib/dmadev/version.map
> @@ -0,0 +1,40 @@
> +EXPERIMENTAL {
> +	global:
> +
> +	rte_dmadev_count;
> +	rte_dmadev_info_get;
> +	rte_dmadev_configure;
> +	rte_dmadev_start;
> +	rte_dmadev_stop;
> +	rte_dmadev_close;
> +	rte_dmadev_reset;
> +	rte_dmadev_vchan_setup;
> +	rte_dmadev_vchan_release;
> +	rte_dmadev_stats_get;
> +	rte_dmadev_stats_reset;
> +	rte_dmadev_dump;
> +	rte_dmadev_selftest;
> +	rte_dmadev_copy;
> +	rte_dmadev_copy_sg;
> +	rte_dmadev_fill;
> +	rte_dmadev_fill_sg;
> +	rte_dmadev_submit;
> +	rte_dmadev_completed;
> +	rte_dmadev_completed_fails;
> +
> +	local: *;
> +};

The elements in the version.map file blocks should be sorted alphabetically.

> +
> +INTERNAL {
> +        global:
> +
> +	rte_dmadevices;
> +	rte_dmadev_pmd_allocate;
> +	rte_dmadev_pmd_release;
> +	rte_dmadev_get_device_by_name;
> +
> +	local:
> +
> +	rte_dmadev_is_valid_dev;
> +};
> +
> diff --git a/lib/meson.build b/lib/meson.build
> index 1673ca4..68d239f 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -60,6 +60,7 @@ libraries = [
>          'bpf',
>          'graph',
>          'node',
> +        'dmadev',
>  ]
>  
>  if is_windows
> -- 
> 2.8.1
> 

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2] dmadev: introduce DMA device library
  @ 2021-07-12 12:05  3%   ` Bruce Richardson
  2021-07-12 15:50  3%   ` Bruce Richardson
  2021-07-13 14:19  3%   ` Ananyev, Konstantin
  2 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2021-07-12 12:05 UTC (permalink / raw)
  To: Chengwen Feng
  Cc: thomas, ferruh.yigit, jerinj, jerinjacobk, dev, mb, nipun.gupta,
	hemant.agrawal, maxime.coquelin, honnappa.nagarahalli,
	david.marchand, sburla, pkapoor, konstantin.ananyev, liangma

On Sun, Jul 11, 2021 at 05:25:56PM +0800, Chengwen Feng wrote:
> This patch introduces 'dmadevice', which is a generic type of DMA
> device.
> 
> The APIs of the dmadev library expose some generic operations which can
> enable configuration and I/O with the DMA devices.
> 
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>

Thanks for this V2.
Some initial (mostly minor) comments on the meson.build and dmadev .c file
below. I'll review the headers in a separate email.

/Bruce

> ---
>  MAINTAINERS                  |    4 +
>  config/rte_config.h          |    3 +
>  lib/dmadev/meson.build       |    6 +
>  lib/dmadev/rte_dmadev.c      |  560 +++++++++++++++++++++++
>  lib/dmadev/rte_dmadev.h      | 1030 ++++++++++++++++++++++++++++++++++++++++++
>  lib/dmadev/rte_dmadev_core.h |  159 +++++++
>  lib/dmadev/rte_dmadev_pmd.h  |   72 +++
>  lib/dmadev/version.map       |   40 ++
>  lib/meson.build              |    1 +
>  9 files changed, 1875 insertions(+)
>  create mode 100644 lib/dmadev/meson.build
>  create mode 100644 lib/dmadev/rte_dmadev.c
>  create mode 100644 lib/dmadev/rte_dmadev.h
>  create mode 100644 lib/dmadev/rte_dmadev_core.h
>  create mode 100644 lib/dmadev/rte_dmadev_pmd.h
>  create mode 100644 lib/dmadev/version.map
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4347555..0595239 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -496,6 +496,10 @@ F: drivers/raw/skeleton/
>  F: app/test/test_rawdev.c
>  F: doc/guides/prog_guide/rawdev.rst
>  
> +DMA device API - EXPERIMENTAL
> +M: Chengwen Feng <fengchengwen@huawei.com>
> +F: lib/dmadev/
> +
>  
>  Memory Pool Drivers
>  -------------------
> diff --git a/config/rte_config.h b/config/rte_config.h
> index 590903c..331a431 100644
> --- a/config/rte_config.h
> +++ b/config/rte_config.h
> @@ -81,6 +81,9 @@
>  /* rawdev defines */
>  #define RTE_RAWDEV_MAX_DEVS 64
>  
> +/* dmadev defines */
> +#define RTE_DMADEV_MAX_DEVS 64
> +
>  /* ip_fragmentation defines */
>  #define RTE_LIBRTE_IP_FRAG_MAX_FRAG 4
>  #undef RTE_LIBRTE_IP_FRAG_TBL_STAT
> diff --git a/lib/dmadev/meson.build b/lib/dmadev/meson.build
> new file mode 100644
> index 0000000..c918dae
> --- /dev/null
> +++ b/lib/dmadev/meson.build
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2021 HiSilicon Limited.
> +
> +sources = files('rte_dmadev.c')
> +headers = files('rte_dmadev.h', 'rte_dmadev_pmd.h')

If rte_dmadev_pmd.h is only for PMD use, then it should be in
"driver_sdk_headers".

> +indirect_headers += files('rte_dmadev_core.h')
> diff --git a/lib/dmadev/rte_dmadev.c b/lib/dmadev/rte_dmadev.c
> new file mode 100644
> index 0000000..8a29abb
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev.c
> @@ -0,0 +1,560 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2021 HiSilicon Limited.
> + * Copyright(c) 2021 Intel Corporation.
> + */
> +
> +#include <ctype.h>
> +#include <inttypes.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +
> +#include <rte_debug.h>
> +#include <rte_dev.h>
> +#include <rte_eal.h>
> +#include <rte_errno.h>
> +#include <rte_lcore.h>
> +#include <rte_log.h>
> +#include <rte_memory.h>
> +#include <rte_memzone.h>
> +#include <rte_malloc.h>
> +#include <rte_string_fns.h>
> +
> +#include "rte_dmadev.h"
> +#include "rte_dmadev_pmd.h"
> +
> +RTE_LOG_REGISTER(rte_dmadev_logtype, lib.dmadev, INFO);
> +
> +struct rte_dmadev rte_dmadevices[RTE_DMADEV_MAX_DEVS];
> +
> +static const char *MZ_RTE_DMADEV_DATA = "rte_dmadev_data";
> +/* Shared memory between primary and secondary processes. */
> +static struct {
> +	struct rte_dmadev_data data[RTE_DMADEV_MAX_DEVS];
> +} *dmadev_shared_data;
> +
> +static int
> +dmadev_check_name(const char *name)
> +{
> +	size_t name_len;
> +
> +	if (name == NULL) {
> +		RTE_DMADEV_LOG(ERR, "Name can't be NULL\n");
> +		return -EINVAL;
> +	}
> +
> +	name_len = strnlen(name, RTE_DMADEV_NAME_MAX_LEN);
> +	if (name_len == 0) {
> +		RTE_DMADEV_LOG(ERR, "Zero length DMA device name\n");
> +		return -EINVAL;
> +	}
> +	if (name_len >= RTE_DMADEV_NAME_MAX_LEN) {
> +		RTE_DMADEV_LOG(ERR, "DMA device name is too long\n");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static uint16_t
> +dmadev_find_free_dev(void)
> +{
> +	uint16_t i;
> +
> +	for (i = 0; i < RTE_DMADEV_MAX_DEVS; i++) {
> +		if (dmadev_shared_data->data[i].dev_name[0] == '\0') {
> +			RTE_ASSERT(rte_dmadevices[i].attached == 0);
> +			return i;
> +		}
> +	}
> +
> +	return RTE_DMADEV_MAX_DEVS;
> +}
> +
> +static struct rte_dmadev*
> +dmadev_allocated(const char *name)

The name implies a boolean lookup for whether a particular dmadev has been
allocated or not. Since this returns a pointer, I think a name like
"dmadev_find" or "dmadev_get" would be more appropriate.

> +{
> +	uint16_t i;
> +
> +	for (i = 0; i < RTE_DMADEV_MAX_DEVS; i++) {
> +		if ((rte_dmadevices[i].attached == 1) &&
> +		    (!strcmp(name, rte_dmadevices[i].data->dev_name)))
> +			return &rte_dmadevices[i];
> +	}
> +
> +	return NULL;
> +}
> +
> +static int
> +dmadev_shared_data_prepare(void)
> +{
> +	const struct rte_memzone *mz;
> +
> +	if (dmadev_shared_data == NULL) {
> +		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> +			/* Allocate port data and ownership shared memory. */
> +			mz = rte_memzone_reserve(MZ_RTE_DMADEV_DATA,
> +					 sizeof(*dmadev_shared_data),
> +					 rte_socket_id(), 0);
> +		} else {
> +			mz = rte_memzone_lookup(MZ_RTE_DMADEV_DATA);
> +		}
> +		if (mz == NULL)
> +			return -ENOMEM;
> +
> +		dmadev_shared_data = mz->addr;
> +		if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> +			memset(dmadev_shared_data->data, 0,
> +			       sizeof(dmadev_shared_data->data));
> +	}
> +
> +	return 0;
> +}
> +
> +static struct rte_dmadev *
> +dmadev_allocate(const char *name)
> +{
> +	struct rte_dmadev *dev;
> +	uint16_t dev_id;
> +
> +	dev = dmadev_allocated(name);
> +	if (dev != NULL) {
> +		RTE_DMADEV_LOG(ERR, "DMA device already allocated\n");
> +		return NULL;
> +	}
> +
> +	dev_id = dmadev_find_free_dev();
> +	if (dev_id == RTE_DMADEV_MAX_DEVS) {
> +		RTE_DMADEV_LOG(ERR, "Reached maximum number of DMA devices\n");
> +		return NULL;
> +	}
> +
> +	if (dmadev_shared_data_prepare() != 0) {
> +		RTE_DMADEV_LOG(ERR, "Cannot allocate DMA shared data\n");
> +		return NULL;
> +	}
> +
> +	dev = &rte_dmadevices[dev_id];
> +	dev->data = &dmadev_shared_data->data[dev_id];
> +	dev->data->dev_id = dev_id;
> +	strlcpy(dev->data->dev_name, name, sizeof(dev->data->dev_name));
> +
> +	return dev;
> +}
> +
> +static struct rte_dmadev *
> +dmadev_attach_secondary(const char *name)
> +{
> +	struct rte_dmadev *dev;
> +	uint16_t i;
> +
> +	if (dmadev_shared_data_prepare() != 0) {
> +		RTE_DMADEV_LOG(ERR, "Cannot allocate DMA shared data\n");
> +		return NULL;
> +	}
> +
> +	for (i = 0; i < RTE_DMADEV_MAX_DEVS; i++) {
> +		if (!strcmp(dmadev_shared_data->data[i].dev_name, name))
> +			break;
> +	}
> +	if (i == RTE_DMADEV_MAX_DEVS) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %s is not driven by the primary process\n",
> +			name);
> +		return NULL;
> +	}
> +
> +	dev = &rte_dmadevices[i];
> +	dev->data = &dmadev_shared_data->data[i];
> +	RTE_ASSERT(dev->data->dev_id == i);
> +
> +	return dev;
> +}
> +
> +struct rte_dmadev *
> +rte_dmadev_pmd_allocate(const char *name)
> +{
> +	struct rte_dmadev *dev;
> +
> +	if (dmadev_check_name(name) != 0)
> +		return NULL;
> +
> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> +		dev = dmadev_allocate(name);
> +	else
> +		dev = dmadev_attach_secondary(name);
> +
> +	if (dev == NULL)
> +		return NULL;
> +	dev->attached = 1;
> +
> +	return dev;
> +}
> +
> +int
> +rte_dmadev_pmd_release(struct rte_dmadev *dev)
> +{
> +	if (dev == NULL)
> +		return -EINVAL;
> +
> +	if (dev->attached == 0)
> +		return 0;
> +
> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> +		rte_free(dev->data->dev_private);
> +		memset(dev->data, 0, sizeof(struct rte_dmadev_data));
> +	}
> +
> +	memset(dev, 0, sizeof(struct rte_dmadev));
> +	dev->attached = 0;
> +
> +	return 0;
> +}
> +
> +struct rte_dmadev *
> +rte_dmadev_get_device_by_name(const char *name)
> +{
> +	if (dmadev_check_name(name) != 0)
> +		return NULL;
> +	return dmadev_allocated(name);
> +}
> +
> +bool
> +rte_dmadev_is_valid_dev(uint16_t dev_id)
> +{
> +	if (dev_id >= RTE_DMADEV_MAX_DEVS ||
> +	    rte_dmadevices[dev_id].attached == 0)
> +		return false;
> +	return true;
> +}
> +
> +uint16_t
> +rte_dmadev_count(void)
> +{
> +	uint16_t count = 0;
> +	uint16_t i;
> +
> +	for (i = 0; i < RTE_DMADEV_MAX_DEVS; i++) {
> +		if (rte_dmadevices[i].attached == 1)
> +			count++;
> +	}
> +
> +	return count;
> +}
> +
> +int
> +rte_dmadev_info_get(uint16_t dev_id, struct rte_dmadev_info *dev_info)
> +{
> +	struct rte_dmadev *dev;
> +	int ret;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(dev_info, -EINVAL);
> +
> +	dev = &rte_dmadevices[dev_id];
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_info_get, -ENOTSUP);
> +	memset(dev_info, 0, sizeof(struct rte_dmadev_info));
> +	ret = (*dev->dev_ops->dev_info_get)(dev, dev_info);
> +	if (ret != 0)
> +		return ret;
> +
> +	dev_info->device = dev->device;
> +
> +	return 0;
> +}

Should the info_get function (and the related info structure) not also
include the parameters passed into the configure function? That way, the user
can query a previously set up configuration. This should be done at the
dmadev level, rather than the driver level, since I see the parameters are
already being saved in configure below.

Also, for ABI purposes, I would strongly suggest passing the size of the
info struct, i.e. "sizeof(*dev_info)", to the driver in the "dev_info_get"
call. When dev_info changes, we can version rte_dmadev_info_get, but can't
version the functions that it calls in turn. When we add a new field to the
struct, the driver functions that choose to use that new field can check the
size of the struct passed to determine if it's safe to write that new field
or not. [So long as the field is added at the end, driver functions that are
not updated for the new field need no changes.]
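
To illustrate the size-passing idea, here is a minimal, self-contained sketch.
It is not the dmadev API: every name in it (demo_dev_info, demo_drv_info_get,
demo_info_get, DEMO_INFO_HAS_FIELD, v2_field) is hypothetical, and it assumes
new fields are only ever appended at the end of the struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical info struct; v2_field stands in for a field appended later. */
struct demo_dev_info {
	uint64_t dev_capa;
	uint16_t max_vchans;
	uint64_t v2_field;	/* assumed to be added at the end in a later release */
};

/* Nonzero if a caller buffer of info_sz bytes fully contains 'field'. */
#define DEMO_INFO_HAS_FIELD(info_sz, field) \
	((size_t)(info_sz) >= offsetof(struct demo_dev_info, field) + \
		sizeof(((struct demo_dev_info *)0)->field))

/* Hypothetical driver callback: writes only the fields that fit in info_sz. */
int
demo_drv_info_get(struct demo_dev_info *info, uint32_t info_sz)
{
	if (info == NULL || !DEMO_INFO_HAS_FIELD(info_sz, max_vchans))
		return -1;

	info->dev_capa = 0x1;
	info->max_vchans = 4;
	if (DEMO_INFO_HAS_FIELD(info_sz, v2_field))
		info->v2_field = 7;

	return 0;
}

/* Library-level wrapper: zeroes the struct and tells the driver its size. */
int
demo_info_get(struct demo_dev_info *info)
{
	if (info == NULL)
		return -1;

	memset(info, 0, sizeof(*info));
	return demo_drv_info_get(info, (uint32_t)sizeof(*info));
}

/* Usage: a current caller sees the new field, an "old-sized" caller does not. */
void
demo_info_get_example(void)
{
	struct demo_dev_info info;
	uint32_t old_sz = (uint32_t)offsetof(struct demo_dev_info, v2_field);

	assert(demo_info_get(&info) == 0);
	assert(info.v2_field == 7);

	memset(&info, 0, sizeof(info));
	assert(demo_drv_info_get(&info, old_sz) == 0);
	assert(info.max_vchans == 4 && info.v2_field == 0);
}
```

With this shape, a caller built against the older, smaller struct still gets
the original fields filled in, and the driver never writes past the buffer it
was given.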

> +
> +int
> +rte_dmadev_configure(uint16_t dev_id, const struct rte_dmadev_conf *dev_conf)
> +{
> +	struct rte_dmadev_info info;
> +	struct rte_dmadev *dev;
> +	int ret;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(dev_conf, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];
> +
> +	ret = rte_dmadev_info_get(dev_id, &info);
> +	if (ret != 0) {
> +		RTE_DMADEV_LOG(ERR, "Device %u get device info fail\n", dev_id);
> +		return -EINVAL;
> +	}
> +	if (dev_conf->max_vchans > info.max_vchans) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u configure too many vchans\n", dev_id);

We allow up to 100 characters per line for DPDK code, so these don't need
to be wrapped so aggressively.

> +		return -EINVAL;
> +	}
> +	if (dev_conf->enable_mt_vchan &&
> +	    !(info.dev_capa & RTE_DMA_DEV_CAPA_MT_VCHAN)) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u don't support MT-safe vchan\n", dev_id);
> +		return -EINVAL;
> +	}
> +	if (dev_conf->enable_mt_multi_vchan &&
> +	    !(info.dev_capa & RTE_DMA_DEV_CAPA_MT_MULTI_VCHAN)) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u don't support MT-safe multiple vchan\n",
> +			dev_id);
> +		return -EINVAL;
> +	}
> +
> +	if (dev->data->dev_started != 0) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u must be stopped to allow configuration\n",
> +			dev_id);
> +		return -EBUSY;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_configure, -ENOTSUP);
> +	ret = (*dev->dev_ops->dev_configure)(dev, dev_conf);
> +	if (ret == 0)
> +		memcpy(&dev->data->dev_conf, dev_conf, sizeof(*dev_conf));
> +
> +	return ret;
> +}
> +
> +int
> +rte_dmadev_start(uint16_t dev_id)
> +{
> +	struct rte_dmadev *dev;
> +	int ret;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];
> +
> +	if (dev->data->dev_started != 0) {
> +		RTE_DMADEV_LOG(ERR, "Device %u already started\n", dev_id);

Maybe make this a warning rather than error.

> +		return 0;
> +	}
> +
> +	if (dev->dev_ops->dev_start == NULL)
> +		goto mark_started;
> +
> +	ret = (*dev->dev_ops->dev_start)(dev);
> +	if (ret != 0)
> +		return ret;
> +
> +mark_started:
> +	dev->data->dev_started = 1;
> +	return 0;
> +}
> +
> +int
> +rte_dmadev_stop(uint16_t dev_id)
> +{
> +	struct rte_dmadev *dev;
> +	int ret;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];
> +
> +	if (dev->data->dev_started == 0) {
> +		RTE_DMADEV_LOG(ERR, "Device %u already stopped\n", dev_id);

As above, suggest just warning rather than error.

> +		return 0;
> +	}
> +
> +	if (dev->dev_ops->dev_stop == NULL)
> +		goto mark_stopped;
> +
> +	ret = (*dev->dev_ops->dev_stop)(dev);
> +	if (ret != 0)
> +		return ret;
> +
> +mark_stopped:
> +	dev->data->dev_started = 0;
> +	return 0;
> +}
> +
> +int
> +rte_dmadev_close(uint16_t dev_id)
> +{
> +	struct rte_dmadev *dev;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];
> +
> +	/* Device must be stopped before it can be closed */
> +	if (dev->data->dev_started == 1) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u must be stopped before closing\n", dev_id);
> +		return -EBUSY;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_close, -ENOTSUP);
> +	return (*dev->dev_ops->dev_close)(dev);
> +}
> +
> +int
> +rte_dmadev_reset(uint16_t dev_id)
> +{
> +	struct rte_dmadev *dev;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset, -ENOTSUP);
> +	/* Reset is not dependent on state of the device */
> +	return (*dev->dev_ops->dev_reset)(dev);
> +}

I would tend to agree with the query as to whether this is needed or not.
Can we perhaps remove it for now, and add it back later if it does prove to
be needed? The less code to review and work with for the first version, the
better IMHO. :-)

> +
> +int
> +rte_dmadev_vchan_setup(uint16_t dev_id,
> +		       const struct rte_dmadev_vchan_conf *conf)
> +{
> +	struct rte_dmadev_info info;
> +	struct rte_dmadev *dev;
> +	int ret;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(conf, -EINVAL);

This is confusing, because you are actually doing a parameter check using a
macro named for checking a function pointer. Better to just check conf
explicitly for NULL.
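
As a minimal sketch of the explicit check (demo_vchan_conf and
demo_vchan_setup are hypothetical stand-ins, not the dmadev API, and the
device lookup is left out):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the vchan configuration struct. */
struct demo_vchan_conf {
	uint16_t nb_desc;
};

/* Validates the data parameter directly, not via a function-pointer macro. */
int
demo_vchan_setup(uint16_t dev_id, const struct demo_vchan_conf *conf)
{
	(void)dev_id;	/* device lookup omitted in this sketch */

	if (conf == NULL)
		return -EINVAL;
	if (conf->nb_desc == 0)
		return -EINVAL;

	return 0;
}

/* Usage: a NULL conf is rejected as a bad argument, a valid one is accepted. */
void
demo_vchan_setup_example(void)
{
	struct demo_vchan_conf conf = { .nb_desc = 32 };

	assert(demo_vchan_setup(0, NULL) == -EINVAL);
	assert(demo_vchan_setup(0, &conf) == 0);
}
```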

> +
> +	dev = &rte_dmadevices[dev_id];
> +
> +	ret = rte_dmadev_info_get(dev_id, &info);
> +	if (ret != 0) {
> +		RTE_DMADEV_LOG(ERR, "Device %u get device info fail\n", dev_id);
> +		return -EINVAL;
> +	}
> +	if (conf->direction == 0 ||
> +	    conf->direction & ~RTE_DMA_TRANSFER_DIR_ALL) {
> +		RTE_DMADEV_LOG(ERR, "Device %u direction invalid!\n", dev_id);
> +		return -EINVAL;
> +	}

I wonder whether we should allow direction == 0, and treat it either as all
bits set or as all supported bits set?
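
A rough sketch of the "all supported bits" reading, to make the question
concrete. The DEMO_DIR_* flags and demo_resolve_direction are hypothetical,
and the sketch assumes the direction bits and the capability bits share the
same positions, which is not the case for the separate RTE_DMA_* and
RTE_DMA_DEV_CAPA_* macros in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical direction bits; assumed to line up with capability bits. */
#define DEMO_DIR_MEM_TO_MEM (UINT32_C(1) << 0)
#define DEMO_DIR_MEM_TO_DEV (UINT32_C(1) << 1)
#define DEMO_DIR_DEV_TO_MEM (UINT32_C(1) << 2)
#define DEMO_DIR_DEV_TO_DEV (UINT32_C(1) << 3)
#define DEMO_DIR_ALL        UINT32_C(0xf)

/* Resolves a requested direction mask against what the device supports. */
int
demo_resolve_direction(uint32_t requested, uint32_t supported,
		       uint32_t *resolved)
{
	if (resolved == NULL || (requested & ~DEMO_DIR_ALL) != 0)
		return -EINVAL;

	/* direction == 0 is read as "every direction the device supports" */
	if (requested == 0)
		requested = supported & DEMO_DIR_ALL;

	/* reject an empty result or any direction the device cannot do */
	if (requested == 0 || (requested & ~supported) != 0)
		return -EINVAL;

	*resolved = requested;
	return 0;
}

/* Usage: zero expands to the supported set; unsupported bits are rejected. */
void
demo_resolve_direction_example(void)
{
	uint32_t supported = DEMO_DIR_MEM_TO_MEM | DEMO_DIR_DEV_TO_MEM;
	uint32_t out = 0;

	assert(demo_resolve_direction(0, supported, &out) == 0);
	assert(out == supported);
	assert(demo_resolve_direction(DEMO_DIR_MEM_TO_DEV, supported, &out) == -EINVAL);
}
```

Under the other reading (0 means all bits set), the same call would fail on
any device that lacks one of the four directions, so the two options are not
equivalent for the caller.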

> +	if (conf->direction & RTE_DMA_MEM_TO_MEM &&
> +	    !(info.dev_capa & RTE_DMA_DEV_CAPA_MEM_TO_MEM)) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u don't support mem2mem transfer\n", dev_id);
> +		return -EINVAL;
> +	}
> +	if (conf->direction & RTE_DMA_MEM_TO_DEV &&
> +	    !(info.dev_capa & RTE_DMA_DEV_CAPA_MEM_TO_DEV)) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u don't support mem2dev transfer\n", dev_id);
> +		return -EINVAL;
> +	}
> +	if (conf->direction & RTE_DMA_DEV_TO_MEM &&
> +	    !(info.dev_capa & RTE_DMA_DEV_CAPA_DEV_TO_MEM)) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u don't support dev2mem transfer\n", dev_id);
> +		return -EINVAL;
> +	}
> +	if (conf->direction & RTE_DMA_DEV_TO_DEV &&
> +	    !(info.dev_capa & RTE_DMA_DEV_CAPA_DEV_TO_DEV)) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u don't support dev2dev transfer\n", dev_id);
> +		return -EINVAL;
> +	}
> +	if (conf->nb_desc < info.min_desc || conf->nb_desc > info.max_desc) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u number of descriptors invalid\n", dev_id);
> +		return -EINVAL;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->vchan_setup, -ENOTSUP);
> +	return (*dev->dev_ops->vchan_setup)(dev, conf);
> +}
> +
> +int
> +rte_dmadev_vchan_release(uint16_t dev_id, uint16_t vchan)
> +{
> +	struct rte_dmadev *dev;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];
> +
> +	if (vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u vchan %u out of range\n", dev_id, vchan);
> +		return -EINVAL;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->vchan_release, -ENOTSUP);
> +	return (*dev->dev_ops->vchan_release)(dev, vchan);
> +}
> +
> +int
> +rte_dmadev_stats_get(uint16_t dev_id, int vchan, struct rte_dmadev_stats *stats)
> +{
> +	struct rte_dmadev *dev;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(stats, -EINVAL);
> +
> +	dev = &rte_dmadevices[dev_id];
> +
> +	if (vchan < -1 || vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u vchan %u out of range\n", dev_id, vchan);
> +		return -EINVAL;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->stats_get, -ENOTSUP);
> +	return (*dev->dev_ops->stats_get)(dev, vchan, stats);
> +}
> +
> +int
> +rte_dmadev_stats_reset(uint16_t dev_id, int vchan)
> +{
> +	struct rte_dmadev *dev;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];
> +
> +	if (vchan < -1 || vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u vchan %u out of range\n", dev_id, vchan);
> +		return -EINVAL;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->stats_reset, -ENOTSUP);
> +	return (*dev->dev_ops->stats_reset)(dev, vchan);
> +}
> +
> +int
> +rte_dmadev_dump(uint16_t dev_id, FILE *f)
> +{
> +	struct rte_dmadev_info info;
> +	struct rte_dmadev *dev;
> +	int ret;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(f, -EINVAL);
> +
> +	ret = rte_dmadev_info_get(dev_id, &info);
> +	if (ret != 0) {
> +		RTE_DMADEV_LOG(ERR, "Device %u get device info fail\n", dev_id);
> +		return -EINVAL;
> +	}
> +
> +	dev = &rte_dmadevices[dev_id];
> +
> +	fprintf(f, "DMA Dev %u, '%s' [%s]\n",
> +		dev->data->dev_id,
> +		dev->data->dev_name,
> +		dev->data->dev_started ? "started" : "stopped");
> +	fprintf(f, "  dev_capa: 0x%" PRIx64 "\n", info.dev_capa);
> +	fprintf(f, "  max_vchans_supported: %u\n", info.max_vchans);
> +	fprintf(f, "  max_vchans_configured: %u\n", info.nb_vchans);
> +	fprintf(f, "  MT-safe-configured: vchans: %u multi-vchans: %u\n",
> +		dev->data->dev_conf.enable_mt_vchan,
> +		dev->data->dev_conf.enable_mt_multi_vchan);
> +
> +	if (dev->dev_ops->dev_dump != NULL)
> +		return (*dev->dev_ops->dev_dump)(dev, f);
> +
> +	return 0;
> +}
> +
> +int
> +rte_dmadev_selftest(uint16_t dev_id)
> +{
> +	struct rte_dmadev *dev;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_selftest, -ENOTSUP);
> +	return (*dev->dev_ops->dev_selftest)(dev_id);
> +}

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v1] doc: update atomic operation deprecation
@ 2021-07-12  8:02  4% Joyce Kong
  2021-07-17 18:47  0% ` Honnappa Nagarahalli
  2021-07-23  9:49  4% ` [dpdk-dev] [PATCH v2] " Joyce Kong
  0 siblings, 2 replies; 200+ results
From: Joyce Kong @ 2021-07-12  8:02 UTC (permalink / raw)
  To: thomas, stephen, honnappa.nagarahalli, ruifeng.wang, mdr; +Cc: dev, nd, stable

Update the incorrect description of atomic operations and the provided
wrappers in the deprecation doc [1].

[1] https://mails.dpdk.org/archives/dev/2021-July/213333.html

Fixes: 7518c5c4ae6a ("doc: announce adoption of C11 atomic operations semantics")
Cc: stable@dpdk.org

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 doc/guides/rel_notes/deprecation.rst | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 9584d6bfd7..4142315842 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -19,16 +19,16 @@ Deprecation Notices
 
 * rte_atomicNN_xxx: These APIs do not take memory order parameter. This does
   not allow for writing optimized code for all the CPU architectures supported
-  in DPDK. DPDK will adopt C11 atomic operations semantics and provide wrappers
-  using C11 atomic built-ins. These wrappers must be used for patches that
-  need to be merged in 20.08 onwards. This change will not introduce any
-  performance degradation.
+  in DPDK. DPDK has adopted atomic operations semantics. GCC atomic built-ins
+  must be used for patches that need to be merged in 20.08 onwards. This change
+  will not introduce any performance degradation.
 
 * rte_smp_*mb: These APIs provide full barrier functionality. However, many
-  use cases do not require full barriers. To support such use cases, DPDK will
-  adopt C11 barrier semantics and provide wrappers using C11 atomic built-ins.
-  These wrappers must be used for patches that need to be merged in 20.08
-  onwards. This change will not introduce any performance degradation.
+  use cases do not require full barriers. To support such use cases, DPDK has
+  adopted atomic barrier semantics. GCC atomic built-ins and a new wrapper
+  ``rte_atomic_thread_fence`` instead of ``__atomic_thread_fence`` must be
+  used for patches that need to be merged in 20.08 onwards. This change will
+  not introduce any performance degradation.
 
 * lib: will fix extending some enum/define breaking the ABI. There are multiple
   samples in DPDK that enum/define terminated with a ``.*MAX.*`` value which is
-- 
2.17.1


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v3] doc: policy on the promotion of experimental APIs
  2021-07-09 19:15  3%     ` Tyler Retzlaff
@ 2021-07-11  7:22  0%       ` Jerin Jacob
  2021-08-03 14:12  3%         ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2021-07-11  7:22 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: Ray Kinsella, dpdk-dev, Richardson, Bruce, John McNamara,
	Ferruh Yigit, Thomas Monjalon, David Marchand, Stephen Hemminger

On Sat, Jul 10, 2021 at 12:46 AM Tyler Retzlaff
<roretzla@linux.microsoft.com> wrote:
>
> On Fri, Jul 09, 2021 at 11:46:54AM +0530, Jerin Jacob wrote:
> > > +
> > > +Promotion to stable
> > > +~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Ordinarily APIs marked as ``experimental`` will be promoted to the stable ABI
> > > +once a maintainer and/or the original contributor is satisfied that the API is
> > > +reasonably mature. In exceptional circumstances, should an API still be
> >
> > > Is this in line with the git commit message?
> > > Why make an exceptional case? Why not make it stable after two years
> > > or remove it?
> > > My worry is that if we allow exceptional cases, it will be difficult to
> > > enumerate them.
>
> i think the intent here is to indicate that an api/abi doesn't just
> automatically become stable after a period of time.  there also has to
> be an evaluation by the maintainer / community before making it stable.
>
> so i guess the timer is something that should force that evaluation. as
> a part of that evaluation one would imagine there is justification for
> keeping the api as experimental for longer and if so a rationale as to
> why.

I think we need to have a deadline: probably a one-year timer for evaluation,
and a two-year maximum for the decision to either make it stable or remove it.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3] doc: policy on the promotion of experimental APIs
  2021-07-09  6:16  0%   ` Jerin Jacob
@ 2021-07-09 19:15  3%     ` Tyler Retzlaff
  2021-07-11  7:22  0%       ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2021-07-09 19:15 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Ray Kinsella, dpdk-dev, Richardson, Bruce, John McNamara,
	Ferruh Yigit, Thomas Monjalon, David Marchand, Stephen Hemminger

On Fri, Jul 09, 2021 at 11:46:54AM +0530, Jerin Jacob wrote:
> > +
> > +Promotion to stable
> > +~~~~~~~~~~~~~~~~~~~
> > +
> > +Ordinarily APIs marked as ``experimental`` will be promoted to the stable ABI
> > +once a maintainer and/or the original contributor is satisfied that the API is
> > +reasonably mature. In exceptional circumstances, should an API still be
> 
> Is this in line with the git commit message?
> Why make an exceptional case? Why not make it stable after two years
> or remove it?
> My worry is that if we allow exceptional cases, it will be difficult to
> enumerate them.

i think the intent here is to indicate that an api/abi doesn't just
automatically become stable after a period of time.  there also has to
be an evaluation by the maintainer / community before making it stable.

so i guess the timer is something that should force that evaluation. as
a part of that evaluation one would imagine there is justification for
keeping the api as experimental for longer and if so a rationale as to
why.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v4 0/3] Use WFE for spinlock and ring
  @ 2021-07-09 18:39  0%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2021-07-09 18:39 UTC (permalink / raw)
  To: Ruifeng Wang
  Cc: dev, david.marchand, bruce.richardson, jerinj, nd,
	honnappa.nagarahalli, ruifeng.wang

07/07/2021 07:48, Ruifeng Wang:
> The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling
> for a memory location to become equal to a given value'[1].
> 
> Use the API for the rte spinlock and ring implementations.
> With the wait until equal APIs being stable, changes will not impact ABI.
> 
> Gavin Hu (1):
>   spinlock: use wfe to reduce contention on aarch64
> 
> Ruifeng Wang (2):
>   ring: use wfe to wait for ring tail update on aarch64
>   build: add option to enable wait until equal

As discussed in the thread, patches 1 & 2 are applied.
The patch 3 (meson option) is rejected.



^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v10 5/8] power: remove thread safety from PMD power API's
    2021-07-09 16:08  3%       ` [dpdk-dev] [PATCH v10 1/8] eal: use callbacks for power monitoring comparison Anatoly Burakov
@ 2021-07-09 16:08  3%       ` Anatoly Burakov
  1 sibling, 0 replies; 200+ results
From: Anatoly Burakov @ 2021-07-09 16:08 UTC (permalink / raw)
  To: dev, David Hunt; +Cc: konstantin.ananyev, ciara.loftus

Currently, we expect that only one callback can be active at any given
moment, for a particular queue configuration, which is relatively easy
to implement in a thread-safe way. However, we're about to add support
for multiple queues per lcore, which will greatly increase the
possibility of various race conditions.

We could have used something like an RCU for this use case, but absent
a pressing need for thread safety we'll go the easy way and just
mandate that the APIs are to be called when all affected ports are
stopped, and document this limitation. This greatly simplifies the
`rte_power_monitor`-related code.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
---

Notes:
    v2:
    - Add check for stopped queue
    - Clarified doc message
    - Added release notes

 doc/guides/rel_notes/release_21_08.rst |   4 +
 lib/power/meson.build                  |   3 +
 lib/power/rte_power_pmd_mgmt.c         | 133 ++++++++++---------------
 lib/power/rte_power_pmd_mgmt.h         |   6 ++
 4 files changed, 66 insertions(+), 80 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index 912fb13b84..b9a3caabf0 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -146,6 +146,10 @@ API Changes
 
 * eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
 
+* rte_power: The experimental PMD power management API is no longer considered
+  to be thread safe; all Rx queues affected by the API will now need to be
+  stopped before making any changes to the power management scheme.
+
 
 ABI Changes
 -----------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index 36e5a65874..bf937acde4 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -22,4 +22,7 @@ headers = files(
         'rte_power_pmd_mgmt.h',
         'rte_power_guest_channel.h',
 )
+if cc.has_argument('-Wno-cast-qual')
+    cflags += '-Wno-cast-qual'
+endif
 deps += ['timer', 'ethdev']
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index db03cbf420..9b95cf1794 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -40,8 +40,6 @@ struct pmd_queue_cfg {
 	/**< Callback mode for this queue */
 	const struct rte_eth_rxtx_callback *cur_cb;
 	/**< Callback instance */
-	volatile bool umwait_in_progress;
-	/**< are we currently sleeping? */
 	uint64_t empty_poll_stats;
 	/**< Number of empty polls */
 } __rte_cache_aligned;
@@ -92,30 +90,11 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
 			struct rte_power_monitor_cond pmc;
 			uint16_t ret;
 
-			/*
-			 * we might get a cancellation request while being
-			 * inside the callback, in which case the wakeup
-			 * wouldn't work because it would've arrived too early.
-			 *
-			 * to get around this, we notify the other thread that
-			 * we're sleeping, so that it can spin until we're done.
-			 * unsolicited wakeups are perfectly safe.
-			 */
-			q_conf->umwait_in_progress = true;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-			/* check if we need to cancel sleep */
-			if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
-				/* use monitoring condition to sleep */
-				ret = rte_eth_get_monitor_addr(port_id, qidx,
-						&pmc);
-				if (ret == 0)
-					rte_power_monitor(&pmc, UINT64_MAX);
-			}
-			q_conf->umwait_in_progress = false;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+			/* use monitoring condition to sleep */
+			ret = rte_eth_get_monitor_addr(port_id, qidx,
+					&pmc);
+			if (ret == 0)
+				rte_power_monitor(&pmc, UINT64_MAX);
 		}
 	} else
 		q_conf->empty_poll_stats = 0;
@@ -177,12 +156,24 @@ clb_scale_freq(uint16_t port_id, uint16_t qidx,
 	return nb_rx;
 }
 
+static int
+queue_stopped(const uint16_t port_id, const uint16_t queue_id)
+{
+	struct rte_eth_rxq_info qinfo;
+
+	if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) < 0)
+		return -1;
+
+	return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
+}
+
 int
 rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
 {
 	struct pmd_queue_cfg *queue_cfg;
 	struct rte_eth_dev_info info;
+	rte_rx_callback_fn clb;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
@@ -203,6 +194,14 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		goto end;
 	}
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		ret = ret < 0 ? -EINVAL : -EBUSY;
+		goto end;
+	}
+
 	queue_cfg = &port_cfg[port_id][queue_id];
 
 	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED) {
@@ -232,17 +231,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->umwait_in_progress = false;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* ensure we update our state before callback starts */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_umwait, NULL);
+		clb = clb_umwait;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_SCALE:
@@ -269,16 +258,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id,
-				queue_id, clb_scale_freq, NULL);
+		clb = clb_scale_freq;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_PAUSE:
@@ -286,18 +266,21 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		if (global_data.tsc_per_us == 0)
 			calc_tsc();
 
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_pause, NULL);
+		clb = clb_pause;
 		break;
+	default:
+		RTE_LOG(DEBUG, POWER, "Invalid power management type\n");
+		ret = -EINVAL;
+		goto end;
 	}
+
+	/* initialize data before enabling the callback */
+	queue_cfg->empty_poll_stats = 0;
+	queue_cfg->cb_mode = mode;
+	queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+	queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
+			clb, NULL);
+
 	ret = 0;
 end:
 	return ret;
@@ -308,12 +291,20 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		uint16_t port_id, uint16_t queue_id)
 {
 	struct pmd_queue_cfg *queue_cfg;
+	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
 
 	if (lcore_id >= RTE_MAX_LCORE || queue_id >= RTE_MAX_QUEUES_PER_PORT)
 		return -EINVAL;
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		return ret < 0 ? -EINVAL : -EBUSY;
+	}
+
 	/* no need to check queue id as wrong queue id would not be enabled */
 	queue_cfg = &port_cfg[port_id][queue_id];
 
@@ -323,27 +314,8 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 	/* stop any callbacks from progressing */
 	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
 
-	/* ensure we update our state before continuing */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
 	switch (queue_cfg->cb_mode) {
-	case RTE_POWER_MGMT_TYPE_MONITOR:
-	{
-		bool exit = false;
-		do {
-			/*
-			 * we may request cancellation while the other thread
-			 * has just entered the callback but hasn't started
-			 * sleeping yet, so keep waking it up until we know it's
-			 * done sleeping.
-			 */
-			if (queue_cfg->umwait_in_progress)
-				rte_power_monitor_wakeup(lcore_id);
-			else
-				exit = true;
-		} while (!exit);
-	}
-	/* fall-through */
+	case RTE_POWER_MGMT_TYPE_MONITOR: /* fall-through */
 	case RTE_POWER_MGMT_TYPE_PAUSE:
 		rte_eth_remove_rx_callback(port_id, queue_id,
 				queue_cfg->cur_cb);
@@ -356,10 +328,11 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		break;
 	}
 	/*
-	 * we don't free the RX callback here because it is unsafe to do so
-	 * unless we know for a fact that all data plane threads have stopped.
+	 * the API doc mandates that the user stops all processing on affected
+	 * ports before calling any of these API's, so we can assume that the
+	 * callbacks can be freed. we're intentionally casting away const-ness.
 	 */
-	queue_cfg->cur_cb = NULL;
+	rte_free((void *)queue_cfg->cur_cb);
 
 	return 0;
 }
diff --git a/lib/power/rte_power_pmd_mgmt.h b/lib/power/rte_power_pmd_mgmt.h
index 7a0ac24625..444e7b8a66 100644
--- a/lib/power/rte_power_pmd_mgmt.h
+++ b/lib/power/rte_power_pmd_mgmt.h
@@ -43,6 +43,9 @@ enum rte_power_pmd_mgmt_type {
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue will be polled from.
  * @param port_id
@@ -69,6 +72,9 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id,
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue is polled from.
  * @param port_id
-- 
2.25.1



* [dpdk-dev] [PATCH v10 1/8] eal: use callbacks for power monitoring comparison
  @ 2021-07-09 16:08  3%       ` Anatoly Burakov
  2021-07-09 16:08  3%       ` [dpdk-dev] [PATCH v10 5/8] power: remove thread safety from PMD power API's Anatoly Burakov
  1 sibling, 0 replies; 200+ results
From: Anatoly Burakov @ 2021-07-09 16:08 UTC (permalink / raw)
  To: dev, Timothy McDaniel, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Haiyue Wang, Matan Azrad, Shahaf Shuler,
	Viacheslav Ovsiienko, Bruce Richardson, Konstantin Ananyev
  Cc: david.hunt, ciara.loftus

Previously, the semantics of power monitor were such that we were
checking the current value against the expected value, and if they
matched, the sleep was aborted. This is somewhat inflexible, because it
only allowed us to check for a specific value in a specific way.

This commit replaces the comparison with a user callback mechanism, so
that any PMD (or other code) using `rte_power_monitor()` can define
its own comparison semantics and decide for itself when entering the
power optimized state should be aborted.

Existing implementations are adjusted to follow the new semantics.
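
As an illustration of the callback semantics described above, here is a
minimal standalone sketch. The `sketch_` types and index macros are
stand-ins invented for this example so that it compiles without DPDK;
they only mirror the shape of the definitions this patch adds and are
not the EAL declarations themselves. The threshold callback is likewise
a hypothetical case, shown only because the old val/mask pair could not
express it.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-ins mirroring the shape of the definitions added by this patch.
 * They are NOT the DPDK declarations, just enough to compile alone.
 */
#define SKETCH_OPAQUE_SZ 4

typedef int (*sketch_monitor_clb_t)(const uint64_t val,
		const uint64_t opaque[SKETCH_OPAQUE_SZ]);

struct sketch_monitor_cond {
	volatile void *addr;               /* address to monitor */
	uint8_t size;                      /* bytes to read: 1, 2, 4 or 8 */
	sketch_monitor_clb_t fn;           /* abort-decision callback */
	uint64_t opaque[SKETCH_OPAQUE_SZ]; /* callback-specific data */
};

/* hypothetical slot layout chosen by this example */
#define SKETCH_VAL_IDX 0
#define SKETCH_MSK_IDX 1

/* Old semantics expressed as a callback: abort on a masked match. */
static int
sketch_match_callback(const uint64_t val,
		const uint64_t opaque[SKETCH_OPAQUE_SZ])
{
	const uint64_t m = opaque[SKETCH_MSK_IDX];
	const uint64_t v = opaque[SKETCH_VAL_IDX];

	return (val & m) == v ? -1 : 0;
}

/*
 * Semantics the old val/mask pair could not express: abort once the
 * monitored value has reached a threshold kept in the opaque data.
 */
static int
sketch_threshold_callback(const uint64_t val,
		const uint64_t opaque[SKETCH_OPAQUE_SZ])
{
	return val >= opaque[SKETCH_VAL_IDX] ? -1 : 0;
}

/* Fill a condition the way a driver's get_monitor_addr hook would. */
static void
sketch_setup_match(struct sketch_monitor_cond *pmc, volatile uint64_t *addr,
		uint64_t expected, uint64_t mask)
{
	pmc->addr = addr;
	pmc->size = sizeof(uint64_t);
	pmc->opaque[SKETCH_VAL_IDX] = expected;
	pmc->opaque[SKETCH_MSK_IDX] = mask;
	pmc->fn = sketch_match_callback;
}
```

The return convention follows the one documented in the header change:
0 lets the sleep proceed, -1 aborts it.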

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
Acked-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
---

Notes:
    v4:
    - Return error if callback is set to NULL
    - Replace raw number with a macro in monitor condition opaque data
    
    v2:
    - Use callback mechanism for more flexibility
    - Address feedback from Konstantin

 doc/guides/rel_notes/release_21_08.rst        |  2 ++
 drivers/event/dlb2/dlb2.c                     | 17 ++++++++--
 drivers/net/i40e/i40e_rxtx.c                  | 20 +++++++----
 drivers/net/iavf/iavf_rxtx.c                  | 20 +++++++----
 drivers/net/ice/ice_rxtx.c                    | 20 +++++++----
 drivers/net/ixgbe/ixgbe_rxtx.c                | 20 +++++++----
 drivers/net/mlx5/mlx5_rx.c                    | 17 ++++++++--
 .../include/generic/rte_power_intrinsics.h    | 33 +++++++++++++++----
 lib/eal/x86/rte_power_intrinsics.c            | 17 +++++-----
 9 files changed, 122 insertions(+), 44 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index 476822b47f..912fb13b84 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -144,6 +144,8 @@ API Changes
 * eal: ``rte_strscpy`` sets ``rte_errno`` to ``E2BIG`` in case of string
   truncation.
 
+* eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
+
 
 ABI Changes
 -----------
diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
index eca183753f..252bbd8d5e 100644
--- a/drivers/event/dlb2/dlb2.c
+++ b/drivers/event/dlb2/dlb2.c
@@ -3154,6 +3154,16 @@ dlb2_port_credits_inc(struct dlb2_port *qm_port, int num)
 	}
 }
 
+#define CLB_MASK_IDX 0
+#define CLB_VAL_IDX 1
+static int
+dlb2_monitor_callback(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	/* abort if the value matches */
+	return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0;
+}
+
 static inline int
 dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 		  struct dlb2_eventdev_port *ev_port,
@@ -3194,8 +3204,11 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 			expected_value = 0;
 
 		pmc.addr = monitor_addr;
-		pmc.val = expected_value;
-		pmc.mask = qe_mask.raw_qe[1];
+		/* store expected value and comparison mask in opaque data */
+		pmc.opaque[CLB_VAL_IDX] = expected_value;
+		pmc.opaque[CLB_MASK_IDX] = qe_mask.raw_qe[1];
+		/* set up callback */
+		pmc.fn = dlb2_monitor_callback;
 		pmc.size = sizeof(uint64_t);
 
 		rte_power_monitor(&pmc, timeout + start_ticks);
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index e518409fe5..8489f91f1d 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -81,6 +81,18 @@
 #define I40E_TX_OFFLOAD_SIMPLE_NOTSUP_MASK \
 		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_SIMPLE_SUP_MASK)
 
+static int
+i40e_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -93,12 +105,8 @@ i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = i40e_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index f817fbc49b..d61b32fcee 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -57,6 +57,18 @@ iavf_proto_xtr_type_to_rxdid(uint8_t flex_type)
 				rxdid_map[flex_type] : IAVF_RXDID_COMMS_OVS_1;
 }
 
+static int
+iavf_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -69,12 +81,8 @@ iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = iavf_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 3f6e735984..5d7ab4f047 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -27,6 +27,18 @@ uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_ip_offset_mask;
 
+static int
+ice_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -39,12 +51,8 @@ ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.status_error0;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
-	pmc->mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/* comparison callback */
+	pmc->fn = ice_monitor_callback;
 
 	/* register is 16-bit */
 	pmc->size = sizeof(uint16_t);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index d69f36e977..c814a28cb4 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1369,6 +1369,18 @@ const uint32_t
 		RTE_PTYPE_INNER_L3_IPV4_EXT | RTE_PTYPE_INNER_L4_UDP,
 };
 
+static int
+ixgbe_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -1381,12 +1393,8 @@ ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.upper.status_error;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
-	pmc->mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/* comparison callback */
+	pmc->fn = ixgbe_monitor_callback;
 
 	/* the registers are 32-bit */
 	pmc->size = sizeof(uint32_t);
diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index 777a1d6e45..8d47637892 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -269,6 +269,18 @@ mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id)
 	return rx_queue_count(rxq);
 }
 
+#define CLB_VAL_IDX 0
+#define CLB_MSK_IDX 1
+static int
+mlx5_monitor_callback(const uint64_t value,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	const uint64_t m = opaque[CLB_MSK_IDX];
+	const uint64_t v = opaque[CLB_VAL_IDX];
+
+	return (value & m) == v ? -1 : 0;
+}
+
 int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
 	struct mlx5_rxq_data *rxq = rx_queue;
@@ -282,8 +294,9 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 		return -rte_errno;
 	}
 	pmc->addr = &cqe->op_own;
-	pmc->val =  !!idx;
-	pmc->mask = MLX5_CQE_OWNER_MASK;
+	pmc->opaque[CLB_VAL_IDX] = !!idx;
+	pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
+	pmc->fn = mlx5_monitor_callback;
 	pmc->size = sizeof(uint8_t);
 	return 0;
 }
diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h
index dddca3d41c..c9aa52a86d 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -18,19 +18,38 @@
  * which are architecture-dependent.
  */
 
+/** Size of the opaque data in monitor condition */
+#define RTE_POWER_MONITOR_OPAQUE_SZ 4
+
+/**
+ * Callback definition for monitoring conditions. Callbacks with this signature
+ * will be used by `rte_power_monitor()` to check if the entering of power
+ * optimized state should be aborted.
+ *
+ * @param val
+ *   The value read from memory.
+ * @param opaque
+ *   Callback-specific data.
+ *
+ * @return
+ *   0 if entering of power optimized state should proceed
+ *   -1 if entering of power optimized state should be aborted
+ */
+typedef int (*rte_power_monitor_clb_t)(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ]);
 struct rte_power_monitor_cond {
 	volatile void *addr;  /**< Address to monitor for changes */
-	uint64_t val;         /**< If the `mask` is non-zero, location pointed
-	                       *   to by `addr` will be read and compared
-	                       *   against this value.
-	                       */
-	uint64_t mask;   /**< 64-bit mask to extract value read from `addr` */
-	uint8_t size;    /**< Data size (in bytes) that will be used to compare
-	                  *   expected value (`val`) with data read from the
+	uint8_t size;    /**< Data size (in bytes) that will be read from the
 	                  *   monitored memory location (`addr`). Can be 1, 2,
 	                  *   4, or 8. Supplying any other value will result in
 	                  *   an error.
 	                  */
+	rte_power_monitor_clb_t fn; /**< Callback to be used to check if
+	                             *   entering power optimized state should
+	                             *   be aborted.
+	                             */
+	uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ];
+	/**< Callback-specific data */
 };
 
 /**
diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
index 39ea9fdecd..66fea28897 100644
--- a/lib/eal/x86/rte_power_intrinsics.c
+++ b/lib/eal/x86/rte_power_intrinsics.c
@@ -76,6 +76,7 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32);
 	const unsigned int lcore_id = rte_lcore_id();
 	struct power_wait_status *s;
+	uint64_t cur_value;
 
 	/* prevent user from running this instruction if it's not supported */
 	if (!wait_supported)
@@ -91,6 +92,9 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	if (__check_val_size(pmc->size) < 0)
 		return -EINVAL;
 
+	if (pmc->fn == NULL)
+		return -EINVAL;
+
 	s = &wait_status[lcore_id];
 
 	/* update sleep address */
@@ -110,16 +114,11 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	/* now that we've put this address into monitor, we can unlock */
 	rte_spinlock_unlock(&s->lock);
 
-	/* if we have a comparison mask, we might not need to sleep at all */
-	if (pmc->mask) {
-		const uint64_t cur_value = __get_umwait_val(
-				pmc->addr, pmc->size);
-		const uint64_t masked = cur_value & pmc->mask;
+	cur_value = __get_umwait_val(pmc->addr, pmc->size);
 
-		/* if the masked value is already matching, abort */
-		if (masked == pmc->val)
-			goto end;
-	}
+	/* check if callback indicates we should abort */
+	if (pmc->fn(cur_value, pmc->opaque) != 0)
+		goto end;
 
 	/* execute UMWAIT */
 	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;"
-- 
2.25.1



* [dpdk-dev] [PATCH v9 5/8] power: remove thread safety from PMD power API's
  2021-07-09 15:53  3%     ` [dpdk-dev] [PATCH v9 5/8] power: remove thread safety from PMD power API's Anatoly Burakov
@ 2021-07-09 16:00  3%       ` Anatoly Burakov
  0 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2021-07-09 16:00 UTC (permalink / raw)
  To: dev, David Hunt; +Cc: ciara.loftus, konstantin.ananyev

Currently, we expect that only one callback can be active at any given
moment, for a particular queue configuration, which is relatively easy
to implement in a thread-safe way. However, we're about to add support
for multiple queues per lcore, which will greatly increase the
possibility of various race conditions.

We could have used something like RCU for this use case, but absent
a pressing need for thread safety we'll go the easy way and just
mandate that the APIs be called only when all affected Rx queues are
stopped, and document this limitation. This greatly simplifies the
`rte_power_monitor`-related code.
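
To make the new calling convention concrete, here is a minimal
standalone model of it. Everything prefixed `sketch_` is hypothetical and
exists only so the example compiles without DPDK: the state table stands
in for the ethdev queue state, and `sketch_pmgmt_set()` stands in for the
enable/disable APIs. In a real application the stop and restart steps
would presumably map to `rte_eth_dev_rx_queue_stop()` and
`rte_eth_dev_rx_queue_start()`.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-queue state kept by ethdev. */
enum sketch_queue_state { SKETCH_QUEUE_STOPPED, SKETCH_QUEUE_STARTED };

#define SKETCH_NB_QUEUES 4

/* zero-initialized, so every queue starts out stopped */
static enum sketch_queue_state sketch_state[SKETCH_NB_QUEUES];
static int sketch_pmgmt_enabled[SKETCH_NB_QUEUES];

/* Returns 1 if stopped, 0 if running, -1 if the queue is invalid. */
static int
sketch_queue_stopped(uint16_t queue_id)
{
	if (queue_id >= SKETCH_NB_QUEUES)
		return -1;
	return sketch_state[queue_id] == SKETCH_QUEUE_STOPPED;
}

/* Models the enable/disable entry points: refuse unless stopped. */
static int
sketch_pmgmt_set(uint16_t queue_id, int enable)
{
	const int ret = sketch_queue_stopped(queue_id);

	if (ret != 1) {
		/* error means invalid queue, 0 means queue wasn't stopped */
		return ret < 0 ? -EINVAL : -EBUSY;
	}
	sketch_pmgmt_enabled[queue_id] = enable;
	return 0;
}

/* Caller-side pattern: stop the queue, reconfigure, start it again. */
static int
sketch_reconfigure(uint16_t queue_id, int enable)
{
	int ret;

	if (queue_id >= SKETCH_NB_QUEUES)
		return -EINVAL;

	sketch_state[queue_id] = SKETCH_QUEUE_STOPPED;
	ret = sketch_pmgmt_set(queue_id, enable);
	sketch_state[queue_id] = SKETCH_QUEUE_STARTED;

	return ret;
}
```

The point of the model is the ordering: the power management scheme only
changes while no data plane thread can be inside the Rx callback, which
is what lets the patch drop the fences and the wakeup loop.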

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
---

Notes:
    v2:
    - Add check for stopped queue
    - Clarified doc message
    - Added release notes

 doc/guides/rel_notes/release_21_08.rst |   4 +
 lib/power/meson.build                  |   3 +
 lib/power/rte_power_pmd_mgmt.c         | 133 ++++++++++---------------
 lib/power/rte_power_pmd_mgmt.h         |   6 ++
 4 files changed, 66 insertions(+), 80 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index 912fb13b84..b9a3caabf0 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -146,6 +146,10 @@ API Changes
 
 * eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
 
+* rte_power: The experimental PMD power management API is no longer considered
+  to be thread safe; all Rx queues affected by the API will now need to be
+  stopped before making any changes to the power management scheme.
+
 
 ABI Changes
 -----------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index 36e5a65874..bf937acde4 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -22,4 +22,7 @@ headers = files(
         'rte_power_pmd_mgmt.h',
         'rte_power_guest_channel.h',
 )
+if cc.has_argument('-Wno-cast-qual')
+    cflags += '-Wno-cast-qual'
+endif
 deps += ['timer', 'ethdev']
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index db03cbf420..9b95cf1794 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -40,8 +40,6 @@ struct pmd_queue_cfg {
 	/**< Callback mode for this queue */
 	const struct rte_eth_rxtx_callback *cur_cb;
 	/**< Callback instance */
-	volatile bool umwait_in_progress;
-	/**< are we currently sleeping? */
 	uint64_t empty_poll_stats;
 	/**< Number of empty polls */
 } __rte_cache_aligned;
@@ -92,30 +90,11 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
 			struct rte_power_monitor_cond pmc;
 			uint16_t ret;
 
-			/*
-			 * we might get a cancellation request while being
-			 * inside the callback, in which case the wakeup
-			 * wouldn't work because it would've arrived too early.
-			 *
-			 * to get around this, we notify the other thread that
-			 * we're sleeping, so that it can spin until we're done.
-			 * unsolicited wakeups are perfectly safe.
-			 */
-			q_conf->umwait_in_progress = true;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-			/* check if we need to cancel sleep */
-			if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
-				/* use monitoring condition to sleep */
-				ret = rte_eth_get_monitor_addr(port_id, qidx,
-						&pmc);
-				if (ret == 0)
-					rte_power_monitor(&pmc, UINT64_MAX);
-			}
-			q_conf->umwait_in_progress = false;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+			/* use monitoring condition to sleep */
+			ret = rte_eth_get_monitor_addr(port_id, qidx,
+					&pmc);
+			if (ret == 0)
+				rte_power_monitor(&pmc, UINT64_MAX);
 		}
 	} else
 		q_conf->empty_poll_stats = 0;
@@ -177,12 +156,24 @@ clb_scale_freq(uint16_t port_id, uint16_t qidx,
 	return nb_rx;
 }
 
+static int
+queue_stopped(const uint16_t port_id, const uint16_t queue_id)
+{
+	struct rte_eth_rxq_info qinfo;
+
+	if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) < 0)
+		return -1;
+
+	return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
+}
+
 int
 rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
 {
 	struct pmd_queue_cfg *queue_cfg;
 	struct rte_eth_dev_info info;
+	rte_rx_callback_fn clb;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
@@ -203,6 +194,14 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		goto end;
 	}
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		ret = ret < 0 ? -EINVAL : -EBUSY;
+		goto end;
+	}
+
 	queue_cfg = &port_cfg[port_id][queue_id];
 
 	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED) {
@@ -232,17 +231,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->umwait_in_progress = false;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* ensure we update our state before callback starts */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_umwait, NULL);
+		clb = clb_umwait;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_SCALE:
@@ -269,16 +258,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id,
-				queue_id, clb_scale_freq, NULL);
+		clb = clb_scale_freq;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_PAUSE:
@@ -286,18 +266,21 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		if (global_data.tsc_per_us == 0)
 			calc_tsc();
 
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_pause, NULL);
+		clb = clb_pause;
 		break;
+	default:
+		RTE_LOG(DEBUG, POWER, "Invalid power management type\n");
+		ret = -EINVAL;
+		goto end;
 	}
+
+	/* initialize data before enabling the callback */
+	queue_cfg->empty_poll_stats = 0;
+	queue_cfg->cb_mode = mode;
+	queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+	queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
+			clb, NULL);
+
 	ret = 0;
 end:
 	return ret;
@@ -308,12 +291,20 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		uint16_t port_id, uint16_t queue_id)
 {
 	struct pmd_queue_cfg *queue_cfg;
+	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
 
 	if (lcore_id >= RTE_MAX_LCORE || queue_id >= RTE_MAX_QUEUES_PER_PORT)
 		return -EINVAL;
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		return ret < 0 ? -EINVAL : -EBUSY;
+	}
+
 	/* no need to check queue id as wrong queue id would not be enabled */
 	queue_cfg = &port_cfg[port_id][queue_id];
 
@@ -323,27 +314,8 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 	/* stop any callbacks from progressing */
 	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
 
-	/* ensure we update our state before continuing */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
 	switch (queue_cfg->cb_mode) {
-	case RTE_POWER_MGMT_TYPE_MONITOR:
-	{
-		bool exit = false;
-		do {
-			/*
-			 * we may request cancellation while the other thread
-			 * has just entered the callback but hasn't started
-			 * sleeping yet, so keep waking it up until we know it's
-			 * done sleeping.
-			 */
-			if (queue_cfg->umwait_in_progress)
-				rte_power_monitor_wakeup(lcore_id);
-			else
-				exit = true;
-		} while (!exit);
-	}
-	/* fall-through */
+	case RTE_POWER_MGMT_TYPE_MONITOR: /* fall-through */
 	case RTE_POWER_MGMT_TYPE_PAUSE:
 		rte_eth_remove_rx_callback(port_id, queue_id,
 				queue_cfg->cur_cb);
@@ -356,10 +328,11 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		break;
 	}
 	/*
-	 * we don't free the RX callback here because it is unsafe to do so
-	 * unless we know for a fact that all data plane threads have stopped.
+	 * the API doc mandates that the user stops all processing on affected
+	 * ports before calling any of these API's, so we can assume that the
+	 * callbacks can be freed. we're intentionally casting away const-ness.
 	 */
-	queue_cfg->cur_cb = NULL;
+	rte_free((void *)queue_cfg->cur_cb);
 
 	return 0;
 }
diff --git a/lib/power/rte_power_pmd_mgmt.h b/lib/power/rte_power_pmd_mgmt.h
index 7a0ac24625..444e7b8a66 100644
--- a/lib/power/rte_power_pmd_mgmt.h
+++ b/lib/power/rte_power_pmd_mgmt.h
@@ -43,6 +43,9 @@ enum rte_power_pmd_mgmt_type {
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue will be polled from.
  * @param port_id
@@ -69,6 +72,9 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id,
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue is polled from.
  * @param port_id
-- 
2.25.1



* [dpdk-dev] [PATCH v9 1/8] eal: use callbacks for power monitoring comparison
  2021-07-09 15:53  3%     ` [dpdk-dev] [PATCH v9 1/8] eal: use callbacks for power monitoring comparison Anatoly Burakov
@ 2021-07-09 16:00  3%       ` Anatoly Burakov
  0 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2021-07-09 16:00 UTC (permalink / raw)
  To: dev, Timothy McDaniel, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Haiyue Wang, Matan Azrad, Shahaf Shuler,
	Viacheslav Ovsiienko, Bruce Richardson, Konstantin Ananyev
  Cc: david.hunt, ciara.loftus

Previously, the semantics of power monitor were such that we were
checking the current value against the expected value, and if they
matched, the sleep was aborted. This is somewhat inflexible, because it
only allowed us to check for a specific value in a specific way.

This commit replaces the comparison with a user callback mechanism, so
that any PMD (or other code) using `rte_power_monitor()` can define
its own comparison semantics and decide for itself when entering the
power optimized state should be aborted.

Existing implementations are adjusted to follow the new semantics.
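
For readers who want to see where the callback sits in the wait path,
below is a rough standalone model of the pre-sleep check. It is a sketch
under assumptions, not the EAL code: the `sketch_` names are invented
here, the hardware wait is replaced by a return value, and unlike the
real function the model returns 1 when it would have gone to sleep so
that the two outcomes can be told apart.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_OPAQUE_SZ 4

typedef int (*sketch_clb_t)(const uint64_t val,
		const uint64_t opaque[SKETCH_OPAQUE_SZ]);

/* Stand-in for the monitor condition; not the DPDK struct. */
struct sketch_cond {
	volatile void *addr;
	uint8_t size;
	sketch_clb_t fn;
	uint64_t opaque[SKETCH_OPAQUE_SZ];
};

/* Read 1, 2, 4 or 8 bytes from the monitored address. */
static int
sketch_read(const volatile void *addr, uint8_t size, uint64_t *out)
{
	switch (size) {
	case 1:
		*out = *(const volatile uint8_t *)addr;
		return 0;
	case 2:
		*out = *(const volatile uint16_t *)addr;
		return 0;
	case 4:
		*out = *(const volatile uint32_t *)addr;
		return 0;
	case 8:
		*out = *(const volatile uint64_t *)addr;
		return 0;
	default:
		return -EINVAL;
	}
}

/*
 * Model of the check: -EINVAL on bad input, 0 if the callback asked to
 * abort, 1 if the sleep would have been entered (model-only value).
 */
static int
sketch_monitor(const struct sketch_cond *pmc)
{
	uint64_t cur_value;

	if (pmc == NULL || pmc->fn == NULL)
		return -EINVAL;
	if (sketch_read(pmc->addr, pmc->size, &cur_value) < 0)
		return -EINVAL;

	/* check if callback indicates we should abort */
	if (pmc->fn(cur_value, pmc->opaque) != 0)
		return 0;

	return 1; /* the real code would execute UMWAIT at this point */
}

/* Example callback: abort if bit 0 (a made-up "done" bit) is set. */
static int
sketch_dd_callback(const uint64_t value,
		const uint64_t opaque[SKETCH_OPAQUE_SZ])
{
	const uint64_t dd = UINT64_C(1) << 0;

	(void)opaque;
	return (value & dd) == dd ? -1 : 0;
}
```

Note that the callback is consulted unconditionally here, whereas the
old code skipped the comparison when the mask was zero; that is why a
NULL callback is treated as an error.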

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
Acked-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
---

Notes:
    v4:
    - Return error if callback is set to NULL
    - Replace raw number with a macro in monitor condition opaque data
    
    v2:
    - Use callback mechanism for more flexibility
    - Address feedback from Konstantin

 doc/guides/rel_notes/release_21_08.rst        |  2 ++
 drivers/event/dlb2/dlb2.c                     | 17 ++++++++--
 drivers/net/i40e/i40e_rxtx.c                  | 20 +++++++----
 drivers/net/iavf/iavf_rxtx.c                  | 20 +++++++----
 drivers/net/ice/ice_rxtx.c                    | 20 +++++++----
 drivers/net/ixgbe/ixgbe_rxtx.c                | 20 +++++++----
 drivers/net/mlx5/mlx5_rx.c                    | 17 ++++++++--
 .../include/generic/rte_power_intrinsics.h    | 33 +++++++++++++++----
 lib/eal/x86/rte_power_intrinsics.c            | 17 +++++-----
 9 files changed, 122 insertions(+), 44 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index 476822b47f..912fb13b84 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -144,6 +144,8 @@ API Changes
 * eal: ``rte_strscpy`` sets ``rte_errno`` to ``E2BIG`` in case of string
   truncation.
 
+* eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
+
 
 ABI Changes
 -----------
diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
index eca183753f..252bbd8d5e 100644
--- a/drivers/event/dlb2/dlb2.c
+++ b/drivers/event/dlb2/dlb2.c
@@ -3154,6 +3154,16 @@ dlb2_port_credits_inc(struct dlb2_port *qm_port, int num)
 	}
 }
 
+#define CLB_MASK_IDX 0
+#define CLB_VAL_IDX 1
+static int
+dlb2_monitor_callback(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	/* abort if the value matches */
+	return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0;
+}
+
 static inline int
 dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 		  struct dlb2_eventdev_port *ev_port,
@@ -3194,8 +3204,11 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 			expected_value = 0;
 
 		pmc.addr = monitor_addr;
-		pmc.val = expected_value;
-		pmc.mask = qe_mask.raw_qe[1];
+		/* store expected value and comparison mask in opaque data */
+		pmc.opaque[CLB_VAL_IDX] = expected_value;
+		pmc.opaque[CLB_MASK_IDX] = qe_mask.raw_qe[1];
+		/* set up callback */
+		pmc.fn = dlb2_monitor_callback;
 		pmc.size = sizeof(uint64_t);
 
 		rte_power_monitor(&pmc, timeout + start_ticks);
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index e518409fe5..8489f91f1d 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -81,6 +81,18 @@
 #define I40E_TX_OFFLOAD_SIMPLE_NOTSUP_MASK \
 		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_SIMPLE_SUP_MASK)
 
+static int
+i40e_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -93,12 +105,8 @@ i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = i40e_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index f817fbc49b..d61b32fcee 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -57,6 +57,18 @@ iavf_proto_xtr_type_to_rxdid(uint8_t flex_type)
 				rxdid_map[flex_type] : IAVF_RXDID_COMMS_OVS_1;
 }
 
+static int
+iavf_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -69,12 +81,8 @@ iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = iavf_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 3f6e735984..5d7ab4f047 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -27,6 +27,18 @@ uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_ip_offset_mask;
 
+static int
+ice_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -39,12 +51,8 @@ ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.status_error0;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
-	pmc->mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/* comparison callback */
+	pmc->fn = ice_monitor_callback;
 
 	/* register is 16-bit */
 	pmc->size = sizeof(uint16_t);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index d69f36e977..c814a28cb4 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1369,6 +1369,18 @@ const uint32_t
 		RTE_PTYPE_INNER_L3_IPV4_EXT | RTE_PTYPE_INNER_L4_UDP,
 };
 
+static int
+ixgbe_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -1381,12 +1393,8 @@ ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.upper.status_error;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
-	pmc->mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/* comparison callback */
+	pmc->fn = ixgbe_monitor_callback;
 
 	/* the registers are 32-bit */
 	pmc->size = sizeof(uint32_t);
diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index 777a1d6e45..8d47637892 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -269,6 +269,18 @@ mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id)
 	return rx_queue_count(rxq);
 }
 
+#define CLB_VAL_IDX 0
+#define CLB_MSK_IDX 1
+static int
+mlx5_monitor_callback(const uint64_t value,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	const uint64_t m = opaque[CLB_MSK_IDX];
+	const uint64_t v = opaque[CLB_VAL_IDX];
+
+	return (value & m) == v ? -1 : 0;
+}
+
 int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
 	struct mlx5_rxq_data *rxq = rx_queue;
@@ -282,8 +294,9 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 		return -rte_errno;
 	}
 	pmc->addr = &cqe->op_own;
-	pmc->val =  !!idx;
-	pmc->mask = MLX5_CQE_OWNER_MASK;
+	pmc->opaque[CLB_VAL_IDX] = !!idx;
+	pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
+	pmc->fn = mlx5_monitor_callback;
 	pmc->size = sizeof(uint8_t);
 	return 0;
 }
diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h
index dddca3d41c..c9aa52a86d 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -18,19 +18,38 @@
  * which are architecture-dependent.
  */
 
+/** Size of the opaque data in monitor condition */
+#define RTE_POWER_MONITOR_OPAQUE_SZ 4
+
+/**
+ * Callback definition for monitoring conditions. Callbacks with this signature
+ * will be used by `rte_power_monitor()` to check if the entering of power
+ * optimized state should be aborted.
+ *
+ * @param val
+ *   The value read from memory.
+ * @param opaque
+ *   Callback-specific data.
+ *
+ * @return
+ *   0 if entering of power optimized state should proceed
+ *   -1 if entering of power optimized state should be aborted
+ */
+typedef int (*rte_power_monitor_clb_t)(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ]);
 struct rte_power_monitor_cond {
 	volatile void *addr;  /**< Address to monitor for changes */
-	uint64_t val;         /**< If the `mask` is non-zero, location pointed
-	                       *   to by `addr` will be read and compared
-	                       *   against this value.
-	                       */
-	uint64_t mask;   /**< 64-bit mask to extract value read from `addr` */
-	uint8_t size;    /**< Data size (in bytes) that will be used to compare
-	                  *   expected value (`val`) with data read from the
+	uint8_t size;    /**< Data size (in bytes) that will be read from the
 	                  *   monitored memory location (`addr`). Can be 1, 2,
 	                  *   4, or 8. Supplying any other value will result in
 	                  *   an error.
 	                  */
+	rte_power_monitor_clb_t fn; /**< Callback to be used to check if
+	                             *   entering power optimized state should
+	                             *   be aborted.
+	                             */
+	uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ];
+	/**< Callback-specific data */
 };
 
 /**
diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
index 39ea9fdecd..66fea28897 100644
--- a/lib/eal/x86/rte_power_intrinsics.c
+++ b/lib/eal/x86/rte_power_intrinsics.c
@@ -76,6 +76,7 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32);
 	const unsigned int lcore_id = rte_lcore_id();
 	struct power_wait_status *s;
+	uint64_t cur_value;
 
 	/* prevent user from running this instruction if it's not supported */
 	if (!wait_supported)
@@ -91,6 +92,9 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	if (__check_val_size(pmc->size) < 0)
 		return -EINVAL;
 
+	if (pmc->fn == NULL)
+		return -EINVAL;
+
 	s = &wait_status[lcore_id];
 
 	/* update sleep address */
@@ -110,16 +114,11 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	/* now that we've put this address into monitor, we can unlock */
 	rte_spinlock_unlock(&s->lock);
 
-	/* if we have a comparison mask, we might not need to sleep at all */
-	if (pmc->mask) {
-		const uint64_t cur_value = __get_umwait_val(
-				pmc->addr, pmc->size);
-		const uint64_t masked = cur_value & pmc->mask;
+	cur_value = __get_umwait_val(pmc->addr, pmc->size);
 
-		/* if the masked value is already matching, abort */
-		if (masked == pmc->val)
-			goto end;
-	}
+	/* check if callback indicates we should abort */
+	if (pmc->fn(cur_value, pmc->opaque) != 0)
+		goto end;
 
 	/* execute UMWAIT */
 	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;"
-- 
2.25.1



* [dpdk-dev] [PATCH v9 5/8] power: remove thread safety from PMD power API's
    2021-07-09 15:53  3%     ` [dpdk-dev] [PATCH v9 1/8] eal: use callbacks for power monitoring comparison Anatoly Burakov
@ 2021-07-09 15:53  3%     ` Anatoly Burakov
  2021-07-09 16:00  3%       ` Anatoly Burakov
    2 siblings, 1 reply; 200+ results
From: Anatoly Burakov @ 2021-07-09 15:53 UTC (permalink / raw)
  To: dev, David Hunt; +Cc: ciara.loftus, konstantin.ananyev

Currently, we expect that only one callback can be active at any given
moment, for a particular queue configuration, which is relatively easy
to implement in a thread-safe way. However, we're about to add support
for multiple queues per lcore, which will greatly increase the
possibility of various race conditions.

We could have used something like RCU for this use case, but absent a
pressing need for thread safety we'll go the easy way and just mandate
that the APIs are called only when all affected ports are stopped, and
document this limitation. This greatly simplifies the
`rte_power_monitor`-related code.
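
The "queue must be stopped" rule boils down to a three-way check that is
mapped onto errno values. The sketch below is self-contained and uses
hypothetical `mock_*` names in place of `rte_eth_rx_queue_info_get()` and
the ethdev queue state; it only illustrates the mapping used in this
patch (invalid queue gives -EINVAL, running queue gives -EBUSY), not the
actual library code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the ethdev Rx queue state. */
enum mock_queue_state { MOCK_QUEUE_STOPPED, MOCK_QUEUE_STARTED };

struct mock_queue {
	bool valid;                  /* false models a failed info lookup */
	enum mock_queue_state state;
};

/* Returns -1 on lookup failure, 1 if stopped, 0 if still running. */
static int
mock_queue_stopped(const struct mock_queue *q)
{
	if (q == NULL || !q->valid)
		return -1;
	return q->state == MOCK_QUEUE_STOPPED;
}

/* Returns 0 if the power management scheme may be changed. */
static int
mock_pmgmt_precheck(const struct mock_queue *q)
{
	const int ret = mock_queue_stopped(q);

	assert(ret >= -1 && ret <= 1);
	if (ret != 1) {
		/* error means invalid queue, 0 means queue wasn't stopped */
		return ret < 0 ? -EINVAL : -EBUSY;
	}
	return 0;
}
```

With this shape, a caller can tell "stop the queue and retry" (-EBUSY)
apart from "this queue does not exist" (-EINVAL).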

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
---

Notes:
    v2:
    - Add check for stopped queue
    - Clarified doc message
    - Added release notes

 doc/guides/rel_notes/release_21_08.rst |   4 +
 lib/power/meson.build                  |   3 +
 lib/power/rte_power_pmd_mgmt.c         | 133 ++++++++++---------------
 lib/power/rte_power_pmd_mgmt.h         |   6 ++
 4 files changed, 66 insertions(+), 80 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index 912fb13b84..b9a3caabf0 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -146,6 +146,10 @@ API Changes
 
 * eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
 
+* rte_power: The experimental PMD power management API is no longer considered
+  to be thread safe; all Rx queues affected by the API will now need to be
+  stopped before making any changes to the power management scheme.
+
 
 ABI Changes
 -----------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index 36e5a65874..bf937acde4 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -22,4 +22,7 @@ headers = files(
         'rte_power_pmd_mgmt.h',
         'rte_power_guest_channel.h',
 )
+if cc.has_argument('-Wno-cast-qual')
+    cflags += '-Wno-cast-qual'
+endif
 deps += ['timer', 'ethdev']
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index db03cbf420..9b95cf1794 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -40,8 +40,6 @@ struct pmd_queue_cfg {
 	/**< Callback mode for this queue */
 	const struct rte_eth_rxtx_callback *cur_cb;
 	/**< Callback instance */
-	volatile bool umwait_in_progress;
-	/**< are we currently sleeping? */
 	uint64_t empty_poll_stats;
 	/**< Number of empty polls */
 } __rte_cache_aligned;
@@ -92,30 +90,11 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
 			struct rte_power_monitor_cond pmc;
 			uint16_t ret;
 
-			/*
-			 * we might get a cancellation request while being
-			 * inside the callback, in which case the wakeup
-			 * wouldn't work because it would've arrived too early.
-			 *
-			 * to get around this, we notify the other thread that
-			 * we're sleeping, so that it can spin until we're done.
-			 * unsolicited wakeups are perfectly safe.
-			 */
-			q_conf->umwait_in_progress = true;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-			/* check if we need to cancel sleep */
-			if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
-				/* use monitoring condition to sleep */
-				ret = rte_eth_get_monitor_addr(port_id, qidx,
-						&pmc);
-				if (ret == 0)
-					rte_power_monitor(&pmc, UINT64_MAX);
-			}
-			q_conf->umwait_in_progress = false;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+			/* use monitoring condition to sleep */
+			ret = rte_eth_get_monitor_addr(port_id, qidx,
+					&pmc);
+			if (ret == 0)
+				rte_power_monitor(&pmc, UINT64_MAX);
 		}
 	} else
 		q_conf->empty_poll_stats = 0;
@@ -177,12 +156,24 @@ clb_scale_freq(uint16_t port_id, uint16_t qidx,
 	return nb_rx;
 }
 
+static int
+queue_stopped(const uint16_t port_id, const uint16_t queue_id)
+{
+	struct rte_eth_rxq_info qinfo;
+
+	if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) < 0)
+		return -1;
+
+	return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
+}
+
 int
 rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
 {
 	struct pmd_queue_cfg *queue_cfg;
 	struct rte_eth_dev_info info;
+	rte_rx_callback_fn clb;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
@@ -203,6 +194,14 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		goto end;
 	}
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		ret = ret < 0 ? -EINVAL : -EBUSY;
+		goto end;
+	}
+
 	queue_cfg = &port_cfg[port_id][queue_id];
 
 	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED) {
@@ -232,17 +231,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->umwait_in_progress = false;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* ensure we update our state before callback starts */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_umwait, NULL);
+		clb = clb_umwait;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_SCALE:
@@ -269,16 +258,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id,
-				queue_id, clb_scale_freq, NULL);
+		clb = clb_scale_freq;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_PAUSE:
@@ -286,18 +266,21 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		if (global_data.tsc_per_us == 0)
 			calc_tsc();
 
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_pause, NULL);
+		clb = clb_pause;
 		break;
+	default:
+		RTE_LOG(DEBUG, POWER, "Invalid power management type\n");
+		ret = -EINVAL;
+		goto end;
 	}
+
+	/* initialize data before enabling the callback */
+	queue_cfg->empty_poll_stats = 0;
+	queue_cfg->cb_mode = mode;
+	queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+	queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
+			clb, NULL);
+
 	ret = 0;
 end:
 	return ret;
@@ -308,12 +291,20 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		uint16_t port_id, uint16_t queue_id)
 {
 	struct pmd_queue_cfg *queue_cfg;
+	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
 
 	if (lcore_id >= RTE_MAX_LCORE || queue_id >= RTE_MAX_QUEUES_PER_PORT)
 		return -EINVAL;
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		return ret < 0 ? -EINVAL : -EBUSY;
+	}
+
 	/* no need to check queue id as wrong queue id would not be enabled */
 	queue_cfg = &port_cfg[port_id][queue_id];
 
@@ -323,27 +314,8 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 	/* stop any callbacks from progressing */
 	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
 
-	/* ensure we update our state before continuing */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
 	switch (queue_cfg->cb_mode) {
-	case RTE_POWER_MGMT_TYPE_MONITOR:
-	{
-		bool exit = false;
-		do {
-			/*
-			 * we may request cancellation while the other thread
-			 * has just entered the callback but hasn't started
-			 * sleeping yet, so keep waking it up until we know it's
-			 * done sleeping.
-			 */
-			if (queue_cfg->umwait_in_progress)
-				rte_power_monitor_wakeup(lcore_id);
-			else
-				exit = true;
-		} while (!exit);
-	}
-	/* fall-through */
+	case RTE_POWER_MGMT_TYPE_MONITOR: /* fall-through */
 	case RTE_POWER_MGMT_TYPE_PAUSE:
 		rte_eth_remove_rx_callback(port_id, queue_id,
 				queue_cfg->cur_cb);
@@ -356,10 +328,11 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		break;
 	}
 	/*
-	 * we don't free the RX callback here because it is unsafe to do so
-	 * unless we know for a fact that all data plane threads have stopped.
+	 * the API doc mandates that the user stops all processing on affected
+	 * ports before calling any of these API's, so we can assume that the
+	 * callbacks can be freed. we're intentionally casting away const-ness.
 	 */
-	queue_cfg->cur_cb = NULL;
+	rte_free((void *)queue_cfg->cur_cb);
 
 	return 0;
 }
diff --git a/lib/power/rte_power_pmd_mgmt.h b/lib/power/rte_power_pmd_mgmt.h
index 7a0ac24625..444e7b8a66 100644
--- a/lib/power/rte_power_pmd_mgmt.h
+++ b/lib/power/rte_power_pmd_mgmt.h
@@ -43,6 +43,9 @@ enum rte_power_pmd_mgmt_type {
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue will be polled from.
  * @param port_id
@@ -69,6 +72,9 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id,
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue is polled from.
  * @param port_id
-- 
2.25.1



* [dpdk-dev] [PATCH v9 1/8] eal: use callbacks for power monitoring comparison
  @ 2021-07-09 15:53  3%     ` Anatoly Burakov
  2021-07-09 16:00  3%       ` Anatoly Burakov
  2021-07-09 15:53  3%     ` [dpdk-dev] [PATCH v9 5/8] power: remove thread safety from PMD power API's Anatoly Burakov
    2 siblings, 1 reply; 200+ results
From: Anatoly Burakov @ 2021-07-09 15:53 UTC (permalink / raw)
  To: dev, Timothy McDaniel, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Haiyue Wang, Matan Azrad, Shahaf Shuler,
	Viacheslav Ovsiienko, Bruce Richardson, Konstantin Ananyev
  Cc: david.hunt, ciara.loftus

Previously, `rte_power_monitor()` compared the (masked) value read from
memory against an expected value, and aborted the sleep if they matched.
This is somewhat inflexible, because it only allowed us to check for a
specific value in a specific way.

This commit replaces the comparison with a user callback mechanism, so
that any PMD (or other code) using `rte_power_monitor()` can define its
own comparison semantics and decide for itself when entering the
power-optimized state should be aborted.

Existing implementations are adjusted to follow the new semantics.
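
Drivers that still want the old "masked value equals expected value"
behaviour can carry both numbers in the opaque array, which is what the
dlb2 and mlx5 changes in this patch do. The sketch below is a
self-contained illustration of that pattern only; the `MOCK_*` macros and
both function names are hypothetical and are not part of DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins; slot indices are arbitrary but must match on both sides. */
#define MOCK_OPAQUE_SZ 4
#define MOCK_VAL_IDX 0
#define MOCK_MASK_IDX 1

/* Abort (-1) when the masked value equals the expected value, else 0. */
static int
masked_equal_callback(const uint64_t val,
		const uint64_t opaque[MOCK_OPAQUE_SZ])
{
	assert(opaque != NULL);
	return (val & opaque[MOCK_MASK_IDX]) == opaque[MOCK_VAL_IDX] ? -1 : 0;
}

/* Store expected value and mask where the callback will look for them. */
static void
set_masked_condition(uint64_t opaque[MOCK_OPAQUE_SZ],
		uint64_t expected, uint64_t mask)
{
	opaque[MOCK_VAL_IDX] = expected;
	opaque[MOCK_MASK_IDX] = mask;
}
```

Keeping the index macros next to the callback, as the drivers do, is what
stops the writer and the reader of the opaque array from disagreeing on
which slot holds the mask.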

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
Acked-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
---

Notes:
    v4:
    - Return error if callback is set to NULL
    - Replace raw number with a macro in monitor condition opaque data
    
    v2:
    - Use callback mechanism for more flexibility
    - Address feedback from Konstantin

 doc/guides/rel_notes/release_21_08.rst        |  2 ++
 drivers/event/dlb2/dlb2.c                     | 17 ++++++++--
 drivers/net/i40e/i40e_rxtx.c                  | 20 +++++++----
 drivers/net/iavf/iavf_rxtx.c                  | 20 +++++++----
 drivers/net/ice/ice_rxtx.c                    | 20 +++++++----
 drivers/net/ixgbe/ixgbe_rxtx.c                | 20 +++++++----
 drivers/net/mlx5/mlx5_rx.c                    | 17 ++++++++--
 .../include/generic/rte_power_intrinsics.h    | 33 +++++++++++++++----
 lib/eal/x86/rte_power_intrinsics.c            | 17 +++++-----
 9 files changed, 122 insertions(+), 44 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index 476822b47f..912fb13b84 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -144,6 +144,8 @@ API Changes
 * eal: ``rte_strscpy`` sets ``rte_errno`` to ``E2BIG`` in case of string
   truncation.
 
+* eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
+
 
 ABI Changes
 -----------
diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
index eca183753f..252bbd8d5e 100644
--- a/drivers/event/dlb2/dlb2.c
+++ b/drivers/event/dlb2/dlb2.c
@@ -3154,6 +3154,16 @@ dlb2_port_credits_inc(struct dlb2_port *qm_port, int num)
 	}
 }
 
+#define CLB_MASK_IDX 0
+#define CLB_VAL_IDX 1
+static int
+dlb2_monitor_callback(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	/* abort if the value matches */
+	return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0;
+}
+
 static inline int
 dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 		  struct dlb2_eventdev_port *ev_port,
@@ -3194,8 +3204,11 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 			expected_value = 0;
 
 		pmc.addr = monitor_addr;
-		pmc.val = expected_value;
-		pmc.mask = qe_mask.raw_qe[1];
+		/* store expected value and comparison mask in opaque data */
+		pmc.opaque[CLB_VAL_IDX] = expected_value;
+		pmc.opaque[CLB_MASK_IDX] = qe_mask.raw_qe[1];
+		/* set up callback */
+		pmc.fn = dlb2_monitor_callback;
 		pmc.size = sizeof(uint64_t);
 
 		rte_power_monitor(&pmc, timeout + start_ticks);
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index e518409fe5..8489f91f1d 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -81,6 +81,18 @@
 #define I40E_TX_OFFLOAD_SIMPLE_NOTSUP_MASK \
 		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_SIMPLE_SUP_MASK)
 
+static int
+i40e_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -93,12 +105,8 @@ i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = i40e_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index f817fbc49b..d61b32fcee 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -57,6 +57,18 @@ iavf_proto_xtr_type_to_rxdid(uint8_t flex_type)
 				rxdid_map[flex_type] : IAVF_RXDID_COMMS_OVS_1;
 }
 
+static int
+iavf_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -69,12 +81,8 @@ iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = iavf_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 3f6e735984..5d7ab4f047 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -27,6 +27,18 @@ uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_ip_offset_mask;
 
+static int
+ice_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -39,12 +51,8 @@ ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.status_error0;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
-	pmc->mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/* comparison callback */
+	pmc->fn = ice_monitor_callback;
 
 	/* register is 16-bit */
 	pmc->size = sizeof(uint16_t);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index d69f36e977..c814a28cb4 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1369,6 +1369,18 @@ const uint32_t
 		RTE_PTYPE_INNER_L3_IPV4_EXT | RTE_PTYPE_INNER_L4_UDP,
 };
 
+static int
+ixgbe_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -1381,12 +1393,8 @@ ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.upper.status_error;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
-	pmc->mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/* comparison callback */
+	pmc->fn = ixgbe_monitor_callback;
 
 	/* the registers are 32-bit */
 	pmc->size = sizeof(uint32_t);
diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index 777a1d6e45..8d47637892 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -269,6 +269,18 @@ mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id)
 	return rx_queue_count(rxq);
 }
 
+#define CLB_VAL_IDX 0
+#define CLB_MSK_IDX 1
+static int
+mlx5_monitor_callback(const uint64_t value,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	const uint64_t m = opaque[CLB_MSK_IDX];
+	const uint64_t v = opaque[CLB_VAL_IDX];
+
+	return (value & m) == v ? -1 : 0;
+}
+
 int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
 	struct mlx5_rxq_data *rxq = rx_queue;
@@ -282,8 +294,9 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 		return -rte_errno;
 	}
 	pmc->addr = &cqe->op_own;
-	pmc->val =  !!idx;
-	pmc->mask = MLX5_CQE_OWNER_MASK;
+	pmc->opaque[CLB_VAL_IDX] = !!idx;
+	pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
+	pmc->fn = mlx5_monitor_callback;
 	pmc->size = sizeof(uint8_t);
 	return 0;
 }
diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h
index dddca3d41c..c9aa52a86d 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -18,19 +18,38 @@
  * which are architecture-dependent.
  */
 
+/** Size of the opaque data in monitor condition */
+#define RTE_POWER_MONITOR_OPAQUE_SZ 4
+
+/**
+ * Callback definition for monitoring conditions. Callbacks with this signature
+ * will be used by `rte_power_monitor()` to check if the entering of power
+ * optimized state should be aborted.
+ *
+ * @param val
+ *   The value read from memory.
+ * @param opaque
+ *   Callback-specific data.
+ *
+ * @return
+ *   0 if entering of power optimized state should proceed
+ *   -1 if entering of power optimized state should be aborted
+ */
+typedef int (*rte_power_monitor_clb_t)(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ]);
 struct rte_power_monitor_cond {
 	volatile void *addr;  /**< Address to monitor for changes */
-	uint64_t val;         /**< If the `mask` is non-zero, location pointed
-	                       *   to by `addr` will be read and compared
-	                       *   against this value.
-	                       */
-	uint64_t mask;   /**< 64-bit mask to extract value read from `addr` */
-	uint8_t size;    /**< Data size (in bytes) that will be used to compare
-	                  *   expected value (`val`) with data read from the
+	uint8_t size;    /**< Data size (in bytes) that will be read from the
 	                  *   monitored memory location (`addr`). Can be 1, 2,
 	                  *   4, or 8. Supplying any other value will result in
 	                  *   an error.
 	                  */
+	rte_power_monitor_clb_t fn; /**< Callback to be used to check if
+	                             *   entering power optimized state should
+	                             *   be aborted.
+	                             */
+	uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ];
+	/**< Callback-specific data */
 };
 
 /**
diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
index 39ea9fdecd..66fea28897 100644
--- a/lib/eal/x86/rte_power_intrinsics.c
+++ b/lib/eal/x86/rte_power_intrinsics.c
@@ -76,6 +76,7 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32);
 	const unsigned int lcore_id = rte_lcore_id();
 	struct power_wait_status *s;
+	uint64_t cur_value;
 
 	/* prevent user from running this instruction if it's not supported */
 	if (!wait_supported)
@@ -91,6 +92,9 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	if (__check_val_size(pmc->size) < 0)
 		return -EINVAL;
 
+	if (pmc->fn == NULL)
+		return -EINVAL;
+
 	s = &wait_status[lcore_id];
 
 	/* update sleep address */
@@ -110,16 +114,11 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	/* now that we've put this address into monitor, we can unlock */
 	rte_spinlock_unlock(&s->lock);
 
-	/* if we have a comparison mask, we might not need to sleep at all */
-	if (pmc->mask) {
-		const uint64_t cur_value = __get_umwait_val(
-				pmc->addr, pmc->size);
-		const uint64_t masked = cur_value & pmc->mask;
+	cur_value = __get_umwait_val(pmc->addr, pmc->size);
 
-		/* if the masked value is already matching, abort */
-		if (masked == pmc->val)
-			goto end;
-	}
+	/* check if callback indicates we should abort */
+	if (pmc->fn(cur_value, pmc->opaque) != 0)
+		goto end;
 
 	/* execute UMWAIT */
 	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;"
-- 
2.25.1
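
The callback flow in the hunks above can be sketched outside of DPDK. This is a minimal, self-contained illustration, not the EAL implementation: `struct monitor_cond`, `read_value()` and `would_sleep()` are hypothetical stand-ins for `struct rte_power_monitor_cond`, `__get_umwait_val()` and the pre-UMWAIT check in `rte_power_monitor()`, and the index names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors RTE_POWER_MONITOR_OPAQUE_SZ from the patch. */
#define OPAQUE_SZ 4

/* Same shape as rte_power_monitor_clb_t: 0 = proceed to sleep, -1 = abort. */
typedef int (*monitor_clb_t)(const uint64_t val,
		const uint64_t opaque[OPAQUE_SZ]);

/* Hypothetical, reduced stand-in for struct rte_power_monitor_cond. */
struct monitor_cond {
	volatile void *addr;        /* address to monitor */
	uint8_t size;               /* 1, 2, 4 or 8 bytes */
	monitor_clb_t fn;           /* comparison callback */
	uint64_t opaque[OPAQUE_SZ]; /* callback-specific data */
};

/* Hypothetical slots in the opaque array, chosen by the driver. */
enum { VAL_IDX = 0, MASK_IDX = 1 };

/* Driver-side callback: abort the sleep if the masked value already matches. */
int
masked_equal_cb(const uint64_t val, const uint64_t opaque[OPAQUE_SZ])
{
	return (val & opaque[MASK_IDX]) == opaque[VAL_IDX] ? -1 : 0;
}

/* Read `size` bytes from the monitored location; size is validated by caller. */
static uint64_t
read_value(const volatile void *addr, uint8_t size)
{
	switch (size) {
	case 1: return *(const volatile uint8_t *)addr;
	case 2: return *(const volatile uint16_t *)addr;
	case 4: return *(const volatile uint32_t *)addr;
	default: return *(const volatile uint64_t *)addr;
	}
}

/*
 * Returns 1 if the caller would go on to sleep, 0 if the callback asked to
 * abort, and -1 if the condition is invalid (bad size or NULL callback).
 */
int
would_sleep(const struct monitor_cond *pmc)
{
	if (pmc == NULL || pmc->fn == NULL)
		return -1;
	if (pmc->size != 1 && pmc->size != 2 && pmc->size != 4 &&
			pmc->size != 8)
		return -1;
	return pmc->fn(read_value(pmc->addr, pmc->size), pmc->opaque) == 0;
}
```

A driver would fill `opaque[]` with whatever its callback needs (here a value and a mask, as mlx5 and dlb2 do) and point `fn` at the callback; the monitor code no longer needs to know how the comparison works.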



* Re: [dpdk-dev] [PATCH v1] doc: update ABI in MAINTAINERS file
  @ 2021-07-09 15:50  4%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2021-07-09 15:50 UTC (permalink / raw)
  To: Ray Kinsella
  Cc: dev, stephen, ktraynor, bruce.richardson, Ferruh Yigit, Neil Horman

25/06/2021 10:08, Ferruh Yigit:
> On 6/22/2021 4:50 PM, Ray Kinsella wrote:
> > Update to ABI MAINTAINERS.
> > 
> > Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
> > ---
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> >  ABI Policy & Versioning
> >  M: Ray Kinsella <mdr@ashroe.eu>
> > -M: Neil Horman <nhorman@tuxdriver.com>
> >  F: lib/eal/include/rte_compat.h
> >  F: lib/eal/include/rte_function_versioning.h
> >  F: doc/guides/contributing/abi_*.rst
> 
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> 
> > Tried to reach out to Neil multiple times about ABI issues without success.

Acked-by: Thomas Monjalon <thomas@monjalon.net>

Applied




* [dpdk-dev] [PATCH 3/3] bitrate: promote rte_stats_bitrate_free() to stable
  @ 2021-07-09 15:19  3% ` Kevin Traynor
  0 siblings, 0 replies; 200+ results
From: Kevin Traynor @ 2021-07-09 15:19 UTC (permalink / raw)
  To: dev; +Cc: mdr, Kevin Traynor, Hemant Agrawal

rte_stats_bitrate_free() has been in DPDK since 20.11.

Its signature is very basic as it just frees an opaque
data struct allocated in rte_stats_bitrate_create()
and returns void.

It's unlikely that such a basic signature would need to change
so might as well promote it to stable for the next major ABI.

Cc: Hemant Agrawal <hemant.agrawal@nxp.com>

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
---
 lib/bitratestats/rte_bitrate.h | 3 ---
 lib/bitratestats/version.map   | 4 ++--
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/lib/bitratestats/rte_bitrate.h b/lib/bitratestats/rte_bitrate.h
index fcd1564ddc..e494389b95 100644
--- a/lib/bitratestats/rte_bitrate.h
+++ b/lib/bitratestats/rte_bitrate.h
@@ -8,6 +8,4 @@
 #include <stdint.h>
 
-#include <rte_compat.h>
-
 #ifdef __cplusplus
 extern "C" {
@@ -36,5 +34,4 @@ struct rte_stats_bitrates *rte_stats_bitrate_create(void);
  *   Pointer allocated by rte_stats_bitrate_create()
  */
-__rte_experimental
 void rte_stats_bitrate_free(struct rte_stats_bitrates *bitrate_data);
 
diff --git a/lib/bitratestats/version.map b/lib/bitratestats/version.map
index 152730bb4e..a14d21ebba 100644
--- a/lib/bitratestats/version.map
+++ b/lib/bitratestats/version.map
@@ -9,7 +9,7 @@ DPDK_21 {
 };
 
-EXPERIMENTAL {
+DPDK_22 {
 	global:
 
 	rte_stats_bitrate_free;
-};
+} DPDK_21;
-- 
2.31.1



* Re: [dpdk-dev] [PATCH v3] doc: policy on the promotion of experimental APIs
  @ 2021-07-09  6:16  0%   ` Jerin Jacob
  2021-07-09 19:15  3%     ` Tyler Retzlaff
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2021-07-09  6:16 UTC (permalink / raw)
  To: Ray Kinsella
  Cc: dpdk-dev, Richardson, Bruce, John McNamara, roretzla,
	Ferruh Yigit, Thomas Monjalon, David Marchand, Stephen Hemminger

On Thu, Jul 1, 2021 at 4:08 PM Ray Kinsella <mdr@ashroe.eu> wrote:
>
> Clarifying the ABI policy on the promotion of experimental APIs to stable.
> We have a fair number of APIs that have been experimental for more than
> 2 years. This policy amendment indicates that these APIs should be
> promoted or removed, or should at least form a conversation between the
> maintainer and original contributor.
>
> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
> ---
> v2: addressing comments on abi expiry from Tyler Retzlaff.
> v3: addressing typos in the git commit message
>
>  doc/guides/contributing/abi_policy.rst | 22 +++++++++++++++++++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
> index 4ad87dbfed..840c295e5d 100644
> --- a/doc/guides/contributing/abi_policy.rst
> +++ b/doc/guides/contributing/abi_policy.rst
> @@ -26,9 +26,10 @@ General Guidelines
>     symbols is managed with :ref:`ABI Versioning <abi_versioning>`.
>  #. The removal of symbols is considered an :ref:`ABI breakage <abi_breakages>`,
>     once approved these will form part of the next ABI version.
> -#. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may
> -   be changed or removed without prior notice, as they are not considered part
> -   of an ABI version.
> +#. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may be
> +   changed or removed without prior notice, as they are not considered part of
> +   an ABI version. The :ref:`experimental <experimental_apis>` status of an API
> +   is not an indefinite state.
>  #. Updates to the :ref:`minimum hardware requirements <hw_rqmts>`, which drop
>     support for hardware which was previously supported, should be treated as an
>     ABI change.
> @@ -358,3 +359,18 @@ Libraries
>  Libraries marked as ``experimental`` are entirely not considered part of an ABI
>  version.
>  All functions in such libraries may be changed or removed without prior notice.
> +
> +Promotion to stable
> +~~~~~~~~~~~~~~~~~~~
> +
> +Ordinarily APIs marked as ``experimental`` will be promoted to the stable ABI
> +once a maintainer and/or the original contributor is satisfied that the API is
> +reasonably mature. In exceptional circumstances, should an API still be

Is this in line with the git commit message?
Why make an exceptional case? Why not make it stable after two years
or remove it?
My worry is that if we allow exceptional cases, it will be difficult to
enumerate them.

> +classified as ``experimental`` after two years and is without any prospect of
> +becoming part of the stable API. The API will then become a candidate for
> +removal, to avoid the accumulation of abandoned symbols.
> +
> +Should an API's Binary Interface change during the two year period, usually due
> +to a direct change in the API's signature, it is reasonable for the expiry
> +clock to reset. The promotion or removal of symbols will typically form part of
> +a conversation between the maintainer and the original contributor.
> --
> 2.26.2
>


* Re: [dpdk-dev] RFC enabling dll/dso for dpdk on windows
  2021-07-08 20:49  3% ` Dmitry Kozlyuk
@ 2021-07-09  1:03  2%   ` Tyler Retzlaff
  2021-07-16  9:40  4%     ` Dmitry Kozlyuk
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2021-07-09  1:03 UTC (permalink / raw)
  To: Dmitry Kozlyuk; +Cc: dev, thomas

On Thu, Jul 08, 2021 at 11:49:53PM +0300, Dmitry Kozlyuk wrote:
> Hi Tyler,
> 
> 2021-07-08 12:21 (UTC-0700), Tyler Retzlaff:
> > hi folks,
> > 
> > we would like to submit a patch series that makes dll/dso for dpdk
> > work on windows. there are two differences in the windows platform that
> > would need to be addressed through enhancements to dpdk.
> > 
> > (1) windows dynamic objects don't export sufficient information for
> >     tls variables and the windows loader and runtime would need to be
> >     enhanced in order to perform runtime linking. [1][2]
> 
> When will the new loader be available?

the solution i have prototyped does not directly export the tls variables
and instead relies on exports of tls offsets within a module.  no loader
change or new os is required.

> Will it be ported to Server 2019?

not necessary (as per above)

> Will this functionality require compiler support

the prototype was developed using windows clang. mingw code compiles but
i did not try to run it. i suspect it is okay, though i haven't examined any
side-effects when using emulated tls like mingw does. anyway mingw dll's
don't work now and that probably shouldn't block them being available with
clang.

> (you mention that accessing such variables will be "non-trivial")?

the solution involves exporting offsets that then allow explicit tls
accesses relative to the gs segment. it's non-trivial in the sense that
none of the normal explicit tls functions in windows are used and the
compiler doesn't generate the code for implicit tls access. the overhead
is relatively tolerable (one or two additional dereferences).

>  
> > (2) importing exported data symbols from a dll/dso on windows requires
> >     that the symbol be decorated with dllimport. optionally loading
> >     performance of dll/dso is also further improved by decorating
> >     exported function symbols. [3]
> 
> Does it affect ABI?

the data symbols are already part of the abi for linux. this just allows
them to be properly accessed when exported from a dll on windows.
surprisingly lld-link doesn't fail when building dll's now, which it should
in the absence of a __declspec(dllimport); ms link would.

on windows now the tls variables are exported but not useful. with this
change we would choose not to export them at all and each exported tls
variable would be replaced with a new variable.

one nit (which we will get separate feedback on) is how to export
symbols only on windows (and don't export them on linux) because similar
to the tls variables linux has no use for my new variables.

> 
> It is also a huge code change, although a mechanical one.
> Is it required? All exported symbols are listed in .map/def, after all.

if broad sweeping mechanical change is a sensitive issue we can limit
the change to just the data symbols which are required. but keeping in
mind there is a penalty on load time when the function symbols are not
decorated. ultimately we would like them all properly decorated but we
don't need to push it now since we're just trying to enable the
functionality.

> 
> > for (1) a novel approach is proposed where a new set of per_lcore
> > macros are introduced and used to replace existing macros with some
> > adjustment to declaration/definition usage is made. of note
> > 
> >     * on linux the new macros would expand compatibly to maintain abi
> >       of existing exported tls variables. since windows dynamic
> >       linking has never worked there is no compatibility concern for
> >       existing windows binaries.
> > 
> >     * the existing macros would be retained for api compatibility
> >       potentially with the intent of deprecating them at a later time.
> > 
> >     * new macros would be "internal" to dpdk they should not be
> >       available to applications as a part of the stable api.
> > 
> > for (2) we would propose the introduction and use of two macros to
> > allow decoration of exported data symbols. these macro would be or
> > similarly named __rte_import and __rte_export. of note
> > 
> >     * on linux the macros would expand empty but optionally
> >       in the future__rte_export could be enhanced to expand to
> >       __attribute__((visibility("default"))) enabling the use of gcc
> >       -fvisibility=hidden in dpdk to improve dso load times. [4][5]
> > 
> >     * on windows the macros would trivially expand to
> >       __declspec(dllimport) and __declspec(dllexport)
> > 
> >     * library meson.build files would need to define a preprocessor
> >       knob to control decoration internal/external to libraries
> >       exporting data symbols to ensure optimal code generation for
> >       accesses.
> 
> Either you mean a macro to switch __rte_export between dllimport/dllexport
> or I don't understand this point. BTW, what will __rte_export be for static
> build?

there are two import cases that a library like eal has when it exports a
data symbol.

e.g. if eal exports a variable where the variable is used both within
eal and outside of eal you want different expansions of __rte_import.

when consuming the variable within eal, if __declspec(dllimport) is used you
will get less-optimal codegen (because the code is generated for
imported access). however outside of eal you need the __declspec(dllimport)
to generate the correct code to access the exported data.

i haven't looked into how gcc/ld deals with this. maybe ld is just
smarter and figures out when to generate the optimal code.

static build doesn't really get negatively impacted by __rte_export but
when statically linking the ms linker will complain with warnings that
can be suppressed without harm.

it's still something that is on my mind (and i don't want to make it an
issue that blocks this proposal) but i'm starting to lean toward a build
time option where either static or dynamic build is requested instead of
cobbling both together out of the same build product.  but that is
really off topic for this change.
> 
> > 
> > the following is the base list of proposed macro additions for the new
> > per_lcore macros a new header is proposed named `rte_tls.h'
> 
> When rte_thread_key*() family of functions was introduced as rte_tls_*(),
> Jerin objected that there's a confusion with Transport Layer Security.
> How about RTE_THREAD_VAR, etc?

no objection, one of the reason i posted the set of macros from the
prototype was so people could offer up suggestions on better namespace.

> 
> > __rte_export
> > __rte_import
> > 
> >   have already been explained in (2)
> > 
> > __rte_thread_local
> > 
> >   is trivially expanded to __thread or _Thread_local or
> >   __declspec(thread) as appropriate.
> > 
> > RTE_DEFINE_TLS(vartype, varname, value)
> > RTE_DEFINE_TLS_EXPORT(vartype, varname, value)
> > RTE_DECLARE_TLS(vartype, varname)
> > RTE_DECLARE_TLS_IMPORT(vartype, varname)
> > 
> >   are roughly equivalent to RTE_DEFINE_PER_LCORE and
> >   RTE_DECLARE_PER_LCORE where the difference in the new macros are.
> > 
> >     * separate macros for exported vs non-exported variables.
> 
> Is it necessary, i.e. can't RTE_DECLARE/DEFINE_TLS compose with other
> attributes, like __rte_experimental and __rte_deprecated?

it's necessary insofar as the existing per-lcore variables that are
not imported/exported can still have a storage class specifier like static
applied without jumping through hoops.

i tried for some time to have a single declare/define macro but dealing
with 'static' being used made the problem hard. i can't just "shut-off"
the nested __rte_export/__rte_import expansion if static is put in front
of the macro or parameterized.

> 
> >     * DEFINE macros require initialization value as a parameter instead
> >       of the assignment usage. `RTE_DEFINE_PER_LCORE(t, n) = x;' there
> >       was no reasonable way to expand the windows variant of the macro
> >       to maintain assignment so it was parameterized to allow
> >       flexibility.
> > 
> > RTE_TLS(varname)
> > 
> >   is the equivalent of RTE_PER_LCORE to allow r/w access to the variable
> >   on linux the expansion is the same for windows it is non-trivial.
> > [...]



* Re: [dpdk-dev] RFC enabling dll/dso for dpdk on windows
  2021-07-08 19:21  3% [dpdk-dev] RFC enabling dll/dso for dpdk on windows Tyler Retzlaff
@ 2021-07-08 20:49  3% ` Dmitry Kozlyuk
  2021-07-09  1:03  2%   ` Tyler Retzlaff
  0 siblings, 1 reply; 200+ results
From: Dmitry Kozlyuk @ 2021-07-08 20:49 UTC (permalink / raw)
  To: Tyler Retzlaff; +Cc: dev, thomas

Hi Tyler,

2021-07-08 12:21 (UTC-0700), Tyler Retzlaff:
> hi folks,
> 
> we would like to submit a patch series that makes dll/dso for dpdk
> work on windows. there are two differences in the windows platform that
> would need to be addressed through enhancements to dpdk.
> 
> (1) windows dynamic objects don't export sufficient information for
>     tls variables and the windows loader and runtime would need to be
>     enhanced in order to perform runtime linking. [1][2]

When will the new loader be available?
Will it be ported to Server 2019?
Will this functionality require compiler support
(you mention that accessing such variables will be "non-trivial")?
 
> (2) importing exported data symbols from a dll/dso on windows requires
>     that the symbol be decorated with dllimport. optionally loading
>     performance of dll/dso is also further improved by decorating
>     exported function symbols. [3]

Does it affect ABI?

It is also a huge code change, although a mechanical one.
Is it required? All exported symbols are listed in .map/def, after all.

> for (1) a novel approach is proposed where a new set of per_lcore
> macros are introduced and used to replace existing macros with some
> adjustment to declaration/definition usage is made. of note
> 
>     * on linux the new macros would expand compatibly to maintain abi
>       of existing exported tls variables. since windows dynamic
>       linking has never worked there is no compatibility concern for
>       existing windows binaries.
> 
>     * the existing macros would be retained for api compatibility
>       potentially with the intent of deprecating them at a later time.
> 
>     * new macros would be "internal" to dpdk they should not be
>       available to applications as a part of the stable api.
> 
> for (2) we would propose the introduction and use of two macros to
> allow decoration of exported data symbols. these macro would be or
> similarly named __rte_import and __rte_export. of note
> 
>     * on linux the macros would expand empty but optionally
>       in the future__rte_export could be enhanced to expand to
>       __attribute__((visibility("default"))) enabling the use of gcc
>       -fvisibility=hidden in dpdk to improve dso load times. [4][5]
> 
>     * on windows the macros would trivially expand to
>       __declspec(dllimport) and __declspec(dllexport)
> 
>     * library meson.build files would need to define a preprocessor
>       knob to control decoration internal/external to libraries
>       exporting data symbols to ensure optimal code generation for
>       accesses.

Either you mean a macro to switch __rte_export between dllimport/dllexport
or I don't understand this point. BTW, what will __rte_export be for static
build?

> 
> the following is the base list of proposed macro additions for the new
> per_lcore macros a new header is proposed named `rte_tls.h'

When rte_thread_key*() family of functions was introduced as rte_tls_*(),
Jerin objected that there's a confusion with Transport Layer Security.
How about RTE_THREAD_VAR, etc?

> __rte_export
> __rte_import
> 
>   have already been explained in (2)
> 
> __rte_thread_local
> 
>   is trivially expanded to __thread or _Thread_local or
>   __declspec(thread) as appropriate.
> 
> RTE_DEFINE_TLS(vartype, varname, value)
> RTE_DEFINE_TLS_EXPORT(vartype, varname, value)
> RTE_DECLARE_TLS(vartype, varname)
> RTE_DECLARE_TLS_IMPORT(vartype, varname)
> 
>   are roughly equivalent to RTE_DEFINE_PER_LCORE and
>   RTE_DECLARE_PER_LCORE where the difference in the new macros are.
> 
>     * separate macros for exported vs non-exported variables.

Is it necessary, i.e. can't RTE_DECLARE/DEFINE_TLS compose with other
attributes, like __rte_experimental and __rte_deprecated?

>     * DEFINE macros require initialization value as a parameter instead
>       of the assignment usage. `RTE_DEFINE_PER_LCORE(t, n) = x;' there
>       was no reasonable way to expand the windows variant of the macro
>       to maintain assignment so it was parameterized to allow
>       flexibility.
> 
> RTE_TLS(varname)
> 
>   is the equivalent of RTE_PER_LCORE to allow r/w access to the variable
>   on linux the expansion is the same for windows it is non-trivial.
> [...]



* [dpdk-dev] RFC enabling dll/dso for dpdk on windows
@ 2021-07-08 19:21  3% Tyler Retzlaff
  2021-07-08 20:49  3% ` Dmitry Kozlyuk
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2021-07-08 19:21 UTC (permalink / raw)
  To: dev, dmitry.kozliuk, thomas

hi folks,

we would like to submit a patch series that makes dll/dso for dpdk
work on windows. there are two differences in the windows platform that
would need to be addressed through enhancements to dpdk.

(1) windows dynamic objects don't export sufficient information for
    tls variables and the windows loader and runtime would need to be
    enhanced in order to perform runtime linking. [1][2]

(2) importing exported data symbols from a dll/dso on windows requires
    that the symbol be decorated with dllimport. optionally loading
    performance of dll/dso is also further improved by decorating
    exported function symbols. [3]

for (1) a novel approach is proposed where a new set of per_lcore
macros is introduced and used to replace existing macros, with some
adjustment to declaration/definition usage. of note

    * on linux the new macros would expand compatibly to maintain abi
      of existing exported tls variables. since windows dynamic
      linking has never worked there is no compatibility concern for
      existing windows binaries.

    * the existing macros would be retained for api compatibility
      potentially with the intent of deprecating them at a later time.

    * new macros would be "internal" to dpdk they should not be
      available to applications as a part of the stable api.

for (2) we would propose the introduction and use of two macros to
allow decoration of exported data symbols. these macros would be named
__rte_import and __rte_export, or similar. of note

    * on linux the macros would expand empty but optionally
      in the future __rte_export could be enhanced to expand to
      __attribute__((visibility("default"))) enabling the use of gcc
      -fvisibility=hidden in dpdk to improve dso load times. [4][5]

    * on windows the macros would trivially expand to
      __declspec(dllimport) and __declspec(dllexport)

    * library meson.build files would need to define a preprocessor
      knob to control decoration internal/external to libraries
      exporting data symbols to ensure optimal code generation for
      accesses.
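
a minimal sketch of how the decoration in (2) could look, assuming the
proposed names __rte_export and __rte_import. the knob name
RTE_BUILDING_EXAMPLE_LIB and the symbol rte_example_counter are hypothetical
and exist only for this illustration; on windows the import macro is left
empty inside the library so accesses are generated as direct accesses.

```c
/*
 * Hypothetical build knob. In a real tree the library's meson.build would
 * pass it as -D while compiling the library itself; it is defined here only
 * so that the sketch fits in a single translation unit.
 */
#define RTE_BUILDING_EXAMPLE_LIB 1

#if defined(_WIN32)
#define __rte_export __declspec(dllexport)
#ifdef RTE_BUILDING_EXAMPLE_LIB
/* inside the library: plain access, no import thunk */
#define __rte_import
#else
/* consumers of the dll: imported access */
#define __rte_import __declspec(dllimport)
#endif
#else
/* on linux both macros expand empty */
#define __rte_export
#define __rte_import
#endif

/* what a public header would carry */
extern __rte_import int rte_example_counter;

/* what the library's C file would carry */
__rte_export int rte_example_counter = 42;

/* access from inside the library */
int
rte_example_counter_read(void)
{
	return rte_example_counter;
}
```

an application including the header would be compiled without the knob, so
on windows it would see the __declspec(dllimport) form of the declaration.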

the following is the base list of proposed macro additions for the new
per_lcore macros a new header is proposed named `rte_tls.h'

__rte_export
__rte_import

  have already been explained in (2)

__rte_thread_local

  is trivially expanded to __thread or _Thread_local or
  __declspec(thread) as appropriate.

RTE_DEFINE_TLS(vartype, varname, value)
RTE_DEFINE_TLS_EXPORT(vartype, varname, value)
RTE_DECLARE_TLS(vartype, varname)
RTE_DECLARE_TLS_IMPORT(vartype, varname)

  are roughly equivalent to RTE_DEFINE_PER_LCORE and
  RTE_DECLARE_PER_LCORE. the differences in the new macros are:

    * separate macros for exported vs non-exported variables.

    * DEFINE macros require initialization value as a parameter instead
      of the assignment usage. `RTE_DEFINE_PER_LCORE(t, n) = x;' there
      was no reasonable way to expand the windows variant of the macro
      to maintain assignment so it was parameterized to allow
      flexibility.

RTE_TLS(varname)

  is the equivalent of RTE_PER_LCORE to allow r/w access to the variable.
  on linux the expansion is the same; for windows it is non-trivial.
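
a minimal sketch of what the non-exported macros could expand to on linux,
assuming they keep the per_lcore_ symbol prefix used by the existing
RTE_DEFINE_PER_LCORE family. the expansions and the demo_counter variable
are illustrative assumptions, and the windows expansion (described above as
non-trivial) is not attempted here.

```c
/* pick the compiler's thread-local storage keyword */
#if defined(_MSC_VER)
#define __rte_thread_local __declspec(thread)
#else
#define __rte_thread_local __thread
#endif

/* definition takes the initial value as a parameter, as proposed */
#define RTE_DEFINE_TLS(vartype, varname, value) \
	__rte_thread_local vartype per_lcore_##varname = (value)

/* declaration for use in headers */
#define RTE_DECLARE_TLS(vartype, varname) \
	extern __rte_thread_local vartype per_lcore_##varname

/* r/w access to the calling thread's copy */
#define RTE_TLS(varname) (per_lcore_##varname)

/* hypothetical variable, one copy per thread */
RTE_DECLARE_TLS(int, demo_counter);
RTE_DEFINE_TLS(int, demo_counter, 7);

/* increment and return the calling thread's copy */
int
bump_demo_counter(void)
{
	return ++RTE_TLS(demo_counter);
}
```

because the initial value is a macro parameter rather than a trailing
assignment, a different platform is free to expand the definition into
something other than a plain initialized variable.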

we look forward to feedback on this proposal, once we have initial
feedback the series will be submitted where further review can take
place.

thanks

1.  https://docs.microsoft.com/en-us/cpp/error-messages/compiler-errors-1/compiler-error-c2492?view=msvc-160
2. https://docs.microsoft.com/en-us/windows/win32/debug/pe-format
3.  https://docs.microsoft.com/en-us/cpp/build/importing-into-an-application-using-declspec-dllimport?view=msvc-160
4. https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Function-Attributes.html
5. https://gcc.gnu.org/wiki/Visibility



* Re: [dpdk-dev] Use WFE for spinlock and ring
  @ 2021-07-08 16:58  0%       ` Honnappa Nagarahalli
  0 siblings, 0 replies; 200+ results
From: Honnappa Nagarahalli @ 2021-07-08 16:58 UTC (permalink / raw)
  To: Ruifeng Wang, Stephen Hemminger
  Cc: dev, david.marchand, thomas, jerinj, nd, Honnappa Nagarahalli, nd

<snip>

> >
> > On Sun, 25 Apr 2021 05:56:51 +0000
> > Ruifeng Wang <ruifeng.wang@arm.com> wrote:
> >
> > > The rte_wait_until_equal_xxx APIs abstract the functionality of
> > > 'polling for a memory location to become equal to a given value'[1].
> > >
> > > Use the API for the rte spinlock and ring implementations.
> > > With the wait until equal APIs being stable, changes will not impact ABI.
> > >
> > > [1] http://patches.dpdk.org/cover/62703/
> > >
> > > v3:
> > > Series rebased. (David)
> > >
> > > Gavin Hu (1):
> > >   spinlock: use wfe to reduce contention on aarch64
> > >
> > > Ruifeng Wang (1):
> > >   ring: use wfe to wait for ring tail update on aarch64
> > >
> > >  lib/eal/include/generic/rte_spinlock.h | 4 ++--
> > >  lib/ring/rte_ring_c11_pvt.h            | 4 ++--
> > >  lib/ring/rte_ring_generic_pvt.h        | 3 +--
> > >  3 files changed, 5 insertions(+), 6 deletions(-)
> > >
> >
> > Other places that should use WFE:
> Thank you Stephen for looking into this.
> 
> >
> > rte_mcslock.h:rte_mcslock_lock()
> Existing API can be used in this one.
> 
> > rte_mcslock_unlock:rte_mcslock_unlock()
> This one needs rte_wait_while_xxx variant.
> 
> >
> > rte_pflock.h:rte_pflock_lock()
> > rte_rwlock.h:rte_rwlock_read_lock()
> > rte_rwlock.h:rte_rwlock_write_lock()
> These occurrences have extra logic (AND, conditional branch, CAS) in the loop.
> I'm not sure generic API can be abstracted from these use cases.
I think it is possible to create additional abstractions to address these cases.
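
As a rough illustration of what such an additional abstraction might look like, here is a generic (non-WFE) sketch of a masked wait built on C11 atomics. The function name and signature are hypothetical and are not an existing DPDK API; an aarch64 build would replace the plain polling loop with a WFE-based wait.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper: poll until (*addr & mask) == expected.
 * Returns the number of loads performed, so 1 means the condition
 * already held on entry and no waiting was needed.
 */
uint64_t
wait_until_masked_equal_32(_Atomic uint32_t *addr, uint32_t mask,
		uint32_t expected)
{
	uint64_t polls = 1;

	/* acquire load so later reads are ordered after the observed value */
	while ((atomic_load_explicit(addr, memory_order_acquire) & mask) !=
			expected)
		polls++;

	return polls;
}
```

A rwlock-style caller could pass the writer bits as `mask` and 0 as `expected`; loops that also need a CAS would still have to retry around this helper.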

> 
> >
> >
> > You should also introduce rte_wait_while_XXX variants to handle some
> > of these cases.
> >
> 
> 



* Re: [dpdk-dev] [PATCH v8 1/7] power_intrinsics: use callbacks for comparison
  2021-07-08 14:13  3%   ` [dpdk-dev] [PATCH v8 1/7] power_intrinsics: use callbacks for comparison Anatoly Burakov
@ 2021-07-08 16:56  0%     ` McDaniel, Timothy
  0 siblings, 0 replies; 200+ results
From: McDaniel, Timothy @ 2021-07-08 16:56 UTC (permalink / raw)
  To: Burakov, Anatoly, dev, Xing, Beilei, Wu, Jingjing, Yang, Qiming,
	Zhang, Qi Z, Wang, Haiyue, Matan Azrad, Shahaf Shuler,
	Viacheslav Ovsiienko, Richardson, Bruce, Ananyev, Konstantin
  Cc: Loftus, Ciara, Hunt, David



> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: Thursday, July 8, 2021 9:14 AM
> To: dev@dpdk.org; McDaniel, Timothy <timothy.mcdaniel@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Yang,
> Qiming <qiming.yang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>; Shahaf
> Shuler <shahafs@nvidia.com>; Viacheslav Ovsiienko <viacheslavo@nvidia.com>;
> Richardson, Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>
> Cc: Loftus, Ciara <ciara.loftus@intel.com>; Hunt, David <david.hunt@intel.com>
> Subject: [PATCH v8 1/7] power_intrinsics: use callbacks for comparison
> 
> Previously, the semantics of power monitor were such that we were
> checking current value against the expected value, and if they matched,
> then the sleep was aborted. This is somewhat inflexible, because it only
> allowed us to check for a specific value in a specific way.
> 
> This commit replaces the comparison with a user callback mechanism, so
> that any PMD (or other code) using `rte_power_monitor()` can define
> their own comparison semantics and decision making on how to detect the
> need to abort the entering of power optimized state.
> 
> Existing implementations are adjusted to follow the new semantics.
> 
> Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
> 
> Notes:
>     v4:
>     - Return error if callback is set to NULL
>     - Replace raw number with a macro in monitor condition opaque data
> 
>     v2:
>     - Use callback mechanism for more flexibility
>     - Address feedback from Konstantin
> 
>  doc/guides/rel_notes/release_21_08.rst        |  2 ++
>  drivers/event/dlb2/dlb2.c                     | 17 ++++++++--
>  drivers/net/i40e/i40e_rxtx.c                  | 20 +++++++----
>  drivers/net/iavf/iavf_rxtx.c                  | 20 +++++++----
>  drivers/net/ice/ice_rxtx.c                    | 20 +++++++----
>  drivers/net/ixgbe/ixgbe_rxtx.c                | 20 +++++++----
>  drivers/net/mlx5/mlx5_rx.c                    | 17 ++++++++--
>  .../include/generic/rte_power_intrinsics.h    | 33 +++++++++++++++----
>  lib/eal/x86/rte_power_intrinsics.c            | 17 +++++-----
>  9 files changed, 122 insertions(+), 44 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
> index c92e016783..65910de348 100644
> --- a/doc/guides/rel_notes/release_21_08.rst
> +++ b/doc/guides/rel_notes/release_21_08.rst
> @@ -135,6 +135,8 @@ API Changes
>  * eal: ``rte_strscpy`` sets ``rte_errno`` to ``E2BIG`` in case of string
>    truncation.
> 
> +* eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
> +
> 
>  ABI Changes
>  -----------
> diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
> index eca183753f..252bbd8d5e 100644
> --- a/drivers/event/dlb2/dlb2.c
> +++ b/drivers/event/dlb2/dlb2.c
> @@ -3154,6 +3154,16 @@ dlb2_port_credits_inc(struct dlb2_port *qm_port, int num)
>  	}
>  }
> 
> +#define CLB_MASK_IDX 0
> +#define CLB_VAL_IDX 1
> +static int
> +dlb2_monitor_callback(const uint64_t val,
> +		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
> +{
> +	/* abort if the value matches */
> +	return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0;
> +}
> +
>  static inline int
>  dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
>  		  struct dlb2_eventdev_port *ev_port,
> @@ -3194,8 +3204,11 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
>  			expected_value = 0;
> 
>  		pmc.addr = monitor_addr;
> -		pmc.val = expected_value;
> -		pmc.mask = qe_mask.raw_qe[1];
> +		/* store expected value and comparison mask in opaque data */
> +		pmc.opaque[CLB_VAL_IDX] = expected_value;
> +		pmc.opaque[CLB_MASK_IDX] = qe_mask.raw_qe[1];
> +		/* set up callback */
> +		pmc.fn = dlb2_monitor_callback;
>  		pmc.size = sizeof(uint64_t);
> 
>  		rte_power_monitor(&pmc, timeout + start_ticks);
> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> index 8d65f287f4..65f325ede1 100644
> --- a/drivers/net/i40e/i40e_rxtx.c
> +++ b/drivers/net/i40e/i40e_rxtx.c
> @@ -81,6 +81,18 @@
>  #define I40E_TX_OFFLOAD_SIMPLE_NOTSUP_MASK \
>  		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_SIMPLE_SUP_MASK)
> 
> +static int
> +i40e_monitor_callback(const uint64_t value,
> +		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
> +{
> +	const uint64_t m = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
> +	/*
> +	 * we expect the DD bit to be set to 1 if this descriptor was already
> +	 * written to.
> +	 */
> +	return (value & m) == m ? -1 : 0;
> +}
> +
>  int
>  i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>  {
> @@ -93,12 +105,8 @@ i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>  	/* watch for changes in status bit */
>  	pmc->addr = &rxdp->wb.qword1.status_error_len;
> 
> -	/*
> -	 * we expect the DD bit to be set to 1 if this descriptor was already
> -	 * written to.
> -	 */
> -	pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
> -	pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
> +	/* comparison callback */
> +	pmc->fn = i40e_monitor_callback;
> 
>  	/* registers are 64-bit */
>  	pmc->size = sizeof(uint64_t);
> diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
> index f817fbc49b..d61b32fcee 100644
> --- a/drivers/net/iavf/iavf_rxtx.c
> +++ b/drivers/net/iavf/iavf_rxtx.c
> @@ -57,6 +57,18 @@ iavf_proto_xtr_type_to_rxdid(uint8_t flex_type)
>  				rxdid_map[flex_type] : IAVF_RXDID_COMMS_OVS_1;
>  }
> 
> +static int
> +iavf_monitor_callback(const uint64_t value,
> +		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
> +{
> +	const uint64_t m = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
> +	/*
> +	 * we expect the DD bit to be set to 1 if this descriptor was already
> +	 * written to.
> +	 */
> +	return (value & m) == m ? -1 : 0;
> +}
> +
>  int
>  iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>  {
> @@ -69,12 +81,8 @@ iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>  	/* watch for changes in status bit */
>  	pmc->addr = &rxdp->wb.qword1.status_error_len;
> 
> -	/*
> -	 * we expect the DD bit to be set to 1 if this descriptor was already
> -	 * written to.
> -	 */
> -	pmc->val = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
> -	pmc->mask = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
> +	/* comparison callback */
> +	pmc->fn = iavf_monitor_callback;
> 
>  	/* registers are 64-bit */
>  	pmc->size = sizeof(uint64_t);
> diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
> index 3f6e735984..5d7ab4f047 100644
> --- a/drivers/net/ice/ice_rxtx.c
> +++ b/drivers/net/ice/ice_rxtx.c
> @@ -27,6 +27,18 @@ uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask;
>  uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask;
>  uint64_t rte_net_ice_dynflag_proto_xtr_ip_offset_mask;
> 
> +static int
> +ice_monitor_callback(const uint64_t value,
> +		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
> +{
> +	const uint64_t m = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
> +	/*
> +	 * we expect the DD bit to be set to 1 if this descriptor was already
> +	 * written to.
> +	 */
> +	return (value & m) == m ? -1 : 0;
> +}
> +
>  int
>  ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>  {
> @@ -39,12 +51,8 @@ ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>  	/* watch for changes in status bit */
>  	pmc->addr = &rxdp->wb.status_error0;
> 
> -	/*
> -	 * we expect the DD bit to be set to 1 if this descriptor was already
> -	 * written to.
> -	 */
> -	pmc->val = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
> -	pmc->mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
> +	/* comparison callback */
> +	pmc->fn = ice_monitor_callback;
> 
>  	/* register is 16-bit */
>  	pmc->size = sizeof(uint16_t);
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> index d69f36e977..c814a28cb4 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> @@ -1369,6 +1369,18 @@ const uint32_t
>  		RTE_PTYPE_INNER_L3_IPV4_EXT | RTE_PTYPE_INNER_L4_UDP,
>  };
> 
> +static int
> +ixgbe_monitor_callback(const uint64_t value,
> +		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
> +{
> +	const uint64_t m = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
> +	/*
> +	 * we expect the DD bit to be set to 1 if this descriptor was already
> +	 * written to.
> +	 */
> +	return (value & m) == m ? -1 : 0;
> +}
> +
>  int
>  ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>  {
> @@ -1381,12 +1393,8 @@ ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>  	/* watch for changes in status bit */
>  	pmc->addr = &rxdp->wb.upper.status_error;
> 
> -	/*
> -	 * we expect the DD bit to be set to 1 if this descriptor was already
> -	 * written to.
> -	 */
> -	pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
> -	pmc->mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
> +	/* comparison callback */
> +	pmc->fn = ixgbe_monitor_callback;
> 
>  	/* the registers are 32-bit */
>  	pmc->size = sizeof(uint32_t);
> diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
> index 777a1d6e45..17370b77dc 100644
> --- a/drivers/net/mlx5/mlx5_rx.c
> +++ b/drivers/net/mlx5/mlx5_rx.c
> @@ -269,6 +269,18 @@ mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id)
>  	return rx_queue_count(rxq);
>  }
> 
> +#define CLB_VAL_IDX 0
> +#define CLB_MSK_IDX 1
> +static int
> +mlx_monitor_callback(const uint64_t value,
> +		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
> +{
> +	const uint64_t m = opaque[CLB_MSK_IDX];
> +	const uint64_t v = opaque[CLB_VAL_IDX];
> +
> +	return (value & m) == v ? -1 : 0;
> +}
> +
>  int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>  {
>  	struct mlx5_rxq_data *rxq = rx_queue;
> @@ -282,8 +294,9 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>  		return -rte_errno;
>  	}
>  	pmc->addr = &cqe->op_own;
> -	pmc->val =  !!idx;
> -	pmc->mask = MLX5_CQE_OWNER_MASK;
> +	pmc->opaque[CLB_VAL_IDX] = !!idx;
> +	pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
> +	pmc->fn = mlx_monitor_callback;
>  	pmc->size = sizeof(uint8_t);
>  	return 0;
>  }
> diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h
> index dddca3d41c..c9aa52a86d 100644
> --- a/lib/eal/include/generic/rte_power_intrinsics.h
> +++ b/lib/eal/include/generic/rte_power_intrinsics.h
> @@ -18,19 +18,38 @@
>   * which are architecture-dependent.
>   */
> 
> +/** Size of the opaque data in monitor condition */
> +#define RTE_POWER_MONITOR_OPAQUE_SZ 4
> +
> +/**
> + * Callback definition for monitoring conditions. Callbacks with this signature
> + * will be used by `rte_power_monitor()` to check if the entering of power
> + * optimized state should be aborted.
> + *
> + * @param val
> + *   The value read from memory.
> + * @param opaque
> + *   Callback-specific data.
> + *
> + * @return
> + *   0 if entering of power optimized state should proceed
> + *   -1 if entering of power optimized state should be aborted
> + */
> +typedef int (*rte_power_monitor_clb_t)(const uint64_t val,
> +		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ]);
>  struct rte_power_monitor_cond {
>  	volatile void *addr;  /**< Address to monitor for changes */
> -	uint64_t val;         /**< If the `mask` is non-zero, location pointed
> -	                       *   to by `addr` will be read and compared
> -	                       *   against this value.
> -	                       */
> -	uint64_t mask;   /**< 64-bit mask to extract value read from `addr` */
> -	uint8_t size;    /**< Data size (in bytes) that will be used to compare
> -	                  *   expected value (`val`) with data read from the
> +	uint8_t size;    /**< Data size (in bytes) that will be read from the
>  	                  *   monitored memory location (`addr`). Can be 1, 2,
>  	                  *   4, or 8. Supplying any other value will result in
>  	                  *   an error.
>  	                  */
> +	rte_power_monitor_clb_t fn; /**< Callback to be used to check if
> +	                             *   entering power optimized state should
> +	                             *   be aborted.
> +	                             */
> +	uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ];
> +	/**< Callback-specific data */
>  };
> 
>  /**
> diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
> index 39ea9fdecd..66fea28897 100644
> --- a/lib/eal/x86/rte_power_intrinsics.c
> +++ b/lib/eal/x86/rte_power_intrinsics.c
> @@ -76,6 +76,7 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
>  	const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32);
>  	const unsigned int lcore_id = rte_lcore_id();
>  	struct power_wait_status *s;
> +	uint64_t cur_value;
> 
>  	/* prevent user from running this instruction if it's not supported */
>  	if (!wait_supported)
> @@ -91,6 +92,9 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
>  	if (__check_val_size(pmc->size) < 0)
>  		return -EINVAL;
> 
> +	if (pmc->fn == NULL)
> +		return -EINVAL;
> +
>  	s = &wait_status[lcore_id];
> 
>  	/* update sleep address */
> @@ -110,16 +114,11 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
>  	/* now that we've put this address into monitor, we can unlock */
>  	rte_spinlock_unlock(&s->lock);
> 
> -	/* if we have a comparison mask, we might not need to sleep at all */
> -	if (pmc->mask) {
> -		const uint64_t cur_value = __get_umwait_val(
> -				pmc->addr, pmc->size);
> -		const uint64_t masked = cur_value & pmc->mask;
> +	cur_value = __get_umwait_val(pmc->addr, pmc->size);
> 
> -		/* if the masked value is already matching, abort */
> -		if (masked == pmc->val)
> -			goto end;
> -	}
> +	/* check if callback indicates we should abort */
> +	if (pmc->fn(cur_value, pmc->opaque) != 0)
> +		goto end;
> 
>  	/* execute UMWAIT */
>  	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;"
> --
> 2.25.1

DLB changes look good to me.

Acked-by: Timothy McDaniel <timothy.mcdaniel@intel.com>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v8 4/7] power: remove thread safety from PMD power API's
    2021-07-08 14:13  3%   ` [dpdk-dev] [PATCH v8 1/7] power_intrinsics: use callbacks for comparison Anatoly Burakov
@ 2021-07-08 14:13  3%   ` Anatoly Burakov
    2 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2021-07-08 14:13 UTC (permalink / raw)
  To: dev, David Hunt; +Cc: ciara.loftus, konstantin.ananyev

Currently, we expect that only one callback can be active at any given
moment, for a particular queue configuration, which is relatively easy
to implement in a thread-safe way. However, we're about to add support
for multiple queues per lcore, which will greatly increase the
possibility of various race conditions.

We could have used something like an RCU for this use case, but absent
a pressing need for thread safety we'll go the easy way and just
mandate that the APIs are to be called when all affected ports are
stopped, and document this limitation. This greatly simplifies the
`rte_power_monitor`-related code.
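
For illustration only, the stopped-queue precondition described above can
be sketched without DPDK. Everything prefixed `sketch_` is hypothetical
and merely stands in for the queue state lookup; only the mapping of the
three outcomes (invalid queue, queue still running, queue stopped) to
-EINVAL, -EBUSY and success follows this patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the queue state reported by the ethdev layer. */
enum sketch_queue_state {
	SKETCH_QUEUE_STOPPED = 0,
	SKETCH_QUEUE_STARTED = 1
};

/* Hypothetical queue descriptor; `valid` models a successful info lookup. */
struct sketch_queue {
	int valid;
	enum sketch_queue_state state;
};

/* Tri-state result: -1 on lookup failure, 1 if stopped, 0 if running. */
static int
sketch_queue_stopped(const struct sketch_queue *q)
{
	if (q == NULL || !q->valid)
		return -1;

	return q->state == SKETCH_QUEUE_STOPPED;
}

/* Precondition check run before changing the power management scheme. */
static int
sketch_pmgmt_precheck(const struct sketch_queue *q)
{
	const int ret = sketch_queue_stopped(q);

	if (ret != 1) {
		/* error means invalid queue, 0 means queue wasn't stopped */
		return ret < 0 ? -EINVAL : -EBUSY;
	}
	return 0;
}
```

A caller would be expected to stop the Rx queue first and retry on
-EBUSY, rather than relying on the library to synchronize with a running
data path.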

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v2:
    - Add check for stopped queue
    - Clarified doc message
    - Added release notes

 doc/guides/rel_notes/release_21_08.rst |   4 +
 lib/power/meson.build                  |   3 +
 lib/power/rte_power_pmd_mgmt.c         | 133 ++++++++++---------------
 lib/power/rte_power_pmd_mgmt.h         |   6 ++
 4 files changed, 66 insertions(+), 80 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index 65910de348..33e66d746b 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -137,6 +137,10 @@ API Changes
 
 * eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
 
+* rte_power: The experimental PMD power management API is no longer considered
+  to be thread safe; all Rx queues affected by the API will now need to be
+  stopped before making any changes to the power management scheme.
+
 
 ABI Changes
 -----------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index c1097d32f1..4f6a242364 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -21,4 +21,7 @@ headers = files(
         'rte_power_pmd_mgmt.h',
         'rte_power_guest_channel.h',
 )
+if cc.has_argument('-Wno-cast-qual')
+    cflags += '-Wno-cast-qual'
+endif
 deps += ['timer', 'ethdev']
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index db03cbf420..9b95cf1794 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -40,8 +40,6 @@ struct pmd_queue_cfg {
 	/**< Callback mode for this queue */
 	const struct rte_eth_rxtx_callback *cur_cb;
 	/**< Callback instance */
-	volatile bool umwait_in_progress;
-	/**< are we currently sleeping? */
 	uint64_t empty_poll_stats;
 	/**< Number of empty polls */
 } __rte_cache_aligned;
@@ -92,30 +90,11 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
 			struct rte_power_monitor_cond pmc;
 			uint16_t ret;
 
-			/*
-			 * we might get a cancellation request while being
-			 * inside the callback, in which case the wakeup
-			 * wouldn't work because it would've arrived too early.
-			 *
-			 * to get around this, we notify the other thread that
-			 * we're sleeping, so that it can spin until we're done.
-			 * unsolicited wakeups are perfectly safe.
-			 */
-			q_conf->umwait_in_progress = true;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-			/* check if we need to cancel sleep */
-			if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
-				/* use monitoring condition to sleep */
-				ret = rte_eth_get_monitor_addr(port_id, qidx,
-						&pmc);
-				if (ret == 0)
-					rte_power_monitor(&pmc, UINT64_MAX);
-			}
-			q_conf->umwait_in_progress = false;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+			/* use monitoring condition to sleep */
+			ret = rte_eth_get_monitor_addr(port_id, qidx,
+					&pmc);
+			if (ret == 0)
+				rte_power_monitor(&pmc, UINT64_MAX);
 		}
 	} else
 		q_conf->empty_poll_stats = 0;
@@ -177,12 +156,24 @@ clb_scale_freq(uint16_t port_id, uint16_t qidx,
 	return nb_rx;
 }
 
+static int
+queue_stopped(const uint16_t port_id, const uint16_t queue_id)
+{
+	struct rte_eth_rxq_info qinfo;
+
+	if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) < 0)
+		return -1;
+
+	return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
+}
+
 int
 rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
 {
 	struct pmd_queue_cfg *queue_cfg;
 	struct rte_eth_dev_info info;
+	rte_rx_callback_fn clb;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
@@ -203,6 +194,14 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		goto end;
 	}
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		ret = ret < 0 ? -EINVAL : -EBUSY;
+		goto end;
+	}
+
 	queue_cfg = &port_cfg[port_id][queue_id];
 
 	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED) {
@@ -232,17 +231,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->umwait_in_progress = false;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* ensure we update our state before callback starts */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_umwait, NULL);
+		clb = clb_umwait;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_SCALE:
@@ -269,16 +258,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id,
-				queue_id, clb_scale_freq, NULL);
+		clb = clb_scale_freq;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_PAUSE:
@@ -286,18 +266,21 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		if (global_data.tsc_per_us == 0)
 			calc_tsc();
 
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_pause, NULL);
+		clb = clb_pause;
 		break;
+	default:
+		RTE_LOG(DEBUG, POWER, "Invalid power management type\n");
+		ret = -EINVAL;
+		goto end;
 	}
+
+	/* initialize data before enabling the callback */
+	queue_cfg->empty_poll_stats = 0;
+	queue_cfg->cb_mode = mode;
+	queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+	queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
+			clb, NULL);
+
 	ret = 0;
 end:
 	return ret;
@@ -308,12 +291,20 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		uint16_t port_id, uint16_t queue_id)
 {
 	struct pmd_queue_cfg *queue_cfg;
+	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
 
 	if (lcore_id >= RTE_MAX_LCORE || queue_id >= RTE_MAX_QUEUES_PER_PORT)
 		return -EINVAL;
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		return ret < 0 ? -EINVAL : -EBUSY;
+	}
+
 	/* no need to check queue id as wrong queue id would not be enabled */
 	queue_cfg = &port_cfg[port_id][queue_id];
 
@@ -323,27 +314,8 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 	/* stop any callbacks from progressing */
 	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
 
-	/* ensure we update our state before continuing */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
 	switch (queue_cfg->cb_mode) {
-	case RTE_POWER_MGMT_TYPE_MONITOR:
-	{
-		bool exit = false;
-		do {
-			/*
-			 * we may request cancellation while the other thread
-			 * has just entered the callback but hasn't started
-			 * sleeping yet, so keep waking it up until we know it's
-			 * done sleeping.
-			 */
-			if (queue_cfg->umwait_in_progress)
-				rte_power_monitor_wakeup(lcore_id);
-			else
-				exit = true;
-		} while (!exit);
-	}
-	/* fall-through */
+	case RTE_POWER_MGMT_TYPE_MONITOR: /* fall-through */
 	case RTE_POWER_MGMT_TYPE_PAUSE:
 		rte_eth_remove_rx_callback(port_id, queue_id,
 				queue_cfg->cur_cb);
@@ -356,10 +328,11 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		break;
 	}
 	/*
-	 * we don't free the RX callback here because it is unsafe to do so
-	 * unless we know for a fact that all data plane threads have stopped.
+	 * the API doc mandates that the user stops all processing on affected
+	 * ports before calling any of these API's, so we can assume that the
+	 * callbacks can be freed. we're intentionally casting away const-ness.
 	 */
-	queue_cfg->cur_cb = NULL;
+	rte_free((void *)queue_cfg->cur_cb);
 
 	return 0;
 }
diff --git a/lib/power/rte_power_pmd_mgmt.h b/lib/power/rte_power_pmd_mgmt.h
index 7a0ac24625..444e7b8a66 100644
--- a/lib/power/rte_power_pmd_mgmt.h
+++ b/lib/power/rte_power_pmd_mgmt.h
@@ -43,6 +43,9 @@ enum rte_power_pmd_mgmt_type {
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue will be polled from.
  * @param port_id
@@ -69,6 +72,9 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id,
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue is polled from.
  * @param port_id
-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v8 1/7] power_intrinsics: use callbacks for comparison
  @ 2021-07-08 14:13  3%   ` Anatoly Burakov
  2021-07-08 16:56  0%     ` McDaniel, Timothy
  2021-07-08 14:13  3%   ` [dpdk-dev] [PATCH v8 4/7] power: remove thread safety from PMD power API's Anatoly Burakov
    2 siblings, 1 reply; 200+ results
From: Anatoly Burakov @ 2021-07-08 14:13 UTC (permalink / raw)
  To: dev, Timothy McDaniel, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Haiyue Wang, Matan Azrad, Shahaf Shuler,
	Viacheslav Ovsiienko, Bruce Richardson, Konstantin Ananyev
  Cc: ciara.loftus, david.hunt

Previously, the semantics of power monitor were such that we were
checking current value against the expected value, and if they matched,
then the sleep was aborted. This is somewhat inflexible, because it only
allowed us to check for a specific value in a specific way.

This commit replaces the comparison with a user callback mechanism, so
that any PMD (or other code) using `rte_power_monitor()` can define
its own comparison semantics and decide for itself when entering the
power-optimized state must be aborted.

Existing implementations are adjusted to follow the new semantics.
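
As a rough, DPDK-free sketch of the new semantics: the monitored location
is read once and passed to the callback together with the opaque data,
and a non-zero return aborts the sleep. All `sketch_` names are
hypothetical and mirror, but are not, the definitions in this patch.
Unlike `rte_power_monitor()`, the sketch reports the decision in its
return value instead of executing UMWAIT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_OPAQUE_SZ 4 /* mirrors RTE_POWER_MONITOR_OPAQUE_SZ */

/* Callback type: 0 lets the sleep proceed, -1 aborts it. */
typedef int (*sketch_monitor_clb_t)(const uint64_t val,
		const uint64_t opaque[SKETCH_OPAQUE_SZ]);

/* Hypothetical mirror of struct rte_power_monitor_cond after this patch. */
struct sketch_monitor_cond {
	volatile void *addr;               /* address to monitor */
	uint8_t size;                      /* 1, 2, 4 or 8 bytes */
	sketch_monitor_clb_t fn;           /* abort-decision callback */
	uint64_t opaque[SKETCH_OPAQUE_SZ]; /* callback-specific data */
};

enum { SKETCH_MASK_IDX = 0, SKETCH_VAL_IDX = 1 };

/* Reproduces the old val/mask comparison on top of the callback. */
static int
sketch_match_callback(const uint64_t val,
		const uint64_t opaque[SKETCH_OPAQUE_SZ])
{
	/* abort if the masked value matches the expected value */
	return (val & opaque[SKETCH_MASK_IDX]) == opaque[SKETCH_VAL_IDX] ?
			-1 : 0;
}

/*
 * Returns -EINVAL on bad input, 1 if the callback asked to abort the
 * sleep, 0 if the sleep would proceed.
 */
static int
sketch_power_monitor(const struct sketch_monitor_cond *pmc)
{
	uint64_t cur_value;

	assert(pmc != NULL);
	if (pmc->fn == NULL || pmc->addr == NULL)
		return -EINVAL;

	switch (pmc->size) {
	case 1: cur_value = *(volatile const uint8_t *)pmc->addr; break;
	case 2: cur_value = *(volatile const uint16_t *)pmc->addr; break;
	case 4: cur_value = *(volatile const uint32_t *)pmc->addr; break;
	case 8: cur_value = *(volatile const uint64_t *)pmc->addr; break;
	default: return -EINVAL;
	}

	/* check if callback indicates we should abort */
	return pmc->fn(cur_value, pmc->opaque) != 0 ? 1 : 0;
}
```

A driver that needs something other than a mask-and-compare check, for
example a range test or a comparison across several bits, would only have
to supply a different callback and fill `opaque` accordingly.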

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---

Notes:
    v4:
    - Return error if callback is set to NULL
    - Replace raw number with a macro in monitor condition opaque data
    
    v2:
    - Use callback mechanism for more flexibility
    - Address feedback from Konstantin

 doc/guides/rel_notes/release_21_08.rst        |  2 ++
 drivers/event/dlb2/dlb2.c                     | 17 ++++++++--
 drivers/net/i40e/i40e_rxtx.c                  | 20 +++++++----
 drivers/net/iavf/iavf_rxtx.c                  | 20 +++++++----
 drivers/net/ice/ice_rxtx.c                    | 20 +++++++----
 drivers/net/ixgbe/ixgbe_rxtx.c                | 20 +++++++----
 drivers/net/mlx5/mlx5_rx.c                    | 17 ++++++++--
 .../include/generic/rte_power_intrinsics.h    | 33 +++++++++++++++----
 lib/eal/x86/rte_power_intrinsics.c            | 17 +++++-----
 9 files changed, 122 insertions(+), 44 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index c92e016783..65910de348 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -135,6 +135,8 @@ API Changes
 * eal: ``rte_strscpy`` sets ``rte_errno`` to ``E2BIG`` in case of string
   truncation.
 
+* eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
+
 
 ABI Changes
 -----------
diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
index eca183753f..252bbd8d5e 100644
--- a/drivers/event/dlb2/dlb2.c
+++ b/drivers/event/dlb2/dlb2.c
@@ -3154,6 +3154,16 @@ dlb2_port_credits_inc(struct dlb2_port *qm_port, int num)
 	}
 }
 
+#define CLB_MASK_IDX 0
+#define CLB_VAL_IDX 1
+static int
+dlb2_monitor_callback(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	/* abort if the value matches */
+	return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0;
+}
+
 static inline int
 dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 		  struct dlb2_eventdev_port *ev_port,
@@ -3194,8 +3204,11 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 			expected_value = 0;
 
 		pmc.addr = monitor_addr;
-		pmc.val = expected_value;
-		pmc.mask = qe_mask.raw_qe[1];
+		/* store expected value and comparison mask in opaque data */
+		pmc.opaque[CLB_VAL_IDX] = expected_value;
+		pmc.opaque[CLB_MASK_IDX] = qe_mask.raw_qe[1];
+		/* set up callback */
+		pmc.fn = dlb2_monitor_callback;
 		pmc.size = sizeof(uint64_t);
 
 		rte_power_monitor(&pmc, timeout + start_ticks);
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 8d65f287f4..65f325ede1 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -81,6 +81,18 @@
 #define I40E_TX_OFFLOAD_SIMPLE_NOTSUP_MASK \
 		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_SIMPLE_SUP_MASK)
 
+static int
+i40e_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -93,12 +105,8 @@ i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = i40e_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index f817fbc49b..d61b32fcee 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -57,6 +57,18 @@ iavf_proto_xtr_type_to_rxdid(uint8_t flex_type)
 				rxdid_map[flex_type] : IAVF_RXDID_COMMS_OVS_1;
 }
 
+static int
+iavf_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -69,12 +81,8 @@ iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = iavf_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 3f6e735984..5d7ab4f047 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -27,6 +27,18 @@ uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_ip_offset_mask;
 
+static int
+ice_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -39,12 +51,8 @@ ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.status_error0;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
-	pmc->mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/* comparison callback */
+	pmc->fn = ice_monitor_callback;
 
 	/* register is 16-bit */
 	pmc->size = sizeof(uint16_t);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index d69f36e977..c814a28cb4 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1369,6 +1369,18 @@ const uint32_t
 		RTE_PTYPE_INNER_L3_IPV4_EXT | RTE_PTYPE_INNER_L4_UDP,
 };
 
+static int
+ixgbe_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -1381,12 +1393,8 @@ ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.upper.status_error;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
-	pmc->mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/* comparison callback */
+	pmc->fn = ixgbe_monitor_callback;
 
 	/* the registers are 32-bit */
 	pmc->size = sizeof(uint32_t);
diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index 777a1d6e45..17370b77dc 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -269,6 +269,18 @@ mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id)
 	return rx_queue_count(rxq);
 }
 
+#define CLB_VAL_IDX 0
+#define CLB_MSK_IDX 1
+static int
+mlx_monitor_callback(const uint64_t value,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	const uint64_t m = opaque[CLB_MSK_IDX];
+	const uint64_t v = opaque[CLB_VAL_IDX];
+
+	return (value & m) == v ? -1 : 0;
+}
+
 int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
 	struct mlx5_rxq_data *rxq = rx_queue;
@@ -282,8 +294,9 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 		return -rte_errno;
 	}
 	pmc->addr = &cqe->op_own;
-	pmc->val =  !!idx;
-	pmc->mask = MLX5_CQE_OWNER_MASK;
+	pmc->opaque[CLB_VAL_IDX] = !!idx;
+	pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
+	pmc->fn = mlx_monitor_callback;
 	pmc->size = sizeof(uint8_t);
 	return 0;
 }
diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h
index dddca3d41c..c9aa52a86d 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -18,19 +18,38 @@
  * which are architecture-dependent.
  */
 
+/** Size of the opaque data in monitor condition */
+#define RTE_POWER_MONITOR_OPAQUE_SZ 4
+
+/**
+ * Callback definition for monitoring conditions. Callbacks with this signature
+ * will be used by `rte_power_monitor()` to check if the entering of power
+ * optimized state should be aborted.
+ *
+ * @param val
+ *   The value read from memory.
+ * @param opaque
+ *   Callback-specific data.
+ *
+ * @return
+ *   0 if entering of power optimized state should proceed
+ *   -1 if entering of power optimized state should be aborted
+ */
+typedef int (*rte_power_monitor_clb_t)(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ]);
 struct rte_power_monitor_cond {
 	volatile void *addr;  /**< Address to monitor for changes */
-	uint64_t val;         /**< If the `mask` is non-zero, location pointed
-	                       *   to by `addr` will be read and compared
-	                       *   against this value.
-	                       */
-	uint64_t mask;   /**< 64-bit mask to extract value read from `addr` */
-	uint8_t size;    /**< Data size (in bytes) that will be used to compare
-	                  *   expected value (`val`) with data read from the
+	uint8_t size;    /**< Data size (in bytes) that will be read from the
 	                  *   monitored memory location (`addr`). Can be 1, 2,
 	                  *   4, or 8. Supplying any other value will result in
 	                  *   an error.
 	                  */
+	rte_power_monitor_clb_t fn; /**< Callback to be used to check if
+	                             *   entering power optimized state should
+	                             *   be aborted.
+	                             */
+	uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ];
+	/**< Callback-specific data */
 };
 
 /**
diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
index 39ea9fdecd..66fea28897 100644
--- a/lib/eal/x86/rte_power_intrinsics.c
+++ b/lib/eal/x86/rte_power_intrinsics.c
@@ -76,6 +76,7 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32);
 	const unsigned int lcore_id = rte_lcore_id();
 	struct power_wait_status *s;
+	uint64_t cur_value;
 
 	/* prevent user from running this instruction if it's not supported */
 	if (!wait_supported)
@@ -91,6 +92,9 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	if (__check_val_size(pmc->size) < 0)
 		return -EINVAL;
 
+	if (pmc->fn == NULL)
+		return -EINVAL;
+
 	s = &wait_status[lcore_id];
 
 	/* update sleep address */
@@ -110,16 +114,11 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	/* now that we've put this address into monitor, we can unlock */
 	rte_spinlock_unlock(&s->lock);
 
-	/* if we have a comparison mask, we might not need to sleep at all */
-	if (pmc->mask) {
-		const uint64_t cur_value = __get_umwait_val(
-				pmc->addr, pmc->size);
-		const uint64_t masked = cur_value & pmc->mask;
+	cur_value = __get_umwait_val(pmc->addr, pmc->size);
 
-		/* if the masked value is already matching, abort */
-		if (masked == pmc->val)
-			goto end;
-	}
+	/* check if callback indicates we should abort */
+	if (pmc->fn(cur_value, pmc->opaque) != 0)
+		goto end;
 
 	/* execute UMWAIT */
 	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;"
-- 
2.25.1
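
For readers following the patch above, here is a minimal standalone sketch of the callback-based check. It does not use DPDK headers: `struct monitor_cond`, `monitor_clb_t`, `read_monitored()` and `monitor_would_sleep()` are hypothetical stand-ins that only mirror the shape of `struct rte_power_monitor_cond` and `rte_power_monitor_clb_t` in the diff. The UMONITOR/UMWAIT part and the per-lcore locking are left out, so this shows only the decision of whether to sleep.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors RTE_POWER_MONITOR_OPAQUE_SZ from the patch (assumed value: 4). */
#define MONITOR_OPAQUE_SZ 4

/* Same shape as rte_power_monitor_clb_t: 0 = proceed to sleep, -1 = abort. */
typedef int (*monitor_clb_t)(const uint64_t val,
		const uint64_t opaque[MONITOR_OPAQUE_SZ]);

/* Hypothetical, simplified stand-in for struct rte_power_monitor_cond. */
struct monitor_cond {
	volatile void *addr;                 /* address to monitor */
	uint8_t size;                        /* 1, 2, 4 or 8 bytes */
	monitor_clb_t fn;                    /* comparison callback */
	uint64_t opaque[MONITOR_OPAQUE_SZ];  /* callback-specific data */
};

enum { CLB_VAL_IDX = 0, CLB_MSK_IDX = 1 };

/* Same logic as the mlx5 callback in the diff: abort on masked match. */
static int
masked_equal_clb(const uint64_t val, const uint64_t opaque[MONITOR_OPAQUE_SZ])
{
	const uint64_t m = opaque[CLB_MSK_IDX];
	const uint64_t v = opaque[CLB_VAL_IDX];

	return (val & m) == v ? -1 : 0;
}

/* Read `size` bytes from the monitored location; -1 on unsupported size. */
static int
read_monitored(const volatile void *p, uint8_t size, uint64_t *out)
{
	switch (size) {
	case 1: *out = *(const volatile uint8_t *)p; return 0;
	case 2: *out = *(const volatile uint16_t *)p; return 0;
	case 4: *out = *(const volatile uint32_t *)p; return 0;
	case 8: *out = *(const volatile uint64_t *)p; return 0;
	default: return -1;
	}
}

/*
 * Returns 1 if the caller would go on to sleep, 0 if the callback asked
 * to abort, and -1 on invalid input (including a NULL callback, which the
 * patch rejects with -EINVAL).
 */
static int
monitor_would_sleep(const struct monitor_cond *c)
{
	uint64_t cur;

	if (c == NULL || c->addr == NULL || c->fn == NULL)
		return -1;
	if (read_monitored(c->addr, c->size, &cur) < 0)
		return -1;
	return c->fn(cur, c->opaque) != 0 ? 0 : 1;
}
```

A driver would fill `opaque[CLB_VAL_IDX]` and `opaque[CLB_MSK_IDX]` the way `mlx5_get_monitor_addr()` does in the diff; a driver whose expected value is a compile-time constant, such as ixgbe, can ignore `opaque` entirely.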



* Re: [dpdk-dev] [PATCH v6] ethdev: add new ext hdr for gtp psc
  @ 2021-07-08 10:29  0%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2021-07-08 10:29 UTC (permalink / raw)
  To: Raslan Darawsheh, Andrew Rybchenko
  Cc: Singh, Aman Deep, dev, david.marchand, Olivier Matz

08/07/2021 11:39, Andrew Rybchenko:
> On 7/8/21 12:27 PM, Raslan Darawsheh wrote:
> > Thank you for the review,
> > 
> > Basically, it is not used yet because using it would break the ABI.
> > The main intended use is in the rte_flow gtp_psc item, to replace the
> > current structure with the header definition. Since that would break
> > the ABI, I am adding the header definition now; it will be used in
> > rte_flow later.
> 
> @Thomas If so, should we accept it in the current release cycle
> or should it simply wait for the code which uses it?

If there is no need, we can wait for the next release.


> > ------------------------------------------------------------------------
> > *From:* Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> > *Sent:* Thursday, July 8, 2021, 12:23 PM
> > *To:* Raslan Darawsheh; Singh, Aman Deep; dev@dpdk.org
> > *Subject:* Re: [dpdk-dev] [PATCH v6] ethdev: add new ext hdr for gtp psc
> > 
> > Hi Raslan,
> > 
> > On 7/6/21 5:24 PM, Raslan Darawsheh wrote:
> >> Hi Guys,
> >>
> >> Sorry for missing this mail; for some reason it was missed in my inbox.
> >> This is the link to this RFC:
> >> https://www.3gpp.org/ftp/Specs/archive/38_series/38.415/38415-g30.zip
> > 
> > Thanks for the link. The patch LGTM, but I have only one question left.
> > Where is it used? Are you going to upstream corresponding code in
> > the release cycle?
> > 
> > Andrew.
> > 
> >> Kindest regards,
> >> Raslan Darawsheh
> >>
> >>> -----Original Message-----
> >>> From: dev <dev-bounces@dpdk.org> On Behalf Of Andrew Rybchenko
> >>> Sent: Thursday, July 1, 2021 5:06 PM
> >>> To: Singh, Aman Deep <aman.deep.singh@intel.com>; dev@dpdk.org
> >>> Subject: Re: [dpdk-dev] [PATCH v6] ethdev: add new ext hdr for gtp psc
> >>>
> >>> Hi Raslan,
> >>>
> >>> could you reply, please.
> >>>
> >>> Andrew.
> >>>
> >>> On 6/22/21 10:27 AM, Singh, Aman Deep wrote:
> >>>> Hi Raslan,
> >>>>
> >>>> Can you please provide a link to this RFC 38415-g30? I just had some
> >>>> doubts about the byte-order conversion as per RFC 1700
> >>>> <https://tools.ietf.org/html/rfc1700>
> >>>>
> >>>> Regards
> >>>> Aman





* Re: [dpdk-dev] [PATCH V3] ethdev: add dev configured flag
  @ 2021-07-08  9:56  3%   ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2021-07-08  9:56 UTC (permalink / raw)
  To: Huisong Li, Thomas Monjalon, Andrew Rybchenko, Yigit, Ferruh
  Cc: dev, Ananyev, Konstantin, Mcnamara, John, Ray Kinsella, Dodji Seketeli

On Wed, Jul 7, 2021 at 11:54 AM Huisong Li <lihuisong@huawei.com> wrote:
>
> Currently, if dev_configure is not called or fails to be called, users
> can still call dev_start successfully. So it is necessary to have a flag
> which indicates whether the device is configured, to control whether
> dev_start can be called and eliminate dependency on user invocation order.
>
> The flag stored in "struct rte_eth_dev_data" is more reasonable than
>  "enum rte_eth_dev_state". "enum rte_eth_dev_state" is private to the
> primary and secondary processes, and can be independently controlled.
> However, the secondary process does not make resource allocations and
> does not call dev_configure(). These are done by the primary process
> and can be obtained or used by the secondary process. So this patch
> adds a "dev_configured" flag in "rte_eth_dev_data", like "dev_started".
>
> Signed-off-by: Huisong Li <lihuisong@huawei.com>
> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

As explained in the thread, I added a rather "large" ABI exception
rule so that we can merge this patch.

+; Ignore all changes to rte_eth_dev_data
+; Note: we only cared about dev_configured bit addition, but libabigail
+; seems to wrongly compute bitfields offset.
+; https://sourceware.org/bugzilla/show_bug.cgi?id=28060
+[suppress_type]
+        name = rte_eth_dev_data


*Reminder to ethdev maintainers*: with this exception, we have no
check on rte_eth_dev_data struct changes until 21.11.


Applied, thanks.

-- 
David Marchand
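
The behaviour described in the quoted commit message can be sketched as follows. This is only an illustration under stated assumptions: `struct dev_data`, `dev_configure()` and `dev_start()` are hypothetical, cut-down stand-ins for `struct rte_eth_dev_data` and the ethdev calls, and details such as the `-EINVAL` return code and clearing the flag when configuration is retried are guesses, since the patch body is not shown in this thread.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct rte_eth_dev_data. */
struct dev_data {
	uint8_t dev_started : 1;     /* device has been started */
	uint8_t dev_configured : 1;  /* last configure call succeeded */
};

/*
 * driver_ret stands in for the result of the driver's own configure
 * callback; the flag is set only when that callback succeeds.
 */
static int
dev_configure(struct dev_data *d, int driver_ret)
{
	d->dev_configured = 0; /* assumption: cleared while (re)configuring */
	if (driver_ret != 0)
		return driver_ret;
	d->dev_configured = 1;
	return 0;
}

/* Refuse to start a device that was never successfully configured. */
static int
dev_start(struct dev_data *d)
{
	if (!d->dev_configured)
		return -EINVAL; /* assumed error code */
	if (d->dev_started)
		return 0; /* already running, nothing to do */
	d->dev_started = 1;
	return 0;
}
```

Because the flag lives in the shared device data rather than in per-process state, a secondary process would observe the same value the primary process set, which is the reasoning given in the commit message.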


